/r/vulkan


News, information and discussion about Khronos Vulkan, the high performance cross-platform graphics API.

Vulkan is the next step in the evolution of graphics APIs, developed by Khronos, the current maintainers of OpenGL. It aims to reduce driver complexity and give application developers finer control over memory allocation and code execution on GPUs and parallel computing devices.


Vulkan Subreddit Scope

This subreddit is aimed at developers and end users, with a strong focus on the development of the Vulkan API itself, the development of applications that use the Vulkan API, and the state of available implementations and their deployment.

Vulkan Resources


Tutorials


Books


Related subreddits

/r/vulkan

20,639 Subscribers

2

Struggling with light Matrix for Directional Light in Shadow Mapping

1 Comment
2024/12/03
02:56 UTC

62

Khronos Streamlines Development and Deployment of GPU-Accelerated Applications with Vulkan 1.4

The Khronos Group has announced the release of Vulkan 1.4, the latest version of its cross-platform 3D graphics and compute API. Vulkan 1.4 integrates many proven features into its core specification and mandates support for them, expanding the functionality that is consistently available to developers and greatly simplifying application development and deployment across multiple platforms.

The Vulkan 1.4 specification consolidates numerous previously optional extensions and features and raises minimum hardware limits; many of these were defined in the Vulkan Roadmap 2022 and 2024 milestones and associated profiles, including:

  • Streaming Transfers: Vulkan 1.4 imposes new implementation requirements to ensure portable, cross-platform applications can stream large quantities of data to a device while simultaneously rendering at full performance.
  • Previously optional extensions and features critical to emerging high-performance applications are now mandatory in Vulkan 1.4, ensuring their reliable availability across multiple platforms. These include push descriptors, dynamic rendering local reads, and scalar block layouts.
  • Maintenance extensions up to and including VK_KHR_maintenance6 are now part of the core Vulkan 1.4 specification.
  • 8K rendering with up to eight separate render targets is now guaranteed to be supported, along with several other limit increases.

Learn more: https://khr.io/vulkan14
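For illustration (not from the announcement), a minimal sketch of opting in to Vulkan 1.4 from an application, assuming a loader and headers recent enough to define VK_API_VERSION_1_4; the physical device must also report an apiVersion of at least 1.4 before a 1.4 device is created:

#include <vulkan/vulkan.h>

int main() {
    // Ask the loader which instance-level version is available.
    uint32_t instanceVersion = 0;
    vkEnumerateInstanceVersion(&instanceVersion);
    if (instanceVersion < VK_API_VERSION_1_4)
        return 1; // loader/runtime does not expose Vulkan 1.4

    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.apiVersion = VK_API_VERSION_1_4; // opt in to the 1.4 core feature set

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    VkInstance instance = VK_NULL_HANDLE;
    return vkCreateInstance(&createInfo, nullptr, &instance) == VK_SUCCESS ? 0 : 1;
}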

7 Comments
2024/12/02
17:30 UTC

6

When does host coherent mapped memory get transferred to the device?

Or another way to phrase the question might be: Does host coherent memory get implicitly transferred over the bus upon CPU write, or on GPU read? I'd guess GPU read, unless there is some automatic device side caching.

Or yet another: Is it better to persistently map host cached memory or not? (Where host writes and reads but device just reads. Is there a down side or consideration?)

Background... I have some CPU code that writes an image into a host allocated buffer. That buffer is mapped to a VkBuffer via VkImportMemoryHostPointerInfoEXT. The reason for importing the host pointer is to avoid an extra staging step for the host to device copy. The type of compatible memory is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT. So far, this is fine and appears to work well. (I don't care for the )

Now I'd like to add some additional steps where the CPU also reads from that buffer as well as writes to it. This means I want CPU read and write cache performance. Later (after CPU processing) I want to copy this buffer into a device tiling-optimal image for display. I'm trying to determine if CPU reads and writes create a problem. Perhaps it's better not to map (vkMapMemory) this memory? (Since it is already host allocated, mapping is not necessary for the current use.)
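For reference, a minimal sketch of the host-pointer import path described above (assuming VK_EXT_external_memory_host is enabled, hostPtr and size respect minImportedHostPointerAlignment, and the EXT entry point was loaded with vkGetDeviceProcAddr; details like external-memory create info on the buffer are omitted):

#include <vulkan/vulkan.h>

VkDeviceMemory importHostAllocation(VkDevice device, VkBuffer buffer, void* hostPtr, VkDeviceSize size,
                                    uint32_t memoryTypeIndex,
                                    PFN_vkGetMemoryHostPointerPropertiesEXT pfnGetHostPointerProps) {
    // memoryTypeBits in hostProps restricts which memoryTypeIndex values are valid for this pointer.
    VkMemoryHostPointerPropertiesEXT hostProps{};
    hostProps.sType = VK_STRUCTURE_TYPE_MEMORY_HOST_POINTER_PROPERTIES_EXT;
    pfnGetHostPointerProps(device, VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT, hostPtr, &hostProps);

    VkImportMemoryHostPointerInfoEXT importInfo{};
    importInfo.sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT;
    importInfo.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT;
    importInfo.pHostPointer = hostPtr;

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.pNext = &importInfo;
    allocInfo.allocationSize = size;             // must be a multiple of the import alignment
    allocInfo.memoryTypeIndex = memoryTypeIndex; // a HOST_VISIBLE|HOST_COHERENT|HOST_CACHED type allowed by hostProps

    VkDeviceMemory memory = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, nullptr, &memory);
    vkBindBufferMemory(device, buffer, memory, 0);
    return memory;
}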

6 Comments
2024/12/02
09:06 UTC

6

Memory allocation questions

Checking my understanding here.

Heaps

The Vulkan spec is very general in this area. There are a huge number of options.

The Vulkan spec says there's some number of heaps for each implementation. There's no indication in the spec of how many. One? Two? 65535? I gather from this 2018 GDC presentation that there are very few, rarely more than three. Apparently there is rarely if ever more than one heap of a given type. Is that correct? The main types seem to be unshared CPU memory, unshared device memory, and various slow shared variants which may or may not be supported. Or the other extreme, the integrated graphics case, where everything is in one memory system. Are those pretty much the real world options, or are there other variants?

The Vulkan spec describes allocate and free functions. But the GDC presentation indicates these are very limited, or at least were back in 2018. The number of allocations is limited; that presentation suggests 4K. (Where does that number come from? Can it be read from the Vulkan API?) So you can't just allocate space for each texture with its own Vulkan allocate call, I think. The general idea seems to be to allocate big blocks (256MB was suggested) and then subdivide them with some kind of suballocator. Is that correct? Any comments on memory fragmentation problems?

Finding out how much device local memory is available was apparently hard back in 2018. Is that fixed? What's best practice today on getting a lot of device memory but not locking up the system because you grabbed all of it and nothing else can run?
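For what it's worth, both figures are queryable. A minimal sketch, assuming a valid physicalDevice and, for the budget part, that VK_EXT_memory_budget is supported:

#include <vulkan/vulkan.h>

void queryMemoryInfo(VkPhysicalDevice physicalDevice) {
    // The allocation-count limit (commonly 4096) comes from VkPhysicalDeviceLimits.
    VkPhysicalDeviceProperties props{};
    vkGetPhysicalDeviceProperties(physicalDevice, &props);
    uint32_t maxAllocations = props.limits.maxMemoryAllocationCount;

    // VK_EXT_memory_budget reports per-heap budget and current usage.
    VkPhysicalDeviceMemoryBudgetPropertiesEXT budget{};
    budget.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT;

    VkPhysicalDeviceMemoryProperties2 memProps{};
    memProps.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2;
    memProps.pNext = &budget;
    vkGetPhysicalDeviceMemoryProperties2(physicalDevice, &memProps);

    for (uint32_t i = 0; i < memProps.memoryProperties.memoryHeapCount; ++i) {
        VkDeviceSize heapBudget = budget.heapBudget[i]; // how much can be allocated without oversubscribing
        VkDeviceSize heapUsage  = budget.heapUsage[i];  // how much this process currently uses
        (void)heapBudget; (void)heapUsage;
    }
    (void)maxAllocations;
}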

Spilling from device memory to slower CPU memory accessed via the PCI bus is apparently something some Vulkan implementations can do. Or will do without being asked. When that happens, there's a big performance drop. How is that detected, prevented, or managed?

Is there something I should read that's more current than that 2018 presentation but covers the same material? Thanks.

6 Comments
2024/12/02
04:05 UTC

4

TLAS build problems, no instances built

I'm new to Vulkan, and decided to take on a ray tracing project to learn the API. Currently I have a bug where my TLAS is not being built correctly. I am completely stumped. According to Nsight graphics, my BLAS is being built fine.

https://preview.redd.it/6t002fpsb54e1.png?width=454&format=png&auto=webp&s=ef6e36f3d86cd6d4e058a2e3344655f699f4f287

However, it shows my TLAS contains no instances:

https://preview.redd.it/gxh88auwb54e1.png?width=455&format=png&auto=webp&s=4f84a340e6c986b6e6721d5374d5bdbc927291dd

From what I can see, I have provided the correct info in AccelerationStructureBuildGeometryInfo and AccelerationStructureBuildRangeInfoKHR to the buildAccelerationStructuresKHR function when building the TLAS (I have compared to Sascha Willems' raytracingbasic example). Here are some of the relevant fields Nsight shows for the TLAS build info:

pInfos
    type: VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR
    flags: VkBuildAccelerationStructureFlagsKHR(VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
    mode: VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR
    srcAccelerationStructure: VK_NULL_HANDLE
    dstAccelerationStructure: 0x2860e200000000bb
    geometryCount: 1
    pGeometries
        geometryType: VK_GEOMETRY_TYPE_INSTANCES_KHR
        geometry
            instances
                arrayOfPointers: VK_FALSE
                data
                    deviceAddress: 0xc4bec900000000b0
        flags: VkGeometryFlagsKHR(VK_GEOMETRY_OPAQUE_BIT_KHR)
    scratchData
        deviceAddress: 0x88693900000000c0
ppBuildRangeInfos
    ppBuildRangeInfos[0]
        ppBuildRangeInfos[0][0]
            primitiveCount: 1
            primitiveOffset: 0
            firstVertex: 0
            transformOffset: 0

I also have a pipeline barrier between the BLAS and TLAS build commands. I do not think it's a synchronisation issue, as I have also tried coarser synchronisation without success (BLAS and TLAS built in separate submits with fence in between).

Nsight also shows the instance buffer contents are as they should be:

transform: [1.0000, 0.0000, 0.0000, 0.0000] [0.0000, 1.0000, 0.0000, 0.0000] [0.0000, 0.0000, 1.0000, 0.0000]
instanceCustomIdx: 0
mask: 255
instanceSBTOffset: 0
flags: 1
asReference: 0xE0001F5200

I obtained the acceleration structure reference using getAccelerationStructureAddressKHR. It does not correspond to the buffer address for the BLAS, and I can't find the AS address in Nsight. Not sure if that is suspicious. The instance buffer should be alive while the TLAS is being built.
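For comparison, here is roughly how the instance reference is typically filled through the C API (hypothetical helper; assumes VK_KHR_acceleration_structure is enabled and the KHR entry point was loaded with vkGetDeviceProcAddr). The reference is the acceleration structure device address, which in general is not the same as the device address of the buffer backing the BLAS:

#include <vulkan/vulkan.h>

VkAccelerationStructureInstanceKHR makeInstance(VkDevice device, VkAccelerationStructureKHR blas,
                                                PFN_vkGetAccelerationStructureDeviceAddressKHR pfnGetASAddress) {
    VkAccelerationStructureDeviceAddressInfoKHR addrInfo{};
    addrInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_DEVICE_ADDRESS_INFO_KHR;
    addrInfo.accelerationStructure = blas;

    VkAccelerationStructureInstanceKHR instance{};
    instance.transform.matrix[0][0] = 1.0f; // identity transform
    instance.transform.matrix[1][1] = 1.0f;
    instance.transform.matrix[2][2] = 1.0f;
    instance.instanceCustomIndex = 0;
    instance.mask = 0xFF;
    instance.instanceShaderBindingTableRecordOffset = 0;
    instance.flags = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR;
    // The acceleration structure device address, not the backing buffer's address.
    instance.accelerationStructureReference = pfnGetASAddress(device, &addrInfo);
    return instance;
}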

Repo: https://github.com/arrebarritra/vulkan-raytracer

The relevant code is here: https://pastebin.com/1cx0PYSC

The code is mostly vulkan-hpp + a few of my own abstractions. Some details which might be good to know about the code:

  • The dereference operator on my Buffer class returns the associated vk::Buffer.
  • The underlying buffer and memory are destroyed when Buffer goes out of scope.
  • I'm using my own memory allocator (for educational purposes) which is not really battle tested, but works as far as I can tell

Hopefully it is easy enough to read. Would appreciate any help!

Edit: trying to fix tables but reddit is not complying :(

7 Comments
2024/12/01
02:54 UTC

3

Vulkan in nested X server?

Does anyone know an X server that allows you to run nested Vulkan applications?

So far I've looked at Xephyr and Xvfb; both of them only support GL, though.

Thank you so much for any help :)

1 Comment
2024/11/30
21:23 UTC

1

Depth buffer shader sampling

Hi all, I have this problem: I want to render to the depth buffer and sample from it at the same time. I know this is not possible, so what I do is copy the depth buffer to another texture before rendering and then sample from that texture. My question: is there a smarter way to solve this problem without copying the depth to another depth texture each time?

Thanks!

8 Comments
2024/11/29
17:49 UTC

3

Separate Graphics and Presentation Queues

I was revisiting my boilerplate code and noticed a TODO note for myself to check for separate Graphics and Presentation queues (on Windows and maybe Linux).

Is this supported now?
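For context, a minimal sketch of the usual query for a graphics family and a (possibly different) family that can present to the surface, assuming physicalDevice and surface already exist:

#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

void findQueueFamilies(VkPhysicalDevice physicalDevice, VkSurfaceKHR surface,
                       uint32_t& graphicsFamily, uint32_t& presentFamily) {
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &count, families.data());

    graphicsFamily = presentFamily = UINT32_MAX;
    for (uint32_t i = 0; i < count; ++i) {
        if ((families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) && graphicsFamily == UINT32_MAX)
            graphicsFamily = i;
        VkBool32 canPresent = VK_FALSE;
        vkGetPhysicalDeviceSurfaceSupportKHR(physicalDevice, i, surface, &canPresent);
        if (canPresent && presentFamily == UINT32_MAX)
            presentFamily = i;
    }
    // If the two indices differ, swapchain images either use VK_SHARING_MODE_CONCURRENT
    // or need explicit queue family ownership transfers.
}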

6 Comments
2024/11/29
16:40 UTC

14

How do vulkan drivers like turnip work?

Supposedly Turnip is a driver for Adreno GPUs (Android) that replaces the system driver, but only for the application that uses it. This is something I don't understand: how can a driver be loaded as a shared library and perform the same functions as a driver? Shouldn't this be impossible to do in user mode? Applications like Citra and Yuzu offered the option to load a custom Vulkan driver.

https://preview.redd.it/fa4jx2ru8u3e1.png?width=1080&format=png&auto=webp&s=c783dac918164af052129bac2f3066841986c435

4 Comments
2024/11/29
12:57 UTC

4

Wildly different draw call times between two supposedly similar implementations of batch rendering: what gives?

Hi everyone, I'm working on an implementation of batch rendering using SDL3 GPU API with Vulkan backend.

I'm trying to reproduce the performance of https://github.com/re-esper/BunnyMarkGame which on my machine is at frametimes of 5ms for 1M sprites (~190-200 fps). My implementation has frametimes twice as long for the same number of sprites (~100-110 fps).

Even when doing nothing, only acquiring a command buffer and a swapchain texture but not clearing the screen, the window idles with frametimes of 0.3 ms, while the benchmark above has frametimes of 0.1 ms when it's doing more, like clearing the screen and rendering the basic imgui UI (but no sprites). This suggests to me that there is some form of persistent overhead/latency somewhere. I checked SDL's backend and it looks fine with no glaring mistakes, so I'm very confused about this.

Here is what RenderDoc reports:

https://preview.redd.it/1cq3aa9det3e1.png?width=2371&format=png&auto=webp&s=c0d37409a4be627b56e1ccf3366d1545fe1370de

Same number of instances, same texture, same number of total draw calls per frame, yet one has draw calls that take twice as long. My implementation is not even doing rotation or scaling. I checked my CPU side and it can build a command buffer for a frame in far less than 1 ms, so it shouldn't be CPU bound. What's going on?
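One way to narrow this down is to bracket the sprite draws with GPU timestamp queries and compare against RenderDoc's numbers. A minimal sketch, assuming queryPool was created with VK_QUERY_TYPE_TIMESTAMP and two queries, and the queue family supports timestamps:

#include <vulkan/vulkan.h>

void recordTimedDraws(VkCommandBuffer cmd, VkQueryPool queryPool) {
    vkCmdResetQueryPool(cmd, queryPool, 0, 2);
    vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, queryPool, 0);
    // ... record the batched sprite draws here ...
    vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, queryPool, 1);
}

// Call after the submission's fence has signalled; timestampPeriodNs comes from
// VkPhysicalDeviceLimits::timestampPeriod (nanoseconds per tick).
double elapsedMs(VkDevice device, VkQueryPool queryPool, float timestampPeriodNs) {
    uint64_t ts[2] = {};
    vkGetQueryPoolResults(device, queryPool, 0, 2, sizeof(ts), ts, sizeof(uint64_t),
                          VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);
    return double(ts[1] - ts[0]) * timestampPeriodNs * 1e-6;
}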

9 Comments
2024/11/29
10:07 UTC

7

Curious about careers with Vulkan

Hey! I have started picking up Vulkan again, but it's got me thinking: what are some entry-level positions or career paths where Vulkan is used?

I would love to hear from more experienced folks about what it's like and what kind of projects you get to work on. What's the coolest stuff you get to do with Vulkan?

4 Comments
2024/11/29
02:05 UTC

6

Vulkan 1.3.260 vs OpenCL 2.0 for GPGPU programming?

Hello everyone! I am building a neural network from scratch in C++ and was wondering which of the two would best tackle the task?

My computer is far from being considered a beast in computing/graphics power, so I would like to get the highest performance out of it. I have some experience with writing a 3D graphics renderer with Vulkan, so I am aware that the coding overhead sucks, but that is not a problem. I am shooting to get the most performance out of my program, so that is not playing a factor in my decision.

Some additional information about my driver specs:

  • Vulkan API version 1.3.260
  • Vulkan Driver version 2.0.279
  • OpenCL API version 2.0
  • OpenCL Driver version 31.0.21921.1000
7 Comments
2024/11/28
20:40 UTC

1

beautiful black screen - vulkan-guide

Hi,

as a beginner in vulkan programming I am following the tutorial:

https://vkguide.dev/

In chapter 2 a compute shader is introduced and the result should be drawn to the screen. (https://vkguide.dev/docs/new_chapter_2/vulkan_shader_code/)

But I only see a black screen.

I went through the chapter twice and see no difference between the tutorial and my local code. ( https://github.com/Seim2k17/SolarSystem3DV/tree/solEngine/src/engine )

Could this be a hardware issue?

Can someone help me find out what's wrong?

I have validation layers activated and captured a frame with RenderDoc, but at this time I am not able to interpret the output ...

project:

https://github.com/Seim2k17/SolarSystem3DV/tree/solEngine

Thanks

Renderdoc-capture: (Linux)

https://github.com/Seim2k17/SolarSystem3DV/blob/solEngine/_captures/rdoc_capture_sol_blackscreen.rdc

4 Comments
2024/11/28
19:51 UTC

1

Disabling extension by command line using glslc or glslangValidator?

I use some optional extensions in my GLSL shaders and compile them to SPIR-V using an automated CMake script. In the shader file, the source code is guarded by the extension's availability (e.g. #extension GL_KHR_extension_name : enable, then the code is enclosed in #if GL_KHR_extension_name == 1 ... #endif).

I want to produce SPIR-V files with the extension either enabled or disabled, using my existing CMake script. In other words, I want to control this extension usage via CLI parameters. Note that glslc currently assumes all extensions are enabled.

How should I do this?

3 Comments
2024/11/28
10:34 UTC

0

Sending image to GPU in runtime

I am trying to send a 1000x1000 image to the GPU for rendering at runtime. I have tried what the following errors suggest, but with no success.

I get the following error:

Error:Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x1d41b4b7a70, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x1d41b4b7a70[] expects VkImage 0x521e2f0000001f86[] (subresource: aspectMask 0x1 array layer 0, mip level 1) to be in layout VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.

Followed by the error:

Error:Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x1d42b04e300, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x1d42b04e300[] expects VkImage 0x521e2f0000001f86[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.

Create Image Code:

bool vk::Vulkan_Buffers::createImage(PhysicalDevice& physicalDevice, loadObject& objectToLoad, VkImage& image, VkDeviceMemory& memory, dt::vec2i imageDimentions, uint32_t mipMapLevels, VkImageUsageFlags usage, VkImageTiling tiling,Console& console) {

    VkImageCreateInfo imageCreateInfo{};
    imageCreateInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
    imageCreateInfo.imageType = VK_IMAGE_TYPE_2D;
    imageCreateInfo.extent.width = imageDimentions.x;
    imageCreateInfo.extent.height = imageDimentions.y;
    imageCreateInfo.extent.depth = 1;
    imageCreateInfo.mipLevels = mipMapLevels;
    imageCreateInfo.arrayLayers = 1;
    imageCreateInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
    imageCreateInfo.tiling = tiling;
    imageCreateInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    imageCreateInfo.usage = usage;
    imageCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
    imageCreateInfo.samples = VK_SAMPLE_COUNT_1_BIT;
    imageCreateInfo.flags = 0;

    if (vkCreateImage(physicalDevice.logicalDevice.handle, &imageCreateInfo, nullptr, &image) == VK_SUCCESS) {
        console.printSucsess("Vulkan Image created");
    }
    else {
        console.printError("Vulkan Image Failed to be created");
    }

    VkMemoryRequirements memoryRequirements;
    vkGetImageMemoryRequirements(physicalDevice.logicalDevice.handle, image, &memoryRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memoryRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memoryRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, physicalDevice);

 if (vkAllocateMemory(physicalDevice.logicalDevice.handle, &allocInfo, nullptr, &memory) != VK_SUCCESS) {
        console.printError("Memory failed to be allocated");
    }

    vkBindImageMemory(physicalDevice.logicalDevice.handle, image, memory, 0);


    return true;
}

Create Texture Buffer:

bool vk::Vulkan_Buffers::createTextureBuffer(PhysicalDevice& physicalDevice, SDL_Surface* surface, VkImage& image, uint32_t mipMapLevels,Console& console) {
    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;

    size_t imageSize = (sizeof(((Uint32*)surface->pixels)[0])) * (surface->w * surface->h);

    createBuffer(physicalDevice, imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory,console);

    void* data;
    vkMapMemory(physicalDevice.logicalDevice.handle, stagingBufferMemory, 0, imageSize, 0, &data);
    memcpy(data, surface->pixels, imageSize);

    //transition to the correct image format
    Vulkan_Image vulkanImageHandle;
    transitionImageLayout(physicalDevice, image, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1,console);
    copyBufferToImage(physicalDevice, stagingBuffer, image, static_cast<uint32_t>(surface->w), static_cast<uint32_t>(surface->h),console);
    transitionImageLayout(physicalDevice, image, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, mipMapLevels,console);
    vulkanImageHandle.generateMipMaps(physicalDevice, image, mipMapLevels, dt::vec2i(surface->w, surface->h),console);
    return true;
}

Transition Image Layout:

void vk::Vulkan_Buffers::transitionImageLayout(PhysicalDevice& physicalDevice, VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout, uint32_t mipMapLevels,Console& console) {
    Vulkan_CommandBuffers commandBuffersHandle;
    VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);

    VkImageMemoryBarrier imageMemoryBarrier{};
    imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    imageMemoryBarrier.oldLayout = oldLayout;
    imageMemoryBarrier.newLayout = newLayout;
    imageMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    imageMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    imageMemoryBarrier.image = image;
    imageMemoryBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    imageMemoryBarrier.subresourceRange.baseMipLevel = 0;
    imageMemoryBarrier.subresourceRange.levelCount = mipMapLevels;
    imageMemoryBarrier.subresourceRange.baseArrayLayer = 0;
    imageMemoryBarrier.subresourceRange.layerCount = 1;

    VkPipelineStageFlags sourceStage;
    VkPipelineStageFlags destinationStage;

    if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);
        imageMemoryBarrier.srcAccessMask = 0;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

        sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.graphicsCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else if (oldLayout == VK_IMAGE_LAYOUT_GENERAL && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.computeCommandPool);
        imageMemoryBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

        sourceStage = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
        destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.computeCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.computeCommandPool);
        imageMemoryBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

        sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
        destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.computeCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);
        imageMemoryBarrier.srcAccessMask = 0;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

        sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.graphicsCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_GENERAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);
        imageMemoryBarrier.srcAccessMask = 0;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_WRITE_BIT;

        sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        destinationStage = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.graphicsCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_GENERAL) {
        VkCommandBuffer commandBuffer = commandBuffersHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);
        imageMemoryBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

        sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
        destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
        vkCmdPipelineBarrier(commandBuffer, sourceStage, destinationStage, 0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
        commandBuffersHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.graphicsCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle, console);
    }
    else {
        console.printError("Layout transition is not supported");
    }
}

generateMipMaps code:

bool vk::Vulkan_Image::generateMipMaps(PhysicalDevice& physicalDevice, VkImage& vkimage, uint32_t mipMapLevels, dt::vec2i dimentions,Console& console) {
    VkFormatProperties formatProperties;
    vkGetPhysicalDeviceFormatProperties(physicalDevice.handle, VK_FORMAT_R8G8B8A8_SRGB, &formatProperties);
    if (!formatProperties.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT) {
        console.printError("Linear Blitting is not supported");
    }

    Vulkan_CommandBuffers commandBufferHandle;
    VkCommandBuffer commandBuffer = commandBufferHandle.beginSingleTimeCommands(physicalDevice, physicalDevice.logicalDevice.graphicsCommandPool);

    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.image = vkimage;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.baseArrayLayer = 0;
    barrier.subresourceRange.layerCount = 1;
    barrier.subresourceRange.levelCount = 1;

    int32_t mipWidth = dimentions.x;
    int32_t mipHeight = dimentions.y;

    for (uint32_t i = 1; i < mipMapLevels; i++) {
        barrier.subresourceRange.baseMipLevel = i - 1;
        barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
        barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
        barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
        vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier);

        VkImageBlit blit{};
        blit.srcOffsets[0] = { 0, 0, 0 };
        blit.srcOffsets[1] = { mipWidth, mipHeight, 1 };
        blit.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        blit.srcSubresource.mipLevel = i - 1;
        blit.srcSubresource.baseArrayLayer = 0;
        blit.srcSubresource.layerCount = 1;
        blit.dstOffsets[0] = { 0, 0, 0 };
        blit.dstOffsets[1] = { mipWidth > 1 ? mipWidth / 2 : 1, mipHeight > 1 ? mipHeight / 2 : 1, 1 };
        blit.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        blit.dstSubresource.mipLevel = i;
        blit.dstSubresource.baseArrayLayer = 0;
        blit.dstSubresource.layerCount = 1;
        vkCmdBlitImage(commandBuffer, vkimage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, vkimage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &blit, VK_FILTER_LINEAR);

        barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
        barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
        barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
        barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

        vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier);

        if (mipWidth > 1) mipWidth /= 2;
        if (mipHeight > 1) mipHeight /= 2;
    }

    barrier.subresourceRange.baseMipLevel = mipMapLevels - 1;
    barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier);
    commandBufferHandle.endSingleTimeCommands(physicalDevice, commandBuffer, physicalDevice.logicalDevice.graphicsCommandPool, physicalDevice.logicalDevice.queueFamilies[physicalDevice.logicalDevice.graphicsQueueFamily].queues[0].handle,console);

    return true;
}

6 Comments
2024/11/27
11:53 UTC

3

Updating UBO on different stages?

Hello,

I have a problem updating a UBO buffer used in the fragment stage. Are there any rules for constructing a UBO buffer when it is used in different stages? The UBO and PCO in the vertex stage work fine and the fragment output is OK, but it seems the UBO buffer in the fragment stage is not reflected in the shader.

I'm pretty sure my pipeline layout is correct and the descriptor writes seem fine too. Any hints on where to look if the UBO buffer does not seem to be reflected in the shader?
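For reference, a minimal sketch of a UBO binding shared between stages (the binding number here is an assumption and must match both shaders):

#include <vulkan/vulkan.h>

VkDescriptorSetLayoutBinding makeSharedUboBinding() {
    VkDescriptorSetLayoutBinding uboBinding{};
    uboBinding.binding = 0; // must match layout(set = X, binding = 0) in both stages
    uboBinding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
    uboBinding.descriptorCount = 1;
    uboBinding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;
    return uboBinding;
}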

TIA.

2 Comments
2024/11/27
09:23 UTC

10

Variance Shadow Maps: HUGE memory commitment! Am I doing it wrong?

Hey folks,

I got basic shadow mapping working. But it's... basic. Variance Shadow Maps is a technique that promises affordable soft shadows while offering solutions to common problems like Shadow Acne, or Peter Panning. So I started working on it.

My current setup has one D32_SFLOAT z-buffer for each frame in flight (which I have 2 of). To implement Variance Shadow Maps:

  • I created a R32G32B32A32_SFLOAT color image as attachment (2x for frames in flight) to store the depth and depth squared images. Apparently, GPUs don't like R32G32 so 2 channels are wasted. This is a huge investment already. EDIT: The GPU does like R32G32, mistake on my side. See comments below.

  • Then I noticed that my shadow map is in draw order, not in depth order, and it seems obvious now, but I still need the D32_SFLOAT z-buffer to get proper depth testing. (This is also because the depth values are supposed to be "linear", i.e., fragment-to-light distance, and not typical non-linear z-buffer distance).

  • In order to get soft shadows, I need Gaussian blurring passes. Since this cannot happen on the same texture, I need another R32G32B32A32_SFLOAT texture (for each frame in flight) to do the blurring: shadow map -> temp texture blur pass X -> shadow map blur pass Y.

  • Finally, the article proposes to use MSAA for the shadow maps, so let's say 4xMSAA for making my point.

To summarize (for 2 frames in flight) I have the following comparison:

  • Traditional shadow mapping: 2x D32_SFLOAT texture (total 2 SFLOAT channels).
  • Variance shadow mapping: 2x D32_SFLOAT (2 channels), 4x R32G32B32A32_SFLOAT (16 channels), 4x memory for MSAA (total 72 SFLOAT channels).

This difference seems intense, and that is for each light I want shadows cast from. Am I missing something?
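As a rough worked example (the 2048x2048 resolution is an assumption; the channel counts are the ones from the comparison above, at 4 bytes per SFLOAT channel):

#include <cstddef>

constexpr std::size_t pixels      = 2048ull * 2048ull;
constexpr std::size_t traditional = pixels * 2  * 4;  //  2 channels per pixel ->   32 MiB
constexpr std::size_t variance    = pixels * 72 * 4;  // 72 channels per pixel -> 1152 MiB

static_assert(traditional == 32ull   * 1024 * 1024, "traditional shadow mapping: 32 MiB");
static_assert(variance    == 1152ull * 1024 * 1024, "variance shadow mapping: 1152 MiB");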

7 Comments
2024/11/26
23:03 UTC

1

How are textures and material parameters assigned to triangles?

Let's say you have a bunch of textures and material parameters. How do you assign those to triangles? So far I only know how to pass information per vertex. I could pass the information about which texture and material to use per vertex, but then I would have to store redundant information, so surely there has to be some better method, right?

23 Comments
2024/11/25
21:56 UTC

3

No window when following vulkan tutorial

I'm pretty new to Vulkan, so I'm currently following this tutorial and also this YouTube tutorial. However, I'm using Hyprland on Wayland and Arch Linux, and after running the same code (which I copied) I can't see any new window open. I don't think there are any problems with their code; rather, I suspect there are some special requirements for my system that I don't know about. Thank you for your help!

5 Comments
2024/11/25
10:25 UTC

1

vkcube-wayland transparent window issue

Hi, I've recently decided to give the NVK driver a try and I'll admit it works very well most of the time on my RTX 3060 Max-Q. However, I'm experiencing a bug with some vulkan applications that causes them to render only the window border with nothing within it. The best way to reproduce this bug is to run vkcube-wayland as it is the most widely available piece of software that has this bug. Weirdly, the normal vkcube works perfectly and, according to hyprctl (I'm using Hyprland), is running without xwayland. If anyone experienced this bug, it would be very nice to exchange some ideas about it.

3 Comments
2024/11/24
14:24 UTC

1

Vulkan checking of bindless descriptor indices

In bindless mode, if a shader uses an invalid descriptor index, what happens in these cases?

  • Index is out of range for the descriptor table.
  • Index is in range but descriptor slot is not in use.
  • Index is in range, descriptor slot is in use, but buffer is not currently mapped to the GPU because the CPU is using it.

(Why? I'm looking into designing a Rust interface and need to know what does and doesn't have to be checked for safety.)

6 Comments
2024/11/24
06:37 UTC

0

Binding a pipeline multiple times to pass different push constants?

Is it acceptable to bind a graphics pipeline multiple times using different push constants? Does Vulkan copy the push constants at each bind or do I need to hang on to them in memory until it's done with them? i.e. can I just overwrite the same struct in memory for each binding of a given pipeline, or should I be buffering all of the PCs for pipeline binds?
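For illustration, a minimal sketch of the usual pattern (assuming the pipeline layout declares a matching VkPushConstantRange and the pipeline is already bound). Push constant data is consumed when vkCmdPushConstants is recorded, so the same CPU-side struct can be overwritten between draws:

#include <vulkan/vulkan.h>

struct PushData { float tint[4]; };

void recordDraws(VkCommandBuffer cmd, VkPipelineLayout layout) {
    PushData pc{};
    for (int i = 0; i < 3; ++i) {
        pc.tint[0] = float(i) / 3.0f; // overwrite the same struct each iteration
        vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_FRAGMENT_BIT, 0, sizeof(PushData), &pc);
        vkCmdDraw(cmd, 3, 1, 0, 0);
    }
}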

1 Comment
2024/11/24
04:33 UTC

12

What is the Vulkanised event like in person?

Hi, I want to find out as much as possible about the event, as my friend and I are thinking of going to Vulkanised 2025. The problem is that, aside from the conference agenda shown on the page, we don't know what to expect from the event or what we should know beforehand. The passes for the event are pretty expensive, but we would still like to go.

- What is it like?
- What are networking sessions like?
- Most importantly is there food? Or do we leave the venue to get food?

3 Comments
2024/11/23
10:42 UTC

3

Usage of vkUpdateDescriptorSets and Push Constants

Hello, I have two questions regrading vkUpdateDescriptorSets and Push Constants.

  1. I'm trying to use vkUpdateDescriptorSets, and I realize the call has to avoid the recording / executing state. I checked the definition of the recording state:

https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#commandbuffers-lifecyclevkBeginCommandBuffer

It basically says recording begins with vkBeginCommandBuffer. But it seems like I can write something like the code below and everything works fine. Why?

BeginCmdBuffer();

// before begin render pass, after BeginCmdBuffer
// shouldn't this be the recording state mentioned in the doc?
vkUpdateDescriptorSets();

BeginRenderPass();
BindPipeline();
BindDescSets();
Draw();
EndRenderPass();

Once I move vkUpdateDescriptorSets() inside the render pass (after BeginRenderPass()), the validation layer complains.

  2. I'm thinking about using push constants; are there any downsides to using them?

It works asynchronously and seems handier than vkUpdateDescriptorSets.

3 Comments
2024/11/23
03:01 UTC

4

I'm rendering to a single window, and would like to be rendering to two windows. Can I get a high level overview of what I'd need to do to accomplish this?

Since I'm working on a Vulkan API implementation ( class Vulkan : public GraphicsApiBase ) that has things I'm pretty sure are specific to the company I'm working at, I'm mostly looking for a description of what I'd need to do: maybe which kinds of variables I need to look for and change, whether I'd need to declare and init some things, maybe pseudocode. Anyway, context:

I have a window that's my current only render target, created with handle HWND hWnd0 = CreateWindowEx(...) using HINSTANCE hInstance.

I'd like to have a second window, created using HWND hWnd1 = CreateWindowEx(...), created using the same class as hWnd0, and I'd like to be able to alternate between rendering to hWnd0 and hWnd1, so in essence after it renders to hWnd0, I'd like to be able to switch the render target to hWnd1, and vice-versa.
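At a high level: each window gets its own VkSurfaceKHR and its own swapchain, while the instance, device, and queues can be shared; switching render targets then just means acquiring and presenting on the other swapchain. A minimal Win32 sketch, assuming VK_KHR_surface and VK_KHR_win32_surface are enabled on the instance:

#define VK_USE_PLATFORM_WIN32_KHR
#include <windows.h>
#include <vulkan/vulkan.h>

VkSurfaceKHR createSurfaceForWindow(VkInstance instance, HINSTANCE hInstance, HWND hWnd) {
    VkWin32SurfaceCreateInfoKHR surfaceInfo{};
    surfaceInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
    surfaceInfo.hinstance = hInstance;
    surfaceInfo.hwnd = hWnd;

    VkSurfaceKHR surface = VK_NULL_HANDLE;
    vkCreateWin32SurfaceKHR(instance, &surfaceInfo, nullptr, &surface);
    return surface;
}

// Create one of these per HWND (hWnd0 and hWnd1), then one VkSwapchainKHR per surface;
// each frame, acquire and present on whichever window is the current target.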

2 Comments
2024/11/21
21:55 UTC

4

New video tutorial: Vertex Buffers in Vulkan

0 Comments
2024/11/21
20:10 UTC
