/r/vulkan


News, information and discussion about Khronos Vulkan, the high performance cross-platform graphics API.

Vulkan is the next step in the evolution of graphics APIs. Developed by Khronos, the current maintainers of OpenGL, it aims to reduce driver complexity and give application developers finer control over memory allocation and code execution on GPUs and parallel computing devices.


Vulkan Subreddit Scope

This subreddit is aimed at developers and end users, with a strong focus on the development of the Vulkan API itself, the development of applications that use it, and the state of available implementations.

Vulkan Resources


Tutorials


Books


Related subreddits

/r/vulkan

19,012 Subscribers

2

Deferred with Dynamic Rendering

I'm trying to set up a deferred pipeline with dynamic rendering, but I just found out that input attachments aren't supported. There's the dynamic rendering local read extension, but I don't want to rely solely on that at the moment. Are there any examples of deferred pipelines built with dynamic rendering?

2 Comments
2024/05/12
10:25 UTC

2

Host synchronization when not specified as external in the spec

Some API functions have a note in the spec saying that they must be externally synchronized. Does that mean that the ones without such a note are safe to call from multiple threads? For example, the spec for vkCreateBuffer doesn't say it must be externally synchronized, while it obviously writes something to the VkInstance. Is there a mutex (or some fancy non-blocking thread-safe thing) in the driver?
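For context, the Vulkan spec's threading chapter says that parameters not marked as externally synchronized must be protected by the implementation, so such calls are safe from multiple threads; it is handles like a shared VkQueue passed to vkQueueSubmit that the application itself must guard. A minimal sketch of that application-side guarding, with a stand-in FakeQueue in place of a real VkQueue:

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// FakeQueue stands in for VkQueue; the counter stands in for driver state
// that vkQueueSubmit would mutate without any internal locking.
struct FakeQueue {
    uint64_t submissions = 0;
};

class GuardedQueue {
    FakeQueue queue_;
    std::mutex mutex_;  // the "external synchronization" the spec requires
public:
    void submit() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++queue_.submissions;  // stands in for vkQueueSubmit(queue_, ...)
    }
    uint64_t submissions() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.submissions;
    }
};

// Hammer the shared queue from several threads; with the mutex in place
// every submission is counted exactly once.
uint64_t submit_from_threads(GuardedQueue& q, int threads, int perThread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] { for (int i = 0; i < perThread; ++i) q.submit(); });
    for (auto& th : pool) th.join();
    return q.submissions();
}
```

Here std::mutex plays the role the spec leaves to the application; a real renderer might instead give each thread its own queue or funnel submissions through a single thread.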

2 Comments
2024/05/11
19:29 UTC

1

Post Deferred Renderpass depth attachment is upside down

Hi!

Doing a deferred rendering engine with all the bells and whistles - tiled/clustered forward is coming later.

I'm using a framebuffer abstraction similar to a render graph API - essentially allowing for framebuffer constructions that depend on other framebuffer instances.

I've successfully finished the G-buffer geometry pass and the deferred pass (with a light-culling compute pass in between). After the deferred lighting pass I would like to be able to draw the lights I submitted to my engine from client code - maybe as emissive meshes or as a default renderer cube (?).

Anyway - for this final light geometry pass I attach the output of the deferred pass (colour and depth) as load operations. My vision was to implement the blitted-lights idea from the Deferred Shading chapter of LearnOpenGL.

As context, I use an inverse Z-buffer with minDepth and maxDepth set to 1.0 and 0.0 respectively, and GLM_FORCE_DEPTH_ZERO_ONE. I also perform viewport inversion a la Sascha Willems. I perform a pre-depth pass which is then attached to the G-buffer pass to reduce overdraw and to allow for light culling later on. This is the depth output I use as the attachment for the light pass.

The issue is that the light render pass gets the depth buffer attached upside down. I've tried rendering the light pass "non-inverted" (x=0, y=0, width=width, height=height), inverting that render pass's shaders' y-coordinates, and other things.

I can't seem to understand how to flip the depth attachment at this point, since I'm not sampling it as a texture, it is already loaded as an attachment.

A possible solution, I realised when writing this post, would maybe be to attach it as a fragment shader texture and sample instead, so that I can reject based on gl_FragDepth in the fragment shader? Maybe?
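That sample-and-reject idea could be sketched roughly as follows, assuming the depth image is bound as a combined image sampler at a hypothetical binding 4; flipping Y at sample time undoes the viewport inversion, and with reverse-Z a fragment is occluded when its depth is less than the stored depth:

```glsl
#version 450

layout(binding = 4) uniform sampler2D u_Depth; // hypothetical binding

layout(location = 0) in vec2 v_UV;
layout(location = 0) out vec4 o_Color;

void main()
{
    // Sample with Y flipped to undo the viewport inversion of the earlier pass.
    float sceneDepth = texture(u_Depth, vec2(v_UV.x, 1.0 - v_UV.y)).r;

    // Reverse-Z: larger depth means closer, so a light fragment behind the
    // stored scene depth has a smaller depth value.
    if (gl_FragCoord.z < sceneDepth)
        discard;

    o_Color = vec4(1.0); // light geometry colour goes here
}
```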

Thank you for your consideration!

4 Comments
2024/05/11
19:06 UTC

1

Why can't I invoke my custom intersection shader for AABB geometry (Ray Tracing)?

I've been struggling to render voxels using the ray tracing pipeline in Vulkan. Rendering static triangle meshes works flawlessly - my closest hit shader gets invoked correctly when the geometry type is set to triangles and marked as opaque. However, when I switch to procedural geometry using axis-aligned bounding boxes (AABBs), only the miss shader executes.

Despite inserting debug statements in my custom intersection shader, there's no output. I've even removed the closest hit shader from my shader binding table (SBT) to ensure there aren't any selection issues. Given that it's the only shader listed in my SBT besides general shaders, I suspect the issue lies elsewhere, perhaps not with the SBT setup itself. Fiddling with the opaqueness settings hasn't resolved the issue either.

I would greatly appreciate it if somebody can help me out. I've been watching and reading about SBTs and the RT pipeline, but I feel like I am missing something in what I thought should be a very simple endeavor.

I tried frame debugging with NVIDIA Nsight, and the frame debugger shows my axis-aligned bounding boxes in the acceleration structure renderer. I even see my intersection shader as part of the shader hit group. So it is definitely "registered" in the application; I just can't get it to invoke for some reason.

My intersection shader (the body is currently commented out; it just has a debug print to confirm it gets invoked):

#version 460
#extension GL_EXT_ray_tracing : require
#extension GL_EXT_debug_printf : enable

//layout(binding = 0, set = 0) uniform accelerationStructureEXT topLevelAS;
//layout(binding = 1, set = 0, r32f) uniform image3D densityField;

void main() {
    debugPrintfEXT("CUSTOM INTERSECTION SHADER!!!!!!!!!!!!!!!!!!!!!\n"); // comment out
    reportIntersectionEXT(0, 0); // comment out
    return; // comment out

    /**
    // Ray marching
    uint rayFlags = gl_RayFlagsNoneEXT;
    float tmin = 0.0;
    float tmax = 1000.0;  // Farthest distance we want to check
    vec3 rayOrigin = gl_WorldRayOriginEXT;
    vec3 rayDirection = gl_WorldRayDirectionEXT;

    float t = tmin;
    float stepSize = 1.0;  // Distance to step through the density field

    while (t < tmax) {
        vec3 currentPosition = rayOrigin + rayDirection * t;
        float density = imageLoad(densityField, ivec3(currentPosition)).x;

        if (density > 0.5) {  // Assuming 0.5 is the threshold for solid
            reportIntersectionEXT(t, 0);  // Report intersection at this distance
            return;
        }
        t += stepSize;
    }*/
}

Pipeline creation

void VulkanInitializer::createGraphicsPipeline() {
    auto raygenShaderCode = readFile("../shaders/raygen.spv");
    auto missShaderCode = readFile("../shaders/miss.spv");
    // auto closestHitShaderCode = readFile("../shaders/closesthit.spv");
    auto voxelIntersectionShaderCode = readFile("../shaders/intersection.spv");

    VkShaderModule raygenShaderModule = createShaderModule(raygenShaderCode);
    VkShaderModule missShaderModule = createShaderModule(missShaderCode);
    // VkShaderModule closestHitShaderModule = createShaderModule(closestHitShaderCode);
    VkShaderModule voxelIntersectionShaderModule = createShaderModule(voxelIntersectionShaderCode);

    VkPipelineShaderStageCreateInfo raygenShaderStageInfo{};
    // Set up shader stages for the ray tracing pipeline
    raygenShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    raygenShaderStageInfo.stage = VK_SHADER_STAGE_RAYGEN_BIT_KHR;
    raygenShaderStageInfo.module = raygenShaderModule;
    raygenShaderStageInfo.pName = "main";

    VkPipelineShaderStageCreateInfo missShaderStageInfo{};
    missShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    missShaderStageInfo.stage = VK_SHADER_STAGE_MISS_BIT_KHR;
    missShaderStageInfo.module = missShaderModule;
    missShaderStageInfo.pName = "main";

    // VkPipelineShaderStageCreateInfo closestHitShaderStageInfo{};
    // closestHitShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    // closestHitShaderStageInfo.stage = VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR;
    // closestHitShaderStageInfo.module = closestHitShaderModule;
    // closestHitShaderStageInfo.pName = "main";
    VkPipelineShaderStageCreateInfo voxelIntersectionShaderStageInfo{};
    voxelIntersectionShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    voxelIntersectionShaderStageInfo.stage = VK_SHADER_STAGE_INTERSECTION_BIT_KHR;
    voxelIntersectionShaderStageInfo.module = voxelIntersectionShaderModule;
    voxelIntersectionShaderStageInfo.pName = "main";

    std::vector<VkPipelineShaderStageCreateInfo> shaderStages = {
            raygenShaderStageInfo, missShaderStageInfo, /*closestHitShaderStageInfo,*/ voxelIntersectionShaderStageInfo
    };

    // Set up ray tracing shader groups (raygen, miss, hit)
    VkRayTracingShaderGroupCreateInfoKHR raygenGeneralGroupInfo{};
    raygenGeneralGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
    raygenGeneralGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR;
    raygenGeneralGroupInfo.generalShader = 0; // Index of the ray generation shader in the shaderStages array
    raygenGeneralGroupInfo.closestHitShader = VK_SHADER_UNUSED_KHR;
    raygenGeneralGroupInfo.anyHitShader = VK_SHADER_UNUSED_KHR;
    raygenGeneralGroupInfo.intersectionShader = VK_SHADER_UNUSED_KHR;

    VkRayTracingShaderGroupCreateInfoKHR missGeneralGroupInfo{};
    missGeneralGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
    missGeneralGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR;
    missGeneralGroupInfo.generalShader = 1;
    missGeneralGroupInfo.closestHitShader = VK_SHADER_UNUSED_KHR;
    missGeneralGroupInfo.anyHitShader = VK_SHADER_UNUSED_KHR;
    missGeneralGroupInfo.intersectionShader = VK_SHADER_UNUSED_KHR;
/*
    VkRayTracingShaderGroupCreateInfoKHR triangleHitGroupInfo{};
    triangleHitGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
    triangleHitGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR;
    triangleHitGroupInfo.generalShader = VK_SHADER_UNUSED_KHR;
    triangleHitGroupInfo.closestHitShader = 2;
    triangleHitGroupInfo.anyHitShader = VK_SHADER_UNUSED_KHR;
    triangleHitGroupInfo.intersectionShader = VK_SHADER_UNUSED_KHR;
*/
    VkRayTracingShaderGroupCreateInfoKHR proceduralHitGroupInfo{};
    proceduralHitGroupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_KHR;
    proceduralHitGroupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR;
    proceduralHitGroupInfo.generalShader = VK_SHADER_UNUSED_KHR;
    proceduralHitGroupInfo.closestHitShader = VK_SHADER_UNUSED_KHR;
    proceduralHitGroupInfo.anyHitShader = VK_SHADER_UNUSED_KHR;
    proceduralHitGroupInfo.intersectionShader = 2;

    std::array<VkRayTracingShaderGroupCreateInfoKHR, 3> groups = {raygenGeneralGroupInfo, missGeneralGroupInfo, /*triangleHitGroupInfo,*/ proceduralHitGroupInfo};

My SBT creation:

void VulkanInitializer::createShaderBindingTable() {
    /** VkPhysicalDeviceRayTracingPipelinePropertiesKHR
    * shaderGroupHandleSize: Size in bytes of a shader group handle.
    * shaderGroupBaseAlignment: Required alignment, in bytes, for shader group base addresses.
    * shaderGroupHandleAlignment: Required alignment, in bytes, for shader group handles.
    * maxRayRecursionDepth: Maximum depth of ray recursion allowed.
    * maxShaderGroupStride: Maximum stride in bytes between shader groups.
    */
    const uint32_t shaderGroupHandleSize = vulkanContext.rtPipelineProperties.shaderGroupHandleSize; // 32 for my gpu
    const uint32_t alignment = vulkanContext.rtPipelineProperties.shaderGroupBaseAlignment; // 64 for my gpu

    constexpr int numRaygenShaders = 1;
    constexpr int numMissShaders = 1;
    constexpr int numHitGroups = 1; // 1x shader per hit group (2x: triangle and procedural)-- disabled the triangle hit group for debugging purposes
    constexpr int numCallableShaders = 0;

    const uint32_t raygenRegionSize = numRaygenShaders * shaderGroupHandleSize;
    const uint32_t missRegionSize = numMissShaders * shaderGroupHandleSize;
    const uint32_t hitRegionSize = numHitGroups * shaderGroupHandleSize;
    const uint32_t callableRegionSize = numCallableShaders * shaderGroupHandleSize;

    constexpr uint32_t raygenRegionOffset = 0;
    const uint32_t missRegionOffset = alignUp(raygenRegionOffset + raygenRegionSize, alignment);
    const uint32_t hitRegionOffset = alignUp(missRegionOffset + missRegionSize, alignment);
    const uint32_t callableRegionOffset = alignUp(hitRegionOffset + hitRegionSize, alignment);

    assert(raygenRegionOffset % alignment == 0);
    assert(missRegionOffset % alignment == 0);
    assert(hitRegionOffset % alignment == 0);
    assert(callableRegionOffset % alignment == 0);

    constexpr uint32_t groupCount = 3; // raygen group + miss group + triangle hit group + procedural hit group -- disabled the triangle hit group for debugging purposes
    const uint32_t sbtSize = callableRegionOffset + callableRegionSize;

    std::vector<uint8_t> shaderGroupHandles(groupCount * shaderGroupHandleSize);
    auto pfn_GetRayTracingShaderGroupHandles = reinterpret_cast<PFN_vkGetRayTracingShaderGroupHandlesKHR>(vkGetDeviceProcAddr(
            vulkanContext.device, "vkGetRayTracingShaderGroupHandlesKHR"));
    if (!pfn_GetRayTracingShaderGroupHandles) {
        throw std::runtime_error("Could not load vkGetRayTracingShaderGroupHandlesKHR");
    }
    if (pfn_GetRayTracingShaderGroupHandles(vulkanContext.device, vulkanContext.graphicsPipeline, 0, groupCount,
                        groupCount * shaderGroupHandleSize, shaderGroupHandles.data()) != VK_SUCCESS) {
        throw std::runtime_error("failed to get ray tracing shader group handles!");
    }

    CreateVmaBuffer(sbtSize,
                    VK_BUFFER_USAGE_SHADER_BINDING_TABLE_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
                    VMA_MEMORY_USAGE_CPU_TO_GPU, shaderBindingTableBuffer, shaderBindingTableBufferAllocation,
                    vulkanContext.vmaAllocator, "SBT Buffer");

    uint8_t* sbtBuffer;
    vmaMapMemory(vulkanContext.vmaAllocator, shaderBindingTableBufferAllocation, reinterpret_cast<void**>(&sbtBuffer));

    const uint8_t* raygenShaderHandle = shaderGroupHandles.data() + 0 * shaderGroupHandleSize;
    const uint8_t* missShaderHandle = shaderGroupHandles.data() + 1 * shaderGroupHandleSize;
    const uint8_t* triangleHitShaderHandle = shaderGroupHandles.data() + 2 * shaderGroupHandleSize;
    const uint8_t* voxelIntersectionShaderHandle = shaderGroupHandles.data() + 2 * shaderGroupHandleSize; // only adding the voxel intersection shader for now

    // Copy shader handles to the mapped SBT buffer at the appropriate offsets
    memcpy(sbtBuffer + raygenRegionOffset, raygenShaderHandle, shaderGroupHandleSize); // An entry in the SBT is essentially a group configured during pipeline creation
    memcpy(sbtBuffer + missRegionOffset, missShaderHandle, shaderGroupHandleSize);
    memcpy(sbtBuffer + hitRegionOffset, voxelIntersectionShaderHandle, shaderGroupHandleSize);
    memcpy(sbtBuffer + hitRegionOffset + shaderGroupHandleSize, voxelIntersectionShaderHandle, shaderGroupHandleSize);
    memcpy(sbtBuffer + callableRegionOffset, voxelIntersectionShaderHandle, shaderGroupHandleSize);

    vmaUnmapMemory(vulkanContext.vmaAllocator, shaderBindingTableBufferAllocation);

    const VkDeviceAddress sbtBufferAddress = GetBufferDeviceAddress(vulkanContext.shaderBindingTableBuffer, vulkanContext.device); // : Refactor. Probably can just reference the previous addr

    // Define the SBT regions : Probably can refactor the size with previously defined variables.
    VkStridedDeviceAddressRegionKHR raygenShadersStartAddr{};
    raygenShadersStartAddr.deviceAddress = sbtBufferAddress;
    raygenShadersStartAddr.stride = alignment;
    raygenShadersStartAddr.size = alignUp(raygenRegionSize, alignment);

    VkStridedDeviceAddressRegionKHR missShadersStartAddr{};
    missShadersStartAddr.deviceAddress = sbtBufferAddress + missRegionOffset;
    missShadersStartAddr.stride = alignment;
    missShadersStartAddr.size = alignUp(missRegionSize, alignment);

    VkStridedDeviceAddressRegionKHR hitGroupsStartAddr{};
    hitGroupsStartAddr.deviceAddress = sbtBufferAddress + hitRegionOffset;
    hitGroupsStartAddr.stride = alignment;
    hitGroupsStartAddr.size = alignUp(hitRegionSize, alignment);

    VkStridedDeviceAddressRegionKHR callableShadersStartAddr{};
    callableShadersStartAddr.deviceAddress = sbtBufferAddress + callableRegionOffset;
    callableShadersStartAddr.stride = alignment;
    callableShadersStartAddr.size = alignUp(callableRegionSize, alignment);

    vulkanContext.shaderBindingTableShaderStartAddresses.raygenShadersStartAddr = raygenShadersStartAddr;
    vulkanContext.shaderBindingTableShaderStartAddresses.missShadersStartAddr = missShadersStartAddr;
    vulkanContext.shaderBindingTableShaderStartAddresses.hitGroupsStartAddr = hitGroupsStartAddr;
    vulkanContext.shaderBindingTableShaderStartAddresses.callableShadersStartAddr = callableShadersStartAddr;
}

// part of my AABB blas creation:

void VulkanInitializer::CreateVoxelBlasAABB() {
    // Create AABB blas voxels
    std::vector<VkAabbPositionsKHR> aabbs = {
        {{0.0f, 0.0f, 0.0f}, {1.0f, 1.0f, 1.0f}},
    };

    CreateVmaBuffer(sizeof(VkAabbPositionsKHR) * aabbs.size(),
                    VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
                    VMA_MEMORY_USAGE_CPU_TO_GPU, vulkanContext.aabbBuffer, vulkanContext.aabbBufferAllocation, vulkanContext.vmaAllocator, "AABB Buffer");

    void* data;
    vmaMapMemory(vulkanContext.vmaAllocator, vulkanContext.aabbBufferAllocation, &data);
    memcpy(data, aabbs.data(), sizeof(VkAabbPositionsKHR) * aabbs.size());
    vmaUnmapMemory(vulkanContext.vmaAllocator, vulkanContext.aabbBufferAllocation);

    // Set up the structure for the AABBs
    VkAccelerationStructureGeometryKHR geometry{};
    geometry.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
    geometry.geometryType = VK_GEOMETRY_TYPE_AABBS_KHR;
    geometry.geometry.aabbs.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR;
    geometry.geometry.aabbs.data.deviceAddress = GetBufferDeviceAddress(vulkanContext.aabbBuffer, vulkanContext.device);
    geometry.geometry.aabbs.stride = sizeof(VkAabbPositionsKHR);

// Top level acceleration structure instances

        VkAccelerationStructureInstanceKHR instance{};
        instance.transform = transformMatrix;
        instance.instanceCustomIndex = instanceCustomIndex;
        instance.mask = 0xFF;
        instance.instanceShaderBindingTableRecordOffset = 0; // Since intersection shader is literally the only shader, 0 should work but idk why it doesn't
        instance.flags = 0;
        instance.accelerationStructureReference = vulkanContext.voxelBlasDeviceAddress;

        instanceUpdates.push_back(instance);

Any insights on why my intersection shader doesn't get executed when AABB intersects with it? Could there be an issue with how I'm setting up or using the geometry in my bottom-level acceleration structure or something else entirely?
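As a sanity check on the offset arithmetic above (handle size 32 and base alignment 64, per the comments in createShaderBindingTable), here is a standalone sketch of the same computation, assuming alignUp rounds up to the next multiple of a power-of-two alignment:

```cpp
#include <array>
#include <cstdint>

// Round x up to the next multiple of a power-of-two alignment -- the
// behavior the alignUp helper in the post is assumed to have.
constexpr uint32_t align_up(uint32_t x, uint32_t alignment) {
    return (x + alignment - 1) & ~(alignment - 1);
}

// Region offsets for 1 raygen, 1 miss, 1 hit group and 0 callables,
// in the order {raygen, miss, hit, callable}.
constexpr std::array<uint32_t, 4> sbt_offsets(uint32_t handleSize, uint32_t alignment) {
    const uint32_t raygen   = 0;
    const uint32_t miss     = align_up(raygen + handleSize, alignment);
    const uint32_t hit      = align_up(miss + handleSize, alignment);
    const uint32_t callable = align_up(hit + handleSize, alignment);
    return {raygen, miss, hit, callable};
}

// With handleSize 32 and alignment 64 the offsets are 0, 64, 128, 192,
// and sbtSize = 192 because the callable region is empty.
static_assert(sbt_offsets(32, 64)[1] == 64);
static_assert(sbt_offsets(32, 64)[2] == 128);
static_assert(sbt_offsets(32, 64)[3] == 192);
```

One thing worth double-checking against these numbers: sbtSize comes out to 192, so the memcpy at callableRegionOffset (and the second hit-region memcpy at hitRegionOffset + shaderGroupHandleSize) would write past the end of the allocation.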

2 Comments
2024/05/10
13:46 UTC

1

Primary and secondary cmd-bufs within same renderpass

Vulkan Validation Entry:

VUID-vkCmdExecuteCommands-contents-06018

If vkCmdExecuteCommands is being called within a render pass instance begun with vkCmdBeginRenderPass, its contents parameter must have been set to VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS or VK_SUBPASS_CONTENTS_INLINE_AND_SECONDARY_COMMAND_BUFFERS_EXT.

I want to verify my understanding is correct. Without the VK_SUBPASS_CONTENTS_INLINE_AND_SECONDARY_COMMAND_BUFFERS_EXT extension, this means that it's not possible to have a render pass with multiple subpasses where one subpass is invoked from the primary command buffer and another subpass has its contents executed via secondary command buffers.

Like:

vkCmdBeginRenderPass(..., VK_SUBPASS_CONTENTS_INLINE)
// .. commands in primary cmd-buf
vkCmdNextSubpass(VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS)
vkCmdExecuteCommands(...) // commands in secondary cmd-buf
vkCmdEndRenderPass()

This means that secondary command buffers must be executed within their own render passes. That is, it's not possible for a single render pass to mix primary and secondary command buffers.

1 Comment
2024/05/10
12:58 UTC

11

How should I handle defining pipelines in an efficient manner?

I have been reading Intel's API Without Secrets as well as following the Vulkan tutorial, but I am confused about how I should define pipelines. From my understanding, a graphics pipeline essentially defines the entire process of drawing. However, it seems that Vulkan requires developers to have different pipelines for different ways of rendering objects, such as wireframe rendering or rendering transparent objects. If my understanding is correct, wouldn't I need to manually define each pipeline in an application that might have hundreds of different materials? How would I go about this in an efficient manner?
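One common approach (though not the only one) is to key pipelines by the small set of state that actually varies - shader pair, polygon mode, blend state - and create each combination lazily on first use, so the pipeline count tracks distinct state combinations rather than material count. A sketch with a hypothetical PipelineKey and an int standing in for VkPipeline:

```cpp
#include <cstdint>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical key: materials that share this state share a pipeline.
struct PipelineKey {
    uint32_t shaderId;      // which shader pair the material uses
    bool     wireframe;     // VK_POLYGON_MODE_LINE vs FILL
    bool     blendEnabled;  // transparent vs opaque
    bool operator==(const PipelineKey& o) const {
        return shaderId == o.shaderId && wireframe == o.wireframe &&
               blendEnabled == o.blendEnabled;
    }
};

struct PipelineKeyHash {
    size_t operator()(const PipelineKey& k) const {
        return std::hash<uint32_t>()(k.shaderId) ^
               (static_cast<size_t>(k.wireframe) << 1) ^
               (static_cast<size_t>(k.blendEnabled) << 2);
    }
};

class PipelineCache {
    std::unordered_map<PipelineKey, int, PipelineKeyHash> cache_;
    int nextHandle_ = 0;
public:
    // Return an existing pipeline handle, or "create" one on first use
    // (a real engine would call vkCreateGraphicsPipelines here).
    int getOrCreate(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        return cache_.emplace(key, nextHandle_++).first->second;
    }
    size_t size() const { return cache_.size(); }
};
```

Swapping the int handle for a real VkPipeline carries the idea over; many engines also lean on dynamic state and VkPipelineCache to cut the combinatorial load further.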

9 Comments
2024/05/10
07:10 UTC

0

Problem with dynamically sized array for vkEnumeratePhysicalDevices

So I'm trying to follow the Vulkan tutorial, yet I'm coding in C instead of C++. Additionally, I'm coding in Visual Studio with the horrible MSVC compiler, which doesn't allow me to simply write `VkPhysicalDevice devices[deviceCount]`.

No problem, I thought, I'm just going to use malloc.

Yes problem. For whatever reason, this fails spectacularly (Access violation writing location ...), no clue why.

uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(*instance, &deviceCount, 0);
VkPhysicalDevice *devices = malloc(sizeof(VkPhysicalDevice) * deviceCount);
vkEnumeratePhysicalDevices(*instance, &deviceCount, devices);

Whereas this works fine.

uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(*instance, &deviceCount, 0);
VkPhysicalDevice devices[2]; // in my case deviceCount is 2
vkEnumeratePhysicalDevices(*instance, &deviceCount, devices);

From my understanding, there shouldn't be a difference as arrays are basically pointers to memory space anyways... Can anyone explain?
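For what it's worth, the count-then-fill pattern with malloc is valid C; isolated from Vulkan it behaves as in this sketch, where enumerate_devices is a stand-in for vkEnumeratePhysicalDevices. If the heap version crashes where the stack version doesn't, the usual suspects are malloc returning NULL (unchecked, e.g. for a zero count) or the surrounding pointers such as instance being invalid, rather than any difference between heap and stack arrays:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for vkEnumeratePhysicalDevices, to show the count-then-fill
 * pattern in isolation: the first call (out == NULL) reports the count,
 * the second call fills the caller's array. */
typedef struct { int id; } FakeDevice;

static void enumerate_devices(uint32_t *count, FakeDevice *out)
{
    const uint32_t available = 2;
    if (out == NULL) {            /* first call: just report the count */
        *count = available;
        return;
    }
    for (uint32_t i = 0; i < available && i < *count; ++i)
        out[i].id = (int)i;       /* second call: fill the array */
}

int enumerate_with_malloc(uint32_t *out_count)
{
    uint32_t count = 0;
    enumerate_devices(&count, NULL);

    FakeDevice *devices = malloc(sizeof(FakeDevice) * count);
    if (devices == NULL)          /* zero count or OOM: never write through it */
        return -1;

    enumerate_devices(&count, devices);
    *out_count = count;
    free(devices);
    return 0;
}
```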

5 Comments
2024/05/09
20:27 UTC

0

I can't seem to find a proper way to setup vulkan on mac

Every tutorial is irrelevant to my specific situation.

I even copied a makefile from someone else using a Mac, but nothing works.

I get this error:

makefile:5: *** missing separator. Stop.

this is my makefile

CFLAGS = -std=c++17 -I.mak -I$(VK_SDK_PATH)\include

LDFLAGS = -L$(VK_SDK_PATH)\lib `pkg-config --static --libs=glfw3` -lvulkan
a.out: *.cpp *.hpp
g++ $(CFLAGS) -o a.out *.cpp $(LDFLAGS)
.PHONY: test clean
test: a.out
./a.out
clean: a.out
rm -f a.out

I set up according to Lu the coder in this video, up to the 26-minute mark.

Please help
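For reference, make's "missing separator" error almost always means a recipe line does not begin with a hard tab (spaces don't count), which is easy to lose when copy-pasting. A corrected sketch, assuming VK_SDK_PATH is set, with forward slashes (the backslashes above are a Windows convention) and pkg-config's usual `--libs glfw3` spelling; every indented line below starts with a tab:

```make
CFLAGS = -std=c++17 -I$(VK_SDK_PATH)/include
LDFLAGS = -L$(VK_SDK_PATH)/lib `pkg-config --static --libs glfw3` -lvulkan

a.out: *.cpp *.hpp
	g++ $(CFLAGS) -o a.out *.cpp $(LDFLAGS)

.PHONY: test clean
test: a.out
	./a.out
clean:
	rm -f a.out
```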

2 Comments
2024/05/09
13:43 UTC

5

I can't figure out how to write non-interleaved vertex buffers

Hi, as the title suggests, I've been trying to use non-interleaved vertex buffers in my current application (for various reasons). I've been trying to do a simple demo with just a Quad; the positions are read correctly in the shader, but the tex coords aren't (for debugging purposes, I am trying to draw the tex coords as the color). I just get a black quad.

Buffer Creation:

static std::array<float, 4 * 2> positions = {
    -0.5f, -0.5f,
     0.5f, -0.5f,
     0.5f,  0.5f,
    -0.5f,  0.5f,
};

static std::array<float, 4 * 2> texCoords = {
     0.0f, 0.0f,
     1.0f, 0.0f,
     1.0f, 1.0f,
     0.0f, 1.0f,
};

return {
    Buffer(
        positions,
        VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    ),

    Buffer(
        texCoords,
        VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    )
};

Binding Descriptions:

std::vector<VkVertexInputBindingDescription> bindingDescriptions(2);

// Positions
bindingDescriptions[0].binding = 0;
bindingDescriptions[0].stride = sizeof(glm::vec2);
bindingDescriptions[0].inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

// Tex Coords
bindingDescriptions[1].binding = 1;
bindingDescriptions[1].stride = sizeof(glm::vec2);
bindingDescriptions[1].inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

Attribute Descriptions:

static std::vector<VkVertexInputAttributeDescription> attributeDescriptions(2);

// Positions
attributeDescriptions[0].binding = 0;
attributeDescriptions[0].location = 0;
attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
attributeDescriptions[0].offset = 0;

// TexCoords
attributeDescriptions[1].binding = 1;
attributeDescriptions[1].location = 1;
attributeDescriptions[1].format = VK_FORMAT_R32G32_SFLOAT;
attributeDescriptions[1].offset = 0;

Yes, I'm actually setting them:

vertexStateInfo.vertexBindingDescriptionCount = (uint32_t)bindingDescriptions.size();
vertexStateInfo.pVertexBindingDescriptions = bindingDescriptions.data();

vertexStateInfo.vertexAttributeDescriptionCount = (uint32_t)attributeDescriptions.size();
vertexStateInfo.pVertexAttributeDescriptions = attributeDescriptions.data();

Record Function:

static std::array<VkDeviceSize, 2> offsets = { 0, 0 };
std::array<VkBuffer, 2> buffers = { s_Data.VertexBuffers[0], s_Data.VertexBuffers[1] };

vkCmdBindVertexBuffers(commandBuffer, 0, buffers.size(), buffers.data(), offsets.data());

Vertex Shader:

#version 450

layout(location = 0) in vec2 a_Position;
layout(location = 1) in vec2 a_TexCoords;

layout(location = 0) out vec2 v_TexCoords;

void main()
{
    gl_Position = vec4(a_Position, 0.0, 1.0);
}

Fragment Shader:

#version 450

layout(location = 0) in vec2 v_TexCoords;

layout(location = 0) out vec4 o_Color;

void main()
{
    o_Color = vec4(v_TexCoords, 0.0, 1.0);
}

Does anybody know what I'm doing wrong here?
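One detail worth checking in the vertex shader above: v_TexCoords is declared as an output but never written, so the fragment shader reads an undefined (often zero, hence black) value no matter how the buffers are bound. Writing the attribute through would look like:

```glsl
#version 450

layout(location = 0) in vec2 a_Position;
layout(location = 1) in vec2 a_TexCoords;

layout(location = 0) out vec2 v_TexCoords;

void main()
{
    v_TexCoords = a_TexCoords; // forward the attribute to the fragment stage
    gl_Position = vec4(a_Position, 0.0, 1.0);
}
```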

3 Comments
2024/05/09
10:42 UTC

0

New video tutorial: Command Buffers // Vulkan For Beginners #8

0 Comments
2024/05/08
19:49 UTC

1

[Beginner issue]: The application starts, but the window doesn't appear.

Hello, I just started learning Vulkan today. I'm using SDL3 as my windowing library since I have prior experience with it. I was following a tutorial, and essentially I'm trying to build this code: link. This is the first ever commit to the tutorial source code.

The application compiles fine, but the window doesn't show up. I tried reinstalling my Mesa drivers, but it didn't have any effect. I also tried to find alternative ways of using an SDL3 window, but most of them seem to use outdated features.

#include <SDL.h>
#include <SDL_vulkan.h>
#include <stdexcept>

#define VULKAN_HPP_ENABLE_DYNAMIC_LOADER_TOOL 0

#include <vulkan/vulkan_raii.hpp>

class Noncopyable
{
public:
  Noncopyable() = default;

  Noncopyable(const Noncopyable &) = delete;

  const Noncopyable &operator=(const Noncopyable &) = delete;
};

class SDLException : private std::runtime_error, Noncopyable
{
  const int code;

public:
  explicit SDLException(const char *message, const int code = 0) : runtime_error(message), code{code} {}

  [[nodiscard]] auto get_code() const noexcept
  {
    return code;
  }
};

class SDL : Noncopyable
{
public:
  explicit SDL(SDL_InitFlags init_flags)
  {
    if (const auto error_code = SDL_Init(init_flags))
      throw SDLException{SDL_GetError(), error_code};
  }

  ~SDL()
  {
    SDL_Quit();
  }
};

class VulkanLibrary : Noncopyable
{
public:
  explicit VulkanLibrary(const char *path = nullptr)
  {
    if (const auto error_code = SDL_Vulkan_LoadLibrary(path))
      throw SDLException{SDL_GetError(), error_code};
  }

  ~VulkanLibrary()
  {
    SDL_Vulkan_UnloadLibrary();
  }

#pragma clang diagnostic push
#pragma ide diagnostic ignored "readability-convert-member-functions-to-static"

  [[nodiscard]] auto get_instance_proc_addr() const
  {
    if (const auto get_instance_proc_addr = reinterpret_cast<PFN_vkGetInstanceProcAddr>(SDL_Vulkan_GetVkGetInstanceProcAddr()))
      return get_instance_proc_addr;
    else
      throw SDLException{"Couldn't load vkGetInstanceProcAddr function from the vulkan dynamic library"};
  }

  [[nodiscard]] auto get_instance_extensions() const
  {
    uint32_t count;
    if (!SDL_Vulkan_GetInstanceExtensions(&count, nullptr))
      throw SDLException{"Couldn't get vulkan instance extensions count"};
    std::vector<const char *> extensions(count);
    if (!SDL_Vulkan_GetInstanceExtensions(&count, extensions.data()))
      throw SDLException{"Couldn't get vulkan instance extensions"};
    return extensions;
  }

#pragma clang diagnostic pop
};

class Window : Noncopyable
{
  SDL_Window *handle;

public:
  Window(const char *title, const int width, const int height,
         const SDL_WindowFlags flags = static_cast<SDL_WindowFlags>(0)) : handle{
                                                                              SDL_CreateWindow(title, width, height, flags)}
  {
    if (!handle)
      throw SDLException{SDL_GetError()};
  }

  ~Window()
  {
    SDL_DestroyWindow(handle);
  }

  [[nodiscard]] auto create_surface(const vk::raii::Instance &instance) const
  {
    vk::SurfaceKHR::NativeType surface_handle;
    if (!SDL_Vulkan_CreateSurface(handle, *instance, &surface_handle))
      throw SDLException{SDL_GetError()};
    return vk::raii::SurfaceKHR{instance, surface_handle};
  }

  [[nodiscard]] auto get_handle() const noexcept
  {
    return handle;
  }
};

using QueueFamily = std::pair<vk::raii::PhysicalDevice, size_t>;

auto main(int argc, char **argv) -> int
{
  const SDL sdl{SDL_InitFlags::SDL_INIT_VIDEO};
  const VulkanLibrary vulkan_library{};

  vk::raii::Context context{vulkan_library.get_instance_proc_addr()};

  vk::ApplicationInfo application_info{};
  application_info.apiVersion = VK_API_VERSION_1_3;

  vk::InstanceCreateInfo create_info{};
  auto extensions = vulkan_library.get_instance_extensions();
  if (SDL_GetPlatform() == std::string_view{"macOS"})
  {
    create_info.flags |= vk::InstanceCreateFlagBits::eEnumeratePortabilityKHR;
    extensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
  }
  create_info.setPEnabledExtensionNames(extensions);
  create_info.pApplicationInfo = &application_info;
  const vk::raii::Instance instance{context, create_info};

  const Window window{"Salam", 800, 600, SDL_WindowFlags::SDL_WINDOW_VULKAN};
  const auto surface = window.create_surface(instance);

  std::optional<QueueFamily> queue_family{};
  for (const auto &physical_device : instance.enumeratePhysicalDevices())
  {
    const auto queue_families_properties = physical_device.getQueueFamilyProperties();
    for (std::size_t queue_family_index = 0;
         queue_family_index != queue_families_properties.size(); ++queue_family_index)
    {
      const auto queue_family_properties = queue_families_properties[queue_family_index];

      if (queue_family_properties.queueFlags & vk::QueueFlagBits::eGraphics)
        queue_family = {physical_device, queue_family_index};
    }
  }
  if (queue_family.has_value())
  {
    const auto &[physical_device, queue_family_index] = *queue_family;
    SDL_Log("Found queue family: %s %d", physical_device.getProperties().deviceName.data(), queue_family_index);
  }

  bool should_close{};
  while (!should_close)
  {
    for (SDL_Event event; SDL_PollEvent(&event);)
    {
      switch (event.type)
      {
      case SDL_EventType::SDL_EVENT_QUIT:
        should_close = true;
      }
    }
  }

  return 0;
};


2 Comments
2024/05/08
19:02 UTC

2

How can I learn Vulkan to be a graphics programmer

Hi! I am currently in the third year of my engineering degree. I have a good knowledge of C++, and thanks to an interest in machine learning I have a solid grasp of vectors and matrices. I now want to become a graphics engineer. There seem to be many APIs to get the job done, and Vulkan seems right to me. Do you know of any tutorials or books where I can learn Vulkan from the basics - and not just Vulkan but computer graphics as well - something like Frank Luna's book on DirectX but for Vulkan? I mostly prefer text content over videos.

4 Comments
2024/05/07
19:18 UTC

7

Need help with optimizing command buffer recording

Hi,

I have a problem with the recording time of my command buffers. I record about 10 secondary command buffers (5 ms each, on separate threads) and then execute them in the primary one. After that I end recording on the primary command buffer, and that call alone takes 10 ms. What can cause this behavior?

13 Comments
2024/05/07
13:48 UTC

3

Proper way to sample VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_422_UNORM_3PACK16 texture

I have set up a VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_422_UNORM_3PACK16 texture as best as I can based on the examples I've read; no validation errors/warnings.

The source I am copying from is little-endian; do I need to perform a << 6 shift as I copy into the GPU staging buffer (which in turn is copied to the image memory)?

Is this the proper sampling of a 10-bit 3-plane texture?

layout(binding = 3) uniform sampler2D samplerColor;
layout(location = 0) in vec2 inUV;
layout(location = 0) out vec4 outFragmentColor;
void main() {
    outFragmentColor = texture(samplerColor, inUV);
}
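On the << 6 question: the X6 formats keep each 10-bit sample in the top 10 bits of a 16-bit word, with the low 6 bits ignored. So if the source delivers its 10-bit values in the low bits of each word, a left shift by 6 is needed during the staging copy. A minimal sketch, with a hypothetical helper name (not from the post):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: repack 10-bit samples stored in the LOW bits of
// 16-bit little-endian words into the "X6" layout Vulkan expects,
// where the 10 significant bits occupy the TOP of each 16-bit word.
std::vector<uint16_t> packLowBits10ToX6(const std::vector<uint16_t>& src)
{
    std::vector<uint16_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<uint16_t>((src[i] & 0x03FFu) << 6);
    return dst;
}
```

If the source already stores samples in the high bits (P010-style buffers), no shift is needed; uploading one known pixel value and sampling it back is a quick sanity check.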
1 Comment
2024/05/07
04:54 UTC

5

How to create BC6H encoded KTX2 texture from Vulkan R32G32B32A32_SFLOAT image?

I generated a cubemap and a prefiltered map in R32G32B32A32_SFLOAT using Vulkan, but it's too large, so I want to compress it as BC6H_UFLOAT. I succeeded in writing the image data into a KTX2 container, but I have no idea how to do the texture compression.

5 Comments
2024/05/06
14:07 UTC

2

Making sense of VkSamplerYcbcrConversionImageFormatProperties.combinedImageSamplerDescriptorCount

Working with YCbCr extension. VkImage has 3 disjoint pieces of memory and a format of VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_422_UNORM_3PACK16

According to this, "an implementation can use 1, 2, or 3 descriptors for each combined image sampler used". In my case the value returned is 1, and I cannot wrap my mind around that: shouldn't a 3-plane texture with disjoint memory use more than one, or am I missing something in the way I query?

(By the way, the call returns VK_SUCCESS)

VkSamplerYcbcrConversionImageFormatProperties samplerYcbcrConversionImageFormatProperties{};
samplerYcbcrConversionImageFormatProperties.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_IMAGE_FORMAT_PROPERTIES;
samplerYcbcrConversionImageFormatProperties.pNext = nullptr;
samplerYcbcrConversionImageFormatProperties.combinedImageSamplerDescriptorCount = 0u;

VkImageFormatProperties imageFormatProperties{};

VkImageFormatProperties2 imageFormatProperties2{};
imageFormatProperties2.sType = VK_STRUCTURE_TYPE_IMAGE_FORMAT_PROPERTIES_2;
imageFormatProperties2.pNext = &samplerYcbcrConversionImageFormatProperties;
imageFormatProperties2.imageFormatProperties = imageFormatProperties;

VkPhysicalDeviceImageFormatInfo2 physicalDeviceImageFormatInfo2{};
physicalDeviceImageFormatInfo2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_FORMAT_INFO_2;
physicalDeviceImageFormatInfo2.format = VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_422_UNORM_3PACK16;
physicalDeviceImageFormatInfo2.usage = VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
physicalDeviceImageFormatInfo2.flags = VK_IMAGE_CREATE_DISJOINT_BIT;
auto result = vkGetPhysicalDeviceImageFormatProperties2(mPhysicalDevice, &physicalDeviceImageFormatInfo2, &imageFormatProperties2);

std::wcout << "combinedImageSamplerDescriptorCount = " << samplerYcbcrConversionImageFormatProperties.combinedImageSamplerDescriptorCount << std::endl;
1 Comment
2024/05/06
14:02 UTC

5

Handling device limits when using dynamic uniform buffers

Hello, I have been reading up on dynamic uniform buffers and the example code by SaschaWillems - https://github.com/SaschaWillems/Vulkan/blob/master/examples/dynamicuniformbuffer

I understood the concept when it comes to making it work in an isolated example, however I can't wrap my head around how it would work in a larger system, for instance:

  1. Using it as per-object transform matrices and binding with a different offset between each draw call
  2. Using it as per-material instance properties for each material type, since each material instance has the same parameter types, just different values, so like in the per-object transforms, just bind a different offset per-material instance

The issue here is that different devices may have a different limit for UBO size, as well as maximum offset counts, as well as maximum descriptor counts.

So one way I imagine it would work (for transforms) is like this:

#define MAX_OBJECTS 1024  // compile-time size: uniform blocks cannot use runtime-sized arrays

layout(push_constant) uniform TransformIndex {
    int index;
} pc;

layout(set = 0, binding = 0) uniform TransformsBuffer {
    mat4 transform[MAX_OBJECTS];
} ubo;

Some combination of the TransformIndex push constant (selecting the right index into the 'transform' array, used as ubo.transform[pc.index]) and the dynamic offset into TransformsBuffer would then determine exactly which matrix ends up being used in the vertex shader.

With the materials it's the same, let's say I have a material that has the parameters SparkleIntensity and TextureDirection and material instances differ by just the values of these two parameters:

#define MAX_MATERIALS 256  // uniform-block arrays need a fixed size; std140 pads each float element to 16 bytes

layout(set = 0, binding = 0) uniform MaterialParameters {
    float SparkleIntensity[MAX_MATERIALS];
    vec4 TextureDirection[MAX_MATERIALS];  // vec4: GLSL has no HLSL-style float4
} uboMaterial;

Now, the problems are:

  1. I have to be sure I never exceed any device limits and if I do, if the maximum offset count and/or descriptor count and/or UBO sizes are not enough to fit my per-frame data, I would have to batch the data and update them from CPU to GPU mid-frame as many times as needed. This would kill my performance, as first I have to dispatch the command buffer to do work with the current data that is inside the UBOs and only after it finishes, I have to fill them with the next batch of data. All this during one frame. Wouldn't this just completely kill the performance advantages of using UBOs over just a big storage buffer?
  2. How do I even know if there is a performance benefit to using dynamic uniform buffers for material instances? I am assuming the benefit would come from the fact that a dynamic uniform buffer is contiguous in memory because it's just several buffers one after another. However, can't the descriptor pool also allocate sets in contiguous memory if I allocate multiple at once? Is it really such a performance penalty to use a separate uniform buffer for each material instance for parameters?

The complexity of the renderer skyrockets when switching to this from previously just using a big storage buffer for transforms and regular descriptors for the materials.

Does anyone have experience with this? I looked online, but there weren't any large examples of a robust system using dynamic uniform buffers, just small samples that show how to create and bind one.
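One concrete piece from the SaschaWillems example worth writing down: each element in a dynamic uniform buffer must start at a multiple of the device's minUniformBufferOffsetAlignment, so the per-element stride is the element size rounded up to that alignment (a power of two). A sketch of the arithmetic:

```cpp
#include <cstddef>

// Round a per-element size up to the device's
// minUniformBufferOffsetAlignment (always a power of two), giving the
// stride between consecutive elements in a dynamic uniform buffer.
// The dynamic offset for element i is then i * stride.
std::size_t dynamicUboStride(std::size_t elementSize, std::size_t minUboAlignment)
{
    if (minUboAlignment == 0)
        return elementSize;
    return (elementSize + minUboAlignment - 1) & ~(minUboAlignment - 1);
}
```

For example, a 64-byte mat4 with a 256-byte alignment yields a 256-byte stride, and maxUniformBufferRange / stride then bounds how many elements one buffer (and thus one descriptor) can address, which is exactly the device-limit question raised above.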

9 Comments
2024/05/06
02:09 UTC

1

Vulkan based libraries?

Are there any C++ libraries built on top of Vulkan? For example, raylib is a decent library (slightly opinionated as a result, but still low level) built on top of OpenGL. I'm going to dive into raylib for now to try to make some stuff, but in the future it would be cool to have a Vulkan option as well (in case the Vulkan overlords fully get rid of OpenGL in, say, 50 years).

4 Comments
2024/05/05
23:58 UTC

1

Why does this code make my program only render clockwise faces even when set to counter-clockwise mode? And what is the purpose of multiplying this element of this matrix?

This code closely resembles the Vulkan tutorial code:

void updateUniformBuffer(uint32_t currentImage)
{
    UniformBufferObject ubo{};

    ubo.model = glm::mat4(1.0f);
    ubo.view = glm::lookAt(camera.position, camera.position + camera.front, camera.top);
    ubo.proj = glm::perspective(glm::radians(camera.fov), (*swapChainExtentGlobalAccessCopy).width / (float)(*swapChainExtentGlobalAccessCopy).height, 0.1f, 100.0f);
    ubo.proj[1][1] *= -1;

    memcpy(uniformBuffersMapped[currentImage], &ubo, sizeof(ubo));
}

And I am trying to understand why my counter-clockwise faces do not render when rasterizer.frontFace is set to VK_FRONT_FACE_COUNTER_CLOCKWISE. Edit: you can ignore the second question in the post title; reddit won't let me remove it for some reason.
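For reference (assuming the standard GLM + Vulkan setup from the tutorial): negating proj[1][1] flips the framebuffer Y axis, which mirrors the geometry and therefore reverses its apparent winding, so triangles emitted counter-clockwise arrive clockwise at the rasterizer. An alternative that avoids the sign flip is a negative-height viewport (core behavior since Vulkan 1.1, originally VK_KHR_maintenance1). A minimal sketch using a local struct that mirrors VkViewport's fields rather than the real Vulkan type:

```cpp
// Local stand-in mirroring VkViewport's fields; in real code fill a
// VkViewport from <vulkan/vulkan.h> the same way.
struct Viewport { float x, y, width, height, minDepth, maxDepth; };

// Build a Y-flipped viewport: the origin moves to the bottom edge and
// the height is negated, so no proj[1][1] *= -1 is needed and
// front-face winding keeps its expected meaning.
Viewport makeFlippedViewport(float fbWidth, float fbHeight)
{
    return Viewport{0.0f, fbHeight, fbWidth, -fbHeight, 0.0f, 1.0f};
}
```

With this approach, either keep VK_FRONT_FACE_COUNTER_CLOCKWISE and drop the projection-matrix negation, or keep the negation and swap the front-face setting; doing both cancels out.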

3 Comments
2024/05/05
20:06 UTC

3

Weird vkDestroyInstance crash

I've found a very weird vkDestroyInstance crash.

I have a Windows laptop with an Intel integrated GPU and an Nvidia discrete one. If I disable Optimus and run NVidia GPU only - no crash. But if both are enabled vkDestroyInstance occasionally crashes. Maybe one out of ten times I run my hello triangle app. Prior to the crash everything works as expected. This happens in both Debug and Release configurations. Attaching the debugger in Debug (but not in Release) config seems to prevent the crash.

I tried running with Visual Studio ASAN and it says this at vkDestroyInstance:

==5408==Failed to commit shadow memory at '0x0466d33f7d58'. VirtualAlloc failed with 0x1e7

Though occasionally the app still crashes without any output. If I comment out vkDestroyInstance, I get no complaints from the ASAN.

What could that be? A driver bug?

Edit: crash disappears if I select the integrated GPU instead of the discrete one. ASAN still complains though.

Edit 2: crash disappears if I set the preference for the app to run on discrete graphics in Nvidia control panel. Doesn't matter which device I actually select.

17 Comments
2024/05/05
19:49 UTC

10

Any tutorial for designing a generic Vulkan-based rendering engine?

Hello everyone.

I have spent some time learning Vulkan and have developed some small projects with it (e.g. a PBR renderer and a parallel computing application), and I have started to wonder how generic rendering engines are built on top of Vulkan.

When I am developing small projects, I know how many descriptor sets there should be and how to bind them. I know whether my VkImages are used as sampled textures or storage images, and how to do layout transitions and queue ownership transfers on them accordingly. I also know how many graphics/compute command buffers I need and their synchronization order. So I hardcode all of this into my applications. But as the programs grow larger, the code becomes messy and difficult to read.

So I tried to design an "Engine" class for each of my projects, to decouple all the Vulkan-related code from the other modules. But each "Engine" class is specialized for its project: if I want to reuse one engine in another project, I have to heavily modify its API and underlying code.

I want to design a high-level but also generic engine based on Vulkan. "High-level" means users need only a few lines to load scenes, render PBR materials, apply IBL, shadow mapping, etc., without calling Vulkan's low-level API. "Generic" means users can extend its functionality, like adding compute kernels and synchronizing them with the engine's built-in graphics rendering shaders.

However, I found the second point very difficult to implement. The engine knows nothing about which descriptor sets, images, buffers, image layouts, command buffers, or semaphores are needed, so unavoidably the users end up calling Vulkan's API themselves, which conflicts with the idea of being "high-level".

Do you have any recommendations for tutorials or code repositories that I can learn from, or for how to design such an engine?

5 Comments
2024/05/05
16:20 UTC

3

Is it possible to skip a fragment shader?

Right now I have configured my pipeline to only take in a vertex buffer, and I am drawing into an image with only a color attachment, no depth/stencil attachments (the color attachment was mandatory, I think, otherwise my validation layers complained). But I am only seeing a black screen. I set the color of the vertices in the shader, so that's not the problem. Now I am kind of stuck; where could this be going wrong?

Thanks for the help in advance

17 Comments
2024/05/05
13:45 UTC

1

How to copy VkBuffer's to a VkImage with disjoint memory

I am using the YCbCr extension.

It is advantageous for me to keep the data in separate planes because of the processing to be done on Y data alone.

Incoming data is interleaved YCbCr in one buffer.

I memcpy() that data into three VkBuffers, but I am not sure how to transfer them to the 3-plane VkImage, which has 3 disjoint pieces of memory.

Is setting the imageSubresource.aspectMask = VK_IMAGE_ASPECT_PLANE_{0/1/2}_BIT; field of VkBufferImageCopy enough for this?
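Setting VK_IMAGE_ASPECT_PLANE_{0/1/2}_BIT per VkBufferImageCopy region is indeed the intended mechanism (one vkCmdCopyBufferToImage call per source buffer, since each call takes a single srcBuffer). The detail that is easy to get wrong is the per-plane extent: for a 4:2:2 three-plane format the chroma planes are half width but full height. A sketch of that arithmetic, with hypothetical names:

```cpp
#include <cstdint>

// Per-plane copy extent for a 3-plane 4:2:2 image such as
// VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_422_UNORM_3PACK16:
// plane 0 (Y/G) is full size; planes 1 and 2 (Cb/Cr) are
// half width, full height.
struct PlaneExtent { uint32_t width, height; };

PlaneExtent planeExtent422(uint32_t imageWidth, uint32_t imageHeight, int plane)
{
    if (plane == 0)
        return PlaneExtent{imageWidth, imageHeight};
    return PlaneExtent{imageWidth / 2, imageHeight};
}
```

Each VkBufferImageCopy would then use planeExtent422(w, h, n) as its imageExtent alongside the matching VK_IMAGE_ASPECT_PLANE_n_BIT; the image itself must have been created with VK_IMAGE_CREATE_DISJOINT_BIT for the three separately bound memory pieces.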

1 Comment
2024/05/05
11:02 UTC

0

validation layers requested, but not available!

Recently I switched from C to C++ so I could follow along with this tutorial and create a game from scratch with Vulkan; it had all the low-level components that made me love C and was written in C. I would have written this whole thing in C if the tutorial didn't use a bunch of weird C++ syntax (every for loop gets turned into an unreadable "auto& randomBS : wtf" mess).

Currently I'm writing in VS Code and running Linux Mint XFCE, and I am at this point in the tutorial. I have a working makefile and all the proper libraries and the SDK installed, but when I run the program I get the error "validation layers requested, but not available!"

After using vkconfig, I think I've tracked down the source of the issue. It shows that a layer called VK_LAYER_KHRONOS_validation is missing, but when I check my lib files, the layer file is there. I thought maybe it was supposed to be part of my system environment and that was causing the issue, but checking my environment files, the file was still there.

If anyone smarter than me knows how to fix this, it would be greatly appreciated

5 Comments
2024/05/05
01:35 UTC

1

VK_ERROR_INITIALIZATION_FAILED with GLFW on Wayland

When I try to create a window surface with GLFW on Wayland I always get VK_ERROR_INITIALIZATION_FAILED. I noticed that GLFW does not include the `VK_KHR_wayland_surface` extension in its required extensions, but even when I add it myself the code fails.

Do I have to tell GLFW that I'm on Wayland somehow?

Here's my code: https://github.com/darman96/CardGame

Edit: The VulkanSDK is supposed to be in the Dependencies folder but I did not commit it because of file size.

6 Comments
2024/05/04
18:21 UTC

3

What is the maximum amount of threads you can use for a workgroup?

If I understand correctly, a workgroup is mapped to a warp and Nvidia GPUs have thousands of warps. So we can easily dispatch, for example, 16 by 16 workgroups for our compute shader. However, from what I've read, each warp only contains a maximum of 32 threads.

So, wouldn't the maximum amount of threads for each workgroup be something like (16,2,1) because 16x2x1 = 32 threads?
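A note on the premise: a workgroup is not limited to a single warp. The driver may execute one workgroup as several warps, and the relevant cap is VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations (at least 128 by the spec's required minimum, commonly 1024), so a local size like (16, 16, 1) = 256 invocations is normal. A sketch of the usual arithmetic:

```cpp
#include <cstdint>

// Total invocations in one workgroup: the product of the shader's
// local_size dimensions. This must stay <= the device's
// maxComputeWorkGroupInvocations limit, not <= the 32-thread warp size.
uint32_t workgroupInvocations(uint32_t lx, uint32_t ly, uint32_t lz)
{
    return lx * ly * lz;
}

// Number of workgroups to dispatch along one axis so that
// groups * localSize covers `extent` items (ceiling division).
uint32_t groupCount(uint32_t extent, uint32_t localSize)
{
    return (extent + localSize - 1) / localSize;
}
```

So for an 800x600 image with local_size (16, 16, 1), one would call vkCmdDispatch(groupCount(800, 16), groupCount(600, 16), 1).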

8 Comments
2024/05/04
16:44 UTC

8

What does the Vulkan API do that OpenGL and CUDA can't?

I'm wondering because I'm about to choose which one I'm going to use for visualizations that involve a high number of objects being super zoomed in and out of.

41 Comments
2024/05/04
05:09 UTC

7

Vulkan Distinct Compute Queue Family

I have implemented an image processing pipeline that synchronizes a render target image between a fragment shader and a compute shader, then back to the graphics pipeline, such that I have accounted for both cases: graphicsQueue == computeQueue and graphicsQueue != computeQueue. The former case works fine. However, the GeForce RTX 3080 in my dev laptop doesn't seem to offer any compute queue families that aren't also graphics families, so I am unable to test the latter (it does report several distinct transfer queue families, which I am using for async resource loading). How is one supposed to know whether hardware will be capable of this prior to purchase? Is there a public list of reported queue family combinations for consumer GPUs in the wild?

I am about to do the same thing with DX12, and I suppose I will find out whether it is a limitation of the hardware or of the driver. Is it safe to claim that GeForce devices <= RTX 3080 will report the same results, or could this differ between the laptop version of the 3080 and the full-sized card?

Conclusion:

My host does offer a compute family that isn't shared with graphics. I was choosing the queue family for compute incorrectly. See comments below.
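For anyone hitting the same thing, the usual selection logic prefers a family with the compute bit set and the graphics bit clear. A sketch over raw flag words, with constants mirroring VkQueueFlagBits rather than the real enum (hypothetical names):

```cpp
#include <cstdint>
#include <vector>

// Bit values mirroring VK_QUEUE_GRAPHICS_BIT / VK_QUEUE_COMPUTE_BIT.
constexpr uint32_t kGraphicsBit = 0x1;
constexpr uint32_t kComputeBit  = 0x2;

// Pick a compute-capable queue family index, preferring a dedicated
// one (compute without graphics, the "async compute" family); falls
// back to any compute-capable family, or -1 if none exists.
int pickComputeFamily(const std::vector<uint32_t>& familyFlags)
{
    int fallback = -1;
    for (int i = 0; i < static_cast<int>(familyFlags.size()); ++i)
    {
        if (!(familyFlags[i] & kComputeBit))
            continue;
        if (!(familyFlags[i] & kGraphicsBit))
            return i;                 // dedicated compute family
        if (fallback < 0)
            fallback = i;             // graphics+compute fallback
    }
    return fallback;
}
```

In real code the flag words come from vkGetPhysicalDeviceQueueFamilyProperties; the common mistake (as in the conclusion above) is returning the first family that merely has the compute bit, which on most GPUs is the combined graphics+compute family at index 0.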

5 Comments
2024/05/03
16:56 UTC

Back To Top