/r/GraphicsProgramming
A subreddit for everything related to the design and implementation of graphics rendering code.
Rule 1: Posts should be about Graphics Programming.
Rule 2: Be Civil, Professional, and Kind
Suggested Posting Material:
- Graphics API Tutorials
- Academic Papers
- Blog Posts
- Source Code Repositories
- Self Posts
(Ask Questions, Present Work)
- Books
- Renders
(Please xpost to /r/ComputerGraphics)
- Career Advice
- Jobs Postings (Graphics Programming only)
Related Websites:
ACM: SIGGRAPH
Journal of Computer Graphics Techniques
Ke-Sen Huang's Blog of Graphics Papers and Resources
Self Shadow's Blog of Graphics Resources
https://discourse.threejs.org/t/how-to-antialias-the-sdf-edge/73976 - three.js forum
https://jsfiddle.net/m6oe7c9f/26/ - demo using three.js/GLSL
I was trying to smooth the edge using fwidth and smoothstep for anti-aliasing. It obviously works for particles, but I just don't know how to get the distance to the edge of the SDF shape, in this case a sphere of radius 1. I found some blog posts about it, but I think it just comes down to storing a variable for the distance to the edge; then we can smooth or clamp the edges.
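A minimal CPU-side sketch of the "store the distance to the edge" idea, written in C++ rather than GLSL. It assumes a sphere-traced unit sphere at the origin and a unit-length ray direction, and it stands in for fwidth with a hypothetical `pixelAngle` parameter (approximate angular width of one pixel), so the footprint at distance `t` is taken as `pixelAngle * t`. The names `edgeCoverage`, `sphereSdf` and `pixelAngle` are illustrative and do not come from the linked demo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// GLSL-style smoothstep: 0 below edge0, 1 above edge1, Hermite blend between.
inline double smoothstep(double edge0, double edge1, double x) {
    const double t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0), 1.0);
    return t * t * (3.0 - 2.0 * t);
}

// Signed distance from p to a sphere of the given radius centred at the origin.
inline double sphereSdf(const Vec3& p, double radius) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - radius;
}

// Sphere-traces origin + t * dir and returns an approximate coverage in [0, 1].
// At every step the SDF value is compared with the pixel footprint at that
// distance; the largest coverage seen along the ray is kept, so rays that
// narrowly miss the silhouette still receive partial coverage.
inline double edgeCoverage(const Vec3& origin, const Vec3& dir, double pixelAngle) {
    assert(pixelAngle > 0.0);
    double t = 0.0;
    double coverage = 0.0;
    for (int step = 0; step < 128 && t < 100.0; ++step) {
        const Vec3 p{origin.x + t * dir.x, origin.y + t * dir.y, origin.z + t * dir.z};
        const double d = sphereSdf(p, 1.0);
        // Assumed footprint model; clamped to avoid a zero-width band at t = 0.
        const double footprint = std::max(pixelAngle * t, 1e-9);
        coverage = std::max(coverage, 1.0 - smoothstep(0.0, footprint, d));
        if (d < 1e-4) break;  // close enough to count as a hit
        t += d;               // standard sphere-tracing step
    }
    return coverage;
}
```

The returned value can then be used as alpha when blending the shape over the background. In a fragment shader the footprint would more likely come from fwidth or from the projection parameters than from a constant angle.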
I like graphics programming, but to be honest I'm more interested in the math part, and I'm working on building a math library for game development. I am looking for a graphics library (C language) to test my math and demonstrations. I was going to use graphics.h, but apparently I need to use C++ for that.
Thanks in advance for your suggestions.
Has anyone tried any libraries for 3D triangle mesh boolean operations? I'm more interested in robust, accurate results than performance.
I'm a CS student in my 3rd year and want to learn a graphics API (I already know a bit of the math and general graphics). Without being familiar with any of the APIs, is DX11 a good place to start, given that it might be easier than DX12 and the like? Also, I don't care about portability since I'm using Windows as my primary OS.
I have tried downloading and using NVIDIA's Falcor framework, but after building the Visual Studio solution and running the Mogwai project, an exception is thrown while creating the swapchain in Direct3D 12 and I get DXGI_ERROR_DEVICE_REMOVED. The error happens when trying to load igc1464.dll, which it loads, unloads, and tries to load again causing the error.
I have tried updating my drivers, adding graphics driver registry keys, running a Windows memory diagnostic check, reading the documentation, and checking forums for similar issues; none of which have helped.
I am running on a laptop with 32GB memory, RTX 4060 laptop GPU, and Windows 11.
Any help on how to fix this error and get Falcor to run would be appreciated, as I'd like to start using it for projects. Thank you!
It's very clear to me how Halton / Sobol and other low-discrepancy sequences can be used to generate camera samples, and what the drawback of clumping is when using pure random numbers.
However, the part that I'm failing to understand is how to use LDSs everywhere in a path tracer, including hemisphere sampling. Here's the thought that makes it confusing for me:
Imagine that on each iteration of a path tracer (using the word "iteration" instead of "sample" to avoid confusion) we have available inside our shader 100 "random" numbers, each generated from a 100-dimensional Halton sequence (thus using 100 prime numbers).
On the next iteration, I'm updating the random numbers to use the next index of the Halton sequence, for each of the 100 dimensions.
After we get our camera samples and ray direction using the numbers from the Halton array, we'll always land on a different point of the scene, sometimes even on totally different objects / materials. In that case, how does it make sense to keep on using the other Halton samples of the array? Aren't we supposed to "use" them to estimate the integral at a specific point? If the point always changes, and even worse, if at each light bounce we can get to a totally different mesh compared to the previous path-tracing iteration, how can I keep on using the "next" sample from the sequence? Doesn't that lead to a result that is potentially biased or that doesn't converge where it should?
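The setup described above can be sketched in C++ as follows. This is an illustration under stated assumptions, not any particular renderer's layout: the prime table size, `kDimsPerBounce` and the function `haltonSample` are hypothetical. The view it reflects is that one whole path is a single point of a high-dimensional sequence: the iteration number is the sequence index, and every random decision along the path (pixel jitter, each bounce direction) reads its own fixed dimension of that point.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Radical inverse of `index` in the given base: mirrors the base-b digits of
// the index around the radix point, producing a value in [0, 1).
inline double radicalInverse(std::uint64_t index, std::uint32_t base) {
    assert(base >= 2);
    double result = 0.0;
    double digitWeight = 1.0 / base;
    while (index > 0) {
        result += digitWeight * static_cast<double>(index % base);
        index /= base;
        digitWeight /= base;
    }
    return result;
}

// One prime per sample dimension (table size chosen only for illustration).
constexpr std::array<std::uint32_t, 8> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19};

// Assumed layout: every bounce owns a fixed block of dimensions, so bounce 0
// uses dimensions 0..1, bounce 1 uses dimensions 2..3, and so on.
constexpr std::uint32_t kDimsPerBounce = 2;

// Returns the k-th number for `bounce` in path-tracing iteration `iteration`.
// Throws std::out_of_range when the path needs more dimensions than the table has.
inline double haltonSample(std::uint64_t iteration, std::uint32_t bounce, std::uint32_t k) {
    const std::uint32_t dimension = bounce * kDimsPerBounce + k;
    return radicalInverse(iteration, kPrimes.at(dimension));
}
```

With this layout, `haltonSample(i, 0, 0)` and `haltonSample(i, 0, 1)` would drive the camera sample of iteration `i`, while `haltonSample(i, 1, 0)` and `haltonSample(i, 1, 1)` would drive the first hemisphere sample, regardless of which surface the path happened to hit.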
I have spent a bit thinking about the problem of meshing topological skeletons and I came up with a solution I kinda like.
So I am sharing here in case other people are interested: https://gitlab.com/dryad1/documentation/-/blob/master/src/math_blog/Parametric%20Polytopology/parametric_polytopology.pdf?ref_type=heads
As I've been studying basic DSA and discrete mathematics, I have felt a bit listless despite trying to recognize the overall importance of these concepts. I wanted to pursue computer graphics programming since teaching a computer to process space, vertices, form, light, movement etc. felt more interesting and comprehensible than systems of search engines and user data in websites and apps. It's hard to understand why all these algorithms exist and to relate the topics to computer graphics. For programming/computer science beginners, what are important topics to know for computer graphics?
So I've had this idea regarding a heatmap that records the size of triangles in a mesh's single vertex channel.
I've been looking into the VRAM cost of LODs (higher density), but I'm not a fan of recent cluster implementations (I might look into a very conservative streaming plan). So, in order to take advantage of faster hardware quad rendering, I want to stop the view samples from sampling small triangles.
Basically, the distance of the camera multiplies a sinking effect on small triangles (vertices under a threshold) and the closure intensity of neighboring vertices (larger triangles end up occluding the smaller tris).
Up to 12m tris could be processed, but I'm aware that some stages in the HW pipeline such as GS are slow, and whatever HW stage Unreal's WPO uses also had large documented overhead (I haven't done serious performance measurements).
Target hardware would be 20 series+, RDNA2+, and Arc GPUs (in terms of HW support, which are all pretty synced outside of MSAA support, I've heard).
A point in the right direction would be helpful; I'm just asking in all the GP spaces I can reference 👍
Thanks.
Last week I released on github a new single-header file library for constructing and traversing BVHs on the CPU, with (short-term) plans to take traversal (not construction) to the GPU as well. Link:
https://github.com/jbikker/tinybvh
Features (version 0.4.2):
Coming up:
The code is an implementation and continuation of my articles on BVH construction:
https://jacco.ompf2.com/2022/04/13/how-to-build-a-bvh-part-1-basics
Support / questions:
Greets
Jacco.
Hi everyone,
I am part of a university project where I need to develop an app. My team has chosen Python as the programming language. The app will feature a 3D map, and when you click on an institutional building, the app will display details about that building.
I want the app to look very polished, and I’m particularly focused on rendering the 3D map, which I have exported as an .OBJ file from Blender. The file represents a real-life neighborhood.
However, the file is quite large, and libraries like PyOpenGL, Kivy, or PyGame don’t seem to handle the rendering effectively.
Can anyone suggest a way to render this large .OBJ file in Python?
I've been told colleges like UPenn (due to their DMD program) and Carnegie Mellon are great for graphics because they have designated programs geared towards CS students seeking to pursue graphics. Are there any particular colleges that stand out to employers, or should one just apply to the top 20s and hope for the best?
So I am new to graphics programming. I have worked with OpenGL and made renderers and stuff before, and wanted to jump into more recent graphics APIs. I thought of starting with DX12 but have seen lots of posts saying to start with DX11. Any thoughts?
For spectral rendering, we rely on CIE curves which contain measured Spectral Power Distribution functions (SPD) in order to accurately model color and eventually convert spectral information back into sRGB for our displays to see.
Examples of these curves from CIE's official dataset are linked below:
The part I'm having a hard time wrapping my head around is the scale of the values. The standard illuminants are scaled such that they take on a value of 100.0 at 560nm. The XYZ color matching curves seem to be scaled with respect to Y(555), which is itself relative to the spectral response curve.
If I were to use the curve for the standard illuminant and convert it into XYZ colors (for example), then wouldn't the scales of the inner product all be screwed up? Do raytracing engines do something special to rescale these curves from the official datasets or does it not matter?
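For reference, the usual colorimetric convention handles this with a normalization constant, sketched below in LaTeX. Here $S(\lambda)$ is the illuminant SPD and $\bar{x}, \bar{y}, \bar{z}$ are the color matching functions; the choice of 100 as the target luminance follows the CIE convention, and some renderers use 1 instead (that variant is an assumption about individual engines, not something stated in the post).

```latex
\begin{aligned}
X &= k \int S(\lambda)\,\bar{x}(\lambda)\,d\lambda, &
Y &= k \int S(\lambda)\,\bar{y}(\lambda)\,d\lambda, &
Z &= k \int S(\lambda)\,\bar{z}(\lambda)\,d\lambda, \\
k &= \frac{100}{\int S(\lambda)\,\bar{y}(\lambda)\,d\lambda}
\end{aligned}
```

Because $k$ divides by the same integral that defines $Y$, multiplying $S(\lambda)$ by any constant leaves $X, Y, Z$ unchanged, so under this convention the absolute scale of the tabulated illuminant (100.0 at 560nm) drops out and only its relative shape matters.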
One thing that kind of confuses me: Shader Model is a DirectX-only thing, correct?
In other words, requiring SM5 or SM6 support means nothing to programs using Vulkan, OpenGL, GCN or Metal, correct?
When googling or using ChatGPT this seems to be mixed up constantly....
Hey!
Two months ago I asked for advice on porting a program from VEX (Python-like) to C++. Well, time has passed, as it tends to do, and I've got results to show.
There is obviously a lot going on, and to cover it all we would need something like a 50-page paper. We managed to port the entire VEX code to C++, but also improved certain aspects massively. Here is a quick and non-exhaustive rundown of the changes and improvements:
Perhaps the most important change is not in the code, but philosophical. The VEX code had no real objective. Me and Mr. Norway just kinda stumbled along.
VMEC has an objective. We want to make a free Black Hole rendering and education software that could, in principle, be used for Movie grade effects.
The Education bit is not important for this post, it basically boils down to a few options (such as replacing the Volumetric disk with a 2D one, visualizing Geodesics in the scene etc). Those are not hard to do.
What is hard to do is the "Movie grade" bit. Sure, the render above looks very nice, but it is a lot more technically impressive than visually. Then the question becomes what we can do to improve the look. We have two big ticket items on our to do list right now.
That last point carries a lot of hope on our end. Right now VMEC is a "0th Scattering" renderer. The only light a ray sees is that along its direct path. There are no secondary rays because there are no light sources to do Single Scattering with.
We hope Multiple Scattering will improve the volumetrics to the point where they become useful in a production environment. The reason we have avoided Multiple Scattering thus far is the performance cost. But trial GPU ports have given us reasonable confidence in the render time feasibility of a "Multiple Scattering" option for VMEC.
Ofc, there are non-visual features we want to implement as well
among others. We will probably not add .obj support or anything similar because that would conflict with some very fundamental assumptions we have made. VMEC is built in natural units where c=G=M=1. So the Black Hole is actually just 1.4 units across. The disk is 120 units in radius and the jet is 512 units long.
Anyways, the whole point of this post is to ask for advice.
Right now, while VMEC's renders look nice, they are very clearly CGI. Judging by other volumetric renderers, we think the main reason is the lack of Multiple Scattering. But we might be missing something. So any advice on how to improve the look would be highly appreciated!
CUDA/HIP kernels can be compiled at runtime with the NVRTC and HIPRTC APIs (NVIDIA and AMD respectively).
In my experience, starting multiple std::thread instances to compile multiple kernels in parallel just doesn't seem to work: launching 2 std::thread instances in parallel doesn't take less time than compiling two kernels in a row on the main thread.
The 'lock' seems to be deep in the API DLLs, as that's where the thread is stuck when breaking into the debugger.
Why is it like that? If a compiler "simply" parses the kernel code to "translate" it to bitcode/PTX/..., then why does it have to be synchronized like that?
I work as a full-time Flutter developer, and have intermediate programming skills. I’m interested in trying my hand at low-level game programming and writing everything from scratch. Recently, I started implementing a ray-caster based on a tutorial, choosing to use raylib with C++ (while the tutorial uses pure C with OpenGL).
Given that I’m on macOS (but could switch to Windows in the future if needed), what API would you recommend I use? I’d like something that aligns with modern trends, so if I really enjoy this and decide to pursue a career in the field, I’ll have relevant experience that could help me land a job.
Hi everyone,
I'm working on a Vulkan-based TLAS (Top-Level Acceleration Structure) build, and after adding copy commands to the instance buffer, my application crashes with VkResult -4 (device lost) once the command vkCmdBuildAccelerationStructuresKHR is recorded and submitted with the validation error:
validation layer: Validation Error: [ VUID-vkDestroyFence-fence-01120 ] Object 0: handle = 0xb8de340000002988, type = VK_OBJECT_TYPE_FENCE; | MessageID = 0x5d296248 | vkDestroyFence(): fence (VkFence 0xb8de340000002988[]) is in use. The Vulkan spec states: All queue submission commands that refer to fence must have completed execution (https://vulkan.lunarg.com/doc/view/1.3.275.0/windows/1.3-extensions/vkspec.html#VUID-vkDestroyFence-fence-01120)
The fence error is a result of the program hanging there due to something in the TLAS that is not correct, though I am struggling to understand what exactly. I followed the basic Vulkan example on their GitHub closely and can't find enough of a difference between theirs and mine to cause a crash like this.
Here’s the part of the code where I do the copy to the instance buffer. It seems correct to me: Full code
auto instancesBuffer = new Buffer(V::CreateBuffer(sizeof(VkAccelerationStructureInstanceKHR) * instances.size(), VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_STORAGE_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT, VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE));
std::vector<VkAccelerationStructureInstanceKHR> instances;
for (size_t i = 0; i < 1; ++i) {
    AS& blas = allBlas[i];
    VkAccelerationStructureInstanceKHR instance = {};
    ...
    instance.accelerationStructureReference = blas.deviceAddress;
    instances.push_back(instance);
}
auto stagingBuffer = new Buffer(V::CreateBuffer(context.allocator, sizeof(VkAccelerationStructureInstanceKHR) * instances.size(),VK_BUFFER_USAGE_TRANSFER_SRC_BIT,VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT,VMA_MEMORY_USAGE_AUTO_PREFER_HOST));
void* mappedData;
vmaMapMemory(context.allocator.allocator, stagingBuffer->allocation, &mappedData);
memcpy(mappedData, instances.data(), sizeof(VkAccelerationStructureInstanceKHR) * instances.size());
vmaUnmapMemory(context.allocator.allocator, stagingBuffer->allocation);
VkBufferCopy copyRegion = {};
copyRegion.size = sizeof(VkAccelerationStructureInstanceKHR) * instances.size();
vkCmdCopyBuffer(cmdBuff, stagingBuffer->buffer, instancesBuffer->buffer, 1, &copyRegion);
VkBufferMemoryBarrier bufferBarrier{ VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER };
bufferBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
bufferBarrier.dstAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR | VK_ACCESS_SHADER_READ_BIT;
bufferBarrier.buffer = instancesBuffer->buffer;
bufferBarrier.size = VK_WHOLE_SIZE;
bufferBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
bufferBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
// Copy data from CPU staging buffer to GPU
vkCmdPipelineBarrier(cmdBuff, VK_PIPELINE_STAGE_TRANSFER_BIT | VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR, 0, 0, nullptr, 1, &bufferBarrier, 0, nullptr);
EndAndSubmitCommandBuffer(context, cmdBuff);
The error occurs at the last line below, where I end and submit the command buffer:
VkCommandBuffer buildCmd = AllocateCommandBuffer(context, m_renderCommandPools[V::currentFrame].handle);
BeginCommandBuffer(buildCmd);
vkCmdBuildAccelerationStructuresKHR(
buildCmd,
1,
&accelerationBuildGeometryInfo,
accelerationBuildStructureRangeInfos.data());
EndAndSubmitCommandBuffer(context, buildCmd);
Aftermath report which I do not understand
u/BoyBaykiller experimented a bit on the Sponza scene (can be found here) with the wavefront approach vs. the megakernel approach:
| Method | Ray early-exit | Time |
|------------|---------------:|--------:|
| Wavefront | Yes | 8.74ms |
| Megakernel | Yes | 14.0ms |
| Wavefront | No | 19.54ms |
| Megakernel | No | 102.9ms |
Ray early-exit "No" means that there is a ceiling on top of Sponza and no Russian roulette: all rays bounce exactly 7 times, wavefront or not.
With 7 bounces, the wavefront approach is about 5x faster (102.9ms vs. 19.54ms) but:
Where does the speedup come from?
I'm having some issues combining the lobes of my layered BSDF in an energy preserving way.
The sheen lobe alone (with white lambertian diffuse below instead of glass lobe) passes the furnace test. The glass lobe alone passes the furnace test.
But sheen on top of glass doesn't pass it at all: there's quite a lot of energy gain. So if the lobes are fine on their own, it must be a combination issue.
How I currently do things:
For sampling a lobe:
PDF:
- 0.5f * sheenPDF + 0.5f * glassPDF (comes from the 50/50 probability in the sampling routine)
- 1.0f * glassPDF, because the sheen BRDF does not deal with directions below the normal hemisphere, so the sheen BRDF has zero probability of sampling such a direction.
Evaluating the layered BSDF:
- sheen_eval() + (1.0f - sheen_reflectance) * glass_eval().
- glass_eval() (because we would be evaluating the sheen lobe with an incident light direction that is below the normal hemisphere, so the sheen BRDF would be 0.0f)
And with a glass sphere with 0.0f roughness and IOR 1, coming from air at IOR 1, this gives this screenshot.
Any ideas what I might be doing wrong?
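For comparison with the PDF described above, here is a minimal C++ sketch of the density of a direction under one-sample lobe selection. It assumes the lobe is picked first with a fixed probability (0.5 in the post) and the direction is then drawn from that lobe alone; `mixturePdf` and `throughputWeight` are hypothetical names, and the cosine term is assumed to be folded into `bsdfValue`.

```cpp
#include <cassert>

// Density of a direction when the sheen lobe is chosen with probability
// sheenSelectProb and the glass lobe otherwise. A lobe that cannot generate
// the direction contributes a density of 0, but the selection probability of
// the other lobe still scales its term.
inline float mixturePdf(float sheenSelectProb, float sheenPdf, float glassPdf) {
    assert(sheenSelectProb >= 0.0f && sheenSelectProb <= 1.0f);
    return sheenSelectProb * sheenPdf + (1.0f - sheenSelectProb) * glassPdf;
}

// Monte Carlo weight f / pdf, guarding against directions that cannot be sampled.
inline float throughputWeight(float bsdfValue, float pdf) {
    return pdf > 0.0f ? bsdfValue / pdf : 0.0f;
}
```

Under that assumption, a direction below the normal hemisphere (sheenPdf of 0) has density 0.5f * glassPDF, not 1.0f * glassPDF, for a 50/50 split. Whether this is related to the furnace-test failure is not established here; it is only one place where the described scheme departs from a plain mixture.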
Hi, let's say I have a project with shaders, calls to a graphics API, or GPGPU functions. Are there any downsides to writing unit tests for that part of the code?
For example, I want to test how a CUDA kernel behaves. Do you think it's a good idea to create a unit test with the whole buffer allocation, memcpy, kernel execution, memcpy, result check, and buffer destruction?
Or I want to test the output of a shader, etc.
It does slow down the tests a bit, but I don't see that as an issue... What do you guys think?