/r/gpgpu
A subreddit for GPGPU applications, implementations, methods, and code.
If you're new to GPGPU programming, and don't know where to begin, check out /r/cuda101.
For a series of video lectures on OpenCL 1.2, check out the Professional OpenCL Training Series.
Hey, I want to do a graph-like visualization using a particle-simulation approach, backed by a GPU-accelerated database. Not sure which tools might enable this. It would feed into a React-wrapped WebGL front end. Thanks!
TLDR: I need guidance for which framework to choose in 2024 (the most promising and vendor agnostic). Most posts related to that in this sub are at least 1 year old. Has something changed since then?
Hi guys, I'm a software engineer interested in HPC, and I am completely lost trying to get back into GPGPU. I worked on a research project back in 2017/2018 and went for OpenCL, as it was very appealing: a cross-platform, non-vendor-specific framework that could run on almost everything. And yeah, it had good open-source support, especially from AMD. It sounded promising to me.
I was really excited about newer OpenCL releases, but I moved to other projects in which GPGPU wasn't applicable and lost track of the framework's evolution. Now I'm planning to develop some personal projects and dive deep into GPGPU again, but the ecosystem seems to be screwed up.
OpenCL seems to be dying. No vendor currently supports versions newer than the ones they were already supporting in 2017! I researched a bit about SYCL (bought the Data Parallel C++ with SYCL book), but again, there is no wide support, nor are there many projects using SYCL. It also looks like an Intel thing. Vulkan is great, and I might be wrong, but it doesn't seem suitable for what I want (coding generic algorithms and running them on a GPU), even though it is certainly cross-platform and open.
It seems now that the only way is to choose a vendor and go for Metal (Apple), CUDA (NVIDIA), HIP (AMD) or SYCL (Intel). So I am basically going to have to write a different backend for every one of those, if I want to be vendor agnostic.
Is there a framework I might be missing? Where would you start in 2024? (considering you are aiming to write code that can run fast on any GPU)
I have 10 TB of text data in AWS S3 and want to train an LLM on it. To save on GPU costs, I want to use CoreWeave, LambdaLabs or similar (i.e. not AWS's GPU offerings). Is there a way to transfer that 10 TB of data from AWS S3 to CoreWeave / LambdaLabs / etc. without incurring the AWS S3 egress cost?
People who use CoreWeave / LambdaLabs / etc. for training, where are you storing your data for CPU-based preprocessing etc. ?
Hello everyone!
I have been struggling for months with a problem: an algorithm of mine has performance issues because of a LOT of global memory writes! I would like to know if there is a specific place where I can ask for opinions on my kernel code. I assume that is not allowed here?
Thanks!
Hey Reddit,
Looking through GPU options for A100 instances, I'm amazed at how much more the hyperscalers charge for GPUs than providers like CoreWeave, Lambda, Fluidstack, etc.
Can someone explain why businesses use hyperscaler GPUs instead of some of the other options on the market? Is it just availability?
Does anyone know how TornadoVM (https://www.tornadovm.org/) compares to other options like oneAPI or Kokkos?
I've been primarily programming in Java for 25 years, but I'm wondering if I should switch back to C++ for GPGPU development.
Would it be possible to make transcoding of newer video formats more efficient by also using a system's GPU instead of relying on the CPU alone?
Let's say I have a somewhat old machine with a GPU that doesn't support hardware-based AV1 encoding but still supports OpenCL and/or CUDA. Could there be a performance gain from implementing some components of the encoding process as a GPGPU program?
What is the easiest way to start programming a Radeon Pro VII in C++ on Windows?
In case somebody can make use of some background and has a couple of minutes to read about it:
I'm a mechanical engineer with some interest in programming and simulation. A few years ago I decided to give GPGPU a try using a consumer graphics card from NVIDIA (probably a GTX 970 at that point) and CUDA. I chose CUDA over OpenCL, the main alternative at that point, because CUDA was theoretically easier to learn, or at least was supported by many more learning resources.
After a few weeks I achieved what I wanted (running mechanical simulations on the card) using C++ in Visual Studio. It didn't offer a great advantage over the CPU, partly because consumer cards are heavily capped in double-precision math, but I was happy that I had managed to run those simulations on the GPU.
The idea of trying other cards with more FP64 power has stayed in the back of my mind since then, but such cards are too expensive to justify for a hobbyist. The Radeon VII seemed to be a great option, but they mostly sold out before I decided to purchase one. Then, in the last few weeks, the "PRO" version of the card, which I hadn't heard of, dropped heavily in price, and I was able to grab a new one for less than 350€, with its 1:2 FP64 ratio and slightly above 6 TFLOPS (against 0.1 for the 970).
As CUDA is out of the question with an AMD card, I've spent quite a few hours over the last couple of days just trying to understand what programming environment I should use with the card. In the beginning I was just trying to find the best way to use OpenCL with Visual Studio, plus a few examples. But the picture I've discovered seems to be much more complex than I expected.
OpenCL appears to be regarded by many as dead, and they advise against investing any time in learning it from scratch at this point. In addition, I have discovered some terms which were completely unknown to me: HIP, SYCL, DPC++ and oneAPI, which sometimes seem to be combined in ways I haven't grasped yet (e.g. hipSYCL and others). At some point in my research oneAPI seemed like it could be the way to go, as there was some support for AMD cards (albeit in beta), until halfway through installing the required packages I discovered that AMD support was only offered on Linux, which I have no relevant experience with.
So I'm quite lost and struggling to form a picture of what all those options mean and which would be the best way to start running some math on the Radeon. I would be very thankful to anyone willing to shed some light on the topic.
I worked as a 3D generalist from the age of 18, and now I am a 2nd-year software engineering student (22 years old); I switched my career interest from graphics artist to software engineer. I have been lost for some time, wondering what I really want to work on a few years into my degree. I don't want to do websites, apps or any mainstream development. I work with C/C++ and have been learning Qt development. I did a lot of research and found that my interest has always been in graphics and programming together, and my background supports this. I told my brother, who has been in app dev for 5 years, that I want to learn and build my career in graphics programming and GPU programming. He said there isn't much money in it, that people working in this field are paid far less than their day-to-day workload deserves, that I should do app or web dev to make good money, and that the GPGPU market is niche.
Is this really true? Is it less worthwhile than other kinds of development? Experienced people in this field, please share how you have felt so far and how you see the market.
I am planning to learn OpenGL and Vulkan, as I have some C++ programming experience. I am interested in GPGPU programming, and I have already been a 3D artist, which pulled me into this field. I am a 2nd-year software engineering student, and I have some good resources for learning Vulkan, but I am not quite sure where to start with OpenCL. I don't want to do CUDA, as I don't want to be bound to one vendor's library. I use a MacBook 14 Pro. I am a complete beginner, so pardon me if my questions don't make much sense. Please, experienced engineers, help me get started.
Also, if I am approaching anything the wrong way, please let me know what would be best.
Can anyone help me out? For the past couple of hours I have been trying to link my password store to GitHub. I have tried so many ways but still fall short. Linking my password store on my laptop was very simple. There are two main problems that have been stopping me. First, when I try to connect to GitHub via SSH in the password store with an OpenKeychain authentication key, it says "could not get advertised ref for branch master". Other times, after messing around with it, it says "enter passphrase for this repository", and no matter what password I use, it is not accepted. Can anyone help?
I have an old laptop with an 8th-gen i5 and an NVIDIA 1050 GPU. I have been researching whether this is better to use than an NVIDIA Jetson Nano for my use case, which is deep learning, object detection, and feature extraction. I would really like to hear recommendations on what I should be using. Thank you so much.
I work in an ophthalmology clinic, and we're buying a new machine that requires decent PC hardware.
The maker of the machine recommends a GPGPU to go with it for optimal performance, but those are no longer available in my country, so the people importing the machine suggest NVIDIA Quadros as equivalents. They didn't really explain why it needs a workstation GPU; they simply said it needs a good amount of VRAM. They also said it can even run on an iGPU with 1 GB of VRAM, so now I'm confused about whether to look for a decent, fast gaming GPU with decent VRAM or an NVIDIA Quadro with decent VRAM.
The only detail I got about the machine is that it uses the VRAM for processing images.
I have no clue if this is the proper subreddit for this, but I'm asking in the hope of reaching an expert.
The machine in question is the TOPCON OCT TRITON.
I've implemented the technique described here for collision detection; it looks great and believable.
The one feature I'm missing in my results is determinism, i.e. two identical setups will have slightly different results. I'm not sure if this technique is supposed to be deterministic, or if I should keep hunting for a bug in my implementation.
My first theory was that each collision cell needs to be internally sorted so that it always processes its objects in the same order. That didn't seem to improve my results.
I then tried adding a secondary objects buffer so that I wouldn't read and write to the same one while performing the collisions; but this actually made the simulation unstable.
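One possible contributor to run-to-run differences like those described above, offered only as an assumption about this kind of simulation and not a diagnosis of the poster's code, is that floating-point addition is not associative. If the order in which contributions to one object are accumulated varies between runs, the totals can differ. The sketch below uses plain Python in place of GPU threads; `accumulate`, `deterministic_sum` and the numbers are hypothetical and chosen only to make the rounding visible.

```python
# Illustrative only: plain Python standing in for per-thread accumulation on a GPU.

def accumulate(values, order):
    """Sum values in the given index order, as a scheduler might interleave them."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Four contributions to one object (hypothetical numbers chosen to expose rounding).
values = [1e16, 1.0, -1e16, 1.0]

forward = accumulate(values, [0, 1, 2, 3])
reordered = accumulate(values, [0, 2, 1, 3])

def deterministic_sum(contributions):
    """Sum (source_id, value) pairs in a fixed order, independent of arrival order."""
    total = 0.0
    for _, value in sorted(contributions):
        total += value
    return total

# The same contributions arriving in two different orders.
arrival_a = [(0, 1e16), (1, 1.0), (2, -1e16), (3, 1.0)]
arrival_b = [(2, -1e16), (0, 1e16), (3, 1.0), (1, 1.0)]

print(forward, reordered)
print(deterministic_sum(arrival_a), deterministic_sum(arrival_b))
```

Here the two accumulation orders give different totals, while summing in a fixed order keyed by source id gives the same total for both arrival orders. Whether this applies depends on how the kernel combines results, for example whether it uses atomics or a fixed reduction order.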
I was looking for advice or any research done on the following problem if anyone has any experience dealing with the issue/has heard of it.
Problem Statement: I have a system that expects to receive and perform inference calls on machine learning models. Any model that can be called is usually very different from the others, so caching parameters or other model-specific data may be less useful than storing some type of information that improves the average compute time across many different model inference calls, with minimal data replacement in the cache.
There are a couple options I know of, the main idea of most being some type of predictive caching, but I was wondering if anyone knew of any approach to caching that would provide minor individual model inference call improvements that would average to ok performance over many different models being called as opposed to individual model inference call runtime improvements. I know it's not exactly related, but I'm already implementing quantization so don't worry about that part.
The models are expected to be any supported by the ONNX format. I understand the question is asking for the best of both worlds in a way, but I'm willing to sacrifice a good bit of run time on individual models if something like caching certain operations or values would improve performance overall on average and bypass deciding the most useful parameters to cache when receiving multiple model requests. Anything helps, including telling me there's not a good solution to this and just doing it normally :) Thanks
As in the title. I need a fast way to sort multiple short arrays (realistically, between ~40 thousand and 1 million arrays, each of them ~200 to ~2000 elements long).
The most logical choice seems to be to use the GPU, but I can't find any library that can do it. Is there anything like that?
If there isn't, I can just write a GLSL shader, but it would be weird if no library of that type existed. If more than one exists, I would prefer a Vulkan or SYCL one.
EDIT: I need to sort 32-bit or even 16-bit floats. High precision float/integer or string support is not required.
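For context on the question above, one way many short arrays are often sorted in a single pass is a segmented sort: store all arrays back to back, tag each element with its array index, and do one sort keyed on (array index, value). The sketch below shows the idea on the CPU in plain Python. `segmented_sort` and the `offsets` layout are hypothetical names for illustration, not the API of any particular GPU library.

```python
# A CPU sketch of segmented sorting; a GPU version would replace sorted()
# with a parallel sort-by-key primitive.

def segmented_sort(values, offsets):
    """Sort each segment values[offsets[s]:offsets[s+1]] independently.

    All segments live in one flat list; offsets has one entry per segment
    start plus a final entry equal to len(values).
    """
    # Tag every element with the index of the segment it belongs to.
    segment_ids = []
    for s in range(len(offsets) - 1):
        segment_ids.extend([s] * (offsets[s + 1] - offsets[s]))

    # One global sort keyed on (segment, value) keeps segments in place
    # and orders the elements inside each one.
    order = sorted(range(len(values)), key=lambda i: (segment_ids[i], values[i]))
    return [values[i] for i in order]

# Three short arrays of different lengths, stored back to back.
flat = [3.0, 1.0, 2.0, 9.0, 7.0, 5.0, 4.0, 6.0, 0.5]
offsets = [0, 3, 5, 9]

result = segmented_sort(flat, offsets)
print(result)
```

Because the segment index is the leading key, the segment boundaries in `offsets` remain valid for the output, so each sorted array can be read back from the same slice it was stored in.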
The MI450 will have this. Will it be done through mainboard, or will they have bridge cables between GPU cards, as in the old Crossfire?
If you wanted to run some AI, the oldest CUDA GPU was built on 90 nm lithography, which might be coarse enough to tolerate cosmic radiation. The most memory was on the S870 with 6 GiB, but it appears to be 4 units in one case with 1536 MiB each, and only 1382 GFLOPS for all four together. But then, if it is cruising for years, slow computation might not be an obstacle.