/r/ROCm
The ROCm Platform brings a rich foundation to advanced computing by seamlessly integrating the CPU and GPU with the goal of solving real-world problems. This software enables the high-performance operation of AMD GPUs for computationally-oriented tasks in the Linux operating system.
Hi there. I am thinking of trying out ROCm on an Ubuntu 24.04 LTS installation. Is the amdgpu-dkms package necessary for ROCm to work, or can I just install the ROCm packages?
I do a bit of gaming on this machine too, and I like how the Mesa drivers work for that use case. I also see that the amdgpu-install script allows a --no-dkms option. Is installing the rocm package from the Ubuntu repositories functionally the same as running amdgpu-install with a --no-dkms argument?
I already installed the ROCm driver on my Windows 11 PC, and the GPU driver is up to date. But I'm getting this error: Error response from daemon: error gathering device information while adding custom device "/dev/fdk": no such file or directory
I'm not sure whether my GPU (5700 XT) is incompatible or whether /dev/fdk simply doesn't exist when the host operating system is Windows.
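For reference, ROCm's Docker instructions pass the Linux device nodes /dev/kfd and /dev/dri into the container, so the /dev/fdk in the error above may just be a misspelling in the docker command. Both are Linux device nodes, so a native Windows host would not be expected to expose them. A minimal sketch for checking what a Linux host (or WSL guest) actually has; the function name is made up for illustration:

```python
import glob
import os


def rocm_device_nodes():
    """Report which ROCm-related device nodes exist on this host.

    /dev/kfd is the compute device node; /dev/dri/renderD* are the
    per-GPU render nodes. The helper name is hypothetical.
    """
    return {
        "/dev/kfd": os.path.exists("/dev/kfd"),
        "render_nodes": sorted(glob.glob("/dev/dri/renderD*")),
    }


info = rocm_device_nodes()
print("/dev/kfd present:", info["/dev/kfd"])
print("render nodes:", info["render_nodes"] or "none found")
```

If /dev/kfd is missing on the host, passing it to Docker with --device will fail regardless of how the path is spelled.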
Hi, I am new to ROCm and AI.
I was able to install ROCm 6.1. After figuring out that PyTorch does not yet support 6.1, I uninstalled ROCm 6.1 from my WSL, but when I tried to install ROCm 6.0 I got an "unable to find the package" error. Can someone let me know what I am doing wrong here?
I am following the official documentation to install ROCm.
I am using Ubuntu via WSL.
Hi Everyone,
I have an OpenCL program running a small kernel that simply asks the GPU shaders to compare 64-bit integer values against an array. Essentially this can be thought of as an if (unsigned long == unsigned long) { do something } comparison. Very basic.
__kernel void mySearch(global unsigned long *massiveArray, global unsigned int *idx, global unsigned int *wire, global unsigned long *toTest, constant unsigned int *kNum, global unsigned int *cnt) {
    unsigned int i = get_global_id(0); // One work-item per massiveArray element.
    unsigned int a;
    for (a = 0; a < *kNum; a++) {
        if (toTest[a] == massiveArray[i]) { // We have a match of the first 64 bits!
            // Reserve a unique output slot: atomic_inc returns the previous counter
            // value, so two work-items that match at once cannot write the same slot.
            unsigned int slot = atomic_inc(cnt);
            idx[slot] = a;
            wire[slot] = i;
        }
    }
}
Under any kernel using rocm-opencl-5.5.1 and rocm-opencl-devel-5.5.1, my 7900 XTX could process about 1.7 trillion comparisons per second and my 6900 XT about 1.2 trillion per second.
Using rocm-opencl-5.7.x / rocm-opencl-devel-5.7.1 or later, including 6.0.0, this drops to roughly 450 and 350 billion respectively, a decrease of roughly 70-75% in speed.
Has anyone else encountered this, or does anyone know what could be happening? With Fedora 40 newly installed I have downgraded the two packages to 5.5.1 and performance has returned. For contrast, an RTX 3080 Ti does about 830 billion comparisons per second using the same kernel, so I am very happy with the AMD card performance under 5.5.1.
Anyone's insight / help welcome. I got no response on the AMD developer forum.
Ant
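As a quick check of the size of the regression, the throughput figures quoted above can be plugged into a small calculation. This is only arithmetic on the numbers in the post, not a new measurement:

```python
def percent_drop(before, after):
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100.0


# Comparisons per second as reported in the post: (5.5.1, 5.7.x and later).
reported = {
    "7900 XTX": (1.7e12, 450e9),
    "6900 XT": (1.2e12, 350e9),
}

for gpu, (old, new) in reported.items():
    print(f"{gpu}: {percent_drop(old, new):.1f}% fewer comparisons per second")
```

With these inputs the drop works out to about 73.5% on the 7900 XTX and about 70.8% on the 6900 XT.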
Last time I checked, they only provided a basic HIP SDK, not the full stack. How is it right now?
And are the GPUs with the same ISA version supported, if one is in the support list? i.e. RX7600(gfx1102) is marked as supported on Windows, does it mean that RX7600XT is supported? Or do they do some GPU name check?
Hi all! I was wondering if/when we can look forward to the next build of ROCm (and AMD's GPU drivers in general) being ready for Ubuntu 24.04 LTS, which just released. I'm currently on 22.04.4 LTS, and the desktop experience is getting long in the tooth. I'd like to be able to upgrade to a more modern software stack.
[update] I went ahead and took another stab at 24.04 and realized I had a gross conceptual error regarding ROCm and the Linux kernel. As stated below by several of you helpful redditors, there are packages included in the baseline repos. I didn’t know they were there because I needed to install synaptic to search for what I was looking for. Basically to get it running was as simple as searching for “ROCm” and installing all the related packages that were libraries.
Of course there are other glitches in 24.04 unrelated to ROCm that I’m dealing with now. But Gnome 46 is a big upgrade over Gnome 42.
So I'm trying to create an application that would load a .gguf file for an LLM, then have a function that takes a string of text, runs it through the model, and returns a string in response.
The problem I'm stuck on, however, is that the docs don't exactly tell me how to just load a .gguf file or how to use it as an LLM. I've tried looking at gist.github.com for something to show me, and I've tried loading up LM Studio to get an LLM to tell me how to do this in C++, with zero results for the effort.
Any help would be appreciated. I'm currently stuck with just a hello world written from an empty project. It compiles and works just fine as a hello world, but I have no clue whatsoever what I need to do next and no clue where to go to find out.
Hi, I'm trying to make passthrough work with my two RX 6600 cards. I've tried both VMware and now XCP-ng, and I get something like this:
root@ollama:/home/ollama# rocm-smi
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
Expected integer value from monitor, but got ""
======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device [Model : Revision] Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
Name (20 chars) (Edge) (Avg) (Mem, Compute)
==================================================================================================================
0 [0x6501 : 0xc1] N/A N/A N/A, N/A None None 0% unknown Unsupported 0% 0%
Navi 23 [Radeon RX 6
1 [0x6501 : 0xc1] N/A N/A N/A, N/A None None 0% unknown Unsupported 0% 0%
Navi 23 [Radeon RX 6
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================
root@ollama:/home/ollama#
When I use an Ubuntu 22.04 USB stick with a live desktop, everything runs fine, but when I try to use passthrough on either of the two platforms, I can see the PCI device inside the VM but cannot use it, and I don't know why. Any ideas?
whenever I run my code it only executes as shown in the image
from PIL import Image
from torchvision.transforms.functional import to_pil_image
from ultralytics import YOLO
from ultralytics import NAS
model = YOLO('yolov8n-cls.yaml')
results = model.train(data='datasets/datasets/classification', source='config.yaml' , epochs=1, imgsz=640,device='0')
image_path = ['test.jpg','test2.jpg']
for i in image_path:
    results = model(i)
    print(results)  # return a list of Results objects
    for result in results:
        boxes = result.boxes  # Boxes object for bounding box outputs
        masks = result.masks  # Masks object for segmentation masks outputs
        keypoints = result.keypoints  # Keypoints object for pose outputs
        probs = result.probs  # Probs object for classification outputs
        result.show()  # display to screen
        result.save(filename=i+'result.jpg')  # save to disk
This is the torch version I'm using (result of pip3 show torch):
Name: torch
Version: 2.0.1+rocm5.4.2
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/hamza/.local/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, pytorch-triton-rocm, sympy, typing-extensions
Required-by: pytorch-triton-rocm, thop, torchaudio, torchvision, ultralytics
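The part after the "+" in the version above is a local version label, and PyTorch's ROCm wheels use it to name the ROCm release they were built against. A small sketch of pulling that apart, which can help when checking a wheel against the installed ROCm version; the helper name is made up for illustration:

```python
def split_torch_version(version):
    """Split a version such as '2.0.1+rocm5.4.2' into (public, rocm).

    `rocm` is None when the local label is missing or is not a ROCm tag.
    Hypothetical helper, not part of PyTorch.
    """
    public, _, local = version.partition("+")
    rocm = local[len("rocm"):] if local.startswith("rocm") else None
    return public, rocm


public, rocm = split_torch_version("2.0.1+rocm5.4.2")
print("torch:", public)
print("built for ROCm:", rocm)
```

On a machine with PyTorch installed, the same string is available as torch.__version__.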
the result of executing the code
YOLOv8n-cls summary: 99 layers, 2719288 parameters, 2719288 gradients, 4.4 GFLOPs
Ultralytics YOLOv8.1.47 🚀 Python-3.10.12 torch-2.0.1+rocm5.4.2 CUDA:0 (AMD Radeon Graphics, 24560MiB)
engine/trainer: task=classify, mode=train, model=yolov8n-cls.yaml, data=datasets/datasets/classification, epochs=1, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train10, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=config.yaml, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/classify/train10
train: /home/hamza/Desktop/workspace/ml/datasets/datasets/classification/train... found 16541 images in 9 classes ✅
val: None...
test: /home/hamza/Desktop/workspace/ml/datasets/datasets/classification/test... found 27 images in 9 classes ✅
Overriding model.yaml nc=1000 with nc=9
from n params module arguments
0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True]
3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True]
5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True]
7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True]
9 -1 1 341769 ultralytics.nn.modules.head.Classify [256, 9]
YOLOv8n-cls summary: 99 layers, 1449817 parameters, 1449817 gradients, 3.4 GFLOPs
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Currently it is still not working, but now my environment can identify my GPU through ROCm, and the error message I'm getting is very telling.
I followed this tutorial:
https://askubuntu.com/questions/1429376/how-can-i-install-amd-rocm-5-on-ubuntu-22-04
Then this to pull and then run the Docker environment
And this is the Python code I'm running:
import tensorflow as tf
print(tf.test.is_gpu_available())
And this is the output of the print part:
>>> print(tf.test.is_gpu_available())
2024-04-13 19:58:34.109675: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109733: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109788: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109814: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109839: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: AMD Radeon RX 6900 XT, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1030. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
2024-04-13 19:58:34.109877: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-04-13 19:58:34.109889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 1, name: AMD Radeon Graphics, pci bus id: 0000:13:00.0) with AMDGPU version : gfx1036. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
False
------------------
As you can see, my GPU was correctly found as an AMD Radeon RX 6900 XT, and the AMDGPU version gfx1030 is also correct, I assume, as it's on the supported list. The issue is that the supported list in that damn check is written as gfx1030gfx1100, with a damn TYPO. There is no comma in between, which means my GPU is not passing the check because gfx1030gfx1100 is being treated as an actual GPU name. I'm beyond furious.
Is there a way to either bypass that check or edit the file myself to fix this? This is stupid. My GPU clearly is supported, but the whole gfx1030gfx1100 thing is not allowing me to progress.
Is it possible to rename my GPU to gfx1030gfx1100 or something like this?
Thank you.
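One way a list like the one in the log could come about is a missing comma between two adjacent string literals, which both C++ and Python silently join into a single string. The sketch below reproduces that effect with the names printed in the log. It is an assumption about the cause, not TensorFlow's actual source, and the variable names are made up:

```python
# Hypothetical lists that mirror the names printed in the log above.
# Note the missing comma after "gfx1030": adjacent literals are concatenated.
supported_with_typo = [
    "gfx1030" "gfx1100", "gfx900", "gfx906", "gfx908",
    "gfx90a", "gfx940", "gfx941", "gfx942",
]
supported_fixed = [
    "gfx1030", "gfx1100", "gfx900", "gfx906", "gfx908",
    "gfx90a", "gfx940", "gfx941", "gfx942",
]

print(", ".join(supported_with_typo))
print("gfx1030 accepted (typo list): ", "gfx1030" in supported_with_typo)
print("gfx1030 accepted (fixed list):", "gfx1030" in supported_fixed)
```

If the real check is an exact match against such a list, gfx1030 would be rejected even though it appears to be listed, which matches the "Ignoring visible gpu device" message.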
Any tips from people who made that work? I'm a bit stuck on this, and I've been trying things nonstop for about 5 days without success.
The only thing I did that "worked" was using a package called tensorflow-directml, but by using it I'm stuck with an extremely old version of TensorFlow that is not suitable for anything (such as using keras_cv). Could you guys help me?
I was wondering if anyone has been using zluda on Linux? what's been your experience and any difficulties?
It appears that the GPU profiler for OpenCL (gpuopen.com) does not work on RHEL 9. Is there an alternative profiling tool that does work? Has anyone had any luck with rocmprofiler?
This is for gfx1100.
Thanks!
I am working through familiarizing myself with the ROCm suite, and have made my way to hipRAND. I have worked with other prob/stats libraries previously, mostly for the intended use case of scientific computing applications, but partly as a personal exercise in API development. Digging into it, it seems that hipRAND has implemented a handful of common distributions (uniform, normal, lognormal, some discrete stuff), but lacks others, even fairly common ones such as gamma, exponential, etc.
It makes sense that the most common use case is simply to provide a tool for pseudo-random uniform distribution generation for the rocm/hip framework. If you're a user of hipRAND, do you feel like there is much missing in terms of breadth? Are you content JUST utilizing hipRAND's uniform and normal distribution functionality?
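For context on the gap, a missing distribution such as the exponential can usually be derived from uniform samples by inverse-transform sampling. The sketch below shows the idea on the CPU, using Python's random module as a stand-in for a uniform generator. It does not use the hipRAND API, and the function names are made up for illustration:

```python
import math
import random


def exponential_from_uniform(u, rate):
    """Map a uniform sample u in [0, 1) to an Exponential(rate) sample.

    Inverts the CDF F(x) = 1 - exp(-rate * x); using 1 - u avoids log(0).
    """
    return -math.log(1.0 - u) / rate


def sample_exponential(n, rate, seed=0):
    """Draw n exponential samples from a seeded uniform generator."""
    rng = random.Random(seed)
    return [exponential_from_uniform(rng.random(), rate) for _ in range(n)]


samples = sample_exponential(100_000, rate=2.0)
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f} (theoretical mean is 1/rate = 0.5)")
```

The same transform could be applied element-wise to a buffer of uniforms on the GPU. Gamma has no closed-form inverse CDF, so it typically needs a rejection method instead.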
https://github.com/anishsheikh/rocm-gfx1100 In case anybody needs it. I built it mostly for myself.
Hey everyone,
I'll be providing a live webinar with AMD on Wednesday March 27 at 2pm (US ET) that will show how to get started with Pytorch on systems with Radeon and Instinct GPUs.
I'll be talking about our implementation of a matrix-free Implicitly Restarted Lanczos Method (eigenvalue/eigenmode solver) using Pytorch. Plus, I'll cover installation and setup of Pytorch on systems with AMD Radeon and Instinct GPUs. We'll also discuss performance comparisons across a few GPU platforms for some of our benchmark cases for this method. There will also be a Q&A at the end. See you there!
Register to attend the free webinar hosted by AMD. If you can't make the live webinar, you can access the recording after the event using this same link.
Hi all. There has been a long-known bug (such as this and this) in AMD's official build of rocBLAS that prevents running rocBLAS on gfx1010/gfx1011/gfx101* GPUs. This means that if you're on an RDNA1 GPU (such as the RX 5000 series) and you obtained ROCm packages through AMD's official repository, most ML workflows would not work, given that the use of rocBLAS is almost ubiquitous, such as running Stable Diffusion with the official ROCm PyTorch packages. Recently we've fixed a bug that should allow official builds to work again with RDNA1 GPUs. Hopefully, the ROCm 6.1 release will contain this fix, allowing RDNA1 users to run ML workflows out of the box again.
Note to distribution maintainers: just porting that single fix is not enough because it depends on a previous bug fix. It's recommended for now to continue building rocBLAS with -DTensile_LAZY_LIBRARY_LOADING=OFF until a release containing both patches comes out.
It's just a small pull request that I saw, but it seems that the MIOpen project will start testing support for Navi 32 (7800 XT, 7700 XT):
https://github.com/ROCm/MIOpen/pull/2796
So this would mean that ROCm 6.1 might have official support for Navi 32 on Linux.
This is the current state of support on ROCm 6.0
So yay, happy news for me and my 7800 XT :D It's getting close.
I want to upgrade my GPU and am considering the RX 6800 XT since it's cheap, fast and has plenty of VRAM. I play games, but I am also a data science undergrad, so I might need acceleration for neural networks in PyTorch, GPU computing in LightGBM and all that stuff. Nothing LLM grade (although, if I could fit some type of LLM into those 16 GB, hmmmm), but I'd want decent precision results without artifacts.
So, the question is: can I run stuff like PyTorch on RDNA2 GPUs, like the 6800 or 6700, or is that a feat only bestowed on RDNA3 GPUs with their newer tech and AI acceleration cores?
I know that the RX 7800 XT is not supported by ROCm yet, but I have seen many people who have achieved this unofficially. Can someone explain to me how I can do that?
Do you think we will have MIOpen enabled on Windows with ROCm 6.1? I read the release notes and they don't say anything about that, only some initial enablement code or something like that in MIGraphX.
MIOpen/CHANGELOG.md at release/rocm-rel-6.1 · ROCm/MIOpen (github.com)
I've been reading this sub and other sources and it seems there is limited and undocumented windows rocm support for the 6650 and higher. Any hope for 6600?
I bought a PC last year not really knowing what to look for, just wanted decent gaming performance.
Now I'm learning LLM training and while cloud compute is an option I'm trying to learn all modalities. I'd like to work locally just to learn what there is to learn and save money.
Specifically pytorch, models from hugging face, adapting code written for cuda, etc.
I'm experienced writing python and installing packages in venvs but less so compiling drivers, dual booting Linux, or changing hardware.
Appreciate any guides. Thanks!
I am interested in the 8700G (and the associated 8700 EG) processors for an embedded project. I currently have a 5700G processor, and ROCm seems to support it despite the fact that it is not on the supported processor list. Does anyone know if the 8700G processors (gfx1103) are supported if you install the latest ROCm?
Thanks
Seems like a good deal and you can't get that 256bit bandwidth for that price on Nvidia.
Hi! I don't know if this is the right subreddit to ask about this, but I assume a lot of you guys have experience with ZLUDA.
I'm currently working on a project and I'm using the ts_zip tool (full documentation here). The tool can take advantage of CUDA to accelerate the AI processes. I've set it up to run through CPU, but I would like to try and get it running on GPU (I have a RX 6800). I've installed ZLUDA as per these instructions (up to Compilation/Settings, since those are Stable Diffusion specific).
When I try and run ts_zip with cuda, for instance:
./ts_zip --cuda -m rwkv_169M.bin c alice29.txt /tmp/out.bin
I receive this error:
Could not load: nvcuda.dll (error=126)
I have also tried running ts_zip through the ZLUDA executable as documented here under "Usage", for instance:
<ZLUDA_DIRECTORY>\zluda.exe -- ts_zip --cuda -m rwkv_169M.bin c alice29.txt /tmp/out.bin
but then get a different error:
Could not load: libnc_cuda-12.dll (error=126)
The ts_zip documentation mentions that it is very specific about CUDA filepaths, so even wrong versions of CUDA can trigger these errors. It states:
If you get an error such as:
Could not load: libnc_cuda-12.dll (error=126)
it means that cuda is not properly installed.
Then edit the ts_server.cfg configuration to enable GPU support by uncommenting
cuda: true
and run the server.
If anyone has any expertise with using ZLUDA, I would greatly appreciate your help in pointing out any errors I may have committed! Thank you!
Here are the commands I executed
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install --usecase=hiplibsdk,rocm
sudo usermod -aG video $USER
sudo usermod -aG render $USER
sudo reboot
rocminfo
Output: "ROCk module is NOT loaded, possibly no GPU devices", and my GPU driver fell back to llvmpipe.
P.S.
To be precise, the first time I ran
"sudo apt install ./amdgpu-install_6.0.60002-1_all.deb",
it ended with a message that permissions were insufficient to run the download sandboxed. So I ran it a second time, and it displayed the following:
Note, selecting 'amdgpu-install' instead of './amdgpu-install_6.0.60002-1_all.deb'
amdgpu-install is already the newest version (6.0.60002-1718217.22.04).
I don't know whether this affects the installation of the GPU driver, so I installed ROCm afterwards.
GPU:6700xt
System:ubuntu 22.04.4 LTS
Kernel 6.5
rocm6.0.2
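Since the steps above add the user to the video and render groups, one thing worth ruling out is whether the current session actually has those groups. Changes made with usermod only apply to sessions started after the change. A minimal sketch, with a made-up helper name:

```python
import grp
import os


def group_membership(names=("video", "render")):
    """Return {group: True/False/None} for the current process.

    None means the group does not exist on this system.
    Hypothetical helper for illustration.
    """
    current_gids = set(os.getgroups()) | {os.getegid()}
    result = {}
    for name in names:
        try:
            gid = grp.getgrnam(name).gr_gid
        except KeyError:
            result[name] = None
        else:
            result[name] = gid in current_gids
    return result


membership = group_membership()
for name, state in membership.items():
    print(f"{name}: {state}")
```

Group membership would not by itself explain a "ROCk module is NOT loaded" message, which is about the kernel module, but it is a cheap check before digging into the driver install.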
update 4:
I'll have to ask about this later. So far, the PyTorch tests with slow tests enabled are reporting 3094 failed, 954 passed, 528 skipped, 69 xfailed [whatever that means], 6188 rerun in 676.99 seconds
So many tests failed. Is that because of the lack of atomics, I wonder?
The only error messages I can make out are a bunch of "TestFakeTensorCUDA::..." tests failing.
Now a bunch of "TestCompositeComplianceCUDA" tests are failing
update 3:
It seems to be working at least for some things.
I'm running pytorch tests right now.
vulkan and opengl work.
it does seem that if I enable hardware acceleration in chromium, I get an occasional system crash, though youtube works.
the log files say that PCIe atomics are not present on this machine, but I guess rocm on vega 20 does work for some things without that.
update 2:
the crashes when rendering YouTube videos went away when I made a new install that used proprietary drivers in the initial install instead of open source ones.
Not sure about the other crash because I'm still reinstalling things.
I gave up on my previous install when adding the vulkan pro driver crashed linux so hard that I was having trouble recovering it.
update. The drivers were loading despite being tainted, and you can't really turn tainting off.
But they were crashing because I had two video cards installed, the MI50 and an HD 5450.
The driver is happy if only one of the two is installed, but not both. Otherwise it silently crashes.
Current state:
Graphics works, though that wasn't my intention.
But linux is crashing:
That gets as far as loading a model into the card, but trying to generate a picture either:
a) the first time I tried it, it stalled for a few minutes, then the screen went black and the system locked up.
b) the second time I tried it, the model loaded, but generating an image rebooted the system instantly.
I have a pretty big fan on the card so I doubt that's the problem.
It's possible, but unlikely that the power supply in my Dell Precision 5600 can't take the power draw. It's an 850 watt power supply, but the processors themselves have a 135 watt tdp each, and the MI50 has a 300 watt tdp, so that could cut it close under load. However youtube shouldn't make it draw that much, and I didn't hear the fans on the cpus ramp up, nothing should be making 16 cores get used. But maybe the one extra power connector can't handle driving the two extra power inputs to the card.
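The rough power budget described above can be written out as follows. It uses only the figures from the post and treats TDP as a stand-in for sustained draw, which is an assumption: it ignores drives, memory, fans, transient spikes and per-connector limits.

```python
PSU_WATTS = 850      # power supply rating from the post
CPU_TDP_WATTS = 135  # per CPU, two CPUs installed
CPU_COUNT = 2
GPU_TDP_WATTS = 300  # MI50

# Sum the nominal loads and compare against the supply rating.
loads = [CPU_TDP_WATTS] * CPU_COUNT + [GPU_TDP_WATTS]
total_tdp = sum(loads)
headroom = PSU_WATTS - total_tdp

print(f"combined CPU + GPU TDP: {total_tdp} W")
print(f"nominal headroom on the supply: {headroom} W")
```

On paper that leaves 280 W, so total capacity looks less likely to be the limit than how much a single auxiliary power lead can deliver to the card.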
In any case, the system also crashed while I was surfing the web for answers after an hour or so.
................
I'm trying to get an MI50 32 GB working for machine learning; I don't care if it works as a video card.
I realize that there's a good chance I'll need to change motherboards and so on to get this going, but I'm starting out seeing if I can get it working on the hardware I have with whatever brand new Ubuntu install I can configure for it.
After using amdgpu-install on Ubuntu 22 without errors before the reboot, I noticed during booting a message saying that an AMD driver was not signed and was being marked tainted.
I see in the documentation mention that it's supposed to sign the drivers for secure boot.
The kernel is indeed marked as tainted with two errors, one saying that a driver was from "out of tree" and another that a driver was unsigned.
rocminfo reports "ROCk module is NOT loaded, possibly no GPU devices"
I'm wondering if ROCm requires secure boot, and if I can avoid this whole problem by reinstalling Ubuntu with secure boot and the TPM off.
So that's my current question.
I suppose I should also ask if I should give up and buy a new motherboard. I'm using an ancient xeon workstation that I know full well doesn't support PCIe 3 atomics (dual e5-2690s on a Dell Precision 5600). It's using an intel chipset to (for some reason) convert a couple of xeons that only support PCIe2 to PCIe3, without making it as fast as PCIe3.
I also know that there existed, at one point, a version of ROCm that didn't need PCIe3 atomics to use Vega 7nm boards like the Mi50, but the current documentation no longer says that it isn't necessary. Anyway I thought that it would be worth a try.
Don't think you have to answer both my questions, if you know one of them, speak up. It seems like almost NO ONE is using these cards. I can find no examples of people using them on the internet. I asked one guy who was selling an Mi100 what consumer level motherboards he'd used them on, hoping that if they work for an Mi100 they'll work for an Mi50.
Have a 7900 XTX. Apparently it's only working on MI210 and MI250? Kinda sucks being gimped and forced to use such short sequence lengths despite 24GB VRAM.
https://huggingface.co/docs/transformers/perf_infer_gpu_one?install=AMD#flashattention-2