/r/ROCm

The ROCm Platform brings a rich foundation to advanced computing by seamlessly integrating the CPU and GPU with the goal of solving real-world problems. This software enables the high-performance operation of AMD GPUs for computationally-oriented tasks in the Linux operating system.

1,769 Subscribers

1

Fedora 41 + ROCm (dkms) compatibility

Hey folks, does anyone know whether the amdgpu DKMS driver will work on the latest Fedora 41?

I guess it will not, because Fedora 41 ships kernel 6.11, but I just want to make sure. I have an AMD MI100 and unfortunately it requires the amdgpu DKMS driver to work. Has anyone already tried to install it?

I saw this issue: https://github.com/ROCm/ROCm/issues/3870

but maybe you have more information.

3 Comments
2024/11/01
23:05 UTC

17

ROCm 6.2 for Radeon GPUs

https://community.amd.com/t5/ai/new-amd-rocm-6-2-for-radeon-gpus-delivers-performance-amp/ba-p/715854

Triton beta support. Official support for Stable Diffusion 2.1.

Flash Attention 2.

3 Comments
2024/11/01
14:37 UTC

3

Trying to install SD webui with ZLUDA but having issues with webui-user.bat

I used this guide as a basis for installing SD webui: https://youtu.be/n8RhNoAenvM?si=nEXr1st0I33TR3wW

Yet when I run webui-user.bat and it attempts to open the webui in my browser, it craps out after the ONNX check, giving me Exception Code 0xC0000005. I currently don't have the full error strings, beyond seeing a ZLUDA DLL, a bunch of ROCm 6.1 DLLs, and some Python DLLs being listed.

Currently using a 6700 XT, Python 3.10.6, ROCm 6.1, and the latest ZLUDA release.

3 Comments
2024/11/01
11:27 UTC

2

Is there a working version of Flash Attention 2 for AMD MI50/MI60 (gfx906, Vega 20 chip)?

Hi everyone,

I have been trying to get Flash Attention 2 to work with my 2x MI60 GPUs, but I was not able to find a correctly working version. Here is what I tried.

I compiled https://github.com/ROCm/flash-attention.git (v2.6.3) successfully on Ubuntu 22.04.5 LTS (x86_64). By default, gfx906 is not officially supported, so I changed setup.py (line 126) to add "gfx906" to allowed_archs. It took 2 hours to compile successfully, but it failed all the tests: pytest -q -s tests/test_flash_attn.py
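For reference, the edit amounts to extending the architecture allow-list, roughly like this (a hypothetical sketch; the exact architecture names in the ROCm fork's list may differ):

# setup.py (ROCm/flash-attention fork), around line 126: extend the
# allow-list with gfx906 so the build no longer rejects MI50/MI60.
allowed_archs = ["native", "gfx90a", "gfx940", "gfx941", "gfx942", "gfx906"]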

Still, I tried to benchmark a single MI60. Benchmark worked fine: python benchmarks/benchmark_flash_attention.py

### causal=False, headdim=128, batch_size=16, seqlen=1024 ###
Flash2 fwd: 70.61 TFLOPs/s, bwd: 17.20 TFLOPs/s, fwd + bwd: 21.95 TFLOPs/s
Pytorch fwd: 5.07 TFLOPs/s, bwd: 6.51 TFLOPs/s, fwd + bwd: 6.02 TFLOPs/s
Triton fwd: 0.00 TFLOPs/s, bwd: 0.00 TFLOPs/s, fwd + bwd: 0.00 TFLOPs/s

If FA2 worked correctly, the numbers above would mean almost a 14x improvement in the forward pass and a 3x speedup in the backward pass.

Additionally, Triton does not work either, which is why the Triton numbers above are 0 (I have pytorch-triton-rocm 3.1.0).

I was curious and installed exllamav2, which can use FA2 for faster inference. Unfortunately, with FA2 enabled, exllamav2 output gibberish for Llama 3 8B. When I disabled FA2, the model output text correctly, but 2 times slower.

I also compiled aphrodite-engine (commit) and it worked fine without FA2 using GPTQ models. However, when I enabled FA2, it also output garbage text.

I also compiled the official FA2 repo (https://github.com/Dao-AILab/flash-attention.git) but it did not even run due to gfx906 not being in their support list (I could not find the code to bypass this requirement).

I have PyTorch version 2.6.0, ROCm version 6.2.4, Python 3.10.12, transformers 4.44.1.

Here is how I installed pytorch with ROCm:

python3 -m venv myenv && source myenv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2/

My question is: has anyone been able to correctly compile FA2, or has there ever been a working version of FA2 for the MI50/MI60? Since AMD manufactured these cards as server cards, I imagine they were used for training and inference of models at some point, but what was their use case if they did not support PyTorch libraries earlier?

Side note: I have working Python experience and am happy to look into modifying the ROCm FA2 repo if you can share some pointers on how to get started (which parts should I focus on for gfx906 architecture support)?

Thank you!

0 Comments
2024/10/31
14:45 UTC

1

7600S for Windows HIP SDK?

I have a CUDA application which I want to eventually run on an MI300X. It's being developed on Windows but also runs on Linux.

The easiest path for porting would be a laptop that's compatible with the Windows HIP SDK. The HIP SDK doesn't mention any Radeon mobile GPUs, but I'm wondering if anyone knows if they'd work. The 7600S is easiest for me to get. The 7600 (desktop) is supported.

0 Comments
2024/10/31
14:10 UTC

14

Llama 3.2 Vision on AMD MI300X with vLLM

Check out this post: https://embeddedllm.com/blog/see-the-power-of-llama-32-vision-on-amd-mi300x

(Video demo: https://reddit.com/link/1ggb4a0/video/s8j3n06sh2yd1/player)

The ROCm/vLLM fork now includes experimental cross-attention kernel support, essential for running Llama 3.2 Vision on MI300X.

This post shows you how to run Meta's Llama 3.2-90B-Vision-Instruct model on an AMD MI300X GPU using vLLM. We provide Docker commands, code, and a video demo to get you started with image-based prompts.
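As a rough idea of what the code boils down to (a hypothetical sketch, not the post's exact code; the multimodal API details depend on your vLLM version, and "example.jpg" is a placeholder):

# Hypothetical sketch: image-based prompt on the ROCm/vLLM fork.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    max_model_len=8192,
    enforce_eager=True,  # cross-attention kernel support is experimental
)
prompt = "<|image|><|begin_of_text|>Describe this image."
image = Image.open("example.jpg")  # placeholder input image
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)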

1 Comment
2024/10/31
10:18 UTC

18

Llama 3.1 Inference on AMD MI300X GPUs: A Technical Guide with vLLM (With benchmark)

Check this out on vLLM Blog:
https://blog.vllm.ai/2024/10/23/vllm-serving-amd.html

This post provides a deep dive into optimizing vLLM for inference of Llama 3.1 models on AMD's MI300X GPUs. We explore key parameters and techniques to maximize throughput and minimize latency.

Key Results:

  • vLLM: 1.5x higher throughput and 1.7x faster TTFT than Text Generation Inference (TGI) for Llama 3.1 405B; 1.8x higher throughput and 5.1x faster TTFT for Llama 3.1 70B.

(Benchmark chart: https://preview.redd.it/p0mw0b8kf2yd1.jpg?width=2486&format=pjpg&auto=webp&s=d0527f134bd24dd2129397e43d6f22d0802d3e1b)

The post studies these 9 parameters (a sketch applying them follows the list):

  1. Chunked Prefill: Disable this on MI300X in most cases for better performance.
  2. Multi-Step Scheduling: Set --num-scheduler-steps between 10 and 15 to optimize GPU utilization.
  3. Prefix Caching: Combine with chunked prefill cautiously, considering the caching hit rate.
  4. Graph Capture: For long context models, set --max-seq-len-to-capture to 16384, but monitor for potential performance degradation.
  5. AMD-Specific Optimizations: Disable NUMA balancing and tune NCCL_MIN_NCHANNELS.
  6. KV Cache Data Type: Use the default setting to match the model's data type.
  7. Tensor Parallelism: Adjust based on your throughput vs. latency requirements.
  8. Maximum Number of Sequences: Increase --max-num-seqs (e.g., to 512 or higher) to improve resource utilization.
  9. Use CK Flash Attention: Prioritize the CK Flash Attention implementation for significant speed gains.
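
A hypothetical sketch of how these settings map onto vLLM's offline API (values are the blog's suggestions, not verified here; the model id is illustrative, and a server deployment would pass the equivalent --flags):

# Hypothetical sketch; verify argument names against your vLLM version.
# (5) Also disable NUMA balancing and tune NCCL_MIN_NCHANNELS outside Python.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=8,        # (7) trade throughput vs. latency
    enable_chunked_prefill=False,  # (1) usually better disabled on MI300X
    num_scheduler_steps=12,        # (2) multi-step scheduling, 10-15 suggested
    max_seq_len_to_capture=16384,  # (4) graph capture for long-context models
    max_num_seqs=512,              # (8) raise to improve resource utilization
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
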
8 Comments
2024/10/31
10:05 UTC

7

Any improvements after OpenAI started using AMD?

I recently stumbled upon this article https://www.amd.com/en/newsroom/press-releases/2024-5-21-amd-instinct-mi300x-accelerators-power-microsoft-a.html and started wondering whether anyone has seen improvements from using AMD cards for deep learning: sizeable improvements in ROCm stability, for example, or new features, better performance, etc.

I'm currently thinking of buying a bunch of 3090s, but wanted to understand whether a couple of AMD cards would be a better investment for the next year or two.

3 Comments
2024/10/30
11:32 UTC

5

Help: I want to Use Stable Diffusion CLI with Zluda…

Hi everyone,

I’m currently working on a project based on Auto1111SDK, and I’m aiming to modify it to work with Zluda, a solution that supports AMD GPUs.

I found another project where this setup works: stable-diffusion-webui-amdgpu. This shows it should be possible to get Auto1111SDK running with Zluda, but I’m currently missing the know-how to adjust my project accordingly.

Does anyone have experience with this or know the steps necessary to adapt the Auto1111SDK structure for Zluda? Are there specific settings or dependencies I should be aware of?

Thanks a lot in advance for any help!

28 Comments
2024/10/29
22:16 UTC

3

Installing ROCm on the iGPU of a Ryzen 5 4500U (gfx909) (Arch)?

Is it safe and possible to install ROCm on a Ryzen 5 4500U's iGPU (gfx909), and at which versions?
There is no dedicated GPU in the laptop.
In the past, while it wasn't officially supported, it was possible to install it without even needing to compile it yourself. I don't remember what version of ROCm that was, however.

But now I wanted to reinstall it (I added a new SSD and switched from Mint to Arch, just because I like to look at the many options once in a while and see how far they all have come),
and when I visited the ROCm website's install page, it showed a warning about installing on systems with an iGPU, specifically saying the iGPU needs to be disabled in the BIOS since it could otherwise cause instability and crashes.
The Arch wiki, however, doesn't seem to mention such a warning.

So this left me with multiple questions. Many of them are phrased as exact questions, but rough estimates are fine.

  1. Can ROCm be installed on that GPU, and specifically the newest version?
    1. Does it need to be manually compiled now?
    2. Up to what version can it be directly installed?
    3. Up to what version can it be installed with manual compilation?
  2. Do those mentioned instabilities still happen?
  3. Do those instabilities also happen on a system without a dedicated GPU?
  4. Are there custom versions aimed specifically at iGPUs, or recommended build/compile arguments to optimize it for an iGPU?
  5. What are those instabilities and crashes like?

I would be okay with it occasionally crashing, as long as it doesn't actually destroy my system or other projects I am working on, and at most a reboot is enough to get rid of the crash's effects; but I'd prefer no crashes, or very unlikely ones.
As for installing or compiling it, I am okay with either, though if there are specific arguments to make it work better or properly, those would be nice to know.
I just do not want to install it only to find it makes the entire system unstable (kind of like NVIDIA drivers do, or at least did before I knew well enough to avoid NVIDIA, since they are so closed-source you just can't use them properly). I know things can be fixed, but I prefer not to seek out problems if they might be easy to avoid.

2 Comments
2024/10/28
19:13 UTC

3

ROCm on RX 5700 XT / gfx1010 with pytorch ?

I'm new to ROCm. I've been trying to get it working on RDNA1, but the docs say there is no official support for gfx1010, even though I've come across ROCm/Tensile#1897 suggesting gfx1010 support landed with ROCm 6.2. Does it really work, or do I have to use rocm_sdk_builder to build for a custom target such as gfx1010 and then build PyTorch from source against that custom ROCm?

Many Thanks.

10 Comments
2024/10/26
06:44 UTC

3

Is it worth buying an RX 7800 XT for ROCm?

Is the 7800 XT officially supported by ROCm (Windows or Linux)? I want to try TensorFlow and AI art (Stable Diffusion, etc.).

20 Comments
2024/10/25
09:23 UTC

2

Trying to gain an understanding on how to install it properly

I have a 6950 XT and Arch Linux. I already have it set up quite well for gaming and I don't want to botch what I already have. Would Docker be the appropriate solution to isolate any ROCm configuration from my gaming setup? Do I have that right?

5 Comments
2024/10/23
20:16 UTC

2

I unfortunately cannot for the life of me get the ROCm fork of koboldcpp working on Fedora 40. Can someone help?

I've been able to download ComfyUI and have it work with my 7900 XTX, but for some reason koboldcpp keeps giving me the "ROCm error: no kernel image is available for execution on the device" error. I've tried messing with setting HIP devices and even turning off the integrated graphics of my 7800X3D in the BIOS, but nothing I do seems to work. From what I gather it's not supposed to be that hard to get it up and running on Fedora, but I'm stumped. Can anyone give some guidance? I can provide any necessary terminal outputs and the like.

6 Comments
2024/10/23
19:35 UTC

3

Trying to install ROCm to run PyTorch on my RX 6950 XT.

Hi Everyone!

I'm new to ROCm and I installed Ubuntu 24.04 LTS, as I heard ROCm works with Ubuntu unlike Windows. I tried to install ROCm version 6.2.2 and was met with: "The following packages have unmet dependencies:

hipsolver6.2.2 : Depends: libcholmod3 but it is not installable

Depends: libsuitesparseconfig5 but it is not installable

rocm-gdb6.2.2 : Depends: libpython3.8 (>= 3.8.2) but it is not installable

E: Unable to correct problems, you have held broken packages."

So I ran "sudo add-apt-repository -y -s deb http://security.ubuntu.com/ubuntu jammy main universe" according to an answer at https://askubuntu.com/questions/1517236/rocm-not-working-on-ubuntu-24-04-desktop

and when I retried sudo amdgpu-install --rocmrelease=6.2.2 --usecase=rocm,hip --no-dkms,
it still returned: "The following packages have unmet dependencies:

rocm-gdb6.2.2 : Depends: libpython3.8 (>= 3.8.2) but it is not installable

E: Unable to correct problems, you have held broken packages."

I'm very new to this, so I was hoping someone could tell me whether this is fixable, or, if it isn't, what version of Ubuntu and ROCm is viable and working for my GPU. I am doing an AI assignment where I need to train a neural network to classify images via PyTorch, and really need this to speed up processing time.

Thank you so much for your help!

Edit: I was following this GitHub walkthrough that was linked in this subreddit: https://gist.github.com/jurgonaut/462a6bd9b87ed085fa0fe6c893536993

Also, I checked that my Python version is 3.12.3, and I tried sudo apt-get python3.8, which returned no installation candidate. Should I look for a PPA for Python 3.8?

13 Comments
2024/10/22
12:38 UTC

6

7840HS/780M for cheap 70B LLM Run

Hi all, I am looking for a cheap way to run big LLMs at a reasonable speed (to me, 3-5 tok/s is completely fine). Running 70B models (Llama 3.1 and Qwen 2.5) on llama.cpp with 4-bit quantization should be the limit for this. Recently I came across this video: https://www.youtube.com/watch?v=xyKEQjUzfAk in which he uses a Core Ultra 5 and 96GB of RAM, then allocates the RAM to the iGPU. The speed is somewhat okay to me.

I wonder if the 780M can achieve the same. I know the BIOS only lets you set UMA up to 16GB, but the Linux 6.10 kernel also adds support for unified memory. So my question is: if I get a mini PC with a 7840HS and dual-SODIMM DDR5 (2x48GB), could the 780M achieve reasonable performance (given that the AMD APU is considered more powerful)? Thank you!

10 Comments
2024/10/21
10:14 UTC

4

ROCm on 6800 XT

Hi,

Has anybody here managed to get ROCm to run on a 6800 XT? The documentation for 5.7 and 6.2 says it's only supported on Windows, which is hard to believe. I'm currently working with Ubuntu 20.04 LTS.

Would be nice if anybody could share their experiences with me :)

10 Comments
2024/10/20
12:48 UTC

1

SD on windows only produces a grey image

I used this video as my guide: https://www.youtube.com/watch?v=n8RhNoAenvM

Basically SD wouldn't utilize my GPU, so I followed this guide that involves ZLUDA.

Unfortunately, every single image takes around a second to render, but it's just a grey color no matter what model I use.

I apologize if I am not giving enough information; I am really new to image generation.

Here is my CMD output in case it'd be useful:

D:\SD\stable-diffusion-webui-directml>webui-user.bat

venv "D:\SD\stable-diffusion-webui-directml\venv\Scripts\Python.exe"

WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]

Version: v1.10.1-amd-11-gefddd05e

Commit hash: efddd05e11d9cc5339a41192457e6ff8ad06ae00

Using ZLUDA in D:\SD\stable-diffusion-webui-directml\.zluda

ROCm agents: ['gfx1100'], using gfx1100

no module 'xformers'. Processing without...

no module 'xformers'. Processing without...

No module 'xformers'. Proceeding without it.

D:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.

rank_zero_deprecation(

Launching Web UI with arguments: --use-zluda

ONNX failed to initialize: module 'optimum.onnxruntime.modeling_diffusion' has no attribute '_ORTDiffusionModelPart'

Calculating sha256 for D:\SD\stable-diffusion-webui-directml\models\Stable-diffusion\fwm - AdamW-000007.safetensors: e94bda46a7e446257fc5c57724c05c694671e0bd39b5fba4d6ce1351ff1beb6c

Loading weights [e94bda46a7] from D:\SD\stable-diffusion-webui-directml\models\Stable-diffusion\fwm - AdamW-000007.safetensors

Running on local URL: http://127.0.0.1...

Creating model from config: D:\SD\stable-diffusion-webui-directml\configs\v1-inference.yaml

To create a public link, set `share=True` in `launch()`.

Startup time: 7.0s (prepare environment: 9.4s, initialize shared: 1.1s, load scripts: 0.3s, create ui: 0.2s, gradio launch: 0.4s).

9 Comments
2024/10/16
21:21 UTC

6

6700XT ROCM over WSL2

Hello everyone, I've tried installing ROCm on WSL2, but when I run rocminfo I get this output:

ROCR: unsupported GPU

hsa api call failure at: ./sources/wsl/tools/rocminfo/rocminfo.cc:1087

Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

Is the 6700 XT just not supported over WSL at the moment? Should I switch to dual booting? Does anyone know if/when my GPU will be supported?

6 Comments
2024/10/16
10:27 UTC

3

ROCm, APU 680M and GTT memory on Arch

Hi!

I installed ROCm on a machine equipped with the 6900HX APU (with the 680M) under a freshly installed Arch Linux.

I put in 32GB of DDR5 in order to run heavier AI models, expecting to be able to get at least 16 GB of shared memory and avoid out-of-memory problems.

I modified the UMA Buffer Size setting in the BIOS to allocate 16GB at boot, and my whole system now shows 16GB of VRAM and 16GB of RAM... except for ROCm and the tools that use it, which report 8GB: when I use rocminfo, pools 1 and 2 of the GPU part are at 8GB. I also note in dmesg that amdgpu reports only 8GB of memory ready for GTT (but 16GB for VRAM). I tried to specify 16GB for GTT via amdgpu.gttsize in GRUB, without success. (I set HSA_OVERRIDE_GFX_VERSION=10.3.0.)

Is there a way to use the 16GB of VRAM through ROCm? Could the amdkfd driver help? (I see related news about this in Linux kernel 6.10, specifically for APUs with ROCm.)

Edit: I just saw that setting the BIOS UMA setting to "UMA_GAME_OPTIMIZED" lets the system use more of the memory for the APU, and... UMA_AUTO lets the system use it completely! (Which is what I expected to be able to do with UMA_SPECIFIED set to 16GB...)

16 Comments
2024/10/14
17:36 UTC

2

PyTorch can't compute a convolution layer on ROCm!

Hi there! I have been facing a weird problem and can't figure out what the cause might be. I am an RX 6600 (non-XT) user. Recently I have been using this GPU on my Arch-based Linux system for deep learning. I installed ROCm following this link:
https://gist.github.com/augustin-laurent/d29f026cdb53a4dff50a400c129d3ea7

Since the RX 6600 is not an officially ROCm-supported GPU, I did not expect it to work, but it worked well enough on the deep learning tasks I tried. It works fine for fully connected layers, but for some weird reason it just can't process any convolution layer, no matter how simple! What could be the reason? I have been trying to solve the issue for 2 days with no outcome. Hours pass, but it can't even process a simple convolutional model like this:
https://pastebin.com/kycUvN72

My system:
OS: EndeavourOS (Arch-based)
Processor: i7 10th gen
ROCm version: 6.0.3
torch version: 2.3.1
Python version: 3.12

Any help would be appreciated.

N.B.: The convolution code worked well on my CPU, so I don't think there is an error in the code. Also, non-convolution code like fully connected layers or large matrix multiplications worked just fine on my GPU! A minimal sketch of the symptom is below.
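For clarity, the failure mode boils down to something like this (a hypothetical minimal sketch, not the pastebin code; on ROCm builds of PyTorch the GPU is addressed through the cuda device API):

# Hypothetical minimal repro of the symptom: the linear layer completes,
# while even a tiny convolution never returns on this GPU.
import torch

lin = torch.nn.Linear(64, 64).cuda()
print(lin(torch.randn(8, 64, device="cuda")).shape)   # works

conv = torch.nn.Conv2d(3, 16, 3).cuda()
print(conv(torch.randn(1, 3, 32, 32, device="cuda")).shape)  # reportedly hangs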

13 Comments
2024/10/14
15:43 UTC

5

Help with installation

Hello

I'm trying to use my AMD 6950 XT for PyTorch DL tasks, but I am really struggling to install it on Windows. I also tried WSL, but I fail in the installation process. I had given up until I found this subreddit. Can anyone give tips on how I can install everything correctly?

12 Comments
2024/10/02
13:06 UTC

67

AMD ROCm works great with PyTorch

There is a lot of suspicion and hesitation around whether AMD GPUs are good/easy/robust enough to train full-scale AI models.

We recently got an AMD server with 8x MI100 chips and tested our codebase (including non-trivial home-designed attention modules, different from standard layouts). AMD ROCm holds up better than expected: no code changes were needed and everything "just ran" out of the box, including DDP runs on all 8 GPUs with torchrun.
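To make "just ran" concrete, this is the standard torchrun DDP pattern that worked unchanged (a hypothetical sketch with a toy model; on ROCm the "nccl" backend is backed by RCCL):

# Launch with: torchrun --nproc_per_node=8 ddp_check.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")          # RCCL under ROCm
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)              # AMD GPUs appear via the cuda API

model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])
out = model(torch.randn(32, 1024, device=f"cuda:{rank}"))
out.sum().backward()                     # gradients all-reduced across ranks
dist.destroy_process_group()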

The MI100 speed is comparable to V100. We will test the code on the MI300X chips.

But overall, AMD ROCm looks to have made it: a painless, much more cost-effective replacement for NVIDIA GPUs.

13 Comments
2024/10/01
16:53 UTC

29

September 2024 Update: AMD GPU (mostly RDNA3) AI/LLM Notes

3 Comments
2024/09/30
11:07 UTC

2

Error launching kernel: invalid device function [AMD Radeon RX 5700 XT]

Here is some general information about my system. I've just installed ROCm using the native guide for Ubuntu 24.04.

Number of HIP devices: 1
Device 0: AMD Radeon RX 5700 XT
Total Global Memory: 8176 MB
Shared Memory per Block: 64 KB
Registers per Block: 65536
Warp Size: 32
Max Threads per Block: 1024

When I run this simple program:

#include <iostream>
#include <hip/hip_runtime.h>

#define N 1024  // Size of the arrays

// Kernel function to sum two arrays
__global__ void sumArrays(int* a, int* b, int* c, int size) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < size) {
        c[tid] = a[tid] + b[tid];
    }
}


int main() {
    int h_a[N], h_b[N], h_c[N];
    int *d_a, *d_b, *d_c;

    // Initialize the input arrays
    for (int i = 0; i < N; ++i) {
        h_a[i] = i;
        h_b[i] = 0;
        h_c[i] = 0;
    }

    // Allocate device memory
    hipError_t err;
    err = hipMalloc(&d_a, N * sizeof(int));
    if (err != hipSuccess) {
        std::cerr << "Error allocating memory for d_a: " << hipGetErrorString(err) << std::endl;
        return 1;
    }
    err = hipMalloc(&d_b, N * sizeof(int));
    if (err != hipSuccess) {
        std::cerr << "Error allocating memory for d_b: " << hipGetErrorString(err) << std::endl;
        return 1;
    }
    err = hipMalloc(&d_c, N * sizeof(int));
    if (err != hipSuccess) {
        std::cerr << "Error allocating memory for d_c: " << hipGetErrorString(err) << std::endl;
        return 1;
    }

    // Copy input data to device
    err = hipMemcpy(d_a, h_a, N * sizeof(int), hipMemcpyHostToDevice);
    if (err != hipSuccess) {
        std::cerr << "Error copying memory to d_a: " << hipGetErrorString(err) << std::endl;
        return 1;
    }
    err = hipMemcpy(d_b, h_b, N * sizeof(int), hipMemcpyHostToDevice);
    if (err != hipSuccess) {
        std::cerr << "Error copying memory to d_b: " << hipGetErrorString(err) << std::endl;
        return 1;
    }
    err = hipGetLastError();
    if (err != hipSuccess) {
        std::cerr << "Error launching kernel 1: " << hipGetErrorString(err) << std::endl;
        return 1;
    }

    // Launch the kernel
    int blockSize = 256;
    int gridSize = (N + blockSize - 1) / blockSize;
    hipLaunchKernelGGL(sumArrays, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, N);

    // Check for any errors during kernel launch
    err = hipGetLastError();
    if (err != hipSuccess) {
        std::cerr << "Error launching kernel: " << hipGetErrorString(err) << std::endl;
        return 1;
    }

    // Copy the result back to the host
    err = hipMemcpy(h_c, d_c, N * sizeof(int), hipMemcpyDeviceToHost);
    if (err != hipSuccess) {
        std::cerr << "Error copying memory from d_c: " << hipGetErrorString(err) << std::endl;
        return 1;
    }

    // Print the result
    std::cout << "Result of array sum:\n";
    for (int i = 0; i < 10; ++i) {  // Print first 10 elements for brevity
        std::cout << "c[" << i << "] = " << h_c[i] << std::endl;
    }

    // Free device memory
    hipFree(d_a);
    hipFree(d_b);
    hipFree(d_c);

    return 0;
}

I just get

me@ubuntu:~$ hipcc sum_array.cpp -o sum_array --amdgpu-target=gfx1010
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
sum_array.cpp:87:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   87 |     hipFree(d_a);
      |     ^~~~~~~ ~~~
sum_array.cpp:88:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   88 |     hipFree(d_b);
      |     ^~~~~~~ ~~~
sum_array.cpp:89:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   89 |     hipFree(d_c);
      |     ^~~~~~~ ~~~
3 warnings generated when compiling for gfx1010.
sum_array.cpp:87:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   87 |     hipFree(d_a);
      |     ^~~~~~~ ~~~
sum_array.cpp:88:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   88 |     hipFree(d_b);
      |     ^~~~~~~ ~~~
sum_array.cpp:89:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   89 |     hipFree(d_c);
      |     ^~~~~~~ ~~~
3 warnings generated when compiling for host.
me@ubuntu:~$ ./sum_array
Error launching kernel: invalid device function
5 Comments
2024/09/28
14:24 UTC

7

ROCm Support on Radeon RX-580

I'm using a Radeon RX 580 video card on Windows 11 (64-bit) with Ubuntu 22.04.5 LTS (kernel 5.15.153) running in a WSL2 container. I apologize for any stupid questions, but can I get ROCm to work on my machine? I've heard that the latest ROCm might not work, but maybe I need to install an older version? I want to start dabbling with AI, ML, LLMs, etc., and can't justify buying a new video card just yet.

Can you please share the exact steps to get it working so that I can use my GPU? TY

10 Comments
2024/09/28
01:24 UTC

4

rocm-smi -b

Does rocm-smi -b work only for some GPUs? I am trying to get the estimated PCIe bandwidth utilization with a Radeon Pro W7700 (ROCm 6.2.1) or a W5700 (ROCm 5.2.1), and it always reports zero.

1 Comment
2024/09/27
11:42 UTC

0

Looking for honest GPU suggestion

I'm a computer science bachelor's student.

I have two good deals: a 7900 XT (540€) and a 7900 XTX (740€).

However, I'm really unsure whether I can work through all of this to fully leverage the GPUs for ML.

I have a bachelor's thesis model in PyTorch Lightning that I want to run on it, but I'm not sure if AMD is currently a viable option for me.

The NVIDIA option would be the RTX 4070 Super (the Ti upgrade is not worth the 250 bucks to me).

Should I grab the AMD deal, or is it better to play it safe right now? What do I have to consider?

17 Comments
2024/09/25
17:04 UTC
