/r/q?req.query.q -- Subreddit Search

2,298 Subscribers

Current - POV

0 Comments

2025/02/02
00:41 UTC

Configure a multi-node vLLM inference cluster or No?

1 Comment

2025/02/01
20:16 UTC

Issues with torchaudio and whisperx

Hi,

I have been using a base Docker image on a 7900 XTX with WSL:

FROM rocm/pytorch:rocm6.3.1_ubuntu22.04_py3.10_pytorch

RUN useradd -m -s /bin/bash jupyter_user && \
    mkdir -p /workspace/node_modules && \
    chown -R jupyter_user:jupyter_user /workspace && \
    chmod -R 755 /workspace && \
    apt-get update && \
    apt-get install -y \
    ffmpeg \
    git \
    curl \
    unzip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

CMD ["/bin/bash"]

This setup works, and I can confirm it with:

import torch
# ROCm builds of PyTorch expose the GPU through the torch.cuda API
print(torch.cuda.is_available())

However, as soon as I install torchaudio, it seems to start downloading a new version of torch, which messes things up.

I found this page but I'm unsure which .whl file to try: https://download.pytorch.org/whl/torchaudio/

Also, WhisperX seems to have other issues on ROCm: https://github.com/m-bain/whisperX/issues/566

Can anyone clarify which popular libraries like this still don't work properly on ROCm?
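
One way to see whether installing torchaudio replaced the ROCm build of torch is to compare the build tag before and after the install. The sketch below is only an illustration, not a fix: `classify_build` and `describe_torch_build` are hypothetical helper names, and it assumes the usual convention that ROCm wheels carry a `+rocm...` local version tag and a non-empty `torch.version.hip`.

```python
import importlib


def classify_build(version, hip_version=None, cuda_version=None):
    """Guess the backend of a torch build from its version metadata."""
    if hip_version or "+rocm" in version:
        return "rocm"
    if cuda_version or "+cu" in version:
        return "cuda"
    return "cpu"


def describe_torch_build():
    """Return (version, backend) for the installed torch, or None if torch is missing."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return torch.__version__, classify_build(
        torch.__version__,
        getattr(torch.version, "hip", None),   # set on ROCm builds
        getattr(torch.version, "cuda", None),  # set on CUDA builds
    )


if __name__ == "__main__":
    # Run once before and once after `pip install torchaudio` and compare.
    print(describe_torch_build())
```

If the backend changes from "rocm" to "cuda" or "cpu" after the install, pip has swapped the torch wheel.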

5 Comments

2025/02/01
05:46 UTC

My W7900 only showing 45 GB VRAM

Is that expected, or the industry standard? The AMD website says "up to 48GB", although the packaging says 48GB.

Or is it only my card?

Or is there some firmware I can use to get 48GB back? Someone reported having 48GB just before they upgraded something!

Edit: I just needed to deactivate ECC through the Radeon Software control panel. LLM tokens per second are now 30% higher, model loading no longer hangs for a minute, and the GPU temperature seems to be 5 degrees cooler.
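
The gap described in this post, which the poster traced to ECC, is easy to quantify. A minimal sketch, assuming the reported and advertised sizes are in the same unit (`missing_vram` is a hypothetical helper, and the figures are the ones from the post):

```python
def missing_vram(reported, advertised):
    """Return (missing capacity, missing share of the advertised capacity in percent)."""
    missing = advertised - reported
    return missing, 100.0 * missing / advertised


missing, percent = missing_vram(45, 48)
print(f"{missing} GB missing ({percent:.2f}% of advertised)")
# prints "3 GB missing (6.25% of advertised)"
```

To read the value the framework sees rather than the one on the box, `torch.cuda.get_device_properties(0).total_memory` returns the total in bytes on a working ROCm build of PyTorch.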

12 Comments

2025/01/31
15:05 UTC

Announcing the AMD GPU Operator and Metrics Exporter

https://rocm.blogs.amd.com/software-tools-optimization/gpu-operator-exporter/README.html

0 Comments

2025/01/30
01:49 UTC

resources for learning rocm?

hello! I honestly don't know too much about rocm and hip but want to learn. I was wondering if there were any resources out there like "Programming Massively Parallel Processors" but for AMD gpus (covering architecture specifics, etc.). Also, how could I test out rocm? Would buying an Mi25 or Mi50 be a good idea, or are there free cloud resources? ty in advance!

14 Comments

2025/01/29
22:43 UTC

Best workflow for AI on Windows

I am thinking about using WSL2 with Docker containers I get from Hugging Face Spaces. Should things work fine?

That was my workflow even with a 4090, and it does basically everything. For my dev work I just mount my current directory into any Docker container I want to customize.

Any suggestions, or other workflows you've been happy with?
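
The mount-the-current-directory workflow described above can be sketched as a small launcher. This is a sketch under assumptions: `docker_run_argv` is a hypothetical helper, the image name is a placeholder, and GPU passthrough options are deliberately left out because they differ between native Linux and WSL2 setups.

```python
from pathlib import Path


def docker_run_argv(image, workdir=None, mount_point="/workspace"):
    """Build a `docker run` command that bind-mounts a host directory into the container."""
    host_dir = (Path(workdir) if workdir is not None else Path.cwd()).resolve()
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_dir}:{mount_point}",  # bind mount: host path -> container path
        "-w", mount_point,                   # start the shell in the mounted directory
        image,
    ]


if __name__ == "__main__":
    # "my-image:latest" is a placeholder, not a real image name.
    print(" ".join(docker_run_argv("my-image:latest")))
```

Printing the command first, instead of running it directly, makes it easy to add the GPU options that a given setup needs before starting the container.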

8 Comments

2025/01/27
11:50 UTC

ROCM 6.2 WSL2 seems not caching the model

Total VRAM 24492 MB, total RAM 32046 MB

pytorch version: 2.6.0.dev20241122+rocm6.2

Set vram state to: NORMAL_VRAM

Device: cuda:0 AMD Radeon RX 7900 XTX : native

Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention

Every time a different model is loaded (Flux, Florence, SDXL, Ollama models), the node takes a very long time to load. It appears that ROCm is rebuilding the cache for the model, even though it was already built earlier in the same session.

Sticking with the same model causes no issue; it is fast and responsive.

Does anyone have any idea what causes this?

ZLUDA on Windows doesn't have this problem: once a model is loaded, it stays fast and responsive, even across different sessions.

1 Comment

2025/01/27
03:26 UTC

Follow up on ROCm feedback thread

A few days ago I made a post asking for feedback on how to improve ROCm here:

https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback_for_amd/

I took all the comments and fed them to ChatGPT (lol) to organize them into coherent feedback, which you can see here:

https://docs.google.com/document/d/17IDQ6rlJqel6uLDoleTGwzZLYOm1h16Y4hM5P5_PRR4/edit?usp=sharing

I sent this to AMD and can confirm that they have seen it.

If I missed anything, please feel free to leave a comment below and I'll add it to the feedback doc.

20 Comments

2025/01/24
18:47 UTC

The importance of initializing array values : by example

0 Comments

2025/01/24
01:06 UTC

Upgraded!

4 Comments

2025/01/23
19:19 UTC

Anyone who got 6600M working with rocm?

Hi, I have a 6600M (Navi23, RDNA2) card and I'm struggling to get ROCm working for Stable Diffusion. I tried both ZLUDA and Ubuntu, but both resulted in many errors. Has anyone got it working (Windows or Linux)? If so, with which ROCm version? Thanks a lot.

16 Comments

2025/01/23
05:14 UTC

AMD GPU on Ubuntu: Environment question

Hi Everyone,

For the better part of a week I've been trying to get an old Ubuntu installation I had in an Intel NUC to work on a desktop PC by just swapping over the drive... It has not been a smooth experience.

I'm at the point where I can start up the system, use the desktop environment normally and connect to the Wi-Fi. None of this worked just after swapping the SSD over.

My system has a Ryzen 7 5800X CPU, 32GB RAM and AMD's own 6700XT. Ubuntu is installed on a separate drive from Windows. Fast Boot and Secure Boot are disabled. I want to use it with ROCm and both TensorFlow and PyTorch to classify my data (pictures, about 16,000,000) into 30 main classes; each class will then be subdivided into smaller subclasses (from ten to about 60 for the largest main class).

At this point I don't even manage to make my system detect the GPU in there - which is weird because the CPU does not have integrated graphics, yet I have a GUI to work in. Installing amdgpu via sudo apt install amdgpu results in an Error I can't get my head round.

I'll just start over with a clean install of some Linux distro, and I'd like to start off a tried and tested system rather than an unproven base, so I'm asking some of the ROCm veterans for advice. My goal is to install all of this on bare metal, so preferably no Docker involved.

- Which version of Linux is recommended? I often see Ubuntu 20.04 LTS and 22.04 LTS. Is there any reason to pick these over 24.04, especially since the ROCm website doesn't list 20.04 any more?
- Does the kernel version matter?
- Which version of ROCm? I tried (and failed) to install the most recent version, yet that doesn't seem to work for everyone, and ROCm 5.7 is advised (https://www.reddit.com/r/ROCm/comments/1gu5h7v/comment/lxwknoh/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button).
- Which Python version do you use? The default 3.12 that came with my version of Ubuntu does not seem to like ROCm's version of TensorFlow, so I downgraded to 3.11. Was I right, or is there a way of making 3.12 work?
- Did you install the .deb driver from AMD's website for the GPU? I've encountered mixed advice on this.
- Finally: could someone clarify the difference between the normal tensorflow and tensorflow-rocm; and a likewise explanation for Pytorch?

To anyone willing to help, my sincere thanks!
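
Several of the questions above (distro, kernel, Python version, which TensorFlow or PyTorch package is installed) are easier to answer when the exact environment is attached to the post. A minimal sketch using only the standard library; `installed_versions` and `environment_report` are hypothetical helper names, and the package list is an assumption that can be adjusted:

```python
import platform
from importlib import metadata

# Distribution names as published on PyPI; edit the tuple to taste.
PACKAGES = ("torch", "torchvision", "torchaudio", "tensorflow", "tensorflow-rocm")


def installed_versions(names=PACKAGES):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report():
    """Collect the details that usually matter when asking for ROCm help."""
    return {
        "python": platform.python_version(),
        "kernel": platform.release(),
        "os": platform.platform(),
        "packages": installed_versions(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The report also shows at a glance whether `tensorflow`, `tensorflow-rocm`, or both are installed in the same environment.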

26 Comments

2025/01/21
17:36 UTC

Status of current testing for AMD Instinct Mi60 AI Servers

0 Comments

2025/01/20
15:04 UTC

ROCM Feedback for AMD

Ask: Please share a list of your complaints about ROCM

Give: I will compile a list and send it to AMD to get the bugs fixed / improvements actioned

Context: AMD seems to finally be serious about getting its act together re: ROCM. If you've been following the drama on Twitter the TL;DR is that a research shop called Semi Analysis tore apart ROCM in a widely shared report. This got AMD's CEO Lisa Su to visit Semi Analysis with her top execs. She then tasked one of these execs Anush Elangovan (who was previously founder at nod.ai that got acquired by AMD) to fix ROCM. Drama here:

https://x.com/AnushElangovan/status/1880873827917545824

He seems to be pretty serious about it so now is our chance. I can send him a google doc with all feedback / requests.

126 Comments

2025/01/19
21:57 UTC

UDNA, any insight as to how the ROCm roadmap will adapt?

Not sure there is enough information out there, at least none I'm aware of. What do some of you think the complications of having a unified stack will be for the ROCm libraries, and for merging projects that are optimized for AMD hardware running ROCm, when newer hardware shifts away from the RDNA and CDNA base architectures? Do you think the API domain calls will be able to persist and make moving code to the latest UDNA hardware a non-issue?

4 Comments

2025/01/17
18:00 UTC

405B + Ollama vs vLLM + 6x AMD Instinct Mi60 AI Server

5 Comments

2025/01/14
01:44 UTC

Testing vLLM with Open-WebUI - Llama 3 Tulu 70B - 4x AMD Instinct Mi60 Rig - 25 toks/s!

1 Comment

2025/01/14
00:01 UTC

Is AMD starting to bridge the CUDA moat?

As many of you know a research shop called Semi Analysis skewered AMD and shamed them for basically leaving ROCM

https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-benchmark-part-1-training/

Since that blog post, AMD's CEO Lisa Su met with Semianalysis and it seems that they are fully committed to improving ROCM.

They then published this:
https://www.amd.com/en/developer/resources/technical-articles/vllm-x-amd-highly-efficient-llm-inference-on-amd-instinct-mi300x-gpus-part1.html

(This is part 1 of a 4 part series, links to the other parts are in that link)

Has AMD finally woken up / are you guys seeing any other evidence of ROCM improvements vs CUDA?

31 Comments

2025/01/13
18:06 UTC