/r/HPC
Multicore, cluster, and high-performance computing news, articles and tools.
"Anyone can build a fast CPU. The trick is to build a fast system." - Seymour Cray
I've been trying to figure out the best solution for this and ended up here, so apologies if this isn't particularly on brand; if so, I'm open to other pages or links for info.
I have a hobby of having hobbies, and I end up lacking compute power for things like 3D modeling, physics simulations, game development, and running local LLMs, to name a few daily tasks. I'm not doing these at insane scale or for any business; however, I don't have thousands to shell out on super-high-end parts that can accommodate everything in one system.
My ideal goal is one host with a decent CPU and GPU for fast processing, which also manages and schedules between maybe 2-4 nodes. The nodes would be used for the "heavy lifting" with more cores, RAM, and VRAM, preferably in a smart-ish system that prioritizes currently focused applications.
From what I've read so far, I may need to settle for a system that schedules everything equally all the time, which isn't bad as long as I can still accomplish the main goal of having access to large compute strength. OpenPBS sounds like a possible option paired with Ubuntu; I was also looking into something like DragonFly BSD or CentOS, but I have zero experience with either.
Something that may or may not cause issues is that almost all of my programs and software only run on windows. A vm running windows on the host or access node is more than doable, but I'm not sure if there are any issues using a vm to access the full range of a cluster.
Edit: forgot to actually state what I'm asking. I'm looking for advice, tips, or just criticism of my idea; I don't have any solid requirements for hardware or OS yet, so recommendations are more than welcome. I was looking at possibly using a small 4-node blade server if I can find one with slots for some workstation GPUs, like a Tesla K80, for high VRAM at a cheap price.
Again sorry if this isn't the place for this or if some of these questions are basic knowledge, it took many hours of searching to even get to this point so please point me along if I'm not at the end of my search.
GPFS Optimizations
We are using GPFS (I am a user, not an admin) and we have a specific use case of reading read-only files over and over again. I was wondering whether using the C API directly (gpfs_read, etc.) can optimize this specific use case? I can't seem to find performance numbers for reading data with or without the C API.
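For what it's worth, much of the benefit for repeated read-only access comes from prefetch/caching hints rather than the read call itself. Here is a minimal POSIX sketch of that idea; posix_fadvise is standard POSIX and only an analogue of GPFS's own access-range hints (gpfs_fcntl), and the function name below is my own, not a GPFS call:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read the whole file 'passes' times; returns total bytes read, -1 on error.
 * read_repeatedly() is an illustrative name, not part of any GPFS API. */
long read_repeatedly(const char *path, int passes) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    /* Hint the kernel that we read sequentially and will want the data
     * again, so it prefetches aggressively and keeps pages cached. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    char buf[4096];
    long total = 0;
    for (int p = 0; p < passes; p++) {
        lseek(fd, 0, SEEK_SET);       /* rewind for each pass */
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            total += n;
    }
    close(fd);
    return total;
}
```

After the first pass the data should come from the page cache, which is the same effect the GPFS hints aim for on the cluster side.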
Hello all,
I am currently writing a C++ program using MPI and OpenCV, and I am having trouble executing the program.
When I build and run it from CLion, it seems to work fine.
However, when I compile it with cmake . && make and run it with mpirun, I am unable to execute the code. It gives me no output at all, or a path error.
Any advice would be much appreciated.
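In case the difference between the CLion build and the manual one comes down to how MPI and OpenCV get linked, here is a minimal CMakeLists.txt sketch for an MPI + OpenCV program; the target and source names are placeholders for your own:

```cmake
cmake_minimum_required(VERSION 3.10)
project(mpi_opencv_demo CXX)

# Locate both dependencies; fail loudly if either is missing.
find_package(MPI REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(demo main.cpp)
target_link_libraries(demo PRIVATE MPI::MPI_CXX ${OpenCV_LIBS})
target_include_directories(demo PRIVATE ${OpenCV_INCLUDE_DIRS})
```

If this builds but mpirun still fails with a path error, compare the working directory and library paths CLion injects against your shell environment.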
I am newly administering this platform, and I am now going to use AD as the authentication source.
For SSH, I can use the SSSD + LDAP combination to let users log in, and everything is smooth.
For JupyterHub, it seems BrightCM customized the environment so that it can only authenticate against CMDaemon, which is an internal LDAP.
I would like to ask whether anybody has had experience making JupyterHub in BrightCM authenticate against AD. Thank you.
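For anyone landing here: the usual non-Bright route is to point JupyterHub's own authenticator at AD via the ldapauthenticator plugin, bypassing CMDaemon. A jupyterhub_config.py sketch, where the hostname and DN template are placeholders (whether Bright lets you override its generated config is a separate question):

```python
# jupyterhub_config.py (sketch) - authenticate against AD over LDAPS.
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.server_address = 'ad.example.com'   # placeholder
c.LDAPAuthenticator.use_ssl = True
c.LDAPAuthenticator.bind_dn_template = [
    'CN={username},OU=Users,DC=example,DC=com',          # placeholder DN
]
```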
I’m hoping to find project ideas to build skills and show what I know to apply myself to a future HPC role.
TL;DR about the role: mainly troubleshooting clusters, bash, using SLURM, K8s administration, and other kinds of automation to help with daily tasks.
Sorry to be vague, but I cannot find much online other than the listed job for the information I would like, as each “HPC Engineer” role is HIGHLY varied, haha.
An HPC service provider requires changing users' home directories from /home/{user} to /home/{primarygroup}/{user} if we want to upgrade the admin platform.
It seems very rare to me to see user homes in such a pattern. What are the pros and cons of managing home directories this way?
I have both DMTCP and SLURM installed on Ubuntu 18.04 on a small 2-node cluster. I'm planning on running some MPI applications and checkpointing them, but I don't know how to run DMTCP via SLURM.
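The usual pattern is to start one coordinator and then wrap the MPI launch in dmtcp_launch. A sketch of an sbatch script; the application name, checkpoint interval, and node counts are placeholders, and the coordinator flags should be double-checked against your DMTCP version:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=4

# Start one coordinator; it brokers checkpoint requests for all ranks.
dmtcp_coordinator --daemon --port 0 --port-file coord.port
PORT=$(cat coord.port)

# Launch each MPI rank under DMTCP, checkpointing every 300 seconds.
srun dmtcp_launch --coord-port "$PORT" -i 300 ./my_mpi_app

# A dmtcp_restart_script.sh is written at checkpoint time for resuming.
```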
Hi, I'm looking to get into HPC, but I have no idea what the interview process looks like. Is it like SWE interviews where they ask leetcode problems? Or is it mostly on domain knowledge?
Clarification:
I want to be an HPC software engineer (Not sure if this is the correct term). (Accelerating/Optimizing scientific computing or AI/ML training)
I'm in the process of buying 3 r760 dual CPU machines.
I want to connect them together with InfiniBand in a switchless configuration and need some guidance.
Based on poking around, it seems easiest to have a dual-port adapter and connect each host to the other two, then set up a subnet with static routing. Someone else will be helping with this part.
I guess my main question is affordable hardware (<$5k) to accomplish this that will provide good performance for distributed memory computations.
I cannot buy used/older gear. Adapters/cables must be available for purchase brand new from reputable vendors.
The R760 has OCP 3.0, but Dell does not appear to offer an InfiniBand card for it. Is the OCP 3.0 socket beneficial over using PCIe?
Since these systems are dual-socket, is there a performance hit from using a single card to communicate with both CPUs? (The PCIe slot belongs to a particular socket?)
It looks like Nvidia had some newer options for host chaining when I was poking around.
Is getting a single port card with a splitter cable a better option than a dual port?
What would you all suggest?
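One operational detail worth planning for: with no switch there is no embedded subnet manager, and each directly cabled host-to-host link is its own IB subnet, so each link needs its own opensm instance on one of its endpoints. A sketch; the GUIDs below are made-up examples, read the real ones from ibstat:

```shell
# List the Port GUIDs of each connected HCA port on this host.
ibstat

# Start one subnet manager per link, bound to the local port's GUID
# (-B daemonizes, -g selects the port; GUIDs here are examples only).
opensm -B -g 0xdeadbeef00000001
opensm -B -g 0xdeadbeef00000002
```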
Hi folks,
I am very new to the HPC environment and all server-related subjects.
Now I am trying to set up a SLURM cluster on my machines, along with some file systems.
I am trying to run multiple jobs from multiple clients, and each job does a lot of read/write operations.
I've read several articles from the communities and heard about BeeGFS, but when tested with fio randwrite it is way slower than the NFS mount point.
Hence I am now looking for something else for the FS. Can you recommend any others?
(PS: I am trying to run Synopsys VCS regression tests on this cluster.)
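For anyone reproducing the comparison, it's worth pinning down the fio parameters, since buffered vs. direct I/O and the number of concurrent jobs can change the ranking a lot. A sketch of a small-random-write test; point --directory at the BeeGFS or NFS mount under test:

```shell
# Many concurrent small random writes, bypassing the client page cache.
fio --name=randwrite --directory=/mnt/beegfs \
    --rw=randwrite --bs=4k --size=1g --numjobs=8 \
    --ioengine=libaio --direct=1 --iodepth=16 --group_reporting
```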
I'm pleased to announce that our work on the Flux Framework operator is published in F1000Research! This is an example of converged computing and was (continues to be) a joy to collaborate with Aldo and Antonio (Google batch/networking teams, respectively). https://doi.org/10.12688/f1000research.147989.1. I hope to do (and inspire others to do) work like this more often! <3
Except for the repository, I can't find anything about it.
https://github.com/NVIDIA/aistore/tree/main
https://aiatscale.org/
Skimming through the docs, it seems rather feature-complete, more flexible than MinIO, with more potential for performance; it's backed by a big corp and is open source with no strings attached.
So it seems like a very good candidate, and I am surprised I can't find any feedback on it on Google.
I thought I'd share in case this is a helpful resource for someone interested in learning about high performance computing for quantitative finance applications. It includes an introduction to high performance computing, a reference to a guide I co-wrote on configuring a small Slurm cluster, and a Python script template with tested examples for implementing Monte Carlo option pricing programs on Slurm clusters.
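To give a flavor of the kind of program the template covers, here is a minimal Monte Carlo European call pricer of my own (not taken from the guide); on a cluster, each Slurm array task would run a slice of the paths with its own seed and the results would be averaged:

```python
import math
import random

def mc_call_price(S0, K, r, sigma, T, n_paths, seed=0):
    """Price a European call by simulating terminal prices under
    geometric Brownian motion and discounting the mean payoff."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)              # standard normal draw
        ST = S0 * math.exp(drift + vol * z)  # terminal asset price
        payoff_sum += max(ST - K, 0.0)       # call payoff
    return math.exp(-r * T) * payoff_sum / n_paths
```

In a Slurm array job the seed would typically come from SLURM_ARRAY_TASK_ID so that tasks draw independent paths.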
We are looking for a PRO build for some calculations and model training using the GPU. We are aiming for a budget build with 1TB of RAM. Can you give your thoughts on the setup below? Thanks.
Asus Pro WS WRX80E-SAGE SE WIFI II
Seasonic Prime Tx-1600 80+ Titanium Psu
Ryzen ThreadRipper PRO 5975WX
Crucial Micron 128GB 3200MHz CL22 DDR4 SDRAM DIMM 288-pin, x8
Fractal Design Define 7 XL Light Glass
Kingston FURY Renegade SSD 1000GB M.2 2280 PCI Express 4.0 x4 (NVMe)
ASUS GeForce RTX 4090 TUF Gaming OC 24GB
Seagate Exos X20 20TB 3.5" 7200rpm SATA-600
Kingston DC600M SSD 7680GB 2.5" SATA-600
Corsair ICUE H150I Elite Capellix XT
What do I need in order to install 4x T4s into my R740xd? I don't need power cables since they are 70 W each, right? Would I only need the risers, and if so, which of these risers do I need? Dell keeps redirecting me to their installation kit, which is a pain in the ass to buy and still comes with too many extras. Are those extras needed?
I'm creating a SLURM cluster with an MPICH/DMTCP configuration. What should the installation order be?
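There isn't a hard dependency forcing one order, but a sequence that avoids rebuilds is MPICH, then DMTCP, then SLURM. A sketch; the versions and prefixes are illustrative, and the configure flags should be checked against each project's INSTALL notes:

```shell
# 1. MPICH first, so later builds and your apps can find mpicc/mpicxx.
cd mpich-4.1 && ./configure --prefix=/opt/mpich && make -j && make install
# 2. DMTCP next, built with the same compilers on every node.
cd ../dmtcp-3.0 && ./configure --prefix=/opt/dmtcp && make -j && make install
# 3. SLURM last; it is independent of the other two, but doing it last lets
#    you bake the MPICH/DMTCP paths into your prolog and job templates.
```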
I know it's nice for educational purposes, but is there ever a practical reason to build it for performance? Or is going a bit bigger on the CPU always worth it?
Given a bash script named test.sh
module load cuda/11.6
env
If I run it on the host system with bash test.sh, everything is fine.
But if I run it in a Singularity container:
singularity exec rocky8.sif bash -l test.sh
then it reports that module is not found.
But the output shows that the function exists:
BASH_FUNC_module()=() { local _mlredir=1;
if [ -n "${MODULES_REDIRECT_OUTPUT+x}" ]; then
if [ "$MODULES_REDIRECT_OUTPUT" = '0' ]; then
_mlredir=0;
else
if [ "$MODULES_REDIRECT_OUTPUT" = '1' ]; then
_mlredir=1;
fi;
fi;
fi;
case " $@ " in
*' --no-redirect '*)
_mlredir=0
;;
*' --redirect '*)
_mlredir=1
;;
esac;
if [ $_mlredir -eq 0 ]; then
_module_raw "$@";
else
_module_raw "$@" 2>&1;
fi
}
How to fix this?
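A likely cause: bash -l carries the exported module function over from the host, but the helper it calls (_module_raw, visible in the function body above) is not defined inside the container. Re-initializing Environment Modules inside the container usually fixes it; the init script path below is typical but may differ in your image:

```shell
# Initialize modules inside the container before running the script.
singularity exec rocky8.sif bash -c \
  'source /etc/profile.d/modules.sh && bash test.sh'
```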
Hi, there was a deadline for IHPCSS applications on 31st January. I applied for the first time ever; does anyone know if they send rejection emails? On the application they said it would take a month or so, and it's been a month and a half, so I don't know if I've been rejected or am just impatient.
Thanks in advance!
Hi Experts,
I am new to the HPC world and I want to learn more about it.
Is there a training course or some content that can help me understand, visualize and practice HPC?
Tried searching Udemy but that didn't help much.
Hi.
Our current cluster has multiple partitions, mainly to separate between long and short jobs.
I'm starting to see more and more clusters that have only one partition and manage their nodes via QOS only. Often I see a "long" and a "short" QOS which restrict jobs to specific nodes.
What is the benefit of using QOS here?
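For context, the QOS-based version of a long/short split looks roughly like this. The limits and priorities are made-up examples; note that QOS definitions live in the Slurm accounting database rather than slurm.conf, which is part of the appeal (no daemon reconfigure to change them):

```shell
# Create two QOSes with different wall-time caps and priorities.
sacctmgr add qos short
sacctmgr modify qos short set MaxWall=04:00:00 Priority=100
sacctmgr add qos long
sacctmgr modify qos long set MaxWall=7-00:00:00 Priority=10

# Users then pick one at submit time:
sbatch --qos=short job.sh
```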
I am reading through GitHub repos of CUDA code, just whatever comes first or some common tools I use.
I am noticing there are 2 distinct dialects (I think, IDK, I'm no expert). The AI people do a lot of metaprogramming and use common libraries; this makes their code, even inside kernels, very C++-ish.
In contrast, the physics simulations look like plain C with some fancy syntax for kernel launching, and most of the surrounding code is C or C-like C++.
Is this something you have noticed? Is this a thing that transcends CUDA, or is it specific to that language?
Right now I am stuck not being able to compile on my machine (not the question here); I will probably find a solution, but I would never know whether this is an issue on other platforms.
Hello,
I’m currently studying computer science and mathematics. Next year I’ll have to choose a master’s degree, and I heard about HPC. What I really enjoy is developing performant software using pretty low-level programming languages like C or Rust, and optimizing algorithms. I would also really like to fight against the environmental crisis we’re facing nowadays, and I’ve found that maybe with HPC I could combine the two: developing performant software for researchers in meteorology, climatology, ecosystem simulations, and so on. I would also like to work in public research. Do you think HPC is what I’m looking for? Are HPC engineers in demand in European public research? Does anybody here do this? Do you know what the best HPC master’s degrees in Europe are?
Thanks in advance for your answers
In our environment, we have a large number of queues and it's difficult to manage them all. This includes queues that are no longer used.
So, we need to do some housekeeping and remove the queues that are no longer in use. Is there any way I can find out when a job last ran on each queue in LSF?
I've tried fetching data from RTM, but it's tedious to go through each queue and manually scroll/sort for them. It would be much easier to fetch this through a script.
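One scriptable approach is to walk the queue list and ask the job-history tools for each queue's most recent job. A rough sketch; verify the bhist flags against your LSF version, and note that -n 0 asks it to search all lsb.events files, which can be slow on a busy cluster:

```shell
# For each queue, print the last line of its job history (or a marker).
for q in $(bqueues -w | awk 'NR > 1 {print $1}'); do
    last=$(bhist -a -q "$q" -n 0 2>/dev/null | tail -n 1)
    echo "$q : ${last:-no jobs found}"
done
```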
I have built my Discrete Element Method (DEM) code for simulation of granular systems in C++. As the simulation of particle dynamics is fully resolved, I want to run it on our cluster. I would skip the OpenMP implementation even though it might be easier than using MPI.
In terms of the APIs, which one is more user-friendly, or do they have the same APIs? Supposing I already know the basic algorithm for parallel simulation of many-particle systems, is the implementation doable in 6 months?
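Six months is plausible if the decomposition is kept simple. The bookkeeping core of an MPI port is deciding which rank owns which particles; here is a sketch of a 1-D block decomposition (my own helper, not from any DEM library; in the full code each rank would exchange boundary "ghost" particles with neighbours via MPI_Sendrecv and combine contact forces with MPI_Allreduce):

```cpp
#include <cassert>
#include <utility>

// Half-open index range [first, last) of particles owned by 'rank'.
// The first (n_particles % n_ranks) ranks each take one extra particle,
// so the load is balanced to within a single particle.
std::pair<long, long> owned_range(long n_particles, int n_ranks, int rank) {
    long base = n_particles / n_ranks;   // minimum particles per rank
    long extra = n_particles % n_ranks;  // first 'extra' ranks get one more
    long first = rank * base + (rank < extra ? rank : extra);
    long count = base + (rank < extra ? 1 : 0);
    return {first, first + count};
}
```

With this in place, each rank integrates only its own range and the MPI calls reduce to a handful of well-understood exchange patterns, which is what makes the 6-month estimate realistic.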
All of my compute nodes can run at a maximum network speed of 1gbps, given the networking in the building. My SLURM cluster is configured so that there is an NFS node that the compute nodes draw their stuff from, but when someone is using a very large dataset or model it takes forever to load. In fact, sometimes it takes longer to load the data or model than it does to run the inference.
I'm thinking of re-configuring the whole damn thing anyway. Given that I am currently limited by the building's networking but my compute nodes have a preposterous amount of hard drive space, I'm thinking about the following solution:
Each compute node is connected to the NFS for new things, but common things (such as models or datasets) are mirrored on every compute node. The compute node SSDs are practically unused, so storage isn't an issue. This way, a client can request that their dataset be stored locally rather than on the NFS, so loading should be much faster.
Is that kludgy? Note that each compute node has a 10gbps NIC on board, but building networking throttles us. The real solution is to set up a LAN for all of the compute nodes to take advantage of the faster NIC, but that's a project for a few months from now when we finally tear the cluster down and rebuild it with all of the lessons we have learned.
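For what it's worth, mirroring common data onto node-local disk is basically stage-in, which plenty of sites do deliberately. A sketch of the per-job version; the paths and names are placeholders, and since rsync only copies what changed, repeat jobs hit the local SSD instead of the 1 Gbps link:

```shell
#!/bin/bash
#SBATCH --nodes=1
# Stage the dataset from the NFS export onto node-local SSD once,
# then point the job at the local copy.
LOCAL=/local-ssd/datasets/mymodel
mkdir -p "$LOCAL"
rsync -a /nfs/datasets/mymodel/ "$LOCAL"/
python infer.py --model-dir "$LOCAL"
```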