/r/VFIO
This is a subreddit to discuss all things related to VFIO and gaming on virtual machines in general.
What is VFIO?
VFIO stands for Virtual Function I/O. VFIO is a device driver framework used to assign devices to virtual machines. One of the most common uses of VFIO is setting up a virtual machine with full access to a dedicated GPU. This enables near-bare-metal gaming performance in a Windows VM, offering a great alternative to dual-booting.
The wiki aims to be a one-stop shop for all things related to VFIO. Right now it links to a small number of resources, but it will be updated constantly.
1) No harassment
2) No shilling
3) No discussion of developing cheats
I'm in the process of speccing a homelab that I plan on building (SFF in a Jonsbo N4 case - one I already own, which comes with partner approval for being in the living room), and I'm trying to find the best, easiest GPU that will allow SR-IOV. From the guides and links available online, the PNY T1000 looks like my best bang for the buck and the easiest to set up. From what I can tell I could also fit an RTX 2000 Ada Generation, but I've read that only the Turing-generation GeForce RTX 20xx cards support this and the RTX 2000 Ada does not.
I was hoping to split the GPU between multiple VMs, but ideally I'd also like to run a few Llama models, and I have zero confidence that the T1000 will be any good for that given its 8 GB VRAM limitation. My other option is keeping the RTX 2000 Ada and passing it through to a VM completely to focus on my LLMs, while using iGPU SR-IOV to power my Plex server (the iGPU should be powerful enough for that), but my research has kind of fallen down at this point - is this even possible?
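A hedged sketch of how to check SR-IOV capability once the hardware is in hand (the PCI address below is the typical Intel iGPU address and is only an example; on most kernels the iGPU also needs an SR-IOV-capable driver, such as the out-of-tree i915-sriov-dkms module, before any VFs appear):

lspci -nn | grep -i -e vga -e display                    # find the GPU's PCI address
cat /sys/bus/pci/devices/0000:00:02.0/sriov_totalvfs     # >0 means the driver exposes SR-IOV VFs
echo 2 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs  # create 2 VFs (example count)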
I'm doing iGPU passthrough with a Linux/qemu-kvm host. There is no additional GPU, only the onboard Intel graphics.
For passthrough to work, I try not to "touch" the GPU much during host boot. I use text mode, the kernel parameters video=vesafb:off,efifb:off nofb, and hide the iGPU with pci-stub.
This disables all graphic output, but I still get the (text) system console and (text) login prompt. I actually use the system console to display a small menu, so it's nice to have.
HOWEVER, the system console will sometimes output important kernel messages. This interferes with GPU passthrough, because the VGA buffer is owned by the VFIO guest. The host is not supposed to continue writing chars to it or scrolling it.
I can't find any information on how to disable or change the system console after booting. The only way I can come up with is to permanently redirect it to nowhere (maybe console=ttyS0). But then I lose the small text menu, and the login prompt will still appear, so that solution isn't 100% clean either.
When booting XEN (a bare-metal hypervisor), there's a step where access to the VGA console is relinquished. This is what I want to do, but with standard Linux and KVM instead of XEN. Tell the kernel to release the console for good, and not access it anymore.
Is this possible, and how?
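The closest runtime equivalent I know of is unbinding the virtual terminal consoles and the generic framebuffer driver, which makes the kernel stop drawing to the VGA/EFI framebuffer entirely (the text menu and login prompt go away too, so it's a trade-off); lowering the console loglevel at least keeps most kernel messages off the console. A hedged sketch:

cat /sys/class/vtconsole/vtcon*/name               # identify the framebuffer console(s)
echo 0 > /sys/class/vtconsole/vtcon0/bind          # unbind them (vtcon0/vtcon1 are examples)
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind   # if efifb is still bound
dmesg -n 1                                         # alternative: only emergency messages reach the console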
VirtIO-GPU Vulkan support has been enabled in QEMU and works great on Linux hosts using vulkan-virtio, and I hope to see an implementation for Windows and macOS soon too!
The problem is that libvirt does not support the new feature yet; libvirt only knows about OpenGL, not Vulkan. Is there a way to configure virt-manager, or use a hook script, to append custom options to the QEMU command line so that the Vulkan GPU can be exposed that way?
-device virtio-gpu-gl,hostmem=8G,blob=true,venus=true
needs to be appended as described here: https://www.qemu.org/docs/master/system/devices/virtio-gpu.html#virtio-gpu-virglrenderer
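Until libvirt learns about Venus natively, the usual workaround is libvirt's QEMU command-line passthrough rather than a hook script. A hedged sketch (only the relevant fragment of the domain XML; the device options are the ones quoted above, and the xmlns:qemu declaration on <domain> is required for the qemu: elements to be accepted):

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  ...
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-gpu-gl,hostmem=8G,blob=true,venus=true'/>
  </qemu:commandline>
</domain>

You may also need to remove or adjust the libvirt-managed <video> device so the two GPU definitions don't conflict.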
I just built and set up a new rig. I have everything working; I just have a question about CPU pinning.
I set Proxmox to use CPUs 0-15 on my 7950X3D. From what I read, this should use the 8 CPU cores that have the X3D L3 cache. However, the attached picture is the output of CPU-Z, which shows 16 MB of L3 cache and not 128 MB.
I am not sure if something is wrong or if I am interpreting it incorrectly.
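One hedged thing to check: the cache size a guest tool reports depends on the virtual CPU/cache topology QEMU presents, so it isn't proof of wrong pinning by itself. On the Proxmox host you can see which logical CPUs actually share the large V-Cache L3 (the CCD with 3D V-Cache reports a much bigger index3 size than the other one) and pin the VM to those:

for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: L3=%s shared_with=%s\n' \
        "$(basename "$cpu")" \
        "$(cat "$cpu/cache/index3/size")" \
        "$(cat "$cpu/cache/index3/shared_cpu_list")"
done | sort -V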
I'm getting this error with the venus driver. Any idea on how to fix it?
MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_FENCE_PASSING
MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_GUEST_HANDLE
MESA: error: DRM_IOCTL_VIRTGPU_GET_CAPS failed with Invalid argument
MESA: error: DRM_IOCTL_VIRTGPU_CONTEXT_INIT failed with Invalid argument, continuing without context...
MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
MESA: error: Failed to create virtgpu AddressSpaceStream
MESA: error: vulkan: Failed to get host connection
ERROR: [Loader Message] Code 0 : libpowervr_rogue.so: cannot open shared object file: No such file or directory
ERROR: [Loader Message] Code 0 : loader_icd_scan: Failed loading library associated with ICD JSON /usr/local/lib/x86_64-linux-gnu/libvulkan_powervr_mesa.so. Ignoring this JSON
MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
MESA: error: Failed to create virtgpu AddressSpaceStream
MESA: error: vulkan: Failed to get host connection
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/local/lib/x86_64-linux-gnu/libvulkan_dzn.so. Skipping this driver.
MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
MESA: error: Failed to create virtgpu AddressSpaceStream
MESA: error: vulkan: Failed to get host connection
MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
MESA: error: Failed to create virtgpu AddressSpaceStream
MESA: error: vulkan: Failed to get host connection
MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
MESA: error: Failed to create virtgpu AddressSpaceStream
MESA: error: vulkan: Failed to get host connection
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -4 from call to vkCreateInstance in ICD /usr/local/lib/x86_64-linux-gnu/libvulkan_gfxstream.so. Skipping this driver.
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib/x86_64-linux-gnu/libvulkan_virtio.so. Skipping this driver.
vulkaninfo: ../src/vulkan/wsi/wsi_common_x11.c:931: x11_surface_get_formats2: Assertion `f->sType == VK_STRUCTURE_TYPE_SURFACE_FORMAT_2_KHR' failed.
Aborted (core dumped)
I have TrueNAS Core in a VM with 6 NVMe drives passed through (ZFS pool created inside TrueNAS); everything was fine since I first installed it, 6+ months ago.
I had to reboot the server (not just the VM), and now I can't boot the VM with the attached NVMe drives.
Any ideas?
Thanks
grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction pci=realloc,noats pcie_aspm=off"
One of these disks is the boot drive. Same type/model as the other 6.
03:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
04:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
05:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
06:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
07:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
08:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
09:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:51c0] (rev 02)
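Worth confirming which driver owns each of those functions after the reboot (the boot drive should stay on nvme, the passthrough ones on vfio-pci). A quick check:

for a in 03 04 05 06 07 08 09; do
    echo "== ${a}:00.0 =="
    lspci -nnk -s ${a}:00.0 | grep -e 'Kernel driver in use' -e 'Non-Volatile'
done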
[ 190.927003] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 190.930684] pcieport 0000:02:05.0: bridge window [io 0x1000-0x0fff] to [bus 08] add_size 1000
[ 190.930691] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 190.930693] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 190.930694] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 190.930695] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 190.930698] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[ 190.932408] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[ 190.934115] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[ 190.934118] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[ 190.934850] pcieport 0000:02:05.0: bridge window [mem 0xf4100000-0xf41fffff]
[ 190.935340] pcieport 0000:02:05.0: bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[ 190.937343] nvme nvme2: pci function 0000:08:00.0
[ 190.938039] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[ 190.977895] nvme nvme2: 127/0/0 default/read/poll queues
[ 190.993683] nvme2n1: p1 p2
[ 192.318164] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 192.320595] pcieport 0000:02:06.0: pciehp: Slot(0-6): Link Down
[ 192.484916] clocksource: timekeeping watchdog on CPU123: hpet wd-wd read-back delay of 246050ns
[ 192.484937] clocksource: wd-tsc-wd read-back delay of 243047ns, clock-skew test skipped!
[ 192.736191] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2000 msec ago)
[ 193.988867] clocksource: timekeeping watchdog on CPU126: hpet wd-wd read-back delay of 246400ns
[ 193.988894] clocksource: wd-tsc-wd read-back delay of 244095ns, clock-skew test skipped!
[ 194.244006] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 194.244187] pci 0000:09:00.0: Removing from iommu group 84
[ 194.252153] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13f8 (issued 186956 msec ago)
[ 194.252765] pcieport 0000:02:06.0: pciehp: Slot(0-6): Card present
[ 194.726855] device tap164i0 entered promiscuous mode
[ 194.738469] vmbr1: port 1(tap164i0) entered blocking state
[ 194.738476] vmbr1: port 1(tap164i0) entered disabled state
[ 194.738962] vmbr1: port 1(tap164i0) entered blocking state
[ 194.738964] vmbr1: port 1(tap164i0) entered forwarding state
[ 194.738987] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
[ 196.272094] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[ 196.413036] pci 0000:09:00.0: [1344:51c0] type 00 class 0x010802
[ 196.416962] pci 0000:09:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 196.421620] pci 0000:09:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 196.422846] pci 0000:09:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 196.424317] pci 0000:09:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 196.439279] pci 0000:09:00.0: PME# supported from D0 D1 D3hot
[ 196.459967] pci 0000:09:00.0: Adding to iommu group 84
[ 196.462579] pcieport 0000:02:06.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 196.466259] pcieport 0000:02:06.0: bridge window [io 0x1000-0x0fff] to [bus 09] add_size 1000
[ 196.466265] pcieport 0000:02:06.0: BAR 13: no space for [io size 0x1000]
[ 196.466267] pcieport 0000:02:06.0: BAR 13: failed to assign [io size 0x1000]
[ 196.466268] pcieport 0000:02:06.0: BAR 13: no space for [io size 0x1000]
[ 196.466269] pcieport 0000:02:06.0: BAR 13: failed to assign [io size 0x1000]
[ 196.466272] pci 0000:09:00.0: BAR 0: assigned [mem 0xf4000000-0xf403ffff 64bit]
[ 196.467975] pci 0000:09:00.0: BAR 4: assigned [mem 0xf4040000-0xf407ffff 64bit]
[ 196.469691] pci 0000:09:00.0: BAR 6: assigned [mem 0xf4080000-0xf40bffff pref]
[ 196.469694] pcieport 0000:02:06.0: PCI bridge to [bus 09]
[ 196.470426] pcieport 0000:02:06.0: bridge window [mem 0xf4000000-0xf40fffff]
[ 196.470916] pcieport 0000:02:06.0: bridge window [mem 0xf5300000-0xf54fffff 64bit pref]
[ 196.472884] nvme nvme3: pci function 0000:09:00.0
[ 196.473616] nvme 0000:09:00.0: enabling device (0000 -> 0002)
[ 196.512931] nvme nvme3: 127/0/0 default/read/poll queues
[ 196.529097] nvme3n1: p1 p2
[ 198.092038] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)
[ 198.690791] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[ 198.691033] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[ 198.691278] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[ 199.114602] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[ 199.114847] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[ 199.115096] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[ 199.485505] vmbr1: port 1(tap164i0) entered disabled state
[ 330.030345] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 330.032580] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[ 331.935885] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 331.936059] pci 0000:08:00.0: Removing from iommu group 83
[ 331.956272] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x11e8 (issued 139224 msec ago)
[ 331.957145] pcieport 0000:02:05.0: pciehp: Slot(0-5): Card present
[ 333.976326] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[ 334.117418] pci 0000:08:00.0: [1344:51c0] type 00 class 0x010802
[ 334.121345] pci 0000:08:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 334.126000] pci 0000:08:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 334.127226] pci 0000:08:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 334.128698] pci 0000:08:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 334.143659] pci 0000:08:00.0: PME# supported from D0 D1 D3hot
[ 334.164444] pci 0000:08:00.0: Adding to iommu group 83
[ 334.166959] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 334.170643] pcieport 0000:02:05.0: bridge window [io 0x1000-0x0fff] to [bus 08] add_size 1000
[ 334.170650] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 334.170652] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 334.170653] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 334.170654] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 334.170658] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[ 334.172363] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[ 334.174072] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[ 334.174075] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[ 334.174806] pcieport 0000:02:05.0: bridge window [mem 0xf4100000-0xf41fffff]
[ 334.175296] pcieport 0000:02:05.0: bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[ 334.177298] nvme nvme1: pci function 0000:08:00.0
[ 334.177996] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[ 334.220204] nvme nvme1: 127/0/0 default/read/poll queues
[ 334.237017] nvme1n1: p1 p2
[ 335.796180] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)
Another try:
[ 79.533603] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 79.535330] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Down
[ 80.284136] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 81.532090] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[ 82.588056] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[ 84.892150] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[ 89.243877] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[ 97.691632] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[ 114.331200] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[ 149.146240] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[ 149.154174] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13f8 (issued 141128 msec ago)
[ 151.174121] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 2020 msec ago)
[ 152.506070] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[ 153.562091] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[ 155.801981] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[ 160.153992] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[ 168.601641] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[ 186.009203] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[ 220.824284] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[ 220.844289] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 71692 msec ago)
[ 222.168321] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[ 223.224211] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[ 225.432174] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[ 229.784044] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[ 238.231807] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[ 245.400141] INFO: task irq/59-pciehp:1664 blocked for more than 120 seconds.
[ 245.400994] Tainted: P O 5.15.158-2-pve #1
[ 245.401399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.401793] task:irq/59-pciehp state:D stack: 0 pid: 1664 ppid: 2 flags:0x00004000
[ 245.401800] Call Trace:
[ 245.401804] <TASK>
[ 245.401809] __schedule+0x34e/0x1740
[ 245.401821] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401827] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401828] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 245.401834] schedule+0x69/0x110
[ 245.401836] schedule_preempt_disabled+0xe/0x20
[ 245.401839] __mutex_lock.constprop.0+0x255/0x480
[ 245.401843] __mutex_lock_slowpath+0x13/0x20
[ 245.401846] mutex_lock+0x38/0x50
[ 245.401848] device_release_driver+0x1f/0x40
[ 245.401855] pci_stop_bus_device+0x74/0xa0
[ 245.401862] pci_stop_and_remove_bus_device+0x13/0x30
[ 245.401864] pciehp_unconfigure_device+0x92/0x150
[ 245.401872] pciehp_disable_slot+0x6c/0x100
[ 245.401875] pciehp_handle_presence_or_link_change+0x22a/0x340
[ 245.401877] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401879] pciehp_ist+0x19a/0x1b0
[ 245.401882] ? irq_forced_thread_fn+0x90/0x90
[ 245.401889] irq_thread_fn+0x28/0x70
[ 245.401892] irq_thread+0xde/0x1b0
[ 245.401895] ? irq_thread_fn+0x70/0x70
[ 245.401898] ? irq_thread_check_affinity+0x100/0x100
[ 245.401901] kthread+0x12a/0x150
[ 245.401905] ? set_kthread_struct+0x50/0x50
[ 245.401907] ret_from_fork+0x22/0x30
[ 245.401915] </TASK>
[ 255.639346] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[ 290.454384] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[ 290.456313] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 290.457400] pci 0000:07:00.0: Removing from iommu group 82
[ 290.499751] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 69656 msec ago)
[ 290.500378] pcieport 0000:02:04.0: pciehp: Slot(0-4): Card present
[ 290.500381] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Up
[ 292.534371] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2036 msec ago)
[ 292.675367] pci 0000:07:00.0: [1344:51c0] type 00 class 0x010802
[ 292.679292] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 292.683949] pci 0000:07:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 292.685175] pci 0000:07:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 292.686647] pci 0000:07:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 292.701608] pci 0000:07:00.0: PME# supported from D0 D1 D3hot
[ 292.722320] pci 0000:07:00.0: Adding to iommu group 82
[ 292.725153] pcieport 0000:02:04.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 292.729326] pcieport 0000:02:04.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
[ 292.729338] pcieport 0000:02:04.0: BAR 13: no space for [io size 0x1000]
[ 292.729341] pcieport 0000:02:04.0: BAR 13: failed to assign [io size 0x1000]
[ 292.729344] pcieport 0000:02:04.0: BAR 13: no space for [io size 0x1000]
[ 292.729345] pcieport 0000:02:04.0: BAR 13: failed to assign [io size 0x1000]
[ 292.729351] pci 0000:07:00.0: BAR 0: assigned [mem 0xf4200000-0xf423ffff 64bit]
[ 292.731040] pci 0000:07:00.0: BAR 4: assigned [mem 0xf4240000-0xf427ffff 64bit]
[ 292.732756] pci 0000:07:00.0: BAR 6: assigned [mem 0xf4280000-0xf42bffff pref]
[ 292.732761] pcieport 0000:02:04.0: PCI bridge to [bus 07]
[ 292.733491] pcieport 0000:02:04.0: bridge window [mem 0xf4200000-0xf42fffff]
[ 292.733981] pcieport 0000:02:04.0: bridge window [mem 0xf4f00000-0xf50fffff 64bit pref]
[ 292.736102] nvme nvme1: pci function 0000:07:00.0
[ 292.736683] nvme 0000:07:00.0: enabling device (0000 -> 0002)
[ 292.849474] nvme nvme1: 127/0/0 default/read/poll queues
[ 292.873346] nvme1n1: p1 p2
[ 294.144318] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 294.147677] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[ 294.562254] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2028 msec ago)
[ 294.870254] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 296.118251] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
[ 297.174284] vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
[ 299.414197] vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting
Hello! I hope some of you can give me some pointers in the right direction for my question!
First off, a little description of my situation and what I am doing:
I have a server with ESXi as the hypervisor running on it. I run all kinds of VMware/Omnissa stuff on it and also a bunch of servers. It's a homelab used to monitor and manage things in my home. It has AD, GPOs, DNS, a file server and such. It's also running Home Assistant, a Plex server and other stuff.
I have also built a VM pool to play a game on. I don't connect to the virtual machine through RDP; I open the game in question from the Workspace ONE Intelligent Hub as a published app. This all works nicely.
The thing is, the game (Football Manager 2024) runs way better on my PC than it does on my laptop. Especially during matches it's way smoother on my PC. I was thinking this should run fine on both machines, as it is all running on the server. The low resource utilization of the Horizon Client (which is essentially what streams the published app) seems to confirm this; it takes up hardly any resources, like, really low.
My main question is: what determines the quality of the stream? Is it mostly network related, or is there other stuff in the background causing it to be worse on my laptop?
I am interested in trying to play some games like Fortnite or Apex Legends since my friends play them. However, I know anticheat isn't very friendly with virtual machines. So far the only issue I have had was trying to hide the hypervisor. My CPU is a Ryzen 7 5700X, and when I enter <feature policy='disable' name='hypervisor'/> my virtual machine either doesn't launch or lags terribly. Is there any way to hide the hypervisor, at least in my case?
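For what it's worth, a hedged sketch of the libvirt bits people commonly combine for this: a Hyper-V vendor_id plus the KVM hidden flag masks the obvious virtualization signatures without disabling the hypervisor CPUID leaf (which is what tends to cause the lag you're seeing). Kernel-level anticheats can still detect the VM, so no guarantees:

<features>
  ...
  <hyperv>
    ...
    <vendor_id state='on' value='randomid12'/>
  </hyperv>
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>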
Hi VFIO users,
I was wondering if someone could confirm whether the ASRock DeskMeet (not DeskMini) X300 is a good pick for a Proxmox node, considering the following inputs:
Thanks in advance and have a good day
I've carefully followed this guide from GitHub and it results in a black screen with a static underscore "_" symbol like in the picture below.
The logs, XML config and my specifications are at the end of the post.
Here is, in short, a step-by-step of what I've done. (If you are familiar with the guide you can probably skip the steps, as I am highly confident that I've followed them correctly, except maybe 8. "trust me bro")
- Enabled IOMMU & SVM in BIOS.
- Added amd_iommu=on iommu=pt video=efifb:off to my /etc/default/grub and generated a grub config using grub-mkconfig.
- Installed the required tools: sudo apt install qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients bridge-utils virt-manager ovmf
- Enabled the required services: systemctl enable --now libvirtd, then virsh net-start default and virsh net-autostart default.
- Added myself to the libvirt group, plus the input and kvm groups for passing input devices: usermod -aG kvm,input,libvirt username
- Downloaded win10.iso and the virtio drivers.
- Configured my VM hardware carefully like in the guide, installed Windows 10, and installed the virtio drivers on the new Windows system once the installation was over.
- Turned off the machine and removed Channel Spice, Display Spice, Video QXL, Sound ich* and other unnecessary devices. It is worth noting that I had trouble doing this in the virt-manager GUI, so I had to remove them via the XML in the overview section, which might be the cause of the black screen.
- After removing the unnecessary devices, I added 4 PCI Devices, one for every entry in my NVIDIA IOMMU group.
- Added libvirt hooks for create, start and shutdown.
- Passed 2 USB Host Devices for my keyboard and mouse respectively.
- Skipped audio passthrough for now.
- Spoofed my Vendor ID and hid the KVM CPU leaf.
- Added a patched.rom file inside the hostdev PCI entry representing my NVIDIA VGA adapter (the first one in IOMMU group 15, as seen in the screenshot above).

After this I started my VM and encountered the problem described above. My mouse and keyboard are passed through, so the only thing I can do to exit the black screen is to reboot the computer using the power button.
Here is some additional info and some logs:
XML: win10.xml
Logs: win10.log
My system specifications:
CPU: AMD Ryzen 5 2600
GPU: NVIDIA RTX 2060 SUPER
OS: Linux Mint 22
2 monitors, both connected to the same GPU, one using DisplayPort (primary) and the other using HDMI
Any advice that could point me to a solution is highly appreciated, thank you!
Hi,
I am currently working on setting up a Windows 10 VM on my Ubuntu server that passes through a Quadro P4000 GPU, which has no monitor attached. I will then use Parsec to remotely connect to the VM.
I followed this guide to pass through the GPU, and configured the XML file to hide the fact that I am running a VM. I then installed the appropriate NVIDIA drivers, and installed the additional virtio drivers in the VM. I have Parsec up and running, and can successfully connect to the VM.
For some reason, however, the GPU refuses to work and is spitting out a Code 43 error. I have removed all Spice-connected displays from virt-manager, and uninstalled/reinstalled drivers several times. I am at a bit of a loss as to how to solve this. I believe I have set everything up for passthrough on the host, and I believe the issue lies entirely within the VM, but I am not sure.
Any advice would be greatly appreciated. Thanks!
I've been running GPU passthrough with CPU pinning on a Windows VM for a long time on my previous machine. I've built a new one, and now things work as expected only on the first run of the VM.
After shutting down the VM as usual, when I start it again the screen remains black and there doesn't seem to be any activity. I am forced to reboot the host to run the VM successfully a first time again.
My GPU is a Radeon 6000 series, and I verified that all the devices bound to vfio on boot remain bound after VM shutdown and before trying to run it a second time.
I'm not sure what is causing this issue. Any help is appreciated.
Thanks.
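A hedged place to start, since "works only on the first run" is the classic symptom of a GPU that doesn't reset cleanly between VM runs: check what reset the card advertises and which method the kernel picks (the address is an example):

lspci -vvv -s 0a:00.0 | grep -i reset                  # FLReset+ in DevCap means the function supports FLR
cat /sys/bus/pci/devices/0000:0a:00.0/reset_method     # available on recent kernels (roughly 5.15+)

Also make sure every function of the card (audio, and on some models USB/serial) is bound to vfio-pci and passed through together, since a half-reset multifunction device can leave the GPU in a bad state.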
I'd like to know the best method and guide for successfully doing a single GPU passthrough, as I've switched to AMD to experience the "full Linux experience".
I kinda feel like I'm in a rabbit hole, as every guide I find online mentions the need for a secondary iGPU/dGPU. I have an RX 7900 XT and a 13700K, so the iGPU method won't work for me.
What I'd like is to be able to run a Windows VM on QEMU with full GPU support and GPU acceleration; when I tried it on an RTX 3080, it didn't work. I guess AMD will make it easier, but I'm kind of lost! I'd appreciate some help :)
Heyya!
So, great success passing my 3070 Ti through to a Windows VM on Proxmox - cloud gaming via Parsec is awesome. However, I've encountered a small issue. I use my home server for a variety of things, one of which is a Plex/media server. I also have a 1050 Ti in my setup which I want to pass through to a Plex LXC; HOWEVER, the vfio driver has bound itself to the 1050 Ti and it isn't visible using nvidia-smi.
I've tried installing the NVIDIA drivers, but the installation fails; after digging around I've spotted that vfio is bound to the 1050 Ti. I've looked at how to unbind it, but nothing is concrete in terms of steps or paths to do this.
The GPU itself works - the card works in a Windows VM I'm using as a temporary Plex solution. HW transcodes work and the 1050 Ti is recognised in Proxmox and in Windows.
I'm fairly new to Linux in general, and yes, the Windows Plex VM works, but I feel it's a waste of resources when LXC is so lightweight. Also, the Plex Windows VM pulls the media from my server over SMB, so it's very roundabout considering I can just mount the storage in the LXC anyway.
Please help!!!
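A hedged sketch of handing the 1050 Ti back to the host NVIDIA driver at runtime (the PCI address is an example; also make sure its vendor:device ID isn't listed in vfio-pci.ids or a modprobe softdep, or it will be grabbed again at the next boot):

echo 0000:02:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo > /sys/bus/pci/devices/0000:02:00.0/driver_override     # clear any driver override
echo 0000:02:00.0 > /sys/bus/pci/drivers_probe               # let the nvidia driver claim it

For the LXC route the NVIDIA driver has to be loaded on the Proxmox host itself, with the /dev/nvidia* device nodes mapped into the container.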
So far I've documented my progress in the repo below, but I'm still hitting a "Display output is not active" issue, i.e. I can't game on the Windows VM. That's why I'm documenting my progress - to seek help, as well as to help whoever is interested.
https://github.com/hmoazzem/env-vfio-pci-passthrough-gpu
Any hint/help is much appreciated.
Is there any native way of passing something to QEMU as keyboard input? Let's say I want to pass something from a file to the QEMU input, or make a program that passes particular input to the VMs depending on its parameters - is there any native way to do it? The only way I can think of is to pass -nographic and then use system().
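There is a native interface for this: the QEMU monitor. The human monitor has sendkey, QMP has send-key (and input-send-event), and libvirt wraps the same thing as virsh send-key. A hedged sketch over a QMP socket, assuming the VM was started with -qmp unix:/tmp/qmp.sock,server=on,wait=off and that socat is installed:

{
  echo '{"execute":"qmp_capabilities"}'
  sleep 0.2
  echo '{"execute":"send-key","arguments":{"keys":[{"type":"qcode","value":"h"},{"type":"qcode","value":"i"},{"type":"qcode","value":"ret"}]}}'
  sleep 0.2
} | socat - UNIX-CONNECT:/tmp/qmp.sock

A script can generate the key list from whatever file contents or parameters it wants and write it to the socket the same way.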
Hi, I've tried for a while to get my single GPU passthrough to work, but have now become stuck.
I followed this tutorial:
https://gitlab.com/risingprismtv/single-gpu-passthrough/-/wikis/home
Dumped my vbios with amdvbflash.
However, after launching the VM (Windows 10) and opening Task Manager, no GPU shows up.
Opening msinfo32 shows that I'm using the "Microsoft Basic Display Adapter".
If anyone can help, I'll appreciate it!
I am building a new system. Well, everything but the video card (I'm going to save up for something really nice). I have considerable Linux experience and some Proxmox experience. I am considering using Proxmox and having a gaming Windows 11 VM, and then various other Linux VMs/LXCs for other needs. I chose the AMD 7950X3D because I want to try to use the X3D cores with the added cache for gaming and the other cores for the rest. I know this won't be a cakewalk, but if it is possible I think I can do it. Does anyone have any feedback on the components I chose?
Type | Item | Price |
---|---|---|
CPU | AMD Ryzen 9 7950X3D 4.2 GHz 16-Core Processor | $596.99 @ Newegg |
CPU Cooler | Noctua NH-D12L 60.09 CFM CPU Cooler | $89.95 @ Amazon |
Motherboard | MSI MAG X670E TOMAHAWK WIFI ATX AM5 Motherboard | $219.99 @ Newegg |
Memory | Corsair Vengeance 32 GB (2 x 16 GB) DDR5-6000 CL30 Memory | $89.99 @ Newegg |
Storage | Crucial T705 1 TB M.2-2280 PCIe 5.0 X4 NVME Solid State Drive | $115.99 @ Newegg |
Video Card | Gigabyte GAMING GeForce GTX 1060 6GB 6 GB Video Card | - |
Case | Fractal Design North ATX Mid Tower Case | $109.99 @ Amazon |
Power Supply | Corsair RM850e (2023) 850 W 80+ Gold Certified Fully Modular ATX Power Supply | $104.99 @ Amazon |
Prices include shipping, taxes, rebates, and discounts | ||
Total | $1327.89 | |
Generated by PCPartPicker 2024-11-23 21:30 EST-0500 |
I'm a newbie sysadmin (1 year of experience) and up until now I've managed to solve most things by following tutorials, reading documentation, or just plain old trial and error.
The current problem is:
I have an Ubuntu 22.04.5 server as the host, and I want to pass through one or more NVIDIA 4090 GPUs to a QEMU/KVM guest.
The IOMMU groups look ok to me when the host starts:
IOMMU GROUP 30 2f:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
IOMMU GROUP 30 2f:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU GROUP 45 40:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
IOMMU GROUP 45 40:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU GROUP 189 b0:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
IOMMU GROUP 189 b0:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU GROUP 206 c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
IOMMU GROUP 206 c2:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU GROUP 6 00:14.0 USB controller [0c03]: Intel Corporation Device [8086:1bcd] (rev 11)
The grub config where I set up intel_iommu and the vfio-pci IDs:
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt vfio-pci.ids=10de:2684,10de:22ba"
GRUB_CMDLINE_LINUX=""
And for "forcing" the gpus to use the vfio-pci driver I used the /etc/initramfs-tools/scripts/init-top/vfio.sh
approach:
#!/bin/sh
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

# Force each GPU function (VGA + audio) onto vfio-pci before the in-tree drivers can claim it
for dev in 0000:2f:00.0 0000:2f:00.1 0000:40:00.0 0000:40:00.1 0000:b0:00.0 0000:b0:00.1 0000:c2:00.0 0000:c2:00.1
do
    echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override
    echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
done

exit 0
I can assign them when creating or editing the VM just fine, but when the VM starts it outputs this "error" in the log:
-device vfio-pci,host=0000:40:00.0,id=hostdev0,bus=pci.5,addr=0x0,rombar=1 \
-device vfio-pci,host=0000:40:00.1,id=hostdev1,bus=pci.6,addr=0x0,rombar=1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.7,addr=0x0 \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.8,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2024-11-21T08:08:15.901334Z qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:40:00.0
Device option ROM contents are probably invalid (check dmesg).
Skip option ROM probe with rombar=0, or load from file with romfile=
I can provide the KVM XML as well, but I only add <rom bar='on'/> for both the video and audio functions.
TL;DR: I set it up for GPU passthrough, I launch it, and it says it cannot read the GPU ROM (?), while I'd expect it to be passed through correctly.
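The error message itself names the two workarounds, and both map onto the <rom> element of the hostdev in the libvirt XML. A hedged sketch (address taken from the log above):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x40' slot='0x00' function='0x0'/>
  </source>
  <rom bar='off'/>
</hostdev>

<rom bar='off'/> corresponds to rombar=0 (skip the option ROM probe entirely), while <rom file='/path/to/dumped.rom'/> corresponds to romfile=. For a GPU that isn't the guest's boot display the warning is often harmless, so bar='off' is the simpler first try.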
I'm planning a new build and am thinking of going with a 9800X3D, a 7900 GRE, and 2x32 GB DDR5. I don't know what motherboard to get yet, which hopefully I can get some advice on, since, afaik, not all motherboards work equally well with VFIO.
Will the CPU and GPU work with this as well? I have heard the AMD 7000 series has some issues with passthrough.
I'm going to be running Arch underneath, and pass the dGPU through to a Windows VM, and have the Arch host switch to the iGPU. I'll be using the VM for both productivity and gaming, but any gaming I'll be doing on it won't be super intensive.
Hi everyone, I've tried for a while now to get my GPU passthrough to work, but have become stuck with the issue below. In short, I need a vbios ROM or my host crashes, but I cannot find a way to extract the correct vbios from my card.
I would be extremely happy if someone could point me in a promising direction.
Setup:
GPU for passthrough: AMD RX 7700 XT
CPU: Ryzen 7 7700X
Host GPU: integrated graphics (Raphael)
Mainboard/Chipset: MSI B650M Gaming Plus Wifi
OS: Ubuntu 24.04 (Sway Remix -> Wayland)
Software: libvirt 10.0.0 (package 10.0.0-2ubuntu8.4, Ubuntu), QEMU 8.2.2 (Debian 1:8.2.2+ds-0ubuntu1.4), kernel 6.8.0-48-generic
Passthrough setup:
Pretty default with a Spice display
PCI passthrough of both the VGA and audio functions of the GPU
(Optional: PCI NVME with bare-metal installed Windows)
Both GPUs connected to monitor with different cables.
Pretty sure vfio-pci is correctly set up and binding the respective devices.
In the BIOS, IOMMU is enabled and Resizable BAR is disabled.
Main issue: Passing through the GPU makes the host lag and eventually reset.
Once I start the VM, everything immediately breaks. I cannot even see the TianoCore logo of the guest bios in my Spice display, everything stays black. No output on the passed-through GPU.
Also, the host starts to lag immensely. Input will just get eaten (hard to move the mouse), some keypresses are even ignored. After a while (say, a minute?) or after managing to force power off the VM, the host resets.
The extremely weird thing is that I could find absolutely nothing in the logs! Nothing noteworthy in the journal after reboot, not even when I manage to run dmesg when it's lagging. Nothing noteworthy under /var/log/libvirt/ (only thing is about the VM being tainted due to custom-argv, idk).
Does anybody have an idea what's going on here?
What works
Just to mention this, the GPU works fine when not passed through, under a Windows and Linux host without issues.
Now, regarding passthrough, when removing the GPU with its two functions, everything runs smoothly. I can even boot my bare-metal installed Windows with a passed-through nvme and it seems to work fine.
The interesting thing: I read about this whole thing about the PCI device ROM and passing a ROM image to the VM. Thing is, I could find none for my exact graphics card, but downloaded a ROM for a similar card (also RX 7700 XT) from Techpowerup.
With this, the host issue is magically gone! The guest boots fine and I even get some video output on the passed-through GPU (splash screen with a Linux guest).
However, the guest driver still cannot correctly initialize the GPU. Below is the amdgpu dmesg output extracted from a Linux guest:
amdgpu 0000:05:00.0: ROM [??? 0x00000000 flags 0x20000000]: can't assign; bogus alignment
amdgpu 0000:05:00.0: amdgpu: Fetched VBIOS from ROM
amdgpu: ATOM BIOS: 113-D7120601-4
amdgpu 0000:05:00.0: amdgpu: CP RS64 enable
amdgpu 0000:05:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
amdgpu 0000:05:00.0: amdgpu: PCIE atomic ops is not supported
amdgpu 0000:05:00.0: amdgpu: MEM ECC is not presented.
amdgpu 0000:05:00.0: amdgpu: SRAM ECC is not presented.
amdgpu 0000:05:00.0: BAR 2 [mem 0x382010000000-0x3820101fffff 64bit pref]: releasing
amdgpu 0000:05:00.0: BAR 0 [mem 0x382000000000-0x38200fffffff 64bit pref]: releasing
amdgpu 0000:05:00.0: BAR 6: [??? 0x00000000 flags 0x20000000] has bogus alignment
amdgpu 0000:05:00.0: BAR 0 [mem 0x382000000000-0x38200fffffff 64bit pref]: assigned
amdgpu 0000:05:00.0: BAR 2 [mem 0x382010000000-0x3820101fffff 64bit pref]: assigned
amdgpu 0000:05:00.0: BAR 6: [??? 0x00000000 flags 0x20000000] has bogus alignment
amdgpu 0000:05:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
amdgpu 0000:05:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
I assume this issue is from me not using the correct VBIOS for my card. So I want to fix this, but now I'm also stuck here!
Implied issue: how to extract the vbios from an RX 7700 XT (Navi 32)
I've tried the extraction with amdvbflash on both Windows and Linux, but nothing worked.
Under Windows, the latest version I could find (AMD IFWI Flasher Tool Version 5.0.567.0-External) does not even list the GPU.
Under Linux, the amdvbflash tool does not output anything (not even help text), but maybe this is due to me running on Wayland?
I really wonder how people actually managed to extract their vbios. I found a few posts of people getting it done with the 7700/7800, but it seems that Navi32 is badly supported in general. People with Navi31 (RX 7900) seem to have more success.
OK, so the next thing I tried was reading out /sys/bus/pci/devices/XXXX/rom. But there I got the issue that I only get the "small" / truncated / initialized version of the vbios (110 KB), whereas the downloaded vbios that works is 2.0 MB.
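For reference, the sysfs dump is done like this (a hedged sketch; the address is an example), and it can only ever return whatever image the kernel shadowed, which is why a card already initialized by the firmware tends to yield the truncated copy:

cd /sys/bus/pci/devices/0000:03:00.0
echo 1 > rom                      # enable reading the shadowed option ROM
cat rom > /tmp/7700xt-vbios.rom
echo 0 > rom                      # disable it again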
I've tried many kernel cmdline parameters (e.g. video=efifb:off) to keep the host from initializing the GPU, but then noticed that GRUB is already shown on both GPUs.
So my host BIOS seems to initialize both GPUs already. Unfortunately, I could not find a way around this. There's a setting that lets me choose my boot graphics adapter, which I set to IGD, and then options like "dedicated GPU detection" and "hybrid graphics" which I played around with, but they never changed the behavior.
I also tried unplugging the monitor cable from the dGPU, but also no luck. Every time I check, it is already initialized.
I'm out of ideas -- any help is appreciated!
Cheers
Hello everyone, I discovered this method recently, watched some videos and searched for the basics, and now I'm trying to decide if it's worth migrating to a VM with GPU passthrough. I have had a dual-boot machine for a long time (a few years) and love Linux, its customization, tinkering and everything...
Windows I use for gaming and for graphical software without Linux support (Adobe AE, Premiere and Photoshop). I work with video editing and motion graphics, and whatever can be done in Linux, I do (DaVinci Resolve, Blender, processing with ffmpeg, etc.); Blender has slightly better performance in Linux as well. So Windows is my secondary system.
Now I've started to study Unreal Engine and, although it has a Linux version, its performance in OpenGL and Vulkan is very low; DX12 unfortunately is a must. I looked into running the Windows version with Proton, but it looks like too much of a hassle for something that might not work so well.
PC Specs (a bit old, but has a good performance):
- Intel Xeon E5-1680 v2 8 cores (16 threads), has VT-x and VT-d according to Intel's page
- Huananzhi X79 Deluxe v7.1 (has 2 PCIe 3.0 x16 slots, bios modded with reBAR on)
- 32gb ddr3 RAM Gskill (1600mhz C10, looking into oc to 1866 or reduce latency)
- RTX 3060 12gb (reBAR enabled in both Windows and Linux, undervolted with vram oc in both systems)
- GTX 1060 6gb (my old gpu, not connected but can be used if necessary)
- 750W PSU
- OS 1: Rocky Linux 9 (RHEL 9 based) with Gnome DE in X (not Wayland) | Nvidia driver 565
- OS 2: Windows 10 | Nvidia driver 566 (studio driver)
Both systems in UEFI, secure boot disabled.
The Windows and Linux systems are in independent drives. On Windows i can play most DX11 games on high or ultra at 1440p with more than 60fps and DLDSR, DX12 games with same settings with balanced RT and DLSS at 60fps (mostly).
Taking into account that I want as seamless and fast an experience as possible between systems, I ask:
- How can I be sure my CPU has the needed features, aside from Intel's page on it? Are there any commands in Linux for that? (See the sketch at the end of this post.)
- With my specs, is it worth trying?
- Can I use my existing Windows install in its current state?
- What kind of % performance drop should I expect in the Windows VM?
- If using both GPUs, when NOT in the VM, would I be able to assign the other GPU to Linux tasks?
- Is it worth using both GPUs, or better to stick to only the most powerful one?
- Is Looking Glass the best way to use it?
- When in the VM, the hardware resources available to Linux can be only the bare minimum, right? When closing the VM, are these resources restored?
- I manage the GPU OC in Linux using GreenWithEnvy, and in Windows with Afterburner; if using a single GPU, can this be a problem? If using both GPUs, will Windows be able to manage the OC as if it were native?
Thanks in advance.
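On the first question, a hedged set of host-side checks for VT-x/VT-d (referenced in the list above):

grep -c vmx /proc/cpuinfo                  # non-zero means VT-x is exposed to the OS
lscpu | grep -i virtualization
sudo dmesg | grep -i -e DMAR -e IOMMU      # VT-d / IOMMU initialization messages
ls /sys/kernel/iommu_groups/ | wc -l       # non-zero once intel_iommu=on is active and working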
If so, are there any specific workarounds needed?
Hi,
So after months of having issues with NVIDIA, EGL Wayland and Looking Glass, none of which seem to be fixable for various reasons, I'm trying to see if I can replace that card with an AMD equivalent. As such, I wonder if there's a card I can fit into my system that would be like the GT 1030 (my host GPU): require no extra power and have a similar size, since I can't really fit much else (3090 guest GPU, so space and power are an issue). Thanks in advance.
Hey guys. I'm a sysadmin with some Linux experience, but I'm not 100% sure I know what I'm trying to do.
I have a headless Ubuntu server running Plex and several game servers like Minecraft. I want to add a GPU to it and carve off some resources to a VM that my daughter can stream games from to her low-spec laptop. Ideally I would just install Steam and her games on the VM, and she would use Steam In-Home Streaming to play. I do basically the exact same thing, but I stream from the host OS of my main computer (a headless, rack-mounted Windows 10 box with whatever combination of good components I can cobble together at the time; works well for me).
I've never tried it before, and I don't know what I don't know on this. Google has been leading me down several contradictions and insider lingo I don't understand. Any intro tips? Or a yes/no on whether my plan will even work?
So, I have been looking into building a new PC for GPU passthrough. I have been researching for a while and already asked for help with the build on a Spanish website called "PC Componentes", where you buy electronics and can have PCs built. I intend to use this PC with Linux as the main OS and Windows under the hood.
After some help from the website's consultants I got a working build that should be fine for passthrough, though I would still like your input: I checked that the CPU has IOMMU support, but I'm not so sure about the motherboard, even after researching for a while on some IOMMU compatibility pages.
The build is as follows:
- Socket: Intel LGA 1700
- CPU: Intel Core i9-14900K 3.2/6GHz Box
- Motherboard: ASUS PRIME Z790-P WIFI
- RAM: Corsair Vengeance DDR5 6400MHz PC5-51200 32GB 2x16GB CL32 Black
- Case: Forgeon Mithril ARGB Mesh Case ATX Black
- Liquid cooling: MSI MAG CORELIQUID M360 ARGB 360mm AIO, Black
- Power supply: Corsair RMe Series RM1000e 1000W 80 Plus Gold Modular
- GPU: Zotac Gaming GeForce RTX 4070 SUPER Twin Edge 12GB GDDR6X DLSS3
- SSD: WD Black SN770 2TB NVMe PCIe 4.0 M.2 Gen4 16GT/s (5150 MB/s)
And that is the build; it's within my budget of 1500-2500 €.
I went with this website because it is a highly trusted and well-known place to get a working PC in my country, and because I'm really bad at truly understanding some hardware details, even after trying for many months; that's why I got consultants to help me. Also, I don't see myself physically building a PC from parts bought in different places, even if many would tell me it's easy. That's why I went to this site in the first place - so at least I'd get a working PC and could do the OS installation and all the other software setup myself (which I will, as I'm really looking forward to it).
But I understand that those consultants could be selling me something that may not ultimately fit my needs, so I came here to ask for opinions: is there anything wrong with the build, does it lack anything it may need, or is there anything that would help with the passthrough?
I expect that in the very beginning I will only require more than 24 GB of VRAM for LLMs, though I want to explore all aspects of ML and AI, and I have read that system RAM being twice the VRAM is a good target.
Development and working with AI is the main priority of this build, though I would like to game sometimes, so I am awaiting the release of the 9950X3D, hoping for the best of both. This will also be my first VFIO system, and the Proxmox host, plus ideally another very light always-running VM for basic web browsing and chat, will require a small amount of RAM - I assume 2-4 GB total?
I have read there are many issues getting 128 GB (4 x 32) stable on AM5. Should I instead opt for 96 GB (2 x 48), which seems to be more stable at higher frequencies, or will I hit some kind of bottleneck for my use case and end up wishing I had tried to make 128 GB work?
ECC support is also limited on AM5, and it seems to come at the price of speed, so I am unsure about it.
How important is the frequency of the DDR5?
The ASUS ProArt Creator B650 seems to be the most widely recommended board for VFIO due to its IOMMU grouping, but I am open to the X670 or even X870 versions, or any other suggestions.
I currently own an ASUS B550 motherboard, which has been great except that I can't figure out how to pass through my PCIe USB-C card because it is in a group with several other devices. I have recently been looking at getting an ASUS TUF X570-Pro motherboard, but I haven't found any consistent information on the quality of its IOMMU groupings, and have read that both GPU slots end up in the same group. Has anyone had good success with other AM4 boards? Willing to try different brands if it yields good results.
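For comparing boards, the usual script for dumping IOMMU groups (run it on whatever board you're evaluating):

#!/bin/bash
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done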
I am asking this to collect some information on what works for people and also how it works. What are the configurations that work for you? What is your display manager, DE, display server, your GPUs, and what method do you use to unbind the desired GPU from its driver, etc.?
edit: without restarting your display manager
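For context when comparing answers, here's a hedged sketch of the bare sysfs rebind most of these setups boil down to (the address and the current driver are examples; whatever compositor or display manager owns the card has to release it first):

modprobe vfio-pci
echo 0000:01:00.0 > /sys/bus/pci/drivers/amdgpu/unbind          # release from the host driver
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe                  # vfio-pci claims it
# reverse on VM shutdown: unbind from vfio-pci, clear driver_override, drivers_probe again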