Still getting an error on Unraid #5
Closed
opened 4 years ago by mattcrum · 3 comments
Thanks a ton for your work on this! I'm no networking wizard, but I've been dabbling with Unraid with a Radeon VII. I installed it and I'm still getting an error about an Unknown PCI header type '127' (the standard way the bug presents itself). If I suspend the server for a second, it boots up as usual. Any ideas?
I've got an Asus X570 Prime Pro, a Ryzen 9 3950X, a Radeon VII, and 64 GB of RAM running Unraid.
From the logs it seems that everything went fine during the guest shutdown (at least, the Radeon VII GPU was disabled gracefully without any problems).
Could you please explain what exactly you are experiencing after you shut down the guest VM?
That's odd: for me, the standard way the reset bug manifests is this. Without any workarounds, if I shut down the guest VM and then try to start it again (or if I restart the guest VM), the host becomes unresponsive and halts completely, and the only way out is to press the hardware "reset" button on the PC case.
Could it be that we're talking about different bugs?
I searched for that error message (e.g. https://www.reddit.com/r/VFIO/comments/d2j7o1/the_old_error_internal_error_unknown_pci_header/ ) and it seems that there are indeed two different reset bugs.
After reading these discussions, I'm now under the impression that "ordinary" Radeon GPUs have the "ordinary" reset bug I described, while Navi-series GPUs have something more complex, which can manifest both as the "Unknown PCI Header" error and as an unresponsive host after a guest boot attempt. My workaround is designed to help with the unresponsive host; gracefully disabling the GPU does not seem to be enough to fix the "Unknown PCI Header" error. It may be impossible to work around that error inside the guest, in which case you'll still need to apply some patches to the host.
However, if after patching your host to fix the "Unknown PCI header" error you run into "host unresponsive" problems, my workaround will help with those.
I'm sorry, but the solution for Navi cards seems to be more complex than for other Radeon cards, and my workaround alone apparently cannot solve it.
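For context on the number in "Unknown PCI header type '127'": a plausible explanation, not confirmed in this thread, is that when a GPU fails to come back after a reset, reads of its PCI configuration space return all ones (0xFF). The header-type byte at config offset 0x0E uses bit 7 as a multi-function flag and the low 7 bits as the layout (0 = normal device, 1 = PCI-to-PCI bridge, 2 = CardBus bridge), so 0xFF masked with 0x7F gives 127, which is not a valid layout. The Python sketch below only illustrates that decoding; the function names are hypothetical, and it assumes the message comes from libvirt (which Unraid uses to manage VMs) masking off the multi-function bit before checking the layout.

```python
# Sketch: how an all-ones config-space read decodes to "header type 127".
# Assumption: an unresponsive device returns 0xFF for every config byte read.

PCI_HEADER_TYPE_MASK = 0x7F        # low 7 bits: header layout
PCI_HEADER_TYPE_MULTI_FUNC = 0x80  # bit 7: multi-function device flag

# Layouts defined for the header-type field.
KNOWN_LAYOUTS = {0x00: "normal", 0x01: "PCI-to-PCI bridge", 0x02: "CardBus bridge"}


def decode_header_type(raw: int) -> tuple[int, bool]:
    """Split the raw header-type byte into (layout, multi-function flag)."""
    return raw & PCI_HEADER_TYPE_MASK, bool(raw & PCI_HEADER_TYPE_MULTI_FUNC)


def describe(raw: int) -> str:
    """Return a human-readable description, or an error string for unknown layouts."""
    layout, multi = decode_header_type(raw)
    name = KNOWN_LAYOUTS.get(layout)
    if name is None:
        return f"Unknown PCI header type '{layout}'"
    return f"{name} (multi-function={multi})"


if __name__ == "__main__":
    print(describe(0x80))  # healthy multi-function device (e.g. GPU plus HDMI audio)
    print(describe(0xFF))  # all-ones read from a device that did not come back
```

Run as a script, the second line prints `Unknown PCI header type '127'`, matching the error reported above.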
I have indeed noticed a general improvement in reliability! I'll keep the workarounds in place. Thanks a ton for the quick response. There's a kernel-patched version of Unraid that I've been really hesitant to try, but it seems like it may be worth it now. Cheers!