* [Bug 200101] random freeze under load
[not found] <bug-200101-28872@https.bugzilla.kernel.org/>
@ 2020-06-12 6:15 ` bugzilla-daemon
2020-06-27 8:21 ` bugzilla-daemon
1 sibling, 0 replies; 2+ messages in thread
From: bugzilla-daemon @ 2020-06-12 6:15 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=200101
--- Comment #3 from Garry Filakhtov (filakhtov@gmail.com) ---
Struggling with the same issue. Also coming from Gentoo 👋 lekto!
This was a long time coming; I needed a lot of time to make sure there were no
hardware issues or any kind of misconfiguration on my end before reporting
here.
I have an Intel X299 platform and use it to run a Windows 10 virtual machine
with PCI pass-through. I pass through an NVMe SSD (Samsung 970 EVO Plus), a
PCIe USB 3.0 adapter (StarTech PEXUSB3S3GE) and a GPU (nVidia GeForce GTX 1650)
to get the best possible performance and isolation from the host OS.
I had been running the 4.19 LTS kernel without any issues, but when 5.4 LTS was
promoted to stable for the AMD64 architecture, I switched. Since then I have
been experiencing random guest freezes, which happen anywhere from immediately
after boot to after multiple hours of usage. When a freeze occurs, the guest
completely stops responding to input, ping, etc. The host works fine and I can
connect to the qemu socket without any problems. I am running QEMU 4.2.0.
A freeze lasts anywhere from 1 to 5 minutes; the VM eventually recovers and
works properly until the next freeze.
Inspecting dmesg or journalctl on the host reveals no relevant entries.
The problem appears regardless of the workload. The guest can freeze on the
desktop, in the web browser or during a GPU benchmark. When I was playing music
on the system, the sound started to drop out/glitch just before a freeze and
then went completely silent.
The Windows Event Viewer is, of course, about as useful as a fridge at the North
Pole before climate change :D (pardon my pun): no entries are produced during
the freeze, and there is a gap between the written entries for however long the
freeze lasted.
So far, I have tested a good variety of kernel versions:
[1] linux-4.19.120-gentoo <- works fine
[2] linux-4.20.17-gentoo <- works fine
[3] linux-5.0.0-gentoo <- randomly freezes as described
[4] linux-5.0.21-gentoo <- randomly freezes as described
[5] linux-5.1.21-gentoo <- can't even boot the guest; it freezes during very early boot
[6] linux-5.2.20-gentoo <- qemu won't even start, complaining about KVM suberror 1
[7] linux-5.3.18-gentoo <- randomly freezes as described
[8] linux-5.4.38-gentoo <- randomly freezes as described
My takeaway is that something went wrong in 5.0.0 and has not been fixed since.
I have not yet tried to bisect the git source, but might give it a go, time
permitting.
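A bisect between the last working and the first freezing release could be
started roughly as follows. This is only a sketch: it assumes a mainline kernel
git tree and that the vanilla tags v4.20 and v5.0 behave like the tested
-gentoo builds, which has not been verified. The `run` helper, the
`start_bisect` function and the `DRY_RUN` variable are illustrative names.

```shell
#!/bin/sh
# Sketch: start a bisect between the last good and first bad releases.
# Assumption: vanilla v4.20 is good and vanilla v5.0 is bad; only the
# -gentoo builds listed above were actually tested.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_bisect <path-to-linux-git-tree>
start_bisect() {
    run git -C "$1" bisect start
    run git -C "$1" bisect bad v5.0
    run git -C "$1" bisect good v4.20
}
```

After each step, the checked-out kernel would be built and booted and the guest
exercised, then marked with `git bisect good` or `git bisect bad`. Because the
freezes are random, a kernel should probably run for several hours before it is
marked good.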
I am using a bare qemu-system-x86_64 command to rule out virt-manager problems.
The PCIe devices are attached via separate pcie-root-port devices. I boot with
OVMF UEFI (sys-firmware/edk2-ovmf-201905) with Secure Boot enabled (disabling
Secure Boot makes no difference). I have also done a clean Windows 10 install
to rule out any issues with the guest OS itself, but the problem persisted. I
have tried the Windows-provided GPU drivers as well as the latest from nVidia.
I am using the "host" CPU model for qemu.
A similar problem has been reported on Reddit too; the solution there was to
downgrade:
https://www.reddit.com/r/VFIO/comments/b1xx0g/windows_10_qemukvm_freezes_after_50x_kernel_update/
Host hardware:
Motherboard: ASUS WS X299 SAGE
CPU: Intel i9-9940x
Guest GPU: nVidia GTX 1650
Host GPU: AMD Radeon PRO WX 3100
RAM: 64GB (4x16GB) DDR4 2666MHz
SSD: Samsung 970 EVO Plus
PCIe adapter: StarTech PEXUSB3S3GE 3xUSB3.0 + USB Realtek Gigabit network combo adapter
Guest OS: Windows 10 Professional (1909)
QEMU version: 4.2.0
qemu options used:
-name Microsoft Windows 10 Professional
-M q35,kernel_irqchip=on,vmport=off,accel=kvm,mem-merge=off
-nodefaults
-display none
-vga none
-net none
-nographic
-monitor unix:/run/qemu/win10.sock,server,nowait
-pidfile /run/qemu/win10.pid
-cpu host,kvm=off
-smp sockets=1,cores=6,threads=2
-m size=16G
-drive if=pflash,format=raw,readonly,file=/usr/share/edk2-ovmf/OVMF_CODE.secboot.fd
-drive if=pflash,format=raw,file=/usr/share/edk2-ovmf/OVMF_VARS.secboot.fd
-rtc base=localtime
-device pcie-root-port,id=port0.0,bus=pcie.0,chassis=0,slot=0,addr=1.0
-device vfio-pci,host=19:0.0,multifunction=on,bus=port0.0,addr=0.0
-device vfio-pci,host=19:0.1,bus=pcie.0,bus=port0.0,addr=0.1
-device pcie-root-port,id=port0.2,bus=pcie.0,chassis=0,slot=2
-device vfio-pci,host=1a:0.0,bus=port0.2
-device pcie-root-port,id=port0.5,bus=pcie.0,chassis=0,slot=5
-device vfio-pci,host=b3:0.0,bus=port0.5
I will try lekto's suggestion and report back any progress.
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 200101] random freeze under load
From: bugzilla-daemon @ 2020-06-27 8:21 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=200101
--- Comment #4 from Garry Filakhtov (filakhtov@gmail.com) ---
Okay, I have played with this a bit further. I managed to get freezes on
linux-4.19.120-gentoo as well, after using CPU pinning together with the RR
scheduling policy at priority 1 for all vCPU threads.
After reverting commit 47c61b3955cf712cadfc25635bf9bc174af030ea, the system
seems to work without freezing. I will continue testing and post updates as I
get more information.
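The comment does not say how the pinning and scheduling policy were applied.
One possible way to reproduce that setup is sketched below, with loudly stated
assumptions: QEMU is started with `-name ...,debug-threads=on` so that vCPU
threads are named "CPU <n>/KVM", and vCPU n is pinned to host CPU
first-host-cpu + n. The `run` helper, the `pin_vcpus` function and `DRY_RUN`
are illustrative names.

```shell
#!/bin/sh
# Sketch: pin every vCPU thread of a QEMU process to one host CPU and switch
# it to SCHED_RR priority 1.
# Assumptions (not from the report): vCPU threads are named "CPU <n>/KVM"
# (requires -name ...,debug-threads=on) and vCPU n goes to host CPU
# first-host-cpu + n.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pin_vcpus <qemu-pid> <first-host-cpu> [proc-root]
pin_vcpus() {
    pid=$1
    first_cpu=$2
    proc=${3:-/proc}
    for task in "$proc/$pid"/task/*; do
        [ -r "$task/comm" ] || continue
        name=$(cat "$task/comm")
        case $name in
            "CPU "*/KVM) ;;      # vCPU thread, handled below
            *) continue ;;       # I/O, worker and main threads are skipped
        esac
        vcpu=${name#"CPU "}      # "CPU 3/KVM" -> "3/KVM"
        vcpu=${vcpu%/KVM}        # "3/KVM"     -> "3"
        tid=${task##*/}          # thread ID is the directory name
        run taskset -cp "$((first_cpu + vcpu))" "$tid"
        run chrt -r -p 1 "$tid"
    done
}
```

With the pidfile from the options above, this could be invoked as root as
`pin_vcpus "$(cat /run/qemu/win10.pid)" 2`; setting `DRY_RUN=1` first only
prints the `taskset` and `chrt` commands.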