public inbox for kvm@vger.kernel.org
From: bugzilla-daemon@bugzilla.kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 200101] random freeze under load
Date: Fri, 12 Jun 2020 06:15:24 +0000
Message-ID: <bug-200101-28872-Pc7HUPDomr@https.bugzilla.kernel.org/>
In-Reply-To: <bug-200101-28872@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=200101

--- Comment #3 from Garry Filakhtov (filakhtov@gmail.com) ---
Struggling with the same issue. Also coming from Gentoo 👋 lekto!

This has been a long time coming; I needed a lot of time to make sure there
were no hardware issues or misconfiguration on my end before reporting
here.

I have an Intel X299 platform and use it to run a Windows 10 virtual machine
with PCI pass-through. I pass through an NVMe SSD (Samsung 970 EVO Plus), a
PCIe USB 3.0 adapter (StarTech PEXUSB3S3GE) and a GPU (nVidia GeForce GTX 1650)
to get the best possible performance and isolation from the host OS.

I had been running the 4.19 LTS kernel without any issues, but 5.4 LTS got
promoted to stable for the AMD64 architecture, so I switched. After doing so,
I started experiencing random guest freezes, happening anywhere from
immediately after boot to several hours into a session. When a freeze occurs,
the guest completely stops responding to input, ping, etc. The host keeps
working fine and I can connect to the qemu monitor socket without any problems.
I am running QEMU 4.2.0.

A freeze lasts anywhere from 1 to 5 minutes; the VM eventually recovers and
works properly until the next freeze. Inspecting dmesg or journalctl on the
host reveals no relevant entries.
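
Since the host stays responsive and the monitor socket is reachable, the guest
state could be inspected while a freeze is in progress. The following is only a
sketch: it assumes socat is installed on the host, and the helper name hmp and
the QEMU_MONITOR_SOCK variable are hypothetical. The default socket path is the
one from the -monitor option in this report.

```shell
# Send one human-monitor (HMP) command to the QEMU monitor socket and
# print the reply. Assumption: socat is available on the host.
# QEMU_MONITOR_SOCK is a hypothetical override; the default path is the
# one passed to -monitor in this report.
hmp() {
    printf '%s\n' "$1" |
        socat - "UNIX-CONNECT:${QEMU_MONITOR_SOCK:-/run/qemu/win10.sock}"
}

# Example use during a freeze:
#   hmp 'info status'   # is the VM reported as running or paused?
#   hmp 'info cpus'     # per-vCPU state
```

Capturing this output once while the guest is frozen and once while it is
healthy gives something concrete to compare.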

The problem appears regardless of the workload. The guest can freeze on the
desktop, in the web browser or in a GPU benchmark. When I was playing music
on the system, just before a freeze the sound started to drop out/glitch and
then went completely silent.

Windows Event Viewer is of course as useful as a fridge at the North Pole
before climate change :D (pardon the pun), meaning no entries are produced
during the freeze, and there is actually a gap between written entries for
however long the freeze lasted.

So far, I have tested a good variety of kernel versions:

  [1]   linux-4.19.120-gentoo <- works fine
  [2]   linux-4.20.17-gentoo <- works fine
  [3]   linux-5.0.0-gentoo <- randomly freezes as described
  [4]   linux-5.0.21-gentoo <- randomly freezes as described
  [5]   linux-5.1.21-gentoo <- can't even boot guest, getting freeze during
        very early boot
  [6]   linux-5.2.20-gentoo <- qemu won't even start, complaining about KVM
        suberror 1
  [7]   linux-5.3.18-gentoo <- randomly freezes as described
  [8]   linux-5.4.38-gentoo <- randomly freezes as described

My takeaway is that something broke between 4.20 and 5.0.0 and has not been
fixed since.

I have not yet tried to bisect the git tree, but might give it a go, time
permitting.
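
A bisect based on the results above could start roughly as follows. This is a
sketch under assumptions: KERNEL_TREE is a hypothetical path to a mainline
kernel clone, the helper name start_bisect is made up, and the mainline tag
v4.20 is assumed to behave like the tested 4.20.17 (good) and v5.0 like the
tested 5.0.0 (bad).

```shell
# Hypothetical helper: KERNEL_TREE is an assumed path to a mainline
# kernel clone; it defaults to the current directory.
start_bisect() {
    # Syntax is "git bisect start <bad> <good>".
    # v5.0 froze in testing; v4.20 is assumed good (only 4.20.17 was tested).
    git -C "${KERNEL_TREE:-.}" bisect start v5.0 v4.20
}

# After building and booting each candidate kernel and running the guest:
#   git bisect good   # no freeze after a sufficiently long run
#   git bisect bad    # guest froze
#   git bisect skip   # kernel could not be tested at all
```

Because freezes can take hours to appear, a kernel should only be marked good
after a run noticeably longer than the longest freeze-free period seen so far.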

I am using a bare qemu-system-x86_64 command to rule out virt-manager problems.
PCIe devices are attached via separate pcie-root-port devices. I boot with OVMF
UEFI (sys-firmware/edk2-ovmf-201905) with Secure Boot enabled (disabling
Secure Boot makes no difference). I have also done a clean Windows 10 install
to rule out any issues with the guest OS itself, but the problem persisted. I
have tried the Windows-provided GPU drivers as well as the latest from nVidia.
I am using the "host" CPU model for qemu.

There is a similar problem reported on Reddit too; the solution there was to
downgrade:
https://www.reddit.com/r/VFIO/comments/b1xx0g/windows_10_qemukvm_freezes_after_50x_kernel_update/

Host hardware:
Motherboard: ASUS WS X299 SAGE
CPU: Intel i9-9940X
Guest GPU: nVidia GTX 1650
Host GPU: AMD Radeon PRO WX 3100
RAM: 64 GB (4x16 GB) DDR4 2666 MHz
SSD: Samsung 970 EVO Plus
PCIe adapter: StarTech PEXUSB3S3GE 3x USB 3.0 + USB Realtek Gigabit network combo adapter
Guest OS: Windows 10 Professional (1909)
QEMU version: 4.2.0

qemu options used:
-name "Microsoft Windows 10 Professional"
-M q35,kernel_irqchip=on,vmport=off,accel=kvm,mem-merge=off
-nodefaults
-display none
-vga none
-net none
-nographic
-monitor unix:/run/qemu/win10.sock,server,nowait
-pidfile /run/qemu/win10.pid
-cpu host,kvm=off
-smp sockets=1,cores=6,threads=2
-m size=16G
-drive if=pflash,format=raw,readonly,file=/usr/share/edk2-ovmf/OVMF_CODE.secboot.fd
-drive if=pflash,format=raw,file=/usr/share/edk2-ovmf/OVMF_VARS.secboot.fd
-rtc base=localtime
-device pcie-root-port,id=port0.0,bus=pcie.0,chassis=0,slot=0,addr=1.0
-device vfio-pci,host=19:0.0,multifunction=on,bus=port0.0,addr=0.0
-device vfio-pci,host=19:0.1,bus=pcie.0,bus=port0.0,addr=0.1
-device pcie-root-port,id=port0.2,bus=pcie.0,chassis=0,slot=2
-device vfio-pci,host=1a:0.0,bus=port0.2
-device pcie-root-port,id=port0.5,bus=pcie.0,chassis=0,slot=5
-device vfio-pci,host=b3:0.0,bus=port0.5
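
One host-side sanity check for the vfio-pci devices above is to confirm which
kernel driver each one is bound to. The sketch below reads the driver symlink
from sysfs. The helper name pci_driver and the SYSFS_ROOT override are
hypothetical, and the 0000 PCI domain prefix is an assumption. Note that sysfs
writes the QEMU address 19:0.0 as 19:00.0.

```shell
# Print the kernel driver bound to a PCI device (e.g. "vfio-pci"), or
# "none" if no driver is bound.
# SYSFS_ROOT is a hypothetical override for testing; the 0000 PCI domain
# prefix is an assumption about this host.
pci_driver() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/0000:$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example for the devices passed to the guest in this report:
#   for dev in 19:00.0 19:00.1 1a:00.0 b3:00.0; do
#       printf '%s %s\n' "$dev" "$(pci_driver "$dev")"
#   done
```

While the guest is running, each of these would be expected to report vfio-pci.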

I will try lekto's suggestion and report back on any progress.
