From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: kvm@vger.kernel.org, stable@vger.kernel.org
Subject: Re: Regression in v4.14.94 by "x86,kvm: move qemu/guest FPU switching out to vcpu_run"
Date: Mon, 28 Jan 2019 11:53:05 -0800
Message-ID: <20190128195304.GA20466@linux.intel.com>
In-Reply-To: <457d0666-1951-1b7c-f7e8-18c67763e6c3@gmail.com>
On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
> I run a qemu/kvm VM with debian and I've started getting segfaults and failing checksums on
> downloaded files. The failures are nondeterministic and similar to the failures you get with
> bad ram. I tried to diagnose the problem with various testing tools and found that
> "stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors,
> usually within 60 sec:
>
> stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
> stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
>
> Nothing relevant has changed recently in the VM but the host kernel was upgraded from
> 4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
> is only one kvm related change in that range so I tried to revert that one.
>
> By reverting commit 4124a4cff344abbf8187775eb643d9827830e715
> "x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96 I can't reproduce
> the stress-ng error and I have no segfault or other problems with the guest.
This is the second report of this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=202419
Upon inspection, the commit in question is obviously buggy:
kvm_arch_vcpu_ioctl_run() doubles up on kvm_{load,put}_guest_fpu().
The order of the mainline commits
f775b13eedee ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
and
5663d8f9bbe4 ("kvm: x86: fix WARN due to uninitialized guest FPU state")
was reversed when they were backported to 4.14. Commit 5663d8f9bbe4 even explicitly
notes that it fixes f775b13eedee. I'll send a patch.
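To illustrate why doubling up on the load/put calls can corrupt both userspace
and guest FPU state, here is a toy user-space model in C. It is only a sketch
and not kernel code: struct toy_vcpu, fpregs, toy_load_guest_fpu(),
toy_put_guest_fpu() and toy_run() are hypothetical names, a single int stands
in for the whole FPU register file, and the swaps are assumed to be unguarded.
The exact call sequence in the affected 4.14 releases may differ from the
simple "call each helper twice" pattern modeled here.

```c
#include <assert.h>

/* Toy model only, not kernel code: one int stands in for a whole FPU register file. */
struct toy_vcpu {
    int user_fpu;   /* slot for the saved userspace (QEMU) state */
    int guest_fpu;  /* slot for the saved guest state */
};

static int fpregs;  /* the single set of "hardware" registers */

/* Unguarded swap: save the live registers as userspace state, load guest state. */
static void toy_load_guest_fpu(struct toy_vcpu *v)
{
    v->user_fpu = fpregs;
    fpregs = v->guest_fpu;
}

/* Unguarded swap back: save the live registers as guest state, load userspace state. */
static void toy_put_guest_fpu(struct toy_vcpu *v)
{
    v->guest_fpu = fpregs;
    fpregs = v->user_fpu;
}

/*
 * One simulated run: userspace holds user_val, the guest resumes from
 * guest_val and computes guest_new. Each of load/put is called `calls` times
 * (1 = paired correctly, 2 = doubled up). Returns the final vcpu slots and
 * writes the registers userspace sees afterwards to *user_seen.
 */
static struct toy_vcpu toy_run(int calls, int user_val, int guest_val,
                               int guest_new, int *user_seen)
{
    struct toy_vcpu v = { .user_fpu = 0, .guest_fpu = guest_val };
    int i;

    assert(calls >= 1);
    fpregs = user_val;               /* userspace state is live on entry */
    for (i = 0; i < calls; i++)
        toy_load_guest_fpu(&v);
    fpregs = guest_new;              /* guest executes and changes its registers */
    for (i = 0; i < calls; i++)
        toy_put_guest_fpu(&v);
    *user_seen = fpregs;
    return v;
}
```

With toy_run(1, 1, 2, 3, &seen), userspace gets its own value 1 back and the
saved guest state is the new value 3. With toy_run(2, 1, 2, 3, &seen), the
second load saves guest state into the userspace slot and the second put
overwrites the fresh guest state, so userspace sees 2 and the guest slot holds
the stale value 2. In this model, that is the kind of silent state mix-up that
would show up as wrong floating-point results in the guest.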
>
> The commit was originally introduced in v4.15-rc3 (Nov 14 2017) and was only recently
> backported to 4.14. The other stable kernels before 4.14 didn't get any backport so it looks
> like a broken 4.14 backport. That backport also causes problems for other people.
> https://bugzilla.kernel.org/show_bug.cgi?id=202419
>
> I've rebooted between the different kernels and rebooted the VM enough to be reasonably sure
> that commit is the problem. Stress-ng never lasts more than 10 min with that commit but works
> for hours without it.
>
> Steps to reproduce would be to create a qemu/kvm VM with debian stretch, install stress-ng
> version 0.07.16 and run "stress-ng --verify --cpu 1".
>
> Here is the qemu-3.1.0 commandline generated by libvirt:
> /usr/bin/qemu-system-x86_64 -name guest=debian,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-debian/master-key.aes
> -machine pc-i440fx-2.4,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 2048
> -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
> 0473ded4-d417-4b0e-a4f5-36ba5a2cd675 -no-user-config -nodefaults -chardev
> socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
> -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown
> -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on
> -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive
> if=none,id=drive-ide0-0-1,readonly=on -device
> ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1,bootindex=2 -drive
> file=/mnt/gemini.61rn.3T/Backups/debian.raw,format=raw,if=none,id=drive-virtio-disk0 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=23,id=hostnet0 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x7
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -object
> rng-random,id=objrng0,filename=/dev/random -device
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
>
> My host kernel .config is big so I put it in a paste: http://sprunge.us/u7YNBt