From: Sasha Levin <sashal@kernel.org>
To: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: kvm@vger.kernel.org, stable@vger.kernel.org
Subject: Re: Regression in v4.14.94 by "x86,kvm: move qemu/guest FPU switching out to vcpu_run"
Date: Mon, 28 Jan 2019 15:14:53 -0500
Message-ID: <20190128201453.GM3973@sasha-vm>
In-Reply-To: <457d0666-1951-1b7c-f7e8-18c67763e6c3@gmail.com>

On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
>I run a qemu/kvm VM with Debian and I've started getting segfaults and failing checksums on
>downloaded files. The failures are nondeterministic and similar to the failures you get with
>bad RAM. I tried to diagnose the problem with various testing tools and found that
>"stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors,
>usually within 60 sec:
>
> stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
> stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
>
>Nothing relevant has changed recently in the VM but the host kernel was upgraded from
>4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
>is only one kvm related change in that range so I tried to revert that one.
>
>By reverting commit 4124a4cff344abbf8187775eb643d9827830e715
>"x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96 I can't reproduce
>the stress-ng error and I have no segfault or other problems with the guest.
>
>The commit was originally introduced in v4.15-rc3 (Nov 14 2017) and was only recently
>backported to 4.14. The other stable kernels before 4.14 didn't get any backport so it looks
>like a broken 4.14 backport. That backport also causes problems for other people.
>https://bugzilla.kernel.org/show_bug.cgi?id=202419
>
>I've rebooted between the different kernels and rebooted the VM enough to be reasonably sure
>that commit is the problem. Stress-ng never lasts more than 10 min with that commit but works
>for hours without it.
>
>Steps to reproduce would be to create a qemu/kvm VM with Debian stretch, install stress-ng
>version 0.07.16 and run "stress-ng --verify --cpu 1".
>
>Here is the qemu-3.1.0 commandline generated by libvirt:
>/usr/bin/qemu-system-x86_64 -name guest=debian,debug-threads=on -S -object
>secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-debian/master-key.aes
>-machine pc-i440fx-2.4,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 2048
>-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
>0473ded4-d417-4b0e-a4f5-36ba5a2cd675 -no-user-config -nodefaults -chardev
>socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
>-rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown
>-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on
>-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device
>ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device
>ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device
>ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive
>if=none,id=drive-ide0-0-1,readonly=on -device
>ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1,bootindex=2 -drive
>file=/mnt/gemini.61rn.3T/Backups/debian.raw,format=raw,if=none,id=drive-virtio-disk0 -device
>virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>-netdev tap,fd=23,id=hostnet0 -device
>virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 -spice
>port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
>VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x7
>-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -object
>rng-random,id=objrng0,filename=/dev/random -device
>virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox
>on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
>
>My host kernel .config is big so I put it in a paste: http://sprunge.us/u7YNBt

Interesting, thank you for the report.

Could you confirm whether this issue reproduces on a newer kernel that
has that patch (4.19.18 for example)?
--
Thanks,
Sasha
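
The comparison requested above can be scripted inside the guest. What follows is a minimal sketch and not part of the original thread: the function name `run_repro`, the `STRESS_CMD` and `REPRO_LOG` variables, and the 10-minute timeout are assumptions chosen for illustration (the timeout mirrors the report that stress-ng never lasts more than 10 min on an affected host). It also assumes that stress-ng exits with a non-zero status when `--verify` detects an error.

```shell
#!/bin/sh
# Sketch: run the reproducer from the report inside the guest and log the result.
# STRESS_CMD, REPRO_LOG and run_repro are hypothetical names for illustration.
STRESS_CMD=${STRESS_CMD:-"stress-ng --verify --cpu 1 --timeout 10m"}
REPRO_LOG=${REPRO_LOG:-./repro.log}

# $1: label for the host kernel under test, e.g. "4.14.96" or "4.19.18".
run_repro() {
    host_kernel=$1
    # Left unquoted on purpose so the command string splits into words.
    if $STRESS_CMD >/dev/null 2>&1; then
        result=PASS
    else
        result=FAIL
    fi
    # Print the outcome and append it to the log for later comparison.
    echo "$host_kernel $result" | tee -a "$REPRO_LOG"
}
```

After booting the host into the kernel to be tested and starting the guest, calling `run_repro 4.19.18` in the guest would append one line to the log, so results from several host kernels can be compared side by side.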