From: Sasha Levin
Subject: Re: Regression in v4.14.94 by "x86,kvm: move qemu/guest FPU switching out to vcpu_run"
Date: Mon, 28 Jan 2019 15:14:53 -0500
Message-ID: <20190128201453.GM3973@sasha-vm>
References: <457d0666-1951-1b7c-f7e8-18c67763e6c3@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Cc: kvm@vger.kernel.org, stable@vger.kernel.org
To: Thomas Lindroth
Content-Disposition: inline
In-Reply-To: <457d0666-1951-1b7c-f7e8-18c67763e6c3@gmail.com>
Sender: stable-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
>I run a qemu/kvm VM with debian and I've started getting segfaults and failing checksums on
>downloaded files. The failures are nondeterministic and similar to the failures you get with
>bad RAM. I tried to diagnose the problem with various testing tools and found that
>"stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors,
>usually within 60 sec:
>
>    stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
>    stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
>
>Nothing relevant has changed recently in the VM but the host kernel was upgraded from
>4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
>is only one kvm related change in that range so I tried to revert that one.
>
>By reverting commit 4124a4cff344abbf8187775eb643d9827830e715
>"x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96 I can't reproduce
>the stress-ng error and I have no segfault or other problems with the guest.
>
>The commit was originally introduced in v4.15-rc3 (Nov 14 2017) and was only recently
>backported to 4.14. The other stable kernels before 4.14 didn't get any backport so it looks
>like a broken 4.14 backport. That backport also causes problems for other people.
>https://bugzilla.kernel.org/show_bug.cgi?id=202419
>
>I've rebooted between the different kernels and rebooted the VM enough to be reasonably sure
>that commit is the problem. Stress-ng never lasts more than 10 min with that commit but works
>for hours without it.
>
>Steps to reproduce would be to create a qemu/kvm VM with debian stretch, install stress-ng
>version 0.07.16 and run "stress-ng --verify --cpu 1".
>
>Here is the qemu-3.1.0 commandline generated by libvirt:
>/usr/bin/qemu-system-x86_64 -name guest=debian,debug-threads=on -S -object
>secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-debian/master-key.aes
>-machine pc-i440fx-2.4,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 2048
>-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
>0473ded4-d417-4b0e-a4f5-36ba5a2cd675 -no-user-config -nodefaults -chardev
>socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
>-rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown
>-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on
>-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device
>ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device
>ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device
>ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive
>if=none,id=drive-ide0-0-1,readonly=on -device
>ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1,bootindex=2 -drive
>file=/mnt/gemini.61rn.3T/Backups/debian.raw,format=raw,if=none,id=drive-virtio-disk0 -device
>virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>-netdev tap,fd=23,id=hostnet0 -device
>virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 -spice
>port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
>VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x7
>-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -object
>rng-random,id=objrng0,filename=/dev/random -device
>virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox
>on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
>
>My host kernel .config is big so I put it in a paste: http://sprunge.us/u7YNBt

Interesting, thank you for the report. Could you confirm whether this issue reproduces on a newer kernel that has that patch (4.19.18 for example)?
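For context on the two quoted stress-ng messages: with --verify, the CPU stressor checks its own results against known answers, so a wrong square root or a wrong prime count points at corrupted computation rather than a crash. The C sketch below is a hypothetical, minimal known-answer check in the same spirit. It is not stress-ng's code, and the names nr_sqrt, nr_sqrt_failures and count_primes are made up for illustration. The square root uses floating point; the prime count here uses integer arithmetic, so it only illustrates the kind of check involved, and stress-ng's own implementation may differ. On a healthy machine the sieve finds 78498 primes up to 1000000 and every Newton-Raphson root passes the relative-error test.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical known-answer self-check, loosely modelled on what
 * "stress-ng --verify --cpu" reports. NOT stress-ng's source code.
 */

/* Newton-Raphson square root: x_{n+1} = (x_n + v / x_n) / 2. */
static double nr_sqrt(double v)
{
    assert(v > 0.0);
    /* max(v, 1) is >= sqrt(v), so the iterates decrease toward the root. */
    double x = v > 1.0 ? v : 1.0;
    for (;;) {
        double next = 0.5 * (x + v / x);
        if (next >= x)      /* stopped decreasing: converged to ~1 ulp */
            return x;
        x = next;
    }
}

/* Count v in [1, n] whose computed root fails a relative-error check.
 * Expected to be 0 when floating-point state is intact. */
static long nr_sqrt_failures(long n)
{
    long failures = 0;
    for (long i = 1; i <= n; i++) {
        double v = (double)i;
        double x = nr_sqrt(v);
        double rel = (x * x - v) / v;
        if (rel < 0.0)
            rel = -rel;
        if (rel > 1e-12)
            failures++;
    }
    return failures;
}

/* Sieve of Eratosthenes: number of primes in [0, n], or -1 on OOM.
 * Known answer: count_primes(1000000) == 78498. */
static long count_primes(long n)
{
    unsigned char *composite = calloc((size_t)n + 1, 1);
    long count = 0;
    if (composite == NULL)
        return -1;
    for (long i = 2; i <= n; i++) {
        if (composite[i])
            continue;
        count++;
        if (i <= n / i)     /* avoid overflow when computing i * i */
            for (long j = i * i; j <= n; j += i)
                composite[j] = 1;
    }
    free(composite);
    return count;
}
```

Because the reported failures were intermittent (usually within 60 seconds, never surviving more than 10 minutes with the commit applied), a check like this would presumably have to run repeatedly inside the guest to have a chance of catching the problem.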