From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson
Subject: Re: Regression in v4.14.94 by "x86,kvm: move qemu/guest FPU switching out to vcpu_run"
Date: Mon, 28 Jan 2019 12:20:10 -0800
Message-ID: <20190128202010.GA21909@linux.intel.com>
References: <457d0666-1951-1b7c-f7e8-18c67763e6c3@gmail.com> <20190128201453.GM3973@sasha-vm>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Thomas Lindroth, kvm@vger.kernel.org, stable@vger.kernel.org
To: Sasha Levin
Return-path:
Content-Disposition: inline
In-Reply-To: <20190128201453.GM3973@sasha-vm>
Sender: stable-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On Mon, Jan 28, 2019 at 03:14:53PM -0500, Sasha Levin wrote:
> On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
> >I run a qemu/kvm VM with debian and I've started getting segfaults and failing checksums on
> >downloaded files. The failures are undeterministic and similar to the failures you get with
> >bad ram. I tried to diagnose the problem with various testing tools and found that
> >"stress-ng --verify --cpu 1" always give an error. Stress-ng give one of these errors
> >usually within 60 sec:
> >
> > stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
> > stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
> >
> >Nothing relevant has changed recently in the VM but the host kernel was upgraded from
> >4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
> >is only one kvm related change in that range so I tried to revert that one.
> >
> >By reverting commit 4124a4cff344abbf8187775eb643d9827830e715
> >"x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96 I can't reproduce
> >the stress-ng error and I have no segfault or other problems with the guest.

[...]

> Interesting, thank you for the report.
>
> Could you confirm whether this issue reproduces on a newer kernel that
> has that patch (4.19.18 for example)?

The bug is specific to 4.14: two dependent commits were applied in the wrong
order, which introduced the bug. I have a patch and am in the process of
typing up the changelog.