From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Derek Yerger <derek@djy.llc>
Cc: Alex Williamson <alex.williamson@redhat.com>,
kvm@vger.kernel.org, "Bonzini, Paolo" <pbonzini@redhat.com>
Subject: Re: PROBLEM: Regression of MMU causing guest VM application errors
Date: Tue, 19 Nov 2019 12:01:33 -0800
Message-ID: <20191119200133.GD25672@linux.intel.com>
In-Reply-To: <36be1503-f6f1-0ed0-b1fe-9c05d827f624@djy.llc>
On Wed, Oct 30, 2019 at 11:44:09PM -0400, Derek Yerger wrote:
>
> On 10/24/19 1:32 PM, Sean Christopherson wrote:
> >On Thu, Oct 24, 2019 at 11:18:59AM -0400, Derek Yerger wrote:
> >>On 10/22/19 4:28 PM, Sean Christopherson wrote:
> >>>On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
> >>>Heh, should've checked from the get go... It's definitely not the memslot
> >>>issue, because the memslot bug is in 5.1.16 as well. :-)
> >>I didn't pick up on that, nice catch. The memslot thread was the closest
> >>thing I could find to an educated guess.
> >>>>I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated windows
> >>>>machine /s
> >>>What hardware are you running on? I was thinking this was AMD specific,
> >>>but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".
> >>Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> >>
> >>07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> >>Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
> >> Subsystem: Gigabyte Technology Co., Ltd Device 22fe
> >> Kernel driver in use: vfio-pci
> >> Kernel modules: amdgpu
> >>(plus related audio device)
> >>
> >>I can't think of any other data points that would be helpful to solving
> >>system instability in a guest OS.
> >Can you bisect starting from v5.2? Identifying which commit in the kernel
> >introduced the regression would help immensely.
> On the host, I have to install NVIDIA GPU drivers with each new kernel
> build. During the process I discovered that I can't reproduce the issue on
> any kernel if I skip the *host* GPU drivers and start libvirtd in single
> mode.
>
> I noticed the following in the host kernel log around the time the guest
> encountered BSOD on 5.2.7:
>
> [ 337.841491] WARNING: CPU: 6 PID: 7548 at arch/x86/kvm/x86.c:7963
> kvm_arch_vcpu_ioctl_run+0x19b1/0x1b00 [kvm]
Rats, I overlooked this the first time round. In the future, if you get a
WARN splat, try to make it very obvious in the bug report; they're almost
always a smoking gun.
The WARN that fired is:
/* The preempt notifier should have taken care of the FPU already. */
WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
which was added as part of a bug fix by commit
240c35a3783a ("kvm: x86: Use task structs fpu field for user")
The buggy commit that it fixed is
5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")
which was part of an FPU rewrite that went into 5.2[*]. So yep, big
smoking gun :-)
My understanding of the WARN is that it means the kernel's FPU state is
unexpectedly loaded when entry to the KVM guest is imminent. As for *how*
the kernel's FPU state is getting loaded, no clue. But, I think it'd be
pretty easy to find the culprit by adding a debug flag into struct
thread_info that gets set in vcpu_load() and cleared in vcpu_put(),
and then WARN in set_ti_thread_flag() if the debug flag is true when
TIF_NEED_FPU_LOAD is being set. I'll put together a debugging patch later
today and send it your way.
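To make the idea concrete, here is a rough userspace model of the check, not the debugging patch itself. Everything in it is a stand-in for illustration: struct thread_info_model, the kvm_vcpu_loaded field, the model_* helpers, and the bit number used for TIF_NEED_FPU_LOAD are all hypothetical, and a real patch would hook the kernel's vcpu_load(), vcpu_put() and set_ti_thread_flag() rather than these stubs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative bit number only; NOT taken from the kernel headers. */
#define TIF_NEED_FPU_LOAD 14

/* Hypothetical stand-in for struct thread_info plus the proposed debug flag. */
struct thread_info_model {
	unsigned long flags;
	bool kvm_vcpu_loaded;    /* set between vcpu_load() and vcpu_put() */
	unsigned int warn_count; /* how many times the debug check fired */
};

/* Models vcpu_load(): from here on, nothing should set TIF_NEED_FPU_LOAD. */
static void model_vcpu_load(struct thread_info_model *ti)
{
	ti->kvm_vcpu_loaded = true;
}

/* Models vcpu_put(): setting TIF_NEED_FPU_LOAD is expected again. */
static void model_vcpu_put(struct thread_info_model *ti)
{
	ti->kvm_vcpu_loaded = false;
}

/* Models set_ti_thread_flag() with the extra debug check bolted on. */
static void model_set_ti_thread_flag(struct thread_info_model *ti, int flag)
{
	if (flag == TIF_NEED_FPU_LOAD && ti->kvm_vcpu_loaded) {
		/* In the kernel this would be a WARN so the splat names the caller. */
		ti->warn_count++;
		fprintf(stderr, "WARN: TIF_NEED_FPU_LOAD set while vCPU is loaded\n");
	}
	ti->flags |= 1UL << flag;
}
```

In this model, setting the flag between model_vcpu_load() and model_vcpu_put() bumps warn_count, while setting it outside that window does not. In a kernel build the equivalent WARN would include a stack trace pointing at whoever set the flag.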
[*] https://lkml.kernel.org/r/20190403164156.19645-1-bigeasy@linutronix.de