From: bugzilla-daemon@bugzilla.kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 206215] QEMU guest crash due to random 'general protection fault' since kernel 5.2.5 on i7-3517UE
Date: Thu, 16 Jan 2020 18:08:49 +0000
Message-ID: <bug-206215-28872-ppTHYf6H3E@https.bugzilla.kernel.org/>
In-Reply-To: <bug-206215-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=206215
--- Comment #6 from Sean Christopherson (sean.j.christopherson@intel.com) ---
On Thu, Jan 16, 2020 at 07:38:54AM -0800, Sean Christopherson wrote:
> On Wed, Jan 15, 2020 at 08:08:32PM -0500, Derek Yerger wrote:
> > On 1/15/20 4:52 PM, Sean Christopherson wrote:
> > >+cc Derek, who is hitting the same thing.
> > >
> > >On Wed, Jan 15, 2020 at 09:18:56PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > >>https://bugzilla.kernel.org/show_bug.cgi?id=206215
> > >*snip*
> > >that's a big smoking gun pointing at commit ca7e6b286333 ("KVM: X86: Fix
> > >fpu state crash in kvm guest"), which is commit e751732486eb upstream.
> > >
> > >1. Can you verify reverting ca7e6b286333 (or e751732486eb in upstream)
> > > solves the issue?
> > >
> > >2. Assuming the answer is yes, on a buggy kernel, can you run with the
> > > attached patch to try to get debug info?
> > I did these out of order since I had 5.3.11 built with the patch, ready to
> > go for weeks now, waiting for an opportunity to test.
> >
> > Win10 guest immediately BSOD'ed with:
> >
> > WARNING: CPU: 2 PID: 9296 at include/linux/thread_info.h:55
> > kernel_fpu_begin+0x6b/0xc0
>
> Can you provide the full stack trace of the WARN? I'm hoping that will
> provide a hint as to what's going wrong.
Aha! I found at least two cases where TIF_NEED_FPU_LOAD could be set
without the vCPU being preempted.
The comment on fpregs_lock() states that a softirq can also set
TIF_NEED_FPU_LOAD, and the preempt notifier would not catch that because
no preemption occurs.
/*
* Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
* A context switch will (and softirq might) save CPU's FPU registers to
^^^^^^^^^^^^^^^^^^^
* fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
* a random state.
*/
static inline void fpregs_lock(void)
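To make the softirq scenario concrete, here is a toy, single-threaded C
model. It is not kernel code, and every name in it (struct toy_cpu,
toy_kernel_fpu_use(), toy_enter_guest_unchecked()) is hypothetical; one int
stands in for the physical FPU registers and another for the task's FPU save
area. It assumes only what the comment above describes: an in-kernel FPU
user saves the live registers, sets TIF_NEED_FPU_LOAD and leaves the
registers in a random state, and no preempt notifier runs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, hypothetical names, NOT kernel code.
 * fpregs         - contents of the physical FPU registers
 * task_save_area - stands in for the task's in-memory FPU state
 * need_fpu_load  - stands in for TIF_NEED_FPU_LOAD
 */
struct toy_cpu {
	int fpregs;
	int task_save_area;
	bool need_fpu_load;
};

/*
 * A softirq (or in-task) kernel FPU user: save the live registers once,
 * set the flag, then leave scratch data in the registers.  Nothing here
 * tells KVM that its guest state was evicted.
 */
static void toy_kernel_fpu_use(struct toy_cpu *c, int scratch)
{
	if (!c->need_fpu_load) {
		c->task_save_area = c->fpregs;
		c->need_fpu_load = true;
	}
	c->fpregs = scratch;
}

/* What the guest would see if VM entry ignored the flag entirely. */
static int toy_enter_guest_unchecked(const struct toy_cpu *c)
{
	return c->fpregs;
}
```

For example, starting from `{ .fpregs = 42 }` (guest state live) and
calling `toy_kernel_fpu_use(&c, 7)`, the unchecked entry returns 7 rather
than the guest's 42, while 42 sits untouched in `task_save_area`.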
The other scenario comes from a stack trace in commit f775b13eedee
("x86,kvm: move qemu/guest FPU switching out to vcpu_run"), which clearly
shows that kernel_fpu_begin() can be invoked without KVM being preempted:
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
kernel_fpu_disable+0x3f/0x50
__kernel_fpu_begin+0x49/0x100
kernel_fpu_begin+0xe/0x10
crc32c_pcl_intel_update+0x84/0xb0
crypto_shash_update+0x3f/0x110
crc32c+0x63/0x8a [libcrc32c]
dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
node_prepare_for_write+0x44/0x70 [dm_persistent_data]
dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
submit_io+0x170/0x1b0 [dm_bufio]
__write_dirty_buffer+0x89/0x90 [dm_bufio]
__make_buffer_clean+0x4f/0x80 [dm_bufio]
__try_evict_buffer+0x42/0x60 [dm_bufio]
dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
shrink_slab.part.40+0x1f5/0x420
shrink_node+0x22c/0x320
do_try_to_free_pages+0xf5/0x330
try_to_free_pages+0xe9/0x190
__alloc_pages_slowpath+0x40f/0xba0
__alloc_pages_nodemask+0x209/0x260
alloc_pages_vma+0x1f1/0x250
do_huge_pmd_anonymous_page+0x123/0x660
handle_mm_fault+0xfd3/0x1330
__get_user_pages+0x113/0x640
get_user_pages+0x4f/0x60
__gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
try_async_pf+0x66/0x230 [kvm]
tdp_page_fault+0x130/0x280 [kvm]
kvm_mmu_page_fault+0x60/0x120 [kvm]
handle_ept_violation+0x91/0x170 [kvm_intel]
vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
Either of the above explains why the pre-e751732486eb code waited until IRQs
were disabled in vcpu_enter_guest() before calling switch_fpu_return().
Fixing this properly, solely within KVM, is going to be somewhat painful.
The most common case, vcpu_enter_guest(), which is the one being hit here,
is easy to handle by restoring the switch_fpu_return() call that commit
e751732486eb removed. The other obvious case I see is the emulator's access
to guest FPU state, which will effectively require reverting commit
6ab0b9feb82a ("x86,kvm: remove KVM emulator get_fpu / put_fpu") along with
new implementations of the hooks to handle TIF_NEED_FPU_LOAD.
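The vcpu_enter_guest() fix described above can be sketched with a toy
model. This is hypothetical C, not the actual KVM patch:
toy_switch_fpu_return() merely stands in for switch_fpu_return(), and
irqs_disabled is a plain bool. What it illustrates is the ordering: the
flag is checked only after IRQs are off, so, under the model's assumption
that no in-kernel FPU use can happen with IRQs disabled, nothing can set
the flag again before the modelled VM entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, hypothetical names, NOT kernel code. */
struct toy_cpu {
	int fpregs;            /* physical FPU register contents */
	int task_save_area;    /* stands in for the task's FPU state */
	bool need_fpu_load;    /* stands in for TIF_NEED_FPU_LOAD */
	bool irqs_disabled;    /* model: no softirq can run while true */
};

/* In-kernel FPU user (softirq or in-task path). */
static void toy_kernel_fpu_use(struct toy_cpu *c, int scratch)
{
	/* Model assumption: this cannot run once IRQs are disabled. */
	assert(!c->irqs_disabled);
	if (!c->need_fpu_load) {
		c->task_save_area = c->fpregs;
		c->need_fpu_load = true;
	}
	c->fpregs = scratch;
}

/* Stand-in for switch_fpu_return(): reload saved state, clear the flag. */
static void toy_switch_fpu_return(struct toy_cpu *c)
{
	c->fpregs = c->task_save_area;
	c->need_fpu_load = false;
}

/* Entry path: disable IRQs first, then honour the flag, then "enter". */
static int toy_enter_guest_checked(struct toy_cpu *c)
{
	c->irqs_disabled = true;
	if (c->need_fpu_load)
		toy_switch_fpu_return(c);
	int seen_by_guest = c->fpregs;   /* value the guest observes */
	c->irqs_disabled = false;        /* models returning after VM-exit */
	return seen_by_guest;
}
```

Checking the flag before disabling IRQs would reopen the window: a
softirq could run between the check and the entry and leave scratch data
in the registers again.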
> > Then I stashed the patch, reverted ca7e6b286333, compiled, and rebooted.
> >
> > Guest is running stable now on 5.3.11. Did test my CAD under the guest, did
> > not experience the crashes that had me stuck at 5.1.
--
You are receiving this mail because:
You are watching the assignee of the bug.