Re: Deadlock due to EPT_VIOLATION


From: Sean Christopherson <seanjc@google.com>
To: Amaan Cheval <amaan.cheval@gmail.com>
Cc: brak@gameservers.com, kvm@vger.kernel.org
Subject: Re: Deadlock due to EPT_VIOLATION
Date: Wed, 2 Aug 2023 10:52:45 -0700
Message-ID: <ZMqX7TJavsx8WEY2@google.com>
In-Reply-To: <CAG+wEg2x-oGALCwKkHOxcrcdjP6ceU=K52UoQE2ht6ut1O46ug@mail.gmail.com>

On Wed, Aug 02, 2023, Amaan Cheval wrote:
> > LOL, NUMA autobalancing.  I have a longstanding hatred of that feature.  I'm sure
> > there are setups where it adds value, but from my perspective it's nothing but
> > pain and misery.
> 
> Do you think autobalancing is increasing the odds of some edge-case race
> condition, perhaps?
> I find it really curious that numa_balancing definitely affects this issue, but
> particularly when thp=0. Is it just too many EPT entries to install
> when transparent hugepages is disabled, increasing the likelihood of
> a race condition / lock contention of some sort?

NUMA balancing works by zapping PTEs[*] in userspace page tables for mappings to
remote memory, and then migrating the data to local memory on the resulting page
fault.  When that memory is being used to back a KVM guest, zapping the userspace
(primary) PTEs triggers an mmu_notifier event that in turn zaps KVM's PTEs, a.k.a.
SPTEs (which used to mean Shadow PTEs, but we're retroactively redefining SPTE to
also mean Secondary PTEs so that it's correct when shadow paging isn't being used).

If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
dislike NUMA balancing is that it's all too easy to end up with subtle bugs
and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
actually being able to move the page in the end, i.e. it's (IMO) too easy for
NUMA balancing to get false positives when determining whether or not to try and
migrate a page.

That said, it's definitely very unexpected that NUMA balancing would be zapping
SPTEs to the point where a vCPU can't make forward progress.  It's theoretically
possible that that's what's happening, but quite unlikely, especially since it
sounds like you're seeing issues even with NUMA balancing disabled.

More likely is that there is a bug somewhere that results in the mmu_notifier
event refcount staying incorrectly elevated, but that type of bug shouldn't follow
the VM across a live migration...

[*] Not technically a full zap of the PTE; it's just marked PROT_NONE, i.e.
    !PRESENT, but on the KVM side of things it does manifest as a full zap of the
    SPTE.

> > > They still remain locked up, but that might be because the original cause of the
> > > looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
> > > any ways you can think of that that might happen)?
> >
> > Define "remain locked up".  If the vCPUs are actively running in the guest and
> > making forward progress, i.e. not looping on VM-Exits on a single RIP, then they
> > aren't stuck from KVM's perspective.
> 
> Right, the traces look like they're not stuck (i.e. no looping on the same
> RIP). By "remain locked up" I mean that the VM is unresponsive on both the
> console and services (such as ssh) used to connect to it.
> 
> > But that doesn't mean the guest didn't take punitive action when a vCPU was
> > effectively stalled indefinitely by KVM, e.g. from the guest's perspective the
> > stuck vCPU will likely manifest as a soft lockup, and that could lead to a panic()
> > if the guest is a Linux kernel running with softlockup_panic=1.
> 
> So far we haven't had any guest kernels with softlockup_panic=1 have this issue,
> so it's hard to confirm, but it makes sense that the guest took punitive action
> in response to being stalled.
> 
> Any thoughts on how we might reproduce the issue or trace it down better?

Before going further, can you confirm that this earlier statement is correct?

 : Another interesting observation we made was that when we migrate a guest to a
 : different host, the guest _stays_ locked up and throws EPT violations on the new
 : host as well

Specifically, after migration, is the vCPU still fully stuck on EPT violations,
i.e. not making forward progress from KVM's perspective?  Or is the guest "stuck"
after migration purely because the guest itself gave up?

Thread overview: 48+ messages
2023-05-23 14:02 Deadlock due to EPT_VIOLATION Brian Rak
2023-05-23 16:22 ` Sean Christopherson
2023-05-24 13:39   ` Brian Rak
2023-05-26 16:59     ` Brian Rak
2023-05-26 21:02       ` Sean Christopherson
2023-05-30 17:35         ` Brian Rak
2023-05-30 18:36           ` Sean Christopherson
2023-05-31 17:40             ` Brian Rak
2023-07-21 14:34             ` Amaan Cheval
2023-07-21 17:37               ` Sean Christopherson
2023-07-24 12:08                 ` Amaan Cheval
2023-07-25 17:30                   ` Sean Christopherson
2023-08-02 14:21                     ` Amaan Cheval
2023-08-02 15:34                       ` Sean Christopherson
2023-08-02 16:45                         ` Amaan Cheval
2023-08-02 17:52                           ` Sean Christopherson [this message]
2023-08-08 15:34                             ` Amaan Cheval
2023-08-08 17:07                               ` Sean Christopherson
2023-08-10  0:48                                 ` Eric Wheeler
2023-08-10  1:27                                   ` Eric Wheeler
2023-08-10 23:58                                     ` Sean Christopherson
2023-08-11 12:37                                       ` Amaan Cheval
2023-08-11 18:02                                         ` Sean Christopherson
2023-08-12  0:50                                           ` Eric Wheeler
2023-08-14 17:29                                             ` Sean Christopherson
2023-08-15  0:30                                 ` Eric Wheeler
2023-08-15 16:10                                   ` Sean Christopherson
2023-08-16 23:54                                     ` Eric Wheeler
2023-08-17 18:21                                       ` Sean Christopherson
2023-08-18  0:55                                         ` Eric Wheeler
2023-08-18 14:33                                           ` Sean Christopherson
2023-08-18 23:06                                             ` Eric Wheeler
2023-08-21 20:27                                               ` Eric Wheeler
2023-08-21 23:51                                                 ` Sean Christopherson
2023-08-22  0:11                                                   ` Sean Christopherson
2023-08-22  1:10                                                   ` Eric Wheeler
2023-08-22 15:11                                                     ` Sean Christopherson
2023-08-22 21:23                                                       ` Eric Wheeler
2023-08-22 21:32                                                         ` Sean Christopherson
2023-08-23  0:39                                                       ` Eric Wheeler
2023-08-23 17:54                                                         ` Sean Christopherson
2023-08-23 19:44                                                           ` Eric Wheeler
2023-08-23 22:12                                                           ` Eric Wheeler
2023-08-23 22:32                                                             ` Eric Wheeler
2023-08-23 23:21                                                               ` Sean Christopherson
2023-08-24  0:30                                                                 ` Eric Wheeler
2023-08-24  0:52                                                                   ` Sean Christopherson
2023-08-24 23:51                                                                     ` Eric Wheeler