From: Sean Christopherson <seanjc@google.com>
To: Amaan Cheval <amaan.cheval@gmail.com>
Cc: brak@gameservers.com, kvm@vger.kernel.org
Subject: Re: Deadlock due to EPT_VIOLATION
Date: Fri, 21 Jul 2023 10:37:22 -0700
Message-ID: <ZLrCUkwot/yiVC8T@google.com>
In-Reply-To: <20230721143407.2654728-1-amaan.cheval@gmail.com>
On Fri, Jul 21, 2023, Amaan Cheval wrote:
> I've also run a `function_graph` trace on some of the affected hosts, if you
> think it might be helpful to have a look at that to see what the host kernel
> might be doing while the guests are looping on EPT_VIOLATIONs. Nothing obvious
> stands out to me right now.
It wouldn't hurt to see it.
> We suspected KSM briefly, but ruled that out by turning KSM off and unmerging
> KSM pages - after doing that, a guest VM still locked up / started looping
> EPT_VIOLATIONS (like in Brian's original email), so it's unlikely this is KSM specific.
>
> Another interesting observation we made was that when we migrate a guest to a
> different host, the guest _stays_ locked up and throws EPT violations on the new
> host as well
Ooh, that's *very* interesting. That pretty much rules out memslot and mmu_notifier
issues.
> - so it's unlikely the issue is in the guest kernel itself (since
> we see it across guest operating systems), but perhaps the host kernel is
> messing the state of the guest kernel up in a way that keeps it locked up after
> migrating as well?
>
> If you have any thoughts on anything else to try, let me know!
Good news and bad news. Good news: I have a plausible theory as to what might be
going wrong. Bad news: if my theory is correct, our princess is in another castle
(the bug isn't in KVM).
One of the scenarios where KVM retries page faults is when KVM asynchronously faults in
the host backing page. If faulting in the page would require I/O, e.g. because
it's been swapped out, instead of synchronously doing the I/O on the vCPU task,
KVM uses a workqueue to fault in the page and immediately resumes the guest.
There are a variety of conditions that must be met to try an async page fault, but
assuming you aren't disabling HLT VM-Exit, i.e. aren't letting the guest execute HLT,
it really just boils down to IRQs being enabled in the guest, which looking at the
traces is pretty much guaranteed to be true.
What's _supposed_ to happen is that async_pf_execute() successfully faults in the
page via get_user_pages_remote(), and then KVM installs a mapping for the guest
either in kvm_arch_async_page_ready() or by resuming the guest and cleanly handling
the retried guest page fault.
What I suspect is happening is that get_user_pages_remote() fails for some reason,
i.e. the workqueue doesn't fault in the page, and the vCPU gets stuck trying to
fault in a page that can't be faulted in for whatever reason. AFAICT, nothing in
KVM will actually complain or even surface the problem in tracepoints (yeah, that's
not good).
Circling back to the bad news, if that's indeed what's happening, it likely means
there's a bug somewhere else in the stack. E.g. it could be core mm/, might be
in the block layer, in swap, possibly in the exact filesystem you're using, etc.
Note, there's also a paravirt extension to async #PFs, where instead of putting
the vCPU into a synthetic halted state, KVM instead *may* inject a synthetic #PF
into the guest, e.g. so that the guest can go run a different task while the
faulting task is blocked. But this really is just a note, guest enabling of PV
async #PF shouldn't actually matter, again assuming my theory is correct.
To mostly confirm this is likely what's happening, can you enable all of the async
#PF tracepoints in KVM? The exact tracepoints might vary depending on which kernel
version you're running, just enable everything with "async" in the name, e.g.
# ls -1 /sys/kernel/debug/tracing/events/kvm | grep async
kvm_async_pf_completed/
kvm_async_pf_not_present/
kvm_async_pf_ready/
kvm_async_pf_repeated_fault/
kvm_try_async_get_page/
If kvm_try_async_get_page() is more or less keeping pace with the "pf_taken" stat,
then this is likely what's happening.
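As an illustration of the "enable everything with async in the name" step, here is a minimal shell sketch. The function name enable_async_tracepoints is hypothetical, and the default path assumes debugfs is mounted at /sys/kernel/debug as in the listing above; writing to the enable files requires root.

```shell
# Enable every KVM tracepoint whose name contains "async".
# $1: the kvm events directory (assumed default: the usual debugfs location).
enable_async_tracepoints() {
    events_dir="${1:-/sys/kernel/debug/tracing/events/kvm}"
    for ev in "$events_dir"/*async*/; do
        # Skip the unexpanded glob if nothing matched.
        [ -e "${ev}enable" ] || continue
        echo 1 > "${ev}enable"
        echo "enabled $(basename "$ev")"
    done
}
```

Calling enable_async_tracepoints with no argument should turn on whichever async #PF tracepoints the running kernel provides; the resulting events can then be compared against the "pf_taken" stat.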
And then to really confirm, this small bpf program will yell if get_user_pages_remote()
fails when attempting to get a single page (which is always the case for KVM's async
#PF usage).
FWIW, get_user_pages_remote() isn't used all that much, e.g. when running a VM in
my setup, KVM is the only user. So you can likely aggressively instrument
get_user_pages_remote() via bpf without major problems, or maybe even assume that
any call is from KVM.
$ tail gup_remote.bt
kretfunc:get_user_pages_remote
{
        if ( args->nr_pages == 1 && retval != 1 ) {
                printf("Failed remote gup() on address %lx, ret = %d\n", args->start, retval);
        }
}
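
If the script does start yelling, the output can get noisy. As a rough sketch (summarize_gup_failures is a hypothetical helper, not part of bpftrace or KVM), the failures can be grouped by return value, assuming the output lines keep the exact printf format used in the script:

```shell
# Summarize "Failed remote gup()" lines read on stdin:
# prints one line per distinct return value, with its count.
summarize_gup_failures() {
    awk '/Failed remote gup\(\)/ {
             # The return value is the last field ("ret = <n>").
             count[$NF]++
         }
         END {
             for (ret in count) printf "ret=%s count=%d\n", ret, count[ret]
         }' | LC_ALL=C sort
}
```

For example, run "bpftrace gup_remote.bt | tee gup.log" as root while a guest is stuck, then "summarize_gup_failures < gup.log". Negative return values are errno codes from get_user_pages_remote().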