From: Peter Xu <peterx@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Carsten Stollmaier <stollmc@amazon.com>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
nh-open-source@amazon.com,
Sebastian Biemueller <sbiemue@amazon.de>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time
Date: Fri, 2 Aug 2024 18:40:42 -0400 [thread overview]
Message-ID: <Zq1gavwLBHeSr2ju@x1n> (raw)
In-Reply-To: <b40f244f50ce3a14d637fd1769a9b3f709b0842e.camel@infradead.org>
On Fri, Aug 02, 2024 at 01:03:16PM +0100, David Woodhouse wrote:
> On Fri, 2024-08-02 at 11:44 +0000, Carsten Stollmaier wrote:
> > On vcpu_run, before entering the guest, the update of the steal time
> > information causes a page-fault if the page is not present. In our
> > scenario, this gets handled by do_user_addr_fault and successively
> > handle_userfault since we have the region registered to that.
> >
> > handle_userfault uses TASK_INTERRUPTIBLE, so it is interruptible by
> > signals. do_user_addr_fault then busy-retries it if the pending signal
> > is non-fatal. This leads to contention of the mmap_lock.
>
> The busy-loop causes so much contention on mmap_lock that post-copy
> live migration fails to make progress, and is leading to failures. Yes?
>
> > This patch replaces the use of gfn_to_hva_cache with gfn_to_pfn_cache,
> > as gfn_to_pfn_cache ensures page presence for the memory access,
> > preventing the contention of the mmap_lock.
> >
> > Signed-off-by: Carsten Stollmaier <stollmc@amazon.com>
>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
>
> I think this makes sense on its own, as it addresses the specific case
> where KVM is *likely* to be touching a userfaulted (guest) page. And it
> allows us to ditch yet another explicit asm exception handler.
>
> We should note, though, that in terms of the original problem described
> above, it's a bit of a workaround. It just means that by using
> kvm_gpc_refresh() to obtain the user page, we end up in
> handle_userfault() without the FAULT_FLAG_INTERRUPTIBLE flag.
>
> (Note to self: should kvm_gpc_refresh() take fault flags, to allow
> interruptible and killable modes to be selected by its caller?)
>
>
> An alternative workaround (which perhaps we should *also* consider)
> looked like this (plus some suitable code comment, of course):
>
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs,
> */
> if (user_mode(regs))
> flags |= FAULT_FLAG_USER;
> + else
> + flags &= ~FAULT_FLAG_INTERRUPTIBLE;
>
> #ifdef CONFIG_X86_64
> /*
>
>
> That would *also* handle arbitrary copy_to_user/copy_from_user() to
> userfault pages, which could theoretically hit the same busy loop.
>
> I'm actually tempted to make user access *interruptible* though, and
> either add copy_{from,to}_user_interruptible() or change the semantics
> of the existing ones (which I believe are already killable).
>
> That would require each architecture implementing interruptible
> exceptions, by doing an extable lookup before the retry. Not overly
> complex, but needs to be done for all architectures (although not at
> once; we could live with not-yet-done architectures just remaining
> killable).
>
> Thoughts?
Instead of "interruptible exception" or the original patch (which might
still be worthwhile, though? I didn't follow much on kvm and the new gpc
cache, but looks still nicer than get/put user from initial glance), above
looks like the easier and complete solution to me. For "completeness", I
mean I am not sure how many other copy_to/from_user() code in kvm can hit
this, so looks like still possible to hit outside steal time page?
I thought only the slow fault path was involved in INTERRUPTIBLE thing and
that was the plan, but I guess I overlooked how the default value could
affect copy to/from user invoked from KVM as well..
With above patch to drop FAULT_FLAG_INTERRUPTIBLE for !user, KVM can still
opt-in INTERRUPTIBLE anywhere by leveraging hva_to_pfn[_slow]() API, which
is "INTERRUPTIBLE"-ready with a boolean the caller can set. But the caller
will need to be able to process KVM_PFN_ERR_SIGPENDING.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-08-02 22:40 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 11:44 [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time Carsten Stollmaier
2024-08-02 12:03 ` David Woodhouse
2024-08-02 12:38 ` Matthew Wilcox
2024-08-02 12:53 ` David Woodhouse
2024-08-02 12:56 ` David Woodhouse
2024-08-02 16:06 ` David Woodhouse
2024-08-02 22:40 ` Peter Xu [this message]
2024-08-03 8:35 ` David Woodhouse
2024-08-04 13:31 ` Peter Xu
2024-08-17 0:22 ` Sean Christopherson
2024-08-20 10:11 ` David Woodhouse
2025-07-29 10:28 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zq1gavwLBHeSr2ju@x1n \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dwmw2@infradead.org \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=nh-open-source@amazon.com \
--cc=pbonzini@redhat.com \
--cc=sbiemue@amazon.de \
--cc=seanjc@google.com \
--cc=stollmc@amazon.com \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.