From: David Woodhouse <dwmw2@infradead.org>
To: Carsten Stollmaier <stollmc@amazon.com>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Cc: nh-open-source@amazon.com, Peter Xu <peterx@redhat.com>,
Sebastian Biemueller <sbiemue@amazon.de>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time
Date: Fri, 02 Aug 2024 13:03:16 +0100 [thread overview]
Message-ID: <b40f244f50ce3a14d637fd1769a9b3f709b0842e.camel@infradead.org> (raw)
In-Reply-To: <20240802114402.96669-1-stollmc@amazon.com>
[-- Attachment #1: Type: text/plain, Size: 2703 bytes --]
On Fri, 2024-08-02 at 11:44 +0000, Carsten Stollmaier wrote:
> On vcpu_run, before entering the guest, the update of the steal time
> information causes a page-fault if the page is not present. In our
> scenario, this gets handled by do_user_addr_fault and successively
> handle_userfault since we have the region registered to that.
>
> handle_userfault uses TASK_INTERRUPTIBLE, so it is interruptible by
> signals. do_user_addr_fault then busy-retries it if the pending signal
> is non-fatal. This leads to contention of the mmap_lock.
The busy-loop causes so much contention on mmap_lock that post-copy
live migration fails to make progress, and is leading to failures. Yes?
> This patch replaces the use of gfn_to_hva_cache with gfn_to_pfn_cache,
> as gfn_to_pfn_cache ensures page presence for the memory access,
> preventing the contention of the mmap_lock.
>
> Signed-off-by: Carsten Stollmaier <stollmc@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
I think this makes sense on its own, as it addresses the specific case
where KVM is *likely* to be touching a userfaulted (guest) page. And it
allows us to ditch yet another explicit asm exception handler.
We should note, though, that in terms of the original problem described
above, it's a bit of a workaround. It just means that by using
kvm_gpc_refresh() to obtain the user page, we end up in
handle_userfault() without the FAULT_FLAG_INTERRUPTIBLE flag.
(Note to self: should kvm_gpc_refresh() take fault flags, to allow
interruptible and killable modes to be selected by its caller?)
An alternative workaround (which perhaps we should *also* consider)
looked like this (plus some suitable code comment, of course):
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs,
*/
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+ else
+ flags &= ~FAULT_FLAG_INTERRUPTIBLE;
#ifdef CONFIG_X86_64
/*
That would *also* handle arbitrary copy_to_user/copy_from_user() to
userfault pages, which could theoretically hit the same busy loop.
I'm actually tempted to make user access *interruptible* though, and
either add copy_{from,to}_user_interruptible() or change the semantics
of the existing ones (which I believe are already killable).
That would require each architecture implementing interruptible
exceptions, by doing an extable lookup before the retry. Not overly
complex, but needs to be done for all architectures (although not at
once; we could live with not-yet-done architectures just remaining
killable).
Thoughts?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
next parent reply other threads:[~2024-08-02 12:03 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240802114402.96669-1-stollmc@amazon.com>
2024-08-02 12:03 ` David Woodhouse [this message]
2024-08-02 12:38 ` [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time Matthew Wilcox
2024-08-02 12:53 ` David Woodhouse
2024-08-02 12:56 ` David Woodhouse
2024-08-02 16:06 ` David Woodhouse
2024-08-02 22:40 ` Peter Xu
2024-08-03 8:35 ` David Woodhouse
2024-08-04 13:31 ` Peter Xu
2024-08-17 0:22 ` Sean Christopherson
2024-08-20 10:11 ` David Woodhouse
2025-07-29 10:28 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b40f244f50ce3a14d637fd1769a9b3f709b0842e.camel@infradead.org \
--to=dwmw2@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=nh-open-source@amazon.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=sbiemue@amazon.de \
--cc=seanjc@google.com \
--cc=stollmc@amazon.com \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).