From: Sean Christopherson <seanjc@google.com>
To: Nikita Kalyazin <kalyazin@amazon.com>
Cc: pbonzini@redhat.com, corbet@lwn.net, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, jthoughton@google.com,
david@redhat.com, peterx@redhat.com, oleg@redhat.com,
vkuznets@redhat.com, gshan@redhat.com, graf@amazon.de,
jgowans@amazon.com, roypat@amazon.co.uk, derekmn@amazon.com,
nsaenz@amazon.es, xmarcalx@amazon.com
Subject: Re: [RFC PATCH 0/6] KVM: x86: async PF user
Date: Thu, 27 Feb 2025 08:44:02 -0800 [thread overview]
Message-ID: <Z8CWUiAYVvNKqzfK@google.com> (raw)
In-Reply-To: <946fc0f5-4306-4aa9-9b63-f7ccbaff8003@amazon.com>
On Wed, Feb 26, 2025, Nikita Kalyazin wrote:
> On 26/02/2025 00:58, Sean Christopherson wrote:
> > On Fri, Feb 21, 2025, Nikita Kalyazin wrote:
> > > On 20/02/2025 18:49, Sean Christopherson wrote:
> > > > On Thu, Feb 20, 2025, Nikita Kalyazin wrote:
> > > > > On 19/02/2025 15:17, Sean Christopherson wrote:
> > > > > > On Wed, Feb 12, 2025, Nikita Kalyazin wrote:
> > > > > > The conundrum with userspace async #PF is that if userspace is given only a single
> > > > > > bit per gfn to force an exit, then KVM won't be able to differentiate between
> > > > > > "faults" that will be handled synchronously by the vCPU task, and faults that
> > > > > > usersepace will hand off to an I/O task. If the fault is handled synchronously,
> > > > > > KVM will needlessly inject a not-present #PF and a present IRQ.
> > > > >
> > > > > Right, but from the guest's point of view, async PF means "it will probably
> > > > > take a while for the host to get the page, so I may consider doing something
> > > > > else in the meantime (ie schedule another process if available)".
> > > >
> > > > Except in this case, the guest never gets a chance to run, i.e. it can't do
> > > > something else. From the guest point of view, if KVM doesn't inject what is
> > > > effectively a spurious async #PF, the VM-Exiting instruction simply took a (really)
> > > > long time to execute.
> > >
> > > Sorry, I didn't get that. If userspace learns from the
> > > kvm_run::memory_fault::flags that the exit is due to an async PF, it should
> > > call kvm run immediately, inject the not-present PF and allow the guest to
> > > reschedule. What do you mean by "the guest never gets a chance to run"?
> >
> > What I'm saying is that, as proposed, the API doesn't precisely tell userspace
^^^^^^^^^
KVM
> > an exit happened due to an "async #PF". KVM has absolutely zero clue as to
> > whether or not userspace is going to do an async #PF, or if userspace wants to
> > intercept the fault for some entirely different purpose.
>
> Userspace is supposed to know whether the PF is async from the dedicated
> flag added in the memory_fault structure:
> KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER. It will be set when KVM managed to
> inject page-not-present. Are you saying it isn't sufficient?
Gah, sorry, typo. The API doesn't tell *KVM* that userfault exit is due to an
async #PF.
> > Unless the remote page was already requested, e.g. by a different vCPU, or by a
> > prefetching algorithim.
> >
> > > Conversely, if the page content is available, it must have already been
> > > prepopulated into guest memory pagecache, the bit in the bitmap is cleared
> > > and no exit to userspace occurs.
> >
> > But that doesn't happen instantaneously. Even if the VMM somehow atomically
> > receives the page and marks it present, it's still possible for marking the page
> > present to race with KVM checking the bitmap.
>
> That looks like a generic problem of the VM-exit fault handling. Eg when
Heh, it's a generic "problem" for faults in general. E.g. modern x86 CPUs will
take "spurious" page faults on write accesses if a PTE is writable in memory but
the CPU has a read-only mapping cached in its TLB.
It's all a matter of cost. E.g. pre-Nehalem Intel CPUs didn't take such spurious
read-only faults as they would re-walk the in-memory page tables, but that ended
up being a net negative because the cost of re-walking for all read-only faults
outweighed the benefits of avoiding spurious faults in the unlikely scenario the
fault had already been fixed.
For a spurious async #PF + IRQ, the cost could be signficant, e.g. due to causing
unwanted context switches in the guest, in addition to the raw overhead of the
faults, interrupts, and exits.
> one vCPU exits, userspace handles the fault and races setting the bitmap
> with another vCPU that is about to fault the same page, which may cause a
> spurious exit.
>
> On the other hand, is it malignant? The only downside is additional
> overhead of the async PF protocol, but if the race occurs infrequently, it
> shouldn't be a problem.
When it comes to uAPI, I want to try and avoid statements along the lines of
"IF 'x' holds true, then 'y' SHOULDN'T be a problem". If this didn't impact uAPI,
I wouldn't care as much, i.e. I'd be much more willing iterate as needed.
I'm not saying we should go straight for a complex implementation. Quite the
opposite. But I do want us to consider the possible ramifications of using a
single bit for all userfaults, so that we can at least try to design something
that is extensible and won't be a pain to maintain.
next prev parent reply other threads:[~2025-02-27 16:44 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-18 12:39 [RFC PATCH 0/6] KVM: x86: async PF user Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 1/6] Documentation: KVM: add userfault KVM exit flag Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 2/6] Documentation: KVM: add async pf user doc Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 3/6] KVM: x86: add async ioctl support Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 4/6] KVM: trace events: add type argument to async pf Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 5/6] KVM: x86: async_pf_user: add infrastructure Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 6/6] KVM: x86: async_pf_user: hook to fault handling and add ioctl Nikita Kalyazin
2024-11-19 1:26 ` [RFC PATCH 0/6] KVM: x86: async PF user James Houghton
2024-11-19 16:19 ` Nikita Kalyazin
2025-02-11 21:17 ` Sean Christopherson
2025-02-12 18:14 ` Nikita Kalyazin
2025-02-19 15:17 ` Sean Christopherson
2025-02-20 18:29 ` Nikita Kalyazin
2025-02-20 18:49 ` Sean Christopherson
2025-02-21 11:02 ` Nikita Kalyazin
2025-02-26 0:58 ` Sean Christopherson
2025-02-26 17:07 ` Nikita Kalyazin
2025-02-27 16:44 ` Sean Christopherson [this message]
2025-02-27 18:24 ` Nikita Kalyazin
2025-02-27 23:47 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8CWUiAYVvNKqzfK@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=graf@amazon.de \
--cc=gshan@redhat.com \
--cc=hpa@zytor.com \
--cc=jgowans@amazon.com \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=nsaenz@amazon.es \
--cc=oleg@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=rostedt@goodmis.org \
--cc=roypat@amazon.co.uk \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=xmarcalx@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.