From: Nikita Kalyazin <kalyazin@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <pbonzini@redhat.com>, <corbet@lwn.net>, <tglx@linutronix.de>,
<mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
<hpa@zytor.com>, <rostedt@goodmis.org>, <mhiramat@kernel.org>,
<mathieu.desnoyers@efficios.com>, <kvm@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-trace-kernel@vger.kernel.org>, <jthoughton@google.com>,
<david@redhat.com>, <peterx@redhat.com>, <oleg@redhat.com>,
<vkuznets@redhat.com>, <gshan@redhat.com>, <graf@amazon.de>,
<jgowans@amazon.com>, <roypat@amazon.co.uk>, <derekmn@amazon.com>,
<nsaenz@amazon.es>, <xmarcalx@amazon.com>
Subject: Re: [RFC PATCH 0/6] KVM: x86: async PF user
Date: Wed, 12 Feb 2025 18:14:09 +0000 [thread overview]
Message-ID: <a7080c07-0fc5-45ce-92f7-5f432a67bc63@amazon.com> (raw)
In-Reply-To: <Z6u-WdbiW3n7iTjp@google.com>
On 11/02/2025 21:17, Sean Christopherson wrote:
> On Mon, Nov 18, 2024, Nikita Kalyazin wrote:
>> Async PF [1] allows to run other processes on a vCPU while the host
>> handles a stage-2 fault caused by a process on that vCPU. When using
>> VM-exit-based stage-2 fault handling [2], async PF functionality is lost
>> because KVM does not run the vCPU while a fault is being handled so no
>> other process can execute on the vCPU. This patch series extends
>> VM-exit-based stage-2 fault handling with async PF support by letting
>> userspace handle faults instead of the kernel, hence the "async PF user"
>> name.
>>
>> I circulated the idea with Paolo, Sean, David H, and James H at the LPC,
>> and the only concern I heard was about injecting the "page not present"
>> event via #PF exception in the CoCo case, where it may not work. In my
>> implementation, I reused the existing code for doing that, so the async
>> PF user implementation is on par with the present async PF
>> implementation in this regard, and support for the CoCo case can be
>> added separately.
>>
>> Please note that this series is applied on top of the VM-exit-based
>> stage-2 fault handling RFC [2].
>
> ...
>
>> Nikita Kalyazin (6):
>> Documentation: KVM: add userfault KVM exit flag
>> Documentation: KVM: add async pf user doc
>> KVM: x86: add async ioctl support
>> KVM: trace events: add type argument to async pf
>> KVM: x86: async_pf_user: add infrastructure
>> KVM: x86: async_pf_user: hook to fault handling and add ioctl
>>
>> Documentation/virt/kvm/api.rst | 35 ++++++
>> arch/x86/include/asm/kvm_host.h | 12 +-
>> arch/x86/kvm/Kconfig | 7 ++
>> arch/x86/kvm/lapic.c | 2 +
>> arch/x86/kvm/mmu/mmu.c | 68 ++++++++++-
>> arch/x86/kvm/x86.c | 101 +++++++++++++++-
>> arch/x86/kvm/x86.h | 2 +
>> include/linux/kvm_host.h | 30 +++++
>> include/linux/kvm_types.h | 1 +
>> include/trace/events/kvm.h | 50 +++++---
>> include/uapi/linux/kvm.h | 12 +-
>> virt/kvm/Kconfig | 3 +
>> virt/kvm/Makefile.kvm | 1 +
>> virt/kvm/async_pf.c | 2 +-
>> virt/kvm/async_pf_user.c | 197 ++++++++++++++++++++++++++++++++
>> virt/kvm/async_pf_user.h | 24 ++++
>> virt/kvm/kvm_main.c | 14 +++
>> 17 files changed, 535 insertions(+), 26 deletions(-)
>
> I am supportive of the idea, but there is way too much copy+paste in this series.
Hi Sean,
Yes, like I mentioned in the cover letter, I left the new implementation
isolated on purpose to make the scope of the change clear. There is
certainly lots of duplication that should be removed later on.
> And it's not just the code itself, it's all the structures and concepts. Off the
> top of my head, I can't think of any reason there needs to be a separate queue,
> separate lock(s), etc. The only difference between kernel APF and user APF is
> what chunk of code is responsible for faulting in the page.
There are two queues involved:
- "queue": stores in-flight faults. APF-kernel uses it to cancel all
works if needed. APF-user does not have a way to "cancel" userspace
works, but it uses the queue to look up the struct by the token when
userspace reports a completion.
- "ready": stores completed faults until KVM finds a chance to tell
guest about them.
I agree that the "ready" queue can be shared between APF-kernel and
-user as it's used in the same way. As for the "queue" queue, do you
think it's ok to process its elements differently based on the "type" of
them in a single loop [1] instead of having two separate queues?
[1] https://elixir.bootlin.com/linux/v6.13.2/source/virt/kvm/async_pf.c#L120
> I suspect a good place to start would be something along the lines of the below
> diff, and go from there. Given that KVM already needs to special case the fake
> "wake all" items, I'm guessing it won't be terribly difficult to teach the core
> flows about userspace async #PF.
That sounds sensible. I can certainly approach it in a "bottom up" way
by sparingly adding handling where it's different in APF-user rather
than adding it side by side and trying to merge common parts.
> I'm also not sure that injecting async #PF for all userfaults is desirable. For
> in-kernel async #PF, KVM knows that faulting in the memory would sleep. For
> userfaults, KVM has no way of knowing if the userfault will sleep, i.e. should
> be handled via async #PF. The obvious answer is to have userspace only enable
> userspace async #PF when it's useful, but "an all or nothing" approach isn't
> great uAPI. On the flip side, adding uAPI for a use case that doesn't exist
> doesn't make sense either :-/
I wasn't able to locate the code that would check whether faulting would
sleep in APF-kernel. KVM spins APF-kernel whenever it can ([2]).
Please let me know if I'm missing something here.
[2]
https://elixir.bootlin.com/linux/v6.13.2/source/arch/x86/kvm/mmu/mmu.c#L4360
> Exiting to userspace in vCPU context is also kludgy. It makes sense for base
> userfault, because the vCPU can't make forward progress until the fault is
> resolved. Actually, I'm not even sure it makes sense there. I'll follow-up in
Even though we exit to userspace, in case of APF-user, userspace is
supposed to VM enter straight after scheduling the async job, which is
then executed concurrently with the vCPU.
> James' series. Anyways, it definitely doesn't make sense for async #PF, because
> the whole point is to let the vCPU run. Signalling userspace would definitely
> add complexity, but only because of the need to communicate the token and wait
> for userspace to consume said token. I'll think more on that.
By signalling userspace you mean a new non-exit-to-userspace mechanism
similar to UFFD? What advantage can you see in it over exiting to
userspace (which already exists in James's series)?
Thanks,
Nikita
>
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 0ee4816b079a..fc31b47cf9c5 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -177,7 +177,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
> * success, 'false' on failure (page fault has to be handled synchronously).
> */
> bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> - unsigned long hva, struct kvm_arch_async_pf *arch)
> + unsigned long hva, struct kvm_arch_async_pf *arch,
> + bool userfault)
> {
> struct kvm_async_pf *work;
>
> @@ -202,13 +203,16 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> work->addr = hva;
> work->arch = *arch;
>
> - INIT_WORK(&work->work, async_pf_execute);
> -
> list_add_tail(&work->queue, &vcpu->async_pf.queue);
> vcpu->async_pf.queued++;
> work->notpresent_injected = kvm_arch_async_page_not_present(vcpu, work);
>
> - schedule_work(&work->work);
> + if (userfault) {
> + work->userfault = true;
> + } else {
> + INIT_WORK(&work->work, async_pf_execute);
> + schedule_work(&work->work);
> + }
>
> return true;
> }
next prev parent reply other threads:[~2025-02-12 18:14 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-18 12:39 [RFC PATCH 0/6] KVM: x86: async PF user Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 1/6] Documentation: KVM: add userfault KVM exit flag Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 2/6] Documentation: KVM: add async pf user doc Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 3/6] KVM: x86: add async ioctl support Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 4/6] KVM: trace events: add type argument to async pf Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 5/6] KVM: x86: async_pf_user: add infrastructure Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 6/6] KVM: x86: async_pf_user: hook to fault handling and add ioctl Nikita Kalyazin
2024-11-19 1:26 ` [RFC PATCH 0/6] KVM: x86: async PF user James Houghton
2024-11-19 16:19 ` Nikita Kalyazin
2025-02-11 21:17 ` Sean Christopherson
2025-02-12 18:14 ` Nikita Kalyazin [this message]
2025-02-19 15:17 ` Sean Christopherson
2025-02-20 18:29 ` Nikita Kalyazin
2025-02-20 18:49 ` Sean Christopherson
2025-02-21 11:02 ` Nikita Kalyazin
2025-02-26 0:58 ` Sean Christopherson
2025-02-26 17:07 ` Nikita Kalyazin
2025-02-27 16:44 ` Sean Christopherson
2025-02-27 18:24 ` Nikita Kalyazin
2025-02-27 23:47 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a7080c07-0fc5-45ce-92f7-5f432a67bc63@amazon.com \
--to=kalyazin@amazon.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=graf@amazon.de \
--cc=gshan@redhat.com \
--cc=hpa@zytor.com \
--cc=jgowans@amazon.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=nsaenz@amazon.es \
--cc=oleg@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=rostedt@goodmis.org \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=xmarcalx@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox