public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Nikita Kalyazin <kalyazin@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <pbonzini@redhat.com>, <corbet@lwn.net>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
	<hpa@zytor.com>, <rostedt@goodmis.org>, <mhiramat@kernel.org>,
	<mathieu.desnoyers@efficios.com>, <kvm@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-trace-kernel@vger.kernel.org>, <jthoughton@google.com>,
	<david@redhat.com>, <peterx@redhat.com>, <oleg@redhat.com>,
	<vkuznets@redhat.com>, <gshan@redhat.com>, <graf@amazon.de>,
	<jgowans@amazon.com>, <roypat@amazon.co.uk>, <derekmn@amazon.com>,
	<nsaenz@amazon.es>, <xmarcalx@amazon.com>
Subject: Re: [RFC PATCH 0/6] KVM: x86: async PF user
Date: Wed, 12 Feb 2025 18:14:09 +0000	[thread overview]
Message-ID: <a7080c07-0fc5-45ce-92f7-5f432a67bc63@amazon.com> (raw)
In-Reply-To: <Z6u-WdbiW3n7iTjp@google.com>

On 11/02/2025 21:17, Sean Christopherson wrote:
> On Mon, Nov 18, 2024, Nikita Kalyazin wrote:
>> Async PF [1] allows to run other processes on a vCPU while the host
>> handles a stage-2 fault caused by a process on that vCPU. When using
>> VM-exit-based stage-2 fault handling [2], async PF functionality is lost
>> because KVM does not run the vCPU while a fault is being handled so no
>> other process can execute on the vCPU. This patch series extends
>> VM-exit-based stage-2 fault handling with async PF support by letting
>> userspace handle faults instead of the kernel, hence the "async PF user"
>> name.
>>
>> I circulated the idea with Paolo, Sean, David H, and James H at the LPC,
>> and the only concern I heard was about injecting the "page not present"
>> event via #PF exception in the CoCo case, where it may not work. In my
>> implementation, I reused the existing code for doing that, so the async
>> PF user implementation is on par with the present async PF
>> implementation in this regard, and support for the CoCo case can be
>> added separately.
>>
>> Please note that this series is applied on top of the VM-exit-based
>> stage-2 fault handling RFC [2].
> 
> ...
> 
>> Nikita Kalyazin (6):
>>    Documentation: KVM: add userfault KVM exit flag
>>    Documentation: KVM: add async pf user doc
>>    KVM: x86: add async ioctl support
>>    KVM: trace events: add type argument to async pf
>>    KVM: x86: async_pf_user: add infrastructure
>>    KVM: x86: async_pf_user: hook to fault handling and add ioctl
>>
>>   Documentation/virt/kvm/api.rst  |  35 ++++++
>>   arch/x86/include/asm/kvm_host.h |  12 +-
>>   arch/x86/kvm/Kconfig            |   7 ++
>>   arch/x86/kvm/lapic.c            |   2 +
>>   arch/x86/kvm/mmu/mmu.c          |  68 ++++++++++-
>>   arch/x86/kvm/x86.c              | 101 +++++++++++++++-
>>   arch/x86/kvm/x86.h              |   2 +
>>   include/linux/kvm_host.h        |  30 +++++
>>   include/linux/kvm_types.h       |   1 +
>>   include/trace/events/kvm.h      |  50 +++++---
>>   include/uapi/linux/kvm.h        |  12 +-
>>   virt/kvm/Kconfig                |   3 +
>>   virt/kvm/Makefile.kvm           |   1 +
>>   virt/kvm/async_pf.c             |   2 +-
>>   virt/kvm/async_pf_user.c        | 197 ++++++++++++++++++++++++++++++++
>>   virt/kvm/async_pf_user.h        |  24 ++++
>>   virt/kvm/kvm_main.c             |  14 +++
>>   17 files changed, 535 insertions(+), 26 deletions(-)
> 
> I am supportive of the idea, but there is way too much copy+paste in this series.
Hi Sean,

Yes, like I mentioned in the cover letter, I left the new implementation 
isolated on purpose to make the scope of the change clear.  There is 
certainly lots of duplication that should be removed later on.

> And it's not just the code itself, it's all the structures and concepts.  Off the
> top of my head, I can't think of any reason there needs to be a separate queue,
> separate lock(s), etc.  The only difference between kernel APF and user APF is
> what chunk of code is responsible for faulting in the page.

There are two queues involved:
  - "queue": stores in-flight faults. APF-kernel uses it to cancel all 
works if needed.  APF-user does not have a way to "cancel" userspace 
works, but it uses the queue to look up the struct by the token when 
userspace reports a completion.
  - "ready": stores completed faults until KVM finds a chance to tell 
guest about them.

I agree that the "ready" queue can be shared between APF-kernel and 
-user as it's used in the same way.  As for the "queue" queue, do you 
think it's ok to process its elements differently based on the "type" of 
them in a single loop [1] instead of having two separate queues?

[1] https://elixir.bootlin.com/linux/v6.13.2/source/virt/kvm/async_pf.c#L120

> I suspect a good place to start would be something along the lines of the below
> diff, and go from there.  Given that KVM already needs to special case the fake
> "wake all" items, I'm guessing it won't be terribly difficult to teach the core
> flows about userspace async #PF.

That sounds sensible.  I can certainly approach it in a "bottom up" way 
by sparingly adding handling where it's different in APF-user rather 
than adding it side by side and trying to merge common parts.

> I'm also not sure that injecting async #PF for all userfaults is desirable.  For
> in-kernel async #PF, KVM knows that faulting in the memory would sleep.  For
> userfaults, KVM has no way of knowing if the userfault will sleep, i.e. should
> be handled via async #PF.  The obvious answer is to have userspace only enable
> userspace async #PF when it's useful, but "an all or nothing" approach isn't
> great uAPI.  On the flip side, adding uAPI for a use case that doesn't exist
> doesn't make sense either :-/

I wasn't able to locate the code that would check whether faulting would 
sleep in APF-kernel.  KVM spins APF-kernel whenever it can ([2]). 
Please let me know if I'm missing something here.

[2] 
https://elixir.bootlin.com/linux/v6.13.2/source/arch/x86/kvm/mmu/mmu.c#L4360

> Exiting to userspace in vCPU context is also kludgy.  It makes sense for base
> userfault, because the vCPU can't make forward progress until the fault is
> resolved.  Actually, I'm not even sure it makes sense there.  I'll follow-up in

Even though we exit to userspace, in case of APF-user, userspace is 
supposed to VM enter straight after scheduling the async job, which is 
then executed concurrently with the vCPU.

> James' series.  Anyways, it definitely doesn't make sense for async #PF, because
> the whole point is to let the vCPU run.  Signalling userspace would definitely
> add complexity, but only because of the need to communicate the token and wait
> for userspace to consume said token.  I'll think more on that.

By signalling userspace you mean a new non-exit-to-userspace mechanism 
similar to UFFD?  What advantage can you see in it over exiting to 
userspace (which already exists in James's series)?


Thanks,
Nikita

> 
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 0ee4816b079a..fc31b47cf9c5 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -177,7 +177,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
>    * success, 'false' on failure (page fault has to be handled synchronously).
>    */
>   bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -                       unsigned long hva, struct kvm_arch_async_pf *arch)
> +                       unsigned long hva, struct kvm_arch_async_pf *arch,
> +                       bool userfault)
>   {
>          struct kvm_async_pf *work;
> 
> @@ -202,13 +203,16 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>          work->addr = hva;
>          work->arch = *arch;
> 
> -       INIT_WORK(&work->work, async_pf_execute);
> -
>          list_add_tail(&work->queue, &vcpu->async_pf.queue);
>          vcpu->async_pf.queued++;
>          work->notpresent_injected = kvm_arch_async_page_not_present(vcpu, work);
> 
> -       schedule_work(&work->work);
> +       if (userfault) {
> +               work->userfault = true;
> +       } else {
> +               INIT_WORK(&work->work, async_pf_execute);
> +               schedule_work(&work->work);
> +       }
> 
>          return true;
>   }


  reply	other threads:[~2025-02-12 18:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-18 12:39 [RFC PATCH 0/6] KVM: x86: async PF user Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 1/6] Documentation: KVM: add userfault KVM exit flag Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 2/6] Documentation: KVM: add async pf user doc Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 3/6] KVM: x86: add async ioctl support Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 4/6] KVM: trace events: add type argument to async pf Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 5/6] KVM: x86: async_pf_user: add infrastructure Nikita Kalyazin
2024-11-18 12:39 ` [RFC PATCH 6/6] KVM: x86: async_pf_user: hook to fault handling and add ioctl Nikita Kalyazin
2024-11-19  1:26 ` [RFC PATCH 0/6] KVM: x86: async PF user James Houghton
2024-11-19 16:19   ` Nikita Kalyazin
2025-02-11 21:17 ` Sean Christopherson
2025-02-12 18:14   ` Nikita Kalyazin [this message]
2025-02-19 15:17     ` Sean Christopherson
2025-02-20 18:29       ` Nikita Kalyazin
2025-02-20 18:49         ` Sean Christopherson
2025-02-21 11:02           ` Nikita Kalyazin
2025-02-26  0:58             ` Sean Christopherson
2025-02-26 17:07               ` Nikita Kalyazin
2025-02-27 16:44                 ` Sean Christopherson
2025-02-27 18:24                   ` Nikita Kalyazin
2025-02-27 23:47                     ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7080c07-0fc5-45ce-92f7-5f432a67bc63@amazon.com \
    --to=kalyazin@amazon.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=derekmn@amazon.com \
    --cc=graf@amazon.de \
    --cc=gshan@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jgowans@amazon.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=nsaenz@amazon.es \
    --cc=oleg@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=xmarcalx@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox