From: Sean Christopherson <seanjc@google.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
David Matlack <dmatlack@google.com>,
Xu Yilun <yilun.xu@linux.intel.com>
Subject: Re: [PATCH 1/4] KVM: Always flush async #PF workqueue when vCPU is being destroyed
Date: Fri, 26 Jan 2024 09:19:57 -0800 [thread overview]
Message-ID: <ZbPpvZb51cwSGKfE@google.com> (raw)
In-Reply-To: <87le8c82ci.fsf@redhat.com>
On Fri, Jan 26, 2024, Vitaly Kuznetsov wrote:
> > +static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
> > +{
> > + /*
> > + * The async #PF is "done", but KVM must wait for the work item itself,
> > + * i.e. async_pf_execute(), to run to completion. If KVM is a module,
> > + * KVM must ensure *no* code owned by the KVM (the module) can be run
> > + * after the last call to module_put(), i.e. after the last reference
> > + * to the last vCPU's file is put.
> > + *
>
> Do I understand correctly that the problem is also present on the
> "normal" path, i.e.:
>
> KVM_REQ_APF_READY
> kvm_check_async_pf_completion()
> kmem_cache_free(,work)
>
> on one CPU can actually finish _before_ work is fully flushed on the
> other (async_pf_execute() has already added an item to 'done' list but
> hasn't completed)? Is it just the fact that the window of opportunity
> to get the freed item re-purposed is so short that no real issue was
> ever noticed?
Sort of? It's not a problem with accessing a freed obect, the issue is that
async_pf_execute() can still be executing while KVM-the-module is unloaded.
The reason the "normal" path is problematic is because it removes the
work entry from async_pf.done and from async_pf.queue, i.e. makes it invisible
to kvm_arch_destroy_vm(). So to hit the bug where the "normal" path processes
the completed work, userspace would need to terminate the VM (i.e. exit() or
close all fds), and KVM would need to completely tear down the VM, all before
async_pf_execute() finished it's last few instructions.
Which is possible, just comically unlikely :-)
> In that case I'd suggest we emphasize that in the comment as currently it
> sounds like kvm_arch_destroy_vm() is the only probemmatic path.
How's this?
/*
* The async #PF is "done", but KVM must wait for the work item itself,
* i.e. async_pf_execute(), to run to completion. If KVM is a module,
* KVM must ensure *no* code owned by the KVM (the module) can be run
* after the last call to module_put(). Note, flushing the work item
* is always required when the item is taken off the completion queue.
* E.g. even if the vCPU handles the item in the "normal" path, the VM
* could be terminated before async_pf_execute() completes.
*
* Wake all events skip the queue and go straight done, i.e. don't
* need to be flushed (but sanity check that the work wasn't queued).
*/
> > + * Wake all events skip the queue and go straight done, i.e. don't
> > + * need to be flushed (but sanity check that the work wasn't queued).
> > + */
> > + if (work->wakeup_all)
> > + WARN_ON_ONCE(work->work.func);
> > + else
> > + flush_work(&work->work);
> > + kmem_cache_free(async_pf_cache, work);
> > }
> >
> > void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> > @@ -114,7 +132,6 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> > #else
> > if (cancel_work_sync(&work->work)) {
> > mmput(work->mm);
> > - kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
> > kmem_cache_free(async_pf_cache, work);
> > }
> > #endif
> > @@ -126,7 +143,18 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> > list_first_entry(&vcpu->async_pf.done,
> > typeof(*work), link);
> > list_del(&work->link);
> > - kmem_cache_free(async_pf_cache, work);
> > +
> > + spin_unlock(&vcpu->async_pf.lock);
> > +
> > + /*
> > + * The async #PF is "done", but KVM must wait for the work item
> > + * itself, i.e. async_pf_execute(), to run to completion. If
> > + * KVM is a module, KVM must ensure *no* code owned by the KVM
> > + * (the module) can be run after the last call to module_put(),
> > + * i.e. after the last reference to the last vCPU's file is put.
> > + */
Doh, this is a duplicate comment, I'll delete it.
> > + kvm_flush_and_free_async_pf_work(work);
> > + spin_lock(&vcpu->async_pf.lock);
> > }
> > spin_unlock(&vcpu->async_pf.lock);
> >
> > @@ -151,7 +179,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
> >
> > list_del(&work->queue);
> > vcpu->async_pf.queued--;
> > - kmem_cache_free(async_pf_cache, work);
> > + kvm_flush_and_free_async_pf_work(work);
> > }
> > }
> >
> > @@ -186,7 +214,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > work->arch = *arch;
> > work->mm = current->mm;
> > mmget(work->mm);
> > - kvm_get_kvm(work->vcpu->kvm);
> >
> > INIT_WORK(&work->work, async_pf_execute);
>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> --
> Vitaly
>
next prev parent reply other threads:[~2024-01-26 17:20 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-10 1:15 [PATCH 0/4] KVM: Async #PF fixes and cleanups Sean Christopherson
2024-01-10 1:15 ` [PATCH 1/4] KVM: Always flush async #PF workqueue when vCPU is being destroyed Sean Christopherson
2024-01-20 12:40 ` Xu Yilun
2024-01-24 19:04 ` Sean Christopherson
2024-01-26 7:36 ` Xu Yilun
2024-02-06 19:06 ` Sean Christopherson
2024-01-26 16:51 ` Vitaly Kuznetsov
2024-01-26 17:19 ` Sean Christopherson [this message]
2024-01-29 9:02 ` Vitaly Kuznetsov
2024-02-19 13:59 ` Xu Yilun
2024-02-19 15:51 ` Sean Christopherson
2024-02-20 3:02 ` Xu Yilun
2024-01-10 1:15 ` [PATCH 2/4] KVM: Put mm immediately after async #PF worker completes remote gup() Sean Christopherson
2024-01-20 15:24 ` Xu Yilun
2024-01-26 16:23 ` Vitaly Kuznetsov
2024-01-10 1:15 ` [PATCH 3/4] KVM: Get reference to VM's address space in the async #PF worker Sean Christopherson
2024-01-20 15:16 ` Xu Yilun
2024-01-24 18:52 ` Sean Christopherson
2024-01-26 8:06 ` Xu Yilun
2024-01-26 16:21 ` Vitaly Kuznetsov
2024-01-26 16:39 ` Sean Christopherson
2024-01-10 1:15 ` [PATCH 4/4] KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed Sean Christopherson
2024-01-20 15:24 ` Xu Yilun
2024-01-26 16:30 ` Vitaly Kuznetsov
2024-02-06 21:36 ` [PATCH 0/4] KVM: Async #PF fixes and cleanups Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZbPpvZb51cwSGKfE@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=vkuznets@redhat.com \
--cc=yilun.xu@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox