From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Christophe de Dinechin <dinechin@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Kevin Tian <kevin.tian@intel.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v3 09/21] KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
Date: Tue, 28 Jan 2020 10:24:03 -0800
Message-ID: <20200128182402.GA18652@linux.intel.com>
In-Reply-To: <20200128055005.GB662081@xz-x1>
On Tue, Jan 28, 2020 at 01:50:05PM +0800, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 07:56:57AM -0800, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index c4d3972dcd14..ff97782b3919 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -9584,7 +9584,15 @@ void kvm_arch_sync_events(struct kvm *kvm)
> > > kvm_free_pit(kvm);
> > > }
> > >
> > > -int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > > +/*
> > > + * If `uaddr' is specified, `*uaddr' will be returned with the
> > > + * userspace address that was just allocated. `uaddr' is only
> > > + * meaningful if the function returns zero, and `uaddr' will only be
> > > + * valid with either the slots_lock or the SRCU read lock held.
> > > + * After we release the lock, the returned `uaddr' will be invalid.
> >
> > This is all incorrect. Neither of those locks has any bearing on the
> > validity of the hva. slots_lock does as the name suggests and prevents
> > concurrent writes to the memslots. The SRCU lock ensures the implicit
> > memslots lookup in kvm_clear_guest_page() won't result in a use-after-free
> > due to dereferencing old memslots.
> >
> > Neither of those has anything to do with the userspace address, they're
> > both fully tied to KVM's gfn->hva lookup. As Paolo pointed out, KVM's
> > mapping is instead tied to the lifecycle of the VM. Note, even *that* has
> > no bearing on the validity of the mapping or address as KVM only increments
> > mm_count, not mm_users, i.e. guarantees the mm struct itself won't be freed
> > but doesn't ensure the vmas or associated pages tables are valid.
> >
> > Which is the entire point of using __copy_{to,from}_user(), as they
> > gracefully handle the scenario where the process has no valid mapping
> > and/or translation for the address.
>
> Sorry I don't understand.
>
> I do think either the slots_lock or SRCU would protect at least the
> existing kvm.memslots, and if so at least the previous vm_mmap()
> return value should still be valid.
Nope. kvm->slots_lock only protects gfn->hva lookups, e.g. userspace can
munmap() the range at any time.
> I agree that __copy_to_user() will protect us in many cases from the process
> mm's point of view (it allows page faults inside), but again if the kvm.memslots is
> changed underneath us then it's another story, IMHO, and that's why we need
> either the lock or SRCU.
No, again, slots_lock and SRCU only protect gfn->hva lookups.
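
To make that concrete, here is a rough sketch (simplified, made-up name, not
the verbatim kernel code) of what a helper like kvm_clear_guest_page() boils
down to.  The SRCU read lock (or slots_lock) only keeps the memslots array,
i.e. the gfn->hva mapping, alive across the lookup; nothing pins the userspace
mapping behind the resulting hva.

static int sketch_clear_guest_page(struct kvm *kvm, gfn_t gfn, int len)
{
	unsigned long hva;

	/* Protected by SRCU: the memslots array can't be freed under us. */
	hva = gfn_to_hva(kvm, gfn);
	if (kvm_is_error_hva(hva))
		return -EFAULT;

	/*
	 * NOT protected by SRCU or slots_lock: userspace can munmap() the
	 * range at any time.  __clear_user() handles the resulting fault
	 * gracefully and returns non-zero instead of oopsing.
	 */
	if (__clear_user((void __user *)hva, len))
		return -EFAULT;

	return 0;
}
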
> Or are you assuming that (1) __x86_set_memory_region() is only for the
> 3 private kvm memslots,
It's not an assumption; the entire purpose of __x86_set_memory_region()
is to provide support for private KVM memslots.
> and (2) currently the kvm private memory slots will never change after VM
> is created and before VM is destroyed?
No, I'm not assuming the private memslots are constant, e.g. the flow in
question, vmx_set_tss_addr(), is directly tied to an unprotected ioctl().
KVM's sole responsibility in vmx_set_tss_addr() is to not crash the kernel.
Userspace is responsible for ensuring it doesn't break its guests, e.g.
that multiple calls to KVM_SET_TSS_ADDR are properly serialized.
In the existing code, KVM ensures it doesn't crash by holding the SRCU lock
for the duration of init_rmode_tss() so that the gfn->hva lookups in
kvm_clear_guest_page() don't dereference a stale memslots array. In no way
does that ensure the validity of the resulting hva, e.g. multiple calls to
KVM_SET_TSS_ADDR would race to set vmx->tss_addr and so init_rmode_tss()
could be operating on a stale gpa.
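
For reference, a rough sketch of that existing flow (condensed, not the
verbatim kernel code; the SRCU critical section is hoisted out of
init_rmode_tss() here purely to make the locking visible):

static int sketch_set_tss_addr(struct kvm *kvm, unsigned int addr)
{
	int idx, r;

	mutex_lock(&kvm->slots_lock);
	r = __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, addr,
				    PAGE_SIZE * 3);
	mutex_unlock(&kvm->slots_lock);
	if (r)
		return r;

	/* Unserialized: concurrent KVM_SET_TSS_ADDR calls race right here. */
	to_kvm_vmx(kvm)->tss_addr = addr;

	/* SRCU protects the gfn->hva lookups in kvm_clear_guest_page()... */
	idx = srcu_read_lock(&kvm->srcu);
	r = init_rmode_tss(kvm);	/* ...but not the resulting hva. */
	srcu_read_unlock(&kvm->srcu, idx);

	return r;
}
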
Putting the onus on KVM to ensure atomicity is pointless because concurrent
calls to KVM_SET_TSS_ADDR would still race, i.e. the end value of
vmx->tss_addr would be non-deterministic. The integrity of the underlying
TSS would be guaranteed, but that guarantee isn't part of KVM's ABI.
> If so, I agree with you. However, I don't see why we need to restrict
> __x86_set_memory_region() with that assumption; after all, taking a
> lock is not expensive in this slow path.
In what way would not holding slots_lock in vmx_set_tss_addr() restrict
__x86_set_memory_region()? Literally every other usage of
__x86_set_memory_region() holds slots_lock for the duration of creating
the private memslot, because in those flows, KVM *is* responsible for
ensuring correct ordering.
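
A minimal sketch of that caller pattern, modeled loosely on the APIC access
page setup (simplified, details omitted):

static int sketch_alloc_apic_access_page(struct kvm *kvm)
{
	int r = 0;

	mutex_lock(&kvm->slots_lock);
	/* Ordering KVM itself must guarantee: create the slot exactly once. */
	if (kvm->arch.apic_access_page_done)
		goto out;
	r = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
				    APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
	if (!r)
		kvm->arch.apic_access_page_done = true;
out:
	mutex_unlock(&kvm->slots_lock);
	return r;
}
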
> Even if so, we'd better comment above __x86_set_memory_region() about this,
> so we know that we should not use __x86_set_memory_region() for future kvm
> internal memslots that are prone to change during VM's lifecycle (while
> currently it seems to be a very general interface).
There is no such restriction. Obviously such a flow would need to ensure
correctness, but hopefully that goes without saying.