From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>,
Emanuele Giuseppe Esposito <eesposit@redhat.com>,
kvm@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
Maxim Levitsky <mlevitsk@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Hyper-V VTLs, permission bitmaps and userspace exits (was Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a vm)
Date: Wed, 26 Oct 2022 19:33:17 +0000
Message-ID: <Y1mLfRRKcF5OWZTG@google.com>
In-Reply-To: <d3e2dd2b-9520-32ef-6785-94164a834adf@redhat.com>

On Wed, Oct 26, 2022, Paolo Bonzini wrote:
> On 10/26/22 01:07, Sean Christopherson wrote:
> > > > - to stop anything else in the system that consumes KVM memslots, e.g. KVM GT
> > >
> > > Is this true if you only look at the KVM_GET_DIRTY_LOG case and consider it
> > > a guest bug to access the memory (i.e. ignore the strange read-only changes
> > > which only happen at boot, and which I agree are QEMU-specific)?
> >
> > Yes? I don't know exactly what "the KVM_GET_DIRTY_LOG case" is.
>
> It is not possible to atomically read the dirty bitmap and delete a memslot.
> When you delete a memslot, the bitmap is gone. In this case however memory
> accesses to the deleted memslot are a guest bug, so stopping KVM-GT would
> not be necessary.

If accesses to the deleted memslot are a guest bug, why do you care about pausing
vCPUs? I don't mean to be belligerent, I'm genuinely confused.

> So while I'm being slowly convinced that QEMU should find a way to pause its
> vCPUs around memslot changes, I'm not sure that pausing everything is needed
> in general.
>
> > > > And because of the nature of KVM, to support this API on all architectures, KVM
> > > > needs to make change on all architectures, whereas userspace should be able to
> > > > implement a generic solution.
> > >
> > > Yes, I agree that this is essentially just a more efficient kill().
> > > Emanuele, perhaps you can put together a patch to x86/vmexit.c in
> > > kvm-unit-tests, where CPU0 keeps changing memslots and the other CPUs are in
> > > a for(;;) busy wait, to measure the various ways to do it?
> >
> > I'm a bit confused. Is the goal of this to simplify QEMU, dedup VMM code, provide
> > a more performant solution, something else entirely?
>
> Well, a bit of all of them and perhaps that's the problem. And while the
> issues at hand *are* self-inflicted wounds on part of QEMU, it seems to me
> that the underlying issues are general.
>
> For example, Alex Graf and I looked back at your proposal of a userspace
> exit for "bad" accesses to memory, wondering if it could help with Hyper-V
> VTLs too. To recap, the "higher privileged" code at VTL1 can set up VM-wide
> restrictions on access to some pages through a hypercall
> (HvModifyVtlProtectionMask). After the hypercall, VTL0 would not be able to
> access those pages. The hypercall would be handled in userspace and would
> invoke a KVM_SET_MEMORY_REGION_PERM ioctl to restrict the RWX permissions,
> and this ioctl would set up a VM-wide permission bitmap that would be used
> when building page tables.
>
> Using such a bitmap instead of memslots makes it possible to cause userspace
> vmexits on VTL mapping violations with efficient data structures. And it
> would also be possible to use this mechanism around KVM_GET_DIRTY_LOG, to
> read the KVM dirty bitmap just before removing a memslot.

What exactly is the behavior you're trying to achieve for KVM_GET_DIRTY_LOG => delete?
If KVM provides KVM_EXIT_MEMORY_FAULT, can you not achieve the desired behavior by
doing mprotect(PROT_NONE) => KVM_GET_DIRTY_LOG => delete? If PROT_NONE causes the
memory to be freed, won't mprotect(PROT_READ) do what you want even without
KVM_EXIT_MEMORY_FAULT?

> However, external accesses to the regions (ITS, Xen, KVM-GT, non KVM_RUN
> ioctls) would not be blocked, due to the lack of a way to report the exit.

Aren't all of those out of scope? E.g. in a very hypothetical world where Xen's
event channel is being used with VTLs, if VTL1 makes the event channel inaccessible,
that's a guest and/or userspace configuration issue and the guest is hosed no matter
what KVM does. Ditto for the case where KVM-GT's buffer is blocked. I'm guessing
the ITS is similar?

> The intersection of these features with VTLs should be very small (sometimes
> zero since VTLs are x86 only), but the ioctls would be a problem so I'm
> wondering what your thoughts are on this.

How do the ioctls() map to VTLs? I.e. are they considered VTL0, VTL1, out-of-band?

> Also, while the exit API could be the same, it is not clear to me that the
> permission bitmap would be a good match for entirely "void" memslots used to
> work around non-atomic memslot changes. So for now let's leave this aside
> and only consider the KVM_GET_DIRTY_LOG case.

As above, can't userspace just mprotect() the entire memslot to prevent writes
between getting the dirty log and deleting the memslot?