public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	kvm@vger.kernel.org, Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, Like Xu <like.xu.linux@gmail.com>
Subject: Re: [RFC PATCH 0/9] kvm: implement atomic memslot updates
Date: Thu, 29 Sep 2022 15:18:47 +0000	[thread overview]
Message-ID: <YzW3VxqZTb2hnXCy@google.com> (raw)
In-Reply-To: <637e7ef3-e204-52fc-a4ff-1f0df5227a3e@redhat.com>

On Thu, Sep 29, 2022, Emanuele Giuseppe Esposito wrote:
> 
> Am 28/09/2022 um 22:41 schrieb Sean Christopherson:
> > Beyond that, there's no explanation of why this exact API is necessary, i.e. there
> > are no requirements given.
> > 
> >   - Why can't this be solved in userspace?
> 
> Because this would provide the "feature" only to QEMU, leaving each
> other hypervisor to implement its own.

But there's no evidence that any other VMM actually needs this feature.

> >   - When is KVM required to invalidate and flush?  E.g. if a memslot is deleted
> >     and recreated with a different HVA, how does KVM ensure that there are no
> >     outstanding references to the old HVA without introducing non-determinstic
> >     behavior.  The current API works by forcing userspace to fully delete the
> >     memslot, i.e. KVM can ensure all references are gone in all TLBs before
> >     allowing userspace to create new, conflicting entries.  I don't see how this
> >     can work with batched updates.  The update needs to be "atomic", i.e. vCPUs
> >     must never see an invalid/deleted memslot, but if the memslot is writable,
> >     how does KVM prevent some writes from hitting the old HVA and some from hitting
> >     the new HVA without a quiescent period?
> 
> Sorry this might be my fault in providing definitions, even though I
> think I made plenty of examples. Delete/move operations are not really
> "atomic" in the sense that the slot disappears immediately.
> 
> The slot(s) is/are first "atomically" invalidated, allowing the guest to
> retry the page fault

I completely forgot that KVM x86 now retries on KVM_MEMSLOT_INVALID[*].  That is
somewhat arbitrary behavior, e.g. if x86 didn't do MMIO SPTE caching it wouldn't
be "necessary" and x86 would still treat INVALID as MMIO.  It's also x86 specific,
no other architecture retries on invalid memslots, i.e. relying on that behavior
to provide "atomicity" won't work without updating all other architectures.

That's obviously doable, but since these updates aren't actually atomic, I don't
see any meaningful benefit over adding e.g. KVM_MEM_DISABLED.

[*] e0c378684b65 ("KVM: x86/mmu: Retry page faults that hit an invalid memslot")

> >   - If a memslot is truncated while dirty logging is enabled, what happens to
> >     the bitmap?  Is it preserved?  Dropped?
> 
> Can you explain what you mean with "truncated memslot"?

Shrink the size of the memslot.

> Regarding the bitmap, currently QEMU should (probably wrongly) update it
> before even committing the changes to KVM. But I remember upstream
> someone pointed that this could be solved later.

These details can't be punted, as they affect the ABI.  My interpretation of
"atomic" memslot updates was that KVM would truly honor the promise of atomic
updates, e.g. that a DELETE=>CREATE sequence would effectively allow truncating
a memslot without losing any data.  Those details aren't really something that
can be punted down the road to be fixed at a later point because they very much
matter from an ABI perspective.

> > Again, I completely agree that the current memslots API is far from perfect, but
> > I'm not convinced that simply extending the existing API to batch updates is the
> > best solution from a KVM perspective.
> > 
> >>> E.g. why do a batch update and not provide KVM_SET_ALL_USER_MEMORY_REGIONS to
> >>> do wholesale replacement?  That seems like it would be vastly simpler to handle
> >>> on KVM's end.  Or maybe there's a solution in the opposite direction, e.g. an
> >>> API that allows 1->N or N->1 conversions but not arbitrary batching.
> >>
> >> Wholesale replacement was my first idea when I looked at the issue, I think
> >> at the end of 2020.  I never got to a full implementation, but my impression
> >> was that allocating/deallocating dirty bitmaps, rmaps etc. would make it any
> >> easier than arbitrary batch updates.
> > 
> > It's not obvious to me that the memslot metadata is going to be easy to handle
> > regardless of what we do.  E.g. I'm pretty sure that batching updates will "corrupt"
> > the dirty bitmap if a hole is punched in a memslot that's being dirty logged.
> > 
> 
> Could you provide an example?
> I mean I am not an expert but to me it looks like I preserved the same
> old functionalities both here and in QEMU. I just batched and delayed
> some operations, which in this case should cause no harm.

As above, my confusion is due to calling these updates "atomic".  The new API is
really just "allow batching memslot updates and subtly rely on restarting page
faults on KVM_MEMSLOT_INVALID".

IMO, KVM_MEM_DISABLED or similar is the way to go.  I.e. formalize the "restart
page faults" semantics without taking on the complexity of batched updates.

  parent reply	other threads:[~2022-09-29 15:19 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-09 10:44 [RFC PATCH 0/9] kvm: implement atomic memslot updates Emanuele Giuseppe Esposito
2022-09-09 10:44 ` [RFC PATCH 1/9] kvm_main.c: move slot check in kvm_set_memory_region Emanuele Giuseppe Esposito
2022-09-28 16:41   ` Paolo Bonzini
2022-09-09 10:44 ` [RFC PATCH 2/9] kvm.h: introduce KVM_SET_USER_MEMORY_REGION_LIST ioctl Emanuele Giuseppe Esposito
2022-09-28 16:42   ` Paolo Bonzini
2022-09-09 10:45 ` [RFC PATCH 3/9] kvm_main.c: introduce kvm_internal_memory_region_list Emanuele Giuseppe Esposito
2022-09-28 16:48   ` Paolo Bonzini
2022-09-09 10:45 ` [RFC PATCH 4/9] kvm_main.c: split logic in kvm_set_memslots Emanuele Giuseppe Esposito
2022-09-28 17:04   ` Paolo Bonzini
2022-09-09 10:45 ` [RFC PATCH 5/9] kvm_main.c: split __kvm_set_memory_region logic in kvm_check_mem and kvm_prepare_batch Emanuele Giuseppe Esposito
2022-09-13  2:56   ` Yang, Weijiang
2022-09-18 16:22     ` Emanuele Giuseppe Esposito
2022-09-28 17:11   ` Paolo Bonzini
2022-09-09 10:45 ` [RFC PATCH 6/9] kvm_main.c: simplify change-specific callbacks Emanuele Giuseppe Esposito
2022-09-09 10:45 ` [RFC PATCH 7/9] kvm_main.c: duplicate invalid memslot also in inactive list Emanuele Giuseppe Esposito
2022-09-28 17:18   ` Paolo Bonzini
2022-09-09 10:45 ` [RFC PATCH 8/9] kvm_main.c: find memslots from the inactive memslot list Emanuele Giuseppe Esposito
2022-09-09 10:45 ` [RFC PATCH 9/9] kvm_main.c: handle atomic memslot update Emanuele Giuseppe Esposito
2022-09-13  2:30   ` Yang, Weijiang
2022-09-18 16:18     ` Emanuele Giuseppe Esposito
2022-09-27  7:46   ` David Hildenbrand
2022-09-27  8:35     ` Emanuele Giuseppe Esposito
2022-09-27  9:22       ` David Hildenbrand
2022-09-27  9:32         ` Emanuele Giuseppe Esposito
2022-09-27 14:52           ` David Hildenbrand
2022-09-28 17:29   ` Paolo Bonzini
2022-09-09 14:30 ` [RFC PATCH 0/9] kvm: implement atomic memslot updates Sean Christopherson
2022-09-18 16:13   ` Emanuele Giuseppe Esposito
2022-09-19  7:38     ` Like Xu
2022-09-19  7:53     ` David Hildenbrand
2022-09-19 17:30       ` David Hildenbrand
2022-09-23 13:10         ` Emanuele Giuseppe Esposito
2022-09-23 13:21           ` David Hildenbrand
2022-09-23 13:38             ` Emanuele Giuseppe Esposito
2022-09-26  9:03               ` David Hildenbrand
2022-09-26 21:28                 ` Sean Christopherson
2022-09-27  7:38                   ` Emanuele Giuseppe Esposito
2022-09-27 15:58                     ` Sean Christopherson
2022-09-28  9:11                       ` Emanuele Giuseppe Esposito
2022-09-28 11:14                         ` Maxim Levitsky
2022-09-28 12:52                           ` David Hildenbrand
2022-09-28 15:07                       ` Paolo Bonzini
2022-09-28 15:33                         ` David Hildenbrand
2022-09-28 15:58                         ` Sean Christopherson
2022-09-28 16:38                           ` Paolo Bonzini
2022-09-28 20:41                             ` Sean Christopherson
2022-09-29  8:05                               ` Emanuele Giuseppe Esposito
2022-09-29  8:24                                 ` David Hildenbrand
2022-09-29 15:18                                 ` Sean Christopherson [this message]
2022-09-29 15:41                                   ` Paolo Bonzini
2022-09-29 15:28                               ` Paolo Bonzini
2022-09-29 15:40                                 ` Maxim Levitsky
2022-09-29 16:00                                 ` David Hildenbrand
2022-09-29 21:39                                 ` Sean Christopherson
2022-10-13  7:43                                   ` Emanuele Giuseppe Esposito
2022-10-13  8:44                                     ` David Hildenbrand
2022-10-13 11:12                                       ` Emanuele Giuseppe Esposito
2022-10-13 14:45                                         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YzW3VxqZTb2hnXCy@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox