From: James Houghton <jthoughton@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: David Matlack <dmatlack@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	 Sean Christopherson <seanjc@google.com>,
	Oliver Upton <oupton@google.com>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	 Frank van der Linden <fvdl@google.com>
Subject: Re: RFC: A KVM-specific alternative to UserfaultFD
Date: Tue, 7 Nov 2023 08:11:09 -0800
Message-ID: <CADrL8HUHO12Bxrx94_VoS8AsN5uEO1qYM2SCF7Tgw-=vsRUwBA@mail.gmail.com>
In-Reply-To: <ZUpIB1/5eZ/2X+0M@x1n>

On Tue, Nov 7, 2023 at 6:22 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Nov 06, 2023 at 03:22:05PM -0800, David Matlack wrote:
> > On Mon, Nov 6, 2023 at 3:03 PM Peter Xu <peterx@redhat.com> wrote:
> > > On Mon, Nov 06, 2023 at 02:24:13PM -0800, Axel Rasmussen wrote:
> > > > On Mon, Nov 6, 2023 at 12:23 PM Peter Xu <peterx@redhat.com> wrote:
> > > > > On Mon, Nov 06, 2023 at 10:25:13AM -0800, David Matlack wrote:
> > > > > >
> > > > > >   * Memory Overhead: UserfaultFD requires an extra 8 bytes per page of
> > > > > >     guest memory for the userspace page table entries.
> > > > >
> > > > > What is this one?
> > > >
> > > > In the way we use userfaultfd, there are two shared userspace mappings
> > > > - one non-UFFD registered one which is used to resolve demand paging
> > > > faults, and another UFFD-registered one which is handed to KVM et al
> > > > for the guest to use. I think David is talking about the "second"
> > > > mapping as overhead here, since with the KVM-based approach he's
> > > > describing we don't need that mapping.
> > >
> > > I see, but then is it userspace relevant?  IMHO we should discuss the
> > > proposal based only on the design itself, rather than relying on any
> > > details on possible userspace implementations if two mappings are not
> > > required but optional.
> >
> > What I mean here is that for UserfaultFD to track accesses at
> > PAGE_SIZE granularity, that requires 1 PTE per page, i.e. 8 bytes per
> > page. Versus the KVM-based approach which only requires 1 bit per page
> > for the present bitmap. This is inherent in the design of UserfaultFD
> > because it uses PTEs to track what is present, not specific to how we
> > use UserfaultFD.
>
> Shouldn't the userspace normally still maintain one virtual mapping anyway
> for the guest address range?  As IIUC kvm still relies a lot on HVA to work
> (at least before guest memfd)? E.g., KVM_SET_USER_MEMORY_REGION, or mmu
> notifiers.  If so, that 8 bytes should be there with/without userfaultfd,
> IIUC.
>
> Also, I think that's not strictly needed for any kind of file memories, as
> in those case userfaultfd works with page cache.

This extra ~8 bytes per 4K page of overhead is real, and it is the
theoretical maximum additional overhead that userfaultfd requires over
a KVM-based demand paging alternative when hugepages are in use.
Consider the case where guest memory is backed by THPs, post-copy has
just finished, and no collapsing has been done yet:

For userfaultfd: because we demand-fetched at 4K granularity, every
page was installed with a 4K UFFDIO_COPY or UFFDIO_CONTINUE, so the
userspace page tables are entirely shattered. KVM then has no choice
but to build an entirely shattered second-stage page table as well.

For KVM demand paging: the userspace page tables can remain entirely
populated, so we get PMD mappings here. KVM, though, uses 4K SPTEs
because we have only just finished post-copy and haven't started
collapsing yet.

So both systems end up with a shattered second-stage page table, but
userfaultfd has a shattered userspace page table as well (an extra 8
bytes per 4K of PTEs if using THP, plus another 8 bytes per 2M of PMDs
if using 1G HugeTLB pages, etc.), and that is where the extra overhead
comes from.
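
To put rough numbers on this (a back-of-the-envelope sketch assuming
4K base pages and 8-byte page table entries; the 1 GiB of guest memory
is only for illustration):

  1 GiB of guest memory:       262,144 4K pages
  userfaultfd shattered PTEs:  262,144 * 8 B  = 2 MiB per GiB (~0.2%)
  1G HugeTLB shattered PMDs:   512 * 8 B      = 4 KiB more per GiB
  KVM present bitmap:          262,144 bits   = 32 KiB per GiB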

The second mapping of guest memory that we use today (the one through
which we install memory) is mapped with PMDs and PUDs, given that we
are using hugepages, so its overhead is minimal.
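
In case the double-mapping scheme Axel described above is unfamiliar,
here is a minimal sketch of its shape, assuming shmem-backed guest
memory (HugeTLB works similarly via UFFD_FEATURE_MINOR_HUGETLBFS).
This is not our actual implementation: VM_SIZE, the single-threaded
fault loop, and the memset() stand-in for the real fetch are made up,
and all error handling is elided.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define VM_SIZE   (1UL << 30)  /* hypothetical: 1G of guest memory */
#define PAGE_SIZE 4096UL

int main(void)
{
	/* One shared backing object for all of guest memory. */
	int memfd = memfd_create("guest-mem", MFD_CLOEXEC);
	ftruncate(memfd, VM_SIZE);

	/*
	 * Mapping #1: the HVA range handed to KVM (via
	 * KVM_SET_USER_MEMORY_REGION), registered with userfaultfd in
	 * minor mode so that first touches trap out to the VMM.
	 */
	char *guest = mmap(NULL, VM_SIZE, PROT_READ | PROT_WRITE,
			   MAP_SHARED, memfd, 0);

	/*
	 * Mapping #2: not registered with userfaultfd; the VMM writes
	 * demand-fetched contents through this mapping, populating the
	 * shared page cache that backs both mappings.
	 */
	char *stage = mmap(NULL, VM_SIZE, PROT_READ | PROT_WRITE,
			   MAP_SHARED, memfd, 0);

	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_MINOR_SHMEM,
	};
	ioctl(uffd, UFFDIO_API, &api);

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)guest, .len = VM_SIZE },
		.mode  = UFFDIO_REGISTER_MODE_MINOR,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	/* Fault handling (normally done on dedicated threads). */
	struct uffd_msg msg;
	while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
		unsigned long addr = msg.arg.pagefault.address &
				     ~(PAGE_SIZE - 1);
		unsigned long off = addr - (unsigned long)guest;

		/*
		 * Fetch the page's contents from the migration source
		 * by writing them through the non-registered mapping;
		 * memset() is only a stand-in for the real fetch.
		 */
		memset(stage + off, 0, PAGE_SIZE);

		/*
		 * Install the now-present page cache page into the
		 * registered mapping at 4K granularity.
		 */
		struct uffdio_continue cont = {
			.range = { .start = addr, .len = PAGE_SIZE },
		};
		ioctl(uffd, UFFDIO_CONTINUE, &cont);
	}
	return 0;
}

It is the 4K UFFDIO_CONTINUEs in the loop above that leave the
registered mapping's page tables shattered after post-copy.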

Hope that clears things up!

Thanks,
James

Thread overview: 34+ messages
2023-11-06 18:25 RFC: A KVM-specific alternative to UserfaultFD David Matlack
2023-11-06 20:23 ` Peter Xu
2023-11-06 22:24   ` Axel Rasmussen
2023-11-06 23:03     ` Peter Xu
2023-11-06 23:22       ` David Matlack
2023-11-07 14:21         ` Peter Xu
2023-11-07 16:11           ` James Houghton [this message]
2023-11-07 17:24             ` Peter Xu
2023-11-07 19:08               ` James Houghton
2023-11-07 16:25   ` Paolo Bonzini
2023-11-07 20:04     ` David Matlack
2023-11-07 21:10       ` Oliver Upton
2023-11-07 21:34         ` David Matlack
2023-11-08  1:27           ` Oliver Upton
2023-11-08 16:56             ` David Matlack
2023-11-08 17:34               ` Peter Xu
2023-11-08 20:10                 ` Sean Christopherson
2023-11-08 20:36                   ` Peter Xu
2023-11-08 20:47                   ` Axel Rasmussen
2023-11-08 21:05                     ` David Matlack
2023-11-08 20:49                 ` David Matlack
2023-11-08 20:33               ` Paolo Bonzini
2023-11-08 20:43                 ` David Matlack
2023-11-07 22:29     ` Peter Xu
2023-11-09 16:41       ` David Matlack
2023-11-09 17:58         ` Sean Christopherson
2023-11-09 18:33           ` David Matlack
2023-11-09 22:44             ` David Matlack
2023-11-09 23:54               ` Sean Christopherson
2023-11-09 19:20           ` Peter Xu
2023-11-11 16:23             ` David Matlack
2023-11-11 17:30               ` Peter Xu
2023-11-13 16:43                 ` David Matlack
2023-11-20 18:32                   ` James Houghton
