From: Mike Rapoport <rppt@linux.ibm.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
Peter Xu <peterx@redhat.com>, Linux MM <linux-mm@kvack.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>
Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG
Date: Tue, 14 Jun 2022 21:56:11 +0300
Message-ID: <YqjZywZfkV+CiM29@linux.ibm.com>
In-Reply-To: <D7013337-EF3B-4E40-8022-723904202F61@gmail.com>

On Tue, Jun 14, 2022 at 09:18:43AM -0700, Nadav Amit wrote:
> On Jun 14, 2022, at 8:22 AM, David Hildenbrand <david@redhat.com> wrote:
>
> > On 13.06.22 22:40, Nadav Amit wrote:
> >> From: Nadav Amit <namit@vmware.com>
> >>
> >> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> >> takes ~600 cycles more than when the access-bit is set. At the same
> >> time, setting the access-bit for memory that is not used (e.g.,
> >> prefetched) can introduce greater overheads, as the prefetched memory is
> >> reclaimed later than it should be.
> >>
> >> Userfaultfd currently does not set the access-bit (excluding the
> >> huge-pages case). Arguably, it is best to let the uffd monitor control
> >> whether the access-bit should be set or not. The expected use is for the
> >> monitor to request userfaultfd to set the access-bit when the copy
> >> operation is done to resolve a page-fault, and not to set the young-bit
> >> when the memory is prefetched.
> >
> > Thinking out loud about existing users: postcopy live migration in QEMU
> > has two uses for placement of pages:
> >
> > a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
> > make progress.
> > b) Background migration to converge without faults on all relevant
> > pages.
> >
> > I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG, and in b) we don't want it.
> >
> >
> > I wonder, however, instead of calling this "young", which implies what
> > the OS should or shouldn't do, to define this as a hint that the placed
> > page is very likely to be accessed next.
> >
> > I'm bad at naming, UFFDIO_COPY_MODE_ACCESS_LIKELY would express what I
> > have in mind.
>
> How about UFFDIO_COPY_MODE_WILLNEED_READ ?
>
> >
> >> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> >> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> >> unconditionally since the former is only used to resolve page-faults and
> >> the latter would not benefit from not setting the access-bit.
> >>
> >> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> >> Cc: Hugh Dickins <hughd@google.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Axel Rasmussen <axelrasmussen@google.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Mike Rapoport <rppt@linux.ibm.com>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >>
> >> ---
> >>
> >> There are 2 possible enhancements:
> >>
> >> 1. Use the flag to decide on whether to mark the PTE as dirty (for
> >> writable PTEs). I guess that setting the dirty-bit is as expensive as
> >> setting the access-bit, and setting it introduces similar tradeoffs,
> >> as mentioned above.
> >>
> >> 2. Introduce a similar mode for write-protect and use this information
> >> for setting both the young and dirty bits. Makes one wonder whether
> >> mprotect() should also set the bit in certain cases...
> >
> > I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
> > UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.
> >
> > For example, QEMU knows if a page fault it's resolving was due to a read
> > or a write fault and could use that information accordingly. Of course,
> > when we currently have a read fault, we can't know for sure whether a
> > write fault will follow immediately after.
> >
> > Especially in the context of UFFDIO_ZEROPAGE,
> > UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
> > instead populate an actual page and mark it accessed+dirty. I even have
> > a use case for that ;)
> >
> >
> > The kernel could decide how to treat these hints -- for example, if it
> > doesn't want user space to mess with access/dirty bits, it could just
> > mostly ignore the hints.
>
> I can do that. I think users can do the zero page-copy themselves today, but
> whatever you prefer.
>
> But, I cannot take it anymore: the list of arguments for uffd stuff is
> crazy. I would like to collect all the possible arguments that are used for
> uffd operation into some “struct uffd_op”.

Squashing boolean parameters into int flags will also reduce the insane
amount of parameters. No strong feelings though.
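
For illustration only, a minimal user-space sketch of how the two ideas
(a "struct uffd_op" bundle plus int flags instead of booleans) could fit
together. Every name below (uffd_op_flags_t, the UFFD_OP_* bits, the
struct fields and the helpers) is hypothetical and comes neither from
the patch nor from the kernel tree. The rule that a write-protected PTE
is never pre-dirtied is likewise an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits replacing separate boolean parameters. */
typedef uint32_t uffd_op_flags_t;

#define UFFD_OP_WP            (1u << 0) /* install the PTE write-protected */
#define UFFD_OP_ACCESS_LIKELY (1u << 1) /* hint: set the access (young) bit */
#define UFFD_OP_WRITE_LIKELY  (1u << 2) /* hint: set the dirty bit as well */

/* Hypothetical bundle of the arguments for one uffd fill operation. */
struct uffd_op {
	uint64_t dst_start;
	uint64_t src_start;
	uint64_t len;
	uffd_op_flags_t flags;
};

/* True if any bit of @mask is set in the operation's flags. */
static inline bool uffd_op_test(const struct uffd_op *op, uffd_op_flags_t mask)
{
	assert(op != NULL);
	return (op->flags & mask) != 0;
}

/* A likely read or a likely write both imply the page will be accessed. */
static inline bool uffd_op_mark_young(const struct uffd_op *op)
{
	return uffd_op_test(op, UFFD_OP_ACCESS_LIKELY | UFFD_OP_WRITE_LIKELY);
}

/* Assumption: never pre-dirty a PTE that is installed write-protected. */
static inline bool uffd_op_mark_dirty(const struct uffd_op *op)
{
	return uffd_op_test(op, UFFD_OP_WRITE_LIKELY) &&
	       !uffd_op_test(op, UFFD_OP_WP);
}
```

With something along these lines, a new hint costs one more bit in
"flags" rather than one more parameter in every function on the call
path.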

> Any objection?
>
>
--
Sincerely yours,
Mike.