Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG

From: Nadav Amit <nadav.amit@gmail.com>
To: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Peter Xu <peterx@redhat.com>, Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>
Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG
Date: Tue, 14 Jun 2022 12:25:36 -0700
Message-ID: <06230F13-F08C-474E-A06B-62A89AE856D2@gmail.com>
In-Reply-To: <YqjZywZfkV+CiM29@linux.ibm.com>

On Jun 14, 2022, at 11:56 AM, Mike Rapoport <rppt@linux.ibm.com> wrote:

> On Tue, Jun 14, 2022 at 09:18:43AM -0700, Nadav Amit wrote:
>> On Jun 14, 2022, at 8:22 AM, David Hildenbrand <david@redhat.com> wrote:
>> 
>>> On 13.06.22 22:40, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
>>>> takes ~600 cycles more than when the access-bit is set. At the same
>>>> time, setting the access-bit for memory that is not used (e.g.,
>>>> prefetched) can introduce greater overheads, as the prefetched memory is
>>>> reclaimed later than it should be.
>>>> 
>>>> Userfaultfd currently does not set the access-bit (excluding the
>>>> huge-pages case). Arguably, it is best to let the uffd monitor control
>>>> whether the access-bit should be set or not. The expected use is for the
>>>> monitor to request userfaultfd to set the access-bit when the copy
>>>> operation is done to resolve a page-fault, and not to set the young-bit
>>>> when the memory is prefetched.
>>> 
>>> Thinking out loud about existing users: postcopy live migration in QEMU
>>> has two uses for placing pages:
>>> 
>>> a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
>>> make progress.
>>> b) Background migration to converge without faults on all relevant
>>> pages.
>>> 
>>> I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG, and in b) we wouldn't.
>>> 
>>> 
>>> I wonder, however, whether instead of calling this "young", which
>>> implies what the OS should or shouldn't do, we should define this as a
>>> hint that the placed page is very likely to be accessed next.
>>> 
>>> I'm bad at naming, but UFFDIO_COPY_MODE_ACCESS_LIKELY would express
>>> what I have in mind.
>> 
>> How about UFFDIO_COPY_MODE_WILLNEED_READ ?
>> 
>>>> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
>>>> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
>>>> unconditionally since the former is only used to resolve page-faults and
>>>> the latter would not benefit from not setting the access-bit.
>>>> 
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> 
>>>> ---
>>>> 
>>>> There are 2 possible enhancements:
>>>> 
>>>> 1. Use the flag to decide on whether to mark the PTE as dirty (for
>>>> writable PTEs). I guess that setting the dirty-bit is as expensive as
>>>> setting the access-bit, and setting it introduces similar tradeoffs,
>>>> as mentioned above.
>>>> 
>>>> 2. Introduce a similar mode for write-protect and use this information
>>>> for setting both the young and dirty bits. Makes one wonder whether
>>>> mprotect() should also set the bit in certain cases...
>>> 
>>> I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
>>> UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.
>>> 
>>> For example, QEMU knows if a page fault it's resolving was due to a read
>>> or a write fault and could use that information accordingly. Of course,
>>> when we currently have a read fault, we cannot know for sure whether a
>>> write fault will follow immediately after.
>>> 
>>> Especially in the context of UFFDIO_ZEROPAGE,
>>> UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
>>> instead populate an actual page and mark it accessed+dirty. I even have
>>> a use case for that ;)
>>> 
>>> 
>>> The kernel could decide how to treat these hints -- for example, if it
>>> doesn't want user space to mess with access/dirty bits, it could just
>>> mostly ignore the hints.
>> 
>> I can do that. I think users can do the zero page-copy themselves today, but
>> whatever you prefer.
>> 
>> But, I cannot take it anymore: the list of arguments for uffd stuff is
>> crazy. I would like to collect all the possible arguments that are used for
>> uffd operation into some “struct uffd_op”.
> 
> Squashing boolean parameters into int flags will also reduce the insane
> amount of parameters. No strong feelings though.
> 
>> Any objection?

Thanks. I also noticed a couple of embarrassing bugs that I made. Will send
v1 with fixes.
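
To make the two ideas above concrete (choosing the hint per David's cases
a) and b), and carrying the options as flags in one descriptor rather than
as a long list of boolean arguments), here is a minimal sketch. It is not
code from the patch. The DONTWAKE and WP bit values mirror the existing
UFFDIO_COPY mode bits; the bit position used for the proposed young flag,
and every sketch_* name, are assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror the existing UFFDIO_COPY mode bits from <linux/userfaultfd.h>. */
#define SKETCH_COPY_MODE_DONTWAKE ((uint64_t)1 << 0)
#define SKETCH_COPY_MODE_WP       ((uint64_t)1 << 1)
/* ASSUMPTION: bit position for the flag this RFC proposes. */
#define SKETCH_COPY_MODE_YOUNG    ((uint64_t)1 << 2)

/* HYPOTHETICAL: one descriptor instead of many separate arguments. */
struct sketch_uffd_op {
	uint64_t dst;   /* destination address */
	uint64_t src;   /* source address */
	uint64_t len;   /* length in bytes */
	uint64_t mode;  /* OR of SKETCH_COPY_MODE_* bits */
};

/*
 * Build the mode word from the caller's intent:
 *  - resolving a fault (case a): request the young bit;
 *  - background/prefetch copy (case b): leave it clear.
 */
static uint64_t sketch_copy_mode(bool resolving_fault, bool write_protect)
{
	uint64_t mode = 0;

	if (resolving_fault)
		mode |= SKETCH_COPY_MODE_YOUNG;
	if (write_protect)
		mode |= SKETCH_COPY_MODE_WP;
	return mode;
}

/* Fill a descriptor; the flags travel with the operation. */
static struct sketch_uffd_op sketch_make_op(uint64_t dst, uint64_t src,
					    uint64_t len, bool resolving_fault,
					    bool write_protect)
{
	struct sketch_uffd_op op = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = sketch_copy_mode(resolving_fault, write_protect),
	};

	return op;
}
```

With this shape, adding another hint (for example a write-access one) would
mean adding one more bit rather than one more parameter to every function
in the call chain.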
