From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>, Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, Nadav Amit <namit@vmware.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG
Date: Tue, 14 Jun 2022 17:22:29 +0200
Message-ID: <3eea2e6e-1646-546a-d9ef-d30052c00c7d@redhat.com>
In-Reply-To: <20220613204043.98432-1-namit@vmware.com>
On 13.06.22 22:40, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
>
> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> takes ~600 cycles more than when the access-bit is set. At the same
> time, setting the access-bit for memory that is not used (e.g.,
> prefetched) can introduce greater overheads, as the prefetched memory is
> reclaimed later than it should be.
>
> Userfaultfd currently does not set the access-bit (excluding the
> huge-pages case). Arguably, it is best to let the uffd monitor control
> whether the access-bit should be set or not. The expected use is for the
> monitor to request userfaultfd to set the access-bit when the copy
> operation is done to resolve a page-fault, and not to set the young-bit
> when the memory is prefetched.

Thinking out loud about existing users: postcopy live migration in QEMU
has two uses for placing pages:

a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
   make progress.
b) Background migration to converge without faults on all relevant
   pages.

I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG, and in b) we wouldn't.

I wonder, however, whether instead of calling this "young", which
implies what the OS should or shouldn't do, we should define this as a
hint that the placed page is very likely to be accessed next.

I'm bad at naming, but UFFDIO_COPY_MODE_ACCESS_LIKELY would express
what I have in mind.
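To make the a)/b) split concrete, here is a rough sketch of how a monitor
might pick the mode per placement. Assumptions: UFFDIO_COPY_MODE_ACCESS_LIKELY
is only the name floated above, it is not in any uapi header, and the bit
value is a placeholder; the enum and the helper names are made up for
illustration. A kernel that doesn't know the bit is expected to reject it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * HYPOTHETICAL: name proposed in this thread, not part of the uapi.
 * The bit value is a placeholder picked for this sketch only.
 */
#ifndef UFFDIO_COPY_MODE_ACCESS_LIKELY
#define UFFDIO_COPY_MODE_ACCESS_LIKELY ((uint64_t)1 << 63)
#endif

/* Hypothetical: why the monitor is placing this page. */
enum placement_reason {
	PLACE_FAULT,		/* a) resolving a fault, e.g. a waiting VCPU */
	PLACE_BACKGROUND,	/* b) background migration */
};

/* Only hint "access likely" when something is waiting on the page. */
static inline uint64_t copy_mode_for(enum placement_reason why)
{
	return why == PLACE_FAULT ? UFFDIO_COPY_MODE_ACCESS_LIKELY : 0;
}

/* Place one range via UFFDIO_COPY; returns 0 or a negative errno. */
static inline int place_page(int uffd, void *dst, const void *src,
			     size_t len, enum placement_reason why)
{
	struct uffdio_copy copy = {
		.dst = (uintptr_t)dst,
		.src = (uintptr_t)src,
		.len = len,
		.mode = copy_mode_for(why),
	};

	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
		return -errno;
	return 0;
}
```

The fault handler would call place_page(..., PLACE_FAULT), while the
background thread would pass PLACE_BACKGROUND and leave the hint unset.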
>
> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> unconditionally since the former is only used to resolve page-faults and
> the latter would not benefit from not setting the access-bit.
>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
>
> ---
>
> There are 2 possible enhancements:
>
> 1. Use the flag to decide on whether to mark the PTE as dirty (for
> writable PTEs). I guess that setting the dirty-bit is as expensive as
> setting the access-bit, and setting it introduces similar tradeoffs,
> as mentioned above.
>
> 2. Introduce a similar mode for write-protect and use this information
> for setting both the young and dirty bits. Makes one wonder whether
> mprotect() should also set the bit in certain cases...

I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.

For example, QEMU knows whether a page fault it's resolving was due to a
read or a write access and could use that information accordingly. Of
course, when we currently have a read fault, we cannot know for sure
whether a write fault will follow immediately after.
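As a sketch of the read/write distinction: the fault message userspace
already receives carries UFFD_PAGEFAULT_FLAG_WRITE, so a monitor could derive
the hint from it. Assumptions: the two *_ACCESS_LIKELY flags are just the
names from this mail with placeholder bit values, and hint_for_fault() is a
made-up helper.

```c
#include <stdint.h>
#include <linux/userfaultfd.h>

/*
 * HYPOTHETICAL flags: names taken from this thread, not part of the
 * uapi. The bit values are placeholders for this sketch only.
 */
#ifndef UFFDIO_COPY_MODE_READ_ACCESS_LIKELY
#define UFFDIO_COPY_MODE_READ_ACCESS_LIKELY ((uint64_t)1 << 62)
#endif
#ifndef UFFDIO_COPY_WRITE_ACCESS_LIKELY
#define UFFDIO_COPY_WRITE_ACCESS_LIKELY ((uint64_t)1 << 63)
#endif

/*
 * Map a message read from the userfaultfd to a placement hint.
 * Anything that is not a page fault gets no hint at all.
 */
static inline uint64_t hint_for_fault(const struct uffd_msg *msg)
{
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		return 0;
	if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
		return UFFDIO_COPY_WRITE_ACCESS_LIKELY;
	return UFFDIO_COPY_MODE_READ_ACCESS_LIKELY;
}
```

The result would be OR'ed into uffdio_copy.mode. A read fault maps to the
read hint here even though a write might follow right away, which is
exactly the uncertainty mentioned above.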
Especially in the context of UFFDIO_ZEROPAGE,
UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
instead populate an actual page and mark it accessed+dirty. I even have
a use case for that ;)
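For comparison, this is roughly how user space could approximate that with
the existing ioctls: fall back to UFFDIO_COPY from a zero-filled buffer when
a write is expected, so a real page gets populated instead of the shared
zeropage. This is only a sketch and not the proposed interface; place_zero(),
its write_likely parameter and the caller-provided zero_buf are made up for
illustration, and it cannot control the accessed/dirty bits the way an
in-kernel hint could.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Place zeroed memory at dst; returns 0 or a negative errno.
 * zero_buf must point to at least len zero-filled bytes; it is only
 * read when write_likely is nonzero.
 */
static inline int place_zero(int uffd, void *dst, size_t len,
			     const void *zero_buf, int write_likely)
{
	if (write_likely) {
		/* Populate an actual page by copying zeroes. */
		struct uffdio_copy copy = {
			.dst = (uintptr_t)dst,
			.src = (uintptr_t)zero_buf,
			.len = len,
			.mode = 0,
		};

		if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
			return -errno;
		return 0;
	}

	/* No write expected: let the kernel map the zeropage. */
	struct uffdio_zeropage zp = {
		.range = { .start = (uintptr_t)dst, .len = len },
		.mode = 0,
	};

	if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) < 0)
		return -errno;
	return 0;
}
```

A UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY hint would let the kernel make this
choice itself and skip the extra copy from user space.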
The kernel could decide how to treat these hints -- for example, if it
doesn't want user space to mess with access/dirty bits, it could just
mostly ignore the hints.
--
Thanks,
David / dhildenb