linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Muhammad Usama Anjum <usama.anjum@collabora.com>
Subject: Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE
Date: Tue, 28 Feb 2023 14:42:45 -0500	[thread overview]
Message-ID: <Y/5ZNTESKfntBSF3@x1n> (raw)
In-Reply-To: <4b3c2f37-3b84-3147-7513-4293e5408fdd@redhat.com>

On Thu, Feb 23, 2023 at 03:35:13PM +0100, David Hildenbrand wrote:
> I had some idea of using two markers: PTE_UFFD_WP and PT_UFFD_NO_WP, and
> being pte_none() being something fuzzy in between that the application knows
> how to deal with ("not touched since we registered uffd-wp").
> 
> The goal would be to not populate page tables just to insert PTE
> markers/zeropages, but to only special case on the "there is a page table
> with a present PTE and we're unmapping something with uffd-wp set or uffd-wp
> not set". Because when we're unmapping we already have a page table entry we
> can just set instead of allocating a page table.
> 
> Sorry for throwing semi-finished ideas at you, but the basic idea would be
> to only special case when we're unmapping something: there, we already do
> have a page mapped and can remember uffd-wp-set (clean) vs. !uffd-wp-set
> (dirty).
> 
> 
> uffd-wp protecting a range:
> * !pte_none() -> set uffd-wp bit and wrprotect
> * pte_none() -> nothing to do
> * PTE_UFFD_WP -> nothing to do
> * PTE_UFFD_NO_WP -> set PTE_UFFD_WP
> 
> unmapping a page (old way: !pte_none() -> pte_none()):
> * uffd-wp bit set: set PTE_UFFD_WP
> * uffd-wp bit not set: set PTE_UFFD_NO_WP
> 
> (re)mapping a page (old: pte_none() -> !pte_none()):
> * PTE_UFFD_WP -> set pte bit for new PTE
> * PTE_UFFD_NO_WP -> don't set pte bit for new PTE
> * pte_none() -> set pte bit for new PTE
> 
> Zapping an anon page using MADV_DONTNEED is a bit confusing. It's actually
> similar to a memory write (-> write zeroes), but we don't notify uffd-wp for
> that (I think that's something you comment on below). Theoretically, we'd
> want to set PTE_UFFD_NO_WP ("dirty") in the async mode. But that might need
> more thought of what the expected semantics actually are.
> 
> When we walk over the page tables we would get the following information
> after protecting the range:
> 
> * PTE_UFFD_WP -> clean, not modified since last protection round
> * PTE_UFFD_NO_WP -> dirty, modified since last protection round
> * pte_none() -> not mapped and therefore not modified since beginning of
>                 protection.
> * !pte_none() -> uffd-wp bit decides

I can't say I thought a lot but I feel like it may work. I'd probably avoid
calling it PTE_UFFD_NO_WP or it'll be confusing.. maybe WP_WRITTEN or
WP_RESOLVED instead.

But that interface looks weird in that the protection happens right after
VM_UFFD_WP applied to VMA and that keeps true until unregister.  One needs
to reprotect using ioctl(UFFDIO_WRITEPROTECT) OTOH after the 1st round of
tracking.  It just looks a little bit over-complicated, not to mention we
will need two markers only for userfault-wp.  I had a feeling this
complexity can cause us some trouble elsewhere.

IIUC this can be something done on top even if it'll work (I think the
userspace API doesn't need to change at all), so I'd suggest giving it some
more thoughts and we start with simple and working.

In general, I'll be happy with anything simpler if Muhammad is happy with
its current performance..  For myself, WP_UNPOPULATED is definitely much
better than the old workaround in QEMU live snapshots, so I never worry that.

> Yes, I focused on anon. Let's see if any of the above I said makes sense. :)
> 
> Anyhow, what we're discussing here is yet another uffd-wp addition, if ever,
> so don't feel blocked by my comments.

Thanks.  I've just posted a new version.

-- 
Peter Xu



  reply	other threads:[~2023-02-28 19:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-15 21:02 [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE Peter Xu
2023-02-16 10:47 ` David Hildenbrand
2023-02-16 16:29   ` Peter Xu
2023-02-16 17:00     ` David Hildenbrand
2023-02-16 17:55       ` Peter Xu
2023-02-16 18:23         ` David Hildenbrand
2023-02-16 20:08           ` Peter Xu
2023-02-17 11:41             ` David Hildenbrand
2023-02-17 23:04               ` Peter Xu
2023-02-21 12:43                 ` David Hildenbrand
2023-02-21 23:13                   ` Peter Xu
2023-02-22 17:02                     ` David Hildenbrand
2023-02-22 20:37                       ` Peter Xu
2023-02-23 14:35                         ` David Hildenbrand
2023-02-28 19:42                           ` Peter Xu [this message]
2023-03-02 14:12                             ` David Hildenbrand
2023-02-17 12:31             ` Muhammad Usama Anjum
2023-02-17 23:10               ` Peter Xu
2023-02-20  7:15                 ` Muhammad Usama Anjum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/5ZNTESKfntBSF3@x1n \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=usama.anjum@collabora.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).