Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ives van Hoorne <ives@codesandbox.io>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte
Date: Thu, 1 Dec 2022 10:28:30 -0500	[thread overview]
Message-ID: <Y4jIHureiOd8XjDX@x1n> (raw)
In-Reply-To: <20221130142425.6a7fdfa3e5954f3c305a77ee@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3759 bytes --]

Hi, Andrew,

On Wed, Nov 30, 2022 at 02:24:25PM -0800, Andrew Morton wrote:
> On Tue, 15 Nov 2022 19:17:43 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
> > On 14.11.22 01:04, Peter Xu wrote:
> > > Ives van Hoorne from codesandbox.io reported an issue regarding possible
> > > data loss of uffd-wp when applied to memfds on heavily loaded systems.  The
> > > symptom is some read page got data mismatch from the snapshot child VMs.
> > > 
> > > Here I can also reproduce with a Rust reproducer that was provided by Ives
> > > that keeps taking snapshot of a 256MB VM, on a 32G system when I initiate
> > > 80 instances I can trigger the issues in ten minutes.
> > > 
> > > It turns out that we got some pages write-through even if uffd-wp is
> > > applied to the pte.
> > > 
> > > The problem is, when removing migration entries, we didn't really worry
> > > about write bit as long as we know it's not a write migration entry.  That
> > > may not be true, for some memory types (e.g. writable shmem) mk_pte can
> > > return a pte with write bit set, then to recover the migration entry to its
> > > original state we need to explicit wr-protect the pte or it'll has the
> > > write bit set if it's a read migration entry.  For uffd it can cause
> > > write-through.
> > > 
> > > The relevant code on uffd was introduced in the anon support, which is
> > > commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
> > > 2020-04-07).  However anon shouldn't suffer from this problem because anon
> > > should already have the write bit cleared always, so that may not be a
> > > proper Fixes target, while I'm adding the Fixes to be uffd shmem support.
> > > 
> >
> > ...
> >
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
> > >   			pte = pte_mkdirty(pte);
> > >   		if (is_writable_migration_entry(entry))
> > >   			pte = maybe_mkwrite(pte, vma);
> > > -		else if (pte_swp_uffd_wp(*pvmw.pte))
> > > +		else
> > > +			/* NOTE: mk_pte can have write bit set */
> > > +			pte = pte_wrprotect(pte);
> > > +
> > > +		if (pte_swp_uffd_wp(*pvmw.pte)) {
> > > +			WARN_ON_ONCE(pte_write(pte));
> 
> Will this warnnig trigger in the scenario you and Ives have discovered?

If without the above newly added wr-protect, yes.  This is the case where
we found we got write bit set even if uffd-wp bit is also set, hence allows
the write to go through even if marked protected.

> 
> > >   			pte = pte_mkuffd_wp(pte);
> > > +		}
> > >   
> > >   		if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
> > >   			rmap_flags |= RMAP_EXCLUSIVE;
> > 
> > As raised, I don't agree to this generic non-uffd-wp change without 
> > further, clear justification.
> 
> Pater, can you please work this further?

I didn't reply here because I have already replied with the question in
previous version with a few attempts.  Quotting myself:

https://lore.kernel.org/all/Y3KgYeMTdTM0FN5W@x1n/

        The thing is recovering the pte into its original form is the
        safest approach to me, so I think we need justification on why it's
        always safe to set the write bit.

I've also got another longer email trying to explain why I think it's the
other way round to be justfied, rather than justifying removal of the write
bit for a read migration entry, here:

https://lore.kernel.org/all/Y3O5bCXSbvKJrjRL@x1n/

> 
> > I won't nack it, but I won't ack it either.
> 
> I wouldn't mind seeing a little code comment which explains why we're
> doing this.

I've got one more fixup to the same patch attached, with enriched comments
on why we need wr-protect for read migration entries.

Please have a look to see whether that helps, thanks.

-- 
Peter Xu

[-- Attachment #2: 0001-fixup-mm-migrate-fix-read-only-page-got-writable-whe.patch --]
[-- Type: text/plain, Size: 1101 bytes --]

From d68c98047ce54c62f3454997a55f23ff6fb317cd Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 1 Dec 2022 10:19:22 -0500
Subject: [PATCH] fixup! mm/migrate: fix read-only page got writable when
 recover pte
Content-type: text/plain

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c13c828d34f3..d14f1f3ab073 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -214,7 +214,14 @@ static bool remove_migration_pte(struct folio *folio,
 		if (is_writable_migration_entry(entry))
 			pte = maybe_mkwrite(pte, vma);
 		else
-			/* NOTE: mk_pte can have write bit set */
+			/*
+			 * NOTE: mk_pte() can have write bit set per memory
+			 * type (e.g. shmem), or pte_mkdirty() per archs
+			 * (e.g., sparc64).  If this is a read migration
+			 * entry, we need to make sure when we recover the
+			 * pte from migration entry to present entry the
+			 * write bit is cleared.
+			 */
 			pte = pte_wrprotect(pte);
 
 		if (pte_swp_uffd_wp(*pvmw.pte)) {
-- 
2.37.3

next prev parent reply	other threads:[~2022-12-01 15:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14  0:04 [PATCH v3 0/2] mm/migrate: Fix writable pte for read migration entry Peter Xu
2022-11-14  0:04 ` [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte Peter Xu
2022-11-15 18:17   ` David Hildenbrand
2022-11-30 22:24     ` Andrew Morton
2022-12-01 15:28       ` Peter Xu [this message]
2022-12-01 15:42         ` David Hildenbrand
2022-12-01 22:30           ` Andrew Morton
2022-12-02 11:03             ` David Hildenbrand
2022-12-02 12:07               ` David Hildenbrand
2022-12-02 15:14                 ` Peter Xu
2022-12-02 15:40                   ` David Hildenbrand
2022-12-02 17:18             ` David Hildenbrand
2022-11-14  0:04 ` [PATCH v3 2/2] mm/uffd: Sanity check write bit for uffd-wp protected ptes Peter Xu

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:c13c828d34f dfblob:d14f1f3ab07 )
 OR (
bs:"fixup! mm/migrate: fix read-only page got writable when" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4jIHureiOd8XjDX@x1n \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=ives@codesandbox.io \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.