From: Will Deacon <will@kernel.org>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>,
	Hugh Dickins <hughd@google.com>, Keir Fraser <keirf@google.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	David Hildenbrand <david@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Frederick Mayle <fmayle@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>
Subject: [PATCH] mm/gup: Drain batched mlock folio processing before attempting migration
Date: Fri, 15 Aug 2025 11:18:58 +0100
Message-ID: <20250815101858.24352-1-will@kernel.org>

When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not
be longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the
list off to migrate_pages() for the actual migration.
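
To illustrate the shape of that sequence, here is a self-contained
userspace sketch (all types and names below are illustrative stand-ins,
not the real mm/gup.c interfaces): everything is pinned up front, the
longterm-unpinnable folios are collected and their pins dropped, and the
collected list is what would be handed to the migration step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a folio; 'longterm_unpinnable' stands in for checks
 * such as "resides in a CMA region or ZONE_MOVABLE". */
struct folio {
	int refcount;
	bool longterm_unpinnable;
};

static void folio_pin(struct folio *f)   { f->refcount++; }
static void folio_unpin(struct folio *f) { f->refcount--; }

/* Pin all target folios first, then collect the ones that must not
 * remain longterm-pinned, dropping the pins just taken on those.
 * Returns the number of folios collected for migration. */
static size_t collect_unpinnable(struct folio **folios, size_t n,
				 struct folio **movable)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++)
		folio_pin(folios[i]);

	for (size_t i = 0; i < n; i++) {
		if (folios[i]->longterm_unpinnable) {
			folio_unpin(folios[i]);
			movable[nr++] = folios[i];
		}
	}

	return nr;	/* caller hands 'movable' to the migrate step */
}
```

The key property for what follows is that, by the time migration runs,
the collected folios should hold no pins from this path at all.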

It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately,
migration failures are relatively easy to observe when running pKVM
(which uses pin_user_pages() on crosvm's virtual address space to
resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6
device, and they cause the VM to terminate prematurely.

In the failure case, 'crosvm' has called mlock2() with MLOCK_ONFAULT on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not
present and so the faulting logic allocates a new folio, mlocks it
with mlock_folio() and maps it in the page-table.

Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page()
batch by pagevec"), mlock/munlock operations on a folio (formerly page)
are deferred. For example, mlock_folio() takes an additional reference
on the target folio before placing it into a per-cpu 'folio_batch' for
later processing by mlock_folio_batch(), which drops the refcount once
the operation is complete. Processing of the batches is coupled with
the LRU batch logic and can be forcefully drained with
lru_add_drain_all(), but as long as a folio remains unprocessed on a
batch, its refcount will remain elevated.
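
A toy model of that deferral (again with made-up names; the real code
lives in mm/mlock.c and mm/swap.c) shows why the refcount stays
elevated until the batch is drained, and why migration fails in the
meantime:

```c
#include <stdbool.h>
#include <stddef.h>

struct folio { int refcount; };

#define BATCH_MAX 31	/* illustrative batch capacity */

struct folio_batch {
	size_t nr;
	struct folio *folios[BATCH_MAX];
};

static struct folio_batch cpu_batch;	/* stand-in for a per-cpu batch */

/* mlock_folio(): take an extra reference and defer the real work by
 * queueing the folio on the batch. */
static void mlock_folio(struct folio *f)
{
	f->refcount++;			/* extra ref held by the batch */
	cpu_batch.folios[cpu_batch.nr++] = f;
}

/* mlock_folio_batch(), as driven by lru_add_drain_all(): process the
 * pending entries and drop the references taken above. */
static void drain_batch(void)
{
	for (size_t i = 0; i < cpu_batch.nr; i++)
		cpu_batch.folios[i]->refcount--;
	cpu_batch.nr = 0;
}

/* Migration succeeds only if no unexpected references remain. */
static bool can_migrate(const struct folio *f, int expected_refs)
{
	return f->refcount == expected_refs;
}
```

Until drain_batch() runs, the folio carries one more reference than the
migration code expects, which is exactly the failure mode described
above.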

This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration
code fails to migrate a folio due to the elevated refcount from the
pending mlock operation.

Extend the existing LRU draining logic in
collect_longterm_unpinnable_folios() so that unpinnable mlocked folios
on the LRU also trigger a drain.
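
In effect, the drain condition becomes the following (stand-in types;
the real predicate is the folio_test_lru()/folio_test_mlocked() check
in the hunk below):

```c
#include <stdbool.h>

/* Toy folio flags mirroring folio_test_lru()/folio_test_mlocked(). */
struct folio {
	bool on_lru;
	bool mlocked;
};

/* Drain if the folio is not yet on the LRU (as before), or if it is
 * mlocked, since a pending mlock batch entry may still hold a
 * reference to it. 'drain_allow' ensures we only drain once. */
static bool needs_drain(const struct folio *f, bool drain_allow)
{
	return drain_allow && (!f->on_lru || f->mlocked);
}
```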

Cc: Hugh Dickins <hughd@google.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Frederick Mayle <fmayle@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Fixes: 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec")
Signed-off-by: Will Deacon <will@kernel.org>
---

This has been quite unpleasant to debug and, as I'm not intimately
familiar with the mm internals, I've tried to include all the relevant
details in the commit message in case there's a preferred alternative
way of solving the problem or there's a flaw in my logic.

 mm/gup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index adffe663594d..656835890f05 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2307,7 +2307,8 @@ static unsigned long collect_longterm_unpinnable_folios(
 			continue;
 		}
 
-		if (!folio_test_lru(folio) && drain_allow) {
+		if (drain_allow &&
+		   (!folio_test_lru(folio) || folio_test_mlocked(folio))) {
 			lru_add_drain_all();
 			drain_allow = false;
 		}
-- 
2.51.0.rc1.167.g924127e9c0-goog



