From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Hugh Dickins <hughd@google.com>,
Will Deacon <will@kernel.org>, Kiryl Shutsemau <kas@kernel.org>,
David Hildenbrand <david@redhat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Chris Li <chrisl@kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Johannes Weiner <hannes@cmpxchg.org>,
John Hubbard <jhubbard@nvidia.com>,
Keir Fraser <keirf@google.com>,
Konstantin Khlebnikov <koct9i@gmail.com>,
Li Zhe <lizhe.67@bytedance.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
Shivank Garg <shivankg@amd.com>, Vlastimil Babka <vbabka@suse.cz>,
Wei Xu <weixugc@google.com>, yangge <yangge1116@126.com>,
Yuanchu Xie <yuanchu@google.com>, Yu Zhao <yuzhao@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 20/73] mm/gup: check ref_count instead of lru before migration
Date: Tue, 30 Sep 2025 16:47:24 +0200 [thread overview]
Message-ID: <20250930143821.404421202@linuxfoundation.org> (raw)
In-Reply-To: <20250930143820.537407601@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins <hughd@google.com>
[ Upstream commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad ]
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred. For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete. Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.
This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11
commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected. New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all(). And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm:
add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Clean cherry-pick now into this tree ]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/gup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index 599c6b9453166..44e5fe2535d0e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1990,7 +1990,8 @@ static unsigned long collect_longterm_unpinnable_pages(
continue;
}
- if (!folio_test_lru(folio) && drain_allow) {
+ if (drain_allow && folio_ref_count(folio) !=
+ folio_expected_ref_count(folio) + 1) {
lru_add_drain_all();
drain_allow = false;
}
--
2.51.0
next prev parent reply other threads:[~2025-09-30 15:17 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-30 14:47 [PATCH 6.1 00/73] 6.1.155-rc1 review Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 01/73] ALSA: usb-audio: Fix block comments in mixer_quirks Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 02/73] ALSA: usb-audio: Drop unnecessary parentheses " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 03/73] ALSA: usb-audio: Avoid multiple assignments " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 04/73] ALSA: usb-audio: Simplify NULL comparison " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 05/73] ALSA: usb-audio: Remove unneeded wmb() " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 06/73] ALSA: usb-audio: Add mixer quirk for Sony DualSense PS5 Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 07/73] HID: multitouch: Get the contact ID from HID_DG_TRANSDUCER_INDEX fields in case of Apple Touch Bar Greg Kroah-Hartman
2025-09-30 15:23 ` Aditya Garg
2025-10-02 7:09 ` Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 08/73] HID: multitouch: support getting the tip state from HID_DG_TOUCH fields in " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 09/73] HID: multitouch: take cls->maxcontacts into account for Apple Touch Bar even without a HID_DG_CONTACTMAX field Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 10/73] HID: multitouch: specify that Apple Touch Bar is direct Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 11/73] ALSA: usb-audio: Convert comma to semicolon Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 12/73] ALSA: usb-audio: Fix build with CONFIG_INPUT=n Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 13/73] usb: core: Add 0x prefix to quirks debug output Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 14/73] ALSA: usb-audio: Add DSD support for Comtrue USB Audio device Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 15/73] ALSA: usb-audio: move mixer_quirks min_mute into common quirk Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 16/73] ALSA: usb-audio: Add mute TLV for playback volumes on more devices Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 17/73] IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 18/73] mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked" Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 19/73] mm: add folio_expected_ref_count() for reference count calculation Greg Kroah-Hartman
2025-09-30 14:47 ` Greg Kroah-Hartman [this message]
2025-09-30 14:47 ` [PATCH 6.1 21/73] mm/gup: local lru_add_drain() to avoid lru_add_drain_all() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 22/73] mm: folio_may_be_lru_cached() unless folio_test_large() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 23/73] arm64: dts: imx8mp: Correct thermal sensor index Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 24/73] cpufreq: Initialize cpufreq-based invariance before subsys Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 25/73] smb: server: dont use delayed_work for post_recv_credits_work Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 26/73] can: rcar_can: rcar_can_resume(): fix s2ram with PSCI Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 27/73] bpf: Reject bpf_timer for PREEMPT_RT Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 28/73] can: etas_es58x: sort the includes by alphabetic order Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 29/73] can: etas_es58x: populate ndo_change_mtu() to prevent buffer overflow Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 30/73] can: hi311x: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 31/73] can: sun4i_can: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 32/73] can: mcba_usb: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 33/73] can: peak_usb: fix shift-out-of-bounds issue Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 34/73] ethernet: rvu-af: Remove slash from the driver name Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 35/73] Bluetooth: hci_sync: Fix hci_resume_advertising_sync Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 36/73] Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 37/73] bnxt_en: correct offset handling for IPv6 destination address Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 38/73] nexthop: Forbid FDB status change while nexthop is in a group Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 39/73] selftests: fib_nexthops: Fix creation of non-FDB nexthops Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 40/73] net: dsa: lantiq_gswip: do also enable or disable cpu port Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 41/73] net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 42/73] net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 43/73] octeontx2-pf: Fix potential use after free in otx2_tc_add_flow() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 44/73] drm/gma500: Fix null dereference in hdmi teardown Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 45/73] futex: Prevent use-after-free during requeue-PI Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 46/73] i40e: fix idx validation in i40e_validate_queue_map Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 47/73] i40e: fix input validation logic for action_meta Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 48/73] i40e: add max boundary check for VF filters Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 49/73] i40e: add mask to apply valid bits for itr_idx Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 50/73] i40e: improve VF MAC filters accounting Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 51/73] crypto: af_alg - Fix incorrect boolean values in af_alg_ctx Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 52/73] tracing: dynevent: Add a missing lockdown check on dynevent Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 53/73] afs: Fix potential null pointer dereference in afs_put_server Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 54/73] mm/hugetlb: fix folio is still mapped when deleted Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 55/73] fbcon: fix integer overflow in fbcon_do_set_font Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 56/73] fbcon: Fix OOB access in font allocation Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 57/73] s390/cpum_cf: Fix uninitialized warning after backport of ce971233242b Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 58/73] mm: migrate_device: use more folio in migrate_device_finalize() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 59/73] mm/migrate_device: dont add folio to be freed to LRU " Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 60/73] minmax: add in_range() macro Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 61/73] minmax: Introduce {min,max}_array() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 62/73] minmax: deduplicate __unconst_integer_typeof() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 63/73] minmax: fix indentation of __cmp_once() and __clamp_once() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 64/73] minmax: avoid overly complicated constant expressions in VM code Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 65/73] drm/ast: Use msleep instead of mdelay for edid read Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 66/73] i40e: fix validation of VF state in get resources Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 67/73] i40e: fix idx validation in config queues msg Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 68/73] i40e: increase max descriptors for XL710 Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 69/73] i40e: add validation for ring_len param Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 70/73] kmsan: fix out-of-bounds access to shadow memory Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 71/73] minmax: make generic MIN() and MAX() macros available everywhere Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 72/73] minmax: add a few more MIN_T/MAX_T users Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 73/73] minmax: simplify and clarify min_t()/max_t() implementation Greg Kroah-Hartman
2025-09-30 17:37 ` [PATCH 6.1 00/73] 6.1.155-rc1 review Florian Fainelli
2025-09-30 18:33 ` Peter Schneider
2025-09-30 18:50 ` Brett A C Sheffield
2025-10-01 3:04 ` [PATCH 6.1 00/73] " Ron Economos
2025-10-01 9:11 ` Jon Hunter
2025-10-01 10:16 ` Mark Brown
2025-10-01 11:42 ` Naresh Kamboju
2025-10-01 16:17 ` Shuah Khan
2025-10-01 17:17 ` Miguel Ojeda
2025-10-03 6:57 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250930143821.404421202@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@kernel.org \
--cc=axelrasmussen@google.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=kas@kernel.org \
--cc=keirf@google.com \
--cc=koct9i@gmail.com \
--cc=lizhe.67@bytedance.com \
--cc=patches@lists.linux.dev \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=sashal@kernel.org \
--cc=shivankg@amd.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=yangge1116@126.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox