All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Hugh Dickins <hughd@google.com>,
	Will Deacon <will@kernel.org>, Kiryl Shutsemau <kas@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Chris Li <chrisl@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Johannes Weiner <hannes@cmpxchg.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Keir Fraser <keirf@google.com>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Li Zhe <lizhe.67@bytedance.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
	Shivank Garg <shivankg@amd.com>, Vlastimil Babka <vbabka@suse.cz>,
	Wei Xu <weixugc@google.com>, yangge <yangge1116@126.com>,
	Yuanchu Xie <yuanchu@google.com>, Yu Zhao <yuzhao@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 20/73] mm/gup: check ref_count instead of lru before migration
Date: Tue, 30 Sep 2025 16:47:24 +0200	[thread overview]
Message-ID: <20250930143821.404421202@linuxfoundation.org> (raw)
In-Reply-To: <20250930143820.537407601@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

[ Upstream commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad ]

Patch series "mm: better GUP pin lru_add_drain_all()", v2.

Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.

This patch (of 5):

Will Deacon reports:-

When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone.  This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.

It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller.  Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.

In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning.  Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.

Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred.  For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete.  Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.

This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.

Hugh Dickins adds:-

!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.

5.18 commit 2fbb0c10d1e8 may have made it more unreliable.  Then 6.11
commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.

And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.

Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.

Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected.  New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all().  And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.

Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm:
add folio_expected_ref_count() for reference count calculation").

Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Clean cherry-pick now into this tree ]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/gup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 599c6b9453166..44e5fe2535d0e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1990,7 +1990,8 @@ static unsigned long collect_longterm_unpinnable_pages(
 			continue;
 		}
 
-		if (!folio_test_lru(folio) && drain_allow) {
+		if (drain_allow && folio_ref_count(folio) !=
+				   folio_expected_ref_count(folio) + 1) {
 			lru_add_drain_all();
 			drain_allow = false;
 		}
-- 
2.51.0




  parent reply	other threads:[~2025-09-30 15:17 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-30 14:47 [PATCH 6.1 00/73] 6.1.155-rc1 review Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 01/73] ALSA: usb-audio: Fix block comments in mixer_quirks Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 02/73] ALSA: usb-audio: Drop unnecessary parentheses " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 03/73] ALSA: usb-audio: Avoid multiple assignments " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 04/73] ALSA: usb-audio: Simplify NULL comparison " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 05/73] ALSA: usb-audio: Remove unneeded wmb() " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 06/73] ALSA: usb-audio: Add mixer quirk for Sony DualSense PS5 Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 07/73] HID: multitouch: Get the contact ID from HID_DG_TRANSDUCER_INDEX fields in case of Apple Touch Bar Greg Kroah-Hartman
2025-09-30 15:23   ` Aditya Garg
2025-10-02  7:09     ` Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 08/73] HID: multitouch: support getting the tip state from HID_DG_TOUCH fields in " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 09/73] HID: multitouch: take cls->maxcontacts into account for Apple Touch Bar even without a HID_DG_CONTACTMAX field Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 10/73] HID: multitouch: specify that Apple Touch Bar is direct Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 11/73] ALSA: usb-audio: Convert comma to semicolon Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 12/73] ALSA: usb-audio: Fix build with CONFIG_INPUT=n Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 13/73] usb: core: Add 0x prefix to quirks debug output Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 14/73] ALSA: usb-audio: Add DSD support for Comtrue USB Audio device Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 15/73] ALSA: usb-audio: move mixer_quirks min_mute into common quirk Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 16/73] ALSA: usb-audio: Add mute TLV for playback volumes on more devices Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 17/73] IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 18/73] mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked" Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 19/73] mm: add folio_expected_ref_count() for reference count calculation Greg Kroah-Hartman
2025-09-30 14:47 ` Greg Kroah-Hartman [this message]
2025-09-30 14:47 ` [PATCH 6.1 21/73] mm/gup: local lru_add_drain() to avoid lru_add_drain_all() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 22/73] mm: folio_may_be_lru_cached() unless folio_test_large() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 23/73] arm64: dts: imx8mp: Correct thermal sensor index Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 24/73] cpufreq: Initialize cpufreq-based invariance before subsys Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 25/73] smb: server: dont use delayed_work for post_recv_credits_work Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 26/73] can: rcar_can: rcar_can_resume(): fix s2ram with PSCI Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 27/73] bpf: Reject bpf_timer for PREEMPT_RT Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 28/73] can: etas_es58x: sort the includes by alphabetic order Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 29/73] can: etas_es58x: populate ndo_change_mtu() to prevent buffer overflow Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 30/73] can: hi311x: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 31/73] can: sun4i_can: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 32/73] can: mcba_usb: " Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 33/73] can: peak_usb: fix shift-out-of-bounds issue Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 34/73] ethernet: rvu-af: Remove slash from the driver name Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 35/73] Bluetooth: hci_sync: Fix hci_resume_advertising_sync Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 36/73] Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 37/73] bnxt_en: correct offset handling for IPv6 destination address Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 38/73] nexthop: Forbid FDB status change while nexthop is in a group Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 39/73] selftests: fib_nexthops: Fix creation of non-FDB nexthops Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 40/73] net: dsa: lantiq_gswip: do also enable or disable cpu port Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 41/73] net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 42/73] net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 43/73] octeontx2-pf: Fix potential use after free in otx2_tc_add_flow() Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 44/73] drm/gma500: Fix null dereference in hdmi teardown Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 45/73] futex: Prevent use-after-free during requeue-PI Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 46/73] i40e: fix idx validation in i40e_validate_queue_map Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 47/73] i40e: fix input validation logic for action_meta Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 48/73] i40e: add max boundary check for VF filters Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 49/73] i40e: add mask to apply valid bits for itr_idx Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 50/73] i40e: improve VF MAC filters accounting Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 51/73] crypto: af_alg - Fix incorrect boolean values in af_alg_ctx Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 52/73] tracing: dynevent: Add a missing lockdown check on dynevent Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 53/73] afs: Fix potential null pointer dereference in afs_put_server Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 54/73] mm/hugetlb: fix folio is still mapped when deleted Greg Kroah-Hartman
2025-09-30 14:47 ` [PATCH 6.1 55/73] fbcon: fix integer overflow in fbcon_do_set_font Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 56/73] fbcon: Fix OOB access in font allocation Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 57/73] s390/cpum_cf: Fix uninitialized warning after backport of ce971233242b Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 58/73] mm: migrate_device: use more folio in migrate_device_finalize() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 59/73] mm/migrate_device: dont add folio to be freed to LRU " Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 60/73] minmax: add in_range() macro Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 61/73] minmax: Introduce {min,max}_array() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 62/73] minmax: deduplicate __unconst_integer_typeof() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 63/73] minmax: fix indentation of __cmp_once() and __clamp_once() Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 64/73] minmax: avoid overly complicated constant expressions in VM code Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 65/73] drm/ast: Use msleep instead of mdelay for edid read Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 66/73] i40e: fix validation of VF state in get resources Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 67/73] i40e: fix idx validation in config queues msg Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 68/73] i40e: increase max descriptors for XL710 Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 69/73] i40e: add validation for ring_len param Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 70/73] kmsan: fix out-of-bounds access to shadow memory Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 71/73] minmax: make generic MIN() and MAX() macros available everywhere Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 72/73] minmax: add a few more MIN_T/MAX_T users Greg Kroah-Hartman
2025-09-30 14:48 ` [PATCH 6.1 73/73] minmax: simplify and clarify min_t()/max_t() implementation Greg Kroah-Hartman
2025-09-30 17:37 ` [PATCH 6.1 00/73] 6.1.155-rc1 review Florian Fainelli
2025-09-30 18:33 ` Peter Schneider
2025-09-30 18:50 ` Brett A C Sheffield
2025-10-01  3:04 ` [PATCH 6.1 00/73] " Ron Economos
2025-10-01  9:11 ` Jon Hunter
2025-10-01 10:16 ` Mark Brown
2025-10-01 11:42 ` Naresh Kamboju
2025-10-01 16:17 ` Shuah Khan
2025-10-01 17:17 ` Miguel Ojeda
2025-10-03  6:57 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250930143821.404421202@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=axelrasmussen@google.com \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=kas@kernel.org \
    --cc=keirf@google.com \
    --cc=koct9i@gmail.com \
    --cc=lizhe.67@bytedance.com \
    --cc=patches@lists.linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=sashal@kernel.org \
    --cc=shivankg@amd.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=weixugc@google.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yangge1116@126.com \
    --cc=yuanchu@google.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.