From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C13B223DD6; Tue, 30 Sep 2025 15:23:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759245806; cv=none; b=A23ihptjq/7SF/zocLM9QQ6mCh2cnwRX942ixMo+5gsUyzAgk4mUOTH1haobyFAOkp2xXKe43+oLRv1BywjZtGf4uoL8tr7skAWX8ueY4XxNL4cw3UZNjj9t9ay0cPcVz+PokF5dKXZAn59ZlvLMmBUMf/nY+m00NzyvHojApeQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759245806; c=relaxed/simple; bh=R+VYrnhAhSogxMxS4ZTzLrMChqUnpNbWXLGW5Fh9bCM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Aw1KiR69uh4MYcB55Qx0A6BpIYSkZRsUiPr60uQxmpMu3fYQVaSwED89VcEhxVCLV+dNhYbkILlDIhJ/bW9xeLkWmiu5hjF8DaH6GwL7XWcAgDBnEJPoxbuW0HYTA9te4mu0uzDP8X6fIKgiGX4pBuNB39Oqz8gLpg5jp8/cqQM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=vs6i3JcM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="vs6i3JcM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 934E2C4CEF0; Tue, 30 Sep 2025 15:23:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1759245806; bh=R+VYrnhAhSogxMxS4ZTzLrMChqUnpNbWXLGW5Fh9bCM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vs6i3JcM8aoBYyrysQ3+7JqBdM8oqW8kIvL+/XDGZ8Jg+rBfN8qALkUOGruWALRJp BhTduXazdpyLk+zLhKe+lskMcFYxBBmItiehpr8mq7z4DPuCO94QygKbqYnswZ5/S4 0XyDL4N7AOnry06RrEaOUjD4McbGjuvdlg90XCfg= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Hugh Dickins , Will Deacon , Kiryl Shutsemau , David Hildenbrand , "Aneesh Kumar K.V" , Axel Rasmussen , Chris Li , Christoph Hellwig , Jason Gunthorpe , Johannes Weiner , John Hubbard , Keir Fraser , Konstantin Khlebnikov , Li Zhe , "Matthew Wilcox (Oracle)" , Peter Xu , Rik van Riel , Shivank Garg , Vlastimil Babka , Wei Xu , yangge , Yuanchu Xie , Yu Zhao , Andrew Morton , Sasha Levin Subject: [PATCH 6.6 24/91] mm/gup: check ref_count instead of lru before migration Date: Tue, 30 Sep 2025 16:47:23 +0200 Message-ID: <20250930143822.142158765@linuxfoundation.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250930143821.118938523@linuxfoundation.org> References: <20250930143821.118938523@linuxfoundation.org> User-Agent: quilt/0.69 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 6.6-stable review patch. If anyone has any objections, please let me know. ------------------ From: Hugh Dickins [ Upstream commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad ] Patch series "mm: better GUP pin lru_add_drain_all()", v2. Series of lru_add_drain_all()-related patches, arising from recent mm/gup migration report from Will Deacon. This patch (of 5): Will Deacon reports:- When taking a longterm GUP pin via pin_user_pages(), __gup_longterm_locked() tries to migrate target folios that should not be longterm pinned, for example because they reside in a CMA region or movable zone. This is done by first pinning all of the target folios anyway, collecting all of the longterm-unpinnable target folios into a list, dropping the pins that were just taken and finally handing the list off to migrate_pages() for the actual migration. It is critically important that no unexpected references are held on the folios being migrated, otherwise the migration will fail and pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is relatively easy to observe migration failures when running pKVM (which uses pin_user_pages() on crosvm's virtual address space to resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6 device and this results in the VM terminating prematurely. In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its mapping of guest memory prior to the pinning. Subsequently, when pin_user_pages() walks the page-table, the relevant 'pte' is not present and so the faulting logic allocates a new folio, mlocks it with mlock_folio() and maps it in the page-table. Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec"), mlock/munlock operations on a folio (formerly page), are deferred. For example, mlock_folio() takes an additional reference on the target folio before placing it into a per-cpu 'folio_batch' for later processing by mlock_folio_batch(), which drops the refcount once the operation is complete. Processing of the batches is coupled with the LRU batch logic and can be forcefully drained with lru_add_drain_all() but as long as a folio remains unprocessed on the batch, its refcount will be elevated. This deferred batching therefore interacts poorly with the pKVM pinning scenario as we can find ourselves in a situation where the migration code fails to migrate a folio due to the elevated refcount from the pending mlock operation. Hugh Dickins adds:- !folio_test_lru() has never been a very reliable way to tell if an lru_add_drain_all() is worth calling, to remove LRU cache references to make the folio migratable: the LRU flag may be set even while the folio is held with an extra reference in a per-CPU LRU cache. 5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11 commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch") tried to make it reliable, by moving LRU flag clearing; but missed the mlock/munlock batches, so still unreliable as reported. And it turns out to be difficult to extend 33dfe9204f29's LRU flag clearing to the mlock/munlock batches: if they do benefit from batching, mlock/munlock cannot be so effective when easily suppressed while !LRU. Instead, switch to an expected ref_count check, which was more reliable all along: some more false positives (unhelpful drains) than before, and never a guarantee that the folio will prove migratable, but better. Note on PG_private_2: ceph and nfs are still using the deprecated PG_private_2 flag, with the aid of netfs and filemap support functions. Although it is consistently matched by an increment of folio ref_count, folio_expected_ref_count() intentionally does not recognize it, and ceph folio migration currently depends on that for PG_private_2 folios to be rejected. New references to the deprecated flag are discouraged, so do not add it into the collect_longterm_unpinnable_folios() calculation: but longterm pinning of transiently PG_private_2 ceph and nfs folios (an uncommon case) may invoke a redundant lru_add_drain_all(). And this makes easy the backport to earlier releases: up to and including 6.12, btrfs also used PG_private_2, but without a ref_count increment. Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation"). Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins Reported-by: Will Deacon Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/ Acked-by: Kiryl Shutsemau Acked-by: David Hildenbrand Cc: "Aneesh Kumar K.V" Cc: Axel Rasmussen Cc: Chris Li Cc: Christoph Hellwig Cc: Jason Gunthorpe Cc: Johannes Weiner Cc: John Hubbard Cc: Keir Fraser Cc: Konstantin Khlebnikov Cc: Li Zhe Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Cc: Rik van Riel Cc: Shivank Garg Cc: Vlastimil Babka Cc: Wei Xu Cc: yangge Cc: Yuanchu Xie Cc: Yu Zhao Cc: Signed-off-by: Andrew Morton [ Clean cherry-pick now into this tree ] Signed-off-by: Hugh Dickins Signed-off-by: Sasha Levin --- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/gup.c b/mm/gup.c index 497d7ce43d393..00ac2df7164c3 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1975,7 +1975,8 @@ static unsigned long collect_longterm_unpinnable_pages( continue; } - if (!folio_test_lru(folio) && drain_allow) { + if (drain_allow && folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); drain_allow = false; } -- 2.51.0