* [RFC PATCH 0/3] mm/mincore: clean up swap cache helper and PTL
@ 2025-08-07 15:27 Kairui Song
2025-08-07 15:27 ` [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore Kairui Song
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Kairui Song @ 2025-08-07 15:27 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
There is a swap cache helper that is only used by mincore. It was
seperated out from mincore some time ago to be shared with other users,
but now mincore is the only user again. So it can be easily merged back
to simplify the code.
Patch 1 clean this up, but I'm not very sure about Patch 2 and 3:
Realizing that the PTL seems only useful for stablizing the swap
cache space now, by grabbing the swap device before looking up the swap
cache space (patch 2), we can drop the PTL locking (patch 3).
Let me know if I'm missing something. With mmap lock, the mincore result
should be reliable enough without the PTL.
Kairui Song (3):
mm/mincore, swap: consolidate swap cache checking for mincore
mm/mincore: use a helper for checking the swap cache
mm/mincore: avoid touching the PTL
mm/mincore.c | 53 ++++++++++++++++++++++++++++++++++++++-----------
mm/swap.h | 10 ----------
mm/swap_state.c | 38 -----------------------------------
3 files changed, 41 insertions(+), 60 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore
2025-08-07 15:27 [RFC PATCH 0/3] mm/mincore: clean up swap cache helper and PTL Kairui Song
@ 2025-08-07 15:27 ` Kairui Song
2025-08-07 18:06 ` Nhat Pham
2025-08-07 15:27 ` [RFC PATCH 2/3] mm/mincore: use a helper for checking the swap cache Kairui Song
2025-08-07 15:27 ` [RFC PATCH 3/3] mm/mincore: avoid touching the PTL Kairui Song
2 siblings, 1 reply; 11+ messages in thread
From: Kairui Song @ 2025-08-07 15:27 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
The filemap_get_incore_folio (previously find_get_incore_page) helper
was introduced by commit 61ef18655704 ("mm: factor find_get_incore_page
out of mincore_page") to be used by later commit f5df8635c5a3 ("mm: use
find_get_incore_page in memcontrol"), so memory cgroup charge move code
can be simplified.
But commit 6b611388b626 ("memcg-v1: remove charge move code") removed
that user completely, it's only used by mincore now.
So this commit basically reverts commit 61ef18655704 ("mm: factor
find_get_incore_page out of mincore_page"). Move it back to mincore side
to simplify the code.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/mincore.c | 29 +++++++++++++++++++++++++++--
mm/swap.h | 10 ----------
mm/swap_state.c | 38 --------------------------------------
3 files changed, 27 insertions(+), 50 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c
index 10dabefc3acc..f0d3c9419e58 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -64,8 +64,33 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
* any other file mapping (ie. marked !present and faulted in with
* tmpfs's .fault). So swapped out tmpfs mappings are tested here.
*/
- folio = filemap_get_incore_folio(mapping, index);
- if (!IS_ERR(folio)) {
+ if (IS_ENABLED(CONFIG_SWAP) && shmem_mapping(mapping)) {
+ folio = filemap_get_entry(mapping, index);
+ /*
+ * shmem/tmpfs may return swap: account for swapcache
+ * page too.
+ */
+ if (xa_is_value(folio)) {
+ struct swap_info_struct *si;
+ swp_entry_t swp = radix_to_swp_entry(folio);
+ /* There might be swapin error entries in shmem mapping. */
+ if (non_swap_entry(swp))
+ return 0;
+ /* Prevent swap device to being swapoff under us */
+ si = get_swap_device(swp);
+ if (si) {
+ folio = filemap_get_folio(swap_address_space(swp),
+ swap_cache_index(swp));
+ put_swap_device(si);
+ } else {
+ return 0;
+ }
+ }
+ } else {
+ folio = filemap_get_folio(mapping, index);
+ }
+
+ if (folio) {
present = folio_test_uptodate(folio);
folio_put(folio);
}
diff --git a/mm/swap.h b/mm/swap.h
index 911ad5ff0f89..1ae44d4193b1 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -64,9 +64,6 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr);
struct folio *swap_cache_get_folio(swp_entry_t entry,
struct vm_area_struct *vma, unsigned long addr);
-struct folio *filemap_get_incore_folio(struct address_space *mapping,
- pgoff_t index);
-
struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
struct swap_iocb **plug);
@@ -178,13 +175,6 @@ static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
return NULL;
}
-static inline
-struct folio *filemap_get_incore_folio(struct address_space *mapping,
- pgoff_t index)
-{
- return filemap_get_folio(mapping, index);
-}
-
static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
{
return NULL;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c354435a0923..99513b74b5d8 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -323,44 +323,6 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
return folio;
}
-/**
- * filemap_get_incore_folio - Find and get a folio from the page or swap caches.
- * @mapping: The address_space to search.
- * @index: The page cache index.
- *
- * This differs from filemap_get_folio() in that it will also look for the
- * folio in the swap cache.
- *
- * Return: The found folio or %NULL.
- */
-struct folio *filemap_get_incore_folio(struct address_space *mapping,
- pgoff_t index)
-{
- swp_entry_t swp;
- struct swap_info_struct *si;
- struct folio *folio = filemap_get_entry(mapping, index);
-
- if (!folio)
- return ERR_PTR(-ENOENT);
- if (!xa_is_value(folio))
- return folio;
- if (!shmem_mapping(mapping))
- return ERR_PTR(-ENOENT);
-
- swp = radix_to_swp_entry(folio);
- /* There might be swapin error entries in shmem mapping. */
- if (non_swap_entry(swp))
- return ERR_PTR(-ENOENT);
- /* Prevent swapoff from happening to us */
- si = get_swap_device(swp);
- if (!si)
- return ERR_PTR(-ENOENT);
- index = swap_cache_index(swp);
- folio = filemap_get_folio(swap_address_space(swp), index);
- put_swap_device(si);
- return folio;
-}
-
struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
bool skip_if_exists)
--
2.50.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 2/3] mm/mincore: use a helper for checking the swap cache
2025-08-07 15:27 [RFC PATCH 0/3] mm/mincore: clean up swap cache helper and PTL Kairui Song
2025-08-07 15:27 ` [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore Kairui Song
@ 2025-08-07 15:27 ` Kairui Song
2025-08-07 15:27 ` [RFC PATCH 3/3] mm/mincore: avoid touching the PTL Kairui Song
2 siblings, 0 replies; 11+ messages in thread
From: Kairui Song @ 2025-08-07 15:27 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
Introduce a mincore_swap helper for checking swap entries, seperate it
from page cache checking.
The new helper will only be called on swap address space, so it always
grab the swap device before checking the entry, caller won't need to
lock anything.
The sanity WARN_ON check will also cover all use case now, previously it
only worked for PTE checking.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/mincore.c | 58 ++++++++++++++++++++++++++++------------------------
1 file changed, 31 insertions(+), 27 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c
index f0d3c9419e58..1ac53acac239 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -47,6 +47,31 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
return 0;
}
+static unsigned char mincore_swap(swp_entry_t entry)
+{
+ struct swap_info_struct *si;
+ struct folio *folio = NULL;
+ unsigned char present = 0;
+
+ if (!IS_ENABLED(CONFIG_SWAP)) {
+ WARN_ON(1);
+ return 1;
+ }
+
+ si = get_swap_device(entry);
+ if (si) {
+ folio = filemap_get_folio(swap_address_space(entry),
+ swap_cache_index(entry));
+ put_swap_device(si);
+ if (folio) {
+ present = folio_test_uptodate(folio);
+ folio_put(folio);
+ }
+ }
+
+ return present;
+}
+
/*
* Later we can get more picky about what "in core" means precisely.
* For now, simply check to see if the page is in the page cache,
@@ -64,33 +89,18 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
* any other file mapping (ie. marked !present and faulted in with
* tmpfs's .fault). So swapped out tmpfs mappings are tested here.
*/
- if (IS_ENABLED(CONFIG_SWAP) && shmem_mapping(mapping)) {
- folio = filemap_get_entry(mapping, index);
- /*
- * shmem/tmpfs may return swap: account for swapcache
- * page too.
- */
+ folio = filemap_get_entry(mapping, index);
+ if (folio) {
if (xa_is_value(folio)) {
- struct swap_info_struct *si;
swp_entry_t swp = radix_to_swp_entry(folio);
/* There might be swapin error entries in shmem mapping. */
if (non_swap_entry(swp))
return 0;
- /* Prevent swap device to being swapoff under us */
- si = get_swap_device(swp);
- if (si) {
- folio = filemap_get_folio(swap_address_space(swp),
- swap_cache_index(swp));
- put_swap_device(si);
- } else {
+ if (shmem_mapping(mapping))
+ return mincore_swap(swp);
+ else
return 0;
- }
}
- } else {
- folio = filemap_get_folio(mapping, index);
- }
-
- if (folio) {
present = folio_test_uptodate(folio);
folio_put(folio);
}
@@ -177,13 +187,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
*/
*vec = 1;
} else {
-#ifdef CONFIG_SWAP
- *vec = mincore_page(swap_address_space(entry),
- swap_cache_index(entry));
-#else
- WARN_ON(1);
- *vec = 1;
-#endif
+ *vec = mincore_swap(entry);
}
}
vec += step;
--
2.50.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 15:27 [RFC PATCH 0/3] mm/mincore: clean up swap cache helper and PTL Kairui Song
2025-08-07 15:27 ` [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore Kairui Song
2025-08-07 15:27 ` [RFC PATCH 2/3] mm/mincore: use a helper for checking the swap cache Kairui Song
@ 2025-08-07 15:27 ` Kairui Song
2025-08-07 16:02 ` Jann Horn
2025-08-11 8:41 ` David Hildenbrand
2 siblings, 2 replies; 11+ messages in thread
From: Kairui Song @ 2025-08-07 15:27 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel, Kairui Song
From: Kairui Song <kasong@tencent.com>
mincore only interested in the existence of a page, which is a
changing state by nature, locking and making it stable is not needed.
And now neither mincore_page or mincore_swap requires PTL, this PTL
locking can be dropped.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/mincore.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c
index 1ac53acac239..cc4460aba1f9 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -153,13 +153,13 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
goto out;
}
- ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ ptep = pte_offset_map(pmd, addr);
if (!ptep) {
walk->action = ACTION_AGAIN;
return 0;
}
for (; addr != end; ptep += step, addr += step * PAGE_SIZE) {
- pte_t pte = ptep_get(ptep);
+ pte_t pte = ptep_get_lockless(ptep);
step = 1;
/* We need to do cache lookup too for pte markers */
@@ -192,7 +192,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
}
vec += step;
}
- pte_unmap_unlock(ptep - 1, ptl);
+ pte_unmap(ptep - 1);
out:
walk->private += nr;
cond_resched();
--
2.50.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 15:27 ` [RFC PATCH 3/3] mm/mincore: avoid touching the PTL Kairui Song
@ 2025-08-07 16:02 ` Jann Horn
2025-08-07 17:27 ` Kairui Song
2025-08-11 8:41 ` David Hildenbrand
1 sibling, 1 reply; 11+ messages in thread
From: Jann Horn @ 2025-08-07 16:02 UTC (permalink / raw)
To: Kairui Song
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel
On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> mincore only interested in the existence of a page, which is a
> changing state by nature, locking and making it stable is not needed.
> And now neither mincore_page or mincore_swap requires PTL, this PTL
> locking can be dropped.
This means you can race such that you end up looking at an unrelated
page of another process, right? And your patch intentionally allows
that to happen in order to make mincore() faster?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 16:02 ` Jann Horn
@ 2025-08-07 17:27 ` Kairui Song
2025-08-07 17:45 ` Jann Horn
0 siblings, 1 reply; 11+ messages in thread
From: Kairui Song @ 2025-08-07 17:27 UTC (permalink / raw)
To: Jann Horn
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel
On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
>
> On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > mincore only interested in the existence of a page, which is a
> > changing state by nature, locking and making it stable is not needed.
> > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > locking can be dropped.
>
> This means you can race such that you end up looking at an unrelated
> page of another process, right?
I was thinking If the PTE is gone, it will make mincore go check the
page cache, but even if we hold the PTL here, the next mincore call
(if called soon enough) could check the page cache using the same
address. And it never checks any actual page if the PTE is not none.
Perhaps you mean that it's now doing the page / swap cache lookup
without holding PTL so if the PTE changed, then the lookup could be
using an invalidated index, and may find an unrelated page.
A changing PTE also means the mincore return value is changing, and if
called earlier or later by a little bit, the result of that address
could be opposite, and mincore only checks if the page existed,
it's hard to say the returned value is a false positive / negative?
Or could this introduce a new security issue?
> And your patch intentionally allows that to happen in order to make mincore() faster?
When doing a clean up (patch 1) I noticed and didn't understand why we
need a PTL here. It will no longer block others and go faster as we
remove one lock, I can drop this one if we are not comfortable with
it.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 17:27 ` Kairui Song
@ 2025-08-07 17:45 ` Jann Horn
2025-08-07 18:09 ` Kairui Song
0 siblings, 1 reply; 11+ messages in thread
From: Jann Horn @ 2025-08-07 17:45 UTC (permalink / raw)
To: Kairui Song
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel
On Thu, Aug 7, 2025 at 7:28 PM Kairui Song <ryncsn@gmail.com> wrote:
> On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > > mincore only interested in the existence of a page, which is a
> > > changing state by nature, locking and making it stable is not needed.
> > > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > > locking can be dropped.
> >
> > This means you can race such that you end up looking at an unrelated
> > page of another process, right?
>
> I was thinking If the PTE is gone, it will make mincore go check the
> page cache, but even if we hold the PTL here, the next mincore call
> (if called soon enough) could check the page cache using the same
> address. And it never checks any actual page if the PTE is not none.
>
> Perhaps you mean that it's now doing the page / swap cache lookup
> without holding PTL so if the PTE changed, then the lookup could be
> using an invalidated index, and may find an unrelated page.
Yes, that's what I meant.
> A changing PTE also means the mincore return value is changing, and if
> called earlier or later by a little bit, the result of that address
> could be opposite, and mincore only checks if the page existed,
> it's hard to say the returned value is a false positive / negative?
>
> Or could this introduce a new security issue?
I don't have specific security concerns here; but this is a change
that trades accuracy and simplicity for performance.
> > And your patch intentionally allows that to happen in order to make mincore() faster?
>
> When doing a clean up (patch 1) I noticed and didn't understand why we
> need a PTL here. It will no longer block others and go faster as we
> remove one lock, I can drop this one if we are not comfortable with
> it.
If you had a specific performance concern here, I think we could
consider changing this, but in my view it would sort of be breaking
the locking rules (by using a swap index that is not guaranteed to be
kept alive) and would need an explanatory comment explaining the
tradeoff.
Since you only wrote the patch because you thought the lock was
unnecessary, I'd prefer it if you drop this patch.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore
2025-08-07 15:27 ` [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore Kairui Song
@ 2025-08-07 18:06 ` Nhat Pham
2025-08-07 18:23 ` Kairui Song
0 siblings, 1 reply; 11+ messages in thread
From: Nhat Pham @ 2025-08-07 18:06 UTC (permalink / raw)
To: Kairui Song
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Jann Horn, Pedro Falcato, Matthew Wilcox,
Hugh Dickins, David Hildenbrand, Chris Li, Barry Song, Baoquan He,
Kemeng Shi, linux-kernel
On Thu, Aug 7, 2025 at 8:27 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> The filemap_get_incore_folio (previously find_get_incore_page) helper
> was introduced by commit 61ef18655704 ("mm: factor find_get_incore_page
> out of mincore_page") to be used by later commit f5df8635c5a3 ("mm: use
> find_get_incore_page in memcontrol"), so memory cgroup charge move code
> can be simplified.
>
> But commit 6b611388b626 ("memcg-v1: remove charge move code") removed
> that user completely, it's only used by mincore now.
>
> So this commit basically reverts commit 61ef18655704 ("mm: factor
> find_get_incore_page out of mincore_page"). Move it back to mincore side
> to simplify the code.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
Seems reasonable to me for the most part - just a couple of questions below.
> ---
> mm/mincore.c | 29 +++++++++++++++++++++++++++--
> mm/swap.h | 10 ----------
> mm/swap_state.c | 38 --------------------------------------
> 3 files changed, 27 insertions(+), 50 deletions(-)
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 10dabefc3acc..f0d3c9419e58 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -64,8 +64,33 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
> * any other file mapping (ie. marked !present and faulted in with
> * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
> */
> - folio = filemap_get_incore_folio(mapping, index);
> - if (!IS_ERR(folio)) {
> + if (IS_ENABLED(CONFIG_SWAP) && shmem_mapping(mapping)) {
Do we need CONFIG_SWAP check here? I suppose if !CONFIG_SWAP we'll
never end up with an ordinary swap entry stored here right?
Saves a couple of cycles, I suppose. No strong opinions.
> + folio = filemap_get_entry(mapping, index);
> + /*
> + * shmem/tmpfs may return swap: account for swapcache
> + * page too.
> + */
> + if (xa_is_value(folio)) {
> + struct swap_info_struct *si;
> + swp_entry_t swp = radix_to_swp_entry(folio);
> + /* There might be swapin error entries in shmem mapping. */
> + if (non_swap_entry(swp))
> + return 0;
> + /* Prevent swap device to being swapoff under us */
> + si = get_swap_device(swp);
> + if (si) {
> + folio = filemap_get_folio(swap_address_space(swp),
> + swap_cache_index(swp));
> + put_swap_device(si);
> + } else {
> + return 0;
> + }
> + }
> + } else {
> + folio = filemap_get_folio(mapping, index);
> + }
> +
> + if (folio) {
Should this check be "if (!IS_ERR(folio))"? Seems like that's how we
inspect the output of filemap_get_folio() in other locations (for e.g,
in filemap_fault()).
> present = folio_test_uptodate(folio);
> folio_put(folio);
> }
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 17:45 ` Jann Horn
@ 2025-08-07 18:09 ` Kairui Song
0 siblings, 0 replies; 11+ messages in thread
From: Kairui Song @ 2025-08-07 18:09 UTC (permalink / raw)
To: Jann Horn
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Pedro Falcato, Matthew Wilcox, Hugh Dickins,
David Hildenbrand, Chris Li, Barry Song, Baoquan He, Nhat Pham,
Kemeng Shi, linux-kernel
On Fri, Aug 8, 2025 at 1:45 AM Jann Horn <jannh@google.com> wrote:
> On Thu, Aug 7, 2025 at 7:28 PM Kairui Song <ryncsn@gmail.com> wrote:
> > On Fri, Aug 8, 2025 at 12:06 AM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Thu, Aug 7, 2025 at 5:27 PM Kairui Song <ryncsn@gmail.com> wrote:
> > > > mincore only interested in the existence of a page, which is a
> > > > changing state by nature, locking and making it stable is not needed.
> > > > And now neither mincore_page or mincore_swap requires PTL, this PTL
> > > > locking can be dropped.
> > >
> > > This means you can race such that you end up looking at an unrelated
> > > page of another process, right?
> >
> > I was thinking If the PTE is gone, it will make mincore go check the
> > page cache, but even if we hold the PTL here, the next mincore call
> > (if called soon enough) could check the page cache using the same
> > address. And it never checks any actual page if the PTE is not none.
> >
> > Perhaps you mean that it's now doing the page / swap cache lookup
> > without holding PTL so if the PTE changed, then the lookup could be
> > using an invalidated index, and may find an unrelated page.
>
> Yes, that's what I meant.
>
> > A changing PTE also means the mincore return value is changing, and if
> > called earlier or later by a little bit, the result of that address
> > could be opposite, and mincore only checks if the page existed,
> > it's hard to say the returned value is a false positive / negative?
> >
> > Or could this introduce a new security issue?
>
> I don't have specific security concerns here; but this is a change
> that trades accuracy and simplicity for performance.
>
> > > And your patch intentionally allows that to happen in order to make mincore() faster?
> >
> > When doing a clean up (patch 1) I noticed and didn't understand why we
> > need a PTL here. It will no longer block others and go faster as we
> > remove one lock, I can drop this one if we are not comfortable with
> > it.
>
> If you had a specific performance concern here, I think we could
> consider changing this, but in my view it would sort of be breaking
> the locking rules (by using a swap index that is not guaranteed to be
> kept alive) and would need an explanatory comment explaining the
> tradeoff.
Thanks for the explanation.
From the swap side, get_swap_device also ensures the offset is still
in the valid lookup range so the worst thing is a very rare inaccurate
value.
PTE change will mean the page is being swapped in/out or zapped, so if
the mincore is called by like a jitter earlier / later, the result
changes. So I thought it hard to define the accuracy in such a case
considering the timing.
>
> Since you only wrote the patch because you thought the lock was
> unnecessary, I'd prefer it if you drop this patch.
Understandable, I can update and keep patch 1 and 2, which improves
the performance and clean it up without causing any potential accuracy
issues.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore
2025-08-07 18:06 ` Nhat Pham
@ 2025-08-07 18:23 ` Kairui Song
0 siblings, 0 replies; 11+ messages in thread
From: Kairui Song @ 2025-08-07 18:23 UTC (permalink / raw)
To: Nhat Pham
Cc: linux-mm, Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
Vlastimil Babka, Jann Horn, Pedro Falcato, Matthew Wilcox,
Hugh Dickins, David Hildenbrand, Chris Li, Barry Song, Baoquan He,
Kemeng Shi, linux-kernel
On Fri, Aug 8, 2025 at 2:07 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 8:27 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > The filemap_get_incore_folio (previously find_get_incore_page) helper
> > was introduced by commit 61ef18655704 ("mm: factor find_get_incore_page
> > out of mincore_page") to be used by later commit f5df8635c5a3 ("mm: use
> > find_get_incore_page in memcontrol"), so memory cgroup charge move code
> > can be simplified.
> >
> > But commit 6b611388b626 ("memcg-v1: remove charge move code") removed
> > that user completely, it's only used by mincore now.
> >
> > So this commit basically reverts commit 61ef18655704 ("mm: factor
> > find_get_incore_page out of mincore_page"). Move it back to mincore side
> > to simplify the code.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
>
> Seems reasonable to me for the most part - just a couple of questions below.
>
> > ---
> > mm/mincore.c | 29 +++++++++++++++++++++++++++--
> > mm/swap.h | 10 ----------
> > mm/swap_state.c | 38 --------------------------------------
> > 3 files changed, 27 insertions(+), 50 deletions(-)
> >
> > diff --git a/mm/mincore.c b/mm/mincore.c
> > index 10dabefc3acc..f0d3c9419e58 100644
> > --- a/mm/mincore.c
> > +++ b/mm/mincore.c
> > @@ -64,8 +64,33 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
> > * any other file mapping (ie. marked !present and faulted in with
> > * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
> > */
> > - folio = filemap_get_incore_folio(mapping, index);
> > - if (!IS_ERR(folio)) {
> > + if (IS_ENABLED(CONFIG_SWAP) && shmem_mapping(mapping)) {
>
> Do we need CONFIG_SWAP check here? I suppose if !CONFIG_SWAP we'll
> never end up with an ordinary swap entry stored here right?
Yes, and in the next patch I'd like to introduce a WARN_ON if we see
swap entries with !CONFIG_SWAP. That means the memory is corrupted.
>
> Saves a couple of cycles, I suppose. No strong opinions.
Before 61ef18655704 it used a `#ifdef CONFIG_SWAP`, I used
IS_ENABLED(CONFIG_SWAP) here, same thing, the compiler will optimize
out the unused branch. Just with fewer lines of code and I personally
think this looks prettier.
>
> > + folio = filemap_get_entry(mapping, index);
> > + /*
> > + * shmem/tmpfs may return swap: account for swapcache
> > + * page too.
> > + */
> > + if (xa_is_value(folio)) {
> > + struct swap_info_struct *si;
> > + swp_entry_t swp = radix_to_swp_entry(folio);
> > + /* There might be swapin error entries in shmem mapping. */
> > + if (non_swap_entry(swp))
> > + return 0;
> > + /* Prevent swap device to being swapoff under us */
> > + si = get_swap_device(swp);
> > + if (si) {
> > + folio = filemap_get_folio(swap_address_space(swp),
> > + swap_cache_index(swp));
> > + put_swap_device(si);
> > + } else {
> > + return 0;
> > + }
> > + }
> > + } else {
> > + folio = filemap_get_folio(mapping, index);
> > + }
> > +
> > + if (folio) {
>
> Should this check be "if (!IS_ERR(folio))"? Seems like that's how we
> inspect the output of filemap_get_folio() in other locations (for e.g,
> in filemap_fault()).
Yeah you are right, actuall should be IS_ERR_OR_NULL here as it uses
both filemap_get_entry and filemap_get_folio.
I wanted to change to always use filemap_get_entry in the next patch
for better performance, but somehow forgot it...
Will fix it.
>
> > present = folio_test_uptodate(folio);
> > folio_put(folio);
> > }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] mm/mincore: avoid touching the PTL
2025-08-07 15:27 ` [RFC PATCH 3/3] mm/mincore: avoid touching the PTL Kairui Song
2025-08-07 16:02 ` Jann Horn
@ 2025-08-11 8:41 ` David Hildenbrand
1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2025-08-11 8:41 UTC (permalink / raw)
To: Kairui Song, linux-mm
Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Matthew Wilcox, Hugh Dickins, Chris Li,
Barry Song, Baoquan He, Nhat Pham, Kemeng Shi, linux-kernel
On 07.08.25 17:27, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
>
> mincore only interested in the existence of a page, which is a
> changing state by nature, locking and making it stable is not needed.
> And now neither mincore_page or mincore_swap requires PTL, this PTL
> locking can be dropped.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> mm/mincore.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 1ac53acac239..cc4460aba1f9 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -153,13 +153,13 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> goto out;
> }
>
> - ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> + ptep = pte_offset_map(pmd, addr);
I agree with Jann, that if we move away from the PTL, we better have a
very good reason (+performance numbers).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-08-11 8:41 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-07 15:27 [RFC PATCH 0/3] mm/mincore: clean up swap cache helper and PTL Kairui Song
2025-08-07 15:27 ` [RFC PATCH 1/3] mm/mincore, swap: consolidate swap cache checking for mincore Kairui Song
2025-08-07 18:06 ` Nhat Pham
2025-08-07 18:23 ` Kairui Song
2025-08-07 15:27 ` [RFC PATCH 2/3] mm/mincore: use a helper for checking the swap cache Kairui Song
2025-08-07 15:27 ` [RFC PATCH 3/3] mm/mincore: avoid touching the PTL Kairui Song
2025-08-07 16:02 ` Jann Horn
2025-08-07 17:27 ` Kairui Song
2025-08-07 17:45 ` Jann Horn
2025-08-07 18:09 ` Kairui Song
2025-08-11 8:41 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).