* [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
@ 2026-05-23 19:41 SeongJae Park
  2026-05-23 19:57 ` sashiko-bot
  0 siblings, 1 reply; 4+ messages in thread
From: SeongJae Park @ 2026-05-23 19:41 UTC
  Cc: SeongJae Park, # 5.15.x, Andrew Morton, Fernand Sieber,
	Leonard Foerster, Shakeel Butt, damon, linux-kernel, linux-mm

damon_get_folio() speculatively calls folio_test_lru() before
folio_try_get().  The folio can be freed and reallocated as a tail
page in the meantime.  In that case, VM_BUG_ON_PGFLAGS() in
const_folio_flags() can be triggered.  Remove the speculative call.

Also do the folio_test_lru() check right after folio_try_get() succeeds,
since a non-LRU folio is more likely than the folio realloc race.

The race should be rare.  Also, the problem can happen only if the kernel
is built with CONFIG_DEBUG_VM_PGFLAGS.  No real-world report of this issue
has been made so far.  This fix is based only on theoretical analysis.
That said, a bug is a bug.  A similar issue was also fixed via commit
3203b3ab0fcf ("mm/filemap: don't call folio_test_locked() without a
reference in next_uptodate_folio()").  I don't expect this change to
have a meaningful impact on DAMON performance in the real world, though
I will be happy to be corrected by real-world reports.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org

Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/ops-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 3a0ddc3ac7196..d3404615f9b75 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
 		return NULL;
 
 	folio = page_folio(page);
-	if (!folio_test_lru(folio) || !folio_try_get(folio))
+	if (!folio_try_get(folio))
 		return NULL;
-	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+	if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
 		folio_put(folio);
 		folio = NULL;
 	}

base-commit: a94d68c2dfd523cebb2755787fb01c08eef70c43
-- 
2.47.3

* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
  2026-05-23 19:41 [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get() SeongJae Park
@ 2026-05-23 19:57 ` sashiko-bot
  2026-05-23 20:26   ` SeongJae Park
  0 siblings, 1 reply; 4+ messages in thread
From: sashiko-bot @ 2026-05-23 19:57 UTC
  To: SeongJae Park; +Cc: damon

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
--

commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
Author: SeongJae Park <sj@kernel.org>

mm/damon/ops-common: call folio_test_lru() after folio_get()

This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
the speculative LRU flag check before trying to get a reference in
damon_get_folio(). It moves the LRU check after a successful try_get.

Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org

> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 3a0ddc3ac7196..d3404615f9b75 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
>  		return NULL;
>  
>  	folio = page_folio(page);
> -	if (!folio_test_lru(folio) || !folio_try_get(folio))
> +	if (!folio_try_get(folio))
>  		return NULL;
> -	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> +	if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {

[Severity: High]
Does this sequence reintroduce the same bug the patch intends to fix?

By placing the LRU flag check before the pointer re-validation, C
short-circuit evaluation will evaluate the LRU flag first.

If the page was concurrently freed and reallocated as a tail page,
folio_try_get() might still succeed. Evaluating the LRU flag on this
unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
const_folio_flags().

Should the pointer re-validation happen before accessing any folio flags in
PFN scanners?
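
The ordering concern above can be illustrated with a minimal user-space
sketch.  Everything here is a hypothetical model, not kernel code:
struct model_page, model_page_folio(), model_test_lru() and the two
reject_*() helpers are names invented for illustration, and a counter
stands in for the VM_BUG_ON_PGFLAGS() assertion.  It only shows how C
short-circuit evaluation decides whether the flag read happens on a
page that has become a tail page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct model_page {
	struct model_page *head;	/* head page; points to itself for a head */
	bool lru;			/* stand-in for the LRU flag */
};

/* Counts flag reads that would trip the debug assertion in the kernel. */
static int flag_reads_on_tail;

/* Models page_folio(): resolve a page to its current head. */
static struct model_page *model_page_folio(struct model_page *page)
{
	assert(page != NULL);
	return page->head;
}

/* Models folio_test_lru(): reading flags through a tail page is the bug. */
static bool model_test_lru(struct model_page *folio)
{
	if (folio->head != folio)
		flag_reads_on_tail++;
	return folio->lru;
}

/* Ordering in the posted patch: the flag test is evaluated first. */
static bool reject_flag_first(struct model_page *page,
			      struct model_page *folio)
{
	return !model_test_lru(folio) || model_page_folio(page) != folio;
}

/* Ordering before the patch: revalidate the pointer, then test the flag. */
static bool reject_revalidate_first(struct model_page *page,
				    struct model_page *folio)
{
	return model_page_folio(page) != folio || !model_test_lru(folio);
}
```

In this model, if a page is reused as a tail of another compound page
after the folio pointer was taken, reject_revalidate_first() returns
true without calling model_test_lru(), while reject_flag_first() reads
the flag through the stale pointer first.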

>  		folio_put(folio);
>  		folio = NULL;
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260523194145.93122-1-sj@kernel.org?part=1

* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
  2026-05-23 19:57 ` sashiko-bot
@ 2026-05-23 20:26   ` SeongJae Park
  2026-05-23 20:31     ` SeongJae Park
  0 siblings, 1 reply; 4+ messages in thread
From: SeongJae Park @ 2026-05-23 20:26 UTC
  To: sashiko-bot; +Cc: SeongJae Park, damon

On Sat, 23 May 2026 19:57:06 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
> --
> 
> commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
> Author: SeongJae Park <sj@kernel.org>
> 
> mm/damon/ops-common: call folio_test_lru() after folio_get()
> 
> This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
> the speculative LRU flag check before trying to get a reference in
> damon_get_folio(). It moves the LRU check after a successful try_get.
> 
> Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
> 
> > diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> > index 3a0ddc3ac7196..d3404615f9b75 100644
> > --- a/mm/damon/ops-common.c
> > +++ b/mm/damon/ops-common.c
> > @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
> >  		return NULL;
> >  
> >  	folio = page_folio(page);
> > -	if (!folio_test_lru(folio) || !folio_try_get(folio))
> > +	if (!folio_try_get(folio))
> >  		return NULL;
> > -	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> > +	if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
> 
> [Severity: High]
> Does this sequence reintroduce the same bug the patch intends to fix?
> 
> By placing the LRU flag check before the pointer re-validation, C
> short-circuit evaluation will evaluate the LRU flag first.
> 
> If the page was concurrently freed and reallocated as a tail page,
> folio_try_get() might still succeed.
> Evaluating the LRU flag on this
> unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
> const_folio_flags().
> 
> Should the pointer re-validation happen before accessing any folio flags in
> PFN scanners?

I initially assumed folio_try_get() would always return a valid folio.  But I
find no code in folio_try_get() that rules out Sashiko's theory.

So, I think Sashiko is correct.  I will drop the above line change in the next
revision.


Thanks,
SJ

[...]

* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
  2026-05-23 20:26   ` SeongJae Park
@ 2026-05-23 20:31     ` SeongJae Park
  0 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2026-05-23 20:31 UTC
  To: SeongJae Park; +Cc: sashiko-bot, damon

On Sat, 23 May 2026 13:26:07 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Sat, 23 May 2026 19:57:06 +0000 sashiko-bot@kernel.org wrote:
> 
> > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> > - [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
> > --
> > 
> > commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
> > Author: SeongJae Park <sj@kernel.org>
> > 
> > mm/damon/ops-common: call folio_test_lru() after folio_get()
> > 
> > This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
> > the speculative LRU flag check before trying to get a reference in
> > damon_get_folio(). It moves the LRU check after a successful try_get.
> > 
> > Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
> > 
> > > diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> > > index 3a0ddc3ac7196..d3404615f9b75 100644
> > > --- a/mm/damon/ops-common.c
> > > +++ b/mm/damon/ops-common.c
> > > @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
> > >  		return NULL;
> > >  
> > >  	folio = page_folio(page);
> > > -	if (!folio_test_lru(folio) || !folio_try_get(folio))
> > > +	if (!folio_try_get(folio))
> > >  		return NULL;
> > > -	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> > > +	if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
> > 
> > [Severity: High]
> > Does this sequence reintroduce the same bug the patch intends to fix?
> > 
> > By placing the LRU flag check before the pointer re-validation, C
> > short-circuit evaluation will evaluate the LRU flag first.
> > 
> > If the page was concurrently freed and reallocated as a tail page,
> > folio_try_get() might still succeed.
> > Evaluating the LRU flag on this
> > unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
> > const_folio_flags().
> > 
> > Should the pointer re-validation happen before accessing any folio flags in
> > PFN scanners?
> 
> I initially assumed folio_try_get() would always return a valid folio.  But I
> find no code in folio_try_get() that rules out Sashiko's theory.
> 
> So, I think Sashiko is correct.  I will drop the above line change in the next
> revision.

However, I may still stop marking the folio_test_lru() check as unlikely().
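
For context, unlikely() is only a branch-prediction hint, so moving the
LRU test outside it can change code layout but not the result.  A
minimal user-space sketch, assuming GCC or Clang for
__builtin_expect(); the model_unlikely() macro and the reject_*() and
check_*() helpers are hypothetical names for illustration, not kernel
code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hint macro in the style of the kernel's unlikely(); GCC/Clang only. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

/* Both conditions hinted as unlikely, as in the code before the patch. */
static bool reject_hint_both(bool changed, bool on_lru)
{
	return model_unlikely(changed || !on_lru);
}

/* Only the pointer-changed race is hinted; revalidation still comes first. */
static bool reject_hint_first_only(bool changed, bool on_lru)
{
	return model_unlikely(changed) || !on_lru;
}

/* The hint placement must not change the outcome for any input. */
static void check_hints_do_not_change_result(void)
{
	for (int changed = 0; changed <= 1; changed++)
		for (int on_lru = 0; on_lru <= 1; on_lru++)
			assert(reject_hint_both(changed, on_lru) ==
			       reject_hint_first_only(changed, on_lru));
}
```

Calling check_hints_do_not_change_result() walks all four input
combinations and asserts that both variants agree.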


Thanks,
SJ

[...]
