* [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
From: SeongJae Park @ 2026-05-23 19:41 UTC
Cc: SeongJae Park, # 5.15.x, Andrew Morton, Fernand Sieber,
Leonard Foerster, Shakeel Butt, damon, linux-kernel, linux-mm
damon_get_folio() speculatively calls folio_test_lru() before
folio_try_get(). Without a reference, the folio can be freed and
reallocated as a tail page in the meantime. In that case,
VM_BUG_ON_PGFLAGS() in const_folio_flags() can be triggered. Remove the
speculative call.
Also do the folio_test_lru() check right after folio_try_get() succeeds,
since a non-LRU folio is more likely than the folio reallocation race.
The race should be rare. Also, the problem can happen only if the kernel
is built with CONFIG_DEBUG_VM_PGFLAGS enabled. No real-world report of
this issue has been made so far. This fix is based only on theoretical
analysis.
That said, a bug is a bug. A similar issue was also fixed via commit
3203b3ab0fcf ("mm/filemap: don't call folio_test_locked() without a
reference in next_uptodate_folio()"). I don't expect this change to
have a meaningful impact on DAMON performance in the real world, though
I will be happy to be corrected by real-world reports.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/ops-common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 3a0ddc3ac7196..d3404615f9b75 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
 		return NULL;
 
 	folio = page_folio(page);
-	if (!folio_test_lru(folio) || !folio_try_get(folio))
+	if (!folio_try_get(folio))
 		return NULL;
-	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+	if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
 		folio_put(folio);
 		folio = NULL;
 	}
base-commit: a94d68c2dfd523cebb2755787fb01c08eef70c43
--
2.47.3
* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
From: sashiko-bot @ 2026-05-23 19:57 UTC
To: SeongJae Park; +Cc: damon
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
--
commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
Author: SeongJae Park <sj@kernel.org>
mm/damon/ops-common: call folio_test_lru() after folio_get()
This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
the speculative LRU flag check before trying to get a reference in
damon_get_folio(). It moves the LRU check after a successful try_get.
Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 3a0ddc3ac7196..d3404615f9b75 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
> return NULL;
>
> folio = page_folio(page);
> - if (!folio_test_lru(folio) || !folio_try_get(folio))
> + if (!folio_try_get(folio))
> return NULL;
> - if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> + if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
[Severity: High]
Does this sequence reintroduce the same bug the patch intends to fix?
By placing the LRU flag check before the pointer re-validation, C
short-circuit evaluation will evaluate the LRU flag first.
If the page was concurrently freed and reallocated as a tail page,
folio_try_get() might still succeed. Evaluating the LRU flag on this
unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
const_folio_flags().
Should the pointer re-validation happen before accessing any folio flags in
PFN scanners?
> folio_put(folio);
> folio = NULL;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260523194145.93122-1-sj@kernel.org?part=1
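The ordering concern raised in the review above can be illustrated with a
small userspace model. This is only a sketch: struct model_folio, the
model_*() helpers and the flag_reads_on_tail counter are hypothetical
stand-ins invented for illustration, not kernel APIs. The counter models
how often a flag read would land on a tail page, which is where
VM_BUG_ON_PGFLAGS() would fire on a CONFIG_DEBUG_VM_PGFLAGS kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; NOT the kernel's struct folio. */
struct model_folio {
	bool is_tail;	/* models "page was reallocated as a tail page" */
	bool lru;	/* models the LRU flag */
};

/* Counts flag reads that would trip VM_BUG_ON_PGFLAGS() in the kernel. */
static int flag_reads_on_tail;

/* Models folio_test_lru(): reading flags on a tail page is the bug. */
static bool model_test_lru(const struct model_folio *f)
{
	assert(f != NULL);
	if (f->is_tail)
		flag_reads_on_tail++;
	return f->lru;
}

/* Models "page_folio(page) != folio": true once the page became a tail. */
static bool model_pointer_changed(const struct model_folio *f)
{
	assert(f != NULL);
	return f->is_tail;
}

/* Order in the RFC: || evaluates the flag read first. */
static bool rfc_order_rejects(const struct model_folio *f)
{
	return !model_test_lru(f) || model_pointer_changed(f);
}

/* Order suggested by the review: pointer check short-circuits the read. */
static bool reviewed_order_rejects(const struct model_folio *f)
{
	return model_pointer_changed(f) || !model_test_lru(f);
}
```

Both orders reject the stale folio, but only the second one avoids touching
the flags of the tail page, because the left operand of || is evaluated
first and a true result skips the right operand.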
* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
From: SeongJae Park @ 2026-05-23 20:26 UTC
To: sashiko-bot; +Cc: SeongJae Park, damon
On Sat, 23 May 2026 19:57:06 +0000 sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
> --
>
> commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
> Author: SeongJae Park <sj@kernel.org>
>
> mm/damon/ops-common: call folio_test_lru() after folio_get()
>
> This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
> the speculative LRU flag check before trying to get a reference in
> damon_get_folio(). It moves the LRU check after a successful try_get.
>
> Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
>
> > diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> > index 3a0ddc3ac7196..d3404615f9b75 100644
> > --- a/mm/damon/ops-common.c
> > +++ b/mm/damon/ops-common.c
> > @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
> > return NULL;
> >
> > folio = page_folio(page);
> > - if (!folio_test_lru(folio) || !folio_try_get(folio))
> > + if (!folio_try_get(folio))
> > return NULL;
> > - if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> > + if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
>
> [Severity: High]
> Does this sequence reintroduce the same bug the patch intends to fix?
>
> By placing the LRU flag check before the pointer re-validation, C
> short-circuit evaluation will evaluate the LRU flag first.
>
> If the page was concurrently freed and reallocated as a tail page,
> folio_try_get() might still succeed.
> Evaluating the LRU flag on this
> unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
> const_folio_flags().
>
> Should the pointer re-validation happen before accessing any folio flags in
> PFN scanners?
I initially assumed folio_try_get() would always return a valid folio. But I
found no code in folio_try_get() that rules out Sashiko's scenario.
So, I think Sashiko is correct. I will drop the above line change in the next
revision.
Thanks,
SJ
[...]
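The check order agreed on in the reply above (take the reference, then
re-validate the pointer, and only then read the flag) can be sketched in
userspace as follows. This is an assumption-laden model, not the next
revision of the patch: struct model_folio, struct model_page and the
model_*() helpers are hypothetical names, and the snapshot argument stands
in for the earlier page_folio() result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct folio / struct page. */
struct model_folio {
	int refcount;	/* 0 models a freed folio */
	bool lru;
};

struct model_page {
	struct model_folio *folio;	/* what page_folio() would return now */
};

/* Models folio_try_get(): fails only when the refcount is already zero. */
static bool model_try_get(struct model_folio *f)
{
	if (f->refcount == 0)
		return false;
	f->refcount++;
	return true;
}

/* Models folio_put(). */
static void model_put(struct model_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * snapshot is the folio pointer read before the reference was taken.
 * The flag is read only after the pointer has been re-validated.
 */
static struct model_folio *model_get_folio(const struct model_page *page,
					   struct model_folio *snapshot)
{
	if (!model_try_get(snapshot))
		return NULL;
	if (page->folio != snapshot || !snapshot->lru) {
		model_put(snapshot);
		return NULL;
	}
	return snapshot;
}
```

When the page no longer maps to the snapshot, the model drops the reference
it just took and returns NULL without looking at the lru field.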
* Re: [RFC PATCH] mm/damon/ops-common: call folio_test_lru() after folio_get()
From: SeongJae Park @ 2026-05-23 20:31 UTC
To: SeongJae Park; +Cc: sashiko-bot, damon
On Sat, 23 May 2026 13:26:07 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Sat, 23 May 2026 19:57:06 +0000 sashiko-bot@kernel.org wrote:
>
> > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> > - [High] Speculative folio flag access occurs before pointer re-validation, triggering VM_BUG_ON_PGFLAGS() on tail pages.
> > --
> >
> > commit 119755fb3390d2b7d6c7f2278054cc95e95531c6
> > Author: SeongJae Park <sj@kernel.org>
> >
> > mm/damon/ops-common: call folio_test_lru() after folio_get()
> >
> > This patch attempts to fix a potential VM_BUG_ON_PGFLAGS crash by removing
> > the speculative LRU flag check before trying to get a reference in
> > damon_get_folio(). It moves the LRU check after a successful try_get.
> >
> > Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org
> >
> > > diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> > > index 3a0ddc3ac7196..d3404615f9b75 100644
> > > --- a/mm/damon/ops-common.c
> > > +++ b/mm/damon/ops-common.c
> > > @@ -32,9 +32,9 @@ struct folio *damon_get_folio(unsigned long pfn)
> > > return NULL;
> > >
> > > folio = page_folio(page);
> > > - if (!folio_test_lru(folio) || !folio_try_get(folio))
> > > + if (!folio_try_get(folio))
> > > return NULL;
> > > - if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
> > > + if (!folio_test_lru(folio) || unlikely(page_folio(page) != folio)) {
> >
> > [Severity: High]
> > Does this sequence reintroduce the same bug the patch intends to fix?
> >
> > By placing the LRU flag check before the pointer re-validation, C
> > short-circuit evaluation will evaluate the LRU flag first.
> >
> > If the page was concurrently freed and reallocated as a tail page,
> > folio_try_get() might still succeed.
> > Evaluating the LRU flag on this
> > unvalidated tail page will trigger the VM_BUG_ON_PGFLAGS() in
> > const_folio_flags().
> >
> > Should the pointer re-validation happen before accessing any folio flags in
> > PFN scanners?
>
> I initially assumed folio_try_get() would always return a valid folio. But I
> found no code in folio_try_get() that rules out Sashiko's scenario.
>
> So, I think Sashiko is correct. I will drop the above line change in the next
> revision.
But I may still stop marking the folio_test_lru() check as unlikely().
Thanks,
SJ
[...]
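One way to read the follow-up above is that the branch hint would stay on
the pointer re-validation only, with the LRU test left unannotated. A
userspace sketch under that assumption: model_unlikely() is a hypothetical
local macro built on the GCC/Clang __builtin_expect() builtin, and
model_should_drop() is an invented helper, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local equivalent of a branch hint; needs GCC or Clang. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Returns true when the reference should be dropped. Only the rare
 * pointer-changed case carries the hint; the LRU test does not.
 */
static bool model_should_drop(bool pointer_changed, bool on_lru)
{
	return model_unlikely(pointer_changed) || !on_lru;
}
```

The hint only influences code layout and branch prediction; the boolean
result is the same as for the unannotated expression.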