From: SeongJae Park <sj@kernel.org>
To: sashiko-bot@kernel.org
Cc: SeongJae Park <sj@kernel.org>,
"Ravi Jonnalagadda" <ravis.opensrc@gmail.com>,
damon@lists.linux.dev
Subject: Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric
Date: Mon, 27 Apr 2026 20:57:53 -0700 [thread overview]
Message-ID: <20260428035754.116202-1-sj@kernel.org> (raw)
In-Reply-To: <20260428032542.98E37C2BCAF@smtp.kernel.org>
On Tue, 28 Apr 2026 03:25:42 +0000 sashiko-bot@kernel.org wrote:
> Sashiko AI review found 4 potential issue(s):
> - [High] Missing `cond_resched()` inside the inner PFN-scanning loop causes CPU soft lockups.
> - [High] Casting 64-bit `phys_addr_t` to 32-bit `unsigned long` causes massive quota calculation errors on 32-bit PAE systems.
> - [High] Physical address wraparound of `folio_end` causes an infinite loop and system hang.
> - [High] Pre-existing `VM_BUG_ON` in speculative folio access within `damon_get_folio()`.
I find none of above as a blocker of this patch. Please read below for more
details.
> --
>
> commit 377bf0664129c87741fa640c563456b379c8a816
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
>
> mm/damon: add node_eligible_mem_bp goal metric
>
> This patch enables system-wide page distribution with target-state goals
> using PA-mode DAMON schemes. It tracks the ratio of scheme-eligible memory
> on a target node to adjust migration aggressiveness. The implementation
> calculates the current memory distribution across regions that match the
> scheme's access pattern.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff71..9a4220174a61 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -2287,7 +2306,115 @@ static unsigned long damos_get_node_memcg_used_bp(
> [ ... ]
> > + while (addr < end_addr) {
> > + struct folio *folio;
> > + phys_addr_t folio_start, folio_end;
> > + phys_addr_t overlap_start, overlap_end;
> > + phys_addr_t counted;
> > +
> > + folio = damon_get_folio(PHYS_PFN(addr));
>
> Since this code heavily invokes damon_get_folio() in a new PFN scanner, could
> the speculative folio access inside damon_get_folio() trigger a VM_BUG_ON? If
> damon_get_folio() tests folio_test_lru() before acquiring a reference with
> folio_try_get(), the page could be concurrently split or reallocated as a tail
> page.
I'm not sure what VM_BUG_ON() Sashiko is saying about. Even if Sashiko is
correct, seems Sashiko argues the issue is inside damon_get_folio(), which is
not modified by this patch. So this is not a blocker of this patch.
>
> > + if (!folio) {
> > + addr = PAGE_ALIGN_DOWN(addr +
> > + PAGE_SIZE);
> > + if (!addr)
> > + break;
> > + continue;
> > + }
> > +
> > + /*
> > + * Calculate exact overlap between the region
> > + * [addr, end_addr) and the folio range.
> > + * The folio may start before addr if addr is
> > + * in the middle of a large folio.
> > + */
> > + folio_start = PFN_PHYS(folio_pfn(folio));
> > + folio_end = folio_start + folio_size(folio);
> > +
> > + overlap_start = max(addr, folio_start);
> > + overlap_end = min(end_addr, folio_end);
> > +
> > + if (overlap_end > overlap_start) {
> > + counted = overlap_end - overlap_start;
> > + total_eligible += counted;
> > + if (folio_nid(folio) == nid)
> > + node_eligible += counted;
> > + }
> > +
> > + /* Advance past the entire folio */
> > + addr = folio_end;
>
> If a folio is located at the very end of the physical address space, could
> folio_end wrap around to 0? If addr becomes 0, could it cause an infinite
> loop since addr < end_addr would remain true?
The folio is a valid one, so should located on a valid physical address range.
And to my understanding, no real world machine allows 2**64 - 1 address. So
this should be no blocker of this patch.
>
> > + folio_put(folio);
> > + }
> > + cond_resched();
>
> Could scanning a large DAMON region without yielding the CPU in the inner loop
> cause soft lockups? Moving the cond_resched() inside the while loop would
> prevent it from monopolizing the processor.
As replied on previous versions, this is unlikely to happen. We can revisit if
this really causes such issue.
>
> > + }
> > + }
> > +
> > + *total = total_eligible;
> > + return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > + struct damos *s, int nid)
> > +{
> > + phys_addr_t total_eligible = 0;
> > + phys_addr_t node_eligible;
> > +
> > + if (c->ops.id != DAMON_OPS_PADDR)
> > + return 0;
> > +
> > + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > + return 0;
> > +
> > + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > + if (!(unsigned long)total_eligible)
> > + return 0;
> > +
> > + return mult_frac((unsigned long)node_eligible, 10000,
> > + (unsigned long)total_eligible);
>
> Does casting these phys_addr_t variables to unsigned long cause truncation
> errors on 32-bit PAE systems where physical addresses are 64-bit? For
> example, if the eligible memory size exceeds 4GB, the truncation could
> corrupt the ratio metrics and break the quota autotuner. Would converting
> byte counts to page counts before casting avoid this?
As replied on previous versions, the user impact is trivial. No blocker.
Thanks,
SJ
[...]
next prev parent reply other threads:[~2026-04-28 3:58 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 3:05 [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-28 3:25 ` sashiko-bot
2026-04-28 3:57 ` SeongJae Park [this message]
2026-04-28 15:31 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260428035754.116202-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=damon@lists.linux.dev \
--cc=ravis.opensrc@gmail.com \
--cc=sashiko-bot@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox