DAMON development mailing list
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: sashiko-bot@kernel.org
Cc: SeongJae Park <sj@kernel.org>,
	"Ravi Jonnalagadda" <ravis.opensrc@gmail.com>,
	damon@lists.linux.dev
Subject: Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric
Date: Mon, 27 Apr 2026 20:57:53 -0700	[thread overview]
Message-ID: <20260428035754.116202-1-sj@kernel.org> (raw)
In-Reply-To: <20260428032542.98E37C2BCAF@smtp.kernel.org>

On Tue, 28 Apr 2026 03:25:42 +0000 sashiko-bot@kernel.org wrote:

> Sashiko AI review found 4 potential issue(s):
> - [High] Missing `cond_resched()` inside the inner PFN-scanning loop causes CPU soft lockups.
> - [High] Casting 64-bit `phys_addr_t` to 32-bit `unsigned long` causes massive quota calculation errors on 32-bit PAE systems.
> - [High] Physical address wraparound of `folio_end` causes an infinite loop and system hang.
> - [High] Pre-existing `VM_BUG_ON` in speculative folio access within `damon_get_folio()`.

I find none of above as a blocker of this patch.  Please read below for more
details.

> --
> 
> commit 377bf0664129c87741fa640c563456b379c8a816
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> 
> mm/damon: add node_eligible_mem_bp goal metric
> 
> This patch enables system-wide page distribution with target-state goals
> using PA-mode DAMON schemes. It tracks the ratio of scheme-eligible memory
> on a target node to adjust migration aggressiveness. The implementation
> calculates the current memory distribution across regions that match the
> scheme's access pattern.
> 
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff71..9a4220174a61 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -2287,7 +2306,115 @@ static unsigned long damos_get_node_memcg_used_bp(
> [ ... ]
> > +			while (addr < end_addr) {
> > +				struct folio *folio;
> > +				phys_addr_t folio_start, folio_end;
> > +				phys_addr_t overlap_start, overlap_end;
> > +				phys_addr_t counted;
> > +
> > +				folio = damon_get_folio(PHYS_PFN(addr));
> 
> Since this code heavily invokes damon_get_folio() in a new PFN scanner, could
> the speculative folio access inside damon_get_folio() trigger a VM_BUG_ON? If
> damon_get_folio() tests folio_test_lru() before acquiring a reference with
> folio_try_get(), the page could be concurrently split or reallocated as a tail
> page.

I'm not sure what VM_BUG_ON() Sashiko is saying about.  Even if Sashiko is
correct, seems Sashiko argues the issue is inside damon_get_folio(), which is
not modified by this patch.  So this is not a blocker of this patch.

> 
> > +				if (!folio) {
> > +					addr = PAGE_ALIGN_DOWN(addr +
> > +							PAGE_SIZE);
> > +					if (!addr)
> > +						break;
> > +					continue;
> > +				}
> > +
> > +				/*
> > +				 * Calculate exact overlap between the region
> > +				 * [addr, end_addr) and the folio range.
> > +				 * The folio may start before addr if addr is
> > +				 * in the middle of a large folio.
> > +				 */
> > +				folio_start = PFN_PHYS(folio_pfn(folio));
> > +				folio_end = folio_start + folio_size(folio);
> > +
> > +				overlap_start = max(addr, folio_start);
> > +				overlap_end = min(end_addr, folio_end);
> > +
> > +				if (overlap_end > overlap_start) {
> > +					counted = overlap_end - overlap_start;
> > +					total_eligible += counted;
> > +					if (folio_nid(folio) == nid)
> > +						node_eligible += counted;
> > +				}
> > +
> > +				/* Advance past the entire folio */
> > +				addr = folio_end;
> 
> If a folio is located at the very end of the physical address space, could
> folio_end wrap around to 0? If addr becomes 0, could it cause an infinite
> loop since addr < end_addr would remain true?

The folio is a valid one, so should located on a valid physical address range.
And to my understanding, no real world machine allows 2**64 - 1 address.  So
this should be no blocker of this patch.

> 
> > +				folio_put(folio);
> > +			}
> > +			cond_resched();
> 
> Could scanning a large DAMON region without yielding the CPU in the inner loop
> cause soft lockups? Moving the cond_resched() inside the while loop would
> prevent it from monopolizing the processor.

As replied on previous versions, this is unlikely to happen.  We can revisit if
this really causes such issue.

> 
> > +		}
> > +	}
> > +
> > +	*total = total_eligible;
> > +	return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > +		struct damos *s, int nid)
> > +{
> > +	phys_addr_t total_eligible = 0;
> > +	phys_addr_t node_eligible;
> > +
> > +	if (c->ops.id != DAMON_OPS_PADDR)
> > +		return 0;
> > +
> > +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > +		return 0;
> > +
> > +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > +	if (!(unsigned long)total_eligible)
> > +		return 0;
> > +
> > +	return mult_frac((unsigned long)node_eligible, 10000,
> > +			(unsigned long)total_eligible);
> 
> Does casting these phys_addr_t variables to unsigned long cause truncation
> errors on 32-bit PAE systems where physical addresses are 64-bit? For
> example, if the eligible memory size exceeds 4GB, the truncation could
> corrupt the ratio metrics and break the quota autotuner. Would converting
> byte counts to page counts before casting avoid this?

As replied on previous versions, the user impact is trivial.  No blocker.


Thanks,
SJ

[...]

  reply	other threads:[~2026-04-28  3:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28  3:05 [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-28  3:25 ` sashiko-bot
2026-04-28  3:57   ` SeongJae Park [this message]
2026-04-28 15:31 ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260428035754.116202-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=ravis.opensrc@gmail.com \
    --cc=sashiko-bot@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox