Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric

All of lore.kernel.org
 help / color / mirror / Atom feed

From: SeongJae Park <sj@kernel.org>
To: sashiko-bot@kernel.org
Cc: SeongJae Park <sj@kernel.org>,
	"Ravi Jonnalagadda" <ravis.opensrc@gmail.com>,
	damon@lists.linux.dev
Subject: Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric
Date: Mon, 27 Apr 2026 20:57:53 -0700	[thread overview]
Message-ID: <20260428035754.116202-1-sj@kernel.org> (raw)
In-Reply-To: <20260428032542.98E37C2BCAF@smtp.kernel.org>

On Tue, 28 Apr 2026 03:25:42 +0000 sashiko-bot@kernel.org wrote:

> Sashiko AI review found 4 potential issue(s):
> - [High] Missing `cond_resched()` inside the inner PFN-scanning loop causes CPU soft lockups.
> - [High] Casting 64-bit `phys_addr_t` to 32-bit `unsigned long` causes massive quota calculation errors on 32-bit PAE systems.
> - [High] Physical address wraparound of `folio_end` causes an infinite loop and system hang.
> - [High] Pre-existing `VM_BUG_ON` in speculative folio access within `damon_get_folio()`.

I find none of above as a blocker of this patch.  Please read below for more
details.

> --
> 
> commit 377bf0664129c87741fa640c563456b379c8a816
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> 
> mm/damon: add node_eligible_mem_bp goal metric
> 
> This patch enables system-wide page distribution with target-state goals
> using PA-mode DAMON schemes. It tracks the ratio of scheme-eligible memory
> on a target node to adjust migration aggressiveness. The implementation
> calculates the current memory distribution across regions that match the
> scheme's access pattern.
> 
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff71..9a4220174a61 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -2287,7 +2306,115 @@ static unsigned long damos_get_node_memcg_used_bp(
> [ ... ]
> > +			while (addr < end_addr) {
> > +				struct folio *folio;
> > +				phys_addr_t folio_start, folio_end;
> > +				phys_addr_t overlap_start, overlap_end;
> > +				phys_addr_t counted;
> > +
> > +				folio = damon_get_folio(PHYS_PFN(addr));
> 
> Since this code heavily invokes damon_get_folio() in a new PFN scanner, could
> the speculative folio access inside damon_get_folio() trigger a VM_BUG_ON? If
> damon_get_folio() tests folio_test_lru() before acquiring a reference with
> folio_try_get(), the page could be concurrently split or reallocated as a tail
> page.

I'm not sure what VM_BUG_ON() Sashiko is saying about.  Even if Sashiko is
correct, seems Sashiko argues the issue is inside damon_get_folio(), which is
not modified by this patch.  So this is not a blocker of this patch.

> 
> > +				if (!folio) {
> > +					addr = PAGE_ALIGN_DOWN(addr +
> > +							PAGE_SIZE);
> > +					if (!addr)
> > +						break;
> > +					continue;
> > +				}
> > +
> > +				/*
> > +				 * Calculate exact overlap between the region
> > +				 * [addr, end_addr) and the folio range.
> > +				 * The folio may start before addr if addr is
> > +				 * in the middle of a large folio.
> > +				 */
> > +				folio_start = PFN_PHYS(folio_pfn(folio));
> > +				folio_end = folio_start + folio_size(folio);
> > +
> > +				overlap_start = max(addr, folio_start);
> > +				overlap_end = min(end_addr, folio_end);
> > +
> > +				if (overlap_end > overlap_start) {
> > +					counted = overlap_end - overlap_start;
> > +					total_eligible += counted;
> > +					if (folio_nid(folio) == nid)
> > +						node_eligible += counted;
> > +				}
> > +
> > +				/* Advance past the entire folio */
> > +				addr = folio_end;
> 
> If a folio is located at the very end of the physical address space, could
> folio_end wrap around to 0? If addr becomes 0, could it cause an infinite
> loop since addr < end_addr would remain true?

The folio is a valid one, so should located on a valid physical address range.
And to my understanding, no real world machine allows 2**64 - 1 address.  So
this should be no blocker of this patch.

> 
> > +				folio_put(folio);
> > +			}
> > +			cond_resched();
> 
> Could scanning a large DAMON region without yielding the CPU in the inner loop
> cause soft lockups? Moving the cond_resched() inside the while loop would
> prevent it from monopolizing the processor.

As replied on previous versions, this is unlikely to happen.  We can revisit if
this really causes such issue.

> 
> > +		}
> > +	}
> > +
> > +	*total = total_eligible;
> > +	return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > +		struct damos *s, int nid)
> > +{
> > +	phys_addr_t total_eligible = 0;
> > +	phys_addr_t node_eligible;
> > +
> > +	if (c->ops.id != DAMON_OPS_PADDR)
> > +		return 0;
> > +
> > +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > +		return 0;
> > +
> > +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > +	if (!(unsigned long)total_eligible)
> > +		return 0;
> > +
> > +	return mult_frac((unsigned long)node_eligible, 10000,
> > +			(unsigned long)total_eligible);
> 
> Does casting these phys_addr_t variables to unsigned long cause truncation
> errors on 32-bit PAE systems where physical addresses are 64-bit? For
> example, if the eligible memory size exceeds 4GB, the truncation could
> corrupt the ratio metrics and break the quota autotuner. Would converting
> byte counts to page counts before casting avoid this?

As replied on previous versions, the user impact is trivial.  No blocker.


Thanks,
SJ

[...]

next prev parent reply	other threads:[~2026-04-28  3:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28  3:05 [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-28  3:25 ` sashiko-bot
2026-04-28  3:57   ` SeongJae Park [this message]
2026-04-28 15:31 ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260428035754.116202-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=ravis.opensrc@gmail.com \
    --cc=sashiko-bot@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.