Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric

DAMON development mailing list
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Ravi Jonnalagadda" <ravis.opensrc@gmail.com>
Cc: damon@lists.linux.dev
Subject: Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric
Date: Tue, 28 Apr 2026 03:25:42 +0000	[thread overview]
Message-ID: <20260428032542.98E37C2BCAF@smtp.kernel.org> (raw)
In-Reply-To: <20260428030520.701-1-ravis.opensrc@gmail.com>

Sashiko AI review found 4 potential issue(s):
- [High] Missing `cond_resched()` inside the inner PFN-scanning loop causes CPU soft lockups.
- [High] Casting 64-bit `phys_addr_t` to 32-bit `unsigned long` causes massive quota calculation errors on 32-bit PAE systems.
- [High] Physical address wraparound of `folio_end` causes an infinite loop and system hang.
- [High] Pre-existing `VM_BUG_ON` in speculative folio access within `damon_get_folio()`.
--

commit 377bf0664129c87741fa640c563456b379c8a816
Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>

mm/damon: add node_eligible_mem_bp goal metric

This patch enables system-wide page distribution with target-state goals
using PA-mode DAMON schemes. It tracks the ratio of scheme-eligible memory
on a target node to adjust migration aggressiveness. The implementation
calculates the current memory distribution across regions that match the
scheme's access pattern.

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 3dbbbfdeff71..9a4220174a61 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[ ... ]
> @@ -2287,7 +2306,115 @@ static unsigned long damos_get_node_memcg_used_bp(
[ ... ]
> +			while (addr < end_addr) {
> +				struct folio *folio;
> +				phys_addr_t folio_start, folio_end;
> +				phys_addr_t overlap_start, overlap_end;
> +				phys_addr_t counted;
> +
> +				folio = damon_get_folio(PHYS_PFN(addr));

Since this code heavily invokes damon_get_folio() in a new PFN scanner, could
the speculative folio access inside damon_get_folio() trigger a VM_BUG_ON? If
damon_get_folio() tests folio_test_lru() before acquiring a reference with
folio_try_get(), the page could be concurrently split or reallocated as a tail
page.

> +				if (!folio) {
> +					addr = PAGE_ALIGN_DOWN(addr +
> +							PAGE_SIZE);
> +					if (!addr)
> +						break;
> +					continue;
> +				}
> +
> +				/*
> +				 * Calculate exact overlap between the region
> +				 * [addr, end_addr) and the folio range.
> +				 * The folio may start before addr if addr is
> +				 * in the middle of a large folio.
> +				 */
> +				folio_start = PFN_PHYS(folio_pfn(folio));
> +				folio_end = folio_start + folio_size(folio);
> +
> +				overlap_start = max(addr, folio_start);
> +				overlap_end = min(end_addr, folio_end);
> +
> +				if (overlap_end > overlap_start) {
> +					counted = overlap_end - overlap_start;
> +					total_eligible += counted;
> +					if (folio_nid(folio) == nid)
> +						node_eligible += counted;
> +				}
> +
> +				/* Advance past the entire folio */
> +				addr = folio_end;

If a folio is located at the very end of the physical address space, could
folio_end wrap around to 0? If addr becomes 0, could it cause an infinite
loop since addr < end_addr would remain true?

> +				folio_put(folio);
> +			}
> +			cond_resched();

Could scanning a large DAMON region without yielding the CPU in the inner loop
cause soft lockups? Moving the cond_resched() inside the while loop would
prevent it from monopolizing the processor.

> +		}
> +	}
> +
> +	*total = total_eligible;
> +	return node_eligible;
> +}
> +
> +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> +		struct damos *s, int nid)
> +{
> +	phys_addr_t total_eligible = 0;
> +	phys_addr_t node_eligible;
> +
> +	if (c->ops.id != DAMON_OPS_PADDR)
> +		return 0;
> +
> +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> +		return 0;
> +
> +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> +
> +	if (!(unsigned long)total_eligible)
> +		return 0;
> +
> +	return mult_frac((unsigned long)node_eligible, 10000,
> +			(unsigned long)total_eligible);

Does casting these phys_addr_t variables to unsigned long cause truncation
errors on 32-bit PAE systems where physical addresses are 64-bit? For
example, if the eligible memory size exceeds 4GB, the truncation could
corrupt the ratio metrics and break the quota autotuner. Would converting
byte counts to page counts before casting avoid this?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428030520.701-1-ravis.opensrc@gmail.com?part=1

next prev parent reply	other threads:[~2026-04-28  3:25 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28  3:05 [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-28  3:25 ` sashiko-bot [this message]
2026-04-28  3:57   ` SeongJae Park
2026-04-28 15:31 ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260428032542.98E37C2BCAF@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=ravis.opensrc@gmail.com \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox