Re: [PATCH 1/2] mm/damon: introduce damon_rand_fast() for per-ctx PRNG

From: SeongJae Park <sj@kernel.org>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: SeongJae Park <sj@kernel.org>,
	damon@lists.linux.dev, Jiayuan Chen <jiayuan.chen@shopee.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/damon: introduce damon_rand_fast() for per-ctx PRNG
Date: Sun, 26 Apr 2026 10:33:45 -0700
Message-ID: <20260426173346.86238-1-sj@kernel.org>
In-Reply-To: <afcbb502-b91e-47d7-8fb7-2e4e237928c5@linux.dev>

On Sun, 26 Apr 2026 13:50:42 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

> 
> Hello SJ
> 
> Thanks.
> 
> On 4/25/26 11:59 PM, SeongJae Park wrote:
> > On Sat, 25 Apr 2026 11:36:02 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> >> On 4/24/26 11:11 PM, SeongJae Park wrote:
> [...]
> >> With ~2 GiB
> >> default regions on a 2 TiB host, a small pod's pages are averaged
> >> with thousands of non-pod pages in the same region, and the
> >> region never reaches nr_accesses=0 even when the pod is genuinely
> >> idle.
> > But, the adaptive regions adjustment mechanism dynamically changes the size of
> > regions, down to 4 KiB.  If the small pod's page is really cold while its
> > surrounding pages are not, DAMON should down-size the region to capture only
> > the page and show you nr_accesses=0.
> >
> 
> Looking at kdamond_split_regions() / kdamond_merge_regions():
> the per-region floor is 4 KiB, but the *total* count is
> hard-capped at max_nr_regions, and split itself is blind --
> it picks the cut position via damon_rand(1, 10) without looking
> at access patterns.  What keeps a split in place is the next
> merge cycle finding the two halves have visibly different
> nr_accesses; if a small cgroup's signal is averaged with
> surrounding host pages on both sides of the random cut, the
> halves look identical and merge folds them back.  For a 1%
> cgroup on a 2 TiB host with max_nr_regions=1000, the
> split-merge loop never converges to cgroup-aligned regions --
> random cuts almost always land in "99% host + 1% cgroup"
> mixtures.  Raising the cap gives the random splitter enough
> attempts that some cuts happen to land on physically-clustered
> cgroup pages (THP, NUMA-local allocations) and stick.  That's
> why the cap matters in practice, not the 4 KiB floor.

Theoretically, you are right.  But the size of the impact would depend on the
workload.

In my previous testing on a 1 TiB memory machine that was running a real world
workload, DAMON was able to find 4 KiB cold pages.

Is this something you have actually observed on your products?
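
For readers following the split/merge argument quoted above, the arithmetic
can be sketched in plain userspace C.  This is only an illustration, not the
kernel code: avg_region_bytes(), split_offset() and would_merge() are
hypothetical helper names, the 4 KiB page size is assumed, damon_rand(1, 10)
is assumed to yield an integer from 1 to 9, and the merge check is reduced to
the nr_accesses difference alone (the real merge logic also considers other
limits such as region size).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SZ 4096ULL	/* assumed 4 KiB base page */

/* Average region size when 'total' bytes are covered by 'nr_regions'. */
static uint64_t avg_region_bytes(uint64_t total, uint64_t nr_regions)
{
	assert(nr_regions > 0);
	return total / nr_regions;
}

/*
 * Cut position for a region of 'region_sz' bytes, given a random number
 * of tenths in [1, 9].  The cut is chosen without looking at access
 * patterns, then aligned down to a page boundary.
 */
static uint64_t split_offset(uint64_t region_sz, unsigned int tenths)
{
	assert(tenths >= 1 && tenths <= 9);
	return (region_sz * tenths / 10) & ~(SKETCH_PAGE_SZ - 1);
}

/*
 * Simplified merge check: two adjacent regions are folded back together
 * when their access counts differ by no more than 'thres'.
 */
static int would_merge(unsigned int a, unsigned int b, unsigned int thres)
{
	unsigned int diff = a > b ? a - b : b - a;

	return diff <= thres;
}
```

With these helpers, avg_region_bytes(1ULL << 41, 1000) gives about 2.2e9
bytes, which matches the "~2 GiB" regions mentioned in the quoted text, and
would_merge() returns true whenever both halves of a random cut show nearly
the same nr_accesses, which is the case Jiayuan describes for "99% host + 1%
cgroup" mixtures.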

> 
> >> The cold signal is gone before any cgroup attribution
> >> happens.  Cgroup attribution itself is done at sample granularity
> >> (folio_memcg per sampled page), not at region granularity -- the
> >> regions just need to be fine enough that there *is* a cold signal
> >> to attribute.
> > Could you please share more details about what is the cgroup attribution, and
> > how it is done?  I guess that is the way to map DAMON's monitoring regions to
> > each cgroup to determine if each cgroup is hot or cold.  I'm unsure how it is
> > really done.
> 
> 
> We sample physical pages within DAMON regions, look up
> folio_memcg() per sampled page to find the owning memcg, and
> accumulate cold bytes per memcg.  Userspace reads the per-cgroup
> result and sizes memory.reclaim per pod.  Conceptually similar
> to the page-level monitoring you pointed me at -- we'll evaluate
> whether [1] / [2] can replace this path.

Thank you for sharing this.  That all makes sense.  I also hope the existing and
planned cgroup-aware monitoring will help your use case!
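
As a rough illustration of the per-memcg accounting described in the quoted
text, the accumulation step might look like the userspace sketch below.
Everything here is an assumption for illustration: struct sampled_page, the
small integer cgroup_id standing in for the folio_memcg() result, the 'cold'
flag, and the fixed SKETCH_MAX_CGROUPS table are not from the actual
implementation, which was not posted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CGROUPS 8	/* hypothetical fixed-size table */

/* One sampled physical page (hypothetical layout). */
struct sampled_page {
	unsigned int cgroup_id;	/* stand-in for the folio_memcg() lookup */
	int cold;		/* nonzero if the page's region looked cold */
};

/*
 * Add 'bytes_per_sample' to the owning cgroup's counter for every cold
 * sample.  Each sample is treated as representing 'bytes_per_sample'
 * bytes of memory; hot samples contribute nothing.
 */
static void accumulate_cold_bytes(const struct sampled_page *samples,
				  size_t nr_samples,
				  uint64_t bytes_per_sample,
				  uint64_t cold_bytes[SKETCH_MAX_CGROUPS])
{
	for (size_t i = 0; i < nr_samples; i++) {
		assert(samples[i].cgroup_id < SKETCH_MAX_CGROUPS);
		if (samples[i].cold)
			cold_bytes[samples[i].cgroup_id] += bytes_per_sample;
	}
}
```

Userspace would then read the per-cgroup totals and size memory.reclaim
requests from them, as described above.  The sketch also shows why region
granularity matters: if no sample is ever marked cold, every counter stays at
zero regardless of how the pages are attributed.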

FYI, I removed [1] and [2] in my previous reply, so pasting those here again,
for other readers of this mail.

[1] https://lkml.kernel.org/r/20250303221726.484227-1-sj@kernel.org
[2] https://lore.kernel.org/20260423190841.821E4C2BCAF@smtp.kernel.org

[...]
> >>> So I think this patch is ok to be merged as is (after addressing my nit trivial
> >>> comments about coding styles), but we may still want to fix it in future.  So I
> >> damon_rand() is now only called by damon_split_regions_of() with
> >> the constant range (1, 10).  Maybe we can rename it to
> >> damon_rand_u32() to make the u32 constraint explicit in the API
> >> name; that closes out the truncation concern at the legacy helper
> >> without needing a separate series.
> > Good point.  I'm wondering if we have a reason to keep using damon_rand() at
> > all.  I find no such reason.  If you also find no real reason, how about simply
> > removing existing damon_rand() and renaming damon_rand_fast() to damon_rand()?
> >
> 
> Good idea.  v2 will remove the legacy helper, rename
> damon_rand_fast() to damon_rand(), and plumb ctx into
> damon_split_regions_of() for the new signature.

Sounds good, looking forward to v2!
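
For other readers, a per-context PRNG helper with a u32 range could be
sketched in userspace C as below.  This is not the code from the patch, which
is not shown in this mail: struct sketch_ctx, sketch_next_u32() and
sketch_rand() are hypothetical names, and xorshift32 with multiply-shift range
reduction is just one simple choice, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context state; the seed must be nonzero for xorshift. */
struct sketch_ctx {
	uint32_t rnd_state;
};

/* One xorshift32 step on the context's private state. */
static uint32_t sketch_next_u32(struct sketch_ctx *ctx)
{
	uint32_t x = ctx->rnd_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	ctx->rnd_state = x;
	return x;
}

/*
 * Return a value in [l, r).  Both bounds are u32, so the 32-bit limit is
 * explicit in the signature instead of being a silent truncation.
 */
static uint32_t sketch_rand(struct sketch_ctx *ctx, uint32_t l, uint32_t r)
{
	uint32_t span;

	assert(l < r);
	span = r - l;
	/* Map the 32-bit output onto [0, span) without a division. */
	return l + (uint32_t)(((uint64_t)sketch_next_u32(ctx) * span) >> 32);
}
```

A caller on the region-splitting path would hold a context and call
sketch_rand(&ctx, 1, 10), which is the reason ctx has to be plumbed into
damon_split_regions_of() once the helper takes a context argument.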


Thanks,
SJ

[...]
