Re: [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk

From: SeongJae Park <sj@kernel.org>
To: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
	damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com
Subject: Re: [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
Date: Mon, 18 May 2026 18:14:28 -0700
Message-ID: <20260519011429.100021-1-sj@kernel.org>
In-Reply-To: <CALa+Y17nudor22aJvakfos3UegPgEG1M8N7cJPAxWX0Ca=MvfA@mail.gmail.com>

On Sun, 17 May 2026 22:38:51 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On Sun, May 17, 2026 at 4:38 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Sat, 16 May 2026 14:03:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> >
> > > damon_pa_migrate() walks every PFN in a region linearly, calling
> > > damon_get_folio() for each one.  On sparse physical address spaces
> > > (e.g., CXL-attached memory), a single DAMON region can span hundreds
> > > of gigabytes where most memory is free and sitting in the buddy
> > > allocator.  Most page lookups are fruitless and dominate kdamond
> > > tick time.
> >
> > On sparse address spaces, the problem would be large DAMON regions of offlined
> > memory.  Large DAMON regions that are nearly all free memory are another
> > problem, one that doesn't require sparse address spaces.  If I'm not wrong,
> > the above paragraph could be clarified, in my opinion.
> >
> > >
> > > Check at pageblock boundaries (2MB on x86_64) whether the block is
> > > entirely free.  If the first page of a pageblock is a buddy page at
> > > pageblock_order or higher, the entire block is free and can be
> > > skipped.
> > > Similarly skip pageblocks where pfn_to_online_page() returns
> > > NULL.
> > >
> > > This reduces the iteration from O(region_sz / PAGE_SIZE) to
> > > O(region_sz / pageblock_sz) + O(populated_pages).
> > >
> > > buddy_order_unsafe() is used without zone->lock.  A transient false
> > > positive (block becomes non-free between the PageBuddy and order
> > > checks) costs at most one tick of missed candidates on that block;
> > > the next tick re-scans.  No correctness consequence as DAMON walks
> > > are best-effort.
> >
> > I was initially thinking this is a good and reasonable optimization approach.
> > But on second thought I have the questions below.
> >
> > For the large offlined memory space problem, couldn't we simply tune DAMON's
> > monitoring region boundaries to ignore the holes?
> >
> > For the large free memory area, is it reasonable to assume such situations?
> > In production, users will try to utilize as much of the system's memory as
> > possible.  Then, would there really be such problematically large free memory
> > areas?
> >
> > Could you please enlighten me?
> >
> 
> Hi SJ,
> 
> You're right on the first point.  For static offlined memory
> holes (memory hotplug gaps, partial socket population, etc.) the
> right answer is configuring the monitoring region boundaries to
> exclude them upfront, not making the walk skip them at runtime.
> The changelog is clearer if I narrow the patch to the free-but-
> online case.

Thank you for clarifying, Ravi.
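
To make the boundary-tuning alternative concrete, below is a minimal
userspace sketch (Python, not kernel code) of deriving monitoring regions
that exclude static offline holes.  The function name and the way the hole
list is obtained are hypothetical, for illustration only; the resulting
ranges would then be given to DAMON as its monitoring target regions.

```python
def monitoring_regions(span_start, span_end, offline_holes):
    """Split [span_start, span_end) into sub-ranges that avoid offline holes.

    offline_holes is a list of (start, end) physical address pairs.  How the
    holes are discovered is out of scope here (hypothetical input).
    """
    regions = []
    cursor = span_start
    for hole_start, hole_end in sorted(offline_holes):
        # Clamp the hole to the span of interest.
        hole_start = max(hole_start, span_start)
        hole_end = min(hole_end, span_end)
        if hole_start >= hole_end:
            continue  # hole lies entirely outside the span
        if hole_start > cursor:
            regions.append((cursor, hole_start))
        # max() keeps the cursor monotonic when holes overlap.
        cursor = max(cursor, hole_end)
    if cursor < span_end:
        regions.append((cursor, span_end))
    return regions
```

For example, a 512 GiB span with offline holes at [64 GiB, 128 GiB) and
[256 GiB, 384 GiB) would yield three regions covering only the online
parts.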

> 
> On the free-online case: I agree large free memory areas are
> not the steady state on a fully-utilized system.  The cases I
> had in mind are more limited:
> 
>   - A workload using a small part of a much larger range, with
>     the rest left as headroom (e.g. 64 GB used of a 512 GB
>     range).

Why would the user have such a large amount of headroom?

> 
>   - Shared tiers where workloads are allocated and freed on their own
>     timelines.  Any single piece of free memory doesn't last
>     long, but on a busy system there's typically a meaningful
>     free fraction in the range at any point -- especially on a
>     slower tier, where workloads prefer faster memory first
>     when it's available.

I agree there could be a reasonable amount of free memory.  But I still find it
difficult to know whether that would be big enough to cause the issue in DAMOS.

> 
> The patch as written is a narrow optimization for those cases:
> the pageblock-aligned check is one extra read per
> pageblock_nr_pages PFNs (about 1 per 512 on x86_64), so it's
> effectively a no-op when the region is fully populated.
> 
> If you don't see those workloads as warranting the change, I'm
> happy to drop the patch.  If the framing is the issue more than
> the change itself, I can respin a v2 with:
> 
>   - the changelog narrowed to the free-but-online case (no
>     offlined-memory framing);
>   - any suggestions from you on sashiko's review comments.

I think your arguments make sense in general.  But I'm still not quite sure
what the realistic size of the problem is, so it is difficult to judge.  Having
a clearer and more detailed use case and backing data would be nice.

I also have a small concern about this approach.  The DAMOS quota system
assumes the cost of applying a DAMOS action is proportional to the size of the
memory it is applied to.  After this patch is applied, the cost will also
depend on the amount of free or offline memory in the region.  That might make
it difficult for users to predict the overhead of DAMOS.  I might be too picky
or imagining a problem, but to be honest I'm not feeling 100% comfortable with
this change.
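
To illustrate the concern, here is a rough back-of-the-envelope model in
Python (not kernel code) of the walk cost, using the complexity stated in
the changelog.  The 4 KiB page size, the assumption that populated memory is
packed into whole pageblocks, and the helper name are my assumptions for
illustration, not measurements of the patch.

```python
PAGE_SIZE = 4 << 10        # 4 KiB base page (assumed; x86_64)
PAGEBLOCK_SIZE = 2 << 20   # 2 MiB pageblock, per the changelog for x86_64
PAGES_PER_BLOCK = PAGEBLOCK_SIZE // PAGE_SIZE

def walk_steps(region_bytes, populated_bytes, skip_free_blocks):
    """Count per-PFN lookups plus per-pageblock checks for one region walk.

    Best case for skipping: populated memory is assumed to be packed into
    as few pageblocks as possible.
    """
    nr_blocks = region_bytes // PAGEBLOCK_SIZE
    if not skip_free_blocks:
        # Linear walk: one lookup per PFN, regardless of population.
        return nr_blocks * PAGES_PER_BLOCK
    # Ceiling division: a partially populated block is walked in full.
    populated_blocks = -(-populated_bytes // PAGEBLOCK_SIZE)
    # One boundary check per pageblock, plus per-PFN lookups in
    # populated pageblocks only.
    return nr_blocks + populated_blocks * PAGES_PER_BLOCK
```

With this model, a 512 GiB region costs 134217728 steps in the linear walk no
matter how much of it is populated.  With skipping, the same region costs
17039360 steps when 64 GiB is populated and 134479872 steps when it is fully
populated.  That is, the cost per region byte is no longer constant, which is
what the quota system's assumption relies on.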

For the long term, we are working on extending DAMON for general data
attributes monitoring.  I'm pretty sure you are also aware of that.  The v1 [1]
was just added to mm-new for more testing.  It currently supports anon page and
belonging memory cgroup attributes.  I'm planning to extend that a lot.  In the
future, DAMOS might be able to target and filter memory based on the attributes
monitoring results.  Then, we may be able to extend it to monitor the online
state or freeness of the memory, and ask DAMOS to filter out or de-prioritize
memory regions having a high proportion of free or offline memory.

So, long story short, I'd suggest revisiting this after a clear use case and a
real problem are found, unless we have them right now.

[1] https://lore.kernel.org/20260518234119.97569-1-sj@kernel.org


Thanks,
SJ

[...]
