public inbox for linux-mm@kvack.org
From: Josh Law <objecting@objecting.org>
To: SeongJae Park <sj@kernel.org>
Cc: akpm@linux-foundation.org, damon@lists.linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops
Date: Sun, 22 Mar 2026 22:44:58 +0000	[thread overview]
Message-ID: <41251BD0-5796-4600-A75B-3D08A81ADF04@objecting.org> (raw)
In-Reply-To: <20260322222845.89757-1-sj@kernel.org>



On 22 March 2026 22:28:44 GMT, SeongJae Park <sj@kernel.org> wrote:
>On Sun, 22 Mar 2026 21:59:45 +0000 Josh Law <objecting@objecting.org> wrote:
>
>> 
>> 
>> On 22 March 2026 21:44:18 GMT, SeongJae Park <sj@kernel.org> wrote:
>> >Hello Josh,
>> >
>> >On Sun, 22 Mar 2026 18:46:40 +0000 Josh Law <objecting@objecting.org> wrote:
>> >
>> >> Currently, kdamond_apply_schemes() iterates over all targets, then over all
>> >> regions, and finally calls damon_do_apply_schemes() which iterates over
>> >> all schemes. This nested structure causes scheme-level invariants (such as
>> >> time intervals, activation status, and quota limits) to be evaluated inside
>> >> the innermost loop for every single region.
>> >> 
>> >> If a scheme is inactive, has not reached its apply interval, or has already
>> >> fulfilled its quota (quota->charged_sz >= quota->esz), the kernel still
>> >> needlessly iterates through thousands of regions only to repeatedly
>> >> evaluate these same scheme-level conditions and continue.
>> >> 
>> >> This patch inlines damon_do_apply_schemes() into kdamond_apply_schemes()
>> >> and inverts the loop ordering. It now iterates over schemes on the outside,
>> >> and targets/regions on the inside.
>> >> 
>> >> This allows the code to evaluate scheme-level limits once per scheme.
>> >> If a scheme's quota is met or it is inactive, we completely bypass the
>> >> O(Targets * Regions) inner loop for that scheme. This drastically reduces
>> >> unnecessary branching, cache thrashing, and CPU overhead in the kdamond
>> >> hot path.
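
[A rough sketch of the inverted ordering described above, using simplified
stand-in structs rather than the actual DAMON types; all names below are
illustrative assumptions, not the kernel's real layout:]

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for DAMON's structures; the fields mirror the
 * checks named above (wmarks.activated, quota->charged_sz, quota->esz)
 * but are not the kernel's actual definitions. */
struct scheme {
	bool activated;
	unsigned long charged_sz;
	unsigned long esz;
};

struct target {
	size_t nr_regions;
};

/*
 * Inverted ordering: schemes outermost, targets/regions inside.
 * Scheme-level checks run once per scheme; when a scheme is inactive
 * or its quota is met, the whole targets-x-regions loop is skipped.
 * Returns the number of region visits actually performed, to make
 * the saving observable.
 */
static unsigned long apply_schemes_inverted(const struct scheme *schemes,
					    size_t nr_schemes,
					    const struct target *targets,
					    size_t nr_targets)
{
	unsigned long visits = 0;
	size_t s, t, r;

	for (s = 0; s < nr_schemes; s++) {
		/* Evaluated once per scheme, not once per region. */
		if (!schemes[s].activated ||
		    schemes[s].charged_sz >= schemes[s].esz)
			continue;	/* bypass O(targets * regions) work */
		for (t = 0; t < nr_targets; t++)
			for (r = 0; r < targets[t].nr_regions; r++)
				visits++;	/* scheme would be applied here */
	}
	return visits;
}
```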
>> >
>> >That makes sense at a high level.  But this will make a kind of behavioral
>> >difference that could be user-visible.  I am failing to find a clear use
>> >case that really depends on the old behavior, but it still feels like not a
>> >small change to me.
>> >
>> >So I'd like to be conservative about this change unless there is good
>> >evidence showing very clear and impactful real-world benefits.  Can you
>> >share such evidence if you have it?
>> >
>> >
>> >Thanks,
>> >SJ
>> >
>> >[...]
>> 
>> 
>> My last email:
>> 
>> Hi SeongJae,
>> 
>> I've looked into this further and ran some extra benchmarks on the kdamond hot path to see if the gains were actually meaningful.
>> 
>> The main issue right now is that kdamond spends a lot of time "spinning" through regions even when there's no work to do. For example, if a user has 10,000 regions and a few schemes that have already hit their quotas or are disabled by watermarks, the current code still iterates through every single region just to check those same flags 10,000 times.
>> 
>> In my tests:
>> 
>> Typical setup (10 schemes, 2k regions): ~3.4x faster.
>> 
>> Large scale (10k regions, hitting quotas): ~7x faster.
>> 
>> Idle schemes (watermarks off): ~7x faster.
>
>Thank you for sharing these.  These look like micro-benchmarks of just this
>code path rather than a real-world workload test, though.
>
>In real-world DAMOS usage, I think most of the time will be spent applying
>the DAMOS action.  Compared to that, I think the time spent on the
>unnecessary iteration will be quite small.
>
>> 
>> 
>> It's also a cache locality win. Right now the CPU has to bounce between different scheme metadata inside the innermost loop for every region. Inverting the loops lets us process one scheme completely, which keeps the hot data in L1/L2 and gives about a 10% gain even when everything is active.
>> 
>> The goal isn't just to shave cycles, but to make DAMON scale better on high-memory systems (512GB+) where the region count is high. This keeps the background "CPU floor" much lower when DAMON is supposed to be idle or throttled.
>
>DAMON does adaptive region adjustment for scalability on such large memory
>systems.  I understand some users might dislike the adaptive mechanism and
>stick to fixed-granularity monitoring, though.
>
>So I'm not yet convinced by this change as is.
>
>Meanwhile, I'm thinking about a way to make similar optimization without
>changing the behavior.
>
>We already have the first loop of kdamond_apply_schemes() to minimize some of
>the inefficiency that this patch is aiming to optimize out.  Maybe we can
>further optimize that first loop: for example, modify it to build a list or
>array of the schemes that passed the next_apply_sis and wmarks.activated
>tests, and make damon_do_apply_schemes() use those test-passed schemes
>instead of all the schemes in the context.
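
[A rough sketch of that filtered first pass, with a simplified stand-in
type; the struct and field names model the next_apply_sis and
wmarks.activated tests named above and are illustrative assumptions, not
the kernel's actual structures:]

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a DAMOS scheme.  The two fields model the
 * wmarks.activated and next_apply_sis tests, respectively. */
struct scheme {
	bool activated;
	bool apply_interval_passed;
};

/*
 * First-pass filter: collect pointers to the schemes that passed the
 * scheme-level tests, so the later per-region pass walks this
 * shortlist instead of every scheme in the context.  Behavior is
 * unchanged; only already-doomed iterations are avoided.
 */
static size_t collect_applicable_schemes(struct scheme *schemes,
					 size_t nr_schemes,
					 struct scheme **applicable)
{
	size_t n = 0;
	size_t s;

	for (s = 0; s < nr_schemes; s++)
		if (schemes[s].activated && schemes[s].apply_interval_passed)
			applicable[n++] = &schemes[s];
	return n;
}
```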
>
>This would keep the behavior while giving a performance gain similar to what
>this patch is aiming for.  If that can be done in a fairly simple way that
>justifies the maintenance burden, I think that's a viable path forward.  But
>at this point, I realize I want it to be *very* simple, and I have no idea
>how to keep it that simple.
>
>So I wanted to help get this merged, but I am failing to find a good path
>forward on my own.
>
>In my humble and frank opinion, finding another place to work on instead of
>this specific code path optimization might be a better use of the time.
>
>
>Thanks,
>SJ
>
>[...]


Also, v2 is out for the other patch you liked.


V/R


Josh Law


  parent reply	other threads:[~2026-03-22 22:45 UTC|newest]

Thread overview: 13+ messages
2026-03-22 18:46 [PATCH 0/2] mm/damon/core: performance optimizations for kdamond hot path Josh Law
2026-03-22 18:46 ` [PATCH 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops Josh Law
2026-03-22 21:44   ` SeongJae Park
2026-03-22 21:47     ` Josh Law
2026-03-22 21:53     ` Josh Law
2026-03-22 21:59     ` Josh Law
2026-03-22 22:28       ` [PATCH 1/2] mm/damon/core: optimize kdamond_apply_schemes() " SeongJae Park
2026-03-22 22:39         ` Josh Law
2026-03-23 14:01           ` SeongJae Park
2026-03-22 22:44         ` Josh Law [this message]
2026-03-22 18:46 ` [PATCH 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses() Josh Law
2026-03-22 21:30   ` SeongJae Park
2026-03-22 21:32     ` Josh Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41251BD0-5796-4600-A75B-3D08A81ADF04@objecting.org \
    --to=objecting@objecting.org \
    --cc=akpm@linux-foundation.org \
    --cc=damon@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.