Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64

From: SeongJae Park <sj@kernel.org>
To: "Kunwu Chan" <kunwu.chan@linux.dev>
Cc: SeongJae Park <sj@kernel.org>, "Kunwu Chan" <chentao@kylinos.cn>,
	"Wang Lian" <lianux.mm@gmail.com>,
	akpm@linux-foundation.org, damon@lists.linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64
Date: Tue, 26 May 2026 07:50:33 -0700
Message-ID: <20260526145034.91594-1-sj@kernel.org>
In-Reply-To: <3d09f6b9cf4a9b275876185f5b234253e7af0225@linux.dev>

On Tue, 26 May 2026 08:57:32 +0000 "Kunwu Chan" <kunwu.chan@linux.dev> wrote:

> May 26, 2026 at 1:46 AM, "SeongJae Park" <sj@kernel.org> wrote:
> 
> 
> > 
> > On Mon, 25 May 2026 22:48:46 +0800 Kunwu Chan <kunwu.chan@linux.dev> wrote:
[...]
> > > Reproduced on arm64 (128 CPUs, 7.1.0-rc4):
> > >  
> > >  before:
> > >  WSS estimation: 50th percentile error 100% (reported as zero)
> > >  apply_interval: schemes never tried
> > >  
> > >  after:
> > >  WSS estimation: 50th percentile error 0.08%
> > >  apply_interval: passes
> > > 
> > And nice test results. I guess you are referring to the tests in damon-tests?
> > Clarifying the context would be nice.
> > 
> Yes, those results are from: make -C tools/testing/selftests/damon run_tests
> on the arm64 test machine mentioned above.
> 
> The before/after summary was extracted from the relevant failing tests
> (sysfs_update_schemes_tried_regions_wss_estimation.py and
> damos_apply_interval.py) for brevity.

Thank you for clarifying!

To cope with this issue, wss_estimation already increases its working set size
up to 160 MiB.  It seems your test machine has a large TLB.  I think we should
decide the limit based on the configuration of the running system, and apply a
similar approach to other tests, including apply_interval.

For out-of-tree tests, we had better provide a guideline, too.  E.g., run this
sort of test program with this DAMON config to find a reliable test working set
size for your setup.
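
A minimal sketch of what such a test program could look like, assuming a
doubling search over working set sizes.  The names touch_working_set() and
candidate_sizes() are hypothetical and not part of the DAMON selftests, and
the 10 MiB starting size is an assumption; only the 160 MiB upper bound comes
from the discussion above.  The DAMON configuration to run alongside it is
deliberately left out.

```python
#!/usr/bin/env python3
# Hypothetical helper for probing a reliable test working set size.
# Not part of the kernel tree; sizes and names are illustrative assumptions.
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE


def touch_working_set(size_bytes, duration_sec):
    """Write one byte per page of an anonymous mapping, repeatedly.

    At least one full pass is always made.  Returns the number of full
    passes completed before duration_sec elapsed.
    """
    region = mmap.mmap(-1, size_bytes)  # anonymous, zero-filled mapping
    passes = 0
    deadline = time.monotonic() + duration_sec
    try:
        while True:
            for offset in range(0, size_bytes, PAGE_SIZE):
                region[offset] = 1  # touch every page once per pass
            passes += 1
            if time.monotonic() >= deadline:
                break
    finally:
        region.close()
    return passes


def candidate_sizes(start_mib=10, limit_mib=160):
    """Return a doubling sequence of sizes in MiB, capped at limit_mib."""
    sizes = []
    size = start_mib
    while size <= limit_mib:
        sizes.append(size)
        size *= 2
    return sizes
```

One possible use is to call touch_working_set(mib << 20, some_duration) for
each value from candidate_sizes() while DAMON monitors the process, and pick
the smallest size at which the reported working set size becomes stable.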

> 
> > Also, have you had a chance to measure the performance impact?
> We haven't done detailed performance measurements yet, but we can try to
> collect some numbers for the flush overhead on a few different setups.
>  
> > So, I'd like to have this change. But, unless we have very clear evidence
> > showing this change is not increasing the performance overhead, I'd prefer
> > making this as an optional feature.
> >
> We agree that making it optional sounds safer unless we have solid
> evidence showing the overhead is negligible. Keeping the current
> default behavior for production workloads also makes sense to me.
> 
> > For the user interface, we could add a new sysfs file for the option, say,
> > 'flush_sample_tlb' under 'monitoring_attrs' directory.
> > 
> The proposed 'flush_sample_tlb' interface under monitoring_attrs sounds
> reasonable to me as well.

I thought about this again.  I still want DAMON to be easy to test.  But is
this making tests that difficult?  Users could increase the test working set
size.  I'm not sure that is difficult enough to justify adding a new optional
feature.  Meanwhile, adding an optional feature only for tests might confuse
users.  DAMON usage might also diverge, adding maintenance burden.

So, now I think another option is improving the documentation.  It should
clearly explain how and why DAMON does not flush the TLB, what problems are
expected (in tests), and what we recommend.  With this option, we should also
update existing DAMON tests to be reliable and aligned with the documented
recommendation.  If we find it is still a problem for testing even after
applying the recommendation, or in production, we can revisit.

Regardless of the decision about the optional feature in DAMON, I think such
documentation and test improvements should be made.

Maybe I'm biased, so any input would be appreciated.  What do you think, Kunwu
and Lian?

Thanks,
SJ

[...]

Thread overview: 7+ messages
2026-05-25 14:48 [PATCH] mm/damon: fix stale TLB young-state handling on arm64 Kunwu Chan
2026-05-25 17:46 ` SeongJae Park
2026-05-26  8:57   ` Kunwu Chan
2026-05-26 14:50     ` SeongJae Park [this message]
2026-05-31 12:16       ` Kunwu Chan
2026-05-31 16:24         ` SeongJae Park
2026-06-01  2:09           ` Kunwu Chan
