Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64

DAMON development mailing list
 help / color / mirror / Atom feed

From: "Kunwu Chan" <kunwu.chan@linux.dev>
To: "SeongJae Park" <sj@kernel.org>,
	"Kunwu Chan" <chentao@kylinos.cn>,
	"Wang Lian" <lianux.mm@gmail.com>
Cc: akpm@linux-foundation.org, damon@lists.linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64
Date: Tue, 26 May 2026 08:57:32 +0000	[thread overview]
Message-ID: <3d09f6b9cf4a9b275876185f5b234253e7af0225@linux.dev> (raw)
In-Reply-To: <20260525174640.9440-1-sj@kernel.org>

May 26, 2026 at 1:46 AM, "SeongJae Park" <sj@kernel.org mailto:sj@kernel.org?to=%22SeongJae%20Park%22%20%3Csj%40kernel.org%3E > wrote:


> 
> On Mon, 25 May 2026 22:48:46 +0800 Kunwu Chan <kunwu.chan@linux.dev> wrote:
> 
> > 
> > From: Kunwu Chan <chentao@kylinos.cn>
> > 
> Thank you for this great patch!
Thanks for the reply.
> 
> > 
> > damon_ptep_mkold() clears the PTE Access Flag so that a later
> >  access will set it again and damon_folio_young() can observe it
> >  via pte_young().
> >  
> >  On arm64, however, ptep_test_and_clear_young() clears AF in the
> >  page tables without invalidating the corresponding TLB entry.
> >  Subsequent accesses can therefore continue hitting a stale TLB
> >  entry without a page table walk. The PTE AF bit stays clear,
> >  pte_young() reports false, and DAMON treats the region as
> >  unaccessed.
> >  
> >  folio_set_idle() does not help here. It updates only software
> >  state, and accesses through a stale TLB entry do not clear the
> >  idle flag.
> >  
> >  As a result, nr_accesses stays low regardless of the real access
> >  pattern. DAMOS schemes fail to match, WSS estimation reports
> >  zero, and actions like pageout never trigger.
> > 
> You are correct. Nonetheless, we intentionally designed DAMON in this way, to
> avoid the performance overhead from the added TLB flushes. Meanwhile, we
> believed this is ok in terms of the monitoring results accuracy on real world
> production environment, becasue such environments would have large amount of
> working set that flushes TLB buffers anyway. We decided to take this way after
> a measurement [1].
> 
Understood, thanks for the detailed explanation and context.  We were
too focused on the testing issue and overlooked the original tradeoff
between monitoring accuracy and the overhead of TLB flushing.

> And apparently you found this behavior as problematic on a test environment
> that having small size of working set. We found a similar issue in DAMON
> selftest, and we updated the test [2] to simulate the expected real world
> production environments, rather than changing DAMON.
> 
That makes sense.

The selftest change in [2] also clarified why DAMON normally works fine
in production workloads. Large and active workloads naturally exceed
TLB coverage, so accessed-bit updates eventually become visible again.

So I agree that adjusting the test instead of forcing TLB flushes in
DAMON was the right tradeoff there.

> But, this kind of question is recurring. In addition to the previous
> discussion, there were a few private inqueries for this issue. And though the
> real world production environment is the priamry target of DAMON, I understand
> it is better to support testing environment, too. So, I think it is better to
> make some changes for this issue, if it doesn't make other problems.
> 
At the same time, I also understand why this keeps coming up.  Test
environments can differ a lot in workload size, THP usage, VM setup,
and hardware configuration, so stable and reproducible testing is not
always easy across different systems.

So I agree it makes sense to improve this area as long as it does not
add noticeable overhead or cause problems for the normal production
case.

> > 
> > Fix this by switching to ptep_clear_flush_young() and
> >  pmdp_clear_flush_young().
> >  
> >  On arm64 these perform the required TLB invalidation after
> >  clearing AF. The invalidation is deferred, but still sufficient
> >  for DAMON's sampling granularity.
> >  
> >  On x86, ptep_clear_flush_young() is equivalent to
> >  ptep_test_and_clear_young() for base pages, so there is no
> >  behavioral change. pmdp_clear_flush_young() additionally performs
> >  a flush at PMD level, matching the existing x86 implementation.
> >  
> >  On powerpc, riscv, and s390, the clear_flush variants currently
> >  map back to test_and_clear implementations, so this patch does not
> >  change their behavior.
> > 
> This change seems much nicer and might be more optimized than my simple
> implementation of tlb flush [1] that I tested before.
> 
Thanks. Yes, we were mainly trying to reuse the existing
arch-optimized helpers as much as possible.

> > 
> > Reproduced on arm64 (128 CPUs, 7.1.0-rc4):
> >  
> >  before:
> >  WSS estimation: 50th percentile error 100% (reported as zero)
> >  apply_interval: schemes never tried
> >  
> >  after:
> >  WSS estimation: 50th percentile error 0.08%
> >  apply_interval: passes
> > 
> And nice test results. I guess you are referring to the tests in damon-tests?
> Clarifying the context would be nice.
> 
Yes, those results are from: make -C tools/testing/selftests/damon run_tests
on the arm64 test machine mentioned above.

The before/after summary was extracted from the relevant failing tests
(sysfs_update_schemes_tried_regions_wss_estimation.py and
damos_apply_interval.py) for brevity.

> Also, have you had a chance to measure the performance impact?
We haven't done detailed performance measurements yet, but we can try to
collect some numbers for the flush overhead on a few different setups.
 
> So, I'd like to have this change. But, unless we have very clear evidence
> showing this change is not increasing the performance overhead, I'd prefer
> making this as an optional feature.
>
We agree that making it optional sounds safer unless we have solid
evidence showing the overhead is negligible. Keeping the current
default behavior for production workloads also makes sense to me.

> For the user interface, we could add a new sysfs file for the option, say,
> 'flush_sample_tlb' under 'monitoring_attrs' directory.
> 
The proposed 'flush_sample_tlb' interface under monitoring_attrs sounds
reasonable to me as well.

> For long term, I'm planning [1] to extend the data attributes monitoring
> feature so that data access becomes just one of the attributes. Once it is
> done, we could control this tlb flush option using the probes interface.
> 
> I was initially thinking about asking Kunwu to wait until the data attributes
> monitoring extension is done, and add this tlb flush option on top of that.
> Because, otherwise, we may need to deprecate 'flush_sample_tlb' after the
> extension is done. But, we will anyway need to deprecate a few interfaces
> including 'nr_accesses'. Doing the deprecation of 'flush_sample_tlb' together
> with it shouldn't be huge amount of overhead.
> 
> So, unless Kunwu and Lian has other concerns, I'd suggest the
> 'flush_sample_tlb' path.
Yes, we also think deprecating it later together with interfaces such as
'nr_accesses' should be manageable once the data attributes monitoring
extension is ready.

Thanks, Kunwu
> 
> [...]
> 
> [1] https://lore.kernel.org/20200403103059.12762-1-sjpark@amazon.com/
> [2] https://lore.kernel.org/20260117020731.226785-3-sj@kernel.org/
> 
> Thanks,
> SJ
>

next prev parent reply	other threads:[~2026-05-26  8:58 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 14:48 [PATCH] mm/damon: fix stale TLB young-state handling on arm64 Kunwu Chan
2026-05-25 15:34 ` sashiko-bot
2026-05-25 17:46 ` SeongJae Park
2026-05-26  8:57   ` Kunwu Chan [this message]
2026-05-26 14:50     ` SeongJae Park
2026-05-31 12:16       ` Kunwu Chan
2026-05-31 16:24         ` SeongJae Park
2026-06-01  2:09           ` Kunwu Chan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d09f6b9cf4a9b275876185f5b234253e7af0225@linux.dev \
    --to=kunwu.chan@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=chentao@kylinos.cn \
    --cc=damon@lists.linux.dev \
    --cc=lianux.mm@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox