Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: Kunwu Chan <kunwu.chan@linux.dev>
Cc: SeongJae Park <sj@kernel.org>,
	akpm@linux-foundation.org, damon@lists.linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kunwu Chan <chentao@kylinos.cn>, Wang Lian <lianux.mm@gmail.com>
Subject: Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64
Date: Mon, 25 May 2026 10:46:39 -0700	[thread overview]
Message-ID: <20260525174640.9440-1-sj@kernel.org> (raw)
In-Reply-To: <20260525144846.604907-1-kunwu.chan@linux.dev>

On Mon, 25 May 2026 22:48:46 +0800 Kunwu Chan <kunwu.chan@linux.dev> wrote:

> From: Kunwu Chan <chentao@kylinos.cn>

Thank you for this great patch!

> 
> damon_ptep_mkold() clears the PTE Access Flag so that a later
> access will set it again and damon_folio_young() can observe it
> via pte_young().
> 
> On arm64, however, ptep_test_and_clear_young() clears AF in the
> page tables without invalidating the corresponding TLB entry.
> Subsequent accesses can therefore continue hitting a stale TLB
> entry without a page table walk.  The PTE AF bit stays clear,
> pte_young() reports false, and DAMON treats the region as
> unaccessed.
> 
> folio_set_idle() does not help here.  It updates only software
> state, and accesses through a stale TLB entry do not clear the
> idle flag.
> 
> As a result, nr_accesses stays low regardless of the real access
> pattern.  DAMOS schemes fail to match, WSS estimation reports
> zero, and actions like pageout never trigger.

You are correct.  Nonetheless, we intentionally designed DAMON in this way, to
avoid the performance overhead from the added TLB flushes.  Meanwhile, we
believed this is ok in terms of the monitoring results accuracy on real world
production environment, becasue such environments would have large amount of
working set that flushes TLB buffers anyway.  We decided to take this way after
a measurement [1].

And apparently you found this behavior as problematic on a test environment
that having small size of working set.  We found a similar issue in DAMON
selftest, and we updated the test [2] to simulate the expected real world
production environments, rather than changing DAMON.

But, this kind of question is recurring.  In addition to the previous
discussion, there were a few private inqueries for this issue.  And though the
real world production environment is the priamry target of DAMON, I understand
it is better to support testing environment, too.  So, I think it is better to
make some changes for this issue, if it doesn't make other problems.

> 
> Fix this by switching to ptep_clear_flush_young() and
> pmdp_clear_flush_young().
> 
> On arm64 these perform the required TLB invalidation after
> clearing AF.  The invalidation is deferred, but still sufficient
> for DAMON's sampling granularity.
> 
> On x86, ptep_clear_flush_young() is equivalent to
> ptep_test_and_clear_young() for base pages, so there is no
> behavioral change.  pmdp_clear_flush_young() additionally performs
> a flush at PMD level, matching the existing x86 implementation.
> 
> On powerpc, riscv, and s390, the clear_flush variants currently
> map back to test_and_clear implementations, so this patch does not
> change their behavior.

This change seems much nicer and might be more optimized than my simple
implementation of tlb flush [1] that I tested before.

> 
> Reproduced on arm64 (128 CPUs, 7.1.0-rc4):
> 
>   before:
>     WSS estimation: 50th percentile error 100% (reported as zero)
>     apply_interval: schemes never tried
> 
>   after:
>     WSS estimation: 50th percentile error 0.08%
>     apply_interval: passes

And nice test results.  I guess you are referring to the tests in damon-tests?
Clarifying the context would be nice.

Also, have you had a chance to measure the performance impact?

So, I'd like to have this change.  But, unless we have very clear evidence
showing this change is not increasing the performance overhead, I'd prefer
making this as an optional feature.

For the user interface, we could add a new sysfs file for the option, say,
'flush_sample_tlb' under 'monitoring_attrs' directory.

For long term, I'm planning [1] to extend the data attributes monitoring
feature so that data access becomes just one of the attributes.  Once it is
done, we could control this tlb flush option using the probes interface.

I was initially thinking about asking Kunwu to wait until the data attributes
monitoring extension is done, and add this tlb flush option on top of that.
Because, otherwise, we may need to deprecate 'flush_sample_tlb' after the
extension is done.  But, we will anyway need to deprecate a few interfaces
including 'nr_accesses'.  Doing the deprecation of 'flush_sample_tlb' together
with it shouldn't be huge amount of overhead.

So, unless Kunwu and Lian has other concerns, I'd suggest the
'flush_sample_tlb' path.

[...]

[1] https://lore.kernel.org/20200403103059.12762-1-sjpark@amazon.com/
[2] https://lore.kernel.org/20260117020731.226785-3-sj@kernel.org/


Thanks,
SJ


  reply	other threads:[~2026-05-25 17:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 14:48 [PATCH] mm/damon: fix stale TLB young-state handling on arm64 Kunwu Chan
2026-05-25 17:46 ` SeongJae Park [this message]
2026-05-26  8:57   ` Kunwu Chan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260525174640.9440-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=chentao@kylinos.cn \
    --cc=damon@lists.linux.dev \
    --cc=kunwu.chan@linux.dev \
    --cc=lianux.mm@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox