From: SeongJae Park <sj@kernel.org>
To: Quanmin Yan <yanquanmin1@huawei.com>
Cc: SeongJae Park <sj@kernel.org>, zuoze <zuoze1@huawei.com>,
akpm@linux-foundation.org, damon@lists.linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [RFC PATCH] mm/damon: add full LPAE support for memory monitoring above 4GB
Date: Sat, 26 Jul 2025 10:16:16 -0700 [thread overview]
Message-ID: <20250726171616.53704-1-sj@kernel.org> (raw)
In-Reply-To: <527714dd-0e33-43ab-bbbd-d89670ba79e7@huawei.com>
Hi Quanmin,
On Sat, 26 Jul 2025 11:14:19 +0800 Quanmin Yan <yanquanmin1@huawei.com> wrote:
>
> 在 2025/7/26 4:22, SeongJae Park 写道:
> > On Fri, 25 Jul 2025 11:15:22 +0800 zuoze <zuoze1@huawei.com> wrote:
> >
> >>
> >> 在 2025/4/23 1:43, SeongJae Park 写道:
> >>> On Tue, 22 Apr 2025 19:50:11 +0800 zuoze <zuoze1@huawei.com> wrote:
> >>>
> >>> [...]
> >>>> Thanks for the patches - I’ve noted the RFC series and user-space
> >>>> updates. Apologies for the delay; I’ll prioritize reviewing these soon
> >>>> to verify they meet the intended tracking goals. Appreciate your
> >>>> patience.
> >>> No worry. Please take your time and let me know if there is anything I can
> >>> help.
> >>>
> >>> I think we can improve the user-space tool support better for usability. For
> >>> example, it could find LPAE case, set addr_unit parameter, and convert
> >>> user-input and output address ranges on its own. But hopefully the current
> >>> support allows simple tests of the kernel side change, and we could do such
> >>> improvement after the kernel side change is made.
> >>>
> >>>
> >> Hi SJ,
> >>
> >> Apologies for the delayed response. We've verified your patch in our
> >> environment and confirmed it supports LPAE address monitoring.
> > No worry, thank you for testing that :)
> >
> >> However,
> >> we observed some anomalies in the reclaim functionality. During code
> >> review, we identified a few issues:
> >>
> >> The semantic meaning of damon_region changed after addr_unit was
> >> introduced. The units in damon_addr_range may no longer represent bytes
> >> directly.
> > You're right, and this is an intended change.
> >
> >> The size returned by damon_sz_region() now requires multiplication by
> >> addr_unit to get the actual byte count.
> > Again, this is an intended change. damon_sz_region() callers should aware this
> > semantic and updated accordingly, if it could make a real problem otherwise.
> > If you found such changes required cases that this patch series is missing,
> > could you please list up?
> >
> >> Heavy usage of damon_sz_region() and DAMON_MIN_REGION likely requires
> >> addr_unit-aware adjustments throughout the codebase. While this approach
> >> works, it would involve considerable changes.
> > It has been a while since I wrote this patch series, but at least while writing
> > it, I didn't find such required changes. Of course I should missed something,
> > though. As I mentioned above, could you please list such changes required
> > parts that makes problem? That would be helpful at finding the path forward.
> >
> >> What's your perspective on
> >> how we should proceed?
> > Let's see the list of required additional changes with why those are required
> > (what problems can happen if such chadnges are not made), and discuss.
>
> Hi SJ,
>
> Thank you for your email reply. Let's discuss the impacts introduced after
> incorporating addr_unit. First of all, it's essential to clarify that the
> definition of damon_addr_range (in damon_region) has changed, we will now use
> damon_addr_range * addr_unit to calculate physical addresses.
>
> I've noticed some issues, in mm/damon/core.c:
>
> damos_apply_scheme()
> ...
> unsigned long sz = damon_sz_region(r); // the unit of 'sz' is no longer bytes.
> ...
> if (c->ops.apply_scheme)
> if (quota->esz && quota->charged_sz + sz > quota->esz)
> sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
> DAMON_MIN_REGION); // the core issue lies here.
> ...
> quota->charged_sz += sz; // note the units.
> ...
> update_stat:
> // 'sz' should be multiplied by addr_unit:
> damos_update_stat(s, sz, sz_applied, sz_ops_filter_passed);
>
> Currently, DAMON_MIN_REGION is defined as PAGE_SIZE, therefore aligning
> sz downward to DAMON_MIN_REGION is likely unreasonable. Meanwhile, the unit
> of sz in damos_quota is also not bytes, which necessitates updates to comments
> and user documentation. Additionally, the calculation involving DAMON_MIN_REGION
> requires reconsideration. Here are a few examples:
>
> damos_skip_charged_region()
> ...
> sz_to_skip = ALIGN_DOWN(quota->charge_addr_from -
> r->ar.start, DAMON_MIN_REGION);
> ...
> if (damon_sz_region(r) <= DAMON_MIN_REGION)
> return true;
> sz_to_skip = DAMON_MIN_REGION;
>
> damon_region_sz_limit()
> ...
> if (sz < DAMON_MIN_REGION)
> sz = DAMON_MIN_REGION;
Thank you for this kind and detailed explanation of the issue! I understand
adopting addr_unit would make DAMON_MINREGION 'addr_unit * 4096' bytes, and it
is not a desired result when 'addr_unit' is large. For example, if 'addr_unit'
is set as 4096, the access monitoring and operation schemes will work in only
>16 MiB granularity at the best.
>
> Now I can think of two approaches, one is to keep sz in bytes, this requires
> modifications to many other call sites that use these two functions (at least
> passing the corresponding ctx->addr_unit. By the way, should we restrict the
> input of addr_unit?):
>
> damos_apply_scheme()
> ...
> - unsigned long sz = damon_sz_region(r);
> + unsigned long sz = damon_sz_region(r) * c->addr_unit;
> ...
> - damon_split_region_at(t, r, sz);
> + damon_split_region_at(t, r, sz / c->addr_unit);
>
> The second approach is to divide by addr_unit when applying DAMON_MIN_REGION,
> and revert to byte units for statistics, this approach seems to involve
> significant changes as well:
>
> damos_apply_scheme()
> ...
> sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
> - DAMON_MIN_REGION);
> + DAMON_MIN_REGION / c->addr_unit);
> ...
> update_stat:
> - damos_update_stat(s, sz, sz_applied, sz_ops_filter_passed);
> + damos_update_stat(s, sz, sz_applied * c->addr_unit, sz_ops_filter_passed);
>
> These are my observations. What's your perspective on how we should proceed? Looking
> forward to your reply.
I think the second approach is better. But I want to avoid changing every
DAMON_MIN_REGION usage. What about changing DAMON_MIN_REGION as 'max(4096 /
addr_unit, 1)' instead? Specifically, we can change DAMON_MIN_REGION from a
global macro value to per-context variable (a field of damon_ctx), and set it
accordingly when the parameters are set.
For stats, I think the users should aware of the fact DAMON is working with the
addr_unit, so they should multiply addr_unit to the stats to get bytes
information. So, I think the stats update in kernel is not really required.
DAMON user-space tool may need to be updated accordingly, though.
I didn't take time to think about all corner cases, so I may missing something.
Please let me knwo if you find such missing things.
Thanks,
SJ
[...]
next prev parent reply other threads:[~2025-07-26 17:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-08 7:55 [RFC PATCH] mm/damon: add full LPAE support for memory monitoring above 4GB Ze Zuo
2025-04-09 2:50 ` SeongJae Park
2025-04-09 7:01 ` zuoze
2025-04-09 17:36 ` SeongJae Park
2025-04-10 6:28 ` zuoze
2025-04-10 22:25 ` SeongJae Park
2025-04-11 6:30 ` zuoze
2025-04-11 16:35 ` SeongJae Park
2025-04-14 1:21 ` zuoze
2025-04-20 16:27 ` SeongJae Park
2025-04-22 11:50 ` zuoze
2025-04-22 17:43 ` SeongJae Park
2025-07-25 3:15 ` zuoze
2025-07-25 20:22 ` SeongJae Park
2025-07-26 3:14 ` Quanmin Yan
2025-07-26 17:16 ` SeongJae Park [this message]
2025-07-29 13:52 ` Quanmin Yan
2025-07-29 15:53 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250726171616.53704-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=damon@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=wangkefeng.wang@huawei.com \
--cc=yanquanmin1@huawei.com \
--cc=zuoze1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.