All of lore.kernel.org
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: SeongJae Park <sj@kernel.org>
Cc: Ravi Jonnalagadda <ravis.opensrc@gmail.com>,
	damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com
Subject: Re: (sashiko review) [PATCH v6 1/1] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
Date: Tue,  7 Apr 2026 09:05:44 -0700	[thread overview]
Message-ID: <20260407160546.52220-1-sj@kernel.org> (raw)
In-Reply-To: <20260407001310.78557-1-sj@kernel.org>

Adding another thought at the end of the mail without cutting the previous
unrelated questions, so that Ravi can answer all my questions at once.

On Mon,  6 Apr 2026 17:13:08 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Mon, 6 Apr 2026 12:47:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> 
> > On Sun, Apr 5, 2026 at 3:45 PM SeongJae Park <sj@kernel.org> wrote:
> > >
> > >
> > > Ravi, thank you for reposting this patch after the rebase.  This time sashiko
> > > was able to review this, and found good points including things that deserve
> > > another revision of this patch.
> > >
> > > Forwarding full sashiko review in a reply format with my inline comments below,
> > > for sharing details of my view and doing followup discussions via mails.  Ravi,
> > > could you please reply?
> > >
> > 
> > Thanks SJ, providing your comments on top of sashiko's review is very helpful.
> 
> I'm glad to hear that it is working for you :)
> 
> [...]
> > > > +static unsigned long damos_calc_eligible_bytes(struct damon_ctx *c,
> > > > > +           struct damos *s, int nid, unsigned long *total)
> > > > > +{
> [...]
> > > > > +                           struct folio *folio;
> > > > > +                           unsigned long folio_sz, counted;
> > > > > +
> > > > > +                           folio = damon_get_folio(PHYS_PFN(addr));
> > > >
> > > > What happens if this metric is assigned to a DAMON context configured for
> > > > virtual address space monitoring? If the context uses DAMON_OPS_VADDR,
> > > > passing a user-space virtual address to PHYS_PFN() might cause invalid
> > > > memory accesses or out-of-bounds page struct reads. Should this code
> > > > explicitly verify the operations type first?
> > >
> > > Good finding.  We intend to support only paddr ops.  But there is no guard for
> > > using this on vaddr ops configuration.  Ravi, could we add underlying ops
> > > check?  I think damon_commit_ctx() is a good place to add that.  The check
> > > could be something like below?
> > >
> > 
> > I plan to add the ops type check directly in the metric functions
> > (damos_get_node_eligible_mem_bp and its counterpart) rather than in
> > damon_commit_ctx(). The functions will return 0 early
> > if c->ops.id != DAMON_OPS_PADDR.
> > 
> > That said, if you prefer the damon_commit_ctx() validation approach to
> > reject the configuration outright, I can implement it that way instead.
> > Please let me know your preference.
> 
> I'd prefer damon_commit_ctx() validation approach since it would give users
> more clear message of the failure.
> 
> > 
> > > '''
> > > --- a/mm/damon/core.c
> > > +++ b/mm/damon/core.c
> > > @@ -1515,10 +1515,23 @@ static int damon_commit_sample_control(
> > >  int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
> > >  {
> > >         int err;
> > > +       struct damos *scheme;
> > > +       struct damos_quota_goal *goal;
> > >
> > >         dst->maybe_corrupted = true;
> > >         if (!is_power_of_2(src->min_region_sz))
> > >                 return -EINVAL;
> > > +       if (src->ops.id != DAMON_OPS_PADDR) {
> > > +               damon_for_each_scheme(scheme, src) {
> > > +                       damos_for_each_quota_goal(goal, &scheme->quota) {
> > > +                               switch (goal->metric) {
> > > +                               case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > > +                               case DAMOS_QUOTA_NODE_INELIGIBLE_MEMPBP:
> > > +                                       return -EINVAL;
> > > +                               }
> > > +                       }
> > > +               }
> > > +       }
> > >
> > >         err = damon_commit_schemes(dst, src);
> > >         if (err)
> > > '''
> [...]
> > > > > +   /* Compute ineligible ratio directly: 10000 - eligible_bp */
> > > > > +   return 10000 - mult_frac(node_eligible, 10000, total_eligible);
> > > > > +}
> > > >
> > > > Does this return value match the documented metric? The formula computes the
> > > > percentage of the system's eligible memory located on other NUMA nodes,
> > > > rather than the amount of actual ineligible (filtered out) memory residing
> > > > on the target node. Could this semantic mismatch cause confusion when
> > > > configuring quota policies?
> > >
> > > Nice catch.  The name and the documentation are confusing.  We actually
> > > confused a few times in previous revisions, and I'm again confused now.  IIUC,
> > > the current implementation is the intended and right one for the given use
> > > case, though.  If my understanding is correct, how about renaming
> > > DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP to
> > > DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_COMPLEMENT, and updating the documentation
> > > together?  Ravi, what do you think?
> > >
> > 
> > Agreed, the current name is confusing. How about
> > DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_OFFNODE?
> > 
> > The rationale is that this metric measures "eligible memory that is off
> > this node" (i.e., on other nodes).
> > 
> >  I think "offnode" conveys the physical meaning more directly than "complement".
> > That said, I'm happy to go with "complement" if you prefer.
> > both are clearer than "ineligible".
> 
> Thank you for the nice suggestion.  I like "offnode" term.  But I think having
> "node" twice on the name is not really efficient for people who print code on
> papers.  What about DAMOS_QUOTA_OFFNODE_ELIGIBLE_MEM_BP?
> 
> But...  Maybe more importantly...  Now I realize this means that
> offnode_eligible_mem_bp with target nid 0 is just same to node_eligible_mem_bp
> with target nid 1, on your test setup.  Maybe we don't really need
> offnode_eligible_mem_bp?  That is, your test setup could be like below.
> 
> '''
> For maintaining hot memory on DRAM (node 0) and CXL (node 1) in a 7:3
> ratio:
> 
>     PUSH scheme: migrate_hot from node 0 -> node 1
>       goal: node_eligible_mem_bp, nid=1, target=3000
>       "Move hot pages from DRAM to CXL if less thatn 30% of hot data is
>        in CXL"
> 
>     PULL scheme: migrate_hot from node 1 -> node 0
>       goal: node_eligible_mem_bp, nid=0, target=7000
>       "Move hot pages from CXL to DRAM if less than 70% of hot data is
>        in DRAM"
> '''
> 
> And the schemes are more easy to read and understand for me.  This seems even
> straightforward to scale for >2 nodes.  For example, if we want hot memory
> distribution of 5:3:2 to nodes 0:1:2,
> 
> 	Two schemes for migrating hot pages out of node 0
> 	- migrate_hot from node 0 -> node 1
> 	  - goal: node_eligible_mem_bp, nid=1, target=3000
> 	- migrate_hot from node 0 -> node 2
> 	  - goal: node_eligible_mem_bp, nid=2, target=2000
> 
> 	Two schemes for migrating hot pages out of node 1
> 	- migrate_hot from node 1 -> node 0
> 	  - goal: node_eligible_mem_bp, nid=0, target=5000
> 	- migrate_hot from node 1 -> node 2
> 	  - goal: node_eligible_mem_bp, nid=2, target=2000
> 
> 	Two schemes for migrating hot pages out of node 2
> 	- migrate_hot from node 2 -> node 0
> 	  - goal: node_eligible_mem_bp, nid=0, target=5000
> 	- migrate_hot from node 2 -> node 1
> 	  - goal: node_eligible_mem_bp, nid=1, target=3000
> 
> Do you think this makes sense?  If it makes sense and works for your use case,
> what about dropping the offnode goal type?

Now I recall I suggested the offnode metric because I suggested to run a
kdamond per node.  That is, having one kdamond that monitors only node 0 and
migrate hot memory to node 1, and another kdamond that monitors only node 1 and
migrate hot memory to node 0.  And I suggested to do so because I knew it is
suboptimal to run DAMOS schemes with node filter.

We made a change [1] for making that more optimum, though.  The change is now
in mm-stable, so hopefully it will be available from 7.1-rc1.  So I believe the
single quota goal metric should work now.  Ravi, could you share what you
think?

[1] commit e1ace69c33ec ("mm/damon/core: set quota-score histogram with core filters")


Thanks,
SJ

[...]

  reply	other threads:[~2026-04-07 16:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-05 18:42 [PATCH v6 0/1] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics Ravi Jonnalagadda
2026-04-05 18:42 ` [PATCH v6 1/1] " Ravi Jonnalagadda
2026-04-05 22:45   ` (sashiko review) " SeongJae Park
2026-04-06 19:47     ` Ravi Jonnalagadda
2026-04-07  0:13       ` SeongJae Park
2026-04-07 16:05         ` SeongJae Park [this message]
2026-04-08  2:33           ` Ravi Jonnalagadda
2026-04-08 13:54             ` SeongJae Park
2026-04-05 22:51 ` [PATCH v6 0/1] " SeongJae Park
2026-04-06  0:20   ` Ravi Jonnalagadda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260407160546.52220-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=ajayjoshi@micron.com \
    --cc=akpm@linux-foundation.org \
    --cc=bijan311@gmail.com \
    --cc=corbet@lwn.net \
    --cc=damon@lists.linux.dev \
    --cc=honggyu.kim@sk.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ravis.opensrc@gmail.com \
    --cc=yunjeong.mun@sk.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.