cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>,
	akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, shikemeng@huaweicloud.com,
	kasong@tencent.com, nphamcs@gmail.com, bhe@redhat.com,
	baohua@kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, gunho.lee@lge.com,
	iamjoonsoo.kim@lge.com, taejoon.song@lge.com,
	"Matthew Wilcox" <willy@infradead.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Kairui Song" <ryncsn@gmail.com>
Subject: Re: [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for cgroup-based swap priority
Date: Sun, 24 Aug 2025 21:05:37 +0900	[thread overview]
Message-ID: <aKsAES4cXWbDG1xn@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CAF8kJuMb5i6GuD_-XWtHPYnu-8dQ0W51_KqUk60DccqbKjNq6w@mail.gmail.com>

> How do you express the default tier who shall not name? There are
> actually 3 states associated with default. It is not binary.
> 1) default not specified: look up parent chain for default.
> 2) default specified as on. Override parent default.
> 3) default specified as off. Override parent default.

As I understand, your intention is to define inheritance semantics depending
on the default value, and allow children to override this freely with `-` and
`+` semantics. Is that correct?

When I originally proposed the swap cgroup priority mechanism, Michal Koutný
commented that it is unnatural for cgroups if a parent attribute is not
inherited by its child:
(https://lore.kernel.org/linux-mm/rivwhhhkuqy7p4r6mmuhpheaj3c7vcw4w4kavp42avpz7es5vp@hbnvrmgzb5tr/)

Therefore, my current thinking is:
* The global swap setting itself is tier 1 (if nothing is configured).
* If a cgroup has no setting:
  - Top-level cgroups follow the global swap.
  - Child cgroups follow their parent’s setting.
* If a cgroup has its own setting, that setting is applied.
(child cgroups can only select tiers that the parent has allowed.)

This seems natural because most cgroup resource distribution mechanisms follow
a subset inheritance model.

Thus, in my concept, there is no notion of a “default” value that controls
inheritance.

> How are you going to store the list of ranges? Just a bitmask integer
> or a list?

They can be represented as increasing integers, up to 32, and stored as a
bitmask.

> I feel the tier name is more readable. The number to which actual
> device mapping is non trivial to track for humans.

Using increasing integers makes it simpler for the kernel to accept a uniform
interface format, it is identical to the existing cpuset interface, and it
expresses the meaning of “tiers of swap by speed hierarchy” more clearly in my
view.

However, my feeling is still that this approach is clearer both in terms of
implementation and conceptual expression. I would appreciate if you could
reconsider it once more. If after reconsideration you still prefer your
direction, I will follow your decision.

> I want to add another usage case into consideration. The swap.tiers
> does not have to be per cgroup. It can be per VMA. [...]

I understand this as a potential extension use case for swap.tier.  
I will keep this in mind when implementing. If I have further ideas here, I
will share them for discussion.

> Sounds fine. Maybe we can have "ssd:100 zswap:40 hdd" [...]

Yes, this alignment looks good to me!

> Can you elaborate on that. Just brainstorming, can we keep the
> swap.tiers and assign NUMA autobind range to tier as well? [...]

That is actually the same idea I had in mind for the NUMA use case.  
However, I doubt if there is any real workload using this in practice, so I
thought it may be better to leave it out for now. If NUMA autobind is truly
needed later, it could be implemented then.

This point can also be revisited during review or patch writing, so I will
keep thinking about it.

> I feel that that has the risk of  premature optimization. I suggest
> just going with the simplest bitmask check first then optimize as
> follow up when needed. [...]

Yes, I agree with you. Starting with the bitmask implementation seems to be
the right approach.

By the way, while thinking about possible implementation, I would like to ask
your opinion on the following situation:

Suppose a tier has already been defined and cgroups are configured to use it.
Should we allow the tier definition itself to be modified afterwards?

There seem to be two possible choices:

1. Once a cgroup references a tier, modifying that tier should be disallowed.
2. Allow tier re-definition even if cgroups are already referencing it.

Personally, I prefer option (1), since it avoids unexpected changes for
cgroups that already rely on a particular tier definition.

What is your opinion on this?

Best Regards,
Youngjun Park

  reply	other threads:[~2025-08-24 12:05 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-16 20:20 [PATCH 0/4] mm/swap, memcg: Support per-cgroup swap device priorities Youngjun Park
2025-07-16 20:20 ` [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for cgroup-based swap priority Youngjun Park
2025-07-17 11:20   ` kernel test robot
2025-07-22 14:09     ` YoungJun Park
2025-07-18 17:08   ` kernel test robot
2025-07-22 14:11     ` YoungJun Park
2025-07-21 15:13   ` kernel test robot
2025-07-22 14:14     ` YoungJun Park
2025-07-22  8:41   ` Michal Koutný
2025-07-22 14:05     ` YoungJun Park
2025-07-22 18:41       ` YoungJun Park
2025-08-14 14:03         ` Michal Koutný
2025-08-15 15:10           ` Chris Li
2025-08-16 17:21             ` YoungJun Park
2025-08-16 19:15               ` Chris Li
2025-08-19 10:12                 ` YoungJun Park
2025-08-20  0:52                   ` Chris Li
2025-08-20 14:39                     ` YoungJun Park
2025-08-21 20:39                       ` Chris Li
2025-08-22  5:45                         ` YoungJun Park
2025-08-22 16:48                           ` Chris Li
2025-08-24 12:05                             ` YoungJun Park [this message]
2025-08-26  8:19                               ` Chris Li
2025-08-26 12:57                                 ` YoungJun Park
2025-08-26 14:30                                   ` Chris Li
2025-08-30  4:05                                     ` YoungJun Park
2025-08-30  7:13                                       ` Chris Li
2025-08-31 13:53                                         ` YoungJun Park
2025-08-31 16:45                                           ` Chris Li
2025-09-01 16:03                                             ` YoungJun Park
2025-09-01 16:06                                             ` YoungJun Park
2025-09-01 22:40                                               ` Chris Li
2025-09-03  9:32                                                 ` Chris Li
2025-09-03 10:18                                                   ` YoungJun Park
2025-08-24 14:19                             ` YoungJun Park
2025-08-16 16:41           ` YoungJun Park
2025-07-16 20:20 ` [PATCH 2/4] mm: swap: Apply per-cgroup swap priority mechanism to swap layer Youngjun Park
2025-07-16 20:20 ` [PATCH 3/4] mm: memcg: Add swap cgroup priority inheritance mechanism Youngjun Park
2025-07-16 20:20 ` [PATCH 4/4] mm: swap: Per-cgroup per-CPU swap device cache with shared clusters Youngjun Park
2025-07-22 17:44   ` Kairui Song
2025-07-22 18:30     ` YoungJun Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKsAES4cXWbDG1xn@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=gunho.lee@lge.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=ryncsn@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=taejoon.song@lge.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).