From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>,
akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, shikemeng@huaweicloud.com,
kasong@tencent.com, nphamcs@gmail.com, bhe@redhat.com,
baohua@kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, gunho.lee@lge.com,
iamjoonsoo.kim@lge.com, taejoon.song@lge.com,
"Matthew Wilcox" <willy@infradead.org>,
"David Hildenbrand" <david@redhat.com>,
"Kairui Song" <ryncsn@gmail.com>
Subject: Re: [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for cgroup-based swap priority
Date: Tue, 26 Aug 2025 21:57:05 +0900
Message-ID: <aK2vIdU0szcu7smP@yjaykim-PowerEdge-T330>
In-Reply-To: <CACePvbV=OuxGTqoZvgwkx9D-1CycbDv7iQdKhqH1i2e8rTq9OQ@mail.gmail.com>
> > Therefore, my current thinking is:
> > * The global swap setting itself is tier 1 (if nothing is configured).
> > * If a cgroup has no setting:
> > - Top-level cgroups follow the global swap.
> > - Child cgroups follow their parent’s setting.
> > * If a cgroup has its own setting, that setting is applied.
> > (child cgroups can only select tiers that the parent has allowed.)
>
> That is too restrictive. The most common case is just the parent
> cgroup matters, the child uses the exact same setting as the parent.
> However, if you want the child to be different from the parent, there
> are two cases depending on your intention. Both can make sense.
> 1) The parent is more latency sensitive than the child. That way the
> child will be more (slower) tired than the parent. Using more tiers is
> slower, that is the inverted relationship. Your proposal does not
> allow this?
> 2) The parent is latency tolerant and the child is latency sensitive.
> In this case, the child will remove some swap files from the parent.
> This is also a valid case, e.g. the parent is just a wrapper daemon
> invoking the real worker as a child. The wrapper just does log
> rotation and restarting the child group with a watchdog, it does not
> need to be very latency sensitive, let say the watchdog is 1 hours.
> The child is the heavy lifter and requires fast response.
>
> I think both cases are possible, I don't see a strong reason to limit
> the flexibility when there is no additional cost. I expect the
> restriction approach having similar complexity.
In my use case, I think a restrictive inheritance model could
be sufficient. My argument was mainly based on the fact that most cgroup
resource distribution mechanisms follow a restrictive parent→child
pattern. Through the review, I came to the view that I should adhere to the
common behavior whenever possible.
In the RFC, I originally supported allowing parent/child inconsistency
for flexibility, so I actually agree with your view on flexibility.
I have no disagreement with the examples you mentioned. I think my final
understanding is aligned with yours.
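A minimal sketch of the lookup rule discussed above, assuming the flexible model (a child's own setting is applied as-is, without being restricted to the parent's tiers). The `Cgroup` class, `effective_tiers()` and the bitmask encoding (tier n is bit n-1) are hypothetical names for illustration, not taken from the patch:

```python
from typing import Optional

# Assumption: with nothing configured, the global swap setting is tier 1 only.
GLOBAL_TIERS = 0b1


class Cgroup:
    """Hypothetical stand-in for a cgroup with an optional tier mask."""

    def __init__(self, name: str, parent: Optional["Cgroup"] = None,
                 tiers: Optional[int] = None):
        self.name = name
        self.parent = parent
        self.tiers = tiers  # None means "no setting of its own"


def effective_tiers(cg: Optional[Cgroup], global_tiers: int = GLOBAL_TIERS) -> int:
    """Walk up the hierarchy until a cgroup with its own setting is found.

    A cgroup without a setting follows its parent; a top-level cgroup
    without a setting follows the global swap setting.
    """
    while cg is not None:
        if cg.tiers is not None:
            return cg.tiers
        cg = cg.parent
    return global_tiers


top = Cgroup("cg1", tiers=0b000111)                # tiers 1-3
child = Cgroup("cg2", parent=top)                  # no setting: follows cg1
worker = Cgroup("cg3", parent=top, tiers=0b101110) # own setting: tiers 2-4,6
unset_top = Cgroup("cg4")                          # no setting: follows global
```

With the restrictive model instead, the last step would additionally AND the child's mask with the parent's effective mask.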
> Can you clarify what I need to reconsider? I have the very similar
> bitmask idea as you describe now.
> I am not a dictator. I just provide feedback to your usage case with
> my reasoning.
>
Oh! I think you are a good reviewer :D
Okay then, let me explain my preference for numeric tiers in more detail.
It seems we are aligned on the implementation strategy with a bitmask,
but I think our difference lies in the interface style: 'name' vs.
'numeric increase'.
1. A simple numeric interface makes the usage more straightforward.
Instead of '+/-' semantics, directly listing the numeric range feels
clearer and easier to use. For example:
tier 1 (ram)
tier 2 (ssd)
tier 3 (hdd)
tier 4 (network device)
tier 5 (some device)
tier 6 (some device2)
cg1: echo 1-3 > memory.swap.tier (ram, ssd, hdd)
cg1/cg2: echo 2-4,6 > memory.swap.tier (ssd, hdd, network device, some device2; assuming non-subset is allowed)
Tier specification can also be expressed simply as arrays of priority
ranges, which feels easy to understand.
2. Since tiers are inherently ordered, numbering fits naturally and is
easier for users to accept.
In my view, assigning a name is mainly useful to distinguish between
otherwise 'indistinguishable' groups, but in this case, there is already
a clear distinction given by the different priorities, which can simply be
characterized by increasing numbers.
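To illustrate how the numeric range syntax from point 1 could map onto the bitmask implementation, here is a rough sketch. `parse_tiers()` and the `max_tier` limit are hypothetical, and the encoding (tier n is bit n-1) is an assumption, not something fixed by the patch:

```python
def parse_tiers(spec: str, max_tier: int = 6) -> int:
    """Turn a spec such as "1-3" or "2-4,6" into a tier bitmask.

    Assumption: tier n is represented by bit n-1.
    Raises ValueError on malformed input or out-of-range tiers.
    """
    mask = 0
    for part in spec.split(","):
        lo, sep, hi = part.strip().partition("-")
        first = int(lo)                    # ValueError if not a number
        last = int(hi) if sep else first   # single tier when there is no "-"
        if not 1 <= first <= last <= max_tier:
            raise ValueError(f"invalid tier range: {part!r}")
        for tier in range(first, last + 1):
            mask |= 1 << (tier - 1)
    return mask


cg1_mask = parse_tiers("1-3")    # ram, ssd, hdd
cg2_mask = parse_tiers("2-4,6")  # ssd, hdd, network device, some device2
```

Whether cg2 is allowed to select a non-subset of cg1 would then be a single mask comparison on top of this.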
I understand your point that tier names may be more convenient for
administrators, and I see the value in that. That was why I used the word
"reconsider" — your feedback makes sense as well.
I do not have a strong preference. It would be good to align after
considering the pros and cons. I look forward to your thoughts.
> > There seem to be two possible choices:
> >
> > 1. Once a cgroup references a tier, modifying that tier should be
> > disallowed.
>
> Even modify a tier to cover more priority range but no swap device
> falls in that additional range yet?
> I think we should make the change follow the swap on/swap off
> behavior. Once the swap device is swapped on, it can't change its tier
> until it is swapped off again. when it is swapped off, there is no
> cgroup on it. Notice the swap file belongs to which tier is not the
> same as the priority range of the tier. You can modify the range and
> reorder swap tiers as long as it is not causing swap on device jump to
> a different tier.
>
> > 2. Allow tier re-definition even if cgroups are already referencing
> > it.
>
> You can still swap off even if cgroup is still using it.
>
> > Personally, I prefer option (1), since it avoids unexpected changes
> > for cgroups that already rely on a particular tier definition.
>
> Swap off and on already have similar problems. We can't change the
> priority when the swap device is swapon already. We can go through a
> swap off to change it.
I see your point. In practice, when tiers are already being referenced
by cgroups, swap devices may come and go within those tiers. I think
this can be considered a "natural" behavior, as swap management is
usually performed explicitly by the administrator.
From that perspective, I expect that unintended behavior is very
unlikely to occur in real scenarios. So I am comfortable assuming this
implicit behavior when reasoning about tier modifications.
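The rule quoted above (a tier's priority range may be changed as long as no swapped-on device jumps to a different tier) could be checked roughly like this. It is only a sketch: `tier_of()`, `redefinition_ok()` and the priority ranges are made up for illustration:

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical tier table: tier number -> inclusive (low, high) priority range.
Tiers = Dict[int, Tuple[int, int]]


def tier_of(priority: int, tiers: Tiers) -> Optional[int]:
    """Return the tier whose range contains the priority, or None."""
    for tier, (low, high) in tiers.items():
        if low <= priority <= high:
            return tier
    return None


def redefinition_ok(old: Tiers, new: Tiers, active: Iterable[int]) -> bool:
    """Accept a new tier table only if every swapped-on device
    (given by its priority) stays in the tier it was assigned at swapon."""
    return all(tier_of(p, old) == tier_of(p, new) for p in active)


old = {1: (100, 199), 2: (0, 99)}
widened = {1: (100, 299), 2: (0, 99)}  # tier 1 grows, nothing moves
shifted = {1: (120, 199), 2: (0, 119)} # boundary moves across priority 110
```

Devices that are swapped off are not in `active`, so they do not constrain the change, which matches the swapon/swapoff behavior described in the quoted text.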
Thanks again for the clarification. With this, the overall picture
feels much clearer. Once we reach alignment on the "named" vs. "numeric"
tier interface, I plan to move forward with the patch work.
Best Regards
Youngjun Park