From: SeongJae Park <sj@kernel.org>
To: YoungJun Park <youngjun.park@lge.com>
Cc: SeongJae Park <sj@kernel.org>,
akpm@linux-foundation.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
chrisl@kernel.org, kasong@tencent.com, hannes@cmpxchg.org,
mhocko@kernel.org, roman.gushchin@linux.dev,
shakeel.butt@linux.dev, muchun.song@linux.dev,
shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
baohua@kernel.org, gunho.lee@lge.com, taejoon.song@lge.com
Subject: Re: [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Sat, 15 Nov 2025 08:56:35 -0800
Message-ID: <20251115165637.82966-1-sj@kernel.org>
In-Reply-To: <aRhLmEKixuKGCUJX@yjaykim-PowerEdge-T330>

On Sat, 15 Nov 2025 18:44:56 +0900 YoungJun Park <youngjun.park@lge.com> wrote:
> On Fri, Nov 14, 2025 at 05:22:45PM -0800, SeongJae Park wrote:
> > On Sun, 9 Nov 2025 21:49:44 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
> >
> > > Hi all,
> > >
> > > In constrained environments, there is a need to improve workload
> > > performance by controlling swap device usage on a per-process or
> > > per-cgroup basis. For example, one might want to direct critical
> > > processes to faster swap devices (like SSDs) while relegating
> > > less critical ones to slower devices (like HDDs or Network Swap).
> > >
> > > The initial approach was to introduce a per-cgroup swap priority
> > > mechanism [1]. However, through review and discussion, several
> > > drawbacks were identified:
> > >
> > > a. There is a lack of concrete use cases for assigning a fine-grained,
> > > unique swap priority to each cgroup.
> > > b. The implementation complexity was high relative to the desired
> > > level of control.
> > > c. Differing swap priorities between cgroups could lead to LRU
> > > inversion problems.
> > >
> > > To address these concerns, I propose the "swap tiers" concept,
> > > originally suggested by Chris Li [2] and further developed through
> > > collaborative discussions. I would like to thank Chris Li and
> > > He Baoquan for their invaluable contributions in refining this
> > > approach, and Kairui Song, Nhat Pham, and Michal Koutný for their
> > > insightful reviews of earlier RFC versions.
> >
> > I think the tiers concept is a nice abstraction. I'm also interested in how
> > the in-kernel control mechanism will deal with tiers management, which is not
> > always simple. I'll try to take time to read this series thoroughly. Thank
> > you for sharing this nice work!
>
> Hi SeongJae,
>
> Thank you for your feedback and interest in the swap tiers concept.
> I appreciate your willingness to review this series.
>
> > Nevertheless, I'm curious if there are simpler and more flexible ways to
> > achieve the goal (control of swap device to use). For example, extending
> > existing proactive pageout features, such as memory.reclaim, MADV_PAGEOUT or
> > DAMOS_PAGEOUT, to let users specify the swap device to use. Doing such an
> > extension for MADV_PAGEOUT may be challenging, but it might be doable for
> > memory.reclaim and DAMOS_PAGEOUT. Have you considered these kinds of options?
>
> Regarding your question about simpler approaches using memory.reclaim,
> MADV_PAGEOUT, or DAMOS_PAGEOUT with swap device specification - I've
> looked into this perspective after reading your comments. This approach
> would indeed be one way to enable per-process swap device selection
> from a broader standpoint.
>
> However, for our use case, per-process granularity feels too fine-grained,
> which is why we've been focusing more on the cgroup-based approach.

Thank you for kindly sharing your opinion. That all makes sense. Nonetheless,
I think the limitation applies only to MADV_PAGEOUT.

MADV_PAGEOUT would indeed be limited when applied at the cgroup level. In the
case of memory.reclaim and DAMOS_PAGEOUT, however, I think it can work at the
cgroup level, since memory.reclaim exists per cgroup, and DAMOS_PAGEOUT has
knobs for cgroup level controls, including cgroup based DAMOS filters and
per-node per-cgroup memory usage based DAMOS quota goal. Also, if needed for
swap tiers, extending DAMOS seems doable, from my perspective.
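To make the memory.reclaim case a bit more concrete, below is a minimal
sketch of how such an extension might look from user space. It only builds
and writes the request string. The byte count and the "swappiness=" key
follow the cgroup v2 memory.reclaim format on kernels that support it.
"swap_tier=" is a purely hypothetical key invented for this illustration,
which no kernel accepts today, and the helper names are made up as well.

```python
from pathlib import Path


def build_reclaim_request(nr_bytes, swappiness=None, swap_tier=None):
    """Build the string to write to a cgroup v2 memory.reclaim file."""
    if nr_bytes <= 0:
        raise ValueError("nr_bytes must be positive")
    parts = [str(nr_bytes)]
    if swappiness is not None:
        # Existing optional key on kernels that support it.
        parts.append(f"swappiness={swappiness}")
    if swap_tier is not None:
        # HYPOTHETICAL key, assumed only for this sketch. It is not part
        # of any kernel interface.
        parts.append(f"swap_tier={swap_tier}")
    return " ".join(parts)


def request_reclaim(cgroup_dir, nr_bytes, **options):
    """Write a reclaim request to <cgroup_dir>/memory.reclaim."""
    request = build_reclaim_request(nr_bytes, **options)
    (Path(cgroup_dir) / "memory.reclaim").write_text(request)
    return request
```

Since memory.reclaim is already a per-cgroup file, a call such as
request_reclaim("/sys/fs/cgroup/<group>", 64 << 20, swap_tier="fast") would
naturally be scoped to one cgroup, if such a key existed.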
>
> That said, if we were to aggressively consider the per-process approach
> as well in the future, I'm thinking about how we might integrate it with
> the tier concept (not just an individual swap device). During discussions
> with Chris Li, we also talked about potentially tying this to per-VMA
> control (see the discussion at
> https://lore.kernel.org/linux-mm/CACePvbW_Q6O2ppMG35gwj7OHCdbjja3qUCF1T7GFsm9VDr2e_g@mail.gmail.com/).
> This concept could go beyond just selection at the cgroup layer.

Sounds interesting. In the past, I thought extending DAMOS for vma level
control (e.g., asking some DAMOS actions to target only vmas of specific
names) could be useful. I have no real plan to do that at the moment due to
the absence of expected usage. But if that could be used for swap tiers, I
would be happy to help.


Thanks,
SJ

[...]