From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Youngjun Park <her0gyugyu@gmail.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
akpm@linux-foundation.org, chrisl@kernel.org,
youngjun.park@lge.com, linux-mm@kvack.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, muchun.song@linux.dev,
shikemeng@huaweicloud.com, nphamcs@gmail.com,
baoquan.he@linux.dev, baohua@kernel.org, gunho.lee@lge.com,
taejoon.song@lge.com, hyungjun.cho@lge.com, mkoutny@suse.com,
baver.bae@lge.com, matia.kim@lge.com
Subject: Re: [PATCH v9 3/6] mm: memcontrol: add interface for swap tier selection
Date: Tue, 23 Jun 2026 13:20:10 -0700
Message-ID: <20260623202012.2446676-1-joshua.hahnjy@gmail.com>
In-Reply-To: <CAO9r8zPwHYj284gyyjqnH6Z-NNLLbftKqzoOKycaMzm3+ifSdA@mail.gmail.com>

On Tue, 23 Jun 2026 13:06:10 -0700 Yosry Ahmed <yosry@kernel.org> wrote:
> On Tue, Jun 23, 2026 at 11:56 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > On Tue, 23 Jun 2026 11:10:32 -0700 Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > > To get back to the question of how the auto-tuning should work, the
> > > > main question is to which ratio we scale the swap limits to.
> > > > Do we set the swap limits proportional to how much swap is present
> > > > in the system, or how much swap is available to the cgroup?
> > > >
> > > > So if we have 3 swap tiers A, B, C, with 50G, 30G, and 20G capacity
> > > > respectively, how much should a cgroup with swap.max = 10G have if
> > > > it is limited to tiers A and B?
> > > >
> > > > This is what I was getting at earlier when I said we have to calculate
> > > > different ratios for different cgroups, based on what tiers they have
> > > > access to.
> > >
> > > That's a good question. I think the case that is particularly
> > > interesting is whether or not the limits of other tiers should change
> > > when another tier is disabled/enabled.
> > >
> > > So basically in your example, assuming everything starts as "max",
> > > when swap.max is set to 10G, the autoscaled limits would be: (tier A,
> > > 5G), (tier B, 3G), (tier C, 2G). Now the question becomes, if
> > > userspace sets the limit of tier C to 0, should the limits for tiers A
> > > and B change?
> > >
> > > On one hand, it's simpler to just keep the autoscaled limits unchanged
> > > in this case. However, this means that the effective swap limit is now
> > > 8G, which is not great :/
> > >
> > > The alternative is to recalculate all the limits when one of them
> > > changes, in which case the limits of A and B would change to 6.25G and
> > > 3.75G. But I don't know if this will work well if we allow custom
> > > limits. What happens if the limit of tier C is written as 1 (or 4096)
> > > instead of 0? It's effectively the same scenario, but the tier is
> > > technically allowed.
> >
> > I think the one problem with this is that it becomes quite easy to
> > accidentally overcommit. As a toy example, if you have 10 workloads and
> > 100G swap (as in the example I gave above), intuitively setting
> > swap.max = 10G for all 10 workloads shouldn't ever cause any contention
> > on capacity. But if you start excluding some tiers from some workloads,
> > you actually get overcommitting on the tiers that can service the
> > most workloads.
> >
> > I am not sure how concerning swap overcommit was, but at least in the
> > memory tiering scenario accidental overcommitting of toptier memory
> > seemed bad enough that I wanted to avoid the problem entirely.
> >
> > > The more I think about it, the more I realize it may be best to drop
> > > the autoscaling thing. I imagine memory tiering might run into similar
> > > issues too :/
> >
> > And that's why I didn't include opt-in/opt-out for any of the tiers;
> > if you have system-wide ratios, there's no need to change the ratios
> > at all, and as long as the sum of your memory.limit for each workload
> > is under the total capacity, all tiers will also not be overcommitted.
>
> I think eventually there may be use cases to opt some memcgs out for
> some memory tiers. For example, limit sensitive workloads to the top
> tier (or vice versa).

Yup, that makes sense to me too.

One of the things that concerned me a bit about my model for tiered
memcg limits was that system-critical processes would also be
susceptible to being demoted and churned, when we would much rather
keep those protected in the toptier.

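To make the trade-off with opt-outs concrete, here is a rough sketch that
reuses the 50G/30G/20G swap example quoted above. It assumes that per-tier
limits are rescaled over only the tiers a cgroup is allowed to use; the
helper names (per_tier_limits, committed) and the dict representation are
made up for illustration and are not part of any posted patch.

```python
GIB = 1 << 30

def per_tier_limits(swap_max, capacity, allowed):
    """Hypothetical helper: split swap_max over the allowed tiers,
    proportional to capacity. Tiers outside `allowed` get 0."""
    total = sum(capacity[t] for t in allowed)
    return {t: (swap_max * capacity[t] // total if t in allowed else 0)
            for t in capacity}

def committed(capacity, workloads):
    """Sum the per-tier limits over all workloads."""
    totals = dict.fromkeys(capacity, 0)
    for swap_max, allowed in workloads:
        for tier, limit in per_tier_limits(swap_max, capacity, allowed).items():
            totals[tier] += limit
    return totals

capacity = {"A": 50 * GIB, "B": 30 * GIB, "C": 20 * GIB}

# Ten workloads with swap.max = 10G, all tiers allowed.
everyone = [(10 * GIB, {"A", "B", "C"})] * 10
# Same, but five of the workloads are excluded from tier C.
mixed = [(10 * GIB, {"A", "B", "C"})] * 5 + [(10 * GIB, {"A", "B"})] * 5

no_optout = committed(capacity, everyone)
with_optout = committed(capacity, mixed)
overcommitted = sorted(t for t in capacity if with_optout[t] > capacity[t])
```

Under these assumptions, the all-tiers case commits exactly each tier's
capacity, while the mixed case commits 56.25G on tier A and 33.75G on
tier B even though the swap.max values still sum to 100G.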
> > Now, all of these complications aside, I think we might be overthinking
> > a bit here : -) The auto-scaling should just provide some sort of
> > "reasonable" default, the users can always override the per-tier
> > limits if they are unhappy with the autoscaled values.
>
> I agree, but it seems like both options are not ideal here. I think it
> might make more sense to not present a default value at all, have
> "max" be the default for all the tiers, even if memory.max or swap.max
> isn't. Userspace can set the limits if they need to. Autoscaling the
> limits in userspace should be easy.

I like this idea a lot. That would basically make swap tiers a no-op
unless you opt into setting the limits yourself, so we don't run the
risk of accidentally enabling tiers.

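As a minimal sketch of what autoscaling in userspace could look like,
assuming a management daemon already knows each tier's capacity: the
function below only computes the per-tier values. The name autoscale and
its arguments are hypothetical, and writing the results to the cgroup is
left out because the per-tier interface files are not spelled out in this
thread.

```python
GIB = 1 << 30

def autoscale(swap_max, tier_capacity, allowed=None):
    """Split swap_max across tiers in proportion to tier capacity.

    tier_capacity: dict mapping tier name -> capacity in bytes.
    allowed: set of tiers the cgroup may use; None means all tiers.
    Tiers outside `allowed` get a limit of 0.
    """
    if allowed is None:
        allowed = set(tier_capacity)
    total = sum(cap for name, cap in tier_capacity.items() if name in allowed)
    limits = {}
    for name, cap in tier_capacity.items():
        if name in allowed and total > 0:
            # Integer math keeps the result in whole bytes (rounds down).
            limits[name] = swap_max * cap // total
        else:
            limits[name] = 0
    return limits

tiers = {"A": 50 * GIB, "B": 30 * GIB, "C": 20 * GIB}

# Ratio taken over all swap in the system.
system_wide = autoscale(10 * GIB, tiers)
# Ratio taken over only the tiers this cgroup can use.
restricted = autoscale(10 * GIB, tiers, allowed={"A", "B"})
```

With the numbers from earlier in the thread, system_wide comes out as
5G/3G/2G, and restricted as 6.25G/3.75G/0. Which ratio to use is a
policy choice that stays entirely in userspace in this model.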
On that note, maybe it makes sense for me to change my memory tiering
series to also not present a default setting for tiered limits, and
instead just set them to max until the user comes and configures them?

I think this is a better question for the memcg maintainers, who might
have more to say on this. Johannes, Michal, Roman, and Shakeel, what do
you guys think? Would an approach that just makes the memory tier
limits writable from the get-go, without exposing any defaults, make
sense to you?

I think that would simplify the code quite a bit and also help mitigate
the possible side effects on system-critical workloads.

Thanks!
Joshua