From: JP Kobryn <inwardvessel@gmail.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org,
linux-mm@kvack.org, cgroups@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 30 Apr 2025 16:43:41 -0700 [thread overview]
Message-ID: <e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com> (raw)
In-Reply-To: <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
On 4/22/25 6:33 AM, Yosry Ahmed wrote:
> On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote:
[..]
>> @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work)
>> struct cgroup_subsys_state *parent = css->parent;
>> int id = css->id;
>>
>> + if (ss->css_rstat_flush)
>> + css_rstat_exit(css);
>> +
>
> This call now exists in both branches (self css or not), so it's
> probably best to pull it outside. We should probably also pull the call
> in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end
> up with a single call to css_rstat_exit() (apart from failure paths).
This can be done if css_rstat_exit() is modified to guard against
invalid css's like being a subsystem css and not having implemented
css_rstat_flush.
>
> We can probably also use css_is_cgroup() here instead of 'if (ss)' for
> consistency.
>
>> ss->css_free(css);
>> cgroup_idr_remove(&ss->css_idr, id);
>> cgroup_put(cgrp);
> [..]
>> @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>> goto err_free_css;
>> css->id = err;
>>
>> + if (ss->css_rstat_flush) {
>> + err = css_rstat_init(css);
>> + if (err)
>> + goto err_free_css;
>> + }
>> +
>> /* @css is ready to be brought online now, make it visible */
>> list_add_tail_rcu(&css->sibling, &parent_css->children);
>> cgroup_idr_replace(&ss->css_idr, css, css->id);
>> @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>> err_list_del:
>> list_del_rcu(&css->sibling);
>> err_free_css:
>> - list_del_rcu(&css->rstat_css_node);
>> INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
>> queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
>> return ERR_PTR(err);
>> @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
>> css->flags |= CSS_NO_REF;
>>
>> if (early) {
>> - /* allocation can't be done safely during early init */
>> + /*
>> + * Allocation can't be done safely during early init.
>> + * Defer IDR and rstat allocations until cgroup_init().
>> + */
>> css->id = 1;
>> } else {
>> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
>> BUG_ON(css->id < 0);
>> +
>> + if (ss->css_rstat_flush)
>> + BUG_ON(css_rstat_init(css));
>> }
>>
>> /* Update the init_css_set to contain a subsys
>> @@ -6207,9 +6206,17 @@ int __init cgroup_init(void)
>> struct cgroup_subsys_state *css =
>> init_css_set.subsys[ss->id];
>>
>> + /*
>> + * It is now safe to perform allocations.
>> + * Finish setting up subsystems that previously
>> + * deferred IDR and rstat allocations.
>> + */
>> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
>> GFP_KERNEL);
>> BUG_ON(css->id < 0);
>> +
>> + if (ss->css_rstat_flush)
>> + BUG_ON(css_rstat_init(css));
>
> The calls to css_rstat_init() are really difficult to track. Let's
> recap, before this change we had two calls:
> - In cgroup_setup_root(), for root cgroups.
> - In cgroup_create(), for non-root cgroups.
>
> This patch adds 3 more, so we end up with 5 calls as follows:
> - In cgroup_setup_root(), for root self css's.
> - In cgroup_create(), for non-root self css's.
> - In cgroup_subsys_init(), for root subsys css's without early
> initialization.
> - In cgroup_init(), for root subsys css's with early
> initialization, as we cannot call it from cgroup_subsys_init() early
> as allocations are not allowed during early init.
> - In css_create(), for non-root non-self css's.
>
> We should try to consolidate as much as possible. For example:
> - Can we always make the call for root subsys css's in cgroup_init(),
> regardless of early initialization status? Is there a need to make the
> call early for subsystems that use early in cgroup_subsys_init()
> initialization?
>
> - Can we always make the call for root css's in cgroup_init(),
> regardless of whether the css is a self css or a subsys css? I imagine
> we'd still need two separate calls, one outside the loop for the self
> css's, and one in the loop for subsys css's, but having them in the
> same place should make things easier.
The answer might be the same for the two questions above. I think at
best, we can eliminate the single call below to css_rstat_init():
cgroup_init()
for_each_subsys(ss, ssid)
if (ss->early_init)
css_rstat_init(css) /* remove */
I'm not sure if it's by design and also an undocumented constraint but I
find that it is not possible to have a cgroup subsystem that is
designated for "early init" participate in rstat at the same time. The
necessary ordering of actions should be:
init_and_link_css(css, ss, ...)
css_rstat_init(css)
css_online(css)
This sequence occurs within cgroup_init_subsys() when ss->early_init is
false. However, when early_init is true, the last two calls are
effectively inverted:
css_online(css)
css_rstat_init(css) /* too late */
This needs to be avoided or else failures will occur during boot.
Note that even before this series, this constraint seems to have
existed. Using branch for-6.16 as a base, I experimented with a minimal
custom test cgroup in which I implement css_rstat_flush while early_init
is on. The system fails during boot because online_css() is called
before cgroup_rstat_init().
cgroup_init_early()
for_each_subsys(ss, ssid)
if (ss->early)
cgroup_init_subsys(ss, true)
css = ss->css_alloc()
online_css(css)
cgroup_init()
cgroup_setup_root()
cgroup_rstat_init(root_cgrp) /* too late */
Unless I"m missing something, do you think this constraint is worthy of
a separate patch? Something like this that prevents the combination of
rstat with early_init:
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void)
for_each_subsys(ss, i) {
- WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
+ WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id ||
+ (ss->early_init && ss->css_rstat_flush),
>
> Ideally if we can do both the above, we'd end up with 3 calling
> functions only:
> - cgroup_init() -> for all root css's.
> - cgroup_create() -> for non-root self css's.
> - css_create() -> for non-root subsys css's.
>
> Also, we should probably document all the different call paths for
> css_rstat_init() and css_rstat_exit() somewhere.
Will do.
next prev parent reply other threads:[~2025-04-30 23:43 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 1:10 [PATCH v4 0/5] cgroup: separate rstat trees JP Kobryn
2025-04-04 1:10 ` [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct JP Kobryn
2025-04-15 17:16 ` Michal Koutný
2025-04-22 12:13 ` Yosry Ahmed
2025-05-29 18:58 ` Ihor Solodrai
2025-05-29 19:11 ` Yonghong Song
2025-04-04 1:10 ` [PATCH v4 2/5] cgroup: add helper for checking when css is cgroup::self JP Kobryn
2025-04-22 12:19 ` Yosry Ahmed
[not found] ` <68078968.5d0a0220.2c3c35.bab3SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 16:59 ` JP Kobryn
2025-04-04 1:10 ` [PATCH v4 3/5] cgroup: change rstat function signatures from cgroup-based to css-based JP Kobryn
2025-04-04 20:00 ` Tejun Heo
2025-04-04 20:09 ` Tejun Heo
2025-04-04 21:21 ` JP Kobryn
2025-04-22 12:35 ` Yosry Ahmed
2025-04-22 12:39 ` Yosry Ahmed
[not found] ` <68078d3c.050a0220.3d37e.6d82SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 17:10 ` JP Kobryn
2025-04-04 1:10 ` [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-04-15 17:15 ` Michal Koutný
2025-04-16 21:43 ` JP Kobryn
2025-04-17 9:26 ` Michal Koutný
2025-04-17 19:05 ` JP Kobryn
2025-04-17 20:10 ` JP Kobryn
2025-04-21 18:18 ` JP Kobryn
2025-04-22 13:33 ` Yosry Ahmed
[not found] ` <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-30 23:43 ` JP Kobryn [this message]
2025-05-06 9:37 ` Yosry Ahmed
2025-04-04 1:10 ` [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-04-04 20:28 ` Tejun Heo
2025-04-11 3:31 ` JP Kobryn
2025-04-15 17:15 ` Michal Koutný
2025-04-15 19:30 ` Tejun Heo
2025-04-16 9:50 ` Michal Koutný
2025-04-16 18:10 ` JP Kobryn
2025-04-16 18:14 ` Tejun Heo
2025-04-16 18:01 ` JP Kobryn
2025-04-22 14:01 ` Yosry Ahmed
2025-04-24 17:25 ` Shakeel Butt
[not found] ` <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-25 0:18 ` JP Kobryn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com \
--to=inwardvessel@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-mm@kvack.org \
--cc=mkoutny@suse.com \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
--cc=yosry.ahmed@linux.dev \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).