All of lore.kernel.org
 help / color / mirror / Atom feed
From: JP Kobryn <inwardvessel@gmail.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 30 Apr 2025 16:43:41 -0700	[thread overview]
Message-ID: <e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com> (raw)
In-Reply-To: <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>

On 4/22/25 6:33 AM, Yosry Ahmed wrote:
> On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote:
[..]
>> @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work)
>>   		struct cgroup_subsys_state *parent = css->parent;
>>   		int id = css->id;
>>   
>> +		if (ss->css_rstat_flush)
>> +			css_rstat_exit(css);
>> +
> 
> This call now exists in both branches (self css or not), so it's
> probably best to pull it outside. We should probably also pull the call
> in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end
> up with a single call to css_rstat_exit() (apart from failure paths).

This can be done if css_rstat_exit() is modified to guard against
invalid css's like being a subsystem css and not having implemented
css_rstat_flush.

> 
> We can probably also use css_is_cgroup() here instead of 'if (ss)' for
> consistency.
> 
>>   		ss->css_free(css);
>>   		cgroup_idr_remove(&ss->css_idr, id);
>>   		cgroup_put(cgrp);
> [..]
>> @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>>   		goto err_free_css;
>>   	css->id = err;
>>   
>> +	if (ss->css_rstat_flush) {
>> +		err = css_rstat_init(css);
>> +		if (err)
>> +			goto err_free_css;
>> +	}
>> +
>>   	/* @css is ready to be brought online now, make it visible */
>>   	list_add_tail_rcu(&css->sibling, &parent_css->children);
>>   	cgroup_idr_replace(&ss->css_idr, css, css->id);
>> @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>>   err_list_del:
>>   	list_del_rcu(&css->sibling);
>>   err_free_css:
>> -	list_del_rcu(&css->rstat_css_node);
>>   	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
>>   	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
>>   	return ERR_PTR(err);
>> @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
>>   	css->flags |= CSS_NO_REF;
>>   
>>   	if (early) {
>> -		/* allocation can't be done safely during early init */
>> +		/*
>> +		 * Allocation can't be done safely during early init.
>> +		 * Defer IDR and rstat allocations until cgroup_init().
>> +		 */
>>   		css->id = 1;
>>   	} else {
>>   		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
>>   		BUG_ON(css->id < 0);
>> +
>> +		if (ss->css_rstat_flush)
>> +			BUG_ON(css_rstat_init(css));
>>   	}
>>   
>>   	/* Update the init_css_set to contain a subsys
>> @@ -6207,9 +6206,17 @@ int __init cgroup_init(void)
>>   			struct cgroup_subsys_state *css =
>>   				init_css_set.subsys[ss->id];
>>   
>> +			/*
>> +			 * It is now safe to perform allocations.
>> +			 * Finish setting up subsystems that previously
>> +			 * deferred IDR and rstat allocations.
>> +			 */
>>   			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
>>   						   GFP_KERNEL);
>>   			BUG_ON(css->id < 0);
>> +
>> +			if (ss->css_rstat_flush)
>> +				BUG_ON(css_rstat_init(css));
> 
> The calls to css_rstat_init() are really difficult to track. Let's
> recap, before this change we had two calls:
> - In cgroup_setup_root(), for root cgroups.
> - In cgroup_create(), for non-root cgroups.
> 
> This patch adds 3 more, so we end up with 5 calls as follows:
> - In cgroup_setup_root(), for root self css's.
> - In cgroup_create(), for non-root self css's.
> - In cgroup_subsys_init(), for root subsys css's without early
>    initialization.
> - In cgroup_init(), for root subsys css's with early
>    initialization, as we cannot call it from cgroup_subsys_init() early
>    as allocations are not allowed during early init.
> - In css_create(), for non-root non-self css's.
> 
> We should try to consolidate as much as possible. For example:
> - Can we always make the call for root subsys css's in cgroup_init(),
>    regardless of early initialization status? Is there a need to make the
>    call early for subsystems that use early in cgroup_subsys_init()
>    initialization?
> 
> - Can we always make the call for root css's in cgroup_init(),
>    regardless of whether the css is a self css or a subsys css? I imagine
>    we'd still need two separate calls, one outside the loop for the self
>    css's, and one in the loop for subsys css's, but having them in the
>    same place should make things easier.

The answer might be the same for the two questions above. I think at
best, we can eliminate the single call below to css_rstat_init():

cgroup_init()
	for_each_subsys(ss, ssid)
		if (ss->early_init)
			css_rstat_init(css) /* remove */

I'm not sure if it's by design and also an undocumented constraint but I
find that it is not possible to have a cgroup subsystem that is
designated for "early init" participate in rstat at the same time. The
necessary ordering of actions should be:

init_and_link_css(css, ss, ...)
css_rstat_init(css)
css_online(css)

This sequence occurs within cgroup_init_subsys() when ss->early_init is
false. However, when early_init is true, the last two calls are
effectively inverted:

css_online(css)
css_rstat_init(css) /* too late */

This needs to be avoided or else failures will occur during boot.

Note that even before this series, this constraint seems to have
existed. Using branch for-6.16 as a base, I experimented with a minimal
custom test cgroup in which I implement css_rstat_flush while early_init
is on. The system fails during boot because online_css() is called
before cgroup_rstat_init().

cgroup_init_early()
	for_each_subsys(ss, ssid)
		if (ss->early)
			cgroup_init_subsys(ss, true)
				css = ss->css_alloc()
				online_css(css)
cgroup_init()
	cgroup_setup_root()
		cgroup_rstat_init(root_cgrp) /* too late */

Unless I"m missing something, do you think this constraint is worthy of
a separate patch? Something like this that prevents the combination of
rstat with early_init:

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void)
     for_each_subsys(ss, i) {
-       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
+       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id ||
+           (ss->early_init && ss->css_rstat_flush),

> 
> Ideally if we can do both the above, we'd end up with 3 calling
> functions only:
> - cgroup_init() -> for all root css's.
> - cgroup_create() -> for non-root self css's.
> - css_create() -> for non-root subsys css's.
> 
> Also, we should probably document all the different call paths for
> css_rstat_init() and css_rstat_exit() somewhere.

Will do.

  parent reply	other threads:[~2025-04-30 23:43 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04  1:10 [PATCH v4 0/5] cgroup: separate rstat trees JP Kobryn
2025-04-04  1:10 ` [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct JP Kobryn
2025-04-15 17:16   ` Michal Koutný
2025-04-22 12:13   ` Yosry Ahmed
2025-05-29 18:58   ` Ihor Solodrai
2025-05-29 19:11     ` Yonghong Song
2025-04-04  1:10 ` [PATCH v4 2/5] cgroup: add helper for checking when css is cgroup::self JP Kobryn
2025-04-22 12:19   ` Yosry Ahmed
     [not found]   ` <68078968.5d0a0220.2c3c35.bab3SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 16:59     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 3/5] cgroup: change rstat function signatures from cgroup-based to css-based JP Kobryn
2025-04-04 20:00   ` Tejun Heo
2025-04-04 20:09     ` Tejun Heo
2025-04-04 21:21       ` JP Kobryn
2025-04-22 12:35   ` Yosry Ahmed
2025-04-22 12:39   ` Yosry Ahmed
     [not found]   ` <68078d3c.050a0220.3d37e.6d82SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 17:10     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-16 21:43     ` JP Kobryn
2025-04-17  9:26       ` Michal Koutný
2025-04-17 19:05         ` JP Kobryn
2025-04-17 20:10           ` JP Kobryn
2025-04-21 18:18   ` JP Kobryn
2025-04-22 13:33   ` Yosry Ahmed
     [not found]   ` <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-30 23:43     ` JP Kobryn [this message]
2025-05-06  9:37       ` Yosry Ahmed
2025-04-04  1:10 ` [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-04-04 20:28   ` Tejun Heo
2025-04-11  3:31     ` JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-15 19:30     ` Tejun Heo
2025-04-16  9:50       ` Michal Koutný
2025-04-16 18:10         ` JP Kobryn
2025-04-16 18:14           ` Tejun Heo
2025-04-16 18:01     ` JP Kobryn
2025-04-22 14:01   ` Yosry Ahmed
2025-04-24 17:25     ` Shakeel Butt
     [not found]   ` <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-25  0:18     ` JP Kobryn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com \
    --to=inwardvessel@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    --cc=yosry.ahmed@linux.dev \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.