linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: JP Kobryn <inwardvessel@gmail.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 30 Apr 2025 16:43:41 -0700	[thread overview]
Message-ID: <e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com> (raw)
In-Reply-To: <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>

On 4/22/25 6:33 AM, Yosry Ahmed wrote:
> On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote:
[..]
>> @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work)
>>   		struct cgroup_subsys_state *parent = css->parent;
>>   		int id = css->id;
>>   
>> +		if (ss->css_rstat_flush)
>> +			css_rstat_exit(css);
>> +
> 
> This call now exists in both branches (self css or not), so it's
> probably best to pull it outside. We should probably also pull the call
> in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end
> up with a single call to css_rstat_exit() (apart from failure paths).

This can be done if css_rstat_exit() is modified to guard against
invalid css's like being a subsystem css and not having implemented
css_rstat_flush.

> 
> We can probably also use css_is_cgroup() here instead of 'if (ss)' for
> consistency.
> 
>>   		ss->css_free(css);
>>   		cgroup_idr_remove(&ss->css_idr, id);
>>   		cgroup_put(cgrp);
> [..]
>> @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>>   		goto err_free_css;
>>   	css->id = err;
>>   
>> +	if (ss->css_rstat_flush) {
>> +		err = css_rstat_init(css);
>> +		if (err)
>> +			goto err_free_css;
>> +	}
>> +
>>   	/* @css is ready to be brought online now, make it visible */
>>   	list_add_tail_rcu(&css->sibling, &parent_css->children);
>>   	cgroup_idr_replace(&ss->css_idr, css, css->id);
>> @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
>>   err_list_del:
>>   	list_del_rcu(&css->sibling);
>>   err_free_css:
>> -	list_del_rcu(&css->rstat_css_node);
>>   	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
>>   	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
>>   	return ERR_PTR(err);
>> @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
>>   	css->flags |= CSS_NO_REF;
>>   
>>   	if (early) {
>> -		/* allocation can't be done safely during early init */
>> +		/*
>> +		 * Allocation can't be done safely during early init.
>> +		 * Defer IDR and rstat allocations until cgroup_init().
>> +		 */
>>   		css->id = 1;
>>   	} else {
>>   		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
>>   		BUG_ON(css->id < 0);
>> +
>> +		if (ss->css_rstat_flush)
>> +			BUG_ON(css_rstat_init(css));
>>   	}
>>   
>>   	/* Update the init_css_set to contain a subsys
>> @@ -6207,9 +6206,17 @@ int __init cgroup_init(void)
>>   			struct cgroup_subsys_state *css =
>>   				init_css_set.subsys[ss->id];
>>   
>> +			/*
>> +			 * It is now safe to perform allocations.
>> +			 * Finish setting up subsystems that previously
>> +			 * deferred IDR and rstat allocations.
>> +			 */
>>   			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
>>   						   GFP_KERNEL);
>>   			BUG_ON(css->id < 0);
>> +
>> +			if (ss->css_rstat_flush)
>> +				BUG_ON(css_rstat_init(css));
> 
> The calls to css_rstat_init() are really difficult to track. Let's
> recap, before this change we had two calls:
> - In cgroup_setup_root(), for root cgroups.
> - In cgroup_create(), for non-root cgroups.
> 
> This patch adds 3 more, so we end up with 5 calls as follows:
> - In cgroup_setup_root(), for root self css's.
> - In cgroup_create(), for non-root self css's.
> - In cgroup_subsys_init(), for root subsys css's without early
>    initialization.
> - In cgroup_init(), for root subsys css's with early
>    initialization, as we cannot call it from cgroup_subsys_init() early
>    as allocations are not allowed during early init.
> - In css_create(), for non-root non-self css's.
> 
> We should try to consolidate as much as possible. For example:
> - Can we always make the call for root subsys css's in cgroup_init(),
>    regardless of early initialization status? Is there a need to make the
>    call early for subsystems that use early in cgroup_subsys_init()
>    initialization?
> 
> - Can we always make the call for root css's in cgroup_init(),
>    regardless of whether the css is a self css or a subsys css? I imagine
>    we'd still need two separate calls, one outside the loop for the self
>    css's, and one in the loop for subsys css's, but having them in the
>    same place should make things easier.

The answer might be the same for the two questions above. I think at
best, we can eliminate the single call below to css_rstat_init():

cgroup_init()
	for_each_subsys(ss, ssid)
		if (ss->early_init)
			css_rstat_init(css) /* remove */

I'm not sure if it's by design and also an undocumented constraint but I
find that it is not possible to have a cgroup subsystem that is
designated for "early init" participate in rstat at the same time. The
necessary ordering of actions should be:

init_and_link_css(css, ss, ...)
css_rstat_init(css)
css_online(css)

This sequence occurs within cgroup_init_subsys() when ss->early_init is
false. However, when early_init is true, the last two calls are
effectively inverted:

css_online(css)
css_rstat_init(css) /* too late */

This needs to be avoided or else failures will occur during boot.

Note that even before this series, this constraint seems to have
existed. Using branch for-6.16 as a base, I experimented with a minimal
custom test cgroup in which I implement css_rstat_flush while early_init
is on. The system fails during boot because online_css() is called
before cgroup_rstat_init().

cgroup_init_early()
	for_each_subsys(ss, ssid)
		if (ss->early)
			cgroup_init_subsys(ss, true)
				css = ss->css_alloc()
				online_css(css)
cgroup_init()
	cgroup_setup_root()
		cgroup_rstat_init(root_cgrp) /* too late */

Unless I"m missing something, do you think this constraint is worthy of
a separate patch? Something like this that prevents the combination of
rstat with early_init:

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void)
     for_each_subsys(ss, i) {
-       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
+       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id ||
+           (ss->early_init && ss->css_rstat_flush),

> 
> Ideally if we can do both the above, we'd end up with 3 calling
> functions only:
> - cgroup_init() -> for all root css's.
> - cgroup_create() -> for non-root self css's.
> - css_create() -> for non-root subsys css's.
> 
> Also, we should probably document all the different call paths for
> css_rstat_init() and css_rstat_exit() somewhere.

Will do.


  parent reply	other threads:[~2025-04-30 23:43 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04  1:10 [PATCH v4 0/5] cgroup: separate rstat trees JP Kobryn
2025-04-04  1:10 ` [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct JP Kobryn
2025-04-15 17:16   ` Michal Koutný
2025-04-22 12:13   ` Yosry Ahmed
2025-05-29 18:58   ` Ihor Solodrai
2025-05-29 19:11     ` Yonghong Song
2025-04-04  1:10 ` [PATCH v4 2/5] cgroup: add helper for checking when css is cgroup::self JP Kobryn
2025-04-22 12:19   ` Yosry Ahmed
     [not found]   ` <68078968.5d0a0220.2c3c35.bab3SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 16:59     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 3/5] cgroup: change rstat function signatures from cgroup-based to css-based JP Kobryn
2025-04-04 20:00   ` Tejun Heo
2025-04-04 20:09     ` Tejun Heo
2025-04-04 21:21       ` JP Kobryn
2025-04-22 12:35   ` Yosry Ahmed
2025-04-22 12:39   ` Yosry Ahmed
     [not found]   ` <68078d3c.050a0220.3d37e.6d82SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 17:10     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-16 21:43     ` JP Kobryn
2025-04-17  9:26       ` Michal Koutný
2025-04-17 19:05         ` JP Kobryn
2025-04-17 20:10           ` JP Kobryn
2025-04-21 18:18   ` JP Kobryn
2025-04-22 13:33   ` Yosry Ahmed
     [not found]   ` <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-30 23:43     ` JP Kobryn [this message]
2025-05-06  9:37       ` Yosry Ahmed
2025-04-04  1:10 ` [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-04-04 20:28   ` Tejun Heo
2025-04-11  3:31     ` JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-15 19:30     ` Tejun Heo
2025-04-16  9:50       ` Michal Koutný
2025-04-16 18:10         ` JP Kobryn
2025-04-16 18:14           ` Tejun Heo
2025-04-16 18:01     ` JP Kobryn
2025-04-22 14:01   ` Yosry Ahmed
2025-04-24 17:25     ` Shakeel Butt
     [not found]   ` <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-25  0:18     ` JP Kobryn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com \
    --to=inwardvessel@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    --cc=yosry.ahmed@linux.dev \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).