All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com,
	hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem
Date: Tue, 6 May 2025 09:37:42 +0000	[thread overview]
Message-ID: <aBnYZqJtZLDMKQYU@google.com> (raw)
In-Reply-To: <e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com>

On Wed, Apr 30, 2025 at 04:43:41PM -0700, JP Kobryn wrote:
> On 4/22/25 6:33 AM, Yosry Ahmed wrote:
> > On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote:
> [..]
> > > @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work)
> > >   		struct cgroup_subsys_state *parent = css->parent;
> > >   		int id = css->id;
> > > +		if (ss->css_rstat_flush)
> > > +			css_rstat_exit(css);
> > > +
> > 
> > This call now exists in both branches (self css or not), so it's
> > probably best to pull it outside. We should probably also pull the call
> > in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end
> > up with a single call to css_rstat_exit() (apart from failure paths).
> 
> This can be done if css_rstat_exit() is modified to guard against
> invalid css's like being a subsystem css and not having implemented
> css_rstat_flush.
> 
> > 
> > We can probably also use css_is_cgroup() here instead of 'if (ss)' for
> > consistency.
> > 
> > >   		ss->css_free(css);
> > >   		cgroup_idr_remove(&ss->css_idr, id);
> > >   		cgroup_put(cgrp);
> > [..]
> > > @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> > >   		goto err_free_css;
> > >   	css->id = err;
> > > +	if (ss->css_rstat_flush) {
> > > +		err = css_rstat_init(css);
> > > +		if (err)
> > > +			goto err_free_css;
> > > +	}
> > > +
> > >   	/* @css is ready to be brought online now, make it visible */
> > >   	list_add_tail_rcu(&css->sibling, &parent_css->children);
> > >   	cgroup_idr_replace(&ss->css_idr, css, css->id);
> > > @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> > >   err_list_del:
> > >   	list_del_rcu(&css->sibling);
> > >   err_free_css:
> > > -	list_del_rcu(&css->rstat_css_node);
> > >   	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
> > >   	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
> > >   	return ERR_PTR(err);
> > > @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
> > >   	css->flags |= CSS_NO_REF;
> > >   	if (early) {
> > > -		/* allocation can't be done safely during early init */
> > > +		/*
> > > +		 * Allocation can't be done safely during early init.
> > > +		 * Defer IDR and rstat allocations until cgroup_init().
> > > +		 */
> > >   		css->id = 1;
> > >   	} else {
> > >   		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
> > >   		BUG_ON(css->id < 0);
> > > +
> > > +		if (ss->css_rstat_flush)
> > > +			BUG_ON(css_rstat_init(css));
> > >   	}
> > >   	/* Update the init_css_set to contain a subsys
> > > @@ -6207,9 +6206,17 @@ int __init cgroup_init(void)
> > >   			struct cgroup_subsys_state *css =
> > >   				init_css_set.subsys[ss->id];
> > > +			/*
> > > +			 * It is now safe to perform allocations.
> > > +			 * Finish setting up subsystems that previously
> > > +			 * deferred IDR and rstat allocations.
> > > +			 */
> > >   			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
> > >   						   GFP_KERNEL);
> > >   			BUG_ON(css->id < 0);
> > > +
> > > +			if (ss->css_rstat_flush)
> > > +				BUG_ON(css_rstat_init(css));
> > 
> > The calls to css_rstat_init() are really difficult to track. Let's
> > recap, before this change we had two calls:
> > - In cgroup_setup_root(), for root cgroups.
> > - In cgroup_create(), for non-root cgroups.
> > 
> > This patch adds 3 more, so we end up with 5 calls as follows:
> > - In cgroup_setup_root(), for root self css's.
> > - In cgroup_create(), for non-root self css's.
> > - In cgroup_subsys_init(), for root subsys css's without early
> >    initialization.
> > - In cgroup_init(), for root subsys css's with early
> >    initialization, as we cannot call it from cgroup_subsys_init() early
> >    as allocations are not allowed during early init.
> > - In css_create(), for non-root non-self css's.
> > 
> > We should try to consolidate as much as possible. For example:
> > - Can we always make the call for root subsys css's in cgroup_init(),
> >    regardless of early initialization status? Is there a need to make the
> >    call early for subsystems that use early in cgroup_subsys_init()
> >    initialization?
> > 
> > - Can we always make the call for root css's in cgroup_init(),
> >    regardless of whether the css is a self css or a subsys css? I imagine
> >    we'd still need two separate calls, one outside the loop for the self
> >    css's, and one in the loop for subsys css's, but having them in the
> >    same place should make things easier.
> 
> The answer might be the same for the two questions above. I think at
> best, we can eliminate the single call below to css_rstat_init():
> 
> cgroup_init()
> 	for_each_subsys(ss, ssid)
> 		if (ss->early_init)
> 			css_rstat_init(css) /* remove */
> 
> I'm not sure if it's by design and also an undocumented constraint but I
> find that it is not possible to have a cgroup subsystem that is
> designated for "early init" participate in rstat at the same time. The
> necessary ordering of actions should be:
> 
> init_and_link_css(css, ss, ...)
> css_rstat_init(css)
> css_online(css)
> 
> This sequence occurs within cgroup_init_subsys() when ss->early_init is
> false. However, when early_init is true, the last two calls are
> effectively inverted:
> 
> css_online(css)
> css_rstat_init(css) /* too late */
> 
> This needs to be avoided or else failures will occur during boot.
> 
> Note that even before this series, this constraint seems to have
> existed. Using branch for-6.16 as a base, I experimented with a minimal
> custom test cgroup in which I implement css_rstat_flush while early_init
> is on. The system fails during boot because online_css() is called
> before cgroup_rstat_init().
> 
> cgroup_init_early()
> 	for_each_subsys(ss, ssid)
> 		if (ss->early)
> 			cgroup_init_subsys(ss, true)
> 				css = ss->css_alloc()
> 				online_css(css)
> cgroup_init()
> 	cgroup_setup_root()
> 		cgroup_rstat_init(root_cgrp) /* too late */

I am looking at online_css() and cgroup_rstat_init() and I cannot
immediately see the ordering requirement.

Is the idea that once a css is online an rstat flush can occur, and it
would crash if cgroup_rstat_init() hadn't been called yet? I am
wondering when would the flush occur before cgroup_init() is called.

Anyway, if that is possible I think enforcing this is probably
important.

> 
> Unless I"m missing something, do you think this constraint is worthy of
> a separate patch? Something like this that prevents the combination of
> rstat with early_init:
> 
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void)
>     for_each_subsys(ss, i) {
> -       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
> +       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id ||
> +           (ss->early_init && ss->css_rstat_flush),
> 
> > 
> > Ideally if we can do both the above, we'd end up with 3 calling
> > functions only:
> > - cgroup_init() -> for all root css's.
> > - cgroup_create() -> for non-root self css's.
> > - css_create() -> for non-root subsys css's.
> > 
> > Also, we should probably document all the different call paths for
> > css_rstat_init() and css_rstat_exit() somewhere.
> 
> Will do.

  reply	other threads:[~2025-05-06  9:37 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04  1:10 [PATCH v4 0/5] cgroup: separate rstat trees JP Kobryn
2025-04-04  1:10 ` [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct JP Kobryn
2025-04-15 17:16   ` Michal Koutný
2025-04-22 12:13   ` Yosry Ahmed
2025-05-29 18:58   ` Ihor Solodrai
2025-05-29 19:11     ` Yonghong Song
2025-04-04  1:10 ` [PATCH v4 2/5] cgroup: add helper for checking when css is cgroup::self JP Kobryn
2025-04-22 12:19   ` Yosry Ahmed
     [not found]   ` <68078968.5d0a0220.2c3c35.bab3SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 16:59     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 3/5] cgroup: change rstat function signatures from cgroup-based to css-based JP Kobryn
2025-04-04 20:00   ` Tejun Heo
2025-04-04 20:09     ` Tejun Heo
2025-04-04 21:21       ` JP Kobryn
2025-04-22 12:35   ` Yosry Ahmed
2025-04-22 12:39   ` Yosry Ahmed
     [not found]   ` <68078d3c.050a0220.3d37e.6d82SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 17:10     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-16 21:43     ` JP Kobryn
2025-04-17  9:26       ` Michal Koutný
2025-04-17 19:05         ` JP Kobryn
2025-04-17 20:10           ` JP Kobryn
2025-04-21 18:18   ` JP Kobryn
2025-04-22 13:33   ` Yosry Ahmed
     [not found]   ` <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-30 23:43     ` JP Kobryn
2025-05-06  9:37       ` Yosry Ahmed [this message]
2025-04-04  1:10 ` [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-04-04 20:28   ` Tejun Heo
2025-04-11  3:31     ` JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-15 19:30     ` Tejun Heo
2025-04-16  9:50       ` Michal Koutný
2025-04-16 18:10         ` JP Kobryn
2025-04-16 18:14           ` Tejun Heo
2025-04-16 18:01     ` JP Kobryn
2025-04-22 14:01   ` Yosry Ahmed
2025-04-24 17:25     ` Shakeel Butt
     [not found]   ` <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-25  0:18     ` JP Kobryn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBnYZqJtZLDMKQYU@google.com \
    --to=yosry.ahmed@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=inwardvessel@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.