[PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create()


* [PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create()
@ 2025-05-14  4:43 Tejun Heo
  2025-05-14  4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-05-14  4:43 UTC
  To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

cgroup_bpf init and exit handling will be moved to a notifier chain. In
preparation, reorganize cgroup_create() a bit so that the new cgroup is fully
initialized before any outside changes are made.

- cgrp->ancestors[] initialization and the hierarchical nr_descendants and
  nr_frozen_descendants updates were in the same loop. Separate them out and
  do the former earlier and do the latter later.

- Relocate cgroup_bpf_inherit() call so that it's after all cgroup
  initializations are complete.

No visible behavior changes expected.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup.c |   48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5684,7 +5684,7 @@ static struct cgroup *cgroup_create(stru
 	struct cgroup_root *root = parent->root;
 	struct cgroup *cgrp, *tcgrp;
 	struct kernfs_node *kn;
-	int level = parent->level + 1;
+	int i, level = parent->level + 1;
 	int ret;
 
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
@@ -5720,11 +5720,8 @@ static struct cgroup *cgroup_create(stru
 	if (ret)
 		goto out_kernfs_remove;
 
-	if (cgrp->root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(cgrp);
-		if (ret)
-			goto out_psi_free;
-	}
+	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
+		cgrp->ancestors[tcgrp->level] = tcgrp;
 
 	/*
 	 * New cgroup inherits effective freeze counter, and
@@ -5742,24 +5739,6 @@ static struct cgroup *cgroup_create(stru
 		set_bit(CGRP_FROZEN, &cgrp->flags);
 	}
 
-	spin_lock_irq(&css_set_lock);
-	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
-		cgrp->ancestors[tcgrp->level] = tcgrp;
-
-		if (tcgrp != cgrp) {
-			tcgrp->nr_descendants++;
-
-			/*
-			 * If the new cgroup is frozen, all ancestor cgroups
-			 * get a new frozen descendant, but their state can't
-			 * change because of this.
-			 */
-			if (cgrp->freezer.e_freeze)
-				tcgrp->freezer.nr_frozen_descendants++;
-		}
-	}
-	spin_unlock_irq(&css_set_lock);
-
 	if (notify_on_release(parent))
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
 
@@ -5768,7 +5747,28 @@ static struct cgroup *cgroup_create(stru
 
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
+	if (cgrp->root == &cgrp_dfl_root) {
+		ret = cgroup_bpf_inherit(cgrp);
+		if (ret)
+			goto out_psi_free;
+	}
+
 	/* allocation complete, commit to creation */
+	spin_lock_irq(&css_set_lock);
+	for (i = 0; i < level; i++) {
+		tcgrp = cgrp->ancestors[i];
+		tcgrp->nr_descendants++;
+
+		/*
+		 * If the new cgroup is frozen, all ancestor cgroups get a new
+		 * frozen descendant, but their state can't change because of
+		 * this.
+		 */
+		if (cgrp->freezer.e_freeze)
+			tcgrp->freezer.nr_frozen_descendants++;
+	}
+	spin_unlock_irq(&css_set_lock);
+
 	list_add_tail_rcu(&cgrp->self.sibling, &cgroup_parent(cgrp)->self.children);
 	atomic_inc(&root->nr_cgrps);
 	cgroup_get_live(parent);
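
Editorial sketch of the ordering this patch establishes: a userspace model,
not kernel code. The struct node type, MAX_DEPTH, node_init() and
node_commit() are hypothetical names introduced only for illustration, and
locking is omitted. The point is that the ancestors[] table is filled while
the new object is still private, and the ancestors' counters are touched only
in a separate, final commit step.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8	/* arbitrary bound for this model */

/* Hypothetical stand-in for struct cgroup; only the fields used here. */
struct node {
	struct node *parent;
	int level;
	int nr_descendants;
	struct node *ancestors[MAX_DEPTH];
};

/* Phase 1: private initialization; writes only to the new node. */
static void node_init(struct node *n, struct node *parent)
{
	struct node *t;

	n->parent = parent;
	n->level = parent ? parent->level + 1 : 0;
	n->nr_descendants = 0;
	assert(n->level < MAX_DEPTH);

	/* Record every ancestor, including the node itself, by level. */
	for (t = n; t; t = t->parent)
		n->ancestors[t->level] = t;
}

/* Phase 2: commit; the only step that modifies other nodes. */
static void node_commit(struct node *n)
{
	int i;

	/* Levels 0..level-1 are the proper ancestors of the new node. */
	for (i = 0; i < n->level; i++)
		n->ancestors[i]->nr_descendants++;
}
```

If anything fails between node_init() and node_commit() in this model, no
other node has been modified, so there is nothing to undo outside the new
object.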


* [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
  2025-05-14  4:43 [PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create() Tejun Heo
@ 2025-05-14  4:44 ` Tejun Heo
  2025-05-14  4:46   ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
  2025-06-02 15:07   ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
  0 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2025-05-14  4:44 UTC
  To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
support being one such example. For such a feature, it's useful to be able
to hook into cgroup creation and destruction paths to perform
feature-specific initializations and cleanups.

Add cgroup_lifetime_notifier which generates CGROUP_LIFETIME_ONLINE and
CGROUP_LIFETIME_OFFLINE events whenever cgroups are created and destroyed,
respectively.

The next patch will convert cgroup_bpf to use the new notifier and other
uses are planned.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |    9 ++++++++-
 kernel/cgroup/cgroup.c |   27 ++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 2 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -19,6 +19,7 @@
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
 #include <linux/types.h>
+#include <linux/notifier.h>
 #include <linux/ns_common.h>
 #include <linux/nsproxy.h>
 #include <linux/user_namespace.h>
@@ -40,7 +41,7 @@ struct kernel_clone_args;
 
 #ifdef CONFIG_CGROUPS
 
-enum {
+enum css_task_iter_flags {
 	CSS_TASK_ITER_PROCS    = (1U << 0),  /* walk only threadgroup leaders */
 	CSS_TASK_ITER_THREADED = (1U << 1),  /* walk all threaded css_sets in the domain */
 	CSS_TASK_ITER_SKIPPED  = (1U << 16), /* internal flags */
@@ -66,10 +67,16 @@ struct css_task_iter {
 	struct list_head		iters_node;	/* css_set->task_iters */
 };
 
+enum cgroup_lifetime_events {
+	CGROUP_LIFETIME_ONLINE,
+	CGROUP_LIFETIME_OFFLINE,
+};
+
 extern struct file_system_type cgroup_fs_type;
 extern struct cgroup_root cgrp_dfl_root;
 extern struct css_set init_css_set;
 extern spinlock_t css_set_lock;
+extern struct blocking_notifier_head cgroup_lifetime_notifier;
 
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
 #include <linux/cgroup_subsys.h>
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -95,6 +95,9 @@ EXPORT_SYMBOL_GPL(cgroup_mutex);
 EXPORT_SYMBOL_GPL(css_set_lock);
 #endif
 
+struct blocking_notifier_head cgroup_lifetime_notifier =
+	BLOCKING_NOTIFIER_INIT(cgroup_lifetime_notifier);
+
 DEFINE_SPINLOCK(trace_cgroup_path_lock);
 char trace_cgroup_path[TRACE_CGROUP_PATH_LEN];
 static bool cgroup_debug __read_mostly;
@@ -1335,6 +1338,7 @@ static void cgroup_destroy_root(struct c
 {
 	struct cgroup *cgrp = &root->cgrp;
 	struct cgrp_cset_link *link, *tmp_link;
+	int ret;
 
 	trace_cgroup_destroy_root(root);
 
@@ -1343,6 +1347,10 @@ static void cgroup_destroy_root(struct c
 	BUG_ON(atomic_read(&root->nr_cgrps));
 	BUG_ON(!list_empty(&cgrp->self.children));
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_OFFLINE, cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	/* Rebind all subsystems back to the default hierarchy */
 	WARN_ON(rebind_subsystems(&cgrp_dfl_root, root->subsys_mask));
 
@@ -2159,6 +2167,10 @@ int cgroup_setup_root(struct cgroup_root
 		WARN_ON_ONCE(ret);
 	}
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_ONLINE, root_cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	trace_cgroup_setup_root(root);
 
 	/*
@@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
 			goto out_psi_free;
 	}
 
+	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
+						  CGROUP_LIFETIME_ONLINE,
+						  CGROUP_LIFETIME_OFFLINE, cgrp);
+	ret = notifier_to_errno(ret);
+	if (ret) {
+		cgroup_bpf_offline(cgrp);
+		goto out_psi_free;
+	}
+
 	/* allocation complete, commit to creation */
 	spin_lock_irq(&css_set_lock);
 	for (i = 0; i < level; i++) {
@@ -5980,7 +6001,7 @@ static int cgroup_destroy_locked(struct
 	struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
 	struct cgroup_subsys_state *css;
 	struct cgrp_cset_link *link;
-	int ssid;
+	int ssid, ret;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -6041,6 +6062,10 @@ static int cgroup_destroy_locked(struct
 	if (cgrp->root == &cgrp_dfl_root)
 		cgroup_bpf_offline(cgrp);
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_OFFLINE, cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	/* put the base reference */
 	percpu_ref_kill(&cgrp->self.refcnt);
 

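Editorial sketch of the "robust" call pattern used in cgroup_create() above:
a userspace model of the semantics, not the kernel's implementation. The
names notify_fn, call_chain_robust(), ok_cb() and fail_cb() are hypothetical,
and callbacks here return 0 or a negative errno directly instead of NOTIFY_*
codes. The model delivers the "up" event to each callback in turn; if one
fails, it delivers the "down" event to the callbacks that already saw "up"
and reports the error.

```c
#include <assert.h>
#include <stddef.h>

enum { EV_ONLINE, EV_OFFLINE };

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*notify_fn)(int event, void *data);

static int call_chain_robust(notify_fn *chain, size_t n,
			     int up, int down, void *data)
{
	size_t i, j;
	int ret;

	for (i = 0; i < n; i++) {
		ret = chain[i](up, data);
		if (ret) {
			/* Undo only the callbacks that succeeded. */
			for (j = 0; j < i; j++)
				chain[j](down, data);
			return ret;
		}
	}
	return 0;
}

/* Tracks how many ONLINE events are still outstanding in *data. */
static int ok_cb(int event, void *data)
{
	int *live = data;

	*live += (event == EV_ONLINE) ? 1 : -1;
	return 0;
}

/* Always fails ONLINE with -12 (the value of -ENOMEM on Linux). */
static int fail_cb(int event, void *data)
{
	(void)data;
	return event == EV_ONLINE ? -12 : 0;
}
```

With a chain of two ok_cb entries followed by fail_cb, the model returns -12
and leaves the counter at zero, which is the property the out_psi_free error
path relies on: a failed ONLINE leaves no subscriber holding state.
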

* [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
  2025-05-14  4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
@ 2025-05-14  4:46   ` Tejun Heo
  2025-05-22 19:16     ` Tejun Heo
  2025-06-02 15:07   ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
  1 sibling, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-05-14  4:46 UTC
  To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
creation/destruction paths with notification callback registered on
cgroup_lifetime_notifier.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/bpf-cgroup.h |    9 +++++----
 kernel/bpf/cgroup.c        |   38 ++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/cgroup.c     |   20 +++-----------------
 3 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -114,8 +114,7 @@ struct bpf_prog_list {
 	u32 flags;
 };
 
-int cgroup_bpf_inherit(struct cgroup *cgrp);
-void cgroup_bpf_offline(struct cgroup *cgrp);
+void __init cgroup_bpf_lifetime_notifier_init(void);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
@@ -431,8 +430,10 @@ const struct bpf_func_proto *
 cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 #else
 
-static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
-static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_lifetime_notifier_init(void)
+{
+	return;
+}
 
 static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 					 enum bpf_prog_type ptype,
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -41,6 +41,19 @@ static int __init cgroup_bpf_wq_init(voi
 }
 core_initcall(cgroup_bpf_wq_init);
 
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+				      unsigned long action, void *data);
+
+static struct notifier_block cgroup_bpf_lifetime_nb = {
+	.notifier_call = cgroup_bpf_lifetime_notify,
+};
+
+void __init cgroup_bpf_lifetime_notifier_init(void)
+{
+	BUG_ON(blocking_notifier_chain_register(&cgroup_lifetime_notifier,
+						&cgroup_bpf_lifetime_nb));
+}
+
 /* __always_inline is necessary to prevent indirect call through run_prog
  * function pointer.
  */
@@ -206,7 +219,7 @@ bpf_cgroup_atype_find(enum bpf_attach_ty
 }
 #endif /* CONFIG_BPF_LSM */
 
-void cgroup_bpf_offline(struct cgroup *cgrp)
+static void cgroup_bpf_offline(struct cgroup *cgrp)
 {
 	cgroup_get(cgrp);
 	percpu_ref_kill(&cgrp->bpf.refcnt);
@@ -491,7 +504,7 @@ static void activate_effective_progs(str
  * cgroup_bpf_inherit() - inherit effective programs from parent
  * @cgrp: the cgroup to modify
  */
-int cgroup_bpf_inherit(struct cgroup *cgrp)
+static int cgroup_bpf_inherit(struct cgroup *cgrp)
 {
 /* has to use marco instead of const int, since compiler thinks
  * that array below is variable length
@@ -534,6 +547,27 @@ cleanup:
 	return -ENOMEM;
 }
 
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+				      unsigned long action, void *data)
+{
+	struct cgroup *cgrp = data;
+	int ret = 0;
+
+	if (cgrp->root != &cgrp_dfl_root)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case CGROUP_LIFETIME_ONLINE:
+		ret = cgroup_bpf_inherit(cgrp);
+		break;
+	case CGROUP_LIFETIME_OFFLINE:
+		cgroup_bpf_offline(cgrp);
+		break;
+	}
+
+	return notifier_from_errno(ret);
+}
+
 static int update_effective_progs(struct cgroup *cgrp,
 				  enum cgroup_bpf_attach_type atype)
 {
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2162,11 +2162,6 @@ int cgroup_setup_root(struct cgroup_root
 	if (ret)
 		goto exit_stats;
 
-	if (root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(root_cgrp);
-		WARN_ON_ONCE(ret);
-	}
-
 	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
 					   CGROUP_LIFETIME_ONLINE, root_cgrp);
 	WARN_ON_ONCE(notifier_to_errno(ret));
@@ -5759,20 +5754,12 @@ static struct cgroup *cgroup_create(stru
 
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
-	if (cgrp->root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(cgrp);
-		if (ret)
-			goto out_psi_free;
-	}
-
 	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
 						  CGROUP_LIFETIME_ONLINE,
 						  CGROUP_LIFETIME_OFFLINE, cgrp);
 	ret = notifier_to_errno(ret);
-	if (ret) {
-		cgroup_bpf_offline(cgrp);
+	if (ret)
 		goto out_psi_free;
-	}
 
 	/* allocation complete, commit to creation */
 	spin_lock_irq(&css_set_lock);
@@ -6059,9 +6046,6 @@ static int cgroup_destroy_locked(struct
 
 	cgroup1_check_for_release(parent);
 
-	if (cgrp->root == &cgrp_dfl_root)
-		cgroup_bpf_offline(cgrp);
-
 	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
 					   CGROUP_LIFETIME_OFFLINE, cgrp);
 	WARN_ON_ONCE(notifier_to_errno(ret));
@@ -6215,6 +6199,8 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist,
 		 css_set_hash(init_css_set.subsys));
 
+	cgroup_bpf_lifetime_notifier_init();
+
 	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
 
 	cgroup_unlock();
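
Editorial sketch of the errno round trip that cgroup_bpf_lifetime_notify()
and its callers depend on. This is a userspace model; the model_ prefix marks
the functions as illustrative copies, and the constant values and arithmetic
are assumed to mirror include/linux/notifier.h rather than taken from this
thread. An errno returned by the callback is packed into the notifier return
value together with the stop bit, and the caller unpacks it again.

```c
#include <assert.h>

/* Assumed to match the kernel's definitions; verify against notifier.h. */
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000

/* Encode 0 or a negative errno as a notifier return value. */
static int model_notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int model_notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

In this model a successful callback maps to NOTIFY_OK and decodes to 0, while
a failure such as -12 sets the stop bit, so the chain walk ends and the caller
recovers -12 after decoding.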


* Re: [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
  2025-05-14  4:46   ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
@ 2025-05-22 19:16     ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-05-22 19:16 UTC
  To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

On Wed, May 14, 2025 at 12:46:12AM -0400, Tejun Heo wrote:
> Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
> creation/destruction paths with notification callback registered on
> cgroup_lifetime_notifier.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applied 1-3 to cgroup/for-6.16.

Thanks.

-- 
tejun


* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
  2025-05-14  4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
  2025-05-14  4:46   ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
@ 2025-06-02 15:07   ` Michal Koutný
  2025-06-02 17:30     ` Tejun Heo
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2025-06-02 15:07 UTC
  To: Tejun Heo; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf


On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> support being one such example. For such a feature, it's useful to be able
<snip>

> other uses are planned.
<snip>

> @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
>  			goto out_psi_free;
>  	}
>  
> +	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> +						  CGROUP_LIFETIME_ONLINE,
> +						  CGROUP_LIFETIME_OFFLINE, cgrp);

This is with cgroup_mutex taken.

Wouldn't it be more prudent to start with atomic or raw notifier chain?
(To prevent future unwitting expansion of cgroup_mutex.)


Thanks,
Michal



* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
  2025-06-02 15:07   ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
@ 2025-06-02 17:30     ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-06-02 17:30 UTC
  To: Michal Koutný; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf

Hello,

On Mon, Jun 02, 2025 at 05:07:39PM +0200, Michal Koutný wrote:
> On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> > Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> > support being one such example. For such a feature, it's useful to be able
> <snip>
> 
> > other uses are planned.
> <snip>
> 
> > @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
> >  			goto out_psi_free;
> >  	}
> >  
> > +	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> > +						  CGROUP_LIFETIME_ONLINE,
> > +						  CGROUP_LIFETIME_OFFLINE, cgrp);
> 
> This is with cgroup_mutex taken.
> 
> Wouldn't it be more prudent to start with atomic or raw notifier chain?
> (To prevent future unwitting expansion of cgroup_mutex.)

This being primarily useful for init/exiting stuff, I think it'd be
reasonable to expect memory allocations. e.g. Even the existing BPF cgroup
support needs sleepable context for percpu_ref init and prog allocations.

If cgroup_mutex gets involved in locking dep loops, it'll light up lockdep,
so I'm not *too* worried.

Thanks.

-- 
tejun
