* [PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create()
@ 2025-05-14 4:43 Tejun Heo
2025-05-14 4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-05-14 4:43 UTC (permalink / raw)
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf
cgroup_bpf init and exit handling will be moved to a notifier chain. In
preparation, reorganize cgroup_create() a bit so that the new cgroup is fully
initialized before any outside changes are made.
- cgrp->ancestors[] initialization and the hierarchical nr_descendants and
nr_frozen_descendants updates were in the same loop. Separate them out and
do the former earlier and do the latter later.
- Relocate cgroup_bpf_inherit() call so that it's after all cgroup
initializations are complete.
No visible behavior changes expected.
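
A rough userspace sketch of the two-phase split described above, for illustration only. The struct mock_cgroup type, MOCK_MAX_LEVEL, and the mock_init()/mock_commit() helpers are hypothetical stand-ins, not kernel code, and the locking (css_set_lock) and freezer accounting are left out. It only aims to show that filling ancestors[] touches nothing but the new cgroup, while the nr_descendants update is the first step that modifies the ancestors.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_LEVEL 8	/* arbitrary depth bound for this sketch */

/* Hypothetical, heavily trimmed stand-in for struct cgroup. */
struct mock_cgroup {
	struct mock_cgroup *parent;
	int level;
	int nr_descendants;
	struct mock_cgroup *ancestors[MOCK_MAX_LEVEL];
};

/*
 * Phase 1: initialize the new cgroup. Only @cgrp is written, so a
 * failure after this point leaves the rest of the hierarchy untouched.
 */
void mock_init(struct mock_cgroup *cgrp, struct mock_cgroup *parent)
{
	struct mock_cgroup *t;

	cgrp->parent = parent;
	cgrp->level = parent ? parent->level + 1 : 0;
	cgrp->nr_descendants = 0;
	assert(cgrp->level < MOCK_MAX_LEVEL);

	/* ancestors[] includes the cgroup itself at index ->level */
	for (t = cgrp; t; t = t->parent)
		cgrp->ancestors[t->level] = t;
}

/*
 * Phase 2: commit. Walk the already-filled ancestors[] array and update
 * the hierarchical counter on every proper ancestor (indices below
 * ->level), mirroring the "for (i = 0; i < level; i++)" loop in the patch.
 */
void mock_commit(struct mock_cgroup *cgrp)
{
	int i;

	for (i = 0; i < cgrp->level; i++)
		cgrp->ancestors[i]->nr_descendants++;
}
```

Calling mock_init() alone leaves the ancestors' counters unchanged. Only mock_commit() makes the new cgroup visible in them, which is the ordering the fallible initialization steps rely on.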
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/cgroup/cgroup.c | 48 ++++++++++++++++++++++++------------------------
1 file changed, 24 insertions(+), 24 deletions(-)
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5684,7 +5684,7 @@ static struct cgroup *cgroup_create(stru
struct cgroup_root *root = parent->root;
struct cgroup *cgrp, *tcgrp;
struct kernfs_node *kn;
- int level = parent->level + 1;
+ int i, level = parent->level + 1;
int ret;
/* allocate the cgroup and its ID, 0 is reserved for the root */
@@ -5720,11 +5720,8 @@ static struct cgroup *cgroup_create(stru
if (ret)
goto out_kernfs_remove;
- if (cgrp->root == &cgrp_dfl_root) {
- ret = cgroup_bpf_inherit(cgrp);
- if (ret)
- goto out_psi_free;
- }
+ for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
+ cgrp->ancestors[tcgrp->level] = tcgrp;
/*
* New cgroup inherits effective freeze counter, and
@@ -5742,24 +5739,6 @@ static struct cgroup *cgroup_create(stru
set_bit(CGRP_FROZEN, &cgrp->flags);
}
- spin_lock_irq(&css_set_lock);
- for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
- cgrp->ancestors[tcgrp->level] = tcgrp;
-
- if (tcgrp != cgrp) {
- tcgrp->nr_descendants++;
-
- /*
- * If the new cgroup is frozen, all ancestor cgroups
- * get a new frozen descendant, but their state can't
- * change because of this.
- */
- if (cgrp->freezer.e_freeze)
- tcgrp->freezer.nr_frozen_descendants++;
- }
- }
- spin_unlock_irq(&css_set_lock);
-
if (notify_on_release(parent))
set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
@@ -5768,7 +5747,28 @@ static struct cgroup *cgroup_create(stru
cgrp->self.serial_nr = css_serial_nr_next++;
+ if (cgrp->root == &cgrp_dfl_root) {
+ ret = cgroup_bpf_inherit(cgrp);
+ if (ret)
+ goto out_psi_free;
+ }
+
/* allocation complete, commit to creation */
+ spin_lock_irq(&css_set_lock);
+ for (i = 0; i < level; i++) {
+ tcgrp = cgrp->ancestors[i];
+ tcgrp->nr_descendants++;
+
+ /*
+ * If the new cgroup is frozen, all ancestor cgroups get a new
+ * frozen descendant, but their state can't change because of
+ * this.
+ */
+ if (cgrp->freezer.e_freeze)
+ tcgrp->freezer.nr_frozen_descendants++;
+ }
+ spin_unlock_irq(&css_set_lock);
+
list_add_tail_rcu(&cgrp->self.sibling, &cgroup_parent(cgrp)->self.children);
atomic_inc(&root->nr_cgrps);
cgroup_get_live(parent);
* [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
2025-05-14 4:43 [PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create() Tejun Heo
@ 2025-05-14 4:44 ` Tejun Heo
2025-05-14 4:46 ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
2025-06-02 15:07 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
0 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2025-05-14 4:44 UTC (permalink / raw)
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf
Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
support being one such example. For such a feature, it's useful to be able
to hook into cgroup creation and destruction paths to perform
feature-specific initializations and cleanups.
Add cgroup_lifetime_notifier which generates CGROUP_LIFETIME_ONLINE and
CGROUP_LIFETIME_OFFLINE events whenever cgroups are created and destroyed,
respectively.
The next patch will convert cgroup_bpf to use the new notifier and other
uses are planned.
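
To illustrate the ONLINE/OFFLINE pairing, here is a minimal userspace mock of a notifier chain with a "robust" call, assuming the rollback behaves as I understand blocking_notifier_call_chain_robust() to behave: if a callback fails on the first event, the callbacks that already succeeded receive the second event, and the failing one does not. Only the two event names come from this patch. struct mock_nb, struct mock_chain, and all mock_*() helpers are hypothetical, and the sketch uses plain 0/-errno returns instead of the kernel's NOTIFY_* encoding and takes no locks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event codes as introduced by the patch. */
enum cgroup_lifetime_events {
	CGROUP_LIFETIME_ONLINE,
	CGROUP_LIFETIME_OFFLINE,
};

/* Hypothetical stand-in for struct notifier_block. */
struct mock_nb {
	int (*call)(struct mock_nb *nb, unsigned long action, void *data);
	struct mock_nb *next;
	int online_count;	/* demo bookkeeping: ONLINEs minus OFFLINEs */
};

/* Hypothetical chain head; the kernel's blocking head also has a rwsem. */
struct mock_chain {
	struct mock_nb *head;
};

/* Append @nb to the end of @chain (no priorities in this sketch). */
void mock_chain_register(struct mock_chain *chain, struct mock_nb *nb)
{
	struct mock_nb **pos = &chain->head;

	while (*pos)
		pos = &(*pos)->next;
	nb->next = NULL;
	*pos = nb;
}

/*
 * Deliver @do_event to each callback in order. If one returns an error,
 * deliver @undo_event to the callbacks that had already succeeded and
 * return that error. Returns 0 when every callback succeeds.
 */
int mock_call_chain_robust(struct mock_chain *chain, unsigned long do_event,
			   unsigned long undo_event, void *data)
{
	struct mock_nb *nb, *failed = NULL;
	int ret = 0;

	for (nb = chain->head; nb; nb = nb->next) {
		ret = nb->call(nb, do_event, data);
		if (ret) {
			failed = nb;
			break;
		}
	}
	if (!failed)
		return 0;

	/* Roll back only the callbacks that ran before the failing one. */
	for (nb = chain->head; nb != failed; nb = nb->next)
		nb->call(nb, undo_event, data);
	return ret;
}

/* Example subscriber: tracks how many cgroups it currently has online. */
int mock_counting_cb(struct mock_nb *nb, unsigned long action, void *data)
{
	(void)data;
	if (action == CGROUP_LIFETIME_ONLINE)
		nb->online_count++;
	else if (action == CGROUP_LIFETIME_OFFLINE)
		nb->online_count--;
	return 0;
}

/* Example subscriber whose ONLINE handling always fails. */
int mock_failing_cb(struct mock_nb *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)data;
	return action == CGROUP_LIFETIME_ONLINE ? -ENOMEM : 0;
}
```

With mock_counting_cb registered ahead of mock_failing_cb, a robust ONLINE call returns -ENOMEM and the counting subscriber ends up with the same count it started with. That is why cgroup_create() can jump straight to its error path without notifying each subscriber by hand.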
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/cgroup.h | 9 ++++++++-
kernel/cgroup/cgroup.c | 27 ++++++++++++++++++++++++++-
2 files changed, 34 insertions(+), 2 deletions(-)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -19,6 +19,7 @@
#include <linux/kernfs.h>
#include <linux/jump_label.h>
#include <linux/types.h>
+#include <linux/notifier.h>
#include <linux/ns_common.h>
#include <linux/nsproxy.h>
#include <linux/user_namespace.h>
@@ -40,7 +41,7 @@ struct kernel_clone_args;
#ifdef CONFIG_CGROUPS
-enum {
+enum css_task_iter_flags {
CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */
CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */
CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */
@@ -66,10 +67,16 @@ struct css_task_iter {
struct list_head iters_node; /* css_set->task_iters */
};
+enum cgroup_lifetime_events {
+ CGROUP_LIFETIME_ONLINE,
+ CGROUP_LIFETIME_OFFLINE,
+};
+
extern struct file_system_type cgroup_fs_type;
extern struct cgroup_root cgrp_dfl_root;
extern struct css_set init_css_set;
extern spinlock_t css_set_lock;
+extern struct blocking_notifier_head cgroup_lifetime_notifier;
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
#include <linux/cgroup_subsys.h>
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -95,6 +95,9 @@ EXPORT_SYMBOL_GPL(cgroup_mutex);
EXPORT_SYMBOL_GPL(css_set_lock);
#endif
+struct blocking_notifier_head cgroup_lifetime_notifier =
+ BLOCKING_NOTIFIER_INIT(cgroup_lifetime_notifier);
+
DEFINE_SPINLOCK(trace_cgroup_path_lock);
char trace_cgroup_path[TRACE_CGROUP_PATH_LEN];
static bool cgroup_debug __read_mostly;
@@ -1335,6 +1338,7 @@ static void cgroup_destroy_root(struct c
{
struct cgroup *cgrp = &root->cgrp;
struct cgrp_cset_link *link, *tmp_link;
+ int ret;
trace_cgroup_destroy_root(root);
@@ -1343,6 +1347,10 @@ static void cgroup_destroy_root(struct c
BUG_ON(atomic_read(&root->nr_cgrps));
BUG_ON(!list_empty(&cgrp->self.children));
+ ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+ CGROUP_LIFETIME_OFFLINE, cgrp);
+ WARN_ON_ONCE(notifier_to_errno(ret));
+
/* Rebind all subsystems back to the default hierarchy */
WARN_ON(rebind_subsystems(&cgrp_dfl_root, root->subsys_mask));
@@ -2159,6 +2167,10 @@ int cgroup_setup_root(struct cgroup_root
WARN_ON_ONCE(ret);
}
+ ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+ CGROUP_LIFETIME_ONLINE, root_cgrp);
+ WARN_ON_ONCE(notifier_to_errno(ret));
+
trace_cgroup_setup_root(root);
/*
@@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
goto out_psi_free;
}
+ ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
+ CGROUP_LIFETIME_ONLINE,
+ CGROUP_LIFETIME_OFFLINE, cgrp);
+ ret = notifier_to_errno(ret);
+ if (ret) {
+ cgroup_bpf_offline(cgrp);
+ goto out_psi_free;
+ }
+
/* allocation complete, commit to creation */
spin_lock_irq(&css_set_lock);
for (i = 0; i < level; i++) {
@@ -5980,7 +6001,7 @@ static int cgroup_destroy_locked(struct
struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
struct cgroup_subsys_state *css;
struct cgrp_cset_link *link;
- int ssid;
+ int ssid, ret;
lockdep_assert_held(&cgroup_mutex);
@@ -6041,6 +6062,10 @@ static int cgroup_destroy_locked(struct
if (cgrp->root == &cgrp_dfl_root)
cgroup_bpf_offline(cgrp);
+ ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+ CGROUP_LIFETIME_OFFLINE, cgrp);
+ WARN_ON_ONCE(notifier_to_errno(ret));
+
/* put the base reference */
percpu_ref_kill(&cgrp->self.refcnt);
* [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
2025-05-14 4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
@ 2025-05-14 4:46 ` Tejun Heo
2025-05-22 19:16 ` Tejun Heo
2025-06-02 15:07 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
1 sibling, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-05-14 4:46 UTC (permalink / raw)
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf
Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
creation/destruction paths with notification callback registered on
cgroup_lifetime_notifier.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/bpf-cgroup.h | 9 +++++----
kernel/bpf/cgroup.c | 38 ++++++++++++++++++++++++++++++++++++--
kernel/cgroup/cgroup.c | 20 +++-----------------
3 files changed, 44 insertions(+), 23 deletions(-)
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -114,8 +114,7 @@ struct bpf_prog_list {
u32 flags;
};
-int cgroup_bpf_inherit(struct cgroup *cgrp);
-void cgroup_bpf_offline(struct cgroup *cgrp);
+void __init cgroup_bpf_lifetime_notifier_init(void);
int __cgroup_bpf_run_filter_skb(struct sock *sk,
struct sk_buff *skb,
@@ -431,8 +430,10 @@ const struct bpf_func_proto *
cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
#else
-static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
-static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_lifetime_notifier_init(void)
+{
+ return;
+}
static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
enum bpf_prog_type ptype,
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -41,6 +41,19 @@ static int __init cgroup_bpf_wq_init(voi
}
core_initcall(cgroup_bpf_wq_init);
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+ unsigned long action, void *data);
+
+static struct notifier_block cgroup_bpf_lifetime_nb = {
+ .notifier_call = cgroup_bpf_lifetime_notify,
+};
+
+void __init cgroup_bpf_lifetime_notifier_init(void)
+{
+ BUG_ON(blocking_notifier_chain_register(&cgroup_lifetime_notifier,
+ &cgroup_bpf_lifetime_nb));
+}
+
/* __always_inline is necessary to prevent indirect call through run_prog
* function pointer.
*/
@@ -206,7 +219,7 @@ bpf_cgroup_atype_find(enum bpf_attach_ty
}
#endif /* CONFIG_BPF_LSM */
-void cgroup_bpf_offline(struct cgroup *cgrp)
+static void cgroup_bpf_offline(struct cgroup *cgrp)
{
cgroup_get(cgrp);
percpu_ref_kill(&cgrp->bpf.refcnt);
@@ -491,7 +504,7 @@ static void activate_effective_progs(str
* cgroup_bpf_inherit() - inherit effective programs from parent
* @cgrp: the cgroup to modify
*/
-int cgroup_bpf_inherit(struct cgroup *cgrp)
+static int cgroup_bpf_inherit(struct cgroup *cgrp)
{
/* has to use marco instead of const int, since compiler thinks
* that array below is variable length
@@ -534,6 +547,27 @@ cleanup:
return -ENOMEM;
}
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct cgroup *cgrp = data;
+ int ret = 0;
+
+ if (cgrp->root != &cgrp_dfl_root)
+ return NOTIFY_OK;
+
+ switch (action) {
+ case CGROUP_LIFETIME_ONLINE:
+ ret = cgroup_bpf_inherit(cgrp);
+ break;
+ case CGROUP_LIFETIME_OFFLINE:
+ cgroup_bpf_offline(cgrp);
+ break;
+ }
+
+ return notifier_from_errno(ret);
+}
+
static int update_effective_progs(struct cgroup *cgrp,
enum cgroup_bpf_attach_type atype)
{
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2162,11 +2162,6 @@ int cgroup_setup_root(struct cgroup_root
if (ret)
goto exit_stats;
- if (root == &cgrp_dfl_root) {
- ret = cgroup_bpf_inherit(root_cgrp);
- WARN_ON_ONCE(ret);
- }
-
ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
CGROUP_LIFETIME_ONLINE, root_cgrp);
WARN_ON_ONCE(notifier_to_errno(ret));
@@ -5759,20 +5754,12 @@ static struct cgroup *cgroup_create(stru
cgrp->self.serial_nr = css_serial_nr_next++;
- if (cgrp->root == &cgrp_dfl_root) {
- ret = cgroup_bpf_inherit(cgrp);
- if (ret)
- goto out_psi_free;
- }
-
ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
CGROUP_LIFETIME_ONLINE,
CGROUP_LIFETIME_OFFLINE, cgrp);
ret = notifier_to_errno(ret);
- if (ret) {
- cgroup_bpf_offline(cgrp);
+ if (ret)
goto out_psi_free;
- }
/* allocation complete, commit to creation */
spin_lock_irq(&css_set_lock);
@@ -6059,9 +6046,6 @@ static int cgroup_destroy_locked(struct
cgroup1_check_for_release(parent);
- if (cgrp->root == &cgrp_dfl_root)
- cgroup_bpf_offline(cgrp);
-
ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
CGROUP_LIFETIME_OFFLINE, cgrp);
WARN_ON_ONCE(notifier_to_errno(ret));
@@ -6215,6 +6199,8 @@ int __init cgroup_init(void)
hash_add(css_set_table, &init_css_set.hlist,
css_set_hash(init_css_set.subsys));
+ cgroup_bpf_lifetime_notifier_init();
+
BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
cgroup_unlock();
* Re: [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
2025-05-14 4:46 ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
@ 2025-05-22 19:16 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-05-22 19:16 UTC (permalink / raw)
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf
On Wed, May 14, 2025 at 12:46:12AM -0400, Tejun Heo wrote:
> Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
> creation/destruction paths with notification callback registered on
> cgroup_lifetime_notifier.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Applied 1-3 to cgroup/for-6.16.
Thanks.
--
tejun
* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
2025-05-14 4:44 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Tejun Heo
2025-05-14 4:46 ` [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier Tejun Heo
@ 2025-06-02 15:07 ` Michal Koutný
2025-06-02 17:30 ` Tejun Heo
1 sibling, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2025-06-02 15:07 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf
On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> support being one such example. For such a feature, it's useful to be able
<snip>
> other uses are planned.
<snip>
> @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
> goto out_psi_free;
> }
>
> + ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> + CGROUP_LIFETIME_ONLINE,
> + CGROUP_LIFETIME_OFFLINE, cgrp);
This is with cgroup_mutex taken.
Wouldn't it be more prudent to start with atomic or raw notifier chain?
(To prevent future unwitting expansion of cgroup_mutex.)
Thanks,
Michal
* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
2025-06-02 15:07 ` [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier Michal Koutný
@ 2025-06-02 17:30 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-06-02 17:30 UTC (permalink / raw)
To: Michal Koutný; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf
Hello,
On Mon, Jun 02, 2025 at 05:07:39PM +0200, Michal Koutný wrote:
> On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> > Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> > support being one such example. For such a feature, it's useful to be able
> <snip>
>
> > other uses are planned.
> <snip>
>
> > @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
> > goto out_psi_free;
> > }
> >
> > + ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> > + CGROUP_LIFETIME_ONLINE,
> > + CGROUP_LIFETIME_OFFLINE, cgrp);
>
> This is with cgroup_mutex taken.
>
> Wouldn't it be more prudent to start with atomic or raw notifier chain?
> (To prevent future unwitting expansion of cgroup_mutex.)
This being primarily useful for init/exiting stuff, I think it'd be
reasonable to expect memory allocations. e.g. Even the existing BPF cgroup
support needs sleepable context for percpu_ref init and prog allocations.
If cgroup_mutex gets involved in locking dep loops, it'll light up lockdep,
so I'm not *too* worried.
Thanks.
--
tejun