* [PATCH 1/3 cgroup/for-6.16] cgroup: Minor reorganization of cgroup_create()
From: Tejun Heo @ 2025-05-14 4:43 UTC
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

cgroup_bpf init and exit handling will be moved to a notifier chain. In
preparation, reorganize cgroup_create() a bit so that the new cgroup is
fully initialized before any outside changes are made.

- cgrp->ancestors[] initialization and the hierarchical nr_descendants and
  nr_frozen_descendants updates were in the same loop. Separate them out
  and do the former earlier and do the latter later.

- Relocate cgroup_bpf_inherit() call so that it's after all cgroup
  initializations are complete.

No visible behavior changes expected.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5684,7 +5684,7 @@ static struct cgroup *cgroup_create(stru
 	struct cgroup_root *root = parent->root;
 	struct cgroup *cgrp, *tcgrp;
 	struct kernfs_node *kn;
-	int level = parent->level + 1;
+	int i, level = parent->level + 1;
 	int ret;
 
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
@@ -5720,11 +5720,8 @@ static struct cgroup *cgroup_create(stru
 	if (ret)
 		goto out_kernfs_remove;
 
-	if (cgrp->root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(cgrp);
-		if (ret)
-			goto out_psi_free;
-	}
+	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
+		cgrp->ancestors[tcgrp->level] = tcgrp;
 
 	/*
 	 * New cgroup inherits effective freeze counter, and
@@ -5742,24 +5739,6 @@ static struct cgroup *cgroup_create(stru
 		set_bit(CGRP_FROZEN, &cgrp->flags);
 	}
 
-	spin_lock_irq(&css_set_lock);
-	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
-		cgrp->ancestors[tcgrp->level] = tcgrp;
-
-		if (tcgrp != cgrp) {
-			tcgrp->nr_descendants++;
-
-			/*
-			 * If the new cgroup is frozen, all ancestor cgroups
-			 * get a new frozen descendant, but their state can't
-			 * change because of this.
-			 */
-			if (cgrp->freezer.e_freeze)
-				tcgrp->freezer.nr_frozen_descendants++;
-		}
-	}
-	spin_unlock_irq(&css_set_lock);
-
 	if (notify_on_release(parent))
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
 
@@ -5768,7 +5747,28 @@ static struct cgroup *cgroup_create(stru
 
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
+	if (cgrp->root == &cgrp_dfl_root) {
+		ret = cgroup_bpf_inherit(cgrp);
+		if (ret)
+			goto out_psi_free;
+	}
+
 	/* allocation complete, commit to creation */
+	spin_lock_irq(&css_set_lock);
+	for (i = 0; i < level; i++) {
+		tcgrp = cgrp->ancestors[i];
+		tcgrp->nr_descendants++;
+
+		/*
+		 * If the new cgroup is frozen, all ancestor cgroups get a new
+		 * frozen descendant, but their state can't change because of
+		 * this.
+		 */
+		if (cgrp->freezer.e_freeze)
+			tcgrp->freezer.nr_frozen_descendants++;
+	}
+	spin_unlock_irq(&css_set_lock);
+
 	list_add_tail_rcu(&cgrp->self.sibling, &cgroup_parent(cgrp)->self.children);
 	atomic_inc(&root->nr_cgrps);
 	cgroup_get_live(parent);
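The split loops in patch 1 can be checked with a small userspace model. This is not kernel code: struct model_cgroup, fill_ancestors(), commit_descendant_counts() and old_loop() are hypothetical stand-ins that keep only the level, the parent link, ancestors[] and nr_descendants (locking and the freezer counters are left out). It illustrates why the new "for (i = 0; i < level; i++)" loop touches exactly the strict ancestors: the first pass stores the new cgroup itself at ancestors[level], so stopping before that index takes over the job of the old "tcgrp != cgrp" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct cgroup; only the fields the
 * reorganized loops touch are modeled. */
#define MAX_LEVEL 8

struct model_cgroup {
	int level;                                   /* the root is level 0 */
	struct model_cgroup *parent;                 /* NULL for the root */
	struct model_cgroup *ancestors[MAX_LEVEL + 1];
	int nr_descendants;
};

static void init_node(struct model_cgroup *c, struct model_cgroup *parent)
{
	c->parent = parent;
	c->level = parent ? parent->level + 1 : 0;
	c->nr_descendants = 0;
}

/* First pass (done early): record every ancestor, including the cgroup
 * itself at index 'level'. Only the new cgroup is written to. */
static void fill_ancestors(struct model_cgroup *cgrp)
{
	struct model_cgroup *t;

	for (t = cgrp; t; t = t->parent)
		cgrp->ancestors[t->level] = t;
}

/* Second pass (done at commit time): bump the counters of strict
 * ancestors. Indices 0..level-1 never include the cgroup itself. */
static void commit_descendant_counts(struct model_cgroup *cgrp)
{
	int i;

	for (i = 0; i < cgrp->level; i++)
		cgrp->ancestors[i]->nr_descendants++;
}

/* The old combined loop, kept for comparison. */
static void old_loop(struct model_cgroup *cgrp)
{
	struct model_cgroup *t;

	for (t = cgrp; t; t = t->parent) {
		cgrp->ancestors[t->level] = t;
		if (t != cgrp)
			t->nr_descendants++;
	}
}
```

Building root, then child a, then grandchild b with both variants should leave the same counters (root: 2, a: 1, b: 0), with b's ancestors[] holding root, a and b at indices 0, 1 and 2.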
* [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
From: Tejun Heo @ 2025-05-14 4:44 UTC
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
support being one such example. For such a feature, it's useful to be able
to hook into cgroup creation and destruction paths to perform
feature-specific initializations and cleanups.

Add cgroup_lifetime_notifier which generates CGROUP_LIFETIME_ONLINE and
CGROUP_LIFETIME_OFFLINE events whenever cgroups are created and destroyed,
respectively.

The next patch will convert cgroup_bpf to use the new notifier and other
uses are planned.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h | 9 ++++++++-
 kernel/cgroup/cgroup.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 2 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -19,6 +19,7 @@
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
 #include <linux/types.h>
+#include <linux/notifier.h>
 #include <linux/ns_common.h>
 #include <linux/nsproxy.h>
 #include <linux/user_namespace.h>
@@ -40,7 +41,7 @@ struct kernel_clone_args;
 
 #ifdef CONFIG_CGROUPS
 
-enum {
+enum css_task_iter_flags {
 	CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */
 	CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */
 	CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */
@@ -66,10 +67,16 @@ struct css_task_iter {
 	struct list_head iters_node; /* css_set->task_iters */
 };
 
+enum cgroup_lifetime_events {
+	CGROUP_LIFETIME_ONLINE,
+	CGROUP_LIFETIME_OFFLINE,
+};
+
 extern struct file_system_type cgroup_fs_type;
 extern struct cgroup_root cgrp_dfl_root;
 extern struct css_set init_css_set;
 extern spinlock_t css_set_lock;
+extern struct blocking_notifier_head cgroup_lifetime_notifier;
 
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
 #include <linux/cgroup_subsys.h>
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -95,6 +95,9 @@ EXPORT_SYMBOL_GPL(cgroup_mutex);
 EXPORT_SYMBOL_GPL(css_set_lock);
 #endif
 
+struct blocking_notifier_head cgroup_lifetime_notifier =
+	BLOCKING_NOTIFIER_INIT(cgroup_lifetime_notifier);
+
 DEFINE_SPINLOCK(trace_cgroup_path_lock);
 char trace_cgroup_path[TRACE_CGROUP_PATH_LEN];
 static bool cgroup_debug __read_mostly;
@@ -1335,6 +1338,7 @@ static void cgroup_destroy_root(struct c
 {
 	struct cgroup *cgrp = &root->cgrp;
 	struct cgrp_cset_link *link, *tmp_link;
+	int ret;
 
 	trace_cgroup_destroy_root(root);
 
@@ -1343,6 +1347,10 @@ static void cgroup_destroy_root(struct c
 	BUG_ON(atomic_read(&root->nr_cgrps));
 	BUG_ON(!list_empty(&cgrp->self.children));
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_OFFLINE, cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	/* Rebind all subsystems back to the default hierarchy */
 	WARN_ON(rebind_subsystems(&cgrp_dfl_root, root->subsys_mask));
 
@@ -2159,6 +2167,10 @@ int cgroup_setup_root(struct cgroup_root
 		WARN_ON_ONCE(ret);
 	}
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_ONLINE, root_cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	trace_cgroup_setup_root(root);
 
 	/*
@@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
 			goto out_psi_free;
 	}
 
+	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
+						  CGROUP_LIFETIME_ONLINE,
+						  CGROUP_LIFETIME_OFFLINE, cgrp);
+	ret = notifier_to_errno(ret);
+	if (ret) {
+		cgroup_bpf_offline(cgrp);
+		goto out_psi_free;
+	}
+
 	/* allocation complete, commit to creation */
 	spin_lock_irq(&css_set_lock);
 	for (i = 0; i < level; i++) {
@@ -5980,7 +6001,7 @@ static int cgroup_destroy_locked(struct
 	struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
 	struct cgroup_subsys_state *css;
 	struct cgrp_cset_link *link;
-	int ssid;
+	int ssid, ret;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -6041,6 +6062,10 @@ static int cgroup_destroy_locked(struct
 	if (cgrp->root == &cgrp_dfl_root)
 		cgroup_bpf_offline(cgrp);
 
+	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
+					   CGROUP_LIFETIME_OFFLINE, cgrp);
+	WARN_ON_ONCE(notifier_to_errno(ret));
+
 	/* put the base reference */
 	percpu_ref_kill(&cgrp->self.refcnt);
 
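The robust call in cgroup_create() relies on two notifier conventions: a callback packs an errno into its return value with notifier_from_errno(), the caller unpacks it with notifier_to_errno(), and when a callback stops the ONLINE pass, only the callbacks that ran before it are sent OFFLINE. The userspace model below sketches both. It is an approximation under stated assumptions: the two helper bodies are reproduced from include/linux/notifier.h as recalled (check them against your tree), while the array-based chain, model_call_chain(), model_call_chain_robust() and the sub_*() subscribers are hypothetical stand-ins for the kernel's notifier_block list and blocking_notifier_call_chain_robust().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrored from include/linux/notifier.h (verify against your tree). */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

enum { LIFETIME_ONLINE, LIFETIME_OFFLINE }; /* stand-ins for CGROUP_LIFETIME_* */

static int notifier_from_errno(int err)
{
	return err ? (NOTIFY_STOP_MASK | (NOTIFY_OK - err)) : NOTIFY_OK;
}

static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical callback type; an array stands in for the linked chain. */
typedef int (*model_cb)(unsigned long action, void *data);

/* Call up to nr_to_call callbacks (-1 means all) and stop on STOP_MASK.
 * *nr_calls counts every callback that ran, including a failing one. */
static int model_call_chain(model_cb *chain, int len, unsigned long action,
			    void *data, int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;
	int i;

	for (i = 0; i < len && nr_to_call; i++, nr_to_call--) {
		ret = chain[i](action, data);
		if (nr_calls)
			(*nr_calls)++;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Robust variant: on failure, send 'down' to the nr - 1 callbacks that
 * succeeded before the failing one. */
static int model_call_chain_robust(model_cb *chain, int len, unsigned long up,
				   unsigned long down, void *data)
{
	int nr = 0;
	int ret = model_call_chain(chain, len, up, data, -1, &nr);

	if (ret & NOTIFY_STOP_MASK)
		model_call_chain(chain, len, down, data, nr - 1, NULL);
	return ret;
}

/* Demo subscribers log upper case for ONLINE and lower case for OFFLINE. */
static char event_log[16];
static int log_len;

static int sub_a(unsigned long action, void *data)
{
	(void)data;
	event_log[log_len++] = action == LIFETIME_ONLINE ? 'A' : 'a';
	return notifier_from_errno(0);
}

static int sub_b_fails(unsigned long action, void *data)
{
	(void)data;
	event_log[log_len++] = action == LIFETIME_ONLINE ? 'B' : 'b';
	return notifier_from_errno(action == LIFETIME_ONLINE ? -ENOMEM : 0);
}

static int sub_c(unsigned long action, void *data)
{
	(void)data;
	event_log[log_len++] = action == LIFETIME_ONLINE ? 'C' : 'c';
	return NOTIFY_OK;
}
```

With the chain {sub_a, sub_b_fails, sub_c}, the model records "A", "B", "a" and returns a value that notifier_to_errno() maps back to -ENOMEM. sub_b_fails() has to undo its own partial work, and sub_c() never sees either event.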
* [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
From: Tejun Heo @ 2025-05-14 4:46 UTC
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
creation/destruction paths with notification callback registered on
cgroup_lifetime_notifier.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/bpf-cgroup.h | 9 +++++----
 kernel/bpf/cgroup.c | 38 ++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/cgroup.c | 20 +++-----------------
 3 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -114,8 +114,7 @@ struct bpf_prog_list {
 	u32 flags;
 };
 
-int cgroup_bpf_inherit(struct cgroup *cgrp);
-void cgroup_bpf_offline(struct cgroup *cgrp);
+void __init cgroup_bpf_lifetime_notifier_init(void);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
@@ -431,8 +430,10 @@ const struct bpf_func_proto *
 cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 #else
-static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
-static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_lifetime_notifier_init(void)
+{
+	return;
+}
 
 static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 					 enum bpf_prog_type ptype,
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -41,6 +41,19 @@ static int __init cgroup_bpf_wq_init(voi
 }
 core_initcall(cgroup_bpf_wq_init);
 
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+				      unsigned long action, void *data);
+
+static struct notifier_block cgroup_bpf_lifetime_nb = {
+	.notifier_call = cgroup_bpf_lifetime_notify,
+};
+
+void __init cgroup_bpf_lifetime_notifier_init(void)
+{
+	BUG_ON(blocking_notifier_chain_register(&cgroup_lifetime_notifier,
+						&cgroup_bpf_lifetime_nb));
+}
+
 /* __always_inline is necessary to prevent indirect call through run_prog
  * function pointer.
  */
@@ -206,7 +219,7 @@ bpf_cgroup_atype_find(enum bpf_attach_ty
 }
 #endif /* CONFIG_BPF_LSM */
 
-void cgroup_bpf_offline(struct cgroup *cgrp)
+static void cgroup_bpf_offline(struct cgroup *cgrp)
 {
 	cgroup_get(cgrp);
 	percpu_ref_kill(&cgrp->bpf.refcnt);
@@ -491,7 +504,7 @@ static void activate_effective_progs(str
  * cgroup_bpf_inherit() - inherit effective programs from parent
  * @cgrp: the cgroup to modify
  */
-int cgroup_bpf_inherit(struct cgroup *cgrp)
+static int cgroup_bpf_inherit(struct cgroup *cgrp)
 {
 	/* has to use marco instead of const int, since compiler thinks
 	 * that array below is variable length
@@ -534,6 +547,27 @@ cleanup:
 	return -ENOMEM;
 }
 
+static int cgroup_bpf_lifetime_notify(struct notifier_block *nb,
+				      unsigned long action, void *data)
+{
+	struct cgroup *cgrp = data;
+	int ret = 0;
+
+	if (cgrp->root != &cgrp_dfl_root)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case CGROUP_LIFETIME_ONLINE:
+		ret = cgroup_bpf_inherit(cgrp);
+		break;
+	case CGROUP_LIFETIME_OFFLINE:
+		cgroup_bpf_offline(cgrp);
+		break;
+	}
+
+	return notifier_from_errno(ret);
+}
+
 static int update_effective_progs(struct cgroup *cgrp,
 				  enum cgroup_bpf_attach_type atype)
 {
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2162,11 +2162,6 @@ int cgroup_setup_root(struct cgroup_root
 	if (ret)
 		goto exit_stats;
 
-	if (root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(root_cgrp);
-		WARN_ON_ONCE(ret);
-	}
-
 	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
 					   CGROUP_LIFETIME_ONLINE, root_cgrp);
 	WARN_ON_ONCE(notifier_to_errno(ret));
@@ -5759,20 +5754,12 @@ static struct cgroup *cgroup_create(stru
 
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
-	if (cgrp->root == &cgrp_dfl_root) {
-		ret = cgroup_bpf_inherit(cgrp);
-		if (ret)
-			goto out_psi_free;
-	}
-
 	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
 						  CGROUP_LIFETIME_ONLINE,
 						  CGROUP_LIFETIME_OFFLINE, cgrp);
 	ret = notifier_to_errno(ret);
-	if (ret) {
-		cgroup_bpf_offline(cgrp);
+	if (ret)
 		goto out_psi_free;
-	}
 
 	/* allocation complete, commit to creation */
 	spin_lock_irq(&css_set_lock);
@@ -6059,9 +6046,6 @@ static int cgroup_destroy_locked(struct
 
 	cgroup1_check_for_release(parent);
 
-	if (cgrp->root == &cgrp_dfl_root)
-		cgroup_bpf_offline(cgrp);
-
 	ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier,
 					   CGROUP_LIFETIME_OFFLINE, cgrp);
 	WARN_ON_ONCE(notifier_to_errno(ret));
@@ -6215,6 +6199,8 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist,
 		 css_set_hash(init_css_set.subsys));
 
+	cgroup_bpf_lifetime_notifier_init();
+
 	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
 
 	cgroup_unlock();
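Patch 3 moves the default-root check and the online/offline dispatch into cgroup_bpf_lifetime_notify(). The userspace sketch below copies that callback's shape. struct model_root, struct model_cgroup, dfl_root, model_bpf_inherit(), model_bpf_offline() and the fail_inherit knob are hypothetical stand-ins. model_bpf_inherit() is assumed to leave no state behind when it fails, which is what cgroup_bpf_inherit() appears to do through its cleanup: path. That assumption matters because the robust chain never sends OFFLINE to the callback whose ONLINE failed. It is also why the explicit cgroup_bpf_offline() call on cgroup_create()'s error path can go.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrored from include/linux/notifier.h (verify against your tree). */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

static int notifier_from_errno(int err)
{
	return err ? (NOTIFY_STOP_MASK | (NOTIFY_OK - err)) : NOTIFY_OK;
}

static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

enum { LIFETIME_ONLINE, LIFETIME_OFFLINE }; /* stand-ins for CGROUP_LIFETIME_* */

/* Hypothetical stand-ins for cgroup_root, cgrp_dfl_root and struct cgroup. */
struct model_root { int id; };
static struct model_root dfl_root;

struct model_cgroup {
	struct model_root *root;
	bool bpf_online;    /* set once the inherit step succeeds */
	bool fail_inherit;  /* test knob: force the inherit step to fail */
};

/* Assumed contract: on failure, nothing is left half-initialized. */
static int model_bpf_inherit(struct model_cgroup *cgrp)
{
	if (cgrp->fail_inherit)
		return -ENOMEM;
	cgrp->bpf_online = true;
	return 0;
}

static void model_bpf_offline(struct model_cgroup *cgrp)
{
	cgrp->bpf_online = false;
}

/* Same shape as cgroup_bpf_lifetime_notify(): skip non-default roots,
 * dispatch on the event, and encode any error for the chain. */
static int model_bpf_notify(unsigned long action, void *data)
{
	struct model_cgroup *cgrp = data;
	int ret = 0;

	if (cgrp->root != &dfl_root)
		return NOTIFY_OK;

	switch (action) {
	case LIFETIME_ONLINE:
		ret = model_bpf_inherit(cgrp);
		break;
	case LIFETIME_OFFLINE:
		model_bpf_offline(cgrp);
		break;
	}
	return notifier_from_errno(ret);
}
```

A cgroup on another hierarchy is acknowledged with NOTIFY_OK and left untouched. A failed inherit on the default hierarchy surfaces as -ENOMEM with no state to undo.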
* Re: [PATCH 3/3 cgroup/for-6.16] sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
From: Tejun Heo @ 2025-05-22 19:16 UTC
To: Johannes Weiner, Michal Koutný; +Cc: cgroups, linux-kernel, bpf

On Wed, May 14, 2025 at 12:46:12AM -0400, Tejun Heo wrote:
> Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
> creation/destruction paths with notification callback registered on
> cgroup_lifetime_notifier.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applied 1-3 to cgroup/for-6.16.

Thanks.

--
tejun
* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
From: Michal Koutný @ 2025-06-02 15:07 UTC
To: Tejun Heo; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf

On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> support being one such example. For such a feature, it's useful to be able
<snip>

> other uses are planned.
<snip>

> @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
>  			goto out_psi_free;
>  	}
>
> +	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> +						  CGROUP_LIFETIME_ONLINE,
> +						  CGROUP_LIFETIME_OFFLINE, cgrp);

This is with cgroup_mutex taken.

Wouldn't it be more prudent to start with atomic or raw notifier chain?
(To prevent future unwitting expansion of cgroup_mutex.)

Thanks,
Michal
* Re: [PATCH 2/3 cgroup/for-6.16] sched_ext: Introduce cgroup_lifetime_notifier
From: Tejun Heo @ 2025-06-02 17:30 UTC
To: Michal Koutný; +Cc: Johannes Weiner, cgroups, linux-kernel, bpf

Hello,

On Mon, Jun 02, 2025 at 05:07:39PM +0200, Michal Koutný wrote:
> On Wed, May 14, 2025 at 12:44:44AM -0400, Tejun Heo <tj@kernel.org> wrote:
> > Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
> > support being one such example. For such a feature, it's useful to be able
> <snip>
>
> > other uses are planned.
> <snip>
>
> > @@ -5753,6 +5765,15 @@ static struct cgroup *cgroup_create(stru
> >  			goto out_psi_free;
> >  	}
> >
> > +	ret = blocking_notifier_call_chain_robust(&cgroup_lifetime_notifier,
> > +						  CGROUP_LIFETIME_ONLINE,
> > +						  CGROUP_LIFETIME_OFFLINE, cgrp);
>
> This is with cgroup_mutex taken.
>
> Wouldn't it be more prudent to start with atomic or raw notifier chain?
> (To prevent future unwitting expansion of cgroup_mutex.)

This being primarily useful for init/exiting stuff, I think it'd be
reasonable to expect memory allocations. e.g. Even the existing BPF cgroup
support needs sleepable context for percpu_ref init and prog allocations.
If cgroup_mutex gets involved in locking dep loops, it'll light up lockdep,
so I'm not *too* worried.

Thanks.

--
tejun