From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com,
hannes@cmpxchg.org, akpm@linux-foundation.org,
linux-mm@kvack.org, cgroups@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 7 May 2025 09:24:43 +0000 [thread overview]
Message-ID: <aBsm22A8qWjGJgY9@google.com> (raw)
In-Reply-To: <20250503001222.146355-3-inwardvessel@gmail.com>
On Fri, May 02, 2025 at 05:12:19PM -0700, JP Kobryn wrote:
> Different subsystems may call cgroup_rstat_updated() within the same
> cgroup, resulting in a tree of pending updates from multiple subsystems.
> When one of these subsystems is flushed via cgroup_rstat_flushed(), all
> other subsystems with pending updates on the tree will also be flushed.
>
> Change the paradigm of having a single rstat tree for all subsystems to
> having separate trees for each subsystem. This separation allows for
> subsystems to perform flushes without the side effects of other subsystems.
> As an example, flushing the cpu stats will no longer cause the memory stats
> to be flushed and vice versa.
>
> In order to achieve subsystem-specific trees, change the tree node type
> from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
> the cgroup and instead place them on the css. Finally, change update/flush
> functions to make use of the different node type (css). These changes allow
> a specific subsystem to be associated with an update or flush. Separate
> rstat trees will now exist for each unique subsystem.
>
> Since updating/flushing will now be done at the subsystem level, there is
> no longer a need to keep track of updated css nodes at the cgroup level.
> The list management of these nodes done within the cgroup (rstat_css_list
> and related) has been removed accordingly.
>
> Conditional guards for checking validity of a given css were placed within
> css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc
> usage in bpf programs. Guards were also placed within css_rstat_init/exit()
> in order to help consolidate calls to them. At call sites for all four
> functions, the existing guards were removed.
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
> include/linux/cgroup-defs.h | 46 ++--
> kernel/cgroup/cgroup.c | 34 +--
> kernel/cgroup/rstat.c | 200 ++++++++++--------
> .../selftests/bpf/progs/btf_type_tag_percpu.c | 18 +-
> 4 files changed, 160 insertions(+), 138 deletions(-)
[..]
> @@ -6101,6 +6087,8 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
> } else {
> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
> BUG_ON(css->id < 0);
> +
> + BUG_ON(css_rstat_init(css));
We call css_rstat_init() here for subsys css's that are not early
initialized, and in cgroup_setup_root() self css's. We can probably move
both calls into cgroup_init() as I mentioned earlier?
Also, I think this version just skips calling css_rstat_init() for early
initialized subsys css's, without adding the patch that you talked about
earlier which protects against early initialized subsystems using rstat.
> }
>
> /* Update the init_css_set to contain a subsys
[..]
> @@ -217,31 +225,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
> }
>
> /**
> - * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
> - * @root: root of the cgroup subtree to traverse
> + * css_rstat_updated_list - return a list of updated cgroups to be flushed
css's?
> + * @root: root of the css subtree to traverse
> * @cpu: target cpu
> * Return: A singly linked list of cgroups to be flushed
> *
> * Walks the updated rstat_cpu tree on @cpu from @root. During traversal,
> - * each returned cgroup is unlinked from the updated tree.
> + * each returned css is unlinked from the updated tree.
> *
> * The only ordering guarantee is that, for a parent and a child pair
> * covered by a given traversal, the child is before its parent in
> * the list.
> *
> * Note that updated_children is self terminated and points to a list of
> - * child cgroups if not empty. Whereas updated_next is like a sibling link
> - * within the children list and terminated by the parent cgroup. An exception
> + * child css's if not empty. Whereas updated_next is like a sibling link
> + * within the children list and terminated by the parent css. An exception
> * here is the cgroup root whose updated_next can be self terminated.
> */
[..]
> @@ -383,32 +395,45 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
>
> int css_rstat_init(struct cgroup_subsys_state *css)
> {
> - struct cgroup *cgrp = css->cgroup;
> + struct cgroup *cgrp;
> int cpu;
> + bool is_cgroup = css_is_cgroup(css);
>
> - /* the root cgrp has rstat_cpu preallocated */
> - if (!cgrp->rstat_cpu) {
> - cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
> - if (!cgrp->rstat_cpu)
> - return -ENOMEM;
> - }
> + if (is_cgroup) {
> + cgrp = css->cgroup;
You can keep 'cgrp' initialized at the top of the function to avoid the
extra level of indentation here, right?
>
> - if (!cgrp->rstat_base_cpu) {
> - cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> + /* the root cgrp has rstat_base_cpu preallocated */
> if (!cgrp->rstat_base_cpu) {
> - free_percpu(cgrp->rstat_cpu);
> + cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> + if (!cgrp->rstat_base_cpu)
> + return -ENOMEM;
> + }
> + } else if (css->ss->css_rstat_flush == NULL)
> + return 0;
We can probably just do this at the beginning of the function to be able
to use the helper:
if (!css_is_cgroup(css) && css->ss->css_rstat_flush == NULL)
return 0;
Also, when the return value of css_is_cgroup() is cached as is_cgroup it
makes me hate the function name even more, because 'is_cgroup' is very
confusing for a css in my opinion since they all represent cgroups.
I really think this should be css_is_self() (or css_is_self_cgroup())
and the variable names would be 'is_self'.
> +
> + /* the root cgrp's self css has rstat_cpu preallocated */
> + if (!css->rstat_cpu) {
> + css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
> + if (!css->rstat_cpu) {
> + if (is_cgroup)
> + free_percpu(cgrp->rstat_base_cpu);
> +
> return -ENOMEM;
> }
> }
>
> /* ->updated_children list is self terminated */
> for_each_possible_cpu(cpu) {
> - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> - struct cgroup_rstat_base_cpu *rstatbc =
> - cgroup_rstat_base_cpu(cgrp, cpu);
> + struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
>
> - rstatc->updated_children = cgrp;
> - u64_stats_init(&rstatbc->bsync);
> + rstatc->updated_children = css;
> +
> + if (is_cgroup) {
> + struct cgroup_rstat_base_cpu *rstatbc;
> +
> + rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
> + u64_stats_init(&rstatbc->bsync);
> + }
> }
>
> return 0;
[..]
next prev parent reply other threads:[~2025-05-07 9:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-03 0:12 [PATCH v5 0/5] cgroup: separate rstat trees JP Kobryn
2025-05-03 0:12 ` [PATCH v5 1/5] cgroup: use helper for distingushing css in callbacks JP Kobryn
2025-05-06 0:52 ` Shakeel Butt
2025-05-07 9:02 ` Yosry Ahmed
2025-05-09 21:46 ` JP Kobryn
2025-05-03 0:12 ` [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-05-07 9:24 ` Yosry Ahmed [this message]
2025-05-09 17:53 ` JP Kobryn
2025-05-12 17:30 ` JP Kobryn
2025-05-03 0:12 ` [PATCH v5 3/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-05-07 9:37 ` Yosry Ahmed
2025-05-03 0:12 ` [PATCH v5 4/5] cgroup: helper for checking rstat participation of css JP Kobryn
2025-05-07 9:38 ` Yosry Ahmed
2025-05-03 0:12 ` [PATCH v5 5/5] cgroup: document the rstat per-cpu initialization JP Kobryn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBsm22A8qWjGJgY9@google.com \
--to=yosry.ahmed@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=inwardvessel@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-mm@kvack.org \
--cc=mkoutny@suse.com \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.