* [PATCH v6 0/6] cgroup: separate rstat trees
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
The current design of rstat takes the approach that if one subsystem is
flushed, all other subsystems with pending updates should also be flushed.
A flush may be initiated by reading specific stats (like cpu.stat), and
any other subsystems with pending updates are flushed alongside it. The
complexity of flushing some subsystems has grown to the extent that the
overhead of these side flushes causes noticeable delays when reading the
desired stats.
One big area where the issue comes up is system telemetry, where programs
periodically sample cpu stats while the memory controller is enabled.
Programs sampling cpu.stat would benefit if the overhead of also having to
flush memory (and io) stats were eliminated. It would save cpu cycles for
existing stat reader programs and improve scalability in terms of sampling
frequency and the number of hosts sampled.
This series changes the approach of "flush all subsystems" to "flush only
the requested subsystem". The core design change is moving from a unified
model where rstat trees are shared by subsystems to having separate trees
for each subsystem. On a per-cpu basis, there will be separate trees for
each enabled subsystem that implements css_rstat_flush plus one tree
dedicated to the base stats. In order to do this, the rstat list pointers
were moved off of the cgroup and onto the css, and their types were changed
from cgroup to cgroup_subsys_state. Finally, the updated/flush API was
changed to accept a reference to a css instead of a cgroup. This allows a
specific subsystem to be associated with a given update or flush. The
result is that rstat trees are now made up of css nodes, and a given tree
only contains nodes associated with a specific subsystem.
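As a rough illustration of the API change (memcg is used here only as an
example caller; see the patches for the actual conversions):

  /* before this series: updates/flushes are keyed by cgroup */
  cgroup_rstat_updated(memcg->css.cgroup, cpu);
  cgroup_rstat_flush(memcg->css.cgroup);

  /* after this series: keyed by css, so only the memory tree is involved */
  css_rstat_updated(&memcg->css, cpu);
  css_rstat_flush(&memcg->css);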
Since separate trees will now be in use, the locking scheme was adjusted.
The global locks were split up in such a way that there are separate locks
for the base stats and for each subsystem (memory, io, etc.). This
allows different subsystems (and base stats) to use rstat in parallel with
no contention.
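Lock selection then becomes a simple function of the css's subsystem.
Patch 4 adds a helper along these lines:

  static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
  {
          if (ss)
                  return &ss->rstat_ss_lock;

          return &rstat_base_lock;
  }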
Breaking up the unified tree into separate trees eliminates the overhead
and scalability issues explained in the first section, but comes at the
cost of additional memory. Originally, each cgroup contained an instance of
the cgroup_rstat_cpu. The design change of moving to css-based trees calls
for each css having the rstat per-cpu objects instead. Moving these objects
to every css is where this overhead is created. In an effort to minimize
this, the cgroup_rstat_cpu struct was split into two separate structs. One
is the cgroup_rstat_base_cpu struct which only contains the per-cpu base
stat objects used in rstat. The other is the css_rstat_cpu struct which
contains the minimum amount of pointers needed for a css to participate in
rstat. Since only the cgroup::self css is associated with the base stats,
an instance of the cgroup_rstat_base_cpu struct is placed on the cgroup.
Meanwhile an instance of the css_rstat_cpu is placed on the
cgroup_subsys_state. This allows for all css's to participate in rstat
while avoiding the unnecessary inclusion of the base stats. The base stat
objects exist only once per cgroup regardless of how many subsystems are
enabled. With this division of rstat list pointers and base stats, the
change in per-cpu memory overhead before/after is shown after the sketch
below.
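For reference, a rough sketch of the split (struct css_rstat_cpu as
introduced in patch 3; the cgroup_rstat_base_cpu contents are abbreviated
here and are config-dependent):

  /* per-css: only the list pointers needed to sit on an rstat tree */
  struct css_rstat_cpu {
          struct cgroup_subsys_state *updated_children; /* terminated by self */
          struct cgroup_subsys_state *updated_next;     /* NULL if not on the list */
  };

  /* per-cgroup: base stat counters, only needed for cgroup::self */
  struct cgroup_rstat_base_cpu {
          /* ... bsync, bstat, last_bstat, ... */
  };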
memory overhead before:

  nr_cgroups * sizeof(struct cgroup_rstat_cpu)

  where
        sizeof(struct cgroup_rstat_cpu) = 144 bytes /* config-dependent */

  resulting in
        nr_cgroups * 144 bytes

memory overhead after:

  nr_cgroups * (
        sizeof(struct cgroup_rstat_base_cpu) +
        sizeof(struct css_rstat_cpu) * (1 + nr_rstat_controllers)
  )

  where
        sizeof(struct cgroup_rstat_base_cpu) = 128 bytes
        sizeof(struct css_rstat_cpu) = 16 bytes
        the constant "1" accounts for the cgroup::self css
        nr_rstat_controllers = number of controllers defining css_rstat_flush

  when both memory and io are enabled
        nr_rstat_controllers = 2

  resulting in
        nr_cgroups * (128 + 16 * (1 + 2))
        nr_cgroups * 176 bytes

This leaves us with an increase in memory overhead of:

        32 bytes per cgroup per cpu
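As a quick sanity check of the arithmetic above, a small standalone sketch
(not part of this series; nr_cgroups and the controller count are
hypothetical inputs, the sizes are the config-dependent values quoted
above):

  /* overhead_check.c - hypothetical helper, not part of this series */
  #include <stdio.h>

  #define SZ_CGROUP_RSTAT_CPU     144UL   /* old combined struct */
  #define SZ_RSTAT_BASE_CPU       128UL   /* new: base stats only */
  #define SZ_CSS_RSTAT_CPU         16UL   /* new: rstat list pointers only */

  int main(void)
  {
          unsigned long nr_cgroups = 1000;        /* hypothetical */
          unsigned long nr_rstat_controllers = 2; /* memory + io */
          unsigned long before, after;

          before = nr_cgroups * SZ_CGROUP_RSTAT_CPU;
          after = nr_cgroups * (SZ_RSTAT_BASE_CPU +
                          SZ_CSS_RSTAT_CPU * (1 + nr_rstat_controllers));

          printf("per-cpu overhead: before=%lu after=%lu (+%lu bytes/cgroup)\n",
                 before, after, (after - before) / nr_cgroups);
          return 0;
  }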
Validation was performed by reading some *.stat files of a target parent
cgroup while the system was under different workloads. A test program was
made to loop 1M times, reading the files cgroup.stat, cpu.stat, io.stat,
memory.stat of the parent cgroup on each iteration. Using an unpatched
kernel as the control and this series as the experiment, the results show
performance gains when reading stats.
The first experiment consisted of a parent cgroup with memory.swap.max=0
and memory.max=1G. On a 52-cpu machine, 26 child cgroups were created and
within each child cgroup a process was spawned to frequently update the
memory cgroup stats by creating and then reading a file of size 1T
(encouraging reclaim). The test program was run alongside these 26 tasks in
parallel. The results showed time and perf gains for the reader test
program.
test program elapsed time

control:
        real    1m48.716s
        user    0m0.968s
        sys     1m47.271s

experiment:
        real    1m2.455s
        user    0m0.849s
        sys     1m1.349s

test program perf

control:
        31.73% mem_cgroup_css_rstat_flush
         5.43% __blkcg_rstat_flush
         0.06% cpu_stat_show

experiment:
         4.28% mem_cgroup_css_rstat_flush
         0.30% blkcg_print_stat
         0.06% cpu_stat_show
It's worth noting that memcg uses heuristics to optimize flushing.
Depending on the state of updated stats at a given time, a memcg flush may
be considered unnecessary and skipped as a result. This opportunity to skip
a flush is bypassed when memcg is flushed as a consequence of sharing the
tree with another controller.
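To illustrate the idea only, a rough hypothetical sketch of such a skip
heuristic (this is not the actual memcg code; the names and threshold are
made up):

  /* hypothetical sketch, not the actual memcg implementation */
  #define FLUSH_UPDATE_THRESHOLD  1024

  static bool flush_is_worthwhile(struct mem_cgroup_stats *stats)
  {
          /* a flush may be skipped if too few updates have accumulated */
          return atomic_long_read(&stats->nr_pending_updates) >
                  FLUSH_UPDATE_THRESHOLD;
  }

  /*
   * With a shared tree, a cpu.stat or io.stat read flushed memcg regardless
   * of any such check; with separate trees the heuristic applies whenever
   * memcg itself decides to flush.
   */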
A second experiment was set up on the same host using a parent cgroup with
two child cgroups. In the two child cgroups, kernel builds were done in
parallel, each using "-j 20". The perf comparison is shown below.
test program elapsed time

control:
        real    1m55.079s
        user    0m1.147s
        sys     1m53.153s

experiment:
        real    1m0.840s
        user    0m0.999s
        sys     0m59.413s

test program perf

control:
        34.52% mem_cgroup_css_rstat_flush
         4.24% __blkcg_rstat_flush
         0.07% cpu_stat_show

experiment:
         2.06% mem_cgroup_css_rstat_flush
         0.17% blkcg_print_stat
         0.11% cpu_stat_show
The final experiment differs from the previous two in that it measures
performance from the stat updater perspective. A kernel build was run in a
child node with -j 20 on the same host and cgroup setup. A baseline was
established by having the build run while no stats were read. The builds
were then repeated while stats were constantly being read. In all cases,
perf appeared similar in cycles spent on cgroup_rstat_updated()
(insignificant compared to the other recorded events). As for the elapsed
build times, the results of the different scenarios are shown below,
showing no significant drawbacks of the split tree approach.
control with no readers
        real    5m12.240s
        user    84m51.325s
        sys     3m53.507s

control with constant readers of {memory,io,cpu,cgroup}.stat
        real    5m13.485s
        user    84m51.361s
        sys     4m7.204s

experiment with no readers
        real    5m12.123s
        user    84m50.655s
        sys     3m53.344s

experiment with constant readers of {memory,io,cpu,cgroup}.stat
        real    5m13.936s
        user    85m11.534s
        sys     4m4.301s
changelog
v6:
add patch for warning on invalid rstat and early init combination
change impl of css_is_cgroup() and rename to css_is_self()
change impl of ss_rstat_init() so there is only one per-cpu loop
move "cgrp" pointer initialization to top of css_rstat_init()
rename is_rstat_css() to css_uses_rstat()
change comment "subsystem lock" to include "rstat" in blk-cgroup.c
change comment "cgroup" to "css" in updated_list/push_children
v5:
new patch for using css_is_cgroup() in more places
new patch adding is_css_rstat() helper
new patch documenting circumstances behind where css_rstat_init occurs
check if css is cgroup early in css_rstat_flush()
remove ss->css_rstat_flush check in flush loop
fix css_rstat_flush where "pos" should be used instead of "css"
change lockdep text in __css_rstat_lock/unlock()
remove unnecessary base lock init in ss_rstat_init()
guard against invalid css in css_rstat_updated/flush()
guard against invalid css in css_rstat_init/exit()
call css_rstat_updated/flush and css_rstat_init/exit unconditionally
consolidate calls to css_rstat_exit() into one (aside from error cases)
eliminate call to css_rstat_init() in cgroup_init() for ss->early_init
move comment changes to matching commits where applicable
fix comment with mention of stale function css_rstat_flush_locked()
fix comment referring to "cgroup" where "css" should be used
v4:
drop bpf api patch
drop cgroup_rstat_cpu split and union patch,
replace with patch for moving base stats into new struct
new patch for renaming rstat api's from cgroup_* to css_*
new patch for adding css_is_cgroup() helper
rename ss->lock and ss->cpu_lock to ss->rstat_ss_lock and
ss->rstat_ss_cpu_lock respectively
rename root_self_stat_cpu to root_base_rstat_cpu
rename cgroup_rstat_push_children to css_rstat_push_children
format comments for consistency in wings and capitalization
update comments in bpf selftests
v3:
new bpf kfunc api for updated/flush
rename cgroup_rstat_{updated,flush} and related to "css_rstat_*"
check for ss->css_rstat_flush existence where applicable
rename locks for base stats
move subsystem locks to cgroup_subsys struct
change cgroup_rstat_boot() to ss_rstat_init(ss) and init locks within
change lock helpers to accept css and perform lock selection within
fix comments that had outdated lock names
add open css_is_cgroup() helper
rename rstatc to rstatbc to reflect base stats in use
rename cgroup_dfl_root_rstat_cpu to root_self_rstat_cpu
add comments in early init code to explain deferred allocation
misc formatting fixes
v2:
drop the patch creating a new cgroup_rstat struct and related code
drop bpf-specific patches. instead just use cgroup::self in bpf progs
drop the cpu lock patches. instead select cpu lock in updated_list func
relocate the cgroup_rstat_init() call to inside css_create()
relocate the cgroup_rstat_exit() cleanup from apply_control_enable()
to css_free_rwork_fn()
v1:
https://lore.kernel.org/all/20250218031448.46951-1-inwardvessel@gmail.com/
JP Kobryn (6):
cgroup: warn on rstat usage by early init subsystems
cgroup: compare css to cgroup::self in helper for distinguishing css
cgroup: use separate rstat trees for each subsystem
cgroup: use subsystem-specific rstat locks to avoid contention
cgroup: helper for checking rstat participation of css
cgroup: document the rstat per-cpu initialization
block/blk-cgroup.c | 4 +-
include/linux/cgroup-defs.h | 78 +++--
include/linux/cgroup.h | 10 +-
include/trace/events/cgroup.h | 12 +-
kernel/cgroup/cgroup-internal.h | 2 +-
kernel/cgroup/cgroup.c | 47 ++-
kernel/cgroup/rstat.c | 313 +++++++++++-------
.../selftests/bpf/progs/btf_type_tag_percpu.c | 18 +-
8 files changed, 301 insertions(+), 183 deletions(-)
--
2.47.1
* [PATCH v6 1/6] cgroup: warn on rstat usage by early init subsystems
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
An early init subsystem that attempts to make use of rstat can lead to
failures during early boot. The reason is the point at which css_online()
is invoked on the css's of the root cgroup. At the time of this call, there
is a stated assumption that a cgroup has "successfully completed all
allocations" [0]. The memory subsystem is one example of a subsystem that
relies on this assumption: within its implementation of css_online(), work
is queued to asynchronously begin flushing via rstat. In the early init
path for a given subsystem, having rstat enabled leads to this sequence:
cgroup_init_early()
    for_each_subsys(ss, ssid)
        if (ss->early_init)
            cgroup_init_subsys(ss, true)

cgroup_init_subsys(ss, early_init)
    css = ss->css_alloc(...)
    init_and_link_css(css, ss, ...)
    ...
    online_css(css)

online_css(css)
    ss = css->ss
    ss->css_online(css)
Continuing to use the memory subsystem as an example, the issue with this
sequence is that css_rstat_init() has not been called yet. This means there
is now a race between the pending async work to flush rstat and the call to
css_rstat_init(). So a flush can occur within the given cgroup while the
rstat fields are not initialized.
Since we are in the early init phase, the rstat fields cannot be
initialized because they require per-cpu allocations. So it's not possible
to have css_rstat_init() called early enough (before online_css()). This
patch treats the combination of early init and rstat the same as other
invalid conditions.
[0] Documentation/admin-guide/cgroup-v1/cgroups.rst (section: css_online)
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
kernel/cgroup/cgroup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 7471811a00de..83b35c22da95 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6149,6 +6149,8 @@ int __init cgroup_init_early(void)
ss->id, ss->name);
WARN(strlen(cgroup_subsys_name[i]) > MAX_CGROUP_TYPE_NAMELEN,
"cgroup_subsys_name %s too long\n", cgroup_subsys_name[i]);
+ WARN(ss->early_init && ss->css_rstat_flush,
+ "cgroup rstat cannot be used with early init subsystem\n");
ss->id = i;
ss->name = cgroup_subsys_name[i];
--
2.47.1
* [PATCH v6 2/6] cgroup: compare css to cgroup::self in helper for distinguishing css
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
Adjust the implementation of css_is_cgroup() so that it compares the given
css to cgroup::self. Rename the function to css_is_self() in order to
reflect that. Change the existing css->ss NULL check to a warning in the
true branch. Finally, adjust call sites to use the new function name.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
include/linux/cgroup.h | 10 ++++++++--
kernel/cgroup/cgroup.c | 8 ++++----
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 04d4ccc7b1c5..00cb37b6fdab 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -347,9 +347,15 @@ static inline bool css_is_dying(struct cgroup_subsys_state *css)
return css->flags & CSS_DYING;
}
-static inline bool css_is_cgroup(struct cgroup_subsys_state *css)
+static inline bool css_is_self(struct cgroup_subsys_state *css)
{
- return css->ss == NULL;
+ if (css == &css->cgroup->self) {
+ /* cgroup::self should not have subsystem association */
+ WARN_ON(css->ss != NULL);
+ return true;
+ }
+
+ return false;
}
static inline void cgroup_get(struct cgroup *cgrp)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 83b35c22da95..ce6a60b9b585 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1706,7 +1706,7 @@ static void css_clear_dir(struct cgroup_subsys_state *css)
css->flags &= ~CSS_VISIBLE;
- if (css_is_cgroup(css)) {
+ if (css_is_self(css)) {
if (cgroup_on_dfl(cgrp)) {
cgroup_addrm_files(css, cgrp,
cgroup_base_files, false);
@@ -1738,7 +1738,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
if (css->flags & CSS_VISIBLE)
return 0;
- if (css_is_cgroup(css)) {
+ if (css_is_self(css)) {
if (cgroup_on_dfl(cgrp)) {
ret = cgroup_addrm_files(css, cgrp,
cgroup_base_files, true);
@@ -5406,7 +5406,7 @@ static void css_free_rwork_fn(struct work_struct *work)
percpu_ref_exit(&css->refcnt);
- if (ss) {
+ if (!css_is_self(css)) {
/* css free path */
struct cgroup_subsys_state *parent = css->parent;
int id = css->id;
@@ -5460,7 +5460,7 @@ static void css_release_work_fn(struct work_struct *work)
css->flags |= CSS_RELEASED;
list_del_rcu(&css->sibling);
- if (ss) {
+ if (!css_is_self(css)) {
struct cgroup *parent_cgrp;
/* css release path */
--
2.47.1
* [PATCH v6 3/6] cgroup: use separate rstat trees for each subsystem
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flush(), all
other subsystems with pending updates on the tree will also be flushed.
Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows for
subsystems to perform flushes without the side effects of other subsystems.
As an example, flushing the cpu stats will no longer cause the memory stats
to be flushed and vice versa.
In order to achieve subsystem-specific trees, change the tree node type
from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
the cgroup and instead place them on the css. Finally, change update/flush
functions to make use of the different node type (css). These changes allow
a specific subsystem to be associated with an update or flush. Separate
rstat trees will now exist for each unique subsystem.
Since updating/flushing will now be done at the subsystem level, there is
no longer a need to keep track of updated css nodes at the cgroup level.
The list management of these nodes done within the cgroup (rstat_css_list
and related) has been removed accordingly.
Conditional guards for checking validity of a given css were placed within
css_rstat_updated/flush() to prevent undefined behavior occurring from kfunc
usage in bpf programs. Guards were also placed within css_rstat_init/exit()
in order to help consolidate calls to them. At call sites for all four
functions, the existing guards were removed.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
include/linux/cgroup-defs.h | 46 ++--
kernel/cgroup/cgroup.c | 34 +--
kernel/cgroup/rstat.c | 206 ++++++++++--------
.../selftests/bpf/progs/btf_type_tag_percpu.c | 18 +-
4 files changed, 162 insertions(+), 142 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index e58bfb880111..17ecaae9c5f8 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -169,6 +169,8 @@ struct cgroup_subsys_state {
/* reference count - access via css_[try]get() and css_put() */
struct percpu_ref refcnt;
+ struct css_rstat_cpu __percpu *rstat_cpu;
+
/*
* siblings list anchored at the parent's ->children
*
@@ -177,9 +179,6 @@ struct cgroup_subsys_state {
struct list_head sibling;
struct list_head children;
- /* flush target list anchored at cgrp->rstat_css_list */
- struct list_head rstat_css_node;
-
/*
* PI: Subsys-unique ID. 0 is unused and root is always 1. The
* matching css can be looked up using css_from_id().
@@ -219,6 +218,13 @@ struct cgroup_subsys_state {
* Protected by cgroup_mutex.
*/
int nr_descendants;
+
+ /*
+ * A singly-linked list of css structures to be rstat flushed.
+ * This is a scratch field to be used exclusively by
+ * css_rstat_flush() and protected by cgroup_rstat_lock.
+ */
+ struct cgroup_subsys_state *rstat_flush_next;
};
/*
@@ -329,10 +335,10 @@ struct cgroup_base_stat {
/*
* rstat - cgroup scalable recursive statistics. Accounting is done
- * per-cpu in cgroup_rstat_cpu which is then lazily propagated up the
+ * per-cpu in css_rstat_cpu which is then lazily propagated up the
* hierarchy on reads.
*
- * When a stat gets updated, the cgroup_rstat_cpu and its ancestors are
+ * When a stat gets updated, the css_rstat_cpu and its ancestors are
* linked into the updated tree. On the following read, propagation only
* considers and consumes the updated tree. This makes reading O(the
* number of descendants which have been active since last read) instead of
@@ -346,20 +352,20 @@ struct cgroup_base_stat {
* This struct hosts both the fields which implement the above -
* updated_children and updated_next.
*/
-struct cgroup_rstat_cpu {
+struct css_rstat_cpu {
/*
* Child cgroups with stat updates on this cpu since the last read
* are linked on the parent's ->updated_children through
- * ->updated_next.
+ * ->updated_next. updated_children is terminated by its container css.
*
- * In addition to being more compact, singly-linked list pointing
- * to the cgroup makes it unnecessary for each per-cpu struct to
- * point back to the associated cgroup.
+ * In addition to being more compact, singly-linked list pointing to
+ * the css makes it unnecessary for each per-cpu struct to point back
+ * to the associated css.
*
* Protected by per-cpu cgroup_rstat_cpu_lock.
*/
- struct cgroup *updated_children; /* terminated by self cgroup */
- struct cgroup *updated_next; /* NULL iff not on the list */
+ struct cgroup_subsys_state *updated_children;
+ struct cgroup_subsys_state *updated_next; /* NULL if not on the list */
};
/*
@@ -521,25 +527,15 @@ struct cgroup {
struct cgroup *dom_cgrp;
struct cgroup *old_dom_cgrp; /* used while enabling threaded */
- /* per-cpu recursive resource statistics */
- struct cgroup_rstat_cpu __percpu *rstat_cpu;
struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu;
- struct list_head rstat_css_list;
/*
- * Add padding to separate the read mostly rstat_cpu and
- * rstat_css_list into a different cacheline from the following
- * rstat_flush_next and *bstat fields which can have frequent updates.
+ * Add padding to keep the read mostly rstat per-cpu pointer on a
+ * different cacheline than the following *bstat fields which can have
+ * frequent updates.
*/
CACHELINE_PADDING(_pad_);
- /*
- * A singly-linked list of cgroup structures to be rstat flushed.
- * This is a scratch field to be used exclusively by
- * css_rstat_flush_locked() and protected by cgroup_rstat_lock.
- */
- struct cgroup *rstat_flush_next;
-
/* cgroup basic resource statistics */
struct cgroup_base_stat last_bstat;
struct cgroup_base_stat bstat;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ce6a60b9b585..45097dc9e099 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -161,12 +161,12 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
};
#undef SUBSYS
-static DEFINE_PER_CPU(struct cgroup_rstat_cpu, root_rstat_cpu);
+static DEFINE_PER_CPU(struct css_rstat_cpu, root_rstat_cpu);
static DEFINE_PER_CPU(struct cgroup_rstat_base_cpu, root_rstat_base_cpu);
/* the default hierarchy */
struct cgroup_root cgrp_dfl_root = {
- .cgrp.rstat_cpu = &root_rstat_cpu,
+ .cgrp.self.rstat_cpu = &root_rstat_cpu,
.cgrp.rstat_base_cpu = &root_rstat_base_cpu,
};
EXPORT_SYMBOL_GPL(cgrp_dfl_root);
@@ -1362,7 +1362,6 @@ static void cgroup_destroy_root(struct cgroup_root *root)
cgroup_unlock();
- css_rstat_exit(&cgrp->self);
kernfs_destroy_root(root->kf_root);
cgroup_free_root(root);
}
@@ -1867,13 +1866,6 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
}
spin_unlock_irq(&css_set_lock);
- if (ss->css_rstat_flush) {
- list_del_rcu(&css->rstat_css_node);
- synchronize_rcu();
- list_add_rcu(&css->rstat_css_node,
- &dcgrp->rstat_css_list);
- }
-
/* default hierarchy doesn't enable controllers by default */
dst_root->subsys_mask |= 1 << ssid;
if (dst_root == &cgrp_dfl_root) {
@@ -2056,7 +2048,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
cgrp->dom_cgrp = cgrp;
cgrp->max_descendants = INT_MAX;
cgrp->max_depth = INT_MAX;
- INIT_LIST_HEAD(&cgrp->rstat_css_list);
prev_cputime_init(&cgrp->prev_cputime);
for_each_subsys(ss, ssid)
@@ -5405,6 +5396,7 @@ static void css_free_rwork_fn(struct work_struct *work)
struct cgroup *cgrp = css->cgroup;
percpu_ref_exit(&css->refcnt);
+ css_rstat_exit(css);
if (!css_is_self(css)) {
/* css free path */
@@ -5435,7 +5427,6 @@ static void css_free_rwork_fn(struct work_struct *work)
cgroup_put(cgroup_parent(cgrp));
kernfs_put(cgrp->kn);
psi_cgroup_free(cgrp);
- css_rstat_exit(css);
kfree(cgrp);
} else {
/*
@@ -5463,11 +5454,7 @@ static void css_release_work_fn(struct work_struct *work)
if (!css_is_self(css)) {
struct cgroup *parent_cgrp;
- /* css release path */
- if (!list_empty(&css->rstat_css_node)) {
- css_rstat_flush(css);
- list_del_rcu(&css->rstat_css_node);
- }
+ css_rstat_flush(css);
cgroup_idr_replace(&ss->css_idr, NULL, css->id);
if (ss->css_released)
@@ -5493,7 +5480,7 @@ static void css_release_work_fn(struct work_struct *work)
/* cgroup release path */
TRACE_CGROUP_PATH(release, cgrp);
- css_rstat_flush(css);
+ css_rstat_flush(&cgrp->self);
spin_lock_irq(&css_set_lock);
for (tcgrp = cgroup_parent(cgrp); tcgrp;
@@ -5541,7 +5528,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css->id = -1;
INIT_LIST_HEAD(&css->sibling);
INIT_LIST_HEAD(&css->children);
- INIT_LIST_HEAD(&css->rstat_css_node);
css->serial_nr = css_serial_nr_next++;
atomic_set(&css->online_cnt, 0);
@@ -5550,9 +5536,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css_get(css->parent);
}
- if (ss->css_rstat_flush)
- list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
-
BUG_ON(cgroup_css(cgrp, ss));
}
@@ -5645,6 +5628,10 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
goto err_free_css;
css->id = err;
+ err = css_rstat_init(css);
+ if (err)
+ goto err_free_css;
+
/* @css is ready to be brought online now, make it visible */
list_add_tail_rcu(&css->sibling, &parent_css->children);
cgroup_idr_replace(&ss->css_idr, css, css->id);
@@ -5658,7 +5645,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
err_list_del:
list_del_rcu(&css->sibling);
err_free_css:
- list_del_rcu(&css->rstat_css_node);
INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
return ERR_PTR(err);
@@ -6101,6 +6087,8 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
} else {
css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
BUG_ON(css->id < 0);
+
+ BUG_ON(css_rstat_init(css));
}
/* Update the init_css_set to contain a subsys
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 357c538d14da..6ce134a7294d 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -14,9 +14,10 @@ static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
-static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
+static struct css_rstat_cpu *css_rstat_cpu(
+ struct cgroup_subsys_state *css, int cpu)
{
- return per_cpu_ptr(cgrp->rstat_cpu, cpu);
+ return per_cpu_ptr(css->rstat_cpu, cpu);
}
static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
@@ -87,34 +88,40 @@ void _css_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
* @css: target cgroup subsystem state
* @cpu: cpu on which rstat_cpu was updated
*
- * @css->cgroup's rstat_cpu on @cpu was updated. Put it on the parent's
- * matching rstat_cpu->updated_children list. See the comment on top of
- * cgroup_rstat_cpu definition for details.
+ * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
+ * rstat_cpu->updated_children list. See the comment on top of
+ * css_rstat_cpu definition for details.
*/
__bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
{
- struct cgroup *cgrp = css->cgroup;
raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
unsigned long flags;
+ /*
+ * Since bpf programs can call this function, prevent access to
+ * uninitialized rstat pointers.
+ */
+ if (!css_is_self(css) && css->ss->css_rstat_flush == NULL)
+ return;
+
/*
* Speculative already-on-list test. This may race leading to
* temporary inaccuracies, which is fine.
*
* Because @parent's updated_children is terminated with @parent
- * instead of NULL, we can tell whether @cgrp is on the list by
+ * instead of NULL, we can tell whether @css is on the list by
* testing the next pointer for NULL.
*/
- if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
+ if (data_race(css_rstat_cpu(css, cpu)->updated_next))
return;
flags = _css_rstat_cpu_lock(cpu_lock, cpu, css, true);
- /* put @cgrp and all ancestors on the corresponding updated lists */
+ /* put @css and all ancestors on the corresponding updated lists */
while (true) {
- struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
- struct cgroup *parent = cgroup_parent(cgrp);
- struct cgroup_rstat_cpu *prstatc;
+ struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+ struct cgroup_subsys_state *parent = css->parent;
+ struct css_rstat_cpu *prstatc;
/*
* Both additions and removals are bottom-up. If a cgroup
@@ -125,40 +132,41 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
/* Root has no parent to link it to, but mark it busy */
if (!parent) {
- rstatc->updated_next = cgrp;
+ rstatc->updated_next = css;
break;
}
- prstatc = cgroup_rstat_cpu(parent, cpu);
+ prstatc = css_rstat_cpu(parent, cpu);
rstatc->updated_next = prstatc->updated_children;
- prstatc->updated_children = cgrp;
+ prstatc->updated_children = css;
- cgrp = parent;
+ css = parent;
}
_css_rstat_cpu_unlock(cpu_lock, cpu, css, flags, true);
}
/**
- * cgroup_rstat_push_children - push children cgroups into the given list
+ * css_rstat_push_children - push children css's into the given list
* @head: current head of the list (= subtree root)
* @child: first child of the root
* @cpu: target cpu
- * Return: A new singly linked list of cgroups to be flushed
+ * Return: A new singly linked list of css's to be flushed
*
- * Iteratively traverse down the cgroup_rstat_cpu updated tree level by
+ * Iteratively traverse down the css_rstat_cpu updated tree level by
* level and push all the parents first before their next level children
* into a singly linked list via the rstat_flush_next pointer built from the
- * tail backward like "pushing" cgroups into a stack. The root is pushed by
+ * tail backward like "pushing" css's into a stack. The root is pushed by
* the caller.
*/
-static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
- struct cgroup *child, int cpu)
+static struct cgroup_subsys_state *css_rstat_push_children(
+ struct cgroup_subsys_state *head,
+ struct cgroup_subsys_state *child, int cpu)
{
- struct cgroup *cnext = child; /* Next head of child cgroup level */
- struct cgroup *ghead = NULL; /* Head of grandchild cgroup level */
- struct cgroup *parent, *grandchild;
- struct cgroup_rstat_cpu *crstatc;
+ struct cgroup_subsys_state *cnext = child; /* Next head of child css level */
+ struct cgroup_subsys_state *ghead = NULL; /* Head of grandchild css level */
+ struct cgroup_subsys_state *parent, *grandchild;
+ struct css_rstat_cpu *crstatc;
child->rstat_flush_next = NULL;
@@ -189,13 +197,13 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
while (cnext) {
child = cnext;
cnext = child->rstat_flush_next;
- parent = cgroup_parent(child);
+ parent = child->parent;
/* updated_next is parent cgroup terminated if !NULL */
while (child != parent) {
child->rstat_flush_next = head;
head = child;
- crstatc = cgroup_rstat_cpu(child, cpu);
+ crstatc = css_rstat_cpu(child, cpu);
grandchild = crstatc->updated_children;
if (grandchild != child) {
/* Push the grand child to the next level */
@@ -217,31 +225,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
}
/**
- * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
- * @root: root of the cgroup subtree to traverse
+ * css_rstat_updated_list - build a list of updated css's to be flushed
+ * @root: root of the css subtree to traverse
* @cpu: target cpu
- * Return: A singly linked list of cgroups to be flushed
+ * Return: A singly linked list of css's to be flushed
*
* Walks the updated rstat_cpu tree on @cpu from @root. During traversal,
- * each returned cgroup is unlinked from the updated tree.
+ * each returned css is unlinked from the updated tree.
*
* The only ordering guarantee is that, for a parent and a child pair
* covered by a given traversal, the child is before its parent in
* the list.
*
* Note that updated_children is self terminated and points to a list of
- * child cgroups if not empty. Whereas updated_next is like a sibling link
- * within the children list and terminated by the parent cgroup. An exception
- * here is the cgroup root whose updated_next can be self terminated.
+ * child css's if not empty. Whereas updated_next is like a sibling link
+ * within the children list and terminated by the parent css. An exception
+ * here is the css root whose updated_next can be self terminated.
*/
-static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
+static struct cgroup_subsys_state *css_rstat_updated_list(
+ struct cgroup_subsys_state *root, int cpu)
{
raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
- struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu);
- struct cgroup *head = NULL, *parent, *child;
+ struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
+ struct cgroup_subsys_state *head = NULL, *parent, *child;
unsigned long flags;
- flags = _css_rstat_cpu_lock(cpu_lock, cpu, &root->self, false);
+ flags = _css_rstat_cpu_lock(cpu_lock, cpu, root, false);
/* Return NULL if this subtree is not on-list */
if (!rstatc->updated_next)
@@ -251,17 +260,17 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
* Unlink @root from its parent. As the updated_children list is
* singly linked, we have to walk it to find the removal point.
*/
- parent = cgroup_parent(root);
+ parent = root->parent;
if (parent) {
- struct cgroup_rstat_cpu *prstatc;
- struct cgroup **nextp;
+ struct css_rstat_cpu *prstatc;
+ struct cgroup_subsys_state **nextp;
- prstatc = cgroup_rstat_cpu(parent, cpu);
+ prstatc = css_rstat_cpu(parent, cpu);
nextp = &prstatc->updated_children;
while (*nextp != root) {
- struct cgroup_rstat_cpu *nrstatc;
+ struct css_rstat_cpu *nrstatc;
- nrstatc = cgroup_rstat_cpu(*nextp, cpu);
+ nrstatc = css_rstat_cpu(*nextp, cpu);
WARN_ON_ONCE(*nextp == parent);
nextp = &nrstatc->updated_next;
}
@@ -276,9 +285,9 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
child = rstatc->updated_children;
rstatc->updated_children = root;
if (child != root)
- head = cgroup_rstat_push_children(head, child, cpu);
+ head = css_rstat_push_children(head, child, cpu);
unlock_ret:
- _css_rstat_cpu_unlock(cpu_lock, cpu, &root->self, flags, false);
+ _css_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false);
return head;
}
@@ -339,41 +348,44 @@ static inline void __css_rstat_unlock(struct cgroup_subsys_state *css,
}
/**
- * css_rstat_flush - flush stats in @css->cgroup's subtree
+ * css_rstat_flush - flush stats in @css's rstat subtree
* @css: target cgroup subsystem state
*
- * Collect all per-cpu stats in @css->cgroup's subtree into the global counters
- * and propagate them upwards. After this function returns, all cgroups in
- * the subtree have up-to-date ->stat.
+ * Collect all per-cpu stats in @css's subtree into the global counters
+ * and propagate them upwards. After this function returns, all rstat
+ * nodes in the subtree have up-to-date ->stat.
*
- * This also gets all cgroups in the subtree including @css->cgroup off the
+ * This also gets all rstat nodes in the subtree including @css off the
* ->updated_children lists.
*
* This function may block.
*/
__bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
{
- struct cgroup *cgrp = css->cgroup;
int cpu;
+ bool is_self = css_is_self(css);
+
+ /*
+ * Since bpf programs can call this function, prevent access to
+ * uninitialized rstat pointers.
+ */
+ if (!is_self && css->ss->css_rstat_flush == NULL)
+ return;
might_sleep();
for_each_possible_cpu(cpu) {
- struct cgroup *pos;
+ struct cgroup_subsys_state *pos;
/* Reacquire for each CPU to avoid disabling IRQs too long */
__css_rstat_lock(css, cpu);
- pos = cgroup_rstat_updated_list(cgrp, cpu);
+ pos = css_rstat_updated_list(css, cpu);
for (; pos; pos = pos->rstat_flush_next) {
- struct cgroup_subsys_state *css;
-
- cgroup_base_stat_flush(pos, cpu);
- bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
-
- rcu_read_lock();
- list_for_each_entry_rcu(css, &pos->rstat_css_list,
- rstat_css_node)
- css->ss->css_rstat_flush(css, cpu);
- rcu_read_unlock();
+ if (is_self) {
+ cgroup_base_stat_flush(pos->cgroup, cpu);
+ bpf_rstat_flush(pos->cgroup,
+ cgroup_parent(pos->cgroup), cpu);
+ } else
+ pos->ss->css_rstat_flush(pos, cpu);
}
__css_rstat_unlock(css, cpu);
if (!cond_resched())
@@ -385,30 +397,41 @@ int css_rstat_init(struct cgroup_subsys_state *css)
{
struct cgroup *cgrp = css->cgroup;
int cpu;
+ bool is_self = css_is_self(css);
- /* the root cgrp has rstat_cpu preallocated */
- if (!cgrp->rstat_cpu) {
- cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
- if (!cgrp->rstat_cpu)
- return -ENOMEM;
- }
-
- if (!cgrp->rstat_base_cpu) {
- cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
+ if (is_self) {
+ /* the root cgrp has rstat_base_cpu preallocated */
if (!cgrp->rstat_base_cpu) {
- free_percpu(cgrp->rstat_cpu);
+ cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
+ if (!cgrp->rstat_base_cpu)
+ return -ENOMEM;
+ }
+ } else if (css->ss->css_rstat_flush == NULL)
+ return 0;
+
+ /* the root cgrp's self css has rstat_cpu preallocated */
+ if (!css->rstat_cpu) {
+ css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
+ if (!css->rstat_cpu) {
+ if (is_self)
+ free_percpu(cgrp->rstat_base_cpu);
+
return -ENOMEM;
}
}
/* ->updated_children list is self terminated */
for_each_possible_cpu(cpu) {
- struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
- struct cgroup_rstat_base_cpu *rstatbc =
- cgroup_rstat_base_cpu(cgrp, cpu);
+ struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
- rstatc->updated_children = cgrp;
- u64_stats_init(&rstatbc->bsync);
+ rstatc->updated_children = css;
+
+ if (is_self) {
+ struct cgroup_rstat_base_cpu *rstatbc;
+
+ rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
+ u64_stats_init(&rstatbc->bsync);
+ }
}
return 0;
@@ -416,24 +439,31 @@ int css_rstat_init(struct cgroup_subsys_state *css)
void css_rstat_exit(struct cgroup_subsys_state *css)
{
- struct cgroup *cgrp = css->cgroup;
int cpu;
- css_rstat_flush(&cgrp->self);
+ if (!css_is_self(css) && css->ss->css_rstat_flush == NULL)
+ return;
+
+ css_rstat_flush(css);
/* sanity check */
for_each_possible_cpu(cpu) {
- struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+ struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
- if (WARN_ON_ONCE(rstatc->updated_children != cgrp) ||
+ if (WARN_ON_ONCE(rstatc->updated_children != css) ||
WARN_ON_ONCE(rstatc->updated_next))
return;
}
- free_percpu(cgrp->rstat_cpu);
- cgrp->rstat_cpu = NULL;
- free_percpu(cgrp->rstat_base_cpu);
- cgrp->rstat_base_cpu = NULL;
+ if (css_is_self(css)) {
+ struct cgroup *cgrp = css->cgroup;
+
+ free_percpu(cgrp->rstat_base_cpu);
+ cgrp->rstat_base_cpu = NULL;
+ }
+
+ free_percpu(css->rstat_cpu);
+ css->rstat_cpu = NULL;
}
void __init cgroup_rstat_boot(void)
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 38f78d9345de..69f81cb555ca 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -30,22 +30,27 @@ int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
/* trace_cgroup_mkdir(struct cgroup *cgrp, const char *path)
*
- * struct cgroup_rstat_cpu {
+ * struct css_rstat_cpu {
* ...
- * struct cgroup *updated_children;
+ * struct cgroup_subsys_state *updated_children;
* ...
* };
*
- * struct cgroup {
+ * struct cgroup_subsys_state {
+ * ...
+ * struct css_rstat_cpu __percpu *rstat_cpu;
* ...
- * struct cgroup_rstat_cpu __percpu *rstat_cpu;
+ * };
+ *
+ * struct cgroup {
+ * struct cgroup_subsys_state self;
* ...
* };
*/
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
{
- g = (__u64)cgrp->rstat_cpu->updated_children;
+ g = (__u64)cgrp->self.rstat_cpu->updated_children;
return 0;
}
@@ -56,7 +61,8 @@ int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
__u32 cpu;
cpu = bpf_get_smp_processor_id();
- rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
+ rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(
+ cgrp->self.rstat_cpu, cpu);
if (rstat) {
/* READ_ONCE */
*(volatile int *)rstat;
--
2.47.1
* [PATCH v6 4/6] cgroup: use subsystem-specific rstat locks to avoid contention
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
It is possible to eliminate contention between subsystems when
updating/flushing stats by using subsystem-specific locks. Let the existing
rstat locks be dedicated to the cgroup base stats and rename them to
reflect that. Add similar locks to the cgroup_subsys struct for use with
individual subsystems.
Lock initialization is done in the new function ss_rstat_init(ss) which
replaces cgroup_rstat_boot(void). If NULL is passed to this function, the
global base stat locks will be initialized. Otherwise, the subsystem locks
will be initialized.
Change the existing lock helper functions to accept a reference to a css.
Then within these functions, conditionally select the appropriate locks
based on the subsystem affiliation of the given css. Add helper functions
for this selection routine to avoid repeated code.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
block/blk-cgroup.c | 4 +-
include/linux/cgroup-defs.h | 10 +++-
include/trace/events/cgroup.h | 12 +++-
kernel/cgroup/cgroup-internal.h | 2 +-
kernel/cgroup/cgroup.c | 3 +-
kernel/cgroup/rstat.c | 98 +++++++++++++++++++++++----------
6 files changed, 91 insertions(+), 38 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0560ea402856..4dc10c1e97a4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1074,8 +1074,8 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
/*
* For covering concurrent parent blkg update from blkg_release().
*
- * When flushing from cgroup, cgroup_rstat_lock is always held, so
- * this lock won't cause contention most of time.
+ * When flushing from cgroup, the subsystem rstat lock is always held,
+ * so this lock won't cause contention most of time.
*/
raw_spin_lock_irqsave(&blkg_stat_lock, flags);
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 17ecaae9c5f8..5b8127d29dc5 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -222,7 +222,10 @@ struct cgroup_subsys_state {
/*
* A singly-linked list of css structures to be rstat flushed.
* This is a scratch field to be used exclusively by
- * css_rstat_flush() and protected by cgroup_rstat_lock.
+ * css_rstat_flush().
+ *
+ * Protected by rstat_base_lock when css is cgroup::self.
+ * Protected by css->ss->rstat_ss_lock otherwise.
*/
struct cgroup_subsys_state *rstat_flush_next;
};
@@ -362,7 +365,7 @@ struct css_rstat_cpu {
* the css makes it unnecessary for each per-cpu struct to point back
* to the associated css.
*
- * Protected by per-cpu cgroup_rstat_cpu_lock.
+ * Protected by per-cpu css->ss->rstat_ss_cpu_lock.
*/
struct cgroup_subsys_state *updated_children;
struct cgroup_subsys_state *updated_next; /* NULL if not on the list */
@@ -792,6 +795,9 @@ struct cgroup_subsys {
* specifies the mask of subsystems that this one depends on.
*/
unsigned int depends_on;
+
+ spinlock_t rstat_ss_lock;
+ raw_spinlock_t __percpu *rstat_ss_cpu_lock;
};
extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index af2755bda6eb..7d332387be6c 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -231,7 +231,11 @@ DECLARE_EVENT_CLASS(cgroup_rstat,
__entry->cpu, __entry->contended)
);
-/* Related to global: cgroup_rstat_lock */
+/*
+ * Related to locks:
+ * global rstat_base_lock for base stats
+ * cgroup_subsys::rstat_ss_lock for subsystem stats
+ */
DEFINE_EVENT(cgroup_rstat, cgroup_rstat_lock_contended,
TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
@@ -253,7 +257,11 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_unlock,
TP_ARGS(cgrp, cpu, contended)
);
-/* Related to per CPU: cgroup_rstat_cpu_lock */
+/*
+ * Related to per CPU locks:
+ * global rstat_base_cpu_lock for base stats
+ * cgroup_subsys::rstat_ss_cpu_lock for subsystem stats
+ */
DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended,
TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c161d34be634..b14e61c64a34 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -272,7 +272,7 @@ int cgroup_task_count(const struct cgroup *cgrp);
*/
int css_rstat_init(struct cgroup_subsys_state *css);
void css_rstat_exit(struct cgroup_subsys_state *css);
-void cgroup_rstat_boot(void);
+int ss_rstat_init(struct cgroup_subsys *ss);
void cgroup_base_stat_cputime_show(struct seq_file *seq);
/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 45097dc9e099..44baa0318713 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6088,6 +6088,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
BUG_ON(css->id < 0);
+ BUG_ON(ss_rstat_init(ss));
BUG_ON(css_rstat_init(css));
}
@@ -6167,7 +6168,7 @@ int __init cgroup_init(void)
BUG_ON(cgroup_init_cftypes(NULL, cgroup_psi_files));
BUG_ON(cgroup_init_cftypes(NULL, cgroup1_base_files));
- cgroup_rstat_boot();
+ BUG_ON(ss_rstat_init(NULL));
get_user_ns(init_cgroup_ns.user_ns);
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 6ce134a7294d..0bb609e73bde 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,8 @@
#include <trace/events/cgroup.h>
-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);
static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
@@ -26,8 +26,24 @@ static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
return per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
}
+static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
+{
+ if (ss)
+ return &ss->rstat_ss_lock;
+
+ return &rstat_base_lock;
+}
+
+static raw_spinlock_t *ss_rstat_cpu_lock(struct cgroup_subsys *ss, int cpu)
+{
+ if (ss)
+ return per_cpu_ptr(ss->rstat_ss_cpu_lock, cpu);
+
+ return per_cpu_ptr(&rstat_base_cpu_lock, cpu);
+}
+
/*
- * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ * Helper functions for rstat per CPU locks.
*
* This makes it easier to diagnose locking issues and contention in
* production environments. The parameter @fast_path determine the
@@ -35,21 +51,23 @@ static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
* operations without handling high-frequency fast-path "update" events.
*/
static __always_inline
-unsigned long _css_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
- struct cgroup_subsys_state *css, const bool fast_path)
+unsigned long _css_rstat_cpu_lock(struct cgroup_subsys_state *css, int cpu,
+ const bool fast_path)
{
struct cgroup *cgrp = css->cgroup;
+ raw_spinlock_t *cpu_lock;
unsigned long flags;
bool contended;
/*
- * The _irqsave() is needed because cgroup_rstat_lock is
- * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
- * this lock with the _irq() suffix only disables interrupts on
- * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
- * interrupts on both configurations. The _irqsave() ensures
- * that interrupts are always disabled and later restored.
+ * The _irqsave() is needed because the locks used for flushing are
+ * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock
+ * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT
+ * kernel. The raw_spinlock_t below disables interrupts on both
+ * configurations. The _irqsave() ensures that interrupts are always
+ * disabled and later restored.
*/
+ cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
if (contended) {
if (fast_path)
@@ -69,17 +87,18 @@ unsigned long _css_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
}
static __always_inline
-void _css_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
- struct cgroup_subsys_state *css, unsigned long flags,
- const bool fast_path)
+void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
+ unsigned long flags, const bool fast_path)
{
struct cgroup *cgrp = css->cgroup;
+ raw_spinlock_t *cpu_lock;
if (fast_path)
trace_cgroup_rstat_cpu_unlock_fastpath(cgrp, cpu, false);
else
trace_cgroup_rstat_cpu_unlock(cgrp, cpu, false);
+ cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
raw_spin_unlock_irqrestore(cpu_lock, flags);
}
@@ -94,7 +113,6 @@ void _css_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
*/
__bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
{
- raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
unsigned long flags;
/*
@@ -115,7 +133,7 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
if (data_race(css_rstat_cpu(css, cpu)->updated_next))
return;
- flags = _css_rstat_cpu_lock(cpu_lock, cpu, css, true);
+ flags = _css_rstat_cpu_lock(css, cpu, true);
/* put @css and all ancestors on the corresponding updated lists */
while (true) {
@@ -143,7 +161,7 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
css = parent;
}
- _css_rstat_cpu_unlock(cpu_lock, cpu, css, flags, true);
+ _css_rstat_cpu_unlock(css, cpu, flags, true);
}
/**
@@ -171,11 +189,11 @@ static struct cgroup_subsys_state *css_rstat_push_children(
child->rstat_flush_next = NULL;
/*
- * The cgroup_rstat_lock must be held for the whole duration from
+ * The subsystem rstat lock must be held for the whole duration from
* here as the rstat_flush_next list is being constructed to when
* it is consumed later in css_rstat_flush().
*/
- lockdep_assert_held(&cgroup_rstat_lock);
+ lockdep_assert_held(ss_rstat_lock(head->ss));
/*
* Notation: -> updated_next pointer
@@ -245,12 +263,11 @@ static struct cgroup_subsys_state *css_rstat_push_children(
static struct cgroup_subsys_state *css_rstat_updated_list(
struct cgroup_subsys_state *root, int cpu)
{
- raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
struct cgroup_subsys_state *head = NULL, *parent, *child;
unsigned long flags;
- flags = _css_rstat_cpu_lock(cpu_lock, cpu, root, false);
+ flags = _css_rstat_cpu_lock(root, cpu, false);
/* Return NULL if this subtree is not on-list */
if (!rstatc->updated_next)
@@ -287,7 +304,7 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
if (child != root)
head = css_rstat_push_children(head, child, cpu);
unlock_ret:
- _css_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false);
+ _css_rstat_cpu_unlock(root, cpu, flags, false);
return head;
}
@@ -314,7 +331,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
__bpf_hook_end();
/*
- * Helper functions for locking cgroup_rstat_lock.
+ * Helper functions for locking.
*
* This makes it easier to diagnose locking issues and contention in
* production environments. The parameter @cpu_in_loop indicate lock
@@ -324,27 +341,31 @@ __bpf_hook_end();
*/
static inline void __css_rstat_lock(struct cgroup_subsys_state *css,
int cpu_in_loop)
- __acquires(&cgroup_rstat_lock)
+ __acquires(ss_rstat_lock(css->ss))
{
struct cgroup *cgrp = css->cgroup;
+ spinlock_t *lock;
bool contended;
- contended = !spin_trylock_irq(&cgroup_rstat_lock);
+ lock = ss_rstat_lock(css->ss);
+ contended = !spin_trylock_irq(lock);
if (contended) {
trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
- spin_lock_irq(&cgroup_rstat_lock);
+ spin_lock_irq(lock);
}
trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
}
static inline void __css_rstat_unlock(struct cgroup_subsys_state *css,
int cpu_in_loop)
- __releases(&cgroup_rstat_lock)
+ __releases(ss_rstat_lock(css->ss))
{
struct cgroup *cgrp = css->cgroup;
+ spinlock_t *lock;
+ lock = ss_rstat_lock(css->ss);
trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
- spin_unlock_irq(&cgroup_rstat_lock);
+ spin_unlock_irq(lock);
}
/**
@@ -466,12 +487,29 @@ void css_rstat_exit(struct cgroup_subsys_state *css)
css->rstat_cpu = NULL;
}
-void __init cgroup_rstat_boot(void)
+/**
+ * ss_rstat_init - subsystem-specific rstat initialization
+ * @ss: target subsystem
+ *
+ * If @ss is NULL, the static locks associated with the base stats
+ * are initialized. If @ss is non-NULL, the subsystem-specific locks
+ * are initialized.
+ */
+int __init ss_rstat_init(struct cgroup_subsys *ss)
{
int cpu;
+ if (ss) {
+ ss->rstat_ss_cpu_lock = alloc_percpu(raw_spinlock_t);
+ if (!ss->rstat_ss_cpu_lock)
+ return -ENOMEM;
+ }
+
+ spin_lock_init(ss_rstat_lock(ss));
for_each_possible_cpu(cpu)
- raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+ raw_spin_lock_init(ss_rstat_cpu_lock(ss, cpu));
+
+ return 0;
}
/*
--
2.47.1
* [PATCH v6 5/6] cgroup: helper for checking rstat participation of css
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
There are a few places where a conditional check is performed to determine
whether a given css participates in rstat. Add a helper for this check to
make the code at those call sites more readable.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
kernel/cgroup/rstat.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 0bb609e73bde..7dd396ae3c68 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -14,6 +14,17 @@ static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);
static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
+/*
+ * Determines whether a given css can participate in rstat.
+ * css's that are cgroup::self use rstat for base stats.
+ * Other css's associated with a subsystem use rstat only when
+ * they define the ss->css_rstat_flush callback.
+ */
+static inline bool css_uses_rstat(struct cgroup_subsys_state *css)
+{
+ return css_is_self(css) || css->ss->css_rstat_flush != NULL;
+}
+
static struct css_rstat_cpu *css_rstat_cpu(
struct cgroup_subsys_state *css, int cpu)
{
@@ -119,7 +130,7 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
* Since bpf programs can call this function, prevent access to
* uninitialized rstat pointers.
*/
- if (!css_is_self(css) && css->ss->css_rstat_flush == NULL)
+ if (!css_uses_rstat(css))
return;
/*
@@ -390,7 +401,7 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
* Since bpf programs can call this function, prevent access to
* uninitialized rstat pointers.
*/
- if (!is_self && css->ss->css_rstat_flush == NULL)
+ if (!css_uses_rstat(css))
return;
might_sleep();
@@ -462,7 +473,7 @@ void css_rstat_exit(struct cgroup_subsys_state *css)
{
int cpu;
- if (!css_is_self(css) && css->ss->css_rstat_flush == NULL)
+ if (!css_uses_rstat(css))
return;
css_rstat_flush(css);
--
2.47.1
* [PATCH v6 6/6] cgroup: document the rstat per-cpu initialization
2025-05-15 0:19 [PATCH v6 0/6] cgroup: separate rstat trees JP Kobryn
` (4 preceding siblings ...)
2025-05-15 0:19 ` [PATCH v6 5/6] cgroup: helper for checking rstat participation of css JP Kobryn
@ 2025-05-15 0:19 ` JP Kobryn
2025-05-19 20:28 ` Tejun Heo
5 siblings, 1 reply; 11+ messages in thread
From: JP Kobryn @ 2025-05-15 0:19 UTC (permalink / raw)
To: tj, shakeel.butt, yosryahmed, mkoutny, hannes, akpm
Cc: linux-mm, cgroups, kernel-team
The calls to css_rstat_init() occur at different places depending on the
context. Document the conditions that determine which point of
initialization is used.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
include/linux/cgroup-defs.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5b8127d29dc5..e61687d5e496 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -169,6 +169,21 @@ struct cgroup_subsys_state {
/* reference count - access via css_[try]get() and css_put() */
struct percpu_ref refcnt;
+ /*
+ * Depending on the context, this field is initialized
+ * via css_rstat_init() at different places:
+ *
+ * when css is associated with cgroup::self
+ * when css->cgroup is the root cgroup
+ * performed in cgroup_init()
+ * when css->cgroup is not the root cgroup
+ * performed in cgroup_create()
+ * when css is associated with a subsystem
+ * when css->cgroup is the root cgroup
+ * performed in cgroup_init_subsys() in the non-early path
+ * when css->cgroup is not the root cgroup
+ * performed in css_create()
+ */
struct css_rstat_cpu __percpu *rstat_cpu;
/*
@@ -530,6 +545,15 @@ struct cgroup {
struct cgroup *dom_cgrp;
struct cgroup *old_dom_cgrp; /* used while enabling threaded */
+ /*
+ * Depending on the context, this field is initialized via
+ * css_rstat_init() at different places:
+ *
+ * when cgroup is the root cgroup
+ * performed in cgroup_setup_root()
+ * otherwise
+ * performed in cgroup_create()
+ */
struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu;
/*
--
2.47.1
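The comments above record when css_rstat_init() runs for each field; the function
body itself is not quoted anywhere in this thread. As a hedged sketch only, the
per-cpu allocations it presumably performs for the two annotated fields look
roughly like this (error handling and the updated-tree bookkeeping are omitted or
assumed; struct and field names are taken from the hunk above):

/*
 * Rough sketch of css_rstat_init(); not the applied implementation.
 */
int css_rstat_init(struct cgroup_subsys_state *css)
{
	struct cgroup *cgrp = css->cgroup;

	if (css_is_self(css)) {
		/* base stats exist once per cgroup, not per subsystem */
		cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
		if (!cgrp->rstat_base_cpu)
			return -ENOMEM;
	} else if (!css->ss->css_rstat_flush) {
		return 0;	/* this subsystem does not participate in rstat */
	}

	/* every participating css carries its own per-cpu tree-node state */
	css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
	if (!css->rstat_cpu)
		return -ENOMEM;

	return 0;
}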
* Re: [PATCH v6 1/6] cgroup: warn on rstat usage by early init subsystems
2025-05-15 0:19 ` [PATCH v6 1/6] cgroup: warn on rstat usage by early init subsystems JP Kobryn
@ 2025-05-19 20:18 ` Tejun Heo
2025-05-19 20:20 ` Tejun Heo
0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2025-05-19 20:18 UTC (permalink / raw)
To: JP Kobryn
Cc: shakeel.butt, yosryahmed, mkoutny, hannes, akpm, linux-mm,
cgroups, kernel-team
On Wed, May 14, 2025 at 05:19:32PM -0700, JP Kobryn wrote:
> An early init subsystem that attempts to make use of rstat can lead to
> failures during early boot. The reason for this is the timing in which the
> css's of the root cgroup have css_online() invoked on them. At the point of
> this call, there is a stated assumption that a cgroup has "successfully
> completed all allocations" [0]. An example of a subsystem that relies on
> the previously mentioned assumption [0] is the memory subsystem. Within its
> implementation of css_online(), work is queued to asynchronously begin
> flushing via rstat. In the early init path for a given subsystem, having
> rstat enabled leads to this sequence:
>
> cgroup_init_early()
> for_each_subsys(ss, ssid)
> if (ss->early_init)
> cgroup_init_subsys(ss, true)
>
> cgroup_init_subsys(ss, early_init)
> css = ss->css_alloc(...)
> init_and_link_css(css, ss, ...)
> ...
> online_css(css)
>
> online_css(css)
> ss = css->ss
> ss->css_online(css)
>
> Continuing to use the memory subsystem as an example, the issue with this
> sequence is that css_rstat_init() has not been called yet. This means there
> is now a race between the pending async work to flush rstat and the call to
> css_rstat_init(). So a flush can occur within the given cgroup while the
> rstat fields are not initialized.
>
> Since we are in the early init phase, the rstat fields cannot be
> initialized because they require per-cpu allocations. So it's not possible
> to have css_rstat_init() called early enough (before online_css()). This
> patch treats the combination of early init and rstat the same as other
> invalid conditions.
>
> [0] Documentation/admin-guide/cgroup-v1/cgroups.rst (section: css_online)
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Applied to cgroup/for-6.16.
Thanks.
--
tejun
* Re: [PATCH v6 1/6] cgroup: warn on rstat usage by early init subsystems
2025-05-19 20:18 ` Tejun Heo
@ 2025-05-19 20:20 ` Tejun Heo
0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2025-05-19 20:20 UTC (permalink / raw)
To: JP Kobryn
Cc: shakeel.butt, yosryahmed, mkoutny, hannes, akpm, linux-mm,
cgroups, kernel-team
On Mon, May 19, 2025 at 10:18:03AM -1000, Tejun Heo wrote:
> On Wed, May 14, 2025 at 05:19:32PM -0700, JP Kobryn wrote:
...
> > Since we are in the early init phase, the rstat fields cannot be
> > initialized because they require per-cpu allocations. So it's not possible
> > to have css_rstat_init() called early enough (before online_css()). This
> > patch treats the combination of early init and rstat the same as other
> > invalid conditions.
...
> Applied to cgroup/for-6.16.
Applied this but we might want to WARN on actual flushing and skip flushing
rather than disallowing early_init + rstat wholesale if any subsys wants to
limp along between early_init and percpu allocator being up.
Thanks.
--
tejun
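Neither the guard added by the applied patch nor the flush-time alternative Tejun
suggests above is shown in this part of the thread; the following is a rough
illustration only, with css_rstat_ready() being a hypothetical helper name:

/*
 * Hypothetical illustration: rather than refusing early_init + rstat at
 * registration, warn and skip when a flush or update arrives before
 * css_rstat_init() has run (i.e. before the percpu allocator is up).
 */
static bool css_rstat_ready(struct cgroup_subsys_state *css)
{
	/* css->rstat_cpu is only set once css_rstat_init() has run */
	return !WARN_ON_ONCE(!css->rstat_cpu);
}

Each rstat entry point (css_rstat_updated(), css_rstat_flush()) would then bail
out early when css_rstat_ready() returns false, letting an early-init subsystem
limp along until its per-cpu state is initialized.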
* Re: [PATCH v6 2/6] cgroup: compare css to cgroup::self in helper for distinguishing css
2025-05-15 0:19 ` [PATCH v6 2/6] cgroup: compare css to cgroup::self in helper for distinguishing css JP Kobryn
@ 2025-05-19 20:21 ` Tejun Heo
0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2025-05-19 20:21 UTC (permalink / raw)
To: JP Kobryn
Cc: shakeel.butt, yosryahmed, mkoutny, hannes, akpm, linux-mm,
cgroups, kernel-team
On Wed, May 14, 2025 at 05:19:33PM -0700, JP Kobryn wrote:
> Adjust the implementation of css_is_cgroup() so that it compares the given
> css to cgroup::self. Rename the function to css_is_self() in order to
> reflect that. Change the existing css->ss NULL check to a warning in the
> true branch. Finally, adjust call sites to use the new function name.
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Applied to cgroup/for-6.16.
Thanks.
--
tejun
* Re: [PATCH v6 6/6] cgroup: document the rstat per-cpu initialization
2025-05-15 0:19 ` [PATCH v6 6/6] cgroup: document the rstat per-cpu initialization JP Kobryn
@ 2025-05-19 20:28 ` Tejun Heo
0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2025-05-19 20:28 UTC (permalink / raw)
To: JP Kobryn
Cc: shakeel.butt, yosryahmed, mkoutny, hannes, akpm, linux-mm,
cgroups, kernel-team
On Wed, May 14, 2025 at 05:19:37PM -0700, JP Kobryn wrote:
> The calls to css_rstat_init() occur at different places depending on the
> context. Document the conditions that determine which point of
> initialization is used.
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Applied 3-6 to cgroup/for-6.16. If there still are issues, let's deal with
them with incremental patches.
Thanks.
--
tejun