From: JP Kobryn <inwardvessel@gmail.com>
To: Bertrand Wlodarczyk <bertrand.wlodarczyk@intel.com>,
tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cgroup/rstat: change cgroup_base_stat to atomic
Date: Tue, 17 Jun 2025 14:05:37 -0700 [thread overview]
Message-ID: <5f055416-9c49-42c0-9ba0-e45f6aaeac04@gmail.com> (raw)
In-Reply-To: <20250617102644.752201-2-bertrand.wlodarczyk@intel.com>
On 6/17/25 3:26 AM, Bertrand Wlodarczyk wrote:
> The kernel currently faces scalability issues when multiple userspace
> programs attempt to read cgroup statistics concurrently.
>
> The primary bottleneck is the css_cgroup_lock in cgroup_rstat_flush,
> which prevents access and updates to the statistics
> of the css from multiple CPUs in parallel.
>
> Given that rstat operates on a per-CPU basis and only aggregates
> statistics in the parent cgroup, there is no compelling reason
> why these statistics cannot be atomic.
Have you considered the "tearing" that will occur when writes and reads
are happening in parallel?
The existing state is more of a snapshot approach. Changing the fields
involved to atomic and lockless reading/writing can result in
inconsistent values, i.e. fieldA might be more current than fieldB.
> By eliminating the lock, each CPU can traverse its rstat hierarchy
> independently, without blocking. Synchronization is achieved during
> parent propagation through atomic operations.
Even if the tearing scenario mentioned above is acceptable, removing
the lock will break synchronization of flushing non-base stat
subsystems.
>
> This change significantly enhances performance in scenarios
> where multiple CPUs access CPU rstat within a single cgroup hierarchy,
> yielding a performance improvement of around 50 times compared
> to the mainline version.
> Notably, performance for memory and I/O rstats remains unchanged,
> as these are managed in separate submodules.
>
> Additionally, this patch addresses a race condition detectable
> in the current mainline by KCSAN in __cgroup_account_cputime,
> which occurs when attempting to read a single hierarchy
> from multiple CPUs.
>
> Signed-off-by: Bertrand Wlodarczyk <bertrand.wlodarczyk@intel.com>
> ---
> Benchmark code: https://gist.github.com/bwlodarcz/c955b36b5667f0167dffcff23953d1da
>
> Tested on Intel(R) Xeon(R) Platinum 8468V, 2s 48c 2tpc, 377GiB RAM, Fedora 41:
> +--------+-------+
> |Mainline|Patched|
> +--------+-------+
> |369.95s |6.52s |
> +--------+-------+
>
[..]
> @@ -820,7 +813,6 @@ struct cgroup_subsys {
> */
> unsigned int depends_on;
>
> - spinlock_t rstat_ss_lock;
This lock is not used with base stats. The base stats are not a formal
subsystem.
[..]
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index cbeaa499a96a..36af2b883440 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -9,7 +9,6 @@
>
> #include <trace/events/cgroup.h>
>
> -static DEFINE_SPINLOCK(rstat_base_lock);
> static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);
>
> static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
> @@ -37,14 +36,6 @@ static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
> return per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
> }
>
> -static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
> -{
> - if (ss)
> - return &ss->rstat_ss_lock;
This was needed for non-base stat subsystems like memory and io.
> -
> - return &rstat_base_lock;
> -}
next prev parent reply other threads:[~2025-06-17 21:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-17 10:26 [PATCH] cgroup/rstat: change cgroup_base_stat to atomic Bertrand Wlodarczyk
2025-06-17 12:38 ` kernel test robot
2025-06-17 13:16 ` Michal Koutný
2025-06-18 14:31 ` Wlodarczyk, Bertrand
2025-06-17 13:32 ` kernel test robot
2025-06-17 21:05 ` JP Kobryn [this message]
2025-06-18 16:05 ` Wlodarczyk, Bertrand
2025-06-18 17:17 ` Shakeel Butt
2025-06-18 17:21 ` Shakeel Butt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f055416-9c49-42c0-9ba0-e45f6aaeac04@gmail.com \
--to=inwardvessel@gmail.com \
--cc=bertrand.wlodarczyk@intel.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkoutny@suse.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.