* [PATCH] sched/cpuacct: fix use-after-free in cpuacct_account_field()
From: Rik van Riel @ 2026-04-05 2:47 UTC
To: linux-kernel
Cc: kernel-team, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Steven Rostedt
cpuacct_css_free() calls free_percpu() on ca->cpustat and ca->cpuusage,
then kfree(ca). However, a timer interrupt on another CPU can
concurrently access this data through cpuacct_account_field(), which
walks the cpuacct hierarchy via task_ca()/parent_ca() and performs
__this_cpu_add(ca->cpustat->cpustat[index], val).
The race window exists because put_css_set_locked() drops the CSS
reference (css_put) before the css_set is RCU-freed (kfree_rcu). This
means the CSS percpu_ref can reach zero and trigger the css_free chain
while readers obtained the CSS pointer from the old css_set that is
still visible via RCU.
Although css_free_rwork_fn is already called after one RCU grace period,
the css_set -> CSS reference drop in put_css_set_locked() creates a
window where the CSS free chain races with readers still holding the
old css_set reference.
With KASAN enabled, free_percpu() unmaps shadow pages, so the
KASAN-instrumented __this_cpu_add hits an unmapped shadow page
(PMD=0), causing a page fault in IRQ context that cascades into an
IRQ stack overflow.
Fix this by deferring the actual freeing of percpu data and the cpuacct
struct to an RCU callback via call_rcu(), ensuring that all concurrent
readers in RCU read-side critical sections (including timer tick
handlers) have completed before the memory is freed.
Found in an AI-driven syzkaller run. The bug has not reproduced in the
14 hours since this patch was applied.
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.6 syzkaller
Fixes: 3eba0505d03a ("sched/cpuacct: Remove redundant RCU read lock")
Cc: stable@kernel.org
---
kernel/sched/cpuacct.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index ca9d52cb1ebb..b6e7b34de616 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -28,6 +28,7 @@ struct cpuacct {
/* cpuusage holds pointer to a u64-type object on every CPU */
u64 __percpu *cpuusage;
struct kernel_cpustat __percpu *cpustat;
+ struct rcu_head rcu;
};
static inline struct cpuacct *css_ca(struct cgroup_subsys_state *css)
@@ -84,15 +85,22 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css)
}
/* Destroy an existing CPU accounting group */
-static void cpuacct_css_free(struct cgroup_subsys_state *css)
+static void cpuacct_free_rcu(struct rcu_head *rcu)
{
- struct cpuacct *ca = css_ca(css);
+ struct cpuacct *ca = container_of(rcu, struct cpuacct, rcu);
free_percpu(ca->cpustat);
free_percpu(ca->cpuusage);
kfree(ca);
}
+static void cpuacct_css_free(struct cgroup_subsys_state *css)
+{
+ struct cpuacct *ca = css_ca(css);
+
+ call_rcu(&ca->rcu, cpuacct_free_rcu);
+}
+
static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
enum cpuacct_stat_index index)
{
--
2.52.0
* Re: [PATCH] sched/cpuacct: fix use-after-free in cpuacct_account_field()
From: Peter Zijlstra @ 2026-04-07 7:58 UTC
To: Rik van Riel, Tejun Heo
Cc: linux-kernel, kernel-team, Ingo Molnar, Juri Lelli,
Vincent Guittot, Steven Rostedt
On Sat, Apr 04, 2026 at 10:47:42PM -0400, Rik van Riel wrote:
> cpuacct_css_free() calls free_percpu() on ca->cpustat and ca->cpuusage,
> then kfree(ca). However, a timer interrupt on another CPU can
> concurrently access this data through cpuacct_account_field(), which
> walks the cpuacct hierarchy via task_ca()/parent_ca() and performs
> __this_cpu_add(ca->cpustat->cpustat[index], val).
>
> The race window exists because put_css_set_locked() drops the CSS
> reference (css_put) before the css_set is RCU-freed (kfree_rcu). This
> means the CSS percpu_ref can reach zero and trigger the css_free chain
> while readers obtained the CSS pointer from the old css_set that is
> still visible via RCU.
>
> Although css_free_rwork_fn is already called after one RCU grace period,
> the css_set -> CSS reference drop in put_css_set_locked() creates a
> window where the CSS free chain races with readers still holding the
> old css_set reference.
To me this reads like a cgroup fail, not a cpuacct fail per se. But I'm
forever confused there. TJ?
> With KASAN enabled, free_percpu() unmaps shadow pages, so the
> KASAN-instrumented __this_cpu_add hits an unmapped shadow page
> (PMD=0), causing a page fault in IRQ context that cascades into an
> IRQ stack overflow.
>
> Fix this by deferring the actual freeing of percpu data and the cpuacct
> struct to an RCU callback via call_rcu(), ensuring that all concurrent
> readers in RCU read-side critical sections (including timer tick
> handlers) have completed before the memory is freed.
>
> Found in an AI driven syzkaller run. The bug did not repeat in the
> 14 hours since this patch was applied.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Assisted-by: Claude:claude-opus-4.6 syzkaller
> Fixes: 3eba0505d03a ("sched/cpuacct: Remove redundant RCU read lock")
> Cc: stable@kernel.org
> ---
> kernel/sched/cpuacct.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index ca9d52cb1ebb..b6e7b34de616 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -28,6 +28,7 @@ struct cpuacct {
> /* cpuusage holds pointer to a u64-type object on every CPU */
> u64 __percpu *cpuusage;
> struct kernel_cpustat __percpu *cpustat;
> + struct rcu_head rcu;
> };
>
> static inline struct cpuacct *css_ca(struct cgroup_subsys_state *css)
> @@ -84,15 +85,22 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css)
> }
>
> /* Destroy an existing CPU accounting group */
> -static void cpuacct_css_free(struct cgroup_subsys_state *css)
> +static void cpuacct_free_rcu(struct rcu_head *rcu)
> {
> - struct cpuacct *ca = css_ca(css);
> + struct cpuacct *ca = container_of(rcu, struct cpuacct, rcu);
>
> free_percpu(ca->cpustat);
> free_percpu(ca->cpuusage);
> kfree(ca);
> }
>
> +static void cpuacct_css_free(struct cgroup_subsys_state *css)
> +{
> + struct cpuacct *ca = css_ca(css);
> +
> + call_rcu(&ca->rcu, cpuacct_free_rcu);
> +}
> +
> static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
> enum cpuacct_stat_index index)
> {
> --
> 2.52.0
>
>
* Re: [PATCH] sched/cpuacct: fix use-after-free in cpuacct_account_field()
From: Tejun Heo @ 2026-04-07 18:59 UTC
To: Peter Zijlstra
Cc: Rik van Riel, linux-kernel, kernel-team, Ingo Molnar, Juri Lelli,
Vincent Guittot, Steven Rostedt
Hello,
On Tue, Apr 07, 2026 at 09:58:11AM +0200, Peter Zijlstra wrote:
> On Sat, Apr 04, 2026 at 10:47:42PM -0400, Rik van Riel wrote:
> > cpuacct_css_free() calls free_percpu() on ca->cpustat and ca->cpuusage,
> > then kfree(ca). However, a timer interrupt on another CPU can
> > concurrently access this data through cpuacct_account_field(), which
> > walks the cpuacct hierarchy via task_ca()/parent_ca() and performs
> > __this_cpu_add(ca->cpustat->cpustat[index], val).
> >
> > The race window exists because put_css_set_locked() drops the CSS
> > reference (css_put) before the css_set is RCU-freed (kfree_rcu). This
> > means the CSS percpu_ref can reach zero and trigger the css_free chain
> > while readers obtained the CSS pointer from the old css_set that is
> > still visible via RCU.
> >
> > Although css_free_rwork_fn is already called after one RCU grace period,
> > the css_set -> CSS reference drop in put_css_set_locked() creates a
> > window where the CSS free chain races with readers still holding the
> > old css_set reference.
>
> To me this reads like a cgroup fail, not a cpuacct fail per se. But I'm
> forever confused there. TJ?
css_free() is already called an RCU grace period after the refcnt
reaches zero. The patch is adding another RCU grace period in the path,
which likely is just patching over the underlying problem.
cpuacct_account_field() is called from ticks. task->cgroups is the
RCU-protected pointer to the css_set from which the cpuacct pointer is
read. Each css_set pins the csses that it points to. The cpuacct's
refcnt can't reach zero as long as task->cgroups points to it, and if
the timer tick is what's accessing it, the built-in RCU grace period in
cgroup core should be enough.
How reproducible is the problem? Do you have the KASAN report?
Thanks.
--
tejun
* Re: [PATCH] sched/cpuacct: fix use-after-free in cpuacct_account_field()
From: Rik van Riel @ 2026-04-08 13:58 UTC
To: Tejun Heo, Peter Zijlstra
Cc: linux-kernel, kernel-team, Ingo Molnar, Juri Lelli,
Vincent Guittot, Steven Rostedt
On Tue, 2026-04-07 at 08:59 -1000, Tejun Heo wrote:
>
> How reproducible is the problem? Do you have the KASAN report?
>
>
It looks like this crash only happened once, and the log
I found was kinda dubious. This makes the "didn't happen
again" data point a lot less useful.
Let's drop this one for now.
--
All Rights Reversed.