* [PATCH v3] sched/mmcid: fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
@ 2026-06-16 20:38 Rik van Riel
2026-06-16 21:40 ` Mathieu Desnoyers
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Rik van Riel @ 2026-06-16 20:38 UTC (permalink / raw)
To: linux-kernel
Cc: kernel-team, Rik van Riel, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Thomas Gleixner, Mathieu Desnoyers,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak
In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit. In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in. With only the cid_in_transit() check, the
guard passes for MM_CID_UNSET (no transit bit set), so the fixup converts
it to MM_CID_UNSET | MM_CID_TRANSIT and stores it back; later
mm_cid_schedout() strips the transit bit and passes MM_CID_UNSET to
clear_bit() as the bit number, triggering an out-of-bounds write.
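The path from the unguarded fixup to the bad clear_bit() can be modelled in
a few lines of userspace C. This is a simplified sketch, not the kernel
code: MM_CID_UNSET is BIT(31) as stated in this changelog, the BIT(29) value
used for MM_CID_TRANSIT is an assumption (only its being a different bit
matters), and NO_DROP, fixup_old() and schedout_drop_bit() are hypothetical
names that stand in for the pre-patch guard and for the
mm_cid_schedout() -> mm_drop_cid() decision.

```c
#include <assert.h>
#include <stdbool.h>

/* MM_CID_UNSET == BIT(31) per the changelog; BIT(29) for TRANSIT is assumed. */
#define MM_CID_UNSET	(1U << 31)
#define MM_CID_TRANSIT	(1U << 29)
/* Hypothetical model-only marker: nothing is handed to clear_bit(). */
#define NO_DROP		0xffffffffU

static bool cid_in_transit(unsigned int cid)
{
	return (cid & MM_CID_TRANSIT) != 0;
}

static unsigned int cid_to_transit_cid(unsigned int cid)
{
	return cid | MM_CID_TRANSIT;
}

static unsigned int cid_from_transit_cid(unsigned int cid)
{
	return cid & ~MM_CID_TRANSIT;
}

/* Pre-patch guard: only the transit bit is checked, so UNSET slips through. */
static unsigned int fixup_old(unsigned int cid)
{
	if (!cid_in_transit(cid))
		cid = cid_to_transit_cid(cid);
	return cid;
}

/*
 * Model of the schedule-out decision: a CID in transit is stripped; if it
 * is not below max_cids it is dropped, i.e. used as the bit number for
 * clear_bit(). Returns that bit number, or NO_DROP otherwise.
 */
static unsigned int schedout_drop_bit(unsigned int cid, unsigned int max_cids)
{
	if (!cid_in_transit(cid))
		return NO_DROP;
	cid = cid_from_transit_cid(cid);
	return cid < max_cids ? NO_DROP : cid;
}

/* Walks the sentinel and a real CID through the model. */
void path_selftest(void)
{
	/* The sentinel picks up the transit bit and is later dropped as bit 2^31. */
	assert(fixup_old(MM_CID_UNSET) == (MM_CID_UNSET | MM_CID_TRANSIT));
	assert(schedout_drop_bit(fixup_old(MM_CID_UNSET), 8) == MM_CID_UNSET);
	/* A real CID below max_cids converges and is not dropped. */
	assert(schedout_drop_bit(fixup_old(3), 8) == NO_DROP);
}
```

In this model the sentinel comes back out of schedout_drop_bit() as the bit
number, while a real CID such as 3 stays below max_cids and is never handed
to clear_bit().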
Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31),
so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
mm_cidmask(mm)). The cid bitmap is embedded in the mm_struct slab object
(after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
bits wide, so clearing bit 2^31 is a deterministic OOB bit-clear at a
fixed offset of 2^31 / 8 == 256 MiB past the bitmap base. The address is
not attacker-influenced (fixed sentinel -> fixed offset) and the op only
clears a single bit; what sits 256 MiB further along the direct map is
whatever kernel object happens to live there, so this corrupts one bit of
unpredictable kernel memory -- it is not an arbitrary-address or
arbitrary-value write.
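The offset arithmetic can be checked with a short sketch. It assumes the
generic bitmap layout, where bit number nr lives in byte nr / 8
(equivalently in word nr / BITS_PER_LONG); bit_byte_offset() and
bit_word_index() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* MM_CID_UNSET == BIT(31), per the changelog. */
#define MM_CID_UNSET	(UINT32_C(1) << 31)

/* Byte offset, relative to the bitmap base, of the byte holding bit 'nr'. */
static uint64_t bit_byte_offset(uint64_t nr)
{
	return nr / 8;
}

/* Index of the word holding bit 'nr' for a given word size in bits. */
static uint64_t bit_word_index(uint64_t nr, unsigned int bits_per_long)
{
	return nr / bits_per_long;
}

/* Checks the numbers quoted in the changelog. */
void offset_selftest(void)
{
	/* 2^31 / 8 == 2^28 bytes == 256 MiB */
	assert(bit_byte_offset(MM_CID_UNSET) == UINT64_C(256) * 1024 * 1024);
	/* With 64-bit longs the same bit is bit 0 of word 2^25. */
	assert(bit_word_index(MM_CID_UNSET, 64) == (UINT64_C(1) << 25));
	assert(MM_CID_UNSET % 64 == 0);
}
```

Both views give the same address: word 2^25 of 8 bytes each starts 2^28
bytes, i.e. 256 MiB, past the bitmap base.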
It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET -- the
fork()/execve() window before that task's next schedule-in assigns it a
real CID -- and a per-CPU -> per-task fixup walks over it (the mode
fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
max_cids recompute in mm_cid_work_fn()).
In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
via mm_cid_schedout() -> mm_drop_cid().
Guard the transition-bit assignment against MM_CID_UNSET, in addition to
the existing cid_in_transit() check, so the bit is only set on a genuine
task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task
is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
this path, so excluding MM_CID_UNSET (and the already-transitioning case)
is sufficient.
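The effect of the added condition can be tabulated with a small sketch. This
is a userspace model under assumptions: MM_CID_TRANSIT is taken to be
BIT(29), and guard_old() and guard_new() are hypothetical names for the pre-
and post-patch if conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* MM_CID_UNSET == BIT(31) per the changelog; BIT(29) for TRANSIT is assumed. */
#define MM_CID_UNSET	(1U << 31)
#define MM_CID_TRANSIT	(1U << 29)

static bool cid_in_transit(unsigned int cid)
{
	return (cid & MM_CID_TRANSIT) != 0;
}

/* Pre-patch condition for setting the transition bit. */
static bool guard_old(unsigned int cid)
{
	return !cid_in_transit(cid);
}

/* Post-patch condition: additionally refuse the MM_CID_UNSET sentinel. */
static bool guard_new(unsigned int cid)
{
	return cid != MM_CID_UNSET && !cid_in_transit(cid);
}

void guard_selftest(void)
{
	/* Genuine task-owned CID: both guards set the transition bit. */
	assert(guard_old(5) && guard_new(5));
	/* Already in transit: both guards leave it alone. */
	assert(!guard_old(5 | MM_CID_TRANSIT) && !guard_new(5 | MM_CID_TRANSIT));
	/* MM_CID_UNSET: only the old guard lets it through. */
	assert(guard_old(MM_CID_UNSET) && !guard_new(MM_CID_UNSET));
}
```

The only input on which the two conditions differ is MM_CID_UNSET itself,
which is the case the patch is meant to exclude.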
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Assisted-by: Claude:claude-opus-4-8 syzkaller
Signed-off-by: Rik van Riel <riel@surriel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-kernel@vger.kernel.org
---
kernel/sched/core.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..3cc6fb1d2054 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10909,8 +10909,19 @@ static void mm_cid_fixup_cpus_to_tasks(struct mm_struct *mm)
 		} else if (rq->curr->mm == mm && rq->curr->mm_cid.active) {
 			unsigned int cid = rq->curr->mm_cid.cid;
 
-			/* Ensure it has the transition bit set */
-			if (!cid_in_transit(cid)) {
+			/*
+			 * Set the transition bit only on a genuine task-owned
+			 * CID. A running active task can legitimately have
+			 * MM_CID_UNSET here: in per-CPU mode CIDs are assigned
+			 * lazily on schedule-in, so the fork()/execve() window
+			 * leaves the task active with no owned CID. Setting the
+			 * transition bit on MM_CID_UNSET would later feed
+			 * clear_bit() an out-of-bounds bit number via
+			 * mm_cid_schedout(), so exclude it. A CPU-owned
+			 * (MM_CID_ONCPU) CID is handled by the cid_on_cpu()
+			 * branch above and never reaches here.
+			 */
+			if (cid != MM_CID_UNSET && !cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
 				pcp->cid = cid;
--
2.53.0-Meta
* Re: [PATCH v3] sched/mmcid: fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
From: Mathieu Desnoyers @ 2026-06-16 21:40 UTC (permalink / raw)
To: Rik van Riel, linux-kernel, Thomas Gleixner
Cc: kernel-team, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, K Prateek Nayak
On 2026-06-16 16:38, Rik van Riel wrote:
> In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
> mm_cid.active is set, the CID is checked with cid_in_transit() before
> setting the transition bit. In per-CPU mode a newly forked or exec'd
> task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
> assigned lazily on schedule-in. With cid_in_transit() the guard passes
> for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET |
> MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this
> to clear_bit() with MM_CID_UNSET as the bit number, triggering an
> out-of-bounds write.
Thomas, can you have a look as well in case we missed something
subtle?
Rik, did you check whether there are other instances of that
MM_CID_UNSET issue lurking in the code, or was your analysis
focused on the reproduced bug?
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [PATCH v3] sched/mmcid: fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
From: Thomas Gleixner @ 2026-06-19 19:40 UTC (permalink / raw)
To: Rik van Riel, linux-kernel
Cc: kernel-team, Rik van Riel, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Mathieu Desnoyers, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak
On Tue, Jun 16 2026 at 16:38, Rik van Riel wrote:
> In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
> mm_cid.active is set, the CID is checked with cid_in_transit() before
> setting the transition bit. In per-CPU mode a newly forked or exec'd
> task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
> assigned lazily on schedule-in. With cid_in_transit() the guard passes
> for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET |
> MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this
> to clear_bit() with MM_CID_UNSET as the bit number, triggering an
> out-of-bounds write.
>
> Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
> write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31),
> so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
> strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
> test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
> mm_cidmask(mm)). The cid bitmap is embedded in the mm_struct slab object
> (after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
> bits wide, so clearing bit 31 is a deterministic OOB bit-clear at a
> fixed offset of 2^31 / 8 == 256 MiB past the bitmap base. The address is
> not attacker-influenced (fixed sentinel -> fixed offset) and the op only
> clears a single bit; what sits 256 MiB further along the direct map is
> whatever kernel object happens to live there, so this corrupts one bit of
> unpredictable kernel memory -- it is not an arbitrary-address or
> arbitrary-value write.
>
> It triggers only in per-CPU CID mode, when a CPU is running an active
> task of the target mm whose cid is still MM_CID_UNSET -- the
> fork()/execve() window before that task's next schedule-in assigns it a
> real CID -- and a per-CPU -> per-task fixup walks over it (the mode
> fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
> max_cids recompute in mm_cid_work_fn()).
>
> In practice syzkaller surfaced it as a KASAN use-after-free reported in
> __schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
> via mm_cid_schedout() -> mm_drop_cid().
>
> Guard the transition-bit assignment against MM_CID_UNSET, in addition to
> the existing cid_in_transit() check, so the bit is only set on a genuine
> task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task
> is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
> this path, so excluding MM_CID_UNSET (and the already-transitioning case)
> is sufficient.
Duh. Now that you explained it, it's obvious. Thanks for tracking this
nasty one down!
* [tip: core/urgent] sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
From: tip-bot2 for Rik van Riel @ 2026-06-19 19:50 UTC (permalink / raw)
To: linux-tip-commits
Cc: Rik van Riel, Thomas Gleixner, Mathieu Desnoyers, stable, x86,
linux-kernel
The following commit has been merged into the core/urgent branch of tip:
Commit-ID: de3ab9bd3133899efb92e4cd05ba4203e58fc0a3
Gitweb: https://git.kernel.org/tip/de3ab9bd3133899efb92e4cd05ba4203e58fc0a3
Author: Rik van Riel <riel@surriel.com>
AuthorDate: Tue, 16 Jun 2026 16:38:17 -04:00
Committer: Thomas Gleixner <tglx@kernel.org>
CommitterDate: Fri, 19 Jun 2026 21:44:16 +02:00
sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit. In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in. With only the cid_in_transit() check, the
guard passes for MM_CID_UNSET (no transit bit set), so the fixup converts
it to MM_CID_UNSET | MM_CID_TRANSIT and stores it back; later
mm_cid_schedout() strips the transit bit and passes MM_CID_UNSET to
clear_bit() as the bit number, triggering an out-of-bounds write.
Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31),
so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
mm_cidmask(mm)). The cid bitmap is embedded in the mm_struct slab object
(after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
bits wide, so clearing bit 2^31 is a deterministic OOB bit-clear at a
fixed offset of 2^31 / 8 == 256 MiB past the bitmap base. The address is
not attacker-influenced (fixed sentinel -> fixed offset) and the op only
clears a single bit; what sits 256 MiB further along the direct map is
whatever kernel object happens to live there, so this corrupts one bit of
unpredictable kernel memory -- it is not an arbitrary-address or
arbitrary-value write.
It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET -- the
fork()/execve() window before that task's next schedule-in assigns it a
real CID -- and a per-CPU -> per-task fixup walks over it (the mode
fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
max_cids recompute in mm_cid_work_fn()).
In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
via mm_cid_schedout() -> mm_drop_cid().
Guard the transition-bit assignment against MM_CID_UNSET, in addition to
the existing cid_in_transit() check, so the bit is only set on a genuine
task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task
is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
this path, so excluding MM_CID_UNSET (and the already-transitioning case)
is sufficient.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:claude-opus-4-8 syzkaller
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com
---
kernel/sched/core.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9..3cc6fb1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10909,8 +10909,19 @@ static void mm_cid_fixup_cpus_to_tasks(struct mm_struct *mm)
 		} else if (rq->curr->mm == mm && rq->curr->mm_cid.active) {
 			unsigned int cid = rq->curr->mm_cid.cid;
 
-			/* Ensure it has the transition bit set */
-			if (!cid_in_transit(cid)) {
+			/*
+			 * Set the transition bit only on a genuine task-owned
+			 * CID. A running active task can legitimately have
+			 * MM_CID_UNSET here: in per-CPU mode CIDs are assigned
+			 * lazily on schedule-in, so the fork()/execve() window
+			 * leaves the task active with no owned CID. Setting the
+			 * transition bit on MM_CID_UNSET would later feed
+			 * clear_bit() an out-of-bounds bit number via
+			 * mm_cid_schedout(), so exclude it. A CPU-owned
+			 * (MM_CID_ONCPU) CID is handled by the cid_on_cpu()
+			 * branch above and never reaches here.
+			 */
+			if (cid != MM_CID_UNSET && !cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
 				pcp->cid = cid;