From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, Rik van Riel, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Thomas Gleixner, Mathieu Desnoyers,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, K Prateek Nayak
Subject: [PATCH v3] sched/mmcid: fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
Date: Tue, 16 Jun 2026 16:38:17 -0400
Message-ID: <20260616203818.1516263-1-riel@surriel.com>

In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit. In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in. The cid_in_transit() check alone lets
MM_CID_UNSET through (it has no transit bit set), so the fixup converts
it to MM_CID_UNSET | MM_CID_TRANSIT and stores it back. Later,
mm_cid_schedout() feeds this value to clear_bit() with MM_CID_UNSET as
the bit number, triggering an out-of-bounds write.
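
The sequence described above can be sketched as a small userspace model.
This is an illustration only, not the kernel code: the function names
mirror the ones mentioned in this message, but the position of the
transit flag, the helpers fixup_old(), fixup_new() and
schedout_dropped_bit(), and the simplified schedule-out logic are
assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MM_CID_UNSET is the sentinel BIT(31), as stated in this message. */
#define MM_CID_UNSET	(UINT32_C(1) << 31)
/* ASSUMPTION: flag position picked for this model only. */
#define MM_CID_TRANSIT	(UINT32_C(1) << 29)

static inline bool cid_in_transit(uint32_t cid)
{
	return (cid & MM_CID_TRANSIT) != 0;
}

static inline uint32_t cid_to_transit_cid(uint32_t cid)
{
	return cid | MM_CID_TRANSIT;
}

static inline uint32_t cid_from_transit_cid(uint32_t cid)
{
	return cid & ~MM_CID_TRANSIT;
}

/* Hypothetical: the fixup with only the cid_in_transit() check. */
static inline uint32_t fixup_old(uint32_t cid)
{
	if (!cid_in_transit(cid))
		cid = cid_to_transit_cid(cid);
	return cid;
}

/* Hypothetical: the fixup with the additional MM_CID_UNSET check. */
static inline uint32_t fixup_new(uint32_t cid)
{
	if (cid != MM_CID_UNSET && !cid_in_transit(cid))
		cid = cid_to_transit_cid(cid);
	return cid;
}

/*
 * Hypothetical, simplified schedule-out model: returns the bit number
 * that would be handed to clear_bit(), or -1 if no bit is cleared.
 */
static inline int64_t schedout_dropped_bit(uint32_t cid, uint32_t max_cids)
{
	if (!cid_in_transit(cid))
		return -1;
	cid = cid_from_transit_cid(cid);
	if (cid < max_cids)
		return -1;	/* CID converges and is kept */
	return (int64_t)cid;
}

/* Usage: compare what each variant does to an unset CID. */
static inline void model_self_check(void)
{
	/* Old check: the sentinel itself becomes the bit number. */
	assert(schedout_dropped_bit(fixup_old(MM_CID_UNSET), 8) ==
	       (int64_t)MM_CID_UNSET);
	/* New check: the sentinel is left alone, nothing is cleared. */
	assert(schedout_dropped_bit(fixup_new(MM_CID_UNSET), 8) == -1);
}
```

In this model a real task-owned CID (for example 3) still gets the
transit flag under both variants; only the MM_CID_UNSET case differs.
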

Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31).
Once the bad value reaches mm_cid_schedout(), cid_from_transit_cid()
strips the transit bit and leaves MM_CID_UNSET, which fails the
"cid < max_cids" convergence test and falls into mm_drop_cid() ->
clear_bit(MM_CID_UNSET, mm_cidmask(mm)).

The cid bitmap is embedded in the mm_struct slab object (after
cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus() bits
wide, so clearing bit number BIT(31) == 2^31 is a deterministic OOB
bit-clear at a fixed offset of 2^31 / 8 == 256 MiB past the bitmap
base. The address is not attacker-influenced (fixed sentinel -> fixed
offset) and the operation only clears a single bit. What sits 256 MiB
further along the direct map is whatever kernel object happens to live
there, so this corrupts one bit of unpredictable kernel memory; it is
not an arbitrary-address or arbitrary-value write.

It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET (the
fork()/execve() window before that task's next schedule-in assigns it a
real CID) and a per-CPU -> per-task fixup walks over it. That fixup is
the mode fallback driven by a thread exit (sched_mm_cid_exit()) or by
the deferred max_cids recompute in mm_cid_work_fn().

In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is
inlined via mm_cid_schedout() -> mm_drop_cid().

Guard the transition-bit assignment against MM_CID_UNSET, in addition to
the existing cid_in_transit() check, so the bit is only set on a genuine
task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task
is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
this path, so excluding MM_CID_UNSET (and the already-transitioning
case) is sufficient.
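
The fixed-offset claim can be checked with a few lines of arithmetic.
The sketch below is a userspace illustration, not kernel code: the
helper names are hypothetical, and 64-bit unsigned long words are an
assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The sentinel used as a bit number: BIT(31) == 2^31. */
#define MM_CID_UNSET_NR	(UINT64_C(1) << 31)
/* ASSUMPTION: bitmaps are arrays of 64-bit words. */
#define MODEL_BITS_PER_LONG	64

/* Hypothetical helper: byte offset, from the bitmap base, of bit nr. */
static inline uint64_t bit_byte_offset(uint64_t nr)
{
	return nr / CHAR_BIT;
}

/* Hypothetical helper: index of the word that holds bit nr. */
static inline uint64_t bit_word_index(uint64_t nr)
{
	return nr / MODEL_BITS_PER_LONG;
}

/* Hypothetical helper: size in bytes of a bitmap covering nbits bits. */
static inline uint64_t bitmap_bytes(uint64_t nbits)
{
	uint64_t words = (nbits + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;

	return words * (MODEL_BITS_PER_LONG / CHAR_BIT);
}

/* Usage: the offset is 256 MiB no matter how small the bitmap is. */
static inline void offset_self_check(void)
{
	assert(bit_byte_offset(MM_CID_UNSET_NR) == UINT64_C(256) * 1024 * 1024);
	assert(bitmap_bytes(512) < bit_byte_offset(MM_CID_UNSET_NR));
}
```

Even a bitmap sized for 512 possible CPUs occupies only 64 bytes in
this model, so the cleared bit lands far outside it.
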

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Assisted-by: Claude:claude-opus-4-8 syzkaller
Signed-off-by: Rik van Riel
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Thomas Gleixner
Cc: Mathieu Desnoyers
Cc: Dietmar Eggemann
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Mel Gorman
Cc: Valentin Schneider
Cc: K Prateek Nayak
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..3cc6fb1d2054 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10909,8 +10909,19 @@ static void mm_cid_fixup_cpus_to_tasks(struct mm_struct *mm)
 		} else if (rq->curr->mm == mm && rq->curr->mm_cid.active) {
 			unsigned int cid = rq->curr->mm_cid.cid;
 
-			/* Ensure it has the transition bit set */
-			if (!cid_in_transit(cid)) {
+			/*
+			 * Set the transition bit only on a genuine task-owned
+			 * CID. A running active task can legitimately have
+			 * MM_CID_UNSET here: in per-CPU mode CIDs are assigned
+			 * lazily on schedule-in, so the fork()/execve() window
+			 * leaves the task active with no owned CID. Setting the
+			 * transition bit on MM_CID_UNSET would later feed
+			 * clear_bit() an out-of-bounds bit number via
+			 * mm_cid_schedout(), so exclude it. A CPU-owned
+			 * (MM_CID_ONCPU) CID is handled by the cid_on_cpu()
+			 * branch above and never reaches here.
+			 */
+			if (cid != MM_CID_UNSET && !cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
 				pcp->cid = cid;
-- 
2.53.0-Meta