Date: Tue, 10 Mar 2026 21:28:53 +0100
Message-ID: <20260310202525.969061974@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Mathieu Desnoyers, Matthieu Baerts, Jiri Slaby
Subject: [patch 1/4] sched/mmcid: Prevent CID stalls due to concurrent forks
References: <20260310201009.257617049@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

A newly forked task is accounted as an MMCID user before the task is
visible in the process' thread list and the global task list. This
creates the following problem:

  CPU1                                  CPU2
  fork()
    sched_mm_cid_fork(tnew1)
      tnew1->mm.mm_cid_users++;
      tnew1->mm_cid.cid = getcid()
  -> preemption
                                        fork()
                                          sched_mm_cid_fork(tnew2)
                                            tnew2->mm.mm_cid_users++;
                                            // Reaches the per CPU threshold
                                            mm_cid_fixup_tasks_to_cpus()
                                              for_each_other(current, p)
                                                ....

As tnew1 is not visible yet, this fails to fix up the already allocated
CID of tnew1. As a consequence, a subsequent schedule-in might fail to
acquire a (transitional) CID and the machine stalls.

Prevent this by moving the invocation of sched_mm_cid_fork() to after
the point where the new task becomes visible in the thread and task
lists.

This also makes fork() symmetrical to exit(), where the task is removed
as a CID user before it is removed from the thread and task lists.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner
---
 include/linux/sched.h |    2 --
 kernel/fork.c         |    2 --
 kernel/sched/core.c   |   22 +++++++++++++++-------
 3 files changed, 15 insertions(+), 11 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2354,7 +2354,6 @@ static __always_inline void alloc_tag_re
 #ifdef CONFIG_SCHED_MM_CID
 void sched_mm_cid_before_execve(struct task_struct *t);
 void sched_mm_cid_after_execve(struct task_struct *t);
-void sched_mm_cid_fork(struct task_struct *t);
 void sched_mm_cid_exit(struct task_struct *t);
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
@@ -2363,7 +2362,6 @@ static __always_inline int task_mm_cid(s
 #else
 static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
-static inline void sched_mm_cid_fork(struct task_struct *t) { }
 static inline void sched_mm_cid_exit(struct task_struct *t) { }
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1586,7 +1586,6 @@ static int copy_mm(u64 clone_flags, stru

 	tsk->mm = mm;
 	tsk->active_mm = mm;
-	sched_mm_cid_fork(tsk);
 	return 0;
 }

@@ -2498,7 +2497,6 @@ static bool need_futex_hash_allocate_def
 	exit_nsproxy_namespaces(p);
 bad_fork_cleanup_mm:
 	if (p->mm) {
-		sched_mm_cid_exit(p);
 		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
 	}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4729,8 +4729,12 @@ void sched_cancel_fork(struct task_struc
 	scx_cancel_fork(p);
 }

+static void sched_mm_cid_fork(struct task_struct *t);
+
 void sched_post_fork(struct task_struct *p)
 {
+	if (IS_ENABLED(CONFIG_SCHED_MM_CID))
+		sched_mm_cid_fork(p);
 	uclamp_post_fork(p);
 	scx_post_fork(p);
 }
@@ -10646,12 +10650,13 @@ static void mm_cid_do_fixup_tasks_to_cpu
	 * possible switch back to per task mode happens either in the
	 * deferred handler function or in the next fork()/exit().
	 *
-	 * The caller has already transferred. The newly incoming task is
-	 * already accounted for, but not yet visible.
+	 * The caller has already transferred so remove it from the users
+	 * count. The incoming task is already visible and has mm_cid.active,
+	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
+	 * for. Concurrent fork()s might add more threads, but all of them have
+	 * task::mm_cid::active = 0, so they don't affect the accounting here.
	 */
-	users = mm->mm_cid.users - 2;
-	if (!users)
-		return;
+	users = mm->mm_cid.users - 1;

 	guard(rcu)();
 	for_other_threads(current, t) {
@@ -10688,12 +10693,15 @@ static bool sched_mm_cid_add_user(struct
 	return mm_update_max_cids(mm);
 }

-void sched_mm_cid_fork(struct task_struct *t)
+static void sched_mm_cid_fork(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
 	bool percpu;

-	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
+	if (!mm)
+		return;
+
+	WARN_ON_ONCE(t->mm_cid.cid != MM_CID_UNSET);

 	guard(mutex)(&mm->mm_cid.mutex);
 	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {