From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Ihor Solodrai <ihor.solodrai@linux.dev>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Michael Jeanson <mjeanson@efficios.com>
Subject: [patch V2 0/4] sched/mmcid: Cure mode transition woes
Date: Mon, 02 Feb 2026 10:39:35 +0100
Message-ID: <20260201192234.380608594@kernel.org>
This is a follow up to the V1 submission:
https://lore.kernel.org/20260129210219.452851594@kernel.org
Ihor and Shrikanth reported hard lockups which can be traced back to the recent
rewrite of the MM_CID management code.
1) The transition from task to CPU ownership lacks the intermediate
   transition mode, which can lead to CID pool exhaustion and a
   subsequent live lock. That intermediate mode was already implemented
   for the reverse operation, but omitted for this transition as the
   original analysis missed a few possible scheduling scenarios.
2) Weakly ordered architectures can observe inconsistent state, which
   causes them to make the wrong decision. That leads to the same problem
   as with #1.
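For illustration only, the mode word can be pictured as a small state
machine in which every ownership change passes through a transit state. The
user-space sketch below is not the kernel code: the SKETCH_* bit values and
the sketch_* helper names are hypothetical, and only the XOR flip mirrors
the WRITE_ONCE() line visible in the delta patch.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_CID_ONCPU	0x1u
#define SKETCH_CID_TRANSIT	0x2u

/*
 * Start a mode change: flip the ownership bit and set the transit bit
 * in one step, so that observers see the intermediate mode first.
 */
static unsigned int sketch_begin_transit(unsigned int mode)
{
	/* A new transition must not start while one is in progress */
	assert(!(mode & SKETCH_CID_TRANSIT));
	return mode ^ (SKETCH_CID_TRANSIT | SKETCH_CID_ONCPU);
}

/* Finish a mode change: drop the transit bit, keep the ownership bit. */
static unsigned int sketch_complete_transit(unsigned int mode)
{
	assert(mode & SKETCH_CID_TRANSIT);
	return mode & ~SKETCH_CID_TRANSIT;
}
```

Starting from per task mode (0), sketch_begin_transit() yields
TRANSIT | ONCPU and sketch_complete_transit() yields ONCPU. The reverse
direction goes from ONCPU through TRANSIT back to 0, so both directions use
an intermediate mode.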
The following series addresses these issues and fixes another, albeit harmless,
inconsistent state hiccup which was found when analysing the above issues.
With these issues addressed the last change optimizes the bitmap
utilization in the transition modes.
The series applies on Linus' tree and passes the selftests and a thread pool
emulator which stress tests the ownership transitions.
Changes vs. V1:
- Move the mm_cid_fixup_tasks_to_cpus() wrapping where it belongs (patch 1)
- Add barriers before and after the fixup functions to prevent CPU
reordering of the mode stores - Mathieu
- Update change logs - Mathieu
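
As a rough user-space analogy of the barrier placement, assuming C11 atomics:
the store which sets the transit mode is followed by a full fence, and the
store which clears it is preceded by one, so the fixup work in between cannot
be reordered past either store. The sketch_* names and SKETCH_* values are
hypothetical, and atomic_thread_fence(memory_order_seq_cst) is only a
stand-in for smp_mb(), not an exact equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_CID_ONCPU	0x1u
#define SKETCH_CID_TRANSIT	0x2u

struct sketch_mm_cid {
	atomic_uint mode;
};

/* Publish the transit mode, then fence before the fixup work starts. */
static void sketch_set_transit(struct sketch_mm_cid *mc, unsigned int mode)
{
	assert(mode & SKETCH_CID_TRANSIT);
	atomic_store_explicit(&mc->mode, mode, memory_order_relaxed);
	/* Stand-in for smp_mb(): order the store before what follows */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Fence after the fixup work, then publish the final mode. */
static void sketch_complete_transit(struct sketch_mm_cid *mc, unsigned int mode)
{
	assert(!(mode & SKETCH_CID_TRANSIT));
	/* Stand-in for smp_mb(): order the fixups before the store */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&mc->mode, mode, memory_order_relaxed);
}
```

A caller would invoke sketch_set_transit(), run the fixup, and then invoke
sketch_complete_transit() with the final mode, which matches the order of
operations described in the change list above.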
The delta patch against V1 is below.
Thanks,
tglx
---
--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -133,7 +133,6 @@ struct mm_cid_pcpu {
* as that is modified by mmget()/mm_put() by other entities which
* do not actually share the MM.
* @pcpu_thrs: Threshold for switching back from per CPU mode
- * @mode_change: Mode change in progress
* @update_deferred: A deferred switch back to per task mode is pending.
*/
struct mm_mm_cid {
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10445,6 +10445,12 @@ static bool mm_update_max_cids(struct mm
/* Flip the mode and set the transition flag to bridge the transfer */
WRITE_ONCE(mc->mode, mc->mode ^ (MM_CID_TRANSIT | MM_CID_ONCPU));
+ /*
+ * Order the store against the subsequent fixups so that
+ * acquire(rq::lock) cannot be reordered by the CPU before the
+ * store.
+ */
+ smp_mb();
return true;
}
@@ -10487,6 +10493,16 @@ static inline void mm_update_cpus_allowe
irq_work_queue(&mc->irq_work);
}
+static inline void mm_cid_complete_transit(struct mm_struct *mm, unsigned int mode)
+{
+ /*
+ * Ensure that the store removing the TRANSIT bit cannot be
+ * reordered by the CPU before the fixups have been completed.
+ */
+ smp_mb();
+ WRITE_ONCE(mm->mm_cid.mode, mode);
+}
+
static inline void mm_cid_transit_to_task(struct task_struct *t, struct mm_cid_pcpu *pcp)
{
if (cid_on_cpu(t->mm_cid.cid)) {
@@ -10530,8 +10546,7 @@ static void mm_cid_fixup_cpus_to_tasks(s
}
}
}
- /* Clear the transition bit in the mode */
- WRITE_ONCE(mm->mm_cid.mode, 0);
+ mm_cid_complete_transit(mm, 0);
}
static inline void mm_cid_transit_to_cpu(struct task_struct *t, struct mm_cid_pcpu *pcp)
@@ -10603,8 +10618,7 @@ static void mm_cid_fixup_tasks_to_cpus(v
struct mm_struct *mm = current->mm;
mm_cid_do_fixup_tasks_to_cpus(mm);
- /* Clear the transition bit in the mode */
- WRITE_ONCE(mm->mm_cid.mode, MM_CID_ONCPU);
+ mm_cid_complete_transit(mm, MM_CID_ONCPU);
}
static bool sched_mm_cid_add_user(struct task_struct *t, struct mm_struct *mm)
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3914,8 +3914,7 @@ static __always_inline void mm_cid_sched
/*
* If transition mode is done, transfer ownership when the CID is
- * within the convergion range. Otherwise the next schedule in will
- * have to allocate or converge
+ * within the convergence range to optimize the next schedule in.
*/
if (!cid_in_transit(mode) && cid < READ_ONCE(mm->mm_cid.max_cids)) {
if (cid_on_cpu(mode))