From: Thomas Gleixner <tglx@kernel.org>
To: Shrikanth Hegde <sshegde@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ihor Solodrai <ihor.solodrai@linux.dev>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Gabriele Monaco <gmonaco@redhat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Michael Jeanson <mjeanson@efficios.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
	Florian Weimer <fweimer@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Yury Norov <yury.norov@gmail.com>, bpf <bpf@vger.kernel.org>,
	sched-ext@lists.linux.dev, Kernel Team <kernel-team@meta.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Puranjay Mohan <puranjay@kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: [patch V5 00/20] sched: Rewrite MM CID management
Date: Wed, 28 Jan 2026 23:24:50 +0100
Message-ID: <87jyx1ml3h.ffs@tglx>
In-Reply-To: <87y0lh96xo.ffs@tglx>

On Wed, Jan 28 2026 at 14:56, Thomas Gleixner wrote:
> On Wed, Jan 28 2026 at 18:28, Shrikanth Hegde wrote:
>> On 1/28/26 5:27 PM, Thomas Gleixner wrote:
>>   watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188
>>   watchdog: CPU 23 TB:1434903268401795, last heartbeat TB:1434897252302837 (11750ms ago)
>>   NIP [c0000000001b7134] mm_get_cid+0xe8/0x188
>>   LR [c0000000001b7154] mm_get_cid+0x108/0x188
>>   Call Trace:
>>   [c000000004c37db0] [c000000001145d84] cpuidle_enter_state+0xf8/0x6a4 (unreliable)
>>   [c000000004c37e00] [c0000000001b95ac] mm_cid_switch_to+0x3c4/0x52c
>>   [c000000004c37e60] [c000000001147264] __schedule+0x47c/0x700
>
> So if the above spins in mm_get_cid() then the below is just a consequence.
>
>>   watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c
>>   watchdog: CPU 11 TB:1434903340004919, last heartbeat TB:1434897249749892 (11895ms ago)
>>   NIP [c0000000000f84fc] plpar_hcall_norets_notrace+0x18/0x2c
>>   LR [c000000001152588] queued_spin_lock_slowpath+0xd88/0x15d0
>>   Call Trace:
>>   [c00000056b69fb10] [c00000056b69fba0] 0xc00000056b69fba0 (unreliable)
>>   [c00000056b69fc30] [c000000001153ce0] _raw_spin_lock+0x80/0xa0
>>   [c00000056b69fc50] [c0000000001b9a34] raw_spin_rq_lock_nested+0x3c/0xf8
>>   [c00000056b69fc80] [c0000000001b9bb8] mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
>>   [c00000056b69fd00] [c0000000001bff34] sched_mm_cid_exit+0x108/0x22c
>>   [c00000056b69fd40] [c000000000167b08] do_exit+0xf4/0x5d0
>>   [c00000056b69fdf0] [c00000000016800c] make_task_dead+0x0/0x178
>>   [c00000056b69fe10] [c0000000000316c8] system_call_exception+0x128/0x390
>>   [c00000056b69fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
>
>> I am wondering if it is this loop in mm_get_cid() which may not be getting a CID
>> for a long time? Is that possible?
>
> It shouldn't be possible by design, but it seems there is a corner case
> lurking somewhere which hasn't been covered. Let me stare at the logic
> in the transition functions once more. That's where CPU11 comes from:
>
>>   [c00000056b69fc80] [c0000000001b9bb8] mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
>
> The exiting task initiated a transition back from per CPU to per task mode
> and that seems to make things unhappy for mysterious reasons.

I stared at it for a while and found the stupidity below. But when I
sat down again after some time away from the keyboard and tried to
write a concise changelog explaining the root cause, I failed to come up
with a coherent explanation of why this change would prevent the above
scenario, which hints at MMCID exhaustion.
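
To make the exhaustion symptom concrete, here is a minimal user-space
sketch. It is a toy model and not the kernel implementation: every name
in it (toy_mm, toy_try_get_cid(), toy_get_cid(), toy_drop_cid(),
toy_exhaustion_demo()) is hypothetical, and the retry loop is bounded on
purpose so that the example terminates. It only shows that an allocator
which retries until a CID becomes free cannot make progress while every
CID is still accounted as owned, which is what a spin in mm_get_cid()
would look like from the outside.

```c
#include <assert.h>

/* Toy model only: one bit per CID, bit n set means CID n is owned. */
struct toy_mm {
	unsigned long cid_bitmap;
	unsigned int max_cids;	/* hypothetical upper bound for this mm */
};

/* Claim the lowest free CID. Returns -1 if every CID is owned. */
static int toy_try_get_cid(struct toy_mm *mm)
{
	for (unsigned int cid = 0; cid < mm->max_cids; cid++) {
		if (!(mm->cid_bitmap & (1UL << cid))) {
			mm->cid_bitmap |= 1UL << cid;
			return (int)cid;
		}
	}
	return -1;
}

/*
 * Bounded stand-in for a retry loop which expects a CID to become
 * free eventually. Returns -1 when the retry budget is used up.
 */
static int toy_get_cid(struct toy_mm *mm, unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		int cid = toy_try_get_cid(mm);

		if (cid >= 0)
			return cid;
	}
	return -1;
}

/* Give a CID back to the pool. */
static void toy_drop_cid(struct toy_mm *mm, int cid)
{
	mm->cid_bitmap &= ~(1UL << cid);
}

/* Walk through the exhaustion case. Returns 0 if all checks hold. */
int toy_exhaustion_demo(void)
{
	struct toy_mm mm = { .cid_bitmap = 0, .max_cids = 4 };
	int held[4];

	/* Hand out every CID once. */
	for (int i = 0; i < 4; i++) {
		held[i] = toy_get_cid(&mm, 1);
		assert(held[i] == i);
	}

	/* Nothing is free, so retrying does not help. */
	assert(toy_get_cid(&mm, 1000) == -1);

	/* As soon as one owner drops its CID the next request succeeds. */
	toy_drop_cid(&mm, held[2]);
	assert(toy_get_cid(&mm, 1000) == 2);
	return 0;
}
```

In this model a CID which stays marked as owned although nobody uses it
any more has the same effect as a real owner which never drops it: the
pool looks exhausted and the retry loop never succeeds.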

@Ihor: Is the BPF CI fallout reproducible? If so, can you please provide
       a reproducer?

Thanks,

        tglx
---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10664,8 +10664,14 @@ void sched_mm_cid_exit(struct task_struc
 			scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
 				if (!__sched_mm_cid_exit(t))
 					return;
-				/* Mode change required. Transfer currents CID */
-				mm_cid_transit_to_task(current, this_cpu_ptr(mm->mm_cid.pcpu));
+				/*
+				 * Mode change. The task has the CID unset
+				 * already. The CPU CID is still valid and
+				 * does not have MM_CID_TRANSIT set as the
+				 * mode change has just taken effect under
+				 * mm::mm_cid::lock. Drop it.
+				 */
+				mm_drop_cid_on_cpu(mm, this_cpu_ptr(mm->mm_cid.pcpu));
 			}
 			mm_cid_fixup_cpus_to_tasks(mm);
 			return;
