public inbox for stable@vger.kernel.org
* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
       [not found] <20260212211213.F1BE52A1C1D@windowsforum.com>
@ 2026-02-12 21:19 ` Mathieu Desnoyers
  2026-02-12 23:21   ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Mathieu Desnoyers @ 2026-02-12 21:19 UTC (permalink / raw)
  To: root, Thomas Gleixner
  Cc: peterz, mingo, linux-kernel, mjfara, Greg Kroah-Hartman,
	stable@vger.kernel.org

On 2026-02-12 16:12, root wrote:
> To: mathieu.desnoyers@efficios.com
> Cc: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org
> Subject: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
> 
> Hi Mathieu,
> 
> I'm hitting a repeatable page fault in sched_mm_cid_exit() on 6.19.0
> when booting with nopti. The crash occurs during process exit
> (do_exit -> sched_mm_cid_exit) on an atomic bit-clear (lock btr) of
> the CID bitmap. The faulting address is within a 2MB huge page that
> returns a permissions violation on supervisor write access.
> 
> The bug triggered 8 times over ~20 hours on a single boot, hitting
> multiple unrelated processes (git, gce_workload_ce). Eventually D-Bus
> died and systemd became non-functional, requiring a hard power-off.

Can you confirm whether the following fix in Linus' tree resolves your issue?

commit 1e83ccd5921a ("sched/mmcid: Don't assume CID is CPU owned on mode switch")

I suspect that it will soon be cherry-picked into stable for an eventual v6.19.1.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
  2026-02-12 21:19 ` [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0 Mathieu Desnoyers
@ 2026-02-12 23:21   ` Thomas Gleixner
  2026-02-13 11:16     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2026-02-12 23:21 UTC (permalink / raw)
  To: Mathieu Desnoyers, root
  Cc: peterz, mingo, linux-kernel, mjfara, Greg Kroah-Hartman,
	stable@vger.kernel.org

On Thu, Feb 12 2026 at 16:19, Mathieu Desnoyers wrote:
> On 2026-02-12 16:12, root wrote:
>> I'm hitting a repeatable page fault in sched_mm_cid_exit() on 6.19.0
>> when booting with nopti. The crash occurs during process exit
>> (do_exit -> sched_mm_cid_exit) on an atomic bit-clear (lock btr) of
>> the CID bitmap. The faulting address is within a 2MB huge page that
>> returns a permissions violation on supervisor write access.
>> 
>> The bug triggered 8 times over ~20 hours on a single boot, hitting
>> multiple unrelated processes (git, gce_workload_ce). Eventually D-Bus
>> died and systemd became non-functional, requiring a hard power-off.
>
> Can you confirm whether the following fix in Linus' tree fixes your issue ?

It's exactly that problem:

  2a:*	f0 48 0f b3 10       	lock btr %rdx,(%rax)		<-- trapping instruction

RDX: 0000000020000006

which has the TRANSIT bit set, and that's what the commit below fixes:

> commit 1e83ccd5921a ("sched/mmcid: Don't assume CID is CPU owned on mode switch")
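
A rough illustration of why the TRANSIT bit matters for the faulting
instruction, written as a standalone C sketch rather than kernel code. It
assumes the TRANSIT flag is bit 29 (which is what the 0x20000000 part of the
RDX value suggests) and that a 64-bit `lock btr` with a register bit offset
touches the quadword at base + 8 * (offset / 64). The macro and function
names are made up for this example:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: TRANSIT is bit 29, inferred from RDX = 0x20000006.
 * The name is illustrative and not copied from the kernel source. */
#define EXAMPLE_CID_TRANSIT (UINT32_C(1) << 29)

/* The CID proper, with the assumed transit flag masked off. */
static uint32_t example_cid_strip_transit(uint32_t raw)
{
	return raw & ~EXAMPLE_CID_TRANSIT;
}

/* Byte offset from the bitmap base of the quadword that a 64-bit
 * bit-test-and-reset on bit index 'bit' would access. */
static uint64_t example_btr_byte_offset(uint32_t bit)
{
	return ((uint64_t)bit / 64) * 8;
}

/* Walk through the value from the register dump in this thread. */
void example_demo(void)
{
	uint32_t raw = UINT32_C(0x20000006);	/* RDX from the oops */
	uint32_t cid = example_cid_strip_transit(raw);

	/* A stripped CID must be smaller than the flag bit itself. */
	assert(cid < EXAMPLE_CID_TRANSIT);

	printf("raw index %#x -> byte offset %llu\n", (unsigned)raw,
	       (unsigned long long)example_btr_byte_offset(raw));
	printf("cid       %#x -> byte offset %llu\n", (unsigned)cid,
	       (unsigned long long)example_btr_byte_offset(cid));
}
```

With the flag left in, the bit index 0x20000006 lands 64 MiB past the start
of the bitmap, which would be consistent with a write hitting unrelated
memory. With the flag masked off, the index is 6 and stays in the first
quadword.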


* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
  2026-02-12 23:21   ` Thomas Gleixner
@ 2026-02-13 11:16     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 3+ messages in thread
From: Greg Kroah-Hartman @ 2026-02-13 11:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, root, peterz, mingo, linux-kernel, mjfara,
	stable@vger.kernel.org

On Fri, Feb 13, 2026 at 12:21:52AM +0100, Thomas Gleixner wrote:
> On Thu, Feb 12 2026 at 16:19, Mathieu Desnoyers wrote:
> > On 2026-02-12 16:12, root wrote:
> >> I'm hitting a repeatable page fault in sched_mm_cid_exit() on 6.19.0
> >> when booting with nopti. The crash occurs during process exit
> >> (do_exit -> sched_mm_cid_exit) on an atomic bit-clear (lock btr) of
> >> the CID bitmap. The faulting address is within a 2MB huge page that
> >> returns a permissions violation on supervisor write access.
> >> 
> >> The bug triggered 8 times over ~20 hours on a single boot, hitting
> >> multiple unrelated processes (git, gce_workload_ce). Eventually D-Bus
> >> died and systemd became non-functional, requiring a hard power-off.
> >
> > Can you confirm whether the following fix in Linus' tree fixes your issue ?
> 
> It's exactly that problem:
> 
>   2a:*	f0 48 0f b3 10       	lock btr %rdx,(%rax)		<-- trapping instruction
> 
> RDX: 0000000020000006
> 
> which has the TRANSIT bit set, and that's what the commit below fixes:
> 
> > commit 1e83ccd5921a ("sched/mmcid: Don't assume CID is CPU owned on mode switch")
> 

Great, I'll go grab it now.

thanks,

greg k-h
