public inbox for linux-kernel@vger.kernel.org
* [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
@ 2026-02-12 21:12 root
  2026-02-12 21:19 ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: root @ 2026-02-12 21:12 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: peterz, mingo, linux-kernel, mjfara

To: mathieu.desnoyers@efficios.com
Cc: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org
Subject: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0

Hi Mathieu,

I'm hitting a repeatable page fault in sched_mm_cid_exit() on 6.19.0
when booting with nopti. The crash occurs during process exit
(do_exit -> sched_mm_cid_exit) on an atomic bit-clear (lock btr) of
the CID bitmap. The faulting address is within a 2MB huge page that
returns a permissions violation on supervisor write access.

The bug triggered 8 times over ~20 hours on a single boot, hitting
multiple unrelated processes (git, gce_workload_ce). Eventually D-Bus
died and systemd became non-functional, requiring a hard power-off.

Reproducer:
  Boot 6.19.0 with "nopti" (or "mitigations=off") on the cmdline.
  Run any workload that spawns and exits processes frequently
  (e.g. repeated git operations). Crashes begin within hours.
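
A minimal sketch of a process-churn loop of the kind described above.
This is a hypothetical stand-in for the real workload (repeated git
operations); the function name churn() is made up for illustration, and
this script has not been confirmed to trigger the crash.

```python
import subprocess
import sys


def churn(iterations: int) -> int:
    """Spawn and reap short-lived child processes.

    Returns the number of children that exited with status 0.
    """
    ok = 0
    for _ in range(iterations):
        # Each child starts a fresh interpreter and exits immediately,
        # exercising the process exit path once per iteration.
        result = subprocess.run([sys.executable, "-c", "pass"], check=False)
        if result.returncode == 0:
            ok += 1
    return ok


if __name__ == "__main__":
    print(churn(1000))
```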

Workarounds tested:
  - CONFIG_SCHED_MM_CID=n: eliminates the crash (confirmed)
  - Removing "nopti" from cmdline (PTI enabled): no crash observed
    (other spectre/ibrs params remained disabled during this test)

Environment:
  Kernel:      6.19.0 (built from kernel.org tarball, LOCALVERSION=-gce)
  Config:      CONFIG_SCHED_MM_CID=y, CONFIG_CPU_MITIGATIONS=y (but
               mitigations disabled at boot via cmdline, see below)
  Boot params: nospectre_v1 nospectre_v2 spectre_v2=off
               spec_store_bypass_disable=off nopti noibrs noibpb
               no_stf_barrier mitigations=off
  CPU:         Intel Xeon Platinum 8581C @ 2.30GHz
               (pcid, invpcid, la57, 52-bit physical / 57-bit virtual)
  Platform:    Google Compute Engine VM
               8 vCPUs, 4 cores (HT), 30GB RAM, single NUMA node
  Preemption:  PREEMPT_NONE, HZ_250
  Modules:     all unsigned (E taint) — stock: xfs, nft_compat,
               aesni_intel, virtio_rng, ip_tables, etc.

  Full .config available on request.

All 6 captured crashes (#3-#8) share identical characteristics:
  - RIP: sched_mm_cid_exit+0xe2/0x200
  - Fault type: supervisor write, permissions violation (error 0x0003)
  - PMD: 2MB huge page mapping (PS bit 7 set), read-only (R/W bit 1
    clear), accessed, global, NX
  - Affected process exits with irqs disabled and preempt_count 1
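
For reference, a small sketch of how the error code 0x0003 breaks down.
It assumes the standard x86 page-fault error code layout (bit 0 =
protection violation, bit 1 = write, bit 2 = user mode); the helper name
decode_pf_error() is hypothetical.

```python
def decode_pf_error(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: set when the page was present (protection violation),
        # clear when the fault was caused by a non-present page
        "protection_violation": bool(code & 0x1),
        # bit 1: set for a write access, clear for a read
        "write": bool(code & 0x2),
        # bit 2: set when the access came from user mode
        "user_mode": bool(code & 0x4),
    }


if __name__ == "__main__":
    print(decode_pf_error(0x0003))
```

For 0x0003 this reports a write to a present page from supervisor mode,
which matches the "#PF: supervisor write access" and "permissions
violation" lines in the oops.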

Crash summary (oops #1-#2 occurred early in the boot but were not
captured by journald; the D taint on #3 confirms prior oopses):

  Oops  Time     Process                  CPU  Fault address         PMD
  #3    16:21:40 git[872493]              0    ff12983438c79bd0      80000001f8c001a1
  #4    16:24:26 git[874084]              3    ff1298343a86c5d0      80000001fa8001a1
  #5    16:33:52 git[882005]              0    ff1298343a86b0d0      80000001fa8001a1
  #6    16:49:41 gce_workload_ce[919343]  4    ff1298343a868dd0      80000001fa8001a1
  #7    18:54:23 git[982223]              1    ff1298343a86e8d0      80000001fa8001a1
  #8    19:27:13 git[1010036]             6    ff1298343a86bed0      80000001fa8001a1

Observations:
  - Oopses #4-#8 all fault within the same PMD (80000001fa8001a1) at
    different offsets, suggesting the 2MB huge page mapping for the CID
    bitmap has incorrect write permissions, rather than individual
    bitmap entries being corrupted.
  - Oops #3 uses a different PMD (80000001f8c001a1), indicating the
    issue can affect multiple huge page mappings.
  - Crashes occur across CPUs 0, 1, 3, 4, and 6, so the fault is not
    tied to a specific core.
  - Not process-specific: both git (via exit(2) syscall) and
    gce_workload_ce (via signal delivery -> do_group_exit) trigger it.
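
The two PMD values in the table can be decoded with a short sketch like
the one below. It assumes the standard x86-64 page-table entry bit layout
(P=0, R/W=1, U/S=2, A=5, D=6, PS=7, G=8, NX=63) and a 52-bit physical
address width as listed under Environment; the helper names are
hypothetical.

```python
# Flag bit positions in an x86-64 page-table entry.
PTE_BITS = {
    "present": 0,
    "writable": 1,
    "user": 2,
    "accessed": 5,
    "dirty": 6,
    "huge": 7,        # PS: entry maps a 2MB page at PMD level
    "global": 8,
    "no_execute": 63,
}


def decode_pmd(entry: int) -> dict:
    """Return the state of each flag bit in a PMD entry."""
    return {name: bool((entry >> bit) & 1) for name, bit in PTE_BITS.items()}


def pmd_phys_base(entry: int) -> int:
    """Physical base of a 2MB mapping: address bits 51:21 of the entry."""
    mask = ((1 << 52) - 1) & ~((1 << 21) - 1)
    return entry & mask


if __name__ == "__main__":
    for pmd in (0x80000001F8C001A1, 0x80000001FA8001A1):
        print(hex(pmd), hex(pmd_phys_base(pmd)), decode_pmd(pmd))
```

Both entries decode to the same flag set; only the physical base
differs between them.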

Full oops from first captured occurrence (#3):

BUG: unable to handle page fault for address: ff12983438c79bd0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 1fba01067 P4D 1fba02067 PUD 1002a0063 PMD 80000001f8c001a1
Oops: Oops: 0003 [#3] SMP NOPTI
CPU: 0 UID: 0 PID: 872493 Comm: git Tainted: G      D     E       6.19.0-gce #1 PREEMPT(none)
Tainted: [D]=DIE, [E]=UNSIGNED_MODULE
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/24/2025
RIP: 0010:sched_mm_cid_exit+0xe2/0x200
Code: 48 03 05 29 54 44 02 8b 10 81 e2 ff ff ff bf 89 10 8b 05 51 0b fa 01 83 c0 3f c1 e8 03 25 f8 ff ff 1f 48 8d 84 43 c0 06 00 00 <f0> 48 0f b3 10 48 81 f9 ff ef ff ff 77 0c 48 8d bb 10 01 00 00 e8
RSP: 0018:ff7720274e6a3b58 EFLAGS: 00010002
RAX: ff12983434c79bd0 RBX: ff12983434c79500 RCX: ff12983434c7960f
RDX: 0000000020000006 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ff7720274e6a3b80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ff129836b560b180 R14: ff129833634d0000 R15: ff129833634d0000
FS:  000077b36aadc6c0(0000) GS:ff12983a5742b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff12983438c79bd0 CR3: 0000000351154002 CR4: 0000000000371ef0
Call Trace:
 <TASK>
 do_exit+0xc3/0xa00
 __x64_sys_exit+0x1b/0x20
 x64_sys_call+0x234d/0x2360
 do_syscall_64+0x7b/0x580
 ? _raw_spin_unlock_irq+0xe/0x50
 ? __x64_sys_rt_sigprocmask+0xdd/0x160
 ? x64_sys_call+0x160f/0x2360
 ? do_syscall_64+0xb4/0x580
 ? __do_sys_newfstatat+0x56/0x90
 ? __x64_sys_newfstatat+0x1c/0x30
 ? x64_sys_call+0x1510/0x2360
 ? do_syscall_64+0xb4/0x580
 ? __x64_sys_newfstatat+0x1c/0x30
 ? x64_sys_call+0x1510/0x2360
 ? do_syscall_64+0xb4/0x580
 ? x64_sys_call+0x1510/0x2360
 ? do_syscall_64+0xb4/0x580
 ? exc_page_fault+0x8b/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
Modules linked in: xt_comment(E) nft_compat(E) tcp_diag(E) inet_diag(E)
 8021q(E) garp(E) mrp(E) nls_iso8859_1(E) xfs(E) intel_rapl_msr(E)
 intel_rapl_common(E) intel_uncore_frequency_common(E)
 ghash_clmulni_intel(E) aesni_intel(E) rapl(E) pvpanic_mmio(E)
 pvpanic(E) mac_hid(E) efi_pstore(E) virtio_rng(E) ip_tables(E)
---[ end trace 0000000000000000 ]---
note: git[872493] exited with irqs disabled
note: git[872493] exited with preempt_count 1

Thanks,

* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
@ 2026-02-12 21:13 mjfara
  0 siblings, 0 replies; 8+ messages in thread
From: mjfara @ 2026-02-12 21:13 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: peterz, mingo, linux-kernel, mjfara

Apologies, the previous message was sent from a local root account.
My contact email is mjfara@gmail.com — please direct any replies there.

Thanks,
Michael Faraci

* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
@ 2026-02-12 21:14 mjfara
  0 siblings, 0 replies; 8+ messages in thread
From: mjfara @ 2026-02-12 21:14 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: peterz, mingo, linux-kernel, mjfara

Apologies for the earlier message from a local root account. My contact
email is mjfara@gmail.com — please direct any replies there.

For transparency: the bug analysis and this report were drafted with
the assistance of Claude (AI), working from real crash data on my
server. The oops traces, kernel config, and test results are all from
my environment and accurately represented.

Thanks,
Mike Fara
mjfara@gmail.com

* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
@ 2026-02-12 21:33 Mike Fara
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Fara @ 2026-02-12 21:33 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: peterz, mingo, linux-kernel, mjfara

Hi Mathieu,

Thanks for the quick response. Commit 1e83ccd5921a looks correct for
our issue — the symptoms match exactly. The lock btr was hitting an
out-of-bounds bit number from an unvalidated TRANSIT state.

I've cherry-picked the patch onto 6.19.0 with CONFIG_SCHED_MM_CID=y
and nopti still on the cmdline. Build is running now. Will report back
with results after reboot and soak testing.

Thanks,
Mike Fara
mjfara@gmail.com

* Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0
@ 2026-02-12 22:28 Mike Fara
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Fara @ 2026-02-12 22:28 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: peterz, mingo, linux-kernel, mjfara

To: mathieu.desnoyers@efficios.com
Cc: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org, mjfara@gmail.com
Subject: Re: [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0

Hi Mathieu,

Confirmed. Rebuilt 6.19.0 with commit 1e83ccd5921a cherry-picked,
CONFIG_SCHED_MM_CID=y, and nopti still on the cmdline. Clean boot,
no oopses.

Verified the fix is compiled in by disassembling sched_mm_cid_exit
from the running kernel. The inlined mm_drop_cid_on_cpu() now has
the cid_on_cpu() guard before the lock btr:

  mm_drop_cid_on_cpu (inlined at sched_mm_cid_exit+0xc4):

    mov    (%rcx),%eax           # Load pcp->cid
    test   $0x40000000,%eax      # Test ONCPU bit (bit 30)
    je     <skip>                # Not CPU-owned? Skip drop entirely
    and    $0xbfffffff,%eax      # Clear ONCPU: cpu_cid_to_cid()
    mov    %eax,(%rcx)           # Store back
    ...
    lock btr %rax,(%rcx)         # mm_drop_cid (bitmap clear)
  <skip>:
    ...                          # Continue safely

Without the fix, the code would fall through to lock btr with a
garbage bit number derived from the TRANSIT flag (bit 29), causing
the out-of-bounds write we reported.
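
The arithmetic can be checked against the register dump from oops #3
(RAX, RDX, CR2). The sketch below assumes the documented x86 behaviour
of btr with a memory operand and a register bit offset, where the CPU
accesses the 64-bit word at base + 8 * (offset / 64). The constant and
function names are hypothetical; the bit positions are the ones quoted
above.

```python
MM_CID_TRANSIT = 1 << 29   # TRANSIT flag (bit 29), name is illustrative
MM_CID_ONCPU = 1 << 30     # ONCPU flag (bit 30), name is illustrative


def btr_target_address(base: int, bit_offset: int, operand_bits: int = 64) -> int:
    """Address of the word touched by 'btr %reg,(base)'.

    Only non-negative bit offsets are handled here.
    """
    return base + (bit_offset // operand_bits) * (operand_bits // 8)


# Values taken from the oops #3 register dump.
rax = 0xFF12983434C79BD0   # bitmap base passed to lock btr
rdx = 0x20000006           # bit number: TRANSIT flag still set, low bits = 6
cr2 = 0xFF12983438C79BD0   # faulting address

if __name__ == "__main__":
    print(hex(btr_target_address(rax, rdx)))
    print(hex(btr_target_address(rax, rdx) - rax))
    print(rdx & ~MM_CID_TRANSIT)
```

With bit 29 left in the bit number, the write lands 0x4000000 bytes
(64 MiB) past the bitmap base, which is exactly the CR2 value in the
oops. With the flag masked off, the bit number is 6.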

System info:

  # uname -a
  6.19.0-gce #2 SMP PREEMPT_DYNAMIC Thu Feb 12 21:42:52 UTC 2026 x86_64

  # grep SCHED_MM_CID /boot/config-$(uname -r)
  CONFIG_SCHED_MM_CID=y

  # Boot cmdline includes: nopti mitigations=off

  # dmesg | grep -i 'BUG\|oops\|sched_mm\|page.fault'
  (clean - no errors)

Will continue soak testing and report back if anything surfaces.

Tested-by: Mike Fara <mjfara@gmail.com>

Thanks,
Mike Fara
mjfara@gmail.com


end of thread, other threads:[~2026-02-13 11:16 UTC | newest]

Thread overview: 8+ messages
2026-02-12 21:12 [BUG] sched_mm_cid_exit+0xe2: page fault on CID bitmap write with nopti on 6.19.0 root
2026-02-12 21:19 ` Mathieu Desnoyers
2026-02-12 23:21   ` Thomas Gleixner
2026-02-13 11:16     ` Greg Kroah-Hartman
  -- strict thread matches above, loose matches on Subject: below --
2026-02-12 21:13 mjfara
2026-02-12 21:14 mjfara
2026-02-12 21:33 Mike Fara
2026-02-12 22:28 Mike Fara
