public inbox for rcu@vger.kernel.org
 help / color / mirror / Atom feed
* Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
@ 2026-02-06 11:54 Matthieu Baerts
  2026-02-06 16:38 ` Stefano Garzarella
  2026-02-26 10:37 ` Jiri Slaby
  0 siblings, 2 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-02-06 11:54 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella
  Cc: kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki,
	Paul E. McKenney

Hi Stefan, Stefano, + VM, RCU, sched people,

First, I'm sorry to cc a few MLs, but I'm still trying to locate the
origin of the issue I'm seeing.

Our CI for the MPTCP subsystem is now regularly hitting various stalls
before even starting the MPTCP test suite. These issues are visible on
top of the latest net and net-next trees, which were synced with
Linus' tree yesterday. All these issues have been seen on a "public CI"
using GitHub-hosted runners with KVM support, where the tested kernel is
launched in a (presumably nested) VM. I can see the issue with or
without debug.config. According to the logs, it might have started
around v6.19-rc0, but I was unavailable for a few weeks and couldn't
react quicker, sorry for that. Unfortunately, I cannot reproduce this
locally, and the CI cannot currently run bisections.

The stalls happen before starting the MPTCP test suite. The init program
creates a VSOCK listening socket via socat [1], and different hangs are
then visible: RCU stalls followed by a soft lockup [2], only a soft
lockup [3], sometimes the soft lockup comes with a delay [4] [5], or no
RCU stalls or soft lockups are detected after one minute, but the VM is
stalled [6]. In the last case, the VM was stopped after launching GDB to
get more details about what was being executed.

It feels like the issue is not directly caused by the VSOCK listening
socket itself, but the stalls always happen after the socat command [1]
has been started in the background.

One last thing: I thought my issue was linked to another one seen on the
XFS side and reported by Shinichiro Kawasaki [7], but apparently not.
Indeed, Paul McKenney mentioned that Shinichiro's issue is probably
fixed by Thomas Gleixner's series "sched/mmcid: Cure mode transition
woes" [8]. I applied these patches from Peter Zijlstra's tree, from
tip/sched/urgent [9], and my issue is still present.

Any idea what could cause that, where to look, or what could help to
find the root cause?

Commit info, kernel config, vmlinux, etc. are available on the CI side
on GitHub -- you need to click on the Summary button at the top left --
but I can share them here if needed.

Cheers,
Matt


[1] socat "VSOCK-LISTEN:1024,reuseaddr,fork" \
      "EXEC:\"${vsock_exec}\",pty,stderr,setsid,sigint,sane,echo=0" &

[2] From:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21723325004/job/62658752123#step:7:7288

> [   22.040424] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   22.043079] rcu: 	3-...0: (1 GPs behind) idle=b87c/1/0x4000000000000000 softirq=75/76 fqs=2100
> [   22.043387] rcu: 	(detected by 0, t=21005 jiffies, g=-1019, q=84 ncpus=4)
> [   22.043595] Sending NMI from CPU 0 to CPUs 3:
> [   22.043627] NMI backtrace for cpu 3
> [   22.043632] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.19.0-rc7+ #1 PREEMPT(voluntary) 
> [   22.043635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   22.043637] RIP: 0010:__schedule (include/linux/cpumask.h:1222)
> [   22.043643] Code: 75 b4 e8 0e d1 a7 ff 3b 45 b4 48 8b 7d b8 8b 55 a8 41 89 c4 73 66 89 c0 f0 49 0f ab 86 50 06 00 00 73 31 eb 57 89 55 a8 f3 90 <8b> 35 39 c8 6a 00 48 89 7d b8 89 75 b4 e8 d9 d0 a7 ff 3b 45 b4 48
> All code
> ========
>    0:	75 b4                	jne    0xffffffffffffffb6
>    2:	e8 0e d1 a7 ff       	call   0xffffffffffa7d115
>   31:	83 c1 01             	add    $0x1,%ecx
>   34:	48 63 c1             	movslq %ecx,%rax
>   37:	48 83 f8 3f          	cmp    $0x3f,%rax
>   3b:	76 bc                	jbe    0xfffffffffffffff9
>   3d:	48                   	rex.W
>   3e:	83                   	.byte 0x83
>   3f:	c4                   	.byte 0xc4
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	8b 42 08             	mov    0x8(%rdx),%eax
>    3:	a8 01                	test   $0x1,%al
>    5:	75 f7                	jne    0xfffffffffffffffe
>    7:	83 c1 01             	add    $0x1,%ecx
>    a:	48 63 c1             	movslq %ecx,%rax
>    d:	48 83 f8 3f          	cmp    $0x3f,%rax
>   11:	76 bc                	jbe    0xffffffffffffffcf
>   13:	48                   	rex.W
>   14:	83                   	.byte 0x83
>   15:	c4                   	.byte 0xc4
> [   28.498759] RSP: 0018:ffa0000000397b18 EFLAGS: 00000202
> [   28.498761] RAX: 0000000000000011 RBX: ff1100017acac340 RCX: 0000000000000003
> [   28.498762] RDX: ff1100017adb0aa0 RSI: 0000000000000003 RDI: 00007f27e4acf000
> [   28.498763] RBP: 0000000000000202 R08: ff1100017adb0aa0 R09: 0000000000000003
> [   28.498763] R10: ffffffffffffffff R11: 0000000000000003 R12: 0000000081484d01
> [   28.498764] R13: 0000000000000002 R14: ff1100017ac98000 R15: 0000000000000001
> [   28.498773] FS:  00007f27e50d86c0(0000) GS:ff110001f7d77000(0000) knlGS:0000000000000000
> [   28.498774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.498775] CR2: 00007f27d8000020 CR3: 00000001009ac003 CR4: 0000000000373ef0
> [   28.498776] Call Trace:
> [   28.498817]  <TASK>
> [   28.498818]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
> [   28.498824]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
> [   28.498825]  ? unlink_anon_vmas (mm/rmap.c:438)
> [   28.498829]  on_each_cpu_cond_mask (arch/x86/include/asm/preempt.h:95 (discriminator 1))
> [   28.498830]  flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91)
> [   28.498832]  tlb_flush_mmu_tlbonly (include/asm-generic/tlb.h:407)
> [   28.498835]  tlb_finish_mmu (mm/mmu_gather.c:356)
> [   28.498837]  vms_clear_ptes (mm/vma.c:1279)
> [   28.498839]  vms_complete_munmap_vmas (include/linux/mm.h:2928)
> [   28.498841]  do_vmi_align_munmap (mm/vma.c:1580)
> [   28.498844]  do_vmi_munmap (mm/vma.c:1627)
> [   28.498846]  __vm_munmap (mm/vma.c:3247)
> [   28.498849]  __x64_sys_munmap (mm/mmap.c:1077 (discriminator 1))
> [   28.498850]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
> [   28.498855]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
> [   28.498857] RIP: 0033:0x7f27e538d7bb
> [   28.498875] Code: 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
> All code
> ========
>    0:	73 01                	jae    0x3
>    2:	c3                   	ret
>    3:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>    a:	f7 d8                	neg    %eax
>    c:	64 89 01             	mov    %eax,%fs:(%rcx)
>    f:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   13:	c3                   	ret
>   14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
>   1b:	00 00 00 
>   1e:	90                   	nop
>   1f:	f3 0f 1e fa          	endbr64
>   23:	b8 0b 00 00 00       	mov    $0xb,%eax
>   28:	0f 05                	syscall
>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>   30:	73 01                	jae    0x33
>   32:	c3                   	ret
>   33:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>   3a:	f7 d8                	neg    %eax
>   3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>   3f:	48                   	rex.W
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>    6:	73 01                	jae    0x9
>    8:	c3                   	ret
>    9:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>   10:	f7 d8                	neg    %eax
>   12:	64 89 01             	mov    %eax,%fs:(%rcx)
>   15:	48                   	rex.W
> [   28.498876] RSP: 002b:00007f27e50d77f8 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> [   28.498878] RAX: ffffffffffffffda RBX: 0000000000009000 RCX: 00007f27e538d7bb
> [   28.498878] RDX: 00007f27e53cc280 RSI: 0000000000009000 RDI: 00007f27e4ac7000
> [   28.498879] RBP: 00007f27e50d7a80 R08: 000000000000004d R09: 0000000000000000
> [   28.498880] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f27e4ac7000
> [   28.498880] R13: 00007f27e50d78a0 R14: 0000000000000001 R15: 0000000000000000
> [   28.498881]  </TASK>



[3]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21723325004/job/62658752082#step:7:7609

> [   30.907497][    C1] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [virtme-ng-init:76]
> [   30.907506][    C1] Modules linked in:
> [   30.907510][    C1] irq event stamp: 53188
> [   30.907512][    C1] hardirqs last  enabled at (53187): irqentry_exit (kernel/entry/common.c:220)
> [   30.907521][    C1] hardirqs last disabled at (53188): sysvec_apic_timer_interrupt (arch/x86/include/asm/hardirq.h:78)
> [   30.907526][    C1] softirqs last  enabled at (52956): handle_softirqs (kernel/softirq.c:469 (discriminator 2))
> [   30.907531][    C1] softirqs last disabled at (52951): __irq_exit_rcu (kernel/softirq.c:657)
> [   30.907537][    C1] CPU: 1 UID: 0 PID: 76 Comm: virtme-ng-init Not tainted 6.19.0-rc7+ #1 PREEMPT(full) 
> [   30.907541][    C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   30.907544][    C1] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:351 (discriminator 5))
> [   30.907550][    C1] Code: cf 07 00 00 8b 43 08 a8 01 74 38 48 b8 00 00 00 00 00 fc ff df 49 89 f4 48 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 9c 08 00 00 8b 43 08 a8
> All code
> ========
>    0:	cf                   	iret
>    1:	07                   	(bad)
>    2:	00 00                	add    %al,(%rax)
>    4:	8b 43 08             	mov    0x8(%rbx),%eax
>    7:	a8 01                	test   $0x1,%al
>    9:	74 38                	je     0x43
>    b:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
>   12:	fc ff df 
>   15:	49 89 f4             	mov    %rsi,%r12
>   18:	48 89 f5             	mov    %rsi,%rbp
>   1b:	49 c1 ec 03          	shr    $0x3,%r12
>   1f:	83 e5 07             	and    $0x7,%ebp
>   22:	49 01 c4             	add    %rax,%r12
>   25:	83 c5 03             	add    $0x3,%ebp
>   28:	f3 90                	pause
>   2a:*	41 0f b6 04 24       	movzbl (%r12),%eax		<-- trapping instruction
>   2f:	40 38 c5             	cmp    %al,%bpl
>   32:	7c 08                	jl     0x3c
>   34:	84 c0                	test   %al,%al
>   36:	0f 85 9c 08 00 00    	jne    0x8d8
>   3c:	8b 43 08             	mov    0x8(%rbx),%eax
>   3f:	a8                   	.byte 0xa8
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	41 0f b6 04 24       	movzbl (%r12),%eax
>    5:	40 38 c5             	cmp    %al,%bpl
>    8:	7c 08                	jl     0x12
>    a:	84 c0                	test   %al,%al
>    c:	0f 85 9c 08 00 00    	jne    0x8ae
>   12:	8b 43 08             	mov    0x8(%rbx),%eax
>   15:	a8                   	.byte 0xa8
> [   30.907553][    C1] RSP: 0018:ffffc9000101f6a0 EFLAGS: 00000202
> [   30.907555][    C1] RAX: 0000000000000011 RBX: ffff888152040c00 RCX: 0000000000000000
> [   30.907557][    C1] RDX: ffff8881520ba948 RSI: ffff888152040c08 RDI: 0000000000000000
> [   30.907559][    C1] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
> [   30.907560][    C1] R10: 0000000000000001 R11: 00007f21a6200000 R12: ffffed102a408181
> [   30.907561][    C1] R13: ffff8881520ba940 R14: ffffed102a417529 R15: 0000000000000001
> [   30.907573][    C1] FS:  00007f21a69186c0(0000) GS:ffff8881cc22e000(0000) knlGS:0000000000000000
> [   30.907585][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   30.907587][    C1] CR2: 00007f2198000020 CR3: 0000000107149002 CR4: 0000000000370ef0
> [   30.907591][    C1] Call Trace:
> [   30.907596][    C1]  <TASK>
> [   30.907603][    C1]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
> [   30.907612][    C1]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
> [   30.907626][    C1]  ? kasan_quarantine_put (arch/x86/include/asm/irqflags.h:26)
> [   30.907637][    C1]  ? __pfx_smp_call_function_many_cond (kernel/smp.c:784)
> [   30.907646][    C1]  ? kmem_cache_free (mm/slub.c:6674 (discriminator 3))
> [   30.907656][    C1]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
> [   30.907660][    C1]  on_each_cpu_cond_mask (kernel/smp.c:1044)
> [   30.907664][    C1]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
> [   30.907669][    C1]  kvm_flush_tlb_multi (arch/x86/kernel/kvm.c:666)
> [   30.907675][    C1]  ? __pfx_kvm_flush_tlb_multi (arch/x86/kernel/kvm.c:666)
> [   30.907679][    C1]  ? get_flush_tlb_info (arch/x86/mm/tlb.c:1434 (discriminator 1))
> [   30.907686][    C1]  flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91)
> [   30.907690][    C1]  ? rcu_read_lock_any_held (kernel/rcu/update.c:386 (discriminator 1))
> [   30.907695][    C1]  ? __pfx_flush_tlb_mm_range (arch/x86/mm/tlb.c:1452)
> [   30.907703][    C1]  tlb_flush_mmu_tlbonly (include/asm-generic/tlb.h:407)
> [   30.907712][    C1]  tlb_finish_mmu (mm/mmu_gather.c:356)
> [   30.907718][    C1]  vms_clear_ptes (mm/vma.c:1279)
> [   30.907724][    C1]  ? vms_complete_munmap_vmas (include/linux/mmap_lock.h:386)
> [   30.907728][    C1]  ? __pfx_vms_clear_ptes (mm/vma.c:1258)
> [   30.907738][    C1]  ? __pfx_mas_store_gfp (lib/maple_tree.c:5119)
> [   30.907747][    C1]  vms_complete_munmap_vmas (include/linux/mm.h:2928)
> [   30.907750][    C1]  ? vms_gather_munmap_vmas (mm/vma.c:1495)
> [   30.907776][    C1]  do_vmi_align_munmap (mm/vma.c:1580)
> [   30.907780][    C1]  ? lock_acquire.part.0 (kernel/locking/lockdep.c:470)
> [   30.907784][    C1]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
> [   30.907789][    C1]  ? __pfx_do_vmi_align_munmap (mm/vma.c:1561)
> [   30.907792][    C1]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5536)
> [   30.907800][    C1]  ? put_pid.part.0 (arch/x86/include/asm/atomic.h:93 (discriminator 4))
> [   30.907826][    C1]  do_vmi_munmap (mm/vma.c:1627)
> [   30.907832][    C1]  __vm_munmap (mm/vma.c:3247)
> [   30.907837][    C1]  ? __pfx___vm_munmap (mm/vma.c:3238)
> [   30.907841][    C1]  ? _copy_to_user (arch/x86/include/asm/uaccess_64.h:121)
> [   30.907858][    C1]  __x64_sys_munmap (mm/mmap.c:1077 (discriminator 1))
> [   30.907861][    C1]  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4473)
> [   30.907863][    C1]  ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42)
> [   30.907866][    C1]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
> [   30.907871][    C1]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
> [   30.907875][    C1] RIP: 0033:0x7f21a6bc47bb
> [   30.907880][    C1] Code: 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
> All code
> ========
>    0:	73 01                	jae    0x3
>    2:	c3                   	ret
>    3:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>    a:	f7 d8                	neg    %eax
>    c:	64 89 01             	mov    %eax,%fs:(%rcx)
>    f:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   13:	c3                   	ret
>   14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
>   1b:	00 00 00 
>   1e:	90                   	nop
>   1f:	f3 0f 1e fa          	endbr64
>   23:	b8 0b 00 00 00       	mov    $0xb,%eax
>   28:	0f 05                	syscall
>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>   30:	73 01                	jae    0x33
>   32:	c3                   	ret
>   33:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>   3a:	f7 d8                	neg    %eax
>   3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>   3f:	48                   	rex.W
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>    6:	73 01                	jae    0x9
>    8:	c3                   	ret
>    9:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>   10:	f7 d8                	neg    %eax
>   12:	64 89 01             	mov    %eax,%fs:(%rcx)
>   15:	48                   	rex.W
> [   30.907882][    C1] RSP: 002b:00007f21a69177f8 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> [   30.907884][    C1] RAX: ffffffffffffffda RBX: 0000000000009000 RCX: 00007f21a6bc47bb
> [   30.907886][    C1] RDX: 00007f21a6c03280 RSI: 0000000000009000 RDI: 00007f21a62fe000
> [   30.907887][    C1] RBP: 00007f21a6917a80 R08: 0000000000000050 R09: 0000000000000000
> [   30.907889][    C1] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f21a62fe000
> [   30.907890][    C1] R13: 00007f21a69178a0 R14: 0000000000000001 R15: 0000000000000000
> [   30.907902][    C1]  </TASK>



[4]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741113372/job/62716612654#step:7:12820
[5]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741112047/job/62716608856#step:7:14820

[6]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741112047/job/62716608836#step:7:4811


# l

> virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
> 106			goto __retry;
> 101	 __retry:
> 102		val = atomic_read(&lock->val);
> 103	
> 104		if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
> 105			cpu_relax();
> 106			goto __retry;
> 107		}
> 108	
> 109		return true;
> 110	}


# bt full

> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>         val = <optimized out>
> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>         prev = <optimized out>
>         next = 0x1
>         node = <optimized out>
>         old = <optimized out>
>         tail = <optimized out>
>         idx = <optimized out>
>         locked = <optimized out>
>         __vpp_verify = <optimized out>
>         __vpp_verify = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>         lock = <optimized out>
> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
> No locals.
> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>         flags = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
> No locals.
> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>         ld_moved = 0x1
>         cur_ld_moved = <optimized out>
>         active_balance = <optimized out>
>         sd_parent = <optimized out>
>         group = <optimized out>
>         busiest = <optimized out>
>         rf = <optimized out>
>         cpus = <optimized out>
>         env = {sd = 0xff1100010020b400, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x0, dst_rq = 0xff1100017ac2b300, dst_grpmask = 0xff110001001e4930, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ac183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>         need_unlock = <optimized out>
>         redo = <optimized out>
>         more_balance = <optimized out>
>         __vpp_verify = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __vpp_verify = <optimized out>
> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>         weight = <optimized out>
>         domain_cost = 0xff1100017acab300
>         next_balance = <optimized out>
>         this_cpu = 0x20b400
>         continue_balancing = 0x1
>         t0 = <optimized out>
>         t1 = <optimized out>
>         curr_cost = 0x0
>         sd = 0xff1100010020b400
>         pulled_task = 0x1
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #8  pick_next_task_fair (rq=0xff1100010020b400, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>         se = 0xfffb6e65
>         p = <optimized out>
>         new_tasks = <optimized out>
>         again = <optimized out>
>         idle = <optimized out>
>         simple = <optimized out>
> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>         class = 0xff1100010224aa00
>         p = 0xffffffff824b34a8 <fair_sched_class>
>         restart = <optimized out>
> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
> No locals.
> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>         prev = 0xff1100010146c380
>         next = 0xffffffff824b34a8 <fair_sched_class>
>         preempt = 0x1
>         is_switch = 0x1
>         switch_count = <optimized out>
>         prev_state = <optimized out>
>         rf = <incomplete type>
>         rq = <optimized out>
>         cpu = <optimized out>
>         keep_resched = <optimized out>
>         __vpp_verify = <optimized out>
> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
> No locals.
> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>         tsk = <optimized out>
> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000100910160) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>         __int = <optimized out>
>         __out = <optimized out>
>         __wq_entry = <incomplete type>
>         __ret = <optimized out>
>         __ret = <optimized out>
>         fc = 0xff110001023e6800
>         fiq = <optimized out>
>         err = <optimized out>
> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>         fiq = 0x0
> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>         fc = 0xff110001023e6800
>         req = 0xff11000100910160
>         ret = 0xff110001023e6800
> #17 0xffffffff817a47d9 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
> No locals.
> #18 fuse_lookup_name (sb=0xff1100017acab300, nodeid=0x1, name=0x1, outarg=0xff1100017acab300, inode=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:574
>         fm = <optimized out>
>         args = <incomplete type>
>         forget = <optimized out>
>         attr_version = <optimized out>
>         evict_ctr = <optimized out>
>         err = 0x411620
> #19 0xffffffff817a49c9 in fuse_lookup (dir=0xff11000100606a00, entry=0xff11000100411600, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1062
>         outarg = <incomplete type>
>         fc = <optimized out>
>         inode = 0x0
>         newent = 0xffa00000003afaf0
>         err = <optimized out>
>         epoch = <optimized out>
>         outarg_valid = 0x0
>         locked = <optimized out>
>         out_iput = <optimized out>
> #20 0xffffffff816c9e63 in __lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1866
>         dentry = 0xff11000100411600
>         old = <optimized out>
>         inode = 0xff11000100606a00
>         wq = <incomplete type>
> #21 0xffffffff816c9f69 in lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1883
>         inode = <optimized out>
>         res = <optimized out>
> #22 0xffffffff816cddd8 in walk_component (nd=<optimized out>, flags=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2229
>         dentry = 0x1
> #23 lookup_last (nd=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2730
> No locals.
> #24 path_lookupat (nd=0xff1100017acab300, flags=0x1, path=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2754
>         s = 0x1 <error: Cannot access memory at address 0x1>
>         err = <optimized out>
> #25 0xffffffff816cfee0 in filename_lookup (dfd=0x7acab300, name=0x1, flags=0x1, path=0xff1100017acab300, root=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2783
>         retval = 0x1
>         nd = {path = <incomplete type>, last = {{{hash = 0x314ef79d, len = 0x4}, hash_len = 0x4314ef79d}, name = 0xff110001009f1029 "dpkg"}, root = <incomplete type>, inode = 0xff11000100606a00, flags = 0x5, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x0, total_link_count = 0x0, stack = 0xffa00000003afcd8, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff110001009f1000, pathname = 0xff110001009f1020 "/var/lib/dpkg", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
> #26 0xffffffff816c1a8c in vfs_statx (dfd=0x7acab300, filename=0x1, flags=0x1, stat=0xff1100017acab300, request_mask=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:353
>         path = <incomplete type>
>         lookup_flags = 0x5
>         error = 0xffffff9c
> #27 0xffffffff816c2863 in do_statx (dfd=0x7acab300, filename=0x1, flags=0x1, mask=0x7acab300, buffer=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:769
>         stat = {result_mask = 0x0, mode = 0x0, nlink = 0x0, blksize = 0x0, attributes = 0x0, attributes_mask = 0x0, ino = 0x0, dev = 0x0, rdev = 0x0, uid = <incomplete type>, gid = <incomplete type>, size = 0x0, atime = <incomplete type>, mtime = <incomplete type>, ctime = <incomplete type>, btime = <incomplete type>, blocks = 0x0, mnt_id = 0x0, change_cookie = 0x0, subvol = 0x0, dio_mem_align = 0x0, dio_offset_align = 0x0, dio_read_offset_align = 0x0, atomic_write_unit_min = 0x0, atomic_write_unit_max = 0x0, atomic_write_unit_max_opt = 0x0, atomic_write_segments_max = 0x0}
>         error = 0x1
> #28 0xffffffff816c2ab0 in __do_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:823
>         ret = 0xffffff9c
>         name = <error reading variable name (Cannot access memory at address 0x0)>
> #29 __se_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>         ret = <optimized out>
> #30 __x64_sys_statx (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
> No locals.
> #31 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>         unr = <optimized out>
> #32 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
> No locals.
> #33 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
> No locals.
> #34 0x000000000000000e in ?? ()
> No symbol table info available.
> #35 0x0000000000000001 in ?? ()
> No symbol table info available.
> #36 0x00007fff8cbcae40 in ?? ()
> No symbol table info available.
> #37 0x00007fbb012d6530 in ?? ()
> No symbol table info available.
> #38 0x00007fbb00d14c70 in ?? ()
> No symbol table info available.
> #39 0x00007fbb00d14e00 in ?? ()
> No symbol table info available.
> #40 0x0000000000000246 in ?? ()
> No symbol table info available.
> #41 0x0000000000000fff in ?? ()
> No symbol table info available.
> #42 0x0000000000000000 in ?? ()
> No symbol table info available.


# info frame ; info registers

> Stack level 0, frame at 0xffa00000003af740:
>  rip = 0xffffffff81e1c641 in virt_spin_lock (/home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106); saved rip = 0xffffffff813de445
>  inlined into frame 1
>  source language c.
>  Arglist at unknown address.
>  Locals at unknown address, Previous frame's sp in rsp
> rax            0x1                 0x1
> rbx            0xff1100017acab300  0xff1100017acab300
> rcx            0xff1100017acab300  0xff1100017acab300
> rdx            0x1                 0x1
> rsi            0x1                 0x1
> rdi            0xff1100017acab300  0xff1100017acab300
> rbp            0x2                 0x2
> rsp            0xffa00000003af738  0xffa00000003af738
> r8             0x0                 0x0
> r9             0x400               0x400
> r10            0x0                 0x0
> r11            0x2                 0x2
> r12            0x1                 0x1
> r13            0xff110001001e48c0  0xff110001001e48c0
> r14            0xffa00000003af810  0xffa00000003af810
> r15            0xff110001001e4940  0xff110001001e4940
> rip            0xffffffff81e1c641  0xffffffff81e1c641 <queued_spin_lock_slowpath+305>
> eflags         0x2                 [ IOPL=0 ]
> cs             0x10                0x10
> ss             0x18                0x18
> ds             0x0                 0x0
> es             0x0                 0x0
> fs             0x0                 0x0
> gs             0x0                 0x0
> fs_base        0x7fbb00d156c0      0x7fbb00d156c0
> gs_base        0xff110001f7cf7000  0xff110001f7cf7000
> k_gs_base      0x0                 0x0
> cr0            0x80050033          [ PG AM WP NE ET MP PE ]
> cr2            0x7fbaf8001118      0x7fbaf8001118
> cr3            0x1022df003         [ PDBR=1057503 PCID=3 ]
> cr4            0x373ef0            [ SMAP SMEP OSXSAVE PCIDE FSGSBASE VMXE LA57 UMIP OSXMMEXCPT OSFXSR PGE MCE PAE PSE ]
> cr8            0x1                 0x1
> efer           0xd01               [ NXE LMA LME SCE ]
> xmm0           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm1           {v4_float = {0x4, 0x0, 0x2c, 0x0}, v2_double = {0x4, 0x2c}, v16_int8 = {0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x4, 0x0, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0}, v4_int32 = {0x4, 0x0, 0x2c, 0x0}, v2_int64 = {0x4, 0x2c}, uint128 = 0x2c0000000000000004}
> xmm2           {v4_float = {0xf8000fb0, 0x7fba, 0x2c, 0x0}, v2_double = {0x7fbaf8000fb0, 0x2c}, v16_int8 = {0xb0, 0xf, 0x0, 0xf8, 0xba, 0x7f, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xfb0, 0xf800, 0x7fba, 0x0, 0x2c, 0x0, 0x0, 0x0}, v4_int32 = {0xf8000fb0, 0x7fba, 0x2c, 0x0}, v2_int64 = {0x7fbaf8000fb0, 0x2c}, uint128 = 0x2c00007fbaf8000fb0}
> xmm3           {v4_float = {0x59ff1020, 0x5555, 0x1e, 0x0}, v2_double = {0x555559ff1020, 0x1e}, v16_int8 = {0x20, 0x10, 0xff, 0x59, 0x55, 0x55, 0x0, 0x0, 0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x1020, 0x59ff, 0x5555, 0x0, 0x1e, 0x0, 0x0, 0x0}, v4_int32 = {0x59ff1020, 0x5555, 0x1e, 0x0}, v2_int64 = {0x555559ff1020, 0x1e}, uint128 = 0x1e0000555559ff1020}
> xmm4           {v4_float = {0xf8000090, 0x7fba, 0x0, 0x0}, v2_double = {0x7fbaf8000090, 0x0}, v16_int8 = {0x90, 0x0, 0x0, 0xf8, 0xba, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x90, 0xf800, 0x7fba, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0xf8000090, 0x7fba, 0x0, 0x0}, v2_int64 = {0x7fbaf8000090, 0x0}, uint128 = 0x7fbaf8000090}
> xmm5           {v4_float = {0xff0000, 0x0, 0xff0000, 0x0}, v2_double = {0xff0000, 0xff0000}, v16_int8 = {0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0xff, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0}, v4_int32 = {0xff0000, 0x0, 0xff0000, 0x0}, v2_int64 = {0xff0000, 0xff0000}, uint128 = 0xff00000000000000ff0000}
> xmm6           {v4_float = {0xff0000, 0x0, 0x0, 0x0}, v2_double = {0xff0000, 0x0}, v16_int8 = {0x0, 0x0, 0xff, 0x0 <repeats 13 times>}, v8_int16 = {0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0xff0000, 0x0, 0x0, 0x0}, v2_int64 = {0xff0000, 0x0}, uint128 = 0xff0000}
> xmm7           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm8           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm9           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm10          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm11          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm12          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm13          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm14          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> xmm15          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
> mxcsr          0x1f80              [ IM DM ZM OM UM PM ]

# thread apply all bt full

> Thread 4 (Thread 1.4 (CPU#3 [running])):
> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>         val = <optimized out>
> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>         prev = <optimized out>
>         next = 0x1
>         node = <optimized out>
>         old = <optimized out>
>         tail = <optimized out>
>         idx = <optimized out>
>         locked = <optimized out>
>         __vpp_verify = <optimized out>
>         __vpp_verify = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>         lock = <optimized out>
> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
> No locals.
> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>         flags = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
> No locals.
> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>         ld_moved = 0x1
>         cur_ld_moved = <optimized out>
>         active_balance = <optimized out>
>         sd_parent = <optimized out>
>         group = <optimized out>
>         busiest = <optimized out>
>         rf = <optimized out>
>         cpus = <optimized out>
>         env = {sd = 0xff1100010020ba00, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x3, dst_rq = 0xff1100017adab300, dst_grpmask = 0xff110001001e4ab0, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ad983e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>         need_unlock = <optimized out>
>         redo = <optimized out>
>         more_balance = <optimized out>
>         __vpp_verify = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __vpp_verify = <optimized out>
> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>         weight = <optimized out>
>         domain_cost = 0xff1100017acab300
>         next_balance = <optimized out>
>         this_cpu = 0x20ba00
>         continue_balancing = 0x1
>         t0 = <optimized out>
>         t1 = <optimized out>
>         curr_cost = 0x0
>         sd = 0xff1100010020ba00
>         pulled_task = 0x1
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #8  pick_next_task_fair (rq=0xff1100010020ba00, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>         se = 0xfffb6e63
>         p = <optimized out>
>         new_tasks = <optimized out>
>         again = <optimized out>
>         idle = <optimized out>
>         simple = <optimized out>
> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>         class = 0x32
>         p = 0xffffffff824b34a8 <fair_sched_class>
>         restart = <optimized out>
> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
> No locals.
> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>         prev = 0xff11000100238000
>         next = 0xffffffff824b34a8 <fair_sched_class>
>         preempt = 0x1
>         is_switch = 0x1
>         switch_count = <optimized out>
>         prev_state = <optimized out>
>         rf = <incomplete type>
>         rq = <optimized out>
>         cpu = <optimized out>
>         keep_resched = <optimized out>
>         __vpp_verify = <optimized out>
> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
> No locals.
> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>         tsk = <optimized out>
> #14 0xffffffff814833ba in futex_do_wait (q=0x1, timeout=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:358
> No locals.
> #15 0xffffffff81483b7e in __futex_wait (uaddr=0xff1100017acab300, flags=0x1, val=0x1, to=0xff1100017acab300, bitset=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:687
>         q = {list = <incomplete type>, task = 0xff11000100238000, lock_ptr = 0xff11000100a15684, wake = 0xffffffff81482a80 <futex_wake_mark>, wake_data = 0x0, key = {shared = {i_seq = 0xff11000102bf0000, pgoff = 0x7fbb00f16000, offset = 0x992}, private = {{mm = 0xff11000102bf0000, __tmp = 0xff11000102bf0000}, address = 0x7fbb00f16000, offset = 0x992}, both = {ptr = 0xff11000102bf0000, word = 0x7fbb00f16000, offset = 0x992, node = 0xffffffff}}, pi_state = 0x0, rt_waiter = 0x0, requeue_pi_key = 0x0, bitset = 0xffffffff, requeue_state = <incomplete type>, drop_hb_ref = 0x0}
>         ret = 0x1
> #16 0xffffffff81483c68 in futex_wait (uaddr=0xff1100017acab300, flags=0x1, val=0x1, abs_time=0xff1100017acab300, bitset=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:715
>         timeout = {timer = <incomplete type>, task = 0x0}
>         to = <optimized out>
>         restart = <optimized out>
>         ret = 0xffffffff
> #17 0xffffffff8147f4a5 in do_futex (uaddr=0xff1100017acab300, op=0x1, val=0x1, timeout=0xff1100017acab300, uaddr2=0x0, val2=0x400, val3=0xffffffff) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:130
>         flags = 0x1
>         cmd = <optimized out>
> #18 0xffffffff8147f6ad in __do_sys_futex (uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, utime=<optimized out>, uaddr2=<optimized out>, val3=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:207
>         ret = <optimized out>
>         cmd = <optimized out>
>         t = 0x0
>         tp = 0xff1100017acab300
>         ts = <incomplete type>
> #19 __se_sys_futex (uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, utime=<optimized out>, uaddr2=<optimized out>, val3=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:188
>         ret = <optimized out>
> #20 __x64_sys_futex (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:188
> No locals.
> #21 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>         unr = <optimized out>
> #22 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
> No locals.
> #23 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
> No locals.
> #24 0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> Thread 3 (Thread 1.3 (CPU#2 [running])):
> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>         val = <optimized out>
> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>         prev = <optimized out>
>         next = 0x1
>         node = <optimized out>
>         old = <optimized out>
>         tail = <optimized out>
>         idx = <optimized out>
>         locked = <optimized out>
>         __vpp_verify = <optimized out>
>         __vpp_verify = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>         lock = <optimized out>
> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
> No locals.
> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>         flags = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
> No locals.
> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>         ld_moved = 0x1
>         cur_ld_moved = <optimized out>
>         active_balance = <optimized out>
>         sd_parent = <optimized out>
>         group = <optimized out>
>         busiest = <optimized out>
>         rf = <optimized out>
>         cpus = <optimized out>
>         env = {sd = 0xff1100010020b800, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x2, dst_rq = 0xff1100017ad2b300, dst_grpmask = 0xff110001001e4a30, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ad183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>         need_unlock = <optimized out>
>         redo = <optimized out>
>         more_balance = <optimized out>
>         __vpp_verify = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __vpp_verify = <optimized out>
> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>         weight = <optimized out>
>         domain_cost = 0xff1100017acab300
>         next_balance = <optimized out>
>         this_cpu = 0x20b800
>         continue_balancing = 0x1
>         t0 = <optimized out>
>         t1 = <optimized out>
>         curr_cost = 0x0
>         sd = 0xff1100010020b800
>         pulled_task = 0x1
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #8  pick_next_task_fair (rq=0xff1100010020b800, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>         se = 0xfffb6e64
>         p = <optimized out>
>         new_tasks = <optimized out>
>         again = <optimized out>
>         idle = <optimized out>
>         simple = <optimized out>
> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>         class = 0xffa00000003bfc50
>         p = 0xffffffff824b34a8 <fair_sched_class>
>         restart = <optimized out>
> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
> No locals.
> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>         prev = 0xff11000102369680
>         next = 0xffffffff824b34a8 <fair_sched_class>
>         preempt = 0x1
>         is_switch = 0x1
>         switch_count = <optimized out>
>         prev_state = <optimized out>
>         rf = <incomplete type>
>         rq = <optimized out>
>         cpu = <optimized out>
>         keep_resched = <optimized out>
>         __vpp_verify = <optimized out>
> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
> No locals.
> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>         tsk = <optimized out>
> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000103061000) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>         __int = <optimized out>
>         __out = <optimized out>
>         __wq_entry = <incomplete type>
>         __ret = <optimized out>
>         __ret = <optimized out>
>         fc = 0xff110001023e6800
>         fiq = <optimized out>
>         err = <optimized out>
> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>         fiq = 0x0
> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>         fc = 0xff110001023e6800
>         req = 0xff11000103061000
>         ret = 0xff110001023e6800
> #17 0xffffffff817a27e2 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
> No locals.
> #18 fuse_readlink_folio (inode=0xff11000100614e00, folio=0xffd40000040c1800) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:1834
>         fm = <optimized out>
>         desc = <incomplete type>
>         ap = <incomplete type>
>         link = <optimized out>
>         res = <optimized out>
> #19 0xffffffff817a2943 in fuse_get_link (dentry=0xff1100017acab300, inode=0xff1100017acab300, callback=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:1873
>         fc = <optimized out>
>         folio = <optimized out>
>         err = <optimized out>
> #20 0xffffffff816cc950 in pick_link (nd=0xff1100017acab300, link=0x1, inode=0xff11000100614e00, flags=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2013
>         get = 0xff1100017acab300
>         last = 0xffa00000003bfd88
>         res = 0x1 <error: Cannot access memory at address 0x1>
>         error = <optimized out>
>         all_done = <optimized out>
> #21 0xffffffff816ccb5e in step_into_slowpath (nd=0xff1100017acab300, flags=0x1, dentry=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2074
>         path = <incomplete type>
>         inode = 0x0
>         err = <optimized out>
> #22 0xffffffff816d14f7 in step_into (nd=<optimized out>, flags=<optimized out>, dentry=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2099
> No locals.
> #23 open_last_lookups (nd=<optimized out>, file=<optimized out>, op=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4584
>         delegated_inode = <optimized out>
>         dir = 0xff1100010040ad80
>         open_flag = 0x1
>         got_write = <optimized out>
>         dentry = 0xff110001006af840
>         res = <optimized out>
>         retry = <optimized out>
> #24 path_openat (nd=0xff1100017acab300, op=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4793
>         s = 0xff110001006af840 "\004"
>         file = <optimized out>
>         error = 0x0
> #25 0xffffffff816d2618 in do_filp_open (dfd=0x7acab300, pathname=0x1, op=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4823
>         nd = {path = <incomplete type>, last = {{{hash = 0xf361748d, len = 0xd}, hash_len = 0xdf361748d}, name = 0xff11000102348031 "systemd-udevd"}, root = <incomplete type>, inode = 0xff110001004ac380, flags = 0x10001, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x1, total_link_count = 0x1, stack = 0xffa00000003bfd88, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x2}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff11000102348000, pathname = 0xff11000102348020 "/usr/lib/systemd/systemd-udevd", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
>         flags = 0x1
>         filp = 0x1
> #26 0xffffffff816c3baf in do_open_execat (fd=0x7acab300, name=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:783
>         err = <optimized out>
>         file = <optimized out>
>         open_exec_flags = <incomplete type>
>         __ptr = <optimized out>
>         __val = <optimized out>
> #27 0xffffffff816c3de0 in alloc_bprm (fd=0x7acab300, filename=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1409
>         bprm = <optimized out>
>         file = <optimized out>
>         retval = <optimized out>
> #28 0xffffffff816c48fd in do_execveat_common (fd=0x7acab300, filename=0x1, flags=0x0, envp=..., argv=...) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1810
>         bprm = <optimized out>
>         retval = <optimized out>
> #29 0xffffffff816c5988 in do_execve (filename=<error reading variable: Cannot access memory at address 0x0>, __argv=<optimized out>, __envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1933
>         argv = <optimized out>
>         envp = <optimized out>
> #30 __do_sys_execve (filename=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2009
> No locals.
> #31 __se_sys_execve (filename=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2004
>         ret = <optimized out>
> #32 __x64_sys_execve (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2004
> No locals.
> #33 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>         unr = <optimized out>
> #34 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
> No locals.
> #35 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
> No locals.
> #36 0x0000000000000003 in ?? ()
> No symbol table info available.
> #37 0x00007fbaf4000e50 in ?? ()
> No symbol table info available.
> #38 0x0000555559fefea0 in ?? ()
> No symbol table info available.
> #39 0x00007fbaf4000d90 in ?? ()
> No symbol table info available.
> #40 0x00007fbb0090de60 in ?? ()
> No symbol table info available.
> #41 0x00007fbaf4000cb0 in ?? ()
> No symbol table info available.
> #42 0x0000000000000202 in ?? ()
> No symbol table info available.
> #43 0x0000000000000008 in ?? ()
> No symbol table info available.
> #44 0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> Thread 2 (Thread 1.2 (CPU#1 [running])):
> #0  num_possible_cpus () at /home/runner/work/mptcp_net-next/mptcp_net-next/include/linux/cpumask.h:1222
> No locals.
> #1  mm_get_cid (mm=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3782
>         cid = 0x4
> #2  mm_cid_from_cpu (t=<optimized out>, cpu_cid=0x4, mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3844
>         max_cids = <optimized out>
>         tcid = <optimized out>
>         mm = <optimized out>
> #3  mm_cid_schedin (next=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3900
>         mm = 0xff11000102bf0000
>         cpu_cid = <optimized out>
>         mode = <optimized out>
> #4  mm_cid_switch_to (prev=<optimized out>, next=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3935
> No locals.
> #5  context_switch (rq=<optimized out>, prev=<optimized out>, next=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5249
> No locals.
> #6  __schedule (sched_mode=0x2bf0650) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6867
>         prev = 0xff1100010031da00
>         next = 0xff1100010146da00
>         preempt = 0x4
>         is_switch = 0x0
>         switch_count = <optimized out>
>         prev_state = <optimized out>
>         rf = <incomplete type>
>         rq = <optimized out>
>         cpu = <optimized out>
>         keep_resched = <optimized out>
>         __vpp_verify = <optimized out>
> #7  0xffffffff81e14232 in schedule_idle () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6990
> No locals.
> #8  0xffffffff813f68a9 in cpu_startup_entry (state=46073424) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/idle.c:430
> No locals.
> #9  0xffffffff8135fef4 in start_secondary (unused=0xff11000102bf0650) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/kernel/smpboot.c:312
> No locals.
> #10 0xffffffff8132b266 in secondary_startup_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/kernel/head_64.S:418
> No locals.
> #11 0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> Thread 1 (Thread 1.1 (CPU#0 [running])):
> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>         val = <optimized out>
> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>         prev = <optimized out>
>         next = 0x1
>         node = <optimized out>
>         old = <optimized out>
>         tail = <optimized out>
>         idx = <optimized out>
>         locked = <optimized out>
>         __vpp_verify = <optimized out>
>         __vpp_verify = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
>         pao_ID__ = <optimized out>
>         pao_tmp__ = <optimized out>
>         pto_val__ = <optimized out>
>         pto_tmp__ = <optimized out>
> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>         lock = <optimized out>
> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
> No locals.
> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>         flags = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
> No locals.
> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>         ld_moved = 0x1
>         cur_ld_moved = <optimized out>
>         active_balance = <optimized out>
>         sd_parent = <optimized out>
>         group = <optimized out>
>         busiest = <optimized out>
>         rf = <optimized out>
>         cpus = <optimized out>
>         env = {sd = 0xff1100010020b400, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x0, dst_rq = 0xff1100017ac2b300, dst_grpmask = 0xff110001001e4930, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ac183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>         need_unlock = <optimized out>
>         redo = <optimized out>
>         more_balance = <optimized out>
>         __vpp_verify = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __vpp_verify = <optimized out>
> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>         weight = <optimized out>
>         domain_cost = 0xff1100017acab300
>         next_balance = <optimized out>
>         this_cpu = 0x20b400
>         continue_balancing = 0x1
>         t0 = <optimized out>
>         t1 = <optimized out>
>         curr_cost = 0x0
>         sd = 0xff1100010020b400
>         pulled_task = 0x1
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
>         __dummy = <optimized out>
>         __dummy2 = <optimized out>
> #8  pick_next_task_fair (rq=0xff1100010020b400, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>         se = 0xfffb6e65
>         p = <optimized out>
>         new_tasks = <optimized out>
>         again = <optimized out>
>         idle = <optimized out>
>         simple = <optimized out>
> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>         class = 0xff1100010224aa00
>         p = 0xffffffff824b34a8 <fair_sched_class>
>         restart = <optimized out>
> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
> No locals.
> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>         prev = 0xff1100010146c380
>         next = 0xffffffff824b34a8 <fair_sched_class>
>         preempt = 0x1
>         is_switch = 0x1
>         switch_count = <optimized out>
>         prev_state = <optimized out>
>         rf = <incomplete type>
>         rq = <optimized out>
>         cpu = <optimized out>
>         keep_resched = <optimized out>
>         __vpp_verify = <optimized out>
> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
> No locals.
> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>         tsk = <optimized out>
> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000100910160) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>         __int = <optimized out>
>         __out = <optimized out>
>         __wq_entry = <incomplete type>
>         __ret = <optimized out>
>         __ret = <optimized out>
>         fc = 0xff110001023e6800
>         fiq = <optimized out>
>         err = <optimized out>
> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>         fiq = 0x0
> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>         fc = 0xff110001023e6800
>         req = 0xff11000100910160
>         ret = 0xff110001023e6800
> #17 0xffffffff817a47d9 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
> No locals.
> #18 fuse_lookup_name (sb=0xff1100017acab300, nodeid=0x1, name=0x1, outarg=0xff1100017acab300, inode=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:574
>         fm = <optimized out>
>         args = <incomplete type>
>         forget = <optimized out>
>         attr_version = <optimized out>
>         evict_ctr = <optimized out>
>         err = 0x411620
> #19 0xffffffff817a49c9 in fuse_lookup (dir=0xff11000100606a00, entry=0xff11000100411600, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1062
>         outarg = <incomplete type>
>         fc = <optimized out>
>         inode = 0x0
>         newent = 0xffa00000003afaf0
>         err = <optimized out>
>         epoch = <optimized out>
>         outarg_valid = 0x0
>         locked = <optimized out>
>         out_iput = <optimized out>
> #20 0xffffffff816c9e63 in __lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1866
>         dentry = 0xff11000100411600
>         old = <optimized out>
>         inode = 0xff11000100606a00
>         wq = <incomplete type>
> #21 0xffffffff816c9f69 in lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1883
>         inode = <optimized out>
>         res = <optimized out>
> #22 0xffffffff816cddd8 in walk_component (nd=<optimized out>, flags=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2229
>         dentry = 0x1
> #23 lookup_last (nd=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2730
> No locals.
> #24 path_lookupat (nd=0xff1100017acab300, flags=0x1, path=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2754
>         s = 0x1 <error: Cannot access memory at address 0x1>
>         err = <optimized out>
> #25 0xffffffff816cfee0 in filename_lookup (dfd=0x7acab300, name=0x1, flags=0x1, path=0xff1100017acab300, root=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2783
>         retval = 0x1
>         nd = {path = <incomplete type>, last = {{{hash = 0x314ef79d, len = 0x4}, hash_len = 0x4314ef79d}, name = 0xff110001009f1029 "dpkg"}, root = <incomplete type>, inode = 0xff11000100606a00, flags = 0x5, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x0, total_link_count = 0x0, stack = 0xffa00000003afcd8, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff110001009f1000, pathname = 0xff110001009f1020 "/var/lib/dpkg", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
> #26 0xffffffff816c1a8c in vfs_statx (dfd=0x7acab300, filename=0x1, flags=0x1, stat=0xff1100017acab300, request_mask=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:353
>         path = <incomplete type>
>         lookup_flags = 0x5
>         error = 0xffffff9c
> #27 0xffffffff816c2863 in do_statx (dfd=0x7acab300, filename=0x1, flags=0x1, mask=0x7acab300, buffer=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:769
>         stat = {result_mask = 0x0, mode = 0x0, nlink = 0x0, blksize = 0x0, attributes = 0x0, attributes_mask = 0x0, ino = 0x0, dev = 0x0, rdev = 0x0, uid = <incomplete type>, gid = <incomplete type>, size = 0x0, atime = <incomplete type>, mtime = <incomplete type>, ctime = <incomplete type>, btime = <incomplete type>, blocks = 0x0, mnt_id = 0x0, change_cookie = 0x0, subvol = 0x0, dio_mem_align = 0x0, dio_offset_align = 0x0, dio_read_offset_align = 0x0, atomic_write_unit_min = 0x0, atomic_write_unit_max = 0x0, atomic_write_unit_max_opt = 0x0, atomic_write_segments_max = 0x0}
>         error = 0x1
> #28 0xffffffff816c2ab0 in __do_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:823
>         ret = 0xffffff9c
>         name = <error reading variable name (Cannot access memory at address 0x0)>
> #29 __se_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>         ret = <optimized out>
> #30 __x64_sys_statx (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
> No locals.
> #31 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>         unr = <optimized out>
> #32 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
> No locals.
> #33 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
> No locals.
> #34 0x000000000000000e in ?? ()
> No symbol table info available.
> #35 0x0000000000000001 in ?? ()
> No symbol table info available.
> #36 0x00007fff8cbcae40 in ?? ()
> No symbol table info available.
> #37 0x00007fbb012d6530 in ?? ()
> No symbol table info available.
> #38 0x00007fbb00d14c70 in ?? ()
> No symbol table info available.
> #39 0x00007fbb00d14e00 in ?? ()
> No symbol table info available.
> #40 0x0000000000000246 in ?? ()
> No symbol table info available.
> #41 0x0000000000000fff in ?? ()
> No symbol table info available.
> #42 0x0000000000000000 in ?? ()
> No symbol table info available.


[7] https://lore.kernel.org/aXdO52wh2rqTUi1E@shinmob

[8] https://lore.kernel.org/20260201192234.380608594@kernel.org

[9]
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/urgent
-- 
Sponsored by the NGI0 Core fund.



* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
@ 2026-02-06 16:38 ` Stefano Garzarella
  2026-02-06 17:13   ` Matthieu Baerts
  2026-02-26 10:37 ` Jiri Slaby
  1 sibling, 1 reply; 45+ messages in thread
From: Stefano Garzarella @ 2026-02-06 16:38 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Stefan Hajnoczi, kvm, virtualization, Netdev, rcu, MPTCP Linux,
	Linux Kernel, Peter Zijlstra, Thomas Gleixner,
	Shinichiro Kawasaki, Paul E. McKenney

On Fri, Feb 06, 2026 at 12:54:13PM +0100, Matthieu Baerts wrote:
>Hi Stefan, Stefano, + VM, RCU, sched people,

Hi Matt,

>
>First, I'm sorry to cc a few MLs, but I'm still trying to locate the
>origin of the issue I'm seeing.
>
>Our CI for the MPTCP subsystem is now regularly hitting various stalls
>before even starting the MPTCP test suite. These issues are visible on
>top of the latest net and net-next trees, which have been sync with
>Linus' tree yesterday. All these issues have been seen on a "public CI"
>using GitHub-hosted runners with KVM support, where the tested kernel is
>launched in a nested (I suppose) VM. I can see the issue with or without

Just to be sure I'm on the same page: the issue is in the most deeply 
nested guest, right? (i.e. the last VM started)

>debug.config. According to the logs, it might have started around
>v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>and the CI doesn't currently have the ability to execute bisections.
>
>The stalls happen before starting the MPTCP test suite. The init program
>creates a VSOCK listening socket via socat [1], and different hangs are
>then visible: RCU stalls followed by a soft lockup [2], only a soft
>lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>there is no RCU stalls or soft lockups detected after one minute, but VM
>is stalled [6]. In the last case, the VM is stopped after having
>launched GDB to get more details about what was being executed.
>
>It feels like the issue is not directly caused by the VSOCK listening
>socket, but the stalls always happen after having started the socat
>command [1] in the background.
>
>One last thing: I thought my issue was linked to another one seen on XFS
>side and reported by Shinichiro Kawasaki [7], but apparently not.
>Indeed, Paul McKenney mentioned Shinichiro's issue is probably fixed by
>Thomas Gleixner's series called "sched/mmcid: Cure mode transition woes"
>[8]. I applied these patches from Peter Zijlstra's tree from
>tip/sched/urgent [9], and my issue is still present.
>
>Any idea what could cause that, where to look at, or what could help to
>find the root cause?

Mmm, nothing comes to mind on the vsock side :-(

I understand that bisection can't be done in the CI env, but can you 
confirm in some way that 6.18 works correctly with the same userspace?

That would at least help identify whether anything we merged recently 
in AF_VSOCK could be triggering this.

Thanks,
Stefano

>
>Commit info, kernel config, vmlinux, etc. are available on the CI side
>on GitHub -- you need to click on the Summary button at the top left --
>but I can share them here if needed.
>
>Cheers,
>Matt
>
>
>[1] socat "VSOCK-LISTEN:1024,reuseaddr,fork" \
>      "EXEC:\"${vsock_exec}\",pty,stderr,setsid,sigint,sane,echo=0" &
>
>[2] From:
>https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21723325004/job/62658752123#step:7:7288
>
>> [   22.040424] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [   22.043079] rcu: 	3-...0: (1 GPs behind) idle=b87c/1/0x4000000000000000 softirq=75/76 fqs=2100
>> [   22.043387] rcu: 	(detected by 0, t=21005 jiffies, g=-1019, q=84 ncpus=4)
>> [   22.043595] Sending NMI from CPU 0 to CPUs 3:
>> [   22.043627] NMI backtrace for cpu 3
>> [   22.043632] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.19.0-rc7+ #1 PREEMPT(voluntary)
>> [   22.043635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [   22.043637] RIP: 0010:__schedule (include/linux/cpumask.h:1222)
>> [   22.043643] Code: 75 b4 e8 0e d1 a7 ff 3b 45 b4 48 8b 7d b8 8b 55 a8 41 89 c4 73 66 89 c0 f0 49 0f ab 86 50 06 00 00 73 31 eb 57 89 55 a8 f3 90 <8b> 35 39 c8 6a 00 48 89 7d b8 89 75 b4 e8 d9 d0 a7 ff 3b 45 b4 48
>> All code
>> ========
>>    0:	75 b4                	jne    0xffffffffffffffb6
>>    2:	e8 0e d1 a7 ff       	call   0xffffffffffa7d115
>>   31:	83 c1 01             	add    $0x1,%ecx
>>   34:	48 63 c1             	movslq %ecx,%rax
>>   37:	48 83 f8 3f          	cmp    $0x3f,%rax
>>   3b:	76 bc                	jbe    0xfffffffffffffff9
>>   3d:	48                   	rex.W
>>   3e:	83                   	.byte 0x83
>>   3f:	c4                   	.byte 0xc4
>>
>> Code starting with the faulting instruction
>> ===========================================
>>    0:	8b 42 08             	mov    0x8(%rdx),%eax
>>    3:	a8 01                	test   $0x1,%al
>>    5:	75 f7                	jne    0xfffffffffffffffe
>>    7:	83 c1 01             	add    $0x1,%ecx
>>    a:	48 63 c1             	movslq %ecx,%rax
>>    d:	48 83 f8 3f          	cmp    $0x3f,%rax
>>   11:	76 bc                	jbe    0xffffffffffffffcf
>>   13:	48                   	rex.W
>>   14:	83                   	.byte 0x83
>>   15:	c4                   	.byte 0xc4
>> [   28.498759] RSP: 0018:ffa0000000397b18 EFLAGS: 00000202
>> [   28.498761] RAX: 0000000000000011 RBX: ff1100017acac340 RCX: 0000000000000003
>> [   28.498762] RDX: ff1100017adb0aa0 RSI: 0000000000000003 RDI: 00007f27e4acf000
>> [   28.498763] RBP: 0000000000000202 R08: ff1100017adb0aa0 R09: 0000000000000003
>> [   28.498763] R10: ffffffffffffffff R11: 0000000000000003 R12: 0000000081484d01
>> [   28.498764] R13: 0000000000000002 R14: ff1100017ac98000 R15: 0000000000000001
>> [   28.498773] FS:  00007f27e50d86c0(0000) GS:ff110001f7d77000(0000) knlGS:0000000000000000
>> [   28.498774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   28.498775] CR2: 00007f27d8000020 CR3: 00000001009ac003 CR4: 0000000000373ef0
>> [   28.498776] Call Trace:
>> [   28.498817]  <TASK>
>> [   28.498818]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
>> [   28.498824]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
>> [   28.498825]  ? unlink_anon_vmas (mm/rmap.c:438)
>> [   28.498829]  on_each_cpu_cond_mask (arch/x86/include/asm/preempt.h:95 (discriminator 1))
>> [   28.498830]  flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91)
>> [   28.498832]  tlb_flush_mmu_tlbonly (include/asm-generic/tlb.h:407)
>> [   28.498835]  tlb_finish_mmu (mm/mmu_gather.c:356)
>> [   28.498837]  vms_clear_ptes (mm/vma.c:1279)
>> [   28.498839]  vms_complete_munmap_vmas (include/linux/mm.h:2928)
>> [   28.498841]  do_vmi_align_munmap (mm/vma.c:1580)
>> [   28.498844]  do_vmi_munmap (mm/vma.c:1627)
>> [   28.498846]  __vm_munmap (mm/vma.c:3247)
>> [   28.498849]  __x64_sys_munmap (mm/mmap.c:1077 (discriminator 1))
>> [   28.498850]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
>> [   28.498855]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
>> [   28.498857] RIP: 0033:0x7f27e538d7bb
>> [   28.498875] Code: 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
>> All code
>> ========
>>    0:	73 01                	jae    0x3
>>    2:	c3                   	ret
>>    3:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>    a:	f7 d8                	neg    %eax
>>    c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    f:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>>   13:	c3                   	ret
>>   14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
>>   1b:	00 00 00
>>   1e:	90                   	nop
>>   1f:	f3 0f 1e fa          	endbr64
>>   23:	b8 0b 00 00 00       	mov    $0xb,%eax
>>   28:	0f 05                	syscall
>>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>   30:	73 01                	jae    0x33
>>   32:	c3                   	ret
>>   33:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>   3a:	f7 d8                	neg    %eax
>>   3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>   3f:	48                   	rex.W
>>
>> Code starting with the faulting instruction
>> ===========================================
>>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>    6:	73 01                	jae    0x9
>>    8:	c3                   	ret
>>    9:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>   10:	f7 d8                	neg    %eax
>>   12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>   15:	48                   	rex.W
>> [   28.498876] RSP: 002b:00007f27e50d77f8 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
>> [   28.498878] RAX: ffffffffffffffda RBX: 0000000000009000 RCX: 00007f27e538d7bb
>> [   28.498878] RDX: 00007f27e53cc280 RSI: 0000000000009000 RDI: 00007f27e4ac7000
>> [   28.498879] RBP: 00007f27e50d7a80 R08: 000000000000004d R09: 0000000000000000
>> [   28.498880] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f27e4ac7000
>> [   28.498880] R13: 00007f27e50d78a0 R14: 0000000000000001 R15: 0000000000000000
>> [   28.498881]  </TASK>
>
>
>
>[3]
>https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21723325004/job/62658752082#step:7:7609
>
>> [   30.907497][    C1] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [virtme-ng-init:76]
>> [   30.907506][    C1] Modules linked in:
>> [   30.907510][    C1] irq event stamp: 53188
>> [   30.907512][    C1] hardirqs last  enabled at (53187): irqentry_exit (kernel/entry/common.c:220)
>> [   30.907521][    C1] hardirqs last disabled at (53188): sysvec_apic_timer_interrupt (arch/x86/include/asm/hardirq.h:78)
>> [   30.907526][    C1] softirqs last  enabled at (52956): handle_softirqs (kernel/softirq.c:469 (discriminator 2))
>> [   30.907531][    C1] softirqs last disabled at (52951): __irq_exit_rcu (kernel/softirq.c:657)
>> [   30.907537][    C1] CPU: 1 UID: 0 PID: 76 Comm: virtme-ng-init Not tainted 6.19.0-rc7+ #1 PREEMPT(full)
>> [   30.907541][    C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [   30.907544][    C1] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:351 (discriminator 5))
>> [   30.907550][    C1] Code: cf 07 00 00 8b 43 08 a8 01 74 38 48 b8 00 00 00 00 00 fc ff df 49 89 f4 48 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 9c 08 00 00 8b 43 08 a8
>> All code
>> ========
>>    0:	cf                   	iret
>>    1:	07                   	(bad)
>>    2:	00 00                	add    %al,(%rax)
>>    4:	8b 43 08             	mov    0x8(%rbx),%eax
>>    7:	a8 01                	test   $0x1,%al
>>    9:	74 38                	je     0x43
>>    b:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
>>   12:	fc ff df
>>   15:	49 89 f4             	mov    %rsi,%r12
>>   18:	48 89 f5             	mov    %rsi,%rbp
>>   1b:	49 c1 ec 03          	shr    $0x3,%r12
>>   1f:	83 e5 07             	and    $0x7,%ebp
>>   22:	49 01 c4             	add    %rax,%r12
>>   25:	83 c5 03             	add    $0x3,%ebp
>>   28:	f3 90                	pause
>>   2a:*	41 0f b6 04 24       	movzbl (%r12),%eax		<-- trapping instruction
>>   2f:	40 38 c5             	cmp    %al,%bpl
>>   32:	7c 08                	jl     0x3c
>>   34:	84 c0                	test   %al,%al
>>   36:	0f 85 9c 08 00 00    	jne    0x8d8
>>   3c:	8b 43 08             	mov    0x8(%rbx),%eax
>>   3f:	a8                   	.byte 0xa8
>>
>> Code starting with the faulting instruction
>> ===========================================
>>    0:	41 0f b6 04 24       	movzbl (%r12),%eax
>>    5:	40 38 c5             	cmp    %al,%bpl
>>    8:	7c 08                	jl     0x12
>>    a:	84 c0                	test   %al,%al
>>    c:	0f 85 9c 08 00 00    	jne    0x8ae
>>   12:	8b 43 08             	mov    0x8(%rbx),%eax
>>   15:	a8                   	.byte 0xa8
>> [   30.907553][    C1] RSP: 0018:ffffc9000101f6a0 EFLAGS: 00000202
>> [   30.907555][    C1] RAX: 0000000000000011 RBX: ffff888152040c00 RCX: 0000000000000000
>> [   30.907557][    C1] RDX: ffff8881520ba948 RSI: ffff888152040c08 RDI: 0000000000000000
>> [   30.907559][    C1] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
>> [   30.907560][    C1] R10: 0000000000000001 R11: 00007f21a6200000 R12: ffffed102a408181
>> [   30.907561][    C1] R13: ffff8881520ba940 R14: ffffed102a417529 R15: 0000000000000001
>> [   30.907573][    C1] FS:  00007f21a69186c0(0000) GS:ffff8881cc22e000(0000) knlGS:0000000000000000
>> [   30.907585][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   30.907587][    C1] CR2: 00007f2198000020 CR3: 0000000107149002 CR4: 0000000000370ef0
>> [   30.907591][    C1] Call Trace:
>> [   30.907596][    C1]  <TASK>
>> [   30.907603][    C1]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
>> [   30.907612][    C1]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
>> [   30.907626][    C1]  ? kasan_quarantine_put (arch/x86/include/asm/irqflags.h:26)
>> [   30.907637][    C1]  ? __pfx_smp_call_function_many_cond (kernel/smp.c:784)
>> [   30.907646][    C1]  ? kmem_cache_free (mm/slub.c:6674 (discriminator 3))
>> [   30.907656][    C1]  ? __pfx_should_flush_tlb (arch/x86/mm/tlb.c:1298)
>> [   30.907660][    C1]  on_each_cpu_cond_mask (kernel/smp.c:1044)
>> [   30.907664][    C1]  ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:1125)
>> [   30.907669][    C1]  kvm_flush_tlb_multi (arch/x86/kernel/kvm.c:666)
>> [   30.907675][    C1]  ? __pfx_kvm_flush_tlb_multi (arch/x86/kernel/kvm.c:666)
>> [   30.907679][    C1]  ? get_flush_tlb_info (arch/x86/mm/tlb.c:1434 (discriminator 1))
>> [   30.907686][    C1]  flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91)
>> [   30.907690][    C1]  ? rcu_read_lock_any_held (kernel/rcu/update.c:386 (discriminator 1))
>> [   30.907695][    C1]  ? __pfx_flush_tlb_mm_range (arch/x86/mm/tlb.c:1452)
>> [   30.907703][    C1]  tlb_flush_mmu_tlbonly (include/asm-generic/tlb.h:407)
>> [   30.907712][    C1]  tlb_finish_mmu (mm/mmu_gather.c:356)
>> [   30.907718][    C1]  vms_clear_ptes (mm/vma.c:1279)
>> [   30.907724][    C1]  ? vms_complete_munmap_vmas (include/linux/mmap_lock.h:386)
>> [   30.907728][    C1]  ? __pfx_vms_clear_ptes (mm/vma.c:1258)
>> [   30.907738][    C1]  ? __pfx_mas_store_gfp (lib/maple_tree.c:5119)
>> [   30.907747][    C1]  vms_complete_munmap_vmas (include/linux/mm.h:2928)
>> [   30.907750][    C1]  ? vms_gather_munmap_vmas (mm/vma.c:1495)
>> [   30.907776][    C1]  do_vmi_align_munmap (mm/vma.c:1580)
>> [   30.907780][    C1]  ? lock_acquire.part.0 (kernel/locking/lockdep.c:470)
>> [   30.907784][    C1]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
>> [   30.907789][    C1]  ? __pfx_do_vmi_align_munmap (mm/vma.c:1561)
>> [   30.907792][    C1]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5536)
>> [   30.907800][    C1]  ? put_pid.part.0 (arch/x86/include/asm/atomic.h:93 (discriminator 4))
>> [   30.907826][    C1]  do_vmi_munmap (mm/vma.c:1627)
>> [   30.907832][    C1]  __vm_munmap (mm/vma.c:3247)
>> [   30.907837][    C1]  ? __pfx___vm_munmap (mm/vma.c:3238)
>> [   30.907841][    C1]  ? _copy_to_user (arch/x86/include/asm/uaccess_64.h:121)
>> [   30.907858][    C1]  __x64_sys_munmap (mm/mmap.c:1077 (discriminator 1))
>> [   30.907861][    C1]  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4473)
>> [   30.907863][    C1]  ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42)
>> [   30.907866][    C1]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
>> [   30.907871][    C1]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
>> [   30.907875][    C1] RIP: 0033:0x7f21a6bc47bb
>> [   30.907880][    C1] Code: 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
>> All code
>> ========
>>    0:	73 01                	jae    0x3
>>    2:	c3                   	ret
>>    3:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>    a:	f7 d8                	neg    %eax
>>    c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    f:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>>   13:	c3                   	ret
>>   14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
>>   1b:	00 00 00
>>   1e:	90                   	nop
>>   1f:	f3 0f 1e fa          	endbr64
>>   23:	b8 0b 00 00 00       	mov    $0xb,%eax
>>   28:	0f 05                	syscall
>>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>   30:	73 01                	jae    0x33
>>   32:	c3                   	ret
>>   33:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>   3a:	f7 d8                	neg    %eax
>>   3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>   3f:	48                   	rex.W
>>
>> Code starting with the faulting instruction
>> ===========================================
>>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>    6:	73 01                	jae    0x9
>>    8:	c3                   	ret
>>    9:	48 c7 c1 e0 ff ff ff 	mov    $0xffffffffffffffe0,%rcx
>>   10:	f7 d8                	neg    %eax
>>   12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>   15:	48                   	rex.W
>> [   30.907882][    C1] RSP: 002b:00007f21a69177f8 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
>> [   30.907884][    C1] RAX: ffffffffffffffda RBX: 0000000000009000 RCX: 00007f21a6bc47bb
>> [   30.907886][    C1] RDX: 00007f21a6c03280 RSI: 0000000000009000 RDI: 00007f21a62fe000
>> [   30.907887][    C1] RBP: 00007f21a6917a80 R08: 0000000000000050 R09: 0000000000000000
>> [   30.907889][    C1] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f21a62fe000
>> [   30.907890][    C1] R13: 00007f21a69178a0 R14: 0000000000000001 R15: 0000000000000000
>> [   30.907902][    C1]  </TASK>
>
>
>
>[4]
>https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741113372/job/62716612654#step:7:12820
>[5]
>https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741112047/job/62716608856#step:7:14820
>
>[6]
>https://github.com/multipath-tcp/mptcp_net-next/actions/runs/21741112047/job/62716608836#step:7:4811
>
>
># l
>
>> virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>> 106			goto __retry;
>> 101	 __retry:
>> 102		val = atomic_read(&lock->val);
>> 103	
>> 104		if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
>> 105			cpu_relax();
>> 106			goto __retry;
>> 107		}
>> 108	
>> 109		return true;
>> 110	}
>
>
># bt full
>
>> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>>         val = <optimized out>
>> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>>         prev = <optimized out>
>>         next = 0x1
>>         node = <optimized out>
>>         old = <optimized out>
>>         tail = <optimized out>
>>         idx = <optimized out>
>>         locked = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __vpp_verify = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>>         lock = <optimized out>
>> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
>> No locals.
>> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>>         flags = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
>> No locals.
>> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>>         ld_moved = 0x1
>>         cur_ld_moved = <optimized out>
>>         active_balance = <optimized out>
>>         sd_parent = <optimized out>
>>         group = <optimized out>
>>         busiest = <optimized out>
>>         rf = <optimized out>
>>         cpus = <optimized out>
>>         env = {sd = 0xff1100010020b400, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x0, dst_rq = 0xff1100017ac2b300, dst_grpmask = 0xff110001001e4930, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ac183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>>         need_unlock = <optimized out>
>>         redo = <optimized out>
>>         more_balance = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __vpp_verify = <optimized out>
>> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>>         weight = <optimized out>
>>         domain_cost = 0xff1100017acab300
>>         next_balance = <optimized out>
>>         this_cpu = 0x20b400
>>         continue_balancing = 0x1
>>         t0 = <optimized out>
>>         t1 = <optimized out>
>>         curr_cost = 0x0
>>         sd = 0xff1100010020b400
>>         pulled_task = 0x1
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #8  pick_next_task_fair (rq=0xff1100010020b400, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>>         se = 0xfffb6e65
>>         p = <optimized out>
>>         new_tasks = <optimized out>
>>         again = <optimized out>
>>         idle = <optimized out>
>>         simple = <optimized out>
>> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>>         class = 0xff1100010224aa00
>>         p = 0xffffffff824b34a8 <fair_sched_class>
>>         restart = <optimized out>
>> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
>> No locals.
>> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>>         prev = 0xff1100010146c380
>>         next = 0xffffffff824b34a8 <fair_sched_class>
>>         preempt = 0x1
>>         is_switch = 0x1
>>         switch_count = <optimized out>
>>         prev_state = <optimized out>
>>         rf = <incomplete type>
>>         rq = <optimized out>
>>         cpu = <optimized out>
>>         keep_resched = <optimized out>
>>         __vpp_verify = <optimized out>
>> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
>> No locals.
>> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>>         tsk = <optimized out>
>> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000100910160) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>>         __int = <optimized out>
>>         __out = <optimized out>
>>         __wq_entry = <incomplete type>
>>         __ret = <optimized out>
>>         __ret = <optimized out>
>>         fc = 0xff110001023e6800
>>         fiq = <optimized out>
>>         err = <optimized out>
>> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>>         fiq = 0x0
>> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>>         fc = 0xff110001023e6800
>>         req = 0xff11000100910160
>>         ret = 0xff110001023e6800
>> #17 0xffffffff817a47d9 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
>> No locals.
>> #18 fuse_lookup_name (sb=0xff1100017acab300, nodeid=0x1, name=0x1, outarg=0xff1100017acab300, inode=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:574
>>         fm = <optimized out>
>>         args = <incomplete type>
>>         forget = <optimized out>
>>         attr_version = <optimized out>
>>         evict_ctr = <optimized out>
>>         err = 0x411620
>> #19 0xffffffff817a49c9 in fuse_lookup (dir=0xff11000100606a00, entry=0xff11000100411600, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1062
>>         outarg = <incomplete type>
>>         fc = <optimized out>
>>         inode = 0x0
>>         newent = 0xffa00000003afaf0
>>         err = <optimized out>
>>         epoch = <optimized out>
>>         outarg_valid = 0x0
>>         locked = <optimized out>
>>         out_iput = <optimized out>
>> #20 0xffffffff816c9e63 in __lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1866
>>         dentry = 0xff11000100411600
>>         old = <optimized out>
>>         inode = 0xff11000100606a00
>>         wq = <incomplete type>
>> #21 0xffffffff816c9f69 in lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1883
>>         inode = <optimized out>
>>         res = <optimized out>
>> #22 0xffffffff816cddd8 in walk_component (nd=<optimized out>, flags=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2229
>>         dentry = 0x1
>> #23 lookup_last (nd=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2730
>> No locals.
>> #24 path_lookupat (nd=0xff1100017acab300, flags=0x1, path=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2754
>>         s = 0x1 <error: Cannot access memory at address 0x1>
>>         err = <optimized out>
>> #25 0xffffffff816cfee0 in filename_lookup (dfd=0x7acab300, name=0x1, flags=0x1, path=0xff1100017acab300, root=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2783
>>         retval = 0x1
>>         nd = {path = <incomplete type>, last = {{{hash = 0x314ef79d, len = 0x4}, hash_len = 0x4314ef79d}, name = 0xff110001009f1029 "dpkg"}, root = <incomplete type>, inode = 0xff11000100606a00, flags = 0x5, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x0, total_link_count = 0x0, stack = 0xffa00000003afcd8, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff110001009f1000, pathname = 0xff110001009f1020 "/var/lib/dpkg", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
>> #26 0xffffffff816c1a8c in vfs_statx (dfd=0x7acab300, filename=0x1, flags=0x1, stat=0xff1100017acab300, request_mask=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:353
>>         path = <incomplete type>
>>         lookup_flags = 0x5
>>         error = 0xffffff9c
>> #27 0xffffffff816c2863 in do_statx (dfd=0x7acab300, filename=0x1, flags=0x1, mask=0x7acab300, buffer=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:769
>>         stat = {result_mask = 0x0, mode = 0x0, nlink = 0x0, blksize = 0x0, attributes = 0x0, attributes_mask = 0x0, ino = 0x0, dev = 0x0, rdev = 0x0, uid = <incomplete type>, gid = <incomplete type>, size = 0x0, atime = <incomplete type>, mtime = <incomplete type>, ctime = <incomplete type>, btime = <incomplete type>, blocks = 0x0, mnt_id = 0x0, change_cookie = 0x0, subvol = 0x0, dio_mem_align = 0x0, dio_offset_align = 0x0, dio_read_offset_align = 0x0, atomic_write_unit_min = 0x0, atomic_write_unit_max = 0x0, atomic_write_unit_max_opt = 0x0, atomic_write_segments_max = 0x0}
>>         error = 0x1
>> #28 0xffffffff816c2ab0 in __do_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:823
>>         ret = 0xffffff9c
>>         name = <error reading variable name (Cannot access memory at address 0x0)>
>> #29 __se_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>>         ret = <optimized out>
>> #30 __x64_sys_statx (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>> No locals.
>> #31 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>>         unr = <optimized out>
>> #32 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
>> No locals.
>> #33 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
>> No locals.
>> #34 0x000000000000000e in ?? ()
>> No symbol table info available.
>> #35 0x0000000000000001 in ?? ()
>> No symbol table info available.
>> #36 0x00007fff8cbcae40 in ?? ()
>> No symbol table info available.
>> #37 0x00007fbb012d6530 in ?? ()
>> No symbol table info available.
>> #38 0x00007fbb00d14c70 in ?? ()
>> No symbol table info available.
>> #39 0x00007fbb00d14e00 in ?? ()
>> No symbol table info available.
>> #40 0x0000000000000246 in ?? ()
>> No symbol table info available.
>> #41 0x0000000000000fff in ?? ()
>> No symbol table info available.
>> #42 0x0000000000000000 in ?? ()
>> No symbol table info available.
>
>
># info frame ; info registers
>
>> Stack level 0, frame at 0xffa00000003af740:
>>  rip = 0xffffffff81e1c641 in virt_spin_lock (/home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106); saved rip = 0xffffffff813de445
>>  inlined into frame 1
>>  source language c.
>>  Arglist at unknown address.
>>  Locals at unknown address, Previous frame's sp in rsp
>> rax            0x1                 0x1
>> rbx            0xff1100017acab300  0xff1100017acab300
>> rcx            0xff1100017acab300  0xff1100017acab300
>> rdx            0x1                 0x1
>> rsi            0x1                 0x1
>> rdi            0xff1100017acab300  0xff1100017acab300
>> rbp            0x2                 0x2
>> rsp            0xffa00000003af738  0xffa00000003af738
>> r8             0x0                 0x0
>> r9             0x400               0x400
>> r10            0x0                 0x0
>> r11            0x2                 0x2
>> r12            0x1                 0x1
>> r13            0xff110001001e48c0  0xff110001001e48c0
>> r14            0xffa00000003af810  0xffa00000003af810
>> r15            0xff110001001e4940  0xff110001001e4940
>> rip            0xffffffff81e1c641  0xffffffff81e1c641 <queued_spin_lock_slowpath+305>
>> eflags         0x2                 [ IOPL=0 ]
>> cs             0x10                0x10
>> ss             0x18                0x18
>> ds             0x0                 0x0
>> es             0x0                 0x0
>> fs             0x0                 0x0
>> gs             0x0                 0x0
>> fs_base        0x7fbb00d156c0      0x7fbb00d156c0
>> gs_base        0xff110001f7cf7000  0xff110001f7cf7000
>> k_gs_base      0x0                 0x0
>> cr0            0x80050033          [ PG AM WP NE ET MP PE ]
>> cr2            0x7fbaf8001118      0x7fbaf8001118
>> cr3            0x1022df003         [ PDBR=1057503 PCID=3 ]
>> cr4            0x373ef0            [ SMAP SMEP OSXSAVE PCIDE FSGSBASE VMXE LA57 UMIP OSXMMEXCPT OSFXSR PGE MCE PAE PSE ]
>> cr8            0x1                 0x1
>> efer           0xd01               [ NXE LMA LME SCE ]
>> xmm0           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm1           {v4_float = {0x4, 0x0, 0x2c, 0x0}, v2_double = {0x4, 0x2c}, v16_int8 = {0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x4, 0x0, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0}, v4_int32 = {0x4, 0x0, 0x2c, 0x0}, v2_int64 = {0x4, 0x2c}, uint128 = 0x2c0000000000000004}
>> xmm2           {v4_float = {0xf8000fb0, 0x7fba, 0x2c, 0x0}, v2_double = {0x7fbaf8000fb0, 0x2c}, v16_int8 = {0xb0, 0xf, 0x0, 0xf8, 0xba, 0x7f, 0x0, 0x0, 0x2c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xfb0, 0xf800, 0x7fba, 0x0, 0x2c, 0x0, 0x0, 0x0}, v4_int32 = {0xf8000fb0, 0x7fba, 0x2c, 0x0}, v2_int64 = {0x7fbaf8000fb0, 0x2c}, uint128 = 0x2c00007fbaf8000fb0}
>> xmm3           {v4_float = {0x59ff1020, 0x5555, 0x1e, 0x0}, v2_double = {0x555559ff1020, 0x1e}, v16_int8 = {0x20, 0x10, 0xff, 0x59, 0x55, 0x55, 0x0, 0x0, 0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x1020, 0x59ff, 0x5555, 0x0, 0x1e, 0x0, 0x0, 0x0}, v4_int32 = {0x59ff1020, 0x5555, 0x1e, 0x0}, v2_int64 = {0x555559ff1020, 0x1e}, uint128 = 0x1e0000555559ff1020}
>> xmm4           {v4_float = {0xf8000090, 0x7fba, 0x0, 0x0}, v2_double = {0x7fbaf8000090, 0x0}, v16_int8 = {0x90, 0x0, 0x0, 0xf8, 0xba, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x90, 0xf800, 0x7fba, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0xf8000090, 0x7fba, 0x0, 0x0}, v2_int64 = {0x7fbaf8000090, 0x0}, uint128 = 0x7fbaf8000090}
>> xmm5           {v4_float = {0xff0000, 0x0, 0xff0000, 0x0}, v2_double = {0xff0000, 0xff0000}, v16_int8 = {0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0xff, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0}, v4_int32 = {0xff0000, 0x0, 0xff0000, 0x0}, v2_int64 = {0xff0000, 0xff0000}, uint128 = 0xff00000000000000ff0000}
>> xmm6           {v4_float = {0xff0000, 0x0, 0x0, 0x0}, v2_double = {0xff0000, 0x0}, v16_int8 = {0x0, 0x0, 0xff, 0x0 <repeats 13 times>}, v8_int16 = {0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0xff0000, 0x0, 0x0, 0x0}, v2_int64 = {0xff0000, 0x0}, uint128 = 0xff0000}
>> xmm7           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm8           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm9           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm10          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm11          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm12          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm13          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm14          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> xmm15          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0}
>> mxcsr          0x1f80              [ IM DM ZM OM UM PM ]
>
>
># thread apply all bt full
>
>> Thread 4 (Thread 1.4 (CPU#3 [running])):
>> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>>         val = <optimized out>
>> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>>         prev = <optimized out>
>>         next = 0x1
>>         node = <optimized out>
>>         old = <optimized out>
>>         tail = <optimized out>
>>         idx = <optimized out>
>>         locked = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __vpp_verify = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>>         lock = <optimized out>
>> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
>> No locals.
>> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>>         flags = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
>> No locals.
>> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>>         ld_moved = 0x1
>>         cur_ld_moved = <optimized out>
>>         active_balance = <optimized out>
>>         sd_parent = <optimized out>
>>         group = <optimized out>
>>         busiest = <optimized out>
>>         rf = <optimized out>
>>         cpus = <optimized out>
>>         env = {sd = 0xff1100010020ba00, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x3, dst_rq = 0xff1100017adab300, dst_grpmask = 0xff110001001e4ab0, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ad983e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>>         need_unlock = <optimized out>
>>         redo = <optimized out>
>>         more_balance = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __vpp_verify = <optimized out>
>> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>>         weight = <optimized out>
>>         domain_cost = 0xff1100017acab300
>>         next_balance = <optimized out>
>>         this_cpu = 0x20ba00
>>         continue_balancing = 0x1
>>         t0 = <optimized out>
>>         t1 = <optimized out>
>>         curr_cost = 0x0
>>         sd = 0xff1100010020ba00
>>         pulled_task = 0x1
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #8  pick_next_task_fair (rq=0xff1100010020ba00, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>>         se = 0xfffb6e63
>>         p = <optimized out>
>>         new_tasks = <optimized out>
>>         again = <optimized out>
>>         idle = <optimized out>
>>         simple = <optimized out>
>> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>>         class = 0x32
>>         p = 0xffffffff824b34a8 <fair_sched_class>
>>         restart = <optimized out>
>> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
>> No locals.
>> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>>         prev = 0xff11000100238000
>>         next = 0xffffffff824b34a8 <fair_sched_class>
>>         preempt = 0x1
>>         is_switch = 0x1
>>         switch_count = <optimized out>
>>         prev_state = <optimized out>
>>         rf = <incomplete type>
>>         rq = <optimized out>
>>         cpu = <optimized out>
>>         keep_resched = <optimized out>
>>         __vpp_verify = <optimized out>
>> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
>> No locals.
>> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>>         tsk = <optimized out>
>> #14 0xffffffff814833ba in futex_do_wait (q=0x1, timeout=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:358
>> No locals.
>> #15 0xffffffff81483b7e in __futex_wait (uaddr=0xff1100017acab300, flags=0x1, val=0x1, to=0xff1100017acab300, bitset=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:687
>>         q = {list = <incomplete type>, task = 0xff11000100238000, lock_ptr = 0xff11000100a15684, wake = 0xffffffff81482a80 <futex_wake_mark>, wake_data = 0x0, key = {shared = {i_seq = 0xff11000102bf0000, pgoff = 0x7fbb00f16000, offset = 0x992}, private = {{mm = 0xff11000102bf0000, __tmp = 0xff11000102bf0000}, address = 0x7fbb00f16000, offset = 0x992}, both = {ptr = 0xff11000102bf0000, word = 0x7fbb00f16000, offset = 0x992, node = 0xffffffff}}, pi_state = 0x0, rt_waiter = 0x0, requeue_pi_key = 0x0, bitset = 0xffffffff, requeue_state = <incomplete type>, drop_hb_ref = 0x0}
>>         ret = 0x1
>> #16 0xffffffff81483c68 in futex_wait (uaddr=0xff1100017acab300, flags=0x1, val=0x1, abs_time=0xff1100017acab300, bitset=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/waitwake.c:715
>>         timeout = {timer = <incomplete type>, task = 0x0}
>>         to = <optimized out>
>>         restart = <optimized out>
>>         ret = 0xffffffff
>> #17 0xffffffff8147f4a5 in do_futex (uaddr=0xff1100017acab300, op=0x1, val=0x1, timeout=0xff1100017acab300, uaddr2=0x0, val2=0x400, val3=0xffffffff) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:130
>>         flags = 0x1
>>         cmd = <optimized out>
>> #18 0xffffffff8147f6ad in __do_sys_futex (uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, utime=<optimized out>, uaddr2=<optimized out>, val3=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:207
>>         ret = <optimized out>
>>         cmd = <optimized out>
>>         t = 0x0
>>         tp = 0xff1100017acab300
>>         ts = <incomplete type>
>> #19 __se_sys_futex (uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, utime=<optimized out>, uaddr2=<optimized out>, val3=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:188
>>         ret = <optimized out>
>> #20 __x64_sys_futex (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/futex/syscalls.c:188
>> No locals.
>> #21 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>>         unr = <optimized out>
>> #22 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
>> No locals.
>> #23 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
>> No locals.
>> #24 0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> Thread 3 (Thread 1.3 (CPU#2 [running])):
>> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>>         val = <optimized out>
>> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>>         prev = <optimized out>
>>         next = 0x1
>>         node = <optimized out>
>>         old = <optimized out>
>>         tail = <optimized out>
>>         idx = <optimized out>
>>         locked = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __vpp_verify = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>>         lock = <optimized out>
>> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
>> No locals.
>> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>>         flags = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
>> No locals.
>> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>>         ld_moved = 0x1
>>         cur_ld_moved = <optimized out>
>>         active_balance = <optimized out>
>>         sd_parent = <optimized out>
>>         group = <optimized out>
>>         busiest = <optimized out>
>>         rf = <optimized out>
>>         cpus = <optimized out>
>>         env = {sd = 0xff1100010020b800, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x2, dst_rq = 0xff1100017ad2b300, dst_grpmask = 0xff110001001e4a30, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ad183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>>         need_unlock = <optimized out>
>>         redo = <optimized out>
>>         more_balance = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __vpp_verify = <optimized out>
>> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>>         weight = <optimized out>
>>         domain_cost = 0xff1100017acab300
>>         next_balance = <optimized out>
>>         this_cpu = 0x20b800
>>         continue_balancing = 0x1
>>         t0 = <optimized out>
>>         t1 = <optimized out>
>>         curr_cost = 0x0
>>         sd = 0xff1100010020b800
>>         pulled_task = 0x1
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #8  pick_next_task_fair (rq=0xff1100010020b800, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>>         se = 0xfffb6e64
>>         p = <optimized out>
>>         new_tasks = <optimized out>
>>         again = <optimized out>
>>         idle = <optimized out>
>>         simple = <optimized out>
>> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>>         class = 0xffa00000003bfc50
>>         p = 0xffffffff824b34a8 <fair_sched_class>
>>         restart = <optimized out>
>> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
>> No locals.
>> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>>         prev = 0xff11000102369680
>>         next = 0xffffffff824b34a8 <fair_sched_class>
>>         preempt = 0x1
>>         is_switch = 0x1
>>         switch_count = <optimized out>
>>         prev_state = <optimized out>
>>         rf = <incomplete type>
>>         rq = <optimized out>
>>         cpu = <optimized out>
>>         keep_resched = <optimized out>
>>         __vpp_verify = <optimized out>
>> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
>> No locals.
>> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>>         tsk = <optimized out>
>> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000103061000) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>>         __int = <optimized out>
>>         __out = <optimized out>
>>         __wq_entry = <incomplete type>
>>         __ret = <optimized out>
>>         __ret = <optimized out>
>>         fc = 0xff110001023e6800
>>         fiq = <optimized out>
>>         err = <optimized out>
>> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>>         fiq = 0x0
>> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>>         fc = 0xff110001023e6800
>>         req = 0xff11000103061000
>>         ret = 0xff110001023e6800
>> #17 0xffffffff817a27e2 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
>> No locals.
>> #18 fuse_readlink_folio (inode=0xff11000100614e00, folio=0xffd40000040c1800) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:1834
>>         fm = <optimized out>
>>         desc = <incomplete type>
>>         ap = <incomplete type>
>>         link = <optimized out>
>>         res = <optimized out>
>> #19 0xffffffff817a2943 in fuse_get_link (dentry=0xff1100017acab300, inode=0xff1100017acab300, callback=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:1873
>>         fc = <optimized out>
>>         folio = <optimized out>
>>         err = <optimized out>
>> #20 0xffffffff816cc950 in pick_link (nd=0xff1100017acab300, link=0x1, inode=0xff11000100614e00, flags=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2013
>>         get = 0xff1100017acab300
>>         last = 0xffa00000003bfd88
>>         res = 0x1 <error: Cannot access memory at address 0x1>
>>         error = <optimized out>
>>         all_done = <optimized out>
>> #21 0xffffffff816ccb5e in step_into_slowpath (nd=0xff1100017acab300, flags=0x1, dentry=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2074
>>         path = <incomplete type>
>>         inode = 0x0
>>         err = <optimized out>
>> #22 0xffffffff816d14f7 in step_into (nd=<optimized out>, flags=<optimized out>, dentry=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2099
>> No locals.
>> #23 open_last_lookups (nd=<optimized out>, file=<optimized out>, op=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4584
>>         delegated_inode = <optimized out>
>>         dir = 0xff1100010040ad80
>>         open_flag = 0x1
>>         got_write = <optimized out>
>>         dentry = 0xff110001006af840
>>         res = <optimized out>
>>         retry = <optimized out>
>> #24 path_openat (nd=0xff1100017acab300, op=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4793
>>         s = 0xff110001006af840 "\004"
>>         file = <optimized out>
>>         error = 0x0
>> #25 0xffffffff816d2618 in do_filp_open (dfd=0x7acab300, pathname=0x1, op=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:4823
>>         nd = {path = <incomplete type>, last = {{{hash = 0xf361748d, len = 0xd}, hash_len = 0xdf361748d}, name = 0xff11000102348031 "systemd-udevd"}, root = <incomplete type>, inode = 0xff110001004ac380, flags = 0x10001, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x1, total_link_count = 0x1, stack = 0xffa00000003bfd88, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x2}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff11000102348000, pathname = 0xff11000102348020 "/usr/lib/systemd/systemd-udevd", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
>>         flags = 0x1
>>         filp = 0x1
>> #26 0xffffffff816c3baf in do_open_execat (fd=0x7acab300, name=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:783
>>         err = <optimized out>
>>         file = <optimized out>
>>         open_exec_flags = <incomplete type>
>>         __ptr = <optimized out>
>>         __val = <optimized out>
>> #27 0xffffffff816c3de0 in alloc_bprm (fd=0x7acab300, filename=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1409
>>         bprm = <optimized out>
>>         file = <optimized out>
>>         retval = <optimized out>
>> #28 0xffffffff816c48fd in do_execveat_common (fd=0x7acab300, filename=0x1, flags=0x0, envp=..., argv=...) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1810
>>         bprm = <optimized out>
>>         retval = <optimized out>
>> #29 0xffffffff816c5988 in do_execve (filename=<error reading variable: Cannot access memory at address 0x0>, __argv=<optimized out>, __envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:1933
>>         argv = <optimized out>
>>         envp = <optimized out>
>> #30 __do_sys_execve (filename=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2009
>> No locals.
>> #31 __se_sys_execve (filename=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2004
>>         ret = <optimized out>
>> #32 __x64_sys_execve (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/exec.c:2004
>> No locals.
>> #33 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>>         unr = <optimized out>
>> #34 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
>> No locals.
>> #35 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
>> No locals.
>> #36 0x0000000000000003 in ?? ()
>> No symbol table info available.
>> #37 0x00007fbaf4000e50 in ?? ()
>> No symbol table info available.
>> #38 0x0000555559fefea0 in ?? ()
>> No symbol table info available.
>> #39 0x00007fbaf4000d90 in ?? ()
>> No symbol table info available.
>> #40 0x00007fbb0090de60 in ?? ()
>> No symbol table info available.
>> #41 0x00007fbaf4000cb0 in ?? ()
>> No symbol table info available.
>> #42 0x0000000000000202 in ?? ()
>> No symbol table info available.
>> #43 0x0000000000000008 in ?? ()
>> No symbol table info available.
>> #44 0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> Thread 2 (Thread 1.2 (CPU#1 [running])):
>> #0  num_possible_cpus () at /home/runner/work/mptcp_net-next/mptcp_net-next/include/linux/cpumask.h:1222
>> No locals.
>> #1  mm_get_cid (mm=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3782
>>         cid = 0x4
>> #2  mm_cid_from_cpu (t=<optimized out>, cpu_cid=0x4, mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3844
>>         max_cids = <optimized out>
>>         tcid = <optimized out>
>>         mm = <optimized out>
>> #3  mm_cid_schedin (next=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3900
>>         mm = 0xff11000102bf0000
>>         cpu_cid = <optimized out>
>>         mode = <optimized out>
>> #4  mm_cid_switch_to (prev=<optimized out>, next=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:3935
>> No locals.
>> #5  context_switch (rq=<optimized out>, prev=<optimized out>, next=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5249
>> No locals.
>> #6  __schedule (sched_mode=0x2bf0650) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6867
>>         prev = 0xff1100010031da00
>>         next = 0xff1100010146da00
>>         preempt = 0x4
>>         is_switch = 0x0
>>         switch_count = <optimized out>
>>         prev_state = <optimized out>
>>         rf = <incomplete type>
>>         rq = <optimized out>
>>         cpu = <optimized out>
>>         keep_resched = <optimized out>
>>         __vpp_verify = <optimized out>
>> #7  0xffffffff81e14232 in schedule_idle () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6990
>> No locals.
>> #8  0xffffffff813f68a9 in cpu_startup_entry (state=46073424) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/idle.c:430
>> No locals.
>> #9  0xffffffff8135fef4 in start_secondary (unused=0xff11000102bf0650) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/kernel/smpboot.c:312
>> No locals.
>> #10 0xffffffff8132b266 in secondary_startup_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/kernel/head_64.S:418
>> No locals.
>> #11 0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> Thread 1 (Thread 1.1 (CPU#0 [running])):
>> #0  virt_spin_lock (lock=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/include/asm/qspinlock.h:106
>>         val = <optimized out>
>> #1  queued_spin_lock_slowpath (lock=0xff1100017acab300, val=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/locking/qspinlock.c:141
>>         prev = <optimized out>
>>         next = 0x1
>>         node = <optimized out>
>>         old = <optimized out>
>>         tail = <optimized out>
>>         idx = <optimized out>
>>         locked = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __vpp_verify = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>>         pao_ID__ = <optimized out>
>>         pao_tmp__ = <optimized out>
>>         pto_val__ = <optimized out>
>>         pto_tmp__ = <optimized out>
>> #2  0xffffffff813de445 in raw_spin_rq_lock_nested (rq=0xff1100017acab300, subclass=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:639
>>         lock = <optimized out>
>> #3  0xffffffff813ef2d5 in raw_spin_rq_lock (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1580
>> No locals.
>> #4  _raw_spin_rq_lock_irqsave (rq=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1600
>>         flags = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #5  rq_lock_irqsave (rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/sched.h:1893
>> No locals.
>> #6  sched_balance_rq (this_cpu=0x7acab300, this_rq=0x1, sd=0x1, idle=2060104448, continue_balancing=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:11867
>>         ld_moved = 0x1
>>         cur_ld_moved = <optimized out>
>>         active_balance = <optimized out>
>>         sd_parent = <optimized out>
>>         group = <optimized out>
>>         busiest = <optimized out>
>>         rf = <optimized out>
>>         cpus = <optimized out>
>>         env = {sd = 0xff1100010020b400, src_rq = 0xff1100017acab300, src_cpu = 0x1, dst_cpu = 0x0, dst_rq = 0xff1100017ac2b300, dst_grpmask = 0xff110001001e4930, new_dst_cpu = 0x0, idle = CPU_NEWLY_IDLE, imbalance = 0x1, cpus = 0xff1100017ac183e0, flags = 0x1, loop = 0x0, loop_break = 0x20, loop_max = 0x2, fbq_type = all, migration_type = migrate_task, tasks = <incomplete type>}
>>         need_unlock = <optimized out>
>>         redo = <optimized out>
>>         more_balance = <optimized out>
>>         __vpp_verify = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __vpp_verify = <optimized out>
>> #7  0xffffffff813efe9b in sched_balance_newidle (this_rq=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:12932
>>         weight = <optimized out>
>>         domain_cost = 0xff1100017acab300
>>         next_balance = <optimized out>
>>         this_cpu = 0x20b400
>>         continue_balancing = 0x1
>>         t0 = <optimized out>
>>         t1 = <optimized out>
>>         curr_cost = 0x0
>>         sd = 0xff1100010020b400
>>         pulled_task = 0x1
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>>         __dummy = <optimized out>
>>         __dummy2 = <optimized out>
>> #8  pick_next_task_fair (rq=0xff1100010020b400, prev=0x1, rf=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/fair.c:8973
>>         se = 0xfffb6e65
>>         p = <optimized out>
>>         new_tasks = <optimized out>
>>         again = <optimized out>
>>         idle = <optimized out>
>>         simple = <optimized out>
>> #9  0xffffffff81e1337e in __pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:5890
>>         class = 0xff1100010224aa00
>>         p = 0xffffffff824b34a8 <fair_sched_class>
>>         restart = <optimized out>
>> #10 pick_next_task (rq=<optimized out>, prev=<optimized out>, rf=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6426
>> No locals.
>> #11 __schedule (sched_mode=0x7acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6809
>>         prev = 0xff1100010146c380
>>         next = 0xffffffff824b34a8 <fair_sched_class>
>>         preempt = 0x1
>>         is_switch = 0x1
>>         switch_count = <optimized out>
>>         prev_state = <optimized out>
>>         rf = <incomplete type>
>>         rq = <optimized out>
>>         cpu = <optimized out>
>>         keep_resched = <optimized out>
>>         __vpp_verify = <optimized out>
>> #12 0xffffffff81e14097 in __schedule_loop (sched_mode=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6949
>> No locals.
>> #13 schedule () at /home/runner/work/mptcp_net-next/mptcp_net-next/kernel/sched/core.c:6964
>>         tsk = <optimized out>
>> #14 0xffffffff8179e372 in request_wait_answer (req=0xff11000100910160) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:552
>>         __int = <optimized out>
>>         __out = <optimized out>
>>         __wq_entry = <incomplete type>
>>         __ret = <optimized out>
>>         __ret = <optimized out>
>>         fc = 0xff110001023e6800
>>         fiq = <optimized out>
>>         err = <optimized out>
>> #15 0xffffffff8179e5a0 in __fuse_request_send (req=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:599
>>         fiq = 0x0
>> #16 __fuse_simple_request (idmap=0xff1100017acab300, fm=0x1, args=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dev.c:693
>>         fc = 0xff110001023e6800
>>         req = 0xff11000100910160
>>         ret = 0xff110001023e6800
>> #17 0xffffffff817a47d9 in fuse_simple_request (fm=<optimized out>, args=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1263
>> No locals.
>> #18 fuse_lookup_name (sb=0xff1100017acab300, nodeid=0x1, name=0x1, outarg=0xff1100017acab300, inode=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/dir.c:574
>>         fm = <optimized out>
>>         args = <incomplete type>
>>         forget = <optimized out>
>>         attr_version = <optimized out>
>>         evict_ctr = <optimized out>
>>         err = 0x411620
>> #19 0xffffffff817a49c9 in fuse_lookup (dir=0xff11000100606a00, entry=0xff11000100411600, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/fuse/fuse_i.h:1062
>>         outarg = <incomplete type>
>>         fc = <optimized out>
>>         inode = 0x0
>>         newent = 0xffa00000003afaf0
>>         err = <optimized out>
>>         epoch = <optimized out>
>>         outarg_valid = 0x0
>>         locked = <optimized out>
>>         out_iput = <optimized out>
>> #20 0xffffffff816c9e63 in __lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1866
>>         dentry = 0xff11000100411600
>>         old = <optimized out>
>>         inode = 0xff11000100606a00
>>         wq = <incomplete type>
>> #21 0xffffffff816c9f69 in lookup_slow (name=0xff1100017acab300, dir=0x1, flags=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:1883
>>         inode = <optimized out>
>>         res = <optimized out>
>> #22 0xffffffff816cddd8 in walk_component (nd=<optimized out>, flags=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2229
>>         dentry = 0x1
>> #23 lookup_last (nd=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2730
>> No locals.
>> #24 path_lookupat (nd=0xff1100017acab300, flags=0x1, path=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2754
>>         s = 0x1 <error: Cannot access memory at address 0x1>
>>         err = <optimized out>
>> #25 0xffffffff816cfee0 in filename_lookup (dfd=0x7acab300, name=0x1, flags=0x1, path=0xff1100017acab300, root=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/namei.c:2783
>>         retval = 0x1
>>         nd = {path = <incomplete type>, last = {{{hash = 0x314ef79d, len = 0x4}, hash_len = 0x4314ef79d}, name = 0xff110001009f1029 "dpkg"}, root = <incomplete type>, inode = 0xff11000100606a00, flags = 0x5, state = 0x2, seq = 0x0, next_seq = 0x0, m_seq = 0x34, r_seq = 0x4, last_type = 0x0, depth = 0x0, total_link_count = 0x0, stack = 0xffa00000003afcd8, internal = {{link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}, {link = <incomplete type>, done = <incomplete type>, name = 0x0, seq = 0x0}}, name = 0xff110001009f1000, pathname = 0xff110001009f1020 "/var/lib/dpkg", saved = 0x0, root_seq = 0x2, dfd = 0xffffff9c, dir_vfsuid = <incomplete type>, dir_mode = 0x41ed}
>> #26 0xffffffff816c1a8c in vfs_statx (dfd=0x7acab300, filename=0x1, flags=0x1, stat=0xff1100017acab300, request_mask=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:353
>>         path = <incomplete type>
>>         lookup_flags = 0x5
>>         error = 0xffffff9c
>> #27 0xffffffff816c2863 in do_statx (dfd=0x7acab300, filename=0x1, flags=0x1, mask=0x7acab300, buffer=0x0) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:769
>>         stat = {result_mask = 0x0, mode = 0x0, nlink = 0x0, blksize = 0x0, attributes = 0x0, attributes_mask = 0x0, ino = 0x0, dev = 0x0, rdev = 0x0, uid = <incomplete type>, gid = <incomplete type>, size = 0x0, atime = <incomplete type>, mtime = <incomplete type>, ctime = <incomplete type>, btime = <incomplete type>, blocks = 0x0, mnt_id = 0x0, change_cookie = 0x0, subvol = 0x0, dio_mem_align = 0x0, dio_offset_align = 0x0, dio_read_offset_align = 0x0, atomic_write_unit_min = 0x0, atomic_write_unit_max = 0x0, atomic_write_unit_max_opt = 0x0, atomic_write_segments_max = 0x0}
>>         error = 0x1
>> #28 0xffffffff816c2ab0 in __do_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:823
>>         ret = 0xffffff9c
>>         name = <error reading variable name (Cannot access memory at address 0x0)>
>> #29 __se_sys_statx (dfd=<optimized out>, filename=<optimized out>, flags=<optimized out>, mask=<optimized out>, buffer=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>>         ret = <optimized out>
>> #30 __x64_sys_statx (regs=0xff1100017acab300) at /home/runner/work/mptcp_net-next/mptcp_net-next/fs/stat.c:812
>> No locals.
>> #31 0xffffffff81e07124 in do_syscall_x64 (regs=<optimized out>, nr=<optimized out>) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:63
>>         unr = <optimized out>
>> #32 do_syscall_64 (regs=0xff1100017acab300, nr=0x1) at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/syscall_64.c:94
>> No locals.
>> #33 0xffffffff81000130 in entry_SYSCALL_64 () at /home/runner/work/mptcp_net-next/mptcp_net-next/arch/x86/entry/entry_64.S:122
>> No locals.
>> #34 0x000000000000000e in ?? ()
>> No symbol table info available.
>> #35 0x0000000000000001 in ?? ()
>> No symbol table info available.
>> #36 0x00007fff8cbcae40 in ?? ()
>> No symbol table info available.
>> #37 0x00007fbb012d6530 in ?? ()
>> No symbol table info available.
>> #38 0x00007fbb00d14c70 in ?? ()
>> No symbol table info available.
>> #39 0x00007fbb00d14e00 in ?? ()
>> No symbol table info available.
>> #40 0x0000000000000246 in ?? ()
>> No symbol table info available.
>> #41 0x0000000000000fff in ?? ()
>> No symbol table info available.
>> #42 0x0000000000000000 in ?? ()
>> No symbol table info available.
>
>
>[7] https://lore.kernel.org/aXdO52wh2rqTUi1E@shinmob
>
>[8] https://lore.kernel.org/20260201192234.380608594@kernel.org
>
>[9]
>https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/urgent
>-- 
>Sponsored by the NGI0 Core fund.
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-02-06 16:38 ` Stefano Garzarella
@ 2026-02-06 17:13   ` Matthieu Baerts
  0 siblings, 0 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-02-06 17:13 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, kvm, virtualization, Netdev, rcu, MPTCP Linux,
	Linux Kernel, Peter Zijlstra, Thomas Gleixner,
	Shinichiro Kawasaki, Paul E. McKenney

Hi Stefano,

Thank you for your reply!

On 06/02/2026 17:38, Stefano Garzarella wrote:
> On Fri, Feb 06, 2026 at 12:54:13PM +0100, Matthieu Baerts wrote:
>> Hi Stefan, Stefano, + VM, RCU, sched people,
> 
> Hi Matt,
> 
>>
>> First, I'm sorry to cc a few MLs, but I'm still trying to locate the
>> origin of the issue I'm seeing.
>>
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which have been sync with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
> 
> Just to be sure I'm on the same page, the issue is in the most nested
> guest, right? (the last VM started)

That's correct. From what I see [1], each GitHub-hosted runner is a new
VM, and I'm launching QEMU from there.

[1]
https://docs.github.com/en/actions/concepts/runners/github-hosted-runners

>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
>>
>> The stalls happen before starting the MPTCP test suite. The init program
>> creates a VSOCK listening socket via socat [1], and different hangs are
>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>> there is no RCU stalls or soft lockups detected after one minute, but VM
>> is stalled [6]. In the last case, the VM is stopped after having
>> launched GDB to get more details about what was being executed.
>>
>> It feels like the issue is not directly caused by the VSOCK listening
>> socket, but the stalls always happen after having started the socat
>> command [1] in the background.
>>
>> One last thing: I thought my issue was linked to another one seen on XFS
>> side and reported by Shinichiro Kawasaki [7], but apparently not.
>> Indeed, Paul McKenney mentioned Shinichiro's issue is probably fixed by
>> Thomas Gleixner's series called "sched/mmcid: Cure mode transition woes"
>> [8]. I applied these patches from Peter Zijlstra's tree from
>> tip/sched/urgent [9], and my issue is still present.
>>
>> Any idea what could cause that, where to look at, or what could help to
>> find the root cause?
> 
> Mmm, nothing comes to mind at the vsock side :-(

That's OK, thank you for having checked! I hope someone else in CC can
help me find the root cause!

> I understand that bisection can't be done in the CI env, but can you
> confirm in some way that 6.18 is working right with the same userspace?

Yes, I can confirm that. We run the tests not only on the dev ("export")
and fixes ("export-net") branches, but also on stable versions:

  https://ci-results.mptcp.dev/flakes.html

(The "critical issues" have red headers.)

We don't see such issues in v6.18 and older kernels.

> That could help to try to identify at least if there is anything in
> AF_VSOCK we merged recently that can trigger that.

Our dev branch is on top of net-next, so I guess I would have seen
issues directly related to AF_VSOCK earlier than after the net-next
freeze in January. Here, it looks like the first issues came during
Linus' merge window at the beginning of December, e.g. [2] is from the
4th of December, on top of 'net', which was at commit 8f7aa3d3c732
("Merge tag 'net-next-6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") from
Linus' tree.

[2]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19919313666/job/57104626001#step:7:5052

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
  2026-02-06 16:38 ` Stefano Garzarella
@ 2026-02-26 10:37 ` Jiri Slaby
  2026-03-02  5:28   ` Jiri Slaby
  2026-03-03 13:23   ` Matthieu Baerts
  1 sibling, 2 replies; 45+ messages in thread
From: Jiri Slaby @ 2026-02-26 10:37 UTC (permalink / raw)
  To: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella
  Cc: kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org

On 06. 02. 26, 12:54, Matthieu Baerts wrote:
> Our CI for the MPTCP subsystem is now regularly hitting various stalls
> before even starting the MPTCP test suite. These issues are visible on
> top of the latest net and net-next trees, which have been sync with
> Linus' tree yesterday. All these issues have been seen on a "public CI"
> using GitHub-hosted runners with KVM support, where the tested kernel is
> launched in a nested (I suppose) VM. I can see the issue with or without
> debug.config. According to the logs, it might have started around
> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
> and the CI doesn't currently have the ability to execute bisections.

Hmm, after the switch of the QEMU guest kernels to 6.19, our (openSUSE)
build service is randomly stalling in smp_call_function_many_cond() too:
https://bugzilla.suse.com/show_bug.cgi?id=1258936

The attachment from there contains sysrq-t logs too:
https://bugzilla.suse.com/attachment.cgi?id=888612

> The stalls happen before starting the MPTCP test suite. The init program
> creates a VSOCK listening socket via socat [1], and different hangs are
> then visible: RCU stalls followed by a soft lockup [2], only a soft
> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
> there is no RCU stalls or soft lockups detected after one minute, but VM
> is stalled [6]. In the last case, the VM is stopped after having
> launched GDB to get more details about what was being executed.
> 
> It feels like the issue is not directly caused by the VSOCK listening
> socket, but the stalls always happen after having started the socat
> command [1] in the background.

It fails randomly while building random packages (go, libreoffice,
bayle, ...). I don't think it is VSOCK-related in those cases, but who
knows what the builds do...

I cannot reproduce locally either.

I came across:
   614da1d3d4cd x86: make page fault handling disable interrupts properly
but I have no idea if it could have an impact on this at all.

thanks,
-- 
js
suse labs



* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-02-26 10:37 ` Jiri Slaby
@ 2026-03-02  5:28   ` Jiri Slaby
  2026-03-02 11:46     ` Peter Zijlstra
  2026-03-03 13:23   ` Matthieu Baerts
  1 sibling, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-02  5:28 UTC (permalink / raw)
  To: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella
  Cc: kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org,
	Michal Koutný

On 26. 02. 26, 11:37, Jiri Slaby wrote:
> On 06. 02. 26, 12:54, Matthieu Baerts wrote:
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which have been sync with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
> 
> Hmm, after the switch of the qemu guest kernels to 6.19, our (opensuse) 
> build service is stalling in smp_call_function_many_cond() randomly too:
> https://bugzilla.suse.com/show_bug.cgi?id=1258936
> 
> The attachment from there contains sysrq-t logs too:
> https://bugzilla.suse.com/attachment.cgi?id=888612

A small update. Just in case this rings a bell somewhere.

We have a QEMU memory dump from the affected kernel. It shows that both
CPU0 and CPU1 are waiting for CPU2's rq lock, while CPU2 is in userspace.




crash> bt -xsc 0
PID: 6483     TASK: ffff8d1759c20000  CPU: 0    COMMAND: "compile"
     [exception RIP: native_halt+14]
     RIP: ffffffffb9d1124e  RSP: ffffcead0696f9a0  RFLAGS: 00000046
     RAX: 0000000000000003  RBX: 0000000000040000  RCX: 00000000fffffff8
     RDX: ffff8d1a7ffc5140  RSI: 0000000000000003  RDI: ffff8d1a6fd35dc0
     RBP: ffff8d1a6fd35dc0   R8: ffff8d1a6fc36dc0   R9: fffffffffffffff8
     R10: 0000000000000000  R11: 0000000000000004  R12: ffff8d1a6fc36dc0
     R13: 0000000000000000  R14: ffff8d1a7ffc5140  R15: ffffcead0696fad0
     CS: 0010  SS: 0018
  #0 [ffffcead0696f9a0] kvm_wait+0x44 at ffffffffb9d0fe54
  #1 [ffffcead0696f9a8] __pv_queued_spin_lock_slowpath+0x247 at ffffffffbaafb507
  #2 [ffffcead0696f9d8] _raw_spin_lock+0x29 at ffffffffbaafadf9
  #3 [ffffcead0696f9e0] raw_spin_rq_lock_nested+0x1c at ffffffffb9d8c12c
  #4 [ffffcead0696f9f8] _raw_spin_rq_lock_irqsave+0x17 at ffffffffb9d96ca7
  #5 [ffffcead0696fa08] sched_balance_rq+0x56d at ffffffffb9da718d
  #6 [ffffcead0696fb18] pick_next_task_fair+0x240 at ffffffffb9da7e00
  #7 [ffffcead0696fb88] __schedule+0x19e at ffffffffbaaf00de
  #8 [ffffcead0696fc40] schedule+0x27 at ffffffffbaaf1697
  #9 [ffffcead0696fc50] futex_do_wait+0x4a at ffffffffb9e61c5a
#10 [ffffcead0696fc68] __futex_wait+0x8e at ffffffffb9e6241e
#11 [ffffcead0696fd30] futex_wait+0x6b at ffffffffb9e624fb
#12 [ffffcead0696fdc0] do_futex+0xc5 at ffffffffb9e5e305
#13 [ffffcead0696fdc8] __x64_sys_futex+0x112 at ffffffffb9e5e932
#14 [ffffcead0696fe38] do_syscall_64+0x81 at ffffffffbaae2a61
#15 [ffffcead0696ff40] entry_SYSCALL_64_after_hwframe+0x76 at ffffffffb9a0012f
     RIP: 0000000000495303  RSP: 000000c000073c98  RFLAGS: 00000286
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 0000000000495303
     RDX: 0000000000000000  RSI: 0000000000000080  RDI: 000000c000058958
     RBP: 000000c000073ce0   R8: 0000000000000000   R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000286  R12: 0000000000000024
     R13: 0000000000000001  R14: 000000c000002c40  R15: 0000000000000001
     ORIG_RAX: 00000000000000ca  CS: 0033  SS: 002b


crash> bt -xsc 1
PID: 6481     TASK: ffff8d1759c8b680  CPU: 1    COMMAND: "compile"
     [exception RIP: __pv_queued_spin_lock_slowpath+190]
     RIP: ffffffffbaafb37e  RSP: ffffcead000f8b38  RFLAGS: 00000046
     RAX: 0000000000000001  RBX: 0000000000000000  RCX: 0000000000000001
     RDX: 0000000000040003  RSI: 0000000000040003  RDI: ffff8d1a6fd35dc0
     RBP: ffff8d1a6fd35dc0   R8: 0000000000000000   R9: 00000001000c3f60
     R10: ffffffffbbc75960  R11: ffffcead000f8a48  R12: ffff8d1a6fcb6dc0
     R13: 0000000000000001  R14: 0000000000000000  R15: ffffffffbbe65940
     CS: 0010  SS: 0000
  #0 [ffffcead000f8b60] _raw_spin_lock+0x29 at ffffffffbaafadf9
  #1 [ffffcead000f8b68] raw_spin_rq_lock_nested+0x1c at ffffffffb9d8c12c
  #2 [ffffcead000f8b80] _raw_spin_rq_lock_irqsave+0x17 at ffffffffb9dc9cc7
  #3 [ffffcead000f8b90] print_cfs_rq+0xce at ffffffffb9dd0d8e
  #4 [ffffcead000f8c98] print_cfs_stats+0x62 at ffffffffb9da9ee2
  #5 [ffffcead000f8cc8] print_cpu+0x243 at ffffffffb9dcbe73
  #6 [ffffcead000f8d00] sysrq_sched_debug_show+0x2e at ffffffffb9dd1b7e
  #7 [ffffcead000f8d18] show_state_filter+0xcd at ffffffffb9d91f4d
  #8 [ffffcead000f8d40] sysrq_handle_showstate+0x10 at ffffffffba60b750
  #9 [ffffcead000f8d48] __handle_sysrq.cold+0x9b at ffffffffb9c4f486
#10 [ffffcead000f8d70] sysrq_filter+0xd7 at ffffffffba60c237
#11 [ffffcead000f8d98] input_handle_events_filter+0x45 at ffffffffba766c05
#12 [ffffcead000f8dd0] input_pass_values+0x134 at ffffffffba766ec4
#13 [ffffcead000f8e00] input_event_dispose+0x156 at ffffffffba767046
#14 [ffffcead000f8e20] input_event+0x58 at ffffffffba76ac18
#15 [ffffcead000f8e50] atkbd_receive_byte+0x64d at ffffffffba772e6d
#16 [ffffcead000f8ea8] ps2_interrupt+0x9d at ffffffffba7665ed
#17 [ffffcead000f8ed0] serio_interrupt+0x4f at ffffffffba761e0f
#18 [ffffcead000f8f00] i8042_handle_data+0x11c at ffffffffba76316c
#19 [ffffcead000f8f40] i8042_interrupt+0x11 at ffffffffba763581
#20 [ffffcead000f8f50] __handle_irq_event_percpu+0x55 at ffffffffb9df1e15
#21 [ffffcead000f8f90] handle_irq_event+0x38 at ffffffffb9df2058
#22 [ffffcead000f8fb0] handle_edge_irq+0xc5 at ffffffffb9df7b95
#23 [ffffcead000f8fd0] __common_interrupt+0x44 at ffffffffb9cc2354
#24 [ffffcead000f8ff0] common_interrupt+0x80 at ffffffffbaae6090
--- <IRQ stack> ---
#25 [ffffcead06bcfb98] asm_common_interrupt+0x26 at ffffffffb9a01566
     [exception RIP: smp_call_function_many_cond+304]
     RIP: ffffffffb9e63080  RSP: ffffcead06bcfc40  RFLAGS: 00000202
     RAX: 0000000000000011  RBX: 0000000000000202  RCX: ffff8d1a6fc3f800
     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
     RBP: 0000000000000001   R8: ffff8d174009cc30   R9: 0000000000000000
     R10: ffff8d174009c0d8  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000003  R14: ffff8d1a6fcb7280  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
#26 [ffffcead06bcfcb0] on_each_cpu_cond_mask+0x24 at ffffffffb9e634f4
#27 [ffffcead06bcfcb8] flush_tlb_mm_range+0x1b1 at ffffffffb9d225d1
#28 [ffffcead06bcfd08] ptep_clear_flush+0x93 at ffffffffba066e13
#29 [ffffcead06bcfd30] do_wp_page+0x6a2 at ffffffffba04c692
#30 [ffffcead06bcfdb8] __handle_mm_fault+0xa49 at ffffffffba055c79
#31 [ffffcead06bcfe98] handle_mm_fault+0xe7 at ffffffffba056297
#32 [ffffcead06bcfed8] do_user_addr_fault+0x21a at ffffffffb9d1db6a
#33 [ffffcead06bcff18] exc_page_fault+0x69 at ffffffffbaae99c9
#34 [ffffcead06bcff40] asm_exc_page_fault+0x26 at ffffffffb9a012a6
     RIP: 000000000042351c  RSP: 000000c0013aafd0  RFLAGS: 00010246
     RAX: 0000000000000002  RBX: 00000000017584c0  RCX: 0000000000000000
     RDX: 0000000000000005  RSI: 000000000163edc0  RDI: 0000000000000003
     RBP: 000000c0013ab080   R8: 0000000000000001   R9: 00007f0d9853f800
     R10: 00007f0d98334e00  R11: 00007f0d98afa020  R12: 00007f0d98afa020
     R13: 0000000000000050  R14: 000000c000002380  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b



crash> bt -xsc 2
PID: 6540     TASK: ffff8d1773ae3680  CPU: 2    COMMAND: "compile"
     RIP: 0000000000495372  RSP: 000000c00003e000  RFLAGS: 00000206
     RAX: 0000000000000000  RBX: 0000000000000003  RCX: 0000000000495372
     RDX: 0000000000000000  RSI: 000000c00003e000  RDI: 00000000000d0f00
     RBP: 00007ffcf8a71aa8   R8: 000000c00005a090   R9: 000000c000002700
     R10: 0000000000000000  R11: 0000000000000206  R12: 0000000000491580
     R13: 000000c00005a008  R14: 00000000017222e0  R15: ffffffffffffffff
     ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b



The state of the lock:

crash> struct rq.__lock -x ffff8d1a6fd35dc0
   __lock = {
     raw_lock = {
       {
         val = {
           counter = 0x40003
         },
         {
           locked = 0x3,
           pending = 0x0
         },
         {
           locked_pending = 0x3,
           tail = 0x4
         }
       }
     }
   },
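
For anyone reading these dumps: the qspinlock word packs several fields into one 32-bit value. A small decode helper may save some squinting (a sketch based on my reading of the x86 layout in kernel/locking/qspinlock_types.h — bit positions are assumptions, double-check against your tree):

```python
# Decode a raw qspinlock value as seen in crash dumps. Layout assumed
# from qspinlock_types.h: bits 0-7 locked, bits 8-15 pending,
# bits 16-17 tail qnode index, bits 18+ tail CPU number + 1.
def decode_qspinlock(val):
    tail = val >> 16
    return {
        "locked": val & 0xff,          # 0x3 here: paravirt slowpath value
        "pending": (val >> 8) & 0xff,
        "tail_idx": tail & 0x3,        # qnode nesting index of tail waiter
        "tail_cpu": (tail >> 2) - 1,   # tail waiter's CPU number
    }

# counter = 0x40003 from the dump above: locked via the PV slowpath,
# no pending waiter, and the MCS queue tail is qnode 0 on CPU 0.
print(decode_qspinlock(0x40003))
```

Only the tail waiter is recoverable this way; who actually owns the lock is not in the word, which is what the patch further down in the thread tries to address.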


thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-02  5:28   ` Jiri Slaby
@ 2026-03-02 11:46     ` Peter Zijlstra
  2026-03-02 14:30       ` Waiman Long
  2026-03-05  7:00       ` Jiri Slaby
  0 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-03-02 11:46 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Thomas Gleixner, Shinichiro Kawasaki, Paul E. McKenney,
	Dave Hansen, luto@kernel.org, Michal Koutný, Waiman Long

On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:

> The state of the lock:
> 
> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>   __lock = {
>     raw_lock = {
>       {
>         val = {
>           counter = 0x40003
>         },
>         {
>           locked = 0x3,
>           pending = 0x0
>         },
>         {
>           locked_pending = 0x3,
>           tail = 0x4
>         }
>       }
>     }
>   },
> 


That reminded me of the below patch that never quite made it. I've
rebased it to something more recent so it applies.

If you stick that in, we might get a clue as to who owns that lock.
Provided it all reproduces well enough.

---
Subject: locking/qspinlock: Save previous node & owner CPU into mcs_spinlock
From: Waiman Long <longman@redhat.com>
Date: Fri, 3 May 2024 22:41:06 -0400

From: Waiman Long <longman@redhat.com>

When examining a contended spinlock in a crash dump, we can only find
out the tail CPU in the MCS wait queue. There is no simple way to find
out what other CPUs are waiting for the spinlock and which CPU is the
lock owner.

Make it easier to figure out this information by saving the previous node
data into the mcs_spinlock structure. This will allow us to reconstruct
the MCS wait queue from tail to head. In order not to expand the size
of mcs_spinlock, the original count field is split into two 16-bit
chunks. The first chunk is for count and the second one is the new
prev_node value.

  bits 0-1 : qnode index
  bits 2-15: CPU number + 1

This prev_node value may be truncated if there are 16k or more CPUs in
the system.

The locked value in the queue head is also repurposed to hold an encoded
qspinlock owner CPU number when acquiring the lock in the qspinlock
slowpath of a contended lock.

This lock owner information will not be available when the lock is
acquired directly in the fast path or in the pending code path. There
is no easy way around that.

These changes should make analysis of a contended spinlock in a crash
dump easier.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20240504024106.654319-1-longman@redhat.com
---
 include/asm-generic/mcs_spinlock.h |    5 +++--
 kernel/locking/mcs_spinlock.h      |    8 +++++++-
 kernel/locking/qspinlock.c         |    8 ++++++++
 3 files changed, 18 insertions(+), 3 deletions(-)

--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -3,8 +3,9 @@
 
 struct mcs_spinlock {
 	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
+	int locked;	 /* non-zero if lock acquired */
+	short count;	 /* nesting count, see qspinlock.c */
+	short prev_node; /* encoded previous node value */
 };
 
 /*
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -13,6 +13,12 @@
 #ifndef __LINUX_MCS_SPINLOCK_H
 #define __LINUX_MCS_SPINLOCK_H
 
+/*
+ * Save an encoded version of the current MCS lock owner CPU to the
+ * mcs_spinlock structure of the next lock owner.
+ */
+#define MCS_LOCKED	(smp_processor_id() + 1)
+
 #include <asm/mcs_spinlock.h>
 
 #ifndef arch_mcs_spin_lock_contended
@@ -34,7 +40,7 @@
  * unlocking.
  */
 #define arch_mcs_spin_unlock_contended(l)				\
-	smp_store_release((l), 1)
+	smp_store_release((l), MCS_LOCKED)
 #endif
 
 /*
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -250,6 +250,7 @@ void __lockfunc queued_spin_lock_slowpat
 
 	node->locked = 0;
 	node->next = NULL;
+	node->prev_node = 0;
 	pv_init_node(node);
 
 	/*
@@ -278,6 +279,13 @@ void __lockfunc queued_spin_lock_slowpat
 	next = NULL;
 
 	/*
+	 * The prev_node value is saved for crash dump analysis purpose only,
+	 * it is not used within the qspinlock code. The encoded node value
+	 * may be truncated if there are 16k or more CPUs in the system.
+	 */
+	node->prev_node = old >> _Q_TAIL_IDX_OFFSET;
+
+	/*
 	 * if there was a previous node; link it and wait until reaching the
 	 * head of the waitqueue.
 	 */
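
For working with dumps once this patch is in, the encodings described in the changelog can be unpacked like so (a sketch assuming the stated layout: bits 0-1 qnode index, bits 2-15 CPU number + 1, and locked holding owner CPU + 1):

```python
# Unpack the diagnostic fields added by the patch. Encodings are taken
# from the changelog above; 0 means "not recorded".
def decode_prev_node(prev_node):
    if prev_node == 0:
        return None                    # queue head, no previous node
    return {"cpu": (prev_node >> 2) - 1, "idx": prev_node & 0x3}

def decode_locked(locked):
    # MCS_LOCKED stores smp_processor_id() + 1 of the CPU that handed
    # over the lock; 0 still means "not yet locked".
    return None if locked == 0 else locked - 1
```

So a node with prev_node = 0x2d would point at qnode 1 on CPU 10; following prev_node links from the tail reconstructs the whole MCS wait queue.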

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-02 11:46     ` Peter Zijlstra
@ 2026-03-02 14:30       ` Waiman Long
  2026-03-05  7:00       ` Jiri Slaby
  1 sibling, 0 replies; 45+ messages in thread
From: Waiman Long @ 2026-03-02 14:30 UTC (permalink / raw)
  To: Peter Zijlstra, Jiri Slaby
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Thomas Gleixner, Shinichiro Kawasaki, Paul E. McKenney,
	Dave Hansen, luto@kernel.org, Michal Koutný

On 3/2/26 6:46 AM, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>
>> The state of the lock:
>>
>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>    __lock = {
>>      raw_lock = {
>>        {
>>          val = {
>>            counter = 0x40003
>>          },
>>          {
>>            locked = 0x3,
>>            pending = 0x0
>>          },
>>          {
>>            locked_pending = 0x3,
>>            tail = 0x4
>>          }
>>        }
>>      }
>>    },
>>
>
> That had me remember the below patch that never quite made it. I've
> rebased it to something more recent so it applies.
>
> If you stick that in, we might get a clue as to who is owning that lock.
> Provided it all wants to reproduce well enough.
>
> ---
> Subject: locking/qspinlock: Save previous node & owner CPU into mcs_spinlock
> From: Waiman Long <longman@redhat.com>
> Date: Fri, 3 May 2024 22:41:06 -0400

Oh, I forgot about that patch. I should have followed up at that time.
BTW, a lock value of 3 means that it is running the paravirtual
qspinlock. It also means that we may not know exactly who the lock owner
is if it was acquired by lock stealing.

Cheers,
Longman

>
> From: Waiman Long <longman@redhat.com>
>
> When examining a contended spinlock in a crash dump, we can only find
> out the tail CPU in the MCS wait queue. There is no simple way to find
> out what other CPUs are waiting for the spinlock and which CPU is the
> lock owner.
>
> Make it easier to figure out this information by saving previous node
> data into the mcs_spinlock structure. This will allow us to reconstruct
> the MCS wait queue from tail to head. In order not to expand the size
> of mcs_spinlock, the original count field is split into two 16-bit
> chunks. The first chunk is for count and the second one is the new
> prev_node value.
>
>    bits 0-1 : qnode index
>    bits 2-15: CPU number + 1
>
> This prev_node value may be truncated if there are 16k or more CPUs in
> the system.
>
> The locked value in the queue head is also repurposed to hold an encoded
> qspinlock owner CPU number when acquiring the lock in the qspinlock
> slowpath of a contended lock.
>
> This lock owner information will not be available when the lock is
> acquired directly in the fast path or in the pending code path. There
> is no easy way around that.
>
> These changes should make analysis of a contended spinlock in a crash
> dump easier.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://patch.msgid.link/20240504024106.654319-1-longman@redhat.com
> ---
>   include/asm-generic/mcs_spinlock.h |    5 +++--
>   kernel/locking/mcs_spinlock.h      |    8 +++++++-
>   kernel/locking/qspinlock.c         |    8 ++++++++
>   3 files changed, 18 insertions(+), 3 deletions(-)
>
> --- a/include/asm-generic/mcs_spinlock.h
> +++ b/include/asm-generic/mcs_spinlock.h
> @@ -3,8 +3,9 @@
>   
>   struct mcs_spinlock {
>   	struct mcs_spinlock *next;
> -	int locked; /* 1 if lock acquired */
> -	int count;  /* nesting count, see qspinlock.c */
> +	int locked;	 /* non-zero if lock acquired */
> +	short count;	 /* nesting count, see qspinlock.c */
> +	short prev_node; /* encoded previous node value */
>   };
>   
>   /*
> --- a/kernel/locking/mcs_spinlock.h
> +++ b/kernel/locking/mcs_spinlock.h
> @@ -13,6 +13,12 @@
>   #ifndef __LINUX_MCS_SPINLOCK_H
>   #define __LINUX_MCS_SPINLOCK_H
>   
> +/*
> + * Save an encoded version of the current MCS lock owner CPU to the
> + * mcs_spinlock structure of the next lock owner.
> + */
> +#define MCS_LOCKED	(smp_processor_id() + 1)
> +
>   #include <asm/mcs_spinlock.h>
>   
>   #ifndef arch_mcs_spin_lock_contended
> @@ -34,7 +40,7 @@
>    * unlocking.
>    */
>   #define arch_mcs_spin_unlock_contended(l)				\
> -	smp_store_release((l), 1)
> +	smp_store_release((l), MCS_LOCKED)
>   #endif
>   
>   /*
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -250,6 +250,7 @@ void __lockfunc queued_spin_lock_slowpat
>   
>   	node->locked = 0;
>   	node->next = NULL;
> +	node->prev_node = 0;
>   	pv_init_node(node);
>   
>   	/*
> @@ -278,6 +279,13 @@ void __lockfunc queued_spin_lock_slowpat
>   	next = NULL;
>   
>   	/*
> +	 * The prev_node value is saved for crash dump analysis purpose only,
> +	 * it is not used within the qspinlock code. The encoded node value
> +	 * may be truncated if there are 16k or more CPUs in the system.
> +	 */
> +	node->prev_node = old >> _Q_TAIL_IDX_OFFSET;
> +
> +	/*
>   	 * if there was a previous node; link it and wait until reaching the
>   	 * head of the waitqueue.
>   	 */
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-02-26 10:37 ` Jiri Slaby
  2026-03-02  5:28   ` Jiri Slaby
@ 2026-03-03 13:23   ` Matthieu Baerts
  2026-03-05  6:46     ` Jiri Slaby
  1 sibling, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-03 13:23 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, Stefan Hajnoczi, luto@kernel.org,
	Stefano Garzarella

Hi Jiri,

On 26/02/2026 11:37, Jiri Slaby wrote:
> On 06. 02. 26, 12:54, Matthieu Baerts wrote:
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which have been sync with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
> 
> Hmm, after the switch of the qemu guest kernels to 6.19, our (opensuse)
> build service is stalling in smp_call_function_many_cond() randomly too:
> https://bugzilla.suse.com/show_bug.cgi?id=1258936
> 
> The attachment from there contains sysrq-t logs too:
> https://bugzilla.suse.com/attachment.cgi?id=888612

I'm glad I'm not the only one with this issue :)

In your case, do you also have nested VMs with KVM support?

Are you able to easily reproduce the issue and change the guest kernel
in your build service?

On my side, any debugging steps need to be automated. Lately, it looks
like the issue is more easily triggered on a stable 6.19 kernel than on
the last RC.

>> The stalls happen before starting the MPTCP test suite. The init program
>> creates a VSOCK listening socket via socat [1], and different hangs are
>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>> there is no RCU stalls or soft lockups detected after one minute, but VM
>> is stalled [6]. In the last case, the VM is stopped after having
>> launched GDB to get more details about what was being executed.
>>
>> It feels like the issue is not directly caused by the VSOCK listening
>> socket, but the stalls always happen after having started the socat
>> command [1] in the background.
> 
> It fails randomly while building random packages (go, libreoffice,
> bayle, ...). I don't think it is VSOCK related in those cases, but who
> knows what the builds do...

Indeed, unlikely to be VSOCK then.

> I cannot reproduce locally either.
> 
> I came across:
>   614da1d3d4cd x86: make page fault handling disable interrupts properly
> but I have no idea if it could have impact on this at all.

Did it help to revert it?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-03 13:23   ` Matthieu Baerts
@ 2026-03-05  6:46     ` Jiri Slaby
  0 siblings, 0 replies; 45+ messages in thread
From: Jiri Slaby @ 2026-03-05  6:46 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Peter Zijlstra, Thomas Gleixner, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, Stefan Hajnoczi, luto@kernel.org,
	Stefano Garzarella

Hi,

On 03. 03. 26, 14:23, Matthieu Baerts wrote:
> On 26/02/2026 11:37, Jiri Slaby wrote:
>> On 06. 02. 26, 12:54, Matthieu Baerts wrote:
>>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>>> before even starting the MPTCP test suite. These issues are visible on
>>> top of the latest net and net-next trees, which have been sync with
>>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>>> using GitHub-hosted runners with KVM support, where the tested kernel is
>>> launched in a nested (I suppose) VM. I can see the issue with or without
>>> debug.config. According to the logs, it might have started around
>>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>>> and the CI doesn't currently have the ability to execute bisections.
>>
>> Hmm, after the switch of the qemu guest kernels to 6.19, our (opensuse)
>> build service is stalling in smp_call_function_many_cond() randomly too:
>> https://bugzilla.suse.com/show_bug.cgi?id=1258936
>>
>> The attachment from there contains sysrq-t logs too:
>> https://bugzilla.suse.com/attachment.cgi?id=888612
> 
> I'm glad I'm not the only one with this issue :)
> 
> In your case, do you also have nested VMs with KVM support?

No, it's KVM directly on bare metal.

> Are you able to easily reproduce the issue and change the guest kernel
> in your build service?

Unfortunately no and no.

> On my side, any debugging steps need to be automated. Lately, it looks
> like the issue is more easily triggered on a stable 6.19 kernel, than on
> the last RC.
> 
>>> The stalls happen before starting the MPTCP test suite. The init program
>>> creates a VSOCK listening socket via socat [1], and different hangs are
>>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>>> there is no RCU stalls or soft lockups detected after one minute, but VM
>>> is stalled [6]. In the last case, the VM is stopped after having
>>> launched GDB to get more details about what was being executed.
>>>
>>> It feels like the issue is not directly caused by the VSOCK listening
>>> socket, but the stalls always happen after having started the socat
>>> command [1] in the background.
>>
>> It fails randomly while building random packages (go, libreoffice,
>> bayle, ...). I don't think it is VSOCK related in those cases, but who
>> knows what the builds do...
> 
> Indeed, unlikely to be VSOCK then.
> 
>> I cannot reproduce locally either.
>>
>> I came across:
>>    614da1d3d4cd x86: make page fault handling disable interrupts properly
>> but I have no idea if it could have impact on this at all.
> 
> Did it help to revert it?

We haven't tried; it is unlikely to be the cause.

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-02 11:46     ` Peter Zijlstra
  2026-03-02 14:30       ` Waiman Long
@ 2026-03-05  7:00       ` Jiri Slaby
  2026-03-05 11:53         ` Jiri Slaby
  1 sibling, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-05  7:00 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 02. 03. 26, 12:46, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
> 
>> The state of the lock:
>>
>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>    __lock = {
>>      raw_lock = {
>>        {
>>          val = {
>>            counter = 0x40003
>>          },
>>          {
>>            locked = 0x3,
>>            pending = 0x0
>>          },
>>          {
>>            locked_pending = 0x3,
>>            tail = 0x4
>>          }
>>        }
>>      }
>>    },
>>
> 
> 
> That had me remember the below patch that never quite made it. I've
> rebased it to something more recent so it applies.
> 
> If you stick that in, we might get a clue as to who is owning that lock.
> Provided it all wants to reproduce well enough.

Thanks, I applied it, but to date it has not been accepted yet:
https://build.opensuse.org/requests/1335893


In the meantime, Michal K. and I did some digging into the qemu dumps.
Details (and a couple of previous comments) at:
https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17

tl;dr:

In one of the dumps, one process sits in
   context_switch
     -> mm_get_cid (before switch_to())

 > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid

Michal extracted the vCPU's RIP and it turned out:
 > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a
 > free CID.
 > ...
 > ffff8a88458137c0:  000000000000000f 000000000000000f
 >                                                    ^
 > Hm, so indeed CIDs for all four CPUs are occupied.

To me (I don't know what a CID is either), this might point to Thomas'
"sched/mmcid: Cure mode transition woes" [1] as a possible culprit.

Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly
ordered systems") spells:
 >     As a consequence the task will
 >     not drop the CID when scheduling out before the fixup is completed,
 >     which means the CID space can be exhausted and the next task
 >     scheduling in will loop in mm_get_cid() and the fixup thread can
 >     livelock on the held runqueue lock as above.

Which sounds exactly like what happens here. Except the patch is from
the series above, so it is obviously already in 6.19.


I noticed there is also a 7.0-rc1 fix:
   1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
But that got into 6.19.1 already (we are at 6.19.3), so it does not
improve the situation.

Any ideas?



[1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05  7:00       ` Jiri Slaby
@ 2026-03-05 11:53         ` Jiri Slaby
  2026-03-05 12:20           ` Jiri Slaby
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-05 11:53 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 05. 03. 26, 8:00, Jiri Slaby wrote:
> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>
>>> The state of the lock:
>>>
>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>    __lock = {
>>>      raw_lock = {
>>>        {
>>>          val = {
>>>            counter = 0x40003
>>>          },
>>>          {
>>>            locked = 0x3,
>>>            pending = 0x0
>>>          },
>>>          {
>>>            locked_pending = 0x3,
>>>            tail = 0x4
>>>          }
>>>        }
>>>      }
>>>    },
>>>
>>
>>
>> That had me remember the below patch that never quite made it. I've
>> rebased it to something more recent so it applies.
>>
>> If you stick that in, we might get a clue as to who is owning that lock.
>> Provided it all wants to reproduce well enough.
> 
> Thanks, I applied it, but to date it is still not accepted yet:
> https://build.opensuse.org/requests/1335893

OK, I have a first dump with the patch applied:
   __lock = {
     raw_lock = {
       {
         val = {
           counter = 0x2c0003
         },
         {
           locked = 0x3,
           pending = 0x0
         },
         {
           locked_pending = 0x3,
           tail = 0x2c
         }
       }
     }
   },

I am not sure whether it is of any help.




BUT: I have another dump with LOCKDEP (but NOT the patch above). The 
kernel is again spinning in mm_get_cid(), presumably waiting for a free 
bit in the map as before [1]:


[  162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[  162.661378] Sending NMI from CPU 3 to CPUs 1:
[  162.661398] NMI backtrace for cpu 1
...
[  162.661411] RIP: 0010:mm_get_cid+0x54/0xc0


7680 is active on CPU 1:
PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"


CPU3 is waiting for the CPU1's rq_lock:
RDX: 0000000000000000  RSI: 0000000000000003  RDI: ffff8cc72fcb8500
...
  #3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700

crash> struct rq.__lock -x ffff8cc72fcb8500
   __lock = {
     raw_lock = {
       {
         val = {
           counter = 0x100003
         },
         {
           locked = 0x3,
           pending = 0x0
         },
         {
           locked_pending = 0x3,
           tail = 0x10
         }
       }
     },
     magic = 0xdead4ead,
     owner_cpu = 0x1,
     owner = 0xffff8cc4038b8000,
     dep_map = {
       key = 0xffffffff96245970 <__key.7>,
       class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
       name = 0xffffffff94ba3ab3 "&rq->__lock",
       wait_type_outer = 0x0,
       wait_type_inner = 0x2,
       lock_type = 0x0
     }
   },

owner_cpu is 1, owner is:
PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"

But as you can see above, CPU1 is occupied with a different task:
crash> bt -sxc 1
PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"

spinning in mm_get_cid() as I wrote. See the objdump of mm_get_cid below.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17


> ffffffff8139cd40 <mm_get_cid>:
> mm_get_cid():
> include/linux/cpumask.h:1020
> ffffffff8139cd40:       8b 05 9a d7 40 02       mov    0x240d79a(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
> kernel/sched/sched.h:3779
> ffffffff8139cd46:       55                      push   %rbp
> ffffffff8139cd47:       53                      push   %rbx
> include/linux/mm_types.h:1477
> ffffffff8139cd48:       48 8d 9f 80 0b 00 00    lea    0xb80(%rdi),%rbx
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd4f:       8b b7 0c 01 00 00       mov    0x10c(%rdi),%esi
> include/linux/cpumask.h:1020
> ffffffff8139cd55:       83 c0 3f                add    $0x3f,%eax
> ffffffff8139cd58:       c1 e8 03                shr    $0x3,%eax
> kernel/sched/sched.h:3780 (discriminator 2)
> ffffffff8139cd5b:       48 89 f5                mov    %rsi,%rbp
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd5e:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd63:       48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
> include/linux/find.h:393
> ffffffff8139cd67:       e8 44 d8 6e 00          call   ffffffff81a8a5b0 <_find_first_zero_bit>
> kernel/sched/sched.h:3771
> ffffffff8139cd6c:       39 e8                   cmp    %ebp,%eax
> ffffffff8139cd6e:       73 7c                   jae    ffffffff8139cdec <mm_get_cid+0xac>
> ffffffff8139cd70:       89 c1                   mov    %eax,%ecx
> kernel/sched/sched.h:3773 (discriminator 1)
> ffffffff8139cd72:       89 c2                   mov    %eax,%edx
> include/linux/cpumask.h:1020
> ffffffff8139cd74:       8b 05 66 d7 40 02       mov    0x240d766(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
> ffffffff8139cd7a:       83 c0 3f                add    $0x3f,%eax
> ffffffff8139cd7d:       c1 e8 03                shr    $0x3,%eax
> include/linux/mm_types.h:1479 (discriminator 1)
> ffffffff8139cd80:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
> include/linux/mm_types.h:1489 (discriminator 1)
> ffffffff8139cd85:       48 8d 04 43             lea    (%rbx,%rax,2),%rax
> arch/x86/include/asm/bitops.h:136
> ffffffff8139cd89:       f0 48 0f ab 10          lock bts %rdx,(%rax)
> kernel/sched/sched.h:3773 (discriminator 2)
> ffffffff8139cd8e:       73 4b                   jae    ffffffff8139cddb <mm_get_cid+0x9b>
> ffffffff8139cd90:       eb 5a                   jmp    ffffffff8139cdec <mm_get_cid+0xac>
> arch/x86/include/asm/vdso/processor.h:13
> ffffffff8139cd92:       f3 90                   pause
> include/linux/cpumask.h:1020
> ffffffff8139cd94:       8b 05 46 d7 40 02       mov    0x240d746(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>

The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.




> In the meantime, Michal K. and I did some digging into the qemu dumps.
> Details (and a couple of previous comments) at:
> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
> 
> tl;dr:
> 
> In one of the dumps, one process sits in
>    context_switch
>      -> mm_get_cid (before switch_to())
> 
>  > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee 
> (ffffffff820f162e) -> call mm_get_cid
> 
> Michal extracted the vCPU's RIP and it turned out:
>  > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a 
> free CID.
>  > ...
>  > ffff8a88458137c0:  000000000000000f 000000000000000f
>  >                                                    ^
>  > Hm, so indeed CIDs for all four CPUs are occupied.
> 
> To me (I don't know what a CID is either), this might point to Thomas'
> "sched/mmcid: Cure mode transition woes" [1] as a possible culprit.
> 
> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly 
> ordered systems") spells:
>  >     As a consequence the task will
>  >     not drop the CID when scheduling out before the fixup is 
> completed, which
>  >     means the CID space can be exhausted and the next task scheduling 
> in will
>  >     loop in mm_get_cid() and the fixup thread can livelock on the 
> held runqueue
>  >     lock as above.
> 
> Which sounds exactly like what happens here. Except the patch is from
> the series above, so it is obviously already in 6.19.
> 
> 
> I noticed there is also a 7.0-rc1 fix:
>    1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
> But that got into 6.19.1 already (we are at 6.19.3). So does not improve 
> the situation.
> 
> Any ideas?
> 
> 
> 
> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
> 
> thanks,

-- 
js
suse labs


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05 11:53         ` Jiri Slaby
@ 2026-03-05 12:20           ` Jiri Slaby
  2026-03-05 16:16             ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-05 12:20 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 05. 03. 26, 12:53, Jiri Slaby wrote:
> On 05. 03. 26, 8:00, Jiri Slaby wrote:
>> On 02. 03. 26, 12:46, Peter Zijlstra wrote:
>>> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>>
>>>> The state of the lock:
>>>>
>>>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>>>    __lock = {
>>>>      raw_lock = {
>>>>        {
>>>>          val = {
>>>>            counter = 0x40003
>>>>          },
>>>>          {
>>>>            locked = 0x3,
>>>>            pending = 0x0
>>>>          },
>>>>          {
>>>>            locked_pending = 0x3,
>>>>            tail = 0x4
>>>>          }
>>>>        }
>>>>      }
>>>>    },
>>>>
>>>
>>>
>>> That had me remember the below patch that never quite made it. I've
>>> rebased it to something more recent so it applies.
>>>
>>> If you stick that in, we might get a clue as to who is owning that lock.
>>> Provided it all wants to reproduce well enough.
>>
>> Thanks, I applied it, but to date it is still not accepted yet:
>> https://build.opensuse.org/requests/1335893
> 
> OK, I have a first dump with the patch applied:
>    __lock = {
>      raw_lock = {
>        {
>          val = {
>            counter = 0x2c0003
>          },
>          {
>            locked = 0x3,
>            pending = 0x0
>          },
>          {
>            locked_pending = 0x3,
>            tail = 0x2c
>          }
>        }
>      }
>    },
> 
> I am not sure if it is of any help?
> 
> 
> 
> 
> BUT: I have another dump with LOCKDEP (but NOT the patch above). The 
> kernel is again spinning in mm_get_cid(), presumably waiting for a free 
> bit in the map as before [1]:
> 
> 
> [  162.660584] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> ...
> [  162.661378] Sending NMI from CPU 3 to CPUs 1:
> [  162.661398] NMI backtrace for cpu 1
> ...
> [  162.661411] RIP: 0010:mm_get_cid+0x54/0xc0
> 
> 
> 7680 is active on CPU 1:
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
> 
> 
> CPU3 is waiting for the CPU1's rq_lock:
> RDX: 0000000000000000  RSI: 0000000000000003  RDI: ffff8cc72fcb8500
> ...
>   #3 [ffffd2e9c0083da0] raw_spin_rq_lock_nested+0x20 at ffffffff9339e700
> 
> crash> struct rq.__lock -x ffff8cc72fcb8500
>    __lock = {
>      raw_lock = {
>        {
>          val = {
>            counter = 0x100003
>          },
>          {
>            locked = 0x3,
>            pending = 0x0
>          },
>          {
>            locked_pending = 0x3,
>            tail = 0x10
>          }
>        }
>      },
>      magic = 0xdead4ead,
>      owner_cpu = 0x1,
>      owner = 0xffff8cc4038b8000,
>      dep_map = {
>        key = 0xffffffff96245970 <__key.7>,
>        class_cache = {0xffffffff9644b488 <lock_classes+10600>, 0x0},
>        name = 0xffffffff94ba3ab3 "&rq->__lock",
>        wait_type_outer = 0x0,
>        wait_type_inner = 0x2,
>        lock_type = 0x0
>      }
>    },
> 
> owner_cpu is 1, owner is:
> PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
> 
> But as you can see above, CPU1 is occupied with a different task:
> crash> bt -sxc 1
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
> 
> spinning in mm_get_cid() as I wrote. See the objdump of mm_get_cid below.

You might be interested in mm_cid dumps:

====== PID 7508 (sleeping, holding the rq lock) ======

crash> task -R mm_cid -x 7508
PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000003
   },

crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$6 = {
   pcpu = 0x66222619df40,
   mode = 1073741824,
   max_cids = 4,


====== PID 7680 (spinning in mm_get_cid()) ======

crash> task -R mm_cid -x 7680
PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x80000000
   },

crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
$8 = {
   pcpu = 0x66222619df40,
   mode = 1073741824,
   max_cids = 4,


====== per-cpu for CPU1 ======

crash> struct mm_cid_pcpu -x fffff2e9bfc89f40
struct mm_cid_pcpu {
   cid = 0x40000003
}



Dump of any other's mm_cids needed?

> [1] https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
> 
> 
>> ffffffff8139cd40 <mm_get_cid>:
>> mm_get_cid():
>> include/linux/cpumask.h:1020
>> ffffffff8139cd40:       8b 05 9a d7 40 02       mov    
>> 0x240d79a(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
>> kernel/sched/sched.h:3779
>> ffffffff8139cd46:       55                      push   %rbp
>> ffffffff8139cd47:       53                      push   %rbx
>> include/linux/mm_types.h:1477
>> ffffffff8139cd48:       48 8d 9f 80 0b 00 00    lea    0xb80(%rdi),%rbx
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd4f:       8b b7 0c 01 00 00       mov    0x10c(%rdi),%esi
>> include/linux/cpumask.h:1020
>> ffffffff8139cd55:       83 c0 3f                add    $0x3f,%eax
>> ffffffff8139cd58:       c1 e8 03                shr    $0x3,%eax
>> kernel/sched/sched.h:3780 (discriminator 2)
>> ffffffff8139cd5b:       48 89 f5                mov    %rsi,%rbp
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd5e:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd63:       48 8d 3c 43             lea    (%rbx,%rax,2),%rdi
>> include/linux/find.h:393
>> ffffffff8139cd67:       e8 44 d8 6e 00          call   
>> ffffffff81a8a5b0 <_find_first_zero_bit>
>> kernel/sched/sched.h:3771
>> ffffffff8139cd6c:       39 e8                   cmp    %ebp,%eax
>> ffffffff8139cd6e:       73 7c                   jae    
>> ffffffff8139cdec <mm_get_cid+0xac>
>> ffffffff8139cd70:       89 c1                   mov    %eax,%ecx
>> kernel/sched/sched.h:3773 (discriminator 1)
>> ffffffff8139cd72:       89 c2                   mov    %eax,%edx
>> include/linux/cpumask.h:1020
>> ffffffff8139cd74:       8b 05 66 d7 40 02       mov    
>> 0x240d766(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
>> ffffffff8139cd7a:       83 c0 3f                add    $0x3f,%eax
>> ffffffff8139cd7d:       c1 e8 03                shr    $0x3,%eax
>> include/linux/mm_types.h:1479 (discriminator 1)
>> ffffffff8139cd80:       25 f8 ff ff 1f          and    $0x1ffffff8,%eax
>> include/linux/mm_types.h:1489 (discriminator 1)
>> ffffffff8139cd85:       48 8d 04 43             lea    (%rbx,%rax,2),%rax
>> arch/x86/include/asm/bitops.h:136
>> ffffffff8139cd89:       f0 48 0f ab 10          lock bts %rdx,(%rax)
>> kernel/sched/sched.h:3773 (discriminator 2)
>> ffffffff8139cd8e:       73 4b                   jae    
>> ffffffff8139cddb <mm_get_cid+0x9b>
>> ffffffff8139cd90:       eb 5a                   jmp    
>> ffffffff8139cdec <mm_get_cid+0xac>
>> arch/x86/include/asm/vdso/processor.h:13
>> ffffffff8139cd92:       f3 90                   pause
>> include/linux/cpumask.h:1020
>> ffffffff8139cd94:       8b 05 46 d7 40 02       mov    
>> 0x240d746(%rip),%eax        # ffffffff837aa4e0 <nr_cpu_ids>
> 
> The CPU1 was caught by the NMI here ^^^^^^^^^^^^^^^^^^^^.
> 
> 
> 
> 
>> In the meantime, me and Michal K. did some digging into qemu dumps. 
>> Details at (and a couple previous comments):
>> https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17
>>
>> tl;dr:
>>
>> In one of the dumps, one process sits in
>>    context_switch
>>      -> mm_get_cid (before switch_to())
>>
>>  > 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee 
>> (ffffffff820f162e) -> call mm_get_cid
>>
>> Michal extracted the vCPU's RIP and it turned out:
>>  > Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a 
>> free CID.
>>  > ...
>>  > ffff8a88458137c0:  000000000000000f 000000000000000f
>>  >                                                    ^
>>  > Hm, so indeed CIDs for all four CPUs are occupied.
>>
>> To me (I don't know what CID is either), this might point as a 
>> possible culprit to Thomas' "sched/mmcid: Cure mode transition woes" [1].
>>
>> Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on 
>> weakly ordered systems") spells:
>>  >     As a consequence the task will
>>  >     not drop the CID when scheduling out before the fixup is 
>> completed, which
>>  >     means the CID space can be exhausted and the next task 
>> scheduling in will
>>  >     loop in mm_get_cid() and the fixup thread can livelock on the 
>> held runqueue
>>  >     lock as above.
>>
>> Which sounds like what exactly happens here. Except the patch is from 
>> the series above, so is already in 6.19 obviously.
>>
>>
>> I noticed there is also a 7.0-rc1 fix:
>>    1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
>> But that got into 6.19.1 already (we are at 6.19.3). So does not 
>> improve the situation.
>>
>> Any ideas?
>>
>>
>>
>> [1] https://lore.kernel.org/all/20260201192234.380608594@kernel.org/
>>
>> thanks,
> 

-- 
js
suse labs


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05 12:20           ` Jiri Slaby
@ 2026-03-05 16:16             ` Thomas Gleixner
  2026-03-05 17:33               ` Jiri Slaby
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-05 16:16 UTC (permalink / raw)
  To: Jiri Slaby, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On Thu, Mar 05 2026 at 13:20, Jiri Slaby wrote:
> On 05. 03. 26, 12:53, Jiri Slaby wrote:
>> owner_cpu is 1, owner is:
>> PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
>> 
>> But as you can see above, CPU1 is occupied with a different task:
>> crash> bt -sxc 1
>> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
>> 
>> spinning in mm_get_cid() as I wrote. See the objdump of mm_get_cid below.
>
> You might be interested in mm_cid dumps:
>
> ====== PID 7508 (sleeping, holding the rq lock) ======
>
> crash> task -R mm_cid -x 7508
> PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x40000003

CID 3 owned by CPU 1

>    },
>
> crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
> $6 = {
>    pcpu = 0x66222619df40,
>    mode = 1073741824,

mode = per CPU mode

>    max_cids = 4,
>
>
> ====== PID 7680 (spinning in mm_get_cid()) ======
>
> crash> task -R mm_cid -x 7680
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x80000000

CID is unset

>    },
>
> crash> p ((struct task_struct *)(0xffff8cc4038b8000))->mm->mm_cid|head -4
> $8 = {
>    pcpu = 0x66222619df40,
>    mode = 1073741824,

That's per CPU mode too

>    max_cids = 4,
>
>
> ====== per-cpu for CPU1 ======
>
> crash> struct mm_cid_pcpu -x fffff2e9bfc89f40
> struct mm_cid_pcpu {
>    cid = 0x40000003

That's the one owned by CPU 1

> }
>
> Dump of any other's mm_cids needed?

It would be helpful to see the content of all PCPU CIDs and
tsk::mm_cid::* for all tasks which belong to that process.
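For reference, the cid words above seem to decode with two flag bits. A rough
sketch; the flag values are inferred from the dumps and the annotations above,
and the names are assumptions, not necessarily the kernel's:

```python
MM_CID_ONCPU = 0x40000000  # assumed: CID currently owned by a CPU (per-CPU mode)
MM_CID_UNSET = 0x80000000  # assumed: task holds no CID at the moment

def decode_cid(word):
    """Render a tsk::mm_cid::cid or mm_cid_pcpu::cid word human-readably."""
    if word & MM_CID_UNSET:
        return "unset"
    cid = word & ~(MM_CID_ONCPU | MM_CID_UNSET)
    return f"cid={cid}" + (" (cpu-owned)" if word & MM_CID_ONCPU else "")

print(decode_cid(0x40000003))  # cid=3 (cpu-owned): PID 7508 and the CPU1 pcpu slot
print(decode_cid(0x80000000))  # unset: PID 7680 spinning for a free CID
```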

Thanks,

        tglx


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05 16:16             ` Thomas Gleixner
@ 2026-03-05 17:33               ` Jiri Slaby
  2026-03-05 19:25                 ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-05 17:33 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 05. 03. 26, 17:16, Thomas Gleixner wrote:
>> Dump of any other's mm_cids needed?
> 
> It would be helpful to see the content of all PCPU CIDs and
> tsk::mm_cid::* for all tasks which belong to that process.

Not sure which of the two processes. So both:

crash> rd __per_cpu_offset 4
ffffffff94c38160:  ffff8cc799a6c000 ffff8cc799aec000   ................
ffffffff94c38170:  ffff8cc799b6c000 ffff8cc799bec000   ................



====== PID 7680 (spinning in mm_get_cid()) ======
4 tasks with
   mm = 0xffff8cc406824680
     mm_cid.pcpu = 0x66222619df00,


crash> task -x -R mm_cid ffff8cc4038525c0 ffff8cc40ad40000 
ffff8cc40683cb80 ffff8cc418424b80
PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x80000000
   },

PID: 7681     TASK: ffff8cc40ad40000  CPU: 3    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x40000000
   },

PID: 7682     TASK: ffff8cc40683cb80  CPU: 0    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x40000002
   },

PID: 7684     TASK: ffff8cc418424b80  CPU: 2    COMMAND: "asm"
   mm_cid = {
     active = 0x1,
     cid = 0x40000001
   },



crash> struct mm_cid_pcpu -x 0xfffff2e9bfc09f00
struct mm_cid_pcpu {
   cid = 0x40000002
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfc89f00
struct mm_cid_pcpu {
   cid = 0x0
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfd09f00
struct mm_cid_pcpu {
   cid = 0x40000001
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfd89f00
struct mm_cid_pcpu {
   cid = 0x40000000
}




====== PID 7508 (sleeping, holding the rq lock) ======
6 tasks with
   mm = 0xffff8cc407222340
     mm_cid.pcpu = 0x66222619df40,

crash> task -x -R mm_cid ffff8cc43d090000 ffff8cc43d094b80 
ffff8cc494a00000 ffff8cc494a04b80 ffff8cc4038b8000 ffff8cc4038bcb80
PID: 7504     TASK: ffff8cc43d090000  CPU: 0    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000001
   },

PID: 7505     TASK: ffff8cc43d094b80  CPU: 1    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000003
   },

PID: 7506     TASK: ffff8cc494a00000  CPU: 3    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000000
   },

PID: 7507     TASK: ffff8cc494a04b80  CPU: 2    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000002
   },

PID: 7508     TASK: ffff8cc4038b8000  CPU: 1    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000003
   },

PID: 7630     TASK: ffff8cc4038bcb80  CPU: 2    COMMAND: "compile"
   mm_cid = {
     active = 0x1,
     cid = 0x40000002
   },


crash> struct mm_cid_pcpu -x 0xfffff2e9bfc09f40
struct mm_cid_pcpu {
   cid = 0x40000001
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfc89f40
struct mm_cid_pcpu {
   cid = 0x40000003
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfd09f40
struct mm_cid_pcpu {
   cid = 0x40000002
}
crash> struct mm_cid_pcpu -x 0xfffff2e9bfd89f40
struct mm_cid_pcpu {
   cid = 0x40000000
}


Anything else :)?

-- 
js
suse labs


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05 17:33               ` Jiri Slaby
@ 2026-03-05 19:25                 ` Thomas Gleixner
  2026-03-06  5:48                   ` Jiri Slaby
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-05 19:25 UTC (permalink / raw)
  To: Jiri Slaby, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On Thu, Mar 05 2026 at 18:33, Jiri Slaby wrote:
> On 05. 03. 26, 17:16, Thomas Gleixner wrote:
>
> ====== PID 7680 (spinning in mm_get_cid()) ======
> 4 tasks with
>    mm = 0xffff8cc406824680
>      mm_cid.pcpu = 0x66222619df00,
>
>
> crash> task -x -R mm_cid ffff8cc4038525c0 ffff8cc40ad40000 
> ffff8cc40683cb80 ffff8cc418424b80
> PID: 7680     TASK: ffff8cc4038525c0  CPU: 1    COMMAND: "asm"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x80000000
>    },

So CID 3 has gone AWOL...

> PID: 7681     TASK: ffff8cc40ad40000  CPU: 3    COMMAND: "asm"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x40000000
>    },
>
> PID: 7682     TASK: ffff8cc40683cb80  CPU: 0    COMMAND: "asm"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x40000002
>    },
>
> PID: 7684     TASK: ffff8cc418424b80  CPU: 2    COMMAND: "asm"
>    mm_cid = {
>      active = 0x1,
>      cid = 0x40000001
>    },
>
> crash> struct mm_cid_pcpu -x 0xfffff2e9bfc09f00
> struct mm_cid_pcpu {
>    cid = 0x40000002
> }
> crash> struct mm_cid_pcpu -x 0xfffff2e9bfc89f00
> struct mm_cid_pcpu {
>    cid = 0x0
> }
> crash> struct mm_cid_pcpu -x 0xfffff2e9bfd09f00
> struct mm_cid_pcpu {
>    cid = 0x40000001
> }
> crash> struct mm_cid_pcpu -x 0xfffff2e9bfd89f00
> struct mm_cid_pcpu {
>    cid = 0x40000000
> }

... as 0, 1, 2 are owned by CPUs 3, 2, 0. 

The other process is not relevant. That's just fallout and has a
different CID space, which is consistent.
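The accounting can be cross-checked mechanically from the dumps above. A quick
sketch, with the flag bits assumed as bit 30 = cpu-owned and bit 31 = unset:
collect every CID pinned by a task or a per-CPU slot of the "asm" process and
compare against the max_cids = 4 space:

```python
ONCPU, UNSET = 0x40000000, 0x80000000
MAX_CIDS = 4

# cid words from the "asm" process dumps above
task_cids = {7680: 0x80000000, 7681: 0x40000000,
             7682: 0x40000002, 7684: 0x40000001}
pcpu_cids = {0: 0x40000002, 1: 0x0, 2: 0x40000001, 3: 0x40000000}

def held(words):
    # only cpu-owned words actually pin a CID
    return {w & ~(ONCPU | UNSET) for w in words if w & ONCPU}

leaked = set(range(MAX_CIDS)) - (held(task_cids.values()) | held(pcpu_cids.values()))
print(sorted(leaked))  # [3]: held by nobody, yet its bitmap bit is set
```

With the earlier bitmap dump showing all four bits set (0xf), CID 3 is set in
the allocation bitmap but owned by no task and no per-CPU slot, so
mm_get_cid() can never find a free bit and spins forever.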

Is there a simple way to reproduce?

Thanks,

        tglx






* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-05 19:25                 ` Thomas Gleixner
@ 2026-03-06  5:48                   ` Jiri Slaby
  2026-03-06  9:57                     ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-06  5:48 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 05. 03. 26, 20:25, Thomas Gleixner wrote:
> Is there simple way to reproduce?

Unfortunately not at all. To date, I cannot even reproduce it locally; it 
reproduces exclusively in the openSUSE build service (and in the GitHub CI, 
per Matthieu's report). I have a project there with packages which fail 
more often than others:
   https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
But it's all green ATM.

Builds of Go 1.24 and tests of Rust 1.90 fail the most. The former even 
takes only ~8 minutes, so it's not that intensive a build at all, and the 
reasons are unknown to me. At least, Go apparently uses threads for 
building (unlike gcc/clang, which use forks/processes). I don't know about Rust.

thanks,
-- 
js
suse labs


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06  5:48                   ` Jiri Slaby
@ 2026-03-06  9:57                     ` Thomas Gleixner
  2026-03-06 10:16                       ` Jiri Slaby
                                         ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-06  9:57 UTC (permalink / raw)
  To: Jiri Slaby, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On Fri, Mar 06 2026 at 06:48, Jiri Slaby wrote:
> On 05. 03. 26, 20:25, Thomas Gleixner wrote:
>> Is there simple way to reproduce?
>
> Unfortunately not at all. To date, I even cannot reproduce locally, it 
> reproduces exclusively in opensuse build service (and github CI as per 
> Matthieu's report). I have a project in there with packages which fail 
> more often than others:
>    https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
> But it's all green ATM.
>
> Builds of Go 1.24 and tests of rust 1.90 fail the most. The former even 
> takes only ~ 8 minutes, so it's not that intensive build at all. So the 
> reasons are unknown to me. At least, Go apparently uses threads for 
> building (unlike gcc/clang with forks/processes). Dunno about rust.

I tried with tons of test cases which stress test mmcid with threads and
failed.

Can you provide me your .config, source version, VM setup (Number of
CPUs, memory etc.)?

I tried to find it on that github page Matthieu mentioned but I'm
probably too stupid to navigate this clicky interface.

Thanks

        tglx


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06  9:57                     ` Thomas Gleixner
@ 2026-03-06 10:16                       ` Jiri Slaby
  2026-03-06 16:28                         ` Thomas Gleixner
  2026-03-06 11:06                       ` Matthieu Baerts
  2026-03-06 15:24                       ` Peter Zijlstra
  2 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-06 10:16 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On 06. 03. 26, 10:57, Thomas Gleixner wrote:
> On Fri, Mar 06 2026 at 06:48, Jiri Slaby wrote:
>> On 05. 03. 26, 20:25, Thomas Gleixner wrote:
>>> Is there simple way to reproduce?
>>
>> Unfortunately not at all. To date, I even cannot reproduce locally, it
>> reproduces exclusively in opensuse build service (and github CI as per
>> Matthieu's report). I have a project in there with packages which fail
>> more often than others:
>>     https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
>> But it's all green ATM.
>>
>> Builds of Go 1.24 and tests of rust 1.90 fail the most. The former even
>> takes only ~ 8 minutes, so it's not that intensive build at all. So the
>> reasons are unknown to me. At least, Go apparently uses threads for
>> building (unlike gcc/clang with forks/processes). Dunno about rust.
> 
> I tried with tons of test cases which stress test mmcid with threads and
> failed.

Me too, with many artificial pthread or fork or combined loads/bombs 
with loops and yields.

I was once able to see the failure in a local build of Go, using 
"osc build --vm-type=kvm", which is what the build service (see below) invokes.

It's extremely hard to hit it locally. So there is likely some rather 
small race window or whatnot.

> Can you provide me your .config

Sure, it's the standard openSUSE kernel, i.e.:
https://github.com/openSUSE/kernel-source/blob/9c1596772e0/config/x86_64/default

> source version,

It happens with 6.19+; the current failures are with the commit above, 
which is 6.19.5.

I added 7.0-rc2 as well now to:
   https://build.opensuse.org/project/monitor/home:jirislaby:softlockup

Well, it already failed for Go:
  
https://build.opensuse.org/package/live_build_log/home:jirislaby:softlockup/go1.24/617/x86_64

So at least it is consistent, and not stable tree related ;).


If that helps, I would likely be able to "bisect" your 4 mm_cid 
patches, if they can be reverted on top of 6.19 easily. (By letting 
the kernel run in the build service.)

> VM setup (Number of CPUs, memory etc.)?

For example, the currently failing build:
https://build.opensuse.org/package/live_build_log/home:jirislaby:softlockup/rust1.90:test/openSUSE_Factory/x86_64

says:
[   10s] /usr/bin/qemu-kvm -nodefaults -no-reboot -nographic -vga none 
-cpu host -M pc,accel=kvm,usb=off,dump-guest-core=off,vmport=off 
-sandbox on -bios /usr/share/qemu/qboot.rom -object 
rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 
-object iothread,id=io0 -run-with user=qemu -net none -kernel 
/var/cache/obs/worker/root_4/.mount/boot/kernel -initrd 
/var/cache/obs/worker/root_4/.mount/boot/initrd -append 
root=/dev/disk/by-id/virtio-0 rootfstype=ext4 rootflags=noatime 
elevator=noop nmi_watchdog=0 rw ia32_emulation=1 oops=panic panic=1 
quiet console=hvc0 init=/.build/build -m 40960 -drive 
file=/var/cache/obs/worker/root_4/root,format=raw,if=none,id=disk,cache=unsafe,aio=io_uring 
-device virtio-blk-pci,iothread=io0,drive=disk,serial=0 -drive 
file=/var/cache/obs/worker/root_4/swap,format=raw,if=none,id=swap,cache=unsafe,aio=io_uring 
-device virtio-blk-pci,iothread=io0,drive=swap,serial=1 -device 
virtio-serial,max_ports=2 -device virtconsole,chardev=virtiocon0 
-chardev stdio,mux=on,id=virtiocon0 -mon chardev=virtiocon0 -chardev 
socket,id=monitor,server=on,wait=off,path=/var/cache/obs/worker/root_4/root.qemu/monitor 
-mon chardev=monitor,mode=readline -smp 12



The with-7.0-rc2 Go fail above runs:

[    4s] /usr/bin/qemu-kvm -nodefaults -no-reboot -nographic -vga none 
-cpu host -M pc,accel=kvm,usb=off,dump-guest-core=off,vmport=off 
-sandbox on -bios /usr/share/qemu/qboot.rom -object 
rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 
-object iothread,id=io0 -run-with user=qemu -net none -kernel 
/var/cache/obs/worker/root_12/.mount/boot/kernel -initrd 
/var/cache/obs/worker/root_12/.mount/boot/initrd -append 
root=/dev/disk/by-id/virtio-0 rootfstype=ext4 rootflags=noatime 
elevator=noop nmi_watchdog=0 rw ia32_emulation=1 oops=panic panic=1 
quiet console=hvc0 init=/.build/build -m 16384 -drive 
file=/var/cache/obs/worker/root_12/root,format=raw,if=none,id=disk,cache=unsafe,aio=io_uring 
-device virtio-blk-pci,iothread=io0,drive=disk,serial=0 -drive 
file=/var/cache/obs/worker/root_12/swap,format=raw,if=none,id=swap,cache=unsafe,aio=io_uring 
-device virtio-blk-pci,iothread=io0,drive=swap,serial=1 -device 
virtio-serial,max_ports=2 -device virtconsole,chardev=virtiocon0 
-chardev stdio,mux=on,id=virtiocon0 -mon chardev=virtiocon0 -chardev 
socket,id=monitor,server=on,wait=off,path=/var/cache/obs/worker/root_12/root.qemu/monitor 
-mon chardev=monitor,mode=readline -smp 4



> I tried to find it on that github page Matthiue mentioned but I'm
> probably too stupid to navigate this clicky interface.

I haven't looked into the details of the github failure yet...

thanks,
-- 
js
suse labs


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06  9:57                     ` Thomas Gleixner
  2026-03-06 10:16                       ` Jiri Slaby
@ 2026-03-06 11:06                       ` Matthieu Baerts
  2026-03-06 16:57                         ` Matthieu Baerts
  2026-03-06 15:24                       ` Peter Zijlstra
  2 siblings, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-06 11:06 UTC (permalink / raw)
  To: Thomas Gleixner, Jiri Slaby, Peter Zijlstra
  Cc: Stefan Hajnoczi, Stefano Garzarella, kvm, virtualization, Netdev,
	rcu, MPTCP Linux, Linux Kernel, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org,
	Michal Koutný, Waiman Long

Hi Thomas,

Thank you for looking into this!

On 06/03/2026 10:57, Thomas Gleixner wrote:
> On Fri, Mar 06 2026 at 06:48, Jiri Slaby wrote:
>> On 05. 03. 26, 20:25, Thomas Gleixner wrote:
>>> Is there simple way to reproduce?
>>
>> Unfortunately not at all. To date, I even cannot reproduce locally, it 
>> reproduces exclusively in opensuse build service (and github CI as per 
>> Matthieu's report). I have a project in there with packages which fail 
>> more often than others:
>>    https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
>> But it's all green ATM.
>>
>> Builds of Go 1.24 and tests of rust 1.90 fail the most. The former even 
>> takes only ~ 8 minutes, so it's not that intensive build at all. So the 
>> reasons are unknown to me. At least, Go apparently uses threads for 
>> building (unlike gcc/clang with forks/processes). Dunno about rust.
> 
> I tried with tons of test cases which stress test mmcid with threads and
> failed.

On my side, I didn't manage to reproduce it locally either.


> Can you provide me your .config, source version, VM setup (Number of
> CPUs, memory etc.)?

My CI ran into this issue 2 days ago, with and without a debug kernel
config. The kernel being tested was on top of 'net-next', which was on
top of this commit from Linus' tree: fbdfa8da05b6 ("selftests:
tc-testing: fix list_categories() crash on list type").

- Config without debug:


https://github.com/user-attachments/files/25791728/config-run-22657946888-normal-join.gz

- Config with debug:


https://github.com/user-attachments/files/25791960/config-run-22657946888-debug-nojoin.gz

- Just in case, stacktraces available there:

  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22657946888


My tests are being executed in VMs I don't control, using a kernel v6.14
on Azure with 4 vCPUs, 16GB of RAM, and nested KVM support. For more
details about what's in them:


https://github.com/actions/runner-images/blob/ubuntu24/20260302.42/images/ubuntu/Ubuntu2404-Readme.md


From there, a Docker container is started, from which QEMU 10.1.0
(Debian 1:10.1.0+ds-5ubuntu2.2) is launched with 4 vCPUs and 5GB of RAM
using this command:


/usr/bin/qemu-system-x86_64 \
  -name mptcpdev \
  -m 5120M \
  -smp 4 \
  -chardev socket,id=charvirtfs5,path=/tmp/virtmevrwrzu5k \
  -device vhost-user-fs-device,chardev=charvirtfs5,tag=ROOTFS \
  -object memory-backend-memfd,id=mem,size=5120M,share=on \
  -numa node,memdev=mem \
  -machine accel=kvm:tcg \
  -M microvm,accel=kvm,pcie=on,rtc=on \
  -cpu host,topoext=on \
  -parallel none \
  -net none \
  -echr 1 \
  -chardev file,path=/proc/self/fd/2,id=dmesg \
  -device virtio-serial-device \
  -device virtconsole,chardev=dmesg \
  -chardev stdio,id=console,signal=off,mux=on \
  -serial chardev:console \
  -mon chardev=console \
  -vga none \
  -display none \
  -device vhost-vsock-device,guest-cid=3 \
  -kernel
/home/runner/work/mptcp_net-next/mptcp_net-next/.virtme/build/arch/x86/boot/bzImage
\
  -append 'virtme_hostname=mptcpdev nr_open=1048576
virtme_link_mods=/home/runner/work/mptcp_net-next/mptcp_net-next/.virtme/build/.virtme_mods/lib/modules/0.0.0
virtme_rw_overlay0=/tmp console=hvc0 earlyprintk=serial,ttyS0,115200
virtme_console=ttyS0 psmouse.proto=exps
virtme.vsockexec=`/tmp/virtme-console/3.sh`
virtme_chdir=home/runner/work/mptcp_net-next/mptcp_net-next
virtme_root_user=1 rootfstype=virtiofs root=ROOTFS raid=noautodetect rw
debug nokaslr mitigations=off softlockup_panic=1 nmi_watchdog=1
hung_task_panic=1 panic=-1 oops=panic
init=/usr/local/lib/python3.13/dist-packages/virtme/guest/bin/virtme-ng-init'
\
  -gdb tcp::1234 \
  -qmp tcp::3636,server,nowait \
  -no-reboot


It is possible to locally launch the same command using the same QEMU
version (but not the same host kernel) with the help of Docker:

  $ cd <kernel source code>
  # docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
    -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
    manual normal

This will build a new kernel in O=.virtme/build, launch it and give you
access to a prompt.


After that, you can also use the "auto" mode with the last built
image to boot the VM, print only "OK", stop, and retry if there were no
errors:

  $ cd <kernel source code>
  $ echo 'echo OK' > .virtme-exec-run
  # i=1; \
    while docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
    -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
    vm auto normal; do \
      echo "== Attempt: $i: OK =="; \
      i=$((i+1)); \
    done; \
    echo "== Failure after $i attempts =="


> I tried to find it on that github page Matthiue mentioned but I'm
> probably too stupid to navigate this clicky interface.

I'm sorry about that; I understand the interface is not very clear. Do
not hesitate to tell me if you need anything else from me.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06  9:57                     ` Thomas Gleixner
  2026-03-06 10:16                       ` Jiri Slaby
  2026-03-06 11:06                       ` Matthieu Baerts
@ 2026-03-06 15:24                       ` Peter Zijlstra
  2026-03-07  9:01                         ` Thomas Gleixner
  2 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-03-06 15:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jiri Slaby, Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On Fri, Mar 06, 2026 at 10:57:15AM +0100, Thomas Gleixner wrote:

> I tried with tons of test cases which stress test mmcid with threads and
> failed.

Are some of those in tools/testing/selftests?

Anyway, I was going over that code, and I noticed that there seems to be
inconsistent locking for mm_mm_cid::pcpu.

There's a bunch of sites that state we need rq->lock for remote access;
but then things like sched_mm_cid_fork() and sched_mm_cid_exit() seem to
think that holding mm_cid->lock is sufficient.

This doesn't make sense to me, but maybe I missed something.

Anyway, I cobbled together the below, and that builds and boots and
passes everything in tools/testing/selftests/rseq.


YMMV

---
diff --git a/include/linux/rseq_types.h b/include/linux/rseq_types.h
index da5fa6f40294..df2b4629cbfd 100644
--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -2,9 +2,12 @@
 #ifndef _LINUX_RSEQ_TYPES_H
 #define _LINUX_RSEQ_TYPES_H
 
+#include <linux/compiler_types.h>
 #include <linux/irq_work_types.h>
 #include <linux/types.h>
 #include <linux/workqueue_types.h>
+#include <linux/mutex.h>
+#include <asm/percpu.h>
 
 #ifdef CONFIG_RSEQ
 struct rseq;
@@ -145,8 +148,14 @@ struct sched_mm_cid {
  *		while a task with a CID is running
  */
 struct mm_cid_pcpu {
-	unsigned int	cid;
-}____cacheline_aligned_in_smp;
+	unsigned int		cid;
+} ____cacheline_aligned_in_smp;
+
+/*
+ * See helpers in kernel/sched/sched.h that convert
+ * from __rq_lockp(rq) to RQ_LOCK.
+ */
+token_context_lock(RQ_LOCK);
 
 /**
  * struct mm_mm_cid - Storage for per MM CID data
@@ -167,7 +176,7 @@ struct mm_cid_pcpu {
  */
 struct mm_mm_cid {
 	/* Hotpath read mostly members */
-	struct mm_cid_pcpu	__percpu *pcpu;
+	struct mm_cid_pcpu	__percpu *pcpu __guarded_by(RQ_LOCK);
 	unsigned int		mode;
 	unsigned int		max_cids;
 
@@ -179,11 +188,11 @@ struct mm_mm_cid {
 	struct mutex		mutex;
 
 	/* Low frequency modified */
-	unsigned int		nr_cpus_allowed;
-	unsigned int		users;
-	unsigned int		pcpu_thrs;
-	unsigned int		update_deferred;
-}____cacheline_aligned_in_smp;
+	unsigned int		nr_cpus_allowed __guarded_by(&lock);
+	unsigned int		users		__guarded_by(&lock);
+	unsigned int		pcpu_thrs	__guarded_by(&lock);
+	unsigned int		update_deferred __guarded_by(&lock);
+} ____cacheline_aligned_in_smp;
 #else /* CONFIG_SCHED_MM_CID */
 struct mm_mm_cid { };
 struct sched_mm_cid { };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b571e640372..f7c03c9c4fd0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5335,7 +5335,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		}
 	}
 
-	mm_cid_switch_to(prev, next);
+	mm_cid_switch_to(rq, prev, next);
 
 	/*
 	 * Tell rseq that the task was scheduled in. Must be after
@@ -10511,6 +10511,7 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
  * fork(), exit() and affinity changes
  */
 static void __mm_update_max_cids(struct mm_mm_cid *mc)
+	__must_hold(&mc->lock)
 {
 	unsigned int opt_cids, max_cids;
 
@@ -10523,15 +10524,17 @@ static void __mm_update_max_cids(struct mm_mm_cid *mc)
 }
 
 static inline unsigned int mm_cid_calc_pcpu_thrs(struct mm_mm_cid *mc)
+	__must_hold(&mc->lock)
 {
 	unsigned int opt_cids;
 
 	opt_cids = min(mc->nr_cpus_allowed, mc->users);
 	/* Has to be at least 1 because 0 indicates PCPU mode off */
-	return max(min(opt_cids - opt_cids / 4, num_possible_cpus() / 2), 1);
+	return max(min(opt_cids - (opt_cids / 4), num_possible_cpus() / 2), 1);
 }
 
 static bool mm_update_max_cids(struct mm_struct *mm)
+	__must_hold(&mm->mm_cid.lock)
 {
 	struct mm_mm_cid *mc = &mm->mm_cid;
 	bool percpu = cid_on_cpu(mc->mode);
@@ -10558,6 +10561,7 @@ static bool mm_update_max_cids(struct mm_struct *mm)
 		return false;
 
 	/* Flip the mode and set the transition flag to bridge the transfer */
+	WARN_ON_ONCE(mc->mode & MM_CID_TRANSIT);
 	WRITE_ONCE(mc->mode, mc->mode ^ (MM_CID_TRANSIT | MM_CID_ONCPU));
 	/*
 	 * Order the store against the subsequent fixups so that
@@ -10568,16 +10572,28 @@ static bool mm_update_max_cids(struct mm_struct *mm)
 	return true;
 }
 
+/*
+ * Silly helper because we cannot express that mm_mm_cid::users is updated
+ * while holding both mutex and lock and can thus be read while holding
+ * either.
+ */
+static __always_inline unsigned int mm_cid_users(struct mm_struct *mm)
+	__must_hold(&mm->mm_cid.mutex)
+{
+	__assume_ctx_lock(&mm->mm_cid.lock);
+	return mm->mm_cid.users;
+}
+
 static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpumask *affmsk)
 {
 	struct cpumask *mm_allowed;
 	struct mm_mm_cid *mc;
 	unsigned int weight;
 
-	if (!mm || !READ_ONCE(mm->mm_cid.users))
+	if (!mm || !data_race(READ_ONCE(mm->mm_cid.users)))
 		return;
 	/*
-	 * mm::mm_cid::mm_cpus_allowed is the superset of each threads
+	 * mm::mm_cid::mm_cpus_allowed is the superset of each thread's
 	 * allowed CPUs mask which means it can only grow.
 	 */
 	mc = &mm->mm_cid;
@@ -10609,6 +10625,7 @@ static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpu
 
 static inline void mm_cid_complete_transit(struct mm_struct *mm, unsigned int mode)
 {
+	WARN_ON_ONCE(!(mm->mm_cid.mode & MM_CID_TRANSIT));
 	/*
 	 * Ensure that the store removing the TRANSIT bit cannot be
 	 * reordered by the CPU before the fixups have been completed.
@@ -10633,11 +10650,12 @@ static void mm_cid_fixup_cpus_to_tasks(struct mm_struct *mm)
 
 	/* Walk the CPUs and fixup all stale CIDs */
 	for_each_possible_cpu(cpu) {
-		struct mm_cid_pcpu *pcp = per_cpu_ptr(mm->mm_cid.pcpu, cpu);
 		struct rq *rq = cpu_rq(cpu);
+		struct mm_cid_pcpu *pcp;
 
 		/* Remote access to mm::mm_cid::pcpu requires rq_lock */
 		guard(rq_lock_irq)(rq);
+		pcp = mm_cid_pcpu(&mm->mm_cid, rq);
 		/* Is the CID still owned by the CPU? */
 		if (cid_on_cpu(pcp->cid)) {
 			/*
@@ -10675,6 +10693,7 @@ static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm
 {
 	/* Remote access to mm::mm_cid::pcpu requires rq_lock */
 	guard(task_rq_lock)(t);
+	__assume_ctx_lock(RQ_LOCK);
 	/* If the task is not active it is not in the users count */
 	if (!t->mm_cid.active)
 		return false;
@@ -10689,6 +10708,7 @@ static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm
 }
 
 static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
+	__must_hold(&mm->mm_cid.mutex)
 {
 	struct task_struct *p, *t;
 	unsigned int users;
@@ -10703,7 +10723,7 @@ static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
 	 * The caller has already transferred. The newly incoming task is
 	 * already accounted for, but not yet visible.
 	 */
-	users = mm->mm_cid.users - 2;
+	users = mm_cid_users(mm) - 2;
 	if (!users)
 		return;
 
@@ -10727,18 +10747,19 @@ static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
 	}
 }
 
-static void mm_cid_fixup_tasks_to_cpus(void)
+static void mm_cid_fixup_tasks_to_cpus(struct mm_struct *mm)
+	__must_hold(&mm->mm_cid.mutex)
 {
-	struct mm_struct *mm = current->mm;
-
 	mm_cid_do_fixup_tasks_to_cpus(mm);
 	mm_cid_complete_transit(mm, MM_CID_ONCPU);
 }
 
 static bool sched_mm_cid_add_user(struct task_struct *t, struct mm_struct *mm)
+	__must_hold(&mm->mm_cid.mutex)
+	__must_hold(&mm->mm_cid.lock)
 {
 	t->mm_cid.active = 1;
-	mm->mm_cid.users++;
+	mm->mm_cid.users++; /* mutex && lock */
 	return mm_update_max_cids(mm);
 }
 
@@ -10750,8 +10771,9 @@ void sched_mm_cid_fork(struct task_struct *t)
 	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
 
 	guard(mutex)(&mm->mm_cid.mutex);
-	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
-		struct mm_cid_pcpu *pcp = this_cpu_ptr(mm->mm_cid.pcpu);
+	scoped_guard(rq_lock_irq, this_rq()) {
+		struct mm_cid_pcpu *pcp = mm_cid_pcpu(&mm->mm_cid, this_rq());
+		guard(raw_spinlock)(&mm->mm_cid.lock);
 
 		/* First user ? */
 		if (!mm->mm_cid.users) {
@@ -10777,7 +10799,7 @@ void sched_mm_cid_fork(struct task_struct *t)
 	}
 
 	if (percpu) {
-		mm_cid_fixup_tasks_to_cpus();
+		mm_cid_fixup_tasks_to_cpus(mm);
 	} else {
 		mm_cid_fixup_cpus_to_tasks(mm);
 		t->mm_cid.cid = mm_get_cid(mm);
@@ -10785,6 +10807,8 @@ void sched_mm_cid_fork(struct task_struct *t)
 }
 
 static bool sched_mm_cid_remove_user(struct task_struct *t)
+	__must_hold(&t->mm->mm_cid.mutex)
+	__must_hold(&t->mm->mm_cid.lock)
 {
 	t->mm_cid.active = 0;
 	scoped_guard(preempt) {
@@ -10792,11 +10816,13 @@ static bool sched_mm_cid_remove_user(struct task_struct *t)
 		t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
 		mm_unset_cid_on_task(t);
 	}
-	t->mm->mm_cid.users--;
+	t->mm->mm_cid.users--; /* mutex && lock */
 	return mm_update_max_cids(t->mm);
 }
 
 static bool __sched_mm_cid_exit(struct task_struct *t)
+	__must_hold(&t->mm->mm_cid.mutex)
+	__must_hold(&t->mm->mm_cid.lock)
 {
 	struct mm_struct *mm = t->mm;
 
@@ -10837,8 +10863,9 @@ void sched_mm_cid_exit(struct task_struct *t)
 	 */
 	scoped_guard(mutex, &mm->mm_cid.mutex) {
 		/* mm_cid::mutex is sufficient to protect mm_cid::users */
-		if (likely(mm->mm_cid.users > 1)) {
-			scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
+		if (likely(mm_cid_users(mm) > 1)) {
+			scoped_guard(rq_lock_irq, this_rq()) {
+				guard(raw_spinlock)(&mm->mm_cid.lock);
 				if (!__sched_mm_cid_exit(t))
 					return;
 				/*
@@ -10847,16 +10874,17 @@ void sched_mm_cid_exit(struct task_struct *t)
 				 * TRANSIT bit. If the CID is owned by the CPU
 				 * then drop it.
 				 */
-				mm_drop_cid_on_cpu(mm, this_cpu_ptr(mm->mm_cid.pcpu));
+				mm_drop_cid_on_cpu(mm, mm_cid_pcpu(&mm->mm_cid, this_rq()));
 			}
 			mm_cid_fixup_cpus_to_tasks(mm);
 			return;
 		}
 		/* Last user */
-		scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
+		scoped_guard(rq_lock_irq, this_rq()) {
+			guard(raw_spinlock)(&mm->mm_cid.lock);
 			/* Required across execve() */
 			if (t == current)
-				mm_cid_transit_to_task(t, this_cpu_ptr(mm->mm_cid.pcpu));
+				mm_cid_transit_to_task(t, mm_cid_pcpu(&mm->mm_cid, this_rq()));
 			/* Ignore mode change. There is nothing to do. */
 			sched_mm_cid_remove_user(t);
 		}
@@ -10893,7 +10921,7 @@ static void mm_cid_work_fn(struct work_struct *work)
 
 	guard(mutex)(&mm->mm_cid.mutex);
 	/* Did the last user task exit already? */
-	if (!mm->mm_cid.users)
+	if (!mm_cid_users(mm))
 		return;
 
 	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
@@ -10924,13 +10952,14 @@ static void mm_cid_irq_work(struct irq_work *work)
 
 void mm_init_cid(struct mm_struct *mm, struct task_struct *p)
 {
-	mm->mm_cid.max_cids = 0;
 	mm->mm_cid.mode = 0;
-	mm->mm_cid.nr_cpus_allowed = p->nr_cpus_allowed;
-	mm->mm_cid.users = 0;
-	mm->mm_cid.pcpu_thrs = 0;
-	mm->mm_cid.update_deferred = 0;
-	raw_spin_lock_init(&mm->mm_cid.lock);
+	mm->mm_cid.max_cids = 0;
+	scoped_guard (raw_spinlock_init, &mm->mm_cid.lock) {
+		mm->mm_cid.nr_cpus_allowed = p->nr_cpus_allowed;
+		mm->mm_cid.users = 0;
+		mm->mm_cid.pcpu_thrs = 0;
+		mm->mm_cid.update_deferred = 0;
+	}
 	mutex_init(&mm->mm_cid.mutex);
 	mm->mm_cid.irq_work = IRQ_WORK_INIT_HARD(mm_cid_irq_work);
 	INIT_WORK(&mm->mm_cid.work, mm_cid_work_fn);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fd36ae390520..8c761822d6b2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3870,13 +3870,39 @@ static __always_inline void mm_cid_update_task_cid(struct task_struct *t, unsign
 	}
 }
 
-static __always_inline void mm_cid_update_pcpu_cid(struct mm_struct *mm, unsigned int cid)
+/*
+ * Helpers to convert from __rq_lockp(rq) to RQ_LOCK intermediate.
+ */
+static __always_inline
+void mm_cid_update_pcpu_cid(struct rq *rq, struct mm_struct *mm, unsigned int cid)
+	__must_hold(__rq_lockp(rq))
 {
+	__assume_ctx_lock(RQ_LOCK);
+	lockdep_assert(rq->cpu == smp_processor_id());
 	__this_cpu_write(mm->mm_cid.pcpu->cid, cid);
 }
 
-static __always_inline void mm_cid_from_cpu(struct task_struct *t, unsigned int cpu_cid,
-					    unsigned int mode)
+static __always_inline
+unsigned int mm_cid_pcpu_cid(struct rq *rq, struct mm_struct *mm)
+	__must_hold(__rq_lockp(rq))
+{
+	__assume_ctx_lock(RQ_LOCK);
+	lockdep_assert(rq->cpu == smp_processor_id());
+	return __this_cpu_read(mm->mm_cid.pcpu->cid);
+}
+
+static __always_inline
+struct mm_cid_pcpu *mm_cid_pcpu(struct mm_mm_cid *mc, struct rq *rq)
+	__must_hold(__rq_lockp(rq))
+
+{
+	__assume_ctx_lock(RQ_LOCK);
+	return per_cpu_ptr(mc->pcpu, rq->cpu);
+}
+
+static __always_inline void mm_cid_from_cpu(struct rq *rq, struct task_struct *t,
+					    unsigned int cpu_cid, unsigned int mode)
+	__must_hold(__rq_lockp(rq))
 {
 	unsigned int max_cids, tcid = t->mm_cid.cid;
 	struct mm_struct *mm = t->mm;
@@ -3906,12 +3932,13 @@ static __always_inline void mm_cid_from_cpu(struct task_struct *t, unsigned int
 		if (mode & MM_CID_TRANSIT)
 			cpu_cid = cpu_cid_to_cid(cpu_cid) | MM_CID_TRANSIT;
 	}
-	mm_cid_update_pcpu_cid(mm, cpu_cid);
+	mm_cid_update_pcpu_cid(rq, mm, cpu_cid);
 	mm_cid_update_task_cid(t, cpu_cid);
 }
 
-static __always_inline void mm_cid_from_task(struct task_struct *t, unsigned int cpu_cid,
-					     unsigned int mode)
+static __always_inline void mm_cid_from_task(struct rq *rq, struct task_struct *t,
+					     unsigned int cpu_cid, unsigned int mode)
+	__must_hold(__rq_lockp(rq))
 {
 	unsigned int max_cids, tcid = t->mm_cid.cid;
 	struct mm_struct *mm = t->mm;
@@ -3920,7 +3947,7 @@ static __always_inline void mm_cid_from_task(struct task_struct *t, unsigned int
 	/* Optimize for the common case, where both have the ONCPU bit clear */
 	if (likely(cid_on_task(tcid | cpu_cid))) {
 		if (likely(tcid < max_cids)) {
-			mm_cid_update_pcpu_cid(mm, tcid);
+			mm_cid_update_pcpu_cid(rq, mm, tcid);
 			return;
 		}
 		/* Try to converge into the optimal CID space */
@@ -3929,7 +3956,7 @@ static __always_inline void mm_cid_from_task(struct task_struct *t, unsigned int
 		/* Hand over or drop the CPU owned CID */
 		if (cid_on_cpu(cpu_cid)) {
 			if (cid_on_task(tcid))
-				mm_drop_cid_on_cpu(mm, this_cpu_ptr(mm->mm_cid.pcpu));
+				mm_drop_cid_on_cpu(mm, mm_cid_pcpu(&mm->mm_cid, rq));
 			else
 				tcid = cpu_cid_to_cid(cpu_cid);
 		}
@@ -3939,11 +3966,12 @@ static __always_inline void mm_cid_from_task(struct task_struct *t, unsigned int
 		/* Set the transition mode flag if required */
 		tcid |= mode & MM_CID_TRANSIT;
 	}
-	mm_cid_update_pcpu_cid(mm, tcid);
+	mm_cid_update_pcpu_cid(rq, mm, tcid);
 	mm_cid_update_task_cid(t, tcid);
 }
 
-static __always_inline void mm_cid_schedin(struct task_struct *next)
+static __always_inline void mm_cid_schedin(struct rq *rq, struct task_struct *next)
+	__must_hold(__rq_lockp(rq))
 {
 	struct mm_struct *mm = next->mm;
 	unsigned int cpu_cid, mode;
@@ -3951,15 +3979,16 @@ static __always_inline void mm_cid_schedin(struct task_struct *next)
 	if (!next->mm_cid.active)
 		return;
 
-	cpu_cid = __this_cpu_read(mm->mm_cid.pcpu->cid);
+	cpu_cid = mm_cid_pcpu_cid(rq, mm);
 	mode = READ_ONCE(mm->mm_cid.mode);
 	if (likely(!cid_on_cpu(mode)))
-		mm_cid_from_task(next, cpu_cid, mode);
+		mm_cid_from_task(rq, next, cpu_cid, mode);
 	else
-		mm_cid_from_cpu(next, cpu_cid, mode);
+		mm_cid_from_cpu(rq, next, cpu_cid, mode);
 }
 
-static __always_inline void mm_cid_schedout(struct task_struct *prev)
+static __always_inline void mm_cid_schedout(struct rq *rq, struct task_struct *prev)
+	__must_hold(__rq_lockp(rq))
 {
 	struct mm_struct *mm = prev->mm;
 	unsigned int mode, cid;
@@ -3980,7 +4009,7 @@ static __always_inline void mm_cid_schedout(struct task_struct *prev)
 			cid = cid_to_cpu_cid(cid);
 
 		/* Update both so that the next schedule in goes into the fast path */
-		mm_cid_update_pcpu_cid(mm, cid);
+		mm_cid_update_pcpu_cid(rq, mm, cid);
 		prev->mm_cid.cid = cid;
 	} else {
 		mm_drop_cid(mm, cid);
@@ -3988,10 +4017,12 @@ static __always_inline void mm_cid_schedout(struct task_struct *prev)
 	}
 }
 
-static inline void mm_cid_switch_to(struct task_struct *prev, struct task_struct *next)
+static inline void mm_cid_switch_to(struct rq *rq, struct task_struct *prev,
+				    struct task_struct *next)
+	__must_hold(__rq_lockp(rq))
 {
-	mm_cid_schedout(prev);
-	mm_cid_schedin(next);
+	mm_cid_schedout(rq, prev);
+	mm_cid_schedin(rq, next);
 }
 
 #else /* !CONFIG_SCHED_MM_CID: */

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 10:16                       ` Jiri Slaby
@ 2026-03-06 16:28                         ` Thomas Gleixner
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-06 16:28 UTC (permalink / raw)
  To: Jiri Slaby, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long

On Fri, Mar 06 2026 at 11:16, Jiri Slaby wrote:
> On 06. 03. 26, 10:57, Thomas Gleixner wrote:
>
> If that helps, I would likely be able to "bisect" your 4 mm_cid
> patches if they can be reverted on top of 6.19 easily. (By letting
> the kernel run in the build service.)

That would just introduce the other bugs again. Let me try to reproduce.




* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 11:06                       ` Matthieu Baerts
@ 2026-03-06 16:57                         ` Matthieu Baerts
  2026-03-06 18:31                           ` Jiri Slaby
  2026-03-06 21:40                           ` Matthieu Baerts
  0 siblings, 2 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-06 16:57 UTC (permalink / raw)
  To: Thomas Gleixner, Jiri Slaby, Peter Zijlstra
  Cc: Stefan Hajnoczi, Stefano Garzarella, kvm, virtualization, Netdev,
	rcu, MPTCP Linux, Linux Kernel, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org,
	Michal Koutný, Waiman Long

Hi Thomas, Jiri, Peter,

On 06/03/2026 12:06, Matthieu Baerts wrote:
> On 06/03/2026 10:57, Thomas Gleixner wrote:
>> On Fri, Mar 06 2026 at 06:48, Jiri Slaby wrote:
>>> On 05. 03. 26, 20:25, Thomas Gleixner wrote:
>>>> Is there simple way to reproduce?
>>>
>>> Unfortunately not at all. To date, I cannot even reproduce it locally; it
>>> reproduces exclusively in the openSUSE build service (and GitHub CI as per
>>> Matthieu's report). I have a project in there with packages which fail 
>>> more often than others:
>>>    https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
>>> But it's all green ATM.
>>>
>>> Builds of Go 1.24 and tests of rust 1.90 fail the most. The former even 
>>> takes only ~ 8 minutes, so it's not that intensive a build at all. So the
>>> reasons are unknown to me. At least, Go apparently uses threads for 
>>> building (unlike gcc/clang with forks/processes). Dunno about rust.
>>
>> I tried with tons of test cases which stress test mmcid with threads and
>> failed.
> 
> On my side, I didn't manage to reproduce it locally either.

Apparently I can now... sorry, I don't know why I was not able to do
that before!

(...)

> It is possible to locally launch the same command using the same QEMU
> version (but not the same host kernel) with the help of Docker:
> 
>   $ cd <kernel source code>
>   # docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
>     -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
>     manual normal
> 
> This will build a new kernel in O=.virtme/build, launch it and give you
> access to a prompt.
> 
> 
> After that, you can also use the "auto" mode with the last built
> image to boot the VM, only print "OK", stop and retry if there were no
> errors:
> 
>   $ cd <kernel source code>
>   $ echo 'echo OK' > .virtme-exec-run
>   # i=1; \
>     while docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
>     -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
>     vm auto normal; do \
>       echo "== Attempt: $i: OK =="; \
>       i=$((i+1)); \
>     done; \
>     echo "== Failure after $i attempts =="

After having sent this email, I re-checked on my side, and I was able to
reproduce this issue with the technique described above: using the
docker image with the "build" argument, then max 50 boot iterations with
the "vm auto normal" argument. I then ran 'git bisect' between v6.18 and
v6.19-rc1 to find the guilty commit, and got:

  653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")

Reverting it on top of v6.19-rc1 fixes the issue.
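
For reference, the bisection driver boiled down to something like this
(a rough sketch; 'bisect_step' and the file name are mine, the docker
invocation is the one quoted above):

```shell
# Sketch of a "git bisect run" helper: boot_cmd is whatever boots the
# freshly built kernel once, here the docker invocation quoted above.
bisect_step() {
	local boot_cmd="$1" i
	for i in $(seq 1 50); do
		if ! eval "$boot_cmd"; then
			echo "boot $i stalled"
			return 1	# tell git bisect: bad commit
		fi
	done
	return 0			# 50 clean boots: good commit
}

# From the kernel source tree:
#   git bisect start v6.19-rc1 v6.18
#   git bisect run bash -c '. ./bisect.sh && bisect_step \
#     "docker run -v $PWD:$PWD:rw -w $PWD --rm --privileged \
#      mptcp/mptcp-upstream-virtme-docker:latest vm auto normal"'
```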

Unfortunately, reverting it on top of Linus' tree causes some
conflicts. I did my best to resolve them, and with the patch attached
below -- also available in [1] -- I no longer have the issue. I don't
know if it is correct -- some quick tests don't show any issues -- nor
whether Jiri should test it. I guess the final fix will be different
from this simple revert.

Note: I also tried Peter's patch (thank you for sharing it!), but I can
still reproduce the issue with it on top of Linus' tree.
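
For context, the CID allocator this revert restores boils down to a
bitmap search with two reuse hints: the task's last CID, then the CPU's
last CID, then the first free bit. A rough single-threaded userspace
model of that logic (names and types are mine, not the kernel's):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_CIDS	8
#define CID_UNSET	UINT_MAX

static unsigned long cidmask;			/* models mm_cidmask(mm) */
static unsigned int pcpu_cid = CID_UNSET;	/* models mm->mm_cid.pcpu->cid */

static bool try_get(unsigned int cid, unsigned int max_cids, unsigned int *out)
{
	if (cid >= max_cids || (cidmask & (1UL << cid)))
		return false;		/* out of range or already taken */
	cidmask |= 1UL << cid;		/* models test_and_set_bit() */
	*out = pcpu_cid = cid;
	return true;
}

/* Last task CID first, then the CPU's last CID, then the first zero bit */
static unsigned int cid_get(unsigned int last_cid, unsigned int max_cids)
{
	unsigned int cid, i;

	if (try_get(last_cid, max_cids, &cid))
		return cid;
	if (try_get(pcpu_cid, max_cids, &cid))
		return cid;
	for (i = 0; i < max_cids; i++) {
		if (try_get(i, max_cids, &cid))
			return cid;
	}
	return CID_UNSET;	/* transient failure; the caller retries */
}
```

In the patch below, mm_cid_select() wraps this in an unbounded retry
loop for the transient-failure case, which a single-threaded model like
this obviously cannot exercise.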

[1] https://git.kernel.org/matttbe/net-next/c/5e4b47fd150c

Cheers,
Matt

---
diff --git a/include/linux/rseq.h b/include/linux/rseq.h
index b9d62fc2140d..ef4ff117d037 100644
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -84,6 +84,24 @@ static __always_inline void rseq_sched_set_ids_changed(struct task_struct *t)
 	t->rseq.event.ids_changed = true;
 }
 
+/*
+ * Invoked from switch_mm_cid() in context switch when the task gets a MM
+ * CID assigned.
+ *
+ * This does not raise TIF_NOTIFY_RESUME as that happens in
+ * rseq_sched_switch_event().
+ */
+static __always_inline void rseq_sched_set_task_mm_cid(struct task_struct *t, unsigned int cid)
+{
+	/*
+	 * Requires a comparison as the switch_mm_cid() code does not
+	 * provide a conditional for it readily. So avoid excessive updates
+	 * when nothing changes.
+	 */
+	if (t->rseq.ids.mm_cid != cid)
+		t->rseq.event.ids_changed = true;
+}
+
 /* Enforce a full update after RSEQ registration and when execve() failed */
 static inline void rseq_force_update(void)
 {
@@ -163,6 +181,7 @@ static inline void rseq_handle_slowpath(struct pt_regs *regs) { }
 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { }
 static inline void rseq_sched_switch_event(struct task_struct *t) { }
 static inline void rseq_sched_set_ids_changed(struct task_struct *t) { }
+static inline void rseq_sched_set_task_mm_cid(struct task_struct *t, unsigned int cid) { }
 static inline void rseq_force_update(void) { }
 static inline void rseq_virt_userspace_exit(void) { }
 static inline void rseq_fork(struct task_struct *t, u64 clone_flags) { }
diff --git a/include/linux/rseq_types.h b/include/linux/rseq_types.h
index da5fa6f40294..61d294d3bbd7 100644
--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -131,18 +131,18 @@ struct rseq_data { };
 /**
  * struct sched_mm_cid - Storage for per task MM CID data
  * @active:	MM CID is active for the task
- * @cid:	The CID associated to the task either permanently or
- *		borrowed from the CPU
+ * @cid:	The CID associated to the task
+ * @last_cid:	The last CID associated to the task
  */
 struct sched_mm_cid {
 	unsigned int		active;
 	unsigned int		cid;
+	unsigned int		last_cid;
 };
 
 /**
  * struct mm_cid_pcpu - Storage for per CPU MM_CID data
- * @cid:	The CID associated to the CPU either permanently or
- *		while a task with a CID is running
+ * @cid:	The CID associated to the CPU
  */
 struct mm_cid_pcpu {
 	unsigned int	cid;
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..af3f65f963e2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -999,6 +999,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 
 #ifdef CONFIG_SCHED_MM_CID
 	tsk->mm_cid.cid = MM_CID_UNSET;
+	tsk->mm_cid.last_cid = MM_CID_UNSET;
 	tsk->mm_cid.active = 0;
 #endif
 	return tsk;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..cc969711cb08 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5281,7 +5281,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		}
 	}
 
-	mm_cid_switch_to(prev, next);
+	switch_mm_cid(prev, next);
 
 	/*
 	 * Tell rseq that the task was scheduled in. Must be after
@@ -10634,7 +10634,7 @@ static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm
 	return true;
 }
 
-static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
+static void __maybe_unused mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
 {
 	struct task_struct *p, *t;
 	unsigned int users;
@@ -10673,7 +10673,7 @@ static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
 	}
 }
 
-static void mm_cid_fixup_tasks_to_cpus(void)
+static void __maybe_unused mm_cid_fixup_tasks_to_cpus(void)
 {
 	struct mm_struct *mm = current->mm;
 
@@ -10691,81 +10691,25 @@ static bool sched_mm_cid_add_user(struct task_struct *t, struct mm_struct *mm)
 void sched_mm_cid_fork(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
-	bool percpu;
 
 	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
 
 	guard(mutex)(&mm->mm_cid.mutex);
-	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
-		struct mm_cid_pcpu *pcp = this_cpu_ptr(mm->mm_cid.pcpu);
-
-		/* First user ? */
-		if (!mm->mm_cid.users) {
-			sched_mm_cid_add_user(t, mm);
-			t->mm_cid.cid = mm_get_cid(mm);
-			/* Required for execve() */
-			pcp->cid = t->mm_cid.cid;
-			return;
-		}
-
-		if (!sched_mm_cid_add_user(t, mm)) {
-			if (!cid_on_cpu(mm->mm_cid.mode))
-				t->mm_cid.cid = mm_get_cid(mm);
-			return;
-		}
-
-		/* Handle the mode change and transfer current's CID */
-		percpu = cid_on_cpu(mm->mm_cid.mode);
-		if (!percpu)
-			mm_cid_transit_to_task(current, pcp);
-		else
-			mm_cid_transit_to_cpu(current, pcp);
-	}
-
-	if (percpu) {
-		mm_cid_fixup_tasks_to_cpus();
-	} else {
-		mm_cid_fixup_cpus_to_tasks(mm);
-		t->mm_cid.cid = mm_get_cid(mm);
+	scoped_guard(raw_spinlock, &mm->mm_cid.lock) {
+		sched_mm_cid_add_user(t, mm);
+		/* Preset last_cid for mm_cid_select() */
+		t->mm_cid.last_cid = mm->mm_cid.max_cids - 1;
 	}
 }
 
 static bool sched_mm_cid_remove_user(struct task_struct *t)
 {
 	t->mm_cid.active = 0;
-	scoped_guard(preempt) {
-		/* Clear the transition bit */
-		t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
-		mm_unset_cid_on_task(t);
-	}
+	mm_unset_cid_on_task(t);
 	t->mm->mm_cid.users--;
 	return mm_update_max_cids(t->mm);
 }
 
-static bool __sched_mm_cid_exit(struct task_struct *t)
-{
-	struct mm_struct *mm = t->mm;
-
-	if (!sched_mm_cid_remove_user(t))
-		return false;
-	/*
-	 * Contrary to fork() this only deals with a switch back to per
-	 * task mode either because the above decreased users or an
-	 * affinity change increased the number of allowed CPUs and the
-	 * deferred fixup did not run yet.
-	 */
-	if (WARN_ON_ONCE(cid_on_cpu(mm->mm_cid.mode)))
-		return false;
-	/*
-	 * A failed fork(2) cleanup never gets here, so @current must have
-	 * the same MM as @t. That's true for exit() and the failed
-	 * pthread_create() cleanup case.
-	 */
-	if (WARN_ON_ONCE(current->mm != mm))
-		return false;
-	return true;
-}
-
 /*
  * When a task exits, the MM CID held by the task is not longer required as
  * the task cannot return to user space.
@@ -10776,48 +10720,10 @@ void sched_mm_cid_exit(struct task_struct *t)
 
 	if (!mm || !t->mm_cid.active)
 		return;
-	/*
-	 * Ensure that only one instance is doing MM CID operations within
-	 * a MM. The common case is uncontended. The rare fixup case adds
-	 * some overhead.
-	 */
-	scoped_guard(mutex, &mm->mm_cid.mutex) {
-		/* mm_cid::mutex is sufficient to protect mm_cid::users */
-		if (likely(mm->mm_cid.users > 1)) {
-			scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
-				if (!__sched_mm_cid_exit(t))
-					return;
-				/*
-				 * Mode change. The task has the CID unset
-				 * already and dealt with an eventually set
-				 * TRANSIT bit. If the CID is owned by the CPU
-				 * then drop it.
-				 */
-				mm_drop_cid_on_cpu(mm, this_cpu_ptr(mm->mm_cid.pcpu));
-			}
-			mm_cid_fixup_cpus_to_tasks(mm);
-			return;
-		}
-		/* Last user */
-		scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
-			/* Required across execve() */
-			if (t == current)
-				mm_cid_transit_to_task(t, this_cpu_ptr(mm->mm_cid.pcpu));
-			/* Ignore mode change. There is nothing to do. */
-			sched_mm_cid_remove_user(t);
-		}
-	}
 
-	/*
-	 * As this is the last user (execve(), process exit or failed
-	 * fork(2)) there is no concurrency anymore.
-	 *
-	 * Synchronize eventually pending work to ensure that there are no
-	 * dangling references left. @t->mm_cid.users is zero so nothing
-	 * can queue this work anymore.
-	 */
-	irq_work_sync(&mm->mm_cid.irq_work);
-	cancel_work_sync(&mm->mm_cid.work);
+	guard(mutex)(&mm->mm_cid.mutex);
+	scoped_guard(raw_spinlock, &mm->mm_cid.lock)
+		sched_mm_cid_remove_user(t);
 }
 
 /* Deactivate MM CID allocation across execve() */
@@ -10831,12 +10737,18 @@ void sched_mm_cid_after_execve(struct task_struct *t)
 {
 	if (t->mm)
 		sched_mm_cid_fork(t);
+	guard(preempt)();
+	mm_cid_select(t);
 }
 
 static void mm_cid_work_fn(struct work_struct *work)
 {
 	struct mm_struct *mm = container_of(work, struct mm_struct, mm_cid.work);
 
+	/* Make it compile, but not functional yet */
+	if (!IS_ENABLED(CONFIG_NEW_MM_CID))
+		return;
+
 	guard(mutex)(&mm->mm_cid.mutex);
 	/* Did the last user task exit already? */
 	if (!mm->mm_cid.users)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 43bbf0693cca..b60d49fc9c11 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4003,7 +4003,83 @@ static inline void mm_cid_switch_to(struct task_struct *prev, struct task_struct
 	mm_cid_schedin(next);
 }
 
+/* Active implementation */
+static inline void init_sched_mm_cid(struct task_struct *t)
+{
+	struct mm_struct *mm = t->mm;
+	unsigned int max_cid;
+
+	if (!mm)
+		return;
+
+	/* Preset last_mm_cid */
+	max_cid = min_t(int, READ_ONCE(mm->mm_cid.nr_cpus_allowed), atomic_read(&mm->mm_users));
+	t->mm_cid.last_cid = max_cid - 1;
+}
+
+static inline bool __mm_cid_get(struct task_struct *t, unsigned int cid, unsigned int max_cids)
+{
+	struct mm_struct *mm = t->mm;
+
+	if (cid >= max_cids)
+		return false;
+	if (test_and_set_bit(cid, mm_cidmask(mm)))
+		return false;
+	t->mm_cid.cid = t->mm_cid.last_cid = cid;
+	__this_cpu_write(mm->mm_cid.pcpu->cid, cid);
+	return true;
+}
+
+static inline bool mm_cid_get(struct task_struct *t)
+{
+	struct mm_struct *mm = t->mm;
+	unsigned int max_cids;
+
+	max_cids = READ_ONCE(mm->mm_cid.max_cids);
+
+	/* Try to reuse the last CID of this task */
+	if (__mm_cid_get(t, t->mm_cid.last_cid, max_cids))
+		return true;
+
+	/* Try to reuse the last CID of this mm on this CPU */
+	if (__mm_cid_get(t, __this_cpu_read(mm->mm_cid.pcpu->cid), max_cids))
+		return true;
+
+	/* Try the first zero bit in the cidmask. */
+	return __mm_cid_get(t, find_first_zero_bit(mm_cidmask(mm), num_possible_cpus()), max_cids);
+}
+
+static inline void mm_cid_select(struct task_struct *t)
+{
+	/*
+	 * mm_cid_get() can fail when the maximum CID, which is determined
+	 * by min(mm->nr_cpus_allowed, mm->mm_users) changes concurrently.
+	 * That's a transient failure as there cannot be more tasks
+	 * concurrently on a CPU (or about to be scheduled in) than that.
+	 */
+	for (;;) {
+		if (mm_cid_get(t))
+			break;
+	}
+}
+
+static inline void switch_mm_cid(struct task_struct *prev, struct task_struct *next)
+{
+	if (prev->mm_cid.active) {
+		if (prev->mm_cid.cid != MM_CID_UNSET)
+			clear_bit(prev->mm_cid.cid, mm_cidmask(prev->mm));
+		prev->mm_cid.cid = MM_CID_UNSET;
+	}
+
+	if (next->mm_cid.active) {
+		mm_cid_select(next);
+		rseq_sched_set_task_mm_cid(next, next->mm_cid.cid);
+	}
+}
+
 #else /* !CONFIG_SCHED_MM_CID: */
+static inline void mm_cid_select(struct task_struct *t) { }
+static inline void switch_mm_cid(struct task_struct *prev, struct task_struct *next) { }
 static inline void mm_cid_switch_to(struct task_struct *prev, struct task_struct *next) { }
 #endif /* !CONFIG_SCHED_MM_CID */
 
-- 
Sponsored by the NGI0 Core fund.


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 16:57                         ` Matthieu Baerts
@ 2026-03-06 18:31                           ` Jiri Slaby
  2026-03-06 18:44                             ` Matthieu Baerts
  2026-03-06 21:40                           ` Matthieu Baerts
  1 sibling, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-06 18:31 UTC (permalink / raw)
  To: Matthieu Baerts, Thomas Gleixner, Peter Zijlstra
  Cc: Stefan Hajnoczi, Stefano Garzarella, kvm, virtualization, Netdev,
	rcu, MPTCP Linux, Linux Kernel, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org,
	Michal Koutný, Waiman Long

On 06. 03. 26, 17:57, Matthieu Baerts wrote:
>    653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")

It looks like there were similar issues reported in the thread with the 
above submitted patch. Did the patchset mentioned at the end of the 
thread here:
   https://lore.kernel.org/all/87h5s4mjqw.ffs@tglx/
make it to the list/some tree? Or are the two issues there different 
from this one?

thanks,
-- 
js
suse labs


* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 18:31                           ` Jiri Slaby
@ 2026-03-06 18:44                             ` Matthieu Baerts
  0 siblings, 0 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-06 18:44 UTC (permalink / raw)
  To: Jiri Slaby, Thomas Gleixner, Peter Zijlstra
  Cc: Stefan Hajnoczi, Stefano Garzarella, kvm, virtualization, Netdev,
	rcu, MPTCP Linux, Linux Kernel, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto@kernel.org,
	Michal Koutný, Waiman Long

Hi Jiri,

On 06/03/2026 19:31, Jiri Slaby wrote:
> On 06. 03. 26, 17:57, Matthieu Baerts wrote:
>>    653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")
> 
> It looks like there were similar issues reported in the thread with the
> above submitted patch. Did the patchset mentioned at the end of the
> thread here:
>   https://lore.kernel.org/all/87h5s4mjqw.ffs@tglx/
> made it to the list/some tree? Or are the two issues there different
> from this one?

Yes, I think the two issues are different from what we are seeing. If
I'm not mistaken, the mentioned patchset is this one:

  https://lore.kernel.org/20260129210219.452851594@kernel.org

The v2 has been sent shortly after:

  https://lore.kernel.org/20260201192234.380608594@kernel.org

Applied and sent to Linus before the v6.19 release:

  https://lore.kernel.org/aYcPrLN6PV5xr63J@gmail.com

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 16:57                         ` Matthieu Baerts
  2026-03-06 18:31                           ` Jiri Slaby
@ 2026-03-06 21:40                           ` Matthieu Baerts
  1 sibling, 0 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-06 21:40 UTC (permalink / raw)
  To: Thomas Gleixner, Jiri Slaby, Peter Zijlstra
  Cc: Stefan Hajnoczi, Stefano Garzarella, kvm, virtualization, Netdev,
	rcu, MPTCP Linux, Linux Kernel, Shinichiro Kawasaki,
	Paul E. McKenney, Dave Hansen, luto, Michal Koutný,
	Waiman Long

06 Mar 2026 17:57:26 Matthieu Baerts <matttbe@kernel.org>:

(...)

> After sending this email, I re-checked on my side, and I was able to
> reproduce this issue with the technique described above: using the
> docker image with the "build" argument, then at most 50 boot iterations
> with the "vm auto normal" argument. I then used 'git bisect' between
> v6.18 and v6.19-rc1 to find the guilty commit, and got:
>
>   653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")
>
> Reverting it on top of v6.19-rc1 fixes the issue.
>
> Unfortunately, reverting it on top of Linus' tree causes some
> conflicts. I did my best to resolve them, and with this patch attached
> below -- also available in [1] -- I no longer have the issue. I don't
> know if it is correct -- some quick tests don't show any issues -- nor
> if Jiri should test it. I guess the final fix will be different from
> this simple revert.

As probably expected, even though this revert fixed the boot issues, it
caused a regression during test execution; see below.


[  493.608357][    C3] rcu: 3-....: (26000 ticks this GP) idle=d54c/1/0x4000000000000000 softirq=214151/214154 fqs=6500
[  493.609867][    C3] rcu: (t=26003 jiffies g=392697 q=1127 ncpus=4)
[  493.610566][    C3] CPU: 3 UID: 0 PID: 4961 Comm: sleep Tainted: G                 N  7.0.0-rc2+ #1 PREEMPT(full)
[  493.610575][    C3] Tainted: [N]=TEST
[  493.610577][    C3] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  493.610583][    C3] RIP: 0010:sched_mm_cid_after_execve (kernel/sched/sched.h:4026 (discriminator 2))
[  493.610596][    C3] Code: 05 00 00 4c 89 e8 48 c1 f8 06 4c 89 4c 24 08 49 8d bc c1 10 0b 00 00 e8 99 4a 90 00 4c 8b 4c 24 08 f0 4d 0f ab a9 10 0b 00 00 <0f> 82 b7 00 00 00 48 8b 54 24 10 44 8b 44 24 1c 48 b8 00 00 00 00
All code
========
   0: 05 00 00 4c 89       add    $0x894c0000,%eax
   5: e8 48 c1 f8 06       call   0x6f8c152
   a: 4c 89 4c 24 08       mov    %r9,0x8(%rsp)
   f: 49 8d bc c1 10 0b 00 lea    0xb10(%r9,%rax,8),%rdi
  16: 00
  17: e8 99 4a 90 00       call   0x904ab5
  1c: 4c 8b 4c 24 08       mov    0x8(%rsp),%r9
  21: f0 4d 0f ab a9 10 0b lock bts %r13,0xb10(%r9)
  28: 00 00
  2a:* 0f 82 b7 00 00 00   jb     0xe7 <-- trapping instruction
  30: 48 8b 54 24 10       mov    0x10(%rsp),%rdx
  35: 44 8b 44 24 1c       mov    0x1c(%rsp),%r8d
  3a: 48                   rex.W
  3b: b8 00 00 00 00       mov    $0x0,%eax

Code starting with the faulting instruction
===========================================
   0: 0f 82 b7 00 00 00    jb     0xbd
   6: 48 8b 54 24 10       mov    0x10(%rsp),%rdx
   b: 44 8b 44 24 1c       mov    0x1c(%rsp),%r8d
  10: 48                   rex.W
  11: b8 00 00 00 00       mov    $0x0,%eax
[  493.610600][    C3] RSP: 0018:ffffc90002957e00 EFLAGS: 00000247
[  493.610605][    C3] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
[  493.610609][    C3] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881000aba10
[  493.610612][    C3] RBP: dffffc0000000000 R08: ffffffff8185fe97 R09: ffff8881000aaf00
[  493.610615][    C3] R10: ffffed1020015743 R11: 0000000000000000 R12: ffffed1021dda7a8
[  493.610618][    C3] R13: 0000000000000000 R14: ffff8881000aaf00 R15: ffff88810eed3800
[  493.610663][    C3] FS:  0000000000000000(0000) GS:ffff8881cc110000(0000) knlGS:0000000000000000
[  493.610680][    C3] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  493.610684][    C3] CR2: 00007ffef3a7dd89 CR3: 00000001204a1005 CR4: 0000000000370ef0
[  493.610688][    C3] Call Trace:
[  493.610691][    C3]  <TASK>
[  493.610703][    C3]  bprm_execve (include/linux/rseq.h:140)
[  493.610717][    C3]  do_execveat_common.isra.0 (fs/exec.c:1846)
[  493.610731][    C3]  __x64_sys_execve (include/linux/fs.h:2539)
[  493.610740][    C3]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
[  493.610750][    C3]  ? exc_page_fault (arch/x86/mm/fault.c:1480 (discriminator 3))
[  493.610760][    C3]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[  493.610767][    C3] RIP: 0033:0x7fb401448140
[  493.610780][    C3] Code: Unable to access opcode bytes at 0x7fb401448116.

Code starting with the faulting instruction
===========================================
[  493.610782][    C3] RSP: 002b:00007ffef3a7da70 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  493.610787][    C3] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
[  493.610790][    C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  493.610793][    C3] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  493.610795][    C3] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  493.610796][    C3] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  493.610815][    C3]  </TASK>
main_loop_s: timed out
[  503.574195][   T17] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 35898 jiffies s: 8761 root: 0x8/.
[  503.575291][   T17] rcu: blocking rcu_node structures (internal RCU debug):
[  503.576282][   T17] Sending NMI from CPU 1 to CPUs 3:
[  503.577258][    C3] NMI backtrace for cpu 3
[  503.577269][    C3] CPU: 3 UID: 0 PID: 4961 Comm: sleep Tainted: G                 N  7.0.0-rc2+ #1 PREEMPT(full)
[  503.577277][    C3] Tainted: [N]=TEST
[  503.577279][    C3] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  503.577282][    C3] RIP: 0010:sched_mm_cid_after_execve (kernel/sched/sched.h:4022)
[  503.577295][    C3] Code: 34 2e 40 38 f0 7c 09 40 84 f6 0f 85 6e 01 00 00 8b 35 d6 72 d5 02 49 8d be 10 0b 00 00 e8 b6 cc f9 00 49 89 c5 41 80 3c 24 00 <0f> 85 5f 01 00 00 4d 8b b7 40 05 00 00 41 39 dd 0f 83 1e fd ff ff
All code
========
   0: 34 2e                xor    $0x2e,%al
   2: 40 38 f0             cmp    %sil,%al
   5: 7c 09                jl     0x10
   7: 40 84 f6             test   %sil,%sil
   a: 0f 85 6e 01 00 00    jne    0x17e
  10: 8b 35 d6 72 d5 02    mov    0x2d572d6(%rip),%esi        # 0x2d572ec
  16: 49 8d be 10 0b 00 00 lea    0xb10(%r14),%rdi
  1d: e8 b6 cc f9 00       call   0xf9ccd8
  22: 49 89 c5             mov    %rax,%r13
  25: 41 80 3c 24 00       cmpb   $0x0,(%r12)
  2a:* 0f 85 5f 01 00 00    jne    0x18f <-- trapping instruction
  30: 4d 8b b7 40 05 00 00 mov    0x540(%r15),%r14
  37: 41 39 dd             cmp    %ebx,%r13d
  3a: 0f 83 1e fd ff ff    jae    0xfffffffffffffd5e

Code starting with the faulting instruction
===========================================
   0: 0f 85 5f 01 00 00    jne    0x165
   6: 4d 8b b7 40 05 00 00 mov    0x540(%r15),%r14
   d: 41 39 dd             cmp    %ebx,%r13d
  10: 0f 83 1e fd ff ff    jae    0xfffffffffffffd34
[  503.577300][    C3] RSP: 0018:ffffc90002957e00 EFLAGS: 00000246
[  503.577305][    C3] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
[  503.577308][    C3] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffff8881000aba10
[  503.577311][    C3] RBP: dffffc0000000000 R08: ffffffff8185fe97 R09: ffff8881000aaf00
[  503.577314][    C3] R10: ffffed1020015743 R11: 0000000000000000 R12: ffffed1021dda7a8
[  503.577316][    C3] R13: 0000000000000001 R14: ffff8881000aaf00 R15: ffff88810eed3800
[  503.577335][    C3] FS:  0000000000000000(0000) GS:ffff8881cc110000(0000) knlGS:0000000000000000
[  503.577350][    C3] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  503.577353][    C3] CR2: 00007fb401448116 CR3: 00000001204a1005 CR4: 0000000000370ef0
[  503.577356][    C3] Call Trace:
[  503.577360][    C3]  <TASK>
[  503.577368][    C3]  bprm_execve (include/linux/rseq.h:140)
[  503.577377][    C3]  do_execveat_common.isra.0 (fs/exec.c:1846)
[  503.577385][    C3]  __x64_sys_execve (include/linux/fs.h:2539)
[  503.577392][    C3]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
[  503.577401][    C3]  ? exc_page_fault (arch/x86/mm/fault.c:1480 (discriminator 3))
[  503.577408][    C3]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[  503.577415][    C3] RIP: 0033:0x7fb401448140
[  503.577427][    C3] Code: Unable to access opcode bytes at 0x7fb401448116.

Code starting with the faulting instruction
===========================================
[  503.577430][    C3] RSP: 002b:00007ffef3a7da70 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  503.577435][    C3] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
[  503.577437][    C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  503.577439][    C3] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  503.577442][    C3] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  503.577444][    C3] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  503.577453][    C3]  </TASK>
[  530.927646][    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 60s! [sleep:4961]
[  530.927660][    C3] Modules linked in: nft_tproxy nf_tproxy_ipv6 nf_tproxy_ipv4 nft_socket nf_socket_ipv4 nf_socket_ipv6 nf_tables sch_netem tcp_diag mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[  530.927693][    C3] irq event stamp: 141062
[  530.927696][    C3] hardirqs last  enabled at (141061): irqentry_exit (kernel/entry/common.c:243)
[  530.927711][    C3] hardirqs last disabled at (141062): sysvec_apic_timer_interrupt (arch/x86/include/asm/hardirq.h:81)
[  530.927715][    C3] softirqs last  enabled at (140962): handle_softirqs (kernel/softirq.c:469 (discriminator 2))
[  530.927720][    C3] softirqs last disabled at (140949): __irq_exit_rcu (kernel/softirq.c:657)
[  530.927727][    C3] CPU: 3 UID: 0 PID: 4961 Comm: sleep Tainted: G                 N  7.0.0-rc2+ #1 PREEMPT(full)
[  530.927732][    C3] Tainted: [N]=TEST
[  530.927734][    C3] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  530.927736][    C3] RIP: 0010:sched_mm_cid_after_execve (kernel/sched/sched.h:4045 (discriminator 6))
[  530.927742][    C3] Code: 02 83 04 85 c0 0f 84 6b 02 00 00 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 c7 c7 40 f5 cd 83 e8 bd df 19 02 <49> 8d be 00 01 00 00 48 89 f8 48 c1 e8 03 80 3c 28 00 0f 85 65 02
All code
========
   0: 02 83 04 85 c0 0f    add    0xfc08504(%rbx),%al
   6: 84 6b 02             test   %ch,0x2(%rbx)
   9: 00 00                add    %al,(%rax)
   b: 48 83 c4 30          add    $0x30,%rsp
   f: 5b                   pop    %rbx
  10: 5d                   pop    %rbp
  11: 41 5c                pop    %r12
  13: 41 5d                pop    %r13
  15: 41 5e                pop    %r14
  17: 41 5f                pop    %r15
  19: c3                   ret
  1a: cc                   int3
  1b: cc                   int3
  1c: cc                   int3
  1d: cc                   int3
  1e: 48 c7 c7 40 f5 cd 83 mov    $0xffffffff83cdf540,%rdi
  25: e8 bd df 19 02       call   0x219dfe7
  2a:* 49 8d be 00 01 00 00 lea    0x100(%r14),%rdi <-- trapping instruction
  31: 48 89 f8             mov    %rdi,%rax
  34: 48 c1 e8 03          shr    $0x3,%rax
  38: 80 3c 28 00          cmpb   $0x0,(%rax,%rbp,1)
  3c: 0f                   .byte 0xf
  3d: 85 65 02             test   %esp,0x2(%rbp)

Code starting with the faulting instruction
===========================================
   0: 49 8d be 00 01 00 00 lea    0x100(%r14),%rdi
   7: 48 89 f8             mov    %rdi,%rax
   a: 48 c1 e8 03          shr    $0x3,%rax
   e: 80 3c 28 00          cmpb   $0x0,(%rax,%rbp,1)
  12: 0f                   .byte 0xf
  13: 85 65 02             test   %esp,0x2(%rbp)
[  530.927744][    C3] RSP: 0018:ffffc90002957e00 EFLAGS: 00000202
[  530.927747][    C3] RAX: 0000000000000003 RBX: 0000000000000001 RCX: 0000000000000001
[  530.927749][    C3] RDX: 0000000000000001 RSI: ffffffff83cdf540 RDI: ffffffff83eec920
[  530.927751][    C3] RBP: dffffc0000000000 R08: ffffffff8185fd4a R09: 0000000000000000
[  530.927752][    C3] R10: 0000000000000003 R11: 0000000000000000 R12: ffffed1021dda7a8
[  530.927753][    C3] R13: 0000000000000000 R14: ffff8881000aaf00 R15: ffff88810eed3800
[  530.927766][    C3] FS:  0000000000000000(0000) GS:ffff8881cc110000(0000) knlGS:0000000000000000
[  530.927776][    C3] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  530.927778][    C3] CR2: 00007fb401448116 CR3: 00000001204a1005 CR4: 0000000000370ef0
[  530.927780][    C3] Call Trace:
[  530.927784][    C3]  <TASK>
[  530.927792][    C3]  bprm_execve (include/linux/rseq.h:140)
[  530.927801][    C3]  do_execveat_common.isra.0 (fs/exec.c:1846)
[  530.927807][    C3]  __x64_sys_execve (include/linux/fs.h:2539)
[  530.927812][    C3]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
[  530.927817][    C3]  ? exc_page_fault (arch/x86/mm/fault.c:1480 (discriminator 3))
[  530.927821][    C3]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[  530.927826][    C3] RIP: 0033:0x7fb401448140
[  530.927835][    C3] Code: Unable to access opcode bytes at 0x7fb401448116.

Code starting with the faulting instruction
===========================================
[  530.927837][    C3] RSP: 002b:00007ffef3a7da70 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  530.927839][    C3] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
[  530.927840][    C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  530.927842][    C3] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  530.927843][    C3] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  530.927844][    C3] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  530.927854][    C3]  </TASK>
[  530.927857][    C3] Kernel panic - not syncing: softlockup: hung tasks
[  530.958801][    C3] CPU: 3 UID: 0 PID: 4961 Comm: sleep Tainted: G             L   N  7.0.0-rc2+ #1 PREEMPT(full)
[  530.960019][    C3] Tainted: [L]=SOFTLOCKUP, [N]=TEST
[  530.960652][    C3] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  530.961437][    C3] Call Trace:
[  530.961931][    C3]  <IRQ>
[  530.962208][    C3]  dump_stack_lvl (lib/dump_stack.c:122)
[  530.962740][    C3]  vpanic (kernel/panic.c:651)
[  530.963120][    C3]  panic (kernel/panic.c:787)
[  530.963502][    C3]  ? __pfx_panic (kernel/panic.c:783)
[  530.964024][    C3]  ? add_taint (arch/x86/include/asm/bitops.h:60)
[  530.964459][    C3]  watchdog_timer_fn.cold (kernel/watchdog.c:871)
[  530.965081][    C3]  ? __pfx_watchdog_timer_fn (kernel/watchdog.c:774)
[  530.965629][    C3]  __run_hrtimer (kernel/time/hrtimer.c:1785)
[  530.966417][    C3]  __hrtimer_run_queues (include/linux/timerqueue.h:25)
[  530.966930][    C3]  ? __pfx___hrtimer_run_queues (kernel/time/hrtimer.c:1819)
[  530.967470][    C3]  ? ktime_get_update_offsets_now (kernel/time/timekeeping.c:381)
[  530.968251][    C3]  hrtimer_interrupt (kernel/time/hrtimer.c:1914)
[  530.968990][    C3]  __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:37)
[  530.969783][    C3]  sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1056 (discriminator 47))
[  530.970343][    C3]  </IRQ>
[  530.970723][    C3]  <TASK>
[  530.971036][    C3]  asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697)
[  530.971947][    C3] RIP: 0010:sched_mm_cid_after_execve (kernel/sched/sched.h:4045 (discriminator 6))
[  530.972671][    C3] Code: 02 83 04 85 c0 0f 84 6b 02 00 00 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 c7 c7 40 f5 cd 83 e8 bd df 19 02 <49> 8d be 00 01 00 00 48 89 f8 48 c1 e8 03 80 3c 28 00 0f 85 65 02
All code
========
   0: 02 83 04 85 c0 0f    add    0xfc08504(%rbx),%al
   6: 84 6b 02             test   %ch,0x2(%rbx)
   9: 00 00                add    %al,(%rax)
   b: 48 83 c4 30          add    $0x30,%rsp
   f: 5b                   pop    %rbx
  10: 5d                   pop    %rbp
  11: 41 5c                pop    %r12
  13: 41 5d                pop    %r13
  15: 41 5e                pop    %r14
  17: 41 5f                pop    %r15
  19: c3                   ret
  1a: cc                   int3
  1b: cc                   int3
  1c: cc                   int3
  1d: cc                   int3
  1e: 48 c7 c7 40 f5 cd 83 mov    $0xffffffff83cdf540,%rdi
  25: e8 bd df 19 02       call   0x219dfe7
  2a:* 49 8d be 00 01 00 00 lea    0x100(%r14),%rdi <-- trapping instruction
  31: 48 89 f8             mov    %rdi,%rax
  34: 48 c1 e8 03          shr    $0x3,%rax
  38: 80 3c 28 00          cmpb   $0x0,(%rax,%rbp,1)
  3c: 0f                   .byte 0xf
  3d: 85 65 02             test   %esp,0x2(%rbp)

Code starting with the faulting instruction
===========================================
   0: 49 8d be 00 01 00 00 lea    0x100(%r14),%rdi
   7: 48 89 f8             mov    %rdi,%rax
   a: 48 c1 e8 03          shr    $0x3,%rax
   e: 80 3c 28 00          cmpb   $0x0,(%rax,%rbp,1)
  12: 0f                   .byte 0xf
  13: 85 65 02             test   %esp,0x2(%rbp)
[  530.974394][    C3] RSP: 0018:ffffc90002957e00 EFLAGS: 00000202
[  530.975308][    C3] RAX: 0000000000000003 RBX: 0000000000000001 RCX: 0000000000000001
[  530.976147][    C3] RDX: 0000000000000001 RSI: ffffffff83cdf540 RDI: ffffffff83eec920
[  530.977030][    C3] RBP: dffffc0000000000 R08: ffffffff8185fd4a R09: 0000000000000000
[  530.977910][    C3] R10: 0000000000000003 R11: 0000000000000000 R12: ffffed1021dda7a8
[  530.978926][    C3] R13: 0000000000000000 R14: ffff8881000aaf00 R15: ffff88810eed3800
[  530.979595][    C3]  ? sched_mm_cid_after_execve (arch/x86/include/asm/bitops.h:136 (discriminator 1))
[  530.980252][    C3]  bprm_execve (include/linux/rseq.h:140)
[  530.980941][    C3]  do_execveat_common.isra.0 (fs/exec.c:1846)
[  530.981424][    C3]  __x64_sys_execve (include/linux/fs.h:2539)
[  530.981941][    C3]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1))
[  530.982406][    C3]  ? exc_page_fault (arch/x86/mm/fault.c:1480 (discriminator 3))
[  530.983097][    C3]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[  530.983809][    C3] RIP: 0033:0x7fb401448140
[  530.984409][    C3] Code: Unable to access opcode bytes at 0x7fb401448116.

Code starting with the faulting instruction
===========================================
[  530.985240][    C3] RSP: 002b:00007ffef3a7da70 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  530.986149][    C3] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
[  530.986945][    C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  530.987807][    C3] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  530.988490][    C3] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  530.989334][    C3] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  530.990257][    C3]  </TASK>
[  530.991655][    C3] Kernel Offset: disabled


More details:

  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22775627304#summary-66068034344

Cheers,
Matt

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-06 15:24                       ` Peter Zijlstra
@ 2026-03-07  9:01                         ` Thomas Gleixner
  2026-03-07 22:29                           ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-07  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Slaby, Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On Fri, Mar 06 2026 at 16:24, Peter Zijlstra wrote:
> On Fri, Mar 06, 2026 at 10:57:15AM +0100, Thomas Gleixner wrote:
>
>> I tried with tons of test cases which stress test mmcid with threads and
>> failed.
>
> Are some of those in tools/testing/selftests ?
>
> Anyway, I was going over that code, and I noticed that there seems to be
> inconsistent locking for mm_mm_cid::pcpu.
>
> There's a bunch of sites that state we need rq->lock for remote access;
> but then things like sched_mm_cid_fork() and sched_mm_cid_exit() seem to
> think that holding mm_cid->lock is sufficient.
>
> This doesn't make sense to me, but maybe I missed something.

fork() and exit() are fully serialized. There can't be a mode change
with remote access going on concurrently.

I gave up staring at it yesterday as my brain started to melt. Let me
try again.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-07  9:01                         ` Thomas Gleixner
@ 2026-03-07 22:29                           ` Thomas Gleixner
  2026-03-08  9:15                             ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-07 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Slaby, Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On Sat, Mar 07 2026 at 10:01, Thomas Gleixner wrote:
> I gave up staring at it yesterday as my brain started to melt. Let me
> try again.

[Un]Surprisingly a rested and awake brain works way better.

The good news is that I actually found a nasty brown paperbag bug in
mm_cid_schedout() while going through all of this with a fine-tooth comb:

     cid = cid_from_transit_cid(...);

     That preserves the MM_CID_ONCPU bit, which makes mm_drop_cid()
     clear bit 0x40000000 + CID. That is obviously way outside of the
     bitmap. So the actual CID bit is not cleared and the clear just
     corrupts some other piece of memory.

     I just retried with all the K*SAN muck enabled which should catch
     that out of bounds access, but it never triggered and I haven't
     seen syzbot reports to that effect either.

     Fix for that is below.

The bad news is that I couldn't yet come up with a scenario where this
bug leads to the outcome observed by Jiri and Matthieu, because the
undropped CID bit in the bitmap is, by chance, cleaned up on the next
schedule-in on that CPU due to the ONCPU bit still being set.

I'll look at it more tomorrow in the hope that this rested brain
approach works out again.

Thanks,

        tglx
---
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3809,7 +3809,8 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
-	clear_bit(cid, mm_cidmask(mm));
+	if (!WARN_ON_ONCE(cid >= num_possible_cpus()))
+		clear_bit(cid, mm_cidmask(mm));
 }
 
 static __always_inline void mm_unset_cid_on_task(struct task_struct *t)
@@ -3978,7 +3979,13 @@ static __always_inline void mm_cid_sched
 		return;
 
 	mode = READ_ONCE(mm->mm_cid.mode);
+
+	/*
+	 * Needs to clear both TRANSIT and ONCPU to make the range comparison
+	 * and mm_drop_cid() work correctly.
+	 */
 	cid = cid_from_transit_cid(prev->mm_cid.cid);
+	cid = cpu_cid_to_cid(cid);
 
 	/*
 	 * If transition mode is done, transfer ownership when the CID is
@@ -3994,6 +4001,11 @@ static __always_inline void mm_cid_sched
 	} else {
 		mm_drop_cid(mm, cid);
 		prev->mm_cid.cid = MM_CID_UNSET;
+		/*
+		 * Invalidate the per CPU CID so that the next mm_cid_schedin()
+		 * can't observe MM_CID_ONCPU on the per CPU CID.
+		 */
+		mm_cid_update_pcpu_cid(mm, 0);
 	}
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-07 22:29                           ` Thomas Gleixner
@ 2026-03-08  9:15                             ` Thomas Gleixner
  2026-03-08 16:55                               ` Jiri Slaby
  2026-03-08 16:58                               ` Thomas Gleixner
  0 siblings, 2 replies; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-08  9:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Slaby, Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On Sat, Mar 07 2026 at 23:29, Thomas Gleixner wrote:
> I'll look at it more tomorrow in the hope that this rested brain
> approach works out again.

There is another one of the same category. Combo patch below.

Thanks,

        tglx
---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10584,6 +10584,11 @@ static void mm_cid_fixup_cpus_to_tasks(s
 
 		/* Remote access to mm::mm_cid::pcpu requires rq_lock */
 		guard(rq_lock_irq)(rq);
+
+		/* If the transit bit is set already, nothing to do anymore.  */
+		if (cid_in_transit(pcp->cid))
+			continue;
+
 		/* Is the CID still owned by the CPU? */
 		if (cid_on_cpu(pcp->cid)) {
 			/*
@@ -10598,12 +10603,9 @@ static void mm_cid_fixup_cpus_to_tasks(s
 		} else if (rq->curr->mm == mm && rq->curr->mm_cid.active) {
 			unsigned int cid = rq->curr->mm_cid.cid;
 
-			/* Ensure it has the transition bit set */
-			if (!cid_in_transit(cid)) {
-				cid = cid_to_transit_cid(cid);
-				rq->curr->mm_cid.cid = cid;
-				pcp->cid = cid;
-			}
+			cid = cid_to_transit_cid(cid);
+			rq->curr->mm_cid.cid = cid;
+			pcp->cid = cid;
 		}
 	}
 	mm_cid_complete_transit(mm, 0);
@@ -10733,11 +10735,30 @@ void sched_mm_cid_fork(struct task_struc
 static bool sched_mm_cid_remove_user(struct task_struct *t)
 {
 	t->mm_cid.active = 0;
-	scoped_guard(preempt) {
-		/* Clear the transition bit */
+	/*
+	 * If @t is current and the CID is in transition mode, then this has to
+	 * handle both the task and the per CPU storage.
+	 *
+	 * If the CID has TRANSIT and ONCPU set, then mm_unset_cid_on_task()
+	 * won't drop the CID. As @t has already mm_cid::active cleared
+	 * mm_cid_schedout() won't drop it either.
+	 *
+	 * A failed fork cleanup can't have the transit bit set because the task
+	 * never showed up in the task list or got on a CPU.
+	 */
+	if (t == current) {
+		/* Invalidate the per CPU CID */
+		this_cpu_ptr(t->mm->mm_cid.pcpu)->cid = 0;
+		/*
+		 * Clear TRANSIT and ONCPU, so the CID gets actually dropped
+		 * below.
+		 */
 		t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
-		mm_unset_cid_on_task(t);
+		t->mm_cid.cid = cpu_cid_to_cid(t->mm_cid.cid);
 	}
+
+	mm_unset_cid_on_task(t);
+
 	t->mm->mm_cid.users--;
 	return mm_update_max_cids(t->mm);
 }
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3809,7 +3809,8 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
-	clear_bit(cid, mm_cidmask(mm));
+	if (!WARN_ON_ONCE(cid >= num_possible_cpus()))
+		clear_bit(cid, mm_cidmask(mm));
 }
 
 static __always_inline void mm_unset_cid_on_task(struct task_struct *t)
@@ -3978,7 +3979,13 @@ static __always_inline void mm_cid_sched
 		return;
 
 	mode = READ_ONCE(mm->mm_cid.mode);
+
+	/*
+	 * Needs to clear both TRANSIT and ONCPU to make the range comparison
+	 * and mm_drop_cid() work correctly.
+	 */
 	cid = cid_from_transit_cid(prev->mm_cid.cid);
+	cid = cpu_cid_to_cid(cid);
 
 	/*
 	 * If transition mode is done, transfer ownership when the CID is
@@ -3994,6 +4001,11 @@ static __always_inline void mm_cid_sched
 	} else {
 		mm_drop_cid(mm, cid);
 		prev->mm_cid.cid = MM_CID_UNSET;
+		/*
+		 * Invalidate the per CPU CID so that the next mm_cid_schedin()
+		 * can't observe MM_CID_ONCPU on the per CPU CID.
+		 */
+		mm_cid_update_pcpu_cid(mm, 0);
 	}
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-08  9:15                             ` Thomas Gleixner
@ 2026-03-08 16:55                               ` Jiri Slaby
  2026-03-08 16:58                               ` Thomas Gleixner
  1 sibling, 0 replies; 45+ messages in thread
From: Jiri Slaby @ 2026-03-08 16:55 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On 08. 03. 26, 10:15, Thomas Gleixner wrote:
> On Sat, Mar 07 2026 at 23:29, Thomas Gleixner wrote:
>> I'll look at it more tomorrow in the hope that this rested brain
>> approach works out again.
> 
> There is another one of the same category. Combo patch below.

Thanks, submitted:
   https://build.opensuse.org/requests/1337507

I will report once I have something...

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-08  9:15                             ` Thomas Gleixner
  2026-03-08 16:55                               ` Jiri Slaby
@ 2026-03-08 16:58                               ` Thomas Gleixner
  2026-03-08 17:23                                 ` Matthieu Baerts
  1 sibling, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-08 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Slaby, Matthieu Baerts, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen,
	luto@kernel.org, Michal Koutný, Waiman Long, Marco Elver

On Sun, Mar 08 2026 at 10:15, Thomas Gleixner wrote:

> On Sat, Mar 07 2026 at 23:29, Thomas Gleixner wrote:
>> I'll look at it more tomorrow in the hope that this rested brain
>> approach works out again.
>
> There is another one of the same category. Combo patch below.

This rested brain thing is clearly a myth. The patch actually solves
nothing because the code ensures that the TRANSIT bit is never set
together with the ONCPU bit.

One of those moments where you just hope that the earth opens up and
swallows you.

So I'm back to square one. I'll go and do what I should have done in
the first place: write a debug patch with trace_printk()s and let the
people who can actually trigger the problem run with it.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-08 16:58                               ` Thomas Gleixner
@ 2026-03-08 17:23                                 ` Matthieu Baerts
  2026-03-09  8:43                                   ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-08 17:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

08 Mar 2026 17:58:26 Thomas Gleixner <tglx@kernel.org>:

> On Sun, Mar 08 2026 at 10:15, Thomas Gleixner wrote:
>
>> On Sat, Mar 07 2026 at 23:29, Thomas Gleixner wrote:
>>> I'll look at it more tomorrow in the hope that this rested brain
>>> approach works out again.
>>
>> There is another one of the same category. Combo patch below.
>
> This rested brain thing is clearly a myth. The patch actually solves
> nothing because the code ensures that the TRANSIT bit is never set
> together with the ONCPU bit.

Thank you for sharing these patches. I confirm the myth: I can
still reproduce the issue on my side.

> One of those moments where you just hope that the earth opens up and
> swallows you.
>
> So I'm back to square one. I go and do what I should have done in the
> first place. Write a debug patch with trace_printks and let the people
> who can actually trigger the problem run with it.

Happy to test such debug patches!

Cheers,
Matt

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-08 17:23                                 ` Matthieu Baerts
@ 2026-03-09  8:43                                   ` Thomas Gleixner
  2026-03-09 12:23                                     ` Matthieu Baerts
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-09  8:43 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Peter Zijlstra, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Sun, Mar 08 2026 at 18:23, Matthieu Baerts wrote:
> 08 Mar 2026 17:58:26 Thomas Gleixner <tglx@kernel.org>:
>> So I'm back to square one. I go and do what I should have done in the
>> first place. Write a debug patch with trace_printks and let the people
>> who can actually trigger the problem run with it.
>
> Happy to test such debug patches!

See below.

Enable the tracepoints either on the kernel command line:

    trace_event=sched_switch,mmcid:*

or before starting the test case:

    echo 1 >/sys/kernel/tracing/events/sched/sched_switch/enable
    echo 1 >/sys/kernel/tracing/events/mmcid/enable

I added a 50ms timeout into mm_get_cid() which freezes the trace and
emits a warning. If you enable panic_on_warn and ftrace_dump_on_oops,
then it dumps the trace buffer once it hits the warning.

Either kernel command line:

   panic_on_warn ftrace_dump_on_oops

or

  echo 1 >/proc/sys/kernel/panic_on_warn
  echo 1 >/proc/sys/kernel/ftrace_dump_on_oops

That should provide enough information to decode this mystery.

Thanks,

        tglx
---
 include/trace/events/mmcid.h |  138 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c          |   10 +++
 kernel/sched/sched.h         |   20 +++++-
 3 files changed, 165 insertions(+), 3 deletions(-)

--- /dev/null
+++ b/include/trace/events/mmcid.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mmcid
+
+#if !defined(_TRACE_MMCID_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MMCID_H
+
+#include <linux/sched.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(mmcid_class,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid),
+
+	TP_STRUCT__entry(
+		__field( void *,	mm	)
+		__field( unsigned int,	cid	)
+	),
+
+	TP_fast_assign(
+		__entry->mm	= mm;
+		__entry->cid	= cid;
+	),
+
+	TP_printk("mm=%p cid=%08x", __entry->mm, __entry->cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_getcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_putcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_task_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( void *,	t	)
+		__field( void *,	mm	)
+		__field( unsigned int,	cid	)
+	),
+
+	TP_fast_assign(
+		__entry->t	= t;
+		__entry->mm	= mm;
+		__entry->cid	= cid;
+	),
+
+	TP_printk("t=%p mm=%p cid=%08x", __entry->t, __entry->mm, __entry->cid)
+);
+
+DEFINE_EVENT(mmcid_task_class, mmcid_task_update,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_cpu_class,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	cpu	)
+		__field( void *,	mm	)
+		__field( unsigned int,	cid	)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= cpu;
+		__entry->mm	= mm;
+		__entry->cid	= cid;
+	),
+
+	TP_printk("cpu=%u mm=%p cid=%08x", __entry->cpu, __entry->mm, __entry->cid)
+);
+
+DEFINE_EVENT(mmcid_cpu_class, mmcid_cpu_update,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_user_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm),
+
+	TP_STRUCT__entry(
+		__field( void *,	t	)
+		__field( void *,	mm	)
+		__field( unsigned int,	users	)
+	),
+
+	TP_fast_assign(
+		__entry->t	= t;
+		__entry->mm	= mm;
+		__entry->users	= mm->mm_cid.users;
+	),
+
+	TP_printk("t=%p mm=%p users=%u", __entry->t, __entry->mm, __entry->users)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_add,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_del,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	     TP_ARGS(t, mm)
+);
+
+#endif
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -86,6 +86,7 @@
 #include <linux/sched/rseq_api.h>
 #include <trace/events/sched.h>
 #include <trace/events/ipi.h>
+#include <trace/events/mmcid.h>
 #undef CREATE_TRACE_POINTS
 
 #include "sched.h"
@@ -10569,7 +10570,9 @@ static inline void mm_cid_transit_to_tas
 		unsigned int cid = cpu_cid_to_cid(t->mm_cid.cid);
 
 		t->mm_cid.cid = cid_to_transit_cid(cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10602,7 +10605,9 @@ static void mm_cid_fixup_cpus_to_tasks(s
 			if (!cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
+				trace_mmcid_task_update(rq->curr, rq->curr->mm, cid);
 				pcp->cid = cid;
+				trace_mmcid_cpu_update(cpu, mm, cid);
 			}
 		}
 	}
@@ -10613,7 +10618,9 @@ static inline void mm_cid_transit_to_cpu
 {
 	if (cid_on_task(t->mm_cid.cid)) {
 		t->mm_cid.cid = cid_to_transit_cid(t->mm_cid.cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10685,6 +10692,7 @@ static bool sched_mm_cid_add_user(struct
 {
 	t->mm_cid.active = 1;
 	mm->mm_cid.users++;
+	trace_mmcid_user_add(t, mm);
 	return mm_update_max_cids(mm);
 }
 
@@ -10727,6 +10735,7 @@ void sched_mm_cid_fork(struct task_struc
 	} else {
 		mm_cid_fixup_cpus_to_tasks(mm);
 		t->mm_cid.cid = mm_get_cid(mm);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	}
 }
 
@@ -10739,6 +10748,7 @@ static bool sched_mm_cid_remove_user(str
 		mm_unset_cid_on_task(t);
 	}
 	t->mm->mm_cid.users--;
+	trace_mmcid_user_del(t, t->mm);
 	return mm_update_max_cids(t->mm);
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -75,6 +75,7 @@
 #include <linux/delayacct.h>
 #include <linux/mmu_context.h>
 
+#include <trace/events/mmcid.h>
 #include <trace/events/power.h>
 #include <trace/events/sched.h>
 
@@ -3809,6 +3810,7 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_putcid(mm, cid);
 	clear_bit(cid, mm_cidmask(mm));
 }
 
@@ -3817,6 +3819,7 @@ static __always_inline void mm_unset_cid
 	unsigned int cid = t->mm_cid.cid;
 
 	t->mm_cid.cid = MM_CID_UNSET;
+	trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	if (cid_on_task(cid))
 		mm_drop_cid(t->mm, cid);
 }
@@ -3838,6 +3841,7 @@ static inline unsigned int __mm_get_cid(
 		return MM_CID_UNSET;
 	if (test_and_set_bit(cid, mm_cidmask(mm)))
 		return MM_CID_UNSET;
+	trace_mmcid_getcid(mm, cid);
 	return cid;
 }
 
@@ -3845,9 +3849,17 @@ static inline unsigned int mm_get_cid(st
 {
 	unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));
 
-	while (cid == MM_CID_UNSET) {
-		cpu_relax();
-		cid = __mm_get_cid(mm, num_possible_cpus());
+	if (cid == MM_CID_UNSET) {
+		ktime_t t0 = ktime_get();
+
+		while (cid == MM_CID_UNSET) {
+			cpu_relax();
+			cid = __mm_get_cid(mm, num_possible_cpus());
+			if (ktime_get() - t0 > 50 * NSEC_PER_MSEC) {
+				tracing_off();
+				WARN_ON_ONCE(1);
+			}
+		}
 	}
 	return cid;
 }
@@ -3874,6 +3886,7 @@ static inline unsigned int mm_cid_conver
 static __always_inline void mm_cid_update_task_cid(struct task_struct *t, unsigned int cid)
 {
 	if (t->mm_cid.cid != cid) {
+		trace_mmcid_task_update(t, t->mm, cid);
 		t->mm_cid.cid = cid;
 		rseq_sched_set_ids_changed(t);
 	}
@@ -3881,6 +3894,7 @@ static __always_inline void mm_cid_updat
 
 static __always_inline void mm_cid_update_pcpu_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_cpu_update(smp_processor_id(), mm, cid);
 	__this_cpu_write(mm->mm_cid.pcpu->cid, cid);
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-09  8:43                                   ` Thomas Gleixner
@ 2026-03-09 12:23                                     ` Matthieu Baerts
  2026-03-10  8:09                                       ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-09 12:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

Hi Thomas,

On 09/03/2026 09:43, Thomas Gleixner wrote:
> On Sun, Mar 08 2026 at 18:23, Matthieu Baerts wrote:
>> 08 Mar 2026 17:58:26 Thomas Gleixner <tglx@kernel.org>:
>>> So I'm back to square one. I go and do what I should have done in the
>>> first place. Write a debug patch with trace_printks and let the people
>>> who can actually trigger the problem run with it.
>>
>> Happy to test such debug patches!
> 
> See below.
> 
> Enable the tracepoints either on the kernel command line:
> 
>     trace_event=sched_switch,mmcid:*
> 
> or before starting the test case:
> 
>     echo 1 >/sys/kernel/tracing/events/sched/sched_switch/enable
>     echo 1 >/sys/kernel/tracing/events/mmcid/enable
> 
> I added a 50ms timeout into mm_get_cid() which freezes the trace and
> emits a warning. If you enable panic_on_warn and ftrace_dump_on_oops,
> then it dumps the trace buffer once it hits the warning.
> 
> Either kernel command line:
> 
>    panic_on_warn ftrace_dump_on_oops
> 
> or
> 
>   echo 1 >/proc/sys/kernel/panic_on_warn
>   echo 1 >/proc/sys/kernel/ftrace_dump_on_oops
> 
> That should provide enough information to decode this mystery.

Thank you for the debug patch and the clear instructions. I managed to
reproduce the issue with the extra debug. The output is available here:

  https://github.com/user-attachments/files/25841808/issue-617-debug.txt.gz

Just in case, the kernel config file that was used:

  https://github.com/user-attachments/files/25841873/issue-617-debug.config.gz

Please tell me if it is an issue to download these files from GitHub.
The output file has 10k+ lines.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-09 12:23                                     ` Matthieu Baerts
@ 2026-03-10  8:09                                       ` Thomas Gleixner
  2026-03-10  8:20                                         ` Thomas Gleixner
  2026-03-10  8:56                                         ` Jiri Slaby
  0 siblings, 2 replies; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10  8:09 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Peter Zijlstra, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

Matthieu!

On Mon, Mar 09 2026 at 13:23, Matthieu Baerts wrote:
> On 09/03/2026 09:43, Thomas Gleixner wrote:
>> That should provide enough information to decode this mystery.

That was wishful thinking, but at least it narrows down the search space.

> Thank you for the debug patch and the clear instructions. I managed to
> reproduce the issue with the extra debug. The ouput is available here:
>
>   https://github.com/user-attachments/files/25841808/issue-617-debug.txt.gz

Thank you for testing. So what I can see from the trace is:

[    2.101917] virtme-n-68        3d..1. 703536us : mmcid_user_add: t=00000000e4425b1d mm=00000000a22be644 users=3
[    2.102057] virtme-n-68        3d..1. 703537us : mmcid_getcid: mm=00000000a22be644 cid=00000002
[    2.102195] virtme-n-68        3d..2. 703548us : sched_switch: prev_comm=virtme-ng-init prev_pid=68 prev_prio=120 prev_state=D ==> next_comm=swapper/3 next_pid=0 next_prio=120

This one creates the third thread related to the mm and schedules
out. The new thread schedules in a moment later:

[    2.102828]   <idle>-0         2d..2. 703565us : sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=69 next_prio=120
[    2.103039]   <idle>-0         2d..2. 703567us : mmcid_cpu_update: cpu=2 mm=00000000a22be644 cid=00000002

[    2.104283]   <idle>-0         0d..2. 703642us : sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=1 next_prio=120
[    2.104493]   <idle>-0         0d..2. 703643us : mmcid_cpu_update: cpu=0 mm=00000000a22be644 cid=00000000

virtme-n-1 owns CID 0 and, after being scheduled in, creates the 4th thread,
which is still in the CID space (0..3)

[    2.104616] virtme-n-1         0d..1. 703690us : mmcid_user_add: t=0000000031a5ee91 mm=00000000a22be644 users=4

Unsurprisingly this assigns CID 3:

[    2.104757] virtme-n-1         0d..1. 703691us : mmcid_getcid: mm=00000000a22be644 cid=00000003

And the newly created task schedules in on CPU3:

[    2.104880]   <idle>-0         3d..2. 703708us : sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=70 next_prio=120
[    2.105091]   <idle>-0         3d..2. 703708us : mmcid_cpu_update: cpu=3 mm=00000000a22be644 cid=00000003

Now n-1 continues and creates the 5th thread:

[    2.105227] virtme-n-1         0d..1. 703730us : mmcid_user_add: t=00000000f2e4a8c8 mm=00000000a22be644 users=5

which makes it switch to per CPU ownership mode. Then it continues to go
through the tasks in mm_cid_do_fixup_tasks_to_cpus() and fixes up:

[    2.105368] virtme-n-1         0d..1. 703730us : mmcid_task_update: t=00000000c923c125 mm=00000000a22be644 cid=20000000
[    2.105509] virtme-n-1         0d..1. 703731us : mmcid_cpu_update: cpu=0 mm=00000000a22be644 cid=20000000

Itself to be in TRANSIT mode

[    2.105632] virtme-n-1         0d..2. 703731us : mmcid_task_update: t=00000000478c5e8d mm=00000000a22be644 cid=80000000
[    2.105773] virtme-n-1         0d..2. 703731us : mmcid_putcid: mm=00000000a22be644 cid=00000001

Drops the CID of one task which is not on a CPU

[    2.105896] virtme-n-1         0d..2. 703731us : mmcid_task_update: t=0000000031a5ee91 mm=00000000a22be644 cid=20000003
[    2.106037] virtme-n-1         0d..2. 703731us : mmcid_cpu_update: cpu=3 mm=00000000a22be644 cid=20000003

and puts the third one correctly into TRANSIT mode

[    2.106174] virtme-n-69        2d..2. 703736us : sched_switch: prev_comm=virtme-ng-init prev_pid=69 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120

Here the one which owns CID 2 schedules out without notice, which is
just wrong as the above should have already moved it over to TRANSIT
mode. Why didn't that happen?

So the only circumstances where mm_cid_do_fixup_tasks_to_cpus() fails to
do that are:

   1) task->mm != mm.

or

   2) task is no longer in the task list w/o going through do_exit()

How the heck is either one of them possible?
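
For illustration, the consequence of such a missed task can be sketched
with a toy model (plain Python, not the kernel implementation; the 4-CPU
setup and names are assumptions): the missed task's CID is never dropped
from the bitmap, so a later mm_get_cid() finds no free bit and spins,
which matches the observed stall.

```python
# Toy model of the CID bitmap -- illustrative only, not kernel code.
NUM_POSSIBLE_CPUS = 4

def mm_get_cid(cidmask):
    """Claim the first free CID; None stands in for MM_CID_UNSET."""
    for cid in range(NUM_POSSIBLE_CPUS):
        if cid not in cidmask:      # test_and_set_bit() equivalent
            cidmask.add(cid)
            return cid
    return None                     # caller would spin in cpu_relax()

cidmask = set()
for _ in range(4):                  # four tasks claim CIDs 0..3
    mm_get_cid(cidmask)
cidmask.discard(1)                  # fixup drops the CID of an off-CPU task
# The task owning CID 2 is missed by the fixup, so CID 2 is never dropped.
print(mm_get_cid(cidmask))          # 1 -- the dropped CID is reusable
print(mm_get_cid(cidmask))          # None -- this request would spin forever
```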

Just for the record. The picture Jiri decoded from the VM crash dump is
exactly the same. One task is not listed.

Confused

        tglx



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10  8:09                                       ` Thomas Gleixner
@ 2026-03-10  8:20                                         ` Thomas Gleixner
  2026-03-10  8:56                                         ` Jiri Slaby
  1 sibling, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10  8:20 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Peter Zijlstra, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10 2026 at 09:09, Thomas Gleixner wrote:
> On Mon, Mar 09 2026 at 13:23, Matthieu Baerts wrote:
> So the only circumstances where mm_cid_do_fixup_tasks_to_cpus() fails to
> do that are:
>
>    1) task->mm != mm.
>
> or
>
>    2) task is no longer in the task list w/o going through do_exit()

  or

     3) task->mm_cid.active == 0 again w/o going through do_exit()

> How the heck is either one of them possible?
>
> Just for the record. The picture Jiri decoded from the VM crash dump is
> exactly the same. One task is not listed.
>
> Confused
>
>         tglx

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10  8:09                                       ` Thomas Gleixner
  2026-03-10  8:20                                         ` Thomas Gleixner
@ 2026-03-10  8:56                                         ` Jiri Slaby
  2026-03-10  9:00                                           ` Jiri Slaby
  1 sibling, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-10  8:56 UTC (permalink / raw)
  To: Thomas Gleixner, Matthieu Baerts
  Cc: Peter Zijlstra, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

Thomas,

On 10. 03. 26, 9:09, Thomas Gleixner wrote:
>     2) task is no longer in the task list w/o going through do_exit()

After a quick look (I have to run now, so I could be completely wrong): 
what if the task is not in the list *yet*? You use RCU -> 
for_each_process_thread(), so there could be a race with attach_pid() 
protected by tasklist_lock.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10  8:56                                         ` Jiri Slaby
@ 2026-03-10  9:00                                           ` Jiri Slaby
  2026-03-10 10:03                                             ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Slaby @ 2026-03-10  9:00 UTC (permalink / raw)
  To: Thomas Gleixner, Matthieu Baerts
  Cc: Peter Zijlstra, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On 10. 03. 26, 9:56, Jiri Slaby wrote:
> Thomas,
> 
> On 10. 03. 26, 9:09, Thomas Gleixner wrote:
>>     2) task is no longer in the task list w/o going through do_exit()
> 
> After a quick look (I have to run now, so I could be completely wrong): 
> what if the task is not in the list *yet*? You use RCU -> 
> for_each_process_thread(), so there could be a race with attach_pid() 
> protected by tasklist_lock.

Not attach_pid(), that's irrelevant here.

But one line below in copy_process(): list_add_tail().
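
As a sketch of that window (illustrative Python model, not kernel code;
thread names and counts are made up): a freshly forked task can already
be counted in mm->mm_cid.users while copy_process() has not yet
list_add_tail()'ed it into the list the RCU walk sees, so the fixup's
user countdown never reaches zero:

```python
# Illustrative-only model: one user is counted but not yet linked into
# the thread list that the RCU walk iterates.
visible_threads = ["t1", "t2", "t3"]  # what for_each_process_thread() sees
users = 5                             # current + 3 visible + 1 not yet linked

remaining = users - 1                 # current is handled by the caller
for t in visible_threads:             # the unlinked task is never visited
    remaining -= 1                    # this thread was fixed up

print(remaining)                      # 1: one user was never fixed up
```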

> thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10  9:00                                           ` Jiri Slaby
@ 2026-03-10 10:03                                             ` Thomas Gleixner
  2026-03-10 10:06                                               ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10 10:03 UTC (permalink / raw)
  To: Jiri Slaby, Matthieu Baerts
  Cc: Peter Zijlstra, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10 2026 at 10:00, Jiri Slaby wrote:
> On 10. 03. 26, 9:56, Jiri Slaby wrote:
>> Thomas,
>> 
>> On 10. 03. 26, 9:09, Thomas Gleixner wrote:
>>>     2) task is no longer in the task list w/o going through do_exit()
>> 
>> After a quick look (I have to run now, so I could be completely wrong): 
>> what about the task is not in the list *yet*. You use RCU -> 
>> for_each_process_thread(), so there could be a race with attach_pid() 
>> protected by tasklist_lock.
>
> Not attach_pid(), that's irrelevant here.
>
> But one line below in copy_process(): list_add_tail().

Yes. There is an issue. Peter and I just discovered that as well, but
in the case at hand it can't be the problem. The missing task is on the
CPU which means it must be visible in the thread list and task list.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 10:03                                             ` Thomas Gleixner
@ 2026-03-10 10:06                                               ` Thomas Gleixner
  2026-03-10 11:24                                                 ` Matthieu Baerts
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10 10:06 UTC (permalink / raw)
  To: Jiri Slaby, Matthieu Baerts
  Cc: Peter Zijlstra, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10 2026 at 11:03, Thomas Gleixner wrote:
> On Tue, Mar 10 2026 at 10:00, Jiri Slaby wrote:
> Yes. There is an issue. Peter and I just discovered that as well, but
> in the case at hand it can't be the problem. The missing task is on the
> CPU which means it must be visible in the thread list and task list.

The updated debug patch fixes this fork problem and adds more tracing to
it.

Matthieu, can you give it another spin please?

Thanks,

        tglx
---
 include/linux/sched.h        |    2 
 include/trace/events/mmcid.h |  171 +++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                |    2 
 kernel/sched/core.c          |   34 ++++++--
 kernel/sched/sched.h         |   20 ++++-
 5 files changed, 215 insertions(+), 14 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2354,7 +2354,6 @@ static __always_inline void alloc_tag_re
 #ifdef CONFIG_SCHED_MM_CID
 void sched_mm_cid_before_execve(struct task_struct *t);
 void sched_mm_cid_after_execve(struct task_struct *t);
-void sched_mm_cid_fork(struct task_struct *t);
 void sched_mm_cid_exit(struct task_struct *t);
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
@@ -2363,7 +2362,6 @@ static __always_inline int task_mm_cid(s
 #else
 static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
-static inline void sched_mm_cid_fork(struct task_struct *t) { }
 static inline void sched_mm_cid_exit(struct task_struct *t) { }
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
--- /dev/null
+++ b/include/trace/events/mmcid.h
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mmcid
+
+#if !defined(_TRACE_MMCID_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MMCID_H
+
+#include <linux/sched.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(mmcid_class,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid),
+
+	TP_STRUCT__entry(
+		__field( void *,	mm	)
+		__field( unsigned int,	cid	)
+	),
+
+	TP_fast_assign(
+		__entry->mm	= mm;
+		__entry->cid	= cid;
+	),
+
+	TP_printk("mm=%p cid=%08x", __entry->mm, __entry->cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_getcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_putcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_task_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	cid	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->cid	= cid;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("pid=%u cid=%08x mm=%p", __entry->pid, __entry->cid, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_task_class, mmcid_task_update,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_fixup_class,
+
+	TP_PROTO(struct task_struct *t, unsigned int users),
+
+	TP_ARGS(t, users),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	cid	)
+		__field( unsigned int,	active	)
+		__field( unsigned int,	users	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->cid	= t->mm_cid.cid;
+		__entry->active	= t->mm_cid.active;
+		__entry->users	= users;
+		__entry->mm	= t->mm;
+	),
+
+	TP_printk("pid=%u cid=%08x active=%u users=%u mm=%p", __entry->pid, __entry->cid,
+		  __entry->active, __entry->users, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_fixup_class, mmcid_fixup_task,
+
+	TP_PROTO(struct task_struct *t, unsigned int users),
+
+	TP_ARGS(t, users)
+);
+
+DECLARE_EVENT_CLASS(mmcid_cpu_class,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	cpu	)
+		__field( unsigned int,	cid	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= cpu;
+		__entry->cid	= cid;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("cpu=%u cid=%08x mm=%p", __entry->cpu, __entry->cid, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_cpu_class, mmcid_cpu_update,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_user_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	users	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->users	= mm->mm_cid.users;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("pid=%u users=%u mm=%p", __entry->pid, __entry->users, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_add,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_del,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	     TP_ARGS(t, mm)
+);
+
+#endif
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1586,7 +1586,6 @@ static int copy_mm(u64 clone_flags, stru
 
 	tsk->mm = mm;
 	tsk->active_mm = mm;
-	sched_mm_cid_fork(tsk);
 	return 0;
 }
 
@@ -2498,7 +2497,6 @@ static bool need_futex_hash_allocate_def
 	exit_nsproxy_namespaces(p);
 bad_fork_cleanup_mm:
 	if (p->mm) {
-		sched_mm_cid_exit(p);
 		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
 	}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -86,6 +86,7 @@
 #include <linux/sched/rseq_api.h>
 #include <trace/events/sched.h>
 #include <trace/events/ipi.h>
+#include <trace/events/mmcid.h>
 #undef CREATE_TRACE_POINTS
 
 #include "sched.h"
@@ -4729,8 +4730,12 @@ void sched_cancel_fork(struct task_struc
 	scx_cancel_fork(p);
 }
 
+static void sched_mm_cid_fork(struct task_struct *t);
+
 void sched_post_fork(struct task_struct *p)
 {
+	if (IS_ENABLED(CONFIG_SCHED_MM_CID))
+		sched_mm_cid_fork(p);
 	uclamp_post_fork(p);
 	scx_post_fork(p);
 }
@@ -10569,7 +10574,9 @@ static inline void mm_cid_transit_to_tas
 		unsigned int cid = cpu_cid_to_cid(t->mm_cid.cid);
 
 		t->mm_cid.cid = cid_to_transit_cid(cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10602,7 +10609,9 @@ static void mm_cid_fixup_cpus_to_tasks(s
 			if (!cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
+				trace_mmcid_task_update(rq->curr, rq->curr->mm, cid);
 				pcp->cid = cid;
+				trace_mmcid_cpu_update(cpu, mm, cid);
 			}
 		}
 	}
@@ -10613,7 +10622,9 @@ static inline void mm_cid_transit_to_cpu
 {
 	if (cid_on_task(t->mm_cid.cid)) {
 		t->mm_cid.cid = cid_to_transit_cid(t->mm_cid.cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10646,15 +10657,17 @@ static void mm_cid_do_fixup_tasks_to_cpu
 	 * possible switch back to per task mode happens either in the
 	 * deferred handler function or in the next fork()/exit().
 	 *
-	 * The caller has already transferred. The newly incoming task is
-	 * already accounted for, but not yet visible.
+	 * The caller has already transferred so remove it from the users
+	 * count. The incoming task is already visible and has mm_cid.active,
+	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
+	 * for. Concurrent fork()s might add more threads, but all of them have
+	 * task::mm_cid::active = 0, so they don't affect the accounting here.
 	 */
-	users = mm->mm_cid.users - 2;
-	if (!users)
-		return;
+	users = mm->mm_cid.users - 1;
 
 	guard(rcu)();
 	for_other_threads(current, t) {
+		trace_mmcid_fixup_task(t, users);
 		if (mm_cid_fixup_task_to_cpu(t, mm))
 			users--;
 	}
@@ -10666,6 +10679,7 @@ static void mm_cid_do_fixup_tasks_to_cpu
 	for_each_process_thread(p, t) {
 		if (t == current || t->mm != mm)
 			continue;
+		trace_mmcid_fixup_task(t, users);
 		if (mm_cid_fixup_task_to_cpu(t, mm)) {
 			if (--users == 0)
 				return;
@@ -10685,15 +10699,19 @@ static bool sched_mm_cid_add_user(struct
 {
 	t->mm_cid.active = 1;
 	mm->mm_cid.users++;
+	trace_mmcid_user_add(t, mm);
 	return mm_update_max_cids(mm);
 }
 
-void sched_mm_cid_fork(struct task_struct *t)
+static void sched_mm_cid_fork(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
 	bool percpu;
 
-	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
+	if (!mm)
+		return;
+
+	WARN_ON_ONCE(t->mm_cid.cid != MM_CID_UNSET);
 
 	guard(mutex)(&mm->mm_cid.mutex);
 	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
@@ -10727,6 +10745,7 @@ void sched_mm_cid_fork(struct task_struc
 	} else {
 		mm_cid_fixup_cpus_to_tasks(mm);
 		t->mm_cid.cid = mm_get_cid(mm);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	}
 }
 
@@ -10739,6 +10758,7 @@ static bool sched_mm_cid_remove_user(str
 		mm_unset_cid_on_task(t);
 	}
 	t->mm->mm_cid.users--;
+	trace_mmcid_user_del(t, t->mm);
 	return mm_update_max_cids(t->mm);
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -75,6 +75,7 @@
 #include <linux/delayacct.h>
 #include <linux/mmu_context.h>
 
+#include <trace/events/mmcid.h>
 #include <trace/events/power.h>
 #include <trace/events/sched.h>
 
@@ -3809,6 +3810,7 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_putcid(mm, cid);
 	clear_bit(cid, mm_cidmask(mm));
 }
 
@@ -3817,6 +3819,7 @@ static __always_inline void mm_unset_cid
 	unsigned int cid = t->mm_cid.cid;
 
 	t->mm_cid.cid = MM_CID_UNSET;
+	trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	if (cid_on_task(cid))
 		mm_drop_cid(t->mm, cid);
 }
@@ -3838,6 +3841,7 @@ static inline unsigned int __mm_get_cid(
 		return MM_CID_UNSET;
 	if (test_and_set_bit(cid, mm_cidmask(mm)))
 		return MM_CID_UNSET;
+	trace_mmcid_getcid(mm, cid);
 	return cid;
 }
 
@@ -3845,9 +3849,17 @@ static inline unsigned int mm_get_cid(st
 {
 	unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));
 
-	while (cid == MM_CID_UNSET) {
-		cpu_relax();
-		cid = __mm_get_cid(mm, num_possible_cpus());
+	if (cid == MM_CID_UNSET) {
+		ktime_t t0 = ktime_get();
+
+		while (cid == MM_CID_UNSET) {
+			cpu_relax();
+			cid = __mm_get_cid(mm, num_possible_cpus());
+			if (ktime_get() - t0 > 50 * NSEC_PER_MSEC) {
+				tracing_off();
+				WARN_ON_ONCE(1);
+			}
+		}
 	}
 	return cid;
 }
@@ -3874,6 +3886,7 @@ static inline unsigned int mm_cid_conver
 static __always_inline void mm_cid_update_task_cid(struct task_struct *t, unsigned int cid)
 {
 	if (t->mm_cid.cid != cid) {
+		trace_mmcid_task_update(t, t->mm, cid);
 		t->mm_cid.cid = cid;
 		rseq_sched_set_ids_changed(t);
 	}
@@ -3881,6 +3894,7 @@ static __always_inline void mm_cid_updat
 
 static __always_inline void mm_cid_update_pcpu_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_cpu_update(smp_processor_id(), mm, cid);
 	__this_cpu_write(mm->mm_cid.pcpu->cid, cid);
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 10:06                                               ` Thomas Gleixner
@ 2026-03-10 11:24                                                 ` Matthieu Baerts
  2026-03-10 11:54                                                   ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-10 11:24 UTC (permalink / raw)
  To: Thomas Gleixner, Jiri Slaby
  Cc: Peter Zijlstra, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

Hi Thomas,

On 10/03/2026 11:06, Thomas Gleixner wrote:
> On Tue, Mar 10 2026 at 11:03, Thomas Gleixner wrote:
>> On Tue, Mar 10 2026 at 10:00, Jiri Slaby wrote:
>> Yes. There is an issue. Peter and me just discovered that as well, but
>> in the case at hand it can't be the problem. The missing task is on the
>> CPU which means it must be visible in the thread list and task list.
> 
> The updated debug patch fixes this fork problem and adds more tracing to
> it.

Thank you for this new patch, and for continuing to look at this!

> Mathieu, can you give it another spin please?

Just did. Output is available there:

  https://github.com/user-attachments/files/25867817/issue-617-debug-20260310.txt.gz

Only 7.7k lines this time.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 11:24                                                 ` Matthieu Baerts
@ 2026-03-10 11:54                                                   ` Peter Zijlstra
  2026-03-10 12:28                                                     ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-03-10 11:54 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Thomas Gleixner, Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella,
	kvm, virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10, 2026 at 12:24:02PM +0100, Matthieu Baerts wrote:

> Just did. Output is available there:
> 
>   https://github.com/user-attachments/files/25867817/issue-617-debug-20260310.txt.gz
> 
> Only 7.7k lines this time.

Same damn thing again...

[    2.533811] virtme-n-1         3d..1. 849756us : mmcid_user_add: pid=1 users=1 mm=000000002b3f8459
[    4.523998] virtme-n-1         3d..1. 1115085us : mmcid_user_add: pid=71 users=2 mm=000000002b3f8459
[    4.529065] virtme-n-1         3d..1. 1115937us : mmcid_user_add: pid=72 users=3 mm=000000002b3f8459

[    4.529448] virtme-n-71        2d..1. 1115969us : mmcid_user_add: pid=73 users=4 mm=000000002b3f8459         <=== missing!
[    4.529946] virtme-n-71        2d..1. 1115971us : mmcid_getcid: mm=000000002b3f8459 cid=00000003

71 spawns 73, assigns cid 3

[    4.530573]   <idle>-0         1d..2. 1115991us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=73 next_prio=120
[    4.530865]   <idle>-0         1d..2. 1115993us : mmcid_cpu_update: cpu=1 cid=00000003 mm=000000002b3f8459

It gets scheduled on CPU-1, sets CID...

[    4.531038] virtme-n-1         3d..1. 1116013us : mmcid_user_add: pid=74 users=5 mm=000000002b3f8459

Then 1 spawns 74 on CPU 3, this is the 5th task, so we initiate a
task->cpu cid transition:

[    4.531203] virtme-n-1         3d..1. 1116014us : mmcid_task_update: pid=1 cid=20000000 mm=000000002b3f8459
[    4.531369] virtme-n-1         3d..1. 1116014us : mmcid_cpu_update: cpu=3 cid=20000000 mm=000000002b3f8459

Task 1

[    4.531530] virtme-n-1         3..... 1116014us : mmcid_fixup_task: pid=71 cid=00000001 active=1 users=4 mm=000000002b3f8459
[    4.531790] virtme-n-1         3d..2. 1116015us : mmcid_task_update: pid=71 cid=80000000 mm=000000002b3f8459
[    4.532000] virtme-n-1         3d..2. 1116015us : mmcid_putcid: mm=000000002b3f8459 cid=00000001

Task 71

[    4.532169] virtme-n-1         3..... 1116015us : mmcid_fixup_task: pid=72 cid=00000002 active=1 users=3 mm=000000002b3f8459
[    4.532362] virtme-n-1         3d..2. 1116016us : mmcid_task_update: pid=72 cid=20000002 mm=000000002b3f8459
[    4.532514] virtme-n-1         3d..2. 1116016us : mmcid_cpu_update: cpu=0 cid=20000002 mm=000000002b3f8459

Task 72

[    4.532649] virtme-n-1         3..... 1116016us : mmcid_fixup_task: pid=74 cid=80000000 active=1 users=2 mm=000000002b3f8459

Task 74, note the glaring lack of 73!!! which all this time is running
on CPU 1. Per the fact that it got scheduled it must be on tasklist,
per the fact that 1 spawns 74 after it on CPU3, we must observe any
prior tasklist changes and per the fact that it got a cid ->active must
be set. WTF!

That said, we set active after tasklist_lock now, so it might be
possible we simply miss that store, observe the 'old' 0 and skip over
it?

Let me stare hard at that...


[    4.532912] virtme-n-1         3..... 1116017us : mmcid_fixup_task: pid=71 cid=80000000 active=1 users=1 mm=000000002b3f8459
[    4.533386] virtme-n-1         3d..2. 1116041us : mmcid_cpu_update: cpu=3 cid=40000000 mm=000000002b3f8459

I *think* this is the for_each_process_thread() hitting 71 again.

[    4.533805]   <idle>-0         2d..2. 1116043us : mmcid_getcid: mm=000000002b3f8459 cid=00000001
[    4.533980]   <idle>-0         2d..2. 1116044us : mmcid_cpu_update: cpu=2 cid=40000001 mm=000000002b3f8459
[    4.534156]   <idle>-0         2d..2. 1116044us : mmcid_task_update: pid=74 cid=40000001 mm=000000002b3f8459
[    4.534579] virtme-n-72        0d..2. 1116046us : mmcid_cpu_update: cpu=0 cid=40000002 mm=000000002b3f8459

[    4.535803] virtme-n-73        1d..2. 1116179us : sched_switch: prev_comm=virtme-ng-init prev_pid=73 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

And then after all that, 73 blocks.. not having been marked TRANSIT or
anything and thus holding on to the CID, leading to all this trouble.




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 11:54                                                   ` Peter Zijlstra
@ 2026-03-10 12:28                                                     ` Thomas Gleixner
  2026-03-10 13:40                                                       ` Matthieu Baerts
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10 12:28 UTC (permalink / raw)
  To: Peter Zijlstra, Matthieu Baerts
  Cc: Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10 2026 at 12:54, Peter Zijlstra wrote:
> On Tue, Mar 10, 2026 at 12:24:02PM +0100, Matthieu Baerts wrote:
>> Only 7.7k lines this time.
>
> Same damn thing again...
>
> [    2.533811] virtme-n-1         3d..1. 849756us : mmcid_user_add: pid=1 users=1 mm=000000002b3f8459
> [    4.523998] virtme-n-1         3d..1. 1115085us : mmcid_user_add: pid=71 users=2 mm=000000002b3f8459
> [    4.529065] virtme-n-1         3d..1. 1115937us : mmcid_user_add: pid=72 users=3 mm=000000002b3f8459
>
> [    4.529448] virtme-n-71        2d..1. 1115969us : mmcid_user_add: pid=73 users=4 mm=000000002b3f8459         <=== missing!
> [    4.529946] virtme-n-71        2d..1. 1115971us : mmcid_getcid: mm=000000002b3f8459 cid=00000003
>
> 71 spawns 73, assigns cid 3
>
> [    4.530573]   <idle>-0         1d..2. 1115991us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=73 next_prio=120
> [    4.530865]   <idle>-0         1d..2. 1115993us : mmcid_cpu_update: cpu=1 cid=00000003 mm=000000002b3f8459
>
> It gets scheduled on CPU-1, sets CID...
>
> [    4.531038] virtme-n-1         3d..1. 1116013us : mmcid_user_add: pid=74 users=5 mm=000000002b3f8459
>
> Then 1 spawns 74 on CPU 3, this is the 5th task, so we initiate a
> task->cpu cid transition:
>
> [    4.531203] virtme-n-1         3d..1. 1116014us : mmcid_task_update: pid=1 cid=20000000 mm=000000002b3f8459
> [    4.531369] virtme-n-1         3d..1. 1116014us : mmcid_cpu_update: cpu=3 cid=20000000 mm=000000002b3f8459
>
> Task 1
>
> [    4.531530] virtme-n-1         3..... 1116014us : mmcid_fixup_task: pid=71 cid=00000001 active=1 users=4 mm=000000002b3f8459
> [    4.531790] virtme-n-1         3d..2. 1116015us : mmcid_task_update: pid=71 cid=80000000 mm=000000002b3f8459
> [    4.532000] virtme-n-1         3d..2. 1116015us : mmcid_putcid: mm=000000002b3f8459 cid=00000001
>
> Task 71
>
> [    4.532169] virtme-n-1         3..... 1116015us : mmcid_fixup_task: pid=72 cid=00000002 active=1 users=3 mm=000000002b3f8459
> [    4.532362] virtme-n-1         3d..2. 1116016us : mmcid_task_update: pid=72 cid=20000002 mm=000000002b3f8459
> [    4.532514] virtme-n-1         3d..2. 1116016us : mmcid_cpu_update: cpu=0 cid=20000002 mm=000000002b3f8459
>
> Task 72
>
> [    4.532649] virtme-n-1         3..... 1116016us : mmcid_fixup_task: pid=74 cid=80000000 active=1 users=2 mm=000000002b3f8459
>
> Task 74, note the glaring lack of 73!!! which all this time is running
> on CPU 1. Per the fact that it got scheduled it must be on tasklist,
> per the fact that 1 spawns 74 after it on CPU3, we must observe any
> prior tasklist changes and per the fact that it got a cid ->active must
> be set. WTF!
>
> That said, we set active after tasklist_lock now, so it might be
> possible we simply miss that store, observe the 'old' 0 and skip over
> it?

No. We'd see it and print:

  virtme-n-1         3..... 1116017us : mmcid_fixup_task: pid=73 cid=00000003 active=0 ...

But we don't see it at all.

> Let me stare hard at that...
>
>
> [    4.532912] virtme-n-1         3..... 1116017us : mmcid_fixup_task: pid=71 cid=80000000 active=1 users=1 mm=000000002b3f8459
>
> I *think* this is the for_each_process_thread() hitting 71 again.

Correct, because the user count did not get down to zero in the pass
going through the thread list of the process.

Duh. I just noticed that this is stupid as it stops right there based on
@users becoming 0. So it fails to take the already handled tasks out of
the picture.

If 73 is the result of a vfork() then it won't show up in the thread list
of the process, but it is still accounted to the MM. And then the
(--users == 0) condition prevents looking it up. Duh!

I've updated the debug patch and removed the @users conditionals so it
keeps searching. So it should find that magic 73.

Thanks,

        tglx
---
 include/linux/sched.h        |    2 
 include/trace/events/mmcid.h |  171 +++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                |    2 
 kernel/sched/core.c          |   39 +++++++--
 kernel/sched/sched.h         |   20 ++++-
 5 files changed, 216 insertions(+), 18 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2354,7 +2354,6 @@ static __always_inline void alloc_tag_re
 #ifdef CONFIG_SCHED_MM_CID
 void sched_mm_cid_before_execve(struct task_struct *t);
 void sched_mm_cid_after_execve(struct task_struct *t);
-void sched_mm_cid_fork(struct task_struct *t);
 void sched_mm_cid_exit(struct task_struct *t);
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
@@ -2363,7 +2362,6 @@ static __always_inline int task_mm_cid(s
 #else
 static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
-static inline void sched_mm_cid_fork(struct task_struct *t) { }
 static inline void sched_mm_cid_exit(struct task_struct *t) { }
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
--- /dev/null
+++ b/include/trace/events/mmcid.h
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mmcid
+
+#if !defined(_TRACE_MMCID_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MMCID_H
+
+#include <linux/sched.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(mmcid_class,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid),
+
+	TP_STRUCT__entry(
+		__field( void *,	mm	)
+		__field( unsigned int,	cid	)
+	),
+
+	TP_fast_assign(
+		__entry->mm	= mm;
+		__entry->cid	= cid;
+	),
+
+	TP_printk("mm=%p cid=%08x", __entry->mm, __entry->cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_getcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DEFINE_EVENT(mmcid_class, mmcid_putcid,
+
+	TP_PROTO(struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_task_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	cid	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->cid	= cid;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("pid=%u cid=%08x mm=%p", __entry->pid, __entry->cid, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_task_class, mmcid_task_update,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(t, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_fixup_class,
+
+	TP_PROTO(struct task_struct *t, unsigned int users),
+
+	TP_ARGS(t, users),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	cid	)
+		__field( unsigned int,	active	)
+		__field( unsigned int,	users	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->cid	= t->mm_cid.cid;
+		__entry->active	= t->mm_cid.active;
+		__entry->users	= users;
+		__entry->mm	= t->mm;
+	),
+
+	TP_printk("pid=%u cid=%08x active=%u users=%u mm=%p", __entry->pid, __entry->cid,
+		  __entry->active, __entry->users, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_fixup_class, mmcid_fixup_task,
+
+	TP_PROTO(struct task_struct *t, unsigned int users),
+
+	TP_ARGS(t, users)
+);
+
+DECLARE_EVENT_CLASS(mmcid_cpu_class,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	cpu	)
+		__field( unsigned int,	cid	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= cpu;
+		__entry->cid	= cid;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("cpu=%u cid=%08x mm=%p", __entry->cpu, __entry->cid, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_cpu_class, mmcid_cpu_update,
+
+	TP_PROTO(unsigned int cpu, struct mm_struct *mm, unsigned int cid),
+
+	TP_ARGS(cpu, mm, cid)
+);
+
+DECLARE_EVENT_CLASS(mmcid_user_class,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm),
+
+	TP_STRUCT__entry(
+		__field( unsigned int,	pid	)
+		__field( unsigned int,	users	)
+		__field( void *,	mm	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= t->pid;
+		__entry->users	= mm->mm_cid.users;
+		__entry->mm	= mm;
+	),
+
+	TP_printk("pid=%u users=%u mm=%p", __entry->pid, __entry->users, __entry->mm)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_add,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm)
+);
+
+DEFINE_EVENT(mmcid_user_class, mmcid_user_del,
+
+	TP_PROTO(struct task_struct *t, struct mm_struct *mm),
+
+	TP_ARGS(t, mm)
+);
+
+#endif
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1586,7 +1586,6 @@ static int copy_mm(u64 clone_flags, stru
 
 	tsk->mm = mm;
 	tsk->active_mm = mm;
-	sched_mm_cid_fork(tsk);
 	return 0;
 }
 
@@ -2498,7 +2497,6 @@ static bool need_futex_hash_allocate_def
 	exit_nsproxy_namespaces(p);
 bad_fork_cleanup_mm:
 	if (p->mm) {
-		sched_mm_cid_exit(p);
 		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
 	}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -86,6 +86,7 @@
 #include <linux/sched/rseq_api.h>
 #include <trace/events/sched.h>
 #include <trace/events/ipi.h>
+#include <trace/events/mmcid.h>
 #undef CREATE_TRACE_POINTS
 
 #include "sched.h"
@@ -4729,8 +4730,12 @@ void sched_cancel_fork(struct task_struc
 	scx_cancel_fork(p);
 }
 
+static void sched_mm_cid_fork(struct task_struct *t);
+
 void sched_post_fork(struct task_struct *p)
 {
+	if (IS_ENABLED(CONFIG_SCHED_MM_CID))
+		sched_mm_cid_fork(p);
 	uclamp_post_fork(p);
 	scx_post_fork(p);
 }
@@ -10569,7 +10574,9 @@ static inline void mm_cid_transit_to_tas
 		unsigned int cid = cpu_cid_to_cid(t->mm_cid.cid);
 
 		t->mm_cid.cid = cid_to_transit_cid(cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10602,7 +10609,9 @@ static void mm_cid_fixup_cpus_to_tasks(s
 			if (!cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
+				trace_mmcid_task_update(rq->curr, rq->curr->mm, cid);
 				pcp->cid = cid;
+				trace_mmcid_cpu_update(cpu, mm, cid);
 			}
 		}
 	}
@@ -10613,7 +10622,9 @@ static inline void mm_cid_transit_to_cpu
 {
 	if (cid_on_task(t->mm_cid.cid)) {
 		t->mm_cid.cid = cid_to_transit_cid(t->mm_cid.cid);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 		pcp->cid = t->mm_cid.cid;
+		trace_mmcid_cpu_update(task_cpu(t), t->mm, pcp->cid);
 	}
 }
 
@@ -10646,15 +10657,17 @@ static void mm_cid_do_fixup_tasks_to_cpu
 	 * possible switch back to per task mode happens either in the
 	 * deferred handler function or in the next fork()/exit().
 	 *
-	 * The caller has already transferred. The newly incoming task is
-	 * already accounted for, but not yet visible.
+	 * The caller has already transferred so remove it from the users
+	 * count. The incoming task is already visible and has mm_cid.active,
+	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
+	 * for. Concurrent fork()s might add more threads, but all of them have
+	 * task::mm_cid::active = 0, so they don't affect the accounting here.
 	 */
-	users = mm->mm_cid.users - 2;
-	if (!users)
-		return;
+	users = mm->mm_cid.users - 1;
 
 	guard(rcu)();
 	for_other_threads(current, t) {
+		trace_mmcid_fixup_task(t, users);
 		if (mm_cid_fixup_task_to_cpu(t, mm))
 			users--;
 	}
@@ -10666,10 +10679,8 @@ static void mm_cid_do_fixup_tasks_to_cpu
 	for_each_process_thread(p, t) {
 		if (t == current || t->mm != mm)
 			continue;
-		if (mm_cid_fixup_task_to_cpu(t, mm)) {
-			if (--users == 0)
-				return;
-		}
+		trace_mmcid_fixup_task(t, users);
+		mm_cid_fixup_task_to_cpu(t, mm);
 	}
 }
 
@@ -10685,15 +10696,19 @@ static bool sched_mm_cid_add_user(struct
 {
 	t->mm_cid.active = 1;
 	mm->mm_cid.users++;
+	trace_mmcid_user_add(t, mm);
 	return mm_update_max_cids(mm);
 }
 
-void sched_mm_cid_fork(struct task_struct *t)
+static void sched_mm_cid_fork(struct task_struct *t)
 {
 	struct mm_struct *mm = t->mm;
 	bool percpu;
 
-	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
+	if (!mm)
+		return;
+
+	WARN_ON_ONCE(t->mm_cid.cid != MM_CID_UNSET);
 
 	guard(mutex)(&mm->mm_cid.mutex);
 	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {
@@ -10727,6 +10742,7 @@ void sched_mm_cid_fork(struct task_struc
 	} else {
 		mm_cid_fixup_cpus_to_tasks(mm);
 		t->mm_cid.cid = mm_get_cid(mm);
+		trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	}
 }
 
@@ -10739,6 +10755,7 @@ static bool sched_mm_cid_remove_user(str
 		mm_unset_cid_on_task(t);
 	}
 	t->mm->mm_cid.users--;
+	trace_mmcid_user_del(t, t->mm);
 	return mm_update_max_cids(t->mm);
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -75,6 +75,7 @@
 #include <linux/delayacct.h>
 #include <linux/mmu_context.h>
 
+#include <trace/events/mmcid.h>
 #include <trace/events/power.h>
 #include <trace/events/sched.h>
 
@@ -3809,6 +3810,7 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_putcid(mm, cid);
 	clear_bit(cid, mm_cidmask(mm));
 }
 
@@ -3817,6 +3819,7 @@ static __always_inline void mm_unset_cid
 	unsigned int cid = t->mm_cid.cid;
 
 	t->mm_cid.cid = MM_CID_UNSET;
+	trace_mmcid_task_update(t, t->mm, t->mm_cid.cid);
 	if (cid_on_task(cid))
 		mm_drop_cid(t->mm, cid);
 }
@@ -3838,6 +3841,7 @@ static inline unsigned int __mm_get_cid(
 		return MM_CID_UNSET;
 	if (test_and_set_bit(cid, mm_cidmask(mm)))
 		return MM_CID_UNSET;
+	trace_mmcid_getcid(mm, cid);
 	return cid;
 }
 
@@ -3845,9 +3849,17 @@ static inline unsigned int mm_get_cid(st
 {
 	unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));
 
-	while (cid == MM_CID_UNSET) {
-		cpu_relax();
-		cid = __mm_get_cid(mm, num_possible_cpus());
+	if (cid == MM_CID_UNSET) {
+		ktime_t t0 = ktime_get();
+
+		while (cid == MM_CID_UNSET) {
+			cpu_relax();
+			cid = __mm_get_cid(mm, num_possible_cpus());
+			if (ktime_get() - t0 > 50 * NSEC_PER_MSEC) {
+				tracing_off();
+				WARN_ON_ONCE(1);
+			}
+		}
 	}
 	return cid;
 }
@@ -3874,6 +3886,7 @@ static inline unsigned int mm_cid_conver
 static __always_inline void mm_cid_update_task_cid(struct task_struct *t, unsigned int cid)
 {
 	if (t->mm_cid.cid != cid) {
+		trace_mmcid_task_update(t, t->mm, cid);
 		t->mm_cid.cid = cid;
 		rseq_sched_set_ids_changed(t);
 	}
@@ -3881,6 +3894,7 @@ static __always_inline void mm_cid_updat
 
 static __always_inline void mm_cid_update_pcpu_cid(struct mm_struct *mm, unsigned int cid)
 {
+	trace_mmcid_cpu_update(smp_processor_id(), mm, cid);
 	__this_cpu_write(mm->mm_cid.pcpu->cid, cid);
 }
 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 12:28                                                     ` Thomas Gleixner
@ 2026-03-10 13:40                                                       ` Matthieu Baerts
  2026-03-10 13:47                                                         ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-10 13:40 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On 10/03/2026 13:28, Thomas Gleixner wrote:

(...)

> I've updated the debug patch and removed the @users conditionals so it
> keeps searching. So it should find that magic 73.

It looks like it does, thank you! I just tried this new debug patch, and
I managed to boot 100 times without issues, while before I was getting
it after max 20 attempts. I left the test running to boot up to 1000
times, just in case.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 13:40                                                       ` Matthieu Baerts
@ 2026-03-10 13:47                                                         ` Thomas Gleixner
  2026-03-10 15:51                                                           ` Matthieu Baerts
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2026-03-10 13:47 UTC (permalink / raw)
  To: Matthieu Baerts, Peter Zijlstra
  Cc: Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On Tue, Mar 10 2026 at 14:40, Matthieu Baerts wrote:
> On 10/03/2026 13:28, Thomas Gleixner wrote:
>
> (...)
>
>> I've updated the debug patch and removed the @users conditionals so it
>> keeps searching. So it should find that magic 73.
>
> It looks like it does, thank you! I just tried this new debug patch, and
> I managed to boot 100 times without issues, while before I was getting
> it after max 20 attempts. I left the test running to boot up to 1000
> times, just in case.

Now that I actually understood the problem, I was able to write a
reproducer which triggers the issue 100% reliably.

The debug patch does cure it and it does the right thing. I have a
better version of it in testing right now which avoids all that flaky
accounting and, especially in the vfork() case, the full tasklist walk.
I'll send out a complete patch series with changelogs etc. later.

Thanks a lot for your patience and invaluable help. Very appreciated!

        tglx

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
  2026-03-10 13:47                                                         ` Thomas Gleixner
@ 2026-03-10 15:51                                                           ` Matthieu Baerts
  0 siblings, 0 replies; 45+ messages in thread
From: Matthieu Baerts @ 2026-03-10 15:51 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Jiri Slaby, Stefan Hajnoczi, Stefano Garzarella, kvm,
	virtualization, Netdev, rcu, MPTCP Linux, Linux Kernel,
	Shinichiro Kawasaki, Paul E. McKenney, Dave Hansen, luto,
	Michal Koutný, Waiman Long, Marco Elver

On 10/03/2026 14:47, Thomas Gleixner wrote:
> On Tue, Mar 10 2026 at 14:40, Matthieu Baerts wrote:
>> On 10/03/2026 13:28, Thomas Gleixner wrote:
>>
>> (...)
>>
>>> I've updated the debug patch and removed the @users conditionals so it
>>> keeps searching. So it should find that magic 73.
>>
>> It looks like it does, thank you! I just tried this new debug patch, and
>> I managed to boot 100 times without issues, while before I was getting
>> it after max 20 attempts. I left the test running to boot up to 1000
>> times, just in case.
> 
> Now that I actually understood the problem, I was able to write a
> reproducer which triggers the issue 100% reliable.
> 
> The debug patch does cure it and it does the right thing. I have a
> better version of that in testing right now which avoid all that flaky
> accounting and especially in the vfork() case the full tasklist walk.
> I'll send out a complete patch series with changelogs etc. later.

Great, thank you!

Note that my 1000 boot iterations finished without errors.

> Thanks a lot for your patience and invaluable help. Very appreciated!

My pleasure! Thank you for your support, and for maintaining so many
important pieces in the Linux kernel!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2026-03-10 15:51 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-06 11:54 Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout Matthieu Baerts
2026-02-06 16:38 ` Stefano Garzarella
2026-02-06 17:13   ` Matthieu Baerts
2026-02-26 10:37 ` Jiri Slaby
2026-03-02  5:28   ` Jiri Slaby
2026-03-02 11:46     ` Peter Zijlstra
2026-03-02 14:30       ` Waiman Long
2026-03-05  7:00       ` Jiri Slaby
2026-03-05 11:53         ` Jiri Slaby
2026-03-05 12:20           ` Jiri Slaby
2026-03-05 16:16             ` Thomas Gleixner
2026-03-05 17:33               ` Jiri Slaby
2026-03-05 19:25                 ` Thomas Gleixner
2026-03-06  5:48                   ` Jiri Slaby
2026-03-06  9:57                     ` Thomas Gleixner
2026-03-06 10:16                       ` Jiri Slaby
2026-03-06 16:28                         ` Thomas Gleixner
2026-03-06 11:06                       ` Matthieu Baerts
2026-03-06 16:57                         ` Matthieu Baerts
2026-03-06 18:31                           ` Jiri Slaby
2026-03-06 18:44                             ` Matthieu Baerts
2026-03-06 21:40                           ` Matthieu Baerts
2026-03-06 15:24                       ` Peter Zijlstra
2026-03-07  9:01                         ` Thomas Gleixner
2026-03-07 22:29                           ` Thomas Gleixner
2026-03-08  9:15                             ` Thomas Gleixner
2026-03-08 16:55                               ` Jiri Slaby
2026-03-08 16:58                               ` Thomas Gleixner
2026-03-08 17:23                                 ` Matthieu Baerts
2026-03-09  8:43                                   ` Thomas Gleixner
2026-03-09 12:23                                     ` Matthieu Baerts
2026-03-10  8:09                                       ` Thomas Gleixner
2026-03-10  8:20                                         ` Thomas Gleixner
2026-03-10  8:56                                         ` Jiri Slaby
2026-03-10  9:00                                           ` Jiri Slaby
2026-03-10 10:03                                             ` Thomas Gleixner
2026-03-10 10:06                                               ` Thomas Gleixner
2026-03-10 11:24                                                 ` Matthieu Baerts
2026-03-10 11:54                                                   ` Peter Zijlstra
2026-03-10 12:28                                                     ` Thomas Gleixner
2026-03-10 13:40                                                       ` Matthieu Baerts
2026-03-10 13:47                                                         ` Thomas Gleixner
2026-03-10 15:51                                                           ` Matthieu Baerts
2026-03-03 13:23   ` Matthieu Baerts
2026-03-05  6:46     ` Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox