* [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
@ 2024-07-09 7:51 Aboorva Devarajan
2024-07-10 0:02 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Aboorva Devarajan @ 2024-07-09 7:51 UTC (permalink / raw)
To: tj, void; +Cc: aboorvad, linux-kernel
I'm currently experimenting with sched_ext schedulers and have
encountered an issue with CPU hotplug while running the scx_simple
scheduler. The issue is intermittent and doesn't happen frequently;
however, it appears to be a common problem when performing CPU
hotplug with any external scheduler.
---------------------------------------------------------------
Here is the system info:
Kernel Version: Latest sched-ext (for-next) tree with patches to
support struct_ops on ppc64le.
root@ltc # lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 8
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
Frequency boost: enabled
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 160 MiB
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
---------------------------------------------------------------
Steps to reproduce the issue:
- Run the scx_simple scheduler.
- Randomly offline and online CPUs from 1 to 127.
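The random offline/online loop can be sketched as a small helper script. This is a hypothetical sketch (the actual script used for the ppc64 runs was not posted); `SYSFS`, `set_cpu_online`, and `random_hotplug` are illustrative names, and the real thing needs root on real hardware:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the random hotplug loop described above; the actual
# ppc64 test script was not posted. Requires root on real hardware.
SYSFS="${SYSFS:-/sys/devices/system/cpu}"   # standard sysfs hotplug knobs

set_cpu_online() {   # set_cpu_online <cpu> <0|1>: offline (0) or online (1)
    echo "$2" > "$SYSFS/cpu$1/online"
}

random_hotplug() {   # toggle one randomly chosen CPU in 1..$1; CPU 0 untouched
    local cpu state
    cpu=$(shuf -i "1-$1" -n 1)
    state=$(shuf -i 0-1 -n 1)
    set_cpu_online "$cpu" "$state"
}

# Usage (as root, with scx_simple running in another terminal):
#   while true; do random_hotplug 127; done
```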
During the CPU hotplug, the custom scheduler successfully
unregisters and registers itself without issues. However, it
occasionally encounters a crash and the system becomes unresponsive.
---------------------------------------------------------------
Here is the crash log:
[64650.796448] INFO: task rcu_tasks_trace:14 blocked for more than 120 seconds.
[64650.796557] Not tainted 6.10.0-rc2+ #4
[64650.796630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64650.796688] task:rcu_tasks_trace state:D stack:0 pid:14 tgid:14 ppid:2 flags:0x00000000
[64650.796748] Sched_ext: simple (prepping)
[64650.796771] Call Trace:
[64650.796841] [c000000ddf0c3700] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
[64650.796950] [c000000ddf0c38b0] [c00000000001dd68] __switch_to+0x238/0x310
[64650.797041] [c000000ddf0c3910] [c0000000013e6170] __schedule+0xa10/0xf80
[64650.797126] [c000000ddf0c3a10] [c0000000013e6764] schedule+0x84/0x128
[64650.797208] [c000000ddf0c3a40] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
[64650.797299] [c000000ddf0c3ac0] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
[64650.797371] [c000000ddf0c3b40] [c0000000001739b8] cpus_read_lock+0x158/0x180
[64650.797468] [c000000ddf0c3b80] [c000000000285a94] check_all_holdout_tasks_trace+0x84/0x450
[64650.797564] [c000000ddf0c3cc0] [c000000000284c68] rcu_tasks_wait_gp+0x318/0x430
[64650.797657] [c000000ddf0c3dd0] [c000000000284394] rcu_tasks_one_gp+0x4f4/0x5e0
[64650.797747] [c000000ddf0c3f00] [c0000000002850ac] rcu_tasks_kthread+0x12c/0x150
[64650.797838] [c000000ddf0c3f80] [c0000000001be158] kthread+0x1c8/0x1e0
[64650.797867] [c000000ddf0c3fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[64650.799240] INFO: task kworker/0:3:1216 blocked for more than 120 seconds.
[64650.799324] Not tainted 6.10.0-rc2+ #4
[64650.799419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64650.799487] task:kworker/0:3 state:D stack:0 pid:1216 tgid:1216 ppid:2 flags:0x00000000
[64650.799575] Workqueue: events vmstat_shepherd
[64650.799666] Sched_ext: simple (prepping)
[64650.799696] Call Trace:
[64650.799830] [c000000ec40af8e0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
[64650.800001] [c000000ec40afa90] [c00000000001dd68] __switch_to+0x238/0x310
[64650.800139] [c000000ec40afaf0] [c0000000013e6170] __schedule+0xa10/0xf80
[64650.800234] [c000000ec40afbf0] [c0000000013e6764] schedule+0x84/0x128
[64650.800336] [c000000ec40afc20] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
[64650.800443] [c000000ec40afca0] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
[64650.800550] [c000000ec40afd20] [c0000000001739b8] cpus_read_lock+0x158/0x180
[64650.800694] [c000000ec40afd60] [c00000000051bb88] vmstat_shepherd+0x48/0x1b0
[64650.800798] [c000000ec40afdd0] [c0000000001b0208] process_one_work+0x2c8/0x6d0
[64650.800884] [c000000ec40afed0] [c0000000001b3c88] worker_thread+0x3d8/0x560
[64650.800948] [c000000ec40aff80] [c0000000001be158] kthread+0x1c8/0x1e0
[64650.801051] [c000000ec40affe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[64650.801563] INFO: task scx_simple:11214 blocked for more than 120 seconds.
[64650.801668] Not tainted 6.10.0-rc2+ #4
[64650.801734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64650.801834] task:scx_simple state:D stack:0 pid:11214 tgid:11214 ppid:11013 flags:0x00040080
[64650.801949] Sched_ext: simple (prepping)
[64650.801978] Call Trace:
[64650.802062] [c000000e9bdf35b0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
[64650.802231] [c000000e9bdf3760] [c00000000001dd68] __switch_to+0x238/0x310
[64650.802437] [c000000e9bdf37c0] [c0000000013e6170] __schedule+0xa10/0xf80
[64650.802599] [c000000e9bdf38c0] [c0000000013e6764] schedule+0x84/0x128
[64650.802700] [c000000e9bdf38f0] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
[64650.802807] [c000000e9bdf3970] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
[64650.802926] [c000000e9bdf39f0] [c0000000001739b8] cpus_read_lock+0x158/0x180
[64650.803065] [c000000e9bdf3a30] [c00000000021fd88] bpf_scx_reg+0x5d8/0xb80
[64650.803161] [c000000e9bdf3c10] [c00000000049e308] bpf_struct_ops_link_create+0x158/0x200
[64650.803259] [c000000e9bdf3c80] [c000000000407628] link_create+0x78/0x3b0
[64650.803319] [c000000e9bdf3cd0] [c0000000004026fc] __sys_bpf+0x39c/0x560
[64650.803389] [c000000e9bdf3dc0] [c000000000400b00] sys_bpf+0x50/0x80
[64650.803449] [c000000e9bdf3e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
[64650.803559] [c000000e9bdf3e50] [c00000000000c7d4] system_call_common+0xf4/0x258
[64650.803679] --- interrupt: c00 at 0x737414d42474
[64650.803783] NIP: 0000737414d42474 LR: 000004382701451c CTR: 0000000000000000
[64650.803848] REGS: c000000e9bdf3e80 TRAP: 0c00 Not tainted (6.10.0-rc2+)
[64650.803953] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 28048824 XER: 00000000
[64650.804202] IRQMASK: 0
[64650.804202] GPR00: 0000000000000169 00007fffc9ebc2b0 0000737414e37100 000000000000001c
[64650.804202] GPR04: 00007fffc9ebc430 0000000000000040 0000000000000001 0000043827017f00
[64650.804202] GPR08: 000000000000001c 0000000000000000 0000000000000000 0000000000000000
[64650.804202] GPR12: 0000000000000000 000073741514b9c0 00000438656ced30 000004382705bd48
[64650.804202] GPR16: 000004382705bcb0 000004382705bf00 ffffffffffffffff 00000438270a0430
[64650.804202] GPR20: 0000737414e318c8 00007fffc9ebc600 00007fffc9ebc580 00000000ffffffff
[64650.804202] GPR24: 0000000000000001 00000438270a0438 00007fffc9ebcae8 000004382705bea8
[64650.804202] GPR28: 000004382705bff0 000004382705bd00 0000000000000001 00007fffc9ebc2b0
[64650.804953] NIP [0000737414d42474] 0x737414d42474
[64650.805045] LR [000004382701451c] 0x4382701451c
[64650.805118] --- interrupt: c00
[64650.805275] INFO: task hotplug.sh:11440 blocked for more than 120 seconds.
[64650.805369] Not tainted 6.10.0-rc2+ #4
[64650.805451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64650.805532] task:hotplug.sh state:D stack:0 pid:11440 tgid:11440 ppid:10770 flags:0x00040080
[64650.805629] Sched_ext: simple (prepping)
[64650.805653] Call Trace:
[64650.805756] [c00020006ef233e0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
[64650.805860] [c00020006ef23590] [c00000000001dd68] __switch_to+0x238/0x310
[64650.805980] [c00020006ef235f0] [c0000000013e6170] __schedule+0xa10/0xf80
[64650.806088] [c00020006ef236f0] [c0000000013e6764] schedule+0x84/0x128
[64650.806182] [c00020006ef23720] [c0000000013f2a48] schedule_timeout+0x48/0x128
[64650.806374] [c00020006ef23800] [c0000000013e703c] wait_for_common+0x118/0x1f4
[64650.806519] [c00020006ef23880] [c00000000017889c] cpuhp_kick_ap+0x13c/0x2a0
[64650.806618] [c00020006ef238e0] [c000000000177a18] cpuhp_invoke_callback+0x278/0xac0
[64650.806691] [c00020006ef239f0] [c000000000175978] _cpu_up+0x238/0x420
[64650.806782] [c00020006ef23ab0] [c000000000174d18] cpu_up+0x168/0x1b0
[64650.806842] [c00020006ef23b40] [c000000000cf382c] cpu_subsys_online+0x6c/0x1b0
[64650.806925] [c00020006ef23b90] [c000000000ce5b20] online_store+0xe0/0x1a0
[64650.807008] [c00020006ef23bf0] [c000000000ce554c] dev_attr_store+0x6c/0xb0
[64650.807072] [c00020006ef23c40] [c000000000765798] sysfs_kf_write+0xd8/0x110
[64650.807138] [c00020006ef23c90] [c0000000007624f4] kernfs_fop_write_iter+0x1b4/0x260
[64650.807208] [c00020006ef23cf0] [c000000000643fb8] vfs_write+0x288/0x370
[64650.807285] [c00020006ef23da0] [c0000000006442dc] ksys_write+0x9c/0x120
[64650.807332] [c00020006ef23e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
[64650.807409] [c00020006ef23e50] [c00000000000c7d4] system_call_common+0xf4/0x258
[64650.807459] --- interrupt: c00 at 0x7f4acef33614
[64650.807483] NIP: 00007f4acef33614 LR: 00007f4acee9f374 CTR: 0000000000000000
[64650.807549] REGS: c00020006ef23e80 TRAP: 0c00 Not tainted (6.10.0-rc2+)
[64650.807625] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 28422488 XER: 00000000
[64650.807800] IRQMASK: 0
[64650.807800] GPR00: 0000000000000004 00007ffff045b7f0 00007f4acf037100 0000000000000001
[64650.807800] GPR04: 0000077e966d2f10 0000000000000002 0000000000000010 0000000000000020
[64650.807800] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[64650.807800] GPR12: 0000000000000000 00007f4acf21b330 0000077e966d6130 0000077e79ceb738
[64650.807800] GPR16: 0000077e79c00ac0 0000000000000000 0000000020000000 0000077e966d2d70
[64650.807800] GPR20: 0000000000000000 0000000000000001 0000077e79c93c80 00007ffff045b9e4
[64650.807800] GPR24: 00007ffff045b9e0 0000077e79ceb0d0 00007f4acf02c0d8 00007f4acf02ca08
[64650.807800] GPR28: 0000000000000002 00007f4acf0317e8 0000077e966d2f10 0000000000000002
[64650.808157] NIP [00007f4acef33614] 0x7f4acef33614
[64650.808215] LR [00007f4acee9f374] 0x7f4acee9f374
[64650.808272] --- interrupt: c00
[64650.808330]
---------------------------------------------------------------
More info provided by lockdep:
[64650.808330] Showing all locks held in the system:
[64650.808391] 1 lock held by kthreadd/2:
[64650.808431] #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
[64650.808578] 2 locks held by rcu_tasks_trace/14:
[64650.808647] #0: c000000002a0c550 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x494/0x5e0
[64650.808764] #1: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: check_all_holdout_tasks_trace+0x84/0x450
[64650.808868] 3 locks held by cpuhp/8/64:
[64650.808935] #0: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: lockdep_acquire_cpus_lock+0x24/0x80
[64650.809004] #1: c0000000029133a8 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_lock_acquire+0xc/0x80
[64650.809135] #2: c00000000291e448 (wq_pool_mutex){+.+.}-{4:4}, at: workqueue_online_cpu+0x84/0x6c0
[64650.809351] 1 lock held by khungtaskd/843:
[64650.809399] #0: c000000002a0c110 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x60
[64650.809522] 3 locks held by kworker/0:3/1216:
[64650.809581] #0: c000000004809958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x218/0x6d0
[64650.809697] #1: c000000ec40afe40 ((shepherd).work){+.+.}-{0:0}, at: process_one_work+0x244/0x6d0
[64650.809835] #2: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: vmstat_shepherd+0x48/0x1b0
[64650.809965] 1 lock held by systemd-journal/1389:
[64650.809983] #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
[64650.810122] 1 lock held by in:imklog/2498:
[64650.810194] #0: c000000e135cb6d8 (&f->f_pos_lock){+.+.}-{4:4}, at: __fdget_pos+0xd0/0x130
[64650.810333] 5 locks held by rs:main Q:Reg/2499:
[64650.810515] 4 locks held by scx_simple/11214:
[64650.810606] #0: c000000002a609b0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x134/0x200
[64650.810781] #1: c000000002927170 (scx_ops_enable_mutex){+.+.}-{4:4}, at: bpf_scx_reg+0x80/0xb80
[64650.810886] #2: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0x5d0/0xb80
[64650.811023] #3: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0x5d8/0xb80
[64650.811174] 7 locks held by hotplug.sh/11440:
[64650.811240] #0: c000000e9bfe6450 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xcc/0x370
[64650.811362] #1: c000000e11b31e90 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10c/0x260
[64650.811514] #2: c000000df2892620 (kn->active#73){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x118/0x260
[64650.811675] #3: c000000002aebc08 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x6c/0x1a0
[64650.811814] #4: c000007ff4ce0128 (&dev->mutex){....}-{4:4}, at: online_store+0x94/0x1a0
[64650.811929] #5: c000000002913168 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up+0xa4/0x1b0
[64650.812061] #6: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up+0x78/0x420
[64650.812224]
[64650.812274] =============================================
[64650.812274]
---------------------------------------------------------------
This issue occurs when the CPU comes back online and the BPF scheduler
attempts to re-register. I'll keep looking into it and share any
additional information I find.
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-07-09 7:51 [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug Aboorva Devarajan
@ 2024-07-10 0:02 ` Tejun Heo
2024-07-11 12:00 ` Aboorva Devarajan
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-07-10 0:02 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel
Hello,
On Tue, Jul 09, 2024 at 01:21:38PM +0530, Aboorva Devarajan wrote:
...
> Steps to reproduce the issue:
>
> - Run the scx_simple scheduler.
> - Randomly offline and online CPUs from 1 to 127.
>
> During the CPU hotplug, the custom scheduler successfully
> unregisters and registers itself without issues. However, it
> occasionally encounters a crash and the system becomes unresponsive.
How reproducible is the problem?
...
> [64650.801834] task:scx_simple state:D stack:0 pid:11214 tgid:11214 ppid:11013 flags:0x00040080
> [64650.801949] Sched_ext: simple (prepping)
> [64650.801978] Call Trace:
> [64650.802062] [c000000e9bdf35b0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
> [64650.802231] [c000000e9bdf3760] [c00000000001dd68] __switch_to+0x238/0x310
> [64650.802437] [c000000e9bdf37c0] [c0000000013e6170] __schedule+0xa10/0xf80
> [64650.802599] [c000000e9bdf38c0] [c0000000013e6764] schedule+0x84/0x128
> [64650.802700] [c000000e9bdf38f0] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
> [64650.802807] [c000000e9bdf3970] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
> [64650.802926] [c000000e9bdf39f0] [c0000000001739b8] cpus_read_lock+0x158/0x180
> [64650.803065] [c000000e9bdf3a30] [c00000000021fd88] bpf_scx_reg+0x5d8/0xb80
> [64650.803161] [c000000e9bdf3c10] [c00000000049e308] bpf_struct_ops_link_create+0x158/0x200
> [64650.803259] [c000000e9bdf3c80] [c000000000407628] link_create+0x78/0x3b0
> [64650.803319] [c000000e9bdf3cd0] [c0000000004026fc] __sys_bpf+0x39c/0x560
> [64650.803389] [c000000e9bdf3dc0] [c000000000400b00] sys_bpf+0x50/0x80
> [64650.803449] [c000000e9bdf3e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
> [64650.803559] [c000000e9bdf3e50] [c00000000000c7d4] system_call_common+0xf4/0x258
...
> [64650.808330] Showing all locks held in the system:
> [64650.808391] 1 lock held by kthreadd/2:
> [64650.808431] #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
...
> [64650.810515] 4 locks held by scx_simple/11214:
> [64650.810606] #0: c000000002a609b0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x134/0x200
> [64650.810781] #1: c000000002927170 (scx_ops_enable_mutex){+.+.}-{4:4}, at: bpf_scx_reg+0x80/0xb80
> [64650.810886] #2: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0x5d0/0xb80
> [64650.811023] #3: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0x5d8/0xb80
> [64650.811174] 7 locks held by hotplug.sh/11440:
> [64650.811240] #0: c000000e9bfe6450 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xcc/0x370
> [64650.811362] #1: c000000e11b31e90 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10c/0x260
> [64650.811514] #2: c000000df2892620 (kn->active#73){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x118/0x260
> [64650.811675] #3: c000000002aebc08 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x6c/0x1a0
> [64650.811814] #4: c000007ff4ce0128 (&dev->mutex){....}-{4:4}, at: online_store+0x94/0x1a0
> [64650.811929] #5: c000000002913168 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up+0xa4/0x1b0
> [64650.812061] #6: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up+0x78/0x420
It's difficult to tell from the log but I wonder whether the problem is CPU
hotplug operation needing to fork to complete (e.g. maybe it needs to
schedule a work item and flush it?) while sched_ext enable path first blocks
forks through scx_fork_rwsem and then does cpus_read_lock(). That'd create
an ABBA deadlock. It's a bit tricky as the existing code already makes it
impossible to flip the nesting order.
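The suspected ordering can be modeled with a two-lock toy using flock(1). This is purely illustrative (the kernel locks in question are percpu rwsems, not file locks, and the names below just mirror them): one side takes "scx_fork_rwsem" then "cpu_hotplug_lock", the other holds "cpu_hotplug_lock" and then needs "scx_fork_rwsem":

```shell
#!/usr/bin/env bash
# Toy model of the suspected ABBA ordering; nothing here is kernel code.
lockdir=$(mktemp -d)
: > "$lockdir/scx_fork_rwsem"
: > "$lockdir/cpu_hotplug_lock"

# Enable path: block forks via scx_fork_rwsem, then try cpus_read_lock().
(
    exec 9> "$lockdir/scx_fork_rwsem"; flock 9
    sleep 0.5                              # let the hotplug side take its lock
    exec 8> "$lockdir/cpu_hotplug_lock"
    flock -w 2 8 && echo ok || echo enable-blocked
) > "$lockdir/enable.out" &

# Hotplug path: hold cpu_hotplug_lock, then a fork it depends on needs
# scx_fork_rwsem.
(
    exec 8> "$lockdir/cpu_hotplug_lock"; flock 8
    sleep 0.5
    exec 9> "$lockdir/scx_fork_rwsem"
    flock -w 2 9 && echo ok || echo hotplug-blocked
) > "$lockdir/hotplug.out" &

wait
cat "$lockdir/enable.out" "$lockdir/hotplug.out"
```

Here the second acquisition on each side times out after 2 s; on the real system there is no timeout, which is why the tasks sit in D state instead of erroring out.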
It could also be specific to ppc64 (e.g. there's something depending on fork
during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
more.
Thanks.
--
tejun
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-07-10 0:02 ` Tejun Heo
@ 2024-07-11 12:00 ` Aboorva Devarajan
2024-07-12 10:04 ` Aboorva Devarajan
From: Aboorva Devarajan @ 2024-07-11 12:00 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, linux-kernel
Hi Tejun,
On Tue, 2024-07-09 at 14:02 -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 09, 2024 at 01:21:38PM +0530, Aboorva Devarajan wrote:
> ...
> > Steps to reproduce the issue:
> >
> > - Run the scx_simple scheduler.
> > - Randomly offline and online CPUs from 1 to 127.
> >
> > During the CPU hotplug, the custom scheduler successfully
> > unregisters and registers itself without issues. However, it
> > occasionally encounters a crash and the system becomes unresponsive.
>
> How reproducible is the problem?
>
>
This is highly intermittent and occurs approximately once every 15-20 minutes
when continuously running CPU hotplug offline and online operations while the
`scx_simple` scheduler is up and running.
> ...
> > [64650.801834] task:scx_simple state:D stack:0 pid:11214 tgid:11214 ppid:11013 flags:0x00040080
> > [64650.801949] Sched_ext: simple (prepping)
> > [64650.801978] Call Trace:
> > [64650.802062] [c000000e9bdf35b0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
> > [64650.802231] [c000000e9bdf3760] [c00000000001dd68] __switch_to+0x238/0x310
> > [64650.802437] [c000000e9bdf37c0] [c0000000013e6170] __schedule+0xa10/0xf80
> > [64650.802599] [c000000e9bdf38c0] [c0000000013e6764] schedule+0x84/0x128
> > [64650.802700] [c000000e9bdf38f0] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
> > [64650.802807] [c000000e9bdf3970] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
> > [64650.802926] [c000000e9bdf39f0] [c0000000001739b8] cpus_read_lock+0x158/0x180
> > [64650.803065] [c000000e9bdf3a30] [c00000000021fd88] bpf_scx_reg+0x5d8/0xb80
> > [64650.803161] [c000000e9bdf3c10] [c00000000049e308] bpf_struct_ops_link_create+0x158/0x200
> > [64650.803259] [c000000e9bdf3c80] [c000000000407628] link_create+0x78/0x3b0
> > [64650.803319] [c000000e9bdf3cd0] [c0000000004026fc] __sys_bpf+0x39c/0x560
> > [64650.803389] [c000000e9bdf3dc0] [c000000000400b00] sys_bpf+0x50/0x80
> > [64650.803449] [c000000e9bdf3e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
> > [64650.803559] [c000000e9bdf3e50] [c00000000000c7d4] system_call_common+0xf4/0x258
> ...
> > [64650.808330] Showing all locks held in the system:
> > [64650.808391] 1 lock held by kthreadd/2:
> > [64650.808431] #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
> ...
> > [64650.810515] 4 locks held by scx_simple/11214:
> > [64650.810606] #0: c000000002a609b0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x134/0x200
> > [64650.810781] #1: c000000002927170 (scx_ops_enable_mutex){+.+.}-{4:4}, at: bpf_scx_reg+0x80/0xb80
> > [64650.810886] #2: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0x5d0/0xb80
> > [64650.811023] #3: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0x5d8/0xb80
> > [64650.811174] 7 locks held by hotplug.sh/11440:
> > [64650.811240] #0: c000000e9bfe6450 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xcc/0x370
> > [64650.811362] #1: c000000e11b31e90 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10c/0x260
> > [64650.811514] #2: c000000df2892620 (kn->active#73){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x118/0x260
> > [64650.811675] #3: c000000002aebc08 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x6c/0x1a0
> > [64650.811814] #4: c000007ff4ce0128 (&dev->mutex){....}-{4:4}, at: online_store+0x94/0x1a0
> > [64650.811929] #5: c000000002913168 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up+0xa4/0x1b0
> > [64650.812061] #6: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up+0x78/0x420
>
> It's difficult to tell from the log but I wonder whether the problem is CPU
> hotplug operation needing to fork to complete (e.g. maybe it needs to
> schedule a work item and flush it?) while sched_ext enable path first blocks
> forks through scx_fork_rwsem and then does cpus_read_lock(). That'd create
> an ABBA deadlock. It's a bit tricky as the existing code already makes it
> impossible to flip the nesting order.
Yes, it looks like it could be an ABBA deadlock, but I'm not able to get enough
details with lockdep and lock_stat enabled. I'll try to collect more traces and
investigate further.
>
> > It could also be specific to ppc64 (e.g. there's something depending on fork
> during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> more.
Sure, so far I haven't been able to reproduce the issue on x86. I'm also
running some longer-duration tests to see whether it can be recreated on x86
as well, in order to rule out the possibility of it being a ppc64-specific
issue; it may simply occur much more rarely on x86 than on ppc64.
Thanks,
Aboorva
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-07-11 12:00 ` Aboorva Devarajan
@ 2024-07-12 10:04 ` Aboorva Devarajan
2024-08-02 21:20 ` Tejun Heo
From: Aboorva Devarajan @ 2024-07-12 10:04 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, linux-kernel, aboorvad
Hi Tejun,
On Thu, 2024-07-11 at 17:30 +0530, Aboorva Devarajan wrote:
> > It could also be specific to ppc64 (e.g. there's something depending on fork
> > during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> > more.
>
> Sure, so far I haven't been able to reproduce the issue on x86. I'm also
> running some longer-duration tests to see whether it can be recreated on x86
> as well, in order to rule out the possibility of it being a ppc64-specific
> issue; it may simply occur much more rarely on x86 than on ppc64.
>
I have been able to recreate a similar issue on x86 as well, which rules
out the possibility of it being ppc64-specific.
---------------------------------------------------------------
Kernel Version:
6.10.0-rc7-next-20240711-sched-ext
System information:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz
CPU family: 6
Model: 45
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
Stepping: 7
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
---------------------------------------------------------------
Steps to recreate:
- Run the scx_simple scheduler.
- Run the following script to online and offline CPUs:
while true
do
        # Turn off CPUs except 0
        for i in {1..31}
        do
                echo 0 > /sys/devices/system/cpu/cpu$i/online
        done

        # Turn on CPUs
        for i in {1..31}
        do
                echo 1 > /sys/devices/system/cpu/cpu$i/online
        done
done
- The crash is intermittent and can occur only after hours of running
the script on x86.
---------------------------------------------------------------
Here's the crash log:
[ 487.476415] INFO: task systemd:1 blocked for more than 120 seconds.
[ 487.483476] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 487.492190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.500976] task:systemd state:D stack:0 pid:1 tgid:1 ppid:0 flags:0x00000002
[ 487.511415] Sched_ext: simple (disabling)
[ 487.511423] Call Trace:
[ 487.518693] <TASK>
[ 487.521094] __schedule+0x44b/0x1930
[ 487.525143] ? __this_cpu_preempt_check+0x17/0x20
[ 487.530462] schedule+0x4b/0x1a0
[ 487.534094] percpu_rwsem_wait+0x118/0x140
[ 487.538715] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 487.544775] __percpu_down_read+0x7b/0xb0
[ 487.549289] cpus_read_lock+0x86/0x90
[ 487.553416] cpuset_css_online+0x28/0x500
[ 487.557935] online_css+0x31/0xa0
[ 487.561673] cgroup_apply_control_enable+0x25b/0x400
[ 487.567248] cgroup_mkdir+0x16a/0x4d0
[ 487.571369] kernfs_iop_mkdir+0x63/0x90
[ 487.575689] vfs_mkdir+0x18f/0x250
[ 487.579532] do_mkdirat+0x117/0x160
[ 487.583473] __x64_sys_mkdir+0x4d/0x70
[ 487.587689] x64_sys_call+0x1c5a/0x2140
[ 487.591996] do_syscall_64+0x6f/0x110
[ 487.596112] ? debug_smp_processor_id+0x1b/0x30
[ 487.601196] ? irqentry_exit_to_user_mode+0x72/0x1a0
[ 487.606773] ? irqentry_exit+0x3f/0x50
[ 487.610980] ? exc_page_fault+0x96/0x2a0
[ 487.615377] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 487.621049] RIP: 0033:0x7f5c8131445b
[ 487.625069] RSP: 002b:00007ffd6d569808 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
[ 487.633543] RAX: ffffffffffffffda RBX: 000055989335bd00 RCX: 00007f5c8131445b
[ 487.641521] RDX: 0000000000000000 RSI: 00000000000001ed RDI: 0000559893389d70
[ 487.649514] RBP: 00007f5c818011e8 R08: 0000000000000000 R09: 0000000000000000
[ 487.657508] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 487.665484] R13: 00000000000007ad R14: 000055989335bd00 R15: 0000000000000003
[ 487.673503] </TASK>
[ 487.675971] INFO: task kthreadd:2 blocked for more than 121 seconds.
[ 487.683082] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 487.691745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.700517] task:kthreadd state:D stack:0 pid:2 tgid:2 ppid:0 flags:0x00004000
[ 487.710932] Sched_ext: simple (disabling)
[ 487.710940] Call Trace:
[ 487.718168] <TASK>
[ 487.720528] __schedule+0x44b/0x1930
[ 487.724542] ? __this_cpu_preempt_check+0x17/0x20
[ 487.729822] schedule+0x4b/0x1a0
[ 487.733440] percpu_rwsem_wait+0x118/0x140
[ 487.738028] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 487.744079] __percpu_down_read+0x7b/0xb0
[ 487.748578] scx_pre_fork+0x86/0x90
[ 487.752505] sched_fork+0x59/0x1c0
[ 487.756313] copy_process+0x9a8/0x2840
[ 487.760544] ? __this_cpu_preempt_check+0x17/0x20
[ 487.765821] kernel_clone+0xa0/0x4c0
[ 487.769821] ? raw_spin_rq_unlock+0x14/0x40
[ 487.774506] ? sched_clock_noinstr+0xd/0x20
[ 487.779197] ? local_clock_noinstr+0x12/0xf0
[ 487.783991] kernel_thread+0x70/0x90
[ 487.787996] ? __pfx_kthread+0x10/0x10
[ 487.792200] kthreadd+0x2de/0x370
[ 487.795937] ? __pfx_kthreadd+0x10/0x10
[ 487.800236] ret_from_fork+0x40/0x60
[ 487.804245] ? __pfx_kthreadd+0x10/0x10
[ 487.808543] ret_from_fork_asm+0x1a/0x30
[ 487.812958] </TASK>
[ 487.815407] INFO: task kworker/0:0:8 blocked for more than 121 seconds.
[ 487.822803] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 487.831468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.840229] task:kworker/0:0 state:D stack:0 pid:8 tgid:8 ppid:2 flags:0x00004000
[ 487.850634] Workqueue: events vmstat_shepherd
[ 487.855521] Sched_ext: simple (disabling)
[ 487.855524] Call Trace:
[ 487.862764] <TASK>
[ 487.865124] __schedule+0x44b/0x1930
[ 487.869162] ? __this_cpu_preempt_check+0x17/0x20
[ 487.874438] schedule+0x4b/0x1a0
[ 487.878065] percpu_rwsem_wait+0x118/0x140
[ 487.882664] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 487.888712] __percpu_down_read+0x7b/0xb0
[ 487.893202] cpus_read_lock+0x86/0x90
[ 487.897307] vmstat_shepherd+0x20/0x120
[ 487.901614] process_one_work+0x21e/0x680
[ 487.906120] ? __this_cpu_preempt_check+0x17/0x20
[ 487.911398] worker_thread+0x194/0x340
[ 487.915597] ? __pfx_worker_thread+0x10/0x10
[ 487.920378] kthread+0xfa/0x130
[ 487.923904] ? __pfx_kthread+0x10/0x10
[ 487.928110] ret_from_fork+0x40/0x60
[ 487.932112] ? __pfx_kthread+0x10/0x10
[ 487.936312] ret_from_fork_asm+0x1a/0x30
[ 487.940736] </TASK>
[ 487.943187] INFO: task rcu_tasks_trace:15 blocked for more than 121 seconds.
[ 487.951067] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 487.959732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.968491] task:rcu_tasks_trace state:D stack:0 pid:15 tgid:15 ppid:2 flags:0x00004000
[ 487.978897] Sched_ext: simple (disabling)
[ 487.978899] Call Trace:
[ 487.986129] <TASK>
[ 487.988496] __schedule+0x44b/0x1930
[ 487.992539] ? __this_cpu_preempt_check+0x17/0x20
[ 487.997830] schedule+0x4b/0x1a0
[ 488.001460] percpu_rwsem_wait+0x118/0x140
[ 488.006056] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 488.012112] __percpu_down_read+0x7b/0xb0
[ 488.016608] cpus_read_lock+0x86/0x90
[ 488.020712] check_all_holdout_tasks_trace+0x40/0x470
[ 488.026409] rcu_tasks_wait_gp+0x1f5/0x360
[ 488.031013] rcu_tasks_one_gp+0x452/0x4c0
[ 488.035514] ? __pfx_rcu_tasks_kthread+0x10/0x10
[ 488.040682] rcu_tasks_kthread+0xb8/0xe0
[ 488.045088] kthread+0xfa/0x130
[ 488.048609] ? __pfx_kthread+0x10/0x10
[ 488.052812] ret_from_fork+0x40/0x60
[ 488.056832] ? __pfx_kthread+0x10/0x10
[ 488.061033] ret_from_fork_asm+0x1a/0x30
[ 488.065448] </TASK>
[ 488.067938] INFO: task systemd-journal:800 blocked for more than 121 seconds.
[ 488.075917] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 488.084573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 488.093348] task:systemd-journal state:D stack:0 pid:800 tgid:800 ppid:1 flags:0x00000002
[ 488.103755] Sched_ext: simple (disabling)
[ 488.103758] Call Trace:
[ 488.111007] <TASK>
[ 488.113381] __schedule+0x44b/0x1930
[ 488.117423] ? __this_cpu_preempt_check+0x17/0x20
[ 488.122690] schedule+0x4b/0x1a0
[ 488.126334] percpu_rwsem_wait+0x118/0x140
[ 488.130935] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 488.137002] __percpu_down_read+0x7b/0xb0
[ 488.141486] scx_pre_fork+0x86/0x90
[ 488.145407] sched_fork+0x59/0x1c0
[ 488.149247] copy_process+0x9a8/0x2840
[ 488.153462] ? sched_clock_noinstr+0xd/0x20
[ 488.158166] ? sched_clock_noinstr+0xd/0x20
[ 488.162872] kernel_clone+0xa0/0x4c0
[ 488.166870] ? copy_clone_args_from_user+0xa3/0x290
[ 488.172362] __do_sys_clone3+0xc8/0x110
[ 488.176706] ? __secure_computing+0x89/0xe0
[ 488.181389] __x64_sys_clone3+0x1a/0x20
[ 488.185710] x64_sys_call+0x1fca/0x2140
[ 488.190006] do_syscall_64+0x6f/0x110
[ 488.194138] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.199615] ? do_syscall_64+0x7b/0x110
[ 488.203904] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.209390] ? do_syscall_64+0x7b/0x110
[ 488.213679] ? exc_page_fault+0x96/0x2a0
[ 488.218079] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 488.223758] RIP: 0033:0x7f643752683d
[ 488.227757] RSP: 002b:00007ffe57fd63b8 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
[ 488.236258] RAX: ffffffffffffffda RBX: 00007ffe57fd6400 RCX: 00007f643752683d
[ 488.244239] RDX: 00007f64374947d0 RSI: 0000000000000058 RDI: 00007ffe57fd6400
[ 488.252212] RBP: fffffffffffffdc8 R08: 00007f6435a00640 R09: 00007ffe57fd651f
[ 488.260213] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f6435a00640
[ 488.268203] R13: 0000000000000002 R14: 00007f64374947d0 R15: 00007ffe57fd6560
[ 488.276217] </TASK>
[ 488.278715] INFO: task anacron:1537 blocked for more than 121 seconds.
[ 488.286010] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 488.294677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 488.303426] task:anacron state:D stack:0 pid:1537 tgid:1537 ppid:1 flags:0x00000002
[ 488.313858] Sched_ext: simple (disabling)
[ 488.313869] Call Trace:
[ 488.321100] <TASK>
[ 488.323444] __schedule+0x44b/0x1930
[ 488.327457] ? __this_cpu_preempt_check+0x17/0x20
[ 488.332736] schedule+0x4b/0x1a0
[ 488.336352] percpu_rwsem_wait+0x118/0x140
[ 488.340945] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 488.346985] __percpu_down_read+0x7b/0xb0
[ 488.351507] scx_pre_fork+0x86/0x90
[ 488.355410] sched_fork+0x59/0x1c0
[ 488.359254] copy_process+0x9a8/0x2840
[ 488.363463] ? __do_sys_newfstatat+0x3e/0x60
[ 488.368266] kernel_clone+0xa0/0x4c0
[ 488.372263] ? do_syscall_64+0x7b/0x110
[ 488.376607] ? debug_smp_processor_id+0x1b/0x30
[ 488.381668] __do_sys_clone+0x66/0x90
[ 488.385819] __x64_sys_clone+0x29/0x30
[ 488.390011] x64_sys_call+0x1ac3/0x2140
[ 488.394311] do_syscall_64+0x6f/0x110
[ 488.398438] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.403895] ? do_syscall_64+0x7b/0x110
[ 488.408196] ? do_syscall_64+0x7b/0x110
[ 488.412488] ? exc_page_fault+0x96/0x2a0
[ 488.416884] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 488.422530] RIP: 0033:0x7f3671aeab57
[ 488.426530] RSP: 002b:00007ffc8cff27c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 488.435009] RAX: ffffffffffffffda RBX: 00007f3671e0d040 RCX: 00007f3671aeab57
[ 488.443000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 488.450975] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 488.458966] R10: 00007f3671db6a10 R11: 0000000000000246 R12: 0000000000000001
[ 488.466954] R13: 000055d8160b0f60 R14: 00007ffc8cff2980 R15: 0000000000000005
[ 488.474964] </TASK>
[ 488.477487] INFO: task scx_simple:3628 blocked for more than 121 seconds.
[ 488.485109] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 488.493763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 488.502542] task:scx_simple state:D stack:0 pid:3628 tgid:3628 ppid:3536 flags:0x00004002
[ 488.512957] Sched_ext: simple (disabling)
[ 488.512970] Call Trace:
[ 488.520204] <TASK>
[ 488.522556] __schedule+0x44b/0x1930
[ 488.526597] ? __this_cpu_preempt_check+0x17/0x20
[ 488.531864] schedule+0x4b/0x1a0
[ 488.535496] percpu_rwsem_wait+0x118/0x140
[ 488.540109] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[ 488.546157] __percpu_down_read+0x7b/0xb0
[ 488.550673] cpus_read_lock+0x86/0x90
[ 488.554770] scx_ops_enable.constprop.0+0x36f/0xbd0
[ 488.560243] ? __this_cpu_preempt_check+0x17/0x20
[ 488.565516] ? debug_smp_processor_id+0x1b/0x30
[ 488.570594] ? rcu_is_watching+0x17/0x70
[ 488.575031] ? trace_contention_end+0x6c/0xa0
[ 488.579914] ? __mutex_lock+0x399/0x900
[ 488.584249] ? __anon_inode_getfile+0x8b/0x180
[ 488.589220] ? bpf_struct_ops_link_create+0x105/0x180
[ 488.594909] bpf_scx_reg+0x12/0x20
[ 488.598718] bpf_struct_ops_link_create+0x124/0x180
[ 488.604214] __sys_bpf+0x2005/0x28e0
[ 488.608242] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.613706] ? do_syscall_64+0x7b/0x110
[ 488.618043] __x64_sys_bpf+0x1e/0x30
[ 488.622042] x64_sys_call+0x2038/0x2140
[ 488.626365] do_syscall_64+0x6f/0x110
[ 488.630461] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.635929] ? do_syscall_64+0x7b/0x110
[ 488.640247] ? debug_smp_processor_id+0x1b/0x30
[ 488.645328] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.650794] ? do_syscall_64+0x7b/0x110
[ 488.655103] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.660591] ? do_syscall_64+0x7b/0x110
[ 488.664881] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.670347] ? do_syscall_64+0x7b/0x110
[ 488.674669] ? debug_smp_processor_id+0x1b/0x30
[ 488.679756] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 488.685212] ? do_syscall_64+0x7b/0x110
[ 488.689512] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 488.695170] RIP: 0033:0x7f0e8a11e88d
[ 488.699167] RSP: 002b:00007ffc0fac4ad8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 488.707663] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f0e8a11e88d
[ 488.715637] RDX: 0000000000000040 RSI: 00007ffc0fac4bc0 RDI: 000000000000001c
[ 488.723627] RBP: 00007ffc0fac4af0 R08: 00007ffc0fac4bc0 R09: 00007ffc0fac4bc0
[ 488.731599] R10: 00005577269fbb30 R11: 0000000000000202 R12: 00005577257de2cd
[ 488.739590] R13: 00005577257de200 R14: 00007ffc0fac4cf8 R15: 00005577269a9460
[ 488.747607] </TASK>
[ 488.750096] INFO: task sched_ext_ops_h:3629 blocked for more than 122 seconds.
[ 488.758223] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 488.766883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 488.775646] task:sched_ext_ops_h state:D stack:0 pid:3629 tgid:3629 ppid:2 flags:0x00004000
[ 488.786079] Sched_ext: simple (disabling)
[ 488.786082] Call Trace:
[ 488.793303] <TASK>
[ 488.795648] __schedule+0x44b/0x1930
[ 488.799679] ? __this_cpu_preempt_check+0x17/0x20
[ 488.804958] schedule+0x4b/0x1a0
[ 488.808583] schedule_preempt_disabled+0x1c/0x30
[ 488.813748] __mutex_lock+0x51b/0x900
[ 488.817880] ? scx_ops_disable_workfn+0xe0/0x580
[ 488.823071] mutex_lock_nested+0x1f/0x30
[ 488.827448] ? mutex_lock_nested+0x1f/0x30
[ 488.832044] scx_ops_disable_workfn+0xe0/0x580
[ 488.837048] ? _raw_spin_unlock+0x31/0x60
[ 488.841531] ? raw_spin_rq_unlock+0x14/0x40
[ 488.846213] ? sched_clock_noinstr+0xd/0x20
[ 488.850900] ? local_clock_noinstr+0x12/0xf0
[ 488.855709] ? sched_clock_noinstr+0xd/0x20
[ 488.860398] ? __this_cpu_preempt_check+0x17/0x20
[ 488.865697] kthread_worker_fn+0xbb/0x2e0
[ 488.870188] ? __pfx_scx_ops_disable_workfn+0x10/0x10
[ 488.875848] ? __pfx_kthread_worker_fn+0x10/0x10
[ 488.881021] kthread+0xfa/0x130
[ 488.884548] ? __pfx_kthread+0x10/0x10
[ 488.888744] ret_from_fork+0x40/0x60
[ 488.892747] ? __pfx_kthread+0x10/0x10
[ 488.896939] ret_from_fork_asm+0x1a/0x30
[ 488.901359] </TASK>
[ 488.903798] INFO: task bash:3633 blocked for more than 122 seconds.
[ 488.910811] Tainted: G W 6.10.0-rc7-next-20240711-sched-ext #6
[ 488.919477] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 488.928258] task:bash state:D stack:0 pid:3633 tgid:3633 ppid:3599 flags:0x00004006
[ 488.938670] Sched_ext: simple (disabling)
[ 488.938672] Call Trace:
[ 488.945926] <TASK>
[ 488.948280] __schedule+0x44b/0x1930
[ 488.952329] ? __this_cpu_preempt_check+0x17/0x20
[ 488.957616] schedule+0x4b/0x1a0
[ 488.961239] schedule_timeout+0x116/0x120
[ 488.965767] __wait_for_common+0x91/0x180
[ 488.970270] ? __pfx_schedule_timeout+0x10/0x10
[ 488.975360] wait_for_completion+0x28/0x30
[ 488.979943] __cpuhp_kick_ap+0x6c/0x80
[ 488.984156] cpuhp_bringup_ap+0x149/0x260
[ 488.988677] ? __pfx_cpuhp_bringup_ap+0x10/0x10
[ 488.993746] cpuhp_invoke_callback+0x159/0x2f0
[ 488.998743] ? __pfx_trace_rb_cpu_prepare+0x10/0x10
[ 489.004231] __cpuhp_invoke_callback_range+0x79/0xf0
[ 489.009814] _cpu_up+0x10e/0x290
[ 489.013453] cpu_up+0xb0/0x110
[ 489.016893] cpu_device_up+0x1d/0x30
[ 489.020899] cpu_subsys_online+0x5e/0x130
[ 489.025436] device_online+0x6f/0x90
[ 489.029472] online_store+0xa4/0xe0
[ 489.033378] dev_attr_store+0x1b/0x30
[ 489.037513] sysfs_kf_write+0x4f/0x70
[ 489.041613] kernfs_fop_write_iter+0x172/0x230
[ 489.046615] vfs_write+0x369/0x480
[ 489.050478] ksys_write+0x71/0xf0
[ 489.054210] __x64_sys_write+0x1d/0x30
[ 489.058413] x64_sys_call+0x16b3/0x2140
[ 489.062714] do_syscall_64+0x6f/0x110
[ 489.066842] ? debug_smp_processor_id+0x1b/0x30
[ 489.071921] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 489.077407] ? do_syscall_64+0x7b/0x110
[ 489.081695] ? do_syscall_64+0x7b/0x110
[ 489.085995] ? syscall_exit_to_user_mode+0x87/0x1c0
[ 489.091481] ? do_syscall_64+0x7b/0x110
[ 489.095777] ? sysvec_call_function+0x76/0xf0
[ 489.100681] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 489.106348] RIP: 0033:0x7f468e314887
[ 489.110345] RSP: 002b:00007fffe2d5bf68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 489.118830] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f468e314887
[ 489.126812] RDX: 0000000000000002 RSI: 0000561770d46100 RDI: 0000000000000001
[ 489.134802] RBP: 0000561770d46100 R08: 00007f468e3d1460 R09: 000000007fffffff
[ 489.142785] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 489.150760] R13: 00007f468e41b780 R14: 00007f468e417600 R15: 00007f468e416a00
[ 489.158759] </TASK>
[ 489.161225]
[ 489.161225] Showing all locks held in the system:
[ 489.168159] 4 locks held by systemd/1:
[ 489.172351] #0: ffff9b2b46118450 (sb_writers#9){....}-{0:0}, at: filename_create+0x63/0x160
[ 489.181808] #1: ffff9b2c2025c988 (&type->i_mutex_dir_key#7/1){....}-{3:3}, at: filename_create+0xa2/0x160
[ 489.192633] #2: ffffffffa8487790 (cgroup_mutex){....}-{3:3}, at: cgroup_kn_lock_live+0x52/0x150
[ 489.202477] #3: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: cpuset_css_online+0x28/0x500
[ 489.212540] 1 lock held by kthreadd/2:
[ 489.216731] #0: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: sched_fork+0x59/0x1c0
[ 489.225908] 3 locks held by kworker/0:0/8:
[ 489.230486] #0: ffff9b2b40051358 ((wq_completion)events){....}-{0:0}, at: process_one_work+0x3f5/0x680
[ 489.241008] #1: ffffab14001b7e40 ((shepherd).work){....}-{0:0}, at: process_one_work+0x1d6/0x680
[ 489.250955] #2: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: vmstat_shepherd+0x20/0x120
[ 489.260816] 2 locks held by rcu_tasks_trace/15:
[ 489.265880] #0: ffffffffa83af720 (rcu_tasks_trace.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x31f/0x4c0
[ 489.277272] #1: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: check_all_holdout_tasks_trace+0x40/0x470
[ 489.288486] 3 locks held by cpuhp/8/65:
[ 489.292780] #0: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: cpuhp_thread_fun+0x87/0x260
[ 489.302712] #1: ffffffffa82a5fc0 (cpuhp_state-up){....}-{0:0}, at: cpuhp_thread_fun+0x87/0x260
[ 489.312450] #2: ffffffffa82b1e70 (wq_pool_mutex){....}-{3:3}, at: workqueue_online_cpu+0x3d/0x3c0
[ 489.322522] 1 lock held by khungtaskd/216:
[ 489.327103] #0: ffffffffa83b0040 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x3b/0x140
[ 489.337164] 1 lock held by systemd-journal/800:
[ 489.342229] #0: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: sched_fork+0x59/0x1c0
[ 489.351415] 1 lock held by wqtimer/2197:
[ 489.355802] #0: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: sched_fork+0x59/0x1c0
[ 489.364980] 1 lock held by anacron/1537:
[ 489.369366] #0: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: sched_fork+0x59/0x1c0
[ 489.378541] 1 lock held by in:imklog/1594:
[ 489.383121] #0: ffff9b42a8abf0d8 (&f->f_pos_lock){....}-{3:3}, at: __fdget_pos+0x4d/0x70
[ 489.392309] 1 lock held by gdm3/1906:
[ 489.396405] #0: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: sched_fork+0x59/0x1c0
[ 489.405585] 2 locks held by sh/1963:
[ 489.409582] #0: ffff9b2c204508a8 (&tty->ldisc_sem){....}-{0:0}, at: ldsem_down_read+0x29/0x40
[ 489.419224] #1: ffffab140dcaf2f8 (&ldata->atomic_read_lock){....}-{3:3}, at: n_tty_read+0x563/0x690
[ 489.429483] 4 locks held by scx_simple/3628:
[ 489.434258] #0: ffffffffa84b4e50 (update_mutex){....}-{3:3}, at: bpf_struct_ops_link_create+0x105/0x180
[ 489.444892] #1: ffffffffa82c2c50 (scx_ops_enable_mutex){....}-{3:3}, at: scx_ops_enable.constprop.0+0x60/0xbd0
[ 489.456199] #2: ffffffffa82c2ba0 (scx_fork_rwsem){....}-{0:0}, at: scx_ops_enable.constprop.0+0x36a/0xbd0
[ 489.467021] #3: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: scx_ops_enable.constprop.0+0x36f/0xbd0
[ 489.478027] 1 lock held by sched_ext_ops_h/3629:
[ 489.483198] #0: ffffffffa82c2c50 (scx_ops_enable_mutex){....}-{3:3}, at: scx_ops_disable_workfn+0xe0/0x580
[ 489.494120] 7 locks held by bash/3633:
[ 489.498319] #0: ffff9b2b44385450 (sb_writers#6){....}-{0:0}, at: ksys_write+0x71/0xf0
[ 489.507193] #1: ffff9b2c22223c90 (&of->mutex){....}-{3:3}, at: kernfs_fop_write_iter+0x122/0x230
[ 489.517141] #2: ffff9b2b44b11148 (kn->active#150){....}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x230
[ 489.527500] #3: ffffffffa85de5b0 (device_hotplug_lock){....}-{3:3}, at: online_store+0x47/0xe0
[ 489.537245] #4: ffff9b424fa04b98 (&dev->mutex){....}-{3:3}, at: device_online+0x29/0x90
[ 489.546326] #5: ffffffffa82a3950 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x7d/0x110
[ 489.555585] #6: ffffffffa82a38a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x49/0x290
[ 489.564673]
[ 489.566342] =============================================
[ 489.566342]
Thanks,
Aboorva
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-07-12 10:04 ` Aboorva Devarajan
@ 2024-08-02 21:20 ` Tejun Heo
2024-08-10 18:24 ` Aboorva Devarajan
From: Tejun Heo @ 2024-08-02 21:20 UTC
To: Aboorva Devarajan; +Cc: void, linux-kernel
Hello, Aboorva.
On Fri, Jul 12, 2024 at 03:34:59PM +0530, Aboorva Devarajan wrote:
> Hi Tejun,
>
> On Thu, 2024-07-11 at 17:30 +0530, Aboorva Devarajan wrote:
>
> > > It could also be specific to ppc64 (e.g. there's something depending on fork
> > > during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> > > more.
> >
> > Sure, so far I haven't been able to reproduce the issue on x86. I'm also
> > running some longer-duration tests to see whether it can be recreated on
> > x86 as well, in order to rule out the possibility of it being a
> > ppc64-specific issue; it may be that the issue occurs much more rarely on
> > x86 than on ppc64.
> >
>
> I have been able to recreate a similar issue on x86 as well, so this
> > rules out the possibility of it being ppc64-specific.
Can you please apply the following patchset and see whether the problem
persists?
http://lkml.kernel.org/r/20240802211850.1211737-1-tj@kernel.org
Thanks.
--
tejun
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-02 21:20 ` Tejun Heo
@ 2024-08-10 18:24 ` Aboorva Devarajan
2024-08-13 19:54 ` Tejun Heo
From: Aboorva Devarajan @ 2024-08-10 18:24 UTC
To: Tejun Heo; +Cc: void, linux-kernel
On Fri, 2024-08-02 at 11:20 -1000, Tejun Heo wrote:
> Hello, Aboorva.
>
> On Fri, Jul 12, 2024 at 03:34:59PM +0530, Aboorva Devarajan wrote:
> > Hi Tejun,
> >
> > On Thu, 2024-07-11 at 17:30 +0530, Aboorva Devarajan wrote:
> >
> > > > It could also be specific to ppc64 (e.g. there's something depending on fork
> > > > during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> > > > more.
> > >
> > > Sure, so far I haven't been able to reproduce the issue on x86. I'm also
> > > running some longer-duration tests to see whether it can be recreated on
> > > x86 as well, in order to rule out the possibility of it being a
> > > ppc64-specific issue; it may be that the issue occurs much more rarely on
> > > x86 than on ppc64.
> > >
> >
> > I have been able to recreate a similar issue on x86 as well, so this
> > rules out the possibility of it being ppc64-specific.
>
> Can you please apply the following patchset and see whether the problem
> persists?
>
> http://lkml.kernel.org/r/20240802211850.1211737-1-tj@kernel.org
>
Hi Tejun,
Sorry for the delayed response; I just got some time on the machine to
run this test.
I applied the patchset on the latest sched-ext (for-next) branch and
tested it, but the issue still persists. I ran the test on a PowerPC
system, as it is easier to reproduce the issue there than on x86.
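For reference, the hotplug stress I run alongside scx_simple is essentially the loop below. This is only a sketch of the test: the sysfs path is the standard CPU hotplug interface, while the CPU range, iteration count, and the DRY_RUN guard are placeholders added for illustration.

```shell
# Randomly offline/online CPUs while the scx scheduler is attached.
# DRY_RUN=1 (the default here) only prints the sysfs writes so the
# sketch is safe to run anywhere; set DRY_RUN=0 as root to actually
# hotplug CPUs.
: "${DRY_RUN:=1}"
: "${MAX_CPU:=7}"    # highest CPU id to toggle (cpu0 is never touched)
: "${ITERS:=5}"

toggle_cpu() {
    cpu=$1 state=$2
    f="/sys/devices/system/cpu/cpu${cpu}/online"
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $state > $f"
    else
        echo "$state" > "$f"
    fi
}

i=0
while [ "$i" -lt "$ITERS" ]; do
    # pick a pseudo-random CPU in 1..MAX_CPU
    cpu=$(( $(od -An -N1 -tu1 /dev/urandom) % MAX_CPU + 1 ))
    toggle_cpu "$cpu" 0
    toggle_cpu "$cpu" 1
    i=$((i + 1))
done
```

With DRY_RUN=0 (as root) and scx_simple attached, the hang shows up only intermittently, so the loop has to run for a while.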
Here's the error log:
------------------------------------------------------
[ 605.329686] INFO: task systemd:1 blocked for more than 120 seconds.
[ 605.329716] Not tainted 6.11.0-rc1+ #13
[ 605.329728] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.329752] task:systemd state:D stack:0 pid:1 tgid:1 ppid:0 flags:0x00040000
[ 605.329772] Sched_ext: simple (prepping)
[ 605.329773] Call Trace:
[ 605.329788] [c000000dddecf620] [0000000000000001] 0x1 (unreliable)
[ 605.329814] [c000000dddecf7d0] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.329840] [c000000dddecf830] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.329866] [c000000dddecf920] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.329890] [c000000dddecf990] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.329917] [c000000dddecfa00] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.329952] [c000000dddecfa40] [c0000000001707a8] cpus_read_lock+0x88/0xa0
[ 605.329968] [c000000dddecfa70] [c0000000004730f0] static_key_slow_inc+0x30/0x70
[ 605.329986] [c000000dddecfaa0] [c0000000005ab80c] mem_cgroup_css_alloc+0x4ec/0x6c0
[ 605.330022] [c000000dddecfb40] [c0000000002c9048] cgroup_apply_control_enable+0x208/0x600
[ 605.330057] [c000000dddecfc40] [c0000000002c8bfc] cgroup_mkdir+0x39c/0x5e0
[ 605.330090] [c000000dddecfcc0] [c0000000006dee7c] kernfs_iop_mkdir+0xdc/0x170
[ 605.330133] [c000000dddecfd10] [c0000000005ea5cc] vfs_mkdir+0x13c/0x1f0
[ 605.330169] [c000000dddecfd60] [c0000000005ea768] do_mkdirat+0xe8/0x2e0
[ 605.330213] [c000000dddecfdc0] [c0000000005eaa40] sys_mkdir+0x50/0x80
[ 605.330246] [c000000dddecfe00] [c000000000030cac] system_call_exception+0xfc/0x290
[ 605.330294] [c000000dddecfe50] [c00000000000c7d4] system_call_common+0xf4/0x258
[ 605.330342] --- interrupt: c00 at 0x7bff27d33038
[ 605.330365] NIP: 00007bff27d33038 LR: 00007bff27958278 CTR: 0000000000000000
[ 605.330398] REGS: c000000dddecfe80 TRAP: 0c00 Not tainted (6.11.0-rc1+)
[ 605.330421] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24004422 XER: 00000000
[ 605.330452] IRQMASK: 0
[ 605.330452] GPR00: 0000000000000027 00007fffed2f4ae0 00007bff27e37100 00000ff04a9b6940
[ 605.330452] GPR04: 00000000000001ed 0000000000000000 0000000000000000 0000000000000000
[ 605.330452] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 605.330452] GPR12: 0000000000000000 00007bff266a7780 00000ff029879080 0000000000000000
[ 605.330452] GPR16: 0000000000000000 00000ff04a9490c0 0000000000000000 fffffffffffffffe
[ 605.330452] GPR20: 0000000000000000 fffffffffffffffd 00007fffed2f5338 0000000000000020
[ 605.330452] GPR24: fffffffffffffc0c 00000ff04a8f59f0 0000000000000001 00000000ffffff0c
[ 605.330452] GPR28: 0000000000000001 00000ff04a8f59f0 00007bff27a99588 0000000000000000
[ 605.330717] NIP [00007bff27d33038] 0x7bff27d33038
[ 605.330746] LR [00007bff27958278] 0x7bff27958278
[ 605.330777] --- interrupt: c00
[ 605.330795] INFO: task kworker/0:1:9 blocked for more than 120 seconds.
[ 605.330837] Not tainted 6.11.0-rc1+ #13
[ 605.330857] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.330891] task:kworker/0:1 state:D stack:0 pid:9 tgid:9 ppid:2 flags:0x00000000
[ 605.330939] Workqueue: events vmstat_shepherd
[ 605.330973] Sched_ext: simple (prepping)
[ 605.330974] Call Trace:
[ 605.331006] [c000000ddde8b920] [c0000000001e65a4] update_cfs_rq_load_avg+0x194/0x1e0 (unreliable)
[ 605.331069] [c000000ddde8bad0] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.331103] [c000000ddde8bb30] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.331128] [c000000ddde8bc20] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.331163] [c000000ddde8bc90] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.331209] [c000000ddde8bd00] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.331255] [c000000ddde8bd40] [c0000000001707a8] cpus_read_lock+0x88/0xa0
[ 605.331300] [c000000ddde8bd70] [c0000000004b6d38] vmstat_shepherd+0x48/0x1b0
[ 605.331344] [c000000ddde8bde0] [c0000000001a6508] process_scheduled_works+0x268/0x520
[ 605.331391] [c000000ddde8bee0] [c0000000001a9bb0] worker_thread+0x3f0/0x590
[ 605.331423] [c000000ddde8bf80] [c0000000001b35b0] kthread+0x1a0/0x1c0
[ 605.331457] [c000000ddde8bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[ 605.331493] INFO: task rcu_tasks_trace:14 blocked for more than 120 seconds.
[ 605.331529] Not tainted 6.11.0-rc1+ #13
[ 605.331568] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.331603] task:rcu_tasks_trace state:D stack:0 pid:14 tgid:14 ppid:2 flags:0x00000000
[ 605.331649] Sched_ext: simple (prepping)
[ 605.331650] Call Trace:
[ 605.331694] [c000000dddea77e0] [c000000ddde1f900] 0xc000000ddde1f900 (unreliable)
[ 605.331738] [c000000dddea7990] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.331784] [c000000dddea79f0] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.331811] [c000000dddea7ae0] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.331864] [c000000dddea7b50] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.331901] [c000000dddea7bc0] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.331949] [c000000dddea7c00] [c0000000001707a8] cpus_read_lock+0x88/0xa0
[ 605.331982] [c000000dddea7c30] [c0000000002525fc] rcu_tasks_trace_pregp_step+0xec/0x580
[ 605.332019] [c000000dddea7cf0] [c000000000251fdc] rcu_tasks_wait_gp+0xac/0x360
[ 605.332046] [c000000dddea7e00] [c000000000251ab0] rcu_tasks_one_gp+0x540/0x620
[ 605.332063] [c000000dddea7f10] [c0000000002523ec] rcu_tasks_kthread+0xfc/0x120
[ 605.332099] [c000000dddea7f80] [c0000000001b35b0] kthread+0x1a0/0x1c0
[ 605.332133] [c000000dddea7fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[ 605.332171] INFO: task systemd-journal:426 blocked for more than 120 seconds.
[ 605.332226] Not tainted 6.11.0-rc1+ #13
[ 605.332267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.332282] task:systemd-journal state:D stack:0 pid:426 tgid:426 ppid:1 flags:0x00040400
[ 605.332327] Sched_ext: simple (prepping)
[ 605.332329] Call Trace:
[ 605.332371] [c000000e86bef750] [0000000000000001] 0x1 (unreliable)
[ 605.332405] [c000000e86bef900] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.332439] [c000000e86bef960] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.332475] [c000000e86befa50] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.332501] [c000000e86befac0] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.332538] [c000000e86befb30] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.332607] [c000000e86befb70] [c0000000001f7c78] scx_pre_fork+0x88/0xa0
[ 605.332692] [c000000e86befba0] [c0000000001cbbb8] sched_fork+0x88/0x230
[ 605.332781] [c000000e86befbe0] [c00000000016a80c] copy_process+0x55c/0xfa0
[ 605.332958] [c000000e86befcb0] [c00000000016cc3c] kernel_clone+0xcc/0x410
[ 605.333028] [c000000e86befd30] [c00000000016d394] sys_clone+0xa4/0xe0
[ 605.333094] [c000000e86befe00] [c000000000030cac] system_call_exception+0xfc/0x290
[ 605.333193] [c000000e86befe50] [c00000000000c7d4] system_call_common+0xf4/0x258
[ 605.333279] --- interrupt: c00 at 0x777e7434ba24
[ 605.333351] NIP: 0000777e7434ba24 LR: 0000777e737e6bc4 CTR: 0000000000000000
[ 605.333425] REGS: c000000e86befe80 TRAP: 0c00 Not tainted (6.11.0-rc1+)
[ 605.333498] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24044420 XER: 00000000
[ 605.333598] IRQMASK: 0
[ 605.333598] GPR00: 0000000000000078 00007fffc0151d30 0000777e74437100 00000000003d0f00
[ 605.333598] GPR04: 0000777e5320e7c0 0000777e5320eff0 0000777e53216690 0000777e5320eff0
[ 605.333598] GPR08: 0000777e53216690 0000000000000000 0000000000000000 0000000000000000
[ 605.333598] GPR12: 0000000000000000 0000777e74447780 0000777e73824410 0000000000000000
[ 605.333598] GPR16: 0000777e74660000 0000777e73820320 0000777e5320efe0 0000777e73824420
[ 605.333598] GPR20: 0000000000000000 0000000000000000 00000e5e33bf8c70 0000777e73e7c680
[ 605.333598] GPR24: 00000e5e33bf8de0 0000777e5320f690 0000777e5320e820 00007fffc0151e10
[ 605.333598] GPR28: 00007fffc0151e0f 00000000003d0f00 0000777e737e8750 0000777e5320ef20
[ 605.334310] NIP [0000777e7434ba24] 0x777e7434ba24
[ 605.334366] LR [0000777e737e6bc4] 0x777e737e6bc4
[ 605.334411] --- interrupt: c00
[ 605.334454] INFO: task irqbalance:1262 blocked for more than 120 seconds.
[ 605.334527] Not tainted 6.11.0-rc1+ #13
[ 605.334572] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.334649] task:irqbalance state:D stack:0 pid:1262 tgid:1262 ppid:1 flags:0x00040080
[ 605.334747] Sched_ext: simple (prepping)
[ 605.334748] Call Trace:
[ 605.334836] [c000000e5b69f5b0] [0000000000000001] 0x1 (unreliable)
[ 605.334894] [c000000e5b69f760] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.334957] [c000000e5b69f7c0] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.335031] [c000000e5b69f8b0] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.335100] [c000000e5b69f920] [c0000000012b9028] schedule_preempt_disabled+0x28/0x48
[ 605.335199] [c000000e5b69f950] [c0000000012bb5d0] __mutex_lock+0x390/0x9e8
[ 605.335281] [c000000e5b69f9f0] [c000000000c3abf4] online_show+0x44/0xb0
[ 605.335373] [c000000e5b69fa70] [c000000000c3a670] dev_attr_show+0x60/0xd0
[ 605.335433] [c000000e5b69faf0] [c0000000006e3f44] sysfs_kf_seq_show+0xe4/0x190
[ 605.335510] [c000000e5b69fb80] [c0000000006e244c] kernfs_seq_show+0x6c/0xa0
[ 605.335596] [c000000e5b69fbc0] [c00000000061b988] seq_read_iter+0x1c8/0x560
[ 605.335685] [c000000e5b69fca0] [c0000000006e0e78] kernfs_fop_read_iter+0x98/0x230
[ 605.335779] [c000000e5b69fd00] [c0000000005cd6a8] vfs_read+0x2b8/0x330
[ 605.335844] [c000000e5b69fda0] [c0000000005ce13c] ksys_read+0x9c/0x120
[ 605.335921] [c000000e5b69fe00] [c000000000030cac] system_call_exception+0xfc/0x290
[ 605.336019] [c000000e5b69fe50] [c00000000000c7d4] system_call_common+0xf4/0x258
[ 605.336098] --- interrupt: c00 at 0x7344fed3356c
[ 605.336150] NIP: 00007344fed3356c LR: 00007344fed33550 CTR: 0000000000000000
[ 605.336242] REGS: c000000e5b69fe80 TRAP: 0c00 Not tainted (6.11.0-rc1+)
[ 605.336303] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24004284 XER: 00000000
[ 605.336412] IRQMASK: 0
[ 605.336412] GPR00: 0000000000000003 00007fffd6167fd0 00007344fee37100 0000000000000007
[ 605.336412] GPR04: 00000eab8d894ea0 0000000000002000 0000000000000001 00007344ff507978
[ 605.336412] GPR08: 0000000000000002 0000000000000000 0000000000000000 0000000000000000
[ 605.336412] GPR12: 0000000000000000 00007344ff50efe0 0000000000000000 0000000065646f6e
[ 605.336412] GPR16: 0000000000000000 0000000000000000 0000000000000000 00000eab8d8a7eb0
[ 605.336412] GPR20: 00000eab53340ae8 00000eab533404d8 00007fffd6168468 00007fffd6168868
[ 605.336412] GPR24: 00000eab53340248 00000eab5334005c 000000000000000a 00007344fee2cca8
[ 605.336412] GPR28: 0000000000000bd0 0000000000000000 0000000000000007 0000000000002000
[ 605.337155] NIP [00007344fed3356c] 0x7344fed3356c
[ 605.337211] LR [00007344fed33550] 0x7344fed33550
[ 605.337280] --- interrupt: c00
[ 605.337319] INFO: task dockerd:1935 blocked for more than 120 seconds.
[ 605.337384] Not tainted 6.11.0-rc1+ #13
[ 605.337441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.337512] task:dockerd state:D stack:0 pid:1935 tgid:1866 ppid:1 flags:0x00040080
[ 605.337615] Sched_ext: simple (prepping)
[ 605.337616] Call Trace:
[ 605.337690] [c000000de4287750] [c000000ea246d800] 0xc000000ea246d800 (unreliable)
[ 605.337790] [c000000de4287900] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.337870] [c000000de4287960] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.337934] [c000000de4287a50] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.338028] [c000000de4287ac0] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.338103] [c000000de4287b30] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.338169] [c000000de4287b70] [c0000000001f7c78] scx_pre_fork+0x88/0xa0
[ 605.338257] [c000000de4287ba0] [c0000000001cbbb8] sched_fork+0x88/0x230
[ 605.338345] [c000000de4287be0] [c00000000016a80c] copy_process+0x55c/0xfa0
[ 605.338441] [c000000de4287cb0] [c00000000016cc3c] kernel_clone+0xcc/0x410
[ 605.338500] [c000000de4287d30] [c00000000016d394] sys_clone+0xa4/0xe0
[ 605.338585] [c000000de4287e00] [c000000000030cac] system_call_exception+0xfc/0x290
[ 605.338666] [c000000de4287e50] [c00000000000c7d4] system_call_common+0xf4/0x258
[ 605.338764] --- interrupt: c00 at 0x2c4f62c635c
[ 605.338810] NIP: 000002c4f62c635c LR: 000002c4f62bb118 CTR: 0000000000000000
[ 605.338904] REGS: c000000de4287e80 TRAP: 0c00 Not tainted (6.11.0-rc1+)
[ 605.338975] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24444444 XER: 00000000
[ 605.339081] IRQMASK: 0
[ 605.339081] GPR00: 0000000000000078 000000c00059c560 000002c4f9757f00 0000000000004111
[ 605.339081] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 605.339081] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 605.339081] GPR12: 0000000000000000 00007f0632616750 0000000000000000 0000000000000000
[ 605.339081] GPR16: ffffffffffffffff 0000000000000084 0000000000000004 0000000000000000
[ 605.339081] GPR20: 0000000000000000 0000000000000000 000000c00059a3a0 0000000000000000
[ 605.339081] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 605.339081] GPR28: 0000000000000000 000002c4f7d15780 000000c000783860 000002c4f62bb0c0
[ 605.339783] NIP [000002c4f62c635c] 0x2c4f62c635c
[ 605.339838] LR [000002c4f62bb118] 0x2c4f62bb118
[ 605.339899] --- interrupt: c00
[ 605.339941] INFO: task scx_simple:2676 blocked for more than 120 seconds.
[ 605.340034] Not tainted 6.11.0-rc1+ #13
[ 605.340076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.340139] task:scx_simple state:D stack:0 pid:2676 tgid:2676 ppid:2480 flags:0x00040080
[ 605.340254] Sched_ext: simple (prepping)
[ 605.340255] Call Trace:
[ 605.340335] [c000000e5b6af5d0] [0000000000000001] 0x1 (unreliable)
[ 605.340403] [c000000e5b6af780] [c00000000001d2d8] __switch_to+0x238/0x300
[ 605.340495] [c000000e5b6af7e0] [c0000000012b8958] __schedule+0x6f8/0xbac
[ 605.340587] [c000000e5b6af8d0] [c0000000012b8ec8] schedule+0xbc/0x174
[ 605.340659] [c000000e5b6af940] [c0000000002243d4] percpu_rwsem_wait+0x184/0x1d0
[ 605.340743] [c000000e5b6af9b0] [c0000000012be358] __percpu_down_read+0x58/0x108
[ 605.340827] [c000000e5b6af9f0] [c0000000001707a8] cpus_read_lock+0x88/0xa0
[ 605.340923] [c000000e5b6afa20] [c0000000002091cc] bpf_scx_reg+0x5dc/0xc50
[ 605.340991] [c000000e5b6afc20] [c000000000443a80] bpf_struct_ops_link_create+0x150/0x1f0
[ 605.341098] [c000000e5b6afc80] [c0000000003b75e8] link_create+0x78/0x430
[ 605.341162] [c000000e5b6afcd0] [c0000000003b284c] __sys_bpf+0x39c/0x560
[ 605.341248] [c000000e5b6afdc0] [c0000000003b0ee0] sys_bpf+0x50/0x80
[ 605.341334] [c000000e5b6afe00] [c000000000030cac] system_call_exception+0xfc/0x290
[ 605.341433] [c000000e5b6afe50] [c00000000000c7d4] system_call_common+0xf4/0x258
[ 605.341504] --- interrupt: c00 at 0x78acc9142474
[ 605.341578] NIP: 000078acc9142474 LR: 000001d33e92451c CTR: 0000000000000000
[ 605.341644] REGS: c000000e5b6afe80 TRAP: 0c00 Not tainted (6.11.0-rc1+)
[ 605.341703] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 28048824 XER: 00000000
[ 605.341812] IRQMASK: 0
[ 605.341812] GPR00: 0000000000000169 00007fffe6869100 000078acc9237100 000000000000001c
[ 605.341812] GPR04: 00007fffe6869280 0000000000000040 0000000000000001 000001d33e927f00
[ 605.341812] GPR08: 000000000000001c 0000000000000000 0000000000000000 0000000000000000
[ 605.341812] GPR12: 0000000000000000 000078acc95bb9c0 000001d373e5ede0 000001d33e96bd48
[ 605.341812] GPR16: 000001d33e96bcb0 000001d33e96bf00 ffffffffffffffff 000001d33e9b0430
[ 605.341812] GPR20: 000078acc92318c8 00007fffe6869450 00007fffe68693d0 00000000ffffffff
[ 605.341812] GPR24: 0000000000000001 000001d33e9b0438 00007fffe6869938 000001d33e96bea8
[ 605.341812] GPR28: 000001d33e96bff0 000001d33e96bd00 0000000000000001 00007fffe6869100
[ 605.342533] NIP [000078acc9142474] 0x78acc9142474
[ 605.342590] LR [000001d33e92451c] 0x1d33e92451c
[ 605.342633] --- interrupt: c00
------------------------------------------------------
I'll see if I can collect more details, and if there's anything else
you'd want me to check, please let me know.
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-10 18:24 ` Aboorva Devarajan
@ 2024-08-13 19:54 ` Tejun Heo
2024-08-20 7:03 ` Aboorva Devarajan
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-08-13 19:54 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel
Hello,
On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
...
> Sorry for the delayed response, I just got some time on the machine to
> run this test.
>
> I applied the patchset on the latest sched-ext (for-next) branch and
> tested it, but the issue still persists. I did the test on a PowerPC
> system, as it is easier to reproduce the issue there compared to x86.
Bummer.
...
> I'll see if I can collect more details, and if there's anything else
> you'd want me to check, please let me know.
Can you trigger sysrq-t when the system is stuck? Also, can you see whether
the problem is reproducible on x86 w/ the wq changes applied?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-13 19:54 ` Tejun Heo
@ 2024-08-20 7:03 ` Aboorva Devarajan
2024-08-20 19:38 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Aboorva Devarajan @ 2024-08-20 7:03 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, linux-kernel, aboorvad
On Tue, 2024-08-13 at 09:54 -1000, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
> ...
> ...
> Can you trigger sysrq-t when the system is stuck? Also, can you see whether
> the problem is reproducible on x86 w/ the wq changes applied?
Hi Tejun,
I couldn't trigger sysrq-t so far because the system becomes unresponsive;
I'll get back to you if I can successfully trigger it via the console.
And yes, this issue also occurs on x86 when applying the proposed workqueue
patch on top of the recent sched-ext for-6.12 tree. However, it takes
significantly longer to hit the issue, and the stack trace differs
a bit.
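For reference, the hotplug stress used here is just a loop that randomly
toggles the sysfs online files. A minimal Python sketch of that procedure
(illustrative only; the sysfs path layout and the CPU range 1-127 are
assumptions, and the writes require root):

```python
import random

# Assumed standard sysfs location of the per-CPU online control file.
SYSFS_ONLINE = "/sys/devices/system/cpu/cpu{}/online"

def hotplug_ops(cpus, iterations, seed=0):
    """Yield (sysfs_path, value) pairs that randomly offline ("0")
    and then re-online ("1") one CPU per iteration."""
    rng = random.Random(seed)
    for _ in range(iterations):
        cpu = rng.choice(cpus)
        yield SYSFS_ONLINE.format(cpu), "0"
        yield SYSFS_ONLINE.format(cpu), "1"

def run(cpus, iterations):
    # Requires root. CPU 0 is excluded since it is often not removable.
    for path, value in hotplug_ops(cpus, iterations):
        with open(path, "w") as f:
            f.write(value)
```

Running `run(list(range(1, 128)), 1000)` alongside scx_simple approximates
the reproduction steps described earlier in the thread.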
Here is the trace on x86:
...
9.374534] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.374809] smpboot: Booting Node 1 Processor 27 APIC 0x27
[10779.594551] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.594788] smpboot: Booting Node 1 Processor 28 APIC 0x29
[10779.790551] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.790804] smpboot: Booting Node 1 Processor 29 APIC 0x2b
[10779.990527] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.990762] smpboot: Booting Node 1 Processor 30 APIC 0x2d
[10780.186553] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10780.198737] smpboot: Booting Node 1 Processor 31 APIC 0x2f
[10780.556624] smpboot: CPU 1 is now offline
[10780.773765] smpboot: CPU 2 is now offline
[10780.984782] smpboot: CPU 3 is now offline
[10781.030533] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10781.224421] smpboot: CPU 4 is now offline
[10781.408442] smpboot: CPU 5 is now offline
[10781.627324] smpboot: CPU 6 is now offline
[10781.824777] smpboot: CPU 7 is now offline
[10781.878540] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10782.088456] smpboot: CPU 8 is now offline
[10782.296472] smpboot: CPU 9 is now offline
[10782.504424] smpboot: CPU 10 is now offline
[10782.696423] smpboot: CPU 11 is now offline
[10782.734601] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10782.948499] smpboot: CPU 12 is now offline
[10999.932491] INFO: task kthreadd:2 blocked for more than 120 seconds.
[10999.939871] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[10999.946938] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10999.955740] task:kthreadd state:D stack:0 pid:2 tgid:2 ppid:0 flags:0x00004000
[10999.966210] Sched_ext: simple (disabling)
[10999.966220] Call Trace:
[10999.973536] <TASK>
[10999.975926] __schedule+0x47e/0x19d0
[10999.979993] ? __this_cpu_preempt_check+0x17/0x20
[10999.985351] schedule+0x4b/0x1a0
[10999.989000] percpu_rwsem_wait+0x118/0x140
[10999.993639] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[10999.999745] __percpu_down_read+0x7b/0xb0
[11000.004255] scx_pre_fork+0xd1/0xf0
[11000.008194] sched_fork+0x59/0x1c0
[11000.012034] copy_process+0xa69/0x2b10
[11000.016279] ? sched_clock_noinstr+0xd/0x20
[11000.021006] kernel_clone+0xa0/0x3d0
[11000.025040] kernel_thread+0x70/0x90
[11000.029099] ? __pfx_kthread+0x10/0x10
[11000.033340] kthreadd+0x2f6/0x3a0
[11000.037097] ? __pfx_kthreadd+0x10/0x10
[11000.041423] ret_from_fork+0x40/0x60
[11000.045468] ? __pfx_kthreadd+0x10/0x10
[11000.049800] ret_from_fork_asm+0x1a/0x30
[11000.054265] </TASK>
[11000.056781] INFO: task rcu_tasks_trace:15 blocked for more than 120 seconds.
[11000.064719] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.071751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.080538] task:rcu_tasks_trace state:D stack:0 pid:15 tgid:15 ppid:2 flags:0x00004000
[11000.090982] Sched_ext: simple (disabling)
[11000.090986] Call Trace:
[11000.098249] <TASK>
[11000.100620] __schedule+0x47e/0x19d0
[11000.104685] ? __this_cpu_preempt_check+0x17/0x20
[11000.109982] schedule+0x4b/0x1a0
[11000.113619] percpu_rwsem_wait+0x118/0x140
[11000.118211] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.124325] __percpu_down_read+0x7b/0xb0
[11000.128836] cpus_read_lock+0xd1/0xf0
[11000.132964] rcu_tasks_trace_pregp_step+0xa2/0x700
[11000.138362] ? __mutex_lock+0x388/0x8e0
[11000.142710] rcu_tasks_wait_gp+0x5f/0x430
[11000.147208] ? __this_cpu_preempt_check+0x17/0x20
[11000.152499] rcu_tasks_one_gp+0x474/0x4e0
[11000.157028] ? __pfx_rcu_tasks_kthread+0x10/0x10
[11000.162205] rcu_tasks_kthread+0xb8/0xe0
[11000.166642] kthread+0xfa/0x130
[11000.170171] ? __pfx_kthread+0x10/0x10
[11000.174416] ret_from_fork+0x40/0x60
[11000.178434] ? __pfx_kthread+0x10/0x10
[11000.182667] ret_from_fork_asm+0x1a/0x30
[11000.187096] </TASK>
[11000.189581] INFO: task cpuhp/13:95 blocked for more than 121 seconds.
[11000.196833] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.203863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.212646] task:cpuhp/13 state:D stack:0 pid:95 tgid:95 ppid:2 flags:0x00004000
[11000.223106] Sched_ext: simple (disabling)
[11000.223111] Call Trace:
[11000.230370] <TASK>
[11000.232777] __schedule+0x47e/0x19d0
[11000.236807] ? __this_cpu_preempt_check+0x17/0x20
[11000.242121] schedule+0x4b/0x1a0
[11000.245758] schedule_timeout+0x116/0x120
[11000.250280] ? _raw_spin_unlock_irq+0x2c/0x60
[11000.255176] ? __this_cpu_preempt_check+0x17/0x20
[11000.260482] ? lockdep_hardirqs_on+0xce/0x150
[11000.265377] ? trace_hardirqs_on+0x51/0x60
[11000.270014] __wait_for_common+0x91/0x180
[11000.274519] ? __pfx_schedule_timeout+0x10/0x10
[11000.279630] ? __flush_work+0x232/0x610
[11000.283940] wait_for_completion+0x28/0x30
[11000.288565] __flush_work+0x2d4/0x610
[11000.292693] ? __pfx_wq_barrier_func+0x10/0x10
[11000.297728] ? __wait_for_common+0x3e/0x180
[11000.302424] work_on_cpu_key+0x9c/0xc0
[11000.306641] ? __pfx_work_for_cpu_fn+0x10/0x10
[11000.311629] ? __pfx_tmigr_trigger_active+0x10/0x10
[11000.317136] tmigr_cpu_offline+0x1dd/0x260
[11000.321742] ? __pfx_tmigr_cpu_offline+0x10/0x10
[11000.326935] cpuhp_invoke_callback+0x159/0x2f0
[11000.331936] cpuhp_thread_fun+0x1e8/0x280
[11000.336452] smpboot_thread_fn+0x196/0x240
[11000.341070] ? __pfx_smpboot_thread_fn+0x10/0x10
[11000.346267] kthread+0xfa/0x130
[11000.349805] ? __pfx_kthread+0x10/0x10
[11000.354028] ret_from_fork+0x40/0x60
[11000.358070] ? __pfx_kthread+0x10/0x10
[11000.362288] ret_from_fork_asm+0x1a/0x30
[11000.366740] </TASK>
[11000.369243] INFO: task systemd-journal:790 blocked for more than 121 seconds.
[11000.377250] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.384277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.393062] task:systemd-journal state:D stack:0 pid:790 tgid:790 ppid:1 flags:0x00000002
[11000.403510] Sched_ext: simple (disabling)
[11000.403514] Call Trace:
[11000.410789] <TASK>
[11000.413177] __schedule+0x47e/0x19d0
[11000.417215] ? __this_cpu_preempt_check+0x17/0x20
[11000.422510] schedule+0x4b/0x1a0
[11000.426168] percpu_rwsem_wait+0x118/0x140
[11000.430773] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.436846] __percpu_down_read+0x7b/0xb0
[11000.441361] scx_pre_fork+0xd1/0xf0
[11000.445284] sched_fork+0x59/0x1c0
[11000.449117] copy_process+0xa69/0x2b10
[11000.453346] ? timerfd_read_iter+0x136/0x290
[11000.458160] ? sched_clock_noinstr+0xd/0x20
[11000.462862] ? _copy_to_iter+0xb9/0x630
[11000.467188] kernel_clone+0xa0/0x3d0
[11000.471208] ? copy_clone_args_from_user+0xa3/0x290
[11000.476733] __do_sys_clone3+0xc8/0x110
[11000.481068] ? __secure_computing+0x89/0xe0
[11000.485785] __x64_sys_clone3+0x1a/0x20
[11000.490096] x64_sys_call+0x1800/0x20d0
[11000.494438] do_syscall_64+0x8b/0x140
[11000.498561] ? do_syscall_64+0x97/0x140
[11000.502887] ? __this_cpu_preempt_check+0x17/0x20
[11000.508186] ? lockdep_hardirqs_on+0xce/0x150
[11000.513096] ? syscall_exit_to_user_mode+0xd5/0x210
[11000.518568] ? do_syscall_64+0x97/0x140
[11000.522881] ? __lock_acquire+0xac8/0x28c0
[11000.527511] ? sigprocmask+0xad/0xe0
[11000.531526] ? sched_clock_noinstr+0xd/0x20
[11000.536239] ? _raw_spin_unlock_irq+0x2c/0x60
[11000.541158] ? __this_cpu_preempt_check+0x17/0x20
[11000.546431] ? lockdep_hardirqs_on+0xce/0x150
[11000.551346] ? do_syscall_64+0x97/0x140
[11000.555656] ? __this_cpu_preempt_check+0x17/0x20
[11000.560955] ? lockdep_hardirqs_on+0xce/0x150
[11000.565850] ? syscall_exit_to_user_mode+0xd5/0x210
[11000.571340] ? do_syscall_64+0x97/0x140
[11000.575655] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11000.581333] RIP: 0033:0x7ffa9a32683d
[11000.585368] RSP: 002b:00007ffc04eb0418 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
[11000.593859] RAX: ffffffffffffffda RBX: 00007ffc04eb0460 RCX: 00007ffa9a32683d
[11000.601849] RDX: 00007ffa9a2947d0 RSI: 0000000000000058 RDI: 00007ffc04eb0460
[11000.609863] RBP: fffffffffffffdc8 R08: 00007ffa97800640 R09: 00007ffc04eb057f
[11000.617868] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffa97800640
[11000.625863] R13: 0000000000000002 R14: 00007ffa9a2947d0 R15: 00007ffc04eb05c0
[11000.633887] </TASK>
[11000.636366] INFO: task udisksd:1584 blocked for more than 121 seconds.
[11000.643691] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.650730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.659506] task:udisksd state:D stack:0 pid:1584 tgid:1584 ppid:1 flags:0x00000002
[11000.669935] Sched_ext: simple (disabling)
[11000.669939] Call Trace:
[11000.677214] <TASK>
[11000.679584] __schedule+0x47e/0x19d0
[11000.683620] ? __this_cpu_preempt_check+0x17/0x20
[11000.688932] schedule+0x4b/0x1a0
[11000.692578] percpu_rwsem_wait+0x118/0x140
[11000.697207] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.703295] __percpu_down_read+0x7b/0xb0
[11000.707808] scx_pre_fork+0xd1/0xf0
[11000.711747] sched_fork+0x59/0x1c0
[11000.715598] copy_process+0xa69/0x2b10
[11000.719847] ? __lock_acquire+0xac8/0x28c0
[11000.724498] kernel_clone+0xa0/0x3d0
[11000.728515] ? copy_clone_args_from_user+0xa3/0x290
[11000.734029] __do_sys_clone3+0xc8/0x110
[11000.738378] ? do_syscall_64+0x97/0x140
[11000.742712] ? do_syscall_64+0x4c/0x140
[11000.747036] ? __this_cpu_preempt_check+0x17/0x20
[11000.752340] __x64_sys_clone3+0x1a/0x20
[11000.756665] x64_sys_call+0x1800/0x20d0
[11000.760999] do_syscall_64+0x8b/0x140
[11000.765146] ? do_syscall_64+0x97/0x140
[11000.769462] ? __this_cpu_preempt_check+0x17/0x20
[11000.774754] ? lockdep_hardirqs_on+0xce/0x150
[11000.779668] ? syscall_exit_to_user_mode+0xd5/0x210
[11000.785151] ? do_syscall_64+0x97/0x140
[11000.789459] ? do_syscall_64+0x97/0x140
[11000.793766] ? syscall_exit_to_user_mode+0xd5/0x210
[11000.799248] ? do_syscall_64+0x97/0x140
[11000.803560] ? do_syscall_64+0x97/0x140
[11000.807878] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11000.813549] RIP: 0033:0x7f0fe352683d
[11000.817566] RSP: 002b:00007ffed86d7578 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
[11000.826059] RAX: ffffffffffffffda RBX: 00007ffed86d75c0 RCX: 00007f0fe352683d
[11000.834058] RDX: 00007f0fe34947d0 RSI: 0000000000000058 RDI: 00007ffed86d75c0
[11000.842053] RBP: fffffffffffffea0 R08: 00007f0fe2000640 R09: 00007ffed86d76df
[11000.850054] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f0fe2000640
[11000.858050] R13: 0000000000000002 R14: 00007f0fe34947d0 R15: 00007ffed86d7720
[11000.866079] </TASK>
[11000.868534] INFO: task cron:1820 blocked for more than 121 seconds.
[11000.875563] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.882609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.891391] task:cron state:D stack:0 pid:1820 tgid:1820 ppid:1 flags:0x00000002
[11000.901817] Sched_ext: simple (disabling)
[11000.901820] Call Trace:
[11000.909094] <TASK>
[11000.911466] __schedule+0x47e/0x19d0
[11000.915490] ? __this_cpu_preempt_check+0x17/0x20
[11000.920810] schedule+0x4b/0x1a0
[11000.924449] percpu_rwsem_wait+0x118/0x140
[11000.929054] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.935120] __percpu_down_read+0x7b/0xb0
[11000.939639] scx_pre_fork+0xd1/0xf0
[11000.943564] sched_fork+0x59/0x1c0
[11000.947396] copy_process+0xa69/0x2b10
[11000.951607] ? mntput_no_expire+0x9a/0x4c0
[11000.956244] kernel_clone+0xa0/0x3d0
[11000.960271] __do_sys_clone+0x66/0x90
[11000.964399] __x64_sys_clone+0x29/0x30
[11000.968630] x64_sys_call+0x1d80/0x20d0
[11000.972946] do_syscall_64+0x8b/0x140
[11000.977062] ? lockdep_hardirqs_on+0xce/0x150
[11000.981960] ? syscall_exit_to_user_mode+0xd5/0x210
[11000.987440] ? do_syscall_64+0x97/0x140
[11000.991773] ? __do_sys_newfstatat+0x3e/0x60
[11000.996583] ? do_syscall_64+0x97/0x140
[11001.000897] ? __this_cpu_preempt_check+0x17/0x20
[11001.006189] ? lockdep_hardirqs_on+0xce/0x150
[11001.011088] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.016577] ? do_syscall_64+0x97/0x140
[11001.020892] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.026372] ? do_syscall_64+0x97/0x140
[11001.030689] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.036178] ? do_syscall_64+0x97/0x140
[11001.040496] ? do_syscall_64+0x97/0x140
[11001.044811] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.050290] ? do_syscall_64+0x97/0x140
[11001.054605] ? do_syscall_64+0x97/0x140
[11001.058929] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.064608] RIP: 0033:0x7fc230eeab57
[11001.068633] RSP: 002b:00007fff0ef6db58 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[11001.077128] RAX: ffffffffffffffda RBX: 00007fc231195040 RCX: 00007fc230eeab57
[11001.085123] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[11001.093120] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[11001.101126] R10: 00007fc231031b10 R11: 0000000000000246 R12: 0000000000000001
[11001.109121] R13: 00005564ce386af0 R14: 000055649c0af0a1 R15: 00005564ce36dd00
[11001.117134] </TASK>
[11001.119642] INFO: task scx_simple:3632 blocked for more than 122 seconds.
[11001.127291] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.134342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.143146] task:scx_simple state:D stack:0 pid:3632 tgid:3632 ppid:3619 flags:0x00004002
[11001.153597] Sched_ext: simple (disabling)
[11001.153609] Call Trace:
[11001.160883] <TASK>
[11001.163242] __schedule+0x47e/0x19d0
[11001.167297] ? __this_cpu_preempt_check+0x17/0x20
[11001.172592] schedule+0x4b/0x1a0
[11001.176200] percpu_rwsem_wait+0x118/0x140
[11001.180863] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11001.186919] __percpu_down_read+0x7b/0xb0
[11001.191468] cpus_read_lock+0xd1/0xf0
[11001.195586] scx_ops_enable.constprop.0+0x341/0xae0
[11001.201105] ? __this_cpu_preempt_check+0x17/0x20
[11001.206397] ? debug_smp_processor_id+0x1b/0x30
[11001.211519] ? rcu_is_watching+0x17/0x70
[11001.215954] ? trace_contention_end+0x6c/0xa0
[11001.220871] ? __mutex_lock+0x388/0x8e0
[11001.225190] ? __anon_inode_getfile+0x8b/0x180
[11001.230217] ? bpf_struct_ops_link_create+0x105/0x180
[11001.235896] ? lockdep_init_map_type+0x5a/0x290
[11001.241016] bpf_scx_reg+0x12/0x20
[11001.244859] bpf_struct_ops_link_create+0x124/0x180
[11001.250375] __sys_bpf+0x2088/0x2b90
[11001.254411] ? mutex_unlock+0x16/0x20
[11001.258604] __x64_sys_bpf+0x1e/0x30
[11001.262614] x64_sys_call+0x1b2e/0x20d0
[11001.266966] do_syscall_64+0x8b/0x140
[11001.271089] ? map_update_elem+0x252/0x310
[11001.275737] ? __sys_bpf+0x12a2/0x2b90
[11001.279949] ? do_syscall_64+0x97/0x140
[11001.284328] ? __rseq_handle_notify_resume+0x440/0x630
[11001.290138] ? do_syscall_64+0x97/0x140
[11001.294490] ? __this_cpu_preempt_check+0x17/0x20
[11001.299766] ? lockdep_hardirqs_on+0xce/0x150
[11001.304688] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.310173] ? do_syscall_64+0x97/0x140
[11001.314500] ? sysvec_apic_timer_interrupt+0x76/0xf0
[11001.320109] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.325807] RIP: 0033:0x7f1e1911e88d
[11001.329816] RSP: 002b:00007ffdb1b0ed08 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[11001.338343] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1e1911e88d
[11001.346368] RDX: 0000000000000040 RSI: 00007ffdb1b0edf0 RDI: 000000000000001c
[11001.354379] RBP: 00007ffdb1b0ed20 R08: 00007ffdb1b0edf0 R09: 00007ffdb1b0edf0
[11001.362395] R10: 000055722ade9500 R11: 0000000000000202 R12: 00005571f595a2cd
[11001.370419] R13: 00005571f595a200 R14: 00007ffdb1b0ef28 R15: 000055722ae06f50
[11001.378447] </TASK>
[11001.380974] INFO: task sched_ext_ops_h:3633 blocked for more than 122 seconds.
[11001.389131] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.396167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.404971] task:sched_ext_ops_h state:D stack:0 pid:3633 tgid:3633 ppid:2 flags:0x00004000
[11001.415378] Sched_ext: simple (disabling)
[11001.415381] Call Trace:
[11001.422636] <TASK>
[11001.425007] __schedule+0x47e/0x19d0
[11001.429051] ? __this_cpu_preempt_check+0x17/0x20
[11001.434342] schedule+0x4b/0x1a0
[11001.437986] schedule_preempt_disabled+0x1c/0x30
[11001.443162] __mutex_lock+0x50a/0x8e0
[11001.447320] ? scx_ops_disable_workfn+0xe0/0x580
[11001.452525] mutex_lock_nested+0x1f/0x30
[11001.456916] ? mutex_lock_nested+0x1f/0x30
[11001.461529] scx_ops_disable_workfn+0xe0/0x580
[11001.466538] ? __this_cpu_preempt_check+0x17/0x20
[11001.471834] ? kthread_worker_fn+0x76/0x2e0
[11001.476506] ? sched_clock_noinstr+0xd/0x20
[11001.481231] ? _raw_spin_unlock_irq+0x2c/0x60
[11001.486138] ? __this_cpu_preempt_check+0x17/0x20
[11001.491422] kthread_worker_fn+0xbb/0x2e0
[11001.495911] ? __pfx_scx_ops_disable_workfn+0x10/0x10
[11001.501593] ? __pfx_kthread_worker_fn+0x10/0x10
[11001.506770] kthread+0xfa/0x130
[11001.510321] ? __pfx_kthread+0x10/0x10
[11001.514545] ret_from_fork+0x40/0x60
[11001.518573] ? __pfx_kthread+0x10/0x10
[11001.522780] ret_from_fork_asm+0x1a/0x30
[11001.527229] </TASK>
[11001.529701] INFO: task bash:3660 blocked for more than 122 seconds.
[11001.536731] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.543746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.552539] task:bash state:D stack:0 pid:3660 tgid:3660 ppid:3640 flags:0x00004002
[11001.562963] Sched_ext: simple (disabling)
[11001.562966] Call Trace:
[11001.570197] <TASK>
[11001.572571] __schedule+0x47e/0x19d0
[11001.576603] ? __this_cpu_preempt_check+0x17/0x20
[11001.581895] schedule+0x4b/0x1a0
[11001.585535] schedule_timeout+0x116/0x120
[11001.590054] ? _raw_spin_unlock_irq+0x2c/0x60
[11001.594938] ? __this_cpu_preempt_check+0x17/0x20
[11001.600242] ? lockdep_hardirqs_on+0xce/0x150
[11001.605129] ? trace_hardirqs_on+0x51/0x60
[11001.609735] __wait_for_common+0x91/0x180
[11001.614261] ? __pfx_schedule_timeout+0x10/0x10
[11001.619343] ? __flush_work+0x232/0x610
[11001.623656] wait_for_completion+0x28/0x30
[11001.628285] __flush_work+0x2d4/0x610
[11001.632410] ? __pfx_wq_barrier_func+0x10/0x10
[11001.637427] ? __wait_for_common+0x3e/0x180
[11001.642134] work_on_cpu_key+0x9c/0xc0
[11001.646361] ? __pfx_work_for_cpu_fn+0x10/0x10
[11001.651337] ? __pfx___cpu_down_maps_locked+0x10/0x10
[11001.657013] cpu_down_maps_locked+0xbf/0xd0
[11001.661737] cpu_device_down+0x2e/0x50
[11001.665966] cpu_subsys_offline+0x12/0x20
[11001.670472] device_offline+0xf0/0x120
[11001.674732] online_store+0x64/0xe0
[11001.678684] dev_attr_store+0x1b/0x30
[11001.682824] sysfs_kf_write+0x4f/0x70
[11001.686975] kernfs_fop_write_iter+0x172/0x230
[11001.691988] vfs_write+0x378/0x540
[11001.695841] ksys_write+0x71/0xf0
[11001.699585] __x64_sys_write+0x1d/0x30
[11001.703786] x64_sys_call+0x1685/0x20d0
[11001.708122] do_syscall_64+0x8b/0x140
[11001.712246] ? do_syscall_64+0x97/0x140
[11001.716539] ? __this_cpu_preempt_check+0x17/0x20
[11001.721819] ? lockdep_hardirqs_on+0xce/0x150
[11001.726734] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.732205] ? do_syscall_64+0x97/0x140
[11001.736517] ? lockdep_hardirqs_on+0xce/0x150
[11001.741432] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.746899] ? do_syscall_64+0x97/0x140
[11001.751209] ? lockdep_hardirqs_on+0xce/0x150
[11001.756105] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.761608] ? do_syscall_64+0x97/0x140
[11001.765901] ? syscall_exit_to_user_mode+0xd5/0x210
[11001.771371] ? do_syscall_64+0x97/0x140
[11001.775689] ? do_syscall_64+0x97/0x140
[11001.780019] ? irqentry_exit+0x6f/0xa0
[11001.784217] ? sysvec_call_function+0x76/0xf0
[11001.789114] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.794793] RIP: 0033:0x7f48b4914887
[11001.798807] RSP: 002b:00007ffc30aa2b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11001.807309] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f48b4914887
[11001.815287] RDX: 0000000000000002 RSI: 00005593f11663b0 RDI: 0000000000000001
[11001.823275] RBP: 00005593f11663b0 R08: 00007f48b49d1460 R09: 000000007fffffff
[11001.831270] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[11001.839262] R13: 00007f48b4a1b780 R14: 00007f48b4a17600 R15: 00007f48b4a16a00
[11001.847287] </TASK>
[11001.849780] INFO: task kworker/0:1:8931 blocked for more than 122 seconds.
[11001.857471] Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.864516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.873290] task:kworker/0:1 state:D stack:0 pid:8931 tgid:8931 ppid:2 flags:0x00004000
[11001.883720] Workqueue: events vmstat_shepherd
[11001.888608] Sched_ext: simple (disabling)
[11001.888630] Call Trace:
[11001.895884] <TASK>
[11001.898248] __schedule+0x47e/0x19d0
[11001.902293] ? __this_cpu_preempt_check+0x17/0x20
[11001.907598] schedule+0x4b/0x1a0
[11001.911223] percpu_rwsem_wait+0x118/0x140
[11001.915836] ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11001.921910] __percpu_down_read+0x7b/0xb0
[11001.926409] cpus_read_lock+0xd1/0xf0
[11001.930536] vmstat_shepherd+0x20/0x120
[11001.934863] process_one_work+0x21e/0x680
[11001.939358] ? __this_cpu_preempt_check+0x17/0x20
[11001.944672] worker_thread+0x194/0x340
[11001.948888] ? __pfx_worker_thread+0x10/0x10
[11001.953689] kthread+0xfa/0x130
[11001.957226] ? __pfx_kthread+0x10/0x10
[11001.961446] ret_from_fork+0x40/0x60
[11001.965465] ? __pfx_kthread+0x10/0x10
[11001.969679] ret_from_fork_asm+0x1a/0x30
[11001.974106] </TASK>
[11001.976563] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[11001.985786]
[11001.985786] Showing all locks held in the system:
[11001.992741] 1 lock held by kthreadd/2:
[11001.996931] #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.006155] 2 locks held by rcu_tasks_trace/15:
[11002.011223] #0: ffffffff94fb1780 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x341/0x4e0
[11002.022633] #1: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: rcu_tasks_trace_pregp_step+0xa2/0x700
[11002.033587] 2 locks held by cpuhp/13/95:
[11002.037976] #0: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x280
[11002.047933] #1: ffffffff94ea6400 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x280
[11002.057916] 1 lock held by khungtaskd/247:
[11002.062498] #0: ffffffff94fb24e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x3b/0x200
[11002.072578] 1 lock held by systemd-journal/790:
[11002.077646] #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.086846] 1 lock held by in:imklog/1591:
[11002.091427] #0: ffff9aa01164eed8 (&f->f_pos_lock){+.+.}-{4:4}, at: __fdget_pos+0x7c/0xc0
[11002.100635] 1 lock held by udisksd/1584:
[11002.105024] #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.114238] 1 lock held by cron/1820:
[11002.118344] #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.127562] 2 locks held by sh/1940:
[11002.131580] #0: ffff9aab88e248a8 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0x29/0x40
[11002.141249] #1: ffffb4458dd7b2f8 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x563/0x690
[11002.151519] 4 locks held by scx_simple/3632:
[11002.156310] #0: ffffffff950ba1d0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x105/0x180
[11002.166926] #1: ffffffff94ec3cb0 (scx_ops_enable_mutex){+.+.}-{4:4}, at: scx_ops_enable.constprop.0+0x60/0xae0
[11002.178233] #2: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: scx_ops_enable.constprop.0+0x33c/0xae0
[11002.189062] #3: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: scx_ops_enable.constprop.0+0x341/0xae0
[11002.200077] 1 lock held by sched_ext_ops_h/3633:
[11002.205267] #0: ffffffff94ec3cb0 (scx_ops_enable_mutex){+.+.}-{4:4}, at: scx_ops_disable_workfn+0xe0/0x580
[11002.216182] 6 locks held by bash/3660:
[11002.220387] #0: ffff9aab83cfd450 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0x71/0xf0
[11002.229260] #1: ffff9aa011637090 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x122/0x230
[11002.239216] #2: ffff9aa0051ef4e0 (kn->active#150){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x230
[11002.249553] #3: ffffffff951e9070 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x47/0xe0
[11002.259315] #4: ffff9ab74d404d98 (&dev->mutex){....}-{4:4}, at: device_offline+0x97/0x120
[11002.268593] #5: ffffffff94ea3dd0 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_device_down+0x24/0x50
[11002.278664] 3 locks held by kworker/0:1/8931:
[11002.283545] #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.294083] #1: ffffb4458b617e40 ((shepherd).work){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.304037] #2: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: vmstat_shepherd+0x20/0x120
[11002.313888] 3 locks held by kworker/0:11/8939:
[11002.318874] #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.329403] #1: ffffb4458b847e40 ((work_completion)(&wfc.work)#3){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.340812] #2: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x7d/0x460
[11002.350102] 4 locks held by kworker/13:2/9104:
[11002.355081] #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.365619] #1: ffffb4458923fe40 (rebuild_sd_work){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.375578] #2: ffffffff94ec7130 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sd_workfn+0x1b/0x40
[11002.385732] #3: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: rebuild_sched_domains+0x12/0x40
[11002.396090]
[11002.397783] =============================================
--------------------------------------------------------------------------------------------------------------------------------
Meanwhile, I spent some time investigating the circular dependency issue on
PowerPC.
Here are some details:
During continuous CPU online and offline operations, the scx_simple scheduler
unregisters and registers itself. This process can lead to a deadlock.
Below is an example of a common circular dependency that is frequently observed
on PowerPC, though other scenarios may also exist:
Note: Full stack traces are omitted to keep the flow simple.
------------------------------------------------------------------------------------------------------------------------
Time | CPU 1 (scx_simple) | CPU 7 (hotplug.sh) | CPU 3 (kthreadd) |
------------------------------------------------------------------------------------------------------------------------
T1 | bpf_scx_reg | cpu_device_up | - |
------------------------------------------------------------------------------------------------------------------------
T2 | scx_ops_enable | _cpu_up | - |
------------------------------------------------------------------------------------------------------------------------
T3 | scx_fork_rwsem taken (#1) | cpus_write_lock taken (#2) | - |
| (Holds scx_fork_rwsem) | (Holds cpus_write_lock) | |
------------------------------------------------------------------------------------------------------------------------
T4 | Waiting on CPU 7 | cpuhp_invoke_callback | - |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T5 | Waiting on CPU 7 | smpboot_create_threads | - |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T6 | Waiting on CPU 7 | kthread_create_on_cpu | - |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T7 | Waiting on CPU 7 | __kthread_create_on_node | - |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T8 | Waiting on CPU 7 | spawn kthread | - |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T9 | Waiting on CPU 7 | Waits for kthread completion ------> | kthreadd (#3) |
| (To take cpus_read_lock) | | |
------------------------------------------------------------------------------------------------------------------------
T10 | Waiting on CPU 7 | Waits on CPU 3 | create_kthread |
| (To take cpus_read_lock) | (For kthread completion) | |
------------------------------------------------------------------------------------------------------------------------
T11 | Waiting on CPU 7 | Waits on CPU 3 | sched_fork |
| (To take cpus_read_lock) | (For kthread completion) | |
------------------------------------------------------------------------------------------------------------------------
T12 | Waiting on CPU 7 | Waits on CPU 3 | scx_pre_fork |
| (To take cpus_read_lock) | (For kthread completion) | |
------------------------------------------------------------------------------------------------------------------------
T13 | Waiting on CPU 7 | Waits on CPU 3 | Trying to hold scx_fork_rwsem |
| (To take cpus_read_lock) | (For kthread completion) | but it is held by #1 |
------------------------------------------------------------------------------------------------------------------------
T14 | Waiting on CPU 7 | Waits on CPU 3 | Waits on CPU 1 |
| (To take cpus_read_lock) | (For kthread completion) | (To release scx_fork_rwsem) |
| (Dependent on #2) | (Dependent on #3) | (Dependent on #1) |
------------------------------------------------------------------------------------------------------------------------
Summary of events:
T1-T3: CPU 1 takes scx_fork_rwsem (#1), and CPU 7 takes cpus_write_lock (#2).
T4-T13: CPU 1 is waiting for CPU 7 to take cpus_read_lock, while CPU 7 is
performing a sequence of hotplug callbacks involving kthread creation and
waiting for the kthread completion on CPU 3.
T14: Circular dependency forms:
CPU 1 is waiting on CPU 7 (dependent on lock #2 to be released).
CPU 7 is waiting on CPU 3 (dependent on completion of #3).
CPU 3 is waiting on CPU 1 (dependent on lock #1 to be released).
This results in a deadlock: each CPU is waiting on another to release
a lock or complete an operation, forming a circular dependency in which
none of the CPUs can make progress.
Could I be missing something here? Please let me know your thoughts.
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-20 7:03 ` Aboorva Devarajan
@ 2024-08-20 19:38 ` Tejun Heo
2024-08-23 9:20 ` Aboorva Devarajan
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-08-20 19:38 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel
On Tue, Aug 20, 2024 at 12:33:34PM +0530, Aboorva Devarajan wrote:
> On Tue, 2024-08-13 at 09:54 -1000, Tejun Heo wrote:
> > Hello,
> >
> > On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
> > ...
> > ...
>
> > Can you trigger sysrq-t when the system is stuck? Also, can you see whether
> > the problem is reproducible on x86 w/ the wq changes applied?
>
> Hi Tejun,
>
> I couldn't trigger sysrq-t so far since the system becomes unresponsive;
> I will get back if I can successfully trigger this via the console.
>
> And yes, this issue also occurs on x86 when applying the proposed workqueue
> patch on top of the recent sched-ext for-6.12 tree. However, it takes
> significantly longer to hit the issue, and the stack trace differs
> a bit.
I think this *should* fix the problem but it is rather ugly. The locking
order is such that there's no good way out. Maybe the solution is making
cpu_hotplug_disable() more useful. Anyways, can you test this one?
Thanks.
---
kernel/sched/ext.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3990,6 +3990,24 @@ static const char *scx_exit_reason(enum
}
}
+static void block_fork_hotplug(void)
+{
+ while (true) {
+ percpu_down_write(&scx_fork_rwsem);
+ if (cpus_read_trylock())
+ return;
+ percpu_up_write(&scx_fork_rwsem);
+ cpus_read_lock();
+ cpus_read_unlock();
+ }
+}
+
+static void unblock_fork_hotplug(void)
+{
+ cpus_read_unlock();
+ percpu_up_write(&scx_fork_rwsem);
+}
+
static void scx_ops_disable_workfn(struct kthread_work *work)
{
struct scx_exit_info *ei = scx_exit_info;
@@ -4045,8 +4063,7 @@ static void scx_ops_disable_workfn(struc
* Avoid racing against fork. See scx_ops_enable() for explanation on
* the locking order.
*/
- percpu_down_write(&scx_fork_rwsem);
- cpus_read_lock();
+ block_fork_hotplug();
spin_lock_irq(&scx_tasks_lock);
scx_task_iter_init(&sti);
@@ -4090,8 +4107,7 @@ static void scx_ops_disable_workfn(struc
static_branch_disable_cpuslocked(&scx_builtin_idle_enabled);
synchronize_rcu();
- cpus_read_unlock();
- percpu_up_write(&scx_fork_rwsem);
+ unblock_fork_hotplug();
if (ei->kind >= SCX_EXIT_ERROR) {
pr_err("sched_ext: BPF scheduler \"%s\" disabled (%s)\n",
@@ -4657,8 +4673,7 @@ static int scx_ops_enable(struct sched_e
*
* scx_fork_rwsem --> pernet_ops_rwsem --> cpu_hotplug_lock
*/
- percpu_down_write(&scx_fork_rwsem);
- cpus_read_lock();
+ block_fork_hotplug();
check_hotplug_seq(ops);
@@ -4765,8 +4780,7 @@ static int scx_ops_enable(struct sched_e
spin_unlock_irq(&scx_tasks_lock);
preempt_enable();
- cpus_read_unlock();
- percpu_up_write(&scx_fork_rwsem);
+ unblock_fork_hotplug();
/* see above ENABLING transition for the explanation on exiting with 0 */
if (!scx_ops_tryset_enable_state(SCX_OPS_ENABLED, SCX_OPS_ENABLING)) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-20 19:38 ` Tejun Heo
@ 2024-08-23 9:20 ` Aboorva Devarajan
2024-08-26 18:32 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Aboorva Devarajan @ 2024-08-23 9:20 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, linux-kernel
On Tue, 2024-08-20 at 09:38 -1000, Tejun Heo wrote:
> On Tue, Aug 20, 2024 at 12:33:34PM +0530, Aboorva Devarajan wrote:
> > On Tue, 2024-08-13 at 09:54 -1000, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
> > > ...
> I think this *should* fix the problem but it is rather ugly. The locking
> order is such that there's no good way out. Maybe the solution is making
> cpu_hotplug_disable() more useful. Anyways, can you test this one?
>
> Thanks.
>
> ---
> kernel/sched/ext.c | 30 ++++++++++++++++++++++--------
> 1 file changed, 22 insertions(+), 8 deletions(-)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3990,6 +3990,24 @@ static const char *scx_exit_reason(enum
> }
> }
>
> +static void block_fork_hotplug(void)
> +{
> + while (true) {
> + percpu_down_write(&scx_fork_rwsem);
> + if (cpus_read_trylock())
> + return;
> + percpu_up_write(&scx_fork_rwsem);
> + cpus_read_lock();
> + cpus_read_unlock();
> + }
> +}
> +
> +static void unblock_fork_hotplug(void)
> +{
> + cpus_read_unlock();
> + percpu_up_write(&scx_fork_rwsem);
> +}
> +
> static void scx_ops_disable_workfn(struct kthread_work *work)
> {
> struct scx_exit_info *ei = scx_exit_info;
> @@ -4045,8 +4063,7 @@ static void scx_ops_disable_workfn(struc
> * Avoid racing against fork. See scx_ops_enable() for explanation on
> * the locking order.
> */
> - percpu_down_write(&scx_fork_rwsem);
> - cpus_read_lock();
> + block_fork_hotplug();
>
> spin_lock_irq(&scx_tasks_lock);
> scx_task_iter_init(&sti);
> @@ -4090,8 +4107,7 @@ static void scx_ops_disable_workfn(struc
> static_branch_disable_cpuslocked(&scx_builtin_idle_enabled);
> synchronize_rcu();
>
> - cpus_read_unlock();
> - percpu_up_write(&scx_fork_rwsem);
> + unblock_fork_hotplug();
>
> if (ei->kind >= SCX_EXIT_ERROR) {
> pr_err("sched_ext: BPF scheduler \"%s\" disabled (%s)\n",
> @@ -4657,8 +4673,7 @@ static int scx_ops_enable(struct sched_e
> *
> * scx_fork_rwsem --> pernet_ops_rwsem --> cpu_hotplug_lock
> */
> - percpu_down_write(&scx_fork_rwsem);
> - cpus_read_lock();
> + block_fork_hotplug();
>
> check_hotplug_seq(ops);
>
> @@ -4765,8 +4780,7 @@ static int scx_ops_enable(struct sched_e
>
> spin_unlock_irq(&scx_tasks_lock);
> preempt_enable();
> - cpus_read_unlock();
> - percpu_up_write(&scx_fork_rwsem);
> + unblock_fork_hotplug();
>
> /* see above ENABLING transition for the explanation on exiting with 0 */
> if (!scx_ops_tryset_enable_state(SCX_OPS_ENABLED, SCX_OPS_ENABLING)) {
Hi Tejun,
I applied this patch to the almost-latest sched-ext (for-6.12) branch, up to
commit 89909296a51e792 ("sched_ext: Don't use double locking to migrate
tasks across CPUs"), and let the test run for over 20 hours; it completed
without any hangs on both x86 and PowerPC.
So, indeed, making sure that scx_fork_rwsem and cpu_hotplug_lock (read)
are only held simultaneously when they can both be acquired seems
to resolve the deadlock.
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-23 9:20 ` Aboorva Devarajan
@ 2024-08-26 18:32 ` Tejun Heo
2024-09-17 12:48 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-08-26 18:32 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel
Hello,
On Fri, Aug 23, 2024 at 02:50:01PM +0530, Aboorva Devarajan wrote:
...
> I applied this patch to the almost latest sched-ext (for-6.12) branch upto
> commit 89909296a51e792 ("sched_ext: Don't use double locking to migrate
> tasks across CPUs") and let the test run for over 20 hours, and it completed
> without any hangs on both x86 and PowerPC.
>
> So, indeed, making sure that both scx_fork_rwsem and cpu_hotplug_lock (read)
> are only held together simultaneously when they can both be acquired seems
> to be resolving the deadlock.
Thanks a lot for confirming. Let me think it over a bit re. what should be
applied.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-08-26 18:32 ` Tejun Heo
@ 2024-09-17 12:48 ` Tejun Heo
2024-09-23 14:56 ` Aboorva Devarajan
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-09-17 12:48 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel
Hello, Aboorva.
On Mon, Aug 26, 2024 at 08:32:03AM -1000, Tejun Heo wrote:
> On Fri, Aug 23, 2024 at 02:50:01PM +0530, Aboorva Devarajan wrote:
> ...
> > I applied this patch to the almost latest sched-ext (for-6.12) branch upto
> > commit 89909296a51e792 ("sched_ext: Don't use double locking to migrate
> > tasks across CPUs") and let the test run for over 20 hours, and it completed
> > without any hangs on both x86 and PowerPC.
> >
> > So, indeed, making sure that both scx_fork_rwsem and cpu_hotplug_lock (read)
> > are only held together simultaneously when they can both be acquired seems
> > to be resolving the deadlock.
>
> Thanks a lot for confirming. Let me think it over a bit re. what should be
> applied.
Sorry about the delay. It ended up a bit invasive and took longer. Can you
please verify the following branch fixes the problem?
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-enable-locking-fix
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-17 12:48 ` Tejun Heo
@ 2024-09-23 14:56 ` Aboorva Devarajan
2024-09-23 18:21 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Aboorva Devarajan @ 2024-09-23 14:56 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, linux-kernel
On Tue, 2024-09-17 at 14:48 +0200, Tejun Heo wrote:
> Hello, Aboorva.
>
> On Mon, Aug 26, 2024 at 08:32:03AM -1000, Tejun Heo wrote:
> > On Fri, Aug 23, 2024 at 02:50:01PM +0530, Aboorva Devarajan wrote:
> > ...
> > > I applied this patch to the almost latest sched-ext (for-6.12) branch upto
> > > commit 89909296a51e792 ("sched_ext: Don't use double locking to migrate
> > > tasks across CPUs") and let the test run for over 20 hours, and it completed
> > > without any hangs on both x86 and PowerPC.
> > >
> > > So, indeed, making sure that both scx_fork_rwsem and cpu_hotplug_lock (read)
> > > are only held together simultaneously when they can both be acquired seems
> > > to be resolving the deadlock.
> >
> > Thanks a lot for confirming. Let me think it over a bit re. what should be
> > applied.
>
> Sorry about the delay. It ended up a bit invasive and took longer. Can you
> please verify the following branch fixes the problem?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-enable-locking-fix
>
> Thanks.
>
Hi Tejun,
I've run CPU hotplug tests and verified that, with the above-specified branch,
the originally reported hang is no longer seen on both Power10 and x86.
------------------------------------------------------------------------------------------
On x86:
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-enable-locking-fix
Upto - a3b4678ca086 ("sched_ext: Decouple locks in scx_ops_enable()")
I kept the tests running for over 48 hours, and I didn't encounter any hangs or lockups.
------------------------------------------------------------------------------------------
On Power10:
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-enable-locking-fix
Upto - a3b4678ca086 ("sched_ext: Decouple locks in scx_ops_enable()") +
Patches to support struct_ops on PowerPC
After running the tests for over 4 hours, I didn't observe the original issue,
but I hit another crash that appears to be specific to PowerPC and related to
instruction patching. However, the originally reported hang appears to be resolved.
------------------------------------------------------------------------------------------
Sharing the crash logs observed in PowerPC here for general reference, FYI:
[ 8638.891964] Kernel attempted to read user page (a8) - exploit attempt? (uid: 0)
[ 8638.892002] BUG: Kernel NULL pointer dereference on read at 0x000000a8
[ 8638.892019] Faulting instruction address: 0xc0000000004e7cc0
[ 8638.892038] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8638.892060] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ 8638.892080] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ebtable_filter ebtables vhost_vsock vmw_vsock_virtio_transport_common ip6tabl
e_filter ip6_tables vhost vhost_iotlb iptable_filter vsock bridge stp llc kvm_hv kvm joydev
input_leds mac_hid at24 ofpart cmdlinepart uio_pdrv_genirq ibmpowernv opal_prd ipmi_powernv
powernv_flash uio binfmt_misc sch_fq_codel nfsd mtd ipmi_devintf ipmi_msghandler auth_rpcgss
jc42 ramoops reed_solomon ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async
_memcpy async_pq async_xor async_tx raid1 raid0 dm_mirror dm_region_hash dm_log mlx5_ib ib_uverbs
ib_core mlx5_core hid_generic usbhid hid ast i2c_algo_bit drm_shmem_helper drm_kms_hel
per vmx_crypto drm mlxfw crct10dif_vpmsum crc32c_vpmsum psample tls tg3 ahci libahci
drm_panel_orientation_quirks
[ 8638.892621] CPU: 62 UID: 0 PID: 5591 Comm: kworker/62:2 Not tainted 6.11.0-rc4+ #2
[ 8638.892663] Hardware name: 8335-GTW POWER9 0x4e1203 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
[ 8638.892693] Workqueue: events bpf_prog_free_deferred
[ 8638.892735] NIP: c0000000004e7cc0 LR: c0000000004e7bbc CTR: c0000000003a9b30
[ 8638.892798] REGS: c000000ea4cbf7f0 TRAP: 0300 Not tainted (6.11.0-rc4+)
[ 8638.892862] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42a00284 XER: 00000000
[ 8638.892915] CFAR: c0000000004e7bb8 DAR: 00000000000000a8 DSISR: 40000000 IRQMASK: 1
[ 8638.892915] GPR00: c0000000004e7bbc c000000ea4cbfa90 c000000002837f00 0000000000000005
[ 8638.892915] GPR04: 0000000000000015 0000000000000009 0000000000000009 c000000004840b00
[ 8638.892915] GPR08: ffffffffffffffff 00000000ffffe000 ffffffffffffffff 000001937b55db50
[ 8638.892915] GPR12: 0000000000200000 c000007ffdfac300 c0000000031b1fc8 0000000000010000
[ 8638.892915] GPR16: c00000000000018e 000000007fffffff 0000000000000000 000000000000e1c0
[ 8638.892915] GPR20: 61c8864680b583eb 0000000000000000 0000000000000000 00000000000de1d5
[ 8638.892915] GPR24: 0000000000000000 c000000003da4408 c000000003da4400 c000000003da43f8
[ 8638.892915] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000ea4cbfa90
[ 8638.893350] NIP [c0000000004e7cc0] walk_to_pmd+0x80/0x240
[ 8638.893380] LR [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
[ 8638.893398] Call Trace:
[ 8638.893407] [c000000ea4cbfa90] [c000000ea4cbfb20] 0xc000000ea4cbfb20 (unreliable)
[ 8638.893429] [c000000ea4cbfaf0] [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
[ 8638.893457] [c000000ea4cbfb40] [c0000000000b1dd0] patch_instructions+0x130/0x630
[ 8638.893500] [c000000ea4cbfc10] [c000000000123180] bpf_arch_text_invalidate+0x80/0xd0
[ 8638.893552] [c000000ea4cbfc60] [c0000000003a7508] bpf_prog_pack_free+0x138/0x2f0
[ 8638.893584] [c000000ea4cbfd10] [c0000000003a7e38] bpf_jit_binary_pack_free+0x48/0xa0
[ 8638.893617] [c000000ea4cbfd50] [c000000000123258] bpf_jit_free+0x88/0x100
[ 8638.893667] [c000000ea4cbfd90] [c0000000003a9d70] bpf_prog_free_deferred+0x240/0x280
[ 8638.893725] [c000000ea4cbfde0] [c0000000001a6828] process_scheduled_works+0x268/0x520
[ 8638.893767] [c000000ea4cbfee0] [c0000000001a9ed0] worker_thread+0x3f0/0x590
[ 8638.893809] [c000000ea4cbff80] [c0000000001b37b0] kthread+0x1a0/0x1c0
[ 8638.893862] [c000000ea4cbffe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[ 8638.893913] Code: 3cc20157 3b63c4f8 3b45c500 3929c510 3b26c508 3940ffff e87b0000 e8ba0000
81290000 e8d90000 38830010 7d494830 <e87d00a8> 7ce42a14 7d2948f8 7d073214
[ 8638.894003] ---[ end trace 0000000000000000 ]---
[ 8639.098185] pstore: backend (nvram) writing error (-1)
[ 8639.098205]
[ 8639.098215] note: kworker/62:2[5591] exited with irqs disabled
[ 8798.806603] ------------[ cut here ]------------
[ 8798.806631] WARNING: CPU: 62 PID: 3769 at kernel/kthread.c:76 kthread_set_per_cpu+0x40/0xd0
[ 8798.806653] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv
------------------------------------------------------------------------------------------
We will look at this issue as it is specific to PowerPC.
But I can confirm that the originally reported hang (deadlock) is no longer present.
Thanks much,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-23 14:56 ` Aboorva Devarajan
@ 2024-09-23 18:21 ` Tejun Heo
2024-09-24 8:03 ` Alexei Starovoitov
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-09-23 18:21 UTC (permalink / raw)
To: Aboorva Devarajan; +Cc: void, linux-kernel, Alexei Starovoitov, Andrii Nakryiko
Hello,
(cc'ing Alexei and Andrii for the BPF part)
On Mon, Sep 23, 2024 at 08:26:32PM +0530, Aboorva Devarajan wrote:
> Sharing the crash logs observed in PowerPC here for general reference, FYI:
>
> [ 8638.891964] Kernel attempted to read user page (a8) - exploit attempt? (uid: 0)
> [ 8638.892002] BUG: Kernel NULL pointer dereference on read at 0x000000a8
> [ 8638.892019] Faulting instruction address: 0xc0000000004e7cc0
> [ 8638.892038] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 8638.892060] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
> [ 8638.892080] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 ebtable_filter ebtables vhost_vsock vmw_vsock_virtio_transport_common ip6tabl
> e_filter ip6_tables vhost vhost_iotlb iptable_filter vsock bridge stp llc kvm_hv kvm joydev
> input_leds mac_hid at24 ofpart cmdlinepart uio_pdrv_genirq ibmpowernv opal_prd ipmi_powernv
> powernv_flash uio binfmt_misc sch_fq_codel nfsd mtd ipmi_devintf ipmi_msghandler auth_rpcgss
> jc42 ramoops reed_solomon ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async
> _memcpy async_pq async_xor async_tx raid1 raid0 dm_mirror dm_region_hash dm_log mlx5_ib ib_uverbs
> ib_core mlx5_core hid_generic usbhid hid ast i2c_algo_bit drm_shmem_helper drm_kms_hel
> per vmx_crypto drm mlxfw crct10dif_vpmsum crc32c_vpmsum psample tls tg3 ahci libahci
> drm_panel_orientation_quirks
> [ 8638.892621] CPU: 62 UID: 0 PID: 5591 Comm: kworker/62:2 Not tainted 6.11.0-rc4+ #2
> [ 8638.892663] Hardware name: 8335-GTW POWER9 0x4e1203 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
> [ 8638.892693] Workqueue: events bpf_prog_free_deferred
> [ 8638.892735] NIP: c0000000004e7cc0 LR: c0000000004e7bbc CTR: c0000000003a9b30
> [ 8638.892798] REGS: c000000ea4cbf7f0 TRAP: 0300 Not tainted (6.11.0-rc4+)
> [ 8638.892862] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42a00284 XER: 00000000
> [ 8638.892915] CFAR: c0000000004e7bb8 DAR: 00000000000000a8 DSISR: 40000000 IRQMASK: 1
> [ 8638.892915] GPR00: c0000000004e7bbc c000000ea4cbfa90 c000000002837f00 0000000000000005
> [ 8638.892915] GPR04: 0000000000000015 0000000000000009 0000000000000009 c000000004840b00
> [ 8638.892915] GPR08: ffffffffffffffff 00000000ffffe000 ffffffffffffffff 000001937b55db50
> [ 8638.892915] GPR12: 0000000000200000 c000007ffdfac300 c0000000031b1fc8 0000000000010000
> [ 8638.892915] GPR16: c00000000000018e 000000007fffffff 0000000000000000 000000000000e1c0
> [ 8638.892915] GPR20: 61c8864680b583eb 0000000000000000 0000000000000000 00000000000de1d5
> [ 8638.892915] GPR24: 0000000000000000 c000000003da4408 c000000003da4400 c000000003da43f8
> [ 8638.892915] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000ea4cbfa90
> [ 8638.893350] NIP [c0000000004e7cc0] walk_to_pmd+0x80/0x240
> [ 8638.893380] LR [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> [ 8638.893398] Call Trace:
> [ 8638.893407] [c000000ea4cbfa90] [c000000ea4cbfb20] 0xc000000ea4cbfb20 (unreliable)
> [ 8638.893429] [c000000ea4cbfaf0] [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> [ 8638.893457] [c000000ea4cbfb40] [c0000000000b1dd0] patch_instructions+0x130/0x630
> [ 8638.893500] [c000000ea4cbfc10] [c000000000123180] bpf_arch_text_invalidate+0x80/0xd0
> [ 8638.893552] [c000000ea4cbfc60] [c0000000003a7508] bpf_prog_pack_free+0x138/0x2f0
> [ 8638.893584] [c000000ea4cbfd10] [c0000000003a7e38] bpf_jit_binary_pack_free+0x48/0xa0
> [ 8638.893617] [c000000ea4cbfd50] [c000000000123258] bpf_jit_free+0x88/0x100
> [ 8638.893667] [c000000ea4cbfd90] [c0000000003a9d70] bpf_prog_free_deferred+0x240/0x280
> [ 8638.893725] [c000000ea4cbfde0] [c0000000001a6828] process_scheduled_works+0x268/0x520
> [ 8638.893767] [c000000ea4cbfee0] [c0000000001a9ed0] worker_thread+0x3f0/0x590
> [ 8638.893809] [c000000ea4cbff80] [c0000000001b37b0] kthread+0x1a0/0x1c0
> [ 8638.893862] [c000000ea4cbffe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> [ 8638.893913] Code: 3cc20157 3b63c4f8 3b45c500 3929c510 3b26c508 3940ffff e87b0000 e8ba0000
> 81290000 e8d90000 38830010 7d494830 <e87d00a8> 7ce42a14 7d2948f8 7d073214
> [ 8638.894003] ---[ end trace 0000000000000000 ]---
> [ 8639.098185] pstore: backend (nvram) writing error (-1)
> [ 8639.098205]
> [ 8639.098215] note: kworker/62:2[5591] exited with irqs disabled
> [ 8798.806603] ------------[ cut here ]------------
> [ 8798.806631] WARNING: CPU: 62 PID: 3769 at kernel/kthread.c:76 kthread_set_per_cpu+0x40/0xd0
> [ 8798.806653] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
> ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv
> ------------------------------------------------------------------------------------------
>
> We will look at this issue as it is specific to PowerPC.
This does look like a bug in BPF proper.
> But I can confirm that the originally reported hang (deadlock) is no longer present.
Great to hear. I'll post the patchset w/ your Tested-by.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-23 18:21 ` Tejun Heo
@ 2024-09-24 8:03 ` Alexei Starovoitov
2024-09-24 12:00 ` Aboorva Devarajan
0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2024-09-24 8:03 UTC (permalink / raw)
To: Tejun Heo
Cc: Aboorva Devarajan, David Vernet, LKML, Alexei Starovoitov,
Andrii Nakryiko
On Mon, Sep 23, 2024 at 8:21 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> (cc'ing Alexei and Andrii for the BPF part)
>
> On Mon, Sep 23, 2024 at 08:26:32PM +0530, Aboorva Devarajan wrote:
> > Sharing the crash logs observed in PowerPC here for general reference, FYI:
> >
> > [ 8638.891964] Kernel attempted to read user page (a8) - exploit attempt? (uid: 0)
> > [ 8638.892002] BUG: Kernel NULL pointer dereference on read at 0x000000a8
> > [ 8638.892019] Faulting instruction address: 0xc0000000004e7cc0
> > [ 8638.892038] Oops: Kernel access of bad area, sig: 11 [#1]
> > [ 8638.892060] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
> > [ 8638.892080] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> > br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> > ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> > nf_defrag_ipv4 ebtable_filter ebtables vhost_vsock vmw_vsock_virtio_transport_common ip6tabl
> > e_filter ip6_tables vhost vhost_iotlb iptable_filter vsock bridge stp llc kvm_hv kvm joydev
> > input_leds mac_hid at24 ofpart cmdlinepart uio_pdrv_genirq ibmpowernv opal_prd ipmi_powernv
> > powernv_flash uio binfmt_misc sch_fq_codel nfsd mtd ipmi_devintf ipmi_msghandler auth_rpcgss
> > jc42 ramoops reed_solomon ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async
> > _memcpy async_pq async_xor async_tx raid1 raid0 dm_mirror dm_region_hash dm_log mlx5_ib ib_uverbs
> > ib_core mlx5_core hid_generic usbhid hid ast i2c_algo_bit drm_shmem_helper drm_kms_hel
> > per vmx_crypto drm mlxfw crct10dif_vpmsum crc32c_vpmsum psample tls tg3 ahci libahci
> > drm_panel_orientation_quirks
> > [ 8638.892621] CPU: 62 UID: 0 PID: 5591 Comm: kworker/62:2 Not tainted 6.11.0-rc4+ #2
> > [ 8638.892663] Hardware name: 8335-GTW POWER9 0x4e1203 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
> > [ 8638.892693] Workqueue: events bpf_prog_free_deferred
> > [ 8638.892735] NIP: c0000000004e7cc0 LR: c0000000004e7bbc CTR: c0000000003a9b30
> > [ 8638.892798] REGS: c000000ea4cbf7f0 TRAP: 0300 Not tainted (6.11.0-rc4+)
> > [ 8638.892862] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42a00284 XER: 00000000
> > [ 8638.892915] CFAR: c0000000004e7bb8 DAR: 00000000000000a8 DSISR: 40000000 IRQMASK: 1
> > [ 8638.892915] GPR00: c0000000004e7bbc c000000ea4cbfa90 c000000002837f00 0000000000000005
> > [ 8638.892915] GPR04: 0000000000000015 0000000000000009 0000000000000009 c000000004840b00
> > [ 8638.892915] GPR08: ffffffffffffffff 00000000ffffe000 ffffffffffffffff 000001937b55db50
> > [ 8638.892915] GPR12: 0000000000200000 c000007ffdfac300 c0000000031b1fc8 0000000000010000
> > [ 8638.892915] GPR16: c00000000000018e 000000007fffffff 0000000000000000 000000000000e1c0
> > [ 8638.892915] GPR20: 61c8864680b583eb 0000000000000000 0000000000000000 00000000000de1d5
> > [ 8638.892915] GPR24: 0000000000000000 c000000003da4408 c000000003da4400 c000000003da43f8
> > [ 8638.892915] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000ea4cbfa90
> > [ 8638.893350] NIP [c0000000004e7cc0] walk_to_pmd+0x80/0x240
> > [ 8638.893380] LR [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > [ 8638.893398] Call Trace:
> > [ 8638.893407] [c000000ea4cbfa90] [c000000ea4cbfb20] 0xc000000ea4cbfb20 (unreliable)
> > [ 8638.893429] [c000000ea4cbfaf0] [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > [ 8638.893457] [c000000ea4cbfb40] [c0000000000b1dd0] patch_instructions+0x130/0x630
> > [ 8638.893500] [c000000ea4cbfc10] [c000000000123180] bpf_arch_text_invalidate+0x80/0xd0
> > [ 8638.893552] [c000000ea4cbfc60] [c0000000003a7508] bpf_prog_pack_free+0x138/0x2f0
> > [ 8638.893584] [c000000ea4cbfd10] [c0000000003a7e38] bpf_jit_binary_pack_free+0x48/0xa0
> > [ 8638.893617] [c000000ea4cbfd50] [c000000000123258] bpf_jit_free+0x88/0x100
> > [ 8638.893667] [c000000ea4cbfd90] [c0000000003a9d70] bpf_prog_free_deferred+0x240/0x280
> > [ 8638.893725] [c000000ea4cbfde0] [c0000000001a6828] process_scheduled_works+0x268/0x520
> > [ 8638.893767] [c000000ea4cbfee0] [c0000000001a9ed0] worker_thread+0x3f0/0x590
> > [ 8638.893809] [c000000ea4cbff80] [c0000000001b37b0] kthread+0x1a0/0x1c0
> > [ 8638.893862] [c000000ea4cbffe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > [ 8638.893913] Code: 3cc20157 3b63c4f8 3b45c500 3929c510 3b26c508 3940ffff e87b0000 e8ba0000
> > 81290000 e8d90000 38830010 7d494830 <e87d00a8> 7ce42a14 7d2948f8 7d073214
> > [ 8638.894003] ---[ end trace 0000000000000000 ]---
> > [ 8639.098185] pstore: backend (nvram) writing error (-1)
> > [ 8639.098205]
> > [ 8639.098215] note: kworker/62:2[5591] exited with irqs disabled
> > [ 8798.806603] ------------[ cut here ]------------
> > [ 8798.806631] WARNING: CPU: 62 PID: 3769 at kernel/kthread.c:76 kthread_set_per_cpu+0x40/0xd0
> > [ 8798.806653] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> > br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
> > ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv
> > ------------------------------------------------------------------------------------------
> >
> > We will look at this issue as it is specific to PowerPC.
>
> This does look like a bug in BPF proper.
powerpc doesn't have bpf trampolines, struct-ops and
all the plumbing necessary to run sched-ext.
Aboorva,
please help test this patch set instead that is trying to add it:
https://lore.kernel.org/all/20240915205648.830121-1-hbathini@linux.ibm.com/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-24 8:03 ` Alexei Starovoitov
@ 2024-09-24 12:00 ` Aboorva Devarajan
2024-09-24 16:57 ` Song Liu
0 siblings, 1 reply; 18+ messages in thread
From: Aboorva Devarajan @ 2024-09-24 12:00 UTC (permalink / raw)
To: Alexei Starovoitov, Tejun Heo
Cc: David Vernet, LKML, Alexei Starovoitov, Andrii Nakryiko, hbathini
On Tue, 2024-09-24 at 10:03 +0200, Alexei Starovoitov wrote:
> On Mon, Sep 23, 2024 at 8:21 PM Tejun Heo <tj@kernel.org> wrote:
> > Hello,
> >
> > (cc'ing Alexei and Andrii for the BPF part)
> >
> > On Mon, Sep 23, 2024 at 08:26:32PM +0530, Aboorva Devarajan wrote:
> > > Sharing the crash logs observed in PowerPC here for general reference, FYI:
> > >
> > > [ 8638.891964] Kernel attempted to read user page (a8) - exploit attempt? (uid: 0)
> > > [ 8638.892002] BUG: Kernel NULL pointer dereference on read at 0x000000a8
> > > [ 8638.892019] Faulting instruction address: 0xc0000000004e7cc0
> > > [ 8638.892038] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [ 8638.892060] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
> > > [ 8638.892080] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> > > br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> > > ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> > > nf_defrag_ipv4 ebtable_filter ebtables vhost_vsock vmw_vsock_virtio_transport_common ip6tabl
> > > e_filter ip6_tables vhost vhost_iotlb iptable_filter vsock bridge stp llc kvm_hv kvm joydev
> > > input_leds mac_hid at24 ofpart cmdlinepart uio_pdrv_genirq ibmpowernv opal_prd ipmi_powernv
> > > powernv_flash uio binfmt_misc sch_fq_codel nfsd mtd ipmi_devintf ipmi_msghandler auth_rpcgss
> > > jc42 ramoops reed_solomon ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async
> > > _memcpy async_pq async_xor async_tx raid1 raid0 dm_mirror dm_region_hash dm_log mlx5_ib ib_uverbs
> > > ib_core mlx5_core hid_generic usbhid hid ast i2c_algo_bit drm_shmem_helper drm_kms_hel
> > > per vmx_crypto drm mlxfw crct10dif_vpmsum crc32c_vpmsum psample tls tg3 ahci libahci
> > > drm_panel_orientation_quirks
> > > [ 8638.892621] CPU: 62 UID: 0 PID: 5591 Comm: kworker/62:2 Not tainted 6.11.0-rc4+ #2
> > > [ 8638.892663] Hardware name: 8335-GTW POWER9 0x4e1203 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
> > > [ 8638.892693] Workqueue: events bpf_prog_free_deferred
> > > [ 8638.892735] NIP: c0000000004e7cc0 LR: c0000000004e7bbc CTR: c0000000003a9b30
> > > [ 8638.892798] REGS: c000000ea4cbf7f0 TRAP: 0300 Not tainted (6.11.0-rc4+)
> > > [ 8638.892862] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42a00284 XER: 00000000
> > > [ 8638.892915] CFAR: c0000000004e7bb8 DAR: 00000000000000a8 DSISR: 40000000 IRQMASK: 1
> > > [ 8638.892915] GPR00: c0000000004e7bbc c000000ea4cbfa90 c000000002837f00 0000000000000005
> > > [ 8638.892915] GPR04: 0000000000000015 0000000000000009 0000000000000009 c000000004840b00
> > > [ 8638.892915] GPR08: ffffffffffffffff 00000000ffffe000 ffffffffffffffff 000001937b55db50
> > > [ 8638.892915] GPR12: 0000000000200000 c000007ffdfac300 c0000000031b1fc8 0000000000010000
> > > [ 8638.892915] GPR16: c00000000000018e 000000007fffffff 0000000000000000 000000000000e1c0
> > > [ 8638.892915] GPR20: 61c8864680b583eb 0000000000000000 0000000000000000 00000000000de1d5
> > > [ 8638.892915] GPR24: 0000000000000000 c000000003da4408 c000000003da4400 c000000003da43f8
> > > [ 8638.892915] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000ea4cbfa90
> > > [ 8638.893350] NIP [c0000000004e7cc0] walk_to_pmd+0x80/0x240
> > > [ 8638.893380] LR [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > > [ 8638.893398] Call Trace:
> > > [ 8638.893407] [c000000ea4cbfa90] [c000000ea4cbfb20] 0xc000000ea4cbfb20 (unreliable)
> > > [ 8638.893429] [c000000ea4cbfaf0] [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > > [ 8638.893457] [c000000ea4cbfb40] [c0000000000b1dd0] patch_instructions+0x130/0x630
> > > [ 8638.893500] [c000000ea4cbfc10] [c000000000123180] bpf_arch_text_invalidate+0x80/0xd0
> > > [ 8638.893552] [c000000ea4cbfc60] [c0000000003a7508] bpf_prog_pack_free+0x138/0x2f0
> > > [ 8638.893584] [c000000ea4cbfd10] [c0000000003a7e38] bpf_jit_binary_pack_free+0x48/0xa0
> > > [ 8638.893617] [c000000ea4cbfd50] [c000000000123258] bpf_jit_free+0x88/0x100
> > > [ 8638.893667] [c000000ea4cbfd90] [c0000000003a9d70] bpf_prog_free_deferred+0x240/0x280
> > > [ 8638.893725] [c000000ea4cbfde0] [c0000000001a6828] process_scheduled_works+0x268/0x520
> > > [ 8638.893767] [c000000ea4cbfee0] [c0000000001a9ed0] worker_thread+0x3f0/0x590
> > > [ 8638.893809] [c000000ea4cbff80] [c0000000001b37b0] kthread+0x1a0/0x1c0
> > > [ 8638.893862] [c000000ea4cbffe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > > [ 8638.893913] Code: 3cc20157 3b63c4f8 3b45c500 3929c510 3b26c508 3940ffff e87b0000 e8ba0000
> > > 81290000 e8d90000 38830010 7d494830 <e87d00a8> 7ce42a14 7d2948f8 7d073214
> > > [ 8638.894003] ---[ end trace 0000000000000000 ]---
> > > [ 8639.098185] pstore: backend (nvram) writing error (-1)
> > > [ 8639.098205]
> > > [ 8639.098215] note: kworker/62:2[5591] exited with irqs disabled
> > > [ 8798.806603] ------------[ cut here ]------------
> > > [ 8798.806631] WARNING: CPU: 62 PID: 3769 at kernel/kthread.c:76 kthread_set_per_cpu+0x40/0xd0
> > > [ 8798.806653] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> > > br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
> > > ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv
> > > ------------------------------------------------------------------------------------------
> > >
> > > We will look at this issue as it is specific to PowerPC.
> >
> > This does look like a bug in BPF proper.
>
> powerpc doesn't have bpf trampolines, struct-ops and
> all the plumbing necessary to run sched-ext.
>
Hi Alexei,
That's right, powerpc at the moment doesn't have support for BPF trampolines
and struct_ops.
As mentioned in my previous message, I have the patch series adding trampoline
and struct_ops support applied on top of Tejun's branch while running sched_ext
on PowerPC, which is the same series that you referenced:
https://lore.kernel.org/all/20240915205648.830121-1-hbathini@linux.ibm.com/
However, I am still seeing the above crash with this series applied.
> Aboorva,
>
> please help test this patch set instead that is trying to add it:
> https://lore.kernel.org/all/20240915205648.830121-1-hbathini@linux.ibm.com/
Sure, I will post more details and continue the discussion in the thread
below going forward:
https://lore.kernel.org/all/20240915205648.830121-1-hbathini@linux.ibm.com/
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-24 12:00 ` Aboorva Devarajan
@ 2024-09-24 16:57 ` Song Liu
2024-09-30 6:37 ` Hari Bathini
0 siblings, 1 reply; 18+ messages in thread
From: Song Liu @ 2024-09-24 16:57 UTC (permalink / raw)
To: Aboorva Devarajan
Cc: Alexei Starovoitov, Tejun Heo, David Vernet, LKML,
Alexei Starovoitov, Andrii Nakryiko, hbathini
On Tue, Sep 24, 2024 at 5:00 AM Aboorva Devarajan
<aboorvad@linux.ibm.com> wrote:
>
> On Tue, 2024-09-24 at 10:03 +0200, Alexei Starovoitov wrote:
> > On Mon, Sep 23, 2024 at 8:21 PM Tejun Heo <tj@kernel.org> wrote:
> > > Hello,
> > >
> > > (cc'ing Alexei and Andrii for the BPF part)
> > >
> > > On Mon, Sep 23, 2024 at 08:26:32PM +0530, Aboorva Devarajan wrote:
> > > > Sharing the crash logs observed in PowerPC here for general reference, FYI:
> > > >
> > > > [ 8638.891964] Kernel attempted to read user page (a8) - exploit attempt? (uid: 0)
> > > > [ 8638.892002] BUG: Kernel NULL pointer dereference on read at 0x000000a8
> > > > [ 8638.892019] Faulting instruction address: 0xc0000000004e7cc0
> > > > [ 8638.892038] Oops: Kernel access of bad area, sig: 11 [#1]
> > > > [ 8638.892060] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
> > > > [ 8638.892080] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> > > > br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> > > > ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> > > > nf_defrag_ipv4 ebtable_filter ebtables vhost_vsock vmw_vsock_virtio_transport_common ip6tabl
> > > > e_filter ip6_tables vhost vhost_iotlb iptable_filter vsock bridge stp llc kvm_hv kvm joydev
> > > > input_leds mac_hid at24 ofpart cmdlinepart uio_pdrv_genirq ibmpowernv opal_prd ipmi_powernv
> > > > powernv_flash uio binfmt_misc sch_fq_codel nfsd mtd ipmi_devintf ipmi_msghandler auth_rpcgss
> > > > jc42 ramoops reed_solomon ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async
> > > > _memcpy async_pq async_xor async_tx raid1 raid0 dm_mirror dm_region_hash dm_log mlx5_ib ib_uverbs
> > > > ib_core mlx5_core hid_generic usbhid hid ast i2c_algo_bit drm_shmem_helper drm_kms_hel
> > > > per vmx_crypto drm mlxfw crct10dif_vpmsum crc32c_vpmsum psample tls tg3 ahci libahci
> > > > drm_panel_orientation_quirks
> > > > [ 8638.892621] CPU: 62 UID: 0 PID: 5591 Comm: kworker/62:2 Not tainted 6.11.0-rc4+ #2
> > > > [ 8638.892663] Hardware name: 8335-GTW POWER9 0x4e1203 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
> > > > [ 8638.892693] Workqueue: events bpf_prog_free_deferred
> > > > [ 8638.892735] NIP: c0000000004e7cc0 LR: c0000000004e7bbc CTR: c0000000003a9b30
> > > > [ 8638.892798] REGS: c000000ea4cbf7f0 TRAP: 0300 Not tainted (6.11.0-rc4+)
> > > > [ 8638.892862] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42a00284 XER: 00000000
> > > > [ 8638.892915] CFAR: c0000000004e7bb8 DAR: 00000000000000a8 DSISR: 40000000 IRQMASK: 1
> > > > [ 8638.892915] GPR00: c0000000004e7bbc c000000ea4cbfa90 c000000002837f00 0000000000000005
> > > > [ 8638.892915] GPR04: 0000000000000015 0000000000000009 0000000000000009 c000000004840b00
> > > > [ 8638.892915] GPR08: ffffffffffffffff 00000000ffffe000 ffffffffffffffff 000001937b55db50
> > > > [ 8638.892915] GPR12: 0000000000200000 c000007ffdfac300 c0000000031b1fc8 0000000000010000
> > > > [ 8638.892915] GPR16: c00000000000018e 000000007fffffff 0000000000000000 000000000000e1c0
> > > > [ 8638.892915] GPR20: 61c8864680b583eb 0000000000000000 0000000000000000 00000000000de1d5
> > > > [ 8638.892915] GPR24: 0000000000000000 c000000003da4408 c000000003da4400 c000000003da43f8
> > > > [ 8638.892915] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000ea4cbfa90
> > > > [ 8638.893350] NIP [c0000000004e7cc0] walk_to_pmd+0x80/0x240
With "BUG: Kernel NULL pointer dereference on read at 0x000000a8" (from above),
it appears bpf_arch_text_invalidate() is racing with text_area_cpu_down_mm(),
which sets cpu_patching_context.mm to NULL?
Am I going in the right direction?
> > > > [ 8638.893380] LR [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > > > [ 8638.893398] Call Trace:
> > > > [ 8638.893407] [c000000ea4cbfa90] [c000000ea4cbfb20] 0xc000000ea4cbfb20 (unreliable)
> > > > [ 8638.893429] [c000000ea4cbfaf0] [c0000000004e7bbc] __get_locked_pte+0x4c/0xd0
> > > > [ 8638.893457] [c000000ea4cbfb40] [c0000000000b1dd0] patch_instructions+0x130/0x630
> > > > [ 8638.893500] [c000000ea4cbfc10] [c000000000123180] bpf_arch_text_invalidate+0x80/0xd0
> > > > [... remainder of the crash log snipped; quoted in full upthread ...]
> > > > [ 8798.806603] ------------[ cut here ]------------
> > > > [ 8798.806631] WARNING: CPU: 62 PID: 3769 at kernel/kthread.c:76 kthread_set_per_cpu+0x40/0xd0
This warning also seems relevant. Are we running the work queue on a CPU
that is going away?
Thanks,
Song
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
2024-09-24 16:57 ` Song Liu
@ 2024-09-30 6:37 ` Hari Bathini
0 siblings, 0 replies; 18+ messages in thread
From: Hari Bathini @ 2024-09-30 6:37 UTC (permalink / raw)
To: Song Liu, Aboorva Devarajan
Cc: Alexei Starovoitov, Tejun Heo, David Vernet, LKML,
Alexei Starovoitov, Andrii Nakryiko
On 24/09/24 10:27 pm, Song Liu wrote:
> On Tue, Sep 24, 2024 at 5:00 AM Aboorva Devarajan
> <aboorvad@linux.ibm.com> wrote:
>>
>> On Tue, 2024-09-24 at 10:03 +0200, Alexei Starovoitov wrote:
>>> On Mon, Sep 23, 2024 at 8:21 PM Tejun Heo <tj@kernel.org> wrote:
>>>> Hello,
>>>>
>>>> (cc'ing Alexei and Andrii for the BPF part)
>>>>
>>>> On Mon, Sep 23, 2024 at 08:26:32PM +0530, Aboorva Devarajan wrote:
>>>>> Sharing the crash logs observed in PowerPC here for general reference, FYI:
>>>>>
>>>>> [... crash log snipped; quoted in full upthread ...]
>
> With "BUG: Kernel NULL pointer dereference on read at 0x000000a8" (from above),
> it appears bpf_arch_text_invalidate() is racing with
> text_area_cpu_down_mm(), which
> sets cpu_patching_context.mm to NULL?
>
> Am I going in the right direction?
Hi Song,
Yeah, that is the most plausible explanation for it.
I have just about started looking at this.
Will update if I find anything different.
Thanks
Hari
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-09-30 6:37 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-09 7:51 [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug Aboorva Devarajan
2024-07-10 0:02 ` Tejun Heo
2024-07-11 12:00 ` Aboorva Devarajan
2024-07-12 10:04 ` Aboorva Devarajan
2024-08-02 21:20 ` Tejun Heo
2024-08-10 18:24 ` Aboorva Devarajan
2024-08-13 19:54 ` Tejun Heo
2024-08-20 7:03 ` Aboorva Devarajan
2024-08-20 19:38 ` Tejun Heo
2024-08-23 9:20 ` Aboorva Devarajan
2024-08-26 18:32 ` Tejun Heo
2024-09-17 12:48 ` Tejun Heo
2024-09-23 14:56 ` Aboorva Devarajan
2024-09-23 18:21 ` Tejun Heo
2024-09-24 8:03 ` Alexei Starovoitov
2024-09-24 12:00 ` Aboorva Devarajan
2024-09-24 16:57 ` Song Liu
2024-09-30 6:37 ` Hari Bathini