* BUG: unable to handle kernel paging request in bpf_dispatcher_xdp @ 2022-12-06 3:28 Hao Sun 2022-12-06 6:46 ` Hao Sun 2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis 0 siblings, 2 replies; 26+ messages in thread From: Hao Sun @ 2022-12-06 3:28 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev Hi, The following crash can be triggered with the BPF prog provided. It seems the verifier passed some invalid progs. I will try to simplify the C reproducer, for now, the following can reproduce this: HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym git tree: bpf-next console log: https://pastebin.com/raw/87RCSnCs kernel config: https://pastebin.com/raw/rZdWLcgK Syz reproducer: https://pastebin.com/raw/4kbwhdEv C reproducer: https://pastebin.com/raw/GFfDn2Gk wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready BUG: unable to handle page fault for address: 000000000fe0840f #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7536 Comm: a.out Not tainted 6.1.0-rc7-01489-gab0350c743d5-dirty #118 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.1-1-1 04/01/2014 RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b RSP: 0018:ffffc900029df908 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __bpf_prog_run include/linux/filter.h:600 [inline] ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 ? bpf_dispatcher_xdp+0x800/0x1000 ? bpf_dispatcher_xdp+0x800/0x1000 ? bpf_dispatcher_xdp+0x800/0x1000 ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 ? rcu_lock_release include/linux/rcupdate.h:321 [inline] ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] ? __fget_files+0x283/0x3e0 fs/file.c:914 ? fput+0x30/0x1a0 fs/file_table.c:371 ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 ? 
bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 ? instrument_atomic_read include/linux/instrumented.h:72 [inline] ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] ? __se_sys_bpf kernel/bpf/syscall.c:5081 [inline] ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 ? entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) CR2: 000000000fe0840f ---[ end trace 0000000000000000 ]--- RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b RSP: 0018:ffffc900029df908 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
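Decoding the oops (a reconstruction from the Code: bytes above, worth re-checking with a disassembler): bpf_dispatcher_xdp is not a regular function but a page-sized JIT image, and on x86 arch_prepare_bpf_dispatcher() emits a binary search over the attached programs' bpf_func addresses, which the caller passes in %rdx. The bytes around RIP decode as that branch tree:

  48 81 fa e8 55 00 a0   cmp    $0xffffffffa00055e8,%rdx
  0f 8f 63 00 00 00      jg     .+0x63
  48 81 fa d8 54 00 a0   cmp    $0xffffffffa00054d8,%rdx
  7f 2a                  jg     .+0x2a
  ...
  ff e2                  jmp    *%rdx    /* tail-jump into the prog */

Note that RDX in the register dump (ffffffffa000534c) matches one of the compared immediates, i.e. a prog's bpf_func. The faulting byte marked <00> sits in the middle of a cmp, so RIP is misaligned with the surrounding instruction stream, and the bytes actually decoded at RIP (00 a0 0f 84 e0 0f = add %ah,0xfe0840f(%rax), with RAX = 0) perform a write to 0xfe0840f, which is exactly CR2. That is the signature of the CPU executing dispatcher image bytes that were rewritten underneath it, which is what the thread below converges on.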
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun
@ 2022-12-06 6:46 ` Hao Sun
  2022-12-06 15:18   ` Jiri Olsa
  2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 26+ messages in thread
From: Hao Sun @ 2022-12-06 6:46 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev

Hao Sun <sunhao.th@gmail.com> wrote on Tue, Dec 6, 2022 at 11:28:
>
> Hi,
>
> The following crash can be triggered with the BPF prog provided.
> It seems the verifier passed some invalid progs. I will try to simplify
> the C reproducer, for now, the following can reproduce this:
>
> HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in
> functions in bpf_iter_ksym
> git tree: bpf-next
> console log: https://pastebin.com/raw/87RCSnCs
> kernel config: https://pastebin.com/raw/rZdWLcgK
> Syz reproducer: https://pastebin.com/raw/4kbwhdEv
> C reproducer: https://pastebin.com/raw/GFfDn2Gk
>

Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW

Only two syscalls are required to reproduce this; it seems to be an
issue in the XDP test run. Essentially, the reproducer just loads a
very simple prog and test-runs it repeatedly and concurrently:

r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb,
&(0x7f0000000500)}, 0x80)
bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0,
0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48)

Loaded prog:
0: (18) r0 = 0x0
2: (18) r6 = 0x0
4: (18) r7 = 0x0
6: (18) r8 = 0x0
8: (18) r9 = 0x0
10: (95) exit

> wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > BUG: unable to handle page fault for address: 000000000fe0840f > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 > Oops: 0002 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 7536 Comm: a.out Not tainted > 6.1.0-rc7-01489-gab0350c743d5-dirty #118 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux > 1.16.1-1-1 04/01/2014 > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __bpf_prog_run include/linux/filter.h:600 [inline] > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > ?
bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 > ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 > ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > ? rcu_lock_release include/linux/rcupdate.h:321 [inline] > ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] > ? __fget_files+0x283/0x3e0 fs/file.c:914 > ? fput+0x30/0x1a0 fs/file_table.c:371 > ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] > ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > ? bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] > ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 > ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 > ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 > ? instrument_atomic_read include/linux/instrumented.h:72 [inline] > ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] > ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] > ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] > ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 > ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 > ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 > ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 > ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] > ? __se_sys_bpf kernel/bpf/syscall.c:5081 [inline] > ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 > ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 > ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] > ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > ? entry_SYSCALL_64_after_hwframe+0x63/0xcd > </TASK> > Modules linked in: > Dumping ftrace buffer: > (ftrace buffer empty) > CR2: 000000000fe0840f > ---[ end trace 0000000000000000 ]--- > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
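For reference, what the simplified reproducer boils down to, as a self-contained sketch (a hedged reconstruction: the process count, the packet buffer, and the use of fork() are illustrative and error handling is trimmed; the original program used the raw syz values quoted above, e.g. data_size_in = 0). The XDP test-run path attaches the prog to the XDP dispatcher for the duration of each BPF_PROG_TEST_RUN and detaches it afterwards, so running this concurrently drives bpf_dispatcher_update in a tight loop:

#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* ld_imm64 rN, 0 is opcode 0x18 and occupies two insn slots, hence
 * the 11-insn prog in the listing above */
#define LD_IMM64_ZERO(reg) \
	{ .code = BPF_LD | BPF_DW | BPF_IMM, .dst_reg = (reg) }, { .code = 0 }

int main(void)
{
	struct bpf_insn insns[] = {
		LD_IMM64_ZERO(0), LD_IMM64_ZERO(6), LD_IMM64_ZERO(7),
		LD_IMM64_ZERO(8), LD_IMM64_ZERO(9),
		{ .code = BPF_JMP | BPF_EXIT },		/* exit */
	};
	char data[64] = {0};				/* dummy ethernet frame */
	union bpf_attr attr;
	int i, prog_fd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_XDP;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.insns = (uintptr_t)insns;
	attr.license = (uintptr_t)"GPL";
	prog_fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
	if (prog_fd < 0)
		return 1;

	/* several processes hammer BPF_PROG_TEST_RUN on the same prog;
	 * each run attaches/detaches it on the XDP dispatcher */
	for (i = 0; i < 4; i++) {
		if (fork() == 0) {
			for (;;) {
				memset(&attr, 0, sizeof(attr));
				attr.test.prog_fd = prog_fd;
				attr.test.data_in = (uintptr_t)data;
				attr.test.data_size_in = sizeof(data);
				attr.test.repeat = 0xffffffff;
				sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
			}
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}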
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-06 6:46 ` Hao Sun @ 2022-12-06 15:18 ` Jiri Olsa 2022-12-07 19:57 ` Alexei Starovoitov 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-06 15:18 UTC (permalink / raw) To: Hao Sun, Peter Zijlstra Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > Hi, > > > > The following crash can be triggered with the BPF prog provided. > > It seems the verifier passed some invalid progs. I will try to simplify > > the C reproducer, for now, the following can reproduce this: > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > functions in bpf_iter_ksym > > git tree: bpf-next > > console log: https://pastebin.com/raw/87RCSnCs > > kernel config: https://pastebin.com/raw/rZdWLcgK > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > Only two syscalls are required to reproduce this, seems it's an issue > in XDP test run. Essentially, the reproducer just loads a very simple > prog and tests run repeatedly and concurrently: > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > &(0x7f0000000500)}, 0x80) > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > Loaded prog: > 0: (18) r0 = 0x0 > 2: (18) r6 = 0x0 > 4: (18) r7 = 0x0 > 6: (18) r8 = 0x0 > 8: (18) r9 = 0x0 > 10: (95) exit hi, I can reproduce with your config.. it seems related to the recent static call change: c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) I can't reproduce when I revert that commit.. Peter, any idea? 
thanks, jirka > > > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > > wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 > > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > > BUG: unable to handle page fault for address: 000000000fe0840f > > #PF: supervisor write access in kernel mode > > #PF: error_code(0x0002) - not-present page > > PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 > > Oops: 0002 [#1] PREEMPT SMP KASAN > > CPU: 0 PID: 7536 Comm: a.out Not tainted > > 6.1.0-rc7-01489-gab0350c743d5-dirty #118 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux > > 1.16.1-1-1 04/01/2014 > > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > > PKRU: 55555554 > > Call Trace: > > <TASK> > > ? __bpf_prog_run include/linux/filter.h:600 [inline] > > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > > ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 > > ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 > > ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 > > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > > ? rcu_lock_release include/linux/rcupdate.h:321 [inline] > > ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] > > ? __fget_files+0x283/0x3e0 fs/file.c:914 > > ? fput+0x30/0x1a0 fs/file_table.c:371 > > ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] > > ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 > > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > > ? bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] > > ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 > > ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 > > ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 > > ? instrument_atomic_read include/linux/instrumented.h:72 [inline] > > ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] > > ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] > > ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] > > ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 > > ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 > > ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 > > ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 > > ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] > > ? 
__se_sys_bpf kernel/bpf/syscall.c:5081 [inline] > > ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 > > ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 > > ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > > ? entry_SYSCALL_64_after_hwframe+0x63/0xcd > > </TASK> > > Modules linked in: > > Dumping ftrace buffer: > > (ftrace buffer empty) > > CR2: 000000000fe0840f > > ---[ end trace 0000000000000000 ]--- > > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > > PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
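For context on the commit named above, a paraphrased sketch of what c86df29d11df changed (reconstructed from memory of kernel/bpf/dispatcher.c and include/linux/bpf.h; names simplified, not verbatim kernel code). Before the commit, the dispatcher entry point had a patchable function entry that was text_poke'd into a jump to the image; after it, callers reach the image through a static_call:

/* paraphrased, simplified -- not verbatim kernel code */
DEFINE_STATIC_CALL(bpf_dispatcher_xdp_call, bpf_dispatcher_nop_func);

static __always_inline unsigned int bpf_dispatcher_xdp_func(
		const void *ctx, const struct bpf_insn *insnsi,
		unsigned int (*bpf_func)(const void *,
					 const struct bpf_insn *))
{
	return static_call(bpf_dispatcher_xdp_call)(ctx, insnsi, bpf_func);
}

/* bpf_dispatcher_update() then retargets all callers with: */
static_call_update(bpf_dispatcher_xdp_call, new_image_half);

As later messages in the thread show, the crash turned out not to be specific to this commit: the same window exists with the older entry-patching scheme.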
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-06 15:18 ` Jiri Olsa
@ 2022-12-07 19:57   ` Alexei Starovoitov
  2022-12-08 17:48     ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 19:57 UTC (permalink / raw)
To: Jiri Olsa
Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev

On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote:
> > Hao Sun <sunhao.th@gmail.com> wrote on Tue, Dec 6, 2022 at 11:28:
> > >
> > > Hi,
> > >
> > > The following crash can be triggered with the BPF prog provided.
> > > It seems the verifier passed some invalid progs. I will try to simplify
> > > the C reproducer, for now, the following can reproduce this:
> > >
> > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in
> > > functions in bpf_iter_ksym
> > > git tree: bpf-next
> > > console log: https://pastebin.com/raw/87RCSnCs
> > > kernel config: https://pastebin.com/raw/rZdWLcgK
> > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv
> > > C reproducer: https://pastebin.com/raw/GFfDn2Gk
> > >
> >
> > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW
> >
> > Only two syscalls are required to reproduce this, seems it's an issue
> > in XDP test run. Essentially, the reproducer just loads a very simple
> > prog and tests run repeatedly and concurrently:
> >
> > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb,
> > &(0x7f0000000500)}, 0x80)
> > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48)
> >
> > Loaded prog:
> > 0: (18) r0 = 0x0
> > 2: (18) r6 = 0x0
> > 4: (18) r7 = 0x0
> > 6: (18) r8 = 0x0
> > 8: (18) r9 = 0x0
> > 10: (95) exit
>
> hi,
> I can reproduce with your config.. it seems related to the
> recent static call change:
> c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
>
> I can't reproduce when I revert that commit.. Peter, any idea?

Jiri,

I see your tested-by tag on Peter's commit c86df29d11df.
I assume you've actually tested it, but this syzbot oops shows that
even an empty bpf prog crashes, so there is something wrong with that
commit.

What is the difference between this new kconfig and the old one that
you've tested?

I'm trying to understand the severity of the issue and whether we need
to revert that commit asap since the merge window is about to start.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-07 19:57 ` Alexei Starovoitov @ 2022-12-08 17:48 ` Alexei Starovoitov 2022-12-08 18:06 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Alexei Starovoitov @ 2022-12-08 17:48 UTC (permalink / raw) To: Jiri Olsa Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > Hi, > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > functions in bpf_iter_ksym > > > > git tree: bpf-next > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > prog and tests run repeatedly and concurrently: > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > &(0x7f0000000500)}, 0x80) > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > Loaded prog: > > > 0: (18) r0 = 0x0 > > > 2: (18) r6 = 0x0 > > > 4: (18) r7 = 0x0 > > > 6: (18) r8 = 0x0 > > > 8: (18) r9 = 0x0 > > > 10: (95) exit > > > > hi, > > I can reproduce with your config.. it seems related to the > > recent static call change: > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > Jiri, > > I see your tested-by tag on Peter's commit c86df29d11df. > I assume you're actually tested it, but > this syzbot oops shows that even empty bpf prog crashes, > so there is something wrong with that commit. > > What is the difference between this new kconfig and old one that > you've tested? > > I'm trying to understand the severity of the issues and > whether we need to revert that commit asap since the merge window > is about to start. Jiri, Peter, ping. cc-ing Thorsten, since he's tracking it now. The config has CONFIG_X86_KERNEL_IBT=y. Is it related? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-08 17:48 ` Alexei Starovoitov @ 2022-12-08 18:06 ` Jiri Olsa [not found] ` <Y5JkomOZaCETLDaZ@krava> 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-08 18:06 UTC (permalink / raw) To: Alexei Starovoitov Cc: Jiri Olsa, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > Hi, > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > functions in bpf_iter_ksym > > > > > git tree: bpf-next > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > prog and tests run repeatedly and concurrently: > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > &(0x7f0000000500)}, 0x80) > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > Loaded prog: > > > > 0: (18) r0 = 0x0 > > > > 2: (18) r6 = 0x0 > > > > 4: (18) r7 = 0x0 > > > > 6: (18) r8 = 0x0 > > > > 8: (18) r9 = 0x0 > > > > 10: (95) exit > > > > > > hi, > > > I can reproduce with your config.. it seems related to the > > > recent static call change: > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > Jiri, > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > I assume you're actually tested it, but > > this syzbot oops shows that even empty bpf prog crashes, > > so there is something wrong with that commit. > > > > What is the difference between this new kconfig and old one that > > you've tested? > > > > I'm trying to understand the severity of the issues and > > whether we need to revert that commit asap since the merge window > > is about to start. > > Jiri, Peter, > > ping. > > cc-ing Thorsten, since he's tracking it now. > > The config has CONFIG_X86_KERNEL_IBT=y. > Is it related? sorry for late reply.. I still did not find the reason, but I did not try with IBT yet, will test now jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp [not found] ` <Y5JkomOZaCETLDaZ@krava> @ 2022-12-08 23:02 ` Jiri Olsa 2022-12-09 7:09 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-08 23:02 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Thu, Dec 08, 2022 at 11:26:45PM +0100, Jiri Olsa wrote: > On Thu, Dec 08, 2022 at 07:06:59PM +0100, Jiri Olsa wrote: > > On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > > > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > > > functions in bpf_iter_ksym > > > > > > > git tree: bpf-next > > > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > > > prog and tests run repeatedly and concurrently: > > > > > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > > > &(0x7f0000000500)}, 0x80) > > > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > > > > > Loaded prog: > > > > > > 0: (18) r0 = 0x0 > > > > > > 2: (18) r6 = 0x0 > > > > > > 4: (18) r7 = 0x0 > > > > > > 6: (18) r8 = 0x0 > > > > > > 8: (18) r9 = 0x0 > > > > > > 10: (95) exit > > > > > > > > > > hi, > > > > > I can reproduce with your config.. it seems related to the > > > > > recent static call change: > > > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > > > > > Jiri, > > > > > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > > > I assume you're actually tested it, but > > > > this syzbot oops shows that even empty bpf prog crashes, > > > > so there is something wrong with that commit. > > > > > > > > What is the difference between this new kconfig and old one that > > > > you've tested? > > I attached the diff, 'config-issue' is the one that reproduces the issue > > > > > > > > > I'm trying to understand the severity of the issues and > > > > whether we need to revert that commit asap since the merge window > > > > is about to start. 
> > > > > > Jiri, Peter, > > > > > > ping. > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > Is it related? > > > > sorry for late reply.. I still did not find the reason, > > but I did not try with IBT yet, will test now > > no difference with IBT enabled, can't reproduce the issue > ok, scratch that.. the reproducer got stuck on wifi init :-\ after I fix that I can now reproduce on my local config with IBT enabled or disabled.. it's something else jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-08 23:02 ` Jiri Olsa @ 2022-12-09 7:09 ` Jiri Olsa [not found] ` <Y5MaffJOe1QtumSN@krava> 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 7:09 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 12:02:24AM +0100, Jiri Olsa wrote: > On Thu, Dec 08, 2022 at 11:26:45PM +0100, Jiri Olsa wrote: > > On Thu, Dec 08, 2022 at 07:06:59PM +0100, Jiri Olsa wrote: > > > On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > > > > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > > > > functions in bpf_iter_ksym > > > > > > > > git tree: bpf-next > > > > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > > > > prog and tests run repeatedly and concurrently: > > > > > > > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > > > > &(0x7f0000000500)}, 0x80) > > > > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > > > > > > > Loaded prog: > > > > > > > 0: (18) r0 = 0x0 > > > > > > > 2: (18) r6 = 0x0 > > > > > > > 4: (18) r7 = 0x0 > > > > > > > 6: (18) r8 = 0x0 > > > > > > > 8: (18) r9 = 0x0 > > > > > > > 10: (95) exit > > > > > > > > > > > > hi, > > > > > > I can reproduce with your config.. it seems related to the > > > > > > recent static call change: > > > > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > > > > > > > Jiri, > > > > > > > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > > > > I assume you're actually tested it, but > > > > > this syzbot oops shows that even empty bpf prog crashes, > > > > > so there is something wrong with that commit. > > > > > > > > > > What is the difference between this new kconfig and old one that > > > > > you've tested? 
> > > > I attached the diff, 'config-issue' is the one that reproduces the issue > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > whether we need to revert that commit asap since the merge window > > > > > is about to start. > > > > > > > > Jiri, Peter, > > > > > > > > ping. > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > Is it related? > > > > > > sorry for late reply.. I still did not find the reason, > > > but I did not try with IBT yet, will test now > > > > no difference with IBT enabled, can't reproduce the issue > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > after I fix that I can now reproduce on my local config with > IBT enabled or disabled.. it's something else I'm getting the error also when reverting the static call change, looking for good commit, bisecting I'm getting fail with: f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 v6.1-rc1 is ok jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  [not found] ` <Y5MaffJOe1QtumSN@krava>
@ 2022-12-09 13:50 ` Jiri Olsa
  2022-12-09 15:20   ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2022-12-09 13:50 UTC (permalink / raw)
To: Alexei Starovoitov, Song Liu
Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis

On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote:

SNIP

> > > > > > > I'm trying to understand the severity of the issues and
> > > > > > > whether we need to revert that commit asap since the merge window
> > > > > > > is about to start.
> > > > > >
> > > > > > Jiri, Peter,
> > > > > >
> > > > > > ping.
> > > > > >
> > > > > > cc-ing Thorsten, since he's tracking it now.
> > > > > >
> > > > > > The config has CONFIG_X86_KERNEL_IBT=y.
> > > > > > Is it related?
> > > > >
> > > > > sorry for late reply.. I still did not find the reason,
> > > > > but I did not try with IBT yet, will test now
> > > >
> > > > no difference with IBT enabled, can't reproduce the issue
> > > >
> > >
> > > ok, scratch that.. the reproducer got stuck on wifi init :-\
> > >
> > > after I fix that I can now reproduce on my local config with
> > > IBT enabled or disabled.. it's something else
> >
> > I'm getting the error also when reverting the static call change,
> > looking for good commit, bisecting
> >
> > I'm getting fail with:
> > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4
> >
> > v6.1-rc1 is ok
>
> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far
>
> attaching some more logs

looking at the code.. how do we ensure that code running through
bpf_prog_run_xdp will not get the dispatcher image changed while
it's being executed?

we use 'the other half' of the image when we add/remove programs,
but could bpf_dispatcher_update race with bpf_prog_run_xdp like:


cpu 0:                                 cpu 1:

bpf_prog_run_xdp
  ...
  bpf_dispatcher_xdp_func
    start exec image at offset 0x0

                                       bpf_dispatcher_update
                                         update image at offset 0x800
                                       bpf_dispatcher_update
                                         update image at offset 0x0

    still in image at offset 0x0


that might explain why I wasn't able to trigger that on
bare metal, just in qemu

jirka

^ permalink raw reply	[flat|nested] 26+ messages in thread
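To make the diagram concrete, a paraphrased sketch of the update side in kernel/bpf/dispatcher.c (simplified from memory -- for instance, the separate RW mapping the image is JITed into before being copied over is omitted):

/* simplified paraphrase of kernel/bpf/dispatcher.c, not verbatim */
static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
{
	void *new = NULL;
	u32 noff = 0;

	if (d->num_progs) {
		noff = d->image_off ^ (PAGE_SIZE / 2);	/* flip to unused half */
		new = d->image + noff;
		if (bpf_dispatcher_prepare(d, new))	/* JIT the branch tree */
			return;
	}

	/* switch callers over: static_call_update() after c86df29d11df */
	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);

	if (new)
		d->image_off = noff;

	/* nothing here waits for CPUs still executing the previous half,
	 * so two quick back-to-back updates rewrite the half a straggler
	 * may still be running -- the race in the diagram above */
}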
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-09 13:50 ` Jiri Olsa
@ 2022-12-09 15:20 ` Jiri Olsa
  2022-12-09 20:31   ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2022-12-09 15:20 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis

On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote:
> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote:
>
> SNIP
>
> > > > > > > > > I'm trying to understand the severity of the issues and
> > > > > > > > > whether we need to revert that commit asap since the merge window
> > > > > > > > > is about to start.
> > > > > > > >
> > > > > > > > Jiri, Peter,
> > > > > > > >
> > > > > > > > ping.
> > > > > > > >
> > > > > > > > cc-ing Thorsten, since he's tracking it now.
> > > > > > > >
> > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y.
> > > > > > > > Is it related?
> > > > > > >
> > > > > > > sorry for late reply.. I still did not find the reason,
> > > > > > > but I did not try with IBT yet, will test now
> > > > > >
> > > > > > no difference with IBT enabled, can't reproduce the issue
> > > > > >
> > > > >
> > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\
> > > > >
> > > > > after I fix that I can now reproduce on my local config with
> > > > > IBT enabled or disabled.. it's something else
> > > >
> > > > I'm getting the error also when reverting the static call change,
> > > > looking for good commit, bisecting
> > > >
> > > > I'm getting fail with:
> > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4
> > > >
> > > > v6.1-rc1 is ok
> > >
> > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far
> > >
> > > attaching some more logs
> >
> > looking at the code.. how do we ensure that code running through
> > bpf_prog_run_xdp will not get the dispatcher image changed while
> > it's being executed?
> >
> > we use 'the other half' of the image when we add/remove programs,
> > but could bpf_dispatcher_update race with bpf_prog_run_xdp like:
> >
> >
> > cpu 0:                                 cpu 1:
> >
> > bpf_prog_run_xdp
> >   ...
> >   bpf_dispatcher_xdp_func
> >     start exec image at offset 0x0
> >
> >                                        bpf_dispatcher_update
> >                                          update image at offset 0x800
> >                                        bpf_dispatcher_update
> >                                          update image at offset 0x0
> >
> >     still in image at offset 0x0
> >
> >
> > that might explain why I wasn't able to trigger that on
> > bare metal, just in qemu

I tried patch below and it fixes the issue for me and seems
to confirm the race above.. but not sure it's the best fix

jirka


---
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index c19719f48ce0..6a2ced102fc7 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
 	}
 
 	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
+	synchronize_rcu_tasks();
 
 	if (new)
 		d->image_off = noff;

^ permalink raw reply related	[flat|nested] 26+ messages in thread
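Why synchronize_rcu_tasks() closes the window: an RCU-tasks grace period only ends after every task has gone through a voluntary context switch (or user mode / idle), and a task executing the dispatcher image never sleeps voluntarily, so a straggler still inside the old half holds the grace period open even if it gets preempted. In the style of the diagram above, the patched sequence becomes:

cpu 0:                                 cpu 1:

  start exec image at offset 0x0
                                       bpf_dispatcher_update
                                         write + switch to offset 0x800
                                         synchronize_rcu_tasks()
                                           (blocks: cpu 0's task has not
                                            voluntarily rescheduled yet)
  leave image at offset 0x0
  task eventually schedules
                                         grace period ends, update returns
                                       bpf_dispatcher_update
                                         now safe to rewrite offset 0x0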
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 15:20 ` Jiri Olsa @ 2022-12-09 20:31 ` Yonghong Song 2022-12-09 21:53 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Yonghong Song @ 2022-12-09 20:31 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/9/22 7:20 AM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >> >> SBIP >> >>>>>>>>> >>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>> is about to start. >>>>>>>> >>>>>>>> Jiri, Peter, >>>>>>>> >>>>>>>> ping. >>>>>>>> >>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>> >>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>> Is it related? >>>>>>> >>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>> but I did not try with IBT yet, will test now >>>>>> >>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>> >>>>> >>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>> >>>>> after I fix that I can now reproduce on my local config with >>>>> IBT enabled or disabled.. it's something else >>>> >>>> I'm getting the error also when reverting the static call change, >>>> looking for good commit, bisecting >>>> >>>> I'm getting fail with: >>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>> >>>> v6.1-rc1 is ok >>> >>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>> >>> attaching some more logs >> >> looking at the code.. how do we ensure that code running through >> bpf_prog_run_xdp will not get dispatcher image changed while >> it's being exetuted >> >> we use 'the other half' of the image when we add/remove programs, >> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >> >> >> cpu 0: cpu 1: >> >> bpf_prog_run_xdp >> ... >> bpf_dispatcher_xdp_func >> start exec image at offset 0x0 >> >> bpf_dispatcher_update >> update image at offset 0x800 >> bpf_dispatcher_update >> update image at offset 0x0 >> >> still in image at offset 0x0 >> >> >> that might explain why I wasn't able to trigger that on >> bare metal just in qemu > > I tried patch below and it fixes the issue for me and seems > to confirm the race above.. but not sure it's the best fix > > jirka > > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; This might work. In arch/x86/kernel/alternative.c, we have following code and comments. For text_poke, synchronize_rcu_tasks() might be able to avoid concurrent execution and update. 
/**
 * text_poke_copy - Copy instructions into (an unused part of) RX memory
 * @addr: address to modify
 * @opcode: source of the copy
 * @len: length to copy, could be more than 2x PAGE_SIZE
 *
 * Not safe against concurrent execution; useful for JITs to dump
 * new code blocks into unused regions of RX memory. Can be used in
 * conjunction with synchronize_rcu_tasks() to wait for existing
 * execution to quiesce after having made sure no existing functions
 * pointers are live.
 */
void *text_poke_copy(void *addr, const void *opcode, size_t len)
{
	unsigned long start = (unsigned long)addr;
	size_t patched = 0;

	if (WARN_ON_ONCE(core_kernel_text(start)))
		return NULL;

	mutex_lock(&text_mutex);
	while (patched < len) {
		unsigned long ptr = start + patched;
		size_t s;

		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);

		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
		patched += s;
	}
	mutex_unlock(&text_mutex);
	return addr;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread
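The pattern that comment describes, reduced to a sketch (slot[] and dispatch_key are hypothetical names for illustration, not a real kernel API):

/* two code slots: publish into the unused one, steer callers over,
 * then quiesce before the old slot may ever be rewritten */
text_poke_copy(slot[next], buf, len);		/* dump new code        */
static_call_update(dispatch_key, slot[next]);	/* new callers go there */
synchronize_rcu_tasks();			/* old callers drained  */
/* slot[prev] is now safe to overwrite on a later update */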
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 20:31 ` Yonghong Song @ 2022-12-09 21:53 ` Jiri Olsa 2022-12-09 22:41 ` Daniel Borkmann 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 21:53 UTC (permalink / raw) To: Yonghong Song Cc: Jiri Olsa, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: > > > On 12/9/22 7:20 AM, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: > > > On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: > > > > > > SBIP > > > > > > > > > > > > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > > > > > > whether we need to revert that commit asap since the merge window > > > > > > > > > > is about to start. > > > > > > > > > > > > > > > > > > Jiri, Peter, > > > > > > > > > > > > > > > > > > ping. > > > > > > > > > > > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > > > > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > > > > > > Is it related? > > > > > > > > > > > > > > > > sorry for late reply.. I still did not find the reason, > > > > > > > > but I did not try with IBT yet, will test now > > > > > > > > > > > > > > no difference with IBT enabled, can't reproduce the issue > > > > > > > > > > > > > > > > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > > > > > > > > > > > after I fix that I can now reproduce on my local config with > > > > > > IBT enabled or disabled.. it's something else > > > > > > > > > > I'm getting the error also when reverting the static call change, > > > > > looking for good commit, bisecting > > > > > > > > > > I'm getting fail with: > > > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 > > > > > > > > > > v6.1-rc1 is ok > > > > > > > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far > > > > > > > > attaching some more logs > > > > > > looking at the code.. how do we ensure that code running through > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > it's being exetuted > > > > > > we use 'the other half' of the image when we add/remove programs, > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > cpu 0: cpu 1: > > > > > > bpf_prog_run_xdp > > > ... > > > bpf_dispatcher_xdp_func > > > start exec image at offset 0x0 > > > > > > bpf_dispatcher_update > > > update image at offset 0x800 > > > bpf_dispatcher_update > > > update image at offset 0x0 > > > > > > still in image at offset 0x0 > > > > > > > > > that might explain why I wasn't able to trigger that on > > > bare metal just in qemu > > > > I tried patch below and it fixes the issue for me and seems > > to confirm the race above.. 
> > but not sure it's the best fix
> >
> > jirka
> >
> >
> > ---
> > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> > index c19719f48ce0..6a2ced102fc7 100644
> > --- a/kernel/bpf/dispatcher.c
> > +++ b/kernel/bpf/dispatcher.c
> > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
> >  	}
> >  
> >  	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
> > +	synchronize_rcu_tasks();
> >  
> >  	if (new)
> >  		d->image_off = noff;
>
> This might work. In arch/x86/kernel/alternative.c, we have following
> code and comments. For text_poke, synchronize_rcu_tasks() might be able
> to avoid concurrent execution and update.

so my idea was that we need to ensure all the current callers of
bpf_dispatcher_xdp_func (which should have rcu read lock, based
on the comment in bpf_prog_run_xdp) are gone before new ones execute
the new image, so the next call to bpf_dispatcher_update will be
safe to overwrite the other half of the image

jirka

>
> /**
>  * text_poke_copy - Copy instructions into (an unused part of) RX memory
>  * @addr: address to modify
>  * @opcode: source of the copy
>  * @len: length to copy, could be more than 2x PAGE_SIZE
>  *
>  * Not safe against concurrent execution; useful for JITs to dump
>  * new code blocks into unused regions of RX memory. Can be used in
>  * conjunction with synchronize_rcu_tasks() to wait for existing
>  * execution to quiesce after having made sure no existing functions
>  * pointers are live.
>  */
> void *text_poke_copy(void *addr, const void *opcode, size_t len)
> {
> 	unsigned long start = (unsigned long)addr;
> 	size_t patched = 0;
>
> 	if (WARN_ON_ONCE(core_kernel_text(start)))
> 		return NULL;
>
> 	mutex_lock(&text_mutex);
> 	while (patched < len) {
> 		unsigned long ptr = start + patched;
> 		size_t s;
>
> 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
>
> 		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
> 		patched += s;
> 	}
> 	mutex_unlock(&text_mutex);
> 	return addr;
> }

^ permalink raw reply	[flat|nested] 26+ messages in thread
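The reader side being discussed, paraphrased from include/linux/filter.h (simplified, not verbatim -- the bpf_stats branch is dropped). __bpf_prog_run hands prog->bpf_func to the dispatcher image in the third argument register, which is why %rdx in the oops holds a prog address, and nothing in this path orders the call against a concurrent bpf_dispatcher_update:

/* simplified paraphrase of include/linux/filter.h */
static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
					  const void *ctx,
					  bpf_dispatcher_fn dfunc)
{
	return dfunc(ctx, prog->insnsi, prog->bpf_func);
}

/* bpf_prog_run_xdp() then calls, under local_bh_disable() only: */
u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));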
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 21:53 ` Jiri Olsa @ 2022-12-09 22:41 ` Daniel Borkmann 2022-12-09 23:07 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Daniel Borkmann @ 2022-12-09 22:41 UTC (permalink / raw) To: Jiri Olsa, Yonghong Song Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/9/22 10:53 PM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: >> >> >> On 12/9/22 7:20 AM, Jiri Olsa wrote: >>> On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >>>> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >>>> >>>> SBIP >>>> >>>>>>>>>>> >>>>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>>>> is about to start. >>>>>>>>>> >>>>>>>>>> Jiri, Peter, >>>>>>>>>> >>>>>>>>>> ping. >>>>>>>>>> >>>>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>>>> >>>>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>>>> Is it related? >>>>>>>>> >>>>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>>>> but I did not try with IBT yet, will test now >>>>>>>> >>>>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>>>> >>>>>>> >>>>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>>>> >>>>>>> after I fix that I can now reproduce on my local config with >>>>>>> IBT enabled or disabled.. it's something else >>>>>> >>>>>> I'm getting the error also when reverting the static call change, >>>>>> looking for good commit, bisecting >>>>>> >>>>>> I'm getting fail with: >>>>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>>>> >>>>>> v6.1-rc1 is ok >>>>> >>>>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>>>> >>>>> attaching some more logs >>>> >>>> looking at the code.. how do we ensure that code running through >>>> bpf_prog_run_xdp will not get dispatcher image changed while >>>> it's being exetuted >>>> >>>> we use 'the other half' of the image when we add/remove programs, >>>> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >>>> >>>> >>>> cpu 0: cpu 1: >>>> >>>> bpf_prog_run_xdp >>>> ... >>>> bpf_dispatcher_xdp_func >>>> start exec image at offset 0x0 >>>> >>>> bpf_dispatcher_update >>>> update image at offset 0x800 >>>> bpf_dispatcher_update >>>> update image at offset 0x0 >>>> >>>> still in image at offset 0x0 >>>> >>>> >>>> that might explain why I wasn't able to trigger that on >>>> bare metal just in qemu >>> >>> I tried patch below and it fixes the issue for me and seems >>> to confirm the race above.. but not sure it's the best fix >>> >>> jirka >>> >>> >>> --- >>> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c >>> index c19719f48ce0..6a2ced102fc7 100644 >>> --- a/kernel/bpf/dispatcher.c >>> +++ b/kernel/bpf/dispatcher.c >>> @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) >>> } >>> __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); >>> + synchronize_rcu_tasks(); >>> if (new) >>> d->image_off = noff; >> >> This might work. In arch/x86/kernel/alternative.c, we have following >> code and comments. 
For text_poke, synchronize_rcu_tasks() might be able >> to avoid concurrent execution and update. > > so my idea was that we need to ensure all the current callers of > bpf_dispatcher_xdp_func (which should have rcu read lock, based > on the comment in bpf_prog_run_xdp) are gone before and new ones > execute the new image, so the next call to the bpf_dispatcher_update > will be safe to overwrite the other half of the image If v6.1-rc1 was indeed okay, then it looks like this may be related to the trampoline patching for the static_call? Did it repro on v6.1-rc1 just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") cherry-picked? >> /** >> * text_poke_copy - Copy instructions into (an unused part of) RX memory >> * @addr: address to modify >> * @opcode: source of the copy >> * @len: length to copy, could be more than 2x PAGE_SIZE >> * >> * Not safe against concurrent execution; useful for JITs to dump >> * new code blocks into unused regions of RX memory. Can be used in >> * conjunction with synchronize_rcu_tasks() to wait for existing >> * execution to quiesce after having made sure no existing functions >> * pointers are live. >> */ >> void *text_poke_copy(void *addr, const void *opcode, size_t len) >> { >> unsigned long start = (unsigned long)addr; >> size_t patched = 0; >> >> if (WARN_ON_ONCE(core_kernel_text(start))) >> return NULL; >> >> mutex_lock(&text_mutex); >> while (patched < len) { >> unsigned long ptr = start + patched; >> size_t s; >> >> s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - >> patched); >> >> __text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, >> s); >> patched += s; >> } >> mutex_unlock(&text_mutex); >> return addr; >> } ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 22:41 ` Daniel Borkmann @ 2022-12-09 23:07 ` Jiri Olsa 2022-12-09 23:29 ` Jiri Olsa 2022-12-09 23:32 ` Daniel Borkmann 0 siblings, 2 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 23:07 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 11:41:11PM +0100, Daniel Borkmann wrote: > On 12/9/22 10:53 PM, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: > > > > > > > > > On 12/9/22 7:20 AM, Jiri Olsa wrote: > > > > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: > > > > > On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: > > > > > > > > > > SBIP > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > > > > > > > > whether we need to revert that commit asap since the merge window > > > > > > > > > > > > is about to start. > > > > > > > > > > > > > > > > > > > > > > Jiri, Peter, > > > > > > > > > > > > > > > > > > > > > > ping. > > > > > > > > > > > > > > > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > > > > > > > > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > > > > > > > > Is it related? > > > > > > > > > > > > > > > > > > > > sorry for late reply.. I still did not find the reason, > > > > > > > > > > but I did not try with IBT yet, will test now > > > > > > > > > > > > > > > > > > no difference with IBT enabled, can't reproduce the issue > > > > > > > > > > > > > > > > > > > > > > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > > > > > > > > > > > > > > > after I fix that I can now reproduce on my local config with > > > > > > > > IBT enabled or disabled.. it's something else > > > > > > > > > > > > > > I'm getting the error also when reverting the static call change, > > > > > > > looking for good commit, bisecting > > > > > > > > > > > > > > I'm getting fail with: > > > > > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 > > > > > > > > > > > > > > v6.1-rc1 is ok > > > > > > > > > > > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far > > > > > > > > > > > > attaching some more logs > > > > > > > > > > looking at the code.. how do we ensure that code running through > > > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > > > it's being exetuted > > > > > > > > > > we use 'the other half' of the image when we add/remove programs, > > > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > > > > > > > cpu 0: cpu 1: > > > > > > > > > > bpf_prog_run_xdp > > > > > ... > > > > > bpf_dispatcher_xdp_func > > > > > start exec image at offset 0x0 > > > > > > > > > > bpf_dispatcher_update > > > > > update image at offset 0x800 > > > > > bpf_dispatcher_update > > > > > update image at offset 0x0 > > > > > > > > > > still in image at offset 0x0 > > > > > > > > > > > > > > > that might explain why I wasn't able to trigger that on > > > > > bare metal just in qemu > > > > > > > > I tried patch below and it fixes the issue for me and seems > > > > to confirm the race above.. 
but not sure it's the best fix > > > > > > > > jirka > > > > > > > > > > > > --- > > > > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > > > > index c19719f48ce0..6a2ced102fc7 100644 > > > > --- a/kernel/bpf/dispatcher.c > > > > +++ b/kernel/bpf/dispatcher.c > > > > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > > > > } > > > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > > > > + synchronize_rcu_tasks(); > > > > if (new) > > > > d->image_off = noff; > > > > > > This might work. In arch/x86/kernel/alternative.c, we have following > > > code and comments. For text_poke, synchronize_rcu_tasks() might be able > > > to avoid concurrent execution and update. > > > > so my idea was that we need to ensure all the current callers of > > bpf_dispatcher_xdp_func (which should have rcu read lock, based > > on the comment in bpf_prog_run_xdp) are gone before and new ones > > execute the new image, so the next call to the bpf_dispatcher_update > > will be safe to overwrite the other half of the image > > If v6.1-rc1 was indeed okay, then it looks like this may be related to > the trampoline patching for the static_call? Did it repro on v6.1-rc1 > just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry > to 5 bytes nop") cherry-picked? I'll try that.. it looks to me like the problem was always there, maybe harder to trigger.. also to reproduce it you need to call bpf_dispatcher_update heavily, which is probably not the common use case one other thing is that I think the fix might need rcu locking on the bpf_dispatcher_xdp_func side, because local_bh_disable seems not to be enough to make synchronize_rcu_tasks work I'm now testing patch below jirka --- diff --git a/include/linux/filter.h b/include/linux/filter.h index efc42a6e3aed..a27245b96d6b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, * under local_bh_disable(), which provides the needed RCU protection * for accessing map entries. */ - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); + u32 act; + + rcu_read_lock(); + + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); + + rcu_read_unlock(); if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..6a2ced102fc7 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu_tasks(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:07 ` Jiri Olsa @ 2022-12-09 23:29 ` Jiri Olsa 2022-12-09 23:32 ` Daniel Borkmann 1 sibling, 0 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 23:29 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 12:07:58AM +0100, Jiri Olsa wrote: SNIP > > > > > > > > > > > > looking at the code.. how do we ensure that code running through > > > > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > > > > it's being exetuted > > > > > > > > > > > > we use 'the other half' of the image when we add/remove programs, > > > > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > > > > > > > > > > cpu 0: cpu 1: > > > > > > > > > > > > bpf_prog_run_xdp > > > > > > ... > > > > > > bpf_dispatcher_xdp_func > > > > > > start exec image at offset 0x0 > > > > > > > > > > > > bpf_dispatcher_update > > > > > > update image at offset 0x800 > > > > > > bpf_dispatcher_update > > > > > > update image at offset 0x0 > > > > > > > > > > > > still in image at offset 0x0 > > > > > > > > > > > > > > > > > > that might explain why I wasn't able to trigger that on > > > > > > bare metal just in qemu > > > > > > > > > > I tried patch below and it fixes the issue for me and seems > > > > > to confirm the race above.. but not sure it's the best fix > > > > > > > > > > jirka > > > > > > > > > > > > > > > --- > > > > > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > > > > > index c19719f48ce0..6a2ced102fc7 100644 > > > > > --- a/kernel/bpf/dispatcher.c > > > > > +++ b/kernel/bpf/dispatcher.c > > > > > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > > > > > } > > > > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > > > > > + synchronize_rcu_tasks(); > > > > > if (new) > > > > > d->image_off = noff; > > > > > > > > This might work. In arch/x86/kernel/alternative.c, we have following > > > > code and comments. For text_poke, synchronize_rcu_tasks() might be able > > > > to avoid concurrent execution and update. > > > > > > so my idea was that we need to ensure all the current callers of > > > bpf_dispatcher_xdp_func (which should have rcu read lock, based > > > on the comment in bpf_prog_run_xdp) are gone before and new ones > > > execute the new image, so the next call to the bpf_dispatcher_update > > > will be safe to overwrite the other half of the image > > > > If v6.1-rc1 was indeed okay, then it looks like this may be related to > > the trampoline patching for the static_call? Did it repro on v6.1-rc1 > > just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry > > to 5 bytes nop") cherry-picked? > > I'll try that.. it looks to me like the problem was always there, > maybe harder to trigger.. 
also to reproduce it you need to call > bpf_dispatcher_update heavily, which is not probably the common > use case > > one other thing is that I think the fix might need rcu locking > on the bpf_dispatcher_xdp_func side, because local_bh_disable > seems not to be enough to make synchronize_rcu_tasks work > > I'm now testing patch below > > jirka > > > --- > diff --git a/include/linux/filter.h b/include/linux/filter.h > index efc42a6e3aed..a27245b96d6b 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, > * under local_bh_disable(), which provides the needed RCU protection > * for accessing map entries. > */ > - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + u32 act; > + > + rcu_read_lock(); > + > + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + > + rcu_read_unlock(); > > if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { > if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; hm, so I'm eventually getting splats like below I guess I'm missing some rcu/xdp detail, thoughts? ;-) jirka --- [ 1107.911088][ T41] INFO: task rcu_tasks_kthre:12 blocked for more than 122 seconds. [ 1107.913332][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.914801][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.916691][ T41] task:rcu_tasks_kthre state:D stack:14392 pid:12 ppid:2 flags:0x00004000 [ 1107.917324][ T41] Call Trace: [ 1107.917563][ T41] <TASK> [ 1107.917784][ T41] __schedule+0x419/0xe30 [ 1107.918764][ T41] schedule+0x5d/0xe0 [ 1107.919061][ T41] schedule_timeout+0x102/0x140 [ 1107.919386][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.919747][ T41] ? lock_release+0x264/0x4f0 [ 1107.920079][ T41] ? lock_acquired+0x207/0x470 [ 1107.920397][ T41] ? trace_hardirqs_on+0x2b/0xd0 [ 1107.920723][ T41] __wait_for_common+0xb6/0x210 [ 1107.921067][ T41] ? usleep_range_state+0xb0/0xb0 [ 1107.921401][ T41] __synchronize_srcu+0x151/0x1e0 [ 1107.921731][ T41] ? rcu_tasks_pregp_step+0x10/0x10 [ 1107.922112][ T41] ? ktime_get_mono_fast_ns+0x3a/0x90 [ 1107.922463][ T41] ? synchronize_srcu+0xa1/0xe0 [ 1107.922784][ T41] rcu_tasks_wait_gp+0x183/0x3b0 [ 1107.923129][ T41] ? lock_release+0x264/0x4f0 [ 1107.923442][ T41] rcu_tasks_one_gp+0x35a/0x3e0 [ 1107.923766][ T41] ? rcu_tasks_postscan+0x20/0x20 [ 1107.924114][ T41] rcu_tasks_kthread+0x31/0x40 [ 1107.924434][ T41] kthread+0xf2/0x120 [ 1107.924713][ T41] ? kthread_complete_and_exit+0x20/0x20 [ 1107.925095][ T41] ret_from_fork+0x1f/0x30 [ 1107.925404][ T41] </TASK> [ 1107.925664][ T41] INFO: task ex:7319 blocked for more than 122 seconds. [ 1107.926121][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.926461][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 1107.927090][ T41] task:ex state:D stack:13648 pid:7319 ppid:677 flags:0x00004006 [ 1107.927791][ T41] Call Trace: [ 1107.928079][ T41] <TASK> [ 1107.928334][ T41] __schedule+0x419/0xe30 [ 1107.928683][ T41] schedule+0x5d/0xe0 [ 1107.929019][ T41] schedule_preempt_disabled+0x14/0x30 [ 1107.929440][ T41] __mutex_lock+0x3fd/0x850 [ 1107.929799][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930235][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930609][ T41] bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930977][ T41] bpf_prog_test_run_xdp+0x39b/0x600 [ 1107.931340][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.931684][ T41] ? futex_wait+0x175/0x250 [ 1107.932014][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.932328][ T41] ? lock_release+0x264/0x4f0 [ 1107.932640][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.933028][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.933388][ T41] ? lock_release+0x264/0x4f0 [ 1107.933700][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.934070][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.934432][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.934733][ T41] do_syscall_64+0x37/0x90 [ 1107.935050][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.935428][ T41] RIP: 0033:0x7f02f9f0af3d [ 1107.935731][ T41] RSP: 002b:00007f02fa0e9df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.936291][ T41] RAX: ffffffffffffffda RBX: 00007f02fa0ea640 RCX: 00007f02f9f0af3d [ 1107.936811][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.937360][ T41] RBP: 00007f02fa0e9e20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.937884][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.938425][ T41] R13: 0000000000000011 R14: 00007ffda75fd290 R15: 00007f02fa0ca000 [ 1107.939050][ T41] </TASK> [ 1107.939315][ T41] INFO: task ex:7352 blocked for more than 122 seconds. [ 1107.939744][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.940095][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.940651][ T41] task:ex state:D stack:13648 pid:7352 ppid:766 flags:0x00004006 [ 1107.941254][ T41] Call Trace: [ 1107.941492][ T41] <TASK> [ 1107.941710][ T41] __schedule+0x419/0xe30 [ 1107.942018][ T41] ? lock_acquired+0x207/0x470 [ 1107.942339][ T41] schedule+0x5d/0xe0 [ 1107.942616][ T41] schedule_timeout+0x102/0x140 [ 1107.942955][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.943330][ T41] ? lock_release+0x264/0x4f0 [ 1107.943643][ T41] ? lock_acquired+0x207/0x470 [ 1107.943965][ T41] ? trace_hardirqs_on+0x2b/0xd0 [ 1107.944318][ T41] __wait_for_common+0xb6/0x210 [ 1107.944641][ T41] ? usleep_range_state+0xb0/0xb0 [ 1107.950003][ T41] __wait_rcu_gp+0x14d/0x170 [ 1107.950399][ T41] ? 0xffffffffa0013840 [ 1107.950726][ T41] synchronize_rcu_tasks_generic.part.0.isra.0+0x31/0x50 [ 1107.951207][ T41] ? call_rcu_tasks_generic+0x350/0x350 [ 1107.951643][ T41] ? rcu_tasks_pregp_step+0x10/0x10 [ 1107.952070][ T41] bpf_dispatcher_change_prog+0x204/0x380 [ 1107.952521][ T41] bpf_prog_test_run_xdp+0x39b/0x600 [ 1107.952941][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.953302][ T41] ? futex_wait+0x175/0x250 [ 1107.953669][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.954058][ T41] ? lock_release+0x264/0x4f0 [ 1107.954435][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.954868][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.955329][ T41] ? lock_release+0x264/0x4f0 [ 1107.955705][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.956148][ T41] ? 
rcu_read_lock_sched_held+0x10/0x90 [ 1107.956582][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.956937][ T41] do_syscall_64+0x37/0x90 [ 1107.957312][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.957771][ T41] RIP: 0033:0x7ffaa610af3d [ 1107.958140][ T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.958792][ T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d [ 1107.959427][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.960054][ T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.960680][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.961314][ T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000 [ 1107.961948][ T41] </TASK> [ 1107.962226][ T41] INFO: task ex:7354 blocked for more than 122 seconds. [ 1107.962756][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.963178][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.963786][ T41] task:ex state:D stack:13648 pid:7354 ppid:767 flags:0x00004006 [ 1107.964451][ T41] Call Trace: [ 1107.964733][ T41] <TASK> [ 1107.965001][ T41] __schedule+0x419/0xe30 [ 1107.965354][ T41] schedule+0x5d/0xe0 [ 1107.965682][ T41] schedule_preempt_disabled+0x14/0x30 [ 1107.966130][ T41] __mutex_lock+0x3fd/0x850 [ 1107.966493][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.966870][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.967340][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.967792][ T41] bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.968236][ T41] bpf_prog_test_run_xdp+0x2c8/0x600 [ 1107.968654][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.969012][ T41] ? futex_wait+0x175/0x250 [ 1107.969380][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.969754][ T41] ? lock_release+0x264/0x4f0 [ 1107.970135][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.970565][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.971008][ T41] ? lock_release+0x264/0x4f0 [ 1107.971385][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.971813][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.972257][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.972614][ T41] do_syscall_64+0x37/0x90 [ 1107.972984][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.973385][ T41] RIP: 0033:0x7ffaa610af3d [ 1107.973696][ T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.974261][ T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d [ 1107.974795][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.975348][ T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.975942][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.976570][ T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000 [ 1107.977216][ T41] </TASK> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:07 ` Jiri Olsa 2022-12-09 23:29 ` Jiri Olsa @ 2022-12-09 23:32 ` Daniel Borkmann 2022-12-09 23:34 ` Jakub Kicinski 1 sibling, 1 reply; 26+ messages in thread From: Daniel Borkmann @ 2022-12-09 23:32 UTC (permalink / raw) To: Jiri Olsa Cc: Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/10/22 12:07 AM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 11:41:11PM +0100, Daniel Borkmann wrote: >> On 12/9/22 10:53 PM, Jiri Olsa wrote: >>> On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: >>>> >>>> >>>> On 12/9/22 7:20 AM, Jiri Olsa wrote: >>>>> On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >>>>>> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >>>>>> >>>>>> SBIP >>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>>>>>> is about to start. >>>>>>>>>>>> >>>>>>>>>>>> Jiri, Peter, >>>>>>>>>>>> >>>>>>>>>>>> ping. >>>>>>>>>>>> >>>>>>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>>>>>> >>>>>>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>>>>>> Is it related? >>>>>>>>>>> >>>>>>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>>>>>> but I did not try with IBT yet, will test now >>>>>>>>>> >>>>>>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>>>>>> >>>>>>>>> >>>>>>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>>>>>> >>>>>>>>> after I fix that I can now reproduce on my local config with >>>>>>>>> IBT enabled or disabled.. it's something else >>>>>>>> >>>>>>>> I'm getting the error also when reverting the static call change, >>>>>>>> looking for good commit, bisecting >>>>>>>> >>>>>>>> I'm getting fail with: >>>>>>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>>>>>> >>>>>>>> v6.1-rc1 is ok >>>>>>> >>>>>>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>>>>>> >>>>>>> attaching some more logs >>>>>> >>>>>> looking at the code.. how do we ensure that code running through >>>>>> bpf_prog_run_xdp will not get dispatcher image changed while >>>>>> it's being exetuted >>>>>> >>>>>> we use 'the other half' of the image when we add/remove programs, >>>>>> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >>>>>> >>>>>> >>>>>> cpu 0: cpu 1: >>>>>> >>>>>> bpf_prog_run_xdp >>>>>> ... >>>>>> bpf_dispatcher_xdp_func >>>>>> start exec image at offset 0x0 >>>>>> >>>>>> bpf_dispatcher_update >>>>>> update image at offset 0x800 >>>>>> bpf_dispatcher_update >>>>>> update image at offset 0x0 >>>>>> >>>>>> still in image at offset 0x0 >>>>>> >>>>>> >>>>>> that might explain why I wasn't able to trigger that on >>>>>> bare metal just in qemu >>>>> >>>>> I tried patch below and it fixes the issue for me and seems >>>>> to confirm the race above.. 
but not sure it's the best fix >>>>> >>>>> jirka >>>>> >>>>> >>>>> --- >>>>> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c >>>>> index c19719f48ce0..6a2ced102fc7 100644 >>>>> --- a/kernel/bpf/dispatcher.c >>>>> +++ b/kernel/bpf/dispatcher.c >>>>> @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) >>>>> } >>>>> __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); >>>>> + synchronize_rcu_tasks(); >>>>> if (new) >>>>> d->image_off = noff; >>>> >>>> This might work. In arch/x86/kernel/alternative.c, we have following >>>> code and comments. For text_poke, synchronize_rcu_tasks() might be able >>>> to avoid concurrent execution and update. >>> >>> so my idea was that we need to ensure all the current callers of >>> bpf_dispatcher_xdp_func (which should have rcu read lock, based >>> on the comment in bpf_prog_run_xdp) are gone before and new ones >>> execute the new image, so the next call to the bpf_dispatcher_update >>> will be safe to overwrite the other half of the image >> >> If v6.1-rc1 was indeed okay, then it looks like this may be related to >> the trampoline patching for the static_call? Did it repro on v6.1-rc1 >> just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry >> to 5 bytes nop") cherry-picked? > > I'll try that.. it looks to me like the problem was always there, > maybe harder to trigger.. also to reproduce it you need to call > bpf_dispatcher_update heavily, which is not probably the common > use case > > one other thing is that I think the fix might need rcu locking > on the bpf_dispatcher_xdp_func side, because local_bh_disable > seems not to be enough to make synchronize_rcu_tasks work > > I'm now testing patch below > > jirka > > --- > diff --git a/include/linux/filter.h b/include/linux/filter.h > index efc42a6e3aed..a27245b96d6b 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, > * under local_bh_disable(), which provides the needed RCU protection > * for accessing map entries. > */ > - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + u32 act; > + > + rcu_read_lock(); > + > + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + > + rcu_read_unlock(); fwiw, these should not be necessary, Documentation/RCU/checklist.rst : [...] One example of non-obvious pairing is the XDP feature in networking, which calls BPF programs from network-driver NAPI (softirq) context. BPF relies heavily on RCU protection for its data structures, but because the BPF program invocation happens entirely within a single local_bh_disable() section in a NAPI poll cycle, this usage is safe. The reason that this usage is safe is that readers can use anything that disables BH when updaters use call_rcu() or synchronize_rcu(). [...] > if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { > if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; > ^ permalink raw reply [flat|nested] 26+ messages in thread
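Reduced to a sketch, the pairing rule the checklist describes looks like this (illustrative only; in real XDP the BH-disabled section is the driver's NAPI poll loop, not an explicit pair like here):

	/* reader side: the BH-disabled region acts as the RCU read-side
	 * critical section
	 */
	local_bh_disable();
	act = bpf_prog_run_xdp(prog, xdp);	/* may run through the dispatcher image */
	local_bh_enable();

	/* updater side: a plain RCU grace period waits for all such readers */
	synchronize_rcu();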
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:32 ` Daniel Borkmann @ 2022-12-09 23:34 ` Jakub Kicinski 2022-12-10 0:06 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2022-12-09 23:34 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > [...] One example of non-obvious pairing is the XDP feature in networking, > which calls BPF programs from network-driver NAPI (softirq) context. BPF > relies heavily on RCU protection for its data structures, but because the > BPF program invocation happens entirely within a single local_bh_disable() > section in a NAPI poll cycle, this usage is safe. The reason that this usage > is safe is that readers can use anything that disables BH when updaters use > call_rcu() or synchronize_rcu(). [...] FWIW I sent a link to the thread to Paul and he confirmed the RCU will wait for just the BH. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:34 ` Jakub Kicinski @ 2022-12-10 0:06 ` Jiri Olsa 2022-12-10 0:38 ` Paul E. McKenney 2022-12-10 1:12 ` Alexei Starovoitov 0 siblings, 2 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 0:06 UTC (permalink / raw) To: Jakub Kicinski, Paul E. McKenney Cc: Daniel Borkmann, Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > relies heavily on RCU protection for its data structures, but because the > > BPF program invocation happens entirely within a single local_bh_disable() > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > is safe is that readers can use anything that disables BH when updaters use > > call_rcu() or synchronize_rcu(). [...] > > FWIW I sent a link to the thread to Paul and he confirmed > the RCU will wait for just the BH. so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side Paul, any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog with bpf_prog_run_xdp callers? with synchronize_rcu_tasks I'm getting splats like: https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) thanks, jirka --- diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..e6126f07e85b 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu_tasks_rude(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:06 ` Jiri Olsa @ 2022-12-10 0:38 ` Paul E. McKenney 2022-12-10 13:05 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2022-12-10 0:38 UTC (permalink / raw) To: Jiri Olsa Cc: Jakub Kicinski, Daniel Borkmann, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 01:06:16AM +0100, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > relies heavily on RCU protection for its data structures, but because the > > > BPF program invocation happens entirely within a single local_bh_disable() > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > is safe is that readers can use anything that disables BH when updaters use > > > call_rcu() or synchronize_rcu(). [...] > > > > FWIW I sent a link to the thread to Paul and he confirmed > > the RCU will wait for just the BH. > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > Paul, > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > with bpf_prog_run_xdp callers? > > with synchronize_rcu_tasks I'm getting splats like: > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) It sounds like we are all talking past each other, leaving me no choice but to supply a wall of text: It is quite true that synchronize_rcu_tasks_rude() will wait for bh-disabled regions of code, just like synchronize_rcu() and synchronize_rcu_tasks() will. However, please note that synchronize_rcu_tasks() never waits on any of the idle tasks. So the usual approach in tracing is to do both a synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(). One way of overlapping the resulting pair of grace periods is to use synchronize_rcu_mult(). But none of these permit readers to sleep. That is what synchronize_rcu_tasks_trace() is for, but unlike both synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(), you must explicitly mark the readers with rcu_read_lock_trace() and rcu_read_unlock_trace(). This is used to protect sleepable BPF programs. Now, synchronize_rcu() will also wait on bh-disabled regions of code, with the exception of such code in the exception path, way deep in the idle loop, early in the CPU-online process, or late in the CPU-offline process. You can recognize the first two categories of code by the noinstr tags on the functions. And yes, synchronize_rcu_tasks_rude() is quite special. ;-) Does this help, or am I simply adding to the confusion?
Thanx, Paul > thanks, > jirka > > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..e6126f07e85b 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks_rude(); > > if (new) > d->image_off = noff; ^ permalink raw reply [flat|nested] 26+ messages in thread
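For reference, the overlapping-grace-period idiom mentioned above would look roughly like the sketch below; synchronize_rcu_mult() takes the call_rcu()-style entry points of the flavors to wait for, and this is an illustration of the tracing idiom rather than a drop-in for the dispatcher:

	#include <linux/rcupdate_wait.h>

	/* wait for both an RCU-tasks and an RCU-tasks-rude grace period,
	 * letting the two grace periods overlap instead of running them
	 * back to back
	 */
	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_rude);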
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:38 ` Paul E. McKenney @ 2022-12-10 13:05 ` Jiri Olsa 0 siblings, 0 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 13:05 UTC (permalink / raw) To: Paul E. McKenney Cc: Jiri Olsa, Jakub Kicinski, Daniel Borkmann, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 04:38:38PM -0800, Paul E. McKenney wrote: > On Sat, Dec 10, 2022 at 01:06:16AM +0100, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > relies heavily on RCU protection for its data structures, but because the > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > is safe is that readers can use anything that disables BH when updaters use > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > the RCU will wait for just the BH. > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > Paul, > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > with bpf_prog_run_xdp callers? > > > > with synchronize_rcu_tasks I'm getting splats like: > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > It sounds like we are all talking past each other, leaving me no > choice but to supply a wall of text: > > It is quite true that synchronize_rcu_tasks_rude() will wait > for bh-disabled regions of code, just like synchronize_rcu() > and synchronize_rcu_tasks() will. However, please note that > synchronize_rcu_tasks() never waits on any of the idle tasks. So the > usual approach in tracing is to do both a synchronize_rcu_tasks() and > synchronize_rcu_tasks_rude(). One way of overlapping the resulting > pair of grace periods is to use synchronize_rcu_mult(). > > But none of these permit readers to sleep. That is what > synchronize_rcu_tasks_trace() is for, but unlike both > synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(), > you must explicitly mark the readers with rcu_read_lock_trace() > and rcu_read_unlock_trace(). This is used to protect sleepable > BPF programs. > > Now, synchronize_rcu() will also wait on bh-disabled lines of code, with > the exception of such code in the exception path, way deep in the idle > loop, early in the CPU-online process, or late in the CPU-offline process. > You can recognize the first two categories of code by the noinstr tags > on the functions. > > And yes, synchronize_rcu_rude() is quite special. ;-) > > Does this help, or am I simply adding to the confusion? 
I see, so as Alexei said, to synchronize bpf_prog_run_xdp callers we should be able to use just synchronize_rcu, because it's always called in bh-disabled code thanks, jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:06 ` Jiri Olsa 2022-12-10 0:38 ` Paul E. McKenney @ 2022-12-10 1:12 ` Alexei Starovoitov 2022-12-10 13:11 ` Jiri Olsa 1 sibling, 1 reply; 26+ messages in thread From: Alexei Starovoitov @ 2022-12-10 1:12 UTC (permalink / raw) To: Jiri Olsa Cc: Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > relies heavily on RCU protection for its data structures, but because the > > > BPF program invocation happens entirely within a single local_bh_disable() > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > is safe is that readers can use anything that disables BH when updaters use > > > call_rcu() or synchronize_rcu(). [...] > > > > FWIW I sent a link to the thread to Paul and he confirmed > > the RCU will wait for just the BH. > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > Paul, > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > with bpf_prog_run_xdp callers? > > with synchronize_rcu_tasks I'm getting splats like: > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) Jiri, I haven't tried to repro this yet, but I feel you're on the wrong path here. The splat has this: ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 that test_run logic takes rcu_read_lock. See bpf_test_timer_enter. I suspect the addition of synchronize_rcu_tasks_rude only slows down the race. The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. See our new and fancy rcu_trace_implies_rcu_gp(), but I'm not sure it applies to synchronize_rcu_tasks_rude. Have you tried with just synchronize_rcu() ? If your theory about the race is correct then the vanila sync_rcu should help. If not, the issue is some place else. ^ permalink raw reply [flat|nested] 26+ messages in thread
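The test_run path Alexei refers to wraps the whole benchmark loop in an RCU read-side section; from net/bpf/test_run.c of that era it looks roughly like the sketch below (reproduced from memory, so treat the details as approximate):

	static void bpf_test_timer_enter(struct bpf_test_timer *t)
		__acquires(rcu)
	{
		rcu_read_lock();	/* held across every bpf_prog_run_xdp() iteration */
		if (t->mode == NO_PREEMPT)
			preempt_disable();
		else
			migrate_disable();

		t->i = 0;
		t->time_start = ktime_get_ns();
	}

That rcu_read_lock() is what makes a plain synchronize_rcu() in the updater sufficient for the reproducer's code path.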
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 1:12 ` Alexei Starovoitov @ 2022-12-10 13:11 ` Jiri Olsa 2022-12-12 15:04 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 13:11 UTC (permalink / raw) To: Alexei Starovoitov Cc: Jiri Olsa, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: > On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > relies heavily on RCU protection for its data structures, but because the > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > is safe is that readers can use anything that disables BH when updaters use > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > the RCU will wait for just the BH. > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > Paul, > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > with bpf_prog_run_xdp callers? > > > > with synchronize_rcu_tasks I'm getting splats like: > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > Jiri, > > I haven't tried to repro this yet, but I feel you're on > the wrong path here. The splat has this: > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > that test_run logic takes rcu_read_lock. > See bpf_test_timer_enter. > I suspect the addition of synchronize_rcu_tasks_rude > only slows down the race. > The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. > See our new and fancy rcu_trace_implies_rcu_gp(), > but I'm not sure it applies to synchronize_rcu_tasks_rude. > Have you tried with just synchronize_rcu() ? > If your theory about the race is correct then > the vanila sync_rcu should help. > If not, the issue is some place else. synchronize_rcu seems to work as well, I'll keep the test running for some time thanks, jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 13:11 ` Jiri Olsa @ 2022-12-12 15:04 ` Jiri Olsa 2022-12-13 2:26 ` Hao Sun 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-12 15:04 UTC (permalink / raw) To: Jiri Olsa, Hao Sun Cc: Alexei Starovoitov, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 02:11:34PM +0100, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: > > On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > > relies heavily on RCU protection for its data structures, but because the > > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > > is safe is that readers can use anything that disables BH when updaters use > > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > > the RCU will wait for just the BH. > > > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > > > Paul, > > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > > with bpf_prog_run_xdp callers? > > > > > > with synchronize_rcu_tasks I'm getting splats like: > > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > > > Jiri, > > > > I haven't tried to repro this yet, but I feel you're on > > the wrong path here. The splat has this: > > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > > that test_run logic takes rcu_read_lock. > > See bpf_test_timer_enter. > > I suspect the addition of synchronize_rcu_tasks_rude > > only slows down the race. > > The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. > > See our new and fancy rcu_trace_implies_rcu_gp(), > > but I'm not sure it applies to synchronize_rcu_tasks_rude. > > Have you tried with just synchronize_rcu() ? > > If your theory about the race is correct then > > the vanila sync_rcu should help. > > If not, the issue is some place else. > > synchronize_rcu seems to work as well, I'll keep the test > running for some time looks good, Hao Sun, could you please test change below? 
thanks, jirka --- diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..4b0fa5b98137 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
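Spelled out, the ordering this one-line change enforces (an informal annotation of the patch above, not additional code):

	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
			/* 1: the static call now routes into the new half */

	synchronize_rcu();
			/* 2: wait out every BH-disabled (or rcu_read_lock-holding)
			 *    caller that may still be executing in the old half */

	if (new)
		d->image_off = noff;
			/* 3: only now may a subsequent update pick the old half
			 *    as unused and rewrite it */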
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-12 15:04 ` Jiri Olsa @ 2022-12-13 2:26 ` Hao Sun 0 siblings, 0 replies; 26+ messages in thread From: Hao Sun @ 2022-12-13 2:26 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis > On 12 Dec 2022, at 11:04 PM, Jiri Olsa <olsajiri@gmail.com> wrote: > > On Sat, Dec 10, 2022 at 02:11:34PM +0100, Jiri Olsa wrote: >> On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: >>> On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: >>>> >>>> On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: >>>>> On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: >>>>>> fwiw, these should not be necessary, Documentation/RCU/checklist.rst : >>>>>> >>>>>> [...] One example of non-obvious pairing is the XDP feature in networking, >>>>>> which calls BPF programs from network-driver NAPI (softirq) context. BPF >>>>>> relies heavily on RCU protection for its data structures, but because the >>>>>> BPF program invocation happens entirely within a single local_bh_disable() >>>>>> section in a NAPI poll cycle, this usage is safe. The reason that this usage >>>>>> is safe is that readers can use anything that disables BH when updaters use >>>>>> call_rcu() or synchronize_rcu(). [...] >>>>> >>>>> FWIW I sent a link to the thread to Paul and he confirmed >>>>> the RCU will wait for just the BH. >>>> >>>> so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side >>>> >>>> Paul, >>>> any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog >>>> with bpf_prog_run_xdp callers? >>>> >>>> with synchronize_rcu_tasks I'm getting splats like: >>>> https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f >>>> >>>> synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) >>> >>> Jiri, >>> >>> I haven't tried to repro this yet, but I feel you're on >>> the wrong path here. The splat has this: >>> ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] >>> ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 >>> that test_run logic takes rcu_read_lock. >>> See bpf_test_timer_enter. >>> I suspect the addition of synchronize_rcu_tasks_rude >>> only slows down the race. >>> The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. >>> See our new and fancy rcu_trace_implies_rcu_gp(), >>> but I'm not sure it applies to synchronize_rcu_tasks_rude. >>> Have you tried with just synchronize_rcu() ? >>> If your theory about the race is correct then >>> the vanila sync_rcu should help. >>> If not, the issue is some place else. >> >> synchronize_rcu seems to work as well, I'll keep the test >> running for some time > > looks good, Hao Sun, could you please test change below? Hi, Tested on a latest bpf-next build. The reproducer would trigger the Oops in 5 mins without the patch. After applying the patch, the reproducer cannot trigger any issue for more than 15 mins. 
Seems working, tested on: HEAD commit: ef3911a3e4d6 docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE git tree: bpf-next kernel config: https://pastebin.com/raw/rZdWLcgK C reproducer: https://pastebin.com/raw/GFfDn2Gk > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..4b0fa5b98137 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu(); > > if (new) > d->image_off = noff; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot 2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun 2022-12-06 6:46 ` Hao Sun @ 2022-12-08 8:44 ` Thorsten Leemhuis 2022-12-19 9:59 ` Thorsten Leemhuis 1 sibling, 1 reply; 26+ messages in thread From: Thorsten Leemhuis @ 2022-12-08 8:44 UTC (permalink / raw) To: bpf, regressions@lists.linux.dev; +Cc: Linux Kernel Mailing List, netdev [Note: this mail contains only information for Linux kernel regression tracking. Mails like these contain '#forregzbot' in the subject to make them easy to spot and filter out. The author also tried to remove most or all individuals from the list of recipients to spare them the hassle.] On 06.12.22 04:28, Hao Sun wrote: > > The following crash can be triggered with the BPF prog provided. > It seems the verifier passed some invalid progs. I will try to simplify > the C reproducer, for now, the following can reproduce this: Thanks for the report. To be sure the below issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced c86df29d11df #regzbot title net/bpf: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot 2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis @ 2022-12-19 9:59 ` Thorsten Leemhuis 0 siblings, 0 replies; 26+ messages in thread From: Thorsten Leemhuis @ 2022-12-19 9:59 UTC (permalink / raw) To: bpf, regressions@lists.linux.dev; +Cc: Linux Kernel Mailing List, netdev On 08.12.22 09:44, Thorsten Leemhuis wrote: > [Note: this mail contains only information for Linux kernel regression > tracking. Mails like these contain '#forregzbot' in the subject to make > then easy to spot and filter out. The author also tried to remove most > or all individuals from the list of recipients to spare them the hassle.] > > On 06.12.22 04:28, Hao Sun wrote: >> >> The following crash can be triggered with the BPF prog provided. >> It seems the verifier passed some invalid progs. I will try to simplify >> the C reproducer, for now, the following can reproduce this: > > Thanks for the report. To be sure below issue doesn't fall through the > cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression > tracking bot: > > #regzbot ^introduced c86df29d11df > #regzbot title net/bpf: BUG: unable to handle kernel paging request in > bpf_dispatcher_xdp > #regzbot ignore-activity #regzbot fix: bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func ^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun
2022-12-06 6:46 ` Hao Sun
2022-12-06 15:18 ` Jiri Olsa
2022-12-07 19:57 ` Alexei Starovoitov
2022-12-08 17:48 ` Alexei Starovoitov
2022-12-08 18:06 ` Jiri Olsa
[not found] ` <Y5JkomOZaCETLDaZ@krava>
2022-12-08 23:02 ` Jiri Olsa
2022-12-09 7:09 ` Jiri Olsa
[not found] ` <Y5MaffJOe1QtumSN@krava>
2022-12-09 13:50 ` Jiri Olsa
2022-12-09 15:20 ` Jiri Olsa
2022-12-09 20:31 ` Yonghong Song
2022-12-09 21:53 ` Jiri Olsa
2022-12-09 22:41 ` Daniel Borkmann
2022-12-09 23:07 ` Jiri Olsa
2022-12-09 23:29 ` Jiri Olsa
2022-12-09 23:32 ` Daniel Borkmann
2022-12-09 23:34 ` Jakub Kicinski
2022-12-10 0:06 ` Jiri Olsa
2022-12-10 0:38 ` Paul E. McKenney
2022-12-10 13:05 ` Jiri Olsa
2022-12-10 1:12 ` Alexei Starovoitov
2022-12-10 13:11 ` Jiri Olsa
2022-12-12 15:04 ` Jiri Olsa
2022-12-13 2:26 ` Hao Sun
2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis
2022-12-19 9:59 ` Thorsten Leemhuis