* BUG: unable to handle kernel paging request in bpf_dispatcher_xdp @ 2022-12-06 3:28 Hao Sun 2022-12-06 6:46 ` Hao Sun 2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis 0 siblings, 2 replies; 26+ messages in thread From: Hao Sun @ 2022-12-06 3:28 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev Hi, The following crash can be triggered with the BPF prog provided. It seems the verifier passed some invalid progs. I will try to simplify the C reproducer, for now, the following can reproduce this: HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym git tree: bpf-next console log: https://pastebin.com/raw/87RCSnCs kernel config: https://pastebin.com/raw/rZdWLcgK Syz reproducer: https://pastebin.com/raw/4kbwhdEv C reproducer: https://pastebin.com/raw/GFfDn2Gk wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready BUG: unable to handle page fault for address: 000000000fe0840f #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7536 Comm: a.out Not tainted 6.1.0-rc7-01489-gab0350c743d5-dirty #118 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.1-1-1 04/01/2014 RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b RSP: 0018:ffffc900029df908 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __bpf_prog_run include/linux/filter.h:600 [inline] ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 ? bpf_dispatcher_xdp+0x800/0x1000 ? bpf_dispatcher_xdp+0x800/0x1000 ? bpf_dispatcher_xdp+0x800/0x1000 ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 ? rcu_lock_release include/linux/rcupdate.h:321 [inline] ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] ? __fget_files+0x283/0x3e0 fs/file.c:914 ? fput+0x30/0x1a0 fs/file_table.c:371 ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 ? 
bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 ? instrument_atomic_read include/linux/instrumented.h:72 [inline] ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] ? __se_sys_bpf kernel/bpf/syscall.c:5081 [inline] ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 ? entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) CR2: 000000000fe0840f ---[ end trace 0000000000000000 ]--- RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b RSP: 0018:ffffc900029df908 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
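Decoding the oops (a reconstruction from the Code: bytes above, worth re-checking with a disassembler): bpf_dispatcher_xdp is not a regular function but a page-sized JIT image, and on x86 arch_prepare_bpf_dispatcher() emits a binary search over the attached programs' bpf_func addresses, which the caller passes in %rdx. The bytes around RIP decode as that branch tree:

  48 81 fa e8 55 00 a0   cmp    $0xffffffffa00055e8,%rdx
  0f 8f 63 00 00 00      jg     .+0x63
  48 81 fa d8 54 00 a0   cmp    $0xffffffffa00054d8,%rdx
  7f 2a                  jg     .+0x2a
  ...
  ff e2                  jmp    *%rdx    /* tail-jump into the prog */

Note that RDX in the register dump (ffffffffa000534c) matches one of the compared immediates, i.e. a prog's bpf_func. The faulting byte marked <00> sits in the middle of a cmp, so RIP is misaligned with the surrounding instruction stream, and the bytes actually decoded at RIP (00 a0 0f 84 e0 0f = add %ah,0xfe0840f(%rax), with RAX = 0) perform a write to 0xfe0840f, which is exactly CR2. That is the signature of the CPU executing dispatcher image bytes that were rewritten underneath it, which is what the thread below converges on.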
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun
@ 2022-12-06 6:46 ` Hao Sun
  2022-12-06 15:18   ` Jiri Olsa
  2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 26+ messages in thread
From: Hao Sun @ 2022-12-06 6:46 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev

Hao Sun <sunhao.th@gmail.com> wrote on Tue, Dec 6, 2022 at 11:28:
>
> Hi,
>
> The following crash can be triggered with the BPF prog provided.
> It seems the verifier passed some invalid progs. I will try to simplify
> the C reproducer, for now, the following can reproduce this:
>
> HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in
> functions in bpf_iter_ksym
> git tree: bpf-next
> console log: https://pastebin.com/raw/87RCSnCs
> kernel config: https://pastebin.com/raw/rZdWLcgK
> Syz reproducer: https://pastebin.com/raw/4kbwhdEv
> C reproducer: https://pastebin.com/raw/GFfDn2Gk
>

Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW

Only two syscalls are required to reproduce this; it seems to be an
issue in the XDP test run. Essentially, the reproducer just loads a
very simple prog and test-runs it repeatedly and concurrently:

r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb,
&(0x7f0000000500)}, 0x80)
bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0,
0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48)

Loaded prog:
0: (18) r0 = 0x0
2: (18) r6 = 0x0
4: (18) r7 = 0x0
6: (18) r8 = 0x0
8: (18) r9 = 0x0
10: (95) exit

> wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > BUG: unable to handle page fault for address: 000000000fe0840f > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 > Oops: 0002 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 7536 Comm: a.out Not tainted > 6.1.0-rc7-01489-gab0350c743d5-dirty #118 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux > 1.16.1-1-1 04/01/2014 > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __bpf_prog_run include/linux/filter.h:600 [inline] > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > ?
bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? bpf_dispatcher_xdp+0x800/0x1000 > ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 > ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 > ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > ? rcu_lock_release include/linux/rcupdate.h:321 [inline] > ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] > ? __fget_files+0x283/0x3e0 fs/file.c:914 > ? fput+0x30/0x1a0 fs/file_table.c:371 > ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] > ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > ? bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] > ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 > ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 > ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 > ? instrument_atomic_read include/linux/instrumented.h:72 [inline] > ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] > ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] > ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] > ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 > ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 > ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 > ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 > ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] > ? __se_sys_bpf kernel/bpf/syscall.c:5081 [inline] > ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 > ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 > ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] > ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > ? entry_SYSCALL_64_after_hwframe+0x63/0xcd > </TASK> > Modules linked in: > Dumping ftrace buffer: > (ftrace buffer empty) > CR2: 000000000fe0840f > ---[ end trace 0000000000000000 ]--- > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
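For reference, what the simplified reproducer boils down to, as a self-contained sketch (a hedged reconstruction: the process count, the packet buffer, and the use of fork() are illustrative and error handling is trimmed; the original program used the raw syz values quoted above, e.g. data_size_in = 0). The XDP test-run path attaches the prog to the XDP dispatcher for the duration of each BPF_PROG_TEST_RUN and detaches it afterwards, so running this concurrently drives bpf_dispatcher_update in a tight loop:

#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* ld_imm64 rN, 0 is opcode 0x18 and occupies two insn slots, hence
 * the 11-insn prog in the listing above */
#define LD_IMM64_ZERO(reg) \
	{ .code = BPF_LD | BPF_DW | BPF_IMM, .dst_reg = (reg) }, { .code = 0 }

int main(void)
{
	struct bpf_insn insns[] = {
		LD_IMM64_ZERO(0), LD_IMM64_ZERO(6), LD_IMM64_ZERO(7),
		LD_IMM64_ZERO(8), LD_IMM64_ZERO(9),
		{ .code = BPF_JMP | BPF_EXIT },		/* exit */
	};
	char data[64] = {0};				/* dummy ethernet frame */
	union bpf_attr attr;
	int i, prog_fd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_XDP;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.insns = (uintptr_t)insns;
	attr.license = (uintptr_t)"GPL";
	prog_fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
	if (prog_fd < 0)
		return 1;

	/* several processes hammer BPF_PROG_TEST_RUN on the same prog;
	 * each run attaches/detaches it on the XDP dispatcher */
	for (i = 0; i < 4; i++) {
		if (fork() == 0) {
			for (;;) {
				memset(&attr, 0, sizeof(attr));
				attr.test.prog_fd = prog_fd;
				attr.test.data_in = (uintptr_t)data;
				attr.test.data_size_in = sizeof(data);
				attr.test.repeat = 0xffffffff;
				sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
			}
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}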
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-06 6:46 ` Hao Sun @ 2022-12-06 15:18 ` Jiri Olsa 2022-12-07 19:57 ` Alexei Starovoitov 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-06 15:18 UTC (permalink / raw) To: Hao Sun, Peter Zijlstra Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, hawk, Linux Kernel Mailing List, netdev On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > Hi, > > > > The following crash can be triggered with the BPF prog provided. > > It seems the verifier passed some invalid progs. I will try to simplify > > the C reproducer, for now, the following can reproduce this: > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > functions in bpf_iter_ksym > > git tree: bpf-next > > console log: https://pastebin.com/raw/87RCSnCs > > kernel config: https://pastebin.com/raw/rZdWLcgK > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > Only two syscalls are required to reproduce this, seems it's an issue > in XDP test run. Essentially, the reproducer just loads a very simple > prog and tests run repeatedly and concurrently: > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > &(0x7f0000000500)}, 0x80) > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > Loaded prog: > 0: (18) r0 = 0x0 > 2: (18) r6 = 0x0 > 4: (18) r7 = 0x0 > 6: (18) r8 = 0x0 > 8: (18) r9 = 0x0 > 10: (95) exit hi, I can reproduce with your config.. it seems related to the recent static call change: c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) I can't reproduce when I revert that commit.. Peter, any idea? 
thanks, jirka > > > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > > wlan1: Created IBSS using preconfigured BSSID 50:50:50:50:50:50 > > wlan1: Creating new IBSS network, BSSID 50:50:50:50:50:50 > > IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready > > BUG: unable to handle page fault for address: 000000000fe0840f > > #PF: supervisor write access in kernel mode > > #PF: error_code(0x0002) - not-present page > > PGD 2ebe3067 P4D 2ebe3067 PUD 1dd9b067 PMD 0 > > Oops: 0002 [#1] PREEMPT SMP KASAN > > CPU: 0 PID: 7536 Comm: a.out Not tainted > > 6.1.0-rc7-01489-gab0350c743d5-dirty #118 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux > > 1.16.1-1-1 04/01/2014 > > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > > PKRU: 55555554 > > Call Trace: > > <TASK> > > ? __bpf_prog_run include/linux/filter.h:600 [inline] > > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > > ? bpf_test_timer_continue+0x3d0/0x3d0 net/bpf/test_run.c:79 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? bpf_dispatcher_xdp+0x800/0x1000 > > ? _copy_from_user+0x5f/0x180 lib/usercopy.c:21 > > ? bpf_test_init.isra.0+0x111/0x150 net/bpf/test_run.c:772 > > ? bpf_prog_test_run_xdp+0xbde/0x1400 net/bpf/test_run.c:1389 > > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > > ? rcu_lock_release include/linux/rcupdate.h:321 [inline] > > ? rcu_read_unlock include/linux/rcupdate.h:783 [inline] > > ? __fget_files+0x283/0x3e0 fs/file.c:914 > > ? fput+0x30/0x1a0 fs/file_table.c:371 > > ? ____bpf_prog_get kernel/bpf/syscall.c:2206 [inline] > > ? __bpf_prog_get+0x9a/0x2e0 kernel/bpf/syscall.c:2270 > > ? bpf_prog_test_run_skb+0x1dd0/0x1dd0 include/linux/skbuff.h:2594 > > ? bpf_prog_test_run kernel/bpf/syscall.c:3644 [inline] > > ? __sys_bpf+0x1293/0x5840 kernel/bpf/syscall.c:4997 > > ? futex_wait_setup+0x230/0x230 kernel/futex/waitwake.c:625 > > ? bpf_perf_link_attach+0x520/0x520 kernel/bpf/syscall.c:2720 > > ? instrument_atomic_read include/linux/instrumented.h:72 [inline] > > ? atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] > > ? queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] > > ? debug_spin_unlock kernel/locking/spinlock_debug.c:100 [inline] > > ? do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:140 > > ? futex_wake+0x15b/0x4a0 kernel/futex/waitwake.c:161 > > ? do_futex+0x130/0x350 kernel/futex/syscalls.c:122 > > ? __ia32_sys_get_robust_list+0x3b0/0x3b0 kernel/futex/syscalls.c:72 > > ? __do_sys_bpf kernel/bpf/syscall.c:5083 [inline] > > ? 
__se_sys_bpf kernel/bpf/syscall.c:5081 [inline] > > ? __x64_sys_bpf+0x78/0xc0 kernel/bpf/syscall.c:5081 > > ? syscall_enter_from_user_mode+0x26/0xb0 kernel/entry/common.c:111 > > ? do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > ? do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 > > ? entry_SYSCALL_64_after_hwframe+0x63/0xcd > > </TASK> > > Modules linked in: > > Dumping ftrace buffer: > > (ftrace buffer empty) > > CR2: 000000000fe0840f > > ---[ end trace 0000000000000000 ]--- > > RIP: 0010:bpf_dispatcher_xdp+0x24/0x1000 > > Code: cc cc cc cc cc cc 48 81 fa e8 55 00 a0 0f 8f 63 00 00 00 48 81 > > fa d8 54 00 a0 7f 2a 48 81 fa 4c 53 00 a0 7f 11 48 81 fa 4c 53 <00> a0 > > 0f 84 e0 0f 00 00 ff e2 66 90 48 81 fa d8 54 00 a0 0f 84 5b > > RSP: 0018:ffffc900029df908 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffffc900028b9000 RCX: 0000000000000000 > > RDX: ffffffffa000534c RSI: ffffc900028b9048 RDI: ffffc900029dfb70 > > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000 > > R13: 0000000000000001 R14: ffffc900028b9030 R15: ffffc900029dfb50 > > FS: 00007ff249efc700(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000000fe0840f CR3: 000000002e0ba000 CR4: 0000000000750ef0 > > PKRU: 55555554 ^ permalink raw reply [flat|nested] 26+ messages in thread
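For context on the commit named above, a paraphrased sketch of what c86df29d11df changed (reconstructed from memory of kernel/bpf/dispatcher.c and include/linux/bpf.h; names simplified, not verbatim kernel code). Before the commit, the dispatcher entry point had a patchable function entry that was text_poke'd into a jump to the image; after it, callers reach the image through a static_call:

/* paraphrased, simplified -- not verbatim kernel code */
DEFINE_STATIC_CALL(bpf_dispatcher_xdp_call, bpf_dispatcher_nop_func);

static __always_inline unsigned int bpf_dispatcher_xdp_func(
		const void *ctx, const struct bpf_insn *insnsi,
		unsigned int (*bpf_func)(const void *,
					 const struct bpf_insn *))
{
	return static_call(bpf_dispatcher_xdp_call)(ctx, insnsi, bpf_func);
}

/* bpf_dispatcher_update() then retargets all callers with: */
static_call_update(bpf_dispatcher_xdp_call, new_image_half);

As later messages in the thread show, the crash turned out not to be specific to this commit: the same window exists with the older entry-patching scheme.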
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-06 15:18 ` Jiri Olsa
@ 2022-12-07 19:57   ` Alexei Starovoitov
  2022-12-08 17:48     ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 19:57 UTC (permalink / raw)
To: Jiri Olsa
Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev

On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote:
> > Hao Sun <sunhao.th@gmail.com> wrote on Tue, Dec 6, 2022 at 11:28:
> > >
> > > Hi,
> > >
> > > The following crash can be triggered with the BPF prog provided.
> > > It seems the verifier passed some invalid progs. I will try to simplify
> > > the C reproducer, for now, the following can reproduce this:
> > >
> > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in
> > > functions in bpf_iter_ksym
> > > git tree: bpf-next
> > > console log: https://pastebin.com/raw/87RCSnCs
> > > kernel config: https://pastebin.com/raw/rZdWLcgK
> > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv
> > > C reproducer: https://pastebin.com/raw/GFfDn2Gk
> > >
> >
> > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW
> >
> > Only two syscalls are required to reproduce this, seems it's an issue
> > in XDP test run. Essentially, the reproducer just loads a very simple
> > prog and tests run repeatedly and concurrently:
> >
> > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb,
> > &(0x7f0000000500)}, 0x80)
> > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48)
> >
> > Loaded prog:
> > 0: (18) r0 = 0x0
> > 2: (18) r6 = 0x0
> > 4: (18) r7 = 0x0
> > 6: (18) r8 = 0x0
> > 8: (18) r9 = 0x0
> > 10: (95) exit
>
> hi,
> I can reproduce with your config.. it seems related to the
> recent static call change:
> c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
>
> I can't reproduce when I revert that commit.. Peter, any idea?

Jiri,

I see your tested-by tag on Peter's commit c86df29d11df.
I assume you've actually tested it, but this syzbot oops shows that
even an empty bpf prog crashes, so there is something wrong with that
commit.

What is the difference between this new kconfig and the old one that
you've tested?

I'm trying to understand the severity of the issue and whether we need
to revert that commit asap since the merge window is about to start.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-07 19:57 ` Alexei Starovoitov @ 2022-12-08 17:48 ` Alexei Starovoitov 2022-12-08 18:06 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Alexei Starovoitov @ 2022-12-08 17:48 UTC (permalink / raw) To: Jiri Olsa Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > Hi, > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > functions in bpf_iter_ksym > > > > git tree: bpf-next > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > prog and tests run repeatedly and concurrently: > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > &(0x7f0000000500)}, 0x80) > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > Loaded prog: > > > 0: (18) r0 = 0x0 > > > 2: (18) r6 = 0x0 > > > 4: (18) r7 = 0x0 > > > 6: (18) r8 = 0x0 > > > 8: (18) r9 = 0x0 > > > 10: (95) exit > > > > hi, > > I can reproduce with your config.. it seems related to the > > recent static call change: > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > Jiri, > > I see your tested-by tag on Peter's commit c86df29d11df. > I assume you're actually tested it, but > this syzbot oops shows that even empty bpf prog crashes, > so there is something wrong with that commit. > > What is the difference between this new kconfig and old one that > you've tested? > > I'm trying to understand the severity of the issues and > whether we need to revert that commit asap since the merge window > is about to start. Jiri, Peter, ping. cc-ing Thorsten, since he's tracking it now. The config has CONFIG_X86_KERNEL_IBT=y. Is it related? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-08 17:48 ` Alexei Starovoitov @ 2022-12-08 18:06 ` Jiri Olsa [not found] ` <Y5JkomOZaCETLDaZ@krava> 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-08 18:06 UTC (permalink / raw) To: Alexei Starovoitov Cc: Jiri Olsa, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > Hi, > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > functions in bpf_iter_ksym > > > > > git tree: bpf-next > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > prog and tests run repeatedly and concurrently: > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > &(0x7f0000000500)}, 0x80) > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > Loaded prog: > > > > 0: (18) r0 = 0x0 > > > > 2: (18) r6 = 0x0 > > > > 4: (18) r7 = 0x0 > > > > 6: (18) r8 = 0x0 > > > > 8: (18) r9 = 0x0 > > > > 10: (95) exit > > > > > > hi, > > > I can reproduce with your config.. it seems related to the > > > recent static call change: > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > Jiri, > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > I assume you're actually tested it, but > > this syzbot oops shows that even empty bpf prog crashes, > > so there is something wrong with that commit. > > > > What is the difference between this new kconfig and old one that > > you've tested? > > > > I'm trying to understand the severity of the issues and > > whether we need to revert that commit asap since the merge window > > is about to start. > > Jiri, Peter, > > ping. > > cc-ing Thorsten, since he's tracking it now. > > The config has CONFIG_X86_KERNEL_IBT=y. > Is it related? sorry for late reply.. I still did not find the reason, but I did not try with IBT yet, will test now jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp [not found] ` <Y5JkomOZaCETLDaZ@krava> @ 2022-12-08 23:02 ` Jiri Olsa 2022-12-09 7:09 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-08 23:02 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Thu, Dec 08, 2022 at 11:26:45PM +0100, Jiri Olsa wrote: > On Thu, Dec 08, 2022 at 07:06:59PM +0100, Jiri Olsa wrote: > > On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > > > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > > > functions in bpf_iter_ksym > > > > > > > git tree: bpf-next > > > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > > > prog and tests run repeatedly and concurrently: > > > > > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > > > &(0x7f0000000500)}, 0x80) > > > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > > > > > Loaded prog: > > > > > > 0: (18) r0 = 0x0 > > > > > > 2: (18) r6 = 0x0 > > > > > > 4: (18) r7 = 0x0 > > > > > > 6: (18) r8 = 0x0 > > > > > > 8: (18) r9 = 0x0 > > > > > > 10: (95) exit > > > > > > > > > > hi, > > > > > I can reproduce with your config.. it seems related to the > > > > > recent static call change: > > > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > > > > > Jiri, > > > > > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > > > I assume you're actually tested it, but > > > > this syzbot oops shows that even empty bpf prog crashes, > > > > so there is something wrong with that commit. > > > > > > > > What is the difference between this new kconfig and old one that > > > > you've tested? > > I attached the diff, 'config-issue' is the one that reproduces the issue > > > > > > > > > I'm trying to understand the severity of the issues and > > > > whether we need to revert that commit asap since the merge window > > > > is about to start. 
> > > > > > Jiri, Peter, > > > > > > ping. > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > Is it related? > > > > sorry for late reply.. I still did not find the reason, > > but I did not try with IBT yet, will test now > > no difference with IBT enabled, can't reproduce the issue > ok, scratch that.. the reproducer got stuck on wifi init :-\ after I fix that I can now reproduce on my local config with IBT enabled or disabled.. it's something else jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-08 23:02 ` Jiri Olsa @ 2022-12-09 7:09 ` Jiri Olsa [not found] ` <Y5MaffJOe1QtumSN@krava> 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 7:09 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 12:02:24AM +0100, Jiri Olsa wrote: > On Thu, Dec 08, 2022 at 11:26:45PM +0100, Jiri Olsa wrote: > > On Thu, Dec 08, 2022 at 07:06:59PM +0100, Jiri Olsa wrote: > > > On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote: > > > > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > > > > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote: > > > > > > > Hao Sun <sunhao.th@gmail.com> 于2022年12月6日周二 11:28写道: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > The following crash can be triggered with the BPF prog provided. > > > > > > > > It seems the verifier passed some invalid progs. I will try to simplify > > > > > > > > the C reproducer, for now, the following can reproduce this: > > > > > > > > > > > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in > > > > > > > > functions in bpf_iter_ksym > > > > > > > > git tree: bpf-next > > > > > > > > console log: https://pastebin.com/raw/87RCSnCs > > > > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK > > > > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv > > > > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk > > > > > > > > > > > > > > > > > > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW > > > > > > > > > > > > > > Only two syscalls are required to reproduce this, seems it's an issue > > > > > > > in XDP test run. Essentially, the reproducer just loads a very simple > > > > > > > prog and tests run repeatedly and concurrently: > > > > > > > > > > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb, > > > > > > > &(0x7f0000000500)}, 0x80) > > > > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0, > > > > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48) > > > > > > > > > > > > > > Loaded prog: > > > > > > > 0: (18) r0 = 0x0 > > > > > > > 2: (18) r6 = 0x0 > > > > > > > 4: (18) r7 = 0x0 > > > > > > > 6: (18) r8 = 0x0 > > > > > > > 8: (18) r9 = 0x0 > > > > > > > 10: (95) exit > > > > > > > > > > > > hi, > > > > > > I can reproduce with your config.. it seems related to the > > > > > > recent static call change: > > > > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) > > > > > > > > > > > > I can't reproduce when I revert that commit.. Peter, any idea? > > > > > > > > > > Jiri, > > > > > > > > > > I see your tested-by tag on Peter's commit c86df29d11df. > > > > > I assume you're actually tested it, but > > > > > this syzbot oops shows that even empty bpf prog crashes, > > > > > so there is something wrong with that commit. > > > > > > > > > > What is the difference between this new kconfig and old one that > > > > > you've tested? 
> > > > I attached the diff, 'config-issue' is the one that reproduces the issue > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > whether we need to revert that commit asap since the merge window > > > > > is about to start. > > > > > > > > Jiri, Peter, > > > > > > > > ping. > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > Is it related? > > > > > > sorry for late reply.. I still did not find the reason, > > > but I did not try with IBT yet, will test now > > > > no difference with IBT enabled, can't reproduce the issue > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > after I fix that I can now reproduce on my local config with > IBT enabled or disabled.. it's something else I'm getting the error also when reverting the static call change, looking for good commit, bisecting I'm getting fail with: f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 v6.1-rc1 is ok jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  [not found] ` <Y5MaffJOe1QtumSN@krava>
@ 2022-12-09 13:50 ` Jiri Olsa
  2022-12-09 15:20   ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2022-12-09 13:50 UTC (permalink / raw)
To: Alexei Starovoitov, Song Liu
Cc: Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis

On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote:

SNIP

> > > > > > > I'm trying to understand the severity of the issues and
> > > > > > > whether we need to revert that commit asap since the merge window
> > > > > > > is about to start.
> > > > > >
> > > > > > Jiri, Peter,
> > > > > >
> > > > > > ping.
> > > > > >
> > > > > > cc-ing Thorsten, since he's tracking it now.
> > > > > >
> > > > > > The config has CONFIG_X86_KERNEL_IBT=y.
> > > > > > Is it related?
> > > > >
> > > > > sorry for late reply.. I still did not find the reason,
> > > > > but I did not try with IBT yet, will test now
> > > >
> > > > no difference with IBT enabled, can't reproduce the issue
> > > >
> > >
> > > ok, scratch that.. the reproducer got stuck on wifi init :-\
> > >
> > > after I fix that I can now reproduce on my local config with
> > > IBT enabled or disabled.. it's something else
> >
> > I'm getting the error also when reverting the static call change,
> > looking for good commit, bisecting
> >
> > I'm getting fail with:
> > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4
> >
> > v6.1-rc1 is ok
>
> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far
>
> attaching some more logs

looking at the code.. how do we ensure that code running through
bpf_prog_run_xdp will not get the dispatcher image changed while
it's being executed?

we use 'the other half' of the image when we add/remove programs,
but could bpf_dispatcher_update race with bpf_prog_run_xdp like:


cpu 0:                                 cpu 1:

bpf_prog_run_xdp
  ...
  bpf_dispatcher_xdp_func
    start exec image at offset 0x0

                                       bpf_dispatcher_update
                                         update image at offset 0x800
                                       bpf_dispatcher_update
                                         update image at offset 0x0

    still in image at offset 0x0


that might explain why I wasn't able to trigger that on
bare metal, just in qemu

jirka

^ permalink raw reply	[flat|nested] 26+ messages in thread
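To make the diagram concrete, a paraphrased sketch of the update side in kernel/bpf/dispatcher.c (simplified from memory -- for instance, the separate RW mapping the image is JITed into before being copied over is omitted):

/* simplified paraphrase of kernel/bpf/dispatcher.c, not verbatim */
static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
{
	void *new = NULL;
	u32 noff = 0;

	if (d->num_progs) {
		noff = d->image_off ^ (PAGE_SIZE / 2);	/* flip to unused half */
		new = d->image + noff;
		if (bpf_dispatcher_prepare(d, new))	/* JIT the branch tree */
			return;
	}

	/* switch callers over: static_call_update() after c86df29d11df */
	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);

	if (new)
		d->image_off = noff;

	/* nothing here waits for CPUs still executing the previous half,
	 * so two quick back-to-back updates rewrite the half a straggler
	 * may still be running -- the race in the diagram above */
}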
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
  2022-12-09 13:50 ` Jiri Olsa
@ 2022-12-09 15:20 ` Jiri Olsa
  2022-12-09 20:31   ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2022-12-09 15:20 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis

On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote:
> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote:
>
> SNIP
>
> > > > > > > > > I'm trying to understand the severity of the issues and
> > > > > > > > > whether we need to revert that commit asap since the merge window
> > > > > > > > > is about to start.
> > > > > > > >
> > > > > > > > Jiri, Peter,
> > > > > > > >
> > > > > > > > ping.
> > > > > > > >
> > > > > > > > cc-ing Thorsten, since he's tracking it now.
> > > > > > > >
> > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y.
> > > > > > > > Is it related?
> > > > > > >
> > > > > > > sorry for late reply.. I still did not find the reason,
> > > > > > > but I did not try with IBT yet, will test now
> > > > > >
> > > > > > no difference with IBT enabled, can't reproduce the issue
> > > > > >
> > > > >
> > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\
> > > > >
> > > > > after I fix that I can now reproduce on my local config with
> > > > > IBT enabled or disabled.. it's something else
> > > >
> > > > I'm getting the error also when reverting the static call change,
> > > > looking for good commit, bisecting
> > > >
> > > > I'm getting fail with:
> > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4
> > > >
> > > > v6.1-rc1 is ok
> > >
> > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far
> > >
> > > attaching some more logs
> >
> > looking at the code.. how do we ensure that code running through
> > bpf_prog_run_xdp will not get the dispatcher image changed while
> > it's being executed?
> >
> > we use 'the other half' of the image when we add/remove programs,
> > but could bpf_dispatcher_update race with bpf_prog_run_xdp like:
> >
> >
> > cpu 0:                                 cpu 1:
> >
> > bpf_prog_run_xdp
> >   ...
> >   bpf_dispatcher_xdp_func
> >     start exec image at offset 0x0
> >
> >                                        bpf_dispatcher_update
> >                                          update image at offset 0x800
> >                                        bpf_dispatcher_update
> >                                          update image at offset 0x0
> >
> >     still in image at offset 0x0
> >
> >
> > that might explain why I wasn't able to trigger that on
> > bare metal, just in qemu

I tried patch below and it fixes the issue for me and seems
to confirm the race above.. but not sure it's the best fix

jirka


---
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index c19719f48ce0..6a2ced102fc7 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
 	}
 
 	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
+	synchronize_rcu_tasks();
 
 	if (new)
 		d->image_off = noff;

^ permalink raw reply related	[flat|nested] 26+ messages in thread
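Why synchronize_rcu_tasks() closes the window: an RCU-tasks grace period only ends after every task has gone through a voluntary context switch (or user mode / idle), and a task executing the dispatcher image never sleeps voluntarily, so a straggler still inside the old half holds the grace period open even if it gets preempted. In the style of the diagram above, the patched sequence becomes:

cpu 0:                                 cpu 1:

  start exec image at offset 0x0
                                       bpf_dispatcher_update
                                         write + switch to offset 0x800
                                         synchronize_rcu_tasks()
                                           (blocks: cpu 0's task has not
                                            voluntarily rescheduled yet)
  leave image at offset 0x0
  task eventually schedules
                                         grace period ends, update returns
                                       bpf_dispatcher_update
                                         now safe to rewrite offset 0x0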
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 15:20 ` Jiri Olsa @ 2022-12-09 20:31 ` Yonghong Song 2022-12-09 21:53 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Yonghong Song @ 2022-12-09 20:31 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/9/22 7:20 AM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >> >> SBIP >> >>>>>>>>> >>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>> is about to start. >>>>>>>> >>>>>>>> Jiri, Peter, >>>>>>>> >>>>>>>> ping. >>>>>>>> >>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>> >>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>> Is it related? >>>>>>> >>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>> but I did not try with IBT yet, will test now >>>>>> >>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>> >>>>> >>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>> >>>>> after I fix that I can now reproduce on my local config with >>>>> IBT enabled or disabled.. it's something else >>>> >>>> I'm getting the error also when reverting the static call change, >>>> looking for good commit, bisecting >>>> >>>> I'm getting fail with: >>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>> >>>> v6.1-rc1 is ok >>> >>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>> >>> attaching some more logs >> >> looking at the code.. how do we ensure that code running through >> bpf_prog_run_xdp will not get dispatcher image changed while >> it's being exetuted >> >> we use 'the other half' of the image when we add/remove programs, >> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >> >> >> cpu 0: cpu 1: >> >> bpf_prog_run_xdp >> ... >> bpf_dispatcher_xdp_func >> start exec image at offset 0x0 >> >> bpf_dispatcher_update >> update image at offset 0x800 >> bpf_dispatcher_update >> update image at offset 0x0 >> >> still in image at offset 0x0 >> >> >> that might explain why I wasn't able to trigger that on >> bare metal just in qemu > > I tried patch below and it fixes the issue for me and seems > to confirm the race above.. but not sure it's the best fix > > jirka > > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; This might work. In arch/x86/kernel/alternative.c, we have following code and comments. For text_poke, synchronize_rcu_tasks() might be able to avoid concurrent execution and update. 
/**
 * text_poke_copy - Copy instructions into (an unused part of) RX memory
 * @addr: address to modify
 * @opcode: source of the copy
 * @len: length to copy, could be more than 2x PAGE_SIZE
 *
 * Not safe against concurrent execution; useful for JITs to dump
 * new code blocks into unused regions of RX memory. Can be used in
 * conjunction with synchronize_rcu_tasks() to wait for existing
 * execution to quiesce after having made sure no existing functions
 * pointers are live.
 */
void *text_poke_copy(void *addr, const void *opcode, size_t len)
{
	unsigned long start = (unsigned long)addr;
	size_t patched = 0;

	if (WARN_ON_ONCE(core_kernel_text(start)))
		return NULL;

	mutex_lock(&text_mutex);
	while (patched < len) {
		unsigned long ptr = start + patched;
		size_t s;

		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);

		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
		patched += s;
	}
	mutex_unlock(&text_mutex);
	return addr;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread
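The pattern that comment describes, reduced to a sketch (slot[] and dispatch_key are hypothetical names for illustration, not a real kernel API):

/* two code slots: publish into the unused one, steer callers over,
 * then quiesce before the old slot may ever be rewritten */
text_poke_copy(slot[next], buf, len);		/* dump new code        */
static_call_update(dispatch_key, slot[next]);	/* new callers go there */
synchronize_rcu_tasks();			/* old callers drained  */
/* slot[prev] is now safe to overwrite on a later update */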
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 20:31 ` Yonghong Song @ 2022-12-09 21:53 ` Jiri Olsa 2022-12-09 22:41 ` Daniel Borkmann 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 21:53 UTC (permalink / raw) To: Yonghong Song Cc: Jiri Olsa, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: > > > On 12/9/22 7:20 AM, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: > > > On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: > > > > > > SBIP > > > > > > > > > > > > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > > > > > > whether we need to revert that commit asap since the merge window > > > > > > > > > > is about to start. > > > > > > > > > > > > > > > > > > Jiri, Peter, > > > > > > > > > > > > > > > > > > ping. > > > > > > > > > > > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > > > > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > > > > > > Is it related? > > > > > > > > > > > > > > > > sorry for late reply.. I still did not find the reason, > > > > > > > > but I did not try with IBT yet, will test now > > > > > > > > > > > > > > no difference with IBT enabled, can't reproduce the issue > > > > > > > > > > > > > > > > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > > > > > > > > > > > after I fix that I can now reproduce on my local config with > > > > > > IBT enabled or disabled.. it's something else > > > > > > > > > > I'm getting the error also when reverting the static call change, > > > > > looking for good commit, bisecting > > > > > > > > > > I'm getting fail with: > > > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 > > > > > > > > > > v6.1-rc1 is ok > > > > > > > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far > > > > > > > > attaching some more logs > > > > > > looking at the code.. how do we ensure that code running through > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > it's being exetuted > > > > > > we use 'the other half' of the image when we add/remove programs, > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > cpu 0: cpu 1: > > > > > > bpf_prog_run_xdp > > > ... > > > bpf_dispatcher_xdp_func > > > start exec image at offset 0x0 > > > > > > bpf_dispatcher_update > > > update image at offset 0x800 > > > bpf_dispatcher_update > > > update image at offset 0x0 > > > > > > still in image at offset 0x0 > > > > > > > > > that might explain why I wasn't able to trigger that on > > > bare metal just in qemu > > > > I tried patch below and it fixes the issue for me and seems > > to confirm the race above.. 
> > but not sure it's the best fix
> >
> > jirka
> >
> >
> > ---
> > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> > index c19719f48ce0..6a2ced102fc7 100644
> > --- a/kernel/bpf/dispatcher.c
> > +++ b/kernel/bpf/dispatcher.c
> > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
> >  	}
> >  
> >  	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
> > +	synchronize_rcu_tasks();
> >  
> >  	if (new)
> >  		d->image_off = noff;
>
> This might work. In arch/x86/kernel/alternative.c, we have following
> code and comments. For text_poke, synchronize_rcu_tasks() might be able
> to avoid concurrent execution and update.

so my idea was that we need to ensure all the current callers of
bpf_dispatcher_xdp_func (which should have rcu read lock, based
on the comment in bpf_prog_run_xdp) are gone before new ones execute
the new image, so the next call to bpf_dispatcher_update will be
safe to overwrite the other half of the image

jirka

>
> /**
>  * text_poke_copy - Copy instructions into (an unused part of) RX memory
>  * @addr: address to modify
>  * @opcode: source of the copy
>  * @len: length to copy, could be more than 2x PAGE_SIZE
>  *
>  * Not safe against concurrent execution; useful for JITs to dump
>  * new code blocks into unused regions of RX memory. Can be used in
>  * conjunction with synchronize_rcu_tasks() to wait for existing
>  * execution to quiesce after having made sure no existing functions
>  * pointers are live.
>  */
> void *text_poke_copy(void *addr, const void *opcode, size_t len)
> {
> 	unsigned long start = (unsigned long)addr;
> 	size_t patched = 0;
>
> 	if (WARN_ON_ONCE(core_kernel_text(start)))
> 		return NULL;
>
> 	mutex_lock(&text_mutex);
> 	while (patched < len) {
> 		unsigned long ptr = start + patched;
> 		size_t s;
>
> 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
>
> 		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
> 		patched += s;
> 	}
> 	mutex_unlock(&text_mutex);
> 	return addr;
> }

^ permalink raw reply	[flat|nested] 26+ messages in thread
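The reader side being discussed, paraphrased from include/linux/filter.h (simplified, not verbatim -- the bpf_stats branch is dropped). __bpf_prog_run hands prog->bpf_func to the dispatcher image in the third argument register, which is why %rdx in the oops holds a prog address, and nothing in this path orders the call against a concurrent bpf_dispatcher_update:

/* simplified paraphrase of include/linux/filter.h */
static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
					  const void *ctx,
					  bpf_dispatcher_fn dfunc)
{
	return dfunc(ctx, prog->insnsi, prog->bpf_func);
}

/* bpf_prog_run_xdp() then calls, under local_bh_disable() only: */
u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));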
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 21:53 ` Jiri Olsa @ 2022-12-09 22:41 ` Daniel Borkmann 2022-12-09 23:07 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Daniel Borkmann @ 2022-12-09 22:41 UTC (permalink / raw) To: Jiri Olsa, Yonghong Song Cc: Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/9/22 10:53 PM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: >> >> >> On 12/9/22 7:20 AM, Jiri Olsa wrote: >>> On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >>>> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >>>> >>>> SBIP >>>> >>>>>>>>>>> >>>>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>>>> is about to start. >>>>>>>>>> >>>>>>>>>> Jiri, Peter, >>>>>>>>>> >>>>>>>>>> ping. >>>>>>>>>> >>>>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>>>> >>>>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>>>> Is it related? >>>>>>>>> >>>>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>>>> but I did not try with IBT yet, will test now >>>>>>>> >>>>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>>>> >>>>>>> >>>>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>>>> >>>>>>> after I fix that I can now reproduce on my local config with >>>>>>> IBT enabled or disabled.. it's something else >>>>>> >>>>>> I'm getting the error also when reverting the static call change, >>>>>> looking for good commit, bisecting >>>>>> >>>>>> I'm getting fail with: >>>>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>>>> >>>>>> v6.1-rc1 is ok >>>>> >>>>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>>>> >>>>> attaching some more logs >>>> >>>> looking at the code.. how do we ensure that code running through >>>> bpf_prog_run_xdp will not get dispatcher image changed while >>>> it's being exetuted >>>> >>>> we use 'the other half' of the image when we add/remove programs, >>>> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >>>> >>>> >>>> cpu 0: cpu 1: >>>> >>>> bpf_prog_run_xdp >>>> ... >>>> bpf_dispatcher_xdp_func >>>> start exec image at offset 0x0 >>>> >>>> bpf_dispatcher_update >>>> update image at offset 0x800 >>>> bpf_dispatcher_update >>>> update image at offset 0x0 >>>> >>>> still in image at offset 0x0 >>>> >>>> >>>> that might explain why I wasn't able to trigger that on >>>> bare metal just in qemu >>> >>> I tried patch below and it fixes the issue for me and seems >>> to confirm the race above.. but not sure it's the best fix >>> >>> jirka >>> >>> >>> --- >>> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c >>> index c19719f48ce0..6a2ced102fc7 100644 >>> --- a/kernel/bpf/dispatcher.c >>> +++ b/kernel/bpf/dispatcher.c >>> @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) >>> } >>> __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); >>> + synchronize_rcu_tasks(); >>> if (new) >>> d->image_off = noff; >> >> This might work. In arch/x86/kernel/alternative.c, we have following >> code and comments. 
For text_poke, synchronize_rcu_tasks() might be able >> to avoid concurrent execution and update. > > so my idea was that we need to ensure all the current callers of > bpf_dispatcher_xdp_func (which should have rcu read lock, based > on the comment in bpf_prog_run_xdp) are gone before and new ones > execute the new image, so the next call to the bpf_dispatcher_update > will be safe to overwrite the other half of the image If v6.1-rc1 was indeed okay, then it looks like this may be related to the trampoline patching for the static_call? Did it repro on v6.1-rc1 just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") cherry-picked? >> /** >> * text_poke_copy - Copy instructions into (an unused part of) RX memory >> * @addr: address to modify >> * @opcode: source of the copy >> * @len: length to copy, could be more than 2x PAGE_SIZE >> * >> * Not safe against concurrent execution; useful for JITs to dump >> * new code blocks into unused regions of RX memory. Can be used in >> * conjunction with synchronize_rcu_tasks() to wait for existing >> * execution to quiesce after having made sure no existing functions >> * pointers are live. >> */ >> void *text_poke_copy(void *addr, const void *opcode, size_t len) >> { >> unsigned long start = (unsigned long)addr; >> size_t patched = 0; >> >> if (WARN_ON_ONCE(core_kernel_text(start))) >> return NULL; >> >> mutex_lock(&text_mutex); >> while (patched < len) { >> unsigned long ptr = start + patched; >> size_t s; >> >> s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - >> patched); >> >> __text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, >> s); >> patched += s; >> } >> mutex_unlock(&text_mutex); >> return addr; >> } ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 22:41 ` Daniel Borkmann @ 2022-12-09 23:07 ` Jiri Olsa 2022-12-09 23:29 ` Jiri Olsa 2022-12-09 23:32 ` Daniel Borkmann 0 siblings, 2 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 23:07 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 11:41:11PM +0100, Daniel Borkmann wrote: > On 12/9/22 10:53 PM, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: > > > > > > > > > On 12/9/22 7:20 AM, Jiri Olsa wrote: > > > > On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: > > > > > On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: > > > > > > > > > > SBIP > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm trying to understand the severity of the issues and > > > > > > > > > > > > whether we need to revert that commit asap since the merge window > > > > > > > > > > > > is about to start. > > > > > > > > > > > > > > > > > > > > > > Jiri, Peter, > > > > > > > > > > > > > > > > > > > > > > ping. > > > > > > > > > > > > > > > > > > > > > > cc-ing Thorsten, since he's tracking it now. > > > > > > > > > > > > > > > > > > > > > > The config has CONFIG_X86_KERNEL_IBT=y. > > > > > > > > > > > Is it related? > > > > > > > > > > > > > > > > > > > > sorry for late reply.. I still did not find the reason, > > > > > > > > > > but I did not try with IBT yet, will test now > > > > > > > > > > > > > > > > > > no difference with IBT enabled, can't reproduce the issue > > > > > > > > > > > > > > > > > > > > > > > > > ok, scratch that.. the reproducer got stuck on wifi init :-\ > > > > > > > > > > > > > > > > after I fix that I can now reproduce on my local config with > > > > > > > > IBT enabled or disabled.. it's something else > > > > > > > > > > > > > > I'm getting the error also when reverting the static call change, > > > > > > > looking for good commit, bisecting > > > > > > > > > > > > > > I'm getting fail with: > > > > > > > f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 > > > > > > > > > > > > > > v6.1-rc1 is ok > > > > > > > > > > > > so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far > > > > > > > > > > > > attaching some more logs > > > > > > > > > > looking at the code.. how do we ensure that code running through > > > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > > > it's being exetuted > > > > > > > > > > we use 'the other half' of the image when we add/remove programs, > > > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > > > > > > > cpu 0: cpu 1: > > > > > > > > > > bpf_prog_run_xdp > > > > > ... > > > > > bpf_dispatcher_xdp_func > > > > > start exec image at offset 0x0 > > > > > > > > > > bpf_dispatcher_update > > > > > update image at offset 0x800 > > > > > bpf_dispatcher_update > > > > > update image at offset 0x0 > > > > > > > > > > still in image at offset 0x0 > > > > > > > > > > > > > > > that might explain why I wasn't able to trigger that on > > > > > bare metal just in qemu > > > > > > > > I tried patch below and it fixes the issue for me and seems > > > > to confirm the race above.. 
but not sure it's the best fix > > > > > > > > jirka > > > > > > > > > > > > --- > > > > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > > > > index c19719f48ce0..6a2ced102fc7 100644 > > > > --- a/kernel/bpf/dispatcher.c > > > > +++ b/kernel/bpf/dispatcher.c > > > > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > > > > } > > > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > > > > + synchronize_rcu_tasks(); > > > > if (new) > > > > d->image_off = noff; > > > > > > This might work. In arch/x86/kernel/alternative.c, we have following > > > code and comments. For text_poke, synchronize_rcu_tasks() might be able > > > to avoid concurrent execution and update. > > > > so my idea was that we need to ensure all the current callers of > > bpf_dispatcher_xdp_func (which should have rcu read lock, based > > on the comment in bpf_prog_run_xdp) are gone before and new ones > > execute the new image, so the next call to the bpf_dispatcher_update > > will be safe to overwrite the other half of the image > > If v6.1-rc1 was indeed okay, then it looks like this may be related to > the trampoline patching for the static_call? Did it repro on v6.1-rc1 > just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry > to 5 bytes nop") cherry-picked? I'll try that.. it looks to me like the problem was always there, maybe harder to trigger.. also to reproduce it you need to call bpf_dispatcher_update heavily, which is probably not the common use case one other thing is that I think the fix might need rcu locking on the bpf_dispatcher_xdp_func side, because local_bh_disable seems not to be enough to make synchronize_rcu_tasks work I'm now testing patch below jirka --- diff --git a/include/linux/filter.h b/include/linux/filter.h index efc42a6e3aed..a27245b96d6b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, * under local_bh_disable(), which provides the needed RCU protection * for accessing map entries. */ - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); + u32 act; + + rcu_read_lock(); + + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); + + rcu_read_unlock(); if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..6a2ced102fc7 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu_tasks(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:07 ` Jiri Olsa @ 2022-12-09 23:29 ` Jiri Olsa 2022-12-09 23:32 ` Daniel Borkmann 1 sibling, 0 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-09 23:29 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 12:07:58AM +0100, Jiri Olsa wrote: SNIP > > > > > > > > > > > > looking at the code.. how do we ensure that code running through > > > > > > bpf_prog_run_xdp will not get dispatcher image changed while > > > > > > it's being exetuted > > > > > > > > > > > > we use 'the other half' of the image when we add/remove programs, > > > > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like: > > > > > > > > > > > > > > > > > > cpu 0: cpu 1: > > > > > > > > > > > > bpf_prog_run_xdp > > > > > > ... > > > > > > bpf_dispatcher_xdp_func > > > > > > start exec image at offset 0x0 > > > > > > > > > > > > bpf_dispatcher_update > > > > > > update image at offset 0x800 > > > > > > bpf_dispatcher_update > > > > > > update image at offset 0x0 > > > > > > > > > > > > still in image at offset 0x0 > > > > > > > > > > > > > > > > > > that might explain why I wasn't able to trigger that on > > > > > > bare metal just in qemu > > > > > > > > > > I tried patch below and it fixes the issue for me and seems > > > > > to confirm the race above.. but not sure it's the best fix > > > > > > > > > > jirka > > > > > > > > > > > > > > > --- > > > > > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > > > > > index c19719f48ce0..6a2ced102fc7 100644 > > > > > --- a/kernel/bpf/dispatcher.c > > > > > +++ b/kernel/bpf/dispatcher.c > > > > > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > > > > > } > > > > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > > > > > + synchronize_rcu_tasks(); > > > > > if (new) > > > > > d->image_off = noff; > > > > > > > > This might work. In arch/x86/kernel/alternative.c, we have following > > > > code and comments. For text_poke, synchronize_rcu_tasks() might be able > > > > to avoid concurrent execution and update. > > > > > > so my idea was that we need to ensure all the current callers of > > > bpf_dispatcher_xdp_func (which should have rcu read lock, based > > > on the comment in bpf_prog_run_xdp) are gone before and new ones > > > execute the new image, so the next call to the bpf_dispatcher_update > > > will be safe to overwrite the other half of the image > > > > If v6.1-rc1 was indeed okay, then it looks like this may be related to > > the trampoline patching for the static_call? Did it repro on v6.1-rc1 > > just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry > > to 5 bytes nop") cherry-picked? > > I'll try that.. it looks to me like the problem was always there, > maybe harder to trigger.. 
also to reproduce it you need to call > bpf_dispatcher_update heavily, which is not probably the common > use case > > one other thing is that I think the fix might need rcu locking > on the bpf_dispatcher_xdp_func side, because local_bh_disable > seems not to be enough to make synchronize_rcu_tasks work > > I'm now testing patch below > > jirka > > > --- > diff --git a/include/linux/filter.h b/include/linux/filter.h > index efc42a6e3aed..a27245b96d6b 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, > * under local_bh_disable(), which provides the needed RCU protection > * for accessing map entries. > */ > - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + u32 act; > + > + rcu_read_lock(); > + > + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + > + rcu_read_unlock(); > > if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { > if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; hm, so I'm eventually getting splats like below I guess I'm missing some rcu/xdp detail, thoughts? ;-) jirka --- [ 1107.911088][ T41] INFO: task rcu_tasks_kthre:12 blocked for more than 122 seconds. [ 1107.913332][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.914801][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.916691][ T41] task:rcu_tasks_kthre state:D stack:14392 pid:12 ppid:2 flags:0x00004000 [ 1107.917324][ T41] Call Trace: [ 1107.917563][ T41] <TASK> [ 1107.917784][ T41] __schedule+0x419/0xe30 [ 1107.918764][ T41] schedule+0x5d/0xe0 [ 1107.919061][ T41] schedule_timeout+0x102/0x140 [ 1107.919386][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.919747][ T41] ? lock_release+0x264/0x4f0 [ 1107.920079][ T41] ? lock_acquired+0x207/0x470 [ 1107.920397][ T41] ? trace_hardirqs_on+0x2b/0xd0 [ 1107.920723][ T41] __wait_for_common+0xb6/0x210 [ 1107.921067][ T41] ? usleep_range_state+0xb0/0xb0 [ 1107.921401][ T41] __synchronize_srcu+0x151/0x1e0 [ 1107.921731][ T41] ? rcu_tasks_pregp_step+0x10/0x10 [ 1107.922112][ T41] ? ktime_get_mono_fast_ns+0x3a/0x90 [ 1107.922463][ T41] ? synchronize_srcu+0xa1/0xe0 [ 1107.922784][ T41] rcu_tasks_wait_gp+0x183/0x3b0 [ 1107.923129][ T41] ? lock_release+0x264/0x4f0 [ 1107.923442][ T41] rcu_tasks_one_gp+0x35a/0x3e0 [ 1107.923766][ T41] ? rcu_tasks_postscan+0x20/0x20 [ 1107.924114][ T41] rcu_tasks_kthread+0x31/0x40 [ 1107.924434][ T41] kthread+0xf2/0x120 [ 1107.924713][ T41] ? kthread_complete_and_exit+0x20/0x20 [ 1107.925095][ T41] ret_from_fork+0x1f/0x30 [ 1107.925404][ T41] </TASK> [ 1107.925664][ T41] INFO: task ex:7319 blocked for more than 122 seconds. [ 1107.926121][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.926461][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 1107.927090][ T41] task:ex state:D stack:13648 pid:7319 ppid:677 flags:0x00004006 [ 1107.927791][ T41] Call Trace: [ 1107.928079][ T41] <TASK> [ 1107.928334][ T41] __schedule+0x419/0xe30 [ 1107.928683][ T41] schedule+0x5d/0xe0 [ 1107.929019][ T41] schedule_preempt_disabled+0x14/0x30 [ 1107.929440][ T41] __mutex_lock+0x3fd/0x850 [ 1107.929799][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930235][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930609][ T41] bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.930977][ T41] bpf_prog_test_run_xdp+0x39b/0x600 [ 1107.931340][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.931684][ T41] ? futex_wait+0x175/0x250 [ 1107.932014][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.932328][ T41] ? lock_release+0x264/0x4f0 [ 1107.932640][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.933028][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.933388][ T41] ? lock_release+0x264/0x4f0 [ 1107.933700][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.934070][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.934432][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.934733][ T41] do_syscall_64+0x37/0x90 [ 1107.935050][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.935428][ T41] RIP: 0033:0x7f02f9f0af3d [ 1107.935731][ T41] RSP: 002b:00007f02fa0e9df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.936291][ T41] RAX: ffffffffffffffda RBX: 00007f02fa0ea640 RCX: 00007f02f9f0af3d [ 1107.936811][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.937360][ T41] RBP: 00007f02fa0e9e20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.937884][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.938425][ T41] R13: 0000000000000011 R14: 00007ffda75fd290 R15: 00007f02fa0ca000 [ 1107.939050][ T41] </TASK> [ 1107.939315][ T41] INFO: task ex:7352 blocked for more than 122 seconds. [ 1107.939744][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.940095][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.940651][ T41] task:ex state:D stack:13648 pid:7352 ppid:766 flags:0x00004006 [ 1107.941254][ T41] Call Trace: [ 1107.941492][ T41] <TASK> [ 1107.941710][ T41] __schedule+0x419/0xe30 [ 1107.942018][ T41] ? lock_acquired+0x207/0x470 [ 1107.942339][ T41] schedule+0x5d/0xe0 [ 1107.942616][ T41] schedule_timeout+0x102/0x140 [ 1107.942955][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.943330][ T41] ? lock_release+0x264/0x4f0 [ 1107.943643][ T41] ? lock_acquired+0x207/0x470 [ 1107.943965][ T41] ? trace_hardirqs_on+0x2b/0xd0 [ 1107.944318][ T41] __wait_for_common+0xb6/0x210 [ 1107.944641][ T41] ? usleep_range_state+0xb0/0xb0 [ 1107.950003][ T41] __wait_rcu_gp+0x14d/0x170 [ 1107.950399][ T41] ? 0xffffffffa0013840 [ 1107.950726][ T41] synchronize_rcu_tasks_generic.part.0.isra.0+0x31/0x50 [ 1107.951207][ T41] ? call_rcu_tasks_generic+0x350/0x350 [ 1107.951643][ T41] ? rcu_tasks_pregp_step+0x10/0x10 [ 1107.952070][ T41] bpf_dispatcher_change_prog+0x204/0x380 [ 1107.952521][ T41] bpf_prog_test_run_xdp+0x39b/0x600 [ 1107.952941][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.953302][ T41] ? futex_wait+0x175/0x250 [ 1107.953669][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.954058][ T41] ? lock_release+0x264/0x4f0 [ 1107.954435][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.954868][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.955329][ T41] ? lock_release+0x264/0x4f0 [ 1107.955705][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.956148][ T41] ? 
rcu_read_lock_sched_held+0x10/0x90 [ 1107.956582][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.956937][ T41] do_syscall_64+0x37/0x90 [ 1107.957312][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.957771][ T41] RIP: 0033:0x7ffaa610af3d [ 1107.958140][ T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.958792][ T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d [ 1107.959427][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.960054][ T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.960680][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.961314][ T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000 [ 1107.961948][ T41] </TASK> [ 1107.962226][ T41] INFO: task ex:7354 blocked for more than 122 seconds. [ 1107.962756][ T41] Not tainted 6.1.0-rc7+ #847 [ 1107.963178][ T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1107.963786][ T41] task:ex state:D stack:13648 pid:7354 ppid:767 flags:0x00004006 [ 1107.964451][ T41] Call Trace: [ 1107.964733][ T41] <TASK> [ 1107.965001][ T41] __schedule+0x419/0xe30 [ 1107.965354][ T41] schedule+0x5d/0xe0 [ 1107.965682][ T41] schedule_preempt_disabled+0x14/0x30 [ 1107.966130][ T41] __mutex_lock+0x3fd/0x850 [ 1107.966493][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.966870][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.967340][ T41] ? bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.967792][ T41] bpf_dispatcher_change_prog+0x3a/0x380 [ 1107.968236][ T41] bpf_prog_test_run_xdp+0x2c8/0x600 [ 1107.968654][ T41] __sys_bpf+0x963/0x2bb0 [ 1107.969012][ T41] ? futex_wait+0x175/0x250 [ 1107.969380][ T41] ? lock_acquire+0x2ed/0x370 [ 1107.969754][ T41] ? lock_release+0x264/0x4f0 [ 1107.970135][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.970565][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.971008][ T41] ? lock_release+0x264/0x4f0 [ 1107.971385][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.971813][ T41] ? rcu_read_lock_sched_held+0x10/0x90 [ 1107.972257][ T41] __x64_sys_bpf+0x1a/0x30 [ 1107.972614][ T41] do_syscall_64+0x37/0x90 [ 1107.972984][ T41] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1107.973385][ T41] RIP: 0033:0x7ffaa610af3d [ 1107.973696][ T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 1107.974261][ T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d [ 1107.974795][ T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a [ 1107.975348][ T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000 [ 1107.975942][ T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80 [ 1107.976570][ T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000 [ 1107.977216][ T41] </TASK> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:07 ` Jiri Olsa 2022-12-09 23:29 ` Jiri Olsa @ 2022-12-09 23:32 ` Daniel Borkmann 2022-12-09 23:34 ` Jakub Kicinski 1 sibling, 1 reply; 26+ messages in thread From: Daniel Borkmann @ 2022-12-09 23:32 UTC (permalink / raw) To: Jiri Olsa Cc: Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jakub Kicinski, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On 12/10/22 12:07 AM, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 11:41:11PM +0100, Daniel Borkmann wrote: >> On 12/9/22 10:53 PM, Jiri Olsa wrote: >>> On Fri, Dec 09, 2022 at 12:31:06PM -0800, Yonghong Song wrote: >>>> >>>> >>>> On 12/9/22 7:20 AM, Jiri Olsa wrote: >>>>> On Fri, Dec 09, 2022 at 02:50:55PM +0100, Jiri Olsa wrote: >>>>>> On Fri, Dec 09, 2022 at 12:22:37PM +0100, Jiri Olsa wrote: >>>>>> >>>>>> SBIP >>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm trying to understand the severity of the issues and >>>>>>>>>>>>> whether we need to revert that commit asap since the merge window >>>>>>>>>>>>> is about to start. >>>>>>>>>>>> >>>>>>>>>>>> Jiri, Peter, >>>>>>>>>>>> >>>>>>>>>>>> ping. >>>>>>>>>>>> >>>>>>>>>>>> cc-ing Thorsten, since he's tracking it now. >>>>>>>>>>>> >>>>>>>>>>>> The config has CONFIG_X86_KERNEL_IBT=y. >>>>>>>>>>>> Is it related? >>>>>>>>>>> >>>>>>>>>>> sorry for late reply.. I still did not find the reason, >>>>>>>>>>> but I did not try with IBT yet, will test now >>>>>>>>>> >>>>>>>>>> no difference with IBT enabled, can't reproduce the issue >>>>>>>>>> >>>>>>>>> >>>>>>>>> ok, scratch that.. the reproducer got stuck on wifi init :-\ >>>>>>>>> >>>>>>>>> after I fix that I can now reproduce on my local config with >>>>>>>>> IBT enabled or disabled.. it's something else >>>>>>>> >>>>>>>> I'm getting the error also when reverting the static call change, >>>>>>>> looking for good commit, bisecting >>>>>>>> >>>>>>>> I'm getting fail with: >>>>>>>> f0c4d9fc9cc9 (tag: v6.1-rc4) Linux 6.1-rc4 >>>>>>>> >>>>>>>> v6.1-rc1 is ok >>>>>>> >>>>>>> so far I narrowed it down between rc1 and rc3.. bisect got me nowhere so far >>>>>>> >>>>>>> attaching some more logs >>>>>> >>>>>> looking at the code.. how do we ensure that code running through >>>>>> bpf_prog_run_xdp will not get dispatcher image changed while >>>>>> it's being exetuted >>>>>> >>>>>> we use 'the other half' of the image when we add/remove programs, >>>>>> but could bpf_dispatcher_update race with bpf_prog_run_xdp like: >>>>>> >>>>>> >>>>>> cpu 0: cpu 1: >>>>>> >>>>>> bpf_prog_run_xdp >>>>>> ... >>>>>> bpf_dispatcher_xdp_func >>>>>> start exec image at offset 0x0 >>>>>> >>>>>> bpf_dispatcher_update >>>>>> update image at offset 0x800 >>>>>> bpf_dispatcher_update >>>>>> update image at offset 0x0 >>>>>> >>>>>> still in image at offset 0x0 >>>>>> >>>>>> >>>>>> that might explain why I wasn't able to trigger that on >>>>>> bare metal just in qemu >>>>> >>>>> I tried patch below and it fixes the issue for me and seems >>>>> to confirm the race above.. 
but not sure it's the best fix >>>>> >>>>> jirka >>>>> >>>>> >>>>> --- >>>>> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c >>>>> index c19719f48ce0..6a2ced102fc7 100644 >>>>> --- a/kernel/bpf/dispatcher.c >>>>> +++ b/kernel/bpf/dispatcher.c >>>>> @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) >>>>> } >>>>> __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); >>>>> + synchronize_rcu_tasks(); >>>>> if (new) >>>>> d->image_off = noff; >>>> >>>> This might work. In arch/x86/kernel/alternative.c, we have following >>>> code and comments. For text_poke, synchronize_rcu_tasks() might be able >>>> to avoid concurrent execution and update. >>> >>> so my idea was that we need to ensure all the current callers of >>> bpf_dispatcher_xdp_func (which should have rcu read lock, based >>> on the comment in bpf_prog_run_xdp) are gone before and new ones >>> execute the new image, so the next call to the bpf_dispatcher_update >>> will be safe to overwrite the other half of the image >> >> If v6.1-rc1 was indeed okay, then it looks like this may be related to >> the trampoline patching for the static_call? Did it repro on v6.1-rc1 >> just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry >> to 5 bytes nop") cherry-picked? > > I'll try that.. it looks to me like the problem was always there, > maybe harder to trigger.. also to reproduce it you need to call > bpf_dispatcher_update heavily, which is not probably the common > use case > > one other thing is that I think the fix might need rcu locking > on the bpf_dispatcher_xdp_func side, because local_bh_disable > seems not to be enough to make synchronize_rcu_tasks work > > I'm now testing patch below > > jirka > > --- > diff --git a/include/linux/filter.h b/include/linux/filter.h > index efc42a6e3aed..a27245b96d6b 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, > * under local_bh_disable(), which provides the needed RCU protection > * for accessing map entries. > */ > - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + u32 act; > + > + rcu_read_lock(); > + > + act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); > + > + rcu_read_unlock(); fwiw, these should not be necessary, Documentation/RCU/checklist.rst : [...] One example of non-obvious pairing is the XDP feature in networking, which calls BPF programs from network-driver NAPI (softirq) context. BPF relies heavily on RCU protection for its data structures, but because the BPF program invocation happens entirely within a single local_bh_disable() section in a NAPI poll cycle, this usage is safe. The reason that this usage is safe is that readers can use anything that disables BH when updaters use call_rcu() or synchronize_rcu(). [...] > if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { > if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..6a2ced102fc7 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks(); > > if (new) > d->image_off = noff; > ^ permalink raw reply [flat|nested] 26+ messages in thread
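Reduced to a sketch, the pairing rule the checklist describes looks like this (illustrative only; in real XDP the BH-disabled section is the driver's NAPI poll loop, not an explicit pair like here):

	/* reader side: the BH-disabled region acts as the RCU read-side
	 * critical section
	 */
	local_bh_disable();
	act = bpf_prog_run_xdp(prog, xdp);	/* may run through the dispatcher image */
	local_bh_enable();

	/* updater side: a plain RCU grace period waits for all such readers */
	synchronize_rcu();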
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:32 ` Daniel Borkmann @ 2022-12-09 23:34 ` Jakub Kicinski 2022-12-10 0:06 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2022-12-09 23:34 UTC (permalink / raw) To: Daniel Borkmann Cc: Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > [...] One example of non-obvious pairing is the XDP feature in networking, > which calls BPF programs from network-driver NAPI (softirq) context. BPF > relies heavily on RCU protection for its data structures, but because the > BPF program invocation happens entirely within a single local_bh_disable() > section in a NAPI poll cycle, this usage is safe. The reason that this usage > is safe is that readers can use anything that disables BH when updaters use > call_rcu() or synchronize_rcu(). [...] FWIW I sent a link to the thread to Paul and he confirmed the RCU will wait for just the BH. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-09 23:34 ` Jakub Kicinski @ 2022-12-10 0:06 ` Jiri Olsa 2022-12-10 0:38 ` Paul E. McKenney 2022-12-10 1:12 ` Alexei Starovoitov 0 siblings, 2 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 0:06 UTC (permalink / raw) To: Jakub Kicinski, Paul E. McKenney Cc: Daniel Borkmann, Jiri Olsa, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > relies heavily on RCU protection for its data structures, but because the > > BPF program invocation happens entirely within a single local_bh_disable() > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > is safe is that readers can use anything that disables BH when updaters use > > call_rcu() or synchronize_rcu(). [...] > > FWIW I sent a link to the thread to Paul and he confirmed > the RCU will wait for just the BH. so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side Paul, any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog with bpf_prog_run_xdp callers? with synchronize_rcu_tasks I'm getting splats like: https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) thanks, jirka --- diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..e6126f07e85b 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu_tasks_rude(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:06 ` Jiri Olsa @ 2022-12-10 0:38 ` Paul E. McKenney 2022-12-10 13:05 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2022-12-10 0:38 UTC (permalink / raw) To: Jiri Olsa Cc: Jakub Kicinski, Daniel Borkmann, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 01:06:16AM +0100, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > relies heavily on RCU protection for its data structures, but because the > > > BPF program invocation happens entirely within a single local_bh_disable() > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > is safe is that readers can use anything that disables BH when updaters use > > > call_rcu() or synchronize_rcu(). [...] > > > > FWIW I sent a link to the thread to Paul and he confirmed > > the RCU will wait for just the BH. > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > Paul, > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > with bpf_prog_run_xdp callers? > > with synchronize_rcu_tasks I'm getting splats like: > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) It sounds like we are all talking past each other, leaving me no choice but to supply a wall of text: It is quite true that synchronize_rcu_tasks_rude() will wait for bh-disabled regions of code, just like synchronize_rcu() and synchronize_rcu_tasks() will. However, please note that synchronize_rcu_tasks() never waits on any of the idle tasks. So the usual approach in tracing is to do both a synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(). One way of overlapping the resulting pair of grace periods is to use synchronize_rcu_mult(). But none of these permit readers to sleep. That is what synchronize_rcu_tasks_trace() is for, but unlike both synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(), you must explicitly mark the readers with rcu_read_lock_trace() and rcu_read_unlock_trace(). This is used to protect sleepable BPF programs. Now, synchronize_rcu() will also wait on bh-disabled regions of code, with the exception of such code in the exception path, way deep in the idle loop, early in the CPU-online process, or late in the CPU-offline process. You can recognize the first two categories of code by the noinstr tags on the functions. And yes, synchronize_rcu_tasks_rude() is quite special. ;-) Does this help, or am I simply adding to the confusion?
Thanx, Paul > thanks, > jirka > > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..e6126f07e85b 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu_tasks_rude(); > > if (new) > d->image_off = noff; ^ permalink raw reply [flat|nested] 26+ messages in thread
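For reference, the overlapping-grace-period idiom mentioned above would look roughly like the sketch below; synchronize_rcu_mult() takes the call_rcu()-style entry points of the flavors to wait for, and this is an illustration of the tracing idiom rather than a drop-in for the dispatcher:

	#include <linux/rcupdate_wait.h>

	/* wait for both an RCU-tasks and an RCU-tasks-rude grace period,
	 * letting the two grace periods overlap instead of running them
	 * back to back
	 */
	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_rude);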
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:38 ` Paul E. McKenney @ 2022-12-10 13:05 ` Jiri Olsa 0 siblings, 0 replies; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 13:05 UTC (permalink / raw) To: Paul E. McKenney Cc: Jiri Olsa, Jakub Kicinski, Daniel Borkmann, Yonghong Song, Alexei Starovoitov, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 04:38:38PM -0800, Paul E. McKenney wrote: > On Sat, Dec 10, 2022 at 01:06:16AM +0100, Jiri Olsa wrote: > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > relies heavily on RCU protection for its data structures, but because the > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > is safe is that readers can use anything that disables BH when updaters use > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > the RCU will wait for just the BH. > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > Paul, > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > with bpf_prog_run_xdp callers? > > > > with synchronize_rcu_tasks I'm getting splats like: > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > It sounds like we are all talking past each other, leaving me no > choice but to supply a wall of text: > > It is quite true that synchronize_rcu_tasks_rude() will wait > for bh-disabled regions of code, just like synchronize_rcu() > and synchronize_rcu_tasks() will. However, please note that > synchronize_rcu_tasks() never waits on any of the idle tasks. So the > usual approach in tracing is to do both a synchronize_rcu_tasks() and > synchronize_rcu_tasks_rude(). One way of overlapping the resulting > pair of grace periods is to use synchronize_rcu_mult(). > > But none of these permit readers to sleep. That is what > synchronize_rcu_tasks_trace() is for, but unlike both > synchronize_rcu_tasks() and synchronize_rcu_tasks_rude(), > you must explicitly mark the readers with rcu_read_lock_trace() > and rcu_read_unlock_trace(). This is used to protect sleepable > BPF programs. > > Now, synchronize_rcu() will also wait on bh-disabled lines of code, with > the exception of such code in the exception path, way deep in the idle > loop, early in the CPU-online process, or late in the CPU-offline process. > You can recognize the first two categories of code by the noinstr tags > on the functions. > > And yes, synchronize_rcu_rude() is quite special. ;-) > > Does this help, or am I simply adding to the confusion? 
I see, so as Alexei said, to synchronize bpf_prog_run_xdp callers we should be able to use just synchronize_rcu, because it's always called in bh-disabled code thanks, jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 0:06 ` Jiri Olsa 2022-12-10 0:38 ` Paul E. McKenney @ 2022-12-10 1:12 ` Alexei Starovoitov 2022-12-10 13:11 ` Jiri Olsa 1 sibling, 1 reply; 26+ messages in thread From: Alexei Starovoitov @ 2022-12-10 1:12 UTC (permalink / raw) To: Jiri Olsa Cc: Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > relies heavily on RCU protection for its data structures, but because the > > > BPF program invocation happens entirely within a single local_bh_disable() > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > is safe is that readers can use anything that disables BH when updaters use > > > call_rcu() or synchronize_rcu(). [...] > > > > FWIW I sent a link to the thread to Paul and he confirmed > > the RCU will wait for just the BH. > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > Paul, > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > with bpf_prog_run_xdp callers? > > with synchronize_rcu_tasks I'm getting splats like: > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) Jiri, I haven't tried to repro this yet, but I feel you're on the wrong path here. The splat has this: ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 that test_run logic takes rcu_read_lock. See bpf_test_timer_enter. I suspect the addition of synchronize_rcu_tasks_rude only slows down the race. The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. See our new and fancy rcu_trace_implies_rcu_gp(), but I'm not sure it applies to synchronize_rcu_tasks_rude. Have you tried with just synchronize_rcu() ? If your theory about the race is correct then the vanila sync_rcu should help. If not, the issue is some place else. ^ permalink raw reply [flat|nested] 26+ messages in thread
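The test_run path Alexei refers to wraps the whole benchmark loop in an RCU read-side section; from net/bpf/test_run.c of that era it looks roughly like the sketch below (reproduced from memory, so treat the details as approximate):

	static void bpf_test_timer_enter(struct bpf_test_timer *t)
		__acquires(rcu)
	{
		rcu_read_lock();	/* held across every bpf_prog_run_xdp() iteration */
		if (t->mode == NO_PREEMPT)
			preempt_disable();
		else
			migrate_disable();

		t->i = 0;
		t->time_start = ktime_get_ns();
	}

That rcu_read_lock() is what makes a plain synchronize_rcu() in the updater sufficient for the reproducer's code path.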
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 1:12 ` Alexei Starovoitov @ 2022-12-10 13:11 ` Jiri Olsa 2022-12-12 15:04 ` Jiri Olsa 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-10 13:11 UTC (permalink / raw) To: Alexei Starovoitov Cc: Jiri Olsa, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Hao Sun, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: > On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > relies heavily on RCU protection for its data structures, but because the > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > is safe is that readers can use anything that disables BH when updaters use > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > the RCU will wait for just the BH. > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > Paul, > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > with bpf_prog_run_xdp callers? > > > > with synchronize_rcu_tasks I'm getting splats like: > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > Jiri, > > I haven't tried to repro this yet, but I feel you're on > the wrong path here. The splat has this: > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > that test_run logic takes rcu_read_lock. > See bpf_test_timer_enter. > I suspect the addition of synchronize_rcu_tasks_rude > only slows down the race. > The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. > See our new and fancy rcu_trace_implies_rcu_gp(), > but I'm not sure it applies to synchronize_rcu_tasks_rude. > Have you tried with just synchronize_rcu() ? > If your theory about the race is correct then > the vanila sync_rcu should help. > If not, the issue is some place else. synchronize_rcu seems to work as well, I'll keep the test running for some time thanks, jirka ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-10 13:11 ` Jiri Olsa @ 2022-12-12 15:04 ` Jiri Olsa 2022-12-13 2:26 ` Hao Sun 0 siblings, 1 reply; 26+ messages in thread From: Jiri Olsa @ 2022-12-12 15:04 UTC (permalink / raw) To: Jiri Olsa, Hao Sun Cc: Alexei Starovoitov, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis On Sat, Dec 10, 2022 at 02:11:34PM +0100, Jiri Olsa wrote: > On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: > > On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: > > > > On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: > > > > > fwiw, these should not be necessary, Documentation/RCU/checklist.rst : > > > > > > > > > > [...] One example of non-obvious pairing is the XDP feature in networking, > > > > > which calls BPF programs from network-driver NAPI (softirq) context. BPF > > > > > relies heavily on RCU protection for its data structures, but because the > > > > > BPF program invocation happens entirely within a single local_bh_disable() > > > > > section in a NAPI poll cycle, this usage is safe. The reason that this usage > > > > > is safe is that readers can use anything that disables BH when updaters use > > > > > call_rcu() or synchronize_rcu(). [...] > > > > > > > > FWIW I sent a link to the thread to Paul and he confirmed > > > > the RCU will wait for just the BH. > > > > > > so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side > > > > > > Paul, > > > any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog > > > with bpf_prog_run_xdp callers? > > > > > > with synchronize_rcu_tasks I'm getting splats like: > > > https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f > > > > > > synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) > > > > Jiri, > > > > I haven't tried to repro this yet, but I feel you're on > > the wrong path here. The splat has this: > > ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] > > ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 > > that test_run logic takes rcu_read_lock. > > See bpf_test_timer_enter. > > I suspect the addition of synchronize_rcu_tasks_rude > > only slows down the race. > > The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. > > See our new and fancy rcu_trace_implies_rcu_gp(), > > but I'm not sure it applies to synchronize_rcu_tasks_rude. > > Have you tried with just synchronize_rcu() ? > > If your theory about the race is correct then > > the vanila sync_rcu should help. > > If not, the issue is some place else. > > synchronize_rcu seems to work as well, I'll keep the test > running for some time looks good, Hao Sun, could you please test change below? 
thanks, jirka --- diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index c19719f48ce0..4b0fa5b98137 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) } __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); + synchronize_rcu(); if (new) d->image_off = noff; ^ permalink raw reply related [flat|nested] 26+ messages in thread
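Spelled out, the ordering this one-line change enforces (an informal annotation of the patch above, not additional code):

	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
			/* 1: the static call now routes into the new half */

	synchronize_rcu();
			/* 2: wait out every BH-disabled (or rcu_read_lock-holding)
			 *    caller that may still be executing in the old half */

	if (new)
		d->image_off = noff;
			/* 3: only now may a subsequent update pick the old half
			 *    as unused and rewrite it */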
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp 2022-12-12 15:04 ` Jiri Olsa @ 2022-12-13 2:26 ` Hao Sun 0 siblings, 0 replies; 26+ messages in thread From: Hao Sun @ 2022-12-13 2:26 UTC (permalink / raw) To: Jiri Olsa Cc: Alexei Starovoitov, Jakub Kicinski, Paul E. McKenney, Daniel Borkmann, Yonghong Song, Song Liu, Peter Zijlstra, bpf, Alexei Starovoitov, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, David Miller, Jesper Dangaard Brouer, Linux Kernel Mailing List, netdev, Thorsten Leemhuis > On 12 Dec 2022, at 11:04 PM, Jiri Olsa <olsajiri@gmail.com> wrote: > > On Sat, Dec 10, 2022 at 02:11:34PM +0100, Jiri Olsa wrote: >> On Fri, Dec 09, 2022 at 05:12:03PM -0800, Alexei Starovoitov wrote: >>> On Fri, Dec 9, 2022 at 4:06 PM Jiri Olsa <olsajiri@gmail.com> wrote: >>>> >>>> On Fri, Dec 09, 2022 at 03:34:45PM -0800, Jakub Kicinski wrote: >>>>> On Sat, 10 Dec 2022 00:32:07 +0100 Daniel Borkmann wrote: >>>>>> fwiw, these should not be necessary, Documentation/RCU/checklist.rst : >>>>>> >>>>>> [...] One example of non-obvious pairing is the XDP feature in networking, >>>>>> which calls BPF programs from network-driver NAPI (softirq) context. BPF >>>>>> relies heavily on RCU protection for its data structures, but because the >>>>>> BPF program invocation happens entirely within a single local_bh_disable() >>>>>> section in a NAPI poll cycle, this usage is safe. The reason that this usage >>>>>> is safe is that readers can use anything that disables BH when updaters use >>>>>> call_rcu() or synchronize_rcu(). [...] >>>>> >>>>> FWIW I sent a link to the thread to Paul and he confirmed >>>>> the RCU will wait for just the BH. >>>> >>>> so IIUC we can omit the rcu_read_lock/unlock on bpf_prog_run_xdp side >>>> >>>> Paul, >>>> any thoughts on what we can use in here to synchronize bpf_dispatcher_change_prog >>>> with bpf_prog_run_xdp callers? >>>> >>>> with synchronize_rcu_tasks I'm getting splats like: >>>> https://lore.kernel.org/bpf/20221209153445.22182ca5@kernel.org/T/#m0a869f93404a2744884d922bc96d497ffe8f579f >>>> >>>> synchronize_rcu_tasks_rude seems to work (patch below), but it also sounds special ;-) >>> >>> Jiri, >>> >>> I haven't tried to repro this yet, but I feel you're on >>> the wrong path here. The splat has this: >>> ? bpf_prog_run_xdp include/linux/filter.h:775 [inline] >>> ? bpf_test_run+0x2ce/0x990 net/bpf/test_run.c:400 >>> that test_run logic takes rcu_read_lock. >>> See bpf_test_timer_enter. >>> I suspect the addition of synchronize_rcu_tasks_rude >>> only slows down the race. >>> The synchronize_rcu_tasks_trace also behaves like synchronize_rcu. >>> See our new and fancy rcu_trace_implies_rcu_gp(), >>> but I'm not sure it applies to synchronize_rcu_tasks_rude. >>> Have you tried with just synchronize_rcu() ? >>> If your theory about the race is correct then >>> the vanila sync_rcu should help. >>> If not, the issue is some place else. >> >> synchronize_rcu seems to work as well, I'll keep the test >> running for some time > > looks good, Hao Sun, could you please test change below? Hi, Tested on a latest bpf-next build. The reproducer would trigger the Oops in 5 mins without the patch. After applying the patch, the reproducer cannot trigger any issue for more than 15 mins. 
Seems working, tested on: HEAD commit: ef3911a3e4d6 docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE git tree: bpf-next kernel config: https://pastebin.com/raw/rZdWLcgK C reproducer: https://pastebin.com/raw/GFfDn2Gk > > --- > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c > index c19719f48ce0..4b0fa5b98137 100644 > --- a/kernel/bpf/dispatcher.c > +++ b/kernel/bpf/dispatcher.c > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) > } > > __BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func); > + synchronize_rcu(); > > if (new) > d->image_off = noff; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot 2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun 2022-12-06 6:46 ` Hao Sun @ 2022-12-08 8:44 ` Thorsten Leemhuis 2022-12-19 9:59 ` Thorsten Leemhuis 1 sibling, 1 reply; 26+ messages in thread From: Thorsten Leemhuis @ 2022-12-08 8:44 UTC (permalink / raw) To: bpf, regressions@lists.linux.dev; +Cc: Linux Kernel Mailing List, netdev [Note: this mail contains only information for Linux kernel regression tracking. Mails like these contain '#forregzbot' in the subject to make them easy to spot and filter out. The author also tried to remove most or all individuals from the list of recipients to spare them the hassle.] On 06.12.22 04:28, Hao Sun wrote: > > The following crash can be triggered with the BPF prog provided. > It seems the verifier passed some invalid progs. I will try to simplify > the C reproducer, for now, the following can reproduce this: Thanks for the report. To be sure the below issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced c86df29d11df #regzbot title net/bpf: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot 2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis @ 2022-12-19 9:59 ` Thorsten Leemhuis 0 siblings, 0 replies; 26+ messages in thread From: Thorsten Leemhuis @ 2022-12-19 9:59 UTC (permalink / raw) To: bpf, regressions@lists.linux.dev; +Cc: Linux Kernel Mailing List, netdev On 08.12.22 09:44, Thorsten Leemhuis wrote: > [Note: this mail contains only information for Linux kernel regression > tracking. Mails like these contain '#forregzbot' in the subject to make > then easy to spot and filter out. The author also tried to remove most > or all individuals from the list of recipients to spare them the hassle.] > > On 06.12.22 04:28, Hao Sun wrote: >> >> The following crash can be triggered with the BPF prog provided. >> It seems the verifier passed some invalid progs. I will try to simplify >> the C reproducer, for now, the following can reproduce this: > > Thanks for the report. To be sure below issue doesn't fall through the > cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression > tracking bot: > > #regzbot ^introduced c86df29d11df > #regzbot title net/bpf: BUG: unable to handle kernel paging request in > bpf_dispatcher_xdp > #regzbot ignore-activity #regzbot fix: bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func ^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2022-12-06 3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun
2022-12-06 6:46 ` Hao Sun
2022-12-06 15:18 ` Jiri Olsa
2022-12-07 19:57 ` Alexei Starovoitov
2022-12-08 17:48 ` Alexei Starovoitov
2022-12-08 18:06 ` Jiri Olsa
[not found] ` <Y5JkomOZaCETLDaZ@krava>
2022-12-08 23:02 ` Jiri Olsa
2022-12-09 7:09 ` Jiri Olsa
[not found] ` <Y5MaffJOe1QtumSN@krava>
2022-12-09 13:50 ` Jiri Olsa
2022-12-09 15:20 ` Jiri Olsa
2022-12-09 20:31 ` Yonghong Song
2022-12-09 21:53 ` Jiri Olsa
2022-12-09 22:41 ` Daniel Borkmann
2022-12-09 23:07 ` Jiri Olsa
2022-12-09 23:29 ` Jiri Olsa
2022-12-09 23:32 ` Daniel Borkmann
2022-12-09 23:34 ` Jakub Kicinski
2022-12-10 0:06 ` Jiri Olsa
2022-12-10 0:38 ` Paul E. McKenney
2022-12-10 13:05 ` Jiri Olsa
2022-12-10 1:12 ` Alexei Starovoitov
2022-12-10 13:11 ` Jiri Olsa
2022-12-12 15:04 ` Jiri Olsa
2022-12-13 2:26 ` Hao Sun
2022-12-08 8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis
2022-12-19 9:59 ` Thorsten Leemhuis