* NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists
@ 2024-10-01 11:30 Yury Vostrikov
2024-10-01 13:36 ` Sebastian Andrzej Siewior
2024-10-01 13:36 ` Toke Høiland-Jørgensen
0 siblings, 2 replies; 5+ messages in thread
From: Yury Vostrikov @ 2024-10-01 11:30 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: bpf
Hi,
I stumbled upon a NULL pointer derefence inside BPF code. The triggering condition is
message from OOM killer + netconsole. The crash happens at
u32 kern_flags = bpf_net_ctx->ri.kern_flags;
line of bpf_net_ctx_get_all_used_flush_lists() function. bpf_net_ctx is NULL here. With trivial fix
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7d7578a8eac1..cba16bf307f7 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -844,6 +844,9 @@ static inline void bpf_net_ctx_get_all_used_flush_lists(struct list_head **lh_ma
struct list_head **lh_xsk)
{
struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
+ WARN_ON(bpf_net_ctx == NULL);
+ if (bpf_net_ctx == NULL)
+ return;
u32 kern_flags = bpf_net_ctx->ri.kern_flags;
struct list_head *lh;
I get the following backtrace instead of crash:
[ 177.216427] ------------[ cut here ]------------
[ 177.216428] WARNING: CPU: 25 PID: 4584 at include/linux/filter.h:847 xdp_do_flush+0x7b/0x80
[ 177.216430] Modules linked in: veth snd_hrtimer snd_seq bridge stp llc nfnetlink overlay netconsole configfs nfs lockd grace netfs sunrpc af_packet 8021q nls_iso8859_1 nls_cp866 vfat fat i2c_dev cpuid edac_mce_amd edac_core snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi kvm uvcvideo sha512_ssse3 snd_usb_audio videobuf2_vmalloc sha512_generic snd_hda_intel videobuf2_memops uvc snd_intel_dspcfg sha256_ssse3 videobuf2_v4l2 snd_hwdep snd_hda_codec aesni_intel snd_usbmidi_lib sfc_siena videodev snd_hda_core snd_rawmidi gf128mul videobuf2_common snd_seq_device crypto_simd mdio snd_pcm mc i2c_designware_platform cryptd wmi_bmof i2c_piix4 ahci ptp efi_pstore snd_timer i2c_designware_core k10temp i2c_smbus libahci pps_core snd soundcore ccp sha1_generic rng_core efivarfs hid_logitech_hidpp hid_lenovo hid_logitech_dj input_leds led_class hid_generic usbhid hid dm_mod amdgpu i2c_algo_bit drm_ttm_helper ttm drm_exec drm_suballoc_helper amdxcp mfd
[ 177.216455] drm_display_helper drm_kms_helper xhci_pci xhci_hcd drm cec thermal video wmi evdev
[ 177.216458] CPU: 25 UID: 0 PID: 4584 Comm: spill Not tainted 6.12.0-rc1-00031-ge32cde8d2bd7-dirty #3
[ 177.216459] Hardware name: Gigabyte Technology Co., Ltd. B650M AORUS ELITE AX/B650M AORUS ELITE AX, BIOS F32b 09/03/2024
[ 177.216459] RIP: 0010:xdp_do_flush+0x7b/0x80
[ 177.216460] Code: db 74 16 48 89 df 5b e9 e3 9b af ff 85 c9 74 09 48 8b 40 40 48 39 c3 75 e5 5b c3 cc cc cc cc 48 85 ff 74 f5 5b e9 15 7e af ff <0f> 0b eb c6 90 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca
[ 177.216460] RSP: 0000:ffff8881b892f818 EFLAGS: 00010046
[ 177.216461] RAX: 0000000000000000 RBX: ffff88810a603030 RCX: 000000000000000f
[ 177.216461] RDX: ffff888109fe88c0 RSI: 0000000000000000 RDI: ffff8881b892f838
[ 177.216461] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 177.216462] R10: 2c545830061c0544 R11: 1ced901e1303e22e R12: ffff88810a603480
[ 177.216462] R13: ffff88810a603000 R14: ffff888109fe88c0 R15: ffff88810a603980
[ 177.216462] FS: 00000000004d9790(0000) GS:ffff888ffe440000(0000) knlGS:0000000000000000
[ 177.216463] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 177.216463] CR2: 0000000000489418 CR3: 00000001c457a000 CR4: 0000000000350ef0
[ 177.216464] Call Trace:
[ 177.216464] <TASK>
[ 177.216465] ? __warn+0x81/0x110
[ 177.216467] ? xdp_do_flush+0x7b/0x80
[ 177.216468] ? report_bug+0x14c/0x170
[ 177.216469] ? handle_bug+0x53/0x90
[ 177.216470] ? exc_invalid_op+0x13/0x60
[ 177.216471] ? asm_exc_invalid_op+0x16/0x20
[ 177.216472] ? xdp_do_flush+0x7b/0x80
[ 177.216474] efx_poll+0x178/0x380 [sfc_siena]
[ 177.216479] netpoll_poll_dev+0x118/0x1b0
[ 177.216481] __netpoll_send_skb+0x1ae/0x240
[ 177.216482] netpoll_send_udp+0x2e5/0x400
[ 177.216484] write_msg+0xeb/0x100 [netconsole]
[ 177.216486] console_flush_all+0x261/0x440
[ 177.216489] console_unlock+0x71/0xf0
[ 177.216490] vprintk_emit+0x251/0x2b0
[ 177.216491] _printk+0x48/0x50
[ 177.216492] seq_buf_do_printk+0x62/0xb0
[ 177.216493] mem_cgroup_print_oom_meminfo+0xc6/0x130
[ 177.216494] dump_header+0x52/0x1a0
[ 177.216495] oom_kill_process+0xf2/0x1e0
[ 177.216496] out_of_memory+0xec/0x2f0
[ 177.216497] mem_cgroup_out_of_memory+0x118/0x130
[ 177.216498] try_charge_memcg+0x43a/0x5f0
[ 177.216499] charge_memcg+0x2f/0x70
[ 177.216499] __mem_cgroup_charge+0x2c/0x80
[ 177.216500] filemap_add_folio+0x33/0xc0
[ 177.216501] __filemap_get_folio+0x165/0x2c0
[ 177.216502] filemap_fault+0x5c2/0xc70
[ 177.216502] __do_fault+0x2e/0xb0
[ 177.216503] __handle_mm_fault+0xf13/0x14b0
[ 177.216504] ? copy_fpstate_to_sigframe+0x26e/0x2b0
[ 177.216505] handle_mm_fault+0x1a9/0x2f0
[ 177.216506] do_user_addr_fault+0x1fb/0x600
[ 177.216507] exc_page_fault+0x69/0x110
[ 177.216507] asm_exc_page_fault+0x22/0x30
[ 177.216508] RIP: 0033:0x44ace6
[ 177.216509] Code: 89 d6 48 c1 ea 0c 81 e6 ff 0f 00 00 48 c1 ee 08 48 8d 14 92 48 c1 e2 02 48 03 91 98 00 00 00 0f 1f 44 00 00 48 83 fe 10 73 73 <8b> 3a 0f b6 54 32 04 01 fa eb 02 89 fa 48 8b b1 88 00 00 00 8d 7a
[ 177.216509] RSP: 002b:000000c00000f310 EFLAGS: 00010287
[ 177.216509] RAX: 0000000000046ae0 RBX: 000000c00000f398 RCX: 00000000004d5840
[ 177.216510] RDX: 0000000000489418 RSI: 000000000000000a RDI: 0000000000447ae0
[ 177.216510] RBP: 000000c00000f320 R08: 0000000000000000 R09: 0000000000000001
[ 177.216510] R10: 0000000000000000 R11: 0000000000000041 R12: 0000000000000400
[ 177.216510] R13: 00007f0b4b43f468 R14: 000000c000006000 R15: 00ffffffffffffff
[ 177.216511] </TASK>
[ 177.216511] ---[ end trace 0000000000000000 ]---
I'm out of my depth figuring out why bpf_net_ctx_get() returns NULL. For now I'm simply running with NULL check enabled.
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists
2024-10-01 11:30 NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists Yury Vostrikov
@ 2024-10-01 13:36 ` Sebastian Andrzej Siewior
2024-10-01 14:01 ` Edward Cree
2024-10-01 13:36 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-10-01 13:36 UTC (permalink / raw)
To: Yury Vostrikov; +Cc: bpf, Edward Cree, Martin Habets, Thomas Gleixner
On 2024-10-01 13:30:13 [+0200], Yury Vostrikov wrote:
> Hi,
Hi,
…
> I get the following backtrace instead of crash:
>
> [ 177.216427] ------------[ cut here ]------------
…
> [ 177.216464] Call Trace:
> [ 177.216464] <TASK>
> [ 177.216474] efx_poll+0x178/0x380 [sfc_siena]
> [ 177.216479] netpoll_poll_dev+0x118/0x1b0
> [ 177.216481] __netpoll_send_skb+0x1ae/0x240
> [ 177.216482] netpoll_send_udp+0x2e5/0x400
> [ 177.216484] write_msg+0xeb/0x100 [netconsole]
> [ 177.216486] console_flush_all+0x261/0x440
> [ 177.216489] console_unlock+0x71/0xf0
> [ 177.216490] vprintk_emit+0x251/0x2b0
> [ 177.216491] _printk+0x48/0x50
…
> I'm out of my depth figuring out why bpf_net_ctx_get() returns NULL. For now I'm simply running with NULL check enabled.
netpoll_send_udp() Does not assign a context and invokes a NAPI poll.
However with a budget of 0 to just clean the TX resources.
Now, the SFC driver does not clean any RX packets but invokes
xdp_do_flush() anyway which leads to the crash later on.
Are the SFC maintainer against the following:
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index c9e17a8208a90..f3288e02c1bd8 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget)
spent = efx_process_channel(channel, budget);
- xdp_do_flush();
+ if (spent)
+ xdp_do_flush();
if (spent < budget) {
if (efx_channel_has_rx_queue(channel) &&
diff --git a/drivers/net/ethernet/sfc/siena/efx_channels.c b/drivers/net/ethernet/sfc/siena/efx_channels.c
index a7346e965bfe7..2b8b7c69bd7ae 100644
--- a/drivers/net/ethernet/sfc/siena/efx_channels.c
+++ b/drivers/net/ethernet/sfc/siena/efx_channels.c
@@ -1285,7 +1285,8 @@ static int efx_poll(struct napi_struct *napi, int budget)
spent = efx_process_channel(channel, budget);
- xdp_do_flush();
+ if (spent)
+ xdp_do_flush();
if (spent < budget) {
if (efx_channel_has_rx_queue(channel) &&
This should fix the crash. As an alternative we could keep track of
channel->n_rx_xdp_redirect before and after the efx_process_channel()
invocation to avoid the flush if there is no XDP done.
Sebastian
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists
2024-10-01 11:30 NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists Yury Vostrikov
2024-10-01 13:36 ` Sebastian Andrzej Siewior
@ 2024-10-01 13:36 ` Toke Høiland-Jørgensen
1 sibling, 0 replies; 5+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-10-01 13:36 UTC (permalink / raw)
To: Yury Vostrikov, Sebastian Andrzej Siewior; +Cc: bpf
"Yury Vostrikov" <mon@unformed.ru> writes:
> Hi,
>
> I stumbled upon a NULL pointer derefence inside BPF code. The triggering condition is
> message from OOM killer + netconsole. The crash happens at
>
> u32 kern_flags = bpf_net_ctx->ri.kern_flags;
>
> line of bpf_net_ctx_get_all_used_flush_lists() function. bpf_net_ctx is NULL here. With trivial fix
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 7d7578a8eac1..cba16bf307f7 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -844,6 +844,9 @@ static inline void bpf_net_ctx_get_all_used_flush_lists(struct list_head **lh_ma
> struct list_head **lh_xsk)
> {
> struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> + WARN_ON(bpf_net_ctx == NULL);
> + if (bpf_net_ctx == NULL)
> + return;
> u32 kern_flags = bpf_net_ctx->ri.kern_flags;
> struct list_head *lh;
>
> I get the following backtrace instead of crash:
[...]
> [ 177.216474] efx_poll+0x178/0x380 [sfc_siena]
Looks like the sfc driver is missing the context setup stuff entirely...
-Toke
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists
2024-10-01 13:36 ` Sebastian Andrzej Siewior
@ 2024-10-01 14:01 ` Edward Cree
2024-10-01 14:04 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 5+ messages in thread
From: Edward Cree @ 2024-10-01 14:01 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Yury Vostrikov
Cc: bpf, Martin Habets, Thomas Gleixner
On 01/10/2024 14:36, Sebastian Andrzej Siewior wrote:
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index c9e17a8208a90..f3288e02c1bd8 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget)
>
> spent = efx_process_channel(channel, budget);
>
> - xdp_do_flush();
> + if (spent)
> + xdp_do_flush();
>
> if (spent < budget) {
> if (efx_channel_has_rx_queue(channel) &&
Seems reasonable to me.
Another alternative is to look at budget rather than spent,
as that seems like the condition that drives whether we
have a real XDP.
-ed
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists
2024-10-01 14:01 ` Edward Cree
@ 2024-10-01 14:04 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-10-01 14:04 UTC (permalink / raw)
To: Edward Cree; +Cc: Yury Vostrikov, bpf, Martin Habets, Thomas Gleixner
On 2024-10-01 15:01:25 [+0100], Edward Cree wrote:
> On 01/10/2024 14:36, Sebastian Andrzej Siewior wrote:
> > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> > index c9e17a8208a90..f3288e02c1bd8 100644
> > --- a/drivers/net/ethernet/sfc/efx_channels.c
> > +++ b/drivers/net/ethernet/sfc/efx_channels.c
> > @@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget)
> >
> > spent = efx_process_channel(channel, budget);
> >
> > - xdp_do_flush();
> > + if (spent)
> > + xdp_do_flush();
> >
> > if (spent < budget) {
> > if (efx_channel_has_rx_queue(channel) &&
>
> Seems reasonable to me.
> Another alternative is to look at budget rather than spent,
> as that seems like the condition that drives whether we
> have a real XDP.
If you prefer. What are chances that you had budget but cleaned nothing?
> -ed
Sebastian
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-10-01 14:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-01 11:30 NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists Yury Vostrikov
2024-10-01 13:36 ` Sebastian Andrzej Siewior
2024-10-01 14:01 ` Edward Cree
2024-10-01 14:04 ` Sebastian Andrzej Siewior
2024-10-01 13:36 ` Toke Høiland-Jørgensen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox