netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Unchecked sock pointer causes panic in RAW_TP
@ 2025-01-31 20:32 Yan Zhai
  2025-01-31 22:38 ` Kuniyuki Iwashima
  0 siblings, 1 reply; 5+ messages in thread
From: Yan Zhai @ 2025-01-31 20:32 UTC (permalink / raw)
  To: bpf; +Cc: netdev, linux-kernel, kernel-team

Hello,

We encountered a panic when tracing kfree_skb with RAW_TP. The problematic
argument was introduced in commit ba8de796baf4 ("net: introduce
sk_skb_reason_drop function"). It turns out that the verifier still accepted
the program despite it didn't test sk == NULL. And this caused kernel panic. I
attached a small reproducer and panic trace at the end. It's stably
reproducible when packets are dropped without a receiver (e.g. run iperf2 UDP
test toward localhost), in both 6.12.11 release and a recent bpf-next master
snapshot (I was using commit c03320a6768c).

As a contrast, for another tracepoint like tcp_send_reset, if sk is not checked
before dereferencing, the verifier will complain and reject the program as
expected. So this feels like some annotation is missing? Appreciate if someone
could help me figure out.

thanks
Yan

----- Reproducer and panic trace ----
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("tp_btf/tcp_send_reset")
int BPF_PROG(tcp_send_reset, struct sock *sk, struct sk_buff *skb)
{
        if (skb && sk && sk->__sk_common.skc_state == TCP_LISTEN) {
                bpf_printk("sk: %d, %d\n", sk, sk->__sk_common.skc_family);
        }
        return 0;
}

SEC("tp_btf/kfree_skb")
int BPF_PROG(drop, struct sk_buff *skb, void *location,
             enum skb_drop_reason reason, struct sock *sk)
{
        bpf_printk("sk: %d, %d\n", sk, sk->__sk_common.skc_family);
        return 0;
}

Byte code:
int drop(unsigned long long * ctx):
; int BPF_PROG(drop, struct sk_buff *skb, void *location,
   0: (79) r3 = *(u64 *)(r1 +24)
; bpf_printk("sk: %d, %d\n", sk, sk->__sk_common.skc_family);
   1: (69) r4 = *(u16 *)(r3 +16)
   2: (18) r1 = map[id:7][0]+12
   4: (b7) r2 = 12
   5: (85) call bpf_trace_printk#-63104
; int BPF_PROG(drop, struct sk_buff *skb, void *location,
   6: (b7) r0 = 0
   7: (95) exit

Trace:
[   29.982295][  T348] BUG: kernel NULL pointer
dereference, address: 0000000000000010
[   29.983487][  T348] #PF: supervisor read access in kernel mode
[   29.984326][  T348] #PF: error_code(0x0000) - not-present page
[   29.985138][  T348] PGD 0 P4D 0
[   29.985654][  T348] Oops: Oops: 0000 [#1] PREEMPT SMP
[   29.986351][  T348] CPU: 6 UID: 0 PID: 348 Comm: sshd Not tainted
6.12.11 #206
[   29.987309][  T348] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   29.988678][  T348] RIP:
0010:bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
[   29.989553][  T348] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 00 55 48 89
e5 48 8b 57 18 <48> 0f b7 4a 10 48 bf 0c 4f e2 c1 ad 90 ff ff be 0c 00
00 00 e8 0f
[   29.992008][  T348] RSP: 0018:ffffa86640b53da8 EFLAGS: 00010202
[   29.992811][  T348] RAX: 0000000000000001 RBX: ffffa866402d1000
RCX: 0000000000000002
[   29.993852][  T348] RDX: 0000000000000000 RSI: ffffa866402d1048
RDI: ffffa86640b53dc8
[   29.994929][  T348] RBP: ffffa86640b53da8 R08: 0000000000000000
R09: 9c908cd09b9c8c91
[   29.995991][  T348] R10: ffff90adc056b540 R11: 0000000000000002
R12: 0000000000000000
[   29.997043][  T348] R13: ffffa86640b53e88 R14: 0000000000000800
R15: fffffffffffffffe
[   29.998097][  T348] FS:  00007f2a27c2b480(0000)
GS:ffff90b0efd00000(0000) knlGS:0000000000000000
[   29.999279][  T348] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[   30.000161][  T348] CR2: 0000000000000010 CR3: 0000000100e69004
CR4: 00000000001726f0
[   30.001217][  T348] Call Trace:
[   30.001724][  T348]  <TASK>
[   30.002145][  T348]  ? __die+0x1f/0x60
[   30.002694][  T348]  ? page_fault_oops+0x148/0x420
[   30.003386][  T348]  ? search_bpf_extables+0x5b/0x70
[   30.004082][  T348]  ? fixup_exception+0x27/0x2c0
[   30.004748][  T348]  ? exc_page_fault+0x75/0x170
[   30.005416][  T348]  ? asm_exc_page_fault+0x22/0x30
[   30.006104][  T348]  ? bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
[   30.006923][  T348]  bpf_trace_run4+0x68/0xd0
[   30.007566][  T348]  ? unix_stream_connect+0x1f4/0x6f0
[   30.008274][  T348]  sk_skb_reason_drop+0x90/0x120
[   30.008960][  T348]  unix_stream_connect+0x1f4/0x6f0
[   30.009662][  T348]  __sys_connect+0x7f/0xb0
[   30.010267][  T348]  __x64_sys_connect+0x14/0x20
[   30.010927][  T348]  do_syscall_64+0x47/0xc30
[   30.011567][  T348]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[   30.012371][  T348] RIP: 0033:0x7f2a27f296a0
[   30.012998][  T348] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f
1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 41 ff 0c 00 00 74 17 b8 2a
00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
ec 18 89 54
[   30.015491][  T348] RSP: 002b:00007ffe29274f58 EFLAGS: 00000202
ORIG_RAX: 000000000000002a




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Unchecked sock pointer causes panic in RAW_TP
  2025-01-31 20:32 Unchecked sock pointer causes panic in RAW_TP Yan Zhai
@ 2025-01-31 22:38 ` Kuniyuki Iwashima
  2025-01-31 23:28   ` Kuniyuki Iwashima
  0 siblings, 1 reply; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-31 22:38 UTC (permalink / raw)
  To: yan; +Cc: bpf, kernel-team, linux-kernel, netdev, kuniyu

From: Yan Zhai <yan@cloudflare.com>
Date: Fri, 31 Jan 2025 12:32:57 -0800
> Hello,
> 
> We encountered a panic when tracing kfree_skb with RAW_TP. The problematic
> argument was introduced in commit ba8de796baf4 ("net: introduce
> sk_skb_reason_drop function"). It turns out that the verifier still accepted
> the program despite it didn't test sk == NULL. And this caused kernel panic. I
> attached a small reproducer and panic trace at the end. It's stably
> reproducible when packets are dropped without a receiver (e.g. run iperf2 UDP
> test toward localhost), in both 6.12.11 release and a recent bpf-next master
> snapshot (I was using commit c03320a6768c).
> 
> As a contrast, for another tracepoint like tcp_send_reset, if sk is not checked
> before dereferencing, the verifier will complain and reject the program as
> expected. So this feels like some annotation is missing? Appreciate if someone
> could help me figure out.

Maybe __nullable is missing given we annotated skb for tcp_send_reset ?
https://lore.kernel.org/netdev/20240911033719.91468-4-lulie@linux.alibaba.com/

completely untested:

---8<---
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index b877133cd93a..34accc5929d6 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -24,14 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
 TRACE_EVENT(kfree_skb,
 
 	TP_PROTO(struct sk_buff *skb, void *location,
-		 enum skb_drop_reason reason, struct sock *rx_sk),
+		 enum skb_drop_reason reason, struct sock *rx_sk__nullable),
 
-	TP_ARGS(skb, location, reason, rx_sk),
+	TP_ARGS(skb, location, reason, rx_sk__nullable),
 
 	TP_STRUCT__entry(
 		__field(void *,		skbaddr)
 		__field(void *,		location)
-		__field(void *,		rx_sk)
+		__field(void *,		rx_sk__nullable)
 		__field(unsigned short,	protocol)
 		__field(enum skb_drop_reason,	reason)
 	),
@@ -39,7 +39,7 @@ TRACE_EVENT(kfree_skb,
 	TP_fast_assign(
 		__entry->skbaddr = skb;
 		__entry->location = location;
-		__entry->rx_sk = rx_sk;
+		__entry->rx_sk = rx_sk__nullable;
 		__entry->protocol = ntohs(skb->protocol);
 		__entry->reason = reason;
 	),
---8<---

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Unchecked sock pointer causes panic in RAW_TP
  2025-01-31 22:38 ` Kuniyuki Iwashima
@ 2025-01-31 23:28   ` Kuniyuki Iwashima
  2025-02-01  0:24     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-31 23:28 UTC (permalink / raw)
  To: kuniyu; +Cc: bpf, kernel-team, linux-kernel, netdev, yan

From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Fri, 31 Jan 2025 14:38:38 -0800
> From: Yan Zhai <yan@cloudflare.com>
> Date: Fri, 31 Jan 2025 12:32:57 -0800
> > Hello,
> > 
> > We encountered a panic when tracing kfree_skb with RAW_TP. The problematic
> > argument was introduced in commit ba8de796baf4 ("net: introduce
> > sk_skb_reason_drop function"). It turns out that the verifier still accepted
> > the program despite it didn't test sk == NULL. And this caused kernel panic. I
> > attached a small reproducer and panic trace at the end. It's stably
> > reproducible when packets are dropped without a receiver (e.g. run iperf2 UDP
> > test toward localhost), in both 6.12.11 release and a recent bpf-next master
> > snapshot (I was using commit c03320a6768c).
> > 
> > As a contrast, for another tracepoint like tcp_send_reset, if sk is not checked
> > before dereferencing, the verifier will complain and reject the program as
> > expected. So this feels like some annotation is missing? Appreciate if someone
> > could help me figure out.
> 
> Maybe __nullable is missing given we annotated skb for tcp_send_reset ?
> https://lore.kernel.org/netdev/20240911033719.91468-4-lulie@linux.alibaba.com/
> 
> completely untested:
> 
> ---8<---
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index b877133cd93a..34accc5929d6 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -24,14 +24,14 @@ DEFINE_DROP_REASON(FN, FN)
>  TRACE_EVENT(kfree_skb,
>  
>  	TP_PROTO(struct sk_buff *skb, void *location,
> -		 enum skb_drop_reason reason, struct sock *rx_sk),
> +		 enum skb_drop_reason reason, struct sock *rx_sk__nullable),
>  
> -	TP_ARGS(skb, location, reason, rx_sk),
> +	TP_ARGS(skb, location, reason, rx_sk__nullable),
>  
>  	TP_STRUCT__entry(
>  		__field(void *,		skbaddr)
>  		__field(void *,		location)
> -		__field(void *,		rx_sk)
> +		__field(void *,		rx_sk__nullable)

This part is unnecessary.


>  		__field(unsigned short,	protocol)
>  		__field(enum skb_drop_reason,	reason)
>  	),
> @@ -39,7 +39,7 @@ TRACE_EVENT(kfree_skb,
>  	TP_fast_assign(
>  		__entry->skbaddr = skb;
>  		__entry->location = location;
> -		__entry->rx_sk = rx_sk;
> +		__entry->rx_sk = rx_sk__nullable;
>  		__entry->protocol = ntohs(skb->protocol);
>  		__entry->reason = reason;
>  	),
> ---8<---
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Unchecked sock pointer causes panic in RAW_TP
  2025-01-31 23:28   ` Kuniyuki Iwashima
@ 2025-02-01  0:24     ` Kuniyuki Iwashima
  2025-02-01  0:55       ` Yan Zhai
  0 siblings, 1 reply; 5+ messages in thread
From: Kuniyuki Iwashima @ 2025-02-01  0:24 UTC (permalink / raw)
  To: kuniyu; +Cc: bpf, kernel-team, linux-kernel, netdev, yan

From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Fri, 31 Jan 2025 15:28:51 -0800
> From: Kuniyuki Iwashima <kuniyu@amazon.com>
> Date: Fri, 31 Jan 2025 14:38:38 -0800
> > From: Yan Zhai <yan@cloudflare.com>
> > Date: Fri, 31 Jan 2025 12:32:57 -0800
> > > Hello,
> > > 
> > > We encountered a panic when tracing kfree_skb with RAW_TP. The problematic
> > > argument was introduced in commit ba8de796baf4 ("net: introduce
> > > sk_skb_reason_drop function"). It turns out that the verifier still accepted
> > > the program despite it didn't test sk == NULL. And this caused kernel panic. I
> > > attached a small reproducer and panic trace at the end. It's stably
> > > reproducible when packets are dropped without a receiver (e.g. run iperf2 UDP
> > > test toward localhost), in both 6.12.11 release and a recent bpf-next master
> > > snapshot (I was using commit c03320a6768c).
> > > 
> > > As a contrast, for another tracepoint like tcp_send_reset, if sk is not checked
> > > before dereferencing, the verifier will complain and reject the program as
> > > expected. So this feels like some annotation is missing? Appreciate if someone
> > > could help me figure out.
> > 
> > Maybe __nullable is missing given we annotated skb for tcp_send_reset ?
> > https://lore.kernel.org/netdev/20240911033719.91468-4-lulie@linux.alibaba.com/

Just for the record, I posted the fix:
https://lore.kernel.org/bpf/20250201001425.42377-1-kuniyu@amazon.com/T/#u

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Unchecked sock pointer causes panic in RAW_TP
  2025-02-01  0:24     ` Kuniyuki Iwashima
@ 2025-02-01  0:55       ` Yan Zhai
  0 siblings, 0 replies; 5+ messages in thread
From: Yan Zhai @ 2025-02-01  0:55 UTC (permalink / raw)
  To: Kuniyuki Iwashima; +Cc: bpf, kernel-team, linux-kernel, netdev

On Fri, Jan 31, 2025 at 6:25 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Kuniyuki Iwashima <kuniyu@amazon.com>
> Date: Fri, 31 Jan 2025 15:28:51 -0800
> > From: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Date: Fri, 31 Jan 2025 14:38:38 -0800
> > > From: Yan Zhai <yan@cloudflare.com>
> > > Date: Fri, 31 Jan 2025 12:32:57 -0800
> > > > Hello,
> > > >
> > > > We encountered a panic when tracing kfree_skb with RAW_TP. The problematic
> > > > argument was introduced in commit ba8de796baf4 ("net: introduce
> > > > sk_skb_reason_drop function"). It turns out that the verifier still accepted
> > > > the program despite it didn't test sk == NULL. And this caused kernel panic. I
> > > > attached a small reproducer and panic trace at the end. It's stably
> > > > reproducible when packets are dropped without a receiver (e.g. run iperf2 UDP
> > > > test toward localhost), in both 6.12.11 release and a recent bpf-next master
> > > > snapshot (I was using commit c03320a6768c).
> > > >
> > > > As a contrast, for another tracepoint like tcp_send_reset, if sk is not checked
> > > > before dereferencing, the verifier will complain and reject the program as
> > > > expected. So this feels like some annotation is missing? Appreciate if someone
> > > > could help me figure out.
> > >
> > > Maybe __nullable is missing given we annotated skb for tcp_send_reset ?
> > > https://lore.kernel.org/netdev/20240911033719.91468-4-lulie@linux.alibaba.com/
>
> Just for the record, I posted the fix:
> https://lore.kernel.org/bpf/20250201001425.42377-1-kuniyu@amazon.com/T/#u

This fix in the other thread is working as expected, thanks for the quick patch!

Yan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-02-01  0:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-31 20:32 Unchecked sock pointer causes panic in RAW_TP Yan Zhai
2025-01-31 22:38 ` Kuniyuki Iwashima
2025-01-31 23:28   ` Kuniyuki Iwashima
2025-02-01  0:24     ` Kuniyuki Iwashima
2025-02-01  0:55       ` Yan Zhai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).