netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Recursive locking in sockmap
@ 2024-06-06 10:00 Vincent Whitchurch
  2024-06-06 12:46 ` Jason Xing
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Whitchurch @ 2024-06-06 10:00 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, Jason Xing; +Cc: netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 12047 bytes --]

With a socket in the sockmap, if there's a parser callback installed
and the verdict callback returns SK_PASS, the kernel deadlocks
immediately after the verdict callback is run. This started at commit
6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
pointer dereference in sk_psock_skb_ingress_enqueue").

It can be reproduced by running ./test_sockmap -t ping
--txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
available in this series:
https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.

Lockdep splat below (also attached in case it gets damaged). This is
from an unmodified 6.10.0-rc2, but the problem also exists on latest
mainline and net-next.

 ============================================
 WARNING: possible recursive locking detected
 6.10.0-rc2 #59 Not tainted
 --------------------------------------------
 test_sockmap/342 is trying to acquire lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)

 but task is already holding lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(clock-AF_INET);
   lock(clock-AF_INET);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 9 locks held by test_sockmap/342:
 #0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg
(net/ipv4/tcp.c:1348)
 #1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit
(./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
net/ipv4/ip_output.c:470)
 #2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
ip_finish_output2 (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228)
 #3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog
(./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
net/core/dev.c:6066)
 #4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
ip_local_deliver_finish (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232)
 #5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv
(./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510
net/ipv4/tcp_ipv4.c:2342)
 #6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
sk_psock_strp_data_ready (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/core/skmsg.c:1113)
 #7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)
 #8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
sk_psock_strp_read (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/core/skmsg.c:1062)

 stack backtrace:
 CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 Call Trace:
   <IRQ>
  dump_stack_lvl (lib/dump_stack.c:118)
  __lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137)
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
  ? tcp_rcv_established (./include/linux/skbuff.h:2097
./include/net/tcp.h:2026 ./include/net/tcp.h:2099
net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
  ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
  ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
  ? ip_local_deliver_finish (./include/linux/rcupdate.h:810
net/ipv4/ip_input.c:234)
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
  lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
  ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
  ? mark_held_locks (kernel/locking/lockdep.c:4274)
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466
net/core/skmsg.c:555)
  _raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177
kernel/locking/spinlock.c:252)
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
  sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
  sk_psock_skb_ingress_self (net/core/skmsg.c:607)
  sk_psock_verdict_apply (net/core/skmsg.c:1008)
  sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081)
  ? sk_psock_strp_parse (net/core/skmsg.c:1104)
  __strp_recv (net/strparser/strparser.c:301 (discriminator 3))
  tcp_read_sock (net/ipv4/tcp.c:1583)
  ? __pfx_strp_recv (net/strparser/strparser.c:332)
  ? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560)
  ? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
  strp_read_sock (net/strparser/strparser.c:358)
  ? __pfx_strp_read_sock (net/strparser/strparser.c:346)
  ? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209)
  ? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
  strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366)
  sk_psock_strp_data_ready (net/core/skmsg.c:1121)
  tcp_data_queue (net/ipv4/tcp_input.c:5234)
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
  ? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148)
  ? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820)
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
  ? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94
arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
  ? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4)
kernel/time/timekeeping.c:395 (discriminator 4)
kernel/time/timekeeping.c:403 (discriminator 4)
kernel/time/timekeeping.c:850 (discriminator 4))
  tcp_rcv_established (./include/linux/skbuff.h:2097
./include/net/tcp.h:2026 ./include/net/tcp.h:2099
net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
  ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
  ? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800)
  ? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006)
  ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107
./include/linux/atomic/atomic-arch-fallback.h:2170
./include/linux/atomic/atomic-instrumented.h:1302
./include/asm-generic/qspinlock.h:111
kernel/locking/spinlock_debug.c:116)
  tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956)
  tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
  ? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172)
  ? __pfx_raw_local_deliver (net/ipv4/raw.c:201)
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
  ? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
  ip_local_deliver_finish (./include/linux/rcupdate.h:810
net/ipv4/ip_input.c:234)
  ip_local_deliver (./include/linux/netfilter.h:314
./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
  ? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243)
  ? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
  ? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383
./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372)
  ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449
./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308
net/ipv4/ip_input.c:569)
  ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
  ? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
  ? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
  ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
  __netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4))
  ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617)
  ? mark_held_locks (kernel/locking/lockdep.c:4274)
  process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068)
  __napi_poll.constprop.0 (net/core/dev.c:6721)
  net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906)
  ? __pfx_net_rx_action (net/core/dev.c:6870)
  ? __pfx_rcu_core (kernel/rcu/tree.c:2756)
  ? mark_held_locks (kernel/locking/lockdep.c:4274)
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
  handle_softirqs (kernel/softirq.c:554)
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
  do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
   </IRQ>
   <TASK>
  __local_bh_enable_ip (kernel/softirq.c:382)
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
  __dev_queue_xmit (net/core/dev.c:4421)
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
  ? __pfx___dev_queue_xmit (net/core/dev.c:4302)
  ? find_held_lock (kernel/locking/lockdep.c:5244)
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
  ? mark_held_locks (kernel/locking/lockdep.c:4274)
  ip_finish_output2 (./include/linux/netdevice.h:3095
./include/net/neighbour.h:526 ./include/net/neighbour.h:540
net/ipv4/ip_output.c:235)
  ? __pfx_nf_hook (./include/linux/netfilter.h:227)
  ? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
  ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199)
  ? ip_skb_dst_mtu (./include/net/net_namespace.h:383
./include/linux/netdevice.h:2577 ./include/net/ip.h:465
./include/net/ip.h:502)
  ? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
  __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
  ? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4)
./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4)
./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4)
net/core/skbuff.c:1576 (discriminator 4))
  __tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4))
  ? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287)
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
  ? ktime_get (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77
./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74
kernel/time/timekeeping.c:848)
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
  tcp_write_xmit (net/ipv4/tcp_output.c:2829)
  ? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886)
  ? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904)
  __tcp_push_pending_frames (net/ipv4/tcp_output.c:3014)
  tcp_sendmsg_locked (net/ipv4/tcp.c:1316)
  ? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980)
  ? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046)
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
  ? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387)
  tcp_sendmsg (net/ipv4/tcp.c:1349)
  __sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192)
  ? __pfx___sys_sendto (net/socket.c:2162)
  ? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
  ? fd_install (./arch/x86/include/asm/preempt.h:103
./include/linux/rcupdate.h:896 fs/file.c:631)
  ? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002)
  ? __pfx___sys_accept4 (net/socket.c:1994)
  ? handle_mm_fault (./include/linux/memcontrol.h:1078
./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704)
  __x64_sys_sendto (net/socket.c:2200)
  ? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77
./include/linux/entry-common.h:197 arch/x86/entry/common.c:79)
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
  do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

[-- Attachment #2: splat.txt --]
[-- Type: text/plain, Size: 11458 bytes --]

 ============================================
 WARNING: possible recursive locking detected
 6.10.0-rc2 #59 Not tainted
 --------------------------------------------
 test_sockmap/342 is trying to acquire lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555) 

 but task is already holding lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_strp_data_ready (net/core/skmsg.c:1120) 

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(clock-AF_INET);
   lock(clock-AF_INET);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 9 locks held by test_sockmap/342:
 #0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg (net/ipv4/tcp.c:1348) 
 #1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:470) 
 #2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2 (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228) 
 #3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/dev.c:6066) 
 #4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232) 
 #5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv (./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510 net/ipv4/tcp_ipv4.c:2342) 
 #6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: sk_psock_strp_data_ready (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/skmsg.c:1113) 
 #7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_strp_data_ready (net/core/skmsg.c:1120) 
 #8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: sk_psock_strp_read (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/skmsg.c:1062) 

 stack backtrace:
 CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 Call Trace:
   <IRQ>
  dump_stack_lvl (lib/dump_stack.c:118) 
  __lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137) 
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993) 
  ? tcp_rcv_established (./include/linux/skbuff.h:2097 ./include/net/tcp.h:2026 ./include/net/tcp.h:2099 net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179) 
  ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345) 
  ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8)) 
  ? ip_local_deliver_finish (./include/linux/rcupdate.h:810 net/ipv4/ip_input.c:234) 
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639) 
  lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719) 
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555) 
  ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) 
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762) 
  ? mark_held_locks (kernel/locking/lockdep.c:4274) 
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466 net/core/skmsg.c:555) 
  _raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177 kernel/locking/spinlock.c:252) 
  ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555) 
  sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555) 
  sk_psock_skb_ingress_self (net/core/skmsg.c:607) 
  sk_psock_verdict_apply (net/core/skmsg.c:1008) 
  sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081) 
  ? sk_psock_strp_parse (net/core/skmsg.c:1104) 
  __strp_recv (net/strparser/strparser.c:301 (discriminator 3)) 
  tcp_read_sock (net/ipv4/tcp.c:1583) 
  ? __pfx_strp_recv (net/strparser/strparser.c:332) 
  ? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560) 
  ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719) 
  strp_read_sock (net/strparser/strparser.c:358) 
  ? __pfx_strp_read_sock (net/strparser/strparser.c:346) 
  ? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209) 
  ? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826) 
  strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366) 
  sk_psock_strp_data_ready (net/core/skmsg.c:1121) 
  tcp_data_queue (net/ipv4/tcp_input.c:5234) 
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776) 
  ? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148) 
  ? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820) 
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421) 
  ? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94 arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86) 
  ? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4) kernel/time/timekeeping.c:395 (discriminator 4) kernel/time/timekeeping.c:403 (discriminator 4) kernel/time/timekeeping.c:850 (discriminator 4)) 
  tcp_rcv_established (./include/linux/skbuff.h:2097 ./include/net/tcp.h:2026 ./include/net/tcp.h:2099 net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179) 
  ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) 
  ? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800) 
  ? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006) 
  ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107 ./include/linux/atomic/atomic-arch-fallback.h:2170 ./include/linux/atomic/atomic-instrumented.h:1302 ./include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116) 
  tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956) 
  tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345) 
  ? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172) 
  ? __pfx_raw_local_deliver (net/ipv4/raw.c:201) 
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639) 
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762) 
  ? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826) 
  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8)) 
  ip_local_deliver_finish (./include/linux/rcupdate.h:810 net/ipv4/ip_input.c:234) 
  ip_local_deliver (./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254) 
  ? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243) 
  ? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826) 
  ? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383 ./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372) 
  ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449 ./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:569) 
  ? __pfx_ip_rcv (net/ipv4/ip_input.c:562) 
  ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719) 
  ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719) 
  ? __pfx_ip_rcv (net/ipv4/ip_input.c:562) 
  __netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4)) 
  ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617) 
  ? mark_held_locks (kernel/locking/lockdep.c:4274) 
  process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068) 
  __napi_poll.constprop.0 (net/core/dev.c:6721) 
  net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906) 
  ? __pfx_net_rx_action (net/core/dev.c:6870) 
  ? __pfx_rcu_core (kernel/rcu/tree.c:2756) 
  ? mark_held_locks (kernel/locking/lockdep.c:4274) 
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420) 
  handle_softirqs (kernel/softirq.c:554) 
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420) 
  do_softirq (kernel/softirq.c:455 kernel/softirq.c:442) 
   </IRQ>
   <TASK>
  __local_bh_enable_ip (kernel/softirq.c:382) 
  ? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420) 
  __dev_queue_xmit (net/core/dev.c:4421) 
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993) 
  ? __pfx_mark_lock (kernel/locking/lockdep.c:4639) 
  ? __pfx___dev_queue_xmit (net/core/dev.c:4302) 
  ? find_held_lock (kernel/locking/lockdep.c:5244) 
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776) 
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762) 
  ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993) 
  ? mark_held_locks (kernel/locking/lockdep.c:4274) 
  ip_finish_output2 (./include/linux/netdevice.h:3095 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:235) 
  ? __pfx_nf_hook (./include/linux/netfilter.h:227) 
  ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719) 
  ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199) 
  ? ip_skb_dst_mtu (./include/net/net_namespace.h:383 ./include/linux/netdevice.h:2577 ./include/net/ip.h:465 ./include/net/ip.h:502) 
  ? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4)) 
  __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4)) 
  ? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4) net/core/skbuff.c:1576 (discriminator 4)) 
  __tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4)) 
  ? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287) 
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776) 
  ? __pfx_lock_release (kernel/locking/lockdep.c:5762) 
  ? ktime_get (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74 kernel/time/timekeeping.c:848) 
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421) 
  tcp_write_xmit (net/ipv4/tcp_output.c:2829) 
  ? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886) 
  ? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904) 
  __tcp_push_pending_frames (net/ipv4/tcp_output.c:3014) 
  tcp_sendmsg_locked (net/ipv4/tcp.c:1316) 
  ? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980) 
  ? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046) 
  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776) 
  ? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387) 
  tcp_sendmsg (net/ipv4/tcp.c:1349) 
  __sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192) 
  ? __pfx___sys_sendto (net/socket.c:2162) 
  ? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826) 
  ? fd_install (./arch/x86/include/asm/preempt.h:103 ./include/linux/rcupdate.h:896 fs/file.c:631) 
  ? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002) 
  ? __pfx___sys_accept4 (net/socket.c:1994) 
  ? handle_mm_fault (./include/linux/memcontrol.h:1078 ./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704) 
  __x64_sys_sendto (net/socket.c:2200) 
  ? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./include/linux/entry-common.h:197 arch/x86/entry/common.c:79) 
  ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421) 
  do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) 
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Recursive locking in sockmap
  2024-06-06 10:00 Recursive locking in sockmap Vincent Whitchurch
@ 2024-06-06 12:46 ` Jason Xing
  2024-06-07 12:09   ` Vincent Whitchurch
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Xing @ 2024-06-06 12:46 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: John Fastabend, Jakub Sitnicki, Jason Xing, netdev, bpf

Hello Vincent,

On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
<vincent.whitchurch@datadoghq.com> wrote:
>
> With a socket in the sockmap, if there's a parser callback installed
> and the verdict callback returns SK_PASS, the kernel deadlocks
> immediately after the verdict callback is run. This started at commit
> 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> pointer dereference in sk_psock_skb_ingress_enqueue").
>
> It can be reproduced by running ./test_sockmap -t ping
> --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> available in this series:
> https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.

Thanks for your report.

I don't have time right now to look into this issue carefully until
this weekend. BTW, did you mean the patch [2/5] in the link that can
solve the problem?

Thanks,
Jason

>
> Lockdep splat below (also attached in case it gets damaged). This is
> from an unmodified 6.10.0-rc2, but the problem also exists on latest
> mainline and net-next.
>
>  ============================================
>  WARNING: possible recursive locking detected
>  6.10.0-rc2 #59 Not tainted
>  --------------------------------------------
>  test_sockmap/342 is trying to acquire lock:
>  ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
>
>  but task is already holding lock:
>  ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_strp_data_ready (net/core/skmsg.c:1120)
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(clock-AF_INET);
>    lock(clock-AF_INET);
>
>   *** DEADLOCK ***
>
>   May be due to missing lock nesting notation
>
>  9 locks held by test_sockmap/342:
>  #0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg
> (net/ipv4/tcp.c:1348)
>  #1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit
> (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
> net/ipv4/ip_output.c:470)
>  #2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> ip_finish_output2 (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228)
>  #3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog
> (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
> net/core/dev.c:6066)
>  #4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> ip_local_deliver_finish (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232)
>  #5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv
> (./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510
> net/ipv4/tcp_ipv4.c:2342)
>  #6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> sk_psock_strp_data_ready (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/core/skmsg.c:1113)
>  #7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_strp_data_ready (net/core/skmsg.c:1120)
>  #8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> sk_psock_strp_read (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/core/skmsg.c:1062)
>
>  stack backtrace:
>  CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>  Call Trace:
>    <IRQ>
>   dump_stack_lvl (lib/dump_stack.c:118)
>   __lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137)
>   ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
>   ? tcp_rcv_established (./include/linux/skbuff.h:2097
> ./include/net/tcp.h:2026 ./include/net/tcp.h:2099
> net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
>   ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
>   ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
>   ? ip_local_deliver_finish (./include/linux/rcupdate.h:810
> net/ipv4/ip_input.c:234)
>   ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
>   lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
>   ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
>   ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
>   ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
>   ? mark_held_locks (kernel/locking/lockdep.c:4274)
>   ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466
> net/core/skmsg.c:555)
>   _raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177
> kernel/locking/spinlock.c:252)
>   ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
>   sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
>   sk_psock_skb_ingress_self (net/core/skmsg.c:607)
>   sk_psock_verdict_apply (net/core/skmsg.c:1008)
>   sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081)
>   ? sk_psock_strp_parse (net/core/skmsg.c:1104)
>   __strp_recv (net/strparser/strparser.c:301 (discriminator 3))
>   tcp_read_sock (net/ipv4/tcp.c:1583)
>   ? __pfx_strp_recv (net/strparser/strparser.c:332)
>   ? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560)
>   ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
>   strp_read_sock (net/strparser/strparser.c:358)
>   ? __pfx_strp_read_sock (net/strparser/strparser.c:346)
>   ? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209)
>   ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
>   strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366)
>   sk_psock_strp_data_ready (net/core/skmsg.c:1121)
>   tcp_data_queue (net/ipv4/tcp_input.c:5234)
>   ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
>   ? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148)
>   ? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820)
>   ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
>   ? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94
> arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
>   ? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4)
> kernel/time/timekeeping.c:395 (discriminator 4)
> kernel/time/timekeeping.c:403 (discriminator 4)
> kernel/time/timekeeping.c:850 (discriminator 4))
>   tcp_rcv_established (./include/linux/skbuff.h:2097
> ./include/net/tcp.h:2026 ./include/net/tcp.h:2099
> net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
>   ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
>   ? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800)
>   ? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006)
>   ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107
> ./include/linux/atomic/atomic-arch-fallback.h:2170
> ./include/linux/atomic/atomic-instrumented.h:1302
> ./include/asm-generic/qspinlock.h:111
> kernel/locking/spinlock_debug.c:116)
>   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956)
>   tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
>   ? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172)
>   ? __pfx_raw_local_deliver (net/ipv4/raw.c:201)
>   ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
>   ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
>   ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
>   ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
>   ip_local_deliver_finish (./include/linux/rcupdate.h:810
> net/ipv4/ip_input.c:234)
>   ip_local_deliver (./include/linux/netfilter.h:314
> ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
>   ? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243)
>   ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
>   ? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383
> ./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372)
>   ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449
> ./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308
> net/ipv4/ip_input.c:569)
>   ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
>   ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
>   ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
>   ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
>   __netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4))
>   ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617)
>   ? mark_held_locks (kernel/locking/lockdep.c:4274)
>   process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068)
>   __napi_poll.constprop.0 (net/core/dev.c:6721)
>   net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906)
>   ? __pfx_net_rx_action (net/core/dev.c:6870)
>   ? __pfx_rcu_core (kernel/rcu/tree.c:2756)
>   ? mark_held_locks (kernel/locking/lockdep.c:4274)
>   ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
>   handle_softirqs (kernel/softirq.c:554)
>   ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
>   do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
>    </IRQ>
>    <TASK>
>   __local_bh_enable_ip (kernel/softirq.c:382)
>   ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
>   __dev_queue_xmit (net/core/dev.c:4421)
>   ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
>   ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
>   ? __pfx___dev_queue_xmit (net/core/dev.c:4302)
>   ? find_held_lock (kernel/locking/lockdep.c:5244)
>   ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
>   ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
>   ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
>   ? mark_held_locks (kernel/locking/lockdep.c:4274)
>   ip_finish_output2 (./include/linux/netdevice.h:3095
> ./include/net/neighbour.h:526 ./include/net/neighbour.h:540
> net/ipv4/ip_output.c:235)
>   ? __pfx_nf_hook (./include/linux/netfilter.h:227)
>   ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
>   ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199)
>   ? ip_skb_dst_mtu (./include/net/net_namespace.h:383
> ./include/linux/netdevice.h:2577 ./include/net/ip.h:465
> ./include/net/ip.h:502)
>   ? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
>   __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
>   ? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4)
> ./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4)
> ./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4)
> net/core/skbuff.c:1576 (discriminator 4))
>   __tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4))
>   ? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287)
>   ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
>   ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
>   ? ktime_get (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77
> ./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74
> kernel/time/timekeeping.c:848)
>   ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
>   tcp_write_xmit (net/ipv4/tcp_output.c:2829)
>   ? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886)
>   ? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904)
>   __tcp_push_pending_frames (net/ipv4/tcp_output.c:3014)
>   tcp_sendmsg_locked (net/ipv4/tcp.c:1316)
>   ? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980)
>   ? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046)
>   ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
>   ? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387)
>   tcp_sendmsg (net/ipv4/tcp.c:1349)
>   __sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192)
>   ? __pfx___sys_sendto (net/socket.c:2162)
>   ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
>   ? fd_install (./arch/x86/include/asm/preempt.h:103
> ./include/linux/rcupdate.h:896 fs/file.c:631)
>   ? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002)
>   ? __pfx___sys_accept4 (net/socket.c:1994)
>   ? handle_mm_fault (./include/linux/memcontrol.h:1078
> ./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704)
>   __x64_sys_sendto (net/socket.c:2200)
>   ? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77
> ./include/linux/entry-common.h:197 arch/x86/entry/common.c:79)
>   ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
>   do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Recursive locking in sockmap
  2024-06-06 12:46 ` Jason Xing
@ 2024-06-07 12:09   ` Vincent Whitchurch
  2024-06-07 16:00     ` Cong Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Whitchurch @ 2024-06-07 12:09 UTC (permalink / raw)
  To: Jason Xing; +Cc: John Fastabend, Jakub Sitnicki, Jason Xing, netdev, bpf

On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> <vincent.whitchurch@datadoghq.com> wrote:
> > With a socket in the sockmap, if there's a parser callback installed
> > and the verdict callback returns SK_PASS, the kernel deadlocks
> > immediately after the verdict callback is run. This started at commit
> > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > pointer dereference in sk_psock_skb_ingress_enqueue").
> >
> > It can be reproduced by running ./test_sockmap -t ping
> > --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> > available in this series:
> > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
>
> I don't have time right now to look into this issue carefully until
> this weekend. BTW, did you mean the patch [2/5] in the link that can
> solve the problem?

No.  That patch set addresses a different problem which occurs even if
only a verdict callback is used. But patch 4/5 in that patch set adds
the --txmsg_pass_skb option to the test_sockmap test program, and that
option can be used to reproduce this deadlock too.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Recursive locking in sockmap
  2024-06-07 12:09   ` Vincent Whitchurch
@ 2024-06-07 16:00     ` Cong Wang
  2024-06-13 19:08       ` John Fastabend
  0 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2024-06-07 16:00 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Jason Xing, John Fastabend, Jakub Sitnicki, Jason Xing, netdev,
	bpf

On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > <vincent.whitchurch@datadoghq.com> wrote:
> > > With a socket in the sockmap, if there's a parser callback installed
> > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > immediately after the verdict callback is run. This started at commit
> > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > >
> > > It can be reproduced by running ./test_sockmap -t ping
> > > --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> > > available in this series:
> > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> >
> > I don't have time right now to look into this issue carefully until
> > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > solve the problem?
> 
> No.  That patch set addresses a different problem which occurs even if
> only a verdict callback is used. But patch 4/5 in that patch set adds
> the --txmsg_pass_skb option to the test_sockmap test program, and that
> option can be used to reproduce this deadlock too.

I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
test the following patch?

------------>

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..da64ded97f3a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
 		if (tls_sw_has_ctx_rx(sk)) {
 			psock->saved_data_ready(sk);
 		} else {
-			write_lock_bh(&sk->sk_callback_lock);
 			strp_data_ready(&psock->strp);
-			write_unlock_bh(&sk->sk_callback_lock);
 		}
 	}
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Recursive locking in sockmap
  2024-06-07 16:00     ` Cong Wang
@ 2024-06-13 19:08       ` John Fastabend
  2024-07-20 18:20         ` Cong Wang
  0 siblings, 1 reply; 6+ messages in thread
From: John Fastabend @ 2024-06-13 19:08 UTC (permalink / raw)
  To: Cong Wang, Vincent Whitchurch
  Cc: Jason Xing, John Fastabend, Jakub Sitnicki, Jason Xing, netdev,
	bpf

Cong Wang wrote:
> On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > <vincent.whitchurch@datadoghq.com> wrote:
> > > > With a socket in the sockmap, if there's a parser callback installed
> > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > immediately after the verdict callback is run. This started at commit
> > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > >
> > > > It can be reproduced by running ./test_sockmap -t ping
> > > > --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> > > > available in this series:
> > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > >
> > > I don't have time right now to look into this issue carefully until
> > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > solve the problem?
> > 
> > No.  That patch set addresses a different problem which occurs even if
> > only a verdict callback is used. But patch 4/5 in that patch set adds
> > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > option can be used to reproduce this deadlock too.
> 
> I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> test the following patch?
> 
> ------------>
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..da64ded97f3a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
>  		if (tls_sw_has_ctx_rx(sk)) {
>  			psock->saved_data_ready(sk);
>  		} else {
> -			write_lock_bh(&sk->sk_callback_lock);
>  			strp_data_ready(&psock->strp);
> -			write_unlock_bh(&sk->sk_callback_lock);
>  		}
>  	}
>  	rcu_read_unlock();

Its not obvious to me that we can run the strp parser without the
sk_callback lock here. I believe below is the correct fix. It
fixes the splat above with test.

bpf: sockmap, fix introduced strparser recursive lock

Originally there was a race where removing a psock from the sock map while
it was also receiving an skb and calling sk_psock_data_ready(). It was
possible the removal code would NULL/set the data_ready callback while
concurrently calling the hook from receive path. The fix was to wrap the
access in sk_callback_lock to ensure the saved_data_ready pointer didn't
change under us. There was some discussion around doing a larger change
to ensure we could use READ_ONCE/WRITE_ONCE over the callback, but that
was for *next kernels not stable fixes.

But, we unfortunately introduced a regression with the fix because there
is another path into this code (that didn't have a test case) through
the stream parser. The stream parser runs with the lower lock which means
we get the following splat and lock up.


 ============================================
 WARNING: possible recursive locking detected
 6.10.0-rc2 #59 Not tainted
 --------------------------------------------
 test_sockmap/342 is trying to acquire lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
 sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
 net/core/skmsg.c:555)

 but task is already holding lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
 sk_psock_strp_data_ready (net/core/skmsg.c:1120)

To fix ensure we do not grap lock when we reach this code through the
strparser.

Fixes: 6648e613226e1 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h |    9 +++++++--
 net/core/skmsg.c      |    5 ++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..3659e9b514d0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -461,13 +461,18 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
 		sk_psock_drop(sk, psock);
 }
 
-static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+static inline void __sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
-	read_lock_bh(&sk->sk_callback_lock);
 	if (psock->saved_data_ready)
 		psock->saved_data_ready(sk);
 	else
 		sk->sk_data_ready(sk);
+}
+
+static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+{
+	read_lock_bh(&sk->sk_callback_lock);
+	__sk_psock_data_ready(sk, psock);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..8429daecbbb6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	msg->skb = skb;
 
 	sk_psock_queue_msg(psock, msg);
-	sk_psock_data_ready(sk, psock);
+	if (skb_bpf_strparser(skb))
+		__sk_psock_data_ready(sk, psock);
+	else
+		sk_psock_data_ready(sk, psock);
 	return copied;
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Recursive locking in sockmap
  2024-06-13 19:08       ` John Fastabend
@ 2024-07-20 18:20         ` Cong Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2024-07-20 18:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: Vincent Whitchurch, Jason Xing, Jakub Sitnicki, Jason Xing,
	netdev, bpf

On Thu, Jun 13, 2024 at 12:08:25PM -0700, John Fastabend wrote:
> Cong Wang wrote:
> > On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > > <vincent.whitchurch@datadoghq.com> wrote:
> > > > > With a socket in the sockmap, if there's a parser callback installed
> > > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > > immediately after the verdict callback is run. This started at commit
> > > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > > >
> > > > > It can be reproduced by running ./test_sockmap -t ping
> > > > > --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> > > > > available in this series:
> > > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > > >
> > > > I don't have time right now to look into this issue carefully until
> > > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > > solve the problem?
> > > 
> > > No.  That patch set addresses a different problem which occurs even if
> > > only a verdict callback is used. But patch 4/5 in that patch set adds
> > > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > > option can be used to reproduce this deadlock too.
> > 
> > I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> > test the following patch?
> > 
> > ------------>
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..da64ded97f3a 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
> >  		if (tls_sw_has_ctx_rx(sk)) {
> >  			psock->saved_data_ready(sk);
> >  		} else {
> > -			write_lock_bh(&sk->sk_callback_lock);
> >  			strp_data_ready(&psock->strp);
> > -			write_unlock_bh(&sk->sk_callback_lock);
> >  		}
> >  	}
> >  	rcu_read_unlock();
> 
> Its not obvious to me that we can run the strp parser without the
> sk_callback lock here. I believe below is the correct fix. It
> fixes the splat above with test.

The lock is still there, but just read lock. And I don't see any writing to
psock->strp in strp_data_ready(), so using read lock makes sense to me.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-07-20 18:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-06 10:00 Recursive locking in sockmap Vincent Whitchurch
2024-06-06 12:46 ` Jason Xing
2024-06-07 12:09   ` Vincent Whitchurch
2024-06-07 16:00     ` Cong Wang
2024-06-13 19:08       ` John Fastabend
2024-07-20 18:20         ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).