* Recursive locking in sockmap
@ 2024-06-06 10:00 Vincent Whitchurch
2024-06-06 12:46 ` Jason Xing
0 siblings, 1 reply; 6+ messages in thread
From: Vincent Whitchurch @ 2024-06-06 10:00 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Jason Xing; +Cc: netdev, bpf
[-- Attachment #1: Type: text/plain, Size: 12047 bytes --]
With a socket in the sockmap, if there's a parser callback installed
and the verdict callback returns SK_PASS, the kernel deadlocks
immediately after the verdict callback is run. This started at commit
6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
pointer dereference in sk_psock_skb_ingress_enqueue").
It can be reproduced by running ./test_sockmap -t ping
--txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
available in this series:
https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
Lockdep splat below (also attached in case it gets damaged). This is
from an unmodified 6.10.0-rc2, but the problem also exists on latest
mainline and net-next.
============================================
WARNING: possible recursive locking detected
6.10.0-rc2 #59 Not tainted
--------------------------------------------
test_sockmap/342 is trying to acquire lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
but task is already holding lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(clock-AF_INET);
lock(clock-AF_INET);
*** DEADLOCK ***
May be due to missing lock nesting notation
9 locks held by test_sockmap/342:
#0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg
(net/ipv4/tcp.c:1348)
#1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit
(./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
net/ipv4/ip_output.c:470)
#2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
ip_finish_output2 (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228)
#3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog
(./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
net/core/dev.c:6066)
#4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
ip_local_deliver_finish (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232)
#5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv
(./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510
net/ipv4/tcp_ipv4.c:2342)
#6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
sk_psock_strp_data_ready (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/core/skmsg.c:1113)
#7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)
#8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
sk_psock_strp_read (./include/linux/rcupdate.h:329
./include/linux/rcupdate.h:781 net/core/skmsg.c:1062)
stack backtrace:
CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:118)
__lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? tcp_rcv_established (./include/linux/skbuff.h:2097
./include/net/tcp.h:2026 ./include/net/tcp.h:2099
net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
? ip_local_deliver_finish (./include/linux/rcupdate.h:810
net/ipv4/ip_input.c:234)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? mark_held_locks (kernel/locking/lockdep.c:4274)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466
net/core/skmsg.c:555)
_raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177
kernel/locking/spinlock.c:252)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
sk_psock_skb_ingress_self (net/core/skmsg.c:607)
sk_psock_verdict_apply (net/core/skmsg.c:1008)
sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081)
? sk_psock_strp_parse (net/core/skmsg.c:1104)
__strp_recv (net/strparser/strparser.c:301 (discriminator 3))
tcp_read_sock (net/ipv4/tcp.c:1583)
? __pfx_strp_recv (net/strparser/strparser.c:332)
? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560)
? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
strp_read_sock (net/strparser/strparser.c:358)
? __pfx_strp_read_sock (net/strparser/strparser.c:346)
? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209)
? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366)
sk_psock_strp_data_ready (net/core/skmsg.c:1121)
tcp_data_queue (net/ipv4/tcp_input.c:5234)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148)
? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94
arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4)
kernel/time/timekeeping.c:395 (discriminator 4)
kernel/time/timekeeping.c:403 (discriminator 4)
kernel/time/timekeeping.c:850 (discriminator 4))
tcp_rcv_established (./include/linux/skbuff.h:2097
./include/net/tcp.h:2026 ./include/net/tcp.h:2099
net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800)
? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006)
? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107
./include/linux/atomic/atomic-arch-fallback.h:2170
./include/linux/atomic/atomic-instrumented.h:1302
./include/asm-generic/qspinlock.h:111
kernel/locking/spinlock_debug.c:116)
tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956)
tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172)
? __pfx_raw_local_deliver (net/ipv4/raw.c:201)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
ip_local_deliver_finish (./include/linux/rcupdate.h:810
net/ipv4/ip_input.c:234)
ip_local_deliver (./include/linux/netfilter.h:314
./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243)
? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383
./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372)
ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449
./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308
net/ipv4/ip_input.c:569)
? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
__netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4))
? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617)
? mark_held_locks (kernel/locking/lockdep.c:4274)
process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068)
__napi_poll.constprop.0 (net/core/dev.c:6721)
net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906)
? __pfx_net_rx_action (net/core/dev.c:6870)
? __pfx_rcu_core (kernel/rcu/tree.c:2756)
? mark_held_locks (kernel/locking/lockdep.c:4274)
? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
handle_softirqs (kernel/softirq.c:554)
? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:382)
? __dev_queue_xmit (./include/linux/rcupdate.h:339
./include/linux/rcupdate.h:849 net/core/dev.c:4420)
__dev_queue_xmit (net/core/dev.c:4421)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
? __pfx___dev_queue_xmit (net/core/dev.c:4302)
? find_held_lock (kernel/locking/lockdep.c:5244)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? mark_held_locks (kernel/locking/lockdep.c:4274)
ip_finish_output2 (./include/linux/netdevice.h:3095
./include/net/neighbour.h:526 ./include/net/neighbour.h:540
net/ipv4/ip_output.c:235)
? __pfx_nf_hook (./include/linux/netfilter.h:227)
? lock_acquire (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199)
? ip_skb_dst_mtu (./include/net/net_namespace.h:383
./include/linux/netdevice.h:2577 ./include/net/ip.h:465
./include/net/ip.h:502)
? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
__ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4)
./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4)
./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4)
net/core/skbuff.c:1576 (discriminator 4))
__tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4))
? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? ktime_get (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77
./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74
kernel/time/timekeeping.c:848)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
tcp_write_xmit (net/ipv4/tcp_output.c:2829)
? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886)
? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904)
__tcp_push_pending_frames (net/ipv4/tcp_output.c:3014)
tcp_sendmsg_locked (net/ipv4/tcp.c:1316)
? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980)
? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387)
tcp_sendmsg (net/ipv4/tcp.c:1349)
__sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192)
? __pfx___sys_sendto (net/socket.c:2162)
? lock_is_held_type (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5826)
? fd_install (./arch/x86/include/asm/preempt.h:103
./include/linux/rcupdate.h:896 fs/file.c:631)
? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002)
? __pfx___sys_accept4 (net/socket.c:1994)
? handle_mm_fault (./include/linux/memcontrol.h:1078
./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704)
__x64_sys_sendto (net/socket.c:2200)
? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42
./arch/x86/include/asm/irqflags.h:77
./include/linux/entry-common.h:197 arch/x86/entry/common.c:79)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[-- Attachment #2: splat.txt --]
[-- Type: text/plain, Size: 11458 bytes --]
============================================
WARNING: possible recursive locking detected
6.10.0-rc2 #59 Not tainted
--------------------------------------------
test_sockmap/342 is trying to acquire lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555)
but task is already holding lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_strp_data_ready (net/core/skmsg.c:1120)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(clock-AF_INET);
lock(clock-AF_INET);
*** DEADLOCK ***
May be due to missing lock nesting notation
9 locks held by test_sockmap/342:
#0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg (net/ipv4/tcp.c:1348)
#1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:470)
#2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2 (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228)
#3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/dev.c:6066)
#4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232)
#5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv (./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510 net/ipv4/tcp_ipv4.c:2342)
#6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: sk_psock_strp_data_ready (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/skmsg.c:1113)
#7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_strp_data_ready (net/core/skmsg.c:1120)
#8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: sk_psock_strp_read (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781 net/core/skmsg.c:1062)
stack backtrace:
CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:118)
__lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? tcp_rcv_established (./include/linux/skbuff.h:2097 ./include/net/tcp.h:2026 ./include/net/tcp.h:2099 net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
? ip_local_deliver_finish (./include/linux/rcupdate.h:810 net/ipv4/ip_input.c:234)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555)
? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? mark_held_locks (kernel/locking/lockdep.c:4274)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466 net/core/skmsg.c:555)
_raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177 kernel/locking/spinlock.c:252)
? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555)
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555)
sk_psock_skb_ingress_self (net/core/skmsg.c:607)
sk_psock_verdict_apply (net/core/skmsg.c:1008)
sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081)
? sk_psock_strp_parse (net/core/skmsg.c:1104)
__strp_recv (net/strparser/strparser.c:301 (discriminator 3))
tcp_read_sock (net/ipv4/tcp.c:1583)
? __pfx_strp_recv (net/strparser/strparser.c:332)
? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560)
? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
strp_read_sock (net/strparser/strparser.c:358)
? __pfx_strp_read_sock (net/strparser/strparser.c:346)
? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209)
? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826)
strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366)
sk_psock_strp_data_ready (net/core/skmsg.c:1121)
tcp_data_queue (net/ipv4/tcp_input.c:5234)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148)
? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94 arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4) kernel/time/timekeeping.c:395 (discriminator 4) kernel/time/timekeeping.c:403 (discriminator 4) kernel/time/timekeeping.c:850 (discriminator 4))
tcp_rcv_established (./include/linux/skbuff.h:2097 ./include/net/tcp.h:2026 ./include/net/tcp.h:2099 net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800)
? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006)
? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107 ./include/linux/atomic/atomic-arch-fallback.h:2170 ./include/linux/atomic/atomic-instrumented.h:1302 ./include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956)
tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172)
? __pfx_raw_local_deliver (net/ipv4/raw.c:201)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
ip_local_deliver_finish (./include/linux/rcupdate.h:810 net/ipv4/ip_input.c:234)
ip_local_deliver (./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243)
? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826)
? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383 ./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372)
ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449 ./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:569)
? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
__netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4))
? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617)
? mark_held_locks (kernel/locking/lockdep.c:4274)
process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068)
__napi_poll.constprop.0 (net/core/dev.c:6721)
net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906)
? __pfx_net_rx_action (net/core/dev.c:6870)
? __pfx_rcu_core (kernel/rcu/tree.c:2756)
? mark_held_locks (kernel/locking/lockdep.c:4274)
? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
handle_softirqs (kernel/softirq.c:554)
? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:382)
? __dev_queue_xmit (./include/linux/rcupdate.h:339 ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
__dev_queue_xmit (net/core/dev.c:4421)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
? __pfx___dev_queue_xmit (net/core/dev.c:4302)
? find_held_lock (kernel/locking/lockdep.c:5244)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
? mark_held_locks (kernel/locking/lockdep.c:4274)
ip_finish_output2 (./include/linux/netdevice.h:3095 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:235)
? __pfx_nf_hook (./include/linux/netfilter.h:227)
? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199)
? ip_skb_dst_mtu (./include/net/net_namespace.h:383 ./include/linux/netdevice.h:2577 ./include/net/ip.h:465 ./include/net/ip.h:502)
? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
__ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4) net/core/skbuff.c:1576 (discriminator 4))
__tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4))
? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __pfx_lock_release (kernel/locking/lockdep.c:5762)
? ktime_get (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74 kernel/time/timekeeping.c:848)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
tcp_write_xmit (net/ipv4/tcp_output.c:2829)
? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886)
? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904)
__tcp_push_pending_frames (net/ipv4/tcp_output.c:3014)
tcp_sendmsg_locked (net/ipv4/tcp.c:1316)
? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980)
? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387)
tcp_sendmsg (net/ipv4/tcp.c:1349)
__sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192)
? __pfx___sys_sendto (net/socket.c:2162)
? lock_is_held_type (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5826)
? fd_install (./arch/x86/include/asm/preempt.h:103 ./include/linux/rcupdate.h:896 fs/file.c:631)
? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002)
? __pfx___sys_accept4 (net/socket.c:1994)
? handle_mm_fault (./include/linux/memcontrol.h:1078 ./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704)
__x64_sys_sendto (net/socket.c:2200)
? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 ./include/linux/entry-common.h:197 arch/x86/entry/common.c:79)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Recursive locking in sockmap
2024-06-06 10:00 Recursive locking in sockmap Vincent Whitchurch
@ 2024-06-06 12:46 ` Jason Xing
2024-06-07 12:09 ` Vincent Whitchurch
0 siblings, 1 reply; 6+ messages in thread
From: Jason Xing @ 2024-06-06 12:46 UTC (permalink / raw)
To: Vincent Whitchurch
Cc: John Fastabend, Jakub Sitnicki, Jason Xing, netdev, bpf
Hello Vincent,
On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
<vincent.whitchurch@datadoghq.com> wrote:
>
> With a socket in the sockmap, if there's a parser callback installed
> and the verdict callback returns SK_PASS, the kernel deadlocks
> immediately after the verdict callback is run. This started at commit
> 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> pointer dereference in sk_psock_skb_ingress_enqueue").
>
> It can be reproduced by running ./test_sockmap -t ping
> --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> available in this series:
> https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
Thanks for your report.
I don't have time right now to look into this issue carefully until
this weekend. BTW, did you mean the patch [2/5] in the link that can
solve the problem?
Thanks,
Jason
>
> Lockdep splat below (also attached in case it gets damaged). This is
> from an unmodified 6.10.0-rc2, but the problem also exists on latest
> mainline and net-next.
>
> ============================================
> WARNING: possible recursive locking detected
> 6.10.0-rc2 #59 Not tainted
> --------------------------------------------
> test_sockmap/342 is trying to acquire lock:
> ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
>
> but task is already holding lock:
> ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_strp_data_ready (net/core/skmsg.c:1120)
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(clock-AF_INET);
> lock(clock-AF_INET);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 9 locks held by test_sockmap/342:
> #0: ffff888007a85818 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg
> (net/ipv4/tcp.c:1348)
> #1: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit
> (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
> net/ipv4/ip_output.c:470)
> #2: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> ip_finish_output2 (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/ipv4/ip_output.c:228)
> #3: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at: process_backlog
> (./include/linux/rcupdate.h:329 ./include/linux/rcupdate.h:781
> net/core/dev.c:6066)
> #4: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> ip_local_deliver_finish (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/ipv4/ip_input.c:232)
> #5: ffff888007a87018 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv
> (./include/linux/skbuff.h:1640 ./include/net/tcp.h:2510
> net/ipv4/tcp_ipv4.c:2342)
> #6: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> sk_psock_strp_data_ready (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/core/skmsg.c:1113)
> #7: ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
> sk_psock_strp_data_ready (net/core/skmsg.c:1120)
> #8: ffffffffb8849c00 (rcu_read_lock){....}-{1:2}, at:
> sk_psock_strp_read (./include/linux/rcupdate.h:329
> ./include/linux/rcupdate.h:781 net/core/skmsg.c:1062)
>
> stack backtrace:
> CPU: 0 PID: 342 Comm: test_sockmap Not tainted 6.10.0-rc2 #59
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> Call Trace:
> <IRQ>
> dump_stack_lvl (lib/dump_stack.c:118)
> __lock_acquire (kernel/locking/lockdep.c:3858 kernel/locking/lockdep.c:5137)
> ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
> ? tcp_rcv_established (./include/linux/skbuff.h:2097
> ./include/net/tcp.h:2026 ./include/net/tcp.h:2099
> net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
> ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
> ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
> ? ip_local_deliver_finish (./include/linux/rcupdate.h:810
> net/ipv4/ip_input.c:234)
> ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
> lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
> ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
> ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
> ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
> ? mark_held_locks (kernel/locking/lockdep.c:4274)
> ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:466
> net/core/skmsg.c:555)
> _raw_read_lock_bh (./include/linux/rwlock_api_smp.h:177
> kernel/locking/spinlock.c:252)
> ? sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
> sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
> net/core/skmsg.c:555)
> sk_psock_skb_ingress_self (net/core/skmsg.c:607)
> sk_psock_verdict_apply (net/core/skmsg.c:1008)
> sk_psock_strp_read (./include/linux/rcupdate.h:810 net/core/skmsg.c:1081)
> ? sk_psock_strp_parse (net/core/skmsg.c:1104)
> __strp_recv (net/strparser/strparser.c:301 (discriminator 3))
> tcp_read_sock (net/ipv4/tcp.c:1583)
> ? __pfx_strp_recv (net/strparser/strparser.c:332)
> ? __pfx_tcp_read_sock (net/ipv4/tcp.c:1560)
> ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
> strp_read_sock (net/strparser/strparser.c:358)
> ? __pfx_strp_read_sock (net/strparser/strparser.c:346)
> ? __pfx_do_raw_write_lock (kernel/locking/spinlock_debug.c:209)
> ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
> strp_data_ready (net/strparser/strparser.c:388 net/strparser/strparser.c:366)
> sk_psock_strp_data_ready (net/core/skmsg.c:1121)
> tcp_data_queue (net/ipv4/tcp_input.c:5234)
> ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
> ? __pfx_tcp_data_queue (net/ipv4/tcp_input.c:5148)
> ? __pfx_tcp_urg (net/ipv4/tcp_input.c:5820)
> ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
> ? kvm_clock_get_cycles (./arch/x86/include/asm/preempt.h:94
> arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
> ? ktime_get (kernel/time/timekeeping.c:195 (discriminator 4)
> kernel/time/timekeeping.c:395 (discriminator 4)
> kernel/time/timekeeping.c:403 (discriminator 4)
> kernel/time/timekeeping.c:850 (discriminator 4))
> tcp_rcv_established (./include/linux/skbuff.h:2097
> ./include/net/tcp.h:2026 ./include/net/tcp.h:2099
> net/ipv4/tcp_input.c:5660 net/ipv4/tcp_input.c:6179)
> ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
> ? __pfx_tcp_inbound_hash.constprop.0 (./include/net/tcp.h:2800)
> ? __pfx_tcp_rcv_established (net/ipv4/tcp_input.c:6006)
> ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:107
> ./include/linux/atomic/atomic-arch-fallback.h:2170
> ./include/linux/atomic/atomic-instrumented.h:1302
> ./include/asm-generic/qspinlock.h:111
> kernel/locking/spinlock_debug.c:116)
> tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1956)
> tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2345)
> ? __pfx_tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2172)
> ? __pfx_raw_local_deliver (net/ipv4/raw.c:201)
> ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
> ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
> ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
> ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207 (discriminator 8))
> ip_local_deliver_finish (./include/linux/rcupdate.h:810
> net/ipv4/ip_input.c:234)
> ip_local_deliver (./include/linux/netfilter.h:314
> ./include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
> ? __pfx_ip_local_deliver (net/ipv4/ip_input.c:243)
> ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
> ? ip_rcv_finish_core.constprop.0 (./include/net/net_namespace.h:383
> ./include/linux/netdevice.h:2577 net/ipv4/ip_input.c:372)
> ip_rcv (./include/net/dst.h:460 net/ipv4/ip_input.c:449
> ./include/linux/netfilter.h:314 ./include/linux/netfilter.h:308
> net/ipv4/ip_input.c:569)
> ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
> ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
> ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
> ? __pfx_ip_rcv (net/ipv4/ip_input.c:562)
> __netif_receive_skb_one_core (net/core/dev.c:5624 (discriminator 4))
> ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5617)
> ? mark_held_locks (kernel/locking/lockdep.c:4274)
> process_backlog (./include/linux/rcupdate.h:810 net/core/dev.c:6068)
> __napi_poll.constprop.0 (net/core/dev.c:6721)
> net_rx_action (net/core/dev.c:6792 net/core/dev.c:6906)
> ? __pfx_net_rx_action (net/core/dev.c:6870)
> ? __pfx_rcu_core (kernel/rcu/tree.c:2756)
> ? mark_held_locks (kernel/locking/lockdep.c:4274)
> ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
> handle_softirqs (kernel/softirq.c:554)
> ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
> do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
> </IRQ>
> <TASK>
> __local_bh_enable_ip (kernel/softirq.c:382)
> ? __dev_queue_xmit (./include/linux/rcupdate.h:339
> ./include/linux/rcupdate.h:849 net/core/dev.c:4420)
> __dev_queue_xmit (net/core/dev.c:4421)
> ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
> ? __pfx_mark_lock (kernel/locking/lockdep.c:4639)
> ? __pfx___dev_queue_xmit (net/core/dev.c:4302)
> ? find_held_lock (kernel/locking/lockdep.c:5244)
> ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
> ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
> ? __pfx___lock_acquire (kernel/locking/lockdep.c:4993)
> ? mark_held_locks (kernel/locking/lockdep.c:4274)
> ip_finish_output2 (./include/linux/netdevice.h:3095
> ./include/net/neighbour.h:526 ./include/net/neighbour.h:540
> net/ipv4/ip_output.c:235)
> ? __pfx_nf_hook (./include/linux/netfilter.h:227)
> ? lock_acquire (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
> ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:199)
> ? ip_skb_dst_mtu (./include/net/net_namespace.h:383
> ./include/linux/netdevice.h:2577 ./include/net/ip.h:465
> ./include/net/ip.h:502)
> ? __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
> __ip_queue_xmit (net/ipv4/ip_output.c:535 (discriminator 4))
> ? __skb_clone (./arch/x86/include/asm/atomic.h:53 (discriminator 4)
> ./include/linux/atomic/atomic-arch-fallback.h:992 (discriminator 4)
> ./include/linux/atomic/atomic-instrumented.h:436 (discriminator 4)
> net/core/skbuff.c:1576 (discriminator 4))
> __tcp_transmit_skb (net/ipv4/tcp_output.c:1466 (discriminator 4))
> ? __pfx___tcp_transmit_skb (net/ipv4/tcp_output.c:1287)
> ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
> ? __pfx_lock_release (kernel/locking/lockdep.c:5762)
> ? ktime_get (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77
> ./arch/x86/include/asm/irqflags.h:135 ./include/linux/seqlock.h:74
> kernel/time/timekeeping.c:848)
> ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
> tcp_write_xmit (net/ipv4/tcp_output.c:2829)
> ? __pfx_mem_cgroup_charge_skmem (mm/memcontrol.c:7886)
> ? skb_page_frag_refill (net/core/sock.c:2920 net/core/sock.c:2904)
> __tcp_push_pending_frames (net/ipv4/tcp_output.c:3014)
> tcp_sendmsg_locked (net/ipv4/tcp.c:1316)
> ? print_usage_bug.part.0 (kernel/locking/lockdep.c:3980)
> ? __pfx_tcp_sendmsg_locked (net/ipv4/tcp.c:1046)
> ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5776)
> ? __local_bh_enable_ip (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77 kernel/softirq.c:387)
> tcp_sendmsg (net/ipv4/tcp.c:1349)
> __sys_sendto (net/socket.c:730 net/socket.c:745 net/socket.c:2192)
> ? __pfx___sys_sendto (net/socket.c:2162)
> ? lock_is_held_type (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5826)
> ? fd_install (./arch/x86/include/asm/preempt.h:103
> ./include/linux/rcupdate.h:896 fs/file.c:631)
> ? __sys_accept4 (./include/linux/file.h:47 net/socket.c:2002)
> ? __pfx___sys_accept4 (net/socket.c:1994)
> ? handle_mm_fault (./include/linux/memcontrol.h:1078
> ./include/linux/memcontrol.h:1066 mm/memory.c:5557 mm/memory.c:5704)
> __x64_sys_sendto (net/socket.c:2200)
> ? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77
> ./include/linux/entry-common.h:197 arch/x86/entry/common.c:79)
> ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4421)
> do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Recursive locking in sockmap
2024-06-06 12:46 ` Jason Xing
@ 2024-06-07 12:09 ` Vincent Whitchurch
2024-06-07 16:00 ` Cong Wang
0 siblings, 1 reply; 6+ messages in thread
From: Vincent Whitchurch @ 2024-06-07 12:09 UTC (permalink / raw)
To: Jason Xing; +Cc: John Fastabend, Jakub Sitnicki, Jason Xing, netdev, bpf
On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> <vincent.whitchurch@datadoghq.com> wrote:
> > With a socket in the sockmap, if there's a parser callback installed
> > and the verdict callback returns SK_PASS, the kernel deadlocks
> > immediately after the verdict callback is run. This started at commit
> > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > pointer dereference in sk_psock_skb_ingress_enqueue").
> >
> > It can be reproduced by running ./test_sockmap -t ping
> > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > available in this series:
> > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
>
> I don't have time right now to look into this issue carefully until
> this weekend. BTW, did you mean the patch [2/5] in the link that can
> solve the problem?
No. That patch set addresses a different problem which occurs even if
only a verdict callback is used. But patch 4/5 in that patch set adds
the --txmsg_pass_skb option to the test_sockmap test program, and that
option can be used to reproduce this deadlock too.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Recursive locking in sockmap
2024-06-07 12:09 ` Vincent Whitchurch
@ 2024-06-07 16:00 ` Cong Wang
2024-06-13 19:08 ` John Fastabend
0 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2024-06-07 16:00 UTC (permalink / raw)
To: Vincent Whitchurch
Cc: Jason Xing, John Fastabend, Jakub Sitnicki, Jason Xing, netdev,
bpf
On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > <vincent.whitchurch@datadoghq.com> wrote:
> > > With a socket in the sockmap, if there's a parser callback installed
> > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > immediately after the verdict callback is run. This started at commit
> > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > >
> > > It can be reproduced by running ./test_sockmap -t ping
> > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > available in this series:
> > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> >
> > I don't have time right now to look into this issue carefully until
> > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > solve the problem?
>
> No. That patch set addresses a different problem which occurs even if
> only a verdict callback is used. But patch 4/5 in that patch set adds
> the --txmsg_pass_skb option to the test_sockmap test program, and that
> option can be used to reproduce this deadlock too.
I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
test the following patch?
------------>
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..da64ded97f3a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
if (tls_sw_has_ctx_rx(sk)) {
psock->saved_data_ready(sk);
} else {
- write_lock_bh(&sk->sk_callback_lock);
strp_data_ready(&psock->strp);
- write_unlock_bh(&sk->sk_callback_lock);
}
}
rcu_read_unlock();
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Recursive locking in sockmap
2024-06-07 16:00 ` Cong Wang
@ 2024-06-13 19:08 ` John Fastabend
2024-07-20 18:20 ` Cong Wang
0 siblings, 1 reply; 6+ messages in thread
From: John Fastabend @ 2024-06-13 19:08 UTC (permalink / raw)
To: Cong Wang, Vincent Whitchurch
Cc: Jason Xing, John Fastabend, Jakub Sitnicki, Jason Xing, netdev,
bpf
Cong Wang wrote:
> On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > <vincent.whitchurch@datadoghq.com> wrote:
> > > > With a socket in the sockmap, if there's a parser callback installed
> > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > immediately after the verdict callback is run. This started at commit
> > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > >
> > > > It can be reproduced by running ./test_sockmap -t ping
> > > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > > available in this series:
> > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > >
> > > I don't have time right now to look into this issue carefully until
> > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > solve the problem?
> >
> > No. That patch set addresses a different problem which occurs even if
> > only a verdict callback is used. But patch 4/5 in that patch set adds
> > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > option can be used to reproduce this deadlock too.
>
> I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> test the following patch?
>
> ------------>
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..da64ded97f3a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
> if (tls_sw_has_ctx_rx(sk)) {
> psock->saved_data_ready(sk);
> } else {
> - write_lock_bh(&sk->sk_callback_lock);
> strp_data_ready(&psock->strp);
> - write_unlock_bh(&sk->sk_callback_lock);
> }
> }
> rcu_read_unlock();
Its not obvious to me that we can run the strp parser without the
sk_callback lock here. I believe below is the correct fix. It
fixes the splat above with test.
bpf: sockmap, fix introduced strparser recursive lock
Originally there was a race where removing a psock from the sock map while
it was also receiving an skb and calling sk_psock_data_ready(). It was
possible the removal code would NULL/set the data_ready callback while
concurrently calling the hook from receive path. The fix was to wrap the
access in sk_callback_lock to ensure the saved_data_ready pointer didn't
change under us. There was some discussion around doing a larger change
to ensure we could use READ_ONCE/WRITE_ONCE over the callback, but that
was for *next kernels not stable fixes.
But, we unfortunately introduced a regression with the fix because there
is another path into this code (that didn't have a test case) through
the stream parser. The stream parser runs with the lower lock which means
we get the following splat and lock up.
============================================
WARNING: possible recursive locking detected
6.10.0-rc2 #59 Not tainted
--------------------------------------------
test_sockmap/342 is trying to acquire lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
but task is already holding lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)
To fix ensure we do not grap lock when we reach this code through the
strparser.
Fixes: 6648e613226e1 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
include/linux/skmsg.h | 9 +++++++--
net/core/skmsg.c | 5 ++++-
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..3659e9b514d0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -461,13 +461,18 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
sk_psock_drop(sk, psock);
}
-static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+static inline void __sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
{
- read_lock_bh(&sk->sk_callback_lock);
if (psock->saved_data_ready)
psock->saved_data_ready(sk);
else
sk->sk_data_ready(sk);
+}
+
+static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+{
+ read_lock_bh(&sk->sk_callback_lock);
+ __sk_psock_data_ready(sk, psock);
read_unlock_bh(&sk->sk_callback_lock);
}
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..8429daecbbb6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
msg->skb = skb;
sk_psock_queue_msg(psock, msg);
- sk_psock_data_ready(sk, psock);
+ if (skb_bpf_strparser(skb))
+ __sk_psock_data_ready(sk, psock);
+ else
+ sk_psock_data_ready(sk, psock);
return copied;
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Recursive locking in sockmap
2024-06-13 19:08 ` John Fastabend
@ 2024-07-20 18:20 ` Cong Wang
0 siblings, 0 replies; 6+ messages in thread
From: Cong Wang @ 2024-07-20 18:20 UTC (permalink / raw)
To: John Fastabend
Cc: Vincent Whitchurch, Jason Xing, Jakub Sitnicki, Jason Xing,
netdev, bpf
On Thu, Jun 13, 2024 at 12:08:25PM -0700, John Fastabend wrote:
> Cong Wang wrote:
> > On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > > <vincent.whitchurch@datadoghq.com> wrote:
> > > > > With a socket in the sockmap, if there's a parser callback installed
> > > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > > immediately after the verdict callback is run. This started at commit
> > > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > > >
> > > > > It can be reproduced by running ./test_sockmap -t ping
> > > > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > > > available in this series:
> > > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > > >
> > > > I don't have time right now to look into this issue carefully until
> > > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > > solve the problem?
> > >
> > > No. That patch set addresses a different problem which occurs even if
> > > only a verdict callback is used. But patch 4/5 in that patch set adds
> > > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > > option can be used to reproduce this deadlock too.
> >
> > I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> > test the following patch?
> >
> > ------------>
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..da64ded97f3a 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
> > if (tls_sw_has_ctx_rx(sk)) {
> > psock->saved_data_ready(sk);
> > } else {
> > - write_lock_bh(&sk->sk_callback_lock);
> > strp_data_ready(&psock->strp);
> > - write_unlock_bh(&sk->sk_callback_lock);
> > }
> > }
> > rcu_read_unlock();
>
> Its not obvious to me that we can run the strp parser without the
> sk_callback lock here. I believe below is the correct fix. It
> fixes the splat above with test.
The lock is still there, but just read lock. And I don't see any writing to
psock->strp in strp_data_ready(), so using read lock makes sense to me.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-07-20 18:20 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-06 10:00 Recursive locking in sockmap Vincent Whitchurch
2024-06-06 12:46 ` Jason Xing
2024-06-07 12:09 ` Vincent Whitchurch
2024-06-07 16:00 ` Cong Wang
2024-06-13 19:08 ` John Fastabend
2024-07-20 18:20 ` Cong Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).