* [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
[not found] <20240703102904.170852981@linuxfoundation.org>
@ 2024-07-03 10:40 ` Greg Kroah-Hartman
2024-07-03 11:42 ` Yunseong Kim
2024-07-03 10:41 ` [PATCH 5.10 279/290] ipv6: annotate some data-races around sk->sk_prot Greg Kroah-Hartman
` (2 subsequent siblings)
3 siblings, 1 reply; 5+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-03 10:40 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, netdev, Yunseong Kim, Yeoreum Yun,
Paolo Abeni, Sasha Levin
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunseong Kim <yskelg@gmail.com>
[ Upstream commit bab4923132feb3e439ae45962979c5d9d5c7c1f1 ]
In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from
qdisc->dev_queue->dev <NULL> ->name
This situation simulated from bunch of veths and Bluetooth disconnection
and reconnection.
During qdisc initialization, qdisc was being set to noop_queue.
In veth_init_queue, the initial tx_num was reduced back to one,
causing the qdisc reset to be called with noop, which led to the kernel
panic.
I've attached the GitHub gist link that C converted syz-execprogram
source code and 3 log of reproduced vmcore-dmesg.
https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740
Yeoreum and I use two fuzzing tool simultaneously.
One process with syz-executor : https://github.com/google/syzkaller
$ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \
-enable=none -collide=false log1
The other process with perf fuzzer:
https://github.com/deater/perf_event_tests/tree/master/fuzzer
$ perf_event_tests/fuzzer/perf_fuzzer
I think this will happen on the kernel version.
Linux kernel version +v6.7.10, +v6.8, +v6.9 and it could happen in v6.10.
This occurred from 51270d573a8d. I think this patch is absolutely
necessary. Previously, It was showing not intended string value of name.
I've reproduced 3 time from my fedora 40 Debug Kernel with any other module
or patched.
version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug
[ 5287.164555] veth0_vlan: left promiscuous mode
[ 5287.164929] veth1_macvtap: left promiscuous mode
[ 5287.164950] veth0_macvtap: left promiscuous mode
[ 5287.164983] veth1_vlan: left promiscuous mode
[ 5287.165008] veth0_vlan: left promiscuous mode
[ 5287.165450] veth1_macvtap: left promiscuous mode
[ 5287.165472] veth0_macvtap: left promiscuous mode
[ 5287.165502] veth1_vlan: left promiscuous mode
…
[ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state
[ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state
[ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0
[ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state
…
[ 5298.002798] bridge_slave_0: left promiscuous mode
[ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
[ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
[ 5298.320207] bond0 (unregistering): Released all slaves
[ 5298.354296] hsr_slave_0: left promiscuous mode
[ 5298.360750] hsr_slave_1: left promiscuous mode
[ 5298.374889] veth1_macvtap: left promiscuous mode
[ 5298.374931] veth0_macvtap: left promiscuous mode
[ 5298.374988] veth1_vlan: left promiscuous mode
[ 5298.375024] veth0_vlan: left promiscuous mode
[ 5299.109741] team0 (unregistering): Port device team_slave_1 removed
[ 5299.185870] team0 (unregistering): Port device team_slave_0 removed
…
[ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1
[ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9
[ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9
….
[ 5301.075531] team0: Port device team_slave_1 added
[ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5301.085588] bridge_slave_0: entered allmulticast mode
[ 5301.085800] bridge_slave_0: entered promiscuous mode
[ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state
…
[ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.193481] hsr_slave_0: entered promiscuous mode
[ 5301.204425] hsr_slave_1: entered promiscuous mode
[ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.210185] Cannot create hsr debugfs directory
[ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.255934] team0: Port device team_slave_0 added
[ 5301.256480] team0: Port device team_slave_1 added
[ 5301.256948] team0: Port device team_slave_0 added
…
[ 5301.435928] hsr_slave_0: entered promiscuous mode
[ 5301.446029] hsr_slave_1: entered promiscuous mode
[ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.455884] Cannot create hsr debugfs directory
[ 5301.502664] hsr_slave_0: entered promiscuous mode
[ 5301.513675] hsr_slave_1: entered promiscuous mode
[ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.526164] Cannot create hsr debugfs directory
[ 5301.563662] hsr_slave_0: entered promiscuous mode
[ 5301.576129] hsr_slave_1: entered promiscuous mode
[ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.580270] Cannot create hsr debugfs directory
[ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137]
[ 5301.595877] Mem abort info:
[ 5301.595881] ESR = 0x0000000096000006
[ 5301.595885] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5301.595889] SET = 0, FnV = 0
[ 5301.595893] EA = 0, S1PTW = 0
[ 5301.595896] FSC = 0x06: level 2 translation fault
[ 5301.595900] Data abort info:
[ 5301.595903] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 5301.595907] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 5301.595911] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5301.595915] [dfff800000000026] address between user and kernel address ranges
[ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP
…
[ 5301.596076] CPU: 2 PID: 102769 Comm:
syz-executor.3 Kdump: loaded Tainted:
G W ------- --- 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1
[ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA,
BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 5301.596085] pc : strnlen+0x40/0x88
[ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596124] sp : ffff8000beef6b40
[ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001
[ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00
[ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140
[ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8
[ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006
[ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5
[ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec
[ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200
[ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1
[ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130
[ 5301.596166] Call trace:
[ 5301.596175] strnlen+0x40/0x88
[ 5301.596179] trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596182] perf_trace_qdisc_reset+0xb0/0x538
[ 5301.596184] __traceiter_qdisc_reset+0x68/0xc0
[ 5301.596188] qdisc_reset+0x43c/0x5e8
[ 5301.596190] netif_set_real_num_tx_queues+0x288/0x770
[ 5301.596194] veth_init_queues+0xfc/0x130 [veth]
[ 5301.596198] veth_newlink+0x45c/0x850 [veth]
[ 5301.596202] rtnl_newlink_create+0x2c8/0x798
[ 5301.596205] __rtnl_newlink+0x92c/0xb60
[ 5301.596208] rtnl_newlink+0xd8/0x130
[ 5301.596211] rtnetlink_rcv_msg+0x2e0/0x890
[ 5301.596214] netlink_rcv_skb+0x1c4/0x380
[ 5301.596225] rtnetlink_rcv+0x20/0x38
[ 5301.596227] netlink_unicast+0x3c8/0x640
[ 5301.596231] netlink_sendmsg+0x658/0xa60
[ 5301.596234] __sock_sendmsg+0xd0/0x180
[ 5301.596243] __sys_sendto+0x1c0/0x280
[ 5301.596246] __arm64_sys_sendto+0xc8/0x150
[ 5301.596249] invoke_syscall+0xdc/0x268
[ 5301.596256] el0_svc_common.constprop.0+0x16c/0x240
[ 5301.596259] do_el0_svc+0x48/0x68
[ 5301.596261] el0_svc+0x50/0x188
[ 5301.596265] el0t_64_sync_handler+0x120/0x130
[ 5301.596268] el0t_64_sync+0x194/0x198
[ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842)
[ 5301.596285] SMP: stopping secondary CPUs
[ 5301.597053] Starting crashdump kernel...
[ 5301.597057] Bye!
After applying our patch, I didn't find any kernel panic errors.
We've found a simple reproducer
# echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable
# ip link add veth0 type veth peer name veth1
Error: Unknown device type.
However, without our patch applied, I tested upstream 6.10.0-rc3 kernel
using the qdisc_reset event and the ip command on my qemu virtual machine.
This 2 commands makes always kernel panic.
Linux version: 6.10.0-rc3
[ 0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty
(paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld
version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024
Kernel panic message:
[ 615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 615.237250] Dumping ftrace buffer:
[ 615.237679] (ftrace buffer empty)
[ 615.238097] Modules linked in: veth crct10dif_ce virtio_gpu
virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can
xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga
uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev
videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can
omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi
lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap
stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br
dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp
llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio
ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6
Jun 22 02:36:5[3 6k152.62-4sm98k4-0k]v kCePUr:n e1l :P IUDn:a b4le6
8t oC ohmma: nidpl eN oketr nteali nptaedg i6n.g1 0re.0q-urecs3t- 0at0
1v6i4r-tgu4a4le fa2d0dbraeeds0se-dir tyd f#f2f08
615.252376] Hardware name: linux,dummy-virt (DT)
[ 615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 615.254433] pc : strnlen+0x6c/0xe0
[ 615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[ 615.256088] sp : ffff800080b269a0
[ 615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27:
0000000000000001
[ 615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24:
ffffc070f619cf60
[ 615.259020] x23: 0000000000000128 x22: 0000000000000138 x21:
dfff800000000000
[ 615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18:
ffffc070f448b800
[ 615.261454] x17: 0000000000000000 x16: 0000000000000001 x15:
ffffc070f4ba2a90
[ 615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12:
1ffff00010164d72
[ 615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 :
ffffc070e85d6184
[ 615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 :
000000001504a6d3
[ 615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 :
0000000000000000
[ 615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 :
0000000000000000
[ 615.268747] Call trace:
[ 615.269180] strnlen+0x6c/0xe0
[ 615.269767] trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[ 615.270716] trace_event_raw_event_qdisc_reset+0xe8/0x4e8
[ 615.271667] __traceiter_qdisc_reset+0xa0/0x140
[ 615.272499] qdisc_reset+0x554/0x848
[ 615.273134] netif_set_real_num_tx_queues+0x360/0x9a8
[ 615.274050] veth_init_queues+0x110/0x220 [veth]
[ 615.275110] veth_newlink+0x538/0xa50 [veth]
[ 615.276172] __rtnl_newlink+0x11e4/0x1bc8
[ 615.276944] rtnl_newlink+0xac/0x120
[ 615.277657] rtnetlink_rcv_msg+0x4e4/0x1370
[ 615.278409] netlink_rcv_skb+0x25c/0x4f0
[ 615.279122] rtnetlink_rcv+0x48/0x70
[ 615.279769] netlink_unicast+0x5a8/0x7b8
[ 615.280462] netlink_sendmsg+0xa70/0x1190
Yeoreum and I don't know if the patch we wrote will fix the underlying
cause, but we think that priority is to prevent kernel panic happening.
So, we're sending this patch.
Fixes: 51270d573a8d ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string")
Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/
Cc: netdev@vger.kernel.org
Tested-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/trace/events/qdisc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h
index a50df41634c58..f661d3d7c410a 100644
--- a/include/trace/events/qdisc.h
+++ b/include/trace/events/qdisc.h
@@ -53,7 +53,7 @@ TRACE_EVENT(qdisc_reset,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q)->name )
+ __string( dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)" )
__string( kind, q->ops->id )
__field( u32, parent )
__field( u32, handle )
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 5.10 279/290] ipv6: annotate some data-races around sk->sk_prot
[not found] <20240703102904.170852981@linuxfoundation.org>
2024-07-03 10:40 ` [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() Greg Kroah-Hartman
@ 2024-07-03 10:41 ` Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 280/290] ipv6: Fix data races " Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 281/290] tcp: Fix data races around icsk->icsk_af_ops Greg Kroah-Hartman
3 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-03 10:41 UTC (permalink / raw)
To: stable, netdev
Cc: Greg Kroah-Hartman, patches, syzbot, Eric Dumazet,
David S. Miller, Kazunori Kobayashi
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
commit 086d49058cd8471046ae9927524708820f5fd1c7 upstream.
IPv6 has this hack changing sk->sk_prot when an IPv6 socket
is 'converted' to an IPv4 one with IPV6_ADDRFORM option.
This operation is only performed for TCP and UDP, knowing
their 'struct proto' for the two network families are populated
in the same way, and can not disappear while a reader
might use and dereference sk->sk_prot.
If we think about it all reads of sk->sk_prot while
either socket lock or RTNL is not acquired should be using READ_ONCE().
Also note that other layers like MPTCP, XFRM, CHELSIO_TLS also
write over sk->sk_prot.
BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt
write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0:
do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline]
ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019
udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489
__sys_setsockopt+0x209/0x2a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1:
inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659
____sys_recvmsg+0x16c/0x320
___sys_recvmsg net/socket.c:2674 [inline]
do_recvmmsg+0x3f5/0xae0 net/socket.c:2768
__sys_recvmmsg net/socket.c:2847 [inline]
__do_sys_recvmmsg net/socket.c:2870 [inline]
__se_sys_recvmmsg net/socket.c:2863 [inline]
__x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv6/af_inet6.c | 24 ++++++++++++++++++------
net/ipv6/ipv6_sockglue.c | 6 ++++--
2 files changed, 22 insertions(+), 8 deletions(-)
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -449,11 +449,14 @@ out_unlock:
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
int err = 0;
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
/* If the socket has its own bind function then use it. */
- if (sk->sk_prot->bind)
- return sk->sk_prot->bind(sk, uaddr, addr_len);
+ if (prot->bind)
+ return prot->bind(sk, uaddr, addr_len);
if (addr_len < SIN6_LEN_RFC2133)
return -EINVAL;
@@ -566,6 +569,7 @@ int inet6_ioctl(struct socket *sock, uns
void __user *argp = (void __user *)arg;
struct sock *sk = sock->sk;
struct net *net = sock_net(sk);
+ const struct proto *prot;
switch (cmd) {
case SIOCADDRT:
@@ -583,9 +587,11 @@ int inet6_ioctl(struct socket *sock, uns
case SIOCSIFDSTADDR:
return addrconf_set_dstaddr(net, argp);
default:
- if (!sk->sk_prot->ioctl)
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ if (!prot->ioctl)
return -ENOIOCTLCMD;
- return sk->sk_prot->ioctl(sk, cmd, arg);
+ return prot->ioctl(sk, cmd, arg);
}
/*NOTREACHED*/
return 0;
@@ -647,11 +653,14 @@ INDIRECT_CALLABLE_DECLARE(int udpv6_send
int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
if (unlikely(inet_send_prepare(sk)))
return -EAGAIN;
- return INDIRECT_CALL_2(sk->sk_prot->sendmsg, tcp_sendmsg, udpv6_sendmsg,
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg,
sk, msg, size);
}
@@ -661,13 +670,16 @@ int inet6_recvmsg(struct socket *sock, s
int flags)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
int addr_len = 0;
int err;
if (likely(!(flags & MSG_ERRQUEUE)))
sock_rps_record_flow(sk);
- err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udpv6_recvmsg,
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ err = INDIRECT_CALL_2(prot->recvmsg, tcp_recvmsg, udpv6_recvmsg,
sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
if (err >= 0)
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -475,7 +475,8 @@ static int do_ipv6_setsockopt(struct soc
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, &tcp_prot, 1);
local_bh_enable();
- sk->sk_prot = &tcp_prot;
+ /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ WRITE_ONCE(sk->sk_prot, &tcp_prot);
icsk->icsk_af_ops = &ipv4_specific;
sk->sk_socket->ops = &inet_stream_ops;
sk->sk_family = PF_INET;
@@ -489,7 +490,8 @@ static int do_ipv6_setsockopt(struct soc
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, prot, 1);
local_bh_enable();
- sk->sk_prot = prot;
+ /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ WRITE_ONCE(sk->sk_prot, prot);
sk->sk_socket->ops = &inet_dgram_ops;
sk->sk_family = PF_INET;
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 5.10 280/290] ipv6: Fix data races around sk->sk_prot.
[not found] <20240703102904.170852981@linuxfoundation.org>
2024-07-03 10:40 ` [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 279/290] ipv6: annotate some data-races around sk->sk_prot Greg Kroah-Hartman
@ 2024-07-03 10:41 ` Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 281/290] tcp: Fix data races around icsk->icsk_af_ops Greg Kroah-Hartman
3 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-03 10:41 UTC (permalink / raw)
To: stable, netdev
Cc: Greg Kroah-Hartman, patches, Kuniyuki Iwashima, Jakub Kicinski,
Kazunori Kobayashi
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima <kuniyu@amazon.com>
commit 364f997b5cfe1db0d63a390fe7c801fa2b3115f6 upstream.
Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")
fixed some data-races around sk->sk_prot but it was not enough.
Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
load tearing.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/core/sock.c | 6 ++++--
net/ipv4/af_inet.c | 23 ++++++++++++++++-------
net/ipv6/ipv6_sockglue.c | 4 ++--
3 files changed, 22 insertions(+), 11 deletions(-)
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3263,7 +3263,8 @@ int sock_common_getsockopt(struct socket
{
struct sock *sk = sock->sk;
- return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ return READ_ONCE(sk->sk_prot)->getsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(sock_common_getsockopt);
@@ -3290,7 +3291,8 @@ int sock_common_setsockopt(struct socket
{
struct sock *sk = sock->sk;
- return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ return READ_ONCE(sk->sk_prot)->setsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(sock_common_setsockopt);
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -562,22 +562,27 @@ int inet_dgram_connect(struct socket *so
int addr_len, int flags)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
int err;
if (addr_len < sizeof(uaddr->sa_family))
return -EINVAL;
+
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+
if (uaddr->sa_family == AF_UNSPEC)
- return sk->sk_prot->disconnect(sk, flags);
+ return prot->disconnect(sk, flags);
if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
- err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
+ err = prot->pre_connect(sk, uaddr, addr_len);
if (err)
return err;
}
if (data_race(!inet_sk(sk)->inet_num) && inet_autobind(sk))
return -EAGAIN;
- return sk->sk_prot->connect(sk, uaddr, addr_len);
+ return prot->connect(sk, uaddr, addr_len);
}
EXPORT_SYMBOL(inet_dgram_connect);
@@ -740,10 +745,11 @@ EXPORT_SYMBOL(inet_stream_connect);
int inet_accept(struct socket *sock, struct socket *newsock, int flags,
bool kern)
{
- struct sock *sk1 = sock->sk;
+ struct sock *sk1 = sock->sk, *sk2;
int err = -EINVAL;
- struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err, kern);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ sk2 = READ_ONCE(sk1->sk_prot)->accept(sk1, flags, &err, kern);
if (!sk2)
goto do_err;
@@ -828,12 +834,15 @@ ssize_t inet_sendpage(struct socket *soc
size_t size, int flags)
{
struct sock *sk = sock->sk;
+ const struct proto *prot;
if (unlikely(inet_send_prepare(sk)))
return -EAGAIN;
- if (sk->sk_prot->sendpage)
- return sk->sk_prot->sendpage(sk, page, offset, size, flags);
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ if (prot->sendpage)
+ return prot->sendpage(sk, page, offset, size, flags);
return sock_no_sendpage(sock, page, offset, size, flags);
}
EXPORT_SYMBOL(inet_sendpage);
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -475,7 +475,7 @@ static int do_ipv6_setsockopt(struct soc
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, &tcp_prot, 1);
local_bh_enable();
- /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
WRITE_ONCE(sk->sk_prot, &tcp_prot);
icsk->icsk_af_ops = &ipv4_specific;
sk->sk_socket->ops = &inet_stream_ops;
@@ -490,7 +490,7 @@ static int do_ipv6_setsockopt(struct soc
sock_prot_inuse_add(net, sk->sk_prot, -1);
sock_prot_inuse_add(net, prot, 1);
local_bh_enable();
- /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */
+ /* Paired with READ_ONCE(sk->sk_prot) in inet6_dgram_ops */
WRITE_ONCE(sk->sk_prot, prot);
sk->sk_socket->ops = &inet_dgram_ops;
sk->sk_family = PF_INET;
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 5.10 281/290] tcp: Fix data races around icsk->icsk_af_ops.
[not found] <20240703102904.170852981@linuxfoundation.org>
` (2 preceding siblings ...)
2024-07-03 10:41 ` [PATCH 5.10 280/290] ipv6: Fix data races " Greg Kroah-Hartman
@ 2024-07-03 10:41 ` Greg Kroah-Hartman
3 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-03 10:41 UTC (permalink / raw)
To: stable, netdev
Cc: Greg Kroah-Hartman, patches, syzbot, Eric Dumazet,
Kuniyuki Iwashima, Jakub Kicinski, Kazunori Kobayashi
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima <kuniyu@amazon.com>
commit f49cd2f4d6170d27a2c61f1fecb03d8a70c91f57 upstream.
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To
avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
for the reads and writes.
Thanks to Eric Dumazet for providing the syzbot report:
BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/26/2022
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kazunori Kobayashi <kazunori.kobayashi@miraclelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp.c | 10 ++++++----
net/ipv6/ipv6_sockglue.c | 3 ++-
net/ipv6/tcp_ipv6.c | 6 ++++--
3 files changed, 12 insertions(+), 7 deletions(-)
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3501,8 +3501,9 @@ int tcp_setsockopt(struct sock *sk, int
const struct inet_connection_sock *icsk = inet_csk(sk);
if (level != SOL_TCP)
- return icsk->icsk_af_ops->setsockopt(sk, level, optname,
- optval, optlen);
+ /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */
+ return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname,
+ optval, optlen);
return do_tcp_setsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(tcp_setsockopt);
@@ -4063,8 +4064,9 @@ int tcp_getsockopt(struct sock *sk, int
struct inet_connection_sock *icsk = inet_csk(sk);
if (level != SOL_TCP)
- return icsk->icsk_af_ops->getsockopt(sk, level, optname,
- optval, optlen);
+ /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */
+ return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname,
+ optval, optlen);
return do_tcp_getsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(tcp_getsockopt);
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -477,7 +477,8 @@ static int do_ipv6_setsockopt(struct soc
local_bh_enable();
/* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */
WRITE_ONCE(sk->sk_prot, &tcp_prot);
- icsk->icsk_af_ops = &ipv4_specific;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific);
sk->sk_socket->ops = &inet_stream_ops;
sk->sk_family = PF_INET;
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -237,7 +237,8 @@ static int tcp_v6_connect(struct sock *s
sin.sin_port = usin->sin6_port;
sin.sin_addr.s_addr = usin->sin6_addr.s6_addr32[3];
- icsk->icsk_af_ops = &ipv6_mapped;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_mapped);
if (sk_is_mptcp(sk))
mptcpv6_handle_mapped(sk, true);
sk->sk_backlog_rcv = tcp_v4_do_rcv;
@@ -249,7 +250,8 @@ static int tcp_v6_connect(struct sock *s
if (err) {
icsk->icsk_ext_hdr_len = exthdrlen;
- icsk->icsk_af_ops = &ipv6_specific;
+ /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */
+ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_specific);
if (sk_is_mptcp(sk))
mptcpv6_handle_mapped(sk, false);
sk->sk_backlog_rcv = tcp_v6_do_rcv;
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
2024-07-03 10:40 ` [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() Greg Kroah-Hartman
@ 2024-07-03 11:42 ` Yunseong Kim
0 siblings, 0 replies; 5+ messages in thread
From: Yunseong Kim @ 2024-07-03 11:42 UTC (permalink / raw)
To: Greg Kroah-Hartman, stable
Cc: patches, netdev, Yeoreum Yun, Paolo Abeni, Sasha Levin
Hi Greg,
On 7/3/24 7:40 오후, Greg Kroah-Hartman wrote:
> 5.10-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Yunseong Kim <yskelg@gmail.com>
>
> [ Upstream commit bab4923132feb3e439ae45962979c5d9d5c7c1f1 ]
>
> In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from
>
> qdisc->dev_queue->dev <NULL> ->name
>
> This situation simulated from bunch of veths and Bluetooth disconnection
> and reconnection.
>
> During qdisc initialization, qdisc was being set to noop_queue.
> In veth_init_queue, the initial tx_num was reduced back to one,
> causing the qdisc reset to be called with noop, which led to the kernel
> panic.
> Fixes: 51270d573a8d ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string")
> Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/
> Cc: netdev@vger.kernel.org
> Tested-by: Yunseong Kim <yskelg@gmail.com>
> Signed-off-by: Yunseong Kim <yskelg@gmail.com>
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> include/trace/events/qdisc.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h
> index a50df41634c58..f661d3d7c410a 100644
> --- a/include/trace/events/qdisc.h
> +++ b/include/trace/events/qdisc.h
> @@ -53,7 +53,7 @@ TRACE_EVENT(qdisc_reset,
> TP_ARGS(q),
>
> TP_STRUCT__entry(
> - __string( dev, qdisc_dev(q)->name )
> + __string( dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)" )
> __string( kind, q->ops->id )
> __field( u32, parent )
> __field( u32, handle )
Since that code changed in +v5.10.213, +v5.15.152, +v6.1.82 +v6.6.22
+v6.7.10, +v6.8, +6.9 and the stable is in an intermediate state.
commit 2c92ca849fcc ("tracing/treewide: Remove second parameter of
__assign_str()")
Link:
https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home/
I had to fix some other part for the stable branch backporting.
So, I submit another patch. Please check out.
Link:
https://lore.kernel.org/stable/20240702180146.5126-2-yskelg@gmail.com/T/#u
---
include/trace/events/qdisc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h
index 1f4258308b96..061fd4960303 100644
--- a/include/trace/events/qdisc.h
+++ b/include/trace/events/qdisc.h
@@ -81,14 +81,14 @@ TRACE_EVENT(qdisc_reset,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q)->name )
+ __string( dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)" )
__string( kind, q->ops->id )
__field( u32, parent )
__field( u32, handle )
),
TP_fast_assign(
- __assign_str(dev, qdisc_dev(q)->name);
+ __assign_str(dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)");
__assign_str(kind, q->ops->id);
__entry->parent = q->parent;
__entry->handle = q->handle;
--
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-03 11:42 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240703102904.170852981@linuxfoundation.org>
2024-07-03 10:40 ` [PATCH 5.10 228/290] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() Greg Kroah-Hartman
2024-07-03 11:42 ` Yunseong Kim
2024-07-03 10:41 ` [PATCH 5.10 279/290] ipv6: annotate some data-races around sk->sk_prot Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 280/290] ipv6: Fix data races " Greg Kroah-Hartman
2024-07-03 10:41 ` [PATCH 5.10 281/290] tcp: Fix data races around icsk->icsk_af_ops Greg Kroah-Hartman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).