* use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
@ 2024-04-30 22:00 Anderson Nascimento
2024-05-01 0:22 ` Kuniyuki Iwashima
From: Anderson Nascimento @ 2024-04-30 22:00 UTC
To: netdev
Hello,
There is a bug in inet_twsk_hashdance(). This function inserts a
time-wait socket into the established hash table before initializing
the object's reference counter, as seen below. The counter is only
initialized after the object has been added to the established hash
table and the lock has been released. Because of this, the sock_hold()
in tcp_twsk_unique() and other operations on the object trigger
warnings from the reference counter saturation mechanism. The warnings
can also be seen below. They were triggered on Fedora 39 with kernel
6.8.6-200.fc39.x86_64.
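For readers unfamiliar with the saturation mechanism, here is a
simplified user-space model of the kind of checks that produce the
three warnings quoted further down. It is a sketch, not the kernel's
lib/refcount.c. All names (model_ref, model_inc, model_sub_and_test,
model_dec, enum model_warn, MODEL_SATURATED) are hypothetical, and
overflow handling is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a rough model, not the kernel implementation. */
enum model_warn {
	WARN_NONE,
	WARN_ADD_ON_ZERO,	/* "addition on 0; use-after-free" */
	WARN_UNDERFLOW,		/* "underflow; use-after-free" */
	WARN_DEC_HIT_ZERO	/* "decrement hit 0; leaking memory" */
};

#define MODEL_SATURATED (INT_MIN / 2)

struct model_ref { atomic_int refs; };

/* Pin the counter at a large negative value so it cannot reach 0 again. */
static enum model_warn model_saturate(struct model_ref *r, enum model_warn w)
{
	atomic_store(&r->refs, MODEL_SATURATED);
	return w;
}

/* Take one reference; a counter at 0 is assumed to mean "already freed". */
static enum model_warn model_inc(struct model_ref *r)
{
	int old = atomic_fetch_add(&r->refs, 1);

	if (old == 0)
		return model_saturate(r, WARN_ADD_ON_ZERO);
	return WARN_NONE;
}

/* Drop n references; *last reports whether the final one was dropped. */
static enum model_warn model_sub_and_test(struct model_ref *r, int n, bool *last)
{
	int old = atomic_fetch_sub(&r->refs, n);

	*last = (old == n);
	if (!*last && (old < 0 || old - n < 0))
		return model_saturate(r, WARN_UNDERFLOW);
	return WARN_NONE;
}

/* Drop one reference where the caller must not be the last holder. */
static enum model_warn model_dec(struct model_ref *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old <= 1)
		return model_saturate(r, WARN_DEC_HIT_ZERO);
	return WARN_NONE;
}
```

In this model a counter that is still 0, as it would be for an object
that has been published but not yet initialized, makes model_inc()
report the first warning and makes a later one-reference drop through
model_sub_and_test() report the second.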
The bug is triggered via a connect() system call on a TCP socket,
which reaches __inet_check_established() and then passes the time-wait
socket to tcp_twsk_unique(). __inet_check_established() also performs
other operations on the time-wait socket before its reference counter
is initialized by inet_twsk_hashdance(). The fix seems to be to move
the reference counter initialization inside the lock, but I have not
tested this, so I cannot confirm it.
The bug seems to have been introduced by commit ec94c269 ("tcp/dccp:
avoid one atomic operation for timewait hashdance").
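The ordering problem described above can be illustrated with a small
sequential model. This is only a sketch under stated assumptions: it
runs in one thread, so the "reader" is simply called at the point where
a concurrent lookup could run, and every name in it (model_tw,
model_bucket, model_lookup_hold, model_insert_then_init,
model_init_then_insert) is hypothetical rather than taken from the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a time-wait socket and one ehash bucket. */
struct model_tw { int refcnt; };
struct model_bucket { struct model_tw *entry; };

/* Reader side: what a lookup on the bucket would observe.
 * Returns true only if a reference was taken on a non-zero counter. */
static bool model_lookup_hold(struct model_bucket *b)
{
	struct model_tw *tw = b->entry;

	if (!tw)
		return false;	/* nothing published yet */
	if (tw->refcnt == 0)
		return false;	/* published but uninitialized: the reported case */
	tw->refcnt++;
	return true;
}

/* Ordering as reported: insert first, set the counter afterwards. */
static bool model_insert_then_init(struct model_bucket *b, struct model_tw *tw)
{
	bool reader_ok;

	tw->refcnt = 0;
	b->entry = tw;				/* now visible to readers */
	reader_ok = model_lookup_hold(b);	/* a reader runs in the window */
	tw->refcnt = 3;				/* too late for that reader */
	return reader_ok;
}

/* Ordering suggested in the report: set the counter before publishing. */
static bool model_init_then_insert(struct model_bucket *b, struct model_tw *tw)
{
	tw->refcnt = 3;
	b->entry = tw;
	return model_lookup_hold(b);
}
```

With the first ordering the reader finds the object but sees a zero
counter. With the second it always sees an initialized counter and can
take its reference.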
void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
			 struct inet_hashinfo *hashinfo)
{
	const struct inet_sock *inet = inet_sk(sk);
	const struct inet_connection_sock *icsk = inet_csk(sk);
	struct inet_ehash_bucket *ehead = inet_ehash_bucket(hashinfo, sk->sk_hash);
	spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
	struct inet_bind_hashbucket *bhead, *bhead2;
	...

	spin_lock(lock);

	inet_twsk_add_node_rcu(tw, &ehead->chain);

	/* Step 3: Remove SK from hash chain */
	if (__sk_nulls_del_node_init_rcu(sk))
		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);

	spin_unlock(lock);
	...
	refcount_set(&tw->tw_refcnt, 3);
}

static int __inet_check_established(struct inet_timewait_death_row *death_row,
				    struct sock *sk, __u16 lport,
				    struct inet_timewait_sock **twp)
{
	struct inet_hashinfo *hinfo = death_row->hashinfo;
	struct inet_sock *inet = inet_sk(sk);
	__be32 daddr = inet->inet_rcv_saddr;
	__be32 saddr = inet->inet_daddr;
	int dif = sk->sk_bound_dev_if;
	struct net *net = sock_net(sk);
	int sdif = l3mdev_master_ifindex_by_index(net, dif);
	INET_ADDR_COOKIE(acookie, saddr, daddr);
	const __portpair ports = INET_COMBINED_PORTS(inet->inet_dport, lport);
	unsigned int hash = inet_ehashfn(net, daddr, lport,
					 saddr, inet->inet_dport);
	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
	struct sock *sk2;
	const struct hlist_nulls_node *node;
	struct inet_timewait_sock *tw = NULL;

	spin_lock(lock);

	sk_nulls_for_each(sk2, node, &head->chain) {
		if (sk2->sk_hash != hash)
			continue;

		if (likely(inet_match(net, sk2, acookie, ports, dif, sdif))) {
			if (sk2->sk_state == TCP_TIME_WAIT) {
				tw = inet_twsk(sk2);
				if (twsk_unique(sk, sk2, twp))
					break;
			}
			goto not_unique;
		}
	}
	...

static inline int twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
{
	if (sk->sk_prot->twsk_prot->twsk_unique != NULL)
		return sk->sk_prot->twsk_prot->twsk_unique(sk, sktw, twp);
	return 0;
}

int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
{
	int reuse = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tw_reuse);
	const struct inet_timewait_sock *tw = inet_twsk(sktw);
	const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
	struct tcp_sock *tp = tcp_sk(sk);

	...
	if (tcptw->tw_ts_recent_stamp &&
	    (!twp || (reuse && time_after32(ktime_get_seconds(),
					    tcptw->tw_ts_recent_stamp)))) {
		...
		if (likely(!tp->repair)) {
			...
		}
		sock_hold(sktw);
		return 1;
	}

	return 0;
}

[433522.338983] ------------[ cut here ]------------
[433522.339033] refcount_t: addition on 0; use-after-free.
[433522.339706] WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25
refcount_warn_saturate+0xe5/0x110
[433522.340028] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
scsi_dh_alua fuse dm_multipath
[433522.340141] CPU: 0 PID: 1039313 Comm: trigger Not tainted
6.8.6-200.fc39.x86_64 #1
[433522.340170] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[433522.340172] RIP: 0010:refcount_warn_saturate+0xe5/0x110
[433522.340179] Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00
0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e
ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8
[433522.340182] RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282
[433522.340185] RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX:
0000000000000027
[433522.340213] RDX: ffff88807be218c8 RSI: 0000000000000001 RDI:
ffff88807be218c0
[433522.340215] RBP: 0000000000069d70 R08: 0000000000000000 R09:
ffffc90006b439f0
[433522.340217] R10: ffffc90006b439e8 R11: 0000000000000003 R12:
ffff8880029ede84
[433522.340219] R13: 0000000000004e20 R14: ffffffff84356dc0 R15:
ffff888009bb3ef0
[433522.340221] FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000)
knlGS:0000000000000000
[433522.340224] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433522.340226] CR2: 0000000020ccb000 CR3: 000000004628c005 CR4:
0000000000f70ef0
[433522.340276] PKRU: 55555554
[433522.340278] Call Trace:
[433522.340282] <TASK>
[433522.340307] ? refcount_warn_saturate+0xe5/0x110
[433522.340313] ? __warn+0x81/0x130
[433522.340462] ? refcount_warn_saturate+0xe5/0x110
[433522.340492] ? report_bug+0x171/0x1a0
[433522.340723] ? refcount_warn_saturate+0xe5/0x110
[433522.340731] ? handle_bug+0x3c/0x80
[433522.340781] ? exc_invalid_op+0x17/0x70
[433522.340785] ? asm_exc_invalid_op+0x1a/0x20
[433522.340838] ? refcount_warn_saturate+0xe5/0x110
[433522.340843] tcp_twsk_unique+0x186/0x190
[433522.340945] __inet_check_established+0x176/0x2d0
[433522.340974] __inet_hash_connect+0x74/0x7d0
[433522.340980] ? __pfx___inet_check_established+0x10/0x10
[433522.340983] tcp_v4_connect+0x278/0x530
[433522.340989] __inet_stream_connect+0x10f/0x3d0
[433522.341019] inet_stream_connect+0x3a/0x60
[433522.341024] __sys_connect+0xa8/0xd0
[433522.341186] __x64_sys_connect+0x18/0x20
[433522.341190] do_syscall_64+0x83/0x170
[433522.341195] ? __count_memcg_events+0x4d/0xc0
[433522.341334] ? count_memcg_events.constprop.0+0x1a/0x30
[433522.341385] ? handle_mm_fault+0xa2/0x360
[433522.341412] ? do_user_addr_fault+0x304/0x670
[433522.341442] ? clear_bhb_loop+0x55/0xb0
[433522.341446] ? clear_bhb_loop+0x55/0xb0
[433522.341449] ? clear_bhb_loop+0x55/0xb0
[433522.341453] entry_SYSCALL_64_after_hwframe+0x78/0x80
[433522.341458] RIP: 0033:0x7f62c11a885d
[433522.341685] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
[433522.341688] RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX:
000000000000002a
[433522.341691] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
00007f62c11a885d
[433522.341693] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
0000000000000003
[433522.341695] RBP: 00007f62c1091e90 R08: 0000000000000000 R09:
0000000000000000
[433522.341696] R10: 0000000000000000 R11: 0000000000000296 R12:
00007f62c10926c0
[433522.341698] R13: ffffffffffffff88 R14: 0000000000000000 R15:
00007ffe237885b0
[433522.341702] </TASK>
[433522.341703] ---[ end trace 0000000000000000 ]---
[433522.341709] ------------[ cut here ]------------
[433522.341710] refcount_t: underflow; use-after-free.
[433522.341720] WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:28
refcount_warn_saturate+0xbe/0x110
[433522.341727] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
scsi_dh_alua fuse dm_multipath
[433522.341820] CPU: 0 PID: 1039313 Comm: trigger Tainted: G W
6.8.6-200.fc39.x86_64 #1
[433522.341823] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[433522.341825] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[433522.341829] Code: 01 01 e8 c5 42 8e ff 0f 0b c3 cc cc cc cc 80 3d cc
13 ea 01 00 75 85 48 c7 c7 28 8f b7 82 c6 05 bc 13 ea 01 01 e8 a2 42 8e
ff <0f> 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7
[433522.341831] RSP: 0018:ffffc90006b43b80 EFLAGS: 00010282
[433522.341834] RAX: 0000000000000000 RBX: 0000000000004e20 RCX:
0000000000000027
[433522.341836] RDX: ffff88807be218c8 RSI: 0000000000000001 RDI:
ffff88807be218c0
[433522.341837] RBP: ffff888009a640c0 R08: 0000000000000000 R09:
ffffc90006b43a10
[433522.341839] R10: ffffc90006b43a08 R11: 0000000000000003 R12:
ffff8880029ede84
[433522.341840] R13: 000000000000204e R14: ffffffff84356dc0 R15:
ffff888009bb3ef0
[433522.341842] FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000)
knlGS:0000000000000000
[433522.341844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433522.341846] CR2: 0000000020ccb000 CR3: 000000004628c005 CR4:
0000000000f70ef0
[433522.341886] PKRU: 55555554
[433522.341887] Call Trace:
[433522.341889] <TASK>
[433522.341890] ? refcount_warn_saturate+0xbe/0x110
[433522.341894] ? __warn+0x81/0x130
[433522.341899] ? refcount_warn_saturate+0xbe/0x110
[433522.341903] ? report_bug+0x171/0x1a0
[433522.341907] ? console_unlock+0x78/0x120
[433522.341977] ? handle_bug+0x3c/0x80
[433522.341981] ? exc_invalid_op+0x17/0x70
[433522.342007] ? asm_exc_invalid_op+0x1a/0x20
[433522.342011] ? refcount_warn_saturate+0xbe/0x110
[433522.342015] __inet_check_established+0x24d/0x2d0
[433522.342019] __inet_hash_connect+0x74/0x7d0
[433522.342023] ? __pfx___inet_check_established+0x10/0x10
[433522.342026] tcp_v4_connect+0x278/0x530
[433522.342031] __inet_stream_connect+0x10f/0x3d0
[433522.342035] inet_stream_connect+0x3a/0x60
[433522.342039] __sys_connect+0xa8/0xd0
[433522.342044] __x64_sys_connect+0x18/0x20
[433522.342048] do_syscall_64+0x83/0x170
[433522.342051] ? __count_memcg_events+0x4d/0xc0
[433522.342054] ? count_memcg_events.constprop.0+0x1a/0x30
[433522.342058] ? handle_mm_fault+0xa2/0x360
[433522.342060] ? do_user_addr_fault+0x304/0x670
[433522.342065] ? clear_bhb_loop+0x55/0xb0
[433522.342068] ? clear_bhb_loop+0x55/0xb0
[433522.342071] ? clear_bhb_loop+0x55/0xb0
[433522.342074] entry_SYSCALL_64_after_hwframe+0x78/0x80
[433522.342077] RIP: 0033:0x7f62c11a885d
[433522.342083] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
[433522.342085] RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX:
000000000000002a
[433522.342087] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
00007f62c11a885d
[433522.342089] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
0000000000000003
[433522.342091] RBP: 00007f62c1091e90 R08: 0000000000000000 R09:
0000000000000000
[433522.342092] R10: 0000000000000000 R11: 0000000000000296 R12:
00007f62c10926c0
[433522.342093] R13: ffffffffffffff88 R14: 0000000000000000 R15:
00007ffe237885b0
[433522.342096] </TASK>
[433522.342097] ---[ end trace 0000000000000000 ]---
[435060.554199] ------------[ cut here ]------------
[435060.554243] refcount_t: decrement hit 0; leaking memory.
[435060.554261] WARNING: CPU: 2 PID: 879478 at lib/refcount.c:31
refcount_warn_saturate+0xff/0x110
[435060.554278] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
scsi_dh_alua fuse dm_multipath
[435060.554426] CPU: 2 PID: 879478 Comm: trigger Tainted: G W
6.8.6-200.fc39.x86_64 #1
[435060.554431] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[435060.554433] RIP: 0010:refcount_warn_saturate+0xff/0x110
[435060.554439] Code: f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff 0f
0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 61 42 8e
ff <0f> 0b c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[435060.554442] RSP: 0018:ffffc90005e2bb50 EFLAGS: 00010286
[435060.554445] RAX: 0000000000000000 RBX: 0000000000004e20 RCX:
0000000000000027
[435060.554448] RDX: ffff88807bea18c8 RSI: 0000000000000001 RDI:
ffff88807bea18c0
[435060.554450] RBP: ffff8880274d9bc0 R08: 0000000000000000 R09:
ffffc90005e2b9e0
[435060.554451] R10: ffffc90005e2b9d8 R11: 0000000000000003 R12:
ffff8880029ede84
[435060.554453] R13: 000000000000204e R14: ffffffff84356dc0 R15:
ffff888009bb2738
[435060.554456] FS: 00007f102ab566c0(0000) GS:ffff88807be80000(0000)
knlGS:0000000000000000
[435060.554458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[435060.554460] CR2: 0000000020ccb000 CR3: 000000004e184003 CR4:
0000000000f70ef0
[435060.554601] PKRU: 55555554
[435060.554603] Call Trace:
[435060.554607] <TASK>
[435060.554608] ? refcount_warn_saturate+0xff/0x110
[435060.554614] ? __warn+0x81/0x130
[435060.554625] ? refcount_warn_saturate+0xff/0x110
[435060.554630] ? report_bug+0x171/0x1a0
[435060.554638] ? console_unlock+0x78/0x120
[435060.554670] ? handle_bug+0x3c/0x80
[435060.554676] ? exc_invalid_op+0x17/0x70
[435060.554682] ? asm_exc_invalid_op+0x1a/0x20
[435060.554694] ? refcount_warn_saturate+0xff/0x110
[435060.554699] __inet_check_established+0x29b/0x2d0
[435060.554707] __inet_hash_connect+0x74/0x7d0
[435060.554712] ? __pfx___inet_check_established+0x10/0x10
[435060.554716] tcp_v4_connect+0x278/0x530
[435060.554723] __inet_stream_connect+0x10f/0x3d0
[435060.554729] inet_stream_connect+0x3a/0x60
[435060.554734] __sys_connect+0xa8/0xd0
[435060.554744] __x64_sys_connect+0x18/0x20
[435060.554748] do_syscall_64+0x83/0x170
[435060.554752] ? __switch_to_asm+0x3e/0x70
[435060.554826] ? finish_task_switch.isra.0+0x94/0x2f0
[435060.554835] ? __schedule+0x3f4/0x1530
[435060.554865] ? __count_memcg_events+0x4d/0xc0
[435060.554871] ? __rseq_handle_notify_resume+0xa9/0x4f0
[435060.554946] ? count_memcg_events.constprop.0+0x1a/0x30
[435060.554953] ? switch_fpu_return+0x50/0xe0
[435060.555065] ? clear_bhb_loop+0x55/0xb0
[435060.555070] ? clear_bhb_loop+0x55/0xb0
[435060.555073] ? clear_bhb_loop+0x55/0xb0
[435060.555077] entry_SYSCALL_64_after_hwframe+0x78/0x80
[435060.555082] RIP: 0033:0x7f102ac6c85d
[435060.555141] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
[435060.555143] RSP: 002b:00007f102ab55e58 EFLAGS: 00000296 ORIG_RAX:
000000000000002a
[435060.555147] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
00007f102ac6c85d
[435060.555149] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
0000000000000003
[435060.555151] RBP: 00007f102ab55e90 R08: 0000000000000000 R09:
0000000000000000
[435060.555153] R10: 0000000000000000 R11: 0000000000000296 R12:
00007f102ab566c0
[435060.555154] R13: ffffffffffffff88 R14: 0000000000000000 R15:
00007ffc83d0fa70
[435060.555158] </TASK>
[435060.555160] ---[ end trace 0000000000000000 ]---
--
Anderson Nascimento
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-04-30 22:00 use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter Anderson Nascimento
@ 2024-05-01 0:22 ` Kuniyuki Iwashima
2024-05-01 6:56 ` Eric Dumazet
From: Kuniyuki Iwashima @ 2024-05-01 0:22 UTC
To: anderson; +Cc: netdev, edumazet, kuniyu
+cc Eric
From: Anderson Nascimento <anderson@allelesecurity.com>
Date: Tue, 30 Apr 2024 19:00:34 -0300
> Hello,
Hi,
Thanks for the detailed report.
>
> There is a bug in inet_twsk_hashdance(). This function inserts a
> time-wait socket in the established hash table without initializing the
> object's reference counter, as seen below. The reference counter
> initialization is done after the object is added to the established hash
> table and the lock is released. Because of this, a sock_hold() in
> tcp_twsk_unique() and other operations on the object trigger warnings
> from the reference counter saturation mechanism. The warnings can also
> be seen below. They were triggered on Fedora 39 Linux kernel v6.8.
>
> The bug is triggered via a connect() system call on a TCP socket,
> reaching __inet_check_established() and then passing the time-wait
> socket to tcp_twsk_unique(). Other operations are also performed on the
> time-wait socket in __inet_check_established() before its reference
> counter is initialized correctly by inet_twsk_hashdance(). The fix seems
> to be to move the reference counter initialization inside the lock,
or use refcount_inc_not_zero() and give up on reusing the port
when the race is hit?
---8<---
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0427deca3e0e..637f4965326d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -175,8 +175,13 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 			tp->rx_opt.ts_recent	   = tcptw->tw_ts_recent;
 			tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
 		}
-		sock_hold(sktw);
-		return 1;
+
+		/* Here, sk_refcnt could be 0 because inet_twsk_hashdance() puts
+		 * twsk into ehash and releases the bucket lock *before* setting
+		 * sk_refcnt.  Then, give up on reusing the port.
+		 */
+		if (likely(refcount_inc_not_zero(&sktw->sk_refcnt)))
+			return 1;
 	}
 
 	return 0;
---8<---
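To make the semantics the patch above relies on concrete, here is a
user-space sketch of an "increment unless zero" helper built on C11
atomics. It is an analogue for illustration, not the kernel's
refcount_inc_not_zero(). The names model_inc_not_zero and
model_twsk_unique_tail are hypothetical, and the second function only
mimics the shape of the patched tail of tcp_twsk_unique().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: take a reference only if the counter is non-zero. */
static bool model_inc_not_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	/* Retry until the increment lands or the counter is observed at 0.
	 * On failure the compare-exchange reloads the current value into old. */
	while (old != 0) {
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Hypothetical caller shaped like the patched code: report the entry as
 * reusable (1) only when a reference was actually obtained. */
static int model_twsk_unique_tail(atomic_int *tw_refs)
{
	if (model_inc_not_zero(tw_refs))
		return 1;
	return 0;
}
```

A counter that is still 0 is left untouched and the caller gets 0,
which corresponds to giving up on reusing the port. A live counter is
incremented and the caller gets 1.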
> but
> as I didn't test it, I can't confirm it.
>
> The bug seems to be introduced by commit ec94c269 ("tcp/dccp:
> avoid one atomic operation for timewait hashdance").
>
> void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
> 			 struct inet_hashinfo *hashinfo)
> {
> 	const struct inet_sock *inet = inet_sk(sk);
> 	const struct inet_connection_sock *icsk = inet_csk(sk);
> 	struct inet_ehash_bucket *ehead = inet_ehash_bucket(hashinfo, sk->sk_hash);
> 	spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
> 	struct inet_bind_hashbucket *bhead, *bhead2;
> 	...
>
> 	spin_lock(lock);
>
> 	inet_twsk_add_node_rcu(tw, &ehead->chain);
>
> 	/* Step 3: Remove SK from hash chain */
> 	if (__sk_nulls_del_node_init_rcu(sk))
> 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>
> 	spin_unlock(lock);
> 	...
> 	refcount_set(&tw->tw_refcnt, 3);
> }
>
> static int __inet_check_established(struct inet_timewait_death_row *death_row,
> 				    struct sock *sk, __u16 lport,
> 				    struct inet_timewait_sock **twp)
> {
> 	struct inet_hashinfo *hinfo = death_row->hashinfo;
> 	struct inet_sock *inet = inet_sk(sk);
> 	__be32 daddr = inet->inet_rcv_saddr;
> 	__be32 saddr = inet->inet_daddr;
> 	int dif = sk->sk_bound_dev_if;
> 	struct net *net = sock_net(sk);
> 	int sdif = l3mdev_master_ifindex_by_index(net, dif);
> 	INET_ADDR_COOKIE(acookie, saddr, daddr);
> 	const __portpair ports = INET_COMBINED_PORTS(inet->inet_dport, lport);
> 	unsigned int hash = inet_ehashfn(net, daddr, lport,
> 					 saddr, inet->inet_dport);
> 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
> 	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
> 	struct sock *sk2;
> 	const struct hlist_nulls_node *node;
> 	struct inet_timewait_sock *tw = NULL;
>
> 	spin_lock(lock);
>
> 	sk_nulls_for_each(sk2, node, &head->chain) {
> 		if (sk2->sk_hash != hash)
> 			continue;
>
> 		if (likely(inet_match(net, sk2, acookie, ports, dif, sdif))) {
> 			if (sk2->sk_state == TCP_TIME_WAIT) {
> 				tw = inet_twsk(sk2);
> 				if (twsk_unique(sk, sk2, twp))
> 					break;
> 			}
> 			goto not_unique;
> 		}
> 	}
> 	...
>
> static inline int twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> {
> 	if (sk->sk_prot->twsk_prot->twsk_unique != NULL)
> 		return sk->sk_prot->twsk_prot->twsk_unique(sk, sktw, twp);
> 	return 0;
> }
>
> int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> {
> 	int reuse = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tw_reuse);
> 	const struct inet_timewait_sock *tw = inet_twsk(sktw);
> 	const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
> 	struct tcp_sock *tp = tcp_sk(sk);
>
> 	...
> 	if (tcptw->tw_ts_recent_stamp &&
> 	    (!twp || (reuse && time_after32(ktime_get_seconds(),
> 					    tcptw->tw_ts_recent_stamp)))) {
> 		...
> 		if (likely(!tp->repair)) {
> 			...
> 		}
> 		sock_hold(sktw);
> 		return 1;
> 	}
>
> 	return 0;
> }
>
> [433522.338983] ------------[ cut here ]------------
> [433522.339033] refcount_t: addition on 0; use-after-free.
> [433522.339706] WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25
> refcount_warn_saturate+0xe5/0x110
> [433522.340028] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
> intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
> pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
> ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
> joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
> polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
> sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
> ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua fuse dm_multipath
> [433522.340141] CPU: 0 PID: 1039313 Comm: trigger Not tainted
> 6.8.6-200.fc39.x86_64 #1
> [433522.340170] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
> Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
> [433522.340172] RIP: 0010:refcount_warn_saturate+0xe5/0x110
> [433522.340179] Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00
> 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e
> ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8
> [433522.340182] RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282
> [433522.340185] RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX:
> 0000000000000027
> [433522.340213] RDX: ffff88807be218c8 RSI: 0000000000000001 RDI:
> ffff88807be218c0
> [433522.340215] RBP: 0000000000069d70 R08: 0000000000000000 R09:
> ffffc90006b439f0
> [433522.340217] R10: ffffc90006b439e8 R11: 0000000000000003 R12:
> ffff8880029ede84
> [433522.340219] R13: 0000000000004e20 R14: ffffffff84356dc0 R15:
> ffff888009bb3ef0
> [433522.340221] FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000)
> knlGS:0000000000000000
> [433522.340224] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [433522.340226] CR2: 0000000020ccb000 CR3: 000000004628c005 CR4:
> 0000000000f70ef0
> [433522.340276] PKRU: 55555554
> [433522.340278] Call Trace:
> [433522.340282] <TASK>
> [433522.340307] ? refcount_warn_saturate+0xe5/0x110
> [433522.340313] ? __warn+0x81/0x130
> [433522.340462] ? refcount_warn_saturate+0xe5/0x110
> [433522.340492] ? report_bug+0x171/0x1a0
> [433522.340723] ? refcount_warn_saturate+0xe5/0x110
> [433522.340731] ? handle_bug+0x3c/0x80
> [433522.340781] ? exc_invalid_op+0x17/0x70
> [433522.340785] ? asm_exc_invalid_op+0x1a/0x20
> [433522.340838] ? refcount_warn_saturate+0xe5/0x110
> [433522.340843] tcp_twsk_unique+0x186/0x190
> [433522.340945] __inet_check_established+0x176/0x2d0
> [433522.340974] __inet_hash_connect+0x74/0x7d0
> [433522.340980] ? __pfx___inet_check_established+0x10/0x10
> [433522.340983] tcp_v4_connect+0x278/0x530
> [433522.340989] __inet_stream_connect+0x10f/0x3d0
> [433522.341019] inet_stream_connect+0x3a/0x60
> [433522.341024] __sys_connect+0xa8/0xd0
> [433522.341186] __x64_sys_connect+0x18/0x20
> [433522.341190] do_syscall_64+0x83/0x170
> [433522.341195] ? __count_memcg_events+0x4d/0xc0
> [433522.341334] ? count_memcg_events.constprop.0+0x1a/0x30
> [433522.341385] ? handle_mm_fault+0xa2/0x360
> [433522.341412] ? do_user_addr_fault+0x304/0x670
> [433522.341442] ? clear_bhb_loop+0x55/0xb0
> [433522.341446] ? clear_bhb_loop+0x55/0xb0
> [433522.341449] ? clear_bhb_loop+0x55/0xb0
> [433522.341453] entry_SYSCALL_64_after_hwframe+0x78/0x80
> [433522.341458] RIP: 0033:0x7f62c11a885d
> [433522.341685] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
> [433522.341688] RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX:
> 000000000000002a
> [433522.341691] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
> 00007f62c11a885d
> [433522.341693] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
> 0000000000000003
> [433522.341695] RBP: 00007f62c1091e90 R08: 0000000000000000 R09:
> 0000000000000000
> [433522.341696] R10: 0000000000000000 R11: 0000000000000296 R12:
> 00007f62c10926c0
> [433522.341698] R13: ffffffffffffff88 R14: 0000000000000000 R15:
> 00007ffe237885b0
> [433522.341702] </TASK>
> [433522.341703] ---[ end trace 0000000000000000 ]---
> [433522.341709] ------------[ cut here ]------------
> [433522.341710] refcount_t: underflow; use-after-free.
> [433522.341720] WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:28
> refcount_warn_saturate+0xbe/0x110
> [433522.341727] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
> intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
> pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
> ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
> joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
> polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
> sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
> ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua fuse dm_multipath
> [433522.341820] CPU: 0 PID: 1039313 Comm: trigger Tainted: G W
> 6.8.6-200.fc39.x86_64 #1
> [433522.341823] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
> Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
> [433522.341825] RIP: 0010:refcount_warn_saturate+0xbe/0x110
> [433522.341829] Code: 01 01 e8 c5 42 8e ff 0f 0b c3 cc cc cc cc 80 3d cc
> 13 ea 01 00 75 85 48 c7 c7 28 8f b7 82 c6 05 bc 13 ea 01 01 e8 a2 42 8e
> ff <0f> 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7
> [433522.341831] RSP: 0018:ffffc90006b43b80 EFLAGS: 00010282
> [433522.341834] RAX: 0000000000000000 RBX: 0000000000004e20 RCX:
> 0000000000000027
> [433522.341836] RDX: ffff88807be218c8 RSI: 0000000000000001 RDI:
> ffff88807be218c0
> [433522.341837] RBP: ffff888009a640c0 R08: 0000000000000000 R09:
> ffffc90006b43a10
> [433522.341839] R10: ffffc90006b43a08 R11: 0000000000000003 R12:
> ffff8880029ede84
> [433522.341840] R13: 000000000000204e R14: ffffffff84356dc0 R15:
> ffff888009bb3ef0
> [433522.341842] FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000)
> knlGS:0000000000000000
> [433522.341844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [433522.341846] CR2: 0000000020ccb000 CR3: 000000004628c005 CR4:
> 0000000000f70ef0
> [433522.341886] PKRU: 55555554
> [433522.341887] Call Trace:
> [433522.341889] <TASK>
> [433522.341890] ? refcount_warn_saturate+0xbe/0x110
> [433522.341894] ? __warn+0x81/0x130
> [433522.341899] ? refcount_warn_saturate+0xbe/0x110
> [433522.341903] ? report_bug+0x171/0x1a0
> [433522.341907] ? console_unlock+0x78/0x120
> [433522.341977] ? handle_bug+0x3c/0x80
> [433522.341981] ? exc_invalid_op+0x17/0x70
> [433522.342007] ? asm_exc_invalid_op+0x1a/0x20
> [433522.342011] ? refcount_warn_saturate+0xbe/0x110
> [433522.342015] __inet_check_established+0x24d/0x2d0
> [433522.342019] __inet_hash_connect+0x74/0x7d0
> [433522.342023] ? __pfx___inet_check_established+0x10/0x10
> [433522.342026] tcp_v4_connect+0x278/0x530
> [433522.342031] __inet_stream_connect+0x10f/0x3d0
> [433522.342035] inet_stream_connect+0x3a/0x60
> [433522.342039] __sys_connect+0xa8/0xd0
> [433522.342044] __x64_sys_connect+0x18/0x20
> [433522.342048] do_syscall_64+0x83/0x170
> [433522.342051] ? __count_memcg_events+0x4d/0xc0
> [433522.342054] ? count_memcg_events.constprop.0+0x1a/0x30
> [433522.342058] ? handle_mm_fault+0xa2/0x360
> [433522.342060] ? do_user_addr_fault+0x304/0x670
> [433522.342065] ? clear_bhb_loop+0x55/0xb0
> [433522.342068] ? clear_bhb_loop+0x55/0xb0
> [433522.342071] ? clear_bhb_loop+0x55/0xb0
> [433522.342074] entry_SYSCALL_64_after_hwframe+0x78/0x80
> [433522.342077] RIP: 0033:0x7f62c11a885d
> [433522.342083] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
> [433522.342085] RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX:
> 000000000000002a
> [433522.342087] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
> 00007f62c11a885d
> [433522.342089] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
> 0000000000000003
> [433522.342091] RBP: 00007f62c1091e90 R08: 0000000000000000 R09:
> 0000000000000000
> [433522.342092] R10: 0000000000000000 R11: 0000000000000296 R12:
> 00007f62c10926c0
> [433522.342093] R13: ffffffffffffff88 R14: 0000000000000000 R15:
> 00007ffe237885b0
> [433522.342096] </TASK>
> [433522.342097] ---[ end trace 0000000000000000 ]---
> [435060.554199] ------------[ cut here ]------------
> [435060.554243] refcount_t: decrement hit 0; leaking memory.
> [435060.554261] WARNING: CPU: 2 PID: 879478 at lib/refcount.c:31
> refcount_warn_saturate+0xff/0x110
> [435060.554278] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 rfkill nf_tables nfnetlink qrtr vsock_loopback
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> intel_rapl_msr intel_rapl_common intel_uncore_frequency_common
> intel_pmc_core snd_ens1371 intel_vsec pmt_telemetry snd_ac97_codec
> pmt_class rapl gameport vmw_balloon snd_rawmidi snd_seq_device sunrpc
> ac97_bus snd_pcm snd_timer snd soundcore vfat fat vmw_vmci i2c_piix4
> joydev loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel nvme
> polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel vmwgfx
> sha512_ssse3 sha256_ssse3 sha1_ssse3 vmxnet3 nvme_auth drm_ttm_helper
> ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua fuse dm_multipath
> [435060.554426] CPU: 2 PID: 879478 Comm: trigger Tainted: G W
> 6.8.6-200.fc39.x86_64 #1
> [435060.554431] Hardware name: VMware, Inc. VMware20,1/440BX Desktop
> Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
> [435060.554433] RIP: 0010:refcount_warn_saturate+0xff/0x110
> [435060.554439] Code: f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff 0f
> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 61 42 8e
> ff <0f> 0b c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
> [435060.554442] RSP: 0018:ffffc90005e2bb50 EFLAGS: 00010286
> [435060.554445] RAX: 0000000000000000 RBX: 0000000000004e20 RCX:
> 0000000000000027
> [435060.554448] RDX: ffff88807bea18c8 RSI: 0000000000000001 RDI:
> ffff88807bea18c0
> [435060.554450] RBP: ffff8880274d9bc0 R08: 0000000000000000 R09:
> ffffc90005e2b9e0
> [435060.554451] R10: ffffc90005e2b9d8 R11: 0000000000000003 R12:
> ffff8880029ede84
> [435060.554453] R13: 000000000000204e R14: ffffffff84356dc0 R15:
> ffff888009bb2738
> [435060.554456] FS: 00007f102ab566c0(0000) GS:ffff88807be80000(0000)
> knlGS:0000000000000000
> [435060.554458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [435060.554460] CR2: 0000000020ccb000 CR3: 000000004e184003 CR4:
> 0000000000f70ef0
> [435060.554601] PKRU: 55555554
> [435060.554603] Call Trace:
> [435060.554607] <TASK>
> [435060.554608] ? refcount_warn_saturate+0xff/0x110
> [435060.554614] ? __warn+0x81/0x130
> [435060.554625] ? refcount_warn_saturate+0xff/0x110
> [435060.554630] ? report_bug+0x171/0x1a0
> [435060.554638] ? console_unlock+0x78/0x120
> [435060.554670] ? handle_bug+0x3c/0x80
> [435060.554676] ? exc_invalid_op+0x17/0x70
> [435060.554682] ? asm_exc_invalid_op+0x1a/0x20
> [435060.554694] ? refcount_warn_saturate+0xff/0x110
> [435060.554699] __inet_check_established+0x29b/0x2d0
> [435060.554707] __inet_hash_connect+0x74/0x7d0
> [435060.554712] ? __pfx___inet_check_established+0x10/0x10
> [435060.554716] tcp_v4_connect+0x278/0x530
> [435060.554723] __inet_stream_connect+0x10f/0x3d0
> [435060.554729] inet_stream_connect+0x3a/0x60
> [435060.554734] __sys_connect+0xa8/0xd0
> [435060.554744] __x64_sys_connect+0x18/0x20
> [435060.554748] do_syscall_64+0x83/0x170
> [435060.554752] ? __switch_to_asm+0x3e/0x70
> [435060.554826] ? finish_task_switch.isra.0+0x94/0x2f0
> [435060.554835] ? __schedule+0x3f4/0x1530
> [435060.554865] ? __count_memcg_events+0x4d/0xc0
> [435060.554871] ? __rseq_handle_notify_resume+0xa9/0x4f0
> [435060.554946] ? count_memcg_events.constprop.0+0x1a/0x30
> [435060.554953] ? switch_fpu_return+0x50/0xe0
> [435060.555065] ? clear_bhb_loop+0x55/0xb0
> [435060.555070] ? clear_bhb_loop+0x55/0xb0
> [435060.555073] ? clear_bhb_loop+0x55/0xb0
> [435060.555077] entry_SYSCALL_64_after_hwframe+0x78/0x80
> [435060.555082] RIP: 0033:0x7f102ac6c85d
> [435060.555141] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
> [435060.555143] RSP: 002b:00007f102ab55e58 EFLAGS: 00000296 ORIG_RAX:
> 000000000000002a
> [435060.555147] RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX:
> 00007f102ac6c85d
> [435060.555149] RDX: 0000000000000010 RSI: 0000000020ccb000 RDI:
> 0000000000000003
> [435060.555151] RBP: 00007f102ab55e90 R08: 0000000000000000 R09:
> 0000000000000000
> [435060.555153] R10: 0000000000000000 R11: 0000000000000296 R12:
> 00007f102ab566c0
> [435060.555154] R13: ffffffffffffff88 R14: 0000000000000000 R15:
> 00007ffc83d0fa70
> [435060.555158] </TASK>
> [435060.555160] ---[ end trace 0000000000000000 ]---
>
> --
> Anderson Nascimento
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-05-01 0:22 ` Kuniyuki Iwashima
@ 2024-05-01 6:56 ` Eric Dumazet
2024-05-01 11:30 ` Anderson Nascimento
2024-05-01 16:52 ` Kuniyuki Iwashima
0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2024-05-01 6:56 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: anderson, netdev
On Wed, May 1, 2024 at 2:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> +cc Eric
>
> From: Anderson Nascimento <anderson@allelesecurity.com>
> Date: Tue, 30 Apr 2024 19:00:34 -0300
> > Hello,
>
> Hi,
>
> Thanks for the detailed report.
>
> >
> > There is a bug in inet_twsk_hashdance(). This function inserts a
> > time-wait socket in the established hash table without initializing the
> > object's reference counter, as seen below. The reference counter
> > initialization is done after the object is added to the established hash
> > table and the lock is released. Because of this, a sock_hold() in
> > tcp_twsk_unique() and other operations on the object trigger warnings
> > from the reference counter saturation mechanism. The warnings can also
> > be seen below. They were triggered on Fedora 39 Linux kernel v6.8.
> >
> > The bug is triggered via a connect() system call on a TCP socket,
> > reaching __inet_check_established() and then passing the time-wait
> > socket to tcp_twsk_unique(). Other operations are also performed on the
> > time-wait socket in __inet_check_established() before its reference
> > counter is initialized correctly by inet_twsk_hashdance(). The fix seems
> > to be to move the reference counter initialization inside the lock,
>
> or use refcount_inc_not_zero() and give up on reusing the port
> under the race ?
>
> ---8<---
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0427deca3e0e..637f4965326d 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -175,8 +175,13 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
> tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
> }
> - sock_hold(sktw);
> - return 1;
> +
> + /* Here, sk_refcnt could be 0 because inet_twsk_hashdance() puts
> + * twsk into ehash and releases the bucket lock *before* setting
> + * sk_refcnt. Then, give up on reusing the port.
> + */
> + if (likely(refcount_inc_not_zero(&sktw->sk_refcnt)))
> + return 1;
> }
>
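The ordering problem described in the quoted report can be modelled roughly in userspace C. This is only a sketch under stated assumptions: toy_obj, toy_table, toy_peek and TOY_INITIAL_REFS are hypothetical names, the bucket lock is left out, and a callback stands in for a lookup racing on another CPU. It is not the kernel code; it only shows why a lookup that runs between publication and refcount setup observes zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_INITIAL_REFS 3u	/* arbitrary nonzero count for the demo */

/* Hypothetical toy types; these are not kernel structures. */
struct toy_obj {
	atomic_uint refcnt;
};

struct toy_table {
	_Atomic(struct toy_obj *) slot;	/* stands in for one hash chain */
};

/* Stand-in for a lookup that may run on another CPU. */
typedef void (*toy_lookup_fn)(struct toy_table *t, unsigned int *seen);

/* Record the refcount a lookup would observe, if the object is findable. */
static void toy_peek(struct toy_table *t, unsigned int *seen)
{
	struct toy_obj *o = atomic_load(&t->slot);

	if (o)
		*seen = atomic_load(&o->refcnt);
}

/* Ordering from the report: make the object findable, set the count later. */
static void toy_publish_then_init(struct toy_table *t, struct toy_obj *o,
				  toy_lookup_fn lookup, unsigned int *seen)
{
	assert(t && o && lookup && seen);
	atomic_store(&t->slot, o);
	lookup(t, seen);		/* window in which a lookup can run */
	atomic_store(&o->refcnt, TOY_INITIAL_REFS);
}

/* Alternative ordering: set the count before the object is findable. */
static void toy_init_then_publish(struct toy_table *t, struct toy_obj *o,
				  toy_lookup_fn lookup, unsigned int *seen)
{
	assert(t && o && lookup && seen);
	atomic_store(&o->refcnt, TOY_INITIAL_REFS);
	atomic_store(&t->slot, o);
	lookup(t, seen);
}
```

With toy_peek as the lookup, the first ordering reports a count of zero and the second reports the initial count. A reader that cannot rely on the second ordering has to tolerate a zero count instead, which is the direction the proposed patch takes.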
Thanks for CCing me.

Nice analysis from Anderson! Did you find this with a fuzzer?

This patch would avoid the refcount splat, but it would leave side
effects on tp; I am too lazy to double-check them.

Incidentally, I think we have to annotate the data races on
tcptw->tw_ts_recent and tcptw->tw_ts_recent_stamp.

Perhaps something like this instead?

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0427deca3e0eb9239558aa124a41a1525df62a04..f1e3707d0b33180a270e6d3662d4cf17a4f72bb8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -155,6 +155,10 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 	if (tcptw->tw_ts_recent_stamp &&
 	    (!twp || (reuse && time_after32(ktime_get_seconds(),
 					    tcptw->tw_ts_recent_stamp)))) {
+
+		if (!refcount_inc_not_zero(&sktw->sk_refcnt))
+			return 0;
+
 		/* In case of repair and re-using TIME-WAIT sockets we still
 		 * want to be sure that it is safe as above but honor the
 		 * sequence numbers and time stamps set as part of the repair
@@ -175,7 +179,6 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 			tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
 			tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
 		}
-		sock_hold(sktw);
 		return 1;
 	}
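
The point of taking the reference before touching the caller's state can be sketched in userspace C with C11 atomics. This is a minimal model, not the kernel implementation: toy_ref, toy_tw, toy_sk and toy_unique are hypothetical names, and toy_ref_inc_not_zero() only mirrors the idea behind refcount_inc_not_zero() (take a reference only when the count is already nonzero).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a reference counter; hypothetical, not refcount_t. */
struct toy_ref {
	atomic_uint count;
};

/* Take a reference only if the object is already live (count != 0). */
static bool toy_ref_inc_not_zero(struct toy_ref *r)
{
	unsigned int old;

	assert(r != NULL);
	old = atomic_load_explicit(&r->count, memory_order_relaxed);
	do {
		if (old == 0)
			return false;	/* not initialized yet, or already dead */
	} while (!atomic_compare_exchange_weak_explicit(&r->count, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Hypothetical stand-ins for the time-wait socket and the connecting socket. */
struct toy_tw {
	struct toy_ref ref;
	uint32_t ts_recent;
};

struct toy_sk {
	uint32_t ts_recent;
};

/*
 * Returns 1 and copies state only after the reference has been taken;
 * returns 0 and leaves sk untouched when the count is still zero.
 */
static int toy_unique(struct toy_sk *sk, struct toy_tw *tw)
{
	if (!toy_ref_inc_not_zero(&tw->ref))
		return 0;

	sk->ts_recent = tw->ts_recent;	/* side effect happens after the check */
	return 1;
}
```

Doing the check first means a failed attempt leaves sk exactly as it was, which is the difference between this placement and a check placed after the assignments.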
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-05-01 6:56 ` Eric Dumazet
@ 2024-05-01 11:30 ` Anderson Nascimento
2024-05-01 16:52 ` Kuniyuki Iwashima
1 sibling, 0 replies; 7+ messages in thread
From: Anderson Nascimento @ 2024-05-01 11:30 UTC (permalink / raw)
To: Eric Dumazet, Kuniyuki Iwashima; +Cc: netdev
On 5/1/24 03:56, Eric Dumazet wrote:
> On Wed, May 1, 2024 at 2:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>
>> +cc Eric
>>
>> From: Anderson Nascimento <anderson@allelesecurity.com>
>> Date: Tue, 30 Apr 2024 19:00:34 -0300
>>> Hello,
>>
>> Hi,
>>
>> Thanks for the detailed report.
>>
>>>
>>> There is a bug in inet_twsk_hashdance(). This function inserts a
>>> time-wait socket in the established hash table without initializing the
>>> object's reference counter, as seen below. The reference counter
>>> initialization is done after the object is added to the established hash
>>> table and the lock is released. Because of this, a sock_hold() in
>>> tcp_twsk_unique() and other operations on the object trigger warnings
>>> from the reference counter saturation mechanism. The warnings can also
>>> be seen below. They were triggered on Fedora 39 Linux kernel v6.8.
>>>
>>> The bug is triggered via a connect() system call on a TCP socket,
>>> reaching __inet_check_established() and then passing the time-wait
>>> socket to tcp_twsk_unique(). Other operations are also performed on the
>>> time-wait socket in __inet_check_established() before its reference
>>> counter is initialized correctly by inet_twsk_hashdance(). The fix seems
>>> to be to move the reference counter initialization inside the lock,
>>
>> or use refcount_inc_not_zero() and give up on reusing the port
>> under the race ?
>>
>> ---8<---
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index 0427deca3e0e..637f4965326d 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -175,8 +175,13 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
>> tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
>> tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
>> }
>> - sock_hold(sktw);
>> - return 1;
>> +
>> + /* Here, sk_refcnt could be 0 because inet_twsk_hashdance() puts
>> + * twsk into ehash and releases the bucket lock *before* setting
>> + * sk_refcnt. Then, give up on reusing the port.
>> + */
>> + if (likely(refcount_inc_not_zero(&sktw->sk_refcnt)))
>> + return 1;
>> }
>>
>
> Thanks for CC me.
>
> Nice analysis from Anderson ! Have you found this with a fuzzer ?
I ran the reproducer of a bug found by syzkaller on an older kernel, and
this issue was triggered. Analyzing it, I discovered it had nothing to
do with the problem the reproducer aimed to trigger, and it was present
upstream. I rewrote the reproducer and triggered it on v6.8 to confirm.
https://syzkaller.appspot.com/bug?extid=278279efdd2730dd14bf
>
> This patch would avoid the refcount splat, but would leave side
> effects on tp, I am too lazy to double check them.
>
> Incidentally, I think we have to annotate data-races on
> tcptw->tw_ts_recent and tcptw->tw_ts_recent_stamp
>
> Perhaps something like this instead ?
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0427deca3e0eb9239558aa124a41a1525df62a04..f1e3707d0b33180a270e6d3662d4cf17a4f72bb8 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -155,6 +155,10 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
>  	if (tcptw->tw_ts_recent_stamp &&
>  	    (!twp || (reuse && time_after32(ktime_get_seconds(),
>  					    tcptw->tw_ts_recent_stamp)))) {
> +
> +		if (!refcount_inc_not_zero(&sktw->sk_refcnt))
> +			return 0;
> +
>  		/* In case of repair and re-using TIME-WAIT sockets we still
>  		 * want to be sure that it is safe as above but honor the
>  		 * sequence numbers and time stamps set as part of the repair
> @@ -175,7 +179,6 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
>  			tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
>  			tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
>  		}
> -		sock_hold(sktw);
>  		return 1;
>  	}
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-05-01 6:56 ` Eric Dumazet
2024-05-01 11:30 ` Anderson Nascimento
@ 2024-05-01 16:52 ` Kuniyuki Iwashima
2024-05-01 17:01 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: Kuniyuki Iwashima @ 2024-05-01 16:52 UTC (permalink / raw)
To: edumazet; +Cc: anderson, kuniyu, netdev
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 1 May 2024 08:56:51 +0200
> On Wed, May 1, 2024 at 2:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > +cc Eric
> >
> > From: Anderson Nascimento <anderson@allelesecurity.com>
> > Date: Tue, 30 Apr 2024 19:00:34 -0300
> > > Hello,
> >
> > Hi,
> >
> > Thanks for the detailed report.
> >
> > >
> > > There is a bug in inet_twsk_hashdance(). This function inserts a
> > > time-wait socket in the established hash table without initializing the
> > > object's reference counter, as seen below. The reference counter
> > > initialization is done after the object is added to the established hash
> > > table and the lock is released. Because of this, a sock_hold() in
> > > tcp_twsk_unique() and other operations on the object trigger warnings
> > > from the reference counter saturation mechanism. The warnings can also
> > > be seen below. They were triggered on Fedora 39 Linux kernel v6.8.
> > >
> > > The bug is triggered via a connect() system call on a TCP socket,
> > > reaching __inet_check_established() and then passing the time-wait
> > > socket to tcp_twsk_unique(). Other operations are also performed on the
> > > time-wait socket in __inet_check_established() before its reference
> > > counter is initialized correctly by inet_twsk_hashdance(). The fix seems
> > > to be to move the reference counter initialization inside the lock,
> >
> > or use refcount_inc_not_zero() and give up on reusing the port
> > under the race ?
> >
> > ---8<---
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 0427deca3e0e..637f4965326d 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -175,8 +175,13 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> > tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
> > tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
> > }
> > - sock_hold(sktw);
> > - return 1;
> > +
> > + /* Here, sk_refcnt could be 0 because inet_twsk_hashdance() puts
> > + * twsk into ehash and releases the bucket lock *before* setting
> > + * sk_refcnt. Then, give up on reusing the port.
> > + */
> > + if (likely(refcount_inc_not_zero(&sktw->sk_refcnt)))
> > + return 1;
> > }
> >
>
> Thanks for CC me.
>
> Nice analysis from Anderson ! Have you found this with a fuzzer ?
>
> This patch would avoid the refcount splat, but would leave side
> effects on tp, I am too lazy to double check them.
Ah exactly :)
>
> Incidentally, I think we have to annotate data-races on
> tcptw->tw_ts_recent and tcptw->tw_ts_recent_stamp
>
> Perhaps something like this instead ?
This looks good to me.
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0427deca3e0eb9239558aa124a41a1525df62a04..f1e3707d0b33180a270e6d3662d4cf17a4f72bb8 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -155,6 +155,10 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
>  	if (tcptw->tw_ts_recent_stamp &&
>  	    (!twp || (reuse && time_after32(ktime_get_seconds(),
>  					    tcptw->tw_ts_recent_stamp)))) {
> +
> +		if (!refcount_inc_not_zero(&sktw->sk_refcnt))
> +			return 0;
> +
>  		/* In case of repair and re-using TIME-WAIT sockets we still
>  		 * want to be sure that it is safe as above but honor the
>  		 * sequence numbers and time stamps set as part of the repair
> @@ -175,7 +179,6 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
>  			tp->rx_opt.ts_recent = tcptw->tw_ts_recent;
>  			tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
>  		}
> -		sock_hold(sktw);
>  		return 1;
>  	}
>
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-05-01 16:52 ` Kuniyuki Iwashima
@ 2024-05-01 17:01 ` Eric Dumazet
2024-05-01 17:44 ` Kuniyuki Iwashima
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2024-05-01 17:01 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: anderson, netdev
On Wed, May 1, 2024 at 6:52 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> This looks good to me.
>
Would it be OK for you to submit an official patch? It is getting late here in France.
Thanks!
* Re: use-after-free warnings in tcp_v4_connect() due to inet_twsk_hashdance() inserting the object into ehash table without initializing its reference counter
2024-05-01 17:01 ` Eric Dumazet
@ 2024-05-01 17:44 ` Kuniyuki Iwashima
0 siblings, 0 replies; 7+ messages in thread
From: Kuniyuki Iwashima @ 2024-05-01 17:44 UTC (permalink / raw)
To: edumazet; +Cc: anderson, kuniyu, netdev
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 1 May 2024 19:01:35 +0200
> On Wed, May 1, 2024 at 6:52 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > This looks good to me.
> >
>
> Is it ok if you submit an official patch ? This is getting late here in France.
Sure thing, will do.
Thanks!