* [PATCH net-next v2 0/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
@ 2025-08-28 8:14 Dmitry Safonov via B4 Relay
2025-08-28 8:14 ` [PATCH net-next v2 1/2] " Dmitry Safonov via B4 Relay
2025-08-28 8:14 ` [PATCH net-next v2 2/2] tcp: Free TCP-AO/TCP-MD5 info/keys without RCU Dmitry Safonov via B4 Relay
0 siblings, 2 replies; 5+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2025-08-28 8:14 UTC (permalink / raw)
To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Bob Gilligan, Salam Noureddine, Dmitry Safonov, netdev,
linux-kernel, Dmitry Safonov
On one side a minor/cosmetic issue, especially nowadays when
TCP-AO/TCP-MD5 signature verification failures aren't logged to dmesg.
Yet, I think worth addressing for two reasons:
- unsigned RST gets ignored by the peer and the connection is alive for
longer (keep-alive interval)
- netstat counters increase and trace events report that trusted BGP peer
is sending unsigned/incorrectly signed segments, which can ring alarm
on monitoring.
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
Changes in v2:
- Fixed TCP-MD5 ifdeffery (Reported-by: Victor Nogueira)
- Call proper destructor for inet_ipv6 (Reported-by: syzbot@syzkaller.appspotmail.com)
- Link to v1: https://lore.kernel.org/r/20250822-b4-tcp-ao-md5-rst-finwait2-v1-0-25825d085dcb@arista.com
---
Dmitry Safonov (2):
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
tcp: Free TCP-AO/TCP-MD5 info/keys without RCU
include/net/tcp.h | 4 ++++
net/ipv4/tcp.c | 16 ++++++++++++++++
net/ipv4/tcp_ao.c | 5 ++---
net/ipv4/tcp_ipv4.c | 37 ++++++++++---------------------------
net/ipv4/tcp_minisocks.c | 19 +++++--------------
net/ipv6/tcp_ipv6.c | 8 ++++++++
6 files changed, 45 insertions(+), 44 deletions(-)
---
base-commit: d4854be4ec21dad23907c0fbc3389c3a394ebf67
change-id: 20250822-b4-tcp-ao-md5-rst-finwait2-e632b4d8f58d
Best regards,
--
Dmitry Safonov <dima@arista.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next v2 1/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
2025-08-28 8:14 [PATCH net-next v2 0/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct() Dmitry Safonov via B4 Relay
@ 2025-08-28 8:14 ` Dmitry Safonov via B4 Relay
2025-08-28 19:43 ` Eric Dumazet
2025-08-28 8:14 ` [PATCH net-next v2 2/2] tcp: Free TCP-AO/TCP-MD5 info/keys without RCU Dmitry Safonov via B4 Relay
1 sibling, 1 reply; 5+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2025-08-28 8:14 UTC (permalink / raw)
To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Bob Gilligan, Salam Noureddine, Dmitry Safonov, netdev,
linux-kernel, Dmitry Safonov
From: Dmitry Safonov <dima@arista.com>
Currently there are a couple of minor issues with destroying the keys
tcp_v4_destroy_sock():
1. The socket is yet in TCP bind buckets, making it reachable for
incoming segments [on another CPU core], potentially available to send
late FIN/ACK/RST replies.
2. There is at least one code path, where tcp_done() is called before
sending RST [kudos to Bob for investigation]. This is a case of
a server, that finished sending its data and just called close().
The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
__tcp_close())
tcp_v4_do_rcv()/tcp_v6_do_rcv()
tcp_rcv_state_process() /* LINUX_MIB_TCPABORTONDATA */
tcp_reset()
tcp_done_with_error()
tcp_done()
inet_csk_destroy_sock() /* Destroys AO/MD5 keys */
/* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
tcp_v4_send_reset() /* Sends an unsigned RST segment */
tcpdump:
> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
> 1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
Note the signed RSTs later in the dump - those are sent by the server
when the fin-wait socket gets removed from hash buckets, by
the listener socket.
Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
socket can yet send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).
This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once RST get sent and socket gets actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP are known to use keep alive(s).
While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.
Investigated-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
include/net/tcp.h | 4 ++++
net/ipv4/tcp.c | 27 +++++++++++++++++++++++++++
net/ipv4/tcp_ipv4.c | 33 ++++++++-------------------------
net/ipv6/tcp_ipv6.c | 8 ++++++++
4 files changed, 47 insertions(+), 25 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2936b8175950faa777f81f3c6b7230bcc375d772..0009c26241964b54aa93bc1b86158050d96c2c98 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1931,6 +1931,7 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
}
#define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key)
+void tcp_md5_destruct_sock(struct sock *sk);
#else
static inline struct tcp_md5sig_key *
tcp_md5_do_lookup(const struct sock *sk, int l3index,
@@ -1947,6 +1948,9 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
}
#define tcp_twsk_md5_key(twsk) NULL
+static inline void tcp_md5_destruct_sock(struct sock *sk)
+{
+}
#endif
int tcp_md5_alloc_sigpool(void);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9bc8317e92b7952871f07ae11a9c2eaa7d3a9e65..927233ee7500e0568782ae4a3860af56d1476acd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -412,6 +412,33 @@ static u64 tcp_compute_delivery_rate(const struct tcp_sock *tp)
return rate64;
}
+#ifdef CONFIG_TCP_MD5SIG
+static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
+{
+ struct tcp_md5sig_info *md5sig;
+
+ md5sig = container_of(head, struct tcp_md5sig_info, rcu);
+ kfree(md5sig);
+ static_branch_slow_dec_deferred(&tcp_md5_needed);
+ tcp_md5_release_sigpool();
+}
+
+void tcp_md5_destruct_sock(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (tp->md5sig_info) {
+ struct tcp_md5sig_info *md5sig;
+
+ md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
+ tcp_clear_md5_list(sk);
+ call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
+ rcu_assign_pointer(tp->md5sig_info, NULL);
+ }
+}
+EXPORT_SYMBOL_GPL(tcp_md5_destruct_sock);
+#endif
+
/* Address-family independent initialization for a tcp_sock.
*
* NOTE: A lot of things set to zero explicitly by call to
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a0c93b24c6e0ca2eb477686e477d164b0b132e7a..158b366a55bfd198ffeba13a426d993c3b02528e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2494,6 +2494,13 @@ static const struct tcp_sock_af_ops tcp_sock_ipv4_specific = {
.ao_calc_key_sk = tcp_v4_ao_calc_key_sk,
#endif
};
+
+static void tcp4_destruct_sock(struct sock *sk)
+{
+ tcp_md5_destruct_sock(sk);
+ tcp_ao_destroy_sock(sk, false);
+ inet_sock_destruct(sk);
+}
#endif
/* NOTE: A lot of things set to zero explicitly by call to
@@ -2509,23 +2516,12 @@ static int tcp_v4_init_sock(struct sock *sk)
#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
tcp_sk(sk)->af_specific = &tcp_sock_ipv4_specific;
+ sk->sk_destruct = tcp4_destruct_sock;
#endif
return 0;
}
-#ifdef CONFIG_TCP_MD5SIG
-static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
-{
- struct tcp_md5sig_info *md5sig;
-
- md5sig = container_of(head, struct tcp_md5sig_info, rcu);
- kfree(md5sig);
- static_branch_slow_dec_deferred(&tcp_md5_needed);
- tcp_md5_release_sigpool();
-}
-#endif
-
static void tcp_release_user_frags(struct sock *sk)
{
#ifdef CONFIG_PAGE_POOL
@@ -2562,19 +2558,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
/* Cleans up our, hopefully empty, out_of_order_queue. */
skb_rbtree_purge(&tp->out_of_order_queue);
-#ifdef CONFIG_TCP_MD5SIG
- /* Clean up the MD5 key list, if any */
- if (tp->md5sig_info) {
- struct tcp_md5sig_info *md5sig;
-
- md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
- tcp_clear_md5_list(sk);
- call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
- rcu_assign_pointer(tp->md5sig_info, NULL);
- }
-#endif
- tcp_ao_destroy_sock(sk, false);
-
/* Clean up a referenced TCP bind bucket. */
if (inet_csk(sk)->icsk_bind_hash)
inet_put_port(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8b2e7b7afbd847b5d94b30ab27779e4dc705710d..5b328aa7b550f8fe9dd4c17128c88d952d81a06d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2112,6 +2112,13 @@ static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific = {
.ao_calc_key_sk = tcp_v4_ao_calc_key_sk,
#endif
};
+
+static void tcp6_destruct_sock(struct sock *sk)
+{
+ tcp_md5_destruct_sock(sk);
+ tcp_ao_destroy_sock(sk, false);
+ inet6_sock_destruct(sk);
+}
#endif
/* NOTE: A lot of things set to zero explicitly by call to
@@ -2127,6 +2134,7 @@ static int tcp_v6_init_sock(struct sock *sk)
#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
tcp_sk(sk)->af_specific = &tcp_sock_ipv6_specific;
+ sk->sk_destruct = tcp6_destruct_sock;
#endif
return 0;
--
2.42.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next v2 2/2] tcp: Free TCP-AO/TCP-MD5 info/keys without RCU
2025-08-28 8:14 [PATCH net-next v2 0/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct() Dmitry Safonov via B4 Relay
2025-08-28 8:14 ` [PATCH net-next v2 1/2] " Dmitry Safonov via B4 Relay
@ 2025-08-28 8:14 ` Dmitry Safonov via B4 Relay
1 sibling, 0 replies; 5+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2025-08-28 8:14 UTC (permalink / raw)
To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Bob Gilligan, Salam Noureddine, Dmitry Safonov, netdev,
linux-kernel, Dmitry Safonov
From: Dmitry Safonov <dima@arista.com>
Now that the destruction of info/keys is delayed until the socket
destructor, it's safe to use kfree() without an RCU callback.
As either socket was yet in TCP_CLOSE state or the socket refcounter is
zero and no one can discover it anymore, it's safe to release memory
straight away.
Similar thing was possible for twsk already.
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
net/ipv4/tcp.c | 17 +++--------------
net/ipv4/tcp_ao.c | 5 ++---
net/ipv4/tcp_ipv4.c | 4 ++--
net/ipv4/tcp_minisocks.c | 19 +++++--------------
4 files changed, 12 insertions(+), 33 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 927233ee7500e0568782ae4a3860af56d1476acd..254ca95d0c3c5c44029be0e84120c5e9fb9d4514 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -413,27 +413,16 @@ static u64 tcp_compute_delivery_rate(const struct tcp_sock *tp)
}
#ifdef CONFIG_TCP_MD5SIG
-static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
-{
- struct tcp_md5sig_info *md5sig;
-
- md5sig = container_of(head, struct tcp_md5sig_info, rcu);
- kfree(md5sig);
- static_branch_slow_dec_deferred(&tcp_md5_needed);
- tcp_md5_release_sigpool();
-}
-
void tcp_md5_destruct_sock(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
if (tp->md5sig_info) {
- struct tcp_md5sig_info *md5sig;
- md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
tcp_clear_md5_list(sk);
- call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
- rcu_assign_pointer(tp->md5sig_info, NULL);
+ kfree(rcu_replace_pointer(tp->md5sig_info, NULL, 1));
+ static_branch_slow_dec_deferred(&tcp_md5_needed);
+ tcp_md5_release_sigpool();
}
}
EXPORT_SYMBOL_GPL(tcp_md5_destruct_sock);
diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index bbb8d5f0eae7d3d8887da3fa4d68e248af9060ad..31302be78bc4450b56fa23a390b6d03b2262741d 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -268,9 +268,8 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head)
kfree_sensitive(key);
}
-static void tcp_ao_info_free_rcu(struct rcu_head *head)
+static void tcp_ao_info_free(struct tcp_ao_info *ao)
{
- struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu);
struct tcp_ao_key *key;
struct hlist_node *n;
@@ -310,7 +309,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
if (!twsk)
tcp_ao_sk_omem_free(sk, ao);
- call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
+ tcp_ao_info_free(ao);
}
void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 158b366a55bfd198ffeba13a426d993c3b02528e..ba8b6090df2a1f5c415faa37941292d76346f9b8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1503,9 +1503,9 @@ void tcp_clear_md5_list(struct sock *sk)
md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
hlist_for_each_entry_safe(key, n, &md5sig->head, node) {
- hlist_del_rcu(&key->node);
+ hlist_del(&key->node);
atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
- kfree_rcu(key, rcu);
+ kfree(key);
}
}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d1c9e40886463ca308f9f3682c4039f491e7555f..7c2ae07d8d5d2a18d6ce3210cc09ee5d9850ea29 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -377,26 +377,17 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
}
EXPORT_SYMBOL(tcp_time_wait);
-#ifdef CONFIG_TCP_MD5SIG
-static void tcp_md5_twsk_free_rcu(struct rcu_head *head)
-{
- struct tcp_md5sig_key *key;
-
- key = container_of(head, struct tcp_md5sig_key, rcu);
- kfree(key);
- static_branch_slow_dec_deferred(&tcp_md5_needed);
- tcp_md5_release_sigpool();
-}
-#endif
-
void tcp_twsk_destructor(struct sock *sk)
{
#ifdef CONFIG_TCP_MD5SIG
if (static_branch_unlikely(&tcp_md5_needed.key)) {
struct tcp_timewait_sock *twsk = tcp_twsk(sk);
- if (twsk->tw_md5_key)
- call_rcu(&twsk->tw_md5_key->rcu, tcp_md5_twsk_free_rcu);
+ if (twsk->tw_md5_key) {
+ kfree(twsk->tw_md5_key);
+ static_branch_slow_dec_deferred(&tcp_md5_needed);
+ tcp_md5_release_sigpool();
+ }
}
#endif
tcp_ao_destroy_sock(sk, true);
--
2.42.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2 1/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
2025-08-28 8:14 ` [PATCH net-next v2 1/2] " Dmitry Safonov via B4 Relay
@ 2025-08-28 19:43 ` Eric Dumazet
2025-08-28 20:00 ` Dmitry Safonov
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2025-08-28 19:43 UTC (permalink / raw)
To: dima
Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Simon Horman, Bob Gilligan,
Salam Noureddine, Dmitry Safonov, netdev, linux-kernel
On Thu, Aug 28, 2025 at 1:15 AM Dmitry Safonov via B4 Relay
<devnull+dima.arista.com@kernel.org> wrote:
>
> From: Dmitry Safonov <dima@arista.com>
>
> Currently there are a couple of minor issues with destroying the keys
> tcp_v4_destroy_sock():
>
> 1. The socket is yet in TCP bind buckets, making it reachable for
> incoming segments [on another CPU core], potentially available to send
> late FIN/ACK/RST replies.
>
> 2. There is at least one code path, where tcp_done() is called before
> sending RST [kudos to Bob for investigation]. This is a case of
> a server, that finished sending its data and just called close().
>
> The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
> __tcp_close())
>
> tcp_v4_do_rcv()/tcp_v6_do_rcv()
> tcp_rcv_state_process() /* LINUX_MIB_TCPABORTONDATA */
> tcp_reset()
> tcp_done_with_error()
> tcp_done()
> inet_csk_destroy_sock() /* Destroys AO/MD5 keys */
> /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
> tcp_v4_send_reset() /* Sends an unsigned RST segment */
>
> tcpdump:
> > 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
> > 1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> > 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
> > 1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> > 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
> > 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> > 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> > 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> > 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> > 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
>
> Note the signed RSTs later in the dump - those are sent by the server
> when the fin-wait socket gets removed from hash buckets, by
> the listener socket.
>
> Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
> slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
> socket can yet send non-data replies, they should be signed in order for
> the peer to process them. Now it also matches how AO/MD5 gets destructed
> for TIME-WAIT sockets (in tcp_twsk_destructor()).
>
> This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
> open problem: once RST get sent and socket gets actually destructed,
> there is no information on the initial sequence numbers. So, in case
> this last RST gets lost in the network, the server's listener socket
> won't be able to properly sign another RST. Nothing in RFC 1122
> prescribes keeping any local state after non-graceful reset.
> Luckily, BGP are known to use keep alive(s).
>
> While the issue is quite minor/cosmetic, these days monitoring network
> counters is a common practice and getting invalid signed segments from
> a trusted BGP peer can get customers worried.
>
> Investigated-by: Bob Gilligan <gilligan@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
> include/net/tcp.h | 4 ++++
> net/ipv4/tcp.c | 27 +++++++++++++++++++++++++++
> net/ipv4/tcp_ipv4.c | 33 ++++++++-------------------------
> net/ipv6/tcp_ipv6.c | 8 ++++++++
> 4 files changed, 47 insertions(+), 25 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2936b8175950faa777f81f3c6b7230bcc375d772..0009c26241964b54aa93bc1b86158050d96c2c98 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1931,6 +1931,7 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
> }
>
> #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key)
> +void tcp_md5_destruct_sock(struct sock *sk);
> #else
> static inline struct tcp_md5sig_key *
> tcp_md5_do_lookup(const struct sock *sk, int l3index,
> @@ -1947,6 +1948,9 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
> }
>
> #define tcp_twsk_md5_key(twsk) NULL
> +static inline void tcp_md5_destruct_sock(struct sock *sk)
> +{
> +}
> #endif
>
> int tcp_md5_alloc_sigpool(void);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9bc8317e92b7952871f07ae11a9c2eaa7d3a9e65..927233ee7500e0568782ae4a3860af56d1476acd 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -412,6 +412,33 @@ static u64 tcp_compute_delivery_rate(const struct tcp_sock *tp)
> return rate64;
> }
>
> +#ifdef CONFIG_TCP_MD5SIG
> +static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
> +{
> + struct tcp_md5sig_info *md5sig;
> +
> + md5sig = container_of(head, struct tcp_md5sig_info, rcu);
> + kfree(md5sig);
> + static_branch_slow_dec_deferred(&tcp_md5_needed);
> + tcp_md5_release_sigpool();
> +}
> +
> +void tcp_md5_destruct_sock(struct sock *sk)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
> +
> + if (tp->md5sig_info) {
> + struct tcp_md5sig_info *md5sig;
> +
> + md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
> + tcp_clear_md5_list(sk);
> + call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
> + rcu_assign_pointer(tp->md5sig_info, NULL);
I would move this line before call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu),
otherwise the free could happen before the clear, and an UAF could occur.
It is not absolutely clear if this function runs under rcu_read_lock(),
and even if it is currently safe, this could change in the future.
Other than that :
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2 1/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
2025-08-28 19:43 ` Eric Dumazet
@ 2025-08-28 20:00 ` Dmitry Safonov
0 siblings, 0 replies; 5+ messages in thread
From: Dmitry Safonov @ 2025-08-28 20:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Simon Horman, Bob Gilligan,
Salam Noureddine, Dmitry Safonov, netdev, linux-kernel
Hi Eric,
On Thu, Aug 28, 2025 at 8:43 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Aug 28, 2025 at 1:15 AM Dmitry Safonov via B4 Relay
> <devnull+dima.arista.com@kernel.org> wrote:
...
> > +void tcp_md5_destruct_sock(struct sock *sk)
> > +{
> > + struct tcp_sock *tp = tcp_sk(sk);
> > +
> > + if (tp->md5sig_info) {
> > + struct tcp_md5sig_info *md5sig;
> > +
> > + md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
> > + tcp_clear_md5_list(sk);
> > + call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
> > + rcu_assign_pointer(tp->md5sig_info, NULL);
>
> I would move this line before call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu),
> otherwise the free could happen before the clear, and an UAF could occur.
Good catch! I'll reorder these in v3 just in case the next patch 2/2
would have to be reverted for any reason.
> It is not absolutely clear if this function runs under rcu_read_lock(),
> and even if it is currently safe, this could change in the future.
>
> Other than that :
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks,
Dmitry
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-08-28 20:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28 8:14 [PATCH net-next v2 0/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct() Dmitry Safonov via B4 Relay
2025-08-28 8:14 ` [PATCH net-next v2 1/2] " Dmitry Safonov via B4 Relay
2025-08-28 19:43 ` Eric Dumazet
2025-08-28 20:00 ` Dmitry Safonov
2025-08-28 8:14 ` [PATCH net-next v2 2/2] tcp: Free TCP-AO/TCP-MD5 info/keys without RCU Dmitry Safonov via B4 Relay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).