netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/6] Make TCP-MD5-diag slightly less broken
@ 2024-11-06 18:10 Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov, stable

My original intent was to replace the last non-upstream Arista's TCP-AO
piece. That is per-netns procfs seqfile which lists AO keys. In my view
an acceptable upstream alternative would be TCP-AO-diag uAPI.

So, I started by looking and reviewing TCP-MD5-diag code. And straight
away I saw a bunch of issues:

1. Similarly to TCP_MD5SIG_EXT, which doesn't check tcpm_flags for
   unknown flags and so being non-extendable setsockopt(), the same
   way tcp_diag_put_md5sig() dumps md5 keys in an array of
   tcp_diag_md5sig, which makes it ABI non-extendable structure
   as userspace can't tolerate any new members in it.

2. Inet-diag allocates netlink message for sockets in
   inet_diag_dump_one_icsk(), which uses a TCP-diag callback
   .idiag_get_aux_size(), that pre-calculates the needed space for
   TCP-diag related information. But as neither socket lock nor
   rcu_readlock() are held between allocation and the actual TCP
   info filling, the TCP-related space requirement may change before
   reaching tcp_diag_put_md5sig(). I.e., the number of TCP-MD5 keys on
   a socket. Thankfully, TCP-MD5-diag won't overwrite the skb, but will
   return EMSGSIZE, triggering WARN_ON() in inet_diag_dump_one_icsk().

3. Inet-diag "do" request* can create skb of any message required size.
   But "dump" request* the skb size, since d35c99ff77ec ("netlink: do
   not enter direct reclaim from netlink_dump()") is limited by
   32 KB. Having in mind that sizeof(struct tcp_diag_md5sig) = 100 bytes, 
   dumps for sockets that have more than 327 keys are going to fail
   (not counting other diag infos, which lower this limit futher).
   That is much lower than the number of TCP-MD5 keys that can be
   allocated on a socket with the current default
   optmem_max limit (128Kb).

So, then I went and written selftests for TCP-MD5-diag and besides
confirming that (2) and (3) are not theoretical issues, I also
discovered another issues, that I didn't notice on code inspection:

4. nlattr::nla_len is __u16, which limits the largest netlink attibute
   by 64Kb or by 655 tcp_diag_md5sig keys in the diag array. What
   happens de-facto is that the netlink attribute gets u16 overflow,
   breaking the userspace parsing - RTA_NEXT(), that should point
   to the next attribute, points into the middle of md5 keys array.

In this patch set issues (2) and (4) are addressed.
(2) by not returning EMSGSIZE when the dump raced with modifying
TCP-MD5 keys on a socket, but mark the dump inconsistent by setting
NLM_F_DUMP_INTR nlmsg flag. Which changes uAPI in situations where
previously kernel did WARN() and errored the dump.
(4) by artificially limiting the maximum attribute size by U16_MAX - 1.

In order to remove the new limit from (4) solution, my plan is to
convert the dump of TCP-MD5 keys from an array to
NL_ATTR_TYPE_NESTED_ARRAY (or alike), which should also address (1).
And for (3), it's needed to teach tcp-diag how-to remember not only
socket on which previous recvmsg() stopped, but potentially TCP-MD5
key as well.

I plan in the next part of patch set address (3), (1) and the new limit
for (4), together with adding new TCP-AO-diag.

* Terminology from Documentation/userspace-api/netlink/intro.rst

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
Dmitry Safonov (6):
      net/diag: Do not race on dumping MD5 keys with adding new MD5 keys
      net/diag: Warn only once on EMSGSIZE
      net/diag: Pre-allocate optional info only if requested
      net/diag: Always pre-allocate tcp_ulp info
      net/diag: Limit TCP-MD5-diag array by max attribute length
      net/netlink: Correct the comment on netlink message max cap

 include/linux/inet_diag.h |  3 +-
 include/net/tcp.h         |  1 -
 net/ipv4/inet_diag.c      | 87 ++++++++++++++++++++++++++++++++++++++---------
 net/ipv4/tcp_diag.c       | 69 ++++++++++++++++++-------------------
 net/mptcp/diag.c          | 20 -----------
 net/netlink/af_netlink.c  |  4 +--
 net/tls/tls_main.c        | 17 ---------
 7 files changed, 109 insertions(+), 92 deletions(-)
---
base-commit: 2e1b3cc9d7f790145a80cb705b168f05dab65df2
change-id: 20241106-tcp-md5-diag-prep-2f0dcf371d90

Best regards,
-- 
Dmitry Safonov <0x7f454c46@gmail.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  2024-11-07  4:21   ` kernel test robot
  2024-11-06 18:10 ` [PATCH net 2/6] net/diag: Warn only once on EMSGSIZE Dmitry Safonov via B4 Relay
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov, stable

From: Dmitry Safonov <0x7f454c46@gmail.com>

Inet-diag has two modes: (1) dumping information for a specific socket,
for which kernel creates one netlink message with the information and
(2) dumping information for multiple sockets (possibly with a filter),
where for the reply kernel sends many messages, one for each matched
socket.

Currently those two modes work differently as the information about
a specific socket is never split between multiple messages. For (2),
multi-socket dump for the reply kernel allocates up to 32Kb skb and
fills that with as many socket dumps as possible. For (1), one-socket
dump kernel pre-calculates the required space for the reply, allocates
a new skb and nlmsg and only then starts filling the socket's details.

Preallocating the needed size quite makes sense as most of the details
are fix-sized and provided for each socket, see inet_sk_attr_size().
But there's an exception: .idiag_get_aux_size() which is optional for
a socket. This is provided only for TCP sockets by tcp_diag.

For TCP-MD5 it calculates the memory needed to fill an array of
(struct tcp_diag_md5sig). The issue here is that the amount of keys may
change in inet_diag_dump_one_icsk() between inet_sk_attr_size() and
sk_diag_fill() calls. As the code expects fix-sized information on any
socket, it considers sk_diag_fill() failures by -EMSGSIZE reason as
a bug, resulting in such WARN_ON():

[] ------------[ cut here ]------------
[] WARNING: CPU: 7 PID: 17420 at net/ipv4/inet_diag.c:586 inet_diag_dump_one_icsk+0x3c8/0x420
[] CPU: 7 UID: 0 PID: 17420 Comm: diag_ipv4 Tainted: G        W          6.11.0-rc6-00022-gc9fd7a9f9aca-dirty #2
[] pc : inet_diag_dump_one_icsk+0x3c8/0x420
[] lr : inet_diag_dump_one_icsk+0x1d4/0x420
[] sp : ffff8000aef87460
...
[] Call trace:
[]  inet_diag_dump_one_icsk+0x3c8/0x420
[]  tcp_diag_dump_one+0xa0/0xf0
[]  inet_diag_cmd_exact+0x234/0x278
[]  inet_diag_handler_cmd+0x16c/0x288
[]  sock_diag_rcv_msg+0x1a8/0x550
[]  netlink_rcv_skb+0x198/0x378
[]  sock_diag_rcv+0x20/0x48
[]  netlink_unicast+0x400/0x6a8
[]  netlink_sendmsg+0x654/0xa58
[]  __sys_sendto+0x1ec/0x330
[]  __arm64_sys_sendto+0xc8/0x168
...
[] ---[ end trace 0000000000000000 ]---

One way to solve it would be to grab lock_sock() in
inet_diag_dump_one_icsk(), but that may be costly and bring new lock
dependencies. The alternative is to call tcp_diag_put_md5sig() as
the last attribute of the netlink message and calculate how much space
left after all previous attributes filled and translate it into
(struct tcp_diag_md5sig)-sized units. If it turns out that there's not
enough space for all TCP-MD5 keys, mark the dump as inconsistent by
setting NLM_F_DUMP_INTR flag. Userspace may figure out that dumping
raced with the socket properties change and retry again.

Currently it may be unexpected by userspace that netlink message for one
socket may be inconsistent, but I believe we're on a safe side from
breaking userspace as previously dump would fail and an ugly WARN was
produced in dmesg. IOW, it is a clear improvement.

This is not a theoretical issue: I've written a test and that reproduces
the issue I suspected (the backtrace above).

Fixes: c03fa9bcacd9 ("tcp_diag: report TCP MD5 signing keys and addresses")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 include/linux/inet_diag.h |  3 ++-
 net/ipv4/inet_diag.c      |  8 +++----
 net/ipv4/tcp_diag.c       | 55 ++++++++++++++++++++++++++++-------------------
 3 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index a9033696b0aad36ab9abd47e4b68e272053019d7..cb2ba672eba131986d0432dd628fc42bbf800886 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -22,7 +22,8 @@ struct inet_diag_handler {
 
 	int		(*idiag_get_aux)(struct sock *sk,
 					 bool net_admin,
-					 struct sk_buff *skb);
+					 struct sk_buff *skb,
+					 struct nlmsghdr *nlh);
 
 	size_t		(*idiag_get_aux_size)(struct sock *sk,
 					      bool net_admin);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 67639309163d05c034fad80fc9a6096c3b79d42f..67b9cc4c0e47a596a4d588e793b7f13ee040a1e3 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -350,10 +350,6 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 
 	handler->idiag_get_info(sk, r, info);
 
-	if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux)
-		if (handler->idiag_get_aux(sk, net_admin, skb) < 0)
-			goto errout;
-
 	if (sk->sk_state < TCP_TIME_WAIT) {
 		union tcp_cc_info info;
 		size_t sz = 0;
@@ -368,6 +364,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			goto errout;
 	}
 
+	if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux)
+		if (handler->idiag_get_aux(sk, net_admin, skb, nlh) < 0)
+			goto errout;
+
 	/* Keep it at the end for potential retry with a larger skb,
 	 * or else do best-effort fitting, which is only done for the
 	 * first_nlmsg.
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index f428ecf9120f2f596e1d67db2b2a0d0d0e211905..36606a19b451f059e32c58c0d76a878dc9be5ff0 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -53,29 +53,39 @@ static void tcp_diag_md5sig_fill(struct tcp_diag_md5sig *info,
 }
 
 static int tcp_diag_put_md5sig(struct sk_buff *skb,
-			       const struct tcp_md5sig_info *md5sig)
+			       const struct tcp_md5sig_info *md5sig,
+			       struct nlmsghdr *nlh)
 {
+	size_t key_size = sizeof(struct tcp_diag_md5sig);
+	unsigned int attrlen, md5sig_count;
 	const struct tcp_md5sig_key *key;
 	struct tcp_diag_md5sig *info;
 	struct nlattr *attr;
-	int md5sig_count = 0;
 
+	/*
+	 * Userspace doesn't like to see zero-filled key-values, so
+	 * allocating too large attribute is bad.
+	 */
 	hlist_for_each_entry_rcu(key, &md5sig->head, node)
 		md5sig_count++;
 	if (md5sig_count == 0)
 		return 0;
 
-	attr = nla_reserve(skb, INET_DIAG_MD5SIG,
-			   md5sig_count * sizeof(struct tcp_diag_md5sig));
+	attrlen = skb_availroom(skb) - NLA_HDRLEN;
+	md5sig_count = min(md5sig_count, attrlen / key_size);
+	attr = nla_reserve(skb, INET_DIAG_MD5SIG, md5sig_count * key_size);
 	if (!attr)
 		return -EMSGSIZE;
 
 	info = nla_data(attr);
-	memset(info, 0, md5sig_count * sizeof(struct tcp_diag_md5sig));
+	memset(info, 0, md5sig_count * key_size);
 	hlist_for_each_entry_rcu(key, &md5sig->head, node) {
-		tcp_diag_md5sig_fill(info++, key);
-		if (--md5sig_count == 0)
+		/* More keys on a socket than pre-allocated space available */
+		if (md5sig_count-- == 0) {
+			nlh->nlmsg_flags |= NLM_F_DUMP_INTR;
 			break;
+		}
+		tcp_diag_md5sig_fill(info++, key);
 	}
 
 	return 0;
@@ -110,25 +120,11 @@ static int tcp_diag_put_ulp(struct sk_buff *skb, struct sock *sk,
 }
 
 static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
-			    struct sk_buff *skb)
+			    struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int err = 0;
 
-#ifdef CONFIG_TCP_MD5SIG
-	if (net_admin) {
-		struct tcp_md5sig_info *md5sig;
-
-		rcu_read_lock();
-		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
-		if (md5sig)
-			err = tcp_diag_put_md5sig(skb, md5sig);
-		rcu_read_unlock();
-		if (err < 0)
-			return err;
-	}
-#endif
-
 	if (net_admin) {
 		const struct tcp_ulp_ops *ulp_ops;
 
@@ -138,6 +134,21 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 		if (err)
 			return err;
 	}
+
+#ifdef CONFIG_TCP_MD5SIG
+	if (net_admin) {
+		struct tcp_md5sig_info *md5sig;
+
+		rcu_read_lock();
+		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+		if (md5sig)
+			err = tcp_diag_put_md5sig(skb, md5sig, nlh);
+		rcu_read_unlock();
+		if (err < 0)
+			return err;
+	}
+#endif
+
 	return 0;
 }
 

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 2/6] net/diag: Warn only once on EMSGSIZE
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 3/6] net/diag: Pre-allocate optional info only if requested Dmitry Safonov via B4 Relay
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov

From: Dmitry Safonov <0x7f454c46@gmail.com>

The code clearly expects that the pre-allocated skb will be enough for
the netlink reply message. But if in an unbelievable situation there is
a kernel issue and sk_diag_fill() fails with -EMSGSIZE, this WARN_ON()
can be triggered from userspace. That aggravates the issue from KASLR
leak into possible DOS vector. Use WARN_ON_ONCE() which is clearly
enough to provide an information on a kernel issue.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/ipv4/inet_diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 67b9cc4c0e47a596a4d588e793b7f13ee040a1e3..ca9a7e61d8d7de80cb234c45c41d6357fde50c11 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -583,7 +583,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 
 	err = sk_diag_fill(sk, rep, cb, req, 0, net_admin);
 	if (err < 0) {
-		WARN_ON(err == -EMSGSIZE);
+		WARN_ON_ONCE(err == -EMSGSIZE);
 		nlmsg_free(rep);
 		goto out;
 	}

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 3/6] net/diag: Pre-allocate optional info only if requested
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 2/6] net/diag: Warn only once on EMSGSIZE Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info Dmitry Safonov via B4 Relay
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov

From: Dmitry Safonov <0x7f454c46@gmail.com>

Those INET_DIAG_* flags from req->idiag_ext are provided by the
userspace, so they are not going to change during one socket dump.
This is going to save just nits and bits for typical netlink reply,
which I'm going to utilise in the very next patch by always allocating
tls_get_info_size().
It's possible to save even some more by checking the request in
inet_diag_msg_attrs_size(), but that's being on very stingy side.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/ipv4/inet_diag.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index ca9a7e61d8d7de80cb234c45c41d6357fde50c11..2dd173a73bd1e2657957e5e4ecb70401cc85dfda 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -102,24 +102,31 @@ static size_t inet_sk_attr_size(struct sock *sk,
 				bool net_admin)
 {
 	const struct inet_diag_handler *handler;
-	size_t aux = 0;
+	int ext = req->idiag_ext;
+	size_t ret = 0;
 
 	rcu_read_lock();
 	handler = rcu_dereference(inet_diag_table[req->sdiag_protocol]);
 	DEBUG_NET_WARN_ON_ONCE(!handler);
 	if (handler && handler->idiag_get_aux_size)
-		aux = handler->idiag_get_aux_size(sk, net_admin);
+		ret += handler->idiag_get_aux_size(sk, net_admin);
 	rcu_read_unlock();
 
-	return	  nla_total_size(sizeof(struct tcp_info))
-		+ nla_total_size(sizeof(struct inet_diag_msg))
-		+ inet_diag_msg_attrs_size()
-		+ nla_total_size(sizeof(struct inet_diag_meminfo))
-		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
-		+ nla_total_size(TCP_CA_NAME_MAX)
-		+ nla_total_size(sizeof(struct tcpvegas_info))
-		+ aux
-		+ 64;
+	ret += nla_total_size(sizeof(struct tcp_info))
+	     + nla_total_size(sizeof(struct inet_diag_msg))
+	     + inet_diag_msg_attrs_size()
+	     + 64;
+
+	if (ext & (1 << (INET_DIAG_MEMINFO - 1)))
+		ret += nla_total_size(sizeof(struct inet_diag_meminfo));
+	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
+		ret += nla_total_size(SK_MEMINFO_VARS * sizeof(u32));
+	if (ext & (1 << (INET_DIAG_CONG - 1)))
+		ret += nla_total_size(TCP_CA_NAME_MAX);
+	if (ext & (1 << (INET_DIAG_VEGASINFO - 1)))
+		ret += nla_total_size(sizeof(struct tcpvegas_info));
+
+	return ret;
 }
 
 int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
                   ` (2 preceding siblings ...)
  2024-11-06 18:10 ` [PATCH net 3/6] net/diag: Pre-allocate optional info only if requested Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  2024-11-07  0:21   ` Kuniyuki Iwashima
  2024-11-06 18:10 ` [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length Dmitry Safonov via B4 Relay
  2024-11-06 18:10 ` [PATCH net 6/6] net/netlink: Correct the comment on netlink message max cap Dmitry Safonov via B4 Relay
  5 siblings, 1 reply; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov

From: Dmitry Safonov <0x7f454c46@gmail.com>

Currently there is a theoretical race between netlink one-socket dump
and allocating icsk->icsk_ulp_ops.

Simplify the expectations by always allocating maximum tcp_ulp-info.
With the previous patch the typical netlink message allocation was
decreased for kernel replies on requests without idiag_ext flags,
so let's use it.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 include/net/tcp.h    |  1 -
 net/ipv4/inet_diag.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_diag.c  | 13 -------------
 net/mptcp/diag.c     | 20 --------------------
 net/tls/tls_main.c   | 17 -----------------
 5 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d1948d357dade0842777265d3397842919f9eee0..757711aa5337ae7e6abee62d303eb66d37082e19 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2568,7 +2568,6 @@ struct tcp_ulp_ops {
 	void (*release)(struct sock *sk);
 	/* diagnostic */
 	int (*get_info)(struct sock *sk, struct sk_buff *skb);
-	size_t (*get_info_size)(const struct sock *sk);
 	/* clone ulp */
 	void (*clone)(const struct request_sock *req, struct sock *newsk,
 		      const gfp_t priority);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 2dd173a73bd1e2657957e5e4ecb70401cc85dfda..97862971d552216e574cac3dd2a8fc8c893888d3 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -97,6 +97,53 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
+static size_t tls_get_info_size(void)
+{
+	size_t size = 0;
+
+#ifdef CONFIG_TLS
+	size += nla_total_size(0) +             /* INET_ULP_INFO_TLS */
+		nla_total_size(sizeof(u16)) +   /* TLS_INFO_VERSION */
+		nla_total_size(sizeof(u16)) +   /* TLS_INFO_CIPHER */
+		nla_total_size(sizeof(u16)) +   /* TLS_INFO_RXCONF */
+		nla_total_size(sizeof(u16)) +   /* TLS_INFO_TXCONF */
+		nla_total_size(0) +             /* TLS_INFO_ZC_RO_TX */
+		nla_total_size(0) +             /* TLS_INFO_RX_NO_PAD */
+		0;
+#endif
+
+	return size;
+}
+
+static size_t subflow_get_info_size(void)
+{
+	size_t size = 0;
+
+#ifdef CONFIG_MPTCP
+	size += nla_total_size(0) +     /* INET_ULP_INFO_MPTCP */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_REM */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_LOC */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */
+		nla_total_size_64bit(8) +       /* MPTCP_SUBFLOW_ATTR_MAP_SEQ */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */
+		nla_total_size(2) +     /* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */
+		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_FLAGS */
+		nla_total_size(1) +     /* MPTCP_SUBFLOW_ATTR_ID_REM */
+		nla_total_size(1) +     /* MPTCP_SUBFLOW_ATTR_ID_LOC */
+		0;
+#endif
+
+	return size;
+}
+
+static size_t tcp_ulp_ops_size(void)
+{
+	size_t size = max(tls_get_info_size(), subflow_get_info_size());
+
+	return size + nla_total_size(0) + nla_total_size(TCP_ULP_NAME_MAX);
+}
+
 static size_t inet_sk_attr_size(struct sock *sk,
 				const struct inet_diag_req_v2 *req,
 				bool net_admin)
@@ -115,6 +162,7 @@ static size_t inet_sk_attr_size(struct sock *sk,
 	ret += nla_total_size(sizeof(struct tcp_info))
 	     + nla_total_size(sizeof(struct inet_diag_msg))
 	     + inet_diag_msg_attrs_size()
+	     + tcp_ulp_ops_size()
 	     + 64;
 
 	if (ext & (1 << (INET_DIAG_MEMINFO - 1)))
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 36606a19b451f059e32c58c0d76a878dc9be5ff0..722dbfd54d247b4def1e77b1674c5b207c5a939d 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -154,7 +154,6 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 
 static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 {
-	struct inet_connection_sock *icsk = inet_csk(sk);
 	size_t size = 0;
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -174,18 +173,6 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 				       sizeof(struct tcp_diag_md5sig));
 	}
 #endif
-
-	if (net_admin && sk_fullsock(sk)) {
-		const struct tcp_ulp_ops *ulp_ops;
-
-		ulp_ops = icsk->icsk_ulp_ops;
-		if (ulp_ops) {
-			size += nla_total_size(0) +
-				nla_total_size(TCP_ULP_NAME_MAX);
-			if (ulp_ops->get_info_size)
-				size += ulp_ops->get_info_size(sk);
-		}
-	}
 	return size;
 }
 
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c
index 2d3efb405437d85c0bca70d7a92ca3a7363365e1..8b36867e4ddd5f45cebcf60e9093a061d5208756 100644
--- a/net/mptcp/diag.c
+++ b/net/mptcp/diag.c
@@ -84,27 +84,7 @@ static int subflow_get_info(struct sock *sk, struct sk_buff *skb)
 	return err;
 }
 
-static size_t subflow_get_info_size(const struct sock *sk)
-{
-	size_t size = 0;
-
-	size += nla_total_size(0) +	/* INET_ULP_INFO_MPTCP */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_TOKEN_REM */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_TOKEN_LOC */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */
-		nla_total_size_64bit(8) +	/* MPTCP_SUBFLOW_ATTR_MAP_SEQ */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */
-		nla_total_size(2) +	/* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */
-		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_FLAGS */
-		nla_total_size(1) +	/* MPTCP_SUBFLOW_ATTR_ID_REM */
-		nla_total_size(1) +	/* MPTCP_SUBFLOW_ATTR_ID_LOC */
-		0;
-	return size;
-}
-
 void mptcp_diag_subflow_init(struct tcp_ulp_ops *ops)
 {
 	ops->get_info = subflow_get_info;
-	ops->get_info_size = subflow_get_info_size;
 }
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 6b4b9f2749a6fd6de495940c5cb3f2154a5a451e..f3491c4e942e08dc882cb81eef071203384b2b37 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -1072,22 +1072,6 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb)
 	return err;
 }
 
-static size_t tls_get_info_size(const struct sock *sk)
-{
-	size_t size = 0;
-
-	size += nla_total_size(0) +		/* INET_ULP_INFO_TLS */
-		nla_total_size(sizeof(u16)) +	/* TLS_INFO_VERSION */
-		nla_total_size(sizeof(u16)) +	/* TLS_INFO_CIPHER */
-		nla_total_size(sizeof(u16)) +	/* TLS_INFO_RXCONF */
-		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
-		nla_total_size(0) +		/* TLS_INFO_ZC_RO_TX */
-		nla_total_size(0) +		/* TLS_INFO_RX_NO_PAD */
-		0;
-
-	return size;
-}
-
 static int __net_init tls_init_net(struct net *net)
 {
 	int err;
@@ -1123,7 +1107,6 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.init			= tls_init,
 	.update			= tls_update,
 	.get_info		= tls_get_info,
-	.get_info_size		= tls_get_info_size,
 };
 
 static int __init tls_register(void)

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
                   ` (3 preceding siblings ...)
  2024-11-06 18:10 ` [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  2024-11-07  0:25   ` Kuniyuki Iwashima
  2024-11-06 18:10 ` [PATCH net 6/6] net/netlink: Correct the comment on netlink message max cap Dmitry Safonov via B4 Relay
  5 siblings, 1 reply; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov

From: Dmitry Safonov <0x7f454c46@gmail.com>

Currently TCP-MD5 keys are dumped as an array of
(struct tcp_diag_md5sig). All the keys from a socket go
into the same netlink attribute. The maximum amount of TCP-MD5 keys on
any socket is limited by /proc/sys/net/core/optmem_max, which post
commit 4944566706b2 ("net: increase optmem_max default value") is now by
default 128 KB. With the help of selftest I've figured out that equals
to 963 keys, without user having to increase optmem_max:
> test_set_md5() [963/1024]: Cannot allocate memory

The maximum length of nlattr is limited by typeof(nlattr::nla_len),
which is (U16_MAX - 1). When there are too many keys the array written
overflows the netlink attribute. Here is what one can see on a test,
with no adjustments to optmem_max defaults:

> recv() = 65180
> socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
>      family: 2 state: 10 timer: 0 retrans: 0
>      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
>              attr type: 8 (5)
>              attr type: 15 (8)
>              attr type: 21 (12)
>              attr type: 22 (6)
>              attr type: 2 (252)
>              attr type: 18 (64804)
> recv() = 130680
> socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
>      family: 2 state: 10 timer: 0 retrans: 0
>      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
>              attr type: 8 (5)
>              attr type: 15 (8)
>              attr type: 21 (12)
>              attr type: 22 (6)
>              attr type: 2 (252)
>              attr type: 18 (64768)
>              attr type: 29555 (25966)
> recv() = 130680
> socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
>      family: 2 state: 10 timer: 0 retrans: 0
>      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
>              attr type: 8 (5)
>              attr type: 15 (8)
>              attr type: 21 (12)
>              attr type: 22 (6)
>              attr type: 2 (252)
>              attr type: 18 (64768)
>              attr type: 29555 (25966)
>              attr type: 8265 (8236)

Here attribute type 18 is INET_DIAG_MD5SIG, the following nlattr types
are junk made of tcp_diag_md5sig's content.

Here is the overflow of the nlattr size:
>>> hex(64768)
'0xfd00'
>>> hex(130300)
'0x1fcfc'

Limit the size of (struct tcp_diag_md5sig) array in the netlink reply by
maximum attribute length. Not perfect as NLM_F_DUMP_INTR will be set on
the netlink header flags, but the userspace can differ if it's due to
inconsistency or due to maximum size of the netlink attribute.

In a following patch set, I'm planning to address this and re-introduce
TCP-MD5-diag that actually works.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/ipv4/tcp_diag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 722dbfd54d247b4def1e77b1674c5b207c5a939d..d55a0ac39fa0853806efd4a6b38591255e0f4930 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -72,6 +72,7 @@ static int tcp_diag_put_md5sig(struct sk_buff *skb,
 		return 0;
 
 	attrlen = skb_availroom(skb) - NLA_HDRLEN;
+	attrlen = min(attrlen, U16_MAX - 1); /* attr->nla_len */
 	md5sig_count = min(md5sig_count, attrlen / key_size);
 	attr = nla_reserve(skb, INET_DIAG_MD5SIG, md5sig_count * key_size);
 	if (!attr)

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 6/6] net/netlink: Correct the comment on netlink message max cap
  2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
                   ` (4 preceding siblings ...)
  2024-11-06 18:10 ` [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length Dmitry Safonov via B4 Relay
@ 2024-11-06 18:10 ` Dmitry Safonov via B4 Relay
  5 siblings, 0 replies; 12+ messages in thread
From: Dmitry Safonov via B4 Relay @ 2024-11-06 18:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, Ivan Delalande, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Boris Pismenny, John Fastabend
  Cc: netdev, linux-kernel, mptcp, Dmitry Safonov

From: Dmitry Safonov <0x7f454c46@gmail.com>

Since commit d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") the cap is 32KiB.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/netlink/af_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0a9287fadb47a2afaf0babe675738bc43051c5a7..27979cefc06256bde052898d193ed99f710c2087 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2279,7 +2279,7 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 		goto errout_skb;
 
 	/* NLMSG_GOODSIZE is small to avoid high order allocations being
-	 * required, but it makes sense to _attempt_ a 16K bytes allocation
+	 * required, but it makes sense to _attempt_ a 32KiB allocation
 	 * to reduce number of system calls on dump operations, if user
 	 * ever provided a big enough buffer.
 	 */
@@ -2301,7 +2301,7 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 		goto errout_skb;
 
 	/* Trim skb to allocated size. User is expected to provide buffer as
-	 * large as max(min_dump_alloc, 16KiB (mac_recvmsg_len capped at
+	 * large as max(min_dump_alloc, 32KiB (max_recvmsg_len capped at
 	 * netlink_recvmsg())). dump will pack as many smaller messages as
 	 * could fit within the allocated skb. skb is typically allocated
 	 * with larger space than required (could be as much as near 2x the

-- 
2.42.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info
  2024-11-06 18:10 ` [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info Dmitry Safonov via B4 Relay
@ 2024-11-07  0:21   ` Kuniyuki Iwashima
  2024-11-07 17:35     ` Dmitry Safonov
  0 siblings, 1 reply; 12+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07  0:21 UTC (permalink / raw)
  To: devnull+0x7f454c46.gmail.com
  Cc: 0x7f454c46, borisp, colona, davem, dsahern, edumazet, geliang,
	horms, john.fastabend, kuba, linux-kernel, martineau, matttbe,
	mptcp, netdev, pabeni

From: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@kernel.org>
Date: Wed, 06 Nov 2024 18:10:17 +0000
> From: Dmitry Safonov <0x7f454c46@gmail.com>
> 
> Currently there is a theoretical race between netlink one-socket dump
> and allocating icsk->icsk_ulp_ops.
> 
> Simplify the expectations by always allocating maximum tcp_ulp-info.
> With the previous patch the typical netlink message allocation was
> decreased for kernel replies on requests without idiag_ext flags,
> so let's use it.
>

I think Fixes tag is needed.


> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
> ---
>  include/net/tcp.h    |  1 -
>  net/ipv4/inet_diag.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/tcp_diag.c  | 13 -------------
>  net/mptcp/diag.c     | 20 --------------------
>  net/tls/tls_main.c   | 17 -----------------
>  5 files changed, 48 insertions(+), 51 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d1948d357dade0842777265d3397842919f9eee0..757711aa5337ae7e6abee62d303eb66d37082e19 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2568,7 +2568,6 @@ struct tcp_ulp_ops {
>  	void (*release)(struct sock *sk);
>  	/* diagnostic */
>  	int (*get_info)(struct sock *sk, struct sk_buff *skb);
> -	size_t (*get_info_size)(const struct sock *sk);
>  	/* clone ulp */
>  	void (*clone)(const struct request_sock *req, struct sock *newsk,
>  		      const gfp_t priority);
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 2dd173a73bd1e2657957e5e4ecb70401cc85dfda..97862971d552216e574cac3dd2a8fc8c893888d3 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -97,6 +97,53 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
>  }
>  EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
>  
> +static size_t tls_get_info_size(void)
> +{
> +	size_t size = 0;
> +
> +#ifdef CONFIG_TLS
> +	size += nla_total_size(0) +             /* INET_ULP_INFO_TLS */

It seems '\t' after '+' was converted to '\s' by copy-and-paste.


> +		nla_total_size(sizeof(u16)) +   /* TLS_INFO_VERSION */
> +		nla_total_size(sizeof(u16)) +   /* TLS_INFO_CIPHER */
> +		nla_total_size(sizeof(u16)) +   /* TLS_INFO_RXCONF */
> +		nla_total_size(sizeof(u16)) +   /* TLS_INFO_TXCONF */
> +		nla_total_size(0) +             /* TLS_INFO_ZC_RO_TX */
> +		nla_total_size(0) +             /* TLS_INFO_RX_NO_PAD */
> +		0;
> +#endif
> +
> +	return size;
> +}
> +
> +static size_t subflow_get_info_size(void)
> +{
> +	size_t size = 0;
> +
> +#ifdef CONFIG_MPTCP
> +	size += nla_total_size(0) +     /* INET_ULP_INFO_MPTCP */
> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_REM */
> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_LOC */
> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */
> +		nla_total_size_64bit(8) +       /* MPTCP_SUBFLOW_ATTR_MAP_SEQ */

While at it, let's adjust tabs to match with MPTCP_SUBFLOW_ATTR_MAP_SEQ.


> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */
> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */
> +		nla_total_size(2) +     /* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */
> +		nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_FLAGS */
> +		nla_total_size(1) +     /* MPTCP_SUBFLOW_ATTR_ID_REM */
> +		nla_total_size(1) +     /* MPTCP_SUBFLOW_ATTR_ID_LOC */
> +		0;
> +#endif
> +
> +	return size;
> +}
> +
> +static size_t tcp_ulp_ops_size(void)
> +{
> +	size_t size = max(tls_get_info_size(), subflow_get_info_size());
> +
> +	return size + nla_total_size(0) + nla_total_size(TCP_ULP_NAME_MAX);

Is nla_total_size(0) for INET_DIAG_ULP_INFO ?

It would be better to break them down in the same format with comment
like tls_get_info_size() and subflow_get_info_size().


> +}
> +
>  static size_t inet_sk_attr_size(struct sock *sk,
>  				const struct inet_diag_req_v2 *req,
>  				bool net_admin)
> @@ -115,6 +162,7 @@ static size_t inet_sk_attr_size(struct sock *sk,
>  	ret += nla_total_size(sizeof(struct tcp_info))
>  	     + nla_total_size(sizeof(struct inet_diag_msg))
>  	     + inet_diag_msg_attrs_size()
> +	     + tcp_ulp_ops_size()
>  	     + 64;
>  
>  	if (ext & (1 << (INET_DIAG_MEMINFO - 1)))
> diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
> index 36606a19b451f059e32c58c0d76a878dc9be5ff0..722dbfd54d247b4def1e77b1674c5b207c5a939d 100644
> --- a/net/ipv4/tcp_diag.c
> +++ b/net/ipv4/tcp_diag.c
> @@ -154,7 +154,6 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
>  
>  static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
>  {
> -	struct inet_connection_sock *icsk = inet_csk(sk);
>  	size_t size = 0;
>  
>  #ifdef CONFIG_TCP_MD5SIG
> @@ -174,18 +173,6 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
>  				       sizeof(struct tcp_diag_md5sig));
>  	}
>  #endif
> -
> -	if (net_admin && sk_fullsock(sk)) {
> -		const struct tcp_ulp_ops *ulp_ops;
> -
> -		ulp_ops = icsk->icsk_ulp_ops;
> -		if (ulp_ops) {
> -			size += nla_total_size(0) +
> -				nla_total_size(TCP_ULP_NAME_MAX);
> -			if (ulp_ops->get_info_size)
> -				size += ulp_ops->get_info_size(sk);
> -		}
> -	}
>  	return size;
>  }
>  
> diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c
> index 2d3efb405437d85c0bca70d7a92ca3a7363365e1..8b36867e4ddd5f45cebcf60e9093a061d5208756 100644
> --- a/net/mptcp/diag.c
> +++ b/net/mptcp/diag.c
> @@ -84,27 +84,7 @@ static int subflow_get_info(struct sock *sk, struct sk_buff *skb)
>  	return err;
>  }
>  
> -static size_t subflow_get_info_size(const struct sock *sk)
> -{
> -	size_t size = 0;
> -
> -	size += nla_total_size(0) +	/* INET_ULP_INFO_MPTCP */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_TOKEN_REM */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_TOKEN_LOC */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */
> -		nla_total_size_64bit(8) +	/* MPTCP_SUBFLOW_ATTR_MAP_SEQ */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */
> -		nla_total_size(2) +	/* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */
> -		nla_total_size(4) +	/* MPTCP_SUBFLOW_ATTR_FLAGS */
> -		nla_total_size(1) +	/* MPTCP_SUBFLOW_ATTR_ID_REM */
> -		nla_total_size(1) +	/* MPTCP_SUBFLOW_ATTR_ID_LOC */
> -		0;
> -	return size;
> -}
> -
>  void mptcp_diag_subflow_init(struct tcp_ulp_ops *ops)
>  {
>  	ops->get_info = subflow_get_info;
> -	ops->get_info_size = subflow_get_info_size;
>  }
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 6b4b9f2749a6fd6de495940c5cb3f2154a5a451e..f3491c4e942e08dc882cb81eef071203384b2b37 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -1072,22 +1072,6 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb)
>  	return err;
>  }
>  
> -static size_t tls_get_info_size(const struct sock *sk)
> -{
> -	size_t size = 0;
> -
> -	size += nla_total_size(0) +		/* INET_ULP_INFO_TLS */
> -		nla_total_size(sizeof(u16)) +	/* TLS_INFO_VERSION */
> -		nla_total_size(sizeof(u16)) +	/* TLS_INFO_CIPHER */
> -		nla_total_size(sizeof(u16)) +	/* TLS_INFO_RXCONF */
> -		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
> -		nla_total_size(0) +		/* TLS_INFO_ZC_RO_TX */
> -		nla_total_size(0) +		/* TLS_INFO_RX_NO_PAD */
> -		0;
> -
> -	return size;
> -}
> -
>  static int __net_init tls_init_net(struct net *net)
>  {
>  	int err;
> @@ -1123,7 +1107,6 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
>  	.init			= tls_init,
>  	.update			= tls_update,
>  	.get_info		= tls_get_info,
> -	.get_info_size		= tls_get_info_size,
>  };
>  
>  static int __init tls_register(void)
> 
> -- 
> 2.42.2
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length
  2024-11-06 18:10 ` [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length Dmitry Safonov via B4 Relay
@ 2024-11-07  0:25   ` Kuniyuki Iwashima
  2024-11-07 17:46     ` Dmitry Safonov
  0 siblings, 1 reply; 12+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07  0:25 UTC (permalink / raw)
  To: devnull+0x7f454c46.gmail.com
  Cc: 0x7f454c46, borisp, colona, davem, dsahern, edumazet, geliang,
	horms, john.fastabend, kuba, linux-kernel, martineau, matttbe,
	mptcp, netdev, pabeni

From: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@kernel.org>
Date: Wed, 06 Nov 2024 18:10:18 +0000
> From: Dmitry Safonov <0x7f454c46@gmail.com>
> 
> Currently TCP-MD5 keys are dumped as an array of
> (struct tcp_diag_md5sig). All the keys from a socket go
> into the same netlink attribute. The maximum amount of TCP-MD5 keys on
> any socket is limited by /proc/sys/net/core/optmem_max, which post
> commit 4944566706b2 ("net: increase optmem_max default value") is now by
> default 128 KB. With the help of selftest I've figured out that equals
> to 963 keys, without user having to increase optmem_max:
> > test_set_md5() [963/1024]: Cannot allocate memory
> 
> The maximum length of nlattr is limited by typeof(nlattr::nla_len),
> which is (U16_MAX - 1). When there are too many keys the array written
> overflows the netlink attribute. Here is what one can see on a test,
> with no adjustments to optmem_max defaults:
> 
> > recv() = 65180
> > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> >      family: 2 state: 10 timer: 0 retrans: 0
> >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> >              attr type: 8 (5)
> >              attr type: 15 (8)
> >              attr type: 21 (12)
> >              attr type: 22 (6)
> >              attr type: 2 (252)
> >              attr type: 18 (64804)
> > recv() = 130680
> > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> >      family: 2 state: 10 timer: 0 retrans: 0
> >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> >              attr type: 8 (5)
> >              attr type: 15 (8)
> >              attr type: 21 (12)
> >              attr type: 22 (6)
> >              attr type: 2 (252)
> >              attr type: 18 (64768)
> >              attr type: 29555 (25966)
> > recv() = 130680
> > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> >      family: 2 state: 10 timer: 0 retrans: 0
> >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> >              attr type: 8 (5)
> >              attr type: 15 (8)
> >              attr type: 21 (12)
> >              attr type: 22 (6)
> >              attr type: 2 (252)
> >              attr type: 18 (64768)
> >              attr type: 29555 (25966)
> >              attr type: 8265 (8236)
> 
> Here attribute type 18 is INET_DIAG_MD5SIG, the following nlattr types
> are junk made of tcp_diag_md5sig's content.
> 
> Here is the overflow of the nlattr size:
> >>> hex(64768)
> '0xfd00'
> >>> hex(130300)
> '0x1fcfc'
> 
> Limit the size of (struct tcp_diag_md5sig) array in the netlink reply by
> maximum attribute length. Not perfect as NLM_F_DUMP_INTR will be set on
> the netlink header flags, but the userspace can differ if it's due to
> inconsistency or due to maximum size of the netlink attribute.
> 
> In a following patch set, I'm planning to address this and re-introduce
> TCP-MD5-diag that actually works.

Given the issue has not been reported so far (I think), we can wait for
the series rather than backporting this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys
  2024-11-06 18:10 ` [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
@ 2024-11-07  4:21   ` kernel test robot
  0 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-11-07  4:21 UTC (permalink / raw)
  To: Dmitry Safonov via B4 Relay, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Ivan Delalande, Matthieu Baerts, Mat Martineau, Geliang Tang,
	Boris Pismenny, John Fastabend
  Cc: llvm, oe-kbuild-all, netdev, linux-kernel, mptcp, Dmitry Safonov,
	stable

Hi Dmitry,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 2e1b3cc9d7f790145a80cb705b168f05dab65df2]

url:    https://github.com/intel-lab-lkp/linux/commits/Dmitry-Safonov-via-B4-Relay/net-diag-Do-not-race-on-dumping-MD5-keys-with-adding-new-MD5-keys/20241107-025054
base:   2e1b3cc9d7f790145a80cb705b168f05dab65df2
patch link:    https://lore.kernel.org/r/20241106-tcp-md5-diag-prep-v1-1-d62debf3dded%40gmail.com
patch subject: [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys
config: arm64-randconfig-003-20241107 (https://download.01.org/0day-ci/archive/20241107/202411071218.G7g6a8JG-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 592c0fe55f6d9a811028b5f3507be91458ab2713)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241107/202411071218.G7g6a8JG-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411071218.G7g6a8JG-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from net/ipv4/tcp_diag.c:9:
   In file included from include/linux/net.h:24:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> net/ipv4/tcp_diag.c:70:3: warning: variable 'md5sig_count' is uninitialized when used here [-Wuninitialized]
      70 |                 md5sig_count++;
         |                 ^~~~~~~~~~~~
   net/ipv4/tcp_diag.c:60:36: note: initialize the variable 'md5sig_count' to silence this warning
      60 |         unsigned int attrlen, md5sig_count;
         |                                           ^
         |                                            = 0
   5 warnings generated.


vim +/md5sig_count +70 net/ipv4/tcp_diag.c

c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  54  
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  55  static int tcp_diag_put_md5sig(struct sk_buff *skb,
4a6144fbc706b3c Dmitry Safonov 2024-11-06  56  			       const struct tcp_md5sig_info *md5sig,
4a6144fbc706b3c Dmitry Safonov 2024-11-06  57  			       struct nlmsghdr *nlh)
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  58  {
4a6144fbc706b3c Dmitry Safonov 2024-11-06  59  	size_t key_size = sizeof(struct tcp_diag_md5sig);
4a6144fbc706b3c Dmitry Safonov 2024-11-06  60  	unsigned int attrlen, md5sig_count;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  61  	const struct tcp_md5sig_key *key;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  62  	struct tcp_diag_md5sig *info;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  63  	struct nlattr *attr;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  64  
4a6144fbc706b3c Dmitry Safonov 2024-11-06  65  	/*
4a6144fbc706b3c Dmitry Safonov 2024-11-06  66  	 * Userspace doesn't like to see zero-filled key-values, so
4a6144fbc706b3c Dmitry Safonov 2024-11-06  67  	 * allocating too large attribute is bad.
4a6144fbc706b3c Dmitry Safonov 2024-11-06  68  	 */
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  69  	hlist_for_each_entry_rcu(key, &md5sig->head, node)
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31 @70  		md5sig_count++;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  71  	if (md5sig_count == 0)
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  72  		return 0;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  73  
4a6144fbc706b3c Dmitry Safonov 2024-11-06  74  	attrlen = skb_availroom(skb) - NLA_HDRLEN;
4a6144fbc706b3c Dmitry Safonov 2024-11-06  75  	md5sig_count = min(md5sig_count, attrlen / key_size);
4a6144fbc706b3c Dmitry Safonov 2024-11-06  76  	attr = nla_reserve(skb, INET_DIAG_MD5SIG, md5sig_count * key_size);
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  77  	if (!attr)
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  78  		return -EMSGSIZE;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  79  
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  80  	info = nla_data(attr);
4a6144fbc706b3c Dmitry Safonov 2024-11-06  81  	memset(info, 0, md5sig_count * key_size);
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  82  	hlist_for_each_entry_rcu(key, &md5sig->head, node) {
4a6144fbc706b3c Dmitry Safonov 2024-11-06  83  		/* More keys on a socket than pre-allocated space available */
4a6144fbc706b3c Dmitry Safonov 2024-11-06  84  		if (md5sig_count-- == 0) {
4a6144fbc706b3c Dmitry Safonov 2024-11-06  85  			nlh->nlmsg_flags |= NLM_F_DUMP_INTR;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  86  			break;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  87  		}
4a6144fbc706b3c Dmitry Safonov 2024-11-06  88  		tcp_diag_md5sig_fill(info++, key);
4a6144fbc706b3c Dmitry Safonov 2024-11-06  89  	}
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  90  
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  91  	return 0;
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  92  }
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  93  #endif
c03fa9bcacd9ac0 Ivan Delalande 2017-08-31  94  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info
  2024-11-07  0:21   ` Kuniyuki Iwashima
@ 2024-11-07 17:35     ` Dmitry Safonov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Safonov @ 2024-11-07 17:35 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: devnull+0x7f454c46.gmail.com, borisp, colona, davem, dsahern,
	edumazet, geliang, horms, john.fastabend, kuba, linux-kernel,
	martineau, matttbe, mptcp, netdev, pabeni

Hi Kuniyuki,

thanks for your review,

On Thu, 7 Nov 2024 at 00:21, Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@kernel.org>
> Date: Wed, 06 Nov 2024 18:10:17 +0000
> > From: Dmitry Safonov <0x7f454c46@gmail.com>
> >
> > Currently there is a theoretical race between netlink one-socket dump
> > and allocating icsk->icsk_ulp_ops.
> >
> > Simplify the expectations by always allocating maximum tcp_ulp-info.
> > With the previous patch the typical netlink message allocation was
> > decreased for kernel replies on requests without idiag_ext flags,
> > so let's use it.
> >
>
> I think Fixes tag is needed.

Yeah, probably, wasn't sure if it's -stable material as I didn't
attempt to create a reproducer for this.

[..]
> > @@ -97,6 +97,53 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
> >  }
> >  EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
> >
> > +static size_t tls_get_info_size(void)
> > +{
> > +     size_t size = 0;
> > +
> > +#ifdef CONFIG_TLS
> > +     size += nla_total_size(0) +             /* INET_ULP_INFO_TLS */
>
> It seems '\t' after '+' was converted to '\s' by copy-and-paste.

Thanks, will correct

[..]
> > +static size_t subflow_get_info_size(void)
> > +{
> > +     size_t size = 0;
> > +
> > +#ifdef CONFIG_MPTCP
> > +     size += nla_total_size(0) +     /* INET_ULP_INFO_MPTCP */
> > +             nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_REM */
> > +             nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_TOKEN_LOC */
> > +             nla_total_size(4) +     /* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */
> > +             nla_total_size_64bit(8) +       /* MPTCP_SUBFLOW_ATTR_MAP_SEQ */
>
> While at it, let's adjust tabs to match with MPTCP_SUBFLOW_ATTR_MAP_SEQ.

Sure

[..]
> > +static size_t tcp_ulp_ops_size(void)
> > +{
> > +     size_t size = max(tls_get_info_size(), subflow_get_info_size());
> > +
> > +     return size + nla_total_size(0) + nla_total_size(TCP_ULP_NAME_MAX);
>
> Is nla_total_size(0) for INET_DIAG_ULP_INFO ?
>
> It would be better to break them down in the same format with comment
> like tls_get_info_size() and subflow_get_info_size().

Good idea! Will do in v2.

[..]

Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length
  2024-11-07  0:25   ` Kuniyuki Iwashima
@ 2024-11-07 17:46     ` Dmitry Safonov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Safonov @ 2024-11-07 17:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: devnull+0x7f454c46.gmail.com, borisp, colona, davem, dsahern,
	edumazet, geliang, horms, john.fastabend, kuba, linux-kernel,
	martineau, matttbe, mptcp, netdev, pabeni

On Thu, 7 Nov 2024 at 00:26, Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@kernel.org>
> Date: Wed, 06 Nov 2024 18:10:18 +0000
> > From: Dmitry Safonov <0x7f454c46@gmail.com>
> >
> > Currently TCP-MD5 keys are dumped as an array of
> > (struct tcp_diag_md5sig). All the keys from a socket go
> > into the same netlink attribute. The maximum amount of TCP-MD5 keys on
> > any socket is limited by /proc/sys/net/core/optmem_max, which post
> > commit 4944566706b2 ("net: increase optmem_max default value") is now by
> > default 128 KB. With the help of selftest I've figured out that equals
> > to 963 keys, without user having to increase optmem_max:
> > > test_set_md5() [963/1024]: Cannot allocate memory
> >
> > The maximum length of nlattr is limited by typeof(nlattr::nla_len),
> > which is (U16_MAX - 1). When there are too many keys the array written
> > overflows the netlink attribute. Here is what one can see on a test,
> > with no adjustments to optmem_max defaults:
> >
> > > recv() = 65180
> > > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> > >      family: 2 state: 10 timer: 0 retrans: 0
> > >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> > >              attr type: 8 (5)
> > >              attr type: 15 (8)
> > >              attr type: 21 (12)
> > >              attr type: 22 (6)
> > >              attr type: 2 (252)
> > >              attr type: 18 (64804)
> > > recv() = 130680
> > > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> > >      family: 2 state: 10 timer: 0 retrans: 0
> > >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> > >              attr type: 8 (5)
> > >              attr type: 15 (8)
> > >              attr type: 21 (12)
> > >              attr type: 22 (6)
> > >              attr type: 2 (252)
> > >              attr type: 18 (64768)
> > >              attr type: 29555 (25966)
> > > recv() = 130680
> > > socket: 10.0.254.1:7013->0.0.0.0:0 (intf 3)
> > >      family: 2 state: 10 timer: 0 retrans: 0
> > >      expires: 0 rqueu: 0 wqueue: 1 uid: 0 inode: 456
> > >              attr type: 8 (5)
> > >              attr type: 15 (8)
> > >              attr type: 21 (12)
> > >              attr type: 22 (6)
> > >              attr type: 2 (252)
> > >              attr type: 18 (64768)
> > >              attr type: 29555 (25966)
> > >              attr type: 8265 (8236)
> >
> > Here attribute type 18 is INET_DIAG_MD5SIG, the following nlattr types
> > are junk made of tcp_diag_md5sig's content.
> >
> > Here is the overflow of the nlattr size:
> > >>> hex(64768)
> > '0xfd00'
> > >>> hex(130300)
> > '0x1fcfc'
> >
> > Limit the size of (struct tcp_diag_md5sig) array in the netlink reply by
> > maximum attribute length. Not perfect as NLM_F_DUMP_INTR will be set on
> > the netlink header flags, but the userspace can differ if it's due to
> > inconsistency or due to maximum size of the netlink attribute.
> >
> > In a following patch set, I'm planning to address this and re-introduce
> > TCP-MD5-diag that actually works.
>
> Given the issue has not been reported so far (I think), we can wait for
> the series rather than backporting this.

Yeah, my concern is that ss or or other tools may interrupt the
md5keys from overflow as other netlink attributes and either show
non-meaningful things or even hide some socket information (as if an
attribute is met the second time, from what I read in ss code, it will
put the last met attribute of the same type over previous pointers and
print only that).

Regarding reports, one has to have U16_MAX / sizeof(struct
tcp_diag_md5sig) = 655 keys on a socket. Probably, not many people use
BGP with that many peers.

I'm fine moving that to later net-next patches; I've sent it only
because the above seemed concerning to me.

Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-11-07 17:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-06 18:10 [PATCH net 0/6] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
2024-11-06 18:10 ` [PATCH net 1/6] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
2024-11-07  4:21   ` kernel test robot
2024-11-06 18:10 ` [PATCH net 2/6] net/diag: Warn only once on EMSGSIZE Dmitry Safonov via B4 Relay
2024-11-06 18:10 ` [PATCH net 3/6] net/diag: Pre-allocate optional info only if requested Dmitry Safonov via B4 Relay
2024-11-06 18:10 ` [PATCH net 4/6] net/diag: Always pre-allocate tcp_ulp info Dmitry Safonov via B4 Relay
2024-11-07  0:21   ` Kuniyuki Iwashima
2024-11-07 17:35     ` Dmitry Safonov
2024-11-06 18:10 ` [PATCH net 5/6] net/diag: Limit TCP-MD5-diag array by max attribute length Dmitry Safonov via B4 Relay
2024-11-07  0:25   ` Kuniyuki Iwashima
2024-11-07 17:46     ` Dmitry Safonov
2024-11-06 18:10 ` [PATCH net 6/6] net/netlink: Correct the comment on netlink message max cap Dmitry Safonov via B4 Relay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).