* [PATCH net-next v3] tcp: Dump bound-only sockets in inet_diag.
@ 2023-11-30 15:40 Guillaume Nault
2023-11-30 15:51 ` Guillaume Nault
2023-11-30 16:17 ` Eric Dumazet
0 siblings, 2 replies; 4+ messages in thread
From: Guillaume Nault @ 2023-11-30 15:40 UTC
To: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, David Ahern, Kuniyuki Iwashima, Michal Kubecek
Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets
that are bound but haven't yet called connect() or listen().
The code is inspired by the ->lhash2 loop. However there's no manual
test of the source port, since this kind of filtering is already
handled by inet_diag_bc_sk(). Also, a maximum of 16 sockets are dumped
at a time, to avoid running with bh disabled for too long.
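The batching described above can be illustrated outside the kernel. The following is only a userspace sketch of the pattern, not the patch's code: struct bucket, collect_batch(), walk_bucket() and the two small callbacks are hypothetical names, a plain flag stands in for the bucket spinlock, and a simple cursor replaces the patch's "recount and skip up to s_num" resume logic. The real code also takes a reference on each socket with sock_hold() so that entries stay valid once the lock is dropped, whereas this sketch just copies integers.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SZ 16	/* mirrors SKARR_SZ from the patch */

/* Hypothetical stand-in for one hash bucket: a flat array of entries. */
struct bucket {
	const int *entries;
	size_t len;
	int locked;	/* stand-in for the bucket spinlock */
};

/* Collect up to BATCH_SZ matching entries, starting at *pos, with the
 * "lock" held. *pos is advanced so the next call resumes where this one
 * stopped. Returns the number of entries stored in out[].
 */
static size_t collect_batch(struct bucket *b, size_t *pos,
			    int (*match)(int), int out[BATCH_SZ])
{
	size_t n = 0;

	b->locked = 1;
	while (*pos < b->len && n < BATCH_SZ) {
		int e = b->entries[(*pos)++];

		if (match(e))
			out[n++] = e;
	}
	b->locked = 0;

	return n;
}

/* Walk a whole bucket in batches. emit() always runs with the "lock"
 * released, so the time spent under the lock is bounded by BATCH_SZ.
 * Returns the total number of entries emitted.
 */
static size_t walk_bucket(struct bucket *b, int (*match)(int),
			  void (*emit)(int))
{
	int batch[BATCH_SZ];
	size_t pos = 0, total = 0, n, i;

	do {
		n = collect_batch(b, &pos, match, batch);
		assert(!b->locked);
		for (i = 0; i < n; i++)
			emit(batch[i]);
		total += n;
	} while (n == BATCH_SZ);	/* a full batch means there may be more */

	return total;
}

/* Example callbacks, for illustration only. */
static long emitted_sum;

static int is_even(int v)
{
	return (v % 2) == 0;
}

static void sum_emit(int v)
{
	emitted_sum += v;
}
```

Calling walk_bucket() with is_even() and sum_emit() on a 40-entry bucket takes two passes under the lock: one full batch of 16, then a short batch that ends the walk.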
There's no TCP state for bound but otherwise inactive sockets. Such
sockets normally map to TCP_CLOSE. However, "ss -l", which is supposed
to only dump listening sockets, actually requests the kernel to dump
sockets in either the TCP_LISTEN or TCP_CLOSE states. To avoid dumping
bound-only sockets with "ss -l", we therefore need to define a new
pseudo-state (TCP_BOUND_INACTIVE) that user space will be able to set
explicitly.
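To make the state-mask argument concrete, here is a small standalone sketch. The ST_* enum is a local mirror of the values in include/net/tcp_states.h with this patch applied, and STF(), ss_listening_mask() and wants_bound_inactive() are illustrative names, not kernel or iproute2 functions. It shows that the mask requested by "ss -l" does not contain the new pseudo-state bit, so the bhash2 walk stays disabled for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the TCP state numbering (illustration only). */
enum {
	ST_ESTABLISHED = 1,
	ST_SYN_SENT,
	ST_SYN_RECV,
	ST_FIN_WAIT1,
	ST_FIN_WAIT2,
	ST_TIME_WAIT,
	ST_CLOSE,
	ST_CLOSE_WAIT,
	ST_LAST_ACK,
	ST_LISTEN,
	ST_CLOSING,
	ST_NEW_SYN_RECV,
	ST_BOUND_INACTIVE,	/* pseudo-state added by this patch */
	ST_MAX_STATES
};

static_assert(ST_BOUND_INACTIVE == 13, "pseudo-state follows NEW_SYN_RECV");

/* Same shape as the TCPF_* flags: one bit per state. */
#define STF(state)	(1u << (state))

/* Mask that "ss -l" asks for, as described in the commit message. */
static unsigned int ss_listening_mask(void)
{
	return STF(ST_LISTEN) | STF(ST_CLOSE);
}

/* Equivalent of the "idiag_states & TCPF_BOUND_INACTIVE" test that
 * guards the bhash2 walk.
 */
static bool wants_bound_inactive(unsigned int idiag_states)
{
	return (idiag_states & STF(ST_BOUND_INACTIVE)) != 0;
}
```

With this numbering, wants_bound_inactive(ss_listening_mask()) is false, while a request that explicitly sets STF(ST_BOUND_INACTIVE) enables the walk.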
With an IPv4, an IPv6 and an IPv6-only socket, bound respectively to
40000, 64000, 60000, an updated version of iproute2 could work as
follows:
$ ss -t state bound-inactive
Recv-Q Send-Q Local Address:Port Peer Address:Port Process
0 0 0.0.0.0:40000 0.0.0.0:*
0 0 [::]:60000 [::]:*
0 0 *:64000 *:*
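For anyone wanting to reproduce this kind of output, bound-only sockets can be created with a few lines of C. This is a minimal sketch and not part of the patch: make_bound_only_socket() and bound_port() are hypothetical helper names, and the sockets are bound to the wildcard address as in the example above. The helper binds a TCP socket and deliberately never calls listen() or connect().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to the wildcard address on the given port
 * (0 lets the kernel pick one). For AF_INET6, v6only selects an
 * IPv6-only socket. Returns the fd, or -1 on error.
 */
static int make_bound_only_socket(int family, int v6only, uint16_t port)
{
	int fd;

	assert(family == AF_INET || family == AF_INET6);

	fd = socket(family, SOCK_STREAM, IPPROTO_TCP);
	if (fd < 0)
		return -1;

	if (family == AF_INET6) {
		struct sockaddr_in6 a6;

		if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
			       &v6only, sizeof(v6only)) < 0)
			goto err;

		memset(&a6, 0, sizeof(a6));
		a6.sin6_family = AF_INET6;
		a6.sin6_addr = in6addr_any;
		a6.sin6_port = htons(port);
		if (bind(fd, (struct sockaddr *)&a6, sizeof(a6)) < 0)
			goto err;
	} else {
		struct sockaddr_in a4;

		memset(&a4, 0, sizeof(a4));
		a4.sin_family = AF_INET;
		a4.sin_addr.s_addr = htonl(INADDR_ANY);
		a4.sin_port = htons(port);
		if (bind(fd, (struct sockaddr *)&a4, sizeof(a4)) < 0)
			goto err;
	}

	/* No listen() or connect(): the socket stays bound but inactive. */
	return fd;

err:
	close(fd);
	return -1;
}

/* Return the local port the socket is bound to, or -1 on error. */
static int bound_port(int fd)
{
	struct sockaddr_storage ss;
	socklen_t len = sizeof(ss);

	if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
		return -1;
	if (ss.ss_family == AF_INET6)
		return ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
	return ntohs(((struct sockaddr_in *)&ss)->sin_port);
}
```

The three sockets of the example would correspond to make_bound_only_socket(AF_INET, 0, 40000), make_bound_only_socket(AF_INET6, 0, 64000) and make_bound_only_socket(AF_INET6, 1, 60000), kept open while ss runs.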
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
v3:
* Grab sockets with sock_hold(), instead of refcount_inc_not_zero()
(Kuniyuki Iwashima).
* Use a new TCP pseudo-state (TCP_BOUND_INACTIVE), to dump bound-only
sockets, so that "ss -l" won't print them (Eric Dumazet).
v2:
* Use ->bhash2 instead of ->bhash (Kuniyuki Iwashima).
* Process no more than 16 sockets at a time (Kuniyuki Iwashima).
include/net/tcp_states.h | 2 +
include/uapi/linux/bpf.h | 1 +
net/ipv4/inet_diag.c | 86 +++++++++++++++++++++++++++++++++++++++-
net/ipv4/tcp.c | 1 +
4 files changed, 89 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp_states.h b/include/net/tcp_states.h
index cc00118acca1..d60e8148ff4c 100644
--- a/include/net/tcp_states.h
+++ b/include/net/tcp_states.h
@@ -22,6 +22,7 @@ enum {
TCP_LISTEN,
TCP_CLOSING, /* Now a valid state */
TCP_NEW_SYN_RECV,
+ TCP_BOUND_INACTIVE, /* Pseudo-state for inet_diag */
TCP_MAX_STATES /* Leave at the end! */
};
@@ -43,6 +44,7 @@ enum {
TCPF_LISTEN = (1 << TCP_LISTEN),
TCPF_CLOSING = (1 << TCP_CLOSING),
TCPF_NEW_SYN_RECV = (1 << TCP_NEW_SYN_RECV),
+ TCPF_BOUND_INACTIVE = (1 << TCP_BOUND_INACTIVE),
};
#endif /* _LINUX_TCP_STATES_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7a5498242eaa..8ee2404d077c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6892,6 +6892,7 @@ enum {
BPF_TCP_LISTEN,
BPF_TCP_CLOSING, /* Now a valid state */
BPF_TCP_NEW_SYN_RECV,
+ BPF_TCP_BOUND_INACTIVE,
BPF_TCP_MAX_STATES /* Leave at the end! */
};
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 7d0e7aaa71e0..05fa0edd78b1 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1077,10 +1077,94 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
s_i = num = s_num = 0;
}
+/* Process a maximum of SKARR_SZ sockets at a time when walking hash buckets
+ * with bh disabled.
+ */
+#define SKARR_SZ 16
+
+ /* Dump bound but inactive (not listening, connecting, etc.) sockets */
+ if (cb->args[0] == 1) {
+ if (!(idiag_states & TCPF_BOUND_INACTIVE))
+ goto skip_bind_ht;
+
+ for (i = s_i; i < hashinfo->bhash_size; i++) {
+ struct inet_bind_hashbucket *ibb;
+ struct inet_bind2_bucket *tb2;
+ struct sock *sk_arr[SKARR_SZ];
+ int num_arr[SKARR_SZ];
+ int idx, accum, res;
+
+resume_bind_walk:
+ num = 0;
+ accum = 0;
+ ibb = &hashinfo->bhash2[i];
+
+ spin_lock_bh(&ibb->lock);
+ inet_bind_bucket_for_each(tb2, &ibb->chain) {
+ if (!net_eq(ib2_net(tb2), net))
+ continue;
+
+ sk_for_each_bound_bhash2(sk, &tb2->owners) {
+ struct inet_sock *inet = inet_sk(sk);
+
+ if (num < s_num)
+ goto next_bind;
+
+ if (sk->sk_state != TCP_CLOSE ||
+ !inet->inet_num)
+ goto next_bind;
+
+ if (r->sdiag_family != AF_UNSPEC &&
+ r->sdiag_family != sk->sk_family)
+ goto next_bind;
+
+ if (!inet_diag_bc_sk(bc, sk))
+ goto next_bind;
+
+ sock_hold(sk);
+ num_arr[accum] = num;
+ sk_arr[accum] = sk;
+ if (++accum == SKARR_SZ)
+ goto pause_bind_walk;
+next_bind:
+ num++;
+ }
+ }
+pause_bind_walk:
+ spin_unlock_bh(&ibb->lock);
+
+ res = 0;
+ for (idx = 0; idx < accum; idx++) {
+ if (res >= 0) {
+ res = inet_sk_diag_fill(sk_arr[idx],
+ NULL, skb, cb,
+ r, NLM_F_MULTI,
+ net_admin);
+ if (res < 0)
+ num = num_arr[idx];
+ }
+ sock_gen_put(sk_arr[idx]);
+ }
+ if (res < 0)
+ goto done;
+
+ cond_resched();
+
+ if (accum == SKARR_SZ) {
+ s_num = num + 1;
+ goto resume_bind_walk;
+ }
+
+ s_num = 0;
+ }
+skip_bind_ht:
+ cb->args[0] = 2;
+ s_i = num = s_num = 0;
+ }
+
if (!(idiag_states & ~TCPF_LISTEN))
goto out;
-#define SKARR_SZ 16
for (i = s_i; i <= hashinfo->ehash_mask; i++) {
struct inet_ehash_bucket *head = &hashinfo->ehash[i];
spinlock_t *lock = inet_ehash_lockp(hashinfo, i);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53bcc17c91e4..a100df07d34a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2605,6 +2605,7 @@ void tcp_set_state(struct sock *sk, int state)
BUILD_BUG_ON((int)BPF_TCP_LISTEN != (int)TCP_LISTEN);
BUILD_BUG_ON((int)BPF_TCP_CLOSING != (int)TCP_CLOSING);
BUILD_BUG_ON((int)BPF_TCP_NEW_SYN_RECV != (int)TCP_NEW_SYN_RECV);
+ BUILD_BUG_ON((int)BPF_TCP_BOUND_INACTIVE != (int)TCP_BOUND_INACTIVE);
BUILD_BUG_ON((int)BPF_TCP_MAX_STATES != (int)TCP_MAX_STATES);
/* bpf uapi header bpf.h defines an anonymous enum with values
--
2.39.2
* Re: [PATCH net-next v3] tcp: Dump bound-only sockets in inet_diag.
From: Guillaume Nault @ 2023-11-30 15:51 UTC
To: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, David Ahern, Kuniyuki Iwashima, Michal Kubecek
On Thu, Nov 30, 2023 at 04:40:51PM +0100, Guillaume Nault wrote:
> Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets
> that are bound but haven't yet called connect() or listen().
>
> The code is inspired by the ->lhash2 loop. However there's no manual
> test of the source port, since this kind of filtering is already
> handled by inet_diag_bc_sk(). Also, a maximum of 16 sockets are dumped
> at a time, to avoid running with bh disabled for too long.
>
> There's no TCP state for bound but otherwise inactive sockets. Such
> sockets normally map to TCP_CLOSE. However, "ss -l", which is supposed
> to only dump listening sockets, actually requests the kernel to dump
> sockets in either the TCP_LISTEN or TCP_CLOSE states. To avoid dumping
> bound-only sockets with "ss -l", we therefore need to define a new
> pseudo-state (TCP_BOUND_INACTIVE) that user space will be able to set
> explicitly.
>
> With an IPv4, an IPv6 and an IPv6-only socket, bound respectively to
> 40000, 64000, 60000, an updated version of iproute2 could work as
> follow:
>
> $ ss -t state bound-inactive
> Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> 0 0 0.0.0.0:40000 0.0.0.0:*
> 0 0 [::]:60000 [::]:*
> 0 0 *:64000 *:*
Here's a patch for iproute2-next for easy testing.
I'll submit it formally once the kernel side is in place.
-------- >8 --------
diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 073e9f03..4ece41fa 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -40,6 +40,10 @@ established connections) sockets.
.B \-l, \-\-listening
Display only listening sockets (these are omitted by default).
.TP
+.B \-B, \-\-bound-inactive
+Display only TCP bound but inactive (not listening, connecting, etc.) sockets
+(these are omitted by default).
+.TP
.B \-o, \-\-options
Show timer information. For TCP protocol, the output format is:
.RS
@@ -456,6 +460,9 @@ states except for
- opposite to
.B bucket
+.B bound-inactive
+- bound but otherwise inactive sockets (not listening, connecting, etc.)
+
.SH EXPRESSION
.B EXPRESSION
diff --git a/misc/ss.c b/misc/ss.c
index 9438382b..45f01286 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -210,6 +210,8 @@ enum {
SS_LAST_ACK,
SS_LISTEN,
SS_CLOSING,
+ SS_NEW_SYN_RECV,
+ SS_BOUND_INACTIVE,
SS_MAX
};
@@ -1381,6 +1383,8 @@ static void sock_state_print(struct sockstat *s)
[SS_LAST_ACK] = "LAST-ACK",
[SS_LISTEN] = "LISTEN",
[SS_CLOSING] = "CLOSING",
+ [SS_NEW_SYN_RECV] = "NEW-SYN-RECV",
+ [SS_BOUND_INACTIVE] = "BOUND-INACTIVE",
};
switch (s->local.family) {
@@ -5333,6 +5337,7 @@ static void _usage(FILE *dest)
" -r, --resolve resolve host names\n"
" -a, --all display all sockets\n"
" -l, --listening display listening sockets\n"
+" -B, --bound-inactive display TCP bound but inactive sockets\n"
" -o, --options show timer information\n"
" -e, --extended show detailed socket information\n"
" -m, --memory show socket memory usage\n"
@@ -5415,6 +5420,8 @@ static int scan_state(const char *state)
[SS_LAST_ACK] = "last-ack",
[SS_LISTEN] = "listening",
[SS_CLOSING] = "closing",
+ [SS_NEW_SYN_RECV] = "new-syn-recv",
+ [SS_BOUND_INACTIVE] = "bound-inactive",
};
int i;
@@ -5481,6 +5488,7 @@ static const struct option long_opts[] = {
{ "vsock", 0, 0, OPT_VSOCK },
{ "all", 0, 0, 'a' },
{ "listening", 0, 0, 'l' },
+ { "bound-inactive", 0, 0, 'B' },
{ "ipv4", 0, 0, '4' },
{ "ipv6", 0, 0, '6' },
{ "packet", 0, 0, '0' },
@@ -5519,7 +5527,7 @@ int main(int argc, char *argv[])
int state_filter = 0;
while ((ch = getopt_long(argc, argv,
- "dhaletuwxnro460spTbEf:mMiA:D:F:vVzZN:KHSO",
+ "dhalBetuwxnro460spTbEf:mMiA:D:F:vVzZN:KHSO",
long_opts, NULL)) != EOF) {
switch (ch) {
case 'n':
@@ -5584,6 +5592,9 @@ int main(int argc, char *argv[])
case 'l':
state_filter = (1 << SS_LISTEN) | (1 << SS_CLOSE);
break;
+ case 'B':
+ state_filter = 1 << SS_BOUND_INACTIVE;
+ break;
case '4':
filter_af_set(&current_filter, AF_INET);
break;
* Re: [PATCH net-next v3] tcp: Dump bound-only sockets in inet_diag.
From: Eric Dumazet @ 2023-11-30 16:17 UTC
To: Guillaume Nault
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev, David Ahern,
Kuniyuki Iwashima, Michal Kubecek
On Thu, Nov 30, 2023 at 4:40 PM Guillaume Nault <gnault@redhat.com> wrote:
>
> Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets
> that are bound but haven't yet called connect() or listen().
>
> The code is inspired by the ->lhash2 loop. However there's no manual
> test of the source port, since this kind of filtering is already
> handled by inet_diag_bc_sk(). Also, a maximum of 16 sockets are dumped
> at a time, to avoid running with bh disabled for too long.
>
> There's no TCP state for bound but otherwise inactive sockets. Such
> sockets normally map to TCP_CLOSE. However, "ss -l", which is supposed
> to only dump listening sockets, actually requests the kernel to dump
> sockets in either the TCP_LISTEN or TCP_CLOSE states. To avoid dumping
> bound-only sockets with "ss -l", we therefore need to define a new
> pseudo-state (TCP_BOUND_INACTIVE) that user space will be able to set
> explicitly.
>
> With an IPv4, an IPv6 and an IPv6-only socket, bound respectively to
> 40000, 64000, 60000, an updated version of iproute2 could work as
> follow:
>
> $ ss -t state bound-inactive
> Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> 0 0 0.0.0.0:40000 0.0.0.0:*
> 0 0 [::]:60000 [::]:*
> 0 0 *:64000 *:*
>
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>
> v3:
> * Grab sockets with sock_hold(), instead of refcount_inc_not_zero()
> (Kuniyuki Iwashima).
> * Use a new TCP pseudo-state (TCP_BOUND_INACTIVE), to dump bound-only
> sockets, so that "ss -l" won't print them (Eric Dumazet).
>
> +pause_bind_walk:
> + spin_unlock_bh(&ibb->lock);
> +
> + res = 0;
> + for (idx = 0; idx < accum; idx++) {
> + if (res >= 0) {
> + res = inet_sk_diag_fill(sk_arr[idx],
> + NULL, skb, cb,
> + r, NLM_F_MULTI,
> + net_admin);
> + if (res < 0)
> + num = num_arr[idx];
> + }
> + sock_gen_put(sk_arr[idx]);
nit: this could be a mere sock_put(), because only full sockets are
hashed in bhash2[]
Reviewed-by: Eric Dumazet <edumazet@google.com>
* Re: [PATCH net-next v3] tcp: Dump bound-only sockets in inet_diag.
From: Guillaume Nault @ 2023-11-30 16:30 UTC
To: Eric Dumazet
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev, David Ahern,
Kuniyuki Iwashima, Michal Kubecek
On Thu, Nov 30, 2023 at 05:17:57PM +0100, Eric Dumazet wrote:
> On Thu, Nov 30, 2023 at 4:40 PM Guillaume Nault <gnault@redhat.com> wrote:
> >
> > Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets
> > that are bound but haven't yet called connect() or listen().
> >
> > The code is inspired by the ->lhash2 loop. However there's no manual
> > test of the source port, since this kind of filtering is already
> > handled by inet_diag_bc_sk(). Also, a maximum of 16 sockets are dumped
> > at a time, to avoid running with bh disabled for too long.
> >
> > There's no TCP state for bound but otherwise inactive sockets. Such
> > sockets normally map to TCP_CLOSE. However, "ss -l", which is supposed
> > to only dump listening sockets, actually requests the kernel to dump
> > sockets in either the TCP_LISTEN or TCP_CLOSE states. To avoid dumping
> > bound-only sockets with "ss -l", we therefore need to define a new
> > pseudo-state (TCP_BOUND_INACTIVE) that user space will be able to set
> > explicitly.
> >
> > With an IPv4, an IPv6 and an IPv6-only socket, bound respectively to
> > 40000, 64000, 60000, an updated version of iproute2 could work as
> > follow:
> >
> > $ ss -t state bound-inactive
> > Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> > 0 0 0.0.0.0:40000 0.0.0.0:*
> > 0 0 [::]:60000 [::]:*
> > 0 0 *:64000 *:*
> >
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > ---
> >
> > v3:
> > * Grab sockets with sock_hold(), instead of refcount_inc_not_zero()
> > (Kuniyuki Iwashima).
> > * Use a new TCP pseudo-state (TCP_BOUND_INACTIVE), to dump bound-only
> > sockets, so that "ss -l" won't print them (Eric Dumazet).
> >
>
>
> > +pause_bind_walk:
> > + spin_unlock_bh(&ibb->lock);
> > +
> > + res = 0;
> > + for (idx = 0; idx < accum; idx++) {
> > + if (res >= 0) {
> > + res = inet_sk_diag_fill(sk_arr[idx],
> > + NULL, skb, cb,
> > + r, NLM_F_MULTI,
> > + net_admin);
> > + if (res < 0)
> > + num = num_arr[idx];
> > + }
> > + sock_gen_put(sk_arr[idx]);
>
> nit: this could be a mere sock_put(), because only full sockets are
> hashed in bhash2[]
Yes, makes sense.
I'll send a v4.
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>