* tcp_diag for all network namespaces?
@ 2024-12-09 19:24 dave seddon
2024-12-10 2:00 ` Kuniyuki Iwashima
2024-12-10 5:17 ` Cong Wang
0 siblings, 2 replies; 7+ messages in thread
From: dave seddon @ 2024-12-09 19:24 UTC (permalink / raw)
To: netdev
G'day,
Short
Is there a way to extract tcp_diag socket data for all sockets from
all network name spaces please?
Background
I've been using tcp_diag to dump out TCP socket performance every
minute and then stream the data via Kafka and then into a Clickhouse
database. This is awesome for socket performance monitoring.
Kubernetes
I'd like to adapt this solution to <somehow> allow monitoring of
kubernetes clusters, so that it would be possible to monitor the
socket performance of all pods. Ideally, a single process could open
a netlink socket into each network namespace, but currently that isn't
possible.
Would it be crazy to add a new feature to the kernel to allow dumping
all sockets from all name spaces?
Maybe I'm missing some other better option(s)?
Thanks in advance
--
Regards,
Dave Seddon
* Re: tcp_diag for all network namespaces?
2024-12-09 19:24 tcp_diag for all network namespaces? dave seddon
@ 2024-12-10 2:00 ` Kuniyuki Iwashima
2024-12-19 18:11 ` Martin KaFai Lau
2024-12-10 5:17 ` Cong Wang
1 sibling, 1 reply; 7+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-10 2:00 UTC (permalink / raw)
To: dave.seddon.ca; +Cc: netdev, kuniyu
From: dave seddon <dave.seddon.ca@gmail.com>
Date: Mon, 9 Dec 2024 11:24:18 -0800
> G'day,
>
> Short
> Is there a way to extract tcp_diag socket data for all sockets from
> all network name spaces please?
I think there's no such interface.
I remember there was a similar request for TCP BPF iterator,
but now it's difficult because each netns could have its own
TCP hash table for established connections.
>
> Background
> I've been using tcp_diag to dump out TCP socket performance every
> minute and then stream the data via Kafka and then into a Clickhouse
> database. This is awesome for socket performance monitoring.
>
> Kubernetes
> I'd like to adapt this solution to <somehow> allow monitoring of
> kubernetes clusters, so that it would be possible to monitor the
> socket performance of all pods. Ideally, a single process could open
> a netlink socket into each network namespace, but currently that isn't
> possible.
>
> Would it be crazy to add a new feature to the kernel to allow dumping
> all sockets from all name spaces?
Iterating over netns in userspace is much simpler than doing it in the
kernel, which would need to iterate net_namespace_list under net_rwsem
and remember the last netns with its refcount bumped.
>
> Maybe I'm missing some other better option(s)?
>
> Thanks in advance
* Re: tcp_diag for all network namespaces?
2024-12-09 19:24 tcp_diag for all network namespaces? dave seddon
2024-12-10 2:00 ` Kuniyuki Iwashima
@ 2024-12-10 5:17 ` Cong Wang
2024-12-10 20:45 ` Jay Vosburgh
1 sibling, 1 reply; 7+ messages in thread
From: Cong Wang @ 2024-12-10 5:17 UTC (permalink / raw)
To: dave seddon; +Cc: netdev
On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
> G'day,
>
> Short
> Is there a way to extract tcp_diag socket data for all sockets from
> all network name spaces please?
>
> Background
> I've been using tcp_diag to dump out TCP socket performance every
> minute and then stream the data via Kafka and then into a Clickhouse
> database. This is awesome for socket performance monitoring.
>
> Kubernetes
> I'd like to adapt this solution to <somehow> allow monitoring of
> kubernetes clusters, so that it would be possible to monitor the
> socket performance of all pods. Ideally, a single process could open
> a netlink socket into each network namespace, but currently that isn't
> possible.
>
> Would it be crazy to add a new feature to the kernel to allow dumping
> all sockets from all name spaces?
You are already able to do so in user-space, something like:
for ns in $(ip netns list | cut -d' ' -f1); do
    ip netns exec "$ns" ss -tapn
done
(If you use API, you can find equivalent API's)
Thanks.
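A slightly fuller sketch of the loop above, for feeding a collector rather than a terminal. It assumes every namespace of interest was created with "ip netns add", that the script runs as root, and that the installed ss supports -H (no header) and -i (internal TCP info). The helper names netns_names, tag_lines and dump_all_named_netns are made up for illustration.

```shell
#!/bin/bash
# Sketch only: dump TCP sockets from every *named* netns and tag each
# output line with the netns it came from. Helper names are hypothetical.

# Reduce "ip netns list" output to bare names. Each input line is a name,
# optionally followed by " (id: N)"; blank lines are skipped.
netns_names() {
    awk 'NF { print $1 }'
}

# Prefix every line read from stdin with "<tag><TAB>".
tag_lines() {
    local tag=$1 line
    while IFS= read -r line; do
        printf '%s\t%s\n' "$tag" "$line"
    done
}

# Walk all named namespaces and print their tagged socket dumps.
dump_all_named_netns() {
    local ns out
    ip netns list | netns_names | while IFS= read -r ns; do
        # </dev/null keeps the child from consuming the loop's stdin.
        if out=$(ip netns exec "$ns" ss -H -tani </dev/null); then
            if [ -n "$out" ]; then
                printf '%s\n' "$out" | tag_lines "$ns"
            fi
        else
            echo "warning: could not dump netns $ns" >&2
        fi
    done
}
```

Sourcing the file and calling dump_all_named_netns as root prints one tagged line per line of ss output. Namespaces that were not created through iproute2 are not seen by this loop.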
* Re: tcp_diag for all network namespaces?
2024-12-10 5:17 ` Cong Wang
@ 2024-12-10 20:45 ` Jay Vosburgh
2024-12-11 6:35 ` Xiao Liang
0 siblings, 1 reply; 7+ messages in thread
From: Jay Vosburgh @ 2024-12-10 20:45 UTC (permalink / raw)
To: Cong Wang; +Cc: dave seddon, netdev
Cong Wang <xiyou.wangcong@gmail.com> wrote:
>On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
>> G'day,
>>
>> Short
>> Is there a way to extract tcp_diag socket data for all sockets from
>> all network name spaces please?
>>
>> Background
>> I've been using tcp_diag to dump out TCP socket performance every
>> minute and then stream the data via Kafka and then into a Clickhouse
>> database. This is awesome for socket performance monitoring.
>>
>> Kubernetes
>> I'd like to adapt this solution to <somehow> allow monitoring of
>> kubernetes clusters, so that it would be possible to monitor the
>> socket performance of all pods. Ideally, a single process could open
>> a netlink socket into each network namespace, but currently that isn't
>> possible.
>>
>> Would it be crazy to add a new feature to the kernel to allow dumping
>> all sockets from all name spaces?
>
>You are already able to do so in user-space, something like:
>
>for ns in $(ip netns list | cut -d' ' -f1); do
> ip netns exec $ns ss -tapn
>done
>
>(If you use API, you can find equivalent API's)
FWIW, if any namespaces weren't created through /sbin/ip, then
something like the following works as well:
#!/bin/bash

nspidlist=`lsns -t net -o pid -n`

for p in ${nspidlist}; do
	lsns -p ${p} -t net
	nsenter -n -t ${p} ss -tapn
done
-J
---
-Jay Vosburgh, jv@jvosburgh.net
* Re: tcp_diag for all network namespaces?
2024-12-10 20:45 ` Jay Vosburgh
@ 2024-12-11 6:35 ` Xiao Liang
2024-12-11 17:28 ` Cong Wang
0 siblings, 1 reply; 7+ messages in thread
From: Xiao Liang @ 2024-12-11 6:35 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Cong Wang, dave seddon, netdev, Kuniyuki Iwashima
On Wed, Dec 11, 2024 at 1:43 PM Jay Vosburgh <jv@jvosburgh.net> wrote:
>
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> >On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
> >> G'day,
> >>
> >> Short
> >> Is there a way to extract tcp_diag socket data for all sockets from
> >> all network name spaces please?
> >>
> >> Background
> >> I've been using tcp_diag to dump out TCP socket performance every
> >> minute and then stream the data via Kafka and then into a Clickhouse
> >> database. This is awesome for socket performance monitoring.
> >>
> >> Kubernetes
> >> I'd like to adapt this solution to <somehow> allow monitoring of
> >> kubernetes clusters, so that it would be possible to monitor the
> >> socket performance of all pods. Ideally, a single process could open
> >> a netlink socket into each network namespace, but currently that isn't
> >> possible.
> >>
> >> Would it be crazy to add a new feature to the kernel to allow dumping
> >> all sockets from all name spaces?
> >
> >You are already able to do so in user-space, something like:
> >
> >for ns in $(ip netns list | cut -d' ' -f1); do
> > ip netns exec $ns ss -tapn
> >done
> >
> >(If you use API, you can find equivalent API's)
>
> FWIW, if any namespaces weren't created through /sbin/ip, then
> something like the following works as well:
>
> #!/bin/bash
>
> nspidlist=`lsns -t net -o pid -n`
>
> for p in ${nspidlist}; do
> lsns -p ${p} -t net
> nsenter -n -t ${p} ss -tapn
> done
I think neither iproute2 nor lsns can actually list all net namespaces.
iproute2 uses mounts under /run/netns by default, and lsns iterates
through processes. But there are more ways to hold a reference to
netns: open fds, sockets, and files hidden in mnt namespaces...
Consider if we move an interface to a netns, and some process
creates a socket in that ns and switches back to init ns. Then when
we delete it with "ip netns delete", the interface and ns are lost from
userspace. It's hard to troubleshoot.
I haven't found a way to enumerate net namespaces reliably. Maybe
we can have an API to list namespaces in net_namespace_list, and
allow processes to open an ns file by inum?
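To make the gap concrete, here is a rough userspace sketch that merges the two sources mentioned above (bind mounts under /run/netns and /proc/<pid>/ns/net) and de-duplicates them by nsfs inode number. It assumes root, GNU stat, and an nsenter that accepts --net=<file>; the function names are made up for illustration. It still cannot see a netns that is held only by an open fd, a socket, or a mount hidden in another mnt namespace, so it narrows the problem rather than solving it.

```shell
#!/bin/bash
# Sketch only: enumerate the netns handles visible from this mount and
# PID namespace, keeping one path per namespace. Names are hypothetical.

# Print "<inode> <path>" for every netns handle we can stat.
list_netns_handles() {
    local f
    for f in /run/netns/* /proc/[0-9]*/ns/net; do
        # -L follows the bind mount or magic symlink to the nsfs inode.
        if [ -e "$f" ]; then
            stat -L -c '%i %n' "$f" 2>/dev/null
        fi
    done
}

# Keep only the first path seen for each inode number (field 1).
unique_by_inode() {
    awk '!seen[$1]++'
}

# Enter each distinct netns once and dump its TCP sockets.
dump_all_visible_netns() {
    local inode path
    list_netns_handles | unique_by_inode | while read -r inode path; do
        echo "# netns inode=$inode via $path"
        # </dev/null keeps the child from consuming the loop's stdin.
        nsenter --net="$path" ss -H -tani </dev/null
    done
}
```

Calling dump_all_visible_netns as root prints one header line per distinct namespace followed by that namespace's ss output.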
>
> -J
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
>
* Re: tcp_diag for all network namespaces?
2024-12-11 6:35 ` Xiao Liang
@ 2024-12-11 17:28 ` Cong Wang
0 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2024-12-11 17:28 UTC (permalink / raw)
To: Xiao Liang; +Cc: Jay Vosburgh, dave seddon, netdev, Kuniyuki Iwashima
On Wed, Dec 11, 2024 at 02:35:16PM +0800, Xiao Liang wrote:
> On Wed, Dec 11, 2024 at 1:43 PM Jay Vosburgh <jv@jvosburgh.net> wrote:
> >
> > Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > >On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
> > >> G'day,
> > >>
> > >> Short
> > >> Is there a way to extract tcp_diag socket data for all sockets from
> > >> all network name spaces please?
> > >>
> > >> Background
> > >> I've been using tcp_diag to dump out TCP socket performance every
> > >> minute and then stream the data via Kafka and then into a Clickhouse
> > >> database. This is awesome for socket performance monitoring.
> > >>
> > >> Kubernetes
> > >> I'd like to adapt this solution to <somehow> allow monitoring of
> > >> kubernetes clusters, so that it would be possible to monitor the
> > >> socket performance of all pods. Ideally, a single process could open
> > >> a netlink socket into each network namespace, but currently that isn't
> > >> possible.
> > >>
> > >> Would it be crazy to add a new feature to the kernel to allow dumping
> > >> all sockets from all name spaces?
> > >
> > >You are already able to do so in user-space, something like:
> > >
> > >for ns in $(ip netns list | cut -d' ' -f1); do
> > > ip netns exec $ns ss -tapn
> > >done
> > >
> > >(If you use API, you can find equivalent API's)
> >
> > FWIW, if any namespaces weren't created through /sbin/ip, then
> > something like the following works as well:
> >
> > #!/bin/bash
> >
> > nspidlist=`lsns -t net -o pid -n`
> >
> > for p in ${nspidlist}; do
> > lsns -p ${p} -t net
> > nsenter -n -t ${p} ss -tapn
> > done
>
> I think neither iproute2 nor lsns can actually list all net namespaces.
> iproute2 uses mounts under /run/netns by default, and lsns iterates
> through processes. But there are more ways to hold a reference to
> netns: open fds, sockets, and files hidden in mnt namespaces...
Do you really need that accuracy? Dumping just provides a snapshot; it
is by definition not accurate.
>
> Consider if we move an interface to a netns, and some process
> creates a socket in that ns and switches back to init ns. Then when
> we delete it with "ip netns delete", the interface and ns are lost from
> userspace. It's hard to troubleshoot.
You can also use tracing tools like bpftrace to troubleshoot cases like
yours; dumping is not the only way.
>
> I haven't found a way to enumerate net namespaces reliably. Maybe
> we can have an API to list namespaces in net_namespace_list, and
> allow processes to open an ns file by inum?
>
If you have a solid and real use case, maybe.
Thanks.
* Re: tcp_diag for all network namespaces?
2024-12-10 2:00 ` Kuniyuki Iwashima
@ 2024-12-19 18:11 ` Martin KaFai Lau
0 siblings, 0 replies; 7+ messages in thread
From: Martin KaFai Lau @ 2024-12-19 18:11 UTC (permalink / raw)
To: Kuniyuki Iwashima, dave.seddon.ca; +Cc: netdev, bpf
On 12/9/24 6:00 PM, Kuniyuki Iwashima wrote:
>> G'day,
>>
>> Short
>> Is there a way to extract tcp_diag socket data for all sockets from
>> all network name spaces please?
> I think there's no such interface.
>
> I remember there was a similar request for TCP BPF iterator,
> but now it's difficult because each netns could have its own
> TCP hash table for established connections.
It would be nice to be able to iterate netns in bpf. There is a bpf task/file
iterator that iterates tasks and all files under each task
(tools/testing/selftests/bpf/progs/bpf_iter_task_file.c). The netns/sock
iteration feels similar. The first step could be to allow a bpf prog to iterate
all netns, which would then let bpf inspect "struct net". There is also the
newer open-coded iterator approach in bpf, which should be considered as well.