* tcp_diag for all network namespaces?
From: dave seddon @ 2024-12-09 19:24 UTC
To: netdev

G'day,

Short
Is there a way to extract tcp_diag socket data for all sockets from
all network name spaces please?

Background
I've been using tcp_diag to dump out TCP socket performance every
minute and then stream the data via Kafka and then into a Clickhouse
database. This is awesome for socket performance monitoring.

Kubernetes
I'd like to adapt this solution to <somehow> allow monitoring of
kubernetes clusters, so that it would be possible to monitor the
socket performance of all pods. Ideally, a single process could open
a netlink socket into each network namespace, but currently that isn't
possible.

Would it be crazy to add a new feature to the kernel to allow dumping
all sockets from all name spaces?

Maybe I'm missing some other better option(s)?

Thanks in advance
--
Regards,
Dave Seddon
* Re: tcp_diag for all network namespaces?
From: Kuniyuki Iwashima @ 2024-12-10 2:00 UTC
To: dave.seddon.ca; +Cc: netdev, kuniyu

From: dave seddon <dave.seddon.ca@gmail.com>
Date: Mon, 9 Dec 2024 11:24:18 -0800
> G'day,
>
> Short
> Is there a way to extract tcp_diag socket data for all sockets from
> all network name spaces please?

I think there's no such interface.

I remember there was a similar request for TCP BPF iterator,
but now it's difficult because each netns could have its own
TCP hash table for established connections.

> [...]
> Would it be crazy to add a new feature to the kernel to allow dumping
> all sockets from all name spaces?

Iterating netns in userspace is much simpler than doing it in the
kernel, which needs to iterate net_namespace_list under net_rwsem and
remember the last netns with its refcount bumped.
* Re: tcp_diag for all network namespaces?
From: Martin KaFai Lau @ 2024-12-19 18:11 UTC
To: Kuniyuki Iwashima, dave.seddon.ca; +Cc: netdev, bpf

On 12/9/24 6:00 PM, Kuniyuki Iwashima wrote:
>> G'day,
>>
>> Short
>> Is there a way to extract tcp_diag socket data for all sockets from
>> all network name spaces please?
> I think there's no such interface.
>
> I remember there was a similar request for TCP BPF iterator,
> but now it's difficult because each netns could have its own
> TCP hash table for established connections.

It would be nice to be able to iterate netns in bpf.

There is a bpf task/file iterator that iterates tasks and all files
under each task (tools/testing/selftests/bpf/progs/bpf_iter_task_file.c).
The netns/sock iteration feels similar.

The first step could be to allow a bpf prog to iterate all netns.
That would then allow bpf to inspect "struct net".

There is also a newer open iterator approach in bpf, which should
also be considered.
* Re: tcp_diag for all network namespaces?
From: Cong Wang @ 2024-12-10 5:17 UTC
To: dave seddon; +Cc: netdev

On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
> G'day,
>
> Short
> Is there a way to extract tcp_diag socket data for all sockets from
> all network name spaces please?
>
> [...]
>
> Would it be crazy to add a new feature to the kernel to allow dumping
> all sockets from all name spaces?

You are already able to do so in user-space, something like:

for ns in $(ip netns list | cut -d' ' -f1); do
    ip netns exec $ns ss -tapn
done

(If you use an API, you can find equivalent APIs.)

Thanks.
* Re: tcp_diag for all network namespaces?
From: Jay Vosburgh @ 2024-12-10 20:45 UTC
To: Cong Wang; +Cc: dave seddon, netdev

Cong Wang <xiyou.wangcong@gmail.com> wrote:

>On Mon, Dec 09, 2024 at 11:24:18AM -0800, dave seddon wrote:
>> [...]
>> Would it be crazy to add a new feature to the kernel to allow dumping
>> all sockets from all name spaces?
>
>You are already able to do so in user-space, something like:
>
>for ns in $(ip netns list | cut -d' ' -f1); do
>    ip netns exec $ns ss -tapn
>done

FWIW, if any namespaces weren't created through /sbin/ip, then
something like the following works as well:

#!/bin/bash

nspidlist=`lsns -t net -o pid -n`

for p in ${nspidlist}; do
    lsns -p ${p} -t net
    nsenter -n -t ${p} ss -tapn
done

    -J

---
    -Jay Vosburgh, jv@jvosburgh.net
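For the per-minute Kafka/Clickhouse pipeline in the original question, a loop like Jay's needs to label each row with the namespace it came from. The following is a minimal sketch, not a tested tool. It assumes root, util-linux `lsns`/`nsenter`, and an iproute2 `ss` recent enough to support `-H` (no header) and `-O` (one line per socket). `tag_with_ns` and `snapshot_all_netns` are hypothetical helper names. Mapping a netns inode back to a Kubernetes pod is not attempted here.

```shell
#!/bin/bash
# Sketch: one ss snapshot per network namespace, each output row prefixed
# with a Unix timestamp and the netns inode so rows can be grouped downstream.
# Assumes root, util-linux lsns/nsenter, and an ss that supports -H and -O.

# tag_with_ns PREFIX: hypothetical helper; prefix every stdin line with PREFIX
tag_with_ns() {
    local prefix=$1 line
    while IFS= read -r line; do
        printf '%s %s\n' "$prefix" "$line"
    done
}

# snapshot_all_netns: hypothetical helper; dump TCP sockets (with tcp_info)
# from every namespace that lsns can see through a live process
snapshot_all_netns() {
    local ts ns pid
    ts=$(date +%s)
    # -n: no header; one row per namespace with its inode and one member PID
    lsns -t net -n -o NS,PID | while read -r ns pid; do
        # </dev/null keeps nsenter/ss from reading the lsns pipe
        nsenter -n -t "$pid" ss -t -i -n -H -O </dev/null 2>/dev/null \
            | tag_with_ns "$ts $ns"
    done
}

# Entering other namespaces needs privileges, so only dump when run as root.
if (( EUID == 0 )); then
    snapshot_all_netns
fi
```

Because each socket is printed on a single line with `-O`, every line that reaches the consumer is one self-describing record of the form `TIMESTAMP NETNS_INODE <ss fields...>`.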
* Re: tcp_diag for all network namespaces?
From: Xiao Liang @ 2024-12-11 6:35 UTC
To: Jay Vosburgh; +Cc: Cong Wang, dave seddon, netdev, Kuniyuki Iwashima

On Wed, Dec 11, 2024 at 1:43 PM Jay Vosburgh <jv@jvosburgh.net> wrote:
>
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> [...]
>
> FWIW, if any namespaces weren't created through /sbin/ip, then
> something like the following works as well:
>
> #!/bin/bash
>
> nspidlist=`lsns -t net -o pid -n`
>
> for p in ${nspidlist}; do
>     lsns -p ${p} -t net
>     nsenter -n -t ${p} ss -tapn
> done

I think neither iproute2 nor lsns can actually list all net namespaces.
iproute2 uses mounts under /run/netns by default, and lsns iterates
through processes. But there are more ways to hold a reference to
netns: open fds, sockets, and files hidden in mnt namespaces...

Consider if we move an interface to a netns, and some process
creates a socket in that ns and switches back to init ns. Then when
we delete it with "ip netns delete", the interface and ns are lost from
userspace. It's hard to troubleshoot.

I haven't found a way to enumerate net namespaces reliably. Maybe
we can have an API to list namespaces in net_namespace_list, and
allow processes to open an ns file by inum?
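Part of the gap Xiao describes can be narrowed from userspace. A namespace kept alive only by an open file descriptor still shows up as a `net:[INODE]` symlink under /proc/PID/fd, so walking every task's ns/net link and every fd, then deduplicating by inode, can find namespaces that a walk over /proc/PID/ns/net alone would miss. This is a hedged sketch assuming root and util-linux `nsenter`. `seen` and `add_ns` are hypothetical names. It still cannot see namespaces pinned only by a socket or by a bind mount in another mount namespace. It also forks one `readlink` per fd, so it suits one-off inspection better than per-minute sampling.

```shell
#!/bin/bash
# Sketch: enumerate network namespaces reachable through /proc, keyed by
# namespace inode, including ones pinned only by an open fd.
# Assumes root (to read other processes' task and fd entries).
# Does NOT find namespaces pinned only by sockets, or by bind mounts that
# live in other mount namespaces.

declare -A seen   # netns inode -> one path usable with nsenter --net=PATH

# add_ns PATH: hypothetical helper; if PATH is a link of the form
# net:[INODE] and INODE is new, remember PATH for that inode
add_ns() {
    local link ino
    link=$(readlink "$1" 2>/dev/null) || return 0
    [[ $link == net:\[*\] ]] || return 0
    ino=${link#net:\[}
    ino=${ino%\]}
    [[ -n ${seen[$ino]} ]] || seen[$ino]=$1
}

# Every thread's current network namespace (threads can differ via setns)
for task in /proc/[0-9]*/task/[0-9]*; do
    add_ns "$task/ns/net"
done

# Namespaces held open through file descriptors
for fd in /proc/[0-9]*/fd/*; do
    add_ns "$fd"
done

# Dump TCP sockets once per distinct namespace; failures are non-fatal
for ino in "${!seen[@]}"; do
    echo "== netns $ino (via ${seen[$ino]})"
    nsenter --net="${seen[$ino]}" ss -tapn
done
```

Keying on the inode rather than on a PID or name means the same namespace reached through several processes or fds is dumped only once.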
* Re: tcp_diag for all network namespaces?
From: Cong Wang @ 2024-12-11 17:28 UTC
To: Xiao Liang; +Cc: Jay Vosburgh, dave seddon, netdev, Kuniyuki Iwashima

On Wed, Dec 11, 2024 at 02:35:16PM +0800, Xiao Liang wrote:
> [...]
> I think neither iproute2 nor lsns can actually list all net namespaces.
> iproute2 uses mounts under /run/netns by default, and lsns iterates
> through processes. But there are more ways to hold a reference to
> netns: open fds, sockets, and files hidden in mnt namespaces...

Do you really need that accuracy? Dumping only provides a snapshot;
it is by definition not accurate.

>
> Consider if we move an interface to a netns, and some process
> creates a socket in that ns and switches back to init ns. Then when
> we delete it with "ip netns delete", the interface and ns are lost from
> userspace. It's hard to troubleshoot.

You can also use tracing tools like bpftrace for troubleshooting
cases like yours; dumping is not the only way.

>
> I haven't found a way to enumerate net namespaces reliably. Maybe
> we can have an API to list namespaces in net_namespace_list, and
> allow processes to open an ns file by inum?
>

If you have a solid and real use case, maybe.

Thanks.