From: Ido Schimmel <idosch@nvidia.com>
To: Ferenc Fejes <ferenc@fejes.dev>
Cc: dsahern@gmail.com, netdev <netdev@vger.kernel.org>, kuniyu@amazon.com
Subject: Re: [question] robust netns association with fib4 lookup
Date: Mon, 28 Apr 2025 18:35:18 +0300
Message-ID: <aA-gNpCWG2XJaf-X@shredder>
In-Reply-To: <2eb4b72dc5578407715e91f87116d2385598fa82.camel@fejes.dev>

On Mon, Apr 28, 2025 at 12:20:06PM +0200, Ferenc Fejes wrote:
> On Fri, 2025-04-25 at 21:17 +0300, Ido Schimmel wrote:
> > On Thu, Apr 24, 2025 at 01:33:08PM +0200, Ferenc Fejes wrote:
> > > Hi,
> > >
> > > tl;dr: I want to trace fib4 lookups within a network namespace with eBPF.
> > > This works well with fib6, as the struct net ptr is passed as an argument
> > > to fib6_table_lookup [0], so I can read the inode from it and pass it to
> > > userspace.
> > >
> > >
> > > Additional context. I'm working on a fib table and fib rule lookup tracer
> > > application that hooks fib_table_lookup/fib6_table_lookup and
> > > fib_rules_lookup with fexit eBPF probes and gathers useful data from the
> > > struct flowi4 and flowi6 used for the lookup as well as the resulting
> > > nexthop (gw, seg6, mpls tunnel) if the lookup is successful. If this works,
> > > my plan is to extend it to neighbour, fdb and mdb lookups.
> > >
> > > Tracepoints exist for fib lookups v4 [1] and v6 [2] but in my tracer I would
> > > like to have netns filtering. For example: "check unsuccessful fib4 rule and
> > > table lookups in netns foo". Unfortunately I can't find a reliable way to
> > > associate netns info with fib4 lookups. The main problems are as follows.
> > >
> > > Unlike fib6_table_lookup for v6, fib_table_lookup for v4 does not have a
> > > struct net argument. This makes sense, as struct net is not needed there.
> > > But without it, the netns association is not as easy as in the v6 case.
> > >
> > > On the other hand, fib_lookup [3], which in most cases calls
> > > fib_table_lookup, has a struct net parameter. Even better, there is the
> > > struct fib_result ptr returned by fib_table_lookup. This would be the
> > > perfect candidate to hook into, but unfortunately it is an inline function.
> > >
> > > If there are custom fib rules in the netns, __fib_lookup [4] is called,
> > > which is hookable. This has all the necessary info like netns, table and
> > > result. To use this I have to add the custom rule to the traced netns and
> > > remove it immediately. This will enforce the __fib_lookup codepath. But I
> > > feel that at some point this bug(?) will be fixed and the kernel will
> > > notice the absence of custom rules and switch back to the original
> > > codepath.
> > >
> > > But this option is useless for tracing unsuccessful lookups. The stack looks
> > > like this:
> > > __fib_lookup <-- netns info available
> > > fib_rules_lookup <-- losing netns info... :-(
> > > fib4_rule_action <-- unsuccessful result available
> > > fib_table_lookup <-- source of unsuccessful result
> > >
> > > My current workaround is to restore the netns info using the struct flowi4
> > > pointer. When we have the stack above, I use an eBPF hashmap with the
> > > flowi4 pointer as the key and the netns as the value. Then in
> > > fib_table_lookup I look up the netns id based on the value of the flowi4
> > > pointer. Since this is the common case, it works, but it looks like
> > > fib_table_lookup is called from other places as well (even if it's rare).
> > >
> > > Is there any other way to get the netns info for fib4 lookups? If not,
> > > would it be worth an RFC to pass the struct net argument to
> > > fib_table_lookup as well, as is currently done in fib6_table_lookup?
> >
> > I think it makes sense to make both tracepoints similar and pass the net
> > argument to trace_fib_table_lookup()
>
> Thank you for looking into it.
>
> >
> > > Unfortunately this includes some callers to fib_table_lookup. The
> > > netns id would also be presented in the existing tracepoints ([1] and
> > > [2]). Thanks in advance for any suggestion.
> >
> > By "netns id" you mean the netns cookie? It seems that some TCP trace
> > events already expose it (see include/trace/events/tcp.h). It would be
> > nice to finally have "perf" filter these FIB events based on netns.
>
> No, by netns id I mean struct net::ns::inum, which is the inode number
> associated with the netns. This is convenient since it's easy to look up this
> value in userspace with the lsns tool or just stat through the procfs for the
> inode.
>
> Looks like struct net::net_cookie serves a similar purpose and can be used
> from restricted contexts (e.g. xdp/tc/cls eBPF progs) where a rich context
> (struct net, for example), as in an fexit/fentry probe, is not available.

I'm not sure the inode number is a good identifier for a namespace. See
this comment from the namespace maintainer for a patch that tried to add
a BPF helper to read this value:

https://lore.kernel.org/all/87efzq8jbi.fsf@xmission.com/

More here:

https://lore.kernel.org/netdev/87h93xqlui.fsf@xmission.com/

Which I suspect is why Daniel added the netns cookie:

https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net/

Regarding retrieval of this cookie, there is SO_NETNS_COOKIE:

https://lore.kernel.org/all/20210623135646.1632083-1-m@lambda.lt/

Seems to work fine [1]. Maybe ip-netns can be extended to retrieve the
cookie with something like:

ip netns cookie [ NETNSNAME | PID ]

[1]
# cat so_netns_cookie.c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	socklen_t vallen;
	uint64_t cookie;
	int sock;

	/* The cookie reported is that of the netns the socket was created in. */
	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0)
		return 1;

	vallen = sizeof(cookie);
	if (getsockopt(sock, SOL_SOCKET, SO_NETNS_COOKIE, &cookie, &vallen) != 0) {
		close(sock);
		return 1;
	}

	/* PRIu64 keeps the format correct on 32-bit targets as well. */
	printf("cookie = %" PRIu64 "\n", cookie);

	close(sock);

	return 0;
}

# gcc -Wall so_netns_cookie.c -o so_netns_cookie
# ip netns add ns1
# ip netns add ns2
# ./so_netns_cookie
cookie = 1
# ip netns exec ns1 ./so_netns_cookie
cookie = 2
# ip netns exec ns2 ./so_netns_cookie
cookie = 3