From: Yonghong Song <yonghong.song@linux.dev>
To: Martin KaFai Lau <martin.lau@linux.dev>,
Mahe Tardy <mahe.tardy@gmail.com>
Cc: daniel@iogearbox.net, john.fastabend@gmail.com, ast@kernel.org,
andrii@kernel.org, jolsa@kernel.org, bpf@vger.kernel.org,
Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add get_netns_cookie helper to tracing programs
Date: Fri, 7 Mar 2025 22:17:44 -0800
Message-ID: <908c6a63-3049-4dd2-859a-215b31e5d1ea@linux.dev>
In-Reply-To: <a66af5a8-1aa4-481a-b57e-b3076cc520b0@linux.dev>

On 3/7/25 3:06 PM, Martin KaFai Lau wrote:
> On 3/6/25 9:03 AM, Mahe Tardy wrote:
>>>>> The immediate question is whether sock_net(sk) must be non-NULL
>>>>> for tracing.
>>>> We discussed this offline with Daniel Borkmann and we think that it
>>>> might not be the question. The get_netns_cookie(NULL) call allows us to
>>>> compare against get_netns_cookie(sock) to see whether the sock's netns
>>>> is equal to the init netns and thus dispatch different logic.
>>> bpf_get_netns_cookie(NULL) should be fine.
>>>
>>> I meant to ask if sock_net(sk) may return NULL for a non NULL sk.
>>> Please check.
>> Oh sorry for the confusion, I investigated with my humble kernel
>> knowledge: essentially sock_net(sk) is doing sk->sk_net->net, retrieving
>> the net struct representing the network namespace, to later extract the
>> cookie, and thus dereference the returned pointer (here is the concern).
>> The sk_net intermediary (in reality __sk_common.skc_net) is here because
>> of the possibility of switching on/off network namespaces via
>> CONFIG_NET_NS. It's a possible_net_t type containing (or not) the struct
>> net pointer, explaining why we use write/read_pnet to no-op or return
>> the global net ns.
>>
>> Now by adding this helper to tracing progs, it allows to call this
>> function in any function entry or function exit, but unlike kprobes,
>> it's not possible to just hook at an obvious arbitrary point in the code
>> where the net ns would be NULL in the sock struct. With that in mind, I
>> failed to crash the kernel tracing a function (some candidates were
>> inlined). I mostly grepped for sock_net_set, but I lack the knowledge to
>
> Thanks for checking.
>
> I took a quick look at the callers of sock_net_set. I suspect
> "fentry/sk_prot_alloc" and "lsm/sk_alloc" could have a NULL?
>
>> guarantee that this could not happen right now or in the future. Maybe
>> that would be just safer to add a check and return 0 in that case if
>> that's ok? Not sure since the helper returns an 8-byte long opaque
>> number which thus includes 0 as a valid value.
>
> I assume net_cookie 0 is invalid, but then it leaks the implementation
> details of what is a valid cookie in a uapi helper
>
> * u64 bpf_get_netns_cookie(void *ctx)
> * ...
> * Return
> * A 8-byte long opaque number
>
> Note that, the tracing program can already read most fields of the sk,
> including sk->sk_net.net->net_cookie. Therefore, what this patch aims
> to achieve has already been supported in tracing. It can also save a
> helper call.
>
> The only thing that may be missing in your use case is determining the
> init_net. I don't think reading a global kernel variable has been
> supported yet. Not sure if init_net must have net_cookie 1. Otherwise,
> we could consider to add a kfunc to return &init_net, which could be
> used to compare with sk->sk_net.net. Having a pointer to &init_net
> might be more useful for other tracing use cases in general.

There is a workaround for this tracing use case.

1. Declare a global variable in the bpf program, e.g.
     struct net *init_net;
2. After skel_open and before skel_load, find the init_net address
   (from /proc/kallsyms) and assign it to skel->bss->init_net.
3. In the prog, do
     struct net *netns = bpf_rdonly_cast(init_net, bpf_core_type_id_kernel(struct net));
     bpf_printk("%llu\n", netns->net_cookie);
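
For step 2, the user space side could look roughly like the sketch
below. This is only an illustration, not tested against a real skeleton:
the function names find_ksym() and find_ksym_in_proc() are made up for
this example, and the line layout "<hex address> <type> <name> [module]"
is the usual /proc/kallsyms format. Note that with kptr_restrict the
addresses read back as 0 for unprivileged readers, so the wrapper treats
0 as a failure.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a kallsyms-formatted stream and store the
 * address of the first symbol named `name` in *addr.
 * Returns 0 on success, -1 if the symbol was not found.
 */
static int find_ksym(FILE *f, const char *name, uint64_t *addr)
{
	char line[512], sym[256], type;
	uint64_t a;

	while (fgets(line, sizeof(line), f)) {
		/* The optional trailing "[module]" column is ignored. */
		if (sscanf(line, "%" SCNx64 " %c %255s", &a, &type, sym) != 3)
			continue;
		if (strcmp(sym, name) == 0) {
			*addr = a;
			return 0;
		}
	}
	return -1;
}

/* Hypothetical wrapper: look the symbol up in the running kernel. */
static int find_ksym_in_proc(const char *name, uint64_t *addr)
{
	FILE *f = fopen("/proc/kallsyms", "r");
	int err;

	if (!f)
		return -1;
	err = find_ksym(f, name, addr);
	fclose(f);
	/* Addresses hidden by kptr_restrict show up as all zeros. */
	if (!err && *addr == 0)
		err = -1;
	return err;
}

/*
 * Intended use, between skel open and skel load (skeleton names depend
 * on the object file, so this part is left as a comment):
 *
 *	uint64_t addr;
 *
 *	if (find_ksym_in_proc("init_net", &addr) == 0)
 *		skel->bss->init_net = (void *)addr;
 */
```

Taking a FILE * in find_ksym() instead of opening /proc/kallsyms
directly is only there so the parsing can be exercised on a saved copy
of the file.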

There is an effort to add global variables to BTF. See
https://lore.kernel.org/bpf/20250207012045.2129841-1-stephen.s.brennan@oracle.com/
The recommended way is to put these global variables in a module to avoid
consuming too much kernel memory unconditionally.

Thread overview: 6+ messages
[not found] <20250227182830.90863-1-mahe.tardy@gmail.com>
2025-02-27 20:32 ` [PATCH bpf-next 1/2] bpf: add get_netns_cookie helper to tracing programs Martin KaFai Lau
2025-03-03 10:14 ` Mahe Tardy
2025-03-03 19:14 ` Martin KaFai Lau
2025-03-06 17:03 ` Mahe Tardy
2025-03-07 23:06 ` Martin KaFai Lau
2025-03-08 6:17 ` Yonghong Song [this message]