From: Jakub Sitnicki <jakub@cloudflare.com>
To: Philo Lu <lulie@linux.alibaba.com>
Cc: bpf <bpf@vger.kernel.org>,
netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup?
Date: Wed, 21 Aug 2024 11:23:16 +0200
Message-ID: <87bk1mdybf.fsf@cloudflare.com>
In-Reply-To: <6e239bb7-b7f9-4a40-bd1d-a522d4b9529c@linux.alibaba.com> (Philo Lu's message of "Tue, 20 Aug 2024 20:31:00 +0800")

Hi Philo,

[CC Eric and Paolo who have more context than me here.]

On Tue, Aug 20, 2024 at 08:31 PM +08, Philo Lu wrote:
> Hi all, I wonder if it is feasible to move BPF_SK_LOOKUP ahead of connected UDP
> sk lookup?
>
> That is something like:
> (i.e., move connected udp socket lookup behind bpf sk lookup prog)
> ```
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index ddb86baaea6c8..9a1408775bcb1 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -493,13 +493,6 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
> slot2 = hash2 & udptable->mask;
> hslot2 = &udptable->hash2[slot2];
>
> - /* Lookup connected or non-wildcard socket */
> - result = udp4_lib_lookup2(net, saddr, sport,
> - daddr, hnum, dif, sdif,
> - hslot2, skb);
> - if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
> - goto done;
> -
> /* Lookup redirect from BPF */
> if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
> udptable == net->ipv4.udp_table) {
> @@ -512,6 +505,13 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
> }
> }
>
> + /* Lookup connected or non-wildcard socket */
> + result = udp4_lib_lookup2(net, saddr, sport,
> + daddr, hnum, dif, sdif,
> + hslot2, skb);
> + if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
> + goto done;
> +
> /* Got non-wildcard socket or error on first lookup */
> if (result)
> goto done;
> ```
>
> This will be useful, e.g., if there are many concurrent udp sockets on the same
> ip:port, where udp4_lib_lookup2() may induce high softirq overhead, because it
> computes a score for every socket on that ip:port. With a bpf sk_lookup prog,
> we can implement a 4-tuple hash for udp socket lookup to solve the problem (if
> the bpf prog runs before udp4_lib_lookup2).
>
> Currently, in udp, bpf sk lookup runs after connected socket lookup. IIUC, this
> is because the early version of SK_LOOKUP[0] modified local_ip/local_port to
> redirect to a socket. This may interact wrongly with udp lookup because udp
> uses a score to select the socket, and setting local_ip/local_port cannot
> guarantee which socket is selected. However, now we get the socket directly
> from a map in the bpf sk_lookup prog, so the above problem no longer exists.
>
> So is there any other problem with it? If not, I'll try to work on it and
> send patches later.
>
> [0] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
>
> Thank you for your time.

It was done like that to maintain the connected UDP socket guarantees,
similar to those of established TCP sockets. The contract is that if you
are bound to a 4-tuple, you will receive the packets destined to it.
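
To make the ordering argument concrete, here is a rough user-space model
of the lookup order. It is not kernel code: every name in it
(model_lookup, steer_to_listener, and so on) is made up for this sketch,
the hook stands in for a BPF sk_lookup program, and socket scoring is
reduced to plain matching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct tuple4 {
	uint32_t saddr, daddr;	/* remote and local IPv4 address */
	uint16_t sport, dport;	/* remote and local port */
};

struct model_sock {
	struct tuple4 t;	/* unconnected sockets only use daddr/dport */
	bool connected;
	int id;
};

/* Stand-in for a BPF sk_lookup program: returns a socket or NULL. */
typedef const struct model_sock *(*lookup_hook)(const struct tuple4 *pkt);

static bool tuple_eq(const struct tuple4 *a, const struct tuple4 *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Example hook that steers every packet to one unconnected listener. */
static const struct model_sock listener = {
	.t = { .daddr = 0x0a000001, .dport = 443 },
	.connected = false,
	.id = 1,
};

static const struct model_sock *steer_to_listener(const struct tuple4 *pkt)
{
	(void)pkt;
	return &listener;
}

/*
 * hook_first == false models the current order: connected 4-tuple match,
 * then the hook, then sockets bound to the local address and port.
 * hook_first == true models the proposed reordering.
 */
static const struct model_sock *
model_lookup(const struct model_sock *socks, size_t n,
	     const struct tuple4 *pkt, lookup_hook hook, bool hook_first)
{
	const struct model_sock *sk;
	size_t i;

	if (hook_first && hook) {
		sk = hook(pkt);
		if (sk)
			return sk;
	}

	for (i = 0; i < n; i++)
		if (socks[i].connected && tuple_eq(&socks[i].t, pkt))
			return &socks[i];

	if (!hook_first && hook) {
		sk = hook(pkt);
		if (sk)
			return sk;
	}

	for (i = 0; i < n; i++)
		if (!socks[i].connected &&
		    socks[i].t.daddr == pkt->daddr &&
		    socks[i].t.dport == pkt->dport)
			return &socks[i];

	return NULL;
}
```

With hook_first set to false, a packet that matches a connected socket's
4-tuple reaches that socket even if the hook would steer it elsewhere.
With hook_first set to true, the hook wins and the guarantee is gone.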

It sounds like you are looking for an efficient way to look up a
connected UDP socket. We would be interested in that as well. We use
connected UDP/QUIC on egress where we don't expect the peer to roam and
change its address. There's a memory cost on the kernel side to using
them, but they make it easier to structure your application, because you
can have roughly the same design for TCP and UDP transport.

So what if, instead of doing it in BPF, we make it better for everyone
and introduce a hash table keyed by 4-tuple for connected sockets in the
UDP stack itself (a counterpart of ehash in TCP)?
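
As a rough illustration of the data structure I have in mind, here is a
user-space sketch of a chained hash table keyed by 4-tuple. It is only a
sketch under stated assumptions: all names (conn_table, conn_insert,
conn_lookup) are hypothetical, the hash is a toy mixer where a real
implementation would presumably use a keyed hash, and locking, RCU and
netns handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONN_HASH_BITS	8
#define CONN_HASH_SIZE	(1u << CONN_HASH_BITS)

/* Hypothetical key: the full 4-tuple of a connected socket. */
struct conn_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct conn_sock {
	struct conn_key key;
	struct conn_sock *next;	/* bucket chain */
	int id;
};

struct conn_table {
	struct conn_sock *bucket[CONN_HASH_SIZE];
};

/* Toy multiplicative mix over the tuple, for illustration only. */
static uint32_t conn_hash(const struct conn_key *k)
{
	uint32_t words[3] = {
		k->saddr, k->daddr, ((uint32_t)k->sport << 16) | k->dport,
	};
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	h ^= h >> 16;
	return h & (CONN_HASH_SIZE - 1);
}

static int key_eq(const struct conn_key *a, const struct conn_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Link a connected socket at the head of its bucket chain. */
static void conn_insert(struct conn_table *t, struct conn_sock *sk)
{
	uint32_t b;

	assert(t && sk);
	b = conn_hash(&sk->key);
	sk->next = t->bucket[b];
	t->bucket[b] = sk;
}

/* Walk only the one bucket selected by the packet's 4-tuple. */
static struct conn_sock *conn_lookup(const struct conn_table *t,
				     const struct conn_key *k)
{
	struct conn_sock *sk;

	for (sk = t->bucket[conn_hash(k)]; sk; sk = sk->next)
		if (key_eq(&sk->key, k))
			return sk;
	return NULL;
}
```

The point is that the lookup cost then depends on the length of one
bucket chain rather than on how many sockets share the local ip:port.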

Thanks,
(the other) Jakub

Thread overview: 4+ messages
2024-08-20 12:31 Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup? Philo Lu
2024-08-21 9:23 ` Jakub Sitnicki [this message]
2024-08-21 11:44 ` Philo Lu
2024-08-22 18:29 ` Jakub Sitnicki