From: Jakub Sitnicki <jakub@cloudflare.com>
To: Philo Lu <lulie@linux.alibaba.com>
Cc: bpf <bpf@vger.kernel.org>,
	 netdev@vger.kernel.org,  ast@kernel.org, daniel@iogearbox.net,
	 andrii@kernel.org,  Eric Dumazet <edumazet@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	 kernel-team <kernel-team@cloudflare.com>
Subject: Re: Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup?
Date: Thu, 22 Aug 2024 20:29:03 +0200	[thread overview]
Message-ID: <877cc8e7io.fsf@cloudflare.com> (raw)
In-Reply-To: <2fd14650-2294-4285-b3a5-88b443367a79@linux.alibaba.com> (Philo Lu's message of "Wed, 21 Aug 2024 19:44:27 +0800")

On Wed, Aug 21, 2024 at 07:44 PM +08, Philo Lu wrote:
> On 2024/8/21 17:23, Jakub Sitnicki wrote:
>> Hi Philo,
>> [CC Eric and Paolo who have more context than me here.]
>> On Tue, Aug 20, 2024 at 08:31 PM +08, Philo Lu wrote:
>>> Hi all, I wonder if it is feasible to move BPF_SK_LOOKUP ahead of connected UDP
>>> sk lookup?
>>>
> ...
>>>
>>> So are there any other problems with it? Otherwise I'll try to work on it and
>>> submit patches later.
>>>
>>> [0] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
>>>
>>> Thank you for your time.
>> It was done like that to preserve the connected UDP socket guarantees,
>> similar to established TCP sockets. The contract is that if you are
>> bound to a 4-tuple, you will receive the packets destined to it.
>> 
>
> Thanks for your explanation. IIUC, bpf_sk_lookup was designed to leave connected
> socket lookup (established for TCP and connected for UDP) untouched, so it is not
> supposed to run before the connected UDP lookup.
> (though it seems so close to solving our problem...)

Yes, correct. The motivation behind bpf_sk_lookup was to steer TCP
connections & UDP flows to listening / unconnected sockets, like you can
do with TPROXY [1].

Since it had nothing to do with established / connected sockets, we
added the BPF hook in such a way that they are unaffected by it.
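
For illustration, a minimal sk_lookup program looks roughly like the
sketch below. The map name, port range, and steering policy are made-up
examples for this thread, not something from the original series:

/* Illustrative BPF_PROG_TYPE_SK_LOOKUP sketch: steer packets arriving on
 * an example port range to one listening/unconnected socket stored in a
 * SOCKMAP. Map and section names here are placeholders.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} dest_sock SEC(".maps");

SEC("sk_lookup")
int steer_to_service(struct bpf_sk_lookup *ctx)
{
	const __u32 key = 0;
	struct bpf_sock *sk;
	long err;

	/* Only packets that found no established/connected socket reach
	 * this hook, so connected UDP flows are never stolen by it. */
	if (ctx->local_port < 8000 || ctx->local_port > 8999)
		return SK_PASS;	/* fall through to the regular lookup */

	sk = bpf_map_lookup_elem(&dest_sock, &key);
	if (!sk)
		return SK_PASS;

	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);
	return err ? SK_DROP : SK_PASS;
}

char _license[] SEC("license") = "GPL";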

>> It sounds like you are looking for an efficient way to lookup a
>> connected UDP socket. We would be interested in that as well. We use
>> connected UDP/QUIC on egress where we don't expect the peer to roam and
>> change its address. There's a memory cost on the kernel side to using
>> them, but they make it easier to structure your application, because you
>> can have roughly the same design for TCP and UDP transport.
>> 
> Yes, we have exactly the same problem.

Good to know that there are other users of connected UDP out there.

Loosely related - I'm planning to raise the question of whether using
connected UDP sockets on ingress makes sense for QUIC at Plumbers [2].
Connected UDP lookup performance is one of the aspects here.
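
For anyone following along, "connected UDP" here just means a UDP socket
after connect(). A minimal userspace sketch, with a placeholder peer
address and port:

/* connect() on a UDP socket pins the 4-tuple: send()/recv() then work
 * without an address, and only datagrams from that peer are delivered.
 * 192.0.2.1:4433 below is an example endpoint, nothing more.
 */
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

int connected_udp_client(void)
{
	struct sockaddr_in peer = {
		.sin_family = AF_INET,
		.sin_port = htons(4433),	/* example port */
	};
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);	/* example peer */

	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
		close(fd);
		return -1;
	}
	/* From here on, send()/recv() use the pinned 4-tuple. */
	return fd;
}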

>> So what if instead of doing it in BPF, we make it better for everyone
>> and introduce a hash table keyed by 4-tuple for connected sockets in the
>> udp stack itself (counterpart of ehash in tcp)?
>
> This solution is also OK with me. But I'm not sure whether there have been
> previous attempts or whether there are technical problems with it.
>
> In fact, I have done a simple test with 4-tuple UDP lookup, and it does make a
> difference:
> (kernel 5.10, 1000 connected UDP sockets on the server, sockperf sending messages
> to one of them, averaged over 5s)
>
> Without 4-tuple lookup:
>
> %Cpu0: 0.0 us, 0.0 sy, 0.0 ni,  0.0 id, 0.0 wa, 0.0 hi, 100.0 si, 0.0 st
> %Cpu1: 0.2 us, 0.2 sy, 0.0 ni, 99.4 id, 0.0 wa, 0.2 hi,   0.0 si, 0.0 st
> MiB Mem :7625.1 total,   6761.5 free,    210.2 used,    653.4 buff/cache
> MiB Swap:   0.0 total,      0.0 free,      0.0 used.   7176.2 avail Mem
>
> ---
> With 4-tuple lookup:
>
> %Cpu0: 0.2 us, 0.4 sy, 0.0 ni, 48.1 id, 0.0 wa, 1.2 hi, 50.1 si,  0.0 st
> %Cpu1: 0.6 us, 0.4 sy, 0.0 ni, 98.8 id, 0.0 wa, 0.2 hi,  0.0 si,  0.0 st
> MiB Mem :7625.1 total,   6759.9 free,    211.9 used,    653.3 buff/cache
> MiB Swap:   0.0 total,      0.0 free,      0.0 used.   7174.6 avail Mem

Right. The overhead is expected. All of the server's connected sockets end up
in one hash bucket and we need to walk a long chain on lookup.
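
To spell out why, here is a sketch of the keying difference (not kernel
code, just an illustration of hashing on the local 2-tuple versus the
full 4-tuple):

/* Today's UDP lookup hashes on (local addr, local port), so N connected
 * sockets sharing one local ip:port share one chain; a TCP-ehash-style
 * table would mix in the remote endpoint and spread them out.
 */
#include <stdint.h>

static uint32_t hash_mix(uint32_t a, uint32_t b)
{
	return (a * 2654435761u) ^ b;
}

/* Current situation: every connected socket on 192.0.2.1:4433 maps here. */
static uint32_t udp_slot_by_local(uint32_t laddr, uint16_t lport, uint32_t mask)
{
	return hash_mix(laddr, lport) & mask;
}

/* Counterpart of TCP ehash: the remote endpoint spreads the chains. */
static uint32_t udp_slot_by_4tuple(uint32_t laddr, uint16_t lport,
				   uint32_t raddr, uint16_t rport, uint32_t mask)
{
	return hash_mix(hash_mix(laddr, lport), hash_mix(raddr, rport)) & mask;
}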

The workaround is not "pretty". You have to configure your server to
receive on multiple IP addresses and/or ports :-/

[1] Which also respects established / connected sockets, as long as they
    have the IP_TRANSPARENT flag set. Users need to set it "manually" for UDP.

[2] https://lpc.events/event/18/abstracts/2134/

Thread overview: 4+ messages
2024-08-20 12:31 Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup? Philo Lu
2024-08-21  9:23 ` Jakub Sitnicki
2024-08-21 11:44   ` Philo Lu
2024-08-22 18:29     ` Jakub Sitnicki [this message]
