From: Jakub Sitnicki <jakub@cloudflare.com>
To: "Shanti Lombard née Bouchez-Mongardé" <shanti20210120@mildred.fr>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
bpf <bpf@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>
Subject: Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
Date: Thu, 21 Jan 2021 12:14:34 +0100
Message-ID: <87r1me4k4l.fsf@cloudflare.com>
In-Reply-To: <CAADnVQJnX-+9u--px_VnhrMTPB=O9Y0LH9T7RJbqzfLchbUFvg@mail.gmail.com>
On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> cc-ing the right folks
>
> On Wed, Jan 20, 2021 at 12:30 PM Shanti Lombard née Bouchez-Mongardé
> <shanti20210120@mildred.fr> wrote:
>>
>> Hello,
>>
>> I believe this is my first time here, so please excuse me for mistakes.
>> Also, please Cc me on answers.
>>
>> Background: I am currently investigating putting network services on a
>> machine without using network namespaces while still keeping them isolated. To
>> do that, I allocated a separate IP address (127.0.0.0/8 for IPv4 and ULA
>> prefix below fd00::/8 for IPv6) and those services are forced to listen
>> to this IP address only. For some, I use seccomp with a small utility I
>> wrote at <https://github.com/mildred/force-bind-seccomp>. Now, I still
>> want a few selected services (reverse proxies) to listen on public
>> addresses but they can't necessarily listen with INADDR_ANY because some
>> other services might listen on the same port on their private IP. It
>> seems SO_REUSEADDR can be used to circumvent this on BSD but not on
>> Linux. After much research, I found Cloudflare's recent contribution
>> (explained here <https://blog.cloudflare.com/its-crowded-in-here/>)
>> about inet_lookup BPF programs that could replace INADDR_ANY listening.

There is also documentation in the kernel:

https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html

>> The inet_lookup BPF programs are hooking up in socket selection code for
>> incoming packets after connected packets are dispatched to their
>> respective sockets but before any new connection is dispatched to a
>> listening socket. This is well explained in the blog post.
>>
>> However, I believe that being able to hook up later in the process could
>> have great use cases. With its current position, the BPF program can
>> override any listening socket too easily. It can also be surprising for
>> administrators used to the socket API, who may not understand why their
>> listening socket does not receive any packets.
>>
>> Socket selection process (in net/ipv4/inet_hashtables.c function
>> __inet_lookup_listener):
>>
>> - A: look for already connected sockets (before __inet_lookup_listener)
>> - B: look for inet_lookup BPF programs
>> - C: look for listening sockets specifying address and port
>> - D: here, provide another inet_lookup BPF hook
>> - E: look for sockets listening using INADDR_ANY
>> - F: here, provide another inet_lookup BPF hook
>>
>> In position D, a BPF program could implement socket listening like
>> INADDR_ANY listening would do but without the limitation that the port
>> must not already be listened on at another IP address.
>>
>> In position F, a BPF program could redirect new connection attempts to a
>> socket of its choice, allowing any connection attempt to be intercepted
>> if not caught earlier by an already listening socket.

The existing hook is placed before the regular listening/unconnected
socket lookup to prevent port hijacking in the unprivileged port range.

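To make the ordering concrete, here is a toy userspace model of steps B
through F from the list quoted above. It is only a sketch: it is not
kernel code, every toy_* name is made up for illustration, sockets are
matched on destination address and port only, and step A (connected
sockets) is left out. Hooks D and F are the proposed positions, not
something the kernel has today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all toy_* names are hypothetical, not kernel API. */
struct toy_sock {
    uint32_t addr;  /* bound IPv4 address, 0 stands for INADDR_ANY */
    uint16_t port;
    int id;
};

/* A hook returns a socket to select, or NULL to let lookup continue. */
typedef const struct toy_sock *(*toy_hook)(uint32_t daddr, uint16_t dport);

struct toy_table {
    const struct toy_sock *socks;  /* listening sockets */
    size_t n;
    toy_hook hook_b;  /* existing sk_lookup position */
    toy_hook hook_d;  /* proposed: after address-specific listeners */
    toy_hook hook_f;  /* proposed: after INADDR_ANY listeners */
};

static const struct toy_sock toy_fallback = { 0, 0, 99 };

/* Example hook that always selects the same socket. */
const struct toy_sock *toy_catch_all(uint32_t daddr, uint16_t dport)
{
    (void)daddr;
    (void)dport;
    return &toy_fallback;
}

static const struct toy_sock *toy_match(const struct toy_table *t,
                                        uint32_t addr, uint16_t port)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->socks[i].addr == addr && t->socks[i].port == port)
            return &t->socks[i];
    return NULL;
}

static const struct toy_sock *toy_run(toy_hook h, uint32_t daddr,
                                      uint16_t dport)
{
    return h ? h(daddr, dport) : NULL;
}

const struct toy_sock *toy_lookup_listener(const struct toy_table *t,
                                           uint32_t daddr, uint16_t dport)
{
    const struct toy_sock *sk;

    assert(t != NULL);
    if ((sk = toy_run(t->hook_b, daddr, dport)))  /* B */
        return sk;
    if ((sk = toy_match(t, daddr, dport)))        /* C */
        return sk;
    if ((sk = toy_run(t->hook_d, daddr, dport)))  /* D */
        return sk;
    if ((sk = toy_match(t, 0, dport)))            /* E */
        return sk;
    return toy_run(t->hook_f, daddr, dport);      /* F */
}
```

With toy_catch_all installed at hook_f, it only sees lookups that no
listener claimed. Installed at hook_b, it wins over every listener,
which is the behaviour of the existing hook position.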
>> The suggestion above would work for my use case, but there is another
>> possibility to make the same use cases possible: implement in BPF (or
>> allow BPF to call) the C and E steps above so the BPF program can
>> supplant the kernel behavior. I find this solution less elegant and it
>> might not work well in case there are multiple inet_lookup BPF programs
>> installed.

Having a BPF helper available to BPF sk_lookup programs that looks up a
socket by packet 4-tuple and netns ID in tcp/udp hashtables sounds
reasonable to me. You gain the flexibility that you describe without
adding code on the hot path.

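As a rough illustration of that idea, the sketch below models, in plain
userspace C, a program at the existing hook that first asks a lookup
helper whether an address-specific listener exists and only steers the
connection elsewhere when none does. This is a simplified model under
stated assumptions: the sim_* names are hypothetical, the helper here
matches on destination address and port only (not the full 4-tuple and
netns ID), and no such helper is claimed to exist for sk_lookup programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: sim_* names are hypothetical, not kernel or libbpf API. */
struct sim_sock {
    uint32_t addr;  /* bound IPv4 address, 0 stands for INADDR_ANY */
    uint16_t port;
    int id;
};

/* Stand-in for the proposed helper: exact (addr, port) listener match. */
static const struct sim_sock *sim_lookup_helper(const struct sim_sock *tbl,
                                                size_t n, uint32_t daddr,
                                                uint16_t dport)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == daddr && tbl[i].port == dport)
            return &tbl[i];
    return NULL;
}

/*
 * Program attached at the existing hook position. Returning NULL means
 * "select nothing, let the regular lookup proceed".
 */
const struct sim_sock *sim_prog(const struct sim_sock *tbl, size_t n,
                                const struct sim_sock *proxy,
                                uint32_t daddr, uint16_t dport)
{
    assert(proxy != NULL);
    if (sim_lookup_helper(tbl, n, daddr, dport))
        return NULL;  /* a specific listener exists: do not override it */
    return proxy;     /* otherwise steer to the proxy socket */
}
```

In this model the program gets the effect of running after step C while
still being attached at the single existing hook, so the regular lookup
path stays unchanged.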
[...]