netdev.vger.kernel.org archive mirror
From: Jakub Sitnicki <jakub@cloudflare.com>
To: "Shanti Lombard née Bouchez-Mongardé" <shanti20210120@mildred.fr>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>
Subject: Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
Date: Thu, 21 Jan 2021 12:14:34 +0100
Message-ID: <87r1me4k4l.fsf@cloudflare.com>
In-Reply-To: <CAADnVQJnX-+9u--px_VnhrMTPB=O9Y0LH9T7RJbqzfLchbUFvg@mail.gmail.com>

On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> cc-ing the right folks
>
> On Wed, Jan 20, 2021 at 12:30 PM Shanti Lombard née Bouchez-Mongardé
> <shanti20210120@mildred.fr> wrote:
>>
>> Hello,
>>
>> I believe this is my first time here, so please excuse any mistakes.
>> Also, please Cc me on replies.
>>
>> Background: I am currently investigating how to run network services on
>> a machine without using network namespaces while still keeping them
>> isolated. To do that, I allocated a separate IP address for each service
>> (from 127.0.0.0/8 for IPv4 and from a ULA prefix under fd00::/8 for
>> IPv6), and those services are forced to listen on that IP address only.
>> For some of them, I use seccomp with a small utility I wrote at
>> <https://github.com/mildred/force-bind-seccomp>. Now, I still want a few
>> selected services (reverse proxies) to listen on a public address, but
>> they can't necessarily listen with INADDR_ANY because some other services
>> might listen on the same port on their private IP. It seems SO_REUSEADDR
>> can be used to circumvent this on BSD but not on Linux. After much
>> research, I found Cloudflare's recent contribution (explained at
>> <https://blog.cloudflare.com/its-crowded-in-here/>) of inet_lookup BPF
>> programs that could replace INADDR_ANY listening.

There is also documentation in the kernel:

https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html

>> The inet_lookup BPF programs hook into the socket selection code for
>> incoming packets after packets for already connected sockets are
>> dispatched to their respective sockets, but before any new connection is
>> dispatched to a listening socket. This is well explained in the blog post.
>>
>> However, I believe that being able to hook in later in the process would
>> enable valuable use cases. In its current position, the BPF program can
>> override any listening socket too easily. It can also surprise
>> administrators used to the socket API, who may not understand why their
>> listening socket does not receive any packets.
>>
>> Socket selection process (in net/ipv4/inet_hashtables.c function
>> __inet_lookup_listener):
>>
>> - A: look for already connected sockets (before __inet_lookup_listener)
>> - B: look for inet_lookup BPF programs
>> - C: look for listening sockets specifying address and port
>> - D: here, provide another inet_lookup BPF hook
>> - E: look for sockets listening using INADDR_ANY
>> - F: here, provide another inet_lookup BPF hook
>>
>> In position D, a BPF program could implement listening the way INADDR_ANY
>> listening does, but without the limitation that the port must not already
>> be in use by a listener on another IP address.
>>
>> In position F, a BPF program could redirect new connection attempts to a
>> socket of its choice, allowing any connection attempt to be intercepted
>> if it was not caught earlier by an already listening socket.
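
To make the proposed ordering concrete, here is a minimal userspace C model of steps A to F. It is a sketch, not kernel code. The structs, the hook type, and the proxy_https/trap_all policies are hypothetical. Hooks D and F exist only as proposed above. The model also ignores bind() conflicts, SO_REUSEPORT groups, and the drop verdict that a real sk_lookup program can return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model of steps A-F; NOT kernel code. Addresses are
 * host-order IPv4 values and 0 stands for INADDR_ANY. */
struct pkt { uint32_t saddr, daddr; uint16_t sport, dport; };
struct est { const char *name; struct pkt t; };                  /* connected */
struct lsn { const char *name; uint32_t addr; uint16_t port; };  /* listening */

/* A hook returns the chosen socket's name, or NULL to pass. */
typedef const char *(*hook_fn)(const struct pkt *p);

struct stack {
    const struct est *est; size_t n_est;
    const struct lsn *lsn; size_t n_lsn;
    hook_fn b, d, f;   /* b exists today; d and f are the proposed hooks */
};

static int same_tuple(const struct pkt *a, const struct pkt *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* wildcard == 0: step C (exact address); wildcard == 1: step E (INADDR_ANY) */
static const char *match_listener(const struct stack *s, const struct pkt *p,
                                  int wildcard)
{
    for (size_t i = 0; i < s->n_lsn; i++) {
        const struct lsn *l = &s->lsn[i];
        if (l->port == p->dport && l->addr == (wildcard ? 0 : p->daddr))
            return l->name;
    }
    return NULL;
}

static const char *lookup(const struct stack *s, const struct pkt *p)
{
    const char *sk;

    assert(s != NULL && p != NULL);
    for (size_t i = 0; i < s->n_est; i++)        /* A: connected sockets */
        if (same_tuple(&s->est[i].t, p))
            return s->est[i].name;
    if (s->b && (sk = s->b(p)))                  /* B: existing hook */
        return sk;
    if ((sk = match_listener(s, p, 0)))          /* C: addr:port */
        return sk;
    if (s->d && (sk = s->d(p)))                  /* D: proposed hook */
        return sk;
    if ((sk = match_listener(s, p, 1)))          /* E: INADDR_ANY:port */
        return sk;
    if (s->f && (sk = s->f(p)))                  /* F: proposed hook */
        return sk;
    return NULL;                                 /* no socket found */
}

/* Hypothetical policies a reverse proxy might install. */
static const char PROXY[] = "proxy";
static const char TRAP[] = "trap";

static const char *proxy_https(const struct pkt *p)
{
    return p->dport == 443 ? PROXY : NULL;
}

static const char *trap_all(const struct pkt *p)
{
    (void)p;
    return TRAP;
}
```

Consider a listener on 127.0.0.2:443 with proxy_https installed at D. A packet to 127.0.0.2:443 still reaches that listener at step C. A packet to a public address on port 443 goes to the proxy. An INADDR_ANY listener on another port is unaffected. A hook at F only sees packets that nothing else claimed.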

The existing hook is placed before the regular listening/unconnected socket
lookup to prevent port hijacking in the unprivileged port range.
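
The hijacking concern can be illustrated with a similar toy model. Like the previous one, it is hypothetical and not kernel code. It assumes the default net.ipv4.ip_unprivileged_port_start value of 1024. It also assumes a made-up administrator policy that steers port 8080 to a service. An unprivileged process that binds the public address on port 8080 only wins if the hook runs after the specific-address lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the port hijacking argument; NOT kernel code. */
struct lsn { const char *name; uint32_t addr; uint16_t port; };

/* Default value of the net.ipv4.ip_unprivileged_port_start sysctl. */
#define UNPRIV_PORT_START 1024

static bool unprivileged_may_bind(uint16_t port)
{
    return port >= UNPRIV_PORT_START;
}

/* Hypothetical policy installed by the administrator. */
static const char SERVICE[] = "service";

static const char *admin_hook(uint32_t daddr, uint16_t dport)
{
    (void)daddr;
    return dport == 8080 ? SERVICE : NULL;
}

static const char *specific_listener(const struct lsn *ls, size_t n,
                                     uint32_t daddr, uint16_t dport)
{
    for (size_t i = 0; i < n; i++)
        if (ls[i].addr == daddr && ls[i].port == dport)
            return ls[i].name;
    return NULL;
}

/* hook_first models today's placement (step B); !hook_first models D. */
static const char *dispatch(bool hook_first, const struct lsn *ls, size_t n,
                            uint32_t daddr, uint16_t dport)
{
    const char *sk;

    assert(ls != NULL || n == 0);
    if (hook_first && (sk = admin_hook(daddr, dport)))
        return sk;
    if ((sk = specific_listener(ls, n, daddr, dport)))
        return sk;
    if (!hook_first && (sk = admin_hook(daddr, dport)))
        return sk;
    return NULL;
}
```

With today's placement, dispatch(true, ...) returns the administrator's choice even when a rogue listener on the same address and port exists. The D-like placement, dispatch(false, ...), hands the connection to the rogue listener instead.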

>> The suggestion above would work for my use case, but there is another
>> way to make the same use cases possible: implement in BPF (or allow BPF
>> to call) steps C and E above, so the BPF program can supplant the kernel
>> behavior. I find this solution less elegant, and it might not work well
>> when multiple inet_lookup BPF programs are installed.

A BPF helper for sk_lookup programs that looks up a socket by packet
4-tuple and netns ID in the TCP/UDP hashtables sounds reasonable to me.
You gain the flexibility you describe without adding code to the hot
path.
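
Here is a sketch of how such a helper could let a program at today's hook position implement the D or F behavior itself. Everything in it is hypothetical:

- helper_lookup_listener() only mimics steps C and E over a toy hashtable keyed on netns ID and port.
- It ignores the source half of the tuple, since only listeners are modeled.
- It is assumed not to invoke sk_lookup programs itself, since that would recurse.

It is loosely patterned on the existing bpf_sk_lookup_tcp()/bpf_sk_lookup_udp() helpers available to TC and XDP programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model. Every name here is hypothetical; this is not a kernel API. */
struct tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

struct sock {
    const char *name;
    uint64_t netns;        /* netns ID */
    uint32_t addr;         /* bound address, 0 = INADDR_ANY */
    uint16_t port;         /* bound port */
    struct sock *next;     /* bucket chain */
};

#define NBUCKETS 64u
static struct sock *table[NBUCKETS];

/* Hash on (netns, port) only, so specific and wildcard binds share a bucket. */
static unsigned bucket(uint64_t netns, uint16_t port)
{
    return (unsigned)((netns * 31u + port) % NBUCKETS);
}

static void table_add(struct sock *sk)
{
    unsigned b = bucket(sk->netns, sk->port);
    sk->next = table[b];
    table[b] = sk;
}

/* Hypothetical helper: steps C then E for one netns, without running any
 * sk_lookup program. The source address and port are not used here. */
static struct sock *helper_lookup_listener(const struct tuple *t, uint64_t netns)
{
    struct sock *wild = NULL;

    for (struct sock *sk = table[bucket(netns, t->dport)]; sk; sk = sk->next) {
        if (sk->netns != netns || sk->port != t->dport)
            continue;
        if (sk->addr == t->daddr)
            return sk;              /* step C: exact address wins */
        if (sk->addr == 0 && !wild)
            wild = sk;              /* remember the step E candidate */
    }
    return wild;
}

/* Program at today's hook position. before_wildcard != 0 gives the proposed
 * D semantics (pre-empt INADDR_ANY); 0 gives F (only take leftovers). */
static struct sock *prog(const struct tuple *t, uint64_t netns,
                         struct sock *proxy, int before_wildcard)
{
    struct sock *sk = helper_lookup_listener(t, netns);

    assert(proxy != NULL);
    if (sk && (sk->addr != 0 || !before_wildcard))
        return sk;   /* defer to the listener the kernel would have picked */
    return proxy;    /* otherwise steer the connection to the proxy */
}
```

Because the ordering decision lives inside one program, this keeps the kernel's lookup path unchanged. It also sidesteps the question of how hooks at several positions would interact.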

[...]

Thread overview: 4+ messages
     [not found] <afb4e544-d081-eee8-e792-a480364a6572@mildred.fr>
2021-01-20 21:06 ` More flexible BPF socket inet_lookup hooking after listening sockets are dispatched Alexei Starovoitov
2021-01-21 11:14   ` Jakub Sitnicki [this message]
2021-01-21 20:40     ` Shanti Lombard
2021-01-21 22:08       ` Martin KaFai Lau