BPF List
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	 Kuniyuki Iwashima <kuniyu@google.com>,
	 bpf <bpf@vger.kernel.org>,  Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Jiayuan Chen <jiayuan.chen@linux.dev>,
	 John Fastabend <john.fastabend@gmail.com>,
	 Network Development <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
Date: Tue, 23 Jun 2026 22:36:02 +0200
Message-ID: <878q85yoy5.fsf@cloudflare.com>
In-Reply-To: <CAMB2axMVhJJpP5HZtDFyQLLbKoRxhW08rj1zGRtWtgDkfYaVNA@mail.gmail.com> (Amery Hung's message of "Tue, 23 Jun 2026 13:22:38 -0700")

On Tue, Jun 23, 2026 at 01:22 PM -07, Amery Hung wrote:
> On Tue, Jun 23, 2026 at 1:04 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Tue, Jun 23, 2026 at 12:33 PM -07, Alexei Starovoitov wrote:
>> > On Tue, Jun 23, 2026 at 12:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>> >>
>> >> On Tue, Jun 23, 2026 at 12:21 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> >
>> >> > On Tue, Jun 23, 2026 at 09:08 AM -07, Kuniyuki Iwashima wrote:
>> >> > > On Tue, Jun 23, 2026 at 4:20 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> > >>
>> >> > >> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG. When
>> >> > >> completed, all code paths related to sockmap-based redirects should be
>> >> > >> guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
>> >> > >> disabling NET_SOCK_MSG. The implementation of sockmap as a container for
>> >> > >> socket references would remain under BPF_SYSCALL.
>> >> > >>
>> >> > >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > >> ---
>> >> > >> Changes in v2:
>> >> > >> - Handle prot->recvmsg being NULL (Sashiko)
>> >> > >> - Elaborate on the end goal in description
>> >> > >> - Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
>> >> > >> ---
>> >> > >>  net/unix/af_unix.c  | 4 ++--
>> >> > >>  net/unix/unix_bpf.c | 6 ++++++
>> >> > >>  2 files changed, 8 insertions(+), 2 deletions(-)
>> >> > >>
>> >> > >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> >> > >> index f7a9d55eee8a..84c11c60c75f 100644
>> >> > >> --- a/net/unix/af_unix.c
>> >> > >> +++ b/net/unix/af_unix.c
>> >> > >> @@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
>> >> > >>  #ifdef CONFIG_BPF_SYSCALL
>> >> > >>         const struct proto *prot = READ_ONCE(sk->sk_prot);
>> >> > >>
>> >> > >> -       if (prot != &unix_dgram_proto)
>> >> > >> +       if (prot->recvmsg)
>> >> > >
>> >> > > There is no reason to have this dead branch when
>> >> > > CONFIG_BPF_SYSCALL && !NET_SOCK_MSG.
>> >> > >
>> >> > > Let's compile out all sockmap code unless both configs
>> >> > > are enabled.
>> >> > >
>> >> > > Since AF_UNIX differs from TCP/UDP, it can take the
>> >> > > simpler approach.
>> >> >
>> >> > Okay, will put the whole file behind a hidden config option like so:
>> >> >
>> >> > --- a/net/unix/Kconfig
>> >> > +++ b/net/unix/Kconfig
>> >> > @@ -30,3 +30,8 @@ config UNIX_DIAG
>> >> >         help
>> >> >           Support for UNIX socket monitoring interface used by the ss tool.
>> >> >           If unsure, say Y.
>> >> > +
>> >> > +config UNIX_BPF
>> >>
>> >> Maybe UNIX_BPF_SOCKMAP or something.
>> >> bpf_iter is supported without this config.
>> >
>> > I don't like where it's going.
>> > I strongly dislike new config knobs.
>> > I'd rather remove existing knobs.
>> > What is the motivation?
>>
>> The goal is to compile out sockmap bits that use sk_msg.
>> NET_SOCK_MSG is a natural, existing candidate.
>> New knob wasn't my idea.
>
> I'm also missing the big picture here.
>
> sockmap already holds socket references today. You can store and look
> up sockets without attaching any verdict/parser program, and no
> redirect happens. So if the goal is to use sockmap purely as a socket
> container without the sk_msg fast-path overhead, what does a
> compile-time NET_SOCK_MSG knob add over the runtime checks?

Sure, let me clarify. It's about the maintenance overhead.

sockmap-based redirects are a rather niche feature with few users, for
which we've been getting quite a few bug reports since AI came along.

We're not using it internally at Cloudflare, so I don't really have a
good reason to justify time spent on these bug reports.

Hence the move to put sockmap-based redirect behind a config option,
which you can enable at your own risk. Or which we can deprecate, but
that's not really my call.

> I am also not sure if NET_SOCK_MSG is right. It is broader than
> "sockmap redirect". It is selected by TLS and {INET,INET6}_ESPINTCP.
> Because those select it, it can't be toggled independently.

Once the sockmap redirect bits are behind _some_ config option, it will
be easy to replace it with a more granular one that depends on
NET_SOCK_MSG. But we're not there yet. One step at a time.
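
For illustration only, here is a userspace sketch of the kind of
compile-time guard being discussed. It is not kernel code and not part
of the patch: the two macros merely stand in for the Kconfig symbols,
and recv_path() is a hypothetical helper invented for this example.

```c
/*
 * Userspace stand-ins for the kernel config symbols. In a real kernel
 * build these come from Kconfig, not from hand-written defines.
 */
#define CONFIG_BPF_SYSCALL 1
/* #define CONFIG_NET_SOCK_MSG 1 */	/* left off: redirect path is compiled out */

/*
 * Hypothetical helper: reports which receive path would be taken.
 * The redirect branch exists only when both symbols are defined, so
 * with NET_SOCK_MSG disabled no dead branch is left in the object code.
 */
const char *recv_path(int has_override)
{
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_NET_SOCK_MSG)
	if (has_override)
		return "sockmap";
#else
	(void)has_override;	/* unused when the redirect path is compiled out */
#endif
	return "default";
}
```

With the second define commented out as above, recv_path() always
reports the default path; defining both symbols brings the redirect
branch back.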

> Could you share the concrete use case you have in mind, and whether
> this came out of an earlier discussion or thread upstream?

This is a follow-up to discussions at the BPF summit with Alexei & John.


Thread overview: 13+ messages
2026-06-23 11:20 [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG Jakub Sitnicki
2026-06-23 16:08 ` Kuniyuki Iwashima
2026-06-23 19:21   ` Jakub Sitnicki
2026-06-23 19:31     ` Kuniyuki Iwashima
2026-06-23 19:33       ` Alexei Starovoitov
2026-06-23 20:03         ` Jakub Sitnicki
2026-06-23 20:13           ` Kuniyuki Iwashima
2026-06-23 20:22           ` Amery Hung
2026-06-23 20:36             ` Jakub Sitnicki [this message]
2026-06-23 20:44               ` Amery Hung
2026-06-23 21:26               ` Alexei Starovoitov
2026-06-23 20:09       ` Jakub Sitnicki
2026-06-23 20:14         ` Kuniyuki Iwashima
