From: Jakub Sitnicki <jakub@cloudflare.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jakub Kicinski <kuba@kernel.org>,
Jiayuan Chen <jiayuan.chen@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
Network Development <netdev@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
Date: Tue, 23 Jun 2026 22:36:02 +0200
Message-ID: <878q85yoy5.fsf@cloudflare.com>
In-Reply-To: <CAMB2axMVhJJpP5HZtDFyQLLbKoRxhW08rj1zGRtWtgDkfYaVNA@mail.gmail.com> (Amery Hung's message of "Tue, 23 Jun 2026 13:22:38 -0700")

On Tue, Jun 23, 2026 at 01:22 PM -07, Amery Hung wrote:
> On Tue, Jun 23, 2026 at 1:04 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Tue, Jun 23, 2026 at 12:33 PM -07, Alexei Starovoitov wrote:
>> > On Tue, Jun 23, 2026 at 12:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>> >>
>> >> On Tue, Jun 23, 2026 at 12:21 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> >
>> >> > On Tue, Jun 23, 2026 at 09:08 AM -07, Kuniyuki Iwashima wrote:
>> >> > > On Tue, Jun 23, 2026 at 4:20 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> > >>
>> >> > >> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG. When
>> >> > >> completed all code paths related to sockmap-based redirects should be
>> >> > >> guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
>> >> > >> disabling NET_SOCK_MSG. The implementation of sockmap as a container for
>> >> > >> socket references would remain under BPF_SYSCALL.
>> >> > >>
>> >> > >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > >> ---
>> >> > >> Changes in v2:
>> >> > >> - Handle prot->recvmsg being NULL (Sashiko)
>> >> > >> - Elaborate on the end goal in description
>> >> > >> - Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
>> >> > >> ---
>> >> > >> net/unix/af_unix.c | 4 ++--
>> >> > >> net/unix/unix_bpf.c | 6 ++++++
>> >> > >> 2 files changed, 8 insertions(+), 2 deletions(-)
>> >> > >>
>> >> > >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> >> > >> index f7a9d55eee8a..84c11c60c75f 100644
>> >> > >> --- a/net/unix/af_unix.c
>> >> > >> +++ b/net/unix/af_unix.c
>> >> > >> @@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
>> >> > >> #ifdef CONFIG_BPF_SYSCALL
>> >> > >> const struct proto *prot = READ_ONCE(sk->sk_prot);
>> >> > >>
>> >> > >> - if (prot != &unix_dgram_proto)
>> >> > >> + if (prot->recvmsg)
>> >> > >
>> >> > > There is no reason to have this dead branch when
>> >> > > CONFIG_BPF_SYSCALL && !NET_SOCK_MSG.
>> >> > >
>> >> > > Let's compile out all sockmap code when both configs
>> >> > > are not enabled.
>> >> > >
>> >> > > Since AF_UNIX differs from TCP/UDP, it can take the
>> >> > > simpler approach.
>> >> >
>> >> > Okay, will put the whole file behind hidden config option like so:
>> >> >
>> >> > --- a/net/unix/Kconfig
>> >> > +++ b/net/unix/Kconfig
>> >> > @@ -30,3 +30,8 @@ config UNIX_DIAG
>> >> > help
>> >> > Support for UNIX socket monitoring interface used by the ss tool.
>> >> > If unsure, say Y.
>> >> > +
>> >> > +config UNIX_BPF
>> >>
>> >> Maybe UNIX_BPF_SOCKMAP or something.
>> >> bpf_iter is supported without this config.
>> >
>> > I don't like where it's going.
>> > I strongly dislike new config knobs.
>> > I'd rather remove existing knobs.
>> > What is the motivation?
>>
>> The goal is to compile out sockmap bits that use sk_msg.
>> NET_SOCK_MSG is a natural, existing candidate.
>> A new knob wasn't my idea.
>
> I'm also missing the big picture here.
>
> sockmap already holds socket references today. You can store and look
> up sockets without attaching any verdict/parser program, and no
> redirect happens. So if the goal is to use sockmap purely as a socket
> container without the sk_msg fast-path overhead, what does a
> compile-time NET_SOCK_MSG knob add over the runtime checks?

Sure, let me clarify. It's about the maintenance overhead.

sockmap-based redirects are a rather niche feature with few users, for
which we've been getting quite a few bug reports since AI came along.
We're not using it internally at Cloudflare, so I don't really have a
good reason to justify time spent on these bug reports.

Hence the move to put sockmap-based redirects behind a config option,
which you can enable at your own risk. Or which we can deprecate, but
that's not really my call.

> I am also not sure if NET_SOCK_MSG is right. It is broader than
> "sockmap redirect". It is selected by TLS and {INET,INET6}_ESPINTCP.
> Because those select it, it can't be toggled independently.

Once the sockmap redirect bits are behind _some_ config option, it will
be easy to replace it with a more granular one that depends on
NET_SOCK_MSG. But we're not there yet. One step at a time.

> Could you share the concrete use case you have in mind, and whether
> this came out of an earlier discussion or thread upstream?

This is a follow-up to discussions at the BPF summit with Alexei & John.