From: Jakub Sitnicki <jakub@cloudflare.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jakub Kicinski <kuba@kernel.org>,
Jiayuan Chen <jiayuan.chen@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
Network Development <netdev@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
Date: Tue, 23 Jun 2026 22:36:02 +0200
Message-ID: <878q85yoy5.fsf@cloudflare.com>
In-Reply-To: <CAMB2axMVhJJpP5HZtDFyQLLbKoRxhW08rj1zGRtWtgDkfYaVNA@mail.gmail.com> (Amery Hung's message of "Tue, 23 Jun 2026 13:22:38 -0700")

On Tue, Jun 23, 2026 at 01:22 PM -07, Amery Hung wrote:
> On Tue, Jun 23, 2026 at 1:04 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Tue, Jun 23, 2026 at 12:33 PM -07, Alexei Starovoitov wrote:
>> > On Tue, Jun 23, 2026 at 12:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>> >>
>> >> On Tue, Jun 23, 2026 at 12:21 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> >
>> >> > On Tue, Jun 23, 2026 at 09:08 AM -07, Kuniyuki Iwashima wrote:
>> >> > > On Tue, Jun 23, 2026 at 4:20 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >> > >>
>> >> > >> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG. When
>> >> > >> completed all code paths related to sockmap-based redirects should be
>> >> > >> guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
>> >> > >> disabling NET_SOCK_MSG. The implementation of sockmap as a container for
>> >> > >> socket references would remain under BPF_SYSCALL.
>> >> > >>
>> >> > >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > >> ---
>> >> > >> Changes in v2:
>> >> > >> - Handle prot->recvmsg being NULL (Sashiko)
>> >> > >> - Elaborate on the end goal in description
>> >> > >> - Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
>> >> > >> ---
>> >> > >> net/unix/af_unix.c | 4 ++--
>> >> > >> net/unix/unix_bpf.c | 6 ++++++
>> >> > >> 2 files changed, 8 insertions(+), 2 deletions(-)
>> >> > >>
>> >> > >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> >> > >> index f7a9d55eee8a..84c11c60c75f 100644
>> >> > >> --- a/net/unix/af_unix.c
>> >> > >> +++ b/net/unix/af_unix.c
>> >> > >> @@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
>> >> > >> #ifdef CONFIG_BPF_SYSCALL
>> >> > >> const struct proto *prot = READ_ONCE(sk->sk_prot);
>> >> > >>
>> >> > >> - if (prot != &unix_dgram_proto)
>> >> > >> + if (prot->recvmsg)
>> >> > >
>> >> > > There is no reason to have this dead branch when
>> >> > > CONFIG_BPF_SYSCALL && !NET_SOCK_MSG.
>> >> > >
>> >> > > Let's compile out all sockmap code when both configs
>> >> > > are not enabled.
>> >> > >
>> >> > > Since AF_UNIX differs from TCP/UDP, it can take the
>> >> > > simpler approach.
>> >> >
>> >> > Okay, will put the whole file behind hidden config option like so:
>> >> >
>> >> > --- a/net/unix/Kconfig
>> >> > +++ b/net/unix/Kconfig
>> >> > @@ -30,3 +30,8 @@ config UNIX_DIAG
>> >> > help
>> >> > Support for UNIX socket monitoring interface used by the ss tool.
>> >> > If unsure, say Y.
>> >> > +
>> >> > +config UNIX_BPF
>> >>
>> >> Maybe UNIX_BPF_SOCKMAP or something.
>> >> bpf_iter is supported without this config.
>> >
>> > I don't like where it's going.
>> > I strongly dislike new config knobs.
>> > I'd rather remove existing knobs.
>> > What is the motivation?
>>
>> The goal is to compile out sockmap bits that use sk_msg.
>> NET_SOCK_MSG is a natural, existing candidate.
>> New knob wasn't my idea.
>
> I'm also missing the big picture here.
>
> sockmap already holds socket references today. You can store and look
> up sockets without attaching any verdict/parser program, and no
> redirect happens. So if the goal is to use sockmap purely as a socket
> container without the sk_msg fast-path overhead, what does a
> compile-time NET_SOCK_MSG knob add over the runtime checks?

Sure, let me clarify. It's about the maintenance overhead.

sockmap-based redirects are a rather niche feature with few users, for
which we've been getting quite a few bug reports since AI came along.
We're not using it internally at Cloudflare, so I don't really have a
good reason to justify time spent on these bug reports.

Hence the move to put sockmap-based redirect behind a config option,
which you can enable at your own risk. Or which we can deprecate, but
that's not really my call.

> I am also not sure if NET_SOCK_MSG is right. It is broader than
> "sockmap redirect". It is selected by TLS and {INET,INET6}_ESPINTCP.
> Because those select it, it can't be toggled independently.

Once the sockmap redirect bits are behind _some_ config option, it will
be easy to replace it with a more granular one that depends on
NET_SOCK_MSG. But we're not there yet. One step at a time.

> Could you share the concrete use case you have in mind, and whether
> this came out of an earlier discussion or thread upstream?

This is a follow-up from discussions at the BPF summit with Alexei & John.