From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>
Cc: Amery Hung <ameryhung@gmail.com>,
	Kuniyuki Iwashima <kuniyu@google.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jakub Kicinski <kuba@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Network Development <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
Date: Wed, 24 Jun 2026 09:32:25 +0800
Message-ID: <a50cef70-d8fe-4f42-a89b-2c63c33a72ef@linux.dev>
In-Reply-To: <CAADnVQKr1XisnigNsBw7CsXxY3Xn5KOGtX_YDdXmNMZyJy4_Cw@mail.gmail.com>


On 6/24/26 5:26 AM, Alexei Starovoitov wrote:
> On Tue, Jun 23, 2026 at 1:36 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> On Tue, Jun 23, 2026 at 01:22 PM -07, Amery Hung wrote:
>>> On Tue, Jun 23, 2026 at 1:04 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>> On Tue, Jun 23, 2026 at 12:33 PM -07, Alexei Starovoitov wrote:
>>>>> On Tue, Jun 23, 2026 at 12:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>>>> On Tue, Jun 23, 2026 at 12:21 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>>>>> On Tue, Jun 23, 2026 at 09:08 AM -07, Kuniyuki Iwashima wrote:
>>>>>>>> On Tue, Jun 23, 2026 at 4:20 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>>>>>>> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG. When
>>>>>>>>> completed all code paths related to sockmap-based redirects should be
>>>>>>>>> guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
>>>>>>>>> disabling NET_SOCK_MSG. The implementation of sockmap as a container for
>>>>>>>>> socket references would remain under BPF_SYSCALL.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>>>>>>>>> ---
>>>>>>>>> Changes in v2:
>>>>>>>>> - Handle prot->recvmsg being NULL (Sashiko)
>>>>>>>>> - Elaborate on the end goal in description
>>>>>>>>> - Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
>>>>>>>>> ---
>>>>>>>>>   net/unix/af_unix.c  | 4 ++--
>>>>>>>>>   net/unix/unix_bpf.c | 6 ++++++
>>>>>>>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>>>>>>>>> index f7a9d55eee8a..84c11c60c75f 100644
>>>>>>>>> --- a/net/unix/af_unix.c
>>>>>>>>> +++ b/net/unix/af_unix.c
>>>>>>>>> @@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
>>>>>>>>>   #ifdef CONFIG_BPF_SYSCALL
>>>>>>>>>          const struct proto *prot = READ_ONCE(sk->sk_prot);
>>>>>>>>>
>>>>>>>>> -       if (prot != &unix_dgram_proto)
>>>>>>>>> +       if (prot->recvmsg)
>>>>>>>> There is no reason to have this dead branch when
>>>>>>>> CONFIG_BPF_SYSCALL && !NET_SOCK_MSG.
>>>>>>>>
>>>>>>>> Let's compile out all sockmap code when both configs
>>>>>>>> are not enabled.
>>>>>>>>
>>>>>>>> Since AF_UNIX differs from TCP/UDP, it can take the
>>>>>>>> simpler approach.
>>>>>>> Okay, will put the whole file behind hidden config option like so:
>>>>>>>
>>>>>>> --- a/net/unix/Kconfig
>>>>>>> +++ b/net/unix/Kconfig
>>>>>>> @@ -30,3 +30,8 @@ config UNIX_DIAG
>>>>>>>          help
>>>>>>>            Support for UNIX socket monitoring interface used by the ss tool.
>>>>>>>            If unsure, say Y.
>>>>>>> +
>>>>>>> +config UNIX_BPF
>>>>>> Maybe UNIX_BPF_SOCKMAP or something.
>>>>>> bpf_iter is supported without this config.
>>>>> I don't like where it's going.
>>>>> I strongly dislike new config knobs.
>>>>> I'd rather remove existing knobs.
>>>>> What is the motivation?
>>>> The goal is to compile out sockmap bits that use sk_msg.
>>>> NET_SOCK_MSG is natural, exisiting candidate.
>>>> New knob wasn't my idea.
>>> I'm also missing the big picture here.
>>>
>>> sockmap already holds socket references today. You can store and look
>>> up sockets without attaching any verdict/parser program, and no
>>> redirect happens. So if the goal is to use sockmap purely as a socket
>>> container without the sk_msg fast-path overhead, what does a
>>> compile-time NET_SOCK_MSG knob add over the runtime checks?
>> Sure, let me clarify. It's about the maintenance overhead.
>>
>> sockmap-based redirects are a rather niche feature with few users, for
>> which we've been getting quite a few bug reports since AI came along.
>>
>> We're not using it internally at Cloudflare, so I don't really have a
>> good reason to justify time spent on these bug reports.
>>
>> Hence the move to put sockmap-based redirect behind a config option,
>> which you can enable at your own risk. Or which we can deprecate, but
>> that's not really my call.


Hi Alexei and Jakub,

skmsg is actually still pretty useful for gateways.
I got started with BPF by integrating skmsg into nginx as a module, and
Envoy has something similar.
The usual setup is cgroup/sk for L4 bypass (reject SYN) and skmsg for
L7, redirecting between local apps based on the payload. So there are
real users.


> This is wishful thinking that a config knob will stop
> the bug reports.
> Just disable it for real instead.


About the AI bug reports - yeah, I've seen them too. I think they just
come from the complexity of networking plus how programmable BPF is.
Reviewing AI-written patches is often painful: the commit message is
frequently wrong, and once it took me a whole day just to reproduce and
confirm the issue. But I do believe these reports will converge
eventually.


>>> I am also not sure if NET_SOCK_MSG is right. It is broader than
>>> "sockmap redirect". It is selected by TLS and {INET,INET6}_ESPINTCP.
>>> Because those select it, it can't be toggled independently.
>> Once the sockmap redirect bits are behind _some_ config option, it will
>> be easy to replace it with a more granular one that depends on
>> NET_SOCK_MSG. But we're not there yet. One step at a time.
> No. That's not workable.
>
>>> Could you share the concrete use case you have in mind, and whether
>>> this came out of an earlier discussion or thread upstream?
>> This is a follow up from discussions at BPF summit with Alexei & John.
> Not quite. The discussion was to disable pieces of sockmap
> that are causing trouble.
> Not to move them under config knobs, but disable them.

Agreed, just like removing skmsg from KTLS, which is rarely used.


I think the motivation of this patch - making the boundary between skmsg
and sockmap clear - is worthwhile.

I hope skmsg won't be disabled by default.
I don't work on that upper-layer software anymore, but I really don't
want my ex-colleagues to upgrade their kernel some day, find the feature
I wrote broken, and come curse me :) (selfish)

