From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>
Cc: Amery Hung <ameryhung@gmail.com>,
	Kuniyuki Iwashima <kuniyu@google.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jakub Kicinski <kuba@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Network Development <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
Date: Wed, 24 Jun 2026 09:32:25 +0800
Message-ID: <a50cef70-d8fe-4f42-a89b-2c63c33a72ef@linux.dev>
In-Reply-To: <CAADnVQKr1XisnigNsBw7CsXxY3Xn5KOGtX_YDdXmNMZyJy4_Cw@mail.gmail.com>


On 6/24/26 5:26 AM, Alexei Starovoitov wrote:
> On Tue, Jun 23, 2026 at 1:36 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> On Tue, Jun 23, 2026 at 01:22 PM -07, Amery Hung wrote:
>>> On Tue, Jun 23, 2026 at 1:04 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>> On Tue, Jun 23, 2026 at 12:33 PM -07, Alexei Starovoitov wrote:
>>>>> On Tue, Jun 23, 2026 at 12:31 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>>>> On Tue, Jun 23, 2026 at 12:21 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>>>>> On Tue, Jun 23, 2026 at 09:08 AM -07, Kuniyuki Iwashima wrote:
>>>>>>>> On Tue, Jun 23, 2026 at 4:20 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>>>>>>>> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG. When
>>>>>>>>> completed all code paths related to sockmap-based redirects should be
>>>>>>>>> guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
>>>>>>>>> disabling NET_SOCK_MSG. The implementation of sockmap as a container for
>>>>>>>>> socket references would remain under BPF_SYSCALL.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>>>>>>>>> ---
>>>>>>>>> Changes in v2:
>>>>>>>>> - Handle prot->recvmsg being NULL (Sashiko)
>>>>>>>>> - Elaborate on the end goal in description
>>>>>>>>> - Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
>>>>>>>>> ---
>>>>>>>>>   net/unix/af_unix.c  | 4 ++--
>>>>>>>>>   net/unix/unix_bpf.c | 6 ++++++
>>>>>>>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>>>>>>>>> index f7a9d55eee8a..84c11c60c75f 100644
>>>>>>>>> --- a/net/unix/af_unix.c
>>>>>>>>> +++ b/net/unix/af_unix.c
>>>>>>>>> @@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
>>>>>>>>>   #ifdef CONFIG_BPF_SYSCALL
>>>>>>>>>          const struct proto *prot = READ_ONCE(sk->sk_prot);
>>>>>>>>>
>>>>>>>>> -       if (prot != &unix_dgram_proto)
>>>>>>>>> +       if (prot->recvmsg)
>>>>>>>> There is no reason to have this dead branch when
>>>>>>>> CONFIG_BPF_SYSCALL && !NET_SOCK_MSG.
>>>>>>>>
>>>>>>>> Let's compile out all sockmap code when both configs
>>>>>>>> are not enabled.
>>>>>>>>
>>>>>>>> Since AF_UNIX differs from TCP/UDP, it can take the
>>>>>>>> simpler approach.
>>>>>>> Okay, will put the whole file behind hidden config option like so:
>>>>>>>
>>>>>>> --- a/net/unix/Kconfig
>>>>>>> +++ b/net/unix/Kconfig
>>>>>>> @@ -30,3 +30,8 @@ config UNIX_DIAG
>>>>>>>          help
>>>>>>>            Support for UNIX socket monitoring interface used by the ss tool.
>>>>>>>            If unsure, say Y.
>>>>>>> +
>>>>>>> +config UNIX_BPF
>>>>>> Maybe UNIX_BPF_SOCKMAP or something.
>>>>>> bpf_iter is supported without this config.
>>>>> I don't like where it's going.
>>>>> I strongly dislike new config knobs.
>>>>> I'd rather remove existing knobs.
>>>>> What is the motivation?
>>>> The goal is to compile out sockmap bits that use sk_msg.
>>>> NET_SOCK_MSG is a natural, existing candidate.
>>>> New knob wasn't my idea.
>>> I'm also missing the big picture here.
>>>
>>> sockmap already holds socket references today. You can store and look
>>> up sockets without attaching any verdict/parser program, and no
>>> redirect happens. So if the goal is to use sockmap purely as a socket
>>> container without the sk_msg fast-path overhead, what does a
>>> compile-time NET_SOCK_MSG knob add over the runtime checks?
>> Sure, let me clarify. It's about the maintenance overhead.
>>
>> sockmap-based redirects are a rather niche feature with few users, for
>> which we've been getting quite a few bug reports since AI came along.
>>
>> We're not using it internally at Cloudflare, so I don't really have a
>> good reason to justify time spent on these bug reports.
>>
>> Hence the move to put sockmap-based redirect behind a config option,
>> which you can enable at your own risk. Or which we can deprecate, but
>> that's not really my call.


Hi Alexei and Jakub,

skmsg is actually still pretty useful for gateways.
I got started with BPF by integrating skmsg into nginx as a module, and
Envoy has something similar.
The usual setup is cgroup/sk for L4 bypass (reject SYN) and skmsg for L7,
redirecting between local apps by looking at the payload. So there are
real users.


> This is wishful thinking that a config knob will stop
> the bug reports.
> Just disable it for real instead.


About the AI bug reports - yeah, I've seen them too. I think they just
come from the complexity of networking plus how programmable BPF is.
Reviewing AI-written patches is often painful: the commit message is
frequently wrong, and once it took me a whole day just to reproduce and
confirm the issue. But I do believe these reports will converge eventually.


>>> I am also not sure if NET_SOCK_MSG is right. It is broader than
>>> "sockmap redirect". It is selected by TLS and {INET,INET6}_ESPINTCP.
>>> Because those select it, it can't be toggled independently.
>> Once the sockmap redirect bits are behind _some_ config option, it will
>> be easy to replace it with a more granular one that depends on
>> NET_SOCK_MSG. But we're not there yet. One step at a time.
> No. That's not workable.
>
>>> Could you share the concrete use case you have in mind, and whether
>>> this came out of an earlier discussion or thread upstream?
>> This is a follow up from discussions at BPF summit with Alexei & John.
> Not quite. The discussion was to disable pieces of sockmap
> that are causing trouble.
> Not to move them under config knobs, but disable them.

Agreed, just like removing skmsg from KTLS, which is rarely used.


I think the motivation of this patch - making the boundary between skmsg
and sockmap clear - is worthwhile.
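
To make that boundary concrete, here is a rough sketch of the split the
patch description talks about: the socket-container side is built whenever
BPF_SYSCALL is set, the redirect side only when NET_SOCK_MSG is set as
well. This is plain userspace C, not code from the patch. The two helper
functions and the hand-written CONFIG_ defines are made up for
illustration (a kernel build gets those symbols from Kconfig).

```c
/* Hypothetical stand-ins for Kconfig output; a real kernel build gets
 * these from the generated autoconf.h, never from the source file. */
#define CONFIG_BPF_SYSCALL 1
/* CONFIG_NET_SOCK_MSG is deliberately left undefined here. */

/* Container side: storing and looking up sockets only needs BPF_SYSCALL.
 * (Hypothetical helper, not a kernel function.) */
static int sockmap_container_built(void)
{
#ifdef CONFIG_BPF_SYSCALL
	return 1;
#else
	return 0;
#endif
}

/* Redirect side: sk_msg-based paths need both options.
 * (Hypothetical helper, not a kernel function.) */
static int sockmap_redirect_built(void)
{
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_NET_SOCK_MSG)
	return 1;
#else
	return 0;
#endif
}
```

With the defines above, sockmap_container_built() returns 1 and
sockmap_redirect_built() returns 0; defining CONFIG_NET_SOCK_MSG as well
makes the second one return 1.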

I hope skmsg won't be disabled by default.
I don't work on that upper-layer software anymore, but I really don't want
my ex-colleagues to upgrade their kernel some day, find the feature I wrote
broken, and come curse me :) (selfish)

