All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: "Jiayuan Chen" <jiayuan.chen@linux.dev>
Cc: bpf@vger.kernel.org,  "John Fastabend" <john.fastabend@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	 "Eric Dumazet" <edumazet@google.com>,
	 "Jakub Kicinski" <kuba@kernel.org>,
	 "Paolo Abeni" <pabeni@redhat.com>,
	 "Simon Horman" <horms@kernel.org>,
	 "Neal Cardwell" <ncardwell@google.com>,
	 "Kuniyuki Iwashima" <kuniyu@google.com>,
	 "David Ahern" <dsahern@kernel.org>,
	 "Andrii Nakryiko" <andrii@kernel.org>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	 "Alexei Starovoitov" <ast@kernel.org>,
	 "Daniel Borkmann" <daniel@iogearbox.net>,
	 "Martin KaFai Lau" <martin.lau@linux.dev>,
	 "Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	 "KP Singh" <kpsingh@kernel.org>,
	 "Stanislav Fomichev" <sdf@fomichev.me>,
	 "Hao Luo" <haoluo@google.com>,  "Jiri Olsa" <jolsa@kernel.org>,
	 "Shuah Khan" <shuah@kernel.org>,  "Michal Luczaj" <mhal@rbox.co>,
	 "Cong Wang" <cong.wang@bytedance.com>,
	 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap
Date: Fri, 23 Jan 2026 15:59:33 +0100	[thread overview]
Message-ID: <87cy30tlwq.fsf@cloudflare.com> (raw)
In-Reply-To: <d8c440c1255979b5120eed985f7d156f68cd7a3f@linux.dev> (Jiayuan Chen's message of "Thu, 22 Jan 2026 03:56:50 +0000")

On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@linux.dev
> mailto:jiayuan.chen@linux.dev?to=%22Jiayuan%20Chen%22%20%3Cjiayuan.chen%40linux.dev%3E
>> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@cloudflare.com
>> mailto:jakub@cloudflare.com?to=%22Jakub%20Sitnicki%22%20%3Cjakub%40cloudflare.com%3E
>> >  I've been thinking about this some more and came to the conclusion that
>> >  this udp_bpf_ioctl implementation is actually what we want, while
>> >  tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> >  non-empty.
>> >  
>> >  Why? Because the verdict prog might redirect or drop the skbs from
>> >  sk_receive_queue once it actually runs. The messages might never appear
>> >  on the msg_ingress queue.
>> >  
>> >  What I think we should be doing, in the end, is kicking the
>> >  sk_receive_queue processing on bpf_map_update_elem, if there's data
>> >  ready.
>> >  
>> >  The API semantics I'm proposing is:
>> >  
>> >  1. ioctl(FIONREAD) -> reports N bytes
>> >  2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> >  3. poll() for POLLIN -> wait for socket to be ready to read
>> >  5. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> >  redirect or drop it
>> >  
>> >  We don't have to add the the queue kick on map update in this series.
>> >  
>> >  If you decide to leave it for later, can I ask that you open an issue at
>> >  our GH project [1]?
>> >  
>> >  I don't want it to fall through the cracks. And I sometimes have people
>> >  asking what they could help with in sockmap.
>> >  
>> >  Thanks,
>> >  -jkbs
>> >  
>> >  [1] https://github.com/sockmap-project/sockmap-project/issues
>> > 
>> Hi Jakub,
>> 
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>> 
>> There's no rush to merge this patch.
>> 
>> Since the kick queue on bpf_map_update_elem addresses a closely related issue,
>> I think it makes sense to include it in this patchset for easier tracking rather
>> than splitting it out.
>> 
>> I'll spend more time looking into this and come back with an updated version.
>> 
>> Thanks,
>> Jiayuan
>>
>
>
> Hi Jakub,
>
>   I've been thinking about this more, and I realize the problem is not as simple as it seems.
>
>   Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
>   program may not be fully initialized at that point. For example, with a
>   redirect program, the destination fd might not yet be inserted into the
>   map. If we kick the data through the BPF program immediately, the
>   redirect lookup would fail, leading to unexpected behavior (data being
>   dropped or passed to the wrong socket).

I reckon there is not much we can do about it because we have no control
over when inserts/removes sockets from sockmap. It can happen at any
time.

Also, a newly received segment can trigger sk_data_ready callback,
and that would also cause the skbs to get processed. We don't have
control of that either.

Does this change break any of our existing tests/benchmarks or some
other setup of yours?

>   I also considered triggering the kick in poll/select via
>   sk_msg_is_readable(). However, this approach doesn't work for TCP
>   because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
>   will return early when sk_receive_queue has data, before ever calling
>   sk_is_readable().
>
>   In the next version, I'll address your other nits and remove the
>   sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
>   the GH project to track this problem so we can continue exploring
>   better solutions.

Sounds like a plan. Thanks!

  reply	other threads:[~2026-01-23 14:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-13  2:50 [PATCH bpf-next v7 0/3] bpf: Fix FIONREAD and copied_seq issues Jiayuan Chen
2026-01-13  2:50 ` [PATCH bpf-next v7 1/3] bpf, sockmap: Fix incorrect copied_seq calculation Jiayuan Chen
2026-01-20 15:01   ` Jakub Sitnicki
2026-01-20 15:38     ` John Fastabend
2026-01-21  9:43       ` Jakub Sitnicki
2026-01-13  2:50 ` [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap Jiayuan Chen
2026-01-20 15:00   ` Jakub Sitnicki
2026-01-21  9:36     ` Jakub Sitnicki
2026-01-21 12:55       ` Jiayuan Chen
2026-01-22  3:56         ` Jiayuan Chen
2026-01-23 14:59           ` Jakub Sitnicki [this message]
2026-01-13  2:50 ` [PATCH bpf-next v7 3/3] bpf, selftest: Add tests for FIONREAD and copied_seq Jiayuan Chen
2026-01-21  9:45   ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cy30tlwq.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=horms@kernel.org \
    --cc=jiayuan.chen@linux.dev \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mhal@rbox.co \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.