From: Jakub Sitnicki <jakub@cloudflare.com>
To: "Jiayuan Chen" <jiayuan.chen@linux.dev>
Cc: bpf@vger.kernel.org, "John Fastabend" <john.fastabend@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Neal Cardwell" <ncardwell@google.com>,
"Kuniyuki Iwashima" <kuniyu@google.com>,
"David Ahern" <dsahern@kernel.org>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@fomichev.me>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
"Shuah Khan" <shuah@kernel.org>, "Michal Luczaj" <mhal@rbox.co>,
"Cong Wang" <cong.wang@bytedance.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap
Date: Fri, 23 Jan 2026 15:59:33 +0100 [thread overview]
Message-ID: <87cy30tlwq.fsf@cloudflare.com> (raw)
In-Reply-To: <d8c440c1255979b5120eed985f7d156f68cd7a3f@linux.dev> (Jiayuan Chen's message of "Thu, 22 Jan 2026 03:56:50 +0000")
On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@linux.dev
> mailto:jiayuan.chen@linux.dev?to=%22Jiayuan%20Chen%22%20%3Cjiayuan.chen%40linux.dev%3E
>> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@cloudflare.com
>> mailto:jakub@cloudflare.com?to=%22Jakub%20Sitnicki%22%20%3Cjakub%40cloudflare.com%3E
>> > I've been thinking about this some more and came to the conclusion that
>> > this udp_bpf_ioctl implementation is actually what we want, while
>> > tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> > non-empty.
>> >
>> > Why? Because the verdict prog might redirect or drop the skbs from
>> > sk_receive_queue once it actually runs. The messages might never appear
>> > on the msg_ingress queue.
>> >
>> > What I think we should be doing, in the end, is kicking the
>> > sk_receive_queue processing on bpf_map_update_elem, if there's data
>> > ready.
>> >
>> > The API semantics I'm proposing is:
>> >
>> > 1. ioctl(FIONREAD) -> reports N bytes
>> > 2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> > 3. poll() for POLLIN -> wait for socket to be ready to read
>> > 5. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> > redirect or drop it
>> >
>> > We don't have to add the the queue kick on map update in this series.
>> >
>> > If you decide to leave it for later, can I ask that you open an issue at
>> > our GH project [1]?
>> >
>> > I don't want it to fall through the cracks. And I sometimes have people
>> > asking what they could help with in sockmap.
>> >
>> > Thanks,
>> > -jkbs
>> >
>> > [1] https://github.com/sockmap-project/sockmap-project/issues
>> >
>> Hi Jakub,
>>
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>>
>> There's no rush to merge this patch.
>>
>> Since the kick queue on bpf_map_update_elem addresses a closely related issue,
>> I think it makes sense to include it in this patchset for easier tracking rather
>> than splitting it out.
>>
>> I'll spend more time looking into this and come back with an updated version.
>>
>> Thanks,
>> Jiayuan
>>
>
>
> Hi Jakub,
>
> I've been thinking about this more, and I realize the problem is not as simple as it seems.
>
> Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
> program may not be fully initialized at that point. For example, with a
> redirect program, the destination fd might not yet be inserted into the
> map. If we kick the data through the BPF program immediately, the
> redirect lookup would fail, leading to unexpected behavior (data being
> dropped or passed to the wrong socket).
I reckon there is not much we can do about it because we have no control
over when inserts/removes sockets from sockmap. It can happen at any
time.
Also, a newly received segment can trigger sk_data_ready callback,
and that would also cause the skbs to get processed. We don't have
control of that either.
Does this change break any of our existing tests/benchmarks or some
other setup of yours?
> I also considered triggering the kick in poll/select via
> sk_msg_is_readable(). However, this approach doesn't work for TCP
> because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
> will return early when sk_receive_queue has data, before ever calling
> sk_is_readable().
>
> In the next version, I'll address your other nits and remove the
> sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
> the GH project to track this problem so we can continue exploring
> better solutions.
Sounds like a plan. Thanks!
next prev parent reply other threads:[~2026-01-23 14:59 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-13 2:50 [PATCH bpf-next v7 0/3] bpf: Fix FIONREAD and copied_seq issues Jiayuan Chen
2026-01-13 2:50 ` [PATCH bpf-next v7 1/3] bpf, sockmap: Fix incorrect copied_seq calculation Jiayuan Chen
2026-01-20 15:01 ` Jakub Sitnicki
2026-01-20 15:38 ` John Fastabend
2026-01-21 9:43 ` Jakub Sitnicki
2026-01-13 2:50 ` [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap Jiayuan Chen
2026-01-20 15:00 ` Jakub Sitnicki
2026-01-21 9:36 ` Jakub Sitnicki
2026-01-21 12:55 ` Jiayuan Chen
2026-01-22 3:56 ` Jiayuan Chen
2026-01-23 14:59 ` Jakub Sitnicki [this message]
2026-01-13 2:50 ` [PATCH bpf-next v7 3/3] bpf, selftest: Add tests for FIONREAD and copied_seq Jiayuan Chen
2026-01-21 9:45 ` Jakub Sitnicki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87cy30tlwq.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=cong.wang@bytedance.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=haoluo@google.com \
--cc=horms@kernel.org \
--cc=jiayuan.chen@linux.dev \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mhal@rbox.co \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox