From: Jakub Sitnicki <jakub@cloudflare.com>
To: Jiayuan Chen <mrpre@163.com>
Cc: bpf@vger.kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, martin.lau@linux.dev, ast@kernel.org,
edumazet@google.com, davem@davemloft.net, dsahern@kernel.org,
kuba@kernel.org, pabeni@redhat.com,
linux-kernel@vger.kernel.org, song@kernel.org,
andrii@kernel.org, mhal@rbox.co, yonghong.song@linux.dev,
daniel@iogearbox.net, xiyou.wangcong@gmail.com,
horms@kernel.org, corbet@lwn.net, eddyz87@gmail.com,
cong.wang@bytedance.com, shuah@kernel.org, mykolal@fb.com,
jolsa@kernel.org, haoluo@google.com, sdf@fomichev.me,
kpsingh@kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
Date: Sat, 18 Jan 2025 15:50:22 +0100
Message-ID: <87ikqcdvm9.fsf@cloudflare.com>
In-Reply-To: <20250116140531.108636-3-mrpre@163.com> (Jiayuan Chen's message of "Thu, 16 Jan 2025 22:05:28 +0800")

On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> the update logic for 'sk->copied_seq' was moved to
> tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
>
> It works for a single stream_verdict scenario, as it also modified
> 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
> to remove updating 'sk->copied_seq'.
>
> However, for programs where both stream_parser and stream_verdict are
> active (the strparser use case), tcp_read_sock() is used instead of
> tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock).
> tcp_read_sock() still updates 'sk->copied_seq', leading to duplicated
> updates.
>
> In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
> in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
>
> The issue causes incorrect copied_seq calculations, which prevent
> correct data reads from the recv() interface in user-land.
>
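A minimal user-space sketch of the double update described above. This is
a toy model, not kernel code: the toy_* names and the struct are
hypothetical and only mirror the roles of tcp_read_sock() and
tcp_bpf_recvmsg_parser() in the strparser + SK_PASS case. It shows
copied_seq running ahead of rcv_nxt by exactly the number of bytes that
were counted twice.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: only the fields needed to show the bug. */
struct toy_sock {
	uint32_t rcv_nxt;    /* next sequence number expected from the peer */
	uint32_t copied_seq; /* first sequence number not yet read by user space */
};

/* Stands in for tcp_read_sock(): hands 'len' bytes to the strparser
 * and advances copied_seq.
 */
static void toy_read_sock(struct toy_sock *sk, uint32_t len)
{
	sk->copied_seq += len;
}

/* Stands in for tcp_bpf_recvmsg_parser() with SK_PASS: advances
 * copied_seq again when user space reads the same bytes.
 */
static void toy_recvmsg(struct toy_sock *sk, uint32_t len)
{
	sk->copied_seq += len;
}

/* Returns how far copied_seq overshoots rcv_nxt after 'len' bytes are
 * received once and then read once by user space.
 */
static uint32_t toy_overshoot(uint32_t len)
{
	struct toy_sock sk = { .rcv_nxt = 1000, .copied_seq = 1000 };

	assert(len > 0);
	sk.rcv_nxt += len;       /* data arrives from the peer */
	toy_read_sock(&sk, len); /* strparser path counts the bytes */
	toy_recvmsg(&sk, len);   /* recv() path counts them again */
	return sk.copied_seq - sk.rcv_nxt;
}
```

With a single update the overshoot would be zero; here it equals 'len'.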
> We do not want to add new proto_ops to implement a new version of
> tcp_read_sock, as this would introduce code complexity [1].
>
> We add a new callback to strparser for a customized read operation. As a
> wrapper function, it also provides an abstraction that uses psock.
>
> [1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com
> Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> ---
[...]
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 47f65b1b70ca..6dcde3506a9b 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -646,6 +646,47 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
> ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP;
> }
>
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +static int tcp_bpf_strp_read_sock(struct sock *sk, read_descriptor_t *desc,
> +				  sk_read_actor_t recv_actor)
> +{
> +	struct sk_psock *psock;
> +	struct tcp_sock *tp;
> +	int copied = 0;
> +
> +	tp = tcp_sk(sk);
> +	rcu_read_lock();
> +	psock = sk_psock(sk);
> +	if (WARN_ON(!psock)) {
> +		desc->error = -EINVAL;
> +		goto out;
> +	}
> +
> +	psock->ingress_bytes = 0;
> +	/* We could easily add copied_seq and noack into desc and then call
> +	 * ops->read_sock without calling the symbol directly. Unfortunately,
> +	 * most descriptors used by other modules are not zero-initialized.
> +	 * Replacing ops->read_sock without introducing new ops does not work
> +	 * either, as ops itself is located in the rodata segment.
> +	 */
> +	copied = tcp_read_sock_noack(sk, desc, recv_actor, true,
> +				     &psock->copied_seq);
> +	if (copied < 0)
> +		goto out;
> +	/* recv_actor may redirect the skb to another socket (SK_REDIRECT) or
> +	 * just put the skb into the ingress queue of this socket (SK_PASS).
> +	 * For SK_REDIRECT, we need to 'ack' the frame immediately, but for
> +	 * SK_PASS, the 'ack' is delayed to tcp_bpf_recvmsg_parser().
> +	 */
> +	tp->copied_seq = psock->copied_seq - psock->ingress_bytes;
> +	tcp_rcv_space_adjust(sk);
> +	__tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);
> +out:
> +	rcu_read_unlock();
> +	return copied;
> +}
> +#endif /* CONFIG_BPF_STREAM_PARSER */
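To make the arithmetic in the quoted function concrete, here is a
user-space sketch of a single invocation. It is a toy model under stated
assumptions: the toy_* names are hypothetical, 'copied' stands in for the
return value of tcp_read_sock_noack(), and 'passed' stands in for the
bytes that the verdict path adds to psock->ingress_bytes on SK_PASS (the
code that increments ingress_bytes is not part of the quoted hunk, so that
step is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_psock {
	uint32_t copied_seq;    /* sequence number up to which data was parsed */
	uint32_t ingress_bytes; /* bytes queued locally (SK_PASS) in this call */
};

struct toy_result {
	uint32_t tp_copied_seq; /* value that would be stored in tp->copied_seq */
	int acked_bytes;        /* argument that would go to __tcp_cleanup_rbuf() */
};

/* Models one call: 'copied' bytes were parsed, 'passed' of them were
 * queued on this socket (SK_PASS), the rest were redirected.
 */
static struct toy_result toy_strp_read(struct toy_psock *psock, int copied,
				       uint32_t passed)
{
	struct toy_result res;

	assert(psock != NULL);
	assert(copied >= 0 && passed <= (uint32_t)copied);

	psock->ingress_bytes = 0;
	psock->copied_seq += (uint32_t)copied; /* assumed effect of the read */
	psock->ingress_bytes += passed;        /* assumed effect of SK_PASS */

	/* Only bytes that left this socket are acknowledged right away. */
	res.tp_copied_seq = psock->copied_seq - psock->ingress_bytes;
	res.acked_bytes = copied - (int)psock->ingress_bytes;
	return res;
}
```

For example, starting from copied_seq 1000 with 300 bytes parsed and 100
of them passed, the model yields tp_copied_seq 1200 and 200 bytes acked;
the remaining 100 bytes are left for the recvmsg path to account for.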
> +
> int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> {
> 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
> @@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>
> 	/* Pairs with lockless read in sk_clone_lock() */
> 	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
> +		psock->copied_seq = tcp_sk(sk)->copied_seq;
> +		psock->read_sock = tcp_bpf_strp_read_sock;

Just directly set psock->strp.cb.read_sock to tcp_bpf_strp_read_sock.
Then we don't need this intermediate psock->read_sock callback, which
doesn't do anything useful.

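A rough user-space sketch of that shape, for illustration only. The
toy_* types and functions are hypothetical stand-ins; only the field path
psock->strp.cb.read_sock and the function name tcp_bpf_strp_read_sock come
from this thread. The point is that the parser calls its own callback
directly, so no extra function pointer on the psock is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_read_sock_fn)(void *sk);

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_strp_callbacks {
	toy_read_sock_fn read_sock;
};

struct toy_strparser {
	struct toy_strp_callbacks cb;
};

struct toy_psock {
	struct toy_strparser strp;
};

/* Stand-in for tcp_bpf_strp_read_sock(); returns a fixed byte count. */
static int toy_tcp_bpf_strp_read_sock(void *sk)
{
	(void)sk;
	return 42;
}

/* Stand-in for the tcp_bpf_update_proto() hunk: install the callback
 * on the strparser itself instead of on an intermediate psock field.
 */
static void toy_update_proto(struct toy_psock *psock)
{
	assert(psock != NULL);
	psock->strp.cb.read_sock = toy_tcp_bpf_strp_read_sock;
}

/* Stand-in for the strparser read path: use the callback if one is set. */
static int toy_strp_read(struct toy_psock *psock, void *sk)
{
	if (!psock->strp.cb.read_sock)
		return -1; /* toy fallback; the real default path differs */
	return psock->strp.cb.read_sock(sk);
}
```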
> +	}
> +#endif
> 	return 0;
> }
> EXPORT_SYMBOL_GPL(tcp_bpf_update_proto);
Thread overview: 13+ messages
2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
2025-01-18 14:56 ` Jakub Sitnicki
2025-01-16 14:05 ` [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen
2025-01-18 14:50 ` Jakub Sitnicki [this message]
2025-01-18 15:29 ` Jiayuan Chen
2025-01-20 3:35 ` Jiayuan Chen
2025-01-20 10:13 ` Jakub Sitnicki
2025-01-16 14:05 ` [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser Jiayuan Chen
2025-01-18 15:03 ` Jakub Sitnicki
2025-01-18 15:32 ` Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen