All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Jiayuan Chen <mrpre@163.com>,  bpf@vger.kernel.org
Cc: martin.lau@linux.dev,  ast@kernel.org,  edumazet@google.com,
	 jakub@cloudflare.com,  davem@davemloft.net,  dsahern@kernel.org,
	 kuba@kernel.org,  pabeni@redhat.com,
	 linux-kernel@vger.kernel.org,  song@kernel.org,
	 john.fastabend@gmail.com,  andrii@kernel.org,  mhal@rbox.co,
	 yonghong.song@linux.dev,  daniel@iogearbox.net,
	 xiyou.wangcong@gmail.com,  horms@kernel.org,
	 Jiayuan Chen <mrpre@163.com>
Subject: RE: [PATCH bpf v2 1/2] bpf: fix wrong copied_seq calculation
Date: Tue, 10 Dec 2024 18:11:26 -0800	[thread overview]
Message-ID: <6758f4ce604d5_4e1720871@john.notmuch> (raw)
In-Reply-To: <20241209152740.281125-2-mrpre@163.com>

Jiayuan Chen wrote:
> 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> the update logic for 'sk->copied_seq' was moved to
> tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
> 
> It works for a single stream_verdict scenario, as it also modified
> 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
> to remove updating 'sk->copied_seq'.
> 
> However, for programs where both stream_parser and stream_verdict are
> active(strparser purpose), tcp_read_sock() was used instead of
> tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
> tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
> updates.
> 
> In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
> in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
> 
> The issue causes incorrect copied_seq calculations, which prevent
> correct data reads from the recv() interface in user-land.
> 
> Modifying tcp_read_sock() or strparser implementation directly is
> unreasonable, as it is widely used in other modules.
> 
> Here, we introduce a method tcp_bpf_read_sock() to replace
> 'sk->sk_socket->ops->read_sock' (like 'tls_build_proto()' does in
> tls_main.c). Such replacement action was also used in updating
> tcp_bpf_prots in tcp_bpf.c, so it's not weird.
> (Note that checkpatch.pl may complain missing 'const' qualifier when we
> define the bpf-specified 'proto_ops', but we have to do because we need
> update it).
> 
> Also we remove strparser check in tcp_eat_skb() since we implement custom
> function tcp_bpf_read_sock() without copied_seq updating.
> 
> Since strparser currently supports only TCP, it's sufficient for 'ops' to
> inherit inet_stream_ops.
> 
> In strparser's implementation, regardless of partial or full reads,
> it completely clones the entire skb, allowing us to unconditionally
> free skb in tcp_bpf_read_sock().
> 
> Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
> Signed-off-by: Jiayuan Chen <mrpre@163.com>

[...]

> +/* The tcp_bpf_read_sock() is an alternative implementation
> + * of tcp_read_sock(), except that it does not update copied_seq.
> + */
> +static int tcp_bpf_read_sock(struct sock *sk, read_descriptor_t *desc,
> +			     sk_read_actor_t recv_actor)
> +{
> +	struct sk_buff *skb;
> +	int copied = 0;
> +
> +	if (sk->sk_state == TCP_LISTEN)
> +		return -ENOTCONN;
> +
> +	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> +		u8 tcp_flags;
> +		int used;
> +
> +		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
> +		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
> +		used = recv_actor(desc, skb, 0, skb->len);

Here the skb is still on the receive_queue how does this work with
tcp_try_coalesce()? So I believe you need to unlink before you
call the actor which creates a bit of trouble if recv_actor
doesn't want the entire skb.  

I think easier is to do similar logic to read_sock and track
offset and len? Did I miss something.

> +		/* strparser clone and consume all input skb
> +		 * even in waiting head or body status
> +		 */
> +		tcp_eat_recv_skb(sk, skb);
> +		if (used <= 0) {
> +			if (!copied)
> +				copied = used;
> +			break;
> +		}
> +		copied += used;
> +		if (!desc->count)
> +			break;
> +		if (tcp_flags & TCPHDR_FIN)
> +			break;
> +	}
> +	return copied;
> +}
> +
>  enum {
>  	TCP_BPF_IPV4,
>  	TCP_BPF_IPV6,
> @@ -595,6 +636,10 @@ enum {

  reply	other threads:[~2024-12-11  2:11 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-09 15:27 [PATCH bpf v2 0/2] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
2024-12-09 15:27 ` [PATCH bpf v2 1/2] bpf: fix wrong copied_seq calculation Jiayuan Chen
2024-12-11  2:11   ` John Fastabend [this message]
2024-12-12  6:33     ` Jiayuan Chen
2024-12-13  1:36       ` John Fastabend
2024-12-13 14:07         ` Jiayuan Chen
2024-12-16  3:32           ` John Fastabend
2024-12-18  5:01             ` Jiayuan Chen
2024-12-09 15:27 ` [PATCH bpf v2 2/2] selftests/bpf: add strparser test for bpf Jiayuan Chen
2024-12-14 18:50 ` [PATCH bpf v2 0/2] bpf: fix wrong copied_seq calculation and add tests Jakub Sitnicki
2024-12-15  1:18   ` Cong Wang
2024-12-16 12:19     ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6758f4ce604d5_4e1720871@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mhal@rbox.co \
    --cc=mrpre@163.com \
    --cc=pabeni@redhat.com \
    --cc=song@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.