From: Simon Horman <horms@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
Cc: daniel@iogearbox.net, ast@kernel.org, andrii@kernel.org,
jakub@cloudflare.com, bpf@vger.kernel.org,
netdev@vger.kernel.org
Subject: Re: [PATCH bpf 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq
Date: Thu, 21 Sep 2023 22:08:38 +0100 [thread overview]
Message-ID: <20230921210838.GR224399@kernel.org> (raw)
In-Reply-To: <20230920232706.498747-2-john.fastabend@gmail.com>
On Wed, Sep 20, 2023 at 04:27:04PM -0700, John Fastabend wrote:
> Before fix e5c6de5fa0258 tcp_read_skb() would increment the tp->copied-seq
> value. This (as described in the commit) would cause an error for apps
> because once that is incremented the application might believe there is no
> data to be read. Then some apps would stall or abort believing no data is
> available.
>
> However, the fix is incomplete because it introduces another issue in
> the skb dequeue. The loop does tcp_recv_skb() in a while loop to consume
> as many skbs as possible. The problem is the call is,
>
> tcp_recv_skb(sk, seq, &offset)
>
> Where 'seq' is
>
> u32 seq = tp->copied_seq;
>
> Now we can hit a case where we've yet incremented copied_seq from BPF side,
> but then tcp_recv_skb() fails this test,
>
> if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>
> so that instead of returning the skb we call tcp_eat_recv_skb() which frees
> the skb. This is because the routine believes the SKB has been collapsed
> per comment,
>
> /* This looks weird, but this can happen if TCP collapsing
> * splitted a fat GRO packet, while we released socket lock
> * in skb_splice_bits()
> */
>
> This can't happen here we've unlinked the full SKB and orphaned it. Anyways
> it would confuse any BPF programs if the data were suddenly moved underneath
> it.
>
> To fix this situation do simpler operation and just skb_peek() the data
> of the queue followed by the unlink. It shouldn't need to check this
> condition and tcp_read_skb() reads entire skbs so there is no need to
> handle the 'offset!=0' case as we would see in tcp_read_sock().
>
> Fixes: e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq")
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
> net/ipv4/tcp.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0c3040a63ebd..45e7f39e67bc 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1625,12 +1625,11 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
> u32 seq = tp->copied_seq;
Hi John,
according to clang-16, with this change seq is now set but unused.
I guess seq can simply be removed as part of this change.
> struct sk_buff *skb;
> int copied = 0;
> - u32 offset;
>
> if (sk->sk_state == TCP_LISTEN)
> return -ENOTCONN;
>
> - while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
> + while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> u8 tcp_flags;
> int used;
>
> --
> 2.33.0
>
>
next prev parent reply other threads:[~2023-09-21 21:08 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-20 23:27 [PATCH bpf 0/3] bpf, sockmap complete fixes for avail bytes John Fastabend
2023-09-20 23:27 ` [PATCH bpf 1/3] bpf: tcp_read_skb needs to pop skb regardless of seq John Fastabend
2023-09-21 21:08 ` Simon Horman [this message]
2023-09-21 21:23 ` John Fastabend
2023-09-23 14:37 ` kernel test robot
2023-09-20 23:27 ` [PATCH bpf 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set John Fastabend
2023-09-22 10:23 ` Jakub Sitnicki
2023-09-20 23:27 ` [PATCH bpf 3/3] bpf: sockmap, add tests for MSG_F_PEEK John Fastabend
2023-09-22 11:06 ` Jakub Sitnicki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230921210838.GR224399@kernel.org \
--to=horms@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jakub@cloudflare.com \
--cc=john.fastabend@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.