All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: daniel@iogearbox.net, lmb@isovalent.com, edumazet@google.com,
	bpf@vger.kernel.org, netdev@vger.kernel.org, ast@kernel.org,
	andrii@kernel.org, will@isovalent.com
Subject: Re: [PATCH bpf v7 08/13] bpf: sockmap, incorrectly handling copied_seq
Date: Fri, 05 May 2023 14:14:12 +0200	[thread overview]
Message-ID: <87zg6jvtnx.fsf@cloudflare.com> (raw)
In-Reply-To: <20230502155159.305437-9-john.fastabend@gmail.com>

On Tue, May 02, 2023 at 08:51 AM -07, John Fastabend wrote:
> The read_skb() logic is incrementing the tcp->copied_seq which is used for
> among other things calculating how many outstanding bytes can be read by
> the application. This results in application errors, if the application
> does an ioctl(FIONREAD) we return zero because this is calculated from
> the copied_seq value.
>
> To fix this we move tcp->copied_seq accounting into the recv handler so
> that we update these when the recvmsg() hook is called and data is in
> fact copied into user buffers. This gives an accurate FIONREAD value
> as expected and improves ACK handling. Before we were calling the
> tcp_rcv_space_adjust() which would update 'number of bytes copied to
> user in last RTT' which is wrong for programs returning SK_PASS. The
> bytes are only copied to the user when recvmsg is handled.
>
> Doing the fix for recvmsg is straightforward, but fixing redirect and
> SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then
> call this from skmsg handlers. This fixes another issue where a broken
> socket with a BPF program doing a resubmit could hang the receiver. This
> happened because although read_skb() consumed the skb through sock_drop()
> it did not update the copied_seq. Now if a single reccv socket is
> redirecting to many sockets (for example for lb) the receiver sk will be
> hung even though we might expect it to continue. The hang comes from
> not updating the copied_seq numbers and memory pressure resulting from
> that.
>
> We have a slight layer problem of calling tcp_eat_skb even if its not
> a TCP socket. To fix we could refactor and create per type receiver
> handlers. I decided this is more work than we want in the fix and we
> already have some small tweaks depending on caller that use the
> helper skb_bpf_strparser(). So we extend that a bit and always set
> the strparser bit when it is in use and then we can gate the
> seq_copied updates on this.
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/net/tcp.h  | 10 ++++++++++
>  net/core/skmsg.c   |  7 +++++--
>  net/ipv4/tcp.c     | 10 +---------
>  net/ipv4/tcp_bpf.c | 28 +++++++++++++++++++++++++++-
>  4 files changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index db9f828e9d1e..76bf0a11bdc7 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1467,6 +1467,8 @@ static inline void tcp_adjust_rcv_ssthresh(struct sock *sk)
>  }
>  
>  void tcp_cleanup_rbuf(struct sock *sk, int copied);
> +void __tcp_cleanup_rbuf(struct sock *sk, int copied);
> +
>  
>  /* We provision sk_rcvbuf around 200% of sk_rcvlowat.
>   * If 87.5 % (7/8) of the space has been consumed, we want to override
> @@ -2323,6 +2325,14 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
>  void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
>  #endif /* CONFIG_BPF_SYSCALL */
>  
> +#ifdef CONFIG_INET
> +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb);
> +#else
> +static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +}
> +#endif
> +
>  int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
>  			  struct sk_msg *msg, u32 bytes, int flags);
>  #endif /* CONFIG_NET_SOCK_MSG */
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 3c0663f5cc3e..18c4f4015559 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1017,11 +1017,14 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>  		}
>  		break;
>  	case __SK_REDIRECT:
> +		tcp_eat_skb(psock->sk, skb);
>  		err = sk_psock_skb_redirect(psock, skb);
>  		break;
>  	case __SK_DROP:
>  	default:
>  out_free:
> +		tcp_eat_skb(psock->sk, skb);
> +		skb_bpf_redirect_clear(skb);
>  		sock_drop(psock->sk, skb);
>  	}
>  

I have a feeling you wanted to factor out the common
skb_bpf_redirect_clear() into out_free: block, but maybe forgot to
update the jump sites?

[...]

  reply	other threads:[~2023-05-05 12:39 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-02 15:51 [PATCH bpf v7 00/13] bpf sockmap fixes John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 01/13] bpf: sockmap, pass skb ownership through read_skb John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 02/13] bpf: sockmap, convert schedule_work into delayed_work John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 03/13] bpf: sockmap, reschedule is now done through backlog John Fastabend
2023-05-03  9:49   ` Jakub Sitnicki
2023-05-02 15:51 ` [PATCH bpf v7 04/13] bpf: sockmap, improved check for empty queue John Fastabend
2023-05-04 16:53   ` Jakub Sitnicki
2023-05-04 17:42     ` John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 05/13] bpf: sockmap, handle fin correctly John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 06/13] bpf: sockmap, TCP data stall on recv before accept John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 07/13] bpf: sockmap, wake up polling after data copy John Fastabend
2023-05-02 15:51 ` [PATCH bpf v7 08/13] bpf: sockmap, incorrectly handling copied_seq John Fastabend
2023-05-05 12:14   ` Jakub Sitnicki [this message]
2023-05-02 15:51 ` [PATCH bpf v7 09/13] bpf: sockmap, pull socket helpers out of listen test for general use John Fastabend
2023-05-05 17:38   ` Jakub Sitnicki
2023-05-02 15:51 ` [PATCH bpf v7 10/13] bpf: sockmap, build helper to create connected socket pair John Fastabend
2023-05-05 17:39   ` Jakub Sitnicki
2023-05-02 15:51 ` [PATCH bpf v7 11/13] bpf: sockmap, test shutdown() correctly exits epoll and recv()=0 John Fastabend
2023-05-08 11:04   ` Jakub Sitnicki
2023-05-16  1:51     ` John Fastabend
2023-05-16 13:41       ` Jakub Sitnicki
2023-05-02 15:51 ` [PATCH bpf v7 12/13] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer John Fastabend
2023-05-08 11:19   ` Jakub Sitnicki
2023-05-02 15:51 ` [PATCH bpf v7 13/13] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer with drops John Fastabend
2023-05-08 11:34   ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zg6jvtnx.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=edumazet@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=lmb@isovalent.com \
    --cc=netdev@vger.kernel.org \
    --cc=will@isovalent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.