public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Jiayuan Chen <mrpre@163.com>,  John Fastabend <john.fastabend@gmail.com>
Cc: bpf@vger.kernel.org,  martin.lau@linux.dev,  ast@kernel.org,
	 edumazet@google.com,  jakub@cloudflare.com,
	 davem@davemloft.net,  dsahern@kernel.org,  kuba@kernel.org,
	 pabeni@redhat.com,  linux-kernel@vger.kernel.org,
	 song@kernel.org,  andrii@kernel.org,  mhal@rbox.co,
	 yonghong.song@linux.dev,  daniel@iogearbox.net,
	 xiyou.wangcong@gmail.com,  horms@kernel.org
Subject: Re: [PATCH bpf v2 1/2] bpf: fix wrong copied_seq calculation
Date: Sun, 15 Dec 2024 19:32:01 -0800	[thread overview]
Message-ID: <675f9f3184dfe_159ba20815@john.notmuch> (raw)
In-Reply-To: <xtsolkbkdecvlbqx4zjtvd74c45lg5kqx2ojgdvovxrjgaghij@ld4wjwi7imvy>

Jiayuan Chen wrote:
> On Thu, Dec 12, 2024 at 05:36:15PM -0800, John Fastabend wrote:
> [...]
> > > > I think easier is to do similar logic to read_sock and track
> > > > offset and len? Did I miss something.
> > > 
> > > Thanks to John Fastabend.
> > > 
> > > Let me explain it.
> > > Now I only replace the read_sock handler when using strparser.
> > > 
> > > My previous implementation added the replacement of read_sock in
> > > sk_psock_start_strp() to more explicitly require replacement when using
> > > strparser, for instance:
> > > '''skmsg.c
> > > void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
> > > {
> > >     ...
> > >     sk->sk_data_ready = sk_psock_strp_data_ready;
> > >     /* Replacement */
> > >     sk->sk_socket->ops->read_sock = tcp_bpf_read_sock;
> > > }
> > > '''
> > > 
> > > As you can see that it only works for strparser.
> > > (The current implementation of replacement in tcp_bpf_update_proto()
> > > achieves the same effect just not as obviously.)
> > > 
> > > So the current implementation of recv_actor() can only be strp_recv(),
> > > with the call stack as follows:
> > > '''
> > > sk_psock_strp_data_ready
> > >   -> strp_data_ready
> > >     -> strp_read_sock
> > >       -> strp_recv
> > > '''
> > > 
> > > The implementation of strp_recv() will consume all input skb. Even if it
> > > reads part of the data, it will clone it, then place it into its own
> > > queue, expecting the caller to release the skb. I didn't find any
> > > logic like tcp_try_coalesce() (fix me if i miss something).
> > 
> > 
> > So here is what I believe the flow is,
> > 
> > sk_psock_strp_data_ready
> >   -> str_data_ready
> >      -> strp_read_sock
> >         -> sock->ops->read_sock(.., strp_recv)
> > 
> > 
> > We both have the same idea up to here. But then the proposed data_ready()
> > call
> > 
> > +	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> > +		u8 tcp_flags;
> > +		int used;
> > +
> > +		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
> > +		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
> > +		used = recv_actor(desc, skb, 0, skb->len);
> > 
> > The recv_actor here is strp_recv() all good so far. But, because
> > that skb is still on the sk_receive_queue() the TCP stack may
> > at the same time do
> > 
> >  tcp_data_queue
> >   -> tcp_queue_rcv
> >      -> tail = skb_peek_tail(&sk->sk_receive_queue);
> >         tcp_try_coalesce(sk, tail, skb, fragstolen)
> >          -> skb_try_coalesce()
> >             ... skb->len += len
> > 
> > So among other things you will have changed the skb->len and added some
> > data to it. If this happens while you are running the recv actor we will
> > eat the data by calling tcp_eat_recv_skb(). Any data added from the
> > try_coalesce will just be dropped and never handled? The clone() from
> > the strparser side doesn't help you the tcp_eat_recv_skb call will
> > unlik the skb from the sk_receive_queue.
> > 
> > I don't think you have any way to protect this at the moment.
> 
> Thanks John Fastabend.
> 
> It seems sk was always locked whenever data_ready called.
> 
> '''
> bh_lock_sock_nested(sk)
> tcp_v4_do_rcv(sk)
>    |
>    |-> tcp_rcv_established
>    	|-> tcp_queue_rcv 
>    		|-> tcp_try_coalesce
>    |
>    |-> tcp_rcv_state_process
>    	|-> tcp_queue_rcv
>    		|-> tcp_try_coalesce
>    |
>    |-> sk->sk_data_ready()
> 
> bh_unlock_sock(sk)
> '''
> 
> other data_ready path:
> '''
> lock_sk(sk)
> sk->sk_data_ready()
> release_sock(sk)
> '''
> 
> I can not find any concurrency there. 

OK thanks, one more concern though. What if strp_recv thorws an ENOMEM
error on the clone? Would we just drop the data then? This is problem
not the expected behavior its already been ACKed.

Thanks,
John

  reply	other threads:[~2024-12-16  3:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-09 15:27 [PATCH bpf v2 0/2] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
2024-12-09 15:27 ` [PATCH bpf v2 1/2] bpf: fix wrong copied_seq calculation Jiayuan Chen
2024-12-11  2:11   ` John Fastabend
2024-12-12  6:33     ` Jiayuan Chen
2024-12-13  1:36       ` John Fastabend
2024-12-13 14:07         ` Jiayuan Chen
2024-12-16  3:32           ` John Fastabend [this message]
2024-12-18  5:01             ` Jiayuan Chen
2024-12-09 15:27 ` [PATCH bpf v2 2/2] selftests/bpf: add strparser test for bpf Jiayuan Chen
2024-12-14 18:50 ` [PATCH bpf v2 0/2] bpf: fix wrong copied_seq calculation and add tests Jakub Sitnicki
2024-12-15  1:18   ` Cong Wang
2024-12-16 12:19     ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=675f9f3184dfe_159ba20815@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mhal@rbox.co \
    --cc=mrpre@163.com \
    --cc=pabeni@redhat.com \
    --cc=song@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox