public inbox for netdev@vger.kernel.org
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: davem@davemloft.net,  edumazet@google.com,  kuba@kernel.org,
	 pabeni@redhat.com,  horms@kernel.org,  willemb@google.com,
	 martin.lau@kernel.org,  netdev@vger.kernel.org,
	 bpf@vger.kernel.org,  Jason Xing <kernelxing@tencent.com>,
	 Yushan Zhou <katrinzhou@tencent.com>
Subject: Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
Date: Mon, 06 Apr 2026 10:37:57 -0400
Message-ID: <willemdebruijn.kernel.27ec47b22b23c@gmail.com>
In-Reply-To: <CAL+tcoB94=ptFB-acU3nAK_MqW74DV0MQUVvvV3nbQ7Zc6zSZw@mail.gmail.com>

Jason Xing wrote:
> On Mon, Apr 6, 2026 at 10:28 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > The patch is the 1/2 part of push-level granularity feature.
> > >
> > > Tag the skb in tcp_sendmsg_locked() when wait_for_space occurs even
> > > though it might not carry the last byte of the sendmsg.
> > >
> > > Prior to the patch, BPF timestamping cannot cover this case:
> > > The following steps reproduce this:
> > > 1) skb A is the current last skb before entering wait_for_space process
> > > 2) tcp_push() pushes A without any tag
> > > 3) A is transmitted from TCP to driver without putting any skb carrying
> > >    timestamps in the error queue, like SCHED, DRV/HARDWARE.
> > > 4) sk_stream_wait_memory() sleeps for a while and then returns with an
> > >    error code. Note that the socket lock is released.
> > > 5) skb A finally gets acked and removed from the rtx queue.
> > > 6) continue with the rest of tcp_sendmsg_locked(): it will jump to(goto)
> > >    'do_error' label and then 'out' label.
> > > 7) at this moment, skb A turns out to be the last one in this send
> > >    syscall, and miss the following tcp_bpf_tx_timestamp() opportunity
> > >    before the final tcp_push()
> > > 8) BPF script fails to see any timestamps this time
> > >
> > > Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/ipv4/tcp.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index c603b90057f6..7d030a11d004 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1400,9 +1400,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> > >  wait_for_space:
> > >               set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > >               tcp_remove_empty_skb(sk);
> > > -             if (copied)
> > > +             if (copied) {
> > > +                     tcp_bpf_tx_timestamp(sk);
> > >                       tcp_push(sk, flags & ~MSG_MORE, mss_now,
> > >                                TCP_NAGLE_PUSH, size_goal);
> >
> > Now the number of skbs that will be tracked will be unpredictable,
> > varying based on memory pressure.
> 
> Right, I put some effort into writing a selftests to check how many
> push functions get called at one time and failed to do so.
> 
> >
> > That sounds hard to use to me. Especially if these extra pushes
> > cannot be identified as such.
> >
> > Perhaps if all skbs from the same sendmsg call can be identified,
> > that would help explain pattern in data resulting from these
> > uncommon extra data points.
> 
> You meant move tcp_bpf_tx_timestamp before tcp_skb_entail()? That is
> close to packet basis without considering fragmentation of skb :)

No, I meant having some way, in the notification, to identify all the
skbs belonging to the same sendmsg call, to allow filtering on that.
But I also don't immediately see how to do that (without adding yet
another counter, say).
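
To make the counter idea concrete, here is a rough userspace model,
not kernel code and not a proposal for the actual patch. Every name
in it (model_sock, model_skb, sendmsg_seq, sendmsg_id and the helper
functions) is hypothetical and exists only for illustration. It shows
a per-socket counter bumped once per sendmsg call and copied into each
skb at tag time, so that a consumer could group notifications by call:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_sock {
	uint32_t sendmsg_seq;	/* bumped once per modeled sendmsg call */
};

struct model_skb {
	uint32_t sendmsg_id;	/* copied from the socket at tag time */
	int tagged;		/* nonzero once the skb has been tagged */
};

/* Called once at the start of each modeled sendmsg call. */
static uint32_t model_sendmsg_begin(struct model_sock *sk)
{
	return ++sk->sendmsg_seq;
}

/* Tag an skb at push time, recording which sendmsg call it belongs to. */
static void model_tag_skb(const struct model_sock *sk, struct model_skb *skb)
{
	skb->tagged = 1;
	skb->sendmsg_id = sk->sendmsg_seq;
}

/* Consumer side: count the tagged skbs that belong to one sendmsg call. */
static size_t model_count_for_call(const struct model_skb *skbs, size_t n,
				   uint32_t id)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (skbs[i].tagged && skbs[i].sendmsg_id == id)
			count++;
	return count;
}
```

With something along these lines, extra data points caused by
wait_for_space pushes would at least share an id with the final push
of the same call. The cost is the additional per-socket and per-skb
state, which is the part I am not sure is justified.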

Right now, push-based tagging seems rather arbitrary to me, informed
more by technical limitations than by a clear design. Perhaps
per-packet makes more sense, especially since BPF calls are cheap
(compared to the other mechanism, the errqueue).



Thread overview: 14+ messages
2026-04-04 15:04 [PATCH net-next v2 0/4] bpf-timestamp: convert to push-level granularity Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 1/4] tcp: separate BPF timestamping from tcp_tx_timestamp Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 2/4] tcp: advance the tsflags check to save cycles Jason Xing
2026-04-06  2:23   ` Willem de Bruijn
2026-04-06 11:48     ` Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs Jason Xing
2026-04-06  2:28   ` Willem de Bruijn
2026-04-06 11:59     ` Jason Xing
2026-04-06 14:37       ` Willem de Bruijn [this message]
2026-04-07  3:33         ` Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 4/4] bpf-timestamp: complete tracing the skb from each push in sendmsg Jason Xing
2026-04-06  2:17 ` [PATCH net-next v2 0/4] bpf-timestamp: convert to push-level granularity Willem de Bruijn
2026-04-06 12:25   ` Jason Xing
2026-04-06 14:38     ` Willem de Bruijn
