From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, willemb@google.com,
martin.lau@kernel.org, netdev@vger.kernel.org,
bpf@vger.kernel.org, Jason Xing <kernelxing@tencent.com>,
Yushan Zhou <katrinzhou@tencent.com>
Subject: Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
Date: Mon, 06 Apr 2026 10:37:57 -0400
Message-ID: <willemdebruijn.kernel.27ec47b22b23c@gmail.com>
In-Reply-To: <CAL+tcoB94=ptFB-acU3nAK_MqW74DV0MQUVvvV3nbQ7Zc6zSZw@mail.gmail.com>
Jason Xing wrote:
> On Mon, Apr 6, 2026 at 10:28 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > This patch is part 1 of 2 of the push-level granularity feature.
> > >
> > > Tag the skb in tcp_sendmsg_locked() when wait_for_space occurs even
> > > though it might not carry the last byte of the sendmsg.
> > >
> > > Prior to this patch, BPF timestamping cannot cover the following case.
> > > These steps reproduce it:
> > > 1) skb A is the current last skb before entering wait_for_space process
> > > 2) tcp_push() pushes A without any tag
> > > 3) A is transmitted from TCP to driver without putting any skb carrying
> > > timestamps in the error queue, like SCHED, DRV/HARDWARE.
> > > 4) sk_stream_wait_memory() sleeps for a while and then returns with an
> > > error code. Note that the socket lock is released.
> > > 5) skb A finally gets acked and removed from the rtx queue.
> > > 6) continue with the rest of tcp_sendmsg_locked(): it jumps (goto) to the
> > > 'do_error' label and then the 'out' label.
> > > 7) at this moment, skb A turns out to be the last one in this send
> > > syscall, and misses the following tcp_bpf_tx_timestamp() opportunity
> > > before the final tcp_push()
> > > 8) the BPF script fails to see any timestamps this time
> > >
> > > Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > net/ipv4/tcp.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index c603b90057f6..7d030a11d004 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1400,9 +1400,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> > > wait_for_space:
> > > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > > tcp_remove_empty_skb(sk);
> > > - if (copied)
> > > + if (copied) {
> > > + tcp_bpf_tx_timestamp(sk);
> > > tcp_push(sk, flags & ~MSG_MORE, mss_now,
> > > TCP_NAGLE_PUSH, size_goal);
> >
> > Now the number of skbs that will be tracked will be unpredictable,
> > varying based on memory pressure.
>
> Right, I put some effort into writing a selftest to check how many
> push functions get called at one time, but failed to do so.
>
> >
> > That sounds hard to use to me. Especially if these extra pushes
> > cannot be identified as such.
> >
> > Perhaps if all skbs from the same sendmsg call can be identified,
> > that would help explain patterns in the data resulting from these
> > uncommon extra data points.
>
> Did you mean moving tcp_bpf_tx_timestamp() before tcp_skb_entail()? That
> is close to per-packet granularity, not considering skb fragmentation :)

No, I meant having some way in the notification to identify all the
skbs that belong to the same sendmsg call, so that users can filter on
that. But I don't immediately see how to do that (without, say, adding
yet another counter).
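
To make the counter idea concrete, here is a minimal userspace sketch of
what tagging by sendmsg call could look like. This is only an
illustration under stated assumptions: struct fake_sock, struct
fake_skb, the sendmsg_seq and sendmsg_id fields and all fake_*() helpers
are hypothetical names and do not exist in the kernel or in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
struct fake_sock {
	uint32_t sendmsg_seq;	/* bumped once per modelled sendmsg call */
};

struct fake_skb {
	uint32_t sendmsg_id;	/* copy of sendmsg_seq when the skb was tagged */
	int tagged;		/* nonzero if a timestamp was requested */
};

/* Model entering sendmsg: hand out a fresh id for this call. */
static uint32_t fake_sendmsg_begin(struct fake_sock *sk)
{
	return ++sk->sendmsg_seq;
}

/* Model tagging an skb at a push point (final push or wait_for_space). */
static void fake_tag_skb(struct fake_skb *skb, uint32_t id)
{
	skb->tagged = 1;
	skb->sendmsg_id = id;
}

/* Consumer side: count tagged skbs that belong to one sendmsg call. */
static size_t fake_count_for_call(const struct fake_skb *skbs, size_t n,
				  uint32_t id)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (skbs[i].tagged && skbs[i].sendmsg_id == id)
			hits++;
	return hits;
}
```

With something like this, a consumer that sees more than one
notification for the same id could attribute the extra ones to
intermediate pushes rather than to separate send calls. The cost is the
additional per-socket counter mentioned above.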

Right now, push-based granularity seems rather arbitrary to me,
informed more by technical limitations than by a clear design. Perhaps
per-packet makes more sense, especially since BPF calls are cheap
(compared to the other errqueue mechanism).
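
As a rough illustration of the difference, the toy model below counts
how many skbs of one sendmsg call would be tagged under each scheme. It
is a simplification, not kernel code: it assumes the call produces n
skbs, that stalled[i] marks a wait_for_space event right after skb i
with data already copied, and it ignores every other push path. The
function names are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical model of push-based tracking for one sendmsg call.
 * stalled[i] != 0 means the sender hit wait_for_space right after
 * skb i, so that skb is pushed and tagged. The last skb is always
 * tagged by the final push.
 */
static size_t tagged_push_based(const int *stalled, size_t n)
{
	size_t i, tagged = 0;

	for (i = 0; i < n; i++)
		if (stalled[i] || i == n - 1)
			tagged++;
	return tagged;
}

/* Per-packet tracking: every skb is tagged, regardless of stalls. */
static size_t tagged_per_packet(size_t n)
{
	return n;
}
```

In this model the push-based count depends on how often the sender
stalls, which is driven by memory pressure, while the per-packet count
depends only on how many skbs the call produced.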