* [PATCH net-next] net-timestamp: take track of the skb when wait_for_space occurs
@ 2026-04-02  8:58 Jason Xing
  2026-04-02 14:24 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Xing @ 2026-04-02  8:58 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb
  Cc: netdev, Jason Xing, Yushan Zhou

From: Jason Xing <kernelxing@tencent.com>

Tag the skb in tcp_sendmsg_locked() when the wait_for_space path is
taken, even though that skb might not carry the last byte of the sendmsg.

Otherwise, the application might not receive a single timestamp from
the error queue. The following steps reproduce this:
1) skb A is the current last skb before entering the wait_for_space path
2) tcp_push() pushes A without any tag
3) A is transmitted from TCP to the driver without any skb carrying
   timestamps (SCHED, DRV/HARDWARE) being put in the error queue.
4) sk_stream_wait_memory() sleeps for a while and then returns with an
   error code. Note that the socket lock is released while sleeping.
5) skb A finally gets acked and removed from the rtx queue.
6) the rest of tcp_sendmsg_locked() runs: it jumps to the 'do_error'
   label and then to the 'out' label.
7) at this point, skb A turns out to be the last one in this send
   syscall, and it has missed the tcp_tx_timestamp() opportunity that
   precedes the final tcp_push()
8) the application receives no timestamps this time
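
The steps above can be sketched as a toy user-space model. This is not
kernel code: struct toy_skb and toy_app_sees_timestamp() are hypothetical
names, and the model collapses all report types (SCHED, DRV/HARDWARE, ACK)
into a single "reported" flag. It only illustrates why tagging before the
push changes the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an skb; only the two facts the scenario needs. */
struct toy_skb {
	bool tagged;	/* models tcp_tx_timestamp() having marked the skb */
	bool in_queue;	/* false once the skb has been acked and freed */
};

/*
 * Models one send syscall that queues skb A, enters wait_for_space and
 * then fails in the memory wait. Returns true if the application would
 * see at least one timestamp for this call.
 */
static bool toy_app_sees_timestamp(bool tag_before_wait)
{
	struct toy_skb a = { .tagged = false, .in_queue = true };
	bool reported = false;

	/* wait_for_space with copied != 0: optionally tag, then push */
	if (tag_before_wait)
		a.tagged = true;

	/* steps 2-3: A is transmitted; only a tagged skb yields a report */
	if (a.tagged)
		reported = true;

	/* steps 4-5: the lock is dropped while sleeping and A gets acked */
	a.in_queue = false;

	/* steps 6-7: the 'out' path can only tag an skb that is still queued */
	if (a.in_queue && !a.tagged) {
		a.tagged = true;
		reported = true;
	}

	assert(!a.in_queue);
	return reported;
}
```

With tag_before_wait set to false the model reports nothing, which matches
step 8; with it set to true the skb is reported before it leaves the queue.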

The original commit ad02c4f54782 ("tcp: provide timestamps for partial writes")
says it is best effort. Now it's time to cover the only potential point
where a record can still be missed.

The obvious side effect is that we might record more than once for a
single send syscall, since the skb that we keep track of in this scenario
might not be the last one. But tracing more than one skb is not a bad
thing, given the emerging and promising trend toward detailed,
per-packet monitoring.

Thanks to the tskey ID, the application that is responsible for
collecting and sorting timestamps can use it to place that record
correctly between two consecutive send syscalls.
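
As a rough illustration of that sorting step, the sketch below maps a
tskey back to the send syscall it belongs to. It assumes a TCP socket
with SOF_TIMESTAMPING_OPT_ID, where the key is a byte offset, and it
ignores 32-bit wraparound. send_call_for_tskey() and the end_key array
are hypothetical names; the application is assumed to record, after each
send call, the offset of the last byte that call accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * end_key[i] is the byte offset of the last byte accepted by send call i,
 * i.e. (total bytes accepted through call i) - 1. The array must be
 * strictly increasing. Returns the index of the send call that owns
 * tskey, or n if the key lies beyond every recorded call.
 */
static size_t send_call_for_tskey(const uint32_t *end_key, size_t n,
				  uint32_t tskey)
{
	size_t lo = 0, hi = n;

	assert(n == 0 || end_key != NULL);

	/* binary search for the first call whose last byte is >= tskey */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (end_key[mid] < tskey)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

For example, after three send calls that accepted 1000, 500 and 2000
bytes, end_key would be {999, 1499, 3499}, and a record with tskey 1200
falls between the first and second boundaries, so it belongs to the
second call (index 1).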

Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/tcp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 516087c622ad..2db80d75cfa4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1411,9 +1411,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 wait_for_space:
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		tcp_remove_empty_skb(sk);
-		if (copied)
+		if (copied) {
+			tcp_tx_timestamp(sk, &sockc);
 			tcp_push(sk, flags & ~MSG_MORE, mss_now,
 				 TCP_NAGLE_PUSH, size_goal);
+		}
 
 		err = sk_stream_wait_memory(sk, &timeo);
 		if (err != 0)
-- 
2.41.3



Thread overview: 6+ messages
2026-04-02  8:58 [PATCH net-next] net-timestamp: take track of the skb when wait_for_space occurs Jason Xing
2026-04-02 14:24 ` Eric Dumazet
2026-04-02 15:02   ` Jason Xing
2026-04-02 15:39     ` Eric Dumazet
2026-04-02 16:09       ` Jason Xing
2026-04-02 19:18         ` Willem de Bruijn
