[PATCH net-next 0/6] bpf-timetamp: support rx side


* [PATCH net-next 0/6] bpf-timetamp: support rx side
@ 2026-05-18  8:23 Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc Jason Xing
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

A previous series [1] added TX-side support for BPF timestamping;
this series adds RX-side support.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=68b92ac494eb

Jason Xing (6):
  bpf: Add bpf_ktime_get_real_ns() kfunc
  net: export sock_disable_timestamp() declaration
  bpf: support bpf_setsockopt for bpf timestamping rx feature
  bpf: add BPF_SOCK_OPS_TSTAMP_RCV_CB callback
  bpf: enable bpf timestamping rx in TCP layer
  selftests/bpf: Add RX latency tests for bpf timestamping

 include/net/sock.h                            | 12 +++-
 include/uapi/linux/bpf.h                      | 10 ++-
 kernel/bpf/helpers.c                          |  6 ++
 net/core/filter.c                             |  8 +++
 net/core/sock.c                               | 20 +++++-
 net/ipv4/tcp.c                                | 14 +++-
 tools/include/uapi/linux/bpf.h                |  5 ++
 .../bpf/prog_tests/net_timestamping.c         | 69 ++++++++++++++++++-
 .../selftests/bpf/progs/net_timestamping.c    | 35 ++++++++++
 9 files changed, 172 insertions(+), 7 deletions(-)

-- 
2.43.7



* [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18 11:57   ` Jesper Dangaard Brouer
  2026-05-18  8:23 ` [PATCH net-next 2/6] net: export sock_disable_timestamp() declaration Jason Xing
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Currently BPF programs can obtain timestamps via bpf_ktime_get_ns(),
which returns CLOCK_MONOTONIC time. However, the skb->tstamp field
populated by the network stack uses ktime_get_real() which is in the
CLOCK_REALTIME domain.

In this series, the kernel reports the software/hardware timestamps
through sockopt; the userspace BPF application then reads them and can
calculate the delta only if both values use a unified time base.

However, prior to this, when a BPF program tries to measure RX packet
delay by comparing skb->tstamp with bpf_ktime_get_ns(), the result
is incorrect because the two clocks have different epochs.

Introduce a new BPF kfunc bpf_ktime_get_real_ns() that returns the
current CLOCK_REALTIME time. This allows BPF programs to perform
accurate delay calculations without a clock domain mismatch.
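
The epoch mismatch described above can also be observed from userspace.
The following is only a minimal sketch, not part of the patch: it assumes
a POSIX system with clock_gettime(), and the helper names clock_ns() and
realtime_minus_monotonic_ns() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock and return its value in nanoseconds. */
uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;
	int err = clock_gettime(id, &ts);

	/* Both clocks used in this sketch are expected to be available. */
	assert(err == 0);
	(void)err;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Offset between the two clock domains (hypothetical helper).
 * CLOCK_REALTIME counts from the Unix epoch while CLOCK_MONOTONIC
 * counts from an unspecified point (boot on Linux), so subtracting a
 * value from one domain from a value in the other yields this offset
 * plus the real delay, not the delay itself.
 */
int64_t realtime_minus_monotonic_ns(void)
{
	return (int64_t)clock_ns(CLOCK_REALTIME) -
	       (int64_t)clock_ns(CLOCK_MONOTONIC);
}
```

On a machine with a sanely set wall clock the offset is on the order of
decades in nanoseconds, which is why a delay computed across the two
domains is meaningless.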

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 kernel/bpf/helpers.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2bb60200c266..863645d096ef 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2317,6 +2317,11 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,

 __bpf_kfunc_start_defs();

+__bpf_kfunc u64 bpf_ktime_get_real_ns(void)
+{
+	return ktime_get_real_fast_ns();
+}
+
 /**
  * bpf_obj_new() - allocate an object described by program BTF
  * @local_type_id__k: type ID in program BTF
@@ -4859,6 +4864,7 @@ BTF_ID_FLAGS(func, bpf_task_work_schedule_resume, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, bpf_dynptr_from_file)
 BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
 BTF_ID_FLAGS(func, bpf_timer_cancel_async)
+BTF_ID_FLAGS(func, bpf_ktime_get_real_ns)
 BTF_KFUNCS_END(common_btf_ids)

 static const struct btf_kfunc_id_set common_kfunc_set = {
-- 
2.43.7


* [PATCH net-next 2/6] net: export sock_disable_timestamp() declaration
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 3/6] bpf: support bpf_setsockopt for bpf timestamping rx feature Jason Xing
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Later in this series, the BPF RX timestamping feature uses
sock_disable_timestamp() to dynamically turn off the global timestamp
recording. So drop the static qualifier and declare the function in
include/net/sock.h.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h | 1 +
 net/core/sock.c    | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index e0263bae8da9..a579d5b09207 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -3047,6 +3047,7 @@ static inline bool sk_listener_or_tw(const struct sock *sk)
 }
 
 void sock_enable_timestamp(struct sock *sk, enum sock_flags flag);
+void sock_disable_timestamp(struct sock *sk, unsigned long flags);
 int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, int level,
 		       int type);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index f362e3ce1efb..f3d78da3aeba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -474,7 +474,7 @@ static bool sock_needs_netstamp(const struct sock *sk)
 	}
 }
 
-static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
+void sock_disable_timestamp(struct sock *sk, unsigned long flags)
 {
 	if (sk->sk_flags & flags) {
 		sk->sk_flags &= ~flags;
-- 
2.43.7



* [PATCH net-next 3/6] bpf: support bpf_setsockopt for bpf timestamping rx feature
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 2/6] net: export sock_disable_timestamp() declaration Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 4/6] bpf: add BPF_SOCK_OPS_TSTAMP_RCV_CB callback Jason Xing
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Add the SK_BPF_CB_RX_TIMESTAMPING callback flag to enable RX timestamping
via bpf_setsockopt(SK_BPF_CB_FLAGS).

Add SOCK_BPF_TIMESTAMPING_RX to enum sock_flags, which is used for
centralized management of net_enable/disable_timestamp. Note that only
one of them (timestamp, so_timestamping and bpf timestamping rx) can
enable and disable the global time record of skbs.

In addition, include SOCK_BPF_TIMESTAMPING_RX in SK_FLAGS_TIMESTAMP so
that __sk_destruct(), which calls sock_disable_timestamp(), thoroughly
turns off the global time record.
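
The mask check and the XOR-based toggle detection used by this patch can
be sketched in plain userspace C. This is only an illustration under
assumptions: the flag values mirror the ones in the diff below, while
rx_timestamping_action() and the RX_* result codes are hypothetical names
that do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirror of the flag values from include/uapi/linux/bpf.h. */
enum {
	SK_BPF_CB_TX_TIMESTAMPING = 1 << 0,
	SK_BPF_CB_RX_TIMESTAMPING = 1 << 1,
	/* (highest flag - 1) | highest flag sets every bit up to it. */
	SK_BPF_CB_MASK = (SK_BPF_CB_RX_TIMESTAMPING - 1) |
			 SK_BPF_CB_RX_TIMESTAMPING,
};

static_assert(SK_BPF_CB_MASK == 0x3, "mask must cover both flag bits");

/* Hypothetical result codes for this sketch only. */
enum { RX_NO_CHANGE = 0, RX_ENABLE = 1, RX_DISABLE = 2 };

/*
 * Decide what to do with the global timestamp recording when the
 * callback flags change from old_flags to new_flags.
 */
int rx_timestamping_action(uint32_t old_flags, uint32_t new_flags)
{
	/* Reject bits outside the known flag set. */
	if (new_flags & ~(uint32_t)SK_BPF_CB_MASK)
		return -EINVAL;

	/* XOR is non-zero only for bits that differ between old and new. */
	if (!((old_flags ^ new_flags) & SK_BPF_CB_RX_TIMESTAMPING))
		return RX_NO_CHANGE;

	return (new_flags & SK_BPF_CB_RX_TIMESTAMPING) ? RX_ENABLE : RX_DISABLE;
}
```

Acting only on a change of the RX bit means that toggling the TX flag
alone never touches the global timestamp recording.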

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h       | 5 ++++-
 include/uapi/linux/bpf.h | 5 +++--
 net/core/filter.c        | 8 ++++++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index a579d5b09207..cf0e82e46482 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1022,9 +1022,12 @@ enum sock_flags {
 	SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
 	SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
 	SOCK_TIMESTAMPING_ANY, /* Copy of sk_tsflags & TSFLAGS_ANY */
+	SOCK_BPF_TIMESTAMPING_RX, /* BPF RX timestamping enabled */
 };
 
-#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
+#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | \
+			    (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \
+			    (1UL << SOCK_BPF_TIMESTAMPING_RX))
 /*
  * The highest bit of sk_tsflags is reserved for kernel-internal
  * SOCKCM_FLAG_TS_OPT_ID. There is a check in core/sock.c to control that
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..1e09b5cd7a39 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7029,8 +7029,9 @@ enum {
 
 enum {
 	SK_BPF_CB_TX_TIMESTAMPING	= 1<<0,
-	SK_BPF_CB_MASK			= (SK_BPF_CB_TX_TIMESTAMPING - 1) |
-					   SK_BPF_CB_TX_TIMESTAMPING
+	SK_BPF_CB_RX_TIMESTAMPING	= 1<<1,
+	SK_BPF_CB_MASK			= (SK_BPF_CB_RX_TIMESTAMPING - 1) |
+					   SK_BPF_CB_RX_TIMESTAMPING
 };
 
 /* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714..08ad102f204e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5304,6 +5304,14 @@ static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
 	if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
 		return -EINVAL;
 
+	if ((sk_bpf_cb_flags ^ sk->sk_bpf_cb_flags) & SK_BPF_CB_RX_TIMESTAMPING) {
+		if (sk_bpf_cb_flags & SK_BPF_CB_RX_TIMESTAMPING)
+			sock_enable_timestamp(sk, SOCK_BPF_TIMESTAMPING_RX);
+		else
+			sock_disable_timestamp(sk,
+					       (1UL << SOCK_BPF_TIMESTAMPING_RX));
+	}
+
 	sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
 
 	return 0;
-- 
2.43.7



* [PATCH net-next 4/6] bpf: add BPF_SOCK_OPS_TSTAMP_RCV_CB callback
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
                   ` (2 preceding siblings ...)
  2026-05-18  8:23 ` [PATCH net-next 3/6] bpf: support bpf_setsockopt for bpf timestamping rx feature Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18  8:23 ` [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer Jason Xing
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

This is a preparatory patch that adds the BPF_SOCK_OPS_TSTAMP_RCV_CB
callback and the RX reporting path that allows the kernel to report
timestamps.

The last skb of a recv syscall can carry both a software and a hardware
timestamp, so bpf_skops_rx_timestamping() uses four args slots to record
and report them: each 64-bit timestamp is split into two 32-bit halves.
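
The way a 64-bit nanosecond timestamp travels through two 32-bit args
slots, and how a BPF program puts it back together, can be sketched as
follows. This is a userspace illustration only; struct tstamp_args,
tstamp_split() and tstamp_join() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit slots carrying one 64-bit timestamp (hypothetical type). */
struct tstamp_args {
	uint32_t lo;	/* low 32 bits, e.g. args[0] for the sw timestamp */
	uint32_t hi;	/* high 32 bits, e.g. args[1] for the sw timestamp */
};

static_assert(sizeof(struct tstamp_args) == sizeof(uint64_t),
	      "two 32-bit slots hold exactly one 64-bit value");

/* Producer side: split the value the way the kernel fills the slots. */
struct tstamp_args tstamp_split(uint64_t ns)
{
	struct tstamp_args a = {
		.lo = (uint32_t)ns,
		.hi = (uint32_t)(ns >> 32),
	};

	return a;
}

/* Consumer side: widen before shifting so the high half is not lost. */
uint64_t tstamp_join(struct tstamp_args a)
{
	return (uint64_t)a.lo | ((uint64_t)a.hi << 32);
}
```

The cast to a 64-bit type before the shift matters on the consumer side:
shifting a 32-bit value left by 32 bits is undefined in C.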

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h             |  6 ++++++
 include/uapi/linux/bpf.h       |  5 +++++
 net/core/sock.c                | 18 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  5 +++++
 4 files changed, 34 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index cf0e82e46482..14945cd69c84 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -3138,10 +3138,16 @@ int sock_set_timestamping(struct sock *sk, int optname,
 
 #if defined(CONFIG_CGROUP_BPF)
 void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
+void bpf_skops_rx_timestamping(struct sock *sk,
+			       struct scm_timestamping_internal *tss, int op);
 #else
 static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
 {
 }
+static inline void bpf_skops_rx_timestamping(struct sock *sk,
+					     struct scm_timestamping_internal *tss, int op)
+{
+}
 #endif
 void sock_no_linger(struct sock *sk);
 void sock_set_keepalive(struct sock *sk);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1e09b5cd7a39..113a2a72cbf4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7169,6 +7169,11 @@ enum {
 					 * sendmsg timestamp with corresponding
 					 * tskey.
 					 */
+	BPF_SOCK_OPS_TSTAMP_RCV_CB,	/* Called in tcp_recvmsg() to record
+					 * sw/hw timestamp of the last skb
+					 * after receiving all the data when
+					 * SK_BPF_CB_RX_TIMESTAMPING is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/sock.c b/net/core/sock.c
index f3d78da3aeba..81a234e10fd3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -952,6 +952,24 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
 	bpf_skops_init_skb(&sock_ops, skb, 0);
 	__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
 }
+
+void bpf_skops_rx_timestamping(struct sock *sk,
+			       struct scm_timestamping_internal *tss, int op)
+{
+	struct bpf_sock_ops_kern sock_ops;
+	u64 sw_tstamp = ktime_to_ns(tss->ts[0]);
+	u64 hw_tstamp = ktime_to_ns(tss->ts[2]);
+
+	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
+	sock_ops.op = op;
+	sock_ops.is_fullsock = 1;
+	sock_ops.sk = sk;
+	sock_ops.args[0] = (u32)sw_tstamp;
+	sock_ops.args[1] = (u32)(sw_tstamp >> 32);
+	sock_ops.args[2] = (u32)hw_tstamp;
+	sock_ops.args[3] = (u32)(hw_tstamp >> 32);
+	__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
+}
 #endif
 
 void sock_set_keepalive(struct sock *sk)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..483ff4497d51 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7168,6 +7168,11 @@ enum {
 					 * sendmsg timestamp with corresponding
 					 * tskey.
 					 */
+	BPF_SOCK_OPS_TSTAMP_RCV_CB,	/* Called in tcp_recvmsg() to record
+					 * sw/hw timestamp of the last skb
+					 * after receiving all the data when
+					 * SK_BPF_CB_RX_TIMESTAMPING is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.7



* [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
                   ` (3 preceding siblings ...)
  2026-05-18  8:23 ` [PATCH net-next 4/6] bpf: add BPF_SOCK_OPS_TSTAMP_RCV_CB callback Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18 13:01   ` Jesper Dangaard Brouer
  2026-05-18 15:34   ` Stanislav Fomichev
  2026-05-18  8:23 ` [PATCH net-next 6/6] selftests/bpf: Add RX latency tests for bpf timestamping Jason Xing
  2026-05-18 11:46 ` [PATCH net-next 0/6] bpf-timetamp: support rx side Jesper Dangaard Brouer
  6 siblings, 2 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Add two if statements to cleanly separate BPF timestamping from
SO_TIMESTAMPING, so that each works independently of the other.

For SO_TIMESTAMPING, only add a loose condition based on the report
flags, to avoid duplicating the strict checks that are already done in
tcp_recv_timestamp() and to limit the performance impact. If the loose
condition is hit, tcp_recv_timestamp() handles the exact case, so the
existing timestamping feature is not affected.

Make it work for TCP.
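
The two independent conditions can be modelled in userspace as below.
This is a sketch under assumptions, not the kernel code: struct demo_sock,
rx_report_paths() and the RUN_* bits are hypothetical, the DEMO_SOF_*
values mirror SOF_TIMESTAMPING_SOFTWARE and SOF_TIMESTAMPING_RAW_HARDWARE
from <linux/net_tstamp.h>, and the outer TCP_CMSG_TS check is assumed to
have passed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the two report flags checked by the loose condition. */
#define DEMO_SOF_TIMESTAMPING_SOFTWARE		(1u << 4)
#define DEMO_SOF_TIMESTAMPING_RAW_HARDWARE	(1u << 6)

/* Hypothetical result bits: which reporting paths would run. */
#define RUN_BPF_CB	(1u << 0)	/* bpf_skops_rx_timestamping() */
#define RUN_SO_CMSG	(1u << 1)	/* tcp_recv_timestamp() */

/* Hypothetical stand-in for the socket state the two checks read. */
struct demo_sock {
	bool bpf_rx_enabled;	/* sock_ops attached and RX callback flag set */
	bool rcvtstamp;		/* SOCK_RCVTSTAMP socket flag */
	uint32_t tsflags;	/* snapshot of sk_tsflags */
};

/* Evaluate both conditions separately; neither one gates the other. */
unsigned int rx_report_paths(const struct demo_sock *sk)
{
	unsigned int paths = 0;

	assert(sk);
	if (sk->bpf_rx_enabled)
		paths |= RUN_BPF_CB;
	if (sk->rcvtstamp ||
	    (sk->tsflags & DEMO_SOF_TIMESTAMPING_SOFTWARE) ||
	    (sk->tsflags & DEMO_SOF_TIMESTAMPING_RAW_HARDWARE))
		paths |= RUN_SO_CMSG;
	return paths;
}
```

A socket that only has the BPF flag set takes the BPF path and skips the
cmsg path, and the reverse holds as well; with both configured, both run.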

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/tcp.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21ece4c71612..64c69bb3578a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
 	release_sock(sk);
 
 	if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
-		if (cmsg_flags & TCP_CMSG_TS)
-			tcp_recv_timestamp(msg, sk, &tss);
+		if (cmsg_flags & TCP_CMSG_TS) {
+			u32 tsflags = READ_ONCE(sk->sk_tsflags);
+
+			if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+			    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
+				bpf_skops_rx_timestamping(sk, &tss,
+							  BPF_SOCK_OPS_TSTAMP_RCV_CB);
+			if (sock_flag(sk, SOCK_RCVTSTAMP) ||
+			    tsflags & SOF_TIMESTAMPING_SOFTWARE ||
+			    tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+				tcp_recv_timestamp(msg, sk, &tss);
+		}
 		if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
 			msg->msg_inq = tcp_inq_hint(sk);
 			if (cmsg_flags & TCP_CMSG_INQ)
-- 
2.43.7



* [PATCH net-next 6/6] selftests/bpf: Add RX latency tests for bpf timestamping
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
                   ` (4 preceding siblings ...)
  2026-05-18  8:23 ` [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer Jason Xing
@ 2026-05-18  8:23 ` Jason Xing
  2026-05-18 11:46 ` [PATCH net-next 0/6] bpf-timetamp: support rx side Jesper Dangaard Brouer
  6 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18  8:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

The client sends data, and the server reads it via read(), which
triggers tcp_recvmsg() internally. The latency computation is then done
in the BPF_SOCK_OPS_TSTAMP_RCV_CB callback. Extend test_tcp() to cover
the new BPF RX timestamping path.

It is also crucial to add a usleep() so that the workqueue has time to
turn on the global timestamp recording of each skb in the receive path;
otherwise the test might fail because the skb carries no timestamp.

Verify that SO_TIMESTAMPING RX (software) and BPF RX timestamping can
work simultaneously on the same socket without conflict. When
enable_socket_timestamping is set, enable SOF_TIMESTAMPING_SOFTWARE |
SOF_TIMESTAMPING_RX_SOFTWARE on afd and use recvmsg() to verify the
SCM_TIMESTAMPING cmsg contains a valid software RX timestamp.

This exercises the mixed mode where all 4 timestamping paths (SO TX on
cfd, SO RX on afd, BPF TX on cfd, BPF RX on afd) run together. The
standalone mode (enable_socket_timestamping=false) keeps only BPF TX
and BPF RX active using read().
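
The latency check performed in the RX callback can be sketched as a small
pure function. This is an illustration only: rx_delay_within() is a
hypothetical name, tolerance_ns stands in for the test's tolerance value,
and the explicit guard against a timestamp in the future is an addition
made for clarity in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the skb's software RX timestamp and the current
 * time (both in the CLOCK_REALTIME domain, in nanoseconds) are less
 * than tolerance_ns apart.
 */
bool rx_delay_within(uint64_t now_ns, uint64_t sw_tstamp_ns,
		     uint64_t tolerance_ns)
{
	assert(tolerance_ns > 0);

	/* A zero timestamp means the skb was never stamped. */
	if (!sw_tstamp_ns)
		return false;

	/* The wall clock may have been stepped backwards in between. */
	if (now_ns < sw_tstamp_ns)
		return false;

	return now_ns - sw_tstamp_ns < tolerance_ns;
}
```

Without the second guard the unsigned subtraction would wrap to a very
large value, which also fails the tolerance check; the guard just makes
that case explicit.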

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 .../bpf/prog_tests/net_timestamping.c         | 69 ++++++++++++++++++-
 .../selftests/bpf/progs/net_timestamping.c    | 35 ++++++++++
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/net_timestamping.c b/tools/testing/selftests/bpf/prog_tests/net_timestamping.c
index dbfd87499b6b..3cc52b670e74 100644
--- a/tools/testing/selftests/bpf/prog_tests/net_timestamping.c
+++ b/tools/testing/selftests/bpf/prog_tests/net_timestamping.c
@@ -143,11 +143,45 @@ static void test_socket_timestamping(int fd)
 	SK_TS_ACK = 0;
 }
 
+static bool recv_verify_rx_timestamp(int fd, char *buf, int len)
+{
+	struct scm_timestamping *tss = NULL;
+	char ctrl[1024];
+	struct msghdr msg = {};
+	struct iovec iov;
+	struct cmsghdr *cm;
+	int ret;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = ctrl;
+	msg.msg_controllen = sizeof(ctrl);
+
+	ret = recvmsg(fd, &msg, 0);
+	if (!ASSERT_EQ(ret, len, "recvmsg from client"))
+		return false;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (cm->cmsg_level == SOL_SOCKET &&
+		    cm->cmsg_type == SCM_TIMESTAMPING) {
+			tss = (void *)CMSG_DATA(cm);
+			break;
+		}
+	}
+
+	ASSERT_TRUE(tss != NULL, "SCM_TIMESTAMPING cmsg present");
+	ASSERT_TRUE(tss && (tss->ts[0].tv_sec || tss->ts[0].tv_nsec),
+		    "rx sw timestamp non-zero");
+	return true;
+}
+
 static void test_tcp(int family, bool enable_socket_timestamping)
 {
 	struct net_timestamping__bss *bss;
 	char buf[cfg_payload_len];
-	int sfd = -1, cfd = -1;
+	int sfd = -1, cfd = -1, afd = -1;
 	unsigned int sock_opt;
 	struct netns_obj *ns;
 	int cg_fd;
@@ -187,7 +221,27 @@ static void test_tcp(int family, bool enable_socket_timestamping)
 	if (!ASSERT_OK_FD(cfd, "connect_to_fd_server"))
 		goto out;
 
+	afd = accept(sfd, NULL, NULL);
+	if (!ASSERT_OK_FD(afd, "accept"))
+		goto out;
+
+	/* net_enable_timestamp() defers the static key update via
+	 * schedule_work() when CONFIG_JUMP_LABEL is set. Give the
+	 * workqueue a chance to run so that netstamp_needed_key is
+	 * active and skb->tstamp gets populated in the receive path.
+	 */
+	usleep(10000);
+
 	if (enable_socket_timestamping) {
+		unsigned int rx_opt;
+
+		rx_opt = SOF_TIMESTAMPING_SOFTWARE |
+			 SOF_TIMESTAMPING_RX_SOFTWARE;
+		ret = setsockopt(afd, SOL_SOCKET, SO_TIMESTAMPING,
+				 (char *)&rx_opt, sizeof(rx_opt));
+		if (!ASSERT_OK(ret, "setsockopt SO_TIMESTAMPING RX on afd"))
+			goto out;
+
 		sock_opt = SOF_TIMESTAMPING_SOFTWARE |
 			   SOF_TIMESTAMPING_OPT_ID |
 			   SOF_TIMESTAMPING_TX_SCHED |
@@ -207,6 +261,15 @@ static void test_tcp(int family, bool enable_socket_timestamping)
 	if (!ASSERT_EQ(ret, sizeof(buf), "send to server"))
 		goto out;
 
+	if (enable_socket_timestamping) {
+		if (!recv_verify_rx_timestamp(afd, buf, sizeof(buf)))
+			goto out;
+	} else {
+		ret = read(afd, buf, sizeof(buf));
+		if (!ASSERT_EQ(ret, sizeof(buf), "recv from client"))
+			goto out;
+	}
+
 	if (enable_socket_timestamping)
 		test_socket_timestamping(cfd);
 
@@ -215,8 +278,12 @@ static void test_tcp(int family, bool enable_socket_timestamping)
 	ASSERT_EQ(bss->nr_sched, 1, "nr_sched");
 	ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw");
 	ASSERT_EQ(bss->nr_ack, 1, "nr_ack");
+	ASSERT_EQ(bss->nr_passive, 1, "nr_passive");
+	ASSERT_GE(bss->nr_rcv, 1, "nr_rcv");
 
 out:
+	if (afd >= 0)
+		close(afd);
 	if (sfd >= 0)
 		close(sfd);
 	if (cfd >= 0)
diff --git a/tools/testing/selftests/bpf/progs/net_timestamping.c b/tools/testing/selftests/bpf/progs/net_timestamping.c
index b4c2f0f2be11..4bbff09db55c 100644
--- a/tools/testing/selftests/bpf/progs/net_timestamping.c
+++ b/tools/testing/selftests/bpf/progs/net_timestamping.c
@@ -6,6 +6,8 @@
 #include "bpf_kfuncs.h"
 #include <errno.h>
 
+extern u64 bpf_ktime_get_real_ns(void) __ksym;
+
 __u32 monitored_pid = 0;
 
 int nr_active;
@@ -14,6 +16,7 @@ int nr_passive;
 int nr_sched;
 int nr_txsw;
 int nr_ack;
+int nr_rcv;
 
 struct sk_stg {
 	__u64 sendmsg_ns;	/* record ts when sendmsg is called */
@@ -65,6 +68,22 @@ static int bpf_test_sockopt(void *ctx, const struct sock *sk, int expected)
 	return 0;
 }
 
+static int bpf_test_rx_sockopt(void *ctx, const struct sock *sk, int expected)
+{
+	int tmp, new = SK_BPF_CB_RX_TIMESTAMPING;
+	int opt = SK_BPF_CB_FLAGS;
+	int level = SOL_SOCKET;
+
+	if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)) != expected)
+		return 1;
+
+	if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) != expected ||
+	    (!expected && tmp != new))
+		return 1;
+
+	return 0;
+}
+
 static bool bpf_test_access_sockopt(void *ctx, const struct sock *sk)
 {
 	if (bpf_test_sockopt(ctx, sk, -EOPNOTSUPP))
@@ -224,6 +243,9 @@ int skops_sockopt(struct bpf_sock_ops *skops)
 	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
 		nr_active += !bpf_test_sockopt(skops, sk, 0);
 		break;
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		nr_passive += !bpf_test_rx_sockopt(skops, sk, 0);
+		break;
 	case BPF_SOCK_OPS_TSTAMP_SENDMSG_CB:
 		if (bpf_test_delay(skops, sk))
 			nr_snd += 1;
@@ -240,6 +262,19 @@ int skops_sockopt(struct bpf_sock_ops *skops)
 		if (bpf_test_delay(skops, sk))
 			nr_ack += 1;
 		break;
+	case BPF_SOCK_OPS_TSTAMP_RCV_CB: {
+		u64 sw_tstamp, now, delay;
+
+		sw_tstamp = (u64)skops->args[0] | ((u64)skops->args[1] << 32);
+		if (!sw_tstamp)
+			break;
+
+		now = bpf_ktime_get_real_ns();
+		delay = now - sw_tstamp;
+		if (delay < delay_tolerance_nsec)
+			nr_rcv += 1;
+		break;
+	}
 	}
 
 	return 1;
-- 
2.43.7



* Re: [PATCH net-next 0/6] bpf-timetamp: support rx side
  2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
                   ` (5 preceding siblings ...)
  2026-05-18  8:23 ` [PATCH net-next 6/6] selftests/bpf: Add RX latency tests for bpf timestamping Jason Xing
@ 2026-05-18 11:46 ` Jesper Dangaard Brouer
  2026-05-18 12:32   ` Jason Xing
  6 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2026-05-18 11:46 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, horms, willemb, kuniyu,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, john.fastabend, sdf
  Cc: netdev, bpf, Jason Xing

On 18/05/2026 10.23, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 

You have a spelling mistake in the subject/title.

  timetamp -> timestamp

--Jesper


* Re: [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc
  2026-05-18  8:23 ` [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc Jason Xing
@ 2026-05-18 11:57   ` Jesper Dangaard Brouer
  2026-05-18 12:35     ` Jason Xing
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2026-05-18 11:57 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, horms, willemb, kuniyu,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, sdf
  Cc: netdev, bpf, Jason Xing, john.fastabend, Simon Sundberg,
	Toke Høiland-Jørgensen

On 18/05/2026 10.23, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Currently BPF programs can obtain timestamps via bpf_ktime_get_ns(),
> which returns CLOCK_MONOTONIC time. However, the skb->tstamp field
> populated by the network stack uses ktime_get_real() which is in the
> CLOCK_REALTIME domain.
> 
> In this series, the kernel reports the software/hardware timestamps
> through sockopt; the userspace BPF application then reads them and can
> calculate the delta only if both values use a unified time base.
> 
> However, prior to this, when a BPF program tries to measure RX packet
> delay by comparing skb->tstamp with bpf_ktime_get_ns(), the result
> is incorrect because the two clocks have different epochs.
> 
> Introduce a new BPF kfunc bpf_ktime_get_real_ns() that returns the
> current CLOCK_REALTIME time. This allows BPF programs to perform
> accurate delay calculations without a clock domain mismatch.
> 

I support adding this helper, because I have also hit this issue [1].
Our ugly workaround is to use the TAI clock and adjust for the current
TAI offset.

[1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat

Cc Simon Sundberg <Simon.Sundberg@kau.se>

> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   kernel/bpf/helpers.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 2bb60200c266..863645d096ef 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2317,6 +2317,11 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
>   
>   __bpf_kfunc_start_defs();
>   
> +__bpf_kfunc u64 bpf_ktime_get_real_ns(void)
> +{
> +	return ktime_get_real_fast_ns();
> +}
> +
>   /**
>    * bpf_obj_new() - allocate an object described by program BTF
>    * @local_type_id__k: type ID in program BTF
> @@ -4859,6 +4864,7 @@ BTF_ID_FLAGS(func, bpf_task_work_schedule_resume, KF_IMPLICIT_ARGS)
>   BTF_ID_FLAGS(func, bpf_dynptr_from_file)
>   BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
>   BTF_ID_FLAGS(func, bpf_timer_cancel_async)
> +BTF_ID_FLAGS(func, bpf_ktime_get_real_ns)
>   BTF_KFUNCS_END(common_btf_ids)
>   
>   static const struct btf_kfunc_id_set common_kfunc_set = {



* Re: [PATCH net-next 0/6] bpf-timetamp: support rx side
  2026-05-18 11:46 ` [PATCH net-next 0/6] bpf-timetamp: support rx side Jesper Dangaard Brouer
@ 2026-05-18 12:32   ` Jason Xing
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18 12:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, netdev, bpf, Jason Xing

On Mon, May 18, 2026 at 7:46 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
> On 18/05/2026 10.23, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
>
> You have a spelling mistake in the subject/title.
>
>   timetamp -> timestamp

Oh, right, sorry, this is not the first time I made this mistake...
missing 's'... :(

Thanks,
Jason


* Re: [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc
  2026-05-18 11:57   ` Jesper Dangaard Brouer
@ 2026-05-18 12:35     ` Jason Xing
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18 12:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, sdf, netdev, bpf, Jason Xing, john.fastabend,
	Simon Sundberg, Toke Høiland-Jørgensen

On Mon, May 18, 2026 at 7:57 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
> On 18/05/2026 10.23, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Currently BPF programs can obtain timestamps via bpf_ktime_get_ns(),
> > which returns CLOCK_MONOTONIC time. However, the skb->tstamp field
> > populated by the network stack uses ktime_get_real() which is in the
> > CLOCK_REALTIME domain.
> >
> > In this series, the kernel reports the software/hardware timestamps
> > through sockopt; the userspace BPF application then reads them and can
> > calculate the delta only if both values use a unified time base.
> >
> > However, prior to this, when a BPF program tries to measure RX packet
> > delay by comparing skb->tstamp with bpf_ktime_get_ns(), the result
> > is incorrect because the two clocks have different epochs.
> >
> > Introduce a new BPF kfunc bpf_ktime_get_real_ns() that returns the
> > current CLOCK_REALTIME time. This allows BPF programs to perform
> > accurate delay calculations without a clock domain mismatch.
> >
>
> I support adding this helper, because I have also hit this issue [1].
> Our ugly workaround is to use the TAI clock and adjust for the current
> TAI offset.

Right. Thanks :)

>
> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat

The tool is useful as long as net_enable_timestamp() has been called.
Yes, bpf/so timestamping is one of the ways to do so.

Thanks,
Jason

>
> Cc Simon Sundberg <Simon.Sundberg@kau.se>
>
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   kernel/bpf/helpers.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 2bb60200c266..863645d096ef 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2317,6 +2317,11 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
> >
> >   __bpf_kfunc_start_defs();
> >
> > +__bpf_kfunc u64 bpf_ktime_get_real_ns(void)
> > +{
> > +     return ktime_get_real_fast_ns();
> > +}
> > +
> >   /**
> >    * bpf_obj_new() - allocate an object described by program BTF
> >    * @local_type_id__k: type ID in program BTF
> > @@ -4859,6 +4864,7 @@ BTF_ID_FLAGS(func, bpf_task_work_schedule_resume, KF_IMPLICIT_ARGS)
> >   BTF_ID_FLAGS(func, bpf_dynptr_from_file)
> >   BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
> >   BTF_ID_FLAGS(func, bpf_timer_cancel_async)
> > +BTF_ID_FLAGS(func, bpf_ktime_get_real_ns)
> >   BTF_KFUNCS_END(common_btf_ids)
> >
> >   static const struct btf_kfunc_id_set common_kfunc_set = {
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18  8:23 ` [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer Jason Xing
@ 2026-05-18 13:01   ` Jesper Dangaard Brouer
  2026-05-18 13:53     ` Jason Xing
  2026-05-18 15:34   ` Stanislav Fomichev
  1 sibling, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2026-05-18 13:01 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, horms, willemb, kuniyu,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, john.fastabend, sdf, Simon Sundberg,
	Toke Høiland-Jørgensen
  Cc: netdev, bpf, Jason Xing



On 18/05/2026 10.23, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Add two if statements to accurately isolate bpf timestamping and so
> timestamping. They can work respectively.
> 
> As to so_timestamping, only add a loose condition via report flags
> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> and performance impact. If the loose condition is hit,
> tcp_recv_timestamp() is able to handle the exact case and doesn't
> hamper the existing timestamping feature.
> 
> Make it work in TCP protocol.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   net/ipv4/tcp.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 21ece4c71612..64c69bb3578a 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>   	release_sock(sk);
>   
>   	if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> -		if (cmsg_flags & TCP_CMSG_TS)
> -			tcp_recv_timestamp(msg, sk, &tss);
> +		if (cmsg_flags & TCP_CMSG_TS) {
> +			u32 tsflags = READ_ONCE(sk->sk_tsflags);
> +
> +			if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> +			    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> +				bpf_skops_rx_timestamping(sk, &tss,
> +							  BPF_SOCK_OPS_TSTAMP_RCV_CB);

Does this mean I can enable timestamp reading per cgroup?

In Simon's netstacklat[1] tool we are forced to process all RX timestamps
(hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
the cgroup IDs that we are interested in (which is a significant
overhead, as this is deployed at Cloudflare production scale).


[1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat

[2] 
https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488


> +			if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> +			    tsflags & SOF_TIMESTAMPING_SOFTWARE ||
> +			    tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
> +				tcp_recv_timestamp(msg, sk, &tss);
> +		}
>   		if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
>   			msg->msg_inq = tcp_inq_hint(sk);
>   			if (cmsg_flags & TCP_CMSG_INQ)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 13:01   ` Jesper Dangaard Brouer
@ 2026-05-18 13:53     ` Jason Xing
  2026-05-18 16:40       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-05-18 13:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, Simon Sundberg,
	Toke Høiland-Jørgensen, netdev, bpf, Jason Xing

On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
>
> On 18/05/2026 10.23, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Add two if statements to accurately isolate bpf timestamping and so
> > timestamping. They can work respectively.
> >
> > As to so_timestamping, only add a loose condition via report flags
> > to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> > and performance impact. If the loose condition is hit,
> > tcp_recv_timestamp() is able to handle the exact case and doesn't
> > hamper the existing timestamping feature.
> >
> > Make it work in TCP protocol.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   net/ipv4/tcp.c | 14 ++++++++++++--
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 21ece4c71612..64c69bb3578a 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
> >       release_sock(sk);
> >
> >       if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> > -             if (cmsg_flags & TCP_CMSG_TS)
> > -                     tcp_recv_timestamp(msg, sk, &tss);
> > +             if (cmsg_flags & TCP_CMSG_TS) {
> > +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
> > +
> > +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> > +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> > +                             bpf_skops_rx_timestamping(sk, &tss,
> > +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>
> Does this mean I can enable timestamp reading per cgroup?

Yes, I think so, but I haven't tried it. One property of the sockopt
feature is that it supports cgroup attach.
cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
what you're looking for.

IIUC, you can attach the prog to the cgroup whose sockets all have
bpf timestamping enabled. So the current impl is cleaner and has
better isolation (it filters out unmatched flows).
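
A toy userspace model of the two checks in the hunk quoted above may
make the isolation point clearer. Everything here is an assumption for
illustration: struct model_sock and model_recv_ts_dispatch() are
hypothetical, the booleans stand in for cgroup_bpf_enabled(),
SK_BPF_CB_FLAG_TEST() and sock_flag(), and the MODEL_* bits are
placeholders for the SOF_TIMESTAMPING_* flags defined in the kernel
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits for this model only. */
#define MODEL_SOF_TIMESTAMPING_SOFTWARE     (1u << 4)
#define MODEL_SOF_TIMESTAMPING_RAW_HARDWARE (1u << 6)

/* Hypothetical, flattened view of the per-socket state the hunk reads. */
struct model_sock {
	bool cgroup_sockops_enabled;	/* models cgroup_bpf_enabled(CGROUP_SOCK_OPS) */
	bool bpf_rx_tstamp;		/* models the SK_BPF_CB_RX_TIMESTAMPING test */
	bool rcvtstamp;			/* models sock_flag(sk, SOCK_RCVTSTAMP) */
	uint32_t tsflags;		/* models READ_ONCE(sk->sk_tsflags) */
};

struct model_result {
	bool bpf_cb;	/* the BPF sock_ops callback would run */
	bool so_cmsg;	/* tcp_recv_timestamp() would run */
};

/*
 * The two branches are independent: a socket can take the BPF path,
 * the SO_TIMESTAMPING path, both, or neither.
 */
static struct model_result model_recv_ts_dispatch(const struct model_sock *sk)
{
	struct model_result res = { false, false };

	assert(sk != NULL);

	if (sk->cgroup_sockops_enabled && sk->bpf_rx_tstamp)
		res.bpf_cb = true;

	if (sk->rcvtstamp ||
	    (sk->tsflags & MODEL_SOF_TIMESTAMPING_SOFTWARE) ||
	    (sk->tsflags & MODEL_SOF_TIMESTAMPING_RAW_HARDWARE))
		res.so_cmsg = true;

	return res;
}
```

In this model a socket without the per-socket BPF flag never reaches
the callback, which is the filtering I mean: the check happens before
any BPF program runs rather than inside it.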

>
> In Simon's netstacklat[1] tool we are forced process all RX timestamp
> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
> the cgroup IDs that we are interested in (which is a significant
> overhead, as this is deployed at Cloudflare production scale).

I can feel the pain of filtering in this kind of relatively hot
path, which is what I'm trying to avoid internally. What I've done in
production (to cover old kernels) is to just let the kernel
print the information, that's it, while an agent continuously
gathers the data, does the matching and computes the latency. But it's
complicated overall.

Many thanks here, I'm always interested in hearing about useful, real
requirements and fancy ideas on how to monitor latency :) For now
I'm still struggling to port the internal functions to bpf
timestamping.

Thanks,
Jason

>
>
> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat
>
> [2]
> https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488
>
>
> > +                     if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> > +                         tsflags & SOF_TIMESTAMPING_SOFTWARE ||
> > +                         tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
> > +                             tcp_recv_timestamp(msg, sk, &tss);
> > +             }
> >               if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
> >                       msg->msg_inq = tcp_inq_hint(sk);
> >                       if (cmsg_flags & TCP_CMSG_INQ)
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18  8:23 ` [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer Jason Xing
  2026-05-18 13:01   ` Jesper Dangaard Brouer
@ 2026-05-18 15:34   ` Stanislav Fomichev
  2026-05-18 23:56     ` Jason Xing
  1 sibling, 1 reply; 19+ messages in thread
From: Stanislav Fomichev @ 2026-05-18 15:34 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, netdev, bpf, Jason Xing

On 05/18, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Add two if statements to accurately isolate bpf timestamping and so
> timestamping. They can work respectively.
> 
> As to so_timestamping, only add a loose condition via report flags
> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> and performance impact. If the loose condition is hit,
> tcp_recv_timestamp() is able to handle the exact case and doesn't
> hamper the existing timestamping feature.
> 
> Make it work in TCP protocol.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/ipv4/tcp.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 21ece4c71612..64c69bb3578a 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>  	release_sock(sk);
>  
>  	if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> -		if (cmsg_flags & TCP_CMSG_TS)
> -			tcp_recv_timestamp(msg, sk, &tss);
> +		if (cmsg_flags & TCP_CMSG_TS) {
> +			u32 tsflags = READ_ONCE(sk->sk_tsflags);
> +
> +			if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> +			    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> +				bpf_skops_rx_timestamping(sk, &tss,
> +							  BPF_SOCK_OPS_TSTAMP_RCV_CB);

What about tcp_zc_finalize_rx_tstamp? Do you not want the rx tstamp for
tcp rx zc?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 13:53     ` Jason Xing
@ 2026-05-18 16:40       ` Jesper Dangaard Brouer
  2026-05-18 23:16         ` Jason Xing
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2026-05-18 16:40 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, Simon Sundberg,
	Toke Høiland-Jørgensen, netdev, bpf, Jason Xing



On 18/05/2026 15.53, Jason Xing wrote:
> On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>
>>
>>
>> On 18/05/2026 10.23, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> Add two if statements to accurately isolate bpf timestamping and so
>>> timestamping. They can work respectively.
>>>
>>> As to so_timestamping, only add a loose condition via report flags
>>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
>>> and performance impact. If the loose condition is hit,
>>> tcp_recv_timestamp() is able to handle the exact case and doesn't
>>> hamper the existing timestamping feature.
>>>
>>> Make it work in TCP protocol.
>>>
>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> ---
>>>    net/ipv4/tcp.c | 14 ++++++++++++--
>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 21ece4c71612..64c69bb3578a 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>>>        release_sock(sk);
>>>
>>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>>> -             if (cmsg_flags & TCP_CMSG_TS)
>>> -                     tcp_recv_timestamp(msg, sk, &tss);
>>> +             if (cmsg_flags & TCP_CMSG_TS) {
>>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> +
>>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>>> +                             bpf_skops_rx_timestamping(sk, &tss,
>>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>>
>> Does this mean I can enable timestamp reading per cgroup?
> 
> Yes, I think so, but I didn't try. One of the natures of sockopt
> feature is supporting cgroup attach.
> cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
> something that you're looking for.
> 

Sounds good

> IIUC, you can attach the prog onto the cgroup where all the sockets
> are set using the bpf timestamping function. So the current impl is
> cleaner and has better isolation (to filter out those unmatched
> flows).
> 
>>
>> In Simon's netstacklat[1] tool we are forced process all RX timestamp
>> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
>> the cgroup IDs that we are interested in (which is a significant
>> overhead, as this is deployed at Cloudflare production scale).
> 
> I can feel the pain when filtering in this kind of relatively hot
> path, which is what I'm trying to avoid internally. What I've done in
> production (to cover those old kernels) is to just let the kernel
> print the information, that's it, and there is an agent continuously
> gathering the data, doing the match and computing latency. But it's
> overall complicated.
> 

I hope you don't mean your internal/old approach was using printk and 
then analyzing this data.

> Many thanks here, I'm always interested in hearing more useful and
> real requirements and fancy ideas on how to monitor the latency :) Now

Simon Sundberg <Simon.Sundberg@kau.se> has many more fancy ideas on how
to monitor the latency.
The netstacklat tool is part of Simon's PhD thesis:
- https://doi.org/10.59217/qklv6836

And we even got a paper accepted on netstacklat:
- https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032


> I'm still struggling to port the internal functions to bpf
> timestamping.
> 

Good luck, feel free to get inspired by our netstacklat tool.
The key point is to let BPF store latency histograms and then let
userspace periodically consume these, as heatmaps.
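
As a rough illustration of the histogram idea, here is a minimal
plain-C sketch of power-of-two latency bucketing. It is an assumption
made for this mail, not netstacklat's actual layout: HIST_BUCKETS,
struct lat_hist, lat_bucket() and lat_hist_record() are hypothetical
names, and a real BPF version would keep the counters in a BPF map
instead of a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_BUCKETS 32

/* One counter per power-of-two latency range. */
struct lat_hist {
	uint64_t bucket[HIST_BUCKETS];
};

/*
 * Bucket index is floor(log2(latency_ns)); 0 ns and 1 ns both land in
 * bucket 0, and very large values are clamped into the last bucket.
 */
static unsigned int lat_bucket(uint64_t latency_ns)
{
	unsigned int idx = 0;

	while (latency_ns > 1 && idx < HIST_BUCKETS - 1) {
		latency_ns >>= 1;
		idx++;
	}
	return idx;
}

/* Record one latency sample; the consumer only ever reads the counters. */
static void lat_hist_record(struct lat_hist *h, uint64_t latency_ns)
{
	assert(h != NULL);
	h->bucket[lat_bucket(latency_ns)]++;
}
```

The point of this shape is that the hot path does a constant amount of
work per sample, and userspace only has to read HIST_BUCKETS counters
per interval to build one column of a heatmap.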

We are proposing to add 'netstacklat' as a new libbpf-tools utility
- https://github.com/iovisor/bcc/issues/5510


>>
>>
>> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat
>>
>> [2]
>> https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488
>>
>>
>>> +                     if (sock_flag(sk, SOCK_RCVTSTAMP) ||
>>> +                         tsflags & SOF_TIMESTAMPING_SOFTWARE ||
>>> +                         tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
>>> +                             tcp_recv_timestamp(msg, sk, &tss);
>>> +             }
>>>                if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
>>>                        msg->msg_inq = tcp_inq_hint(sk);
>>>                        if (cmsg_flags & TCP_CMSG_INQ)
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 16:40       ` Jesper Dangaard Brouer
@ 2026-05-18 23:16         ` Jason Xing
  2026-05-18 23:24           ` Jason Xing
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-05-18 23:16 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, Simon Sundberg,
	Toke Høiland-Jørgensen, netdev, bpf, Jason Xing

On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
>
> On 18/05/2026 15.53, Jason Xing wrote:
> > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> >>
> >>
> >>
> >> On 18/05/2026 10.23, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@tencent.com>
> >>>
> >>> Add two if statements to accurately isolate bpf timestamping and so
> >>> timestamping. They can work respectively.
> >>>
> >>> As to so_timestamping, only add a loose condition via report flags
> >>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> >>> and performance impact. If the loose condition is hit,
> >>> tcp_recv_timestamp() is able to handle the exact case and doesn't
> >>> hamper the existing timestamping feature.
> >>>
> >>> Make it work in TCP protocol.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>> ---
> >>>    net/ipv4/tcp.c | 14 ++++++++++++--
> >>>    1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>> index 21ece4c71612..64c69bb3578a 100644
> >>> --- a/net/ipv4/tcp.c
> >>> +++ b/net/ipv4/tcp.c
> >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
> >>>        release_sock(sk);
> >>>
> >>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> >>> -             if (cmsg_flags & TCP_CMSG_TS)
> >>> -                     tcp_recv_timestamp(msg, sk, &tss);
> >>> +             if (cmsg_flags & TCP_CMSG_TS) {
> >>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
> >>> +
> >>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> >>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> >>> +                             bpf_skops_rx_timestamping(sk, &tss,
> >>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
> >>
> >> Does this mean I can enable timestamp reading per cgroup?
> >
> > Yes, I think so, but I didn't try. One of the natures of sockopt
> > feature is supporting cgroup attach.
> > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
> > something that you're looking for.
> >
>
> Sound good
>
> > IIUC, you can attach the prog onto the cgroup where all the sockets
> > are set using the bpf timestamping function. So the current impl is
> > cleaner and has better isolation (to filter out those unmatched
> > flows).
> >
> >>
> >> In Simon's netstacklat[1] tool we are forced process all RX timestamp
> >> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
> >> the cgroup IDs that we are interested in (which is a significant
> >> overhead, as this is deployed at Cloudflare production scale).
> >
> > I can feel the pain when filtering in this kind of relatively hot
> > path, which is what I'm trying to avoid internally. What I've done in
> > production (to cover those old kernels) is to just let the kernel
> > print the information, that's it, and there is an agent continuously
> > gathering the data, doing the match and computing latency. But it's
> > overall complicated.
> >
>
> I hope you don't mean your internal/old approach was using printk and
> then analyzing this data.

Of course not :)

The internal approach exists to cover old kernels, but that doesn't
mean the approach itself is old :P

Instead, the internal kernel module is super efficient and I'm trying
to bring the same ability to bpf. In fact, we've already deployed it
in production: running 24x7, with zero sampling.

Please see page 24, which gives a brief introduction on how the log
part is handled:
https://lpc.events/event/19/contributions/2055/#preview:3846
I believe the direction we're taking (ring buffer + lightweight
kernel + heavy agent) is a promising one.

The headache is that I need to provide an agent written in BPF to
do the heavy processing.

>
> > Many thanks here, I'm always interested in hearing more useful and
> > real requirements and fancy ideas on how to monitor the latency :) Now
>
> Simon Sundberg <Simon.Sundberg@kau.se> have many more fancy ideas on how
> to monitor the latency.
> The netstacklat tool is part of Simon's PhD thesis:
> - https://doi.org/10.59217/qklv6836
>
> And we even gotten a paper accepted on netstacklat:
> -
> https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032

Sorry, I cannot access this link. Could you give me the title of this paper?

>
>
> > I'm still struggling to port the internal functions to bpf
> > timestamping.
> >
>
> Good luck, feel free to get inspired by our netstacklat tool.
> Key point is to let BPF store latency histograms and then let userspace
> periodically consume these - as heatmaps.

Thanks. I'll dig deeper into it.

>
> We are proposing to add 'netstacklat' as a new libbpf-tools utility
> - https://github.com/iovisor/bcc/issues/5510

Great!

Thanks,
Jason

>
>
> >>
> >>
> >> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat
> >>
> >> [2]
> >> https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488
> >>
> >>
> >>> +                     if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> >>> +                         tsflags & SOF_TIMESTAMPING_SOFTWARE ||
> >>> +                         tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
> >>> +                             tcp_recv_timestamp(msg, sk, &tss);
> >>> +             }
> >>>                if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
> >>>                        msg->msg_inq = tcp_inq_hint(sk);
> >>>                        if (cmsg_flags & TCP_CMSG_INQ)
> >>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 23:16         ` Jason Xing
@ 2026-05-18 23:24           ` Jason Xing
  2026-05-19  9:57             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-05-18 23:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, Simon Sundberg,
	Toke Høiland-Jørgensen, netdev, bpf, Jason Xing

On Tue, May 19, 2026 at 7:16 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> >
> >
> >
> > On 18/05/2026 15.53, Jason Xing wrote:
> > > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> > >>
> > >>
> > >>
> > >> On 18/05/2026 10.23, Jason Xing wrote:
> > >>> From: Jason Xing <kernelxing@tencent.com>
> > >>>
> > >>> Add two if statements to accurately isolate bpf timestamping and so
> > >>> timestamping. They can work respectively.
> > >>>
> > >>> As to so_timestamping, only add a loose condition via report flags
> > >>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> > >>> and performance impact. If the loose condition is hit,
> > >>> tcp_recv_timestamp() is able to handle the exact case and doesn't
> > >>> hamper the existing timestamping feature.
> > >>>
> > >>> Make it work in TCP protocol.
> > >>>
> > >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > >>> ---
> > >>>    net/ipv4/tcp.c | 14 ++++++++++++--
> > >>>    1 file changed, 12 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > >>> index 21ece4c71612..64c69bb3578a 100644
> > >>> --- a/net/ipv4/tcp.c
> > >>> +++ b/net/ipv4/tcp.c
> > >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
> > >>>        release_sock(sk);
> > >>>
> > >>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> > >>> -             if (cmsg_flags & TCP_CMSG_TS)
> > >>> -                     tcp_recv_timestamp(msg, sk, &tss);
> > >>> +             if (cmsg_flags & TCP_CMSG_TS) {
> > >>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
> > >>> +
> > >>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> > >>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> > >>> +                             bpf_skops_rx_timestamping(sk, &tss,
> > >>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
> > >>
> > >> Does this mean I can enable timestamp reading per cgroup?
> > >
> > > Yes, I think so, but I didn't try. One of the natures of sockopt
> > > feature is supporting cgroup attach.
> > > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
> > > something that you're looking for.
> > >
> >
> > Sound good
> >
> > > IIUC, you can attach the prog onto the cgroup where all the sockets
> > > are set using the bpf timestamping function. So the current impl is
> > > cleaner and has better isolation (to filter out those unmatched
> > > flows).
> > >
> > >>
> > >> In Simon's netstacklat[1] tool we are forced process all RX timestamp
> > >> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
> > >> the cgroup IDs that we are interested in (which is a significant
> > >> overhead, as this is deployed at Cloudflare production scale).
> > >
> > > I can feel the pain when filtering in this kind of relatively hot
> > > path, which is what I'm trying to avoid internally. What I've done in
> > > production (to cover those old kernels) is to just let the kernel
> > > print the information, that's it, and there is an agent continuously
> > > gathering the data, doing the match and computing latency. But it's
> > > overall complicated.
> > >
> >
> > I hope you don't mean your internal/old approach was using printk and
> > then analyzing this data.
>
> Of course not :)
>
> The internal approach is to cover the old kernels but doesn't mean the
> approach is old :P
>
> Instead, the internal kernel module is super efficient and I'm trying
> to ship bpf with such an ability. The fact is we've already deployed
> in production: 7x24 running, zero sampling.
>
> Please see page 24 where there is a brief introduction on how to deal
> with the log part:
> https://lpc.events/event/19/contributions/2055/#preview:3846
> I believe this is the promising direction (ring buffer + lightweight
> kernel + heavy agent) we're taking.
>
> The headache part is that I need to provide an agent written in BPF to
> do the heavy process.
>
> >
> > > Many thanks here, I'm always interested in hearing more useful and
> > > real requirements and fancy ideas on how to monitor the latency :) Now
> >
> > Simon Sundberg <Simon.Sundberg@kau.se> have many more fancy ideas on how
> > to monitor the latency.
> > The netstacklat tool is part of Simon's PhD thesis:
> > - https://doi.org/10.59217/qklv6836
> >
> > And we even gotten a paper accepted on netstacklat:
> > -
> > https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032
>
> Sorry, I cannot access this link. Could you give me the title of this paper?

Waiting at the Front Door - Continuous Monitoring of Latency in the
Host Network Stack

Oh, I guess it hasn't been officially published yet, right? That would
be why I have no way to read the content.

Thanks,
Jason

>
> >
> >
> > > I'm still struggling to port the internal functions to bpf
> > > timestamping.
> > >
> >
> > Good luck, feel free to get inspired by our netstacklat tool.
> > Key point is to let BPF store latency histograms and then let userspace
> > periodically consume these - as heatmaps.
>
> Thanks. I'll dig deeper into it.
>
> >
> > We are proposing to add 'netstacklat' as a new libbpf-tools utility
> > - https://github.com/iovisor/bcc/issues/5510
>
> Great!
>
> Thanks,
> Jason
>
> >
> >
> > >>
> > >>
> > >> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat
> > >>
> > >> [2]
> > >> https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488
> > >>
> > >>
> > >>> +                     if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> > >>> +                         tsflags & SOF_TIMESTAMPING_SOFTWARE ||
> > >>> +                         tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
> > >>> +                             tcp_recv_timestamp(msg, sk, &tss);
> > >>> +             }
> > >>>                if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
> > >>>                        msg->msg_inq = tcp_inq_hint(sk);
> > >>>                        if (cmsg_flags & TCP_CMSG_INQ)
> > >>
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 15:34   ` Stanislav Fomichev
@ 2026-05-18 23:56     ` Jason Xing
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-05-18 23:56 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, netdev, bpf, Jason Xing

On Mon, May 18, 2026 at 11:34 PM Stanislav Fomichev
<sdf.kernel@gmail.com> wrote:
>
> On 05/18, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Add two if statements to accurately isolate bpf timestamping and so
> > timestamping. They can work respectively.
> >
> > As to so_timestamping, only add a loose condition via report flags
> > to avoid duplicate strict checks that is done in tcp_recv_timestamp()
> > and performance impact. If the loose condition is hit,
> > tcp_recv_timestamp() is able to handle the exact case and doesn't
> > hamper the existing timestamping feature.
> >
> > Make it work in TCP protocol.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/ipv4/tcp.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 21ece4c71612..64c69bb3578a 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
> >       release_sock(sk);
> >
> >       if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
> > -             if (cmsg_flags & TCP_CMSG_TS)
> > -                     tcp_recv_timestamp(msg, sk, &tss);
> > +             if (cmsg_flags & TCP_CMSG_TS) {
> > +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
> > +
> > +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> > +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
> > +                             bpf_skops_rx_timestamping(sk, &tss,
> > +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>
> What about tcp_zc_finalize_rx_tstamp? Do you not want the rx tstamp for
> tcp rx zc?

Zerocopy support should not be complicated, I suppose. Let me find a
benchmark to test it locally.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
  2026-05-18 23:24           ` Jason Xing
@ 2026-05-19  9:57             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-19  9:57 UTC (permalink / raw)
  To: Jason Xing, Jesper Dangaard Brouer
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, kuniyu, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, john.fastabend, sdf, Simon Sundberg, netdev, bpf,
	Jason Xing

Jason Xing <kerneljasonxing@gmail.com> writes:

> On Tue, May 19, 2026 at 7:16 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>> >
>> >
>> >
>> > On 18/05/2026 15.53, Jason Xing wrote:
>> > > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>> > >>
>> > >>
>> > >>
>> > >> On 18/05/2026 10.23, Jason Xing wrote:
>> > >>> From: Jason Xing <kernelxing@tencent.com>
>> > >>>
>> > >>> Add two if statements to accurately isolate bpf timestamping and so
>> > >>> timestamping. They can work respectively.
>> > >>>
>> > >>> As to so_timestamping, only add a loose condition via report flags
>> > >>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
>> > >>> and performance impact. If the loose condition is hit,
>> > >>> tcp_recv_timestamp() is able to handle the exact case and doesn't
>> > >>> hamper the existing timestamping feature.
>> > >>>
>> > >>> Make it work in TCP protocol.
>> > >>>
>> > >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>> > >>> ---
>> > >>>    net/ipv4/tcp.c | 14 ++++++++++++--
>> > >>>    1 file changed, 12 insertions(+), 2 deletions(-)
>> > >>>
>> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> > >>> index 21ece4c71612..64c69bb3578a 100644
>> > >>> --- a/net/ipv4/tcp.c
>> > >>> +++ b/net/ipv4/tcp.c
>> > >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>> > >>>        release_sock(sk);
>> > >>>
>> > >>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>> > >>> -             if (cmsg_flags & TCP_CMSG_TS)
>> > >>> -                     tcp_recv_timestamp(msg, sk, &tss);
>> > >>> +             if (cmsg_flags & TCP_CMSG_TS) {
>> > >>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
>> > >>> +
>> > >>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>> > >>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>> > >>> +                             bpf_skops_rx_timestamping(sk, &tss,
>> > >>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>> > >>
>> > >> Does this mean I can enable timestamp reading per cgroup?
>> > >
>> > > Yes, I think so, but I haven't tried it. One property of the sockopt
>> > > feature is that it supports cgroup attach.
>> > > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
>> > > what you're looking for.
>> > >
>> >
>> > Sounds good
>> >
>> > > IIUC, you can attach the prog to the cgroup in which all the sockets
>> > > have the bpf timestamping feature enabled. So the current impl is
>> > > cleaner and has better isolation (it filters out the unmatched
>> > > flows).
>> > >
>> > >>
>> > >> In Simon's netstacklat[1] tool we are forced to process all RX
>> > >> timestamps (hooking fentry/tcp_recv_timestamp), and then we have a BPF
>> > >> filter[2] on the cgroup IDs that we are interested in (which is a
>> > >> significant overhead, as this is deployed at Cloudflare production scale).
>> > >
>> > > I can feel the pain of filtering in this kind of relatively hot
>> > > path, which is what I'm trying to avoid internally. What I've done in
>> > > production (to cover those old kernels) is to just let the kernel
>> > > print the information and nothing more, while an agent continuously
>> > > gathers the data, does the matching and computes the latency. But
>> > > it's complicated overall.
>> > >
>> >
>> > I hope you don't mean your internal/old approach was using printk and
>> > then analyzing this data.
>>
>> Of course not :)
>>
>> The internal approach is there to cover the old kernels, but that
>> doesn't mean the approach itself is old :P
>>
>> Instead, the internal kernel module is super efficient and I'm trying
>> to give bpf the same ability. In fact, we've already deployed it in
>> production: running 24x7, zero sampling.
>>
>> Please see page 24 where there is a brief introduction on how to deal
>> with the log part:
>> https://lpc.events/event/19/contributions/2055/#preview:3846
>> I believe this is a promising direction (ring buffer + lightweight
>> kernel + heavy agent), and it's the one we're taking.
>>
>> The painful part is that I need to provide an agent written in BPF to
>> do the heavy processing.
>>
>> >
>> > > Many thanks here, I'm always interested in hearing more useful and
>> > > real requirements and fancy ideas on how to monitor the latency :) Now
>> >
>> > Simon Sundberg <Simon.Sundberg@kau.se> has many more fancy ideas on how
>> > to monitor the latency.
>> > The netstacklat tool is part of Simon's PhD thesis:
>> > - https://doi.org/10.59217/qklv6836
>> >
>> > And we even got a paper accepted on netstacklat:
>> > - https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032
>>
>> Sorry, I cannot access this link. Could you give me the title of this paper?
>
> Waiting at the Front Door - Continuous Monitoring of Latency in the
> Host Network Stack
>
> Oh, I guess it hasn't been officially published, right? That's the
> reason why I have no way to see the content.

No, it's not published yet; I'll send you a copy off-list :)

-Toke

Thread overview: 19+ messages
2026-05-18  8:23 [PATCH net-next 0/6] bpf-timetamp: support rx side Jason Xing
2026-05-18  8:23 ` [PATCH net-next 1/6] bpf: Add bpf_ktime_get_real_ns() kfunc Jason Xing
2026-05-18 11:57   ` Jesper Dangaard Brouer
2026-05-18 12:35     ` Jason Xing
2026-05-18  8:23 ` [PATCH net-next 2/6] net: export sock_disable_timestamp() declaration Jason Xing
2026-05-18  8:23 ` [PATCH net-next 3/6] bpf: support bpf_setsockopt for bpf timestamping rx feature Jason Xing
2026-05-18  8:23 ` [PATCH net-next 4/6] bpf: add BPF_SOCK_OPS_TSTAMP_RCV_CB callback Jason Xing
2026-05-18  8:23 ` [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer Jason Xing
2026-05-18 13:01   ` Jesper Dangaard Brouer
2026-05-18 13:53     ` Jason Xing
2026-05-18 16:40       ` Jesper Dangaard Brouer
2026-05-18 23:16         ` Jason Xing
2026-05-18 23:24           ` Jason Xing
2026-05-19  9:57             ` Toke Høiland-Jørgensen
2026-05-18 15:34   ` Stanislav Fomichev
2026-05-18 23:56     ` Jason Xing
2026-05-18  8:23 ` [PATCH net-next 6/6] selftests/bpf: Add RX latency tests for bpf timestamping Jason Xing
2026-05-18 11:46 ` [PATCH net-next 0/6] bpf-timetamp: support rx side Jesper Dangaard Brouer
2026-05-18 12:32   ` Jason Xing