[PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently


* [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
@ 2025-01-28  8:46 Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 01/13] net-timestamp: add support for bpf_setsockopt() Jason Xing
                   ` (13 more replies)
  0 siblings, 14 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

"Timestamping is key to debugging network stack latency. With
SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
network issues can be attributed to the kernel." This is quoted
from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
given by Willem de Bruijn at netdevconf 0x17.

A few areas need improvement to make the feature easier to use and to
reduce its performance impact. I highlighted and discussed them at
netconf 2024 with Willem de Bruijn and John Fastabend: uAPI
compatibility, extra system call overhead, and the need for
application modification. I initially solved these issues by writing
a kernel module that hooks various key functions. However, this
approach is not suitable for the next kernel release. Therefore, a
BPF extension was proposed. Recently, Martin KaFai Lau has provided
invaluable suggestions about BPF along the way. Many thanks here!

In this series, I only support the fundamental code and the tx path
for TCP. This approach mostly relies on the existing SO_TIMESTAMPING
feature: users only need to pass certain flags through bpf_setsockopt()
to a separate tsflags. Please see the last selftest patch in this series.

After this series, we can implement, step by step, the more advanced
functions/flags that already exist in the SO_TIMESTAMPING feature for
the bpf extension.

---
v7
Link: https://lore.kernel.org/all/20250121012901.87763-1-kerneljasonxing@gmail.com/
1. target bpf-next tree
2. simply and directly stop timestamping callbacks from calling a few
BPF CALLs due to safety concerns.
3. add more new testcases and adjust the existing testcases
4. revise some comments of new timestamping callbacks
5. remove a few BPF CGROUP locks

RFC v6
In the meantime, any suggestions and reviews are welcome!
Link: https://lore.kernel.org/all/20250112113748.73504-1-kerneljasonxing@gmail.com/
1. handle those safety problems by using the correct method.
2. support bpf_getsockopt.
3. adjust the position of BPF_SOCK_OPS_TS_TCP_SND_CB
4. fix mishandling the hardware timestamp error
5. add more corresponding tests

v5
Link: https://lore.kernel.org/all/20241207173803.90744-1-kerneljasonxing@gmail.com/
1. handle the safety issues when someone tries to call unrelated bpf
helpers.
2. avoid adding a direct function call in hot paths like
__dev_queue_xmit()
3. remove reporting the hardware timestamp and tskey since they can be
fetched through the existing helper with the help of
bpf_skops_init_skb(), please see the selftest.
4. add new sendmsg callback in tcp_sendmsg, and introduce tskey_bpf used
by bpf program to correlate tcp_sendmsg with other hook points in patch [13/15].

v4
Link: https://lore.kernel.org/all/20241028110535.82999-1-kerneljasonxing@gmail.com/
1. introduce sk->sk_bpf_cb_flags to let user use bpf_setsockopt() (Martin)
2. introduce SKBTX_BPF to enable the bpf SO_TIMESTAMPING feature (Martin)
3. introduce bpf map in tests (Martin)
4. I choose to make this series as simple as possible, so I only support
most cases in the tx path for TCP protocol.

v3
Link: https://lore.kernel.org/all/20241012040651.95616-1-kerneljasonxing@gmail.com/
1. support UDP proto by introducing a new generation point.
2. for OPT_ID, introduce sk_tskey_bpf_offset to compute the delta
between the current socket key and the bpf socket key. It is designed
for UDP, but also applies to TCP.
3. support bpf_getsockopt()
4. use cgroup static key instead.
5. add one simple bpf selftest to show how it can be used.
6. remove the rx support from v2 because the number of patches could
exceed the limit of one series.

v2
Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
1. Introduce tsflag requestors so that we can extend the feature further
in the future. It also enables the TX flags for the bpf extension
separately, without breaking existing users. Suggested by Vadim Fedorenko.
2. introduce a static key to control the whole feature. (Willem)
3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
some TX/RX cases, not all the cases.

Jason Xing (13):
  net-timestamp: add support for bpf_setsockopt()
  net-timestamp: prepare for timestamping callbacks use
  bpf: stop unsafely accessing TCP fields in bpf callbacks
  bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks
  net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
  net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
  net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
  net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
  net-timestamp: support SCM_TSTAMP_ACK for bpf extension
  net-timestamp: make TCP tx timestamp bpf extension work
  net-timestamp: add a new callback in tcp_tx_timestamp()
  net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
  bpf: add simple bpf tests in the tx path for so_timestamping feature

 include/linux/filter.h                        |   5 +
 include/linux/skbuff.h                        |  25 +-
 include/net/sock.h                            |  10 +
 include/net/tcp.h                             |   4 +-
 include/uapi/linux/bpf.h                      |  35 ++
 net/core/dev.c                                |   5 +-
 net/core/filter.c                             |  48 ++-
 net/core/skbuff.c                             |  65 +++-
 net/core/sock.c                               |  15 +
 net/dsa/user.c                                |   2 +-
 net/ipv4/tcp.c                                |  11 +
 net/ipv4/tcp_input.c                          |   8 +-
 net/ipv4/tcp_output.c                         |   7 +
 net/socket.c                                  |   2 +-
 tools/include/uapi/linux/bpf.h                |  28 ++
 .../bpf/prog_tests/so_timestamping.c          |  86 +++++
 .../selftests/bpf/progs/so_timestamping.c     | 299 ++++++++++++++++++
 17 files changed, 633 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
 create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c

-- 
2.43.5


* [PATCH bpf-next v7 01/13] net-timestamp: add support for bpf_setsockopt()
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 02/13] net-timestamp: prepare for timestamping callbacks use Jason Xing
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Users can write the following code to enable the bpf extension:
bpf_setsockopt(skops, SOL_SOCKET, SK_BPF_CB_FLAGS, &flags, sizeof(flags));
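
To make the accepted values concrete, here is a minimal user-space
sketch of the set/get validation this patch adds. It is only a model:
struct mock_sock and mock_set_get_cb_flags() are hypothetical names
made up for illustration, and only the enum values are taken from the
diff below. It shows that any bit outside SK_BPF_CB_MASK is rejected
with -EINVAL and leaves the stored flags untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the uapi enum added by this patch. */
enum {
	SK_BPF_CB_TX_TIMESTAMPING = 1 << 0,
	SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
			 SK_BPF_CB_TX_TIMESTAMPING,
};

/* Hypothetical stand-in for struct sock; only the new field is modelled. */
struct mock_sock {
	uint32_t sk_bpf_cb_flags;
};

/* User-space model of the set/get logic in sk_bpf_set_get_cb_flags(). */
static int mock_set_get_cb_flags(struct mock_sock *sk, uint32_t *optval,
				 int getopt)
{
	assert(sk != NULL && optval != NULL);

	if (getopt) {
		*optval = sk->sk_bpf_cb_flags;
		return 0;
	}

	/* Reject any bit outside the mask; the stored value is unchanged. */
	if (*optval & ~(uint32_t)SK_BPF_CB_MASK)
		return -EINVAL;

	sk->sk_bpf_cb_flags = *optval;
	return 0;
}
```

Because the mask is derived from the highest defined flag, adding a new
flag later only requires updating the last term of SK_BPF_CB_MASK.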

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/net/sock.h             |  3 +++
 include/uapi/linux/bpf.h       |  8 ++++++++
 net/core/filter.c              | 23 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 4 files changed, 35 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..7916982343c6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -303,6 +303,7 @@ struct sk_filter;
   *	@sk_stamp: time stamp of last packet received
   *	@sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
   *	@sk_tsflags: SO_TIMESTAMPING flags
+  *	@sk_bpf_cb_flags: used in bpf_setsockopt()
   *	@sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
   *			   Sockets that can be used under memory reclaim should
   *			   set this to false.
@@ -445,6 +446,8 @@ struct sock {
 	u32			sk_reserved_mem;
 	int			sk_forward_alloc;
 	u32			sk_tsflags;
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
+	u32			sk_bpf_cb_flags;
 	__cacheline_group_end(sock_write_rxtx);
 
 	__cacheline_group_begin(sock_write_tx);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2acf9b336371..6116eb3d1515 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6913,6 +6913,13 @@ enum {
 	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
 };
 
+/* Definitions for bpf_sk_cb_flags */
+enum {
+	SK_BPF_CB_TX_TIMESTAMPING	= 1<<0,
+	SK_BPF_CB_MASK			= (SK_BPF_CB_TX_TIMESTAMPING - 1) |
+					   SK_BPF_CB_TX_TIMESTAMPING
+};
+
 /* List of known BPF sock_ops operators.
  * New entries can only be added at the end
  */
@@ -7091,6 +7098,7 @@ enum {
 	TCP_BPF_SYN_IP		= 1006, /* Copy the IP[46] and TCP header */
 	TCP_BPF_SYN_MAC         = 1007, /* Copy the MAC, IP[46], and TCP header */
 	TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
+	SK_BPF_CB_FLAGS		= 1009, /* Used to set socket bpf flags */
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ec162dd83c4..1c6c07507a78 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5222,6 +5222,25 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
+{
+	u32 sk_bpf_cb_flags;
+
+	if (getopt) {
+		*(u32 *)optval = sk->sk_bpf_cb_flags;
+		return 0;
+	}
+
+	sk_bpf_cb_flags = *(u32 *)optval;
+
+	if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
+		return -EINVAL;
+
+	sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
+
+	return 0;
+}
+
 static int sol_socket_sockopt(struct sock *sk, int optname,
 			      char *optval, int *optlen,
 			      bool getopt)
@@ -5238,6 +5257,7 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
 	case SO_MAX_PACING_RATE:
 	case SO_BINDTOIFINDEX:
 	case SO_TXREHASH:
+	case SK_BPF_CB_FLAGS:
 		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
@@ -5247,6 +5267,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
 		return -EINVAL;
 	}
 
+	if (optname == SK_BPF_CB_FLAGS)
+		return sk_bpf_set_get_cb_flags(sk, optval, getopt);
+
 	if (getopt) {
 		if (optname == SO_BINDTODEVICE)
 			return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2acf9b336371..70366f74ef4e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7091,6 +7091,7 @@ enum {
 	TCP_BPF_SYN_IP		= 1006, /* Copy the IP[46] and TCP header */
 	TCP_BPF_SYN_MAC         = 1007, /* Copy the MAC, IP[46], and TCP header */
 	TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
+	SK_BPF_CB_FLAGS		= 1009, /* Used to set socket bpf flags */
 };
 
 enum {
-- 
2.43.5



* [PATCH bpf-next v7 02/13] net-timestamp: prepare for timestamping callbacks use
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 01/13] net-timestamp: add support for bpf_setsockopt() Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 03/13] bpf: stop unsafely accessing TCP fields in bpf callbacks Jason Xing
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Later patches introduce four callback points, built on this patch, to
report information to user space.

As for the skb initialization here, a bpf prog can follow these three
steps to fetch the shared info from the exported skb:
1. skops_kern = bpf_cast_to_kern_ctx(skops);
2. skb = skops_kern->skb;
3. shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);

More details can be seen in the last selftest patch of the series.
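
The pointer arithmetic in step 3 can be tried out in user space with
the rough sketch below. Everything in it is an assumption made for
illustration: struct mock_skb and struct mock_shared_info are tiny
stand-ins, not the kernel structures, and 'end' is modelled as an
offset from 'head', which is what the "skb->head + skb->end"
expression above implies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mock_shared_info {
	uint8_t  tx_flags;
	uint32_t tskey;
};

struct mock_skb {
	unsigned char *head;	/* start of the buffer */
	unsigned int   end;	/* offset of the shared info from head */
};

/* Allocate a zeroed data area followed by the shared info. */
static int mock_skb_init(struct mock_skb *skb, unsigned int data_len)
{
	assert(skb != NULL);

	/* Round up so the trailing struct is suitably aligned. */
	skb->end = (data_len + 7u) & ~7u;
	skb->head = calloc(1, skb->end + sizeof(struct mock_shared_info));
	return skb->head ? 0 : -1;
}

/* Same arithmetic as step 3: the shared info sits at head + end. */
static struct mock_shared_info *mock_skb_shinfo(const struct mock_skb *skb)
{
	return (struct mock_shared_info *)(skb->head + skb->end);
}
```

In a real bpf prog the cast goes through bpf_core_cast() as shown in
step 3; the plain C cast here only illustrates where the data lives.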

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/net/sock.h |  7 +++++++
 net/core/sock.c    | 15 +++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7916982343c6..6f4d54faba92 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2923,6 +2923,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
 			  struct so_timestamping timestamping);
 
 void sock_enable_timestamps(struct sock *sk);
+#if defined(CONFIG_CGROUP_BPF)
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
+#else
+static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+}
+#endif
 void sock_no_linger(struct sock *sk);
 void sock_set_keepalive(struct sock *sk);
 void sock_set_priority(struct sock *sk, u32 priority);
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e0..41db6407e360 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -948,6 +948,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
 	return 0;
 }
 
+#if defined(CONFIG_CGROUP_BPF)
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+	struct bpf_sock_ops_kern sock_ops;
+
+	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
+	sock_ops.op = op;
+	sock_ops.is_fullsock = 1;
+	sock_ops.sk = sk;
+	bpf_skops_init_skb(&sock_ops, skb, 0);
+	/* Timestamping bpf extension supports only TCP and UDP full socket */
+	__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
+}
+#endif
+
 void sock_set_keepalive(struct sock *sk)
 {
 	lock_sock(sk);
-- 
2.43.5



* [PATCH bpf-next v7 03/13] bpf: stop unsafely accessing TCP fields in bpf callbacks
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 01/13] net-timestamp: add support for bpf_setsockopt() Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 02/13] net-timestamp: prepare for timestamping callbacks use Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 04/13] bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks Jason Xing
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

The "allow_tcp_access" flag is added to indicate that the callback
site has a tcp_sock locked.

Setting the new member allow_tcp_access in the existing callbacks
where is_fullsock is set to 1 stops a UDP socket from accessing
struct tcp_sock, and stops a TCP socket that is not protected by the
sk lock from doing the same. Otherwise, such an access could be
catastrophic and lead to a panic.

To keep it simple, instead of distinguishing between read and write
access, we disallow all read/write access to the tcp_sock through
the older bpf_sock_ops ctx. The new timestamping callbacks can use
newer helpers to read everything from a sk (e.g. bpf_core_cast), so
nothing is lost.
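
As a rough illustration of the gate described above, the user-space
sketch below models a ctx read of a TCP field. The struct and function
names are hypothetical, and returning 0 for a blocked read is my
understanding of what the converted instruction sequence yields, not
something this patch states explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_sock with a single field. */
struct mock_tcp_sock {
	uint32_t snd_cwnd;
};

/* Hypothetical stand-in for struct bpf_sock_ops_kern. */
struct mock_sock_ops_kern {
	uint8_t is_fullsock;
	uint8_t allow_tcp_access;
	struct mock_tcp_sock *sk;
};

/*
 * Model of a converted ctx read: the TCP field is only dereferenced
 * when allow_tcp_access is set. is_fullsock alone is not enough.
 */
static uint32_t mock_read_snd_cwnd(const struct mock_sock_ops_kern *ops)
{
	assert(ops != NULL);

	if (!ops->allow_tcp_access)
		return 0;

	return ops->sk->snd_cwnd;
}
```

A timestamping callback would set is_fullsock but leave
allow_tcp_access at 0, so the read never touches the socket memory.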

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/linux/filter.h | 5 +++++
 include/net/tcp.h      | 1 +
 net/core/filter.c      | 8 ++++----
 net/ipv4/tcp_input.c   | 2 ++
 net/ipv4/tcp_output.c  | 2 ++
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a3ea46281595..1569e9f31a8c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1508,6 +1508,11 @@ struct bpf_sock_ops_kern {
 	void	*skb_data_end;
 	u8	op;
 	u8	is_fullsock;
+	u8	allow_tcp_access;	/* Indicate that the callback site
+					 * has a tcp_sock locked. Then it
+					 * would be safe to access struct
+					 * tcp_sock.
+					 */
 	u8	remaining_opt_len;
 	u64	temp;			/* temp and everything after is not
 					 * initialized to 0 before calling
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b2b04835688..293047694710 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2649,6 +2649,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
 	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
 	if (sk_fullsock(sk)) {
 		sock_ops.is_fullsock = 1;
+		sock_ops.allow_tcp_access = 1;
 		sock_owned_by_me(sk);
 	}
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 1c6c07507a78..dc0e67c5776a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10381,10 +10381,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		}							      \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern,     \
-						is_fullsock),		      \
+						allow_tcp_access),	      \
 				      fullsock_reg, si->src_reg,	      \
 				      offsetof(struct bpf_sock_ops_kern,      \
-					       is_fullsock));		      \
+					       allow_tcp_access));	      \
 		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);	      \
 		if (si->dst_reg == si->src_reg)				      \
 			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
@@ -10469,10 +10469,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 					       temp));			      \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern,     \
-						is_fullsock),		      \
+						allow_tcp_access),	      \
 				      reg, si->dst_reg,			      \
 				      offsetof(struct bpf_sock_ops_kern,      \
-					       is_fullsock));		      \
+					       allow_tcp_access));	      \
 		*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2);		      \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern, sk),\
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eb82e01da911..77185479ed5e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
 	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
 	sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
 	sock_ops.is_fullsock = 1;
+	sock_ops.allow_tcp_access = 1;
 	sock_ops.sk = sk;
 	bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
 
@@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
 	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
 	sock_ops.op = bpf_op;
 	sock_ops.is_fullsock = 1;
+	sock_ops.allow_tcp_access = 1;
 	sock_ops.sk = sk;
 	/* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
 	if (skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0e5b9a654254..695749807c09 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
 		sock_owned_by_me(sk);
 
 		sock_ops.is_fullsock = 1;
+		sock_ops.allow_tcp_access = 1;
 		sock_ops.sk = sk;
 	}
 
@@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
 		sock_owned_by_me(sk);
 
 		sock_ops.is_fullsock = 1;
+		sock_ops.allow_tcp_access = 1;
 		sock_ops.sk = sk;
 	}
 
-- 
2.43.5



* [PATCH bpf-next v7 04/13] bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (2 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 03/13] bpf: stop unsafely accessing TCP fields in bpf callbacks Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Simply disallow calling bpf_sock_ops_setsockopt/getsockopt,
bpf_sock_ops_cb_flags_set, and bpf_sock_ops_load_hdr_opt from the
new timestamping callbacks, for safety reasons.

Besides, in the next round we will support the UDP proto for the
SO_TIMESTAMPING bpf extension, so we need to ensure there is no
safety problem, which is usually caused by a UDP socket trying to
access TCP fields.
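
The check relies purely on the ordering of the sock_ops enum: every op
up to BPF_SOCK_OPS_WRITE_HDR_OPT_CB is an existing TCP callback, and
anything added after it is refused. A minimal user-space sketch of that
idea follows; the MOCK_* names and mock_sock_ops_setsockopt() are
hypothetical, and the enum values are illustrative rather than the uapi
ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative ordering only; these are not the uapi values. */
enum mock_sock_ops {
	MOCK_OPS_VOID,
	MOCK_OPS_WRITE_HDR_OPT_CB,	/* last pre-existing TCP callback */
	MOCK_OPS_TS_NEW_CB,		/* hypothetical timestamping callback */
};

struct mock_sock_ops_kern {
	int op;
};

/* Same comparison as is_locked_tcp_sock_ops() in the diff below. */
static int mock_is_locked_tcp_sock_ops(const struct mock_sock_ops_kern *ops)
{
	return ops->op <= MOCK_OPS_WRITE_HDR_OPT_CB;
}

/* Model of a guarded helper: bail out early for the newer callbacks. */
static int mock_sock_ops_setsockopt(const struct mock_sock_ops_kern *ops)
{
	assert(ops != NULL);

	if (!mock_is_locked_tcp_sock_ops(ops))
		return -EOPNOTSUPP;

	return 0;	/* the real helper would go on to set the option */
}
```

One consequence of this design is that any future op appended to the
enum is refused by default until the comparison is revisited.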

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 net/core/filter.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index dc0e67c5776a..d3395ffe058e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5523,6 +5523,11 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	return -EINVAL;
 }
 
+static bool is_locked_tcp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
+{
+	return bpf_sock->op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB;
+}
+
 static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 			   char *optval, int optlen)
 {
@@ -5673,6 +5678,9 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
 BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
+	if (!is_locked_tcp_sock_ops(bpf_sock))
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }
 
@@ -5758,6 +5766,9 @@ static int bpf_sock_ops_get_syn(struct bpf_sock_ops_kern *bpf_sock,
 BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
+	if (!is_locked_tcp_sock_ops(bpf_sock))
+		return -EOPNOTSUPP;
+
 	if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP &&
 	    optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) {
 		int ret, copy_len = 0;
@@ -5800,6 +5811,9 @@ BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
 	struct sock *sk = bpf_sock->sk;
 	int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
 
+	if (!is_locked_tcp_sock_ops(bpf_sock))
+		return -EOPNOTSUPP;
+
 	if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk))
 		return -EINVAL;
 
@@ -7609,6 +7623,9 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
 	u8 search_kind, search_len, copy_len, magic_len;
 	int ret;
 
+	if (!is_locked_tcp_sock_ops(bpf_sock))
+		return -EOPNOTSUPP;
+
 	/* 2 byte is the minimal option len except TCPOPT_NOP and
 	 * TCPOPT_EOL which are useless for the bpf prog to learn
 	 * and this helper disallow loading them also.
-- 
2.43.5



* [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (3 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 04/13] bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-03 23:14   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

No functional changes here. I add skb_enable_app_tstamp() to test
if the orig_skb matches the usage of application SO_TIMESTAMPING,
and a 'sw' parameter to __skb_tstamp_tx() to distinguish software
and hardware timestamps when tstype is SCM_TSTAMP_SND.

Also, I deliberately distinguish the software and hardware
SCM_TSTAMP_SND timestamps by passing the 'sw' parameter, to cover
the case where the hardware goes wrong and passes a NULL hwtstamps,
unlikely as that is. If it did happen, the bpf prog would end up
treating it as a software timestamp, and the mistake would be hard
to notice. Let's make the timestamping part more robust.

Later patches add the checks for bpf SO_TIMESTAMPING. In this way,
we can support the two modes in parallel.
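
The role of the 'sw' parameter can be seen in the small user-space
model below. It is a sketch under stated assumptions: the MOCK_* names
and struct mock_skb are hypothetical, the bit values are chosen only to
be distinct, and the ACK case is reduced to a single boolean standing
in for TCP_SKB_CB(skb)->txstamp_ack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values; only their distinctness matters here. */
enum {
	MOCK_SKBTX_HW_TSTAMP    = 1 << 0,
	MOCK_SKBTX_SW_TSTAMP    = 1 << 1,
	MOCK_SKBTX_SCHED_TSTAMP = 1 << 6,
};

enum { MOCK_TSTAMP_SND, MOCK_TSTAMP_SCHED, MOCK_TSTAMP_ACK };

/* Hypothetical stand-in for the skb state the check looks at. */
struct mock_skb {
	uint8_t tx_flags;
	bool    txstamp_ack;
};

/* Model of skb_enable_app_tstamp(): pick the flag, then test it. */
static bool mock_enable_app_tstamp(const struct mock_skb *skb,
				   int tstype, bool sw)
{
	int flag;

	assert(skb != NULL);

	switch (tstype) {
	case MOCK_TSTAMP_SCHED:
		flag = MOCK_SKBTX_SCHED_TSTAMP;
		break;
	case MOCK_TSTAMP_SND:
		/* 'sw' says which kind of SND report the caller is making. */
		flag = sw ? MOCK_SKBTX_SW_TSTAMP : MOCK_SKBTX_HW_TSTAMP;
		break;
	case MOCK_TSTAMP_ACK:
		return skb->txstamp_ack;
	default:
		return false;
	}

	return (skb->tx_flags & flag) != 0;
}
```

With only the hardware flag set, a software SND report is filtered out
even though no hwtstamps pointer is consulted at all.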

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/linux/skbuff.h | 13 +++++++------
 net/core/dev.c         |  2 +-
 net/core/skbuff.c      | 32 ++++++++++++++++++++++++++++++--
 net/ipv4/tcp_input.c   |  3 ++-
 4 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb2b751d274a..dfc419281cc9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -39,6 +39,7 @@
 #include <net/net_debug.h>
 #include <net/dropreason-core.h>
 #include <net/netmem.h>
+#include <uapi/linux/errqueue.h>
 
 /**
  * DOC: skb checksums
@@ -4533,18 +4534,18 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 
 void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
-		     struct sock *sk, int tstype);
+		     struct sock *sk, bool sw, int tstype);
 
 /**
- * skb_tstamp_tx - queue clone of skb with send time stamps
+ * skb_tstamp_tx - queue clone of skb with send HARDWARE timestamps
  * @orig_skb:	the original outgoing packet
  * @hwtstamps:	hardware time stamps, may be NULL if not available
  *
  * If the skb has a socket associated, then this function clones the
  * skb (thus sharing the actual data and optional structures), stores
- * the optional hardware time stamping information (if non NULL) or
- * generates a software time stamp (otherwise), then queues the clone
- * to the error queue of the socket.  Errors are silently ignored.
+ * the optional hardware time stamping information (if non NULL) then
+ * queues the clone to the error queue of the socket.  Errors are
+ * silently ignored.
  */
 void skb_tstamp_tx(struct sk_buff *orig_skb,
 		   struct skb_shared_hwtstamps *hwtstamps);
@@ -4565,7 +4566,7 @@ static inline void skb_tx_timestamp(struct sk_buff *skb)
 {
 	skb_clone_tx_timestamp(skb);
 	if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
-		skb_tstamp_tx(skb, NULL);
+		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SND);
 }
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..d77b8389753e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4501,7 +4501,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	skb_assert_len(skb);
 
 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
-		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
+		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SCHED);
 
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a441613a1e6c..6042961dfc02 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5539,10 +5539,35 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
 
+static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
+{
+	int flag;
+
+	switch (tstype) {
+	case SCM_TSTAMP_SCHED:
+		flag = SKBTX_SCHED_TSTAMP;
+		break;
+	case SCM_TSTAMP_SND:
+		flag = sw ? SKBTX_SW_TSTAMP : SKBTX_HW_TSTAMP;
+		break;
+	case SCM_TSTAMP_ACK:
+		if (TCP_SKB_CB(skb)->txstamp_ack)
+			return true;
+		fallthrough;
+	default:
+		return false;
+	}
+
+	if (skb_shinfo(skb)->tx_flags & flag)
+		return true;
+
+	return false;
+}
+
 void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     const struct sk_buff *ack_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
-		     struct sock *sk, int tstype)
+		     struct sock *sk, bool sw, int tstype)
 {
 	struct sk_buff *skb;
 	bool tsonly, opt_stats = false;
@@ -5551,6 +5576,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!sk)
 		return;
 
+	if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
+		return;
+
 	tsflags = READ_ONCE(sk->sk_tsflags);
 	if (!hwtstamps && !(tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
 	    skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
@@ -5599,7 +5627,7 @@ EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
 void skb_tstamp_tx(struct sk_buff *orig_skb,
 		   struct skb_shared_hwtstamps *hwtstamps)
 {
-	return __skb_tstamp_tx(orig_skb, NULL, hwtstamps, orig_skb->sk,
+	return __skb_tstamp_tx(orig_skb, NULL, hwtstamps, orig_skb->sk, false,
 			       SCM_TSTAMP_SND);
 }
 EXPORT_SYMBOL_GPL(skb_tstamp_tx);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 77185479ed5e..62252702929d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3330,7 +3330,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
 	if (!before(shinfo->tskey, prior_snd_una) &&
 	    before(shinfo->tskey, tcp_sk(sk)->snd_una)) {
 		tcp_skb_tsorted_save(skb) {
-			__skb_tstamp_tx(skb, ack_skb, NULL, sk, SCM_TSTAMP_ACK);
+			__skb_tstamp_tx(skb, ack_skb, NULL, sk, true,
+					SCM_TSTAMP_ACK);
 		} tcp_skb_tsorted_restore(skb);
 	}
 }
-- 
2.43.5



* [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (4 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-03 23:23   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 07/13] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Introduce SKBTX_BPF as an indicator telling us whether the skb
should be traced by the bpf prog.
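
A small user-space sketch of the two pieces visible in this patch may
help: the widened condition in __dev_queue_xmit() and the mapping from
tstype to a sock_ops op. All MOCK_* names, struct mock_skb, and the
MOCK_OP_NONE sentinel are hypothetical; the real code returns early
instead of using a sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the enum in the diff below. */
enum {
	MOCK_SKBTX_SCHED_TSTAMP = 1 << 6,
	MOCK_SKBTX_BPF          = 1 << 7,
};

enum { MOCK_TSTAMP_SND, MOCK_TSTAMP_SCHED, MOCK_TSTAMP_ACK };

/* Hypothetical op values; MOCK_OP_NONE means "no callback is run". */
enum { MOCK_OP_NONE = -1, MOCK_OP_TS_SCHED = 1 };

struct mock_skb {
	uint8_t tx_flags;
};

/* The sched hook now fires for either the app flag or the bpf flag. */
static bool mock_sched_hook_taken(const struct mock_skb *skb)
{
	assert(skb != NULL);
	return (skb->tx_flags &
		(MOCK_SKBTX_SCHED_TSTAMP | MOCK_SKBTX_BPF)) != 0;
}

/* Model of the switch in skb_tstamp_tx_bpf(): only SCHED is mapped. */
static int mock_tstype_to_op(bool has_sk, int tstype)
{
	if (!has_sk)
		return MOCK_OP_NONE;

	switch (tstype) {
	case MOCK_TSTAMP_SCHED:
		return MOCK_OP_TS_SCHED;
	default:
		return MOCK_OP_NONE;
	}
}
```

The other tstype values fall into the default branch at this point in
the series, so they reach no bpf callback until later patches add them.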

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/linux/skbuff.h         |  6 +++++-
 include/uapi/linux/bpf.h       |  4 ++++
 net/core/dev.c                 |  3 ++-
 net/core/skbuff.c              | 23 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  4 ++++
 5 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index dfc419281cc9..35c2e864dd4b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -490,10 +490,14 @@ enum {
 
 	/* generate software time stamp when entering packet scheduling */
 	SKBTX_SCHED_TSTAMP = 1 << 6,
+
+	/* used for bpf extension when a bpf program is loaded */
+	SKBTX_BPF = 1 << 7,
 };
 
 #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
-				 SKBTX_SCHED_TSTAMP)
+				 SKBTX_SCHED_TSTAMP | \
+				 SKBTX_BPF)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | \
 				 SKBTX_HW_TSTAMP_USE_CYCLES | \
 				 SKBTX_ANY_SW_TSTAMP)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6116eb3d1515..30d2c078966b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7032,6 +7032,10 @@ enum {
 					 * by the kernel or the
 					 * earlier bpf-progs.
 					 */
+	BPF_SOCK_OPS_TS_SCHED_OPT_CB,	/* Called when skb is passing through
+					 * dev layer when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/dev.c b/net/core/dev.c
index d77b8389753e..4f291459d6b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4500,7 +4500,8 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	skb_reset_mac_header(skb);
 	skb_assert_len(skb);
 
-	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
+	if (unlikely(skb_shinfo(skb)->tx_flags &
+		     (SKBTX_SCHED_TSTAMP | SKBTX_BPF)))
 		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SCHED);
 
 	/* Disable soft irqs for various locks below. Also
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6042961dfc02..d19d577b996f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5564,6 +5564,24 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
 	return false;
 }
 
+static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
+{
+	int op;
+
+	if (!sk)
+		return;
+
+	switch (tstype) {
+	case SCM_TSTAMP_SCHED:
+		op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
+		break;
+	default:
+		return;
+	}
+
+	bpf_skops_tx_timestamping(sk, skb, op);
+}
+
 void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     const struct sk_buff *ack_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
@@ -5576,6 +5594,11 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!sk)
 		return;
 
+	/* bpf extension feature entry */
+	if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
+		skb_tstamp_tx_bpf(orig_skb, sk, tstype);
+
+	/* application feature entry */
 	if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
 		return;
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70366f74ef4e..eed91b7296b7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7025,6 +7025,10 @@ enum {
 					 * by the kernel or the
 					 * earlier bpf-progs.
 					 */
+	BPF_SOCK_OPS_TS_SCHED_OPT_CB,	/* Called when skb is passing through
+					 * dev layer when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.5



* [PATCH bpf-next v7 07/13] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (5 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 08/13] net-timestamp: support hw " Jason Xing
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Support the SCM_TSTAMP_SND case, so that we get the software
timestamp when the driver is about to send the skb. A later patch
adds support for the hardware timestamp.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/linux/skbuff.h         |  2 +-
 include/uapi/linux/bpf.h       |  4 ++++
 net/core/skbuff.c              | 10 ++++++++--
 tools/include/uapi/linux/bpf.h |  4 ++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 35c2e864dd4b..de8d3bd311f5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4569,7 +4569,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 static inline void skb_tx_timestamp(struct sk_buff *skb)
 {
 	skb_clone_tx_timestamp(skb);
-	if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
+	if (skb_shinfo(skb)->tx_flags & (SKBTX_SW_TSTAMP | SKBTX_BPF))
 		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SND);
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 30d2c078966b..6a1083bcf779 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7036,6 +7036,10 @@ enum {
 					 * dev layer when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
+					 * to the nic when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d19d577b996f..288eb9869827 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5564,7 +5564,8 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
 	return false;
 }
 
-static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
+static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
+			      int tstype, bool sw)
 {
 	int op;
 
@@ -5575,6 +5576,11 @@ static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
 	case SCM_TSTAMP_SCHED:
 		op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
 		break;
+	case SCM_TSTAMP_SND:
+		if (!sw)
+			return;
+		op = BPF_SOCK_OPS_TS_SW_OPT_CB;
+		break;
 	default:
 		return;
 	}
@@ -5596,7 +5602,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 
 	/* bpf extension feature entry */
 	if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
-		skb_tstamp_tx_bpf(orig_skb, sk, tstype);
+		skb_tstamp_tx_bpf(orig_skb, sk, tstype, sw);
 
 	/* application feature entry */
 	if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index eed91b7296b7..9bd1c7c77b17 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7029,6 +7029,10 @@ enum {
 					 * dev layer when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
+					 * to the nic when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.5



* [PATCH bpf-next v7 08/13] net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (6 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 07/13] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-04  0:56   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 09/13] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

This patch finishes the hardware part, so that the bpf program can
fetch the hwtstamp from the skb directly.

To avoid changing the many drivers that test SKBTX_HW_TSTAMP, redefine
SKBTX_HW_TSTAMP to also cover SKBTX_BPF and keep the original bit as
__SKBTX_HW_TSTAMP. This simple modification is enough to support
printing the hardware timestamp.
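
For illustration only (not part of the patch): a minimal userspace
sketch of the effect of the composite SKBTX_HW_TSTAMP definition. The
bit values and the macro are copied from the skbuff.h hunk of this
patch; driver_wants_hwtstamp() and app_wants_hwtstamp() are hypothetical
names for the two kinds of callers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values and macro copied from the skbuff.h hunk of this patch. */
enum {
	__SKBTX_HW_TSTAMP = 1 << 0,
	SKBTX_BPF         = 1 << 7,
};
#define SKBTX_HW_TSTAMP	(__SKBTX_HW_TSTAMP | SKBTX_BPF)

/* Hypothetical driver-side test, written the way unmodified
 * callers already test SKBTX_HW_TSTAMP.
 */
bool driver_wants_hwtstamp(uint8_t tx_flags)
{
	return tx_flags & SKBTX_HW_TSTAMP;
}

/* Hypothetical application-side test that uses only the original
 * bit, as skb_enable_app_tstamp() does after this patch.
 */
bool app_wants_hwtstamp(uint8_t tx_flags)
{
	return tx_flags & __SKBTX_HW_TSTAMP;
}
```

An skb carrying only SKBTX_BPF passes the driver-side test without any
driver change, while the application-side test still reports that the
socket did not request a hardware timestamp.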

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/linux/skbuff.h         |  4 +++-
 include/uapi/linux/bpf.h       |  7 +++++++
 net/core/skbuff.c              | 11 ++++++-----
 net/dsa/user.c                 |  2 +-
 net/socket.c                   |  2 +-
 tools/include/uapi/linux/bpf.h |  7 +++++++
 6 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index de8d3bd311f5..df2d790ae36b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -471,7 +471,7 @@ struct skb_shared_hwtstamps {
 /* Definitions for tx_flags in struct skb_shared_info */
 enum {
 	/* generate hardware time stamp */
-	SKBTX_HW_TSTAMP = 1 << 0,
+	__SKBTX_HW_TSTAMP = 1 << 0,
 
 	/* generate software time stamp when queueing packet to NIC */
 	SKBTX_SW_TSTAMP = 1 << 1,
@@ -495,6 +495,8 @@ enum {
 	SKBTX_BPF = 1 << 7,
 };
 
+#define SKBTX_HW_TSTAMP		(__SKBTX_HW_TSTAMP | SKBTX_BPF)
+
 #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
 				 SKBTX_SCHED_TSTAMP | \
 				 SKBTX_BPF)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6a1083bcf779..4c3566f623c2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7040,6 +7040,13 @@ enum {
 					 * to the nic when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_HW_OPT_CB,	/* Called in hardware phase when
+					 * SK_BPF_CB_TX_TIMESTAMPING feature
+					 * is on. At the same time, hwtstamps
+					 * of skb is initialized as the
+					 * timestamp that hardware just
+					 * generates.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 288eb9869827..c769feae5162 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5548,7 +5548,7 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
 		flag = SKBTX_SCHED_TSTAMP;
 		break;
 	case SCM_TSTAMP_SND:
-		flag = sw ? SKBTX_SW_TSTAMP : SKBTX_HW_TSTAMP;
+		flag = sw ? SKBTX_SW_TSTAMP : __SKBTX_HW_TSTAMP;
 		break;
 	case SCM_TSTAMP_ACK:
 		if (TCP_SKB_CB(skb)->txstamp_ack)
@@ -5565,7 +5565,8 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
 }
 
 static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
-			      int tstype, bool sw)
+			      int tstype, bool sw,
+			      struct skb_shared_hwtstamps *hwtstamps)
 {
 	int op;
 
@@ -5577,9 +5578,9 @@ static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
 		op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
 		break;
 	case SCM_TSTAMP_SND:
+		op = sw ? BPF_SOCK_OPS_TS_SW_OPT_CB : BPF_SOCK_OPS_TS_HW_OPT_CB;
 		if (!sw)
-			return;
-		op = BPF_SOCK_OPS_TS_SW_OPT_CB;
+			*skb_hwtstamps(skb) = *hwtstamps;
 		break;
 	default:
 		return;
@@ -5602,7 +5603,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 
 	/* bpf extension feature entry */
 	if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
-		skb_tstamp_tx_bpf(orig_skb, sk, tstype, sw);
+		skb_tstamp_tx_bpf(orig_skb, sk, tstype, sw, hwtstamps);
 
 	/* application feature entry */
 	if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
diff --git a/net/dsa/user.c b/net/dsa/user.c
index 291ab1b4acc4..ae715bf0ae75 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -897,7 +897,7 @@ static void dsa_skb_tx_timestamp(struct dsa_user_priv *p,
 {
 	struct dsa_switch *ds = p->dp->ds;
 
-	if (!(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+	if (!(skb_shinfo(skb)->tx_flags & __SKBTX_HW_TSTAMP))
 		return;
 
 	if (!ds->ops->port_txtstamp)
diff --git a/net/socket.c b/net/socket.c
index 262a28b59c7f..70eabb510ce6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -676,7 +676,7 @@ void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags)
 	u8 flags = *tx_flags;
 
 	if (tsflags & SOF_TIMESTAMPING_TX_HARDWARE) {
-		flags |= SKBTX_HW_TSTAMP;
+		flags |= __SKBTX_HW_TSTAMP;
 
 		/* PTP hardware clocks can provide a free running cycle counter
 		 * as a time base for virtual clocks. Tell driver to use the
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9bd1c7c77b17..974b7f61d11f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7033,6 +7033,13 @@ enum {
 					 * to the nic when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_HW_OPT_CB,	/* Called in hardware phase when
+					 * SK_BPF_CB_TX_TIMESTAMPING feature
+					 * is on. At the same time, hwtstamps
+					 * of skb is initialized as the
+					 * timestamp that hardware just
+					 * generates.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.5



* [PATCH bpf-next v7 09/13] net-timestamp: support SCM_TSTAMP_ACK for bpf extension
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (7 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 08/13] net-timestamp: support hw " Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-01-28  8:46 ` [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Handle the ACK timestamp case. Testing the SKBTX_BPF flag alone would
work, but we introduce a new txstamp_ack_bpf bit to avoid cache line
misses in tcp_ack_tstamp(). To be more specific, in most cases normal
flows do not access skb_shinfo because txstamp_ack is zero, so this
function does not show up in the hot spot lists. The new member
txstamp_ack_bpf works the same way.
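
For illustration only (not part of the patch): a minimal userspace
sketch of the two places where the new bit matters. struct cb_flags is a
hypothetical stand-in for the flag byte of struct tcp_skb_cb, and the
helper names are made up for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the flag bits in struct tcp_skb_cb. */
struct cb_flags {
	unsigned int txstamp_ack:1,
		     txstamp_ack_bpf:1,
		     eor:1,
		     has_rxtstamp:1,
		     unused:4;
};

/* Models the early return in tcp_ack_tstamp(): only when one of the
 * two bits is set does the caller go on to read skb_shinfo().
 */
bool ack_needs_shinfo(const struct cb_flags *cb)
{
	return cb->txstamp_ack || cb->txstamp_ack_bpf;
}

/* Models the flag hand-off inside tcp_fragment_tstamp(): both ack
 * bits move from the first fragment to the second one.
 */
void move_ack_flags(struct cb_flags *skb, struct cb_flags *skb2)
{
	skb2->txstamp_ack = skb->txstamp_ack;
	skb2->txstamp_ack_bpf = skb->txstamp_ack_bpf;
	skb->txstamp_ack = 0;
	skb->txstamp_ack_bpf = 0;
}
```

Both bits live in the control block that tcp_ack_tstamp() already reads,
so flows with neither bit set return before skb_shinfo() is touched.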

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/net/tcp.h              | 3 ++-
 include/uapi/linux/bpf.h       | 5 +++++
 net/core/skbuff.c              | 3 +++
 net/ipv4/tcp_input.c           | 3 ++-
 net/ipv4/tcp_output.c          | 5 +++++
 tools/include/uapi/linux/bpf.h | 5 +++++
 6 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 293047694710..88429e422301 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -959,9 +959,10 @@ struct tcp_skb_cb {
 	__u8		sacked;		/* State flags for SACK.	*/
 	__u8		ip_dsfield;	/* IPv4 tos or IPv6 dsfield	*/
 	__u8		txstamp_ack:1,	/* Record TX timestamp for ack? */
+			txstamp_ack_bpf:1,	/* ack timestamp for bpf use */
 			eor:1,		/* Is skb MSG_EOR marked? */
 			has_rxtstamp:1,	/* SKB has a RX timestamp	*/
-			unused:5;
+			unused:4;
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
 		struct {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4c3566f623c2..800122a8abe5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7047,6 +7047,11 @@ enum {
 					 * timestamp that hardware just
 					 * generates.
 					 */
+	BPF_SOCK_OPS_TS_ACK_OPT_CB,	/* Called when all the skbs in the
+					 * same sendmsg call are acked
+					 * when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c769feae5162..33340e0b094f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5582,6 +5582,9 @@ static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
 		if (!sw)
 			*skb_hwtstamps(skb) = *hwtstamps;
 		break;
+	case SCM_TSTAMP_ACK:
+		op = BPF_SOCK_OPS_TS_ACK_OPT_CB;
+		break;
 	default:
 		return;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 62252702929d..c8945f5be31b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3323,7 +3323,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
 	const struct skb_shared_info *shinfo;
 
 	/* Avoid cache line misses to get skb_shinfo() and shinfo->tx_flags */
-	if (likely(!TCP_SKB_CB(skb)->txstamp_ack))
+	if (likely(!TCP_SKB_CB(skb)->txstamp_ack &&
+		   !TCP_SKB_CB(skb)->txstamp_ack_bpf))
 		return;
 
 	shinfo = skb_shinfo(skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 695749807c09..fc84ca669b76 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1556,6 +1556,7 @@ static void tcp_adjust_pcount(struct sock *sk, const struct sk_buff *skb, int de
 static bool tcp_has_tx_tstamp(const struct sk_buff *skb)
 {
 	return TCP_SKB_CB(skb)->txstamp_ack ||
+	       TCP_SKB_CB(skb)->txstamp_ack_bpf ||
 		(skb_shinfo(skb)->tx_flags & SKBTX_ANY_TSTAMP);
 }
 
@@ -1572,7 +1573,9 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, struct sk_buff *skb2)
 		shinfo2->tx_flags |= tsflags;
 		swap(shinfo->tskey, shinfo2->tskey);
 		TCP_SKB_CB(skb2)->txstamp_ack = TCP_SKB_CB(skb)->txstamp_ack;
+		TCP_SKB_CB(skb2)->txstamp_ack_bpf = TCP_SKB_CB(skb)->txstamp_ack_bpf;
 		TCP_SKB_CB(skb)->txstamp_ack = 0;
+		TCP_SKB_CB(skb)->txstamp_ack_bpf = 0;
 	}
 }
 
@@ -3213,6 +3216,8 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb,
 		shinfo->tskey = next_shinfo->tskey;
 		TCP_SKB_CB(skb)->txstamp_ack |=
 			TCP_SKB_CB(next_skb)->txstamp_ack;
+		TCP_SKB_CB(skb)->txstamp_ack_bpf |=
+			TCP_SKB_CB(next_skb)->txstamp_ack_bpf;
 	}
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 974b7f61d11f..06e68d772989 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7040,6 +7040,11 @@ enum {
 					 * timestamp that hardware just
 					 * generates.
 					 */
+	BPF_SOCK_OPS_TS_ACK_OPT_CB,	/* Called when all the skbs in the
+					 * same sendmsg call are acked
+					 * when SK_BPF_CB_TX_TIMESTAMPING
+					 * feature is on.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.5



* [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (8 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 09/13] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-04  1:03   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp() Jason Xing
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Finally make part of the feature work. After this patch, users can
fully use the bpf prog to trace the tx path for TCP.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 net/ipv4/tcp.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..0a41006b10d1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -492,6 +492,15 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
 			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
 	}
+
+	if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
+		struct skb_shared_info *shinfo = skb_shinfo(skb);
+		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+
+		tcb->txstamp_ack_bpf = 1;
+		shinfo->tx_flags |= SKBTX_BPF;
+		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+	}
 }
 
 static bool tcp_stream_is_readable(struct sock *sk, int target)
-- 
2.43.5



* [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (9 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-04  1:16   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Introduce the callback to correlate the tcp_sendmsg timestamp with the
other points, like SND/SW/ACK. We can let bpf trace the beginning of
tcp_sendmsg_locked() and fetch the socket addr, so that in
tcp_tx_timestamp() we can correlate the tskey with the socket addr.
This is accurate since both run under the protection of the socket lock.
More details can be found in the selftest.
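
For illustration only (not part of the patch): a minimal userspace
sketch of the tskey that the callbacks are correlated on. The formula is
the one used in tcp_tx_timestamp() (seq + len - 1), and seq_before() is
written like the kernel's before() helper; the function names here are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* tskey as set in tcp_tx_timestamp(): the sequence number of the
 * last byte of the skb. Unsigned arithmetic wraps like TCP sequence
 * space does.
 */
uint32_t tskey_of(uint32_t seq, uint32_t len)
{
	return seq + len - 1;
}

/* Wrap-safe "seq1 is earlier than seq2", written like before(). */
bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Models the condition in tcp_ack_tstamp(): the ACK point is
 * reported once the byte identified by tskey is below snd_una.
 */
bool last_byte_acked(uint32_t tskey, uint32_t snd_una)
{
	return seq_before(tskey, snd_una);
}
```

For example, a 16-byte write starting at seq 1000 gets tskey 1015, and
the ACK point is reached once snd_una advances to 1016.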

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/uapi/linux/bpf.h       | 7 +++++++
 net/ipv4/tcp.c                 | 1 +
 tools/include/uapi/linux/bpf.h | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 800122a8abe5..accb3b314fff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7052,6 +7052,13 @@ enum {
 					 * when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SND_CB,		/* Called when every sendmsg syscall
+					 * is triggered. For TCP, it stays
+					 * in the last send process to
+					 * correlate with tcp_sendmsg timestamp
+					 * with other timestamping callbacks,
+					 * like SND/SW/ACK.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0a41006b10d1..b2f1fd216df1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -500,6 +500,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 		tcb->txstamp_ack_bpf = 1;
 		shinfo->tx_flags |= SKBTX_BPF;
 		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+		bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB);
 	}
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 06e68d772989..384502996cdd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7045,6 +7045,13 @@ enum {
 					 * when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SND_CB,		/* Called when every sendmsg syscall
+					 * is triggered. For TCP, it stays
+					 * in the last send process to
+					 * correlate with tcp_sendmsg timestamp
+					 * with other timestamping callbacks,
+					 * like SND/SW/ACK.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.43.5



* [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (10 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp() Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-04  1:21   ` Martin KaFai Lau
  2025-01-28  8:46 ` [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
  2025-02-04  2:27 ` [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Martin KaFai Lau
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Introduce the lock, i.e. a cgroup_bpf_enabled(CGROUP_SOCK_OPS) check,
to avoid affecting applications that are not using the timestamping
bpf feature.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 net/ipv4/tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b2f1fd216df1..a2ac57543b6d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -493,7 +493,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
 	}
 
-	if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
+	if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+	    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
 
-- 
2.43.5



* [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (11 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
@ 2025-01-28  8:46 ` Jason Xing
  2025-02-04  2:02   ` Martin KaFai Lau
  2025-02-04  2:27 ` [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Martin KaFai Lau
  13 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-01-28  8:46 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
  Cc: bpf, netdev, Jason Xing

Only check whether we pass those three key points after we enable the
bpf extension for so_timestamping. At each point, we can choose
whether to print the current timestamp.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 .../bpf/prog_tests/so_timestamping.c          |  86 +++++
 .../selftests/bpf/progs/so_timestamping.c     | 299 ++++++++++++++++++
 2 files changed, 385 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
 create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c

diff --git a/tools/testing/selftests/bpf/prog_tests/so_timestamping.c b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
new file mode 100644
index 000000000000..ee7fdc381609
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
@@ -0,0 +1,86 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <net/if.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+
+#include "so_timestamping.skel.h"
+
+#define CG_NAME "/so-timestamping-test"
+
+static const char addr4_str[] = "127.0.0.1";
+static const char addr6_str[] = "::1";
+static struct so_timestamping *skel;
+static int cg_fd;
+
+static void test_tcp(int family)
+{
+	struct so_timestamping__bss *bss = skel->bss;
+	char buf[] = "testing testing";
+	int sfd = -1, cfd = -1;
+	int n;
+
+	memset(bss, 0, sizeof(*bss));
+
+	sfd = start_server(family, SOCK_STREAM,
+			   family == AF_INET6 ? addr6_str : addr4_str, 0, 0);
+	if (!ASSERT_OK_FD(sfd, "start_server"))
+		goto out;
+
+	cfd = connect_to_fd(sfd, 0);
+	if (!ASSERT_OK_FD(cfd, "connect_to_fd_server"))
+		goto out;
+
+	n = write(cfd, buf, sizeof(buf));
+	if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
+		goto out;
+
+	ASSERT_EQ(bss->nr_active, 1, "nr_active");
+	ASSERT_EQ(bss->nr_snd, 2, "nr_snd");
+	ASSERT_EQ(bss->nr_sched, 1, "nr_sched");
+	ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw");
+	ASSERT_EQ(bss->nr_ack, 1, "nr_ack");
+
+out:
+	if (sfd >= 0)
+		close(sfd);
+	if (cfd >= 0)
+		close(cfd);
+}
+
+void test_so_timestamping(void)
+{
+	struct netns_obj *ns;
+
+	cg_fd = test__join_cgroup(CG_NAME);
+	if (cg_fd < 0)
+		return;
+
+	ns = netns_new("so_timestamping_ns", true);
+	if (!ASSERT_OK_PTR(ns, "create ns"))
+		return;
+
+	skel = so_timestamping__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open and load skel"))
+		goto done;
+
+	if (!ASSERT_OK(so_timestamping__attach(skel), "attach skel"))
+		goto done;
+
+	skel->links.skops_sockopt =
+		bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup"))
+		goto done;
+
+	test_tcp(AF_INET6);
+	test_tcp(AF_INET);
+
+done:
+	so_timestamping__destroy(skel);
+	netns_free(ns);
+	close(cg_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/so_timestamping.c b/tools/testing/selftests/bpf/progs/so_timestamping.c
new file mode 100644
index 000000000000..a893859ffe32
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/so_timestamping.c
@@ -0,0 +1,299 @@
+#include "vmlinux.h"
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_kfuncs.h"
+#define BPF_PROG_TEST_TCP_HDR_OPTIONS
+#include "test_tcp_hdr_options.h"
+#include <errno.h>
+
+#define SK_BPF_CB_FLAGS 1009
+#define SK_BPF_CB_TX_TIMESTAMPING 1
+
+int nr_active;
+int nr_snd;
+int nr_passive;
+int nr_sched;
+int nr_txsw;
+int nr_ack;
+
+struct sockopt_test {
+	int opt;
+	int new;
+};
+
+static const struct sockopt_test sol_socket_tests[] = {
+	{ .opt = SK_BPF_CB_FLAGS, .new = SK_BPF_CB_TX_TIMESTAMPING, },
+	{ .opt = 0, },
+};
+
+struct loop_ctx {
+	void *ctx;
+	const struct sock *sk;
+};
+
+struct sk_stg {
+	__u64 sendmsg_ns;	/* record ts when sendmsg is called */
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct sk_stg);
+} sk_stg_map SEC(".maps");
+
+
+struct delay_info {
+	u64 sendmsg_ns;		/* record ts when sendmsg is called */
+	u32 sched_delay;	/* SCHED_OPT_CB - sendmsg_ns */
+	u32 sw_snd_delay;	/* SW_OPT_CB - SCHED_OPT_CB */
+	u32 ack_delay;		/* ACK_OPT_CB - SW_OPT_CB */
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u32);
+	__type(value, struct delay_info);
+	__uint(max_entries, 1024);
+} time_map SEC(".maps");
+
+static u64 delay_tolerance_nsec = 10000000000; /* 10 seconds as an example */
+
+static int bpf_test_sockopt_int(void *ctx, const struct sock *sk,
+				const struct sockopt_test *t,
+				int level)
+{
+	int new, opt, tmp;
+
+	opt = t->opt;
+	new = t->new;
+
+	if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
+		return 1;
+
+	if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) ||
+	    tmp != new)
+		return 1;
+
+	return 0;
+}
+
+static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc)
+{
+	const struct sockopt_test *t;
+
+	if (i >= ARRAY_SIZE(sol_socket_tests))
+		return 1;
+
+	t = &sol_socket_tests[i];
+	if (!t->opt)
+		return 1;
+
+	return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET);
+}
+
+static int bpf_test_sockopt(void *ctx, const struct sock *sk)
+{
+	struct loop_ctx lc = { .ctx = ctx, .sk = sk, };
+	int n;
+
+	n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0);
+	if (n != ARRAY_SIZE(sol_socket_tests))
+		return -1;
+
+	return 0;
+}
+
+static bool bpf_test_access_sockopt(void *ctx)
+{
+	const struct sockopt_test *t;
+	int tmp, ret, i = 0;
+	int level = SOL_SOCKET;
+
+	t = &sol_socket_tests[i];
+
+	for (; t->opt;) {
+		ret = bpf_setsockopt(ctx, level, t->opt, (void *)&t->new, sizeof(t->new));
+		if (ret != -EOPNOTSUPP)
+			return true;
+
+		ret = bpf_getsockopt(ctx, level, t->opt, &tmp, sizeof(tmp));
+		if (ret != -EOPNOTSUPP)
+			return true;
+
+		if (++i >= ARRAY_SIZE(sol_socket_tests))
+			break;
+	}
+
+	return false;
+}
+
+/* Adding a simple test to see if we can get an expected value */
+static bool bpf_test_access_load_hdr_opt(struct bpf_sock_ops *skops)
+{
+	struct tcp_opt reg_opt;
+	int load_flags = 0;
+	int ret;
+
+	reg_opt.kind = TCPOPT_EXP;
+	reg_opt.len = 0;
+	reg_opt.data32 = 0;
+	ret = bpf_load_hdr_opt(skops, &reg_opt, sizeof(reg_opt), load_flags);
+	if (ret != -EOPNOTSUPP)
+		return true;
+
+	return false;
+}
+
+/* Adding a simple test to see if we can get an expected value */
+static bool bpf_test_access_cb_flags_set(struct bpf_sock_ops *skops)
+{
+	int ret;
+
+	ret = bpf_sock_ops_cb_flags_set(skops, 0);
+	if (ret != -EOPNOTSUPP)
+		return true;
+
+	return false;
+}
+
+/* In the timestamping callbacks, the following BPF helpers must not be
+ * called for safety reasons. Return false if they are rejected as expected.
+ */
+static int bpf_test_access_bpf_calls(struct bpf_sock_ops *skops,
+				     const struct sock *sk)
+{
+	if (bpf_test_access_sockopt(skops))
+		return true;
+
+	if (bpf_test_access_load_hdr_opt(skops))
+		return true;
+
+	if (bpf_test_access_cb_flags_set(skops))
+		return true;
+
+	return false;
+}
+
+static bool bpf_test_delay(struct bpf_sock_ops *skops, const struct sock *sk)
+{
+	struct bpf_sock_ops_kern *skops_kern;
+	u64 timestamp = bpf_ktime_get_ns();
+	struct skb_shared_info *shinfo;
+	struct delay_info dinfo = {0};
+	struct delay_info *val;
+	struct sk_buff *skb;
+	struct sk_stg *stg;
+	u64 prior_ts, delay;
+	u32 tskey;
+
+	if (bpf_test_access_bpf_calls(skops, sk))
+		return false;
+
+	skops_kern = bpf_cast_to_kern_ctx(skops);
+	skb = skops_kern->skb;
+	shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
+	tskey = shinfo->tskey;
+	if (!tskey)
+		return false;
+
+	if (skops->op == BPF_SOCK_OPS_TS_SND_CB) {
+		stg = bpf_sk_storage_get(&sk_stg_map, (void *)sk, 0, 0);
+		if (!stg)
+			return false;
+		dinfo.sendmsg_ns = stg->sendmsg_ns;
+		bpf_map_update_elem(&time_map, &tskey, &dinfo, BPF_ANY);
+		goto out;
+	}
+
+	val = bpf_map_lookup_elem(&time_map, &tskey);
+	if (!val)
+		return false;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
+		delay = val->sched_delay = timestamp - val->sendmsg_ns;
+		break;
+	case BPF_SOCK_OPS_TS_SW_OPT_CB:
+		prior_ts = val->sched_delay + val->sendmsg_ns;
+		delay = val->sw_snd_delay = timestamp - prior_ts;
+		break;
+	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
+		prior_ts = val->sw_snd_delay + val->sched_delay + val->sendmsg_ns;
+		delay = val->ack_delay = timestamp - prior_ts;
+		break;
+	}
+
+	if (delay >= delay_tolerance_nsec)
+		return false;
+
+	/* Since it's the last one, remove from the map after latency check */
+	if (skops->op == BPF_SOCK_OPS_TS_ACK_OPT_CB)
+		bpf_map_delete_elem(&time_map, &tskey);
+
+out:
+	return true;
+}
+
+SEC("fentry/tcp_sendmsg_locked")
+int BPF_PROG(trace_tcp_sendmsg_locked, struct sock *sk, struct msghdr *msg, size_t size)
+{
+	u64 timestamp = bpf_ktime_get_ns();
+	u32 flag = sk->sk_bpf_cb_flags;
+	struct sk_stg *stg;
+
+	if (!flag)
+		return 0;
+
+	stg = bpf_sk_storage_get(&sk_stg_map, sk, 0,
+				 BPF_SK_STORAGE_GET_F_CREATE);
+	if (!stg)
+		return 0;
+
+	stg->sendmsg_ns = timestamp;
+	nr_snd += 1;
+	return 0;
+}
+
+SEC("sockops")
+int skops_sockopt(struct bpf_sock_ops *skops)
+{
+	struct bpf_sock *bpf_sk = skops->sk;
+	const struct sock *sk;
+
+	if (!bpf_sk)
+		return 1;
+
+	sk = (struct sock *)bpf_skc_to_tcp_sock(bpf_sk);
+	if (!sk)
+		return 1;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+		nr_active += !bpf_test_sockopt(skops, sk);
+		break;
+	case BPF_SOCK_OPS_TS_SND_CB:
+		if (bpf_test_delay(skops, sk))
+			nr_snd += 1;
+		break;
+	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
+		if (bpf_test_delay(skops, sk))
+			nr_sched += 1;
+		break;
+	case BPF_SOCK_OPS_TS_SW_OPT_CB:
+		if (bpf_test_delay(skops, sk))
+			nr_txsw += 1;
+		break;
+	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
+		if (bpf_test_delay(skops, sk))
+			nr_ack += 1;
+		break;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5
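
The per-stage bookkeeping done by bpf_test_delay() in the selftest above can be
checked outside the kernel. The following is a minimal userspace sketch, not
part of the patch: struct delay_info mirrors the selftest, while enum stage and
record_stage() are hypothetical names introduced only for illustration, and the
timestamps fed to it are made up.

```c
#include <stdint.h>

/* Mirrors struct delay_info from the selftest above. */
struct delay_info {
	uint64_t sendmsg_ns;	/* timestamp taken when sendmsg is called */
	uint32_t sched_delay;	/* SCHED - sendmsg_ns */
	uint32_t sw_snd_delay;	/* SW - SCHED */
	uint32_t ack_delay;	/* ACK - SW */
};

/* Hypothetical stand-ins for the BPF_SOCK_OPS_TS_*_OPT_CB values. */
enum stage { STAGE_SCHED, STAGE_SW, STAGE_ACK };

/*
 * Store the delay of one stage relative to the previous stage and return
 * the stored (32-bit) value. The previous absolute timestamp is rebuilt by
 * adding the earlier deltas to sendmsg_ns, as bpf_test_delay() does.
 */
static uint64_t record_stage(struct delay_info *d, enum stage s, uint64_t now_ns)
{
	uint64_t prior_ts, delay = 0;

	switch (s) {
	case STAGE_SCHED:
		delay = d->sched_delay = (uint32_t)(now_ns - d->sendmsg_ns);
		break;
	case STAGE_SW:
		prior_ts = d->sendmsg_ns + d->sched_delay;
		delay = d->sw_snd_delay = (uint32_t)(now_ns - prior_ts);
		break;
	case STAGE_ACK:
		prior_ts = d->sendmsg_ns + d->sched_delay + d->sw_snd_delay;
		delay = d->ack_delay = (uint32_t)(now_ns - prior_ts);
		break;
	}
	return delay;
}
```

With sendmsg_ns = 1000 and stage timestamps 1500, 1800 and 2600, the three
stored deltas are 500, 300 and 800. Since each delta is kept in a 32-bit field,
a single stage can represent at most about 4.29 seconds.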



* Re: [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
  2025-01-28  8:46 ` [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
@ 2025-02-03 23:14   ` Martin KaFai Lau
  2025-02-04  0:18     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-03 23:14 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> No functional changes here. I add skb_enable_app_tstamp() to test
> if the orig_skb matches the usage of application SO_TIMESTAMPING
> and skb_sw_tstamp_tx() to distinguish the software and hardware

There is no skb_sw_tstamp_tx() in the code. An outdated commit message?

> timestamp when tsflag is SCM_TSTAMP_SND.
> 
> Also, I deliberately distinguish the software and hardware
> SCM_TSTAMP_SND timestamp by passing 'sw' parameter in order to
> avoid such a case where hardware may go wrong and pass a NULL
> hwstamps, which is even though unlikely to happen. If it really
> happens, bpf prog will finally consider it as a software timestamp.
> It will be hardly recognized. Let's make the timestamping part
> more robust.
> 
> After this patch, I will soon add checks about bpf SO_TIMESTAMPING.

This needs to be updated also. BPF does not use the SO_TIMESTAMPING socket option.

> In this way, we can support two modes parallelly.

s/parallelly/in parallel/

> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   include/linux/skbuff.h | 13 +++++++------
>   net/core/dev.c         |  2 +-
>   net/core/skbuff.c      | 32 ++++++++++++++++++++++++++++++--
>   net/ipv4/tcp_input.c   |  3 ++-
>   4 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bb2b751d274a..dfc419281cc9 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -39,6 +39,7 @@
>   #include <net/net_debug.h>
>   #include <net/dropreason-core.h>
>   #include <net/netmem.h>
> +#include <uapi/linux/errqueue.h>
>   
>   /**
>    * DOC: skb checksums
> @@ -4533,18 +4534,18 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>   
>   void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb,
>   		     struct skb_shared_hwtstamps *hwtstamps,
> -		     struct sock *sk, int tstype);
> +		     struct sock *sk, bool sw, int tstype);
>   
>   /**
> - * skb_tstamp_tx - queue clone of skb with send time stamps
> + * skb_tstamp_tx - queue clone of skb with send HARDWARE timestamps
>    * @orig_skb:	the original outgoing packet
>    * @hwtstamps:	hardware time stamps, may be NULL if not available
>    *
>    * If the skb has a socket associated, then this function clones the
>    * skb (thus sharing the actual data and optional structures), stores
> - * the optional hardware time stamping information (if non NULL) or
> - * generates a software time stamp (otherwise), then queues the clone

This line is removed. Does it mean no software timestamp now after this change?

> - * to the error queue of the socket.  Errors are silently ignored.
> + * the optional hardware time stamping information (if non NULL) then
> + * queues the clone to the error queue of the socket.  Errors are
> + * silently ignored.
>    */
>   void skb_tstamp_tx(struct sk_buff *orig_skb,
>   		   struct skb_shared_hwtstamps *hwtstamps);
> @@ -4565,7 +4566,7 @@ static inline void skb_tx_timestamp(struct sk_buff *skb)
>   {
>   	skb_clone_tx_timestamp(skb);
>   	if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
> -		skb_tstamp_tx(skb, NULL);
> +		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SND);
>   }




* Re: [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
  2025-01-28  8:46 ` [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
@ 2025-02-03 23:23   ` Martin KaFai Lau
  2025-02-04  0:19     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-03 23:23 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6042961dfc02..d19d577b996f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5564,6 +5564,24 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
>   	return false;
>   }
>   
> +static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
> +{
> +	int op;
> +
> +	if (!sk)

This check is redundant.

> +		return;
> +
> +	switch (tstype) {
> +	case SCM_TSTAMP_SCHED:
> +		op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	bpf_skops_tx_timestamping(sk, skb, op);
> +}
> +
>   void __skb_tstamp_tx(struct sk_buff *orig_skb,
>   		     const struct sk_buff *ack_skb,
>   		     struct skb_shared_hwtstamps *hwtstamps,
> @@ -5576,6 +5594,11 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>   	if (!sk)

It has been tested here...

>   		return;
>   
> +	/* bpf extension feature entry */
> +	if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
> +		skb_tstamp_tx_bpf(orig_skb, sk, tstype);

...before calling this.

> +
> +	/* application feature entry */
>   	if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
>   		return;


* Re: [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
  2025-02-03 23:14   ` Martin KaFai Lau
@ 2025-02-04  0:18     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  0:18 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 7:14 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > No functional changes here. I add skb_enable_app_tstamp() to test
> > if the orig_skb matches the usage of application SO_TIMESTAMPING
> > and skb_sw_tstamp_tx() to distinguish the software and hardware
>
> There is no skb_sw_tstamp_tx() in the code. An outdated commit message?

Thanks. I'll update it and double check before reposting.

>
> > timestamp when tsflag is SCM_TSTAMP_SND.
> >
> > Also, I deliberately distinguish the software and hardware
> > SCM_TSTAMP_SND timestamp by passing 'sw' parameter in order to
> > avoid such a case where hardware may go wrong and pass a NULL
> > hwstamps, which is even though unlikely to happen. If it really
> > happens, bpf prog will finally consider it as a software timestamp.
> > It will be hardly recognized. Let's make the timestamping part
> > more robust.
> >
> > After this patch, I will soon add checks about bpf SO_TIMESTAMPING.
>
> This needs to be updated also. BPF does not use the SO_TIMESTAMPING socket option.
>
> > In this way, we can support two modes parallelly.
>
> s/parallelly/in parallel/

Will fix it.

>
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   include/linux/skbuff.h | 13 +++++++------
> >   net/core/dev.c         |  2 +-
> >   net/core/skbuff.c      | 32 ++++++++++++++++++++++++++++++--
> >   net/ipv4/tcp_input.c   |  3 ++-
> >   4 files changed, 40 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index bb2b751d274a..dfc419281cc9 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -39,6 +39,7 @@
> >   #include <net/net_debug.h>
> >   #include <net/dropreason-core.h>
> >   #include <net/netmem.h>
> > +#include <uapi/linux/errqueue.h>
> >
> >   /**
> >    * DOC: skb checksums
> > @@ -4533,18 +4534,18 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >
> >   void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb,
> >                    struct skb_shared_hwtstamps *hwtstamps,
> > -                  struct sock *sk, int tstype);
> > +                  struct sock *sk, bool sw, int tstype);
> >
> >   /**
> > - * skb_tstamp_tx - queue clone of skb with send time stamps
> > + * skb_tstamp_tx - queue clone of skb with send HARDWARE timestamps
> >    * @orig_skb:       the original outgoing packet
> >    * @hwtstamps:      hardware time stamps, may be NULL if not available
> >    *
> >    * If the skb has a socket associated, then this function clones the
> >    * skb (thus sharing the actual data and optional structures), stores
> > - * the optional hardware time stamping information (if non NULL) or
> > - * generates a software time stamp (otherwise), then queues the clone
>
> This line is removed. Does it mean no software timestamp now after this change?

Right, the _software_ timestamp path will go through skb_tx_timestamp(), which
now calls __skb_tstamp_tx() directly instead of this skb_tstamp_tx().

>
> > - * to the error queue of the socket.  Errors are silently ignored.
> > + * the optional hardware time stamping information (if non NULL) then
> > + * queues the clone to the error queue of the socket.  Errors are
> > + * silently ignored.
> >    */
> >   void skb_tstamp_tx(struct sk_buff *orig_skb,
> >                  struct skb_shared_hwtstamps *hwtstamps);
> > @@ -4565,7 +4566,7 @@ static inline void skb_tx_timestamp(struct sk_buff *skb)
> >   {
> >       skb_clone_tx_timestamp(skb);
> >       if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
> > -             skb_tstamp_tx(skb, NULL);
> > +             __skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SND);
> >   }
>
>


* Re: [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
  2025-02-03 23:23   ` Martin KaFai Lau
@ 2025-02-04  0:19     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  0:19 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 7:23 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6042961dfc02..d19d577b996f 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5564,6 +5564,24 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
> >       return false;
> >   }
> >
> > +static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
> > +{
> > +     int op;
> > +
> > +     if (!sk)
>
> This check is redundant.

Thanks. Will remove it.

>
> > +             return;
> > +
> > +     switch (tstype) {
> > +     case SCM_TSTAMP_SCHED:
> > +             op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> > +             break;
> > +     default:
> > +             return;
> > +     }
> > +
> > +     bpf_skops_tx_timestamping(sk, skb, op);
> > +}
> > +
> >   void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >                    const struct sk_buff *ack_skb,
> >                    struct skb_shared_hwtstamps *hwtstamps,
> > @@ -5576,6 +5594,11 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >       if (!sk)
>
> It has been tested here...
>
> >               return;
> >
> > +     /* bpf extension feature entry */
> > +     if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
> > +             skb_tstamp_tx_bpf(orig_skb, sk, tstype);
>
> ...before calling this.
>
> > +
> > +     /* application feature entry */
> >       if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
> >               return;


* Re: [PATCH bpf-next v7 08/13] net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
  2025-01-28  8:46 ` [PATCH bpf-next v7 08/13] net-timestamp: support hw " Jason Xing
@ 2025-02-04  0:56   ` Martin KaFai Lau
  2025-02-04  1:13     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  0:56 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> In this patch, we finish the hardware part. Then bpf program can
> fetch the hwstamp from skb directly.
> 
> To avoid changing so many callers using SKBTX_HW_TSTAMP from drivers,
> use this simple modification like this patch does to support printing
> hardware timestamp.
> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   include/linux/skbuff.h         |  4 +++-
>   include/uapi/linux/bpf.h       |  7 +++++++
>   net/core/skbuff.c              | 11 ++++++-----
>   net/dsa/user.c                 |  2 +-
>   net/socket.c                   |  2 +-
>   tools/include/uapi/linux/bpf.h |  7 +++++++
>   6 files changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index de8d3bd311f5..df2d790ae36b 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -471,7 +471,7 @@ struct skb_shared_hwtstamps {
>   /* Definitions for tx_flags in struct skb_shared_info */
>   enum {
>   	/* generate hardware time stamp */
> -	SKBTX_HW_TSTAMP = 1 << 0,
> +	__SKBTX_HW_TSTAMP = 1 << 0,
>   
>   	/* generate software time stamp when queueing packet to NIC */
>   	SKBTX_SW_TSTAMP = 1 << 1,
> @@ -495,6 +495,8 @@ enum {
>   	SKBTX_BPF = 1 << 7,
>   };
>   
> +#define SKBTX_HW_TSTAMP		(__SKBTX_HW_TSTAMP | SKBTX_BPF)
> +
>   #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
>   				 SKBTX_SCHED_TSTAMP | \
>   				 SKBTX_BPF)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6a1083bcf779..4c3566f623c2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7040,6 +7040,13 @@ enum {
>   					 * to the nic when SK_BPF_CB_TX_TIMESTAMPING
>   					 * feature is on.
>   					 */
> +	BPF_SOCK_OPS_TS_HW_OPT_CB,	/* Called in hardware phase when
> +					 * SK_BPF_CB_TX_TIMESTAMPING feature
> +					 * is on. At the same time, hwtstamps
> +					 * of skb is initialized as the
> +					 * timestamp that hardware just
> +					 * generates.
> +					 */
>   };
>   
>   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 288eb9869827..c769feae5162 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5548,7 +5548,7 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
>   		flag = SKBTX_SCHED_TSTAMP;
>   		break;
>   	case SCM_TSTAMP_SND:
> -		flag = sw ? SKBTX_SW_TSTAMP : SKBTX_HW_TSTAMP;
> +		flag = sw ? SKBTX_SW_TSTAMP : __SKBTX_HW_TSTAMP;
>   		break;
>   	case SCM_TSTAMP_ACK:
>   		if (TCP_SKB_CB(skb)->txstamp_ack)
> @@ -5565,7 +5565,8 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
>   }
>   
>   static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
> -			      int tstype, bool sw)
> +			      int tstype, bool sw,
> +			      struct skb_shared_hwtstamps *hwtstamps)
>   {
>   	int op;
>   
> @@ -5577,9 +5578,9 @@ static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
>   		op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
>   		break;
>   	case SCM_TSTAMP_SND:
> +		op = sw ? BPF_SOCK_OPS_TS_SW_OPT_CB : BPF_SOCK_OPS_TS_HW_OPT_CB;
>   		if (!sw)

Patch 5 mentioned hwtstamps could be NULL, so this should be "if (hwtstamps)" here.

> -			return;
> -		op = BPF_SOCK_OPS_TS_SW_OPT_CB;
> +			*skb_hwtstamps(skb) = *hwtstamps;

Otherwise, this will crash.

pw-bot: cr


>   		break;
>   	default:
>   		return;
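
The NULL dereference pointed out above can be illustrated with a small
userspace sketch. It is not the kernel code: struct hwtstamps_sketch is a
simplified stand-in for struct skb_shared_hwtstamps, and copy_hwtstamp() is a
hypothetical helper that only shows the guarded-copy pattern being requested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct skb_shared_hwtstamps. */
struct hwtstamps_sketch {
	int64_t hwtstamp;	/* hardware timestamp in nanoseconds */
};

/*
 * Copy a hardware timestamp into dst only when the caller supplied one.
 * Returns true if a copy happened, false if src was NULL and dst was
 * left untouched.
 */
static bool copy_hwtstamp(struct hwtstamps_sketch *dst,
			  const struct hwtstamps_sketch *src)
{
	if (!src)
		return false;

	*dst = *src;
	return true;
}
```

Testing the pointer itself, rather than a separate flag such as 'sw', keeps the
copy safe even if a caller on the hardware path passes NULL.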


* Re: [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work
  2025-01-28  8:46 ` [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
@ 2025-02-04  1:03   ` Martin KaFai Lau
  2025-02-04  1:15     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  1:03 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> Make partial of the feature work finally. After this, user can

If it is "partial"-ly done, what is still missing?

My understanding is after this patch, the BPF program can fully support the TX 
timestamping in TCP.

> fully use the bpf prog to trace the tx path for TCP type.
> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   net/ipv4/tcp.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0d704bda6c41..0a41006b10d1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -492,6 +492,15 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>   		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>   			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>   	}
> +
> +	if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> +		struct skb_shared_info *shinfo = skb_shinfo(skb);
> +		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
> +
> +		tcb->txstamp_ack_bpf = 1;
> +		shinfo->tx_flags |= SKBTX_BPF;
> +		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> +	}
>   }
>   
>   static bool tcp_stream_is_readable(struct sock *sk, int target)
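
The tskey assignment in the quoted hunk uses the sequence number of the last
payload byte in the skb. A minimal userspace sketch of that arithmetic follows;
last_byte_key() is a hypothetical name and the inputs are made-up values, not
anything taken from the patch.

```c
#include <stdint.h>

/*
 * Key of the last payload byte carried by a segment: the starting
 * sequence number plus the length, minus one. Unsigned 32-bit
 * arithmetic wraps modulo 2^32, as TCP sequence numbers do.
 */
static uint32_t last_byte_key(uint32_t seq, uint32_t len)
{
	return seq + len - 1;
}
```

For example, a 100-byte segment starting at sequence number 1000 gets key 1099,
and a 32-byte segment starting at 0xFFFFFFF0 wraps around to key 0xF.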



* Re: [PATCH bpf-next v7 08/13] net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
  2025-02-04  0:56   ` Martin KaFai Lau
@ 2025-02-04  1:13     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  1:13 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 8:56 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > In this patch, we finish the hardware part. Then bpf program can
> > fetch the hwstamp from skb directly.
> >
> > To avoid changing so many callers using SKBTX_HW_TSTAMP from drivers,
> > use this simple modification like this patch does to support printing
> > hardware timestamp.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   include/linux/skbuff.h         |  4 +++-
> >   include/uapi/linux/bpf.h       |  7 +++++++
> >   net/core/skbuff.c              | 11 ++++++-----
> >   net/dsa/user.c                 |  2 +-
> >   net/socket.c                   |  2 +-
> >   tools/include/uapi/linux/bpf.h |  7 +++++++
> >   6 files changed, 25 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index de8d3bd311f5..df2d790ae36b 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -471,7 +471,7 @@ struct skb_shared_hwtstamps {
> >   /* Definitions for tx_flags in struct skb_shared_info */
> >   enum {
> >       /* generate hardware time stamp */
> > -     SKBTX_HW_TSTAMP = 1 << 0,
> > +     __SKBTX_HW_TSTAMP = 1 << 0,
> >
> >       /* generate software time stamp when queueing packet to NIC */
> >       SKBTX_SW_TSTAMP = 1 << 1,
> > @@ -495,6 +495,8 @@ enum {
> >       SKBTX_BPF = 1 << 7,
> >   };
> >
> > +#define SKBTX_HW_TSTAMP              (__SKBTX_HW_TSTAMP | SKBTX_BPF)
> > +
> >   #define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP    | \
> >                                SKBTX_SCHED_TSTAMP | \
> >                                SKBTX_BPF)
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 6a1083bcf779..4c3566f623c2 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7040,6 +7040,13 @@ enum {
> >                                        * to the nic when SK_BPF_CB_TX_TIMESTAMPING
> >                                        * feature is on.
> >                                        */
> > +     BPF_SOCK_OPS_TS_HW_OPT_CB,      /* Called in hardware phase when
> > +                                      * SK_BPF_CB_TX_TIMESTAMPING feature
> > +                                      * is on. At the same time, hwtstamps
> > +                                      * of skb is initialized as the
> > +                                      * timestamp that hardware just
> > +                                      * generates.
> > +                                      */
> >   };
> >
> >   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 288eb9869827..c769feae5162 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5548,7 +5548,7 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
> >               flag = SKBTX_SCHED_TSTAMP;
> >               break;
> >       case SCM_TSTAMP_SND:
> > -             flag = sw ? SKBTX_SW_TSTAMP : SKBTX_HW_TSTAMP;
> > +             flag = sw ? SKBTX_SW_TSTAMP : __SKBTX_HW_TSTAMP;
> >               break;
> >       case SCM_TSTAMP_ACK:
> >               if (TCP_SKB_CB(skb)->txstamp_ack)
> > @@ -5565,7 +5565,8 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
> >   }
> >
> >   static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
> > -                           int tstype, bool sw)
> > +                           int tstype, bool sw,
> > +                           struct skb_shared_hwtstamps *hwtstamps)
> >   {
> >       int op;
> >
> > @@ -5577,9 +5578,9 @@ static void skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk,
> >               op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> >               break;
> >       case SCM_TSTAMP_SND:
> > +             op = sw ? BPF_SOCK_OPS_TS_SW_OPT_CB : BPF_SOCK_OPS_TS_HW_OPT_CB;
> >               if (!sw)
>
> Patch 5 mentioned hwtstamps could be NULL, so this should be "if (hwtstamps)" here.
>
> > -                     return;
> > -             op = BPF_SOCK_OPS_TS_SW_OPT_CB;
> > +                     *skb_hwtstamps(skb) = *hwtstamps;
>
> Otherwise, this will crash.

Sorry, I tested it in the wrong way... Will fix it soon!

>
> pw-bot: cr
>
>
> >               break;
> >       default:
> >               return;
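
The effect of redefining SKBTX_HW_TSTAMP as a two-bit mask, as in the quoted
skbuff.h hunk, can be sketched in userspace. The bit values below are copied
from that hunk; driver_wants_hwtstamp() and app_wants_hwtstamp() are
hypothetical helpers that stand in for an unmodified driver-side test and for
the application-side test in skb_enable_app_tstamp().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the quoted hunk of include/linux/skbuff.h. */
enum {
	__SKBTX_HW_TSTAMP = 1 << 0,
	SKBTX_SW_TSTAMP   = 1 << 1,
	SKBTX_BPF         = 1 << 7,
};

#define SKBTX_HW_TSTAMP (__SKBTX_HW_TSTAMP | SKBTX_BPF)

/* An unmodified driver-style test: true if either bit of the mask is set. */
static bool driver_wants_hwtstamp(uint8_t tx_flags)
{
	return tx_flags & SKBTX_HW_TSTAMP;
}

/* The application path after the patch: only the original bit counts. */
static bool app_wants_hwtstamp(uint8_t tx_flags)
{
	return tx_flags & __SKBTX_HW_TSTAMP;
}
```

Under these definitions an skb carrying only SKBTX_BPF passes the driver-style
test but not the application test, which is how existing callers can stay
unchanged.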


* Re: [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work
  2025-02-04  1:03   ` Martin KaFai Lau
@ 2025-02-04  1:15     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  1:15 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 9:03 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > Make partial of the feature work finally. After this, user can
>
> If it is "partial"-ly done, what is still missing?
>
> My understanding is after this patch, the BPF program can fully support the TX
> timestamping in TCP.

I'm going to make the change in the next version. My thought was that this is
one part of a bigger project supporting various protocols.

>
> > fully use the bpf prog to trace the tx path for TCP type.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   net/ipv4/tcp.c | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 0d704bda6c41..0a41006b10d1 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -492,6 +492,15 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >               if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >       }
> > +
> > +     if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> > +             struct skb_shared_info *shinfo = skb_shinfo(skb);
> > +             struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
> > +
> > +             tcb->txstamp_ack_bpf = 1;
> > +             shinfo->tx_flags |= SKBTX_BPF;
> > +             shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > +     }
> >   }
> >
> >   static bool tcp_stream_is_readable(struct sock *sk, int target)
>


* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-01-28  8:46 ` [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp() Jason Xing
@ 2025-02-04  1:16   ` Martin KaFai Lau
  2025-02-04  1:25     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  1:16 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> Introduce the callback to correlate tcp_sendmsg timestamp with other
> points, like SND/SW/ACK. We can let bpf trace the beginning of
> tcp_sendmsg_locked() and fetch the socket addr, so that in

Instead of "fetch the socket addr...", this should say "store the sendmsg
timestamp in the bpf_sk_storage ...".

> tcp_tx_timestamp() we can correlate the tskey with the socket addr.


> It is accurate since they are under the protect of socket lock.
> More details can be found in the selftest.

The selftest uses the bpf_sk_storage to store the sendmsg timestamp at 
fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e. 
BPF_SOCK_OPS_TS_SND_CB added in this patch).

> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   include/uapi/linux/bpf.h       | 7 +++++++
>   net/ipv4/tcp.c                 | 1 +
>   tools/include/uapi/linux/bpf.h | 7 +++++++
>   3 files changed, 15 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 800122a8abe5..accb3b314fff 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7052,6 +7052,13 @@ enum {
>   					 * when SK_BPF_CB_TX_TIMESTAMPING
>   					 * feature is on.
>   					 */
> +	BPF_SOCK_OPS_TS_SND_CB,		/* Called when every sendmsg syscall
> +					 * is triggered. For TCP, it stays
> +					 * in the last send process to
> +					 * correlate with tcp_sendmsg timestamp
> +					 * with other timestamping callbacks,
> +					 * like SND/SW/ACK.

Did you have a chance to look at how this will work for UDP?



* Re: [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
  2025-01-28  8:46 ` [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
@ 2025-02-04  1:21   ` Martin KaFai Lau
  2025-02-04  1:25     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  1:21 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> Introducing the lock to avoid affecting the applications which

s/lock/static key/

Unless it needs more static-key guards in the next re-spin, I would squash this 
one liner with patch 10.

> are not using timestamping bpf feature.
> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   net/ipv4/tcp.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index b2f1fd216df1..a2ac57543b6d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -493,7 +493,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>   			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>   	}
>   
> -	if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> +	if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> +	    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
>   		struct skb_shared_info *shinfo = skb_shinfo(skb);
>   		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
>   


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-04  1:16   ` Martin KaFai Lau
@ 2025-02-04  1:25     ` Jason Xing
  2025-02-04 17:08       ` Willem de Bruijn
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-04  1:25 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > tcp_sendmsg_locked() and fetch the socket addr, so that in
>
> Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> the bpf_sk_storage ...".

I will revise it. Thanks.

>
> > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
>
>
> > It is accurate since they are under the protect of socket lock.
> > More details can be found in the selftest.
>
> The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> BPF_SOCK_OPS_TS_SND_CB added in this patch).
>
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   include/uapi/linux/bpf.h       | 7 +++++++
> >   net/ipv4/tcp.c                 | 1 +
> >   tools/include/uapi/linux/bpf.h | 7 +++++++
> >   3 files changed, 15 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 800122a8abe5..accb3b314fff 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7052,6 +7052,13 @@ enum {
> >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> >                                        * feature is on.
> >                                        */
> > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > +                                      * is triggered. For TCP, it stays
> > +                                      * in the last send process to
> > +                                      * correlate with tcp_sendmsg timestamp
> > +                                      * with other timestamping callbacks,
> > +                                      * like SND/SW/ACK.
>
> Did you have a chance to look at how this will work for UDP?

Sure. I don't think it would be useful for UDP. Oddly, I did write a
long paragraph about this, but it apparently disappeared...

I managed to find what I wrote:
    For UDP, BPF_SOCK_OPS_TS_SND_CB may not be suitable because there are
    two send paths, 1) the lockless path and 2) the locked path, which
    need to be handled carefully later. For the former, even though it's
    unlikely that multiple threads call sendmsg on the same socket at the
    same time, I think we'd better not correlate it the way we do in the
    TCP case, because the sock lock protection is missing. Considering
    SND_CB is a uAPI flag, I think we don't need to force a 'TCP_' prefix
    onto it in case we need to use it someday.

    One more thing: I'd like to use the v5[1] method in the next round
    to introduce a new tskey_bpf, which works well for UDP. The new field
    will not conflict with the tskey in the shared info, which is generated
    from sk->sk_tskey in __ip_append_data(). Reusing that tskey hardly
    works if both features (so_timestamping and its bpf extension) exist
    at the same time. Users could get confused because sometimes they fetch
    the tskey from the skb and sometimes they don't, especially when the
    cmsg feature turns it on/off per sendmsg. A standalone tskey for the
    bpf extension will be needed. With this tskey_bpf, we can easily
    correlate the timestamp in the sendmsg syscall with the other tx
    points (SND/SW/ACK...).

    [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/

    If possible, we can leave this question until the UDP support series
    shows up. I will figure out a better solution :)

In conclusion, it probably won't be used for UDP. Since it's a uAPI
flag, I kept the name generic for compatibility reasons.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
  2025-02-04  1:21   ` Martin KaFai Lau
@ 2025-02-04  1:25     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  1:25 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 9:22 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > Introducing the lock to avoid affecting the applications which
>
> s/lock/static key/
>
> Unless it needs more static-key guards in the next re-spin, I would squash this
> one liner with patch 10.

Got it. Will do that. Thanks.

>
> > are not using timestamping bpf feature.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   net/ipv4/tcp.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index b2f1fd216df1..a2ac57543b6d 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -493,7 +493,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >       }
> >
> > -     if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> > +     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> > +         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> >               struct skb_shared_info *shinfo = skb_shinfo(skb);
> >               struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
> >
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature
  2025-01-28  8:46 ` [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
@ 2025-02-04  2:02   ` Martin KaFai Lau
  2025-02-04  5:32     ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  2:02 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> Only check if we pass those three key points after we enable the
> bpf extension for so_timestamping. During each point, we can choose
> whether to print the current timestamp.

The commit message also needs to be updated...

> 
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>   .../bpf/prog_tests/so_timestamping.c          |  86 +++++
>   .../selftests/bpf/progs/so_timestamping.c     | 299 ++++++++++++++++++
>   2 files changed, 385 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
>   create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/so_timestamping.c b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
> new file mode 100644
> index 000000000000..ee7fdc381609
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
> @@ -0,0 +1,86 @@
> +#define _GNU_SOURCE
> +#include <sched.h>
> +#include <linux/socket.h>
> +#include <linux/tls.h>

tls.h?

> +#include <net/if.h>

I suspect most of the above #define and #include are not needed. Please clean up.

> +
> +#include "test_progs.h"
> +#include "cgroup_helpers.h"
> +#include "network_helpers.h"
> +
> +#include "so_timestamping.skel.h"
> +
> +#define CG_NAME "/so-timestamping-test"
> +
> +static const char addr4_str[] = "127.0.0.1";
> +static const char addr6_str[] = "::1";
> +static struct so_timestamping *skel;
> +static int cg_fd;

nit. cg_fd does not need to be global.

> +
> +static void test_tcp(int family)
> +{
> +	struct so_timestamping__bss *bss = skel->bss;
> +	char buf[] = "testing testing";
> +	int sfd = -1, cfd = -1;
> +	int n;
> +
> +	memset(bss, 0, sizeof(*bss));
> +
> +	sfd = start_server(family, SOCK_STREAM,
> +			   family == AF_INET6 ? addr6_str : addr4_str, 0, 0);
> +	if (!ASSERT_OK_FD(sfd, "start_server"))
> +		goto out;
> +
> +	cfd = connect_to_fd(sfd, 0);
> +	if (!ASSERT_OK_FD(cfd, "connect_to_fd_server"))
> +		goto out;
> +
> +	n = write(cfd, buf, sizeof(buf));
> +	if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
> +		goto out;
> +
> +	ASSERT_EQ(bss->nr_active, 1, "nr_active");
> +	ASSERT_EQ(bss->nr_snd, 2, "nr_snd");
> +	ASSERT_EQ(bss->nr_sched, 1, "nr_sched");
> +	ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw");
> +	ASSERT_EQ(bss->nr_ack, 1, "nr_ack");
> +
> +out:
> +	if (sfd >= 0)
> +		close(sfd);
> +	if (cfd >= 0)
> +		close(cfd);
> +}
> +
> +void test_so_timestamping(void)
> +{
> +	struct netns_obj *ns;
> +
> +	cg_fd = test__join_cgroup(CG_NAME);
> +	if (cg_fd < 0)

ASSERT_OK_FD. The existing setget_sockopt test should probably be fixed also but 
that will be a separate patch.

> +		return;
> +
> +	ns = netns_new("so_timestamping_ns", true);
> +	if (!ASSERT_OK_PTR(ns, "create ns"))

cg_fd is leaked.

> +		return;

goto done;

netns_free() and so_timestamping__destroy() can handle NULL.
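
To illustrate the single-exit cleanup suggested above, here is a minimal
userspace sketch. The fake_* names and the live_objects counter are
hypothetical stand-ins invented for this example, not the real selftest
helpers; the only property carried over from the review is that the
destructors tolerate NULL, so every failure path can jump to one label.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the netns and skeleton objects. */
struct fake_ns { int unused; };
struct fake_skel { int unused; };

static int live_objects;	/* resources currently allocated */

static struct fake_ns *fake_ns_new(int fail)
{
	struct fake_ns *ns = fail ? NULL : calloc(1, sizeof(*ns));

	if (ns)
		live_objects++;
	return ns;
}

static void fake_ns_free(struct fake_ns *ns)
{
	if (!ns)	/* NULL is accepted, so callers need no extra checks */
		return;
	free(ns);
	live_objects--;
}

static struct fake_skel *fake_skel_open(int fail)
{
	struct fake_skel *skel = fail ? NULL : calloc(1, sizeof(*skel));

	if (skel)
		live_objects++;
	return skel;
}

static void fake_skel_destroy(struct fake_skel *skel)
{
	if (!skel)
		return;
	free(skel);
	live_objects--;
}

/* Returns 0 on success, -1 on a setup failure; always releases what it took. */
static int run_test(int fail_ns, int fail_skel)
{
	struct fake_ns *ns = NULL;
	struct fake_skel *skel = NULL;
	int ret = -1;

	ns = fake_ns_new(fail_ns);
	if (!ns)
		goto done;

	skel = fake_skel_open(fail_skel);
	if (!skel)
		goto done;

	ret = 0;	/* the actual test body would run here */
done:
	fake_skel_destroy(skel);
	fake_ns_free(ns);
	return ret;
}
```

Because both pointers start out as NULL, an early failure and a late
failure share the same exit path and nothing is leaked in either case.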

> +
> +	skel = so_timestamping__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open and load skel"))
> +		goto done;
> +
> +	if (!ASSERT_OK(so_timestamping__attach(skel), "attach skel"))
> +		goto done;
> +
> +	skel->links.skops_sockopt =
> +		bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd);
> +	if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup"))
> +		goto done;
> +
> +	test_tcp(AF_INET6);
> +	test_tcp(AF_INET);
> +
> +done:
> +	so_timestamping__destroy(skel);
> +	netns_free(ns);
> +	close(cg_fd);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/so_timestamping.c b/tools/testing/selftests/bpf/progs/so_timestamping.c
> new file mode 100644
> index 000000000000..a893859ffe32
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/so_timestamping.c
> @@ -0,0 +1,299 @@
> +#include "vmlinux.h"
> +#include "bpf_tracing_net.h"
> +#include <bpf/bpf_core_read.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include "bpf_misc.h"
> +#include "bpf_kfuncs.h"
> +#define BPF_PROG_TEST_TCP_HDR_OPTIONS
> +#include "test_tcp_hdr_options.h"
> +#include <errno.h>
> +
> +#define SK_BPF_CB_FLAGS 1009
> +#define SK_BPF_CB_TX_TIMESTAMPING 1
> +
> +int nr_active;
> +int nr_snd;
> +int nr_passive;
> +int nr_sched;
> +int nr_txsw;
> +int nr_ack;
> +
> +struct sockopt_test {
> +	int opt;
> +	int new;
> +};
> +
> +static const struct sockopt_test sol_socket_tests[] = {
> +	{ .opt = SK_BPF_CB_FLAGS, .new = SK_BPF_CB_TX_TIMESTAMPING, },
> +	{ .opt = 0, },
> +};
> +
> +struct loop_ctx {
> +	void *ctx;
> +	const struct sock *sk;
> +};
> +
> +struct sk_stg {
> +	__u64 sendmsg_ns;	/* record ts when sendmsg is called */
> +};
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
> +	__uint(map_flags, BPF_F_NO_PREALLOC);
> +	__type(key, int);
> +	__type(value, struct sk_stg);
> +} sk_stg_map SEC(".maps");
> +
> +
> +struct delay_info {
> +	u64 sendmsg_ns;		/* record ts when sendmsg is called */
> +	u32 sched_delay;	/* SCHED_OPT_CB - sendmsg_ns */
> +	u32 sw_snd_delay;	/* SW_OPT_CB - SCHED_OPT_CB */
> +	u32 ack_delay;		/* ACK_OPT_CB - SW_OPT_CB */
> +};
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__type(key, u32);

I just noticed there are two tcp connections in the test. One v4 and one v6.
Unlikely to collide on seqno, still better to add a sk_cookie to the key of the 
map, like:

struct sk_tskey {
	u64 sk_cookie;
	u32 tskey;
};

Use bpf_get_socket_cookie(ctx) to get a unique socket cookie.
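
A minimal userspace sketch of the composite key idea, for illustration
only. BPF hash maps hash and compare keys byte by byte, so the sketch
zeroes the whole key before filling it in. The explicit pad field and
the make_key()/keys_equal() helpers are assumptions of this example and
are not part of the patch; the cookie values in any caller would come
from bpf_get_socket_cookie() in the real program.

```c
#include <stdint.h>
#include <string.h>

/* Composite key: socket cookie plus tskey. */
struct sk_tskey {
	uint64_t sk_cookie;
	uint32_t tskey;
	uint32_t pad;	/* hypothetical explicit padding, always zero */
};

/* Build a key with every byte initialised. */
static struct sk_tskey make_key(uint64_t cookie, uint32_t tskey)
{
	struct sk_tskey key;

	memset(&key, 0, sizeof(key));
	key.sk_cookie = cookie;
	key.tskey = tskey;
	return key;
}

/* Bytewise comparison, mirroring how a hash map matches keys. */
static int keys_equal(const struct sk_tskey *a, const struct sk_tskey *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

With this layout, two sockets that happen to produce the same tskey
still map to different entries because their cookies differ.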

> +	__type(value, struct delay_info);
> +	__uint(max_entries, 1024);
> +} time_map SEC(".maps");
> +
> +static u64 delay_tolerance_nsec = 10000000000; /* 10 second as an example */
> +
> +static int bpf_test_sockopt_int(void *ctx, const struct sock *sk,
> +				const struct sockopt_test *t,
> +				int level)
> +{
> +	int new, opt, tmp;
> +
> +	opt = t->opt;
> +	new = t->new;
> +
> +	if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
> +		return 1;
> +
> +	if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) ||
> +	    tmp != new)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc)
> +{
> +	const struct sockopt_test *t;
> +
> +	if (i >= ARRAY_SIZE(sol_socket_tests))
> +		return 1;
> +
> +	t = &sol_socket_tests[i];
> +	if (!t->opt)
> +		return 1;
> +
> +	return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET);
> +}
> +
> +static int bpf_test_sockopt(void *ctx, const struct sock *sk)
> +{
> +	struct loop_ctx lc = { .ctx = ctx, .sk = sk, };
> +	int n;
> +
> +	n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0);

There is only one SK_BPF_CB_FLAGS optname to test, so no need to bpf_loop. 
Directly do one bpf_setsockopt and one bpf_getsockopt.

We can see later whether this timestamp test should be refactored back into
setget_sockopt.c, if a loop turns out to be needed after adding UDP support.
setget_sockopt.c does use a loop to test many options at once, which is
probably where this piece of code (bpf_loop) was borrowed from.
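
As a rough userspace analogue of "one set, one get", the sketch below
sets a single integer option and reads it back without any loop. It uses
the ordinary setsockopt()/getsockopt() syscalls and SO_KEEPALIVE purely
as an example option; the BPF program would call bpf_setsockopt() and
bpf_getsockopt() with SK_BPF_CB_FLAGS instead. The function names are
made up for this illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Set one integer socket option and read it back; 0 on success, 1 on failure. */
static int set_and_check_sockopt(int fd, int level, int opt, int new_val)
{
	int tmp = 0;
	socklen_t len = sizeof(tmp);

	if (setsockopt(fd, level, opt, &new_val, sizeof(new_val)))
		return 1;

	if (getsockopt(fd, level, opt, &tmp, &len))
		return 1;

	/* Compare as booleans: some systems report a nonzero value other than 1. */
	return !!tmp != !!new_val;
}

/* Run the check once on a fresh TCP socket. */
static int check_keepalive_roundtrip(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int ret;

	if (fd < 0)
		return 1;

	ret = set_and_check_sockopt(fd, SOL_SOCKET, SO_KEEPALIVE, 1);
	close(fd);
	return ret;
}
```

If a second option shows up later, a table-driven loop can be
reintroduced at that point.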

> +	if (n != ARRAY_SIZE(sol_socket_tests))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static bool bpf_test_access_sockopt(void *ctx)
> +{
> +	const struct sockopt_test *t;
> +	int tmp, ret, i = 0;
> +	int level = SOL_SOCKET;
> +
> +	t = &sol_socket_tests[i];
> +
> +	for (; t->opt;) {

Same here. Directly do one bpf_setsockopt and one bpf_getsockopt instead of looping.

> +		ret = bpf_setsockopt(ctx, level, t->opt, (void *)&t->new, sizeof(t->new));
> +		if (ret != -EOPNOTSUPP)
> +			return true;
> +
> +		ret = bpf_getsockopt(ctx, level, t->opt, &tmp, sizeof(tmp));
> +		if (ret != -EOPNOTSUPP)
> +			return true;
> +
> +		if (++i >= ARRAY_SIZE(sol_socket_tests))
> +			break;
> +	}
> +
> +	return false;
> +}
> +
> +/* Adding a simple test to see if we can get an expected value */
> +static bool bpf_test_access_load_hdr_opt(struct bpf_sock_ops *skops)
> +{
> +	struct tcp_opt reg_opt;
> +	int load_flags = 0;
> +	int ret;
> +
> +	reg_opt.kind = TCPOPT_EXP;
> +	reg_opt.len = 0;
> +	reg_opt.data32 = 0;
> +	ret = bpf_load_hdr_opt(skops, &reg_opt, sizeof(reg_opt), load_flags);
> +	if (ret != -EOPNOTSUPP)
> +		return true;
> +
> +	return false;
> +}
> +
> +/* Adding a simple test to see if we can get an expected value */
> +static bool bpf_test_access_cb_flags_set(struct bpf_sock_ops *skops)
> +{
> +	int ret;
> +
> +	ret = bpf_sock_ops_cb_flags_set(skops, 0);
> +	if (ret != -EOPNOTSUPP)
> +		return true;
> +
> +	return false;
> +}
> +
> +/* In the timestamping callbacks, we're not allowed to call the following
> + * BPF CALLs for the safety concern. Return false if expected.
> + */
> +static int bpf_test_access_bpf_calls(struct bpf_sock_ops *skops,

nit. The return value is true/false. Stay with "bool" as the return type.


> +				     const struct sock *sk)
> +{
> +	if (bpf_test_access_sockopt(skops))
> +		return true;
> +
> +	if (bpf_test_access_load_hdr_opt(skops))
> +		return true;
> +
> +	if (bpf_test_access_cb_flags_set(skops))
> +		return true;

Thanks for adding these negative tests.

> +
> +	return false;
> +}
> +

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
  2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (12 preceding siblings ...)
  2025-01-28  8:46 ` [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
@ 2025-02-04  2:27 ` Martin KaFai Lau
  2025-02-04  2:44   ` Jason Xing
  2025-02-04 17:06   ` Willem de Bruijn
  13 siblings, 2 replies; 45+ messages in thread
From: Martin KaFai Lau @ 2025-02-04  2:27 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On 1/28/25 12:46 AM, Jason Xing wrote:
> "Timestamping is key to debugging network stack latency. With
> SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> network issues can be attributed to the kernel." This is extracted
> from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> addressed by Willem de Bruijn at netdevconf 0x17).
> 
> There are a few areas that need optimization with the consideration of
> easier use and less performance impact, which I highlighted and mainly
> discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> uAPI compatibility, extra system call overhead, and the need for
> application modification. I initially managed to solve these issues
> by writing a kernel module that hooks various key functions. However,
> this approach is not suitable for the next kernel release. Therefore,
> a BPF extension was proposed. During recent period, Martin KaFai Lau
> provides invaluable suggestions about BPF along the way. Many thanks
> here!
> 
> In this series, I only support foundamental codes and tx for TCP.

*fundamental*.

Maybe just "only tx time stamping for TCP is supported..."

> This approach mostly relies on existing SO_TIMESTAMPING feature, users
> only needs to pass certain flags through bpf_setsocktopt() to a separate
> tsflags. Please see the last selftest patch in this series.
> 
> After this series, we could step by step implement more advanced
> functions/flags already in SO_TIMESTAMPING feature for bpf extension.

Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13 
should be "selftests/bpf:" instead of "bpf:" in the subject.

Please revisit the commit messages of this patch set to check for outdated 
comments from the earlier revisions. I may have missed some of them.

Overall, it looks close. I will look at your replies later.

Willem, could you also take a look? Thanks.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
  2025-02-04  2:27 ` [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Martin KaFai Lau
@ 2025-02-04  2:44   ` Jason Xing
  2025-02-04 17:11     ` Willem de Bruijn
  2025-02-04 17:06   ` Willem de Bruijn
  1 sibling, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-04  2:44 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > "Timestamping is key to debugging network stack latency. With
> > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > network issues can be attributed to the kernel." This is extracted
> > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > addressed by Willem de Bruijn at netdevconf 0x17).
> >
> > There are a few areas that need optimization with the consideration of
> > easier use and less performance impact, which I highlighted and mainly
> > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > uAPI compatibility, extra system call overhead, and the need for
> > application modification. I initially managed to solve these issues
> > by writing a kernel module that hooks various key functions. However,
> > this approach is not suitable for the next kernel release. Therefore,
> > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > provides invaluable suggestions about BPF along the way. Many thanks
> > here!
> >
> > In this series, I only support foundamental codes and tx for TCP.
>
> *fundamental*.
>
> Maybe just "only tx time stamping for TCP is supported..."
>
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > tsflags. Please see the last selftest patch in this series.
> >
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>
> Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> should be "selftests/bpf:" instead of "bpf:" in the subject.
>
> Please revisit the commit messages of this patch set to check for outdated
> comments from the earlier revisions. I may have missed some of them.

Roger that, sir. Thanks for your help!

>
> Overall, it looks close. I will look at your replies later.
>
> Willem, could you also take a look? Thanks.

Right, some related parts need reviews from netdev experts as well.

Willem, please help me review this when you're available. No rush :)

Thanks,
Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature
  2025-02-04  2:02   ` Martin KaFai Lau
@ 2025-02-04  5:32     ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04  5:32 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Tue, Feb 4, 2025 at 10:02 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > Only check if we pass those three key points after we enable the
> > bpf extension for so_timestamping. During each point, we can choose
> > whether to print the current timestamp.
>
> The commit message also needs to be updated...

I will revise it.

>
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >   .../bpf/prog_tests/so_timestamping.c          |  86 +++++
> >   .../selftests/bpf/progs/so_timestamping.c     | 299 ++++++++++++++++++
> >   2 files changed, 385 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/so_timestamping.c b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
> > new file mode 100644
> > index 000000000000..ee7fdc381609
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
> > @@ -0,0 +1,86 @@
> > +#define _GNU_SOURCE
> > +#include <sched.h>
> > +#include <linux/socket.h>
> > +#include <linux/tls.h>
>
> tls.h?
>
> > +#include <net/if.h>
>
> I suspect most of the above #define and #include are not needed. Please clean up.

I found all of the above code unnecessary.

>
> > +
> > +#include "test_progs.h"
> > +#include "cgroup_helpers.h"
> > +#include "network_helpers.h"
> > +
> > +#include "so_timestamping.skel.h"
> > +
> > +#define CG_NAME "/so-timestamping-test"
> > +
> > +static const char addr4_str[] = "127.0.0.1";
> > +static const char addr6_str[] = "::1";
> > +static struct so_timestamping *skel;
> > +static int cg_fd;
>
> nit. cg_fd does not need to be global.

Got it.

>
> > +
> > +static void test_tcp(int family)
> > +{
> > +     struct so_timestamping__bss *bss = skel->bss;
> > +     char buf[] = "testing testing";
> > +     int sfd = -1, cfd = -1;
> > +     int n;
> > +
> > +     memset(bss, 0, sizeof(*bss));
> > +
> > +     sfd = start_server(family, SOCK_STREAM,
> > +                        family == AF_INET6 ? addr6_str : addr4_str, 0, 0);
> > +     if (!ASSERT_OK_FD(sfd, "start_server"))
> > +             goto out;
> > +
> > +     cfd = connect_to_fd(sfd, 0);
> > +     if (!ASSERT_OK_FD(cfd, "connect_to_fd_server"))
> > +             goto out;
> > +
> > +     n = write(cfd, buf, sizeof(buf));
> > +     if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
> > +             goto out;
> > +
> > +     ASSERT_EQ(bss->nr_active, 1, "nr_active");
> > +     ASSERT_EQ(bss->nr_snd, 2, "nr_snd");
> > +     ASSERT_EQ(bss->nr_sched, 1, "nr_sched");
> > +     ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw");
> > +     ASSERT_EQ(bss->nr_ack, 1, "nr_ack");
> > +
> > +out:
> > +     if (sfd >= 0)
> > +             close(sfd);
> > +     if (cfd >= 0)
> > +             close(cfd);
> > +}
> > +
> > +void test_so_timestamping(void)
> > +{
> > +     struct netns_obj *ns;
> > +
> > +     cg_fd = test__join_cgroup(CG_NAME);
> > +     if (cg_fd < 0)
>
> ASSERT_OK_FD. The existing setget_sockopt test should probably be fixed also but
> that will be a separate patch.

Will fix it. I also just sent a standalone patch following your
suggestion.

>
> > +             return;
> > +
> > +     ns = netns_new("so_timestamping_ns", true);
> > +     if (!ASSERT_OK_PTR(ns, "create ns"))
>
> cg_fd is leaked.
>
> > +             return;
>
> goto done;

Will take care of it.

>
> netns_free() and so_timestamping__destroy() can handle NULL.
>
> > +
> > +     skel = so_timestamping__open_and_load();
> > +     if (!ASSERT_OK_PTR(skel, "open and load skel"))
> > +             goto done;
> > +
> > +     if (!ASSERT_OK(so_timestamping__attach(skel), "attach skel"))
> > +             goto done;
> > +
> > +     skel->links.skops_sockopt =
> > +             bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd);
> > +     if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup"))
> > +             goto done;
> > +
> > +     test_tcp(AF_INET6);
> > +     test_tcp(AF_INET);
> > +
> > +done:
> > +     so_timestamping__destroy(skel);
> > +     netns_free(ns);
> > +     close(cg_fd);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/so_timestamping.c b/tools/testing/selftests/bpf/progs/so_timestamping.c
> > new file mode 100644
> > index 000000000000..a893859ffe32
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/so_timestamping.c
> > @@ -0,0 +1,299 @@
> > +#include "vmlinux.h"
> > +#include "bpf_tracing_net.h"
> > +#include <bpf/bpf_core_read.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include "bpf_misc.h"
> > +#include "bpf_kfuncs.h"
> > +#define BPF_PROG_TEST_TCP_HDR_OPTIONS
> > +#include "test_tcp_hdr_options.h"
> > +#include <errno.h>
> > +
> > +#define SK_BPF_CB_FLAGS 1009
> > +#define SK_BPF_CB_TX_TIMESTAMPING 1
> > +
> > +int nr_active;
> > +int nr_snd;
> > +int nr_passive;
> > +int nr_sched;
> > +int nr_txsw;
> > +int nr_ack;
> > +
> > +struct sockopt_test {
> > +     int opt;
> > +     int new;
> > +};
> > +
> > +static const struct sockopt_test sol_socket_tests[] = {
> > +     { .opt = SK_BPF_CB_FLAGS, .new = SK_BPF_CB_TX_TIMESTAMPING, },
> > +     { .opt = 0, },
> > +};
> > +
> > +struct loop_ctx {
> > +     void *ctx;
> > +     const struct sock *sk;
> > +};
> > +
> > +struct sk_stg {
> > +     __u64 sendmsg_ns;       /* record ts when sendmsg is called */
> > +};
> > +
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_SK_STORAGE);
> > +     __uint(map_flags, BPF_F_NO_PREALLOC);
> > +     __type(key, int);
> > +     __type(value, struct sk_stg);
> > +} sk_stg_map SEC(".maps");
> > +
> > +
> > +struct delay_info {
> > +     u64 sendmsg_ns;         /* record ts when sendmsg is called */
> > +     u32 sched_delay;        /* SCHED_OPT_CB - sendmsg_ns */
> > +     u32 sw_snd_delay;       /* SW_OPT_CB - SCHED_OPT_CB */
> > +     u32 ack_delay;          /* ACK_OPT_CB - SW_OPT_CB */
> > +};
> > +
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_HASH);
> > +     __type(key, u32);
>
> I just noticed there are two tcp connections in the test. One v4 and one v6.
> Unlikely to collide on seqno, still better to add a sk_cookie to the key of the
> map, like:
>
> struct sk_tskey {
>         u64 sk_cookie;
>         u32 tskey;
> };
>
> Use bpf_get_socket_cookie(ctx) to get a unique socket cookie.

I will try :)

>
> > +     __type(value, struct delay_info);
> > +     __uint(max_entries, 1024);
> > +} time_map SEC(".maps");
> > +
> > +static u64 delay_tolerance_nsec = 10000000000; /* 10 second as an example */
> > +
> > +static int bpf_test_sockopt_int(void *ctx, const struct sock *sk,
> > +                             const struct sockopt_test *t,
> > +                             int level)
> > +{
> > +     int new, opt, tmp;
> > +
> > +     opt = t->opt;
> > +     new = t->new;
> > +
> > +     if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
> > +             return 1;
> > +
> > +     if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) ||
> > +         tmp != new)
> > +             return 1;
> > +
> > +     return 0;
> > +}
> > +
> > +static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc)
> > +{
> > +     const struct sockopt_test *t;
> > +
> > +     if (i >= ARRAY_SIZE(sol_socket_tests))
> > +             return 1;
> > +
> > +     t = &sol_socket_tests[i];
> > +     if (!t->opt)
> > +             return 1;
> > +
> > +     return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET);
> > +}
> > +
> > +static int bpf_test_sockopt(void *ctx, const struct sock *sk)
> > +{
> > +     struct loop_ctx lc = { .ctx = ctx, .sk = sk, };
> > +     int n;
> > +
> > +     n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0);
>
> There is only one SK_BPF_CB_FLAGS optname to test, so no need to bpf_loop.
> Directly do one bpf_setsockopt and one bpf_getsockopt.

I expect to add rx support for TCP in the near term, so there will be
a SK_BPF_CB_RX_TIMESTAMPING flag. I would therefore like to keep the loop.

>
> We can see later whether this timestamp test should be refactored back into
> setget_sockopt.c, if a loop turns out to be needed after adding UDP support.
> setget_sockopt.c does use a loop to test many options at once, which is
> probably where this piece of code (bpf_loop) was borrowed from.
>
> > +     if (n != ARRAY_SIZE(sol_socket_tests))
> > +             return -1;
> > +
> > +     return 0;
> > +}
> > +
> > +static bool bpf_test_access_sockopt(void *ctx)
> > +{
> > +     const struct sockopt_test *t;
> > +     int tmp, ret, i = 0;
> > +     int level = SOL_SOCKET;
> > +
> > +     t = &sol_socket_tests[i];
> > +
> > +     for (; t->opt;) {
>
> Same here. Directly do one bpf_setsockopt and one bpf_getsockopt instead of looping.
>
> > +             ret = bpf_setsockopt(ctx, level, t->opt, (void *)&t->new, sizeof(t->new));
> > +             if (ret != -EOPNOTSUPP)
> > +                     return true;
> > +
> > +             ret = bpf_getsockopt(ctx, level, t->opt, &tmp, sizeof(tmp));
> > +             if (ret != -EOPNOTSUPP)
> > +                     return true;
> > +
> > +             if (++i >= ARRAY_SIZE(sol_socket_tests))
> > +                     break;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +/* Adding a simple test to see if we can get an expected value */
> > +static bool bpf_test_access_load_hdr_opt(struct bpf_sock_ops *skops)
> > +{
> > +     struct tcp_opt reg_opt;
> > +     int load_flags = 0;
> > +     int ret;
> > +
> > +     reg_opt.kind = TCPOPT_EXP;
> > +     reg_opt.len = 0;
> > +     reg_opt.data32 = 0;
> > +     ret = bpf_load_hdr_opt(skops, &reg_opt, sizeof(reg_opt), load_flags);
> > +     if (ret != -EOPNOTSUPP)
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> > +/* Adding a simple test to see if we can get an expected value */
> > +static bool bpf_test_access_cb_flags_set(struct bpf_sock_ops *skops)
> > +{
> > +     int ret;
> > +
> > +     ret = bpf_sock_ops_cb_flags_set(skops, 0);
> > +     if (ret != -EOPNOTSUPP)
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> > +/* In the timestamping callbacks, we're not allowed to call the following
> > + * BPF CALLs for the safety concern. Return false if expected.
> > + */
> > +static int bpf_test_access_bpf_calls(struct bpf_sock_ops *skops,
>
> nit. The return value is true/false. Stay with "bool" as the return type.

Oh, my bad.

>
>
> > +                                  const struct sock *sk)
> > +{
> > +     if (bpf_test_access_sockopt(skops))
> > +             return true;
> > +
> > +     if (bpf_test_access_load_hdr_opt(skops))
> > +             return true;
> > +
> > +     if (bpf_test_access_cb_flags_set(skops))
> > +             return true;
>
> Thanks for adding these negative tests.

Thanks for the careful review.

Thanks,
Jason

>
> > +
> > +     return false;
> > +}
> > +

* Re: [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
  2025-02-04  2:27 ` [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Martin KaFai Lau
  2025-02-04  2:44   ` Jason Xing
@ 2025-02-04 17:06   ` Willem de Bruijn
  1 sibling, 0 replies; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-04 17:06 UTC (permalink / raw)
  To: Martin KaFai Lau, Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Martin KaFai Lau wrote:
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > "Timestamping is key to debugging network stack latency. With
> > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > network issues can be attributed to the kernel." This is extracted
> > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > addressed by Willem de Bruijn at netdevconf 0x17).
> > 
> > There are a few areas that need optimization with the consideration of
> > easier use and less performance impact, which I highlighted and mainly
> > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > uAPI compatibility, extra system call overhead, and the need for
> > application modification. I initially managed to solve these issues
> > by writing a kernel module that hooks various key functions. However,
> > this approach is not suitable for the next kernel release. Therefore,
> > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > provides invaluable suggestions about BPF along the way. Many thanks
> > here!
> > 
> > In this series, I only support foundamental codes and tx for TCP.
> 
> *fundamental*.
> 
> May be just "only tx time stamping for TCP is supported..."
> 
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > tsflags. Please see the last selftest patch in this series.
> > 
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> 
> Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13 
> should be "selftests/bpf:" instead of "bpf:" in the subject.
> 
> Please revisit the commit messages of this patch set to check for outdated 
> comments from the earlier revisions. I may have missed some of them.
> 
> Overall, it looks close. I will review at your replies later.
> 
> Willem, could you also take a look? Thanks.

Will do. Traveling, but took a first quick skim. 



* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-04  1:25     ` Jason Xing
@ 2025-02-04 17:08       ` Willem de Bruijn
  2025-02-04 18:09         ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-04 17:08 UTC (permalink / raw)
  To: Jason Xing, Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> >
> > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > the bpf_sk_storage ...".
> 
> I will revise it. Thanks.
> 
> >
> > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> >
> >
> > > It is accurate since they are under the protect of socket lock.
> > > More details can be found in the selftest.
> >
> > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> >
> > >
> > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > ---
> > >   include/uapi/linux/bpf.h       | 7 +++++++
> > >   net/ipv4/tcp.c                 | 1 +
> > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > >   3 files changed, 15 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 800122a8abe5..accb3b314fff 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -7052,6 +7052,13 @@ enum {
> > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > >                                        * feature is on.
> > >                                        */
> > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > +                                      * is triggered. For TCP, it stays
> > > +                                      * in the last send process to
> > > +                                      * correlate with tcp_sendmsg timestamp
> > > +                                      * with other timestamping callbacks,
> > > +                                      * like SND/SW/ACK.
> >
> > Do you have a chance to look at how this will work at UDP?
> 
> Sure, I feel like it could not be useful for UDP. Well, things get
> strange because I did write a long paragraph about this thing which
> apparently disappeared...
> 
> I manage to find what I wrote:
>     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
>     there are two sending process, 1) lockless path, 2) lock path, which
>     should be handled carefully later. For the former, even though it's
>     unlikely multiple threads access the socket to call sendmsg at the
>     same time, I think we'd better not correlate it like what we do to the
>     TCP case because of the lack of sock lock protection. Considering SND_CB is
>     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
>     case we need to use it someday.
> 
>     And one more thing is I'd like to use the v5[1] method in the next round
>     to introduce a new tskey_bpf which is good for UDP type. The new field
>     will not conflict with the tskey in shared info which is generated
>     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
>     (so_timestamping and its bpf extension) exists at the same time. Users
>     could get confused because sometimes they fetch the tskey from skb,
>     sometimes they don't, especially when we have cmsg feature to turn it on/
>     off per sendmsg. A standalone tskey for bpf extension will be needed.
>     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
>     syscall with other tx points(SND/SW/ACK...).
> 
>     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> 
>     If possible, we can leave this question until the UDP support series
>     shows up. I will figure out a better solution :)
> 
> In conclusion, it probably won't be used by the UDP type. It's uAPI
> flag so I consider the compatibility reason.

I don't think this is acceptable. We should aim for an API that can
easily be used across protocols, like SO_TIMESTAMPING. Taking a
timestamp at sendmsg entry is a useful property for all such
protocols.

* Re: [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
  2025-02-04  2:44   ` Jason Xing
@ 2025-02-04 17:11     ` Willem de Bruijn
  2025-02-04 18:12       ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-04 17:11 UTC (permalink / raw)
  To: Jason Xing, Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > "Timestamping is key to debugging network stack latency. With
> > > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > > network issues can be attributed to the kernel." This is extracted
> > > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > > addressed by Willem de Bruijn at netdevconf 0x17).
> > >
> > > There are a few areas that need optimization with the consideration of
> > > easier use and less performance impact, which I highlighted and mainly
> > > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > > uAPI compatibility, extra system call overhead, and the need for
> > > application modification. I initially managed to solve these issues
> > > by writing a kernel module that hooks various key functions. However,
> > > this approach is not suitable for the next kernel release. Therefore,
> > > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > > provides invaluable suggestions about BPF along the way. Many thanks
> > > here!
> > >
> > > In this series, I only support foundamental codes and tx for TCP.
> >
> > *fundamental*.
> >
> > May be just "only tx time stamping for TCP is supported..."
> >
> > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > > tsflags. Please see the last selftest patch in this series.
> > >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >
> > Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> > should be "selftests/bpf:" instead of "bpf:" in the subject.
> >
> > Please revisit the commit messages of this patch set to check for outdated
> > comments from the earlier revisions. I may have missed some of them.
> 
> Roger that, sir. Thanks for your help!
> 
> >
> > Overall, it looks close. I will review at your replies later.
> >
> > Willem, could you also take a look? Thanks.
> 
> Right, some related parts need reviews from netdev experts as well.
> 
> Willem, please help me review this when you're available. No rush :)

I won't have much to add for the BPF side, to be clear.

One small high-level commit message point: as submitting-patches
suggests, use imperative mood: "add X" when the patch introduces a
feature, not "I add". And "caller gets" rather than "we get".

Specific case, with capitalization issue: "we need to Introduce".

I'll respond to a few inline code elements later. Nothing huge.
Also feel free to post the next version and I'll respond to that, if
you prefer.

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-04 17:08       ` Willem de Bruijn
@ 2025-02-04 18:09         ` Jason Xing
  2025-02-05  3:05           ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-04 18:09 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > >
> > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > the bpf_sk_storage ...".
> >
> > I will revise it. Thanks.
> >
> > >
> > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > >
> > >
> > > > It is accurate since they are under the protect of socket lock.
> > > > More details can be found in the selftest.
> > >
> > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > >
> > > >
> > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > ---
> > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > >   net/ipv4/tcp.c                 | 1 +
> > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > >   3 files changed, 15 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index 800122a8abe5..accb3b314fff 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -7052,6 +7052,13 @@ enum {
> > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > >                                        * feature is on.
> > > >                                        */
> > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > +                                      * is triggered. For TCP, it stays
> > > > +                                      * in the last send process to
> > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > +                                      * with other timestamping callbacks,
> > > > +                                      * like SND/SW/ACK.
> > >
> > > Do you have a chance to look at how this will work at UDP?
> >
> > Sure, I feel like it could not be useful for UDP. Well, things get
> > strange because I did write a long paragraph about this thing which
> > apparently disappeared...
> >
> > I manage to find what I wrote:
> >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> >     there are two sending process, 1) lockless path, 2) lock path, which
> >     should be handled carefully later. For the former, even though it's
> >     unlikely multiple threads access the socket to call sendmsg at the
> >     same time, I think we'd better not correlate it like what we do to the
> >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> >     case we need to use it someday.
> >
> >     And one more thing is I'd like to use the v5[1] method in the next round
> >     to introduce a new tskey_bpf which is good for UDP type. The new field
> >     will not conflict with the tskey in shared info which is generated
> >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> >     (so_timestamping and its bpf extension) exists at the same time. Users
> >     could get confused because sometimes they fetch the tskey from skb,
> >     sometimes they don't, especially when we have cmsg feature to turn it on/
> >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> >     syscall with other tx points(SND/SW/ACK...).
> >
> >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> >
> >     If possible, we can leave this question until the UDP support series
> >     shows up. I will figure out a better solution :)
> >
> > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > flag so I consider the compatibility reason.
>
> I don't think this is acceptable. We should aim for an API that can
> easily be used across protocols, like SO_TIMESTAMPING.

After revisiting SO_TIMESTAMPING for UDP, I have adjusted my thinking
as follows:

It's hard to provide a completely uniform interface to users across
TCP, UDP and other protocols, but the cases can be handled one by
one. The main obstacle is how to correlate the timestamp taken in the
sendmsg syscall with the other tx timestamps. Note that with
SO_TIMESTAMPING the sendmsg timestamp is collected in userspace. For
instance, when an skb enters the qdisc, we cannot tell which sendmsg
call it belongs to.

One idea is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate the
sendmsg timestamp with the tskey (in tcp_tx_timestamp()) under the
protection of the socket lock within the syscall, as the current patch
does. For UDP, however, the send path can be lockless. IIUC, there is
a corner case where even SO_TIMESTAMPING can lose track: if multiple
threads send UDP packets on the same socket in parallel, users cannot
tell which tskey matches which sendmsg. If I leave this unlikely case
aside, the UDP case is quite similar to the TCP case.

The scenario for the UDP case is:
1) use fentry bpf to hook udp_sendmsg() and get the timestamp, like
TCP does in this series.
2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
SO_TIMESTAMPING code so that the bpf program can correlate the tskey
with the timestamp.
Note: the tskey in UDP needs to be handled differently, because both
socket timestamping modes must be supported at the same time.
Apart from the tskey handling, it's really similar to TCP.
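
The three steps (timestamp at sendmsg entry, bind it to the tskey in
the SND callback, look it up at a later tx point) can be modeled in
plain userspace C. This is only a sketch of the bookkeeping, not BPF
code: sk_model stands in for bpf_sk_storage, time_map is loosely
modeled on the selftest map of the same name, and every function name
below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 8

/* Stand-in for per-socket storage (bpf_sk_storage in the selftest). */
struct sk_model {
	uint64_t sendmsg_ns;
};

/* One entry per tskey, loosely modeled on the selftest's time_map. */
struct delay_info {
	uint32_t tskey;
	uint64_t sendmsg_ns;
	int used;
};

static struct delay_info time_map[MAX_INFLIGHT];

/* Step 1: sendmsg entry; remember when the syscall started. */
static void on_sendmsg_entry(struct sk_model *sk, uint64_t now_ns)
{
	sk->sendmsg_ns = now_ns;
}

/* Step 2: SND callback; the tskey is known here, so bind it to step 1. */
static int on_ts_snd_cb(const struct sk_model *sk, uint32_t tskey)
{
	for (size_t i = 0; i < MAX_INFLIGHT; i++) {
		if (!time_map[i].used) {
			time_map[i].tskey = tskey;
			time_map[i].sendmsg_ns = sk->sendmsg_ns;
			time_map[i].used = 1;
			return 0;
		}
	}
	return -1; /* no free slot */
}

/* Step 3: a later tx point; returns ns elapsed since sendmsg, or -1. */
static int64_t on_tx_point(uint32_t tskey, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_INFLIGHT; i++) {
		if (time_map[i].used && time_map[i].tskey == tskey) {
			/* The model assumes a monotonic clock. */
			assert(now_ns >= time_map[i].sendmsg_ns);
			return (int64_t)(now_ns - time_map[i].sendmsg_ns);
		}
	}
	return -1; /* unknown tskey */
}
```

The model only holds while steps 1 and 2 run back to back for the same
call, which is what the socket lock guarantees on the TCP path.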

Should I change the name back to BPF_SOCK_OPS_TS_TCP_SND_CB?

Thanks,
Jason

> Taking a
> timestamp at sendmsg entry is a useful property for all such
> protocols.

* Re: [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently
  2025-02-04 17:11     ` Willem de Bruijn
@ 2025-02-04 18:12       ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-04 18:12 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Wed, Feb 5, 2025 at 1:11 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > "Timestamping is key to debugging network stack latency. With
> > > > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > > > network issues can be attributed to the kernel." This is extracted
> > > > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > > > addressed by Willem de Bruijn at netdevconf 0x17).
> > > >
> > > > There are a few areas that need optimization with the consideration of
> > > > easier use and less performance impact, which I highlighted and mainly
> > > > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > > > uAPI compatibility, extra system call overhead, and the need for
> > > > application modification. I initially managed to solve these issues
> > > > by writing a kernel module that hooks various key functions. However,
> > > > this approach is not suitable for the next kernel release. Therefore,
> > > > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > > > provides invaluable suggestions about BPF along the way. Many thanks
> > > > here!
> > > >
> > > > In this series, I only support foundamental codes and tx for TCP.
> > >
> > > *fundamental*.
> > >
> > > May be just "only tx time stamping for TCP is supported..."
> > >
> > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > > > tsflags. Please see the last selftest patch in this series.
> > > >
> > > > After this series, we could step by step implement more advanced
> > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > >
> > > Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> > > should be "selftests/bpf:" instead of "bpf:" in the subject.
> > >
> > > Please revisit the commit messages of this patch set to check for outdated
> > > comments from the earlier revisions. I may have missed some of them.
> >
> > Roger that, sir. Thanks for your help!
> >
> > >
> > > Overall, it looks close. I will review at your replies later.
> > >
> > > Willem, could you also take a look? Thanks.
> >
> > Right, some related parts need reviews from netdev experts as well.
> >
> > Willem, please help me review this when you're available. No rush :)
>
> I won't have much to add for the BPF side, to be clear.
>
> One small high-level commit message point: as submitting-patches
> suggests, use imperative mood: "add X" when the patch introduces a
> feature, not "I add". And "caller gets" rather than "we get".
>
> Specific case, with capitalization issue: "we need to Introduce".

Thanks, I learned something new. I will adjust them.

>
> I'll respond to a few inline code elements later. Nothing huge.
> Also feel free to post the next version and I'll respond to that, if
> you prefer.

I will post v8 soon. Thanks for your time. Enjoy your trip :p

Thanks,
Jason

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-04 18:09         ` Jason Xing
@ 2025-02-05  3:05           ` Jason Xing
  2025-02-05  5:13             ` Jason Xing
  2025-02-05 15:20             ` Willem de Bruijn
  0 siblings, 2 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-05  3:05 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > >
> > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > >
> > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > the bpf_sk_storage ...".
> > >
> > > I will revise it. Thanks.
> > >
> > > >
> > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > >
> > > >
> > > > > It is accurate since they are under the protect of socket lock.
> > > > > More details can be found in the selftest.
> > > >
> > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > >
> > > > >
> > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > ---
> > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > >   net/ipv4/tcp.c                 | 1 +
> > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > >   3 files changed, 15 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > >                                        * feature is on.
> > > > >                                        */
> > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > +                                      * is triggered. For TCP, it stays
> > > > > +                                      * in the last send process to
> > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > +                                      * with other timestamping callbacks,
> > > > > +                                      * like SND/SW/ACK.
> > > >
> > > > Do you have a chance to look at how this will work at UDP?
> > >
> > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > strange because I did write a long paragraph about this thing which
> > > apparently disappeared...
> > >
> > > I manage to find what I wrote:
> > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > >     there are two sending process, 1) lockless path, 2) lock path, which
> > >     should be handled carefully later. For the former, even though it's
> > >     unlikely multiple threads access the socket to call sendmsg at the
> > >     same time, I think we'd better not correlate it like what we do to the
> > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > >     case we need to use it someday.
> > >
> > >     And one more thing is I'd like to use the v5[1] method in the next round
> > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > >     will not conflict with the tskey in shared info which is generated
> > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > >     could get confused because sometimes they fetch the tskey from skb,
> > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > >     syscall with other tx points(SND/SW/ACK...).
> > >
> > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > >
> > >     If possible, we can leave this question until the UDP support series
> > >     shows up. I will figure out a better solution :)
> > >
> > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > flag so I consider the compatibility reason.
> >
> > I don't think this is acceptable. We should aim for an API that can
> > easily be used across protocols, like SO_TIMESTAMPING.
>
> After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> adjusted like below:
>
> It's hard to provide an absolutely uniform interface or usage to users
> for TCP and UDP and even more protocols. Cases can be handled one by
> one. The main obstacle is how we can correlate the timestamp in
> sendmsg syscall with other sending timestamps. It's worth noticing
> that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> userspace. For instance, while skb enters the qdisc, we fail to know
> which skb belongs to which sendmsg.
>
> An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> protection of socket lock + syscall as the current patch does. But for
> UDP, it can be lockless. IIUC, there is a very special case where even
> SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> socket send UDP packets in parallel, then users could be confused
> which tskey matches which sendmsg. IIUC, I will not consider this
> unlikely case, then the UDP case is quite similar to the TCP case.
>
> The scenario for the UDP case is:
> 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> like TCP does in this series.
> 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> with timestamp.
> Note: tskey in UDP will be handled carefully in a different way
> because we should support both modes for socket timestamping at the
> same time.
> It's really similar to TCP regardless of handling tskey.
>

To be more precise, in case you don't have time to read the long
paragraph above: BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
the sendmsg timestamp with the corresponding tskey.

1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.

2. For UDP, we can correlate it in __ip_append_data(), next to the
tskey initialization, assuming multiple threads do not call
ip_make_skb() locklessly. The locked path
(udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
protection, so it can be easily handled. The lockless path
(udp_sendmsg()->ip_make_skb()) can be entered by multiple threads at
the same time, which needs to be handled properly. I prefer to
implement the bpf extension for IPCORK_TS_OPT_ID [1], which should be
a separate topic, I think. This might be the only corner case, IIUC?
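
The lockless corner case can be illustrated with a small deterministic
userspace model. It is only a sketch under assumptions: the
interleaving is fixed by hand instead of using real threads, and the
per-call context is a hypothetical illustration of the idea of
carrying an ID with each sendmsg, not something this series
implements.

```c
#include <assert.h>
#include <stdint.h>

/* Shared per-socket slot, as used when the socket lock serializes senders. */
struct sk_slot {
	uint64_t sendmsg_ns;
};

/* Per-call context: each sendmsg carries its own id and timestamp. */
struct call_ctx {
	uint32_t id;
	uint64_t sendmsg_ns;
};

/* Thread A enters at t=100, thread B at t=200, then A's callback runs. */
static uint64_t a_sees_with_shared_slot(void)
{
	struct sk_slot sk = { 0 };

	sk.sendmsg_ns = 100;	/* A: sendmsg entry */
	sk.sendmsg_ns = 200;	/* B: sendmsg entry overwrites A's value */
	return sk.sendmsg_ns;	/* A: callback reads B's timestamp */
}

/* Same interleaving, but each call keeps its own context. */
static uint64_t a_sees_with_per_call_ctx(void)
{
	struct call_ctx a = { .id = 1, .sendmsg_ns = 100 };
	struct call_ctx b = { .id = 2, .sendmsg_ns = 200 };

	assert(a.id != b.id);	/* ids must be unique per call */
	(void)b;
	return a.sendmsg_ns;	/* A: callback reads its own timestamp */
}
```

With the shared slot, A ends up attributing B's entry time to its own
tskey; with a per-call context the two calls cannot clobber each
other.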

Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
the correlation job.

To be on the safe side, I can rename BPF_SOCK_OPS_TS_SND_CB to
BPF_SOCK_OPS_TS_TCP_SND_CB in case this approach turns out not to be
the best one. What do you think?

[1]
commit 4aecca4c76808f3736056d18ff510df80424bc9f
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Date:   Tue Oct 1 05:57:14 2024 -0700

    net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message

    SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
    timestamps and packets sent via socket. Unfortunately, there is no way
    to reliably predict socket timestamp ID value in case of error returned
    by sendmsg. For UDP sockets it's impossible because of lockless
    nature of UDP transmit, several threads may send packets in parallel. In
    case of RAW sockets MSG_MORE option makes things complicated. More
    details are in the conversation [1].
    This patch adds new control message type to give user-space
    software an opportunity to control the mapping between packets and
    values by providing ID with each sendmsg for UDP sockets.
    The documentation is also added in this patch.

    [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/

Thanks,
Jason


* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-05  3:05           ` Jason Xing
@ 2025-02-05  5:13             ` Jason Xing
  2025-02-05 15:20             ` Willem de Bruijn
  1 sibling, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-05  5:13 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Wed, Feb 5, 2025 at 11:05 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > >
> > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > >
> > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > the bpf_sk_storage ...".
> > > >
> > > > I will revise it. Thanks.
> > > >
> > > > >
> > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > >
> > > > >
> > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > More details can be found in the selftest.
> > > > >
> > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > >
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > ---
> > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > >   3 files changed, 15 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > >                                        * feature is on.
> > > > > >                                        */
> > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > +                                      * in the last send process to
> > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > +                                      * with other timestamping callbacks,
> > > > > > +                                      * like SND/SW/ACK.
> > > > >
> > > > > Do you have a chance to look at how this will work at UDP?
> > > >
> > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > strange because I did write a long paragraph about this thing which
> > > > apparently disappeared...
> > > >
> > > > I manage to find what I wrote:
> > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > >     should be handled carefully later. For the former, even though it's
> > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > >     same time, I think we'd better not correlate it like what we do to the
> > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > >     case we need to use it someday.
> > > >
> > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > >     will not conflict with the tskey in shared info which is generated
> > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > >     could get confused because sometimes they fetch the tskey from skb,
> > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > >     syscall with other tx points(SND/SW/ACK...).
> > > >
> > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > >
> > > >     If possible, we can leave this question until the UDP support series
> > > >     shows up. I will figure out a better solution :)
> > > >
> > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > flag so I consider the compatibility reason.
> > >
> > > I don't think this is acceptable. We should aim for an API that can
> > > easily be used across protocols, like SO_TIMESTAMPING.
> >
> > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > adjusted like below:
> >
> > It's hard to provide an absolutely uniform interface or usage to users
> > for TCP and UDP and even more protocols. Cases can be handled one by
> > one. The main obstacle is how we can correlate the timestamp in
> > sendmsg syscall with other sending timestamps. It's worth noticing
> > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > userspace. For instance, while skb enters the qdisc, we fail to know
> > which skb belongs to which sendmsg.
> >
> > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > protection of socket lock + syscall as the current patch does. But for
> > UDP, it can be lockless. IIUC, there is a very special case where even
> > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > socket send UDP packets in parallel, then users could be confused
> > which tskey matches which sendmsg. IIUC, I will not consider this
> > unlikely case, then the UDP case is quite similar to the TCP case.
> >
> > The scenario for the UDP case is:
> > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > like TCP does in this series.
> > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > with timestamp.
> > Note: tskey in UDP will be handled carefully in a different way
> > because we should support both modes for socket timestamping at the
> > same time.
> > It's really similar to TCP regardless of handling tskey.
> >
>
> To be more precise in case you don't have much time to read the above
> long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> sendmsg timestamp with corresponding tskey.
>
> 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
>
> 2. For UDP, we can correlate in __ip_append_data() along with those
> tskey initialization, assuming there are no multiple threads calling
> locklessly ip_make_skb(). Locked path
> (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> protection, so it can be easily handled. Lockless path
> (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> the same time, which should be handled properly. I prefer to implement
> the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> I think. This might be the only one corner case, IIUC?
>
> Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
> the correlation job.
>
> To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB
> to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the
> best one. What do you think about this?
>
> [1]
> commit 4aecca4c76808f3736056d18ff510df80424bc9f
> Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> Date:   Tue Oct 1 05:57:14 2024 -0700
>
>     net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
>
>     SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>     timestamps and packets sent via socket. Unfortunately, there is no way
>     to reliably predict socket timestamp ID value in case of error returned
>     by sendmsg. For UDP sockets it's impossible because of lockless
>     nature of UDP transmit, several threads may send packets in parallel. In
>     case of RAW sockets MSG_MORE option makes things complicated. More
>     details are in the conversation [1].
>     This patch adds new control message type to give user-space
>     software an opportunity to control the mapping between packets and
>     values by providing ID with each sendmsg for UDP sockets.
>     The documentation is also added in this patch.
>
>     [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>

Oh, I came up with a feasible approach for the UDP protocol:
1. Introduce a field ts_opt_id_bpf, which works like ts_opt_id, to allow
the bpf program to take full control of managing the tskey.
2. Use fentry to hook udp_sendmsg(), and introduce a callback in the
kernel, like BPF_SOCK_OPS_TIMEOUT_INIT, to initialize ts_opt_id_bpf
with the tskey that the bpf prog generates. We can directly use
BPF_SOCK_OPS_TS_SND_CB for this.
3. Modify the SCM_TS_OPT_ID logic to support the bpf extension so that
the newly added field ts_opt_id_bpf can be passed to
skb_shinfo(skb)->tskey in __ip_append_data().

This way, the approach can also be extended to other protocols.
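
The three steps above can be sketched in plain userspace C. This is only
a model of the key selection, assuming the proposed ts_opt_id_bpf field
behaves like a per-send override of the socket-wide counter; cork_model,
sock_model, bpf_set_tskey() and pick_tskey() are hypothetical names, not
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-send state, loosely modelled on the IP cork; names are illustrative. */
struct cork_model {
	bool     has_bpf_id;	/* set when the bpf program supplied a key */
	uint32_t ts_opt_id_bpf;	/* hypothetical field from step 1 */
};

struct sock_model {
	uint32_t sk_tskey;	/* socket-wide counter shared by all senders */
};

/* Step 2: the callback lets the bpf program hand over its own key. */
static void bpf_set_tskey(struct cork_model *cork, uint32_t key)
{
	assert(cork);
	cork->has_bpf_id = true;
	cork->ts_opt_id_bpf = key;
}

/* Step 3: pick the key that would end up in skb_shinfo(skb)->tskey. */
static uint32_t pick_tskey(struct sock_model *sk, const struct cork_model *cork)
{
	assert(sk && cork);
	if (cork->has_bpf_id)
		return cork->ts_opt_id_bpf;	/* per-send, no shared state */
	return sk->sk_tskey++;			/* fallback: shared counter */
}
```

Because the key travels with each send instead of being read from the
shared counter, the order in which concurrent senders reach
pick_tskey() no longer affects which key each send gets.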

Thanks,
Jason


* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-05  3:05           ` Jason Xing
  2025-02-05  5:13             ` Jason Xing
@ 2025-02-05 15:20             ` Willem de Bruijn
  2025-02-05 15:47               ` Jason Xing
  1 sibling, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-05 15:20 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > >
> > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > >
> > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > the bpf_sk_storage ...".
> > > >
> > > > I will revise it. Thanks.
> > > >
> > > > >
> > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > >
> > > > >
> > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > More details can be found in the selftest.
> > > > >
> > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > >
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > ---
> > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > >   3 files changed, 15 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > >                                        * feature is on.
> > > > > >                                        */
> > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > +                                      * in the last send process to
> > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > +                                      * with other timestamping callbacks,
> > > > > > +                                      * like SND/SW/ACK.
> > > > >
> > > > > Do you have a chance to look at how this will work at UDP?
> > > >
> > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > strange because I did write a long paragraph about this thing which
> > > > apparently disappeared...
> > > >
> > > > I manage to find what I wrote:
> > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > >     should be handled carefully later. For the former, even though it's
> > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > >     same time, I think we'd better not correlate it like what we do to the
> > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > >     case we need to use it someday.
> > > >
> > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > >     will not conflict with the tskey in shared info which is generated
> > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > >     could get confused because sometimes they fetch the tskey from skb,
> > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > >     syscall with other tx points(SND/SW/ACK...).
> > > >
> > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > >
> > > >     If possible, we can leave this question until the UDP support series
> > > >     shows up. I will figure out a better solution :)
> > > >
> > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > flag so I consider the compatibility reason.
> > >
> > > I don't think this is acceptable. We should aim for an API that can
> > > easily be used across protocols, like SO_TIMESTAMPING.
> >
> > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > adjusted like below:
> >
> > It's hard to provide an absolutely uniform interface or usage to users
> > for TCP and UDP and even more protocols. Cases can be handled one by
> > one. 

We should try hard. SO_TIMESTAMPING is uniform across protocols.
An interface that is not is just hard to use.

> > The main obstacle is how we can correlate the timestamp in
> > sendmsg syscall with other sending timestamps. It's worth noticing
> > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > userspace. For instance, while skb enters the qdisc, we fail to know
> > which skb belongs to which sendmsg.
> >
> > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > protection of socket lock + syscall as the current patch does. But for
> > UDP, it can be lockless. IIUC, there is a very special case where even
> > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > socket send UDP packets in parallel, then users could be confused
> > which tskey matches which sendmsg.

This is a known issue for lockless datagram sockets.

With SO_TIMESTAMPING, the use of timestamping and of concurrent
sendmsg calls is under the control of the process, so it only shoots
itself in the foot.

With BPF timestamping, a process may confuse a third party admin, so
the situation is slightly different.
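
A small userspace model of that ambiguity, in plain C. It assumes an
observer that keeps one "last sendmsg time" slot per socket and a shared
key counter, with no lock tying the two steps together; sock_model,
send_record, enter_sendmsg() and assign_key() are illustrative names
only.

```c
#include <assert.h>
#include <stdint.h>

struct sock_model {
	uint64_t last_sendmsg_ns;	/* single per-socket slot */
	uint32_t sk_tskey;		/* shared key counter */
};

struct send_record {
	uint32_t tskey;
	uint64_t attributed_ns;		/* sendmsg time the observer attached */
};

/* A sender enters sendmsg: the observer overwrites the per-socket slot. */
static void enter_sendmsg(struct sock_model *sk, uint64_t now_ns)
{
	assert(sk);
	sk->last_sendmsg_ns = now_ns;
}

/* A sender reaches the point where its key is assigned. */
static struct send_record assign_key(struct sock_model *sk)
{
	assert(sk);
	struct send_record r = { sk->sk_tskey++, sk->last_sendmsg_ns };
	return r;
}
```

If thread A calls enter_sendmsg() and thread B calls it before A reaches
assign_key(), both records carry B's entry time, so A's send is
attributed to the wrong sendmsg. Under a socket lock the two steps of
one sender cannot be interleaved with another sender's in this way.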

> > IIUC, I will not consider this
> > unlikely case, then the UDP case is quite similar to the TCP case.
> >
> > The scenario for the UDP case is:
> > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > like TCP does in this series.
> > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > with timestamp.
> > Note: tskey in UDP will be handled carefully in a different way
> > because we should support both modes for socket timestamping at the
> > same time.
> > It's really similar to TCP regardless of handling tskey.
> >
> 
> To be more precise in case you don't have much time to read the above
> long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> sendmsg timestamp with corresponding tskey.
> 
> 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> 
> 2. For UDP, we can correlate in __ip_append_data() along with those
> tskey initialization, assuming there are no multiple threads calling
> locklessly ip_make_skb(). Locked path
> (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> protection, so it can be easily handled. Lockless path
> (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> the same time, which should be handled properly.

Different hook points are fine, as UDP (and RAW) use __ip_append_data
or, more importantly, ip_send_skb, while TCP uses ip_queue_xmit.

As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
and the behavior of that operation, subject to the usual distinction
in protocol behavior (bytestream vs datagram).

> I prefer to implement
> the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> I think. This might be the only one corner case, IIUC?

This sounds like an entirely different topic? Not sure what this is.

> Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
> the correlation job.
> 
> To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB
> to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the
> best one. What do you think about this?
> 
> [1]
> commit 4aecca4c76808f3736056d18ff510df80424bc9f
> Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> Date:   Tue Oct 1 05:57:14 2024 -0700
> 
>     net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
> 
>     SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>     timestamps and packets sent via socket. Unfortunately, there is no way
>     to reliably predict socket timestamp ID value in case of error returned
>     by sendmsg. For UDP sockets it's impossible because of lockless
>     nature of UDP transmit, several threads may send packets in parallel. In
>     case of RAW sockets MSG_MORE option makes things complicated. More
>     details are in the conversation [1].
>     This patch adds new control message type to give user-space
>     software an opportunity to control the mapping between packets and
>     values by providing ID with each sendmsg for UDP sockets.
>     The documentation is also added in this patch.
> 
>     [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> 
> Thanks,
> Jason




* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-05 15:20             ` Willem de Bruijn
@ 2025-02-05 15:47               ` Jason Xing
  2025-02-05 21:02                 ` Willem de Bruijn
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-05 15:47 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > >
> > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > >
> > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > the bpf_sk_storage ...".
> > > > >
> > > > > I will revise it. Thanks.
> > > > >
> > > > > >
> > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > >
> > > > > >
> > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > More details can be found in the selftest.
> > > > > >
> > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > ---
> > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > >   3 files changed, 15 insertions(+)
> > > > > > >
> > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > >                                        * feature is on.
> > > > > > >                                        */
> > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > +                                      * in the last send process to
> > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > +                                      * like SND/SW/ACK.
> > > > > >
> > > > > > Do you have a chance to look at how this will work at UDP?
> > > > >
> > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > strange because I did write a long paragraph about this thing which
> > > > > apparently disappeared...
> > > > >
> > > > > I manage to find what I wrote:
> > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > >     should be handled carefully later. For the former, even though it's
> > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > >     case we need to use it someday.
> > > > >
> > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > >     will not conflict with the tskey in shared info which is generated
> > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > >
> > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > >
> > > > >     If possible, we can leave this question until the UDP support series
> > > > >     shows up. I will figure out a better solution :)
> > > > >
> > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > flag so I consider the compatibility reason.
> > > >
> > > > I don't think this is acceptable. We should aim for an API that can
> > > > easily be used across protocols, like SO_TIMESTAMPING.
> > >
> > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > adjusted like below:
> > >
> > > It's hard to provide an absolutely uniform interface or usage to users
> > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > one.
>
> We should try hard. SO_TIMESTAMPING is uniform across protocols.
> An interface that is not is just hard to use.
>
> > > The main obstacle is how we can correlate the timestamp in
> > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > which skb belongs to which sendmsg.
> > >
> > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > protection of socket lock + syscall as the current patch does. But for
> > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > socket send UDP packets in parallel, then users could be confused
> > > which tskey matches which sendmsg.
>
> This is a known issue for lockless datagram sockets.
>
> With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> sendmsg calls is under control of the process, so it only shoots
> itself in the foot.
>
> With BPF timestamping, a process may confuse a third party admin, so
> the situation is slightly different.

Agreed.

>
> > > IIUC, I will not consider this
> > > unlikely case, then the UDP case is quite similar to the TCP case.
> > >
> > > The scenario for the UDP case is:
> > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > like TCP does in this series.
> > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > with timestamp.
> > > Note: tskey in UDP will be handled carefully in a different way
> > > because we should support both modes for socket timestamping at the
> > > same time.
> > > It's really similar to TCP regardless of handling tskey.
> > >
> >
> > To be more precise in case you don't have much time to read the above
> > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > sendmsg timestamp with corresponding tskey.
> >
> > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> >
> > 2. For UDP, we can correlate in __ip_append_data() along with those
> > tskey initialization, assuming there are no multiple threads calling
> > locklessly ip_make_skb(). Locked path
> > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > protection, so it can be easily handled. Lockless path
> > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > the same time, which should be handled properly.
>
> Different hook points is fine, as UDP (and RAW) uses __ip_append_data

Then this approach (introducing this new flag) is feasible. Sorry that
the long paragraph I wrote last night buried something important.
Because of that, I rephrased the whole idea of how to let UDP work
with this new flag in [patch v8 11/12]. The link is
https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/

> or more importantly ip_send_skb, while TCP uses ip_queue_xmit.

For TCP, we use tcp_tx_timestamp() to complete the mapping between the
sendmsg timestamp and the tskey.

>
> As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> and the behavior of that operation. Subject to the usual distinction
> between protocol behavior (bytestream vs datagram).

I see your point.

>
> > I prefer to implement
> > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > I think. This might be the only one corner case, IIUC?
>
> This sounds like an entirely different topic? Not sure what this is.

Not really a different topic. I mean letting the bpf prog take full
control of setting the tskey; then, with this BPF_SOCK_OPS_TS_SND_CB
flag, we can correlate the sendmsg timestamp with the tskey. So it is
related to the UDP usage. Please take a look at that link to patch
11/12. For TCP, we don't need to care about the value of the tskey,
which is already taken care of by SO_TIMESTAMPING, so it is slightly
different. I'm not sure whether this kind of usage is acceptable?

Thanks,
Jason

>
> > Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
> > the correlation job.
> >
> > To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB
> > to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the
> > best one. What do you think about this?
> >
> > [1]
> > commit 4aecca4c76808f3736056d18ff510df80424bc9f
> > Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> > Date:   Tue Oct 1 05:57:14 2024 -0700
> >
> >     net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
> >
> >     SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> >     timestamps and packets sent via socket. Unfortunately, there is no way
> >     to reliably predict socket timestamp ID value in case of error returned
> >     by sendmsg. For UDP sockets it's impossible because of lockless
> >     nature of UDP transmit, several threads may send packets in parallel. In
> >     case of RAW sockets MSG_MORE option makes things complicated. More
> >     details are in the conversation [1].
> >     This patch adds new control message type to give user-space
> >     software an opportunity to control the mapping between packets and
> >     values by providing ID with each sendmsg for UDP sockets.
> >     The documentation is also added in this patch.
> >
> >     [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >
> > Thanks,
> > Jason
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-05 15:47               ` Jason Xing
@ 2025-02-05 21:02                 ` Willem de Bruijn
  2025-02-06  0:33                   ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-05 21:02 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > >
> > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > >
> > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > the bpf_sk_storage ...".
> > > > > >
> > > > > > I will revise it. Thanks.
> > > > > >
> > > > > > >
> > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > >
> > > > > > >
> > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > More details can be found in the selftest.
> > > > > > >
> > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > ---
> > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > >                                        * feature is on.
> > > > > > > >                                        */
> > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > +                                      * in the last send process to
> > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > >
> > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > >
> > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > strange because I did write a long paragraph about this thing which
> > > > > > apparently disappeared...
> > > > > >
> > > > > > I manage to find what I wrote:
> > > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > > >     should be handled carefully later. For the former, even though it's
> > > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > >     case we need to use it someday.
> > > > > >
> > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > >
> > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > >
> > > > > >     If possible, we can leave this question until the UDP support series
> > > > > >     shows up. I will figure out a better solution :)
> > > > > >
> > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > flag so I consider the compatibility reason.
> > > > >
> > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > >
> > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > adjusted like below:
> > > >
> > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > one.
> >
> > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > An interface that is not is just hard to use.
> >
> > > > The main obstacle is how we can correlate the timestamp in
> > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > which skb belongs to which sendmsg.
> > > >
> > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > protection of socket lock + syscall as the current patch does. But for
> > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > socket send UDP packets in parallel, then users could be confused
> > > > which tskey matches which sendmsg.
> >
> > This is a known issue for lockless datagram sockets.
> >
> > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > sendmsg calls is under control of the process, so it only shoots
> > itself in the foot.
> >
> > With BPF timestamping, a process may confuse a third party admin, so
> > the situation is slightly different.
> 
> Agreed.
> 
> >
> > > > IIUC, I will not consider this
> > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > >
> > > > The scenario for the UDP case is:
> > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > like TCP does in this series.
> > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > with timestamp.
> > > > Note: tskey in UDP will be handled carefully in a different way
> > > > because we should support both modes for socket timestamping at the
> > > > same time.
> > > > It's really similar to TCP regardless of handling tskey.
> > > >
> > >
> > > To be more precise in case you don't have much time to read the above
> > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > sendmsg timestamp with corresponding tskey.
> > >
> > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > >
> > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > tskey initialization, assuming there are no multiple threads calling
> > > locklessly ip_make_skb(). Locked path
> > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > protection, so it can be easily handled. Lockless path
> > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > the same time, which should be handled properly.
> >
> > Different hook points is fine, as UDP (and RAW) uses __ip_append_data
> 
> Then this approach (introducing this new flag) is feasible. Sorry that
> last night I wrote such a long paragraph which buried something
> important. Because of that, I rephrase the whole idea about how to let
> UDP work with this kind of new flag in [patch v8 11/12]. Link is
> https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> 
> > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> 
> For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> timestamp and tskey.
> 
> >
> > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > and the behavior of that operation. Subject to the usual distinction
> > between protocol behavior (bytestream vs datagram).
> 
> I see your point.
> 
> >
> > > I prefer to implement
> > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > I think. This might be the only one corner case, IIUC?
> >
> > This sounds like an entirely different topic? Not sure what this is.
> 
> Not really a different topic. I mean let bpf prog take the whole
> control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB
> flag we can correlate the sendmsg timestamp with tskey. So It has
> something to do with the usage of UDP. Please take a look at that link
> to patch 11/12. For TCP, we don't need to care about the value of
> tskey which has already been taken care of by SO_TIMESTAMPING. So it
> is slightly different. I'm not sure if this kind of usage is
> acceptable?

Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?

BPF will need to set the key for both protocols if SO_TIMESTAMPING is
not enabled, right?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-05 21:02                 ` Willem de Bruijn
@ 2025-02-06  0:33                   ` Jason Xing
  2025-02-06  3:00                     ` Willem de Bruijn
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-06  0:33 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Thu, Feb 6, 2025 at 5:02 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > Jason Xing wrote:
> > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > > >
> > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > > >
> > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > > the bpf_sk_storage ...".
> > > > > > >
> > > > > > > I will revise it. Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > > >
> > > > > > > >
> > > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > > More details can be found in the selftest.
> > > > > > > >
> > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > > ---
> > > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > >                                        * feature is on.
> > > > > > > > >                                        */
> > > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > > +                                      * in the last send process to
> > > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > > >
> > > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > > >
> > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > > strange because I did write a long paragraph about this thing which
> > > > > > > apparently disappeared...
> > > > > > >
> > > > > > > I manage to find what I wrote:
> > > > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > > > >     should be handled carefully later. For the former, even though it's
> > > > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > > >     case we need to use it someday.
> > > > > > >
> > > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > > >
> > > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > > >
> > > > > > >     If possible, we can leave this question until the UDP support series
> > > > > > >     shows up. I will figure out a better solution :)
> > > > > > >
> > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > > flag so I consider the compatibility reason.
> > > > > >
> > > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > > >
> > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > > adjusted like below:
> > > > >
> > > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > > one.
> > >
> > > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > > An interface that is not is just hard to use.
> > >
> > > > > The main obstacle is how we can correlate the timestamp in
> > > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > > which skb belongs to which sendmsg.
> > > > >
> > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > > protection of socket lock + syscall as the current patch does. But for
> > > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > > socket send UDP packets in parallel, then users could be confused
> > > > > which tskey matches which sendmsg.
> > >
> > > This is a known issue for lockless datagram sockets.
> > >
> > > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > > sendmsg calls is under control of the process, so it only shoots
> > > itself in the foot.
> > >
> > > With BPF timestamping, a process may confuse a third party admin, so
> > > the situation is slightly different.
> >
> > Agreed.
> >
> > >
> > > > > IIUC, I will not consider this
> > > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > > >
> > > > > The scenario for the UDP case is:
> > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > > like TCP does in this series.
> > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > > with timestamp.
> > > > > Note: tskey in UDP will be handled carefully in a different way
> > > > > because we should support both modes for socket timestamping at the
> > > > > same time.
> > > > > It's really similar to TCP regardless of handling tskey.
> > > > >
> > > >
> > > > To be more precise in case you don't have much time to read the above
> > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > > sendmsg timestamp with corresponding tskey.
> > > >
> > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > > >
> > > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > > tskey initialization, assuming there are no multiple threads calling
> > > > locklessly ip_make_skb(). Locked path
> > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > > protection, so it can be easily handled. Lockless path
> > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > > the same time, which should be handled properly.
> > >
> > > Different hook points is fine, as UDP (and RAW) uses __ip_append_data
> >
> > Then this approach (introducing this new flag) is feasible. Sorry that
> > last night I wrote such a long paragraph which buried something
> > important. Because of that, I rephrase the whole idea about how to let
> > UDP work with this kind of new flag in [patch v8 11/12]. Link is
> > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> >
> > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> >
> > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> > timestamp and tskey.
> >
> > >
> > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > > and the behavior of that operation. Subject to the usual distinction
> > > between protocol behavior (bytestream vs datagram).
> >
> > I see your point.
> >
> > >
> > > > I prefer to implement
> > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > > I think. This might be the only one corner case, IIUC?
> > >
> > > This sounds like an entirely different topic? Not sure what this is.
> >
> > Not really a different topic. I mean let bpf prog take the whole
> > control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB
> > flag we can correlate the sendmsg timestamp with tskey. So It has
> > something to do with the usage of UDP. Please take a look at that link
> > to patch 11/12. For TCP, we don't need to care about the value of
> > tskey which has already been taken care of by SO_TIMESTAMPING. So it
> > is slightly different. I'm not sure if this kind of usage is
> > acceptable?
>
> Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?

Because for TCP the shared info tskey is calculated from the seqno (in
tcp_tx_timestamp()), so it is the same for so_timestamping and its bpf
extension and works for both. However, for UDP, the shared info tskey
can differ, depending on when __ip_append_data() is called and what
sk->sk_tskey is at that time. This can cause conflicts when the two
modes work at the same time. Moreover, the lockless UDP case is a tough
one, since we cannot correlate the sendmsg timestamp in udp_sendmsg()
with the tskey generated in __ip_append_data(): there is a long gap
between them without any lock. So reusing the IPCORK_TS_OPT_ID logic
for the bpf extension can work here.
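
A rough userspace model of the difference, under simplifying
assumptions: udp_sk_model, tskey_from_counter() and tskey_from_caller()
are hypothetical names, and the plain counter stands in for
sk->sk_tskey (no atomics, no real threads). It only shows why a key
taken from a shared counter depends on the order in which senders reach
the append step, while a key supplied with the send call does not.

```c
#include <stdint.h>

/* Hypothetical model of a UDP socket's shared timestamp key counter. */
struct udp_sk_model {
	uint32_t sk_tskey;
};

/*
 * Counter mode: the key is whatever the shared counter holds when the
 * datagram is appended, so it depends on the interleaving of senders.
 */
static inline uint32_t tskey_from_counter(struct udp_sk_model *sk)
{
	return sk->sk_tskey++;
}

/*
 * ID mode (the IPCORK_TS_OPT_ID idea): the key travels with the send
 * call, so it is fixed before any lockless gap is crossed.
 */
static inline uint32_t tskey_from_caller(uint32_t caller_id)
{
	return caller_id;
}
```

If sender A takes its sendmsg timestamp first but sender B appends
first, counter mode hands B the lower key, whereas in ID mode each
sender keeps the key it chose.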

It's worth highlighting that 1) for TCP, the sendmsg timestamp and the
tskey are generated under the same lock, and 2) for non-lockless UDP,
it works like TCP. But in order to deal with the lockless part, I chose
to implement the IPCORK_TS_OPT_ID idea. This is somewhat another topic,
but the relevant part is that I tried to show that
BPF_SOCK_OPS_TS_SND_CB can also be used for UDP (but at a different
position compared to TCP).

>
> BPF will need to set the key for both protocol if SO_TIMESTAMPING is
> not enabled, right?

For TCP, we rely on 'shinfo->tskey = TCP_SKB_CB(skb)->seq +
skb->len - 1;' in tcp_tx_timestamp() regardless of the status of
SO_TIMESTAMPING, so the bpf prog does not need to manage the tskey.
For UDP, however, we have to manage the tskey. Of course, that's
another topic.
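
As a worked example of that expression, here is a small standalone
sketch. tcp_tskey() is a hypothetical helper that only mirrors the
arithmetic quoted above; seq stands for TCP_SKB_CB(skb)->seq and len
for skb->len, both treated as 32-bit unsigned values.

```c
#include <stdint.h>

/*
 * Mirrors 'seq + len - 1': the key is the sequence number of the last
 * byte carried by the skb. Unsigned arithmetic wraps modulo 2^32,
 * just like TCP sequence numbers do.
 */
static inline uint32_t tcp_tskey(uint32_t seq, uint32_t len)
{
	return seq + len - 1;
}
```

For instance, an skb starting at seq 1000 with 100 bytes gets key 1099,
and one starting at 0xFFFFFFF0 with 0x20 bytes wraps to 0xF. Since the
value depends only on the byte stream position, it is the same whether
SO_TIMESTAMPING, the bpf extension, or both are enabled.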

Thanks,
Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-06  0:33                   ` Jason Xing
@ 2025-02-06  3:00                     ` Willem de Bruijn
  2025-02-06  4:03                       ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-06  3:00 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Thu, Feb 6, 2025 at 5:02 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > >
> > > > > > > Jason Xing wrote:
> > > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > > > >
> > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > > > >
> > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > > > the bpf_sk_storage ...".
> > > > > > > >
> > > > > > > > I will revise it. Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > > > More details can be found in the selftest.
> > > > > > > > >
> > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > > > ---
> > > > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > > >                                        * feature is on.
> > > > > > > > > >                                        */
> > > > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > > > +                                      * in the last send process to
> > > > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > > > >
> > > > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > > > >
> > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > > > strange because I did write a long paragraph about this thing which
> > > > > > > > apparently disappeared...
> > > > > > > >
> > > > > > > > I manage to find what I wrote:
> > > > > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > > > > >     should be handled carefully later. For the former, even though it's
> > > > > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > > > >     case we need to use it someday.
> > > > > > > >
> > > > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > > > >
> > > > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > > > >
> > > > > > > >     If possible, we can leave this question until the UDP support series
> > > > > > > >     shows up. I will figure out a better solution :)
> > > > > > > >
> > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > > > flag so I consider the compatibility reason.
> > > > > > >
> > > > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > > > >
> > > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > > > adjusted like below:
> > > > > >
> > > > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > > > one.
> > > >
> > > > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > > > An interface that is not is just hard to use.
> > > >
> > > > > > The main obstacle is how we can correlate the timestamp in
> > > > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > > > which skb belongs to which sendmsg.
> > > > > >
> > > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > > > protection of socket lock + syscall as the current patch does. But for
> > > > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > > > socket send UDP packets in parallel, then users could be confused
> > > > > > which tskey matches which sendmsg.
> > > >
> > > > This is a known issue for lockless datagram sockets.
> > > >
> > > > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > > > sendmsg calls is under control of the process, so it only shoots
> > > > itself in the foot.
> > > >
> > > > With BPF timestamping, a process may confuse a third party admin, so
> > > > the situation is slightly different.
> > >
> > > Agreed.
> > >
> > > >
> > > > > > IIUC, I will not consider this
> > > > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > > > >
> > > > > > The scenario for the UDP case is:
> > > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > > > like TCP does in this series.
> > > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > > > with timestamp.
> > > > > > Note: tskey in UDP will be handled carefully in a different way
> > > > > > because we should support both modes for socket timestamping at the
> > > > > > same time.
> > > > > > It's really similar to TCP regardless of handling tskey.
> > > > > >
> > > > >
> > > > > To be more precise in case you don't have much time to read the above
> > > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > > > sendmsg timestamp with corresponding tskey.
> > > > >
> > > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > > > >
> > > > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > > > tskey initialization, assuming there are no multiple threads calling
> > > > > locklessly ip_make_skb(). Locked path
> > > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > > > protection, so it can be easily handled. Lockless path
> > > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > > > the same time, which should be handled properly.
> > > >
> > > > Different hook points is fine, as UDP (and RAW) uses __ip_append_data
> > >
> > > Then this approach (introducing this new flag) is feasible. Sorry that
> > > last night I wrote such a long paragraph which buried something
> > > important. Because of that, I rephrase the whole idea about how to let
> > > UDP work with this kind of new flag in [patch v8 11/12]. Link is
> > > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> > >
> > > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> > >
> > > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> > > timestamp and tskey.
> > >
> > > >
> > > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > > > and the behavior of that operation. Subject to the usual distinction
> > > > between protocol behavior (bytestream vs datagram).
> > >
> > > I see your point.
> > >
> > > >
> > > > > I prefer to implement
> > > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > > > I think. This might be the only one corner case, IIUC?
> > > >
> > > > This sounds like an entirely different topic? Not sure what this is.
> > >
> > > Not really a different topic. I mean let bpf prog take the whole
> > > control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB
> > > flag we can correlate the sendmsg timestamp with tskey. So It has
> > > something to do with the usage of UDP. Please take a look at that link
> > > to patch 11/12. For TCP, we don't need to care about the value of
> > > tskey which has already been taken care of by SO_TIMESTAMPING. So it
> > > is slightly different. I'm not sure if this kind of usage is
> > > acceptable?
> >
> > Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?
> 
> Because for TCP the shared info tskey is calculated by seqno (in
> tcp_tx_timestamp()), so it works for so_timestamping and its bpf
> extension and they are the same. However, for UDP, the shared info
> tskey can be different, depending on when to call __ip_append_data()
> and what the sk->sk_tskey is. It can cause conflicts when two modes
> work at the same time. 

Lockless and locked cannot conflict. (If up->pending is set, the only
option is to append to that.)

> More than that, lockless UDP case is a tough
> one since we cannot correlate the sendmsg timestamp in udp_sendmsg()
> with the tskey generated in __ip_append_data(), 

With SO_TIMESTAMPING we do not have this distinction between TCP and
UDP, so we don't need it here.

It is true that multiple lockless sendmsg calls can race and in that
case that correlation is ambiguous. That is also the case for
SO_TIMESTAMPING and a known issue.

This is unlikely in most workloads in practice.
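
To make the ambiguity concrete, here is a minimal userspace sketch, not
kernel code. It assumes a single per-socket slot for the sendmsg
timestamp and a per-socket key counter; struct sock_sim,
sim_sendmsg_enter() and sim_assign_tskey() are made-up names for
illustration only. Two sends that run back to back pair up correctly,
while two sends whose steps interleave both pick up the later timestamp.

```c
#include <stdint.h>

/* Hypothetical per-socket state; these are not real kernel fields. */
struct sock_sim {
	uint64_t last_sendmsg_ts; /* slot written at sendmsg entry */
	uint32_t next_tskey;      /* key counter consumed later in the send path */
};

/* One (tskey, sendmsg timestamp) pair as a tracer would record it. */
struct corr {
	uint32_t tskey;
	uint64_t sendmsg_ts;
};

/* Step 1 of a send: record the entry timestamp (the fentry hook's job). */
static void sim_sendmsg_enter(struct sock_sim *sk, uint64_t now)
{
	sk->last_sendmsg_ts = now;
}

/* Step 2 of a send: allocate a key and pair it with the stored timestamp. */
static struct corr sim_assign_tskey(struct sock_sim *sk)
{
	struct corr c;

	c.tskey = sk->next_tskey++;
	c.sendmsg_ts = sk->last_sendmsg_ts;
	return c;
}
```

With enter(100), assign, enter(200), assign the pairs are (0, 100) and
(1, 200). With enter(300), enter(400), assign, assign the pairs are
(2, 400) and (3, 400), so key 2 is attributed to the wrong sendmsg.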

> which is a long gap
> without any lock. So reuse the IPCORK_TS_OPT_ID logic for bpf
> extension here can work.

It is fine to add a solution to work around the ambiguity, but not
to make it a precondition and thereby diverge the API for TCP and UDP.

The same argument to choose a key from BPF can be made for TCP to
a certain extent.

> It's worth to highlight that 1) for TCP the time to generate a sendmsg
> timestamp and generate a tskey are under the same lock, 2) for
> non-lockless UDP, it works like TCP. But in order to deal with the
> lockless part, I choose to implement the IPCORK_TS_OPT_ID idea. This
> is somehow another topic, but the relevant part is that I tried to
> prove BPF_SOCK_OPS_TS_SND_CB can be also used in UDP (but in a
> different position compared to TCP protocol).
> 
> >
> > BPF will need to set the key for both protocol if SO_TIMESTAMPING is
> > not enabled, right?
> 
> For TCP, we will rely on 'shinfo->tskey = TCP_SKB_CB(skb)->seq +
> skb->len - 1;' in tcp_tx_timestamp() regardless of the status of
> SO_TIMESTAMPING, so don't bother to manage the tskey from bpf prog.
> While for UDP, we have to manage the tskey. Of course, it's another
> topic.
> 
> Thanks,
> Jason




* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-06  3:00                     ` Willem de Bruijn
@ 2025-02-06  4:03                       ` Jason Xing
  2025-02-06 16:22                         ` Willem de Bruijn
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Xing @ 2025-02-06  4:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Thu, Feb 6, 2025 at 11:00 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Thu, Feb 6, 2025 at 5:02 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Jason Xing wrote:
> > > > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > > > > >
> > > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > > > > >
> > > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > > > > the bpf_sk_storage ...".
> > > > > > > > >
> > > > > > > > > I will revise it. Thanks.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > > > > More details can be found in the selftest.
> > > > > > > > > >
> > > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > > > > ---
> > > > > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > > > >                                        * feature is on.
> > > > > > > > > > >                                        */
> > > > > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > > > > +                                      * in the last send process to
> > > > > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > > > > >
> > > > > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > > > > >
> > > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > > > > strange because I did write a long paragraph about this thing which
> > > > > > > > > apparently disappeared...
> > > > > > > > >
> > > > > > > > > I manage to find what I wrote:
> > > > > > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > > > > > >     should be handled carefully later. For the former, even though it's
> > > > > > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > > > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > > > > >     case we need to use it someday.
> > > > > > > > >
> > > > > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > > > > >
> > > > > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > > > > >
> > > > > > > > >     If possible, we can leave this question until the UDP support series
> > > > > > > > >     shows up. I will figure out a better solution :)
> > > > > > > > >
> > > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > > > > flag so I consider the compatibility reason.
> > > > > > > >
> > > > > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > > > > >
> > > > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > > > > adjusted like below:
> > > > > > >
> > > > > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > > > > one.
> > > > >
> > > > > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > > > > An interface that is not is just hard to use.
> > > > >
> > > > > > > The main obstacle is how we can correlate the timestamp in
> > > > > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > > > > which skb belongs to which sendmsg.
> > > > > > >
> > > > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > > > > protection of socket lock + syscall as the current patch does. But for
> > > > > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > > > > socket send UDP packets in parallel, then users could be confused
> > > > > > > which tskey matches which sendmsg.
> > > > >
> > > > > This is a known issue for lockless datagram sockets.
> > > > >
> > > > > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > > > > sendmsg calls is under control of the process, so it only shoots
> > > > > itself in the foot.
> > > > >
> > > > > With BPF timestamping, a process may confuse a third party admin, so
> > > > > the situation is slightly different.
> > > >
> > > > Agreed.
> > > >
> > > > >
> > > > > > > IIUC, I will not consider this
> > > > > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > > > > >
> > > > > > > The scenario for the UDP case is:
> > > > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > > > > like TCP does in this series.
> > > > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > > > > with timestamp.
> > > > > > > Note: tskey in UDP will be handled carefully in a different way
> > > > > > > because we should support both modes for socket timestamping at the
> > > > > > > same time.
> > > > > > > It's really similar to TCP regardless of handling tskey.
> > > > > > >
> > > > > >
> > > > > > To be more precise in case you don't have much time to read the above
> > > > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > > > > sendmsg timestamp with corresponding tskey.
> > > > > >
> > > > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > > > > >
> > > > > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > > > > tskey initialization, assuming there are no multiple threads calling
> > > > > > locklessly ip_make_skb(). Locked path
> > > > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > > > > protection, so it can be easily handled. Lockless path
> > > > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > > > > the same time, which should be handled properly.
> > > > >
> > > > > Different hook points is fine, as UDP (and RAW) uses __ip_append_data
> > > >
> > > > Then this approach (introducing this new flag) is feasible. Sorry that
> > > > last night I wrote such a long paragraph which buried something
> > > > important. Because of that, I rephrase the whole idea about how to let
> > > > UDP work with this kind of new flag in [patch v8 11/12]. Link is
> > > > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> > > >
> > > > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> > > >
> > > > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> > > > timestamp and tskey.
> > > >
> > > > >
> > > > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > > > > and the behavior of that operation. Subject to the usual distinction
> > > > > between protocol behavior (bytestream vs datagram).
> > > >
> > > > I see your point.
> > > >
> > > > >
> > > > > > I prefer to implement
> > > > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > > > > I think. This might be the only one corner case, IIUC?
> > > > >
> > > > > This sounds like an entirely different topic? Not sure what this is.
> > > >
> > > > Not really a different topic. I mean let bpf prog take the whole
> > > > control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB
> > > > flag we can correlate the sendmsg timestamp with tskey. So It has
> > > > something to do with the usage of UDP. Please take a look at that link
> > > > to patch 11/12. For TCP, we don't need to care about the value of
> > > > tskey which has already been taken care of by SO_TIMESTAMPING. So it
> > > > is slightly different. I'm not sure if this kind of usage is
> > > > acceptable?
> > >
> > > Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?
> >
> > Because for TCP the shared info tskey is calculated by seqno (in
> > tcp_tx_timestamp()), so it works for so_timestamping and its bpf
> > extension and they are the same. However, for UDP, the shared info
> > tskey can be different, depending on when to call __ip_append_data()
> > and what the sk->sk_tskey is. It can cause conflicts when two modes
> > work at the same time.
>
> lockless and locked cannot conflict. (if up->pending then the only
> option is to append to that.)

Sorry, I should have explained this point better. I was asking whether
the tskey gets messed up when the two modes (bpf extension and
application timestamping) work at the same time. We would have to check
whether the application mode is turned on: if it is, we fetch the
existing key generated by the application mode; otherwise, do we
generate one by modifying sk->sk_tskey or the tskey of the skb?
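
A rough userspace sketch of the concern, not kernel code. It assumes the
application can turn key generation on and off per sendmsg, and it
compares two hypothetical designs: both modes drawing from sk_tskey,
versus the bpf mode using a standalone counter (called tskey_bpf here,
after the v5 idea). All struct and function names are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket counters; simplified stand-ins only. */
struct sock_sim {
	uint32_t sk_tskey;  /* counter used by application timestamping */
	uint32_t tskey_bpf; /* assumed standalone counter for the bpf mode */
};

/* Keys seen by each mode for one sendmsg. */
struct keys {
	bool app_valid;
	uint32_t app_key;
	bool bpf_valid;
	uint32_t bpf_key;
};

/* Design 1: both modes consume sk_tskey, so bpf-only sends advance it. */
static struct keys send_shared(struct sock_sim *sk, bool app_on, bool bpf_on)
{
	struct keys k = { false, 0, false, 0 };

	if (app_on || bpf_on) {
		uint32_t key = sk->sk_tskey++;

		k.app_valid = app_on;
		k.app_key = key;
		k.bpf_valid = bpf_on;
		k.bpf_key = key;
	}
	return k;
}

/* Design 2: each mode advances only its own counter. */
static struct keys send_separate(struct sock_sim *sk, bool app_on, bool bpf_on)
{
	struct keys k = { false, 0, false, 0 };

	if (app_on) {
		k.app_valid = true;
		k.app_key = sk->sk_tskey++;
	}
	if (bpf_on) {
		k.bpf_valid = true;
		k.bpf_key = sk->tskey_bpf++;
	}
	return k;
}
```

With the bpf mode always on and the application mode on, off, on across
three sends, design 1 hands the application keys 0 and 2 (a gap it did
not cause), while design 2 hands it 0 and 1.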

>
> > More than that, lockless UDP case is a tough
> > one since we cannot correlate the sendmsg timestamp in udp_sendmsg()
> > with the tskey generated in __ip_append_data(),
>
> With SO_TIMESTAMPING we do not have this distinction between TCP and
> UDP, so we don't need it here.
>
> It is true that multiple lockless sendmsg calls can race and in that
> case that correlation is ambiguous. That is also the case for
> SO_TIMESTAMPING and a known issue.
>
> This is unlikely in most workloads in practice.

Oh, I see.

> > which is a long gap
> > without any lock. So reuse the IPCORK_TS_OPT_ID logic for bpf
> > extension here can work.
>
> It is fine to add a solution to work around the ambiguity. But not
> to make it a precondition and so diverge the API for TCP and UDP.

There is no divergence in the API. BPF always uses the SND flag to
finish the correlation. The only difference is that UDP needs to manage
the tskey pool (deciding which tskey goes to which sendmsg).
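
A small userspace sketch of what I mean, not kernel code. The TCP key
follows the 'seq + len - 1' formula from tcp_tx_timestamp(); the UDP
key comes from a per-socket counter. Everything else (struct sock_sim,
ts_snd_cb(), sim_sendmsg(), one skb per sendmsg) is a simplifying
assumption made up for illustration.

```c
#include <stdint.h>

enum proto_sim { PROTO_TCP, PROTO_UDP };

/* Hypothetical per-socket state; simplified stand-ins only. */
struct sock_sim {
	enum proto_sim proto;
	uint32_t write_seq;  /* TCP: sequence number of the next queued byte */
	uint32_t tskey_next; /* UDP: next key from the per-socket counter */
	uint64_t sendmsg_ts; /* stored when sendmsg is entered */
};

/* One (tskey, sendmsg timestamp) pair produced by the callback. */
struct corr {
	uint32_t tskey;
	uint64_t sendmsg_ts;
};

/* The single operation both protocols share: pair key and timestamp. */
static struct corr ts_snd_cb(const struct sock_sim *sk, uint32_t tskey)
{
	struct corr c;

	c.tskey = tskey;
	c.sendmsg_ts = sk->sendmsg_ts;
	return c;
}

/* One sendmsg of len bytes at time now; only the key source differs. */
static struct corr sim_sendmsg(struct sock_sim *sk, uint64_t now, uint32_t len)
{
	uint32_t key;

	sk->sendmsg_ts = now;
	if (sk->proto == PROTO_TCP) {
		/* Key is the sequence number of the last byte sent. */
		key = sk->write_seq + len - 1;
		sk->write_seq += len;
	} else {
		/* Key is allocated from the per-socket counter. */
		key = sk->tskey_next++;
	}
	return ts_snd_cb(sk, key);
}
```

For a TCP socket starting at write_seq 1000, sends of 100 and 50 bytes
get keys 1099 and 1149. For a UDP socket the same two sends get keys 0
and 1. The callback itself is identical in both cases.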

>
> The same argument to choose a key from BPF can be made for TCP to
> a certain extent.

Well... right, but we don't bother to do this for TCP. The TCP case is
simpler.

It seems that you object to the idea of letting the bpf prog control
tskey allocation for UDP _by default_.

I'm not sure I follow you. Sorry for repeating myself in case I missed
something:
1) For UDP, do not allocate the tskey from the bpf program, unless
it's an alternative/workaround to handle the lockless case. So it's a
backup choice.
2) Use exactly the same way as TCP to finish the correlation on the
basis of socket lock protection _by default_. Because the lockless
method is seldom used, we may then provide method 1) for it?

Thanks,
Jason

>
> > It's worth to highlight that 1) for TCP the time to generate a sendmsg
> > timestamp and generate a tskey are under the same lock, 2) for
> > non-lockless UDP, it works like TCP. But in order to deal with the
> > lockless part, I choose to implement the IPCORK_TS_OPT_ID idea. This
> > is somehow another topic, but the relevant part is that I tried to
> > prove BPF_SOCK_OPS_TS_SND_CB can be also used in UDP (but in a
> > different position compared to TCP protocol).
> >
> > >
> > > BPF will need to set the key for both protocol if SO_TIMESTAMPING is
> > > not enabled, right?
> >
> > For TCP, we will rely on 'shinfo->tskey = TCP_SKB_CB(skb)->seq +
> > skb->len - 1;' in tcp_tx_timestamp() regardless of the status of
> > SO_TIMESTAMPING, so don't bother to manage the tskey from bpf prog.
> > While for UDP, we have to manage the tskey. Of course, it's another
> > topic.
> >
> > Thanks,
> > Jason
>
>


* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-06  4:03                       ` Jason Xing
@ 2025-02-06 16:22                         ` Willem de Bruijn
  2025-02-07  0:35                           ` Jason Xing
  0 siblings, 1 reply; 45+ messages in thread
From: Willem de Bruijn @ 2025-02-06 16:22 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

Jason Xing wrote:
> On Thu, Feb 6, 2025 at 11:00 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Thu, Feb 6, 2025 at 5:02 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > Jason Xing wrote:
> > > > > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Jason Xing wrote:
> > > > > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > > > > > >
> > > > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > > > > > the bpf_sk_storage ...".
> > > > > > > > > >
> > > > > > > > > > I will revise it. Thanks.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > > > > > More details can be found in the selftest.
> > > > > > > > > > >
> > > > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > > > > >                                        * feature is on.
> > > > > > > > > > > >                                        */
> > > > > > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > > > > > +                                      * in the last send process to
> > > > > > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > > > > > >
> > > > > > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > > > > > >
> > > > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > > > > > strange because I did write a long paragraph about this thing which
> > > > > > > > > > apparently disappeared...
> > > > > > > > > >
> > > > > > > > > > I manage to find what I wrote:
> > > > > > > > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > > > > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > > > > > > > >     should be handled carefully later. For the former, even though it's
> > > > > > > > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > > > > > > > >     same time, I think we'd better not correlate it like what we do to the
> > > > > > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > > > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > > > > > >     case we need to use it someday.
> > > > > > > > > >
> > > > > > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > > > > > >
> > > > > > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > > > > > >
> > > > > > > > > >     If possible, we can leave this question until the UDP support series
> > > > > > > > > >     shows up. I will figure out a better solution :)
> > > > > > > > > >
> > > > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > > > > > flag so I consider the compatibility reason.
> > > > > > > > >
> > > > > > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > > > > > >
> > > > > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > > > > > adjusted like below:
> > > > > > > >
> > > > > > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > > > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > > > > > one.
> > > > > >
> > > > > > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > > > > > An interface that is not is just hard to use.
> > > > > >
> > > > > > > > The main obstacle is how we can correlate the timestamp in
> > > > > > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > > > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > > > > > which skb belongs to which sendmsg.
> > > > > > > >
> > > > > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > > > > > protection of socket lock + syscall as the current patch does. But for
> > > > > > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > > > > > socket send UDP packets in parallel, then users could be confused
> > > > > > > > which tskey matches which sendmsg.
> > > > > >
> > > > > > This is a known issue for lockless datagram sockets.
> > > > > >
> > > > > > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > > > > > sendmsg calls is under control of the process, so it only shoots
> > > > > > itself in the foot.
> > > > > >
> > > > > > With BPF timestamping, a process may confuse a third party admin, so
> > > > > > the situation is slightly different.
> > > > >
> > > > > Agreed.
> > > > >
> > > > > >
> > > > > > > > IIUC, I will not consider this
> > > > > > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > > > > > >
> > > > > > > > The scenario for the UDP case is:
> > > > > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > > > > > like TCP does in this series.
> > > > > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > > > > > with timestamp.
> > > > > > > > Note: tskey in UDP will be handled carefully in a different way
> > > > > > > > because we should support both modes for socket timestamping at the
> > > > > > > > same time.
> > > > > > > > It's really similar to TCP regardless of handling tskey.
> > > > > > > >
> > > > > > >
> > > > > > > To be more precise in case you don't have much time to read the above
> > > > > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > > > > > sendmsg timestamp with corresponding tskey.
> > > > > > >
> > > > > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > > > > > >
> > > > > > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > > > > > tskey initialization, assuming there are no multiple threads calling
> > > > > > > locklessly ip_make_skb(). Locked path
> > > > > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > > > > > protection, so it can be easily handled. Lockless path
> > > > > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > > > > > the same time, which should be handled properly.
> > > > > >
> > > > > > > Different hook points are fine, as UDP (and RAW) uses __ip_append_data
> > > > >
> > > > > Then this approach (introducing this new flag) is feasible. Sorry that
> > > > > last night I wrote such a long paragraph which buried something
> > > > > important. Because of that, I rephrase the whole idea about how to let
> > > > > UDP work with this kind of new flag in [patch v8 11/12]. Link is
> > > > > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> > > > >
> > > > > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> > > > >
> > > > > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> > > > > timestamp and tskey.
> > > > >
> > > > > >
> > > > > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > > > > > and the behavior of that operation. Subject to the usual distinction
> > > > > > between protocol behavior (bytestream vs datagram).
> > > > >
> > > > > I see your point.
> > > > >
> > > > > >
> > > > > > > I prefer to implement
> > > > > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > > > > > I think. This might be the only one corner case, IIUC?
> > > > > >
> > > > > > This sounds like an entirely different topic? Not sure what this is.
> > > > >
> > > > > > Not really a different topic. I mean letting the bpf prog take full
> > > > > > control of setting the tskey; then with this BPF_SOCK_OPS_TS_SND_CB
> > > > > > flag we can correlate the sendmsg timestamp with the tskey. So it is
> > > > > > related to the UDP use case. Please take a look at that link
> > > > > > to patch 11/12. For TCP, we don't need to care about the value of
> > > > > > tskey, which has already been taken care of by SO_TIMESTAMPING. So it
> > > > > > is slightly different. I'm not sure if this kind of usage is
> > > > > > acceptable?
> > > >
> > > > Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?
> > >
> > > Because for TCP the shared info tskey is calculated by seqno (in
> > > tcp_tx_timestamp()), so it works for so_timestamping and its bpf
> > > extension and they are the same. However, for UDP, the shared info
> > > tskey can be different, depending on when to call __ip_append_data()
> > > and what the sk->sk_tskey is. It can cause conflicts when two modes
> > > work at the same time.
> >
> > lockless and locked cannot conflict. (if up->pending then the only
> > option is to append to that.)
> 
> Sorry, I should have described more about this point. I was trying to
> say that if two modes (bpf extension and application timestamping)
> work at the same time, will the tskey get messed up? Because we have
> to check if the application mode is turned on. If on, we fetch the
> existing key generated by the application mode, or else we generate
> one by modifying the sk->sk_tskey or the tskey of skb?

This applies to all protocols that implement both: TCP, UDP,
eventually RAW and maybe others like L2TP, CAN, MPTCP, ...

Which is why it should be handled uniformly.
 
> >
> > > More than that, lockless UDP case is a tough
> > > one since we cannot correlate the sendmsg timestamp in udp_sendmsg()
> > > with the tskey generated in __ip_append_data(),
> >
> > With SO_TIMESTAMPING we do not have this distinction between TCP and
> > UDP, so we don't need it here.
> >
> > It is true that multiple lockless sendmsg calls can race and in that
> > case that correlation is ambiguous. That is also the case for
> > SO_TIMESTAMPING and a known issue.
> >
> > This is unlikely in most workloads in practice.
> 
> Oh, I see.
> 
> > > which is a long gap
> > > without any lock. So reusing the IPCORK_TS_OPT_ID logic for the bpf
> > > extension here can work.
> >
> > It is fine to add a solution to work around the ambiguity. But not
> > to make it a precondition and so diverge the API for TCP and UDP.
> 
> There is no divergence in API. BPF always uses SND flag to finish the
> correlation. The only difference is UDP needs to manage the tskey pool
> (allocating which tskey to which sendmsg).

Again, I don't see how this is UDP specific.

> 
> >
> > The same argument to choose a key from BPF can be made for TCP to
> > a certain extent.
> 
> Well...right, but we don't bother to do this for TCP. The TCP case is simpler.
> 
> It seems that you object to the idea of letting the bpf prog control
> tskey allocation for UDP _as default_.

Correct. All protocols should have the same sensible default behavior.
 
> I'm not sure if I follow you. Sorry for repeating in case that I miss something:
> 1) For UDP, do not allocate the tskey from the bpf program, unless
> it's an alternative/workaround to handle the lockless case. So it's a
> backup choice.

And an alternative API should apply equally to all protocols too.

> 2) Use exactly the same approach as TCP to finish the correlation on the
> basis of socket lock protection _as default_. Because the lockless
> method is seldom used, we may then provide method 1)?

The opposite: the default for UDP is the lockless fast path.

> Thanks,
> Jason
> 
> >
> > > It's worth highlighting that 1) for TCP, generating the sendmsg
> > > timestamp and generating the tskey happen under the same lock, 2) for
> > > non-lockless UDP, it works like TCP. But in order to deal with the
> > > lockless part, I chose to implement the IPCORK_TS_OPT_ID idea. This
> > > is somewhat another topic, but the relevant part is that I tried to
> > > show BPF_SOCK_OPS_TS_SND_CB can also be used in UDP (but at a
> > > different position compared to TCP).
> > >
> > > >
> > > > BPF will need to set the key for both protocol if SO_TIMESTAMPING is
> > > > not enabled, right?
> > >
> > > For TCP, we will rely on 'shinfo->tskey = TCP_SKB_CB(skb)->seq +
> > > skb->len - 1;' in tcp_tx_timestamp() regardless of the status of
> > > SO_TIMESTAMPING, so don't bother to manage the tskey from bpf prog.
> > > While for UDP, we have to manage the tskey. Of course, it's another
> > > topic.
> > >
> > > Thanks,
> > > Jason
> >
> >
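Note on the TCP key quoted above: the expression
'shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;' reads as the
sequence number of the last byte covered by the skb, which is why the
key comes out the same whether or not SO_TIMESTAMPING is enabled. Below
is a minimal userspace sketch of that arithmetic, assuming 32-bit
sequence numbers that wrap modulo 2^32. The helper name
tcp_tskey_model() is hypothetical and is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the expression quoted in the thread:
 *   shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
 * seq is the sequence number of the first byte, len the payload length.
 * Unsigned 32-bit arithmetic wraps modulo 2^32, like TCP sequence space.
 */
static uint32_t tcp_tskey_model(uint32_t seq, uint32_t len)
{
	assert(len > 0);	/* the model only makes sense for non-empty data */
	return seq + len - 1;
}
```

For example, tcp_tskey_model(1000, 100) yields 1099, and a segment that
starts close to the top of the sequence space simply wraps around.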




* Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
  2025-02-06 16:22                         ` Willem de Bruijn
@ 2025-02-07  0:35                           ` Jason Xing
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Xing @ 2025-02-07  0:35 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev

On Fri, Feb 7, 2025 at 12:22 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Thu, Feb 6, 2025 at 11:00 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Thu, Feb 6, 2025 at 5:02 AM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > >
> > > > > > > Jason Xing wrote:
> > > > > > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Jason Xing wrote:
> > > > > > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > > > > > > the bpf_sk_storage ...".
> > > > > > > > > > >
> > > > > > > > > > > I will revise it. Thanks.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > > > > > > More details can be found in the selftest.
> > > > > > > > > > > >
> > > > > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > > > > > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > > > > > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > > > > > >   3 files changed, 15 insertions(+)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > > > > > >                                        * feature is on.
> > > > > > > > > > > > >                                        */
> > > > > > > > > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > > > > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > > > > > > > > +                                      * in the last send process to
> > > > > > > > > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > > > > > > > > +                                      * with other timestamping callbacks,
> > > > > > > > > > > > > +                                      * like SND/SW/ACK.
> > > > > > > > > > > >
> > > > > > > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > > > > > > >
> > > > > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > > > > > > strange because I did write a long paragraph about this thing which
> > > > > > > > > > > apparently disappeared...
> > > > > > > > > > >
> > > > > > > > > > > I manage to find what I wrote:
> > > > > > > > > > > >     For UDP, BPF_SOCK_OPS_TS_SND_CB may not be suitable because
> > > > > > > > > > > >     there are two sending paths, 1) lockless and 2) locked, which
> > > > > > > > > > > >     should be handled carefully later. For the former, even though it's
> > > > > > > > > > > >     unlikely that multiple threads call sendmsg on the socket at the
> > > > > > > > > > > >     same time, I think we'd better not correlate it the way we do in the
> > > > > > > > > > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > > > > > > >     a uapi flag, I think we don't need to force a 'TCP_' prefix onto it in
> > > > > > > > > > > >     case we need to use it someday.
> > > > > > > > > > >
> > > > > > > > > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > > > > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > > > > > >     will not conflict with the tskey in shared info which is generated
> > > > > > > > > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > > > > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > > > > > >     could get confused because sometimes they fetch the tskey from skb,
> > > > > > > > > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > > > > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > > > > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > > > > > >     syscall with other tx points(SND/SW/ACK...).
> > > > > > > > > > >
> > > > > > > > > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > > > > > > >
> > > > > > > > > > >     If possible, we can leave this question until the UDP support series
> > > > > > > > > > >     shows up. I will figure out a better solution :)
> > > > > > > > > > >
> > > > > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > > > > > > flag so I consider the compatibility reason.
> > > > > > > > > >
> > > > > > > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > > > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > > > > > > >
> > > > > > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > > > > > > adjusted like below:
> > > > > > > > >
> > > > > > > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > > > > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > > > > > > one.
> > > > > > >
> > > > > > > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > > > > > > An interface that is not is just hard to use.
> > > > > > >
> > > > > > > > > The main obstacle is how we can correlate the timestamp in
> > > > > > > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > > > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > > > > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > > > > > > which skb belongs to which sendmsg.
> > > > > > > > >
> > > > > > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > > > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > > > > > > protection of socket lock + syscall as the current patch does. But for
> > > > > > > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > > > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > > > > > > socket send UDP packets in parallel, then users could be confused
> > > > > > > > > which tskey matches which sendmsg.
> > > > > > >
> > > > > > > This is a known issue for lockless datagram sockets.
> > > > > > >
> > > > > > > > With SO_TIMESTAMPING, the use of timestamping and of concurrent
> > > > > > > sendmsg calls is under control of the process, so it only shoots
> > > > > > > itself in the foot.
> > > > > > >
> > > > > > > With BPF timestamping, a process may confuse a third party admin, so
> > > > > > > the situation is slightly different.
> > > > > >
> > > > > > Agreed.
> > > > > >
> > > > > > >
> > > > > > > > > IIUC, I will not consider this
> > > > > > > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > > > > > > >
> > > > > > > > > The scenario for the UDP case is:
> > > > > > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > > > > > > like TCP does in this series.
> > > > > > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > > > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > > > > > > with timestamp.
> > > > > > > > > Note: tskey in UDP will be handled carefully in a different way
> > > > > > > > > because we should support both modes for socket timestamping at the
> > > > > > > > > same time.
> > > > > > > > > It's really similar to TCP regardless of handling tskey.
> > > > > > > > >
> > > > > > > >
> > > > > > > > To be more precise in case you don't have much time to read the above
> > > > > > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > > > > > > sendmsg timestamp with corresponding tskey.
> > > > > > > >
> > > > > > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > > > > > > >
> > > > > > > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > > > > > > tskey initialization, assuming there are no multiple threads calling
> > > > > > > > locklessly ip_make_skb(). Locked path
> > > > > > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > > > > > > protection, so it can be easily handled. Lockless path
> > > > > > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > > > > > > the same time, which should be handled properly.
> > > > > > >
> > > > > > > > Different hook points are fine, as UDP (and RAW) uses __ip_append_data
> > > > > >
> > > > > > Then this approach (introducing this new flag) is feasible. Sorry that
> > > > > > last night I wrote such a long paragraph which buried something
> > > > > > important. Because of that, I rephrase the whole idea about how to let
> > > > > > UDP work with this kind of new flag in [patch v8 11/12]. Link is
> > > > > > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
> > > > > >
> > > > > > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
> > > > > >
> > > > > > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> > > > > > timestamp and tskey.
> > > > > >
> > > > > > >
> > > > > > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > > > > > > and the behavior of that operation. Subject to the usual distinction
> > > > > > > between protocol behavior (bytestream vs datagram).
> > > > > >
> > > > > > I see your point.
> > > > > >
> > > > > > >
> > > > > > > > I prefer to implement
> > > > > > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > > > > > > I think. This might be the only one corner case, IIUC?
> > > > > > >
> > > > > > > This sounds like an entirely different topic? Not sure what this is.
> > > > > >
> > > > > > > Not really a different topic. I mean letting the bpf prog take full
> > > > > > > control of setting the tskey; then with this BPF_SOCK_OPS_TS_SND_CB
> > > > > > > flag we can correlate the sendmsg timestamp with the tskey. So it is
> > > > > > > related to the UDP use case. Please take a look at that link
> > > > > > > to patch 11/12. For TCP, we don't need to care about the value of
> > > > > > > tskey, which has already been taken care of by SO_TIMESTAMPING. So it
> > > > > > > is slightly different. I'm not sure if this kind of usage is
> > > > > > > acceptable?
> > > > >
> > > > > Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?
> > > >
> > > > Because for TCP the shared info tskey is calculated by seqno (in
> > > > tcp_tx_timestamp()), so it works for so_timestamping and its bpf
> > > > extension and they are the same. However, for UDP, the shared info
> > > > tskey can be different, depending on when to call __ip_append_data()
> > > > and what the sk->sk_tskey is. It can cause conflicts when two modes
> > > > work at the same time.
> > >
> > > lockless and locked cannot conflict. (if up->pending then the only
> > > option is to append to that.)
> >
> > Sorry, I should have described more about this point. I was trying to
> > say that if two modes (bpf extension and application timestamping)
> > work at the same time, will the tskey get messed up? Because we have
> > to check if the application mode is turned on. If on, we fetch the
> > existing key generated by the application mode, or else we generate
> > one by modifying the sk->sk_tskey or the tskey of skb?
>
> This applies to all protocols that implement both: TCP, UDP,
> eventually RAW and maybe others like L2TP, CAN, MPTCP, ...
>
> Which is why it should be handled uniformly.

Got it.

>
> > >
> > > > More than that, lockless UDP case is a tough
> > > > one since we cannot correlate the sendmsg timestamp in udp_sendmsg()
> > > > with the tskey generated in __ip_append_data(),
> > >
> > > With SO_TIMESTAMPING we do not have this distinction between TCP and
> > > UDP, so we don't need it here.
> > >
> > > It is true that multiple lockless sendmsg calls can race and in that
> > > case that correlation is ambiguous. That is also the case for
> > > SO_TIMESTAMPING and a known issue.
> > >
> > > This is unlikely in most workloads in practice.
> >
> > Oh, I see.
> >
> > > > which is a long gap
> > > > without any lock. So reusing the IPCORK_TS_OPT_ID logic for the bpf
> > > > extension here can work.
> > >
> > > It is fine to add a solution to work around the ambiguity. But not
> > > to make it a precondition and so diverge the API for TCP and UDP.
> >
> > There is no divergence in API. BPF always uses SND flag to finish the
> > correlation. The only difference is UDP needs to manage the tskey pool
> > (allocating which tskey to which sendmsg).
>
> Again, I don't see how this is UDP specific.
>
> >
> > >
> > > The same argument to choose a key from BPF can be made for TCP to
> > > a certain extent.
> >
> > Well...right, but we don't bother to do this for TCP. The TCP case is simpler.
> >
> > It seems that you object to the idea of letting the bpf prog control
> > tskey allocation for UDP _as default_.
>
> Correct. All protocols should have the same sensible default behavior.

I see.

>
> > I'm not sure if I follow you. Sorry for repeating in case that I miss something:
> > 1) For UDP, do not allocate the tskey from the bpf program, unless
> > it's an alternative/workaround to handle the lockless case. So it's a
> > backup choice.
>
> And an alternative API should apply equally to all protocols too.

It seems feasible.

>
> > 2) Use exactly the same approach as TCP to finish the correlation on the
> > basis of socket lock protection _as default_. Because the lockless
> > method is seldom used, we may then provide method 1)?
>
> The opposite: the default for UDP is the lockless fast path.

Then using the SND flag to correlate for UDP is safe (putting the SND
flag in __ip_append_data() is suitable for at least the UDP case).

Now I have a clear vision and goal after so many rounds of discussion. Thanks.

Thanks,
Jason
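
Note on the correlation scheme discussed in this thread: a hook at
sendmsg entry records a timestamp in per-socket storage, and the
BPF_SOCK_OPS_TS_SND_CB step later pairs that timestamp with the tskey.
Below is a rough userspace model of that idea, not BPF or kernel code.
struct sk_model, struct ts_record, on_sendmsg_entry() and on_ts_snd_cb()
are hypothetical names, and the model assumes a single storage slot per
socket (standing in for the bpf_sk_storage entry the selftest is said to
use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-socket storage slot. */
struct sk_model {
	uint64_t sendmsg_ns;	/* timestamp recorded at sendmsg entry */
	int has_sendmsg;	/* non-zero while a timestamp is pending */
};

/* One correlated result: which tskey belongs to which sendmsg time. */
struct ts_record {
	uint32_t tskey;
	uint64_t sendmsg_ns;
};

/* Models the hook at sendmsg entry: remember when the call started. */
static void on_sendmsg_entry(struct sk_model *sk, uint64_t now_ns)
{
	assert(sk != NULL);
	sk->sendmsg_ns = now_ns;	/* overwrites any pending value */
	sk->has_sendmsg = 1;
}

/*
 * Models the SND callback step: pair the pending timestamp with tskey.
 * Returns 0 on success, -1 if no sendmsg timestamp is pending.
 */
static int on_ts_snd_cb(struct sk_model *sk, uint32_t tskey,
			struct ts_record *out)
{
	assert(sk != NULL && out != NULL);
	if (!sk->has_sendmsg)
		return -1;
	out->tskey = tskey;
	out->sendmsg_ns = sk->sendmsg_ns;
	sk->has_sendmsg = 0;	/* consume the pending timestamp */
	return 0;
}
```

When entry and callback are serialized (as under the socket lock), each
tskey is paired with its own sendmsg timestamp. If two entries run
before the first callback, the slot is overwritten: the first callback
picks up the later timestamp and the second finds nothing, which is the
ambiguity described above for racing lockless sendmsg calls.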



Thread overview: 45+ messages
2025-01-28  8:46 [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 01/13] net-timestamp: add support for bpf_setsockopt() Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 02/13] net-timestamp: prepare for timestamping callbacks use Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 03/13] bpf: stop unsafely accessing TCP fields in bpf callbacks Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 04/13] bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 05/13] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
2025-02-03 23:14   ` Martin KaFai Lau
2025-02-04  0:18     ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 06/13] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
2025-02-03 23:23   ` Martin KaFai Lau
2025-02-04  0:19     ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 07/13] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 08/13] net-timestamp: support hw " Jason Xing
2025-02-04  0:56   ` Martin KaFai Lau
2025-02-04  1:13     ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 09/13] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 10/13] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
2025-02-04  1:03   ` Martin KaFai Lau
2025-02-04  1:15     ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp() Jason Xing
2025-02-04  1:16   ` Martin KaFai Lau
2025-02-04  1:25     ` Jason Xing
2025-02-04 17:08       ` Willem de Bruijn
2025-02-04 18:09         ` Jason Xing
2025-02-05  3:05           ` Jason Xing
2025-02-05  5:13             ` Jason Xing
2025-02-05 15:20             ` Willem de Bruijn
2025-02-05 15:47               ` Jason Xing
2025-02-05 21:02                 ` Willem de Bruijn
2025-02-06  0:33                   ` Jason Xing
2025-02-06  3:00                     ` Willem de Bruijn
2025-02-06  4:03                       ` Jason Xing
2025-02-06 16:22                         ` Willem de Bruijn
2025-02-07  0:35                           ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 12/13] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
2025-02-04  1:21   ` Martin KaFai Lau
2025-02-04  1:25     ` Jason Xing
2025-01-28  8:46 ` [PATCH bpf-next v7 13/13] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
2025-02-04  2:02   ` Martin KaFai Lau
2025-02-04  5:32     ` Jason Xing
2025-02-04  2:27 ` [PATCH bpf-next v7 00/13] net-timestamp: bpf extension to equip applications transparently Martin KaFai Lau
2025-02-04  2:44   ` Jason Xing
2025-02-04 17:11     ` Willem de Bruijn
2025-02-04 18:12       ` Jason Xing
2025-02-04 17:06   ` Willem de Bruijn
