* [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently
@ 2025-01-12 11:37 Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
` (14 more replies)
0 siblings, 15 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
"Timestamping is key to debugging network stack latency. With
SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
network issues can be attributed to the kernel." This is quoted
from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
given by Willem de Bruijn at netdevconf 0x17.
There are a few areas that need optimization to make the feature
easier to use and to reduce its performance impact, which I
highlighted and mainly discussed at netconf 2024 with Willem de
Bruijn and John Fastabend: uAPI compatibility, extra system call
overhead, and the need for application modification. I initially
managed to solve these issues by writing a kernel module that hooks
various key functions. However, that approach is not suitable for
the next kernel release, so a BPF extension was proposed instead.
Along the way, Martin KaFai Lau provided invaluable suggestions
about BPF. Many thanks!
In this series, I only add the foundational code and tx support for TCP.
This approach mostly relies on the existing SO_TIMESTAMPING feature:
users only need to pass certain flags through bpf_setsockopt() to a
separate set of tsflags. Please see the last selftest patch in this
series.
After this series, we can step by step bring more advanced
functions/flags of the SO_TIMESTAMPING feature to the bpf extension.
---
v5
Link: https://lore.kernel.org/all/20241207173803.90744-1-kerneljasonxing@gmail.com/
1. handle the safety issues when someone tries to call unrelated bpf
helpers.
2. avoid adding direct function call in the hot path like
__dev_queue_xmit()
3. remove reporting the hardware timestamp and tskey since they can be
fetched through the existing helper with the help of
bpf_skops_init_skb(), please see the selftest.
4. add a new sendmsg callback in tcp_sendmsg, and introduce tskey_bpf,
used by the bpf program to correlate tcp_sendmsg with other hook points,
in patch [13/15].
v4
Link: https://lore.kernel.org/all/20241028110535.82999-1-kerneljasonxing@gmail.com/
1. introduce sk->sk_bpf_cb_flags to let user use bpf_setsockopt() (Martin)
2. introduce SKBTX_BPF to enable the bpf SO_TIMESTAMPING feature (Martin)
3. introduce bpf map in tests (Martin)
4. I choose to make this series as simple as possible, so I only support
most cases in the tx path for TCP protocol.
v3
Link: https://lore.kernel.org/all/20241012040651.95616-1-kerneljasonxing@gmail.com/
1. support UDP proto by introducing a new generation point.
2. for OPT_ID, introduce sk_tskey_bpf_offset to compute the delta
between the current socket key and the bpf socket key. It is designed
for UDP, but also applies to TCP.
3. support bpf_getsockopt()
4. use cgroup static key instead.
5. add one simple bpf selftest to show how it can be used.
6. remove the rx support from v2 because the number of patches could
exceed the limit of one series.
V2
Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
1. Introduce tsflag requestors so that we can extend more in the
future. Besides, it enables TX flags for the bpf extension feature
separately without breaking users. This was suggested by Vadim Fedorenko.
2. introduce a static key to control the whole feature. (Willem)
3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
some TX/RX cases, not all the cases.
Jason Xing (15):
net-timestamp: add support for bpf_setsockopt()
net-timestamp: prepare for bpf prog use
bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt
net-timestamp: add strict check in some BPF calls
net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
net-timestamp: support SCM_TSTAMP_ACK for bpf extension
net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
net-timestamp: support export skb to the userspace
net-timestamp: make TCP tx timestamp bpf extension work
net-timestamp: support tcp_sendmsg for bpf extension
net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
bpf: add simple bpf tests in the tx path for so_timestamping feature
include/linux/filter.h | 1 +
include/linux/skbuff.h | 36 +++-
include/net/sock.h | 14 ++
include/net/tcp.h | 3 +-
include/uapi/linux/bpf.h | 26 +++
net/core/dev.c | 5 +-
net/core/filter.c | 65 +++++-
net/core/skbuff.c | 70 ++++++-
net/core/sock.c | 17 ++
net/ipv4/tcp.c | 21 +-
net/ipv4/tcp_input.c | 9 +-
net/ipv4/tcp_output.c | 8 +
net/socket.c | 2 +-
tools/include/uapi/linux/bpf.h | 19 ++
.../bpf/prog_tests/so_timestamping.c | 95 +++++++++
.../selftests/bpf/progs/so_timestamping.c | 186 ++++++++++++++++++
16 files changed, 548 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c
--
2.43.5
* [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-12 14:49 ` kernel test robot
2025-01-14 23:20 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use Jason Xing
` (13 subsequent siblings)
14 siblings, 2 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Users can write the following code to enable the bpf extension:
bpf_setsockopt(skops, SOL_SOCKET, SK_BPF_CB_FLAGS, &flags, sizeof(flags));
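For instance, a minimal sockops sketch could look like the program
below. Only SK_BPF_CB_FLAGS and SK_BPF_CB_TX_TIMESTAMPING come from
this patch; the section/program names, the chosen callback and the
headers are illustrative assumptions (vmlinux.h built from a patched
kernel plus libbpf's bpf_helpers.h):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#ifndef SOL_SOCKET
#define SOL_SOCKET 1	/* normally provided by the selftests' bpf_tracing_net.h */
#endif

SEC("sockops")
int enable_tx_tstamp(struct bpf_sock_ops *skops)
{
	int flags = SK_BPF_CB_TX_TIMESTAMPING;

	/* Enable the bpf tx timestamping extension once the connection
	 * is established (the callback choice is only an example).
	 */
	if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)
		bpf_setsockopt(skops, SOL_SOCKET, SK_BPF_CB_FLAGS,
			       &flags, sizeof(flags));
	return 1;
}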
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/net/sock.h | 7 +++++++
include/uapi/linux/bpf.h | 8 ++++++++
net/core/filter.c | 25 +++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 1 +
4 files changed, 41 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index ccf86c8a7a8a..f5447b4b78fd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -303,6 +303,7 @@ struct sk_filter;
* @sk_stamp: time stamp of last packet received
* @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
* @sk_tsflags: SO_TIMESTAMPING flags
+ * @sk_bpf_cb_flags: used for bpf_setsockopt
* @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
* Sockets that can be used under memory reclaim should
* set this to false.
@@ -445,6 +446,12 @@ struct sock {
u32 sk_reserved_mem;
int sk_forward_alloc;
u32 sk_tsflags;
+#ifdef CONFIG_BPF_SYSCALL
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
+ u32 sk_bpf_cb_flags;
+#else
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) 0
+#endif
__cacheline_group_end(sock_write_rxtx);
__cacheline_group_begin(sock_write_tx);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4162afc6b5d0..e629e09b0b31 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6903,6 +6903,13 @@ enum {
BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
};
+/* Definitions for bpf_sk_cb_flags */
+enum {
+ SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
+ SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
+ SK_BPF_CB_TX_TIMESTAMPING
+};
+
/* List of known BPF sock_ops operators.
* New entries can only be added at the end
*/
@@ -7081,6 +7088,7 @@ enum {
TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
+ SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
};
enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index b957cf57299e..c6dd2d2e44c8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5222,6 +5222,23 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
.arg1_type = ARG_PTR_TO_CTX,
};
+static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
+{
+ u32 sk_bpf_cb_flags;
+
+ if (getopt)
+ return -EINVAL;
+
+ sk_bpf_cb_flags = *(u32 *)optval;
+
+ if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
+ return -EINVAL;
+
+ sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
+
+ return 0;
+}
+
static int sol_socket_sockopt(struct sock *sk, int optname,
char *optval, int *optlen,
bool getopt)
@@ -5238,6 +5255,7 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
case SO_MAX_PACING_RATE:
case SO_BINDTOIFINDEX:
case SO_TXREHASH:
+ case SK_BPF_CB_FLAGS:
if (*optlen != sizeof(int))
return -EINVAL;
break;
@@ -5247,6 +5265,13 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
return -EINVAL;
}
+ if (optname == SK_BPF_CB_FLAGS)
+#ifdef CONFIG_BPF_SYSCALL
+ return sk_bpf_set_cb_flags(sk, optval, getopt);
+#else
+ return -EINVAL;
+#endif
+
if (getopt) {
if (optname == SO_BINDTODEVICE)
return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4162afc6b5d0..6b0a5b787b12 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7081,6 +7081,7 @@ enum {
TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
+ SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
};
enum {
--
2.43.5
* [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-14 23:39 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog Jason Xing
` (12 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Later patches will introduce three timestamping callback points that
report information to the bpf prog through this new helper.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/net/sock.h | 7 +++++++
net/core/sock.c | 14 ++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index f5447b4b78fd..dd874e8337c0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
struct so_timestamping timestamping);
void sock_enable_timestamps(struct sock *sk);
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
+#else
+static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+}
+#endif
void sock_no_linger(struct sock *sk);
void sock_set_keepalive(struct sock *sk);
void sock_set_priority(struct sock *sk, u32 priority);
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e0..e06bcafb1b2d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
return 0;
}
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+ struct bpf_sock_ops_kern sock_ops;
+
+ memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
+ sock_ops.op = op;
+ if (sk_is_tcp(sk) && sk_fullsock(sk))
+ sock_ops.is_fullsock = 1;
+ sock_ops.sk = sk;
+ __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
+}
+#endif
+
void sock_set_keepalive(struct sock *sk)
{
lock_sock(sk);
--
2.43.5
* [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 1:17 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt Jason Xing
` (11 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
timestamp_used covers two cases: one is the existing is_fullsock case,
the other is the UDP socket case, which will be supported in the next
round.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/filter.h | 1 +
net/core/filter.c | 4 ++--
net/core/sock.c | 1 +
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_output.c | 2 ++
5 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a3ea46281595..daca3fe48b8f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1508,6 +1508,7 @@ struct bpf_sock_ops_kern {
void *skb_data_end;
u8 op;
u8 is_fullsock;
+ u8 timestamp_used;
u8 remaining_opt_len;
u64 temp; /* temp and everything after is not
* initialized to 0 before calling
diff --git a/net/core/filter.c b/net/core/filter.c
index c6dd2d2e44c8..1ac996ec5e0f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10424,10 +10424,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
} \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ timestamp_used), \
fullsock_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ timestamp_used)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
if (si->dst_reg == si->src_reg) \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
diff --git a/net/core/sock.c b/net/core/sock.c
index e06bcafb1b2d..dbb9326ae9d1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
if (sk_is_tcp(sk) && sk_fullsock(sk))
sock_ops.is_fullsock = 1;
sock_ops.sk = sk;
+ sock_ops.timestamp_used = 1;
__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
}
#endif
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4811727b8a02..cad41ad34bd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
sock_ops.is_fullsock = 1;
+ sock_ops.timestamp_used = 1;
sock_ops.sk = sk;
bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
@@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = bpf_op;
sock_ops.is_fullsock = 1;
+ sock_ops.timestamp_used = 1;
sock_ops.sk = sk;
/* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
if (skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0e5b9a654254..7b4d1dfd57d4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.timestamp_used = 1;
sock_ops.sk = sk;
}
@@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.timestamp_used = 1;
sock_ops.sk = sk;
}
--
2.43.5
* [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (2 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 21:22 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
` (10 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
We will allow both TCP and UDP sockets to use this helper to
enable the feature, so let SK_BPF_CB_FLAGS bypass the checks:
1. skip the is_fullsock check
2. skip the sock_owned_by_me() check
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/core/filter.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 1ac996ec5e0f..0e915268db5f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5507,12 +5507,9 @@ static int sol_ipv6_sockopt(struct sock *sk, int optname,
KERNEL_SOCKPTR(optval), *optlen);
}
-static int __bpf_setsockopt(struct sock *sk, int level, int optname,
- char *optval, int optlen)
+static int ___bpf_setsockopt(struct sock *sk, int level, int optname,
+ char *optval, int optlen)
{
- if (!sk_fullsock(sk))
- return -EINVAL;
-
if (level == SOL_SOCKET)
return sol_socket_sockopt(sk, optname, optval, &optlen, false);
else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
@@ -5525,6 +5522,15 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
}
+static int __bpf_setsockopt(struct sock *sk, int level, int optname,
+ char *optval, int optlen)
+{
+ if (!sk_fullsock(sk))
+ return -EINVAL;
+
+ return ___bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
static int _bpf_setsockopt(struct sock *sk, int level, int optname,
char *optval, int optlen)
{
@@ -5675,7 +5681,16 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
int, level, int, optname, char *, optval, int, optlen)
{
- return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
+ struct sock *sk = bpf_sock->sk;
+
+ if (optname != SK_BPF_CB_FLAGS) {
+ if (sk_fullsock(sk))
+ sock_owned_by_me(sk);
+ else if (optname != SK_BPF_CB_FLAGS)
+ return -EINVAL;
+ }
+
+ return ___bpf_setsockopt(sk, level, optname, optval, optlen);
}
static const struct bpf_func_proto bpf_sock_ops_setsockopt_proto = {
--
2.43.5
* [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (3 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-12 14:37 ` kernel test robot
` (2 more replies)
2025-01-12 11:37 ` [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
` (9 subsequent siblings)
14 siblings, 3 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
In the next round, we will support the UDP proto for the SO_TIMESTAMPING
bpf extension, so we need to make sure these BPF calls stay safe.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/core/filter.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 0e915268db5f..517f09aabc92 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5571,7 +5571,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
static int _bpf_getsockopt(struct sock *sk, int level, int optname,
char *optval, int optlen)
{
- if (sk_fullsock(sk))
+ if (sk_fullsock(sk) && optname != SK_BPF_CB_FLAGS)
sock_owned_by_me(sk);
return __bpf_getsockopt(sk, level, optname, optval, optlen);
}
@@ -5776,6 +5776,7 @@ BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
int, level, int, optname, char *, optval, int, optlen)
{
if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP &&
+ bpf_sock->sk->sk_protocol == IPPROTO_TCP &&
optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) {
int ret, copy_len = 0;
const u8 *start;
@@ -5817,7 +5818,8 @@ BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
struct sock *sk = bpf_sock->sk;
int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
- if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk))
+ if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk) ||
+ sk->sk_protocol != IPPROTO_TCP)
return -EINVAL;
tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
@@ -7626,6 +7628,9 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
u8 search_kind, search_len, copy_len, magic_len;
int ret;
+ if (bpf_sock->op != SK_BPF_CB_FLAGS)
+ return -EINVAL;
+
/* 2 byte is the minimal option len except TCPOPT_NOP and
* TCPOPT_EOL which are useless for the bpf prog to learn
* and this helper disallow loading them also.
--
2.43.5
* [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (4 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 22:11 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
` (8 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
No functional changes here. Add skb_enable_app_tstamp() to test
whether orig_skb matches the application's SO_TIMESTAMPING usage,
and skb_sw_tstamp_tx() to distinguish software from hardware
timestamps when tstype is SCM_TSTAMP_SND.
After this patch, checks for the bpf SO_TIMESTAMPING mode will be
added, so that the two modes can be supported in parallel.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/skbuff.h | 22 ++++++++++++++++------
net/core/dev.c | 2 +-
net/core/skbuff.c | 42 ++++++++++++++++++++++++++++++++++++++++--
net/ipv4/tcp_input.c | 3 ++-
4 files changed, 59 insertions(+), 10 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb2b751d274a..09461ee84d2f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4533,21 +4533,31 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb,
struct skb_shared_hwtstamps *hwtstamps,
- struct sock *sk, int tstype);
+ struct sock *sk, bool sw, int tstype);
/**
- * skb_tstamp_tx - queue clone of skb with send time stamps
+ * skb_tstamp_tx - queue clone of skb with send time HARDWARE stamps
* @orig_skb: the original outgoing packet
* @hwtstamps: hardware time stamps, may be NULL if not available
*
* If the skb has a socket associated, then this function clones the
* skb (thus sharing the actual data and optional structures), stores
- * the optional hardware time stamping information (if non NULL) or
- * generates a software time stamp (otherwise), then queues the clone
- * to the error queue of the socket. Errors are silently ignored.
+ * the optional hardware time stamping information (if non NULL) then
+ * queues the clone to the error queue of the socket. Errors are
+ * silently ignored.
*/
void skb_tstamp_tx(struct sk_buff *orig_skb,
struct skb_shared_hwtstamps *hwtstamps);
+/**
+ * skb_sw_tstamp_tx - queue clone of skb with send time SOFTWARE stamps
+ * @orig_skb: the original outgoing packet
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures),
+ * generates a software time stamp, then queues the clone
+ * to the error queue of the socket. Errors are silently ignored.
+ */
+void skb_sw_tstamp_tx(struct sk_buff *orig_skb);
/**
* skb_tx_timestamp() - Driver hook for transmit timestamping
@@ -4565,7 +4575,7 @@ static inline void skb_tx_timestamp(struct sk_buff *skb)
{
skb_clone_tx_timestamp(skb);
if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
- skb_tstamp_tx(skb, NULL);
+ skb_sw_tstamp_tx(skb);
}
/**
diff --git a/net/core/dev.c b/net/core/dev.c
index 1a90ed8cc6cc..397fbcd5e4de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4398,7 +4398,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
skb_assert_len(skb);
if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
- __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
+ __skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SCHED);
/* Disable soft irqs for various locks below. Also
* stops preemption for RCU.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a441613a1e6c..b34c7ec3d5e9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5539,10 +5539,38 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
}
EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
+static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
+{
+ int flag;
+
+ switch (tstype) {
+ case SCM_TSTAMP_SCHED:
+ flag = SKBTX_SCHED_TSTAMP;
+ break;
+ case SCM_TSTAMP_SND:
+ if (sw)
+ flag = SKBTX_SW_TSTAMP;
+ else
+ flag = SKBTX_HW_TSTAMP;
+ break;
+ case SCM_TSTAMP_ACK:
+ if (TCP_SKB_CB(skb)->txstamp_ack)
+ return true;
+ fallthrough;
+ default:
+ return false;
+ }
+
+ if (skb_shinfo(skb)->tx_flags & flag)
+ return true;
+
+ return false;
+}
+
void __skb_tstamp_tx(struct sk_buff *orig_skb,
const struct sk_buff *ack_skb,
struct skb_shared_hwtstamps *hwtstamps,
- struct sock *sk, int tstype)
+ struct sock *sk, bool sw, int tstype)
{
struct sk_buff *skb;
bool tsonly, opt_stats = false;
@@ -5551,6 +5579,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (!sk)
return;
+ if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
+ return;
+
tsflags = READ_ONCE(sk->sk_tsflags);
if (!hwtstamps && !(tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
@@ -5596,10 +5627,17 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
}
EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
+void skb_sw_tstamp_tx(struct sk_buff *orig_skb)
+{
+ return __skb_tstamp_tx(orig_skb, NULL, NULL, orig_skb->sk, true,
+ SCM_TSTAMP_SND);
+}
+EXPORT_SYMBOL_GPL(skb_sw_tstamp_tx);
+
void skb_tstamp_tx(struct sk_buff *orig_skb,
struct skb_shared_hwtstamps *hwtstamps)
{
- return __skb_tstamp_tx(orig_skb, NULL, hwtstamps, orig_skb->sk,
+ return __skb_tstamp_tx(orig_skb, NULL, hwtstamps, orig_skb->sk, false,
SCM_TSTAMP_SND);
}
EXPORT_SYMBOL_GPL(skb_tstamp_tx);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cad41ad34bd5..c2cdd0acb504 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3330,7 +3330,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
if (!before(shinfo->tskey, prior_snd_una) &&
before(shinfo->tskey, tcp_sk(sk)->snd_una)) {
tcp_skb_tsorted_save(skb) {
- __skb_tstamp_tx(skb, ack_skb, NULL, sk, SCM_TSTAMP_ACK);
+ __skb_tstamp_tx(skb, ack_skb, NULL, sk, true,
+ SCM_TSTAMP_ACK);
} tcp_skb_tsorted_restore(skb);
}
}
--
2.43.5
* [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (5 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 22:32 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
` (7 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Introduce SKBTX_BPF as an indicator telling us whether the skb
should be traced by the bpf prog.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/skbuff.h | 6 +++++-
include/uapi/linux/bpf.h | 5 +++++
net/core/dev.c | 3 ++-
net/core/skbuff.c | 20 ++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 5 +++++
5 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 09461ee84d2f..30901dfb4539 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -489,10 +489,14 @@ enum {
/* generate software time stamp when entering packet scheduling */
SKBTX_SCHED_TSTAMP = 1 << 6,
+
+ /* used for bpf extension when a bpf program is loaded */
+ SKBTX_BPF = 1 << 7,
};
#define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \
- SKBTX_SCHED_TSTAMP)
+ SKBTX_SCHED_TSTAMP | \
+ SKBTX_BPF)
#define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | \
SKBTX_HW_TSTAMP_USE_CYCLES | \
SKBTX_ANY_SW_TSTAMP)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e629e09b0b31..72f93c6e45c1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7022,6 +7022,11 @@ enum {
* by the kernel or the
* earlier bpf-progs.
*/
+ BPF_SOCK_OPS_TS_SCHED_OPT_CB, /* Called when skb is passing through
+ * dev layer when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/dev.c b/net/core/dev.c
index 397fbcd5e4de..3e0cee0b3a0b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4397,7 +4397,8 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
skb_reset_mac_header(skb);
skb_assert_len(skb);
- if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
+ if (unlikely(skb_shinfo(skb)->tx_flags &
+ (SKBTX_SCHED_TSTAMP | SKBTX_BPF)))
__skb_tstamp_tx(skb, NULL, NULL, skb->sk, true, SCM_TSTAMP_SCHED);
/* Disable soft irqs for various locks below. Also
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b34c7ec3d5e9..169c6d03d698 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5567,6 +5567,24 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
return false;
}
+static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
+{
+ int op;
+
+ if (!sk)
+ return;
+
+ switch (tstype) {
+ case SCM_TSTAMP_SCHED:
+ op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
+ break;
+ default:
+ return;
+ }
+
+ bpf_skops_tx_timestamping(sk, skb, op);
+}
+
void __skb_tstamp_tx(struct sk_buff *orig_skb,
const struct sk_buff *ack_skb,
struct skb_shared_hwtstamps *hwtstamps,
@@ -5578,6 +5596,8 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (!sk)
return;
+ if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
+ __skb_tstamp_tx_bpf(orig_skb, sk, tstype);
if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
return;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6b0a5b787b12..891217ab6d2d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7015,6 +7015,11 @@ enum {
* by the kernel or the
* earlier bpf-progs.
*/
+ BPF_SOCK_OPS_TS_SCHED_OPT_CB, /* Called when skb is passing through
+ * dev layer when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
--
2.43.5
* [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (6 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 22:48 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
` (6 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Support the SCM_TSTAMP_SND case so that we get the software
timestamp when the driver is about to send the skb. The hardware
timestamp will be supported later.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/skbuff.h | 2 +-
include/uapi/linux/bpf.h | 5 +++++
net/core/skbuff.c | 3 +++
tools/include/uapi/linux/bpf.h | 5 +++++
4 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 30901dfb4539..4f38c17c67a7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4578,7 +4578,7 @@ void skb_sw_tstamp_tx(struct sk_buff *orig_skb);
static inline void skb_tx_timestamp(struct sk_buff *skb)
{
skb_clone_tx_timestamp(skb);
- if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
+ if (skb_shinfo(skb)->tx_flags & (SKBTX_SW_TSTAMP | SKBTX_BPF))
skb_sw_tstamp_tx(skb);
}
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 72f93c6e45c1..a6d761f07f67 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7027,6 +7027,11 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_SW_OPT_CB, /* Called when skb is about to send
+ * to the nic when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 169c6d03d698..0fb31df4ed95 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
case SCM_TSTAMP_SCHED:
op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
break;
+ case SCM_TSTAMP_SND:
+ op = BPF_SOCK_OPS_TS_SW_OPT_CB;
+ break;
default:
return;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 891217ab6d2d..73fc0a95c9ca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7020,6 +7020,11 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_SW_OPT_CB, /* Called when skb is about to send
+ * to the nic when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
--
2.43.5
* [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK for bpf extension
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (7 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 23:02 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 10/15] net-timestamp: support hw SCM_TSTAMP_SND " Jason Xing
` (5 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Handle the ACK timestamp case. Testing the SKBTX_BPF flag would
work, but introduce a new txstamp_ack_bpf bit instead to avoid
cache line misses in tcp_ack_tstamp(). To be more specific, in most
cases normal flows do not access skb_shinfo because txstamp_ack is
zero, so this function does not show up in hot spot profiles.
Introducing the new txstamp_ack_bpf member preserves that behaviour.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/net/tcp.h | 3 ++-
include/uapi/linux/bpf.h | 5 +++++
net/core/skbuff.c | 3 +++
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 5 +++++
tools/include/uapi/linux/bpf.h | 5 +++++
6 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b2b04835688..f00a8e3f9b31 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -959,9 +959,10 @@ struct tcp_skb_cb {
__u8 sacked; /* State flags for SACK. */
__u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8 txstamp_ack:1, /* Record TX timestamp for ack? */
+ txstamp_ack_bpf:1, /* ack timestamp for bpf use */
eor:1, /* Is skb MSG_EOR marked? */
has_rxtstamp:1, /* SKB has a RX timestamp */
- unused:5;
+ unused:4;
__u32 ack_seq; /* Sequence number ACK'd */
union {
struct {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a6d761f07f67..a0aff1b4eb61 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7032,6 +7032,11 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_ACK_OPT_CB, /* Called when all the skbs are
+ * acknowledged when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0fb31df4ed95..17b9d8061f04 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5581,6 +5581,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
case SCM_TSTAMP_SND:
op = BPF_SOCK_OPS_TS_SW_OPT_CB;
break;
+ case SCM_TSTAMP_ACK:
+ op = BPF_SOCK_OPS_TS_ACK_OPT_CB;
+ break;
default:
return;
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c2cdd0acb504..0f2e6e73de9f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3323,7 +3323,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
const struct skb_shared_info *shinfo;
/* Avoid cache line misses to get skb_shinfo() and shinfo->tx_flags */
- if (likely(!TCP_SKB_CB(skb)->txstamp_ack))
+ if (likely(!TCP_SKB_CB(skb)->txstamp_ack &&
+ !TCP_SKB_CB(skb)->txstamp_ack_bpf))
return;
shinfo = skb_shinfo(skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7b4d1dfd57d4..aa1da7c89383 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1556,6 +1556,7 @@ static void tcp_adjust_pcount(struct sock *sk, const struct sk_buff *skb, int de
static bool tcp_has_tx_tstamp(const struct sk_buff *skb)
{
return TCP_SKB_CB(skb)->txstamp_ack ||
+ TCP_SKB_CB(skb)->txstamp_ack_bpf ||
(skb_shinfo(skb)->tx_flags & SKBTX_ANY_TSTAMP);
}
@@ -1572,7 +1573,9 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, struct sk_buff *skb2)
shinfo2->tx_flags |= tsflags;
swap(shinfo->tskey, shinfo2->tskey);
TCP_SKB_CB(skb2)->txstamp_ack = TCP_SKB_CB(skb)->txstamp_ack;
+ TCP_SKB_CB(skb2)->txstamp_ack_bpf = TCP_SKB_CB(skb)->txstamp_ack_bpf;
TCP_SKB_CB(skb)->txstamp_ack = 0;
+ TCP_SKB_CB(skb)->txstamp_ack_bpf = 0;
}
}
@@ -3213,6 +3216,8 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb,
shinfo->tskey = next_shinfo->tskey;
TCP_SKB_CB(skb)->txstamp_ack |=
TCP_SKB_CB(next_skb)->txstamp_ack;
+ TCP_SKB_CB(skb)->txstamp_ack_bpf |=
+ TCP_SKB_CB(next_skb)->txstamp_ack_bpf;
}
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 73fc0a95c9ca..0fe7d663a244 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7025,6 +7025,11 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_ACK_OPT_CB, /* Called when all the skbs are
+ * acknowledged when SO_TIMESTAMPING
+ * feature is on. It indicates the
+ * recorded timestamp.
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
--
2.43.5
* [PATCH net-next v5 10/15] net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (8 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace Jason Xing
` (4 subsequent siblings)
14 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
To avoid changing the many driver callers that test SKBTX_HW_TSTAMP,
redefine SKBTX_HW_TSTAMP as (__SKBTX_HW_TSTAMP | SKBTX_BPF) so that
drivers also report the hardware timestamp for the bpf extension.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/skbuff.h | 4 +++-
net/core/skbuff.c | 2 +-
net/socket.c | 2 +-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4f38c17c67a7..d3ef8db94a94 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -470,7 +470,7 @@ struct skb_shared_hwtstamps {
/* Definitions for tx_flags in struct skb_shared_info */
enum {
/* generate hardware time stamp */
- SKBTX_HW_TSTAMP = 1 << 0,
+ __SKBTX_HW_TSTAMP = 1 << 0,
/* generate software time stamp when queueing packet to NIC */
SKBTX_SW_TSTAMP = 1 << 1,
@@ -494,6 +494,8 @@ enum {
SKBTX_BPF = 1 << 7,
};
+#define SKBTX_HW_TSTAMP (__SKBTX_HW_TSTAMP | SKBTX_BPF)
+
#define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \
SKBTX_SCHED_TSTAMP | \
SKBTX_BPF)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 17b9d8061f04..4bc7a424eb8a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5551,7 +5551,7 @@ static bool skb_enable_app_tstamp(struct sk_buff *skb, int tstype, bool sw)
if (sw)
flag = SKBTX_SW_TSTAMP;
else
- flag = SKBTX_HW_TSTAMP;
+ flag = __SKBTX_HW_TSTAMP;
break;
case SCM_TSTAMP_ACK:
if (TCP_SKB_CB(skb)->txstamp_ack)
diff --git a/net/socket.c b/net/socket.c
index 4afe31656a2b..57343341bfb6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -676,7 +676,7 @@ void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags)
u8 flags = *tx_flags;
if (tsflags & SOF_TIMESTAMPING_TX_HARDWARE) {
- flags |= SKBTX_HW_TSTAMP;
+ flags |= __SKBTX_HW_TSTAMP;
/* PTP hardware clocks can provide a free running cycle counter
* as a time base for virtual clocks. Tell driver to use the
--
2.43.5
* [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (9 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 10/15] net-timestamp: support hw SCM_TSTAMP_SND " Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-15 23:05 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 12/15] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
` (3 subsequent siblings)
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
People can follow the three steps below to fetch the shared info
from the exported skb in the bpf prog:
1. skops_kern = bpf_cast_to_kern_ctx(skops);
2. skb = skops_kern->skb;
3. shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
It is worth highlighting that we are then able to fetch the hwtstamp,
tskey and more key information from the skb.
More details can be seen in the last selftest patch of the series; a
minimal sketch of these steps follows.
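A minimal sketch of these steps in a sockops program, assuming
vmlinux.h from a patched kernel, libbpf's bpf_helpers.h and
bpf_core_read.h, and the bpf_cast_to_kern_ctx() kfunc declaration
(the program layout and the printed fields are illustrative; see the
selftest patch for the real usage):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* kfunc declaration, normally pulled in from the selftests' bpf_kfuncs.h */
extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;

SEC("sockops")
int dump_tx_tstamp(struct bpf_sock_ops *skops)
{
	struct bpf_sock_ops_kern *skops_kern;
	struct skb_shared_info *shinfo;
	struct sk_buff *skb;

	if (skops->op != BPF_SOCK_OPS_TS_SW_OPT_CB)
		return 1;

	skops_kern = bpf_cast_to_kern_ctx(skops);	/* step 1 */
	skb = skops_kern->skb;				/* step 2 */
	shinfo = bpf_core_cast(skb->head + skb->end,	/* step 3 */
			       struct skb_shared_info);

	bpf_printk("tskey %u hwtstamp %llu",
		   shinfo->tskey, shinfo->hwtstamps.hwtstamp);
	return 1;
}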
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/core/sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/sock.c b/net/core/sock.c
index dbb9326ae9d1..2f54e60a50d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
if (sk_is_tcp(sk) && sk_fullsock(sk))
sock_ops.is_fullsock = 1;
sock_ops.sk = sk;
+ bpf_skops_init_skb(&sock_ops, skb, 0);
sock_ops.timestamp_used = 1;
__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
}
--
2.43.5
* [PATCH net-next v5 12/15] net-timestamp: make TCP tx timestamp bpf extension work
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (10 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension Jason Xing
` (2 subsequent siblings)
14 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Finally make this part of the feature work. After this, users can
use the bpf prog to trace the whole tx path for TCP.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/ipv4/tcp.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..0a41006b10d1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -492,6 +492,15 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
}
+
+ if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
+ struct skb_shared_info *shinfo = skb_shinfo(skb);
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+
+ tcb->txstamp_ack_bpf = 1;
+ shinfo->tx_flags |= SKBTX_BPF;
+ shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+ }
}
static bool tcp_stream_is_readable(struct sock *sk, int target)
--
2.43.5
* [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (11 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 12/15] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-16 0:03 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 14/15] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 15/15] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
14 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Introduce tskey_bpf to correlate the tcp_sendmsg timestamp with the
other three points (SCHED/SW/ACK). More details can be found in the
selftest.
For TCP, tskey_bpf stores the initial write_seq value at the moment
tcp_sendmsg is called, so that the last skb of this call carries the
same tskey_bpf as the tcp_sendmsg bpf callback.
UDP works similarly because tskey_bpf can increase by one every time
udp_sendmsg is called. That will be implemented soon.
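A rough sketch of one way a program could use tskey_bpf to correlate
the sendmsg callback with the later sw timestamp callback. The map
layout, the write_seq read at BPF_SOCK_OPS_TS_TCP_SND_CB time and the
kfunc declaration are assumptions made for illustration; the real
flow lives in the selftest patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* kfunc declaration, normally pulled in from the selftests' bpf_kfuncs.h */
extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 64);
	__type(key, u32);	/* tskey_bpf, i.e. the initial write_seq */
	__type(value, u64);	/* timestamp taken at the sendmsg callback */
} snd_ts SEC(".maps");

SEC("sockops")
int correlate_tstamp(struct bpf_sock_ops *skops)
{
	struct bpf_sock_ops_kern *kern = bpf_cast_to_kern_ctx(skops);

	if (skops->op == BPF_SOCK_OPS_TS_TCP_SND_CB) {
		/* skb is NULL here; key on the socket's current write_seq,
		 * which is what tskey_bpf will be set to for the last skb.
		 */
		struct tcp_sock *tp = bpf_core_cast(kern->sk, struct tcp_sock);
		u32 key = tp->write_seq;
		u64 now = bpf_ktime_get_ns();

		bpf_map_update_elem(&snd_ts, &key, &now, BPF_ANY);
	} else if (skops->op == BPF_SOCK_OPS_TS_SW_OPT_CB) {
		struct sk_buff *skb = kern->skb;
		struct skb_shared_info *shinfo =
			bpf_core_cast(skb->head + skb->end,
				      struct skb_shared_info);
		u32 key = shinfo->tskey_bpf;
		u64 *start = bpf_map_lookup_elem(&snd_ts, &key);

		if (start)
			bpf_printk("sendmsg -> sw tstamp delta: %llu ns",
				   bpf_ktime_get_ns() - *start);
	}
	return 1;
}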
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/linux/skbuff.h | 2 ++
include/uapi/linux/bpf.h | 3 +++
net/core/sock.c | 3 ++-
net/ipv4/tcp.c | 10 ++++++++--
tools/include/uapi/linux/bpf.h | 3 +++
5 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d3ef8db94a94..3b7b470d5d89 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -609,6 +609,8 @@ struct skb_shared_info {
};
unsigned int gso_type;
u32 tskey;
+ /* For TCP, it records the initial write_seq when sendmsg is called */
+ u32 tskey_bpf;
/*
* Warning : all fields before dataref are cleared in __alloc_skb()
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a0aff1b4eb61..87420c0f2235 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7037,6 +7037,9 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
+ * syscall is triggered
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/sock.c b/net/core/sock.c
index 2f54e60a50d4..e74ab0e2979d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -958,7 +958,8 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
if (sk_is_tcp(sk) && sk_fullsock(sk))
sock_ops.is_fullsock = 1;
sock_ops.sk = sk;
- bpf_skops_init_skb(&sock_ops, skb, 0);
+ if (skb)
+ bpf_skops_init_skb(&sock_ops, skb, 0);
sock_ops.timestamp_used = 1;
__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0a41006b10d1..b6e0db5e4ead 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -477,7 +477,7 @@ void tcp_init_sock(struct sock *sk)
}
EXPORT_SYMBOL(tcp_init_sock);
-static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
+static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc, u32 first_write_seq)
{
struct sk_buff *skb = tcp_write_queue_tail(sk);
u32 tsflags = sockc->tsflags;
@@ -500,6 +500,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
tcb->txstamp_ack_bpf = 1;
shinfo->tx_flags |= SKBTX_BPF;
shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+ shinfo->tskey_bpf = first_write_seq;
}
}
@@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
int flags, err, copied = 0;
int mss_now = 0, size_goal, copied_syn = 0;
int process_backlog = 0;
+ u32 first_write_seq = 0;
int zc = 0;
long timeo;
flags = msg->msg_flags;
+ if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
+ first_write_seq = tp->write_seq;
+ bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
+ }
if ((flags & MSG_ZEROCOPY) && size) {
if (msg->msg_ubuf) {
@@ -1331,7 +1337,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
out:
if (copied) {
- tcp_tx_timestamp(sk, &sockc);
+ tcp_tx_timestamp(sk, &sockc, first_write_seq);
tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
}
out_nopush:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0fe7d663a244..3769e38e052d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7030,6 +7030,9 @@ enum {
* feature is on. It indicates the
* recorded timestamp.
*/
+ BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
+ * syscall is triggered
+ */
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
--
2.43.5
* [PATCH net-next v5 14/15] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (12 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 15/15] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing
14 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Introduce the cgroup_bpf_enabled() check to avoid affecting
applications that are not using the timestamping bpf feature.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/core/skbuff.c | 4 +++-
net/ipv4/tcp.c | 6 ++++--
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 3 ++-
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4bc7a424eb8a..ce445e49ddc1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5602,7 +5602,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (!sk)
return;
- if (skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
+
+ if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+ skb_shinfo(orig_skb)->tx_flags & SKBTX_BPF)
__skb_tstamp_tx_bpf(orig_skb, sk, tstype);
if (!skb_enable_app_tstamp(orig_skb, tstype, sw))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b6e0db5e4ead..07326f56cc42 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -493,7 +493,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc, u32 f
shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
}
- if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
+ if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+ SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
@@ -1073,7 +1074,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
long timeo;
flags = msg->msg_flags;
- if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
+ if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+ SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
first_write_seq = tp->write_seq;
bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0f2e6e73de9f..5493bc911593 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3324,7 +3324,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
/* Avoid cache line misses to get skb_shinfo() and shinfo->tx_flags */
if (likely(!TCP_SKB_CB(skb)->txstamp_ack &&
- !TCP_SKB_CB(skb)->txstamp_ack_bpf))
+ !(cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+ TCP_SKB_CB(skb)->txstamp_ack_bpf)))
return;
shinfo = skb_shinfo(skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index aa1da7c89383..2675540c4faf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1556,7 +1556,8 @@ static void tcp_adjust_pcount(struct sock *sk, const struct sk_buff *skb, int de
static bool tcp_has_tx_tstamp(const struct sk_buff *skb)
{
return TCP_SKB_CB(skb)->txstamp_ack ||
- TCP_SKB_CB(skb)->txstamp_ack_bpf ||
+ (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+ TCP_SKB_CB(skb)->txstamp_ack_bpf) ||
(skb_shinfo(skb)->tx_flags & SKBTX_ANY_TSTAMP);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH net-next v5 15/15] bpf: add simple bpf tests in the tx path for so_timestamping feature
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
` (13 preceding siblings ...)
2025-01-12 11:37 ` [PATCH net-next v5 14/15] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
@ 2025-01-12 11:37 ` Jason Xing
14 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-12 11:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms
Cc: bpf, netdev, Jason Xing
Only check that we pass those three key points (the SCHED, SW and ACK
callbacks) after the bpf extension for so_timestamping is enabled. At
each point, the bpf program can also choose to record the current
timestamp, which the test uses to compute per-stage delays.
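For reference, the new test follows the usual BPF selftest workflow;
assuming a typical setup, it can be run with something like:
	cd tools/testing/selftests/bpf
	make
	./test_progs -t so_timestamping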
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
.../bpf/prog_tests/so_timestamping.c | 95 +++++++++
.../selftests/bpf/progs/so_timestamping.c | 186 ++++++++++++++++++
2 files changed, 281 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c
diff --git a/tools/testing/selftests/bpf/prog_tests/so_timestamping.c b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
new file mode 100644
index 000000000000..77c347701e50
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/so_timestamping.c
@@ -0,0 +1,95 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <net/if.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+
+#include "so_timestamping.skel.h"
+
+#define CG_NAME "/so-timestamping-test"
+
+static const char addr4_str[] = "127.0.0.1";
+static const char addr6_str[] = "::1";
+static struct so_timestamping *skel;
+static int cg_fd;
+
+static int create_netns(void)
+{
+ if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns"))
+ return -1;
+
+ if (!ASSERT_OK(system("ip link set dev lo up"), "set lo up"))
+ return -1;
+
+ return 0;
+}
+
+static void test_tcp(int family)
+{
+ struct so_timestamping__bss *bss = skel->bss;
+ char buf[] = "testing testing";
+ int sfd = -1, cfd = -1;
+ int n;
+
+ memset(bss, 0, sizeof(*bss));
+
+ sfd = start_server(family, SOCK_STREAM,
+ family == AF_INET6 ? addr6_str : addr4_str, 0, 0);
+ if (!ASSERT_GE(sfd, 0, "start_server"))
+ goto out;
+
+ cfd = connect_to_fd(sfd, 0);
+ if (!ASSERT_GE(cfd, 0, "connect_to_fd_server")) {
+ close(sfd);
+ goto out;
+ }
+
+ n = write(cfd, buf, sizeof(buf));
+ if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
+ goto out;
+
+ ASSERT_EQ(bss->nr_active, 1, "nr_active");
+ ASSERT_EQ(bss->nr_tcp, 1, "nr_tcp");
+ ASSERT_EQ(bss->nr_sched, 1, "nr_sched");
+ ASSERT_EQ(bss->nr_txsw, 1, "nr_txsw");
+ ASSERT_EQ(bss->nr_ack, 1, "nr_ack");
+
+out:
+ if (sfd >= 0)
+ close(sfd);
+ if (cfd >= 0)
+ close(cfd);
+}
+
+void test_so_timestamping(void)
+{
+ cg_fd = test__join_cgroup(CG_NAME);
+ if (cg_fd < 0)
+ return;
+
+ if (create_netns())
+ goto done;
+
+ skel = so_timestamping__open();
+ if (!ASSERT_OK_PTR(skel, "open skel"))
+ goto done;
+
+ if (!ASSERT_OK(so_timestamping__load(skel), "load skel"))
+ goto done;
+
+ skel->links.skops_sockopt =
+ bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd);
+ if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup"))
+ goto done;
+
+ test_tcp(AF_INET6);
+ test_tcp(AF_INET);
+
+done:
+ so_timestamping__destroy(skel);
+ close(cg_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/so_timestamping.c b/tools/testing/selftests/bpf/progs/so_timestamping.c
new file mode 100644
index 000000000000..db0582564fba
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/so_timestamping.c
@@ -0,0 +1,186 @@
+#include "vmlinux.h"
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+//#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+#include "bpf_kfuncs.h"
+
+#define SK_BPF_CB_FLAGS 1009
+#define SK_BPF_CB_TX_TIMESTAMPING 1
+
+int nr_active;
+int nr_tcp;
+int nr_passive;
+int nr_sched;
+int nr_txsw;
+int nr_ack;
+
+struct sockopt_test {
+ int opt;
+ int new;
+};
+
+static const struct sockopt_test sol_socket_tests[] = {
+ { .opt = SK_BPF_CB_FLAGS, .new = SK_BPF_CB_TX_TIMESTAMPING, },
+ { .opt = 0, },
+};
+
+struct loop_ctx {
+ void *ctx;
+ const struct sock *sk;
+};
+
+struct delay_info {
+ u64 sendmsg_ns; /* record ts when sendmsg is called */
+ u32 sched_delay; /* SCHED_OPT_CB - sendmsg_ns */
+ u32 sw_snd_delay; /* SW_OPT_CB - SCHED_OPT_CB */
+ u32 ack_delay; /* ACK_OPT_CB - SW_OPT_CB */
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __type(key, u32);
+ __type(value, struct delay_info);
+ __uint(max_entries, 1024);
+} time_map SEC(".maps");
+
+static u64 delay_tolerance_nsec = 1000000000; /* 1 second as an example */
+
+static int bpf_test_sockopt_int(void *ctx, const struct sock *sk,
+ const struct sockopt_test *t,
+ int level)
+{
+ int new, opt;
+
+ opt = t->opt;
+ new = t->new;
+
+ if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
+ return 1;
+
+ return 0;
+}
+
+static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc)
+{
+ const struct sockopt_test *t;
+
+ if (i >= ARRAY_SIZE(sol_socket_tests))
+ return 1;
+
+ t = &sol_socket_tests[i];
+ if (!t->opt)
+ return 1;
+
+ return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET);
+}
+
+static int bpf_test_sockopt(void *ctx, const struct sock *sk)
+{
+ struct loop_ctx lc = { .ctx = ctx, .sk = sk, };
+ int n;
+
+ n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0);
+ if (n != ARRAY_SIZE(sol_socket_tests))
+ return -1;
+
+ return 0;
+}
+
+static bool bpf_test_delay(struct bpf_sock_ops *skops, const struct sock *sk)
+{
+ const struct tcp_sock *tp = tcp_sk(sk);
+ struct bpf_sock_ops_kern *skops_kern;
+ u64 timestamp = bpf_ktime_get_ns();
+ struct skb_shared_info *shinfo;
+ struct delay_info dinfo = {0};
+ struct delay_info *val;
+ struct sk_buff *skb;
+ u32 delay, tskey;
+ u64 prior_ts;
+
+ skops_kern = bpf_cast_to_kern_ctx(skops);
+ skb = skops_kern->skb;
+ if (skb) {
+ shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
+ tskey = shinfo->tskey_bpf;
+ } else if (skops->op == BPF_SOCK_OPS_TS_TCP_SND_CB) {
+ dinfo.sendmsg_ns = timestamp;
+ tskey = tp->write_seq;
+ val = &dinfo;
+ goto out;
+ } else {
+ return false;
+ }
+
+ val = bpf_map_lookup_elem(&time_map, &tskey);
+ if (!val)
+ return false;
+
+ switch (skops->op) {
+ case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
+ delay = val->sched_delay = timestamp - val->sendmsg_ns;
+ break;
+ case BPF_SOCK_OPS_TS_SW_OPT_CB:
+ prior_ts = val->sched_delay + val->sendmsg_ns;
+ delay = val->sw_snd_delay = timestamp - prior_ts;
+ break;
+ case BPF_SOCK_OPS_TS_ACK_OPT_CB:
+ prior_ts = val->sw_snd_delay + val->sched_delay + val->sendmsg_ns;
+ delay = val->ack_delay = timestamp - prior_ts;
+ break;
+ }
+
+ if (delay <= 0 || delay >= delay_tolerance_nsec)
+ return false;
+
+ /* Since it's the last one, remove from the map after latency check */
+ if (skops->op == BPF_SOCK_OPS_TS_ACK_OPT_CB)
+ bpf_map_delete_elem(&time_map, &tskey);
+
+out:
+ bpf_map_update_elem(&time_map, &tskey, val, BPF_ANY);
+ return true;
+}
+
+SEC("sockops")
+int skops_sockopt(struct bpf_sock_ops *skops)
+{
+ struct bpf_sock *bpf_sk = skops->sk;
+ const struct sock *sk;
+
+ if (!bpf_sk)
+ return 1;
+
+ sk = (struct sock *)bpf_skc_to_tcp_sock(bpf_sk);
+ if (!sk)
+ return 1;
+
+ switch (skops->op) {
+ case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+ nr_active += !bpf_test_sockopt(skops, sk);
+ break;
+ case BPF_SOCK_OPS_TS_TCP_SND_CB:
+ if (bpf_test_delay(skops, sk))
+ nr_tcp += 1;
+ break;
+ case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
+ if (bpf_test_delay(skops, sk))
+ nr_sched += 1;
+ break;
+ case BPF_SOCK_OPS_TS_SW_OPT_CB:
+ if (bpf_test_delay(skops, sk))
+ nr_txsw += 1;
+ break;
+ case BPF_SOCK_OPS_TS_ACK_OPT_CB:
+ if (bpf_test_delay(skops, sk))
+ nr_ack += 1;
+ break;
+ }
+
+ return 1;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.43.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
@ 2025-01-12 14:37 ` kernel test robot
2025-01-13 0:28 ` Jason Xing
2025-01-15 21:48 ` Martin KaFai Lau
2025-01-17 10:18 ` kernel test robot
2 siblings, 1 reply; 61+ messages in thread
From: kernel test robot @ 2025-01-12 14:37 UTC (permalink / raw)
To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, horms
Cc: llvm, oe-kbuild-all, bpf, netdev, Jason Xing
Hi Jason,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
base: net-next/main
patch link: https://lore.kernel.org/r/20250112113748.73504-6-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
config: i386-buildonly-randconfig-006-20250112 (https://download.01.org/0day-ci/archive/20250112/202501122251.7G2Wsbzx-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250112/202501122251.7G2Wsbzx-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501122251.7G2Wsbzx-lkp@intel.com/
All warnings (new ones prefixed by >>):
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:4863:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
4863 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:4891:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
4891 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5063:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5063 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5077:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5077 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5126:45: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5126 | .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_MAYBE_NULL,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
net/core/filter.c:5592:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5592 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5626:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5626 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5660:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5660 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5703:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5703 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:5880:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
5880 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6417:46: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6417 | .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_WRITE | MEM_ALIGNED,
| ~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~
net/core/filter.c:6429:46: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6429 | .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_WRITE | MEM_ALIGNED,
| ~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~
net/core/filter.c:6515:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6515 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6525:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6525 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6569:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6569 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6658:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6658 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6902:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6902 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6921:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6921 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6940:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6940 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6964:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6964 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:6988:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
6988 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7012:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7012 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7029:45: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7029 | .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~
net/core/filter.c:7050:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7050 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7074:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7074 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7098:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7098 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7118:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7118 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7137:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7137 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7156:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7156 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7474:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7474 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7476:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7476 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7543:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7543 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
net/core/filter.c:7545:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7545 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
>> net/core/filter.c:7631:19: warning: result of comparison of constant 'SK_BPF_CB_FLAGS' (1009) with expression of type 'u8' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
7631 | if (bpf_sock->op != SK_BPF_CB_FLAGS)
| ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~
net/core/filter.c:7777:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
7777 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
41 warnings generated.
vim +7631 net/core/filter.c
7622
7623 BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
7624 void *, search_res, u32, len, u64, flags)
7625 {
7626 bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
7627 const u8 *op, *opend, *magic, *search = search_res;
7628 u8 search_kind, search_len, copy_len, magic_len;
7629 int ret;
7630
> 7631 if (bpf_sock->op != SK_BPF_CB_FLAGS)
7632 return -EINVAL;
7633
7634 /* 2 byte is the minimal option len except TCPOPT_NOP and
7635 * TCPOPT_EOL which are useless for the bpf prog to learn
7636 * and this helper disallow loading them also.
7637 */
7638 if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
7639 return -EINVAL;
7640
7641 search_kind = search[0];
7642 search_len = search[1];
7643
7644 if (search_len > len || search_kind == TCPOPT_NOP ||
7645 search_kind == TCPOPT_EOL)
7646 return -EINVAL;
7647
7648 if (search_kind == TCPOPT_EXP || search_kind == 253) {
7649 /* 16 or 32 bit magic. +2 for kind and kind length */
7650 if (search_len != 4 && search_len != 6)
7651 return -EINVAL;
7652 magic = &search[2];
7653 magic_len = search_len - 2;
7654 } else {
7655 if (search_len)
7656 return -EINVAL;
7657 magic = NULL;
7658 magic_len = 0;
7659 }
7660
7661 if (load_syn) {
7662 ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op);
7663 if (ret < 0)
7664 return ret;
7665
7666 opend = op + ret;
7667 op += sizeof(struct tcphdr);
7668 } else {
7669 if (!bpf_sock->skb ||
7670 bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB)
7671 /* This bpf_sock->op cannot call this helper */
7672 return -EPERM;
7673
7674 opend = bpf_sock->skb_data_end;
7675 op = bpf_sock->skb->data + sizeof(struct tcphdr);
7676 }
7677
7678 op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
7679 &eol);
7680 if (IS_ERR(op))
7681 return PTR_ERR(op);
7682
7683 copy_len = op[1];
7684 ret = copy_len;
7685 if (copy_len > len) {
7686 ret = -ENOSPC;
7687 copy_len = len;
7688 }
7689
7690 memcpy(search_res, op, copy_len);
7691 return ret;
7692 }
7693
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
@ 2025-01-12 14:49 ` kernel test robot
2025-01-13 0:11 ` Jason Xing
2025-01-14 23:20 ` Martin KaFai Lau
1 sibling, 1 reply; 61+ messages in thread
From: kernel test robot @ 2025-01-12 14:49 UTC (permalink / raw)
To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, horms
Cc: oe-kbuild-all, bpf, netdev, Jason Xing
Hi Jason,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
base: net-next/main
patch link: https://lore.kernel.org/r/20250112113748.73504-2-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
config: i386-buildonly-randconfig-005-20250112 (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501122252.dqEPb1Wd-lkp@intel.com/
All warnings (new ones prefixed by >>):
net/core/filter.c: In function 'sk_bpf_set_cb_flags':
net/core/filter.c:5237:11: error: 'struct sock' has no member named 'sk_bpf_cb_flags'
5237 | sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
| ^~
net/core/filter.c: At top level:
>> net/core/filter.c:5225:12: warning: 'sk_bpf_set_cb_flags' defined but not used [-Wunused-function]
5225 | static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
| ^~~~~~~~~~~~~~~~~~~
vim +/sk_bpf_set_cb_flags +5225 net/core/filter.c
5224
> 5225 static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
5226 {
5227 u32 sk_bpf_cb_flags;
5228
5229 if (getopt)
5230 return -EINVAL;
5231
5232 sk_bpf_cb_flags = *(u32 *)optval;
5233
5234 if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
5235 return -EINVAL;
5236
> 5237 sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
5238
5239 return 0;
5240 }
5241
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-12 14:49 ` kernel test robot
@ 2025-01-13 0:11 ` Jason Xing
2025-01-13 7:32 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-13 0:11 UTC (permalink / raw)
To: kernel test robot
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms,
oe-kbuild-all, bpf, netdev
Thanks for the report.
On Sun, Jan 12, 2025 at 10:50 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Jason,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on net-next/main]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
> base: net-next/main
> patch link: https://lore.kernel.org/r/20250112113748.73504-2-kerneljasonxing%40gmail.com
> patch subject: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
> config: i386-buildonly-randconfig-005-20250112 (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501122252.dqEPb1Wd-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> net/core/filter.c: In function 'sk_bpf_set_cb_flags':
> net/core/filter.c:5237:11: error: 'struct sock' has no member named 'sk_bpf_cb_flags'
> 5237 | sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
> | ^~
Strange. In this series, I have already ensured that the caller of
sk_bpf_set_cb_flags() is protected by CONFIG_BPF_SYSCALL, the same
config that guards sk_bpf_cb_flags.
I don't see how the sk_bpf_cb_flags field can be accessed if the
function it belongs to is never used; see the "defined but not used"
warning below.
Thanks,
Jason
> net/core/filter.c: At top level:
> >> net/core/filter.c:5225:12: warning: 'sk_bpf_set_cb_flags' defined but not used [-Wunused-function]
> 5225 | static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
> | ^~~~~~~~~~~~~~~~~~~
>
>
> vim +/sk_bpf_set_cb_flags +5225 net/core/filter.c
>
> 5224
> > 5225 static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
> 5226 {
> 5227 u32 sk_bpf_cb_flags;
> 5228
> 5229 if (getopt)
> 5230 return -EINVAL;
> 5231
> 5232 sk_bpf_cb_flags = *(u32 *)optval;
> 5233
> 5234 if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
> 5235 return -EINVAL;
> 5236
> > 5237 sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
> 5238
> 5239 return 0;
> 5240 }
> 5241
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-12 14:37 ` kernel test robot
@ 2025-01-13 0:28 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-13 0:28 UTC (permalink / raw)
To: kernel test robot
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms,
llvm, oe-kbuild-all, bpf, netdev
On Sun, Jan 12, 2025 at 10:39 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Jason,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on net-next/main]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
> base: net-next/main
> patch link: https://lore.kernel.org/r/20250112113748.73504-6-kerneljasonxing%40gmail.com
> patch subject: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
> config: i386-buildonly-randconfig-006-20250112 (https://download.01.org/0day-ci/archive/20250112/202501122251.7G2Wsbzx-lkp@intel.com/config)
> compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250112/202501122251.7G2Wsbzx-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202501122251.7G2Wsbzx-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:4863:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 4863 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:4891:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 4891 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5063:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5063 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5077:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5077 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5126:45: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5126 | .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_MAYBE_NULL,
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
> net/core/filter.c:5592:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5592 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5626:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5626 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5660:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5660 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5703:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5703 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:5880:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 5880 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6417:46: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6417 | .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_WRITE | MEM_ALIGNED,
> | ~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~
> net/core/filter.c:6429:46: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6429 | .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_WRITE | MEM_ALIGNED,
> | ~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~
> net/core/filter.c:6515:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6515 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6525:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6525 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6569:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6569 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6658:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6658 | .arg3_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6902:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6902 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6921:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6921 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6940:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6940 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6964:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6964 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:6988:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 6988 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7012:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7012 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7029:45: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7029 | .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~
> net/core/filter.c:7050:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7050 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7074:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7074 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7098:35: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7098 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7118:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7118 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7137:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7137 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7156:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7156 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7474:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7474 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7476:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7476 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7543:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7543 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> net/core/filter.c:7545:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7545 | .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> >> net/core/filter.c:7631:19: warning: result of comparison of constant 'SK_BPF_CB_FLAGS' (1009) with expression of type 'u8' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
> 7631 | if (bpf_sock->op != SK_BPF_CB_FLAGS)
> | ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~
> net/core/filter.c:7777:30: warning: bitwise operation between different enumeration types ('enum bpf_arg_type' and 'enum bpf_type_flag') [-Wenum-enum-conversion]
> 7777 | .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
> 41 warnings generated.
>
>
> vim +7631 net/core/filter.c
>
> 7622
> 7623 BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
> 7624 void *, search_res, u32, len, u64, flags)
> 7625 {
> 7626 bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
> 7627 const u8 *op, *opend, *magic, *search = search_res;
> 7628 u8 search_kind, search_len, copy_len, magic_len;
> 7629 int ret;
> 7630
> > 7631 if (bpf_sock->op != SK_BPF_CB_FLAGS)
Oops, I realized that SK_BPF_CB_FLAGS cannot be used as an "op" value. I'll
aggregate all the callbacks used by timestamping and test against them here,
as in the following patch, to avoid calling these helpers from the context
of a timestamping callback.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 87420c0f2235..9e6a782b4042 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7022,6 +7022,10 @@ enum {
* by the kernel or the
* earlier bpf-progs.
*/
+#define BPF_SOCK_OPTS_TS (BPF_SOCK_OPS_TS_SCHED_OPT_CB | \
+ BPF_SOCK_OPS_TS_SW_OPT_CB | \
+ BPF_SOCK_OPS_TS_ACK_OPT_CB | \
+ BPF_SOCK_OPS_TS_TCP_SND_CB)
BPF_SOCK_OPS_TS_SCHED_OPT_CB, /* Called when skb is passing through
* dev layer when SO_TIMESTAMPING
* feature is on. It indicates the
diff --git a/net/core/filter.c b/net/core/filter.c
index 517f09aabc92..1fcd88b558f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7628,7 +7628,7 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct
bpf_sock_ops_kern *, bpf_sock,
u8 search_kind, search_len, copy_len, magic_len;
int ret;
- if (bpf_sock->op != SK_BPF_CB_FLAGS)
+ if (bpf_sock->op != BPF_SOCK_OPTS_TS)
return -EINVAL;
/* 2 byte is the minimal option len except TCPOPT_NOP and
Thanks,
Jason
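Note: the BPF_SOCK_OPS_* operators are sequential enum constants rather
than bit flags, so OR-ing several of them into one constant and comparing
with "!=" cannot test for membership. A minimal, untested sketch of an
explicit per-op check instead; the helper name is only illustrative and
not part of the series:
static bool bpf_sock_ops_in_ts_cb(const struct bpf_sock_ops_kern *bpf_sock)
{
	/* true when running from one of the timestamping callbacks */
	switch (bpf_sock->op) {
	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	case BPF_SOCK_OPS_TS_SW_OPT_CB:
	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
	case BPF_SOCK_OPS_TS_TCP_SND_CB:
		return true;
	default:
		return false;
	}
}
	/* then, e.g. in bpf_sock_ops_load_hdr_opt(): */
	if (bpf_sock_ops_in_ts_cb(bpf_sock))
		return -EPERM;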
> 7632 return -EINVAL;
> 7633
> 7634 /* 2 byte is the minimal option len except TCPOPT_NOP and
> 7635 * TCPOPT_EOL which are useless for the bpf prog to learn
> 7636 * and this helper disallow loading them also.
> 7637 */
> 7638 if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
> 7639 return -EINVAL;
> 7640
> 7641 search_kind = search[0];
> 7642 search_len = search[1];
> 7643
> 7644 if (search_len > len || search_kind == TCPOPT_NOP ||
> 7645 search_kind == TCPOPT_EOL)
> 7646 return -EINVAL;
> 7647
> 7648 if (search_kind == TCPOPT_EXP || search_kind == 253) {
> 7649 /* 16 or 32 bit magic. +2 for kind and kind length */
> 7650 if (search_len != 4 && search_len != 6)
> 7651 return -EINVAL;
> 7652 magic = &search[2];
> 7653 magic_len = search_len - 2;
> 7654 } else {
> 7655 if (search_len)
> 7656 return -EINVAL;
> 7657 magic = NULL;
> 7658 magic_len = 0;
> 7659 }
> 7660
> 7661 if (load_syn) {
> 7662 ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op);
> 7663 if (ret < 0)
> 7664 return ret;
> 7665
> 7666 opend = op + ret;
> 7667 op += sizeof(struct tcphdr);
> 7668 } else {
> 7669 if (!bpf_sock->skb ||
> 7670 bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB)
> 7671 /* This bpf_sock->op cannot call this helper */
> 7672 return -EPERM;
> 7673
> 7674 opend = bpf_sock->skb_data_end;
> 7675 op = bpf_sock->skb->data + sizeof(struct tcphdr);
> 7676 }
> 7677
> 7678 op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
> 7679 &eol);
> 7680 if (IS_ERR(op))
> 7681 return PTR_ERR(op);
> 7682
> 7683 copy_len = op[1];
> 7684 ret = copy_len;
> 7685 if (copy_len > len) {
> 7686 ret = -ENOSPC;
> 7687 copy_len = len;
> 7688 }
> 7689
> 7690 memcpy(search_res, op, copy_len);
> 7691 return ret;
> 7692 }
> 7693
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-13 0:11 ` Jason Xing
@ 2025-01-13 7:32 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-13 7:32 UTC (permalink / raw)
To: kernel test robot
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, horms,
oe-kbuild-all, bpf, netdev
On Mon, Jan 13, 2025 at 8:11 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Thanks for the report.
>
> On Sun, Jan 12, 2025 at 10:50 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Jason,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on net-next/main]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
> > base: net-next/main
> > patch link: https://lore.kernel.org/r/20250112113748.73504-2-kerneljasonxing%40gmail.com
> > patch subject: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
> > config: i386-buildonly-randconfig-005-20250112 (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/config)
> > compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250112/202501122252.dqEPb1Wd-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202501122252.dqEPb1Wd-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > net/core/filter.c: In function 'sk_bpf_set_cb_flags':
> > net/core/filter.c:5237:11: error: 'struct sock' has no member named 'sk_bpf_cb_flags'
> > 5237 | sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
> > | ^~
>
> Strange. In this series, I've already ensured that the caller of
> sk_bpf_set_cb_flags is protected under CONFIG_BPF_SYSCALL, which is
> the same as sk_bpf_cb_flags.
>
> I wonder how it accesses the sk_bpf_cb_flags field if the function it
> belongs to is not used, see more details as below (like "defined but
> not used").
Well, I still can't figure out how it happened. In the next respin, I
will try "#ifdef CONFIG_BPF", which is what bpf_sock_ops_cb_flags uses.
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
2025-01-12 14:49 ` kernel test robot
@ 2025-01-14 23:20 ` Martin KaFai Lau
2025-01-14 23:29 ` Jason Xing
1 sibling, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-14 23:20 UTC (permalink / raw)
To: Jason Xing
Cc: bpf, netdev, davem, edumazet, kuba, pabeni, dsahern,
willemdebruijn.kernel, willemb, ast, daniel, andrii, eddyz87,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
horms
On 1/12/25 3:37 AM, Jason Xing wrote:
> Users can write the following code to enable the bpf extension:
> bpf_setsockopt(skops, SOL_SOCKET, SK_BPF_CB_FLAGS, &flags, sizeof(flags));
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> include/net/sock.h | 7 +++++++
> include/uapi/linux/bpf.h | 8 ++++++++
> net/core/filter.c | 25 +++++++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 1 +
> 4 files changed, 41 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index ccf86c8a7a8a..f5447b4b78fd 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -303,6 +303,7 @@ struct sk_filter;
> * @sk_stamp: time stamp of last packet received
> * @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
> * @sk_tsflags: SO_TIMESTAMPING flags
> + * @sk_bpf_cb_flags: used for bpf_setsockopt
> * @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
> * Sockets that can be used under memory reclaim should
> * set this to false.
> @@ -445,6 +446,12 @@ struct sock {
> u32 sk_reserved_mem;
> int sk_forward_alloc;
> u32 sk_tsflags;
> +#ifdef CONFIG_BPF_SYSCALL
CONFIG_BPF is what the existing "u8 bpf_sock_ops_cb_flags;" in tcp_sock uses
instead. afaik, CONFIG_BPF is selected by CONFIG_NET, which is why the test bot
fails when CONFIG_BPF_SYSCALL is used here but not with the existing
bpf_sock_ops_cb_flags. Considering CONFIG_BPF is also mostly useless here
because of CONFIG_NET, I would remove this ifdef usage altogether. If there is
really a need to distinguish whether CONFIG_BPF_SYSCALL is enabled or not, that
can be improved together with the existing bpf_sock_ops_cb_flags.
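A minimal sketch of the two shapes being compared here, paraphrased rather
than copied from the tree:
/* what the existing flags field in struct tcp_sock does today */
#ifdef CONFIG_BPF
	u8	bpf_sock_ops_cb_flags;	/* effectively always built when CONFIG_NET=y */
#endif
/* the simpler shape suggested above for struct sock: no guard at all */
	u32	sk_bpf_cb_flags;
#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))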
> +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> + u32 sk_bpf_cb_flags;
> +#else
> +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) 0
> +#endif
> __cacheline_group_end(sock_write_rxtx);
>
> __cacheline_group_begin(sock_write_tx);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4162afc6b5d0..e629e09b0b31 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6903,6 +6903,13 @@ enum {
> BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
> };
>
> +/* Definitions for bpf_sk_cb_flags */
> +enum {
> + SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
> + SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
> + SK_BPF_CB_TX_TIMESTAMPING
> +};
> +
> /* List of known BPF sock_ops operators.
> * New entries can only be added at the end
> */
> @@ -7081,6 +7088,7 @@ enum {
> TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
> TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
> TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
> + SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
> };
>
> enum {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b957cf57299e..c6dd2d2e44c8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5222,6 +5222,23 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> .arg1_type = ARG_PTR_TO_CTX,
> };
>
> +static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
> +{
> + u32 sk_bpf_cb_flags;
> +
> + if (getopt)
I may have this in my earlier sample code? This is probably because of my
laziness for a quick example. getopt should also be supported, similar to the
existing TCP_BPF_SOCK_OPS_CB_FLAGS.
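Something along these lines should be enough, mirroring how the set path of
the quoted function already copies through optval (an untested sketch):
static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
{
	u32 sk_bpf_cb_flags;

	if (getopt) {
		/* report the currently set flags back to the caller */
		*(u32 *)optval = sk->sk_bpf_cb_flags;
		return 0;
	}

	sk_bpf_cb_flags = *(u32 *)optval;

	if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
		return -EINVAL;

	sk->sk_bpf_cb_flags = sk_bpf_cb_flags;

	return 0;
}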
> + return -EINVAL;
> +
> + sk_bpf_cb_flags = *(u32 *)optval;
> +
> + if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
> + return -EINVAL;
> +
> + sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
> +
> + return 0;
> +}
> +
> static int sol_socket_sockopt(struct sock *sk, int optname,
> char *optval, int *optlen,
> bool getopt)
> @@ -5238,6 +5255,7 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> case SO_MAX_PACING_RATE:
> case SO_BINDTOIFINDEX:
> case SO_TXREHASH:
> + case SK_BPF_CB_FLAGS:
> if (*optlen != sizeof(int))
> return -EINVAL;
> break;
> @@ -5247,6 +5265,13 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> return -EINVAL;
> }
>
> + if (optname == SK_BPF_CB_FLAGS)
> +#ifdef CONFIG_BPF_SYSCALL
> + return sk_bpf_set_cb_flags(sk, optval, getopt);
> +#else
> + return -EINVAL;
> +#endif
> +
> if (getopt) {
> if (optname == SO_BINDTODEVICE)
> return -EINVAL;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4162afc6b5d0..6b0a5b787b12 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7081,6 +7081,7 @@ enum {
> TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
> TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
> TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
> + SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
> };
>
> enum {
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt()
2025-01-14 23:20 ` Martin KaFai Lau
@ 2025-01-14 23:29 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-14 23:29 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, davem, edumazet, kuba, pabeni, dsahern,
willemdebruijn.kernel, willemb, ast, daniel, andrii, eddyz87,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
horms
On Wed, Jan 15, 2025 at 7:20 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > Users can write the following code to enable the bpf extension:
> > bpf_setsockopt(skops, SOL_SOCKET, SK_BPF_CB_FLAGS, &flags, sizeof(flags));
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > include/net/sock.h | 7 +++++++
> > include/uapi/linux/bpf.h | 8 ++++++++
> > net/core/filter.c | 25 +++++++++++++++++++++++++
> > tools/include/uapi/linux/bpf.h | 1 +
> > 4 files changed, 41 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index ccf86c8a7a8a..f5447b4b78fd 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -303,6 +303,7 @@ struct sk_filter;
> > * @sk_stamp: time stamp of last packet received
> > * @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
> > * @sk_tsflags: SO_TIMESTAMPING flags
> > + * @sk_bpf_cb_flags: used for bpf_setsockopt
> > * @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
> > * Sockets that can be used under memory reclaim should
> > * set this to false.
> > @@ -445,6 +446,12 @@ struct sock {
> > u32 sk_reserved_mem;
> > int sk_forward_alloc;
> > u32 sk_tsflags;
> > +#ifdef CONFIG_BPF_SYSCALL
>
> CONFIG_BPF is what the existing "u8 bpf_sock_ops_cb_flags;" in tcp_sock uses
> instead. afaik, CONFIG_BPF is selected by CONFIG_NET, which is why the test bot
> fails when CONFIG_BPF_SYSCALL is used here but not with the existing
> bpf_sock_ops_cb_flags. Considering CONFIG_BPF is also mostly useless here
> because of CONFIG_NET, I would remove this ifdef usage altogether. If there is
> really a need to distinguish whether CONFIG_BPF_SYSCALL is enabled or not, that
> can be improved together with the existing bpf_sock_ops_cb_flags.
Thank you, Martin. Then I will remove the ifdef-related guards as suggested.
>
> > +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> > + u32 sk_bpf_cb_flags;
> > +#else
> > +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) 0
> > +#endif
> > __cacheline_group_end(sock_write_rxtx);
> >
> > __cacheline_group_begin(sock_write_tx);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4162afc6b5d0..e629e09b0b31 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6903,6 +6903,13 @@ enum {
> > BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
> > };
> >
> > +/* Definitions for bpf_sk_cb_flags */
> > +enum {
> > + SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
> > + SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
> > + SK_BPF_CB_TX_TIMESTAMPING
> > +};
> > +
> > /* List of known BPF sock_ops operators.
> > * New entries can only be added at the end
> > */
> > @@ -7081,6 +7088,7 @@ enum {
> > TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
> > TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
> > TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
> > + SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
> > };
> >
> > enum {
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index b957cf57299e..c6dd2d2e44c8 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5222,6 +5222,23 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> > .arg1_type = ARG_PTR_TO_CTX,
> > };
> >
> > +static int sk_bpf_set_cb_flags(struct sock *sk, char *optval, bool getopt)
> > +{
> > + u32 sk_bpf_cb_flags;
> > +
> > + if (getopt)
>
> I may have this in my earlier sample code? This is probably because of my
> laziness for a quick example. getopt should also be supported, similar to the
> existing TCP_BPF_SOCK_OPS_CB_FLAGS.
Right, to be honest, I was curious about this one but had no clue. I
will support getopt, which should be easy :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-12 11:37 ` [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use Jason Xing
@ 2025-01-14 23:39 ` Martin KaFai Lau
2025-01-15 0:09 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-14 23:39 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> Later, I would introduce three points to report some information
> to user space based on this.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> include/net/sock.h | 7 +++++++
> net/core/sock.c | 14 ++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f5447b4b78fd..dd874e8337c0 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> struct so_timestamping timestamping);
>
> void sock_enable_timestamps(struct sock *sk);
> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> +#else
> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> +{
> +}
> +#endif
> void sock_no_linger(struct sock *sk);
> void sock_set_keepalive(struct sock *sk);
> void sock_set_priority(struct sock *sk, u32 priority);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index eae2ae70a2e0..e06bcafb1b2d 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> return 0;
> }
>
> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> +{
> + struct bpf_sock_ops_kern sock_ops;
> +
> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> + sock_ops.op = op;
> + if (sk_is_tcp(sk) && sk_fullsock(sk))
> + sock_ops.is_fullsock = 1;
> + sock_ops.sk = sk;
> + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
hmm... I think I have already mentioned it in the earlier revision
(https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
__cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
sk_to_full_sk() is used to get back the listener. For other mini socks,
it needs to skip calling the cgroup bpf prog. I still don't understand
why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
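Roughly, the pattern in BPF_CGROUP_RUN_PROG_SOCK_OPS (paraphrased from
include/linux/bpf-cgroup.h, not verbatim) looks like this:
	if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && sock_ops->sk) {
		struct sock *fsk = sk_to_full_sk(sock_ops->sk);

		/* run the cgroup prog only for full sockets (including the
		 * listener recovered from a request sock); skip mini socks
		 */
		if (fsk && sk_fullsock(fsk))
			ret = __cgroup_bpf_run_filter_sock_ops(fsk, sock_ops,
							       CGROUP_SOCK_OPS);
	}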
> +}
> +#endif
> +
> void sock_set_keepalive(struct sock *sk)
> {
> lock_sock(sk);
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-14 23:39 ` Martin KaFai Lau
@ 2025-01-15 0:09 ` Jason Xing
2025-01-15 0:15 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 0:09 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 7:40 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > Later, I would introduce three points to report some information
> > to user space based on this.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > include/net/sock.h | 7 +++++++
> > net/core/sock.c | 14 ++++++++++++++
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index f5447b4b78fd..dd874e8337c0 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > struct so_timestamping timestamping);
> >
> > void sock_enable_timestamps(struct sock *sk);
> > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> > +#else
> > +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > +{
> > +}
> > +#endif
> > void sock_no_linger(struct sock *sk);
> > void sock_set_keepalive(struct sock *sk);
> > void sock_set_priority(struct sock *sk, u32 priority);
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index eae2ae70a2e0..e06bcafb1b2d 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > return 0;
> > }
> >
> > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > +{
> > + struct bpf_sock_ops_kern sock_ops;
> > +
> > + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > + sock_ops.op = op;
> > + if (sk_is_tcp(sk) && sk_fullsock(sk))
> > + sock_ops.is_fullsock = 1;
> > + sock_ops.sk = sk;
> > + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
>
> hmm... I think I have already mentioned it in the earlier revision
> (https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
Right, sorry, but I deleted it intentionally.
>
> __cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
Well, I don't understand it; BPF_CGROUP_RUN_PROG_SOCK_OPS_SK() doesn't
need to check whether it is a fullsock or not.
> Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
> sk_to_full_sk() is used to get back the listener. For other mini socks,
> it needs to skip calling the cgroup bpf prog. I still don't understand
> why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
Sorry, I got lost here. BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
udp, right? And I think we've discussed that we have to get rid of the
limitation of fullsock.
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-15 0:09 ` Jason Xing
@ 2025-01-15 0:15 ` Jason Xing
2025-01-15 0:26 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 0:15 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 8:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 7:40 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/12/25 3:37 AM, Jason Xing wrote:
> > > Later, I would introduce three points to report some information
> > > to user space based on this.
> > >
> > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > ---
> > > include/net/sock.h | 7 +++++++
> > > net/core/sock.c | 14 ++++++++++++++
> > > 2 files changed, 21 insertions(+)
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index f5447b4b78fd..dd874e8337c0 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > > struct so_timestamping timestamping);
> > >
> > > void sock_enable_timestamps(struct sock *sk);
> > > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> > > +#else
> > > +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > > +{
> > > +}
> > > +#endif
> > > void sock_no_linger(struct sock *sk);
> > > void sock_set_keepalive(struct sock *sk);
> > > void sock_set_priority(struct sock *sk, u32 priority);
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index eae2ae70a2e0..e06bcafb1b2d 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > > return 0;
> > > }
> > >
> > > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > > +{
> > > + struct bpf_sock_ops_kern sock_ops;
> > > +
> > > + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > > + sock_ops.op = op;
> > > + if (sk_is_tcp(sk) && sk_fullsock(sk))
> > > + sock_ops.is_fullsock = 1;
> > > + sock_ops.sk = sk;
> > > + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> >
> > hmm... I think I have already mentioned it in the earlier revision
> > (https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
>
> Right, sorry, but I deleted it intentionally.
>
> >
> > __cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
>
> Well, I don't understand it, BPF_CGROUP_RUN_PROG_SOCK_OPS_SK() don't
> need to check whether it is fullsock or not.
>
> > Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
> > sk_to_full_sk() is used to get back the listener. For other mini socks,
> > it needs to skip calling the cgroup bpf prog. I still don't understand
> > why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
>
> Sorry, I got lost here. BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
> udp, right? And I think we've discussed that we have to get rid of the
> limitation of fullsock.
To support the udp case, I think I can add the following check around
__cgroup_bpf_run_filter_sock_ops() instead of directly calling
BPF_CGROUP_RUN_PROG_SOCK_OPS():
1) if the socket is a tcp socket, it must be a fullsock.
2) if it is a udp socket, there is no need to check for a fullsock at all
(see the rough sketch below).
This policy should be applied to the rest of the series as well, right?
According to the existing callbacks, the tcp socket is indeed a fullsock.
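Something like this (a rough, untested sketch of the policy above, placed in
the timestamping callback):

	/* tcp must be a fullsock; udp is always a fullsock, so no extra
	 * check is needed for it.
	 */
	if (sk_is_tcp(sk) && !sk_fullsock(sk))
		return;

	__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);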
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-15 0:15 ` Jason Xing
@ 2025-01-15 0:26 ` Martin KaFai Lau
2025-01-15 0:37 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 0:26 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/14/25 4:15 PM, Jason Xing wrote:
> On Wed, Jan 15, 2025 at 8:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Jan 15, 2025 at 7:40 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 1/12/25 3:37 AM, Jason Xing wrote:
>>>> Later, I would introduce three points to report some information
>>>> to user space based on this.
>>>>
>>>> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
>>>> ---
>>>> include/net/sock.h | 7 +++++++
>>>> net/core/sock.c | 14 ++++++++++++++
>>>> 2 files changed, 21 insertions(+)
>>>>
>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>> index f5447b4b78fd..dd874e8337c0 100644
>>>> --- a/include/net/sock.h
>>>> +++ b/include/net/sock.h
>>>> @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
>>>> struct so_timestamping timestamping);
>>>>
>>>> void sock_enable_timestamps(struct sock *sk);
>>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
>>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
>>>> +#else
>>>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
>>>> +{
>>>> +}
>>>> +#endif
>>>> void sock_no_linger(struct sock *sk);
>>>> void sock_set_keepalive(struct sock *sk);
>>>> void sock_set_priority(struct sock *sk, u32 priority);
>>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>>> index eae2ae70a2e0..e06bcafb1b2d 100644
>>>> --- a/net/core/sock.c
>>>> +++ b/net/core/sock.c
>>>> @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
>>>> return 0;
>>>> }
>>>>
>>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
>>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
>>>> +{
>>>> + struct bpf_sock_ops_kern sock_ops;
>>>> +
>>>> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
>>>> + sock_ops.op = op;
>>>> + if (sk_is_tcp(sk) && sk_fullsock(sk))
>>>> + sock_ops.is_fullsock = 1;
>>>> + sock_ops.sk = sk;
>>>> + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
>>>
>>> hmm... I think I have already mentioned it in the earlier revision
>>> (https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
>>
>> Right, sorry, but I deleted it intentionally.
>>
>>>
>>> __cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
>>
>> Well, I don't understand it, BPF_CGROUP_RUN_PROG_SOCK_OPS_SK() don't
>> need to check whether it is fullsock or not.
It is because the callers of BPF_CGROUP_RUN_PROG_SOCK_OPS_SK guarantee it is a
fullsock.
>>
>>> Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
>>> sk_to_full_sk() is used to get back the listener. For other mini socks,
>>> it needs to skip calling the cgroup bpf prog. I still don't understand
>>> why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
>>
>> Sorry, I got lost here. BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
>> udp, right? And I think we've discussed that we have to get rid of the
>> limitation of fullsock.
That is the part I am missing. Why can't BPF_CGROUP_RUN_PROG_SOCK_OPS support
udp? Is UDP not a fullsock?
>
> To support udp case, I think I can add the following check for
> __cgroup_bpf_run_filter_sock_ops() instead of directly calling
> BPF_CGROUP_RUN_PROG_SOCK_OPS():
> 1) if the socket belongs to tcp type, it should be fullsock.
> 2) or if it is a udp type socket. Then no need to check and use the fullsock.
>
> Above lines/policies should be applied to the rest of the series, right?
>
> According to the existing callbacks, the tcp socket is indeed fullsock.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-15 0:26 ` Martin KaFai Lau
@ 2025-01-15 0:37 ` Jason Xing
2025-01-15 0:43 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 0:37 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 8:26 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/14/25 4:15 PM, Jason Xing wrote:
> > On Wed, Jan 15, 2025 at 8:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>
> >> On Wed, Jan 15, 2025 at 7:40 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>
> >>> On 1/12/25 3:37 AM, Jason Xing wrote:
> >>>> Later, I would introduce three points to report some information
> >>>> to user space based on this.
> >>>>
> >>>> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> >>>> ---
> >>>> include/net/sock.h | 7 +++++++
> >>>> net/core/sock.c | 14 ++++++++++++++
> >>>> 2 files changed, 21 insertions(+)
> >>>>
> >>>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>>> index f5447b4b78fd..dd874e8337c0 100644
> >>>> --- a/include/net/sock.h
> >>>> +++ b/include/net/sock.h
> >>>> @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>> struct so_timestamping timestamping);
> >>>>
> >>>> void sock_enable_timestamps(struct sock *sk);
> >>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> >>>> +#else
> >>>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>>> +{
> >>>> +}
> >>>> +#endif
> >>>> void sock_no_linger(struct sock *sk);
> >>>> void sock_set_keepalive(struct sock *sk);
> >>>> void sock_set_priority(struct sock *sk, u32 priority);
> >>>> diff --git a/net/core/sock.c b/net/core/sock.c
> >>>> index eae2ae70a2e0..e06bcafb1b2d 100644
> >>>> --- a/net/core/sock.c
> >>>> +++ b/net/core/sock.c
> >>>> @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>> return 0;
> >>>> }
> >>>>
> >>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>>> +{
> >>>> + struct bpf_sock_ops_kern sock_ops;
> >>>> +
> >>>> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> >>>> + sock_ops.op = op;
> >>>> + if (sk_is_tcp(sk) && sk_fullsock(sk))
> >>>> + sock_ops.is_fullsock = 1;
> >>>> + sock_ops.sk = sk;
> >>>> + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> >>>
> >>> hmm... I think I have already mentioned it in the earlier revision
> >>> (https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
> >>
> >> Right, sorry, but I deleted it intentionally.
> >>
> >>>
> >>> __cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
> >>
> >> Well, I don't understand it, BPF_CGROUP_RUN_PROG_SOCK_OPS_SK() don't
> >> need to check whether it is fullsock or not.
>
> It is because the callers of BPF_CGROUP_RUN_PROG_SOCK_OPS_SK guarantees it is
> fullsock.
>
> >>
> >>> Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
> >>> sk_to_full_sk() is used to get back the listener. For other mini socks,
> >>> it needs to skip calling the cgroup bpf prog. I still don't understand
> >>> why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
> >>
> >> Sorry, I got lost here. BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
> >> udp, right? And I think we've discussed that we have to get rid of the
> >> limitation of fullsock.
>
> It is the part I am missing. Why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
> udp? UDP is not a fullsock?
No, you're not missing anything. UDP is a fullsock, and
BPF_CGROUP_RUN_PROG_SOCK_OPS itself can support udp; my v3 already used
this method, just as you suggest. I think I misunderstood what you were
really suggesting. Sorry for the trouble caused.
I wonder whether setting is_fullsock would affect/break fetching some
fields, especially tcp-related fields, in
sock_ops_convert_ctx_access()? Sorry, I'm not a bpf expert :(
If not, I will use BPF_CGROUP_RUN_PROG_SOCK_OPS instead.
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use
2025-01-15 0:37 ` Jason Xing
@ 2025-01-15 0:43 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-15 0:43 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 8:37 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 8:26 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/14/25 4:15 PM, Jason Xing wrote:
> > > On Wed, Jan 15, 2025 at 8:09 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >>
> > >> On Wed, Jan 15, 2025 at 7:40 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >>>
> > >>> On 1/12/25 3:37 AM, Jason Xing wrote:
> > >>>> Later, I would introduce three points to report some information
> > >>>> to user space based on this.
> > >>>>
> > >>>> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > >>>> ---
> > >>>> include/net/sock.h | 7 +++++++
> > >>>> net/core/sock.c | 14 ++++++++++++++
> > >>>> 2 files changed, 21 insertions(+)
> > >>>>
> > >>>> diff --git a/include/net/sock.h b/include/net/sock.h
> > >>>> index f5447b4b78fd..dd874e8337c0 100644
> > >>>> --- a/include/net/sock.h
> > >>>> +++ b/include/net/sock.h
> > >>>> @@ -2930,6 +2930,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > >>>> struct so_timestamping timestamping);
> > >>>>
> > >>>> void sock_enable_timestamps(struct sock *sk);
> > >>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > >>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> > >>>> +#else
> > >>>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > >>>> +{
> > >>>> +}
> > >>>> +#endif
> > >>>> void sock_no_linger(struct sock *sk);
> > >>>> void sock_set_keepalive(struct sock *sk);
> > >>>> void sock_set_priority(struct sock *sk, u32 priority);
> > >>>> diff --git a/net/core/sock.c b/net/core/sock.c
> > >>>> index eae2ae70a2e0..e06bcafb1b2d 100644
> > >>>> --- a/net/core/sock.c
> > >>>> +++ b/net/core/sock.c
> > >>>> @@ -948,6 +948,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > >>>> return 0;
> > >>>> }
> > >>>>
> > >>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > >>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > >>>> +{
> > >>>> + struct bpf_sock_ops_kern sock_ops;
> > >>>> +
> > >>>> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > >>>> + sock_ops.op = op;
> > >>>> + if (sk_is_tcp(sk) && sk_fullsock(sk))
> > >>>> + sock_ops.is_fullsock = 1;
> > >>>> + sock_ops.sk = sk;
> > >>>> + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> > >>>
> > >>> hmm... I think I have already mentioned it in the earlier revision
> > >>> (https://lore.kernel.org/bpf/f8e9ab4a-38b9-43a5-aaf4-15f95a3463d0@linux.dev/).
> > >>
> > >> Right, sorry, but I deleted it intentionally.
> > >>
> > >>>
> > >>> __cgroup_bpf_run_filter_sock_ops(sk, ...) requires sk to be fullsock.
> > >>
> > >> Well, I don't understand it, BPF_CGROUP_RUN_PROG_SOCK_OPS_SK() don't
> > >> need to check whether it is fullsock or not.
> >
> > It is because the callers of BPF_CGROUP_RUN_PROG_SOCK_OPS_SK guarantees it is
> > fullsock.
> >
> > >>
> > >>> Take a look at how BPF_CGROUP_RUN_PROG_SOCK_OPS does it.
> > >>> sk_to_full_sk() is used to get back the listener. For other mini socks,
> > >>> it needs to skip calling the cgroup bpf prog. I still don't understand
> > >>> why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot be used here because of udp.
> > >>
> > >> Sorry, I got lost here. BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
> > >> udp, right? And I think we've discussed that we have to get rid of the
> > >> limitation of fullsock.
> >
> > It is the part I am missing. Why BPF_CGROUP_RUN_PROG_SOCK_OPS cannot support
> > udp? UDP is not a fullsock?
>
> No, you're not missing anything. UDP is a fullsock and
> BPF_CGROUP_RUN_PROG_SOCK_OPS itself can support udp as my v3 version
> used this method already like you suggest. I feel like
> misunderstanding what you really suggest. Sorry for the trouble
> caused.
>
> I wonder if using is_fullsock would affect/break the usage of fetching
> some fields, especially tcp related fields, in
> sock_ops_convert_ctx_access()? Sorry that I'm not a bpf expert :(
>
> If not, I will use BPF_CGROUP_RUN_PROG_SOCK_OPS instead.
To be clearer, I would use the following code snippet in the next respin:
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+ struct bpf_sock_ops_kern sock_ops;
+
+ memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
+ sock_ops.op = op;
+ sock_ops.is_fullsock = 1;
+ sock_ops.sk = sk;
+ BPF_CGROUP_RUN_PROG_SOCK_OPS(sk, &sock_ops, CGROUP_SOCK_OPS);
+}
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-12 11:37 ` [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog Jason Xing
@ 2025-01-15 1:17 ` Martin KaFai Lau
2025-01-15 2:28 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 1:17 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> timestamp_used consists of two parts, one is is_fullsock, the other
> one is for UDP socket which will be support in the next round.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> include/linux/filter.h | 1 +
> net/core/filter.c | 4 ++--
> net/core/sock.c | 1 +
> net/ipv4/tcp_input.c | 2 ++
> net/ipv4/tcp_output.c | 2 ++
> 5 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a3ea46281595..daca3fe48b8f 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1508,6 +1508,7 @@ struct bpf_sock_ops_kern {
> void *skb_data_end;
> u8 op;
> u8 is_fullsock;
> + u8 timestamp_used;
> u8 remaining_opt_len;
> u64 temp; /* temp and everything after is not
> * initialized to 0 before calling
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c6dd2d2e44c8..1ac996ec5e0f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10424,10 +10424,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> } \
> *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
> struct bpf_sock_ops_kern, \
> - is_fullsock), \
> + timestamp_used), \
> fullsock_reg, si->src_reg, \
> offsetof(struct bpf_sock_ops_kern, \
> - is_fullsock)); \
> + timestamp_used)); \
hmm... I don't think it is the right change. This change may disallow the bpf
prog from reading skops->sk. It is fine to allow the bpf prog (including the new
timestamp callback) to get skops->sk as long as skops->sk is a fullsock.
The actual thing that needs to be addressed is writing to sk, like:
case offsetof(struct bpf_sock_ops, sk_txhash):
SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
struct sock, type);
and also all the SOCK_OPS_GET_TCP_SOCK_FIELD() uses, to prepare for the udp sock
support. After this patch 3, I think I am starting to understand the udp/fullsock
discussion in patch 2. is_fullsock here does not mean it is tcp, although it is
always a tcp_sock now. It literally means it is a full "struct sock". The
verifier will treat skops->sk as "struct sock" instead of "struct tcp_sock".
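From the bpf prog side, the distinction looks roughly like this (a minimal
sketch, not part of the patch; both fields are in the uapi struct bpf_sock_ops,
and sock_ops_convert_ctx_access() rewrites the accesses):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("sockops")
int field_access_sketch(struct bpf_sock_ops *skops)
{
	/* sk_txhash lives in plain "struct sock": reading it is fine for
	 * any fullsock, tcp or udp.
	 */
	__u32 txhash = skops->sk_txhash;

	/* srtt_us comes from "struct tcp_sock": the ctx rewrite must not
	 * dereference tcp_sk(sk) when the socket is not tcp.
	 */
	__u32 srtt = skops->srtt_us;

	bpf_printk("txhash %u srtt_us %u", txhash, srtt);
	return 1;
}

char _license[] SEC("license") = "GPL";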
> *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
> if (si->dst_reg == si->src_reg) \
> *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
> diff --git a/net/core/sock.c b/net/core/sock.c
> index e06bcafb1b2d..dbb9326ae9d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> if (sk_is_tcp(sk) && sk_fullsock(sk))
> sock_ops.is_fullsock = 1;
> sock_ops.sk = sk;
> + sock_ops.timestamp_used = 1;
> __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> }
> #endif
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4811727b8a02..cad41ad34bd5 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
> memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
> sock_ops.is_fullsock = 1;
> + sock_ops.timestamp_used = 1;
> sock_ops.sk = sk;
> bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
>
> @@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
> memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> sock_ops.op = bpf_op;
> sock_ops.is_fullsock = 1;
> + sock_ops.timestamp_used = 1;
> sock_ops.sk = sk;
> /* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
> if (skb)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 0e5b9a654254..7b4d1dfd57d4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
> sock_owned_by_me(sk);
>
> sock_ops.is_fullsock = 1;
> + sock_ops.timestamp_used = 1;
> sock_ops.sk = sk;
> }
>
> @@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
> sock_owned_by_me(sk);
>
> sock_ops.is_fullsock = 1;
> + sock_ops.timestamp_used = 1;
The "timestamp_used = 1;' assignment has missed some places. At least in the
tcp_call_bpf().
Also, the name "timestamp_used" is confusing. Like setting timestamp_used in the
bpf_skops_*_hdr_opt() callback here when it is not a timestamp callback.
Altogether, need to rethink what to add to sock_ops instead of timestamp_used
and it should be checked in "some" of the SOCK_OPS_*_FIELD(). A quick thought
(not 100% sure) is to add "u8 allow_direct_access" which is only set for the
existing sockops callbacks.
[ I will continue the rest later. ]
> sock_ops.sk = sk;
> }
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-15 1:17 ` Martin KaFai Lau
@ 2025-01-15 2:28 ` Jason Xing
2025-01-15 2:54 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 2:28 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 9:17 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > timestamp_used consists of two parts, one is is_fullsock, the other
> > one is for UDP socket which will be support in the next round.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > include/linux/filter.h | 1 +
> > net/core/filter.c | 4 ++--
> > net/core/sock.c | 1 +
> > net/ipv4/tcp_input.c | 2 ++
> > net/ipv4/tcp_output.c | 2 ++
> > 5 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index a3ea46281595..daca3fe48b8f 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1508,6 +1508,7 @@ struct bpf_sock_ops_kern {
> > void *skb_data_end;
> > u8 op;
> > u8 is_fullsock;
> > + u8 timestamp_used;
> > u8 remaining_opt_len;
> > u64 temp; /* temp and everything after is not
> > * initialized to 0 before calling
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index c6dd2d2e44c8..1ac996ec5e0f 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -10424,10 +10424,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> > } \
> > *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
> > struct bpf_sock_ops_kern, \
> > - is_fullsock), \
> > + timestamp_used), \
> > fullsock_reg, si->src_reg, \
> > offsetof(struct bpf_sock_ops_kern, \
> > - is_fullsock)); \
> > + timestamp_used)); \
>
> hmm... I don't think it is the right change. This change may disallow the bpf
> prog from reading skops->sk. It is fine to allow bpf prog (includes the new
> timestamp callback) getting the skops->sk as long as skops->sk is a fullsock.
Well, I missed some places that needed to be changed. My original intention
is similar to yours: if it's a tcp socket && a fullsock, then access will be
allowed.
>
> The actual thing that needs to address is writing to sk, like:
>
> case offsetof(struct bpf_sock_ops, sk_txhash):
> SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
> struct sock, type);
>
Oh, I missed this one. I will take care of it carefully, at least stop
writing to the socket in the timestamping callback.
>
> and also all the SOCK_OPS_GET_TCP_SOCK_FIELD() to prepare for the udp sock
> support. After this patch 3, I think I start to understand the udp/fullsock
> discussion in patch 2. is_fullsock here does not mean it is tcp, although it is
> always a tcp_sock now. It literally means it is a full "struct sock". The
> verifier will treat the skops->sk as "struct sock" instead of "struct tcp_sock".
Right, I was trying to limit is_fullsock to only tcp type.
>
> > *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
> > if (si->dst_reg == si->src_reg) \
> > *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index e06bcafb1b2d..dbb9326ae9d1 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > if (sk_is_tcp(sk) && sk_fullsock(sk))
> > sock_ops.is_fullsock = 1;
> > sock_ops.sk = sk;
> > + sock_ops.timestamp_used = 1;
> > __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> > }
> > #endif
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 4811727b8a02..cad41ad34bd5 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
> > memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
> > sock_ops.is_fullsock = 1;
> > + sock_ops.timestamp_used = 1;
> > sock_ops.sk = sk;
> > bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
> >
> > @@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
> > memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > sock_ops.op = bpf_op;
> > sock_ops.is_fullsock = 1;
> > + sock_ops.timestamp_used = 1;
> > sock_ops.sk = sk;
> > /* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
> > if (skb)
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 0e5b9a654254..7b4d1dfd57d4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
> > sock_owned_by_me(sk);
> >
> > sock_ops.is_fullsock = 1;
> > + sock_ops.timestamp_used = 1;
> > sock_ops.sk = sk;
> > }
> >
> > @@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
> > sock_owned_by_me(sk);
> >
> > sock_ops.is_fullsock = 1;
> > + sock_ops.timestamp_used = 1;
>
> The "timestamp_used = 1;' assignment has missed some places. At least in the
> tcp_call_bpf().
Thanks for double checking. I will cover that too.
>
> Also, the name "timestamp_used" is confusing. Like setting timestamp_used in the
> bpf_skops_*_hdr_opt() callback here when it is not a timestamp callback.
>
> Altogether, need to rethink what to add to sock_ops instead of timestamp_used
> and it should be checked in "some" of the SOCK_OPS_*_FIELD(). A quick thought
> (not 100% sure) is to add "u8 allow_direct_access" which is only set for the
> existing sockops callbacks.
Sounds good. I will use this one :)
>
> [ I will continue the rest later. ]
Thanks for your help!
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-15 2:28 ` Jason Xing
@ 2025-01-15 2:54 ` Jason Xing
2025-01-16 0:51 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 2:54 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Wed, Jan 15, 2025 at 10:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 9:17 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/12/25 3:37 AM, Jason Xing wrote:
> > > timestamp_used consists of two parts, one is is_fullsock, the other
> > > one is for UDP socket which will be support in the next round.
> > >
> > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > > ---
> > > include/linux/filter.h | 1 +
> > > net/core/filter.c | 4 ++--
> > > net/core/sock.c | 1 +
> > > net/ipv4/tcp_input.c | 2 ++
> > > net/ipv4/tcp_output.c | 2 ++
> > > 5 files changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > index a3ea46281595..daca3fe48b8f 100644
> > > --- a/include/linux/filter.h
> > > +++ b/include/linux/filter.h
> > > @@ -1508,6 +1508,7 @@ struct bpf_sock_ops_kern {
> > > void *skb_data_end;
> > > u8 op;
> > > u8 is_fullsock;
> > > + u8 timestamp_used;
> > > u8 remaining_opt_len;
> > > u64 temp; /* temp and everything after is not
> > > * initialized to 0 before calling
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index c6dd2d2e44c8..1ac996ec5e0f 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -10424,10 +10424,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> > > } \
> > > *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
> > > struct bpf_sock_ops_kern, \
> > > - is_fullsock), \
> > > + timestamp_used), \
> > > fullsock_reg, si->src_reg, \
> > > offsetof(struct bpf_sock_ops_kern, \
> > > - is_fullsock)); \
> > > + timestamp_used)); \
> >
> > hmm... I don't think it is the right change. This change may disallow the bpf
> > prog from reading skops->sk. It is fine to allow bpf prog (includes the new
> > timestamp callback) getting the skops->sk as long as skops->sk is a fullsock.
>
> Well, I missed some places to be changed. My original intention is
> similar to yours: if it's a tcp socket && full socket, then it will be
> allowed to access.
>
> >
> > The actual thing that needs to address is writing to sk, like:
> >
> > case offsetof(struct bpf_sock_ops, sk_txhash):
> > SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
> > struct sock, type);
> >
>
> Oh, I missed this one. I will take care of it carefully, at least
> stopping writing the socket in the timestamping callback.
>
> >
> > and also all the SOCK_OPS_GET_TCP_SOCK_FIELD() to prepare for the udp sock
> > support. After this patch 3, I think I start to understand the udp/fullsock
> > discussion in patch 2. is_fullsock here does not mean it is tcp, although it is
> > always a tcp_sock now. It literally means it is a full "struct sock". The
> > verifier will treat the skops->sk as "struct sock" instead of "struct tcp_sock".
>
> Right, I was trying to limit is_fullsock to only tcp type.
>
> >
> > > *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
> > > if (si->dst_reg == si->src_reg) \
> > > *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index e06bcafb1b2d..dbb9326ae9d1 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > > if (sk_is_tcp(sk) && sk_fullsock(sk))
> > > sock_ops.is_fullsock = 1;
> > > sock_ops.sk = sk;
> > > + sock_ops.timestamp_used = 1;
> > > __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> > > }
> > > #endif
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 4811727b8a02..cad41ad34bd5 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -169,6 +169,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
> > > memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > > sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
> > > sock_ops.is_fullsock = 1;
> > > + sock_ops.timestamp_used = 1;
> > > sock_ops.sk = sk;
> > > bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
> > >
> > > @@ -185,6 +186,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
> > > memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > > sock_ops.op = bpf_op;
> > > sock_ops.is_fullsock = 1;
> > > + sock_ops.timestamp_used = 1;
> > > sock_ops.sk = sk;
> > > /* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
> > > if (skb)
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 0e5b9a654254..7b4d1dfd57d4 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
> > > sock_owned_by_me(sk);
> > >
> > > sock_ops.is_fullsock = 1;
> > > + sock_ops.timestamp_used = 1;
> > > sock_ops.sk = sk;
> > > }
> > >
> > > @@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
> > > sock_owned_by_me(sk);
> > >
> > > sock_ops.is_fullsock = 1;
> > > + sock_ops.timestamp_used = 1;
> >
> > The "timestamp_used = 1;' assignment has missed some places. At least in the
> > tcp_call_bpf().
>
> Thanks for double checking. I will count it too.
>
> >
> > Also, the name "timestamp_used" is confusing. Like setting timestamp_used in the
> > bpf_skops_*_hdr_opt() callback here when it is not a timestamp callback.
> >
> > Altogether, need to rethink what to add to sock_ops instead of timestamp_used
> > and it should be checked in "some" of the SOCK_OPS_*_FIELD(). A quick thought
> > (not 100% sure) is to add "u8 allow_direct_access" which is only set for the
> > existing sockops callbacks.
>
> Sounds good. I will use this one :)
Let me lay out my thoughts here based on our previous discussion:
1. do not limit the use of is_fullsock, so in patch 2 I will use the
following code:
+void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
+{
+ struct bpf_sock_ops_kern sock_ops;
+
+ memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
+ sock_ops.op = op;
+ sock_ops.is_fullsock = 1;
+ sock_ops.sk = sk;
+ BPF_CGROUP_RUN_PROG_SOCK_OPS(sk, &sock_ops, CGROUP_SOCK_OPS);
+}
2. introduce the allow_direct_access flag, which is used to test whether
the bpf prog is allowed to access the tcp socket directly or not.
On the basis of the above bpf_skops_tx_timestamping() function, I
would add one check there:
+ if (sk_is_tcp(sk))
+ sock_ops.allow_direct_access = 1;
Also, I need to set allow_direct_access to one wherever there is
"sock_ops.is_fullsock = 1;" in the existing callbacks.
3. I will replace is_fullsock with allow_direct_access in
SOCK_OPS_GET/SET_FIELD(), but not in SOCK_OPS_GET_SK().
Then a udp socket can still freely access the socket through the helper
SOCK_OPS_GET_SK() because it is a fullsock, while it cannot access
struct tcp_sock because the timestamping callback never sets
allow_direct_access for udp. A rough sketch of point 3 follows below.
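Roughly, point 3 would turn the guard quoted earlier in this thread into
something like this (only a sketch; the surrounding macro context stays as
it is today and the field name may still change):

	/* inside SOCK_OPS_GET/SET_FIELD(): gate direct tcp_sock access on
	 * allow_direct_access instead of is_fullsock
	 */
	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern,
					       allow_direct_access),
			      fullsock_reg, si->src_reg,
			      offsetof(struct bpf_sock_ops_kern,
				       allow_direct_access));
	*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);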
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt
2025-01-12 11:37 ` [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt Jason Xing
@ 2025-01-15 21:22 ` Martin KaFai Lau
2025-01-15 23:26 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 21:22 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> We will allow both TCP and UDP sockets to use this helper to
> enable this feature. So let SK_BPF_CB_FLAGS pass the check:
> 1. skip is_fullsock check
> 2. skip owned by me check
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> net/core/filter.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1ac996ec5e0f..0e915268db5f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5507,12 +5507,9 @@ static int sol_ipv6_sockopt(struct sock *sk, int optname,
> KERNEL_SOCKPTR(optval), *optlen);
> }
>
> -static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> - char *optval, int optlen)
> +static int ___bpf_setsockopt(struct sock *sk, int level, int optname,
> + char *optval, int optlen)
> {
> - if (!sk_fullsock(sk))
> - return -EINVAL;
> -
> if (level == SOL_SOCKET)
> return sol_socket_sockopt(sk, optname, optval, &optlen, false);
> else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
> @@ -5525,6 +5522,15 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> return -EINVAL;
> }
>
> +static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> + char *optval, int optlen)
> +{
> + if (!sk_fullsock(sk))
> + return -EINVAL;
> +
> + return ___bpf_setsockopt(sk, level, optname, optval, optlen);
> +}
> +
> static int _bpf_setsockopt(struct sock *sk, int level, int optname,
> char *optval, int optlen)
> {
> @@ -5675,7 +5681,16 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
> BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> int, level, int, optname, char *, optval, int, optlen)
> {
> - return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> + struct sock *sk = bpf_sock->sk;
> +
> + if (optname != SK_BPF_CB_FLAGS) {
> + if (sk_fullsock(sk))
> + sock_owned_by_me(sk);
> + else if (optname != SK_BPF_CB_FLAGS)
This is redundant considering the outer "if" has the same check.
Regardless, "optname != SK_BPF_CB_FLAGS" is not the right check. The new
callback (e.g. BPF_SOCK_OPS_TS_SCHED_OPT_CB) can still call
bpf_setsockopt(TCP_*), which will be broken without a lock.
It needs to check bpf_sock->op. I saw patch 5 has the bpf_sock->op check, but
that check is also incorrect. I will comment on it there.
> + return -EINVAL;
> + }
> +
> + return ___bpf_setsockopt(sk, level, optname, optval, optlen);
> }
>
> static const struct bpf_func_proto bpf_sock_ops_setsockopt_proto = {
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
2025-01-12 14:37 ` kernel test robot
@ 2025-01-15 21:48 ` Martin KaFai Lau
2025-01-15 23:32 ` Jason Xing
2025-01-17 10:18 ` kernel test robot
2 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 21:48 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> In the next round, we will support the UDP proto for SO_TIMESTAMPING
> bpf extension, so we need to ensure there is no safety problem.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> net/core/filter.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 0e915268db5f..517f09aabc92 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5571,7 +5571,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
> static int _bpf_getsockopt(struct sock *sk, int level, int optname,
> char *optval, int optlen)
> {
> - if (sk_fullsock(sk))
> + if (sk_fullsock(sk) && optname != SK_BPF_CB_FLAGS)
> sock_owned_by_me(sk);
> return __bpf_getsockopt(sk, level, optname, optval, optlen);
> }
> @@ -5776,6 +5776,7 @@ BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> int, level, int, optname, char *, optval, int, optlen)
> {
> if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP &&
> + bpf_sock->sk->sk_protocol == IPPROTO_TCP &&
> optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) {
> int ret, copy_len = 0;
> const u8 *start;
> @@ -5817,7 +5818,8 @@ BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
> struct sock *sk = bpf_sock->sk;
> int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
>
> - if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk))
> + if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk) ||
> + sk->sk_protocol != IPPROTO_TCP)
> return -EINVAL;
>
> tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
> @@ -7626,6 +7628,9 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
> u8 search_kind, search_len, copy_len, magic_len;
> int ret;
>
> + if (bpf_sock->op != SK_BPF_CB_FLAGS)
SK_BPF_CB_FLAGS is not an op enum, so the check is incorrect. It does break the
existing test.
./test_progs -t tcp_hdr_options
WARNING! Selftests relying on bpf_testmod.ko will be skipped.
#402/1 tcp_hdr_options/simple_estab:FAIL
#402/2 tcp_hdr_options/no_exprm_estab:FAIL
#402/3 tcp_hdr_options/syncookie_estab:FAIL
#402/4 tcp_hdr_options/fastopen_estab:FAIL
#402/5 tcp_hdr_options/fin:FAIL
#402/6 tcp_hdr_options/misc:FAIL
#402 tcp_hdr_options:FAIL
#402/1 tcp_hdr_options/simple_estab:FAIL
#402/2 tcp_hdr_options/no_exprm_estab:FAIL
#402/3 tcp_hdr_options/syncookie_estab:FAIL
#402/4 tcp_hdr_options/fastopen_estab:FAIL
#402/5 tcp_hdr_options/fin:FAIL
#402/6 tcp_hdr_options/misc:FAIL
#402 tcp_hdr_options:FAIL
Many changes of this set are in bpf and the newly added selftest is also a bpf
prog, so all bpf selftests should be run before posting.
(Documentation/bpf/bpf_devel_QA.rst)
The bpf CI can automatically pick the set up and send an auto email on breakage
like this if it is tagged for bpf-next. We can figure out where to land the set
later (bpf-next/net or net-next/main) when it is ready.
All these changes also need a test in selftests/bpf. For example, I expect there
to be a test ensuring that calling these bpf helpers from the new tstamp callback
returns a negative errno value.
For patch 4 and patch 5, I would suggest keeping it simple and only checking
bpf_sock->op for the helpers that make a tcp_sock and/or locked sk assumption.
Something like this on top of your patch. Untested:
diff --git i/net/core/filter.c w/net/core/filter.c
index 517f09aabc92..ccb13b61c528 100644
--- i/net/core/filter.c
+++ w/net/core/filter.c
@@ -7620,6 +7620,11 @@ static const u8 *bpf_search_tcp_opt(const u8 *op, const u8 *opend,
return ERR_PTR(-ENOMSG);
}
+static bool is_locked_tcp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
+{
+ return bpf_sock->op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB;
+}
+
BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
void *, search_res, u32, len, u64, flags)
{
@@ -7628,8 +7633,8 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
u8 search_kind, search_len, copy_len, magic_len;
int ret;
- if (bpf_sock->op != SK_BPF_CB_FLAGS)
- return -EINVAL;
+ if (!is_locked_tcp_sock_ops(bpf_sock))
+ return -EOPNOTSUPP;
/* 2 byte is the minimal option len except TCPOPT_NOP and
* TCPOPT_EOL which are useless for the bpf prog to learn
> + return -EINVAL;
> +
> /* 2 byte is the minimal option len except TCPOPT_NOP and
> * TCPOPT_EOL which are useless for the bpf prog to learn
> * and this helper disallow loading them also.
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
2025-01-12 11:37 ` [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
@ 2025-01-15 22:11 ` Martin KaFai Lau
2025-01-15 23:50 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 22:11 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> void __skb_tstamp_tx(struct sk_buff *orig_skb,
> const struct sk_buff *ack_skb,
> struct skb_shared_hwtstamps *hwtstamps,
> - struct sock *sk, int tstype)
> + struct sock *sk, bool sw, int tstype)
Instead of adding a new "bool sw" and changing all callers, is it the same as
testing "!hwtstamps"?
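That is, roughly (a sketch, assuming the sw callers keep passing a NULL
hwtstamps):

	/* derived inside __skb_tstamp_tx() instead of a new parameter */
	bool sw = !hwtstamps;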
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
2025-01-12 11:37 ` [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
@ 2025-01-15 22:32 ` Martin KaFai Lau
2025-01-15 23:57 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 22:32 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> +static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
nit. Remove the "__" prefix. There is no name conflict.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-12 11:37 ` [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
@ 2025-01-15 22:48 ` Martin KaFai Lau
2025-01-15 23:56 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 22:48 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> Support SCM_TSTAMP_SND case. Then we will get the software
> timestamp when the driver is about to send the skb. Later, I
> will support the hardware timestamp.
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 169c6d03d698..0fb31df4ed95 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
> case SCM_TSTAMP_SCHED:
> op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> break;
> + case SCM_TSTAMP_SND:
> + op = BPF_SOCK_OPS_TS_SW_OPT_CB;
For the hwtstamps case, is skb_hwtstamps(skb) set? From looking at a few
drivers, it does not look like it. I don't see the hwtstamps support in patch 10
either. What did I miss?
> + break;
> default:
> return;
> }
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK for bpf extension
2025-01-12 11:37 ` [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
@ 2025-01-15 23:02 ` Martin KaFai Lau
0 siblings, 0 replies; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 23:02 UTC (permalink / raw)
To: Jason Xing, netdev
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf
On 1/12/25 3:37 AM, Jason Xing wrote:
> Handle the ACK timestamp case. Actually testing SKBTX_BPF flag
> can work, but we need to Introduce a new txstamp_ack_bpf to avoid
> cache line misses in tcp_ack_tstamp(). To be more specific, in most
> cases, normal flows would not access skb_shinfo as txstamp_ack
> is zero, so that this function won't appear in the hot spot lists.
> Introducing a new member txstamp_ack_bpf works similarly.
This change and some of the earlier changes (e.g. adding SKBTX_BPF) will need an
ack from netdev.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace
2025-01-12 11:37 ` [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace Jason Xing
@ 2025-01-15 23:05 ` Martin KaFai Lau
2025-01-15 23:59 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-15 23:05 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> People can follow these three steps as below to fetch the shared info
> from the exported skb in the bpf prog:
> 1. skops_kern = bpf_cast_to_kern_ctx(skops);
> 2. skb = skops_kern->skb;
> 3. shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
>
> It's worth to highlight we will be able to fetch the hwstamp, tskey
> and more key information extracted from the skb.
>
> More details can be seen in the last selftest patch of the series.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> net/core/sock.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index dbb9326ae9d1..2f54e60a50d4 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> if (sk_is_tcp(sk) && sk_fullsock(sk))
> sock_ops.is_fullsock = 1;
> sock_ops.sk = sk;
> + bpf_skops_init_skb(&sock_ops, skb, 0);
nit. This change can fold into patch 1.
> sock_ops.timestamp_used = 1;
> __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> }
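For reference, the three steps quoted above amount to roughly the following
in a sockops prog (a sketch only; the prog and function names are made up,
and bpf_cast_to_kern_ctx() is declared here as a ksym since it is a kfunc):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;

SEC("sockops")
int fetch_shinfo(struct bpf_sock_ops *skops)
{
	struct bpf_sock_ops_kern *skops_kern;
	struct skb_shared_info *shinfo;
	struct sk_buff *skb;

	skops_kern = bpf_cast_to_kern_ctx(skops);
	skb = skops_kern->skb;
	if (!skb)
		return 1;

	shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);

	/* e.g. the tskey and hwtstamps now become readable */
	bpf_printk("tskey %u", shinfo->tskey);
	return 1;
}

char _license[] SEC("license") = "GPL";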
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt
2025-01-15 21:22 ` Martin KaFai Lau
@ 2025-01-15 23:26 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:26 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 5:22 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > We will allow both TCP and UDP sockets to use this helper to
> > enable this feature. So let SK_BPF_CB_FLAGS pass the check:
> > 1. skip is_fullsock check
> > 2. skip owned by me check
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > net/core/filter.c | 27 +++++++++++++++++++++------
> > 1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 1ac996ec5e0f..0e915268db5f 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5507,12 +5507,9 @@ static int sol_ipv6_sockopt(struct sock *sk, int optname,
> > KERNEL_SOCKPTR(optval), *optlen);
> > }
> >
> > -static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> > - char *optval, int optlen)
> > +static int ___bpf_setsockopt(struct sock *sk, int level, int optname,
> > + char *optval, int optlen)
> > {
> > - if (!sk_fullsock(sk))
> > - return -EINVAL;
> > -
> > if (level == SOL_SOCKET)
> > return sol_socket_sockopt(sk, optname, optval, &optlen, false);
> > else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
> > @@ -5525,6 +5522,15 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> > return -EINVAL;
> > }
> >
> > +static int __bpf_setsockopt(struct sock *sk, int level, int optname,
> > + char *optval, int optlen)
> > +{
> > + if (!sk_fullsock(sk))
> > + return -EINVAL;
> > +
> > + return ___bpf_setsockopt(sk, level, optname, optval, optlen);
> > +}
> > +
> > static int _bpf_setsockopt(struct sock *sk, int level, int optname,
> > char *optval, int optlen)
> > {
> > @@ -5675,7 +5681,16 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
> > BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> > int, level, int, optname, char *, optval, int, optlen)
> > {
> > - return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> > + struct sock *sk = bpf_sock->sk;
> > +
> > + if (optname != SK_BPF_CB_FLAGS) {
> > + if (sk_fullsock(sk))
> > + sock_owned_by_me(sk);
> > + else if (optname != SK_BPF_CB_FLAGS)
>
> This is redundant considering the outer "if" has the same check.
>
> Regardless, "optname != SK_BPF_CB_FLAGS" is not the right check. The new
> callback (e.g. BPF_SOCK_OPS_TS_SCHED_OPT_CB) can still call
> bpf_setsockopt(TCP_*) which will be broken without a lock.
>
> It needs to check for bpf_sock->op. I saw patch 5 has the bpf_sock->op check but
> that check is also incorrect. I will comment in there together.
Thanks. Will correct them soon.
>
> > + return -EINVAL;
> > + }
> > +
> > + return ___bpf_setsockopt(sk, level, optname, optval, optlen);
> > }
> >
> > static const struct bpf_func_proto bpf_sock_ops_setsockopt_proto = {
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-15 21:48 ` Martin KaFai Lau
@ 2025-01-15 23:32 ` Jason Xing
2025-01-18 2:15 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:32 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 5:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > In the next round, we will support the UDP proto for SO_TIMESTAMPING
> > bpf extension, so we need to ensure there is no safety problem.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > net/core/filter.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 0e915268db5f..517f09aabc92 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5571,7 +5571,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
> > static int _bpf_getsockopt(struct sock *sk, int level, int optname,
> > char *optval, int optlen)
> > {
> > - if (sk_fullsock(sk))
> > + if (sk_fullsock(sk) && optname != SK_BPF_CB_FLAGS)
> > sock_owned_by_me(sk);
> > return __bpf_getsockopt(sk, level, optname, optval, optlen);
> > }
> > @@ -5776,6 +5776,7 @@ BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> > int, level, int, optname, char *, optval, int, optlen)
> > {
> > if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP &&
> > + bpf_sock->sk->sk_protocol == IPPROTO_TCP &&
> > optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) {
> > int ret, copy_len = 0;
> > const u8 *start;
> > @@ -5817,7 +5818,8 @@ BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
> > struct sock *sk = bpf_sock->sk;
> > int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
> >
> > - if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk))
> > + if (!IS_ENABLED(CONFIG_INET) || !sk_fullsock(sk) ||
> > + sk->sk_protocol != IPPROTO_TCP)
> > return -EINVAL;
> >
> > tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
> > @@ -7626,6 +7628,9 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
> > u8 search_kind, search_len, copy_len, magic_len;
> > int ret;
> >
> > + if (bpf_sock->op != SK_BPF_CB_FLAGS)
>
> SK_BPF_CB_FLAGS is not an op enum, so the check is incorrect. It does break the
> existing test.
>
> ./test_progs -t tcp_hdr_options
> WARNING! Selftests relying on bpf_testmod.ko will be skipped.
> #402/1 tcp_hdr_options/simple_estab:FAIL
> #402/2 tcp_hdr_options/no_exprm_estab:FAIL
> #402/3 tcp_hdr_options/syncookie_estab:FAIL
> #402/4 tcp_hdr_options/fastopen_estab:FAIL
> #402/5 tcp_hdr_options/fin:FAIL
> #402/6 tcp_hdr_options/misc:FAIL
> #402 tcp_hdr_options:FAIL
> #402/1 tcp_hdr_options/simple_estab:FAIL
> #402/2 tcp_hdr_options/no_exprm_estab:FAIL
> #402/3 tcp_hdr_options/syncookie_estab:FAIL
> #402/4 tcp_hdr_options/fastopen_estab:FAIL
> #402/5 tcp_hdr_options/fin:FAIL
> #402/6 tcp_hdr_options/misc:FAIL
> #402 tcp_hdr_options:FAIL
>
Right, kernel test robot also discovered this one.
>
> Many changes of this set is in bpf and the newly added selftest is also a bpf
> prog, all bpf selftests should be run before posting.
> (Documentation/bpf/bpf_devel_QA.rst)
>
> The bpf CI can automatically pick it up and get an auto email on breakage like
> this if the set is tagged to bpf-next. We can figure out where to land the set
> later (bpf-next/net or net-next/main) when it is ready.
>
> All these changes also need a test in selftests/bpf. For example, I expect there
> is a test to ensure calling these bpf helpers from the new tstamp callback will
> get a negative errno value.
>
> For patch 4 and patch 5, I would suggest keeping it simple to only check for
> bpf_sock->op for the helpers that make tcp_sock and/or locked sk assumption.
> Something like this on top of your patch. Untested:
>
> diff --git i/net/core/filter.c w/net/core/filter.c
> index 517f09aabc92..ccb13b61c528 100644
> --- i/net/core/filter.c
> +++ w/net/core/filter.c
> @@ -7620,6 +7620,11 @@ static const u8 *bpf_search_tcp_opt(const u8 *op, const
> u8 *opend,
> return ERR_PTR(-ENOMSG);
> }
>
> +static bool is_locked_tcp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
> +{
> + return bpf_sock->op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB;
I wonder if I can use the code snippets in the previous reply in this
thread, only checking if we are in the timestamping callback?
+#define BPF_SOCK_OPTS_TS (BPF_SOCK_OPS_TS_SCHED_OPT_CB | \
+ BPF_SOCK_OPS_TS_SW_OPT_CB | \
+ BPF_SOCK_OPS_TS_ACK_OPT_CB | \
+ BPF_SOCK_OPS_TS_TCP_SND_CB)
Then other developers won't have to worry too much about whether they
will cause safety problems. Otherwise, they will/must add new callbacks
earlier than BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> +}
> +
> BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
> void *, search_res, u32, len, u64, flags)
> {
> @@ -7628,8 +7633,8 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct
> bpf_sock_ops_kern *, bpf_sock,
> u8 search_kind, search_len, copy_len, magic_len;
> int ret;
>
> - if (bpf_sock->op != SK_BPF_CB_FLAGS)
> - return -EINVAL;
> + if (!is_locked_tcp_sock_ops(bpf_sock))
> + return -EOPNOTSUPP;
Right, thanks, I will do that.
>
> /* 2 byte is the minimal option len except TCPOPT_NOP and
> * TCPOPT_EOL which are useless for the bpf prog to learn
>
>
> > + return -EINVAL;
> > +
> > /* 2 byte is the minimal option len except TCPOPT_NOP and
> > * TCPOPT_EOL which are useless for the bpf prog to learn
> > * and this helper disallow loading them also.
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
2025-01-15 22:11 ` Martin KaFai Lau
@ 2025-01-15 23:50 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:50 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 6:11 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > const struct sk_buff *ack_skb,
> > struct skb_shared_hwtstamps *hwtstamps,
> > - struct sock *sk, int tstype)
> > + struct sock *sk, bool sw, int tstype)
>
> Instead of adding a new "bool sw" and changing all callers, is it the same as
> testing "!hwtstamps" ?
Actually, I had a version using hwtstamps, but then I realized that
the hardware or driver may go wrong and pass a NULL hwtstamps. It's
admittedly unlikely to happen, but if it does, the timestamping code
would treat it as a software timestamp.
I don't expect that to happen; I just want to make sure our code is
robust enough. The original timestamping code does not seem to handle
this case, as there is no particular check for it in __skb_tstamp_tx().
The worst part is that if it happens, we would never know and would
treat it as a software SCM_TSTAMP_SND case.
Does that make sense to you?
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-15 22:48 ` Martin KaFai Lau
@ 2025-01-15 23:56 ` Jason Xing
2025-01-18 0:46 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:56 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 6:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > Support SCM_TSTAMP_SND case. Then we will get the software
> > timestamp when the driver is about to send the skb. Later, I
> > will support the hardware timestamp.
>
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 169c6d03d698..0fb31df4ed95 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
> > case SCM_TSTAMP_SCHED:
> > op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> > break;
> > + case SCM_TSTAMP_SND:
> > + op = BPF_SOCK_OPS_TS_SW_OPT_CB;
>
> For the hwtstamps case, is skb_hwtstamps(skb) set? From looking at a few
> drivers, it does not look like it. I don't see the hwtstamps support in patch 10
> either. What did I miss ?
Sorry, I missed adding a new flag, namely, BPF_SOCK_OPS_TS_HW_OPT_CB.
I can also skip adding that new one and rename
BPF_SOCK_OPS_TS_SW_OPT_CB accordingly for sw and hw cases if we
finally decide to use hwtstamps parameter to distinguish two different
cases.
Thanks,
Jason
>
> > + break;
> > default:
> > return;
> > }
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
2025-01-15 22:32 ` Martin KaFai Lau
@ 2025-01-15 23:57 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:57 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 6:32 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > +static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype)
>
> nit. Remove the "__" prefix. There is no name conflict.
Got it, thanks!
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace
2025-01-15 23:05 ` Martin KaFai Lau
@ 2025-01-15 23:59 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-15 23:59 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 7:05 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > People can follow these three steps as below to fetch the shared info
> > from the exported skb in the bpf prog:
> > 1. skops_kern = bpf_cast_to_kern_ctx(skops);
> > 2. skb = skops_kern->skb;
> > 3. shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
> >
> > It's worth highlighting that we will be able to fetch the hwtstamp,
> > tskey, and other key information extracted from the skb.
> >
> > More details can be seen in the last selftest patch of the series.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > net/core/sock.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index dbb9326ae9d1..2f54e60a50d4 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -958,6 +958,7 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > if (sk_is_tcp(sk) && sk_fullsock(sk))
> > sock_ops.is_fullsock = 1;
> > sock_ops.sk = sk;
> > + bpf_skops_init_skb(&sock_ops, skb, 0);
>
> nit. This change can fold into patch 1.
I understand what you meant here. You probably refer to patch [02/15].
Thanks,
Jason
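For readers following along, here is a minimal sketch of the three-step
access pattern quoted above, as it might look in a sockops program. The
header names, the bpf_kfuncs.h-style kfunc declaration, the section and
program names, and the bpf_printk() output are illustrative assumptions,
not taken verbatim from the series:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
/* kfunc declaration, as typically found in the selftests' bpf_kfuncs.h */
extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;
SEC("sockops")
int ts_read_shinfo(struct bpf_sock_ops *skops)
{
	struct bpf_sock_ops_kern *skops_kern;
	struct skb_shared_info *shinfo;
	struct sk_buff *skb;
	/* 1. get the kernel-side view of the sockops ctx */
	skops_kern = bpf_cast_to_kern_ctx(skops);
	/* 2. the skb exported via bpf_skops_init_skb() */
	skb = skops_kern->skb;
	if (!skb)
		return 1;
	/* 3. read-only cast of the shared info area (tskey, hwtstamps, ...) */
	shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
	bpf_printk("tskey=%u", shinfo->tskey);
	return 1;
}
char _license[] SEC("license") = "GPL";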
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
2025-01-12 11:37 ` [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension Jason Xing
@ 2025-01-16 0:03 ` Martin KaFai Lau
2025-01-16 0:41 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-16 0:03 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/12/25 3:37 AM, Jason Xing wrote:
> Introduce tskey_bpf to correlate tcp_sendmsg timestamp with other
> three points (SND/SW/ACK). More details can be found in the
> selftest.
>
> For TCP, tskey_bpf is used to store the initial write_seq value
> the moment tcp_sendmsg is called, so that the last skb of this
> call will have the same tskey_bpf as the tcp_sendmsg bpf callback.
>
> UDP works similarly because tskey_bpf can increase by one every time
> udp_sendmsg gets called. It will be implemented soon.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> include/linux/skbuff.h | 2 ++
> include/uapi/linux/bpf.h | 3 +++
> net/core/sock.c | 3 ++-
> net/ipv4/tcp.c | 10 ++++++++--
> tools/include/uapi/linux/bpf.h | 3 +++
> 5 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d3ef8db94a94..3b7b470d5d89 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -609,6 +609,8 @@ struct skb_shared_info {
> };
> unsigned int gso_type;
> u32 tskey;
> + /* For TCP, it records the initial write_seq when sendmsg is called */
> + u32 tskey_bpf;
I would suggest to remove this tskey_bpf addition to skb_shared_info. My
understanding is the intention is to get the delay spent in the
tcp_sendmsg_locked(). I think this can be done in bpf_sk_storage. More below.
>
> /*
> * Warning : all fields before dataref are cleared in __alloc_skb()
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a0aff1b4eb61..87420c0f2235 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7037,6 +7037,9 @@ enum {
> * feature is on. It indicates the
> * recorded timestamp.
> */
> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> + * syscall is triggered
> + */
UDP will need this also?
> };
>
> /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 2f54e60a50d4..e74ab0e2979d 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -958,7 +958,8 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> if (sk_is_tcp(sk) && sk_fullsock(sk))
> sock_ops.is_fullsock = 1;
> sock_ops.sk = sk;
> - bpf_skops_init_skb(&sock_ops, skb, 0);
> + if (skb)
> + bpf_skops_init_skb(&sock_ops, skb, 0);
> sock_ops.timestamp_used = 1;
> __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> }
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0a41006b10d1..b6e0db5e4ead 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -477,7 +477,7 @@ void tcp_init_sock(struct sock *sk)
> }
> EXPORT_SYMBOL(tcp_init_sock);
>
> -static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc, u32 first_write_seq)
> {
> struct sk_buff *skb = tcp_write_queue_tail(sk);
> u32 tsflags = sockc->tsflags;
> @@ -500,6 +500,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> tcb->txstamp_ack_bpf = 1;
> shinfo->tx_flags |= SKBTX_BPF;
> shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
Add the bpf prog callout here instead:
bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_TCP_SND_CB);
If the bpf prog wants to figure out the delay from the very beginning of the
tcp_sendmsg_locked(), a bpf prog (either by tracing the tcp_sendmsg_locked or by
adding a new callout at the beginning of tcp_sendmsg_locked like this patch) can
store a bpf_ktime_get_ns() in the bpf_sk_storage. The bpf prog running here (at
tcp_tx_timestamp) can get that timestamp from the bpf_sk_storage since it has a
hold on the same sk pointer. There is no need to add a new shinfo->tskey_bpf to
measure this part of the delay.
> + shinfo->tskey_bpf = first_write_seq;
> }
> }
>
> @@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> int flags, err, copied = 0;
> int mss_now = 0, size_goal, copied_syn = 0;
> int process_backlog = 0;
> + u32 first_write_seq = 0;
> int zc = 0;
> long timeo;
>
> flags = msg->msg_flags;
> + if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
> + first_write_seq = tp->write_seq;
> + bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
My preference is to skip this bpf callout for now and depends on a bpf trace
program if it is really needed.
> + }
>
> if ((flags & MSG_ZEROCOPY) && size) {
> if (msg->msg_ubuf) {
> @@ -1331,7 +1337,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
> out:
> if (copied) {
> - tcp_tx_timestamp(sk, &sockc);
> + tcp_tx_timestamp(sk, &sockc, first_write_seq);
> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> }
> out_nopush:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 0fe7d663a244..3769e38e052d 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7030,6 +7030,9 @@ enum {
> * feature is on. It indicates the
> * recorded timestamp.
> */
> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> + * syscall is triggered
> + */
> };
>
> /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
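As a rough illustration of the bpf_sk_storage approach described above,
a sketch like the following could measure the delay from sendmsg entry
to the tx timestamping callback. The fentry attach point, map name,
program names and the BPF_SOCK_OPS_TS_SW_OPT_CB check are illustrative
assumptions based on this series, not code taken from it:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, __u64);
} sendmsg_ts SEC(".maps");
/* record a timestamp when tcp_sendmsg_locked() is entered */
SEC("fentry/tcp_sendmsg_locked")
int BPF_PROG(at_sendmsg, struct sock *sk)
{
	__u64 *tsp;
	tsp = bpf_sk_storage_get(&sendmsg_ts, sk, 0,
				 BPF_SK_STORAGE_GET_F_CREATE);
	if (tsp)
		*tsp = bpf_ktime_get_ns();
	return 0;
}
/* read it back from the new tx timestamping callback */
SEC("sockops")
int at_ts_cb(struct bpf_sock_ops *skops)
{
	struct bpf_sock *sk;
	__u64 *tsp;
	/* BPF_SOCK_OPS_TS_SW_OPT_CB is one of the callbacks proposed
	 * in this series.
	 */
	if (skops->op != BPF_SOCK_OPS_TS_SW_OPT_CB)
		return 1;
	sk = skops->sk;
	if (!sk)
		return 1;
	tsp = bpf_sk_storage_get(&sendmsg_ts, sk, 0, 0);
	if (tsp)
		bpf_printk("delay since sendmsg: %llu ns",
			   bpf_ktime_get_ns() - *tsp);
	return 1;
}
char _license[] SEC("license") = "GPL";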
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
2025-01-16 0:03 ` Martin KaFai Lau
@ 2025-01-16 0:41 ` Jason Xing
2025-01-16 1:18 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-16 0:41 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 8:04 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/12/25 3:37 AM, Jason Xing wrote:
> > Introduce tskey_bpf to correlate tcp_sendmsg timestamp with other
> > three points (SND/SW/ACK). More details can be found in the
> > selftest.
> >
> > For TCP, tskey_bpf is used to store the initial write_seq value
> > the moment tcp_sendmsg is called, so that the last skb of this
> > call will have the same tskey_bpf as the tcp_sendmsg bpf callback.
> >
> > UDP works similarly because tskey_bpf can increase by one every time
> > udp_sendmsg gets called. It will be implemented soon.
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> > include/linux/skbuff.h | 2 ++
> > include/uapi/linux/bpf.h | 3 +++
> > net/core/sock.c | 3 ++-
> > net/ipv4/tcp.c | 10 ++++++++--
> > tools/include/uapi/linux/bpf.h | 3 +++
> > 5 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index d3ef8db94a94..3b7b470d5d89 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -609,6 +609,8 @@ struct skb_shared_info {
> > };
> > unsigned int gso_type;
> > u32 tskey;
> > + /* For TCP, it records the initial write_seq when sendmsg is called */
> > + u32 tskey_bpf;
>
> I would suggest to remove this tskey_bpf addition to skb_shared_info. My
> understanding is the intention is to get the delay spent in the
> tcp_sendmsg_locked(). I think this can be done in bpf_sk_storage. More below.
>
> >
> > /*
> > * Warning : all fields before dataref are cleared in __alloc_skb()
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index a0aff1b4eb61..87420c0f2235 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7037,6 +7037,9 @@ enum {
> > * feature is on. It indicates the
> > * recorded timestamp.
> > */
> > + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> > + * syscall is triggered
> > + */
>
> UDP will need this also?
Yep.
>
> > };
> >
> > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 2f54e60a50d4..e74ab0e2979d 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -958,7 +958,8 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > if (sk_is_tcp(sk) && sk_fullsock(sk))
> > sock_ops.is_fullsock = 1;
> > sock_ops.sk = sk;
> > - bpf_skops_init_skb(&sock_ops, skb, 0);
> > + if (skb)
> > + bpf_skops_init_skb(&sock_ops, skb, 0);
> > sock_ops.timestamp_used = 1;
> > __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> > }
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 0a41006b10d1..b6e0db5e4ead 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -477,7 +477,7 @@ void tcp_init_sock(struct sock *sk)
> > }
> > EXPORT_SYMBOL(tcp_init_sock);
> >
> > -static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc, u32 first_write_seq)
> > {
> > struct sk_buff *skb = tcp_write_queue_tail(sk);
> > u32 tsflags = sockc->tsflags;
> > @@ -500,6 +500,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > tcb->txstamp_ack_bpf = 1;
> > shinfo->tx_flags |= SKBTX_BPF;
> > shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>
> Add the bpf prog callout here instead:
>
> bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_TCP_SND_CB);
If we trigger the callback for the first time here using
BPF_SOCK_OPS_TS_TCP_SND_CB, it's a little bit late. Do you mean that
we can use the same new callback both at the beginning of
tcp_sendmsg_locked() and in tcp_tx_timestamp()?
>
> If the bpf prog wants to figure out the delay from the very beginning of the
> tcp_sendmsg_locked(), a bpf prog (either by tracing the tcp_sendmsg_locked or by
> adding a new callout at the beginning of tcp_sendmsg_locked like this patch) can
> store a bpf_ktime_get_ns() in the bpf_sk_storage. The bpf prog running here (at
Thanks for a new lesson about the usage of bpf_sk_storage here. I'll
dig into it.
> tcp_tx_timestamp) can get that timestamp from the bpf_sk_storage since it has a
> hold on the same sk pointer. There is no need to add a new shinfo->tskey_bpf to
> measure this part of the delay.
>
> > + shinfo->tskey_bpf = first_write_seq;
> > }
> > }
> >
> > @@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> > int flags, err, copied = 0;
> > int mss_now = 0, size_goal, copied_syn = 0;
> > int process_backlog = 0;
> > + u32 first_write_seq = 0;
> > int zc = 0;
> > long timeo;
> >
> > flags = msg->msg_flags;
> > + if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
> > + first_write_seq = tp->write_seq;
> > + bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
>
> My preference is to skip this bpf callout for now and depends on a bpf trace
> program if it is really needed.
I have no idea how the bpf program could record the timestamp here
without the above three lines. Please enlighten me; thanks in
advance.
I guess there is a way, which I don't know yet, to monitor the
beginning of tcp_sendmsg_locked(). Then the bpf program would handle
BPF_SOCK_OPS_TS_TCP_SND_CB in tcp_tx_timestamp(), as in your previous
comment, with the help of the bpf storage feature to correlate them.
Am I understanding that right?
Thanks,
Jason
>
> > + }
> >
> > if ((flags & MSG_ZEROCOPY) && size) {
> > if (msg->msg_ubuf) {
> > @@ -1331,7 +1337,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >
> > out:
> > if (copied) {
> > - tcp_tx_timestamp(sk, &sockc);
> > + tcp_tx_timestamp(sk, &sockc, first_write_seq);
> > tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> > }
> > out_nopush:
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 0fe7d663a244..3769e38e052d 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -7030,6 +7030,9 @@ enum {
> > * feature is on. It indicates the
> > * recorded timestamp.
> > */
> > + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> > + * syscall is triggered
> > + */
> > };
> >
> > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-15 2:54 ` Jason Xing
@ 2025-01-16 0:51 ` Martin KaFai Lau
2025-01-16 1:12 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-16 0:51 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/14/25 6:54 PM, Jason Xing wrote:
> I construct my thoughts here according to our previous discussion:
> 1. not limiting the use of is_fullsock, so in patch 2, I will use the
> follow codes:
> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> +{
> + struct bpf_sock_ops_kern sock_ops;
> +
> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> + sock_ops.op = op;
> + sock_ops.is_fullsock = 1;
> + sock_ops.sk = sk;
lgtm.
> + BPF_CGROUP_RUN_PROG_SOCK_OPS(sk, &sock_ops, CGROUP_SOCK_OPS);
After looking through the set and looking again at how sk is used in
__skb_tstamp_tx(), I think the sk must be fullsock here, so using
__cgroup_bpf_run_filter_sock_ops() as in patch 2 is good. It will be useful to
have a comment here to explain it must be a fullsock.
> +}
>
> 2. introduce the allow_direct_access flag which is used to test if the
> socket is allowed to access tcp socket or not.
yeah, right now it is only tcp_sock, but the future will have UDP TS support.
Maybe the "allow_direct_access" naming is not obvious to mean the existing
tcp_sock support. Maybe "allow_tcp_access"?
I was thinking to set the allow_direct_access for the "existing" sockops
callback which must be tcp_sock and must have the sk locked.
> On the basis of the above bpf_skops_tx_timestamping() function, I
> would add one check there:
> + if (sk_is_tcp(sk))
> + sock_ops. allow_direct_access = 1;
so don't set this in the new TS callback from bpf_skops_tx_timestamping
regardless of whether it is tcp or not.
>
> Also, I need to set allow_direct_access to one as long as there is
> "sock_ops.is_fullsock = 1;" in the existing callbacks.
Only set allow_direct_access when the sk is fullsock in the "existing" sockops
callback.
After thinking a bit more today, I think this should work. Please give it a try
and check if some cases may be missed in sock_ops_convert_ctx_access().
>
> 3. I will replace is_fullsock with allow_direct_access in
> SOCK_OPS_GET/SET_FIELD() instead of SOCK_OPS_GET_SK().
Yep.
>
> Then the udp socket can freely access the socket with the helper
> SOCK_OPS_GET_SK() because it is a fullsock. And udp socket cannot
> access struct tcp_sock because in the timestamping callback, there is
> no place where setting allow_direct_access for udp use.
__sk_buff->sk? yes.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-16 0:51 ` Martin KaFai Lau
@ 2025-01-16 1:12 ` Jason Xing
2025-01-18 1:42 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-16 1:12 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 8:51 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/14/25 6:54 PM, Jason Xing wrote:
> > I construct my thoughts here according to our previous discussion:
> > 1. not limiting the use of is_fullsock, so in patch 2, I will use the
> > follow codes:
> > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > +{
> > + struct bpf_sock_ops_kern sock_ops;
> > +
> > + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > + sock_ops.op = op;
> > + sock_ops.is_fullsock = 1;
> > + sock_ops.sk = sk;
>
> lgtm.
>
> > + BPF_CGROUP_RUN_PROG_SOCK_OPS(sk, &sock_ops, CGROUP_SOCK_OPS);
>
> After looking through the set and looking again at how sk is used in
> __skb_tstamp_tx(), I think the sk must be fullsock here, so using
> __cgroup_bpf_run_filter_sock_ops() as in patch 2 is good. It will be useful to
> have a comment here to explain it must be a fullsock.
Got it, will add more comments on it.
>
> > +}
> >
> > 2. introduce the allow_direct_access flag which is used to test if the
> > socket is allowed to access tcp socket or not.
>
> yeah, right now it is only tcp_sock, but the future will have UDP TS support.
>
> Maybe the "allow_direct_access" naming is not obvious to mean the existing
> tcp_sock support. Maybe "allow_tcp_access"?
I like this name :)
>
> I was thinking to set the allow_direct_access for the "existing" sockops
> callback which must be tcp_sock and must have the sk locked.
>
> > On the basis of the above bpf_skops_tx_timestamping() function, I
> > would add one check there:
> > + if (sk_is_tcp(sk))
> > + sock_ops. allow_direct_access = 1;
>
> so don't set this in the new TS callback from bpf_skops_tx_timestamping
> regardless of whether it is tcp or not.
>
> >
> > Also, I need to set allow_direct_access to one as long as there is
> > "sock_ops.is_fullsock = 1;" in the existing callbacks.
>
> Only set allow_direct_access when the sk is fullsock in the "existing" sockops
> callback.
Only "existing"? Then how can the bpf program access those members of
the tcp socket structure in the current/new timestamping callbacks?
>
> After thinking a bit more today, I think this should work. Please give it a try
> and check if some cases may be missed in sock_ops_convert_ctx_access().
I will give it a shot this week.
>
> >
> > 3. I will replace is_fullsock with allow_direct_access in
> > SOCK_OPS_GET/SET_FIELD() instead of SOCK_OPS_GET_SK().
>
> Yep.
>
> >
> > Then the udp socket can freely access the socket with the helper
> > SOCK_OPS_GET_SK() because it is a fullsock. And udp socket cannot
> > access struct tcp_sock because in the timestamping callback, there is
> > no place where setting allow_direct_access for udp use.
>
> __sk_buff->sk? yes.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
2025-01-16 0:41 ` Jason Xing
@ 2025-01-16 1:18 ` Martin KaFai Lau
2025-01-16 1:22 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-16 1:18 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/15/25 4:41 PM, Jason Xing wrote:
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index a0aff1b4eb61..87420c0f2235 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -7037,6 +7037,9 @@ enum {
>>> * feature is on. It indicates the
>>> * recorded timestamp.
>>> */
>>> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
>>> + * syscall is triggered
>>> + */
>>
>> UDP will need this also?
>
> Yep.
Then the TCP naming will need to be adjusted.
While on UDP, how will the UDP bpf callback look during sendmsg?
>>> @@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>> int flags, err, copied = 0;
>>> int mss_now = 0, size_goal, copied_syn = 0;
>>> int process_backlog = 0;
>>> + u32 first_write_seq = 0;
>>> int zc = 0;
>>> long timeo;
>>>
>>> flags = msg->msg_flags;
>>> + if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
>>> + first_write_seq = tp->write_seq;
>>> + bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
>>
>> My preference is to skip this bpf callout for now and depends on a bpf trace
>> program if it is really needed.
>
> I have no idea how the bpf program could record the timestamp here
> without the above three lines. Please enlighten me; thanks in
> advance.
>
> I guess there is a way, which I don't know yet, to monitor the
> beginning of tcp_sendmsg_locked().
The tracing bpf program (fentry in particular here). Give the one-liner bpftrace
script a try.
Take a look at trace_tcp_connect in test_sk_storage_tracing.c. It uses fentry
and also bpf_sk_storage_get.
If tcp_sendmsg_locked is inline-d, it can go up to the tcp_sendmsg(). It would
be nice to have a stable bpf callback if it is really useful but I suspect this
can be revisited later with the UDP support.
[ I will followup other replies later. ]
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
2025-01-16 1:18 ` Martin KaFai Lau
@ 2025-01-16 1:22 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-16 1:22 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Thu, Jan 16, 2025 at 9:18 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/15/25 4:41 PM, Jason Xing wrote:
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index a0aff1b4eb61..87420c0f2235 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -7037,6 +7037,9 @@ enum {
> >>> * feature is on. It indicates the
> >>> * recorded timestamp.
> >>> */
> >>> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> >>> + * syscall is triggered
> >>> + */
> >>
> >> UDP will need this also?
> >
> > Yep.
>
> Then the TCP naming will need to be adjusted.
Right, right!
>
> While on UDP, how will the UDP bpf callback look during sendmsg?
>
> >>> @@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>> int flags, err, copied = 0;
> >>> int mss_now = 0, size_goal, copied_syn = 0;
> >>> int process_backlog = 0;
> >>> + u32 first_write_seq = 0;
> >>> int zc = 0;
> >>> long timeo;
> >>>
> >>> flags = msg->msg_flags;
> >>> + if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
> >>> + first_write_seq = tp->write_seq;
> >>> + bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
> >>
> >> My preference is to skip this bpf callout for now and depends on a bpf trace
> >> program if it is really needed.
> >
> > I have no idea how the bpf program could record the timestamp here
> > without the above three lines. Please enlighten me; thanks in
> > advance.
> >
> > I guess there is a way, which I don't know yet, to monitor the
> > beginning of tcp_sendmsg_locked().
>
> The tracing bpf program (fentry in particular here). Give the one-liner bpftrace
> script a try.
>
> Take a look at trace_tcp_connect in test_sk_storage_tracing.c. It uses fentry
> and also bpf_sk_storage_get.
Thanks for the reference!
>
> If tcp_sendmsg_locked is inline-d, it can go up to the tcp_sendmsg(). It would
> be nice to have a stable bpf callback if it is really useful but I suspect this
> can be revisited later with the UDP support.
Got it!
>
> [ I will followup other replies later. ]
>
Thank you!
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
2025-01-12 14:37 ` kernel test robot
2025-01-15 21:48 ` Martin KaFai Lau
@ 2025-01-17 10:18 ` kernel test robot
2 siblings, 0 replies; 61+ messages in thread
From: kernel test robot @ 2025-01-17 10:18 UTC (permalink / raw)
To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, horms
Cc: oe-kbuild-all, bpf, netdev, Jason Xing
Hi Jason,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-support-for-bpf_setsockopt/20250112-194115
base: net-next/main
patch link: https://lore.kernel.org/r/20250112113748.73504-6-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
config: arm-randconfig-r071-20250117 (https://download.01.org/0day-ci/archive/20250117/202501171802.CSquHTL3-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501171802.CSquHTL3-lkp@intel.com/
smatch warnings:
net/core/filter.c:7631 ____bpf_sock_ops_load_hdr_opt() warn: always true condition '(bpf_sock->op != 1009) => (0-255 != 1009)'
vim +7631 net/core/filter.c
7622
7623 BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
7624 void *, search_res, u32, len, u64, flags)
7625 {
7626 bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
7627 const u8 *op, *opend, *magic, *search = search_res;
7628 u8 search_kind, search_len, copy_len, magic_len;
7629 int ret;
7630
> 7631 if (bpf_sock->op != SK_BPF_CB_FLAGS)
7632 return -EINVAL;
7633
7634 /* 2 byte is the minimal option len except TCPOPT_NOP and
7635 * TCPOPT_EOL which are useless for the bpf prog to learn
7636 * and this helper disallow loading them also.
7637 */
7638 if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
7639 return -EINVAL;
7640
7641 search_kind = search[0];
7642 search_len = search[1];
7643
7644 if (search_len > len || search_kind == TCPOPT_NOP ||
7645 search_kind == TCPOPT_EOL)
7646 return -EINVAL;
7647
7648 if (search_kind == TCPOPT_EXP || search_kind == 253) {
7649 /* 16 or 32 bit magic. +2 for kind and kind length */
7650 if (search_len != 4 && search_len != 6)
7651 return -EINVAL;
7652 magic = &search[2];
7653 magic_len = search_len - 2;
7654 } else {
7655 if (search_len)
7656 return -EINVAL;
7657 magic = NULL;
7658 magic_len = 0;
7659 }
7660
7661 if (load_syn) {
7662 ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op);
7663 if (ret < 0)
7664 return ret;
7665
7666 opend = op + ret;
7667 op += sizeof(struct tcphdr);
7668 } else {
7669 if (!bpf_sock->skb ||
7670 bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB)
7671 /* This bpf_sock->op cannot call this helper */
7672 return -EPERM;
7673
7674 opend = bpf_sock->skb_data_end;
7675 op = bpf_sock->skb->data + sizeof(struct tcphdr);
7676 }
7677
7678 op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
7679 &eol);
7680 if (IS_ERR(op))
7681 return PTR_ERR(op);
7682
7683 copy_len = op[1];
7684 ret = copy_len;
7685 if (copy_len > len) {
7686 ret = -ENOSPC;
7687 copy_len = len;
7688 }
7689
7690 memcpy(search_res, op, copy_len);
7691 return ret;
7692 }
7693
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-15 23:56 ` Jason Xing
@ 2025-01-18 0:46 ` Martin KaFai Lau
2025-01-18 1:43 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-18 0:46 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/15/25 3:56 PM, Jason Xing wrote:
> On Thu, Jan 16, 2025 at 6:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 1/12/25 3:37 AM, Jason Xing wrote:
>>> Support SCM_TSTAMP_SND case. Then we will get the software
>>> timestamp when the driver is about to send the skb. Later, I
>>> will support the hardware timestamp.
>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 169c6d03d698..0fb31df4ed95 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
>>> case SCM_TSTAMP_SCHED:
>>> op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
>>> break;
>>> + case SCM_TSTAMP_SND:
>>> + op = BPF_SOCK_OPS_TS_SW_OPT_CB;
>>
>> For the hwtstamps case, is skb_hwtstamps(skb) set? From looking at a few
>> drivers, it does not look like it. I don't see the hwtstamps support in patch 10
>> either. What did I miss ?
>
> Sorry, I missed adding a new flag, namely, BPF_SOCK_OPS_TS_HW_OPT_CB.
> I can also skip adding that new one and rename
> BPF_SOCK_OPS_TS_SW_OPT_CB accordingly for sw and hw cases if we
> finally decide to use hwtstamps parameter to distinguish two different
> cases.
I think having a separate BPF_SOCK_OPS_TS_HW_OPT_CB is better considering your
earlier hwtstamps may be NULL comment. I don't see the drivers I looked at
passing NULL though but the comment of skb_tstamp_tx did say it may be NULL :/
Regardless, afaict, skb_hwtstamps(skb) is still not set to the hwtstamps passed
by the driver here. The bpf prog is supposed to directly get the hwtstamps from
the skops->skb pointer.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-16 1:12 ` Jason Xing
@ 2025-01-18 1:42 ` Martin KaFai Lau
2025-01-18 1:58 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-18 1:42 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/15/25 5:12 PM, Jason Xing wrote:
>>> Also, I need to set allow_direct_access to one as long as there is
>>> "sock_ops.is_fullsock = 1;" in the existing callbacks.
>> Only set allow_direct_access when the sk is fullsock in the "existing" sockops
>> callback.
> Only "existing"? Then how can the bpf program access those members of
> the tcp socket structure in the current/new timestamping callbacks?
There is at least one sk write:
case offsetof(struct bpf_sock_ops, sk_txhash):
SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
struct sock, type);
afaict, the kernel always writes sk->sk_txhash with the sk lock held. The new
timestamping callbacks cannot write because it does not hold the lock.
Otherwise, it needs another flag in bpf_sock_ops_kern to say read only or not.
imo, it is too complicated to be worth it.
It is fine for the new timestamping callbacks not able to access the tcp_sock
fields through the bpf_sock_ops. We are not losing anything. The accessible
tcp_sock fields through the bpf_sock_ops is limited and the bpf_sock_ops api is
pretty much frozen. The bpf prog should use the bpf_core_cast(skops->sk, struct
tcp_sock). The future UDP timestamping support will likely need to use the
bpf_core_cast anyway because we are not extending "struct bpf_sock_ops" for the
udp_sock specific fields.
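A small sketch of the read-only access pattern suggested above, i.e.
using bpf_core_cast() on skops->sk from one of the new (unlocked)
timestamping callbacks; the program name and the fields printed are
illustrative assumptions:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
SEC("sockops")
int ts_read_tcp_sock(struct bpf_sock_ops *skops)
{
	struct bpf_sock *sk = skops->sk;
	struct tcp_sock *tp;
	if (!sk)
		return 1;
	/* read-only CO-RE cast; no tcp_sock write and no lock assumption */
	tp = bpf_core_cast(sk, struct tcp_sock);
	bpf_printk("snd_cwnd=%u srtt_us=%u", tp->snd_cwnd, tp->srtt_us);
	return 1;
}
char _license[] SEC("license") = "GPL";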
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-18 0:46 ` Martin KaFai Lau
@ 2025-01-18 1:43 ` Jason Xing
2025-01-19 13:38 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-18 1:43 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Sat, Jan 18, 2025 at 8:47 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/15/25 3:56 PM, Jason Xing wrote:
> > On Thu, Jan 16, 2025 at 6:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 1/12/25 3:37 AM, Jason Xing wrote:
> >>> Support SCM_TSTAMP_SND case. Then we will get the software
> >>> timestamp when the driver is about to send the skb. Later, I
> >>> will support the hardware timestamp.
> >>
> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>> index 169c6d03d698..0fb31df4ed95 100644
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
> >>> case SCM_TSTAMP_SCHED:
> >>> op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> >>> break;
> >>> + case SCM_TSTAMP_SND:
> >>> + op = BPF_SOCK_OPS_TS_SW_OPT_CB;
> >>
> >> For the hwtstamps case, is skb_hwtstamps(skb) set? From looking at a few
> >> drivers, it does not look like it. I don't see the hwtstamps support in patch 10
> >> either. What did I miss ?
> >
> > Sorry, I missed adding a new flag, namely, BPF_SOCK_OPS_TS_HW_OPT_CB.
> > I can also skip adding that new one and rename
> > BPF_SOCK_OPS_TS_SW_OPT_CB accordingly for sw and hw cases if we
> > finally decide to use hwtstamps parameter to distinguish two different
> > cases.
>
> I think having a separate BPF_SOCK_OPS_TS_HW_OPT_CB is better considering your
> earlier hwtstamps may be NULL comment. I don't see the drivers I looked at
> passing NULL though but the comment of skb_tstamp_tx did say it may be NULL :/
Yep, I was trying not to rely on or trust the hardware/driver's
implementation; otherwise the bpf program could end up fetching the
software timestamp instead of the hardware timestamp, which would
cause unexpected behaviour.
After re-reading this part, I reckon that checking the SKBTX_IN_PROGRESS
flag is enough to recognize whether we are in the hardware timestamp
generation path. I will try that approach since it requires much less
modification.
>
> Regardless, afaict, skb_hwtstamps(skb) is still not set to the hwtstamps passed
> by the driver here. The bpf prog is supposed to directly get the hwtstamps from
> the skops->skb pointer.
Right. I will init the skb_shinfo(skb)->hwtstamps first in this hw callback.
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-18 1:42 ` Martin KaFai Lau
@ 2025-01-18 1:58 ` Jason Xing
2025-01-18 2:16 ` Martin KaFai Lau
0 siblings, 1 reply; 61+ messages in thread
From: Jason Xing @ 2025-01-18 1:58 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Sat, Jan 18, 2025 at 9:42 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/15/25 5:12 PM, Jason Xing wrote:
> >>> Also, I need to set allow_direct_access to one as long as there is
> >>> "sock_ops.is_fullsock = 1;" in the existing callbacks.
> >> Only set allow_direct_access when the sk is fullsock in the "existing" sockops
> >> callback.
> > Only "existing"? Then how can the bpf program access those members of
> > the tcp socket structure in the current/new timestamping callbacks?
>
> There is at least one sk write:
>
> case offsetof(struct bpf_sock_ops, sk_txhash):
> SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
> struct sock, type);
>
> afaict, the kernel always writes sk->sk_txhash with the sk lock held. The new
> timestamping callbacks cannot write because it does not hold the lock.
Surely, I will handle the sk_txhash case as you suggested :)
> Otherwise, it needs another flag in bpf_sock_ops_kern to say read only or not.
> imo, it is too complicated to be worth it.
>
> It is fine for the new timestamping callbacks not able to access the tcp_sock
> fields through the bpf_sock_ops. We are not losing anything. The accessible
Oh, that was my concern.
> tcp_sock fields through the bpf_sock_ops is limited and the bpf_sock_ops api is
> pretty much frozen. The bpf prog should use the bpf_core_cast(skops->sk, struct
> tcp_sock). The future UDP timestamping support will likely need to use the
> bpf_core_cast anyway because we are not extending "struct bpf_sock_ops" for the
> udp_sock specific fields.
Thanks! Now I learned an interesting usage about bpf!
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-15 23:32 ` Jason Xing
@ 2025-01-18 2:15 ` Martin KaFai Lau
2025-01-18 6:28 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-18 2:15 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/15/25 3:32 PM, Jason Xing wrote:
>> +static bool is_locked_tcp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
>> +{
>> + return bpf_sock->op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB;
>
> I wonder if I can use the code snippets in the previous reply in this
> thread, only checking if we are in the timestamping callback?
> +#define BPF_SOCK_OPTS_TS (BPF_SOCK_OPS_TS_SCHED_OPT_CB | \
> + BPF_SOCK_OPS_TS_SW_OPT_CB | \
> + BPF_SOCK_OPS_TS_ACK_OPT_CB | \
> + BPF_SOCK_OPS_TS_TCP_SND_CB)
Note that BPF_SOCK_OPS_*_CB is not a bit flag.
My understanding is that it is a blacklist. Please correct me if I
misinterpret the intention.
>
> > Then other developers won't have to worry too much about whether they
> > will cause safety problems. Otherwise, they will/must add new callbacks
> > earlier than BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
It can't be added earlier because it is in uapi. If a future new cb is safe to
use these helpers, then it needs to adjust the BPF_SOCK_OPS_WRITE_HDR_OPT_CB
check. is_locked_tcp_sock_ops() is a whitelist. The worst case is that someone
discovers the helpers are not usable in the new cb, so there is no safety issue.
If someone forgets to adjust the blacklist and the new cb should not use the
helpers, then it is a safety issue.
Anyhow, I don't have a strong opinion here. I did think about checking the new
TS callback instead. I went with the simplest way in the code and also
considering the BPF_SOCK_OPS_TS_*_CB is only introduced starting from patch 7.
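To illustrate the point that the BPF_SOCK_OPS_*_CB values are sequential
enum constants rather than bit flags, a per-op (blacklist-style) check
would have to enumerate the ops explicitly, roughly as in the sketch
below; the callback names are the ones proposed in this series, and the
whitelist check quoted earlier (op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB) is
the simpler alternative being discussed:
static bool is_tstamp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
{
	/* enum values, not bits, so OR-ing them into a mask does not work */
	switch (bpf_sock->op) {
	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	case BPF_SOCK_OPS_TS_SW_OPT_CB:
	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
		return true;
	default:
		return false;
	}
}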
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-18 1:58 ` Jason Xing
@ 2025-01-18 2:16 ` Martin KaFai Lau
2025-01-18 2:37 ` Jason Xing
0 siblings, 1 reply; 61+ messages in thread
From: Martin KaFai Lau @ 2025-01-18 2:16 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On 1/17/25 5:58 PM, Jason Xing wrote:
>> On 1/15/25 5:12 PM, Jason Xing wrote:
>>>>> Also, I need to set allow_direct_access to one as long as there is
>>>>> "sock_ops.is_fullsock = 1;" in the existing callbacks.
>>>> Only set allow_direct_access when the sk is fullsock in the "existing" sockops
>>>> callback.
>>> Only "existing"? Then how can the bpf program access those members of
>>> the tcp socket structure in the current/new timestamping callbacks?
>> There is at least one sk write:
>>
>> case offsetof(struct bpf_sock_ops, sk_txhash):
>> SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
>> struct sock, type);
>>
>> afaict, the kernel always writes sk->sk_txhash with the sk lock held. The new
>> timestamping callbacks cannot write because it does not hold the lock.
> Surely, I will handle the sk_txhash case as you suggested 🙂
to be clear, not setting the allow_tcp_access in the new timestamping cb should do.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog
2025-01-18 2:16 ` Martin KaFai Lau
@ 2025-01-18 2:37 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-18 2:37 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Sat, Jan 18, 2025 at 10:17 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/17/25 5:58 PM, Jason Xing wrote:
> >> On 1/15/25 5:12 PM, Jason Xing wrote:
> >>>>> Also, I need to set allow_direct_access to one as long as there is
> >>>>> "sock_ops.is_fullsock = 1;" in the existing callbacks.
> >>>> Only set allow_direct_access when the sk is fullsock in the "existing" sockops
> >>>> callback.
> >>> Only "existing"? Then how can the bpf program access those members of
> >>> the tcp socket structure in the current/new timestamping callbacks?
> >> There is at least one sk write:
> >>
> >> case offsetof(struct bpf_sock_ops, sk_txhash):
> >> SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
> >> struct sock, type);
> >>
> >> afaict, the kernel always writes sk->sk_txhash with the sk lock held. The new
> >> timestamping callbacks cannot write because it does not hold the lock.
> > Surely, I will handle the sk_txhash case as you suggested 🙂
>
> to be clear, not setting the allow_tcp_access in the new timestamping cb should do.
Right, I will only apply it to the existing callbacks. Your last email
was pretty clear to me and dispelled my concern. Prior to this, I was
worried about not being allowed to access struct tcp_sock in the
timestamping cb. Thanks for your guidance.
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls
2025-01-18 2:15 ` Martin KaFai Lau
@ 2025-01-18 6:28 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-18 6:28 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Sat, Jan 18, 2025 at 10:15 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/15/25 3:32 PM, Jason Xing wrote:
> >> +static bool is_locked_tcp_sock_ops(struct bpf_sock_ops_kern *bpf_sock)
> >> +{
> >> + return bpf_sock->op <= BPF_SOCK_OPS_WRITE_HDR_OPT_CB;
> >
> > I wonder if I can use the code snippets in the previous reply in this
> > thread, only checking if we are in the timestamping callback?
> > +#define BPF_SOCK_OPTS_TS (BPF_SOCK_OPS_TS_SCHED_OPT_CB | \
> > + BPF_SOCK_OPS_TS_SW_OPT_CB | \
> > + BPF_SOCK_OPS_TS_ACK_OPT_CB | \
> > + BPF_SOCK_OPS_TS_TCP_SND_CB)
>
> Note that BPF_SOCK_OPS_*_CB is not a bit flag.
>
> My understanding is that it is a blacklist. Please correct me if I
> misinterpret the intention.
Yes, blacklist it is.
>
> >
> > Then other developers won't have to worry too much about whether they
> > will cause safety problems. Otherwise, they will/must add new callbacks
> > earlier than BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
>
> It can't be added earlier because it is in uapi. If a future new cb is safe to
> use these helpers, then it needs to adjust the BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> check. is_locked_tcp_sock_ops() is a whitelist. The worst case is that someone
> discovers the helpers are not usable in the new cb, so there is no safety issue.
>
> If someone forgets to adjust the blacklist and the new cb should not use the
> helpers, then it is a safety issue.
>
> Anyhow, I don't have a strong opinion here. I did think about checking the new
> TS callback instead. I went with the simplest way in the code and also
> considering the BPF_SOCK_OPS_TS_*_CB is only introduced starting from patch 7.
Got it, I will follow your instructions :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
2025-01-18 1:43 ` Jason Xing
@ 2025-01-19 13:38 ` Jason Xing
0 siblings, 0 replies; 61+ messages in thread
From: Jason Xing @ 2025-01-19 13:38 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, horms, bpf, netdev
On Sat, Jan 18, 2025 at 9:43 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sat, Jan 18, 2025 at 8:47 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/15/25 3:56 PM, Jason Xing wrote:
> > > On Thu, Jan 16, 2025 at 6:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >>
> > >> On 1/12/25 3:37 AM, Jason Xing wrote:
> > >>> Support SCM_TSTAMP_SND case. Then we will get the software
> > >>> timestamp when the driver is about to send the skb. Later, I
> > >>> will support the hardware timestamp.
> > >>
> > >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > >>> index 169c6d03d698..0fb31df4ed95 100644
> > >>> --- a/net/core/skbuff.c
> > >>> +++ b/net/core/skbuff.c
> > >>> @@ -5578,6 +5578,9 @@ static void __skb_tstamp_tx_bpf(struct sk_buff *skb, struct sock *sk, int tstype
> > >>> case SCM_TSTAMP_SCHED:
> > >>> op = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> > >>> break;
> > >>> + case SCM_TSTAMP_SND:
> > >>> + op = BPF_SOCK_OPS_TS_SW_OPT_CB;
> > >>
> > >> For the hwtstamps case, is skb_hwtstamps(skb) set? From looking at a few
> > >> drivers, it does not look like it. I don't see the hwtstamps support in patch 10
> > >> either. What did I miss?
> > >
> > > Sorry, I missed adding a new flag, namely BPF_SOCK_OPS_TS_HW_OPT_CB.
> > > I could also skip adding that new one and rename
> > > BPF_SOCK_OPS_TS_SW_OPT_CB to cover both the sw and hw cases if we
> > > finally decide to use the hwtstamps parameter to distinguish the two
> > > cases.
> >
> > I think having a separate BPF_SOCK_OPS_TS_HW_OPT_CB is better, considering your
> > earlier comment that hwtstamps may be NULL. The drivers I looked at don't seem
> > to pass NULL, but the comment of skb_tstamp_tx does say it may be NULL :/
>
> Yep, I was trying not to rely on or trust the hardware/driver
> implementation; otherwise the bpf program could fetch the software
> timestamp instead of the hardware timestamp, which would cause unexpected
> behaviour.
>
> After re-reading this part, I reckon that checking the SKBTX_IN_PROGRESS
> flag is enough to recognize whether we are in the hardware timestamp
> generation path. I will try that since it requires much less
> modification.
For the record: SKBTX_IN_PROGRESS cannot indicate whether we are
generating the hardware timestamp. For example, in the i40e driver,
i40e_xmit_frame_ring() calls i40e_tsyn() to set SKBTX_IN_PROGRESS
before the software SND timestamp is generated in i40e_tx_map() via
skb_tx_timestamp().
Therefore, I will keep the current patch as is.
Thanks,
Jason
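As a companion to the note above, here is a minimal userspace sketch of one
of the options discussed in this sub-thread: choosing the software vs.
hardware SND callback from the hwtstamps argument rather than from
SKBTX_IN_PROGRESS, which (as the i40e example shows) can already be set when
skb_tx_timestamp() generates the software timestamp. The names, types, and
standalone form are simplified stand-ins for illustration and are not taken
from the actual patch.

#include <stdio.h>
#include <stddef.h>

/* Simplified stand-ins: tstype values as passed to the tx timestamp path,
 * and the per-type bpf sock_ops callbacks discussed in this series.
 */
enum { TSTAMP_SCHED, TSTAMP_SND, TSTAMP_ACK };
enum { TS_SCHED_OPT_CB, TS_SW_OPT_CB, TS_HW_OPT_CB, TS_ACK_OPT_CB };

/* Stand-in for skb_shared_hwtstamps: non-NULL only on the hw tx path. */
struct hwtstamps_sketch { long long hwtstamp_ns; };

/* Pick the callback op from the hwtstamps argument, not from
 * SKBTX_IN_PROGRESS: drivers such as i40e set that flag in the xmit path
 * (i40e_tsyn()) before skb_tx_timestamp() generates the *software*
 * timestamp, so the flag alone cannot tell the two cases apart.
 */
static int pick_ts_op(int tstype, const struct hwtstamps_sketch *hwtstamps)
{
	switch (tstype) {
	case TSTAMP_SCHED:
		return TS_SCHED_OPT_CB;
	case TSTAMP_SND:
		return hwtstamps ? TS_HW_OPT_CB : TS_SW_OPT_CB;
	case TSTAMP_ACK:
	default:
		return TS_ACK_OPT_CB;
	}
}

int main(void)
{
	struct hwtstamps_sketch hw = { .hwtstamp_ns = 1234567890LL };

	printf("SND without hwtstamps -> op %d (sw cb)\n",
	       pick_ts_op(TSTAMP_SND, NULL));
	printf("SND with hwtstamps    -> op %d (hw cb)\n",
	       pick_ts_op(TSTAMP_SND, &hw));
	return 0;
}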
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2025-01-19 13:39 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2025-01-12 11:37 [PATCH net-next v5 00/15] net-timestamp: bpf extension to equip applications transparently Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 01/15] net-timestamp: add support for bpf_setsockopt() Jason Xing
2025-01-12 14:49 ` kernel test robot
2025-01-13 0:11 ` Jason Xing
2025-01-13 7:32 ` Jason Xing
2025-01-14 23:20 ` Martin KaFai Lau
2025-01-14 23:29 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 02/15] net-timestamp: prepare for bpf prog use Jason Xing
2025-01-14 23:39 ` Martin KaFai Lau
2025-01-15 0:09 ` Jason Xing
2025-01-15 0:15 ` Jason Xing
2025-01-15 0:26 ` Martin KaFai Lau
2025-01-15 0:37 ` Jason Xing
2025-01-15 0:43 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 03/15] bpf: introduce timestamp_used to allow UDP socket fetched in bpf prog Jason Xing
2025-01-15 1:17 ` Martin KaFai Lau
2025-01-15 2:28 ` Jason Xing
2025-01-15 2:54 ` Jason Xing
2025-01-16 0:51 ` Martin KaFai Lau
2025-01-16 1:12 ` Jason Xing
2025-01-18 1:42 ` Martin KaFai Lau
2025-01-18 1:58 ` Jason Xing
2025-01-18 2:16 ` Martin KaFai Lau
2025-01-18 2:37 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 04/15] net-timestamp: support SK_BPF_CB_FLAGS only in bpf_sock_ops_setsockopt Jason Xing
2025-01-15 21:22 ` Martin KaFai Lau
2025-01-15 23:26 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 05/15] net-timestamp: add strict check in some BPF calls Jason Xing
2025-01-12 14:37 ` kernel test robot
2025-01-13 0:28 ` Jason Xing
2025-01-15 21:48 ` Martin KaFai Lau
2025-01-15 23:32 ` Jason Xing
2025-01-18 2:15 ` Martin KaFai Lau
2025-01-18 6:28 ` Jason Xing
2025-01-17 10:18 ` kernel test robot
2025-01-12 11:37 ` [PATCH net-next v5 06/15] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
2025-01-15 22:11 ` Martin KaFai Lau
2025-01-15 23:50 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 07/15] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension Jason Xing
2025-01-15 22:32 ` Martin KaFai Lau
2025-01-15 23:57 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 08/15] net-timestamp: support sw SCM_TSTAMP_SND " Jason Xing
2025-01-15 22:48 ` Martin KaFai Lau
2025-01-15 23:56 ` Jason Xing
2025-01-18 0:46 ` Martin KaFai Lau
2025-01-18 1:43 ` Jason Xing
2025-01-19 13:38 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 09/15] net-timestamp: support SCM_TSTAMP_ACK " Jason Xing
2025-01-15 23:02 ` Martin KaFai Lau
2025-01-12 11:37 ` [PATCH net-next v5 10/15] net-timestamp: support hw SCM_TSTAMP_SND " Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 11/15] net-timestamp: support export skb to the userspace Jason Xing
2025-01-15 23:05 ` Martin KaFai Lau
2025-01-15 23:59 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 12/15] net-timestamp: make TCP tx timestamp bpf extension work Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension Jason Xing
2025-01-16 0:03 ` Martin KaFai Lau
2025-01-16 0:41 ` Jason Xing
2025-01-16 1:18 ` Martin KaFai Lau
2025-01-16 1:22 ` Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 14/15] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases Jason Xing
2025-01-12 11:37 ` [PATCH net-next v5 15/15] bpf: add simple bpf tests in the tx path for so_timestamping feature Jason Xing