[PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
@ 2024-10-08  9:51 Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
                   ` (9 more replies)
  0 siblings, 10 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
tracepoint to print information (say, tstamp) so that we can
transparently equip applications with this feature and require no
modification in user side.

Later, we discussed at netconf and agreed that we can use bpf for better
extension, which is mainly suggested by John Fastabend and Willem de
Bruijn. Many thanks here! So I post this series to see if we have a
better solution to extend. 

This approach relies on existing SO_TIMESTAMPING feature, for tx path,
users only needs to pass certain flags through bpf program to make sure
the last skb from each sendmsg() has timestamp related controlled flag.
For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
and wait for the moment when recvmsg() is called.

After this series, we could step by step implement more advanced
functions/flags already in SO_TIMESTAMPING feature for bpf extension.

Here is the test output:
1) receive path
iperf3-987305  [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0
2) xmit path
iperf3-19765   [013] ...11  2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584
iperf3-19765   [013] b..11  2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976
iperf3-19765   [013] ...11  2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739

Here is the full bpf program:
#include <linux/bpf.h>

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <uapi/linux/net_tstamp.h>

int _version SEC("version") = 1;
char _license[] SEC("license") = "GPL";

# define SO_TIMESTAMPING         37

__section("sockops")
int set_initial_rto(struct bpf_sock_ops *skops)
{
	int op = (int) skops->op;
	u32 sport = 0, dport = 0;
	int rcv_flags;

	switch (op) {
	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		rcv_flags = SOF_TIMESTAMPING_RX_SOFTWARE;
		bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &rcv_flags, sizeof(rcv_flags));
		bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG);
		break;
	case BPF_SOCK_OPS_TX_TS_OPT_CB:
		skops->reply = SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE|
		SOF_TIMESTAMPING_OPT_ID|SOF_TIMESTAMPING_OPT_ID_TCP;
		bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG);
		break;
	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	case BPF_SOCK_OPS_TS_SW_OPT_CB:
	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
		dport = bpf_ntohl(skops->remote_port);
		sport = skops->local_port;
		bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n",
			   sport, dport, skops->args[0], skops->args[1], skops->args[2]);
		break;
	case BPF_SOCK_OPS_TS_RX_OPT_CB:
		dport = bpf_ntohl(skops->remote_port);
		sport = skops->local_port;
		bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u, hwtimestamp: %u,%u\n",
			   sport, dport, skops->args[0], skops->args[1], skops->args[2], skops->args[3]);
		break;
	}
	return 1;
}

Jason Xing (9):
  net-timestamp: add bpf infrastructure to allow exposing more
    information later
  net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit
    timestamp
  net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp
  net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp
  net-timestamp: ready to turn on the button to generate tx timestamps
  net-timestamp: add tx OPT_ID_TCP support for bpf case
  net-timestamp: open gate for bpf_setsockopt
  net-timestamp: add bpf framework for rx timestamps
  net-timestamp: add bpf support for rx software/hardware timestamp

 include/linux/tcp.h            |  2 +-
 include/net/tcp.h              | 14 ++++++
 include/uapi/linux/bpf.h       | 36 ++++++++++++++-
 net/core/filter.c              |  3 ++
 net/core/skbuff.c              | 51 +++++++++++++++++++++
 net/ipv4/tcp.c                 | 81 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 36 ++++++++++++++-
 7 files changed, 217 insertions(+), 6 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08 18:45   ` Willem de Bruijn
  2024-10-09  0:58   ` Kuniyuki Iwashima
  2024-10-08  9:51 ` [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Implement basic codes so that we later can easily add each tx points.
Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
control whether to output or not.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/uapi/linux/bpf.h       |  5 ++++-
 net/core/skbuff.c              | 18 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  5 ++++-
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c6cd7c7aeeee..157e139ed6fc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6900,8 +6900,11 @@ enum {
 	 * options first before the BPF program does.
 	 */
 	BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
+	/* Call bpf when the kernel is generating tx timestamps.
+	 */
+	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
 /* Mask of all currently supported cb flags */
-	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
+	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
 };
 
 /* List of known BPF sock_ops operators.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 74149dc4ee31..5ff1a91c1204 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5539,6 +5539,21 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
 
+static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
+			      struct skb_shared_hwtstamps *hwtstamps)
+{
+	struct tcp_sock *tp;
+
+	if (!sk_is_tcp(sk))
+		return false;
+
+	tp = tcp_sk(sk);
+	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
+		return true;
+
+	return false;
+}
+
 void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     const struct sk_buff *ack_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
@@ -5551,6 +5566,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!sk)
 		return;
 
+	if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
+		return;
+
 	tsflags = READ_ONCE(sk->sk_tsflags);
 	if (!hwtstamps && !(tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
 	    skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1fb3cb2636e6..93853d9d4922 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6899,8 +6899,11 @@ enum {
 	 * options first before the BPF program does.
 	 */
 	BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
+	/* Call bpf when the kernel is generating tx timestamps.
+	 */
+	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
 /* Mask of all currently supported cb flags */
-	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
+	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
 };
 
 /* List of known BPF sock_ops operators.
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to
print timestamps when the skb just passes the dev layer.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/uapi/linux/bpf.h       |  5 +++++
 net/core/skbuff.c              | 16 +++++++++++++++-
 tools/include/uapi/linux/bpf.h |  5 +++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 157e139ed6fc..3cf3c9c896c7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7019,6 +7019,11 @@ enum {
 					 * by the kernel or the
 					 * earlier bpf-progs.
 					 */
+	BPF_SOCK_OPS_TS_SCHED_OPT_CB,	/* Called when skb is passing through
+					 * dev layer when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5ff1a91c1204..e697f50d1182 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5548,8 +5548,22 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
 		return false;
 
 	tp = tcp_sk(sk);
-	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
+	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
+		struct timespec64 tstamp;
+		u32 cb_flag;
+
+		switch (scm_flag) {
+		case SCM_TSTAMP_SCHED:
+			cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
+			break;
+		default:
+			return true;
+		}
+
+		tstamp = ktime_to_timespec64(ktime_get_real());
+		tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
 		return true;
+	}
 
 	return false;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 93853d9d4922..d60675e1a5a0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7018,6 +7018,11 @@ enum {
 					 * by the kernel or the
 					 * earlier bpf-progs.
 					 */
+	BPF_SOCK_OPS_TS_SCHED_OPT_CB,	/* Called when skb is passing through
+					 * dev layer when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08 19:13   ` Vadim Fedorenko
  2024-10-08  9:51 ` [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When the skb is about to send from driver to nic, we can print timestamp
by setting BPF_SOCK_OPS_TS_SW_OPT_CB in bpf program.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/uapi/linux/bpf.h       | 5 +++++
 net/core/skbuff.c              | 8 +++++++-
 tools/include/uapi/linux/bpf.h | 5 +++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3cf3c9c896c7..0d00539f247a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7024,6 +7024,11 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
+					 * to the nic when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e697f50d1182..8faaa96c026b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5556,11 +5556,17 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
 		case SCM_TSTAMP_SCHED:
 			cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
 			break;
+		case SCM_TSTAMP_SND:
+			cb_flag = BPF_SOCK_OPS_TS_SW_OPT_CB;
+			break;
 		default:
 			return true;
 		}
 
-		tstamp = ktime_to_timespec64(ktime_get_real());
+		if (hwtstamps)
+			tstamp = ktime_to_timespec64(hwtstamps->hwtstamp);
+		else
+			tstamp = ktime_to_timespec64(ktime_get_real());
 		tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
 		return true;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d60675e1a5a0..020ec14ffae6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7023,6 +7023,11 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
+					 * to the nic when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (2 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When the last sent skb in each sendmsg() is acknowledged in TCP layer,
we can print timestamp by setting BPF_SOCK_OPS_TS_ACK_OPT_CB in
bpf program.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/uapi/linux/bpf.h       | 5 +++++
 net/core/skbuff.c              | 3 +++
 tools/include/uapi/linux/bpf.h | 5 +++++
 3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0d00539f247a..1b478ec18ac2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7029,6 +7029,11 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TS_ACK_OPT_CB,	/* Called when all the skbs are
+					 * acknowledged when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8faaa96c026b..2b1b2f7d271a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5559,6 +5559,9 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
 		case SCM_TSTAMP_SND:
 			cb_flag = BPF_SOCK_OPS_TS_SW_OPT_CB;
 			break;
+		case SCM_TSTAMP_ACK:
+			cb_flag = BPF_SOCK_OPS_TS_ACK_OPT_CB;
+			break;
 		default:
 			return true;
 		}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 020ec14ffae6..fc9b94de19f2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7028,6 +7028,11 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TS_ACK_OPT_CB,	/* Called when all the skbs are
+					 * acknowledged when SO_TIMESTAMPING
+					 * feature is on. It indicates the
+					 * recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (3 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08 18:53   ` Willem de Bruijn
  2024-10-08 19:18   ` Vadim Fedorenko
  2024-10-08  9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
are three points in the previous patches where generating timestamps
works. Let us make the basic bpf mechanism for timestamping feature
 work finally.

We can use like this as a simple example in bpf program:
__section("sockops")

case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
	dport = bpf_ntohl(skops->remote_port);
	sport = skops->local_port;
	skops->reply = SOF_TIMESTAMPING_TX_SCHED;
	bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	bpf_printk(...);

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/uapi/linux/bpf.h       |  8 ++++++++
 net/ipv4/tcp.c                 | 27 ++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |  8 ++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1b478ec18ac2..6bf3f2892776 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7034,6 +7034,14 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TX_TS_OPT_CB,	/* Called when the last skb from
+					 * sendmsg is going to push when
+					 * SO_TIMESTAMPING feature is on.
+					 * Let user have a chance to switch
+					 * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
+					 * flag for other three tx timestamp
+					 * use.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 82cc4a5633ce..ddf4089779b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
+static u32 bpf_tcp_tx_timestamp(struct sock *sk)
+{
+	u32 flags;
+
+	flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
+	if (flags <= 0)
+		return 0;
+
+	if (flags & ~SOF_TIMESTAMPING_MASK)
+		return 0;
+
+	if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
+		return 0;
+
+	return flags;
+}
+
 static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 {
 	struct sk_buff *skb = tcp_write_queue_tail(sk);
 	u32 tsflags = sockc->tsflags;
+	u32 flags;
+
+	if (!skb)
+		return;
+
+	flags = bpf_tcp_tx_timestamp(sk);
+	if (flags)
+		tsflags = flags;
 
-	if (tsflags && skb) {
+	if (tsflags) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fc9b94de19f2..d3bf538846da 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7033,6 +7033,14 @@ enum {
 					 * feature is on. It indicates the
 					 * recorded timestamp.
 					 */
+	BPF_SOCK_OPS_TX_TS_OPT_CB,	/* Called when the last skb from
+					 * sendmsg is going to push when
+					 * SO_TIMESTAMPING feature is on.
+					 * Let user have a chance to switch
+					 * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
+					 * flag for other three tx timestamp
+					 * use.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (4 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08 18:56   ` Willem de Bruijn
  2024-10-08  9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

We can set OPT_ID|OPT_ID_TCP before we initialize the last skb
from each sendmsg. We only set the socket once like how we use
setsockopt() with OPT_ID|OPT_ID_TCP flags.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/core/skbuff.c | 16 +++++++++++++---
 net/ipv4/tcp.c    | 19 +++++++++++++++----
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2b1b2f7d271a..a60aec450970 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5540,6 +5540,7 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
 
 static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
+			      struct sk_buff *skb,
 			      struct skb_shared_hwtstamps *hwtstamps)
 {
 	struct tcp_sock *tp;
@@ -5550,7 +5551,7 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
 	tp = tcp_sk(sk);
 	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
 		struct timespec64 tstamp;
-		u32 cb_flag;
+		u32 cb_flag, key = 0;
 
 		switch (scm_flag) {
 		case SCM_TSTAMP_SCHED:
@@ -5566,11 +5567,20 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
 			return true;
 		}
 
+		/* We require user to set OPT_ID_TCP to generate key value
+		 * in a robust way.
+		 */
+		if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID &&
+		    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID_TCP) {
+			key = skb_shinfo(skb)->tskey;
+			key -= atomic_read(&sk->sk_tskey);
+		}
+
 		if (hwtstamps)
 			tstamp = ktime_to_timespec64(hwtstamps->hwtstamp);
 		else
 			tstamp = ktime_to_timespec64(ktime_get_real());
-		tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
+		tcp_call_bpf_3arg(sk, cb_flag, key, tstamp.tv_sec, tstamp.tv_nsec);
 		return true;
 	}
 
@@ -5589,7 +5599,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!sk)
 		return;
 
-	if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
+	if (bpf_skb_tstamp_tx(sk, tstype, orig_skb, hwtstamps))
 		return;
 
 	tsflags = READ_ONCE(sk->sk_tsflags);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ddf4089779b5..1d52640f9ff4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -477,7 +477,7 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-static u32 bpf_tcp_tx_timestamp(struct sock *sk)
+static u32 bpf_tcp_tx_timestamp(struct sock *sk, int copied)
 {
 	u32 flags;
 
@@ -491,10 +491,21 @@ static u32 bpf_tcp_tx_timestamp(struct sock *sk)
 	if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
 		return 0;
 
+	/* We require users to set both OPT_ID and OPT_ID_TCP flags
+	 * together here, or else the key might be inaccurate.
+	 */
+	if (flags & SOF_TIMESTAMPING_OPT_ID &&
+	    flags & SOF_TIMESTAMPING_OPT_ID_TCP &&
+	    !(sk->sk_tsflags & (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP))) {
+		atomic_set(&sk->sk_tskey, (tcp_sk(sk)->write_seq - copied));
+		sk->sk_tsflags |= (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP);
+	}
+
 	return flags;
 }
 
-static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
+static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc,
+			     int copied)
 {
 	struct sk_buff *skb = tcp_write_queue_tail(sk);
 	u32 tsflags = sockc->tsflags;
@@ -503,7 +514,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 	if (!skb)
 		return;
 
-	flags = bpf_tcp_tx_timestamp(sk);
+	flags = bpf_tcp_tx_timestamp(sk, copied);
 	if (flags)
 		tsflags = flags;
 
@@ -1347,7 +1358,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 out:
 	if (copied) {
-		tcp_tx_timestamp(sk, &sockc);
+		tcp_tx_timestamp(sk, &sockc, copied);
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 	}
 out_nopush:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (5 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-09  7:19   ` Martin KaFai Lau
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Now we allow users to set tsflags through bpf_setsockopt. What I
want to do is passing SOF_TIMESTAMPING_RX_SOFTWARE flag, so that
we can generate rx timestamps the moment the skb traverses through
driver.

Here is an example:

case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
	sock_opt = SOF_TIMESTAMPING_RX_SOFTWARE;
	bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
		       &sock_opt, sizeof(sock_opt));
	break;

In this way, we can use bpf program that help us generate and report
rx timestamp.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/core/filter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index bd0d08bf76bb..9ce99d320571 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5225,6 +5225,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
 		break;
 	case SO_BINDTODEVICE:
 		break;
+	case SO_TIMESTAMPING_NEW:
+	case SO_TIMESTAMPING_OLD:
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (6 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-09  0:22   ` Jakub Kicinski
                     ` (3 more replies)
  2024-10-08  9:51 ` [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp Jason Xing
  2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
  9 siblings, 4 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Prepare for later changes in this series. Here I use u32 for
bpf_sock_ops_cb_flags for better extension and introduce a new
rx bpf flag to control separately.

Main change is let userside set through bpf_setsockopt() for
SO_TIMESTAMPING feature.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/linux/tcp.h            |  2 +-
 include/uapi/linux/bpf.h       |  5 ++++-
 net/ipv4/tcp.c                 | 13 +++++++++++++
 tools/include/uapi/linux/bpf.h |  5 ++++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6a5e08b937b3..e21fd3035962 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -446,7 +446,7 @@ struct tcp_sock {
 
 /* Sock_ops bpf program related variables */
 #ifdef CONFIG_BPF
-	u8	bpf_sock_ops_cb_flags;  /* Control calling BPF programs
+	u32	bpf_sock_ops_cb_flags;  /* Control calling BPF programs
 					 * values defined in uapi/linux/tcp.h
 					 */
 	u8	bpf_chg_cc_inprogress:1; /* In the middle of
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6bf3f2892776..3c28d74d14ea 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6903,8 +6903,11 @@ enum {
 	/* Call bpf when the kernel is generating tx timestamps.
 	 */
 	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
+	/* Call bpf when the kernel is generating rx timestamps.
+	 */
+	BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG = (1<<8),
 /* Mask of all currently supported cb flags */
-	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
+	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x1FF,
 };
 
 /* List of known BPF sock_ops operators.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1d52640f9ff4..938e2bff4fa6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2276,6 +2276,16 @@ static int tcp_zerocopy_receive(struct sock *sk,
 }
 #endif
 
+static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG))
+		return true;
+
+	return false;
+}
+
 /* Similar to __sock_recv_timestamp, but does not require an skb */
 void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
 			struct scm_timestamping_internal *tss)
@@ -2284,6 +2294,9 @@ void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
 	u32 tsflags = READ_ONCE(sk->sk_tsflags);
 	bool has_timestamping = false;
 
+	if (tcp_bpf_recv_timestamp(sk, tss))
+		return;
+
 	if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
 		if (sock_flag(sk, SOCK_RCVTSTAMP)) {
 			if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d3bf538846da..ff17cd820bde 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6902,8 +6902,11 @@ enum {
 	/* Call bpf when the kernel is generating tx timestamps.
 	 */
 	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
+	/* Call bpf when the kernel is generating rx timestamps.
+	 */
+	BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG = (1<<8),
 /* Mask of all currently supported cb flags */
-	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
+	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x1FF,
 };
 
 /* List of known BPF sock_ops operators.
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (7 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
@ 2024-10-08  9:51 ` Jason Xing
  2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
  9 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08  9:51 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Now it's time to let the bpf for rx timestamp take effect.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/tcp.h              | 14 ++++++++++++++
 include/uapi/linux/bpf.h       |  5 +++++
 net/ipv4/tcp.c                 | 28 +++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |  5 +++++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 739a9fb83d0c..416a039da472 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2676,6 +2676,14 @@ static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
 	return tcp_call_bpf(sk, op, 3, args);
 }
 
+static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+				    u32 arg3, u32 arg4)
+{
+	u32 args[4] = {arg1, arg2, arg3, arg4};
+
+	return tcp_call_bpf(sk, op, 4, args);
+}
+
 #else
 static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
 {
@@ -2693,6 +2701,12 @@ static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
 	return -EPERM;
 }
 
+static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+				    u32 arg3, u32 arg4)
+{
+	return -EPERM;
+}
+
 #endif
 
 static inline u32 tcp_timeout_init(struct sock *sk)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c28d74d14ea..ffaa483f1362 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7045,6 +7045,11 @@ enum {
 					 * flag for other three tx timestamp
 					 * use.
 					 */
+	BPF_SOCK_OPS_TS_RX_OPT_CB,	/* Called when tcp layer tries to
+					 * receive skbs with timestamps when
+					 * SO_TIMESTAMPING feature is on
+					 * It indicates the recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 938e2bff4fa6..f6addd26db9f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2278,10 +2278,36 @@ static int tcp_zerocopy_receive(struct sock *sk,
 
 static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
 {
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG))
+	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG)) {
+		u32 hw_sec, hw_nsec, sw_sec, sw_nsec;
+
+		if (!(tsflags & (SOF_TIMESTAMPING_RX_SOFTWARE |
+		      SOF_TIMESTAMPING_RX_HARDWARE)))
+			return true;
+
+		if (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) {
+			sw_sec = tss->ts[0].tv_sec;
+			sw_nsec = tss->ts[0].tv_nsec;
+		} else {
+			sw_sec = 0;
+			sw_nsec = 0;
+		}
+
+		if (tsflags & SOF_TIMESTAMPING_RX_HARDWARE) {
+			hw_sec = tss->ts[2].tv_sec;
+			hw_nsec = tss->ts[2].tv_nsec;
+		} else {
+			hw_sec = 0;
+			hw_nsec = 0;
+		}
+
+		tcp_call_bpf_4arg(sk, BPF_SOCK_OPS_TS_RX_OPT_CB,
+				  sw_sec, sw_nsec, hw_sec, hw_nsec);
 		return true;
+	}
 
 	return false;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ff17cd820bde..8a87fee2e012 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7044,6 +7044,11 @@ enum {
 					 * flag for other three tx timestamp
 					 * use.
 					 */
+	BPF_SOCK_OPS_TS_RX_OPT_CB,	/* Called when tcp layer tries to
+					 * receive skbs with timestamps when
+					 * SO_TIMESTAMPING feature is on
+					 * It indicates the recorded timestamp.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
                   ` (8 preceding siblings ...)
  2024-10-08  9:51 ` [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp Jason Xing
@ 2024-10-08 18:44 ` Willem de Bruijn
  2024-10-08 23:22   ` Jason Xing
  9 siblings, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-08 18:44 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> tracepoint to print information (say, tstamp) so that we can
> transparently equip applications with this feature and require no
> modification in user side.
> 
> Later, we discussed at netconf and agreed that we can use bpf for better
> extension, which is mainly suggested by John Fastabend and Willem de
> Bruijn. Many thanks here! So I post this series to see if we have a
> better solution to extend. 
> 
> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> users only needs to pass certain flags through bpf program to make sure
> the last skb from each sendmsg() has timestamp related controlled flag.
> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> and wait for the moment when recvmsg() is called.

As you mention, overall I am very supportive of having a way to add
timestamping by adminstrators, without having to rebuild applications.
BPF hooks seem to be the right place for this.

There is existing kprobe/kretprobe/kfunc support. Supporting
SO_TIMESTAMPING directly may be useful due to its targeted feature
set, and correlation between measurements for the same data in the
stream.
 
> After this series, we could step by step implement more advanced
> functions/flags already in SO_TIMESTAMPING feature for bpf extension.

My main implementation concern is where this API overlaps with the
existing user API, and how they might conflict. A few questions in the
patches.

Also, I'm not the best person to give in-depth BPF feedback.

> Here is the test output:
> 1) receive path
> iperf3-987305  [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0
> 2) xmit path
> iperf3-19765   [013] ...11  2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584
> iperf3-19765   [013] b..11  2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976
> iperf3-19765   [013] ...11  2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
@ 2024-10-08 18:45   ` Willem de Bruijn
  2024-10-08 23:27     ` Jason Xing
  2024-10-09  0:58   ` Kuniyuki Iwashima
  1 sibling, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-08 18:45 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Implement basic codes so that we later can easily add each tx points.
> Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> control whether to output or not.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/uapi/linux/bpf.h       |  5 ++++-
>  net/core/skbuff.c              | 18 ++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  5 ++++-
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c6cd7c7aeeee..157e139ed6fc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6900,8 +6900,11 @@ enum {
>  	 * options first before the BPF program does.
>  	 */
>  	BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> +	/* Call bpf when the kernel is generating tx timestamps.
> +	 */
> +	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
>  /* Mask of all currently supported cb flags */
> -	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> +	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
>  };
>  
>  /* List of known BPF sock_ops operators.
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 74149dc4ee31..5ff1a91c1204 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5539,6 +5539,21 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>  }
>  EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
>  
> +static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
> +			      struct skb_shared_hwtstamps *hwtstamps)
> +{
> +	struct tcp_sock *tp;
> +
> +	if (!sk_is_tcp(sk))
> +		return false;
> +
> +	tp = tcp_sk(sk);
> +	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> +		return true;
> +
> +	return false;
> +}
> +
>  void __skb_tstamp_tx(struct sk_buff *orig_skb,
>  		     const struct sk_buff *ack_skb,
>  		     struct skb_shared_hwtstamps *hwtstamps,
> @@ -5551,6 +5566,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>  	if (!sk)
>  		return;
>  
> +	if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
> +		return;
> +

Eventually, this whole feature could probably be behind a
static_branch.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
@ 2024-10-08 18:53   ` Willem de Bruijn
  2024-10-08 23:37     ` Jason Xing
  2024-10-08 19:18   ` Vadim Fedorenko
  1 sibling, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-08 18:53 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
> are three points in the previous patches where generating timestamps
> works. Let us make the basic bpf mechanism for timestamping feature
>  work finally.
> 
> We can use like this as a simple example in bpf program:
> __section("sockops")
> 
> case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
> 	dport = bpf_ntohl(skops->remote_port);
> 	sport = skops->local_port;
> 	skops->reply = SOF_TIMESTAMPING_TX_SCHED;
> 	bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
> case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> 	bpf_printk(...);
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>

>  /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 82cc4a5633ce..ddf4089779b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
>  }
>  EXPORT_SYMBOL(tcp_init_sock);
>  
> +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> +{
> +	u32 flags;
> +
> +	flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
> +	if (flags <= 0)
> +		return 0;
> +
> +	if (flags & ~SOF_TIMESTAMPING_MASK)
> +		return 0;
> +
> +	if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> +		return 0;
> +
> +	return flags;
> +}
> +
>  static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>  {
>  	struct sk_buff *skb = tcp_write_queue_tail(sk);
>  	u32 tsflags = sockc->tsflags;
> +	u32 flags;
> +
> +	if (!skb)
> +		return;
> +
> +	flags = bpf_tcp_tx_timestamp(sk);
> +	if (flags)
> +		tsflags = flags;

So this feature overwrites the flags set by the user?

Ideally we would use an entirely separate field for BPF admin
timestamping requests.

>  
> -	if (tsflags && skb) {
> +	if (tsflags) {
>  		struct skb_shared_info *shinfo = skb_shinfo(skb);
>  		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case
  2024-10-08  9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
@ 2024-10-08 18:56   ` Willem de Bruijn
  2024-10-08 23:18     ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-08 18:56 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> We can set OPT_ID|OPT_ID_TCP before we initialize the last skb
> from each sendmsg. We only set the socket once like how we use
> setsockopt() with OPT_ID|OPT_ID_TCP flags.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/core/skbuff.c | 16 +++++++++++++---
>  net/ipv4/tcp.c    | 19 +++++++++++++++----
>  2 files changed, 28 insertions(+), 7 deletions(-)
> 

> @@ -491,10 +491,21 @@ static u32 bpf_tcp_tx_timestamp(struct sock *sk)
>  	if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
>  		return 0;
>  
> +	/* We require users to set both OPT_ID and OPT_ID_TCP flags
> +	 * together here, or else the key might be inaccurate.
> +	 */
> +	if (flags & SOF_TIMESTAMPING_OPT_ID &&
> +	    flags & SOF_TIMESTAMPING_OPT_ID_TCP &&
> +	    !(sk->sk_tsflags & (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP))) {
> +		atomic_set(&sk->sk_tskey, (tcp_sk(sk)->write_seq - copied));
> +		sk->sk_tsflags |= (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP);

So user and BPF admin conflict on both sk_tsflags and sktskey?

I think BPF resetting this key, or incrementing it, may break user
expectations.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp
  2024-10-08  9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
@ 2024-10-08 19:13   ` Vadim Fedorenko
  2024-10-08 23:08     ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-08 19:13 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

On 08/10/2024 10:51, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> When the skb is about to send from driver to nic, we can print timestamp
> by setting BPF_SOCK_OPS_TS_SW_OPT_CB in bpf program.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   include/uapi/linux/bpf.h       | 5 +++++
>   net/core/skbuff.c              | 8 +++++++-
>   tools/include/uapi/linux/bpf.h | 5 +++++
>   3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 3cf3c9c896c7..0d00539f247a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7024,6 +7024,11 @@ enum {
>   					 * feature is on. It indicates the
>   					 * recorded timestamp.
>   					 */
> +	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
> +					 * to the nic when SO_TIMESTAMPING
> +					 * feature is on. It indicates the
> +					 * recorded timestamp.
> +					 */
>   };
>   
>   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e697f50d1182..8faaa96c026b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5556,11 +5556,17 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
>   		case SCM_TSTAMP_SCHED:
>   			cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
>   			break;
> +		case SCM_TSTAMP_SND:
> +			cb_flag = BPF_SOCK_OPS_TS_SW_OPT_CB;
> +			break;
>   		default:
>   			return true;
>   		}
>   
> -		tstamp = ktime_to_timespec64(ktime_get_real());
> +		if (hwtstamps)
> +			tstamp = ktime_to_timespec64(hwtstamps->hwtstamp);
> +		else
> +			tstamp = ktime_to_timespec64(ktime_get_real());

Looks like this chunk belongs to another patch?

>   		tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
>   		return true;
>   	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d60675e1a5a0..020ec14ffae6 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7023,6 +7023,11 @@ enum {
>   					 * feature is on. It indicates the
>   					 * recorded timestamp.
>   					 */
> +	BPF_SOCK_OPS_TS_SW_OPT_CB,	/* Called when skb is about to send
> +					 * to the nic when SO_TIMESTAMPING
> +					 * feature is on. It indicates the
> +					 * recorded timestamp.
> +					 */
>   };
>   
>   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
  2024-10-08 18:53   ` Willem de Bruijn
@ 2024-10-08 19:18   ` Vadim Fedorenko
  2024-10-08 23:48     ` Jason Xing
  1 sibling, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-08 19:18 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: bpf, netdev, Jason Xing

On 08/10/2024 10:51, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
> are three points in the previous patches where generating timestamps
> works. Let us make the basic bpf mechanism for timestamping feature
>   work finally.
> 
> We can use like this as a simple example in bpf program:
> __section("sockops")
> 
> case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
> 	dport = bpf_ntohl(skops->remote_port);
> 	sport = skops->local_port;
> 	skops->reply = SOF_TIMESTAMPING_TX_SCHED;
> 	bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
> case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> 	bpf_printk(...);
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   include/uapi/linux/bpf.h       |  8 ++++++++
>   net/ipv4/tcp.c                 | 27 ++++++++++++++++++++++++++-
>   tools/include/uapi/linux/bpf.h |  8 ++++++++
>   3 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 1b478ec18ac2..6bf3f2892776 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7034,6 +7034,14 @@ enum {
>   					 * feature is on. It indicates the
>   					 * recorded timestamp.
>   					 */
> +	BPF_SOCK_OPS_TX_TS_OPT_CB,	/* Called when the last skb from
> +					 * sendmsg is going to push when
> +					 * SO_TIMESTAMPING feature is on.
> +					 * Let user have a chance to switch
> +					 * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
> +					 * flag for other three tx timestamp
> +					 * use.
> +					 */
>   };
>   
>   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 82cc4a5633ce..ddf4089779b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
>   }
>   EXPORT_SYMBOL(tcp_init_sock);
>   
> +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> +{
> +	u32 flags;
> +
> +	flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
> +	if (flags <= 0)
> +		return 0;
> +
> +	if (flags & ~SOF_TIMESTAMPING_MASK)
> +		return 0;
> +
> +	if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> +		return 0;
> +
> +	return flags;
> +}
> +
>   static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>   {
>   	struct sk_buff *skb = tcp_write_queue_tail(sk);
>   	u32 tsflags = sockc->tsflags;
> +	u32 flags;
> +
> +	if (!skb)
> +		return;
> +
> +	flags = bpf_tcp_tx_timestamp(sk);
> +	if (flags)
> +		tsflags = flags;

In this case it's impossible to clear timestamping flags from bpf
program, but it may be very useful. Consider providing flags from
socket cookie to the program or maybe add an option to combine them?

>   
> -	if (tsflags && skb) {
> +	if (tsflags) {
>   		struct skb_shared_info *shinfo = skb_shinfo(skb);
>   		struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
>   
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index fc9b94de19f2..d3bf538846da 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7033,6 +7033,14 @@ enum {
>   					 * feature is on. It indicates the
>   					 * recorded timestamp.
>   					 */
> +	BPF_SOCK_OPS_TX_TS_OPT_CB,	/* Called when the last skb from
> +					 * sendmsg is going to push when
> +					 * SO_TIMESTAMPING feature is on.
> +					 * Let user have a chance to switch
> +					 * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
> +					 * flag for other three tx timestamp
> +					 * use.
> +					 */
>   };
>   
>   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp
  2024-10-08 19:13   ` Vadim Fedorenko
@ 2024-10-08 23:08     ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:08 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf,
	netdev, Jason Xing

On Wed, Oct 9, 2024 at 3:13 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 08/10/2024 10:51, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > When the skb is about to send from driver to nic, we can print timestamp
> > by setting BPF_SOCK_OPS_TS_SW_OPT_CB in bpf program.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   include/uapi/linux/bpf.h       | 5 +++++
> >   net/core/skbuff.c              | 8 +++++++-
> >   tools/include/uapi/linux/bpf.h | 5 +++++
> >   3 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 3cf3c9c896c7..0d00539f247a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7024,6 +7024,11 @@ enum {
> >                                        * feature is on. It indicates the
> >                                        * recorded timestamp.
> >                                        */
> > +     BPF_SOCK_OPS_TS_SW_OPT_CB,      /* Called when skb is about to send
> > +                                      * to the nic when SO_TIMESTAMPING
> > +                                      * feature is on. It indicates the
> > +                                      * recorded timestamp.
> > +                                      */
> >   };
> >
> >   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index e697f50d1182..8faaa96c026b 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5556,11 +5556,17 @@ static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
> >               case SCM_TSTAMP_SCHED:
> >                       cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> >                       break;
> > +             case SCM_TSTAMP_SND:
> > +                     cb_flag = BPF_SOCK_OPS_TS_SW_OPT_CB;
> > +                     break;
> >               default:
> >                       return true;
> >               }
> >
> > -             tstamp = ktime_to_timespec64(ktime_get_real());
> > +             if (hwtstamps)
> > +                     tstamp = ktime_to_timespec64(hwtstamps->hwtstamp);
> > +             else
> > +                     tstamp = ktime_to_timespec64(ktime_get_real());
>
> Looks like this chunk belongs to another patch?

Makes sense. I could remove it here and write them when we need to
implement hwtstamp logic.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case
  2024-10-08 18:56   ` Willem de Bruijn
@ 2024-10-08 23:18     ` Jason Xing
  2024-10-09 13:19       ` Willem de Bruijn
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:18 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 2:56 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > We can set OPT_ID|OPT_ID_TCP before we initialize the last skb
> > from each sendmsg. We only set the socket once like how we use
> > setsockopt() with OPT_ID|OPT_ID_TCP flags.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/core/skbuff.c | 16 +++++++++++++---
> >  net/ipv4/tcp.c    | 19 +++++++++++++++----
> >  2 files changed, 28 insertions(+), 7 deletions(-)
> >
>
> > @@ -491,10 +491,21 @@ static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> >       if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> >               return 0;
> >
> > +     /* We require users to set both OPT_ID and OPT_ID_TCP flags
> > +      * together here, or else the key might be inaccurate.
> > +      */
> > +     if (flags & SOF_TIMESTAMPING_OPT_ID &&
> > +         flags & SOF_TIMESTAMPING_OPT_ID_TCP &&
> > +         !(sk->sk_tsflags & (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP))) {
> > +             atomic_set(&sk->sk_tskey, (tcp_sk(sk)->write_seq - copied));
> > +             sk->sk_tsflags |= (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP);
>
> So user and BPF admin conflict on both sk_tsflags and sktskey?
>
> I think BPF resetting this key, or incrementing it, may break user
> expectations.

Yes, when it comes to OPT_ID and OPT_ID_TCP, conflict could happen.
The reason why I don't use it like BPF_SOCK_OPS_TS_SCHED_OPT_CB flags
(which is set along with each last skb) is that OPT_ID logic is a
little bit complex. If we want to avoid touching sk_tsflags field in
struct sock, we have to re-implement a similiar logic as you've
already done in these years.

Now, this patch is easier but as you said it may "break" users... But
I wonder if we can give the bpf program the first priority like what
TCP_BPF_RTO_MIN does. TCP_BPF_RTO_MIN can override icsk_rto_min field
in struct inet_connection_sock.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
@ 2024-10-08 23:22   ` Jason Xing
  2024-10-09  1:05     ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:22 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > tracepoint to print information (say, tstamp) so that we can
> > transparently equip applications with this feature and require no
> > modification in user side.
> >
> > Later, we discussed at netconf and agreed that we can use bpf for better
> > extension, which is mainly suggested by John Fastabend and Willem de
> > Bruijn. Many thanks here! So I post this series to see if we have a
> > better solution to extend.
> >
> > This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> > users only needs to pass certain flags through bpf program to make sure
> > the last skb from each sendmsg() has timestamp related controlled flag.
> > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> > and wait for the moment when recvmsg() is called.
>
> As you mention, overall I am very supportive of having a way to add
> timestamping by adminstrators, without having to rebuild applications.
> BPF hooks seem to be the right place for this.
>
> There is existing kprobe/kretprobe/kfunc support. Supporting
> SO_TIMESTAMPING directly may be useful due to its targeted feature
> set, and correlation between measurements for the same data in the
> stream.
>
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>
> My main implementation concern is where this API overlaps with the
> existing user API, and how they might conflict. A few questions in the
> patches.

Agreed. That's also what I'm concerned about. So I decided to ask for
related experts' help.

How to deal with it without interfering with the existing apps in the
right way is the key problem.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-08 18:45   ` Willem de Bruijn
@ 2024-10-08 23:27     ` Jason Xing
  2024-10-09 13:22       ` Willem de Bruijn
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:27 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 2:45 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Implement basic codes so that we later can easily add each tx points.
> > Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> > control whether to output or not.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  include/uapi/linux/bpf.h       |  5 ++++-
> >  net/core/skbuff.c              | 18 ++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  5 ++++-
> >  3 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c6cd7c7aeeee..157e139ed6fc 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6900,8 +6900,11 @@ enum {
> >        * options first before the BPF program does.
> >        */
> >       BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> > +     /* Call bpf when the kernel is generating tx timestamps.
> > +      */
> > +     BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
> >  /* Mask of all currently supported cb flags */
> > -     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> > +     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
> >  };
> >
> >  /* List of known BPF sock_ops operators.
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 74149dc4ee31..5ff1a91c1204 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5539,6 +5539,21 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >  }
> >  EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >
> > +static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
> > +                           struct skb_shared_hwtstamps *hwtstamps)
> > +{
> > +     struct tcp_sock *tp;
> > +
> > +     if (!sk_is_tcp(sk))
> > +             return false;
> > +
> > +     tp = tcp_sk(sk);
> > +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> >  void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >                    const struct sk_buff *ack_skb,
> >                    struct skb_shared_hwtstamps *hwtstamps,
> > @@ -5551,6 +5566,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >       if (!sk)
> >               return;
> >
> > +     if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
> > +             return;
> > +
>
> Eventually, this whole feature could probably be behind a
> static_branch.

You want to implement another toggle to control it? But for tx path
"BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)"
works as a per-netns toggle. I would like to know what you exactly
want to do in the next move?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08 18:53   ` Willem de Bruijn
@ 2024-10-08 23:37     ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:37 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 2:53 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
> > are three points in the previous patches where generating timestamps
> > works. Let us make the basic bpf mechanism for timestamping feature
> >  work finally.
> >
> > We can use like this as a simple example in bpf program:
> > __section("sockops")
> >
> > case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
> >       dport = bpf_ntohl(skops->remote_port);
> >       sport = skops->local_port;
> >       skops->reply = SOF_TIMESTAMPING_TX_SCHED;
> >       bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
> > case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> >       bpf_printk(...);
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
>
> >  /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 82cc4a5633ce..ddf4089779b5 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
> >  }
> >  EXPORT_SYMBOL(tcp_init_sock);
> >
> > +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> > +{
> > +     u32 flags;
> > +
> > +     flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
> > +     if (flags <= 0)
> > +             return 0;
> > +
> > +     if (flags & ~SOF_TIMESTAMPING_MASK)
> > +             return 0;
> > +
> > +     if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> > +             return 0;
> > +
> > +     return flags;
> > +}
> > +
> >  static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >  {
> >       struct sk_buff *skb = tcp_write_queue_tail(sk);
> >       u32 tsflags = sockc->tsflags;
> > +     u32 flags;
> > +
> > +     if (!skb)
> > +             return;
> > +
> > +     flags = bpf_tcp_tx_timestamp(sk);
> > +     if (flags)
> > +             tsflags = flags;
>
> So this feature overwrites the flags set by the user?

It only overrides each last skb instead of the whole socket so that
some time if we don't want to use this bpf program any more, we could
easily and directly detach it without having to find a proper time to
clear the fields in struct sock. That's the advantage of setting
through each sendmsg call, compared to bpf_setsockopt method.

> Ideally we would use an entirely separate field for BPF admin
> timestamping requests.

I understand what you mean. I'm not that familiar with how a bpf
extension actually implements, so I dug into how RTO min time can be
affected by bpf programs (see BPF_SOCK_OPS_TIMEOUT_INIT as an
example). It also modifies the existing field.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08 19:18   ` Vadim Fedorenko
@ 2024-10-08 23:48     ` Jason Xing
  2024-10-09  9:16       ` Vadim Fedorenko
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-08 23:48 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf,
	netdev, Jason Xing

On Wed, Oct 9, 2024 at 3:18 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 08/10/2024 10:51, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
> > are three points in the previous patches where generating timestamps
> > works. Let us make the basic bpf mechanism for timestamping feature
> >   work finally.
> >
> > We can use like this as a simple example in bpf program:
> > __section("sockops")
> >
> > case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
> >       dport = bpf_ntohl(skops->remote_port);
> >       sport = skops->local_port;
> >       skops->reply = SOF_TIMESTAMPING_TX_SCHED;
> >       bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
> > case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> >       bpf_printk(...);
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   include/uapi/linux/bpf.h       |  8 ++++++++
> >   net/ipv4/tcp.c                 | 27 ++++++++++++++++++++++++++-
> >   tools/include/uapi/linux/bpf.h |  8 ++++++++
> >   3 files changed, 42 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 1b478ec18ac2..6bf3f2892776 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7034,6 +7034,14 @@ enum {
> >                                        * feature is on. It indicates the
> >                                        * recorded timestamp.
> >                                        */
> > +     BPF_SOCK_OPS_TX_TS_OPT_CB,      /* Called when the last skb from
> > +                                      * sendmsg is going to push when
> > +                                      * SO_TIMESTAMPING feature is on.
> > +                                      * Let user have a chance to switch
> > +                                      * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
> > +                                      * flag for other three tx timestamp
> > +                                      * use.
> > +                                      */
> >   };
> >
> >   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 82cc4a5633ce..ddf4089779b5 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
> >   }
> >   EXPORT_SYMBOL(tcp_init_sock);
> >
> > +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> > +{
> > +     u32 flags;
> > +
> > +     flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
> > +     if (flags <= 0)
> > +             return 0;
> > +
> > +     if (flags & ~SOF_TIMESTAMPING_MASK)
> > +             return 0;
> > +
> > +     if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> > +             return 0;
> > +
> > +     return flags;
> > +}
> > +
> >   static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >   {
> >       struct sk_buff *skb = tcp_write_queue_tail(sk);
> >       u32 tsflags = sockc->tsflags;
> > +     u32 flags;
> > +
> > +     if (!skb)
> > +             return;
> > +
> > +     flags = bpf_tcp_tx_timestamp(sk);
> > +     if (flags)
> > +             tsflags = flags;
>
> In this case it's impossible to clear timestamping flags from bpf

It cannot be cleared only from the last skb until the next round of
recvmsg. Since the last skb is generated and bpf program is attached,
I would like to know why we need to clear the related fields in the
skb? Please note that I didn't hack the sk_tstflags in struct sock :)

> program, but it may be very useful. Consider providing flags from
> socket cookie to the program or maybe add an option to combine them?

Thanks for this idea. May I ask what the benefits are through adding
an option because the bpf test statement (BPF_SOCK_OPS_TEST_FLAG) is a
good option to take a whole control? Or could you provide more details
about how you expect to do so?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
@ 2024-10-09  0:22   ` Jakub Kicinski
  2024-10-09  0:30     ` Jason Xing
  2024-10-09  2:33   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2024-10-09  0:22 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, pabeni, dsahern, willemdebruijn.kernel, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Tue,  8 Oct 2024 17:51:08 +0800 Jason Xing wrote:
> +static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
> +{
> +	struct tcp_sock *tp = tcp_sk(sk);
> +
> +	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG))
> +		return true;
> +
> +	return false;
> +}
> +
>  /* Similar to __sock_recv_timestamp, but does not require an skb */
>  void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
>  			struct scm_timestamping_internal *tss)
> @@ -2284,6 +2294,9 @@ void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
>  	u32 tsflags = READ_ONCE(sk->sk_tsflags);
>  	bool has_timestamping = false;
>  
> +	if (tcp_bpf_recv_timestamp(sk, tss))

net/ipv4/tcp.c:2297:29: error: passing 'const struct sock *' to parameter of type 'struct sock *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
 2297 |         if (tcp_bpf_recv_timestamp(sk, tss))
      |                                    ^~
net/ipv4/tcp.c:2279:49: note: passing argument to parameter 'sk' here
 2279 | static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
      |                                                 ^
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-09  0:22   ` Jakub Kicinski
@ 2024-10-09  0:30     ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09  0:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, dsahern, willemdebruijn.kernel, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 8:22 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  8 Oct 2024 17:51:08 +0800 Jason Xing wrote:
> > +static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
> > +{
> > +     struct tcp_sock *tp = tcp_sk(sk);
> > +
> > +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG))
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> >  /* Similar to __sock_recv_timestamp, but does not require an skb */
> >  void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
> >                       struct scm_timestamping_internal *tss)
> > @@ -2284,6 +2294,9 @@ void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
> >       u32 tsflags = READ_ONCE(sk->sk_tsflags);
> >       bool has_timestamping = false;
> >
> > +     if (tcp_bpf_recv_timestamp(sk, tss))
>
> net/ipv4/tcp.c:2297:29: error: passing 'const struct sock *' to parameter of type 'struct sock *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
>  2297 |         if (tcp_bpf_recv_timestamp(sk, tss))
>       |                                    ^~
> net/ipv4/tcp.c:2279:49: note: passing argument to parameter 'sk' here
>  2279 | static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
>       |                                                 ^

Thanks, I will fix this.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
  2024-10-08 18:45   ` Willem de Bruijn
@ 2024-10-09  0:58   ` Kuniyuki Iwashima
  2024-10-09  8:11     ` Jason Xing
  1 sibling, 1 reply; 49+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09  0:58 UTC (permalink / raw)
  To: kerneljasonxing
  Cc: andrii, ast, bpf, daniel, davem, dsahern, eddyz87, edumazet,
	haoluo, john.fastabend, jolsa, kernelxing, kpsingh, kuba,
	martin.lau, netdev, pabeni, sdf, song, willemb,
	willemdebruijn.kernel, yonghong.song, kuniyu

From: Jason Xing <kerneljasonxing@gmail.com>
Date: Tue,  8 Oct 2024 17:51:01 +0800
> From: Jason Xing <kernelxing@tencent.com>
> 
> Implement basic codes so that we later can easily add each tx points.
> Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> control whether to output or not.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/uapi/linux/bpf.h       |  5 ++++-
>  net/core/skbuff.c              | 18 ++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  5 ++++-
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c6cd7c7aeeee..157e139ed6fc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6900,8 +6900,11 @@ enum {
>  	 * options first before the BPF program does.
>  	 */
>  	BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> +	/* Call bpf when the kernel is generating tx timestamps.
> +	 */
> +	BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
>  /* Mask of all currently supported cb flags */
> -	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> +	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,

I remember this change makes two selftests fail and needs diff
in this link.
https://lore.kernel.org/bpf/20231016161134.25365-1-kuniyu@amazon.com/

Also, adding a bpf selftest or extending some for this series
would be nice.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-08 23:22   ` Jason Xing
@ 2024-10-09  1:05     ` Jason Xing
  2024-10-09  9:27       ` Vadim Fedorenko
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09  1:05 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > tracepoint to print information (say, tstamp) so that we can
> > > transparently equip applications with this feature and require no
> > > modification in user side.
> > >
> > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > extension, which is mainly suggested by John Fastabend and Willem de
> > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > better solution to extend.
> > >
> > > This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> > > users only needs to pass certain flags through bpf program to make sure
> > > the last skb from each sendmsg() has timestamp related controlled flag.
> > > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> > > and wait for the moment when recvmsg() is called.
> >
> > As you mention, overall I am very supportive of having a way to add
> > timestamping by adminstrators, without having to rebuild applications.
> > BPF hooks seem to be the right place for this.
> >
> > There is existing kprobe/kretprobe/kfunc support. Supporting
> > SO_TIMESTAMPING directly may be useful due to its targeted feature
> > set, and correlation between measurements for the same data in the
> > stream.
> >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >
> > My main implementation concern is where this API overlaps with the
> > existing user API, and how they might conflict. A few questions in the
> > patches.
>
> Agreed. That's also what I'm concerned about. So I decided to ask for
> related experts' help.
>
> How to deal with it without interfering with the existing apps in the
> right way is the key problem.

What I try to implement is let the bpf program have the highest
precedence. It's similar to RTO min, see the commit as an example:

commit f086edef71be7174a16c1ed67ac65a085cda28b1
Author: Kevin Yang <yyd@google.com>
Date:   Mon Jun 3 21:30:54 2024 +0000

    tcp: add sysctl_tcp_rto_min_us

    Adding a sysctl knob to allow user to specify a default
    rto_min at socket init time, other than using the hard
    coded 200ms default rto_min.

    Note that the rto_min route option has the highest precedence
    for configuring this setting, followed by the TCP_BPF_RTO_MIN
    socket option, followed by the tcp_rto_min_us sysctl.

It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
The first priority can override others. It doesn't have a good
chance/point to restore the icsk_rto_min field if users want to
shutdown the bpf program because it is set in
bpf_sol_tcp_setsockopt().

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
  2024-10-09  0:22   ` Jakub Kicinski
@ 2024-10-09  2:33   ` kernel test robot
  2024-10-09  4:17   ` kernel test robot
  2024-10-09  5:09   ` kernel test robot
  3 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2024-10-09  2:33 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: oe-kbuild-all, bpf, netdev, Jason Xing

Hi Jason,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-bpf-infrastructure-to-allow-exposing-more-information-later/20241008-175458
base:   net-next/main
patch link:    https://lore.kernel.org/r/20241008095109.99918-9-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20241009/202410090912.5vZnvozX-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241009/202410090912.5vZnvozX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410090912.5vZnvozX-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/ipv4/tcp.c: In function 'tcp_recv_timestamp':
>> net/ipv4/tcp.c:2297:36: warning: passing argument 1 of 'tcp_bpf_recv_timestamp' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    2297 |         if (tcp_bpf_recv_timestamp(sk, tss))
         |                                    ^~
   net/ipv4/tcp.c:2279:49: note: expected 'struct sock *' but argument is of type 'const struct sock *'
    2279 | static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
         |                                    ~~~~~~~~~~~~~^~


vim +2297 net/ipv4/tcp.c

  2288	
  2289	/* Similar to __sock_recv_timestamp, but does not require an skb */
  2290	void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
  2291				struct scm_timestamping_internal *tss)
  2292	{
  2293		int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW);
  2294		u32 tsflags = READ_ONCE(sk->sk_tsflags);
  2295		bool has_timestamping = false;
  2296	
> 2297		if (tcp_bpf_recv_timestamp(sk, tss))
  2298			return;
  2299	
  2300		if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
  2301			if (sock_flag(sk, SOCK_RCVTSTAMP)) {
  2302				if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
  2303					if (new_tstamp) {
  2304						struct __kernel_timespec kts = {
  2305							.tv_sec = tss->ts[0].tv_sec,
  2306							.tv_nsec = tss->ts[0].tv_nsec,
  2307						};
  2308						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
  2309							 sizeof(kts), &kts);
  2310					} else {
  2311						struct __kernel_old_timespec ts_old = {
  2312							.tv_sec = tss->ts[0].tv_sec,
  2313							.tv_nsec = tss->ts[0].tv_nsec,
  2314						};
  2315						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_OLD,
  2316							 sizeof(ts_old), &ts_old);
  2317					}
  2318				} else {
  2319					if (new_tstamp) {
  2320						struct __kernel_sock_timeval stv = {
  2321							.tv_sec = tss->ts[0].tv_sec,
  2322							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2323						};
  2324						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_NEW,
  2325							 sizeof(stv), &stv);
  2326					} else {
  2327						struct __kernel_old_timeval tv = {
  2328							.tv_sec = tss->ts[0].tv_sec,
  2329							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2330						};
  2331						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_OLD,
  2332							 sizeof(tv), &tv);
  2333					}
  2334				}
  2335			}
  2336	
  2337			if (tsflags & SOF_TIMESTAMPING_SOFTWARE &&
  2338			    (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE ||
  2339			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2340				has_timestamping = true;
  2341			else
  2342				tss->ts[0] = (struct timespec64) {0};
  2343		}
  2344	
  2345		if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
  2346			if (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE &&
  2347			    (tsflags & SOF_TIMESTAMPING_RX_HARDWARE ||
  2348			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2349				has_timestamping = true;
  2350			else
  2351				tss->ts[2] = (struct timespec64) {0};
  2352		}
  2353	
  2354		if (has_timestamping) {
  2355			tss->ts[1] = (struct timespec64) {0};
  2356			if (sock_flag(sk, SOCK_TSTAMP_NEW))
  2357				put_cmsg_scm_timestamping64(msg, tss);
  2358			else
  2359				put_cmsg_scm_timestamping(msg, tss);
  2360		}
  2361	}
  2362	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
  2024-10-09  0:22   ` Jakub Kicinski
  2024-10-09  2:33   ` kernel test robot
@ 2024-10-09  4:17   ` kernel test robot
  2024-10-09  5:09   ` kernel test robot
  3 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2024-10-09  4:17 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: llvm, oe-kbuild-all, bpf, netdev, Jason Xing

Hi Jason,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-bpf-infrastructure-to-allow-exposing-more-information-later/20241008-175458
base:   net-next/main
patch link:    https://lore.kernel.org/r/20241008095109.99918-9-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
config: arm-exynos_defconfig (https://download.01.org/0day-ci/archive/20241009/202410091146.2OM6QWPq-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241009/202410091146.2OM6QWPq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410091146.2OM6QWPq-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/ipv4/tcp.c:2297:29: error: passing 'const struct sock *' to parameter of type 'struct sock *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
    2297 |         if (tcp_bpf_recv_timestamp(sk, tss))
         |                                    ^~
   net/ipv4/tcp.c:2279:49: note: passing argument to parameter 'sk' here
    2279 | static bool tcp_bpf_recv_timestamp(struct sock *sk, struct scm_timestamping_internal *tss)
         |                                                 ^
   1 error generated.


vim +2297 net/ipv4/tcp.c

  2288	
  2289	/* Similar to __sock_recv_timestamp, but does not require an skb */
  2290	void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
  2291				struct scm_timestamping_internal *tss)
  2292	{
  2293		int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW);
  2294		u32 tsflags = READ_ONCE(sk->sk_tsflags);
  2295		bool has_timestamping = false;
  2296	
> 2297		if (tcp_bpf_recv_timestamp(sk, tss))
  2298			return;
  2299	
  2300		if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
  2301			if (sock_flag(sk, SOCK_RCVTSTAMP)) {
  2302				if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
  2303					if (new_tstamp) {
  2304						struct __kernel_timespec kts = {
  2305							.tv_sec = tss->ts[0].tv_sec,
  2306							.tv_nsec = tss->ts[0].tv_nsec,
  2307						};
  2308						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
  2309							 sizeof(kts), &kts);
  2310					} else {
  2311						struct __kernel_old_timespec ts_old = {
  2312							.tv_sec = tss->ts[0].tv_sec,
  2313							.tv_nsec = tss->ts[0].tv_nsec,
  2314						};
  2315						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_OLD,
  2316							 sizeof(ts_old), &ts_old);
  2317					}
  2318				} else {
  2319					if (new_tstamp) {
  2320						struct __kernel_sock_timeval stv = {
  2321							.tv_sec = tss->ts[0].tv_sec,
  2322							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2323						};
  2324						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_NEW,
  2325							 sizeof(stv), &stv);
  2326					} else {
  2327						struct __kernel_old_timeval tv = {
  2328							.tv_sec = tss->ts[0].tv_sec,
  2329							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2330						};
  2331						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_OLD,
  2332							 sizeof(tv), &tv);
  2333					}
  2334				}
  2335			}
  2336	
  2337			if (tsflags & SOF_TIMESTAMPING_SOFTWARE &&
  2338			    (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE ||
  2339			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2340				has_timestamping = true;
  2341			else
  2342				tss->ts[0] = (struct timespec64) {0};
  2343		}
  2344	
  2345		if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
  2346			if (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE &&
  2347			    (tsflags & SOF_TIMESTAMPING_RX_HARDWARE ||
  2348			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2349				has_timestamping = true;
  2350			else
  2351				tss->ts[2] = (struct timespec64) {0};
  2352		}
  2353	
  2354		if (has_timestamping) {
  2355			tss->ts[1] = (struct timespec64) {0};
  2356			if (sock_flag(sk, SOCK_TSTAMP_NEW))
  2357				put_cmsg_scm_timestamping64(msg, tss);
  2358			else
  2359				put_cmsg_scm_timestamping(msg, tss);
  2360		}
  2361	}
  2362	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
  2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
                     ` (2 preceding siblings ...)
  2024-10-09  4:17   ` kernel test robot
@ 2024-10-09  5:09   ` kernel test robot
  3 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2024-10-09  5:09 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, willemb, ast, daniel, andrii, martin.lau,
	eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, jolsa
  Cc: oe-kbuild-all, bpf, netdev, Jason Xing

Hi Jason,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Jason-Xing/net-timestamp-add-bpf-infrastructure-to-allow-exposing-more-information-later/20241008-175458
base:   net-next/main
patch link:    https://lore.kernel.org/r/20241008095109.99918-9-kerneljasonxing%40gmail.com
patch subject: [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps
config: x86_64-randconfig-122-20241009 (https://download.01.org/0day-ci/archive/20241009/202410091245.zCD0FszC-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241009/202410091245.zCD0FszC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410091245.zCD0FszC-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> net/ipv4/tcp.c:2297:36: sparse: sparse: incorrect type in argument 1 (different modifiers) @@     expected struct sock *sk @@     got struct sock const *sk @@
   net/ipv4/tcp.c:2297:36: sparse:     expected struct sock *sk
   net/ipv4/tcp.c:2297:36: sparse:     got struct sock const *sk

vim +2297 net/ipv4/tcp.c

  2288	
  2289	/* Similar to __sock_recv_timestamp, but does not require an skb */
  2290	void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
  2291				struct scm_timestamping_internal *tss)
  2292	{
  2293		int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW);
  2294		u32 tsflags = READ_ONCE(sk->sk_tsflags);
  2295		bool has_timestamping = false;
  2296	
> 2297		if (tcp_bpf_recv_timestamp(sk, tss))
  2298			return;
  2299	
  2300		if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
  2301			if (sock_flag(sk, SOCK_RCVTSTAMP)) {
  2302				if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
  2303					if (new_tstamp) {
  2304						struct __kernel_timespec kts = {
  2305							.tv_sec = tss->ts[0].tv_sec,
  2306							.tv_nsec = tss->ts[0].tv_nsec,
  2307						};
  2308						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
  2309							 sizeof(kts), &kts);
  2310					} else {
  2311						struct __kernel_old_timespec ts_old = {
  2312							.tv_sec = tss->ts[0].tv_sec,
  2313							.tv_nsec = tss->ts[0].tv_nsec,
  2314						};
  2315						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_OLD,
  2316							 sizeof(ts_old), &ts_old);
  2317					}
  2318				} else {
  2319					if (new_tstamp) {
  2320						struct __kernel_sock_timeval stv = {
  2321							.tv_sec = tss->ts[0].tv_sec,
  2322							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2323						};
  2324						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_NEW,
  2325							 sizeof(stv), &stv);
  2326					} else {
  2327						struct __kernel_old_timeval tv = {
  2328							.tv_sec = tss->ts[0].tv_sec,
  2329							.tv_usec = tss->ts[0].tv_nsec / 1000,
  2330						};
  2331						put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_OLD,
  2332							 sizeof(tv), &tv);
  2333					}
  2334				}
  2335			}
  2336	
  2337			if (tsflags & SOF_TIMESTAMPING_SOFTWARE &&
  2338			    (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE ||
  2339			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2340				has_timestamping = true;
  2341			else
  2342				tss->ts[0] = (struct timespec64) {0};
  2343		}
  2344	
  2345		if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
  2346			if (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE &&
  2347			    (tsflags & SOF_TIMESTAMPING_RX_HARDWARE ||
  2348			     !(tsflags & SOF_TIMESTAMPING_OPT_RX_FILTER)))
  2349				has_timestamping = true;
  2350			else
  2351				tss->ts[2] = (struct timespec64) {0};
  2352		}
  2353	
  2354		if (has_timestamping) {
  2355			tss->ts[1] = (struct timespec64) {0};
  2356			if (sock_flag(sk, SOCK_TSTAMP_NEW))
  2357				put_cmsg_scm_timestamping64(msg, tss);
  2358			else
  2359				put_cmsg_scm_timestamping(msg, tss);
  2360		}
  2361	}
  2362	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt
  2024-10-08  9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
@ 2024-10-09  7:19   ` Martin KaFai Lau
  2024-10-09  8:09     ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2024-10-09  7:19 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On 10/8/24 2:51 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Now we allow users to set tsflags through bpf_setsockopt. What I
> want to do is passing SOF_TIMESTAMPING_RX_SOFTWARE flag, so that
> we can generate rx timestamps the moment the skb traverses through
> driver.
> 
> Here is an example:
> 
> case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
> case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
> 	sock_opt = SOF_TIMESTAMPING_RX_SOFTWARE;
> 	bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
> 		       &sock_opt, sizeof(sock_opt));
> 	break;
> 
> In this way, we can use bpf program that help us generate and report
> rx timestamp.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   net/core/filter.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bd0d08bf76bb..9ce99d320571 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5225,6 +5225,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
>   		break;
>   	case SO_BINDTODEVICE:
>   		break;
> +	case SO_TIMESTAMPING_NEW:
> +	case SO_TIMESTAMPING_OLD:

I believe this change was proposed before. It will change the user expectation 
on the sk_error_queue. It needs some bits/fields/knobs for bpf. I think this 
point is similar to other's earlier comments in this thread.

I only have a chance to briefly look at it. I think it is useful. This 
bpf/timestamp feature request has come up before.

A high level comment. The current timestamp should work for non tcp sock? The 
bpf/timestamp solution should be able to also.

sockops is tcp centric. From looking at patch 9 that needs to initialize 4 args, 
this interface feels old and not sure we want to extend to other sock types.
This needs some thoughts.

> +		break;
>   	default:
>   		return -EINVAL;
>   	}


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt
  2024-10-09  7:19   ` Martin KaFai Lau
@ 2024-10-09  8:09     ` Jason Xing
  2024-10-09 13:23       ` Willem de Bruijn
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09  8:09 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 3:19 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/8/24 2:51 AM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Now we allow users to set tsflags through bpf_setsockopt. What I
> > want to do is passing SOF_TIMESTAMPING_RX_SOFTWARE flag, so that
> > we can generate rx timestamps the moment the skb traverses through
> > driver.
> >
> > Here is an example:
> >
> > case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
> > case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
> >       sock_opt = SOF_TIMESTAMPING_RX_SOFTWARE;
> >       bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
> >                      &sock_opt, sizeof(sock_opt));
> >       break;
> >
> > In this way, we can use bpf program that help us generate and report
> > rx timestamp.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   net/core/filter.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index bd0d08bf76bb..9ce99d320571 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5225,6 +5225,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> >               break;
> >       case SO_BINDTODEVICE:
> >               break;
> > +     case SO_TIMESTAMPING_NEW:
> > +     case SO_TIMESTAMPING_OLD:
>
> I believe this change was proposed before. It will change the user expectation
> on the sk_error_queue. It needs some bits/fields/knobs for bpf. I think this
> point is similar to other's earlier comments in this thread.

Thanks for your reply.

After seeing what you mentioned, I searched through the mailing list
and found one [1] which was designed to fetch hardware timestamps.

[1]：https://lore.kernel.org/bpf/51fd5249-140a-4f1b-b20e-703f159e88a3@linux.dev/T/

>
> I only have a chance to briefly look at it. I think it is useful. This
> bpf/timestamp feature request has come up before.

At the very beginning, I had no intention to use bpf_setsockopt() to
retrieve the rx timestamp because it will override sk_tsflags, but I
cannot implement a good way like what I did to tx path: only setting
skb's field. I'm not sure if this override behaviour is acceptable, so
I post it to know what the bpf experts' suggestions are.

>
> A high level comment. The current timestamp should work for non tcp sock? The
> bpf/timestamp solution should be able to also.

For now, it only supports TCP proto. I would like to quickly implement
a framework which is also suitable for other protos. TCP is just a
start point.

>
> sockops is tcp centric. From looking at patch 9 that needs to initialize 4 args,
> this interface feels old and not sure we want to extend to other sock types.
> This needs some thoughts.

For me, I have interests to extend to other sock types. But I'm
supposed to ask Willem's opinion first.

+Willem de Bruijn Do you want this bpf extension feature to extend to
other protos?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-09  0:58   ` Kuniyuki Iwashima
@ 2024-10-09  8:11     ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09  8:11 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: andrii, ast, bpf, daniel, davem, dsahern, eddyz87, edumazet,
	haoluo, john.fastabend, jolsa, kernelxing, kpsingh, kuba,
	martin.lau, netdev, pabeni, sdf, song, willemb,
	willemdebruijn.kernel, yonghong.song

On Wed, Oct 9, 2024 at 8:59 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Tue,  8 Oct 2024 17:51:01 +0800
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Implement basic codes so that we later can easily add each tx points.
> > Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> > control whether to output or not.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  include/uapi/linux/bpf.h       |  5 ++++-
> >  net/core/skbuff.c              | 18 ++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  5 ++++-
> >  3 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c6cd7c7aeeee..157e139ed6fc 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6900,8 +6900,11 @@ enum {
> >        * options first before the BPF program does.
> >        */
> >       BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> > +     /* Call bpf when the kernel is generating tx timestamps.
> > +      */
> > +     BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
> >  /* Mask of all currently supported cb flags */
> > -     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> > +     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
>
> I remember this change makes two selftests fail and needs diff
> in this link.
> https://lore.kernel.org/bpf/20231016161134.25365-1-kuniyu@amazon.com/

Thanks for pointing out this. I will dig into this :)

>
> Also, adding a bpf selftest or extending some for this series
> would be nice.

Sure, I would like to add a selftest after we all reach a consensus on
how to implement it in the right way.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-08 23:48     ` Jason Xing
@ 2024-10-09  9:16       ` Vadim Fedorenko
  2024-10-09 11:15         ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-09  9:16 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf,
	netdev, Jason Xing

On 09/10/2024 00:48, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 3:18 AM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 08/10/2024 10:51, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
>>> are three points in the previous patches where generating timestamps
>>> works. Let us make the basic bpf mechanism for timestamping feature
>>>    work finally.
>>>
>>> We can use like this as a simple example in bpf program:
>>> __section("sockops")
>>>
>>> case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
>>>        dport = bpf_ntohl(skops->remote_port);
>>>        sport = skops->local_port;
>>>        skops->reply = SOF_TIMESTAMPING_TX_SCHED;
>>>        bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
>>> case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
>>>        bpf_printk(...);
>>>
>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> ---
>>>    include/uapi/linux/bpf.h       |  8 ++++++++
>>>    net/ipv4/tcp.c                 | 27 ++++++++++++++++++++++++++-
>>>    tools/include/uapi/linux/bpf.h |  8 ++++++++
>>>    3 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 1b478ec18ac2..6bf3f2892776 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -7034,6 +7034,14 @@ enum {
>>>                                         * feature is on. It indicates the
>>>                                         * recorded timestamp.
>>>                                         */
>>> +     BPF_SOCK_OPS_TX_TS_OPT_CB,      /* Called when the last skb from
>>> +                                      * sendmsg is going to push when
>>> +                                      * SO_TIMESTAMPING feature is on.
>>> +                                      * Let user have a chance to switch
>>> +                                      * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
>>> +                                      * flag for other three tx timestamp
>>> +                                      * use.
>>> +                                      */
>>>    };
>>>
>>>    /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 82cc4a5633ce..ddf4089779b5 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
>>>    }
>>>    EXPORT_SYMBOL(tcp_init_sock);
>>>
>>> +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
>>> +{
>>> +     u32 flags;
>>> +
>>> +     flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
>>> +     if (flags <= 0)
>>> +             return 0;
>>> +
>>> +     if (flags & ~SOF_TIMESTAMPING_MASK)
>>> +             return 0;
>>> +
>>> +     if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
>>> +             return 0;
>>> +
>>> +     return flags;
>>> +}
>>> +
>>>    static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>    {
>>>        struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>        u32 tsflags = sockc->tsflags;
>>> +     u32 flags;
>>> +
>>> +     if (!skb)
>>> +             return;
>>> +
>>> +     flags = bpf_tcp_tx_timestamp(sk);
>>> +     if (flags)
>>> +             tsflags = flags;
>>
>> In this case it's impossible to clear timestamping flags from bpf
> 
> It cannot be cleared only from the last skb until the next round of
> recvmsg. Since the last skb is generated and bpf program is attached,
> I would like to know why we need to clear the related fields in the
> skb? Please note that I didn't hack the sk_tstflags in struct sock :)

>> program, but it may be very useful. Consider providing flags from
>> socket cookie to the program or maybe add an option to combine them?
> 
> Thanks for this idea. May I ask what the benefits are through adding
> an option because the bpf test statement (BPF_SOCK_OPS_TEST_FLAG) is a
> good option to take a whole control? Or could you provide more details
> about how you expect to do so?

Well, as Willem mentioned, you are overriding flags completely. But what
if an application is waiting for some type of timestamp to arrive, but
bpf program rewrites flags and disables this type of timestamp? It will
confuse application.

Thinking twice, clearing flags might not be useful because of the very
same issue though.


> 
> Thanks,
> Jason


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09  1:05     ` Jason Xing
@ 2024-10-09  9:27       ` Vadim Fedorenko
  2024-10-09 11:12         ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-09  9:27 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On 09/10/2024 02:05, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>>
>>> Jason Xing wrote:
>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>
>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>> tracepoint to print information (say, tstamp) so that we can
>>>> transparently equip applications with this feature and require no
>>>> modification in user side.
>>>>
>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>> better solution to extend.
>>>>
>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>> users only needs to pass certain flags through bpf program to make sure
>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>> and wait for the moment when recvmsg() is called.
>>>
>>> As you mention, overall I am very supportive of having a way to add
>>> timestamping by adminstrators, without having to rebuild applications.
>>> BPF hooks seem to be the right place for this.
>>>
>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>> set, and correlation between measurements for the same data in the
>>> stream.
>>>
>>>> After this series, we could step by step implement more advanced
>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>
>>> My main implementation concern is where this API overlaps with the
>>> existing user API, and how they might conflict. A few questions in the
>>> patches.
>>
>> Agreed. That's also what I'm concerned about. So I decided to ask for
>> related experts' help.
>>
>> How to deal with it without interfering with the existing apps in the
>> right way is the key problem.
> 
> What I try to implement is let the bpf program have the highest
> precedence. It's similar to RTO min, see the commit as an example:
> 
> commit f086edef71be7174a16c1ed67ac65a085cda28b1
> Author: Kevin Yang <yyd@google.com>
> Date:   Mon Jun 3 21:30:54 2024 +0000
> 
>      tcp: add sysctl_tcp_rto_min_us
> 
>      Adding a sysctl knob to allow user to specify a default
>      rto_min at socket init time, other than using the hard
>      coded 200ms default rto_min.
> 
>      Note that the rto_min route option has the highest precedence
>      for configuring this setting, followed by the TCP_BPF_RTO_MIN
>      socket option, followed by the tcp_rto_min_us sysctl.
> 
> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> The first priority can override others. It doesn't have a good
> chance/point to restore the icsk_rto_min field if users want to
> shutdown the bpf program because it is set in
> bpf_sol_tcp_setsockopt().

rto_min example is slightly different. With tcp_rto_min the doesn't
expect any data to come back to user space while for timestamping the
app may be confused directly by providing more data, or by not providing
expected data. I believe some hint about requestor of the data is needed
here. It will also help to solve the problem of populating sk_err_queue
mentioned by Martin.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09  9:27       ` Vadim Fedorenko
@ 2024-10-09 11:12         ` Jason Xing
  2024-10-09 11:48           ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09 11:12 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 09/10/2024 02:05, Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>
> >> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> >> <willemdebruijn.kernel@gmail.com> wrote:
> >>>
> >>> Jason Xing wrote:
> >>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>
> >>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> >>>> tracepoint to print information (say, tstamp) so that we can
> >>>> transparently equip applications with this feature and require no
> >>>> modification in user side.
> >>>>
> >>>> Later, we discussed at netconf and agreed that we can use bpf for better
> >>>> extension, which is mainly suggested by John Fastabend and Willem de
> >>>> Bruijn. Many thanks here! So I post this series to see if we have a
> >>>> better solution to extend.
> >>>>
> >>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> >>>> users only needs to pass certain flags through bpf program to make sure
> >>>> the last skb from each sendmsg() has timestamp related controlled flag.
> >>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> >>>> and wait for the moment when recvmsg() is called.
> >>>
> >>> As you mention, overall I am very supportive of having a way to add
> >>> timestamping by adminstrators, without having to rebuild applications.
> >>> BPF hooks seem to be the right place for this.
> >>>
> >>> There is existing kprobe/kretprobe/kfunc support. Supporting
> >>> SO_TIMESTAMPING directly may be useful due to its targeted feature
> >>> set, and correlation between measurements for the same data in the
> >>> stream.
> >>>
> >>>> After this series, we could step by step implement more advanced
> >>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >>>
> >>> My main implementation concern is where this API overlaps with the
> >>> existing user API, and how they might conflict. A few questions in the
> >>> patches.
> >>
> >> Agreed. That's also what I'm concerned about. So I decided to ask for
> >> related experts' help.
> >>
> >> How to deal with it without interfering with the existing apps in the
> >> right way is the key problem.
> >
> > What I try to implement is let the bpf program have the highest
> > precedence. It's similar to RTO min, see the commit as an example:
> >
> > commit f086edef71be7174a16c1ed67ac65a085cda28b1
> > Author: Kevin Yang <yyd@google.com>
> > Date:   Mon Jun 3 21:30:54 2024 +0000
> >
> >      tcp: add sysctl_tcp_rto_min_us
> >
> >      Adding a sysctl knob to allow user to specify a default
> >      rto_min at socket init time, other than using the hard
> >      coded 200ms default rto_min.
> >
> >      Note that the rto_min route option has the highest precedence
> >      for configuring this setting, followed by the TCP_BPF_RTO_MIN
> >      socket option, followed by the tcp_rto_min_us sysctl.
> >
> > It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> > The first priority can override others. It doesn't have a good
> > chance/point to restore the icsk_rto_min field if users want to
> > shutdown the bpf program because it is set in
> > bpf_sol_tcp_setsockopt().
>
> rto_min example is slightly different. With tcp_rto_min the doesn't
> expect any data to come back to user space while for timestamping the
> app may be confused directly by providing more data, or by not providing
> expected data. I believe some hint about requestor of the data is needed
> here. It will also help to solve the problem of populating sk_err_queue
> mentioned by Martin.

Sorry, I don't fully get it. In this patch series, this bpf extension
feature will not rely on sk_err_queue any more to report tx timestamps
to userspace. Bpf program can do that printing.

Do you mean that it could be wrong if one skb carries the tsflags that
are previously set due to the bpf program and then suddenly users
detach the program? It indeed will put a new/cloned skb into the error
queue. Interesting corner case. It seems I have to re-implement a
totally independent tsflags for bpf extension feature. Do you have a
better idea on this?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps
  2024-10-09  9:16       ` Vadim Fedorenko
@ 2024-10-09 11:15         ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09 11:15 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf,
	netdev, Jason Xing

On Wed, Oct 9, 2024 at 5:17 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 09/10/2024 00:48, Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 3:18 AM Vadim Fedorenko
> > <vadim.fedorenko@linux.dev> wrote:
> >>
> >> On 08/10/2024 10:51, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@tencent.com>
> >>>
> >>> Once we set BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG flag here, there
> >>> are three points in the previous patches where generating timestamps
> >>> works. Let us make the basic bpf mechanism for timestamping feature
> >>>    work finally.
> >>>
> >>> We can use like this as a simple example in bpf program:
> >>> __section("sockops")
> >>>
> >>> case BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB:
> >>>        dport = bpf_ntohl(skops->remote_port);
> >>>        sport = skops->local_port;
> >>>        skops->reply = SOF_TIMESTAMPING_TX_SCHED;
> >>>        bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMP_OPT_CB_FLAG);
> >>> case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> >>>        bpf_printk(...);
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>> ---
> >>>    include/uapi/linux/bpf.h       |  8 ++++++++
> >>>    net/ipv4/tcp.c                 | 27 ++++++++++++++++++++++++++-
> >>>    tools/include/uapi/linux/bpf.h |  8 ++++++++
> >>>    3 files changed, 42 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 1b478ec18ac2..6bf3f2892776 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -7034,6 +7034,14 @@ enum {
> >>>                                         * feature is on. It indicates the
> >>>                                         * recorded timestamp.
> >>>                                         */
> >>> +     BPF_SOCK_OPS_TX_TS_OPT_CB,      /* Called when the last skb from
> >>> +                                      * sendmsg is going to push when
> >>> +                                      * SO_TIMESTAMPING feature is on.
> >>> +                                      * Let user have a chance to switch
> >>> +                                      * on BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG
> >>> +                                      * flag for other three tx timestamp
> >>> +                                      * use.
> >>> +                                      */
> >>>    };
> >>>
> >>>    /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>> index 82cc4a5633ce..ddf4089779b5 100644
> >>> --- a/net/ipv4/tcp.c
> >>> +++ b/net/ipv4/tcp.c
> >>> @@ -477,12 +477,37 @@ void tcp_init_sock(struct sock *sk)
> >>>    }
> >>>    EXPORT_SYMBOL(tcp_init_sock);
> >>>
> >>> +static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> >>> +{
> >>> +     u32 flags;
> >>> +
> >>> +     flags = tcp_call_bpf(sk, BPF_SOCK_OPS_TX_TS_OPT_CB, 0, NULL);
> >>> +     if (flags <= 0)
> >>> +             return 0;
> >>> +
> >>> +     if (flags & ~SOF_TIMESTAMPING_MASK)
> >>> +             return 0;
> >>> +
> >>> +     if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> >>> +             return 0;
> >>> +
> >>> +     return flags;
> >>> +}
> >>> +
> >>>    static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>    {
> >>>        struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>        u32 tsflags = sockc->tsflags;
> >>> +     u32 flags;
> >>> +
> >>> +     if (!skb)
> >>> +             return;
> >>> +
> >>> +     flags = bpf_tcp_tx_timestamp(sk);
> >>> +     if (flags)
> >>> +             tsflags = flags;
> >>
> >> In this case it's impossible to clear timestamping flags from bpf
> >
> > It cannot be cleared only from the last skb until the next round of
> > recvmsg. Since the last skb is generated and bpf program is attached,
> > I would like to know why we need to clear the related fields in the
> > skb? Please note that I didn't hack the sk_tstflags in struct sock :)
>
> >> program, but it may be very useful. Consider providing flags from
> >> socket cookie to the program or maybe add an option to combine them?
> >
> > Thanks for this idea. May I ask what the benefits are through adding
> > an option because the bpf test statement (BPF_SOCK_OPS_TEST_FLAG) is a
> > good option to take a whole control? Or could you provide more details
> > about how you expect to do so?
>
> Well, as Willem mentioned, you are overriding flags completely. But what
> if an application is waiting for some type of timestamp to arrive, but
> bpf program rewrites flags and disables this type of timestamp? It will
> confuse application.

Indeed, this series doesn't handle the conflict very well. Initially,
I tried so hard to avoid implementing the feature again. But now, it
seems inevitable. Let me dig into it more.

>
> Thinking twice, clearing flags might not be useful because of the very
> same issue though.

Yes.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 11:12         ` Jason Xing
@ 2024-10-09 11:48           ` Jason Xing
  2024-10-09 13:16             ` Vadim Fedorenko
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09 11:48 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
> >
> > On 09/10/2024 02:05, Jason Xing wrote:
> > > On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >>
> > >> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> > >> <willemdebruijn.kernel@gmail.com> wrote:
> > >>>
> > >>> Jason Xing wrote:
> > >>>> From: Jason Xing <kernelxing@tencent.com>
> > >>>>
> > >>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > >>>> tracepoint to print information (say, tstamp) so that we can
> > >>>> transparently equip applications with this feature and require no
> > >>>> modification in user side.
> > >>>>
> > >>>> Later, we discussed at netconf and agreed that we can use bpf for better
> > >>>> extension, which is mainly suggested by John Fastabend and Willem de
> > >>>> Bruijn. Many thanks here! So I post this series to see if we have a
> > >>>> better solution to extend.
> > >>>>
> > >>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> > >>>> users only needs to pass certain flags through bpf program to make sure
> > >>>> the last skb from each sendmsg() has timestamp related controlled flag.
> > >>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> > >>>> and wait for the moment when recvmsg() is called.
> > >>>
> > >>> As you mention, overall I am very supportive of having a way to add
> > >>> timestamping by adminstrators, without having to rebuild applications.
> > >>> BPF hooks seem to be the right place for this.
> > >>>
> > >>> There is existing kprobe/kretprobe/kfunc support. Supporting
> > >>> SO_TIMESTAMPING directly may be useful due to its targeted feature
> > >>> set, and correlation between measurements for the same data in the
> > >>> stream.
> > >>>
> > >>>> After this series, we could step by step implement more advanced
> > >>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > >>>
> > >>> My main implementation concern is where this API overlaps with the
> > >>> existing user API, and how they might conflict. A few questions in the
> > >>> patches.
> > >>
> > >> Agreed. That's also what I'm concerned about. So I decided to ask for
> > >> related experts' help.
> > >>
> > >> How to deal with it without interfering with the existing apps in the
> > >> right way is the key problem.
> > >
> > > What I try to implement is let the bpf program have the highest
> > > precedence. It's similar to RTO min, see the commit as an example:
> > >
> > > commit f086edef71be7174a16c1ed67ac65a085cda28b1
> > > Author: Kevin Yang <yyd@google.com>
> > > Date:   Mon Jun 3 21:30:54 2024 +0000
> > >
> > >      tcp: add sysctl_tcp_rto_min_us
> > >
> > >      Adding a sysctl knob to allow user to specify a default
> > >      rto_min at socket init time, other than using the hard
> > >      coded 200ms default rto_min.
> > >
> > >      Note that the rto_min route option has the highest precedence
> > >      for configuring this setting, followed by the TCP_BPF_RTO_MIN
> > >      socket option, followed by the tcp_rto_min_us sysctl.
> > >
> > > It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> > > The first priority can override others. It doesn't have a good
> > > chance/point to restore the icsk_rto_min field if users want to
> > > shutdown the bpf program because it is set in
> > > bpf_sol_tcp_setsockopt().
> >
> > rto_min example is slightly different. With tcp_rto_min the doesn't
> > expect any data to come back to user space while for timestamping the
> > app may be confused directly by providing more data, or by not providing
> > expected data. I believe some hint about requestor of the data is needed
> > here. It will also help to solve the problem of populating sk_err_queue
> > mentioned by Martin.
>
> Sorry, I don't fully get it. In this patch series, this bpf extension
> feature will not rely on sk_err_queue any more to report tx timestamps
> to userspace. Bpf program can do that printing.
>
> Do you mean that it could be wrong if one skb carries the tsflags that
> are previously set due to the bpf program and then suddenly users
> detach the program? It indeed will put a new/cloned skb into the error
> queue. Interesting corner case. It seems I have to re-implement a
> totally independent tsflags for bpf extension feature. Do you have a
> better idea on this?

I feel that if I could introduce bpf new flags like
SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
series, then it will not populate skb in sk_err_queue even users
remove the bpf program all of sudden. With this kind of specific bpf
flags, we can also avoid conflicting with the apps using
SO_TIEMSTAMPING feature. Let me give it a shot unless a better
solution shows up.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 11:48           ` Jason Xing
@ 2024-10-09 13:16             ` Vadim Fedorenko
  2024-10-09 13:47               ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-09 13:16 UTC (permalink / raw)
  To: Jason Xing
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On 09/10/2024 12:48, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
>> <vadim.fedorenko@linux.dev> wrote:
>>>
>>> On 09/10/2024 02:05, Jason Xing wrote:
>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>
>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>
>>>>>> Jason Xing wrote:
>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>
>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>>>>> tracepoint to print information (say, tstamp) so that we can
>>>>>>> transparently equip applications with this feature and require no
>>>>>>> modification in user side.
>>>>>>>
>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>>>>> better solution to extend.
>>>>>>>
>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>>>>> users only needs to pass certain flags through bpf program to make sure
>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>>>>> and wait for the moment when recvmsg() is called.
>>>>>>
>>>>>> As you mention, overall I am very supportive of having a way to add
>>>>>> timestamping by adminstrators, without having to rebuild applications.
>>>>>> BPF hooks seem to be the right place for this.
>>>>>>
>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>>>>> set, and correlation between measurements for the same data in the
>>>>>> stream.
>>>>>>
>>>>>>> After this series, we could step by step implement more advanced
>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>>>>
>>>>>> My main implementation concern is where this API overlaps with the
>>>>>> existing user API, and how they might conflict. A few questions in the
>>>>>> patches.
>>>>>
>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
>>>>> related experts' help.
>>>>>
>>>>> How to deal with it without interfering with the existing apps in the
>>>>> right way is the key problem.
>>>>
>>>> What I try to implement is let the bpf program have the highest
>>>> precedence. It's similar to RTO min, see the commit as an example:
>>>>
>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
>>>> Author: Kevin Yang <yyd@google.com>
>>>> Date:   Mon Jun 3 21:30:54 2024 +0000
>>>>
>>>>       tcp: add sysctl_tcp_rto_min_us
>>>>
>>>>       Adding a sysctl knob to allow user to specify a default
>>>>       rto_min at socket init time, other than using the hard
>>>>       coded 200ms default rto_min.
>>>>
>>>>       Note that the rto_min route option has the highest precedence
>>>>       for configuring this setting, followed by the TCP_BPF_RTO_MIN
>>>>       socket option, followed by the tcp_rto_min_us sysctl.
>>>>
>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
>>>> The first priority can override others. It doesn't have a good
>>>> chance/point to restore the icsk_rto_min field if users want to
>>>> shutdown the bpf program because it is set in
>>>> bpf_sol_tcp_setsockopt().
>>>
>>> rto_min example is slightly different. With tcp_rto_min the doesn't
>>> expect any data to come back to user space while for timestamping the
>>> app may be confused directly by providing more data, or by not providing
>>> expected data. I believe some hint about requestor of the data is needed
>>> here. It will also help to solve the problem of populating sk_err_queue
>>> mentioned by Martin.
>>
>> Sorry, I don't fully get it. In this patch series, this bpf extension
>> feature will not rely on sk_err_queue any more to report tx timestamps
>> to userspace. Bpf program can do that printing.
>>
>> Do you mean that it could be wrong if one skb carries the tsflags that
>> are previously set due to the bpf program and then suddenly users
>> detach the program? It indeed will put a new/cloned skb into the error
>> queue. Interesting corner case. It seems I have to re-implement a
>> totally independent tsflags for bpf extension feature. Do you have a
>> better idea on this?
> 
> I feel that if I could introduce bpf new flags like
> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
> series, then it will not populate skb in sk_err_queue even users
> remove the bpf program all of sudden. With this kind of specific bpf
> flags, we can also avoid conflicting with the apps using
> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
> solution shows up.

It doesn't look great to have duplicate flags just to indicate that this
particular timestamp was asked by a bpf program, even though it looks
like a straight forward solution. Sounds like we have to re-think the
interface for timestamping requests, but I don't have proper suggestion
right now.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case
  2024-10-08 23:18     ` Jason Xing
@ 2024-10-09 13:19       ` Willem de Bruijn
  2024-10-09 13:52         ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-09 13:19 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

Jason Xing wrote:
> On Wed, Oct 9, 2024 at 2:56 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > We can set OPT_ID|OPT_ID_TCP before we initialize the last skb
> > > from each sendmsg. We only set the socket once like how we use
> > > setsockopt() with OPT_ID|OPT_ID_TCP flags.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/core/skbuff.c | 16 +++++++++++++---
> > >  net/ipv4/tcp.c    | 19 +++++++++++++++----
> > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > >
> >
> > > @@ -491,10 +491,21 @@ static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> > >       if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> > >               return 0;
> > >
> > > +     /* We require users to set both OPT_ID and OPT_ID_TCP flags
> > > +      * together here, or else the key might be inaccurate.
> > > +      */
> > > +     if (flags & SOF_TIMESTAMPING_OPT_ID &&
> > > +         flags & SOF_TIMESTAMPING_OPT_ID_TCP &&
> > > +         !(sk->sk_tsflags & (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP))) {
> > > +             atomic_set(&sk->sk_tskey, (tcp_sk(sk)->write_seq - copied));
> > > +             sk->sk_tsflags |= (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP);
> >
> > So user and BPF admin conflict on both sk_tsflags and sktskey?
> >
> > I think BPF resetting this key, or incrementing it, may break user
> > expectations.
> 
> Yes, when it comes to OPT_ID and OPT_ID_TCP, conflict could happen.
> The reason why I don't use it like BPF_SOCK_OPS_TS_SCHED_OPT_CB flags
> (which is set along with each last skb) is that OPT_ID logic is a
> little bit complex. If we want to avoid touching sk_tsflags field in
> struct sock, we have to re-implement a similiar logic as you've
> already done in these years.

One option may be to only allow BPF to use sk_tsflags and sk_tskey if
sk_tsflags is not set by the user, and to fail user access to these
fields later.

That enforces mutual exclusion between either user or admin
timestamping.

Of course, it may still break users if BPF is first, but the user
socket tries to enable it later. So an imperfect solution.

Ideally the two would use separate per socket state. I don't know
all the options the various BPF hooks may have for this.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-08 23:27     ` Jason Xing
@ 2024-10-09 13:22       ` Willem de Bruijn
  2024-10-09 13:57         ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-09 13:22 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

Jason Xing wrote:
> On Wed, Oct 9, 2024 at 2:45 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Implement basic codes so that we later can easily add each tx points.
> > > Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> > > control whether to output or not.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  include/uapi/linux/bpf.h       |  5 ++++-
> > >  net/core/skbuff.c              | 18 ++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  5 ++++-
> > >  3 files changed, 26 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index c6cd7c7aeeee..157e139ed6fc 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -6900,8 +6900,11 @@ enum {
> > >        * options first before the BPF program does.
> > >        */
> > >       BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> > > +     /* Call bpf when the kernel is generating tx timestamps.
> > > +      */
> > > +     BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
> > >  /* Mask of all currently supported cb flags */
> > > -     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> > > +     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
> > >  };
> > >
> > >  /* List of known BPF sock_ops operators.
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 74149dc4ee31..5ff1a91c1204 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5539,6 +5539,21 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> > >  }
> > >  EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> > >
> > > +static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
> > > +                           struct skb_shared_hwtstamps *hwtstamps)
> > > +{
> > > +     struct tcp_sock *tp;
> > > +
> > > +     if (!sk_is_tcp(sk))
> > > +             return false;
> > > +
> > > +     tp = tcp_sk(sk);
> > > +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> > > +             return true;
> > > +
> > > +     return false;
> > > +}
> > > +
> > >  void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > >                    const struct sk_buff *ack_skb,
> > >                    struct skb_shared_hwtstamps *hwtstamps,
> > > @@ -5551,6 +5566,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > >       if (!sk)
> > >               return;
> > >
> > > +     if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
> > > +             return;
> > > +
> >
> > Eventually, this whole feature could probably be behind a
> > static_branch.
> 
> You want to implement another toggle to control it? But for tx path
> "BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)"
> works as a per-netns toggle. I would like to know what you exactly
> want to do in the next move?

Not another toggle. A static branch that enables the datapath logic
when a BPF program becomes active. See also for instance ipv4_min_ttl.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt
  2024-10-09  8:09     ` Jason Xing
@ 2024-10-09 13:23       ` Willem de Bruijn
  2024-10-09 13:48         ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Willem de Bruijn @ 2024-10-09 13:23 UTC (permalink / raw)
  To: Jason Xing, Martin KaFai Lau
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	willemb, ast, daniel, andrii, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

Jason Xing wrote:
> On Wed, Oct 9, 2024 at 3:19 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 10/8/24 2:51 AM, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Now we allow users to set tsflags through bpf_setsockopt. What I
> > > want to do is passing SOF_TIMESTAMPING_RX_SOFTWARE flag, so that
> > > we can generate rx timestamps the moment the skb traverses through
> > > driver.
> > >
> > > Here is an example:
> > >
> > > case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
> > > case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
> > >       sock_opt = SOF_TIMESTAMPING_RX_SOFTWARE;
> > >       bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
> > >                      &sock_opt, sizeof(sock_opt));
> > >       break;
> > >
> > > In this way, we can use bpf program that help us generate and report
> > > rx timestamp.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >   net/core/filter.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index bd0d08bf76bb..9ce99d320571 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -5225,6 +5225,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> > >               break;
> > >       case SO_BINDTODEVICE:
> > >               break;
> > > +     case SO_TIMESTAMPING_NEW:
> > > +     case SO_TIMESTAMPING_OLD:
> >
> > I believe this change was proposed before. It will change the user expectation
> > on the sk_error_queue. It needs some bits/fields/knobs for bpf. I think this
> > point is similar to other's earlier comments in this thread.
> 
> Thanks for your reply.
> 
> After seeing what you mentioned, I searched through the mailing list
> and found one [1] which was designed to fetch hardware timestamps.
> 
> [1]：https://lore.kernel.org/bpf/51fd5249-140a-4f1b-b20e-703f159e88a3@linux.dev/T/
> 
> >
> > I only have a chance to briefly look at it. I think it is useful. This
> > bpf/timestamp feature request has come up before.
> 
> At the very beginning, I had no intention to use bpf_setsockopt() to
> retrieve the rx timestamp because it will override sk_tsflags, but I
> cannot implement a good way like what I did to tx path: only setting
> skb's field. I'm not sure if this override behaviour is acceptable, so
> I post it to know what the bpf experts' suggestions are.
> 
> >
> > A high level comment. The current timestamp should work for non tcp sock? The
> > bpf/timestamp solution should be able to also.
> 
> For now, it only supports TCP proto. I would like to quickly implement
> a framework which is also suitable for other protos. TCP is just a
> start point.
> 
> >
> > sockops is tcp centric. From looking at patch 9 that needs to initialize 4 args,
> > this interface feels old and not sure we want to extend to other sock types.
> > This needs some thoughts.
> 
> For me, I have interests to extend to other sock types. But I'm
> supposed to ask Willem's opinion first.
> 
> +Willem de Bruijn Do you want this bpf extension feature to extend to
> other protos?

There would likely be users for other protocols too, just like
SO_TIMESTAMPING. Though TCP is probably the most widely used case by
far.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 13:16             ` Vadim Fedorenko
@ 2024-10-09 13:47               ` Jason Xing
  2024-10-09 13:58                 ` Vadim Fedorenko
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09 13:47 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 09/10/2024 12:48, Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>
> >> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
> >> <vadim.fedorenko@linux.dev> wrote:
> >>>
> >>> On 09/10/2024 02:05, Jason Xing wrote:
> >>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>>>
> >>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> >>>>> <willemdebruijn.kernel@gmail.com> wrote:
> >>>>>>
> >>>>>> Jason Xing wrote:
> >>>>>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>>>>
> >>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> >>>>>>> tracepoint to print information (say, tstamp) so that we can
> >>>>>>> transparently equip applications with this feature and require no
> >>>>>>> modification in user side.
> >>>>>>>
> >>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
> >>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
> >>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
> >>>>>>> better solution to extend.
> >>>>>>>
> >>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> >>>>>>> users only needs to pass certain flags through bpf program to make sure
> >>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
> >>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> >>>>>>> and wait for the moment when recvmsg() is called.
> >>>>>>
> >>>>>> As you mention, overall I am very supportive of having a way to add
> >>>>>> timestamping by adminstrators, without having to rebuild applications.
> >>>>>> BPF hooks seem to be the right place for this.
> >>>>>>
> >>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
> >>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
> >>>>>> set, and correlation between measurements for the same data in the
> >>>>>> stream.
> >>>>>>
> >>>>>>> After this series, we could step by step implement more advanced
> >>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >>>>>>
> >>>>>> My main implementation concern is where this API overlaps with the
> >>>>>> existing user API, and how they might conflict. A few questions in the
> >>>>>> patches.
> >>>>>
> >>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
> >>>>> related experts' help.
> >>>>>
> >>>>> How to deal with it without interfering with the existing apps in the
> >>>>> right way is the key problem.
> >>>>
> >>>> What I try to implement is let the bpf program have the highest
> >>>> precedence. It's similar to RTO min, see the commit as an example:
> >>>>
> >>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
> >>>> Author: Kevin Yang <yyd@google.com>
> >>>> Date:   Mon Jun 3 21:30:54 2024 +0000
> >>>>
> >>>>       tcp: add sysctl_tcp_rto_min_us
> >>>>
> >>>>       Adding a sysctl knob to allow user to specify a default
> >>>>       rto_min at socket init time, other than using the hard
> >>>>       coded 200ms default rto_min.
> >>>>
> >>>>       Note that the rto_min route option has the highest precedence
> >>>>       for configuring this setting, followed by the TCP_BPF_RTO_MIN
> >>>>       socket option, followed by the tcp_rto_min_us sysctl.
> >>>>
> >>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> >>>> The first priority can override others. It doesn't have a good
> >>>> chance/point to restore the icsk_rto_min field if users want to
> >>>> shutdown the bpf program because it is set in
> >>>> bpf_sol_tcp_setsockopt().
> >>>
> >>> rto_min example is slightly different. With tcp_rto_min the doesn't
> >>> expect any data to come back to user space while for timestamping the
> >>> app may be confused directly by providing more data, or by not providing
> >>> expected data. I believe some hint about requestor of the data is needed
> >>> here. It will also help to solve the problem of populating sk_err_queue
> >>> mentioned by Martin.
> >>
> >> Sorry, I don't fully get it. In this patch series, this bpf extension
> >> feature will not rely on sk_err_queue any more to report tx timestamps
> >> to userspace. Bpf program can do that printing.
> >>
> >> Do you mean that it could be wrong if one skb carries the tsflags that
> >> are previously set due to the bpf program and then suddenly users
> >> detach the program? It indeed will put a new/cloned skb into the error
> >> queue. Interesting corner case. It seems I have to re-implement a
> >> totally independent tsflags for bpf extension feature. Do you have a
> >> better idea on this?
> >
> > I feel that if I could introduce bpf new flags like
> > SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
> > series, then it will not populate skb in sk_err_queue even users
> > remove the bpf program all of sudden. With this kind of specific bpf
> > flags, we can also avoid conflicting with the apps using
> > SO_TIEMSTAMPING feature. Let me give it a shot unless a better
> > solution shows up.
>
> It doesn't look great to have duplicate flags just to indicate that this
> particular timestamp was asked by a bpf program, even though it looks

Or introduce a new field in struct sock or struct sk_buff so that
existing SOF_TIMESTAMPING_* can be reused.

> like a straight forward solution. Sounds like we have to re-think the
> interface for timestamping requests, but I don't have proper suggestion
> right now.

Thanks for your help :)

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt
  2024-10-09 13:23       ` Willem de Bruijn
@ 2024-10-09 13:48         ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09 13:48 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Martin KaFai Lau, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 9:23 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 3:19 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 10/8/24 2:51 AM, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Now we allow users to set tsflags through bpf_setsockopt. What I
> > > > want to do is passing SOF_TIMESTAMPING_RX_SOFTWARE flag, so that
> > > > we can generate rx timestamps the moment the skb traverses through
> > > > driver.
> > > >
> > > > Here is an example:
> > > >
> > > > case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
> > > > case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
> > > >       sock_opt = SOF_TIMESTAMPING_RX_SOFTWARE;
> > > >       bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
> > > >                      &sock_opt, sizeof(sock_opt));
> > > >       break;
> > > >
> > > > In this way, we can use bpf program that help us generate and report
> > > > rx timestamp.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >   net/core/filter.c | 3 +++
> > > >   1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index bd0d08bf76bb..9ce99d320571 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -5225,6 +5225,9 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
> > > >               break;
> > > >       case SO_BINDTODEVICE:
> > > >               break;
> > > > +     case SO_TIMESTAMPING_NEW:
> > > > +     case SO_TIMESTAMPING_OLD:
> > >
> > > I believe this change was proposed before. It will change the user expectation
> > > on the sk_error_queue. It needs some bits/fields/knobs for bpf. I think this
> > > point is similar to other's earlier comments in this thread.
> >
> > Thanks for your reply.
> >
> > After seeing what you mentioned, I searched through the mailing list
> > and found one [1] which was designed to fetch hardware timestamps.
> >
> > [1]：https://lore.kernel.org/bpf/51fd5249-140a-4f1b-b20e-703f159e88a3@linux.dev/T/
> >
> > >
> > > I only have a chance to briefly look at it. I think it is useful. This
> > > bpf/timestamp feature request has come up before.
> >
> > At the very beginning, I had no intention to use bpf_setsockopt() to
> > retrieve the rx timestamp because it will override sk_tsflags, but I
> > cannot implement a good way like what I did to tx path: only setting
> > skb's field. I'm not sure if this override behaviour is acceptable, so
> > I post it to know what the bpf experts' suggestions are.
> >
> > >
> > > A high level comment. The current timestamp should work for non tcp sock? The
> > > bpf/timestamp solution should be able to also.
> >
> > For now, it only supports TCP proto. I would like to quickly implement
> > a framework which is also suitable for other protos. TCP is just a
> > start point.
> >
> > >
> > > sockops is tcp centric. From looking at patch 9 that needs to initialize 4 args,
> > > this interface feels old and not sure we want to extend to other sock types.
> > > This needs some thoughts.
> >
> > For me, I have interests to extend to other sock types. But I'm
> > supposed to ask Willem's opinion first.
> >
> > +Willem de Bruijn Do you want this bpf extension feature to extend to
> > other protos?
>
> There would likely be users for other protocols too, just like
> SO_TIMESTAMPING. Though TCP is probably the most widely used case by
> far.

Agreed !

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case
  2024-10-09 13:19       ` Willem de Bruijn
@ 2024-10-09 13:52         ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09 13:52 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 9:19 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 2:56 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > We can set OPT_ID|OPT_ID_TCP before we initialize the last skb
> > > > from each sendmsg. We only set the socket once like how we use
> > > > setsockopt() with OPT_ID|OPT_ID_TCP flags.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/core/skbuff.c | 16 +++++++++++++---
> > > >  net/ipv4/tcp.c    | 19 +++++++++++++++----
> > > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > > >
> > >
> > > > @@ -491,10 +491,21 @@ static u32 bpf_tcp_tx_timestamp(struct sock *sk)
> > > >       if (!(flags & SOF_TIMESTAMPING_TX_RECORD_MASK))
> > > >               return 0;
> > > >
> > > > +     /* We require users to set both OPT_ID and OPT_ID_TCP flags
> > > > +      * together here, or else the key might be inaccurate.
> > > > +      */
> > > > +     if (flags & SOF_TIMESTAMPING_OPT_ID &&
> > > > +         flags & SOF_TIMESTAMPING_OPT_ID_TCP &&
> > > > +         !(sk->sk_tsflags & (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP))) {
> > > > +             atomic_set(&sk->sk_tskey, (tcp_sk(sk)->write_seq - copied));
> > > > +             sk->sk_tsflags |= (SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP);
> > >
> > > So user and BPF admin conflict on both sk_tsflags and sktskey?
> > >
> > > I think BPF resetting this key, or incrementing it, may break user
> > > expectations.
> >
> > Yes, when it comes to OPT_ID and OPT_ID_TCP, conflict could happen.
> > The reason why I don't use it like BPF_SOCK_OPS_TS_SCHED_OPT_CB flags
> > (which is set along with each last skb) is that OPT_ID logic is a
> > little bit complex. If we want to avoid touching sk_tsflags field in
> > struct sock, we have to re-implement a similiar logic as you've
> > already done in these years.
>
> One option may be to only allow BPF to use sk_tsflags and sk_tskey if
> sk_tsflags is not set by the user, and to fail user access to these
> fields later.
>
> That enforces mutual exclusion between either user or admin
> timestamping.
>
> Of course, it may still break users if BPF is first, but the user
> socket tries to enable it later. So an imperfect solution.
>
> Ideally the two would use separate per socket state. I don't know
> all the options the various BPF hooks may have for this.

Adding a new sk state or skb state is a much clearer way. That is also
what I commented on the patch [0/9]. While waiting for BPF experts'
reply, I keep investigating if there is a good way to add a new field.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later
  2024-10-09 13:22       ` Willem de Bruijn
@ 2024-10-09 13:57         ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09 13:57 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, dsahern, willemb, ast, daniel,
	andrii, martin.lau, eddyz87, song, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, netdev, Jason Xing

On Wed, Oct 9, 2024 at 9:22 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 2:45 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Implement basic codes so that we later can easily add each tx points.
> > > > Introducing BPF_SOCK_OPS_ALL_CB_FLAGS used as a test statement can help use
> > > > control whether to output or not.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  include/uapi/linux/bpf.h       |  5 ++++-
> > > >  net/core/skbuff.c              | 18 ++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h |  5 ++++-
> > > >  3 files changed, 26 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index c6cd7c7aeeee..157e139ed6fc 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -6900,8 +6900,11 @@ enum {
> > > >        * options first before the BPF program does.
> > > >        */
> > > >       BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
> > > > +     /* Call bpf when the kernel is generating tx timestamps.
> > > > +      */
> > > > +     BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG = (1<<7),
> > > >  /* Mask of all currently supported cb flags */
> > > > -     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> > > > +     BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xFF,
> > > >  };
> > > >
> > > >  /* List of known BPF sock_ops operators.
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index 74149dc4ee31..5ff1a91c1204 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -5539,6 +5539,21 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> > > >
> > > > +static bool bpf_skb_tstamp_tx(struct sock *sk, u32 scm_flag,
> > > > +                           struct skb_shared_hwtstamps *hwtstamps)
> > > > +{
> > > > +     struct tcp_sock *tp;
> > > > +
> > > > +     if (!sk_is_tcp(sk))
> > > > +             return false;
> > > > +
> > > > +     tp = tcp_sk(sk);
> > > > +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> > > > +             return true;
> > > > +
> > > > +     return false;
> > > > +}
> > > > +
> > > >  void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > > >                    const struct sk_buff *ack_skb,
> > > >                    struct skb_shared_hwtstamps *hwtstamps,
> > > > @@ -5551,6 +5566,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > > >       if (!sk)
> > > >               return;
> > > >
> > > > +     if (bpf_skb_tstamp_tx(sk, tstype, hwtstamps))
> > > > +             return;
> > > > +
> > >
> > > Eventually, this whole feature could probably be behind a
> > > static_branch.
> >
> > You want to implement another toggle to control it? But for tx path
> > "BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)"
> > works as a per-netns toggle. I would like to know what you exactly
> > want to do in the next move?
>
> Not another toggle. A static branch that enables the datapath logic
> when a BPF program becomes active. See also for instance ipv4_min_ttl.

Thanks, I see. Then we can totally use the bpf_setsockopt() interface
with a new tsflag field, or something like this, to implement just
like how ip4_min_ttl works.

I will give it a try to see if it can easily work.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 13:47               ` Jason Xing
@ 2024-10-09 13:58                 ` Vadim Fedorenko
  2024-10-09 14:35                   ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-09 13:58 UTC (permalink / raw)
  To: Jason Xing
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On 09/10/2024 14:47, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 09/10/2024 12:48, Jason Xing wrote:
>>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>
>>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
>>>> <vadim.fedorenko@linux.dev> wrote:
>>>>>
>>>>> On 09/10/2024 02:05, Jason Xing wrote:
>>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>>
>>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>>>>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Jason Xing wrote:
>>>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>>>
>>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>>>>>>> tracepoint to print information (say, tstamp) so that we can
>>>>>>>>> transparently equip applications with this feature and require no
>>>>>>>>> modification in user side.
>>>>>>>>>
>>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>>>>>>> better solution to extend.
>>>>>>>>>
>>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>>>>>>> users only needs to pass certain flags through bpf program to make sure
>>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>>>>>>> and wait for the moment when recvmsg() is called.
>>>>>>>>
>>>>>>>> As you mention, overall I am very supportive of having a way to add
>>>>>>>> timestamping by adminstrators, without having to rebuild applications.
>>>>>>>> BPF hooks seem to be the right place for this.
>>>>>>>>
>>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>>>>>>> set, and correlation between measurements for the same data in the
>>>>>>>> stream.
>>>>>>>>
>>>>>>>>> After this series, we could step by step implement more advanced
>>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>>>>>>
>>>>>>>> My main implementation concern is where this API overlaps with the
>>>>>>>> existing user API, and how they might conflict. A few questions in the
>>>>>>>> patches.
>>>>>>>
>>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
>>>>>>> related experts' help.
>>>>>>>
>>>>>>> How to deal with it without interfering with the existing apps in the
>>>>>>> right way is the key problem.
>>>>>>
>>>>>> What I try to implement is let the bpf program have the highest
>>>>>> precedence. It's similar to RTO min, see the commit as an example:
>>>>>>
>>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
>>>>>> Author: Kevin Yang <yyd@google.com>
>>>>>> Date:   Mon Jun 3 21:30:54 2024 +0000
>>>>>>
>>>>>>        tcp: add sysctl_tcp_rto_min_us
>>>>>>
>>>>>>        Adding a sysctl knob to allow user to specify a default
>>>>>>        rto_min at socket init time, other than using the hard
>>>>>>        coded 200ms default rto_min.
>>>>>>
>>>>>>        Note that the rto_min route option has the highest precedence
>>>>>>        for configuring this setting, followed by the TCP_BPF_RTO_MIN
>>>>>>        socket option, followed by the tcp_rto_min_us sysctl.
>>>>>>
>>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
>>>>>> The first priority can override others. It doesn't have a good
>>>>>> chance/point to restore the icsk_rto_min field if users want to
>>>>>> shutdown the bpf program because it is set in
>>>>>> bpf_sol_tcp_setsockopt().
>>>>>
>>>>> rto_min example is slightly different. With tcp_rto_min the doesn't
>>>>> expect any data to come back to user space while for timestamping the
>>>>> app may be confused directly by providing more data, or by not providing
>>>>> expected data. I believe some hint about requestor of the data is needed
>>>>> here. It will also help to solve the problem of populating sk_err_queue
>>>>> mentioned by Martin.
>>>>
>>>> Sorry, I don't fully get it. In this patch series, this bpf extension
>>>> feature will not rely on sk_err_queue any more to report tx timestamps
>>>> to userspace. Bpf program can do that printing.
>>>>
>>>> Do you mean that it could be wrong if one skb carries the tsflags that
>>>> are previously set due to the bpf program and then suddenly users
>>>> detach the program? It indeed will put a new/cloned skb into the error
>>>> queue. Interesting corner case. It seems I have to re-implement a
>>>> totally independent tsflags for bpf extension feature. Do you have a
>>>> better idea on this?
>>>
>>> I feel that if I could introduce bpf new flags like
>>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
>>> series, then it will not populate skb in sk_err_queue even users
>>> remove the bpf program all of sudden. With this kind of specific bpf
>>> flags, we can also avoid conflicting with the apps using
>>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
>>> solution shows up.
>>
>> It doesn't look great to have duplicate flags just to indicate that this
>> particular timestamp was asked by a bpf program, even though it looks
> 
> Or introduce a new field in struct sock or struct sk_buff so that
> existing SOF_TIMESTAMPING_* can be reused.

Well, I was thinking about this way. We can potentially add an array of
tsflags meaning the index of the array is the requestor. That will be
more flexible in terms of adding new requestor (like scheduler or
congestion control algo) if needed. But it comes with increased memory
usage on hot path which might be a blocker.

>> like a straight forward solution. Sounds like we have to re-think the
>> interface for timestamping requests, but I don't have proper suggestion
>> right now.
> 
> Thanks for your help :)
> 
> Thanks,
> Jason


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 13:58                 ` Vadim Fedorenko
@ 2024-10-09 14:35                   ` Jason Xing
  2024-10-09 14:59                     ` Vadim Fedorenko
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Xing @ 2024-10-09 14:35 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On Wed, Oct 9, 2024 at 9:58 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 09/10/2024 14:47, Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko
> > <vadim.fedorenko@linux.dev> wrote:
> >>
> >> On 09/10/2024 12:48, Jason Xing wrote:
> >>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>>
> >>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
> >>>> <vadim.fedorenko@linux.dev> wrote:
> >>>>>
> >>>>> On 09/10/2024 02:05, Jason Xing wrote:
> >>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> >>>>>>> <willemdebruijn.kernel@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Jason Xing wrote:
> >>>>>>>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>>>>>>
> >>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> >>>>>>>>> tracepoint to print information (say, tstamp) so that we can
> >>>>>>>>> transparently equip applications with this feature and require no
> >>>>>>>>> modification in user side.
> >>>>>>>>>
> >>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
> >>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
> >>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
> >>>>>>>>> better solution to extend.
> >>>>>>>>>
> >>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> >>>>>>>>> users only needs to pass certain flags through bpf program to make sure
> >>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
> >>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> >>>>>>>>> and wait for the moment when recvmsg() is called.
> >>>>>>>>
> >>>>>>>> As you mention, overall I am very supportive of having a way to add
> >>>>>>>> timestamping by adminstrators, without having to rebuild applications.
> >>>>>>>> BPF hooks seem to be the right place for this.
> >>>>>>>>
> >>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
> >>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
> >>>>>>>> set, and correlation between measurements for the same data in the
> >>>>>>>> stream.
> >>>>>>>>
> >>>>>>>>> After this series, we could step by step implement more advanced
> >>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >>>>>>>>
> >>>>>>>> My main implementation concern is where this API overlaps with the
> >>>>>>>> existing user API, and how they might conflict. A few questions in the
> >>>>>>>> patches.
> >>>>>>>
> >>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
> >>>>>>> related experts' help.
> >>>>>>>
> >>>>>>> How to deal with it without interfering with the existing apps in the
> >>>>>>> right way is the key problem.
> >>>>>>
> >>>>>> What I try to implement is let the bpf program have the highest
> >>>>>> precedence. It's similar to RTO min, see the commit as an example:
> >>>>>>
> >>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
> >>>>>> Author: Kevin Yang <yyd@google.com>
> >>>>>> Date:   Mon Jun 3 21:30:54 2024 +0000
> >>>>>>
> >>>>>>        tcp: add sysctl_tcp_rto_min_us
> >>>>>>
> >>>>>>        Adding a sysctl knob to allow user to specify a default
> >>>>>>        rto_min at socket init time, other than using the hard
> >>>>>>        coded 200ms default rto_min.
> >>>>>>
> >>>>>>        Note that the rto_min route option has the highest precedence
> >>>>>>        for configuring this setting, followed by the TCP_BPF_RTO_MIN
> >>>>>>        socket option, followed by the tcp_rto_min_us sysctl.
> >>>>>>
> >>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> >>>>>> The first priority can override others. It doesn't have a good
> >>>>>> chance/point to restore the icsk_rto_min field if users want to
> >>>>>> shutdown the bpf program because it is set in
> >>>>>> bpf_sol_tcp_setsockopt().
> >>>>>
> >>>>> rto_min example is slightly different. With tcp_rto_min the doesn't
> >>>>> expect any data to come back to user space while for timestamping the
> >>>>> app may be confused directly by providing more data, or by not providing
> >>>>> expected data. I believe some hint about requestor of the data is needed
> >>>>> here. It will also help to solve the problem of populating sk_err_queue
> >>>>> mentioned by Martin.
> >>>>
> >>>> Sorry, I don't fully get it. In this patch series, this bpf extension
> >>>> feature will not rely on sk_err_queue any more to report tx timestamps
> >>>> to userspace. Bpf program can do that printing.
> >>>>
> >>>> Do you mean that it could be wrong if one skb carries the tsflags that
> >>>> are previously set due to the bpf program and then suddenly users
> >>>> detach the program? It indeed will put a new/cloned skb into the error
> >>>> queue. Interesting corner case. It seems I have to re-implement a
> >>>> totally independent tsflags for bpf extension feature. Do you have a
> >>>> better idea on this?
> >>>
> >>> I feel that if I could introduce bpf new flags like
> >>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
> >>> series, then it will not populate skb in sk_err_queue even users
> >>> remove the bpf program all of sudden. With this kind of specific bpf
> >>> flags, we can also avoid conflicting with the apps using
> >>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
> >>> solution shows up.
> >>
> >> It doesn't look great to have duplicate flags just to indicate that this
> >> particular timestamp was asked by a bpf program, even though it looks
> >
> > Or introduce a new field in struct sock or struct sk_buff so that
> > existing SOF_TIMESTAMPING_* can be reused.
>
> Well, I was thinking about this way. We can potentially add an array of
> tsflags meaning the index of the array is the requestor. That will be
> more flexible in terms of adding new requestor (like scheduler or
> congestion control algo) if needed. But it comes with increased memory
> usage on hot path which might be a blocker.

Is the following code snippet what you expect? But I wonder why not
just add a u32 field instead and then use each bit of it defined in
include/uapi/linux/net_tstamp.h?

diff --git a/include/net/sock.h b/include/net/sock.h
index b32f1424ecc5..4677f53da75a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -445,6 +445,7 @@ struct sock {
        u32                     sk_reserved_mem;
        int                     sk_forward_alloc;
        u32                     sk_tsflags;
+       u32                     new_tsflags[10];
        __cacheline_group_end(sock_write_rxtx);

        __cacheline_group_begin(sock_write_tx);

I could be missing something. Sorry. If possible, could you show me
some code snippets?

As for the new requestor, IIUC, do you want to add more tx timestamp
generating points in the future?

Thanks,
Jason

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 14:35                   ` Jason Xing
@ 2024-10-09 14:59                     ` Vadim Fedorenko
  2024-10-09 15:20                       ` Jason Xing
  0 siblings, 1 reply; 49+ messages in thread
From: Vadim Fedorenko @ 2024-10-09 14:59 UTC (permalink / raw)
  To: Jason Xing
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

On 09/10/2024 15:35, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 9:58 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 09/10/2024 14:47, Jason Xing wrote:
>>> On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko
>>> <vadim.fedorenko@linux.dev> wrote:
>>>>
>>>> On 09/10/2024 12:48, Jason Xing wrote:
>>>>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
>>>>>> <vadim.fedorenko@linux.dev> wrote:
>>>>>>>
>>>>>>> On 09/10/2024 02:05, Jason Xing wrote:
>>>>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>>>>>>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Jason Xing wrote:
>>>>>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>>>>>
>>>>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>>>>>>>>> tracepoint to print information (say, tstamp) so that we can
>>>>>>>>>>> transparently equip applications with this feature and require no
>>>>>>>>>>> modification in user side.
>>>>>>>>>>>
>>>>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>>>>>>>>> better solution to extend.
>>>>>>>>>>>
>>>>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>>>>>>>>> users only needs to pass certain flags through bpf program to make sure
>>>>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>>>>>>>>> and wait for the moment when recvmsg() is called.
>>>>>>>>>>
>>>>>>>>>> As you mention, overall I am very supportive of having a way to add
>>>>>>>>>> timestamping by adminstrators, without having to rebuild applications.
>>>>>>>>>> BPF hooks seem to be the right place for this.
>>>>>>>>>>
>>>>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>>>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>>>>>>>>> set, and correlation between measurements for the same data in the
>>>>>>>>>> stream.
>>>>>>>>>>
>>>>>>>>>>> After this series, we could step by step implement more advanced
>>>>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>>>>>>>>
>>>>>>>>>> My main implementation concern is where this API overlaps with the
>>>>>>>>>> existing user API, and how they might conflict. A few questions in the
>>>>>>>>>> patches.
>>>>>>>>>
>>>>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
>>>>>>>>> related experts' help.
>>>>>>>>>
>>>>>>>>> How to deal with it without interfering with the existing apps in the
>>>>>>>>> right way is the key problem.
>>>>>>>>
>>>>>>>> What I try to implement is let the bpf program have the highest
>>>>>>>> precedence. It's similar to RTO min, see the commit as an example:
>>>>>>>>
>>>>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
>>>>>>>> Author: Kevin Yang <yyd@google.com>
>>>>>>>> Date:   Mon Jun 3 21:30:54 2024 +0000
>>>>>>>>
>>>>>>>>         tcp: add sysctl_tcp_rto_min_us
>>>>>>>>
>>>>>>>>         Adding a sysctl knob to allow user to specify a default
>>>>>>>>         rto_min at socket init time, other than using the hard
>>>>>>>>         coded 200ms default rto_min.
>>>>>>>>
>>>>>>>>         Note that the rto_min route option has the highest precedence
>>>>>>>>         for configuring this setting, followed by the TCP_BPF_RTO_MIN
>>>>>>>>         socket option, followed by the tcp_rto_min_us sysctl.
>>>>>>>>
>>>>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
>>>>>>>> The first priority can override others. It doesn't have a good
>>>>>>>> chance/point to restore the icsk_rto_min field if users want to
>>>>>>>> shutdown the bpf program because it is set in
>>>>>>>> bpf_sol_tcp_setsockopt().
>>>>>>>
>>>>>>> rto_min example is slightly different. With tcp_rto_min the doesn't
>>>>>>> expect any data to come back to user space while for timestamping the
>>>>>>> app may be confused directly by providing more data, or by not providing
>>>>>>> expected data. I believe some hint about requestor of the data is needed
>>>>>>> here. It will also help to solve the problem of populating sk_err_queue
>>>>>>> mentioned by Martin.
>>>>>>
>>>>>> Sorry, I don't fully get it. In this patch series, this bpf extension
>>>>>> feature will not rely on sk_err_queue any more to report tx timestamps
>>>>>> to userspace. Bpf program can do that printing.
>>>>>>
>>>>>> Do you mean that it could be wrong if one skb carries the tsflags that
>>>>>> are previously set due to the bpf program and then suddenly users
>>>>>> detach the program? It indeed will put a new/cloned skb into the error
>>>>>> queue. Interesting corner case. It seems I have to re-implement a
>>>>>> totally independent tsflags for bpf extension feature. Do you have a
>>>>>> better idea on this?
>>>>>
>>>>> I feel that if I could introduce bpf new flags like
>>>>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
>>>>> series, then it will not populate skb in sk_err_queue even users
>>>>> remove the bpf program all of sudden. With this kind of specific bpf
>>>>> flags, we can also avoid conflicting with the apps using
>>>>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
>>>>> solution shows up.
>>>>
>>>> It doesn't look great to have duplicate flags just to indicate that this
>>>> particular timestamp was asked by a bpf program, even though it looks
>>>
>>> Or introduce a new field in struct sock or struct sk_buff so that
>>> existing SOF_TIMESTAMPING_* can be reused.
>>
>> Well, I was thinking about this way. We can potentially add an array of
>> tsflags meaning the index of the array is the requestor. That will be
>> more flexible in terms of adding new requestor (like scheduler or
>> congestion control algo) if needed. But it comes with increased memory
>> usage on hot path which might be a blocker.
> 
> Is the following code snippet what you expect? But I wonder why not
> just add a u32 field instead and then use each bit of it defined in
> include/uapi/linux/net_tstamp.h?
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index b32f1424ecc5..4677f53da75a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -445,6 +445,7 @@ struct sock {
>          u32                     sk_reserved_mem;
>          int                     sk_forward_alloc;
>          u32                     sk_tsflags;
> +       u32                     new_tsflags[10];
>          __cacheline_group_end(sock_write_rxtx);
> 
>          __cacheline_group_begin(sock_write_tx);
> 
> I could be missing something. Sorry. If possible, could you show me
> some code snippets?
> 
> As for the new requestor, IIUC, do you want to add more tx timestamp
> generating points in the future?

It's more like this:

diff --git a/include/net/sock.h b/include/net/sock.h
index c58ca8dd561b..93f931dcc4cc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -234,6 +234,14 @@ struct sock_common {
  struct bpf_local_storage;
  struct sk_filter;

+enum {
+       SOCKETOPT_TS_REQUESTOR = 0,
+       CMSG_TS_REQUESTOR,
+       BPFPROG_TS_REQUESTOR,
+
+       __MAX_TS_REQUESTOR,
+};
+
  /**
    *    struct sock - network layer representation of sockets
    *    @__sk_common: shared layout with inet_timewait_sock
@@ -444,7 +452,7 @@ struct sock {
         socket_lock_t           sk_lock;
         u32                     sk_reserved_mem;
         int                     sk_forward_alloc;
-       u32                     sk_tsflags;
+       u32                     sk_tsflags[__MAX_TS_REQUESTOR];
         __cacheline_group_end(sock_write_rxtx);

         __cacheline_group_begin(sock_write_tx);


And use existing SOF_TIMESTAMPING_* for each element in the array. Not
sure that struct sock is the best place though, as some timestamping
requests may be on per-packet basis for protocols other than TCP.

Again, I'm just thinking out loud, kinda wild idea.


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
  2024-10-09 14:59                     ` Vadim Fedorenko
@ 2024-10-09 15:20                       ` Jason Xing
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Xing @ 2024-10-09 15:20 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern, willemb,
	ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, bpf, netdev,
	Jason Xing

> It's more like this:
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index c58ca8dd561b..93f931dcc4cc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -234,6 +234,14 @@ struct sock_common {
>   struct bpf_local_storage;
>   struct sk_filter;
>
> +enum {
> +       SOCKETOPT_TS_REQUESTOR = 0,
> +       CMSG_TS_REQUESTOR,
> +       BPFPROG_TS_REQUESTOR,
> +
> +       __MAX_TS_REQUESTOR,
> +};
> +
>   /**
>     *    struct sock - network layer representation of sockets
>     *    @__sk_common: shared layout with inet_timewait_sock
> @@ -444,7 +452,7 @@ struct sock {
>          socket_lock_t           sk_lock;
>          u32                     sk_reserved_mem;
>          int                     sk_forward_alloc;
> -       u32                     sk_tsflags;
> +       u32                     sk_tsflags[__MAX_TS_REQUESTOR];
>          __cacheline_group_end(sock_write_rxtx);
>
>          __cacheline_group_begin(sock_write_tx);
>
>
> And use existing SOF_TIMESTAMPING_* for each element in the array. Not
> sure that struct sock is the best place though, as some timestamping
> requests may be on per-packet basis for protocols other than TCP.
>
> Again, I'm just thinking out loud, kinda wild idea.

Thanks. I see.

Requestor or requester? I don't know.

For now, __MAX_TS_REQUESTOR can be two, one is used for the old
implementation, the other one is used for BPF extension.

One irrelevant question is if we need CMSG_TS_REQUESTOR to split the
old tsflags into two because the cmsg relies on sk->sk_tsflags which
works well.

The whole idea is very interesting and inspiring to me! It could be a
good way to go. But as you said, the memory can be a blocker. And
where exactly we should add in struct sock is another problem because
the size of this array could be different if we add more requestors in
the future.

I think I can write in the next version based on this idea
(sk_tsflags[MAX] array has two requestor members at the current stage)
and then seek for other experts' opinions.

+Willem, I'd like to know if you are for or against this idea?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2024-10-09 15:21 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
2024-10-08 18:45   ` Willem de Bruijn
2024-10-08 23:27     ` Jason Xing
2024-10-09 13:22       ` Willem de Bruijn
2024-10-09 13:57         ` Jason Xing
2024-10-09  0:58   ` Kuniyuki Iwashima
2024-10-09  8:11     ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
2024-10-08  9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
2024-10-08 19:13   ` Vadim Fedorenko
2024-10-08 23:08     ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
2024-10-08 18:53   ` Willem de Bruijn
2024-10-08 23:37     ` Jason Xing
2024-10-08 19:18   ` Vadim Fedorenko
2024-10-08 23:48     ` Jason Xing
2024-10-09  9:16       ` Vadim Fedorenko
2024-10-09 11:15         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
2024-10-08 18:56   ` Willem de Bruijn
2024-10-08 23:18     ` Jason Xing
2024-10-09 13:19       ` Willem de Bruijn
2024-10-09 13:52         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
2024-10-09  7:19   ` Martin KaFai Lau
2024-10-09  8:09     ` Jason Xing
2024-10-09 13:23       ` Willem de Bruijn
2024-10-09 13:48         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
2024-10-09  0:22   ` Jakub Kicinski
2024-10-09  0:30     ` Jason Xing
2024-10-09  2:33   ` kernel test robot
2024-10-09  4:17   ` kernel test robot
2024-10-09  5:09   ` kernel test robot
2024-10-08  9:51 ` [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp Jason Xing
2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
2024-10-08 23:22   ` Jason Xing
2024-10-09  1:05     ` Jason Xing
2024-10-09  9:27       ` Vadim Fedorenko
2024-10-09 11:12         ` Jason Xing
2024-10-09 11:48           ` Jason Xing
2024-10-09 13:16             ` Vadim Fedorenko
2024-10-09 13:47               ` Jason Xing
2024-10-09 13:58                 ` Vadim Fedorenko
2024-10-09 14:35                   ` Jason Xing
2024-10-09 14:59                     ` Vadim Fedorenko
2024-10-09 15:20                       ` Jason Xing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).