* [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg
@ 2024-09-04 11:31 Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:31 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: Vadim Fedorenko, netdev
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg.
The first patch in the series adds all needed definitions and implements
the function for UDP sockets. The explicit check of socket's type is not
added because subsequent patches in the series will add support for other
types of sockets. The documentation is also included into the first
patch.
Patch 2/4 adds support for TCP sockets. This part is simple and straight
forward.
Patch 3/4 adds support for RAW sockets. It's a bit tricky because
sock_tx_timestamp functions has to be refactored to receive full socket
cookie information to fill in ID. The commit b534dc46c8ae ("net_tstamp:
add SOF_TIMESTAMPING_OPT_ID_TCP") did the conversion of sk_tsflags to
u32 but sock_tx_timestamp functions were not converted and still receive
16b flags. It wasn't a problem because SOF_TIMESTAMPING_OPT_ID_TCP was
not checked in these functions, that's why no backporting is needed.
Patch 4/4 adds selftests for new feature.
Changelog:
v2 -> v3:
- remove SOF_TIMESTAMPING_OPT_ID_CMSG UAPI value and use kernel-internal
SOCKCM_FLAG_TS_OPT_ID which uses the highest bit of tsflags.
- add support for TCP and RAW sockets
v1 -> v2:
- add more selftests
- add documentation for the feature
- refactor UDP send function
RFC -> v1:
- add selftests
- add SOF_TIMESTAMPING_OPT_ID_CMSG to signal of custom ID provided by
user-space instead of reserving value of 0 for this.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Vadim Fedorenko (4):
net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
net_tstamp: add SCM_TS_OPT_ID for TCP sockets
net_tstamp: add SCM_TS_OPT_ID for RAW sockets
selftests: txtimestamp: add SCM_TS_OPT_ID test
Documentation/networking/timestamping.rst | 13 ++++++
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
include/net/inet_sock.h | 4 +-
include/net/sock.h | 29 +++++++++----
include/uapi/asm-generic/socket.h | 2 +
include/uapi/linux/net_tstamp.h | 7 ++++
net/can/raw.c | 2 +-
net/core/sock.c | 9 ++++
net/ipv4/ip_output.c | 20 ++++++---
net/ipv4/raw.c | 2 +-
net/ipv4/tcp.c | 15 ++++---
net/ipv6/ip6_output.c | 20 ++++++---
net/ipv6/raw.c | 2 +-
net/packet/af_packet.c | 6 +--
net/socket.c | 2 +-
tools/include/uapi/asm-generic/socket.h | 2 +
tools/testing/selftests/net/txtimestamp.c | 48 +++++++++++++++++-----
tools/testing/selftests/net/txtimestamp.sh | 12 +++---
21 files changed, 154 insertions(+), 49 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
@ 2024-09-04 11:31 ` Vadim Fedorenko
2024-09-04 13:56 ` Jason Xing
` (4 more replies)
2024-09-04 11:31 ` [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets Vadim Fedorenko
` (3 subsequent siblings)
4 siblings, 5 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:31 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: Vadim Fedorenko, netdev
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
Documentation/networking/timestamping.rst | 13 +++++++++++++
arch/alpha/include/uapi/asm/socket.h | 2 ++
arch/mips/include/uapi/asm/socket.h | 2 ++
arch/parisc/include/uapi/asm/socket.h | 2 ++
arch/sparc/include/uapi/asm/socket.h | 2 ++
include/net/inet_sock.h | 4 +++-
include/net/sock.h | 2 ++
include/uapi/asm-generic/socket.h | 2 ++
include/uapi/linux/net_tstamp.h | 7 +++++++
net/core/sock.c | 9 +++++++++
net/ipv4/ip_output.c | 18 +++++++++++++-----
net/ipv6/ip6_output.c | 18 +++++++++++++-----
12 files changed, 70 insertions(+), 11 deletions(-)
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 5e93cd71f99f..e365526d6bf9 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -193,6 +193,19 @@ SOF_TIMESTAMPING_OPT_ID:
among all possibly concurrently outstanding timestamp requests for
that socket.
+ The process can optionally override the default generated ID, by
+ passing a specific ID with control message SCM_TS_OPT_ID::
+
+ struct msghdr *msg;
+ ...
+ cmsg = CMSG_FIRSTHDR(msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_TS_OPT_ID;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
+ *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
+ err = sendmsg(fd, msg, 0);
+
+
SOF_TIMESTAMPING_OPT_ID_TCP:
Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index e94f621903fe..99dec81e7c84 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -140,6 +140,8 @@
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
+#define SCM_TS_OPT_ID 78
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 60ebaed28a4c..bb3dc8feb205 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -151,6 +151,8 @@
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
+#define SCM_TS_OPT_ID 78
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index be264c2b1a11..c3ab3b3289eb 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -132,6 +132,8 @@
#define SO_PASSPIDFD 0x404A
#define SO_PEERPIDFD 0x404B
+#define SCM_TS_OPT_ID 0x404C
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 682da3714686..9b40f0a57fbc 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -133,6 +133,8 @@
#define SO_PASSPIDFD 0x0055
#define SO_PEERPIDFD 0x0056
+#define SCM_TS_OPT_ID 0x0057
+
#if !defined(__KERNEL__)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 394c3b66065e..f01dd273bea6 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -174,6 +174,7 @@ struct inet_cork {
__s16 tos;
char priority;
__u16 gso_size;
+ u32 ts_opt_id;
u64 transmit_time;
u32 mark;
};
@@ -241,7 +242,8 @@ struct inet_sock {
struct inet_cork_full cork;
};
-#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
+#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
+#define IPCORK_TS_OPT_ID 2 /* ts_opt_id field is valid, overriding sk_tskey */
enum {
INET_FLAGS_PKTINFO = 0,
diff --git a/include/net/sock.h b/include/net/sock.h
index f51d61fab059..c6554ad82961 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -952,6 +952,7 @@ enum sock_flags {
};
#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
+#define SOCKCM_FLAG_TS_OPT_ID BIT(31)
static inline void sock_copy_flags(struct sock *nsk, const struct sock *osk)
{
@@ -1794,6 +1795,7 @@ struct sockcm_cookie {
u64 transmit_time;
u32 mark;
u32 tsflags;
+ u32 ts_opt_id;
};
static inline void sockcm_init(struct sockcm_cookie *sockc,
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 8ce8a39a1e5f..db3df3e74b01 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -135,6 +135,8 @@
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
+#define SCM_TS_OPT_ID 78
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index a2c66b3d7f0f..1c38536350e7 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -38,6 +38,13 @@ enum {
SOF_TIMESTAMPING_LAST
};
+/*
+ * The highest bit of sk_tsflags is reserved for kernel-internal
+ * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
+ * values do not reach this reserved area
+ */
+static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
+
/*
* SO_TIMESTAMPING flags are either for recording a packet timestamp or for
* reporting the timestamp to user space.
diff --git a/net/core/sock.c b/net/core/sock.c
index 468b1239606c..ac70c759a371 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2859,6 +2859,15 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
return -EINVAL;
sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
break;
+ case SCM_TS_OPT_ID:
+ tsflags = READ_ONCE(sk->sk_tsflags);
+ if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
+ return -EINVAL;
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
+ return -EINVAL;
+ sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
+ sockc->tsflags |= SOCKCM_FLAG_TS_OPT_ID;
+ break;
/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
case SCM_RIGHTS:
case SCM_CREDENTIALS:
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index eea443b7f65e..bd2f6a699470 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -973,7 +973,7 @@ static int __ip_append_data(struct sock *sk,
unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
int csummode = CHECKSUM_NONE;
struct rtable *rt = dst_rtable(cork->dst);
- bool paged, hold_tskey, extra_uref = false;
+ bool paged, hold_tskey = false, extra_uref = false;
unsigned int wmem_alloc_delta = 0;
u32 tskey = 0;
@@ -1049,10 +1049,15 @@ static int __ip_append_data(struct sock *sk,
cork->length += length;
- hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
- READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
- if (hold_tskey)
- tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
+ READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
+ if (cork->flags & IPCORK_TS_OPT_ID) {
+ tskey = cork->ts_opt_id;
+ } else {
+ tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ hold_tskey = true;
+ }
+ }
/* So, what's going on in the loop below?
*
@@ -1325,8 +1330,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
cork->mark = ipc->sockc.mark;
cork->priority = ipc->priority;
cork->transmit_time = ipc->sockc.transmit_time;
+ cork->ts_opt_id = ipc->sockc.ts_opt_id;
cork->tx_flags = 0;
sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
+ if (ipc->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
+ cork->flags |= IPCORK_TS_OPT_ID;
return 0;
}
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f26841f1490f..9b87d23314e8 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1401,7 +1401,10 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
cork->base.mark = ipc6->sockc.mark;
+ cork->base.ts_opt_id = ipc6->sockc.ts_opt_id;
sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
+ if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
+ cork->base.flags |= IPCORK_TS_OPT_ID;
cork->base.length = 0;
cork->base.transmit_time = ipc6->sockc.transmit_time;
@@ -1433,7 +1436,7 @@ static int __ip6_append_data(struct sock *sk,
bool zc = false;
u32 tskey = 0;
struct rt6_info *rt = dst_rt6_info(cork->dst);
- bool paged, hold_tskey, extra_uref = false;
+ bool paged, hold_tskey = false, extra_uref = false;
struct ipv6_txoptions *opt = v6_cork->opt;
int csummode = CHECKSUM_NONE;
unsigned int maxnonfragsize, headersize;
@@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
flags &= ~MSG_SPLICE_PAGES;
}
- hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
- READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
- if (hold_tskey)
- tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
+ READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
+ if (cork->flags & IPCORK_TS_OPT_ID) {
+ tskey = cork->ts_opt_id;
+ } else {
+ tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ hold_tskey = true;
+ }
+ }
/*
* Let's try using as much space as possible.
--
2.43.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
@ 2024-09-04 11:31 ` Vadim Fedorenko
2024-09-05 14:58 ` Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 3/4] net_tstamp: add SCM_TS_OPT_ID for RAW sockets Vadim Fedorenko
` (2 subsequent siblings)
4 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:31 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: Vadim Fedorenko, netdev
TCP sockets have different flow for providing timestamp OPT_ID value.
Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
net/ipv4/tcp.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a5680b4e786..5553a8aeee80 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
}
EXPORT_SYMBOL(tcp_init_sock);
-static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
+static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
{
struct sk_buff *skb = tcp_write_queue_tail(sk);
+ u32 tsflags = sockc->tsflags;
if (tsflags && skb) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
if (tsflags & SOF_TIMESTAMPING_TX_ACK)
tcb->txstamp_ack = 1;
- if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
- shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+ if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
+ if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
+ shinfo->tskey = sockc->ts_opt_id;
+ else
+ shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+ }
}
}
@@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
out:
if (copied) {
- tcp_tx_timestamp(sk, sockc.tsflags);
+ tcp_tx_timestamp(sk, &sockc);
tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
}
out_nopush:
--
2.43.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH net-next v3 3/4] net_tstamp: add SCM_TS_OPT_ID for RAW sockets
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets Vadim Fedorenko
@ 2024-09-04 11:31 ` Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 4/4] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
2024-09-04 11:36 ` [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
4 siblings, 0 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:31 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: Vadim Fedorenko, netdev
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW
sockets. To add new option this patch converts all callers (direct and
indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of
tsflags. And while here fix __sock_tx_timestamp to receive tsflags as
__u32 instead of __u16.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
include/net/sock.h | 27 ++++++++++++++++++---------
net/can/raw.c | 2 +-
net/ipv4/ip_output.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv4/tcp.c | 2 +-
net/ipv6/ip6_output.c | 2 +-
net/ipv6/raw.c | 2 +-
net/packet/af_packet.c | 6 +++---
net/socket.c | 2 +-
9 files changed, 28 insertions(+), 19 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index c6554ad82961..0ec60f5225a3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2653,39 +2653,48 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
sock_write_timestamp(sk, 0);
}
-void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags);
+void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags);
/**
* _sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
* @sk: socket sending this packet
- * @tsflags: timestamping flags to use
+ * @sockc: pointer to socket cmsg cookie to get timestamping info
* @tx_flags: completed with instructions for time stamping
* @tskey: filled in with next sk_tskey (not for TCP, which uses seqno)
*
* Note: callers should take care of initial ``*tx_flags`` value (usually 0)
*/
-static inline void _sock_tx_timestamp(struct sock *sk, __u16 tsflags,
+static inline void _sock_tx_timestamp(struct sock *sk,
+ const struct sockcm_cookie *sockc,
__u8 *tx_flags, __u32 *tskey)
{
+ __u32 tsflags = sockc->tsflags;
+
if (unlikely(tsflags)) {
__sock_tx_timestamp(tsflags, tx_flags);
if (tsflags & SOF_TIMESTAMPING_OPT_ID && tskey &&
- tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
- *tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
+ if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
+ *tskey = sockc->ts_opt_id;
+ else
+ *tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+ }
}
if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS)))
*tx_flags |= SKBTX_WIFI_STATUS;
}
-static inline void sock_tx_timestamp(struct sock *sk, __u16 tsflags,
+static inline void sock_tx_timestamp(struct sock *sk,
+ const struct sockcm_cookie *sockc,
__u8 *tx_flags)
{
- _sock_tx_timestamp(sk, tsflags, tx_flags, NULL);
+ _sock_tx_timestamp(sk, sockc, tx_flags, NULL);
}
-static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
+static inline void skb_setup_tx_timestamp(struct sk_buff *skb,
+ const struct sockcm_cookie *sockc)
{
- _sock_tx_timestamp(skb->sk, tsflags, &skb_shinfo(skb)->tx_flags,
+ _sock_tx_timestamp(skb->sk, sockc, &skb_shinfo(skb)->tx_flags,
&skb_shinfo(skb)->tskey);
}
diff --git a/net/can/raw.c b/net/can/raw.c
index 00533f64d69d..255c0a8f39d6 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -966,7 +966,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
skb->mark = READ_ONCE(sk->sk_mark);
skb->tstamp = sockc.transmit_time;
- skb_setup_tx_timestamp(skb, sockc.tsflags);
+ skb_setup_tx_timestamp(skb, &sockc);
err = can_send(skb, ro->loopback);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index bd2f6a699470..107919b5a793 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1332,7 +1332,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
cork->transmit_time = ipc->sockc.transmit_time;
cork->ts_opt_id = ipc->sockc.ts_opt_id;
cork->tx_flags = 0;
- sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
+ sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags);
if (ipc->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
cork->flags |= IPCORK_TS_OPT_ID;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 474dfd263c8b..0e9e01967ec9 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -370,7 +370,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb->ip_summed = CHECKSUM_NONE;
- skb_setup_tx_timestamp(skb, sockc->tsflags);
+ skb_setup_tx_timestamp(skb, sockc);
if (flags & MSG_CONFIRM)
skb_set_dst_pending_confirm(skb, 1);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5553a8aeee80..0d3decc13a99 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -483,7 +483,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
- sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
+ sock_tx_timestamp(sk, sockc, &shinfo->tx_flags);
if (tsflags & SOF_TIMESTAMPING_TX_ACK)
tcb->txstamp_ack = 1;
if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9b87d23314e8..91c1f1a75c7e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,7 +1402,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
cork->base.tx_flags = 0;
cork->base.mark = ipc6->sockc.mark;
cork->base.ts_opt_id = ipc6->sockc.ts_opt_id;
- sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
+ sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags);
if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
cork->base.flags |= IPCORK_TS_OPT_ID;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 608fa9d05b55..8476a3944a88 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -629,7 +629,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb->ip_summed = CHECKSUM_NONE;
- skb_setup_tx_timestamp(skb, sockc->tsflags);
+ skb_setup_tx_timestamp(skb, sockc);
if (flags & MSG_CONFIRM)
skb_set_dst_pending_confirm(skb, 1);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4a364cdd445e..83ef86c0dd88 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2118,7 +2118,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = READ_ONCE(sk->sk_mark);
skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
- skb_setup_tx_timestamp(skb, sockc.tsflags);
+ skb_setup_tx_timestamp(skb, &sockc);
if (unlikely(extra_len == 4))
skb->no_fcs = 1;
@@ -2650,7 +2650,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb->priority = READ_ONCE(po->sk.sk_priority);
skb->mark = READ_ONCE(po->sk.sk_mark);
skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, po->sk.sk_clockid);
- skb_setup_tx_timestamp(skb, sockc->tsflags);
+ skb_setup_tx_timestamp(skb, sockc);
skb_zcopy_set_nouarg(skb, ph.raw);
skb_reserve(skb, hlen);
@@ -3115,7 +3115,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
goto out_free;
}
- skb_setup_tx_timestamp(skb, sockc.tsflags);
+ skb_setup_tx_timestamp(skb, &sockc);
if (!vnet_hdr.gso_type && (len > dev->mtu + reserve + extra_len) &&
!packet_extra_vlan_len_allowed(dev, skb)) {
diff --git a/net/socket.c b/net/socket.c
index fcbdd5bc47ac..e5d9f90882b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -688,7 +688,7 @@ void sock_release(struct socket *sock)
}
EXPORT_SYMBOL(sock_release);
-void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags)
+void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags)
{
u8 flags = *tx_flags;
--
2.43.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH net-next v3 4/4] selftests: txtimestamp: add SCM_TS_OPT_ID test
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
` (2 preceding siblings ...)
2024-09-04 11:31 ` [PATCH net-next v3 3/4] net_tstamp: add SCM_TS_OPT_ID for RAW sockets Vadim Fedorenko
@ 2024-09-04 11:31 ` Vadim Fedorenko
2024-09-04 11:36 ` [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
4 siblings, 0 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:31 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: Vadim Fedorenko, netdev
Extend txtimestamp udp test to run with fixed tskey using
SCM_TS_OPT_ID control message.
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
tools/include/uapi/asm-generic/socket.h | 2 +
tools/testing/selftests/net/txtimestamp.c | 48 +++++++++++++++++-----
tools/testing/selftests/net/txtimestamp.sh | 12 +++---
3 files changed, 47 insertions(+), 15 deletions(-)
diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index 54d9c8bf7c55..281df9139d2b 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -124,6 +124,8 @@
#define SO_PASSPIDFD 76
#define SO_PEERPIDFD 77
+#define SCM_TS_OPT_ID 78
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
index ec60a16c9307..3a8f716e72ae 100644
--- a/tools/testing/selftests/net/txtimestamp.c
+++ b/tools/testing/selftests/net/txtimestamp.c
@@ -54,6 +54,10 @@
#define USEC_PER_SEC 1000000L
#define NSEC_PER_SEC 1000000000LL
+#ifndef SCM_TS_OPT_ID
+# define SCM_TS_OPT_ID 78
+#endif
+
/* command line parameters */
static int cfg_proto = SOCK_STREAM;
static int cfg_ipproto = IPPROTO_TCP;
@@ -77,6 +81,8 @@ static bool cfg_epollet;
static bool cfg_do_listen;
static uint16_t dest_port = 9000;
static bool cfg_print_nsec;
+static uint32_t ts_opt_id;
+static bool cfg_use_cmsg_opt_id;
static struct sockaddr_in daddr;
static struct sockaddr_in6 daddr6;
@@ -136,12 +142,13 @@ static void validate_key(int tskey, int tstype)
/* compare key for each subsequent request
* must only test for one type, the first one requested
*/
- if (saved_tskey == -1)
+ if (saved_tskey == -1 || cfg_use_cmsg_opt_id)
saved_tskey_type = tstype;
else if (saved_tskey_type != tstype)
return;
stepsize = cfg_proto == SOCK_STREAM ? cfg_payload_len : 1;
+ stepsize = cfg_use_cmsg_opt_id ? 0 : stepsize;
if (tskey != saved_tskey + stepsize) {
fprintf(stderr, "ERROR: key %d, expected %d\n",
tskey, saved_tskey + stepsize);
@@ -480,7 +487,7 @@ static void fill_header_udp(void *p, bool is_ipv4)
static void do_test(int family, unsigned int report_opt)
{
- char control[CMSG_SPACE(sizeof(uint32_t))];
+ char control[2 * CMSG_SPACE(sizeof(uint32_t))];
struct sockaddr_ll laddr;
unsigned int sock_opt;
struct cmsghdr *cmsg;
@@ -620,18 +627,32 @@ static void do_test(int family, unsigned int report_opt)
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
- if (cfg_use_cmsg) {
+ if (cfg_use_cmsg || cfg_use_cmsg_opt_id) {
memset(control, 0, sizeof(control));
msg.msg_control = control;
- msg.msg_controllen = sizeof(control);
+ msg.msg_controllen = cfg_use_cmsg * CMSG_SPACE(sizeof(uint32_t));
+ msg.msg_controllen += cfg_use_cmsg_opt_id * CMSG_SPACE(sizeof(uint32_t));
- cmsg = CMSG_FIRSTHDR(&msg);
- cmsg->cmsg_level = SOL_SOCKET;
- cmsg->cmsg_type = SO_TIMESTAMPING;
- cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+ cmsg = NULL;
+ if (cfg_use_cmsg) {
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SO_TIMESTAMPING;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+
+ *((uint32_t *)CMSG_DATA(cmsg)) = report_opt;
+ }
+ if (cfg_use_cmsg_opt_id) {
+ cmsg = cmsg ? CMSG_NXTHDR(&msg, cmsg) : CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_TS_OPT_ID;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+
+ *((uint32_t *)CMSG_DATA(cmsg)) = ts_opt_id;
+ saved_tskey = ts_opt_id;
+ }
- *((uint32_t *) CMSG_DATA(cmsg)) = report_opt;
}
val = sendmsg(fd, &msg, 0);
@@ -681,6 +702,7 @@ static void __attribute__((noreturn)) usage(const char *filepath)
" -L listen on hostname and port\n"
" -n: set no-payload option\n"
" -N: print timestamps and durations in nsec (instead of usec)\n"
+ " -o N: use SCM_TS_OPT_ID control message to provide N as tskey\n"
" -p N: connect to port N\n"
" -P: use PF_PACKET\n"
" -r: use raw\n"
@@ -701,7 +723,7 @@ static void parse_opt(int argc, char **argv)
int c;
while ((c = getopt(argc, argv,
- "46bc:CeEFhIl:LnNp:PrRS:t:uv:V:x")) != -1) {
+ "46bc:CeEFhIl:LnNo:p:PrRS:t:uv:V:x")) != -1) {
switch (c) {
case '4':
do_ipv6 = 0;
@@ -742,6 +764,10 @@ static void parse_opt(int argc, char **argv)
case 'N':
cfg_print_nsec = true;
break;
+ case 'o':
+ ts_opt_id = strtoul(optarg, NULL, 10);
+ cfg_use_cmsg_opt_id = true;
+ break;
case 'p':
dest_port = strtoul(optarg, NULL, 10);
break;
@@ -799,6 +825,8 @@ static void parse_opt(int argc, char **argv)
error(1, 0, "cannot ask for pktinfo over pf_packet");
if (cfg_busy_poll && cfg_use_epoll)
error(1, 0, "pass epoll or busy_poll, not both");
+ if (cfg_proto != SOCK_DGRAM && cfg_use_cmsg_opt_id)
+ error(1, 0, "control message TS_OPT_ID can only be used with udp socket");
if (optind != argc - 1)
error(1, 0, "missing required hostname argument");
diff --git a/tools/testing/selftests/net/txtimestamp.sh b/tools/testing/selftests/net/txtimestamp.sh
index 25baca4b148e..6cc2b1b480e0 100755
--- a/tools/testing/selftests/net/txtimestamp.sh
+++ b/tools/testing/selftests/net/txtimestamp.sh
@@ -37,11 +37,13 @@ run_test_v4v6() {
run_test_tcpudpraw() {
local -r args=$@
- run_test_v4v6 ${args} # tcp
- run_test_v4v6 ${args} -u # udp
- run_test_v4v6 ${args} -r # raw
- run_test_v4v6 ${args} -R # raw (IPPROTO_RAW)
- run_test_v4v6 ${args} -P # pf_packet
+ run_test_v4v6 ${args} # tcp
+ run_test_v4v6 ${args} -u # udp
+ run_test_v4v6 ${args} -u -o 5 # udp with fixed tskey
+ run_test_v4v6 ${args} -u -o 5 -C # udp with fixed tskey and cmsg control
+ run_test_v4v6 ${args} -r # raw
+ run_test_v4v6 ${args} -R # raw (IPPROTO_RAW)
+ run_test_v4v6 ${args} -P # pf_packet
}
run_test_all() {
--
2.43.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
` (3 preceding siblings ...)
2024-09-04 11:31 ` [PATCH net-next v3 4/4] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
@ 2024-09-04 11:36 ` Vadim Fedorenko
2024-09-04 21:04 ` Willem de Bruijn
4 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 11:36 UTC (permalink / raw)
To: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
Jason Xing, Simon Horman
Cc: netdev
On 04/09/2024 12:31, Vadim Fedorenko wrote:
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg.
>
> The first patch in the series adds all needed definitions and implements
> the function for UDP sockets. The explicit check of socket's type is not
> added because subsequent patches in the series will add support for other
> types of sockets. The documentation is also included into the first
> patch.
>
> Patch 2/4 adds support for TCP sockets. This part is simple and straight
> forward.
>
> Patch 3/4 adds support for RAW sockets. It's a bit tricky because
> sock_tx_timestamp functions has to be refactored to receive full socket
> cookie information to fill in ID. The commit b534dc46c8ae ("net_tstamp:
> add SOF_TIMESTAMPING_OPT_ID_TCP") did the conversion of sk_tsflags to
> u32 but sock_tx_timestamp functions were not converted and still receive
> 16b flags. It wasn't a problem because SOF_TIMESTAMPING_OPT_ID_TCP was
> not checked in these functions, that's why no backporting is needed.
>
> Patch 4/4 adds selftests for new feature.
>
> Changelog:
> v2 -> v3:
> - remove SOF_TIMESTAMPING_OPT_ID_CMSG UAPI value and use kernel-internal
> SOCKCM_FLAG_TS_OPT_ID which uses the highest bit of tsflags.
> - add support for TCP and RAW sockets
> v1 -> v2:
> - add more selftests
> - add documentation for the feature
> - refactor UDP send function
> RFC -> v1:
> - add selftests
> - add SOF_TIMESTAMPING_OPT_ID_CMSG to signal of custom ID provided by
> user-space instead of reserving value of 0 for this.
>
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>
> Vadim Fedorenko (4):
> net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
> net_tstamp: add SCM_TS_OPT_ID for TCP sockets
> net_tstamp: add SCM_TS_OPT_ID for RAW sockets
> selftests: txtimestamp: add SCM_TS_OPT_ID test
>
> Documentation/networking/timestamping.rst | 13 ++++++
> arch/alpha/include/uapi/asm/socket.h | 2 +
> arch/mips/include/uapi/asm/socket.h | 2 +
> arch/parisc/include/uapi/asm/socket.h | 2 +
> arch/sparc/include/uapi/asm/socket.h | 2 +
> include/net/inet_sock.h | 4 +-
> include/net/sock.h | 29 +++++++++----
> include/uapi/asm-generic/socket.h | 2 +
> include/uapi/linux/net_tstamp.h | 7 ++++
> net/can/raw.c | 2 +-
> net/core/sock.c | 9 ++++
> net/ipv4/ip_output.c | 20 ++++++---
> net/ipv4/raw.c | 2 +-
> net/ipv4/tcp.c | 15 ++++---
> net/ipv6/ip6_output.c | 20 ++++++---
> net/ipv6/raw.c | 2 +-
> net/packet/af_packet.c | 6 +--
> net/socket.c | 2 +-
> tools/include/uapi/asm-generic/socket.h | 2 +
> tools/testing/selftests/net/txtimestamp.c | 48 +++++++++++++++++-----
> tools/testing/selftests/net/txtimestamp.sh | 12 +++---
> 21 files changed, 154 insertions(+), 49 deletions(-)
>
Oh, sorry for the mess, patches:
[PATCH v3 2/3] selftests: txtimestamp: add SCM_TS_OPT_ID test
[PATCH v3 3/3] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
should be ignored.
If it too messy I can resend the series.
Sorry again,
Vadim
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
@ 2024-09-04 13:56 ` Jason Xing
2024-09-04 14:01 ` Vadim Fedorenko
2024-09-04 20:54 ` Willem de Bruijn
` (3 subsequent siblings)
4 siblings, 1 reply; 31+ messages in thread
From: Jason Xing @ 2024-09-04 13:56 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Simon Horman, netdev
Hello Vadim,
On Wed, Sep 4, 2024 at 7:32 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg for UDP sockets.
> The documentation is also added in this patch.
>
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
> Documentation/networking/timestamping.rst | 13 +++++++++++++
> arch/alpha/include/uapi/asm/socket.h | 2 ++
> arch/mips/include/uapi/asm/socket.h | 2 ++
> arch/parisc/include/uapi/asm/socket.h | 2 ++
> arch/sparc/include/uapi/asm/socket.h | 2 ++
> include/net/inet_sock.h | 4 +++-
> include/net/sock.h | 2 ++
> include/uapi/asm-generic/socket.h | 2 ++
> include/uapi/linux/net_tstamp.h | 7 +++++++
> net/core/sock.c | 9 +++++++++
> net/ipv4/ip_output.c | 18 +++++++++++++-----
> net/ipv6/ip6_output.c | 18 +++++++++++++-----
> 12 files changed, 70 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index 5e93cd71f99f..e365526d6bf9 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -193,6 +193,19 @@ SOF_TIMESTAMPING_OPT_ID:
> among all possibly concurrently outstanding timestamp requests for
> that socket.
>
> + The process can optionally override the default generated ID, by
> + passing a specific ID with control message SCM_TS_OPT_ID::
> +
> + struct msghdr *msg;
> + ...
> + cmsg = CMSG_FIRSTHDR(msg);
> + cmsg->cmsg_level = SOL_SOCKET;
> + cmsg->cmsg_type = SCM_TS_OPT_ID;
> + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
> + *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> + err = sendmsg(fd, msg, 0);
> +
> +
> SOF_TIMESTAMPING_OPT_ID_TCP:
> Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index e94f621903fe..99dec81e7c84 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -140,6 +140,8 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SCM_TS_OPT_ID 78
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 60ebaed28a4c..bb3dc8feb205 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -151,6 +151,8 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SCM_TS_OPT_ID 78
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index be264c2b1a11..c3ab3b3289eb 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -132,6 +132,8 @@
> #define SO_PASSPIDFD 0x404A
> #define SO_PEERPIDFD 0x404B
>
> +#define SCM_TS_OPT_ID 0x404C
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 682da3714686..9b40f0a57fbc 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -133,6 +133,8 @@
> #define SO_PASSPIDFD 0x0055
> #define SO_PEERPIDFD 0x0056
>
> +#define SCM_TS_OPT_ID 0x0057
> +
> #if !defined(__KERNEL__)
>
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 394c3b66065e..f01dd273bea6 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -174,6 +174,7 @@ struct inet_cork {
> __s16 tos;
> char priority;
> __u16 gso_size;
> + u32 ts_opt_id;
> u64 transmit_time;
> u32 mark;
> };
> @@ -241,7 +242,8 @@ struct inet_sock {
> struct inet_cork_full cork;
> };
>
> -#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
> +#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
> +#define IPCORK_TS_OPT_ID 2 /* ts_opt_id field is valid, overriding sk_tskey */
>
> enum {
> INET_FLAGS_PKTINFO = 0,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f51d61fab059..c6554ad82961 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -952,6 +952,7 @@ enum sock_flags {
> };
>
> #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
> +#define SOCKCM_FLAG_TS_OPT_ID BIT(31)
>
> static inline void sock_copy_flags(struct sock *nsk, const struct sock *osk)
> {
> @@ -1794,6 +1795,7 @@ struct sockcm_cookie {
> u64 transmit_time;
> u32 mark;
> u32 tsflags;
> + u32 ts_opt_id;
> };
>
> static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 8ce8a39a1e5f..db3df3e74b01 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -135,6 +135,8 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SCM_TS_OPT_ID 78
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..1c38536350e7 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -38,6 +38,13 @@ enum {
> SOF_TIMESTAMPING_LAST
> };
>
> +/*
> + * The highest bit of sk_tsflags is reserved for kernel-internal
> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
> + * values do not reach this reserved area
> + */
> +static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
I saw some error occur in the patchwork:
./usr/include/linux/net_tstamp.h:46:36: error: expected ‘)’ before ‘!=’ token
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^~~
| )
make[5]: *** [../usr/include/Makefile:85:
usr/include/linux/net_tstamp.hdrtest] Error 1
make[4]: *** [../scripts/Makefile.build:485: usr/include] Error 2
make[3]: *** [../scripts/Makefile.build:485: usr] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/home/nipa/net-next/wt-1/Makefile:1925: .] Error 2
make[1]: *** [/home/nipa/net-next/wt-1/Makefile:224: __sub-make] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Please see the link:
https://netdev.bots.linux.dev/static/nipa/886766/13790642/build_32bit/stderr
https://netdev.bots.linux.dev/static/nipa/886766/13790640/build_32bit/stderr
Thanks,
Jason
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 13:56 ` Jason Xing
@ 2024-09-04 14:01 ` Vadim Fedorenko
0 siblings, 0 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 14:01 UTC (permalink / raw)
To: Jason Xing
Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
Simon Horman, netdev
On 04/09/2024 14:56, Jason Xing wrote:
> Hello Vadim,
>
> On Wed, Sep 4, 2024 at 7:32 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>
>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>> timestamps and packets sent via socket. Unfortunately, there is no way
>> to reliably predict socket timestamp ID value in case of error returned
>> by sendmsg. For UDP sockets it's impossible because of lockless
>> nature of UDP transmit, several threads may send packets in parallel. In
>> case of RAW sockets MSG_MORE option makes things complicated. More
>> details are in the conversation [1].
>> This patch adds new control message type to give user-space
>> software an opportunity to control the mapping between packets and
>> values by providing ID with each sendmsg for UDP sockets.
>> The documentation is also added in this patch.
>>
>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
>> Documentation/networking/timestamping.rst | 13 +++++++++++++
>> arch/alpha/include/uapi/asm/socket.h | 2 ++
>> arch/mips/include/uapi/asm/socket.h | 2 ++
>> arch/parisc/include/uapi/asm/socket.h | 2 ++
>> arch/sparc/include/uapi/asm/socket.h | 2 ++
>> include/net/inet_sock.h | 4 +++-
>> include/net/sock.h | 2 ++
>> include/uapi/asm-generic/socket.h | 2 ++
>> include/uapi/linux/net_tstamp.h | 7 +++++++
>> net/core/sock.c | 9 +++++++++
>> net/ipv4/ip_output.c | 18 +++++++++++++-----
>> net/ipv6/ip6_output.c | 18 +++++++++++++-----
>> 12 files changed, 70 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>> index 5e93cd71f99f..e365526d6bf9 100644
>> --- a/Documentation/networking/timestamping.rst
>> +++ b/Documentation/networking/timestamping.rst
>> @@ -193,6 +193,19 @@ SOF_TIMESTAMPING_OPT_ID:
>> among all possibly concurrently outstanding timestamp requests for
>> that socket.
>>
>> + The process can optionally override the default generated ID, by
>> + passing a specific ID with control message SCM_TS_OPT_ID::
>> +
>> + struct msghdr *msg;
>> + ...
>> + cmsg = CMSG_FIRSTHDR(msg);
>> + cmsg->cmsg_level = SOL_SOCKET;
>> + cmsg->cmsg_type = SCM_TS_OPT_ID;
>> + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
>> + *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>> + err = sendmsg(fd, msg, 0);
>> +
>> +
>> SOF_TIMESTAMPING_OPT_ID_TCP:
>> Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>> timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index e94f621903fe..99dec81e7c84 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -140,6 +140,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>> index 60ebaed28a4c..bb3dc8feb205 100644
>> --- a/arch/mips/include/uapi/asm/socket.h
>> +++ b/arch/mips/include/uapi/asm/socket.h
>> @@ -151,6 +151,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>> index be264c2b1a11..c3ab3b3289eb 100644
>> --- a/arch/parisc/include/uapi/asm/socket.h
>> +++ b/arch/parisc/include/uapi/asm/socket.h
>> @@ -132,6 +132,8 @@
>> #define SO_PASSPIDFD 0x404A
>> #define SO_PEERPIDFD 0x404B
>>
>> +#define SCM_TS_OPT_ID 0x404C
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>> index 682da3714686..9b40f0a57fbc 100644
>> --- a/arch/sparc/include/uapi/asm/socket.h
>> +++ b/arch/sparc/include/uapi/asm/socket.h
>> @@ -133,6 +133,8 @@
>> #define SO_PASSPIDFD 0x0055
>> #define SO_PEERPIDFD 0x0056
>>
>> +#define SCM_TS_OPT_ID 0x0057
>> +
>> #if !defined(__KERNEL__)
>>
>>
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index 394c3b66065e..f01dd273bea6 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -174,6 +174,7 @@ struct inet_cork {
>> __s16 tos;
>> char priority;
>> __u16 gso_size;
>> + u32 ts_opt_id;
>> u64 transmit_time;
>> u32 mark;
>> };
>> @@ -241,7 +242,8 @@ struct inet_sock {
>> struct inet_cork_full cork;
>> };
>>
>> -#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
>> +#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
>> +#define IPCORK_TS_OPT_ID 2 /* ts_opt_id field is valid, overriding sk_tskey */
>>
>> enum {
>> INET_FLAGS_PKTINFO = 0,
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index f51d61fab059..c6554ad82961 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -952,6 +952,7 @@ enum sock_flags {
>> };
>>
>> #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
>> +#define SOCKCM_FLAG_TS_OPT_ID BIT(31)
>>
>> static inline void sock_copy_flags(struct sock *nsk, const struct sock *osk)
>> {
>> @@ -1794,6 +1795,7 @@ struct sockcm_cookie {
>> u64 transmit_time;
>> u32 mark;
>> u32 tsflags;
>> + u32 ts_opt_id;
>> };
>>
>> static inline void sockcm_init(struct sockcm_cookie *sockc,
>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>> index 8ce8a39a1e5f..db3df3e74b01 100644
>> --- a/include/uapi/asm-generic/socket.h
>> +++ b/include/uapi/asm-generic/socket.h
>> @@ -135,6 +135,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index a2c66b3d7f0f..1c38536350e7 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -38,6 +38,13 @@ enum {
>> SOF_TIMESTAMPING_LAST
>> };
>>
>> +/*
>> + * The highest bit of sk_tsflags is reserved for kernel-internal
>> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
>> + * values do not reach this reserved area
>> + */
>> +static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
>
> I saw some error occur in the patchwork:
>
> ./usr/include/linux/net_tstamp.h:46:36: error: expected ‘)’ before ‘!=’ token
> 46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
> | ^~~
> | )
> make[5]: *** [../usr/include/Makefile:85:
> usr/include/linux/net_tstamp.hdrtest] Error 1
> make[4]: *** [../scripts/Makefile.build:485: usr/include] Error 2
> make[3]: *** [../scripts/Makefile.build:485: usr] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [/home/nipa/net-next/wt-1/Makefile:1925: .] Error 2
> make[1]: *** [/home/nipa/net-next/wt-1/Makefile:224: __sub-make] Error 2
> make: *** [Makefile:224: __sub-make] Error 2
>
> Please see the link:
> https://netdev.bots.linux.dev/static/nipa/886766/13790642/build_32bit/stderr
> https://netdev.bots.linux.dev/static/nipa/886766/13790640/build_32bit/stderr
>
> Thanks,
> Jason
Hmm, that's interesting.. Looks like some inconsistency in compilers.
I'll re-check it, thanks Jason.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-04 13:56 ` Jason Xing
@ 2024-09-04 20:54 ` Willem de Bruijn
2024-09-05 10:10 ` Vadim Fedorenko
2024-09-05 8:24 ` Jason Xing
` (2 subsequent siblings)
4 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-04 20:54 UTC (permalink / raw)
To: Vadim Fedorenko, Vadim Fedorenko, Willem de Bruijn,
Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing,
Simon Horman
Cc: Vadim Fedorenko, netdev
Vadim Fedorenko wrote:
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg for UDP sockets.
> The documentation is also added in this patch.
>
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
> Documentation/networking/timestamping.rst | 13 +++++++++++++
> arch/alpha/include/uapi/asm/socket.h | 2 ++
> arch/mips/include/uapi/asm/socket.h | 2 ++
> arch/parisc/include/uapi/asm/socket.h | 2 ++
> arch/sparc/include/uapi/asm/socket.h | 2 ++
> include/net/inet_sock.h | 4 +++-
> include/net/sock.h | 2 ++
> include/uapi/asm-generic/socket.h | 2 ++
> include/uapi/linux/net_tstamp.h | 7 +++++++
> net/core/sock.c | 9 +++++++++
> net/ipv4/ip_output.c | 18 +++++++++++++-----
> net/ipv6/ip6_output.c | 18 +++++++++++++-----
> 12 files changed, 70 insertions(+), 11 deletions(-)
>
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..1c38536350e7 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -38,6 +38,13 @@ enum {
> SOF_TIMESTAMPING_LAST
> };
>
> +/*
> + * The highest bit of sk_tsflags is reserved for kernel-internal
> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
> + * values do not reach this reserved area
> + */
> +static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
Let's not leak any if this implementation detail to include/uapi.
A BUILD_BUG_ON wherever SOCKCM_FLAG_TS_OPT_ID is used, such as in case
SCM_TS_OPT_ID, should work.
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index eea443b7f65e..bd2f6a699470 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -973,7 +973,7 @@ static int __ip_append_data(struct sock *sk,
> unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
> int csummode = CHECKSUM_NONE;
> struct rtable *rt = dst_rtable(cork->dst);
> - bool paged, hold_tskey, extra_uref = false;
> + bool paged, hold_tskey = false, extra_uref = false;
> unsigned int wmem_alloc_delta = 0;
> u32 tskey = 0;
>
> @@ -1049,10 +1049,15 @@ static int __ip_append_data(struct sock *sk,
>
> cork->length += length;
>
> - hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
> - READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
> - if (hold_tskey)
> - tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> + if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
> + READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
s/SOCKCM_FLAG_TS_OPT_ID/SOF_TIMESTAMPING_OPT_ID/
> + if (cork->flags & IPCORK_TS_OPT_ID) {
> + tskey = cork->ts_opt_id;
> + } else {
> + tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> + hold_tskey = true;
> + }
> + }
>
> /* So, what's going on in the loop below?
> *
> @@ -1325,8 +1330,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
> cork->mark = ipc->sockc.mark;
> cork->priority = ipc->priority;
> cork->transmit_time = ipc->sockc.transmit_time;
> + cork->ts_opt_id = ipc->sockc.ts_opt_id;
> cork->tx_flags = 0;
> sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
> + if (ipc->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
> + cork->flags |= IPCORK_TS_OPT_ID;
We can move initialization of ts_opt_id into the branch.
Or conversely avoid the branch with some convoluted shift operations
to have the rval be either 1 << 1 or 0 << 1. But let's do the simpler
thing.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg
2024-09-04 11:36 ` [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
@ 2024-09-04 21:04 ` Willem de Bruijn
2024-09-04 22:50 ` Vadim Fedorenko
0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-04 21:04 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: netdev
> >
> > Vadim Fedorenko (4):
> > net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
> > net_tstamp: add SCM_TS_OPT_ID for TCP sockets
> > net_tstamp: add SCM_TS_OPT_ID for RAW sockets
> > selftests: txtimestamp: add SCM_TS_OPT_ID test
> >
> Oh, sorry for the mess, patches:
>
> [PATCH v3 2/3] selftests: txtimestamp: add SCM_TS_OPT_ID test
> [PATCH v3 3/3] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
>
> should be ignored.
>
> If it too messy I can resend the series.
The series looks good to me. The four patches listed in the cover
letter summary.
Overall looks great.
Perhaps the test can now also test TCP and RAW with a fixed OPT_ID.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg
2024-09-04 21:04 ` Willem de Bruijn
@ 2024-09-04 22:50 ` Vadim Fedorenko
0 siblings, 0 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-04 22:50 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Jason Xing, Simon Horman
Cc: netdev
On 04/09/2024 22:04, Willem de Bruijn wrote:
>>>
>>> Vadim Fedorenko (4):
>>> net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
>>> net_tstamp: add SCM_TS_OPT_ID for TCP sockets
>>> net_tstamp: add SCM_TS_OPT_ID for RAW sockets
>>> selftests: txtimestamp: add SCM_TS_OPT_ID test
>>>
>> Oh, sorry for the mess, patches:
>>
>> [PATCH v3 2/3] selftests: txtimestamp: add SCM_TS_OPT_ID test
>> [PATCH v3 3/3] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
>>
>> should be ignored.
>>
>> If it too messy I can resend the series.
>
> The series looks good to me. The four patches listed in the cover
> letter summary.
>
> Overall looks great.
>
> Perhaps the test can now also test TCP and RAW with a fixed OPT_ID.
Sure, I'll add these tests. As well as changes to address comments for
1/4 patch.
Thanks,
Vadim
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-04 13:56 ` Jason Xing
2024-09-04 20:54 ` Willem de Bruijn
@ 2024-09-05 8:24 ` Jason Xing
2024-09-05 8:34 ` Vadim Fedorenko
2024-09-06 9:22 ` kernel test robot
2024-09-06 12:58 ` kernel test robot
4 siblings, 1 reply; 31+ messages in thread
From: Jason Xing @ 2024-09-05 8:24 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Simon Horman, netdev
Hello Vadim,
On Wed, Sep 4, 2024 at 7:32 PM Vadim Fedorenko <vadfed@meta.com> wrote:
[...]
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..1c38536350e7 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -38,6 +38,13 @@ enum {
> SOF_TIMESTAMPING_LAST
> };
>
> +/*
> + * The highest bit of sk_tsflags is reserved for kernel-internal
> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
> + * values do not reach this reserved area
I wonder if we can add the above description which is quite useful in
enum{} like this:
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index a2c66b3d7f0f..2314fccaf51d 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -13,7 +13,12 @@
#include <linux/types.h>
#include <linux/socket.h> /* for SO_TIMESTAMPING */
-/* SO_TIMESTAMPING flags */
+/* SO_TIMESTAMPING flags
+ *
+ * The highest bit of sk_tsflags is reserved for kernel-internal
+ * SOCKCM_FLAG_TS_OPT_ID.
+ * SOCKCM_FLAG_TS_OPT_ID = (1 << 31),
+ */
enum {
SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
to explicitly remind the developers not to touch 1<<31 field. Or else,
it can be very hard to trace who occupied the highest field in the
future at the first glance, I think.
[...]
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index f26841f1490f..9b87d23314e8 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1401,7 +1401,10 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
> cork->base.gso_size = ipc6->gso_size;
> cork->base.tx_flags = 0;
> cork->base.mark = ipc6->sockc.mark;
> + cork->base.ts_opt_id = ipc6->sockc.ts_opt_id;
> sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
> + if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
> + cork->base.flags |= IPCORK_TS_OPT_ID;
>
> cork->base.length = 0;
> cork->base.transmit_time = ipc6->sockc.transmit_time;
> @@ -1433,7 +1436,7 @@ static int __ip6_append_data(struct sock *sk,
> bool zc = false;
> u32 tskey = 0;
> struct rt6_info *rt = dst_rt6_info(cork->dst);
> - bool paged, hold_tskey, extra_uref = false;
> + bool paged, hold_tskey = false, extra_uref = false;
> struct ipv6_txoptions *opt = v6_cork->opt;
> int csummode = CHECKSUM_NONE;
> unsigned int maxnonfragsize, headersize;
> @@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
> flags &= ~MSG_SPLICE_PAGES;
> }
>
> - hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
> - READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
> - if (hold_tskey)
> - tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> + if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
> + READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
s/SOCKCM_FLAG_TS_OPT_ID/SOF_TIMESTAMPING_OPT_ID/
In case you forget to change here :)
> + if (cork->flags & IPCORK_TS_OPT_ID) {
> + tskey = cork->ts_opt_id;
> + } else {
> + tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> + hold_tskey = true;
> + }
> + }
>
> /*
> * Let's try using as much space as possible.
> --
> 2.43.5
>
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-05 8:24 ` Jason Xing
@ 2024-09-05 8:34 ` Vadim Fedorenko
2024-09-05 8:49 ` Jason Xing
0 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-05 8:34 UTC (permalink / raw)
To: Jason Xing, Vadim Fedorenko
Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
Simon Horman, netdev
On 05/09/2024 09:24, Jason Xing wrote:
> Hello Vadim,
>
> On Wed, Sep 4, 2024 at 7:32 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> [...]
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index a2c66b3d7f0f..1c38536350e7 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -38,6 +38,13 @@ enum {
>> SOF_TIMESTAMPING_LAST
>> };
>>
>> +/*
>> + * The highest bit of sk_tsflags is reserved for kernel-internal
>> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
>> + * values do not reach this reserved area
>
> I wonder if we can add the above description which is quite useful in
> enum{} like this:
>
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..2314fccaf51d 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -13,7 +13,12 @@
> #include <linux/types.h>
> #include <linux/socket.h> /* for SO_TIMESTAMPING */
>
> -/* SO_TIMESTAMPING flags */
> +/* SO_TIMESTAMPING flags
> + *
> + * The highest bit of sk_tsflags is reserved for kernel-internal
> + * SOCKCM_FLAG_TS_OPT_ID.
> + * SOCKCM_FLAG_TS_OPT_ID = (1 << 31),
> + */
> enum {
> SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
> SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
>
> to explicitly remind the developers not to touch 1<<31 field. Or else,
> it can be very hard to trace who occupied the highest field in the
> future at the first glance, I think.
>
> [...]
That's a bit contradictory to Willem's comment about not exposing
implementation details to UAPI headers, which I think makes sense.
I will move the comment to the definition area of SOCKCM_FLAG_TS_OPT_ID
and will try to add meaningful message to BUILD_BUG_ON() to make it
easier for developers to understand the problem.
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index f26841f1490f..9b87d23314e8 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1401,7 +1401,10 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>> cork->base.gso_size = ipc6->gso_size;
>> cork->base.tx_flags = 0;
>> cork->base.mark = ipc6->sockc.mark;
>> + cork->base.ts_opt_id = ipc6->sockc.ts_opt_id;
>> sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
>> + if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
>> + cork->base.flags |= IPCORK_TS_OPT_ID;
>>
>> cork->base.length = 0;
>> cork->base.transmit_time = ipc6->sockc.transmit_time;
>> @@ -1433,7 +1436,7 @@ static int __ip6_append_data(struct sock *sk,
>> bool zc = false;
>> u32 tskey = 0;
>> struct rt6_info *rt = dst_rt6_info(cork->dst);
>> - bool paged, hold_tskey, extra_uref = false;
>> + bool paged, hold_tskey = false, extra_uref = false;
>> struct ipv6_txoptions *opt = v6_cork->opt;
>> int csummode = CHECKSUM_NONE;
>> unsigned int maxnonfragsize, headersize;
>> @@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
>> flags &= ~MSG_SPLICE_PAGES;
>> }
>>
>> - hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> - READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
>> - if (hold_tskey)
>> - tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> + if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> + READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
>
> s/SOCKCM_FLAG_TS_OPT_ID/SOF_TIMESTAMPING_OPT_ID/
> In case you forget to change here :)
Yeah, I've fixed this one already, but thanks!
>
>> + if (cork->flags & IPCORK_TS_OPT_ID) {
>> + tskey = cork->ts_opt_id;
>> + } else {
>> + tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> + hold_tskey = true;
>> + }
>> + }
>>
>> /*
>> * Let's try using as much space as possible.
>> --
>> 2.43.5
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-05 8:34 ` Vadim Fedorenko
@ 2024-09-05 8:49 ` Jason Xing
0 siblings, 0 replies; 31+ messages in thread
From: Jason Xing @ 2024-09-05 8:49 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
David Ahern, Simon Horman, netdev
On Thu, Sep 5, 2024 at 4:34 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 05/09/2024 09:24, Jason Xing wrote:
> > Hello Vadim,
> >
> > On Wed, Sep 4, 2024 at 7:32 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> > [...]
> >> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> >> index a2c66b3d7f0f..1c38536350e7 100644
> >> --- a/include/uapi/linux/net_tstamp.h
> >> +++ b/include/uapi/linux/net_tstamp.h
> >> @@ -38,6 +38,13 @@ enum {
> >> SOF_TIMESTAMPING_LAST
> >> };
> >>
> >> +/*
> >> + * The highest bit of sk_tsflags is reserved for kernel-internal
> >> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
> >> + * values do not reach this reserved area
> >
> > I wonder if we can add the above description which is quite useful in
> > enum{} like this:
> >
> > diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> > index a2c66b3d7f0f..2314fccaf51d 100644
> > --- a/include/uapi/linux/net_tstamp.h
> > +++ b/include/uapi/linux/net_tstamp.h
> > @@ -13,7 +13,12 @@
> > #include <linux/types.h>
> > #include <linux/socket.h> /* for SO_TIMESTAMPING */
> >
> > -/* SO_TIMESTAMPING flags */
> > +/* SO_TIMESTAMPING flags
> > + *
> > + * The highest bit of sk_tsflags is reserved for kernel-internal
> > + * SOCKCM_FLAG_TS_OPT_ID.
> > + * SOCKCM_FLAG_TS_OPT_ID = (1 << 31),
> > + */
> > enum {
> > SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
> > SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
> >
> > to explicitly remind the developers not to touch 1<<31 field. Or else,
> > it can be very hard to trace who occupied the highest field in the
> > future at the first glance, I think.
> >
> > [...]
>
> That's a bit contradictory to Willem's comment about not exposing
> implementation details to UAPI headers, which I think makes sense.
Oh, well, I missed checking the filename... it's an uapi file, sorry
for the noise :(
>
> I will move the comment to the definition area of SOCKCM_FLAG_TS_OPT_ID
> and will try to add meaningful message to BUILD_BUG_ON() to make it
> easier for developers to understand the problem.
Great! Thanks!
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 20:54 ` Willem de Bruijn
@ 2024-09-05 10:10 ` Vadim Fedorenko
2024-09-05 13:29 ` Willem de Bruijn
0 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-05 10:10 UTC (permalink / raw)
To: Willem de Bruijn, Vadim Fedorenko, Willem de Bruijn,
Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing,
Simon Horman
Cc: netdev
On 04/09/2024 21:54, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>> timestamps and packets sent via socket. Unfortunately, there is no way
>> to reliably predict socket timestamp ID value in case of error returned
>> by sendmsg. For UDP sockets it's impossible because of lockless
>> nature of UDP transmit, several threads may send packets in parallel. In
>> case of RAW sockets MSG_MORE option makes things complicated. More
>> details are in the conversation [1].
>> This patch adds new control message type to give user-space
>> software an opportunity to control the mapping between packets and
>> values by providing ID with each sendmsg for UDP sockets.
>> The documentation is also added in this patch.
>>
>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
>> Documentation/networking/timestamping.rst | 13 +++++++++++++
>> arch/alpha/include/uapi/asm/socket.h | 2 ++
>> arch/mips/include/uapi/asm/socket.h | 2 ++
>> arch/parisc/include/uapi/asm/socket.h | 2 ++
>> arch/sparc/include/uapi/asm/socket.h | 2 ++
>> include/net/inet_sock.h | 4 +++-
>> include/net/sock.h | 2 ++
>> include/uapi/asm-generic/socket.h | 2 ++
>> include/uapi/linux/net_tstamp.h | 7 +++++++
>> net/core/sock.c | 9 +++++++++
>> net/ipv4/ip_output.c | 18 +++++++++++++-----
>> net/ipv6/ip6_output.c | 18 +++++++++++++-----
>> 12 files changed, 70 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index a2c66b3d7f0f..1c38536350e7 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -38,6 +38,13 @@ enum {
>> SOF_TIMESTAMPING_LAST
>> };
>>
>> +/*
>> + * The highest bit of sk_tsflags is reserved for kernel-internal
>> + * SOCKCM_FLAG_TS_OPT_ID. This check is to control that SOF_TIMESTAMPING*
>> + * values do not reach this reserved area
>> + */
>> +static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
>
> Let's not leak any if this implementation detail to include/uapi.
>
> A BUILD_BUG_ON wherever SOCKCM_FLAG_TS_OPT_ID is used, such as in case
> SCM_TS_OPT_ID, should work.
Makes sense. I'll change the check and will try to add meaningful message.
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index eea443b7f65e..bd2f6a699470 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -973,7 +973,7 @@ static int __ip_append_data(struct sock *sk,
>> unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
>> int csummode = CHECKSUM_NONE;
>> struct rtable *rt = dst_rtable(cork->dst);
>> - bool paged, hold_tskey, extra_uref = false;
>> + bool paged, hold_tskey = false, extra_uref = false;
>> unsigned int wmem_alloc_delta = 0;
>> u32 tskey = 0;
>>
>> @@ -1049,10 +1049,15 @@ static int __ip_append_data(struct sock *sk,
>>
>> cork->length += length;
>>
>> - hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> - READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
>> - if (hold_tskey)
>> - tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> + if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> + READ_ONCE(sk->sk_tsflags) & SOCKCM_FLAG_TS_OPT_ID) {
>
> s/SOCKCM_FLAG_TS_OPT_ID/SOF_TIMESTAMPING_OPT_ID/
Ack
>> + if (cork->flags & IPCORK_TS_OPT_ID) {
>> + tskey = cork->ts_opt_id;
>> + } else {
>> + tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> + hold_tskey = true;
>> + }
>> + }
>>
>> /* So, what's going on in the loop below?
>> *
>> @@ -1325,8 +1330,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
>> cork->mark = ipc->sockc.mark;
>> cork->priority = ipc->priority;
>> cork->transmit_time = ipc->sockc.transmit_time;
>> + cork->ts_opt_id = ipc->sockc.ts_opt_id;
>> cork->tx_flags = 0;
>> sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
>> + if (ipc->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
>> + cork->flags |= IPCORK_TS_OPT_ID;
>
> We can move initialization of ts_opt_id into the branch.
>
> Or conversely avoid the branch with some convoluted shift operations
> to have the rval be either 1 << 1 or 0 << 1. But let's do the simpler
> thing.
What is the reason to move initialization behind the flag? We are not
doing this for transmit_time even though it's also used with flag only.
It's not a big deal to change, I just wonder what are the benefits?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-05 10:10 ` Vadim Fedorenko
@ 2024-09-05 13:29 ` Willem de Bruijn
0 siblings, 0 replies; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-05 13:29 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Vadim Fedorenko,
Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
Jason Xing, Simon Horman
Cc: netdev
Vadim Fedorenko wrote:
> On 04/09/2024 21:54, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> >> timestamps and packets sent via socket. Unfortunately, there is no way
> >> to reliably predict socket timestamp ID value in case of error returned
> >> by sendmsg. For UDP sockets it's impossible because of lockless
> >> nature of UDP transmit, several threads may send packets in parallel. In
> >> case of RAW sockets MSG_MORE option makes things complicated. More
> >> details are in the conversation [1].
> >> This patch adds new control message type to give user-space
> >> software an opportunity to control the mapping between packets and
> >> values by providing ID with each sendmsg for UDP sockets.
> >> The documentation is also added in this patch.
> >>
> >> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >>
> >> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >> @@ -1325,8 +1330,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
> >> cork->mark = ipc->sockc.mark;
> >> cork->priority = ipc->priority;
> >> cork->transmit_time = ipc->sockc.transmit_time;
> >> + cork->ts_opt_id = ipc->sockc.ts_opt_id;
> >> cork->tx_flags = 0;
> >> sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
> >> + if (ipc->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >> + cork->flags |= IPCORK_TS_OPT_ID;
> >
> > We can move initialization of ts_opt_id into the branch.
> >
> > Or conversely avoid the branch with some convoluted shift operations
> > to have the rval be either 1 << 1 or 0 << 1. But let's do the simpler
> > thing.
>
> What is the reason to move initialization behind the flag? We are not
> doing this for transmit_time even though it's also used with flag only.
>
> It's not a big deal to change, I just wonder what are the benefits?
Just avoid one assignment in the hot path that does not use this
feature.
cork->ts_opt_id is only valuid if cork->flags & IPCORK_TS_OPT_ID.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-04 11:31 ` [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets Vadim Fedorenko
@ 2024-09-05 14:58 ` Vadim Fedorenko
2024-09-05 15:37 ` Jason Xing
2024-09-05 16:39 ` Willem de Bruijn
0 siblings, 2 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-05 14:58 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 04/09/2024 12:31, Vadim Fedorenko wrote:
> TCP sockets have different flow for providing timestamp OPT_ID value.
> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
> net/ipv4/tcp.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8a5680b4e786..5553a8aeee80 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> }
> EXPORT_SYMBOL(tcp_init_sock);
>
> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> {
> struct sk_buff *skb = tcp_write_queue_tail(sk);
> + u32 tsflags = sockc->tsflags;
>
> if (tsflags && skb) {
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> tcb->txstamp_ack = 1;
> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> + shinfo->tskey = sockc->ts_opt_id;
> + else
> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> + }
> }
> }
>
> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
> out:
> if (copied) {
> - tcp_tx_timestamp(sk, sockc.tsflags);
> + tcp_tx_timestamp(sk, &sockc);
> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> }
> out_nopush:
Hi Willem,
Unfortunately, these changes are not enough to enable custom OPT_ID for
TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
flow:
tcp_skb_collapse_tstamp()
tcp_fragment_tstamp()
tcp_gso_tstamp()
I believe the last one breaks tests, but the problem is that there is no
easy way to provide the flag of constant tskey to it. Only
shinfo::tx_flags are available at the caller side and we have already
discussed that we shouldn't use the last bit of this field.
So, how should we deal with the problem? Or is it better to postpone
support for TCP sockets in this case?
Thanks,
Vadim
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-05 14:58 ` Vadim Fedorenko
@ 2024-09-05 15:37 ` Jason Xing
2024-09-05 16:39 ` Willem de Bruijn
1 sibling, 0 replies; 31+ messages in thread
From: Jason Xing @ 2024-09-05 15:37 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Willem de Bruijn, netdev, David Ahern, Jakub Kicinski,
Simon Horman, Paolo Abeni
Hello Vadim,
On Thu, Sep 5, 2024 at 10:58 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > TCP sockets have different flow for providing timestamp OPT_ID value.
> > Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >
> > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > ---
> > net/ipv4/tcp.c | 13 +++++++++----
> > 1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 8a5680b4e786..5553a8aeee80 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> > }
> > EXPORT_SYMBOL(tcp_init_sock);
> >
> > -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > {
> > struct sk_buff *skb = tcp_write_queue_tail(sk);
> > + u32 tsflags = sockc->tsflags;
> >
> > if (tsflags && skb) {
> > struct skb_shared_info *shinfo = skb_shinfo(skb);
> > @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> > if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> > tcb->txstamp_ack = 1;
> > - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > + shinfo->tskey = sockc->ts_opt_id;
> > + else
> > + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > + }
> > }
> > }
> >
> > @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >
> > out:
> > if (copied) {
> > - tcp_tx_timestamp(sk, sockc.tsflags);
> > + tcp_tx_timestamp(sk, &sockc);
> > tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> > }
> > out_nopush:
>
> Hi Willem,
>
> Unfortunately, these changes are not enough to enable custom OPT_ID for
> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> flow:
I was planning to test locally after your new version is posted... Now
I can see the good of a useful selftest from this :)
>
> tcp_skb_collapse_tstamp()
> tcp_fragment_tstamp()
> tcp_gso_tstamp()
>
> I believe the last one breaks tests, but the problem is that there is no
> easy way to provide the flag of constant tskey to it. Only
> shinfo::tx_flags are available at the caller side and we have already
> discussed that we shouldn't use the last bit of this field.
>
> So, how should we deal with the problem? Or is it better to postpone
> support for TCP sockets in this case?
I'm not Willem, but my intuition is to postpone it since this will
make the series much bigger than the previous expectation.
Let Willem decide finally :)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-05 14:58 ` Vadim Fedorenko
2024-09-05 15:37 ` Jason Xing
@ 2024-09-05 16:39 ` Willem de Bruijn
2024-09-06 12:20 ` Vadim Fedorenko
1 sibling, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-05 16:39 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Vadim Fedorenko wrote:
> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > TCP sockets have different flow for providing timestamp OPT_ID value.
> > Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >
> > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > ---
> > net/ipv4/tcp.c | 13 +++++++++----
> > 1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 8a5680b4e786..5553a8aeee80 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> > }
> > EXPORT_SYMBOL(tcp_init_sock);
> >
> > -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > {
> > struct sk_buff *skb = tcp_write_queue_tail(sk);
> > + u32 tsflags = sockc->tsflags;
> >
> > if (tsflags && skb) {
> > struct skb_shared_info *shinfo = skb_shinfo(skb);
> > @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> > if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> > tcb->txstamp_ack = 1;
> > - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > + shinfo->tskey = sockc->ts_opt_id;
> > + else
> > + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > + }
> > }
> > }
> >
> > @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >
> > out:
> > if (copied) {
> > - tcp_tx_timestamp(sk, sockc.tsflags);
> > + tcp_tx_timestamp(sk, &sockc);
> > tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> > }
> > out_nopush:
>
> Hi Willem,
>
> Unfortunately, these changes are not enough to enable custom OPT_ID for
> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> flow:
>
> tcp_skb_collapse_tstamp()
> tcp_fragment_tstamp()
> tcp_gso_tstamp()
>
> I believe the last one breaks tests, but the problem is that there is no
> easy way to provide the flag of constant tskey to it. Only
> shinfo::tx_flags are available at the caller side and we have already
> discussed that we shouldn't use the last bit of this field.
>
> So, how should we deal with the problem? Or is it better to postpone
> support for TCP sockets in this case?
Are you sure that this is a problem. These functions pass on the
skb_shinfo(skb)->ts_key from one skb to another.
As long as tcp_tx_timestamp sets the skb_shinfo(skb)->ts_key of the
original skb correctly, either from the cmsg or sk_tskey, then I don't
immediate see how this passing on from one skb to another would break
the intent.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
` (2 preceding siblings ...)
2024-09-05 8:24 ` Jason Xing
@ 2024-09-06 9:22 ` kernel test robot
2024-09-06 12:58 ` kernel test robot
4 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2024-09-06 9:22 UTC (permalink / raw)
To: Vadim Fedorenko, Vadim Fedorenko, Willem de Bruijn,
Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing,
Simon Horman
Cc: llvm, oe-kbuild-all, netdev
Hi Vadim,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/net_tstamp-add-SCM_TS_OPT_ID-to-provide-OPT_ID-in-control-message/20240904-193351
base: net-next/main
patch link: https://lore.kernel.org/r/20240904113153.2196238-2-vadfed%40meta.com
patch subject: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
config: i386-buildonly-randconfig-005-20240906 (https://download.01.org/0day-ci/archive/20240906/202409061658.ydeSewh7-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240906/202409061658.ydeSewh7-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409061658.ydeSewh7-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
In file included from <built-in>:1:
>> ./usr/include/linux/net_tstamp.h:46:15: error: unknown type name 'SOF_TIMESTAMPING_LAST'
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^
>> ./usr/include/linux/net_tstamp.h:46:37: error: expected ')'
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^
./usr/include/linux/net_tstamp.h:46:14: note: to match this '('
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^
>> ./usr/include/linux/net_tstamp.h:46:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^
| int
1 warning and 2 errors generated.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-05 16:39 ` Willem de Bruijn
@ 2024-09-06 12:20 ` Vadim Fedorenko
2024-09-06 15:25 ` Willem de Bruijn
0 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-06 12:20 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 05/09/2024 17:39, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>
>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>> ---
>>> net/ipv4/tcp.c | 13 +++++++++----
>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 8a5680b4e786..5553a8aeee80 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>> }
>>> EXPORT_SYMBOL(tcp_init_sock);
>>>
>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>> {
>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>> + u32 tsflags = sockc->tsflags;
>>>
>>> if (tsflags && skb) {
>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>> tcb->txstamp_ack = 1;
>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>> + shinfo->tskey = sockc->ts_opt_id;
>>> + else
>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>> + }
>>> }
>>> }
>>>
>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>
>>> out:
>>> if (copied) {
>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>> + tcp_tx_timestamp(sk, &sockc);
>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>> }
>>> out_nopush:
>>
>> Hi Willem,
>>
>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>> flow:
>>
>> tcp_skb_collapse_tstamp()
>> tcp_fragment_tstamp()
>> tcp_gso_tstamp()
>>
>> I believe the last one breaks tests, but the problem is that there is no
>> easy way to provide the flag of constant tskey to it. Only
>> shinfo::tx_flags are available at the caller side and we have already
>> discussed that we shouldn't use the last bit of this field.
>>
>> So, how should we deal with the problem? Or is it better to postpone
>> support for TCP sockets in this case?
>
> Are you sure that this is a problem. These functions pass on the
> skb_shinfo(skb)->ts_key from one skb to another.
Yes, you are right, the problem is in a different place.
__skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
value:
if (sk_is_tcp(sk))
serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
It happens because of assumption that for TCP sockets shinfo::tskey will
have sequence number and the logic has to recalculate it back to the
bytes sent. The same logic exists in all TCP TX timestamping functions
(mentioned in the previous mail) and may trigger some unexpected
behavior. To fix the issue we have to provide some kind of signal that
tskey value is provided from user-space and shouldn't be changed. And we
have to have it somewhere in skb info. Again, tx_flags looks like the
best candidate, but it's impossible to use. I'm thinking of using
special flag in tcp_skb_cb - gonna test more, but open for other
suggestions.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
` (3 preceding siblings ...)
2024-09-06 9:22 ` kernel test robot
@ 2024-09-06 12:58 ` kernel test robot
4 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2024-09-06 12:58 UTC (permalink / raw)
To: Vadim Fedorenko, Vadim Fedorenko, Willem de Bruijn,
Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing,
Simon Horman
Cc: oe-kbuild-all, netdev
Hi Vadim,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/net_tstamp-add-SCM_TS_OPT_ID-to-provide-OPT_ID-in-control-message/20240904-193351
base: net-next/main
patch link: https://lore.kernel.org/r/20240904113153.2196238-2-vadfed%40meta.com
patch subject: [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240906/202409062041.8g7uYSEJ-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240906/202409062041.8g7uYSEJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409062041.8g7uYSEJ-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <command-line>:
>> ./usr/include/linux/net_tstamp.h:46:36: error: expected ')' before '!=' token
46 | static_assert(SOF_TIMESTAMPING_LAST != (1 << 31));
| ^~~
| )
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 12:20 ` Vadim Fedorenko
@ 2024-09-06 15:25 ` Willem de Bruijn
2024-09-06 16:33 ` Willem de Bruijn
2024-09-06 17:27 ` Vadim Fedorenko
0 siblings, 2 replies; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-06 15:25 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Vadim Fedorenko wrote:
> On 05/09/2024 17:39, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>
> >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>> ---
> >>> net/ipv4/tcp.c | 13 +++++++++----
> >>> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>> index 8a5680b4e786..5553a8aeee80 100644
> >>> --- a/net/ipv4/tcp.c
> >>> +++ b/net/ipv4/tcp.c
> >>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>> }
> >>> EXPORT_SYMBOL(tcp_init_sock);
> >>>
> >>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>> {
> >>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>> + u32 tsflags = sockc->tsflags;
> >>>
> >>> if (tsflags && skb) {
> >>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>> tcb->txstamp_ack = 1;
> >>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>> + shinfo->tskey = sockc->ts_opt_id;
> >>> + else
> >>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>> + }
> >>> }
> >>> }
> >>>
> >>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>
> >>> out:
> >>> if (copied) {
> >>> - tcp_tx_timestamp(sk, sockc.tsflags);
> >>> + tcp_tx_timestamp(sk, &sockc);
> >>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>> }
> >>> out_nopush:
> >>
> >> Hi Willem,
> >>
> >> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >> flow:
> >>
> >> tcp_skb_collapse_tstamp()
> >> tcp_fragment_tstamp()
> >> tcp_gso_tstamp()
> >>
> >> I believe the last one breaks tests, but the problem is that there is no
> >> easy way to provide the flag of constant tskey to it. Only
> >> shinfo::tx_flags are available at the caller side and we have already
> >> discussed that we shouldn't use the last bit of this field.
> >>
> >> So, how should we deal with the problem? Or is it better to postpone
> >> support for TCP sockets in this case?
> >
> > Are you sure that this is a problem. These functions pass on the
> > skb_shinfo(skb)->ts_key from one skb to another.
>
> Yes, you are right, the problem is in a different place.
>
> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> value:
>
> if (sk_is_tcp(sk))
> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>
> It happens because of assumption that for TCP sockets shinfo::tskey will
> have sequence number and the logic has to recalculate it back to the
> bytes sent. The same logic exists in all TCP TX timestamping functions
> (mentioned in the previous mail) and may trigger some unexpected
> behavior. To fix the issue we have to provide some kind of signal that
> tskey value is provided from user-space and shouldn't be changed. And we
> have to have it somewhere in skb info. Again, tx_flags looks like the
> best candidate, but it's impossible to use. I'm thinking of using
> special flag in tcp_skb_cb - gonna test more, but open for other
> suggestions.
Ai, that is tricky. tx_flags is full/scarce indeed.
CB does not persist as the skb transitions between layers.
The most obvious solution would be to set the flag in sk_tsflags
itself. But then the cmsg would no long work on a per request basis:
either the socket uses OPT_ID with counter or OPT_ID_CMSG.
Good that we catch this now before the ABI is finalized.
If necessary TCP semantics can diverge from datagrams. So we could
punt on this. But it's not ideal.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 15:25 ` Willem de Bruijn
@ 2024-09-06 16:33 ` Willem de Bruijn
2024-09-06 17:27 ` Vadim Fedorenko
2024-09-06 17:27 ` Vadim Fedorenko
1 sibling, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-06 16:33 UTC (permalink / raw)
To: Willem de Bruijn, Vadim Fedorenko, Willem de Bruijn,
Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
> > On 05/09/2024 17:39, Willem de Bruijn wrote:
> > > Vadim Fedorenko wrote:
> > >> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> > >>> TCP sockets have different flow for providing timestamp OPT_ID value.
> > >>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> > >>>
> > >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > >>> ---
> > >>> net/ipv4/tcp.c | 13 +++++++++----
> > >>> 1 file changed, 9 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > >>> index 8a5680b4e786..5553a8aeee80 100644
> > >>> --- a/net/ipv4/tcp.c
> > >>> +++ b/net/ipv4/tcp.c
> > >>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> > >>> }
> > >>> EXPORT_SYMBOL(tcp_init_sock);
> > >>>
> > >>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > >>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> > >>> {
> > >>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> > >>> + u32 tsflags = sockc->tsflags;
> > >>>
> > >>> if (tsflags && skb) {
> > >>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> > >>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> > >>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> > >>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> > >>> tcb->txstamp_ack = 1;
> > >>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> > >>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> > >>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> > >>> + shinfo->tskey = sockc->ts_opt_id;
> > >>> + else
> > >>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >>> + }
> > >>> }
> > >>> }
> > >>>
> > >>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> > >>>
> > >>> out:
> > >>> if (copied) {
> > >>> - tcp_tx_timestamp(sk, sockc.tsflags);
> > >>> + tcp_tx_timestamp(sk, &sockc);
> > >>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> > >>> }
> > >>> out_nopush:
> > >>
> > >> Hi Willem,
> > >>
> > >> Unfortunately, these changes are not enough to enable custom OPT_ID for
> > >> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> > >> flow:
> > >>
> > >> tcp_skb_collapse_tstamp()
> > >> tcp_fragment_tstamp()
> > >> tcp_gso_tstamp()
> > >>
> > >> I believe the last one breaks tests, but the problem is that there is no
> > >> easy way to provide the flag of constant tskey to it. Only
> > >> shinfo::tx_flags are available at the caller side and we have already
> > >> discussed that we shouldn't use the last bit of this field.
> > >>
> > >> So, how should we deal with the problem? Or is it better to postpone
> > >> support for TCP sockets in this case?
> > >
> > > Are you sure that this is a problem. These functions pass on the
> > > skb_shinfo(skb)->ts_key from one skb to another.
> >
> > Yes, you are right, the problem is in a different place.
> >
> > __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> > provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> > value:
> >
> > if (sk_is_tcp(sk))
> > serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >
> > It happens because of assumption that for TCP sockets shinfo::tskey will
> > have sequence number and the logic has to recalculate it back to the
> > bytes sent. The same logic exists in all TCP TX timestamping functions
> > (mentioned in the previous mail) and may trigger some unexpected
> > behavior. To fix the issue we have to provide some kind of signal that
> > tskey value is provided from user-space and shouldn't be changed. And we
> > have to have it somewhere in skb info. Again, tx_flags looks like the
> > best candidate, but it's impossible to use. I'm thinking of using
> > special flag in tcp_skb_cb - gonna test more, but open for other
> > suggestions.
>
> Ai, that is tricky. tx_flags is full/scarce indeed.
>
> CB does not persist as the skb transitions between layers.
Though specifically for TCP, it is possible to look up the fast
clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
the tcb currently does not have this data either.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 15:25 ` Willem de Bruijn
2024-09-06 16:33 ` Willem de Bruijn
@ 2024-09-06 17:27 ` Vadim Fedorenko
2024-09-06 23:48 ` Willem de Bruijn
1 sibling, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-06 17:27 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 06/09/2024 16:25, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>
>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>> ---
>>>>> net/ipv4/tcp.c | 13 +++++++++----
>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>> --- a/net/ipv4/tcp.c
>>>>> +++ b/net/ipv4/tcp.c
>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>> }
>>>>> EXPORT_SYMBOL(tcp_init_sock);
>>>>>
>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>> {
>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>> + u32 tsflags = sockc->tsflags;
>>>>>
>>>>> if (tsflags && skb) {
>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>> tcb->txstamp_ack = 1;
>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>> + shinfo->tskey = sockc->ts_opt_id;
>>>>> + else
>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> + }
>>>>> }
>>>>> }
>>>>>
>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>
>>>>> out:
>>>>> if (copied) {
>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>>>> + tcp_tx_timestamp(sk, &sockc);
>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>> }
>>>>> out_nopush:
>>>>
>>>> Hi Willem,
>>>>
>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>> flow:
>>>>
>>>> tcp_skb_collapse_tstamp()
>>>> tcp_fragment_tstamp()
>>>> tcp_gso_tstamp()
>>>>
>>>> I believe the last one breaks tests, but the problem is that there is no
>>>> easy way to provide the flag of constant tskey to it. Only
>>>> shinfo::tx_flags are available at the caller side and we have already
>>>> discussed that we shouldn't use the last bit of this field.
>>>>
>>>> So, how should we deal with the problem? Or is it better to postpone
>>>> support for TCP sockets in this case?
>>>
>>> Are you sure that this is a problem. These functions pass on the
>>> skb_shinfo(skb)->ts_key from one skb to another.
>>
>> Yes, you are right, the problem is in a different place.
>>
>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>> value:
>>
>> if (sk_is_tcp(sk))
>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>
>> It happens because of assumption that for TCP sockets shinfo::tskey will
>> have sequence number and the logic has to recalculate it back to the
>> bytes sent. The same logic exists in all TCP TX timestamping functions
>> (mentioned in the previous mail) and may trigger some unexpected
>> behavior. To fix the issue we have to provide some kind of signal that
>> tskey value is provided from user-space and shouldn't be changed. And we
>> have to have it somewhere in skb info. Again, tx_flags looks like the
>> best candidate, but it's impossible to use. I'm thinking of using
>> special flag in tcp_skb_cb - gonna test more, but open for other
>> suggestions.
>
> Ai, that is tricky. tx_flags is full/scarce indeed.
>
> CB does not persist as the skb transitions between layers.
>
> The most obvious solution would be to set the flag in sk_tsflags
> itself. But then the cmsg would no long work on a per request basis:
> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>
> Good that we catch this now before the ABI is finalized.
>
> If necessary TCP semantics can diverge from datagrams. So we could
> punt on this. But it's not ideal.
I have done proof of concept code which uses hwtsamp as a storage for
custom tskey in case of TCP:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..40ed49e61bf7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
sk_buff *skb,
serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
serr->ee.ee_data = skb_shinfo(skb)->tskey;
- if (sk_is_tcp(sk))
- serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
}
err = sock_queue_err_skb(sk, skb);
@@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
* but only if the socket refcount is not zero.
*/
if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
+ if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
+ skb_shinfo(skb)->tskey =
(u32)skb_hwtstamps(skb)->hwtstamp;
*skb_hwtstamps(skb) = *hwtstamps;
__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
false);
sock_put(sk);
@@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
}
+ if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+ if (skb_hwtstamps(orig_skb)->hwtstamp)
+ skb_shinfo(skb)->tskey =
(u32)skb_hwtstamps(orig_skb)->hwtstamp;
+ else
+ skb_shinfo(skb)->tskey -=
atomic_read(&sk->sk_tskey);
+ }
if (hwtstamps)
*skb_hwtstamps(skb) = *hwtstamps;
else
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d3decc13a99..1a161a2155b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
sockcm_cookie *sockc)
tcb->txstamp_ack = 1;
if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
- shinfo->tskey = sockc->ts_opt_id;
- else
- shinfo->tskey = TCP_SKB_CB(skb)->seq +
skb->len - 1;
+ skb_hwtstamps(skb)->hwtstamp =
sockc->ts_opt_id;
+ shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
}
}
}
Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
it. netdev_data field is only used on RX timestamp side, so should be
fine to reuse. WDYT?
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 16:33 ` Willem de Bruijn
@ 2024-09-06 17:27 ` Vadim Fedorenko
2024-09-06 23:50 ` Willem de Bruijn
0 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-06 17:27 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 06/09/2024 17:33, Willem de Bruijn wrote:
> Willem de Bruijn wrote:
>> Vadim Fedorenko wrote:
>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>> Vadim Fedorenko wrote:
>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>
>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>> ---
>>>>>> net/ipv4/tcp.c | 13 +++++++++----
>>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>> --- a/net/ipv4/tcp.c
>>>>>> +++ b/net/ipv4/tcp.c
>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>> }
>>>>>> EXPORT_SYMBOL(tcp_init_sock);
>>>>>>
>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>> {
>>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>> + u32 tsflags = sockc->tsflags;
>>>>>>
>>>>>> if (tsflags && skb) {
>>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>> tcb->txstamp_ack = 1;
>>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>> + shinfo->tskey = sockc->ts_opt_id;
>>>>>> + else
>>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>> + }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>
>>>>>> out:
>>>>>> if (copied) {
>>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>> + tcp_tx_timestamp(sk, &sockc);
>>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>> }
>>>>>> out_nopush:
>>>>>
>>>>> Hi Willem,
>>>>>
>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>> flow:
>>>>>
>>>>> tcp_skb_collapse_tstamp()
>>>>> tcp_fragment_tstamp()
>>>>> tcp_gso_tstamp()
>>>>>
>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>
>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>> support for TCP sockets in this case?
>>>>
>>>> Are you sure that this is a problem. These functions pass on the
>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>
>>> Yes, you are right, the problem is in a different place.
>>>
>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>> value:
>>>
>>> if (sk_is_tcp(sk))
>>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>
>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>> have sequence number and the logic has to recalculate it back to the
>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>> (mentioned in the previous mail) and may trigger some unexpected
>>> behavior. To fix the issue we have to provide some kind of signal that
>>> tskey value is provided from user-space and shouldn't be changed. And we
>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>> best candidate, but it's impossible to use. I'm thinking of using
>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>> suggestions.
>>
>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>
>> CB does not persist as the skb transitions between layers.
>
> Though specifically for TCP, it is possible to look up the fast
> clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
> the tcb currently does not have this data either.
It will work fine for software timestamps, but we cannot do the same
trick in case of HW timestamps, right?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 17:27 ` Vadim Fedorenko
@ 2024-09-06 23:48 ` Willem de Bruijn
2024-09-07 21:55 ` Vadim Fedorenko
0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-06 23:48 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Vadim Fedorenko wrote:
> On 06/09/2024 16:25, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>
> >>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>> ---
> >>>>> net/ipv4/tcp.c | 13 +++++++++----
> >>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>> --- a/net/ipv4/tcp.c
> >>>>> +++ b/net/ipv4/tcp.c
> >>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>> }
> >>>>> EXPORT_SYMBOL(tcp_init_sock);
> >>>>>
> >>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>> {
> >>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>> + u32 tsflags = sockc->tsflags;
> >>>>>
> >>>>> if (tsflags && skb) {
> >>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>> tcb->txstamp_ack = 1;
> >>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>> + shinfo->tskey = sockc->ts_opt_id;
> >>>>> + else
> >>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> + }
> >>>>> }
> >>>>> }
> >>>>>
> >>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>
> >>>>> out:
> >>>>> if (copied) {
> >>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>> + tcp_tx_timestamp(sk, &sockc);
> >>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>> }
> >>>>> out_nopush:
> >>>>
> >>>> Hi Willem,
> >>>>
> >>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>> flow:
> >>>>
> >>>> tcp_skb_collapse_tstamp()
> >>>> tcp_fragment_tstamp()
> >>>> tcp_gso_tstamp()
> >>>>
> >>>> I believe the last one breaks tests, but the problem is that there is no
> >>>> easy way to provide the flag of constant tskey to it. Only
> >>>> shinfo::tx_flags are available at the caller side and we have already
> >>>> discussed that we shouldn't use the last bit of this field.
> >>>>
> >>>> So, how should we deal with the problem? Or is it better to postpone
> >>>> support for TCP sockets in this case?
> >>>
> >>> Are you sure that this is a problem. These functions pass on the
> >>> skb_shinfo(skb)->ts_key from one skb to another.
> >>
> >> Yes, you are right, the problem is in a different place.
> >>
> >> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >> value:
> >>
> >> if (sk_is_tcp(sk))
> >> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>
> >> It happens because of assumption that for TCP sockets shinfo::tskey will
> >> have sequence number and the logic has to recalculate it back to the
> >> bytes sent. The same logic exists in all TCP TX timestamping functions
> >> (mentioned in the previous mail) and may trigger some unexpected
> >> behavior. To fix the issue we have to provide some kind of signal that
> >> tskey value is provided from user-space and shouldn't be changed. And we
> >> have to have it somewhere in skb info. Again, tx_flags looks like the
> >> best candidate, but it's impossible to use. I'm thinking of using
> >> special flag in tcp_skb_cb - gonna test more, but open for other
> >> suggestions.
> >
> > Ai, that is tricky. tx_flags is full/scarce indeed.
> >
> > CB does not persist as the skb transitions between layers.
> >
> > The most obvious solution would be to set the flag in sk_tsflags
> > itself. But then the cmsg would no long work on a per request basis:
> > either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> >
> > Good that we catch this now before the ABI is finalized.
> >
> > If necessary TCP semantics can diverge from datagrams. So we could
> > punt on this. But it's not ideal.
>
> I have done proof of concept code which uses hwtsamp as a storage for
> custom tskey in case of TCP:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a52638363ea5..40ed49e61bf7 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
> sk_buff *skb,
> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> serr->ee.ee_data = skb_shinfo(skb)->tskey;
> - if (sk_is_tcp(sk))
> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> }
>
> err = sock_queue_err_skb(sk, skb);
> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> * but only if the socket refcount is not zero.
> */
> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> + skb_shinfo(skb)->tskey =
> (u32)skb_hwtstamps(skb)->hwtstamp;
> *skb_hwtstamps(skb) = *hwtstamps;
> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
> false);
> sock_put(sk);
> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> }
>
> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> + if (skb_hwtstamps(orig_skb)->hwtstamp)
> + skb_shinfo(skb)->tskey =
> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> + else
> + skb_shinfo(skb)->tskey -=
> atomic_read(&sk->sk_tskey);
> + }
> if (hwtstamps)
> *skb_hwtstamps(skb) = *hwtstamps;
> else
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0d3decc13a99..1a161a2155b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
> sockcm_cookie *sockc)
> tcb->txstamp_ack = 1;
> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> - shinfo->tskey = sockc->ts_opt_id;
> - else
> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
> skb->len - 1;
> + skb_hwtstamps(skb)->hwtstamp =
> sockc->ts_opt_id;
> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> }
> }
> }
>
>
> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> it. netdev_data field is only used on RX timestamp side, so should be
> fine to reuse. WDYT?
We cannot really extend the struct. skb_shared_info is scarce.
hwtstamps is a union. But on tx the hw tstamp is queued using
skb_tstamp_tx, not through this shinfo data at all.
It just seems weird to have a shinfo->tskey, but then ignore it and
find yet another 32b field. Easier would be to find 1b to toggle
whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
I don't immediate see a perfect solution either.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 17:27 ` Vadim Fedorenko
@ 2024-09-06 23:50 ` Willem de Bruijn
0 siblings, 0 replies; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-06 23:50 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Vadim Fedorenko wrote:
> On 06/09/2024 17:33, Willem de Bruijn wrote:
> > Willem de Bruijn wrote:
> >> Vadim Fedorenko wrote:
> >>> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>>> Vadim Fedorenko wrote:
> >>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>>
> >>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>>> ---
> >>>>>> net/ipv4/tcp.c | 13 +++++++++----
> >>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>>> --- a/net/ipv4/tcp.c
> >>>>>> +++ b/net/ipv4/tcp.c
> >>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>> }
> >>>>>> EXPORT_SYMBOL(tcp_init_sock);
> >>>>>>
> >>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>> {
> >>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>>> + u32 tsflags = sockc->tsflags;
> >>>>>>
> >>>>>> if (tsflags && skb) {
> >>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>> tcb->txstamp_ack = 1;
> >>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>>> + shinfo->tskey = sockc->ts_opt_id;
> >>>>>> + else
> >>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>> + }
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>>
> >>>>>> out:
> >>>>>> if (copied) {
> >>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>>> + tcp_tx_timestamp(sk, &sockc);
> >>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>> }
> >>>>>> out_nopush:
> >>>>>
> >>>>> Hi Willem,
> >>>>>
> >>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>>> flow:
> >>>>>
> >>>>> tcp_skb_collapse_tstamp()
> >>>>> tcp_fragment_tstamp()
> >>>>> tcp_gso_tstamp()
> >>>>>
> >>>>> I believe the last one breaks tests, but the problem is that there is no
> >>>>> easy way to provide the flag of constant tskey to it. Only
> >>>>> shinfo::tx_flags are available at the caller side and we have already
> >>>>> discussed that we shouldn't use the last bit of this field.
> >>>>>
> >>>>> So, how should we deal with the problem? Or is it better to postpone
> >>>>> support for TCP sockets in this case?
> >>>>
> >>>> Are you sure that this is a problem. These functions pass on the
> >>>> skb_shinfo(skb)->ts_key from one skb to another.
> >>>
> >>> Yes, you are right, the problem is in a different place.
> >>>
> >>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >>> value:
> >>>
> >>> if (sk_is_tcp(sk))
> >>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>>
> >>> It happens because of assumption that for TCP sockets shinfo::tskey will
> >>> have sequence number and the logic has to recalculate it back to the
> >>> bytes sent. The same logic exists in all TCP TX timestamping functions
> >>> (mentioned in the previous mail) and may trigger some unexpected
> >>> behavior. To fix the issue we have to provide some kind of signal that
> >>> tskey value is provided from user-space and shouldn't be changed. And we
> >>> have to have it somewhere in skb info. Again, tx_flags looks like the
> >>> best candidate, but it's impossible to use. I'm thinking of using
> >>> special flag in tcp_skb_cb - gonna test more, but open for other
> >>> suggestions.
> >>
> >> Ai, that is tricky. tx_flags is full/scarce indeed.
> >>
> >> CB does not persist as the skb transitions between layers.
> >
> > Though specifically for TCP, it is possible to look up the fast
> > clone on the rtx queue, whose tcp_skb_cb will be unperturbed. But
> > the tcb currently does not have this data either.
>
> It will work fine for software timestamps, but we cannot do the same
> trick in case of HW timestamps, right?
I'm not really advocating it. The only user of this trick that I can
find is skb_still_in_host_queue, through skb_fclone_busy.
That said, it would also work for hardware, as the SKB_FCLONE_ORIG
remains on the rtx queue. In the common case. But skb_fclone_busy
points out edge cases, such as if a driver call skb_orphan..
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-06 23:48 ` Willem de Bruijn
@ 2024-09-07 21:55 ` Vadim Fedorenko
2024-09-08 19:19 ` Willem de Bruijn
0 siblings, 1 reply; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-07 21:55 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 07/09/2024 00:48, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 06/09/2024 16:25, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>>> Vadim Fedorenko wrote:
>>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>>
>>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>>> ---
>>>>>>> net/ipv4/tcp.c | 13 +++++++++----
>>>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>>> --- a/net/ipv4/tcp.c
>>>>>>> +++ b/net/ipv4/tcp.c
>>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(tcp_init_sock);
>>>>>>>
>>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>> {
>>>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>>> + u32 tsflags = sockc->tsflags;
>>>>>>>
>>>>>>> if (tsflags && skb) {
>>>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>> tcb->txstamp_ack = 1;
>>>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>>> + shinfo->tskey = sockc->ts_opt_id;
>>>>>>> + else
>>>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> + }
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>>
>>>>>>> out:
>>>>>>> if (copied) {
>>>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>>> + tcp_tx_timestamp(sk, &sockc);
>>>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>> }
>>>>>>> out_nopush:
>>>>>>
>>>>>> Hi Willem,
>>>>>>
>>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>>> flow:
>>>>>>
>>>>>> tcp_skb_collapse_tstamp()
>>>>>> tcp_fragment_tstamp()
>>>>>> tcp_gso_tstamp()
>>>>>>
>>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>>
>>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>>> support for TCP sockets in this case?
>>>>>
>>>>> Are you sure that this is a problem. These functions pass on the
>>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>>
>>>> Yes, you are right, the problem is in a different place.
>>>>
>>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>>> value:
>>>>
>>>> if (sk_is_tcp(sk))
>>>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>
>>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>>> have sequence number and the logic has to recalculate it back to the
>>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>>> (mentioned in the previous mail) and may trigger some unexpected
>>>> behavior. To fix the issue we have to provide some kind of signal that
>>>> tskey value is provided from user-space and shouldn't be changed. And we
>>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>>> best candidate, but it's impossible to use. I'm thinking of using
>>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>>> suggestions.
>>>
>>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>>
>>> CB does not persist as the skb transitions between layers.
>>>
>>> The most obvious solution would be to set the flag in sk_tsflags
>>> itself. But then the cmsg would no long work on a per request basis:
>>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>>>
>>> Good that we catch this now before the ABI is finalized.
>>>
>>> If necessary TCP semantics can diverge from datagrams. So we could
>>> punt on this. But it's not ideal.
>>
>> I have done proof of concept code which uses hwtsamp as a storage for
>> custom tskey in case of TCP:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a52638363ea5..40ed49e61bf7 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
>> sk_buff *skb,
>> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>> serr->ee.ee_data = skb_shinfo(skb)->tskey;
>> - if (sk_is_tcp(sk))
>> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>> }
>>
>> err = sock_queue_err_skb(sk, skb);
>> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>> * but only if the socket refcount is not zero.
>> */
>> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
>> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
>> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
>> + skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(skb)->hwtstamp;
>> *skb_hwtstamps(skb) = *hwtstamps;
>> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
>> false);
>> sock_put(sk);
>> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>> }
>>
>> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>> + if (skb_hwtstamps(orig_skb)->hwtstamp)
>> + skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
>> + else
>> + skb_shinfo(skb)->tskey -=
>> atomic_read(&sk->sk_tskey);
>> + }
>> if (hwtstamps)
>> *skb_hwtstamps(skb) = *hwtstamps;
>> else
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 0d3decc13a99..1a161a2155b5 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
>> sockcm_cookie *sockc)
>> tcb->txstamp_ack = 1;
>> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>> - shinfo->tskey = sockc->ts_opt_id;
>> - else
>> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
>> skb->len - 1;
>> + skb_hwtstamps(skb)->hwtstamp =
>> sockc->ts_opt_id;
>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>> }
>> }
>> }
>>
>>
>> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
>> it. netdev_data field is only used on RX timestamp side, so should be
>> fine to reuse. WDYT?
>
> We cannot really extend the struct. skb_shared_info is scarce.
> hwtstamps is a union. But on tx the hw tstamp is queued using
> skb_tstamp_tx, not through this shinfo data at all.
>
> It just seems weird to have a shinfo->tskey, but then ignore it and
> find yet another 32b field. Easier would be to find 1b to toggle
> whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
>
> I don't immediate see a perfect solution either.
Well, with this said, and given that having 2 different tskey values is
weird solution, I'm thinking of skipping TCP part until we can find some
proper way. Will it be OK to have UDP and RAW sockets only opted into
the feature?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-07 21:55 ` Vadim Fedorenko
@ 2024-09-08 19:19 ` Willem de Bruijn
2024-09-08 19:55 ` Vadim Fedorenko
0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2024-09-08 19:19 UTC (permalink / raw)
To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
Vadim Fedorenko wrote:
> On 07/09/2024 00:48, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 06/09/2024 16:25, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>>>> Vadim Fedorenko wrote:
> >>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>>>
> >>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>>>>>> ---
> >>>>>>> net/ipv4/tcp.c | 13 +++++++++----
> >>>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>>>> --- a/net/ipv4/tcp.c
> >>>>>>> +++ b/net/ipv4/tcp.c
> >>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>>> }
> >>>>>>> EXPORT_SYMBOL(tcp_init_sock);
> >>>>>>>
> >>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>>> {
> >>>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>>>> + u32 tsflags = sockc->tsflags;
> >>>>>>>
> >>>>>>> if (tsflags && skb) {
> >>>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>>> tcb->txstamp_ack = 1;
> >>>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>>>> + shinfo->tskey = sockc->ts_opt_id;
> >>>>>>> + else
> >>>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> + }
> >>>>>>> }
> >>>>>>> }
> >>>>>>>
> >>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>>>
> >>>>>>> out:
> >>>>>>> if (copied) {
> >>>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>>>> + tcp_tx_timestamp(sk, &sockc);
> >>>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>>> }
> >>>>>>> out_nopush:
> >>>>>>
> >>>>>> Hi Willem,
> >>>>>>
> >>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>>>> flow:
> >>>>>>
> >>>>>> tcp_skb_collapse_tstamp()
> >>>>>> tcp_fragment_tstamp()
> >>>>>> tcp_gso_tstamp()
> >>>>>>
> >>>>>> I believe the last one breaks tests, but the problem is that there is no
> >>>>>> easy way to provide the flag of constant tskey to it. Only
> >>>>>> shinfo::tx_flags are available at the caller side and we have already
> >>>>>> discussed that we shouldn't use the last bit of this field.
> >>>>>>
> >>>>>> So, how should we deal with the problem? Or is it better to postpone
> >>>>>> support for TCP sockets in this case?
> >>>>>
> >>>>> Are you sure that this is a problem. These functions pass on the
> >>>>> skb_shinfo(skb)->ts_key from one skb to another.
> >>>>
> >>>> Yes, you are right, the problem is in a different place.
> >>>>
> >>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >>>> value:
> >>>>
> >>>> if (sk_is_tcp(sk))
> >>>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>>>
> >>>> It happens because of assumption that for TCP sockets shinfo::tskey will
> >>>> have sequence number and the logic has to recalculate it back to the
> >>>> bytes sent. The same logic exists in all TCP TX timestamping functions
> >>>> (mentioned in the previous mail) and may trigger some unexpected
> >>>> behavior. To fix the issue we have to provide some kind of signal that
> >>>> tskey value is provided from user-space and shouldn't be changed. And we
> >>>> have to have it somewhere in skb info. Again, tx_flags looks like the
> >>>> best candidate, but it's impossible to use. I'm thinking of using
> >>>> special flag in tcp_skb_cb - gonna test more, but open for other
> >>>> suggestions.
> >>>
> >>> Ai, that is tricky. tx_flags is full/scarce indeed.
> >>>
> >>> CB does not persist as the skb transitions between layers.
> >>>
> >>> The most obvious solution would be to set the flag in sk_tsflags
> >>> itself. But then the cmsg would no long work on a per request basis:
> >>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> >>>
> >>> Good that we catch this now before the ABI is finalized.
> >>>
> >>> If necessary TCP semantics can diverge from datagrams. So we could
> >>> punt on this. But it's not ideal.
> >>
> >> I have done proof of concept code which uses hwtsamp as a storage for
> >> custom tskey in case of TCP:
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index a52638363ea5..40ed49e61bf7 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
> >> sk_buff *skb,
> >> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
> >> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> >> serr->ee.ee_data = skb_shinfo(skb)->tskey;
> >> - if (sk_is_tcp(sk))
> >> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >> }
> >>
> >> err = sock_queue_err_skb(sk, skb);
> >> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >> * but only if the socket refcount is not zero.
> >> */
> >> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> >> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
> >> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> >> + skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(skb)->hwtstamp;
> >> *skb_hwtstamps(skb) = *hwtstamps;
> >> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
> >> false);
> >> sock_put(sk);
> >> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> >> }
> >>
> >> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> >> + if (skb_hwtstamps(orig_skb)->hwtstamp)
> >> + skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> >> + else
> >> + skb_shinfo(skb)->tskey -=
> >> atomic_read(&sk->sk_tskey);
> >> + }
> >> if (hwtstamps)
> >> *skb_hwtstamps(skb) = *hwtstamps;
> >> else
> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >> index 0d3decc13a99..1a161a2155b5 100644
> >> --- a/net/ipv4/tcp.c
> >> +++ b/net/ipv4/tcp.c
> >> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
> >> sockcm_cookie *sockc)
> >> tcb->txstamp_ack = 1;
> >> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >> - shinfo->tskey = sockc->ts_opt_id;
> >> - else
> >> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
> >> skb->len - 1;
> >> + skb_hwtstamps(skb)->hwtstamp =
> >> sockc->ts_opt_id;
> >> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >> }
> >> }
> >> }
> >>
> >>
> >> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> >> it. netdev_data field is only used on RX timestamp side, so should be
> >> fine to reuse. WDYT?
> >
> > We cannot really extend the struct. skb_shared_info is scarce.
> > hwtstamps is a union. But on tx the hw tstamp is queued using
> > skb_tstamp_tx, not through this shinfo data at all.
> >
> > It just seems weird to have a shinfo->tskey, but then ignore it and
> > find yet another 32b field. Easier would be to find 1b to toggle
> > whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
> >
> > I don't immediate see a perfect solution either.
>
> Well, with this said, and given that having 2 different tskey values is
> weird solution, I'm thinking of skipping TCP part until we can find some
> proper way. Will it be OK to have UDP and RAW sockets only opted into
> the feature?
Yes, I think that's fine.
Let's make sure that tcp sendmsg fails if the new cmsg is passed. So
that we can add support for it later.
We also have to give some thought what it means to coalesce skbs with
non-sequential IDs, in the functions you mentioned before. Might be
fine to just say: that's the application's issue. But even then we
should document that behavior.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
2024-09-08 19:19 ` Willem de Bruijn
@ 2024-09-08 19:55 ` Vadim Fedorenko
0 siblings, 0 replies; 31+ messages in thread
From: Vadim Fedorenko @ 2024-09-08 19:55 UTC (permalink / raw)
To: Willem de Bruijn, Willem de Bruijn
Cc: netdev, David Ahern, Jason Xing, Jakub Kicinski, Simon Horman,
Paolo Abeni
On 08/09/2024 20:19, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 07/09/2024 00:48, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 06/09/2024 16:25, Willem de Bruijn wrote:
>>>>> Vadim Fedorenko wrote:
>>>>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>>>>> Vadim Fedorenko wrote:
>>>>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>>>>>> ---
>>>>>>>>> net/ipv4/tcp.c | 13 +++++++++----
>>>>>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>>>>> --- a/net/ipv4/tcp.c
>>>>>>>>> +++ b/net/ipv4/tcp.c
>>>>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>>>> }
>>>>>>>>> EXPORT_SYMBOL(tcp_init_sock);
>>>>>>>>>
>>>>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>>>> {
>>>>>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>>>>> + u32 tsflags = sockc->tsflags;
>>>>>>>>>
>>>>>>>>> if (tsflags && skb) {
>>>>>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>>>> tcb->txstamp_ack = 1;
>>>>>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>>>>> + shinfo->tskey = sockc->ts_opt_id;
>>>>>>>>> + else
>>>>>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>>>> + }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>>>>
>>>>>>>>> out:
>>>>>>>>> if (copied) {
>>>>>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>>>>> + tcp_tx_timestamp(sk, &sockc);
>>>>>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>>>> }
>>>>>>>>> out_nopush:
>>>>>>>>
>>>>>>>> Hi Willem,
>>>>>>>>
>>>>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>>>>> flow:
>>>>>>>>
>>>>>>>> tcp_skb_collapse_tstamp()
>>>>>>>> tcp_fragment_tstamp()
>>>>>>>> tcp_gso_tstamp()
>>>>>>>>
>>>>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>>>>
>>>>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>>>>> support for TCP sockets in this case?
>>>>>>>
>>>>>>> Are you sure that this is a problem. These functions pass on the
>>>>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>>>>
>>>>>> Yes, you are right, the problem is in a different place.
>>>>>>
>>>>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>>>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>>>>> value:
>>>>>>
>>>>>> if (sk_is_tcp(sk))
>>>>>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>>>
>>>>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>>>>> have sequence number and the logic has to recalculate it back to the
>>>>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>>>>> (mentioned in the previous mail) and may trigger some unexpected
>>>>>> behavior. To fix the issue we have to provide some kind of signal that
>>>>>> tskey value is provided from user-space and shouldn't be changed. And we
>>>>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>>>>> best candidate, but it's impossible to use. I'm thinking of using
>>>>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>>>>> suggestions.
>>>>>
>>>>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>>>>
>>>>> CB does not persist as the skb transitions between layers.
>>>>>
>>>>> The most obvious solution would be to set the flag in sk_tsflags
>>>>> itself. But then the cmsg would no long work on a per request basis:
>>>>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>>>>>
>>>>> Good that we catch this now before the ABI is finalized.
>>>>>
>>>>> If necessary TCP semantics can diverge from datagrams. So we could
>>>>> punt on this. But it's not ideal.
>>>>
>>>> I have done proof of concept code which uses hwtsamp as a storage for
>>>> custom tskey in case of TCP:
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index a52638363ea5..40ed49e61bf7 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
>>>> sk_buff *skb,
>>>> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>>>> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>>>> serr->ee.ee_data = skb_shinfo(skb)->tskey;
>>>> - if (sk_is_tcp(sk))
>>>> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>> }
>>>>
>>>> err = sock_queue_err_skb(sk, skb);
>>>> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>>> * but only if the socket refcount is not zero.
>>>> */
>>>> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
>>>> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
>>>> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
>>>> + skb_shinfo(skb)->tskey =
>>>> (u32)skb_hwtstamps(skb)->hwtstamp;
>>>> *skb_hwtstamps(skb) = *hwtstamps;
>>>> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
>>>> false);
>>>> sock_put(sk);
>>>> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>>>> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>>>> }
>>>>
>>>> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>>>> + if (skb_hwtstamps(orig_skb)->hwtstamp)
>>>> + skb_shinfo(skb)->tskey =
>>>> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
>>>> + else
>>>> + skb_shinfo(skb)->tskey -=
>>>> atomic_read(&sk->sk_tskey);
>>>> + }
>>>> if (hwtstamps)
>>>> *skb_hwtstamps(skb) = *hwtstamps;
>>>> else
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index 0d3decc13a99..1a161a2155b5 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
>>>> sockcm_cookie *sockc)
>>>> tcb->txstamp_ack = 1;
>>>> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>> - shinfo->tskey = sockc->ts_opt_id;
>>>> - else
>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
>>>> skb->len - 1;
>>>> + skb_hwtstamps(skb)->hwtstamp =
>>>> sockc->ts_opt_id;
>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>> }
>>>> }
>>>> }
>>>>
>>>>
>>>> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
>>>> it. netdev_data field is only used on RX timestamp side, so should be
>>>> fine to reuse. WDYT?
>>>
>>> We cannot really extend the struct. skb_shared_info is scarce.
>>> hwtstamps is a union. But on tx the hw tstamp is queued using
>>> skb_tstamp_tx, not through this shinfo data at all.
>>>
>>> It just seems weird to have a shinfo->tskey, but then ignore it and
>>> find yet another 32b field. Easier would be to find 1b to toggle
>>> whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
>>>
>>> I don't immediate see a perfect solution either.
>>
>> Well, with this said, and given that having 2 different tskey values is
>> weird solution, I'm thinking of skipping TCP part until we can find some
>> proper way. Will it be OK to have UDP and RAW sockets only opted into
>> the feature?
>
> Yes, I think that's fine.
>
> Let's make sure that tcp sendmsg fails if the new cmsg is passed. So
> that we can add support for it later.
Thanks Willem, I'll prepare patchset tomorrow.
> We also have to give some thought what it means to coalesce skbs with
> non-sequential IDs, in the functions you mentioned before. Might be
> fine to just say: that's the application's issue. But even then we
> should document that behavior.
We can discuss it in a separate RFC series, there are more open
questions, I believe.
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2024-09-08 19:56 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-04 13:56 ` Jason Xing
2024-09-04 14:01 ` Vadim Fedorenko
2024-09-04 20:54 ` Willem de Bruijn
2024-09-05 10:10 ` Vadim Fedorenko
2024-09-05 13:29 ` Willem de Bruijn
2024-09-05 8:24 ` Jason Xing
2024-09-05 8:34 ` Vadim Fedorenko
2024-09-05 8:49 ` Jason Xing
2024-09-06 9:22 ` kernel test robot
2024-09-06 12:58 ` kernel test robot
2024-09-04 11:31 ` [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets Vadim Fedorenko
2024-09-05 14:58 ` Vadim Fedorenko
2024-09-05 15:37 ` Jason Xing
2024-09-05 16:39 ` Willem de Bruijn
2024-09-06 12:20 ` Vadim Fedorenko
2024-09-06 15:25 ` Willem de Bruijn
2024-09-06 16:33 ` Willem de Bruijn
2024-09-06 17:27 ` Vadim Fedorenko
2024-09-06 23:50 ` Willem de Bruijn
2024-09-06 17:27 ` Vadim Fedorenko
2024-09-06 23:48 ` Willem de Bruijn
2024-09-07 21:55 ` Vadim Fedorenko
2024-09-08 19:19 ` Willem de Bruijn
2024-09-08 19:55 ` Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 3/4] net_tstamp: add SCM_TS_OPT_ID for RAW sockets Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 4/4] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
2024-09-04 11:36 ` [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 21:04 ` Willem de Bruijn
2024-09-04 22:50 ` Vadim Fedorenko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).