All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
@ 2024-09-02 13:09 Vadim Fedorenko
  2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 13:09 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, Jason Xing
  Cc: Vadim Fedorenko, netdev

SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg. This works fine for UDP
sockets only, and explicit check is added to control message parser.

[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
 Documentation/networking/timestamping.rst | 14 ++++++++++++++
 arch/alpha/include/uapi/asm/socket.h      |  4 +++-
 arch/mips/include/uapi/asm/socket.h       |  2 ++
 arch/parisc/include/uapi/asm/socket.h     |  2 ++
 arch/sparc/include/uapi/asm/socket.h      |  2 ++
 include/net/inet_sock.h                   |  4 +++-
 include/net/sock.h                        |  1 +
 include/uapi/asm-generic/socket.h         |  2 ++
 include/uapi/linux/net_tstamp.h           |  3 ++-
 net/core/sock.c                           | 12 ++++++++++++
 net/ethtool/common.c                      |  1 +
 net/ipv4/ip_output.c                      | 16 ++++++++++++----
 net/ipv6/ip6_output.c                     | 16 ++++++++++++----
 13 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 5e93cd71f99f..93b0901e4e8e 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
   among all possibly concurrently outstanding timestamp requests for
   that socket.
 
+  With this option enabled user-space application can provide custom
+  ID for each message sent via UDP socket with control message with
+  type set to SCM_TS_OPT_ID::
+
+    struct msghdr *msg;
+    ...
+    cmsg			 = CMSG_FIRSTHDR(msg);
+    cmsg->cmsg_level		 = SOL_SOCKET;
+    cmsg->cmsg_type		 = SO_TIMESTAMPING;
+    cmsg->cmsg_len		 = CMSG_LEN(sizeof(__u32));
+    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
+    err = sendmsg(fd, msg, 0);
+
+
 SOF_TIMESTAMPING_OPT_ID_TCP:
   Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
   timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index e94f621903fe..0698e6662cdf 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -10,7 +10,7 @@
  * Note: we only bother about making the SOL_SOCKET options
  * same as OSF/1, as that's all that "normal" programs are
  * likely to set.  We don't necessarily want to be binary
- * compatible with _everything_. 
+ * compatible with _everything_.
  */
 #define SOL_SOCKET	0xffff
 
@@ -140,6 +140,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SCM_TS_OPT_ID		78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 60ebaed28a4c..bb3dc8feb205 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -151,6 +151,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SCM_TS_OPT_ID		78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index be264c2b1a11..c3ab3b3289eb 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -132,6 +132,8 @@
 #define SO_PASSPIDFD		0x404A
 #define SO_PEERPIDFD		0x404B
 
+#define SCM_TS_OPT_ID		0x404C
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 682da3714686..9b40f0a57fbc 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -133,6 +133,8 @@
 #define SO_PASSPIDFD             0x0055
 #define SO_PEERPIDFD             0x0056
 
+#define SCM_TS_OPT_ID            0x0057
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 394c3b66065e..2161d50cf0fd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -174,6 +174,7 @@ struct inet_cork {
 	__s16			tos;
 	char			priority;
 	__u16			gso_size;
+	u32			ts_opt_id;
 	u64			transmit_time;
 	u32			mark;
 };
@@ -241,7 +242,8 @@ struct inet_sock {
 	struct inet_cork_full	cork;
 };
 
-#define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
+#define IPCORK_OPT		1	/* ip-options has been held in ipcork.opt */
+#define IPCORK_TS_OPT_ID	2	/* timestmap opt id has been provided in cmsg */
 
 enum {
 	INET_FLAGS_PKTINFO	= 0,
diff --git a/include/net/sock.h b/include/net/sock.h
index f51d61fab059..73e21dad5660 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1794,6 +1794,7 @@ struct sockcm_cookie {
 	u64 transmit_time;
 	u32 mark;
 	u32 tsflags;
+	u32 ts_opt_id;
 };
 
 static inline void sockcm_init(struct sockcm_cookie *sockc,
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 8ce8a39a1e5f..db3df3e74b01 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -135,6 +135,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SCM_TS_OPT_ID		78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index a2c66b3d7f0f..e2f145e3f3a1 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -32,8 +32,9 @@ enum {
 	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
 	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
 	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
+	SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_CMSG,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/core/sock.c b/net/core/sock.c
index 468b1239606c..560b075765fa 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2859,6 +2859,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 			return -EINVAL;
 		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
 		break;
+	case SCM_TS_OPT_ID:
+		/* allow this option for UDP sockets only */
+		if (!sk_is_udp(sk))
+			return -EINVAL;
+		tsflags = READ_ONCE(sk->sk_tsflags);
+		if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
+			return -EINVAL;
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
+			return -EINVAL;
+		sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
+		sockc->tsflags |= SOF_TIMESTAMPING_OPT_ID_CMSG;
+		break;
 	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS:
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 7257ae272296..147b87c883a9 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -430,6 +430,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
 	[const_ilog2(SOF_TIMESTAMPING_OPT_TX_SWHW)]  = "option-tx-swhw",
 	[const_ilog2(SOF_TIMESTAMPING_BIND_PHC)]     = "bind-phc",
 	[const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)]   = "option-id-tcp",
+	[const_ilog2(SOF_TIMESTAMPING_OPT_ID_CMSG)]  = "option-id-cmsg",
 };
 static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b90d0f78ac80..1a0fe7e99843 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1048,10 +1048,15 @@ static int __ip_append_data(struct sock *sk,
 
 	cork->length += length;
 
-	hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
-		     READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
-	if (hold_tskey)
-		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
+	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
+		if (cork->flags & IPCORK_TS_OPT_ID) {
+			tskey = cork->ts_opt_id;
+		} else {
+			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+			hold_tskey = true;
+		}
+	}
 
 	/* So, what's going on in the loop below?
 	 *
@@ -1324,8 +1329,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	cork->mark = ipc->sockc.mark;
 	cork->priority = ipc->priority;
 	cork->transmit_time = ipc->sockc.transmit_time;
+	cork->ts_opt_id = ipc->sockc.ts_opt_id;
 	cork->tx_flags = 0;
 	sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
+	if (ipc->sockc.tsflags & SOF_TIMESTAMPING_OPT_ID_CMSG)
+		cork->flags |= IPCORK_TS_OPT_ID;
 
 	return 0;
 }
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f26841f1490f..d7113f8622bf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1401,7 +1401,10 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
 	cork->base.mark = ipc6->sockc.mark;
+	cork->base.ts_opt_id = ipc6->sockc.ts_opt_id;
 	sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
+	if (ipc6->sockc.tsflags & SOF_TIMESTAMPING_OPT_ID_CMSG)
+		cork->base.flags |= IPCORK_TS_OPT_ID;
 
 	cork->base.length = 0;
 	cork->base.transmit_time = ipc6->sockc.transmit_time;
@@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
 			flags &= ~MSG_SPLICE_PAGES;
 	}
 
-	hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
-		     READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
-	if (hold_tskey)
-		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
+	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
+		if (cork->flags & IPCORK_TS_OPT_ID) {
+			tskey = cork->ts_opt_id;
+		} else {
+			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+			hold_tskey = true;
+		}
+	}
 
 	/*
 	 * Let's try using as much space as possible.
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test
  2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
@ 2024-09-02 13:09 ` Vadim Fedorenko
  2024-09-02 14:24   ` Jason Xing
  2024-09-02 14:29 ` [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Willem de Bruijn
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 13:09 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, Jason Xing
  Cc: Vadim Fedorenko, netdev

Extend txtimestamp udp test to run with fixed tskey using
SCM_TS_OPT_ID control message.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
 tools/include/uapi/asm-generic/socket.h    |  2 +
 tools/testing/selftests/net/txtimestamp.c  | 48 +++++++++++++++++-----
 tools/testing/selftests/net/txtimestamp.sh | 12 +++---
 3 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index 54d9c8bf7c55..281df9139d2b 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -124,6 +124,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SCM_TS_OPT_ID		78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
index ec60a16c9307..3a8f716e72ae 100644
--- a/tools/testing/selftests/net/txtimestamp.c
+++ b/tools/testing/selftests/net/txtimestamp.c
@@ -54,6 +54,10 @@
 #define USEC_PER_SEC	1000000L
 #define NSEC_PER_SEC	1000000000LL
 
+#ifndef SCM_TS_OPT_ID
+# define SCM_TS_OPT_ID 78
+#endif
+
 /* command line parameters */
 static int cfg_proto = SOCK_STREAM;
 static int cfg_ipproto = IPPROTO_TCP;
@@ -77,6 +81,8 @@ static bool cfg_epollet;
 static bool cfg_do_listen;
 static uint16_t dest_port = 9000;
 static bool cfg_print_nsec;
+static uint32_t ts_opt_id;
+static bool cfg_use_cmsg_opt_id;
 
 static struct sockaddr_in daddr;
 static struct sockaddr_in6 daddr6;
@@ -136,12 +142,13 @@ static void validate_key(int tskey, int tstype)
 	/* compare key for each subsequent request
 	 * must only test for one type, the first one requested
 	 */
-	if (saved_tskey == -1)
+	if (saved_tskey == -1 || cfg_use_cmsg_opt_id)
 		saved_tskey_type = tstype;
 	else if (saved_tskey_type != tstype)
 		return;
 
 	stepsize = cfg_proto == SOCK_STREAM ? cfg_payload_len : 1;
+	stepsize = cfg_use_cmsg_opt_id ? 0 : stepsize;
 	if (tskey != saved_tskey + stepsize) {
 		fprintf(stderr, "ERROR: key %d, expected %d\n",
 				tskey, saved_tskey + stepsize);
@@ -480,7 +487,7 @@ static void fill_header_udp(void *p, bool is_ipv4)
 
 static void do_test(int family, unsigned int report_opt)
 {
-	char control[CMSG_SPACE(sizeof(uint32_t))];
+	char control[2 * CMSG_SPACE(sizeof(uint32_t))];
 	struct sockaddr_ll laddr;
 	unsigned int sock_opt;
 	struct cmsghdr *cmsg;
@@ -620,18 +627,32 @@ static void do_test(int family, unsigned int report_opt)
 		msg.msg_iov = &iov;
 		msg.msg_iovlen = 1;
 
-		if (cfg_use_cmsg) {
+		if (cfg_use_cmsg || cfg_use_cmsg_opt_id) {
 			memset(control, 0, sizeof(control));
 
 			msg.msg_control = control;
-			msg.msg_controllen = sizeof(control);
+			msg.msg_controllen = cfg_use_cmsg * CMSG_SPACE(sizeof(uint32_t));
+			msg.msg_controllen += cfg_use_cmsg_opt_id * CMSG_SPACE(sizeof(uint32_t));
 
-			cmsg = CMSG_FIRSTHDR(&msg);
-			cmsg->cmsg_level = SOL_SOCKET;
-			cmsg->cmsg_type = SO_TIMESTAMPING;
-			cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+			cmsg = NULL;
+			if (cfg_use_cmsg) {
+				cmsg = CMSG_FIRSTHDR(&msg);
+				cmsg->cmsg_level = SOL_SOCKET;
+				cmsg->cmsg_type = SO_TIMESTAMPING;
+				cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+
+				*((uint32_t *)CMSG_DATA(cmsg)) = report_opt;
+			}
+			if (cfg_use_cmsg_opt_id) {
+				cmsg = cmsg ? CMSG_NXTHDR(&msg, cmsg) : CMSG_FIRSTHDR(&msg);
+				cmsg->cmsg_level = SOL_SOCKET;
+				cmsg->cmsg_type = SCM_TS_OPT_ID;
+				cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+
+				*((uint32_t *)CMSG_DATA(cmsg)) = ts_opt_id;
+				saved_tskey = ts_opt_id;
+			}
 
-			*((uint32_t *) CMSG_DATA(cmsg)) = report_opt;
 		}
 
 		val = sendmsg(fd, &msg, 0);
@@ -681,6 +702,7 @@ static void __attribute__((noreturn)) usage(const char *filepath)
 			"  -L    listen on hostname and port\n"
 			"  -n:   set no-payload option\n"
 			"  -N:   print timestamps and durations in nsec (instead of usec)\n"
+			"  -o N: use SCM_TS_OPT_ID control message to provide N as tskey\n"
 			"  -p N: connect to port N\n"
 			"  -P:   use PF_PACKET\n"
 			"  -r:   use raw\n"
@@ -701,7 +723,7 @@ static void parse_opt(int argc, char **argv)
 	int c;
 
 	while ((c = getopt(argc, argv,
-				"46bc:CeEFhIl:LnNp:PrRS:t:uv:V:x")) != -1) {
+				"46bc:CeEFhIl:LnNo:p:PrRS:t:uv:V:x")) != -1) {
 		switch (c) {
 		case '4':
 			do_ipv6 = 0;
@@ -742,6 +764,10 @@ static void parse_opt(int argc, char **argv)
 		case 'N':
 			cfg_print_nsec = true;
 			break;
+		case 'o':
+			ts_opt_id = strtoul(optarg, NULL, 10);
+			cfg_use_cmsg_opt_id = true;
+			break;
 		case 'p':
 			dest_port = strtoul(optarg, NULL, 10);
 			break;
@@ -799,6 +825,8 @@ static void parse_opt(int argc, char **argv)
 		error(1, 0, "cannot ask for pktinfo over pf_packet");
 	if (cfg_busy_poll && cfg_use_epoll)
 		error(1, 0, "pass epoll or busy_poll, not both");
+	if (cfg_proto != SOCK_DGRAM && cfg_use_cmsg_opt_id)
+		error(1, 0, "control message TS_OPT_ID can only be used with udp socket");
 
 	if (optind != argc - 1)
 		error(1, 0, "missing required hostname argument");
diff --git a/tools/testing/selftests/net/txtimestamp.sh b/tools/testing/selftests/net/txtimestamp.sh
index 25baca4b148e..6cc2b1b480e0 100755
--- a/tools/testing/selftests/net/txtimestamp.sh
+++ b/tools/testing/selftests/net/txtimestamp.sh
@@ -37,11 +37,13 @@ run_test_v4v6() {
 run_test_tcpudpraw() {
 	local -r args=$@
 
-	run_test_v4v6 ${args}		# tcp
-	run_test_v4v6 ${args} -u	# udp
-	run_test_v4v6 ${args} -r	# raw
-	run_test_v4v6 ${args} -R	# raw (IPPROTO_RAW)
-	run_test_v4v6 ${args} -P	# pf_packet
+	run_test_v4v6 ${args}		 # tcp
+	run_test_v4v6 ${args} -u	 # udp
+	run_test_v4v6 ${args} -u -o 5	 # udp with fixed tskey
+	run_test_v4v6 ${args} -u -o 5 -C # udp with fixed tskey and cmsg control
+	run_test_v4v6 ${args} -r	 # raw
+	run_test_v4v6 ${args} -R	 # raw (IPPROTO_RAW)
+	run_test_v4v6 ${args} -P	 # pf_packet
 }
 
 run_test_all() {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test
  2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
@ 2024-09-02 14:24   ` Jason Xing
  2024-09-08 20:04     ` Vadim Fedorenko
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Xing @ 2024-09-02 14:24 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, netdev

On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> Extend txtimestamp udp test to run with fixed tskey using
> SCM_TS_OPT_ID control message.
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>

Thanks for adding the combination test !

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
  2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
@ 2024-09-02 14:29 ` Willem de Bruijn
  2024-09-02 21:07   ` Vadim Fedorenko
  2024-09-02 15:19 ` Jason Xing
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2024-09-02 14:29 UTC (permalink / raw)
  To: Vadim Fedorenko, Vadim Fedorenko, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing
  Cc: Vadim Fedorenko, netdev

Vadim Fedorenko wrote:
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg. This works fine for UDP
> sockets only, and explicit check is added to control message parser.
> 
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> 
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
>  Documentation/networking/timestamping.rst | 14 ++++++++++++++
>  arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>  arch/mips/include/uapi/asm/socket.h       |  2 ++
>  arch/parisc/include/uapi/asm/socket.h     |  2 ++
>  arch/sparc/include/uapi/asm/socket.h      |  2 ++
>  include/net/inet_sock.h                   |  4 +++-
>  include/net/sock.h                        |  1 +
>  include/uapi/asm-generic/socket.h         |  2 ++
>  include/uapi/linux/net_tstamp.h           |  3 ++-
>  net/core/sock.c                           | 12 ++++++++++++
>  net/ethtool/common.c                      |  1 +
>  net/ipv4/ip_output.c                      | 16 ++++++++++++----
>  net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>  13 files changed, 68 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index 5e93cd71f99f..93b0901e4e8e 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>    among all possibly concurrently outstanding timestamp requests for
>    that socket.
>  
> +  With this option enabled user-space application can provide custom
> +  ID for each message sent via UDP socket with control message with
> +  type set to SCM_TS_OPT_ID::
> +
> +    struct msghdr *msg;
> +    ...
> +    cmsg			 = CMSG_FIRSTHDR(msg);
> +    cmsg->cmsg_level		 = SOL_SOCKET;
> +    cmsg->cmsg_type		 = SO_TIMESTAMPING;
> +    cmsg->cmsg_len		 = CMSG_LEN(sizeof(__u32));
> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> +    err = sendmsg(fd, msg, 0);
> +

Please make it clear that this CMSG is optional.

The process can optionally override the default generated ID, by
passing a specific ID with control message SCM_TS_OPT_ID:

>  SOF_TIMESTAMPING_OPT_ID_TCP:
>    Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>    timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index e94f621903fe..0698e6662cdf 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -10,7 +10,7 @@
>   * Note: we only bother about making the SOL_SOCKET options
>   * same as OSF/1, as that's all that "normal" programs are
>   * likely to set.  We don't necessarily want to be binary
> - * compatible with _everything_. 
> + * compatible with _everything_.

Is this due to a checkpatch warning? If so, please add a brief comment
to the commit message to show that this change is intentional. If not,
please don't touch unrelated code.

>   */
>  #define SOL_SOCKET	0xffff
>  
> @@ -140,6 +140,8 @@
>  #define SO_PASSPIDFD		76
>  #define SO_PEERPIDFD		77
>  
> +#define SCM_TS_OPT_ID		78
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 60ebaed28a4c..bb3dc8feb205 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -151,6 +151,8 @@
>  #define SO_PASSPIDFD		76
>  #define SO_PEERPIDFD		77
>  
> +#define SCM_TS_OPT_ID		78
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index be264c2b1a11..c3ab3b3289eb 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -132,6 +132,8 @@
>  #define SO_PASSPIDFD		0x404A
>  #define SO_PEERPIDFD		0x404B
>  
> +#define SCM_TS_OPT_ID		0x404C
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 682da3714686..9b40f0a57fbc 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -133,6 +133,8 @@
>  #define SO_PASSPIDFD             0x0055
>  #define SO_PEERPIDFD             0x0056
>  
> +#define SCM_TS_OPT_ID            0x0057
> +
>  #if !defined(__KERNEL__)
>  
>  
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 394c3b66065e..2161d50cf0fd 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -174,6 +174,7 @@ struct inet_cork {
>  	__s16			tos;
>  	char			priority;
>  	__u16			gso_size;
> +	u32			ts_opt_id;
>  	u64			transmit_time;
>  	u32			mark;
>  };
> @@ -241,7 +242,8 @@ struct inet_sock {
>  	struct inet_cork_full	cork;
>  };
>  
> -#define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
> +#define IPCORK_OPT		1	/* ip-options has been held in ipcork.opt */
> +#define IPCORK_TS_OPT_ID	2	/* timestmap opt id has been provided in cmsg */

typo: timestamp

And maybe more relevant:  /* ts_opt_id field is valid, overriding sk_tskey */
  
>  enum {
>  	INET_FLAGS_PKTINFO	= 0,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f51d61fab059..73e21dad5660 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>  	u64 transmit_time;
>  	u32 mark;
>  	u32 tsflags;
> +	u32 ts_opt_id;
>  };
>  
>  static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 8ce8a39a1e5f..db3df3e74b01 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -135,6 +135,8 @@
>  #define SO_PASSPIDFD		76
>  #define SO_PEERPIDFD		77
>  
> +#define SCM_TS_OPT_ID		78

nit: different indentation

> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..e2f145e3f3a1 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -32,8 +32,9 @@ enum {
>  	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>  	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>  	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
> +	SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>  
> -	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
> +	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_CMSG,
>  	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>  				 SOF_TIMESTAMPING_LAST
>  };
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 468b1239606c..560b075765fa 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2859,6 +2859,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>  			return -EINVAL;
>  		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
>  		break;
> +	case SCM_TS_OPT_ID:
> +		/* allow this option for UDP sockets only */
> +		if (!sk_is_udp(sk))
> +			return -EINVAL;

Let's relax the restriction that this is only for UDP.

At least to also support SOCK_RAW. I don't think that requires any
additional code at all?

Extending to TCP should be straightforward too, just a branch
on sockc in tcp_tx_timestamp.

If so, let's support all. It makes for a simpler API if it is
supported uniformly wherever OPT_ID is.

> +		tsflags = READ_ONCE(sk->sk_tsflags);
> +		if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
> +			return -EINVAL;
> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> +			return -EINVAL;
> +		sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
> +		sockc->tsflags |= SOF_TIMESTAMPING_OPT_ID_CMSG;
> +		break;
>  	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
>  	case SCM_RIGHTS:
>  	case SCM_CREDENTIALS:

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
  2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
  2024-09-02 14:29 ` [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Willem de Bruijn
@ 2024-09-02 15:19 ` Jason Xing
  2024-09-02 15:51   ` Willem de Bruijn
  2024-09-02 18:38 ` Simon Horman
  2024-09-03  8:06 ` Dan Carpenter
  4 siblings, 1 reply; 22+ messages in thread
From: Jason Xing @ 2024-09-02 15:19 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, netdev

On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg. This works fine for UDP
> sockets only, and explicit check is added to control message parser.
>
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> ---
>  Documentation/networking/timestamping.rst | 14 ++++++++++++++
>  arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>  arch/mips/include/uapi/asm/socket.h       |  2 ++
>  arch/parisc/include/uapi/asm/socket.h     |  2 ++
>  arch/sparc/include/uapi/asm/socket.h      |  2 ++
>  include/net/inet_sock.h                   |  4 +++-
>  include/net/sock.h                        |  1 +
>  include/uapi/asm-generic/socket.h         |  2 ++
>  include/uapi/linux/net_tstamp.h           |  3 ++-
>  net/core/sock.c                           | 12 ++++++++++++
>  net/ethtool/common.c                      |  1 +
>  net/ipv4/ip_output.c                      | 16 ++++++++++++----
>  net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>  13 files changed, 68 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index 5e93cd71f99f..93b0901e4e8e 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>    among all possibly concurrently outstanding timestamp requests for
>    that socket.
>
> +  With this option enabled user-space application can provide custom
> +  ID for each message sent via UDP socket with control message with
> +  type set to SCM_TS_OPT_ID::
> +
> +    struct msghdr *msg;
> +    ...
> +    cmsg                        = CMSG_FIRSTHDR(msg);
> +    cmsg->cmsg_level            = SOL_SOCKET;
> +    cmsg->cmsg_type             = SO_TIMESTAMPING;
> +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> +    err = sendmsg(fd, msg, 0);
> +
> +
>  SOF_TIMESTAMPING_OPT_ID_TCP:
>    Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>    timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index e94f621903fe..0698e6662cdf 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -10,7 +10,7 @@
>   * Note: we only bother about making the SOL_SOCKET options
>   * same as OSF/1, as that's all that "normal" programs are
>   * likely to set.  We don't necessarily want to be binary
> - * compatible with _everything_.
> + * compatible with _everything_.
>   */
>  #define SOL_SOCKET     0xffff
>
> @@ -140,6 +140,8 @@
>  #define SO_PASSPIDFD           76
>  #define SO_PEERPIDFD           77
>
> +#define SCM_TS_OPT_ID          78
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 60ebaed28a4c..bb3dc8feb205 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -151,6 +151,8 @@
>  #define SO_PASSPIDFD           76
>  #define SO_PEERPIDFD           77
>
> +#define SCM_TS_OPT_ID          78
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index be264c2b1a11..c3ab3b3289eb 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -132,6 +132,8 @@
>  #define SO_PASSPIDFD           0x404A
>  #define SO_PEERPIDFD           0x404B
>
> +#define SCM_TS_OPT_ID          0x404C
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 682da3714686..9b40f0a57fbc 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -133,6 +133,8 @@
>  #define SO_PASSPIDFD             0x0055
>  #define SO_PEERPIDFD             0x0056
>
> +#define SCM_TS_OPT_ID            0x0057
> +
>  #if !defined(__KERNEL__)
>
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 394c3b66065e..2161d50cf0fd 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -174,6 +174,7 @@ struct inet_cork {
>         __s16                   tos;
>         char                    priority;
>         __u16                   gso_size;
> +       u32                     ts_opt_id;
>         u64                     transmit_time;
>         u32                     mark;
>  };
> @@ -241,7 +242,8 @@ struct inet_sock {
>         struct inet_cork_full   cork;
>  };
>
> -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
> +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
> +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
>
>  enum {
>         INET_FLAGS_PKTINFO      = 0,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f51d61fab059..73e21dad5660 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>         u64 transmit_time;
>         u32 mark;
>         u32 tsflags;
> +       u32 ts_opt_id;
>  };
>
>  static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 8ce8a39a1e5f..db3df3e74b01 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -135,6 +135,8 @@
>  #define SO_PASSPIDFD           76
>  #define SO_PEERPIDFD           77
>
> +#define SCM_TS_OPT_ID          78
> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index a2c66b3d7f0f..e2f145e3f3a1 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -32,8 +32,9 @@ enum {
>         SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>         SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>         SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
> +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),

I'm not sure if the new flag needs to be documented as well? After
this patch, people may search the key word in the documentation file
and then find nothing.

If we have this flag here, normally it means we can pass it through
setsockopt, so is it expected? If it's an exception, I reckon that we
can forbid passing/setting this option in sock_set_timestamping() and
document this rule?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 15:19 ` Jason Xing
@ 2024-09-02 15:51   ` Willem de Bruijn
  2024-09-02 19:40     ` Vadim Fedorenko
  2024-09-03 16:01     ` Vadim Fedorenko
  0 siblings, 2 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-09-02 15:51 UTC (permalink / raw)
  To: Jason Xing, Vadim Fedorenko
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, netdev

Jason Xing wrote:
> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >
> > SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> > timestamps and packets sent via socket. Unfortunately, there is no way
> > to reliably predict socket timestamp ID value in case of error returned
> > by sendmsg. For UDP sockets it's impossible because of lockless
> > nature of UDP transmit, several threads may send packets in parallel. In
> > case of RAW sockets MSG_MORE option makes things complicated. More
> > details are in the conversation [1].
> > This patch adds new control message type to give user-space
> > software an opportunity to control the mapping between packets and
> > values by providing ID with each sendmsg. This works fine for UDP
> > sockets only, and explicit check is added to control message parser.
> >
> > [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >
> > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > ---
> >  Documentation/networking/timestamping.rst | 14 ++++++++++++++
> >  arch/alpha/include/uapi/asm/socket.h      |  4 +++-
> >  arch/mips/include/uapi/asm/socket.h       |  2 ++
> >  arch/parisc/include/uapi/asm/socket.h     |  2 ++
> >  arch/sparc/include/uapi/asm/socket.h      |  2 ++
> >  include/net/inet_sock.h                   |  4 +++-
> >  include/net/sock.h                        |  1 +
> >  include/uapi/asm-generic/socket.h         |  2 ++
> >  include/uapi/linux/net_tstamp.h           |  3 ++-
> >  net/core/sock.c                           | 12 ++++++++++++
> >  net/ethtool/common.c                      |  1 +
> >  net/ipv4/ip_output.c                      | 16 ++++++++++++----
> >  net/ipv6/ip6_output.c                     | 16 ++++++++++++----
> >  13 files changed, 68 insertions(+), 11 deletions(-)
> >
> > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > index 5e93cd71f99f..93b0901e4e8e 100644
> > --- a/Documentation/networking/timestamping.rst
> > +++ b/Documentation/networking/timestamping.rst
> > @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
> >    among all possibly concurrently outstanding timestamp requests for
> >    that socket.
> >
> > +  With this option enabled user-space application can provide custom
> > +  ID for each message sent via UDP socket with control message with
> > +  type set to SCM_TS_OPT_ID::
> > +
> > +    struct msghdr *msg;
> > +    ...
> > +    cmsg                        = CMSG_FIRSTHDR(msg);
> > +    cmsg->cmsg_level            = SOL_SOCKET;
> > +    cmsg->cmsg_type             = SO_TIMESTAMPING;
> > +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
> > +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> > +    err = sendmsg(fd, msg, 0);
> > +
> > +
> >  SOF_TIMESTAMPING_OPT_ID_TCP:
> >    Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> >    timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > index e94f621903fe..0698e6662cdf 100644
> > --- a/arch/alpha/include/uapi/asm/socket.h
> > +++ b/arch/alpha/include/uapi/asm/socket.h
> > @@ -10,7 +10,7 @@
> >   * Note: we only bother about making the SOL_SOCKET options
> >   * same as OSF/1, as that's all that "normal" programs are
> >   * likely to set.  We don't necessarily want to be binary
> > - * compatible with _everything_.
> > + * compatible with _everything_.
> >   */
> >  #define SOL_SOCKET     0xffff
> >
> > @@ -140,6 +140,8 @@
> >  #define SO_PASSPIDFD           76
> >  #define SO_PEERPIDFD           77
> >
> > +#define SCM_TS_OPT_ID          78
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > index 60ebaed28a4c..bb3dc8feb205 100644
> > --- a/arch/mips/include/uapi/asm/socket.h
> > +++ b/arch/mips/include/uapi/asm/socket.h
> > @@ -151,6 +151,8 @@
> >  #define SO_PASSPIDFD           76
> >  #define SO_PEERPIDFD           77
> >
> > +#define SCM_TS_OPT_ID          78
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > index be264c2b1a11..c3ab3b3289eb 100644
> > --- a/arch/parisc/include/uapi/asm/socket.h
> > +++ b/arch/parisc/include/uapi/asm/socket.h
> > @@ -132,6 +132,8 @@
> >  #define SO_PASSPIDFD           0x404A
> >  #define SO_PEERPIDFD           0x404B
> >
> > +#define SCM_TS_OPT_ID          0x404C
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > index 682da3714686..9b40f0a57fbc 100644
> > --- a/arch/sparc/include/uapi/asm/socket.h
> > +++ b/arch/sparc/include/uapi/asm/socket.h
> > @@ -133,6 +133,8 @@
> >  #define SO_PASSPIDFD             0x0055
> >  #define SO_PEERPIDFD             0x0056
> >
> > +#define SCM_TS_OPT_ID            0x0057
> > +
> >  #if !defined(__KERNEL__)
> >
> >
> > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> > index 394c3b66065e..2161d50cf0fd 100644
> > --- a/include/net/inet_sock.h
> > +++ b/include/net/inet_sock.h
> > @@ -174,6 +174,7 @@ struct inet_cork {
> >         __s16                   tos;
> >         char                    priority;
> >         __u16                   gso_size;
> > +       u32                     ts_opt_id;
> >         u64                     transmit_time;
> >         u32                     mark;
> >  };
> > @@ -241,7 +242,8 @@ struct inet_sock {
> >         struct inet_cork_full   cork;
> >  };
> >
> > -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
> > +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
> > +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
> >
> >  enum {
> >         INET_FLAGS_PKTINFO      = 0,
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index f51d61fab059..73e21dad5660 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
> >         u64 transmit_time;
> >         u32 mark;
> >         u32 tsflags;
> > +       u32 ts_opt_id;
> >  };
> >
> >  static inline void sockcm_init(struct sockcm_cookie *sockc,
> > diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> > index 8ce8a39a1e5f..db3df3e74b01 100644
> > --- a/include/uapi/asm-generic/socket.h
> > +++ b/include/uapi/asm-generic/socket.h
> > @@ -135,6 +135,8 @@
> >  #define SO_PASSPIDFD           76
> >  #define SO_PEERPIDFD           77
> >
> > +#define SCM_TS_OPT_ID          78
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> > diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> > index a2c66b3d7f0f..e2f145e3f3a1 100644
> > --- a/include/uapi/linux/net_tstamp.h
> > +++ b/include/uapi/linux/net_tstamp.h
> > @@ -32,8 +32,9 @@ enum {
> >         SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
> >         SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
> >         SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
> > +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
> 
> I'm not sure if the new flag needs to be documented as well? After
> this patch, people may search the key word in the documentation file
> and then find nothing.
>
> If we have this flag here, normally it means we can pass it through
> setsockopt, so is it expected? If it's an exception, I reckon that we
> can forbid passing/setting this option in sock_set_timestamping() and
> document this rule?

Good point, thanks.

It must definitely not be part of SOF_TIMESTAMPING_MASK. My bad for
suggesting without giving it much thought.

The bit is kernel-internal. No need to even mention it in user-facing
documentation. But anyone reading net_tstamp.h might wonder what it
does.

It should not even be in a UAPI header, but in an internal one.
Probably include/net/sock.h, near SK_FLAGS_TIMESTAMP.

Maybe we can reserve bit 31 in u32 sk_tsflags. And if we ever have
to double that flag size, it can move up to 63, as it is not UAPI in
any way. This is a workaround to having a separate flags field in
sockcm_cookie.

And have a BUILD_BUG_ON if SOF_TIMESTAMPING_LAST reaches this reserved
region.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
                   ` (2 preceding siblings ...)
  2024-09-02 15:19 ` Jason Xing
@ 2024-09-02 18:38 ` Simon Horman
  2024-09-02 19:35   ` Vadim Fedorenko
  2024-09-03  8:06 ` Dan Carpenter
  4 siblings, 1 reply; 22+ messages in thread
From: Simon Horman @ 2024-09-02 18:38 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	David Ahern, Jason Xing, netdev

On Mon, Sep 02, 2024 at 06:09:35AM -0700, Vadim Fedorenko wrote:
> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> timestamps and packets sent via socket. Unfortunately, there is no way
> to reliably predict socket timestamp ID value in case of error returned
> by sendmsg. For UDP sockets it's impossible because of lockless
> nature of UDP transmit, several threads may send packets in parallel. In
> case of RAW sockets MSG_MORE option makes things complicated. More
> details are in the conversation [1].
> This patch adds new control message type to give user-space
> software an opportunity to control the mapping between packets and
> values by providing ID with each sendmsg. This works fine for UDP
> sockets only, and explicit check is added to control message parser.
> 
> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> 
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>

...

> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c

...

> @@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
>  			flags &= ~MSG_SPLICE_PAGES;
>  	}
>  
> -	hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
> -		     READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
> -	if (hold_tskey)
> -		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> +	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
> +	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> +		if (cork->flags & IPCORK_TS_OPT_ID) {
> +			tskey = cork->ts_opt_id;
> +		} else {
> +			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> +			hold_tskey = true;

Hi Vadim,

I think that hold_tskey also needs to be assigned a value in
the cases where wither of the if conditions above are false.

Flagged by Smatch.

> +		}
> +	}
>  
>  	/*
>  	 * Let's try using as much space as possible.

-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 18:38 ` Simon Horman
@ 2024-09-02 19:35   ` Vadim Fedorenko
  2024-09-03 15:23     ` Simon Horman
  0 siblings, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 19:35 UTC (permalink / raw)
  To: Simon Horman
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	Jason Xing, netdev

On 02/09/2024 19:38, Simon Horman wrote:
> On Mon, Sep 02, 2024 at 06:09:35AM -0700, Vadim Fedorenko wrote:
>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>> timestamps and packets sent via socket. Unfortunately, there is no way
>> to reliably predict socket timestamp ID value in case of error returned
>> by sendmsg. For UDP sockets it's impossible because of lockless
>> nature of UDP transmit, several threads may send packets in parallel. In
>> case of RAW sockets MSG_MORE option makes things complicated. More
>> details are in the conversation [1].
>> This patch adds new control message type to give user-space
>> software an opportunity to control the mapping between packets and
>> values by providing ID with each sendmsg. This works fine for UDP
>> sockets only, and explicit check is added to control message parser.
>>
>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> 
> ...
> 
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> 
> ...
> 
>> @@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
>>   			flags &= ~MSG_SPLICE_PAGES;
>>   	}
>>   
>> -	hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> -		     READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
>> -	if (hold_tskey)
>> -		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> +	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
>> +	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>> +		if (cork->flags & IPCORK_TS_OPT_ID) {
>> +			tskey = cork->ts_opt_id;
>> +		} else {
>> +			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
>> +			hold_tskey = true;
> 
> Hi Vadim,
> 
> I think that hold_tskey also needs to be assigned a value in
> the cases where wither of the if conditions above are false.

Hi Simon!

Yes, you are right. I should probably init it with false to avoid
'else' statement.

Thanks,
Vadim


> Flagged by Smatch.
> 
>> +		}
>> +	}
>>   
>>   	/*
>>   	 * Let's try using as much space as possible.
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 15:51   ` Willem de Bruijn
@ 2024-09-02 19:40     ` Vadim Fedorenko
  2024-09-02 20:59       ` Willem de Bruijn
  2024-09-03 16:01     ` Vadim Fedorenko
  1 sibling, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 19:40 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	netdev, Jason Xing

On 02/09/2024 16:51, Willem de Bruijn wrote:
> Jason Xing wrote:
>> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>>
>>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>>> timestamps and packets sent via socket. Unfortunately, there is no way
>>> to reliably predict socket timestamp ID value in case of error returned
>>> by sendmsg. For UDP sockets it's impossible because of lockless
>>> nature of UDP transmit, several threads may send packets in parallel. In
>>> case of RAW sockets MSG_MORE option makes things complicated. More
>>> details are in the conversation [1].
>>> This patch adds new control message type to give user-space
>>> software an opportunity to control the mapping between packets and
>>> values by providing ID with each sendmsg. This works fine for UDP
>>> sockets only, and explicit check is added to control message parser.
>>>
>>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>>
>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>> ---
>>>   Documentation/networking/timestamping.rst | 14 ++++++++++++++
>>>   arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>>>   arch/mips/include/uapi/asm/socket.h       |  2 ++
>>>   arch/parisc/include/uapi/asm/socket.h     |  2 ++
>>>   arch/sparc/include/uapi/asm/socket.h      |  2 ++
>>>   include/net/inet_sock.h                   |  4 +++-
>>>   include/net/sock.h                        |  1 +
>>>   include/uapi/asm-generic/socket.h         |  2 ++
>>>   include/uapi/linux/net_tstamp.h           |  3 ++-
>>>   net/core/sock.c                           | 12 ++++++++++++
>>>   net/ethtool/common.c                      |  1 +
>>>   net/ipv4/ip_output.c                      | 16 ++++++++++++----
>>>   net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>>>   13 files changed, 68 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>>> index 5e93cd71f99f..93b0901e4e8e 100644
>>> --- a/Documentation/networking/timestamping.rst
>>> +++ b/Documentation/networking/timestamping.rst
>>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>>>     among all possibly concurrently outstanding timestamp requests for
>>>     that socket.
>>>
>>> +  With this option enabled user-space application can provide custom
>>> +  ID for each message sent via UDP socket with control message with
>>> +  type set to SCM_TS_OPT_ID::
>>> +
>>> +    struct msghdr *msg;
>>> +    ...
>>> +    cmsg                        = CMSG_FIRSTHDR(msg);
>>> +    cmsg->cmsg_level            = SOL_SOCKET;
>>> +    cmsg->cmsg_type             = SO_TIMESTAMPING;
>>> +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
>>> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>>> +    err = sendmsg(fd, msg, 0);
>>> +
>>> +
>>>   SOF_TIMESTAMPING_OPT_ID_TCP:
>>>     Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>>>     timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>>> index e94f621903fe..0698e6662cdf 100644
>>> --- a/arch/alpha/include/uapi/asm/socket.h
>>> +++ b/arch/alpha/include/uapi/asm/socket.h
>>> @@ -10,7 +10,7 @@
>>>    * Note: we only bother about making the SOL_SOCKET options
>>>    * same as OSF/1, as that's all that "normal" programs are
>>>    * likely to set.  We don't necessarily want to be binary
>>> - * compatible with _everything_.
>>> + * compatible with _everything_.
>>>    */
>>>   #define SOL_SOCKET     0xffff
>>>
>>> @@ -140,6 +140,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>>> index 60ebaed28a4c..bb3dc8feb205 100644
>>> --- a/arch/mips/include/uapi/asm/socket.h
>>> +++ b/arch/mips/include/uapi/asm/socket.h
>>> @@ -151,6 +151,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>>> index be264c2b1a11..c3ab3b3289eb 100644
>>> --- a/arch/parisc/include/uapi/asm/socket.h
>>> +++ b/arch/parisc/include/uapi/asm/socket.h
>>> @@ -132,6 +132,8 @@
>>>   #define SO_PASSPIDFD           0x404A
>>>   #define SO_PEERPIDFD           0x404B
>>>
>>> +#define SCM_TS_OPT_ID          0x404C
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>>> index 682da3714686..9b40f0a57fbc 100644
>>> --- a/arch/sparc/include/uapi/asm/socket.h
>>> +++ b/arch/sparc/include/uapi/asm/socket.h
>>> @@ -133,6 +133,8 @@
>>>   #define SO_PASSPIDFD             0x0055
>>>   #define SO_PEERPIDFD             0x0056
>>>
>>> +#define SCM_TS_OPT_ID            0x0057
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>
>>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>>> index 394c3b66065e..2161d50cf0fd 100644
>>> --- a/include/net/inet_sock.h
>>> +++ b/include/net/inet_sock.h
>>> @@ -174,6 +174,7 @@ struct inet_cork {
>>>          __s16                   tos;
>>>          char                    priority;
>>>          __u16                   gso_size;
>>> +       u32                     ts_opt_id;
>>>          u64                     transmit_time;
>>>          u32                     mark;
>>>   };
>>> @@ -241,7 +242,8 @@ struct inet_sock {
>>>          struct inet_cork_full   cork;
>>>   };
>>>
>>> -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
>>> +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
>>> +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
>>>
>>>   enum {
>>>          INET_FLAGS_PKTINFO      = 0,
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index f51d61fab059..73e21dad5660 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>>>          u64 transmit_time;
>>>          u32 mark;
>>>          u32 tsflags;
>>> +       u32 ts_opt_id;
>>>   };
>>>
>>>   static inline void sockcm_init(struct sockcm_cookie *sockc,
>>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>>> index 8ce8a39a1e5f..db3df3e74b01 100644
>>> --- a/include/uapi/asm-generic/socket.h
>>> +++ b/include/uapi/asm-generic/socket.h
>>> @@ -135,6 +135,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>>> index a2c66b3d7f0f..e2f145e3f3a1 100644
>>> --- a/include/uapi/linux/net_tstamp.h
>>> +++ b/include/uapi/linux/net_tstamp.h
>>> @@ -32,8 +32,9 @@ enum {
>>>          SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>>>          SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>>>          SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>>> +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>>
>> I'm not sure if the new flag needs to be documented as well? After
>> this patch, people may search the key word in the documentation file
>> and then find nothing.
>>
>> If we have this flag here, normally it means we can pass it through
>> setsockopt, so is it expected? If it's an exception, I reckon that we
>> can forbid passing/setting this option in sock_set_timestamping() and
>> document this rule?
> 
> Good point, thanks.
> 
> It must definitely not be part of SOF_TIMESTAMPING_MASK. My bad for
> suggesting without giving it much thought.
> 
> The bit is kernel-internal. No need to even mention it in user-facing
> documentation. But anyone reading net_tstamp.h might wonder what it
> does.
> 
> It should not even be in a UAPI header, but in an internal one.
> Probably include/net/sock.h, near SK_FLAGS_TIMESTAMP.
> 
> Maybe we can reserve bit 31 in u32 sk_tsflags. And if we ever have
> to double that flag size, it can move up to 63, as it is not UAPI in
> any way. This is a workaround to having a separate flags field in
> sockcm_cookie.
> 
> And have a BUILD_BUG_ON if SOF_TIMESTAMPING_LAST reaches this reserved
> region.

Yeah, I was also thinking of it not being UAPI, that's why I tried to
avoid it in my RFC using 0 as a reserved value. Do you think
SK_FLAGS_CMSG_TS_OPT_ID is good naming for it?

Thanks,
Vadim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 19:40     ` Vadim Fedorenko
@ 2024-09-02 20:59       ` Willem de Bruijn
  2024-09-02 21:10         ` Vadim Fedorenko
  0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2024-09-02 20:59 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	netdev, Jason Xing

Vadim Fedorenko wrote:
> On 02/09/2024 16:51, Willem de Bruijn wrote:
> > Jason Xing wrote:
> >> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >>>
> >>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> >>> timestamps and packets sent via socket. Unfortunately, there is no way
> >>> to reliably predict socket timestamp ID value in case of error returned
> >>> by sendmsg. For UDP sockets it's impossible because of lockless
> >>> nature of UDP transmit, several threads may send packets in parallel. In
> >>> case of RAW sockets MSG_MORE option makes things complicated. More
> >>> details are in the conversation [1].
> >>> This patch adds new control message type to give user-space
> >>> software an opportunity to control the mapping between packets and
> >>> values by providing ID with each sendmsg. This works fine for UDP
> >>> sockets only, and explicit check is added to control message parser.
> >>>
> >>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >>>
> >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >>> ---
> >>>   Documentation/networking/timestamping.rst | 14 ++++++++++++++
> >>>   arch/alpha/include/uapi/asm/socket.h      |  4 +++-
> >>>   arch/mips/include/uapi/asm/socket.h       |  2 ++
> >>>   arch/parisc/include/uapi/asm/socket.h     |  2 ++
> >>>   arch/sparc/include/uapi/asm/socket.h      |  2 ++
> >>>   include/net/inet_sock.h                   |  4 +++-
> >>>   include/net/sock.h                        |  1 +
> >>>   include/uapi/asm-generic/socket.h         |  2 ++
> >>>   include/uapi/linux/net_tstamp.h           |  3 ++-
> >>>   net/core/sock.c                           | 12 ++++++++++++
> >>>   net/ethtool/common.c                      |  1 +
> >>>   net/ipv4/ip_output.c                      | 16 ++++++++++++----
> >>>   net/ipv6/ip6_output.c                     | 16 ++++++++++++----
> >>>   13 files changed, 68 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> >>> index 5e93cd71f99f..93b0901e4e8e 100644
> >>> --- a/Documentation/networking/timestamping.rst
> >>> +++ b/Documentation/networking/timestamping.rst
> >>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
> >>>     among all possibly concurrently outstanding timestamp requests for
> >>>     that socket.
> >>>
> >>> +  With this option enabled user-space application can provide custom
> >>> +  ID for each message sent via UDP socket with control message with
> >>> +  type set to SCM_TS_OPT_ID::
> >>> +
> >>> +    struct msghdr *msg;
> >>> +    ...
> >>> +    cmsg                        = CMSG_FIRSTHDR(msg);
> >>> +    cmsg->cmsg_level            = SOL_SOCKET;
> >>> +    cmsg->cmsg_type             = SO_TIMESTAMPING;
> >>> +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
> >>> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> >>> +    err = sendmsg(fd, msg, 0);
> >>> +
> >>> +
> >>>   SOF_TIMESTAMPING_OPT_ID_TCP:
> >>>     Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> >>>     timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> >>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> >>> index e94f621903fe..0698e6662cdf 100644
> >>> --- a/arch/alpha/include/uapi/asm/socket.h
> >>> +++ b/arch/alpha/include/uapi/asm/socket.h
> >>> @@ -10,7 +10,7 @@
> >>>    * Note: we only bother about making the SOL_SOCKET options
> >>>    * same as OSF/1, as that's all that "normal" programs are
> >>>    * likely to set.  We don't necessarily want to be binary
> >>> - * compatible with _everything_.
> >>> + * compatible with _everything_.
> >>>    */
> >>>   #define SOL_SOCKET     0xffff
> >>>
> >>> @@ -140,6 +140,8 @@
> >>>   #define SO_PASSPIDFD           76
> >>>   #define SO_PEERPIDFD           77
> >>>
> >>> +#define SCM_TS_OPT_ID          78
> >>> +
> >>>   #if !defined(__KERNEL__)
> >>>
> >>>   #if __BITS_PER_LONG == 64
> >>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> >>> index 60ebaed28a4c..bb3dc8feb205 100644
> >>> --- a/arch/mips/include/uapi/asm/socket.h
> >>> +++ b/arch/mips/include/uapi/asm/socket.h
> >>> @@ -151,6 +151,8 @@
> >>>   #define SO_PASSPIDFD           76
> >>>   #define SO_PEERPIDFD           77
> >>>
> >>> +#define SCM_TS_OPT_ID          78
> >>> +
> >>>   #if !defined(__KERNEL__)
> >>>
> >>>   #if __BITS_PER_LONG == 64
> >>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> >>> index be264c2b1a11..c3ab3b3289eb 100644
> >>> --- a/arch/parisc/include/uapi/asm/socket.h
> >>> +++ b/arch/parisc/include/uapi/asm/socket.h
> >>> @@ -132,6 +132,8 @@
> >>>   #define SO_PASSPIDFD           0x404A
> >>>   #define SO_PEERPIDFD           0x404B
> >>>
> >>> +#define SCM_TS_OPT_ID          0x404C
> >>> +
> >>>   #if !defined(__KERNEL__)
> >>>
> >>>   #if __BITS_PER_LONG == 64
> >>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> >>> index 682da3714686..9b40f0a57fbc 100644
> >>> --- a/arch/sparc/include/uapi/asm/socket.h
> >>> +++ b/arch/sparc/include/uapi/asm/socket.h
> >>> @@ -133,6 +133,8 @@
> >>>   #define SO_PASSPIDFD             0x0055
> >>>   #define SO_PEERPIDFD             0x0056
> >>>
> >>> +#define SCM_TS_OPT_ID            0x0057
> >>> +
> >>>   #if !defined(__KERNEL__)
> >>>
> >>>
> >>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> >>> index 394c3b66065e..2161d50cf0fd 100644
> >>> --- a/include/net/inet_sock.h
> >>> +++ b/include/net/inet_sock.h
> >>> @@ -174,6 +174,7 @@ struct inet_cork {
> >>>          __s16                   tos;
> >>>          char                    priority;
> >>>          __u16                   gso_size;
> >>> +       u32                     ts_opt_id;
> >>>          u64                     transmit_time;
> >>>          u32                     mark;
> >>>   };
> >>> @@ -241,7 +242,8 @@ struct inet_sock {
> >>>          struct inet_cork_full   cork;
> >>>   };
> >>>
> >>> -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
> >>> +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
> >>> +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
> >>>
> >>>   enum {
> >>>          INET_FLAGS_PKTINFO      = 0,
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index f51d61fab059..73e21dad5660 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
> >>>          u64 transmit_time;
> >>>          u32 mark;
> >>>          u32 tsflags;
> >>> +       u32 ts_opt_id;
> >>>   };
> >>>
> >>>   static inline void sockcm_init(struct sockcm_cookie *sockc,
> >>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> >>> index 8ce8a39a1e5f..db3df3e74b01 100644
> >>> --- a/include/uapi/asm-generic/socket.h
> >>> +++ b/include/uapi/asm-generic/socket.h
> >>> @@ -135,6 +135,8 @@
> >>>   #define SO_PASSPIDFD           76
> >>>   #define SO_PEERPIDFD           77
> >>>
> >>> +#define SCM_TS_OPT_ID          78
> >>> +
> >>>   #if !defined(__KERNEL__)
> >>>
> >>>   #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> >>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> >>> index a2c66b3d7f0f..e2f145e3f3a1 100644
> >>> --- a/include/uapi/linux/net_tstamp.h
> >>> +++ b/include/uapi/linux/net_tstamp.h
> >>> @@ -32,8 +32,9 @@ enum {
> >>>          SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
> >>>          SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
> >>>          SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
> >>> +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
> >>
> >> I'm not sure if the new flag needs to be documented as well? After
> >> this patch, people may search the key word in the documentation file
> >> and then find nothing.
> >>
> >> If we have this flag here, normally it means we can pass it through
> >> setsockopt, so is it expected? If it's an exception, I reckon that we
> >> can forbid passing/setting this option in sock_set_timestamping() and
> >> document this rule?
> > 
> > Good point, thanks.
> > 
> > It must definitely not be part of SOF_TIMESTAMPING_MASK. My bad for
> > suggesting without giving it much thought.
> > 
> > The bit is kernel-internal. No need to even mention it in user-facing
> > documentation. But anyone reading net_tstamp.h might wonder what it
> > does.
> > 
> > It should not even be in a UAPI header, but in an internal one.
> > Probably include/net/sock.h, near SK_FLAGS_TIMESTAMP.
> > 
> > Maybe we can reserve bit 31 in u32 sk_tsflags. And if we ever have
> > to double that flag size, it can move up to 63, as it is not UAPI in
> > any way. This is a workaround to having a separate flags field in
> > sockcm_cookie.
> > 
> > And have a BUILD_BUG_ON if SOF_TIMESTAMPING_LAST reaches this reserved
> > region.
> 
> Yeah, I was also thinking of it not being UAPI, that's why I tried to
> avoid it in my RFC using 0 as a reserved value. Do you think
> SK_FLAGS_CMSG_TS_OPT_ID is good naming for it?

It's relevant only to sockcm_cookie, so maybe SOCKCM_FLAG_TS_OPT_ID?

One day we'll need another sockcm_cookie flag, we'll grow it to add a
real flags field and can get rid of this hack.

The struct is used embedded in ipcm_cookie and ipcm6_cookie, which
would grow as a result.

It is also simply stack allocated in cases like performance sensitive
tcp_sendmsg_locked. Here, the main cost is a slightly more expensive
zeroing in sockcm_init.

For now, I think we should just stick with using the highest bit in
sockcm.tsflags.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 14:29 ` [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Willem de Bruijn
@ 2024-09-02 21:07   ` Vadim Fedorenko
  2024-09-03 20:40     ` Willem de Bruijn
  0 siblings, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 21:07 UTC (permalink / raw)
  To: Willem de Bruijn, Willem de Bruijn
  Cc: netdev, Jakub Kicinski, Jason Xing, Paolo Abeni, David Ahern

On 02/09/2024 15:29, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>> timestamps and packets sent via socket. Unfortunately, there is no way
>> to reliably predict socket timestamp ID value in case of error returned
>> by sendmsg. For UDP sockets it's impossible because of lockless
>> nature of UDP transmit, several threads may send packets in parallel. In
>> case of RAW sockets MSG_MORE option makes things complicated. More
>> details are in the conversation [1].
>> This patch adds new control message type to give user-space
>> software an opportunity to control the mapping between packets and
>> values by providing ID with each sendmsg. This works fine for UDP
>> sockets only, and explicit check is added to control message parser.
>>
>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
>>   Documentation/networking/timestamping.rst | 14 ++++++++++++++
>>   arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>>   arch/mips/include/uapi/asm/socket.h       |  2 ++
>>   arch/parisc/include/uapi/asm/socket.h     |  2 ++
>>   arch/sparc/include/uapi/asm/socket.h      |  2 ++
>>   include/net/inet_sock.h                   |  4 +++-
>>   include/net/sock.h                        |  1 +
>>   include/uapi/asm-generic/socket.h         |  2 ++
>>   include/uapi/linux/net_tstamp.h           |  3 ++-
>>   net/core/sock.c                           | 12 ++++++++++++
>>   net/ethtool/common.c                      |  1 +
>>   net/ipv4/ip_output.c                      | 16 ++++++++++++----
>>   net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>>   13 files changed, 68 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>> index 5e93cd71f99f..93b0901e4e8e 100644
>> --- a/Documentation/networking/timestamping.rst
>> +++ b/Documentation/networking/timestamping.rst
>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>>     among all possibly concurrently outstanding timestamp requests for
>>     that socket.
>>   
>> +  With this option enabled user-space application can provide custom
>> +  ID for each message sent via UDP socket with control message with
>> +  type set to SCM_TS_OPT_ID::
>> +
>> +    struct msghdr *msg;
>> +    ...
>> +    cmsg			 = CMSG_FIRSTHDR(msg);
>> +    cmsg->cmsg_level		 = SOL_SOCKET;
>> +    cmsg->cmsg_type		 = SO_TIMESTAMPING;
>> +    cmsg->cmsg_len		 = CMSG_LEN(sizeof(__u32));
>> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>> +    err = sendmsg(fd, msg, 0);
>> +
> 
> Please make it clear that this CMSG is optional.
> 
> The process can optionally override the default generated ID, by
> passing a specific ID with control message SCM_TS_OPT_ID:

Ok, I'll re-phrase it this way, thanks!


>>   SOF_TIMESTAMPING_OPT_ID_TCP:
>>     Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>>     timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index e94f621903fe..0698e6662cdf 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -10,7 +10,7 @@
>>    * Note: we only bother about making the SOL_SOCKET options
>>    * same as OSF/1, as that's all that "normal" programs are
>>    * likely to set.  We don't necessarily want to be binary
>> - * compatible with _everything_.
>> + * compatible with _everything_.
> 
> Is this due to a checkpatch warning? If so, please add a brief comment
> to the commit message to show that this change is intentional. If not,
> please don't touch unrelated code.

I'll remove it, because it looks like it was some unhappy linter...

>>    */
>>   #define SOL_SOCKET	0xffff
>>   
>> @@ -140,6 +140,8 @@
>>   #define SO_PASSPIDFD		76
>>   #define SO_PEERPIDFD		77
>>   
>> +#define SCM_TS_OPT_ID		78
>> +
>>   #if !defined(__KERNEL__)
>>   
>>   #if __BITS_PER_LONG == 64
>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>> index 60ebaed28a4c..bb3dc8feb205 100644
>> --- a/arch/mips/include/uapi/asm/socket.h
>> +++ b/arch/mips/include/uapi/asm/socket.h
>> @@ -151,6 +151,8 @@
>>   #define SO_PASSPIDFD		76
>>   #define SO_PEERPIDFD		77
>>   
>> +#define SCM_TS_OPT_ID		78
>> +
>>   #if !defined(__KERNEL__)
>>   
>>   #if __BITS_PER_LONG == 64
>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>> index be264c2b1a11..c3ab3b3289eb 100644
>> --- a/arch/parisc/include/uapi/asm/socket.h
>> +++ b/arch/parisc/include/uapi/asm/socket.h
>> @@ -132,6 +132,8 @@
>>   #define SO_PASSPIDFD		0x404A
>>   #define SO_PEERPIDFD		0x404B
>>   
>> +#define SCM_TS_OPT_ID		0x404C
>> +
>>   #if !defined(__KERNEL__)
>>   
>>   #if __BITS_PER_LONG == 64
>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>> index 682da3714686..9b40f0a57fbc 100644
>> --- a/arch/sparc/include/uapi/asm/socket.h
>> +++ b/arch/sparc/include/uapi/asm/socket.h
>> @@ -133,6 +133,8 @@
>>   #define SO_PASSPIDFD             0x0055
>>   #define SO_PEERPIDFD             0x0056
>>   
>> +#define SCM_TS_OPT_ID            0x0057
>> +
>>   #if !defined(__KERNEL__)
>>   
>>   
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index 394c3b66065e..2161d50cf0fd 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -174,6 +174,7 @@ struct inet_cork {
>>   	__s16			tos;
>>   	char			priority;
>>   	__u16			gso_size;
>> +	u32			ts_opt_id;
>>   	u64			transmit_time;
>>   	u32			mark;
>>   };
>> @@ -241,7 +242,8 @@ struct inet_sock {
>>   	struct inet_cork_full	cork;
>>   };
>>   
>> -#define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
>> +#define IPCORK_OPT		1	/* ip-options has been held in ipcork.opt */
>> +#define IPCORK_TS_OPT_ID	2	/* timestmap opt id has been provided in cmsg */
> 
> typo: timestamp
> 
> And maybe more relevant:  /* ts_opt_id field is valid, overriding sk_tskey */

I'll change it

>>   enum {
>>   	INET_FLAGS_PKTINFO	= 0,
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index f51d61fab059..73e21dad5660 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>>   	u64 transmit_time;
>>   	u32 mark;
>>   	u32 tsflags;
>> +	u32 ts_opt_id;
>>   };
>>   
>>   static inline void sockcm_init(struct sockcm_cookie *sockc,
>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>> index 8ce8a39a1e5f..db3df3e74b01 100644
>> --- a/include/uapi/asm-generic/socket.h
>> +++ b/include/uapi/asm-generic/socket.h
>> @@ -135,6 +135,8 @@
>>   #define SO_PASSPIDFD		76
>>   #define SO_PEERPIDFD		77
>>   
>> +#define SCM_TS_OPT_ID		78
> 
> nit: different indentation

Hmm... that's interesting, it's ok in the code, there no spaces before
#define. I'll re-check it in the patch in v3.

>> +
>>   #if !defined(__KERNEL__)
>>   
>>   #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index a2c66b3d7f0f..e2f145e3f3a1 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -32,8 +32,9 @@ enum {
>>   	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>>   	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>>   	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>> +	SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>>   
>> -	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
>> +	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_CMSG,
>>   	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>>   				 SOF_TIMESTAMPING_LAST
>>   };
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 468b1239606c..560b075765fa 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2859,6 +2859,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>>   			return -EINVAL;
>>   		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
>>   		break;
>> +	case SCM_TS_OPT_ID:
>> +		/* allow this option for UDP sockets only */
>> +		if (!sk_is_udp(sk))
>> +			return -EINVAL;
> 
> Let's relax the restriction that this is only for UDP.
> 
> At least to also support SOCK_RAW. I don't think that requires any
> additional code at all?

RAW sockets use skb_setup_tx_timestamps which does atomic operation of
incrementing sk_tskey when in _sock_tx_timestamp. So I'll have to
convert all spots (can, ipv4/raw, ipv6/raw, 3 x af_packet) to use the
same logic as in udp path (sock_tx_timestamp) and add conditions.
Or change skb_setup_tx_timestamps to do the logic and take different
arguments. It may look as a big refactoring, so I would like to make
it as a follow-up series.

> Extending to TCP should be straightforward too, just a branch
> on sockc in tcp_tx_timestamp.

TCP part looks a bit easier, as you said, I have to adjust
tcp_tx_timestamp and the logic is straightforward. I still have to
provide a pointer to sock coockie instead of flags, but there is only
one caller of this function and it's much easier than with RAW sockets.

> If so, let's support all. It makes for a simpler API if it is
> supported uniformly wherever OPT_ID is.

If you think that the way I explained for RAW sockets is good enough,
I can send all of them as a single patcheset. Otherwise I would like to
add RAW sockets in follow-up series.

>> +		tsflags = READ_ONCE(sk->sk_tsflags);
>> +		if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
>> +			return -EINVAL;
>> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
>> +			return -EINVAL;
>> +		sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
>> +		sockc->tsflags |= SOF_TIMESTAMPING_OPT_ID_CMSG;
>> +		break;
>>   	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
>>   	case SCM_RIGHTS:
>>   	case SCM_CREDENTIALS:


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 20:59       ` Willem de Bruijn
@ 2024-09-02 21:10         ` Vadim Fedorenko
  0 siblings, 0 replies; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-02 21:10 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	netdev, Jason Xing

On 02/09/2024 21:59, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 02/09/2024 16:51, Willem de Bruijn wrote:
>>> Jason Xing wrote:
>>>> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>>>>
>>>>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>>>>> timestamps and packets sent via socket. Unfortunately, there is no way
>>>>> to reliably predict socket timestamp ID value in case of error returned
>>>>> by sendmsg. For UDP sockets it's impossible because of lockless
>>>>> nature of UDP transmit, several threads may send packets in parallel. In
>>>>> case of RAW sockets MSG_MORE option makes things complicated. More
>>>>> details are in the conversation [1].
>>>>> This patch adds new control message type to give user-space
>>>>> software an opportunity to control the mapping between packets and
>>>>> values by providing ID with each sendmsg. This works fine for UDP
>>>>> sockets only, and explicit check is added to control message parser.
>>>>>
>>>>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>>>>
>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>> ---
>>>>>    Documentation/networking/timestamping.rst | 14 ++++++++++++++
>>>>>    arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>>>>>    arch/mips/include/uapi/asm/socket.h       |  2 ++
>>>>>    arch/parisc/include/uapi/asm/socket.h     |  2 ++
>>>>>    arch/sparc/include/uapi/asm/socket.h      |  2 ++
>>>>>    include/net/inet_sock.h                   |  4 +++-
>>>>>    include/net/sock.h                        |  1 +
>>>>>    include/uapi/asm-generic/socket.h         |  2 ++
>>>>>    include/uapi/linux/net_tstamp.h           |  3 ++-
>>>>>    net/core/sock.c                           | 12 ++++++++++++
>>>>>    net/ethtool/common.c                      |  1 +
>>>>>    net/ipv4/ip_output.c                      | 16 ++++++++++++----
>>>>>    net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>>>>>    13 files changed, 68 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>>>>> index 5e93cd71f99f..93b0901e4e8e 100644
>>>>> --- a/Documentation/networking/timestamping.rst
>>>>> +++ b/Documentation/networking/timestamping.rst
>>>>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>>>>>      among all possibly concurrently outstanding timestamp requests for
>>>>>      that socket.
>>>>>
>>>>> +  With this option enabled user-space application can provide custom
>>>>> +  ID for each message sent via UDP socket with control message with
>>>>> +  type set to SCM_TS_OPT_ID::
>>>>> +
>>>>> +    struct msghdr *msg;
>>>>> +    ...
>>>>> +    cmsg                        = CMSG_FIRSTHDR(msg);
>>>>> +    cmsg->cmsg_level            = SOL_SOCKET;
>>>>> +    cmsg->cmsg_type             = SO_TIMESTAMPING;
>>>>> +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
>>>>> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>>>>> +    err = sendmsg(fd, msg, 0);
>>>>> +
>>>>> +
>>>>>    SOF_TIMESTAMPING_OPT_ID_TCP:
>>>>>      Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>>>>>      timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>>>>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>>>>> index e94f621903fe..0698e6662cdf 100644
>>>>> --- a/arch/alpha/include/uapi/asm/socket.h
>>>>> +++ b/arch/alpha/include/uapi/asm/socket.h
>>>>> @@ -10,7 +10,7 @@
>>>>>     * Note: we only bother about making the SOL_SOCKET options
>>>>>     * same as OSF/1, as that's all that "normal" programs are
>>>>>     * likely to set.  We don't necessarily want to be binary
>>>>> - * compatible with _everything_.
>>>>> + * compatible with _everything_.
>>>>>     */
>>>>>    #define SOL_SOCKET     0xffff
>>>>>
>>>>> @@ -140,6 +140,8 @@
>>>>>    #define SO_PASSPIDFD           76
>>>>>    #define SO_PEERPIDFD           77
>>>>>
>>>>> +#define SCM_TS_OPT_ID          78
>>>>> +
>>>>>    #if !defined(__KERNEL__)
>>>>>
>>>>>    #if __BITS_PER_LONG == 64
>>>>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>>>>> index 60ebaed28a4c..bb3dc8feb205 100644
>>>>> --- a/arch/mips/include/uapi/asm/socket.h
>>>>> +++ b/arch/mips/include/uapi/asm/socket.h
>>>>> @@ -151,6 +151,8 @@
>>>>>    #define SO_PASSPIDFD           76
>>>>>    #define SO_PEERPIDFD           77
>>>>>
>>>>> +#define SCM_TS_OPT_ID          78
>>>>> +
>>>>>    #if !defined(__KERNEL__)
>>>>>
>>>>>    #if __BITS_PER_LONG == 64
>>>>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>>>>> index be264c2b1a11..c3ab3b3289eb 100644
>>>>> --- a/arch/parisc/include/uapi/asm/socket.h
>>>>> +++ b/arch/parisc/include/uapi/asm/socket.h
>>>>> @@ -132,6 +132,8 @@
>>>>>    #define SO_PASSPIDFD           0x404A
>>>>>    #define SO_PEERPIDFD           0x404B
>>>>>
>>>>> +#define SCM_TS_OPT_ID          0x404C
>>>>> +
>>>>>    #if !defined(__KERNEL__)
>>>>>
>>>>>    #if __BITS_PER_LONG == 64
>>>>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>>>>> index 682da3714686..9b40f0a57fbc 100644
>>>>> --- a/arch/sparc/include/uapi/asm/socket.h
>>>>> +++ b/arch/sparc/include/uapi/asm/socket.h
>>>>> @@ -133,6 +133,8 @@
>>>>>    #define SO_PASSPIDFD             0x0055
>>>>>    #define SO_PEERPIDFD             0x0056
>>>>>
>>>>> +#define SCM_TS_OPT_ID            0x0057
>>>>> +
>>>>>    #if !defined(__KERNEL__)
>>>>>
>>>>>
>>>>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>>>>> index 394c3b66065e..2161d50cf0fd 100644
>>>>> --- a/include/net/inet_sock.h
>>>>> +++ b/include/net/inet_sock.h
>>>>> @@ -174,6 +174,7 @@ struct inet_cork {
>>>>>           __s16                   tos;
>>>>>           char                    priority;
>>>>>           __u16                   gso_size;
>>>>> +       u32                     ts_opt_id;
>>>>>           u64                     transmit_time;
>>>>>           u32                     mark;
>>>>>    };
>>>>> @@ -241,7 +242,8 @@ struct inet_sock {
>>>>>           struct inet_cork_full   cork;
>>>>>    };
>>>>>
>>>>> -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
>>>>> +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
>>>>> +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
>>>>>
>>>>>    enum {
>>>>>           INET_FLAGS_PKTINFO      = 0,
>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>>> index f51d61fab059..73e21dad5660 100644
>>>>> --- a/include/net/sock.h
>>>>> +++ b/include/net/sock.h
>>>>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>>>>>           u64 transmit_time;
>>>>>           u32 mark;
>>>>>           u32 tsflags;
>>>>> +       u32 ts_opt_id;
>>>>>    };
>>>>>
>>>>>    static inline void sockcm_init(struct sockcm_cookie *sockc,
>>>>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>>>>> index 8ce8a39a1e5f..db3df3e74b01 100644
>>>>> --- a/include/uapi/asm-generic/socket.h
>>>>> +++ b/include/uapi/asm-generic/socket.h
>>>>> @@ -135,6 +135,8 @@
>>>>>    #define SO_PASSPIDFD           76
>>>>>    #define SO_PEERPIDFD           77
>>>>>
>>>>> +#define SCM_TS_OPT_ID          78
>>>>> +
>>>>>    #if !defined(__KERNEL__)
>>>>>
>>>>>    #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>>>>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>>>>> index a2c66b3d7f0f..e2f145e3f3a1 100644
>>>>> --- a/include/uapi/linux/net_tstamp.h
>>>>> +++ b/include/uapi/linux/net_tstamp.h
>>>>> @@ -32,8 +32,9 @@ enum {
>>>>>           SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>>>>>           SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>>>>>           SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>>>>> +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>>>>
>>>> I'm not sure if the new flag needs to be documented as well? After
>>>> this patch, people may search the key word in the documentation file
>>>> and then find nothing.
>>>>
>>>> If we have this flag here, normally it means we can pass it through
>>>> setsockopt, so is it expected? If it's an exception, I reckon that we
>>>> can forbid passing/setting this option in sock_set_timestamping() and
>>>> document this rule?
>>>
>>> Good point, thanks.
>>>
>>> It must definitely not be part of SOF_TIMESTAMPING_MASK. My bad for
>>> suggesting without giving it much thought.
>>>
>>> The bit is kernel-internal. No need to even mention it in user-facing
>>> documentation. But anyone reading net_tstamp.h might wonder what it
>>> does.
>>>
>>> It should not even be in a UAPI header, but in an internal one.
>>> Probably include/net/sock.h, near SK_FLAGS_TIMESTAMP.
>>>
>>> Maybe we can reserve bit 31 in u32 sk_tsflags. And if we ever have
>>> to double that flag size, it can move up to 63, as it is not UAPI in
>>> any way. This is a workaround to having a separate flags field in
>>> sockcm_cookie.
>>>
>>> And have a BUILD_BUG_ON if SOF_TIMESTAMPING_LAST reaches this reserved
>>> region.
>>
>> Yeah, I was also thinking of it not being UAPI, that's why I tried to
>> avoid it in my RFC using 0 as a reserved value. Do you think
>> SK_FLAGS_CMSG_TS_OPT_ID is good naming for it?
> 
> It's relevant only to sockcm_cookie, so maybe SOCKCM_FLAG_TS_OPT_ID?
> 
> One day we'll need another sockcm_cookie flag, we'll grow it to add a
> real flags field and can get rid of this hack.
> 
> The struct is used embedded in ipcm_cookie and ipcm6_cookie, which
> would grow as a result.
> 
> It is also simply stack allocated in cases like performance sensitive
> tcp_sendmsg_locked. Here, the main cost is a slightly more expensive
> zeroing in sockcm_init.
> 
> For now, I think we should just stick with using the highest bit in
> sockcm.tsflags.

I totally agree with using the highest bit in sockcm.tsflags for now.
I was just wondering about the name as naming is always the hardest
part. SOCKCM_FLAG_TS_OPT_ID looks good and reasonable, I'll use it.

Thanks,
Vadim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
@ 2024-09-03  4:16 kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-09-03  4:16 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20240902130937.457115-1-vadfed@meta.com>
References: <20240902130937.457115-1-vadfed@meta.com>
TO: Vadim Fedorenko <vadfed@meta.com>
TO: Vadim Fedorenko <vadim.fedorenko@linux.dev>
TO: Willem de Bruijn <willemb@google.com>
TO: Jakub Kicinski <kuba@kernel.org>
TO: Paolo Abeni <pabeni@redhat.com>
TO: David Ahern <dsahern@kernel.org>
TO: Jason Xing <kerneljasonxing@gmail.com>
CC: netdev@vger.kernel.org

Hi Vadim,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-txtimestamp-add-SCM_TS_OPT_ID-test/20240902-212008
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240902130937.457115-1-vadfed%40meta.com
patch subject: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
:::::: branch date: 15 hours ago
:::::: commit date: 15 hours ago
config: csky-randconfig-r072-20240903 (https://download.01.org/0day-ci/archive/20240903/202409031142.3dSuW9Oo-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.1.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202409031142.3dSuW9Oo-lkp@intel.com/

smatch warnings:
net/ipv4/ip_output.c:1284 __ip_append_data() error: uninitialized symbol 'hold_tskey'.

vim +/hold_tskey +1284 net/ipv4/ip_output.c

^1da177e4c3f41 Linus Torvalds           2005-04-16   951  
f5fca608651129 David S. Miller          2011-05-08   952  static int __ip_append_data(struct sock *sk,
f5fca608651129 David S. Miller          2011-05-08   953  			    struct flowi4 *fl4,
f5fca608651129 David S. Miller          2011-05-08   954  			    struct sk_buff_head *queue,
1470ddf7f8cecf Herbert Xu               2011-03-01   955  			    struct inet_cork *cork,
5640f7685831e0 Eric Dumazet             2012-09-23   956  			    struct page_frag *pfrag,
1470ddf7f8cecf Herbert Xu               2011-03-01   957  			    int getfrag(void *from, char *to, int offset,
1470ddf7f8cecf Herbert Xu               2011-03-01   958  					int len, int odd, struct sk_buff *skb),
^1da177e4c3f41 Linus Torvalds           2005-04-16   959  			    void *from, int length, int transhdrlen,
^1da177e4c3f41 Linus Torvalds           2005-04-16   960  			    unsigned int flags)
^1da177e4c3f41 Linus Torvalds           2005-04-16   961  {
^1da177e4c3f41 Linus Torvalds           2005-04-16   962  	struct inet_sock *inet = inet_sk(sk);
b5947e5d1e710c Willem de Bruijn         2018-11-30   963  	struct ubuf_info *uarg = NULL;
^1da177e4c3f41 Linus Torvalds           2005-04-16   964  	struct sk_buff *skb;
07df5294a753df Herbert Xu               2011-03-01   965  	struct ip_options *opt = cork->opt;
^1da177e4c3f41 Linus Torvalds           2005-04-16   966  	int hh_len;
^1da177e4c3f41 Linus Torvalds           2005-04-16   967  	int exthdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16   968  	int mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16   969  	int copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16   970  	int err;
^1da177e4c3f41 Linus Torvalds           2005-04-16   971  	int offset = 0;
8eb77cc73977d8 Pavel Begunkov           2022-07-12   972  	bool zc = false;
daba287b299ec7 Hannes Frederic Sowa     2013-10-27   973  	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
^1da177e4c3f41 Linus Torvalds           2005-04-16   974  	int csummode = CHECKSUM_NONE;
05d6d492097c55 Eric Dumazet             2024-04-29   975  	struct rtable *rt = dst_rtable(cork->dst);
488b6d91b07112 Vadim Fedorenko          2024-02-13   976  	bool paged, hold_tskey, extra_uref = false;
694aba690de062 Eric Dumazet             2018-03-31   977  	unsigned int wmem_alloc_delta = 0;
09c2d251b70723 Willem de Bruijn         2014-08-04   978  	u32 tskey = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16   979  
96d7303e9cfb6a Steffen Klassert         2011-06-05   980  	skb = skb_peek_tail(queue);
96d7303e9cfb6a Steffen Klassert         2011-06-05   981  
96d7303e9cfb6a Steffen Klassert         2011-06-05   982  	exthdrlen = !skb ? rt->dst.header_len : 0;
bec1f6f697362c Willem de Bruijn         2018-04-26   983  	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
15e36f5b8e982d Willem de Bruijn         2018-04-26   984  	paged = !!cork->gso_size;
bec1f6f697362c Willem de Bruijn         2018-04-26   985  
d8d1f30b95a635 Changli Gao              2010-06-10   986  	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
^1da177e4c3f41 Linus Torvalds           2005-04-16   987  
^1da177e4c3f41 Linus Torvalds           2005-04-16   988  	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
^1da177e4c3f41 Linus Torvalds           2005-04-16   989  	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
cbc08a33126f8f Miaohe Lin               2020-08-29   990  	maxnonfragsize = ip_sk_ignore_df(sk) ? IP_MAX_MTU : mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16   991  
daba287b299ec7 Hannes Frederic Sowa     2013-10-27   992  	if (cork->length + length > maxnonfragsize - fragheaderlen) {
f5fca608651129 David S. Miller          2011-05-08   993  		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
61e7f09d0f437c Hannes Frederic Sowa     2013-12-19   994  			       mtu - (opt ? opt->optlen : 0));
^1da177e4c3f41 Linus Torvalds           2005-04-16   995  		return -EMSGSIZE;
^1da177e4c3f41 Linus Torvalds           2005-04-16   996  	}
^1da177e4c3f41 Linus Torvalds           2005-04-16   997  
^1da177e4c3f41 Linus Torvalds           2005-04-16   998  	/*
^1da177e4c3f41 Linus Torvalds           2005-04-16   999  	 * transhdrlen > 0 means that this is the first fragment and we wish
^1da177e4c3f41 Linus Torvalds           2005-04-16  1000  	 * it won't be fragmented in the future.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1001  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1002  	if (transhdrlen &&
^1da177e4c3f41 Linus Torvalds           2005-04-16  1003  	    length + fragheaderlen <= mtu &&
c8cd0989bd151f Tom Herbert              2015-12-14  1004  	    rt->dst.dev->features & (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM) &&
bec1f6f697362c Willem de Bruijn         2018-04-26  1005  	    (!(flags & MSG_MORE) || cork->gso_size) &&
cd027a5433d667 Jacek Kalwas             2018-04-12  1006  	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
84fa7933a33f80 Patrick McHardy          2006-08-29  1007  		csummode = CHECKSUM_PARTIAL;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1008  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1009  	if ((flags & MSG_ZEROCOPY) && length) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1010  		struct msghdr *msg = from;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1011  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1012  		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1013  			if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1014  				return -EINVAL;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1015  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1016  			/* Leave uarg NULL if can't zerocopy, callers should
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1017  			 * be able to handle it.
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1018  			 */
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1019  			if ((rt->dst.dev->features & NETIF_F_SG) &&
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1020  			    csummode == CHECKSUM_PARTIAL) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1021  				paged = true;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1022  				zc = true;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1023  				uarg = msg->msg_ubuf;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1024  			}
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1025  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
8c793822c5803e Jonathan Lemon           2021-01-06  1026  			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
b5947e5d1e710c Willem de Bruijn         2018-11-30  1027  			if (!uarg)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1028  				return -ENOBUFS;
522924b583082f Willem de Bruijn         2019-06-07  1029  			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
b5947e5d1e710c Willem de Bruijn         2018-11-30  1030  			if (rt->dst.dev->features & NETIF_F_SG &&
b5947e5d1e710c Willem de Bruijn         2018-11-30  1031  			    csummode == CHECKSUM_PARTIAL) {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1032  				paged = true;
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1033  				zc = true;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1034  			} else {
e7d2b510165fff Pavel Begunkov           2022-09-23  1035  				uarg_to_msgzc(uarg)->zerocopy = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1036  				skb_zcopy_set(skb, uarg, &extra_uref);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1037  			}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1038  		}
7da0dde68486b2 David Howells            2023-05-22  1039  	} else if ((flags & MSG_SPLICE_PAGES) && length) {
cafbe182a467bf Eric Dumazet             2023-08-16  1040  		if (inet_test_bit(HDRINCL, sk))
7da0dde68486b2 David Howells            2023-05-22  1041  			return -EPERM;
5a6f6873606e03 David Howells            2023-06-14  1042  		if (rt->dst.dev->features & NETIF_F_SG &&
5a6f6873606e03 David Howells            2023-06-14  1043  		    getfrag == ip_generic_getfrag)
7da0dde68486b2 David Howells            2023-05-22  1044  			/* We need an empty buffer to attach stuff to */
7da0dde68486b2 David Howells            2023-05-22  1045  			paged = true;
7da0dde68486b2 David Howells            2023-05-22  1046  		else
7da0dde68486b2 David Howells            2023-05-22  1047  			flags &= ~MSG_SPLICE_PAGES;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1048  	}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1049  
1470ddf7f8cecf Herbert Xu               2011-03-01  1050  	cork->length += length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1051  
b7399073687728 Vadim Fedorenko          2024-09-02  1052  	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
b7399073687728 Vadim Fedorenko          2024-09-02  1053  	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1054  		if (cork->flags & IPCORK_TS_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1055  			tskey = cork->ts_opt_id;
b7399073687728 Vadim Fedorenko          2024-09-02  1056  		} else {
488b6d91b07112 Vadim Fedorenko          2024-02-13  1057  			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
b7399073687728 Vadim Fedorenko          2024-09-02  1058  			hold_tskey = true;
b7399073687728 Vadim Fedorenko          2024-09-02  1059  		}
b7399073687728 Vadim Fedorenko          2024-09-02  1060  	}
488b6d91b07112 Vadim Fedorenko          2024-02-13  1061  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1062  	/* So, what's going on in the loop below?
^1da177e4c3f41 Linus Torvalds           2005-04-16  1063  	 *
^1da177e4c3f41 Linus Torvalds           2005-04-16  1064  	 * We use calculated fragment length to generate chained skb,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1065  	 * each of segments is IP fragment ready for sending to network after
^1da177e4c3f41 Linus Torvalds           2005-04-16  1066  	 * adding appropriate IP header.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1067  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1068  
26cde9f7e2747b Herbert Xu               2010-06-15  1069  	if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1070  		goto alloc_new_skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1071  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1072  	while (length > 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1073  		/* Check if the remaining data fits into current packet. */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1074  		copy = mtu - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1075  		if (copy < length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1076  			copy = maxfraglen - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1077  		if (copy <= 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1078  			char *data;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1079  			unsigned int datalen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1080  			unsigned int fraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1081  			unsigned int fraggap;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1082  			unsigned int alloclen, alloc_extra;
aba36930a35e7f Willem de Bruijn         2018-11-24  1083  			unsigned int pagedlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1084  			struct sk_buff *skb_prev;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1085  alloc_new_skb:
^1da177e4c3f41 Linus Torvalds           2005-04-16  1086  			skb_prev = skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1087  			if (skb_prev)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1088  				fraggap = skb_prev->len - maxfraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1089  			else
^1da177e4c3f41 Linus Torvalds           2005-04-16  1090  				fraggap = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1091  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1092  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1093  			 * If remaining data exceeds the mtu,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1094  			 * we know we need more fragment(s).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1095  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1096  			datalen = length + fraggap;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1097  			if (datalen > mtu - fragheaderlen)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1098  				datalen = maxfraglen - fragheaderlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1099  			fraglen = datalen + fragheaderlen;
aba36930a35e7f Willem de Bruijn         2018-11-24  1100  			pagedlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1101  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1102  			alloc_extra = hh_len + 15;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1103  			alloc_extra += exthdrlen;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1104  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1105  			/* The last fragment gets additional space at tail.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1106  			 * Note, with MSG_MORE we overallocate on fragments,
6d123b81ac6150 Jakub Kicinski           2021-06-23  1107  			 * because we have no idea what fragment will be
6d123b81ac6150 Jakub Kicinski           2021-06-23  1108  			 * the last.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1109  			 */
6d123b81ac6150 Jakub Kicinski           2021-06-23  1110  			if (datalen == length + fraggap)
6d123b81ac6150 Jakub Kicinski           2021-06-23  1111  				alloc_extra += rt->dst.trailer_len;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1112  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1113  			if ((flags & MSG_MORE) &&
d8d1f30b95a635 Changli Gao              2010-06-10  1114  			    !(rt->dst.dev->features&NETIF_F_SG))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1115  				alloclen = mtu;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1116  			else if (!paged &&
6d123b81ac6150 Jakub Kicinski           2021-06-23  1117  				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
6d123b81ac6150 Jakub Kicinski           2021-06-23  1118  				  !(rt->dst.dev->features & NETIF_F_SG)))
59104f062435c7 Eric Dumazet             2010-09-20  1119  				alloclen = fraglen;
47cf88993c9108 Pavel Begunkov           2022-08-25  1120  			else {
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1121  				alloclen = fragheaderlen + transhdrlen;
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1122  				pagedlen = datalen - transhdrlen;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1123  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1124  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1125  			alloclen += alloc_extra;
33f99dc7fd948b Steffen Klassert         2011-06-22  1126  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1127  			if (transhdrlen) {
6d123b81ac6150 Jakub Kicinski           2021-06-23  1128  				skb = sock_alloc_send_skb(sk, alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1129  						(flags & MSG_DONTWAIT), &err);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1130  			} else {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1131  				skb = NULL;
694aba690de062 Eric Dumazet             2018-03-31  1132  				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
^1da177e4c3f41 Linus Torvalds           2005-04-16  1133  				    2 * sk->sk_sndbuf)
6d123b81ac6150 Jakub Kicinski           2021-06-23  1134  					skb = alloc_skb(alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1135  							sk->sk_allocation);
51456b2914a34d Ian Morris               2015-04-03  1136  				if (unlikely(!skb))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1137  					err = -ENOBUFS;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1138  			}
51456b2914a34d Ian Morris               2015-04-03  1139  			if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1140  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1141  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1142  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1143  			 *	Fill in the control structures
^1da177e4c3f41 Linus Torvalds           2005-04-16  1144  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1145  			skb->ip_summed = csummode;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1146  			skb->csum = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1147  			skb_reserve(skb, hh_len);
11878b40ed5c5b Willem de Bruijn         2014-07-14  1148  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1149  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1150  			 *	Find where to start putting bytes.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1151  			 */
15e36f5b8e982d Willem de Bruijn         2018-04-26  1152  			data = skb_put(skb, fraglen + exthdrlen - pagedlen);
c14d2450cb7fe1 Arnaldo Carvalho de Melo 2007-03-11  1153  			skb_set_network_header(skb, exthdrlen);
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1154  			skb->transport_header = (skb->network_header +
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1155  						 fragheaderlen);
353e5c9abd900d Steffen Klassert         2011-06-22  1156  			data += fragheaderlen + exthdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1157  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1158  			if (fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1159  				skb->csum = skb_copy_and_csum_bits(
^1da177e4c3f41 Linus Torvalds           2005-04-16  1160  					skb_prev, maxfraglen,
8d5930dfb7edbf Al Viro                  2020-07-10  1161  					data + transhdrlen, fraggap);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1162  				skb_prev->csum = csum_sub(skb_prev->csum,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1163  							  skb->csum);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1164  				data += fraggap;
e9fa4f7bd291c2 Herbert Xu               2006-08-13  1165  				pskb_trim_unique(skb_prev, maxfraglen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1166  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1167  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1168  			copy = datalen - transhdrlen - fraggap - pagedlen;
0f71c9caf26726 David Howells            2023-08-01  1169  			/* [!] NOTE: copy will be negative if pagedlen>0
0f71c9caf26726 David Howells            2023-08-01  1170  			 * because then the equation reduces to -fraggap.
0f71c9caf26726 David Howells            2023-08-01  1171  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1172  			if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1173  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1174  				kfree_skb(skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1175  				goto error;
0f71c9caf26726 David Howells            2023-08-01  1176  			} else if (flags & MSG_SPLICE_PAGES) {
0f71c9caf26726 David Howells            2023-08-01  1177  				copy = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1178  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1179  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1180  			offset += copy;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1181  			length -= copy + transhdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1182  			transhdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1183  			exthdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1184  			csummode = CHECKSUM_NONE;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1185  
52900d22288e7d Willem de Bruijn         2018-11-30  1186  			/* only the initial fragment is time stamped */
52900d22288e7d Willem de Bruijn         2018-11-30  1187  			skb_shinfo(skb)->tx_flags = cork->tx_flags;
52900d22288e7d Willem de Bruijn         2018-11-30  1188  			cork->tx_flags = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1189  			skb_shinfo(skb)->tskey = tskey;
52900d22288e7d Willem de Bruijn         2018-11-30  1190  			tskey = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1191  			skb_zcopy_set(skb, uarg, &extra_uref);
52900d22288e7d Willem de Bruijn         2018-11-30  1192  
0dec879f636f11 Julian Anastasov         2017-02-06  1193  			if ((flags & MSG_CONFIRM) && !skb_prev)
0dec879f636f11 Julian Anastasov         2017-02-06  1194  				skb_set_dst_pending_confirm(skb, 1);
0dec879f636f11 Julian Anastasov         2017-02-06  1195  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1196  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1197  			 * Put the packet on the pending queue.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1198  			 */
694aba690de062 Eric Dumazet             2018-03-31  1199  			if (!skb->destructor) {
694aba690de062 Eric Dumazet             2018-03-31  1200  				skb->destructor = sock_wfree;
694aba690de062 Eric Dumazet             2018-03-31  1201  				skb->sk = sk;
694aba690de062 Eric Dumazet             2018-03-31  1202  				wmem_alloc_delta += skb->truesize;
694aba690de062 Eric Dumazet             2018-03-31  1203  			}
1470ddf7f8cecf Herbert Xu               2011-03-01  1204  			__skb_queue_tail(queue, skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1205  			continue;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1206  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1207  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1208  		if (copy > length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1209  			copy = length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1210  
113f99c3358564 Willem de Bruijn         2018-05-17  1211  		if (!(rt->dst.dev->features&NETIF_F_SG) &&
113f99c3358564 Willem de Bruijn         2018-05-17  1212  		    skb_tailroom(skb) >= copy) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1213  			unsigned int off;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1214  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1215  			off = skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1216  			if (getfrag(from, skb_put(skb, copy),
^1da177e4c3f41 Linus Torvalds           2005-04-16  1217  					offset, copy, off, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1218  				__skb_trim(skb, off);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1219  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1220  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1221  			}
7da0dde68486b2 David Howells            2023-05-22  1222  		} else if (flags & MSG_SPLICE_PAGES) {
7da0dde68486b2 David Howells            2023-05-22  1223  			struct msghdr *msg = from;
7da0dde68486b2 David Howells            2023-05-22  1224  
0f71c9caf26726 David Howells            2023-08-01  1225  			err = -EIO;
0f71c9caf26726 David Howells            2023-08-01  1226  			if (WARN_ON_ONCE(copy > msg->msg_iter.count))
0f71c9caf26726 David Howells            2023-08-01  1227  				goto error;
0f71c9caf26726 David Howells            2023-08-01  1228  
7da0dde68486b2 David Howells            2023-05-22  1229  			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
7da0dde68486b2 David Howells            2023-05-22  1230  						   sk->sk_allocation);
7da0dde68486b2 David Howells            2023-05-22  1231  			if (err < 0)
7da0dde68486b2 David Howells            2023-05-22  1232  				goto error;
7da0dde68486b2 David Howells            2023-05-22  1233  			copy = err;
7da0dde68486b2 David Howells            2023-05-22  1234  			wmem_alloc_delta += copy;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1235  		} else if (!zc) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1236  			int i = skb_shinfo(skb)->nr_frags;
5640f7685831e0 Eric Dumazet             2012-09-23  1237  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1238  			err = -ENOMEM;
5640f7685831e0 Eric Dumazet             2012-09-23  1239  			if (!sk_page_frag_refill(sk, pfrag))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1240  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1241  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1242  			skb_zcopy_downgrade_managed(skb);
5640f7685831e0 Eric Dumazet             2012-09-23  1243  			if (!skb_can_coalesce(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1244  					      pfrag->offset)) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1245  				err = -EMSGSIZE;
5640f7685831e0 Eric Dumazet             2012-09-23  1246  				if (i == MAX_SKB_FRAGS)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1247  					goto error;
5640f7685831e0 Eric Dumazet             2012-09-23  1248  
5640f7685831e0 Eric Dumazet             2012-09-23  1249  				__skb_fill_page_desc(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1250  						     pfrag->offset, 0);
5640f7685831e0 Eric Dumazet             2012-09-23  1251  				skb_shinfo(skb)->nr_frags = ++i;
5640f7685831e0 Eric Dumazet             2012-09-23  1252  				get_page(pfrag->page);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1253  			}
5640f7685831e0 Eric Dumazet             2012-09-23  1254  			copy = min_t(int, copy, pfrag->size - pfrag->offset);
5640f7685831e0 Eric Dumazet             2012-09-23  1255  			if (getfrag(from,
5640f7685831e0 Eric Dumazet             2012-09-23  1256  				    page_address(pfrag->page) + pfrag->offset,
5640f7685831e0 Eric Dumazet             2012-09-23  1257  				    offset, copy, skb->len, skb) < 0)
5640f7685831e0 Eric Dumazet             2012-09-23  1258  				goto error_efault;
5640f7685831e0 Eric Dumazet             2012-09-23  1259  
5640f7685831e0 Eric Dumazet             2012-09-23  1260  			pfrag->offset += copy;
5640f7685831e0 Eric Dumazet             2012-09-23  1261  			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
ede57d58e6f38d Richard Gobert           2022-06-22  1262  			skb_len_add(skb, copy);
694aba690de062 Eric Dumazet             2018-03-31  1263  			wmem_alloc_delta += copy;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1264  		} else {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1265  			err = skb_zerocopy_iter_dgram(skb, from, copy);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1266  			if (err < 0)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1267  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1268  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1269  		offset += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1270  		length -= copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1271  	}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1272  
9e8445a56c253f Paolo Abeni              2018-04-04  1273  	if (wmem_alloc_delta)
694aba690de062 Eric Dumazet             2018-03-31  1274  		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1275  	return 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1276  
5640f7685831e0 Eric Dumazet             2012-09-23  1277  error_efault:
5640f7685831e0 Eric Dumazet             2012-09-23  1278  	err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1279  error:
8e0449172497a9 Jonathan Lemon           2021-01-06  1280  	net_zcopy_put_abort(uarg, extra_uref);
1470ddf7f8cecf Herbert Xu               2011-03-01  1281  	cork->length -= length;
5e38e270444f26 Pavel Emelyanov          2008-07-16  1282  	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
694aba690de062 Eric Dumazet             2018-03-31  1283  	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
488b6d91b07112 Vadim Fedorenko          2024-02-13 @1284  	if (hold_tskey)
488b6d91b07112 Vadim Fedorenko          2024-02-13  1285  		atomic_dec(&sk->sk_tskey);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1286  	return err;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1287  }
^1da177e4c3f41 Linus Torvalds           2005-04-16  1288  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
                   ` (3 preceding siblings ...)
  2024-09-02 18:38 ` Simon Horman
@ 2024-09-03  8:06 ` Dan Carpenter
  2024-09-03  8:16   ` Vadim Fedorenko
  4 siblings, 1 reply; 22+ messages in thread
From: Dan Carpenter @ 2024-09-03  8:06 UTC (permalink / raw)
  To: oe-kbuild, Vadim Fedorenko, Vadim Fedorenko, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing
  Cc: lkp, oe-kbuild-all, netdev

Hi Vadim,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-txtimestamp-add-SCM_TS_OPT_ID-test/20240902-212008
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240902130937.457115-1-vadfed%40meta.com
patch subject: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
config: csky-randconfig-r072-20240903 (https://download.01.org/0day-ci/archive/20240903/202409031142.3dSuW9Oo-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.1.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202409031142.3dSuW9Oo-lkp@intel.com/

smatch warnings:
net/ipv4/ip_output.c:1284 __ip_append_data() error: uninitialized symbol 'hold_tskey'.

vim +/hold_tskey +1284 net/ipv4/ip_output.c

f5fca608651129 David S. Miller          2011-05-08   952  static int __ip_append_data(struct sock *sk,
f5fca608651129 David S. Miller          2011-05-08   953  			    struct flowi4 *fl4,
f5fca608651129 David S. Miller          2011-05-08   954  			    struct sk_buff_head *queue,
1470ddf7f8cecf Herbert Xu               2011-03-01   955  			    struct inet_cork *cork,
5640f7685831e0 Eric Dumazet             2012-09-23   956  			    struct page_frag *pfrag,
1470ddf7f8cecf Herbert Xu               2011-03-01   957  			    int getfrag(void *from, char *to, int offset,
1470ddf7f8cecf Herbert Xu               2011-03-01   958  					int len, int odd, struct sk_buff *skb),
^1da177e4c3f41 Linus Torvalds           2005-04-16   959  			    void *from, int length, int transhdrlen,
^1da177e4c3f41 Linus Torvalds           2005-04-16   960  			    unsigned int flags)
^1da177e4c3f41 Linus Torvalds           2005-04-16   961  {
^1da177e4c3f41 Linus Torvalds           2005-04-16   962  	struct inet_sock *inet = inet_sk(sk);
b5947e5d1e710c Willem de Bruijn         2018-11-30   963  	struct ubuf_info *uarg = NULL;
^1da177e4c3f41 Linus Torvalds           2005-04-16   964  	struct sk_buff *skb;
07df5294a753df Herbert Xu               2011-03-01   965  	struct ip_options *opt = cork->opt;
^1da177e4c3f41 Linus Torvalds           2005-04-16   966  	int hh_len;
^1da177e4c3f41 Linus Torvalds           2005-04-16   967  	int exthdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16   968  	int mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16   969  	int copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16   970  	int err;
^1da177e4c3f41 Linus Torvalds           2005-04-16   971  	int offset = 0;
8eb77cc73977d8 Pavel Begunkov           2022-07-12   972  	bool zc = false;
daba287b299ec7 Hannes Frederic Sowa     2013-10-27   973  	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
^1da177e4c3f41 Linus Torvalds           2005-04-16   974  	int csummode = CHECKSUM_NONE;
05d6d492097c55 Eric Dumazet             2024-04-29   975  	struct rtable *rt = dst_rtable(cork->dst);
488b6d91b07112 Vadim Fedorenko          2024-02-13   976  	bool paged, hold_tskey, extra_uref = false;
694aba690de062 Eric Dumazet             2018-03-31   977  	unsigned int wmem_alloc_delta = 0;
09c2d251b70723 Willem de Bruijn         2014-08-04   978  	u32 tskey = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16   979  
96d7303e9cfb6a Steffen Klassert         2011-06-05   980  	skb = skb_peek_tail(queue);
96d7303e9cfb6a Steffen Klassert         2011-06-05   981  
96d7303e9cfb6a Steffen Klassert         2011-06-05   982  	exthdrlen = !skb ? rt->dst.header_len : 0;
bec1f6f697362c Willem de Bruijn         2018-04-26   983  	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
15e36f5b8e982d Willem de Bruijn         2018-04-26   984  	paged = !!cork->gso_size;
bec1f6f697362c Willem de Bruijn         2018-04-26   985  
d8d1f30b95a635 Changli Gao              2010-06-10   986  	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
^1da177e4c3f41 Linus Torvalds           2005-04-16   987  
^1da177e4c3f41 Linus Torvalds           2005-04-16   988  	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
^1da177e4c3f41 Linus Torvalds           2005-04-16   989  	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
cbc08a33126f8f Miaohe Lin               2020-08-29   990  	maxnonfragsize = ip_sk_ignore_df(sk) ? IP_MAX_MTU : mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16   991  
daba287b299ec7 Hannes Frederic Sowa     2013-10-27   992  	if (cork->length + length > maxnonfragsize - fragheaderlen) {
f5fca608651129 David S. Miller          2011-05-08   993  		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
61e7f09d0f437c Hannes Frederic Sowa     2013-12-19   994  			       mtu - (opt ? opt->optlen : 0));
^1da177e4c3f41 Linus Torvalds           2005-04-16   995  		return -EMSGSIZE;
^1da177e4c3f41 Linus Torvalds           2005-04-16   996  	}
^1da177e4c3f41 Linus Torvalds           2005-04-16   997  
^1da177e4c3f41 Linus Torvalds           2005-04-16   998  	/*
^1da177e4c3f41 Linus Torvalds           2005-04-16   999  	 * transhdrlen > 0 means that this is the first fragment and we wish
^1da177e4c3f41 Linus Torvalds           2005-04-16  1000  	 * it won't be fragmented in the future.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1001  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1002  	if (transhdrlen &&
^1da177e4c3f41 Linus Torvalds           2005-04-16  1003  	    length + fragheaderlen <= mtu &&
c8cd0989bd151f Tom Herbert              2015-12-14  1004  	    rt->dst.dev->features & (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM) &&
bec1f6f697362c Willem de Bruijn         2018-04-26  1005  	    (!(flags & MSG_MORE) || cork->gso_size) &&
cd027a5433d667 Jacek Kalwas             2018-04-12  1006  	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
84fa7933a33f80 Patrick McHardy          2006-08-29  1007  		csummode = CHECKSUM_PARTIAL;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1008  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1009  	if ((flags & MSG_ZEROCOPY) && length) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1010  		struct msghdr *msg = from;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1011  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1012  		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1013  			if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1014  				return -EINVAL;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1015  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1016  			/* Leave uarg NULL if can't zerocopy, callers should
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1017  			 * be able to handle it.
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1018  			 */
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1019  			if ((rt->dst.dev->features & NETIF_F_SG) &&
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1020  			    csummode == CHECKSUM_PARTIAL) {
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1021  				paged = true;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1022  				zc = true;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1023  				uarg = msg->msg_ubuf;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1024  			}
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1025  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
8c793822c5803e Jonathan Lemon           2021-01-06  1026  			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
b5947e5d1e710c Willem de Bruijn         2018-11-30  1027  			if (!uarg)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1028  				return -ENOBUFS;
522924b583082f Willem de Bruijn         2019-06-07  1029  			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
b5947e5d1e710c Willem de Bruijn         2018-11-30  1030  			if (rt->dst.dev->features & NETIF_F_SG &&
b5947e5d1e710c Willem de Bruijn         2018-11-30  1031  			    csummode == CHECKSUM_PARTIAL) {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1032  				paged = true;
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1033  				zc = true;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1034  			} else {
e7d2b510165fff Pavel Begunkov           2022-09-23  1035  				uarg_to_msgzc(uarg)->zerocopy = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1036  				skb_zcopy_set(skb, uarg, &extra_uref);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1037  			}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1038  		}
7da0dde68486b2 David Howells            2023-05-22  1039  	} else if ((flags & MSG_SPLICE_PAGES) && length) {
cafbe182a467bf Eric Dumazet             2023-08-16  1040  		if (inet_test_bit(HDRINCL, sk))
7da0dde68486b2 David Howells            2023-05-22  1041  			return -EPERM;
5a6f6873606e03 David Howells            2023-06-14  1042  		if (rt->dst.dev->features & NETIF_F_SG &&
5a6f6873606e03 David Howells            2023-06-14  1043  		    getfrag == ip_generic_getfrag)
7da0dde68486b2 David Howells            2023-05-22  1044  			/* We need an empty buffer to attach stuff to */
7da0dde68486b2 David Howells            2023-05-22  1045  			paged = true;
7da0dde68486b2 David Howells            2023-05-22  1046  		else
7da0dde68486b2 David Howells            2023-05-22  1047  			flags &= ~MSG_SPLICE_PAGES;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1048  	}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1049  
1470ddf7f8cecf Herbert Xu               2011-03-01  1050  	cork->length += length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1051  
b7399073687728 Vadim Fedorenko          2024-09-02  1052  	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
b7399073687728 Vadim Fedorenko          2024-09-02  1053  	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1054  		if (cork->flags & IPCORK_TS_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1055  			tskey = cork->ts_opt_id;
b7399073687728 Vadim Fedorenko          2024-09-02  1056  		} else {
488b6d91b07112 Vadim Fedorenko          2024-02-13  1057  			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
b7399073687728 Vadim Fedorenko          2024-09-02  1058  			hold_tskey = true;

hold_tskey is never set to false.

b7399073687728 Vadim Fedorenko          2024-09-02  1059  		}
b7399073687728 Vadim Fedorenko          2024-09-02  1060  	}
488b6d91b07112 Vadim Fedorenko          2024-02-13  1061  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1062  	/* So, what's going on in the loop below?
^1da177e4c3f41 Linus Torvalds           2005-04-16  1063  	 *
^1da177e4c3f41 Linus Torvalds           2005-04-16  1064  	 * We use calculated fragment length to generate chained skb,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1065  	 * each of segments is IP fragment ready for sending to network after
^1da177e4c3f41 Linus Torvalds           2005-04-16  1066  	 * adding appropriate IP header.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1067  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1068  
26cde9f7e2747b Herbert Xu               2010-06-15  1069  	if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1070  		goto alloc_new_skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1071  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1072  	while (length > 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1073  		/* Check if the remaining data fits into current packet. */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1074  		copy = mtu - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1075  		if (copy < length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1076  			copy = maxfraglen - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1077  		if (copy <= 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1078  			char *data;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1079  			unsigned int datalen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1080  			unsigned int fraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1081  			unsigned int fraggap;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1082  			unsigned int alloclen, alloc_extra;
aba36930a35e7f Willem de Bruijn         2018-11-24  1083  			unsigned int pagedlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1084  			struct sk_buff *skb_prev;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1085  alloc_new_skb:
^1da177e4c3f41 Linus Torvalds           2005-04-16  1086  			skb_prev = skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1087  			if (skb_prev)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1088  				fraggap = skb_prev->len - maxfraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1089  			else
^1da177e4c3f41 Linus Torvalds           2005-04-16  1090  				fraggap = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1091  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1092  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1093  			 * If remaining data exceeds the mtu,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1094  			 * we know we need more fragment(s).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1095  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1096  			datalen = length + fraggap;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1097  			if (datalen > mtu - fragheaderlen)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1098  				datalen = maxfraglen - fragheaderlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1099  			fraglen = datalen + fragheaderlen;
aba36930a35e7f Willem de Bruijn         2018-11-24  1100  			pagedlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1101  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1102  			alloc_extra = hh_len + 15;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1103  			alloc_extra += exthdrlen;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1104  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1105  			/* The last fragment gets additional space at tail.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1106  			 * Note, with MSG_MORE we overallocate on fragments,
6d123b81ac6150 Jakub Kicinski           2021-06-23  1107  			 * because we have no idea what fragment will be
6d123b81ac6150 Jakub Kicinski           2021-06-23  1108  			 * the last.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1109  			 */
6d123b81ac6150 Jakub Kicinski           2021-06-23  1110  			if (datalen == length + fraggap)
6d123b81ac6150 Jakub Kicinski           2021-06-23  1111  				alloc_extra += rt->dst.trailer_len;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1112  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1113  			if ((flags & MSG_MORE) &&
d8d1f30b95a635 Changli Gao              2010-06-10  1114  			    !(rt->dst.dev->features&NETIF_F_SG))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1115  				alloclen = mtu;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1116  			else if (!paged &&
6d123b81ac6150 Jakub Kicinski           2021-06-23  1117  				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
6d123b81ac6150 Jakub Kicinski           2021-06-23  1118  				  !(rt->dst.dev->features & NETIF_F_SG)))
59104f062435c7 Eric Dumazet             2010-09-20  1119  				alloclen = fraglen;
47cf88993c9108 Pavel Begunkov           2022-08-25  1120  			else {
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1121  				alloclen = fragheaderlen + transhdrlen;
8eb77cc73977d8 Pavel Begunkov           2022-07-12  1122  				pagedlen = datalen - transhdrlen;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1123  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1124  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1125  			alloclen += alloc_extra;
33f99dc7fd948b Steffen Klassert         2011-06-22  1126  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1127  			if (transhdrlen) {
6d123b81ac6150 Jakub Kicinski           2021-06-23  1128  				skb = sock_alloc_send_skb(sk, alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1129  						(flags & MSG_DONTWAIT), &err);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1130  			} else {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1131  				skb = NULL;
694aba690de062 Eric Dumazet             2018-03-31  1132  				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
^1da177e4c3f41 Linus Torvalds           2005-04-16  1133  				    2 * sk->sk_sndbuf)
6d123b81ac6150 Jakub Kicinski           2021-06-23  1134  					skb = alloc_skb(alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1135  							sk->sk_allocation);
51456b2914a34d Ian Morris               2015-04-03  1136  				if (unlikely(!skb))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1137  					err = -ENOBUFS;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1138  			}
51456b2914a34d Ian Morris               2015-04-03  1139  			if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1140  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1141  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1142  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1143  			 *	Fill in the control structures
^1da177e4c3f41 Linus Torvalds           2005-04-16  1144  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1145  			skb->ip_summed = csummode;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1146  			skb->csum = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1147  			skb_reserve(skb, hh_len);
11878b40ed5c5b Willem de Bruijn         2014-07-14  1148  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1149  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1150  			 *	Find where to start putting bytes.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1151  			 */
15e36f5b8e982d Willem de Bruijn         2018-04-26  1152  			data = skb_put(skb, fraglen + exthdrlen - pagedlen);
c14d2450cb7fe1 Arnaldo Carvalho de Melo 2007-03-11  1153  			skb_set_network_header(skb, exthdrlen);
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1154  			skb->transport_header = (skb->network_header +
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1155  						 fragheaderlen);
353e5c9abd900d Steffen Klassert         2011-06-22  1156  			data += fragheaderlen + exthdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1157  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1158  			if (fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1159  				skb->csum = skb_copy_and_csum_bits(
^1da177e4c3f41 Linus Torvalds           2005-04-16  1160  					skb_prev, maxfraglen,
8d5930dfb7edbf Al Viro                  2020-07-10  1161  					data + transhdrlen, fraggap);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1162  				skb_prev->csum = csum_sub(skb_prev->csum,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1163  							  skb->csum);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1164  				data += fraggap;
e9fa4f7bd291c2 Herbert Xu               2006-08-13  1165  				pskb_trim_unique(skb_prev, maxfraglen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1166  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1167  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1168  			copy = datalen - transhdrlen - fraggap - pagedlen;
0f71c9caf26726 David Howells            2023-08-01  1169  			/* [!] NOTE: copy will be negative if pagedlen>0
0f71c9caf26726 David Howells            2023-08-01  1170  			 * because then the equation reduces to -fraggap.
0f71c9caf26726 David Howells            2023-08-01  1171  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1172  			if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1173  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1174  				kfree_skb(skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1175  				goto error;
0f71c9caf26726 David Howells            2023-08-01  1176  			} else if (flags & MSG_SPLICE_PAGES) {
0f71c9caf26726 David Howells            2023-08-01  1177  				copy = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1178  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1179  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1180  			offset += copy;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1181  			length -= copy + transhdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1182  			transhdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1183  			exthdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1184  			csummode = CHECKSUM_NONE;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1185  
52900d22288e7d Willem de Bruijn         2018-11-30  1186  			/* only the initial fragment is time stamped */
52900d22288e7d Willem de Bruijn         2018-11-30  1187  			skb_shinfo(skb)->tx_flags = cork->tx_flags;
52900d22288e7d Willem de Bruijn         2018-11-30  1188  			cork->tx_flags = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1189  			skb_shinfo(skb)->tskey = tskey;
52900d22288e7d Willem de Bruijn         2018-11-30  1190  			tskey = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1191  			skb_zcopy_set(skb, uarg, &extra_uref);
52900d22288e7d Willem de Bruijn         2018-11-30  1192  
0dec879f636f11 Julian Anastasov         2017-02-06  1193  			if ((flags & MSG_CONFIRM) && !skb_prev)
0dec879f636f11 Julian Anastasov         2017-02-06  1194  				skb_set_dst_pending_confirm(skb, 1);
0dec879f636f11 Julian Anastasov         2017-02-06  1195  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1196  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1197  			 * Put the packet on the pending queue.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1198  			 */
694aba690de062 Eric Dumazet             2018-03-31  1199  			if (!skb->destructor) {
694aba690de062 Eric Dumazet             2018-03-31  1200  				skb->destructor = sock_wfree;
694aba690de062 Eric Dumazet             2018-03-31  1201  				skb->sk = sk;
694aba690de062 Eric Dumazet             2018-03-31  1202  				wmem_alloc_delta += skb->truesize;
694aba690de062 Eric Dumazet             2018-03-31  1203  			}
1470ddf7f8cecf Herbert Xu               2011-03-01  1204  			__skb_queue_tail(queue, skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1205  			continue;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1206  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1207  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1208  		if (copy > length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1209  			copy = length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1210  
113f99c3358564 Willem de Bruijn         2018-05-17  1211  		if (!(rt->dst.dev->features&NETIF_F_SG) &&
113f99c3358564 Willem de Bruijn         2018-05-17  1212  		    skb_tailroom(skb) >= copy) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1213  			unsigned int off;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1214  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1215  			off = skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1216  			if (getfrag(from, skb_put(skb, copy),
^1da177e4c3f41 Linus Torvalds           2005-04-16  1217  					offset, copy, off, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1218  				__skb_trim(skb, off);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1219  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1220  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1221  			}
7da0dde68486b2 David Howells            2023-05-22  1222  		} else if (flags & MSG_SPLICE_PAGES) {
7da0dde68486b2 David Howells            2023-05-22  1223  			struct msghdr *msg = from;
7da0dde68486b2 David Howells            2023-05-22  1224  
0f71c9caf26726 David Howells            2023-08-01  1225  			err = -EIO;
0f71c9caf26726 David Howells            2023-08-01  1226  			if (WARN_ON_ONCE(copy > msg->msg_iter.count))
0f71c9caf26726 David Howells            2023-08-01  1227  				goto error;
0f71c9caf26726 David Howells            2023-08-01  1228  
7da0dde68486b2 David Howells            2023-05-22  1229  			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
7da0dde68486b2 David Howells            2023-05-22  1230  						   sk->sk_allocation);
7da0dde68486b2 David Howells            2023-05-22  1231  			if (err < 0)
7da0dde68486b2 David Howells            2023-05-22  1232  				goto error;
7da0dde68486b2 David Howells            2023-05-22  1233  			copy = err;
7da0dde68486b2 David Howells            2023-05-22  1234  			wmem_alloc_delta += copy;
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1235  		} else if (!zc) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1236  			int i = skb_shinfo(skb)->nr_frags;
5640f7685831e0 Eric Dumazet             2012-09-23  1237  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1238  			err = -ENOMEM;
5640f7685831e0 Eric Dumazet             2012-09-23  1239  			if (!sk_page_frag_refill(sk, pfrag))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1240  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1241  
c445f31b3cfaa0 Pavel Begunkov           2022-07-12  1242  			skb_zcopy_downgrade_managed(skb);
5640f7685831e0 Eric Dumazet             2012-09-23  1243  			if (!skb_can_coalesce(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1244  					      pfrag->offset)) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1245  				err = -EMSGSIZE;
5640f7685831e0 Eric Dumazet             2012-09-23  1246  				if (i == MAX_SKB_FRAGS)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1247  					goto error;
5640f7685831e0 Eric Dumazet             2012-09-23  1248  
5640f7685831e0 Eric Dumazet             2012-09-23  1249  				__skb_fill_page_desc(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1250  						     pfrag->offset, 0);
5640f7685831e0 Eric Dumazet             2012-09-23  1251  				skb_shinfo(skb)->nr_frags = ++i;
5640f7685831e0 Eric Dumazet             2012-09-23  1252  				get_page(pfrag->page);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1253  			}
5640f7685831e0 Eric Dumazet             2012-09-23  1254  			copy = min_t(int, copy, pfrag->size - pfrag->offset);
5640f7685831e0 Eric Dumazet             2012-09-23  1255  			if (getfrag(from,
5640f7685831e0 Eric Dumazet             2012-09-23  1256  				    page_address(pfrag->page) + pfrag->offset,
5640f7685831e0 Eric Dumazet             2012-09-23  1257  				    offset, copy, skb->len, skb) < 0)
5640f7685831e0 Eric Dumazet             2012-09-23  1258  				goto error_efault;
5640f7685831e0 Eric Dumazet             2012-09-23  1259  
5640f7685831e0 Eric Dumazet             2012-09-23  1260  			pfrag->offset += copy;
5640f7685831e0 Eric Dumazet             2012-09-23  1261  			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
ede57d58e6f38d Richard Gobert           2022-06-22  1262  			skb_len_add(skb, copy);
694aba690de062 Eric Dumazet             2018-03-31  1263  			wmem_alloc_delta += copy;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1264  		} else {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1265  			err = skb_zerocopy_iter_dgram(skb, from, copy);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1266  			if (err < 0)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1267  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1268  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1269  		offset += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1270  		length -= copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1271  	}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1272  
9e8445a56c253f Paolo Abeni              2018-04-04  1273  	if (wmem_alloc_delta)
694aba690de062 Eric Dumazet             2018-03-31  1274  		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1275  	return 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1276  
5640f7685831e0 Eric Dumazet             2012-09-23  1277  error_efault:
5640f7685831e0 Eric Dumazet             2012-09-23  1278  	err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1279  error:
8e0449172497a9 Jonathan Lemon           2021-01-06  1280  	net_zcopy_put_abort(uarg, extra_uref);
1470ddf7f8cecf Herbert Xu               2011-03-01  1281  	cork->length -= length;
5e38e270444f26 Pavel Emelyanov          2008-07-16  1282  	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
694aba690de062 Eric Dumazet             2018-03-31  1283  	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
488b6d91b07112 Vadim Fedorenko          2024-02-13 @1284  	if (hold_tskey)
488b6d91b07112 Vadim Fedorenko          2024-02-13  1285  		atomic_dec(&sk->sk_tskey);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1286  	return err;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1287  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-03  8:06 ` Dan Carpenter
@ 2024-09-03  8:16   ` Vadim Fedorenko
  0 siblings, 0 replies; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-03  8:16 UTC (permalink / raw)
  To: Dan Carpenter, oe-kbuild, Vadim Fedorenko, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, David Ahern, Jason Xing
  Cc: lkp, oe-kbuild-all, netdev

On 03/09/2024 09:06, Dan Carpenter wrote:
> Hi Vadim,
> 
> kernel test robot noticed the following build warnings:
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-txtimestamp-add-SCM_TS_OPT_ID-test/20240902-212008
> base:   net-next/main
> patch link:    https://lore.kernel.org/r/20240902130937.457115-1-vadfed%40meta.com
> patch subject: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
> config: csky-randconfig-r072-20240903 (https://download.01.org/0day-ci/archive/20240903/202409031142.3dSuW9Oo-lkp@intel.com/config)
> compiler: csky-linux-gcc (GCC) 14.1.0
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202409031142.3dSuW9Oo-lkp@intel.com/
> 
> smatch warnings:
> net/ipv4/ip_output.c:1284 __ip_append_data() error: uninitialized symbol 'hold_tskey'.
> 
> vim +/hold_tskey +1284 net/ipv4/ip_output.c
> 

[.. snip ..]

> ^1da177e4c3f41 Linus Torvalds           2005-04-16  1051
> b7399073687728 Vadim Fedorenko          2024-09-02  1052  	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
> b7399073687728 Vadim Fedorenko          2024-09-02  1053  	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> b7399073687728 Vadim Fedorenko          2024-09-02  1054  		if (cork->flags & IPCORK_TS_OPT_ID) {
> b7399073687728 Vadim Fedorenko          2024-09-02  1055  			tskey = cork->ts_opt_id;
> b7399073687728 Vadim Fedorenko          2024-09-02  1056  		} else {
> 488b6d91b07112 Vadim Fedorenko          2024-02-13  1057  			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> b7399073687728 Vadim Fedorenko          2024-09-02  1058  			hold_tskey = true;
> 
> hold_tskey is never set to false.

Hi Dan,

This was already flagged by Simon, I'll fix it the next version.

Thanks,
Vadim


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 19:35   ` Vadim Fedorenko
@ 2024-09-03 15:23     ` Simon Horman
  0 siblings, 0 replies; 22+ messages in thread
From: Simon Horman @ 2024-09-03 15:23 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	Jason Xing, netdev

On Mon, Sep 02, 2024 at 08:35:26PM +0100, Vadim Fedorenko wrote:
> On 02/09/2024 19:38, Simon Horman wrote:
> > On Mon, Sep 02, 2024 at 06:09:35AM -0700, Vadim Fedorenko wrote:
> > > SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> > > timestamps and packets sent via socket. Unfortunately, there is no way
> > > to reliably predict socket timestamp ID value in case of error returned
> > > by sendmsg. For UDP sockets it's impossible because of lockless
> > > nature of UDP transmit, several threads may send packets in parallel. In
> > > case of RAW sockets MSG_MORE option makes things complicated. More
> > > details are in the conversation [1].
> > > This patch adds new control message type to give user-space
> > > software an opportunity to control the mapping between packets and
> > > values by providing ID with each sendmsg. This works fine for UDP
> > > sockets only, and explicit check is added to control message parser.
> > > 
> > > [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> > > 
> > > Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> > 
> > ...
> > 
> > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > 
> > ...
> > 
> > > @@ -1543,10 +1546,15 @@ static int __ip6_append_data(struct sock *sk,
> > >   			flags &= ~MSG_SPLICE_PAGES;
> > >   	}
> > > -	hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
> > > -		     READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
> > > -	if (hold_tskey)
> > > -		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> > > +	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
> > > +	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> > > +		if (cork->flags & IPCORK_TS_OPT_ID) {
> > > +			tskey = cork->ts_opt_id;
> > > +		} else {
> > > +			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> > > +			hold_tskey = true;
> > 
> > Hi Vadim,
> > 
> > I think that hold_tskey also needs to be assigned a value in
> > the cases where wither of the if conditions above are false.
> 
> Hi Simon!
> 
> Yes, you are right. I should probably init it with false to avoid
> 'else' statement.

Thanks, I agree that seems like a good approach.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 15:51   ` Willem de Bruijn
  2024-09-02 19:40     ` Vadim Fedorenko
@ 2024-09-03 16:01     ` Vadim Fedorenko
  2024-09-03 20:46       ` Willem de Bruijn
  1 sibling, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-03 16:01 UTC (permalink / raw)
  To: Willem de Bruijn, Willem de Bruijn
  Cc: Jakub Kicinski, Paolo Abeni, David Ahern, netdev, Jason Xing

On 02/09/2024 16:51, Willem de Bruijn wrote:
> Jason Xing wrote:
>> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>>
>>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>>> timestamps and packets sent via socket. Unfortunately, there is no way
>>> to reliably predict socket timestamp ID value in case of error returned
>>> by sendmsg. For UDP sockets it's impossible because of lockless
>>> nature of UDP transmit, several threads may send packets in parallel. In
>>> case of RAW sockets MSG_MORE option makes things complicated. More
>>> details are in the conversation [1].
>>> This patch adds new control message type to give user-space
>>> software an opportunity to control the mapping between packets and
>>> values by providing ID with each sendmsg. This works fine for UDP
>>> sockets only, and explicit check is added to control message parser.
>>>
>>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>>
>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>> ---
>>>   Documentation/networking/timestamping.rst | 14 ++++++++++++++
>>>   arch/alpha/include/uapi/asm/socket.h      |  4 +++-
>>>   arch/mips/include/uapi/asm/socket.h       |  2 ++
>>>   arch/parisc/include/uapi/asm/socket.h     |  2 ++
>>>   arch/sparc/include/uapi/asm/socket.h      |  2 ++
>>>   include/net/inet_sock.h                   |  4 +++-
>>>   include/net/sock.h                        |  1 +
>>>   include/uapi/asm-generic/socket.h         |  2 ++
>>>   include/uapi/linux/net_tstamp.h           |  3 ++-
>>>   net/core/sock.c                           | 12 ++++++++++++
>>>   net/ethtool/common.c                      |  1 +
>>>   net/ipv4/ip_output.c                      | 16 ++++++++++++----
>>>   net/ipv6/ip6_output.c                     | 16 ++++++++++++----
>>>   13 files changed, 68 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>>> index 5e93cd71f99f..93b0901e4e8e 100644
>>> --- a/Documentation/networking/timestamping.rst
>>> +++ b/Documentation/networking/timestamping.rst
>>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>>>     among all possibly concurrently outstanding timestamp requests for
>>>     that socket.
>>>
>>> +  With this option enabled user-space application can provide custom
>>> +  ID for each message sent via UDP socket with control message with
>>> +  type set to SCM_TS_OPT_ID::
>>> +
>>> +    struct msghdr *msg;
>>> +    ...
>>> +    cmsg                        = CMSG_FIRSTHDR(msg);
>>> +    cmsg->cmsg_level            = SOL_SOCKET;
>>> +    cmsg->cmsg_type             = SO_TIMESTAMPING;
>>> +    cmsg->cmsg_len              = CMSG_LEN(sizeof(__u32));
>>> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>>> +    err = sendmsg(fd, msg, 0);
>>> +
>>> +
>>>   SOF_TIMESTAMPING_OPT_ID_TCP:
>>>     Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>>>     timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>>> index e94f621903fe..0698e6662cdf 100644
>>> --- a/arch/alpha/include/uapi/asm/socket.h
>>> +++ b/arch/alpha/include/uapi/asm/socket.h
>>> @@ -10,7 +10,7 @@
>>>    * Note: we only bother about making the SOL_SOCKET options
>>>    * same as OSF/1, as that's all that "normal" programs are
>>>    * likely to set.  We don't necessarily want to be binary
>>> - * compatible with _everything_.
>>> + * compatible with _everything_.
>>>    */
>>>   #define SOL_SOCKET     0xffff
>>>
>>> @@ -140,6 +140,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>>> index 60ebaed28a4c..bb3dc8feb205 100644
>>> --- a/arch/mips/include/uapi/asm/socket.h
>>> +++ b/arch/mips/include/uapi/asm/socket.h
>>> @@ -151,6 +151,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>>> index be264c2b1a11..c3ab3b3289eb 100644
>>> --- a/arch/parisc/include/uapi/asm/socket.h
>>> +++ b/arch/parisc/include/uapi/asm/socket.h
>>> @@ -132,6 +132,8 @@
>>>   #define SO_PASSPIDFD           0x404A
>>>   #define SO_PEERPIDFD           0x404B
>>>
>>> +#define SCM_TS_OPT_ID          0x404C
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64
>>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>>> index 682da3714686..9b40f0a57fbc 100644
>>> --- a/arch/sparc/include/uapi/asm/socket.h
>>> +++ b/arch/sparc/include/uapi/asm/socket.h
>>> @@ -133,6 +133,8 @@
>>>   #define SO_PASSPIDFD             0x0055
>>>   #define SO_PEERPIDFD             0x0056
>>>
>>> +#define SCM_TS_OPT_ID            0x0057
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>
>>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>>> index 394c3b66065e..2161d50cf0fd 100644
>>> --- a/include/net/inet_sock.h
>>> +++ b/include/net/inet_sock.h
>>> @@ -174,6 +174,7 @@ struct inet_cork {
>>>          __s16                   tos;
>>>          char                    priority;
>>>          __u16                   gso_size;
>>> +       u32                     ts_opt_id;
>>>          u64                     transmit_time;
>>>          u32                     mark;
>>>   };
>>> @@ -241,7 +242,8 @@ struct inet_sock {
>>>          struct inet_cork_full   cork;
>>>   };
>>>
>>> -#define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
>>> +#define IPCORK_OPT             1       /* ip-options has been held in ipcork.opt */
>>> +#define IPCORK_TS_OPT_ID       2       /* timestmap opt id has been provided in cmsg */
>>>
>>>   enum {
>>>          INET_FLAGS_PKTINFO      = 0,
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index f51d61fab059..73e21dad5660 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>>>          u64 transmit_time;
>>>          u32 mark;
>>>          u32 tsflags;
>>> +       u32 ts_opt_id;
>>>   };
>>>
>>>   static inline void sockcm_init(struct sockcm_cookie *sockc,
>>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>>> index 8ce8a39a1e5f..db3df3e74b01 100644
>>> --- a/include/uapi/asm-generic/socket.h
>>> +++ b/include/uapi/asm-generic/socket.h
>>> @@ -135,6 +135,8 @@
>>>   #define SO_PASSPIDFD           76
>>>   #define SO_PEERPIDFD           77
>>>
>>> +#define SCM_TS_OPT_ID          78
>>> +
>>>   #if !defined(__KERNEL__)
>>>
>>>   #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>>> index a2c66b3d7f0f..e2f145e3f3a1 100644
>>> --- a/include/uapi/linux/net_tstamp.h
>>> +++ b/include/uapi/linux/net_tstamp.h
>>> @@ -32,8 +32,9 @@ enum {
>>>          SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>>>          SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>>>          SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>>> +       SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>>
>> I'm not sure if the new flag needs to be documented as well? After
>> this patch, people may search the key word in the documentation file
>> and then find nothing.
>>
>> If we have this flag here, normally it means we can pass it through
>> setsockopt, so is it expected? If it's an exception, I reckon that we
>> can forbid passing/setting this option in sock_set_timestamping() and
>> document this rule?
> 
> Good point, thanks.
> 
> It must definitely not be part of SOF_TIMESTAMPING_MASK. My bad for
> suggesting without giving it much thought.
> 
> The bit is kernel-internal. No need to even mention it in user-facing
> documentation. But anyone reading net_tstamp.h might wonder what it
> does.
> 
> It should not even be in a UAPI header, but in an internal one.
> Probably include/net/sock.h, near SK_FLAGS_TIMESTAMP.
> 
> Maybe we can reserve bit 31 in u32 sk_tsflags. And if we ever have
> to double that flag size, it can move up to 63, as it is not UAPI in
> any way. This is a workaround to having a separate flags field in
> sockcm_cookie.
> 
> And have a BUILD_BUG_ON if SOF_TIMESTAMPING_LAST reaches this reserved
> region.

Hi Willem,

There is one more issue here. sk_tsflags is u32 as well as
sockcm_cookie::tsflags. But sock_tx_timestamp receives __u16 tsflags,
usually filled with sockc.tsflags. It works now because
SOF_TIMESTAMPING_OPT_ID_TCP is not checked in these functions, but it's
wrong in general. Should I fix it in this series or it's better to do in
a seperate one?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-02 21:07   ` Vadim Fedorenko
@ 2024-09-03 20:40     ` Willem de Bruijn
  0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-09-03 20:40 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
  Cc: netdev, Jakub Kicinski, Jason Xing, Paolo Abeni, David Ahern

Vadim Fedorenko wrote:
> On 02/09/2024 15:29, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> >> timestamps and packets sent via socket. Unfortunately, there is no way
> >> to reliably predict socket timestamp ID value in case of error returned
> >> by sendmsg. For UDP sockets it's impossible because of lockless
> >> nature of UDP transmit, several threads may send packets in parallel. In
> >> case of RAW sockets MSG_MORE option makes things complicated. More
> >> details are in the conversation [1].
> >> This patch adds new control message type to give user-space
> >> software an opportunity to control the mapping between packets and
> >> values by providing ID with each sendmsg. This works fine for UDP
> >> sockets only, and explicit check is added to control message parser.
> >>
> >> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >>
> >> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >> ---
> >>   Documentation/networking/timestamping.rst | 14 ++++++++++++++
> >>   arch/alpha/include/uapi/asm/socket.h      |  4 +++-
> >>   arch/mips/include/uapi/asm/socket.h       |  2 ++
> >>   arch/parisc/include/uapi/asm/socket.h     |  2 ++
> >>   arch/sparc/include/uapi/asm/socket.h      |  2 ++
> >>   include/net/inet_sock.h                   |  4 +++-
> >>   include/net/sock.h                        |  1 +
> >>   include/uapi/asm-generic/socket.h         |  2 ++
> >>   include/uapi/linux/net_tstamp.h           |  3 ++-
> >>   net/core/sock.c                           | 12 ++++++++++++
> >>   net/ethtool/common.c                      |  1 +
> >>   net/ipv4/ip_output.c                      | 16 ++++++++++++----
> >>   net/ipv6/ip6_output.c                     | 16 ++++++++++++----
> >>   13 files changed, 68 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> >> index 5e93cd71f99f..93b0901e4e8e 100644
> >> --- a/Documentation/networking/timestamping.rst
> >> +++ b/Documentation/networking/timestamping.rst
> >> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
> >>     among all possibly concurrently outstanding timestamp requests for
> >>     that socket.
> >>   
> >> +  With this option enabled user-space application can provide custom
> >> +  ID for each message sent via UDP socket with control message with
> >> +  type set to SCM_TS_OPT_ID::
> >> +
> >> +    struct msghdr *msg;
> >> +    ...
> >> +    cmsg			 = CMSG_FIRSTHDR(msg);
> >> +    cmsg->cmsg_level		 = SOL_SOCKET;
> >> +    cmsg->cmsg_type		 = SO_TIMESTAMPING;
> >> +    cmsg->cmsg_len		 = CMSG_LEN(sizeof(__u32));
> >> +    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
> >> +    err = sendmsg(fd, msg, 0);
> >> +
> > 
> > Please make it clear that this CMSG is optional.
> > 
> > The process can optionally override the default generated ID, by
> > passing a specific ID with control message SCM_TS_OPT_ID:
> 
> Ok, I'll re-phrase it this way, thanks!
> 
> 
> >>   SOF_TIMESTAMPING_OPT_ID_TCP:
> >>     Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> >>     timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> >> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> >> index e94f621903fe..0698e6662cdf 100644
> >> --- a/arch/alpha/include/uapi/asm/socket.h
> >> +++ b/arch/alpha/include/uapi/asm/socket.h
> >> @@ -10,7 +10,7 @@
> >>    * Note: we only bother about making the SOL_SOCKET options
> >>    * same as OSF/1, as that's all that "normal" programs are
> >>    * likely to set.  We don't necessarily want to be binary
> >> - * compatible with _everything_.
> >> + * compatible with _everything_.
> > 
> > Is this due to a checkpatch warning? If so, please add a brief comment
> > to the commit message to show that this change is intentional. If not,
> > please don't touch unrelated code.
> 
> I'll remove it, because it looks like it was some unhappy linter...
> 
> >>    */
> >>   #define SOL_SOCKET	0xffff
> >>   
> >> @@ -140,6 +140,8 @@
> >>   #define SO_PASSPIDFD		76
> >>   #define SO_PEERPIDFD		77
> >>   
> >> +#define SCM_TS_OPT_ID		78
> >> +
> >>   #if !defined(__KERNEL__)
> >>   
> >>   #if __BITS_PER_LONG == 64
> >> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> >> index 60ebaed28a4c..bb3dc8feb205 100644
> >> --- a/arch/mips/include/uapi/asm/socket.h
> >> +++ b/arch/mips/include/uapi/asm/socket.h
> >> @@ -151,6 +151,8 @@
> >>   #define SO_PASSPIDFD		76
> >>   #define SO_PEERPIDFD		77
> >>   
> >> +#define SCM_TS_OPT_ID		78
> >> +
> >>   #if !defined(__KERNEL__)
> >>   
> >>   #if __BITS_PER_LONG == 64
> >> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> >> index be264c2b1a11..c3ab3b3289eb 100644
> >> --- a/arch/parisc/include/uapi/asm/socket.h
> >> +++ b/arch/parisc/include/uapi/asm/socket.h
> >> @@ -132,6 +132,8 @@
> >>   #define SO_PASSPIDFD		0x404A
> >>   #define SO_PEERPIDFD		0x404B
> >>   
> >> +#define SCM_TS_OPT_ID		0x404C
> >> +
> >>   #if !defined(__KERNEL__)
> >>   
> >>   #if __BITS_PER_LONG == 64
> >> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> >> index 682da3714686..9b40f0a57fbc 100644
> >> --- a/arch/sparc/include/uapi/asm/socket.h
> >> +++ b/arch/sparc/include/uapi/asm/socket.h
> >> @@ -133,6 +133,8 @@
> >>   #define SO_PASSPIDFD             0x0055
> >>   #define SO_PEERPIDFD             0x0056
> >>   
> >> +#define SCM_TS_OPT_ID            0x0057
> >> +
> >>   #if !defined(__KERNEL__)
> >>   
> >>   
> >> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> >> index 394c3b66065e..2161d50cf0fd 100644
> >> --- a/include/net/inet_sock.h
> >> +++ b/include/net/inet_sock.h
> >> @@ -174,6 +174,7 @@ struct inet_cork {
> >>   	__s16			tos;
> >>   	char			priority;
> >>   	__u16			gso_size;
> >> +	u32			ts_opt_id;
> >>   	u64			transmit_time;
> >>   	u32			mark;
> >>   };
> >> @@ -241,7 +242,8 @@ struct inet_sock {
> >>   	struct inet_cork_full	cork;
> >>   };
> >>   
> >> -#define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
> >> +#define IPCORK_OPT		1	/* ip-options has been held in ipcork.opt */
> >> +#define IPCORK_TS_OPT_ID	2	/* timestmap opt id has been provided in cmsg */
> > 
> > typo: timestamp
> > 
> > And maybe more relevant:  /* ts_opt_id field is valid, overriding sk_tskey */
> 
> I'll change it
> 
> >>   enum {
> >>   	INET_FLAGS_PKTINFO	= 0,
> >> diff --git a/include/net/sock.h b/include/net/sock.h
> >> index f51d61fab059..73e21dad5660 100644
> >> --- a/include/net/sock.h
> >> +++ b/include/net/sock.h
> >> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
> >>   	u64 transmit_time;
> >>   	u32 mark;
> >>   	u32 tsflags;
> >> +	u32 ts_opt_id;
> >>   };
> >>   
> >>   static inline void sockcm_init(struct sockcm_cookie *sockc,
> >> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> >> index 8ce8a39a1e5f..db3df3e74b01 100644
> >> --- a/include/uapi/asm-generic/socket.h
> >> +++ b/include/uapi/asm-generic/socket.h
> >> @@ -135,6 +135,8 @@
> >>   #define SO_PASSPIDFD		76
> >>   #define SO_PEERPIDFD		77
> >>   
> >> +#define SCM_TS_OPT_ID		78
> > 
> > nit: different indentation
> 
> Hmm... that's interesting, it's ok in the code, there no spaces before
> #define. I'll re-check it in the patch in v3.
> 
> >> +
> >>   #if !defined(__KERNEL__)
> >>   
> >>   #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> >> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> >> index a2c66b3d7f0f..e2f145e3f3a1 100644
> >> --- a/include/uapi/linux/net_tstamp.h
> >> +++ b/include/uapi/linux/net_tstamp.h
> >> @@ -32,8 +32,9 @@ enum {
> >>   	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
> >>   	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
> >>   	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
> >> +	SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
> >>   
> >> -	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
> >> +	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_CMSG,
> >>   	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
> >>   				 SOF_TIMESTAMPING_LAST
> >>   };
> >> diff --git a/net/core/sock.c b/net/core/sock.c
> >> index 468b1239606c..560b075765fa 100644
> >> --- a/net/core/sock.c
> >> +++ b/net/core/sock.c
> >> @@ -2859,6 +2859,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
> >>   			return -EINVAL;
> >>   		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
> >>   		break;
> >> +	case SCM_TS_OPT_ID:
> >> +		/* allow this option for UDP sockets only */
> >> +		if (!sk_is_udp(sk))
> >> +			return -EINVAL;
> > 
> > Let's relax the restriction that this is only for UDP.
> > 
> > At least to also support SOCK_RAW. I don't think that requires any
> > additional code at all?
> 
> RAW sockets use skb_setup_tx_timestamps which does atomic operation of
> incrementing sk_tskey when in _sock_tx_timestamp. So I'll have to
> convert all spots (can, ipv4/raw, ipv6/raw, 3 x af_packet) to use the
> same logic as in udp path (sock_tx_timestamp) and add conditions.
> Or change skb_setup_tx_timestamps to do the logic and take different
> arguments. It may look as a big refactoring, so I would like to make
> it as a follow-up series.

You're referring to passing the whole sockcm_cookie through these
functions, right?

Agreed that that is a lot of churn and best left for a separate patch.
Would still be good to have it merged together in a single series, but
not a hard requirement.

Other than that it seems a small change in _sock_tx_timestamp.

@@ -2668,8 +2668,12 @@ static inline void _sock_tx_timestamp(struct sock *sk, __u16 tsflags,
        if (unlikely(tsflags)) {
-               __sock_tx_timestamp(tsflags, tx_flags);
+               __sock_tx_timestamp(sockc->tsflags, tx_flags);
                if (tsflags & SOF_TIMESTAMPING_OPT_ID && tskey &&
-                   tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
-                       *tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+                   tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
+                       if (tsflags & SOF_TIMESTAMPING_OPT_ID_CMSG)
+                               *tskey = sockc->ts_opt_id;
+                       else
+                               *tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+               }

> > Extending to TCP should be straightforward too, just a branch
> > on sockc in tcp_tx_timestamp.
> 
> TCP part looks a bit easier, as you said, I have to adjust
> tcp_tx_timestamp and the logic is straightforward. I still have to
> provide a pointer to sock coockie instead of flags, but there is only
> one caller of this function and it's much easier than with RAW sockets.

Yeah, the two are intermingled. If you want to pass sockc to
_sock_tx_timestamp then all callers have to pass it. And TCP
currently does not.

> > If so, let's support all. It makes for a simpler API if it is
> > supported uniformly wherever OPT_ID is.
> 
> If you think that the way I explained for RAW sockets is good enough,
> I can send all of them as a single patcheset. Otherwise I would like to
> add RAW sockets in follow-up series.
> 
> >> +		tsflags = READ_ONCE(sk->sk_tsflags);
> >> +		if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
> >> +			return -EINVAL;
> >> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> >> +			return -EINVAL;
> >> +		sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
> >> +		sockc->tsflags |= SOF_TIMESTAMPING_OPT_ID_CMSG;
> >> +		break;
> >>   	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
> >>   	case SCM_RIGHTS:
> >>   	case SCM_CREDENTIALS:
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
  2024-09-03 16:01     ` Vadim Fedorenko
@ 2024-09-03 20:46       ` Willem de Bruijn
  0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-09-03 20:46 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Willem de Bruijn
  Cc: Jakub Kicinski, Paolo Abeni, David Ahern, netdev, Jason Xing

Vadim Fedorenko wrote:
> On 02/09/2024 16:51, Willem de Bruijn wrote:
> > Jason Xing wrote:
> >> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >>>
> >>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
> >>> timestamps and packets sent via socket. Unfortunately, there is no way
> >>> to reliably predict socket timestamp ID value in case of error returned
> >>> by sendmsg. For UDP sockets it's impossible because of lockless
> >>> nature of UDP transmit, several threads may send packets in parallel. In
> >>> case of RAW sockets MSG_MORE option makes things complicated. More
> >>> details are in the conversation [1].
> >>> This patch adds new control message type to give user-space
> >>> software an opportunity to control the mapping between packets and
> >>> values by providing ID with each sendmsg. This works fine for UDP
> >>> sockets only, and explicit check is added to control message parser.
> >>>
> >>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
> >>>
> >>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>

> There is one more issue here. sk_tsflags is u32 as well as
> sockcm_cookie::tsflags. But sock_tx_timestamp receives __u16 tsflags,
> usually filled with sockc.tsflags. It works now because
> SOF_TIMESTAMPING_OPT_ID_TCP is not checked in these functions, but it's
> wrong in general. Should I fix it in this series or it's better to do in
> a seperate one?

Good find!

I overlooked that when expanding sk_tsflags when adding the 17th flag,
SOF_TIMESTAMPING_OPT_ID_TCP, and increasing sk_tsflags from 16 to 32b,
in commit b534dc46c8ae.

Passing sockc instead of sockc.ts_flags in all cases will address
this.

This does mean that up until now SOF_TIMESTAMPING_OPT_ID_TCP could not
be passed as a sock cookie. We did not notice that, because it is
benign: the option selects which tcp field to use as base for tskey,
and is only active on setsockopt.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
@ 2024-09-04 13:25 kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-09-04 13:25 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20240902130937.457115-1-vadfed@meta.com>
References: <20240902130937.457115-1-vadfed@meta.com>
TO: Vadim Fedorenko <vadfed@meta.com>
TO: Vadim Fedorenko <vadim.fedorenko@linux.dev>
TO: Willem de Bruijn <willemb@google.com>
TO: Jakub Kicinski <kuba@kernel.org>
TO: Paolo Abeni <pabeni@redhat.com>
TO: David Ahern <dsahern@kernel.org>
TO: Jason Xing <kerneljasonxing@gmail.com>
CC: netdev@vger.kernel.org

Hi Vadim,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Vadim-Fedorenko/selftests-txtimestamp-add-SCM_TS_OPT_ID-test/20240902-212008
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240902130937.457115-1-vadfed%40meta.com
patch subject: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
:::::: branch date: 2 days ago
:::::: commit date: 2 days ago
config: x86_64-randconfig-161-20240904 (https://download.01.org/0day-ci/archive/20240904/202409042149.JShdkgQr-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202409042149.JShdkgQr-lkp@intel.com/

smatch warnings:
net/ipv6/ip6_output.c:1815 __ip6_append_data() error: uninitialized symbol 'hold_tskey'.

vim +/hold_tskey +1815 net/ipv6/ip6_output.c

366e41d9774d70 Vlad Yasevich            2015-01-31  1414  
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1415  static int __ip6_append_data(struct sock *sk,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1416  			     struct sk_buff_head *queue,
f3b46a3e8c4030 Pavel Begunkov           2022-01-27  1417  			     struct inet_cork_full *cork_full,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1418  			     struct inet6_cork *v6_cork,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1419  			     struct page_frag *pfrag,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1420  			     int getfrag(void *from, char *to, int offset,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1421  					 int len, int odd, struct sk_buff *skb),
f93431c86b631b Wang Yufen               2022-06-07  1422  			     void *from, size_t length, int transhdrlen,
5fdaa88dfefa87 Willem de Bruijn         2018-07-06  1423  			     unsigned int flags, struct ipcm6_cookie *ipc6)
366e41d9774d70 Vlad Yasevich            2015-01-31  1424  {
366e41d9774d70 Vlad Yasevich            2015-01-31  1425  	struct sk_buff *skb, *skb_prev = NULL;
f3b46a3e8c4030 Pavel Begunkov           2022-01-27  1426  	struct inet_cork *cork = &cork_full->base;
f37a4cc6bb0ba0 Pavel Begunkov           2022-01-27  1427  	struct flowi6 *fl6 = &cork_full->fl.u.ip6;
10b8a3de603df7 Paolo Abeni              2018-03-23  1428  	unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1429  	struct ubuf_info *uarg = NULL;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1430  	int exthdrlen = 0;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1431  	int dst_exthdrlen = 0;
366e41d9774d70 Vlad Yasevich            2015-01-31  1432  	int hh_len;
366e41d9774d70 Vlad Yasevich            2015-01-31  1433  	int copy;
366e41d9774d70 Vlad Yasevich            2015-01-31  1434  	int err;
366e41d9774d70 Vlad Yasevich            2015-01-31  1435  	int offset = 0;
773ba4fe9104a6 Pavel Begunkov           2022-07-12  1436  	bool zc = false;
366e41d9774d70 Vlad Yasevich            2015-01-31  1437  	u32 tskey = 0;
e8dfd42c17faf1 Eric Dumazet             2024-04-26  1438  	struct rt6_info *rt = dst_rt6_info(cork->dst);
488b6d91b07112 Vadim Fedorenko          2024-02-13  1439  	bool paged, hold_tskey, extra_uref = false;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1440  	struct ipv6_txoptions *opt = v6_cork->opt;
32dce968dd987a Vlad Yasevich            2015-01-31  1441  	int csummode = CHECKSUM_NONE;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1442  	unsigned int maxnonfragsize, headersize;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1443  	unsigned int wmem_alloc_delta = 0;
366e41d9774d70 Vlad Yasevich            2015-01-31  1444  
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1445  	skb = skb_peek_tail(queue);
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1446  	if (!skb) {
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1447  		exthdrlen = opt ? opt->opt_flen : 0;
7efdba5bd9a2f3 Romain KUNTZ             2013-01-16  1448  		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1449  	}
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1450  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1451  	paged = !!cork->gso_size;
bec1f6f697362c Willem de Bruijn         2018-04-26  1452  	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
e367c2d03dba4c lucien                   2014-03-17  1453  	orig_mtu = mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1454  
d8d1f30b95a635 Changli Gao              2010-06-10  1455  	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1456  
a1b051405bc162 Masahide NAKAMURA        2007-12-20  1457  	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
b4ce92775c2e7f Herbert Xu               2007-11-13  1458  			(opt ? opt->opt_nflen : 0);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1459  
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1460  	headersize = sizeof(struct ipv6hdr) +
3a1cebe7e05027 Hannes Frederic Sowa     2014-05-11  1461  		     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1462  		     rt->rt6i_nfheader_len;
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1463  
5e34af4142ffe6 Tadeusz Struk            2022-03-10  1464  	if (mtu <= fragheaderlen ||
5e34af4142ffe6 Tadeusz Struk            2022-03-10  1465  	    ((mtu - fragheaderlen) & ~7) + fragheaderlen <= sizeof(struct frag_hdr))
6596a022954127 Jiri Bohac               2022-01-19  1466  		goto emsgsize;
6596a022954127 Jiri Bohac               2022-01-19  1467  
6596a022954127 Jiri Bohac               2022-01-19  1468  	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
6596a022954127 Jiri Bohac               2022-01-19  1469  		     sizeof(struct frag_hdr);
6596a022954127 Jiri Bohac               2022-01-19  1470  
10b8a3de603df7 Paolo Abeni              2018-03-23  1471  	/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
10b8a3de603df7 Paolo Abeni              2018-03-23  1472  	 * the first fragment
10b8a3de603df7 Paolo Abeni              2018-03-23  1473  	 */
10b8a3de603df7 Paolo Abeni              2018-03-23  1474  	if (headersize + transhdrlen > mtu)
10b8a3de603df7 Paolo Abeni              2018-03-23  1475  		goto emsgsize;
10b8a3de603df7 Paolo Abeni              2018-03-23  1476  
26879da58711aa Wei Wang                 2016-05-02  1477  	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1478  	    (sk->sk_protocol == IPPROTO_UDP ||
13651224c00b74 Jakub Kicinski           2022-02-16  1479  	     sk->sk_protocol == IPPROTO_ICMPV6 ||
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1480  	     sk->sk_protocol == IPPROTO_RAW)) {
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1481  		ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1482  				sizeof(struct ipv6hdr));
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1483  		goto emsgsize;
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1484  	}
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1485  
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1486  	if (ip6_sk_ignore_df(sk))
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1487  		maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1488  	else
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1489  		maxnonfragsize = mtu;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1490  
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1491  	if (cork->length + length > maxnonfragsize - headersize) {
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1492  emsgsize:
10b8a3de603df7 Paolo Abeni              2018-03-23  1493  		pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
10b8a3de603df7 Paolo Abeni              2018-03-23  1494  		ipv6_local_error(sk, EMSGSIZE, fl6, pmtu);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1495  		return -EMSGSIZE;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1496  	}
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1497  
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1498  	/* CHECKSUM_PARTIAL only with no extension headers and when
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1499  	 * we are not going to fragment
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1500  	 */
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1501  	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1502  	    headersize == sizeof(struct ipv6hdr) &&
2b89ed65a6f201 Vlad Yasevich            2017-01-29  1503  	    length <= mtu - headersize &&
bec1f6f697362c Willem de Bruijn         2018-04-26  1504  	    (!(flags & MSG_MORE) || cork->gso_size) &&
c8cd0989bd151f Tom Herbert              2015-12-14  1505  	    rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1506  		csummode = CHECKSUM_PARTIAL;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1507  
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1508  	if ((flags & MSG_ZEROCOPY) && length) {
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1509  		struct msghdr *msg = from;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1510  
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1511  		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1512  			if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1513  				return -EINVAL;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1514  
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1515  			/* Leave uarg NULL if can't zerocopy, callers should
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1516  			 * be able to handle it.
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1517  			 */
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1518  			if ((rt->dst.dev->features & NETIF_F_SG) &&
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1519  			    csummode == CHECKSUM_PARTIAL) {
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1520  				paged = true;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1521  				zc = true;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1522  				uarg = msg->msg_ubuf;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1523  			}
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1524  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
8c793822c5803e Jonathan Lemon           2021-01-06  1525  			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
b5947e5d1e710c Willem de Bruijn         2018-11-30  1526  			if (!uarg)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1527  				return -ENOBUFS;
522924b583082f Willem de Bruijn         2019-06-07  1528  			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
b5947e5d1e710c Willem de Bruijn         2018-11-30  1529  			if (rt->dst.dev->features & NETIF_F_SG &&
b5947e5d1e710c Willem de Bruijn         2018-11-30  1530  			    csummode == CHECKSUM_PARTIAL) {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1531  				paged = true;
773ba4fe9104a6 Pavel Begunkov           2022-07-12  1532  				zc = true;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1533  			} else {
e7d2b510165fff Pavel Begunkov           2022-09-23  1534  				uarg_to_msgzc(uarg)->zerocopy = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1535  				skb_zcopy_set(skb, uarg, &extra_uref);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1536  			}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1537  		}
6d8192bd69bb43 David Howells            2023-05-22  1538  	} else if ((flags & MSG_SPLICE_PAGES) && length) {
cafbe182a467bf Eric Dumazet             2023-08-16  1539  		if (inet_test_bit(HDRINCL, sk))
6d8192bd69bb43 David Howells            2023-05-22  1540  			return -EPERM;
5a6f6873606e03 David Howells            2023-06-14  1541  		if (rt->dst.dev->features & NETIF_F_SG &&
5a6f6873606e03 David Howells            2023-06-14  1542  		    getfrag == ip_generic_getfrag)
6d8192bd69bb43 David Howells            2023-05-22  1543  			/* We need an empty buffer to attach stuff to */
6d8192bd69bb43 David Howells            2023-05-22  1544  			paged = true;
6d8192bd69bb43 David Howells            2023-05-22  1545  		else
6d8192bd69bb43 David Howells            2023-05-22  1546  			flags &= ~MSG_SPLICE_PAGES;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1547  	}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1548  
b7399073687728 Vadim Fedorenko          2024-09-02  1549  	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
b7399073687728 Vadim Fedorenko          2024-09-02  1550  	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1551  		if (cork->flags & IPCORK_TS_OPT_ID) {
b7399073687728 Vadim Fedorenko          2024-09-02  1552  			tskey = cork->ts_opt_id;
b7399073687728 Vadim Fedorenko          2024-09-02  1553  		} else {
488b6d91b07112 Vadim Fedorenko          2024-02-13  1554  			tskey = atomic_inc_return(&sk->sk_tskey) - 1;
b7399073687728 Vadim Fedorenko          2024-09-02  1555  			hold_tskey = true;
b7399073687728 Vadim Fedorenko          2024-09-02  1556  		}
b7399073687728 Vadim Fedorenko          2024-09-02  1557  	}
488b6d91b07112 Vadim Fedorenko          2024-02-13  1558  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1559  	/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1560  	 * Let's try using as much space as possible.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1561  	 * Use MTU if total length of the message fits into the MTU.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1562  	 * Otherwise, we need to reserve fragment header and
^1da177e4c3f41 Linus Torvalds           2005-04-16  1563  	 * fragment alignment (= 8-15 octects, in total).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1564  	 *
634a63e73f0594 Randy Dunlap             2020-09-17  1565  	 * Note that we may need to "move" the data from the tail
^1da177e4c3f41 Linus Torvalds           2005-04-16  1566  	 * of the buffer to the new fragment when we split
^1da177e4c3f41 Linus Torvalds           2005-04-16  1567  	 * the message.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1568  	 *
^1da177e4c3f41 Linus Torvalds           2005-04-16  1569  	 * FIXME: It may be fragmented into multiple chunks
^1da177e4c3f41 Linus Torvalds           2005-04-16  1570  	 *        at once if non-fragmentable extension headers
^1da177e4c3f41 Linus Torvalds           2005-04-16  1571  	 *        are too large.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1572  	 * --yoshfuji
^1da177e4c3f41 Linus Torvalds           2005-04-16  1573  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1574  
2811ebac2521ce Hannes Frederic Sowa     2013-09-21  1575  	cork->length += length;
2811ebac2521ce Hannes Frederic Sowa     2013-09-21  1576  	if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1577  		goto alloc_new_skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1578  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1579  	while (length > 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1580  		/* Check if the remaining data fits into current packet. */
e57a34478586fe Yan Zhai                 2023-10-24  1581  		copy = (cork->length <= mtu ? mtu : maxfraglen) - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1582  		if (copy < length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1583  			copy = maxfraglen - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1584  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1585  		if (copy <= 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1586  			char *data;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1587  			unsigned int datalen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1588  			unsigned int fraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1589  			unsigned int fraggap;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1590  			unsigned int alloclen, alloc_extra;
aba36930a35e7f Willem de Bruijn         2018-11-24  1591  			unsigned int pagedlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1592  alloc_new_skb:
^1da177e4c3f41 Linus Torvalds           2005-04-16  1593  			/* There's no room in the current skb */
0c1833797a5a6e Gao feng                 2012-05-26  1594  			if (skb)
0c1833797a5a6e Gao feng                 2012-05-26  1595  				fraggap = skb->len - maxfraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1596  			else
^1da177e4c3f41 Linus Torvalds           2005-04-16  1597  				fraggap = 0;
0c1833797a5a6e Gao feng                 2012-05-26  1598  			/* update mtu and maxfraglen if necessary */
63159f29be1df7 Ian Morris               2015-03-29  1599  			if (!skb || !skb_prev)
0c1833797a5a6e Gao feng                 2012-05-26  1600  				ip6_append_data_mtu(&mtu, &maxfraglen,
75a493e60ac4bb Hannes Frederic Sowa     2013-07-02  1601  						    fragheaderlen, skb, rt,
e367c2d03dba4c lucien                   2014-03-17  1602  						    orig_mtu);
0c1833797a5a6e Gao feng                 2012-05-26  1603  
0c1833797a5a6e Gao feng                 2012-05-26  1604  			skb_prev = skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1605  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1606  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1607  			 * If remaining data exceeds the mtu,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1608  			 * we know we need more fragment(s).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1609  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1610  			datalen = length + fraggap;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1611  
e57a34478586fe Yan Zhai                 2023-10-24  1612  			if (datalen > (cork->length <= mtu ? mtu : maxfraglen) - fragheaderlen)
0c1833797a5a6e Gao feng                 2012-05-26  1613  				datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1614  			fraglen = datalen + fragheaderlen;
aba36930a35e7f Willem de Bruijn         2018-11-24  1615  			pagedlen = 0;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1616  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1617  			alloc_extra = hh_len;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1618  			alloc_extra += dst_exthdrlen;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1619  			alloc_extra += rt->dst.trailer_len;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1620  
6d123b81ac6150 Jakub Kicinski           2021-06-23  1621  			/* We just reserve space for fragment header.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1622  			 * Note: this may be overallocation if the message
6d123b81ac6150 Jakub Kicinski           2021-06-23  1623  			 * (without MSG_MORE) fits into the MTU.
6d123b81ac6150 Jakub Kicinski           2021-06-23  1624  			 */
6d123b81ac6150 Jakub Kicinski           2021-06-23  1625  			alloc_extra += sizeof(struct frag_hdr);
6d123b81ac6150 Jakub Kicinski           2021-06-23  1626  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1627  			if ((flags & MSG_MORE) &&
d8d1f30b95a635 Changli Gao              2010-06-10  1628  			    !(rt->dst.dev->features&NETIF_F_SG))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1629  				alloclen = mtu;
6d123b81ac6150 Jakub Kicinski           2021-06-23  1630  			else if (!paged &&
6d123b81ac6150 Jakub Kicinski           2021-06-23  1631  				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
6d123b81ac6150 Jakub Kicinski           2021-06-23  1632  				  !(rt->dst.dev->features & NETIF_F_SG)))
15e36f5b8e982d Willem de Bruijn         2018-04-26  1633  				alloclen = fraglen;
47cf88993c9108 Pavel Begunkov           2022-08-25  1634  			else {
773ba4fe9104a6 Pavel Begunkov           2022-07-12  1635  				alloclen = fragheaderlen + transhdrlen;
773ba4fe9104a6 Pavel Begunkov           2022-07-12  1636  				pagedlen = datalen - transhdrlen;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1637  			}
6d123b81ac6150 Jakub Kicinski           2021-06-23  1638  			alloclen += alloc_extra;
299b0767642a65 Steffen Klassert         2011-10-11  1639  
0c1833797a5a6e Gao feng                 2012-05-26  1640  			if (datalen != length + fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1641  				/*
0c1833797a5a6e Gao feng                 2012-05-26  1642  				 * this is not the last fragment, the trailer
0c1833797a5a6e Gao feng                 2012-05-26  1643  				 * space is regarded as data space.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1644  				 */
0c1833797a5a6e Gao feng                 2012-05-26  1645  				datalen += rt->dst.trailer_len;
0c1833797a5a6e Gao feng                 2012-05-26  1646  			}
0c1833797a5a6e Gao feng                 2012-05-26  1647  
0c1833797a5a6e Gao feng                 2012-05-26  1648  			fraglen = datalen + fragheaderlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1649  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1650  			copy = datalen - transhdrlen - fraggap - pagedlen;
ce650a1663354a David Howells            2023-08-02  1651  			/* [!] NOTE: copy may be negative if pagedlen>0
ce650a1663354a David Howells            2023-08-02  1652  			 * because then the equation may reduces to -fraggap.
ce650a1663354a David Howells            2023-08-02  1653  			 */
ce650a1663354a David Howells            2023-08-02  1654  			if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) {
232cd35d0804cc Eric Dumazet             2017-05-19  1655  				err = -EINVAL;
232cd35d0804cc Eric Dumazet             2017-05-19  1656  				goto error;
232cd35d0804cc Eric Dumazet             2017-05-19  1657  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1658  			if (transhdrlen) {
6d123b81ac6150 Jakub Kicinski           2021-06-23  1659  				skb = sock_alloc_send_skb(sk, alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1660  						(flags & MSG_DONTWAIT), &err);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1661  			} else {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1662  				skb = NULL;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1663  				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
^1da177e4c3f41 Linus Torvalds           2005-04-16  1664  				    2 * sk->sk_sndbuf)
6d123b81ac6150 Jakub Kicinski           2021-06-23  1665  					skb = alloc_skb(alloclen,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1666  							sk->sk_allocation);
63159f29be1df7 Ian Morris               2015-03-29  1667  				if (unlikely(!skb))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1668  					err = -ENOBUFS;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1669  			}
63159f29be1df7 Ian Morris               2015-03-29  1670  			if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1671  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1672  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1673  			 *	Fill in the control structures
^1da177e4c3f41 Linus Torvalds           2005-04-16  1674  			 */
9c9c9ad5fae7e9 Hannes Frederic Sowa     2013-08-26  1675  			skb->protocol = htons(ETH_P_IPV6);
32dce968dd987a Vlad Yasevich            2015-01-31  1676  			skb->ip_summed = csummode;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1677  			skb->csum = 0;
1f85851e17b64c Gao feng                 2012-03-19  1678  			/* reserve for fragmentation and ipsec header */
1f85851e17b64c Gao feng                 2012-03-19  1679  			skb_reserve(skb, hh_len + sizeof(struct frag_hdr) +
1f85851e17b64c Gao feng                 2012-03-19  1680  				    dst_exthdrlen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1681  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1682  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1683  			 *	Find where to start putting bytes
^1da177e4c3f41 Linus Torvalds           2005-04-16  1684  			 */
15e36f5b8e982d Willem de Bruijn         2018-04-26  1685  			data = skb_put(skb, fraglen - pagedlen);
1f85851e17b64c Gao feng                 2012-03-19  1686  			skb_set_network_header(skb, exthdrlen);
1f85851e17b64c Gao feng                 2012-03-19  1687  			data += fragheaderlen;
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1688  			skb->transport_header = (skb->network_header +
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1689  						 fragheaderlen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1690  			if (fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1691  				skb->csum = skb_copy_and_csum_bits(
^1da177e4c3f41 Linus Torvalds           2005-04-16  1692  					skb_prev, maxfraglen,
8d5930dfb7edbf Al Viro                  2020-07-10  1693  					data + transhdrlen, fraggap);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1694  				skb_prev->csum = csum_sub(skb_prev->csum,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1695  							  skb->csum);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1696  				data += fraggap;
e9fa4f7bd291c2 Herbert Xu               2006-08-13  1697  				pskb_trim_unique(skb_prev, maxfraglen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1698  			}
232cd35d0804cc Eric Dumazet             2017-05-19  1699  			if (copy > 0 &&
232cd35d0804cc Eric Dumazet             2017-05-19  1700  			    getfrag(from, data + transhdrlen, offset,
232cd35d0804cc Eric Dumazet             2017-05-19  1701  				    copy, fraggap, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1702  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1703  				kfree_skb(skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1704  				goto error;
ce650a1663354a David Howells            2023-08-02  1705  			} else if (flags & MSG_SPLICE_PAGES) {
ce650a1663354a David Howells            2023-08-02  1706  				copy = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1707  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1708  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1709  			offset += copy;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1710  			length -= copy + transhdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1711  			transhdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1712  			exthdrlen = 0;
299b0767642a65 Steffen Klassert         2011-10-11  1713  			dst_exthdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1714  
52900d22288e7d Willem de Bruijn         2018-11-30  1715  			/* Only the initial fragment is time stamped */
52900d22288e7d Willem de Bruijn         2018-11-30  1716  			skb_shinfo(skb)->tx_flags = cork->tx_flags;
52900d22288e7d Willem de Bruijn         2018-11-30  1717  			cork->tx_flags = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1718  			skb_shinfo(skb)->tskey = tskey;
52900d22288e7d Willem de Bruijn         2018-11-30  1719  			tskey = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1720  			skb_zcopy_set(skb, uarg, &extra_uref);
52900d22288e7d Willem de Bruijn         2018-11-30  1721  
0dec879f636f11 Julian Anastasov         2017-02-06  1722  			if ((flags & MSG_CONFIRM) && !skb_prev)
0dec879f636f11 Julian Anastasov         2017-02-06  1723  				skb_set_dst_pending_confirm(skb, 1);
0dec879f636f11 Julian Anastasov         2017-02-06  1724  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1725  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1726  			 * Put the packet on the pending queue
^1da177e4c3f41 Linus Torvalds           2005-04-16  1727  			 */
1f4c6eb2402968 Eric Dumazet             2018-03-31  1728  			if (!skb->destructor) {
1f4c6eb2402968 Eric Dumazet             2018-03-31  1729  				skb->destructor = sock_wfree;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1730  				skb->sk = sk;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1731  				wmem_alloc_delta += skb->truesize;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1732  			}
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1733  			__skb_queue_tail(queue, skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1734  			continue;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1735  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1736  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1737  		if (copy > length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1738  			copy = length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1739  
113f99c3358564 Willem de Bruijn         2018-05-17  1740  		if (!(rt->dst.dev->features&NETIF_F_SG) &&
113f99c3358564 Willem de Bruijn         2018-05-17  1741  		    skb_tailroom(skb) >= copy) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1742  			unsigned int off;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1743  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1744  			off = skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1745  			if (getfrag(from, skb_put(skb, copy),
^1da177e4c3f41 Linus Torvalds           2005-04-16  1746  						offset, copy, off, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1747  				__skb_trim(skb, off);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1748  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1749  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1750  			}
6d8192bd69bb43 David Howells            2023-05-22  1751  		} else if (flags & MSG_SPLICE_PAGES) {
6d8192bd69bb43 David Howells            2023-05-22  1752  			struct msghdr *msg = from;
6d8192bd69bb43 David Howells            2023-05-22  1753  
ce650a1663354a David Howells            2023-08-02  1754  			err = -EIO;
ce650a1663354a David Howells            2023-08-02  1755  			if (WARN_ON_ONCE(copy > msg->msg_iter.count))
ce650a1663354a David Howells            2023-08-02  1756  				goto error;
ce650a1663354a David Howells            2023-08-02  1757  
6d8192bd69bb43 David Howells            2023-05-22  1758  			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
6d8192bd69bb43 David Howells            2023-05-22  1759  						   sk->sk_allocation);
6d8192bd69bb43 David Howells            2023-05-22  1760  			if (err < 0)
6d8192bd69bb43 David Howells            2023-05-22  1761  				goto error;
6d8192bd69bb43 David Howells            2023-05-22  1762  			copy = err;
6d8192bd69bb43 David Howells            2023-05-22  1763  			wmem_alloc_delta += copy;
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1764  		} else if (!zc) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1765  			int i = skb_shinfo(skb)->nr_frags;
5640f7685831e0 Eric Dumazet             2012-09-23  1766  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1767  			err = -ENOMEM;
5640f7685831e0 Eric Dumazet             2012-09-23  1768  			if (!sk_page_frag_refill(sk, pfrag))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1769  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1770  
1fd3ae8c906c0f Pavel Begunkov           2022-07-12  1771  			skb_zcopy_downgrade_managed(skb);
5640f7685831e0 Eric Dumazet             2012-09-23  1772  			if (!skb_can_coalesce(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1773  					      pfrag->offset)) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1774  				err = -EMSGSIZE;
5640f7685831e0 Eric Dumazet             2012-09-23  1775  				if (i == MAX_SKB_FRAGS)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1776  					goto error;
5640f7685831e0 Eric Dumazet             2012-09-23  1777  
5640f7685831e0 Eric Dumazet             2012-09-23  1778  				__skb_fill_page_desc(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1779  						     pfrag->offset, 0);
5640f7685831e0 Eric Dumazet             2012-09-23  1780  				skb_shinfo(skb)->nr_frags = ++i;
5640f7685831e0 Eric Dumazet             2012-09-23  1781  				get_page(pfrag->page);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1782  			}
5640f7685831e0 Eric Dumazet             2012-09-23  1783  			copy = min_t(int, copy, pfrag->size - pfrag->offset);
9e903e085262ff Eric Dumazet             2011-10-18  1784  			if (getfrag(from,
5640f7685831e0 Eric Dumazet             2012-09-23  1785  				    page_address(pfrag->page) + pfrag->offset,
5640f7685831e0 Eric Dumazet             2012-09-23  1786  				    offset, copy, skb->len, skb) < 0)
5640f7685831e0 Eric Dumazet             2012-09-23  1787  				goto error_efault;
5640f7685831e0 Eric Dumazet             2012-09-23  1788  
5640f7685831e0 Eric Dumazet             2012-09-23  1789  			pfrag->offset += copy;
5640f7685831e0 Eric Dumazet             2012-09-23  1790  			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1791  			skb->len += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1792  			skb->data_len += copy;
f945fa7ad9c12a Herbert Xu               2008-01-22  1793  			skb->truesize += copy;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1794  			wmem_alloc_delta += copy;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1795  		} else {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1796  			err = skb_zerocopy_iter_dgram(skb, from, copy);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1797  			if (err < 0)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1798  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1799  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1800  		offset += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1801  		length -= copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1802  	}
5640f7685831e0 Eric Dumazet             2012-09-23  1803  
9e8445a56c253f Paolo Abeni              2018-04-04  1804  	if (wmem_alloc_delta)
1f4c6eb2402968 Eric Dumazet             2018-03-31  1805  		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1806  	return 0;
5640f7685831e0 Eric Dumazet             2012-09-23  1807  
5640f7685831e0 Eric Dumazet             2012-09-23  1808  error_efault:
5640f7685831e0 Eric Dumazet             2012-09-23  1809  	err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1810  error:
8e0449172497a9 Jonathan Lemon           2021-01-06  1811  	net_zcopy_put_abort(uarg, extra_uref);
bdc712b4c2baf9 David S. Miller          2011-05-06  1812  	cork->length -= length;
3bd653c8455bc7 Denis V. Lunev           2008-10-08  1813  	IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
1f4c6eb2402968 Eric Dumazet             2018-03-31  1814  	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
488b6d91b07112 Vadim Fedorenko          2024-02-13 @1815  	if (hold_tskey)
488b6d91b07112 Vadim Fedorenko          2024-02-13  1816  		atomic_dec(&sk->sk_tskey);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1817  	return err;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1818  }
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1819  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test
  2024-09-02 14:24   ` Jason Xing
@ 2024-09-08 20:04     ` Vadim Fedorenko
  2024-09-08 23:36       ` Jason Xing
  0 siblings, 1 reply; 22+ messages in thread
From: Vadim Fedorenko @ 2024-09-08 20:04 UTC (permalink / raw)
  To: Jason Xing
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	netdev

On 02/09/2024 15:24, Jason Xing wrote:
> On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>
>> Extend txtimestamp udp test to run with fixed tskey using
>> SCM_TS_OPT_ID control message.
>>
>> Reviewed-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> 
> Thanks for adding the combination test !
> 
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Apparently, I realised that combination tests had been coded before this
change.

run_test_all() {
   setup
   run_test_tcpudpraw    # setsockopt
   run_test_tcpudpraw -C   # cmsg
   run_test_tcpudpraw -n   # timestamp w/o data
   echo "OK. All tests passed"
}

This function runs tests for TCP/UDP/RAW sockets (defined in
run_test_tcpudpraw) 3 different times - with no extra options,
with CMSG option and with "run_test_tcpudpraw":

run_test_tcpudpraw() {
   local -r args=$@

   run_test_v4v6 ${args}   # tcp
   run_test_v4v6 ${args} -u  # udp
   run_test_v4v6 ${args} -r  # raw
   run_test_v4v6 ${args} -R  # raw (IPPROTO_RAW)
   run_test_v4v6 ${args} -P  # pf_packet
}

So if I add "-o <val>" for UDP and RAW sockets in run_test_tcpudpraw it
will be run with CMSG option too.

I'll remove the "run_test_v4v6 ${args} -u -o 42 -C" from the next
version as it's already covered.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test
  2024-09-08 20:04     ` Vadim Fedorenko
@ 2024-09-08 23:36       ` Jason Xing
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Xing @ 2024-09-08 23:36 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, David Ahern,
	netdev

On Mon, Sep 9, 2024 at 4:04 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 02/09/2024 15:24, Jason Xing wrote:
> > On Mon, Sep 2, 2024 at 9:09 PM Vadim Fedorenko <vadfed@meta.com> wrote:
> >>
> >> Extend txtimestamp udp test to run with fixed tskey using
> >> SCM_TS_OPT_ID control message.
> >>
> >> Reviewed-by: Willem de Bruijn <willemb@google.com>
> >> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> >
> > Thanks for adding the combination test !
> >
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
> Apparently, I realised that combination tests had been coded before this
> change.
>
> run_test_all() {
>    setup
>    run_test_tcpudpraw    # setsockopt
>    run_test_tcpudpraw -C   # cmsg
>    run_test_tcpudpraw -n   # timestamp w/o data
>    echo "OK. All tests passed"
> }
>
> This function runs tests for TCP/UDP/RAW sockets (defined in
> run_test_tcpudpraw) 3 different times - with no extra options,
> with CMSG option and with "run_test_tcpudpraw":
>
> run_test_tcpudpraw() {
>    local -r args=$@
>
>    run_test_v4v6 ${args}   # tcp
>    run_test_v4v6 ${args} -u  # udp
>    run_test_v4v6 ${args} -r  # raw
>    run_test_v4v6 ${args} -R  # raw (IPPROTO_RAW)
>    run_test_v4v6 ${args} -P  # pf_packet
> }
>
> So if I add "-o <val>" for UDP and RAW sockets in run_test_tcpudpraw it
> will be run with CMSG option too.
>
> I'll remove the "run_test_v4v6 ${args} -u -o 42 -C" from the next
> version as it's already covered.

Oh, right. Thanks for finding this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-09-08 23:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
2024-09-02 14:24   ` Jason Xing
2024-09-08 20:04     ` Vadim Fedorenko
2024-09-08 23:36       ` Jason Xing
2024-09-02 14:29 ` [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Willem de Bruijn
2024-09-02 21:07   ` Vadim Fedorenko
2024-09-03 20:40     ` Willem de Bruijn
2024-09-02 15:19 ` Jason Xing
2024-09-02 15:51   ` Willem de Bruijn
2024-09-02 19:40     ` Vadim Fedorenko
2024-09-02 20:59       ` Willem de Bruijn
2024-09-02 21:10         ` Vadim Fedorenko
2024-09-03 16:01     ` Vadim Fedorenko
2024-09-03 20:46       ` Willem de Bruijn
2024-09-02 18:38 ` Simon Horman
2024-09-02 19:35   ` Vadim Fedorenko
2024-09-03 15:23     ` Simon Horman
2024-09-03  8:06 ` Dan Carpenter
2024-09-03  8:16   ` Vadim Fedorenko
  -- strict thread matches above, loose matches on Subject: below --
2024-09-03  4:16 kernel test robot
2024-09-04 13:25 kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.