netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary
@ 2024-11-07 13:22 Anna Emese Nyiri
  2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
  To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel

Introduce a new helper function, `sk_set_prio_allowed`,
to centralize the logic for validating priority settings.
Add support for the `SO_PRIORITY` control message,
enabling user-space applications to set socket priority 
via control messages (cmsg).

Patch Overview:

Patch 1/3: Introduce `sk_set_prio_allowed` helper function.
Patch 2/3: Add support for setting `SO_PRIORITY` via control messages
Patch 3/3: Add test for SO_PRIORITY setting via control messages

v3:

- Updated cover letter text.
- Removed priority field from ipcm_cookie.
- Removed cork->tos value check from ip_setup_cork, so
  cork->priority will now take its value from ipc->sockc.priority.
- Replaced ipc->priority with ipc->sockc.priority
  in ip_cmsg_send().
- Modified the error handling for the SO_PRIORITY
  case in __sock_cmsg_send().
- Added missing initialization for ipc6.sockc.priority.
- Introduced cmsg_so_priority.sh test script.
- Modified cmsg_sender.c to set priority via control message (cmsg).
- Rebased on net-next

v2:

https://lore.kernel.org/netdev/20241102125136.5030-1-annaemesenyiri@gmail.com/
- Introduced sk_set_prio_allowed helper to check capability
  for setting priority.
- Removed new fields and changed sockcm_cookie::priority
  from char to u32 to align with sk_buff::priority.
- Moved the cork->tos value check for priority setting
  from __ip_make_skb() to ip_setup_cork().
- Rebased on net-next.

v1:

https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/


Anna Emese Nyiri (3):
  Introduce sk_set_prio_allowed helper function
  support SO_PRIORITY cmsg
  test SO_PRIORITY ancillary data with cmsg_sender

 include/net/inet_sock.h                       |   2 +-
 include/net/ip.h                              |   2 +-
 include/net/sock.h                            |   4 +-
 net/can/raw.c                                 |   2 +-
 net/core/sock.c                               |  18 ++-
 net/ipv4/ip_output.c                          |   4 +-
 net/ipv4/ip_sockglue.c                        |   2 +-
 net/ipv4/raw.c                                |   2 +-
 net/ipv6/ip6_output.c                         |   3 +-
 net/ipv6/raw.c                                |   3 +-
 net/ipv6/udp.c                                |   1 +
 net/packet/af_packet.c                        |   2 +-
 tools/testing/selftests/net/cmsg_sender.c     |  11 +-
 .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
 14 files changed, 156 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
  2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
  2024-11-07 13:48   ` Eric Dumazet
  2024-11-07 15:55   ` Willem de Bruijn
  2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
  2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
  2 siblings, 2 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
  To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel

Simplify priority setting permissions with the `sk_set_prio_allowed`
function, centralizing the validation logic. This change is made in
anticipation of a second caller in a following patch.
No functional changes.

Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
 net/core/sock.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..5ecf6f1a470c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -454,6 +454,13 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen,
 	return 0;
 }
 
+static bool sk_set_prio_allowed(const struct sock *sk, int val)
+{
+	return ((val >= TC_PRIO_BESTEFFORT && val <= TC_PRIO_INTERACTIVE) ||
+		sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
+		sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN));
+}
+
 static bool sock_needs_netstamp(const struct sock *sk)
 {
 	switch (sk->sk_family) {
@@ -1187,9 +1194,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 	/* handle options which do not require locking the socket. */
 	switch (optname) {
 	case SO_PRIORITY:
-		if ((val >= 0 && val <= 6) ||
-		    sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
-		    sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+		if (sk_set_prio_allowed(sk, val)) {
 			sock_set_priority(sk, val);
 			return 0;
 		}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
  2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
  2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
  2024-11-07 13:50   ` Eric Dumazet
  2024-11-07 16:17   ` Willem de Bruijn
  2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
  2 siblings, 2 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
  To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel

The Linux socket API currently allows setting SO_PRIORITY at the
socket level, applying a uniform priority to all packets sent through
that socket. The exception to this is IP_TOS, when the priority value
is calculated during the handling of
ancillary data, as implemented in commit <f02db315b8d88>
("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
However, this is a computed
value, and there is currently no mechanism to set a custom priority
via control messages prior to this patch.

According to this pacth, if SO_PRIORITY is specified as ancillary data,
the packet is sent with the priority value set through
sockc->priority, overriding the socket-level values
set via the traditional setsockopt() method. This is analogous to
the existing support for SO_MARK, as implemented in commit
<c6af0c227a22> ("ip: support SO_MARK cmsg").

Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
 include/net/inet_sock.h | 2 +-
 include/net/ip.h        | 2 +-
 include/net/sock.h      | 4 +++-
 net/can/raw.c           | 2 +-
 net/core/sock.c         | 7 +++++++
 net/ipv4/ip_output.c    | 4 ++--
 net/ipv4/ip_sockglue.c  | 2 +-
 net/ipv4/raw.c          | 2 +-
 net/ipv6/ip6_output.c   | 3 ++-
 net/ipv6/raw.c          | 3 ++-
 net/ipv6/udp.c          | 1 +
 net/packet/af_packet.c  | 2 +-
 12 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 56d8bc5593d3..3ccbad881d74 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -172,7 +172,7 @@ struct inet_cork {
 	u8			tx_flags;
 	__u8			ttl;
 	__s16			tos;
-	char			priority;
+	u32			priority;
 	__u16			gso_size;
 	u32			ts_opt_id;
 	u64			transmit_time;
diff --git a/include/net/ip.h b/include/net/ip.h
index 0e548c1f2a0e..9f5e33e371fc 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -81,7 +81,6 @@ struct ipcm_cookie {
 	__u8			protocol;
 	__u8			ttl;
 	__s16			tos;
-	char			priority;
 	__u16			gso_size;
 };
 
@@ -96,6 +95,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
 	ipcm_init(ipcm);
 
 	ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark);
+	ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority);
 	ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags);
 	ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
 	ipcm->addr = inet->inet_saddr;
diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..316a34d6c48b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1814,13 +1814,15 @@ struct sockcm_cookie {
 	u32 mark;
 	u32 tsflags;
 	u32 ts_opt_id;
+	u32 priority;
 };
 
 static inline void sockcm_init(struct sockcm_cookie *sockc,
 			       const struct sock *sk)
 {
 	*sockc = (struct sockcm_cookie) {
-		.tsflags = READ_ONCE(sk->sk_tsflags)
+		.tsflags = READ_ONCE(sk->sk_tsflags),
+		.priority = READ_ONCE(sk->sk_priority),
 	};
 }
 
diff --git a/net/can/raw.c b/net/can/raw.c
index 255c0a8f39d6..46e8ed9d64da 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 	}
 
 	skb->dev = dev;
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc.priority;
 	skb->mark = READ_ONCE(sk->sk_mark);
 	skb->tstamp = sockc.transmit_time;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 5ecf6f1a470c..68e2af168da7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2941,6 +2941,13 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS:
 		break;
+	case SO_PRIORITY:
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
+			return -EINVAL;
+		if (!sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg)))
+			return -EPERM;
+		sockc->priority = *(u32 *)CMSG_DATA(cmsg);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0065b1996c94..cd3e788600cc 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1328,7 +1328,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	cork->ttl = ipc->ttl;
 	cork->tos = ipc->tos;
 	cork->mark = ipc->sockc.mark;
-	cork->priority = ipc->priority;
+	cork->priority = ipc->sockc.priority;
 	cork->transmit_time = ipc->sockc.transmit_time;
 	cork->tx_flags = 0;
 	sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags);
@@ -1465,7 +1465,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 		ip_options_build(skb, opt, cork->addr, rt);
 	}
 
-	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
+	skb->priority = cork->priority;
 	skb->mark = cork->mark;
 	if (sk_is_tcp(sk))
 		skb_set_delivery_time(skb, cork->transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index cf377377b52d..f6a03b418dde 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -315,7 +315,7 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
 			if (val < 0 || val > 255)
 				return -EINVAL;
 			ipc->tos = val;
-			ipc->priority = rt_tos2priority(ipc->tos);
+			ipc->sockc.priority = rt_tos2priority(ipc->tos);
 			break;
 		case IP_PROTOCOL:
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 0e9e01967ec9..4304a68d1db0 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb_reserve(skb, hlen);
 
 	skb->protocol = htons(ETH_P_IP);
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc->priority;
 	skb->mark = sockc->mark;
 	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 	skb_dst_set(skb, &rt->dst);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7b4608bb316..ec9673b7ab16 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1401,6 +1401,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
 	cork->base.mark = ipc6->sockc.mark;
+	cork->base.priority = ipc6->sockc.priority;
 	sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags);
 	if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) {
 		cork->base.flags |= IPCORK_TS_OPT_ID;
@@ -1939,7 +1940,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	hdr->saddr = fl6->saddr;
 	hdr->daddr = *final_dst;
 
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = cork->base.priority;
 	skb->mark = cork->base.mark;
 	if (sk_is_tcp(sk))
 		skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8476a3944a88..a45aba090aa4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -619,7 +619,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb_reserve(skb, hlen);
 
 	skb->protocol = htons(ETH_P_IPV6);
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc->priority;
 	skb->mark = sockc->mark;
 	skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
 
@@ -780,6 +780,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	ipcm6_init(&ipc6);
 	ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
 	ipc6.sockc.mark = fl6.flowi6_mark;
+	ipc6.sockc.priority = READ_ONCE(sk->sk_priority);
 
 	if (sin6) {
 		if (addr_len < SIN6_LEN_RFC2133)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0cef8ae5d1ea..dcce9fd33e98 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1353,6 +1353,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	ipc6.gso_size = READ_ONCE(up->gso_size);
 	ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
 	ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
+	ipc6.sockc.priority = READ_ONCE(sk->sk_priority);
 
 	/* destination address check */
 	if (sin6) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 886c0dd47b66..f8d87d622699 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3126,7 +3126,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 
 	skb->protocol = proto;
 	skb->dev = dev;
-	skb->priority = READ_ONCE(sk->sk_priority);
+	skb->priority = sockc.priority;
 	skb->mark = sockc.mark;
 	skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
  2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
  2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
  2024-11-07 17:22   ` Willem de Bruijn
  2 siblings, 1 reply; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
  To: netdev
  Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel, Ido Schimmel

Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
ancillary data.

Add the `cmsg_so_priority.sh` script, which sets up two network  
namespaces (red and green) and uses the `cmsg_sender.c` program to  
send messages between them. The script sets packet priorities both  
via `setsockopt` and control messages (cmsg) and verifies whether  
packets are routed to the same queue based on priority settings.

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
 tools/testing/selftests/net/cmsg_sender.c     |  11 +-
 .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
 2 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh

diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 876c2db02a63..6fbe93dd63d2 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -52,6 +52,7 @@ struct options {
 		unsigned int tclass;
 		unsigned int hlimit;
 		unsigned int priority;
+		unsigned int priority_cmsg;
 	} sockopt;
 	struct {
 		unsigned int family;
@@ -59,6 +60,7 @@ struct options {
 		unsigned int proto;
 	} sock;
 	struct option_cmsg_u32 mark;
+	struct option_cmsg_u32 priority_cmsg;
 	struct {
 		bool ena;
 		unsigned int delay;
@@ -97,6 +99,7 @@ static void __attribute__((noreturn)) cs_usage(const char *bin)
 	       "\n"
 	       "\t\t-m val  Set SO_MARK with given value\n"
 	       "\t\t-M val  Set SO_MARK via setsockopt\n"
+		   "\t\t-Q val  Set SO_PRIORITY via cmsg\n"
 	       "\t\t-d val  Set SO_TXTIME with given delay (usec)\n"
 	       "\t\t-t      Enable time stamp reporting\n"
 	       "\t\t-f val  Set don't fragment via cmsg\n"
@@ -115,7 +118,7 @@ static void cs_parse_args(int argc, char *argv[])
 {
 	int o;
 
-	while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
+	while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:Q:")) != -1) {
 		switch (o) {
 		case 's':
 			opt.silent_send = true;
@@ -148,6 +151,10 @@ static void cs_parse_args(int argc, char *argv[])
 			opt.mark.ena = true;
 			opt.mark.val = atoi(optarg);
 			break;
+		case 'Q':
+			opt.priority_cmsg.ena = true;
+			opt.priority_cmsg.val = atoi(optarg);
+			break;
 		case 'M':
 			opt.sockopt.mark = atoi(optarg);
 			break;
@@ -252,6 +259,8 @@ cs_write_cmsg(int fd, struct msghdr *msg, char *cbuf, size_t cbuf_sz)
 
 	ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
 			  SOL_SOCKET, SO_MARK, &opt.mark);
+	ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
+			SOL_SOCKET, SO_PRIORITY, &opt.priority_cmsg);
 	ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
 			  SOL_IPV6, IPV6_DONTFRAG, &opt.v6.dontfrag);
 	ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
new file mode 100755
index 000000000000..706d29b251e9
--- /dev/null
+++ b/tools/testing/selftests/net/cmsg_so_priority.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+source lib.sh
+
+IP4=192.168.0.2/16
+TGT4=192.168.0.3/16
+TGT4_NO_MASK=192.168.0.3
+IP6=2001:db8::2/64
+TGT6=2001:db8::3/64
+TGT6_NO_MASK=2001:db8::3
+IP4BR=192.168.0.1/16
+IP6BR=2001:db8::1/64
+PORT=8080
+DELAY=400000
+QUEUE_NUM=4
+
+
+cleanup() {
+    ip netns del red 2>/dev/null
+    ip netns del green 2>/dev/null
+    ip link del br0 2>/dev/null
+    ip link del vethcab0 2>/dev/null
+    ip link del vethcab1 2>/dev/null
+}
+
+trap cleanup EXIT
+
+priority_values=($(seq 0 $((QUEUE_NUM - 1))))
+
+queue_config=""
+for ((i=0; i<$QUEUE_NUM; i++)); do
+    queue_config+=" 1@$i"
+done
+
+map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
+
+ip netns add red
+ip netns add green
+ip link add br0 type bridge
+ip link set br0 up
+ip addr add $IP4BR dev br0
+ip addr add $IP6BR dev br0
+
+ip link add vethcab0 type veth peer name red0
+ip link set vethcab0 master br0
+ip link set red0 netns red
+ip netns exec red bash -c "
+ip link set lo up
+ip link set red0 up
+ip addr add $IP4 dev red0
+ip addr add $IP6 dev red0
+sysctl -w net.ipv4.ping_group_range='0 2147483647'
+exit"
+ip link set vethcab0 up
+
+ip link add vethcab1 type veth peer name green0
+ip link set vethcab1 master br0
+ip link set green0 netns green
+ip netns exec green bash -c "
+ip link set lo up
+ip link set green0 up
+ip addr add $TGT4 dev green0
+ip addr add $TGT6 dev green0
+exit"
+ip link set vethcab1 up
+
+ip netns exec red bash -c "
+sudo ethtool -L red0 tx $QUEUE_NUM rx $QUEUE_NUM
+sudo tc qdisc add dev red0 root mqprio num_tc $QUEUE_NUM queues $queue_config map $map_config hw 0
+exit"
+
+get_queue_bytes() {
+    ip netns exec red sudo tc -s qdisc show dev red0 | grep 'Sent' | awk '{print $2}'
+}
+
+TOTAL_TESTS=0
+FAILED_TESTS=0
+
+check_result() {
+    ((TOTAL_TESTS++))
+    if [ "$1" -ne 0 ]; then
+        ((FAILED_TESTS++))
+    fi
+}
+
+
+for i in 4 6; do
+    [ $i == 4 ] && TGT=$TGT4_NO_MASK || TGT=$TGT6_NO_MASK
+
+    for p in u i r; do
+        echo "Test IPV$i, prot: $p"
+        for value in "${priority_values[@]}"; do
+            ip netns exec red ./cmsg_sender -$i -Q $value -d "${DELAY}" -p $p $TGT $PORT
+            setsockopt_priority_bytes_num=($(get_queue_bytes))
+
+            ip netns exec red ./cmsg_sender -$i -P $value -d "${DELAY}" -p $p $TGT $PORT
+            cmsg_priority_bytes_num=($(get_queue_bytes))
+
+            if [[ "${cmsg_priority_bytes_num[$actual_queue]}" != \
+            "${setsockopt_priority_bytes_num[$actual_queue]}" ]]; then
+                check_result 0
+            else
+                check_result 1
+            fi
+        done
+    done
+done
+
+if [ $FAILED_TESTS -ne 0 ]; then
+    echo "FAIL - $FAILED_TESTS/$TOTAL_TESTS tests failed"
+    exit 1
+else
+    echo "OK - All $TOTAL_TESTS tests passed"
+    exit 0
+fi
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
  2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
@ 2024-11-07 13:48   ` Eric Dumazet
  2024-11-07 15:55   ` Willem de Bruijn
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-11-07 13:48 UTC (permalink / raw)
  To: Anna Emese Nyiri; +Cc: netdev, fejes, kuba, pabeni, willemdebruijn.kernel

On Thu, Nov 7, 2024 at 2:23 PM Anna Emese Nyiri
<annaemesenyiri@gmail.com> wrote:
>
> Simplify priority setting permissions with the `sk_set_prio_allowed`
> function, centralizing the validation logic. This change is made in
> anticipation of a second caller in a following patch.
> No functional changes.
>
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
  2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
@ 2024-11-07 13:50   ` Eric Dumazet
  2024-11-07 16:17   ` Willem de Bruijn
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-11-07 13:50 UTC (permalink / raw)
  To: Anna Emese Nyiri; +Cc: netdev, fejes, kuba, pabeni, willemdebruijn.kernel

On Thu, Nov 7, 2024 at 2:23 PM Anna Emese Nyiri
<annaemesenyiri@gmail.com> wrote:
>
> The Linux socket API currently allows setting SO_PRIORITY at the
> socket level, applying a uniform priority to all packets sent through
> that socket. The exception to this is IP_TOS, when the priority value
> is calculated during the handling of
> ancillary data, as implemented in commit <f02db315b8d88>
> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
> However, this is a computed
> value, and there is currently no mechanism to set a custom priority
> via control messages prior to this patch.
>
> According to this pacth, if SO_PRIORITY is specified as ancillary data,
> the packet is sent with the priority value set through
> sockc->priority, overriding the socket-level values
> set via the traditional setsockopt() method. This is analogous to
> the existing support for SO_MARK, as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg").
>
> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
  2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
  2024-11-07 13:48   ` Eric Dumazet
@ 2024-11-07 15:55   ` Willem de Bruijn
  1 sibling, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 15:55 UTC (permalink / raw)
  To: Anna Emese Nyiri, netdev
  Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel

Anna Emese Nyiri wrote:
> Simplify priority setting permissions with the `sk_set_prio_allowed`
> function, centralizing the validation logic. This change is made in
> anticipation of a second caller in a following patch.
> No functional changes.
> 
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
  2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
  2024-11-07 13:50   ` Eric Dumazet
@ 2024-11-07 16:17   ` Willem de Bruijn
  1 sibling, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 16:17 UTC (permalink / raw)
  To: Anna Emese Nyiri, netdev
  Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel

Anna Emese Nyiri wrote:
> The Linux socket API currently allows setting SO_PRIORITY at the
> socket level, applying a uniform priority to all packets sent through
> that socket. The exception to this is IP_TOS, when the priority value
> is calculated during the handling of
> ancillary data, as implemented in commit <f02db315b8d88>
> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
> However, this is a computed
> value, and there is currently no mechanism to set a custom priority
> via control messages prior to this patch.
> 
> According to this pacth, if SO_PRIORITY is specified as ancillary data,

typo: patch

> the packet is sent with the priority value set through
> sockc->priority, overriding the socket-level values
> set via the traditional setsockopt() method. This is analogous to
> the existing support for SO_MARK, as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg").

If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that
takes precedence is the last one in the cmsg list.
 
> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index cf377377b52d..f6a03b418dde 100644
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 0e9e01967ec9..4304a68d1db0 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
>  	skb_reserve(skb, hlen);
>  
>  	skb->protocol = htons(ETH_P_IP);
> -	skb->priority = READ_ONCE(sk->sk_priority);
> +	skb->priority = sockc->priority;

This has the side effect that raw_send_hdrinc will now interpret cmsg
IP_TOS, where it previously did not (as only sockcm_cookie was passed,
not all of ipcm_cookie). This is an improvement, but good to make
explicit.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
@ 2024-11-07 17:22   ` Willem de Bruijn
  2024-11-08 14:46     ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 17:22 UTC (permalink / raw)
  To: Anna Emese Nyiri, netdev
  Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel, Ido Schimmel

Anna Emese Nyiri wrote:
> Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> ancillary data.
> 
> Add the `cmsg_so_priority.sh` script, which sets up two network  
> namespaces (red and green) and uses the `cmsg_sender.c` program to  
> send messages between them. The script sets packet priorities both  
> via `setsockopt` and control messages (cmsg) and verifies whether  
> packets are routed to the same queue based on priority settings.

qdisc. queue is a more generic term, generally referring to hw queues.
 
> Suggested-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> ---
>  tools/testing/selftests/net/cmsg_sender.c     |  11 +-
>  .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
>  2 files changed, 125 insertions(+), 1 deletion(-)
>  create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
 
> diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
> new file mode 100755
> index 000000000000..706d29b251e9
> --- /dev/null
> +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> @@ -0,0 +1,115 @@
> +#!/bin/bash

SPDX header

> +
> +source lib.sh
> +
> +IP4=192.168.0.2/16
> +TGT4=192.168.0.3/16
> +TGT4_NO_MASK=192.168.0.3
> +IP6=2001:db8::2/64
> +TGT6=2001:db8::3/64
> +TGT6_NO_MASK=2001:db8::3
> +IP4BR=192.168.0.1/16
> +IP6BR=2001:db8::1/64
> +PORT=8080
> +DELAY=400000
> +QUEUE_NUM=4
> +
> +
> +cleanup() {
> +    ip netns del red 2>/dev/null
> +    ip netns del green 2>/dev/null
> +    ip link del br0 2>/dev/null
> +    ip link del vethcab0 2>/dev/null
> +    ip link del vethcab1 2>/dev/null
> +}
> +
> +trap cleanup EXIT
> +
> +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> +
> +queue_config=""
> +for ((i=0; i<$QUEUE_NUM; i++)); do
> +    queue_config+=" 1@$i"
> +done
> +
> +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> +
> +ip netns add red
> +ip netns add green
> +ip link add br0 type bridge
> +ip link set br0 up

Is this bridge needed? Can this just use a veth pair as is.

More importantly, it would be great if we can deduplicate this kind of
setup boilerplate across similar tests as much as possible.

> +ip addr add $IP4BR dev br0
> +ip addr add $IP6BR dev br0
> +
> +ip link add vethcab0 type veth peer name red0
> +ip link set vethcab0 master br0
> +ip link set red0 netns red
> +ip netns exec red bash -c "
> +ip link set lo up
> +ip link set red0 up
> +ip addr add $IP4 dev red0
> +ip addr add $IP6 dev red0
> +sysctl -w net.ipv4.ping_group_range='0 2147483647'
> +exit"
> +ip link set vethcab0 up
> +
> +ip link add vethcab1 type veth peer name green0
> +ip link set vethcab1 master br0
> +ip link set green0 netns green
> +ip netns exec green bash -c "
> +ip link set lo up
> +ip link set green0 up
> +ip addr add $TGT4 dev green0
> +ip addr add $TGT6 dev green0
> +exit"
> +ip link set vethcab1 up
> +
> +ip netns exec red bash -c "
> +sudo ethtool -L red0 tx $QUEUE_NUM rx $QUEUE_NUM
> +sudo tc qdisc add dev red0 root mqprio num_tc $QUEUE_NUM queues $queue_config map $map_config hw 0
> +exit"
> +
> +get_queue_bytes() {
> +    ip netns exec red sudo tc -s qdisc show dev red0 | grep 'Sent' | awk '{print $2}'
> +}
> +
> +TOTAL_TESTS=0
> +FAILED_TESTS=0
> +
> +check_result() {
> +    ((TOTAL_TESTS++))
> +    if [ "$1" -ne 0 ]; then
> +        ((FAILED_TESTS++))
> +    fi
> +}
> +
> +
> +for i in 4 6; do
> +    [ $i == 4 ] && TGT=$TGT4_NO_MASK || TGT=$TGT6_NO_MASK
> +
> +    for p in u i r; do
> +        echo "Test IPV$i, prot: $p"
> +        for value in "${priority_values[@]}"; do
> +            ip netns exec red ./cmsg_sender -$i -Q $value -d "${DELAY}" -p $p $TGT $PORT
> +            setsockopt_priority_bytes_num=($(get_queue_bytes))
> +
> +            ip netns exec red ./cmsg_sender -$i -P $value -d "${DELAY}" -p $p $TGT $PORT
> +            cmsg_priority_bytes_num=($(get_queue_bytes))
> +
> +            if [[ "${cmsg_priority_bytes_num[$actual_queue]}" != \
> +            "${setsockopt_priority_bytes_num[$actual_queue]}" ]]; then
> +                check_result 0
> +            else
> +                check_result 1
> +            fi
> +        done
> +    done
> +done
> +
> +if [ $FAILED_TESTS -ne 0 ]; then
> +    echo "FAIL - $FAILED_TESTS/$TOTAL_TESTS tests failed"
> +    exit 1
> +else
> +    echo "OK - All $TOTAL_TESTS tests passed"
> +    exit 0
> +fi
> -- 
> 2.43.0
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-07 17:22   ` Willem de Bruijn
@ 2024-11-08 14:46     ` Willem de Bruijn
  2024-11-09 18:54       ` Ferenc Fejes
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-08 14:46 UTC (permalink / raw)
  To: Willem de Bruijn, Anna Emese Nyiri, netdev
  Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
	willemdebruijn.kernel, Ido Schimmel

Willem de Bruijn wrote:
> Anna Emese Nyiri wrote:
> > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > ancillary data.
> > 
> > Add the `cmsg_so_priority.sh` script, which sets up two network  
> > namespaces (red and green) and uses the `cmsg_sender.c` program to  
> > send messages between them. The script sets packet priorities both  
> > via `setsockopt` and control messages (cmsg) and verifies whether  
> > packets are routed to the same queue based on priority settings.
> 
> qdisc. queue is a more generic term, generally referring to hw queues.
>  
> > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > ---
> >  tools/testing/selftests/net/cmsg_sender.c     |  11 +-
> >  .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
> >  2 files changed, 125 insertions(+), 1 deletion(-)
> >  create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
>  
> > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
> > new file mode 100755
> > index 000000000000..706d29b251e9
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > @@ -0,0 +1,115 @@
> > +#!/bin/bash
> 
> SPDX header
> 
> > +
> > +source lib.sh
> > +
> > +IP4=192.168.0.2/16
> > +TGT4=192.168.0.3/16
> > +TGT4_NO_MASK=192.168.0.3
> > +IP6=2001:db8::2/64
> > +TGT6=2001:db8::3/64
> > +TGT6_NO_MASK=2001:db8::3
> > +IP4BR=192.168.0.1/16
> > +IP6BR=2001:db8::1/64
> > +PORT=8080
> > +DELAY=400000
> > +QUEUE_NUM=4
> > +
> > +
> > +cleanup() {
> > +    ip netns del red 2>/dev/null
> > +    ip netns del green 2>/dev/null
> > +    ip link del br0 2>/dev/null
> > +    ip link del vethcab0 2>/dev/null
> > +    ip link del vethcab1 2>/dev/null
> > +}
> > +
> > +trap cleanup EXIT
> > +
> > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > +
> > +queue_config=""
> > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > +    queue_config+=" 1@$i"
> > +done
> > +
> > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > +
> > +ip netns add red
> > +ip netns add green
> > +ip link add br0 type bridge
> > +ip link set br0 up
> 
> Is this bridge needed? Can this just use a veth pair as is.
> 
> More importantly, it would be great if we can deduplicate this kind of
> setup boilerplate across similar tests as much as possible.

As a matter of fact, similar to cmsg_so_mark, this test can probably
use a dummy netdevice, no need for a second namespace and dev.

cmsg_so_mark.sh is probably small enough that it is fine to copy that
and create a duplicate. As trying to extend it to cover both tests
will probably double it in size and will just be harder to follow.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-08 14:46     ` Willem de Bruijn
@ 2024-11-09 18:54       ` Ferenc Fejes
  2024-11-09 19:56         ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: Ferenc Fejes @ 2024-11-09 18:54 UTC (permalink / raw)
  To: Willem de Bruijn, Anna Emese Nyiri, netdev
  Cc: edumazet, kuba, pabeni, Ido Schimmel

On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Anna Emese Nyiri wrote:
> > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > ancillary data.
> > > 
> > > Add the `cmsg_so_priority.sh` script, which sets up two network  
> > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > to  
> > > send messages between them. The script sets packet priorities
> > > both  
> > > via `setsockopt` and control messages (cmsg) and verifies
> > > whether  
> > > packets are routed to the same queue based on priority settings.
> > 
> > qdisc. queue is a more generic term, generally referring to hw
> > queues.
> >  
> > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > ---
> > >  tools/testing/selftests/net/cmsg_sender.c     |  11 +-
> > >  .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > ++++++++++++++++++
> > >  2 files changed, 125 insertions(+), 1 deletion(-)
> > >  create mode 100755
> > > tools/testing/selftests/net/cmsg_so_priority.sh
> >  
> > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > new file mode 100755
> > > index 000000000000..706d29b251e9
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > @@ -0,0 +1,115 @@
> > > +#!/bin/bash
> > 
> > SPDX header
> > 
> > > +
> > > +source lib.sh
> > > +
> > > +IP4=192.168.0.2/16
> > > +TGT4=192.168.0.3/16
> > > +TGT4_NO_MASK=192.168.0.3
> > > +IP6=2001:db8::2/64
> > > +TGT6=2001:db8::3/64
> > > +TGT6_NO_MASK=2001:db8::3
> > > +IP4BR=192.168.0.1/16
> > > +IP6BR=2001:db8::1/64
> > > +PORT=8080
> > > +DELAY=400000
> > > +QUEUE_NUM=4
> > > +
> > > +
> > > +cleanup() {
> > > +    ip netns del red 2>/dev/null
> > > +    ip netns del green 2>/dev/null
> > > +    ip link del br0 2>/dev/null
> > > +    ip link del vethcab0 2>/dev/null
> > > +    ip link del vethcab1 2>/dev/null
> > > +}
> > > +
> > > +trap cleanup EXIT
> > > +
> > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > +
> > > +queue_config=""
> > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > +    queue_config+=" 1@$i"
> > > +done
> > > +
> > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > +
> > > +ip netns add red
> > > +ip netns add green
> > > +ip link add br0 type bridge
> > > +ip link set br0 up
> > 
> > Is this bridge needed? Can this just use a veth pair as is.
> > 
> > More importantly, it would be great if we can deduplicate this kind
> > of
> > setup boilerplate across similar tests as much as possible.
> 
> As a matter of fact, similar to cmsg_so_mark, this test can probably
> use a dummy netdevice, no need for a second namespace and dev.
> 
> cmsg_so_mark.sh is probably small enough that it is fine to copy that
> and create a duplicate. As trying to extend it to cover both tests
> will probably double it in size and will just be harder to follow.

I'm afraid we don't have "ip rule" match argument for skb->priority
like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
rule matches to confirm if skb->mark is correct or not.

Using nftables meta priority would work with a dummy device:
add table so_prio
add chain so_prio so_prio_chain { type filter hook output priority 0; }
add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter

Is there anything simpler? I am afraid we cannot use nftables in
selftests, or can we? Thanks!




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-09 18:54       ` Ferenc Fejes
@ 2024-11-09 19:56         ` Willem de Bruijn
  2024-11-10 13:27           ` Ido Schimmel
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-09 19:56 UTC (permalink / raw)
  To: Ferenc Fejes, Willem de Bruijn, Anna Emese Nyiri, netdev
  Cc: edumazet, kuba, pabeni, Ido Schimmel

Ferenc Fejes wrote:
> On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> > Willem de Bruijn wrote:
> > > Anna Emese Nyiri wrote:
> > > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > > ancillary data.
> > > > 
> > > > Add the `cmsg_so_priority.sh` script, which sets up two network  
> > > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > > to  
> > > > send messages between them. The script sets packet priorities
> > > > both  
> > > > via `setsockopt` and control messages (cmsg) and verifies
> > > > whether  
> > > > packets are routed to the same queue based on priority settings.
> > > 
> > > qdisc. queue is a more generic term, generally referring to hw
> > > queues.
> > >  
> > > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > > ---
> > > >  tools/testing/selftests/net/cmsg_sender.c     |  11 +-
> > > >  .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > > ++++++++++++++++++
> > > >  2 files changed, 125 insertions(+), 1 deletion(-)
> > > >  create mode 100755
> > > > tools/testing/selftests/net/cmsg_so_priority.sh
> > >  
> > > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > new file mode 100755
> > > > index 000000000000..706d29b251e9
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > @@ -0,0 +1,115 @@
> > > > +#!/bin/bash
> > > 
> > > SPDX header
> > > 
> > > > +
> > > > +source lib.sh
> > > > +
> > > > +IP4=192.168.0.2/16
> > > > +TGT4=192.168.0.3/16
> > > > +TGT4_NO_MASK=192.168.0.3
> > > > +IP6=2001:db8::2/64
> > > > +TGT6=2001:db8::3/64
> > > > +TGT6_NO_MASK=2001:db8::3
> > > > +IP4BR=192.168.0.1/16
> > > > +IP6BR=2001:db8::1/64
> > > > +PORT=8080
> > > > +DELAY=400000
> > > > +QUEUE_NUM=4
> > > > +
> > > > +
> > > > +cleanup() {
> > > > +    ip netns del red 2>/dev/null
> > > > +    ip netns del green 2>/dev/null
> > > > +    ip link del br0 2>/dev/null
> > > > +    ip link del vethcab0 2>/dev/null
> > > > +    ip link del vethcab1 2>/dev/null
> > > > +}
> > > > +
> > > > +trap cleanup EXIT
> > > > +
> > > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > > +
> > > > +queue_config=""
> > > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > > +    queue_config+=" 1@$i"
> > > > +done
> > > > +
> > > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > > +
> > > > +ip netns add red
> > > > +ip netns add green
> > > > +ip link add br0 type bridge
> > > > +ip link set br0 up
> > > 
> > > Is this bridge needed? Can this just use a veth pair as is.
> > > 
> > > More importantly, it would be great if we can deduplicate this kind
> > > of
> > > setup boilerplate across similar tests as much as possible.
> > 
> > As a matter of fact, similar to cmsg_so_mark, this test can probably
> > use a dummy netdevice, no need for a second namespace and dev.
> > 
> > cmsg_so_mark.sh is probably small enough that it is fine to copy that
> > and create a duplicate. As trying to extend it to cover both tests
> > will probably double it in size and will just be harder to follow.
> 
> I'm afraid we don't have "ip rule" match argument for skb->priority
> like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
> rule matches to confirm if skb->mark is correct or not.
> 
> Using nftables meta priority would work with a dummy device:
> add table so_prio
> add chain so_prio so_prio_chain { type filter hook output priority 0; }
> add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter
> 
> Is there anything simpler? I am afraid we cannot use nftables in
> selftests, or can we? Thanks!

I'd use traffic shaper (tc). There are a variety of qdiscs/schedulers
and classifiers that act on skb->priority.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
  2024-11-09 19:56         ` Willem de Bruijn
@ 2024-11-10 13:27           ` Ido Schimmel
  0 siblings, 0 replies; 13+ messages in thread
From: Ido Schimmel @ 2024-11-10 13:27 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Ferenc Fejes, Anna Emese Nyiri, netdev, edumazet, kuba, pabeni

On Sat, Nov 09, 2024 at 02:56:29PM -0500, Willem de Bruijn wrote:
> Ferenc Fejes wrote:
> > On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> > > Willem de Bruijn wrote:
> > > > Anna Emese Nyiri wrote:
> > > > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > > > ancillary data.
> > > > > 
> > > > > Add the `cmsg_so_priority.sh` script, which sets up two network  
> > > > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > > > to  
> > > > > send messages between them. The script sets packet priorities
> > > > > both  
> > > > > via `setsockopt` and control messages (cmsg) and verifies
> > > > > whether  
> > > > > packets are routed to the same queue based on priority settings.
> > > > 
> > > > qdisc. queue is a more generic term, generally referring to hw
> > > > queues.
> > > >  
> > > > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > > > ---
> > > > >  tools/testing/selftests/net/cmsg_sender.c     |  11 +-
> > > > >  .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > > > ++++++++++++++++++
> > > > >  2 files changed, 125 insertions(+), 1 deletion(-)
> > > > >  create mode 100755
> > > > > tools/testing/selftests/net/cmsg_so_priority.sh
> > > >  
> > > > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > new file mode 100755
> > > > > index 000000000000..706d29b251e9
> > > > > --- /dev/null
> > > > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > @@ -0,0 +1,115 @@
> > > > > +#!/bin/bash
> > > > 
> > > > SPDX header
> > > > 
> > > > > +
> > > > > +source lib.sh
> > > > > +
> > > > > +IP4=192.168.0.2/16
> > > > > +TGT4=192.168.0.3/16
> > > > > +TGT4_NO_MASK=192.168.0.3
> > > > > +IP6=2001:db8::2/64
> > > > > +TGT6=2001:db8::3/64
> > > > > +TGT6_NO_MASK=2001:db8::3
> > > > > +IP4BR=192.168.0.1/16
> > > > > +IP6BR=2001:db8::1/64
> > > > > +PORT=8080
> > > > > +DELAY=400000
> > > > > +QUEUE_NUM=4
> > > > > +
> > > > > +
> > > > > +cleanup() {
> > > > > +    ip netns del red 2>/dev/null
> > > > > +    ip netns del green 2>/dev/null
> > > > > +    ip link del br0 2>/dev/null
> > > > > +    ip link del vethcab0 2>/dev/null
> > > > > +    ip link del vethcab1 2>/dev/null
> > > > > +}
> > > > > +
> > > > > +trap cleanup EXIT
> > > > > +
> > > > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > > > +
> > > > > +queue_config=""
> > > > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > > > +    queue_config+=" 1@$i"
> > > > > +done
> > > > > +
> > > > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > > > +
> > > > > +ip netns add red
> > > > > +ip netns add green
> > > > > +ip link add br0 type bridge
> > > > > +ip link set br0 up
> > > > 
> > > > Is this bridge needed? Can this just use a veth pair as is.
> > > > 
> > > > More importantly, it would be great if we can deduplicate this kind
> > > > of
> > > > setup boilerplate across similar tests as much as possible.
> > > 
> > > As a matter of fact, similar to cmsg_so_mark, this test can probably
> > > use a dummy netdevice, no need for a second namespace and dev.
> > > 
> > > cmsg_so_mark.sh is probably small enough that it is fine to copy that
> > > and create a duplicate. As trying to extend it to cover both tests
> > > will probably double it in size and will just be harder to follow.
> > 
> > I'm afraid we don't have "ip rule" match argument for skb->priority
> > like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
> > rule matches to confirm if skb->mark is correct or not.
> > 
> > Using nftables meta priority would work with a dummy device:
> > add table so_prio
> > add chain so_prio so_prio_chain { type filter hook output priority 0; }
> > add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter
> > 
> > Is there anything simpler? I am afraid we cannot use nftables in
> > selftests, or can we? Thanks!
> 
> I'd use traffic shaper (tc). There are a variety of qdiscs/schedulers
> and classifiers that act on skb->priority.

When I initially suggested the test I was thinking about creating a VLAN
device with egress QoS map and then matching on VLAN priority with
flower. Something like [1]. It is just a quick hack. Proper test should
test with all combinations of IPv4 / IPv6 / UDP / ICMP / RAW / cmsg /
sockopt (like cmsg_so_mark.sh).

[1]
#!/bin/bash

ip netns del ns1 &> /dev/null
ip netns add ns1

ip -n ns1 link set dev lo up
ip -n ns1 link add name dummy1 up type dummy

ip -n ns1 link add link dummy1 name dummy1.10 up type vlan id 10 \
	egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
ip -n ns1 address add 192.0.2.1/24 dev dummy1.10
ip -n ns1 neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud permanent dev \
	dummy1.10

tc -n ns1 qdisc add dev dummy1 clsact
for i in {0..7}; do
	tc -n ns1 filter add dev dummy1 egress pref 1 handle 10${i} \
		proto 802.1q flower vlan_prio $i vlan_ethtype ipv4 \
		ip_proto udp dst_ip 192.0.2.2 action drop
done

for i in {0..7}; do
	pkts=$(tc -n ns1 -j -s filter show dev dummy1 egress \
		| jq ".[] | select(.options.handle == 10$i) | \
		.options.actions[0].stats.packets")
	[[ $pkts == 0 ]] || echo "prio $i: expected 0, got $pkts"

	ip netns exec ns1 ./cmsg_sender -4 -p udp -P $i 192.0.2.2 1234

	pkts=$(tc -n ns1 -j -s filter show dev dummy1 egress \
		| jq ".[] | select(.options.handle == 10$i) | \
		.options.actions[0].stats.packets")
	[[ $pkts == 1 ]] || echo "prio $i: expected 1, got $pkts"
done

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-11-10 13:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
2024-11-07 13:48   ` Eric Dumazet
2024-11-07 15:55   ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
2024-11-07 13:50   ` Eric Dumazet
2024-11-07 16:17   ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
2024-11-07 17:22   ` Willem de Bruijn
2024-11-08 14:46     ` Willem de Bruijn
2024-11-09 18:54       ` Ferenc Fejes
2024-11-09 19:56         ` Willem de Bruijn
2024-11-10 13:27           ` Ido Schimmel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).