* [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary
@ 2024-11-07 13:22 Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
Introduce a new helper function, `sk_set_prio_allowed`,
to centralize the logic for validating priority settings.
Add support for the `SO_PRIORITY` control message,
enabling user-space applications to set socket priority
via control messages (cmsg).
Patch Overview:
Patch 1/3: Introduce `sk_set_prio_allowed` helper function.
Patch 2/3: Add support for setting `SO_PRIORITY` via control messages
Patch 3/3: Add test for SO_PRIORITY setting via control messages
v3:
- Updated cover letter text.
- Removed priority field from ipcm_cookie.
- Removed cork->tos value check from ip_setup_cork, so
cork->priority will now take its value from ipc->sockc.priority.
- Replaced ipc->priority with ipc->sockc.priority
in ip_cmsg_send().
- Modified the error handling for the SO_PRIORITY
case in __sock_cmsg_send().
- Added missing initialization for ipc6.sockc.priority.
- Introduced cmsg_so_priority.sh test script.
- Modified cmsg_sender.c to set priority via control message (cmsg).
- Rebased on net-next
v2:
https://lore.kernel.org/netdev/20241102125136.5030-1-annaemesenyiri@gmail.com/
- Introduced sk_set_prio_allowed helper to check capability
for setting priority.
- Removed new fields and changed sockcm_cookie::priority
from char to u32 to align with sk_buff::priority.
- Moved the cork->tos value check for priority setting
from __ip_make_skb() to ip_setup_cork().
- Rebased on net-next.
v1:
https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/
Anna Emese Nyiri (3):
Introduce sk_set_prio_allowed helper function
support SO_PRIORITY cmsg
test SO_PRIORITY ancillary data with cmsg_sender
include/net/inet_sock.h | 2 +-
include/net/ip.h | 2 +-
include/net/sock.h | 4 +-
net/can/raw.c | 2 +-
net/core/sock.c | 18 ++-
net/ipv4/ip_output.c | 4 +-
net/ipv4/ip_sockglue.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv6/ip6_output.c | 3 +-
net/ipv6/raw.c | 3 +-
net/ipv6/udp.c | 1 +
net/packet/af_packet.c | 2 +-
tools/testing/selftests/net/cmsg_sender.c | 11 +-
.../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
14 files changed, 156 insertions(+), 15 deletions(-)
create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
--
2.43.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
2024-11-07 13:48 ` Eric Dumazet
2024-11-07 15:55 ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
2 siblings, 2 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
Simplify priority setting permissions with the `sk_set_prio_allowed`
function, centralizing the validation logic. This change is made in
anticipation of a second caller in a following patch.
No functional changes.
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
net/core/sock.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..5ecf6f1a470c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -454,6 +454,13 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen,
return 0;
}
+static bool sk_set_prio_allowed(const struct sock *sk, int val)
+{
+ return ((val >= TC_PRIO_BESTEFFORT && val <= TC_PRIO_INTERACTIVE) ||
+ sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
+ sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN));
+}
+
static bool sock_needs_netstamp(const struct sock *sk)
{
switch (sk->sk_family) {
@@ -1187,9 +1194,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
/* handle options which do not require locking the socket. */
switch (optname) {
case SO_PRIORITY:
- if ((val >= 0 && val <= 6) ||
- sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
- sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+ if (sk_set_prio_allowed(sk, val)) {
sock_set_priority(sk, val);
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
2024-11-07 13:50 ` Eric Dumazet
2024-11-07 16:17 ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
2 siblings, 2 replies; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
The Linux socket API currently allows setting SO_PRIORITY at the
socket level, applying a uniform priority to all packets sent through
that socket. The exception to this is IP_TOS, when the priority value
is calculated during the handling of
ancillary data, as implemented in commit <f02db315b8d88>
("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
However, this is a computed
value, and there is currently no mechanism to set a custom priority
via control messages prior to this patch.
According to this pacth, if SO_PRIORITY is specified as ancillary data,
the packet is sent with the priority value set through
sockc->priority, overriding the socket-level values
set via the traditional setsockopt() method. This is analogous to
the existing support for SO_MARK, as implemented in commit
<c6af0c227a22> ("ip: support SO_MARK cmsg").
Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
include/net/inet_sock.h | 2 +-
include/net/ip.h | 2 +-
include/net/sock.h | 4 +++-
net/can/raw.c | 2 +-
net/core/sock.c | 7 +++++++
net/ipv4/ip_output.c | 4 ++--
net/ipv4/ip_sockglue.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv6/ip6_output.c | 3 ++-
net/ipv6/raw.c | 3 ++-
net/ipv6/udp.c | 1 +
net/packet/af_packet.c | 2 +-
12 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 56d8bc5593d3..3ccbad881d74 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -172,7 +172,7 @@ struct inet_cork {
u8 tx_flags;
__u8 ttl;
__s16 tos;
- char priority;
+ u32 priority;
__u16 gso_size;
u32 ts_opt_id;
u64 transmit_time;
diff --git a/include/net/ip.h b/include/net/ip.h
index 0e548c1f2a0e..9f5e33e371fc 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -81,7 +81,6 @@ struct ipcm_cookie {
__u8 protocol;
__u8 ttl;
__s16 tos;
- char priority;
__u16 gso_size;
};
@@ -96,6 +95,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
ipcm_init(ipcm);
ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark);
+ ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority);
ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags);
ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
ipcm->addr = inet->inet_saddr;
diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..316a34d6c48b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1814,13 +1814,15 @@ struct sockcm_cookie {
u32 mark;
u32 tsflags;
u32 ts_opt_id;
+ u32 priority;
};
static inline void sockcm_init(struct sockcm_cookie *sockc,
const struct sock *sk)
{
*sockc = (struct sockcm_cookie) {
- .tsflags = READ_ONCE(sk->sk_tsflags)
+ .tsflags = READ_ONCE(sk->sk_tsflags),
+ .priority = READ_ONCE(sk->sk_priority),
};
}
diff --git a/net/can/raw.c b/net/can/raw.c
index 255c0a8f39d6..46e8ed9d64da 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
}
skb->dev = dev;
- skb->priority = READ_ONCE(sk->sk_priority);
+ skb->priority = sockc.priority;
skb->mark = READ_ONCE(sk->sk_mark);
skb->tstamp = sockc.transmit_time;
diff --git a/net/core/sock.c b/net/core/sock.c
index 5ecf6f1a470c..68e2af168da7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2941,6 +2941,13 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
case SCM_RIGHTS:
case SCM_CREDENTIALS:
break;
+ case SO_PRIORITY:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
+ return -EINVAL;
+ if (!sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg)))
+ return -EPERM;
+ sockc->priority = *(u32 *)CMSG_DATA(cmsg);
+ break;
default:
return -EINVAL;
}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0065b1996c94..cd3e788600cc 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1328,7 +1328,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
cork->ttl = ipc->ttl;
cork->tos = ipc->tos;
cork->mark = ipc->sockc.mark;
- cork->priority = ipc->priority;
+ cork->priority = ipc->sockc.priority;
cork->transmit_time = ipc->sockc.transmit_time;
cork->tx_flags = 0;
sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags);
@@ -1465,7 +1465,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
ip_options_build(skb, opt, cork->addr, rt);
}
- skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
+ skb->priority = cork->priority;
skb->mark = cork->mark;
if (sk_is_tcp(sk))
skb_set_delivery_time(skb, cork->transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index cf377377b52d..f6a03b418dde 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -315,7 +315,7 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
if (val < 0 || val > 255)
return -EINVAL;
ipc->tos = val;
- ipc->priority = rt_tos2priority(ipc->tos);
+ ipc->sockc.priority = rt_tos2priority(ipc->tos);
break;
case IP_PROTOCOL:
if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 0e9e01967ec9..4304a68d1db0 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb_reserve(skb, hlen);
skb->protocol = htons(ETH_P_IP);
- skb->priority = READ_ONCE(sk->sk_priority);
+ skb->priority = sockc->priority;
skb->mark = sockc->mark;
skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
skb_dst_set(skb, &rt->dst);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7b4608bb316..ec9673b7ab16 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1401,6 +1401,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
cork->base.mark = ipc6->sockc.mark;
+ cork->base.priority = ipc6->sockc.priority;
sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags);
if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) {
cork->base.flags |= IPCORK_TS_OPT_ID;
@@ -1939,7 +1940,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
hdr->saddr = fl6->saddr;
hdr->daddr = *final_dst;
- skb->priority = READ_ONCE(sk->sk_priority);
+ skb->priority = cork->base.priority;
skb->mark = cork->base.mark;
if (sk_is_tcp(sk))
skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8476a3944a88..a45aba090aa4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -619,7 +619,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb_reserve(skb, hlen);
skb->protocol = htons(ETH_P_IPV6);
- skb->priority = READ_ONCE(sk->sk_priority);
+ skb->priority = sockc->priority;
skb->mark = sockc->mark;
skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
@@ -780,6 +780,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipcm6_init(&ipc6);
ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
ipc6.sockc.mark = fl6.flowi6_mark;
+ ipc6.sockc.priority = READ_ONCE(sk->sk_priority);
if (sin6) {
if (addr_len < SIN6_LEN_RFC2133)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0cef8ae5d1ea..dcce9fd33e98 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1353,6 +1353,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipc6.gso_size = READ_ONCE(up->gso_size);
ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
+ ipc6.sockc.priority = READ_ONCE(sk->sk_priority);
/* destination address check */
if (sin6) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 886c0dd47b66..f8d87d622699 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3126,7 +3126,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
skb->protocol = proto;
skb->dev = dev;
- skb->priority = READ_ONCE(sk->sk_priority);
+ skb->priority = sockc.priority;
skb->mark = sockc.mark;
skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
@ 2024-11-07 13:22 ` Anna Emese Nyiri
2024-11-07 17:22 ` Willem de Bruijn
2 siblings, 1 reply; 13+ messages in thread
From: Anna Emese Nyiri @ 2024-11-07 13:22 UTC (permalink / raw)
To: netdev
Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel, Ido Schimmel
Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
ancillary data.
Add the `cmsg_so_priority.sh` script, which sets up two network
namespaces (red and green) and uses the `cmsg_sender.c` program to
send messages between them. The script sets packet priorities both
via `setsockopt` and control messages (cmsg) and verifies whether
packets are routed to the same queue based on priority settings.
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
---
tools/testing/selftests/net/cmsg_sender.c | 11 +-
.../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
2 files changed, 125 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 876c2db02a63..6fbe93dd63d2 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -52,6 +52,7 @@ struct options {
unsigned int tclass;
unsigned int hlimit;
unsigned int priority;
+ unsigned int priority_cmsg;
} sockopt;
struct {
unsigned int family;
@@ -59,6 +60,7 @@ struct options {
unsigned int proto;
} sock;
struct option_cmsg_u32 mark;
+ struct option_cmsg_u32 priority_cmsg;
struct {
bool ena;
unsigned int delay;
@@ -97,6 +99,7 @@ static void __attribute__((noreturn)) cs_usage(const char *bin)
"\n"
"\t\t-m val Set SO_MARK with given value\n"
"\t\t-M val Set SO_MARK via setsockopt\n"
+ "\t\t-Q val Set SO_PRIORITY via cmsg\n"
"\t\t-d val Set SO_TXTIME with given delay (usec)\n"
"\t\t-t Enable time stamp reporting\n"
"\t\t-f val Set don't fragment via cmsg\n"
@@ -115,7 +118,7 @@ static void cs_parse_args(int argc, char *argv[])
{
int o;
- while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
+ while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:Q:")) != -1) {
switch (o) {
case 's':
opt.silent_send = true;
@@ -148,6 +151,10 @@ static void cs_parse_args(int argc, char *argv[])
opt.mark.ena = true;
opt.mark.val = atoi(optarg);
break;
+ case 'Q':
+ opt.priority_cmsg.ena = true;
+ opt.priority_cmsg.val = atoi(optarg);
+ break;
case 'M':
opt.sockopt.mark = atoi(optarg);
break;
@@ -252,6 +259,8 @@ cs_write_cmsg(int fd, struct msghdr *msg, char *cbuf, size_t cbuf_sz)
ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
SOL_SOCKET, SO_MARK, &opt.mark);
+ ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
+ SOL_SOCKET, SO_PRIORITY, &opt.priority_cmsg);
ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
SOL_IPV6, IPV6_DONTFRAG, &opt.v6.dontfrag);
ca_write_cmsg_u32(cbuf, cbuf_sz, &cmsg_len,
diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
new file mode 100755
index 000000000000..706d29b251e9
--- /dev/null
+++ b/tools/testing/selftests/net/cmsg_so_priority.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+source lib.sh
+
+IP4=192.168.0.2/16
+TGT4=192.168.0.3/16
+TGT4_NO_MASK=192.168.0.3
+IP6=2001:db8::2/64
+TGT6=2001:db8::3/64
+TGT6_NO_MASK=2001:db8::3
+IP4BR=192.168.0.1/16
+IP6BR=2001:db8::1/64
+PORT=8080
+DELAY=400000
+QUEUE_NUM=4
+
+
+cleanup() {
+ ip netns del red 2>/dev/null
+ ip netns del green 2>/dev/null
+ ip link del br0 2>/dev/null
+ ip link del vethcab0 2>/dev/null
+ ip link del vethcab1 2>/dev/null
+}
+
+trap cleanup EXIT
+
+priority_values=($(seq 0 $((QUEUE_NUM - 1))))
+
+queue_config=""
+for ((i=0; i<$QUEUE_NUM; i++)); do
+ queue_config+=" 1@$i"
+done
+
+map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
+
+ip netns add red
+ip netns add green
+ip link add br0 type bridge
+ip link set br0 up
+ip addr add $IP4BR dev br0
+ip addr add $IP6BR dev br0
+
+ip link add vethcab0 type veth peer name red0
+ip link set vethcab0 master br0
+ip link set red0 netns red
+ip netns exec red bash -c "
+ip link set lo up
+ip link set red0 up
+ip addr add $IP4 dev red0
+ip addr add $IP6 dev red0
+sysctl -w net.ipv4.ping_group_range='0 2147483647'
+exit"
+ip link set vethcab0 up
+
+ip link add vethcab1 type veth peer name green0
+ip link set vethcab1 master br0
+ip link set green0 netns green
+ip netns exec green bash -c "
+ip link set lo up
+ip link set green0 up
+ip addr add $TGT4 dev green0
+ip addr add $TGT6 dev green0
+exit"
+ip link set vethcab1 up
+
+ip netns exec red bash -c "
+sudo ethtool -L red0 tx $QUEUE_NUM rx $QUEUE_NUM
+sudo tc qdisc add dev red0 root mqprio num_tc $QUEUE_NUM queues $queue_config map $map_config hw 0
+exit"
+
+get_queue_bytes() {
+ ip netns exec red sudo tc -s qdisc show dev red0 | grep 'Sent' | awk '{print $2}'
+}
+
+TOTAL_TESTS=0
+FAILED_TESTS=0
+
+check_result() {
+ ((TOTAL_TESTS++))
+ if [ "$1" -ne 0 ]; then
+ ((FAILED_TESTS++))
+ fi
+}
+
+
+for i in 4 6; do
+ [ $i == 4 ] && TGT=$TGT4_NO_MASK || TGT=$TGT6_NO_MASK
+
+ for p in u i r; do
+ echo "Test IPV$i, prot: $p"
+ for value in "${priority_values[@]}"; do
+ ip netns exec red ./cmsg_sender -$i -Q $value -d "${DELAY}" -p $p $TGT $PORT
+ setsockopt_priority_bytes_num=($(get_queue_bytes))
+
+ ip netns exec red ./cmsg_sender -$i -P $value -d "${DELAY}" -p $p $TGT $PORT
+ cmsg_priority_bytes_num=($(get_queue_bytes))
+
+ if [[ "${cmsg_priority_bytes_num[$actual_queue]}" != \
+ "${setsockopt_priority_bytes_num[$actual_queue]}" ]]; then
+ check_result 0
+ else
+ check_result 1
+ fi
+ done
+ done
+done
+
+if [ $FAILED_TESTS -ne 0 ]; then
+ echo "FAIL - $FAILED_TESTS/$TOTAL_TESTS tests failed"
+ exit 1
+else
+ echo "OK - All $TOTAL_TESTS tests passed"
+ exit 0
+fi
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
@ 2024-11-07 13:48 ` Eric Dumazet
2024-11-07 15:55 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-11-07 13:48 UTC (permalink / raw)
To: Anna Emese Nyiri; +Cc: netdev, fejes, kuba, pabeni, willemdebruijn.kernel
On Thu, Nov 7, 2024 at 2:23 PM Anna Emese Nyiri
<annaemesenyiri@gmail.com> wrote:
>
> Simplify priority setting permissions with the `sk_set_prio_allowed`
> function, centralizing the validation logic. This change is made in
> anticipation of a second caller in a following patch.
> No functional changes.
>
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
@ 2024-11-07 13:50 ` Eric Dumazet
2024-11-07 16:17 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-11-07 13:50 UTC (permalink / raw)
To: Anna Emese Nyiri; +Cc: netdev, fejes, kuba, pabeni, willemdebruijn.kernel
On Thu, Nov 7, 2024 at 2:23 PM Anna Emese Nyiri
<annaemesenyiri@gmail.com> wrote:
>
> The Linux socket API currently allows setting SO_PRIORITY at the
> socket level, applying a uniform priority to all packets sent through
> that socket. The exception to this is IP_TOS, when the priority value
> is calculated during the handling of
> ancillary data, as implemented in commit <f02db315b8d88>
> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
> However, this is a computed
> value, and there is currently no mechanism to set a custom priority
> via control messages prior to this patch.
>
> According to this pacth, if SO_PRIORITY is specified as ancillary data,
> the packet is sent with the priority value set through
> sockc->priority, overriding the socket-level values
> set via the traditional setsockopt() method. This is analogous to
> the existing support for SO_MARK, as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg").
>
> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
2024-11-07 13:48 ` Eric Dumazet
@ 2024-11-07 15:55 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 15:55 UTC (permalink / raw)
To: Anna Emese Nyiri, netdev
Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
Anna Emese Nyiri wrote:
> Simplify priority setting permissions with the `sk_set_prio_allowed`
> function, centralizing the validation logic. This change is made in
> anticipation of a second caller in a following patch.
> No functional changes.
>
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
2024-11-07 13:50 ` Eric Dumazet
@ 2024-11-07 16:17 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 16:17 UTC (permalink / raw)
To: Anna Emese Nyiri, netdev
Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
Anna Emese Nyiri wrote:
> The Linux socket API currently allows setting SO_PRIORITY at the
> socket level, applying a uniform priority to all packets sent through
> that socket. The exception to this is IP_TOS, when the priority value
> is calculated during the handling of
> ancillary data, as implemented in commit <f02db315b8d88>
> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data").
> However, this is a computed
> value, and there is currently no mechanism to set a custom priority
> via control messages prior to this patch.
>
> According to this pacth, if SO_PRIORITY is specified as ancillary data,
typo: patch
> the packet is sent with the priority value set through
> sockc->priority, overriding the socket-level values
> set via the traditional setsockopt() method. This is analogous to
> the existing support for SO_MARK, as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg").
If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that
takes precedence is the last one in the cmsg list.
> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index cf377377b52d..f6a03b418dde 100644
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 0e9e01967ec9..4304a68d1db0 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
> skb_reserve(skb, hlen);
>
> skb->protocol = htons(ETH_P_IP);
> - skb->priority = READ_ONCE(sk->sk_priority);
> + skb->priority = sockc->priority;
This has the side effect that raw_send_hdrinc will now interpret cmsg
IP_TOS, where it previously did not (as only sockcm_cookie was passed,
not all of ipcm_cookie). This is an improvement, but good to make
explicit.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
@ 2024-11-07 17:22 ` Willem de Bruijn
2024-11-08 14:46 ` Willem de Bruijn
0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-07 17:22 UTC (permalink / raw)
To: Anna Emese Nyiri, netdev
Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel, Ido Schimmel
Anna Emese Nyiri wrote:
> Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> ancillary data.
>
> Add the `cmsg_so_priority.sh` script, which sets up two network
> namespaces (red and green) and uses the `cmsg_sender.c` program to
> send messages between them. The script sets packet priorities both
> via `setsockopt` and control messages (cmsg) and verifies whether
> packets are routed to the same queue based on priority settings.
qdisc. queue is a more generic term, generally referring to hw queues.
> Suggested-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> ---
> tools/testing/selftests/net/cmsg_sender.c | 11 +-
> .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
> 2 files changed, 125 insertions(+), 1 deletion(-)
> create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
> diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
> new file mode 100755
> index 000000000000..706d29b251e9
> --- /dev/null
> +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> @@ -0,0 +1,115 @@
> +#!/bin/bash
SPDX header
> +
> +source lib.sh
> +
> +IP4=192.168.0.2/16
> +TGT4=192.168.0.3/16
> +TGT4_NO_MASK=192.168.0.3
> +IP6=2001:db8::2/64
> +TGT6=2001:db8::3/64
> +TGT6_NO_MASK=2001:db8::3
> +IP4BR=192.168.0.1/16
> +IP6BR=2001:db8::1/64
> +PORT=8080
> +DELAY=400000
> +QUEUE_NUM=4
> +
> +
> +cleanup() {
> + ip netns del red 2>/dev/null
> + ip netns del green 2>/dev/null
> + ip link del br0 2>/dev/null
> + ip link del vethcab0 2>/dev/null
> + ip link del vethcab1 2>/dev/null
> +}
> +
> +trap cleanup EXIT
> +
> +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> +
> +queue_config=""
> +for ((i=0; i<$QUEUE_NUM; i++)); do
> + queue_config+=" 1@$i"
> +done
> +
> +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> +
> +ip netns add red
> +ip netns add green
> +ip link add br0 type bridge
> +ip link set br0 up
Is this bridge needed? Can this just use a veth pair as is.
More importantly, it would be great if we can deduplicate this kind of
setup boilerplate across similar tests as much as possible.
> +ip addr add $IP4BR dev br0
> +ip addr add $IP6BR dev br0
> +
> +ip link add vethcab0 type veth peer name red0
> +ip link set vethcab0 master br0
> +ip link set red0 netns red
> +ip netns exec red bash -c "
> +ip link set lo up
> +ip link set red0 up
> +ip addr add $IP4 dev red0
> +ip addr add $IP6 dev red0
> +sysctl -w net.ipv4.ping_group_range='0 2147483647'
> +exit"
> +ip link set vethcab0 up
> +
> +ip link add vethcab1 type veth peer name green0
> +ip link set vethcab1 master br0
> +ip link set green0 netns green
> +ip netns exec green bash -c "
> +ip link set lo up
> +ip link set green0 up
> +ip addr add $TGT4 dev green0
> +ip addr add $TGT6 dev green0
> +exit"
> +ip link set vethcab1 up
> +
> +ip netns exec red bash -c "
> +sudo ethtool -L red0 tx $QUEUE_NUM rx $QUEUE_NUM
> +sudo tc qdisc add dev red0 root mqprio num_tc $QUEUE_NUM queues $queue_config map $map_config hw 0
> +exit"
> +
> +get_queue_bytes() {
> + ip netns exec red sudo tc -s qdisc show dev red0 | grep 'Sent' | awk '{print $2}'
> +}
> +
> +TOTAL_TESTS=0
> +FAILED_TESTS=0
> +
> +check_result() {
> + ((TOTAL_TESTS++))
> + if [ "$1" -ne 0 ]; then
> + ((FAILED_TESTS++))
> + fi
> +}
> +
> +
> +for i in 4 6; do
> + [ $i == 4 ] && TGT=$TGT4_NO_MASK || TGT=$TGT6_NO_MASK
> +
> + for p in u i r; do
> + echo "Test IPV$i, prot: $p"
> + for value in "${priority_values[@]}"; do
> + ip netns exec red ./cmsg_sender -$i -Q $value -d "${DELAY}" -p $p $TGT $PORT
> + setsockopt_priority_bytes_num=($(get_queue_bytes))
> +
> + ip netns exec red ./cmsg_sender -$i -P $value -d "${DELAY}" -p $p $TGT $PORT
> + cmsg_priority_bytes_num=($(get_queue_bytes))
> +
> + if [[ "${cmsg_priority_bytes_num[$actual_queue]}" != \
> + "${setsockopt_priority_bytes_num[$actual_queue]}" ]]; then
> + check_result 0
> + else
> + check_result 1
> + fi
> + done
> + done
> +done
> +
> +if [ $FAILED_TESTS -ne 0 ]; then
> + echo "FAIL - $FAILED_TESTS/$TOTAL_TESTS tests failed"
> + exit 1
> +else
> + echo "OK - All $TOTAL_TESTS tests passed"
> + exit 0
> +fi
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-07 17:22 ` Willem de Bruijn
@ 2024-11-08 14:46 ` Willem de Bruijn
2024-11-09 18:54 ` Ferenc Fejes
0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-08 14:46 UTC (permalink / raw)
To: Willem de Bruijn, Anna Emese Nyiri, netdev
Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel, Ido Schimmel
Willem de Bruijn wrote:
> Anna Emese Nyiri wrote:
> > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > ancillary data.
> >
> > Add the `cmsg_so_priority.sh` script, which sets up two network
> > namespaces (red and green) and uses the `cmsg_sender.c` program to
> > send messages between them. The script sets packet priorities both
> > via `setsockopt` and control messages (cmsg) and verifies whether
> > packets are routed to the same queue based on priority settings.
>
> qdisc. queue is a more generic term, generally referring to hw queues.
>
> > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > ---
> > tools/testing/selftests/net/cmsg_sender.c | 11 +-
> > .../testing/selftests/net/cmsg_so_priority.sh | 115 ++++++++++++++++++
> > 2 files changed, 125 insertions(+), 1 deletion(-)
> > create mode 100755 tools/testing/selftests/net/cmsg_so_priority.sh
>
> > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh b/tools/testing/selftests/net/cmsg_so_priority.sh
> > new file mode 100755
> > index 000000000000..706d29b251e9
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > @@ -0,0 +1,115 @@
> > +#!/bin/bash
>
> SPDX header
>
> > +
> > +source lib.sh
> > +
> > +IP4=192.168.0.2/16
> > +TGT4=192.168.0.3/16
> > +TGT4_NO_MASK=192.168.0.3
> > +IP6=2001:db8::2/64
> > +TGT6=2001:db8::3/64
> > +TGT6_NO_MASK=2001:db8::3
> > +IP4BR=192.168.0.1/16
> > +IP6BR=2001:db8::1/64
> > +PORT=8080
> > +DELAY=400000
> > +QUEUE_NUM=4
> > +
> > +
> > +cleanup() {
> > + ip netns del red 2>/dev/null
> > + ip netns del green 2>/dev/null
> > + ip link del br0 2>/dev/null
> > + ip link del vethcab0 2>/dev/null
> > + ip link del vethcab1 2>/dev/null
> > +}
> > +
> > +trap cleanup EXIT
> > +
> > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > +
> > +queue_config=""
> > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > + queue_config+=" 1@$i"
> > +done
> > +
> > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > +
> > +ip netns add red
> > +ip netns add green
> > +ip link add br0 type bridge
> > +ip link set br0 up
>
> Is this bridge needed? Can this just use a veth pair as is.
>
> More importantly, it would be great if we can deduplicate this kind of
> setup boilerplate across similar tests as much as possible.
As a matter of fact, similar to cmsg_so_mark, this test can probably
use a dummy netdevice, no need for a second namespace and dev.
cmsg_so_mark.sh is probably small enough that it is fine to copy that
and create a duplicate. As trying to extend it to cover both tests
will probably double it in size and will just be harder to follow.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-08 14:46 ` Willem de Bruijn
@ 2024-11-09 18:54 ` Ferenc Fejes
2024-11-09 19:56 ` Willem de Bruijn
0 siblings, 1 reply; 13+ messages in thread
From: Ferenc Fejes @ 2024-11-09 18:54 UTC (permalink / raw)
To: Willem de Bruijn, Anna Emese Nyiri, netdev
Cc: edumazet, kuba, pabeni, Ido Schimmel
On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Anna Emese Nyiri wrote:
> > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > ancillary data.
> > >
> > > Add the `cmsg_so_priority.sh` script, which sets up two network
> > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > to
> > > send messages between them. The script sets packet priorities
> > > both
> > > via `setsockopt` and control messages (cmsg) and verifies
> > > whether
> > > packets are routed to the same queue based on priority settings.
> >
> > qdisc. queue is a more generic term, generally referring to hw
> > queues.
> >
> > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > ---
> > > tools/testing/selftests/net/cmsg_sender.c | 11 +-
> > > .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > ++++++++++++++++++
> > > 2 files changed, 125 insertions(+), 1 deletion(-)
> > > create mode 100755
> > > tools/testing/selftests/net/cmsg_so_priority.sh
> >
> > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > new file mode 100755
> > > index 000000000000..706d29b251e9
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > @@ -0,0 +1,115 @@
> > > +#!/bin/bash
> >
> > SPDX header
> >
> > > +
> > > +source lib.sh
> > > +
> > > +IP4=192.168.0.2/16
> > > +TGT4=192.168.0.3/16
> > > +TGT4_NO_MASK=192.168.0.3
> > > +IP6=2001:db8::2/64
> > > +TGT6=2001:db8::3/64
> > > +TGT6_NO_MASK=2001:db8::3
> > > +IP4BR=192.168.0.1/16
> > > +IP6BR=2001:db8::1/64
> > > +PORT=8080
> > > +DELAY=400000
> > > +QUEUE_NUM=4
> > > +
> > > +
> > > +cleanup() {
> > > + ip netns del red 2>/dev/null
> > > + ip netns del green 2>/dev/null
> > > + ip link del br0 2>/dev/null
> > > + ip link del vethcab0 2>/dev/null
> > > + ip link del vethcab1 2>/dev/null
> > > +}
> > > +
> > > +trap cleanup EXIT
> > > +
> > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > +
> > > +queue_config=""
> > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > + queue_config+=" 1@$i"
> > > +done
> > > +
> > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > +
> > > +ip netns add red
> > > +ip netns add green
> > > +ip link add br0 type bridge
> > > +ip link set br0 up
> >
> > Is this bridge needed? Can this just use a veth pair as is.
> >
> > More importantly, it would be great if we can deduplicate this kind
> > of
> > setup boilerplate across similar tests as much as possible.
>
> As a matter of fact, similar to cmsg_so_mark, this test can probably
> use a dummy netdevice, no need for a second namespace and dev.
>
> cmsg_so_mark.sh is probably small enough that it is fine to copy that
> and create a duplicate. As trying to extend it to cover both tests
> will probably double it in size and will just be harder to follow.
I'm afraid we don't have "ip rule" match argument for skb->priority
like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
rule matches to confirm if skb->mark is correct or not.
Using nftables meta priority would work with a dummy device:
add table so_prio
add chain so_prio so_prio_chain { type filter hook output priority 0; }
add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter
Is there anything simpler? I am afraid we cannot use nftables in
selftests, or can we? Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-09 18:54 ` Ferenc Fejes
@ 2024-11-09 19:56 ` Willem de Bruijn
2024-11-10 13:27 ` Ido Schimmel
0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-11-09 19:56 UTC (permalink / raw)
To: Ferenc Fejes, Willem de Bruijn, Anna Emese Nyiri, netdev
Cc: edumazet, kuba, pabeni, Ido Schimmel
Ferenc Fejes wrote:
> On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> > Willem de Bruijn wrote:
> > > Anna Emese Nyiri wrote:
> > > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > > ancillary data.
> > > >
> > > > Add the `cmsg_so_priority.sh` script, which sets up two network
> > > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > > to
> > > > send messages between them. The script sets packet priorities
> > > > both
> > > > via `setsockopt` and control messages (cmsg) and verifies
> > > > whether
> > > > packets are routed to the same queue based on priority settings.
> > >
> > > qdisc. queue is a more generic term, generally referring to hw
> > > queues.
> > >
> > > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > > ---
> > > > tools/testing/selftests/net/cmsg_sender.c | 11 +-
> > > > .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > > ++++++++++++++++++
> > > > 2 files changed, 125 insertions(+), 1 deletion(-)
> > > > create mode 100755
> > > > tools/testing/selftests/net/cmsg_so_priority.sh
> > >
> > > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > new file mode 100755
> > > > index 000000000000..706d29b251e9
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > @@ -0,0 +1,115 @@
> > > > +#!/bin/bash
> > >
> > > SPDX header
> > >
> > > > +
> > > > +source lib.sh
> > > > +
> > > > +IP4=192.168.0.2/16
> > > > +TGT4=192.168.0.3/16
> > > > +TGT4_NO_MASK=192.168.0.3
> > > > +IP6=2001:db8::2/64
> > > > +TGT6=2001:db8::3/64
> > > > +TGT6_NO_MASK=2001:db8::3
> > > > +IP4BR=192.168.0.1/16
> > > > +IP6BR=2001:db8::1/64
> > > > +PORT=8080
> > > > +DELAY=400000
> > > > +QUEUE_NUM=4
> > > > +
> > > > +
> > > > +cleanup() {
> > > > + ip netns del red 2>/dev/null
> > > > + ip netns del green 2>/dev/null
> > > > + ip link del br0 2>/dev/null
> > > > + ip link del vethcab0 2>/dev/null
> > > > + ip link del vethcab1 2>/dev/null
> > > > +}
> > > > +
> > > > +trap cleanup EXIT
> > > > +
> > > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > > +
> > > > +queue_config=""
> > > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > > + queue_config+=" 1@$i"
> > > > +done
> > > > +
> > > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > > +
> > > > +ip netns add red
> > > > +ip netns add green
> > > > +ip link add br0 type bridge
> > > > +ip link set br0 up
> > >
> > > Is this bridge needed? Can this just use a veth pair as is.
> > >
> > > More importantly, it would be great if we can deduplicate this kind
> > > of
> > > setup boilerplate across similar tests as much as possible.
> >
> > As a matter of fact, similar to cmsg_so_mark, this test can probably
> > use a dummy netdevice, no need for a second namespace and dev.
> >
> > cmsg_so_mark.sh is probably small enough that it is fine to copy that
> > and create a duplicate. As trying to extend it to cover both tests
> > will probably double it in size and will just be harder to follow.
>
> I'm afraid we don't have "ip rule" match argument for skb->priority
> like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
> rule matches to confirm if skb->mark is correct or not.
>
> Using nftables meta priority would work with a dummy device:
> add table so_prio
> add chain so_prio so_prio_chain { type filter hook output priority 0; }
> add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter
>
> Is there anything simpler? I am afraid we cannot use nftables in
> selftests, or can we? Thanks!
I'd use traffic shaper (tc). There are a variety of qdiscs/schedulers
and classifiers that act on skb->priority.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender
2024-11-09 19:56 ` Willem de Bruijn
@ 2024-11-10 13:27 ` Ido Schimmel
0 siblings, 0 replies; 13+ messages in thread
From: Ido Schimmel @ 2024-11-10 13:27 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Ferenc Fejes, Anna Emese Nyiri, netdev, edumazet, kuba, pabeni
On Sat, Nov 09, 2024 at 02:56:29PM -0500, Willem de Bruijn wrote:
> Ferenc Fejes wrote:
> > On Fri, 2024-11-08 at 09:46 -0500, Willem de Bruijn wrote:
> > > Willem de Bruijn wrote:
> > > > Anna Emese Nyiri wrote:
> > > > > Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY
> > > > > ancillary data.
> > > > >
> > > > > Add the `cmsg_so_priority.sh` script, which sets up two network
> > > > > namespaces (red and green) and uses the `cmsg_sender.c` program
> > > > > to
> > > > > send messages between them. The script sets packet priorities
> > > > > both
> > > > > via `setsockopt` and control messages (cmsg) and verifies
> > > > > whether
> > > > > packets are routed to the same queue based on priority settings.
> > > >
> > > > qdisc. queue is a more generic term, generally referring to hw
> > > > queues.
> > > >
> > > > > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
> > > > > ---
> > > > > tools/testing/selftests/net/cmsg_sender.c | 11 +-
> > > > > .../testing/selftests/net/cmsg_so_priority.sh | 115
> > > > > ++++++++++++++++++
> > > > > 2 files changed, 125 insertions(+), 1 deletion(-)
> > > > > create mode 100755
> > > > > tools/testing/selftests/net/cmsg_so_priority.sh
> > > >
> > > > > diff --git a/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > new file mode 100755
> > > > > index 000000000000..706d29b251e9
> > > > > --- /dev/null
> > > > > +++ b/tools/testing/selftests/net/cmsg_so_priority.sh
> > > > > @@ -0,0 +1,115 @@
> > > > > +#!/bin/bash
> > > >
> > > > SPDX header
> > > >
> > > > > +
> > > > > +source lib.sh
> > > > > +
> > > > > +IP4=192.168.0.2/16
> > > > > +TGT4=192.168.0.3/16
> > > > > +TGT4_NO_MASK=192.168.0.3
> > > > > +IP6=2001:db8::2/64
> > > > > +TGT6=2001:db8::3/64
> > > > > +TGT6_NO_MASK=2001:db8::3
> > > > > +IP4BR=192.168.0.1/16
> > > > > +IP6BR=2001:db8::1/64
> > > > > +PORT=8080
> > > > > +DELAY=400000
> > > > > +QUEUE_NUM=4
> > > > > +
> > > > > +
> > > > > +cleanup() {
> > > > > + ip netns del red 2>/dev/null
> > > > > + ip netns del green 2>/dev/null
> > > > > + ip link del br0 2>/dev/null
> > > > > + ip link del vethcab0 2>/dev/null
> > > > > + ip link del vethcab1 2>/dev/null
> > > > > +}
> > > > > +
> > > > > +trap cleanup EXIT
> > > > > +
> > > > > +priority_values=($(seq 0 $((QUEUE_NUM - 1))))
> > > > > +
> > > > > +queue_config=""
> > > > > +for ((i=0; i<$QUEUE_NUM; i++)); do
> > > > > + queue_config+=" 1@$i"
> > > > > +done
> > > > > +
> > > > > +map_config=$(seq 0 $((QUEUE_NUM - 1)) | tr '\n' ' ')
> > > > > +
> > > > > +ip netns add red
> > > > > +ip netns add green
> > > > > +ip link add br0 type bridge
> > > > > +ip link set br0 up
> > > >
> > > > Is this bridge needed? Can this just use a veth pair as is.
> > > >
> > > > More importantly, it would be great if we can deduplicate this kind
> > > > of
> > > > setup boilerplate across similar tests as much as possible.
> > >
> > > As a matter of fact, similar to cmsg_so_mark, this test can probably
> > > use a dummy netdevice, no need for a second namespace and dev.
> > >
> > > cmsg_so_mark.sh is probably small enough that it is fine to copy that
> > > and create a duplicate. As trying to extend it to cover both tests
> > > will probably double it in size and will just be harder to follow.
> >
> > I'm afraid we don't have "ip rule" match argument for skb->priority
> > like we have for skb->mark (ip rule fwmark). AFAIU cmsg_so_mark.sh uses
> > rule matches to confirm if skb->mark is correct or not.
> >
> > Using nftables meta priority would work with a dummy device:
> > add table so_prio
> > add chain so_prio so_prio_chain { type filter hook output priority 0; }
> > add rule so_prio so_prio_chain meta priority $SOPRIOVALUE counter
> >
> > Is there anything simpler? I am afraid we cannot use nftables in
> > selftests, or can we? Thanks!
>
> I'd use traffic shaper (tc). There are a variety of qdiscs/schedulers
> and classifiers that act on skb->priority.
When I initially suggested the test I was thinking about creating a VLAN
device with egress QoS map and then matching on VLAN priority with
flower. Something like [1]. It is just a quick hack. Proper test should
test with all combinations of IPv4 / IPv6 / UDP / ICMP / RAW / cmsg /
sockopt (like cmsg_so_mark.sh).
[1]
#!/bin/bash
ip netns del ns1 &> /dev/null
ip netns add ns1
ip -n ns1 link set dev lo up
ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 link add link dummy1 name dummy1.10 up type vlan id 10 \
egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
ip -n ns1 address add 192.0.2.1/24 dev dummy1.10
ip -n ns1 neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud permanent dev \
dummy1.10
tc -n ns1 qdisc add dev dummy1 clsact
for i in {0..7}; do
tc -n ns1 filter add dev dummy1 egress pref 1 handle 10${i} \
proto 802.1q flower vlan_prio $i vlan_ethtype ipv4 \
ip_proto udp dst_ip 192.0.2.2 action drop
done
for i in {0..7}; do
pkts=$(tc -n ns1 -j -s filter show dev dummy1 egress \
| jq ".[] | select(.options.handle == 10$i) | \
.options.actions[0].stats.packets")
[[ $pkts == 0 ]] || echo "prio $i: expected 0, got $pkts"
ip netns exec ns1 ./cmsg_sender -4 -p udp -P $i 192.0.2.2 1234
pkts=$(tc -n ns1 -j -s filter show dev dummy1 egress \
| jq ".[] | select(.options.handle == 10$i) | \
.options.actions[0].stats.packets")
[[ $pkts == 1 ]] || echo "prio $i: expected 1, got $pkts"
done
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-11-10 13:27 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-07 13:22 [PATCH net-next v3 0/3] SO_PRIORITY cmsg patch summary Anna Emese Nyiri
2024-11-07 13:22 ` [PATCH net-next v3 1/3] net: Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
2024-11-07 13:48 ` Eric Dumazet
2024-11-07 15:55 ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 2/3] net: support SO_PRIORITY cmsg Anna Emese Nyiri
2024-11-07 13:50 ` Eric Dumazet
2024-11-07 16:17 ` Willem de Bruijn
2024-11-07 13:22 ` [PATCH net-next v3 3/3] selftest: net: test SO_PRIORITY ancillary data with cmsg_sender Anna Emese Nyiri
2024-11-07 17:22 ` Willem de Bruijn
2024-11-08 14:46 ` Willem de Bruijn
2024-11-09 18:54 ` Ferenc Fejes
2024-11-09 19:56 ` Willem de Bruijn
2024-11-10 13:27 ` Ido Schimmel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).