* [PATCH net-next v2 0/2] support SO_PRIORITY cmsg
@ 2024-11-02 12:51 Anna Emese Nyiri
2024-11-02 12:51 ` [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function Anna Emese Nyiri
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Anna Emese Nyiri @ 2024-11-02 12:51 UTC (permalink / raw)
To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni,
willemdebruijn.kernel
The changes introduce a new helper function,
`sk_set_prio_allowed`, which centralizes the logic for validating
priority settings. This series adds support for the `SO_PRIORITY`
control message, allowing user-space applications to set socket
priority via control messages (cmsg).
Patch Overview:
Patch 1/2: Introduces `sk_set_prio_allowed` helper function.
Patch 2/2: Implements support for setting `SO_PRIORITY` via control
messages.
v2:
- Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com>
introduce "sk_set_prio_allowed" helper to check priority setting
capability
- drop new fields and change sockcm_cookie::priority from "char" to
"u32" to match with sk_buff::priority
- cork->tos value check before priority setting
moved from __ip_make_skb() to ip_setup_cork()
- rebased on net-next
v1:
https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/
Anna Emese Nyiri (2):
Introduce sk_set_prio_allowed helper function
support SO_PRIORITY cmsg
include/net/inet_sock.h | 2 +-
include/net/ip.h | 3 ++-
include/net/sock.h | 4 +++-
net/can/raw.c | 2 +-
net/core/sock.c | 19 ++++++++++++++++---
net/ipv4/ip_output.c | 7 +++++--
net/ipv4/raw.c | 2 +-
net/ipv6/ip6_output.c | 3 ++-
net/ipv6/raw.c | 2 +-
net/packet/af_packet.c | 2 +-
10 files changed, 33 insertions(+), 13 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function 2024-11-02 12:51 [PATCH net-next v2 0/2] support SO_PRIORITY cmsg Anna Emese Nyiri @ 2024-11-02 12:51 ` Anna Emese Nyiri 2024-11-03 1:30 ` Willem de Bruijn 2024-11-02 12:51 ` [PATCH net-next v2 2/2] support SO_PRIORITY cmsg Anna Emese Nyiri ` (2 subsequent siblings) 3 siblings, 1 reply; 11+ messages in thread From: Anna Emese Nyiri @ 2024-11-02 12:51 UTC (permalink / raw) To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni, willemdebruijn.kernel Simplifies the priority setting permissions through `sk_set_prio_allowed` function. No functional changes. Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> --- net/core/sock.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 7f398bd07fb7..5ecf6f1a470c 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -454,6 +454,13 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen, return 0; } +static bool sk_set_prio_allowed(const struct sock *sk, int val) +{ + return ((val >= TC_PRIO_BESTEFFORT && val <= TC_PRIO_INTERACTIVE) || + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)); +} + static bool sock_needs_netstamp(const struct sock *sk) { switch (sk->sk_family) { @@ -1187,9 +1194,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, /* handle options which do not require locking the socket. */ switch (optname) { case SO_PRIORITY: - if ((val >= 0 && val <= 6) || - sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || - sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (sk_set_prio_allowed(sk, val)) { sock_set_priority(sk, val); return 0; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function 2024-11-02 12:51 ` [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function Anna Emese Nyiri @ 2024-11-03 1:30 ` Willem de Bruijn 0 siblings, 0 replies; 11+ messages in thread From: Willem de Bruijn @ 2024-11-03 1:30 UTC (permalink / raw) To: Anna Emese Nyiri, netdev Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni, willemdebruijn.kernel Anna Emese Nyiri wrote: > Simplifies the priority setting permissions through > `sk_set_prio_allowed` function. No functional changes. tiny, only because asking for changes in patch 2/2: please use imperative mood: "Simplify". And maybe mention that this is in anticipation of a second caller in a following patch. > Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > --- > net/core/sock.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 7f398bd07fb7..5ecf6f1a470c 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -454,6 +454,13 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen, > return 0; > } > > +static bool sk_set_prio_allowed(const struct sock *sk, int val) > +{ > + return ((val >= TC_PRIO_BESTEFFORT && val <= TC_PRIO_INTERACTIVE) || > + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || > + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)); > +} > + > static bool sock_needs_netstamp(const struct sock *sk) > { > switch (sk->sk_family) { > @@ -1187,9 +1194,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, > /* handle options which do not require locking the socket. */ > switch (optname) { > case SO_PRIORITY: > - if ((val >= 0 && val <= 6) || > - sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || > - sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { > + if (sk_set_prio_allowed(sk, val)) { > sock_set_priority(sk, val); > return 0; > } > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next v2 2/2] support SO_PRIORITY cmsg 2024-11-02 12:51 [PATCH net-next v2 0/2] support SO_PRIORITY cmsg Anna Emese Nyiri 2024-11-02 12:51 ` [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function Anna Emese Nyiri @ 2024-11-02 12:51 ` Anna Emese Nyiri 2024-11-03 1:27 ` Willem de Bruijn 2024-11-03 15:41 ` [PATCH net-next v2 0/2] " Jakub Kicinski 2024-11-03 16:25 ` Ido Schimmel 3 siblings, 1 reply; 11+ messages in thread From: Anna Emese Nyiri @ 2024-11-02 12:51 UTC (permalink / raw) To: netdev; +Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni, willemdebruijn.kernel The Linux socket API currently supports setting SO_PRIORITY at the socket level, which applies a uniform priority to all packets sent through that socket. The only exception is IP_TOS; if specified as ancillary data, the packet does not inherit the socket's priority. Instead, the priority value is computed when handling the ancillary data (as implemented in commit <f02db315b8d88> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If the priority is set via IP_TOS, then skb->priority derives its value from the rt_tos2priority function, which calculates the priority based on the value of ipc->tos obtained from IP_TOS. However, if IP_TOS is not used and the priority has been set through a control message, skb->priority will take the value provided by that control message. Therefore, when both options are available, the primary source for skb->priority is the value set via IP_TOS. Currently, there is no option to set the priority directly from userspace on a per-packet basis. The following changes allow SO_PRIORITY to be set through control messages (CMSG), giving userspace applications more granular control over packet priorities. This patch enables setting skb->priority using CMSG. If SO_PRIORITY is specified as ancillary data, the packet is sent with the priority value set through sockc->priority_cmsg_value, overriding the socket-level values set via the traditional setsockopt() method. This is analogous to existing support for SO_MARK (as implemented in commit <c6af0c227a22> ("ip: support SO_MARK cmsg")). Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> --- include/net/inet_sock.h | 2 +- include/net/ip.h | 3 ++- include/net/sock.h | 4 +++- net/can/raw.c | 2 +- net/core/sock.c | 8 ++++++++ net/ipv4/ip_output.c | 7 +++++-- net/ipv4/raw.c | 2 +- net/ipv6/ip6_output.c | 3 ++- net/ipv6/raw.c | 2 +- net/packet/af_packet.c | 2 +- 10 files changed, 25 insertions(+), 10 deletions(-) diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 56d8bc5593d3..3ccbad881d74 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -172,7 +172,7 @@ struct inet_cork { u8 tx_flags; __u8 ttl; __s16 tos; - char priority; + u32 priority; __u16 gso_size; u32 ts_opt_id; u64 transmit_time; diff --git a/include/net/ip.h b/include/net/ip.h index 0e548c1f2a0e..e8f71a191277 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -81,7 +81,7 @@ struct ipcm_cookie { __u8 protocol; __u8 ttl; __s16 tos; - char priority; + u32 priority; __u16 gso_size; }; @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, ipcm_init(ipcm); ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); ipcm->addr = inet->inet_saddr; diff --git a/include/net/sock.h b/include/net/sock.h index 7464e9f9f47c..316a34d6c48b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1814,13 +1814,15 @@ struct sockcm_cookie { u32 mark; u32 tsflags; u32 ts_opt_id; + u32 priority; }; static inline void sockcm_init(struct sockcm_cookie *sockc, const struct sock *sk) { *sockc = (struct sockcm_cookie) { - .tsflags = READ_ONCE(sk->sk_tsflags) + .tsflags = READ_ONCE(sk->sk_tsflags), + .priority = READ_ONCE(sk->sk_priority), }; } diff --git a/net/can/raw.c b/net/can/raw.c index 255c0a8f39d6..46e8ed9d64da 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) } skb->dev = dev; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc.priority; skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; diff --git a/net/core/sock.c b/net/core/sock.c index 5ecf6f1a470c..d5586b9212dd 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, case SCM_RIGHTS: case SCM_CREDENTIALS: break; + case SO_PRIORITY: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) + return -EINVAL; + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { + sockc->priority = *(u32 *)CMSG_DATA(cmsg); + break; + } + return -EPERM; default: return -EINVAL; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 0065b1996c94..72b37321c0ea 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1328,7 +1328,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, cork->ttl = ipc->ttl; cork->tos = ipc->tos; cork->mark = ipc->sockc.mark; - cork->priority = ipc->priority; + if (cork->tos != -1) + cork->priority = ipc->priority; + else + cork->priority = ipc->sockc.priority; cork->transmit_time = ipc->sockc.transmit_time; cork->tx_flags = 0; sock_tx_timestamp(sk, &ipc->sockc, &cork->tx_flags); @@ -1465,7 +1468,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, ip_options_build(skb, opt, cork->addr, rt); } - skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); + skb->priority = cork->priority; skb->mark = cork->mark; if (sk_is_tcp(sk)) skb_set_delivery_time(skb, cork->transmit_time, SKB_CLOCK_MONOTONIC); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 0e9e01967ec9..4304a68d1db0 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -358,7 +358,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IP); - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc->priority; skb->mark = sockc->mark; skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid); skb_dst_set(skb, &rt->dst); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index f7b4608bb316..ec9673b7ab16 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1401,6 +1401,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; cork->base.mark = ipc6->sockc.mark; + cork->base.priority = ipc6->sockc.priority; sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags); if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) { cork->base.flags |= IPCORK_TS_OPT_ID; @@ -1939,7 +1940,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, hdr->saddr = fl6->saddr; hdr->daddr = *final_dst; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = cork->base.priority; skb->mark = cork->base.mark; if (sk_is_tcp(sk)) skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 8476a3944a88..c82cc6cfdbd2 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -619,7 +619,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IPV6); - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc->priority; skb->mark = sockc->mark; skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 886c0dd47b66..f8d87d622699 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3126,7 +3126,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->protocol = proto; skb->dev = dev; - skb->priority = READ_ONCE(sk->sk_priority); + skb->priority = sockc.priority; skb->mark = sockc.mark; skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid); -- 2.43.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 2/2] support SO_PRIORITY cmsg 2024-11-02 12:51 ` [PATCH net-next v2 2/2] support SO_PRIORITY cmsg Anna Emese Nyiri @ 2024-11-03 1:27 ` Willem de Bruijn 2024-11-03 13:39 ` Anna Nyiri 0 siblings, 1 reply; 11+ messages in thread From: Willem de Bruijn @ 2024-11-03 1:27 UTC (permalink / raw) To: Anna Emese Nyiri, netdev Cc: fejes, annaemesenyiri, edumazet, kuba, pabeni, willemdebruijn.kernel Anna Emese Nyiri wrote: > The Linux socket API currently supports setting SO_PRIORITY at the > socket level, which applies a uniform priority to all packets sent > through that socket. The only exception is IP_TOS; if specified as > ancillary data, the packet does not inherit the socket's priority. > Instead, the priority value is computed when handling the ancillary > data (as implemented in commit <f02db315b8d88> nit: drop the brackets > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > the priority is set via IP_TOS, then skb->priority derives its value > from the rt_tos2priority function, which calculates the priority > based on the value of ipc->tos obtained from IP_TOS. However, if > IP_TOS is not used and the priority has been set through a control > message, skb->priority will take the value provided by that control > message. The above describes the new situation? There is no way to set priority to a control message prior to this patch. > Therefore, when both options are available, the primary > source for skb->priority is the value set via IP_TOS. > > Currently, there is no option to set the priority directly from > userspace on a per-packet basis. The following changes allow > SO_PRIORITY to be set through control messages (CMSG), giving > userspace applications more granular control over packet priorities. > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY Duplicate statement. Overall, the explanation can perhaps be condensed and made more clear. > is specified as ancillary data, the packet is sent with the priority > value set through sockc->priority_cmsg_value, overriding the No longer matches the code. > socket-level values set via the traditional setsockopt() method. This > is analogous to existing support for SO_MARK (as implemented in commit > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > --- > include/net/inet_sock.h | 2 +- > include/net/ip.h | 3 ++- > include/net/sock.h | 4 +++- > net/can/raw.c | 2 +- > net/core/sock.c | 8 ++++++++ > net/ipv4/ip_output.c | 7 +++++-- > net/ipv4/raw.c | 2 +- > net/ipv6/ip6_output.c | 3 ++- > net/ipv6/raw.c | 2 +- > net/packet/af_packet.c | 2 +- > 10 files changed, 25 insertions(+), 10 deletions(-) > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > index 56d8bc5593d3..3ccbad881d74 100644 > --- a/include/net/inet_sock.h > +++ b/include/net/inet_sock.h > @@ -172,7 +172,7 @@ struct inet_cork { > u8 tx_flags; > __u8 ttl; > __s16 tos; > - char priority; > + u32 priority; Let's check with pahole how this affects struct size and holes. It likely adds a hole, but unavoidably so. > __u16 gso_size; > u32 ts_opt_id; > u64 transmit_time; > diff --git a/include/net/ip.h b/include/net/ip.h > index 0e548c1f2a0e..e8f71a191277 100644 > --- a/include/net/ip.h > +++ b/include/net/ip.h > @@ -81,7 +81,7 @@ struct ipcm_cookie { > __u8 protocol; > __u8 ttl; > __s16 tos; > - char priority; > + u32 priority; No need for a field in ipcm_cookie, when also present in sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is the right location. If cmsg IP_TOS is present, that can overridde ipc->sockc.priority with rt_tos2priority. Interesting that this override by IP_TOS seems to be IPV4 only. There is no equivalent call to rt_tos2priority when setting IPV6_TCLASS. > __u16 gso_size; > }; > > @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, > ipcm_init(ipcm); > > ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); > + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); > ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); > ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); > ipcm->addr = inet->inet_saddr; > diff --git a/include/net/sock.h b/include/net/sock.h > index 7464e9f9f47c..316a34d6c48b 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1814,13 +1814,15 @@ struct sockcm_cookie { > u32 mark; > u32 tsflags; > u32 ts_opt_id; > + u32 priority; > }; > > static inline void sockcm_init(struct sockcm_cookie *sockc, > const struct sock *sk) > { > *sockc = (struct sockcm_cookie) { > - .tsflags = READ_ONCE(sk->sk_tsflags) > + .tsflags = READ_ONCE(sk->sk_tsflags), > + .priority = READ_ONCE(sk->sk_priority), > }; > } > > diff --git a/net/can/raw.c b/net/can/raw.c > index 255c0a8f39d6..46e8ed9d64da 100644 > --- a/net/can/raw.c > +++ b/net/can/raw.c > @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) > } > > skb->dev = dev; > - skb->priority = READ_ONCE(sk->sk_priority); > + skb->priority = sockc.priority; > skb->mark = READ_ONCE(sk->sk_mark); > skb->tstamp = sockc.transmit_time; > > diff --git a/net/core/sock.c b/net/core/sock.c > index 5ecf6f1a470c..d5586b9212dd 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, > case SCM_RIGHTS: > case SCM_CREDENTIALS: > break; > + case SO_PRIORITY: > + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > + return -EINVAL; > + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { > + sockc->priority = *(u32 *)CMSG_DATA(cmsg); > + break; > + } > + return -EPERM; nit: invert to make the error case the (speculated as unlikely) branch and have the common path unindented. > default: > return -EINVAL; > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 2/2] support SO_PRIORITY cmsg 2024-11-03 1:27 ` Willem de Bruijn @ 2024-11-03 13:39 ` Anna Nyiri 2024-11-03 13:57 ` Willem de Bruijn 0 siblings, 1 reply; 11+ messages in thread From: Anna Nyiri @ 2024-11-03 13:39 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netdev, fejes, edumazet, kuba, pabeni Willem de Bruijn <willemdebruijn.kernel@gmail.com> ezt írta (időpont: 2024. nov. 3., V, 2:27): > > Anna Emese Nyiri wrote: > > The Linux socket API currently supports setting SO_PRIORITY at the > > socket level, which applies a uniform priority to all packets sent > > through that socket. The only exception is IP_TOS; if specified as > > ancillary data, the packet does not inherit the socket's priority. > > Instead, the priority value is computed when handling the ancillary > > data (as implemented in commit <f02db315b8d88> > > nit: drop the brackets > > > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > > the priority is set via IP_TOS, then skb->priority derives its value > > from the rt_tos2priority function, which calculates the priority > > based on the value of ipc->tos obtained from IP_TOS. However, if > > IP_TOS is not used and the priority has been set through a control > > message, skb->priority will take the value provided by that control > > message. > > The above describes the new situation? There is no way to set > priority to a control message prior to this patch. > > > Therefore, when both options are available, the primary > > source for skb->priority is the value set via IP_TOS. > > > > Currently, there is no option to set the priority directly from > > userspace on a per-packet basis. The following changes allow > > SO_PRIORITY to be set through control messages (CMSG), giving > > userspace applications more granular control over packet priorities. > > > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY > > Duplicate statement. Overall, the explanation can perhaps be > condensed and made more clear. > > > is specified as ancillary data, the packet is sent with the priority > > value set through sockc->priority_cmsg_value, overriding the > > No longer matches the code. > > > socket-level values set via the traditional setsockopt() method. This > > is analogous to existing support for SO_MARK (as implemented in commit > > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > > --- > > include/net/inet_sock.h | 2 +- > > include/net/ip.h | 3 ++- > > include/net/sock.h | 4 +++- > > net/can/raw.c | 2 +- > > net/core/sock.c | 8 ++++++++ > > net/ipv4/ip_output.c | 7 +++++-- > > net/ipv4/raw.c | 2 +- > > net/ipv6/ip6_output.c | 3 ++- > > net/ipv6/raw.c | 2 +- > > net/packet/af_packet.c | 2 +- > > 10 files changed, 25 insertions(+), 10 deletions(-) > > > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > > index 56d8bc5593d3..3ccbad881d74 100644 > > --- a/include/net/inet_sock.h > > +++ b/include/net/inet_sock.h > > @@ -172,7 +172,7 @@ struct inet_cork { > > u8 tx_flags; > > __u8 ttl; > > __s16 tos; > > - char priority; > > + u32 priority; > > Let's check with pahole how this affects struct size and holes. > It likely adds a hole, but unavoidably so. > > > __u16 gso_size; > > u32 ts_opt_id; > > u64 transmit_time; > > diff --git a/include/net/ip.h b/include/net/ip.h > > index 0e548c1f2a0e..e8f71a191277 100644 > > --- a/include/net/ip.h > > +++ b/include/net/ip.h > > @@ -81,7 +81,7 @@ struct ipcm_cookie { > > __u8 protocol; > > __u8 ttl; > > __s16 tos; > > - char priority; > > + u32 priority; > > No need for a field in ipcm_cookie, when also present in > sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is > the right location. I think there could be a problem if the priority is set by IP_TOS for some reason, and then also via cmsg. The latter value may overwrite it. In the ip_setup_cork() function, there is therefore a check for the value cork->tos != -1 to give priority to the value set by IP_TOS. And that's why I thought that there should be a priority field in both ipcm_cookie and sockcm_cookie. The priority field already existed in ipcm_cookie, I didn't add it. I just changed the type. > > If cmsg IP_TOS is present, that can overridde ipc->sockc.priority with > rt_tos2priority. > > Interesting that this override by IP_TOS seems to be IPV4 only. There > is no equivalent call to rt_tos2priority when setting IPV6_TCLASS. > > > __u16 gso_size; > > }; > > > > @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, > > ipcm_init(ipcm); > > > > ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); > > + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority); > > ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); > > ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); > > ipcm->addr = inet->inet_saddr; > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 7464e9f9f47c..316a34d6c48b 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -1814,13 +1814,15 @@ struct sockcm_cookie { > > u32 mark; > > u32 tsflags; > > u32 ts_opt_id; > > + u32 priority; > > }; > > > > static inline void sockcm_init(struct sockcm_cookie *sockc, > > const struct sock *sk) > > { > > *sockc = (struct sockcm_cookie) { > > - .tsflags = READ_ONCE(sk->sk_tsflags) > > + .tsflags = READ_ONCE(sk->sk_tsflags), > > + .priority = READ_ONCE(sk->sk_priority), > > }; > > } > > > > diff --git a/net/can/raw.c b/net/can/raw.c > > index 255c0a8f39d6..46e8ed9d64da 100644 > > --- a/net/can/raw.c > > +++ b/net/can/raw.c > > @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) > > } > > > > skb->dev = dev; > > - skb->priority = READ_ONCE(sk->sk_priority); > > + skb->priority = sockc.priority; > > skb->mark = READ_ONCE(sk->sk_mark); > > skb->tstamp = sockc.transmit_time; > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 5ecf6f1a470c..d5586b9212dd 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, > > case SCM_RIGHTS: > > case SCM_CREDENTIALS: > > break; > > + case SO_PRIORITY: > > + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32))) > > + return -EINVAL; > > + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) { > > + sockc->priority = *(u32 *)CMSG_DATA(cmsg); > > + break; > > + } > > + return -EPERM; > > nit: invert to make the error case the (speculated as unlikely) branch > and have the common path unindented. > > > default: > > return -EINVAL; > > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 2/2] support SO_PRIORITY cmsg 2024-11-03 13:39 ` Anna Nyiri @ 2024-11-03 13:57 ` Willem de Bruijn 0 siblings, 0 replies; 11+ messages in thread From: Willem de Bruijn @ 2024-11-03 13:57 UTC (permalink / raw) To: Anna Nyiri, Willem de Bruijn; +Cc: netdev, fejes, edumazet, kuba, pabeni Anna Nyiri wrote: > Willem de Bruijn <willemdebruijn.kernel@gmail.com> ezt írta (időpont: > 2024. nov. 3., V, 2:27): > > > > Anna Emese Nyiri wrote: > > > The Linux socket API currently supports setting SO_PRIORITY at the > > > socket level, which applies a uniform priority to all packets sent > > > through that socket. The only exception is IP_TOS; if specified as > > > ancillary data, the packet does not inherit the socket's priority. > > > Instead, the priority value is computed when handling the ancillary > > > data (as implemented in commit <f02db315b8d88> > > > > nit: drop the brackets > > > > > ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If > > > the priority is set via IP_TOS, then skb->priority derives its value > > > from the rt_tos2priority function, which calculates the priority > > > based on the value of ipc->tos obtained from IP_TOS. However, if > > > IP_TOS is not used and the priority has been set through a control > > > message, skb->priority will take the value provided by that control > > > message. > > > > The above describes the new situation? There is no way to set > > priority to a control message prior to this patch. > > > > > Therefore, when both options are available, the primary > > > source for skb->priority is the value set via IP_TOS. > > > > > > Currently, there is no option to set the priority directly from > > > userspace on a per-packet basis. The following changes allow > > > SO_PRIORITY to be set through control messages (CMSG), giving > > > userspace applications more granular control over packet priorities. > > > > > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY > > > > Duplicate statement. Overall, the explanation can perhaps be > > condensed and made more clear. > > > > > is specified as ancillary data, the packet is sent with the priority > > > value set through sockc->priority_cmsg_value, overriding the > > > > No longer matches the code. > > > > > socket-level values set via the traditional setsockopt() method. This > > > is analogous to existing support for SO_MARK (as implemented in commit > > > <c6af0c227a22> ("ip: support SO_MARK cmsg")). > > > > > > Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> > > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> > > > --- > > > include/net/inet_sock.h | 2 +- > > > include/net/ip.h | 3 ++- > > > include/net/sock.h | 4 +++- > > > net/can/raw.c | 2 +- > > > net/core/sock.c | 8 ++++++++ > > > net/ipv4/ip_output.c | 7 +++++-- > > > net/ipv4/raw.c | 2 +- > > > net/ipv6/ip6_output.c | 3 ++- > > > net/ipv6/raw.c | 2 +- > > > net/packet/af_packet.c | 2 +- > > > 10 files changed, 25 insertions(+), 10 deletions(-) > > > > > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > > > index 56d8bc5593d3..3ccbad881d74 100644 > > > --- a/include/net/inet_sock.h > > > +++ b/include/net/inet_sock.h > > > @@ -172,7 +172,7 @@ struct inet_cork { > > > u8 tx_flags; > > > __u8 ttl; > > > __s16 tos; > > > - char priority; > > > + u32 priority; > > > > Let's check with pahole how this affects struct size and holes. > > It likely adds a hole, but unavoidably so. > > > > > __u16 gso_size; > > > u32 ts_opt_id; > > > u64 transmit_time; > > > diff --git a/include/net/ip.h b/include/net/ip.h > > > index 0e548c1f2a0e..e8f71a191277 100644 > > > --- a/include/net/ip.h > > > +++ b/include/net/ip.h > > > @@ -81,7 +81,7 @@ struct ipcm_cookie { > > > __u8 protocol; > > > __u8 ttl; > > > __s16 tos; > > > - char priority; > > > + u32 priority; > > > > No need for a field in ipcm_cookie, when also present in > > sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is > > the right location. > > I think there could be a problem if the priority is set by IP_TOS for > some reason, and then also via cmsg. The latter value may overwrite > it. In the ip_setup_cork() function, there is therefore a check for > the value cork->tos != -1 to give priority to the value set by IP_TOS. > And that's why I thought that there should be a priority field in both > ipcm_cookie and sockcm_cookie. The priority field already existed in > ipcm_cookie, I didn't add it. I just changed the type. The existing behavior that adds a branch in the hot path is actually not needed. The preferred pattern is that the cookie is initialized with the sk field, and then optionally overwritten when parsing cmsgs. The path is slightly complicated by the fact that ipcm_init_sk does not call sockcm_init, but more or less open codes that. The callers of ipcm_init_sk are datagram sockets that have more opportunities to override per-socket options on a per-packet basis. So I was wrong that the field only has to be initialized in sockcm_init. It will have to be initialized by both initializers. But still only a u32 single field is needed. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 0/2] support SO_PRIORITY cmsg 2024-11-02 12:51 [PATCH net-next v2 0/2] support SO_PRIORITY cmsg Anna Emese Nyiri 2024-11-02 12:51 ` [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function Anna Emese Nyiri 2024-11-02 12:51 ` [PATCH net-next v2 2/2] support SO_PRIORITY cmsg Anna Emese Nyiri @ 2024-11-03 15:41 ` Jakub Kicinski 2024-11-03 16:25 ` Ido Schimmel 3 siblings, 0 replies; 11+ messages in thread From: Jakub Kicinski @ 2024-11-03 15:41 UTC (permalink / raw) To: Anna Emese Nyiri; +Cc: netdev, fejes, edumazet, pabeni, willemdebruijn.kernel On Sat, 2 Nov 2024 13:51:34 +0100 Anna Emese Nyiri wrote: > The changes introduce a new helper function, > `sk_set_prio_allowed`, which centralizes the logic for validating > priority settings. This series adds support for the `SO_PRIORITY` > control message, allowing user-space applications to set socket > priority via control messages (cmsg). Could be a flake but it seems to break this test: tools/testing/selftests/net/fq_band_pktlimit.sh with # unexpected drop count at 3 please double check if you see this failure. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 0/2] support SO_PRIORITY cmsg 2024-11-02 12:51 [PATCH net-next v2 0/2] support SO_PRIORITY cmsg Anna Emese Nyiri ` (2 preceding siblings ...) 2024-11-03 15:41 ` [PATCH net-next v2 0/2] " Jakub Kicinski @ 2024-11-03 16:25 ` Ido Schimmel 2024-11-05 14:34 ` Anna Nyiri 3 siblings, 1 reply; 11+ messages in thread From: Ido Schimmel @ 2024-11-03 16:25 UTC (permalink / raw) To: Anna Emese Nyiri Cc: netdev, fejes, edumazet, kuba, pabeni, willemdebruijn.kernel On Sat, Nov 02, 2024 at 01:51:34PM +0100, Anna Emese Nyiri wrote: > The changes introduce a new helper function, > `sk_set_prio_allowed`, which centralizes the logic for validating > priority settings. This series adds support for the `SO_PRIORITY` > control message, allowing user-space applications to set socket > priority via control messages (cmsg). > > Patch Overview: > Patch 1/2: Introduces `sk_set_prio_allowed` helper function. > Patch 2/2: Implements support for setting `SO_PRIORITY` via control > messages. > > v2: > - Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> > introduce "sk_set_prio_allowed" helper to check priority setting > capability > - drop new fields and change sockcm_cookie::priority from "char" to > "u32" to match with sk_buff::priority > - cork->tos value check before priority setting > moved from __ip_make_skb() to ip_setup_cork() > - rebased on net-next > > v1: > https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/ > > Anna Emese Nyiri (2): > Introduce sk_set_prio_allowed helper function > support SO_PRIORITY cmsg > > include/net/inet_sock.h | 2 +- > include/net/ip.h | 3 ++- > include/net/sock.h | 4 +++- > net/can/raw.c | 2 +- > net/core/sock.c | 19 ++++++++++++++++--- > net/ipv4/ip_output.c | 7 +++++-- > net/ipv4/raw.c | 2 +- > net/ipv6/ip6_output.c | 3 ++- > net/ipv6/raw.c | 2 +- > net/packet/af_packet.c | 2 +- > 10 files changed, 33 insertions(+), 13 deletions(-) Please consider adding a selftest for this feature. Willem already extended tools/testing/selftests/net/cmsg_sender.c so that it could be used to set SO_PRIORITY via setsockopt. You can extend it to set SO_PRIORITY via cmsg and then use it in a test like tools/testing/selftests/net/cmsg_so_mark.sh is doing for SO_MARK. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 0/2] support SO_PRIORITY cmsg 2024-11-03 16:25 ` Ido Schimmel @ 2024-11-05 14:34 ` Anna Nyiri 2024-11-05 15:01 ` Willem de Bruijn 0 siblings, 1 reply; 11+ messages in thread From: Anna Nyiri @ 2024-11-05 14:34 UTC (permalink / raw) To: Ido Schimmel; +Cc: netdev, fejes, edumazet, kuba, pabeni, willemdebruijn.kernel Ido Schimmel <idosch@idosch.org> ezt írta (időpont: 2024. nov. 3., V, 17:25): > > On Sat, Nov 02, 2024 at 01:51:34PM +0100, Anna Emese Nyiri wrote: > > The changes introduce a new helper function, > > `sk_set_prio_allowed`, which centralizes the logic for validating > > priority settings. This series adds support for the `SO_PRIORITY` > > control message, allowing user-space applications to set socket > > priority via control messages (cmsg). > > > > Patch Overview: > > Patch 1/2: Introduces `sk_set_prio_allowed` helper function. > > Patch 2/2: Implements support for setting `SO_PRIORITY` via control > > messages. > > > > v2: > > - Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > introduce "sk_set_prio_allowed" helper to check priority setting > > capability > > - drop new fields and change sockcm_cookie::priority from "char" to > > "u32" to match with sk_buff::priority > > - cork->tos value check before priority setting > > moved from __ip_make_skb() to ip_setup_cork() > > - rebased on net-next > > > > v1: > > https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/ > > > > Anna Emese Nyiri (2): > > Introduce sk_set_prio_allowed helper function > > support SO_PRIORITY cmsg > > > > include/net/inet_sock.h | 2 +- > > include/net/ip.h | 3 ++- > > include/net/sock.h | 4 +++- > > net/can/raw.c | 2 +- > > net/core/sock.c | 19 ++++++++++++++++--- > > net/ipv4/ip_output.c | 7 +++++-- > > net/ipv4/raw.c | 2 +- > > net/ipv6/ip6_output.c | 3 ++- > > net/ipv6/raw.c | 2 +- > > net/packet/af_packet.c | 2 +- > > 10 files changed, 33 insertions(+), 13 deletions(-) > > Please consider adding a selftest for this feature. Willem already > extended tools/testing/selftests/net/cmsg_sender.c so that it could be > used to set SO_PRIORITY via setsockopt. You can extend it to set > SO_PRIORITY via cmsg and then use it in a test like > tools/testing/selftests/net/cmsg_so_mark.sh is doing for SO_MARK. Of course, I will send the test. However, I would first like to clarify which option I should assign in cmsg_sender for setting priority via cmsg. The -P option is already used for setting priority with setsockopt(), and -p is used to specify the protocol. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 0/2] support SO_PRIORITY cmsg 2024-11-05 14:34 ` Anna Nyiri @ 2024-11-05 15:01 ` Willem de Bruijn 0 siblings, 0 replies; 11+ messages in thread From: Willem de Bruijn @ 2024-11-05 15:01 UTC (permalink / raw) To: Anna Nyiri, Ido Schimmel Cc: netdev, fejes, edumazet, kuba, pabeni, willemdebruijn.kernel Anna Nyiri wrote: > Ido Schimmel <idosch@idosch.org> ezt írta (időpont: 2024. nov. 3., V, 17:25): > > > > On Sat, Nov 02, 2024 at 01:51:34PM +0100, Anna Emese Nyiri wrote: > > > The changes introduce a new helper function, > > > `sk_set_prio_allowed`, which centralizes the logic for validating > > > priority settings. This series adds support for the `SO_PRIORITY` > > > control message, allowing user-space applications to set socket > > > priority via control messages (cmsg). > > > > > > Patch Overview: > > > Patch 1/2: Introduces `sk_set_prio_allowed` helper function. > > > Patch 2/2: Implements support for setting `SO_PRIORITY` via control > > > messages. > > > > > > v2: > > > - Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > > introduce "sk_set_prio_allowed" helper to check priority setting > > > capability > > > - drop new fields and change sockcm_cookie::priority from "char" to > > > "u32" to match with sk_buff::priority > > > - cork->tos value check before priority setting > > > moved from __ip_make_skb() to ip_setup_cork() > > > - rebased on net-next > > > > > > v1: > > > https://lore.kernel.org/all/20241029144142.31382-1-annaemesenyiri@gmail.com/ > > > > > > Anna Emese Nyiri (2): > > > Introduce sk_set_prio_allowed helper function > > > support SO_PRIORITY cmsg > > > > > > include/net/inet_sock.h | 2 +- > > > include/net/ip.h | 3 ++- > > > include/net/sock.h | 4 +++- > > > net/can/raw.c | 2 +- > > > net/core/sock.c | 19 ++++++++++++++++--- > > > net/ipv4/ip_output.c | 7 +++++-- > > > net/ipv4/raw.c | 2 +- > > > net/ipv6/ip6_output.c | 3 ++- > > > net/ipv6/raw.c | 2 +- > > > net/packet/af_packet.c | 2 +- > > > 10 files changed, 33 insertions(+), 13 deletions(-) > > > > Please consider adding a selftest for this feature. Willem already > > extended tools/testing/selftests/net/cmsg_sender.c so that it could be > > used to set SO_PRIORITY via setsockopt. You can extend it to set > > SO_PRIORITY via cmsg and then use it in a test like > > tools/testing/selftests/net/cmsg_so_mark.sh is doing for SO_MARK. > > Of course, I will send the test. However, I would first like to > clarify which option I should assign in cmsg_sender for setting > priority via cmsg. The -P option is already used for setting priority > with setsockopt(), and -p is used to specify the protocol. Thanks for adding test coverage. If all the Ps are taken, use Q. It's a rare letter, and happens to be the next. In the end, whatever is available. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-11-05 15:01 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-02 12:51 [PATCH net-next v2 0/2] support SO_PRIORITY cmsg Anna Emese Nyiri 2024-11-02 12:51 ` [PATCH net-next v2 1/2] Introduce sk_set_prio_allowed helper function Anna Emese Nyiri 2024-11-03 1:30 ` Willem de Bruijn 2024-11-02 12:51 ` [PATCH net-next v2 2/2] support SO_PRIORITY cmsg Anna Emese Nyiri 2024-11-03 1:27 ` Willem de Bruijn 2024-11-03 13:39 ` Anna Nyiri 2024-11-03 13:57 ` Willem de Bruijn 2024-11-03 15:41 ` [PATCH net-next v2 0/2] " Jakub Kicinski 2024-11-03 16:25 ` Ido Schimmel 2024-11-05 14:34 ` Anna Nyiri 2024-11-05 15:01 ` Willem de Bruijn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).