* [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion.
@ 2025-01-02 16:34 Guillaume Nault
2025-01-03 15:35 ` Xin Long
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Guillaume Nault @ 2025-01-02 16:34 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Simon Horman, Marcelo Ricardo Leitner, Xin Long,
linux-sctp, Ido Schimmel
Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so
that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t
variable. For the SCTP_DSCP_SET_MASK case, we can just use
inet_dsfield_to_dscp() to get a dscp_t value.
Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have
to drop the inet_dscp_to_dsfield() conversion function.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
include/net/inet_sock.h | 6 ++++++
net/sctp/protocol.c | 10 +++++++---
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 3ccbad881d74..1086256549fa 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -19,6 +19,7 @@
#include <linux/netdevice.h>
#include <net/flow.h>
+#include <net/inet_dscp.h>
#include <net/sock.h>
#include <net/request_sock.h>
#include <net/netns/hash.h>
@@ -302,6 +303,11 @@ static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet)
return READ_ONCE(inet->inet_flags) & IP_CMSG_ALL;
}
+static inline dscp_t inet_sk_dscp(const struct inet_sock *inet)
+{
+ return inet_dsfield_to_dscp(READ_ONCE(inet->tos));
+}
+
#define inet_test_bit(nr, sk) \
test_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
#define inet_set_bit(nr, sk) \
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 8b9a1b96695e..29727ed1008e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -43,6 +43,7 @@
#include <net/addrconf.h>
#include <net/inet_common.h>
#include <net/inet_ecn.h>
+#include <net/inet_sock.h>
#include <net/udp_tunnel.h>
#include <net/inet_dscp.h>
@@ -427,16 +428,19 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
struct dst_entry *dst = NULL;
union sctp_addr *daddr = &t->ipaddr;
union sctp_addr dst_saddr;
- u8 tos = READ_ONCE(inet_sk(sk)->tos);
+ dscp_t dscp;
if (t->dscp & SCTP_DSCP_SET_MASK)
- tos = t->dscp & SCTP_DSCP_VAL_MASK;
+ dscp = inet_dsfield_to_dscp(t->dscp);
+ else
+ dscp = inet_sk_dscp(inet_sk(sk));
+
memset(&_fl, 0x0, sizeof(_fl));
fl4->daddr = daddr->v4.sin_addr.s_addr;
fl4->fl4_dport = daddr->v4.sin_port;
fl4->flowi4_proto = IPPROTO_SCTP;
if (asoc) {
- fl4->flowi4_tos = tos & INET_DSCP_MASK;
+ fl4->flowi4_tos = inet_dscp_to_dsfield(dscp);
fl4->flowi4_scope = ip_sock_rt_scope(asoc->base.sk);
fl4->flowi4_oif = asoc->base.sk->sk_bound_dev_if;
fl4->fl4_sport = htons(asoc->base.bind_addr.port);
--
2.39.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. 2025-01-02 16:34 [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion Guillaume Nault @ 2025-01-03 15:35 ` Xin Long 2025-01-03 16:59 ` Guillaume Nault 2025-01-06 9:07 ` Ido Schimmel 2025-01-06 23:20 ` patchwork-bot+netdevbpf 2 siblings, 1 reply; 6+ messages in thread From: Xin Long @ 2025-01-03 15:35 UTC (permalink / raw) To: Guillaume Nault Cc: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Simon Horman, Marcelo Ricardo Leitner, linux-sctp, Ido Schimmel On Thu, Jan 2, 2025 at 11:34 AM Guillaume Nault <gnault@redhat.com> wrote: > > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t > variable. For the SCTP_DSCP_SET_MASK case, we can just use > inet_dsfield_to_dscp() to get a dscp_t value. > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have > to drop the inet_dscp_to_dsfield() conversion function. With inet_dsfield_to_dscp() && inet_dsfield_to_dscp(), the logic looks like: tos(dsfield) -> dscp_t -> tos(dsfield) It's a bit confusing, but it has been doing that all over routing places. In sctp_v4_xmit(), there's the similar tos/dscp thing, although it's not for fl4.flowi4_tos. Also, I'm curious there are still a few places under net/ using: fl4.flowi4_tos = tos & INET_DSCP_MASK; Will you consider changing all of them with inet_dsfield_to_dscp() && inet_dsfield_to_dscp() as well? Thanks. > > Signed-off-by: Guillaume Nault <gnault@redhat.com> > --- > include/net/inet_sock.h | 6 ++++++ > net/sctp/protocol.c | 10 +++++++--- > 2 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > index 3ccbad881d74..1086256549fa 100644 > --- a/include/net/inet_sock.h > +++ b/include/net/inet_sock.h > @@ -19,6 +19,7 @@ > #include <linux/netdevice.h> > > #include <net/flow.h> > +#include <net/inet_dscp.h> > #include <net/sock.h> > #include <net/request_sock.h> > #include <net/netns/hash.h> > @@ -302,6 +303,11 @@ static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet) > return READ_ONCE(inet->inet_flags) & IP_CMSG_ALL; > } > > +static inline dscp_t inet_sk_dscp(const struct inet_sock *inet) > +{ > + return inet_dsfield_to_dscp(READ_ONCE(inet->tos)); > +} > + > #define inet_test_bit(nr, sk) \ > test_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags) > #define inet_set_bit(nr, sk) \ > diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c > index 8b9a1b96695e..29727ed1008e 100644 > --- a/net/sctp/protocol.c > +++ b/net/sctp/protocol.c > @@ -43,6 +43,7 @@ > #include <net/addrconf.h> > #include <net/inet_common.h> > #include <net/inet_ecn.h> > +#include <net/inet_sock.h> > #include <net/udp_tunnel.h> > #include <net/inet_dscp.h> > > @@ -427,16 +428,19 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr, > struct dst_entry *dst = NULL; > union sctp_addr *daddr = &t->ipaddr; > union sctp_addr dst_saddr; > - u8 tos = READ_ONCE(inet_sk(sk)->tos); > + dscp_t dscp; > > if (t->dscp & SCTP_DSCP_SET_MASK) > - tos = t->dscp & SCTP_DSCP_VAL_MASK; > + dscp = inet_dsfield_to_dscp(t->dscp); > + else > + dscp = inet_sk_dscp(inet_sk(sk)); > + > memset(&_fl, 0x0, sizeof(_fl)); > fl4->daddr = daddr->v4.sin_addr.s_addr; > fl4->fl4_dport = daddr->v4.sin_port; > fl4->flowi4_proto = IPPROTO_SCTP; > if (asoc) { > - fl4->flowi4_tos = tos & INET_DSCP_MASK; > + fl4->flowi4_tos = inet_dscp_to_dsfield(dscp); > fl4->flowi4_scope = ip_sock_rt_scope(asoc->base.sk); > fl4->flowi4_oif = asoc->base.sk->sk_bound_dev_if; > fl4->fl4_sport = htons(asoc->base.bind_addr.port); > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. 2025-01-03 15:35 ` Xin Long @ 2025-01-03 16:59 ` Guillaume Nault 2025-01-05 18:28 ` Xin Long 0 siblings, 1 reply; 6+ messages in thread From: Guillaume Nault @ 2025-01-03 16:59 UTC (permalink / raw) To: Xin Long Cc: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Simon Horman, Marcelo Ricardo Leitner, linux-sctp, Ido Schimmel On Fri, Jan 03, 2025 at 10:35:55AM -0500, Xin Long wrote: > On Thu, Jan 2, 2025 at 11:34 AM Guillaume Nault <gnault@redhat.com> wrote: > > > > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so > > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t > > variable. For the SCTP_DSCP_SET_MASK case, we can just use > > inet_dsfield_to_dscp() to get a dscp_t value. > > > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have > > to drop the inet_dscp_to_dsfield() conversion function. > With inet_dsfield_to_dscp() && inet_dsfield_to_dscp(), the logic > looks like: tos(dsfield) -> dscp_t -> tos(dsfield) > It's a bit confusing, but it has been doing that all over routing places. The objective is to have DSCP values stored in dscp_t variables in the kernel and keep __u8 values in user space APIs and packet headers. In practice this means using inet_dscp_to_dsfield() and inet_dsfield_to_dscp() at boundaries with user space or networking. However, since core kernel functions and structures are getting updated incrementally, some inet_dscp_to_dsfield() and inet_dsfield_to_dscp() conversions are temporarily needed between already converted and not yet converted parts of the stack. > In sctp_v4_xmit(), there's the similar tos/dscp thing, although it's not > for fl4.flowi4_tos. The sctp_v4_xmit() case is special because its dscp variable, despite its name, doesn't only carry a DSCP value, but also ECN bits. Converting it to a dscp_t variable would lose the ECN information. To be more precise, this is only the case if the SCTP_DSCP_SET_MASK flag is not set. That is, when the "dscp" variable is set using inet->tos. Since inet->tos contains both DSCP and ECN bits, this allows the socket owner to manage ECN. I don't know if that's intented by the SCTP code. If that isn't, and the ECN bits aren't supposed to be taken into account here, then I'm happy to send a patch to convert sctp_v4_xmit() to dscp_t too. > Also, I'm curious there are still a few places under net/ using: > > fl4.flowi4_tos = tos & INET_DSCP_MASK; > > Will you consider changing all of them with > inet_dsfield_to_dscp() && inet_dsfield_to_dscp() as well? Yes, I have a few more cases to convert. But some of them will have to stay. For example, in net/ipv4/ip_output.c, __ip_queue_xmit() has "fl4->flowi4_tos = tos & INET_DSCP_MASK;", but we can't just convert that "tos" variable to dscp_t because it carries both DSCP and ECN values. Although ->flowi4_tos isn't concerned with ECN, these ECN bits are used later to set the IP header. There are other cases that I'm not planning to convert, for example because the value is read from a UAPI structure that can't be updated. For example the "fl4.flowi4_tos = params->tos & INET_DSCP_MASK;" case in bpf_ipv4_fib_lookup(), where "params" is a struct bpf_fib_lookup, exported in UAPI. To summarise, the plan is to incrementally convert most ->flowi4_tos assignments, so that we have a dscp_t variable at hand. Then I'll send a patch converting all ->flowi4_tos users at once. Most of it should consist of trivial inet_dscp_to_dsfield() removals, thanks to the previous dscp_t conversions. The cases that won't follow that pattern will be explained in the commit message, but the idea is to have as few of them as possible. BTW, the reason for this work is to avoid having ECN bits interfering with route lookups. We had several such issues and regressions in the past because of ->flowi4_tos having ECN bits set in specific scenarios. > Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. 2025-01-03 16:59 ` Guillaume Nault @ 2025-01-05 18:28 ` Xin Long 0 siblings, 0 replies; 6+ messages in thread From: Xin Long @ 2025-01-05 18:28 UTC (permalink / raw) To: Guillaume Nault Cc: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Simon Horman, Marcelo Ricardo Leitner, linux-sctp, Ido Schimmel On Fri, Jan 3, 2025 at 11:59 AM Guillaume Nault <gnault@redhat.com> wrote: > > On Fri, Jan 03, 2025 at 10:35:55AM -0500, Xin Long wrote: > > On Thu, Jan 2, 2025 at 11:34 AM Guillaume Nault <gnault@redhat.com> wrote: > > > > > > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so > > > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t > > > variable. For the SCTP_DSCP_SET_MASK case, we can just use > > > inet_dsfield_to_dscp() to get a dscp_t value. > > > > > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have > > > to drop the inet_dscp_to_dsfield() conversion function. > > With inet_dsfield_to_dscp() && inet_dsfield_to_dscp(), the logic > > looks like: tos(dsfield) -> dscp_t -> tos(dsfield) > > It's a bit confusing, but it has been doing that all over routing places. > > The objective is to have DSCP values stored in dscp_t variables in the > kernel and keep __u8 values in user space APIs and packet headers. In > practice this means using inet_dscp_to_dsfield() and > inet_dsfield_to_dscp() at boundaries with user space or networking. > > However, since core kernel functions and structures are getting updated > incrementally, some inet_dscp_to_dsfield() and inet_dsfield_to_dscp() > conversions are temporarily needed between already converted and not yet > converted parts of the stack. > > > In sctp_v4_xmit(), there's the similar tos/dscp thing, although it's not > > for fl4.flowi4_tos. > > The sctp_v4_xmit() case is special because its dscp variable, despite > its name, doesn't only carry a DSCP value, but also ECN bits. > Converting it to a dscp_t variable would lose the ECN information. > > To be more precise, this is only the case if the SCTP_DSCP_SET_MASK > flag is not set. That is, when the "dscp" variable is set using > inet->tos. Since inet->tos contains both DSCP and ECN bits, this allows > the socket owner to manage ECN. I don't know if that's intented by the > SCTP code. If that isn't, and the ECN bits aren't supposed to be taken > into account here, then I'm happy to send a patch to convert > sctp_v4_xmit() to dscp_t too. From the beginning SCTP sends its packet via ip_queue_xmit() where it allows the socket owners to manage ECN, like TCP. So let's just leave it. > > > Also, I'm curious there are still a few places under net/ using: > > > > fl4.flowi4_tos = tos & INET_DSCP_MASK; > > > > Will you consider changing all of them with > > inet_dsfield_to_dscp() && inet_dsfield_to_dscp() as well? > > Yes, I have a few more cases to convert. But some of them will have to > stay. For example, in net/ipv4/ip_output.c, __ip_queue_xmit() has > "fl4->flowi4_tos = tos & INET_DSCP_MASK;", but we can't just convert > that "tos" variable to dscp_t because it carries both DSCP and ECN > values. Although ->flowi4_tos isn't concerned with ECN, these ECN bits > are used later to set the IP header. > > There are other cases that I'm not planning to convert, for example > because the value is read from a UAPI structure that can't be updated. > For example the "fl4.flowi4_tos = params->tos & INET_DSCP_MASK;" case > in bpf_ipv4_fib_lookup(), where "params" is a struct bpf_fib_lookup, > exported in UAPI. > > To summarise, the plan is to incrementally convert most ->flowi4_tos > assignments, so that we have a dscp_t variable at hand. Then I'll send > a patch converting all ->flowi4_tos users at once. Most of it should > consist of trivial inet_dscp_to_dsfield() removals, thanks to the > previous dscp_t conversions. The cases that won't follow that pattern > will be explained in the commit message, but the idea is to have as few > of them as possible. > > BTW, the reason for this work is to avoid having ECN bits interfering > with route lookups. We had several such issues and regressions in the > past because of ->flowi4_tos having ECN bits set in specific scenarios. > Got it, thanks for the detailed explanation. Acked-by: Xin Long <lucien.xin@gmail.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. 2025-01-02 16:34 [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion Guillaume Nault 2025-01-03 15:35 ` Xin Long @ 2025-01-06 9:07 ` Ido Schimmel 2025-01-06 23:20 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 6+ messages in thread From: Ido Schimmel @ 2025-01-06 9:07 UTC (permalink / raw) To: Guillaume Nault Cc: David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Simon Horman, Marcelo Ricardo Leitner, Xin Long, linux-sctp On Thu, Jan 02, 2025 at 05:34:18PM +0100, Guillaume Nault wrote: > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t > variable. For the SCTP_DSCP_SET_MASK case, we can just use > inet_dsfield_to_dscp() to get a dscp_t value. > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have > to drop the inet_dscp_to_dsfield() conversion function. > > Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. 2025-01-02 16:34 [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion Guillaume Nault 2025-01-03 15:35 ` Xin Long 2025-01-06 9:07 ` Ido Schimmel @ 2025-01-06 23:20 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 6+ messages in thread From: patchwork-bot+netdevbpf @ 2025-01-06 23:20 UTC (permalink / raw) To: Guillaume Nault Cc: davem, kuba, pabeni, edumazet, netdev, horms, marcelo.leitner, lucien.xin, linux-sctp, idosch Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 2 Jan 2025 17:34:18 +0100 you wrote: > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t > variable. For the SCTP_DSCP_SET_MASK case, we can just use > inet_dsfield_to_dscp() to get a dscp_t value. > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have > to drop the inet_dscp_to_dsfield() conversion function. > > [...] Here is the summary with links: - [net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion. https://git.kernel.org/netdev/net-next/c/3f9f5cd005f5 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-01-06 23:20 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-02 16:34 [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion Guillaume Nault 2025-01-03 15:35 ` Xin Long 2025-01-03 16:59 ` Guillaume Nault 2025-01-05 18:28 ` Xin Long 2025-01-06 9:07 ` Ido Schimmel 2025-01-06 23:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).