* [RFC PATCH net-next 0/3] Preparations for FIB rule DSCP selector
@ 2024-07-25 13:17 Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Ido Schimmel
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ido Schimmel @ 2024-07-25 13:17 UTC (permalink / raw)
To: netdev, netfilter-devel
Cc: davem, kuba, pabeni, edumazet, dsahern, gnault, pablo, kadlec, fw,
Ido Schimmel
This patchset moves the masking of the upper DSCP bits in 'flowi4_tos'
to the core instead of relying on callers of the FIB lookup API to do
it.
This will allow us to start changing users of the API to initialize the
'flowi4_tos' field with all six bits of the DSCP field. In turn, this
will allow us to extend FIB rules with a new DSCP selector.
By masking the upper DSCP bits in the core we are able to maintain the
behavior of the TOS selector in FIB rules and routes to only match on
the lower DSCP bits.
While working on this I found two users of the API that do not mask the
upper DSCP bits before performing the lookup. The first is an ancient
netlink family that is unlikely to be used. It is adjusted in patch #1
to mask both the upper DSCP bits and the ECN bits before calling the
API.
The second user is a nftables module that differs in this regard from
its equivalent iptables module. It is adjusted in patch #2 to invoke the
API with the upper DSCP bits masked, like all other callers. The
relevant selftest passed, but in the unlikely case that regressions are
reported because of this change, we can restore the existing behavior
using a new flow information flag as discussed here [1].
The last patch moves the masking of the upper DSCP bits to the core,
making the first two patches redundant, but I wanted to post them
separately to call attention to the behavior change for these two users
of the FIB lookup API.
[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
Ido Schimmel (3):
ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family
netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
ipv4: Centralize TOS matching
include/net/ip_fib.h | 7 +++++++
net/ipv4/fib_frontend.c | 2 +-
net/ipv4/fib_rules.c | 2 +-
net/ipv4/fib_semantics.c | 3 +--
net/ipv4/fib_trie.c | 3 +--
net/ipv4/netfilter/nft_fib_ipv4.c | 4 +---
6 files changed, 12 insertions(+), 9 deletions(-)
--
2.45.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family
2024-07-25 13:17 [RFC PATCH net-next 0/3] Preparations for FIB rule DSCP selector Ido Schimmel
@ 2024-07-25 13:17 ` Ido Schimmel
2024-07-26 13:12 ` Guillaume Nault
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching Ido Schimmel
2 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2024-07-25 13:17 UTC (permalink / raw)
To: netdev, netfilter-devel
Cc: davem, kuba, pabeni, edumazet, dsahern, gnault, pablo, kadlec, fw,
Ido Schimmel
The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
lookup according to user provided parameters and communicate the result
back to user space.
However, unlike other users of the FIB lookup API, the upper DSCP bits
and the ECN bits of the DS field are not masked, which can result in the
wrong result being returned.
Solve this by masking the upper DSCP bits and the ECN bits using
IPTOS_RT_MASK.
The structure that communicates the request and the response is not
exported to user space, so it is unlikely that this netlink family is
actually in use [1].
[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv4/fib_frontend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7ad2cafb9276..da540ddb7af6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1343,7 +1343,7 @@ static void nl_fib_lookup(struct net *net, struct fib_result_nl *frn)
struct flowi4 fl4 = {
.flowi4_mark = frn->fl_mark,
.daddr = frn->fl_addr,
- .flowi4_tos = frn->fl_tos,
+ .flowi4_tos = frn->fl_tos & IPTOS_RT_MASK,
.flowi4_scope = frn->fl_scope,
};
struct fib_table *tb;
--
2.45.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-25 13:17 [RFC PATCH net-next 0/3] Preparations for FIB rule DSCP selector Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Ido Schimmel
@ 2024-07-25 13:17 ` Ido Schimmel
2024-07-26 13:15 ` Guillaume Nault
` (2 more replies)
2024-07-25 13:17 ` [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching Ido Schimmel
2 siblings, 3 replies; 13+ messages in thread
From: Ido Schimmel @ 2024-07-25 13:17 UTC (permalink / raw)
To: netdev, netfilter-devel
Cc: davem, kuba, pabeni, edumazet, dsahern, gnault, pablo, kadlec, fw,
Ido Schimmel
As part of its functionality, the nftables FIB expression module
performs a FIB lookup, but unlike other users of the FIB lookup API, it
does so without masking the upper DSCP bits. In particular, this differs
from the equivalent iptables match ("rpfilter") that does mask the upper
DSCP bits before the FIB lookup.
Align the module to other users of the FIB lookup API and mask the upper
DSCP bits using IPTOS_RT_MASK before the lookup.
No regressions in nft_fib.sh:
# ./nft_fib.sh
PASS: fib expression did not cause unwanted packet drops
PASS: fib expression did drop packets for 1.1.1.1
PASS: fib expression did drop packets for 1c3::c01d
PASS: fib expression forward check with policy based routing
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv4/netfilter/nft_fib_ipv4.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/ipv4/netfilter/nft_fib_ipv4.c b/net/ipv4/netfilter/nft_fib_ipv4.c
index 9eee535c64dd..df94bc28c3d7 100644
--- a/net/ipv4/netfilter/nft_fib_ipv4.c
+++ b/net/ipv4/netfilter/nft_fib_ipv4.c
@@ -22,8 +22,6 @@ static __be32 get_saddr(__be32 addr)
return addr;
}
-#define DSCP_BITS 0xfc
-
void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
const struct nft_pktinfo *pkt)
{
@@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
if (priv->flags & NFTA_FIB_F_MARK)
fl4.flowi4_mark = pkt->skb->mark;
- fl4.flowi4_tos = iph->tos & DSCP_BITS;
+ fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
if (priv->flags & NFTA_FIB_F_DADDR) {
fl4.daddr = iph->daddr;
--
2.45.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching
2024-07-25 13:17 [RFC PATCH net-next 0/3] Preparations for FIB rule DSCP selector Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
@ 2024-07-25 13:17 ` Ido Schimmel
2024-07-26 13:17 ` Guillaume Nault
2 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2024-07-25 13:17 UTC (permalink / raw)
To: netdev, netfilter-devel
Cc: davem, kuba, pabeni, edumazet, dsahern, gnault, pablo, kadlec, fw,
Ido Schimmel
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.
What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).
Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:
# ip -4 rule add tos 0x1c table 100
# ip -4 rule add tos 0x3c table 100
Error: Invalid tos.
While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.
Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.
A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.
However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.
This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.
Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.
Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.
No regressions in FIB tests:
# ./fib_tests.sh
[...]
Tests passed: 218
Tests failed: 0
And FIB rule tests:
# ./fib_rule_tests.sh
[...]
Tests passed: 116
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
include/net/ip_fib.h | 7 +++++++
net/ipv4/fib_rules.c | 2 +-
net/ipv4/fib_semantics.c | 3 +--
net/ipv4/fib_trie.c | 3 +--
4 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 72af2f223e59..967e4dc555fa 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,8 @@
#include <linux/percpu.h>
#include <linux/notifier.h>
#include <linux/refcount.h>
+#include <linux/ip.h>
+#include <linux/in_route.h>
struct fib_config {
u8 fc_dst_len;
@@ -434,6 +436,11 @@ static inline bool fib4_rules_early_flow_dissect(struct net *net,
#endif /* CONFIG_IP_MULTIPLE_TABLES */
+static inline bool fib_dscp_masked_match(dscp_t dscp, const struct flowi4 *fl4)
+{
+ return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
+}
+
/* Exported by fib_frontend.c */
extern const struct nla_policy rtm_ipv4_policy[];
void ip_fib_init(void);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 5bdd1c016009..c26776b71e97 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -186,7 +186,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
((daddr ^ r->dst) & r->dstmask))
return 0;
- if (r->dscp && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
+ if (r->dscp && !fib_dscp_masked_match(r->dscp, fl4))
return 0;
if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 2b57cd2b96e2..0f70341cb8b5 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -2066,8 +2066,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
if (fa->fa_slen != slen)
continue;
- if (fa->fa_dscp &&
- fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos))
+ if (fa->fa_dscp && !fib_dscp_masked_match(fa->fa_dscp, flp))
continue;
if (fa->tb_id != tb->tb_id)
continue;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 8f30e3f00b7f..09e31757e96c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1580,8 +1580,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
if (index >= (1ul << fa->fa_slen))
continue;
}
- if (fa->fa_dscp &&
- inet_dscp_to_dsfield(fa->fa_dscp) != flp->flowi4_tos)
+ if (fa->fa_dscp && !fib_dscp_masked_match(fa->fa_dscp, flp))
continue;
/* Paired with WRITE_ONCE() in fib_release_info() */
if (READ_ONCE(fi->fib_dead))
--
2.45.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family
2024-07-25 13:17 ` [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Ido Schimmel
@ 2024-07-26 13:12 ` Guillaume Nault
0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2024-07-26 13:12 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
pablo, kadlec, fw
On Thu, Jul 25, 2024 at 04:17:27PM +0300, Ido Schimmel wrote:
> The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
> lookup according to user provided parameters and communicate the result
> back to user space.
>
> However, unlike other users of the FIB lookup API, the upper DSCP bits
> and the ECN bits of the DS field are not masked, which can result in the
> wrong result being returned.
>
> Solve this by masking the upper DSCP bits and the ECN bits using
> IPTOS_RT_MASK.
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
@ 2024-07-26 13:15 ` Guillaume Nault
2024-07-26 13:32 ` Florian Westphal
2024-07-28 2:30 ` Florian Westphal
2 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2024-07-26 13:15 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
pablo, kadlec, fw
On Thu, Jul 25, 2024 at 04:17:28PM +0300, Ido Schimmel wrote:
> As part of its functionality, the nftables FIB expression module
> performs a FIB lookup, but unlike other users of the FIB lookup API, it
> does so without masking the upper DSCP bits. In particular, this differs
> from the equivalent iptables match ("rpfilter") that does mask the upper
> DSCP bits before the FIB lookup.
>
> Align the module to other users of the FIB lookup API and mask the upper
> DSCP bits using IPTOS_RT_MASK before the lookup.
If Florian and Pablo are okay with this change and the long term plan
to allow full DSCP match, then I'm all for it.
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching
2024-07-25 13:17 ` [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching Ido Schimmel
@ 2024-07-26 13:17 ` Guillaume Nault
2024-07-28 11:34 ` Ido Schimmel
0 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2024-07-26 13:17 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
pablo, kadlec, fw
On Thu, Jul 25, 2024 at 04:17:29PM +0300, Ido Schimmel wrote:
> The TOS field in the IPv4 flow information structure ('flowi4_tos') is
> matched by the kernel against the TOS selector in IPv4 rules and routes.
> The field is initialized differently by different call sites. Some treat
> it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
> RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
> 791 TOS and initialize it using IPTOS_RT_MASK.
>
> What is common to all these call sites is that they all initialize the
> lower three DSCP bits, which fits the TOS definition in the initial IPv4
> specification (RFC 791).
>
> Therefore, the kernel only allows configuring IPv4 FIB rules that match
> on the lower three DSCP bits which are always guaranteed to be
> initialized by all call sites:
>
> # ip -4 rule add tos 0x1c table 100
> # ip -4 rule add tos 0x3c table 100
> Error: Invalid tos.
>
> While this works, it is unlikely to be very useful. RFC 791 that
> initially defined the TOS and IP precedence fields was updated by RFC
> 2474 over twenty five years ago where these fields were replaced by a
> single six bits DSCP field.
>
> Extending FIB rules to match on DSCP can be done by adding a new DSCP
> selector while maintaining the existing semantics of the TOS selector
> for applications that rely on that.
>
> A prerequisite for allowing FIB rules to match on DSCP is to adjust all
> the call sites to initialize the high order DSCP bits and remove their
> masking along the path to the core where the field is matched on.
>
> However, making this change alone will result in a behavior change. For
> example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
> match a FIB rule that was configured with 'tos 0x1c'.
>
> This behavior change can be avoided by masking the upper three DSCP bits
> in 'flowi4_tos' before comparing it against the TOS selectors in FIB
> rules and routes.
>
> Implement the above by adding a new function that checks whether a given
> DSCP value matches the one specified in the IPv4 flow information
> structure and invoke it from the three places that currently match on
> 'flowi4_tos'.
>
> Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
> since the latter is not uAPI and we should be able to remove it at some
> point.
>
> No regressions in FIB tests:
>
> # ./fib_tests.sh
> [...]
> Tests passed: 218
> Tests failed: 0
>
> And FIB rule tests:
>
> # ./fib_rule_tests.sh
> [...]
> Tests passed: 116
> Tests failed: 0
>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> include/net/ip_fib.h | 7 +++++++
> net/ipv4/fib_rules.c | 2 +-
> net/ipv4/fib_semantics.c | 3 +--
> net/ipv4/fib_trie.c | 3 +--
> 4 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 72af2f223e59..967e4dc555fa 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,8 @@
> #include <linux/percpu.h>
> #include <linux/notifier.h>
> #include <linux/refcount.h>
> +#include <linux/ip.h>
Why including linux/ip.h? That doesn't seem necessary for this change.
Appart from that,
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Thanks a lot!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
2024-07-26 13:15 ` Guillaume Nault
@ 2024-07-26 13:32 ` Florian Westphal
2024-07-26 15:40 ` Guillaume Nault
2024-07-28 2:30 ` Florian Westphal
2 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2024-07-26 13:32 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
gnault, pablo, kadlec, fw
Ido Schimmel <idosch@nvidia.com> wrote:
> @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> if (priv->flags & NFTA_FIB_F_MARK)
> fl4.flowi4_mark = pkt->skb->mark;
>
> - fl4.flowi4_tos = iph->tos & DSCP_BITS;
> + fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
If this is supposed to get centralised, wouldn't it
make more sense to not mask it, or will that happen later?
I thought plan was to ditch RT_MASK...
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-26 13:32 ` Florian Westphal
@ 2024-07-26 15:40 ` Guillaume Nault
0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2024-07-26 15:40 UTC (permalink / raw)
To: Florian Westphal
Cc: Ido Schimmel, netdev, netfilter-devel, davem, kuba, pabeni,
edumazet, dsahern, pablo, kadlec
On Fri, Jul 26, 2024 at 03:32:48PM +0200, Florian Westphal wrote:
> Ido Schimmel <idosch@nvidia.com> wrote:
> > @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> > if (priv->flags & NFTA_FIB_F_MARK)
> > fl4.flowi4_mark = pkt->skb->mark;
> >
> > - fl4.flowi4_tos = iph->tos & DSCP_BITS;
> > + fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
>
> If this is supposed to get centralised, wouldn't it
> make more sense to not mask it, or will that happen later?
I think Ido prefers to have this behaviour introduced in a dedicated
patch, rather than as a side effect of the centralisation done in
patch 3/3.
Once patch 3/3 is applied, the next step would be to remove all those
redundant masks (including this new nft_fib4_eval() one), so that
fib4_rule_match(), fib_table_lookup() and fib_select_default() could
see the full DSCP.
This will allow the final step of allowing IPv4 routes and fib-rules to
be configured for matching either the DSCP bits or only the old TOS ones.
> I thought plan was to ditch RT_MASK...
That was my preference too. But Ido's affraid of potential users who
may depend on fib-rules with tos=0x1c matching packets with
dsfield=0xfc. Centralising the mask would allow us to configure this
behaviour upon route or fib-rule creation.
Here's the original thread that lead to this RFC, if anyone wants more
details on the current plan:
https://lore.kernel.org/netdev/ZnwCWejSuOTqriJc@shredder.mtl.com/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
2024-07-26 13:15 ` Guillaume Nault
2024-07-26 13:32 ` Florian Westphal
@ 2024-07-28 2:30 ` Florian Westphal
2024-07-28 10:51 ` Ido Schimmel
2 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2024-07-28 2:30 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
gnault, pablo, kadlec, fw
Ido Schimmel <idosch@nvidia.com> wrote:
> void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
> const struct nft_pktinfo *pkt)
> {
> @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> if (priv->flags & NFTA_FIB_F_MARK)
> fl4.flowi4_mark = pkt->skb->mark;
>
> - fl4.flowi4_tos = iph->tos & DSCP_BITS;
> + fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
I was confused because cover letter talks about allowing both tos or dscp depending on
new nlattr for ipv4, but then this patch makes that impossible because dscp bits get masked.
patch 3 says:
----
A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.
----
But nft_fib_ipv4.c already does that.
So I would suggest to just drop this patch and then get rid of '&
DSCP_BITS' once everything is in place.
But feel free to handle this as you prefer.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup
2024-07-28 2:30 ` Florian Westphal
@ 2024-07-28 10:51 ` Ido Schimmel
0 siblings, 0 replies; 13+ messages in thread
From: Ido Schimmel @ 2024-07-28 10:51 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
gnault, pablo, kadlec
On Sun, Jul 28, 2024 at 04:30:40AM +0200, Florian Westphal wrote:
> Ido Schimmel <idosch@nvidia.com> wrote:
> > void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
> > const struct nft_pktinfo *pkt)
> > {
> > @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> > if (priv->flags & NFTA_FIB_F_MARK)
> > fl4.flowi4_mark = pkt->skb->mark;
> >
> > - fl4.flowi4_tos = iph->tos & DSCP_BITS;
> > + fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
>
> I was confused because cover letter talks about allowing both tos or dscp depending on
> new nlattr for ipv4, but then this patch makes that impossible because dscp bits get masked.
>
> patch 3 says:
> ----
> A prerequisite for allowing FIB rules to match on DSCP is to adjust all
> the call sites to initialize the high order DSCP bits and remove their
> masking along the path to the core where the field is matched on.
> ----
>
> But nft_fib_ipv4.c already does that.
>
> So I would suggest to just drop this patch and then get rid of '&
> DSCP_BITS' once everything is in place.
>
> But feel free to handle this as you prefer.
My preference is to first align all the users to mask the upper DSCP
bits (patches #1-#2), then move the masking to the core (patch #3) and
finally remove the masking from the various call sites (future
patchsets). Will make it clearer in the cover letter.
Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching
2024-07-26 13:17 ` Guillaume Nault
@ 2024-07-28 11:34 ` Ido Schimmel
2024-07-29 16:05 ` Guillaume Nault
0 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2024-07-28 11:34 UTC (permalink / raw)
To: Guillaume Nault
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
pablo, kadlec, fw
On Fri, Jul 26, 2024 at 03:17:15PM +0200, Guillaume Nault wrote:
> On Thu, Jul 25, 2024 at 04:17:29PM +0300, Ido Schimmel wrote:
> > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> > index 72af2f223e59..967e4dc555fa 100644
> > --- a/include/net/ip_fib.h
> > +++ b/include/net/ip_fib.h
> > @@ -22,6 +22,8 @@
> > #include <linux/percpu.h>
> > #include <linux/notifier.h>
> > #include <linux/refcount.h>
> > +#include <linux/ip.h>
>
> Why including linux/ip.h? That doesn't seem necessary for this change.
RT_TOS() is defined in linux/in_route.h as ((tos)&IPTOS_TOS_MASK), but
IPTOS_TOS_MASK is defined in liunx/ip.h which is not included by
linux/in_route.h for some reason.
This also works:
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 967e4dc555fa..269ec10f63e4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,7 +22,6 @@
#include <linux/percpu.h>
#include <linux/notifier.h>
#include <linux/refcount.h>
-#include <linux/ip.h>
#include <linux/in_route.h>
struct fib_config {
diff --git a/include/uapi/linux/in_route.h b/include/uapi/linux/in_route.h
index 0cc2c23b47f8..10bdd7e7107f 100644
--- a/include/uapi/linux/in_route.h
+++ b/include/uapi/linux/in_route.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_IN_ROUTE_H
#define _LINUX_IN_ROUTE_H
+#include <linux/ip.h>
+
/* IPv4 routing cache flags */
#define RTCF_DEAD RTNH_F_DEAD
>
> Appart from that,
>
> Reviewed-by: Guillaume Nault <gnault@redhat.com>
Thanks!
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching
2024-07-28 11:34 ` Ido Schimmel
@ 2024-07-29 16:05 ` Guillaume Nault
0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2024-07-29 16:05 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, netfilter-devel, davem, kuba, pabeni, edumazet, dsahern,
pablo, kadlec, fw
On Sun, Jul 28, 2024 at 02:34:06PM +0300, Ido Schimmel wrote:
> On Fri, Jul 26, 2024 at 03:17:15PM +0200, Guillaume Nault wrote:
> > On Thu, Jul 25, 2024 at 04:17:29PM +0300, Ido Schimmel wrote:
> > > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> > > index 72af2f223e59..967e4dc555fa 100644
> > > --- a/include/net/ip_fib.h
> > > +++ b/include/net/ip_fib.h
> > > @@ -22,6 +22,8 @@
> > > #include <linux/percpu.h>
> > > #include <linux/notifier.h>
> > > #include <linux/refcount.h>
> > > +#include <linux/ip.h>
> >
> > Why including linux/ip.h? That doesn't seem necessary for this change.
>
> RT_TOS() is defined in linux/in_route.h as ((tos)&IPTOS_TOS_MASK), but
> IPTOS_TOS_MASK is defined in liunx/ip.h which is not included by
> linux/in_route.h for some reason.
>
> This also works:
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 967e4dc555fa..269ec10f63e4 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,7 +22,6 @@
> #include <linux/percpu.h>
> #include <linux/notifier.h>
> #include <linux/refcount.h>
> -#include <linux/ip.h>
> #include <linux/in_route.h>
>
> struct fib_config {
> diff --git a/include/uapi/linux/in_route.h b/include/uapi/linux/in_route.h
> index 0cc2c23b47f8..10bdd7e7107f 100644
> --- a/include/uapi/linux/in_route.h
> +++ b/include/uapi/linux/in_route.h
> @@ -2,6 +2,8 @@
> #ifndef _LINUX_IN_ROUTE_H
> #define _LINUX_IN_ROUTE_H
>
> +#include <linux/ip.h>
> +
Thanks, I prefer this solution, which I find cleaner.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-07-29 16:05 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-25 13:17 [RFC PATCH net-next 0/3] Preparations for FIB rule DSCP selector Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 1/3] ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Ido Schimmel
2024-07-26 13:12 ` Guillaume Nault
2024-07-25 13:17 ` [RFC PATCH net-next 2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup Ido Schimmel
2024-07-26 13:15 ` Guillaume Nault
2024-07-26 13:32 ` Florian Westphal
2024-07-26 15:40 ` Guillaume Nault
2024-07-28 2:30 ` Florian Westphal
2024-07-28 10:51 ` Ido Schimmel
2024-07-25 13:17 ` [RFC PATCH net-next 3/3] ipv4: Centralize TOS matching Ido Schimmel
2024-07-26 13:17 ` Guillaume Nault
2024-07-28 11:34 ` Ido Schimmel
2024-07-29 16:05 ` Guillaume Nault
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).