* [PATCH net-next 1/4] ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
@ 2022-02-04 13:58 ` Guillaume Nault
2022-02-04 13:58 ` [PATCH net-next 2/4] ipv4: Stop taking ECN bits into account in fib4-rules Guillaume Nault
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Guillaume Nault @ 2022-02-04 13:58 UTC (permalink / raw)
To: David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern,
Toke Høiland-Jørgensen, Shuah Khan, linux-kselftest,
Russell Strong, Dave Taht
Define a dscp_t type and its appropriate helpers that ensure ECN bits
are not taken into account when handling DSCP.
Use this new type to replace the tclass field of struct fib6_rule, so
that fib6-rules don't get influenced by ECN bits anymore.
Before this patch, fib6-rules didn't make any distinction between the
DSCP and ECN bits. Therefore, rules specifying a DSCP (tos or dsfield
options in iproute2) stopped working as soon a packets had at least one
of its ECN bits set (as a work around one could create four rules for
each DSCP value to match, one for each possible ECN value).
After this patch fib6-rules only compare the DSCP bits. ECN doesn't
influence the result anymore. Also, fib6-rules now must have the ECN
bits cleared or they will be rejected.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
include/net/inet_dscp.h | 57 +++++++++++++++++++
include/net/ipv6.h | 6 ++
net/ipv6/fib6_rules.c | 19 +++++--
tools/testing/selftests/net/fib_rule_tests.sh | 30 +++++++++-
4 files changed, 105 insertions(+), 7 deletions(-)
create mode 100644 include/net/inet_dscp.h
diff --git a/include/net/inet_dscp.h b/include/net/inet_dscp.h
new file mode 100644
index 000000000000..72f250dffada
--- /dev/null
+++ b/include/net/inet_dscp.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * inet_dscp.h: helpers for handling differentiated services codepoints (DSCP)
+ *
+ * DSCP is defined in RFC 2474:
+ *
+ * 0 1 2 3 4 5 6 7
+ * +---+---+---+---+---+---+---+---+
+ * | DSCP | CU |
+ * +---+---+---+---+---+---+---+---+
+ *
+ * DSCP: differentiated services codepoint
+ * CU: currently unused
+ *
+ * The whole DSCP + CU bits form the DS field.
+ * The DS field is also commonly called TOS or Traffic Class (for IPv6).
+ *
+ * Note: the CU bits are now used for Explicit Congestion Notification
+ * (RFC 3168).
+ */
+
+#ifndef _INET_DSCP_H
+#define _INET_DSCP_H
+
+#include <linux/types.h>
+
+/* Special type for storing DSCP values.
+ *
+ * A dscp_t variable stores a DS field with the CU (ECN) bits cleared.
+ * Using dscp_t allows to strictly separate DSCP and ECN bits, thus avoiding
+ * bugs where ECN bits are erroneously taken into account during FIB lookups
+ * or policy routing.
+ *
+ * Note: to get the real DSCP value contained in a dscp_t variable one would
+ * have to do a bit shift after calling inet_dscp_to_dsfield(). We could have
+ * a helper for that, but there's currently no users.
+ */
+typedef u8 __bitwise dscp_t;
+
+#define INET_DSCP_MASK 0xfc
+
+static inline dscp_t inet_dsfield_to_dscp(__u8 dsfield)
+{
+ return (__force dscp_t)(dsfield & INET_DSCP_MASK);
+}
+
+static inline __u8 inet_dscp_to_dsfield(dscp_t dscp)
+{
+ return (__force __u8)dscp;
+}
+
+static inline bool inet_validate_dscp(__u8 val)
+{
+ return !(val & ~INET_DSCP_MASK);
+}
+
+#endif /* _INET_DSCP_H */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 082f30256f59..3d898eb6df9c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -18,6 +18,7 @@
#include <net/ndisc.h>
#include <net/flow.h>
#include <net/flow_dissector.h>
+#include <net/inet_dscp.h>
#include <net/snmp.h>
#include <net/netns/hash.h>
@@ -975,6 +976,11 @@ static inline u8 ip6_tclass(__be32 flowinfo)
return ntohl(flowinfo & IPV6_TCLASS_MASK) >> IPV6_TCLASS_SHIFT;
}
+static inline dscp_t ip6_dscp(__be32 flowinfo)
+{
+ return inet_dsfield_to_dscp(ip6_tclass(flowinfo));
+}
+
static inline __be32 ip6_make_flowinfo(unsigned int tclass, __be32 flowlabel)
{
return htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel;
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index ec029c86ae06..e2a7b0059669 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -16,6 +16,7 @@
#include <linux/indirect_call_wrapper.h>
#include <net/fib_rules.h>
+#include <net/inet_dscp.h>
#include <net/ipv6.h>
#include <net/addrconf.h>
#include <net/ip6_route.h>
@@ -25,14 +26,14 @@ struct fib6_rule {
struct fib_rule common;
struct rt6key src;
struct rt6key dst;
- u8 tclass;
+ dscp_t dscp;
};
static bool fib6_rule_matchall(const struct fib_rule *rule)
{
struct fib6_rule *r = container_of(rule, struct fib6_rule, common);
- if (r->dst.plen || r->src.plen || r->tclass)
+ if (r->dst.plen || r->src.plen || r->dscp)
return false;
return fib_rule_matchall(rule);
}
@@ -323,7 +324,7 @@ INDIRECT_CALLABLE_SCOPE int fib6_rule_match(struct fib_rule *rule,
return 0;
}
- if (r->tclass && r->tclass != ip6_tclass(fl6->flowlabel))
+ if (r->dscp && r->dscp != ip6_dscp(fl6->flowlabel))
return 0;
if (rule->ip_proto && (rule->ip_proto != fl6->flowi6_proto))
@@ -349,6 +350,13 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct net *net = sock_net(skb->sk);
struct fib6_rule *rule6 = (struct fib6_rule *) rule;
+ if (!inet_validate_dscp(frh->tos)) {
+ NL_SET_ERR_MSG(extack,
+ "Invalid dsfield (tos): ECN bits must be 0");
+ goto errout;
+ }
+ rule6->dscp = inet_dsfield_to_dscp(frh->tos);
+
if (rule->action == FR_ACT_TO_TBL && !rule->l3mdev) {
if (rule->table == RT6_TABLE_UNSPEC) {
NL_SET_ERR_MSG(extack, "Invalid table");
@@ -369,7 +377,6 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
rule6->src.plen = frh->src_len;
rule6->dst.plen = frh->dst_len;
- rule6->tclass = frh->tos;
if (fib_rule_requires_fldissect(rule))
net->ipv6.fib6_rules_require_fldissect++;
@@ -402,7 +409,7 @@ static int fib6_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
if (frh->dst_len && (rule6->dst.plen != frh->dst_len))
return 0;
- if (frh->tos && (rule6->tclass != frh->tos))
+ if (frh->tos && inet_dscp_to_dsfield(rule6->dscp) != frh->tos)
return 0;
if (frh->src_len &&
@@ -423,7 +430,7 @@ static int fib6_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
frh->dst_len = rule6->dst.plen;
frh->src_len = rule6->src.plen;
- frh->tos = rule6->tclass;
+ frh->tos = inet_dscp_to_dsfield(rule6->dscp);
if ((rule6->dst.plen &&
nla_put_in6_addr(skb, FRA_DST, &rule6->dst.addr)) ||
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index 3b0489910422..d7a9ab3be1d3 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -114,10 +114,25 @@ fib_rule6_test_match_n_redirect()
log_test $? 0 "rule6 del by pref: $description"
}
+fib_rule6_test_reject()
+{
+ local match="$1"
+ local rc
+
+ $IP -6 rule add $match table $RTABLE 2>/dev/null
+ rc=$?
+ log_test $rc 2 "rule6 check: $match"
+
+ if [ $rc -eq 0 ]; then
+ $IP -6 rule del $match table $RTABLE
+ fi
+}
+
fib_rule6_test()
{
local getmatch
local match
+ local cnt
# setup the fib rule redirect route
$IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
@@ -128,8 +143,21 @@ fib_rule6_test()
match="from $SRC_IP6 iif $DEV"
fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table"
+ # Reject dsfield (tos) options which have ECN bits set
+ for cnt in $(seq 1 3); do
+ match="dsfield $cnt"
+ fib_rule6_test_reject "$match"
+ done
+
+ # Don't take ECN bits into account when matching on dsfield
match="tos 0x10"
- fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table"
+ for cnt in "0x10" "0x11" "0x12" "0x13"; do
+ # Using option 'tos' instead of 'dsfield' as old iproute2
+ # versions don't support 'dsfield' in ip rule show.
+ getmatch="tos $cnt"
+ fib_rule6_test_match_n_redirect "$match" "$getmatch" \
+ "$getmatch redirect to table"
+ done
match="fwmark 0x64"
getmatch="mark 0x64"
--
2.21.3
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net-next 2/4] ipv4: Stop taking ECN bits into account in fib4-rules
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
2022-02-04 13:58 ` [PATCH net-next 1/4] ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules Guillaume Nault
@ 2022-02-04 13:58 ` Guillaume Nault
2022-02-04 13:58 ` [PATCH net-next 3/4] ipv4: Reject routes specifying ECN bits in rtm_tos Guillaume Nault
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Guillaume Nault @ 2022-02-04 13:58 UTC (permalink / raw)
To: David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern,
Toke Høiland-Jørgensen, Shuah Khan, linux-kselftest,
Russell Strong, Dave Taht
Use the new dscp_t type to replace the tos field of struct fib4_rule,
so that fib4-rules consistently ignore ECN bits.
Before this patch, fib4-rules did accept rules with the high order ECN
bit set (but not the low order one). Also, it relied on its callers
masking the ECN bits of ->flowi4_tos to prevent those from influencing
the result. This was brittle and a few call paths still do the lookup
without masking the ECN bits first.
After this patch fib4-rules only compare the DSCP bits. ECN can't
influence the result anymore, even if the caller didn't mask these
bits. Also, fib4-rules now must have both ECN bits cleared or they will
be rejected.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
net/ipv4/fib_rules.c | 18 ++++++-----
tools/testing/selftests/net/fib_rule_tests.sh | 30 ++++++++++++++++++-
2 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index e0b6c8b6de57..117c48571cf0 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -23,6 +23,7 @@
#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
+#include <net/inet_dscp.h>
#include <net/ip.h>
#include <net/route.h>
#include <net/tcp.h>
@@ -35,7 +36,7 @@ struct fib4_rule {
struct fib_rule common;
u8 dst_len;
u8 src_len;
- u8 tos;
+ dscp_t dscp;
__be32 src;
__be32 srcmask;
__be32 dst;
@@ -49,7 +50,7 @@ static bool fib4_rule_matchall(const struct fib_rule *rule)
{
struct fib4_rule *r = container_of(rule, struct fib4_rule, common);
- if (r->dst_len || r->src_len || r->tos)
+ if (r->dst_len || r->src_len || r->dscp)
return false;
return fib_rule_matchall(rule);
}
@@ -185,7 +186,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
((daddr ^ r->dst) & r->dstmask))
return 0;
- if (r->tos && (r->tos != fl4->flowi4_tos))
+ if (r->dscp && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
return 0;
if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
@@ -225,10 +226,12 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
int err = -EINVAL;
struct fib4_rule *rule4 = (struct fib4_rule *) rule;
- if (frh->tos & ~IPTOS_TOS_MASK) {
- NL_SET_ERR_MSG(extack, "Invalid tos");
+ if (!inet_validate_dscp(frh->tos)) {
+ NL_SET_ERR_MSG(extack,
+ "Invalid dsfield (tos): ECN bits must be 0");
goto errout;
}
+ rule4->dscp = inet_dsfield_to_dscp(frh->tos);
/* split local/main if they are not already split */
err = fib_unmerge(net);
@@ -270,7 +273,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
rule4->srcmask = inet_make_mask(rule4->src_len);
rule4->dst_len = frh->dst_len;
rule4->dstmask = inet_make_mask(rule4->dst_len);
- rule4->tos = frh->tos;
net->ipv4.fib_has_custom_rules = true;
@@ -313,7 +315,7 @@ static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
if (frh->dst_len && (rule4->dst_len != frh->dst_len))
return 0;
- if (frh->tos && (rule4->tos != frh->tos))
+ if (frh->tos && inet_dscp_to_dsfield(rule4->dscp) != frh->tos)
return 0;
#ifdef CONFIG_IP_ROUTE_CLASSID
@@ -337,7 +339,7 @@ static int fib4_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
frh->dst_len = rule4->dst_len;
frh->src_len = rule4->src_len;
- frh->tos = rule4->tos;
+ frh->tos = inet_dscp_to_dsfield(rule4->dscp);
if ((rule4->dst_len &&
nla_put_in_addr(skb, FRA_DST, rule4->dst)) ||
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index d7a9ab3be1d3..4f70baad867d 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -215,10 +215,25 @@ fib_rule4_test_match_n_redirect()
log_test $? 0 "rule4 del by pref: $description"
}
+fib_rule4_test_reject()
+{
+ local match="$1"
+ local rc
+
+ $IP rule add $match table $RTABLE 2>/dev/null
+ rc=$?
+ log_test $rc 2 "rule4 check: $match"
+
+ if [ $rc -eq 0 ]; then
+ $IP rule del $match table $RTABLE
+ fi
+}
+
fib_rule4_test()
{
local getmatch
local match
+ local cnt
# setup the fib rule redirect route
$IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink
@@ -234,8 +249,21 @@ fib_rule4_test()
fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table"
ip netns exec testns sysctl -qw net.ipv4.ip_forward=0
+ # Reject dsfield (tos) options which have ECN bits set
+ for cnt in $(seq 1 3); do
+ match="dsfield $cnt"
+ fib_rule4_test_reject "$match"
+ done
+
+ # Don't take ECN bits into account when matching on dsfield
match="tos 0x10"
- fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table"
+ for cnt in "0x10" "0x11" "0x12" "0x13"; do
+ # Using option 'tos' instead of 'dsfield' as old iproute2
+ # versions don't support 'dsfield' in ip rule show.
+ getmatch="tos $cnt"
+ fib_rule4_test_match_n_redirect "$match" "$getmatch" \
+ "$getmatch redirect to table"
+ done
match="fwmark 0x64"
getmatch="mark 0x64"
--
2.21.3
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net-next 3/4] ipv4: Reject routes specifying ECN bits in rtm_tos
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
2022-02-04 13:58 ` [PATCH net-next 1/4] ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules Guillaume Nault
2022-02-04 13:58 ` [PATCH net-next 2/4] ipv4: Stop taking ECN bits into account in fib4-rules Guillaume Nault
@ 2022-02-04 13:58 ` Guillaume Nault
2022-02-04 18:19 ` Shuah Khan
2022-02-07 6:08 ` [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type David Ahern
` (2 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Guillaume Nault @ 2022-02-04 13:58 UTC (permalink / raw)
To: David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern,
Toke Høiland-Jørgensen, Shuah Khan, linux-kselftest,
Russell Strong, Dave Taht
Use the new dscp_t type to replace the fc_tos field of fib_config, to
ensure IPv4 routes aren't influenced by ECN bits when configured with
non-zero rtm_tos.
Before this patch, IPv4 routes specifying an rtm_tos with some of the
ECN bits set were accepted. However they wouldn't work (never match) as
IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB
lookup (although a few buggy code paths don't).
After this patch, IPv4 routes specifying an rtm_tos with any ECN bit
set is rejected.
Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted,
but treated as if it were 0.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
Shuah, FYI, this is the patch I was refering to in our discussion about
testing invalid tos values:
https://lore.kernel.org/netdev/20220202232555.GC15826@pc-4.home/
include/net/ip_fib.h | 3 +-
net/ipv4/fib_frontend.c | 11 +++-
net/ipv4/fib_trie.c | 7 ++-
tools/testing/selftests/net/fib_tests.sh | 76 ++++++++++++++++++++++++
4 files changed, 93 insertions(+), 4 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index c4297704bbcb..6a82bcb8813b 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -17,6 +17,7 @@
#include <linux/rcupdate.h>
#include <net/fib_notifier.h>
#include <net/fib_rules.h>
+#include <net/inet_dscp.h>
#include <net/inetpeer.h>
#include <linux/percpu.h>
#include <linux/notifier.h>
@@ -24,7 +25,7 @@
struct fib_config {
u8 fc_dst_len;
- u8 fc_tos;
+ dscp_t fc_dscp;
u8 fc_protocol;
u8 fc_scope;
u8 fc_type;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d61ddd8a0ec..c60e1d1ed2b0 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -32,6 +32,7 @@
#include <linux/list.h>
#include <linux/slab.h>
+#include <net/inet_dscp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
@@ -735,8 +736,16 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
memset(cfg, 0, sizeof(*cfg));
rtm = nlmsg_data(nlh);
+
+ if (!inet_validate_dscp(rtm->rtm_tos)) {
+ NL_SET_ERR_MSG(extack,
+ "Invalid dsfield (tos): ECN bits must be 0");
+ err = -EINVAL;
+ goto errout;
+ }
+ cfg->fc_dscp = inet_dsfield_to_dscp(rtm->rtm_tos);
+
cfg->fc_dst_len = rtm->rtm_dst_len;
- cfg->fc_tos = rtm->rtm_tos;
cfg->fc_table = rtm->rtm_table;
cfg->fc_protocol = rtm->rtm_protocol;
cfg->fc_scope = rtm->rtm_scope;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 8060524f4256..d937eeebb812 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -61,6 +61,7 @@
#include <linux/vmalloc.h>
#include <linux/notifier.h>
#include <net/net_namespace.h>
+#include <net/inet_dscp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
@@ -1210,9 +1211,9 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
struct fib_info *fi;
u8 plen = cfg->fc_dst_len;
u8 slen = KEYLENGTH - plen;
- u8 tos = cfg->fc_tos;
u32 key;
int err;
+ u8 tos;
key = ntohl(cfg->fc_dst);
@@ -1227,6 +1228,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
goto err;
}
+ tos = inet_dscp_to_dsfield(cfg->fc_dscp);
l = fib_find_node(t, &tp, key);
fa = l ? fib_find_alias(&l->leaf, slen, tos, fi->fib_priority,
tb->tb_id, false) : NULL;
@@ -1703,8 +1705,8 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
struct key_vector *l, *tp;
u8 plen = cfg->fc_dst_len;
u8 slen = KEYLENGTH - plen;
- u8 tos = cfg->fc_tos;
u32 key;
+ u8 tos;
key = ntohl(cfg->fc_dst);
@@ -1715,6 +1717,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
if (!l)
return -ESRCH;
+ tos = inet_dscp_to_dsfield(cfg->fc_dscp);
fa = fib_find_alias(&l->leaf, slen, tos, 0, tb->tb_id, false);
if (!fa)
return -ESRCH;
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 996af1ae3d3d..bb73235976b3 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -1447,6 +1447,81 @@ ipv4_local_rt_cache()
log_test $? 0 "Cached route removed from VRF port device"
}
+ipv4_rt_dsfield()
+{
+ echo
+ echo "IPv4 route with dsfield tests"
+
+ run_cmd "$IP route flush 172.16.102.0/24"
+
+ # New routes should reject dsfield options that interfere with ECN
+ run_cmd "$IP route add 172.16.102.0/24 dsfield 0x01 via 172.16.101.2"
+ log_test $? 2 "Reject route with dsfield 0x01"
+
+ run_cmd "$IP route add 172.16.102.0/24 dsfield 0x02 via 172.16.101.2"
+ log_test $? 2 "Reject route with dsfield 0x02"
+
+ run_cmd "$IP route add 172.16.102.0/24 dsfield 0x03 via 172.16.101.2"
+ log_test $? 2 "Reject route with dsfield 0x03"
+
+ # A generic route that doesn't take DSCP into account
+ run_cmd "$IP route add 172.16.102.0/24 via 172.16.101.2"
+
+ # A more specific route for DSCP 0x10
+ run_cmd "$IP route add 172.16.102.0/24 dsfield 0x10 via 172.16.103.2"
+
+ # DSCP 0x10 should match the specific route, no matter the ECN bits
+ $IP route get fibmatch 172.16.102.1 dsfield 0x10 | \
+ grep -q "via 172.16.103.2"
+ log_test $? 0 "IPv4 route with DSCP and ECN:Not-ECT"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x11 | \
+ grep -q "via 172.16.103.2"
+ log_test $? 0 "IPv4 route with DSCP and ECN:ECT(1)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x12 | \
+ grep -q "via 172.16.103.2"
+ log_test $? 0 "IPv4 route with DSCP and ECN:ECT(0)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x13 | \
+ grep -q "via 172.16.103.2"
+ log_test $? 0 "IPv4 route with DSCP and ECN:CE"
+
+ # Unknown DSCP should match the generic route, no matter the ECN bits
+ $IP route get fibmatch 172.16.102.1 dsfield 0x14 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with unknown DSCP and ECN:Not-ECT"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x15 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(1)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x16 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(0)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x17 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with unknown DSCP and ECN:CE"
+
+ # Null DSCP should match the generic route, no matter the ECN bits
+ $IP route get fibmatch 172.16.102.1 dsfield 0x00 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with no DSCP and ECN:Not-ECT"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x01 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(1)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x02 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(0)"
+
+ $IP route get fibmatch 172.16.102.1 dsfield 0x03 | \
+ grep -q "via 172.16.101.2"
+ log_test $? 0 "IPv4 route with no DSCP and ECN:CE"
+}
+
ipv4_route_test()
{
route_setup
@@ -1454,6 +1529,7 @@ ipv4_route_test()
ipv4_rt_add
ipv4_rt_replace
ipv4_local_rt_cache
+ ipv4_rt_dsfield
route_cleanup
}
--
2.21.3
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net-next 3/4] ipv4: Reject routes specifying ECN bits in rtm_tos
2022-02-04 13:58 ` [PATCH net-next 3/4] ipv4: Reject routes specifying ECN bits in rtm_tos Guillaume Nault
@ 2022-02-04 18:19 ` Shuah Khan
0 siblings, 0 replies; 8+ messages in thread
From: Shuah Khan @ 2022-02-04 18:19 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern,
Toke Høiland-Jørgensen, Shuah Khan, linux-kselftest,
Russell Strong, Dave Taht, Shuah Khan
On 2/4/22 6:58 AM, Guillaume Nault wrote:
> Use the new dscp_t type to replace the fc_tos field of fib_config, to
> ensure IPv4 routes aren't influenced by ECN bits when configured with
> non-zero rtm_tos.
>
> Before this patch, IPv4 routes specifying an rtm_tos with some of the
> ECN bits set were accepted. However they wouldn't work (never match) as
> IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB
> lookup (although a few buggy code paths don't).
>
> After this patch, IPv4 routes specifying an rtm_tos with any ECN bit
> set is rejected.
>
> Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted,
> but treated as if it were 0.
>
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> Shuah, FYI, this is the patch I was refering to in our discussion about
> testing invalid tos values:
> https://lore.kernel.org/netdev/20220202232555.GC15826@pc-4.home/
>
This give me context. Thank you.
thanks,
-- Shuah
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
` (2 preceding siblings ...)
2022-02-04 13:58 ` [PATCH net-next 3/4] ipv4: Reject routes specifying ECN bits in rtm_tos Guillaume Nault
@ 2022-02-07 6:08 ` David Ahern
2022-02-07 19:03 ` Toke Høiland-Jørgensen
2022-02-08 5:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2022-02-07 6:08 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern,
Toke Høiland-Jørgensen, Shuah Khan, linux-kselftest,
Russell Strong, Dave Taht
On 2/4/22 5:58 AM, Guillaume Nault wrote:
> The networking stack currently doesn't clearly distinguish between DSCP
> and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
> structure fields), and each part of the stack handles them in their own
> way, using different macros. This has created several bugs in the past
> and some uncommon code paths are still unfixed.
>
> Such bugs generally manifest by selecting invalid routes because of ECN
> bits interfering with FIB routes and rules lookups (more details in the
> LPC 2021 talk[1] and in the RFC of this series[2]).
>
> This patch series aims at preventing the introduction of such bugs (and
> detecting existing ones), by introducing a dscp_t type, representing
> "sanitised" DSCP values (that is, with no ECN information), as opposed
> to plain u8 values that contain both DSCP and ECN information. dscp_t
> makes it clear for the reader what we're working on, and Sparse can
> flag invalid interactions between dscp_t and plain u8.
>
> This series converts only a few variables and structures:
>
> * Patch 1 converts the tclass field of struct fib6_rule. It
> effectively forbids the use of ECN bits in the tos/dsfield option
> of ip -6 rule. Rules now match packets solely based on their DSCP
> bits, so ECN doesn't influence the result any more. This contrasts
> with the previous behaviour where all 8 bits of the Traffic Class
> field were used. It is believed that this change is acceptable as
> matching ECN bits wasn't usable for IPv4, so only IPv6-only
> deployments could be depending on it. Also the previous behaviour
> made DSCP-based ip6-rules fail for packets with both a DSCP and an
> ECN mark, which is another reason why any such deploy is unlikely.
>
> * Patch 2 converts the tos field of struct fib4_rule. This one too
> effectively forbids defining ECN bits, this time in ip -4 rule.
> Before that, setting ECN bit 1 was accepted, while ECN bit 0 was
> rejected. But even when accepted, the rule would never match, as
> the packets would have their ECN bits cleared before doing the
> rule lookup.
>
> * Patch 3 converts the fc_tos field of struct fib_config. This is
> equivalent to patch 2, but for IPv4 routes. Routes using a
> tos/dsfield option with any ECN bit set is now rejected. Before
> this patch, they were accepted but, as with ip4 rules, these routes
> couldn't match any packet, since their ECN bits are cleared before
> the lookup.
>
> * Patch 4 converts the fa_tos field of struct fib_alias. This one is
> pure internal u8 to dscp_t conversion. While patches 1-3 had user
> facing consequences, this patch shouldn't have any side effect and
> is there to give an overview of what future conversion patches will
> look like. Conversions are quite mechanical, but imply some code
> churn, which is the price for the extra clarity a possibility of
> type checking.
>
> To summarise, all the behaviour changes required for the dscp_t type
> approach to work should be contained in patches 1-3. These changes are
> edge cases of ip-route and ip-rule that don't currently work properly.
> So they should be safe. Also, a kernel selftest is added for each of
> them.
>
seems like the right directions to me.
Acked-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
` (3 preceding siblings ...)
2022-02-07 6:08 ` [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type David Ahern
@ 2022-02-07 19:03 ` Toke Høiland-Jørgensen
2022-02-08 5:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-02-07 19:03 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski
Cc: netdev, Hideaki YOSHIFUJI, David Ahern, Shuah Khan,
linux-kselftest, Russell Strong, Dave Taht
Guillaume Nault <gnault@redhat.com> writes:
> The networking stack currently doesn't clearly distinguish between DSCP
> and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
> structure fields), and each part of the stack handles them in their own
> way, using different macros. This has created several bugs in the past
> and some uncommon code paths are still unfixed.
>
> Such bugs generally manifest by selecting invalid routes because of ECN
> bits interfering with FIB routes and rules lookups (more details in the
> LPC 2021 talk[1] and in the RFC of this series[2]).
>
> This patch series aims at preventing the introduction of such bugs (and
> detecting existing ones), by introducing a dscp_t type, representing
> "sanitised" DSCP values (that is, with no ECN information), as opposed
> to plain u8 values that contain both DSCP and ECN information. dscp_t
> makes it clear for the reader what we're working on, and Sparse can
> flag invalid interactions between dscp_t and plain u8.
>
> This series converts only a few variables and structures:
>
> * Patch 1 converts the tclass field of struct fib6_rule. It
> effectively forbids the use of ECN bits in the tos/dsfield option
> of ip -6 rule. Rules now match packets solely based on their DSCP
> bits, so ECN doesn't influence the result any more. This contrasts
> with the previous behaviour where all 8 bits of the Traffic Class
> field were used. It is believed that this change is acceptable as
> matching ECN bits wasn't usable for IPv4, so only IPv6-only
> deployments could be depending on it. Also the previous behaviour
> made DSCP-based ip6-rules fail for packets with both a DSCP and an
> ECN mark, which is another reason why any such deploy is unlikely.
>
> * Patch 2 converts the tos field of struct fib4_rule. This one too
> effectively forbids defining ECN bits, this time in ip -4 rule.
> Before that, setting ECN bit 1 was accepted, while ECN bit 0 was
> rejected. But even when accepted, the rule would never match, as
> the packets would have their ECN bits cleared before doing the
> rule lookup.
>
> * Patch 3 converts the fc_tos field of struct fib_config. This is
> equivalent to patch 2, but for IPv4 routes. Routes using a
> tos/dsfield option with any ECN bit set is now rejected. Before
> this patch, they were accepted but, as with ip4 rules, these routes
> couldn't match any packet, since their ECN bits are cleared before
> the lookup.
>
> * Patch 4 converts the fa_tos field of struct fib_alias. This one is
> pure internal u8 to dscp_t conversion. While patches 1-3 had user
> facing consequences, this patch shouldn't have any side effect and
> is there to give an overview of what future conversion patches will
> look like. Conversions are quite mechanical, but imply some code
> churn, which is the price for the extra clarity a possibility of
> type checking.
>
> To summarise, all the behaviour changes required for the dscp_t type
> approach to work should be contained in patches 1-3. These changes are
> edge cases of ip-route and ip-rule that don't currently work properly.
> So they should be safe. Also, a kernel selftest is added for each of
> them.
>
> Finally, this work also paves the way for allowing the usage of the 3
> high order DSCP bits in IPv4 (a few call paths already handle them, but
> in general the stack clears them before IPv4 rule and route lookups).
LGTM; thanks again for doing this!
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type
2022-02-04 13:58 [PATCH net-next 0/4] inet: Separate DSCP from ECN bits using new dscp_t type Guillaume Nault
` (4 preceding siblings ...)
2022-02-07 19:03 ` Toke Høiland-Jørgensen
@ 2022-02-08 5:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-08 5:10 UTC (permalink / raw)
To: Guillaume Nault
Cc: davem, kuba, netdev, yoshfuji, dsahern, toke, shuah,
linux-kselftest, russell, dave.taht
Hello:
This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 4 Feb 2022 14:58:09 +0100 you wrote:
> The networking stack currently doesn't clearly distinguish between DSCP
> and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
> structure fields), and each part of the stack handles them in their own
> way, using different macros. This has created several bugs in the past
> and some uncommon code paths are still unfixed.
>
> Such bugs generally manifest by selecting invalid routes because of ECN
> bits interfering with FIB routes and rules lookups (more details in the
> LPC 2021 talk[1] and in the RFC of this series[2]).
>
> [...]
Here is the summary with links:
- [net-next,1/4] ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules
(no matching commit)
- [net-next,2/4] ipv4: Stop taking ECN bits into account in fib4-rules
https://git.kernel.org/netdev/net-next/c/563f8e97e054
- [net-next,3/4] ipv4: Reject routes specifying ECN bits in rtm_tos
https://git.kernel.org/netdev/net-next/c/f55fbb6afb8d
- [net-next,4/4] ipv4: Use dscp_t in struct fib_alias
https://git.kernel.org/netdev/net-next/c/32ccf1107980
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread