netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc
@ 2023-03-04  0:12 Xin Long
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

Currently pskb_trim_rcsum() is always done on the RX path. However, IPv6
jumbo packets hide the real packet len in the Hop-by-hop option header,
which should be parsed before doing the trim.

In ip6_rcv_core() it calls ipv6_parse_hopopts() to handle the Hop-by-hop
option header then do pskb_trim_rcsum(). The similar process should also
be done properly before pskb_trim_rcsum() on the RX path of bridge and
openvswitch and tc.

This patchset improves the function handling the Hop-by-hop option header
in bridge, and moves this function into netfilter utils, and then uses it
in nf_conntrack_ovs for openvswitch and tc conntrack.

Note that this patch is especially needed after the IPv6 BIG TCP was
supported in kernel, which is using IPv6 Jumbo packets, and the last
patch adds a big tcp selftest, which also covers it.

Xin Long (6):
  netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  netfilter: bridge: check len before accessing more nh data
  netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  netfilter: move br_nf_check_hbh_len to utils
  netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  selftests: add a selftest for big tcp

 include/linux/netfilter_ipv6.h         |   2 +
 net/bridge/br_netfilter_ipv6.c         |  79 ++---------
 net/netfilter/nf_conntrack_ovs.c       |  11 +-
 net/netfilter/utils.c                  |  54 ++++++++
 tools/testing/selftests/net/Makefile   |   1 +
 tools/testing/selftests/net/big_tcp.sh | 180 +++++++++++++++++++++++++
 6 files changed, 256 insertions(+), 71 deletions(-)
 create mode 100755 tools/testing/selftests/net/big_tcp.sh

-- 
2.39.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-06 15:52   ` Simon Horman
                     ` (2 more replies)
  2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

When checking Hop-by-hop option header, if the option data is in
nonlinear area, it should do pskb_may_pull instead of discarding
the skb as a bad IPv6 packet.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/bridge/br_netfilter_ipv6.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 6b07f30675bb..5cd3e4c35123 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -45,14 +45,18 @@
  */
 static int br_nf_check_hbh_len(struct sk_buff *skb)
 {
-	unsigned char *raw = (u8 *)(ipv6_hdr(skb) + 1);
+	int len, off = sizeof(struct ipv6hdr);
+	unsigned char *nh;
 	u32 pkt_len;
-	const unsigned char *nh = skb_network_header(skb);
-	int off = raw - nh;
-	int len = (raw[1] + 1) << 3;
 
-	if ((raw + len) - skb->data > skb_headlen(skb))
+	if (!pskb_may_pull(skb, off + 8))
 		goto bad;
+	nh = (u8 *)(ipv6_hdr(skb) + 1);
+	len = (nh[1] + 1) << 3;
+
+	if (!pskb_may_pull(skb, off + len))
+		goto bad;
+	nh = skb_network_header(skb);
 
 	off += 2;
 	len -= 2;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-06 15:59   ` Simon Horman
                     ` (2 more replies)
  2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
in case of overflows.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/bridge/br_netfilter_ipv6.c | 47 ++++++++++++++++------------------
 1 file changed, 22 insertions(+), 25 deletions(-)

diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 5cd3e4c35123..50f564c33551 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -50,54 +50,51 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
 	u32 pkt_len;
 
 	if (!pskb_may_pull(skb, off + 8))
-		goto bad;
+		return -1;
 	nh = (u8 *)(ipv6_hdr(skb) + 1);
 	len = (nh[1] + 1) << 3;
 
 	if (!pskb_may_pull(skb, off + len))
-		goto bad;
+		return -1;
 	nh = skb_network_header(skb);
 
 	off += 2;
 	len -= 2;
-
 	while (len > 0) {
-		int optlen = nh[off + 1] + 2;
-
-		switch (nh[off]) {
-		case IPV6_TLV_PAD1:
-			optlen = 1;
-			break;
+		int optlen;
 
-		case IPV6_TLV_PADN:
-			break;
+		if (nh[off] == IPV6_TLV_PAD1) {
+			off++;
+			len--;
+			continue;
+		}
+		if (len < 2)
+			return -1;
+		optlen = nh[off + 1] + 2;
+		if (optlen > len)
+			return -1;
 
-		case IPV6_TLV_JUMBO:
+		if (nh[off] == IPV6_TLV_JUMBO) {
 			if (nh[off + 1] != 4 || (off & 3) != 2)
-				goto bad;
+				return -1;
 			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
 			if (pkt_len <= IPV6_MAXPLEN ||
 			    ipv6_hdr(skb)->payload_len)
-				goto bad;
+				return -1;
 			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
-				goto bad;
+				return -1;
 			if (pskb_trim_rcsum(skb,
 					    pkt_len + sizeof(struct ipv6hdr)))
-				goto bad;
+				return -1;
 			nh = skb_network_header(skb);
-			break;
-		default:
-			if (optlen > len)
-				goto bad;
-			break;
 		}
 		off += optlen;
 		len -= optlen;
 	}
-	if (len == 0)
-		return 0;
-bad:
-	return -1;
+	if (len)
+		return -1;
+
+	return 0;
 }
 
 int br_validate_ipv6(struct net *net, struct sk_buff *skb)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
  2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-06 16:26   ` Simon Horman
                     ` (2 more replies)
  2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

br_nf_check_hbh_len() is a function to check the Hop-by-hop option
header, and shouldn't do pskb_trim_rcsum() there. This patch is to
pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
after calling br_validate_ipv6() instead.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/bridge/br_netfilter_ipv6.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 50f564c33551..07289e4f3213 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -43,11 +43,11 @@
 /* We only check the length. A bridge shouldn't do any hop-by-hop stuff
  * anyway
  */
-static int br_nf_check_hbh_len(struct sk_buff *skb)
+static int br_nf_check_hbh_len(struct sk_buff *skb, u32 *plen)
 {
 	int len, off = sizeof(struct ipv6hdr);
 	unsigned char *nh;
-	u32 pkt_len;
+	u32 pkt_len = 0;
 
 	if (!pskb_may_pull(skb, off + 8))
 		return -1;
@@ -83,10 +83,6 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
 				return -1;
 			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
 				return -1;
-			if (pskb_trim_rcsum(skb,
-					    pkt_len + sizeof(struct ipv6hdr)))
-				return -1;
-			nh = skb_network_header(skb);
 		}
 		off += optlen;
 		len -= optlen;
@@ -94,6 +90,8 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
 	if (len)
 		return -1;
 
+	if (pkt_len)
+		*plen = pkt_len;
 	return 0;
 }
 
@@ -116,22 +114,19 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb)
 		goto inhdr_error;
 
 	pkt_len = ntohs(hdr->payload_len);
+	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb, &pkt_len))
+		goto drop;
 
-	if (pkt_len || hdr->nexthdr != NEXTHDR_HOP) {
-		if (pkt_len + ip6h_len > skb->len) {
-			__IP6_INC_STATS(net, idev,
-					IPSTATS_MIB_INTRUNCATEDPKTS);
-			goto drop;
-		}
-		if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) {
-			__IP6_INC_STATS(net, idev,
-					IPSTATS_MIB_INDISCARDS);
-			goto drop;
-		}
-		hdr = ipv6_hdr(skb);
+	if (pkt_len + ip6h_len > skb->len) {
+		__IP6_INC_STATS(net, idev,
+				IPSTATS_MIB_INTRUNCATEDPKTS);
+		goto drop;
 	}
-	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb))
+	if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) {
+		__IP6_INC_STATS(net, idev,
+				IPSTATS_MIB_INDISCARDS);
 		goto drop;
+	}
 
 	memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
 	/* No IP options in IPv6 header; however it should be
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
                   ` (2 preceding siblings ...)
  2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-06 16:35   ` Simon Horman
                     ` (2 more replies)
  2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
  2023-03-04  0:12 ` [PATCH nf-next 6/6] selftests: add a selftest for big tcp Xin Long
  5 siblings, 3 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
to netfilter utils, so that it can be used by other modules, like
ovs and tc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/netfilter_ipv6.h |  2 ++
 net/bridge/br_netfilter_ipv6.c | 57 +---------------------------------
 net/netfilter/utils.c          | 54 ++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h
index 48314ade1506..7834c0be2831 100644
--- a/include/linux/netfilter_ipv6.h
+++ b/include/linux/netfilter_ipv6.h
@@ -197,6 +197,8 @@ static inline int nf_cookie_v6_check(const struct ipv6hdr *iph,
 __sum16 nf_ip6_checksum(struct sk_buff *skb, unsigned int hook,
 			unsigned int dataoff, u_int8_t protocol);
 
+int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen);
+
 int ipv6_netfilter_init(void);
 void ipv6_netfilter_fini(void);
 
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 07289e4f3213..550039dfc31a 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -40,61 +40,6 @@
 #include <linux/sysctl.h>
 #endif
 
-/* We only check the length. A bridge shouldn't do any hop-by-hop stuff
- * anyway
- */
-static int br_nf_check_hbh_len(struct sk_buff *skb, u32 *plen)
-{
-	int len, off = sizeof(struct ipv6hdr);
-	unsigned char *nh;
-	u32 pkt_len = 0;
-
-	if (!pskb_may_pull(skb, off + 8))
-		return -1;
-	nh = (u8 *)(ipv6_hdr(skb) + 1);
-	len = (nh[1] + 1) << 3;
-
-	if (!pskb_may_pull(skb, off + len))
-		return -1;
-	nh = skb_network_header(skb);
-
-	off += 2;
-	len -= 2;
-	while (len > 0) {
-		int optlen;
-
-		if (nh[off] == IPV6_TLV_PAD1) {
-			off++;
-			len--;
-			continue;
-		}
-		if (len < 2)
-			return -1;
-		optlen = nh[off + 1] + 2;
-		if (optlen > len)
-			return -1;
-
-		if (nh[off] == IPV6_TLV_JUMBO) {
-			if (nh[off + 1] != 4 || (off & 3) != 2)
-				return -1;
-			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
-			if (pkt_len <= IPV6_MAXPLEN ||
-			    ipv6_hdr(skb)->payload_len)
-				return -1;
-			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
-				return -1;
-		}
-		off += optlen;
-		len -= optlen;
-	}
-	if (len)
-		return -1;
-
-	if (pkt_len)
-		*plen = pkt_len;
-	return 0;
-}
-
 int br_validate_ipv6(struct net *net, struct sk_buff *skb)
 {
 	const struct ipv6hdr *hdr;
@@ -114,7 +59,7 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb)
 		goto inhdr_error;
 
 	pkt_len = ntohs(hdr->payload_len);
-	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb, &pkt_len))
+	if (hdr->nexthdr == NEXTHDR_HOP && nf_ip6_check_hbh_len(skb, &pkt_len))
 		goto drop;
 
 	if (pkt_len + ip6h_len > skb->len) {
diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
index 2182d361e273..04f4bd661774 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -215,3 +215,57 @@ int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry)
 	}
 	return ret;
 }
+
+/* Only get and check the lengths, not do any hop-by-hop stuff. */
+int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen)
+{
+	int len, off = sizeof(struct ipv6hdr);
+	unsigned char *nh;
+	u32 pkt_len = 0;
+
+	if (!pskb_may_pull(skb, off + 8))
+		return -ENOMEM;
+	nh = (u8 *)(ipv6_hdr(skb) + 1);
+	len = (nh[1] + 1) << 3;
+
+	if (!pskb_may_pull(skb, off + len))
+		return -ENOMEM;
+	nh = skb_network_header(skb);
+
+	off += 2;
+	len -= 2;
+	while (len > 0) {
+		int optlen;
+
+		if (nh[off] == IPV6_TLV_PAD1) {
+			off++;
+			len--;
+			continue;
+		}
+		if (len < 2)
+			return -EBADMSG;
+		optlen = nh[off + 1] + 2;
+		if (optlen > len)
+			return -EBADMSG;
+
+		if (nh[off] == IPV6_TLV_JUMBO) {
+			if (nh[off + 1] != 4 || (off & 3) != 2)
+				return -EBADMSG;
+			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
+			if (pkt_len <= IPV6_MAXPLEN ||
+			    ipv6_hdr(skb)->payload_len)
+				return -EBADMSG;
+			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
+				return -EBADMSG;
+		}
+		off += optlen;
+		len -= optlen;
+	}
+	if (len)
+		return -EBADMSG;
+
+	if (pkt_len)
+		*plen = pkt_len;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ip6_check_hbh_len);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
                   ` (3 preceding siblings ...)
  2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-06 16:35   ` Simon Horman
                     ` (2 more replies)
  2023-03-04  0:12 ` [PATCH nf-next 6/6] selftests: add a selftest for big tcp Xin Long
  5 siblings, 3 replies; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
length for the jumbo packets, all data and exthdr will be trimmed
in nf_ct_skb_network_trim().

This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
of the IPv6 packet, similar to br_validate_ipv6().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/netfilter/nf_conntrack_ovs.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
index 52b776bdf526..2016a3b05f86 100644
--- a/net/netfilter/nf_conntrack_ovs.c
+++ b/net/netfilter/nf_conntrack_ovs.c
@@ -6,6 +6,7 @@
 #include <net/netfilter/ipv6/nf_defrag_ipv6.h>
 #include <net/ipv6_frag.h>
 #include <net/ip.h>
+#include <linux/netfilter_ipv6.h>
 
 /* 'skb' should already be pulled to nh_ofs. */
 int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
@@ -114,14 +115,20 @@ EXPORT_SYMBOL_GPL(nf_ct_add_helper);
 int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
 {
 	unsigned int len;
+	int err;
 
 	switch (family) {
 	case NFPROTO_IPV4:
 		len = skb_ip_totlen(skb);
 		break;
 	case NFPROTO_IPV6:
-		len = sizeof(struct ipv6hdr)
-			+ ntohs(ipv6_hdr(skb)->payload_len);
+		len = ntohs(ipv6_hdr(skb)->payload_len);
+		if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) {
+			err = nf_ip6_check_hbh_len(skb, &len);
+			if (err)
+				return err;
+		}
+		len += sizeof(struct ipv6hdr);
 		break;
 	default:
 		len = skb->len;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH nf-next 6/6] selftests: add a selftest for big tcp
  2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
                   ` (4 preceding siblings ...)
  2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
@ 2023-03-04  0:12 ` Xin Long
  2023-03-07 18:31   ` Aaron Conole
  5 siblings, 1 reply; 25+ messages in thread
From: Xin Long @ 2023-03-04  0:12 UTC (permalink / raw)
  To: netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu,
	Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

This test runs on the client-router-server topo, and monitors the traffic
on the RX devices of router and server while sending BIG TCP packets with
netperf from client to server. Meanwhile, it changes 'tso' on the TX devs
and 'gro' on the RX devs. Then it checks if any BIG TCP packets appears
on the RX devs with 'ip/ip6tables -m length ! --length 0:65535' for each
case.

Note that we also add tc action ct in link1 ingress to cover the ipv6
jumbo packets process in nf_ct_skb_network_trim() of nf_conntrack_ovs.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 tools/testing/selftests/net/Makefile   |   1 +
 tools/testing/selftests/net/big_tcp.sh | 180 +++++++++++++++++++++++++
 2 files changed, 181 insertions(+)
 create mode 100755 tools/testing/selftests/net/big_tcp.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 6cd8993454d7..099741290184 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -48,6 +48,7 @@ TEST_PROGS += l2_tos_ttl_inherit.sh
 TEST_PROGS += bind_bhash.sh
 TEST_PROGS += ip_local_port_range.sh
 TEST_PROGS += rps_default_mask.sh
+TEST_PROGS += big_tcp.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
new file mode 100755
index 000000000000..cde9a91c4797
--- /dev/null
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -0,0 +1,180 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Testing For IPv4 and IPv6 BIG TCP.
+# TOPO: CLIENT_NS (link0)<--->(link1) ROUTER_NS (link2)<--->(link3) SERVER_NS
+
+CLIENT_NS=$(mktemp -u client-XXXXXXXX)
+CLIENT_IP4="198.51.100.1"
+CLIENT_IP6="2001:db8:1::1"
+
+SERVER_NS=$(mktemp -u server-XXXXXXXX)
+SERVER_IP4="203.0.113.1"
+SERVER_IP6="2001:db8:2::1"
+
+ROUTER_NS=$(mktemp -u router-XXXXXXXX)
+SERVER_GW4="203.0.113.2"
+CLIENT_GW4="198.51.100.2"
+SERVER_GW6="2001:db8:2::2"
+CLIENT_GW6="2001:db8:1::2"
+
+MAX_SIZE=128000
+CHK_SIZE=65535
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+setup() {
+	ip netns add $CLIENT_NS
+	ip netns add $SERVER_NS
+	ip netns add $ROUTER_NS
+	ip -net $ROUTER_NS link add link1 type veth peer name link0 netns $CLIENT_NS
+	ip -net $ROUTER_NS link add link2 type veth peer name link3 netns $SERVER_NS
+
+	ip -net $CLIENT_NS link set link0 up
+	ip -net $CLIENT_NS link set link0 mtu 1442
+	ip -net $CLIENT_NS addr add $CLIENT_IP4/24 dev link0
+	ip -net $CLIENT_NS addr add $CLIENT_IP6/64 dev link0 nodad
+	ip -net $CLIENT_NS route add $SERVER_IP4 dev link0 via $CLIENT_GW4
+	ip -net $CLIENT_NS route add $SERVER_IP6 dev link0 via $CLIENT_GW6
+	ip -net $CLIENT_NS link set dev link0 \
+		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
+	ip -net $CLIENT_NS link set dev link0 \
+		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
+	ip net exec $CLIENT_NS sysctl -wq net.ipv4.tcp_window_scaling=10
+
+	ip -net $ROUTER_NS link set link1 up
+	ip -net $ROUTER_NS link set link2 up
+	ip -net $ROUTER_NS addr add $CLIENT_GW4/24 dev link1
+	ip -net $ROUTER_NS addr add $CLIENT_GW6/64 dev link1 nodad
+	ip -net $ROUTER_NS addr add $SERVER_GW4/24 dev link2
+	ip -net $ROUTER_NS addr add $SERVER_GW6/64 dev link2 nodad
+	ip -net $ROUTER_NS link set dev link1 \
+		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
+	ip -net $ROUTER_NS link set dev link2 \
+		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
+	ip -net $ROUTER_NS link set dev link1 \
+		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
+	ip -net $ROUTER_NS link set dev link2 \
+		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
+	# test for nf_ct_skb_network_trim in nf_conntrack_ovs used by TC ct action.
+	ip net exec $ROUTER_NS tc qdisc add dev link1 ingress
+	ip net exec $ROUTER_NS tc filter add dev link1 ingress \
+		proto ip flower ip_proto tcp action ct
+	ip net exec $ROUTER_NS tc filter add dev link1 ingress \
+		proto ipv6 flower ip_proto tcp action ct
+	ip net exec $ROUTER_NS sysctl -wq net.ipv4.ip_forward=1
+	ip net exec $ROUTER_NS sysctl -wq net.ipv6.conf.all.forwarding=1
+
+	ip -net $SERVER_NS link set link3 up
+	ip -net $SERVER_NS addr add $SERVER_IP4/24 dev link3
+	ip -net $SERVER_NS addr add $SERVER_IP6/64 dev link3 nodad
+	ip -net $SERVER_NS route add $CLIENT_IP4 dev link3 via $SERVER_GW4
+	ip -net $SERVER_NS route add $CLIENT_IP6 dev link3 via $SERVER_GW6
+	ip -net $SERVER_NS link set dev link3 \
+		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
+	ip -net $SERVER_NS link set dev link3 \
+		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
+	ip net exec $SERVER_NS sysctl -wq net.ipv4.tcp_window_scaling=10
+	ip net exec $SERVER_NS netserver 2>&1 >/dev/null
+}
+
+cleanup() {
+	ip net exec $SERVER_NS pkill netserver
+	ip -net $ROUTER_NS link del link1
+	ip -net $ROUTER_NS link del link2
+	ip netns del "$CLIENT_NS"
+	ip netns del "$SERVER_NS"
+	ip netns del "$ROUTER_NS"
+}
+
+start_counter() {
+	local ipt="iptables"
+	local iface=$1
+	local netns=$2
+
+	[ "$NF" = "6" ] && ipt="ip6tables"
+	ip net exec $netns $ipt -t raw -A PREROUTING -i $iface \
+		-m length ! --length 0:$CHK_SIZE -j ACCEPT
+}
+
+check_counter() {
+	local ipt="iptables"
+	local iface=$1
+	local netns=$2
+
+	[ "$NF" = "6" ] && ipt="ip6tables"
+	test `ip net exec $netns $ipt -t raw -L -v |grep $iface | awk '{print $1}'` != "0"
+}
+
+stop_counter() {
+	local ipt="iptables"
+	local iface=$1
+	local netns=$2
+
+	[ "$NF" = "6" ] && ipt="ip6tables"
+	ip net exec $netns $ipt -t raw -D PREROUTING -i $iface \
+		-m length ! --length 0:$CHK_SIZE -j ACCEPT
+}
+
+do_netperf() {
+	local serip=$SERVER_IP4
+	local netns=$1
+
+	[ "$NF" = "6" ] && serip=$SERVER_IP6
+	ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null
+}
+
+do_test() {
+	local cli_tso=$1
+	local gw_gro=$2
+	local gw_tso=$3
+	local ser_gro=$4
+	local ret="PASS"
+
+	ip net exec $CLIENT_NS ethtool -K link0 tso $cli_tso
+	ip net exec $ROUTER_NS ethtool -K link1 gro $gw_gro
+	ip net exec $ROUTER_NS ethtool -K link2 tso $gw_tso
+	ip net exec $SERVER_NS ethtool -K link3 gro $ser_gro
+
+	start_counter link1 $ROUTER_NS
+	start_counter link3 $SERVER_NS
+	do_netperf $CLIENT_NS
+
+	if check_counter link1 $ROUTER_NS; then
+		check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
+	else
+		ret="FAIL_on_link1"
+	fi
+
+	stop_counter link1 $ROUTER_NS
+	stop_counter link3 $SERVER_NS
+	printf "%-9s %-8s %-8s %-8s: [%s]\n" \
+		$cli_tso $gw_gro $gw_tso $ser_gro $ret
+	test $ret = "PASS"
+}
+
+testup() {
+	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
+	do_test "on"  "on"  "on"  "on"  && \
+	do_test "on"  "off" "on"  "off" && \
+	do_test "off" "on"  "on"  "on"  && \
+	do_test "on"  "on"  "off" "on"  && \
+	do_test "off" "on"  "off" "on"
+}
+
+if ! netperf -V &> /dev/null; then
+	echo "SKIP: Could not run test without netperf tool"
+	exit $ksft_skip
+fi
+
+if ! ip link help 2>&1 | grep gso_ipv4_max_size &> /dev/null; then
+	echo "SKIP: Could not run test without gso/gro_ipv4_max_size supported in ip-link"
+	exit $ksft_skip
+fi
+
+trap cleanup EXIT
+setup && echo "Testing for BIG TCP:" && \
+NF=4 testup && echo "***v4 Tests Done***" && \
+NF=6 testup && echo "***v6 Tests Done***"
+exit $?
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
@ 2023-03-06 15:52   ` Simon Horman
  2023-03-07  9:16   ` Nikolay Aleksandrov
  2023-03-07 18:33   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Simon Horman @ 2023-03-06 15:52 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Fri, Mar 03, 2023 at 07:12:37PM -0500, Xin Long wrote:
> When checking Hop-by-hop option header, if the option data is in
> nonlinear area, it should do pskb_may_pull instead of discarding
> the skb as a bad IPv6 packet.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

> ---
>  net/bridge/br_netfilter_ipv6.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 6b07f30675bb..5cd3e4c35123 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -45,14 +45,18 @@
>   */
>  static int br_nf_check_hbh_len(struct sk_buff *skb)
>  {
> -	unsigned char *raw = (u8 *)(ipv6_hdr(skb) + 1);
> +	int len, off = sizeof(struct ipv6hdr);
> +	unsigned char *nh;
>  	u32 pkt_len;
> -	const unsigned char *nh = skb_network_header(skb);
> -	int off = raw - nh;
> -	int len = (raw[1] + 1) << 3;
>  
> -	if ((raw + len) - skb->data > skb_headlen(skb))
> +	if (!pskb_may_pull(skb, off + 8))
>  		goto bad;
> +	nh = (u8 *)(ipv6_hdr(skb) + 1);

nit: if you need so spin a v2 perhaps it would be worth
     considering reconciling the type of nh (unsigned char *)
     with the type of the cast above (u8 *).

> +	len = (nh[1] + 1) << 3;
> +
> +	if (!pskb_may_pull(skb, off + len))
> +		goto bad;
> +	nh = skb_network_header(skb);
>  
>  	off += 2;
>  	len -= 2;
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data
  2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
@ 2023-03-06 15:59   ` Simon Horman
  2023-03-07  9:20   ` Nikolay Aleksandrov
  2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Simon Horman @ 2023-03-06 15:59 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Fri, Mar 03, 2023 at 07:12:38PM -0500, Xin Long wrote:
> In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
> before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
> before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
> in case of overflows.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

> ---
>  net/bridge/br_netfilter_ipv6.c | 47 ++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 5cd3e4c35123..50f564c33551 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c

...

> -	if (len == 0)
> -		return 0;
> -bad:
> -	return -1;
> +	if (len)
> +		return -1;
> +
> +	return 0;

nit: if you have to spin a v2, you may want to consider

	return len ? -1 : 0;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
@ 2023-03-06 16:26   ` Simon Horman
  2023-03-07  9:21   ` Nikolay Aleksandrov
  2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Simon Horman @ 2023-03-06 16:26 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Fri, Mar 03, 2023 at 07:12:39PM -0500, Xin Long wrote:
> br_nf_check_hbh_len() is a function to check the Hop-by-hop option
> header, and shouldn't do pskb_trim_rcsum() there. This patch is to
> pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
> after calling br_validate_ipv6() instead.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

> ---
>  net/bridge/br_netfilter_ipv6.c | 33 ++++++++++++++-------------------
>  1 file changed, 14 insertions(+), 19 deletions(-)
> 
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 50f564c33551..07289e4f3213 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -43,11 +43,11 @@
>  /* We only check the length. A bridge shouldn't do any hop-by-hop stuff
>   * anyway
>   */
> -static int br_nf_check_hbh_len(struct sk_buff *skb)
> +static int br_nf_check_hbh_len(struct sk_buff *skb, u32 *plen)
>  {
>  	int len, off = sizeof(struct ipv6hdr);
>  	unsigned char *nh;
> -	u32 pkt_len;
> +	u32 pkt_len = 0;
>  
>  	if (!pskb_may_pull(skb, off + 8))
>  		return -1;
> @@ -83,10 +83,6 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
>  				return -1;
>  			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
>  				return -1;
> -			if (pskb_trim_rcsum(skb,
> -					    pkt_len + sizeof(struct ipv6hdr)))
> -				return -1;
> -			nh = skb_network_header(skb);

nit: Something you may want to consider if you spin a v2.

     It seems that pkt_len is only set here.
     So *plen could also be set here, simplifying the return path slightly.

     Also, if so, then a not entirely related clean-up would
     be to reduce the scope of pkt_len to this block.

>  		}
>  		off += optlen;
>  		len -= optlen;
> @@ -94,6 +90,8 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
>  	if (len)
>  		return -1;
>  
> +	if (pkt_len)
> +		*plen = pkt_len;
>  	return 0;
>  }
>  

...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
@ 2023-03-06 16:35   ` Simon Horman
  2023-03-07 20:58     ` Xin Long
  2023-03-07  9:22   ` Nikolay Aleksandrov
  2023-03-07 18:31   ` Aaron Conole
  2 siblings, 1 reply; 25+ messages in thread
From: Simon Horman @ 2023-03-06 16:35 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Fri, Mar 03, 2023 at 07:12:41PM -0500, Xin Long wrote:
> For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
> and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
> length for the jumbo packets, all data and exthdr will be trimmed
> in nf_ct_skb_network_trim().
> 
> This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
> of the IPv6 packet, similar to br_validate_ipv6().
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

...

> diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
> index 52b776bdf526..2016a3b05f86 100644
> --- a/net/netfilter/nf_conntrack_ovs.c
> +++ b/net/netfilter/nf_conntrack_ovs.c

...

> @@ -114,14 +115,20 @@ EXPORT_SYMBOL_GPL(nf_ct_add_helper);
>  int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
>  {
>  	unsigned int len;
> +	int err;
>  
>  	switch (family) {
>  	case NFPROTO_IPV4:
>  		len = skb_ip_totlen(skb);
>  		break;
>  	case NFPROTO_IPV6:
> -		len = sizeof(struct ipv6hdr)
> -			+ ntohs(ipv6_hdr(skb)->payload_len);
> +		len = ntohs(ipv6_hdr(skb)->payload_len);
> +		if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) {

nit: if you spin a v2,
     you may consider reducing the scope of err to this block.

> +			err = nf_ip6_check_hbh_len(skb, &len);
> +			if (err)
> +				return err;
> +		}
> +		len += sizeof(struct ipv6hdr);
>  		break;
>  	default:
>  		len = skb->len;
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils
  2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
@ 2023-03-06 16:35   ` Simon Horman
  2023-03-07  9:21   ` Nikolay Aleksandrov
  2023-03-07 18:31   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Simon Horman @ 2023-03-06 16:35 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Fri, Mar 03, 2023 at 07:12:40PM -0500, Xin Long wrote:
> Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
> to netfilter utils, so that it can be used by other modules, like
> ovs and tc.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
  2023-03-06 15:52   ` Simon Horman
@ 2023-03-07  9:16   ` Nikolay Aleksandrov
  2023-03-07 18:33   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Nikolay Aleksandrov @ 2023-03-07  9:16 UTC (permalink / raw)
  To: Xin Long, netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu, Pravin B Shelar,
	Aaron Conole

On 04/03/2023 02:12, Xin Long wrote:
> When checking Hop-by-hop option header, if the option data is in
> nonlinear area, it should do pskb_may_pull instead of discarding
> the skb as a bad IPv6 packet.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/bridge/br_netfilter_ipv6.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data
  2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
  2023-03-06 15:59   ` Simon Horman
@ 2023-03-07  9:20   ` Nikolay Aleksandrov
  2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Nikolay Aleksandrov @ 2023-03-07  9:20 UTC (permalink / raw)
  To: Xin Long, netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu, Pravin B Shelar,
	Aaron Conole

On 04/03/2023 02:12, Xin Long wrote:
> In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
> before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
> before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
> in case of overflows.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/bridge/br_netfilter_ipv6.c | 47 ++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 25 deletions(-)


Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
  2023-03-06 16:26   ` Simon Horman
@ 2023-03-07  9:21   ` Nikolay Aleksandrov
  2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Nikolay Aleksandrov @ 2023-03-07  9:21 UTC (permalink / raw)
  To: Xin Long, netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu, Pravin B Shelar,
	Aaron Conole

On 04/03/2023 02:12, Xin Long wrote:
> br_nf_check_hbh_len() is a function to check the Hop-by-hop option
> header, and shouldn't do pskb_trim_rcsum() there. This patch is to
> pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
> after calling br_validate_ipv6() instead.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/bridge/br_netfilter_ipv6.c | 33 ++++++++++++++-------------------
>  1 file changed, 14 insertions(+), 19 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils
  2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
  2023-03-06 16:35   ` Simon Horman
@ 2023-03-07  9:21   ` Nikolay Aleksandrov
  2023-03-07 18:31   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Nikolay Aleksandrov @ 2023-03-07  9:21 UTC (permalink / raw)
  To: Xin Long, netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu, Pravin B Shelar,
	Aaron Conole

On 04/03/2023 02:12, Xin Long wrote:
> Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
> to netfilter utils, so that it can be used by other modules, like
> ovs and tc.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  include/linux/netfilter_ipv6.h |  2 ++
>  net/bridge/br_netfilter_ipv6.c | 57 +---------------------------------
>  net/netfilter/utils.c          | 54 ++++++++++++++++++++++++++++++++
>  3 files changed, 57 insertions(+), 56 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
  2023-03-06 16:35   ` Simon Horman
@ 2023-03-07  9:22   ` Nikolay Aleksandrov
  2023-03-07 18:31   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Nikolay Aleksandrov @ 2023-03-07  9:22 UTC (permalink / raw)
  To: Xin Long, netfilter-devel, network dev
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, davem,
	kuba, Eric Dumazet, Paolo Abeni, Roopa Prabhu, Pravin B Shelar,
	Aaron Conole

On 04/03/2023 02:12, Xin Long wrote:
> For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
> and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
> length for the jumbo packets, all data and exthdr will be trimmed
> in nf_ct_skb_network_trim().
> 
> This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
> of the IPv6 packet, similar to br_validate_ipv6().
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/netfilter/nf_conntrack_ovs.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 

I'd also recommend to change the scope of "err" as Simon already pointed out.
Other than that looks good to me.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 6/6] selftests: add a selftest for big tcp
  2023-03-04  0:12 ` [PATCH nf-next 6/6] selftests: add a selftest for big tcp Xin Long
@ 2023-03-07 18:31   ` Aaron Conole
  2023-03-07 20:06     ` Xin Long
  0 siblings, 1 reply; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:31 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

Xin Long <lucien.xin@gmail.com> writes:

> This test runs on the client-router-server topo, and monitors the traffic
> on the RX devices of router and server while sending BIG TCP packets with
> netperf from client to server. Meanwhile, it changes 'tso' on the TX devs
> and 'gro' on the RX devs. Then it checks if any BIG TCP packets appears
> on the RX devs with 'ip/ip6tables -m length ! --length 0:65535' for each
> case.
>
> Note that we also add tc action ct in link1 ingress to cover the ipv6
> jumbo packets process in nf_ct_skb_network_trim() of nf_conntrack_ovs.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

LGTM - just one question

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  tools/testing/selftests/net/Makefile   |   1 +
>  tools/testing/selftests/net/big_tcp.sh | 180 +++++++++++++++++++++++++
>  2 files changed, 181 insertions(+)
>  create mode 100755 tools/testing/selftests/net/big_tcp.sh
>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 6cd8993454d7..099741290184 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -48,6 +48,7 @@ TEST_PROGS += l2_tos_ttl_inherit.sh
>  TEST_PROGS += bind_bhash.sh
>  TEST_PROGS += ip_local_port_range.sh
>  TEST_PROGS += rps_default_mask.sh
> +TEST_PROGS += big_tcp.sh
>  TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
>  TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
>  TEST_GEN_FILES =  socket nettest
> diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
> new file mode 100755
> index 000000000000..cde9a91c4797
> --- /dev/null
> +++ b/tools/testing/selftests/net/big_tcp.sh
> @@ -0,0 +1,180 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Testing For IPv4 and IPv6 BIG TCP.
> +# TOPO: CLIENT_NS (link0)<--->(link1) ROUTER_NS (link2)<--->(link3) SERVER_NS
> +
> +CLIENT_NS=$(mktemp -u client-XXXXXXXX)
> +CLIENT_IP4="198.51.100.1"
> +CLIENT_IP6="2001:db8:1::1"
> +
> +SERVER_NS=$(mktemp -u server-XXXXXXXX)
> +SERVER_IP4="203.0.113.1"
> +SERVER_IP6="2001:db8:2::1"
> +
> +ROUTER_NS=$(mktemp -u router-XXXXXXXX)
> +SERVER_GW4="203.0.113.2"
> +CLIENT_GW4="198.51.100.2"
> +SERVER_GW6="2001:db8:2::2"
> +CLIENT_GW6="2001:db8:1::2"
> +
> +MAX_SIZE=128000
> +CHK_SIZE=65535
> +
> +# Kselftest framework requirement - SKIP code is 4.
> +ksft_skip=4
> +
> +setup() {
> +	ip netns add $CLIENT_NS
> +	ip netns add $SERVER_NS
> +	ip netns add $ROUTER_NS
> +	ip -net $ROUTER_NS link add link1 type veth peer name link0 netns $CLIENT_NS
> +	ip -net $ROUTER_NS link add link2 type veth peer name link3 netns $SERVER_NS
> +
> +	ip -net $CLIENT_NS link set link0 up
> +	ip -net $CLIENT_NS link set link0 mtu 1442
> +	ip -net $CLIENT_NS addr add $CLIENT_IP4/24 dev link0
> +	ip -net $CLIENT_NS addr add $CLIENT_IP6/64 dev link0 nodad
> +	ip -net $CLIENT_NS route add $SERVER_IP4 dev link0 via $CLIENT_GW4
> +	ip -net $CLIENT_NS route add $SERVER_IP6 dev link0 via $CLIENT_GW6
> +	ip -net $CLIENT_NS link set dev link0 \
> +		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> +	ip -net $CLIENT_NS link set dev link0 \
> +		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> +	ip net exec $CLIENT_NS sysctl -wq net.ipv4.tcp_window_scaling=10
> +
> +	ip -net $ROUTER_NS link set link1 up
> +	ip -net $ROUTER_NS link set link2 up
> +	ip -net $ROUTER_NS addr add $CLIENT_GW4/24 dev link1
> +	ip -net $ROUTER_NS addr add $CLIENT_GW6/64 dev link1 nodad
> +	ip -net $ROUTER_NS addr add $SERVER_GW4/24 dev link2
> +	ip -net $ROUTER_NS addr add $SERVER_GW6/64 dev link2 nodad
> +	ip -net $ROUTER_NS link set dev link1 \
> +		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> +	ip -net $ROUTER_NS link set dev link2 \
> +		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> +	ip -net $ROUTER_NS link set dev link1 \
> +		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> +	ip -net $ROUTER_NS link set dev link2 \
> +		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> +	# test for nf_ct_skb_network_trim in nf_conntrack_ovs used by TC ct action.
> +	ip net exec $ROUTER_NS tc qdisc add dev link1 ingress
> +	ip net exec $ROUTER_NS tc filter add dev link1 ingress \
> +		proto ip flower ip_proto tcp action ct
> +	ip net exec $ROUTER_NS tc filter add dev link1 ingress \
> +		proto ipv6 flower ip_proto tcp action ct
> +	ip net exec $ROUTER_NS sysctl -wq net.ipv4.ip_forward=1
> +	ip net exec $ROUTER_NS sysctl -wq net.ipv6.conf.all.forwarding=1
> +
> +	ip -net $SERVER_NS link set link3 up
> +	ip -net $SERVER_NS addr add $SERVER_IP4/24 dev link3
> +	ip -net $SERVER_NS addr add $SERVER_IP6/64 dev link3 nodad
> +	ip -net $SERVER_NS route add $CLIENT_IP4 dev link3 via $SERVER_GW4
> +	ip -net $SERVER_NS route add $CLIENT_IP6 dev link3 via $SERVER_GW6
> +	ip -net $SERVER_NS link set dev link3 \
> +		gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> +	ip -net $SERVER_NS link set dev link3 \
> +		gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> +	ip net exec $SERVER_NS sysctl -wq net.ipv4.tcp_window_scaling=10
> +	ip net exec $SERVER_NS netserver 2>&1 >/dev/null
> +}
> +
> +cleanup() {
> +	ip net exec $SERVER_NS pkill netserver
> +	ip -net $ROUTER_NS link del link1
> +	ip -net $ROUTER_NS link del link2
> +	ip netns del "$CLIENT_NS"
> +	ip netns del "$SERVER_NS"
> +	ip netns del "$ROUTER_NS"
> +}
> +
> +start_counter() {
> +	local ipt="iptables"
> +	local iface=$1
> +	local netns=$2
> +
> +	[ "$NF" = "6" ] && ipt="ip6tables"
> +	ip net exec $netns $ipt -t raw -A PREROUTING -i $iface \
> +		-m length ! --length 0:$CHK_SIZE -j ACCEPT
> +}
> +
> +check_counter() {
> +	local ipt="iptables"
> +	local iface=$1
> +	local netns=$2
> +
> +	[ "$NF" = "6" ] && ipt="ip6tables"
> +	test `ip net exec $netns $ipt -t raw -L -v |grep $iface | awk '{print $1}'` != "0"
> +}
> +
> +stop_counter() {
> +	local ipt="iptables"
> +	local iface=$1
> +	local netns=$2
> +
> +	[ "$NF" = "6" ] && ipt="ip6tables"
> +	ip net exec $netns $ipt -t raw -D PREROUTING -i $iface \
> +		-m length ! --length 0:$CHK_SIZE -j ACCEPT
> +}
> +
> +do_netperf() {
> +	local serip=$SERVER_IP4
> +	local netns=$1
> +
> +	[ "$NF" = "6" ] && serip=$SERVER_IP6
> +	ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null
> +}
> +
> +do_test() {
> +	local cli_tso=$1
> +	local gw_gro=$2
> +	local gw_tso=$3
> +	local ser_gro=$4
> +	local ret="PASS"
> +
> +	ip net exec $CLIENT_NS ethtool -K link0 tso $cli_tso
> +	ip net exec $ROUTER_NS ethtool -K link1 gro $gw_gro
> +	ip net exec $ROUTER_NS ethtool -K link2 tso $gw_tso
> +	ip net exec $SERVER_NS ethtool -K link3 gro $ser_gro
> +
> +	start_counter link1 $ROUTER_NS
> +	start_counter link3 $SERVER_NS
> +	do_netperf $CLIENT_NS
> +
> +	if check_counter link1 $ROUTER_NS; then
> +		check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
> +	else
> +		ret="FAIL_on_link1"
> +	fi
> +
> +	stop_counter link1 $ROUTER_NS
> +	stop_counter link3 $SERVER_NS
> +	printf "%-9s %-8s %-8s %-8s: [%s]\n" \
> +		$cli_tso $gw_gro $gw_tso $ser_gro $ret
> +	test $ret = "PASS"
> +}
> +
> +testup() {
> +	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
> +	do_test "on"  "on"  "on"  "on"  && \
> +	do_test "on"  "off" "on"  "off" && \
> +	do_test "off" "on"  "on"  "on"  && \
> +	do_test "on"  "on"  "off" "on"  && \
> +	do_test "off" "on"  "off" "on"
> +}
> +
> +if ! netperf -V &> /dev/null; then

Is it ever possible that we get netperf without netserver?

> +	echo "SKIP: Could not run test without netperf tool"
> +	exit $ksft_skip
> +fi
> +
> +if ! ip link help 2>&1 | grep gso_ipv4_max_size &> /dev/null; then
> +	echo "SKIP: Could not run test without gso/gro_ipv4_max_size supported in ip-link"
> +	exit $ksft_skip
> +fi
> +
> +trap cleanup EXIT
> +setup && echo "Testing for BIG TCP:" && \
> +NF=4 testup && echo "***v4 Tests Done***" && \
> +NF=6 testup && echo "***v6 Tests Done***"
> +exit $?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
  2023-03-06 16:35   ` Simon Horman
  2023-03-07  9:22   ` Nikolay Aleksandrov
@ 2023-03-07 18:31   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:31 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

Xin Long <lucien.xin@gmail.com> writes:

> For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
> and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
> length for the jumbo packets, all data and exthdr will be trimmed
> in nf_ct_skb_network_trim().
>
> This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
> of the IPv6 packet, similar to br_validate_ipv6().
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  net/netfilter/nf_conntrack_ovs.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
> index 52b776bdf526..2016a3b05f86 100644
> --- a/net/netfilter/nf_conntrack_ovs.c
> +++ b/net/netfilter/nf_conntrack_ovs.c
> @@ -6,6 +6,7 @@
>  #include <net/netfilter/ipv6/nf_defrag_ipv6.h>
>  #include <net/ipv6_frag.h>
>  #include <net/ip.h>
> +#include <linux/netfilter_ipv6.h>
>  
>  /* 'skb' should already be pulled to nh_ofs. */
>  int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
> @@ -114,14 +115,20 @@ EXPORT_SYMBOL_GPL(nf_ct_add_helper);
>  int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
>  {
>  	unsigned int len;
> +	int err;
>  
>  	switch (family) {
>  	case NFPROTO_IPV4:
>  		len = skb_ip_totlen(skb);
>  		break;
>  	case NFPROTO_IPV6:
> -		len = sizeof(struct ipv6hdr)
> -			+ ntohs(ipv6_hdr(skb)->payload_len);
> +		len = ntohs(ipv6_hdr(skb)->payload_len);
> +		if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) {
> +			err = nf_ip6_check_hbh_len(skb, &len);
> +			if (err)
> +				return err;
> +		}
> +		len += sizeof(struct ipv6hdr);
>  		break;
>  	default:
>  		len = skb->len;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils
  2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
  2023-03-06 16:35   ` Simon Horman
  2023-03-07  9:21   ` Nikolay Aleksandrov
@ 2023-03-07 18:31   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:31 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

Xin Long <lucien.xin@gmail.com> writes:

> Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
> to netfilter utils, so that it can be used by other modules, like
> ovs and tc.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  include/linux/netfilter_ipv6.h |  2 ++
>  net/bridge/br_netfilter_ipv6.c | 57 +---------------------------------
>  net/netfilter/utils.c          | 54 ++++++++++++++++++++++++++++++++
>  3 files changed, 57 insertions(+), 56 deletions(-)
>
> diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h
> index 48314ade1506..7834c0be2831 100644
> --- a/include/linux/netfilter_ipv6.h
> +++ b/include/linux/netfilter_ipv6.h
> @@ -197,6 +197,8 @@ static inline int nf_cookie_v6_check(const struct ipv6hdr *iph,
>  __sum16 nf_ip6_checksum(struct sk_buff *skb, unsigned int hook,
>  			unsigned int dataoff, u_int8_t protocol);
>  
> +int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen);
> +
>  int ipv6_netfilter_init(void);
>  void ipv6_netfilter_fini(void);
>  
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 07289e4f3213..550039dfc31a 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -40,61 +40,6 @@
>  #include <linux/sysctl.h>
>  #endif
>  
> -/* We only check the length. A bridge shouldn't do any hop-by-hop stuff
> - * anyway
> - */
> -static int br_nf_check_hbh_len(struct sk_buff *skb, u32 *plen)
> -{
> -	int len, off = sizeof(struct ipv6hdr);
> -	unsigned char *nh;
> -	u32 pkt_len = 0;
> -
> -	if (!pskb_may_pull(skb, off + 8))
> -		return -1;
> -	nh = (u8 *)(ipv6_hdr(skb) + 1);
> -	len = (nh[1] + 1) << 3;
> -
> -	if (!pskb_may_pull(skb, off + len))
> -		return -1;
> -	nh = skb_network_header(skb);
> -
> -	off += 2;
> -	len -= 2;
> -	while (len > 0) {
> -		int optlen;
> -
> -		if (nh[off] == IPV6_TLV_PAD1) {
> -			off++;
> -			len--;
> -			continue;
> -		}
> -		if (len < 2)
> -			return -1;
> -		optlen = nh[off + 1] + 2;
> -		if (optlen > len)
> -			return -1;
> -
> -		if (nh[off] == IPV6_TLV_JUMBO) {
> -			if (nh[off + 1] != 4 || (off & 3) != 2)
> -				return -1;
> -			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
> -			if (pkt_len <= IPV6_MAXPLEN ||
> -			    ipv6_hdr(skb)->payload_len)
> -				return -1;
> -			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
> -				return -1;
> -		}
> -		off += optlen;
> -		len -= optlen;
> -	}
> -	if (len)
> -		return -1;
> -
> -	if (pkt_len)
> -		*plen = pkt_len;
> -	return 0;
> -}
> -
>  int br_validate_ipv6(struct net *net, struct sk_buff *skb)
>  {
>  	const struct ipv6hdr *hdr;
> @@ -114,7 +59,7 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb)
>  		goto inhdr_error;
>  
>  	pkt_len = ntohs(hdr->payload_len);
> -	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb, &pkt_len))
> +	if (hdr->nexthdr == NEXTHDR_HOP && nf_ip6_check_hbh_len(skb, &pkt_len))
>  		goto drop;
>  
>  	if (pkt_len + ip6h_len > skb->len) {
> diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
> index 2182d361e273..04f4bd661774 100644
> --- a/net/netfilter/utils.c
> +++ b/net/netfilter/utils.c
> @@ -215,3 +215,57 @@ int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry)
>  	}
>  	return ret;
>  }
> +
> +/* Only get and check the lengths, not do any hop-by-hop stuff. */
> +int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen)
> +{
> +	int len, off = sizeof(struct ipv6hdr);
> +	unsigned char *nh;
> +	u32 pkt_len = 0;
> +
> +	if (!pskb_may_pull(skb, off + 8))
> +		return -ENOMEM;
> +	nh = (u8 *)(ipv6_hdr(skb) + 1);
> +	len = (nh[1] + 1) << 3;
> +
> +	if (!pskb_may_pull(skb, off + len))
> +		return -ENOMEM;
> +	nh = skb_network_header(skb);
> +
> +	off += 2;
> +	len -= 2;
> +	while (len > 0) {
> +		int optlen;
> +
> +		if (nh[off] == IPV6_TLV_PAD1) {
> +			off++;
> +			len--;
> +			continue;
> +		}
> +		if (len < 2)
> +			return -EBADMSG;
> +		optlen = nh[off + 1] + 2;
> +		if (optlen > len)
> +			return -EBADMSG;
> +
> +		if (nh[off] == IPV6_TLV_JUMBO) {
> +			if (nh[off + 1] != 4 || (off & 3) != 2)
> +				return -EBADMSG;
> +			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
> +			if (pkt_len <= IPV6_MAXPLEN ||
> +			    ipv6_hdr(skb)->payload_len)
> +				return -EBADMSG;
> +			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
> +				return -EBADMSG;
> +		}
> +		off += optlen;
> +		len -= optlen;
> +	}
> +	if (len)
> +		return -EBADMSG;
> +
> +	if (pkt_len)
> +		*plen = pkt_len;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(nf_ip6_check_hbh_len);


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
  2023-03-06 16:26   ` Simon Horman
  2023-03-07  9:21   ` Nikolay Aleksandrov
@ 2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:32 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

Xin Long <lucien.xin@gmail.com> writes:

> br_nf_check_hbh_len() is a function to check the Hop-by-hop option
> header, and shouldn't do pskb_trim_rcsum() there. This patch is to
> pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
> after calling br_validate_ipv6() instead.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  net/bridge/br_netfilter_ipv6.c | 33 ++++++++++++++-------------------
>  1 file changed, 14 insertions(+), 19 deletions(-)
>
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 50f564c33551..07289e4f3213 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -43,11 +43,11 @@
>  /* We only check the length. A bridge shouldn't do any hop-by-hop stuff
>   * anyway
>   */
> -static int br_nf_check_hbh_len(struct sk_buff *skb)
> +static int br_nf_check_hbh_len(struct sk_buff *skb, u32 *plen)
>  {
>  	int len, off = sizeof(struct ipv6hdr);
>  	unsigned char *nh;
> -	u32 pkt_len;
> +	u32 pkt_len = 0;
>  
>  	if (!pskb_may_pull(skb, off + 8))
>  		return -1;
> @@ -83,10 +83,6 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
>  				return -1;
>  			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
>  				return -1;
> -			if (pskb_trim_rcsum(skb,
> -					    pkt_len + sizeof(struct ipv6hdr)))
> -				return -1;
> -			nh = skb_network_header(skb);
>  		}
>  		off += optlen;
>  		len -= optlen;
> @@ -94,6 +90,8 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
>  	if (len)
>  		return -1;
>  
> +	if (pkt_len)
> +		*plen = pkt_len;
>  	return 0;
>  }
>  
> @@ -116,22 +114,19 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb)
>  		goto inhdr_error;
>  
>  	pkt_len = ntohs(hdr->payload_len);
> +	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb, &pkt_len))
> +		goto drop;
>  
> -	if (pkt_len || hdr->nexthdr != NEXTHDR_HOP) {
> -		if (pkt_len + ip6h_len > skb->len) {
> -			__IP6_INC_STATS(net, idev,
> -					IPSTATS_MIB_INTRUNCATEDPKTS);
> -			goto drop;
> -		}
> -		if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) {
> -			__IP6_INC_STATS(net, idev,
> -					IPSTATS_MIB_INDISCARDS);
> -			goto drop;
> -		}
> -		hdr = ipv6_hdr(skb);
> +	if (pkt_len + ip6h_len > skb->len) {
> +		__IP6_INC_STATS(net, idev,
> +				IPSTATS_MIB_INTRUNCATEDPKTS);
> +		goto drop;
>  	}
> -	if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb))
> +	if (pskb_trim_rcsum(skb, pkt_len + ip6h_len)) {
> +		__IP6_INC_STATS(net, idev,
> +				IPSTATS_MIB_INDISCARDS);
>  		goto drop;
> +	}
>  
>  	memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
>  	/* No IP options in IPv6 header; however it should be


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data
  2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
  2023-03-06 15:59   ` Simon Horman
  2023-03-07  9:20   ` Nikolay Aleksandrov
@ 2023-03-07 18:32   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:32 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

Xin Long <lucien.xin@gmail.com> writes:

> In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
> before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
> before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
> in case of overflows.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  net/bridge/br_netfilter_ipv6.c | 47 ++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 25 deletions(-)
>
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 5cd3e4c35123..50f564c33551 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -50,54 +50,51 @@ static int br_nf_check_hbh_len(struct sk_buff *skb)
>  	u32 pkt_len;
>  
>  	if (!pskb_may_pull(skb, off + 8))
> -		goto bad;
> +		return -1;
>  	nh = (u8 *)(ipv6_hdr(skb) + 1);
>  	len = (nh[1] + 1) << 3;
>  
>  	if (!pskb_may_pull(skb, off + len))
> -		goto bad;
> +		return -1;
>  	nh = skb_network_header(skb);
>  
>  	off += 2;
>  	len -= 2;
> -
>  	while (len > 0) {
> -		int optlen = nh[off + 1] + 2;
> -
> -		switch (nh[off]) {
> -		case IPV6_TLV_PAD1:
> -			optlen = 1;
> -			break;
> +		int optlen;
>  
> -		case IPV6_TLV_PADN:
> -			break;
> +		if (nh[off] == IPV6_TLV_PAD1) {
> +			off++;
> +			len--;
> +			continue;
> +		}
> +		if (len < 2)
> +			return -1;
> +		optlen = nh[off + 1] + 2;
> +		if (optlen > len)
> +			return -1;
>  
> -		case IPV6_TLV_JUMBO:
> +		if (nh[off] == IPV6_TLV_JUMBO) {
>  			if (nh[off + 1] != 4 || (off & 3) != 2)
> -				goto bad;
> +				return -1;
>  			pkt_len = ntohl(*(__be32 *)(nh + off + 2));
>  			if (pkt_len <= IPV6_MAXPLEN ||
>  			    ipv6_hdr(skb)->payload_len)
> -				goto bad;
> +				return -1;
>  			if (pkt_len > skb->len - sizeof(struct ipv6hdr))
> -				goto bad;
> +				return -1;
>  			if (pskb_trim_rcsum(skb,
>  					    pkt_len + sizeof(struct ipv6hdr)))
> -				goto bad;
> +				return -1;
>  			nh = skb_network_header(skb);
> -			break;
> -		default:
> -			if (optlen > len)
> -				goto bad;
> -			break;
>  		}
>  		off += optlen;
>  		len -= optlen;
>  	}
> -	if (len == 0)
> -		return 0;
> -bad:
> -	return -1;
> +	if (len)
> +		return -1;
> +
> +	return 0;
>  }
>  
>  int br_validate_ipv6(struct net *net, struct sk_buff *skb)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
  2023-03-06 15:52   ` Simon Horman
  2023-03-07  9:16   ` Nikolay Aleksandrov
@ 2023-03-07 18:33   ` Aaron Conole
  2 siblings, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2023-03-07 18:33 UTC (permalink / raw)
  To: Xin Long
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

Xin Long <lucien.xin@gmail.com> writes:

> When checking Hop-by-hop option header, if the option data is in
> nonlinear area, it should do pskb_may_pull instead of discarding
> the skb as a bad IPv6 packet.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---

Reviewed-by: Aaron Conole <aconole@redhat.com>

>  net/bridge/br_netfilter_ipv6.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
> index 6b07f30675bb..5cd3e4c35123 100644
> --- a/net/bridge/br_netfilter_ipv6.c
> +++ b/net/bridge/br_netfilter_ipv6.c
> @@ -45,14 +45,18 @@
>   */
>  static int br_nf_check_hbh_len(struct sk_buff *skb)
>  {
> -	unsigned char *raw = (u8 *)(ipv6_hdr(skb) + 1);
> +	int len, off = sizeof(struct ipv6hdr);
> +	unsigned char *nh;
>  	u32 pkt_len;
> -	const unsigned char *nh = skb_network_header(skb);
> -	int off = raw - nh;
> -	int len = (raw[1] + 1) << 3;
>  
> -	if ((raw + len) - skb->data > skb_headlen(skb))
> +	if (!pskb_may_pull(skb, off + 8))
>  		goto bad;
> +	nh = (u8 *)(ipv6_hdr(skb) + 1);
> +	len = (nh[1] + 1) << 3;
> +
> +	if (!pskb_may_pull(skb, off + len))
> +		goto bad;
> +	nh = skb_network_header(skb);
>  
>  	off += 2;
>  	len -= 2;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 6/6] selftests: add a selftest for big tcp
  2023-03-07 18:31   ` Aaron Conole
@ 2023-03-07 20:06     ` Xin Long
  0 siblings, 0 replies; 25+ messages in thread
From: Xin Long @ 2023-03-07 20:06 UTC (permalink / raw)
  To: Aaron Conole
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar

On Tue, Mar 7, 2023 at 1:31 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Xin Long <lucien.xin@gmail.com> writes:
>
> > This test runs on the client-router-server topo, and monitors the traffic
> > on the RX devices of router and server while sending BIG TCP packets with
> > netperf from client to server. Meanwhile, it changes 'tso' on the TX devs
> > and 'gro' on the RX devs. Then it checks if any BIG TCP packets appears
> > on the RX devs with 'ip/ip6tables -m length ! --length 0:65535' for each
> > case.
> >
> > Note that we also add tc action ct in link1 ingress to cover the ipv6
> > jumbo packets process in nf_ct_skb_network_trim() of nf_conntrack_ovs.
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
>
> LGTM - just one question
>
> Reviewed-by: Aaron Conole <aconole@redhat.com>
>
> >  tools/testing/selftests/net/Makefile   |   1 +
> >  tools/testing/selftests/net/big_tcp.sh | 180 +++++++++++++++++++++++++
> >  2 files changed, 181 insertions(+)
> >  create mode 100755 tools/testing/selftests/net/big_tcp.sh
> >
> > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > index 6cd8993454d7..099741290184 100644
> > --- a/tools/testing/selftests/net/Makefile
> > +++ b/tools/testing/selftests/net/Makefile
> > @@ -48,6 +48,7 @@ TEST_PROGS += l2_tos_ttl_inherit.sh
> >  TEST_PROGS += bind_bhash.sh
> >  TEST_PROGS += ip_local_port_range.sh
> >  TEST_PROGS += rps_default_mask.sh
> > +TEST_PROGS += big_tcp.sh
> >  TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
> >  TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
> >  TEST_GEN_FILES =  socket nettest
> > diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
> > new file mode 100755
> > index 000000000000..cde9a91c4797
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/big_tcp.sh
> > @@ -0,0 +1,180 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Testing For IPv4 and IPv6 BIG TCP.
> > +# TOPO: CLIENT_NS (link0)<--->(link1) ROUTER_NS (link2)<--->(link3) SERVER_NS
> > +
> > +CLIENT_NS=$(mktemp -u client-XXXXXXXX)
> > +CLIENT_IP4="198.51.100.1"
> > +CLIENT_IP6="2001:db8:1::1"
> > +
> > +SERVER_NS=$(mktemp -u server-XXXXXXXX)
> > +SERVER_IP4="203.0.113.1"
> > +SERVER_IP6="2001:db8:2::1"
> > +
> > +ROUTER_NS=$(mktemp -u router-XXXXXXXX)
> > +SERVER_GW4="203.0.113.2"
> > +CLIENT_GW4="198.51.100.2"
> > +SERVER_GW6="2001:db8:2::2"
> > +CLIENT_GW6="2001:db8:1::2"
> > +
> > +MAX_SIZE=128000
> > +CHK_SIZE=65535
> > +
> > +# Kselftest framework requirement - SKIP code is 4.
> > +ksft_skip=4
> > +
> > +setup() {
> > +     ip netns add $CLIENT_NS
> > +     ip netns add $SERVER_NS
> > +     ip netns add $ROUTER_NS
> > +     ip -net $ROUTER_NS link add link1 type veth peer name link0 netns $CLIENT_NS
> > +     ip -net $ROUTER_NS link add link2 type veth peer name link3 netns $SERVER_NS
> > +
> > +     ip -net $CLIENT_NS link set link0 up
> > +     ip -net $CLIENT_NS link set link0 mtu 1442
> > +     ip -net $CLIENT_NS addr add $CLIENT_IP4/24 dev link0
> > +     ip -net $CLIENT_NS addr add $CLIENT_IP6/64 dev link0 nodad
> > +     ip -net $CLIENT_NS route add $SERVER_IP4 dev link0 via $CLIENT_GW4
> > +     ip -net $CLIENT_NS route add $SERVER_IP6 dev link0 via $CLIENT_GW6
> > +     ip -net $CLIENT_NS link set dev link0 \
> > +             gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> > +     ip -net $CLIENT_NS link set dev link0 \
> > +             gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> > +     ip net exec $CLIENT_NS sysctl -wq net.ipv4.tcp_window_scaling=10
> > +
> > +     ip -net $ROUTER_NS link set link1 up
> > +     ip -net $ROUTER_NS link set link2 up
> > +     ip -net $ROUTER_NS addr add $CLIENT_GW4/24 dev link1
> > +     ip -net $ROUTER_NS addr add $CLIENT_GW6/64 dev link1 nodad
> > +     ip -net $ROUTER_NS addr add $SERVER_GW4/24 dev link2
> > +     ip -net $ROUTER_NS addr add $SERVER_GW6/64 dev link2 nodad
> > +     ip -net $ROUTER_NS link set dev link1 \
> > +             gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> > +     ip -net $ROUTER_NS link set dev link2 \
> > +             gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> > +     ip -net $ROUTER_NS link set dev link1 \
> > +             gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> > +     ip -net $ROUTER_NS link set dev link2 \
> > +             gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> > +     # test for nf_ct_skb_network_trim in nf_conntrack_ovs used by TC ct action.
> > +     ip net exec $ROUTER_NS tc qdisc add dev link1 ingress
> > +     ip net exec $ROUTER_NS tc filter add dev link1 ingress \
> > +             proto ip flower ip_proto tcp action ct
> > +     ip net exec $ROUTER_NS tc filter add dev link1 ingress \
> > +             proto ipv6 flower ip_proto tcp action ct
> > +     ip net exec $ROUTER_NS sysctl -wq net.ipv4.ip_forward=1
> > +     ip net exec $ROUTER_NS sysctl -wq net.ipv6.conf.all.forwarding=1
> > +
> > +     ip -net $SERVER_NS link set link3 up
> > +     ip -net $SERVER_NS addr add $SERVER_IP4/24 dev link3
> > +     ip -net $SERVER_NS addr add $SERVER_IP6/64 dev link3 nodad
> > +     ip -net $SERVER_NS route add $CLIENT_IP4 dev link3 via $SERVER_GW4
> > +     ip -net $SERVER_NS route add $CLIENT_IP6 dev link3 via $SERVER_GW6
> > +     ip -net $SERVER_NS link set dev link3 \
> > +             gro_ipv4_max_size $MAX_SIZE gso_ipv4_max_size $MAX_SIZE
> > +     ip -net $SERVER_NS link set dev link3 \
> > +             gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
> > +     ip net exec $SERVER_NS sysctl -wq net.ipv4.tcp_window_scaling=10
> > +     ip net exec $SERVER_NS netserver 2>&1 >/dev/null
> > +}
> > +
> > +cleanup() {
> > +     ip net exec $SERVER_NS pkill netserver
> > +     ip -net $ROUTER_NS link del link1
> > +     ip -net $ROUTER_NS link del link2
> > +     ip netns del "$CLIENT_NS"
> > +     ip netns del "$SERVER_NS"
> > +     ip netns del "$ROUTER_NS"
> > +}
> > +
> > +start_counter() {
> > +     local ipt="iptables"
> > +     local iface=$1
> > +     local netns=$2
> > +
> > +     [ "$NF" = "6" ] && ipt="ip6tables"
> > +     ip net exec $netns $ipt -t raw -A PREROUTING -i $iface \
> > +             -m length ! --length 0:$CHK_SIZE -j ACCEPT
> > +}
> > +
> > +check_counter() {
> > +     local ipt="iptables"
> > +     local iface=$1
> > +     local netns=$2
> > +
> > +     [ "$NF" = "6" ] && ipt="ip6tables"
> > +     test `ip net exec $netns $ipt -t raw -L -v |grep $iface | awk '{print $1}'` != "0"
> > +}
> > +
> > +stop_counter() {
> > +     local ipt="iptables"
> > +     local iface=$1
> > +     local netns=$2
> > +
> > +     [ "$NF" = "6" ] && ipt="ip6tables"
> > +     ip net exec $netns $ipt -t raw -D PREROUTING -i $iface \
> > +             -m length ! --length 0:$CHK_SIZE -j ACCEPT
> > +}
> > +
> > +do_netperf() {
> > +     local serip=$SERVER_IP4
> > +     local netns=$1
> > +
> > +     [ "$NF" = "6" ] && serip=$SERVER_IP6
> > +     ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null
> > +}
> > +
> > +do_test() {
> > +     local cli_tso=$1
> > +     local gw_gro=$2
> > +     local gw_tso=$3
> > +     local ser_gro=$4
> > +     local ret="PASS"
> > +
> > +     ip net exec $CLIENT_NS ethtool -K link0 tso $cli_tso
> > +     ip net exec $ROUTER_NS ethtool -K link1 gro $gw_gro
> > +     ip net exec $ROUTER_NS ethtool -K link2 tso $gw_tso
> > +     ip net exec $SERVER_NS ethtool -K link3 gro $ser_gro
> > +
> > +     start_counter link1 $ROUTER_NS
> > +     start_counter link3 $SERVER_NS
> > +     do_netperf $CLIENT_NS
> > +
> > +     if check_counter link1 $ROUTER_NS; then
> > +             check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
> > +     else
> > +             ret="FAIL_on_link1"
> > +     fi
> > +
> > +     stop_counter link1 $ROUTER_NS
> > +     stop_counter link3 $SERVER_NS
> > +     printf "%-9s %-8s %-8s %-8s: [%s]\n" \
> > +             $cli_tso $gw_gro $gw_tso $ser_gro $ret
> > +     test $ret = "PASS"
> > +}
> > +
> > +testup() {
> > +     echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
> > +     do_test "on"  "on"  "on"  "on"  && \
> > +     do_test "on"  "off" "on"  "off" && \
> > +     do_test "off" "on"  "on"  "on"  && \
> > +     do_test "on"  "on"  "off" "on"  && \
> > +     do_test "off" "on"  "off" "on"
> > +}
> > +
> > +if ! netperf -V &> /dev/null; then
>
> Is it ever possible that we get netperf without netserver?
Hi, Aaron,

"bin_PROGRAMS = netperf netserver" in src/Makefile.am
of netperf source code will ensure both will be compiled and
installed, unless someone deletes "netserver" from the
source code.

Other kselftest like netfilter/nft_concat_range.sh also assumes
netserver exists if netperf command is installed.

Thanks for the review.

>
> > +     echo "SKIP: Could not run test without netperf tool"
> > +     exit $ksft_skip
> > +fi
> > +
> > +if ! ip link help 2>&1 | grep gso_ipv4_max_size &> /dev/null; then
> > +     echo "SKIP: Could not run test without gso/gro_ipv4_max_size supported in ip-link"
> > +     exit $ksft_skip
> > +fi
> > +
> > +trap cleanup EXIT
> > +setup && echo "Testing for BIG TCP:" && \
> > +NF=4 testup && echo "***v4 Tests Done***" && \
> > +NF=6 testup && echo "***v6 Tests Done***"
> > +exit $?
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  2023-03-06 16:35   ` Simon Horman
@ 2023-03-07 20:58     ` Xin Long
  0 siblings, 0 replies; 25+ messages in thread
From: Xin Long @ 2023-03-07 20:58 UTC (permalink / raw)
  To: Simon Horman
  Cc: netfilter-devel, network dev, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, davem, kuba, Eric Dumazet, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Pravin B Shelar, Aaron Conole

On Mon, Mar 6, 2023 at 11:35 AM Simon Horman <simon.horman@corigine.com> wrote:
>
> On Fri, Mar 03, 2023 at 07:12:41PM -0500, Xin Long wrote:
> > For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
> > and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
> > length for the jumbo packets, all data and exthdr will be trimmed
> > in nf_ct_skb_network_trim().
> >
> > This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
> > of the IPv6 packet, similar to br_validate_ipv6().
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
>
> Reviewed-by: Simon Horman <simon.horman@corigine.com>
>
> ...
>
> > diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
> > index 52b776bdf526..2016a3b05f86 100644
> > --- a/net/netfilter/nf_conntrack_ovs.c
> > +++ b/net/netfilter/nf_conntrack_ovs.c
>
> ...
>
> > @@ -114,14 +115,20 @@ EXPORT_SYMBOL_GPL(nf_ct_add_helper);
> >  int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
> >  {
> >       unsigned int len;
> > +     int err;
> >
> >       switch (family) {
> >       case NFPROTO_IPV4:
> >               len = skb_ip_totlen(skb);
> >               break;
> >       case NFPROTO_IPV6:
> > -             len = sizeof(struct ipv6hdr)
> > -                     + ntohs(ipv6_hdr(skb)->payload_len);
> > +             len = ntohs(ipv6_hdr(skb)->payload_len);
> > +             if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) {
>
> nit: if you spin a v2,
>      you may consider reducing the scope of err to this block.
>
Hi, Simon,

I will post v2 with your suggestions including those in another 3 patches. :)

Thanks.

> > +                     err = nf_ip6_check_hbh_len(skb, &len);
> > +                     if (err)
> > +                             return err;
> > +             }
> > +             len += sizeof(struct ipv6hdr);
> >               break;
> >       default:
> >               len = skb->len;
> > --
> > 2.39.1
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-03-07 20:59 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-04  0:12 [PATCH nf-next 0/6] netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc Xin Long
2023-03-04  0:12 ` [PATCH nf-next 1/6] netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len Xin Long
2023-03-06 15:52   ` Simon Horman
2023-03-07  9:16   ` Nikolay Aleksandrov
2023-03-07 18:33   ` Aaron Conole
2023-03-04  0:12 ` [PATCH nf-next 2/6] netfilter: bridge: check len before accessing more nh data Xin Long
2023-03-06 15:59   ` Simon Horman
2023-03-07  9:20   ` Nikolay Aleksandrov
2023-03-07 18:32   ` Aaron Conole
2023-03-04  0:12 ` [PATCH nf-next 3/6] netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len Xin Long
2023-03-06 16:26   ` Simon Horman
2023-03-07  9:21   ` Nikolay Aleksandrov
2023-03-07 18:32   ` Aaron Conole
2023-03-04  0:12 ` [PATCH nf-next 4/6] netfilter: move br_nf_check_hbh_len to utils Xin Long
2023-03-06 16:35   ` Simon Horman
2023-03-07  9:21   ` Nikolay Aleksandrov
2023-03-07 18:31   ` Aaron Conole
2023-03-04  0:12 ` [PATCH nf-next 5/6] netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim Xin Long
2023-03-06 16:35   ` Simon Horman
2023-03-07 20:58     ` Xin Long
2023-03-07  9:22   ` Nikolay Aleksandrov
2023-03-07 18:31   ` Aaron Conole
2023-03-04  0:12 ` [PATCH nf-next 6/6] selftests: add a selftest for big tcp Xin Long
2023-03-07 18:31   ` Aaron Conole
2023-03-07 20:06     ` Xin Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).