Ethernet Bridge development

Ethernet Bridge development
 help / color / mirror / Atom feed

* Re: [Bridge] [PATCH v3 1/2] net: bridge: use mac_len in bridge forwarding
From: Toshiaki Makita @ 2019-09-03 11:37 UTC (permalink / raw)
  To: Zahari Doychev, netdev
  Cc: dsahern, jiri, nikolay, simon.horman, roopa, bridge, jhs,
	makita.toshiaki, xiyou.wangcong, johannes, alexei.starovoitov
In-Reply-To: <20190902181000.25638-1-zahari.doychev@linux.com>

Hi Zahari,

Sorry for reviewing this late.

On 2019/09/03 3:09, Zahari Doychev wrote:
...
> @@ -466,13 +466,14 @@ static bool __allowed_ingress(const struct net_bridge *br,
>   		/* Tagged frame */
>   		if (skb->vlan_proto != br->vlan_proto) {
>   			/* Protocol-mismatch, empty out vlan_tci for new tag */
> -			skb_push(skb, ETH_HLEN);
> +			skb_push(skb, skb->mac_len);
>   			skb = vlan_insert_tag_set_proto(skb, skb->vlan_proto,
>   							skb_vlan_tag_get(skb));

I think we should insert vlan at skb->data, i.e. mac_header + mac_len, while this
function inserts the tag at mac_header + ETH_HLEN which is not always the correct
offset.

>   			if (unlikely(!skb))
>   				return false;
>   
>   			skb_pull(skb, ETH_HLEN);

Now skb->data is mac_header + ETH_HLEN which would be broken when mac_len is not
ETH_HLEN?

> +			skb_reset_network_header(skb);
>   			skb_reset_mac_len(skb);
>   			*vid = 0;
>   			tagged = false;
> 

Toshiaki Makita

^ permalink raw reply

* Re: [Bridge] [PATCH v5 1/1] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Pablo Neira Ayuso @ 2019-09-02 21:20 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge,
	Florian Westphal, linux-kernel, Jozsef Kadlecsik, coreteam,
	netfilter-devel, David S. Miller
In-Reply-To: <20190831044032.31931-1-leonardo@linux.ibm.com>

On Sat, Aug 31, 2019 at 01:40:33AM -0300, Leonardo Bras wrote:
> A kernel panic can happen if a host has disabled IPv6 on boot and have to
> process guest packets (coming from a bridge) using it's ip6tables.
> 
> IPv6 packets need to be dropped if the IPv6 module is not loaded, and the
> host ip6tables will be used.

Applied, thanks.

^ permalink raw reply

* [Bridge] [PATCH v3 2/2] selftests: forwrading: tc vlan bridge test
From: Zahari Doychev @ 2019-09-02 18:10 UTC (permalink / raw)
  To: netdev
  Cc: makita.toshiaki, jiri, nikolay, simon.horman, roopa, bridge,
	Zahari Doychev, jhs, dsahern, xiyou.wangcong, johannes,
	alexei.starovoitov
In-Reply-To: <20190902181000.25638-1-zahari.doychev@linux.com>

Add bridge vlan aware forwarding test for vlans added by tc-act_vlan. The
forwarding is tested in two cases when the bridge protocol and outer vlan
tag protocol match and mismatch. The tests checks the correct usage of
skb->mac_len in the bridge code.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>

---
v2->v3:
 - selftest added
---
 .../forwarding/bridge_vlan_aware_tc_vlan.sh   | 187 ++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_vlan_aware_tc_vlan.sh

diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_aware_tc_vlan.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_aware_tc_vlan.sh
new file mode 100755
index 000000000000..215d6293fa54
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_aware_tc_vlan.sh
@@ -0,0 +1,187 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test uses the standard topology for testing bridge forwarding. See
+# README for more details.
+#
+# tc vlan actions are applied on one of the bridge ports and on other one
+# the corresponding vlan network devices are created.
+#
+
+ALL_TESTS="
+	test_tc_vlan_bridge_ipv4_forwarding
+	test_tc_vlan_bridge_ipv4_forwarding_proto
+	test_tc_vlan_bridge_ipv6_forwarding
+	test_tc_vlan_bridge_ipv6_forwarding_proto
+"
+
+NUM_NETIFS=4
+CHECK_TC="yes"
+source lib.sh
+
+h_create()
+{
+	local dev=$1; shift
+	local ip=$1; shift
+	local ip6=$1
+
+	simple_if_init $dev $ip $ip6
+}
+
+h_destroy()
+{
+	local dev=$1; shift
+	local ip=$1; shift
+	local ip6=$1
+
+	simple_if_fini $dev $ip $ip6
+}
+
+switch_create()
+{
+	ip link add dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1q \
+		mcast_snooping 0
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp2 master br0
+
+	ip link set dev br0 up
+	ip link set dev $swp1 up
+	ip link set dev $swp2 up
+
+	bridge vlan add dev $swp1 vid $svid master
+	bridge vlan add dev br0 vid $svid self
+	bridge vlan add dev $swp2 vid $svid master
+}
+
+switch_destroy()
+{
+	ip link set dev $swp2 down
+	ip link set dev $swp1 down
+
+	ip link del dev br0
+}
+
+tc_vlan_create()
+{
+	tc qdisc add dev $swp1 clsact
+
+	tc filter add dev $swp1 ingress pref 1 protocol all flower skip_hw \
+		action vlan push id $cvid protocol 802.1q pipe \
+		action vlan push id $svid protocol 802.1q
+
+	tc filter add dev $swp1 egress pref 1 protocol 802.1q \
+		flower skip_hw vlan_id $svid \
+		vlan_ethtype 802.1q cvlan_id $cvid \
+		action vlan pop pipe action vlan pop
+}
+
+tc_vlan_destroy()
+{
+	tc filter del dev $swp1 ingress pref 1
+	tc filter del dev $swp1 egress pref 1
+	tc qdisc del dev $swp1 clsact
+}
+
+vlan_create()
+{
+	local dev=$1; shift
+	local vid=$1; shift
+	local tpid=$1;
+
+	ip link add link $dev name $dev.$vid type vlan id $vid proto $tpid
+	ip link set dev $dev up
+	ip link set dev $dev.$vid
+}
+
+vlan_destroy()
+{
+	local dev=$1
+
+	ip link del dev $dev
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	cvid=10
+	svid=100
+
+	vrf_prepare
+
+	switch_create
+
+	tc_vlan_create
+
+	h_create $h1 192.0.2.1/24 2001:db8:1::1/64
+
+	vlan_create $h2 $svid 802.1q
+	vlan_create $h2.$svid $cvid 802.1q
+
+	h_create $h2.$svid.$cvid 192.0.2.2/24 2001:db8:1::2/64
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	tc_vlan_destroy
+
+	switch_destroy
+
+	h_destroy $h1 192.0.2.1/24 2001:db8:1::1/64
+	h_destroy $h2.$svid.$cvid 192.0.2.2/24 2001:db8:1::2/64
+
+	vlan_destroy $h2.$svid.$cvid
+	vlan_destroy $h2.$svid
+
+	ip link del dev $h1
+	ip link del dev $h2
+
+	vrf_cleanup
+}
+
+test_tc_vlan_bridge_ipv4_forwarding()
+{
+	ip link set dev br0 type bridge vlan_protocol 802.1q
+	ping_do $h1 192.0.2.2
+	check_err $? "Packets were not forwarded"
+	log_test "IPv4 tc-vlan bridge forwarding"
+}
+
+test_tc_vlan_bridge_ipv4_forwarding_proto()
+{
+	ip link set dev br0 type bridge vlan_protocol 802.1ad
+	ping_do $h1 192.0.2.2
+	check_err $? "Packets were not forwarded"
+	log_test "IPv4 tc-vlan bridge forwarding protocol mismatch"
+}
+
+test_tc_vlan_bridge_ipv6_forwarding()
+{
+	ip link set dev br0 type bridge vlan_protocol 802.1q
+	ping6_do $h1 2001:db8:1::2
+	check_err $? "Packets were not forwarded"
+	log_test "IPv6 tc-vlan bridge forwarding"
+}
+
+test_tc_vlan_bridge_ipv6_forwarding_proto()
+{
+	ip link set dev br0 type bridge vlan_protocol 802.1ad
+	ping6_do $h1  2001:db8:1::2
+	check_err $? "Packet were not forwarded"
+	log_test "IPv6 tc-vlan bridge forwarding protocol mismatch"
+}
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.22.0


^ permalink raw reply related

* [Bridge] [PATCH v3 1/2] net: bridge: use mac_len in bridge forwarding
From: Zahari Doychev @ 2019-09-02 18:09 UTC (permalink / raw)
  To: netdev
  Cc: makita.toshiaki, jiri, nikolay, simon.horman, roopa, bridge,
	Zahari Doychev, jhs, dsahern, xiyou.wangcong, johannes,
	alexei.starovoitov

The bridge code cannot forward packets from various paths that set up the
SKBs in different ways. Some of these packets get corrupted during the
forwarding as not always is just ETH_HLEN pulled at the front.

This happens e.g. when VLAN tags are pushed by using tc act_vlan on
ingress. Example configuration is provided below. The test setup consists
of two netdevs connected to external hosts. There is act_vlan on one of
them adding two vlan tags on ingress and removing the tags on egress.
The configuration is done using the following commands:

ip link add name br0 type bridge vlan_filtering 1
ip link set dev br0 up

ip link set dev net0 up
ip link set dev net0 master br0

ip link set dev net1 up
ip link set dev net1 master br0

bridge vlan add dev net0 vid 100 master
bridge vlan add dev br0 vid 100 self
bridge vlan add dev net1 vid 100 master

tc qdisc add dev net0 handle ffff: clsact
tc qdisc add dev net1 handle ffff: clsact

tc filter add dev net0 ingress pref 1 protocol all flower \
		  action vlan push id 10 pipe action vlan push id 100

tc filter add dev net0 egress pref 1 protocol 802.1q flower \
		  vlan_id 100 vlan_ethtype 802.1q cvlan_id 10 \
		  action vlan pop pipe action vlan pop

When using the setup above the packets coming on net0 get double tagged but
the MAC headers gets corrupted when the packets go out of net1.
The skb->data is pushed only by the ETH_HLEN length instead of mac_len in
br_dev_queue_push_xmit. This later causes the function validate_xmit_vlan
to insert the outer vlan tag behind the inner vlan tag as the skb->data
does not point to the start of packet.

The problem is fixed by using skb->mac_len instead of ETH_HLEN, which makes
sure that the skb headers are correctly restored. This usually does not
change anything, execpt the local bridge transmits which now need to set
the skb->mac_len correctly in br_dev_xmit, as well as the broken case noted
above.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>

---
v2->v3:
 - move cover letter description to commit message
---
 net/bridge/br_device.c  | 3 ++-
 net/bridge/br_forward.c | 4 ++--
 net/bridge/br_vlan.c    | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 681b72862c16..aeb77ff60311 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -55,8 +55,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	BR_INPUT_SKB_CB(skb)->frag_max_size = 0;
 
 	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
 	eth = eth_hdr(skb);
-	skb_pull(skb, ETH_HLEN);
+	skb_pull(skb, skb->mac_len);
 
 	if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, &vid))
 		goto out;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 86637000f275..edb4f3533f05 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -32,7 +32,7 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	skb_push(skb, ETH_HLEN);
+	skb_push(skb, skb->mac_len);
 	if (!is_skb_forwardable(skb->dev, skb))
 		goto drop;
 
@@ -94,7 +94,7 @@ static void __br_forward(const struct net_bridge_port *to,
 		net = dev_net(indev);
 	} else {
 		if (unlikely(netpoll_tx_running(to->br->dev))) {
-			skb_push(skb, ETH_HLEN);
+			skb_push(skb, skb->mac_len);
 			if (!is_skb_forwardable(skb->dev, skb))
 				kfree_skb(skb);
 			else
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index bb98984cd27d..419067b314d7 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -466,13 +466,14 @@ static bool __allowed_ingress(const struct net_bridge *br,
 		/* Tagged frame */
 		if (skb->vlan_proto != br->vlan_proto) {
 			/* Protocol-mismatch, empty out vlan_tci for new tag */
-			skb_push(skb, ETH_HLEN);
+			skb_push(skb, skb->mac_len);
 			skb = vlan_insert_tag_set_proto(skb, skb->vlan_proto,
 							skb_vlan_tag_get(skb));
 			if (unlikely(!skb))
 				return false;
 
 			skb_pull(skb, ETH_HLEN);
+			skb_reset_network_header(skb);
 			skb_reset_mac_len(skb);
 			*vid = 0;
 			tagged = false;
-- 
2.22.0


^ permalink raw reply related

* Re: [Bridge] [PATCH v4 2/2] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Florian Westphal @ 2019-08-31  8:43 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge,
	Florian Westphal, linux-kernel, Jozsef Kadlecsik, coreteam,
	netfilter-devel, David S. Miller, Pablo Neira Ayuso
In-Reply-To: <2ba876f9ad6597e640df68f09659dce3c4b5ce03.camel@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> > There are two solutions:
> > 1. The above patch, but use NF_ACCEPT instead
> > 2. keep the DROP, but move it below the call_ip6tables test,
> >    so that users can tweak call-ip6tables to accept packets.
> 
> Q: Does 2 mean that it will only be dropped if bridge intents to use
> host's ip6tables? Else, it will be accepted by previous if?

Yes, thats the idea: Let users decide if ipv6.disable or call-ip6tables
is more important to them.

^ permalink raw reply

* Re: [Bridge] [PATCH v5 1/1] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Florian Westphal @ 2019-08-31  8:38 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge,
	Florian Westphal, linux-kernel, Jozsef Kadlecsik, coreteam,
	netfilter-devel, David S. Miller, Pablo Neira Ayuso
In-Reply-To: <20190831044032.31931-1-leonardo@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> A kernel panic can happen if a host has disabled IPv6 on boot and have to
> process guest packets (coming from a bridge) using it's ip6tables.
> 
> IPv6 packets need to be dropped if the IPv6 module is not loaded, and the
> host ip6tables will be used.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
> Changes from v4:
> - Check if the host ip6tables is going to be used before testing 
>   ipv6 module presence 
> - Adds a warning about ipv6 module disabled.
> 
> 
>  net/bridge/br_netfilter_hooks.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index d3f9592f4ff8..af7800103e51 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
> @@ -496,6 +496,10 @@ static unsigned int br_nf_pre_routing(void *priv,
>  		if (!brnet->call_ip6tables &&
>  		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
>  			return NF_ACCEPT;
> +		if (!ipv6_mod_enabled()) {
> +			pr_warn_once("Module ipv6 is disabled, so call_ip6tables is not supported.");
> +			return NF_DROP;
> +		}

pr_warn_once needs a '\n'.  Pablo, can you mangle this locally when
applying?

Patch looks good to me, so:

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply

* Re: [Bridge] [PATCH v4 2/2] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Leonardo Bras @ 2019-08-31  4:42 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge, linux-kernel,
	Jozsef Kadlecsik, coreteam, netfilter-devel, David S. Miller,
	Pablo Neira Ayuso
In-Reply-To: <20190830205541.GR20113@breakpoint.cc>

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

On Fri, 2019-08-30 at 22:55 +0200, Florian Westphal wrote:
> Leonardo Bras <leonardo@linux.ibm.com> wrote:
> > A kernel panic can happen if a host has disabled IPv6 on boot and have to
> > process guest packets (coming from a bridge) using it's ip6tables.
> > 
> > IPv6 packets need to be dropped if the IPv6 module is not loaded.
> > 
> > Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> > ---
> >  net/bridge/br_netfilter_hooks.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> > index d3f9592f4ff8..5e8693730df1 100644
> > --- a/net/bridge/br_netfilter_hooks.c
> > +++ b/net/bridge/br_netfilter_hooks.c
> > @@ -493,6 +493,8 @@ static unsigned int br_nf_pre_routing(void *priv,
> >  	brnet = net_generic(state->net, brnf_net_id);
> >  	if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) ||
> >  	    is_pppoe_ipv6(skb, state->net)) {
> > +		if (!ipv6_mod_enabled())
> > +			return NF_DROP;
> >  		if (!brnet->call_ip6tables &&
> >  		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
> >  			return NF_ACCEPT;
> 
> No, thats too aggressive and turns the bridge into an ipv6 blackhole.
> 
> There are two solutions:
> 1. The above patch, but use NF_ACCEPT instead
> 2. keep the DROP, but move it below the call_ip6tables test,
>    so that users can tweak call-ip6tables to accept packets.

Q: Does 2 mean that it will only be dropped if bridge intents to use
host's ip6tables? Else, it will be accepted by previous if?

> Perhaps it would be good to also add a pr_warn_once() that
> tells that ipv6 was disabled on command line and
> call-ip6tables isn't supported in this configuration.
> 
Good idea, added.

> I would go with option two.
I think it's better than 1 too.

I sent a v5 with these changes:
https://lkml.org/lkml/2019/8/31/4

Thanks!

Leonardo Bras

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [Bridge] [PATCH v5 1/1] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Leonardo Bras @ 2019-08-31  4:40 UTC (permalink / raw)
  To: netfilter-devel, coreteam, bridge, netdev, linux-kernel
  Cc: Nikolay Aleksandrov, Roopa Prabhu, Florian Westphal,
	Jozsef Kadlecsik, Leonardo Bras, David S. Miller,
	Pablo Neira Ayuso

A kernel panic can happen if a host has disabled IPv6 on boot and have to
process guest packets (coming from a bridge) using it's ip6tables.

IPv6 packets need to be dropped if the IPv6 module is not loaded, and the
host ip6tables will be used.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
Changes from v4:
- Check if the host ip6tables is going to be used before testing 
  ipv6 module presence 
- Adds a warning about ipv6 module disabled.


 net/bridge/br_netfilter_hooks.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index d3f9592f4ff8..af7800103e51 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -496,6 +496,10 @@ static unsigned int br_nf_pre_routing(void *priv,
 		if (!brnet->call_ip6tables &&
 		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
 			return NF_ACCEPT;
+		if (!ipv6_mod_enabled()) {
+			pr_warn_once("Module ipv6 is disabled, so call_ip6tables is not supported.");
+			return NF_DROP;
+		}
 
 		nf_bridge_pull_encap_header_rcsum(skb);
 		return br_nf_pre_routing_ipv6(priv, skb, state);
-- 
2.20.1


^ permalink raw reply related

* Re: [Bridge] [PATCH v4 1/2] netfilter: Terminate rule eval if protocol=IPv6 and ipv6 module is disabled
From: Florian Westphal @ 2019-08-30 20:58 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge,
	Florian Westphal, linux-kernel, Jozsef Kadlecsik, coreteam,
	netfilter-devel, David S. Miller, Pablo Neira Ayuso
In-Reply-To: <20190830181354.26279-2-leonardo@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> If IPv6 is disabled on boot (ipv6.disable=1), but nft_fib_inet ends up
> dealing with a IPv6 packet, it causes a kernel panic in
> fib6_node_lookup_1(), crashing in bad_page_fault.
> 
> The panic is caused by trying to deference a very low address (0x38
> in ppc64le), due to ipv6.fib6_main_tbl = NULL.
> BUG: Kernel NULL pointer dereference at 0x00000038
> 
> The kernel panic was reproduced in a host that disabled IPv6 on boot and
> have to process guest packets (coming from a bridge) using it's ip6tables.
> 
> Terminate rule evaluation when packet protocol is IPv6 but the ipv6 module
> is not loaded.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply

* Re: [Bridge] [PATCH v4 2/2] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Florian Westphal @ 2019-08-30 20:55 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Nikolay Aleksandrov, netdev, Roopa Prabhu, bridge,
	Florian Westphal, linux-kernel, Jozsef Kadlecsik, coreteam,
	netfilter-devel, David S. Miller, Pablo Neira Ayuso
In-Reply-To: <20190830181354.26279-3-leonardo@linux.ibm.com>

Leonardo Bras <leonardo@linux.ibm.com> wrote:
> A kernel panic can happen if a host has disabled IPv6 on boot and have to
> process guest packets (coming from a bridge) using it's ip6tables.
> 
> IPv6 packets need to be dropped if the IPv6 module is not loaded.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>  net/bridge/br_netfilter_hooks.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index d3f9592f4ff8..5e8693730df1 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
> @@ -493,6 +493,8 @@ static unsigned int br_nf_pre_routing(void *priv,
>  	brnet = net_generic(state->net, brnf_net_id);
>  	if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) ||
>  	    is_pppoe_ipv6(skb, state->net)) {
> +		if (!ipv6_mod_enabled())
> +			return NF_DROP;
>  		if (!brnet->call_ip6tables &&
>  		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
>  			return NF_ACCEPT;

No, thats too aggressive and turns the bridge into an ipv6 blackhole.

There are two solutions:
1. The above patch, but use NF_ACCEPT instead
2. keep the DROP, but move it below the call_ip6tables test,
   so that users can tweak call-ip6tables to accept packets.

Perhaps it would be good to also add a pr_warn_once() that
tells that ipv6 was disabled on command line and
call-ip6tables isn't supported in this configuration.

I would go with option two.

^ permalink raw reply

* [Bridge] [PATCH v4 2/2] net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not loaded
From: Leonardo Bras @ 2019-08-30 18:13 UTC (permalink / raw)
  To: netfilter-devel, coreteam, bridge, netdev, linux-kernel
  Cc: Nikolay Aleksandrov, Roopa Prabhu, Florian Westphal,
	Jozsef Kadlecsik, Leonardo Bras, David S. Miller,
	Pablo Neira Ayuso
In-Reply-To: <20190830181354.26279-1-leonardo@linux.ibm.com>

A kernel panic can happen if a host has disabled IPv6 on boot and have to
process guest packets (coming from a bridge) using it's ip6tables.

IPv6 packets need to be dropped if the IPv6 module is not loaded.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 net/bridge/br_netfilter_hooks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index d3f9592f4ff8..5e8693730df1 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -493,6 +493,8 @@ static unsigned int br_nf_pre_routing(void *priv,
 	brnet = net_generic(state->net, brnf_net_id);
 	if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) ||
 	    is_pppoe_ipv6(skb, state->net)) {
+		if (!ipv6_mod_enabled())
+			return NF_DROP;
 		if (!brnet->call_ip6tables &&
 		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
 			return NF_ACCEPT;
-- 
2.20.1


^ permalink raw reply related

* [Bridge] [PATCH v4 1/2] netfilter: Terminate rule eval if protocol=IPv6 and ipv6 module is disabled
From: Leonardo Bras @ 2019-08-30 18:13 UTC (permalink / raw)
  To: netfilter-devel, coreteam, bridge, netdev, linux-kernel
  Cc: Nikolay Aleksandrov, Roopa Prabhu, Florian Westphal,
	Jozsef Kadlecsik, Leonardo Bras, David S. Miller,
	Pablo Neira Ayuso
In-Reply-To: <20190830181354.26279-1-leonardo@linux.ibm.com>

If IPv6 is disabled on boot (ipv6.disable=1), but nft_fib_inet ends up
dealing with a IPv6 packet, it causes a kernel panic in
fib6_node_lookup_1(), crashing in bad_page_fault.

The panic is caused by trying to deference a very low address (0x38
in ppc64le), due to ipv6.fib6_main_tbl = NULL.
BUG: Kernel NULL pointer dereference at 0x00000038

The kernel panic was reproduced in a host that disabled IPv6 on boot and
have to process guest packets (coming from a bridge) using it's ip6tables.

Terminate rule evaluation when packet protocol is IPv6 but the ipv6 module
is not loaded.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 net/netfilter/nft_fib_netdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nft_fib_netdev.c b/net/netfilter/nft_fib_netdev.c
index 2cf3f32fe6d2..a2e726ae7f07 100644
--- a/net/netfilter/nft_fib_netdev.c
+++ b/net/netfilter/nft_fib_netdev.c
@@ -14,6 +14,7 @@
 #include <linux/netfilter/nf_tables.h>
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
+#include <net/ipv6.h>

 #include <net/netfilter/nft_fib.h>

@@ -34,6 +35,8 @@ static void nft_fib_netdev_eval(const struct nft_expr *expr,
 		}
 		break;
 	case ETH_P_IPV6:
+		if (!ipv6_mod_enabled())
+			break;
 		switch (priv->result) {
 		case NFT_FIB_RESULT_OIF:
 		case NFT_FIB_RESULT_OIFNAME:
-- 
2.20.1

^ permalink raw reply related

* [Bridge] [PATCH v4 0/2] Drop IPV6 packets if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-30 18:13 UTC (permalink / raw)
  To: netfilter-devel, coreteam, bridge, netdev, linux-kernel
  Cc: Nikolay Aleksandrov, Roopa Prabhu, Florian Westphal,
	Jozsef Kadlecsik, Leonardo Bras, David S. Miller,
	Pablo Neira Ayuso

This patchset was prevously a single patch named:
- netfilter: nf_tables: fib: Drop IPV6 packets if IPv6 is disabled on boot

It fixes a bug where a host, with IPv6 disabled on boot, has to deal with
guest IPv6 packets, that comes from a bridge interface.
When these packets reach the host ip6tables they cause a kernel panic.

---
Changes from v3:
- Move drop logic from nft_fib6_eval{,_type} to nft_fib_netdev_eval
- Add another patch to drop ipv6 packets from bridge when ipv6 disabled  

Changes from v2:
- Replace veredict.code from NF_DROP to NFT_BREAK
- Updated commit message (s/package/packet)

Changes from v1:
- Move drop logic from nft_fib_inet_eval() to nft_fib6_eval{,_type}
so it can affect other usages of these functions.

Leonardo Bras (2):
  netfilter: Terminate rule eval if protocol=IPv6 and ipv6 module is
    disabled
  net: br_netfiler_hooks: Drops IPv6 packets if IPv6 module is not
    loaded

 net/bridge/br_netfilter_hooks.c | 2 ++
 net/netfilter/nft_fib_netdev.c  | 3 +++
 2 files changed, 5 insertions(+)

-- 
2.20.1


^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Horatiu Vultur @ 2019-08-28 21:53 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: alexandre.belloni, f.fainelli, nikolay, netdev, roopa, bridge,
	linux-kernel, UNGLinuxDriver, allan.nielsen, davem
In-Reply-To: <20190827131824.GC11471@lunn.ch>

The 08/27/2019 15:18, Andrew Lunn wrote:
> External E-Mail
> 
> 
> > That sounds like a great idea. I was expecting to add this logic in the
> > set_rx_mode function of the driver. But unfortunetly, I got the calls to
> > this function before the dev->promiscuity is updated or not to get the
> > call at all. For example in case the port is member of a bridge and I try
> > to enable the promisc mode.
> 
> Hi Horatiu

Hi Andrew,
> 
> What about the notifier? Is it called in all the conditions you need
> to know about?

I had a look also over this but without any luck. I can get good
information from this, like knowing when a port is added or removed from
the bridge(NETDEV_CHANGEUPPER). But not in case the promisc is change by
an application(eg. tcpdump). In this case if port is part of the bridge
and then promisc is enable, then there is no callback to the driver or
any notifications.
> 
> Or, you could consider adding a new switchdev call to pass this
> information to any switchdev driver which is interested in the
> information.

Having this new switchdev call and listening for NETDEV_CHANGEUPPER
seems to be enough to know when a port needs to go in promisc mode.
> 
> At the moment, the DSA driver core does not pass onto the driver it
> should put a port into promisc mode. So pcap etc, will only see
> traffic directed to the CPU, not all the traffic ingressing the
> interface. If you put the needed core infrastructure into place, we
> could plumb it down from the DSA core to DSA drivers.
> 
> Having said that, i don't actually know if the Marvell switches
> support this. Forward using the ATU and send a copy to the CPU?  What
> switches tend to support is port mirroring, sending all the traffic
> out another port. A couple of DSA drivers support that, via TC.
> 
> 	Andrew
> 

-- 
/Horatiu

^ permalink raw reply

* Re: [Bridge] [PATCH] bridge:fragmented packets dropped by bridge
From: Rundong Ge @ 2019-08-28  9:21 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: yoshfuji, netdev, Roopa Prabhu, bridge, Florian Westphal,
	linux-kernel, kadlec, Nikolay Aleksandrov, coreteam,
	netfilter-devel, kuznet, davem, Pablo Neira Ayuso
In-Reply-To: <nycvar.YFH.7.76.1908260955400.22383@n3.vanv.qr>

Jan Engelhardt <jengelh@inai.de> 于2019年8月26日周一 下午3:59写道：
>
>
> On Tuesday 2019-07-30 14:35, Florian Westphal wrote:
> >Rundong Ge <rdong.ge@gmail.com> wrote:
> >> Given following setup:
> >> -modprobe br_netfilter
> >> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> >> -brctl addbr br0
> >> -brctl addif br0 enp2s0
> >> -brctl addif br0 enp3s0
> >> -brctl addif br0 enp6s0
> >> -ifconfig enp2s0 mtu 1300
> >> -ifconfig enp3s0 mtu 1500
> >> -ifconfig enp6s0 mtu 1500
> >> -ifconfig br0 up
> >>
> >>                  multi-port
> >> mtu1500 - mtu1500|bridge|1500 - mtu1500
> >>   A                  |            B
> >>                    mtu1300
> >
> >How can a bridge forward a frame from A/B to mtu1300?
>
> There might be a misunderstanding here judging from the shortness of this
> thread.
>
> I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0
> (in essence the third bridge port if you so wish) itself has MTU 1300.
>
> Therefore, frame forwarding from eth0 to eth1 should succeed, since the
> 1300-byte MTU is only relevant if the bridge decides the packet needs to be
> locally delivered.

Under this setup when I do "ping B -l 2000" from A, the fragmented
packets will be dropped by bridge.
When the "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, bridge
will do defragment at PREROUTING and re-fragment at POSTROUTING. At
the re-fragment bridge will check if the max frag size is larger than
the bridge's MTU in  br_nf_ip_fragment(), if it is true packets will
be dropped.
And this patch use the outdev's MTU instead of the bridge's MTU to do
the br_nf_ip_fragment.

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Vladimir Oltean @ 2019-08-27 14:55 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexandre Belloni, Florian Fainelli, David S. Miller, nikolay,
	netdev, Roopa Prabhu, bridge, lkml, UNGLinuxDriver,
	Allan W. Nielsen, Horatiu Vultur
In-Reply-To: <20190827131824.GC11471@lunn.ch>

On Tue, 27 Aug 2019 at 16:20, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > That sounds like a great idea. I was expecting to add this logic in the
> > set_rx_mode function of the driver. But unfortunetly, I got the calls to
> > this function before the dev->promiscuity is updated or not to get the
> > call at all. For example in case the port is member of a bridge and I try
> > to enable the promisc mode.
>
> Hi Horatiu
>
> What about the notifier? Is it called in all the conditions you need
> to know about?
>
> Or, you could consider adding a new switchdev call to pass this
> information to any switchdev driver which is interested in the
> information.
>
> At the moment, the DSA driver core does not pass onto the driver it
> should put a port into promisc mode. So pcap etc, will only see
> traffic directed to the CPU, not all the traffic ingressing the
> interface. If you put the needed core infrastructure into place, we
> could plumb it down from the DSA core to DSA drivers.
>
> Having said that, i don't actually know if the Marvell switches
> support this. Forward using the ATU and send a copy to the CPU?  What
> switches tend to support is port mirroring, sending all the traffic
> out another port. A couple of DSA drivers support that, via TC.
>

But the CPU port is not a valid destination for port mirroring in DSA,
I might add.

>         Andrew

Regards,
-Vladimir

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Andrew Lunn @ 2019-08-27 13:18 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: alexandre.belloni, f.fainelli, nikolay, netdev, roopa, bridge,
	linux-kernel, UNGLinuxDriver, allan.nielsen, davem
In-Reply-To: <20190827101033.g2cb6j2j4kuyzh2a@soft-dev3.microsemi.net>

> That sounds like a great idea. I was expecting to add this logic in the
> set_rx_mode function of the driver. But unfortunetly, I got the calls to
> this function before the dev->promiscuity is updated or not to get the
> call at all. For example in case the port is member of a bridge and I try
> to enable the promisc mode.

Hi Horatiu

What about the notifier? Is it called in all the conditions you need
to know about?

Or, you could consider adding a new switchdev call to pass this
information to any switchdev driver which is interested in the
information.

At the moment, the DSA driver core does not pass onto the driver it
should put a port into promisc mode. So pcap etc, will only see
traffic directed to the CPU, not all the traffic ingressing the
interface. If you put the needed core infrastructure into place, we
could plumb it down from the DSA core to DSA drivers.

Having said that, i don't actually know if the Marvell switches
support this. Forward using the ATU and send a copy to the CPU?  What
switches tend to support is port mirroring, sending all the traffic
out another port. A couple of DSA drivers support that, via TC.

	Andrew

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Horatiu Vultur @ 2019-08-27 10:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: alexandre.belloni, f.fainelli, nikolay, netdev, roopa, bridge,
	linux-kernel, UNGLinuxDriver, allan.nielsen, davem
In-Reply-To: <20190826123811.GA13411@lunn.ch>

The 08/26/2019 14:38, Andrew Lunn wrote:
> External E-Mail
> 
> 
> On Mon, Aug 26, 2019 at 10:11:12AM +0200, Horatiu Vultur wrote:
> > When a network port is added to a bridge then the port is added in
> > promisc mode. Some HW that has bridge capabilities(can learn, forward,
> > flood etc the frames) they are disabling promisc mode in the network
> > driver when the port is added to the SW bridge.
> > 
> > This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
> > that have this feature will not be set in promisc mode when they are
> > added to a SW bridge.
> > 
> > In this way the HW that has bridge capabilities don't need to send all the
> > traffic to the CPU and can also implement the promisc mode and toggle it
> > using the command 'ip link set dev swp promisc on'
> 
> Hi Horatiu

Hi Andrew,
> 
> I'm still not convinced this is needed. The model is, the hardware is
> there to accelerate what Linux can do in software. Any peculiarities
> of the accelerator should be hidden in the driver.  If the accelerator
> can do its job without needing promisc mode, do that in the driver.
Thanks for the model description. I will keep in my mind for the next
patches that I will do.
> 
> So you are trying to differentiate between promisc mode because the
> interface is a member of a bridge, and promisc mode because some
> application, like pcap, has asked for promisc mode.
> 
> dev->promiscuity is a counter. So what you can do it look at its
> value, and how the interface is being used. If the interface is not a
> member of a bridge, and the count > 0, enable promisc mode in the
> accelerator. If the interface is a member of a bridge, and the count >
> 1, enable promisc mode in the accelerator.
That sounds like a great idea. I was expecting to add this logic in the
set_rx_mode function of the driver. But unfortunetly, I got the calls to
this function before the dev->promiscuity is updated or not to get the
call at all. For example in case the port is member of a bridge and I try
to enable the promisc mode.

> 
>    Andrew
> 
> 

-- 
/Horatiu

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: David Miller @ 2019-08-26 21:13 UTC (permalink / raw)
  To: andrew
  Cc: alexandre.belloni, f.fainelli, nikolay, netdev, roopa, bridge,
	linux-kernel, UNGLinuxDriver, allan.nielsen, horatiu.vultur
In-Reply-To: <20190826123811.GA13411@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 26 Aug 2019 14:38:11 +0200

> I'm still not convinced this is needed. The model is, the hardware is
> there to accelerate what Linux can do in software. Any peculiarities
> of the accelerator should be hidden in the driver.  If the accelerator
> can do its job without needing promisc mode, do that in the driver.

I completely agree.

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Florian Fainelli @ 2019-08-26 17:01 UTC (permalink / raw)
  To: Andrew Lunn, Horatiu Vultur
  Cc: alexandre.belloni, nikolay, netdev, roopa, bridge, linux-kernel,
	UNGLinuxDriver, allan.nielsen, davem
In-Reply-To: <20190826123811.GA13411@lunn.ch>

On 8/26/19 5:38 AM, Andrew Lunn wrote:
> On Mon, Aug 26, 2019 at 10:11:12AM +0200, Horatiu Vultur wrote:
>> When a network port is added to a bridge then the port is added in
>> promisc mode. Some HW that has bridge capabilities(can learn, forward,
>> flood etc the frames) they are disabling promisc mode in the network
>> driver when the port is added to the SW bridge.
>>
>> This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
>> that have this feature will not be set in promisc mode when they are
>> added to a SW bridge.
>>
>> In this way the HW that has bridge capabilities don't need to send all the
>> traffic to the CPU and can also implement the promisc mode and toggle it
>> using the command 'ip link set dev swp promisc on'
> 
> Hi Horatiu
> 
> I'm still not convinced this is needed. The model is, the hardware is
> there to accelerate what Linux can do in software. Any peculiarities
> of the accelerator should be hidden in the driver.  If the accelerator
> can do its job without needing promisc mode, do that in the driver.
> 
> So you are trying to differentiate between promisc mode because the
> interface is a member of a bridge, and promisc mode because some
> application, like pcap, has asked for promisc mode.
> 
> dev->promiscuity is a counter. So what you can do it look at its
> value, and how the interface is being used. If the interface is not a
> member of a bridge, and the count > 0, enable promisc mode in the
> accelerator. If the interface is a member of a bridge, and the count >
> 1, enable promisc mode in the accelerator.

That is an excellent suggestion actually.

Horatiu, the other issue with your approach here is that the features
don't propagate to/from lower/upper/real devices, so if e.g.: you have a
VLAN interface enslaved as a part of the bridge, or a bond, or a tunnel
interface, the logic won't make us check NETIF_F_HW_BR_CAP because those
virtual network devices won't inherit it from their real device. I am
not suggesting you fix this with your patch series, but rather, seek a
driver local solution.
-- 
Florian

^ permalink raw reply

* Re: [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Andrew Lunn @ 2019-08-26 12:38 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: alexandre.belloni, f.fainelli, nikolay, netdev, roopa, bridge,
	linux-kernel, UNGLinuxDriver, allan.nielsen, davem
In-Reply-To: <1566807075-775-1-git-send-email-horatiu.vultur@microchip.com>

On Mon, Aug 26, 2019 at 10:11:12AM +0200, Horatiu Vultur wrote:
> When a network port is added to a bridge then the port is added in
> promisc mode. Some HW that has bridge capabilities(can learn, forward,
> flood etc the frames) they are disabling promisc mode in the network
> driver when the port is added to the SW bridge.
> 
> This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
> that have this feature will not be set in promisc mode when they are
> added to a SW bridge.
> 
> In this way the HW that has bridge capabilities don't need to send all the
> traffic to the CPU and can also implement the promisc mode and toggle it
> using the command 'ip link set dev swp promisc on'

Hi Horatiu

I'm still not convinced this is needed. The model is, the hardware is
there to accelerate what Linux can do in software. Any peculiarities
of the accelerator should be hidden in the driver.  If the accelerator
can do its job without needing promisc mode, do that in the driver.

So you are trying to differentiate between promisc mode because the
interface is a member of a bridge, and promisc mode because some
application, like pcap, has asked for promisc mode.

dev->promiscuity is a counter. So what you can do it look at its
value, and how the interface is being used. If the interface is not a
member of a bridge, and the count > 0, enable promisc mode in the
accelerator. If the interface is a member of a bridge, and the count >
1, enable promisc mode in the accelerator.

   Andrew

^ permalink raw reply

* [Bridge] [PATCH v2 3/3] net: mscc: Implement promisc mode.
From: Horatiu Vultur @ 2019-08-26  8:11 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, andrew, f.fainelli, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566807075-775-1-git-send-email-horatiu.vultur@microchip.com>

Before when a port was added to a bridge then the port was added in
promisc mode. But because of the patches:
commit e6300374f3be6 ("net: Add NETIF_HW_BR_CAP feature")
commit 764866d46cc81 ("net: mscc: Use NETIF_F_HW_BR_CAP")'

The port is not needed to be in promisc mode to be part of the bridge.
So it is possible to togle the promisc mode of the port even if it is or
not part of the bridge.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 7d7c94b..8a18eef 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -691,6 +691,25 @@ static void ocelot_set_rx_mode(struct net_device *dev)
 	__dev_mc_sync(dev, ocelot_mc_sync, ocelot_mc_unsync);
 }
 
+static void ocelot_change_rx_flags(struct net_device *dev, int flags)
+{
+	struct ocelot_port *port = netdev_priv(dev);
+	struct ocelot *ocelot = port->ocelot;
+	u32 val;
+
+	if (!(flags & IFF_PROMISC))
+		return;
+
+	val = ocelot_read_gix(ocelot, ANA_PORT_CPU_FWD_CFG,
+			      port->chip_port);
+	if (dev->flags & IFF_PROMISC)
+		val |= ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA;
+	else
+		val &= ~(ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA);
+
+	ocelot_write_gix(ocelot, val, ANA_PORT_CPU_FWD_CFG, port->chip_port);
+}
+
 static int ocelot_port_get_phys_port_name(struct net_device *dev,
 					  char *buf, size_t len)
 {
@@ -1070,6 +1089,7 @@ static const struct net_device_ops ocelot_port_netdev_ops = {
 	.ndo_stop			= ocelot_port_stop,
 	.ndo_start_xmit			= ocelot_port_xmit,
 	.ndo_set_rx_mode		= ocelot_set_rx_mode,
+	.ndo_change_rx_flags		= ocelot_change_rx_flags,
 	.ndo_get_phys_port_name		= ocelot_port_get_phys_port_name,
 	.ndo_set_mac_address		= ocelot_port_set_mac_address,
 	.ndo_get_stats64		= ocelot_get_stats64,
-- 
2.7.4


^ permalink raw reply related

* [Bridge] [PATCH v2 2/3] net: mscc: Use NETIF_F_HW_BR_CAP
From: Horatiu Vultur @ 2019-08-26  8:11 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, andrew, f.fainelli, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566807075-775-1-git-send-email-horatiu.vultur@microchip.com>

Enable NETIF_F_HW_BR_CAP feature for ocelot.
Because the HW can learn and flood the frames, so there is no need for SW
bridge to do this.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 4d1bce4..7d7c94b 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -2017,8 +2017,10 @@ int ocelot_probe_port(struct ocelot *ocelot, u8 port,
 	dev->ethtool_ops = &ocelot_ethtool_ops;
 
 	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXFCS |
-		NETIF_F_HW_TC;
-	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_TC;
+		NETIF_F_HW_TC | NETIF_F_HW_BR_CAP;
+	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_TC |
+		NETIF_F_HW_BR_CAP;
+	dev->priv_flags |= IFF_UNICAST_FLT;
 
 	memcpy(dev->dev_addr, ocelot->base_mac, ETH_ALEN);
 	dev->dev_addr[ETH_ALEN - 1] += port;
-- 
2.7.4


^ permalink raw reply related

* [Bridge] [PATCH v2 1/3] net: Add NETIF_HW_BR_CAP feature
From: Horatiu Vultur @ 2019-08-26  8:11 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, andrew, f.fainelli, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566807075-775-1-git-send-email-horatiu.vultur@microchip.com>

This patch adds a netdev feature to allow the SW bridge not to set all the
slave interfaces in promisc mode if the HW is capable of learning and
flooading the frames.

The current implementation adds all the bridge ports in promisc mode. Even
if the HW has bridge capabilities(can learn and flood frames). Then all the
frames will be copy to the CPU even if there are cases where there is no
need for this.

For example in the following scenario:
+----------------------------------+
|               SW BR              |
+----------------------------------+
|      |                  |
|      |        +------------------+
|      |        |       HW BR      |
|      |        +------------------+
|      |          |            |
+      +          +            +
p1     p2         p3           p4

Case A: There is a SW bridge with the ports p1 and p2
Case B: There is a SW bridge with the ports p3 and p4.
Case C: THere is a SW bridge with the ports p2, p3 and p4.

For case A, the HW can't do learning and flooding of the frames. Therefore
all the frames need to be copied to the CPU to allow the SW bridge to
decide what do do with the frame(forward, flood, copy to the upper layers,
etc..).

For case B, there is HW support to learn and flood the frames. In this case
there is no point to send all the frames to the CPU(except for frames that
need to go to CPU and flooded frames if flooding is enabled). Because the
HW will already forward the frame to the correct network port, then the
SW bridge will not have anything to do. It would just use CPU cycles and
then drop the frame. The reason for dropping the frame is that the network
driver will set the flag fwd_offload_mark and then SW bridge will skip all
the ports that have the same parent_id as the port that received the frame.
Which is this case.

For case C, there is HW support to learn and flood frames for ports p3 and
p4 while p2 doesn't have HW support. In this case the port p2 needs to be
in promisc mode to allow SW bridge to do the learning and flooding of the
frames while ports p3 and p4 they don't need to be in promisc mode.
The ports p3 and p4 need to make sure that the CPU is in flood mask and
need to know which addresses can be access through SW bridge so it could
send those frames to CPU port. So it would allow the SW bridge to send to
the correct network port.

A workaround for all these cases is not to set the network port in
promisc mode if it is a bridge port, which is the case for mlxsw. Or not
to implement it at all, which is the case for ocelot. But the disadvantage
of this approach is that the network bridge ports can not be set in promisc
mode if there is a need to monitor all the traffic on that port using the
command 'ip link set dev swp promisc on'. This patch adds also support for
this case.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 include/linux/netdev_features.h |  6 ++++++
 net/bridge/br_if.c              | 11 ++++++++++-
 net/core/ethtool.c              |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 4b19c54..b5a3463 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -78,6 +78,11 @@ enum {
 	NETIF_F_HW_TLS_TX_BIT,		/* Hardware TLS TX offload */
 	NETIF_F_HW_TLS_RX_BIT,		/* Hardware TLS RX offload */

+	NETIF_F_HW_BR_CAP_BIT,		/* Hardware is capable to behave as a
+					 * bridge to learn and switch frames
+					 */
+
+
 	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
 	NETIF_F_HW_TLS_RECORD_BIT,	/* Offload TLS record */

@@ -150,6 +155,7 @@ enum {
 #define NETIF_F_GSO_UDP_L4	__NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX	__NETIF_F(HW_TLS_TX)
 #define NETIF_F_HW_TLS_RX	__NETIF_F(HW_TLS_RX)
+#define NETIF_F_HW_BR_CAP	__NETIF_F(HW_BR_CAP)

 /* Finds the next feature with the highest number of the range of start till 0.
  */
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4fe30b1..93bfc55 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -161,7 +161,16 @@ void br_manage_promisc(struct net_bridge *br)
 			    (br->auto_cnt == 1 && br_auto_port(p)))
 				br_port_clear_promisc(p);
 			else
-				br_port_set_promisc(p);
+				/* If the HW has bridge capabilities to learn
+				 * and flood the frames then there is no need
+				 * to copy all the frames to the SW to do the
+				 * same. Because the HW already switched the
+				 * frame and then there is nothing to do for
+				 * the SW bridge. The SW will just use CPU
+				 * and it would drop the frame.
+				 */
+				if (!(p->dev->features & NETIF_F_HW_BR_CAP))
+					br_port_set_promisc(p);
 		}
 	}
 }
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6288e69..10430fe 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -111,6 +111,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_HW_TLS_RECORD_BIT] =	"tls-hw-record",
 	[NETIF_F_HW_TLS_TX_BIT] =	 "tls-hw-tx-offload",
 	[NETIF_F_HW_TLS_RX_BIT] =	 "tls-hw-rx-offload",
+	[NETIF_F_HW_BR_CAP_BIT] =	 "bridge-capabilities-offload",
 };

 static const char
-- 
2.7.4

^ permalink raw reply related

* [Bridge] [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Horatiu Vultur @ 2019-08-26  8:11 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, andrew, f.fainelli, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur

When a network port is added to a bridge then the port is added in
promisc mode. Some HW that has bridge capabilities(can learn, forward,
flood etc the frames) they are disabling promisc mode in the network
driver when the port is added to the SW bridge.

This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
that have this feature will not be set in promisc mode when they are
added to a SW bridge.

In this way the HW that has bridge capabilities don't need to send all the
traffic to the CPU and can also implement the promisc mode and toggle it
using the command 'ip link set dev swp promisc on'

v1 -> v2
  - rename feature to NETIF_F_HW_BR_CAP
  - add better description in the commit message and in the code
  - remove the check that all network driver have same netdev_ops and
    just check for the feature NETIF_F_HW_BR_CAP when setting the network
    port in promisc mode.

Horatiu Vultur (3):
  net: Add NETIF_HW_BR_CAP feature
  net: mscc: Use NETIF_F_HW_BR_CAP
  net: mscc: Implement promisc mode.

 drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
 include/linux/netdev_features.h    |  6 ++++++
 net/bridge/br_if.c                 | 11 ++++++++++-
 net/core/ethtool.c                 |  1 +
 4 files changed, 41 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox