* [RFC PATCH net-next] bridge: ability to disable forwarding on a port
@ 2015-01-17 7:32 roopa
2015-01-17 21:14 ` Arad, Ronen
2015-01-18 1:05 ` Scott Feldman
0 siblings, 2 replies; 10+ messages in thread
From: roopa @ 2015-01-17 7:32 UTC (permalink / raw)
To: stephen, davem, jhs, sfeldma, jiri, ronen.arad, tgraf,
john.fastabend, vyasevic
Cc: netdev, wkok, gospo
From: Roopa Prabhu <roopa@cumulusnetworks.com>
On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
there is a need to not re-forward the frames that come up to the
kernel in software.
Typically these are broadcast or multicast packets forwarded by the
hardware to multiple destination ports including sending a copy of
the packet to the kernel (e.g. an arp broadcast).
The bridge driver will try to forward the packet again, resulting in
two copies of the same packet.
These packets can also come up to the kernel for logging when they hit
a LOG acl in hardware.
This patch makes forwarding a flag on the port similar to
learn and flood and drops the packet just before forwarding.
(The forwarding disable on a bridge is tested to work on our boxes.
The bridge port flag addition is only compile tested.
This will need to be further refined to cover cases where a non-switch port
is bridged to a switch port etc. We will submit more patches to cover
all cases if we agree on this approach).
Other ways to solve the same problem could be to:
- use the offload feature flag on these switch ports to avoid the
re-forward:
https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
- Or the switch driver can mark or set a flag in the skb, which the bridge
driver can use to avoid a re-forward.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
include/linux/if_bridge.h | 3 ++-
include/uapi/linux/if_link.h | 1 +
net/bridge/br_forward.c | 13 +++++++++++++
net/bridge/br_if.c | 2 +-
net/bridge/br_netlink.c | 4 +++-
net/bridge/br_sysfs_if.c | 1 +
net/core/rtnetlink.c | 4 +++-
7 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 0a8ce76..c79f4eb 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -40,10 +40,11 @@ struct br_ip_list {
#define BR_ADMIN_COST BIT(4)
#define BR_LEARNING BIT(5)
#define BR_FLOOD BIT(6)
-#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING)
#define BR_PROMISC BIT(7)
#define BR_PROXYARP BIT(8)
#define BR_LEARNING_SYNC BIT(9)
+#define BR_FORWARD BIT(10)
+#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING | BR_FORWARD)
extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..d394625 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -246,6 +246,7 @@ enum {
IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
IFLA_BRPORT_PROXYARP, /* proxy ARP */
IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
+ IFLA_BRPORT_FORWARD, /* enable forwarding on a device */
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index f96933a..98c41c8 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -81,10 +81,23 @@ static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
br_forward_finish);
}
+int br_hw_forward_finish(struct sk_buff *skb)
+{
+ kfree_skb(skb);
+
+ return 0;
+}
+
static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
{
struct net_device *indev;
+ if (!(to->flags & BR_FORWARD)) {
+ NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, skb->dev, to->dev,
+ br_hw_forward_finish);
+ return;
+ }
+
if (skb_warn_if_lro(skb)) {
kfree_skb(skb);
return;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index ed307db..c14360b 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -330,7 +330,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
p->path_cost = port_cost(dev);
p->priority = 0x8000 >> BR_PORT_BITS;
p->port_no = index;
- p->flags = BR_LEARNING | BR_FLOOD;
+ p->flags = BR_LEARNING | BR_FLOOD | BR_FORWARD;
br_init_port(p);
br_set_state(p, BR_STATE_DISABLED);
br_stp_port_timer_init(p);
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 9f5eb55..2d96033 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -61,7 +61,8 @@ static int br_port_fill_attrs(struct sk_buff *skb,
nla_put_u8(skb, IFLA_BRPORT_FAST_LEAVE, !!(p->flags & BR_MULTICAST_FAST_LEAVE)) ||
nla_put_u8(skb, IFLA_BRPORT_LEARNING, !!(p->flags & BR_LEARNING)) ||
nla_put_u8(skb, IFLA_BRPORT_UNICAST_FLOOD, !!(p->flags & BR_FLOOD)) ||
- nla_put_u8(skb, IFLA_BRPORT_PROXYARP, !!(p->flags & BR_PROXYARP)))
+ nla_put_u8(skb, IFLA_BRPORT_PROXYARP, !!(p->flags & BR_PROXYARP)) ||
+ nla_put_u8(skb, IFLA_BRPORT_FORWARD, !!(p->flags & BR_FORWARD)))
return -EMSGSIZE;
return 0;
@@ -335,6 +336,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
+ br_set_port_flag(p, tb, IFLA_BRPORT_FORWARD, BR_FORWARD);
if (tb[IFLA_BRPORT_COST]) {
err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 2de5d91..271e9ab 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -171,6 +171,7 @@ BRPORT_ATTR_FLAG(root_block, BR_ROOT_BLOCK);
BRPORT_ATTR_FLAG(learning, BR_LEARNING);
BRPORT_ATTR_FLAG(unicast_flood, BR_FLOOD);
BRPORT_ATTR_FLAG(proxyarp, BR_PROXYARP);
+BRPORT_ATTR_FLAG(forward, BR_FORWARD);
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d06107d..e6a93eb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2785,7 +2785,9 @@ int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
brport_nla_put_flag(skb, flags, mask,
IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD) ||
brport_nla_put_flag(skb, flags, mask,
- IFLA_BRPORT_PROXYARP, BR_PROXYARP)) {
+ IFLA_BRPORT_PROXYARP, BR_PROXYARP) ||
+ brport_nla_put_flag(skb, flags, mask,
+ IFLA_BRPORT_FORWARD, BR_FORWARD)) {
nla_nest_cancel(skb, protinfo);
goto nla_put_failure;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-17 7:32 [RFC PATCH net-next] bridge: ability to disable forwarding on a port roopa
@ 2015-01-17 21:14 ` Arad, Ronen
2015-01-18 8:48 ` roopa
2015-01-18 1:05 ` Scott Feldman
1 sibling, 1 reply; 10+ messages in thread
From: Arad, Ronen @ 2015-01-17 21:14 UTC (permalink / raw)
To: roopa@cumulusnetworks.com, stephen@networkplumber.org,
davem@davemloft.net, jhs@mojatatu.com, netdev@vger.kernel.org,
sfeldma@gmail.com, jiri@resnulli.us, tgraf@suug.ch,
john.fastabend@gmail.com, vyasevic@redhat.com
Cc: wkok@cumulusnetworks.com, gospo@cumulusnetworks.com
>-----Original Message-----
>From: roopa@cumulusnetworks.com [mailto:roopa@cumulusnetworks.com]
>Sent: Friday, January 16, 2015 11:33 PM
>To: stephen@networkplumber.org; davem@davemloft.net; jhs@mojatatu.com;
>sfeldma@gmail.com; jiri@resnulli.us; Arad, Ronen; tgraf@suug.ch;
>john.fastabend@gmail.com; vyasevic@redhat.com
>Cc: netdev@vger.kernel.org; wkok@cumulusnetworks.com;
>gospo@cumulusnetworks.com
>Subject: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
>
>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
>On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>there is a need to not re-forward the frames that come up to the
>kernel in software.
>
>Typically these are broadcast or multicast packets forwarded by the
>hardware to multiple destination ports including sending a copy of
>the packet to the kernel (e.g. an arp broadcast).
>The bridge driver will try to forward the packet again, resulting in
>two copies of the same packet.
>
>These packets can also come up to the kernel for logging when they hit
>a LOG acl in hardware.
>
>This patch makes forwarding a flag on the port similar to
>learn and flood and drops the packet just before forwarding.
>(The forwarding disable on a bridge is tested to work on our boxes.
>The bridge port flag addition is only compile tested.
>This will need to be further refined to cover cases where a non-switch port
>is bridged to a switch port etc. We will submit more patches to cover
>all cases if we agree on this approach).
>
>Other ways to solve the same problem could be to:
>- use the offload feature flag on these switch ports to avoid the
>re-forward:
>https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>
>- Or the switch driver can mark or set a flag in the skb, which the bridge
>driver can use to avoid a re-forward.
>
The proposed patch does not go along with the offload feature flag.
The premise of the offload feature flag is that offloading is driven by the
switch port driver without user intervention. This patch requires different
setting for BR_FLOOD in the software bridge port and the switch port driver.
The alternatives suggested (offload flag or skb flag) are better.
The proposed patch avoids re-forward but not without cost. For example in the
case of unicast flood with local destination, the skb is cloned for each port
before the forward avoidance in __br_forward. Is it acceptable overhead?
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-17 7:32 [RFC PATCH net-next] bridge: ability to disable forwarding on a port roopa
2015-01-17 21:14 ` Arad, Ronen
@ 2015-01-18 1:05 ` Scott Feldman
2015-01-18 9:10 ` roopa
1 sibling, 1 reply; 10+ messages in thread
From: Scott Feldman @ 2015-01-18 1:05 UTC (permalink / raw)
To: Roopa Prabhu
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
> there is a need to not re-forward the frames that come up to the
> kernel in software.
>
> Typically these are broadcast or multicast packets forwarded by the
> hardware to multiple destination ports including sending a copy of
> the packet to the kernel (e.g. an arp broadcast).
> The bridge driver will try to forward the packet again, resulting in
> two copies of the same packet.
>
> These packets can also come up to the kernel for logging when they hit
> a LOG acl in hardware.
>
> This patch makes forwarding a flag on the port similar to
> learn and flood and drops the packet just before forwarding.
> (The forwarding disable on a bridge is tested to work on our boxes.
> The bridge port flag addition is only compile tested.
> This will need to be further refined to cover cases where a non-switch port
> is bridged to a switch port etc. We will submit more patches to cover
> all cases if we agree on this approach).
Good topic to bring up, thanks for proposing a patch. There is indeed
duplicate pkts sent out in the case where both the bridge and the
offloaded device are flooding these non-unicast pkts, such as ARP
requests. We do have per-port control today over unicast flooding
using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
As you point out, this doesn't solve the case for non-offloaded ports
bridged with switch ports. If this port setting is enabled on an
offloaded switch port, for example, the non-offloaded port can't get
an ARP request resolved, if the MAC is behind the offloaded switch
port. But do we care? Is there a use-case for this one, mixing
offloaded and non-offloaded ports in a bridge?
>
> Other ways to solve the same problem could be to:
> - use the offload feature flag on these switch ports to avoid the
> re-forward:
> https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>
> - Or the switch driver can mark or set a flag in the skb, which the bridge
> driver can use to avoid a re-forward.
>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
> include/linux/if_bridge.h | 3 ++-
> include/uapi/linux/if_link.h | 1 +
> net/bridge/br_forward.c | 13 +++++++++++++
> net/bridge/br_if.c | 2 +-
> net/bridge/br_netlink.c | 4 +++-
> net/bridge/br_sysfs_if.c | 1 +
> net/core/rtnetlink.c | 4 +++-
> 7 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 0a8ce76..c79f4eb 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -40,10 +40,11 @@ struct br_ip_list {
> #define BR_ADMIN_COST BIT(4)
> #define BR_LEARNING BIT(5)
> #define BR_FLOOD BIT(6)
> -#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING)
> #define BR_PROMISC BIT(7)
> #define BR_PROXYARP BIT(8)
> #define BR_LEARNING_SYNC BIT(9)
> +#define BR_FORWARD BIT(10)
The name BR_FORWARD might confuse people thinking this is related to
STP FORWARDING state. We have BR_FLOOD for unknown unicast flooding.
How about renaming BR_FLOOD to BR_FLOOD_UNICAST and add
BR_FLOOD_BROADCAST? So you would have:
IFLA_BRPORT_UNICAST_FLOOD BR_FLOOD_UNICAST /* flood
unknown unicast traffic to port */
IFLA_BRPORT_BROADCAST_FLOOD BR_FLOOD_BROADCAST /* flood
bcast/mcast traffic to port */
> +#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING | BR_FORWARD)
>
> extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..d394625 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -246,6 +246,7 @@ enum {
> IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
> IFLA_BRPORT_PROXYARP, /* proxy ARP */
> IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
> + IFLA_BRPORT_FORWARD, /* enable forwarding on a device */
> __IFLA_BRPORT_MAX
> };
> #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index f96933a..98c41c8 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -81,10 +81,23 @@ static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
> br_forward_finish);
> }
>
> +int br_hw_forward_finish(struct sk_buff *skb)
> +{
> + kfree_skb(skb);
> +
> + return 0;
> +}
> +
> static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
> {
> struct net_device *indev;
>
> + if (!(to->flags & BR_FORWARD)) {
> + NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, skb->dev, to->dev,
> + br_hw_forward_finish);
> + return;
> + }
> +
Seems you should make the (flags & BR_FORWARD) check earlier, before
skb cloning, in br_flood(), alongside the (flags & BR_FLOOD) check.
Also, the above code is skipping some vlan checks (br_handle_vlan).
-scott
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-17 21:14 ` Arad, Ronen
@ 2015-01-18 8:48 ` roopa
0 siblings, 0 replies; 10+ messages in thread
From: roopa @ 2015-01-18 8:48 UTC (permalink / raw)
To: Arad, Ronen
Cc: stephen@networkplumber.org, davem@davemloft.net, jhs@mojatatu.com,
netdev@vger.kernel.org, sfeldma@gmail.com, jiri@resnulli.us,
tgraf@suug.ch, john.fastabend@gmail.com, vyasevic@redhat.com,
wkok@cumulusnetworks.com, gospo@cumulusnetworks.com
On 1/17/15, 1:14 PM, Arad, Ronen wrote:
>
>> -----Original Message-----
>> From: roopa@cumulusnetworks.com [mailto:roopa@cumulusnetworks.com]
>> Sent: Friday, January 16, 2015 11:33 PM
>> To: stephen@networkplumber.org; davem@davemloft.net; jhs@mojatatu.com;
>> sfeldma@gmail.com; jiri@resnulli.us; Arad, Ronen; tgraf@suug.ch;
>> john.fastabend@gmail.com; vyasevic@redhat.com
>> Cc: netdev@vger.kernel.org; wkok@cumulusnetworks.com;
>> gospo@cumulusnetworks.com
>> Subject: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
>>
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>> there is a need to not re-forward the frames that come up to the
>> kernel in software.
>>
>> Typically these are broadcast or multicast packets forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the kernel (e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl in hardware.
>>
>> This patch makes forwarding a flag on the port similar to
>> learn and flood and drops the packet just before forwarding.
>> (The forwarding disable on a bridge is tested to work on our boxes.
>> The bridge port flag addition is only compile tested.
>> This will need to be further refined to cover cases where a non-switch port
>> is bridged to a switch port etc. We will submit more patches to cover
>> all cases if we agree on this approach).
>>
>> Other ways to solve the same problem could be to:
>> - use the offload feature flag on these switch ports to avoid the
>> re-forward:
>> https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>>
>> - Or the switch driver can mark or set a flag in the skb, which the bridge
>> driver can use to avoid a re-forward.
>>
> The proposed patch does not go along with the offload feature flag.
> The premise of the offload feature flag is that offloading is driven by the
> switch port driver without user intervention. This patch requires different
> setting for BR_FLOOD in the software bridge port and the switch port driver.
Even with the offload feature flag, there are a few bridge port
attributes that can be set
from userspace which can be different for sw and hw. So, with this patch
I was
trying to see if making it similar to the other existing flags would be
better.
> The alternatives suggested (offload flag or skb flag) are better.
Glad to know your opinion.
>
> The proposed patch avoids re-forward but not without cost. For example in the
> case of unicast flood with local destination, the skb is cloned for each port
> before the forward avoidance in __br_forward. Is it acceptable overhead?
This patch places it in __br_forward to just make sure all the paths
including the netfilter
hook is traversed for these packets. And plus it was written with
minimal changes in mind.
We have not measured it but the extra overhead for these pkts that hit
the kernel has been negligible. But, I will look at the code to see if
it can be avoided.
Thanks,
Roopa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-18 1:05 ` Scott Feldman
@ 2015-01-18 9:10 ` roopa
2015-01-18 20:55 ` roopa
0 siblings, 1 reply; 10+ messages in thread
From: roopa @ 2015-01-18 9:10 UTC (permalink / raw)
To: Scott Feldman
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On 1/17/15, 5:05 PM, Scott Feldman wrote:
> On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>> there is a need to not re-forward the frames that come up to the
>> kernel in software.
>>
>> Typically these are broadcast or multicast packets forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the kernel (e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl in hardware.
>>
>> This patch makes forwarding a flag on the port similar to
>> learn and flood and drops the packet just before forwarding.
>> (The forwarding disable on a bridge is tested to work on our boxes.
>> The bridge port flag addition is only compile tested.
>> This will need to be further refined to cover cases where a non-switch port
>> is bridged to a switch port etc. We will submit more patches to cover
>> all cases if we agree on this approach).
> Good topic to bring up, thanks for proposing a patch. There is indeed
> duplicate pkts sent out in the case where both the bridge and the
> offloaded device are flooding these non-unicast pkts, such as ARP
> requests. We do have per-port control today over unicast flooding
> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>
> As you point out, this doesn't solve the case for non-offloaded ports
> bridged with switch ports. If this port setting is enabled on an
> offloaded switch port, for example, the non-offloaded port can't get
> an ARP request resolved, if the MAC is behind the offloaded switch
> port. But do we care? Is there a use-case for this one, mixing
> offloaded and non-offloaded ports in a bridge?
Not sure. I don't know the use case, but I think I might have heard that
there could be a case
where a switch port could be bridged with a vm's port running on the
switch. (?)
>
>> Other ways to solve the same problem could be to:
>> - use the offload feature flag on these switch ports to avoid the
>> re-forward:
>> https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>>
>> - Or the switch driver can mark or set a flag in the skb, which the bridge
>> driver can use to avoid a re-forward.
>>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>> include/linux/if_bridge.h | 3 ++-
>> include/uapi/linux/if_link.h | 1 +
>> net/bridge/br_forward.c | 13 +++++++++++++
>> net/bridge/br_if.c | 2 +-
>> net/bridge/br_netlink.c | 4 +++-
>> net/bridge/br_sysfs_if.c | 1 +
>> net/core/rtnetlink.c | 4 +++-
>> 7 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>> index 0a8ce76..c79f4eb 100644
>> --- a/include/linux/if_bridge.h
>> +++ b/include/linux/if_bridge.h
>> @@ -40,10 +40,11 @@ struct br_ip_list {
>> #define BR_ADMIN_COST BIT(4)
>> #define BR_LEARNING BIT(5)
>> #define BR_FLOOD BIT(6)
>> -#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING)
>> #define BR_PROMISC BIT(7)
>> #define BR_PROXYARP BIT(8)
>> #define BR_LEARNING_SYNC BIT(9)
>> +#define BR_FORWARD BIT(10)
> The name BR_FORWARD might confuse people thinking this is related to
> STP FORWARDING state.
yes, that thought crossed my mind too. I had BR_FORWARDING initially and
to make it sound less like
STP changed it to BR_FORWARD. :)
> We have BR_FLOOD for unknown unicast flooding.
> How about renaming BR_FLOOD to BR_FLOOD_UNICAST and add
> BR_FLOOD_BROADCAST? So you would have:
>
> IFLA_BRPORT_UNICAST_FLOOD BR_FLOOD_UNICAST /* flood
> unknown unicast traffic to port */
> IFLA_BRPORT_BROADCAST_FLOOD BR_FLOOD_BROADCAST /* flood
> bcast/mcast traffic to port */
That's a good idea. So, unknown unicast and broadcast will be covered
with that.
Am afraid there might be other packets, like the acl LOG packet hitting
the CPU/kernel (?)
I will check if there are others.
>
>> +#define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING | BR_FORWARD)
>>
>> extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
>>
>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> index f7d0d2d..d394625 100644
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -246,6 +246,7 @@ enum {
>> IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
>> IFLA_BRPORT_PROXYARP, /* proxy ARP */
>> IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
>> + IFLA_BRPORT_FORWARD, /* enable forwarding on a device */
>> __IFLA_BRPORT_MAX
>> };
>> #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
>> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
>> index f96933a..98c41c8 100644
>> --- a/net/bridge/br_forward.c
>> +++ b/net/bridge/br_forward.c
>> @@ -81,10 +81,23 @@ static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
>> br_forward_finish);
>> }
>>
>> +int br_hw_forward_finish(struct sk_buff *skb)
>> +{
>> + kfree_skb(skb);
>> +
>> + return 0;
>> +}
>> +
>> static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
>> {
>> struct net_device *indev;
>>
>> + if (!(to->flags & BR_FORWARD)) {
>> + NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, skb->dev, to->dev,
>> + br_hw_forward_finish);
>> + return;
>> + }
>> +
> Seems you should make the (flags & BR_FORWARD) check earlier, before
> skb cloning, in br_flood(), alongside the (flags & BR_FLOOD) check.
This patch strategically places it in __br_forward to catch all forwards
(due to floods or no floods ..with direct call to br_foward)
with minimal code changes in mind. Will see if the clone can be avoided.
>
> Also, the above code is skipping some vlan checks (br_handle_vlan).
The br_handle_vlan checks seemed not necessary for a packet getting dropped.
But, will check and fix if its needed while the packet traverses the
netfilter hook and before it gets dropped.
Thanks,
Roopa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-18 9:10 ` roopa
@ 2015-01-18 20:55 ` roopa
2015-01-19 7:37 ` Scott Feldman
0 siblings, 1 reply; 10+ messages in thread
From: roopa @ 2015-01-18 20:55 UTC (permalink / raw)
To: Scott Feldman
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On 1/18/15, 1:10 AM, roopa wrote:
> On 1/17/15, 5:05 PM, Scott Feldman wrote:
>> On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>>> there is a need to not re-forward the frames that come up to the
>>> kernel in software.
>>>
>>> Typically these are broadcast or multicast packets forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the kernel (e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl in hardware.
>>>
>>> This patch makes forwarding a flag on the port similar to
>>> learn and flood and drops the packet just before forwarding.
>>> (The forwarding disable on a bridge is tested to work on our boxes.
>>> The bridge port flag addition is only compile tested.
>>> This will need to be further refined to cover cases where a
>>> non-switch port
>>> is bridged to a switch port etc. We will submit more patches to cover
>>> all cases if we agree on this approach).
>> Good topic to bring up, thanks for proposing a patch. There is indeed
>> duplicate pkts sent out in the case where both the bridge and the
>> offloaded device are flooding these non-unicast pkts, such as ARP
>> requests. We do have per-port control today over unicast flooding
>> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>>
>> As you point out, this doesn't solve the case for non-offloaded ports
>> bridged with switch ports. If this port setting is enabled on an
>> offloaded switch port, for example, the non-offloaded port can't get
>> an ARP request resolved, if the MAC is behind the offloaded switch
>> port. But do we care? Is there a use-case for this one, mixing
>> offloaded and non-offloaded ports in a bridge?
>
> Not sure. I don't know the use case, but I think I might have heard
> that there could be a case
> where a switch port could be bridged with a vm's port running on the
> switch. (?)
Ignoring the above usecase for a bit. And thinking through this again,
It appears that this check should
be only on the ingress port, no ?
If the ingress bridge port is an offloaded port, don't flood or forward
because hardware has already done it.
And this is best done with the offload feature flag on the bridge port.
But, If we bring in the usecase, where a bridge has offloaded and
non-offloaded ports mixed,
there should be an option to disable this check and to achieve that its
better for this to be a
flag on the port maintained by the bridge driver like the one proposed
by this patch (BR_FORWARD).
Thanks,
Roopa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-18 20:55 ` roopa
@ 2015-01-19 7:37 ` Scott Feldman
2015-01-20 6:20 ` roopa
0 siblings, 1 reply; 10+ messages in thread
From: Scott Feldman @ 2015-01-19 7:37 UTC (permalink / raw)
To: roopa
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On Sun, Jan 18, 2015 at 12:55 PM, roopa <roopa@cumulusnetworks.com> wrote:
> On 1/18/15, 1:10 AM, roopa wrote:
>>
>> On 1/17/15, 5:05 PM, Scott Feldman wrote:
>>>
>>> On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
>>>>
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>>>> there is a need to not re-forward the frames that come up to the
>>>> kernel in software.
>>>>
>>>> Typically these are broadcast or multicast packets forwarded by the
>>>> hardware to multiple destination ports including sending a copy of
>>>> the packet to the kernel (e.g. an arp broadcast).
>>>> The bridge driver will try to forward the packet again, resulting in
>>>> two copies of the same packet.
>>>>
>>>> These packets can also come up to the kernel for logging when they hit
>>>> a LOG acl in hardware.
>>>>
>>>> This patch makes forwarding a flag on the port similar to
>>>> learn and flood and drops the packet just before forwarding.
>>>> (The forwarding disable on a bridge is tested to work on our boxes.
>>>> The bridge port flag addition is only compile tested.
>>>> This will need to be further refined to cover cases where a non-switch
>>>> port
>>>> is bridged to a switch port etc. We will submit more patches to cover
>>>> all cases if we agree on this approach).
>>>
>>> Good topic to bring up, thanks for proposing a patch. There is indeed
>>> duplicate pkts sent out in the case where both the bridge and the
>>> offloaded device are flooding these non-unicast pkts, such as ARP
>>> requests. We do have per-port control today over unicast flooding
>>> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>>>
>>> As you point out, this doesn't solve the case for non-offloaded ports
>>> bridged with switch ports. If this port setting is enabled on an
>>> offloaded switch port, for example, the non-offloaded port can't get
>>> an ARP request resolved, if the MAC is behind the offloaded switch
>>> port. But do we care? Is there a use-case for this one, mixing
>>> offloaded and non-offloaded ports in a bridge?
>>
>>
>> Not sure. I don't know the use case, but I think I might have heard that
>> there could be a case
>> where a switch port could be bridged with a vm's port running on the
>> switch. (?)
>
>
> Ignoring the above usecase for a bit. And thinking through this again, It
> appears that this check should
> be only on the ingress port, no ?
>
> If the ingress bridge port is an offloaded port, don't flood or forward
> because hardware has already done it.
> And this is best done with the offload feature flag on the bridge port.
That's assuming hardware did the flood. Maybe your other option to
mark the skb if already flooded by hw is best. That's enough info for
the bridge driver to make a decision to flood or not to the other
ports, and it's an implementation decision for the driver/device to do
the flood offload, if desired, and mark the skb if it did.
Btw, you're still saying flood or forward, but in my mind we're
talking about flood only: flood of unknown unicast or flood of
bcast/mcast pkts. Forwarding would be for known-unicast pkts, which
should even involve the bridge driver since that forwarding is
offloaded to the device.
>
> But, If we bring in the usecase, where a bridge has offloaded and
> non-offloaded ports mixed,
> there should be an option to disable this check and to achieve that its
> better for this to be a
> flag on the port maintained by the bridge driver like the one proposed by
> this patch (BR_FORWARD).
>
> Thanks,
> Roopa
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-19 7:37 ` Scott Feldman
@ 2015-01-20 6:20 ` roopa
2015-01-20 7:09 ` Samudrala, Sridhar
0 siblings, 1 reply; 10+ messages in thread
From: roopa @ 2015-01-20 6:20 UTC (permalink / raw)
To: Scott Feldman
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On 1/18/15, 11:37 PM, Scott Feldman wrote:
<snip..>
>
> Not sure. I don't know the use case, but I think I might have heard that
> there could be a case
> where a switch port could be bridged with a vm's port running on the
> switch. (?)
>>
>> Ignoring the above usecase for a bit. And thinking through this again, It
>> appears that this check should
>> be only on the ingress port, no ?
>>
>> If the ingress bridge port is an offloaded port, don't flood or forward
>> because hardware has already done it.
>> And this is best done with the offload feature flag on the bridge port.
> That's assuming hardware did the flood. Maybe your other option to
> mark the skb if already flooded by hw is best. That's enough info for
> the bridge driver to make a decision to flood or not to the other
> ports, and it's an implementation decision for the driver/device to do
> the flood offload, if desired, and mark the skb if it did.
Still thinking we can just use the offload feature flag here. How about
avoid forwarding only if both src and dst ports have
forwarding offloaded/accelerated by a switch asic ?. That should cover
all cases.
>
> Btw, you're still saying flood or forward, but in my mind we're
> talking about flood only: flood of unknown unicast or flood of
> bcast/mcast pkts. Forwarding would be for known-unicast pkts, which
> should even involve the bridge driver since that forwarding is
> offloaded to the device.
I was also taking into account pkts copied to the CPU due to an acl rule such as log.
These unicast pkts can come to cpu even if it is already forwarded in hw.
Thanks,
Roopa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-20 6:20 ` roopa
@ 2015-01-20 7:09 ` Samudrala, Sridhar
2015-01-20 13:59 ` roopa
0 siblings, 1 reply; 10+ messages in thread
From: Samudrala, Sridhar @ 2015-01-20 7:09 UTC (permalink / raw)
To: roopa, Scott Feldman
Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
Jiří Pírko, Arad, Ronen, Thomas Graf,
john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
Andy Gospodarek
On 1/19/2015 10:20 PM, roopa wrote:
> On 1/18/15, 11:37 PM, Scott Feldman wrote:
>
> <snip..>
>>
>> Not sure. I don't know the use case, but I think I might have heard that
>> there could be a case
>> where a switch port could be bridged with a vm's port running on the
>> switch. (?)
>>>
>>> Ignoring the above usecase for a bit. And thinking through this
>>> again, It
>>> appears that this check should
>>> be only on the ingress port, no ?
>>>
>>> If the ingress bridge port is an offloaded port, don't flood or forward
>>> because hardware has already done it.
>>> And this is best done with the offload feature flag on the bridge port.
>> That's assuming hardware did the flood. Maybe your other option to
>> mark the skb if already flooded by hw is best. That's enough info for
>> the bridge driver to make a decision to flood or not to the other
>> ports, and it's an implementation decision for the driver/device to do
>> the flood offload, if desired, and mark the skb if it did.
>
> Still thinking we can just use the offload feature flag here. How
> about avoid forwarding only if both src and dst ports have
> forwarding offloaded/accelerated by a switch asic ?. That should
> cover all cases.
>>
>> Btw, you're still saying flood or forward, but in my mind we're
>> talking about flood only: flood of unknown unicast or flood of
>> bcast/mcast pkts. Forwarding would be for known-unicast pkts, which
>> should even involve the bridge driver since that forwarding is
>> offloaded to the device.
>
> I was also taking into account pkts copied to the CPU due to an acl
> rule such as log.
> These unicast pkts can come to cpu even if it is already forwarded in hw.
Do you have a switch port netdev that corresponds to CPU port? Or are
they seen on the bridge device?
Thanks
Sridhar
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
2015-01-20 7:09 ` Samudrala, Sridhar
@ 2015-01-20 13:59 ` roopa
0 siblings, 0 replies; 10+ messages in thread
From: roopa @ 2015-01-20 13:59 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: Scott Feldman, stephen@networkplumber.org, David S. Miller,
Jamal Hadi Salim, Jiří Pírko, Arad, Ronen,
Thomas Graf, john fastabend, vyasevic@redhat.com, Netdev,
Wilson Kok, Andy Gospodarek
On 1/19/15, 11:09 PM, Samudrala, Sridhar wrote:
>
> On 1/19/2015 10:20 PM, roopa wrote:
>> On 1/18/15, 11:37 PM, Scott Feldman wrote:
>>
>> <snip..>
>>>
>>> Not sure. I don't know the use case, but I think I might have heard
>>> that
>>> there could be a case
>>> where a switch port could be bridged with a vm's port running on the
>>> switch. (?)
>>>>
>>>> Ignoring the above usecase for a bit. And thinking through this
>>>> again, It
>>>> appears that this check should
>>>> be only on the ingress port, no ?
>>>>
>>>> If the ingress bridge port is an offloaded port, don't flood or
>>>> forward
>>>> because hardware has already done it.
>>>> And this is best done with the offload feature flag on the bridge
>>>> port.
>>> That's assuming hardware did the flood. Maybe your other option to
>>> mark the skb if already flooded by hw is best. That's enough info for
>>> the bridge driver to make a decision to flood or not to the other
>>> ports, and it's an implementation decision for the driver/device to do
>>> the flood offload, if desired, and mark the skb if it did.
>>
>> Still thinking we can just use the offload feature flag here. How
>> about avoid forwarding only if both src and dst ports have
>> forwarding offloaded/accelerated by a switch asic ?. That should
>> cover all cases.
>>>
>>> Btw, you're still saying flood or forward, but in my mind we're
>>> talking about flood only: flood of unknown unicast or flood of
>>> bcast/mcast pkts. Forwarding would be for known-unicast pkts, which
>>> should even involve the bridge driver since that forwarding is
>>> offloaded to the device.
>>
>> I was also taking into account pkts copied to the CPU due to an acl
>> rule such as log.
>> These unicast pkts can come to cpu even if it is already forwarded in
>> hw.
>
> Do you have a switch port netdev that corresponds to CPU port? Or are
> they seen on the bridge device?
No we dont have a netdev for the CPU port. These show up on the netdev
corresponding to the ingress front panel port.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-01-20 13:59 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-17 7:32 [RFC PATCH net-next] bridge: ability to disable forwarding on a port roopa
2015-01-17 21:14 ` Arad, Ronen
2015-01-18 8:48 ` roopa
2015-01-18 1:05 ` Scott Feldman
2015-01-18 9:10 ` roopa
2015-01-18 20:55 ` roopa
2015-01-19 7:37 ` Scott Feldman
2015-01-20 6:20 ` roopa
2015-01-20 7:09 ` Samudrala, Sridhar
2015-01-20 13:59 ` roopa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).