* [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
@ 2014-12-10 9:05 roopa
2014-12-10 9:37 ` Jiri Pirko
0 siblings, 1 reply; 31+ messages in thread
From: roopa @ 2014-12-10 9:05 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
This patch adds two new api's netdev_switch_port_bridge_setlink
and netdev_switch_port_bridge_dellink to offload bridge port attributes
to switch asic
(The names of the apis look odd with 'switch_port_bridge',
but am more inclined to change the prefix of the api to something else.
Will take any suggestions).
The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
pass bridge port attributes to the port device.
If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
the bridge port attribute offload ndo, call bridge port attribute ndo's on
the lowerdevs if supported. This is one way to pass bridge port attributes
through stacked netdevs (example when bridge port is a bond and bond slaves
are switch ports).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
include/net/switchdev.h | 5 +++-
net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 74 insertions(+), 1 deletion(-)
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 8a6d164..22676b6 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -17,7 +17,10 @@
int netdev_switch_parent_id_get(struct net_device *dev,
struct netdev_phys_item_id *psid);
int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
-
+int netdev_switch_port_bridge_setlink(struct net_device *dev,
+ struct nlmsghdr *nlh, u16 flags);
+int netdev_switch_port_bridge_dellink(struct net_device *dev,
+ struct nlmsghdr *nlh, u16 flags);
#else
static inline int netdev_switch_parent_id_get(struct net_device *dev,
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index d162b21..62317e1 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
return ops->ndo_switch_port_stp_update(dev, state);
}
EXPORT_SYMBOL(netdev_switch_port_stp_update);
+
+/**
+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
+ * port attributes
+ *
+ * @dev: port device
+ * @nlh: netlink msg with bridge port attributes
+ *
+ * Notify switch device port of bridge port attributes
+ */
+int netdev_switch_port_bridge_setlink(struct net_device *dev,
+ struct nlmsghdr *nlh, u16 flags)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct net_device *lower_dev;
+ struct list_head *iter;
+ int ret = 0, err = 0;
+
+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
+ return err;
+
+ if (ops->ndo_bridge_setlink) {
+ WARN_ON(!ops->ndo_switch_parent_id_get);
+ return ops->ndo_bridge_setlink(dev, nlh, flags);
+ }
+
+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
+ if (err)
+ ret = err;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
+
+/**
+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
+ * attribute delete
+ *
+ * @dev: port device
+ * @nlh: netlink msg with bridge port attributes
+ *
+ * Notify switch device port of bridge port attribute delete
+ */
+int netdev_switch_port_bridge_dellink(struct net_device *dev,
+ struct nlmsghdr *nlh, u16 flags)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct net_device *lower_dev;
+ struct list_head *iter;
+ int ret = 0, err = 0;
+
+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
+ return err;
+
+ if (ops->ndo_bridge_dellink) {
+ WARN_ON(!ops->ndo_switch_parent_id_get);
+ return ops->ndo_bridge_dellink(dev, nlh, flags);
+ }
+
+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
+ if (err)
+ ret = err;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-10 9:05 [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes roopa
@ 2014-12-10 9:37 ` Jiri Pirko
2014-12-11 16:52 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2014-12-10 9:37 UTC (permalink / raw)
To: roopa
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
>This patch adds two new api's netdev_switch_port_bridge_setlink
>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>to switch asic
>
>(The names of the apis look odd with 'switch_port_bridge',
>but am more inclined to change the prefix of the api to something else.
>Will take any suggestions).
>
>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>pass bridge port attributes to the port device.
>
>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>the lowerdevs if supported. This is one way to pass bridge port attributes
>through stacked netdevs (example when bridge port is a bond and bond slaves
>are switch ports).
>
>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>---
> include/net/switchdev.h | 5 +++-
> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 74 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>index 8a6d164..22676b6 100644
>--- a/include/net/switchdev.h
>+++ b/include/net/switchdev.h
>@@ -17,7 +17,10 @@
> int netdev_switch_parent_id_get(struct net_device *dev,
> struct netdev_phys_item_id *psid);
> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>-
>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>+ struct nlmsghdr *nlh, u16 flags);
>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>+ struct nlmsghdr *nlh, u16 flags);
> #else
>
> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>index d162b21..62317e1 100644
>--- a/net/switchdev/switchdev.c
>+++ b/net/switchdev/switchdev.c
>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
> return ops->ndo_switch_port_stp_update(dev, state);
> }
> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>+
>+/**
>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>+ * port attributes
>+ *
>+ * @dev: port device
>+ * @nlh: netlink msg with bridge port attributes
>+ *
>+ * Notify switch device port of bridge port attributes
>+ */
>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>+ struct nlmsghdr *nlh, u16 flags)
>+{
>+ const struct net_device_ops *ops = dev->netdev_ops;
>+ struct net_device *lower_dev;
>+ struct list_head *iter;
>+ int ret = 0, err = 0;
>+
>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>+ return err;
>+
>+ if (ops->ndo_bridge_setlink) {
>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
You have to change ndo_bridge_setlink in netdevice.h first.
Otherwise when only this patch is applied (during bisection)
this won't compile.
>+ }
>+
>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
I do not understand why to iterate over lower devices. At this
stage we don't know a thing about this upper or its lowers. Let
the uppers (/masters) to decide if this needs to be propagated
or not.
>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>+ if (err)
>+ ret = err;
>+ }
^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>+
>+/**
>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>+ * attribute delete
>+ *
>+ * @dev: port device
>+ * @nlh: netlink msg with bridge port attributes
>+ *
>+ * Notify switch device port of bridge port attribute delete
>+ */
>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>+ struct nlmsghdr *nlh, u16 flags)
>+{
>+ const struct net_device_ops *ops = dev->netdev_ops;
>+ struct net_device *lower_dev;
>+ struct list_head *iter;
>+ int ret = 0, err = 0;
>+
>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>+ return err;
>+
>+ if (ops->ndo_bridge_dellink) {
>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>+ }
>+
>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>+ if (err)
>+ ret = err;
>+ }
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>--
>1.7.10.4
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-10 9:37 ` Jiri Pirko
@ 2014-12-11 16:52 ` Roopa Prabhu
2014-12-11 17:11 ` Jiri Pirko
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-11 16:52 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
On 12/10/14, 1:37 AM, Jiri Pirko wrote:
> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This patch adds two new api's netdev_switch_port_bridge_setlink
>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>> to switch asic
>>
>> (The names of the apis look odd with 'switch_port_bridge',
>> but am more inclined to change the prefix of the api to something else.
>> Will take any suggestions).
>>
>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>> pass bridge port attributes to the port device.
>>
>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>> the lowerdevs if supported. This is one way to pass bridge port attributes
>> through stacked netdevs (example when bridge port is a bond and bond slaves
>> are switch ports).
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>> include/net/switchdev.h | 5 +++-
>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index 8a6d164..22676b6 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,7 +17,10 @@
>> int netdev_switch_parent_id_get(struct net_device *dev,
>> struct netdev_phys_item_id *psid);
>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>> -
>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>> + struct nlmsghdr *nlh, u16 flags);
>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>> + struct nlmsghdr *nlh, u16 flags);
>> #else
>>
>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index d162b21..62317e1 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>> return ops->ndo_switch_port_stp_update(dev, state);
>> }
>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>> +
>> +/**
>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>> + * port attributes
>> + *
>> + * @dev: port device
>> + * @nlh: netlink msg with bridge port attributes
>> + *
>> + * Notify switch device port of bridge port attributes
>> + */
>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>> + struct nlmsghdr *nlh, u16 flags)
>> +{
>> + const struct net_device_ops *ops = dev->netdev_ops;
>> + struct net_device *lower_dev;
>> + struct list_head *iter;
>> + int ret = 0, err = 0;
>> +
>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>> + return err;
>> +
>> + if (ops->ndo_bridge_setlink) {
>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
> You have to change ndo_bridge_setlink in netdevice.h first.
> Otherwise when only this patch is applied (during bisection)
> this won't compile.
ack, will fix it and keep that in mind next time.
>
>> + }
>> +
>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
> I do not understand why to iterate over lower devices. At this
> stage we don't know a thing about this upper or its lowers. Let
> the uppers (/masters) to decide if this needs to be propagated
> or not.
Jiri, In the stacked devices case, there is no way to propagate the
bridge port attributes to switch device driver today (vlan and other
bridge port attributes). Can you tell me if there is a way ?. no,
ndo_vlan* ndo's are not useful here. Nor we should go and implement
ndo_bridge_setlink* in all devices that can be bridge ports.
And this allows a switch driver to receive these callbacks if it has
marked the switch port with an offload flag. Your way of using the
switch port to get to the switch driver does not help in these cases.
The other option is to use the 'switch device (not port)' to get to the
switch driver.
This patch shows that you can still do this with the ndo ops.
>
>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>> + if (err)
>> + ret = err;
>> + }
> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>> +
>> +/**
>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>> + * attribute delete
>> + *
>> + * @dev: port device
>> + * @nlh: netlink msg with bridge port attributes
>> + *
>> + * Notify switch device port of bridge port attribute delete
>> + */
>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>> + struct nlmsghdr *nlh, u16 flags)
>> +{
>> + const struct net_device_ops *ops = dev->netdev_ops;
>> + struct net_device *lower_dev;
>> + struct list_head *iter;
>> + int ret = 0, err = 0;
>> +
>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>> + return err;
>> +
>> + if (ops->ndo_bridge_dellink) {
>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>> + }
>> +
>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>> + if (err)
>> + ret = err;
>> + }
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>> --
>> 1.7.10.4
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 16:52 ` Roopa Prabhu
@ 2014-12-11 17:11 ` Jiri Pirko
2014-12-11 17:59 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2014-12-11 17:11 UTC (permalink / raw)
To: Roopa Prabhu
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>to switch asic
>>>
>>>(The names of the apis look odd with 'switch_port_bridge',
>>>but am more inclined to change the prefix of the api to something else.
>>>Will take any suggestions).
>>>
>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>pass bridge port attributes to the port device.
>>>
>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>are switch ports).
>>>
>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>---
>>>include/net/switchdev.h | 5 +++-
>>>net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>index 8a6d164..22676b6 100644
>>>--- a/include/net/switchdev.h
>>>+++ b/include/net/switchdev.h
>>>@@ -17,7 +17,10 @@
>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>> struct netdev_phys_item_id *psid);
>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>-
>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>+ struct nlmsghdr *nlh, u16 flags);
>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>+ struct nlmsghdr *nlh, u16 flags);
>>>#else
>>>
>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>index d162b21..62317e1 100644
>>>--- a/net/switchdev/switchdev.c
>>>+++ b/net/switchdev/switchdev.c
>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>}
>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>+
>>>+/**
>>>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>+ * port attributes
>>>+ *
>>>+ * @dev: port device
>>>+ * @nlh: netlink msg with bridge port attributes
>>>+ *
>>>+ * Notify switch device port of bridge port attributes
>>>+ */
>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>+ struct nlmsghdr *nlh, u16 flags)
>>>+{
>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>+ struct net_device *lower_dev;
>>>+ struct list_head *iter;
>>>+ int ret = 0, err = 0;
>>>+
>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>+ return err;
>>>+
>>>+ if (ops->ndo_bridge_setlink) {
>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
>> You have to change ndo_bridge_setlink in netdevice.h first.
>> Otherwise when only this patch is applied (during bisection)
>> this won't compile.
>
>ack, will fix it and keep that in mind next time.
>>
>>>+ }
>>>+
>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>> I do not understand why to iterate over lower devices. At this
>> stage we don't know a thing about this upper or its lowers. Let
>> the uppers (/masters) to decide if this needs to be propagated
>> or not.
>
>Jiri, In the stacked devices case, there is no way to propagate the bridge
>port attributes to switch device driver today (vlan and other bridge port
>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>devices that can be bridge ports.
Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
bonding for example and let it propagate the the call to slaves.
Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
might not make sense to propagate to "lowers".
>
>And this allows a switch driver to receive these callbacks if it has marked
>the switch port with an offload flag. Your way of using the switch port to
>get to the switch driver does not help in these cases.
I do not follow how this is related to this case (stacked layout).
>
>The other option is to use the 'switch device (not port)' to get to the
>switch driver.
That would not help this case (stacked layout) I believe.
>This patch shows that you can still do this with the ndo ops.
>>
>>>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>+ if (err)
>>>+ ret = err;
>>>+ }
>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>
>>>+
>>>+ return ret;
>>>+}
>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>+
>>>+/**
>>>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>+ * attribute delete
>>>+ *
>>>+ * @dev: port device
>>>+ * @nlh: netlink msg with bridge port attributes
>>>+ *
>>>+ * Notify switch device port of bridge port attribute delete
>>>+ */
>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>+ struct nlmsghdr *nlh, u16 flags)
>>>+{
>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>+ struct net_device *lower_dev;
>>>+ struct list_head *iter;
>>>+ int ret = 0, err = 0;
>>>+
>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>+ return err;
>>>+
>>>+ if (ops->ndo_bridge_dellink) {
>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>+ }
>>>+
>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>+ if (err)
>>>+ ret = err;
>>>+ }
>>>+
>>>+ return ret;
>>>+}
>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>--
>>>1.7.10.4
>>>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 17:11 ` Jiri Pirko
@ 2014-12-11 17:59 ` Roopa Prabhu
2014-12-11 18:07 ` Jiri Pirko
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-11 17:59 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
On 12/11/14, 9:11 AM, Jiri Pirko wrote:
> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>> to switch asic
>>>>
>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>> but am more inclined to change the prefix of the api to something else.
>>>> Will take any suggestions).
>>>>
>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>> pass bridge port attributes to the port device.
>>>>
>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>> are switch ports).
>>>>
>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>> ---
>>>> include/net/switchdev.h | 5 +++-
>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index 8a6d164..22676b6 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,7 +17,10 @@
>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>> struct netdev_phys_item_id *psid);
>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>> -
>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>> + struct nlmsghdr *nlh, u16 flags);
>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>> + struct nlmsghdr *nlh, u16 flags);
>>>> #else
>>>>
>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>> index d162b21..62317e1 100644
>>>> --- a/net/switchdev/switchdev.c
>>>> +++ b/net/switchdev/switchdev.c
>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>> }
>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>> +
>>>> +/**
>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>> + * port attributes
>>>> + *
>>>> + * @dev: port device
>>>> + * @nlh: netlink msg with bridge port attributes
>>>> + *
>>>> + * Notify switch device port of bridge port attributes
>>>> + */
>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>> + struct nlmsghdr *nlh, u16 flags)
>>>> +{
>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>> + struct net_device *lower_dev;
>>>> + struct list_head *iter;
>>>> + int ret = 0, err = 0;
>>>> +
>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>> + return err;
>>>> +
>>>> + if (ops->ndo_bridge_setlink) {
>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>> Otherwise when only this patch is applied (during bisection)
>>> this won't compile.
>> ack, will fix it and keep that in mind next time.
>>>> + }
>>>> +
>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>> I do not understand why to iterate over lower devices. At this
>>> stage we don't know a thing about this upper or its lowers. Let
>>> the uppers (/masters) to decide if this needs to be propagated
>>> or not.
>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>> port attributes to switch device driver today (vlan and other bridge port
>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>> devices that can be bridge ports.
>
> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
> bonding for example and let it propagate the the call to slaves.
No, that will require bridge attribute support in all drivers. And that
is no good.
> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
> might not make sense to propagate to "lowers".
This does not really propagate to lowers. It is just trying to get to a
switch port and from there to the switch driver.
Example, bond driver does not need to care if its a bridge port. It will
simply pass the call to its slave which
might be a switch port.
bond driver does not care if its a bridge port. But the switch driver
cares, because it knows that the bond was created with switch ports.
>
>> And this allows a switch driver to receive these callbacks if it has marked
>> the switch port with an offload flag. Your way of using the switch port to
>> get to the switch driver does not help in these cases.
> I do not follow how this is related to this case (stacked layout).
>
>> The other option is to use the 'switch device (not port)' to get to the
>> switch driver.
>
> That would not help this case (stacked layout) I believe.
>
>
>> This patch shows that you can still do this with the ndo ops.
>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>> + if (err)
>>>> + ret = err;
>>>> + }
>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>
>>>> +
>>>> + return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>> +
>>>> +/**
>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>> + * attribute delete
>>>> + *
>>>> + * @dev: port device
>>>> + * @nlh: netlink msg with bridge port attributes
>>>> + *
>>>> + * Notify switch device port of bridge port attribute delete
>>>> + */
>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>> + struct nlmsghdr *nlh, u16 flags)
>>>> +{
>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>> + struct net_device *lower_dev;
>>>> + struct list_head *iter;
>>>> + int ret = 0, err = 0;
>>>> +
>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>> + return err;
>>>> +
>>>> + if (ops->ndo_bridge_dellink) {
>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>> + }
>>>> +
>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>> + if (err)
>>>> + ret = err;
>>>> + }
>>>> +
>>>> + return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>> --
>>>> 1.7.10.4
>>>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 17:59 ` Roopa Prabhu
@ 2014-12-11 18:07 ` Jiri Pirko
2014-12-11 18:27 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2014-12-11 18:07 UTC (permalink / raw)
To: Roopa Prabhu
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>
>>>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>to switch asic
>>>>>
>>>>>(The names of the apis look odd with 'switch_port_bridge',
>>>>>but am more inclined to change the prefix of the api to something else.
>>>>>Will take any suggestions).
>>>>>
>>>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>pass bridge port attributes to the port device.
>>>>>
>>>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>are switch ports).
>>>>>
>>>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>---
>>>>>include/net/switchdev.h | 5 +++-
>>>>>net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>
>>>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>index 8a6d164..22676b6 100644
>>>>>--- a/include/net/switchdev.h
>>>>>+++ b/include/net/switchdev.h
>>>>>@@ -17,7 +17,10 @@
>>>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>>>> struct netdev_phys_item_id *psid);
>>>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>-
>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>#else
>>>>>
>>>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>index d162b21..62317e1 100644
>>>>>--- a/net/switchdev/switchdev.c
>>>>>+++ b/net/switchdev/switchdev.c
>>>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>}
>>>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>+
>>>>>+/**
>>>>>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>+ * port attributes
>>>>>+ *
>>>>>+ * @dev: port device
>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>+ *
>>>>>+ * Notify switch device port of bridge port attributes
>>>>>+ */
>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>+{
>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>+ struct net_device *lower_dev;
>>>>>+ struct list_head *iter;
>>>>>+ int ret = 0, err = 0;
>>>>>+
>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>+ return err;
>>>>>+
>>>>>+ if (ops->ndo_bridge_setlink) {
>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>> Otherwise when only this patch is applied (during bisection)
>>>> this won't compile.
>>>ack, will fix it and keep that in mind next time.
>>>>>+ }
>>>>>+
>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>> I do not understand why to iterate over lower devices. At this
>>>> stage we don't know a thing about this upper or its lowers. Let
>>>> the uppers (/masters) to decide if this needs to be propagated
>>>> or not.
>>>Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>port attributes to switch device driver today (vlan and other bridge port
>>>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>devices that can be bridge ports.
>>
>>Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>bonding for example and let it propagate the the call to slaves.
>No, that will require bridge attribute support in all drivers. And that is no
>good.
Not all drivers, just all masters which want to support this. Like bond,
team, macvlan etc. That would be the same as for
ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
see any problem in that. It is much much clearer over big hammer iterate
over lowers in my opinion.
>>Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>might not make sense to propagate to "lowers".
>
>This does not really propagate to lowers. It is just trying to get to a
>switch port and from there to the switch driver.
>Example, bond driver does not need to care if its a bridge port. It will
>simply pass the call to its slave which
>might be a switch port.
>
>bond driver does not care if its a bridge port. But the switch driver cares,
>because it knows that the bond was created with switch ports.
>
>
>>
>>>And this allows a switch driver to receive these callbacks if it has marked
>>>the switch port with an offload flag. Your way of using the switch port to
>>>get to the switch driver does not help in these cases.
>>I do not follow how this is related to this case (stacked layout).
>>
>>>The other option is to use the 'switch device (not port)' to get to the
>>>switch driver.
>>
>>That would not help this case (stacked layout) I believe.
>>
>>
>>>This patch shows that you can still do this with the ndo ops.
>>>>>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>+ if (err)
>>>>>+ ret = err;
>>>>>+ }
>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>
>>>>>+
>>>>>+ return ret;
>>>>>+}
>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>+
>>>>>+/**
>>>>>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>+ * attribute delete
>>>>>+ *
>>>>>+ * @dev: port device
>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>+ *
>>>>>+ * Notify switch device port of bridge port attribute delete
>>>>>+ */
>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>+{
>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>+ struct net_device *lower_dev;
>>>>>+ struct list_head *iter;
>>>>>+ int ret = 0, err = 0;
>>>>>+
>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>+ return err;
>>>>>+
>>>>>+ if (ops->ndo_bridge_dellink) {
>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>+ }
>>>>>+
>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>+ if (err)
>>>>>+ ret = err;
>>>>>+ }
>>>>>+
>>>>>+ return ret;
>>>>>+}
>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>--
>>>>>1.7.10.4
>>>>>
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 18:07 ` Jiri Pirko
@ 2014-12-11 18:27 ` Roopa Prabhu
2014-12-11 22:25 ` Jiri Pirko
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-11 18:27 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
On 12/11/14, 10:07 AM, Jiri Pirko wrote:
> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>
>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>> to switch asic
>>>>>>
>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>> Will take any suggestions).
>>>>>>
>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>> pass bridge port attributes to the port device.
>>>>>>
>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>> are switch ports).
>>>>>>
>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>> ---
>>>>>> include/net/switchdev.h | 5 +++-
>>>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>> index 8a6d164..22676b6 100644
>>>>>> --- a/include/net/switchdev.h
>>>>>> +++ b/include/net/switchdev.h
>>>>>> @@ -17,7 +17,10 @@
>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>> struct netdev_phys_item_id *psid);
>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>> -
>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>> #else
>>>>>>
>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>> index d162b21..62317e1 100644
>>>>>> --- a/net/switchdev/switchdev.c
>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>> }
>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>> +
>>>>>> +/**
>>>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>> + * port attributes
>>>>>> + *
>>>>>> + * @dev: port device
>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>> + *
>>>>>> + * Notify switch device port of bridge port attributes
>>>>>> + */
>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>> +{
>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>> + struct net_device *lower_dev;
>>>>>> + struct list_head *iter;
>>>>>> + int ret = 0, err = 0;
>>>>>> +
>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>> + return err;
>>>>>> +
>>>>>> + if (ops->ndo_bridge_setlink) {
>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>> Otherwise when only this patch is applied (during bisection)
>>>>> this won't compile.
>>>> ack, will fix it and keep that in mind next time.
>>>>>> + }
>>>>>> +
>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>> I do not understand why to iterate over lower devices. At this
>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>> or not.
>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>> port attributes to switch device driver today (vlan and other bridge port
>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>> devices that can be bridge ports.
>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>> bonding for example and let it propagate the the call to slaves.
>> No, that will require bridge attribute support in all drivers. And that is no
>> good.
> Not all drivers, just all masters which want to support this. Like bond,
> team, macvlan etc. That would be the same as for
> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
> see any problem in that. It is much much clearer over big hammer iterate
> over lowers in my opinion.
You cannot avoid the lowerdev iteration in any case.
If you added it in the individual drivers: bond, macvlan and other
drivers will all have to do the same thing.
ie Call bridge setlink on lowerdevs.
My patch avoids the need to modify these drivers. Besides it does this
only when the OFFLOAD flag is set.
It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc.
It will be all other ndo_ops we will need for switch asics.
It will be l3 tomorrow, if the route is through a bond (But at that
point, we may end up having to introduce switch device instead of going
to the port. Lets see).
Today this patch introduces an abstract way to get to the switch driver
by getting to the slave switch port (And only when the OFFLOAD flag is set).
>
>
>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>> might not make sense to propagate to "lowers".
>> This does not really propagate to lowers. It is just trying to get to a
>> switch port and from there to the switch driver.
>> Example, bond driver does not need to care if its a bridge port. It will
>> simply pass the call to its slave which
>> might be a switch port.
>>
>> bond driver does not care if its a bridge port. But the switch driver cares,
>> because it knows that the bond was created with switch ports.
>>
>>
>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>> the switch port with an offload flag. Your way of using the switch port to
>>>> get to the switch driver does not help in these cases.
>>> I do not follow how this is related to this case (stacked layout).
>>>
>>>> The other option is to use the 'switch device (not port)' to get to the
>>>> switch driver.
>>> That would not help this case (stacked layout) I believe.
>>>
>>>
>>>> This patch shows that you can still do this with the ndo ops.
>>>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>> + if (err)
>>>>>> + ret = err;
>>>>>> + }
>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>
>>>>>> +
>>>>>> + return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>> +
>>>>>> +/**
>>>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>> + * attribute delete
>>>>>> + *
>>>>>> + * @dev: port device
>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>> + *
>>>>>> + * Notify switch device port of bridge port attribute delete
>>>>>> + */
>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>> +{
>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>> + struct net_device *lower_dev;
>>>>>> + struct list_head *iter;
>>>>>> + int ret = 0, err = 0;
>>>>>> +
>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>> + return err;
>>>>>> +
>>>>>> + if (ops->ndo_bridge_dellink) {
>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>> + }
>>>>>> +
>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>> + if (err)
>>>>>> + ret = err;
>>>>>> + }
>>>>>> +
>>>>>> + return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>> --
>>>>>> 1.7.10.4
>>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 18:27 ` Roopa Prabhu
@ 2014-12-11 22:25 ` Jiri Pirko
2014-12-14 14:13 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2014-12-11 22:25 UTC (permalink / raw)
To: Roopa Prabhu
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>
>>>>>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>to switch asic
>>>>>>>
>>>>>>>(The names of the apis look odd with 'switch_port_bridge',
>>>>>>>but am more inclined to change the prefix of the api to something else.
>>>>>>>Will take any suggestions).
>>>>>>>
>>>>>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>pass bridge port attributes to the port device.
>>>>>>>
>>>>>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>are switch ports).
>>>>>>>
>>>>>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>---
>>>>>>>include/net/switchdev.h | 5 +++-
>>>>>>>net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>index 8a6d164..22676b6 100644
>>>>>>>--- a/include/net/switchdev.h
>>>>>>>+++ b/include/net/switchdev.h
>>>>>>>@@ -17,7 +17,10 @@
>>>>>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>-
>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>#else
>>>>>>>
>>>>>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>index d162b21..62317e1 100644
>>>>>>>--- a/net/switchdev/switchdev.c
>>>>>>>+++ b/net/switchdev/switchdev.c
>>>>>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>}
>>>>>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>+
>>>>>>>+/**
>>>>>>>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>+ * port attributes
>>>>>>>+ *
>>>>>>>+ * @dev: port device
>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>+ *
>>>>>>>+ * Notify switch device port of bridge port attributes
>>>>>>>+ */
>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>+{
>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>+ struct net_device *lower_dev;
>>>>>>>+ struct list_head *iter;
>>>>>>>+ int ret = 0, err = 0;
>>>>>>>+
>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>+ return err;
>>>>>>>+
>>>>>>>+ if (ops->ndo_bridge_setlink) {
>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>> this won't compile.
>>>>>ack, will fix it and keep that in mind next time.
>>>>>>>+ }
>>>>>>>+
>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>> or not.
>>>>>Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>port attributes to switch device driver today (vlan and other bridge port
>>>>>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>devices that can be bridge ports.
>>>>Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>bonding for example and let it propagate the the call to slaves.
>>>No, that will require bridge attribute support in all drivers. And that is no
>>>good.
>>Not all drivers, just all masters which want to support this. Like bond,
>>team, macvlan etc. That would be the same as for
>>ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>see any problem in that. It is much much clearer over big hammer iterate
>>over lowers in my opinion.
>
>You cannot avoid the lowerdev iteration in any case.
>If you added it in the individual drivers: bond, macvlan and other drivers
>will all have to do the same thing.
>ie Call bridge setlink on lowerdevs.
I feel that the right way is to let masters propagate that themselves in
their code. That's it. I might be wrong of course.
>My patch avoids the need to modify these drivers. Besides it does this only
>when the OFFLOAD flag is set.
Yep, well in my reply to another patch of you series I expressed my
feeling that the flag should be really checked in particular switch
driver, not core. But I might be wrong there as well...
>
>It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>will be all other ndo_ops we will need for switch asics.
>It will be l3 tomorrow, if the route is through a bond (But at that point, we
>may end up having to introduce switch device instead of going to the port.
>Lets see).
>
>Today this patch introduces an abstract way to get to the switch driver by
>getting to the slave switch port (And only when the OFFLOAD flag is set).
>
>
>>
>>
>>>>Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>might not make sense to propagate to "lowers".
>>>This does not really propagate to lowers. It is just trying to get to a
>>>switch port and from there to the switch driver.
>>>Example, bond driver does not need to care if its a bridge port. It will
>>>simply pass the call to its slave which
>>>might be a switch port.
>>>
>>>bond driver does not care if its a bridge port. But the switch driver cares,
>>>because it knows that the bond was created with switch ports.
>>>
>>>
>>>>>And this allows a switch driver to receive these callbacks if it has marked
>>>>>the switch port with an offload flag. Your way of using the switch port to
>>>>>get to the switch driver does not help in these cases.
>>>>I do not follow how this is related to this case (stacked layout).
>>>>
>>>>>The other option is to use the 'switch device (not port)' to get to the
>>>>>switch driver.
>>>>That would not help this case (stacked layout) I believe.
>>>>
>>>>
>>>>>This patch shows that you can still do this with the ndo ops.
>>>>>>>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>+ if (err)
>>>>>>>+ ret = err;
>>>>>>>+ }
>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>
>>>>>>>+
>>>>>>>+ return ret;
>>>>>>>+}
>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>+
>>>>>>>+/**
>>>>>>>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>+ * attribute delete
>>>>>>>+ *
>>>>>>>+ * @dev: port device
>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>+ *
>>>>>>>+ * Notify switch device port of bridge port attribute delete
>>>>>>>+ */
>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>+{
>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>+ struct net_device *lower_dev;
>>>>>>>+ struct list_head *iter;
>>>>>>>+ int ret = 0, err = 0;
>>>>>>>+
>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>+ return err;
>>>>>>>+
>>>>>>>+ if (ops->ndo_bridge_dellink) {
>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>+ }
>>>>>>>+
>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>+ if (err)
>>>>>>>+ ret = err;
>>>>>>>+ }
>>>>>>>+
>>>>>>>+ return ret;
>>>>>>>+}
>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>--
>>>>>>>1.7.10.4
>>>>>>>
>>>>--
>>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>the body of a message to majordomo@vger.kernel.org
>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-11 22:25 ` Jiri Pirko
@ 2014-12-14 14:13 ` Roopa Prabhu
2014-12-14 15:35 ` Jiri Pirko
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-14 14:13 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
On 12/11/14, 2:25 PM, Jiri Pirko wrote:
> Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>
>>>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>> to switch asic
>>>>>>>>
>>>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>>>> Will take any suggestions).
>>>>>>>>
>>>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>> pass bridge port attributes to the port device.
>>>>>>>>
>>>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>> are switch ports).
>>>>>>>>
>>>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>> ---
>>>>>>>> include/net/switchdev.h | 5 +++-
>>>>>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>> index 8a6d164..22676b6 100644
>>>>>>>> --- a/include/net/switchdev.h
>>>>>>>> +++ b/include/net/switchdev.h
>>>>>>>> @@ -17,7 +17,10 @@
>>>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>> -
>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>> #else
>>>>>>>>
>>>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>> index d162b21..62317e1 100644
>>>>>>>> --- a/net/switchdev/switchdev.c
>>>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>> + * port attributes
>>>>>>>> + *
>>>>>>>> + * @dev: port device
>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>> + *
>>>>>>>> + * Notify switch device port of bridge port attributes
>>>>>>>> + */
>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>> +{
>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>> + struct net_device *lower_dev;
>>>>>>>> + struct list_head *iter;
>>>>>>>> + int ret = 0, err = 0;
>>>>>>>> +
>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>> + return err;
>>>>>>>> +
>>>>>>>> + if (ops->ndo_bridge_setlink) {
>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>> this won't compile.
>>>>>> ack, will fix it and keep that in mind next time.
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>> or not.
>>>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>> port attributes to switch device driver today (vlan and other bridge port
>>>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>> devices that can be bridge ports.
>>>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>> bonding for example and let it propagate the the call to slaves.
>>>> No, that will require bridge attribute support in all drivers. And that is no
>>>> good.
>>> Not all drivers, just all masters which want to support this. Like bond,
>>> team, macvlan etc. That would be the same as for
>>> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>> see any problem in that. It is much much clearer over big hammer iterate
>>> over lowers in my opinion.
>> You cannot avoid the lowerdev iteration in any case.
>> If you added it in the individual drivers: bond, macvlan and other drivers
>> will all have to do the same thing.
>> ie Call bridge setlink on lowerdevs.
> I feel that the right way is to let masters propagate that themselves in
> their code.
In this case no. Just because an interface is a port in a bridge, it is
wrong to indicate that the interface driver needs to understand all
bridge port configuration attributes. Note that with what you are asking
for ...all bridge port drivers (bonds, vxlans) will also need to
implement the netdev stp state update api.
> That's it. I might be wrong of course.
>> My patch avoids the need to modify these drivers. Besides it does this only
>> when the OFFLOAD flag is set.
>
> Yep, well in my reply to another patch of you series I expressed my
> feeling that the flag should be really checked in particular switch
> driver, not core. But I might be wrong there as well...
The bridge driver owns these attributes...and he needs to call the
switchdev api to offload.
And the condition for the switchdev api call is the offload flag. And
the offload flag is part of the switchdev api.
The drivers just set it on the netdev, they dont own the offload flag.
So, I don't see a reason why the core should not
know about the flag.
What has been accepted in the kernel currently does not help bridge
driver offloading to switchdev. It does help if you want to manage your
switch device separately like you were already doing with nics. ie going
to switch port driver directly. It does not help the stacked device case
either.
I will resubmit my series with the checkpatch errors you pointed out.
And, am also looking at other ways to solve the problem.
Thanks for the review.
>> It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>> will be all other ndo_ops we will need for switch asics.
>> It will be l3 tomorrow, if the route is through a bond (But at that point, we
>> may end up having to introduce switch device instead of going to the port.
>> Lets see).
>>
>> Today this patch introduces an abstract way to get to the switch driver by
>> getting to the slave switch port (And only when the OFFLOAD flag is set).
>>
>>
>>>
>>>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>> might not make sense to propagate to "lowers".
>>>> This does not really propagate to lowers. It is just trying to get to a
>>>> switch port and from there to the switch driver.
>>>> Example, bond driver does not need to care if its a bridge port. It will
>>>> simply pass the call to its slave which
>>>> might be a switch port.
>>>>
>>>> bond driver does not care if its a bridge port. But the switch driver cares,
>>>> because it knows that the bond was created with switch ports.
>>>>
>>>>
>>>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>>>> the switch port with an offload flag. Your way of using the switch port to
>>>>>> get to the switch driver does not help in these cases.
>>>>> I do not follow how this is related to this case (stacked layout).
>>>>>
>>>>>> The other option is to use the 'switch device (not port)' to get to the
>>>>>> switch driver.
>>>>> That would not help this case (stacked layout) I believe.
>>>>>
>>>>>
>>>>>> This patch shows that you can still do this with the ndo ops.
>>>>>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>> + if (err)
>>>>>>>> + ret = err;
>>>>>>>> + }
>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>> + * attribute delete
>>>>>>>> + *
>>>>>>>> + * @dev: port device
>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>> + *
>>>>>>>> + * Notify switch device port of bridge port attribute delete
>>>>>>>> + */
>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>> +{
>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>> + struct net_device *lower_dev;
>>>>>>>> + struct list_head *iter;
>>>>>>>> + int ret = 0, err = 0;
>>>>>>>> +
>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>> + return err;
>>>>>>>> +
>>>>>>>> + if (ops->ndo_bridge_dellink) {
>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>> + if (err)
>>>>>>>> + ret = err;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>> --
>>>>>>>> 1.7.10.4
>>>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-14 14:13 ` Roopa Prabhu
@ 2014-12-14 15:35 ` Jiri Pirko
2014-12-14 19:41 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2014-12-14 15:35 UTC (permalink / raw)
To: Roopa Prabhu
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
Sun, Dec 14, 2014 at 03:13:40PM CET, roopa@cumulusnetworks.com wrote:
>On 12/11/14, 2:25 PM, Jiri Pirko wrote:
>>Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>>>Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>>>On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>>>Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>
>>>>>>>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>>>to switch asic
>>>>>>>>>
>>>>>>>>>(The names of the apis look odd with 'switch_port_bridge',
>>>>>>>>>but am more inclined to change the prefix of the api to something else.
>>>>>>>>>Will take any suggestions).
>>>>>>>>>
>>>>>>>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>>>pass bridge port attributes to the port device.
>>>>>>>>>
>>>>>>>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>>>are switch ports).
>>>>>>>>>
>>>>>>>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>---
>>>>>>>>>include/net/switchdev.h | 5 +++-
>>>>>>>>>net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>>>index 8a6d164..22676b6 100644
>>>>>>>>>--- a/include/net/switchdev.h
>>>>>>>>>+++ b/include/net/switchdev.h
>>>>>>>>>@@ -17,7 +17,10 @@
>>>>>>>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>>>-
>>>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>#else
>>>>>>>>>
>>>>>>>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>>>index d162b21..62317e1 100644
>>>>>>>>>--- a/net/switchdev/switchdev.c
>>>>>>>>>+++ b/net/switchdev/switchdev.c
>>>>>>>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>>>}
>>>>>>>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>>>+
>>>>>>>>>+/**
>>>>>>>>>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>>>+ * port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * @dev: port device
>>>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * Notify switch device port of bridge port attributes
>>>>>>>>>+ */
>>>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>+{
>>>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>+ struct net_device *lower_dev;
>>>>>>>>>+ struct list_head *iter;
>>>>>>>>>+ int ret = 0, err = 0;
>>>>>>>>>+
>>>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>+ return err;
>>>>>>>>>+
>>>>>>>>>+ if (ops->ndo_bridge_setlink) {
>>>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>>> this won't compile.
>>>>>>>ack, will fix it and keep that in mind next time.
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>>> or not.
>>>>>>>Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>>>port attributes to switch device driver today (vlan and other bridge port
>>>>>>>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>>>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>>>devices that can be bridge ports.
>>>>>>Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>>>bonding for example and let it propagate the the call to slaves.
>>>>>No, that will require bridge attribute support in all drivers. And that is no
>>>>>good.
>>>>Not all drivers, just all masters which want to support this. Like bond,
>>>>team, macvlan etc. That would be the same as for
>>>>ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>>>see any problem in that. It is much much clearer over big hammer iterate
>>>>over lowers in my opinion.
>>>You cannot avoid the lowerdev iteration in any case.
>>>If you added it in the individual drivers: bond, macvlan and other drivers
>>>will all have to do the same thing.
>>>ie Call bridge setlink on lowerdevs.
>>I feel that the right way is to let masters propagate that themselves in
>>their code.
>In this case no. Just because an interface is a port in a bridge, it is wrong
>to indicate that the interface driver needs to understand all bridge port
>configuration attributes. Note that with what you are asking for ...all
>bridge port drivers (bonds, vxlans) will also need to implement the netdev
>stp state update api.
I'm very well aware of this fact. But still, I'm convinced that the way
similar things are implemented now, using prapagation inside particular
drivers (bond/team/etc) is the correct way to go. I do not see any
downside in that. But we are running in circles here. I would love to
hear opinion of other people here.
>>That's it. I might be wrong of course.
>
>
>>>My patch avoids the need to modify these drivers. Besides it does this only
>>>when the OFFLOAD flag is set.
>>
>>Yep, well in my reply to another patch of you series I expressed my
>>feeling that the flag should be really checked in particular switch
>>driver, not core. But I might be wrong there as well...
>
>The bridge driver owns these attributes...and he needs to call the switchdev
>api to offload.
>And the condition for the switchdev api call is the offload flag. And the
>offload flag is part of the switchdev api.
>The drivers just set it on the netdev, they dont own the offload flag. So, I
>don't see a reason why the core should not
>know about the flag.
I do not understand the formulation "own the offload flag". What I say
is let the bridge/others to call the switchdev api unconditionally and
let the leaf drivers handle that as they see fit, taking various facts
into account, flags included. This way you avoid the need for flags
inheritance in stacked scenarios. Imagine following example:
bridge - bond --- eth1
--- eth2
eth1 and eth2 are switch ports. Now eth1 has the flag set and eth2 does
not. Should the bond have the flag set or not? And if it has, eth2 need
to check the flag as well to do not offload.
Implementing the inheritance correctly would be a small nightmare. So I
say, why don't just let the leafs to check and decide.
>
>What has been accepted in the kernel currently does not help bridge driver
>offloading to switchdev. It does help if you want to manage your switch
>device separately like you were already doing with nics. ie going to switch
>port driver directly. It does not help the stacked device case either.
>
>
>I will resubmit my series with the checkpatch errors you pointed out.
>
>And, am also looking at other ways to solve the problem.
>
>Thanks for the review.
>
>
>>>It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>>>will be all other ndo_ops we will need for switch asics.
>>>It will be l3 tomorrow, if the route is through a bond (But at that point, we
>>>may end up having to introduce switch device instead of going to the port.
>>>Lets see).
>>>
>>>Today this patch introduces an abstract way to get to the switch driver by
>>>getting to the slave switch port (And only when the OFFLOAD flag is set).
>>>
>>>
>>>>
>>>>>>Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>>>might not make sense to propagate to "lowers".
>>>>>This does not really propagate to lowers. It is just trying to get to a
>>>>>switch port and from there to the switch driver.
>>>>>Example, bond driver does not need to care if its a bridge port. It will
>>>>>simply pass the call to its slave which
>>>>>might be a switch port.
>>>>>
>>>>>bond driver does not care if its a bridge port. But the switch driver cares,
>>>>>because it knows that the bond was created with switch ports.
>>>>>
>>>>>
>>>>>>>And this allows a switch driver to receive these callbacks if it has marked
>>>>>>>the switch port with an offload flag. Your way of using the switch port to
>>>>>>>get to the switch driver does not help in these cases.
>>>>>>I do not follow how this is related to this case (stacked layout).
>>>>>>
>>>>>>>The other option is to use the 'switch device (not port)' to get to the
>>>>>>>switch driver.
>>>>>>That would not help this case (stacked layout) I believe.
>>>>>>
>>>>>>
>>>>>>>This patch shows that you can still do this with the ndo ops.
>>>>>>>>>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>>>+ if (err)
>>>>>>>>>+ ret = err;
>>>>>>>>>+ }
>>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>>
>>>>>>>>>+
>>>>>>>>>+ return ret;
>>>>>>>>>+}
>>>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>>>+
>>>>>>>>>+/**
>>>>>>>>>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>>>+ * attribute delete
>>>>>>>>>+ *
>>>>>>>>>+ * @dev: port device
>>>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * Notify switch device port of bridge port attribute delete
>>>>>>>>>+ */
>>>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>+{
>>>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>+ struct net_device *lower_dev;
>>>>>>>>>+ struct list_head *iter;
>>>>>>>>>+ int ret = 0, err = 0;
>>>>>>>>>+
>>>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>+ return err;
>>>>>>>>>+
>>>>>>>>>+ if (ops->ndo_bridge_dellink) {
>>>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>>>+ if (err)
>>>>>>>>>+ ret = err;
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ return ret;
>>>>>>>>>+}
>>>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>>>--
>>>>>>>>>1.7.10.4
>>>>>>>>>
>>>>>>--
>>>>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>the body of a message to majordomo@vger.kernel.org
>>>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-14 15:35 ` Jiri Pirko
@ 2014-12-14 19:41 ` Roopa Prabhu
2014-12-15 15:26 ` Jamal Hadi Salim
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-14 19:41 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
On 12/14/14, 7:35 AM, Jiri Pirko wrote:
> Sun, Dec 14, 2014 at 03:13:40PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 2:25 PM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>>>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>>
>>>>>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>>>> to switch asic
>>>>>>>>>>
>>>>>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>>>>>> Will take any suggestions).
>>>>>>>>>>
>>>>>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>>>> pass bridge port attributes to the port device.
>>>>>>>>>>
>>>>>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>>>> are switch ports).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>> ---
>>>>>>>>>> include/net/switchdev.h | 5 +++-
>>>>>>>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>>>> index 8a6d164..22676b6 100644
>>>>>>>>>> --- a/include/net/switchdev.h
>>>>>>>>>> +++ b/include/net/switchdev.h
>>>>>>>>>> @@ -17,7 +17,10 @@
>>>>>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>>>> -
>>>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>> #else
>>>>>>>>>>
>>>>>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>>>> index d162b21..62317e1 100644
>>>>>>>>>> --- a/net/switchdev/switchdev.c
>>>>>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>>>> }
>>>>>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>>>> + * port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * @dev: port device
>>>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * Notify switch device port of bridge port attributes
>>>>>>>>>> + */
>>>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>> +{
>>>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>> + struct net_device *lower_dev;
>>>>>>>>>> + struct list_head *iter;
>>>>>>>>>> + int ret = 0, err = 0;
>>>>>>>>>> +
>>>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>> + return err;
>>>>>>>>>> +
>>>>>>>>>> + if (ops->ndo_bridge_setlink) {
>>>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>>>> this won't compile.
>>>>>>>> ack, will fix it and keep that in mind next time.
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>>>> or not.
>>>>>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>>>> port attributes to switch device driver today (vlan and other bridge port
>>>>>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>>>> devices that can be bridge ports.
>>>>>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>>>> bonding for example and let it propagate the the call to slaves.
>>>>>> No, that will require bridge attribute support in all drivers. And that is no
>>>>>> good.
>>>>> Not all drivers, just all masters which want to support this. Like bond,
>>>>> team, macvlan etc. That would be the same as for
>>>>> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>>>> see any problem in that. It is much much clearer over big hammer iterate
>>>>> over lowers in my opinion.
>>>> You cannot avoid the lowerdev iteration in any case.
>>>> If you added it in the individual drivers: bond, macvlan and other drivers
>>>> will all have to do the same thing.
>>>> ie Call bridge setlink on lowerdevs.
>>> I feel that the right way is to let masters propagate that themselves in
>>> their code.
>> In this case no. Just because an interface is a port in a bridge, it is wrong
>> to indicate that the interface driver needs to understand all bridge port
>> configuration attributes. Note that with what you are asking for ...all
>> bridge port drivers (bonds, vxlans) will also need to implement the netdev
>> stp state update api.
> I'm very well aware of this fact. But still, I'm convinced that the way
> similar things are implemented now, using prapagation inside particular
> drivers (bond/team/etc) is the correct way to go. I do not see any
> downside in that. But we are running in circles here. I would love to
> hear opinion of other people here.
I know you are aware of this fact ,...but just stating again here that
we are also talking about l3 ops here.
Which don't make sense to be in all drivers. Maybe i am being very
conservative.
And yes, agree we are going in circles. :). and yes, would love to hear
other opinions as well.
>
>
>>> That's it. I might be wrong of course.
>>
>>>> My patch avoids the need to modify these drivers. Besides it does this only
>>>> when the OFFLOAD flag is set.
>>> Yep, well in my reply to another patch of you series I expressed my
>>> feeling that the flag should be really checked in particular switch
>>> driver, not core. But I might be wrong there as well...
>> The bridge driver owns these attributes...and he needs to call the switchdev
>> api to offload.
>> And the condition for the switchdev api call is the offload flag. And the
>> offload flag is part of the switchdev api.
>> The drivers just set it on the netdev, they dont own the offload flag. So, I
>> don't see a reason why the core should not
>> know about the flag.
> I do not understand the formulation "own the offload flag". What I say
> is let the bridge/others to call the switchdev api unconditionally and
> let the leaf drivers handle that as they see fit, taking various facts
> into account, flags included. This way you avoid the need for flags
> inheritance in stacked scenarios. Imagine following example:
>
> bridge - bond --- eth1
> --- eth2
>
> eth1 and eth2 are switch ports. Now eth1 has the flag set and eth2 does
> not. Should the bond have the flag set or not? And if it has, eth2 need
> to check the flag as well to do not offload.
>
> Implementing the inheritance correctly would be a small nightmare. So I
> say, why don't just let the leafs to check and decide.
Note that i wasn't really considering inheritance as the major factor in
our argument.
If that needs to be dropped, i can consider that. But, the flag is there
for a reason: To not unconditionally call lowerdev ndo's.
ie reduce some overhead.
And in the above case, the bond does get the flag..because it has a
lowerdev with the feature flag.
And note that the inheritance does not come with a cost. This is all
existing code. If i hadn't mentioned it, it would probably go unnoticed ;).
I had to just add the below patch. And existing calls to
netdev_update_features in bridge ndo_add/del_slave takes care of
inheriting the feature flags from the slaves.
again, i did not introduce any complexity here. The feature flag is just
there to reduce some overhead in unnecessarily traversing all lowerdevs.
@@ -159,7 +161,8 @@ enum {
*/
#define NETIF_F_ONE_FOR_ALL (NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
NETIF_F_SG | NETIF_F_HIGHDMA | \
- NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)
+ NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \
+ NETIF_F_HW_NETFUNC_OFFLOAD)
/*
* If one device doesn't support one of these features, then disable it
* for all in netdev_increment_features.
>
>
>> What has been accepted in the kernel currently does not help bridge driver
>> offloading to switchdev. It does help if you want to manage your switch
>> device separately like you were already doing with nics. ie going to switch
>> port driver directly. It does not help the stacked device case either.
>>
>>
>> I will resubmit my series with the checkpatch errors you pointed out.
>>
>> And, am also looking at other ways to solve the problem.
>>
>> Thanks for the review.
>>
>>
>>>> It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>>>> will be all other ndo_ops we will need for switch asics.
>>>> It will be l3 tomorrow, if the route is through a bond (But at that point, we
>>>> may end up having to introduce switch device instead of going to the port.
>>>> Lets see).
>>>>
>>>> Today this patch introduces an abstract way to get to the switch driver by
>>>> getting to the slave switch port (And only when the OFFLOAD flag is set).
>>>>
>>>>
>>>>>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>>>> might not make sense to propagate to "lowers".
>>>>>> This does not really propagate to lowers. It is just trying to get to a
>>>>>> switch port and from there to the switch driver.
>>>>>> Example, bond driver does not need to care if its a bridge port. It will
>>>>>> simply pass the call to its slave which
>>>>>> might be a switch port.
>>>>>>
>>>>>> bond driver does not care if its a bridge port. But the switch driver cares,
>>>>>> because it knows that the bond was created with switch ports.
>>>>>>
>>>>>>
>>>>>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>>>>>> the switch port with an offload flag. Your way of using the switch port to
>>>>>>>> get to the switch driver does not help in these cases.
>>>>>>> I do not follow how this is related to this case (stacked layout).
>>>>>>>
>>>>>>>> The other option is to use the 'switch device (not port)' to get to the
>>>>>>>> switch driver.
>>>>>>> That would not help this case (stacked layout) I believe.
>>>>>>>
>>>>>>>
>>>>>>>> This patch shows that you can still do this with the ndo ops.
>>>>>>>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>>>> + if (err)
>>>>>>>>>> + ret = err;
>>>>>>>>>> + }
>>>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> + return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>>>> + * attribute delete
>>>>>>>>>> + *
>>>>>>>>>> + * @dev: port device
>>>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * Notify switch device port of bridge port attribute delete
>>>>>>>>>> + */
>>>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>> +{
>>>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>> + struct net_device *lower_dev;
>>>>>>>>>> + struct list_head *iter;
>>>>>>>>>> + int ret = 0, err = 0;
>>>>>>>>>> +
>>>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>> + return err;
>>>>>>>>>> +
>>>>>>>>>> + if (ops->ndo_bridge_dellink) {
>>>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>>>> + if (err)
>>>>>>>>>> + ret = err;
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>>>> --
>>>>>>>>>> 1.7.10.4
>>>>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-14 19:41 ` Roopa Prabhu
@ 2014-12-15 15:26 ` Jamal Hadi Salim
2014-12-15 17:20 ` Roopa Prabhu
2014-12-15 17:25 ` Arad, Ronen
0 siblings, 2 replies; 31+ messages in thread
From: Jamal Hadi Salim @ 2014-12-15 15:26 UTC (permalink / raw)
To: Roopa Prabhu, Jiri Pirko
Cc: sfeldma, bcrl, tgraf, john.fastabend, stephen, linville, vyasevic,
netdev, davem, shm, gospo
Sorry - i didnt quiet follow the discussion, but i can see the value
of propagating things from parent to children netdevs as part of the
generic approach. And in that spirit:
Ben's patches (and I am sure the cumulus folk do this) expose ports.
i.e you boot up the hardware and you see ports. You can then put these
ports in a bridge and you can offload fdbs and do other parametrization
to the ASIC. IOW, this only becomes a bridge because you created one
in the kernel and attached bridge ports to it.
Lets say i didnt want a bridge. I want instead to take these exposed
ports and create a bond (and maybe play with LACP). How does this
propagation from parent->child->child work then? I think the idea
of just bonding and not exposing it as a switch is a reasonable use
case.
Also how does it work when i start doing L3 and the bond's port doesnt
support L3? Is it time to revive the thing we called TheThing in Du?
cheers,
jamal
On 12/14/14 14:41, Roopa Prabhu wrote:
> On 12/14/14, 7:35 AM, Jiri Pirko wrote:
[..chopped off for brevity and saving electrons..]
cheers,
jamal
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 15:26 ` Jamal Hadi Salim
@ 2014-12-15 17:20 ` Roopa Prabhu
2014-12-15 18:03 ` Benjamin LaHaise
2014-12-15 17:25 ` Arad, Ronen
1 sibling, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-15 17:20 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jiri Pirko, sfeldma, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic, netdev, davem, shm, gospo
On 12/15/14, 7:26 AM, Jamal Hadi Salim wrote:
>
> Sorry - i didnt quiet follow the discussion, but i can see the value
> of propagating things from parent to children netdevs as part of the
> generic approach. And in that spirit:
>
> Ben's patches (and I am sure the cumulus folk do this) expose ports.
> i.e you boot up the hardware and you see ports. You can then put these
> ports in a bridge and you can offload fdbs and do other parametrization
> to the ASIC. IOW, this only becomes a bridge because you created one
> in the kernel and attached bridge ports to it.
>
> Lets say i didnt want a bridge. I want instead to take these exposed
> ports and create a bond (and maybe play with LACP). How does this
> propagation from parent->child->child work then? I think the idea
> of just bonding and not exposing it as a switch is a reasonable use
> case.
We have not come to pure bonding and lacp yet (but i have mentioned it
in many contexts before).
The use case you mention is offloading bond attributes. This will be
addressed as part of ongoing switchdev work
for all other offloads (bonds, vxlans etc).
Right now we are only talking bridge port attribute offload
(learn/flood/port state etc). This could still be a bridge port
attribute on a bond
when the bond is a bridge port.
> Also how does it work when i start doing L3 and the bond's port doesnt
> support L3? Is it time to revive the thing we called TheThing in Du?
yes, exactly. This is what i had indicated in my initial emails on this
thread when the stacked devices topic came up.
Since there was reluctance in introducing a switch device (theThing), My
current patch tries to do that without a switch device.
Since this is still l2, and we are dealing with bridge port attributes,
my current patch traverses the stacked netdevs to call the
ndo_bridge_setlink on the switch port.
When it comes to l3, we can follow the same.., but as discussed in Du,
there are other reasons where a switch device becomes necessary.
I can submit an alternate series to cover the switch device approach for
l2 as well.
Thanks,
Roopa
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 15:26 ` Jamal Hadi Salim
2014-12-15 17:20 ` Roopa Prabhu
@ 2014-12-15 17:25 ` Arad, Ronen
2014-12-15 17:57 ` Benjamin LaHaise
2014-12-15 17:57 ` John Fastabend
1 sibling, 2 replies; 31+ messages in thread
From: Arad, Ronen @ 2014-12-15 17:25 UTC (permalink / raw)
To: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org
Cc: sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
john.fastabend@gmail.com, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Jamal Hadi Salim
> Sent: Monday, December 15, 2014 5:26 PM
> To: Roopa Prabhu; Jiri Pirko
> Cc: sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> john.fastabend@gmail.com; stephen@networkplumber.org;
> linville@tuxdriver.com; vyasevic@redhat.com; netdev@vger.kernel.org;
> davem@davemloft.net; shm@cumulusnetworks.com;
> gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
>
> Sorry - i didnt quiet follow the discussion, but i can see the value of
> propagating things from parent to children netdevs as part of the generic
> approach. And in that spirit:
>
> Ben's patches (and I am sure the cumulus folk do this) expose ports.
> i.e you boot up the hardware and you see ports. You can then put these
> ports in a bridge and you can offload fdbs and do other parametrization to
> the ASIC. IOW, this only becomes a bridge because you created one in the
> kernel and attached bridge ports to it.
>
> Lets say i didnt want a bridge. I want instead to take these exposed ports and
> create a bond (and maybe play with LACP). How does this propagation from
> parent->child->child work then? I think the idea of just bonding and not
> exposing it as a switch is a reasonable use case.
Are you saying that the software should reflect the same functionality the HW provides?
In other words is creating a bridge device mandatory for supporting standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
VLAN-bridging including port VLAN membership, VLAN filtering, PVID, Egress un-tagging could be supported without an explicit bridge device when port devices implement bridge ndos (ndo_bridge_{set,del,get}link). What is lost is the ability to have common handling of VLAN-aware FDB in the bridge module.
Do we expect switch port devices to support L2 functionality both ways (with and without an explicit bridge device)?
Will the decision about using a bridge device or avoiding it be left to the end-user?
(This requires switch port drivers to be able to work and provide similar functionality in both setups).
I think that we need to outline the handling of L3 as it could determine or at least impact some of the answers to my above questions.
cheers,
ronen
> Also how does it work when i start doing L3 and the bond's port doesnt
> support L3? Is it time to revive the thing we called TheThing in Du?
>
> cheers,
> jamal
>
> On 12/14/14 14:41, Roopa Prabhu wrote:
> > On 12/14/14, 7:35 AM, Jiri Pirko wrote:
>
> [..chopped off for brevity and saving electrons..]
>
> cheers,
> jamal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 17:25 ` Arad, Ronen
@ 2014-12-15 17:57 ` Benjamin LaHaise
2014-12-15 17:57 ` John Fastabend
1 sibling, 0 replies; 31+ messages in thread
From: Benjamin LaHaise @ 2014-12-15 17:57 UTC (permalink / raw)
To: Arad, Ronen
Cc: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org, sfeldma@gmail.com, tgraf@suug.ch,
john.fastabend@gmail.com, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
On Mon, Dec 15, 2014 at 05:25:00PM +0000, Arad, Ronen wrote:
> > Ben's patches (and I am sure the cumulus folk do this) expose ports.
> > i.e you boot up the hardware and you see ports. You can then put these
> > ports in a bridge and you can offload fdbs and do other parametrization to
> > the ASIC. IOW, this only becomes a bridge because you created one in the
> > kernel and attached bridge ports to it.
> >
> > Lets say i didnt want a bridge. I want instead to take these exposed ports and
> > create a bond (and maybe play with LACP). How does this propagation from
> > parent->child->child work then? I think the idea of just bonding and not
> > exposing it as a switch is a reasonable use case.
>
> Are you saying that the software should reflect the same functionality the HW provides?
> In other words is creating a bridge device mandatory for supporting standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
Re-using the existing Linux APIs for bridging is the approach I've taken
with the rtl8366s driver I've been working on. It makes no sense to create
entirely new APIs for these operations.
> VLAN-bridging including port VLAN membership, VLAN filtering, PVID, Egress un-tagging could be supported without an explicit bridge device when port devices implement bridge ndos (ndo_bridge_{set,del,get}link). What is lost is the ability to have common handling of VLAN-aware FDB in the bridge module.
> Do we expect switch port devices to support L2 functionality both ways (with and without an explicit bridge device)?
I've already got egress untagging working if the VLANs are configured using
vconfig and added to a bridge. The lack of this functionality in the "new"
API for bridging VLANs is a huge complaint I have about the new bridge
configuration API. I feel that API change was a step backwards in terms
of functionality. It also conflates the different FDBs for different VLANs
into the same hash table, again, something i find rather messy compared to
the 1 FDB to 1 bridge state the bridge device was in originally.
-ben
> Will the decision about using a bridge device or avoiding it be left to the end-user?
> (This requires switch port drivers to be able to work and provide similar functionality in both setups).
> I think that we need to outline the handling of L3 as it could determine or at least impact some of the answers to my above questions.
>
> cheers,
> ronen
>
> > Also how does it work when i start doing L3 and the bond's port doesnt
> > support L3? Is it time to revive the thing we called TheThing in Du?
> >
> > cheers,
> > jamal
> >
> > On 12/14/14 14:41, Roopa Prabhu wrote:
> > > On 12/14/14, 7:35 AM, Jiri Pirko wrote:
> >
> > [..chopped off for brevity and saving electrons..]
> >
> > cheers,
> > jamal
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> > of a message to majordomo@vger.kernel.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
--
"Thought is the essence of where you are now."
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 17:25 ` Arad, Ronen
2014-12-15 17:57 ` Benjamin LaHaise
@ 2014-12-15 17:57 ` John Fastabend
2014-12-15 18:36 ` Arad, Ronen
1 sibling, 1 reply; 31+ messages in thread
From: John Fastabend @ 2014-12-15 17:57 UTC (permalink / raw)
To: Arad, Ronen
Cc: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/15/2014 09:25 AM, Arad, Ronen wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Jamal Hadi Salim
>> Sent: Monday, December 15, 2014 5:26 PM
>> To: Roopa Prabhu; Jiri Pirko
>> Cc: sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> john.fastabend@gmail.com; stephen@networkplumber.org;
>> linville@tuxdriver.com; vyasevic@redhat.com; netdev@vger.kernel.org;
>> davem@davemloft.net; shm@cumulusnetworks.com;
>> gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>>
>> Sorry - i didnt quiet follow the discussion, but i can see the value of
>> propagating things from parent to children netdevs as part of the generic
>> approach. And in that spirit:
>>
>> Ben's patches (and I am sure the cumulus folk do this) expose ports.
>> i.e you boot up the hardware and you see ports. You can then put these
>> ports in a bridge and you can offload fdbs and do other parametrization to
>> the ASIC. IOW, this only becomes a bridge because you created one in the
>> kernel and attached bridge ports to it.
>>
>> Lets say i didnt want a bridge. I want instead to take these exposed ports and
>> create a bond (and maybe play with LACP). How does this propagation from
>> parent->child->child work then? I think the idea of just bonding and not
>> exposing it as a switch is a reasonable use case.
>
> Are you saying that the software should reflect the same
> functionality the HW provides?
> In other words is creating a bridge device mandatory for supporting
> standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
No it shouldn't be mandatory. And I don't think it is at the moment.
Users are free to manage the FDB tables directly via netlink or
configure the software bridge to sync them. This seems like a good
model to follow to me and we should try to get as many features as
it makes sense to follow this model.
> VLAN-bridging including port VLAN membership, VLAN filtering, PVID,
> Egress un-tagging could be supported without an explicit bridge device
> when port devices implement bridge ndos (ndo_bridge_{set,del,get}link).
> What is lost is the ability to have common handling of VLAN-aware FDB in
> the bridge module.
not sure what is lost here? Its seems the SW bridge does (or at least
could) support the same vlan capabilities. And the bridge could push
these into hardware when Roopa's offload bit is set. Or if users want
to manage everything outside bridge calling the ndo_bridge_ ops directly
works as well. By the way I believe the software linux bridge supports
most of those features you listed today. If we missed something we can
add it.
> Do we expect switch port devices to support L2 functionality both
> ways (with and without an explicit bridge device)?
My opinion yes. But in fact the driver shouldn't care what is driving
it. The paths should be the same for direct user manipulation via
netlink and SELF|MASTER bit, bridge module, or any other in-kernel
sub-system.
> Will the decision about using a bridge device or avoiding it be left
> to the end-user?
Its a user policy decision. Again the offload bit gets us this in
a reasonably configurable way IMO.
> (This requires switch port drivers to be able to work and provide
> similar functionality in both setups).
Right, but if the drivers "care" who is calling their ndo ops something
is seriously broken. For the driver it should not need to know anything
about the callers so it doesn't matter to the driver if its a netlink
call from user space or an internal call fro bridge.ko
> I think that we need to outline the handling of L3 as it could
> determine or at least impact some of the answers to my above questions.
L3 should follow the same model. Admittedly I've not worked through the
L3 cases closely but I don't see why we can't apply the same model.
And maybe this is where we need to introduce a container to hold some
state as Jamal says. The easiest way to see this will be to look at
some proposed code.
> cheers,
> ronen
>
>> Also how does it work when i start doing L3 and the bond's port doesnt
>> support L3? Is it time to revive the thing we called TheThing in Du?
>>
>> cheers,
>> jamal
>>
>> On 12/14/14 14:41, Roopa Prabhu wrote:
>>> On 12/14/14, 7:35 AM, Jiri Pirko wrote:
>>
>> [..chopped off for brevity and saving electrons..]
>>
>> cheers,
>> jamal
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
--
John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 17:20 ` Roopa Prabhu
@ 2014-12-15 18:03 ` Benjamin LaHaise
0 siblings, 0 replies; 31+ messages in thread
From: Benjamin LaHaise @ 2014-12-15 18:03 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jamal Hadi Salim, Jiri Pirko, sfeldma, tgraf, john.fastabend,
stephen, linville, vyasevic, netdev, davem, shm, gospo
On Mon, Dec 15, 2014 at 09:20:18AM -0800, Roopa Prabhu wrote:
> On 12/15/14, 7:26 AM, Jamal Hadi Salim wrote:
> >
> >Sorry - i didnt quiet follow the discussion, but i can see the value
> >of propagating things from parent to children netdevs as part of the
> >generic approach. And in that spirit:
> >
> >Ben's patches (and I am sure the cumulus folk do this) expose ports.
> >i.e you boot up the hardware and you see ports. You can then put these
> >ports in a bridge and you can offload fdbs and do other parametrization
> >to the ASIC. IOW, this only becomes a bridge because you created one
> >in the kernel and attached bridge ports to it.
> >
> >Lets say i didnt want a bridge. I want instead to take these exposed
> >ports and create a bond (and maybe play with LACP). How does this
> >propagation from parent->child->child work then? I think the idea
> >of just bonding and not exposing it as a switch is a reasonable use
> >case.
>
> We have not come to pure bonding and lacp yet (but i have mentioned it
> in many contexts before).
> The use case you mention is offloading bond attributes. This will be
> addressed as part of ongoing switchdev work
> for all other offloads (bonds, vxlans etc).
> Right now we are only talking bridge port attribute offload
> (learn/flood/port state etc). This could still be a bridge port
> attribute on a bond
> when the bond is a bridge port.
This raises the question: do we track which attributes are configured
onto a port in the switchdev code (in an attempt to maintain the state
on behalf of the underlying device), or do we simply pass them in as
attributes of the config request? With stacking, I can see the need
for different layers to add different attributes to the config for a
given switch port. Things like bonding would need to make note of the
bond interface with a common identifier so that the switch can figure
out to put the different ports into the same group.
The rtl8366s has support for one 802.3ad group, so it looks like I will
need to tackle this.
-ben
> >Also how does it work when i start doing L3 and the bond's port doesnt
> >support L3? Is it time to revive the thing we called TheThing in Du?
> yes, exactly. This is what i had indicated in my initial emails on this
> thread when the stacked devices topic came up.
> Since there was reluctance in introducing a switch device (theThing), My
> current patch tries to do that without a switch device.
> Since this is still l2, and we are dealing with bridge port attributes,
> my current patch traverses the stacked netdevs to call the
> ndo_bridge_setlink on the switch port.
>
> When it comes to l3, we can follow the same.., but as discussed in Du,
> there are other reasons where a switch device becomes necessary.
> I can submit an alternate series to cover the switch device approach for
> l2 as well.
>
> Thanks,
> Roopa
>
--
"Thought is the essence of where you are now."
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 17:57 ` John Fastabend
@ 2014-12-15 18:36 ` Arad, Ronen
2014-12-15 23:27 ` Jamal Hadi Salim
0 siblings, 1 reply; 31+ messages in thread
From: Arad, Ronen @ 2014-12-15 18:36 UTC (permalink / raw)
To: John Fastabend, netdev@vger.kernel.org
Cc: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko, sfeldma@gmail.com,
bcrl@kvack.org, tgraf@suug.ch, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Monday, December 15, 2014 7:57 PM
> To: Arad, Ronen
> Cc: Jamal Hadi Salim; Roopa Prabhu; Jiri Pirko; netdev@vger.kernel.org;
> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 12/15/2014 09:25 AM, Arad, Ronen wrote:
> >
> >
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org [mailto:netdev-
> >> owner@vger.kernel.org] On Behalf Of Jamal Hadi Salim
> >> Sent: Monday, December 15, 2014 5:26 PM
> >> To: Roopa Prabhu; Jiri Pirko
> >> Cc: sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> >> john.fastabend@gmail.com; stephen@networkplumber.org;
> >> linville@tuxdriver.com; vyasevic@redhat.com; netdev@vger.kernel.org;
> >> davem@davemloft.net; shm@cumulusnetworks.com;
> >> gospo@cumulusnetworks.com
> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
> >> del bridge port attributes
> >>
> >>
> >> Sorry - i didnt quiet follow the discussion, but i can see the value
> >> of propagating things from parent to children netdevs as part of the
> >> generic approach. And in that spirit:
> >>
> >> Ben's patches (and I am sure the cumulus folk do this) expose ports.
> >> i.e you boot up the hardware and you see ports. You can then put
> >> these ports in a bridge and you can offload fdbs and do other
> >> parametrization to the ASIC. IOW, this only becomes a bridge because
> >> you created one in the kernel and attached bridge ports to it.
> >>
> >> Lets say i didnt want a bridge. I want instead to take these exposed
> >> ports and create a bond (and maybe play with LACP). How does this
> >> propagation from
> >> parent->child->child work then? I think the idea of just bonding and
> >> parent->child->not
> >> exposing it as a switch is a reasonable use case.
> >
>
> > Are you saying that the software should reflect the same functionality
> > the HW provides?
> > In other words is creating a bridge device mandatory for supporting
> > standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
>
> No it shouldn't be mandatory. And I don't think it is at the moment.
> Users are free to manage the FDB tables directly via netlink or configure the
> software bridge to sync them. This seems like a good model to follow to me
> and we should try to get as many features as it makes sense to follow this
> model.
>
> > VLAN-bridging including port VLAN membership, VLAN filtering, PVID,
> > Egress un-tagging could be supported without an explicit bridge device
> > when port devices implement bridge ndos (ndo_bridge_{set,del,get}link).
> > What is lost is the ability to have common handling of VLAN-aware FDB
> > in the bridge module.
>
> not sure what is lost here? Its seems the SW bridge does (or at least
> could) support the same vlan capabilities. And the bridge could push these
> into hardware when Roopa's offload bit is set. Or if users want to manage
> everything outside bridge calling the ndo_bridge_ ops directly works as well.
> By the way I believe the software linux bridge supports most of those
> features you listed today. If we missed something we can add it.
>
> > Do we expect switch port devices to support L2 functionality both ways
> > (with and without an explicit bridge device)?
>
> My opinion yes. But in fact the driver shouldn't care what is driving it. The
> paths should be the same for direct user manipulation via netlink and
> SELF|MASTER bit, bridge module, or any other in-kernel sub-system.
>
The behavior of a driver could depend on the presence of a bridge and features such as FDB LEARNING and LEARNING_SYNC.
A switch port driver which is not enslaved to a bridge might need to implement VLAN-aware FDB within the driver and report its content to user-space using ndo_fdb_dump.
A switch port driver which is enslaved to a bridge could do with only pass through for static FDB configuration to the HW when LEARNING_SYNC is configured. FDB reporting to user-space and soft aging are left to the bridge module FDB.
Such driver, without LEARNING_SYNC could still avoid maintaing in-driver FDB as long as it could dump the HW FDB on demand.
LEARNING_SYNC also requires periodic updates of freshness information from the driver to the bridge module.
> > Will the decision about using a bridge device or avoiding it be left
> > to the end-user?
>
> Its a user policy decision. Again the offload bit gets us this in a reasonably
> configurable way IMO.
>
> > (This requires switch port drivers to be able to work and provide
> > similar functionality in both setups).
>
> Right, but if the drivers "care" who is calling their ndo ops something is
> seriously broken. For the driver it should not need to know anything about
> the callers so it doesn't matter to the driver if its a netlink call from user
> space or an internal call fro bridge.ko
LEARNING_SYNC only makes sense when a switch port driver is enslaved to a bridge. Rocker switch driver indeed monitors upper change notifications and keep track of master bridge presence. So bridge presence is not transparent.
>
> > I think that we need to outline the handling of L3 as it could
> > determine or at least impact some of the answers to my above questions.
>
> L3 should follow the same model. Admittedly I've not worked through the
> L3 cases closely but I don't see why we can't apply the same model.
> And maybe this is where we need to introduce a container to hold some
> state as Jamal says. The easiest way to see this will be to look at some
> proposed code.
>
>
> > cheers,
> > ronen
> >
> >> Also how does it work when i start doing L3 and the bond's port
> >> doesnt support L3? Is it time to revive the thing we called TheThing in Du?
> >>
> >> cheers,
> >> jamal
> >>
> >> On 12/14/14 14:41, Roopa Prabhu wrote:
> >>> On 12/14/14, 7:35 AM, Jiri Pirko wrote:
> >>
> >> [..chopped off for brevity and saving electrons..]
> >>
> >> cheers,
> >> jamal
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
>
>
> --
> John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 18:36 ` Arad, Ronen
@ 2014-12-15 23:27 ` Jamal Hadi Salim
2014-12-16 0:58 ` Arad, Ronen
0 siblings, 1 reply; 31+ messages in thread
From: Jamal Hadi Salim @ 2014-12-15 23:27 UTC (permalink / raw)
To: Arad, Ronen, John Fastabend, netdev@vger.kernel.org
Cc: Roopa Prabhu, Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/15/14 13:36, Arad, Ronen wrote:
>
>
>> -----Original Message-----
> The behavior of a driver could depend on the presence of a bridge and features such as FDB LEARNING and LEARNING_SYNC.
Indeed, those are bridge attributes.
> A switch port driver which is not enslaved to a bridge might need to implement VLAN-aware FDB
>within the driver and report its content to user-space using ndo_fdb_dump.
>
> A switch port driver which is enslaved to a bridge could do with only pass through for static FDB configuration
> to the HW when LEARNING_SYNC is configured. FDB reporting to
user-space and soft aging are left to the bridge module FDB.
> Such driver, without LEARNING_SYNC could still avoid maintaing in-driver FDB as long as it could dump the HW FDB on demand.
> LEARNING_SYNC also requires periodic updates of freshness information from the driver to the bridge module.
>
If you have an fdb - shouldnt that be exposed only if you have a bridge
abstraction exposed? i.e thats where the Linux tools would work.
What i was refering to was a scenario where i have no interest in the
fdb despite such a hardware capabilities. VLANs is a different issue;
>>> Will the decision about using a bridge device or avoiding it be left
>>> to the end-user?
>>
>> Its a user policy decision. Again the offload bit gets us this in a reasonably
>> configurable way IMO.
>>
>>> (This requires switch port drivers to be able to work and provide
>>> similar functionality in both setups).
>>
>> Right, but if the drivers "care" who is calling their ndo ops something is
>> seriously broken. For the driver it should not need to know anything about
>> the callers so it doesn't matter to the driver if its a netlink call from user
>> space or an internal call fro bridge.ko
>
> LEARNING_SYNC only makes sense when a switch port driver is enslaved to a bridge.
> Rocker switch driver indeed monitors upper change notifications and
keep track of master bridge presence.
> So bridge presence is not transparent.
>
Agreed - the challenge so far is that people have been fascinated by
"switch" point of view. I think we are learning and the class device
will eventually become obvious as useful.
cheers,
jamal
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-15 23:27 ` Jamal Hadi Salim
@ 2014-12-16 0:58 ` Arad, Ronen
2014-12-16 1:20 ` Roopa Prabhu
0 siblings, 1 reply; 31+ messages in thread
From: Arad, Ronen @ 2014-12-16 0:58 UTC (permalink / raw)
To: Jamal Hadi Salim, John Fastabend, netdev@vger.kernel.org
Cc: Roopa Prabhu, Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
> -----Original Message-----
> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
> Sent: Tuesday, December 16, 2014 1:28 AM
> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 12/15/14 13:36, Arad, Ronen wrote:
> >
> >
> >> -----Original Message-----
>
> > The behavior of a driver could depend on the presence of a bridge and
> features such as FDB LEARNING and LEARNING_SYNC.
>
> Indeed, those are bridge attributes.
>
> > A switch port driver which is not enslaved to a bridge might need to
> >implement VLAN-aware FDB within the driver and report its content to user-
> space using ndo_fdb_dump.
> >
> > A switch port driver which is enslaved to a bridge could do with only
> > pass through for static FDB configuration
> > to the HW when LEARNING_SYNC is configured. FDB reporting to user-
> space and soft aging are left to the bridge module FDB.
> > Such driver, without LEARNING_SYNC could still avoid maintaing in-driver
> FDB as long as it could dump the HW FDB on demand.
> > LEARNING_SYNC also requires periodic updates of freshness information
> from the driver to the bridge module.
> >
>
>
> If you have an fdb - shouldnt that be exposed only if you have a bridge
> abstraction exposed? i.e thats where the Linux tools would work.
I'm trying to find out what are the opinions of other people in the netdev list.
John have clearly stated that he'd like to see full L2 switching functionality (at least) supported without making a bridge device mandatory.
The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that with proper setting of SELF/MASTER flags by iproute2.
I see the value in supporting both approaches (bridge device mandatory and bridge device optional). If the choice is left to user-driven policy decision, we need to document both use models and map traditional L2 features to each model.
The L2 offloading (or NETFUNC as it is currently called), which is being discussed on a different patch-set, is only needed when a bridge device is used.
Without a bridge device, all configuration has to be targeted at the switch port driver directly using the SELF flag. FDB remains relevant and it is used to configure static MAC table entries and dump the HW MAC table.
When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even higher), there is a gap between what the HW is doing and what is explicitly modeled in Linux. Without a bridge device, the HW is represented by a set of switch port devices and the bridging (both control and data planes) takes place only in the HW and switch port driver.
Each switch port driver has to implement its own FDB as there is no common shared code among drivers for different HW devices.
Using a bridge device could partially alleviate that, but it comes with a cost. There is a need to properly implement offloading of both configuration and data-path. The transmit and receive path in the bridge module should be somehow bypassed to avoid unnecessary overhead or duplicate packets coming from both software bridging and HW bridging.
> What i was refering to was a scenario where i have no interest in the fdb
> despite such a hardware capabilities. VLANs is a different issue;
>
VLAN is fundamental feature of L2 and L3 switching and Linux is unclear about it. Bridge device could model bridging of untagged packets which requires a bridge device for each VLAN and a vlan device on each port that is a member of the bridge's VLAN.
This different from the behavior and configuration of classic closed-source switches.
An alternative model is VLAN filtering where a bridge is VLAN-aware and switches tagged traffic. A bridge device represents multiple L2 domains with VLAN filtering policy that defines the switching rules within each domain. Forwarding (e.g. L3 routing) is expected across such L2 domains using L3 entities.
The modeling of L3 entities per L2 domain (e.g. per-VLAN) in the VLAN filtering model is yet unclear to me.
> >>> Will the decision about using a bridge device or avoiding it be left
> >>> to the end-user?
> >>
> >> Its a user policy decision. Again the offload bit gets us this in a
> >> reasonably configurable way IMO.
> >>
> >>> (This requires switch port drivers to be able to work and provide
> >>> similar functionality in both setups).
> >>
> >> Right, but if the drivers "care" who is calling their ndo ops
> >> something is seriously broken. For the driver it should not need to
> >> know anything about the callers so it doesn't matter to the driver if
> >> its a netlink call from user space or an internal call fro bridge.ko
> >
> > LEARNING_SYNC only makes sense when a switch port driver is enslaved to
> a bridge.
> > Rocker switch driver indeed monitors upper change notifications and keep
> track of master bridge presence.
> > So bridge presence is not transparent.
> >
>
> Agreed - the challenge so far is that people have been fascinated by "switch"
> point of view. I think we are learning and the class device will eventually
> become obvious as useful.
>
> cheers,
> jamal
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 0:58 ` Arad, Ronen
@ 2014-12-16 1:20 ` Roopa Prabhu
2014-12-16 11:01 ` Arad, Ronen
0 siblings, 1 reply; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-16 1:20 UTC (permalink / raw)
To: Arad, Ronen
Cc: Jamal Hadi Salim, John Fastabend, netdev@vger.kernel.org,
Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>
>> -----Original Message-----
>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>> Sent: Tuesday, December 16, 2014 1:28 AM
>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/15/14 13:36, Arad, Ronen wrote:
>>>
>>>> -----Original Message-----
>>> The behavior of a driver could depend on the presence of a bridge and
>> features such as FDB LEARNING and LEARNING_SYNC.
>>
>> Indeed, those are bridge attributes.
>>
>>> A switch port driver which is not enslaved to a bridge might need to
>>> implement VLAN-aware FDB within the driver and report its content to user-
>> space using ndo_fdb_dump.
>> >
>>> A switch port driver which is enslaved to a bridge could do with only
>>> pass through for static FDB configuration
>> > to the HW when LEARNING_SYNC is configured. FDB reporting to user-
>> space and soft aging are left to the bridge module FDB.
>>> Such driver, without LEARNING_SYNC could still avoid maintaing in-driver
>> FDB as long as it could dump the HW FDB on demand.
>>> LEARNING_SYNC also requires periodic updates of freshness information
>> from the driver to the bridge module.
>>
>> If you have an fdb - shouldnt that be exposed only if you have a bridge
>> abstraction exposed? i.e thats where the Linux tools would work.
> I'm trying to find out what are the opinions of other people in the netdev list.
> John have clearly stated that he'd like to see full L2 switching functionality (at least) supported without making a bridge device mandatory.
> The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that with proper setting of SELF/MASTER flags by iproute2.
> I see the value in supporting both approaches (bridge device mandatory and bridge device optional). If the choice is left to user-driven policy decision, we need to document both use models and map traditional L2 features to each model.
> The L2 offloading (or NETFUNC as it is currently called), which is being discussed on a different patch-set, is only needed when a bridge device is used.
> Without a bridge device, all configuration has to be targeted at the switch port driver directly using the SELF flag. FDB remains relevant and it is used to configure static MAC table entries and dump the HW MAC table.
Your understanding is right here. So far all patches have kept both
models in mind.
> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even higher), there is a gap between what the HW is doing and what is explicitly modeled in Linux.
Can you elaborate more here ?. We use the linux model to accelerate a
multi-layer (l2-l3) switch today. There maybe a few gaps, but these gaps
can be closed by having equivalent functionality in the software path.
> Without a bridge device, the HW is represented by a set of switch port devices and the bridging (both control and data planes) takes place only in the HW and switch port driver.
> Each switch port driver has to implement its own FDB as there is no common shared code among drivers for different HW devices.
> Using a bridge device could partially alleviate that, but it comes with a cost. There is a need to properly implement offloading of both configuration and data-path. The transmit and receive path in the bridge module should be somehow bypassed to avoid unnecessary overhead or duplicate packets coming from both software bridging and HW bridging.
>
>> What i was refering to was a scenario where i have no interest in the fdb
>> despite such a hardware capabilities. VLANs is a different issue;
>>
> VLAN is fundamental feature of L2 and L3 switching and Linux is unclear about it. Bridge device could model bridging of untagged packets which requires a bridge device for each VLAN and a vlan device on each port that is a member of the bridge's VLAN.
> This different from the behavior and configuration of classic closed-source switches.
> An alternative model is VLAN filtering where a bridge is VLAN-aware and switches tagged traffic. A bridge device represents multiple L2 domains with VLAN filtering policy that defines the switching rules within each domain.
And the linux bridge driver supports both models today.
> Forwarding (e.g. L3 routing) is expected across such L2 domains using L3 entities.
> The modeling of L3 entities per L2 domain (e.g. per-VLAN) in the VLAN filtering model is yet unclear to me.
In the vlan filtering bridge model, You can create a vlan device on the
bridge for l3 ...
>
>>>>> Will the decision about using a bridge device or avoiding it be left
>>>>> to the end-user?
>>>> Its a user policy decision. Again the offload bit gets us this in a
>>>> reasonably configurable way IMO.
>>>>
>>>>> (This requires switch port drivers to be able to work and provide
>>>>> similar functionality in both setups).
>>>> Right, but if the drivers "care" who is calling their ndo ops
>>>> something is seriously broken. For the driver it should not need to
>>>> know anything about the callers so it doesn't matter to the driver if
>>>> its a netlink call from user space or an internal call fro bridge.ko
>>> LEARNING_SYNC only makes sense when a switch port driver is enslaved to
>> a bridge.
>> > Rocker switch driver indeed monitors upper change notifications and keep
>> track of master bridge presence.
>>> So bridge presence is not transparent.
>>>
>> Agreed - the challenge so far is that people have been fascinated by "switch"
>> point of view. I think we are learning and the class device will eventually
>> become obvious as useful.
>>
>> cheers,
>> jamal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 1:20 ` Roopa Prabhu
@ 2014-12-16 11:01 ` Arad, Ronen
2014-12-16 15:54 ` Samudrala, Sridhar
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Arad, Ronen @ 2014-12-16 11:01 UTC (permalink / raw)
To: Roopa Prabhu, netdev@vger.kernel.org
Cc: Jamal Hadi Salim, John Fastabend, Jiri Pirko, sfeldma@gmail.com,
bcrl@kvack.org, tgraf@suug.ch, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
In my reply (inline) I elaborate on the validity of bridge-less and offloaded-bridge models for L2 switching.
I also discuss the implied necessity of a bridge device for L3 routing and potential issues with the upcoming FIB offloading proposal.
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
> Sent: Tuesday, December 16, 2014 3:21 AM
> To: Arad, Ronen
> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri Pirko;
> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
> >
> >> -----Original Message-----
> >> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
> >> Sent: Tuesday, December 16, 2014 1:28 AM
> >> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
> >> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
> >> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
> >> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com;
> >> gospo@cumulusnetworks.com
> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
> >> del bridge port attributes
> >>
> >> On 12/15/14 13:36, Arad, Ronen wrote:
> >>>
> >>>> -----Original Message-----
> >>> The behavior of a driver could depend on the presence of a bridge
> >>> and
> >> features such as FDB LEARNING and LEARNING_SYNC.
> >>
> >> Indeed, those are bridge attributes.
> >>
> >>> A switch port driver which is not enslaved to a bridge might need to
> >>> implement VLAN-aware FDB within the driver and report its content to
> >>> user-
> >> space using ndo_fdb_dump.
> >> >
> >>> A switch port driver which is enslaved to a bridge could do with
> >>> only pass through for static FDB configuration
> >> > to the HW when LEARNING_SYNC is configured. FDB reporting to
> >> user- space and soft aging are left to the bridge module FDB.
> >>> Such driver, without LEARNING_SYNC could still avoid maintaing
> >>> in-driver
> >> FDB as long as it could dump the HW FDB on demand.
> >>> LEARNING_SYNC also requires periodic updates of freshness
> >>> information
> >> from the driver to the bridge module.
> >>
> >> If you have an fdb - shouldnt that be exposed only if you have a
> >> bridge abstraction exposed? i.e thats where the Linux tools would work.
> > I'm trying to find out what are the opinions of other people in the netdev
> list.
> > John have clearly stated that he'd like to see full L2 switching functionality
> (at least) supported without making a bridge device mandatory.
> > The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that
> with proper setting of SELF/MASTER flags by iproute2.
> > I see the value in supporting both approaches (bridge device mandatory
> > and bridge device optional). If the choice is left to user-driven policy decision,
> > we need to document both use models and map traditional L2 features to
> > each model.
> > The L2 offloading (or NETFUNC as it is currently called), which is being
> > discussed on a different patch-set, is only needed when a bridge device is
> > used.
> > Without a bridge device, all configuration has to be targeted at the switch
> > port driver directly using the SELF flag. FDB remains relevant and it is used to
> > configure static MAC table entries and dump the HW MAC table.
> Your understanding is right here. So far all patches have kept both models in
> mind.
> > When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even
> > higher), there is a gap between what the HW is doing and what is explicitly
> > modeled in Linux.
> Can you elaborate more here ?. We use the linux model to accelerate a
> multi-layer (l2-l3) switch today. There maybe a few gaps, but these gaps can
> be closed by having equivalent functionality in the software path.
What I meant is that without a bridge device the HW switch is seen as a collection of independent switch ports. Typical switch ASIC performs L2 switching by default. This is not expressed explicitly in Linux without a bridge device.
The SELF flag is used to target typical bridge port and bridge configuration at a switch port device.
Without an explicit bridge device, bridge attributes have to be directed at an arbitrary port (any port could represent the entire switch) and interpreted by the switch port driver as intended for the entire switch (this includes attributes like STP etc.)
Each switch port device driver has to implement similar functionality (i.e. all bridge and fdb related ndos) independently without common functionality shared (e.g. FDB, soft aging).
It is a valid use model and could avoid the complexity of having to deal with the presence of both SW and HW bridge and to deal with explicit offloading of data-path.
I was trying to find out whether the intention was to continue and support both bridge-less an offloaded-bridge models and leave it to the end-user to choose the desirable model at configuration time.
This would require dual support in the switch port driver in order to have best user experience across multiple switch ASICs or other kinds of devices.
>
> > Without a bridge device, the HW is represented by a set of switch port
> > devices and the bridging (both control and data planes) takes place only in
> > the HW and switch port driver.
> > Each switch port driver has to implement its own FDB as there is no
> > common shared code among drivers for different HW devices.
> > Using a bridge device could partially alleviate that, but it comes with a cost.
> > There is a need to properly implement offloading of both configuration and
> > data-path. The transmit and receive path in the bridge module should be
> > somehow bypassed to avoid unnecessary overhead or duplicate packets
> > coming from both software bridging and HW bridging.
> >
> >> What i was refering to was a scenario where i have no interest in the
> >> fdb despite such a hardware capabilities. VLANs is a different issue;
> >>
> > VLAN is fundamental feature of L2 and L3 switching and Linux is unclear
> about it. Bridge device could model bridging of untagged packets which
> requires a bridge device for each VLAN and a vlan device on each port that is
> a member of the bridge's VLAN.
> > This different from the behavior and configuration of classic closed-source
> switches.
> > An alternative model is VLAN filtering where a bridge is VLAN-aware and
> > switches tagged traffic. A bridge device represents multiple L2 domains with
> > VLAN filtering policy that defines the switching rules within each domain.
> And the linux bridge driver supports both models today.
>
> > Forwarding (e.g. L3 routing) is expected across such L2 domains using L3
> entities.
> > The modeling of L3 entities per L2 domain (e.g. per-VLAN) in the VLAN
> > filtering model is yet unclear to me.
> In the vlan filtering bridge model, You can create a vlan device on the bridge
> for l3 ...
>
That's what I'm thinking too (I experimented with such setup using veth interfaces, bridge device, and vlan interfaces). This, however, seems to require an explicit bridge for L3 support.
Looking at the latest code of FIB offloading (not yet submitted to netdev), I noticed that a switch port device is expected as a lower descendent of the FIB destination device.
This assumption is valid in the per-vlan bridge model where IP address is assigned to the bridge itself.
This, however, is not consistent with the single multi-VLAN bridge model.
Vlan interfaces on a bridge looks like siblings of the switch ports devices on the same bridge. They are not ancestors of the switch ports.
The L3 domain ends at the bridge sub-interfaces. The only L3 entities are the vlan sub-interfaces on the bridge.
Those are route next hops and the only possible fib_dev.
L3 routing is not aware of the switch ports. Route is performed to next hop addresses on one of the vlan interfaces subnets. The actual resolution to a switch port device has to be performed by the neighbor subsystem (ARP/ND).
It is unclear to me how the FIB offloading will be redirected to an ndo of a switch port device.
> >
> >>>>> Will the decision about using a bridge device or avoiding it be
> >>>>> left to the end-user?
> >>>> Its a user policy decision. Again the offload bit gets us this in a
> >>>> reasonably configurable way IMO.
> >>>>
> >>>>> (This requires switch port drivers to be able to work and provide
> >>>>> similar functionality in both setups).
> >>>> Right, but if the drivers "care" who is calling their ndo ops
> >>>> something is seriously broken. For the driver it should not need to
> >>>> know anything about the callers so it doesn't matter to the driver
> >>>> if its a netlink call from user space or an internal call fro
> >>>> bridge.ko
> >>> LEARNING_SYNC only makes sense when a switch port driver is enslaved
> >>> to
> >> a bridge.
> >> > Rocker switch driver indeed monitors upper change notifications
> >> and keep track of master bridge presence.
> >>> So bridge presence is not transparent.
> >>>
> >> Agreed - the challenge so far is that people have been fascinated by
> "switch"
> >> point of view. I think we are learning and the class device will
> >> eventually become obvious as useful.
> >>
> >> cheers,
> >> jamal
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org More majordomo
> info
> > at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 11:01 ` Arad, Ronen
@ 2014-12-16 15:54 ` Samudrala, Sridhar
2014-12-16 16:41 ` John Fastabend
2014-12-16 19:20 ` Roopa Prabhu
2 siblings, 0 replies; 31+ messages in thread
From: Samudrala, Sridhar @ 2014-12-16 15:54 UTC (permalink / raw)
To: Arad, Ronen, Roopa Prabhu, netdev@vger.kernel.org
Cc: Jamal Hadi Salim, John Fastabend, Jiri Pirko, sfeldma@gmail.com,
bcrl@kvack.org, tgraf@suug.ch, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
On 12/16/2014 3:01 AM, Arad, Ronen wrote:
> In my reply (inline) I elaborate on the validity of bridge-less and offloaded-bridge models for L2 switching.
>
> I also discuss the implied necessity of a bridge device for L3 routing and potential issues with the upcoming FIB offloading proposal.
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
>> Sent: Tuesday, December 16, 2014 3:21 AM
>> To: Arad, Ronen
>> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri Pirko;
>> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>>>> -----Original Message-----
>>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>>>> Sent: Tuesday, December 16, 2014 1:28 AM
>>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>>>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com;
>>>> gospo@cumulusnetworks.com
>>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
>>>> del bridge port attributes
>>>>
>>>> On 12/15/14 13:36, Arad, Ronen wrote:
>>>>>> -----Original Message-----
>>>>> The behavior of a driver could depend on the presence of a bridge
>>>>> and
>>>> features such as FDB LEARNING and LEARNING_SYNC.
>>>>
>>>> Indeed, those are bridge attributes.
>>>>
>>>>> A switch port driver which is not enslaved to a bridge might need to
>>>>> implement VLAN-aware FDB within the driver and report its content to
>>>>> user-
>>>> space using ndo_fdb_dump.
>>>> >
>>>>> A switch port driver which is enslaved to a bridge could do with
>>>>> only pass through for static FDB configuration
>>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
>>>> user- space and soft aging are left to the bridge module FDB.
>>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
>>>>> in-driver
>>>> FDB as long as it could dump the HW FDB on demand.
>>>>> LEARNING_SYNC also requires periodic updates of freshness
>>>>> information
>>>> from the driver to the bridge module.
>>>>
>>>> If you have an fdb - shouldnt that be exposed only if you have a
>>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
>>> I'm trying to find out what are the opinions of other people in the netdev
>> list.
>>> John have clearly stated that he'd like to see full L2 switching functionality
>> (at least) supported without making a bridge device mandatory.
>>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that
>> with proper setting of SELF/MASTER flags by iproute2.
>>> I see the value in supporting both approaches (bridge device mandatory
>>> and bridge device optional). If the choice is left to user-driven policy decision,
>>> we need to document both use models and map traditional L2 features to
>>> each model.
>>> The L2 offloading (or NETFUNC as it is currently called), which is being
>>> discussed on a different patch-set, is only needed when a bridge device is
>>> used.
>>> Without a bridge device, all configuration has to be targeted at the switch
>>> port driver directly using the SELF flag. FDB remains relevant and it is used to
>>> configure static MAC table entries and dump the HW MAC table.
>> Your understanding is right here. So far all patches have kept both models in
>> mind.
>
>>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even
>>> higher), there is a gap between what the HW is doing and what is explicitly
>>> modeled in Linux.
>
>> Can you elaborate more here ?. We use the linux model to accelerate a
>> multi-layer (l2-l3) switch today. There maybe a few gaps, but these gaps can
>> be closed by having equivalent functionality in the software path.
> What I meant is that without a bridge device the HW switch is seen as a collection of independent switch ports. Typical switch ASIC performs L2 switching by default. This is not expressed explicitly in Linux without a bridge device.
> The SELF flag is used to target typical bridge port and bridge configuration at a switch port device.
> Without an explicit bridge device, bridge attributes have to be directed at an arbitrary port (any port could represent the entire switch) and interpreted by the switch port driver as intended for the entire switch (this includes attributes like STP etc.)
> Each switch port device driver has to implement similar functionality (i.e. all bridge and fdb related ndos) independently without common functionality shared (e.g. FDB, soft aging).
> It is a valid use model and could avoid the complexity of having to deal with the presence of both SW and HW bridge and to deal with explicit offloading of data-path.
>
> I was trying to find out whether the intention was to continue and support both bridge-less an offloaded-bridge models and leave it to the end-user to choose the desirable model at configuration time.
> This would require dual support in the switch port driver in order to have best user experience across multiple switch ASICs or other kinds of devices.
Also is one of the usecase for an explicit bridge device to support
software switching by causing
the data packets to be processed at software bridge using appropriate
port attribute settings?
Or is it only to make it convenient to maintain the fdb and represent
the hardware path?
>>> Without a bridge device, the HW is represented by a set of switch port
>>> devices and the bridging (both control and data planes) takes place only in
>>> the HW and switch port driver.
>>> Each switch port driver has to implement its own FDB as there is no
>>> common shared code among drivers for different HW devices.
>>> Using a bridge device could partially alleviate that, but it comes with a cost.
>>> There is a need to properly implement offloading of both configuration and
>>> data-path. The transmit and receive path in the bridge module should be
>>> somehow bypassed to avoid unnecessary overhead or duplicate packets
>>> coming from both software bridging and HW bridging.
>>>
>>>> What i was refering to was a scenario where i have no interest in the
>>>> fdb despite such a hardware capabilities. VLANs is a different issue;
>>>>
>>> VLAN is fundamental feature of L2 and L3 switching and Linux is unclear
>> about it. Bridge device could model bridging of untagged packets which
>> requires a bridge device for each VLAN and a vlan device on each port that is
>> a member of the bridge's VLAN.
>>> This different from the behavior and configuration of classic closed-source
>> switches.
>>> An alternative model is VLAN filtering where a bridge is VLAN-aware and
>>> switches tagged traffic. A bridge device represents multiple L2 domains with
>>> VLAN filtering policy that defines the switching rules within each domain.
>> And the linux bridge driver supports both models today.
>>
>>> Forwarding (e.g. L3 routing) is expected across such L2 domains using L3
>> entities.
>>> The modeling of L3 entities per L2 domain (e.g. per-VLAN) in the VLAN
>>> filtering model is yet unclear to me.
>> In the vlan filtering bridge model, You can create a vlan device on the bridge
>> for l3 ...
>>
> That's what I'm thinking too (I experimented with such setup using veth interfaces, bridge device, and vlan interfaces). This, however, seems to require an explicit bridge for L3 support.
>
> Looking at the latest code of FIB offloading (not yet submitted to netdev), I noticed that a switch port device is expected as a lower descendent of the FIB destination device.
> This assumption is valid in the per-vlan bridge model where IP address is assigned to the bridge itself.
> This, however, is not consistent with the single multi-VLAN bridge model.
> Vlan interfaces on a bridge looks like siblings of the switch ports devices on the same bridge. They are not ancestors of the switch ports.
> The L3 domain ends at the bridge sub-interfaces. The only L3 entities are the vlan sub-interfaces on the bridge.
> Those are route next hops and the only possible fib_dev.
> L3 routing is not aware of the switch ports. Route is performed to next hop addresses on one of the vlan interfaces subnets. The actual resolution to a switch port device has to be performed by the neighbor subsystem (ARP/ND).
> It is unclear to me how the FIB offloading will be redirected to an ndo of a switch port device.
For L3, i would think we need to support offloading of assigning the
gateway IP and the actual route. For ex: to create route to subnet
2.2.2.0/24 with GW as 1.1.1.254/24, a user may do
ip addr add 1.1.1.254/24 dev <swX>
ip route add 5.5.5.0/24 via 1.1.1.254 dev swX
Here swX has to be a device corresponding to the switch (or cpu port),
not a switch port and the ARP requests for this gw IP need to be passed
to the linux stack so that it can send arp replies and also add an ARP
entry in hardware.
>
>>>>>>> Will the decision about using a bridge device or avoiding it be
>>>>>>> left to the end-user?
>>>>>> Its a user policy decision. Again the offload bit gets us this in a
>>>>>> reasonably configurable way IMO.
>>>>>>
>>>>>>> (This requires switch port drivers to be able to work and provide
>>>>>>> similar functionality in both setups).
>>>>>> Right, but if the drivers "care" who is calling their ndo ops
>>>>>> something is seriously broken. For the driver it should not need to
>>>>>> know anything about the callers so it doesn't matter to the driver
>>>>>> if its a netlink call from user space or an internal call fro
>>>>>> bridge.ko
>>>>> LEARNING_SYNC only makes sense when a switch port driver is enslaved
>>>>> to
>>>> a bridge.
>>>> > Rocker switch driver indeed monitors upper change notifications
>>>> and keep track of master bridge presence.
>>>>> So bridge presence is not transparent.
>>>>>
>>>> Agreed - the challenge so far is that people have been fascinated by
>> "switch"
>>>> point of view. I think we are learning and the class device will
>>>> eventually become obvious as useful.
>>>>
>>>> cheers,
>>>> jamal
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org More majordomo
>> info
>>> at http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 11:01 ` Arad, Ronen
2014-12-16 15:54 ` Samudrala, Sridhar
@ 2014-12-16 16:41 ` John Fastabend
2014-12-16 17:29 ` Arad, Ronen
2014-12-16 19:20 ` Roopa Prabhu
2 siblings, 1 reply; 31+ messages in thread
From: John Fastabend @ 2014-12-16 16:41 UTC (permalink / raw)
To: Arad, Ronen
Cc: Roopa Prabhu, netdev@vger.kernel.org, Jamal Hadi Salim,
Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/16/2014 03:01 AM, Arad, Ronen wrote:
>
> In my reply (inline) I elaborate on the validity of bridge-less and offloaded-bridge models for L2 switching.
>
> I also discuss the implied necessity of a bridge device for L3 routing and potential issues with the upcoming FIB offloading proposal.
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
>> Sent: Tuesday, December 16, 2014 3:21 AM
>> To: Arad, Ronen
>> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri Pirko;
>> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>>>> Sent: Tuesday, December 16, 2014 1:28 AM
>>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>>>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com;
>>>> gospo@cumulusnetworks.com
>>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
>>>> del bridge port attributes
>>>>
>>>> On 12/15/14 13:36, Arad, Ronen wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>> The behavior of a driver could depend on the presence of a bridge
>>>>> and
>>>> features such as FDB LEARNING and LEARNING_SYNC.
>>>>
>>>> Indeed, those are bridge attributes.
>>>>
>>>>> A switch port driver which is not enslaved to a bridge might need to
>>>>> implement VLAN-aware FDB within the driver and report its content to
>>>>> user-
>>>> space using ndo_fdb_dump.
>>>> >
>>>>> A switch port driver which is enslaved to a bridge could do with
>>>>> only pass through for static FDB configuration
>>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
>>>> user- space and soft aging are left to the bridge module FDB.
>>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
>>>>> in-driver
>>>> FDB as long as it could dump the HW FDB on demand.
>>>>> LEARNING_SYNC also requires periodic updates of freshness
>>>>> information
>>>> from the driver to the bridge module.
>>>>
>>>> If you have an fdb - shouldnt that be exposed only if you have a
>>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
>>> I'm trying to find out what are the opinions of other people in the netdev
>> list.
>>> John have clearly stated that he'd like to see full L2 switching functionality
>> (at least) supported without making a bridge device mandatory.
>>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that
>> with proper setting of SELF/MASTER flags by iproute2.
>>> I see the value in supporting both approaches (bridge device mandatory
>>> and bridge device optional). If the choice is left to user-driven policy decision,
>>> we need to document both use models and map traditional L2 features to
>>> each model.
>>> The L2 offloading (or NETFUNC as it is currently called), which is being
>>> discussed on a different patch-set, is only needed when a bridge device is
>>> used.
>>> Without a bridge device, all configuration has to be targeted at the switch
>>> port driver directly using the SELF flag. FDB remains relevant and it is used to
>>> configure static MAC table entries and dump the HW MAC table.
>
>> Your understanding is right here. So far all patches have kept both models in
>> mind.
>
>
>>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even
>>> higher), there is a gap between what the HW is doing and what is explicitly
>>> modeled in Linux.
>
>
>> Can you elaborate more here ?. We use the linux model to accelerate a
>> multi-layer (l2-l3) switch today. There maybe a few gaps, but these gaps can
>> be closed by having equivalent functionality in the software path.
>
> What I meant is that without a bridge device the HW switch is seen as a collection of independent switch ports. Typical switch ASIC performs L2 switching by default. This is not expressed explicitly in Linux without a bridge device.
> The SELF flag is used to target typical bridge port and bridge configuration at a switch port device.
> Without an explicit bridge device, bridge attributes have to be directed at an arbitrary port (any port could represent the entire switch) and interpreted by the switch port driver as intended for the entire switch (this includes attributes like STP etc.)
> Each switch port device driver has to implement similar functionality (i.e. all bridge and fdb related ndos) independently without common functionality shared (e.g. FDB, soft aging).
> It is a valid use model and could avoid the complexity of having to deal with the presence of both SW and HW bridge and to deal with explicit offloading of data-path.
>
> I was trying to find out whether the intention was to continue and support both bridge-less an offloaded-bridge models and leave it to the end-user to choose the desirable model at configuration time.
> This would require dual support in the switch port driver in order to have best user experience across multiple switch ASICs or other kinds of devices.
>
I'm still missing why there is duplicate implementations in the driver.
If the driver implements the set of ndo ops why should it care who calls
them? I think you tried to explain this already but I'm not seeing it.
[...]
I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
might have worked some of it out.
--
John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 16:41 ` John Fastabend
@ 2014-12-16 17:29 ` Arad, Ronen
2014-12-16 19:23 ` B Viswanath
2014-12-16 19:25 ` Roopa Prabhu
0 siblings, 2 replies; 31+ messages in thread
From: Arad, Ronen @ 2014-12-16 17:29 UTC (permalink / raw)
To: John Fastabend, netdev@vger.kernel.org
Cc: Roopa Prabhu, Jamal Hadi Salim, Jiri Pirko, sfeldma@gmail.com,
bcrl@kvack.org, tgraf@suug.ch, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Tuesday, December 16, 2014 6:42 PM
> To: Arad, Ronen
> Cc: Roopa Prabhu; netdev@vger.kernel.org; Jamal Hadi Salim; Jiri Pirko;
> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 12/16/2014 03:01 AM, Arad, Ronen wrote:
> >
> > In my reply (inline) I elaborate on the validity of bridge-less and offloaded-
> bridge models for L2 switching.
> >
> > I also discuss the implied necessity of a bridge device for L3 routing and
> potential issues with the upcoming FIB offloading proposal.
> >
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org [mailto:netdev-
> >> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
> >> Sent: Tuesday, December 16, 2014 3:21 AM
> >> To: Arad, Ronen
> >> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri
> >> Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> >> stephen@networkplumber.org; linville@tuxdriver.com;
> >> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com;
> >> gospo@cumulusnetworks.com
> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
> >> del bridge port attributes
> >>
> >> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
> >>>> Sent: Tuesday, December 16, 2014 1:28 AM
> >>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
> >>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
> >>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
> >>>> vyasevic@redhat.com; davem@davemloft.net;
> >> shm@cumulusnetworks.com;
> >>>> gospo@cumulusnetworks.com
> >>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set
> >>>> and del bridge port attributes
> >>>>
> >>>> On 12/15/14 13:36, Arad, Ronen wrote:
> >>>>>
> >>>>>> -----Original Message-----
> >>>>> The behavior of a driver could depend on the presence of a bridge
> >>>>> and
> >>>> features such as FDB LEARNING and LEARNING_SYNC.
> >>>>
> >>>> Indeed, those are bridge attributes.
> >>>>
> >>>>> A switch port driver which is not enslaved to a bridge might need
> >>>>> to implement VLAN-aware FDB within the driver and report its
> >>>>> content to
> >>>>> user-
> >>>> space using ndo_fdb_dump.
> >>>> >
> >>>>> A switch port driver which is enslaved to a bridge could do with
> >>>>> only pass through for static FDB configuration
> >>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
> >>>> user- space and soft aging are left to the bridge module FDB.
> >>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
> >>>>> in-driver
> >>>> FDB as long as it could dump the HW FDB on demand.
> >>>>> LEARNING_SYNC also requires periodic updates of freshness
> >>>>> information
> >>>> from the driver to the bridge module.
> >>>>
> >>>> If you have an fdb - shouldnt that be exposed only if you have a
> >>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
> >>> I'm trying to find out what are the opinions of other people in the
> >>> netdev
> >> list.
> >>> John have clearly stated that he'd like to see full L2 switching
> >>> functionality
> >> (at least) supported without making a bridge device mandatory.
> >>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already
> >>> support that
> >> with proper setting of SELF/MASTER flags by iproute2.
> >>> I see the value in supporting both approaches (bridge device
> >>> mandatory and bridge device optional). If the choice is left to
> >>> user-driven policy decision, we need to document both use models and
> >>> map traditional L2 features to each model.
> >>> The L2 offloading (or NETFUNC as it is currently called), which is
> >>> being discussed on a different patch-set, is only needed when a
> >>> bridge device is used.
> >>> Without a bridge device, all configuration has to be targeted at the
> >>> switch port driver directly using the SELF flag. FDB remains
> >>> relevant and it is used to configure static MAC table entries and dump
> the HW MAC table.
> >
> >> Your understanding is right here. So far all patches have kept both
> >> models in mind.
> >
> >
> >>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or
> >>> even higher), there is a gap between what the HW is doing and what
> >>> is explicitly modeled in Linux.
> >
> >
> >> Can you elaborate more here ?. We use the linux model to accelerate a
> >> multi-layer (l2-l3) switch today. There maybe a few gaps, but these
> >> gaps can be closed by having equivalent functionality in the software path.
> >
> > What I meant is that without a bridge device the HW switch is seen as a
> collection of independent switch ports. Typical switch ASIC performs L2
> switching by default. This is not expressed explicitly in Linux without a bridge
> device.
> > The SELF flag is used to target typical bridge port and bridge configuration
> at a switch port device.
> > Without an explicit bridge device, bridge attributes have to be
> > directed at an arbitrary port (any port could represent the entire switch)
> and interpreted by the switch port driver as intended for the entire switch
> (this includes attributes like STP etc.) Each switch port device driver has to
> implement similar functionality (i.e. all bridge and fdb related ndos)
> independently without common functionality shared (e.g. FDB, soft aging).
> > It is a valid use model and could avoid the complexity of having to deal with
> the presence of both SW and HW bridge and to deal with explicit offloading
> of data-path.
> >
> > I was trying to find out whether the intention was to continue and support
> both bridge-less an offloaded-bridge models and leave it to the end-user to
> choose the desirable model at configuration time.
> > This would require dual support in the switch port driver in order to have
> best user experience across multiple switch ASICs or other kinds of devices.
> >
>
> I'm still missing why there is duplicate implementations in the driver.
> If the driver implements the set of ndo ops why should it care who calls
> them? I think you tried to explain this already but I'm not seeing it.
>
Let's consider a bridge property. I'll use the default PVID attribute as an example. This is currently configurable by sysfs only and a netlink support for that is still due. Let's assume for our discussion that a DEAFAULT_PVID attribute will be added as a bridge attribute within AFSPEC nested attribute of AF_BRIDGE SETLINK message.
When a bridge device is present, this attribute is processed by the bridge module and saved as default_pvid field in net_bridge structure. When a switch port is enslaved to a bridge, the bridge driver creates a net_bridge_port instance and assigns it a pvid inherited from the default_pvid attribute of the bridge. Setting the pvid for a new enslaved switch port is not done via netlink. It only applies to the net_bridge_port structure which is internal to the bridge module. Offloading this to HW is not addressed with current bridge offloading.
When a bridge device is not used, the DEFAULT_PVID will be targeted using the SELF flag to any of the switch ports. The driver will recognize that as a bridge port and will need to maintain some switch global structure similar to net_bridge where it could save the default_pvid. The driver, knowing that the switch port is not enslaved to a bridge, will have to replicate the same functionality. In the HW case, it will have to configure default VLAN on all the switch ports.
This is different from the yet to be defined way of propagating default PVID from a bridge device to offloaded bridge ports.
Another example is STP. STP attributes are bridge attributes which are not offloaded when a bridge device is present. The bridge module handles STP protocol internally. Without bridge device, STP attributes have to be targeted at a switch port device and the driver should save them in driver-specific structures and have proprietary implementation of STP (as the one in the bridge module is not used).
> [...]
>
> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa might have
> worked some of it out.
>
> --
> John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 11:01 ` Arad, Ronen
2014-12-16 15:54 ` Samudrala, Sridhar
2014-12-16 16:41 ` John Fastabend
@ 2014-12-16 19:20 ` Roopa Prabhu
2 siblings, 0 replies; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-16 19:20 UTC (permalink / raw)
To: Arad, Ronen
Cc: netdev@vger.kernel.org, Jamal Hadi Salim, John Fastabend,
Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/16/14, 3:01 AM, Arad, Ronen wrote:
> In my reply (inline) I elaborate on the validity of bridge-less and offloaded-bridge models for L2 switching.
>
> I also discuss the implied necessity of a bridge device for L3 routing and potential issues with the upcoming FIB offloading proposal.
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
>> Sent: Tuesday, December 16, 2014 3:21 AM
>> To: Arad, Ronen
>> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri Pirko;
>> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>>>> -----Original Message-----
>>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>>>> Sent: Tuesday, December 16, 2014 1:28 AM
>>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>>>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com;
>>>> gospo@cumulusnetworks.com
>>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
>>>> del bridge port attributes
>>>>
>>>> On 12/15/14 13:36, Arad, Ronen wrote:
>>>>>> -----Original Message-----
>>>>> The behavior of a driver could depend on the presence of a bridge
>>>>> and
>>>> features such as FDB LEARNING and LEARNING_SYNC.
>>>>
>>>> Indeed, those are bridge attributes.
>>>>
>>>>> A switch port driver which is not enslaved to a bridge might need to
>>>>> implement VLAN-aware FDB within the driver and report its content to
>>>>> user-
>>>> space using ndo_fdb_dump.
>>>> >
>>>>> A switch port driver which is enslaved to a bridge could do with
>>>>> only pass through for static FDB configuration
>>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
>>>> user- space and soft aging are left to the bridge module FDB.
>>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
>>>>> in-driver
>>>> FDB as long as it could dump the HW FDB on demand.
>>>>> LEARNING_SYNC also requires periodic updates of freshness
>>>>> information
>>>> from the driver to the bridge module.
>>>>
>>>> If you have an fdb - shouldnt that be exposed only if you have a
>>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
>>> I'm trying to find out what are the opinions of other people in the netdev
>> list.
>>> John have clearly stated that he'd like to see full L2 switching functionality
>> (at least) supported without making a bridge device mandatory.
>>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already support that
>> with proper setting of SELF/MASTER flags by iproute2.
>>> I see the value in supporting both approaches (bridge device mandatory
>>> and bridge device optional). If the choice is left to user-driven policy decision,
>>> we need to document both use models and map traditional L2 features to
>>> each model.
>>> The L2 offloading (or NETFUNC as it is currently called), which is being
>>> discussed on a different patch-set, is only needed when a bridge device is
>>> used.
>>> Without a bridge device, all configuration has to be targeted at the switch
>>> port driver directly using the SELF flag. FDB remains relevant and it is used to
>>> configure static MAC table entries and dump the HW MAC table.
>> Your understanding is right here. So far all patches have kept both models in
>> mind.
>
>>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or even
>>> higher), there is a gap between what the HW is doing and what is explicitly
>>> modeled in Linux.
>
>> Can you elaborate more here ?. We use the linux model to accelerate a
>> multi-layer (l2-l3) switch today. There maybe a few gaps, but these gaps can
>> be closed by having equivalent functionality in the software path.
> What I meant is that without a bridge device the HW switch is seen as a collection of independent switch ports. Typical switch ASIC performs L2 switching by default. This is not expressed explicitly in Linux without a bridge device.
> The SELF flag is used to target typical bridge port and bridge configuration at a switch port device.
> Without an explicit bridge device, bridge attributes have to be directed at an arbitrary port (any port could represent the entire switch) and interpreted by the switch port driver as intended for the entire switch (this includes attributes like STP etc.)
> Each switch port device driver has to implement similar functionality (i.e. all bridge and fdb related ndos) independently without common functionality shared (e.g. FDB, soft aging).
> It is a valid use model and could avoid the complexity of having to deal with the presence of both SW and HW bridge and to deal with explicit offloading of data-path.
>
> I was trying to find out whether the intention was to continue and support both bridge-less an offloaded-bridge models and leave it to the end-user to choose the desirable model at configuration time.
I have always been speaking for the offloaded-bridge model because we
@cumulus work on such a model today.
But, The infrastructure had to be generic to support nics and switch
asics. And, from the current state of things both bridge-less and
offloaded-bridge models are supported. Rocker for example allows user to
operate directly (ndo_bridge_set/getlink) on the switch hw. But it does
require a bridge.
> This would require dual support in the switch port driver in order to have best user experience across multiple switch ASICs or other kinds of devices.
agreed. However, if the switch driver wants to support just one model,
it is upto the switch driver to advertise the NETFUNC_OFFLOAD flag and
maybe not honor requests with the self flag set.
Our switch driver will likely do that because we plan to support only
the offloaded-bridge model.
>>> Without a bridge device, the HW is represented by a set of switch port
>>> devices and the bridging (both control and data planes) takes place only in
>>> the HW and switch port driver.
>>> Each switch port driver has to implement its own FDB as there is no
>>> common shared code among drivers for different HW devices.
>>> Using a bridge device could partially alleviate that, but it comes with a cost.
>>> There is a need to properly implement offloading of both configuration and
>>> data-path. The transmit and receive path in the bridge module should be
>>> somehow bypassed to avoid unnecessary overhead or duplicate packets
>>> coming from both software bridging and HW bridging.
>>>
>>>> What i was refering to was a scenario where i have no interest in the
>>>> fdb despite such a hardware capabilities. VLANs is a different issue;
>>>>
>>> VLAN is fundamental feature of L2 and L3 switching and Linux is unclear
>> about it. Bridge device could model bridging of untagged packets which
>> requires a bridge device for each VLAN and a vlan device on each port that is
>> a member of the bridge's VLAN.
>>> This different from the behavior and configuration of classic closed-source
>> switches.
>>> An alternative model is VLAN filtering where a bridge is VLAN-aware and
>>> switches tagged traffic. A bridge device represents multiple L2 domains with
>>> VLAN filtering policy that defines the switching rules within each domain.
>> And the linux bridge driver supports both models today.
>>
>>> Forwarding (e.g. L3 routing) is expected across such L2 domains using L3
>> entities.
>>> The modeling of L3 entities per L2 domain (e.g. per-VLAN) in the VLAN
>>> filtering model is yet unclear to me.
>> In the vlan filtering bridge model, You can create a vlan device on the bridge
>> for l3 ...
>>
> That's what I'm thinking too (I experimented with such setup using veth interfaces, bridge device, and vlan interfaces). This, however, seems to require an explicit bridge for L3 support.
>
> Looking at the latest code of FIB offloading (not yet submitted to netdev), I noticed that a switch port device is expected as a lower descendent of the FIB destination device.
> This assumption is valid in the per-vlan bridge model where IP address is assigned to the bridge itself.
> This, however, is not consistent with the single multi-VLAN bridge model.
> Vlan interfaces on a bridge looks like siblings of the switch ports devices on the same bridge. They are not ancestors of the switch ports.
> The L3 domain ends at the bridge sub-interfaces. The only L3 entities are the vlan sub-interfaces on the bridge.
> Those are route next hops and the only possible fib_dev.
> L3 routing is not aware of the switch ports. Route is performed to next hop addresses on one of the vlan interfaces subnets. The actual resolution to a switch port device has to be performed by the neighbor subsystem (ARP/ND).
> It is unclear to me how the FIB offloading will be redirected to an ndo of a switch port device.
I have not looked at the rocker FIB offloading yet. But we have
discussed this a bit at LPC in Dusseldorf. The route can be pushed to
hw after the resolution by the neighbour subsystem..or ..with the
introduction of a switch device (to represent the asic), the switch
driver can take care of this.
>
>>>>>>> Will the decision about using a bridge device or avoiding it be
>>>>>>> left to the end-user?
>>>>>> Its a user policy decision. Again the offload bit gets us this in a
>>>>>> reasonably configurable way IMO.
>>>>>>
>>>>>>> (This requires switch port drivers to be able to work and provide
>>>>>>> similar functionality in both setups).
>>>>>> Right, but if the drivers "care" who is calling their ndo ops
>>>>>> something is seriously broken. For the driver it should not need to
>>>>>> know anything about the callers so it doesn't matter to the driver
>>>>>> if its a netlink call from user space or an internal call fro
>>>>>> bridge.ko
>>>>> LEARNING_SYNC only makes sense when a switch port driver is enslaved
>>>>> to
>>>> a bridge.
>>>> > Rocker switch driver indeed monitors upper change notifications
>>>> and keep track of master bridge presence.
>>>>> So bridge presence is not transparent.
>>>>>
>>>> Agreed - the challenge so far is that people have been fascinated by
>> "switch"
>>>> point of view. I think we are learning and the class device will
>>>> eventually become obvious as useful.
>>>>
>>>> cheers,
>>>> jamal
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org More majordomo
>> info
>>> at http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 17:29 ` Arad, Ronen
@ 2014-12-16 19:23 ` B Viswanath
2014-12-16 20:52 ` Arad, Ronen
2014-12-16 19:25 ` Roopa Prabhu
1 sibling, 1 reply; 31+ messages in thread
From: B Viswanath @ 2014-12-16 19:23 UTC (permalink / raw)
To: Arad, Ronen
Cc: John Fastabend, netdev@vger.kernel.org, Roopa Prabhu,
Jamal Hadi Salim, Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
Hi,
This is my first email on this thread, and on this list. My apologies
if I have not understood something correctly. I would like to
participate in this discussion, which is one of the reasons I joined
this list recently. Some feedback inline below.
On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: John Fastabend [mailto:john.fastabend@gmail.com]
>> Sent: Tuesday, December 16, 2014 6:42 PM
>> To: Arad, Ronen
>> Cc: Roopa Prabhu; netdev@vger.kernel.org; Jamal Hadi Salim; Jiri Pirko;
>> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/16/2014 03:01 AM, Arad, Ronen wrote:
>> >
>> > In my reply (inline) I elaborate on the validity of bridge-less and offloaded-
>> bridge models for L2 switching.
>> >
>> > I also discuss the implied necessity of a bridge device for L3 routing and
>> potential issues with the upcoming FIB offloading proposal.
>> >
>> >> -----Original Message-----
>> >> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> >> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
>> >> Sent: Tuesday, December 16, 2014 3:21 AM
>> >> To: Arad, Ronen
>> >> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri
>> >> Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> >> stephen@networkplumber.org; linville@tuxdriver.com;
>> >> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com;
>> >> gospo@cumulusnetworks.com
>> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
>> >> del bridge port attributes
>> >>
>> >> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>> >>>> Sent: Tuesday, December 16, 2014 1:28 AM
>> >>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>> >>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>> >>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>> >>>> vyasevic@redhat.com; davem@davemloft.net;
>> >> shm@cumulusnetworks.com;
>> >>>> gospo@cumulusnetworks.com
>> >>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set
>> >>>> and del bridge port attributes
>> >>>>
>> >>>> On 12/15/14 13:36, Arad, Ronen wrote:
>> >>>>>
>> >>>>>> -----Original Message-----
>> >>>>> The behavior of a driver could depend on the presence of a bridge
>> >>>>> and
>> >>>> features such as FDB LEARNING and LEARNING_SYNC.
>> >>>>
>> >>>> Indeed, those are bridge attributes.
>> >>>>
>> >>>>> A switch port driver which is not enslaved to a bridge might need
>> >>>>> to implement VLAN-aware FDB within the driver and report its
>> >>>>> content to
>> >>>>> user-
>> >>>> space using ndo_fdb_dump.
>> >>>> >
>> >>>>> A switch port driver which is enslaved to a bridge could do with
>> >>>>> only pass through for static FDB configuration
>> >>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
>> >>>> user- space and soft aging are left to the bridge module FDB.
>> >>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
>> >>>>> in-driver
>> >>>> FDB as long as it could dump the HW FDB on demand.
>> >>>>> LEARNING_SYNC also requires periodic updates of freshness
>> >>>>> information
>> >>>> from the driver to the bridge module.
>> >>>>
>> >>>> If you have an fdb - shouldnt that be exposed only if you have a
>> >>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
>> >>> I'm trying to find out what are the opinions of other people in the
>> >>> netdev
>> >> list.
>> >>> John have clearly stated that he'd like to see full L2 switching
>> >>> functionality
>> >> (at least) supported without making a bridge device mandatory.
>> >>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already
>> >>> support that
>> >> with proper setting of SELF/MASTER flags by iproute2.
>> >>> I see the value in supporting both approaches (bridge device
>> >>> mandatory and bridge device optional). If the choice is left to
>> >>> user-driven policy decision, we need to document both use models and
>> >>> map traditional L2 features to each model.
>> >>> The L2 offloading (or NETFUNC as it is currently called), which is
>> >>> being discussed on a different patch-set, is only needed when a
>> >>> bridge device is used.
>> >>> Without a bridge device, all configuration has to be targeted at the
>> >>> switch port driver directly using the SELF flag. FDB remains
>> >>> relevant and it is used to configure static MAC table entries and dump
>> the HW MAC table.
>> >
>> >> Your understanding is right here. So far all patches have kept both
>> >> models in mind.
>> >
>> >
>> >>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or
>> >>> even higher), there is a gap between what the HW is doing and what
>> >>> is explicitly modeled in Linux.
>> >
>> >
>> >> Can you elaborate more here ?. We use the linux model to accelerate a
>> >> multi-layer (l2-l3) switch today. There maybe a few gaps, but these
>> >> gaps can be closed by having equivalent functionality in the software path.
>> >
>> > What I meant is that without a bridge device the HW switch is seen as a
>> collection of independent switch ports. Typical switch ASIC performs L2
>> switching by default. This is not expressed explicitly in Linux without a bridge
>> device.
>> > The SELF flag is used to target typical bridge port and bridge configuration
>> at a switch port device.
>> > Without an explicit bridge device, bridge attributes have to be
>> > directed at an arbitrary port (any port could represent the entire switch)
>> and interpreted by the switch port driver as intended for the entire switch
>> (this includes attributes like STP etc.) Each switch port device driver has to
>> implement similar functionality (i.e. all bridge and fdb related ndos)
>> independently without common functionality shared (e.g. FDB, soft aging).
>> > It is a valid use model and could avoid the complexity of having to deal with
>> the presence of both SW and HW bridge and to deal with explicit offloading
>> of data-path.
>> >
>> > I was trying to find out whether the intention was to continue and support
>> both bridge-less an offloaded-bridge models and leave it to the end-user to
>> choose the desirable model at configuration time.
>> > This would require dual support in the switch port driver in order to have
>> best user experience across multiple switch ASICs or other kinds of devices.
>> >
>>
>> I'm still missing why there is duplicate implementations in the driver.
>> If the driver implements the set of ndo ops why should it care who calls
>> them? I think you tried to explain this already but I'm not seeing it.
>>
>
> Let's consider a bridge property. I'll use the default PVID attribute as an example. This is currently configurable by sysfs only and a netlink support for that is still due. Let's assume for our discussion that a DEAFAULT_PVID attribute will be added as a bridge attribute within AFSPEC nested attribute of AF_BRIDGE SETLINK message.
> When a bridge device is present, this attribute is processed by the bridge module and saved as default_pvid field in net_bridge structure. When a switch port is enslaved to a bridge, the bridge driver creates a net_bridge_port instance and assigns it a pvid inherited from the default_pvid attribute of the bridge. Setting the pvid for a new enslaved switch port is not done via netlink. It only applies to the net_bridge_port structure which is internal to the bridge module. Offloading this to HW is not addressed with current bridge offloading.
>
> When a bridge device is not used, the DEFAULT_PVID will be targeted using the SELF flag to any of the switch ports. The driver will recognize that as a bridge port and will need to maintain some switch global structure similar to net_bridge where it could save the default_pvid. The driver, knowing that the switch port is not enslaved to a bridge, will have to replicate the same functionality. In the HW case, it will have to configure default VLAN on all the switch ports.
> This is different from the yet to be defined way of propagating default PVID from a bridge device to offloaded bridge ports.
>
> Another example is STP. STP attributes are bridge attributes which are not offloaded when a bridge device is present. The bridge module handles STP protocol internally. Without bridge device, STP attributes have to be targeted at a switch port device and the driver should save them in driver-specific structures and have proprietary implementation of STP (as the one in the bridge module is not used).
In general I feel that the switch-device and port relation should be
that of the 'container-containee'. This is the actual physical
relationship. Apart from some operations such as vlans and protocol
related, it is tricky to model all operations directly on ports. My
thinking is it is cleaner to have all operations be on switch-device,
which in turn peculates the operations downward, to its contained
ports as applicable. The offloading is really a property of the
switch device and not individual ports. Similarly the FDB is
maintained by the switch and not the ports. As we extend the current
offloading mechanism to other L2, L3 and other features, we may find
it easier to have a 'switch-device' in place.
I am somewhat confused with the notion of bridges though. Many
existing linux-based routers use bridges differently than as a
vlan-broadcast-domain. For example it is common to have eth0.334 and
eth1 in the same bridge. What is being done internally is that the
additional vlan tag 334 (which indicates video traffic, say) is
removed and that video traffic is being bridged to eth1. There is no
default vlan for this bridge. This is a software bridge. I am not
sure how this can be accomplished if there is a need to associate a
vlan with a bridge.
Thanks
Viswanath
>
>
>> [...]
>>
>> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa might have
>> worked some of it out.
>>
>> --
>> John Fastabend Intel Corporation
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 17:29 ` Arad, Ronen
2014-12-16 19:23 ` B Viswanath
@ 2014-12-16 19:25 ` Roopa Prabhu
1 sibling, 0 replies; 31+ messages in thread
From: Roopa Prabhu @ 2014-12-16 19:25 UTC (permalink / raw)
To: Arad, Ronen
Cc: John Fastabend, netdev@vger.kernel.org, Jamal Hadi Salim,
Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 12/16/14, 9:29 AM, Arad, Ronen wrote:
>
>> -----Original Message-----
>> From: John Fastabend [mailto:john.fastabend@gmail.com]
>> Sent: Tuesday, December 16, 2014 6:42 PM
>> To: Arad, Ronen
>> Cc: Roopa Prabhu; netdev@vger.kernel.org; Jamal Hadi Salim; Jiri Pirko;
>> sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> On 12/16/2014 03:01 AM, Arad, Ronen wrote:
>>> In my reply (inline) I elaborate on the validity of bridge-less and offloaded-
>> bridge models for L2 switching.
>>> I also discuss the implied necessity of a bridge device for L3 routing and
>> potential issues with the upcoming FIB offloading proposal.
>>>> -----Original Message-----
>>>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>>>> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
>>>> Sent: Tuesday, December 16, 2014 3:21 AM
>>>> To: Arad, Ronen
>>>> Cc: Jamal Hadi Salim; John Fastabend; netdev@vger.kernel.org; Jiri
>>>> Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>>>> stephen@networkplumber.org; linville@tuxdriver.com;
>>>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com;
>>>> gospo@cumulusnetworks.com
>>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
>>>> del bridge port attributes
>>>>
>>>> On 12/15/14, 4:58 PM, Arad, Ronen wrote:
>>>>>> -----Original Message-----
>>>>>> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
>>>>>> Sent: Tuesday, December 16, 2014 1:28 AM
>>>>>> To: Arad, Ronen; John Fastabend; netdev@vger.kernel.org
>>>>>> Cc: Roopa Prabhu; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org;
>>>>>> tgraf@suug.ch; stephen@networkplumber.org; linville@tuxdriver.com;
>>>>>> vyasevic@redhat.com; davem@davemloft.net;
>>>> shm@cumulusnetworks.com;
>>>>>> gospo@cumulusnetworks.com
>>>>>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set
>>>>>> and del bridge port attributes
>>>>>>
>>>>>> On 12/15/14 13:36, Arad, Ronen wrote:
>>>>>>>> -----Original Message-----
>>>>>>> The behavior of a driver could depend on the presence of a bridge
>>>>>>> and
>>>>>> features such as FDB LEARNING and LEARNING_SYNC.
>>>>>>
>>>>>> Indeed, those are bridge attributes.
>>>>>>
>>>>>>> A switch port driver which is not enslaved to a bridge might need
>>>>>>> to implement VLAN-aware FDB within the driver and report its
>>>>>>> content to
>>>>>>> user-
>>>>>> space using ndo_fdb_dump.
>>>>>> >
>>>>>>> A switch port driver which is enslaved to a bridge could do with
>>>>>>> only pass through for static FDB configuration
>>>>>> > to the HW when LEARNING_SYNC is configured. FDB reporting to
>>>>>> user- space and soft aging are left to the bridge module FDB.
>>>>>>> Such driver, without LEARNING_SYNC could still avoid maintaing
>>>>>>> in-driver
>>>>>> FDB as long as it could dump the HW FDB on demand.
>>>>>>> LEARNING_SYNC also requires periodic updates of freshness
>>>>>>> information
>>>>>> from the driver to the bridge module.
>>>>>>
>>>>>> If you have an fdb - shouldnt that be exposed only if you have a
>>>>>> bridge abstraction exposed? i.e thats where the Linux tools would work.
>>>>> I'm trying to find out what are the opinions of other people in the
>>>>> netdev
>>>> list.
>>>>> John have clearly stated that he'd like to see full L2 switching
>>>>> functionality
>>>> (at least) supported without making a bridge device mandatory.
>>>>> The existing bridge ndos (ndo_bridge_{set,del,get}link) already
>>>>> support that
>>>> with proper setting of SELF/MASTER flags by iproute2.
>>>>> I see the value in supporting both approaches (bridge device
>>>>> mandatory and bridge device optional). If the choice is left to
>>>>> user-driven policy decision, we need to document both use models and
>>>>> map traditional L2 features to each model.
>>>>> The L2 offloading (or NETFUNC as it is currently called), which is
>>>>> being discussed on a different patch-set, is only needed when a
>>>>> bridge device is used.
>>>>> Without a bridge device, all configuration has to be targeted at the
>>>>> switch port driver directly using the SELF flag. FDB remains
>>>>> relevant and it is used to configure static MAC table entries and dump
>> the HW MAC table.
>>>> Your understanding is right here. So far all patches have kept both
>>>> models in mind.
>>>
>>>>> When the HW device is a L2 switch or a multi-layer switch (L2-L3 or
>>>>> even higher), there is a gap between what the HW is doing and what
>>>>> is explicitly modeled in Linux.
>>>
>>>> Can you elaborate more here ?. We use the linux model to accelerate a
>>>> multi-layer (l2-l3) switch today. There maybe a few gaps, but these
>>>> gaps can be closed by having equivalent functionality in the software path.
>>> What I meant is that without a bridge device the HW switch is seen as a
>> collection of independent switch ports. Typical switch ASIC performs L2
>> switching by default. This is not expressed explicitly in Linux without a bridge
>> device.
>>> The SELF flag is used to target typical bridge port and bridge configuration
>> at a switch port device.
>>> Without an explicit bridge device, bridge attributes have to be
>>> directed at an arbitrary port (any port could represent the entire switch)
>> and interpreted by the switch port driver as intended for the entire switch
>> (this includes attributes like STP etc.) Each switch port device driver has to
>> implement similar functionality (i.e. all bridge and fdb related ndos)
>> independently without common functionality shared (e.g. FDB, soft aging).
>>> It is a valid use model and could avoid the complexity of having to deal with
>> the presence of both SW and HW bridge and to deal with explicit offloading
>> of data-path.
>>> I was trying to find out whether the intention was to continue and support
>> both bridge-less an offloaded-bridge models and leave it to the end-user to
>> choose the desirable model at configuration time.
>>> This would require dual support in the switch port driver in order to have
>> best user experience across multiple switch ASICs or other kinds of devices.
>> I'm still missing why there is duplicate implementations in the driver.
>> If the driver implements the set of ndo ops why should it care who calls
>> them? I think you tried to explain this already but I'm not seeing it.
>>
> Let's consider a bridge property. I'll use the default PVID attribute as an example. This is currently configurable by sysfs only and a netlink support for that is still due. Let's assume for our discussion that a DEAFAULT_PVID attribute will be added as a bridge attribute within AFSPEC nested attribute of AF_BRIDGE SETLINK message.
> When a bridge device is present, this attribute is processed by the bridge module and saved as default_pvid field in net_bridge structure. When a switch port is enslaved to a bridge, the bridge driver creates a net_bridge_port instance and assigns it a pvid inherited from the default_pvid attribute of the bridge. Setting the pvid for a new enslaved switch port is not done via netlink. It only applies to the net_bridge_port structure which is internal to the bridge module. Offloading this to HW is not addressed with current bridge offloading.
Correct. And that's where I am going with my NETFUNC_OFFLOAD series.
Hooks in the bridge driver to communicate all changes to the switch driver.
The default pvid will/should be also communicated to the switch driver.
And the switch driver does not need to learn the default pvid for the
bridge.
The callbacks (ndo's) into the switch driver will indicate the pvid for
the switch port (just like all the userspace notifications for the
bridge driver do today).
>
> When a bridge device is not used, the DEFAULT_PVID will be targeted using the SELF flag to any of the switch ports. The driver will recognize that as a bridge port and will need to maintain some switch global structure similar to net_bridge where it could save the default_pvid. The driver, knowing that the switch port is not enslaved to a bridge, will have to replicate the same functionality. In the HW case, it will have to configure default VLAN on all the switch ports.
> This is different from the yet to be defined way of propagating default PVID from a bridge device to offloaded bridge ports.
>
> Another example is STP. STP attributes are bridge attributes which are not offloaded when a bridge device is present. The bridge module handles STP protocol internally. Without bridge device, STP attributes have to be targeted at a switch port device and the driver should save them in driver-specific structures and have proprietary implementation of STP (as the one in the bridge module is not used).
yes, for all the cases where you are trying to bypass the bridge driver,
the switch driver will need implement similar functionality. So, for
switch asics offloads there is a disadvantage to not use the bridge driver.
>
>
>> [...]
>>
>> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa might have
>> worked some of it out.
>>
>> --
>> John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 19:23 ` B Viswanath
@ 2014-12-16 20:52 ` Arad, Ronen
2014-12-16 21:51 ` B Viswanath
0 siblings, 1 reply; 31+ messages in thread
From: Arad, Ronen @ 2014-12-16 20:52 UTC (permalink / raw)
To: B Viswanath, netdev@vger.kernel.org
Cc: John Fastabend, Roopa Prabhu, Jamal Hadi Salim, Jiri Pirko,
sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of B Viswanath
> Sent: Tuesday, December 16, 2014 9:24 PM
> To: Arad, Ronen
> Cc: John Fastabend; netdev@vger.kernel.org; Roopa Prabhu; Jamal Hadi
> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> Hi,
>
> This is my first email on this thread, and on this list. My apologies if I have
> not understood something correctly. I would like to participate in this
> discussion, which is one of the reasons I joined this list recently. Some
> feedback inline below.
>
> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com> wrote:
> >
> >
<sniped for brevity>
> >>
> >> I'm still missing why there is duplicate implementations in the driver.
> >> If the driver implements the set of ndo ops why should it care who
> >> calls them? I think you tried to explain this already but I'm not seeing it.
> >>
> >
> > Let's consider a bridge property. I'll use the default PVID attribute as an
> example. This is currently configurable by sysfs only and a netlink support for
> that is still due. Let's assume for our discussion that a DEAFAULT_PVID
> attribute will be added as a bridge attribute within AFSPEC nested attribute
> of AF_BRIDGE SETLINK message.
> > When a bridge device is present, this attribute is processed by the bridge
> module and saved as default_pvid field in net_bridge structure. When a
> switch port is enslaved to a bridge, the bridge driver creates a
> net_bridge_port instance and assigns it a pvid inherited from the
> default_pvid attribute of the bridge. Setting the pvid for a new enslaved
> switch port is not done via netlink. It only applies to the net_bridge_port
> structure which is internal to the bridge module. Offloading this to HW is not
> addressed with current bridge offloading.
> >
> > When a bridge device is not used, the DEFAULT_PVID will be targeted using
> the SELF flag to any of the switch ports. The driver will recognize that as a
> bridge port and will need to maintain some switch global structure similar to
> net_bridge where it could save the default_pvid. The driver, knowing that the
> switch port is not enslaved to a bridge, will have to replicate the same
> functionality. In the HW case, it will have to configure default VLAN on all the
> switch ports.
> > This is different from the yet to be defined way of propagating default PVID
> from a bridge device to offloaded bridge ports.
> >
> > Another example is STP. STP attributes are bridge attributes which are not
> offloaded when a bridge device is present. The bridge module handles STP
> protocol internally. Without bridge device, STP attributes have to be targeted
> at a switch port device and the driver should save them in driver-specific
> structures and have proprietary implementation of STP (as the one in the
> bridge module is not used).
>
> In general I feel that the switch-device and port relation should be that of the
> 'container-containee'. This is the actual physical relationship. Apart from
> some operations such as vlans and protocol related, it is tricky to model all
> operations directly on ports. My thinking is it is cleaner to have all operations
> be on switch-device, which in turn peculates the operations downward, to its
> contained ports as applicable. The offloading is really a property of the
> switch device and not individual ports. Similarly the FDB is maintained by the
> switch and not the ports. As we extend the current offloading mechanism to
> other L2, L3 and other features, we may find it easier to have a 'switch-
> device' in place.
>
> I am somewhat confused with the notion of bridges though. Many existing
> linux-based routers use bridges differently than as a vlan-broadcast-domain.
> For example it is common to have eth0.334 and
> eth1 in the same bridge. What is being done internally is that the additional
> vlan tag 334 (which indicates video traffic, say) is removed and that video
> traffic is being bridged to eth1. There is no default vlan for this bridge. This is
> a software bridge. I am not sure how this can be accomplished if there is a
> need to associate a vlan with a bridge.
>
> Thanks
> Viswanath
>
>
Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
VLAN 10 is allowed on ports sp1 and sp2 and VLAN 20 is allowed on ports sp2 and sp3.
Let's say sp1 and sp3 are access ports and carry untagged traffic. Sp2 is an uplink trunk port and carries tagged traffic.
This could be modeled using Linux bridge in at least two ways:
1) Bridge per VLAN
- Two bridge devices are used br10 and br20
- sp2.10 is a vlan interface on sp2 for VLAN 10
- sp2.20 is a vlan interface on sp2 for VLAN 20
- br10 enslaves ports sp1 and sp2.10
- br20 enslaves ports sp3 and sp2.20
- br10 and br20 could be L3 interfaces and have IP address assigned to.
This allows for routing between VLANs.
- Traffic sent to br10 will egress untagged on sp1 and tagged with VID=10 on sp2
- Traffic sent to br20 will egress untagged on sp2 and tagged with VID=20 on sp2
- Traffic received on sp1 is delivered to br10 and could be flooded if MAC DA is broadcast or unknown unicast.
- Traffic received on sp2 with VID=10 is delivered to br10 after the VLAN tag is removed.
- Similarly traffic receive on sp3 or tagged traffic with VID=20 on sp2 is delivered to br20
2) Single bridge with VLAN filtering
- A single bridge device br0 is used
- All ports sp1, sp2, and sp3 are enslaved to the bridge
- br0 allows both VID=10 and VID=20
- VLAN policy on the ports determines the bridging domains within the bridge
- sp1 allows VID=10, it is untagged on egress, and it is the PVID of sp1.
Linux does not provide a way to only allow untagged traffic to enter a port.
Both tagged traffic with VID=10 and untagged traffic are considered received on VLAN 10.
- sp3 allows VID=20, it is untagged on egress, and it is the PVID of sp2.
- sp2 allows VID=10 and VID=20, both are tagged on egress, and there is no PVID so untagged traffic received on sp2 is dropped.
- Above configuration is sufficient for L2 switching with two distinct VLANs.
- L3 routing across VLANs in this model is achieved by vlan interfaces on the bridge. br0.10 for VLAN 10 and br0.20 for VLAN 20.
- br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
- br0.10 allows VID=10
> >
> >
> >> [...]
> >>
> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
> >> might have worked some of it out.
> >>
> >> --
> >> John Fastabend Intel Corporation
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org More majordomo
> info
> > at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 20:52 ` Arad, Ronen
@ 2014-12-16 21:51 ` B Viswanath
2014-12-16 22:46 ` Arad, Ronen
0 siblings, 1 reply; 31+ messages in thread
From: B Viswanath @ 2014-12-16 21:51 UTC (permalink / raw)
To: Arad, Ronen
Cc: netdev@vger.kernel.org, John Fastabend, Roopa Prabhu,
Jamal Hadi Salim, Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
On 17 December 2014 at 02:22, Arad, Ronen <ronen.arad@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of B Viswanath
>> Sent: Tuesday, December 16, 2014 9:24 PM
>> To: Arad, Ronen
>> Cc: John Fastabend; netdev@vger.kernel.org; Roopa Prabhu; Jamal Hadi
>> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> Hi,
>>
>> This is my first email on this thread, and on this list. My apologies if I have
>> not understood something correctly. I would like to participate in this
>> discussion, which is one of the reasons I joined this list recently. Some
>> feedback inline below.
>>
>> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com> wrote:
>> >
>> >
> <sniped for brevity>
>> >>
>> >> I'm still missing why there is duplicate implementations in the driver.
>> >> If the driver implements the set of ndo ops why should it care who
>> >> calls them? I think you tried to explain this already but I'm not seeing it.
>> >>
>> >
>> > Let's consider a bridge property. I'll use the default PVID attribute as an
>> example. This is currently configurable by sysfs only and a netlink support for
>> that is still due. Let's assume for our discussion that a DEAFAULT_PVID
>> attribute will be added as a bridge attribute within AFSPEC nested attribute
>> of AF_BRIDGE SETLINK message.
>> > When a bridge device is present, this attribute is processed by the bridge
>> module and saved as default_pvid field in net_bridge structure. When a
>> switch port is enslaved to a bridge, the bridge driver creates a
>> net_bridge_port instance and assigns it a pvid inherited from the
>> default_pvid attribute of the bridge. Setting the pvid for a new enslaved
>> switch port is not done via netlink. It only applies to the net_bridge_port
>> structure which is internal to the bridge module. Offloading this to HW is not
>> addressed with current bridge offloading.
>> >
>> > When a bridge device is not used, the DEFAULT_PVID will be targeted using
>> the SELF flag to any of the switch ports. The driver will recognize that as a
>> bridge port and will need to maintain some switch global structure similar to
>> net_bridge where it could save the default_pvid. The driver, knowing that the
>> switch port is not enslaved to a bridge, will have to replicate the same
>> functionality. In the HW case, it will have to configure default VLAN on all the
>> switch ports.
>> > This is different from the yet to be defined way of propagating default PVID
>> from a bridge device to offloaded bridge ports.
>> >
>> > Another example is STP. STP attributes are bridge attributes which are not
>> offloaded when a bridge device is present. The bridge module handles STP
>> protocol internally. Without bridge device, STP attributes have to be targeted
>> at a switch port device and the driver should save them in driver-specific
>> structures and have proprietary implementation of STP (as the one in the
>> bridge module is not used).
>>
>> In general I feel that the switch-device and port relation should be that of the
>> 'container-containee'. This is the actual physical relationship. Apart from
>> some operations such as vlans and protocol related, it is tricky to model all
>> operations directly on ports. My thinking is it is cleaner to have all operations
>> be on switch-device, which in turn peculates the operations downward, to its
>> contained ports as applicable. The offloading is really a property of the
>> switch device and not individual ports. Similarly the FDB is maintained by the
>> switch and not the ports. As we extend the current offloading mechanism to
>> other L2, L3 and other features, we may find it easier to have a 'switch-
>> device' in place.
>>
>> I am somewhat confused with the notion of bridges though. Many existing
>> linux-based routers use bridges differently than as a vlan-broadcast-domain.
>> For example it is common to have eth0.334 and
>> eth1 in the same bridge. What is being done internally is that the additional
>> vlan tag 334 (which indicates video traffic, say) is removed and that video
>> traffic is being bridged to eth1. There is no default vlan for this bridge. This is
>> a software bridge. I am not sure how this can be accomplished if there is a
>> need to associate a vlan with a bridge.
>>
>> Thanks
>> Viswanath
>>
>>
>
> Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
> VLAN 10 is allowed on ports sp1 and sp2 and VLAN 20 is allowed on ports sp2 and sp3.
> Let's say sp1 and sp3 are access ports and carry untagged traffic. Sp2 is an uplink trunk port and carries tagged traffic.
>
> This could be modeled using Linux bridge in at least two ways:
>
> 1) Bridge per VLAN
> - Two bridge devices are used br10 and br20
> - sp2.10 is a vlan interface on sp2 for VLAN 10
> - sp2.20 is a vlan interface on sp2 for VLAN 20
> - br10 enslaves ports sp1 and sp2.10
> - br20 enslaves ports sp3 and sp2.20
> - br10 and br20 could be L3 interfaces and have IP address assigned to.
> This allows for routing between VLANs.
> - Traffic sent to br10 will egress untagged on sp1 and tagged with VID=10 on sp2
> - Traffic sent to br20 will egress untagged on sp2 and tagged with VID=20 on sp2
> - Traffic received on sp1 is delivered to br10 and could be flooded if MAC DA is broadcast or unknown unicast.
> - Traffic received on sp2 with VID=10 is delivered to br10 after the VLAN tag is removed.
> - Similarly traffic receive on sp3 or tagged traffic with VID=20 on sp2 is delivered to br20
Thanks for the explanation. Understood this, and this is how I
saw/implemented things so far. However, my understanding is that both
br10 and br20 have nothing to do with vlan 10 and 20 respectively. The
traffic the actual bridges see (say if we ran tcpdump on br10 or br20)
is always untagged. The tagging happens while the packets are
egressing the sp2 port. In that sense, there is no vlan associated
with either of the bridges.
What I was not sure was about how we can associate a vlan with such a
bridge in this case, and what it really means for the traffic.
>
> 2) Single bridge with VLAN filtering
> - A single bridge device br0 is used
> - All ports sp1, sp2, and sp3 are enslaved to the bridge
> - br0 allows both VID=10 and VID=20
> - VLAN policy on the ports determines the bridging domains within the bridge
> - sp1 allows VID=10, it is untagged on egress, and it is the PVID of sp1.
> Linux does not provide a way to only allow untagged traffic to enter a port.
> Both tagged traffic with VID=10 and untagged traffic are considered received on VLAN 10.
> - sp3 allows VID=20, it is untagged on egress, and it is the PVID of sp2.
> - sp2 allows VID=10 and VID=20, both are tagged on egress, and there is no PVID so untagged traffic received on sp2 is dropped.
>
> - Above configuration is sufficient for L2 switching with two distinct VLANs.
> - L3 routing across VLANs in this model is achieved by vlan interfaces on the bridge. br0.10 for VLAN 10 and br0.20 for VLAN 20.
> - br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
> - br0.10 allows VID=10
In this case am I correct in assuming that br0 actually represents the
switch device ? I don't see a way in which one more such bridge can
exist, since the forwarding decisions are handled at the switch-device
level.
Thanks
Viswanath
>
>
>> >
>> >
>> >> [...]
>> >>
>> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
>> >> might have worked some of it out.
>> >>
>> >> --
>> >> John Fastabend Intel Corporation
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
>> > the body of a message to majordomo@vger.kernel.org More majordomo
>> info
>> > at http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
2014-12-16 21:51 ` B Viswanath
@ 2014-12-16 22:46 ` Arad, Ronen
0 siblings, 0 replies; 31+ messages in thread
From: Arad, Ronen @ 2014-12-16 22:46 UTC (permalink / raw)
To: B Viswanath, netdev@vger.kernel.org
Cc: John Fastabend, Roopa Prabhu, Jamal Hadi Salim, Jiri Pirko,
sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
> -----Original Message-----
> From: B Viswanath [mailto:marichika4@gmail.com]
> Sent: Tuesday, December 16, 2014 11:52 PM
> To: Arad, Ronen
> Cc: netdev@vger.kernel.org; John Fastabend; Roopa Prabhu; Jamal Hadi
> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 17 December 2014 at 02:22, Arad, Ronen <ronen.arad@intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org [mailto:netdev-
> >> owner@vger.kernel.org] On Behalf Of B Viswanath
> >> Sent: Tuesday, December 16, 2014 9:24 PM
> >> To: Arad, Ronen
> >> Cc: John Fastabend; netdev@vger.kernel.org; Roopa Prabhu; Jamal Hadi
> >> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> >> stephen@networkplumber.org; linville@tuxdriver.com;
> >> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com;
> >> gospo@cumulusnetworks.com
> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
> >> del bridge port attributes
> >>
> >> Hi,
> >>
> >> This is my first email on this thread, and on this list. My apologies
> >> if I have not understood something correctly. I would like to
> >> participate in this discussion, which is one of the reasons I joined
> >> this list recently. Some feedback inline below.
> >>
> >> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com>
> wrote:
> >> >
> >> >
> > <sniped for brevity>
> >> >>
> >> >> I'm still missing why there is duplicate implementations in the driver.
> >> >> If the driver implements the set of ndo ops why should it care who
> >> >> calls them? I think you tried to explain this already but I'm not seeing it.
> >> >>
> >> >
> >> > Let's consider a bridge property. I'll use the default PVID
> >> > attribute as an
> >> example. This is currently configurable by sysfs only and a netlink
> >> support for that is still due. Let's assume for our discussion that a
> >> DEAFAULT_PVID attribute will be added as a bridge attribute within
> >> AFSPEC nested attribute of AF_BRIDGE SETLINK message.
> >> > When a bridge device is present, this attribute is processed by the
> >> > bridge
> >> module and saved as default_pvid field in net_bridge structure. When
> >> a switch port is enslaved to a bridge, the bridge driver creates a
> >> net_bridge_port instance and assigns it a pvid inherited from the
> >> default_pvid attribute of the bridge. Setting the pvid for a new
> >> enslaved switch port is not done via netlink. It only applies to the
> >> net_bridge_port structure which is internal to the bridge module.
> >> Offloading this to HW is not addressed with current bridge offloading.
> >> >
> >> > When a bridge device is not used, the DEFAULT_PVID will be targeted
> >> > using
> >> the SELF flag to any of the switch ports. The driver will recognize
> >> that as a bridge port and will need to maintain some switch global
> >> structure similar to net_bridge where it could save the default_pvid.
> >> The driver, knowing that the switch port is not enslaved to a bridge,
> >> will have to replicate the same functionality. In the HW case, it
> >> will have to configure default VLAN on all the switch ports.
> >> > This is different from the yet to be defined way of propagating
> >> > default PVID
> >> from a bridge device to offloaded bridge ports.
> >> >
> >> > Another example is STP. STP attributes are bridge attributes which
> >> > are not
> >> offloaded when a bridge device is present. The bridge module handles
> >> STP protocol internally. Without bridge device, STP attributes have
> >> to be targeted at a switch port device and the driver should save
> >> them in driver-specific structures and have proprietary
> >> implementation of STP (as the one in the bridge module is not used).
> >>
> >> In general I feel that the switch-device and port relation should be
> >> that of the 'container-containee'. This is the actual physical
> >> relationship. Apart from some operations such as vlans and protocol
> >> related, it is tricky to model all operations directly on ports. My
> >> thinking is it is cleaner to have all operations be on switch-device,
> >> which in turn peculates the operations downward, to its contained
> >> ports as applicable. The offloading is really a property of the
> >> switch device and not individual ports. Similarly the FDB is
> >> maintained by the switch and not the ports. As we extend the current
> >> offloading mechanism to other L2, L3 and other features, we may find it
> easier to have a 'switch- device' in place.
> >>
> >> I am somewhat confused with the notion of bridges though. Many
> >> existing linux-based routers use bridges differently than as a vlan-
> broadcast-domain.
> >> For example it is common to have eth0.334 and
> >> eth1 in the same bridge. What is being done internally is that the
> >> additional vlan tag 334 (which indicates video traffic, say) is
> >> removed and that video traffic is being bridged to eth1. There is no
> >> default vlan for this bridge. This is a software bridge. I am not
> >> sure how this can be accomplished if there is a need to associate a vlan
> with a bridge.
> >>
> >> Thanks
> >> Viswanath
> >>
> >>
> >
> > Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
> > VLAN 10 is allowed on ports sp1 and sp2 and VLAN 20 is allowed on ports
> sp2 and sp3.
> > Let's say sp1 and sp3 are access ports and carry untagged traffic. Sp2 is an
> uplink trunk port and carries tagged traffic.
> >
> > This could be modeled using Linux bridge in at least two ways:
> >
> > 1) Bridge per VLAN
> > - Two bridge devices are used br10 and br20
> > - sp2.10 is a vlan interface on sp2 for VLAN 10
> > - sp2.20 is a vlan interface on sp2 for VLAN 20
> > - br10 enslaves ports sp1 and sp2.10
> > - br20 enslaves ports sp3 and sp2.20
> > - br10 and br20 could be L3 interfaces and have IP address assigned to.
> > This allows for routing between VLANs.
> > - Traffic sent to br10 will egress untagged on sp1 and tagged with
> VID=10 on sp2
> > - Traffic sent to br20 will egress untagged on sp2 and tagged with
> VID=20 on sp2
> > - Traffic received on sp1 is delivered to br10 and could be flooded if
> MAC DA is broadcast or unknown unicast.
> > - Traffic received on sp2 with VID=10 is delivered to br10 after the
> VLAN tag is removed.
> > - Similarly traffic receive on sp3 or tagged traffic with
> > VID=20 on sp2 is delivered to br20
>
> Thanks for the explanation. Understood this, and this is how I
> saw/implemented things so far. However, my understanding is that both
> br10 and br20 have nothing to do with vlan 10 and 20 respectively. The traffic
> the actual bridges see (say if we ran tcpdump on br10 or br20) is always
> untagged. The tagging happens while the packets are egressing the sp2 port.
> In that sense, there is no vlan associated with either of the bridges.
>
> What I was not sure was about how we can associate a vlan with such a
> bridge in this case, and what it really means for the traffic.
>
> >
> > 2) Single bridge with VLAN filtering
> > - A single bridge device br0 is used
> > - All ports sp1, sp2, and sp3 are enslaved to the bridge
> > - br0 allows both VID=10 and VID=20
> > - VLAN policy on the ports determines the bridging domains within the
> bridge
> > - sp1 allows VID=10, it is untagged on egress, and it is the PVID of
> sp1.
> > Linux does not provide a way to only allow untagged traffic to
> enter a port.
> > Both tagged traffic with VID=10 and untagged traffic are
> considered received on VLAN 10.
> > - sp3 allows VID=20, it is untagged on egress, and it is the PVID of
> sp2.
> > - sp2 allows VID=10 and VID=20, both are tagged on egress, and
> there is no PVID so untagged traffic received on sp2 is dropped.
> >
> > - Above configuration is sufficient for L2 switching with two distinct
> VLANs.
> > - L3 routing across VLANs in this model is achieved by vlan interfaces
> on the bridge. br0.10 for VLAN 10 and br0.20 for VLAN 20.
> > - br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
> > - br0.10 allows VID=10
>
> In this case am I correct in assuming that br0 actually represents the switch
> device ? I don't see a way in which one more such bridge can exist, since the
> forwarding decisions are handled at the switch-device level.
>
br0 represent the HW switch as far as configuration. Offloading (discussed on this patch-set) will propagate bridge and port attributes to underlying port switch devices for HW configuration.
Bypassing data-path handling by the software bridge is yet to be defined.
On the transmit path (i.e. packet originated by the switch OS or applications) it could be OK to use software bridging but it could be inefficient for multicast or flooding.
An alternative that hands packets sent to the bridge or to one of its vlan interfaces (br0.10, br0.20) to the HW switch is desirable.
On the receive path, delivering packets received from the HW by the switch driver on a switch port device (such as sp1, sp2, sp3) has to be avoided as it could cause duplicate packets (packets are flooded in the HW and again by the software bridge).
> Thanks
> Viswanath
>
> >
> >
> >> >
> >> >
> >> >> [...]
> >> >>
> >> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
> >> >> might have worked some of it out.
> >> >>
> >> >> --
> >> >> John Fastabend Intel Corporation
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe netdev"
> >> > in the body of a message to majordomo@vger.kernel.org More
> >> > majordomo
> >> info
> >> > at http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2014-12-16 22:46 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-10 9:05 [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes roopa
2014-12-10 9:37 ` Jiri Pirko
2014-12-11 16:52 ` Roopa Prabhu
2014-12-11 17:11 ` Jiri Pirko
2014-12-11 17:59 ` Roopa Prabhu
2014-12-11 18:07 ` Jiri Pirko
2014-12-11 18:27 ` Roopa Prabhu
2014-12-11 22:25 ` Jiri Pirko
2014-12-14 14:13 ` Roopa Prabhu
2014-12-14 15:35 ` Jiri Pirko
2014-12-14 19:41 ` Roopa Prabhu
2014-12-15 15:26 ` Jamal Hadi Salim
2014-12-15 17:20 ` Roopa Prabhu
2014-12-15 18:03 ` Benjamin LaHaise
2014-12-15 17:25 ` Arad, Ronen
2014-12-15 17:57 ` Benjamin LaHaise
2014-12-15 17:57 ` John Fastabend
2014-12-15 18:36 ` Arad, Ronen
2014-12-15 23:27 ` Jamal Hadi Salim
2014-12-16 0:58 ` Arad, Ronen
2014-12-16 1:20 ` Roopa Prabhu
2014-12-16 11:01 ` Arad, Ronen
2014-12-16 15:54 ` Samudrala, Sridhar
2014-12-16 16:41 ` John Fastabend
2014-12-16 17:29 ` Arad, Ronen
2014-12-16 19:23 ` B Viswanath
2014-12-16 20:52 ` Arad, Ronen
2014-12-16 21:51 ` B Viswanath
2014-12-16 22:46 ` Arad, Ronen
2014-12-16 19:25 ` Roopa Prabhu
2014-12-16 19:20 ` Roopa Prabhu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).