[net-next PATCH v2 0/3] extend set/get netlink for embedded

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [net-next PATCH v2 0/3] extend set/get netlink for embedded
@ 2012-10-24 18:12 John Fastabend
  2012-10-24 18:12 ` [net-next PATCH v2 1/3] net: create generic bridge ops John Fastabend
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: John Fastabend @ 2012-10-24 18:12 UTC (permalink / raw)
  To: shemminger, buytenh, davem, vyasevic
  Cc: jhs, chrisw, krkumar2, samudrala, peter.p.waskiewicz.jr,
	jeffrey.t.kirsher, netdev, bhutchings, gregory.v.rose, eilong

This extends the PF_BRIDGE setlink and getlink so that they can be
used generically outside of the Linux bridge module. Doing this
allows embedded devices to use the same netlink interface that
the software bridge is currently using.

In this patchset I opted to create two new ndo ops ndo_bridge_setlink
and ndo_bridge_getlink. These ops pass the nlmsghdr to the device
for handling. The netlink message is extended to support nested
IFLA_AF_SPEC attributes. A IFLA_BRIDGE_FLAGS attribute is used to
determine the target of the message either a master netdev is the
target or the target is "self" indicating the target is the netdev.
In this way we can send netlink msg to an embedded device or the
linux sw bridge or both. If the set completes sucessfully the flag
is cleared in this way we can learn what failed in the both case.
This scheme is similar to how FDB updates are handled. If no flag
attribute is present the message is sent to the master device to
support the existing messages.

An initial IFLA_BRIDGE_MODE attribute is added to indicate if the
bridge is in a VEPA mode or VEB mode. This is most useful for
SR-IOV device and can be used to indicate support for VF to VF or
VF to PF traffic. Without this we have no way of knowing this
other then trial and error because not all hardware supports
this.

In the future additional attributes can be added to handle other
attributes. We could for example move some of the linux bridge
sysfs attributes here if we want a netlink interface for them.
This should also allow DSA switches to use the bridging tools.
See RFC patch from Lennert

http://patchwork.ozlabs.org/patch/16578/

I could have simply added the VEPA attribute handling to the
rtnl_setlink() routine but it seemed like a good feature to have
all the bridging events and configuration on the same type. Also
this should allow more powerful offloaded switches like the
hardware supported via DSA access to the bridge control
interface.

Also I think this interface should allow Vlad to add his port based
VLAN interface here as well.

Any feedback/criticism appreciated thanks!

---

John Fastabend (3):
      ixgbe: add setlink, getlink support to ixgbe and ixgbevf
      net: set and query VEB/VEPA bridge mode via PF_BRIDGE
      net: create generic bridge ops

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   59 ++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |    3 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   10 +
 include/linux/netdevice.h                         |   10 +
 include/linux/rtnetlink.h                         |    3 
 include/uapi/linux/if_bridge.h                    |   18 ++
 net/bridge/br_device.c                            |    2 
 net/bridge/br_netlink.c                           |   75 +-------
 net/bridge/br_private.h                           |    7 +
 net/core/rtnetlink.c                              |  207 +++++++++++++++++++++
 10 files changed, 328 insertions(+), 66 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [net-next PATCH v2 1/3] net: create generic bridge ops
  2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
@ 2012-10-24 18:12 ` John Fastabend
  2012-11-02 22:32   ` Ben Hutchings
  2012-10-24 18:13 ` [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE John Fastabend
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2012-10-24 18:12 UTC (permalink / raw)
  To: shemminger, buytenh, davem, vyasevic
  Cc: jhs, chrisw, krkumar2, samudrala, peter.p.waskiewicz.jr,
	jeffrey.t.kirsher, netdev, bhutchings, gregory.v.rose, eilong

The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this being hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bride attributes.

	ndo_bridge_setlink()
	ndo_bridge_getlink()

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/netdevice.h |   10 ++++++
 net/bridge/br_device.c    |    2 +
 net/bridge/br_netlink.c   |   73 +++++++----------------------------------
 net/bridge/br_private.h   |    3 ++
 net/core/rtnetlink.c      |   80 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+), 60 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f8eda02..7bf867c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -887,6 +887,10 @@ struct netdev_fcoe_hbainfo {
  *		       struct net_device *dev, int idx)
  *	Used to add FDB entries to dump requests. Implementers should add
  *	entries to skb and update idx with the number of entries.
+ *
+ * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
+ * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
+ *			     struct net_device *dev)
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -998,6 +1002,12 @@ struct net_device_ops {
 						struct netlink_callback *cb,
 						struct net_device *dev,
 						int idx);
+
+	int			(*ndo_bridge_setlink)(struct net_device *dev,
+						      struct nlmsghdr *nlh);
+	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
+						      u32 pid, u32 seq,
+						      struct net_device *dev);
 };
 
 /*
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 070e8a6..63b5b08 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -313,6 +313,8 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_fdb_add		 = br_fdb_add,
 	.ndo_fdb_del		 = br_fdb_delete,
 	.ndo_fdb_dump		 = br_fdb_dump,
+	.ndo_bridge_getlink	 = br_getlink,
+	.ndo_bridge_setlink	 = br_setlink,
 };
 
 static void br_dev_free(struct net_device *dev)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 093f527..743511b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -111,54 +111,33 @@ errout:
 /*
  * Dump information about all ports, in response to GETLINK
  */
-static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
+int br_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+	       struct net_device *dev)
 {
-	struct net *net = sock_net(skb->sk);
-	struct net_device *dev;
-	int idx;
-
-	idx = 0;
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		struct net_bridge_port *port = br_port_get_rcu(dev);
-
-		/* not a bridge port */
-		if (!port || idx < cb->args[0])
-			goto skip;
-
-		if (br_fill_ifinfo(skb, port,
-				   NETLINK_CB(cb->skb).portid,
-				   cb->nlh->nlmsg_seq, RTM_NEWLINK,
-				   NLM_F_MULTI) < 0)
-			break;
-skip:
-		++idx;
-	}
-	rcu_read_unlock();
-	cb->args[0] = idx;
+	int err = 0;
+	struct net_bridge_port *port = br_port_get_rcu(dev);
+
+	/* not a bridge port */
+	if (!port)
+		goto out;
 
-	return skb->len;
+	err = br_fill_ifinfo(skb, port, pid, seq, RTM_NEWLINK, NLM_F_MULTI);
+out:
+	return err;
 }
 
 /*
  * Change state of port (ie from forwarding to blocking etc)
  * Used by spanning tree in user space.
  */
-static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
+int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
 {
-	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
 	struct nlattr *protinfo;
-	struct net_device *dev;
 	struct net_bridge_port *p;
 	u8 new_state;
 
-	if (nlmsg_len(nlh) < sizeof(*ifm))
-		return -EINVAL;
-
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_family != AF_BRIDGE)
-		return -EPFNOSUPPORT;
 
 	protinfo = nlmsg_find_attr(nlh, sizeof(*ifm), IFLA_PROTINFO);
 	if (!protinfo || nla_len(protinfo) < sizeof(u8))
@@ -168,10 +147,6 @@ static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 	if (new_state > BR_STATE_BLOCKING)
 		return -EINVAL;
 
-	dev = __dev_get_by_index(net, ifm->ifi_index);
-	if (!dev)
-		return -ENODEV;
-
 	p = br_port_get_rtnl(dev);
 	if (!p)
 		return -EINVAL;
@@ -218,29 +193,7 @@ struct rtnl_link_ops br_link_ops __read_mostly = {
 
 int __init br_netlink_init(void)
 {
-	int err;
-
-	err = rtnl_link_register(&br_link_ops);
-	if (err < 0)
-		goto err1;
-
-	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL,
-			      br_dump_ifinfo, NULL);
-	if (err)
-		goto err2;
-	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK,
-			      br_rtm_setlink, NULL, NULL);
-	if (err)
-		goto err3;
-
-	return 0;
-
-err3:
-	rtnl_unregister_all(PF_BRIDGE);
-err2:
-	rtnl_link_unregister(&br_link_ops);
-err1:
-	return err;
+	return rtnl_link_register(&br_link_ops);
 }
 
 void __exit br_netlink_fini(void)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 9b278c4..fdcd5f6 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -553,6 +553,9 @@ extern struct rtnl_link_ops br_link_ops;
 extern int br_netlink_init(void);
 extern void br_netlink_fini(void);
 extern void br_ifinfo_notify(int event, struct net_bridge_port *port);
+extern int br_setlink(struct net_device *dev, struct nlmsghdr *nlmsg);
+extern int br_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+		      struct net_device *dev);
 
 #ifdef CONFIG_SYSFS
 /* br_sysfs_if.c */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 64fe3cc..a068666 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2252,6 +2252,83 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	int idx = 0;
+	u32 portid = NETLINK_CB(cb->skb).portid;
+	u32 seq = cb->nlh->nlmsg_seq;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		const struct net_device_ops *ops = dev->netdev_ops;
+		struct net_device *master = dev->master;
+
+		if (idx < cb->args[0])
+			continue;
+
+		if (master && master->netdev_ops->ndo_bridge_getlink) {
+			const struct net_device_ops *bops = master->netdev_ops;
+			int err = bops->ndo_bridge_getlink(skb, portid,
+							   seq, dev);
+
+			if (err < 0)
+				break;
+			else
+				idx++;
+		}
+
+		if (ops->ndo_bridge_getlink) {
+			int err = ops->ndo_bridge_getlink(skb, portid,
+							  seq, dev);
+
+			if (err < 0)
+				break;
+			else
+				idx++;
+		}
+	}
+	rcu_read_unlock();
+	cb->args[0] = idx;
+
+	return skb->len;
+}
+
+static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ifinfomsg *ifm;
+	struct net_device *dev;
+	int err = -EINVAL;
+
+	if (nlmsg_len(nlh) < sizeof(*ifm))
+		return -EINVAL;
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->ifi_family != AF_BRIDGE)
+		return -EPFNOSUPPORT;
+
+	dev = __dev_get_by_index(net, ifm->ifi_index);
+	if (!dev) {
+		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
+		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
+		if (err)
+			goto out;
+	}
+
+	if (dev->netdev_ops->ndo_bridge_setlink)
+		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
+
+out:
+	return err;
+}
+
 /* Protected by RTNL sempahore.  */
 static struct rtattr **rta_buf;
 static int rtattr_max;
@@ -2433,5 +2510,8 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, NULL);
 	rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, NULL);
 	rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, rtnl_fdb_dump, NULL);
+
+	rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, rtnl_bridge_getlink, NULL);
+	rtnl_register(PF_BRIDGE, RTM_SETLINK, rtnl_bridge_setlink, NULL, NULL);
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE
  2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
  2012-10-24 18:12 ` [net-next PATCH v2 1/3] net: create generic bridge ops John Fastabend
@ 2012-10-24 18:13 ` John Fastabend
  2012-11-02 22:38   ` Ben Hutchings
  2012-10-24 18:13 ` [net-next PATCH v2 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2012-10-24 18:13 UTC (permalink / raw)
  To: shemminger, buytenh, davem, vyasevic
  Cc: jhs, chrisw, krkumar2, samudrala, peter.p.waskiewicz.jr,
	jeffrey.t.kirsher, netdev, bhutchings, gregory.v.rose, eilong

Hardware switches may support enabling and disabling the
loopback switch which puts the device in a VEPA mode defined
in the IEEE 802.1Qbg specification. In this mode frames are
not switched in the hardware but sent directly to the switch.
SR-IOV capable NICs will likely support this mode I am
aware of at least two such devices. Also I am told (but don't
have any of this hardware available) that there are devices
that only support VEPA modes. In these cases it is important
at a minimum to be able to query these attributes.

This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
anticipating bridge attributes that may be common for both embedded
bridges and software bridges this adds a flags attribute
IFLA_BRIDGE_FLAGS currently used to determine if the command or event
is being generated to/from an embedded bridge or software bridge.
Finally, the event generation is pulled out of the bridge module and
into rtnetlink proper.

For example using the macvlan driver in VEPA mode on top of
an embedded switch requires putting the embedded switch into
a VEPA mode to get the expected results.

	--------  --------
        | VEPA |  | VEPA |       <-- macvlan vepa edge relays
        --------  --------
           |        |
           |        |
        ------------------
        |      VEPA      |       <-- embedded switch in NIC
        ------------------
                |
                |
        -------------------
        | external switch |      <-- shiny new physical
	-------------------          switch with VEPA support

A packet sent from the macvlan VEPA at the top could be
loopbacked on the embedded switch and never seen by the
external switch. So in order for this to work the embedded
switch needs to be set in the VEPA state via the above
described commands.

By making these attributes nested in IFLA_AF_SPEC we allow
future extensions to be made as needed.

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/uapi/linux/if_bridge.h |   18 ++++++++
 net/bridge/br_netlink.c        |    2 -
 net/bridge/br_private.h        |    4 +-
 net/core/rtnetlink.c           |   85 ++++++++++++++++++++++++++++++++++++++--
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index a8fe954..b388579 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -97,5 +97,23 @@ struct __fdb_entry {
 	__u16 unused;
 };
 
+/* Bridge Flags */
+#define BRIDGE_FLAGS_MASTER	1	/* Bridge command to/from master */
+#define BRIDGE_FLAGS_SELF	2	/* Bridge command to/from lowerdev */
 
+#define BRIDGE_MODE_VEB		0	/* Default loopback mode */
+#define BRIDGE_MODE_VEPA	1	/* 802.1Qbg defined VEPA mode */
+
+/* Bridge management nested attributes
+ * [IFLA_AF_SPEC] = {
+ *     [IFLA_BRIDGE_FLAGS]
+ *     [IFLA_BRIDGE_MODE]
+ * }
+ */
+enum {
+	IFLA_BRIDGE_FLAGS,
+	IFLA_BRIDGE_MODE,
+	__IFLA_BRIDGE_MAX,
+};
+#define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
 #endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 743511b..14b065c 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -166,8 +166,6 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
 	br_port_state_selection(p->br);
 	spin_unlock_bh(&p->br->lock);
 
-	br_ifinfo_notify(RTM_NEWLINK, p);
-
 	return 0;
 }
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index fdcd5f6..6f40c14 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -158,7 +158,9 @@ struct net_bridge_port
 
 static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
 {
-	struct net_bridge_port *port = rcu_dereference(dev->rx_handler_data);
+	struct net_bridge_port *port =
+			rcu_dereference_rtnl(dev->rx_handler_data);
+
 	return br_port_exists(dev) ? port : NULL;
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a068666..8d2af0f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2295,13 +2295,60 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static inline size_t bridge_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
+		+ nla_total_size(IFNAMSIZ)	/* IFLA_IFNAME */
+		+ nla_total_size(MAX_ADDR_LEN)	/* IFLA_ADDRESS */
+		+ nla_total_size(sizeof(u32))	/* IFLA_MASTER */
+		+ nla_total_size(sizeof(u32))	/* IFLA_MTU */
+		+ nla_total_size(sizeof(u32))	/* IFLA_LINK */
+		+ nla_total_size(sizeof(u32))	/* IFLA_OPERSTATE */
+		+ nla_total_size(sizeof(u8))	/* IFLA_PROTINFO */
+		+ nla_total_size(sizeof(struct nlattr))	/* IFLA_AF_SPEC */
+		+ nla_total_size(sizeof(u16))	/* IFLA_BRIDGE_FLAGS */
+		+ nla_total_size(sizeof(u16));	/* IFLA_BRIDGE_MODE */
+}
+
+static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
+{
+	struct net *net = dev_net(dev);
+	struct net_device *master = dev->master;
+	struct sk_buff *skb;
+	int err = -EOPNOTSUPP;
+
+	skb = nlmsg_new(bridge_nlmsg_size(), GFP_ATOMIC);
+	if (!skb) {
+		err = -ENOMEM;
+		goto errout;
+	}
+
+	if (!flags && master && master->netdev_ops->ndo_bridge_getlink)
+		err = master->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
+	else if (dev->netdev_ops->ndo_bridge_getlink)
+		err = dev->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
+
+	if (err < 0)
+		goto errout;
+
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	return 0;
+errout:
+	WARN_ON(err == -EMSGSIZE);
+	kfree_skb(skb);
+	rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+	return err;
+}
+
 static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       void *arg)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
 	struct net_device *dev;
-	int err = -EINVAL;
+	struct nlattr *br_spec, *attr = NULL;
+	int rem, err = -EOPNOTSUPP;
+	u16 flags = 0;
 
 	if (nlmsg_len(nlh) < sizeof(*ifm))
 		return -EINVAL;
@@ -2316,15 +2363,45 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -ENODEV;
 	}
 
-	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
+	br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+	if (br_spec) {
+		nla_for_each_nested(attr, br_spec, rem) {
+			if (nla_type(attr) == IFLA_BRIDGE_FLAGS) {
+				flags = nla_get_u16(attr);
+				break;
+			}
+		}
+	}
+
+	if (!flags || (flags & BRIDGE_FLAGS_MASTER)) {
+		if (!dev->master ||
+		    !dev->master->netdev_ops->ndo_bridge_setlink) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
 		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
 		if (err)
 			goto out;
+
+		flags &= ~BRIDGE_FLAGS_MASTER;
 	}
 
-	if (dev->netdev_ops->ndo_bridge_setlink)
-		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
+	if ((flags & BRIDGE_FLAGS_SELF)) {
+		if (!dev->netdev_ops->ndo_bridge_setlink)
+			err = -EOPNOTSUPP;
+		else
+			err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
+
+		if (!err)
+			flags &= ~BRIDGE_FLAGS_SELF;
+	}
 
+	if (attr && nla_type(attr) == IFLA_BRIDGE_FLAGS)
+		memcpy(nla_data(attr), &flags, sizeof(flags));
+	/* Generate event to notify upper layer of bridge change */
+	if (!err)
+		err = rtnl_bridge_notify(dev, flags);
 out:
 	return err;
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next PATCH v2 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf
  2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
  2012-10-24 18:12 ` [net-next PATCH v2 1/3] net: create generic bridge ops John Fastabend
  2012-10-24 18:13 ` [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE John Fastabend
@ 2012-10-24 18:13 ` John Fastabend
  2012-10-25 21:09 ` [net-next PATCH v2 0/3] extend set/get netlink for embedded Ariel Elior
  2012-10-31 17:21 ` David Miller
  4 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2012-10-24 18:13 UTC (permalink / raw)
  To: shemminger, buytenh, davem, vyasevic
  Cc: jhs, chrisw, krkumar2, samudrala, peter.p.waskiewicz.jr,
	jeffrey.t.kirsher, netdev, bhutchings, gregory.v.rose, eilong

This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   59 ++++++++++++++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |    3 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   10 ++++
 include/linux/rtnetlink.h                         |    3 +
 net/core/rtnetlink.c                              |   50 ++++++++++++++++++
 5 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 88d636a..9a88e01 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -44,6 +44,7 @@
 #include <linux/ethtool.h>
 #include <linux/if.h>
 #include <linux/if_vlan.h>
+#include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
 
@@ -3224,7 +3225,6 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), reg_offset - 1);
 	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (~0) << vf_shift);
 	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), reg_offset - 1);
-	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 
 	/* Map PF MAC address in RAR Entry 0 to first pool following VFs */
 	hw->mac.ops.set_vmdq(hw, 0, VMDQ_P(0));
@@ -3247,8 +3247,6 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 
 	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr_ext);
 
-	/* enable Tx loopback for VF/PF communication */
-	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 
 	/* Enable MAC Anti-Spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw, (adapter->num_vfs != 0),
@@ -7025,6 +7023,59 @@ static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
 	return idx;
 }
 
+static int ixgbe_ndo_bridge_setlink(struct net_device *dev,
+				    struct nlmsghdr *nlh)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct nlattr *attr, *br_spec;
+	int rem;
+
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return -EOPNOTSUPP;
+
+	br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+
+	nla_for_each_nested(attr, br_spec, rem) {
+		__u16 mode;
+		u32 reg = 0;
+
+		if (nla_type(attr) != IFLA_BRIDGE_MODE)
+			continue;
+
+		mode = nla_get_u16(attr);
+		if (mode == BRIDGE_MODE_VEPA)
+			reg = 0;
+		else if (mode == BRIDGE_MODE_VEB)
+			reg = IXGBE_PFDTXGSWC_VT_LBEN;
+		else
+			return -EINVAL;
+
+		IXGBE_WRITE_REG(&adapter->hw, IXGBE_PFDTXGSWC, reg);
+
+		e_info(drv, "enabling bridge mode: %s\n",
+			mode == BRIDGE_MODE_VEPA ? "VEPA" : "VEB");
+	}
+
+	return 0;
+}
+
+static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+				    struct net_device *dev)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	u16 mode;
+
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return 0;
+
+	if (IXGBE_READ_REG(&adapter->hw, IXGBE_PFDTXGSWC) & 1)
+		mode = BRIDGE_MODE_VEB;
+	else
+		mode = BRIDGE_MODE_VEPA;
+
+	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7064,6 +7115,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_fdb_del		= ixgbe_ndo_fdb_del,
 	.ndo_fdb_dump		= ixgbe_ndo_fdb_dump,
+	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
+	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 96876b7..7e3ac28 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -117,6 +117,9 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 		}
 	}
 
+	/* Initialize default switching mode VEB */
+	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+
 	/* If call to enable VFs succeeded then allocate memory
 	 * for per VF control structures.
 	 */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 07d7eab..ac6a76d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -478,6 +478,16 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		}
 		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
+		/* Workaround hardware that can't do proper VEPA multicast
+		 * source pruning.
+		 */
+		if ((skb->pkt_type & (PACKET_BROADCAST | PACKET_MULTICAST)) &&
+		    !(compare_ether_addr(adapter->netdev->dev_addr,
+					eth_hdr(skb)->h_source))) {
+			dev_kfree_skb_irq(skb);
+			goto next_desc;
+		}
+
 		ixgbevf_receive_skb(q_vector, skb, staterr, rx_desc);
 
 next_desc:
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 7002bbf..489dd7bb 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -69,4 +69,7 @@ extern int ndo_dflt_fdb_dump(struct sk_buff *skb,
 			     struct netlink_callback *cb,
 			     struct net_device *dev,
 			     int idx);
+
+extern int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+				   struct net_device *dev, u16 mode);
 #endif	/* __LINUX_RTNETLINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8d2af0f..51dc58f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2252,6 +2252,56 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+			    struct net_device *dev, u16 mode)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	struct nlattr *br_afspec;
+	u8 operstate = netif_running(dev) ? dev->operstate : IF_OPER_DOWN;
+
+	nlh = nlmsg_put(skb, pid, seq, RTM_NEWLINK, sizeof(*ifm), NLM_F_MULTI);
+	if (nlh == NULL)
+		return -EMSGSIZE;
+
+	ifm = nlmsg_data(nlh);
+	ifm->ifi_family = AF_BRIDGE;
+	ifm->__ifi_pad = 0;
+	ifm->ifi_type = dev->type;
+	ifm->ifi_index = dev->ifindex;
+	ifm->ifi_flags = dev_get_flags(dev);
+	ifm->ifi_change = 0;
+
+
+	if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
+	    nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
+	    nla_put_u8(skb, IFLA_OPERSTATE, operstate) ||
+	    (dev->master &&
+	     nla_put_u32(skb, IFLA_MASTER, dev->master->ifindex)) ||
+	    (dev->addr_len &&
+	     nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
+	    (dev->ifindex != dev->iflink &&
+	     nla_put_u32(skb, IFLA_LINK, dev->iflink)))
+		goto nla_put_failure;
+
+	br_afspec = nla_nest_start(skb, IFLA_AF_SPEC);
+	if (!br_afspec)
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, IFLA_BRIDGE_FLAGS, BRIDGE_FLAGS_SELF) ||
+	    nla_put_u16(skb, IFLA_BRIDGE_MODE, mode)) {
+		nla_nest_cancel(skb, br_afspec);
+		goto nla_put_failure;
+	}
+	nla_nest_end(skb, br_afspec);
+
+	return nlmsg_end(skb, nlh);
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+EXPORT_SYMBOL(ndo_dflt_bridge_getlink);
+
 static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [net-next PATCH v2 0/3] extend set/get netlink for embedded
  2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
                   ` (2 preceding siblings ...)
  2012-10-24 18:13 ` [net-next PATCH v2 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend
@ 2012-10-25 21:09 ` Ariel Elior
  2012-11-13 17:16   ` John Fastabend
  2012-10-31 17:21 ` David Miller
  4 siblings, 1 reply; 13+ messages in thread
From: Ariel Elior @ 2012-10-25 21:09 UTC (permalink / raw)
  To: 'John Fastabend', shemminger@vyatta.com,
	buytenh@wantstofly.org, davem@davemloft.net, vyasevic@redhat.com
  Cc: jhs@mojatatu.com, chrisw@redhat.com, krkumar2@in.ibm.com,
	samudrala@us.ibm.com, peter.p.waskiewicz.jr@intel.com,
	jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org,
	bhutchings@solarflare.com, gregory.v.rose@intel.com,
	Eilon Greenstein



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of John Fastabend
> Sent: Wednesday, October 24, 2012 8:13 PM
> To: shemminger@vyatta.com; buytenh@wantstofly.org;
> davem@davemloft.net; vyasevic@redhat.com
> Cc: jhs@mojatatu.com; chrisw@redhat.com; krkumar2@in.ibm.com;
> samudrala@us.ibm.com; peter.p.waskiewicz.jr@intel.com;
> jeffrey.t.kirsher@intel.com; netdev@vger.kernel.org;
> bhutchings@solarflare.com; gregory.v.rose@intel.com; Eilon Greenstein
> Subject: [net-next PATCH v2 0/3] extend set/get netlink for embedded
> 
> This extends the PF_BRIDGE setlink and getlink so that they can be used
> generically outside of the Linux bridge module. Doing this allows
> embedded devices to use the same netlink interface that the software
> bridge is currently using.
> 
> In this patchset I opted to create two new ndo ops ndo_bridge_setlink and
> ndo_bridge_getlink. These ops pass the nlmsghdr to the device for
> handling. The netlink message is extended to support nested IFLA_AF_SPEC
> attributes. A IFLA_BRIDGE_FLAGS attribute is used to determine the target
> of the message either a master netdev is the target or the target is "self"
> indicating the target is the netdev.
> In this way we can send netlink msg to an embedded device or the linux sw
> bridge or both. If the set completes sucessfully the flag is cleared in this way
> we can learn what failed in the both case.
> This scheme is similar to how FDB updates are handled. If no flag attribute is
> present the message is sent to the master device to support the existing
> messages.
> 
> An initial IFLA_BRIDGE_MODE attribute is added to indicate if the bridge is
> in a VEPA mode or VEB mode. This is most useful for SR-IOV device and can
> be used to indicate support for VF to VF or VF to PF traffic. Without this we
> have no way of knowing this other then trial and error because not all
> hardware supports this.
> 
> In the future additional attributes can be added to handle other attributes.
> We could for example move some of the linux bridge sysfs attributes here if
> we want a netlink interface for them.
> This should also allow DSA switches to use the bridging tools.
> See RFC patch from Lennert
> 
> http://patchwork.ozlabs.org/patch/16578/
> 
> I could have simply added the VEPA attribute handling to the
> rtnl_setlink() routine but it seemed like a good feature to have all the
> bridging events and configuration on the same type. Also this should allow
> more powerful offloaded switches like the hardware supported via DSA
> access to the bridge control interface.
> 
> Also I think this interface should allow Vlad to add his port based VLAN
> interface here as well.
> 
> Any feedback/criticism appreciated thanks!
> 


This series looks fine from bnx2x point of view.
The dynamic change from VEB to VEPA will require a firmware change,
so we might arrive a little late for the party.
Ariel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 0/3] extend set/get netlink for embedded
  2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
                   ` (3 preceding siblings ...)
  2012-10-25 21:09 ` [net-next PATCH v2 0/3] extend set/get netlink for embedded Ariel Elior
@ 2012-10-31 17:21 ` David Miller
  4 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2012-10-31 17:21 UTC (permalink / raw)
  To: john.r.fastabend
  Cc: shemminger, buytenh, vyasevic, jhs, chrisw, krkumar2, samudrala,
	peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev, bhutchings,
	gregory.v.rose, eilong


All applied to net-next, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 1/3] net: create generic bridge ops
  2012-10-24 18:12 ` [net-next PATCH v2 1/3] net: create generic bridge ops John Fastabend
@ 2012-11-02 22:32   ` Ben Hutchings
  2012-11-02 23:46     ` John Fastabend
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Hutchings @ 2012-11-02 22:32 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On Wed, 2012-10-24 at 11:12 -0700, John Fastabend wrote:
> The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
> currently embedded in the ./net/bridge module. This prohibits
> them from being used by other bridging devices. One example
> of this being hardware that has embedded bridging components.
> 
> In order to use these nlmsg types more generically this patch
> adds two net_device_ops hooks. One to set link bridge attributes
> and another to dump the current bride attributes.
> 
> 	ndo_bridge_setlink()
> 	ndo_bridge_getlink()
[...]

Do you have any userland code for this?

[...]
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2252,6 +2252,83 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
>  	return skb->len;
>  }
>  
> +static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
> +{
> +	struct net *net = sock_net(skb->sk);
> +	struct net_device *dev;
> +	int idx = 0;
> +	u32 portid = NETLINK_CB(cb->skb).portid;
> +	u32 seq = cb->nlh->nlmsg_seq;
> +
> +	rcu_read_lock();
> +	for_each_netdev_rcu(net, dev) {
> +		const struct net_device_ops *ops = dev->netdev_ops;
> +		struct net_device *master = dev->master;
> +
> +		if (idx < cb->args[0])
> +			continue;
[...]

This is wrong, as idx will not be incremented and continued iteration
won't work at all.  Since this has already been applied, I'll send a
patch for this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE
  2012-10-24 18:13 ` [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE John Fastabend
@ 2012-11-02 22:38   ` Ben Hutchings
  2012-11-02 22:48     ` John Fastabend
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Hutchings @ 2012-11-02 22:38 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On Wed, 2012-10-24 at 11:13 -0700, John Fastabend wrote:
> Hardware switches may support enabling and disabling the
> loopback switch which puts the device in a VEPA mode defined
> in the IEEE 802.1Qbg specification. In this mode frames are
> not switched in the hardware but sent directly to the switch.
> SR-IOV capable NICs will likely support this mode I am
> aware of at least two such devices. Also I am told (but don't
> have any of this hardware available) that there are devices
> that only support VEPA modes. In these cases it is important
> at a minimum to be able to query these attributes.
> 
> This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
> set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
> anticipating bridge attributes that may be common for both embedded
> bridges and software bridges this adds a flags attribute
> IFLA_BRIDGE_FLAGS currently used to determine if the command or event
> is being generated to/from an embedded bridge or software bridge.
> Finally, the event generation is pulled out of the bridge module and
> into rtnetlink proper.
[...]
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
[...]
> +static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
> +{
> +	struct net *net = dev_net(dev);
> +	struct net_device *master = dev->master;
> +	struct sk_buff *skb;
> +	int err = -EOPNOTSUPP;
> +
> +	skb = nlmsg_new(bridge_nlmsg_size(), GFP_ATOMIC);
> +	if (!skb) {
> +		err = -ENOMEM;
> +		goto errout;
> +	}
> +
> +	if (!flags && master && master->netdev_ops->ndo_bridge_getlink)
> +		err = master->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
> +	else if (dev->netdev_ops->ndo_bridge_getlink)
> +		err = dev->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);

This doesn't make sense.  If flags == BRIDGE_FLAGS_MASTER then we only
send the 'self' information.  If flags == BRIDGE_FLAGS_MASTER |
BRIDGE_FLAGS_SELF then we only send the master information.

[...]
>  static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  			       void *arg)
>  {
>  	struct net *net = sock_net(skb->sk);
>  	struct ifinfomsg *ifm;
>  	struct net_device *dev;
> -	int err = -EINVAL;
> +	struct nlattr *br_spec, *attr = NULL;
> +	int rem, err = -EOPNOTSUPP;
> +	u16 flags = 0;
>  
>  	if (nlmsg_len(nlh) < sizeof(*ifm))
>  		return -EINVAL;
> @@ -2316,15 +2363,45 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  		return -ENODEV;
>  	}
>  
> -	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
> +	br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
> +	if (br_spec) {
> +		nla_for_each_nested(attr, br_spec, rem) {
> +			if (nla_type(attr) == IFLA_BRIDGE_FLAGS) {
> +				flags = nla_get_u16(attr);
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (!flags || (flags & BRIDGE_FLAGS_MASTER)) {
> +		if (!dev->master ||
> +		    !dev->master->netdev_ops->ndo_bridge_setlink) {
> +			err = -EOPNOTSUPP;
> +			goto out;
> +		}
> +
>  		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
>  		if (err)
>  			goto out;
> +
> +		flags &= ~BRIDGE_FLAGS_MASTER;
>  	}
>  
> -	if (dev->netdev_ops->ndo_bridge_setlink)
> -		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
> +	if ((flags & BRIDGE_FLAGS_SELF)) {
> +		if (!dev->netdev_ops->ndo_bridge_setlink)
> +			err = -EOPNOTSUPP;
> +		else
> +			err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
> +
> +		if (!err)
> +			flags &= ~BRIDGE_FLAGS_SELF;
> +	}
>  
> +	if (attr && nla_type(attr) == IFLA_BRIDGE_FLAGS)

This condition is wrong; attr will *not* be NULL if the
nla_for_each_nested() loop terminates without finding an
IFLA_BRIDGE_FLAGS attribute.

> +		memcpy(nla_data(attr), &flags, sizeof(flags));
> +	/* Generate event to notify upper layer of bridge change */
> +	if (!err)
> +		err = rtnl_bridge_notify(dev, flags);

flags has been modified above; surely we want to use the original value
here?

Ben.

>  out:
>  	return err;
>  }
> 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE
  2012-11-02 22:38   ` Ben Hutchings
@ 2012-11-02 22:48     ` John Fastabend
  2012-11-02 23:01       ` Ben Hutchings
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2012-11-02 22:48 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On 11/2/2012 3:38 PM, Ben Hutchings wrote:
> On Wed, 2012-10-24 at 11:13 -0700, John Fastabend wrote:
>> Hardware switches may support enabling and disabling the
>> loopback switch which puts the device in a VEPA mode defined
>> in the IEEE 802.1Qbg specification. In this mode frames are
>> not switched in the hardware but sent directly to the switch.
>> SR-IOV capable NICs will likely support this mode I am
>> aware of at least two such devices. Also I am told (but don't
>> have any of this hardware available) that there are devices
>> that only support VEPA modes. In these cases it is important
>> at a minimum to be able to query these attributes.
>>
>> This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
>> set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
>> anticipating bridge attributes that may be common for both embedded
>> bridges and software bridges this adds a flags attribute
>> IFLA_BRIDGE_FLAGS currently used to determine if the command or event
>> is being generated to/from an embedded bridge or software bridge.
>> Finally, the event generation is pulled out of the bridge module and
>> into rtnetlink proper.
> [...]
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
> [...]
>> +static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
>> +{
>> +	struct net *net = dev_net(dev);
>> +	struct net_device *master = dev->master;
>> +	struct sk_buff *skb;
>> +	int err = -EOPNOTSUPP;
>> +
>> +	skb = nlmsg_new(bridge_nlmsg_size(), GFP_ATOMIC);
>> +	if (!skb) {
>> +		err = -ENOMEM;
>> +		goto errout;
>> +	}
>> +
>> +	if (!flags && master && master->netdev_ops->ndo_bridge_getlink)
>> +		err = master->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
>> +	else if (dev->netdev_ops->ndo_bridge_getlink)
>> +		err = dev->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
>
> This doesn't make sense.  If flags == BRIDGE_FLAGS_MASTER then we only
> send the 'self' information.  If flags == BRIDGE_FLAGS_MASTER |
> BRIDGE_FLAGS_SELF then we only send the master information.
>

agh. I screwed this up when I made this handle setting both master
and self flags. I'll fix it. Thanks.

> [...]
>>   static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>   			       void *arg)
>>   {
>>   	struct net *net = sock_net(skb->sk);
>>   	struct ifinfomsg *ifm;
>>   	struct net_device *dev;
>> -	int err = -EINVAL;
>> +	struct nlattr *br_spec, *attr = NULL;
>> +	int rem, err = -EOPNOTSUPP;
>> +	u16 flags = 0;
>>
>>   	if (nlmsg_len(nlh) < sizeof(*ifm))
>>   		return -EINVAL;
>> @@ -2316,15 +2363,45 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>   		return -ENODEV;
>>   	}
>>
>> -	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
>> +	br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
>> +	if (br_spec) {
>> +		nla_for_each_nested(attr, br_spec, rem) {
>> +			if (nla_type(attr) == IFLA_BRIDGE_FLAGS) {
>> +				flags = nla_get_u16(attr);
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (!flags || (flags & BRIDGE_FLAGS_MASTER)) {
>> +		if (!dev->master ||
>> +		    !dev->master->netdev_ops->ndo_bridge_setlink) {
>> +			err = -EOPNOTSUPP;
>> +			goto out;
>> +		}
>> +
>>   		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
>>   		if (err)
>>   			goto out;
>> +
>> +		flags &= ~BRIDGE_FLAGS_MASTER;
>>   	}
>>
>> -	if (dev->netdev_ops->ndo_bridge_setlink)
>> -		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
>> +	if ((flags & BRIDGE_FLAGS_SELF)) {
>> +		if (!dev->netdev_ops->ndo_bridge_setlink)
>> +			err = -EOPNOTSUPP;
>> +		else
>> +			err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
>> +
>> +		if (!err)
>> +			flags &= ~BRIDGE_FLAGS_SELF;
>> +	}
>>
>> +	if (attr && nla_type(attr) == IFLA_BRIDGE_FLAGS)
>
> This condition is wrong; attr will *not* be NULL if the
> nla_for_each_nested() loop terminates without finding an
> IFLA_BRIDGE_FLAGS attribute.

It might be NULL if the nlmsg has no IFLA_AF_SPEC attr. In this case
we still need to send the PROTINFO attribute to the master which
could be the linux bridge.

>
>> +		memcpy(nla_data(attr), &flags, sizeof(flags));
>> +	/* Generate event to notify upper layer of bridge change */
>> +	if (!err)
>> +		err = rtnl_bridge_notify(dev, flags);
>
> flags has been modified above; surely we want to use the original value
> here?

Yes. Thanks Ben. I'll have a patch shortly.

>
> Ben.
>
>>   out:
>>   	return err;
>>   }
>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE
  2012-11-02 22:48     ` John Fastabend
@ 2012-11-02 23:01       ` Ben Hutchings
  2012-11-02 23:03         ` John Fastabend
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Hutchings @ 2012-11-02 23:01 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On Fri, 2012-11-02 at 15:48 -0700, John Fastabend wrote:
> On 11/2/2012 3:38 PM, Ben Hutchings wrote:
> > On Wed, 2012-10-24 at 11:13 -0700, John Fastabend wrote:
> >> Hardware switches may support enabling and disabling the
> >> loopback switch which puts the device in a VEPA mode defined
> >> in the IEEE 802.1Qbg specification. In this mode frames are
> >> not switched in the hardware but sent directly to the switch.
> >> SR-IOV capable NICs will likely support this mode I am
> >> aware of at least two such devices. Also I am told (but don't
> >> have any of this hardware available) that there are devices
> >> that only support VEPA modes. In these cases it is important
> >> at a minimum to be able to query these attributes.
> >>
> >> This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
> >> set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
> >> anticipating bridge attributes that may be common for both embedded
> >> bridges and software bridges this adds a flags attribute
> >> IFLA_BRIDGE_FLAGS currently used to determine if the command or event
> >> is being generated to/from an embedded bridge or software bridge.
> >> Finally, the event generation is pulled out of the bridge module and
> >> into rtnetlink proper.
[...]
> >> +	if (attr && nla_type(attr) == IFLA_BRIDGE_FLAGS)
> >
> > This condition is wrong; attr will *not* be NULL if the
> > nla_for_each_nested() loop terminates without finding an
> > IFLA_BRIDGE_FLAGS attribute.
> 
> It might be NULL if the nlmsg has no IFLA_AF_SPEC attr. In this case
> we still need to send the PROTINFO attribute to the master which
> could be the linux bridge.
[...]

I think nla_for_each_nested() can leave attr non-null but also not valid
for use with nla_type().  And that's a problem.  I think it would be
better to use an explicit flag for whether we found that attribute,
rather than trying to re-test here.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE
  2012-11-02 23:01       ` Ben Hutchings
@ 2012-11-02 23:03         ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2012-11-02 23:03 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On 11/2/2012 4:01 PM, Ben Hutchings wrote:
> On Fri, 2012-11-02 at 15:48 -0700, John Fastabend wrote:
>> On 11/2/2012 3:38 PM, Ben Hutchings wrote:
>>> On Wed, 2012-10-24 at 11:13 -0700, John Fastabend wrote:
>>>> Hardware switches may support enabling and disabling the
>>>> loopback switch which puts the device in a VEPA mode defined
>>>> in the IEEE 802.1Qbg specification. In this mode frames are
>>>> not switched in the hardware but sent directly to the switch.
>>>> SR-IOV capable NICs will likely support this mode I am
>>>> aware of at least two such devices. Also I am told (but don't
>>>> have any of this hardware available) that there are devices
>>>> that only support VEPA modes. In these cases it is important
>>>> at a minimum to be able to query these attributes.
>>>>
>>>> This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
>>>> set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
>>>> anticipating bridge attributes that may be common for both embedded
>>>> bridges and software bridges this adds a flags attribute
>>>> IFLA_BRIDGE_FLAGS currently used to determine if the command or event
>>>> is being generated to/from an embedded bridge or software bridge.
>>>> Finally, the event generation is pulled out of the bridge module and
>>>> into rtnetlink proper.
> [...]
>>>> +	if (attr && nla_type(attr) == IFLA_BRIDGE_FLAGS)
>>>
>>> This condition is wrong; attr will *not* be NULL if the
>>> nla_for_each_nested() loop terminates without finding an
>>> IFLA_BRIDGE_FLAGS attribute.
>>
>> It might be NULL if the nlmsg has no IFLA_AF_SPEC attr. In this case
>> we still need to send the PROTINFO attribute to the master which
>> could be the linux bridge.
> [...]
>
> I think nla_for_each_nested() can leave attr non-null but also not valid
> for use with nla_type().  And that's a problem.  I think it would be
> better to use an explicit flag for whether we found that attribute,
> rather than trying to re-test here.
>
> Ben.
>

OK. I'll add an explicit flag for this. Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 1/3] net: create generic bridge ops
  2012-11-02 22:32   ` Ben Hutchings
@ 2012-11-02 23:46     ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2012-11-02 23:46 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: shemminger, buytenh, davem, vyasevic, jhs, chrisw, krkumar2,
	samudrala, peter.p.waskiewicz.jr, jeffrey.t.kirsher, netdev,
	gregory.v.rose, eilong

On 11/2/2012 3:32 PM, Ben Hutchings wrote:
> On Wed, 2012-10-24 at 11:12 -0700, John Fastabend wrote:
>> The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
>> currently embedded in the ./net/bridge module. This prohibits
>> them from being used by other bridging devices. One example
>> of this being hardware that has embedded bridging components.
>>
>> In order to use these nlmsg types more generically this patch
>> adds two net_device_ops hooks. One to set link bridge attributes
>> and another to dump the current bride attributes.
>>
>> 	ndo_bridge_setlink()
>> 	ndo_bridge_getlink()
> [...]
>
> Do you have any userland code for this?

Just sent this to netdev. Or you can pull it here,

git://github.com/jrfastab/net-next-iproute2-jf.git


I'll test your idx fix now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH v2 0/3] extend set/get netlink for embedded
  2012-10-25 21:09 ` [net-next PATCH v2 0/3] extend set/get netlink for embedded Ariel Elior
@ 2012-11-13 17:16   ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2012-11-13 17:16 UTC (permalink / raw)
  To: Ariel Elior
  Cc: shemminger@vyatta.com, buytenh@wantstofly.org,
	davem@davemloft.net, vyasevic@redhat.com, jhs@mojatatu.com,
	chrisw@redhat.com, krkumar2@in.ibm.com, samudrala@us.ibm.com,
	peter.p.waskiewicz.jr@intel.com, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, bhutchings@solarflare.com,
	gregory.v.rose@intel.com, Eilon Greenstein

>
> This series looks fine from bnx2x point of view.
> The dynamic change from VEB to VEPA will require a firmware change,
> so we might arrive a little late for the party.
> Ariel
>

In the meantime you could enable just the get routines
so the upper layer could learn how your SR-IOV switching
is being done or not done.

My basic expectation is hardware defaults to VEB but
not sure if that is true in all cases.

.John

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-11-13 17:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-24 18:12 [net-next PATCH v2 0/3] extend set/get netlink for embedded John Fastabend
2012-10-24 18:12 ` [net-next PATCH v2 1/3] net: create generic bridge ops John Fastabend
2012-11-02 22:32   ` Ben Hutchings
2012-11-02 23:46     ` John Fastabend
2012-10-24 18:13 ` [net-next PATCH v2 2/3] net: set and query VEB/VEPA bridge mode via PF_BRIDGE John Fastabend
2012-11-02 22:38   ` Ben Hutchings
2012-11-02 22:48     ` John Fastabend
2012-11-02 23:01       ` Ben Hutchings
2012-11-02 23:03         ` John Fastabend
2012-10-24 18:13 ` [net-next PATCH v2 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend
2012-10-25 21:09 ` [net-next PATCH v2 0/3] extend set/get netlink for embedded Ariel Elior
2012-11-13 17:16   ` John Fastabend
2012-10-31 17:21 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).