Netdev List
 help / color / mirror / Atom feed
* [patch net-next 02/10] net: introduce generic switch devices support
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                            |  7 ++++
 include/linux/netdevice.h              | 10 ++++++
 include/net/switchdev.h                | 30 +++++++++++++++++
 net/Kconfig                            |  1 +
 net/Makefile                           |  3 ++
 net/switchdev/Kconfig                  | 13 ++++++++
 net/switchdev/Makefile                 |  5 +++
 net/switchdev/switchdev.c              | 33 +++++++++++++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..98be76c
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@
+Switch (and switch-ish) device drivers HOWTO
+===========================
+
+Please note that the word "switch" is here used in very generic meaning.
+This include devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Lets describe a topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
+			       of the same physical switch chip. This is
+			       mandatory to be implemented by all switch drivers
+			       and serves the caller for recognition of a port
+			       netdevice.
+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
+			  chip itself (it can be though of as a "parent" of the
+			  port, therefore the name). They are not port-specific.
+			  Caller might use arbitrary port netdevice of the same
+			  switch and it will make no difference.
+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index 3a41fb0..776e078 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri@resnulli.us>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bbc730f..54b08a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
+ *			       struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1168,6 +1174,10 @@ struct net_device_ops {
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
 	bool			(*ndo_gso_check) (struct sk_buff *skb,
 						  struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
+							struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..79bf9bd
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_sw_parent_id_get(struct net_device *dev,
+					  struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5..ff9ffc1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..1557545
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in very generic
+	  meaning of the word "switch". This include devices supporting L2/L3 but
+	  also various flow offloading chips, including switches embedded into
+	  SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..5010f646
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_sw_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_sw_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_sw_parent_id_get);
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next] net/fm10k: Avoid double setting of NETIF_F_SG for the HW encapsulation feature mask
From: Or Gerlitz @ 2014-11-06  9:21 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: netdev, Or Gerlitz

The networking core does it for the driver during registration time.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 8811364..2b17cd8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1414,13 +1414,12 @@ struct net_device *fm10k_alloc_netdev(void)
 	dev->vlan_features |= dev->features;
 
 	/* configure tunnel offloads */
-	dev->hw_enc_features = NETIF_F_IP_CSUM |
+	dev->hw_enc_features |= NETIF_F_IP_CSUM |
 			       NETIF_F_TSO |
 			       NETIF_F_TSO6 |
 			       NETIF_F_TSO_ECN |
 			       NETIF_F_GSO_UDP_TUNNEL |
-			       NETIF_F_IPV6_CSUM |
-			       NETIF_F_SG;
+			       NETIF_F_IPV6_CSUM;
 
 	/* we want to leave these both on as we cannot disable VLAN tag
 	 * insertion or stripping on the hardware since it is contained
-- 
1.7.1

^ permalink raw reply related

* [patch net-next 10/10] rocker: implement L2 bridge offloading
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

From: Scott Feldman <sfeldma@gmail.com>

Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
driver is used to collect swdev ports into a tagged (or untagged) VLAN
bridge.  The swdev will offload from the bridge driver the following L2
bridging functions:

 - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
installed in bridge FDB.  (And removed when device unlearns mac/vlan).
Learning must be turned off on each bridge port to disable the feature in
the bridge driver.

- Flooding of multicast/broadcast and unknown unicast pkts to (STP)
active ports in bridge.  The bridge driver is unaware of the flooding happening
at the device level.  Flooding must be turned off on each bridge port to
disable the feature on the bridge driver.

- STP port state is pushed down to driver/device.  The bridge still processes
STP BDPUs and maintains port STP state (for all VLANs in bridge), but
the driver/device must be notified of port STP state change to program
the device.

Multiple (VLAN) bridges are supported.  The device (implemented per
the OF-DPA spec) must use a portion of the VLAN namespace for
internal VLANs.  Right now, the upper 255 VLANs (0xf00 to 0xffe) are
used as internal VLAN IDs for untagged traffic and are not available
as port VLANs.

The driver uses the following interfaces:

1. To track VLAN add/del on ports in bridge:

.ndo_vlan_rx_add_vid
.ndo_vlan_rx_kill_vid

2. To track port add/del membership in bridge:

NETDEV_CHANGEUPPER netdevice notifier

3. To catch static FDB entries installed on bridge/vlan by user using netlink:

.ndo_sw_port_fdb_add
.ndo_sw_port_fdb_del

4. To be notified on port STP state change:

.ndo_sw_port_stp_update

5. To notify bridge driver on learned/forgotten mac/vlans on bridge port:

br_fdb_learn_add
br_fdb_learn_del

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 660 ++++++++++++++++++++++++++++++++++-
 1 file changed, 659 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index b940be0..e0b47d6 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -28,6 +28,7 @@
 #include <linux/ethtool.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/if_bridge.h>
 #include <net/switchdev.h>
 #include <net/rtnetlink.h>
 #include <asm-generic/io-64-nonatomic-lo-hi.h>
@@ -195,10 +196,12 @@ enum {
 
 struct rocker_port {
 	struct net_device *dev;
+	struct net_device *bridge_dev;
 	struct rocker *rocker;
 	unsigned port_number;
 	u32 lport;
 	__be16 internal_vlan_id;
+	int stp_state;
 	bool ctrls[ROCKER_CTRL_MAX];
 	unsigned long vlan_bitmap[ROCKER_VLAN_BITMAP_LEN];
 	struct napi_struct napi_tx;
@@ -289,6 +292,20 @@ static __be16 rocker_port_vid_to_vlan(struct rocker_port *rocker_port,
 	return vlan_id;
 }
 
+static u16 rocker_port_vlan_to_vid(struct rocker_port *rocker_port,
+				   __be16 vlan_id)
+{
+	if (rocker_vlan_id_is_internal(vlan_id))
+		return 0;
+
+	return ntohs(vlan_id);
+}
+
+static bool rocker_port_is_bridged(struct rocker_port *rocker_port)
+{
+	return !!rocker_port->bridge_dev;
+}
+
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
@@ -1301,6 +1318,42 @@ static int rocker_event_link_change(struct rocker *rocker,
 #define ROCKER_OP_FLAG_NOWAIT		(1 << 1)
 #define ROCKER_OP_FLAG_LEARNED		(1 << 2)
 
+static int rocker_port_fdb(struct rocker_port *rocker_port,
+			   const unsigned char *addr,
+			   __be16 vlan_id, int flags);
+
+static int rocker_event_mac_vlan_seen(struct rocker *rocker,
+				      const struct rocker_tlv *info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAX + 1];
+	unsigned port_number;
+	struct rocker_port *rocker_port;
+	unsigned char *addr;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_LEARNED;
+	__be16 vlan_id;
+
+	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_MAC_VLAN_MAX, info);
+	if (!attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT] ||
+	    !attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC] ||
+	    !attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID])
+		return -EIO;
+	port_number =
+		rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT]) - 1;
+	addr = rocker_tlv_data(attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC]);
+	vlan_id = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
+
+	if (port_number >= rocker->port_count)
+		return -EINVAL;
+
+	rocker_port = rocker->ports[port_number];
+
+	if (rocker_port->stp_state != BR_STATE_LEARNING &&
+	    rocker_port->stp_state != BR_STATE_FORWARDING)
+		return 0;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
 static int rocker_event_process(struct rocker *rocker,
 				struct rocker_desc_info *desc_info)
 {
@@ -1319,6 +1372,8 @@ static int rocker_event_process(struct rocker *rocker,
 	switch (type) {
 	case ROCKER_TLV_EVENT_TYPE_LINK_CHANGED:
 		return rocker_event_link_change(rocker, info);
+	case ROCKER_TLV_EVENT_TYPE_MAC_VLAN_SEEN:
+		return rocker_event_mac_vlan_seen(rocker, info);
 	}
 
 	return -EOPNOTSUPP;
@@ -2543,6 +2598,104 @@ static int rocker_group_l2_flood(struct rocker_port *rocker_port,
 				       group_id);
 }
 
+static int rocker_port_vlan_flood_group(struct rocker_port *rocker_port,
+					int flags, __be16 vlan_id)
+{
+	struct rocker_port *p;
+	struct rocker *rocker = rocker_port->rocker;
+	u32 group_id = ROCKER_GROUP_L2_FLOOD(vlan_id, 0);
+	u32 group_ids[rocker->port_count];
+	u8 group_count = 0;
+	int err;
+	int i;
+
+	/* Adjust the flood group for this VLAN.  The flood group
+	 * references an L2 interface group for each port in this
+	 * VLAN.
+	 */
+
+	for (i = 0; i < rocker->port_count; i++) {
+		p = rocker->ports[i];
+		if (!rocker_port_is_bridged(p))
+			continue;
+		if (test_bit(ntohs(vlan_id), p->vlan_bitmap)) {
+			group_ids[group_count++] =
+				ROCKER_GROUP_L2_INTERFACE(vlan_id,
+							  p->lport);
+		}
+	}
+
+	/* If there are no bridged ports in this VLAN, we're done */
+	if (group_count == 0)
+		return 0;
+
+	err = rocker_group_l2_flood(rocker_port, flags, vlan_id,
+				    group_count, group_ids,
+				    group_id);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 flood group\n", err);
+
+	return err;
+}
+
+static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
+				      int flags, __be16 vlan_id,
+				      bool pop_vlan)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_port *p;
+	bool adding = !(flags & ROCKER_OP_FLAG_REMOVE);
+	u32 out_lport;
+	int ref = 0;
+	int err;
+	int i;
+
+	/* An L2 interface group for this port in this VLAN, but
+	 * only when port STP state is LEARNING|FORWARDING.
+	 */
+
+	if (rocker_port->stp_state == BR_STATE_LEARNING ||
+	    rocker_port->stp_state == BR_STATE_FORWARDING) {
+		out_lport = rocker_port->lport;
+		err = rocker_group_l2_interface(rocker_port, flags,
+						vlan_id, out_lport,
+						pop_vlan);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port VLAN l2 group for lport %d\n",
+				   err, out_lport);
+			return err;
+		}
+	}
+
+	/* An L2 interface group for this VLAN to CPU port.
+	 * Add when first port joins this VLAN and destroy when
+	 * last port leaves this VLAN.
+	 */
+
+	for (i = 0; i < rocker->port_count; i++) {
+		p = rocker->ports[i];
+		if (test_bit(ntohs(vlan_id), p->vlan_bitmap))
+			ref++;
+	}
+
+	if ((!adding || ref != 1) && (adding || ref != 0))
+		return 0;
+
+	out_lport = 0;
+	err = rocker_group_l2_interface(rocker_port, flags,
+					vlan_id, out_lport,
+					pop_vlan);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 group for CPU port\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
 static struct rocker_ctrl {
 	const u8 *eth_dst;
 	const u8 *eth_dst_mask;
@@ -2621,6 +2774,30 @@ static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
 	return err;
 }
 
+static int rocker_port_ctrl_vlan_bridge(struct rocker_port *rocker_port,
+					int flags, struct rocker_ctrl *ctrl,
+					__be16 vlan_id)
+{
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	u32 group_id = ROCKER_GROUP_L2_FLOOD(vlan_id, 0);
+	u32 tunnel_id = 0;
+	int err;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return 0;
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags,
+				     ctrl->eth_dst, ctrl->eth_dst_mask,
+				     vlan_id, tunnel_id,
+				     goto_tbl, group_id, ctrl->copy_to_cpu);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl FLOOD\n", err);
+
+	return err;
+}
+
 static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
 				      int flags, struct rocker_ctrl *ctrl,
 				      __be16 vlan_id)
@@ -2651,6 +2828,9 @@ static int rocker_port_ctrl_vlan(struct rocker_port *rocker_port, int flags,
 	if (ctrl->acl)
 		return rocker_port_ctrl_vlan_acl(rocker_port, flags,
 						 ctrl, vlan_id);
+	if (ctrl->bridge)
+		return rocker_port_ctrl_vlan_bridge(rocker_port, flags,
+						    ctrl, vlan_id);
 
 	if (ctrl->term)
 		return rocker_port_ctrl_vlan_term(rocker_port, flags,
@@ -2695,6 +2875,64 @@ static int rocker_port_ctrl(struct rocker_port *rocker_port, int flags,
 	return err;
 }
 
+static int rocker_port_vlan(struct rocker_port *rocker_port, int flags,
+			    u16 vid)
+{
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	u32 in_lport = rocker_port->lport;
+	__be16 vlan_id = htons(vid);
+	__be16 vlan_id_mask = htons(0xffff);
+	__be16 internal_vlan_id;
+	bool untagged;
+	bool adding = !(flags & ROCKER_OP_FLAG_REMOVE);
+	int err;
+
+	internal_vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, &untagged);
+
+	if (adding && test_and_set_bit(ntohs(internal_vlan_id),
+				       rocker_port->vlan_bitmap))
+			return 0; /* already added */
+	else if (!adding && !test_and_clear_bit(ntohs(internal_vlan_id),
+						rocker_port->vlan_bitmap))
+			return 0; /* already removed */
+
+	if (adding) {
+		err = rocker_port_ctrl_vlan_add(rocker_port, flags,
+						internal_vlan_id);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port ctrl vlan add\n", err);
+			return err;
+		}
+	}
+
+	err = rocker_port_vlan_l2_groups(rocker_port, flags,
+					 internal_vlan_id, untagged);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 groups\n", err);
+		return err;
+	}
+
+	err = rocker_port_vlan_flood_group(rocker_port, flags,
+					   internal_vlan_id);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 flood group\n", err);
+		return err;
+	}
+
+	err = rocker_flow_tbl_vlan(rocker_port, flags,
+				   in_lport, vlan_id, vlan_id_mask,
+				   goto_tbl, untagged, internal_vlan_id);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN table\n", err);
+
+	return err;
+}
+
 static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 {
 	enum rocker_of_dpa_table_id goto_tbl;
@@ -2720,6 +2958,158 @@ static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 	return err;
 }
 
+struct rocker_fdb_learn_work {
+	struct work_struct work;
+	struct net_device *dev;
+	int flags;
+	u8 addr[ETH_ALEN];
+	u16 vid;
+};
+
+static void rocker_port_fdb_learn_work(struct work_struct *work)
+{
+	struct rocker_fdb_learn_work *lw =
+		container_of(work, struct rocker_fdb_learn_work, work);
+	bool removing = (lw->flags & ROCKER_OP_FLAG_REMOVE);
+	bool learned = (lw->flags & ROCKER_OP_FLAG_LEARNED);
+
+	if (learned & removing)
+		br_fdb_learn_del(lw->dev, lw->addr, lw->vid);
+	else if (learned & !removing)
+		br_fdb_learn_add(lw->dev, lw->addr, lw->vid);
+
+	kfree(work);
+}
+
+static int rocker_port_fdb_learn(struct rocker_port *rocker_port,
+				 int flags, const u8 *addr, __be16 vlan_id)
+{
+	struct rocker_fdb_learn_work *lw;
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	u32 out_lport = rocker_port->lport;
+	u32 tunnel_id = 0;
+	u32 group_id = ROCKER_GROUP_NONE;
+	bool copy_to_cpu = false;
+	int err;
+
+	if (rocker_port_is_bridged(rocker_port))
+		group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags, addr, NULL,
+				     vlan_id, tunnel_id, goto_tbl,
+				     group_id, copy_to_cpu);
+	if (err)
+		return err;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return 0;
+
+	lw = kmalloc(sizeof(*lw), rocker_op_flags_gfp(flags));
+	if (!lw)
+		return -ENOMEM;
+
+	INIT_WORK(&lw->work, rocker_port_fdb_learn_work);
+
+	lw->dev = rocker_port->dev;
+	lw->flags = flags;
+	ether_addr_copy(lw->addr, addr);
+	lw->vid = rocker_port_vlan_to_vid(rocker_port, vlan_id);
+
+	schedule_work(&lw->work);
+
+	return 0;
+}
+
+static struct rocker_fdb_tbl_entry *
+rocker_fdb_tbl_find(struct rocker *rocker, struct rocker_fdb_tbl_entry *match)
+{
+	struct rocker_fdb_tbl_entry *found;
+
+	hash_for_each_possible(rocker->fdb_tbl, found, entry, match->key_crc32)
+		if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0)
+			return found;
+
+	return NULL;
+}
+
+static int rocker_port_fdb(struct rocker_port *rocker_port,
+			   const unsigned char *addr,
+			   __be16 vlan_id, int flags)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_fdb_tbl_entry *fdb;
+	struct rocker_fdb_tbl_entry *found;
+	bool removing = (flags & ROCKER_OP_FLAG_REMOVE);
+	unsigned long lock_flags;
+
+	fdb = kzalloc(sizeof(*fdb), rocker_op_flags_gfp(flags));
+	if (!fdb)
+		return -ENOMEM;
+
+	fdb->learned = (flags & ROCKER_OP_FLAG_LEARNED);
+	fdb->key.lport = rocker_port->lport;
+	ether_addr_copy(fdb->key.addr, addr);
+	fdb->key.vlan_id = vlan_id;
+	fdb->key_crc32 = crc32(~0, &fdb->key, sizeof(fdb->key));
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
+
+	found = rocker_fdb_tbl_find(rocker, fdb);
+
+	if (removing && found) {
+		kfree(fdb);
+		hash_del(&found->entry);
+	} else if (!removing && !found) {
+		hash_add(rocker->fdb_tbl, &fdb->entry, fdb->key_crc32);
+	}
+
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, lock_flags);
+
+	/* Check if adding and already exists, or removing and can't find */
+	if (!found != !removing) {
+		kfree(fdb);
+		return 0;
+	}
+
+	return rocker_port_fdb_learn(rocker_port, flags, addr, vlan_id);
+}
+
+static int rocker_port_fdb_flush(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_fdb_tbl_entry *found;
+	unsigned long lock_flags;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
+	struct hlist_node *tmp;
+	int bkt;
+	int err = 0;
+
+	if (rocker_port->stp_state == BR_STATE_LEARNING ||
+	    rocker_port->stp_state == BR_STATE_FORWARDING)
+		return 0;
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
+
+	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
+		if (found->key.lport != rocker_port->lport)
+			continue;
+		if (!found->learned)
+			continue;
+		err = rocker_port_fdb_learn(rocker_port, flags,
+					    found->key.addr,
+					    found->key.vlan_id);
+		if (err)
+			goto err_out;
+		hash_del(&found->entry);
+	}
+
+err_out:
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, lock_flags);
+
+	return err;
+}
+
 static int rocker_port_router_mac(struct rocker_port *rocker_port,
 				  int flags, __be16 vlan_id)
 {
@@ -2752,6 +3142,98 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 	return err;
 }
 
+static int rocker_port_fwding(struct rocker_port *rocker_port)
+{
+	bool pop_vlan;
+	u32 out_lport;
+	__be16 vlan_id;
+	u16 vid;
+	int flags = ROCKER_OP_FLAG_NOWAIT;
+	int err;
+
+	/* Port will be forwarding-enabled if its STP state is LEARNING
+	 * or FORWARDING.  Traffic from CPU can still egress, regardless of
+	 * port STP state.  Use L2 interface group on port VLANs as a way
+	 * to toggle port forwarding: if forwarding is disabled, L2
+	 * interface group will not exist.
+	 */
+
+	if (rocker_port->stp_state != BR_STATE_LEARNING &&
+	    rocker_port->stp_state != BR_STATE_FORWARDING)
+		flags |= ROCKER_OP_FLAG_REMOVE;
+
+	out_lport = rocker_port->lport;
+	for (vid = 1; vid < VLAN_N_VID; vid++) {
+		if (!test_bit(vid, rocker_port->vlan_bitmap))
+			continue;
+		vlan_id = htons(vid);
+		pop_vlan = rocker_vlan_id_is_internal(vlan_id);
+		err = rocker_group_l2_interface(rocker_port, flags,
+						vlan_id, out_lport,
+						pop_vlan);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port VLAN l2 group for lport %d\n",
+				   err, out_lport);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int rocker_port_stp_update(struct net_device *dev, u8 state)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	bool want[ROCKER_CTRL_MAX] = { 0, };
+	int flags;
+	int err;
+	int i;
+
+	if (rocker_port->stp_state == state)
+		return 0;
+
+	rocker_port->stp_state = state;
+
+	switch (state) {
+	case BR_STATE_DISABLED:
+		/* port is completely disabled */
+		break;
+	case BR_STATE_LISTENING:
+	case BR_STATE_BLOCKING:
+		want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
+		break;
+	case BR_STATE_LEARNING:
+	case BR_STATE_FORWARDING:
+		want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
+		want[ROCKER_CTRL_IPV4_MCAST] = true;
+		want[ROCKER_CTRL_IPV6_MCAST] = true;
+		if (rocker_port_is_bridged(rocker_port))
+			want[ROCKER_CTRL_DFLT_BRIDGING] = true;
+		else
+			want[ROCKER_CTRL_LOCAL_ARP] = true;
+		break;
+	}
+
+	for (i = 0; i < ROCKER_CTRL_MAX; i++) {
+		if (want[i] != rocker_port->ctrls[i]) {
+			flags = ROCKER_OP_FLAG_NOWAIT |
+				(want[i] ? 0 : ROCKER_OP_FLAG_REMOVE);
+			err = rocker_port_ctrl(rocker_port, flags,
+					       &rocker_ctrls[i]);
+			if (err)
+				return err;
+			rocker_port->ctrls[i] = want[i];
+		}
+	}
+
+	err = rocker_port_fdb_flush(rocker_port);
+	if (err)
+		return err;
+
+	return rocker_port_fwding(rocker_port);
+}
+
 static struct rocker_internal_vlan_tbl_entry *
 rocker_internal_vlan_tbl_find(struct rocker *rocker, int ifindex)
 {
@@ -2843,6 +3325,8 @@ not_found:
 static int rocker_port_open(struct net_device *dev)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
+	u8 stp_state = rocker_port_is_bridged(rocker_port) ?
+		BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	err = rocker_port_dma_rings_init(rocker_port);
@@ -2865,12 +3349,18 @@ static int rocker_port_open(struct net_device *dev)
 		goto err_request_rx_irq;
 	}
 
+	err = rocker_port_stp_update(dev, stp_state);
+	if (err)
+		goto err_stp_update;
+
 	napi_enable(&rocker_port->napi_tx);
 	napi_enable(&rocker_port->napi_rx);
 	rocker_port_set_enable(rocker_port, true);
 	netif_start_queue(dev);
 	return 0;
 
+err_stp_update:
+	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 err_request_rx_irq:
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 err_request_tx_irq:
@@ -2886,6 +3376,7 @@ static int rocker_port_stop(struct net_device *dev)
 	rocker_port_set_enable(rocker_port, false);
 	napi_disable(&rocker_port->napi_rx);
 	napi_disable(&rocker_port->napi_tx);
+	rocker_port_stp_update(dev, BR_STATE_DISABLED);
 	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 	rocker_port_dma_rings_fini(rocker_port);
@@ -3030,6 +3521,33 @@ static int rocker_port_set_mac_address(struct net_device *dev, void *p)
 	return 0;
 }
 
+static int rocker_port_vlan_rx_add_vid(struct net_device *dev,
+				       __be16 proto, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_vlan(rocker_port, 0, vid);
+	if (err)
+		return err;
+
+	return rocker_port_router_mac(rocker_port, 0, htons(vid));
+}
+
+static int rocker_port_vlan_rx_kill_vid(struct net_device *dev,
+					__be16 proto, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_router_mac(rocker_port, ROCKER_OP_FLAG_REMOVE,
+				     htons(vid));
+	if (err)
+		return err;
+
+	return rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, vid);
+}
+
 static int rocker_port_sw_parent_id_get(struct net_device *dev,
 					struct netdev_phys_item_id *psid)
 {
@@ -3041,13 +3559,49 @@ static int rocker_port_sw_parent_id_get(struct net_device *dev,
 	return 0;
 }
 
+static int rocker_port_sw_port_fdb_add(struct net_device *dev,
+				       const unsigned char *addr, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, NULL);
+	int flags = 0;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return -EINVAL;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
+static int rocker_port_sw_port_fdb_del(struct net_device *dev,
+				       const unsigned char *addr, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, NULL);
+	int flags = ROCKER_OP_FLAG_REMOVE;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return -EINVAL;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
+static int rocker_port_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	return rocker_port_stp_update(dev, state);
+}
+
 static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_open			= rocker_port_open,
 	.ndo_stop			= rocker_port_stop,
 	.ndo_start_xmit			= rocker_port_xmit,
 	.ndo_set_mac_address		= rocker_port_set_mac_address,
+	.ndo_vlan_rx_add_vid		= rocker_port_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid		= rocker_port_vlan_rx_kill_vid,
 #ifdef CONFIG_NET_SWITCHDEV
 	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
+	.ndo_sw_port_fdb_add		= rocker_port_sw_port_fdb_add,
+	.ndo_sw_port_fdb_del		= rocker_port_sw_port_fdb_del,
+	.ndo_sw_port_stp_update		= rocker_port_sw_port_stp_update,
 #endif
 };
 
@@ -3498,17 +4052,121 @@ static struct pci_driver rocker_pci_driver = {
 	.remove		= rocker_remove,
 };
 
+/************************************
+ * Net device notifier event handler
+ ************************************/
+
+static bool rocker_port_dev_check(struct net_device *dev)
+{
+	return dev->netdev_ops == &rocker_port_netdev_ops;
+}
+
+static int rocker_port_bridge_join(struct rocker_port *rocker_port,
+				   struct net_device *bridge)
+{
+	int err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->dev->ifindex);
+
+	rocker_port->bridge_dev = bridge;
+
+	/* Use bridge internal VLAN ID for untagged pkts */
+	err = rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, 0);
+	if (err)
+		return err;
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port,
+						 bridge->ifindex);
+	err = rocker_port_vlan(rocker_port, 0, 0);
+
+	return err;
+}
+
+static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
+{
+	int err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->bridge_dev->ifindex);
+
+	rocker_port->bridge_dev = NULL;
+
+	/* Use port internal VLAN ID for untagged pkts */
+	err = rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, 0);
+	if (err)
+		return err;
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port,
+						 rocker_port->dev->ifindex);
+	err = rocker_port_vlan(rocker_port, 0, 0);
+
+	return err;
+}
+
+static int rocker_port_master_changed(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct net_device *master = netdev_master_upper_dev_get(dev);
+	int err = 0;
+
+	if (master && master->rtnl_link_ops &&
+	    !strcmp(master->rtnl_link_ops->kind, "bridge"))
+		err = rocker_port_bridge_join(rocker_port, master);
+	else
+		err = rocker_port_bridge_leave(rocker_port);
+
+	return err;
+}
+
+static int rocker_netdevice_event(struct notifier_block *unused,
+				  unsigned long event, void *ptr)
+{
+	struct net_device *dev;
+	int err;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		dev = netdev_notifier_info_to_dev(ptr);
+		if (!rocker_port_dev_check(dev))
+			return NOTIFY_DONE;
+		err = rocker_port_master_changed(dev);
+		if (err)
+			netdev_warn(dev,
+				    "failed to reflect master change (err %d)\n",
+				    err);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rocker_netdevice_nb __read_mostly = {
+	.notifier_call = rocker_netdevice_event,
+};
+
 /***********************
  * Module init and exit
  ***********************/
 
 static int __init rocker_module_init(void)
 {
-	return pci_register_driver(&rocker_pci_driver);
+	int err;
+
+	register_netdevice_notifier(&rocker_netdevice_nb);
+	err = pci_register_driver(&rocker_pci_driver);
+	if (err)
+		goto err_pci_register_driver;
+	return 0;
+
+err_pci_register_driver:
+	unregister_netdevice_notifier(&rocker_netdevice_nb);
+	return err;
 }
 
 static void __exit rocker_module_exit(void)
 {
+	unregister_netdevice_notifier(&rocker_netdevice_nb);
 	pci_unregister_driver(&rocker_pci_driver);
 }
 
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 09/10] rocker: implement rocker ofdpa flow table manipulation
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

From: Scott Feldman <sfeldma@gmail.com>

The rocker driver maintains 4 hash tables: flows, groups, FDB, and VLANs.

Flow and group tables track the entries installed to OF-DPA tables,
per the OF-DPA spec.  See OF-DPA spec for full description of fields
in each flow and group table.  New table entries are pushed to the
device with ADD cmd.  Updated entries are pushed to the device with
MOD cmd.  For flow table entries, a crc32 key is made from fields of
the particular field.  For group table entries, the group_id is used
as the key.

The FDB table tracks fdb entries learned by the device or manually
pushed to the bridge by the user.  A crc32 key is made from the
port/mac/vlan tuple for the fdb entry.

The VLAN table tracks the ifindex-to-internal-vlan mapping for
untagged pkts.  On ingress, an untagged pkt is inserted with an
internal VLAN ID based on the input port's current internal VLAN ID.
The input port's internal VLAN will either be referenced by the port's
ifindex, if not bridged, or the containing bridge's ifindex, if
bridged.  Since the ifindex space isn't within a fixed range, uses a
hash table (with ifindex as key) to track internal VLAN ID for a given
ifindex.  The internal VLAN ID range is fixed and currently uses the
upper 255 VLAN IDs, starting at 0xf00.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 1464 +++++++++++++++++++++++++++++++++-
 1 file changed, 1462 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 797481b..b940be0 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -16,6 +16,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/spinlock.h>
+#include <linux/hashtable.h>
 #include <linux/crc32.h>
 #include <linux/sort.h>
 #include <linux/random.h>
@@ -41,6 +42,123 @@ static const struct pci_device_id rocker_pci_id_table[] = {
 	{0, }
 };
 
+struct rocker_flow_tbl_key {
+	u32 priority;
+	enum rocker_of_dpa_table_id tbl_id;
+	union {
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+		} ig_port;
+		struct {
+			u32 in_lport;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			bool untagged;
+			__be16 new_vlan_id;
+		} vlan;
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			__be16 eth_type;
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			bool copy_to_cpu;
+		} term_mac;
+		struct {
+			__be16 eth_type;
+			__be32 dst4;
+			__be32 dst4_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			u32 group_id;
+		} ucast_routing;
+		struct {
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			int has_eth_dst;
+			int has_eth_dst_mask;
+			__be16 vlan_id;
+			u32 tunnel_id;
+			enum rocker_of_dpa_table_id goto_tbl;
+			u32 group_id;
+			bool copy_to_cpu;
+		} bridge;
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			u8 eth_src[ETH_ALEN];
+			u8 eth_src_mask[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			__be16 eth_type;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			u8 ip_proto;
+			u8 ip_proto_mask;
+			u8 ip_tos;
+			u8 ip_tos_mask;
+			u32 group_id;
+		} acl;
+	};
+};
+
+struct rocker_flow_tbl_entry {
+	struct hlist_node entry;
+	u32 ref_count;
+	u64 cookie;
+	struct rocker_flow_tbl_key key;
+	u32 key_crc32; /* key */
+};
+
+struct rocker_group_tbl_entry {
+	struct hlist_node entry;
+	u32 cmd;
+	u32 group_id; /* key */
+	u16 group_count;
+	u32 *group_ids;
+	union {
+		struct {
+			u8 pop_vlan;
+		} l2_interface;
+		struct {
+			u8 eth_src[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			__be16 vlan_id;
+			u32 group_id;
+		} l2_rewrite;
+		struct {
+			u8 eth_src[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			__be16 vlan_id;
+			bool ttl_check;
+			u32 group_id;
+		} l3_unicast;
+	};
+};
+
+struct rocker_fdb_tbl_entry {
+	struct hlist_node entry;
+	u32 key_crc32; /* key */
+	bool learned;
+	struct rocker_fdb_tbl_key {
+		u32 lport;
+		u8 addr[ETH_ALEN];
+		__be16 vlan_id;
+	} key;
+};
+
+struct rocker_internal_vlan_tbl_entry {
+	struct hlist_node entry;
+	int ifindex; /* key */
+	u32 ref_count;
+	__be16 vlan_id;
+};
+
 struct rocker_desc_info {
 	char *data; /* mapped */
 	size_t data_size;
@@ -61,11 +179,28 @@ struct rocker_dma_ring_info {
 
 struct rocker;
 
+enum {
+	ROCKER_CTRL_LINK_LOCAL_MCAST,
+	ROCKER_CTRL_LOCAL_ARP,
+	ROCKER_CTRL_IPV4_MCAST,
+	ROCKER_CTRL_IPV6_MCAST,
+	ROCKER_CTRL_DFLT_BRIDGING,
+	ROCKER_CTRL_MAX,
+};
+
+#define ROCKER_INTERNAL_VLAN_ID_BASE	0x0f00
+#define ROCKER_N_INTERNAL_VLANS		255
+#define ROCKER_VLAN_BITMAP_LEN		BITS_TO_LONGS(VLAN_N_VID)
+#define ROCKER_INTERNAL_VLAN_BITMAP_LEN	BITS_TO_LONGS(ROCKER_N_INTERNAL_VLANS)
+
 struct rocker_port {
 	struct net_device *dev;
 	struct rocker *rocker;
 	unsigned port_number;
 	u32 lport;
+	__be16 internal_vlan_id;
+	bool ctrls[ROCKER_CTRL_MAX];
+	unsigned long vlan_bitmap[ROCKER_VLAN_BITMAP_LEN];
 	struct napi_struct napi_tx;
 	struct napi_struct napi_rx;
 	struct rocker_dma_ring_info tx_ring;
@@ -84,8 +219,76 @@ struct rocker {
 	spinlock_t cmd_ring_lock;
 	struct rocker_dma_ring_info cmd_ring;
 	struct rocker_dma_ring_info event_ring;
+	DECLARE_HASHTABLE(flow_tbl, 16);
+	spinlock_t flow_tbl_lock;
+	u64 flow_tbl_next_cookie;
+	DECLARE_HASHTABLE(group_tbl, 16);
+	spinlock_t group_tbl_lock;
+	DECLARE_HASHTABLE(fdb_tbl, 16);
+	spinlock_t fdb_tbl_lock;
+	unsigned long internal_vlan_bitmap[ROCKER_INTERNAL_VLAN_BITMAP_LEN];
+	DECLARE_HASHTABLE(internal_vlan_tbl, 8);
+	spinlock_t internal_vlan_tbl_lock;
 };
 
+static const u8 zero_mac[ETH_ALEN]   = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ff_mac[ETH_ALEN]     = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+static const u8 ll_mac[ETH_ALEN]     = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
+static const u8 ll_mask[ETH_ALEN]    = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xf0 };
+static const u8 mcast_mac[ETH_ALEN]  = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ipv4_mcast[ETH_ALEN] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x00 };
+static const u8 ipv4_mask[ETH_ALEN]  = { 0xff, 0xff, 0xff, 0x80, 0x00, 0x00 };
+static const u8 ipv6_mcast[ETH_ALEN] = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ipv6_mask[ETH_ALEN]  = { 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 };
+
+/* Rocker priority levels for flow table entries.  Higher
+ * priority match takes precedence over lower priority match.
+ */
+
+enum {
+	ROCKER_PRIORITY_UNKNOWN = 0,
+	ROCKER_PRIORITY_IG_PORT = 1,
+	ROCKER_PRIORITY_VLAN = 1,
+	ROCKER_PRIORITY_TERM_MAC_UCAST = 0,
+	ROCKER_PRIORITY_TERM_MAC_MCAST = 1,
+	ROCKER_PRIORITY_UNICAST_ROUTING = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_VLAN = 3,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_TENANT = 3,
+	ROCKER_PRIORITY_ACL_CTRL = 3,
+	ROCKER_PRIORITY_ACL_NORMAL = 2,
+	ROCKER_PRIORITY_ACL_DFLT = 1,
+};
+
+static bool rocker_vlan_id_is_internal(__be16 vlan_id)
+{
+	u16 start = ROCKER_INTERNAL_VLAN_ID_BASE;
+	u16 end = 0xffe;
+	u16 _vlan_id = ntohs(vlan_id);
+
+	return (_vlan_id >= start && _vlan_id <= end);
+}
+
+static __be16 rocker_port_vid_to_vlan(struct rocker_port *rocker_port,
+				      u16 vid, bool *pop_vlan)
+{
+	__be16 vlan_id;
+
+	if (pop_vlan)
+		*pop_vlan = false;
+	vlan_id = htons(vid);
+	if (!vlan_id) {
+		vlan_id = rocker_port->internal_vlan_id;
+		if (pop_vlan)
+			*pop_vlan = true;
+	}
+
+	return vlan_id;
+}
+
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
@@ -1094,6 +1297,10 @@ static int rocker_event_link_change(struct rocker *rocker,
 	return 0;
 }
 
+#define ROCKER_OP_FLAG_REMOVE		(1 << 0)
+#define ROCKER_OP_FLAG_NOWAIT		(1 << 1)
+#define ROCKER_OP_FLAG_LEARNED		(1 << 2)
+
 static int rocker_event_process(struct rocker *rocker,
 				struct rocker_desc_info *desc_info)
 {
@@ -1399,6 +1606,1236 @@ static int rocker_cmd_set_port_settings_macaddr(struct rocker_port *rocker_port,
 			       macaddr, NULL, NULL, false);
 }
 
+static int rocker_cmd_flow_tbl_add_ig_port(struct rocker_desc_info *desc_info,
+					   struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.ig_port.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.ig_port.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.ig_port.goto_tbl))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_vlan(struct rocker_desc_info *desc_info,
+					struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.vlan.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.vlan.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.vlan.vlan_id_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.vlan.goto_tbl))
+		return -EMSGSIZE;
+	if (entry->key.vlan.untagged &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
+			       entry->key.vlan.new_vlan_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
+					    struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.term_mac.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.term_mac.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.term_mac.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.term_mac.eth_dst))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.term_mac.eth_dst_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.term_mac.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.term_mac.vlan_id_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.term_mac.goto_tbl))
+		return -EMSGSIZE;
+	if (entry->key.term_mac.copy_to_cpu &&
+	    rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,
+			      entry->key.term_mac.copy_to_cpu))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_flow_tbl_add_ucast_routing(struct rocker_desc_info *desc_info,
+				      struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.ucast_routing.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
+			       entry->key.ucast_routing.dst4))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
+			       entry->key.ucast_routing.dst4_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.ucast_routing.goto_tbl))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.ucast_routing.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_bridge(struct rocker_desc_info *desc_info,
+					  struct rocker_flow_tbl_entry *entry)
+{
+	if (entry->key.bridge.has_eth_dst &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.bridge.eth_dst))
+		return -EMSGSIZE;
+	if (entry->key.bridge.has_eth_dst_mask &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.bridge.eth_dst_mask))
+		return -EMSGSIZE;
+	if (entry->key.bridge.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.bridge.vlan_id))
+		return -EMSGSIZE;
+	if (entry->key.bridge.tunnel_id &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_TUNNEL_ID,
+			       entry->key.bridge.tunnel_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.bridge.goto_tbl))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.bridge.group_id))
+		return -EMSGSIZE;
+	if (entry->key.bridge.copy_to_cpu &&
+	    rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,
+			      entry->key.bridge.copy_to_cpu))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_acl(struct rocker_desc_info *desc_info,
+				       struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.acl.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.acl.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->key.acl.eth_src))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_src_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.acl.eth_dst))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_dst_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.acl.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.acl.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.acl.vlan_id_mask))
+		return -EMSGSIZE;
+
+	switch (ntohs(entry->key.acl.eth_type)) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_PROTO,
+				      entry->key.acl.ip_proto))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_PROTO_MASK,
+				      entry->key.acl.ip_proto_mask))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_DSCP,
+				      entry->key.acl.ip_tos & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_DSCP_MASK,
+				      entry->key.acl.ip_tos_mask & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_ECN,
+				      (entry->key.acl.ip_tos & 0xc0) >> 6))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_ECN_MASK,
+				      (entry->key.acl.ip_tos_mask & 0xc0) >> 6))
+			return -EMSGSIZE;
+		break;
+	}
+
+	if (entry->key.acl.group_id != ROCKER_GROUP_NONE &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.acl.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_TABLE_ID,
+			       entry->key.tbl_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_PRIORITY,
+			       entry->key.priority))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_HARDTIME, 0))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+
+	switch (entry->key.tbl_id) {
+	case ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT:
+		err = rocker_cmd_flow_tbl_add_ig_port(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_VLAN:
+		err = rocker_cmd_flow_tbl_add_vlan(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC:
+		err = rocker_cmd_flow_tbl_add_term_mac(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING:
+		err = rocker_cmd_flow_tbl_add_ucast_routing(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_BRIDGING:
+		err = rocker_cmd_flow_tbl_add_bridge(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_ACL_POLICY:
+		err = rocker_cmd_flow_tbl_add_acl(desc_info, entry);
+		break;
+	default:
+		err = -ENOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_del(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	const struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l2_interface(struct rocker_desc_info *desc_info,
+				      struct rocker_group_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_OUT_LPORT,
+			       ROCKER_GROUP_PORT_GET(entry->group_id)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_POP_VLAN,
+			      entry->l2_interface.pop_vlan))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l2_rewrite(struct rocker_desc_info *desc_info,
+				    struct rocker_group_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,
+			       entry->l2_rewrite.group_id))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l2_rewrite.eth_src) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->l2_rewrite.eth_src))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l2_rewrite.eth_dst) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->l2_rewrite.eth_dst))
+		return -EMSGSIZE;
+	if (entry->l2_rewrite.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->l2_rewrite.vlan_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_group_ids(struct rocker_desc_info *desc_info,
+				   struct rocker_group_tbl_entry *entry)
+{
+	int i;
+	struct rocker_tlv *group_ids;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GROUP_COUNT,
+			       entry->group_count))
+		return -EMSGSIZE;
+
+	group_ids = rocker_tlv_nest_start(desc_info,
+					  ROCKER_TLV_OF_DPA_GROUP_IDS);
+	if (!group_ids)
+		return -EMSGSIZE;
+
+	for (i = 0; i < entry->group_count; i++)
+		/* Note TLV array is 1-based */
+		if (rocker_tlv_put_u32(desc_info, i + 1, entry->group_ids[i]))
+			return -EMSGSIZE;
+
+	rocker_tlv_nest_end(desc_info, group_ids);
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l3_unicast(struct rocker_desc_info *desc_info,
+				    struct rocker_group_tbl_entry *entry)
+{
+	if (!is_zero_ether_addr(entry->l3_unicast.eth_src) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->l3_unicast.eth_src))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l3_unicast.eth_dst) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->l3_unicast.eth_dst))
+		return -EMSGSIZE;
+	if (entry->l3_unicast.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->l3_unicast.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_TTL_CHECK,
+			      entry->l3_unicast.ttl_check))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,
+			       entry->l3_unicast.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_add(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE:
+		err = rocker_cmd_group_tbl_add_l2_interface(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE:
+		err = rocker_cmd_group_tbl_add_l2_rewrite(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD:
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		err = rocker_cmd_group_tbl_add_group_ids(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST:
+		err = rocker_cmd_group_tbl_add_l3_unicast(desc_info, entry);
+		break;
+	default:
+		err = -ENOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_del(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	const struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+/*****************************************
+ * Flow, group, FDB, internal VLAN tables
+ *****************************************/
+
+static int rocker_init_tbls(struct rocker *rocker)
+{
+	hash_init(rocker->flow_tbl);
+	spin_lock_init(&rocker->flow_tbl_lock);
+
+	hash_init(rocker->group_tbl);
+	spin_lock_init(&rocker->group_tbl_lock);
+
+	hash_init(rocker->fdb_tbl);
+	spin_lock_init(&rocker->fdb_tbl_lock);
+
+	hash_init(rocker->internal_vlan_tbl);
+	spin_lock_init(&rocker->internal_vlan_tbl_lock);
+
+	return 0;
+}
+
+static void rocker_free_tbls(struct rocker *rocker)
+{
+	unsigned long flags;
+	struct rocker_flow_tbl_entry *flow_entry;
+	struct rocker_group_tbl_entry *group_entry;
+	struct rocker_fdb_tbl_entry *fdb_entry;
+	struct rocker_internal_vlan_tbl_entry *internal_vlan_entry;
+	struct hlist_node *tmp;
+	int bkt;
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+	hash_for_each_safe(rocker->flow_tbl, bkt, tmp, flow_entry, entry)
+		hash_del(&flow_entry->entry);
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+	hash_for_each_safe(rocker->group_tbl, bkt, tmp, group_entry, entry)
+		hash_del(&group_entry->entry);
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, flags);
+	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, fdb_entry, entry)
+		hash_del(&fdb_entry->entry);
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, flags);
+	hash_for_each_safe(rocker->internal_vlan_tbl, bkt,
+			   tmp, internal_vlan_entry, entry)
+		hash_del(&internal_vlan_entry->entry);
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, flags);
+}
+
+static struct rocker_flow_tbl_entry *
+rocker_flow_tbl_find(struct rocker *rocker, struct rocker_flow_tbl_entry *match)
+{
+	struct rocker_flow_tbl_entry *found;
+
+	hash_for_each_possible(rocker->flow_tbl, found, entry, match->key_crc32)
+		if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0)
+			return found;
+
+	return NULL;
+}
+
+static int rocker_flow_tbl_add(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool add_to_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		kfree(match);
+	} else {
+		found = match;
+		found->cookie = rocker->flow_tbl_next_cookie++;
+		hash_add(rocker->flow_tbl, &found->entry, found->key_crc32);
+		add_to_hw = true;
+	}
+
+	found->ref_count++;
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	if (add_to_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_add,
+				      found, NULL, NULL, nowait);
+		if (err) {
+			spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+			hash_del(&found->entry);
+			spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+			kfree(found);
+		}
+	}
+
+	return err;
+}
+
+static int rocker_flow_tbl_del(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool del_from_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		found->ref_count--;
+		if (found->ref_count == 0) {
+			hash_del(&found->entry);
+			del_from_hw = true;
+		}
+	}
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	kfree(match);
+
+	if (del_from_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_del,
+				      found, NULL, NULL, nowait);
+		kfree(found);
+	}
+
+	return err;
+}
+
+static gfp_t rocker_op_flags_gfp(int flags)
+{
+	return flags & ROCKER_OP_FLAG_NOWAIT ? GFP_ATOMIC : GFP_KERNEL;
+}
+
+static int rocker_flow_tbl_do(struct rocker_port *rocker_port,
+			      int flags, struct rocker_flow_tbl_entry *entry)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_flow_tbl_del(rocker_port, entry, nowait);
+	else
+		return rocker_flow_tbl_add(rocker_port, entry, nowait);
+}
+
+static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
+				   int flags, u32 in_lport, u32 in_lport_mask,
+				   enum rocker_of_dpa_table_id goto_tbl)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_IG_PORT;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
+	entry->key.ig_port.in_lport = in_lport;
+	entry->key.ig_port.in_lport_mask = in_lport_mask;
+	entry->key.ig_port.goto_tbl = goto_tbl;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
+				int flags, u32 in_lport,
+				__be16 vlan_id, __be16 vlan_id_mask,
+				enum rocker_of_dpa_table_id goto_tbl,
+				bool untagged, __be16 new_vlan_id)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_VLAN;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_VLAN;
+	entry->key.vlan.in_lport = in_lport;
+	entry->key.vlan.vlan_id = vlan_id;
+	entry->key.vlan.vlan_id_mask = vlan_id_mask;
+	entry->key.vlan.goto_tbl = goto_tbl;
+
+	entry->key.vlan.untagged = untagged;
+	entry->key.vlan.new_vlan_id = new_vlan_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
+				    u32 in_lport, u32 in_lport_mask,
+				    __be16 eth_type, const u8 *eth_dst,
+				    const u8 *eth_dst_mask, __be16 vlan_id,
+				    __be16 vlan_id_mask, bool copy_to_cpu,
+				    int flags)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	if (is_multicast_ether_addr(eth_dst)) {
+		entry->key.priority = ROCKER_PRIORITY_TERM_MAC_MCAST;
+		entry->key.term_mac.goto_tbl =
+			 ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
+	} else {
+		entry->key.priority = ROCKER_PRIORITY_TERM_MAC_UCAST;
+		entry->key.term_mac.goto_tbl =
+			 ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
+	}
+
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	entry->key.term_mac.in_lport = in_lport;
+	entry->key.term_mac.in_lport_mask = in_lport_mask;
+	entry->key.term_mac.eth_type = eth_type;
+	ether_addr_copy(entry->key.term_mac.eth_dst, eth_dst);
+	ether_addr_copy(entry->key.term_mac.eth_dst_mask, eth_dst_mask);
+	entry->key.term_mac.vlan_id = vlan_id;
+	entry->key.term_mac.vlan_id_mask = vlan_id_mask;
+	entry->key.term_mac.copy_to_cpu = copy_to_cpu;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
+				  int flags,
+				  const u8 *eth_dst, const u8 *eth_dst_mask,
+				  __be16 vlan_id, u32 tunnel_id,
+				  enum rocker_of_dpa_table_id goto_tbl,
+				  u32 group_id, bool copy_to_cpu)
+{
+	struct rocker_flow_tbl_entry *entry;
+	u32 priority;
+	bool vlan_bridging = !!vlan_id;
+	bool dflt = !eth_dst || (eth_dst && eth_dst_mask);
+	bool wild = false;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_BRIDGING;
+
+	if (eth_dst) {
+		entry->key.bridge.has_eth_dst = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst, eth_dst);
+	}
+	if (eth_dst_mask) {
+		entry->key.bridge.has_eth_dst_mask = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst_mask, eth_dst_mask);
+		if (memcmp(eth_dst_mask, ff_mac, ETH_ALEN))
+			wild = true;
+	}
+
+	priority = ROCKER_PRIORITY_UNKNOWN;
+	if (vlan_bridging & dflt & wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD;
+	else if (vlan_bridging & dflt & !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT;
+	else if (vlan_bridging & !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN;
+	else if (!vlan_bridging & dflt & wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD;
+	else if (!vlan_bridging & dflt & !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT;
+	else if (!vlan_bridging & !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT;
+
+	entry->key.priority = priority;
+	entry->key.bridge.vlan_id = vlan_id;
+	entry->key.bridge.tunnel_id = tunnel_id;
+	entry->key.bridge.goto_tbl = goto_tbl;
+	entry->key.bridge.group_id = group_id;
+	entry->key.bridge.copy_to_cpu = copy_to_cpu;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
+			       int flags, u32 in_lport,
+			       u32 in_lport_mask,
+			       const u8 *eth_src, const u8 *eth_src_mask,
+			       const u8 *eth_dst, const u8 *eth_dst_mask,
+			       __be16 eth_type,
+			       __be16 vlan_id, __be16 vlan_id_mask,
+			       u8 ip_proto, u8 ip_proto_mask,
+			       u8 ip_tos, u8 ip_tos_mask,
+			       u32 group_id)
+{
+	u32 priority;
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	priority = ROCKER_PRIORITY_ACL_NORMAL;
+	if (eth_dst && eth_dst_mask) {
+		if (memcmp(eth_dst_mask, mcast_mac, ETH_ALEN) == 0)
+			priority = ROCKER_PRIORITY_ACL_DFLT;
+		else if (is_link_local_ether_addr(eth_dst))
+			priority = ROCKER_PRIORITY_ACL_CTRL;
+	}
+
+	entry->key.priority = priority;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	entry->key.acl.in_lport = in_lport;
+	entry->key.acl.in_lport_mask = in_lport_mask;
+
+	if (eth_src)
+		ether_addr_copy(entry->key.acl.eth_src, eth_src);
+	if (eth_src_mask)
+		ether_addr_copy(entry->key.acl.eth_src_mask, eth_src_mask);
+	if (eth_dst)
+		ether_addr_copy(entry->key.acl.eth_dst, eth_dst);
+	if (eth_dst_mask)
+		ether_addr_copy(entry->key.acl.eth_dst_mask, eth_dst_mask);
+
+	entry->key.acl.eth_type = eth_type;
+	entry->key.acl.vlan_id = vlan_id;
+	entry->key.acl.vlan_id_mask = vlan_id_mask;
+	entry->key.acl.ip_proto = ip_proto;
+	entry->key.acl.ip_proto_mask = ip_proto_mask;
+	entry->key.acl.ip_tos = ip_tos;
+	entry->key.acl.ip_tos_mask = ip_tos_mask;
+	entry->key.acl.group_id = group_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static struct rocker_group_tbl_entry *
+rocker_group_tbl_find(struct rocker *rocker,
+		      struct rocker_group_tbl_entry *match)
+{
+	struct rocker_group_tbl_entry *found;
+
+	hash_for_each_possible(rocker->group_tbl, found,
+			       entry, match->group_id)
+		if (found->group_id == match->group_id)
+			return found;
+
+	return NULL;
+}
+
+static void rocker_group_tbl_entry_free(struct rocker_group_tbl_entry *entry)
+{
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD:
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		kfree(entry->group_ids);
+		break;
+	default:
+		break;
+	}
+	kfree(entry);
+}
+
+static int rocker_group_tbl_add(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		hash_del(&found->entry);
+		rocker_group_tbl_entry_free(found);
+		found = match;
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_MOD;
+	} else {
+		found = match;
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD;
+	}
+
+	hash_add(rocker->group_tbl, &found->entry, found->group_id);
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	if (found->cmd)
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_add,
+				      found, NULL, NULL, nowait);
+
+	return err;
+}
+
+static int rocker_group_tbl_del(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		hash_del(&found->entry);
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL;
+	}
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	rocker_group_tbl_entry_free(match);
+
+	if (found) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_del,
+				      found, NULL, NULL, nowait);
+		rocker_group_tbl_entry_free(found);
+	}
+
+	return err;
+}
+
+static int rocker_group_tbl_do(struct rocker_port *rocker_port,
+			       int flags, struct rocker_group_tbl_entry *entry)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_group_tbl_del(rocker_port, entry, nowait);
+	else
+		return rocker_group_tbl_add(rocker_port, entry, nowait);
+}
+
+static int rocker_group_l2_interface(struct rocker_port *rocker_port,
+				     int flags, __be16 vlan_id,
+				     u32 out_lport, int pop_vlan)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+	entry->l2_interface.pop_vlan = pop_vlan;
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_group_l2_fan_out(struct rocker_port *rocker_port,
+				   int flags, u8 group_count,
+				   u32 *group_ids, u32 group_id)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = group_id;
+	entry->group_count = group_count;
+
+	entry->group_ids = kcalloc(group_count, sizeof(u32),
+				   rocker_op_flags_gfp(flags));
+	if (!entry->group_ids) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+	memcpy(entry->group_ids, group_ids, group_count * sizeof(u32));
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_group_l2_flood(struct rocker_port *rocker_port,
+				 int flags, __be16 vlan_id,
+				 u8 group_count, u32 *group_ids,
+				 u32 group_id)
+{
+	return rocker_group_l2_fan_out(rocker_port, flags,
+				       group_count, group_ids,
+				       group_id);
+}
+
+static struct rocker_ctrl {
+	const u8 *eth_dst;
+	const u8 *eth_dst_mask;
+	u16 eth_type;
+	bool acl;
+	bool bridge;
+	bool term;
+	bool copy_to_cpu;
+} rocker_ctrls[] = {
+	[ROCKER_CTRL_LINK_LOCAL_MCAST] = {
+		/* pass link local multicast pkts up to CPU for filtering */
+		.eth_dst = ll_mac,
+		.eth_dst_mask = ll_mask,
+		.acl = true,
+	},
+	[ROCKER_CTRL_LOCAL_ARP] = {
+		/* pass local ARP pkts up to CPU */
+		.eth_dst = zero_mac,
+		.eth_dst_mask = zero_mac,
+		.eth_type = htons(ETH_P_ARP),
+		.acl = true,
+	},
+	[ROCKER_CTRL_IPV4_MCAST] = {
+		/* pass IPv4 mcast pkts up to CPU, RFC 1112 */
+		.eth_dst = ipv4_mcast,
+		.eth_dst_mask = ipv4_mask,
+		.eth_type = htons(ETH_P_IP),
+		.term  = true,
+		.copy_to_cpu = true,
+	},
+	[ROCKER_CTRL_IPV6_MCAST] = {
+		/* pass IPv6 mcast pkts up to CPU, RFC 2464 */
+		.eth_dst = ipv6_mcast,
+		.eth_dst_mask = ipv6_mask,
+		.eth_type = htons(ETH_P_IPV6),
+		.term  = true,
+		.copy_to_cpu = true,
+	},
+	[ROCKER_CTRL_DFLT_BRIDGING] = {
+		/* flood any pkts on vlan */
+		.bridge = true,
+		.copy_to_cpu = true,
+	},
+};
+
+static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
+				     int flags, struct rocker_ctrl *ctrl,
+				     __be16 vlan_id)
+{
+	u32 in_lport = rocker_port->lport;
+	u32 in_lport_mask = 0xffffffff;
+	u32 out_lport = 0;
+	u8 *eth_src = NULL;
+	u8 *eth_src_mask = NULL;
+	__be16 vlan_id_mask = htons(0xffff);
+	u8 ip_proto = 0;
+	u8 ip_proto_mask = 0;
+	u8 ip_tos = 0;
+	u8 ip_tos_mask = 0;
+	u32 group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+	int err;
+
+	err = rocker_flow_tbl_acl(rocker_port, flags,
+				  in_lport, in_lport_mask,
+				  eth_src, eth_src_mask,
+				  ctrl->eth_dst, ctrl->eth_dst_mask,
+				  ctrl->eth_type,
+				  vlan_id, vlan_id_mask,
+				  ip_proto, ip_proto_mask,
+				  ip_tos, ip_tos_mask,
+				  group_id);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl ACL\n", err);
+
+	return err;
+}
+
+static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
+				      int flags, struct rocker_ctrl *ctrl,
+				      __be16 vlan_id)
+{
+	u32 in_lport_mask = 0xffffffff;
+	__be16 vlan_id_mask = htons(0xffff);
+	int err;
+
+	if (ntohs(vlan_id) == 0)
+		vlan_id = rocker_port->internal_vlan_id;
+
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       ctrl->eth_type, ctrl->eth_dst,
+				       ctrl->eth_dst_mask, vlan_id,
+				       vlan_id_mask, ctrl->copy_to_cpu,
+				       flags);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl term\n", err);
+
+	return err;
+}
+
+static int rocker_port_ctrl_vlan(struct rocker_port *rocker_port, int flags,
+				 struct rocker_ctrl *ctrl, __be16 vlan_id)
+{
+	if (ctrl->acl)
+		return rocker_port_ctrl_vlan_acl(rocker_port, flags,
+						 ctrl, vlan_id);
+
+	if (ctrl->term)
+		return rocker_port_ctrl_vlan_term(rocker_port, flags,
+						  ctrl, vlan_id);
+
+	return -EOPNOTSUPP;
+}
+
+static int rocker_port_ctrl_vlan_add(struct rocker_port *rocker_port,
+				     int flags, __be16 vlan_id)
+{
+	int err = 0;
+	int i;
+
+	for (i = 0; i < ROCKER_CTRL_MAX; i++) {
+		if (rocker_port->ctrls[i]) {
+			err = rocker_port_ctrl_vlan(rocker_port, flags,
+						    &rocker_ctrls[i], vlan_id);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
+static int rocker_port_ctrl(struct rocker_port *rocker_port, int flags,
+			    struct rocker_ctrl *ctrl)
+{
+	u16 vid;
+	int err = 0;
+
+	for (vid = 1; vid < VLAN_N_VID; vid++) {
+		if (!test_bit(vid, rocker_port->vlan_bitmap))
+			continue;
+		err = rocker_port_ctrl_vlan(rocker_port, flags,
+					    ctrl, htons(vid));
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
+{
+	enum rocker_of_dpa_table_id goto_tbl;
+	u32 in_lport;
+	u32 in_lport_mask;
+	int err;
+
+	/* Normal Ethernet Frames.  Matches pkts from any local physical
+	 * ports.  Goto VLAN tbl.
+	 */
+
+	in_lport = 0;
+	in_lport_mask = 0xffff0000;
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_VLAN;
+
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      in_lport, in_lport_mask,
+				      goto_tbl);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) ingress port table entry\n", err);
+
+	return err;
+}
+
+static int rocker_port_router_mac(struct rocker_port *rocker_port,
+				  int flags, __be16 vlan_id)
+{
+	u32 in_lport_mask = 0xffffffff;
+	__be16 eth_type;
+	const u8 *dst_mac_mask = ff_mac;
+	__be16 vlan_id_mask = htons(0xffff);
+	bool copy_to_cpu = false;
+	int err;
+
+	if (ntohs(vlan_id) == 0)
+		vlan_id = rocker_port->internal_vlan_id;
+
+	eth_type = htons(ETH_P_IP);
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       eth_type, rocker_port->dev->dev_addr,
+				       dst_mac_mask, vlan_id, vlan_id_mask,
+				       copy_to_cpu, flags);
+	if (err)
+		return err;
+
+	eth_type = htons(ETH_P_IPV6);
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       eth_type, rocker_port->dev->dev_addr,
+				       dst_mac_mask, vlan_id, vlan_id_mask,
+				       copy_to_cpu, flags);
+
+	return err;
+}
+
+static struct rocker_internal_vlan_tbl_entry *
+rocker_internal_vlan_tbl_find(struct rocker *rocker, int ifindex)
+{
+	struct rocker_internal_vlan_tbl_entry *found;
+
+	hash_for_each_possible(rocker->internal_vlan_tbl,
+			       found, entry, ifindex)
+		if (found->ifindex == ifindex)
+			return found;
+
+	return NULL;
+}
+
+static __be16 rocker_port_internal_vlan_id_get(struct rocker_port *rocker_port,
+					       int ifindex)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_internal_vlan_tbl_entry *entry;
+	struct rocker_internal_vlan_tbl_entry *found;
+	unsigned long lock_flags;
+	int i;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return 0;
+
+	entry->ifindex = ifindex;
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	found = rocker_internal_vlan_tbl_find(rocker, ifindex);
+	if (found) {
+		kfree(entry);
+		goto found;
+	}
+
+	found = entry;
+	hash_add(rocker->internal_vlan_tbl, &found->entry, found->ifindex);
+
+	for (i = 0; i < ROCKER_N_INTERNAL_VLANS; i++) {
+		if (test_and_set_bit(i, rocker->internal_vlan_bitmap))
+			continue;
+		found->vlan_id = htons(ROCKER_INTERNAL_VLAN_ID_BASE + i);
+		goto found;
+	}
+
+	netdev_err(rocker_port->dev, "Out of internal VLAN IDs\n");
+
+found:
+	found->ref_count++;
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	return found->vlan_id;
+}
+
+static void rocker_port_internal_vlan_id_put(struct rocker_port *rocker_port,
+					     int ifindex)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_internal_vlan_tbl_entry *found;
+	unsigned long lock_flags;
+	unsigned long bit;
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	found = rocker_internal_vlan_tbl_find(rocker, ifindex);
+	if (!found) {
+		netdev_err(rocker_port->dev,
+			   "ifindex (%d) not found in internal VLAN tbl\n",
+			   ifindex);
+		goto not_found;
+	}
+
+	if (--found->ref_count <= 0) {
+		bit = ntohs(found->vlan_id) - ROCKER_INTERNAL_VLAN_ID_BASE;
+		clear_bit(bit, rocker->internal_vlan_bitmap);
+		hash_del(&found->entry);
+		kfree(found);
+	}
+
+not_found:
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, lock_flags);
+}
+
 /*****************
  * Net device ops
  *****************/
@@ -1770,10 +3207,14 @@ static void rocker_carrier_init(struct rocker_port *rocker_port)
 
 static void rocker_remove_ports(struct rocker *rocker)
 {
+	struct rocker_port *rocker_port;
 	int i;
 
-	for (i = 0; i < rocker->port_count; i++)
-		unregister_netdev(rocker->ports[i]->dev);
+	for (i = 0; i < rocker->port_count; i++) {
+		rocker_port = rocker->ports[i];
+		rocker_port_ig_tbl(rocker_port, ROCKER_OP_FLAG_REMOVE);
+		unregister_netdev(rocker_port->dev);
+	}
 	kfree(rocker->ports);
 }
 
@@ -1825,8 +3266,18 @@ static int rocker_probe_port(struct rocker *rocker, unsigned port_number)
 	}
 	rocker->ports[port_number] = rocker_port;
 
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
+	err = rocker_port_ig_tbl(rocker_port, 0);
+	if (err) {
+		dev_err(&pdev->dev, "install ig port table failed\n");
+		goto err_port_ig_tbl;
+	}
+
 	return 0;
 
+err_port_ig_tbl:
+	unregister_netdev(dev);
 err_register_netdev:
 	free_netdev(dev);
 	return err;
@@ -1983,6 +3434,12 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
 
+	err = rocker_init_tbls(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot init rocker tables\n");
+		goto err_init_tbls;
+	}
+
 	err = rocker_probe_ports(rocker);
 	if (err) {
 		dev_err(&pdev->dev, "failed to probe ports\n");
@@ -1994,6 +3451,8 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
 err_probe_ports:
+	rocker_free_tbls(rocker);
+err_init_tbls:
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
 err_request_event_irq:
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
@@ -2019,6 +3478,7 @@ static void rocker_remove(struct pci_dev *pdev)
 {
 	struct rocker *rocker = pci_get_drvdata(pdev);
 
+	rocker_free_tbls(rocker);
 	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
 	rocker_remove_ports(rocker);
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

From: Scott Feldman <sfeldma@gmail.com>

When the swdev device learns a new mac/vlan on a port, it sends some async
notification to the driver and the driver installs an FDB in the device.
To give a holistic system view, the learned mac/vlan should be reflected
in the bridge's FBD table, so the user, using normal iproute2 cmds, can view
what is currently learned by the device.  This API on the bridge driver gives
a way for the swdev driver to install an FBD entry in the bridge FBD table.
(And remove one).

This is equivalent to the device running these cmds:

  bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master

This patch needs some extra eyeballs for review, in paricular around the
locking and contexts.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_bridge.h | 18 ++++++++++
 net/bridge/br_fdb.c       | 84 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 808dcb8..27ab217 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -37,6 +37,24 @@ extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __use
 typedef int br_should_route_hook_t(struct sk_buff *skb);
 extern br_should_route_hook_t __rcu *br_should_route_hook;
 
+#if IS_ENABLED(CONFIG_BRIDGE)
+int br_fdb_learn_add(struct net_device *dev,
+		     const unsigned char *addr, u16 vid);
+int br_fdb_learn_del(struct net_device *dev,
+		     const unsigned char *addr, u16 vid);
+#else
+static inline int br_fdb_learn_add(struct net_device *dev,
+				   const unsigned char *addr, u16 vid)
+{
+	return 0;
+}
+static inline int br_fdb_learn_del(struct net_device *dev,
+				   const unsigned char *addr, u16 vid)
+{
+	return 0;
+}
+#endif
+
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
 int br_multicast_list_adjacent(struct net_device *dev,
 			       struct list_head *br_ip_list);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f6f8bb5..e02d21b 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1022,3 +1022,87 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p)
 		}
 	}
 }
+
+int br_fdb_learn_add(struct net_device *dev, const unsigned char *addr,
+		     u16 vid)
+{
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	struct hlist_head *head;
+	struct net_bridge_fdb_entry *fdb;
+	int err = 0;
+
+	rtnl_lock();
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: %s not a bridge port\n", dev->name);
+		err = -EINVAL;
+		goto err_rtnl_unlock;
+	}
+
+	br = p->br;
+
+	spin_lock(&br->hash_lock);
+
+	head = &br->hash[br_mac_hash(addr, vid)];
+	fdb = fdb_find(head, addr, vid);
+	if (fdb == NULL) {
+		fdb = fdb_create(head, p, addr, vid);
+		if (!fdb) {
+			err = -ENOMEM;
+			goto err_unlock;
+		}
+		fdb->is_local = 1;
+		fdb->used = jiffies;
+		fdb->updated = jiffies;
+		fdb_notify(br, fdb, RTM_NEWNEIGH);
+	} else {
+		err = -EEXIST;
+	}
+
+err_unlock:
+	spin_unlock(&br->hash_lock);
+err_rtnl_unlock:
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(br_fdb_learn_add);
+
+int br_fdb_learn_del(struct net_device *dev, const unsigned char *addr,
+		     u16 vid)
+{
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	struct hlist_head *head;
+	struct net_bridge_fdb_entry *fdb;
+	int err = 0;
+
+	rtnl_lock();
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: %s not a bridge port\n", dev->name);
+		err = -EINVAL;
+		goto err_rtnl_unlock;
+	}
+
+	br = p->br;
+
+	spin_lock(&br->hash_lock);
+
+	head = &br->hash[br_mac_hash(addr, vid)];
+	fdb = fdb_find(head, addr, vid);
+	if (fdb)
+		fdb_delete(br, fdb);
+	else
+		err = -ENOENT;
+
+	spin_unlock(&br->hash_lock);
+err_rtnl_unlock:
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(br_fdb_learn_del);
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 06/10] bridge: introduce fdb offloading via switchdev
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

From: Scott Feldman <sfeldma@gmail.com>

Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
fdb entries.  Static bridge FDB entries are installed, for example,
using iproute2 bridge cmd:

       bridge fdb add ADDR dev DEV master vlan VID

This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
switch driver implements two ndo_swdev ops to add/delete FDB entries in the
switch device:

       int ndo_sw_port_fdb_add(struct net_device *dev,
                               const unsigned char *addr,
                               u16 vid);

       int ndo_sw_port_fdb_del(struct net_device *dev,
                               const unsigned char *addr,
                               u16 vid);

The driver returns 0 on success, negative error code on failure.

Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
netdev as these are intended for devices maintaining their own FDB.  In our
case, we want the Linux bridge to own the FBD.

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic.  To enable VLAN support, turn on VLAN filtering:

      echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h | 16 ++++++++++++++++
 include/net/switchdev.h   | 17 +++++++++++++++++
 net/bridge/br_fdb.c       | 10 +++++++++-
 net/switchdev/switchdev.c | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 54b08a7..7de65c5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,16 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	Called to get an ID of the switch chip this port is part of.
  *	If driver implements this, it indicates that it represents a port
  *	of a switch chip.
+ *
+ * int (*ndo_sw_port_fdb_add)(struct net_device *dev,
+ *			      const unsigned char *addr,
+ *			      u16 vid);
+ *	Called to add a fdb to switch device port.
+ *
+ * int (*ndo_sw_port_fdb_del)(struct net_device *dev,
+ *			      const unsigned char *addr,
+ *			      u16 vid);
+ *	Called to delete a fdb from switch device port.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1177,6 +1187,12 @@ struct net_device_ops {
 #ifdef CONFIG_NET_SWITCHDEV
 	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
 							struct netdev_phys_item_id *psid);
+	int			(*ndo_sw_port_fdb_add)(struct net_device *dev,
+						       const unsigned char *addr,
+						       u16 vid);
+	int			(*ndo_sw_port_fdb_del)(struct net_device *dev,
+						       const unsigned char *addr,
+						       u16 vid);
 #endif
 };
 
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 79bf9bd..130cef7 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -1,6 +1,7 @@
 /*
  * include/net/switchdev.h - Switch device API
  * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -16,6 +17,10 @@
 
 int netdev_sw_parent_id_get(struct net_device *dev,
 			    struct netdev_phys_item_id *psid);
+int netdev_sw_port_fdb_add(struct net_device *dev,
+			   const unsigned char *addr, u16 vid);
+int netdev_sw_port_fdb_del(struct net_device *dev,
+			   const unsigned char *addr, u16 vid);
 
 #else
 
@@ -25,6 +30,18 @@ static inline int netdev_sw_parent_id_get(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline int netdev_sw_port_fdb_add(struct net_device *dev,
+					 const unsigned char *addr, u16 vid)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netdev_sw_port_fdb_del(struct net_device *dev,
+					 const unsigned char *addr, u16 vid)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 6f6c95c..f6f8bb5 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -24,6 +24,7 @@
 #include <linux/atomic.h>
 #include <asm/unaligned.h>
 #include <linux/if_vlan.h>
+#include <net/switchdev.h>
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
@@ -132,8 +133,12 @@ static void fdb_del_hw(struct net_bridge *br, const unsigned char *addr)
 
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
-	if (f->is_static)
+	if (f->is_static) {
 		fdb_del_hw(br, f->addr.addr);
+		if (f->dst)
+			netdev_sw_port_fdb_del(f->dst->dev,
+					       f->addr.addr, f->vlan_id);
+	}
 
 	hlist_del_rcu(&f->hlist);
 	fdb_notify(br, f, RTM_DELNEIGH);
@@ -755,18 +760,21 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
 				fdb_add_hw(br, addr);
+				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else if (state & NUD_NOARP) {
 			fdb->is_local = 0;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
 				fdb_add_hw(br, addr);
+				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else {
 			fdb->is_local = 0;
 			if (fdb->is_static) {
 				fdb->is_static = 0;
 				fdb_del_hw(br, addr);
+				netdev_sw_port_fdb_del(source->dev, addr, vid);
 			}
 		}
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5010f646..93d47b7 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -1,6 +1,7 @@
 /*
  * net/switchdev/switchdev.c - Switch device API
  * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -31,3 +32,43 @@ int netdev_sw_parent_id_get(struct net_device *dev,
 	return ops->ndo_sw_parent_id_get(dev, psid);
 }
 EXPORT_SYMBOL(netdev_sw_parent_id_get);
+
+/**
+ *	netdev_sw_port_fdb_add - Add a fdb into switch port
+ *	@dev: port device
+ *	@addr: mac address
+ *	@vid: vlan id
+ *
+ *	Add a fdb into switch port.
+ */
+int netdev_sw_port_fdb_add(struct net_device *dev,
+			   const unsigned char *addr, u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_fdb_add)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_fdb_add(dev, addr, vid);
+}
+EXPORT_SYMBOL(netdev_sw_port_fdb_add);
+
+/**
+ *	netdev_sw_port_fdb_del - Delete a fdb from switch port
+ *	@dev: port device
+ *	@addr: mac address
+ *	@vid: vlan id
+ *
+ *	Delete a fdb from switch port.
+ */
+int netdev_sw_port_fdb_del(struct net_device *dev,
+			   const unsigned char *addr, u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_fdb_del)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_fdb_del(dev, addr, vid);
+}
+EXPORT_SYMBOL(netdev_sw_port_fdb_del);
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

From: Scott Feldman <sfeldma@gmail.com>

To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
code then.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |  6 ++++++
 include/net/switchdev.h   |  6 ++++++
 net/bridge/br_netlink.c   |  2 ++
 net/bridge/br_stp.c       |  4 ++++
 net/bridge/br_stp_if.c    |  3 +++
 net/bridge/br_stp_timer.c |  2 ++
 net/switchdev/switchdev.c | 19 +++++++++++++++++++
 7 files changed, 42 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7de65c5..21e5869 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *			      const unsigned char *addr,
  *			      u16 vid);
  *	Called to delete a fdb from switch device port.
+ *
+ * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
+ *	Called to notify switch device port of bridge port STP
+ *	state change.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1193,6 +1197,8 @@ struct net_device_ops {
 	int			(*ndo_sw_port_fdb_del)(struct net_device *dev,
 						       const unsigned char *addr,
 						       u16 vid);
+	int			(*ndo_sw_port_stp_update)(struct net_device *dev,
+							  u8 state);
 #endif
 };
 
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 130cef7..bbf7369 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -21,6 +21,7 @@ int netdev_sw_port_fdb_add(struct net_device *dev,
 			   const unsigned char *addr, u16 vid);
 int netdev_sw_port_fdb_del(struct net_device *dev,
 			   const unsigned char *addr, u16 vid);
+int netdev_sw_port_stp_update(struct net_device *dev, u8 state);
 
 #else
 
@@ -42,6 +43,11 @@ static inline int netdev_sw_port_fdb_del(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline int netdev_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 86c239b..13fecf1 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -17,6 +17,7 @@
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <uapi/linux/if_bridge.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -304,6 +305,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
 
 	br_set_state(p, state);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_port_state_selection(p->br);
 	return 0;
 }
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 2b047bc..c00139b 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -12,6 +12,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/rculist.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -114,6 +115,7 @@ static void br_root_port_block(const struct net_bridge *br,
 
 	br_set_state(p, BR_STATE_LISTENING);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	if (br->forward_delay > 0)
@@ -394,6 +396,7 @@ static void br_make_blocking(struct net_bridge_port *p)
 
 		br_set_state(p, BR_STATE_BLOCKING);
 		br_log_state(p);
+		netdev_sw_port_stp_update(p->dev, p->state);
 		br_ifinfo_notify(RTM_NEWLINK, p);
 
 		del_timer(&p->forward_delay_timer);
@@ -419,6 +422,7 @@ static void br_make_forwarding(struct net_bridge_port *p)
 
 	br_multicast_enable_port(p);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	if (br->forward_delay != 0)
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 4114687..91279f8 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -15,6 +15,7 @@
 #include <linux/kmod.h>
 #include <linux/etherdevice.h>
 #include <linux/rtnetlink.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -89,6 +90,7 @@ void br_stp_enable_port(struct net_bridge_port *p)
 	br_init_port(p);
 	br_port_state_selection(p->br);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 }
 
@@ -105,6 +107,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
 	p->config_pending = 0;
 
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	del_timer(&p->message_age_timer);
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 4fcaa67..5bb8997 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/times.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -97,6 +98,7 @@ static void br_forward_delay_timer_expired(unsigned long arg)
 		netif_carrier_on(br->dev);
 	}
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 	spin_unlock(&br->lock);
 }
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 93d47b7..75997a5 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -72,3 +72,22 @@ int netdev_sw_port_fdb_del(struct net_device *dev,
 	return ops->ndo_sw_port_fdb_del(dev, addr, vid);
 }
 EXPORT_SYMBOL(netdev_sw_port_fdb_del);
+
+/**
+ *	netdev_sw_port_stp_update - Notify switch device port of STP
+ *				    state change
+ *	@dev: port device
+ *	@state: port STP state
+ *
+ *	Notify switch device port of bridge port STP state change.
+ */
+int netdev_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_stp_update)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_stp_update(dev, state);
+}
+EXPORT_SYMBOL(netdev_sw_port_stp_update);
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 05/10] rocker: introduce rocker switch driver
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

This patch introduces the first driver to benefit from the switchdev
infrastructure and to implement newly introduced switch ndos. This is a
driver for emulated switch chip implemented in qemu:
https://github.com/sfeldma/qemu-rocker/

This patch is a result of joint work with Scott Feldman.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 MAINTAINERS                          |    7 +
 drivers/net/ethernet/Kconfig         |    1 +
 drivers/net/ethernet/Makefile        |    1 +
 drivers/net/ethernet/rocker/Kconfig  |   27 +
 drivers/net/ethernet/rocker/Makefile |    5 +
 drivers/net/ethernet/rocker/rocker.c | 2062 ++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h |  427 +++++++
 7 files changed, 2530 insertions(+)
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 776e078..7e15b50 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7807,6 +7807,13 @@ F:	drivers/hid/hid-roccat*
 F:	include/linux/hid-roccat*
 F:	Documentation/ABI/*/sysfs-driver-hid-roccat*
 
+ROCKER DRIVER
+M:	Jiri Pirko <jiri@resnulli.us>
+M:	Scott Feldman <sfeldma@gmail.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/ethernet/rocker/
+
 ROCKETPORT DRIVER
 P:	Comtrol Corp.
 W:	http://www.comtrol.com
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 1ed1fbb..df76050 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -155,6 +155,7 @@ source "drivers/net/ethernet/qualcomm/Kconfig"
 source "drivers/net/ethernet/realtek/Kconfig"
 source "drivers/net/ethernet/renesas/Kconfig"
 source "drivers/net/ethernet/rdc/Kconfig"
+source "drivers/net/ethernet/rocker/Kconfig"
 
 config S6GMAC
 	tristate "S6105 GMAC ethernet support"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 6e0b629..bf56f8b 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_NET_VENDOR_QUALCOMM) += qualcomm/
 obj-$(CONFIG_NET_VENDOR_REALTEK) += realtek/
 obj-$(CONFIG_SH_ETH) += renesas/
 obj-$(CONFIG_NET_VENDOR_RDC) += rdc/
+obj-$(CONFIG_NET_VENDOR_ROCKER) += rocker/
 obj-$(CONFIG_S6GMAC) += s6gmac.o
 obj-$(CONFIG_NET_VENDOR_SAMSUNG) += samsung/
 obj-$(CONFIG_NET_VENDOR_SEEQ) += seeq/
diff --git a/drivers/net/ethernet/rocker/Kconfig b/drivers/net/ethernet/rocker/Kconfig
new file mode 100644
index 0000000..11a850e
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Kconfig
@@ -0,0 +1,27 @@
+#
+# Rocker device configuration
+#
+
+config NET_VENDOR_ROCKER
+	bool "Rocker devices"
+	default y
+	---help---
+	  If you have a network device belonging to this class, say Y.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Rocker devices. If you say Y, you will be asked for
+	  your specific card in the following questions.
+
+if NET_VENDOR_ROCKER
+
+config ROCKER
+	tristate "Rocker switch driver (EXPERIMENTAL)"
+	depends on PCI && NET_SWITCHDEV
+	---help---
+	  This driver supports Rocker switch device.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called rocker.
+
+endif # NET_VENDOR_ROCKER
diff --git a/drivers/net/ethernet/rocker/Makefile b/drivers/net/ethernet/rocker/Makefile
new file mode 100644
index 0000000..f85fb12
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Rocker network device drivers.
+#
+
+obj-$(CONFIG_ROCKER) += rocker.o
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
new file mode 100644
index 0000000..797481b
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -0,0 +1,2062 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.c - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/spinlock.h>
+#include <linux/crc32.h>
+#include <linux/sort.h>
+#include <linux/random.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <net/switchdev.h>
+#include <net/rtnetlink.h>
+#include <asm-generic/io-64-nonatomic-lo-hi.h>
+#include <generated/utsrelease.h>
+
+#include "rocker.h"
+
+static const char rocker_driver_name[] = "rocker";
+
+static const struct pci_device_id rocker_pci_id_table[] = {
+	{PCI_VDEVICE(REDHAT, PCI_DEVICE_ID_REDHAT_ROCKER), 0},
+	{0, }
+};
+
+struct rocker_desc_info {
+	char *data; /* mapped */
+	size_t data_size;
+	size_t tlv_size;
+	struct rocker_desc *desc;
+	DEFINE_DMA_UNMAP_ADDR(mapaddr);
+};
+
+struct rocker_dma_ring_info {
+	size_t size;
+	u32 head;
+	u32 tail;
+	struct rocker_desc *desc; /* mapped */
+	dma_addr_t mapaddr;
+	struct rocker_desc_info *desc_info;
+	unsigned int type;
+};
+
+struct rocker;
+
+struct rocker_port {
+	struct net_device *dev;
+	struct rocker *rocker;
+	unsigned port_number;
+	u32 lport;
+	struct napi_struct napi_tx;
+	struct napi_struct napi_rx;
+	struct rocker_dma_ring_info tx_ring;
+	struct rocker_dma_ring_info rx_ring;
+};
+
+struct rocker {
+	struct pci_dev *pdev;
+	u8 __iomem *hw_addr;
+	struct msix_entry *msix_entries;
+	unsigned port_count;
+	struct rocker_port **ports;
+	struct {
+		u64 id;
+	} hw;
+	spinlock_t cmd_ring_lock;
+	struct rocker_dma_ring_info cmd_ring;
+	struct rocker_dma_ring_info event_ring;
+};
+
+struct rocker_wait {
+	wait_queue_head_t wait;
+	bool done;
+	bool nowait;
+};
+
+static void rocker_wait_reset(struct rocker_wait *wait)
+{
+	wait->done = false;
+	wait->nowait = false;
+}
+
+static void rocker_wait_init(struct rocker_wait *wait)
+{
+	init_waitqueue_head(&wait->wait);
+	rocker_wait_reset(wait);
+}
+
+static struct rocker_wait *rocker_wait_create(gfp_t gfp)
+{
+	struct rocker_wait *wait;
+
+	wait = kmalloc(sizeof(*wait), gfp);
+	if (!wait)
+		return NULL;
+	rocker_wait_init(wait);
+	return wait;
+}
+
+static void rocker_wait_destroy(struct rocker_wait *work)
+{
+	kfree(work);
+}
+
+static bool rocker_wait_event_timeout(struct rocker_wait *wait,
+				      unsigned long timeout)
+{
+	wait_event_timeout(wait->wait, wait->done, HZ / 10);
+	if (!wait->done)
+		return false;
+	return true;
+}
+
+static void rocker_wait_wake_up(struct rocker_wait *wait)
+{
+	wait->done = true;
+	wake_up(&wait->wait);
+}
+
+static u32 rocker_msix_vector(struct rocker *rocker, unsigned vector)
+{
+	return rocker->msix_entries[vector].vector;
+}
+
+static u32 rocker_msix_tx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_TX(rocker_port->port_number));
+}
+
+static u32 rocker_msix_rx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_RX(rocker_port->port_number));
+}
+
+#define rocker_write32(rocker, reg, val)	\
+	writel((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read32(rocker, reg)	\
+	readl((rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_write64(rocker, reg, val)	\
+	writeq((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read64(rocker, reg)	\
+	readq((rocker)->hw_addr + (ROCKER_ ## reg))
+
+/*****************************
+ * HW basic testing functions
+ *****************************/
+
+static int rocker_reg_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	u64 test_reg;
+	u64 rnd;
+
+	rnd = prandom_u32();
+	rnd >>= 1;
+	rocker_write32(rocker, TEST_REG, rnd);
+	test_reg = rocker_read32(rocker, TEST_REG);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 32bit register value %08llx, expected %08llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	rnd = prandom_u32();
+	rnd <<= 31;
+	rnd |= prandom_u32();
+	rocker_write64(rocker, TEST_REG64, rnd);
+	test_reg = rocker_read64(rocker, TEST_REG64);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 64bit register value %16llx, expected %16llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int rocker_dma_test_one(struct rocker *rocker, struct rocker_wait *wait,
+			       u32 test_type, dma_addr_t dma_handle,
+			       unsigned char *buf, unsigned char *expect,
+			       size_t size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	rocker_wait_reset(wait);
+	rocker_write32(rocker, TEST_DMA_CTRL, test_type);
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within a timeout\n");
+		return -EIO;
+	}
+
+	for (i = 0; i < size; i++) {
+		if (buf[i] != expect[i]) {
+			dev_err(&pdev->dev, "unexpected memory content %02x at byte %x\n, %02x expected",
+				buf[i], i, expect[i]);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+#define ROCKER_TEST_DMA_BUF_SIZE (PAGE_SIZE * 4)
+#define ROCKER_TEST_DMA_FILL_PATTERN 0x96
+
+static int rocker_dma_test_offset(struct rocker *rocker,
+				  struct rocker_wait *wait, int offset)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	unsigned char *alloc;
+	unsigned char *buf;
+	unsigned char *expect;
+	dma_addr_t dma_handle;
+	int i;
+	int err;
+
+	alloc = kzalloc(ROCKER_TEST_DMA_BUF_SIZE * 2 + offset,
+			GFP_KERNEL | GFP_DMA);
+	if (!alloc)
+		return -ENOMEM;
+	buf = alloc + offset;
+	expect = buf + ROCKER_TEST_DMA_BUF_SIZE;
+
+	dma_handle = pci_map_single(pdev, buf, ROCKER_TEST_DMA_BUF_SIZE,
+				    PCI_DMA_BIDIRECTIONAL);
+	if (pci_dma_mapping_error(pdev, dma_handle)) {
+		err = -EIO;
+		goto free_alloc;
+	}
+
+	rocker_write64(rocker, TEST_DMA_ADDR, dma_handle);
+	rocker_write32(rocker, TEST_DMA_SIZE, ROCKER_TEST_DMA_BUF_SIZE);
+
+	memset(expect, ROCKER_TEST_DMA_FILL_PATTERN, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_FILL,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	memset(expect, 0, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_CLEAR,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	prandom_bytes(buf, ROCKER_TEST_DMA_BUF_SIZE);
+	for (i = 0; i < ROCKER_TEST_DMA_BUF_SIZE; i++)
+		expect[i] = ~buf[i];
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_INVERT,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+unmap:
+	pci_unmap_single(pdev, dma_handle, ROCKER_TEST_DMA_BUF_SIZE,
+			 PCI_DMA_BIDIRECTIONAL);
+free_alloc:
+	kfree(alloc);
+
+	return err;
+}
+
+static int rocker_dma_test(struct rocker *rocker, struct rocker_wait *wait)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < 8; i++) {
+		err = rocker_dma_test_offset(rocker, wait, i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static irqreturn_t rocker_test_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_wait *wait = dev_id;
+
+	rocker_wait_wake_up(wait);
+
+	return IRQ_HANDLED;
+}
+
+static int rocker_basic_hw_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_wait wait;
+	int err;
+
+	err = rocker_reg_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "reg test failed\n");
+		return err;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST),
+			  rocker_test_irq_handler, 0,
+			  rocker_driver_name, &wait);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign test irq\n");
+		return err;
+	}
+
+	rocker_wait_init(&wait);
+	rocker_write32(rocker, TEST_IRQ, ROCKER_MSIX_VEC_TEST);
+
+	if (!rocker_wait_event_timeout(&wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within a timeout\n");
+		err = -EIO;
+		goto free_irq;
+	}
+
+	err = rocker_dma_test(rocker, &wait);
+	if (err)
+		dev_err(&pdev->dev, "dma test failed\n");
+
+free_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST), &wait);
+	return err;
+}
+
+/******
+ * TLV
+ ******/
+
+#define ROCKER_TLV_ALIGNTO 8U
+#define ROCKER_TLV_ALIGN(len) \
+	(((len) + ROCKER_TLV_ALIGNTO - 1) & ~(ROCKER_TLV_ALIGNTO - 1))
+#define ROCKER_TLV_HDRLEN ROCKER_TLV_ALIGN(sizeof(struct rocker_tlv))
+
+/*  <------- ROCKER_TLV_HDRLEN -------> <--- ROCKER_TLV_ALIGN(payload) --->
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ * |             Header          | Pad |           Payload           | Pad |
+ * |      (struct rocker_tlv)    | ing |                             | ing |
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ *  <--------------------------- tlv->len -------------------------->
+ */
+
+static struct rocker_tlv *rocker_tlv_next(const struct rocker_tlv *tlv,
+					  int *remaining)
+{
+	int totlen = ROCKER_TLV_ALIGN(tlv->len);
+
+	*remaining -= totlen;
+	return (struct rocker_tlv *) ((char *) tlv + totlen);
+}
+
+static int rocker_tlv_ok(const struct rocker_tlv *tlv, int remaining)
+{
+	return remaining >= (int) ROCKER_TLV_HDRLEN &&
+	       tlv->len >= ROCKER_TLV_HDRLEN &&
+	       tlv->len <= remaining;
+}
+
+#define rocker_tlv_for_each(pos, head, len, rem)	\
+	for (pos = head, rem = len;			\
+	     rocker_tlv_ok(pos, rem);			\
+	     pos = rocker_tlv_next(pos, &(rem)))
+
+#define rocker_tlv_for_each_nested(pos, tlv, rem)	\
+	rocker_tlv_for_each(pos, rocker_tlv_data(tlv),	\
+			    rocker_tlv_len(tlv), rem)
+
+static int rocker_tlv_attr_size(int payload)
+{
+	return ROCKER_TLV_HDRLEN + payload;
+}
+
+static int rocker_tlv_total_size(int payload)
+{
+	return ROCKER_TLV_ALIGN(rocker_tlv_attr_size(payload));
+}
+
+static int rocker_tlv_padlen(int payload)
+{
+	return rocker_tlv_total_size(payload) - rocker_tlv_attr_size(payload);
+}
+
+static int rocker_tlv_type(const struct rocker_tlv *tlv)
+{
+	return tlv->type;
+}
+
+static void *rocker_tlv_data(const struct rocker_tlv *tlv)
+{
+	return (char *) tlv + ROCKER_TLV_HDRLEN;
+}
+
+static int rocker_tlv_len(const struct rocker_tlv *tlv)
+{
+	return tlv->len - ROCKER_TLV_HDRLEN;
+}
+
+static u8 rocker_tlv_get_u8(const struct rocker_tlv *tlv)
+{
+	return *(u8 *) rocker_tlv_data(tlv);
+}
+
+static u16 rocker_tlv_get_u16(const struct rocker_tlv *tlv)
+{
+	return *(u16 *) rocker_tlv_data(tlv);
+}
+
+static u32 rocker_tlv_get_u32(const struct rocker_tlv *tlv)
+{
+	return *(u32 *) rocker_tlv_data(tlv);
+}
+
+static u64 rocker_tlv_get_u64(const struct rocker_tlv *tlv)
+{
+	return *(u64 *) rocker_tlv_data(tlv);
+}
+
+static void rocker_tlv_parse(struct rocker_tlv **tb, int maxtype,
+			     const char *buf, int buf_len)
+{
+	const struct rocker_tlv *tlv;
+	const struct rocker_tlv *head = (const struct rocker_tlv *) buf;
+	int rem;
+
+	memset(tb, 0, sizeof(struct rocker_tlv *) * (maxtype + 1));
+
+	rocker_tlv_for_each(tlv, head, buf_len, rem) {
+		u32 type = rocker_tlv_type(tlv);
+
+		if (type > 0 && type <= maxtype)
+			tb[type] = (struct rocker_tlv *) tlv;
+	}
+}
+
+static void rocker_tlv_parse_nested(struct rocker_tlv **tb, int maxtype,
+				    const struct rocker_tlv *tlv)
+{
+	rocker_tlv_parse(tb, maxtype, rocker_tlv_data(tlv),
+			 rocker_tlv_len(tlv));
+}
+
+static void rocker_tlv_parse_desc(struct rocker_tlv **tb, int maxtype,
+				  struct rocker_desc_info *desc_info)
+{
+	rocker_tlv_parse(tb, maxtype, desc_info->data,
+			 desc_info->desc->tlv_size);
+}
+
+static struct rocker_tlv *rocker_tlv_start(struct rocker_desc_info *desc_info)
+{
+	return (struct rocker_tlv *) ((char *) desc_info->data +
+					       desc_info->tlv_size);
+}
+
+static int rocker_tlv_put(struct rocker_desc_info *desc_info,
+			  int attrtype, int attrlen, const void *data)
+{
+	int tail_room = desc_info->data_size - desc_info->tlv_size;
+	int total_size = rocker_tlv_total_size(attrlen);
+	struct rocker_tlv *tlv;
+
+	if (unlikely(tail_room < total_size))
+		return -EMSGSIZE;
+
+	tlv = rocker_tlv_start(desc_info);
+	desc_info->tlv_size += total_size;
+	tlv->type = attrtype;
+	tlv->len = rocker_tlv_attr_size(attrlen);
+	memcpy(rocker_tlv_data(tlv), data, attrlen);
+	memset((char *) tlv + tlv->len, 0, rocker_tlv_padlen(attrlen));
+	return 0;
+}
+
+static int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
+			     int attrtype, u8 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
+}
+
+static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
+			      int attrtype, u16 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
+}
+
+static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
+			      int attrtype, u32 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
+}
+
+static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
+			      int attrtype, u64 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
+}
+
+static struct rocker_tlv *
+rocker_tlv_nest_start(struct rocker_desc_info *desc_info, int attrtype)
+{
+	struct rocker_tlv *start = rocker_tlv_start(desc_info);
+
+	if (rocker_tlv_put(desc_info, attrtype, 0, NULL) < 0)
+		return NULL;
+
+	return start;
+}
+
+static void rocker_tlv_nest_end(struct rocker_desc_info *desc_info,
+				struct rocker_tlv *start)
+{
+	start->len = (char *) rocker_tlv_start(desc_info) - (char *) start;
+}
+
+static void rocker_tlv_nest_cancel(struct rocker_desc_info *desc_info,
+				   struct rocker_tlv *start)
+{
+	desc_info->tlv_size = (char *) start - desc_info->data;
+}
+
+/******************************************
+ * DMA rings and descriptors manipulations
+ ******************************************/
+
+static u32 __pos_inc(u32 pos, size_t limit)
+{
+	return ++pos == limit ? 0 : pos;
+}
+
+static int rocker_desc_err(struct rocker_desc_info *desc_info)
+{
+	return -(desc_info->desc->comp_err & ~ROCKER_DMA_DESC_COMP_ERR_GEN);
+}
+
+static void rocker_desc_gen_clear(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->comp_err &= ~ROCKER_DMA_DESC_COMP_ERR_GEN;
+}
+
+static bool rocker_desc_gen(struct rocker_desc_info *desc_info)
+{
+	u32 comp_err = desc_info->desc->comp_err;
+
+	return comp_err & ROCKER_DMA_DESC_COMP_ERR_GEN ? true : false;
+}
+
+static void *rocker_desc_cookie_ptr_get(struct rocker_desc_info *desc_info)
+{
+	return (void *) desc_info->desc->cookie;
+}
+
+static void rocker_desc_cookie_ptr_set(struct rocker_desc_info *desc_info,
+				       void *ptr)
+{
+	desc_info->desc->cookie = (long) ptr;
+}
+
+static struct rocker_desc_info *
+rocker_desc_head_get(struct rocker_dma_ring_info *info)
+{
+	static struct rocker_desc_info *desc_info;
+	u32 head = __pos_inc(info->head, info->size);
+
+	desc_info = &info->desc_info[info->head];
+	if (head == info->tail)
+		return NULL; /* ring full */
+	desc_info->tlv_size = 0;
+	return desc_info;
+}
+
+static void rocker_desc_commit(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->buf_size = desc_info->data_size;
+	desc_info->desc->tlv_size = desc_info->tlv_size;
+}
+
+static void rocker_desc_head_set(struct rocker *rocker,
+				 struct rocker_dma_ring_info *info,
+				 struct rocker_desc_info *desc_info)
+{
+	u32 head = __pos_inc(info->head, info->size);
+
+	BUG_ON(head == info->tail);
+	rocker_desc_commit(desc_info);
+	info->head = head;
+	rocker_write32(rocker, DMA_DESC_HEAD(info->type), head);
+}
+
+static struct rocker_desc_info *
+rocker_desc_tail_get(struct rocker_dma_ring_info *info)
+{
+	static struct rocker_desc_info *desc_info;
+
+	if (info->tail == info->head)
+		return NULL; /* nothing to be done between head and tail */
+	desc_info = &info->desc_info[info->tail];
+	if (!rocker_desc_gen(desc_info))
+		return NULL; /* gen bit not set, desc is not ready yet */
+	info->tail = __pos_inc(info->tail, info->size);
+	desc_info->tlv_size = desc_info->desc->tlv_size;
+	return desc_info;
+}
+
+static void rocker_dma_ring_credits_set(struct rocker *rocker,
+					struct rocker_dma_ring_info *info,
+					u32 credits)
+{
+	if (credits)
+		rocker_write32(rocker, DMA_DESC_CREDITS(info->type), credits);
+}
+
+static unsigned long rocker_dma_ring_size_fix(size_t size)
+{
+	return max(ROCKER_DMA_SIZE_MIN,
+		   min(roundup_pow_of_two(size), ROCKER_DMA_SIZE_MAX));
+}
+
+static int rocker_dma_ring_create(struct rocker *rocker,
+				  unsigned int type,
+				  size_t size,
+				  struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(size != rocker_dma_ring_size_fix(size));
+	info->size = size;
+	info->type = type;
+	info->head = 0;
+	info->tail = 0;
+	info->desc_info = kcalloc(info->size, sizeof(*info->desc_info),
+				  GFP_KERNEL);
+	if (!info->desc_info)
+		return -ENOMEM;
+
+	info->desc = pci_alloc_consistent(rocker->pdev,
+					  info->size * sizeof(*info->desc),
+					  &info->mapaddr);
+	if (!info->desc) {
+		kfree(info->desc_info);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < info->size; i++)
+		info->desc_info[i].desc = &info->desc[i];
+
+	rocker_write32(rocker, DMA_DESC_CTRL(info->type),
+		       ROCKER_DMA_DESC_CTRL_RESET);
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), info->mapaddr);
+	rocker_write32(rocker, DMA_DESC_SIZE(info->type), info->size);
+
+	return 0;
+}
+
+static void rocker_dma_ring_destroy(struct rocker *rocker,
+				    struct rocker_dma_ring_info *info)
+{
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), 0);
+
+	pci_free_consistent(rocker->pdev,
+			    info->size * sizeof(struct rocker_desc),
+			    info->desc, info->mapaddr);
+	kfree(info->desc_info);
+}
+
+static void rocker_dma_ring_pass_to_producer(struct rocker *rocker,
+					     struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(info->head || info->tail);
+
+	/* When ring is consumer, we need to advance head for each desc.
+	 * That tells hw that the desc is ready to be used by it.
+	 */
+	for (i = 0; i < info->size - 1; i++)
+		rocker_desc_head_set(rocker, info, &info->desc_info[i]);
+	rocker_desc_commit(&info->desc_info[i]);
+}
+
+static int rocker_dma_ring_bufs_alloc(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction, size_t buf_size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+	int err;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+		dma_addr_t dma_handle;
+		char *buf;
+
+		buf = kzalloc(buf_size, GFP_KERNEL | GFP_DMA);
+		if (!buf) {
+			err = -ENOMEM;
+			goto rollback;
+		}
+
+		dma_handle = pci_map_single(pdev, buf, buf_size, direction);
+		if (pci_dma_mapping_error(pdev, dma_handle)) {
+			kfree(buf);
+			err = -EIO;
+			goto rollback;
+		}
+
+		desc_info->data = buf;
+		desc_info->data_size = buf_size;
+		dma_unmap_addr_set(desc_info, mapaddr, dma_handle);
+
+		desc->buf_addr = dma_handle;
+		desc->buf_size = buf_size;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+	return err;
+}
+
+static void rocker_dma_ring_bufs_free(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+
+		desc->buf_addr = 0;
+		desc->buf_size = 0;
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+}
+
+static int rocker_dma_rings_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_CMD,
+				     ROCKER_DMA_CMD_DEFAULT_SIZE,
+				     &rocker->cmd_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create command dma ring\n");
+		return err;
+	}
+
+	spin_lock_init(&rocker->cmd_ring_lock);
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->cmd_ring,
+					 PCI_DMA_BIDIRECTIONAL, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc command dma ring buffers\n");
+		goto err_dma_cmd_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_EVENT,
+				     ROCKER_DMA_EVENT_DEFAULT_SIZE,
+				     &rocker->event_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create event dma ring\n");
+		goto err_dma_event_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->event_ring,
+					 PCI_DMA_FROMDEVICE, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc event dma ring buffers\n");
+		goto err_dma_event_ring_bufs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker->event_ring);
+	return 0;
+
+err_dma_event_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+err_dma_event_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_cmd_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+	return err;
+}
+
+static void rocker_dma_rings_fini(struct rocker *rocker)
+{
+	rocker_dma_ring_bufs_free(rocker, &rocker->event_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+}
+
+static int rocker_dma_rx_ring_skb_map(struct rocker *rocker,
+				      struct rocker_port *rocker_port,
+				      struct rocker_desc_info *desc_info,
+				      struct sk_buff *skb, size_t buf_len)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+
+	dma_handle = pci_map_single(pdev, skb->data, buf_len,
+				    PCI_DMA_FROMDEVICE);
+	if (pci_dma_mapping_error(pdev, dma_handle))
+		return -EIO;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_RX_FRAG_ADDR, dma_handle))
+		goto tlv_put_failure;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_RX_FRAG_MAX_LEN, buf_len))
+		goto tlv_put_failure;
+	return 0;
+
+tlv_put_failure:
+	pci_unmap_single(pdev, dma_handle, buf_len, PCI_DMA_FROMDEVICE);
+	desc_info->tlv_size = 0;
+	return -EMSGSIZE;
+}
+
+static size_t rocker_port_rx_buf_len(struct rocker_port *rocker_port)
+{
+	return rocker_port->dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+}
+
+static int rocker_dma_rx_ring_skb_alloc(struct rocker *rocker,
+					struct rocker_port *rocker_port,
+					struct rocker_desc_info *desc_info)
+{
+	struct net_device *dev = rocker_port->dev;
+	struct sk_buff *skb;
+	size_t buf_len = rocker_port_rx_buf_len(rocker_port);
+	int err;
+
+	/* Ensure that hw will see tlv_size zero in case of an error.
+	 * That tells hw to use another descriptor.
+	 */
+	rocker_desc_cookie_ptr_set(desc_info, NULL);
+	desc_info->tlv_size = 0;
+
+	skb = netdev_alloc_skb_ip_align(dev, buf_len);
+	if (!skb)
+		return -ENOMEM;
+	err = rocker_dma_rx_ring_skb_map(rocker, rocker_port, desc_info,
+					 skb, buf_len);
+	if (err) {
+		dev_kfree_skb_any(skb);
+		return err;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+	return 0;
+}
+
+static void rocker_dma_rx_ring_skb_unmap(struct rocker *rocker,
+					 struct rocker_tlv **attrs)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	size_t len;
+
+	if (!attrs[ROCKER_TLV_RX_FRAG_ADDR] ||
+	    !attrs[ROCKER_TLV_RX_FRAG_MAX_LEN])
+		return;
+	dma_handle = rocker_tlv_get_u64(attrs[ROCKER_TLV_RX_FRAG_ADDR]);
+	len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_MAX_LEN]);
+	pci_unmap_single(pdev, dma_handle, len, PCI_DMA_FROMDEVICE);
+}
+
+static void rocker_dma_rx_ring_skb_free(struct rocker *rocker,
+					struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+
+	if (!skb)
+		return;
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+	dev_kfree_skb_any(skb);
+}
+
+static int rocker_dma_rx_ring_skbs_alloc(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+	int err;
+
+	for (i = 0; i < rx_ring->size; i++) {
+		err = rocker_dma_rx_ring_skb_alloc(rocker, rocker_port,
+						   &rx_ring->desc_info[i]);
+		if (err)
+			goto rollback;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+	return err;
+}
+
+static void rocker_dma_rx_ring_skbs_free(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+
+	for (i = 0; i < rx_ring->size; i++)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+}
+
+static int rocker_port_dma_rings_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	int err;
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_TX(rocker_port->port_number),
+				     ROCKER_DMA_TX_DEFAULT_SIZE,
+				     &rocker_port->tx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create tx dma ring\n");
+		return err;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->tx_ring,
+					 PCI_DMA_TODEVICE,
+					 ROCKER_DMA_TX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc tx dma ring buffers\n");
+		goto err_dma_tx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_RX(rocker_port->port_number),
+				     ROCKER_DMA_RX_DEFAULT_SIZE,
+				     &rocker_port->rx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create rx dma ring\n");
+		goto err_dma_rx_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->rx_ring,
+					 PCI_DMA_BIDIRECTIONAL,
+					 ROCKER_DMA_RX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring buffers\n");
+		goto err_dma_rx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_rx_ring_skbs_alloc(rocker, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring skbs\n");
+		goto err_dma_rx_ring_skbs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker_port->rx_ring);
+
+	return 0;
+
+err_dma_rx_ring_skbs_alloc:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_rx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+err_dma_rx_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+err_dma_tx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+	return err;
+}
+
+static void rocker_port_dma_rings_fini(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+
+	rocker_dma_rx_ring_skbs_free(rocker, rocker_port);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+}
+
+static void rocker_port_set_enable(struct rocker_port *rocker_port, bool enable)
+{
+	u64 val = rocker_read64(rocker_port->rocker, PORT_PHYS_ENABLE);
+
+	if (enable)
+		val |= 1 << rocker_port->lport;
+	else
+		val &= ~(1 << rocker_port->lport);
+	rocker_write64(rocker_port->rocker, PORT_PHYS_ENABLE, val);
+}
+
+/********************************
+ * Interrupt handler and helpers
+ ********************************/
+
+static irqreturn_t rocker_cmd_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	u32 credits = 0;
+
+	spin_lock(&rocker->cmd_ring_lock);
+	while ((desc_info = rocker_desc_tail_get(&rocker->cmd_ring))) {
+		wait = rocker_desc_cookie_ptr_get(desc_info);
+		if (wait->nowait) {
+			rocker_desc_gen_clear(desc_info);
+			rocker_wait_destroy(wait);
+		} else {
+			rocker_wait_wake_up(wait);
+		}
+		credits++;
+	}
+	spin_unlock(&rocker->cmd_ring_lock);
+	rocker_dma_ring_credits_set(rocker, &rocker->cmd_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static void rocker_port_link_up(struct rocker_port *rocker_port)
+{
+	netif_carrier_on(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is up\n");
+}
+
+static void rocker_port_link_down(struct rocker_port *rocker_port)
+{
+	netif_carrier_off(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is down\n");
+}
+
+static int rocker_event_link_change(struct rocker *rocker,
+				    const struct rocker_tlv *info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_LINK_CHANGED_MAX + 1];
+	unsigned port_number;
+	bool link_up;
+	struct rocker_port *rocker_port;
+
+	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_LINK_CHANGED_MAX, info);
+	if (!attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT] ||
+	    !attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP])
+		return -EIO;
+	port_number =
+		rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT]) - 1;
+	link_up = rocker_tlv_get_u8(attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP]);
+
+	if (port_number >= rocker->port_count)
+		return -EINVAL;
+
+	rocker_port = rocker->ports[port_number];
+	if (netif_carrier_ok(rocker_port->dev) != link_up) {
+		if (link_up)
+			rocker_port_link_up(rocker_port);
+		else
+			rocker_port_link_down(rocker_port);
+	}
+
+	return 0;
+}
+
+static int rocker_event_process(struct rocker *rocker,
+				struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAX + 1];
+	struct rocker_tlv *info;
+	u16 type;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_EVENT_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_EVENT_TYPE] ||
+	    !attrs[ROCKER_TLV_EVENT_INFO])
+		return -EIO;
+
+	type = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_TYPE]);
+	info = attrs[ROCKER_TLV_EVENT_INFO];
+
+	switch (type) {
+	case ROCKER_TLV_EVENT_TYPE_LINK_CHANGED:
+		return rocker_event_link_change(rocker, info);
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static irqreturn_t rocker_event_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	while ((desc_info = rocker_desc_tail_get(&rocker->event_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			dev_err(&pdev->dev, "event desc received with err %d\n",
+				err);
+		} else {
+			err = rocker_event_process(rocker, desc_info);
+			if (err)
+				dev_err(&pdev->dev, "event processing failed with err %d\n",
+					err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker->event_ring, desc_info);
+		credits++;
+	}
+	rocker_dma_ring_credits_set(rocker, &rocker->event_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_tx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_tx);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_rx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_rx);
+	return IRQ_HANDLED;
+}
+
+/********************
+ * Command interface
+ ********************/
+
+typedef int (*rocker_cmd_cb_t)(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info,
+			       void *priv);
+
+static int rocker_cmd_exec(struct rocker *rocker,
+			   struct rocker_port *rocker_port,
+			   rocker_cmd_cb_t prepare, void *prepare_priv,
+			   rocker_cmd_cb_t process, void *process_priv,
+			   bool nowait)
+{
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	unsigned long flags;
+	int err;
+
+	wait = rocker_wait_create(nowait ? GFP_ATOMIC : GFP_KERNEL);
+	if (!wait)
+		return -ENOMEM;
+	wait->nowait = nowait;
+
+	spin_lock_irqsave(&rocker->cmd_ring_lock, flags);
+	desc_info = rocker_desc_head_get(&rocker->cmd_ring);
+	if (!desc_info) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		err = -EAGAIN;
+		goto out;
+	}
+	err = prepare(rocker, rocker_port, desc_info, prepare_priv);
+	if (err) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		goto out;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, wait);
+	rocker_desc_head_set(rocker, &rocker->cmd_ring, desc_info);
+	spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+
+	if (nowait)
+		return 0;
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10))
+		return -EIO;
+
+	err = rocker_desc_err(desc_info);
+	if (err)
+		return err;
+
+	if (process)
+		err = process(rocker, rocker_port, desc_info, process_priv);
+
+	rocker_desc_gen_clear(desc_info);
+out:
+	rocker_wait_destroy(wait);
+	return err;
+}
+
+static int
+rocker_cmd_get_port_settings_prep(struct rocker *rocker,
+				  struct rocker_port *rocker_port,
+				  struct rocker_desc_info *desc_info,
+				  void *priv)
+{
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_ethtool_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	u32 speed;
+	u8 duplex;
+	u8 autoneg;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	if (!info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG])
+		return -EIO;
+
+	speed = rocker_tlv_get_u32(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED]);
+	duplex = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX]);
+	autoneg = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG]);
+
+	ecmd->transceiver = XCVR_INTERNAL;
+	ecmd->supported = SUPPORTED_TP;
+	ecmd->phy_address = 0xff;
+	ecmd->port = PORT_TP;
+	ethtool_cmd_speed_set(ecmd, speed);
+	ecmd->duplex = duplex ? DUPLEX_FULL : DUPLEX_HALF;
+	ecmd->autoneg = autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_macaddr_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	struct rocker_tlv *attr;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	attr = info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR];
+	if (!attr)
+		return -EIO;
+
+	if (rocker_tlv_len(attr) != ETH_ALEN)
+		return -EINVAL;
+
+	ether_addr_copy(macaddr, rocker_tlv_data(attr));
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_ethtool_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,
+			       ethtool_cmd_speed(ecmd)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,
+			      ecmd->duplex))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,
+			      ecmd->autoneg))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_macaddr_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,
+			   ETH_ALEN, macaddr))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int rocker_cmd_get_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_ethtool_proc,
+			       ecmd, false);
+}
+
+static int rocker_cmd_get_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_macaddr_proc,
+			       macaddr, false);
+}
+
+static int rocker_cmd_set_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_ethtool_prep,
+			       ecmd, NULL, NULL, false);
+}
+
+static int rocker_cmd_set_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_macaddr_prep,
+			       macaddr, NULL, NULL, false);
+}
+
+/*****************
+ * Net device ops
+ *****************/
+
+static int rocker_port_open(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_dma_rings_init(rocker_port);
+	if (err)
+		return err;
+
+	err = request_irq(rocker_msix_tx_vector(rocker_port),
+			  rocker_tx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign tx irq\n");
+		goto err_request_tx_irq;
+	}
+
+	err = request_irq(rocker_msix_rx_vector(rocker_port),
+			  rocker_rx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign rx irq\n");
+		goto err_request_rx_irq;
+	}
+
+	napi_enable(&rocker_port->napi_tx);
+	napi_enable(&rocker_port->napi_rx);
+	rocker_port_set_enable(rocker_port, true);
+	netif_start_queue(dev);
+	return 0;
+
+err_request_rx_irq:
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+err_request_tx_irq:
+	rocker_port_dma_rings_fini(rocker_port);
+	return err;
+}
+
+static int rocker_port_stop(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	netif_stop_queue(dev);
+	rocker_port_set_enable(rocker_port, false);
+	napi_disable(&rocker_port->napi_rx);
+	napi_disable(&rocker_port->napi_tx);
+	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+	rocker_port_dma_rings_fini(rocker_port);
+
+	return 0;
+}
+
+static void rocker_tx_desc_frags_unmap(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_tlv *attrs[ROCKER_TLV_TX_MAX + 1];
+	struct rocker_tlv *attr;
+	int rem;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_TX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_TX_FRAGS])
+		return;
+	rocker_tlv_for_each_nested(attr, attrs[ROCKER_TLV_TX_FRAGS], rem) {
+		struct rocker_tlv *frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_MAX + 1];
+		dma_addr_t dma_handle;
+		size_t len;
+
+		if (rocker_tlv_type(attr) != ROCKER_TLV_TX_FRAG)
+			continue;
+		rocker_tlv_parse_nested(frag_attrs, ROCKER_TLV_TX_FRAG_ATTR_MAX,
+					attr);
+		if (!frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR] ||
+		    !frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN])
+			continue;
+		dma_handle = rocker_tlv_get_u64(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR]);
+		len = rocker_tlv_get_u16(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN]);
+		pci_unmap_single(pdev, dma_handle, len, DMA_TO_DEVICE);
+	}
+}
+
+static int rocker_tx_desc_frag_map_put(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info,
+				       char *buf, size_t buf_len)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	struct rocker_tlv *frag;
+
+	dma_handle = pci_map_single(pdev, buf, buf_len, DMA_TO_DEVICE);
+	if (unlikely(pci_dma_mapping_error(pdev, dma_handle))) {
+		if (net_ratelimit())
+			netdev_err(rocker_port->dev, "failed to dma map tx frag\n");
+		return -EIO;
+	}
+	frag = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAG);
+	if (!frag)
+		goto unmap_frag;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_TX_FRAG_ATTR_ADDR,
+			       dma_handle))
+		goto nest_cancel;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_TX_FRAG_ATTR_LEN,
+			       buf_len))
+		goto nest_cancel;
+	rocker_tlv_nest_end(desc_info, frag);
+	return 0;
+
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frag);
+unmap_frag:
+	pci_unmap_single(pdev, dma_handle, buf_len, DMA_TO_DEVICE);
+	return -EMSGSIZE;
+}
+
+static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	struct rocker_tlv *frags;
+	int i;
+	int err;
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (unlikely(!desc_info)) {
+		if (net_ratelimit())
+			netdev_err(dev, "tx ring full when queue awake\n");
+		return NETDEV_TX_BUSY;
+	}
+
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+
+	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
+	if (!frags)
+		goto out;
+	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+					  skb->data, skb_headlen(skb));
+	if (err)
+		goto nest_cancel;
+	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
+		goto nest_cancel;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+						  skb_frag_address(frag),
+						  skb_frag_size(frag));
+		if (err)
+			goto unmap_frags;
+	}
+	rocker_tlv_nest_end(desc_info, frags);
+
+	rocker_desc_gen_clear(desc_info);
+	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (!desc_info)
+		netif_stop_queue(dev);
+
+	return NETDEV_TX_OK;
+
+unmap_frags:
+	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frags);
+out:
+	dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static int rocker_port_set_mac_address(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
+	if (err)
+		return err;
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	return 0;
+}
+
+static int rocker_port_sw_parent_id_get(struct net_device *dev,
+					struct netdev_phys_item_id *psid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+
+	psid->id_len = sizeof(rocker->hw.id);
+	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
+	return 0;
+}
+
+static const struct net_device_ops rocker_port_netdev_ops = {
+	.ndo_open			= rocker_port_open,
+	.ndo_stop			= rocker_port_stop,
+	.ndo_start_xmit			= rocker_port_xmit,
+	.ndo_set_mac_address		= rocker_port_set_mac_address,
+#ifdef CONFIG_NET_SWITCHDEV
+	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
+#endif
+};
+
+/********************
+ * ethtool interface
+ ********************/
+
+static int rocker_port_get_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static int rocker_port_set_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static void rocker_port_get_drvinfo(struct net_device *dev,
+				    struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
+}
+
+static const struct ethtool_ops rocker_port_ethtool_ops = {
+	.get_settings		= rocker_port_get_settings,
+	.set_settings		= rocker_port_set_settings,
+	.get_drvinfo		= rocker_port_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
+
+/*****************
+ * NAPI interface
+ *****************/
+
+static struct rocker_port *rocker_port_napi_tx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_tx);
+}
+
+static int rocker_port_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_tx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Cleanup tx descriptors */
+	while ((desc_info = rocker_desc_tail_get(&rocker_port->tx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err && net_ratelimit())
+			netdev_err(rocker_port->dev, "tx desc received with err %d\n",
+				   err);
+		rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+		dev_kfree_skb_any(rocker_desc_cookie_ptr_get(desc_info));
+		credits++;
+	}
+
+	if (credits && netif_queue_stopped(rocker_port->dev))
+		netif_wake_queue(rocker_port->dev);
+
+	napi_complete(napi);
+	rocker_dma_ring_credits_set(rocker, &rocker_port->tx_ring, credits);
+
+	return 0;
+}
+
+static int rocker_port_rx_proc(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+	size_t rx_len;
+
+	if (!skb)
+		return -ENOENT;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
+		return -EINVAL;
+
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+
+	rx_len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_LEN]);
+	skb_put(skb, rx_len);
+	skb->protocol = eth_type_trans(skb, rocker_port->dev);
+	netif_receive_skb(skb);
+
+	return rocker_dma_rx_ring_skb_alloc(rocker, rocker_port, desc_info);
+}
+
+static struct rocker_port *rocker_port_napi_rx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_rx);
+}
+
+static int rocker_port_poll_rx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_rx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Process rx descriptors */
+	while (credits < budget &&
+	       (desc_info = rocker_desc_tail_get(&rocker_port->rx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			if (net_ratelimit())
+				netdev_err(rocker_port->dev, "rx desc received with err %d\n",
+					   err);
+		} else {
+			err = rocker_port_rx_proc(rocker, rocker_port,
+						  desc_info);
+			if (err && net_ratelimit())
+				netdev_err(rocker_port->dev, "rx processing failed with err %d\n",
+					   err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker_port->rx_ring, desc_info);
+		credits++;
+	}
+
+	if (credits < budget)
+		napi_complete(napi);
+
+	rocker_dma_ring_credits_set(rocker, &rocker_port->rx_ring, credits);
+
+	return credits;
+}
+
+/*****************
+ * PCI driver ops
+ *****************/
+
+static void rocker_carrier_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	u64 link_status = rocker_read64(rocker, PORT_PHYS_LINK_STATUS);
+	bool link_up;
+
+	link_up = link_status & (1 << rocker_port->lport);
+	if (link_up)
+		netif_carrier_on(rocker_port->dev);
+	else
+		netif_carrier_off(rocker_port->dev);
+}
+
+static void rocker_remove_ports(struct rocker *rocker)
+{
+	int i;
+
+	for (i = 0; i < rocker->port_count; i++)
+		unregister_netdev(rocker->ports[i]->dev);
+	kfree(rocker->ports);
+}
+
+static void rocker_port_dev_addr_init(struct rocker *rocker,
+				      struct rocker_port *rocker_port)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_cmd_get_port_settings_macaddr(rocker_port,
+						   rocker_port->dev->dev_addr);
+	if (err) {
+		dev_warn(&pdev->dev, "failed to get mac address, using random\n");
+		eth_hw_addr_random(rocker_port->dev);
+	}
+}
+
+static int rocker_probe_port(struct rocker *rocker, unsigned port_number)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_port *rocker_port;
+	struct net_device *dev;
+	int err;
+
+	dev = alloc_etherdev(sizeof(struct rocker_port));
+	if (!dev)
+		return -ENOMEM;
+	rocker_port = netdev_priv(dev);
+	rocker_port->dev = dev;
+	rocker_port->rocker = rocker;
+	rocker_port->port_number = port_number;
+	rocker_port->lport = port_number + 1;
+
+	rocker_port_dev_addr_init(rocker, rocker_port);
+	dev->netdev_ops = &rocker_port_netdev_ops;
+	dev->ethtool_ops = &rocker_port_ethtool_ops;
+	netif_napi_add(dev, &rocker_port->napi_tx, rocker_port_poll_tx,
+		       NAPI_POLL_WEIGHT);
+	netif_napi_add(dev, &rocker_port->napi_rx, rocker_port_poll_rx,
+		       NAPI_POLL_WEIGHT);
+	rocker_carrier_init(rocker_port);
+
+	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
+	err = register_netdev(dev);
+	if (err) {
+		dev_err(&pdev->dev, "register_netdev failed\n");
+		goto err_register_netdev;
+	}
+	rocker->ports[port_number] = rocker_port;
+
+	return 0;
+
+err_register_netdev:
+	free_netdev(dev);
+	return err;
+}
+
+static int rocker_probe_ports(struct rocker *rocker)
+{
+	int i;
+	size_t alloc_size;
+	int err;
+
+	alloc_size = sizeof(struct rocker_port *) * rocker->port_count;
+	rocker->ports = kmalloc(alloc_size, GFP_KERNEL);
+	for (i = 0; i < rocker->port_count; i++) {
+		err = rocker_probe_port(rocker, i);
+		if (err)
+			goto remove_ports;
+	}
+	return 0;
+
+remove_ports:
+	rocker_remove_ports(rocker);
+	return err;
+}
+
+static int rocker_msix_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int msix_entries;
+	int i;
+	int err;
+
+	msix_entries = pci_msix_vec_count(pdev);
+	if (msix_entries < 0)
+		return msix_entries;
+
+	if (msix_entries != ROCKER_MSIX_VEC_COUNT(rocker->port_count))
+		return -EINVAL;
+
+	rocker->msix_entries = kmalloc_array(msix_entries,
+					     sizeof(struct msix_entry),
+					     GFP_KERNEL);
+	if (!rocker->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < msix_entries; i++)
+		rocker->msix_entries[i].entry = i;
+
+	err = pci_enable_msix_exact(pdev, rocker->msix_entries, msix_entries);
+	if (err < 0)
+		goto err_enable_msix;
+
+	return 0;
+
+err_enable_msix:
+	kfree(rocker->msix_entries);
+	return err;
+}
+
+static void rocker_msix_fini(struct rocker *rocker)
+{
+	pci_disable_msix(rocker->pdev);
+	kfree(rocker->msix_entries);
+}
+
+static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct rocker *rocker;
+	int err;
+
+	rocker = kzalloc(sizeof(*rocker), GFP_KERNEL);
+	if (!rocker)
+		return -ENOMEM;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "pci_enable_device failed\n");
+		goto err_pci_enable_device;
+	}
+
+	err = pci_request_regions(pdev, rocker_driver_name);
+	if (err) {
+		dev_err(&pdev->dev, "pci_request_regions failed\n");
+		goto err_pci_request_regions;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (!err) {
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_consistent_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	} else {
+		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	}
+
+	if (pci_resource_len(pdev, 0) < ROCKER_PCI_BAR0_SIZE) {
+		dev_err(&pdev->dev, "invalid PCI region size\n");
+		goto err_pci_resource_len_check;
+	}
+
+	rocker->hw_addr = ioremap(pci_resource_start(pdev, 0),
+				  pci_resource_len(pdev, 0));
+	if (!rocker->hw_addr) {
+		dev_err(&pdev->dev, "ioremap failed\n");
+		err = -EIO;
+		goto err_ioremap;
+	}
+	pci_set_master(pdev);
+
+	rocker->pdev = pdev;
+	pci_set_drvdata(pdev, rocker);
+
+	rocker->port_count = rocker_read32(rocker, PORT_PHYS_COUNT);
+
+	err = rocker_msix_init(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "MSI-X init failed\n");
+		goto err_msix_init;
+	}
+
+	err = rocker_basic_hw_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "basic hw test failed\n");
+		goto err_basic_hw_test;
+	}
+
+	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
+
+	err = rocker_dma_rings_init(rocker);
+	if (err)
+		goto err_dma_rings_init;
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD),
+			  rocker_cmd_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign cmd irq\n");
+		goto err_request_cmd_irq;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT),
+			  rocker_event_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign event irq\n");
+		goto err_request_event_irq;
+	}
+
+	rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
+
+	err = rocker_probe_ports(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "failed to probe ports\n");
+		goto err_probe_ports;
+	}
+
+	dev_info(&pdev->dev, "Rocker switch with id %016llx\n", rocker->hw.id);
+
+	return 0;
+
+err_probe_ports:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+err_request_event_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+err_request_cmd_irq:
+	rocker_dma_rings_fini(rocker);
+err_dma_rings_init:
+err_basic_hw_test:
+	rocker_msix_fini(rocker);
+err_msix_init:
+	iounmap(rocker->hw_addr);
+err_ioremap:
+err_pci_resource_len_check:
+err_pci_set_dma_mask:
+	pci_release_regions(pdev);
+err_pci_request_regions:
+	pci_disable_device(pdev);
+err_pci_enable_device:
+	kfree(rocker);
+	return err;
+}
+
+static void rocker_remove(struct pci_dev *pdev)
+{
+	struct rocker *rocker = pci_get_drvdata(pdev);
+
+	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
+	rocker_remove_ports(rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+	rocker_dma_rings_fini(rocker);
+	rocker_msix_fini(rocker);
+	iounmap(rocker->hw_addr);
+	pci_release_regions(rocker->pdev);
+	pci_disable_device(rocker->pdev);
+	kfree(rocker);
+}
+
+static struct pci_driver rocker_pci_driver = {
+	.name		= rocker_driver_name,
+	.id_table	= rocker_pci_id_table,
+	.probe		= rocker_probe,
+	.remove		= rocker_remove,
+};
+
+/***********************
+ * Module init and exit
+ ***********************/
+
+static int __init rocker_module_init(void)
+{
+	return pci_register_driver(&rocker_pci_driver);
+}
+
+static void __exit rocker_module_exit(void)
+{
+	pci_unregister_driver(&rocker_pci_driver);
+}
+
+module_init(rocker_module_init);
+module_exit(rocker_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
+MODULE_AUTHOR("Scott Feldman <sfeldma@gmail.com>");
+MODULE_DESCRIPTION("Rocker switch device driver");
+MODULE_DEVICE_TABLE(pci, rocker_pci_id_table);
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
new file mode 100644
index 0000000..56f779e
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -0,0 +1,427 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.h - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _ROCKER_H
+#define _ROCKER_H
+
+#include <linux/types.h>
+
+#define PCI_VENDOR_ID_REDHAT		0x1b36
+#define PCI_DEVICE_ID_REDHAT_ROCKER	0x0006
+
+#define ROCKER_PCI_BAR0_SIZE		0x2000
+
+/* MSI-X vectors */
+enum {
+	ROCKER_MSIX_VEC_CMD,
+	ROCKER_MSIX_VEC_EVENT,
+	ROCKER_MSIX_VEC_TEST,
+	ROCKER_MSIX_VEC_RESERVED0,
+	__ROCKER_MSIX_VEC_TX,
+	__ROCKER_MSIX_VEC_RX,
+#define ROCKER_MSIX_VEC_TX(port) \
+	(__ROCKER_MSIX_VEC_TX + ((port) * 2))
+#define ROCKER_MSIX_VEC_RX(port) \
+	(__ROCKER_MSIX_VEC_RX + ((port) * 2))
+#define ROCKER_MSIX_VEC_COUNT(portcnt) \
+	(ROCKER_MSIX_VEC_RX((portcnt - 1)) + 1)
+};
+
+/* Rocker bogus registers */
+#define ROCKER_BOGUS_REG0		0x0000
+#define ROCKER_BOGUS_REG1		0x0004
+#define ROCKER_BOGUS_REG2		0x0008
+#define ROCKER_BOGUS_REG3		0x000c
+
+/* Rocker test registers */
+#define ROCKER_TEST_REG			0x0010
+#define ROCKER_TEST_REG64		0x0018  /* 8-byte */
+#define ROCKER_TEST_IRQ			0x0020
+#define ROCKER_TEST_DMA_ADDR		0x0028  /* 8-byte */
+#define ROCKER_TEST_DMA_SIZE		0x0030
+#define ROCKER_TEST_DMA_CTRL		0x0034
+
+/* Rocker test register ctrl */
+#define ROCKER_TEST_DMA_CTRL_CLEAR	(1 << 0)
+#define ROCKER_TEST_DMA_CTRL_FILL	(1 << 1)
+#define ROCKER_TEST_DMA_CTRL_INVERT	(1 << 2)
+
+/* Rocker DMA ring register offsets */
+#define ROCKER_DMA_DESC_ADDR(x)		(0x1000 + (x) * 32)  /* 8-byte */
+#define ROCKER_DMA_DESC_SIZE(x)		(0x1008 + (x) * 32)
+#define ROCKER_DMA_DESC_HEAD(x)		(0x100c + (x) * 32)
+#define ROCKER_DMA_DESC_TAIL(x)		(0x1010 + (x) * 32)
+#define ROCKER_DMA_DESC_CTRL(x)		(0x1014 + (x) * 32)
+#define ROCKER_DMA_DESC_CREDITS(x)	(0x1018 + (x) * 32)
+#define ROCKER_DMA_DESC_RES1(x)		(0x101c + (x) * 32)
+
+/* Rocker dma ctrl register bits */
+#define ROCKER_DMA_DESC_CTRL_RESET	(1 << 0)
+
+/* Rocker DMA ring types */
+enum rocker_dma_type {
+	ROCKER_DMA_CMD,
+	ROCKER_DMA_EVENT,
+	__ROCKER_DMA_TX,
+	__ROCKER_DMA_RX,
+#define ROCKER_DMA_TX(port) (__ROCKER_DMA_TX + (port) * 2)
+#define ROCKER_DMA_RX(port) (__ROCKER_DMA_RX + (port) * 2)
+};
+
+/* Rocker DMA ring size limits and default sizes */
+#define ROCKER_DMA_SIZE_MIN		2ul
+#define ROCKER_DMA_SIZE_MAX		65536ul
+#define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_TX_DESC_SIZE		256
+#define ROCKER_DMA_RX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_RX_DESC_SIZE		256
+
+/* Rocker DMA descriptor struct */
+struct rocker_desc {
+	u64 buf_addr;
+	u64 cookie;
+	u16 buf_size;
+	u16 tlv_size;
+	u16 resv[5];
+	u16 comp_err;
+} __packed __aligned(8);
+
+#define ROCKER_DMA_DESC_COMP_ERR_GEN	(1 << 15)
+
+/* Rocker DMA TLV struct */
+struct rocker_tlv {
+	u32 type;
+	u16 len;
+} __packed __aligned(8);
+
+/* TLVs */
+enum {
+	ROCKER_TLV_CMD_UNSPEC,
+	ROCKER_TLV_CMD_TYPE,	/* u16 */
+	ROCKER_TLV_CMD_INFO,	/* nest */
+
+	__ROCKER_TLV_CMD_MAX,
+	ROCKER_TLV_CMD_MAX = __ROCKER_TLV_CMD_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_TYPE_UNSPEC,
+	ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_GET_STATS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_GET_STATS,
+
+	__ROCKER_TLV_CMD_TYPE_MAX,
+	ROCKER_TLV_CMD_TYPE_MAX = __ROCKER_TLV_CMD_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_PORT_SETTINGS_UNSPEC,
+	ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,		/* binary */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MODE,		/* u8 */
+
+	__ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+	ROCKER_TLV_CMD_PORT_SETTINGS_MAX =
+			__ROCKER_TLV_CMD_PORT_SETTINGS_MAX - 1,
+};
+
+enum rocker_port_mode {
+	ROCKER_PORT_MODE_OF_DPA,
+};
+
+enum {
+	ROCKER_TLV_EVENT_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE,	/* u16 */
+	ROCKER_TLV_EVENT_INFO,	/* nest */
+
+	__ROCKER_TLV_EVENT_MAX,
+	ROCKER_TLV_EVENT_MAX = __ROCKER_TLV_EVENT_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_TYPE_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE_LINK_CHANGED,
+	ROCKER_TLV_EVENT_TYPE_MAC_VLAN_SEEN,
+
+	__ROCKER_TLV_EVENT_TYPE_MAX,
+	ROCKER_TLV_EVENT_TYPE_MAX = __ROCKER_TLV_EVENT_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_LINK_CHANGED_UNSPEC,
+	ROCKER_TLV_EVENT_LINK_CHANGED_LPORT,	/* u32 */
+	ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP,	/* u8 */
+
+	__ROCKER_TLV_EVENT_LINK_CHANGED_MAX,
+	ROCKER_TLV_EVENT_LINK_CHANGED_MAX =
+			__ROCKER_TLV_EVENT_LINK_CHANGED_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_MAC_VLAN_UNSPEC,
+	ROCKER_TLV_EVENT_MAC_VLAN_LPORT,	/* u32 */
+	ROCKER_TLV_EVENT_MAC_VLAN_MAC,		/* binary */
+	ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID,	/* __be16 */
+
+	__ROCKER_TLV_EVENT_MAC_VLAN_MAX,
+	ROCKER_TLV_EVENT_MAC_VLAN_MAX = __ROCKER_TLV_EVENT_MAC_VLAN_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_RX_UNSPEC,
+	ROCKER_TLV_RX_FLAGS,		/* u16, see ROCKER_RX_FLAGS_ */
+	ROCKER_TLV_RX_CSUM,		/* u16 */
+	ROCKER_TLV_RX_FRAG_ADDR,	/* u64 */
+	ROCKER_TLV_RX_FRAG_MAX_LEN,	/* u16 */
+	ROCKER_TLV_RX_FRAG_LEN,		/* u16 */
+
+	__ROCKER_TLV_RX_MAX,
+	ROCKER_TLV_RX_MAX = __ROCKER_TLV_RX_MAX - 1,
+};
+
+#define ROCKER_RX_FLAGS_IPV4			(1 << 0)
+#define ROCKER_RX_FLAGS_IPV6			(1 << 1)
+#define ROCKER_RX_FLAGS_CSUM_CALC		(1 << 2)
+#define ROCKER_RX_FLAGS_IPV4_CSUM_GOOD		(1 << 3)
+#define ROCKER_RX_FLAGS_IP_FRAG			(1 << 4)
+#define ROCKER_RX_FLAGS_TCP			(1 << 5)
+#define ROCKER_RX_FLAGS_UDP			(1 << 6)
+#define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD	(1 << 7)
+
+enum {
+	ROCKER_TLV_TX_UNSPEC,
+	ROCKER_TLV_TX_OFFLOAD,		/* u8, see ROCKER_TX_OFFLOAD_ */
+	ROCKER_TLV_TX_L3_CSUM_OFF,	/* u16 */
+	ROCKER_TLV_TX_TSO_MSS,		/* u16 */
+	ROCKER_TLV_TX_TSO_HDR_LEN,	/* u16 */
+	ROCKER_TLV_TX_FRAGS,		/* array */
+
+	__ROCKER_TLV_TX_MAX,
+	ROCKER_TLV_TX_MAX = __ROCKER_TLV_TX_MAX - 1,
+};
+
+#define ROCKER_TX_OFFLOAD_NONE		0
+#define ROCKER_TX_OFFLOAD_IP_CSUM	1
+#define ROCKER_TX_OFFLOAD_TCP_UDP_CSUM	2
+#define ROCKER_TX_OFFLOAD_L3_CSUM	3
+#define ROCKER_TX_OFFLOAD_TSO		4
+
+#define ROCKER_TX_FRAGS_MAX		16
+
+enum {
+	ROCKER_TLV_TX_FRAG_UNSPEC,
+	ROCKER_TLV_TX_FRAG,		/* nest */
+
+	__ROCKER_TLV_TX_FRAG_MAX,
+	ROCKER_TLV_TX_FRAG_MAX = __ROCKER_TLV_TX_FRAG_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_TX_FRAG_ATTR_UNSPEC,
+	ROCKER_TLV_TX_FRAG_ATTR_ADDR,	/* u64 */
+	ROCKER_TLV_TX_FRAG_ATTR_LEN,	/* u16 */
+
+	__ROCKER_TLV_TX_FRAG_ATTR_MAX,
+	ROCKER_TLV_TX_FRAG_ATTR_MAX = __ROCKER_TLV_TX_FRAG_ATTR_MAX - 1,
+};
+
+/* cmd info nested for OF-DPA msgs */
+enum {
+	ROCKER_TLV_OF_DPA_UNSPEC,
+	ROCKER_TLV_OF_DPA_TABLE_ID,		/* u16 */
+	ROCKER_TLV_OF_DPA_PRIORITY,		/* u32 */
+	ROCKER_TLV_OF_DPA_HARDTIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_IDLETIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_COOKIE,		/* u64 */
+	ROCKER_TLV_OF_DPA_IN_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_IN_LPORT_MASK,	/* u32 */
+	ROCKER_TLV_OF_DPA_OUT_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,	/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,	/* u32 */
+	ROCKER_TLV_OF_DPA_GROUP_COUNT,		/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_IDS,		/* u32 array */
+	ROCKER_TLV_OF_DPA_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_ID_MASK,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_PCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_TUNNEL_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_TUN_LOG_LPORT,	/* u32 */
+	ROCKER_TLV_OF_DPA_ETHERTYPE,		/* __be16 */
+	ROCKER_TLV_OF_DPA_DST_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_IP_PROTO,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_PROTO_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_DST_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL,		/* __be32 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_QUEUE_ID_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_QUEUE_ID,		/* u8 */
+	ROCKER_TLV_OF_DPA_CLEAR_ACTIONS,	/* u32 */
+	ROCKER_TLV_OF_DPA_POP_VLAN,		/* u8 */
+	ROCKER_TLV_OF_DPA_TTL_CHECK,		/* u8 */
+	ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,	/* u8 */
+
+	__ROCKER_TLV_OF_DPA_MAX,
+	ROCKER_TLV_OF_DPA_MAX = __ROCKER_TLV_OF_DPA_MAX - 1,
+};
+
+/* OF-DPA table IDs */
+
+enum rocker_of_dpa_table_id {
+	ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT = 0,
+	ROCKER_OF_DPA_TABLE_ID_VLAN = 10,
+	ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC = 20,
+	ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING = 30,
+	ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING = 40,
+	ROCKER_OF_DPA_TABLE_ID_BRIDGING = 50,
+	ROCKER_OF_DPA_TABLE_ID_ACL_POLICY = 60,
+};
+
+/* OF-DPA flow stats */
+enum {
+	ROCKER_TLV_OF_DPA_FLOW_STAT_UNSPEC,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_DURATION,	/* u32 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_RX_PKTS,	/* u64 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_TX_PKTS,	/* u64 */
+
+	__ROCKER_TLV_OF_DPA_FLOW_STAT_MAX,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_MAX = __ROCKER_TLV_OF_DPA_FLOW_STAT_MAX - 1,
+};
+
+/* OF-DPA group types */
+enum rocker_of_dpa_group_type {
+	ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE = 0,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_INTERFACE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_ECMP,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_OVERLAY,
+};
+
+/* OF-DPA group L2 overlay types */
+enum rocker_of_dpa_overlay_type {
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_UCAST = 0,
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_MCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_UCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_MCAST,
+};
+
+/* OF-DPA group ID encoding */
+#define ROCKER_GROUP_TYPE_SHIFT 28
+#define ROCKER_GROUP_TYPE_MASK 0xf0000000
+#define ROCKER_GROUP_VLAN_SHIFT 16
+#define ROCKER_GROUP_VLAN_MASK 0x0fff0000
+#define ROCKER_GROUP_PORT_SHIFT 0
+#define ROCKER_GROUP_PORT_MASK 0x0000ffff
+#define ROCKER_GROUP_TUNNEL_ID_SHIFT 12
+#define ROCKER_GROUP_TUNNEL_ID_MASK 0x0ffff000
+#define ROCKER_GROUP_SUBTYPE_SHIFT 10
+#define ROCKER_GROUP_SUBTYPE_MASK 0x00000c00
+#define ROCKER_GROUP_INDEX_SHIFT 0
+#define ROCKER_GROUP_INDEX_MASK 0x0000ffff
+#define ROCKER_GROUP_INDEX_LONG_SHIFT 0
+#define ROCKER_GROUP_INDEX_LONG_MASK 0x0fffffff
+
+#define ROCKER_GROUP_TYPE_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_TYPE_MASK) >> ROCKER_GROUP_TYPE_SHIFT)
+#define ROCKER_GROUP_TYPE_SET(type) \
+	(((type) << ROCKER_GROUP_TYPE_SHIFT) & ROCKER_GROUP_TYPE_MASK)
+#define ROCKER_GROUP_VLAN_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_VLAN_ID_MASK) >> ROCKER_GROUP_VLAN_ID_SHIFT)
+#define ROCKER_GROUP_VLAN_SET(vlan_id) \
+	(((vlan_id) << ROCKER_GROUP_VLAN_SHIFT) & ROCKER_GROUP_VLAN_MASK)
+#define ROCKER_GROUP_PORT_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_PORT_MASK) >> ROCKER_GROUP_PORT_SHIFT)
+#define ROCKER_GROUP_PORT_SET(port) \
+	(((port) << ROCKER_GROUP_PORT_SHIFT) & ROCKER_GROUP_PORT_MASK)
+#define ROCKER_GROUP_INDEX_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_MASK) >> ROCKER_GROUP_INDEX_SHIFT)
+#define ROCKER_GROUP_INDEX_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_SHIFT) & ROCKER_GROUP_INDEX_MASK)
+#define ROCKER_GROUP_INDEX_LONG_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_LONG_MASK) >> \
+	 ROCKER_GROUP_INDEX_LONG_SHIFT)
+#define ROCKER_GROUP_INDEX_LONG_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_LONG_SHIFT) & \
+	 ROCKER_GROUP_INDEX_LONG_MASK)
+
+#define ROCKER_GROUP_NONE 0
+#define ROCKER_GROUP_L2_INTERFACE(vlan_id, port) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE) |\
+	 ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_PORT_SET(port))
+#define ROCKER_GROUP_L2_REWRITE(index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE) |\
+	 ROCKER_GROUP_INDEX_LONG_SET(index))
+#define ROCKER_GROUP_L2_MCAST(vlan_id, index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST) |\
+	 ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_INDEX_SET(index))
+#define ROCKER_GROUP_L2_FLOOD(vlan_id, index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD) |\
+	ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_INDEX_SET(index))
+#define ROCKER_GROUP_L3_UNICAST(index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST) |\
+	 ROCKER_GROUP_INDEX_LONG_SET(index))
+
+/* Rocker general purpose registers */
+#define ROCKER_CONTROL			0x0300
+#define ROCKER_PORT_PHYS_COUNT		0x0304
+#define ROCKER_PORT_PHYS_LINK_STATUS	0x0310 /* 8-byte */
+#define ROCKER_PORT_PHYS_ENABLE		0x0318 /* 8-byte */
+#define ROCKER_SWITCH_ID		0x0320 /* 8-byte */
+
+/* Rocker control bits */
+#define ROCKER_CONTROL_RESET		(1 << 0)
+
+#endif
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 04/10] net-sysfs: expose physical switch id for particular device
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/core/net-sysfs.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 55dc4da..8e6603c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -12,6 +12,7 @@
 #include <linux/capability.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <linux/if_arp.h>
 #include <linux/slab.h>
 #include <linux/nsproxy.h>
@@ -399,6 +400,28 @@ static ssize_t phys_port_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_port_id);
 
+static ssize_t phys_switch_id_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	ssize_t ret = -EINVAL;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (dev_isalive(netdev)) {
+		struct netdev_phys_item_id ppid;
+
+		ret = netdev_sw_parent_id_get(netdev, &ppid);
+		if (!ret)
+			ret = sprintf(buf, "%*phN\n", ppid.id_len, ppid.id);
+	}
+	rtnl_unlock();
+
+	return ret;
+}
+static DEVICE_ATTR_RO(phys_switch_id);
+
 static struct attribute *net_class_attrs[] = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -423,6 +446,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_flags.attr,
 	&dev_attr_tx_queue_len.attr,
 	&dev_attr_phys_port_id.attr,
+	&dev_attr_phys_switch_id.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 03/10] rtnl: expose physical switch id for particular device
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

The netdevice represents a port in a switch, it will expose
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
belong to one physical switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d83..4163753 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_PHYS_SWITCH_ID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1087c6d..f839354 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -43,6 +43,7 @@
 
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/arp.h>
@@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	int err;
+	struct netdev_phys_item_id psid;
+
+	err = netdev_sw_parent_id_get(dev, &psid);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags, u32 ext_filter_mask)
@@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (rtnl_phys_port_id_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_phys_switch_id_fill(skb, dev))
+		goto nla_put_failure;
+
 	attr = nla_reserve(skb, IFLA_STATS,
 			sizeof(struct rtnl_link_stats));
 	if (attr == NULL)
@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 01/10] net: rename netdev_phys_port_id to more generic name
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

So this can be reused for identification of other "items" as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  2 +-
 include/linux/netdevice.h                        | 16 ++++++++--------
 net/core/dev.c                                   |  2 +-
 net/core/net-sysfs.c                             |  2 +-
 net/core/rtnetlink.c                             |  6 +++---
 8 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c4bd025..336ef3c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12537,7 +12537,7 @@ static int bnx2x_validate_addr(struct net_device *dev)
 }
 
 static int bnx2x_get_phys_port_id(struct net_device *netdev,
-				  struct netdev_phys_port_id *ppid)
+				  struct netdev_phys_item_id *ppid)
 {
 	struct bnx2x *bp = netdev_priv(netdev);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1a98e23..d749165 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7373,7 +7373,7 @@ static void i40e_del_vxlan_port(struct net_device *netdev,
 
 #endif
 static int i40e_get_phys_port_id(struct net_device *netdev,
-				 struct netdev_phys_port_id *ppid)
+				 struct netdev_phys_item_id *ppid)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_pf *pf = np->vsi->back;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..bd007c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2258,7 +2258,7 @@ static int mlx4_en_set_vf_link_state(struct net_device *dev, int vf, int link_st
 
 #define PORT_ID_BYTE_LEN 8
 static int mlx4_en_get_phys_port_id(struct net_device *dev,
-				    struct netdev_phys_port_id *ppid)
+				    struct netdev_phys_item_id *ppid)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_dev *mdev = priv->mdev->dev;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index f5e29f7..6e514d2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -460,7 +460,7 @@ static void qlcnic_82xx_cancel_idc_work(struct qlcnic_adapter *adapter)
 }
 
 static int qlcnic_get_phys_port_id(struct net_device *netdev,
-				   struct netdev_phys_port_id *ppid)
+				   struct netdev_phys_item_id *ppid)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4767f54..bbc730f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -753,13 +753,13 @@ struct netdev_fcoe_hbainfo {
 };
 #endif
 
-#define MAX_PHYS_PORT_ID_LEN 32
+#define MAX_PHYS_ITEM_ID_LEN 32
 
-/* This structure holds a unique identifier to identify the
- * physical port used by a netdevice.
+/* This structure holds a unique identifier to identify some
+ * physical item (port for example) used by a netdevice.
  */
-struct netdev_phys_port_id {
-	unsigned char id[MAX_PHYS_PORT_ID_LEN];
+struct netdev_phys_item_id {
+	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
 	unsigned char id_len;
 };
 
@@ -975,7 +975,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  *
  * int (*ndo_get_phys_port_id)(struct net_device *dev,
- *			       struct netdev_phys_port_id *ppid);
+ *			       struct netdev_phys_item_id *ppid);
  *	Called to get ID of physical port of this device. If driver does
  *	not implement this, it is assumed that the hw is not able to have
  *	multiple net devices on single physical port.
@@ -1149,7 +1149,7 @@ struct net_device_ops {
 	int			(*ndo_change_carrier)(struct net_device *dev,
 						      bool new_carrier);
 	int			(*ndo_get_phys_port_id)(struct net_device *dev,
-							struct netdev_phys_port_id *ppid);
+							struct netdev_phys_item_id *ppid);
 	void			(*ndo_add_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __be16 port);
@@ -2878,7 +2878,7 @@ void dev_set_group(struct net_device *, int);
 int dev_set_mac_address(struct net_device *, struct sockaddr *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid);
+			 struct netdev_phys_item_id *ppid);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 40be481..a170bcb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5817,7 +5817,7 @@ EXPORT_SYMBOL(dev_change_carrier);
  *	Get device physical port ID
  */
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid)
+			 struct netdev_phys_item_id *ppid)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd0669..55dc4da 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
 		return restart_syscall();
 
 	if (dev_isalive(netdev)) {
-		struct netdev_phys_port_id ppid;
+		struct netdev_phys_item_id ppid;
 
 		ret = dev_get_phys_port_id(netdev, &ppid);
 		if (!ret)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a688268..1087c6d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
 static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 {
 	int err;
-	struct netdev_phys_port_id ppid;
+	struct netdev_phys_item_id ppid;
 
 	err = dev_get_phys_port_id(dev, &ppid);
 	if (err) {
@@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
 	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
-	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
+	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
 };
 
-- 
1.9.3

^ permalink raw reply related

* [patch net-next 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl

Hi all.

This patchset is just the first phase of switch and switch-ish device
support api in kernel. Note that the api will extend (our complete work
can be pulled from https://github.com/jpirko/net-next-rocker).

So what this patchset includes:
- introduce switchdev api for implementing switch drivers (so far
  only linux bridge fdb offload is covered)
- introduce rocker switch driver which implements switchdev api

As to the discussion if there is need to have specific class of device
representing the switch itself, so far we found no need to introduce that.
But we are generally ok with the idea and when the time comes and it will
be needed, it can be easily introduced without any disturbance.

This patchset introduces switch id export through rtnetlink and sysfs,
which is similar to what we have for port id in SR-IOV. I will send iproute2
patchset for showing the switch id for port netdevs once this is applied.

For detailed description, please see individual patches.

Jiri Pirko (5):
  net: rename netdev_phys_port_id to more generic name
  net: introduce generic switch devices support
  rtnl: expose physical switch id for particular device
  net-sysfs: expose physical switch id for particular device
  rocker: introduce rocker switch driver

Scott Feldman (5):
  bridge: introduce fdb offloading via switchdev
  bridge: call netdev_sw_port_stp_update when bridge port STP status
    changes
  bridge: add API to notify bridge driver of learned FBD on offloaded
    device
  rocker: implement rocker ofdpa flow table manipulation
  rocker: implement L2 bridge offloading

 Documentation/networking/switchdev.txt           |   59 +
 MAINTAINERS                                      |   14 +
 drivers/net/ethernet/Kconfig                     |    1 +
 drivers/net/ethernet/Makefile                    |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
 drivers/net/ethernet/rocker/Kconfig              |   27 +
 drivers/net/ethernet/rocker/Makefile             |    5 +
 drivers/net/ethernet/rocker/rocker.c             | 4180 ++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h             |  427 +++
 include/linux/if_bridge.h                        |   18 +
 include/linux/netdevice.h                        |   48 +-
 include/net/switchdev.h                          |   53 +
 include/uapi/linux/if_link.h                     |    1 +
 net/Kconfig                                      |    1 +
 net/Makefile                                     |    3 +
 net/bridge/br_fdb.c                              |   94 +-
 net/bridge/br_netlink.c                          |    2 +
 net/bridge/br_stp.c                              |    4 +
 net/bridge/br_stp_if.c                           |    3 +
 net/bridge/br_stp_timer.c                        |    2 +
 net/core/dev.c                                   |    2 +-
 net/core/net-sysfs.c                             |   26 +-
 net/core/rtnetlink.c                             |   30 +-
 net/switchdev/Kconfig                            |   13 +
 net/switchdev/Makefile                           |    5 +
 net/switchdev/switchdev.c                        |   93 +
 29 files changed, 5102 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

-- 
1.9.3

^ permalink raw reply

* [PATCH net-next v2 14/14] openvswitch: Avoid NULL mask check while building mask
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure.  ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask.  Therefore NULL check is required in SW_FLOW_KEY*
macros.  Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
---
 net/openvswitch/flow_netlink.c | 107 ++++++++++++++++++++---------------------
 1 file changed, 53 insertions(+), 54 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 482a0cb..ed31097 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -50,21 +50,18 @@
 
 #include "flow_netlink.h"
 
-static void update_range__(struct sw_flow_match *match,
-			   size_t offset, size_t size, bool is_mask)
+static void update_range(struct sw_flow_match *match,
+			 size_t offset, size_t size, bool is_mask)
 {
-	struct sw_flow_key_range *range = NULL;
+	struct sw_flow_key_range *range;
 	size_t start = rounddown(offset, sizeof(long));
 	size_t end = roundup(offset + size, sizeof(long));
 
 	if (!is_mask)
 		range = &match->range;
-	else if (match->mask)
+	else
 		range = &match->mask->range;
 
-	if (!range)
-		return;
-
 	if (range->start == range->end) {
 		range->start = start;
 		range->end = end;
@@ -80,22 +77,20 @@ static void update_range__(struct sw_flow_match *match,
 
 #define SW_FLOW_KEY_PUT(match, field, value, is_mask) \
 	do { \
-		update_range__(match, offsetof(struct sw_flow_key, field),  \
-				     sizeof((match)->key->field), is_mask); \
-		if (is_mask) {						    \
-			if ((match)->mask)				    \
-				(match)->mask->key.field = value;	    \
-		} else {                                                    \
+		update_range(match, offsetof(struct sw_flow_key, field),    \
+			     sizeof((match)->key->field), is_mask);	    \
+		if (is_mask)						    \
+			(match)->mask->key.field = value;		    \
+		else							    \
 			(match)->key->field = value;		            \
-		}                                                           \
 	} while (0)
 
 #define SW_FLOW_KEY_MEMCPY_OFFSET(match, offset, value_p, len, is_mask)	    \
 	do {								    \
-		update_range__(match, offset, len, is_mask);		    \
+		update_range(match, offset, len, is_mask);		    \
 		if (is_mask)						    \
 			memcpy((u8 *)&(match)->mask->key + offset, value_p, \
-			       len);					    \
+			       len);					   \
 		else							    \
 			memcpy((u8 *)(match)->key + offset, value_p, len);  \
 	} while (0)
@@ -104,18 +99,16 @@ static void update_range__(struct sw_flow_match *match,
 	SW_FLOW_KEY_MEMCPY_OFFSET(match, offsetof(struct sw_flow_key, field), \
 				  value_p, len, is_mask)
 
-#define SW_FLOW_KEY_MEMSET_FIELD(match, field, value, is_mask) \
-	do { \
-		update_range__(match, offsetof(struct sw_flow_key, field),  \
-				     sizeof((match)->key->field), is_mask); \
-		if (is_mask) {						    \
-			if ((match)->mask)				    \
-				memset((u8 *)&(match)->mask->key.field, value,\
-				       sizeof((match)->mask->key.field));   \
-		} else {                                                    \
+#define SW_FLOW_KEY_MEMSET_FIELD(match, field, value, is_mask)		    \
+	do {								    \
+		update_range(match, offsetof(struct sw_flow_key, field),    \
+			     sizeof((match)->key->field), is_mask);	    \
+		if (is_mask)						    \
+			memset((u8 *)&(match)->mask->key.field, value,      \
+			       sizeof((match)->mask->key.field));	    \
+		else							    \
 			memset((u8 *)&(match)->key->field, value,           \
 			       sizeof((match)->key->field));                \
-		}                                                           \
 	} while (0)
 
 static bool match_validate(const struct sw_flow_match *match,
@@ -677,8 +670,7 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 
 		SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask);
 		attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
-	} else if (!is_mask)
-		SW_FLOW_KEY_PUT(match, eth.tci, htons(0xffff), true);
+	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
 		__be16 eth_type;
@@ -903,8 +895,8 @@ static void mask_set_nlattr(struct nlattr *attr, u8 val)
  * attribute specifies the mask field of the wildcarded flow.
  */
 int ovs_nla_get_match(struct sw_flow_match *match,
-		      const struct nlattr *key,
-		      const struct nlattr *mask)
+		      const struct nlattr *nla_key,
+		      const struct nlattr *nla_mask)
 {
 	const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
 	const struct nlattr *encap;
@@ -914,7 +906,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 	bool encap_valid = false;
 	int err;
 
-	err = parse_flow_nlattrs(key, a, &key_attrs);
+	err = parse_flow_nlattrs(nla_key, a, &key_attrs);
 	if (err)
 		return err;
 
@@ -955,36 +947,43 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 	if (err)
 		return err;
 
-	if (match->mask && !mask) {
-		/* Create an exact match mask. We need to set to 0xff all the
-		 * 'match->mask' fields that have been touched in 'match->key'.
-		 * We cannot simply memset 'match->mask', because padding bytes
-		 * and fields not specified in 'match->key' should be left to 0.
-		 * Instead, we use a stream of netlink attributes, copied from
-		 * 'key' and set to 0xff: ovs_key_from_nlattrs() will take care
-		 * of filling 'match->mask' appropriately.
-		 */
-		newmask = kmemdup(key, nla_total_size(nla_len(key)),
-				  GFP_KERNEL);
-		if (!newmask)
-			return -ENOMEM;
+	if (match->mask) {
+		if (!nla_mask) {
+			/* Create an exact match mask. We need to set to 0xff
+			 * all the 'match->mask' fields that have been touched
+			 * in 'match->key'. We cannot simply memset
+			 * 'match->mask', because padding bytes and fields not
+			 * specified in 'match->key' should be left to 0.
+			 * Instead, we use a stream of netlink attributes,
+			 * copied from 'key' and set to 0xff.
+			 * ovs_key_from_nlattrs() will take care of filling
+			 * 'match->mask' appropriately.
+			 */
+			newmask = kmemdup(nla_key,
+					  nla_total_size(nla_len(nla_key)),
+					  GFP_KERNEL);
+			if (!newmask)
+				return -ENOMEM;
 
-		mask_set_nlattr(newmask, 0xff);
+			mask_set_nlattr(newmask, 0xff);
 
-		/* The userspace does not send tunnel attributes that are 0,
-		 * but we should not wildcard them nonetheless.
-		 */
-		if (match->key->tun_key.ipv4_dst)
-			SW_FLOW_KEY_MEMSET_FIELD(match, tun_key, 0xff, true);
+			/* The userspace does not send tunnel attributes that
+			 * are 0, but we should not wildcard them nonetheless.
+			 */
+			if (match->key->tun_key.ipv4_dst)
+				SW_FLOW_KEY_MEMSET_FIELD(match, tun_key,
+							 0xff, true);
 
-		mask = newmask;
-	}
+			nla_mask = newmask;
+		}
 
-	if (mask) {
-		err = parse_flow_mask_nlattrs(mask, a, &mask_attrs);
+		err = parse_flow_mask_nlattrs(nla_mask, a, &mask_attrs);
 		if (err)
 			goto free_newmask;
 
+		/* Always match on tci. */
+		SW_FLOW_KEY_PUT(match, eth.tci, htons(0xffff), true);
+
 		if (mask_attrs & 1 << OVS_KEY_ATTR_ENCAP) {
 			__be16 eth_type = 0;
 			__be16 tci = 0;
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 13/14] openvswitch: Refactor action alloc and copy api.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
---
 net/openvswitch/datapath.c     | 25 ++++---------------------
 net/openvswitch/flow_netlink.c | 24 +++++++++++++++++-------
 net/openvswitch/flow_netlink.h |  1 -
 3 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 5101780..014485e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -543,18 +543,12 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	if (err)
 		goto err_flow_free;
 
-	acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
-	err = PTR_ERR(acts);
-	if (IS_ERR(acts))
-		goto err_flow_free;
-
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
 				   &flow->key, &acts);
 	if (err)
 		goto err_flow_free;
 
 	rcu_assign_pointer(flow->sf_acts, acts);
-
 	OVS_CB(packet)->egress_tun_info = NULL;
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
@@ -872,16 +866,11 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_flow_mask_key(&new_flow->key, &new_flow->unmasked_key, &mask);
 
 	/* Validate actions. */
-	acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_FLOW_ATTR_ACTIONS]));
-	error = PTR_ERR(acts);
-	if (IS_ERR(acts))
-		goto err_kfree_flow;
-
 	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
 				     &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
-		goto err_kfree_acts;
+		goto err_kfree_flow;
 	}
 
 	reply = ovs_flow_cmd_alloc_info(acts, info, false);
@@ -972,6 +961,7 @@ error:
 	return error;
 }
 
+/* Factor out action copy to avoid "Wframe-larger-than=1024" warning. */
 static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 						const struct sw_flow_key *key,
 						const struct sw_flow_mask *mask)
@@ -980,15 +970,10 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 	struct sw_flow_key masked_key;
 	int error;
 
-	acts = ovs_nla_alloc_flow_actions(nla_len(a));
-	if (IS_ERR(acts))
-		return acts;
-
 	ovs_flow_mask_key(&masked_key, key, mask);
 	error = ovs_nla_copy_actions(a, &masked_key, &acts);
 	if (error) {
-		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
-		kfree(acts);
+		OVS_NLERR("Actions may not be safe on all matching packets.\n");
 		return ERR_PTR(error);
 	}
 
@@ -1028,10 +1013,8 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 			error = PTR_ERR(acts);
 			goto error;
 		}
-	}
 
-	/* Can allocate before locking if have acts. */
-	if (acts) {
+		/* Can allocate before locking if have acts. */
 		reply = ovs_flow_cmd_alloc_info(acts, info, false);
 		if (IS_ERR(reply)) {
 			error = PTR_ERR(reply);
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 1050b28..482a0cb 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1284,7 +1284,7 @@ nla_put_failure:
 
 #define MAX_ACTIONS_BUFSIZE	(32 * 1024)
 
-struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size)
+static struct sw_flow_actions *nla_alloc_flow_actions(int size)
 {
 	struct sw_flow_actions *sfa;
 
@@ -1329,7 +1329,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
 		new_acts_size = MAX_ACTIONS_BUFSIZE;
 	}
 
-	acts = ovs_nla_alloc_flow_actions(new_acts_size);
+	acts = nla_alloc_flow_actions(new_acts_size);
 	if (IS_ERR(acts))
 		return (void *)acts;
 
@@ -1396,7 +1396,7 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 	a->nla_len = sfa->actions_len - st_offset;
 }
 
-static int ovs_nla_copy_actions__(const struct nlattr *attr,
+static int __ovs_nla_copy_actions(const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
 				  __be16 eth_type, __be16 vlan_tci);
@@ -1441,7 +1441,7 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	if (st_acts < 0)
 		return st_acts;
 
-	err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa,
+	err = __ovs_nla_copy_actions(actions, key, depth + 1, sfa,
 				     eth_type, vlan_tci);
 	if (err)
 		return err;
@@ -1684,7 +1684,7 @@ static int copy_action(const struct nlattr *from,
 	return 0;
 }
 
-static int ovs_nla_copy_actions__(const struct nlattr *attr,
+static int __ovs_nla_copy_actions(const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
 				  __be16 eth_type, __be16 vlan_tci)
@@ -1846,8 +1846,18 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key,
 			 struct sw_flow_actions **sfa)
 {
-	return ovs_nla_copy_actions__(attr, key, 0, sfa, key->eth.type,
-				      key->eth.tci);
+	int err;
+
+	*sfa = nla_alloc_flow_actions(nla_len(attr));
+	if (IS_ERR(*sfa))
+		return PTR_ERR(*sfa);
+
+	err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
+				     key->eth.tci);
+	if (err)
+		kfree(*sfa);
+
+	return err;
 }
 
 static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 4f03706..eb0b177 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -56,7 +56,6 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
 
-struct sw_flow_actions *ovs_nla_alloc_flow_actions(int actions_len);
 void ovs_nla_free_flow_actions(struct sw_flow_actions *);
 
 #endif /* flow_netlink.h */
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 12/14] openvswitch: Move key_attr_size() to flow_netlink.h.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Joe Stringer, Pravin B Shelar

From: Joe Stringer <joestringer@nicira.com>

flow-netlink has netlink related code.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/datapath.c     | 31 +++----------------------------
 net/openvswitch/flow_netlink.c | 32 ++++++++++++++++++++++++++++++++
 net/openvswitch/flow_netlink.h |  2 ++
 3 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 4fd8a45..5101780 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -375,37 +375,12 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
 	return err;
 }
 
-static size_t key_attr_size(void)
-{
-	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
-		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
-		  + nla_total_size(8)   /* OVS_TUNNEL_KEY_ATTR_ID */
-		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
-		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
-		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TOS */
-		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TTL */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_CSUM */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_OAM */
-		  + nla_total_size(256)   /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */
-		+ nla_total_size(4)   /* OVS_KEY_ATTR_IN_PORT */
-		+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
-		+ nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */
-		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
-		+ nla_total_size(4)   /* OVS_KEY_ATTR_8021Q */
-		+ nla_total_size(0)   /* OVS_KEY_ATTR_ENCAP */
-		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
-		+ nla_total_size(40)  /* OVS_KEY_ATTR_IPV6 */
-		+ nla_total_size(2)   /* OVS_KEY_ATTR_ICMPV6 */
-		+ nla_total_size(28); /* OVS_KEY_ATTR_ND */
-}
-
 static size_t upcall_msg_size(const struct nlattr *userdata,
 			      unsigned int hdrlen)
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
 		+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
-		+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
+		+ nla_total_size(ovs_key_attr_size()); /* OVS_PACKET_ATTR_KEY */
 
 	/* OVS_PACKET_ATTR_USERDATA */
 	if (userdata)
@@ -678,8 +653,8 @@ static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats,
 static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
 {
 	return NLMSG_ALIGN(sizeof(struct ovs_header))
-		+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_KEY */
-		+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_MASK */
+		+ nla_total_size(ovs_key_attr_size()) /* OVS_FLOW_ATTR_KEY */
+		+ nla_total_size(ovs_key_attr_size()) /* OVS_FLOW_ATTR_MASK */
 		+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
 		+ nla_total_size(1) /* OVS_FLOW_ATTR_TCP_FLAGS */
 		+ nla_total_size(8) /* OVS_FLOW_ATTR_USED */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 1b29ea7..1050b28 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -252,6 +252,38 @@ static bool match_validate(const struct sw_flow_match *match,
 	return true;
 }
 
+size_t ovs_key_attr_size(void)
+{
+	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
+	 * updating this function.
+	 */
+	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 22);
+
+	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
+		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
+		  + nla_total_size(8)   /* OVS_TUNNEL_KEY_ATTR_ID */
+		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
+		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
+		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TOS */
+		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TTL */
+		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
+		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_CSUM */
+		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_OAM */
+		  + nla_total_size(256) /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_IN_PORT */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_DP_HASH */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_RECIRC_ID */
+		+ nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_VLAN */
+		+ nla_total_size(0)   /* OVS_KEY_ATTR_ENCAP */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
+		+ nla_total_size(40)  /* OVS_KEY_ATTR_IPV6 */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ICMPV6 */
+		+ nla_total_size(28); /* OVS_KEY_ATTR_ND */
+}
+
 /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute.  */
 static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ENCAP] = -1,
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 6355b1d..4f03706 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -37,6 +37,8 @@
 
 #include "flow.h"
 
+size_t ovs_key_attr_size(void);
+
 void ovs_match_init(struct sw_flow_match *match,
 		    struct sw_flow_key *key, struct sw_flow_mask *mask);
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 11/14] openvswitch: Remove flow member from struct ovs_skb_cb
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Lorand Jakab, Pravin B Shelar

From: Lorand Jakab <lojakab@cisco.com>

The 'flow' memeber was chosen for removal because it's only used
in ovs_execute_actions() we can pass it as argument to this
function.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/actions.c  |  5 +----
 net/openvswitch/datapath.c | 12 +++++++-----
 net/openvswitch/datapath.h |  4 +---
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 9fd33c0..f7e5891 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -865,14 +865,11 @@ static void process_deferred_actions(struct datapath *dp)
 
 /* Execute a list of actions against 'skb'. */
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			struct sw_flow_key *key)
+			struct sw_flow_actions *acts, struct sw_flow_key *key)
 {
 	int level = this_cpu_read(exec_actions_level);
-	struct sw_flow_actions *acts;
 	int err;
 
-	acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
-
 	this_cpu_inc(exec_actions_level);
 	OVS_CB(skb)->egress_tun_info = NULL;
 	err = do_execute_actions(dp, skb, key,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index cdbc44c..4fd8a45 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -257,6 +257,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 	const struct vport *p = OVS_CB(skb)->input_vport;
 	struct datapath *dp = p->dp;
 	struct sw_flow *flow;
+	struct sw_flow_actions *sf_acts;
 	struct dp_stats_percpu *stats;
 	u64 *stats_counter;
 	u32 n_mask_hit;
@@ -282,10 +283,10 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 		goto out;
 	}
 
-	OVS_CB(skb)->flow = flow;
+	ovs_flow_stats_update(flow, key->tp.flags, skb);
+	sf_acts = rcu_dereference(flow->sf_acts);
+	ovs_execute_actions(dp, skb, sf_acts, key);
 
-	ovs_flow_stats_update(OVS_CB(skb)->flow, key->tp.flags, skb);
-	ovs_execute_actions(dp, skb, key);
 	stats_counter = &stats->n_hit;
 
 out:
@@ -524,6 +525,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow_actions *acts;
 	struct sk_buff *packet;
 	struct sw_flow *flow;
+	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
 	struct ethhdr *eth;
 	struct vport *input_vport;
@@ -579,7 +581,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 
 	OVS_CB(packet)->egress_tun_info = NULL;
-	OVS_CB(packet)->flow = flow;
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
 
@@ -597,9 +598,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock;
 
 	OVS_CB(packet)->input_vport = input_vport;
+	sf_acts = rcu_dereference(flow->sf_acts);
 
 	local_bh_disable();
-	err = ovs_execute_actions(dp, packet, &flow->key);
+	err = ovs_execute_actions(dp, packet, sf_acts, &flow->key);
 	local_bh_enable();
 	rcu_read_unlock();
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 9741354..1c56a80 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -94,14 +94,12 @@ struct datapath {
 
 /**
  * struct ovs_skb_cb - OVS data in skb CB
- * @flow: The flow associated with this packet.  May be %NULL if no flow.
  * @egress_tun_key: Tunnel information about this packet on egress path.
  * NULL if the packet is not being tunneled.
  * @input_vport: The original vport packet came in on. This value is cached
  * when a packet is received by OVS.
  */
 struct ovs_skb_cb {
-	struct sw_flow		*flow;
 	struct ovs_tunnel_info  *egress_tun_info;
 	struct vport		*input_vport;
 };
@@ -194,7 +192,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
 					 u8 cmd);
 
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			struct sw_flow_key *);
+			struct sw_flow_actions *acts, struct sw_flow_key *);
 
 void ovs_dp_notify_wq(struct work_struct *work);
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 10/14] openvswitch: Fix the type of struct ovs_key_nd nd_target field.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Jarno Rajahalme, Pravin B Shelar

From: Jarno Rajahalme <jrajahalme@nicira.com>

Should be the same as other IPv6 address fields.

Current master produces sparse warnings without this change.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/uapi/linux/openvswitch.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 631056b..26c36c4 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -400,9 +400,9 @@ struct ovs_key_arp {
 };
 
 struct ovs_key_nd {
-	__u32 nd_target[4];
-	__u8  nd_sll[ETH_ALEN];
-	__u8  nd_tll[ETH_ALEN];
+	__be32	nd_target[4];
+	__u8	nd_sll[ETH_ALEN];
+	__u8	nd_tll[ETH_ALEN];
 };
 
 /**
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 09/14] openvswitch: Drop packets when interdev is not up
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Chunhe Li, Pravin B Shelar

From: Chunhe Li <lichunhe@huawei.com>

If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.

Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/vport-internal_dev.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 10dc07e..6a55f71 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -224,6 +224,11 @@ static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
 	struct net_device *netdev = netdev_vport_priv(vport)->dev;
 	int len;
 
+	if (unlikely(!(netdev->flags & IFF_UP))) {
+		kfree_skb(skb);
+		return 0;
+	}
+
 	len = skb->len;
 
 	skb_dst_drop(skb);
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 08/14] openvswitch: Refactor get_dp() function into multiple access APIs.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andy Zhou, Pravin B Shelar

From: Andy Zhou <azhou@nicira.com>

Avoid recursive read_rcu_lock() by using the lighter weight
get_dp_rcu() API. Add proper locking assertions to get_dp().

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/datapath.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index bbb920b..cdbc44c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -140,19 +140,30 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
 static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
 				  const struct dp_upcall_info *);
 
-/* Must be called with rcu_read_lock or ovs_mutex. */
-static struct datapath *get_dp(struct net *net, int dp_ifindex)
+/* Must be called with rcu_read_lock. */
+static struct datapath *get_dp_rcu(struct net *net, int dp_ifindex)
 {
-	struct datapath *dp = NULL;
-	struct net_device *dev;
+	struct net_device *dev = dev_get_by_index_rcu(net, dp_ifindex);
 
-	rcu_read_lock();
-	dev = dev_get_by_index_rcu(net, dp_ifindex);
 	if (dev) {
 		struct vport *vport = ovs_internal_dev_get_vport(dev);
 		if (vport)
-			dp = vport->dp;
+			return vport->dp;
 	}
+
+	return NULL;
+}
+
+/* The caller must hold either ovs_mutex or rcu_read_lock to keep the
+ * returned dp pointer valid.
+ */
+static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
+{
+	struct datapath *dp;
+
+	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());
+	rcu_read_lock();
+	dp = get_dp_rcu(net, dp_ifindex);
 	rcu_read_unlock();
 
 	return dp;
@@ -573,7 +584,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	packet->mark = flow->key.phy.skb_mark;
 
 	rcu_read_lock();
-	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	dp = get_dp_rcu(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;
 	if (!dp)
 		goto err_unlock;
@@ -1227,7 +1238,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	struct datapath *dp;
 
 	rcu_read_lock();
-	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	dp = get_dp_rcu(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp) {
 		rcu_read_unlock();
 		return -ENODEV;
@@ -1989,7 +2000,7 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	int i, j = 0;
 
 	rcu_read_lock();
-	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	dp = get_dp_rcu(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp) {
 		rcu_read_unlock();
 		return -ENODEV;
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 07/14] openvswitch: Refactor ovs_flow_cmd_fill_info().
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Joe Stringer, Pravin B Shelar

From: Joe Stringer <joestringer@nicira.com>

Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/datapath.c | 93 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 66 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 04a26ae..bbb920b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -674,58 +674,67 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
 }
 
 /* Called with ovs_mutex or RCU read lock. */
-static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
-				  struct sk_buff *skb, u32 portid,
-				  u32 seq, u32 flags, u8 cmd)
+static int ovs_flow_cmd_fill_match(const struct sw_flow *flow,
+				   struct sk_buff *skb)
 {
-	const int skb_orig_len = skb->len;
-	struct nlattr *start;
-	struct ovs_flow_stats stats;
-	__be16 tcp_flags;
-	unsigned long used;
-	struct ovs_header *ovs_header;
 	struct nlattr *nla;
 	int err;
 
-	ovs_header = genlmsg_put(skb, portid, seq, &dp_flow_genl_family, flags, cmd);
-	if (!ovs_header)
-		return -EMSGSIZE;
-
-	ovs_header->dp_ifindex = dp_ifindex;
-
 	/* Fill flow key. */
 	nla = nla_nest_start(skb, OVS_FLOW_ATTR_KEY);
 	if (!nla)
-		goto nla_put_failure;
+		return -EMSGSIZE;
 
 	err = ovs_nla_put_flow(&flow->unmasked_key, &flow->unmasked_key, skb);
 	if (err)
-		goto error;
+		return err;
+
 	nla_nest_end(skb, nla);
 
+	/* Fill flow mask. */
 	nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);
 	if (!nla)
-		goto nla_put_failure;
+		return -EMSGSIZE;
 
 	err = ovs_nla_put_flow(&flow->key, &flow->mask->key, skb);
 	if (err)
-		goto error;
+		return err;
 
 	nla_nest_end(skb, nla);
+	return 0;
+}
+
+/* Called with ovs_mutex or RCU read lock. */
+static int ovs_flow_cmd_fill_stats(const struct sw_flow *flow,
+				   struct sk_buff *skb)
+{
+	struct ovs_flow_stats stats;
+	__be16 tcp_flags;
+	unsigned long used;
 
 	ovs_flow_stats_get(flow, &stats, &used, &tcp_flags);
 
 	if (used &&
 	    nla_put_u64(skb, OVS_FLOW_ATTR_USED, ovs_flow_used_time(used)))
-		goto nla_put_failure;
+		return -EMSGSIZE;
 
 	if (stats.n_packets &&
 	    nla_put(skb, OVS_FLOW_ATTR_STATS, sizeof(struct ovs_flow_stats), &stats))
-		goto nla_put_failure;
+		return -EMSGSIZE;
 
 	if ((u8)ntohs(tcp_flags) &&
 	     nla_put_u8(skb, OVS_FLOW_ATTR_TCP_FLAGS, (u8)ntohs(tcp_flags)))
-		goto nla_put_failure;
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+/* Called with ovs_mutex or RCU read lock. */
+static int ovs_flow_cmd_fill_actions(const struct sw_flow *flow,
+				     struct sk_buff *skb, int skb_orig_len)
+{
+	struct nlattr *start;
+	int err;
 
 	/* If OVS_FLOW_ATTR_ACTIONS doesn't fit, skip dumping the actions if
 	 * this is the first flow to be dumped into 'skb'.  This is unusual for
@@ -749,17 +758,47 @@ static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
 			nla_nest_end(skb, start);
 		else {
 			if (skb_orig_len)
-				goto error;
+				return err;
 
 			nla_nest_cancel(skb, start);
 		}
-	} else if (skb_orig_len)
-		goto nla_put_failure;
+	} else if (skb_orig_len) {
+		return -EMSGSIZE;
+	}
+
+	return 0;
+}
+
+/* Called with ovs_mutex or RCU read lock. */
+static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
+				  struct sk_buff *skb, u32 portid,
+				  u32 seq, u32 flags, u8 cmd)
+{
+	const int skb_orig_len = skb->len;
+	struct ovs_header *ovs_header;
+	int err;
+
+	ovs_header = genlmsg_put(skb, portid, seq, &dp_flow_genl_family,
+				 flags, cmd);
+	if (!ovs_header)
+		return -EMSGSIZE;
+
+	ovs_header->dp_ifindex = dp_ifindex;
+
+	err = ovs_flow_cmd_fill_match(flow, skb);
+	if (err)
+		goto error;
+
+	err = ovs_flow_cmd_fill_stats(flow, skb);
+	if (err)
+		goto error;
+
+	err = ovs_flow_cmd_fill_actions(flow, skb, skb_orig_len);
+	if (err)
+		goto error;
 
 	return genlmsg_end(skb, ovs_header);
 
-nla_put_failure:
-	err = -EMSGSIZE;
 error:
 	genlmsg_cancel(skb, ovs_header);
 	return err;
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 06/14] openvswitch: refactor do_output() to move NULL check out of fast path
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andy Zhou, Pravin B Shelar

From: Andy Zhou <azhou@nicira.com>

skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.

Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/actions.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 930b1b6..9fd33c0 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -551,21 +551,14 @@ static int set_sctp(struct sk_buff *skb,
 	return 0;
 }
 
-static int do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
+static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
 {
-	struct vport *vport;
+	struct vport *vport = ovs_vport_rcu(dp, out_port);
 
-	if (unlikely(!skb))
-		return -ENOMEM;
-
-	vport = ovs_vport_rcu(dp, out_port);
-	if (unlikely(!vport)) {
+	if (likely(vport))
+		ovs_vport_send(vport, skb);
+	else
 		kfree_skb(skb);
-		return -ENODEV;
-	}
-
-	ovs_vport_send(vport, skb);
-	return 0;
 }
 
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
@@ -768,8 +761,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 	     a = nla_next(a, &rem)) {
 		int err = 0;
 
-		if (prev_port != -1) {
-			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
+		if (unlikely(prev_port != -1)) {
+			struct sk_buff *out_skb = skb_clone(skb, GFP_ATOMIC);
+
+			if (out_skb)
+				do_output(dp, out_skb, prev_port);
+
 			prev_port = -1;
 		}
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 05/14] openvswitch: Additional logging for -EINVAL on flow setups.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Jesse Gross, Federico Iezzi, Pravin B Shelar

From: Jesse Gross <jesse@nicira.com>

There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.

CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/datapath.c     | 12 +++++++++---
 net/openvswitch/flow_netlink.c | 17 +++++++++++++----
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a532a9c..04a26ae 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -817,10 +817,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 
 	/* Must have key and actions. */
 	error = -EINVAL;
-	if (!a[OVS_FLOW_ATTR_KEY])
+	if (!a[OVS_FLOW_ATTR_KEY]) {
+		OVS_NLERR("Flow key attribute not present in new flow.\n");
 		goto error;
-	if (!a[OVS_FLOW_ATTR_ACTIONS])
+	}
+	if (!a[OVS_FLOW_ATTR_ACTIONS]) {
+		OVS_NLERR("Flow actions attribute not present in new flow.\n");
 		goto error;
+	}
 
 	/* Most of the time we need to allocate a new flow, do it before
 	 * locking.
@@ -979,8 +983,10 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 
 	/* Extract key. */
 	error = -EINVAL;
-	if (!a[OVS_FLOW_ATTR_KEY])
+	if (!a[OVS_FLOW_ATTR_KEY]) {
+		OVS_NLERR("Flow key attribute not present in set flow.\n");
 		goto error;
+	}
 
 	ovs_match_init(&match, &key, &mask);
 	error = ovs_nla_get_match(&match,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 5a91d79..1b29ea7 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -581,10 +581,13 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 	if (*attrs & (1 << OVS_KEY_ATTR_IN_PORT)) {
 		u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]);
 
-		if (is_mask)
+		if (is_mask) {
 			in_port = 0xffffffff; /* Always exact match in_port. */
-		else if (in_port >= DP_MAX_PORTS)
+		} else if (in_port >= DP_MAX_PORTS) {
+			OVS_NLERR("Port (%d) exceeds maximum allowable (%d).\n",
+				  in_port, DP_MAX_PORTS);
 			return -EINVAL;
+		}
 
 		SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask);
 		*attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT);
@@ -824,8 +827,11 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 		attrs &= ~(1 << OVS_KEY_ATTR_ND);
 	}
 
-	if (attrs != 0)
+	if (attrs != 0) {
+		OVS_NLERR("Unknown key attributes (%llx).\n",
+			  (unsigned long long)attrs);
 		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -1250,8 +1256,10 @@ struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size)
 {
 	struct sw_flow_actions *sfa;
 
-	if (size > MAX_ACTIONS_BUFSIZE)
+	if (size > MAX_ACTIONS_BUFSIZE) {
+		OVS_NLERR("Flow action size (%u bytes) exceeds maximum", size);
 		return ERR_PTR(-EINVAL);
+	}
 
 	sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL);
 	if (!sfa)
@@ -1786,6 +1794,7 @@ static int ovs_nla_copy_actions__(const struct nlattr *attr,
 			break;
 
 		default:
+			OVS_NLERR("Unknown tunnel attribute (%d).\n", type);
 			return -EINVAL;
 		}
 		if (!skip_copy) {
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 04/14] openvswitch: Remove redundant tcp_flags code.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Joe Stringer, Pravin B Shelar

From: Joe Stringer <joestringer@nicira.com>

These two cases used to be treated differently for IPv4/IPv6,
but they are now identical.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/flow_netlink.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 569309c..5a91d79 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -611,7 +611,6 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 				const struct nlattr **a, bool is_mask)
 {
 	int err;
-	u64 orig_attrs = attrs;
 
 	err = metadata_from_nlattrs(match, &attrs, a, is_mask);
 	if (err)
@@ -764,15 +763,9 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_TCP_FLAGS)) {
-		if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) {
-			SW_FLOW_KEY_PUT(match, tp.flags,
-					nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]),
-					is_mask);
-		} else {
-			SW_FLOW_KEY_PUT(match, tp.flags,
-					nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]),
-					is_mask);
-		}
+		SW_FLOW_KEY_PUT(match, tp.flags,
+				nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]),
+				is_mask);
 		attrs &= ~(1 << OVS_KEY_ATTR_TCP_FLAGS);
 	}
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 03/14] openvswitch: Move table destroy to dp-rcu callback.
From: Pravin B Shelar @ 2014-11-06  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

Ths simplifies flow-table-destroy API. No need to pass explicit
parameter about context.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
---
 net/openvswitch/datapath.c   |  5 ++---
 net/openvswitch/flow_table.c | 11 +++++++----
 net/openvswitch/flow_table.h |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 688cb9b..a532a9c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -187,6 +187,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
 {
 	struct datapath *dp = container_of(rcu, struct datapath, rcu);
 
+	ovs_flow_tbl_destroy(&dp->table);
 	free_percpu(dp->stats_percpu);
 	release_net(ovs_dp_get_net(dp));
 	kfree(dp->ports);
@@ -1444,7 +1445,7 @@ err_destroy_ports_array:
 err_destroy_percpu:
 	free_percpu(dp->stats_percpu);
 err_destroy_table:
-	ovs_flow_tbl_destroy(&dp->table, false);
+	ovs_flow_tbl_destroy(&dp->table);
 err_free_dp:
 	release_net(ovs_dp_get_net(dp));
 	kfree(dp);
@@ -1476,8 +1477,6 @@ static void __dp_destroy(struct datapath *dp)
 	ovs_dp_detach_port(ovs_vport_ovsl(dp, OVSP_LOCAL));
 
 	/* RCU destroy the flow table */
-	ovs_flow_tbl_destroy(&dp->table, true);
-
 	call_rcu(&dp->rcu, destroy_dp_rcu);
 }
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index cf2d853..90f8b40 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2014 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -250,11 +250,14 @@ skip_flows:
 		__table_instance_destroy(ti);
 }
 
-void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred)
+/* No need for locking this function is called from RCU callback or
+ * error path.
+ */
+void ovs_flow_tbl_destroy(struct flow_table *table)
 {
-	struct table_instance *ti = ovsl_dereference(table->ti);
+	struct table_instance *ti = rcu_dereference_raw(table->ti);
 
-	table_instance_destroy(ti, deferred);
+	table_instance_destroy(ti, false);
 }
 
 struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 5918bff..f682c8c 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -62,7 +62,7 @@ void ovs_flow_free(struct sw_flow *, bool deferred);
 
 int ovs_flow_tbl_init(struct flow_table *);
 int ovs_flow_tbl_count(struct flow_table *table);
-void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred);
+void ovs_flow_tbl_destroy(struct flow_table *table);
 int ovs_flow_tbl_flush(struct flow_table *flow_table);
 
 int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next v2 02/14] openvswitch: Add basic MPLS support to kernel
From: Pravin B Shelar @ 2014-11-06  9:18 UTC (permalink / raw)
  To: davem
  Cc: netdev, Simon Horman, Ravi K, Leo Alterman, Isaku Yamahata,
	Joe Stringer, Jesse Gross, Pravin B Shelar

From: Simon Horman <horms@verge.net.au>

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/net/mpls.h               |  39 +++++++++++
 include/uapi/linux/openvswitch.h |  32 +++++++++
 net/core/dev.c                   |   3 +-
 net/openvswitch/Kconfig          |   1 +
 net/openvswitch/actions.c        | 106 ++++++++++++++++++++++++++++-
 net/openvswitch/datapath.c       |   6 +-
 net/openvswitch/flow.c           |  30 +++++++++
 net/openvswitch/flow.h           |  17 +++--
 net/openvswitch/flow_netlink.c   | 139 ++++++++++++++++++++++++++++++++++-----
 net/openvswitch/flow_netlink.h   |   2 +-
 10 files changed, 345 insertions(+), 30 deletions(-)
 create mode 100644 include/net/mpls.h

diff --git a/include/net/mpls.h b/include/net/mpls.h
new file mode 100644
index 0000000..5b3b5ad
--- /dev/null
+++ b/include/net/mpls.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) 2014 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_H
+#define _NET_MPLS_H 1
+
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+	return eth_type == htons(ETH_P_MPLS_UC) ||
+		eth_type == htons(ETH_P_MPLS_MC);
+}
+
+/*
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
+{
+	return skb_mac_header(skb) + skb->mac_len;
+}
+#endif
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 435eabc..631056b 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -293,6 +293,9 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_DP_HASH,      /* u32 hash value. Value 0 indicates the hash
 				   is not computed by the datapath. */
 	OVS_KEY_ATTR_RECIRC_ID, /* u32 recirc id */
+	OVS_KEY_ATTR_MPLS,      /* array of struct ovs_key_mpls.
+				 * The implementation may restrict
+				 * the accepted length of the array. */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
@@ -340,6 +343,10 @@ struct ovs_key_ethernet {
 	__u8	 eth_dst[ETH_ALEN];
 };
 
+struct ovs_key_mpls {
+	__be32 mpls_lse;
+};
+
 struct ovs_key_ipv4 {
 	__be32 ipv4_src;
 	__be32 ipv4_dst;
@@ -484,6 +491,19 @@ enum ovs_userspace_attr {
 #define OVS_USERSPACE_ATTR_MAX (__OVS_USERSPACE_ATTR_MAX - 1)
 
 /**
+ * struct ovs_action_push_mpls - %OVS_ACTION_ATTR_PUSH_MPLS action argument.
+ * @mpls_lse: MPLS label stack entry to push.
+ * @mpls_ethertype: Ethertype to set in the encapsulating ethernet frame.
+ *
+ * The only values @mpls_ethertype should ever be given are %ETH_P_MPLS_UC and
+ * %ETH_P_MPLS_MC, indicating MPLS unicast or multicast. Other are rejected.
+ */
+struct ovs_action_push_mpls {
+	__be32 mpls_lse;
+	__be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */
+};
+
+/**
  * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument.
  * @vlan_tpid: Tag protocol identifier (TPID) to push.
  * @vlan_tci: Tag control identifier (TCI) to push.  The CFI bit must be set
@@ -534,6 +554,15 @@ struct ovs_action_hash {
  * @OVS_ACTION_ATTR_POP_VLAN: Pop the outermost 802.1Q header off the packet.
  * @OVS_ACTION_ATTR_SAMPLE: Probabilitically executes actions, as specified in
  * the nested %OVS_SAMPLE_ATTR_* attributes.
+ * @OVS_ACTION_ATTR_PUSH_MPLS: Push a new MPLS label stack entry onto the
+ * top of the packets MPLS label stack.  Set the ethertype of the
+ * encapsulating frame to either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC to
+ * indicate the new packet contents.
+ * @OVS_ACTION_ATTR_POP_MPLS: Pop an MPLS label stack entry off of the
+ * packet's MPLS label stack.  Set the encapsulating frame's ethertype to
+ * indicate the new packet contents. This could potentially still be
+ * %ETH_P_MPLS if the resulting MPLS label stack is not empty.  If there
+ * is no MPLS label stack, as determined by ethertype, no action is taken.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -550,6 +579,9 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_SAMPLE,       /* Nested OVS_SAMPLE_ATTR_*. */
 	OVS_ACTION_ATTR_RECIRC,       /* u32 recirc_id. */
 	OVS_ACTION_ATTR_HASH,	      /* struct ovs_action_hash. */
+	OVS_ACTION_ATTR_PUSH_MPLS,    /* struct ovs_action_push_mpls. */
+	OVS_ACTION_ATTR_POP_MPLS,     /* __be16 ethertype. */
+
 	__OVS_ACTION_ATTR_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 40be481..70bb609 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -118,6 +118,7 @@
 #include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <net/ip.h>
+#include <net/mpls.h>
 #include <linux/ipv6.h>
 #include <linux/in.h>
 #include <linux/jhash.h>
@@ -2530,7 +2531,7 @@ static netdev_features_t net_mpls_features(struct sk_buff *skb,
 					   netdev_features_t features,
 					   __be16 type)
 {
-	if (type == htons(ETH_P_MPLS_UC) || type == htons(ETH_P_MPLS_MC))
+	if (eth_p_mpls(type))
 		features &= skb->dev->mpls_features;
 
 	return features;
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2a9673e..454ce12 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -30,6 +30,7 @@ config OPENVSWITCH
 
 config OPENVSWITCH_GRE
 	tristate "Open vSwitch GRE tunneling support"
+	select NET_MPLS_GSO
 	depends on INET
 	depends on OPENVSWITCH
 	depends on NET_IPGRE_DEMUX
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 922c133..930b1b6 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -28,10 +28,12 @@
 #include <linux/in6.h>
 #include <linux/if_arp.h>
 #include <linux/if_vlan.h>
+
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/checksum.h>
 #include <net/dsfield.h>
+#include <net/mpls.h>
 #include <net/sctp/checksum.h>
 
 #include "datapath.h"
@@ -118,6 +120,92 @@ static int make_writable(struct sk_buff *skb, int write_len)
 	return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 }
 
+static int push_mpls(struct sk_buff *skb,
+		     const struct ovs_action_push_mpls *mpls)
+{
+	__be32 *new_mpls_lse;
+	struct ethhdr *hdr;
+
+	/* Networking stack do not allow simultaneous Tunnel and MPLS GSO. */
+	if (skb->encapsulation)
+		return -ENOTSUPP;
+
+	if (skb_cow_head(skb, MPLS_HLEN) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, MPLS_HLEN);
+	memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+	skb_reset_mac_header(skb);
+
+	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
+	*new_mpls_lse = mpls->mpls_lse;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+							     MPLS_HLEN, 0));
+
+	hdr = eth_hdr(skb);
+	hdr->h_proto = mpls->mpls_ethertype;
+
+	skb_set_inner_protocol(skb, skb->protocol);
+	skb->protocol = mpls->mpls_ethertype;
+
+	return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+	struct ethhdr *hdr;
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_sub(skb->csum,
+				     csum_partial(skb_mpls_header(skb),
+						  MPLS_HLEN, 0));
+
+	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+
+	__skb_pull(skb, MPLS_HLEN);
+	skb_reset_mac_header(skb);
+
+	/* skb_mpls_header() is used to locate the ethertype
+	 * field correctly in the presence of VLAN tags.
+	 */
+	hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+	hdr->h_proto = ethertype;
+	if (eth_p_mpls(skb->protocol))
+		skb->protocol = ethertype;
+	return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+	__be32 *stack;
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	stack = (__be32 *)skb_mpls_header(skb);
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		__be32 diff[] = { ~(*stack), *mpls_lse };
+
+		skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+					  ~skb->csum);
+	}
+
+	*stack = *mpls_lse;
+
+	return 0;
+}
+
 /* remove VLAN header from packet and update csum accordingly. */
 static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 {
@@ -140,10 +228,12 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 
 	vlan_set_encap_proto(skb, vhdr);
 	skb->mac_header += VLAN_HLEN;
+
 	if (skb_network_offset(skb) < ETH_HLEN)
 		skb_set_network_header(skb, ETH_HLEN);
-	skb_reset_mac_len(skb);
 
+	/* Update mac_len for subsequent MPLS actions */
+	skb_reset_mac_len(skb);
 	return 0;
 }
 
@@ -186,6 +276,8 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
 
 		if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
 			return -ENOMEM;
+		/* Update mac_len for subsequent MPLS actions */
+		skb->mac_len += VLAN_HLEN;
 
 		if (skb->ip_summed == CHECKSUM_COMPLETE)
 			skb->csum = csum_add(skb->csum, csum_partial(skb->data
@@ -612,6 +704,10 @@ static int execute_set_action(struct sk_buff *skb,
 	case OVS_KEY_ATTR_SCTP:
 		err = set_sctp(skb, nla_data(nested_attr));
 		break;
+
+	case OVS_KEY_ATTR_MPLS:
+		err = set_mpls(skb, nla_data(nested_attr));
+		break;
 	}
 
 	return err;
@@ -690,6 +786,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			execute_hash(skb, key, a);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS:
+			err = push_mpls(skb, nla_data(a));
+			break;
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			err = pop_mpls(skb, nla_get_be16(a));
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_VLAN:
 			err = push_vlan(skb, nla_data(a));
 			if (unlikely(err)) /* skb already freed. */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index f18302f..688cb9b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -560,7 +560,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, 0, &acts);
+				   &flow->key, &acts);
 	if (err)
 		goto err_flow_free;
 
@@ -846,7 +846,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_kfree_flow;
 
 	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
-				     0, &acts);
+				     &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 		goto err_kfree_acts;
@@ -953,7 +953,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 		return acts;
 
 	ovs_flow_mask_key(&masked_key, key, mask);
-	error = ovs_nla_copy_actions(a, &masked_key, 0, &acts);
+	error = ovs_nla_copy_actions(a, &masked_key, &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 		kfree(acts);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 2b78789..90a2101 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -32,6 +32,7 @@
 #include <linux/if_arp.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <linux/mpls.h>
 #include <linux/sctp.h>
 #include <linux/smp.h>
 #include <linux/tcp.h>
@@ -42,6 +43,7 @@
 #include <net/ip.h>
 #include <net/ip_tunnels.h>
 #include <net/ipv6.h>
+#include <net/mpls.h>
 #include <net/ndisc.h>
 
 #include "datapath.h"
@@ -480,6 +482,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		return -ENOMEM;
 
 	skb_reset_network_header(skb);
+	skb_reset_mac_len(skb);
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 
 	/* Network layer. */
@@ -584,6 +587,33 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			memset(&key->ip, 0, sizeof(key->ip));
 			memset(&key->ipv4, 0, sizeof(key->ipv4));
 		}
+	} else if (eth_p_mpls(key->eth.type)) {
+		size_t stack_len = MPLS_HLEN;
+
+		/* In the presence of an MPLS label stack the end of the L2
+		 * header and the beginning of the L3 header differ.
+		 *
+		 * Advance network_header to the beginning of the L3
+		 * header. mac_len corresponds to the end of the L2 header.
+		 */
+		while (1) {
+			__be32 lse;
+
+			error = check_header(skb, skb->mac_len + stack_len);
+			if (unlikely(error))
+				return 0;
+
+			memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+			if (stack_len == MPLS_HLEN)
+				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+			skb_set_network_header(skb, skb->mac_len + stack_len);
+			if (lse & htonl(MPLS_LS_S_MASK))
+				break;
+
+			stack_len += MPLS_HLEN;
+		}
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 7181331..4962bee 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -102,12 +102,17 @@ struct sw_flow_key {
 		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
 		__be16 type;		/* Ethernet frame type. */
 	} eth;
-	struct {
-		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
-		u8     tos;		/* IP ToS. */
-		u8     ttl;		/* IP TTL/hop limit. */
-		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
-	} ip;
+	union {
+		struct {
+			__be32 top_lse;	/* top label stack entry */
+		} mpls;
+		struct {
+			u8     proto;	/* IP protocol or lower 8 bits of ARP opcode. */
+			u8     tos;	    /* IP ToS. */
+			u8     ttl;	    /* IP TTL/hop limit. */
+			u8     frag;	/* One of OVS_FRAG_TYPE_*. */
+		} ip;
+	};
 	struct {
 		__be16 src;		/* TCP/UDP/SCTP source port. */
 		__be16 dst;		/* TCP/UDP/SCTP destination port. */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 939bcb3..569309c 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -46,6 +46,7 @@
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/ndisc.h>
+#include <net/mpls.h>
 
 #include "flow_netlink.h"
 
@@ -134,7 +135,8 @@ static bool match_validate(const struct sw_flow_match *match,
 			| (1 << OVS_KEY_ATTR_ICMP)
 			| (1 << OVS_KEY_ATTR_ICMPV6)
 			| (1 << OVS_KEY_ATTR_ARP)
-			| (1 << OVS_KEY_ATTR_ND));
+			| (1 << OVS_KEY_ATTR_ND)
+			| (1 << OVS_KEY_ATTR_MPLS));
 
 	/* Always allowed mask fields. */
 	mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL)
@@ -149,6 +151,12 @@ static bool match_validate(const struct sw_flow_match *match,
 			mask_allowed |= 1 << OVS_KEY_ATTR_ARP;
 	}
 
+	if (eth_p_mpls(match->key->eth.type)) {
+		key_expected |= 1 << OVS_KEY_ATTR_MPLS;
+		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+			mask_allowed |= 1 << OVS_KEY_ATTR_MPLS;
+	}
+
 	if (match->key->eth.type == htons(ETH_P_IP)) {
 		key_expected |= 1 << OVS_KEY_ATTR_IPV4;
 		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -266,6 +274,7 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_RECIRC_ID] = sizeof(u32),
 	[OVS_KEY_ATTR_DP_HASH] = sizeof(u32),
 	[OVS_KEY_ATTR_TUNNEL] = -1,
+	[OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
 };
 
 static bool is_all_zero(const u8 *fp, size_t size)
@@ -735,6 +744,16 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 		attrs &= ~(1 << OVS_KEY_ATTR_ARP);
 	}
 
+	if (attrs & (1 << OVS_KEY_ATTR_MPLS)) {
+		const struct ovs_key_mpls *mpls_key;
+
+		mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+		SW_FLOW_KEY_PUT(match, mpls.top_lse,
+				mpls_key->mpls_lse, is_mask);
+
+		attrs &= ~(1 << OVS_KEY_ATTR_MPLS);
+	 }
+
 	if (attrs & (1 << OVS_KEY_ATTR_TCP)) {
 		const struct ovs_key_tcp *tcp_key;
 
@@ -1140,6 +1159,14 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		arp_key->arp_op = htons(output->ip.proto);
 		ether_addr_copy(arp_key->arp_sha, output->ipv4.arp.sha);
 		ether_addr_copy(arp_key->arp_tha, output->ipv4.arp.tha);
+	} else if (eth_p_mpls(swkey->eth.type)) {
+		struct ovs_key_mpls *mpls_key;
+
+		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+		if (!nla)
+			goto nla_put_failure;
+		mpls_key = nla_data(nla);
+		mpls_key->mpls_lse = output->mpls.top_lse;
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -1336,9 +1363,15 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 	a->nla_len = sfa->actions_len - st_offset;
 }
 
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth, struct sw_flow_actions **sfa,
+				  __be16 eth_type, __be16 vlan_tci);
+
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
-				    struct sw_flow_actions **sfa)
+				    struct sw_flow_actions **sfa,
+				    __be16 eth_type, __be16 vlan_tci)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
@@ -1375,7 +1408,8 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	if (st_acts < 0)
 		return st_acts;
 
-	err = ovs_nla_copy_actions(actions, key, depth + 1, sfa);
+	err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa,
+				     eth_type, vlan_tci);
 	if (err)
 		return err;
 
@@ -1385,10 +1419,10 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	return 0;
 }
 
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port(const struct sw_flow_key *flow_key,
+			    __be16 eth_type)
 {
-	if ((flow_key->eth.type == htons(ETH_P_IP) ||
-	     flow_key->eth.type == htons(ETH_P_IPV6)) &&
+	if ((eth_type == htons(ETH_P_IP) || eth_type == htons(ETH_P_IPV6)) &&
 	    (flow_key->tp.src || flow_key->tp.dst))
 		return 0;
 
@@ -1483,7 +1517,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
 			struct sw_flow_actions **sfa,
-			bool *set_tun)
+			bool *set_tun, __be16 eth_type)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -1508,6 +1542,9 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
+		if (eth_p_mpls(eth_type))
+			return -EINVAL;
+
 		*set_tun = true;
 		err = validate_and_copy_set_tun(a, sfa);
 		if (err)
@@ -1515,7 +1552,7 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_IPV4:
-		if (flow_key->eth.type != htons(ETH_P_IP))
+		if (eth_type != htons(ETH_P_IP))
 			return -EINVAL;
 
 		if (!flow_key->ip.proto)
@@ -1531,7 +1568,7 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_IPV6:
-		if (flow_key->eth.type != htons(ETH_P_IPV6))
+		if (eth_type != htons(ETH_P_IPV6))
 			return -EINVAL;
 
 		if (!flow_key->ip.proto)
@@ -1553,19 +1590,24 @@ static int validate_set(const struct nlattr *a,
 		if (flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
 
 	case OVS_KEY_ATTR_UDP:
 		if (flow_key->ip.proto != IPPROTO_UDP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
+
+	case OVS_KEY_ATTR_MPLS:
+		if (!eth_p_mpls(eth_type))
+			return -EINVAL;
+		break;
 
 	case OVS_KEY_ATTR_SCTP:
 		if (flow_key->ip.proto != IPPROTO_SCTP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
 
 	default:
 		return -EINVAL;
@@ -1609,12 +1651,13 @@ static int copy_action(const struct nlattr *from,
 	return 0;
 }
 
-int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key,
-			 int depth,
-			 struct sw_flow_actions **sfa)
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth, struct sw_flow_actions **sfa,
+				  __be16 eth_type, __be16 vlan_tci)
 {
 	const struct nlattr *a;
+	bool out_tnl_port = false;
 	int rem, err;
 
 	if (depth >= SAMPLE_ACTION_DEPTH)
@@ -1626,6 +1669,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 			[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
 			[OVS_ACTION_ATTR_RECIRC] = sizeof(u32),
 			[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+			[OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
 			[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
 			[OVS_ACTION_ATTR_POP_VLAN] = 0,
 			[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -1655,6 +1700,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		case OVS_ACTION_ATTR_OUTPUT:
 			if (nla_get_u32(a) >= DP_MAX_PORTS)
 				return -EINVAL;
+			out_tnl_port = false;
+
 			break;
 
 		case OVS_ACTION_ATTR_HASH: {
@@ -1671,6 +1718,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_VLAN:
+			vlan_tci = htons(0);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
@@ -1679,19 +1727,66 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 				return -EINVAL;
 			if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT)))
 				return -EINVAL;
+			vlan_tci = vlan->vlan_tci;
 			break;
 
 		case OVS_ACTION_ATTR_RECIRC:
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS: {
+			const struct ovs_action_push_mpls *mpls = nla_data(a);
+
+			/* Networking stack do not allow simultaneous Tunnel
+			 * and MPLS GSO.
+			 */
+			if (out_tnl_port)
+				return -EINVAL;
+
+			if (!eth_p_mpls(mpls->mpls_ethertype))
+				return -EINVAL;
+			/* Prohibit push MPLS other than to a white list
+			 * for packets that have a known tag order.
+			 */
+			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			    (eth_type != htons(ETH_P_IP) &&
+			     eth_type != htons(ETH_P_IPV6) &&
+			     eth_type != htons(ETH_P_ARP) &&
+			     eth_type != htons(ETH_P_RARP) &&
+			     !eth_p_mpls(eth_type)))
+				return -EINVAL;
+			eth_type = mpls->mpls_ethertype;
+			break;
+		}
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			    !eth_p_mpls(eth_type))
+				return -EINVAL;
+
+			/* Disallow subsequent L2.5+ set and mpls_pop actions
+			 * as there is no check here to ensure that the new
+			 * eth_type is valid and thus set actions could
+			 * write off the end of the packet or otherwise
+			 * corrupt it.
+			 *
+			 * Support for these actions is planned using packet
+			 * recirculation.
+			 */
+			eth_type = htons(0);
+			break;
+
 		case OVS_ACTION_ATTR_SET:
-			err = validate_set(a, key, sfa, &skip_copy);
+			err = validate_set(a, key, sfa,
+					   &out_tnl_port, eth_type);
 			if (err)
 				return err;
+
+			skip_copy = out_tnl_port;
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = validate_and_copy_sample(a, key, depth, sfa);
+			err = validate_and_copy_sample(a, key, depth, sfa,
+						       eth_type, vlan_tci);
 			if (err)
 				return err;
 			skip_copy = true;
@@ -1713,6 +1808,14 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	return 0;
 }
 
+int ovs_nla_copy_actions(const struct nlattr *attr,
+			 const struct sw_flow_key *key,
+			 struct sw_flow_actions **sfa)
+{
+	return ovs_nla_copy_actions__(attr, key, 0, sfa, key->eth.type,
+				      key->eth.tci);
+}
+
 static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
 {
 	const struct nlattr *a;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 206e45a..6355b1d 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -49,7 +49,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key, int depth,
+			 const struct sw_flow_key *key,
 			 struct sw_flow_actions **sfa);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
-- 
1.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox