Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 0/3] Preparation for VLAN_TAG_PRESENT cleanup
From: David Miller @ 2017-01-04  3:23 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <cover.1483487429.git.mirq-linux@rere.qmqm.pl>


By submitted these in sections, but all at once, you are subverting
my requirement to submit only small self contained patch series.

Please do not do this.

The whole point is to not have a lot of patches in flight for one
thing for people to review at one time.  Contributors can review
more easily small, easily digestable, pieces.

Start over, and only submit small numbers of patches at one time.  Do
not submit new patches until the first series has been processed.

Thank you.

^ permalink raw reply

* [PATCH net-next] net: phy: add extension of phy-mode for XLGMII
From: Jie Deng @ 2017-01-04  5:04 UTC (permalink / raw)
  To: f.fainelli, davem; +Cc: netdev, linux-kernel, Jie Deng

The Synopsys DWC_xlgmac core provides a multiplexed 40-Gigabit
Media-Independent Interface (XLGMII, an IEEE 802.3 Clause 81
compliant reconciliation sub-layer) for communication with
the 100/50/40/25-Gigabit PHY and 10-Gigabit Media-Independent
Interface (XGMII, an IEEE 802.3 Clause 46 compliant reconciliation
sub-layer) for communication with the 10-Gigabit PHY.

Currently, There are only interface mode definitions for "xgmii".
This patch adds the definitions for the PHY layer to recognize
"xlgmii" as a valid PHY interface.

Signed-off-by: Jie Deng <jiedeng@synopsys.com>
---
 include/linux/phy.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index f7d95f6..7b6bfb3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -82,6 +82,7 @@
 	PHY_INTERFACE_MODE_MOCA,
 	PHY_INTERFACE_MODE_QSGMII,
 	PHY_INTERFACE_MODE_TRGMII,
+	PHY_INTERFACE_MODE_XLGMII,
 	PHY_INTERFACE_MODE_MAX,
 } phy_interface_t;
 
@@ -142,6 +143,8 @@ static inline const char *phy_modes(phy_interface_t interface)
 		return "qsgmii";
 	case PHY_INTERFACE_MODE_TRGMII:
 		return "trgmii";
+	case PHY_INTERFACE_MODE_XLGMII:
+		return "xlgmii";
 	default:
 		return "unknown";
 	}
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next v2] net/sched: cls_matchall: Fix error path
From: Cong Wang @ 2017-01-04  6:33 UTC (permalink / raw)
  To: Yotam Gigi
  Cc: Jamal Hadi Salim, David Miller, eladr, Jiri Pirko,
	Linux Kernel Network Developers
In-Reply-To: <1483464024-25745-1-git-send-email-yotamg@mellanox.com>

On Tue, Jan 3, 2017 at 9:20 AM, Yotam Gigi <yotamg@mellanox.com> wrote:
> Fix several error paths in matchall:
>  - Release reference to actions in case the hardware fails offloading
>    (relevant to skip_sw only)
>  - Fix error path in case tcf_exts initialization/validation fail
>
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

Looks good,

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

^ permalink raw reply

* Re: ping: What's the purpose of rate limit?
From: Cong Wang @ 2017-01-04  6:45 UTC (permalink / raw)
  To: Guan Xin; +Cc: Linux Kernel Network Developers
In-Reply-To: <CANeMGR4=vycxBU+LcrL2JXHt2FTYBkU2aLxfY5XRzHdws-aYag@mail.gmail.com>

Hi,

On Sat, Dec 31, 2016 at 4:04 AM, Guan Xin <guanx.bac@gmail.com> wrote:
> Hello,
>
> Excuse me, I searched but didn't find an answer --
>
> What's the purpose of setting a limit to the "-i" and "-l" parameters
> of ping for non-root users?
>
> It seems that this is only intended to prevent accidental misuse
> because these restrictions can be easily worked around by starting
> multiple instances or wrapping the program in a loop, e.g.,
> while true; do ping -l3 -c3 192.168.0.1; done
>

I think the purpose of "-i" is not for rate limit, but for either hear beating
or monitoring latency periodically.

Thanks.

^ permalink raw reply

* [RFC 2/4] net-next: dsa: Refactor DT probing of a switch port
From: John Crispin @ 2017-01-04  7:38 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Florian Fainelli, Vivien Didelot; +Cc: netdev
In-Reply-To: <1483515484-21793-1-git-send-email-john@phrozen.org>

From: Andrew Lunn <andrew@lunn.ch>

Move the DT probing of a switch port into a function of its own, since
it is about to get more complex. Add better error handling as well.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/dsa.c |  138 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 102 insertions(+), 36 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 7899919..0e0621c 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -326,14 +326,10 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 			continue;
 
 		if (!strcmp(name, "cpu")) {
-			if (dst->cpu_switch != -1) {
-				netdev_err(dst->master_netdev,
-					   "multiple cpu ports?!\n");
-				ret = -EINVAL;
-				goto out;
+			if (dst->cpu_switch == -1) {
+				dst->cpu_switch = index;
+				dst->cpu_port = i;
 			}
-			dst->cpu_switch = index;
-			dst->cpu_port = i;
 			ds->cpu_port_mask |= 1 << i;
 		} else if (!strcmp(name, "dsa")) {
 			ds->dsa_port_mask |= 1 << i;
@@ -709,11 +705,15 @@ static void dsa_of_free_platform_data(struct dsa_platform_data *pd)
 {
 	int i;
 	int port_index;
+	struct dsa_chip_data *cd;
 
 	for (i = 0; i < pd->nr_chips; i++) {
+		cd = &pd->chip[i];
 		port_index = 0;
 		while (port_index < DSA_MAX_PORTS) {
-			kfree(pd->chip[i].port_names[port_index]);
+			kfree(cd->port_names[port_index]);
+			if (cd->port_ethernet[port_index])
+				dev_put(cd->port_ethernet[port_index]);
 			port_index++;
 		}
 
@@ -724,6 +724,94 @@ static void dsa_of_free_platform_data(struct dsa_platform_data *pd)
 	kfree(pd->chip);
 }
 
+static int dsa_of_probe_cpu_port(struct dsa_chip_data *cd,
+				 struct device_node *port,
+				 int port_index)
+{
+	struct net_device *ethernet_dev;
+	struct device_node *ethernet;
+
+	ethernet = of_parse_phandle(port, "ethernet", 0);
+	if (ethernet) {
+		ethernet_dev = of_find_net_device_by_node(ethernet);
+		if (!ethernet_dev)
+			return -EPROBE_DEFER;
+
+		dev_hold(ethernet_dev);
+		cd->port_ethernet[port_index] = ethernet_dev;
+	}
+
+	return 0;
+}
+
+static int dsa_of_probe_user_port(struct dsa_chip_data *cd,
+				  struct device_node *port,
+				  int port_index)
+{
+	struct device_node *cpu_port;
+	const unsigned int *cpu_port_reg;
+	int cpu_port_index;
+
+	cpu_port = of_parse_phandle(port, "cpu", 0);
+	if (cpu_port) {
+		cpu_port_reg = of_get_property(cpu_port, "reg", NULL);
+		if (!cpu_port_reg)
+			return -EINVAL;
+		cpu_port_index = be32_to_cpup(cpu_port_reg);
+		cd->port_cpu[port_index] = cpu_port_index;
+	}
+
+	return 0;
+}
+
+static int dsa_of_probe_port(struct dsa_platform_data *pd,
+			     struct dsa_chip_data *cd,
+			     int chip_index,
+			     struct device_node *port)
+{
+	bool is_cpu_port = false, is_dsa_port = false;
+	bool is_user_port = false;
+	const unsigned int *port_reg;
+	const char *port_name;
+	int port_index, ret = 0;
+
+	port_reg = of_get_property(port, "reg", NULL);
+	if (!port_reg)
+		return -EINVAL;
+
+	port_index = be32_to_cpup(port_reg);
+
+	port_name = of_get_property(port, "label", NULL);
+	if (!port_name)
+		return -EINVAL;
+
+	if (!strcmp(port_name, "cpu"))
+		is_cpu_port = true;
+	if (!strcmp(port_name, "dsa"))
+		is_dsa_port = true;
+	if (!is_cpu_port && !is_dsa_port)
+		is_user_port = true;
+
+	cd->port_dn[port_index] = port;
+
+	cd->port_names[port_index] = kstrdup(port_name,
+			GFP_KERNEL);
+	if (!cd->port_names[port_index])
+		return -ENOMEM;
+
+	if (is_dsa_port)
+		ret = dsa_of_probe_links(pd, cd, chip_index,
+					 port_index, port, port_name);
+	if (is_cpu_port)
+		ret = dsa_of_probe_cpu_port(cd, port, port_index);
+	if (is_user_port)
+		ret = dsa_of_probe_user_port(cd, port, port_index);
+	if (ret)
+		return ret;
+
+	return port_index;
+}
+
 static int dsa_of_probe(struct device *dev)
 {
 	struct device_node *np = dev->of_node;
@@ -732,9 +820,8 @@ static int dsa_of_probe(struct device *dev)
 	struct net_device *ethernet_dev;
 	struct dsa_platform_data *pd;
 	struct dsa_chip_data *cd;
-	const char *port_name;
-	int chip_index, port_index;
-	const unsigned int *sw_addr, *port_reg;
+	int chip_index;
+	const unsigned int *sw_addr;
 	u32 eeprom_len;
 	int ret;
 
@@ -821,32 +908,11 @@ static int dsa_of_probe(struct device *dev)
 		}
 
 		for_each_available_child_of_node(child, port) {
-			port_reg = of_get_property(port, "reg", NULL);
-			if (!port_reg)
-				continue;
-
-			port_index = be32_to_cpup(port_reg);
-			if (port_index >= DSA_MAX_PORTS)
-				break;
-
-			port_name = of_get_property(port, "label", NULL);
-			if (!port_name)
-				continue;
-
-			cd->port_dn[port_index] = port;
-
-			cd->port_names[port_index] = kstrdup(port_name,
-					GFP_KERNEL);
-			if (!cd->port_names[port_index]) {
-				ret = -ENOMEM;
+			ret = dsa_of_probe_port(pd, cd, chip_index, port);
+			if (ret < 0)
 				goto out_free_chip;
-			}
-
-			ret = dsa_of_probe_links(pd, cd, chip_index,
-						 port_index, port, port_name);
-			if (ret)
-				goto out_free_chip;
-
+			if (ret == DSA_MAX_PORTS)
+				break;
 		}
 	}
 
-- 
1.7.10.4

^ permalink raw reply related

* [RFC 4/4] net-next: dsa: qca8k: add support for multiple cpu ports
From: John Crispin @ 2017-01-04  7:38 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Florian Fainelli, Vivien Didelot
  Cc: netdev, John Crispin
In-Reply-To: <1483515484-21793-1-git-send-email-john@phrozen.org>

With the subsystem now supporting multiple cpu ports, we need to make some
changes to the driver as it currently has the cpu port hardcoded as port0.
The patch moves the setup logic for the cpu port into one loop which
iterates over all cpu ports and sets them up. Additionally the bridge
join/leave logic needs a small fix to work with having a cpu port other
than 0.

Signed-off-by: John Crispin <john@phrozen.org>
---
 drivers/net/dsa/qca8k.c |  135 +++++++++++++++++++++++++++--------------------
 drivers/net/dsa/qca8k.h |    2 -
 2 files changed, 78 insertions(+), 59 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index b3df70d..1693388 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -486,11 +486,25 @@
 		qca8k_reg_clear(priv, QCA8K_REG_PORT_STATUS(port), mask);
 }
 
+static void
+qca8k_setup_flooding(struct qca8k_priv *priv, int port_mask, int enable)
+{
+	u32 mask = (port_mask << QCA8K_GLOBAL_FW_CTRL1_IGMP_DP_S) |
+		   (port_mask << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S) |
+		   (port_mask << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S) |
+		   (port_mask << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
+
+	if (enable)
+		qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL1, mask);
+	else
+		qca8k_reg_clear(priv, QCA8K_REG_GLOBAL_FW_CTRL1, mask);
+}
+
 static int
 qca8k_setup(struct dsa_switch *ds)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
-	int ret, i, phy_mode = -1;
+	int ret, i;
 
 	/* Make sure that port 0 is the cpu port */
 	if (!dsa_is_cpu_port(ds, 0)) {
@@ -506,29 +520,49 @@
 	if (IS_ERR(priv->regmap))
 		pr_warn("regmap initialization failed");
 
-	/* Initialize CPU port pad mode (xMII type, delays...) */
-	phy_mode = of_get_phy_mode(ds->ports[ds->dst->cpu_port].dn);
-	if (phy_mode < 0) {
-		pr_err("Can't find phy-mode for master device\n");
-		return phy_mode;
-	}
-	ret = qca8k_set_pad_ctrl(priv, QCA8K_CPU_PORT, phy_mode);
-	if (ret < 0)
-		return ret;
-
-	/* Enable CPU Port */
+	/* Tell the switch that port0 is a cpu port */
 	qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0,
 		      QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN);
-	qca8k_port_set_status(priv, QCA8K_CPU_PORT, 1);
-	priv->port_sts[QCA8K_CPU_PORT].enabled = 1;
 
 	/* Enable MIB counters */
 	qca8k_mib_init(priv);
 
-	/* Enable QCA header mode on the cpu port */
-	qca8k_write(priv, QCA8K_REG_PORT_HDR_CTRL(QCA8K_CPU_PORT),
-		    QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_TX_S |
-		    QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_RX_S);
+	/* Setup the cpu ports */
+	for (i = 0; i < DSA_MAX_PORTS; i++) {
+		struct net_device *netdev;
+		int phy_mode = -1;
+
+		if (!dsa_is_cpu_port(ds, i))
+			continue;
+
+		netdev = ds->dst->pd->chip->port_ethernet[i];
+		if (!netdev) {
+			pr_err("Can't find netdev for port%d\n", i);
+			return -ENODEV;
+		}
+
+		/* Initialize CPU port pad mode (xMII type, delays...) */
+		phy_mode = of_get_phy_mode(netdev->dev.parent->of_node);
+		if (phy_mode < 0) {
+			pr_err("Can't find phy-mode for port:%d\n", i);
+			return phy_mode;
+		}
+		ret = qca8k_set_pad_ctrl(priv, i, phy_mode);
+		if (ret < 0)
+			return ret;
+
+		/* Enable QCA header mode on the cpu port */
+		qca8k_write(priv,
+			    QCA8K_REG_PORT_HDR_CTRL(i),
+			    QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_TX_S |
+			    QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_RX_S);
+
+		qca8k_port_set_status(priv, i, 1);
+		priv->port_sts[i].enabled = 1;
+
+		/* Forward all unknown frames to CPU port for Linux processing */
+		qca8k_setup_flooding(priv, BIT(i), 1);
+	}
 
 	/* Disable forwarding by default on all ports */
 	for (i = 0; i < QCA8K_NUM_PORTS; i++)
@@ -540,43 +574,30 @@
 		if (ds->enabled_port_mask & BIT(i))
 			qca8k_port_set_status(priv, i, 0);
 
-	/* Forward all unknown frames to CPU port for Linux processing */
-	qca8k_write(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
-		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_IGMP_DP_S |
-		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S |
-		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S |
-		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
-
-	/* Setup connection between CPU port & user ports */
+	/* Setup user ports and connections to CPU ports */
 	for (i = 0; i < DSA_MAX_PORTS; i++) {
-		/* CPU port gets connected to all user ports of the switch */
-		if (dsa_is_cpu_port(ds, i)) {
-			qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(QCA8K_CPU_PORT),
-				  QCA8K_PORT_LOOKUP_MEMBER,
-				  ds->enabled_port_mask);
-		}
+		int shift = 16 * (i % 2);
+		int cpu_port;
 
-		/* Invividual user ports get connected to CPU port only */
-		if (ds->enabled_port_mask & BIT(i)) {
-			int shift = 16 * (i % 2);
-
-			qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(i),
-				  QCA8K_PORT_LOOKUP_MEMBER,
-				  BIT(QCA8K_CPU_PORT));
-
-			/* Enable ARP Auto-learning by default */
-			qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(i),
-				      QCA8K_PORT_LOOKUP_LEARN);
-
-			/* For port based vlans to work we need to set the
-			 * default egress vid
-			 */
-			qca8k_rmw(priv, QCA8K_EGRESS_VLAN(i),
-				  0xffff << shift, 1 << shift);
-			qca8k_write(priv, QCA8K_REG_PORT_VLAN_CTRL0(i),
-				    QCA8K_PORT_VLAN_CVID(1) |
-				    QCA8K_PORT_VLAN_SVID(1));
-		}
+		if (!(ds->enabled_port_mask & BIT(i)))
+			continue;
+
+		cpu_port = dsa_port_upstream_port(ds, i);
+		qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(i), BIT(cpu_port));
+		qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(cpu_port), BIT(i));
+
+		/* Enable ARP Auto-learning by default */
+		qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(i),
+			      QCA8K_PORT_LOOKUP_LEARN);
+
+		/* For port based vlans to work we need to set the
+		 * default egress vid
+		 */
+		qca8k_rmw(priv, QCA8K_EGRESS_VLAN(i),
+			  0xffff << shift, 1 << shift);
+		qca8k_write(priv, QCA8K_REG_PORT_VLAN_CTRL0(i),
+			    QCA8K_PORT_VLAN_CVID(1) |
+			    QCA8K_PORT_VLAN_SVID(1));
 	}
 
 	/* Flush the FDB table */
@@ -750,7 +771,7 @@
 		       struct net_device *bridge)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
-	int port_mask = BIT(QCA8K_CPU_PORT);
+	int port_mask = 0;
 	int i;
 
 	priv->port_sts[port].bridge_dev = bridge;
@@ -768,8 +789,7 @@
 			port_mask |= BIT(i);
 	}
 	/* Add all other ports to this ports portvlan mask */
-	qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
-		  QCA8K_PORT_LOOKUP_MEMBER, port_mask);
+	qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(port), port_mask);
 
 	return 0;
 }
@@ -796,7 +816,8 @@
 	 * this port
 	 */
 	qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
-		  QCA8K_PORT_LOOKUP_MEMBER, BIT(QCA8K_CPU_PORT));
+		  QCA8K_PORT_LOOKUP_MEMBER,
+		  BIT(dsa_port_upstream_port(ds, i)));
 }
 
 static int
diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h
index 2014647..aca6abb 100644
--- a/drivers/net/dsa/qca8k.h
+++ b/drivers/net/dsa/qca8k.h
@@ -26,8 +26,6 @@
 
 #define QCA8K_NUM_FDB_RECORDS				2048
 
-#define QCA8K_CPU_PORT					0
-
 /* Global control registers */
 #define QCA8K_REG_MASK_CTRL				0x000
 #define   QCA8K_MASK_CTRL_ID_M				0xff
-- 
1.7.10.4

^ permalink raw reply related

* [RFC 3/4] net-next: dsa: Add support for multiple cpu ports.
From: John Crispin @ 2017-01-04  7:38 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Florian Fainelli, Vivien Didelot
  Cc: netdev, John Crispin
In-Reply-To: <1483515484-21793-1-git-send-email-john@phrozen.org>

From: Andrew Lunn <andrew@lunn.ch>

Some boards have two CPU interfaces connected to the switch, e.g. WiFi
access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
two port connected to the SoC.

This patch extends DSA to allows both CPU ports to be used. The "cpu"
node in the DSA tree can now have a phandle to the host interface it
connects to. Each user port can have a phandle to a cpu port which
should be used for traffic between the port and the CPU. Thus simple
load sharing over the two CPU ports can be achieved.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 include/net/dsa.h  |   21 ++++++++++++++++++++-
 net/dsa/dsa2.c     |   36 ++++++++++++++++++++++++++++++------
 net/dsa/dsa_priv.h |    5 +++++
 net/dsa/slave.c    |   27 ++++++++++++++++-----------
 4 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b122196..f68180b 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -60,6 +60,8 @@ struct dsa_chip_data {
 	 */
 	char		*port_names[DSA_MAX_PORTS];
 	struct device_node *port_dn[DSA_MAX_PORTS];
+	struct net_device *port_ethernet[DSA_MAX_PORTS];
+	int		port_cpu[DSA_MAX_PORTS];
 
 	/*
 	 * An array of which element [a] indicates which port on this
@@ -204,7 +206,7 @@ struct dsa_switch {
 
 static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
 {
-	return !!(ds->index == ds->dst->cpu_switch && p == ds->dst->cpu_port);
+	return !!(ds->cpu_port_mask & (1 << p));
 }
 
 static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p)
@@ -217,6 +219,11 @@ static inline bool dsa_is_port_initialized(struct dsa_switch *ds, int p)
 	return ds->enabled_port_mask & (1 << p) && ds->ports[p].netdev;
 }
 
+static inline bool dsa_is_upstream_port(struct dsa_switch *ds, int p)
+{
+	return dsa_is_cpu_port(ds, p) || dsa_is_dsa_port(ds, p);
+}
+
 static inline u8 dsa_upstream_port(struct dsa_switch *ds)
 {
 	struct dsa_switch_tree *dst = ds->dst;
@@ -233,6 +240,18 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
 		return ds->rtable[dst->cpu_switch];
 }
 
+static inline u8 dsa_port_upstream_port(struct dsa_switch *ds, int port)
+{
+	/*
+	 * If this port has a specific upstream cpu port, use it,
+	 * otherwise use the switch default.
+	 */
+	if (ds->cd->port_cpu[port])
+		return ds->cd->port_cpu[port];
+	else
+		return dsa_upstream_port(ds);
+}
+
 struct switchdev_trans;
 struct switchdev_obj;
 struct switchdev_obj_port_fdb;
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 5fff951..1763cd4 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -258,7 +258,7 @@ static void dsa_cpu_port_unapply(struct device_node *port, u32 index,
 {
 	dsa_cpu_dsa_destroy(port);
 	ds->cpu_port_mask &= ~BIT(index);
-
+	dev_put(ds->cd->port_ethernet[index]);
 }
 
 static int dsa_user_port_apply(struct device_node *port, u32 index,
@@ -475,6 +475,28 @@ static int dsa_cpu_parse(struct device_node *port, u32 index,
 
 	dst->rcv = dst->tag_ops->rcv;
 
+	dev_hold(ethernet_dev);
+	ds->cd->port_ethernet[index] = ethernet_dev;
+
+	return 0;
+}
+
+static int dsa_user_parse(struct device_node *port, u32 index,
+			  struct dsa_switch *ds)
+{
+	struct device_node *cpu_port;
+	const unsigned int *cpu_port_reg;
+	int cpu_port_index;
+
+	cpu_port = of_parse_phandle(port, "cpu", 0);
+	if (cpu_port) {
+		cpu_port_reg = of_get_property(cpu_port, "reg", NULL);
+		if (!cpu_port_reg)
+			return -EINVAL;
+		cpu_port_index = be32_to_cpup(cpu_port_reg);
+		ds->cd->port_cpu[index] = cpu_port_index;
+	}
+
 	return 0;
 }
 
@@ -482,18 +504,20 @@ static int dsa_ds_parse(struct dsa_switch_tree *dst, struct dsa_switch *ds)
 {
 	struct device_node *port;
 	u32 index;
-	int err;
+	int err = 0;
 
 	for (index = 0; index < DSA_MAX_PORTS; index++) {
 		port = ds->ports[index].dn;
 		if (!port)
 			continue;
 
-		if (dsa_port_is_cpu(port)) {
+		if (dsa_port_is_cpu(port))
 			err = dsa_cpu_parse(port, index, dst, ds);
-			if (err)
-				return err;
-		}
+		else if (!dsa_port_is_dsa(port))
+			err = dsa_user_parse(port, index,  ds);
+
+		if (err)
+			return err;
 	}
 
 	pr_info("DSA: switch %d %d parsed\n", dst->tree, ds->index);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 6cfd738..7e1e62c 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -24,6 +24,11 @@ struct dsa_device_ops {
 struct dsa_slave_priv {
 	struct sk_buff *	(*xmit)(struct sk_buff *skb,
 					struct net_device *dev);
+	/*
+	 * Which host device do we used to send packets to the switch
+	 * for this port.
+	 */
+	struct net_device	*master;
 
 	/*
 	 * Which switch this port is a part of, and the port index
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index ffd91969..260d4a9 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -61,7 +61,7 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 
-	return p->parent->dst->master_netdev->ifindex;
+	return p->master->ifindex;
 }
 
 static inline bool dsa_port_is_bridged(struct dsa_slave_priv *p)
@@ -96,7 +96,7 @@ static void dsa_port_set_stp_state(struct dsa_switch *ds, int port, u8 state)
 static int dsa_slave_open(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct net_device *master = p->parent->dst->master_netdev;
+	struct net_device *master = p->master;
 	struct dsa_switch *ds = p->parent;
 	u8 stp_state = dsa_port_is_bridged(p) ?
 			BR_STATE_BLOCKING : BR_STATE_FORWARDING;
@@ -151,7 +151,7 @@ static int dsa_slave_open(struct net_device *dev)
 static int dsa_slave_close(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct net_device *master = p->parent->dst->master_netdev;
+	struct net_device *master = p->master;
 	struct dsa_switch *ds = p->parent;
 
 	if (p->phy)
@@ -178,7 +178,7 @@ static int dsa_slave_close(struct net_device *dev)
 static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct net_device *master = p->parent->dst->master_netdev;
+	struct net_device *master = p->master;
 
 	if (change & IFF_ALLMULTI)
 		dev_set_allmulti(master, dev->flags & IFF_ALLMULTI ? 1 : -1);
@@ -189,7 +189,7 @@ static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 static void dsa_slave_set_rx_mode(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct net_device *master = p->parent->dst->master_netdev;
+	struct net_device *master = p->master;
 
 	dev_mc_sync(master, dev);
 	dev_uc_sync(master, dev);
@@ -198,7 +198,7 @@ static void dsa_slave_set_rx_mode(struct net_device *dev)
 static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct net_device *master = p->parent->dst->master_netdev;
+	struct net_device *master = p->master;
 	struct sockaddr *addr = a;
 	int err;
 
@@ -633,7 +633,7 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Queue the SKB for transmission on the parent interface, but
 	 * do not modify its EtherType
 	 */
-	nskb->dev = p->parent->dst->master_netdev;
+	nskb->dev = p->master;
 	dev_queue_xmit(nskb);
 
 	return NETDEV_TX_OK;
@@ -947,7 +947,7 @@ static int dsa_slave_netpoll_setup(struct net_device *dev,
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = p->parent;
-	struct net_device *master = ds->dst->master_netdev;
+	struct net_device *master = p->master;
 	struct netpoll *netpoll;
 	int err = 0;
 
@@ -1247,12 +1247,16 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 	struct net_device *master;
 	struct net_device *slave_dev;
 	struct dsa_slave_priv *p;
+	int port_cpu = ds->cd->port_cpu[port];
 	int ret;
 
-	master = ds->dst->master_netdev;
-	if (ds->master_netdev)
+	if (port_cpu && ds->cd->port_ethernet[port_cpu])
+		master = ds->cd->port_ethernet[port_cpu];
+	else if (ds->master_netdev)
 		master = ds->master_netdev;
-
+	else
+		master = ds->dst->master_netdev;
+	master->dsa_ptr = (void *)ds->dst;
 	slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
 				 NET_NAME_UNKNOWN, ether_setup);
 	if (slave_dev == NULL)
@@ -1279,6 +1283,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 	p->parent = ds;
 	p->port = port;
 	p->xmit = dst->tag_ops->xmit;
+	p->master = master;
 
 	p->old_pause = -1;
 	p->old_link = -1;
-- 
1.7.10.4

^ permalink raw reply related

* [RFC 0/4] net-next: dsa: add support for multiple cpu ports
From: John Crispin @ 2017-01-04  7:38 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Florian Fainelli, Vivien Didelot
  Cc: netdev, John Crispin

This series is based on work from Andrew. I have rebased his patches to
work on the latest kernel. The main problem is probably the fact that the
cpu port to user port mapping happens inside the devicetree.

Andrew Lunn (3):
  Documentation: devicetree: add multiple cpu port DSA binding
  net-next: dsa: Refactor DT probing of a switch port
  net-next: dsa: Add support for multiple cpu ports.

John Crispin (1):
  net-next: dsa: qca8k: add support for multiple cpu ports

 Documentation/devicetree/bindings/net/dsa/dsa.txt |   67 +++++++++-
 drivers/net/dsa/qca8k.c                           |  135 +++++++++++---------
 drivers/net/dsa/qca8k.h                           |    2 -
 include/net/dsa.h                                 |   21 +++-
 net/dsa/dsa.c                                     |  138 +++++++++++++++------
 net/dsa/dsa2.c                                    |   36 +++++-
 net/dsa/dsa_priv.h                                |    5 +
 net/dsa/slave.c                                   |   27 ++--
 8 files changed, 317 insertions(+), 114 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [RFC 1/4] Documentation: devicetree: add multiple cpu port DSA binding
From: John Crispin @ 2017-01-04  7:38 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Florian Fainelli, Vivien Didelot
  Cc: netdev, Rob Herring, devicetree
In-Reply-To: <1483515484-21793-1-git-send-email-john@phrozen.org>

From: Andrew Lunn <andrew@lunn.ch>

Extend the DSA binding documentation, adding the new properties required
when there is more than one CPU port attached to the switch.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt |   67 ++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index a4a570f..fc901cf 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -337,13 +337,25 @@ Optional property:
 			  This mii-bus will be used in preference to the
 			  global dsa,mii-bus defined above, for this switch.
 
+- ethernet		: Optional for "cpu" ports. A phandle to an ethernet
+                          device which will be used by this CPU port for
+			  passing packets to/from the host. If not present,
+			  the port will use the "dsa,ethernet" property
+			  defined above.
+
+- cpu			: Option for non "cpu"/"dsa" ports. A phandle to a
+			  "cpu" port, which will be used for passing packets
+			  from this port to the host. If not present, the first
+			  "cpu" port will be used.
+
+
 Optional subnodes:
 - fixed-link		: Fixed-link subnode describing a link to a non-MDIO
 			  managed entity. See
 			  Documentation/devicetree/bindings/net/fixed-link.txt
 			  for details.
 
-Example:
+Examples:
 
 	dsa@0 {
 		compatible = "marvell,dsa";
@@ -416,3 +428,56 @@ Example:
 			};
 		};
 	};
+
+	dsa@1 {
+		compatible = "marvell,dsa";
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		dsa,ethernet = <&eth0port>;
+		dsa,mii-bus = <&mdio>;
+
+		switch@0 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0 0>;	/* MDIO address 0, switch 0 in tree */
+
+			port@0 {
+				reg = <0>;
+				label = "lan4";
+			};
+
+			port@1 {
+				reg = <1>;
+				label = "lan3";
+				cpu = <&cpu1>;
+			};
+
+			port@2 {
+				reg = <2>;
+				label = "lan2";
+			};
+
+			port@3 {
+				reg = <3>;
+				label = "lan1";
+				cpu = <&cpu1>;
+			};
+
+			port@4 {
+				reg = <4>;
+				label = "wan";
+			};
+
+			port@5 {
+				reg = <5>;
+				label = "cpu";
+			};
+
+			cpu1: port@6 {
+				reg = <6>;
+				label = "cpu";
+				ethernet = <&eth1port>;
+			};
+		};
+	};
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH RFC ipsec-next 5/5] esp: Add a software GRO codepath
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari
In-Reply-To: <1483518230-6777-1-git-send-email-steffen.klassert@secunet.com>

This patch adds GRO callbacks for ESP on ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function calls the xfrm input layer
which decapsulates the packet and reinject it into
layer 2 by calling netif_rx().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/Kconfig                |  6 +++
 net/ipv4/Makefile               |  1 +
 net/ipv4/esp4.c                 |  5 +++
 net/ipv4/esp4_offload.c         | 94 +++++++++++++++++++++++++++++++++++++++
 net/ipv4/ip_vti.c               |  4 ++
 net/ipv4/xfrm4_input.c          |  6 +++
 net/ipv4/xfrm4_mode_transport.c |  3 +-
 net/ipv6/Kconfig                |  6 +++
 net/ipv6/Makefile               |  1 +
 net/ipv6/esp6.c                 |  5 +++
 net/ipv6/esp6_offload.c         | 98 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/xfrm6_input.c          |  5 +++
 net/xfrm/xfrm_input.c           | 27 +++++++++---
 13 files changed, 253 insertions(+), 8 deletions(-)
 create mode 100644 net/ipv4/esp4_offload.c
 create mode 100644 net/ipv6/esp6_offload.c

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6e7baaf..27df99f 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -360,6 +360,12 @@ config INET_ESP
 
 	  If unsure, say Y.
 
+config INET_ESP_OFFLOAD
+	tristate "IP: ESP transformation offload"
+	depends on INET_ESP
+	select XFRM_OFFLOAD
+	default n
+
 config INET_IPCOMP
 	tristate "IP: IPComp transformation"
 	select INET_XFRM_TUNNEL
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 48af58a..c6d4238 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_NET_IPVTI) += ip_vti.o
 obj-$(CONFIG_SYN_COOKIES) += syncookies.o
 obj-$(CONFIG_INET_AH) += ah4.o
 obj-$(CONFIG_INET_ESP) += esp4.o
+obj-$(CONFIG_INET_ESP_OFFLOAD) += esp4_offload.o
 obj-$(CONFIG_INET_IPCOMP) += ipcomp.o
 obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_tunnel.o
 obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b1e2444..4563aeb 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -629,6 +629,11 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 			nfrags = 1;
 
 			goto skip_cow;
+		} else if (skb_xfrm_gro(skb)) {
+			nfrags = skb_shinfo(skb)->nr_frags;
+			nfrags++;
+
+			goto skip_cow;
 		} else if (!skb_has_frag_list(skb)) {
 			nfrags = skb_shinfo(skb)->nr_frags;
 			nfrags++;
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
new file mode 100644
index 0000000..7277d15
--- /dev/null
+++ b/net/ipv4/esp4_offload.c
@@ -0,0 +1,94 @@
+/*
+ * IPV4 GSO/GRO offload support
+ * Linux INET implementation
+ *
+ * Copyright (C) 2016 secunet Security Networks AG
+ * Author: Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * ESP GRO support
+ */
+
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <net/ip.h>
+#include <net/xfrm.h>
+#include <net/esp.h>
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <net/udp.h>
+
+static struct sk_buff **esp4_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	int err;
+	if (NAPI_GRO_CB(skb)->flush)
+		goto out;
+
+	skb_pull(skb, skb_gro_offset(skb));
+	skb->xfrm_gro = 1;
+
+	err = xfrm4_rcv_encap(skb, IPPROTO_ESP, 0, -2);
+	if (err == -EOPNOTSUPP) {
+		skb_push(skb, skb_gro_offset(skb));
+		NAPI_GRO_CB(skb)->same_flow = 0;
+		NAPI_GRO_CB(skb)->flush = 1;
+		skb->xfrm_gro = 0;
+		goto out;
+	}
+
+	return ERR_PTR(-EINPROGRESS);
+out:
+	return NULL;
+}
+
+static int esp4_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct crypto_aead *aead = x->data;
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data + nhoff);
+	struct packet_offload *ptype;
+	int err = -ENOENT;
+	__be16 type = skb->protocol;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	rcu_read_unlock();
+
+	return err;
+}
+
+static const struct net_offload esp4_offload = {
+	.callbacks = {
+		.gro_receive = esp4_gro_receive,
+		.gro_complete = esp4_gro_complete,
+	},
+};
+
+static int __init esp4_offload_init(void)
+{
+	return inet_add_offload(&esp4_offload, IPPROTO_ESP);
+}
+
+static void __exit esp4_offload_exit(void)
+{
+	inet_del_offload(&esp4_offload, IPPROTO_ESP);
+}
+
+module_init(esp4_offload_init);
+module_exit(esp4_offload_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Steffen Klassert <steffen.klassert@secunet.com>");
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 8b14f14..f275feb 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -60,6 +60,10 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
 				  iph->saddr, iph->daddr, 0);
 	if (tunnel) {
+		/* encap_type < -1 indicates a GRO call, we don't support this. */
+		if (encap_type < -1)
+			return -EOPNOTSUPP;
+
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 			goto drop;
 
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 62e1e72..6fe1a68 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -53,6 +53,12 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 	iph->tot_len = htons(skb->len);
 	ip_send_check(iph);
 
+
+	if (skb_xfrm_gro(skb)) {
+		skb_mac_header_rebuild(skb);
+		return 0;
+	}
+
 	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
 		xfrm4_rcv_encap_finish);
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index fd840c7..7b447f3 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -50,7 +50,8 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 		skb->network_header = skb->transport_header;
 	}
 	ip_hdr(skb)->tot_len = htons(skb->len + ihl);
-	skb_reset_transport_header(skb);
+	if (!skb_xfrm_gro(skb))
+		skb_reset_transport_header(skb);
 	return 0;
 }
 
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index ec1267e..8b0713b 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -75,6 +75,12 @@ config INET6_ESP
 
 	  If unsure, say Y.
 
+config INET6_ESP_OFFLOAD
+	tristate "IPv6: ESP transformation offload"
+	depends on INET6_ESP
+	select XFRM_OFFLOAD
+	default n
+
 config INET6_IPCOMP
 	tristate "IPv6: IPComp transformation"
 	select INET6_XFRM_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index a9e9fec..217e9ff 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -30,6 +30,7 @@ ipv6-objs += $(ipv6-y)
 
 obj-$(CONFIG_INET6_AH) += ah6.o
 obj-$(CONFIG_INET6_ESP) += esp6.o
+obj-$(CONFIG_INET6_ESP_OFFLOAD) += esp6_offload.o
 obj-$(CONFIG_INET6_IPCOMP) += ipcomp6.o
 obj-$(CONFIG_INET6_XFRM_TUNNEL) += xfrm6_tunnel.o
 obj-$(CONFIG_INET6_TUNNEL) += tunnel6.o
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index ff54faa..1cca0f1 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -571,6 +571,11 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 			nfrags = 1;
 
 			goto skip_cow;
+		} else if (skb_xfrm_gro(skb)) {
+			nfrags = skb_shinfo(skb)->nr_frags;
+			nfrags++;
+
+			goto skip_cow;
 		} else if (!skb_has_frag_list(skb)) {
 			nfrags = skb_shinfo(skb)->nr_frags;
 			nfrags++;
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
new file mode 100644
index 0000000..e0006cf
--- /dev/null
+++ b/net/ipv6/esp6_offload.c
@@ -0,0 +1,98 @@
+/*
+ * IPV6 GSO/GRO offload support
+ * Linux INET implementation
+ *
+ * Copyright (C) 2016 secunet Security Networks AG
+ * Author: Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * ESP GRO support
+ */
+
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <net/ip.h>
+#include <net/xfrm.h>
+#include <net/esp.h>
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <net/ip6_route.h>
+#include <net/ipv6.h>
+#include <linux/icmpv6.h>
+
+static struct sk_buff **esp6_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	int err;
+	if (NAPI_GRO_CB(skb)->flush)
+		goto out;
+
+	skb_pull(skb, skb_gro_offset(skb));
+	skb->xfrm_gro = 1;
+
+	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
+	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
+	err = xfrm_input(skb, IPPROTO_ESP, 0, -2);
+	if (err == -EOPNOTSUPP) {
+		skb_push(skb, skb_gro_offset(skb));
+		NAPI_GRO_CB(skb)->same_flow = 0;
+		NAPI_GRO_CB(skb)->flush = 1;
+		skb->xfrm_gro = 0;
+		goto out;
+	}
+
+	return ERR_PTR(-EINPROGRESS);
+out:
+	return NULL;
+}
+
+static int esp6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct crypto_aead *aead = x->data;
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data + nhoff);
+	struct packet_offload *ptype;
+	int err = -ENOENT;
+	__be16 type = skb->protocol;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	rcu_read_unlock();
+
+	return err;
+}
+
+static const struct net_offload esp6_offload = {
+	.callbacks = {
+		.gro_receive = esp6_gro_receive,
+		.gro_complete = esp6_gro_complete,
+	},
+};
+
+static int __init esp6_offload_init(void)
+{
+	return inet6_add_offload(&esp6_offload, IPPROTO_ESP);
+}
+
+static void __exit esp6_offload_exit(void)
+{
+	inet6_del_offload(&esp6_offload, IPPROTO_ESP);
+}
+
+module_init(esp6_offload_init);
+module_exit(esp6_offload_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Steffen Klassert <steffen.klassert@secunet.com>");
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index b578956..8ed16c8 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -44,6 +44,11 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	ipv6_hdr(skb)->payload_len = htons(skb->len);
 	__skb_push(skb, skb->data - skb_network_header(skb));
 
+	if (skb_xfrm_gro(skb)) {
+		skb_mac_header_rebuild(skb);
+		return -1;
+	}
+
 	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
 		ip6_rcv_finish);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 6e3f025..893263e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -193,13 +193,17 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 	int decaps = 0;
 	int async = 0;
 
-	/* A negative encap_type indicates async resumption. */
 	if (encap_type < 0) {
-		async = 1;
-		x = xfrm_input_state(skb);
-		seq = XFRM_SKB_CB(skb)->seq.input.low;
-		family = x->outer_mode->afinfo->family;
-		goto resume;
+		/* An encap_type of -1 indicates async resumption. */
+		if (encap_type == -1) {
+			async = 1;
+			x = xfrm_input_state(skb);
+			seq = XFRM_SKB_CB(skb)->seq.input.low;
+			family = x->outer_mode->afinfo->family;
+			goto resume;
+		}
+		/* encap_type < -1 indicates a GRO call. */
+		encap_type = 0;
 	}
 
 	daddr = (xfrm_address_t *)(skb_network_header(skb) +
@@ -374,7 +378,16 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		netif_rx(skb);
 		return 0;
 	} else {
-		return x->inner_mode->afinfo->transport_finish(skb, async);
+		int xfrm_gro = skb_xfrm_gro(skb);
+
+		err = x->inner_mode->afinfo->transport_finish(skb, async);
+		if (xfrm_gro) {
+			skb_dst_drop(skb);
+			netif_rx(skb);
+			return 0;
+		}
+
+		return err;
 	}
 
 drop_unlock:
-- 
1.9.1

^ permalink raw reply related

* [PATCH RFC ipsec-next 4/5] net: Prepare for IPsec GRO
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari
In-Reply-To: <1483518230-6777-1-git-send-email-steffen.klassert@secunet.com>

This patch prepares the generic codepath for IPsec GRO.
We introduce a new GRO_CONSUMED notifier to reflect that
IPsec can return asynchronous. On IPsec GRO we grab the
packet and reinject it back to layer 2 after IPsec
processing. We also use one xfrm_gro bit on the sk_buff
that will be set from IPsec to notify about GRO. If this
bit is set, we call napi_gro_receive for the backlog device
instead of __netif_receive_skb.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |  1 +
 include/linux/skbuff.h    | 14 +++++++++++++-
 net/core/dev.c            | 17 ++++++++++++++++-
 net/xfrm/Kconfig          |  4 ++++
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ecd78b3..89bad76 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,6 +352,7 @@ enum gro_result {
 	GRO_HELD,
 	GRO_NORMAL,
 	GRO_DROP,
+	GRO_CONSUMED,
 };
 typedef enum gro_result gro_result_t;
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b53c0cf..a78cd90 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -749,7 +749,10 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
 #endif
-	/* 2, 4 or 5 bit hole */
+#ifdef CONFIG_XFRM_OFFLOAD
+	__u8			xfrm_gro:1;
+#endif
+	/* 1 to 5 bits hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
@@ -3698,6 +3701,15 @@ static inline struct sec_path *skb_sec_path(struct sk_buff *skb)
 #endif
 }
 
+static inline bool skb_xfrm_gro(struct sk_buff *skb)
+{
+#ifdef CONFIG_XFRM_OFFLOAD
+	return skb->xfrm_gro;
+#else
+	return false;
+#endif
+}
+
 /* Keeps track of mac header offset relative to skb->head.
  * It is useful for TSO of Tunneling protocol. e.g. GRE.
  * For non-tunnel skb it points to skb_mac_header() and for
diff --git a/net/core/dev.c b/net/core/dev.c
index 56818f7..ecbaaf7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4525,6 +4525,11 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	}
 	rcu_read_unlock();
 
+	if (PTR_ERR(pp) == -EINPROGRESS) {
+		ret = GRO_CONSUMED;
+		goto ok;
+	}
+
 	if (&ptype->list == head)
 		goto normal;
 
@@ -4623,6 +4628,9 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 	case GRO_MERGED_FREE:
 		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD) {
 			skb_dst_drop(skb);
+#ifdef CONFIG_XFRM_OFFLOAD
+			secpath_put(skb->sp);
+#endif
 			kmem_cache_free(skbuff_head_cache, skb);
 		} else {
 			__kfree_skb(skb);
@@ -4631,6 +4639,7 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 
 	case GRO_HELD:
 	case GRO_MERGED:
+	case GRO_CONSUMED:
 		break;
 	}
 
@@ -4701,6 +4710,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,
 		break;
 
 	case GRO_MERGED:
+	case GRO_CONSUMED:
 		break;
 	}
 
@@ -4843,7 +4853,12 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 		while ((skb = __skb_dequeue(&sd->process_queue))) {
 			rcu_read_lock();
-			__netif_receive_skb(skb);
+
+			if (skb_xfrm_gro(skb))
+				napi_gro_receive(napi, skb);
+			else
+				__netif_receive_skb(skb);
+
 			rcu_read_unlock();
 			input_queue_head_incr(sd);
 			if (++work >= quota)
diff --git a/net/xfrm/Kconfig b/net/xfrm/Kconfig
index bda1a13..442ac61 100644
--- a/net/xfrm/Kconfig
+++ b/net/xfrm/Kconfig
@@ -5,6 +5,10 @@ config XFRM
        bool
        depends on NET
 
+config XFRM_OFFLOAD
+	bool
+	depends on XFRM
+
 config XFRM_ALGO
 	tristate
 	select XFRM
-- 
1.9.1

^ permalink raw reply related

* [PATCH RFC ipsec-next] IPsec offload, part one
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari

This is the first part of the IPsec offload work we
talked at the IPsec workshop at the last netdev
conference. I plan to apply this to ipsec-next
after this round of review.

Patch 1 and 2 try to avoid skb linearization in
the ESP layer.

Patch 3 introduces a hepler to seup the esp trailer.

Patch 4 prepares the generic network code for
IPsec GRO. The main reason why we need this, is
that we need to reinject the decrypted inner
packet back to the GRO layer.

Patch 5 introduces GRO handlers for ESP, GRO
can enabled with a IPsec offload config option.
This config option will also be used for the
upcomming hardware offload.

David, patch 3 touches generic networking code.
Is it ok to integrate such a generic preparation
patch into an IPsec pull request, or do you
prefer to get it as a separate patch?

^ permalink raw reply

* [PATCH RFC ipsec-next 3/5] esp: Introduce a helper to setup the trailer
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari
In-Reply-To: <1483518230-6777-1-git-send-email-steffen.klassert@secunet.com>

We need to setup the trailer in two different cases,
so add a helper to avoid code duplication.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 44 +++++++++++++++++++-------------------------
 net/ipv6/esp6.c | 44 +++++++++++++++++++-------------------------
 2 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 9e8d971..b1e2444 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -182,6 +182,22 @@ static void esp_output_done_esn(struct crypto_async_request *base, int err)
 	esp_output_done(base, err);
 }
 
+static void esp_output_fill_trailer(u8 *tail, int tfclen, int plen, __u8 proto)
+{
+	/* Fill padding... */
+	if (tfclen) {
+		memset(tail, 0, tfclen);
+		tail += tfclen;
+	}
+	do {
+		int i;
+		for (i = 0; i < plen - 2; i++)
+			tail[i] = i + 1;
+	} while (0);
+	tail[plen - 2] = plen - 2;
+	tail[plen - 1] = proto;
+}
+
 static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 {
 	struct esp_output_extra *extra;
@@ -304,18 +320,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 
 			tail = vaddr + pfrag->offset;
 
-			/* Fill padding... */
-			if (tfclen) {
-				memset(tail, 0, tfclen);
-				tail += tfclen;
-			}
-			do {
-				int i;
-				for (i = 0; i < plen - 2; i++)
-					tail[i] = i + 1;
-			} while (0);
-			tail[plen - 2] = plen - 2;
-			tail[plen - 1] = proto;
+			esp_output_fill_trailer(tail, tfclen, plen, proto);
 
 			kunmap_atomic(vaddr);
 
@@ -395,20 +400,9 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	esph = ip_esp_hdr(skb);
 
 skip_cow:
-	/* Fill padding... */
-	if (tfclen) {
-		memset(tail, 0, tfclen);
-		tail += tfclen;
-	}
-	do {
-		int i;
-		for (i = 0; i < plen - 2; i++)
-			tail[i] = i + 1;
-	} while (0);
-	tail[plen - 2] = plen - 2;
-	tail[plen - 1] = proto;
-	pskb_put(skb, trailer, clen - skb->len + alen);
+	esp_output_fill_trailer(tail, tfclen, plen, proto);
 
+	pskb_put(skb, trailer, clen - skb->len + alen);
 	skb_push(skb, -skb_network_offset(skb));
 	esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
 	esph->spi = x->id.spi;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index a428ac6..ff54faa 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -198,6 +198,22 @@ static void esp_output_done_esn(struct crypto_async_request *base, int err)
 	esp_output_done(base, err);
 }
 
+static void esp_output_fill_trailer(u8 *tail, int tfclen, int plen, __u8 proto)
+{
+	/* Fill padding... */
+	if (tfclen) {
+		memset(tail, 0, tfclen);
+		tail += tfclen;
+	}
+	do {
+		int i;
+		for (i = 0; i < plen - 2; i++)
+			tail[i] = i + 1;
+	} while (0);
+	tail[plen - 2] = plen - 2;
+	tail[plen - 1] = proto;
+}
+
 static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int err;
@@ -284,18 +300,7 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 
 			tail = vaddr + pfrag->offset;
 
-			/* Fill padding... */
-			if (tfclen) {
-				memset(tail, 0, tfclen);
-				tail += tfclen;
-			}
-			do {
-				int i;
-				for (i = 0; i < plen - 2; i++)
-					tail[i] = i + 1;
-			} while (0);
-			tail[plen - 2] = plen - 2;
-			tail[plen - 1] = proto;
+			esp_output_fill_trailer(tail, tfclen, plen, proto);
 
 			kunmap_atomic(vaddr);
 
@@ -375,20 +380,9 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	esph = ip_esp_hdr(skb);
 
 skip_cow:
-	/* Fill padding... */
-	if (tfclen) {
-		memset(tail, 0, tfclen);
-		tail += tfclen;
-	}
-	do {
-		int i;
-		for (i = 0; i < plen - 2; i++)
-			tail[i] = i + 1;
-	} while (0);
-	tail[plen - 2] = plen - 2;
-	tail[plen - 1] = proto;
-	pskb_put(skb, trailer, clen - skb->len + alen);
+	esp_output_fill_trailer(tail, tfclen, plen, proto);
 
+	pskb_put(skb, trailer, clen - skb->len + alen);
 	skb_push(skb, -skb_network_offset(skb));
 
 	esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
-- 
1.9.1

^ permalink raw reply related

* [PATCH RFC ipsec-next 1/5] esp4: Avoid skb_cow_data whenever possible
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari
In-Reply-To: <1483518230-6777-1-git-send-email-steffen.klassert@secunet.com>

This patch tries to avoid skb_cow_data on esp4.

On the encrypt side we add the IPsec tailbits
to the linear part of the buffer if there is
space on it. If there is no space on the linear
part, we add a page fragment with the tailbits to
the buffer and use separate src and dst scatterlists.

On the decrypt side, we leave the buffer as it is
if it is not cloned.

With this, we can avoid a linearization of the buffer
in most of the cases.

Joint work with:
Sowmini Varadhan <sowmini.varadhan@oracle.com>
Ilan Tayari <ilant@mellanox.com>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h |   2 +
 net/ipv4/esp4.c    | 338 +++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 266 insertions(+), 74 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 31947b9..ce53cfe 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -213,6 +213,8 @@ struct xfrm_state {
 	/* Last used time */
 	unsigned long		lastused;
 
+	struct page_frag xfrag;
+
 	/* Reference to data common to all the instances of this
 	 * transformer. */
 	const struct xfrm_type	*type;
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 20fb25e..9e8d971 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -18,6 +18,8 @@
 #include <net/protocol.h>
 #include <net/udp.h>
 
+#include <linux/highmem.h>
+
 struct esp_skb_cb {
 	struct xfrm_skb_cb xfrm;
 	void *tmp;
@@ -92,11 +94,40 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
 			     __alignof__(struct scatterlist));
 }
 
+static void esp_ssg_unref(struct xfrm_state *x, void *tmp)
+{
+	struct esp_output_extra *extra = esp_tmp_extra(tmp);
+	struct crypto_aead *aead = x->data;
+	int extralen = 0;
+	u8 *iv;
+	struct aead_request *req;
+	struct scatterlist *sg;
+
+	if (x->props.flags & XFRM_STATE_ESN)
+		extralen += sizeof(*extra);
+
+	extra = esp_tmp_extra(tmp);
+	iv = esp_tmp_iv(aead, tmp, extralen);
+	req = esp_tmp_req(aead, iv);
+
+	/* Unref skb_frag_pages in the src scatterlist if necessary.
+	 * Skip the first sg which comes from skb->data.
+	 */
+	if (req->src != req->dst)
+		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
+			put_page(sg_page(sg));
+}
+
 static void esp_output_done(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
+	void *tmp;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
 
-	kfree(ESP_SKB_CB(skb)->tmp);
+	tmp = ESP_SKB_CB(skb)->tmp;
+	esp_ssg_unref(x, tmp);
+	kfree(tmp);
 	xfrm_output_resume(skb, err);
 }
 
@@ -120,6 +151,29 @@ static void esp_output_restore_header(struct sk_buff *skb)
 				sizeof(__be32));
 }
 
+static struct ip_esp_hdr *esp_output_set_extra(struct sk_buff *skb,
+					       struct ip_esp_hdr *esph,
+					       struct esp_output_extra *extra)
+{
+	struct xfrm_state *x = skb_dst(skb)->xfrm;
+
+	/* For ESN we move the header forward by 4 bytes to
+	 * accomodate the high bits.  We will move it back after
+	 * encryption.
+	 */
+	if ((x->props.flags & XFRM_STATE_ESN)) {
+		extra->esphoff = (unsigned char *)esph -
+				 skb_transport_header(skb);
+		esph = (struct ip_esp_hdr *)((unsigned char *)esph - 4);
+		extra->seqhi = esph->spi;
+		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
+	}
+
+	esph->spi = x->id.spi;
+
+	return esph;
+}
+
 static void esp_output_done_esn(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
@@ -130,16 +184,18 @@ static void esp_output_done_esn(struct crypto_async_request *base, int err)
 
 static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 {
-	int err;
 	struct esp_output_extra *extra;
+	int err = -ENOMEM;
 	struct ip_esp_hdr *esph;
 	struct crypto_aead *aead;
 	struct aead_request *req;
-	struct scatterlist *sg;
+	struct scatterlist *sg, *dsg;
 	struct sk_buff *trailer;
+	struct page *page;
 	void *tmp;
 	u8 *iv;
 	u8 *tail;
+	u8 *vaddr;
 	int blksize;
 	int clen;
 	int alen;
@@ -149,7 +205,9 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	int nfrags;
 	int assoclen;
 	int extralen;
+	int tailen;
 	__be64 seqno;
+	__u8 proto = *skb_mac_header(skb);
 
 	/* skb is pure payload to encrypt */
 
@@ -169,12 +227,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	clen = ALIGN(skb->len + 2 + tfclen, blksize);
 	plen = clen - skb->len - tfclen;
-
-	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
-	if (err < 0)
-		goto error;
-	nfrags = err;
-
+	tailen = tfclen + plen + alen;
 	assoclen = sizeof(*esph);
 	extralen = 0;
 
@@ -183,35 +236,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 		assoclen += sizeof(__be32);
 	}
 
-	tmp = esp_alloc_tmp(aead, nfrags, extralen);
-	if (!tmp) {
-		err = -ENOMEM;
-		goto error;
-	}
-
-	extra = esp_tmp_extra(tmp);
-	iv = esp_tmp_iv(aead, tmp, extralen);
-	req = esp_tmp_req(aead, iv);
-	sg = esp_req_sg(aead, req);
-
-	/* Fill padding... */
-	tail = skb_tail_pointer(trailer);
-	if (tfclen) {
-		memset(tail, 0, tfclen);
-		tail += tfclen;
-	}
-	do {
-		int i;
-		for (i = 0; i < plen - 2; i++)
-			tail[i] = i + 1;
-	} while (0);
-	tail[plen - 2] = plen - 2;
-	tail[plen - 1] = *skb_mac_header(skb);
-	pskb_put(skb, trailer, clen - skb->len + alen);
-
-	skb_push(skb, -skb_network_offset(skb));
-	esph = ip_esp_hdr(skb);
 	*skb_mac_header(skb) = IPPROTO_ESP;
+	esph = ip_esp_hdr(skb);
 
 	/* this is non-NULL only with UDP Encapsulation */
 	if (x->encap) {
@@ -230,7 +256,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 		uh = (struct udphdr *)esph;
 		uh->source = sport;
 		uh->dest = dport;
-		uh->len = htons(skb->len - skb_transport_offset(skb));
+		uh->len = htons(skb->len + tailen
+				- skb_transport_offset(skb));
 		uh->check = 0;
 
 		switch (encap_type) {
@@ -248,31 +275,170 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 		*skb_mac_header(skb) = IPPROTO_UDP;
 	}
 
-	esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
+	if (!skb_cloned(skb)) {
+		if (tailen <= skb_availroom(skb)) {
+			nfrags = 1;
+			trailer = skb;
+			tail = skb_tail_pointer(trailer);
 
-	aead_request_set_callback(req, 0, esp_output_done, skb);
+			goto skip_cow;
+		} else if ((skb_shinfo(skb)->nr_frags < MAX_SKB_FRAGS)
+			   && !skb_has_frag_list(skb)) {
+			int allocsize;
+			struct sock *sk = skb->sk;
+			struct page_frag *pfrag = &x->xfrag;
 
-	/* For ESN we move the header forward by 4 bytes to
-	 * accomodate the high bits.  We will move it back after
-	 * encryption.
-	 */
-	if ((x->props.flags & XFRM_STATE_ESN)) {
-		extra->esphoff = (unsigned char *)esph -
-				 skb_transport_header(skb);
-		esph = (struct ip_esp_hdr *)((unsigned char *)esph - 4);
-		extra->seqhi = esph->spi;
-		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
-		aead_request_set_callback(req, 0, esp_output_done_esn, skb);
+			allocsize = ALIGN(tailen, L1_CACHE_BYTES);
+
+			spin_lock_bh(&x->lock);
+
+			if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
+				spin_unlock_bh(&x->lock);
+				goto cow;
+			}
+
+			page = pfrag->page;
+			get_page(page);
+
+			vaddr = kmap_atomic(page);
+
+			tail = vaddr + pfrag->offset;
+
+			/* Fill padding... */
+			if (tfclen) {
+				memset(tail, 0, tfclen);
+				tail += tfclen;
+			}
+			do {
+				int i;
+				for (i = 0; i < plen - 2; i++)
+					tail[i] = i + 1;
+			} while (0);
+			tail[plen - 2] = plen - 2;
+			tail[plen - 1] = proto;
+
+			kunmap_atomic(vaddr);
+
+			nfrags = skb_shinfo(skb)->nr_frags;
+
+			__skb_fill_page_desc(skb, nfrags, page, pfrag->offset,
+					     tailen);
+			skb_shinfo(skb)->nr_frags = ++nfrags;
+
+			pfrag->offset = pfrag->offset + allocsize;
+			nfrags++;
+
+			skb->len += tailen;
+			skb->data_len += tailen;
+			skb->truesize += tailen;
+			if (sk)
+				atomic_add(tailen, &sk->sk_wmem_alloc);
+
+			skb_push(skb, -skb_network_offset(skb));
+
+			esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
+			esph->spi = x->id.spi;
+
+			tmp = esp_alloc_tmp(aead, nfrags + 2, extralen);
+			if (!tmp) {
+				spin_unlock_bh(&x->lock);
+				err = -ENOMEM;
+				goto error;
+			}
+
+			extra = esp_tmp_extra(tmp);
+			iv = esp_tmp_iv(aead, tmp, extralen);
+			req = esp_tmp_req(aead, iv);
+			sg = esp_req_sg(aead, req);
+			dsg = &sg[nfrags];
+
+			esph = esp_output_set_extra(skb, esph, extra);
+
+			sg_init_table(sg, nfrags);
+			skb_to_sgvec(skb, sg,
+				     (unsigned char *)esph - skb->data,
+				     assoclen + ivlen + clen + alen);
+
+			allocsize = ALIGN(skb->data_len, L1_CACHE_BYTES);
+
+			if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
+				spin_unlock_bh(&x->lock);
+				err = -ENOMEM;
+				goto error;
+			}
+
+			skb_shinfo(skb)->nr_frags = 1;
+
+			page = pfrag->page;
+			get_page(page);
+			/* replace page frags in skb with new page */
+			__skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len);
+			pfrag->offset = pfrag->offset + allocsize;
+
+			sg_init_table(dsg, skb_shinfo(skb)->nr_frags + 1);
+			skb_to_sgvec(skb, dsg,
+				     (unsigned char *)esph - skb->data,
+				     assoclen + ivlen + clen + alen);
+
+			spin_unlock_bh(&x->lock);
+
+			goto skip_cow2;
+		}
 	}
 
+cow:
+	err = skb_cow_data(skb, tailen, &trailer);
+	if (err < 0)
+		goto error;
+	nfrags = err;
+	tail = skb_tail_pointer(trailer);
+	esph = ip_esp_hdr(skb);
+
+skip_cow:
+	/* Fill padding... */
+	if (tfclen) {
+		memset(tail, 0, tfclen);
+		tail += tfclen;
+	}
+	do {
+		int i;
+		for (i = 0; i < plen - 2; i++)
+			tail[i] = i + 1;
+	} while (0);
+	tail[plen - 2] = plen - 2;
+	tail[plen - 1] = proto;
+	pskb_put(skb, trailer, clen - skb->len + alen);
+
+	skb_push(skb, -skb_network_offset(skb));
+	esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
 	esph->spi = x->id.spi;
 
+	tmp = esp_alloc_tmp(aead, nfrags, extralen);
+	if (!tmp) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	extra = esp_tmp_extra(tmp);
+	iv = esp_tmp_iv(aead, tmp, extralen);
+	req = esp_tmp_req(aead, iv);
+	sg = esp_req_sg(aead, req);
+	dsg = sg;
+
+	esph = esp_output_set_extra(skb, esph, extra);
+
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg,
 		     (unsigned char *)esph - skb->data,
 		     assoclen + ivlen + clen + alen);
 
-	aead_request_set_crypt(req, sg, sg, ivlen + clen, iv);
+skip_cow2:
+	if ((x->props.flags & XFRM_STATE_ESN))
+		aead_request_set_callback(req, 0, esp_output_done_esn, skb);
+	else
+		aead_request_set_callback(req, 0, esp_output_done, skb);
+
+	aead_request_set_crypt(req, sg, dsg, ivlen + clen, iv);
 	aead_request_set_ad(req, assoclen);
 
 	seqno = cpu_to_be64(XFRM_SKB_CB(skb)->seq.output.low +
@@ -298,6 +464,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 			esp_output_restore_header(skb);
 	}
 
+	if (sg != dsg)
+		esp_ssg_unref(x, tmp);
 	kfree(tmp);
 
 error:
@@ -401,6 +569,23 @@ static void esp_input_restore_header(struct sk_buff *skb)
 	__skb_pull(skb, 4);
 }
 
+static void esp_input_set_header(struct sk_buff *skb, __be32 *seqhi)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)skb->data;
+
+	/* For ESN we move the header forward by 4 bytes to
+	 * accomodate the high bits.  We will move it back after
+	 * decryption.
+	 */
+	if ((x->props.flags & XFRM_STATE_ESN)) {
+		esph = (void *)skb_push(skb, 4);
+		*seqhi = esph->spi;
+		esph->spi = esph->seq_no;
+		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
+	}
+}
+
 static void esp_input_done_esn(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
@@ -437,12 +622,6 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (elen <= 0)
 		goto out;
 
-	err = skb_cow_data(skb, 0, &trailer);
-	if (err < 0)
-		goto out;
-
-	nfrags = err;
-
 	assoclen = sizeof(*esph);
 	seqhilen = 0;
 
@@ -451,6 +630,26 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 		assoclen += seqhilen;
 	}
 
+	if (!skb_cloned(skb)) {
+		if (!skb_is_nonlinear(skb)) {
+			nfrags = 1;
+
+			goto skip_cow;
+		} else if (!skb_has_frag_list(skb)) {
+			nfrags = skb_shinfo(skb)->nr_frags;
+			nfrags++;
+
+			goto skip_cow;
+		}
+	}
+
+	err = skb_cow_data(skb, 0, &trailer);
+	if (err < 0)
+		goto out;
+
+	nfrags = err;
+
+skip_cow:
 	err = -ENOMEM;
 	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
 	if (!tmp)
@@ -462,26 +661,17 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 	req = esp_tmp_req(aead, iv);
 	sg = esp_req_sg(aead, req);
 
-	skb->ip_summed = CHECKSUM_NONE;
+	esp_input_set_header(skb, seqhi);
 
-	esph = (struct ip_esp_hdr *)skb->data;
+	sg_init_table(sg, nfrags);
+	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	aead_request_set_callback(req, 0, esp_input_done, skb);
+	skb->ip_summed = CHECKSUM_NONE;
 
-	/* For ESN we move the header forward by 4 bytes to
-	 * accomodate the high bits.  We will move it back after
-	 * decryption.
-	 */
-	if ((x->props.flags & XFRM_STATE_ESN)) {
-		esph = (void *)skb_push(skb, 4);
-		*seqhi = esph->spi;
-		esph->spi = esph->seq_no;
-		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
+	if ((x->props.flags & XFRM_STATE_ESN))
 		aead_request_set_callback(req, 0, esp_input_done_esn, skb);
-	}
-
-	sg_init_table(sg, nfrags);
-	skb_to_sgvec(skb, sg, 0, skb->len);
+	else
+		aead_request_set_callback(req, 0, esp_input_done, skb);
 
 	aead_request_set_crypt(req, sg, sg, elen + ivlen, iv);
 	aead_request_set_ad(req, assoclen);
-- 
1.9.1

^ permalink raw reply related

* [PATCH RFC ipsec-next 2/5] esp6: Avoid skb_cow_data whenever possible
From: Steffen Klassert @ 2017-01-04  8:23 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Steffen Klassert, Sowmini Varadhan, Ilan Tayari
In-Reply-To: <1483518230-6777-1-git-send-email-steffen.klassert@secunet.com>

This patch tries to avoid skb_cow_data on esp6.

On the encrypt side we add the IPsec tailbits
to the linear part of the buffer if there is
space on it. If there is no space on the linear
part, we add a page fragment with the tailbits to
the buffer and use separate src and dst scatterlists.

On the decrypt side, we leave the buffer as it is
if it is not cloned.

With this, we can avoid a linearization of the buffer
in most of the cases.

Joint work with:
Sowmini Varadhan <sowmini.varadhan@oracle.com>
Ilan Tayari <ilant@mellanox.com>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/esp6.c | 302 +++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 246 insertions(+), 56 deletions(-)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index cbcdd5d..a428ac6 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -44,6 +44,8 @@
 #include <net/protocol.h>
 #include <linux/icmpv6.h>
 
+#include <linux/highmem.h>
+
 struct esp_skb_cb {
 	struct xfrm_skb_cb xfrm;
 	void *tmp;
@@ -114,11 +116,40 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
 			     __alignof__(struct scatterlist));
 }
 
+static void esp_ssg_unref(struct xfrm_state *x, void *tmp)
+{
+	__be32 *seqhi;
+	struct crypto_aead *aead = x->data;
+	int seqhilen = 0;
+	u8 *iv;
+	struct aead_request *req;
+	struct scatterlist *sg;
+
+	if (x->props.flags & XFRM_STATE_ESN)
+		seqhilen += sizeof(__be32);
+
+	seqhi = esp_tmp_seqhi(tmp);
+	iv = esp_tmp_iv(aead, tmp, seqhilen);
+	req = esp_tmp_req(aead, iv);
+
+	/* Unref skb_frag_pages in the src scatterlist if necessary.
+	 * Skip the first sg which comes from skb->data.
+	 */
+	if (req->src != req->dst)
+		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
+			put_page(sg_page(sg));
+}
+
 static void esp_output_done(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
+	void *tmp;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
 
-	kfree(ESP_SKB_CB(skb)->tmp);
+	tmp = ESP_SKB_CB(skb)->tmp;
+	esp_ssg_unref(x, tmp);
+	kfree(tmp);
 	xfrm_output_resume(skb, err);
 }
 
@@ -138,6 +169,27 @@ static void esp_output_restore_header(struct sk_buff *skb)
 	esp_restore_header(skb, skb_transport_offset(skb) - sizeof(__be32));
 }
 
+static struct ip_esp_hdr *esp_output_set_esn(struct sk_buff *skb,
+					     struct ip_esp_hdr *esph,
+					     __be32 *seqhi)
+{
+	struct xfrm_state *x = skb_dst(skb)->xfrm;
+
+	/* For ESN we move the header forward by 4 bytes to
+	 * accomodate the high bits.  We will move it back after
+	 * encryption.
+	 */
+	if ((x->props.flags & XFRM_STATE_ESN)) {
+		esph = (void *)(skb_transport_header(skb) - sizeof(__be32));
+		*seqhi = esph->spi;
+		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
+	}
+
+	esph->spi = x->id.spi;
+
+	return esph;
+}
+
 static void esp_output_done_esn(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
@@ -152,8 +204,9 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	struct ip_esp_hdr *esph;
 	struct crypto_aead *aead;
 	struct aead_request *req;
-	struct scatterlist *sg;
+	struct scatterlist *sg, *dsg;
 	struct sk_buff *trailer;
+	struct page *page;
 	void *tmp;
 	int blksize;
 	int clen;
@@ -164,10 +217,13 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	int nfrags;
 	int assoclen;
 	int seqhilen;
+	int tailen;
 	u8 *iv;
 	u8 *tail;
+	u8 *vaddr;
 	__be32 *seqhi;
 	__be64 seqno;
+	__u8 proto = *skb_mac_header(skb);
 
 	/* skb is pure payload to encrypt */
 	aead = x->data;
@@ -186,11 +242,7 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	clen = ALIGN(skb->len + 2 + tfclen, blksize);
 	plen = clen - skb->len - tfclen;
-
-	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
-	if (err < 0)
-		goto error;
-	nfrags = err;
+	tailen = tfclen + plen + alen;
 
 	assoclen = sizeof(*esph);
 	seqhilen = 0;
@@ -200,19 +252,130 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 		assoclen += seqhilen;
 	}
 
-	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
-	if (!tmp) {
-		err = -ENOMEM;
-		goto error;
+	*skb_mac_header(skb) = IPPROTO_ESP;
+	esph = ip_esp_hdr(skb);
+
+	if (!skb_cloned(skb)) {
+		if (tailen <= skb_availroom(skb)) {
+			nfrags = 1;
+			trailer = skb;
+			tail = skb_tail_pointer(trailer);
+
+			goto skip_cow;
+		} else if ((skb_shinfo(skb)->nr_frags < MAX_SKB_FRAGS)
+			   && !skb_has_frag_list(skb)) {
+			int allocsize;
+			struct sock *sk = skb->sk;
+			struct page_frag *pfrag = &x->xfrag;
+
+			allocsize = ALIGN(tailen, L1_CACHE_BYTES);
+
+			spin_lock_bh(&x->lock);
+
+			if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
+				spin_unlock_bh(&x->lock);
+				goto cow;
+			}
+
+			page = pfrag->page;
+			get_page(page);
+
+			vaddr = kmap_atomic(page);
+
+			tail = vaddr + pfrag->offset;
+
+			/* Fill padding... */
+			if (tfclen) {
+				memset(tail, 0, tfclen);
+				tail += tfclen;
+			}
+			do {
+				int i;
+				for (i = 0; i < plen - 2; i++)
+					tail[i] = i + 1;
+			} while (0);
+			tail[plen - 2] = plen - 2;
+			tail[plen - 1] = proto;
+
+			kunmap_atomic(vaddr);
+
+			nfrags = skb_shinfo(skb)->nr_frags;
+
+			__skb_fill_page_desc(skb, nfrags, page, pfrag->offset,
+					     tailen);
+			skb_shinfo(skb)->nr_frags = ++nfrags;
+
+			pfrag->offset = pfrag->offset + allocsize;
+			nfrags++;
+
+			skb->len += tailen;
+			skb->data_len += tailen;
+			skb->truesize += tailen;
+			if (sk)
+				atomic_add(tailen, &sk->sk_wmem_alloc);
+
+			skb_push(skb, -skb_network_offset(skb));
+
+			esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
+			esph->spi = x->id.spi;
+
+			tmp = esp_alloc_tmp(aead, nfrags + 2, seqhilen);
+			if (!tmp) {
+				spin_unlock_bh(&x->lock);
+				err = -ENOMEM;
+				goto error;
+			}
+			seqhi = esp_tmp_seqhi(tmp);
+			iv = esp_tmp_iv(aead, tmp, seqhilen);
+			req = esp_tmp_req(aead, iv);
+			sg = esp_req_sg(aead, req);
+			dsg = &sg[nfrags];
+
+			esph = esp_output_set_esn(skb, esph, seqhi);
+
+			sg_init_table(sg, nfrags);
+			skb_to_sgvec(skb, sg,
+				     (unsigned char *)esph - skb->data,
+				     assoclen + ivlen + clen + alen);
+
+			allocsize = ALIGN(skb->data_len, L1_CACHE_BYTES);
+
+			if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
+				spin_unlock_bh(&x->lock);
+				err = -ENOMEM;
+				goto error;
+			}
+
+			skb_shinfo(skb)->nr_frags = 1;
+
+			page = pfrag->page;
+			get_page(page);
+			/* replace page frags in skb with new page */
+			__skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len);
+			pfrag->offset = pfrag->offset + allocsize;
+
+			sg_init_table(dsg, skb_shinfo(skb)->nr_frags + 1);
+			skb_to_sgvec(skb, dsg,
+				     (unsigned char *)esph - skb->data,
+				     assoclen + ivlen + clen + alen);
+
+			spin_unlock_bh(&x->lock);
+
+			goto skip_cow2;
+		}
 	}
 
-	seqhi = esp_tmp_seqhi(tmp);
-	iv = esp_tmp_iv(aead, tmp, seqhilen);
-	req = esp_tmp_req(aead, iv);
-	sg = esp_req_sg(aead, req);
+cow:
+	err = skb_cow_data(skb, tailen, &trailer);
+	if (err < 0)
+		goto error;
+	nfrags = err;
 
-	/* Fill padding... */
 	tail = skb_tail_pointer(trailer);
+	esph = ip_esp_hdr(skb);
+
+skip_cow:
+	/* Fill padding... */
 	if (tfclen) {
 		memset(tail, 0, tfclen);
 		tail += tfclen;
@@ -223,36 +386,40 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 			tail[i] = i + 1;
 	} while (0);
 	tail[plen - 2] = plen - 2;
-	tail[plen - 1] = *skb_mac_header(skb);
+	tail[plen - 1] = proto;
 	pskb_put(skb, trailer, clen - skb->len + alen);
 
 	skb_push(skb, -skb_network_offset(skb));
-	esph = ip_esp_hdr(skb);
-	*skb_mac_header(skb) = IPPROTO_ESP;
 
 	esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
+	esph->spi = x->id.spi;
 
-	aead_request_set_callback(req, 0, esp_output_done, skb);
-
-	/* For ESN we move the header forward by 4 bytes to
-	 * accomodate the high bits.  We will move it back after
-	 * encryption.
-	 */
-	if ((x->props.flags & XFRM_STATE_ESN)) {
-		esph = (void *)(skb_transport_header(skb) - sizeof(__be32));
-		*seqhi = esph->spi;
-		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
-		aead_request_set_callback(req, 0, esp_output_done_esn, skb);
+	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
+	if (!tmp) {
+		err = -ENOMEM;
+		goto error;
 	}
 
-	esph->spi = x->id.spi;
+	seqhi = esp_tmp_seqhi(tmp);
+	iv = esp_tmp_iv(aead, tmp, seqhilen);
+	req = esp_tmp_req(aead, iv);
+	sg = esp_req_sg(aead, req);
+	dsg = sg;
+
+	esph = esp_output_set_esn(skb, esph, seqhi);
 
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg,
 		     (unsigned char *)esph - skb->data,
 		     assoclen + ivlen + clen + alen);
 
-	aead_request_set_crypt(req, sg, sg, ivlen + clen, iv);
+skip_cow2:
+	if ((x->props.flags & XFRM_STATE_ESN))
+		aead_request_set_callback(req, 0, esp_output_done_esn, skb);
+	else
+		aead_request_set_callback(req, 0, esp_output_done, skb);
+
+	aead_request_set_crypt(req, sg, dsg, ivlen + clen, iv);
 	aead_request_set_ad(req, assoclen);
 
 	seqno = cpu_to_be64(XFRM_SKB_CB(skb)->seq.output.low +
@@ -278,6 +445,8 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 			esp_output_restore_header(skb);
 	}
 
+	if (sg != dsg)
+		esp_ssg_unref(x, tmp);
 	kfree(tmp);
 
 error:
@@ -343,6 +512,23 @@ static void esp_input_restore_header(struct sk_buff *skb)
 	__skb_pull(skb, 4);
 }
 
+static void esp_input_set_header(struct sk_buff *skb, __be32 *seqhi)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)skb->data;
+
+	/* For ESN we move the header forward by 4 bytes to
+	 * accomodate the high bits.  We will move it back after
+	 * decryption.
+	 */
+	if ((x->props.flags & XFRM_STATE_ESN)) {
+		esph = (void *)skb_push(skb, 4);
+		*seqhi = esph->spi;
+		esph->spi = esph->seq_no;
+		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
+	}
+}
+
 static void esp_input_done_esn(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
@@ -378,14 +564,6 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 		goto out;
 	}
 
-	nfrags = skb_cow_data(skb, 0, &trailer);
-	if (nfrags < 0) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	ret = -ENOMEM;
-
 	assoclen = sizeof(*esph);
 	seqhilen = 0;
 
@@ -394,6 +572,27 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 		assoclen += seqhilen;
 	}
 
+	if (!skb_cloned(skb)) {
+		if (!skb_is_nonlinear(skb)) {
+			nfrags = 1;
+
+			goto skip_cow;
+		} else if (!skb_has_frag_list(skb)) {
+			nfrags = skb_shinfo(skb)->nr_frags;
+			nfrags++;
+
+			goto skip_cow;
+		}
+	}
+
+	nfrags = skb_cow_data(skb, 0, &trailer);
+	if (nfrags < 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+skip_cow:
+	ret = -ENOMEM;
 	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
 	if (!tmp)
 		goto out;
@@ -404,26 +603,17 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 	req = esp_tmp_req(aead, iv);
 	sg = esp_req_sg(aead, req);
 
-	skb->ip_summed = CHECKSUM_NONE;
+	esp_input_set_header(skb, seqhi);
 
-	esph = (struct ip_esp_hdr *)skb->data;
+	sg_init_table(sg, nfrags);
+	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	aead_request_set_callback(req, 0, esp_input_done, skb);
+	skb->ip_summed = CHECKSUM_NONE;
 
-	/* For ESN we move the header forward by 4 bytes to
-	 * accomodate the high bits.  We will move it back after
-	 * decryption.
-	 */
-	if ((x->props.flags & XFRM_STATE_ESN)) {
-		esph = (void *)skb_push(skb, 4);
-		*seqhi = esph->spi;
-		esph->spi = esph->seq_no;
-		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
+	if ((x->props.flags & XFRM_STATE_ESN))
 		aead_request_set_callback(req, 0, esp_input_done_esn, skb);
-	}
-
-	sg_init_table(sg, nfrags);
-	skb_to_sgvec(skb, sg, 0, skb->len);
+	else
+		aead_request_set_callback(req, 0, esp_input_done, skb);
 
 	aead_request_set_crypt(req, sg, sg, elen + ivlen, iv);
 	aead_request_set_ad(req, assoclen);
-- 
1.9.1

^ permalink raw reply related

* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2017-01-04  9:02 UTC (permalink / raw)
  To: maowenan, Alexander Duyck, netdev@vger.kernel.org,
	jeffrey.t.kirsher@intel.com, Stephen Hemminger
  Cc: weiyongjun (A), Dingtianhong, Wangzhou (B)
In-Reply-To: <CAKgT0Ud0=fQpADLHPnZBLrG8xLGHUB20TZt1mi+GBNYJQngDCA@mail.gmail.com>



> -----Original Message-----
> From: maowenan
> Sent: Monday, December 26, 2016 4:33 PM
> To: maowenan; 'Alexander Duyck'
> Cc: 'Jeff Kirsher'; 'Stephen Hemminger'; 'netdev@vger.kernel.org'; weiyongjun
> (A); Dingtianhong; Wangzhou (B)
> Subject: RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> 
> 
> > -----Original Message-----
> > From: maowenan
> > Sent: Saturday, December 24, 2016 4:30 PM
> > To: 'Alexander Duyck'
> > Cc: Jeff Kirsher; Stephen Hemminger; netdev@vger.kernel.org;
> > weiyongjun (A); Dingtianhong; Wangzhou (B)
> > Subject: RE: [PATCH] ethtool: add one ethtool option to set relax
> > ordering mode
> >
> >
> >
> > > -----Original Message-----
> > > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > > Sent: Friday, December 23, 2016 11:43 PM
> > > To: maowenan
> > > Cc: Jeff Kirsher; Stephen Hemminger; netdev@vger.kernel.org;
> > > weiyongjun (A); Dingtianhong
> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > ordering mode
> > >
> > > On Thu, Dec 22, 2016 at 10:14 PM, maowenan <maowenan@huawei.com>
> > > wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Jeff Kirsher [mailto:jeffrey.t.kirsher@intel.com]
> > > >> Sent: Friday, December 23, 2016 9:07 AM
> > > >> To: maowenan; Alexander Duyck
> > > >> Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> > > >> Dingtianhong
> > > >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > >> ordering mode
> > > >>
> > > >> On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > > >> > > -----Original Message-----
> > > >> > > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > > >> > > Sent: Thursday, December 22, 2016 11:54 PM
> > > >> > > To: maowenan
> > > >> > > Cc: Stephen Hemminger; netdev@vger.kernel.org;
> > > jeffrey.t.kirsher@intel.
> > > >> > > com;
> > > >> > > weiyongjun (A); Dingtianhong
> > > >> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > > >> > > relax ordering mode
> > > >> > >
> > > >> > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan
> > > <maowenan@huawei.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > >
> > > >> > > > > -----Original Message-----
> > > >> > > > > From: Stephen Hemminger
> > > >> > > > > [mailto:stephen@networkplumber.org]
> > > >> > > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > >> > > > > To: maowenan
> > > >> > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > > >> > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to
> > > >> > > > > set relax ordering mode
> > > >> > > > >
> > > >> > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan
> > > >> > > > > <maowenan@huawei.com> wrote:
> > > >> > > > >
> > > >> > > > > > This patch provides one way to set/unset IXGBE NIC TX
> > > >> > > > > > and RX relax ordering mode, which can be set by ethtool.
> > > >> > > > > > Relax ordering is one mode of 82599 NIC, to enable this
> > > >> > > > > > mode can enhance the performance for some cpu architecure.
> > > >> > > > >
> > > >> > > > > Then it should be done by CPU architecture specific
> > > >> > > > > quirks (preferably in PCI
> > > >> > > > > layer) so that all users get the option without having to
> > > >> > > > > do manual
> > > >> > >
> > > >> > > intervention.
> > > >> > > > >
> > > >> > > > > > example:
> > > >> > > > > > ethtool -s enp1s0f0 relaxorder off ethtool -s enp1s0f0
> > > >> > > > > > relaxorder on
> > > >> > > > >
> > > >> > > > > Doing it via ethtool is a developer API (for testing) not
> > > >> > > > > something that makes sense in production.
> > > >> > > >
> > > >> > > >
> > > >> > > > This feature is not mandatory for all users, acturally
> > > >> > > > relax ordering default configuration of 82599 is 'disable',
> > > >> > > > So this patch gives one way to
> > > >> > >
> > > >> > > enable relax ordering to be selected in some performance condition.
> > > >> > >
> > > >> > > That isn't quite true.  The default for Sparc systems is to
> > > >> > > have it enabled.
> > > >> > >
> > > >> > > Really this is something that is platform specific.  I agree
> > > >> > > with Stephen that it would work better if this was handled as
> > > >> > > a series of platform specific quirks handled at something
> > > >> > > like the PCI layer rather than be a switch the user can toggle on and
> off.
> > > >> > >
> > > >> > > With that being said there are changes being made that should
> > > >> > > help to improve the situation.  Specifically I am looking at
> > > >> > > adding support for the DMA_ATTR_WEAK_ORDERING which may
> also
> > > >> > > allow us to identify cases where you might be able to specify
> > > >> > > the DMA behavior via the DMA mapping instead of having to
> > > >> > > make the final decision in the device itself.
> > > >> > >
> > > >> > > - Alex
> > > >> >
> > > >> > Yes, Sparc is a special case. From the NIC driver point of
> > > >> > view, It is no need for some ARCHs to do particular operation
> > > >> > and compiling branch, ethtool is a flexible method for user to
> > > >> > make decision whether
> > > >> > on|off this feature.
> > > >> > I think Jeff as maintainer of 82599 has some comments about this.
> > > >>
> > > >> My original comment/objection was that you attempted to do this
> > > >> change as a module parameter to the ixgbe driver, where I
> > > >> directed you to use ethtool so that other drivers could benefit
> > > >> from the ability to enable/disable relaxed ordering.  As far as
> > > >> how it gets implemented in ethtool or PCI layer, makes little
> > > >> difference to me, I only had issues with the driver specific
> > > >> module parameter implementation,
> > > which is not acceptable.
> > > >
> > > >
> > > > Thank you Jeff and Alex.
> > > > And then I have gone through mail thread about "i40e: enable PCIe
> > > > relax ordering for SPARC", It only works for SPARC, any other ARCH
> > > > who wants to enable DMA_ATTR_WEAK_ORDERING feature, should
> define
> > > > the
> > > new macro, recompile the driver module.
> > > >
> > > > Because of the above reasons, we implement in ethtool to give the
> > > > final user a convenient way to on|off special feature, no need
> > > > define new macro, easy to extend the new features, and also good
> > > > benefit for other
> > > driver as Jeff referred.
> > > >
> > >
> > > I think the point is we shouldn't base the decision on user input.
> > > The fact is the PCIe device control register should have a bit that
> > > indicates if the device is allowed to enable relaxed ordering or not.
> > > If we can guarantee that the bit is set in all the cases where it
> > > should be set, and cleared in all the cases where it should not then
> > > we could use something like that to determine if the device is
> > > supposed to enable relaxed ordering instead of trying to make the
> > > decision
> > ourselves.
> > >
> > > - Alex
> >
> > ok. We are focusing on the register.
> > And yes, to enable relax ordering for 82599 should be set by one or
> > more bits of Rx/TX DCA Control Register, these bits should be set in
> > many cpu architectures, such as arm64, sparc, and so on, and should be
> cleared in other ARCHs.
> > By the way, how do you enable SPARC macro, how and where to define
> > this compiling macro when user one to enable relax ordering under SPARC
> system?
> > #ifndef CONFIG_SPARC
> >
> >
> 
> 
> Hi, Alex,
> Have you already sent out the patches about DMA_ATTR_WEAK_ORDERING?
> We want to get you how to enable DMA_ATTR_WEAK_ORDERING by PCIe layer,
> and we can refer to that.

I have verified DMA_ATTR_WEAK_ORDERING is not usable for our system(arm64 and 82599),
We should enable relax ordering in 82599 DCA control register to improve performance. 
As Stephen Hemminger do not suggest use ethtool to set relax ordering feature, 
@Jeff, do you agree with using erratum config to enable RO mode in 82599.  
Codes like below:
In Kconfig:
+config HI_ERRATUM_xxxx

In ixgbe_82599.c
#if !defined (CONFIG_SPARC) || !defined(HI_ERRATUM_xxxx)
	/* Disable relaxed ordering */
	for (i = 0; ((i < hw->mac.max_tx_queues) &&
	     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
		regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
		regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
	}

	for (i = 0; ((i < hw->mac.max_rx_queues) &&
	     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
		regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
		regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
			    IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
		IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
	}
#endif









^ permalink raw reply

* Re: [PATCH] uapi: use wildcards to list files
From: Nicolas Dichtel @ 2017-01-04  9:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-kernel, dri-devel, netdev, linux-media,
	linux-mmc, netfilter-devel, linux-nfs, linux-raid, linux-spi,
	linux-mtd, linux-rdma, fcoe-devel, alsa-devel, linux-fbdev,
	xen-devel, davem, airlied, David Howells
In-Reply-To: <2108827.WpE3IvfEdH@wuerfel>

Le 03/01/2017 à 22:37, Arnd Bergmann a écrit :
> On Tuesday, January 3, 2017 3:35:44 PM CET Nicolas Dichtel wrote:
>> Regularly, when a new header is created in include/uapi/, the developer
>> forgets to add it in the corresponding Kbuild file. This error is usually
>> detected after the release is out.
>>
>> In fact, all headers under include/uapi/ should be exported, so let's
>> use wildcards.
> 
> I think the idea makes a lot of sense: if a header is in uapi, we should
> really export it. However, using a wildcard expression seems a bit
> backwards here, I think we should make this implicit and not have the
> Kbuild file at all.
> 
> The "header-y" syntax was originally added back when the uapi headers
> were mixed with the internal headers in the same directory. After
> David Howells introduced the separate directory for uapi, it has
> become a bit redundant.
Ok, thank you for the explanation, I was wondering why those Kbuild files were
needed.

> 
> Can you try to modify scripts/Makefile.headersinst instead so we
> can simply remove the Kbuild files entirely?
I will try something.


Regards,
Nicolas

^ permalink raw reply

* [PATCH] net-next: korina: Fix NAPI versus resources freeing
From: John Crispin @ 2017-01-04  9:31 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller; +Cc: netdev, John Crispin

From: Florian Fainelli <f.fainelli@gmail.com>

Commit beb0babfb77e ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:

Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart taks before doing the resource freeing.

Fixes: beb0babfb77e ("korina: disable napi on close and restart")

Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Tested-by: John Crispin <john@phrozen.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/korina.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 1799fe1..c051987 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -900,10 +900,10 @@ static void korina_restart_task(struct work_struct *work)
 				DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR,
 				&lp->rx_dma_regs->dmasm);
 
-	korina_free_ring(dev);
-
 	napi_disable(&lp->napi);
 
+	korina_free_ring(dev);
+
 	if (korina_init(dev) < 0) {
 		printk(KERN_ERR "%s: cannot restart device\n", dev->name);
 		return;
@@ -1064,12 +1064,12 @@ static int korina_close(struct net_device *dev)
 	tmp = tmp | DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR;
 	writel(tmp, &lp->rx_dma_regs->dmasm);
 
-	korina_free_ring(dev);
-
 	napi_disable(&lp->napi);
 
 	cancel_work_sync(&lp->restart_task);
 
+	korina_free_ring(dev);
+
 	free_irq(lp->rx_irq, dev);
 	free_irq(lp->tx_irq, dev);
 	free_irq(lp->ovr_irq, dev);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/3] stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform structure
From: Joao Pinto @ 2017-01-04  9:43 UTC (permalink / raw)
  To: davem; +Cc: lars.persson, niklass, swarren, treding, netdev, Joao Pinto
In-Reply-To: <cover.1483372523.git.jpinto@synopsys.com>

This patch moves stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to the
plat_stmmacenet_data structure. It also moves these platform variables
initialization to stmmac_platform. This was done for two reasons:

a) If PCI is used, platform related code is being executed in stmmac_main
resulting in warnings that have no sense and conceptually was not right

b) stmmac as a synopsys reference ethernet driver stack will be hosting
more and more drivers to its structure like synopsys/dwc_eth_qos.c.
These drivers have their own DT bindings that are not compatible with
stmmac's. One of the most important are the clock names, and so they need
to be parsed in the glue logic and initialized there, and that is the main
reason why the clocks were passed to the platform structure.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
---
changes v1 -> v2:
- Nothing changed, just to keep up patch set version

 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |  5 --
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 82 ++++------------------
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 47 +++++++++++++
 include/linux/stmmac.h                             |  5 ++
 5 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index eab04ae..bf8a83e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -106,9 +106,6 @@ struct stmmac_priv {
 	u32 msg_enable;
 	int wolopts;
 	int wol_irq;
-	struct clk *stmmac_clk;
-	struct clk *pclk;
-	struct reset_control *stmmac_rst;
 	int clk_csr;
 	struct timer_list eee_ctrl_timer;
 	int lpi_irq;
@@ -120,8 +117,6 @@ struct stmmac_priv {
 	struct ptp_clock *ptp_clock;
 	struct ptp_clock_info ptp_clock_ops;
 	unsigned int default_addend;
-	struct clk *clk_ptp_ref;
-	unsigned int clk_ptp_rate;
 	u32 adv_ts;
 	int use_riwt;
 	int irq_wake;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 699ee1d..322e5c6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -712,7 +712,7 @@ static int stmmac_ethtool_op_set_eee(struct net_device *dev,
 
 static u32 stmmac_usec2riwt(u32 usec, struct stmmac_priv *priv)
 {
-	unsigned long clk = clk_get_rate(priv->stmmac_clk);
+	unsigned long clk = clk_get_rate(priv->plat->stmmac_clk);
 
 	if (!clk)
 		return 0;
@@ -722,7 +722,7 @@ static u32 stmmac_usec2riwt(u32 usec, struct stmmac_priv *priv)
 
 static u32 stmmac_riwt2usec(u32 riwt, struct stmmac_priv *priv)
 {
-	unsigned long clk = clk_get_rate(priv->stmmac_clk);
+	unsigned long clk = clk_get_rate(priv->plat->stmmac_clk);
 
 	if (!clk)
 		return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index f1a0afc..6e6e9dc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -158,7 +158,7 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv)
 {
 	u32 clk_rate;
 
-	clk_rate = clk_get_rate(priv->stmmac_clk);
+	clk_rate = clk_get_rate(priv->plat->stmmac_clk);
 
 	/* Platform provided default clk_csr would be assumed valid
 	 * for all other cases except for the below mentioned ones.
@@ -607,7 +607,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
 
 		/* program Sub Second Increment reg */
 		sec_inc = priv->hw->ptp->config_sub_second_increment(
-			priv->ptpaddr, priv->clk_ptp_rate,
+			priv->ptpaddr, priv->plat->clk_ptp_rate,
 			priv->plat->has_gmac4);
 		temp = div_u64(1000000000ULL, sec_inc);
 
@@ -617,7 +617,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
 		 * where, freq_div_ratio = 1e9ns/sec_inc
 		 */
 		temp = (u64)(temp << 32);
-		priv->default_addend = div_u64(temp, priv->clk_ptp_rate);
+		priv->default_addend = div_u64(temp, priv->plat->clk_ptp_rate);
 		priv->hw->ptp->config_addend(priv->ptpaddr,
 					     priv->default_addend);
 
@@ -645,18 +645,6 @@ static int stmmac_init_ptp(struct stmmac_priv *priv)
 	if (!(priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp))
 		return -EOPNOTSUPP;
 
-	/* Fall-back to main clock in case of no PTP ref is passed */
-	priv->clk_ptp_ref = devm_clk_get(priv->device, "clk_ptp_ref");
-	if (IS_ERR(priv->clk_ptp_ref)) {
-		priv->clk_ptp_rate = clk_get_rate(priv->stmmac_clk);
-		priv->clk_ptp_ref = NULL;
-		netdev_dbg(priv->dev, "PTP uses main clock\n");
-	} else {
-		clk_prepare_enable(priv->clk_ptp_ref);
-		priv->clk_ptp_rate = clk_get_rate(priv->clk_ptp_ref);
-		netdev_dbg(priv->dev, "PTP rate %d\n", priv->clk_ptp_rate);
-	}
-
 	priv->adv_ts = 0;
 	/* Check if adv_ts can be enabled for dwmac 4.x core */
 	if (priv->plat->has_gmac4 && priv->dma_cap.atime_stamp)
@@ -683,8 +671,8 @@ static int stmmac_init_ptp(struct stmmac_priv *priv)
 
 static void stmmac_release_ptp(struct stmmac_priv *priv)
 {
-	if (priv->clk_ptp_ref)
-		clk_disable_unprepare(priv->clk_ptp_ref);
+	if (priv->plat->clk_ptp_ref)
+		clk_disable_unprepare(priv->plat->clk_ptp_ref);
 	stmmac_ptp_unregister(priv);
 }
 
@@ -3278,44 +3266,8 @@ int stmmac_dvr_probe(struct device *device,
 	if ((phyaddr >= 0) && (phyaddr <= 31))
 		priv->plat->phy_addr = phyaddr;
 
-	priv->stmmac_clk = devm_clk_get(priv->device, STMMAC_RESOURCE_NAME);
-	if (IS_ERR(priv->stmmac_clk)) {
-		netdev_warn(priv->dev, "%s: warning: cannot get CSR clock\n",
-			    __func__);
-		/* If failed to obtain stmmac_clk and specific clk_csr value
-		 * is NOT passed from the platform, probe fail.
-		 */
-		if (!priv->plat->clk_csr) {
-			ret = PTR_ERR(priv->stmmac_clk);
-			goto error_clk_get;
-		} else {
-			priv->stmmac_clk = NULL;
-		}
-	}
-	clk_prepare_enable(priv->stmmac_clk);
-
-	priv->pclk = devm_clk_get(priv->device, "pclk");
-	if (IS_ERR(priv->pclk)) {
-		if (PTR_ERR(priv->pclk) == -EPROBE_DEFER) {
-			ret = -EPROBE_DEFER;
-			goto error_pclk_get;
-		}
-		priv->pclk = NULL;
-	}
-	clk_prepare_enable(priv->pclk);
-
-	priv->stmmac_rst = devm_reset_control_get(priv->device,
-						  STMMAC_RESOURCE_NAME);
-	if (IS_ERR(priv->stmmac_rst)) {
-		if (PTR_ERR(priv->stmmac_rst) == -EPROBE_DEFER) {
-			ret = -EPROBE_DEFER;
-			goto error_hw_init;
-		}
-		dev_info(priv->device, "no reset control found\n");
-		priv->stmmac_rst = NULL;
-	}
-	if (priv->stmmac_rst)
-		reset_control_deassert(priv->stmmac_rst);
+	if (priv->plat->stmmac_rst)
+		reset_control_deassert(priv->plat->stmmac_rst);
 
 	/* Init MAC and get the capabilities */
 	ret = stmmac_hw_init(priv);
@@ -3406,10 +3358,6 @@ int stmmac_dvr_probe(struct device *device,
 error_netdev_register:
 	netif_napi_del(&priv->napi);
 error_hw_init:
-	clk_disable_unprepare(priv->pclk);
-error_pclk_get:
-	clk_disable_unprepare(priv->stmmac_clk);
-error_clk_get:
 	free_netdev(ndev);
 
 	return ret;
@@ -3435,10 +3383,10 @@ int stmmac_dvr_remove(struct device *dev)
 	stmmac_set_mac(priv->ioaddr, false);
 	netif_carrier_off(ndev);
 	unregister_netdev(ndev);
-	if (priv->stmmac_rst)
-		reset_control_assert(priv->stmmac_rst);
-	clk_disable_unprepare(priv->pclk);
-	clk_disable_unprepare(priv->stmmac_clk);
+	if (priv->plat->stmmac_rst)
+		reset_control_assert(priv->plat->stmmac_rst);
+	clk_disable_unprepare(priv->plat->pclk);
+	clk_disable_unprepare(priv->plat->stmmac_clk);
 	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
 	    priv->hw->pcs != STMMAC_PCS_TBI &&
 	    priv->hw->pcs != STMMAC_PCS_RTBI)
@@ -3487,8 +3435,8 @@ int stmmac_suspend(struct device *dev)
 		stmmac_set_mac(priv->ioaddr, false);
 		pinctrl_pm_select_sleep_state(priv->device);
 		/* Disable clock in case of PWM is off */
-		clk_disable(priv->pclk);
-		clk_disable(priv->stmmac_clk);
+		clk_disable(priv->plat->pclk);
+		clk_disable(priv->plat->stmmac_clk);
 	}
 	spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -3528,8 +3476,8 @@ int stmmac_resume(struct device *dev)
 	} else {
 		pinctrl_pm_select_default_state(priv->device);
 		/* enable the clk prevously disabled */
-		clk_enable(priv->stmmac_clk);
-		clk_enable(priv->pclk);
+		clk_enable(priv->plat->stmmac_clk);
+		clk_enable(priv->plat->pclk);
 		/* reset the phy so that it's ready */
 		if (priv->mii)
 			stmmac_mdio_reset(priv->mii);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 6064fcc..4e44f9c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -336,7 +336,54 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 
 	plat->axi = stmmac_axi_setup(pdev);
 
+	/* clock setup */
+	plat->stmmac_clk = devm_clk_get(&pdev->dev,
+					STMMAC_RESOURCE_NAME);
+	if (IS_ERR(plat->stmmac_clk)) {
+		dev_warn(&pdev->dev, "Cannot get CSR clock\n");
+		plat->stmmac_clk = NULL;
+	}
+	clk_prepare_enable(plat->stmmac_clk);
+
+	plat->pclk = devm_clk_get(&pdev->dev, "pclk");
+	if (IS_ERR(plat->pclk)) {
+		if (PTR_ERR(plat->pclk) == -EPROBE_DEFER)
+			goto error_pclk_get;
+
+		plat->pclk = NULL;
+	}
+	clk_prepare_enable(plat->pclk);
+
+	/* Fall-back to main clock in case of no PTP ref is passed */
+	plat->clk_ptp_ref = devm_clk_get(&pdev->dev, "clk_ptp_ref");
+	if (IS_ERR(plat->clk_ptp_ref)) {
+		plat->clk_ptp_rate = clk_get_rate(plat->stmmac_clk);
+		plat->clk_ptp_ref = NULL;
+		dev_warn(&pdev->dev, "PTP uses main clock\n");
+	} else {
+		clk_prepare_enable(plat->clk_ptp_ref);
+		plat->clk_ptp_rate = clk_get_rate(plat->clk_ptp_ref);
+		dev_info(&pdev->dev, "No reset control found\n");
+	}
+
+	plat->stmmac_rst = devm_reset_control_get(&pdev->dev,
+						  STMMAC_RESOURCE_NAME);
+	if (IS_ERR(plat->stmmac_rst)) {
+		if (PTR_ERR(plat->stmmac_rst) == -EPROBE_DEFER)
+			goto error_hw_init;
+
+		dev_info(&pdev->dev, "no reset control found\n");
+		plat->stmmac_rst = NULL;
+	}
+
 	return plat;
+
+error_hw_init:
+	clk_disable_unprepare(plat->pclk);
+error_pclk_get:
+	clk_disable_unprepare(plat->stmmac_clk);
+
+	return ERR_PTR(-EPROBE_DEFER);
 }
 
 /**
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 90fefde..e29e7b8 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -139,6 +139,11 @@ struct plat_stmmacenet_data {
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
 	void *bsp_priv;
+	struct clk *stmmac_clk;
+	struct clk *pclk;
+	struct clk *clk_ptp_ref;
+	unsigned int clk_ptp_rate;
+	struct reset_control *stmmac_rst;
 	struct stmmac_axi *axi;
 	int has_gmac4;
 	bool tso_en;
-- 
2.9.3

^ permalink raw reply related

* [PATCH 1/3] stmmac: adding DT parameter for LPI tx clock gating
From: Joao Pinto @ 2017-01-04  9:43 UTC (permalink / raw)
  To: davem; +Cc: lars.persson, niklass, swarren, treding, netdev, Joao Pinto
In-Reply-To: <cover.1483372523.git.jpinto@synopsys.com>

This patch adds a new parameter to the stmmac DT: snps,en-tx-lpi-clockgating.
It was ported from synopsys/dwc_eth_qos.c and it is useful if lpi tx clock
gating is needed by stmmac users also.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
---
changes v1 -> v2:
- Nothing changed, just to keep up patch set version

 Documentation/devicetree/bindings/net/stmmac.txt      | 2 ++
 drivers/net/ethernet/stmicro/stmmac/common.h          | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c  | 5 ++++-
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h          | 1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c     | 6 +++++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c     | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +++
 include/linux/stmmac.h                                | 1 +
 8 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 128da75..a0c749f 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -49,6 +49,8 @@ Optional properties:
 - snps,force_sf_dma_mode	Force DMA to use the Store and Forward
 				mode for both tx and rx. This flag is
 				ignored if force_thresh_dma_mode is set.
+- snps,en-tx-lpi-clockgating	Enable gating of the MAC TX clock during
+				TX low-power mode
 - snps,multicast-filter-bins:	Number of multicast filter hash bins
 				supported by this device instance
 - snps,perfect-filter-entries:	Number of perfect filter entries supported
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 6c96291..75e2666 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -476,7 +476,8 @@ struct stmmac_ops {
 			      unsigned int reg_n);
 	void (*get_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
 			      unsigned int reg_n);
-	void (*set_eee_mode)(struct mac_device_info *hw);
+	void (*set_eee_mode)(struct mac_device_info *hw,
+			     bool en_tx_lpi_clockgating);
 	void (*reset_eee_mode)(struct mac_device_info *hw);
 	void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
 	void (*set_eee_pls)(struct mac_device_info *hw, int link);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index be3c91c..a5ffca1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -343,11 +343,14 @@ static int dwmac1000_irq_status(struct mac_device_info *hw,
 	return ret;
 }
 
-static void dwmac1000_set_eee_mode(struct mac_device_info *hw)
+static void dwmac1000_set_eee_mode(struct mac_device_info *hw,
+				   bool en_tx_lpi_clockgating)
 {
 	void __iomem *ioaddr = hw->pcsr;
 	u32 value;
 
+	/*TODO - en_tx_lpi_clockgating treatment */
+
 	/* Enable the link status receive on RGMII, SGMII ore SMII
 	 * receive path and instruct the transmit to enter in LPI
 	 * state.
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 73d1dab..db45134 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -98,6 +98,7 @@ enum power_event {
 #define GMAC4_LPI_TIMER_CTRL	0xd4
 
 /* LPI control and status defines */
+#define GMAC4_LPI_CTRL_STATUS_LPITCSE	BIT(21)	/* LPI Tx Clock Stop Enable */
 #define GMAC4_LPI_CTRL_STATUS_LPITXA	BIT(19)	/* Enable LPI TX Automate */
 #define GMAC4_LPI_CTRL_STATUS_PLS	BIT(17) /* PHY Link Status */
 #define GMAC4_LPI_CTRL_STATUS_LPIEN	BIT(16)	/* LPI Enable */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 02eab79..834f40f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -137,7 +137,8 @@ static void dwmac4_get_umac_addr(struct mac_device_info *hw,
 				   GMAC_ADDR_LOW(reg_n));
 }
 
-static void dwmac4_set_eee_mode(struct mac_device_info *hw)
+static void dwmac4_set_eee_mode(struct mac_device_info *hw,
+				bool en_tx_lpi_clockgating)
 {
 	void __iomem *ioaddr = hw->pcsr;
 	u32 value;
@@ -149,6 +150,9 @@ static void dwmac4_set_eee_mode(struct mac_device_info *hw)
 	value = readl(ioaddr + GMAC4_LPI_CTRL_STATUS);
 	value |= GMAC4_LPI_CTRL_STATUS_LPIEN | GMAC4_LPI_CTRL_STATUS_LPITXA;
 
+	if (en_tx_lpi_clockgating)
+		value |= GMAC4_LPI_CTRL_STATUS_LPITCSE;
+
 	writel(value, ioaddr + GMAC4_LPI_CTRL_STATUS);
 }
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c97870f..f1a0afc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -239,7 +239,8 @@ static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
 	/* Check and enter in LPI mode */
 	if ((priv->dirty_tx == priv->cur_tx) &&
 	    (priv->tx_path_in_lpi_mode == false))
-		priv->hw->mac->set_eee_mode(priv->hw);
+		priv->hw->mac->set_eee_mode(priv->hw,
+					    priv->plat->en_tx_lpi_clockgating);
 }
 
 /**
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 082cd48..6064fcc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -249,6 +249,9 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	plat->force_sf_dma_mode =
 		of_property_read_bool(np, "snps,force_sf_dma_mode");
 
+	plat->en_tx_lpi_clockgating =
+		of_property_read_bool(np, "snps,en-tx-lpi-clockgating");
+
 	/* Set the maxmtu to a default of JUMBO_LEN in case the
 	 * parameter is not present in the device tree.
 	 */
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 266dab9..90fefde 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -143,5 +143,6 @@ struct plat_stmmacenet_data {
 	int has_gmac4;
 	bool tso_en;
 	int mac_port_sel_speed;
+	bool en_tx_lpi_clockgating;
 };
 #endif
-- 
2.9.3

^ permalink raw reply related

* [PATCH 0/3] adding new glue driver dwmac-dwc-qos-eth
From: Joao Pinto @ 2017-01-04  9:43 UTC (permalink / raw)
  To: davem; +Cc: lars.persson, niklass, swarren, treding, netdev, Joao Pinto

This patch set contains the porting of the synopsys/dwc_eth_qos.c driver
to the stmmac structure. This operation resulted in the creation of a new
platform glue driver called dwmac-dwc-qos-eth which was based in the
dwc_eth_qos as is.

dwmac-dwc-qos-eth inherited dwc_eth_qos DT bindings, to assure that current
and old users can continue to use it as before. We can see this driver as
being deprecated, since all new development will be done in stmmac.

Please check each patch for implementation details.

Joao Pinto (3):
  stmmac: adding DT parameter for LPI tx clock gating
  stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform
    structure
  stmmac: adding new glue driver dwmac-dwc-qos-eth

 .../bindings/net/snps,dwc-qos-ethernet.txt         |   3 +
 Documentation/devicetree/bindings/net/stmmac.txt   |   2 +
 drivers/net/ethernet/stmicro/stmmac/Kconfig        |   9 +
 drivers/net/ethernet/stmicro/stmmac/Makefile       |   1 +
 drivers/net/ethernet/stmicro/stmmac/common.h       |   3 +-
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c    | 199 +++++++++++++++++++++
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |   5 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h       |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  |   6 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |   5 -
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  85 ++-------
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  65 ++++++-
 include/linux/stmmac.h                             |   6 +
 14 files changed, 313 insertions(+), 81 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c

-- 
2.9.3

^ permalink raw reply

* [PATCH 3/3] stmmac: adding new glue driver dwmac-dwc-qos-eth
From: Joao Pinto @ 2017-01-04  9:43 UTC (permalink / raw)
  To: davem; +Cc: lars.persson, niklass, swarren, treding, netdev, Joao Pinto
In-Reply-To: <cover.1483372523.git.jpinto@synopsys.com>

This patch adds a new glue driver called dwmac-dwc-qos-eth which
was based in the dwc_eth_qos as is. To assure retro-compatibility a slight
tweak was also added to stmmac_platform.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Reviewed-by: Lars Persson <larper@axis.com>
---
changes v1 -> v2 (Lars)
- brust-map convertion
- axis name removed from the driver
- Makefile include order fixed
- snps,read-requests entry fixed
- check for platform_get_irq() failure
- check for devm_ioremap_resource() failure
- clk err order was fixed

 .../bindings/net/snps,dwc-qos-ethernet.txt         |   3 +
 drivers/net/ethernet/stmicro/stmmac/Kconfig        |   9 +
 drivers/net/ethernet/stmicro/stmmac/Makefile       |   1 +
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c    | 199 +++++++++++++++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  15 +-
 5 files changed, 224 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c

diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
index d93f71c..21d27aa 100644
--- a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
+++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
@@ -1,5 +1,8 @@
 * Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)
 
+This binding is deprecated, but it continues to be supported, but new
+features should be preferably added to the stmmac binding document.
+
 This binding supports the Synopsys Designware Ethernet QoS (Quality Of Service)
 IP block. The IP supports multiple options for bus type, clocking and reset
 structure, and feature list. Consequently, a number of properties and list
diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index ab66248..99594e3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -29,6 +29,15 @@ config STMMAC_PLATFORM
 
 if STMMAC_PLATFORM
 
+config DWMAC_DWC_QOS_ETH
+	tristate "Support for snps,dwc-qos-ethernet.txt DT binding."
+	select PHYLIB
+	select CRC32
+	select MII
+	depends on OF && HAS_DMA
+	help
+	  Support for chips using the snps,dwc-qos-ethernet.txt DT binding.
+
 config DWMAC_GENERIC
 	tristate "Generic driver for DWMAC"
 	default STMMAC_PLATFORM
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 8f83a86..700c603 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_DWMAC_SOCFPGA)	+= dwmac-altr-socfpga.o
 obj-$(CONFIG_DWMAC_STI)		+= dwmac-sti.o
 obj-$(CONFIG_DWMAC_STM32)	+= dwmac-stm32.o
 obj-$(CONFIG_DWMAC_SUNXI)	+= dwmac-sunxi.o
+obj-$(CONFIG_DWMAC_DWC_QOS_ETH)	+= dwmac-dwc-qos-eth.o
 obj-$(CONFIG_DWMAC_GENERIC)	+= dwmac-generic.o
 stmmac-platform-objs:= stmmac_platform.o
 dwmac-altr-socfpga-objs := altr_tse_pcs.o dwmac-socfpga.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
new file mode 100644
index 0000000..8a6ac06
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -0,0 +1,199 @@
+/*
+ * Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
+ *
+ * Copyright (C) 2016 Joao Pinto <jpinto@synopsys.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/device.h>
+#include <linux/ethtool.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/platform_device.h>
+#include <linux/stmmac.h>
+
+#include "stmmac_platform.h"
+
+static int dwc_eth_dwmac_config_dt(struct platform_device *pdev,
+				   struct plat_stmmacenet_data *plat_dat)
+{
+	struct device_node *np = pdev->dev.of_node;
+	u32 burst_map = 0;
+	u32 bit_index = 0;
+	u32 a_index = 0;
+
+	if (!plat_dat->axi) {
+		plat_dat->axi = kzalloc(sizeof(struct stmmac_axi), GFP_KERNEL);
+
+		if (!plat_dat->axi)
+			return -ENOMEM;
+	}
+
+	plat_dat->axi->axi_lpi_en = of_property_read_bool(np, "snps,en-lpi");
+	if (of_property_read_u32(np, "snps,write-requests",
+				 &plat_dat->axi->axi_wr_osr_lmt)) {
+		/**
+		 * Since the register has a reset value of 1, if property
+		 * is missing, default to 1.
+		 */
+		plat_dat->axi->axi_wr_osr_lmt = 1;
+	} else {
+		/**
+		 * If property exists, to keep the behavior from dwc_eth_qos,
+		 * subtract one after parsing.
+		 */
+		plat_dat->axi->axi_wr_osr_lmt--;
+	}
+
+	if (of_property_read_u32(np, "read,read-requests",
+				 &plat_dat->axi->axi_rd_osr_lmt)) {
+		/**
+		 * Since the register has a reset value of 1, if property
+		 * is missing, default to 1.
+		 */
+		plat_dat->axi->axi_rd_osr_lmt = 1;
+	} else {
+		/**
+		 * If property exists, to keep the behavior from dwc_eth_qos,
+		 * subtract one after parsing.
+		 */
+		plat_dat->axi->axi_rd_osr_lmt--;
+	}
+	of_property_read_u32(np, "snps,burst-map", &burst_map);
+
+	/* converts burst-map bitmask to burst array */
+	for (bit_index = 0; bit_index < 7; bit_index++) {
+		if (burst_map & (1 << bit_index)) {
+			switch (bit_index) {
+			case 0:
+			plat_dat->axi->axi_blen[a_index] = 4; break;
+			case 1:
+			plat_dat->axi->axi_blen[a_index] = 8; break;
+			case 2:
+			plat_dat->axi->axi_blen[a_index] = 16; break;
+			case 3:
+			plat_dat->axi->axi_blen[a_index] = 32; break;
+			case 4:
+			plat_dat->axi->axi_blen[a_index] = 64; break;
+			case 5:
+			plat_dat->axi->axi_blen[a_index] = 128; break;
+			case 6:
+			plat_dat->axi->axi_blen[a_index] = 256; break;
+			default:
+			break;
+			}
+			a_index++;
+		}
+	}
+
+	/* dwc-qos needs GMAC4, AAL, TSO and PMT */
+	plat_dat->has_gmac4 = 1;
+	plat_dat->dma_cfg->aal = 1;
+	plat_dat->tso_en = 1;
+	plat_dat->pmt = 1;
+
+	return 0;
+}
+
+static int dwc_eth_dwmac_probe(struct platform_device *pdev)
+{
+	struct plat_stmmacenet_data *plat_dat;
+	struct stmmac_resources stmmac_res;
+	struct resource *res;
+	int ret;
+
+	/**
+	 * Since stmmac_platform supports name IRQ only, basic platform
+	 * resource initialization is done in the glue logic.
+	 */
+	stmmac_res.irq = platform_get_irq(pdev, 0);
+	if (stmmac_res.irq < 0) {
+		if (stmmac_res.irq != -EPROBE_DEFER) {
+			dev_err(&pdev->dev,
+				"IRQ configuration information not found\n");
+		}
+		return stmmac_res.irq;
+	}
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	stmmac_res.addr = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(stmmac_res.addr))
+		return PTR_ERR(stmmac_res.addr);
+
+	plat_dat = stmmac_probe_config_dt(pdev, &stmmac_res.mac);
+	if (IS_ERR(plat_dat))
+		return PTR_ERR(plat_dat);
+
+	plat_dat->stmmac_clk = devm_clk_get(&pdev->dev, "phy_ref_clk");
+	if (IS_ERR(plat_dat->stmmac_clk)) {
+		dev_err(&pdev->dev, "apb_pclk clock not found.\n");
+		ret = PTR_ERR(plat_dat->stmmac_clk);
+		plat_dat->stmmac_clk = NULL;
+		goto err_remove_config_dt;
+	}
+	clk_prepare_enable(plat_dat->stmmac_clk);
+
+	plat_dat->pclk = devm_clk_get(&pdev->dev, "apb_pclk");
+	if (IS_ERR(plat_dat->pclk)) {
+		dev_err(&pdev->dev, "phy_ref_clk clock not found.\n");
+		ret = PTR_ERR(plat_dat->pclk);
+		plat_dat->pclk = NULL;
+		goto err_out_clk_dis_phy;
+	}
+	clk_prepare_enable(plat_dat->pclk);
+
+	ret = dwc_eth_dwmac_config_dt(pdev, plat_dat);
+	if (ret)
+		goto err_out_clk_dis_aper;
+
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto err_out_clk_dis_aper;
+
+	return 0;
+
+err_out_clk_dis_aper:
+	clk_disable_unprepare(plat_dat->pclk);
+err_out_clk_dis_phy:
+	clk_disable_unprepare(plat_dat->stmmac_clk);
+err_remove_config_dt:
+	stmmac_remove_config_dt(pdev, plat_dat);
+
+	return ret;
+}
+
+static int dwc_eth_dwmac_remove(struct platform_device *pdev)
+{
+	return stmmac_pltfr_remove(pdev);
+}
+
+static const struct of_device_id dwc_eth_dwmac_match[] = {
+	{ .compatible = "snps,dwc-qos-ethernet-4.10", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, dwc_eth_dwmac_match);
+
+static struct platform_driver dwc_eth_dwmac_driver = {
+	.probe  = dwc_eth_dwmac_probe,
+	.remove = dwc_eth_dwmac_remove,
+	.driver = {
+		.name           = "dwc-eth-dwmac",
+		.of_match_table = dwc_eth_dwmac_match,
+	},
+};
+module_platform_driver(dwc_eth_dwmac_driver);
+
+MODULE_AUTHOR("Joao Pinto <jpinto@synopsys.com>");
+MODULE_DESCRIPTION("Synopsys DWC Ethernet Quality-of-Service v4.10a driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 4e44f9c..00c0f8d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -181,10 +181,19 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
 		mdio = false;
 	}
 
-	/* If snps,dwmac-mdio is passed from DT, always register the MDIO */
-	for_each_child_of_node(np, plat->mdio_node) {
-		if (of_device_is_compatible(plat->mdio_node, "snps,dwmac-mdio"))
+	/* exception for dwmac-dwc-qos-eth glue logic */
+	if (of_device_is_compatible(np, "snps,dwc-qos-ethernet-4.10")) {
+		plat->mdio_node = of_get_child_by_name(np, "mdio");
+	} else {
+		/**
+		 * If snps,dwmac-mdio is passed from DT, always register
+		 * the MDIO
+		 */
+		for_each_child_of_node(np, plat->mdio_node) {
+			if (of_device_is_compatible(plat->mdio_node,
+						    "snps,dwmac-mdio"))
 			break;
+		}
 	}
 
 	if (plat->mdio_node) {
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net 2/2] net: systemport: Pad packet before inserting TSB
From: Sergei Shtylyov @ 2017-01-04 10:04 UTC (permalink / raw)
  To: Florian Fainelli, netdev; +Cc: davem
In-Reply-To: <20170104003449.27078-3-f.fainelli@gmail.com>

Hello!

On 1/4/2017 3:34 AM, Florian Fainelli wrote:

> Inserting the TSB means adding an extra 8 bytes in fron the of packet

    In front?

> that is going to be used as metadata information by the TDMA engine, but
> stripped off, so it does not really help with the packet padding.
>
> For some odd packet sizes that fall below the 60 bytes payload (e.g: ARP)
> we can end-up padding them after the TSB insertion, thus making them 64
> bytes, but with the TDMA stripping off the first 8 bytes, they could
> still be smaller than 64 bytes which is required to ingress the switch.
>
> Fix this by swapping the padding and TSB insertion, guaranteeing that
> the packets have the right sizes.
>
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
[...]

MBR, Sergei

^ permalink raw reply

* RE: [PATCH v2 09/11] net: mvpp2: simplify MVPP2_PRS_RI_* definitions
From: David Laight @ 2017-01-04 10:14 UTC (permalink / raw)
  To: 'Thomas Petazzoni', David S. Miller,
	netdev@vger.kernel.org
  Cc: Nadav Haklai, Hanna Hawa, Yehuda Yitschak, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	linux-arm-kernel@lists.infradead.org, Stefan Chulski,
	Marcin Wojtas
In-Reply-To: <1482857660-16092-10-git-send-email-thomas.petazzoni@free-electrons.com>

From: Thomas Petazzoni
> Sent: 27 December 2016 16:54
> Some of the MVPP2_PRS_RI_* definitions use the ~(value) syntax, which
> doesn't compile nicely on 64-bit. Moreover, those definitions are in
> fact unneeded, since they are always used in combination with a bit
> mask that ensures only the appropriate bits are modified.
> 
> Therefore, such definitions should just be set to 0x0. For example:
> 
>  #define MVPP2_PRS_RI_L2_CAST_MASK              0x600
>  #define MVPP2_PRS_RI_L2_UCAST                  ~(BIT(9) | BIT(10))
>  #define MVPP2_PRS_RI_L2_MCAST                  BIT(9)
>  #define MVPP2_PRS_RI_L2_BCAST                  BIT(10)
> 
> becomes
> 
>  #define MVPP2_PRS_RI_L2_CAST_MASK              0x600
>  #define MVPP2_PRS_RI_L2_UCAST                  0x0
>  #define MVPP2_PRS_RI_L2_MCAST                  BIT(9)
>  #define MVPP2_PRS_RI_L2_BCAST                  BIT(10)
...
Shouldn't MVPP2_PRS_RI_L2_CAST_MASK be defined as
(MVPP2_PRS_RI_L2_MCAST | MVPP2_PRS_RI_L2_BCAST)?

	David



^ permalink raw reply

* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: Simon Horman @ 2017-01-04 10:14 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Jamal Hadi Salim, John Fastabend, David S. Miller, netdev,
	Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan, Roman Mashak
In-Reply-To: <606f64cb-1744-d44f-8fd1-a4bdb1ca872f@mellanox.com>

On Tue, Jan 03, 2017 at 02:22:05PM +0200, Paul Blakey wrote:
> 
> 
> On 03/01/2017 13:44, Jamal Hadi Salim wrote:
> >On 17-01-02 11:33 PM, John Fastabend wrote:
> >>On 17-01-02 05:22 PM, Jamal Hadi Salim wrote:
> >
> >[..]
> >>>Like all cookie semantics it is for storing state. The receiver
> >>>(kernel)
> >>>is not just store it and not intepret it. The user when reading it back
> >>>simplifies what they have to do for their processing.
> >>>
> >>>>
> >>>>The tuple <ifindex:qdisc:prio:handle> really should be unique why
> >>>>not use this for system wide mappings?
> >>>>
> >>>
> >>>I think on a single machine should be enough, however:
> >>>typically the user wants to define the value in a manner that
> >>>in a distributed system it is unique. It would be trickier to
> >>>do so with well defined values such as above.
> >>>
> >>
> >>Just extend the tuple <hostname:ifindex:qdisc:prio:handle> that
> >>should be unique in the domain of hostname's, or use some other domain
> >>wide machine identifier.
> >>
> >
> >May work for the case of filter identification. The nice thing for
> >allowing cookies is you can let the user define it define their
> >own scheme.
> >
> >>Although actions can be shared so the cookie can be shared across
> >>filters. Maybe its useful but it doesn't uniquely identify a filter
> >>in the shared case but the user would have to specify that case
> >>so maybe its not important.
> >>
> >
> >Note: the action cookies and filter cookies are unrelated/orthogonal.
> >Their basic concept of stashing something in the cookie to help improve
> >what user space does (in our case millions of actions of which some are
> >used for accounting) is similar.
> >I have no objections to the flow cookies; my main concern was it should
> >be applicable to all classifiers not just flower. And the arbitrary size
> >of the cookie that you pointed out is questionable.
> >
> >cheers,
> >jamal
> 
> 
> Hi all,
> Our use case is replacing OVS rules with TC filters for HW offload, and
> you're are right the cookie would
> have saved us the mapping from OVS rule ufid to the tc filter handle/prio...
> that was generated for it.
> It also was going to be used to store other info like which OVS output port
> corresponds to the ifindex,

Possibly off-topic but I am curious to know why you need to store the port.
My possibly naïve assumption is that a filter is attached to the netdev
corresponding to the input port and mirred or other actions are used to output
to netdevs corresponding to output ports.

> so we need 128+32 for now. It helps us with dumping the the flows back, when
> we lose data on crash
> or restarting the user space daemon.
> HW hints is another thing that might be helpful.
> Its binary blob because user/app specifc and its usage might change in the
> future and its and that's why there
> is some headroom with size as well.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox