Netdev List


* Re: Something hitting my total number of connections to the server
From: Eric Dumazet @ 2017-08-22 17:44 UTC (permalink / raw)
  To: David Ahern; +Cc: Akshat Kakkar, David Laight, netdev, Willem de Bruijn
In-Reply-To: <bf580154-aea9-4143-5d9c-df3764b5e745@gmail.com>

On Tue, 2017-08-22 at 09:43 -0700, David Ahern wrote:
> On 8/22/17 6:02 AM, Eric Dumazet wrote:
> >>
> >> net.core.netdev_max_backlog=10000
> > This is an insane backlog.
> > 
> 
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
> 
> "== Suggested Configuration
> 
> Flow limit is useful on systems with many concurrent connections,
> where a single connection taking up 50% of a CPU indicates a problem.
> In such environments, enable the feature on all CPUs that handle
> network rx interrupts (as set in /proc/irq/N/smp_affinity).
> 
> The feature depends on the input packet queue length to exceed
> the flow limit threshold (50%) + the flow history length (256).
> Setting net.core.netdev_max_backlog to either 1000 or 10000
> performed well in experiments."

10000 is adding tail latencies.

At Google we run the whole fleet with a backlog of 1000.

And yes, it took time to get rid of the backlog of 10000 that was set up
years ago, because of old constraints and some fears.

Willem wrote this doc in 2013, before we finally went back to 1000.

We should update this doc.
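
As a rough illustration of why a deeper backlog adds tail latency: a packet that arrives behind a full input queue has to wait for everything ahead of it to drain. The sketch below computes that worst-case wait. The function name and the drain rate used in the example are assumptions made for illustration, not measurements from this thread.

```c
#include <stdint.h>

/* Worst-case queueing delay, in microseconds, for a packet that arrives
 * behind a full backlog of `backlog` packets drained at `drain_pps`
 * packets per second. Both inputs are illustrative assumptions.
 */
static uint64_t demo_backlog_worst_case_us(uint64_t backlog, uint64_t drain_pps)
{
	if (drain_pps == 0)
		return UINT64_MAX;	/* the queue never drains */

	return backlog * 1000000ULL / drain_pps;
}
```

With an assumed drain rate of 1,000,000 packets per second, a backlog of 10000 allows up to 10 ms of queueing, against 1 ms for a backlog of 1000.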


* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Ben Pfaff @ 2017-08-22 17:35 UTC (permalink / raw)
  To: Jan Scheurich
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, e@erig.me,
	dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Jiri Benc
In-Reply-To: <CFF8EF42F1132E4CBE2BF0AB6C21C58D7274C9FB-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>

On Tue, Aug 22, 2017 at 08:32:49AM +0000, Jan Scheurich wrote:
> > > Or why else does OVS user space code take so great pain to model
> > > possible misalignment and provide/use safe access functions?
> > 
> > I don't know how the ovs user space deals with packet allocation. In
> > the kernel, the network header is aligned in a way that it allows
> > efficient 32bit access.
> 
> It seems that OVS has not had the same approach as Linux. There is no
> config parameter covering the alignment characteristics of the machine
> architecture. For packet buffers received from outside sources
> (e.g. DPDK interfaces) they make no assumptions on alignment and play
> safe. For packets allocated inside OVS, the Ethernet packet is
> typically stored so that the L3 header is 32-bit aligned, so that the
> misalignment precautions would be unnecessary. But I didn't check all
> code paths.

We solved the alignment problem in OVS userspace a different way, by
defining our versions of the network protocol headers so that they only
need 16-bit alignment.  In turn, we did that by defining a
ovs_16aligned_be32 type as a pair of be16s and ovs_16aligned_be64 as
four be16s, and using helper functions for reads and writes.  This made
it harder to screw up alignment in a subtle way and only find out long
after release when someone tested a corner case on a RISC architecture.
It probably has a performance cost on those RISC architectures, for the
cases where the access really is aligned, but it's more obviously
correct and I highly value that for OVS userspace.
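
The idea of a 32-bit field that only needs 16-bit alignment can be sketched as follows. This is a minimal illustration and not the OVS source: the `demo_` type and helper names are hypothetical, and only the approach (a pair of 16-bit big-endian halves plus accessor functions) comes from the description above.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons(), ntohs() */

/* A 32-bit big-endian value stored as two 16-bit big-endian halves, so
 * the struct itself only requires 16-bit alignment. Hypothetical names.
 */
struct demo_16aligned_be32 {
	uint16_t hi;	/* most significant 16 bits, network byte order */
	uint16_t lo;	/* least significant 16 bits, network byte order */
};

/* Read the value in host byte order using only 16-bit loads. */
static uint32_t demo_get_16aligned_be32(const struct demo_16aligned_be32 *x)
{
	return ((uint32_t)ntohs(x->hi) << 16) | ntohs(x->lo);
}

/* Store a host byte order value using only 16-bit stores. */
static void demo_put_16aligned_be32(struct demo_16aligned_be32 *x, uint32_t v)
{
	x->hi = htons((uint16_t)(v >> 16));
	x->lo = htons((uint16_t)(v & 0xffff));
}
```

Because every access goes through the helpers, a header struct built from such fields can sit at any 2-byte boundary without a 32-bit load or store ever being issued.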

As far as I can tell it's not actually possible, in the general case, to
add padding such that all parts of a packet are aligned.  VXLAN is the
case that comes to mind.  With VXLAN, as far as I can tell, the 14-byte
inner Ethernet header means that you can align either the outer IPv4
header or the inner IPv4 header, but not both.  That means that no
matter how careful OVS is about aligning packets, it would still have to
deal with unaligned accesses in some cases.

I see that the VXLAN issue has come up in Linux before:
https://www.ietf.org/mail-archive/web/nvo3/current/msg05743.html
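
The VXLAN arithmetic can be checked with a short sketch. It assumes the common header sizes (untagged Ethernet, IPv4 without options) and a buffer whose start is 4-byte aligned; the names are hypothetical.

```c
#include <stddef.h>

enum {
	DEMO_ETH_LEN   = 14,	/* Ethernet header, no VLAN tag */
	DEMO_IPV4_LEN  = 20,	/* IPv4 header, no options */
	DEMO_UDP_LEN   = 8,
	DEMO_VXLAN_LEN = 8,
};

/* Offset of the outer IPv4 header when the frame starts `pad` bytes
 * into a 4-byte-aligned buffer.
 */
static size_t demo_outer_ipv4_offset(size_t pad)
{
	return pad + DEMO_ETH_LEN;
}

/* Offset of the inner IPv4 header for the same frame. */
static size_t demo_inner_ipv4_offset(size_t pad)
{
	return demo_outer_ipv4_offset(pad) + DEMO_IPV4_LEN + DEMO_UDP_LEN +
	       DEMO_VXLAN_LEN + DEMO_ETH_LEN;
}

/* Returns 1 if some padding in 0..3 aligns both IPv4 headers to 4 bytes. */
static int demo_both_alignable(void)
{
	for (size_t pad = 0; pad < 4; pad++)
		if (demo_outer_ipv4_offset(pad) % 4 == 0 &&
		    demo_inner_ipv4_offset(pad) % 4 == 0)
			return 1;
	return 0;
}
```

The two IPv4 headers are 50 bytes apart under these assumptions, which is 2 modulo 4, so no choice of padding aligns both.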


* Re: [PATCH] net: ethernet: stmmac: dwmac-rk: Add rv1108 gmac support
From: David Miller @ 2017-08-22 17:31 UTC (permalink / raw)
  To: david.wu
  Cc: heiko, robh+dt, linux, peppe.cavallaro, alexandre.torgue,
	huangtao, netdev, linux-rockchip, linux-kernel
In-Reply-To: <1503310375-30393-1-git-send-email-david.wu@rock-chips.com>

From: David Wu <david.wu@rock-chips.com>
Date: Mon, 21 Aug 2017 18:12:55 +0800

> It only supports rmii interface. Add constants and callback functions
> for the dwmac on rv1108 socs. As can be seen, the base structure is
> the same, only registers and the bits in them moved slightly.
> 
> Signed-off-by: David Wu <david.wu@rock-chips.com>

Applied, thanks.


* [PATCH v2 6/6] dpaa_eth: check allocation result
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 73ca8d7..4225806 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2561,6 +2561,9 @@ static struct dpaa_bp *dpaa_bp_alloc(struct device *dev)
 
 	dpaa_bp->bpid = FSL_DPAA_BPID_INV;
 	dpaa_bp->percpu_count = devm_alloc_percpu(dev, *dpaa_bp->percpu_count);
+	if (!dpaa_bp->percpu_count)
+		return ERR_PTR(-ENOMEM);
+
 	dpaa_bp->config_count = FSL_DPAA_ETH_MAX_BUF_COUNT;
 
 	dpaa_bp->seed_cb = dpaa_bp_seed;
-- 
2.1.0


* [PATCH v2 5/6] Documentation: networking: add RSS information
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 Documentation/networking/dpaa.txt | 68 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/dpaa.txt b/Documentation/networking/dpaa.txt
index 76e016d..f88194f 100644
--- a/Documentation/networking/dpaa.txt
+++ b/Documentation/networking/dpaa.txt
@@ -13,6 +13,7 @@ Contents
 	- Configuring DPAA Ethernet in your kernel
 	- DPAA Ethernet Frame Processing
 	- DPAA Ethernet Features
+	- DPAA IRQ Affinity and Receive Side Scaling
 	- Debugging
 
 DPAA Ethernet Overview
@@ -147,7 +148,10 @@ gradually.
 
 The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
 checksum offload feature is enabled by default and cannot be controlled through
-ethtool.
+ethtool. Also, rx-flow-hash and rx-hashing were added. The addition of RSS
+provides a big performance boost for the forwarding scenarios, allowing
+different traffic flows received by one interface to be processed by different
+CPUs in parallel.
 
 The driver has support for multiple prioritized Tx traffic classes. Priorities
 range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
@@ -166,6 +170,68 @@ classes as follows:
 tc qdisc add dev <int> root handle 1: \
 	 mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
 
+DPAA IRQ Affinity and Receive Side Scaling
+==========================================
+
+Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
+queues is seen by the CPU as ingress traffic on a certain portal.
+The DPAA QMan portal interrupts are affined each to a certain CPU.
+The same portal interrupt services all the QMan portal consumers.
+
+By default the DPAA Ethernet driver enables RSS, making use of the
+DPAA FMan Parser and Keygen blocks to distribute traffic on 128
+hardware frame queues using a hash on IP v4/v6 source and destination
+and L4 source and destination ports, if present in the received frame.
+When RSS is disabled, all traffic received by a certain interface is
+received on the default Rx frame queue. The default DPAA Rx frame
+queues are configured to put the received traffic into a pool channel
+that allows any available CPU portal to dequeue the ingress traffic.
+The default frame queues have the HOLDACTIVE option set, ensuring that
+traffic bursts from a certain queue are serviced by the same CPU.
+This ensures a very low rate of frame reordering. A drawback of this
+is that only one CPU at a time can service the traffic received by a
+certain interface when RSS is not enabled.
+
+To implement RSS, the DPAA Ethernet driver allocates an extra set of
+128 Rx frame queues that are configured to dedicated channels, in a
+round-robin manner. The mapping of the frame queues to CPUs is now
+hardcoded; there is no indirection table to move traffic for a certain
+FQ (hash result) to another CPU. The ingress traffic arriving on one
+of these frame queues will arrive at the same portal and will always
+be processed by the same CPU. This ensures intra-flow order preservation
+and workload distribution for multiple traffic flows.
+
+RSS can be turned off for a certain interface using ethtool, i.e.
+
+	# ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
+
+To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6:
+
+	# ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
+
+There is no independent control for individual protocols; any command
+run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
+going to control the rx-flow-hashing for all protocols on that interface.
+
+Besides using the FMan Keygen computed hash for spreading traffic on the
+128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
+the NETIF_F_RXHASH feature is on (active by default). This can be turned
+on or off through ethtool, i.e.:
+
+	# ethtool -K fm1-mac9 rx-hashing off
+	# ethtool -k fm1-mac9 | grep hash
+	receive-hashing: off
+	# ethtool -K fm1-mac9 rx-hashing on
+	Actual changes:
+	receive-hashing: on
+	# ethtool -k fm1-mac9 | grep hash
+	receive-hashing: on
+
+Please note that Rx hashing depends upon the rx-flow-hashing being on
+for that interface - turning off rx-flow-hashing will also disable the
+rx-hashing (without ethtool reporting it as off as that depends on the
+NETIF_F_RXHASH feature flag).
+
 Debugging
 =========
 
-- 
2.1.0
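
The fixed flow-to-CPU mapping described in the documentation text above can be sketched in a few lines. This is a simplified model, not driver code: the `demo_` names are hypothetical, and selecting the frame queue by masking the low bits of the hash is an assumption about how the hardware uses the Keygen result.

```c
#include <stdint.h>

#define DEMO_PCD_RXQ_NUM 128u	/* Rx frame queues used for RSS, per the text */

/* Pick one of the 128 frame queues from a flow hash. Masking the low
 * bits is an assumption made for this illustration.
 */
static uint32_t demo_fq_index(uint32_t flow_hash)
{
	return flow_hash & (DEMO_PCD_RXQ_NUM - 1);
}

/* Frame queues are bound to CPU channels round-robin at setup time, so
 * the queue index alone determines the CPU. num_cpus must be non-zero.
 */
static uint32_t demo_fq_to_cpu(uint32_t fq_index, uint32_t num_cpus)
{
	return fq_index % num_cpus;
}
```

Since there is no indirection table, a given hash always lands on the same queue and therefore the same CPU, which is what preserves ordering within a flow.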


* [PATCH v2 4/6] dpaa_eth: add NETIF_F_RXHASH
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

Set the skb hash when the FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     | 23 +++++++++++++++++++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |  1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |  9 +++++++--
 drivers/net/ethernet/freescale/fman/fman_port.c    | 11 +++++++++++
 drivers/net/ethernet/freescale/fman/fman_port.h    |  2 ++
 5 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6d89e74..73ca8d7 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -236,7 +236,7 @@ static int dpaa_netdev_init(struct net_device *net_dev,
 	net_dev->max_mtu = dpaa_get_max_mtu();
 
 	net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-				 NETIF_F_LLTX);
+				 NETIF_F_LLTX | NETIF_F_RXHASH);
 
 	net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA;
 	/* The kernels enables GSO automatically, if we declare NETIF_F_SG.
@@ -2237,12 +2237,13 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	dma_addr_t addr = qm_fd_addr(fd);
 	enum qm_fd_format fd_format;
 	struct net_device *net_dev;
-	u32 fd_status;
+	u32 fd_status, hash_offset;
 	struct dpaa_bp *dpaa_bp;
 	struct dpaa_priv *priv;
 	unsigned int skb_len;
 	struct sk_buff *skb;
 	int *count_ptr;
+	void *vaddr;
 
 	fd_status = be32_to_cpu(fd->status);
 	fd_format = qm_fd_get_format(fd);
@@ -2288,7 +2289,8 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
 
 	/* prefetch the first 64 bytes of the frame or the SGT start */
-	prefetch(phys_to_virt(addr) + qm_fd_get_offset(fd));
+	vaddr = phys_to_virt(addr);
+	prefetch(vaddr + qm_fd_get_offset(fd));
 
 	fd_format = qm_fd_get_format(fd);
 	/* The only FD types that we may receive are contig and S/G */
@@ -2309,6 +2311,18 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 
 	skb->protocol = eth_type_trans(skb, net_dev);
 
+	if (net_dev->features & NETIF_F_RXHASH && priv->keygen_in_use &&
+	    !fman_port_get_hash_result_offset(priv->mac_dev->port[RX],
+					      &hash_offset)) {
+		enum pkt_hash_types type;
+
+		/* if L4 exists, it was used in the hash generation */
+		type = be32_to_cpu(fd->status) & FM_FD_STAT_L4CV ?
+			PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3;
+		skb_set_hash(skb, be32_to_cpu(*(u32 *)(vaddr + hash_offset)),
+			     type);
+	}
+
 	skb_len = skb->len;
 
 	if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
@@ -2774,6 +2788,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	if (err)
 		goto init_ports_failed;
 
+	/* Rx traffic distribution based on keygen hashing defaults to on */
+	priv->keygen_in_use = true;
+
 	priv->percpu_priv = devm_alloc_percpu(dev, *priv->percpu_priv);
 	if (!priv->percpu_priv) {
 		dev_err(dev, "devm_alloc_percpu() failed\n");
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 496a12c..bd94220 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -159,6 +159,7 @@ struct dpaa_priv {
 	struct list_head dpaa_fq_list;
 
 	u8 num_tc;
+	bool keygen_in_use;
 	u32 msg_enable;	/* net_device message level */
 
 	struct {
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 965f652..faea674 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -402,6 +402,8 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
 static int dpaa_get_hash_opts(struct net_device *dev,
 			      struct ethtool_rxnfc *cmd)
 {
+	struct dpaa_priv *priv = netdev_priv(dev);
+
 	cmd->data = 0;
 
 	switch (cmd->flow_type) {
@@ -409,7 +411,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
 	case TCP_V6_FLOW:
 	case UDP_V4_FLOW:
 	case UDP_V6_FLOW:
-		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+		if (priv->keygen_in_use)
+			cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
 		/* Fall through */
 	case IPV4_FLOW:
 	case IPV6_FLOW:
@@ -421,7 +424,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
 	case AH_V6_FLOW:
 	case ESP_V4_FLOW:
 	case ESP_V6_FLOW:
-		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+		if (priv->keygen_in_use)
+			cmd->data |= RXH_IP_SRC | RXH_IP_DST;
 		break;
 	default:
 		cmd->data = 0;
@@ -458,6 +462,7 @@ static void dpaa_set_hash(struct net_device *net_dev, bool enable)
 	rxport = mac_dev->port[0];
 
 	fman_port_use_kg_hash(rxport, enable);
+	priv->keygen_in_use = enable;
 }
 
 static int dpaa_set_hash_opts(struct net_device *dev,
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index b0ad9c4..451bae7 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1720,6 +1720,17 @@ u32 fman_port_get_qman_channel_id(struct fman_port *port)
 }
 EXPORT_SYMBOL(fman_port_get_qman_channel_id);
 
+int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset)
+{
+	if (port->buffer_offsets.hash_result_offset == ILLEGAL_BASE)
+		return -EINVAL;
+
+	*offset = port->buffer_offsets.hash_result_offset;
+
+	return 0;
+}
+EXPORT_SYMBOL(fman_port_get_hash_result_offset);
+
 static int fman_port_probe(struct platform_device *of_dev)
 {
 	struct fman_port *port;
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 5a99611..e86ca6a 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -151,6 +151,8 @@ int fman_port_enable(struct fman_port *port);
 
 u32 fman_port_get_qman_channel_id(struct fman_port *port);
 
+int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset);
+
 struct fman_port *fman_port_bind(struct device *dev);
 
 #endif /* __FMAN_PORT_H */
-- 
2.1.0
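
The two steps in the Rx path of this patch, reading a big-endian hash out of the frame buffer and classifying it as L3 or L4, can be modelled in user space as below. The names and the status bit value are hypothetical. Unlike the patch, which dereferences a cast pointer, this sketch copies the bytes with memcpy() so that it makes no assumption about the alignment of the hash offset.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl() */

enum demo_hash_kind { DEMO_HASH_L3, DEMO_HASH_L4 };

/* Hypothetical status bit meaning "an L4 header was found and checked". */
#define DEMO_STAT_L4 0x1u

/* Read a 32-bit big-endian hash stored `hash_offset` bytes into `buf`. */
static uint32_t demo_read_hash(const uint8_t *buf, size_t hash_offset)
{
	uint32_t be;

	memcpy(&be, buf + hash_offset, sizeof(be));	/* alignment-safe copy */
	return ntohl(be);
}

/* If an L4 header was present, it took part in the hash computation. */
static enum demo_hash_kind demo_pick_hash_kind(uint32_t status)
{
	return (status & DEMO_STAT_L4) ? DEMO_HASH_L4 : DEMO_HASH_L3;
}
```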


* [PATCH v2 3/6] dpaa_eth: enable Rx hashing control
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

Allow ethtool control of the Rx flow hashing. RSS is enabled by
default; this allows turning it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 113 +++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index aad825088..965f652 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -399,6 +399,117 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
 	memcpy(strings, dpaa_stats_global, size);
 }
 
+static int dpaa_get_hash_opts(struct net_device *dev,
+			      struct ethtool_rxnfc *cmd)
+{
+	cmd->data = 0;
+
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+		/* Fall through */
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V4_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V4_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V4_FLOW:
+	case ESP_V6_FLOW:
+		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+		break;
+	default:
+		cmd->data = 0;
+		break;
+	}
+
+	return 0;
+}
+
+static int dpaa_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
+			  u32 *unused)
+{
+	int ret = -EOPNOTSUPP;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_GRXFH:
+		ret = dpaa_get_hash_opts(dev, cmd);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static void dpaa_set_hash(struct net_device *net_dev, bool enable)
+{
+	struct mac_device *mac_dev;
+	struct fman_port *rxport;
+	struct dpaa_priv *priv;
+
+	priv = netdev_priv(net_dev);
+	mac_dev = priv->mac_dev;
+	rxport = mac_dev->port[0];
+
+	fman_port_use_kg_hash(rxport, enable);
+}
+
+static int dpaa_set_hash_opts(struct net_device *dev,
+			      struct ethtool_rxnfc *nfc)
+{
+	int ret = -EINVAL;
+
+	/* we support hashing on IPv4/v6 src/dest IP and L4 src/dest port */
+	if (nfc->data &
+	    ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3))
+		return -EINVAL;
+
+	switch (nfc->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V4_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V4_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V4_FLOW:
+	case ESP_V6_FLOW:
+		dpaa_set_hash(dev, !!nfc->data);
+		ret = 0;
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static int dpaa_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+	int ret = -EOPNOTSUPP;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_SRXFH:
+		ret = dpaa_set_hash_opts(dev, cmd);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
 const struct ethtool_ops dpaa_ethtool_ops = {
 	.get_drvinfo = dpaa_get_drvinfo,
 	.get_msglevel = dpaa_get_msglevel,
@@ -412,4 +523,6 @@ const struct ethtool_ops dpaa_ethtool_ops = {
 	.get_strings = dpaa_get_strings,
 	.get_link_ksettings = dpaa_get_link_ksettings,
 	.set_link_ksettings = dpaa_set_link_ksettings,
+	.get_rxnfc = dpaa_get_rxnfc,
+	.set_rxnfc = dpaa_set_rxnfc,
 };
-- 
2.1.0
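
The on/off decision made in dpaa_set_hash_opts() can be reduced to a small user-space model: reject any field the hardware scheme does not cover, then treat a non-empty field mask as "enable" and an empty one as "disable". The flag values and names below are illustrative stand-ins, not the real RXH_* constants.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the hash field flags (values are made up). */
#define DEMO_RXH_IP_SRC		(1u << 0)
#define DEMO_RXH_IP_DST		(1u << 1)
#define DEMO_RXH_L4_B_0_1	(1u << 2)
#define DEMO_RXH_L4_B_2_3	(1u << 3)
#define DEMO_RXH_VLAN		(1u << 4)	/* example of an unsupported field */

#define DEMO_RXH_SUPPORTED	(DEMO_RXH_IP_SRC | DEMO_RXH_IP_DST | \
				 DEMO_RXH_L4_B_0_1 | DEMO_RXH_L4_B_2_3)

/* Returns 0 and sets *enable, or -EINVAL if an unsupported field is set. */
static int demo_set_hash_opts(uint32_t data, bool *enable)
{
	if (data & ~DEMO_RXH_SUPPORTED)
		return -EINVAL;

	*enable = !!data;	/* any supported field turns hashing on */
	return 0;
}
```

In this model the choice of fields does not matter beyond being supported: requesting any subset enables the same fixed hash, and an empty mask disables it.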


* [PATCH v2 2/6] dpaa_eth: use multiple Rx frame queues
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     | 50 +++++++++++++++++++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |  1 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |  3 ++
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index c7fa285..6d89e74 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -158,7 +158,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 #define DPAA_RX_PRIV_DATA_SIZE	(u16)(DPAA_TX_PRIV_DATA_SIZE + \
 					dpaa_rx_extra_headroom)
 
-#define DPAA_ETH_RX_QUEUES	128
+#define DPAA_ETH_PCD_RXQ_NUM	128
 
 #define DPAA_ENQUEUE_RETRIES	100000
 
@@ -169,6 +169,7 @@ struct fm_port_fqs {
 	struct dpaa_fq *tx_errq;
 	struct dpaa_fq *rx_defq;
 	struct dpaa_fq *rx_errq;
+	struct dpaa_fq *rx_pcdq;
 };
 
 /* All the dpa bps in use at any moment */
@@ -628,6 +629,7 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq, int idx)
 		fq->wq = 5;
 		break;
 	case FQ_TYPE_RX_DEFAULT:
+	case FQ_TYPE_RX_PCD:
 		fq->wq = 6;
 		break;
 	case FQ_TYPE_TX:
@@ -688,6 +690,7 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list,
 			      struct fm_port_fqs *port_fqs)
 {
 	struct dpaa_fq *dpaa_fq;
+	u32 fq_base, fq_base_aligned, i;
 
 	dpaa_fq = dpaa_fq_alloc(dev, 0, 1, list, FQ_TYPE_RX_ERROR);
 	if (!dpaa_fq)
@@ -701,6 +704,26 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list,
 
 	port_fqs->rx_defq = &dpaa_fq[0];
 
+	/* the PCD FQIDs range needs to be aligned for correct operation */
+	if (qman_alloc_fqid_range(&fq_base, 2 * DPAA_ETH_PCD_RXQ_NUM))
+		goto fq_alloc_failed;
+
+	fq_base_aligned = ALIGN(fq_base, DPAA_ETH_PCD_RXQ_NUM);
+
+	for (i = fq_base; i < fq_base_aligned; i++)
+		qman_release_fqid(i);
+
+	for (i = fq_base_aligned + DPAA_ETH_PCD_RXQ_NUM;
+	     i < (fq_base + 2 * DPAA_ETH_PCD_RXQ_NUM); i++)
+		qman_release_fqid(i);
+
+	dpaa_fq = dpaa_fq_alloc(dev, fq_base_aligned, DPAA_ETH_PCD_RXQ_NUM,
+				list, FQ_TYPE_RX_PCD);
+	if (!dpaa_fq)
+		goto fq_alloc_failed;
+
+	port_fqs->rx_pcdq = &dpaa_fq[0];
+
 	if (!dpaa_fq_alloc(dev, 0, DPAA_ETH_TXQ_NUM, list, FQ_TYPE_TX_CONF_MQ))
 		goto fq_alloc_failed;
 
@@ -870,13 +893,14 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
 			  const struct dpaa_fq_cbs *fq_cbs,
 			  struct fman_port *tx_port)
 {
-	int egress_cnt = 0, conf_cnt = 0, num_portals = 0, cpu;
+	int egress_cnt = 0, conf_cnt = 0, num_portals = 0, portal_cnt = 0, cpu;
 	const cpumask_t *affine_cpus = qman_affine_cpus();
-	u16 portals[NR_CPUS];
+	u16 channels[NR_CPUS];
 	struct dpaa_fq *fq;
 
 	for_each_cpu(cpu, affine_cpus)
-		portals[num_portals++] = qman_affine_channel(cpu);
+		channels[num_portals++] = qman_affine_channel(cpu);
+
 	if (num_portals == 0)
 		dev_err(priv->net_dev->dev.parent,
 			"No Qman software (affine) channels found");
@@ -890,6 +914,12 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
 		case FQ_TYPE_RX_ERROR:
 			dpaa_setup_ingress(priv, fq, &fq_cbs->rx_errq);
 			break;
+		case FQ_TYPE_RX_PCD:
+			if (!num_portals)
+				continue;
+			dpaa_setup_ingress(priv, fq, &fq_cbs->rx_defq);
+			fq->channel = channels[portal_cnt++ % num_portals];
+			break;
 		case FQ_TYPE_TX:
 			dpaa_setup_egress(priv, fq, tx_port,
 					  &fq_cbs->egress_ern);
@@ -1039,7 +1069,8 @@ static int dpaa_fq_init(struct dpaa_fq *dpaa_fq, bool td_enable)
 		/* Put all the ingress queues in our "ingress CGR". */
 		if (priv->use_ingress_cgr &&
 		    (dpaa_fq->fq_type == FQ_TYPE_RX_DEFAULT ||
-		     dpaa_fq->fq_type == FQ_TYPE_RX_ERROR)) {
+		     dpaa_fq->fq_type == FQ_TYPE_RX_ERROR ||
+		     dpaa_fq->fq_type == FQ_TYPE_RX_PCD)) {
 			initfq.we_mask |= cpu_to_be16(QM_INITFQ_WE_CGID);
 			initfq.fqd.fq_ctrl |= cpu_to_be16(QM_FQCTRL_CGE);
 			initfq.fqd.cgid = (u8)priv->ingress_cgr.cgrid;
@@ -1170,7 +1201,7 @@ static int dpaa_eth_init_tx_port(struct fman_port *port, struct dpaa_fq *errq,
 
 static int dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
 				 size_t count, struct dpaa_fq *errq,
-				 struct dpaa_fq *defq,
+				 struct dpaa_fq *defq, struct dpaa_fq *pcdq,
 				 struct dpaa_buffer_layout *buf_layout)
 {
 	struct fman_buffer_prefix_content buf_prefix_content;
@@ -1190,6 +1221,10 @@ static int dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
 	rx_p = &params.specific_params.rx_params;
 	rx_p->err_fqid = errq->fqid;
 	rx_p->dflt_fqid = defq->fqid;
+	if (pcdq) {
+		rx_p->pcd_base_fqid = pcdq->fqid;
+		rx_p->pcd_fqs_count = DPAA_ETH_PCD_RXQ_NUM;
+	}
 
 	count = min(ARRAY_SIZE(rx_p->ext_buf_pools.ext_buf_pool), count);
 	rx_p->ext_buf_pools.num_of_pools_used = (u8)count;
@@ -1234,7 +1269,8 @@ static int dpaa_eth_init_ports(struct mac_device *mac_dev,
 		return err;
 
 	err = dpaa_eth_init_rx_port(rxport, bps, count, port_fqs->rx_errq,
-				    port_fqs->rx_defq, &buf_layout[RX]);
+				    port_fqs->rx_defq, port_fqs->rx_pcdq,
+				    &buf_layout[RX]);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 9941a78..496a12c 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -52,6 +52,7 @@
 enum dpaa_fq_type {
 	FQ_TYPE_RX_DEFAULT = 1, /* Rx Default FQs */
 	FQ_TYPE_RX_ERROR,	/* Rx Error FQs */
+	FQ_TYPE_RX_PCD,		/* Rx Parse Classify Distribute FQs */
 	FQ_TYPE_TX,		/* "Real" Tx FQs */
 	FQ_TYPE_TX_CONFIRM,	/* Tx default Conf FQ (actually an Rx FQ) */
 	FQ_TYPE_TX_CONF_MQ,	/* Tx conf FQs (one for each Tx FQ) */
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
index ec75d1c..0d9b185 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
@@ -71,6 +71,9 @@ static ssize_t dpaa_eth_show_fqids(struct device *dev,
 		case FQ_TYPE_RX_ERROR:
 			str = "Rx error";
 			break;
+		case FQ_TYPE_RX_PCD:
+			str = "Rx PCD";
+			break;
 		case FQ_TYPE_TX_CONFIRM:
 			str = "Tx default confirmation";
 			break;
-- 
2.1.0
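
The aligned FQID allocation in dpaa_alloc_all_fqs() follows an over-allocate-and-trim pattern: request twice as many IDs as needed, keep the 128-aligned block inside that range, and release the rest. The sketch below only computes which IDs would be kept and released for a given starting ID; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PCD_RXQ_NUM 128u	/* block size; must be a power of two */

struct demo_fqid_plan {
	uint32_t base_aligned;	/* first FQID of the block that is kept */
	uint32_t lead_released;	/* IDs released before the kept block */
	uint32_t tail_released;	/* IDs released after the kept block */
};

/* Plan the trim for a range of 2 * 128 IDs starting at fq_base. */
static struct demo_fqid_plan demo_plan_fqids(uint32_t fq_base)
{
	struct demo_fqid_plan plan;
	uint32_t end = fq_base + 2 * DEMO_PCD_RXQ_NUM;

	/* round up to the next multiple of 128 */
	plan.base_aligned = (fq_base + DEMO_PCD_RXQ_NUM - 1) &
			    ~(DEMO_PCD_RXQ_NUM - 1);
	plan.lead_released = plan.base_aligned - fq_base;
	plan.tail_released = end - (plan.base_aligned + DEMO_PCD_RXQ_NUM);
	return plan;
}
```

Whatever the starting ID, exactly 128 IDs are kept and 128 are released, split between the head and the tail of the range.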


* [PATCH v2 1/6] fsl/fman: enable FMan Keygen
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503423066-15420-1-git-send-email-madalin.bucur@nxp.com>

From: Iordache Florinel-R70177 <florinel.iordache@nxp.com>

Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/fman/Makefile      |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c        |  26 +
 drivers/net/ethernet/freescale/fman/fman.h        |   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 ++++++++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_keygen.h |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c   |  40 +-
 drivers/net/ethernet/freescale/fman/fman_port.h   |   5 +
 7 files changed, 902 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 6049177..2c38119 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
 obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
 obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
 
-fsl_fman-objs	:= fman_muram.o fman.o fman_sp.o
+fsl_fman-objs	:= fman_muram.o fman.o fman_sp.o fman_keygen.o
 fsl_fman_port-objs := fman_port.o
 fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index e714b8f..491a5ac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -34,6 +34,7 @@
 
 #include "fman.h"
 #include "fman_muram.h"
+#include "fman_keygen.h"
 
 #include <linux/fsl/guts.h>
 #include <linux/slab.h>
@@ -56,6 +57,7 @@
 /* Modules registers offsets */
 #define BMI_OFFSET		0x00080000
 #define QMI_OFFSET		0x00080400
+#define KG_OFFSET		0x000C1000
 #define DMA_OFFSET		0x000C2000
 #define FPM_OFFSET		0x000C3000
 #define IMEM_OFFSET		0x000C4000
@@ -617,6 +619,7 @@ struct fman {
 	struct fman_qmi_regs __iomem *qmi_regs;
 	struct fman_dma_regs __iomem *dma_regs;
 	struct fman_hwp_regs __iomem *hwp_regs;
+	struct fman_kg_regs __iomem *kg_regs;
 	fman_exceptions_cb *exception_cb;
 	fman_bus_error_cb *bus_error_cb;
 	/* Spinlock for FMan use */
@@ -631,6 +634,8 @@ struct fman {
 	/* Fifo in MURAM */
 	unsigned long fifo_offset;
 	size_t fifo_size;
+	/* KeyGen handle */
+	struct fman_keygen *keygen;
 
 	u32 liodn_base[64];
 	u32 liodn_offset[64];
@@ -1811,6 +1816,7 @@ static int fman_config(struct fman *fman)
 	fman->qmi_regs = base_addr + QMI_OFFSET;
 	fman->dma_regs = base_addr + DMA_OFFSET;
 	fman->hwp_regs = base_addr + HWP_OFFSET;
+	fman->kg_regs = base_addr + KG_OFFSET;
 	fman->base_addr = base_addr;
 
 	spin_lock_init(&fman->spinlock);
@@ -2083,6 +2089,11 @@ static int fman_init(struct fman *fman)
 	/* Init HW Parser */
 	hwp_init(fman->hwp_regs);
 
+	/* Init KeyGen */
+	fman->keygen = keygen_init(fman->kg_regs);
+	if (!fman->keygen)
+		return -EINVAL;
+
 	err = enable(fman, cfg);
 	if (err != 0)
 		return err;
@@ -2562,6 +2573,21 @@ int fman_get_rx_extra_headroom(void)
 EXPORT_SYMBOL(fman_get_rx_extra_headroom);
 
 /**
+ * fman_get_keygen
+ *
+ * @fman:	A Pointer to FMan device
+ *
+ * Get the handle to KeyGen module part of FM driver
+ *
+ * Return: Handle to KeyGen
+ */
+struct fman_keygen *fman_get_keygen(struct fman *fman)
+{
+	return fman->keygen;
+}
+EXPORT_SYMBOL(fman_get_keygen);
+
+/**
  * fman_bind
  * @dev:	FMan OF device pointer
  *
diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h
index f53e147..291990e 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -320,6 +320,8 @@ u16 fman_get_max_frm(void);
 
 int fman_get_rx_extra_headroom(void);
 
+struct fman_keygen *fman_get_keygen(struct fman *fman);
+
 struct fman *fman_bind(struct device *dev);
 
 #endif /* __FM_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.c b/drivers/net/ethernet/freescale/fman/fman_keygen.c
new file mode 100644
index 0000000..f54da3c
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.c
@@ -0,0 +1,783 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of NXP nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NXP ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL NXP BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+
+#include "fman_keygen.h"
+
+/* Maximum number of HW Ports */
+#define FMAN_MAX_NUM_OF_HW_PORTS		64
+
+/* Maximum number of KeyGen Schemes */
+#define FM_KG_MAX_NUM_OF_SCHEMES		32
+
+/* Number of KeyGen Generic Extract Command Registers */
+#define FM_KG_NUM_OF_GENERIC_REGS		8
+
+/* Dummy port ID */
+#define DUMMY_PORT_ID				0
+
+/* Select Scheme Value Register */
+#define KG_SCH_DEF_USE_KGSE_DV_0		2
+#define KG_SCH_DEF_USE_KGSE_DV_1		3
+
+/* Registers Shifting values */
+#define FM_KG_KGAR_NUM_SHIFT			16
+#define KG_SCH_DEF_L4_PORT_SHIFT		8
+#define KG_SCH_DEF_IP_ADDR_SHIFT		18
+#define KG_SCH_HASH_CONFIG_SHIFT_SHIFT		24
+
+/* KeyGen Registers bit field masks: */
+
+/* Enable bit field mask for KeyGen General Configuration Register */
+#define FM_KG_KGGCR_EN				0x80000000
+
+/* KeyGen Global Registers bit field masks */
+#define FM_KG_KGAR_GO				0x80000000
+#define FM_KG_KGAR_READ				0x40000000
+#define FM_KG_KGAR_WRITE			0x00000000
+#define FM_KG_KGAR_SEL_SCHEME_ENTRY		0x00000000
+#define FM_KG_KGAR_SCM_WSEL_UPDATE_CNT		0x00008000
+
+#define FM_KG_KGAR_ERR				0x20000000
+#define FM_KG_KGAR_SEL_CLS_PLAN_ENTRY		0x01000000
+#define FM_KG_KGAR_SEL_PORT_ENTRY		0x02000000
+#define FM_KG_KGAR_SEL_PORT_WSEL_SP		0x00008000
+#define FM_KG_KGAR_SEL_PORT_WSEL_CPP		0x00004000
+
+/* Error events exceptions */
+#define FM_EX_KG_DOUBLE_ECC			0x80000000
+#define FM_EX_KG_KEYSIZE_OVERFLOW		0x40000000
+
+/* Scheme Registers bit field masks */
+#define KG_SCH_MODE_EN				0x80000000
+#define KG_SCH_VSP_NO_KSP_EN			0x80000000
+#define KG_SCH_HASH_CONFIG_SYM			0x40000000
+
+/* Known Protocol field codes */
+#define KG_SCH_KN_PORT_ID		0x80000000
+#define KG_SCH_KN_MACDST		0x40000000
+#define KG_SCH_KN_MACSRC		0x20000000
+#define KG_SCH_KN_TCI1			0x10000000
+#define KG_SCH_KN_TCI2			0x08000000
+#define KG_SCH_KN_ETYPE			0x04000000
+#define KG_SCH_KN_PPPSID		0x02000000
+#define KG_SCH_KN_PPPID			0x01000000
+#define KG_SCH_KN_MPLS1			0x00800000
+#define KG_SCH_KN_MPLS2			0x00400000
+#define KG_SCH_KN_MPLS_LAST		0x00200000
+#define KG_SCH_KN_IPSRC1		0x00100000
+#define KG_SCH_KN_IPDST1		0x00080000
+#define KG_SCH_KN_PTYPE1		0x00040000
+#define KG_SCH_KN_IPTOS_TC1		0x00020000
+#define KG_SCH_KN_IPV6FL1		0x00010000
+#define KG_SCH_KN_IPSRC2		0x00008000
+#define KG_SCH_KN_IPDST2		0x00004000
+#define KG_SCH_KN_PTYPE2		0x00002000
+#define KG_SCH_KN_IPTOS_TC2		0x00001000
+#define KG_SCH_KN_IPV6FL2		0x00000800
+#define KG_SCH_KN_GREPTYPE		0x00000400
+#define KG_SCH_KN_IPSEC_SPI		0x00000200
+#define KG_SCH_KN_IPSEC_NH		0x00000100
+#define KG_SCH_KN_IPPID			0x00000080
+#define KG_SCH_KN_L4PSRC		0x00000004
+#define KG_SCH_KN_L4PDST		0x00000002
+#define KG_SCH_KN_TFLG			0x00000001
+
+/* NIA values */
+#define NIA_ENG_BMI			0x00500000
+#define NIA_BMI_AC_ENQ_FRAME		0x00000002
+#define ENQUEUE_KG_DFLT_NIA		(NIA_ENG_BMI | NIA_BMI_AC_ENQ_FRAME)
+
+/* Hard-coded configuration:
+ * These values are used as hard-coded values for KeyGen configuration
+ * and they replace user selections for this hard-coded version
+ */
+
+/* Hash distribution shift */
+#define DEFAULT_HASH_DIST_FQID_SHIFT		0
+
+/* Hash shift */
+#define DEFAULT_HASH_SHIFT			0
+
+/* Symmetric hash usage:
+ * Warning:
+ * - the value for symmetric hash usage must be in accordance with hash
+ *	key defined below
+ * - according to tests performed, spreading does not work if symmetric
+ *	hash is set to true
+ * So symmetric hash functionality should always be disabled:
+ */
+#define DEFAULT_SYMMETRIC_HASH			false
+
+/* Hash Key extraction fields: */
+#define DEFAULT_HASH_KEY_EXTRACT_FIELDS		\
+	(KG_SCH_KN_IPSRC1 | KG_SCH_KN_IPDST1 | \
+	    KG_SCH_KN_L4PSRC | KG_SCH_KN_L4PDST)
+
+/* Default values to be used as hash key in case IPv4 or L4 (TCP, UDP)
+ * don't exist in the frame
+ */
+/* Default IPv4 address */
+#define DEFAULT_HASH_KEY_IPv4_ADDR		0x0A0A0A0A
+/* Default L4 port */
+#define DEFAULT_HASH_KEY_L4_PORT		0x0B0B0B0B
+
+/* KeyGen Memory Mapped Registers: */
+
+/* Scheme Configuration RAM Registers */
+struct fman_kg_scheme_regs {
+	u32 kgse_mode;		/* 0x100: MODE */
+	u32 kgse_ekfc;		/* 0x104: Extract Known Fields Command */
+	u32 kgse_ekdv;		/* 0x108: Extract Known Default Value */
+	u32 kgse_bmch;		/* 0x10C: Bit Mask Command High */
+	u32 kgse_bmcl;		/* 0x110: Bit Mask Command Low */
+	u32 kgse_fqb;		/* 0x114: Frame Queue Base */
+	u32 kgse_hc;		/* 0x118: Hash Command */
+	u32 kgse_ppc;		/* 0x11C: Policer Profile Command */
+	u32 kgse_gec[FM_KG_NUM_OF_GENERIC_REGS];
+			/* 0x120: Generic Extract Command */
+	u32 kgse_spc;
+		/* 0x140: KeyGen Scheme Entry Statistic Packet Counter */
+	u32 kgse_dv0;	/* 0x144: KeyGen Scheme Entry Default Value 0 */
+	u32 kgse_dv1;	/* 0x148: KeyGen Scheme Entry Default Value 1 */
+	u32 kgse_ccbs;
+		/* 0x14C: KeyGen Scheme Entry Coarse Classification Bit */
+	u32 kgse_mv;	/* 0x150: KeyGen Scheme Entry Match vector */
+	u32 kgse_om;	/* 0x154: KeyGen Scheme Entry Operation Mode bits */
+	u32 kgse_vsp;
+		/* 0x158: KeyGen Scheme Entry Virtual Storage Profile */
+};
+
+/* Port Partition Configuration Registers */
+struct fman_kg_pe_regs {
+	u32 fmkg_pe_sp;		/* 0x100: KeyGen Port entry Scheme Partition */
+	u32 fmkg_pe_cpp;
+		/* 0x104: KeyGen Port Entry Classification Plan Partition */
+};
+
+/* General Configuration and Status Registers
+ * Global Statistic Counters
+ * KeyGen Global Registers
+ */
+struct fman_kg_regs {
+	u32 fmkg_gcr;	/* 0x000: KeyGen General Configuration Register */
+	u32 res004;	/* 0x004: Reserved */
+	u32 res008;	/* 0x008: Reserved */
+	u32 fmkg_eer;	/* 0x00C: KeyGen Error Event Register */
+	u32 fmkg_eeer;	/* 0x010: KeyGen Error Event Enable Register */
+	u32 res014;	/* 0x014: Reserved */
+	u32 res018;	/* 0x018: Reserved */
+	u32 fmkg_seer;	/* 0x01C: KeyGen Scheme Error Event Register */
+	u32 fmkg_seeer;	/* 0x020: KeyGen Scheme Error Event Enable Register */
+	u32 fmkg_gsr;	/* 0x024: KeyGen Global Status Register */
+	u32 fmkg_tpc;	/* 0x028: Total Packet Counter Register */
+	u32 fmkg_serc;	/* 0x02C: Soft Error Capture Register */
+	u32 res030[4];	/* 0x030: Reserved */
+	u32 fmkg_fdor;	/* 0x040: Frame Data Offset Register */
+	u32 fmkg_gdv0r;	/* 0x044: Global Default Value Register 0 */
+	u32 fmkg_gdv1r;	/* 0x048: Global Default Value Register 1 */
+	u32 res04c[6];	/* 0x04C: Reserved */
+	u32 fmkg_feer;	/* 0x064: Force Error Event Register */
+	u32 res068[38];	/* 0x068: Reserved */
+	union {
+		u32 fmkg_indirect[63];	/* 0x100: Indirect Access Registers */
+		struct fman_kg_scheme_regs fmkg_sch; /* Scheme Registers */
+		struct fman_kg_pe_regs fmkg_pe; /* Port Partition Registers */
+	};
+	u32 fmkg_ar;	/* 0x1FC: KeyGen Action Register */
+};
+
+/* KeyGen Scheme data */
+struct keygen_scheme {
+	bool used;	/* Specifies if this scheme is used */
+	u8 hw_port_id;
+		/* Hardware port ID
+		 * schemes sharing between multiple ports is not
+		 * currently supported
+		 * so we have only one port id bound to a scheme
+		 */
+	u32 base_fqid;
+		/* Base FQID:
+		 * Must be between 1 and 2^24-1
+		 * If hash is used and an even distribution is
+		 * expected according to hash_fqid_count,
+		 * base_fqid must be aligned to hash_fqid_count
+		 */
+	u32 hash_fqid_count;
+		/* FQ range for hash distribution:
+		 * Must be a power of 2
+		 * Represents the range of queues for spreading
+		 */
+	bool use_hashing;	/* Usage of Hashing and spreading over FQ */
+	bool symmetric_hash;	/* Symmetric Hash option usage */
+	u8 hashShift;
+		/* Hash result right shift.
+		 * Selects 24 bits out of the 64-bit hash result.
+		 * 0 means using the 24 LSB's, otherwise
+		 * use the 24 LSB's after shifting right
+		 */
+	u32 match_vector;	/* Match Vector */
+};
+
+/* KeyGen driver data */
+struct fman_keygen {
+	struct keygen_scheme schemes[FM_KG_MAX_NUM_OF_SCHEMES];
+				/* Array of schemes */
+	struct fman_kg_regs __iomem *keygen_regs;	/* KeyGen registers */
+};
+
+/* keygen_write_ar_wait
+ *
+ * Write Action Register with specified value, wait for GO bit field to be
+ * idle and then read the error
+ *
+ * regs: KeyGen registers
+ * fmkg_ar: Action Register value
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_write_ar_wait(struct fman_kg_regs __iomem *regs, u32 fmkg_ar)
+{
+	iowrite32be(fmkg_ar, &regs->fmkg_ar);
+
+	/* Wait for GO bit field to be idle */
+	while (fmkg_ar & FM_KG_KGAR_GO)
+		fmkg_ar = ioread32be(&regs->fmkg_ar);
+
+	if (fmkg_ar & FM_KG_KGAR_ERR)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* build_ar_scheme
+ *
+ * Build Action Register value for scheme settings
+ *
+ * scheme_id: Scheme ID
+ * update_counter: update scheme counter
+ * write: true for action to write the scheme or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_scheme(u8 scheme_id, bool update_counter, bool write)
+{
+	u32 rw = (u32)(write ? FM_KG_KGAR_WRITE : FM_KG_KGAR_READ);
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_SCHEME_ENTRY |
+			DUMMY_PORT_ID |
+			((u32)scheme_id << FM_KG_KGAR_NUM_SHIFT) |
+			(update_counter ? FM_KG_KGAR_SCM_WSEL_UPDATE_CNT : 0));
+}
+
+/* build_ar_bind_scheme
+ *
+ * Build Action Register value for port binding to schemes
+ *
+ * hwport_id: HW Port ID
+ * write: true for action to write the bind or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_bind_scheme(u8 hwport_id, bool write)
+{
+	u32 rw = write ? (u32)FM_KG_KGAR_WRITE : (u32)FM_KG_KGAR_READ;
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_PORT_ENTRY |
+			hwport_id |
+			FM_KG_KGAR_SEL_PORT_WSEL_SP);
+}
+
+/* keygen_write_sp
+ *
+ * Write Scheme Partition Register with specified value
+ *
+ * regs: KeyGen Registers
+ * sp: Scheme Partition register value
+ * add: true to add a scheme partition or false to clear
+ *
+ * Return: none
+ */
+static void keygen_write_sp(struct fman_kg_regs __iomem *regs, u32 sp, bool add)
+{
+	u32 tmp;
+
+	tmp = ioread32be(&regs->fmkg_pe.fmkg_pe_sp);
+
+	if (add)
+		tmp |= sp;
+	else
+		tmp &= ~sp;
+
+	iowrite32be(tmp, &regs->fmkg_pe.fmkg_pe_sp);
+}
+
+/* build_ar_bind_cls_plan
+ *
+ * Build Action Register value for Classification Plan
+ *
+ * hwport_id: HW Port ID
+ * write: true for action to write the CP or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_bind_cls_plan(u8 hwport_id, bool write)
+{
+	u32 rw = write ? (u32)FM_KG_KGAR_WRITE : (u32)FM_KG_KGAR_READ;
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_PORT_ENTRY |
+			hwport_id |
+			FM_KG_KGAR_SEL_PORT_WSEL_CPP);
+}
+
+/* keygen_write_cpp
+ *
+ * Write Classification Plan Partition Register with specified value
+ *
+ * regs: KeyGen Registers
+ * cpp: CPP register value
+ *
+ * Return: none
+ */
+static void keygen_write_cpp(struct fman_kg_regs __iomem *regs, u32 cpp)
+{
+	iowrite32be(cpp, &regs->fmkg_pe.fmkg_pe_cpp);
+}
+
+/* keygen_write_scheme
+ *
+ * Write all Schemes Registers with specified values
+ *
+ * regs: KeyGen Registers
+ * scheme_id: Scheme ID
+ * scheme_regs: Scheme registers values desired to be written
+ * update_counter: update scheme counter
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_write_scheme(struct fman_kg_regs __iomem *regs, u8 scheme_id,
+			       struct fman_kg_scheme_regs *scheme_regs,
+				bool update_counter)
+{
+	u32 ar_reg;
+	int err, i;
+
+	/* Write indirect scheme registers */
+	iowrite32be(scheme_regs->kgse_mode, &regs->fmkg_sch.kgse_mode);
+	iowrite32be(scheme_regs->kgse_ekfc, &regs->fmkg_sch.kgse_ekfc);
+	iowrite32be(scheme_regs->kgse_ekdv, &regs->fmkg_sch.kgse_ekdv);
+	iowrite32be(scheme_regs->kgse_bmch, &regs->fmkg_sch.kgse_bmch);
+	iowrite32be(scheme_regs->kgse_bmcl, &regs->fmkg_sch.kgse_bmcl);
+	iowrite32be(scheme_regs->kgse_fqb, &regs->fmkg_sch.kgse_fqb);
+	iowrite32be(scheme_regs->kgse_hc, &regs->fmkg_sch.kgse_hc);
+	iowrite32be(scheme_regs->kgse_ppc, &regs->fmkg_sch.kgse_ppc);
+	iowrite32be(scheme_regs->kgse_spc, &regs->fmkg_sch.kgse_spc);
+	iowrite32be(scheme_regs->kgse_dv0, &regs->fmkg_sch.kgse_dv0);
+	iowrite32be(scheme_regs->kgse_dv1, &regs->fmkg_sch.kgse_dv1);
+	iowrite32be(scheme_regs->kgse_ccbs, &regs->fmkg_sch.kgse_ccbs);
+	iowrite32be(scheme_regs->kgse_mv, &regs->fmkg_sch.kgse_mv);
+	iowrite32be(scheme_regs->kgse_om, &regs->fmkg_sch.kgse_om);
+	iowrite32be(scheme_regs->kgse_vsp, &regs->fmkg_sch.kgse_vsp);
+
+	for (i = 0 ; i < FM_KG_NUM_OF_GENERIC_REGS ; i++)
+		iowrite32be(scheme_regs->kgse_gec[i],
+			    &regs->fmkg_sch.kgse_gec[i]);
+
+	/* Write AR (Action register) */
+	ar_reg = build_ar_scheme(scheme_id, update_counter, true);
+	err = keygen_write_ar_wait(regs, ar_reg);
+	if (err != 0) {
+		pr_err("Writing Action Register failed\n");
+		return err;
+	}
+
+	return err;
+}
+
+/* get_free_scheme_id
+ *
+ * Find the first free scheme available to be used
+ *
+ * keygen: KeyGen handle
+ * scheme_id: pointer to scheme id
+ *
+ * Return: 0 on success, -EINVAL when there are no free schemes available
+ */
+static int get_free_scheme_id(struct fman_keygen *keygen, u8 *scheme_id)
+{
+	u8 i;
+
+	for (i = 0; i < FM_KG_MAX_NUM_OF_SCHEMES; i++)
+		if (!keygen->schemes[i].used) {
+			*scheme_id = i;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+/* get_scheme
+ *
+ * Provides the scheme for specified ID
+ *
+ * keygen: KeyGen handle
+ * scheme_id: Scheme ID
+ *
+ * Return: handle to required scheme
+ */
+static struct keygen_scheme *get_scheme(struct fman_keygen *keygen,
+					u8 scheme_id)
+{
+	if (scheme_id >= FM_KG_MAX_NUM_OF_SCHEMES)
+		return NULL;
+	return &keygen->schemes[scheme_id];
+}
+
+/* keygen_bind_port_to_schemes
+ *
+ * Bind the port to schemes
+ *
+ * keygen: KeyGen handle
+ * scheme_id: id of the scheme to bind to
+ * bind: true to bind the port or false to unbind it
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_bind_port_to_schemes(struct fman_keygen *keygen,
+				       u8 scheme_id,
+					bool bind)
+{
+	struct fman_kg_regs __iomem *keygen_regs = keygen->keygen_regs;
+	struct keygen_scheme *scheme;
+	u32 ar_reg;
+	u32 schemes_vector = 0;
+	int err;
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (!scheme->used) {
+		pr_err("Cannot bind port to an invalid scheme\n");
+		return -EINVAL;
+	}
+
+	schemes_vector |= 1U << (31 - scheme_id);
+
+	ar_reg = build_ar_bind_scheme(scheme->hw_port_id, false);
+	err = keygen_write_ar_wait(keygen_regs, ar_reg);
+	if (err != 0) {
+		pr_err("Reading Action Register failed\n");
+		return err;
+	}
+
+	keygen_write_sp(keygen_regs, schemes_vector, bind);
+
+	ar_reg = build_ar_bind_scheme(scheme->hw_port_id, true);
+	err = keygen_write_ar_wait(keygen_regs, ar_reg);
+	if (err != 0) {
+		pr_err("Writing Action Register failed\n");
+		return err;
+	}
+
+	return 0;
+}
+
+/* keygen_scheme_setup
+ *
+ * Setup the scheme according to required configuration
+ *
+ * keygen: KeyGen handle
+ * scheme_id: scheme ID
+ * enable: true to enable scheme or false to disable it
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_scheme_setup(struct fman_keygen *keygen, u8 scheme_id,
+			       bool enable)
+{
+	struct fman_kg_regs __iomem *keygen_regs = keygen->keygen_regs;
+	struct fman_kg_scheme_regs scheme_regs;
+	struct keygen_scheme *scheme;
+	u32 tmp_reg;
+	int err;
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (enable && scheme->used) {
+		pr_err("The requested Scheme is already used\n");
+		return -EINVAL;
+	}
+
+	/* Clear scheme registers */
+	memset(&scheme_regs, 0, sizeof(struct fman_kg_scheme_regs));
+
+	/* Setup all scheme registers: */
+	tmp_reg = 0;
+
+	if (enable) {
+		/* Enable Scheme */
+		tmp_reg |= KG_SCH_MODE_EN;
+		/* Enqueue frame NIA */
+		tmp_reg |= ENQUEUE_KG_DFLT_NIA;
+	}
+
+	scheme_regs.kgse_mode = tmp_reg;
+
+	scheme_regs.kgse_mv = scheme->match_vector;
+
+	/* Scheme doesn't override StorageProfile:
+	 * valid only for DPAA_VERSION >= 11
+	 */
+	scheme_regs.kgse_vsp = KG_SCH_VSP_NO_KSP_EN;
+
+	/* Configure Hard-Coded Rx Hashing: */
+
+	if (scheme->use_hashing) {
+		/* configure kgse_ekfc */
+		scheme_regs.kgse_ekfc = DEFAULT_HASH_KEY_EXTRACT_FIELDS;
+
+		/* configure kgse_ekdv */
+		tmp_reg = 0;
+		tmp_reg |= (KG_SCH_DEF_USE_KGSE_DV_0 <<
+				KG_SCH_DEF_IP_ADDR_SHIFT);
+		tmp_reg |= (KG_SCH_DEF_USE_KGSE_DV_1 <<
+				KG_SCH_DEF_L4_PORT_SHIFT);
+		scheme_regs.kgse_ekdv = tmp_reg;
+
+		/* configure kgse_dv0 */
+		scheme_regs.kgse_dv0 = DEFAULT_HASH_KEY_IPv4_ADDR;
+		/* configure kgse_dv1 */
+		scheme_regs.kgse_dv1 = DEFAULT_HASH_KEY_L4_PORT;
+
+		/* configure kgse_hc  */
+		tmp_reg = 0;
+		tmp_reg |= ((scheme->hash_fqid_count - 1) <<
+				DEFAULT_HASH_DIST_FQID_SHIFT);
+		tmp_reg |= scheme->hashShift << KG_SCH_HASH_CONFIG_SHIFT_SHIFT;
+
+		if (scheme->symmetric_hash) {
+			/* Normally the extraction key should be checked
+			 * for compliance with symmetric hash.
+			 * Because extraction is hard-coded, we are sure
+			 * the key is symmetric
+			 */
+			tmp_reg |= KG_SCH_HASH_CONFIG_SYM;
+		}
+		scheme_regs.kgse_hc = tmp_reg;
+	} else {
+		scheme_regs.kgse_ekfc = 0;
+		scheme_regs.kgse_hc = 0;
+		scheme_regs.kgse_ekdv = 0;
+		scheme_regs.kgse_dv0 = 0;
+		scheme_regs.kgse_dv1 = 0;
+	}
+
+	/* configure kgse_fqb: Scheme FQID base */
+	tmp_reg = 0;
+	tmp_reg |= scheme->base_fqid;
+	scheme_regs.kgse_fqb = tmp_reg;
+
+	/* features not used by hard-coded configuration */
+	scheme_regs.kgse_bmch = 0;
+	scheme_regs.kgse_bmcl = 0;
+	scheme_regs.kgse_spc = 0;
+
+	/* Write scheme registers */
+	err = keygen_write_scheme(keygen_regs, scheme_id, &scheme_regs, true);
+	if (err != 0) {
+		pr_err("Writing scheme registers failed\n");
+		return err;
+	}
+
+	/* Update used field for Scheme */
+	scheme->used = enable;
+
+	return 0;
+}
+
+/* keygen_init
+ *
+ * KeyGen initialization:
+ * Initializes and enables KeyGen, allocates driver memory, sets up registers,
+ * clears port bindings, invalidates all schemes
+ *
+ * keygen_regs: KeyGen registers base address
+ *
+ * Return: Handle to KeyGen driver
+ */
+struct fman_keygen *keygen_init(struct fman_kg_regs __iomem *keygen_regs)
+{
+	struct fman_keygen *keygen;
+	u32 ar;
+	int i;
+
+	/* Allocate memory for KeyGen driver */
+	keygen = kzalloc(sizeof(*keygen), GFP_KERNEL);
+	if (!keygen)
+		return NULL;
+
+	keygen->keygen_regs = keygen_regs;
+
+	/* KeyGen initialization (for Master partition):
+	 * Setup KeyGen registers
+	 */
+	iowrite32be(ENQUEUE_KG_DFLT_NIA, &keygen_regs->fmkg_gcr);
+
+	iowrite32be(FM_EX_KG_DOUBLE_ECC | FM_EX_KG_KEYSIZE_OVERFLOW,
+		    &keygen_regs->fmkg_eer);
+
+	iowrite32be(0, &keygen_regs->fmkg_fdor);
+	iowrite32be(0, &keygen_regs->fmkg_gdv0r);
+	iowrite32be(0, &keygen_regs->fmkg_gdv1r);
+
+	/* Clear bindings of ports to schemes and classification plans
+	 * so that no port is bound to any scheme/classification plan
+	 */
+	for (i = 0; i < FMAN_MAX_NUM_OF_HW_PORTS; i++) {
+		/* Clear all pe sp schemes registers */
+		keygen_write_sp(keygen_regs, 0xffffffff, false);
+		ar = build_ar_bind_scheme(i, true);
+		keygen_write_ar_wait(keygen_regs, ar);
+
+		/* Clear all pe cpp classification plans registers */
+		keygen_write_cpp(keygen_regs, 0);
+		ar = build_ar_bind_cls_plan(i, true);
+		keygen_write_ar_wait(keygen_regs, ar);
+	}
+
+	/* Enable all scheme interrupts */
+	iowrite32be(0xFFFFFFFF, &keygen_regs->fmkg_seer);
+	iowrite32be(0xFFFFFFFF, &keygen_regs->fmkg_seeer);
+
+	/* Enable KeyGen */
+	iowrite32be(ioread32be(&keygen_regs->fmkg_gcr) | FM_KG_KGGCR_EN,
+		    &keygen_regs->fmkg_gcr);
+
+	return keygen;
+}
+EXPORT_SYMBOL(keygen_init);
+
+/* keygen_port_hashing_init
+ *
+ * Initializes a port for Rx Hashing with specified configuration parameters
+ *
+ * keygen: KeyGen handle
+ * hw_port_id: HW Port ID
+ * hash_base_fqid: Hashing Base FQID used for spreading
+ * hash_size: Hashing size
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+int keygen_port_hashing_init(struct fman_keygen *keygen, u8 hw_port_id,
+			     u32 hash_base_fqid, u32 hash_size)
+{
+	struct keygen_scheme *scheme;
+	u8 scheme_id;
+	int err;
+
+	/* Validate Scheme configuration parameters */
+	if (hash_base_fqid == 0 || (hash_base_fqid & ~0x00FFFFFF)) {
+		pr_err("Base FQID must be between 1 and 2^24-1\n");
+		return -EINVAL;
+	}
+	if (hash_size == 0 || (hash_size & (hash_size - 1)) != 0) {
+		pr_err("Hash size must be power of two\n");
+		return -EINVAL;
+	}
+
+	/* Find a free scheme */
+	err = get_free_scheme_id(keygen, &scheme_id);
+	if (err) {
+		pr_err("The maximum number of available Schemes has been exceeded\n");
+		return -EINVAL;
+	}
+
+	/* Create and configure Hard-Coded Scheme: */
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (scheme->used) {
+		pr_err("The requested Scheme is already used\n");
+		return -EINVAL;
+	}
+
+	/* Clear all scheme fields because the scheme may have been
+	 * previously used
+	 */
+	memset(scheme, 0, sizeof(struct keygen_scheme));
+
+	/* Setup scheme: */
+	scheme->hw_port_id = hw_port_id;
+	scheme->use_hashing = true;
+	scheme->base_fqid = hash_base_fqid;
+	scheme->hash_fqid_count = hash_size;
+	scheme->symmetric_hash = DEFAULT_SYMMETRIC_HASH;
+	scheme->hashShift = DEFAULT_HASH_SHIFT;
+
+	/* All Schemes in hard-coded configuration
+	 * are Indirect Schemes
+	 */
+	scheme->match_vector = 0;
+
+	err = keygen_scheme_setup(keygen, scheme_id, true);
+	if (err != 0) {
+		pr_err("Scheme setup failed\n");
+		return err;
+	}
+
+	/* Bind Rx port to Scheme */
+	err = keygen_bind_port_to_schemes(keygen, scheme_id, true);
+	if (err != 0) {
+		pr_err("Binding port to schemes failed\n");
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(keygen_port_hashing_init);
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.h b/drivers/net/ethernet/freescale/fman/fman_keygen.h
new file mode 100644
index 0000000..c4640de
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of NXP nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NXP ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL NXP BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __KEYGEN_H
+#define __KEYGEN_H
+
+#include <linux/io.h>
+
+struct fman_keygen;
+struct fman_kg_regs;
+
+struct fman_keygen *keygen_init(struct fman_kg_regs __iomem *keygen_regs);
+
+int keygen_port_hashing_init(struct fman_keygen *keygen, u8 hw_port_id,
+			     u32 hash_base_fqid, u32 hash_size);
+
+#endif /* __KEYGEN_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index 49bfa11..b0ad9c4 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -35,6 +35,7 @@
 #include "fman_port.h"
 #include "fman.h"
 #include "fman_sp.h"
+#include "fman_keygen.h"
 
 #include <linux/io.h>
 #include <linux/slab.h>
@@ -184,6 +185,7 @@
 #define NIA_ENG_QMI_ENQ					0x00540000
 #define NIA_ENG_QMI_DEQ					0x00580000
 #define NIA_ENG_HWP					0x00440000
+#define NIA_ENG_HWK					0x00480000
 #define NIA_BMI_AC_ENQ_FRAME				0x00000002
 #define NIA_BMI_AC_TX_RELEASE				0x000002C0
 #define NIA_BMI_AC_RELEASE				0x000000C0
@@ -394,6 +396,8 @@ struct fman_port_bpools {
 struct fman_port_cfg {
 	u32 dflt_fqid;
 	u32 err_fqid;
+	u32 pcd_base_fqid;
+	u32 pcd_fqs_count;
 	u8 deq_sp;
 	bool deq_high_priority;
 	enum fman_port_deq_type deq_type;
@@ -1271,6 +1275,10 @@ static void set_rx_dflt_cfg(struct fman_port *port,
 		port_params->specific_params.rx_params.err_fqid;
 	port->cfg->dflt_fqid =
 		port_params->specific_params.rx_params.dflt_fqid;
+	port->cfg->pcd_base_fqid =
+		port_params->specific_params.rx_params.pcd_base_fqid;
+	port->cfg->pcd_fqs_count =
+		port_params->specific_params.rx_params.pcd_fqs_count;
 }
 
 static void set_tx_dflt_cfg(struct fman_port *port,
@@ -1398,6 +1406,24 @@ int fman_port_config(struct fman_port *port, struct fman_port_params *params)
 EXPORT_SYMBOL(fman_port_config);
 
 /**
+ * fman_port_use_kg_hash
+ * port:        A pointer to a FM Port module.
+ * Sets the HW KeyGen or the BMI as HW Parser next engine, enabling
+ * or bypassing the KeyGen hashing of Rx traffic
+ */
+void fman_port_use_kg_hash(struct fman_port *port, bool enable)
+{
+	if (enable)
+		/* After the Parser frames go to KeyGen */
+		iowrite32be(NIA_ENG_HWK, &port->bmi_regs->rx.fmbm_rfpne);
+	else
+		/* After the Parser frames go to BMI */
+		iowrite32be(NIA_ENG_BMI | NIA_BMI_AC_ENQ_FRAME,
+			    &port->bmi_regs->rx.fmbm_rfpne);
+}
+EXPORT_SYMBOL(fman_port_use_kg_hash);
+
+/**
  * fman_port_init
  * port:	A pointer to a FM Port module.
  * Initializes the FM PORT module by defining the software structure and
@@ -1407,9 +1433,10 @@ EXPORT_SYMBOL(fman_port_config);
  */
 int fman_port_init(struct fman_port *port)
 {
+	struct fman_port_init_params params;
+	struct fman_keygen *keygen;
 	struct fman_port_cfg *cfg;
 	int err;
-	struct fman_port_init_params params;
 
 	if (is_init_done(port->cfg))
 		return -EINVAL;
@@ -1472,6 +1499,17 @@ int fman_port_init(struct fman_port *port)
 	if (err)
 		return err;
 
+	if (port->cfg->pcd_fqs_count) {
+		keygen = fman_get_keygen(port->dts_params.fman);
+		err = keygen_port_hashing_init(keygen, port->port_id,
+					       port->cfg->pcd_base_fqid,
+					       port->cfg->pcd_fqs_count);
+		if (err)
+			return err;
+
+		fman_port_use_kg_hash(port, true);
+	}
+
 	kfree(port->cfg);
 	port->cfg = NULL;
 
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 8ba9017..5a99611 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -100,6 +100,9 @@ struct fman_port;
 struct fman_port_rx_params {
 	u32 err_fqid;			/* Error Queue Id. */
 	u32 dflt_fqid;			/* Default Queue Id. */
+	u32 pcd_base_fqid;		/* PCD base Queue Id. */
+	u32 pcd_fqs_count;		/* Number of PCD FQs. */
+
 	/* Which external buffer pools are used
 	 * (up to FMAN_PORT_MAX_EXT_POOLS_NUM), and their sizes.
 	 */
@@ -134,6 +137,8 @@ struct fman_port_params {
 
 int fman_port_config(struct fman_port *port, struct fman_port_params *params);
 
+void fman_port_use_kg_hash(struct fman_port *port, bool enable);
+
 int fman_port_init(struct fman_port *port);
 
 int fman_port_cfg_buf_prefix_content(struct fman_port *port,
-- 
2.1.0

* [PATCH v2 0/6] Add RSS to DPAA 1.x Ethernet driver
From: Madalin Bucur @ 2017-08-22 17:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel

This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
A small fix is also included.

Change from v1: removed a C++ style comment
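
For readers unfamiliar with the spreading that the KeyGen patch in this
series sets up, here is a rough user-space sketch. It is not driver code.
It assumes that the hardware keeps the low bits of the (optionally
right-shifted) hash selected by the mask hash_fqid_count - 1 and adds them
to the base FQID; the driver only programs that mask and the base, so the
exact combination done in hardware is an assumption. spread_fqid() is a
hypothetical helper and the FQID values in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same power-of-two test that keygen_port_hashing_init() applies to
 * hash_size before accepting it.
 */
static bool is_power_of_two(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical model of the spreading step (assumed behavior):
 * shift the 64-bit hash right, keep the 24 LSBs, mask them with
 * fq_count - 1 and add the result to the base FQID.
 * hash_shift must be smaller than 64.
 */
static uint32_t spread_fqid(uint32_t base_fqid, uint32_t fq_count,
			    uint64_t hash, unsigned int hash_shift)
{
	assert(is_power_of_two(fq_count));

	/* The value the driver places in kgse_hc */
	uint32_t mask = fq_count - 1;
	uint32_t low24 = (uint32_t)(hash >> hash_shift) & 0x00FFFFFF;

	return base_fqid + (low24 & mask);
}
```

With a made-up base FQID of 0x800 and 128 queues, a hash of 0x12345 would
select FQID 0x845 under this model. A power-of-two queue count is what
makes a plain mask sufficient for an even spread.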

Iordache Florinel-R70177 (1):
  fsl/fman: enable FMan Keygen

Madalin Bucur (5):
  dpaa_eth: use multiple Rx frame queues
  dpaa_eth: enable Rx hashing control
  dpaa_eth: add NETIF_F_RXHASH
  Documentation: networking: add RSS information
  dpaa_eth: check allocation result

 Documentation/networking/dpaa.txt                  |  68 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     |  76 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 118 ++++
 drivers/net/ethernet/freescale/fman/Makefile       |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c         |  26 +
 drivers/net/ethernet/freescale/fman/fman.h         |   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c  | 783 +++++++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_keygen.h  |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c    |  51 +-
 drivers/net/ethernet/freescale/fman/fman_port.h    |   7 +
 12 files changed, 1171 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

-- 
2.1.0

* Re: [PATCH net] ethernet: xircom: small clean up in setup_xirc2ps_cs()
From: David Miller @ 2017-08-22 17:30 UTC (permalink / raw)
  To: dan.carpenter
  Cc: jarod, netdev, kernel-janitors, matvejchikov, bhe, akpm, mingo,
	linux-kernel
In-Reply-To: <20170821094730.ydm2xwsqbkyadybi@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Mon, 21 Aug 2017 12:47:30 +0300

> The get_options() function takes the whole ARRAY_SIZE().  It doesn't
> matter here because we don't use more than 7 elements.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied, thanks.

* Re: [PATCH v2 net] udp: on peeking bad csum, drop packets even if not at head
From: Willem de Bruijn @ 2017-08-22 17:29 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Eric Dumazet, David Miller, Willem de Bruijn, netdev
In-Reply-To: <1503420462.13544.9.camel@redhat.com>

On Tue, Aug 22, 2017 at 12:47 PM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-08-22 at 09:39 -0700, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> When peeking, if a bad csum is discovered, the skb is unlinked from
>> the queue with __sk_queue_drop_skb and the peek operation restarted.
>>
>> __sk_queue_drop_skb only drops packets that match the queue head.
>>
>> This fails if the skb was found after the head, using SO_PEEK_OFF
>> socket option. This causes an infinite loop.
>>
>> We MUST drop this problematic skb, and we can simply check if skb was
>> already removed by another thread, by looking at skb->next :
>>
>> This pointer is set to NULL by the  __skb_unlink() operation, that might
>> have happened only under the spinlock protection.
>>
>> Many thanks to syzkaller team (and particularly Dmitry Vyukov who
>> provided us nice C reproducers exhibiting the lockup) and Willem de
>> Bruijn who provided first version for this patch and a test program.
>>
>> Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Cc: Willem de Bruijn <willemb@google.com>
>> ---
>>  net/core/datagram.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>> index a21ca8dee5ea..8c2f4489ff8f 100644
>> --- a/net/core/datagram.c
>> +++ b/net/core/datagram.c
>> @@ -362,7 +362,7 @@ int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue,
>>       if (flags & MSG_PEEK) {
>>               err = -ENOENT;
>>               spin_lock_bh(&sk_queue->lock);
>> -             if (skb == skb_peek(sk_queue)) {
>> +             if (skb->next) {
>>                       __skb_unlink(skb, sk_queue);
>>                       refcount_dec(&skb->users);
>>                       if (destructor)
>>
>
> This version is really nice!

It is :) Thanks, Eric!

Acked-by: Willem de Bruijn <willemb@google.com>
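
For anyone reading along, here is a small userspace sketch of why the
skb->next test works where the head comparison did not. It is a toy
model and not kernel code: struct node, queue_unlink(), drop_if_head()
and drop_if_linked() are made-up names, and the real code runs under
the queue spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy circular doubly-linked queue, loosely modelled on sk_buff_head. */
struct node {
	struct node *next;
	struct node *prev;
};

static void queue_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void queue_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Like __skb_unlink(): clears the links, so next == NULL means "gone". */
static void queue_unlink(struct node *n)
{
	assert(n->next && n->prev);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = NULL;
	n->prev = NULL;
}

/* Old test: only succeeds when n is the first element of the queue. */
static bool drop_if_head(struct node *head, struct node *n)
{
	if (head->next != n)
		return false;
	queue_unlink(n);
	return true;
}

/* New test: succeeds for any element that is still linked. */
static bool drop_if_linked(struct node *n)
{
	if (!n->next)
		return false;
	queue_unlink(n);
	return true;
}
```

With two nodes queued, drop_if_head() refuses the second one (the
SO_PEEK_OFF case), while drop_if_linked() removes it once and then
reports that it is already gone.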

^ permalink raw reply

* Re: pull request (net): ipsec 2017-08-21
From: David Miller @ 2017-08-22 17:27 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev
In-Reply-To: <1503295683-19153-1-git-send-email-steffen.klassert@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 21 Aug 2017 08:07:59 +0200

> 1) Fix memleaks when ESP takes an error path.
> 
> 2) Fix null pointer dereference when creating a sub policy
>    that matches the same outer flow as main policy does.
>    From Koichiro Den.
> 
> 3) Fix possible out-of-bound access in xfrm_migrate.
>    This patch should go to the stable trees too.
>    From Vladis Dronov.
> 
> 4) ESP can return positive and negative error values,
>    so treat both cases as an error.
> 
> Please pull or let me know if there are problems.

Pulled, thanks!

^ permalink raw reply

* [PATCH net-next 2/2] selftests/net: Add a test to validate behavior of rx timestamps
From: Mike Maloney @ 2017-08-22 17:27 UTC (permalink / raw)
  To: netdev, davem; +Cc: Mike Maloney
In-Reply-To: <20170822172703.31703-1-maloneykernel@gmail.com>

From: Mike Maloney <maloney@google.com>

Validate the behavior of the combination of various timestamp socket
options, and ensure consistency across ip, udp, and tcp.
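
As a reading aid, the expectations encoded in the test_cases[] table
below can be summarised by a small function. This is only a sketch
derived from that table: struct rx_opts, struct rx_expect and
expected_on_loopback() are hypothetical names, and hardware timestamps
are left out because the test runs over loopback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/net_tstamp.h>

struct rx_opts {
	int so_timestamp;
	int so_timestampns;
	int so_timestamping;	/* SOF_TIMESTAMPING_* bitmask */
};

struct rx_expect {
	bool tstamp;	/* SCM_TIMESTAMP expected */
	bool tstampns;	/* SCM_TIMESTAMPNS expected */
	bool swtstamp;	/* SCM_TIMESTAMPING with ts[0] set expected */
};

static struct rx_expect expected_on_loopback(const struct rx_opts *o)
{
	const int sw = SOF_TIMESTAMPING_SOFTWARE |
		       SOF_TIMESTAMPING_RX_SOFTWARE;
	struct rx_expect e = { false, false, false };

	assert(o != NULL);

	/* The table expects SO_TIMESTAMPNS to win over SO_TIMESTAMP. */
	if (o->so_timestampns)
		e.tstampns = true;
	else if (o->so_timestamp)
		e.tstamp = true;

	/* Generation (RX_SOFTWARE) and reporting (SOFTWARE) must both be set. */
	e.swtstamp = (o->so_timestamping & sw) == sw;
	return e;
}
```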

Signed-off-by: Mike Maloney <maloney@google.com>
---
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile     |   4 +-
 .../networking/timestamping/rxtimestamp.c          | 379 +++++++++++++++++++++
 3 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/networking/timestamping/rxtimestamp.c

diff --git a/tools/testing/selftests/networking/timestamping/.gitignore b/tools/testing/selftests/networking/timestamping/.gitignore
index 9e69e982fb38..d9355035e746 100644
--- a/tools/testing/selftests/networking/timestamping/.gitignore
+++ b/tools/testing/selftests/networking/timestamping/.gitignore
@@ -1,3 +1,4 @@
 timestamping
+rxtimestamp
 txtimestamp
 hwtstamp_config
diff --git a/tools/testing/selftests/networking/timestamping/Makefile b/tools/testing/selftests/networking/timestamping/Makefile
index ccbb9edbbbb9..92fb8ee917c5 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -1,4 +1,6 @@
-TEST_PROGS := hwtstamp_config timestamping txtimestamp
+CFLAGS += -I../../../../../usr/include
+
+TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
 
 all: $(TEST_PROGS)
 
diff --git a/tools/testing/selftests/networking/timestamping/rxtimestamp.c b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
new file mode 100644
index 000000000000..6abcdf401d1a
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
@@ -0,0 +1,379 @@
+#include <errno.h>
+#include <error.h>
+#include <getopt.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include <asm/types.h>
+#include <linux/net_tstamp.h>
+#include <linux/errqueue.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+struct options {
+	int so_timestamp;
+	int so_timestampns;
+	int so_timestamping;
+};
+
+struct tstamps {
+	bool tstamp;
+	bool tstampns;
+	bool swtstamp;
+	bool hwtstamp;
+};
+
+struct socket_type {
+	char *friendly_name;
+	int type;
+	int protocol;
+	bool enabled;
+};
+
+struct test_case {
+	struct options sockopt;
+	struct tstamps expected;
+	bool enabled;
+};
+
+struct sof_flag {
+	int mask;
+	char *name;
+};
+
+static struct sof_flag sof_flags[] = {
+#define SOF_FLAG(f) { f, #f }
+	SOF_FLAG(SOF_TIMESTAMPING_SOFTWARE),
+	SOF_FLAG(SOF_TIMESTAMPING_RX_SOFTWARE),
+	SOF_FLAG(SOF_TIMESTAMPING_RX_HARDWARE),
+};
+
+static struct socket_type socket_types[] = {
+	{ "ip",		SOCK_DGRAM,	IPPROTO_IP },
+	{ "udp",	SOCK_DGRAM,	IPPROTO_UDP },
+	{ "tcp",	SOCK_STREAM,	IPPROTO_TCP },
+};
+
+static struct test_case test_cases[] = {
+	{ {}, {} },
+	{
+		{ so_timestamp: 1 },
+		{ tstamp: true }
+	},
+	{
+		{ so_timestampns: 1 },
+		{ tstampns: true }
+	},
+	{
+		{ so_timestamp: 1, so_timestampns: 1 },
+		{ tstampns: true }
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE },
+		{}
+	},
+	{
+		/* Loopback device does not support hw timestamps. */
+		{ so_timestamping: SOF_TIMESTAMPING_RX_HARDWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_SOFTWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE
+			| SOF_TIMESTAMPING_RX_HARDWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+			| SOF_TIMESTAMPING_RX_SOFTWARE },
+		{ swtstamp: true }
+	},
+	{
+		{ so_timestamp: 1, so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+			| SOF_TIMESTAMPING_RX_SOFTWARE },
+		{ tstamp: true, swtstamp: true }
+	},
+};
+
+static struct option long_options[] = {
+	{ "list_tests", no_argument, 0, 'l' },
+	{ "test_num", required_argument, 0, 'n' },
+	{ "op_size", required_argument, 0, 's' },
+	{ "tcp", no_argument, 0, 't' },
+	{ "udp", no_argument, 0, 'u' },
+	{ "ip", no_argument, 0, 'i' },
+};
+
+static int next_port = 19999;
+static int op_size = 10 * 1024;
+
+void print_test_case(struct test_case *t)
+{
+	int f = 0;
+
+	printf("sockopts {");
+	if (t->sockopt.so_timestamp)
+		printf(" SO_TIMESTAMP ");
+	if (t->sockopt.so_timestampns)
+		printf(" SO_TIMESTAMPNS ");
+	if (t->sockopt.so_timestamping) {
+		printf(" SO_TIMESTAMPING: {");
+		for (f = 0; f < ARRAY_SIZE(sof_flags); f++)
+			if (t->sockopt.so_timestamping & sof_flags[f].mask)
+				printf(" %s |", sof_flags[f].name);
+		printf("}");
+	}
+	printf("} expected cmsgs: {");
+	if (t->expected.tstamp)
+		printf(" SCM_TIMESTAMP ");
+	if (t->expected.tstampns)
+		printf(" SCM_TIMESTAMPNS ");
+	if (t->expected.swtstamp || t->expected.hwtstamp) {
+		printf(" SCM_TIMESTAMPING {");
+		if (t->expected.swtstamp)
+			printf("0");
+		if (t->expected.swtstamp && t->expected.hwtstamp)
+			printf(",");
+		if (t->expected.hwtstamp)
+			printf("2");
+		printf("}");
+	}
+	printf("}\n");
+}
+
+void do_send(int src)
+{
+	int r;
+	char *buf = malloc(op_size);
+
+	memset(buf, 'z', op_size);
+	r = write(src, buf, op_size);
+	if (r < 0)
+		error(1, errno, "Failed to sendmsg");
+
+	free(buf);
+}
+
+bool do_recv(int rcv, struct tstamps expected)
+{
+	const int CMSG_SIZE = 1024;
+
+	struct scm_timestamping *ts;
+	struct tstamps actual = {};
+	char cmsg_buf[CMSG_SIZE];
+	struct iovec recv_iov;
+	struct cmsghdr *cmsg;
+	bool failed = false;
+	struct msghdr hdr;
+	int flags = 0;
+	int r;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &recv_iov;
+	hdr.msg_iovlen = 1;
+	recv_iov.iov_base = malloc(op_size);
+	recv_iov.iov_len = op_size;
+
+	hdr.msg_control = cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	r = recvmsg(rcv, &hdr, flags);
+	if (r < 0)
+		error(1, errno, "Failed to recvmsg");
+	if (r != op_size)
+		error(1, 0, "Only received %d bytes of payload.", r);
+
+	if (hdr.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
+		error(1, 0, "Message was truncated.");
+
+	for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+		if (cmsg->cmsg_level != SOL_SOCKET)
+			error(1, 0, "Unexpected cmsg_level %d",
+			      cmsg->cmsg_level);
+		switch (cmsg->cmsg_type) {
+		case SCM_TIMESTAMP:
+			actual.tstamp = true;
+			break;
+		case SCM_TIMESTAMPNS:
+			actual.tstampns = true;
+			break;
+		case SCM_TIMESTAMPING:
+			ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+			actual.swtstamp = !!ts->ts[0].tv_sec;
+			actual.hwtstamp = !!ts->ts[2].tv_sec;
+			break;
+		default:
+			error(1, 0, "Unexpected cmsg_type %d", cmsg->cmsg_type);
+		}
+	}
+
+#define VALIDATE(field) \
+	do { \
+		if (expected.field != actual.field) { \
+			if (expected.field) \
+				error(0, 0, "Expected " #field " to be set."); \
+			else \
+				error(0, 0, \
+				      "Expected " #field " to not be set."); \
+			failed = true; \
+		} \
+	} while (0)
+
+	VALIDATE(tstamp);
+	VALIDATE(tstampns);
+	VALIDATE(swtstamp);
+	VALIDATE(hwtstamp);
+#undef VALIDATE
+
+	free(recv_iov.iov_base);
+
+	return failed;
+}
+
+void config_so_flags(int rcv, struct options o)
+{
+	int on = 1;
+
+	if (setsockopt(rcv, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
+		error(1, errno, "Failed to enable SO_REUSEADDR");
+
+	if (o.so_timestamp &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMP,
+		       &o.so_timestamp, sizeof(o.so_timestamp)) < 0)
+		error(1, errno, "Failed to enable SO_TIMESTAMP");
+
+	if (o.so_timestampns &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMPNS,
+		       &o.so_timestampns, sizeof(o.so_timestampns)) < 0)
+		error(1, errno, "Failed to enable SO_TIMESTAMPNS");
+
+	if (o.so_timestamping &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMPING,
+		       &o.so_timestamping, sizeof(o.so_timestamping)) < 0)
+		error(1, errno, "Failed to set SO_TIMESTAMPING");
+}
+
+bool run_test_case(struct socket_type s, struct test_case t)
+{
+	int port = (s.type == IPPROTO_IP) ? 0 : next_port++;
+	struct sockaddr_in addr;
+	bool failed = false;
+	int src, dst, rcv;
+
+	src = socket(AF_INET, s.type, s.protocol);
+	if (src < 0)
+		error(1, errno, "Failed to open src socket");
+
+	dst = socket(AF_INET, s.type, s.protocol);
+	if (dst < 0)
+		error(1, errno, "Failed to open dst socket");
+
+	memset(&addr, 0, sizeof(addr));
+	addr.sin_family = AF_INET;
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	addr.sin_port = htons(port);
+
+	if (bind(dst, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+		error(1, errno, "Failed to bind to port %d", port);
+
+	if (s.type == SOCK_STREAM && (listen(dst, 1) < 0))
+		error(1, errno, "Failed to listen");
+
+	if (connect(src, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+		error(1, errno, "Failed to connect");
+
+	if (s.type == SOCK_STREAM) {
+		rcv = accept(dst, NULL, NULL);
+		if (rcv < 0)
+			error(1, errno, "Failed to accept");
+		close(dst);
+	} else {
+		rcv = dst;
+	}
+
+	config_so_flags(rcv, t.sockopt);
+	usleep(20000); /* setsockopt for SO_TIMESTAMPING is asynchronous */
+	do_send(src);
+	failed = do_recv(rcv, t.expected);
+
+	close(rcv);
+	close(src);
+
+	return failed;
+}
+
+int main(int argc, char **argv)
+{
+	bool all_protocols = true;
+	bool all_tests = true;
+	int arg_index = 0;
+	int failures = 0;
+	int s, t;
+	int opt;
+
+	while ((opt = getopt_long(argc, argv, "", long_options,
+				  &arg_index)) != -1) {
+		switch (opt) {
+		case 'l':
+			for (t = 0; t < ARRAY_SIZE(test_cases); t++) {
+				printf("%d\t", t);
+				print_test_case(&test_cases[t]);
+			}
+			return 0;
+		case 'n':
+			t = atoi(optarg);
+			if (t < 0 || t >= (int)ARRAY_SIZE(test_cases))
+				error(1, 0, "Invalid test case: %d", t);
+			all_tests = false;
+			test_cases[t].enabled = true;
+			break;
+		case 's':
+			op_size = atoi(optarg);
+			break;
+		case 'u':
+			all_protocols = false;
+			socket_types[1].enabled = true;
+			break;
+		case 'i':
+			all_protocols = false;
+			socket_types[0].enabled = true;
+			break;
+		default:
+			error(1, 0, "Failed to parse parameters.");
+		}
+	}
+
+	for (s = 0; s < ARRAY_SIZE(socket_types); s++) {
+		if (!all_protocols && !socket_types[s].enabled)
+			continue;
+
+		printf("Testing %s...\n", socket_types[s].friendly_name);
+		for (t = 0; t < ARRAY_SIZE(test_cases); t++) {
+			if (!all_tests && !test_cases[t].enabled)
+				continue;
+
+			printf("Starting testcase %d...\n", t);
+			if (run_test_case(socket_types[s], test_cases[t])) {
+				failures++;
+				printf("FAILURE in test case ");
+				print_test_case(&test_cases[t]);
+			}
+		}
+	}
+	if (!failures)
+		printf("PASSED.\n");
+	return failures;
+}
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* [PATCH net-next 1/2] tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
From: Mike Maloney @ 2017-08-22 17:27 UTC (permalink / raw)
  To: netdev, davem; +Cc: Mike Maloney
In-Reply-To: <20170822172703.31703-1-maloneykernel@gmail.com>

From: Mike Maloney <maloney@google.com>

When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
timestamp corresponding to the highest sequence number data returned.

Previously the skb->tstamp is overwritten when a TCP packet is placed
in the out of order queue.  While the packet is in the ooo queue, save the
timestamp in the TCP_SKB_CB.  This space is shared with the gso_*
options which are only used on the tx path, and a previously unused 4
byte hole.

When skbs are coalesced either in the sk_receive_queue or the
out_of_order_queue always choose the timestamp of the appended skb to
maintain the invariant of returning the timestamp of the last byte in
the recvmsg buffer.
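
The coalescing rule can be illustrated with a toy userspace model. This
is a sketch under simplifying assumptions, not the kernel code: struct
seg and seg_coalesce() are invented for illustration and ignore the
separate stash used while an skb sits in the out of order queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct seg {
	uint32_t seq;		/* sequence number of the first byte */
	uint32_t end_seq;	/* one past the last byte */
	int64_t  tstamp_ns;	/* receive time, 0 if none */
	bool     has_rxtstamp;
};

/* Append 'from' to 'to' if they are contiguous. The merged segment takes
 * the timestamp of the appended data, so it always describes the last byte.
 */
static bool seg_coalesce(struct seg *to, const struct seg *from)
{
	assert(to && from);
	if (from->seq != to->end_seq)
		return false;
	to->end_seq = from->end_seq;
	if (from->has_rxtstamp) {
		to->has_rxtstamp = true;
		to->tstamp_ns = from->tstamp_ns;
	}
	return true;
}
```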

Signed-off-by: Mike Maloney <maloney@google.com>
---
 include/net/tcp.h    |  9 +++++++-
 net/ipv4/tcp.c       | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_input.c | 35 +++++++++++++++++++++++++----
 net/ipv4/tcp_ipv4.c  |  2 ++
 net/ipv6/tcp_ipv6.c  |  2 ++
 5 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index afdab3781425..f26d20e9760d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -774,6 +774,12 @@ struct tcp_skb_cb {
 			u16	tcp_gso_segs;
 			u16	tcp_gso_size;
 		};
+
+		/* Used to stash the receive timestamp while this skb is in the
+		 * out of order queue, as skb->tstamp is overwritten by the
+		 * rbnode.
+		 */
+		ktime_t		swtstamp;
 	};
 	__u8		tcp_flags;	/* TCP header flags. (tcp[13])	*/
 
@@ -790,7 +796,8 @@ struct tcp_skb_cb {
 	__u8		ip_dsfield;	/* IPv4 tos or IPv6 dsfield	*/
 	__u8		txstamp_ack:1,	/* Record TX timestamp for ack? */
 			eor:1,		/* Is skb MSG_EOR marked? */
-			unused:6;
+			has_rxtstamp:1,	/* SKB has a RX timestamp	*/
+			unused:5;
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
 		struct {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d25e3bcca66b..4c58c7b2d8ed 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -269,6 +269,7 @@
 #include <linux/err.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/errqueue.h>
 
 #include <net/icmp.h>
 #include <net/inet_common.h>
@@ -1695,6 +1696,59 @@ int tcp_peek_len(struct socket *sock)
 }
 EXPORT_SYMBOL(tcp_peek_len);
 
+static void tcp_update_recv_tstamps(struct sk_buff *skb,
+				    struct scm_timestamping *tss)
+{
+	if (skb->tstamp)
+		tss->ts[0] = ktime_to_timespec(skb->tstamp);
+	else
+		tss->ts[0] = (struct timespec) {0};
+
+	if (skb_hwtstamps(skb)->hwtstamp)
+		tss->ts[2] = ktime_to_timespec(skb_hwtstamps(skb)->hwtstamp);
+	else
+		tss->ts[2] = (struct timespec) {0};
+}
+
+/* Similar to __sock_recv_timestamp, but does not require an skb */
+void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
+			struct scm_timestamping *tss)
+{
+	struct timeval tv;
+	bool has_timestamping = false;
+
+	if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
+		if (sock_flag(sk, SOCK_RCVTSTAMP)) {
+			if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+					 sizeof(tss->ts[0]), &tss->ts[0]);
+			} else {
+				tv.tv_sec = tss->ts[0].tv_sec;
+				tv.tv_usec = tss->ts[0].tv_nsec / 1000;
+
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+					 sizeof(tv), &tv);
+			}
+		}
+
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE)
+			has_timestamping = true;
+		else
+			tss->ts[0] = (struct timespec) {0};
+	}
+
+	if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+			has_timestamping = true;
+		else
+			tss->ts[2] = (struct timespec) {0};
+	}
+
+	if (has_timestamping)
+		put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING,
+			 sizeof(*tss), tss);
+}
+
 /*
  *	This routine copies from a sock struct into the user buffer.
  *
@@ -1716,6 +1770,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	long timeo;
 	struct sk_buff *skb, *last;
 	u32 urg_hole = 0;
+	struct scm_timestamping tss;
+	bool has_tss = false;
 
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
@@ -1911,6 +1967,10 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		if (used + offset < skb->len)
 			continue;
 
+		if (TCP_SKB_CB(skb)->has_rxtstamp) {
+			tcp_update_recv_tstamps(skb, &tss);
+			has_tss = true;
+		}
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
 		if (!(flags & MSG_PEEK))
@@ -1929,6 +1989,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
 
+	if (has_tss)
+		tcp_recv_timestamp(msg, sk, &tss);
+
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ddc854728a60..66abcbf6f381 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4246,9 +4246,15 @@ static void tcp_sack_remove(struct tcp_sock *tp)
 	tp->rx_opt.num_sacks = num_sacks;
 }
 
+enum tcp_queue {
+	OOO_QUEUE,
+	RCV_QUEUE,
+};
+
 /**
  * tcp_try_coalesce - try to merge skb to prior one
  * @sk: socket
+ * @dest: destination queue
  * @to: prior buffer
  * @from: buffer to add in queue
  * @fragstolen: pointer to boolean
@@ -4260,6 +4266,7 @@ static void tcp_sack_remove(struct tcp_sock *tp)
  * Returns true if caller should free @from instead of queueing it
  */
 static bool tcp_try_coalesce(struct sock *sk,
+			     enum tcp_queue dest,
 			     struct sk_buff *to,
 			     struct sk_buff *from,
 			     bool *fragstolen)
@@ -4281,6 +4288,15 @@ static bool tcp_try_coalesce(struct sock *sk,
 	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
 	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
 	TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
+
+	if (TCP_SKB_CB(from)->has_rxtstamp) {
+		TCP_SKB_CB(to)->has_rxtstamp = true;
+		if (dest == OOO_QUEUE)
+			TCP_SKB_CB(to)->swtstamp = TCP_SKB_CB(from)->swtstamp;
+		else
+			to->tstamp = from->tstamp;
+	}
+
 	return true;
 }
 
@@ -4315,6 +4331,9 @@ static void tcp_ofo_queue(struct sock *sk)
 		}
 		p = rb_next(p);
 		rb_erase(&skb->rbnode, &tp->out_of_order_queue);
+		/* Replace tstamp which was stomped by rbnode */
+		if (TCP_SKB_CB(skb)->has_rxtstamp)
+			skb->tstamp = TCP_SKB_CB(skb)->swtstamp;
 
 		if (unlikely(!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))) {
 			SOCK_DEBUG(sk, "ofo packet was already received\n");
@@ -4326,7 +4345,8 @@ static void tcp_ofo_queue(struct sock *sk)
 			   TCP_SKB_CB(skb)->end_seq);
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
-		eaten = tail && tcp_try_coalesce(sk, tail, skb, &fragstolen);
+		eaten = tail && tcp_try_coalesce(sk, RCV_QUEUE,
+						 tail, skb, &fragstolen);
 		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
 		if (!eaten)
@@ -4380,6 +4400,10 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
+	/* Stash tstamp to avoid being stomped on by rbnode */
+	if (TCP_SKB_CB(skb)->has_rxtstamp)
+		TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
+
 	inet_csk_schedule_ack(sk);
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
@@ -4405,7 +4429,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	/* In the typical case, we are adding an skb to the end of the list.
 	 * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup.
 	 */
-	if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) {
+	if (tcp_try_coalesce(sk, OOO_QUEUE, tp->ooo_last_skb,
+			     skb, &fragstolen)) {
 coalesce_done:
 		tcp_grow_window(sk, skb);
 		kfree_skb_partial(skb, fragstolen);
@@ -4455,7 +4480,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 				__kfree_skb(skb1);
 				goto merge_right;
 			}
-		} else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) {
+		} else if (tcp_try_coalesce(sk, OOO_QUEUE, skb1,
+					    skb, &fragstolen)) {
 			goto coalesce_done;
 		}
 		p = &parent->rb_right;
@@ -4506,7 +4532,8 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
 
 	__skb_pull(skb, hdrlen);
 	eaten = (tail &&
-		 tcp_try_coalesce(sk, tail, skb, fragstolen)) ? 1 : 0;
+		 tcp_try_coalesce(sk, RCV_QUEUE, tail,
+				  skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5af8b809dfbc..a63486afa7a7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1637,6 +1637,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
 	TCP_SKB_CB(skb)->sacked	 = 0;
+	TCP_SKB_CB(skb)->has_rxtstamp =
+			skb->tstamp || skb_hwtstamps(skb)->hwtstamp;
 
 lookup:
 	sk = __inet_lookup_skb(&tcp_hashinfo, skb, __tcp_hdrlen(th), th->source,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d79a1af3252e..abba3bc2a3d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1394,6 +1394,8 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
 	TCP_SKB_CB(skb)->sacked = 0;
+	TCP_SKB_CB(skb)->has_rxtstamp =
+			skb->tstamp || skb_hwtstamps(skb)->hwtstamp;
 }
 
 static int tcp_v6_rcv(struct sk_buff *skb)
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* [PATCH net-next 0/2] tcp: Add software rx timestamp for TCP.
From: Mike Maloney @ 2017-08-22 17:27 UTC (permalink / raw)
  To: netdev, davem; +Cc: Mike Maloney

From: Mike Maloney <maloney@google.com>

Add software rx timestamps for TCP, and a test to ensure consistency of
behavior between IP, UDP, and TCP implementations.

Mike Maloney (2):
  tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
  selftests/net: Add a test to validate behavior of rx timestamps

 include/net/tcp.h                                  |   9 +-
 net/ipv4/tcp.c                                     |  63 ++++
 net/ipv4/tcp_input.c                               |  35 +-
 net/ipv4/tcp_ipv4.c                                |   2 +
 net/ipv6/tcp_ipv6.c                                |   2 +
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile     |   4 +-
 .../networking/timestamping/rxtimestamp.c          | 379 +++++++++++++++++++++
 8 files changed, 489 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/networking/timestamping/rxtimestamp.c

-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply

* Re: [PATCH RFC net] fsl/man: Inherit parent device and of_node
From: David Miller @ 2017-08-22 17:26 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, andrew, vivien.didelot, madalin.bucur, junote,
	igal.liberman
In-Reply-To: <20170819001255.20755-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 18 Aug 2017 17:12:55 -0700

> @@ -623,6 +623,8 @@ static struct platform_device *dpaa_eth_add_device(int fman_id,
>  		goto no_mem;
>  	}
>  
> +	pdev->dev.of_node = node;
> +	pdev->dev.parent = priv->dev;
>  	set_dma_ops(&pdev->dev, get_dma_ops(priv->dev));
>  
>  	ret = platform_device_add_data(pdev, &data, sizeof(data));

I guess since we allocate and manage the platform device here, we can
fumble around safely with its device node pointer and parent.

So this should be ok.

^ permalink raw reply

* Re: [PATCH net] ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
From: David Miller @ 2017-08-22 17:23 UTC (permalink / raw)
  To: sbrivio; +Cc: netdev, sd, hannes
In-Reply-To: <9597319ba96a966ee580bd04f36584b52dfa2c48.1503059705.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Fri, 18 Aug 2017 14:40:53 +0200

> A packet length of exactly IPV6_MAXPLEN is allowed, we should
> refuse parsing options only if the size is 64KiB or more.
> 
> While at it, remove one extra variable and one assignment which
> were also introduced by the commit that introduced the size
> check. Checking the sum 'offset + len' and only later adding
> 'len' to 'offset' doesn't provide any advantage over directly
> summing to 'offset' and checking it.
> 
> Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Applied, thanks.
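
For reference, the boundary condition being discussed can be sketched
in plain C. The constant has the same value as the kernel's
IPV6_MAXPLEN; advance_offset() and advance_offset_old() are
hypothetical helpers, and the "old" variant is only my reading of the
commit message, not a copy of the previous code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPV6_MAXPLEN 65535	/* largest valid IPv6 payload length */

/* Advance *offset past an extension header of 'len' bytes.
 * Returns false once the running offset exceeds IPV6_MAXPLEN.
 */
static bool advance_offset(unsigned int *offset, unsigned int len)
{
	assert(offset != NULL);
	*offset += len;
	return *offset <= IPV6_MAXPLEN;
}

/* Stricter check, for comparison: also rejects exactly IPV6_MAXPLEN. */
static bool advance_offset_old(unsigned int *offset, unsigned int len)
{
	assert(offset != NULL);
	if (len + *offset >= IPV6_MAXPLEN)
		return false;
	*offset += len;
	return true;
}
```

Starting from offset 40, a header that brings the total to exactly
65535 is accepted by the first helper and rejected by the second; both
reject 65536.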

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-22 17:19 UTC (permalink / raw)
  To: Koichiro Den; +Cc: Michael S. Tsirkin, virtualization, Network Development
In-Reply-To: <1503410668.8694.14.camel@klaipeden.com>

>> > >         /* Don't wait up for transmitted skbs to be freed. */
>> > >         if (!use_napi) {
>> > > +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>> > > +                       struct ubuf_info *uarg;
>> > > +                       uarg = skb_shinfo(skb)->destructor_arg;
>> > > +                       if (uarg->callback)
>> > > +                           uarg->callback(uarg, true);
>> > > +                       skb_shinfo(skb)->destructor_arg = NULL;
>> > > +                       skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
>> > > +               }
>> >
>> > Instead of open coding, this can use skb_zcopy_clear.
>>
>> It is not correct to only send the zerocopy completion here, as
>> the skb will still be shared with the nic. The only safe approach
>> to notifying early is to create a copy with skb_orphan_frags_rx.
>> That will call skb_zcopy_clear(skb, false) internally. But the
>> copy will be very expensive.
> I noticed this email after my last post. I cannot imagine how it could be
> unsafe in the virtio case. Sorry, could you explain a bit more?

A guest process sends a packet with MSG_ZEROCOPY to the
virtio-net device. As soon as the process receives the completion
notification, it is allowed to reuse the memory backing the packet.

A call to skb_zcopy_clear in virtio-net start_xmit will notify the
process that it is allowed to reuse the memory. But the user pages
are still linked into the skb frags and are about to be shared with
the host os.

>> Is the deadlock you refer to the case where tx interrupts are
>> disabled, so transmit completions are only handled in start_xmit
>> and somehow the socket(s) are unable to send new data? The
>> question is what is blocking them. If it is zerocopy, it may be a
>> too low optmem_max or hitting the per-user locked pages limit.
>> We may have to keep interrupts enabled when zerocopy skbs
>> are in flight.
> Even if we keep interrupts enabled, we miss the completion without start_xmit.
> And it's also likely that the next start_xmit depends on the completion itself
> as I described in my last post.
>

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: John Fastabend @ 2017-08-22 17:17 UTC (permalink / raw)
  To: Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: xdp-newbies@vger.kernel.org, Daniel Borkmann, Andy Gospodarek,
	netdev@vger.kernel.org, Paweł Staszewski
In-Reply-To: <20170822170913.35umf4j6hlmcnwtm@ast-mbp>

On 08/22/2017 10:09 AM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2017 at 08:37:10AM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>>> Once tx-ing netdev added to devmap we can enable xdp on it automatically?
>>
>> I think you are referring to Gotcha-2 here:
> 
> oops. yes :)
> 
>>
>>   Second gotcha(2): you cannot TX out a device, unless it also has an
>>   XDP bpf program attached. (This is an implicit dependency, as the
>>   driver code needs to set up XDP resources before it can ndo_xdp_xmit).
>>
>> Yes, we should work on improving this situation.  Auto enabling XDP
>> when a netdev is added to a devmap is a good solution.  Currently this
>> is tied to loading an XDP bpf_prog.  Do you propose loading a dummy
>> bpf_prog on the netdev? (then we need to handle 1. not replacing
>> existing bpf_prog, 2. on take-down don't remove "later" loaded
>> bpf_prog).
> 
> right. these things need to be taken care of.
> Technically for ndo_xdp_xmit to work the program doesn't need
> to be attached, but the device needs to be in xdp mode with
> configured xdp tx rings.
> The easiest, of course, is just to document it :)
> and maybe add some sort of check so that if a netdev is added
> to devmap and it's not in xdp mode, a warning or error is returned.
> 

When I wrote this I assumed some user space piece could
load the "dummy" nop program on devices as needed. It seemed
easier than putting semi-complex logic in the kernel to load
programs on update_elem, but only if the user hasn't already
loaded a program and then unload it but again only if some
criteria is met. Then we would have one more kernel path into
load/unload BPF programs and would need all the tests and what
not.

+1 for documenting and userland usability patches.

.John

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-22 17:16 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Jason Wang, Michael S. Tsirkin, virtualization,
	Network Development
In-Reply-To: <1503409339.8694.12.camel@klaipeden.com>

>> > An issue of the referenced patch is that sndbuf could be smaller than low
>> > watermark.
> We cannot determine the low watermark properly, not only because of the sndbuf
> size issue but also because the upper vhost-net cannot directly see how many
> descriptors are currently available at the virtio-net tx queue. That depends on
> multiqueue settings or on other senders which are also using the same tx queue.
> Note that in the latter case, if they are constantly transmitting, the deadlock
> cannot occur(*); however, if they have only temporarily filled some portion of
> the pool in the meantime, then the low watermark cannot help.
> (*: That is because it's reliable enough in the sense I mention below.)
>
> Keeping this in mind, let me briefly describe the possible deadlock I mentioned:
> (1). vhost-net on the L1 guest has nothing to pass to sendmsg until the upper
> layer sets new descriptors, which depends only on the vhost-net zcopy callback
> and on adding newly used descriptors.
> (2). the vhost-net callback depends only on the skb being freed on the xmit path.
> (3). the xmit path depends (possibly only) on the vhost-net sendmsg.
> As you see, for the situation above to arise it is enough that L1 virtio-net
> reaches its limit earlier than the L0 host processing. The vhost-net pool could
> be almost full or empty, whatever.

Thanks for the context. This issue is very similar to the one that used to
exist when running out of transmit descriptors, before the removal of
the timer and introduction of skb_orphan in start_xmit.

To make sure that I understand correctly, let me paraphrase:

A. guest socket cannot send because it exhausted its sk budget (sndbuf, tsq, ..)

B. budget is not freed up until guest receives tx completion for this flow

C. tx completion is held back on the host side in vhost_zerocopy_signal_used
   behind the completion for an unrelated skb

D. unrelated packet is delayed somewhere in the host stack,
   e.g., netem

The issue that is specific to vhost-net zerocopy is that (C) enforces strict
ordering of transmit completions causing head of line blocking behind
vhost-net zerocopy callbacks.

This is a different problem from

C1. tx completion is delayed until guest sends another packet and
       triggers free_old_xmit_skbs

Both in host and guest, zerocopy packets should never be able to loop
to a receive path where they can cause unbounded delay.

The obvious cases of latency are queueing, like netem. That leads
to poor performance for unrelated flows, but I don't see how this
could cause deadlock.
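
A minimal sketch of the head-of-line blocking described in (C) and (D),
written as a user-space toy model rather than the actual vhost-net code.
The done[] array and both function names are assumptions made for
illustration: done[i] stands for "the host stack has released zerocopy
buffer i", and buffers are assumed to be submitted in index order.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: done[i] is true once the host stack has
 * released zerocopy buffer i. Buffers were submitted in index order.
 */

/* Completions that may be signalled when order must be preserved:
 * only the contiguous run of finished buffers starting at head. */
size_t completions_in_order(const bool *done, size_t n, size_t head)
{
	size_t count = 0;

	while (head + count < n && done[head + count])
		count++;
	return count;
}

/* Completions that could be signalled if order did not matter:
 * every finished buffer at or after head. */
size_t completions_any_order(const bool *done, size_t n, size_t head)
{
	size_t count = 0;

	for (size_t i = head; i < n; i++)
		if (done[i])
			count++;
	return count;
}
```

With done = { false, true, true, true } and head = 0, the in-order rule
reports no completions at all while three buffers are already finished;
the one delayed buffer (say, an skb sitting in netem) holds back the
completions of every flow queued behind it.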

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Alexei Starovoitov @ 2017-08-22 17:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: xdp-newbies@vger.kernel.org, John Fastabend, Daniel Borkmann,
	Andy Gospodarek, netdev@vger.kernel.org, Paweł Staszewski
In-Reply-To: <20170822083710.47a182a2@redhat.com>

On Tue, Aug 22, 2017 at 08:37:10AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> > Once tx-ing netdev added to devmap we can enable xdp on it automatically?
> 
> I think you are referring to Gotcha-2 here:

oops. yes :)

> 
>   Second gotcha(2): you cannot TX out a device unless it also has an
>   xdp bpf program attached. (This is an implicit dependency, as the
>   driver code needs to set up XDP resources before it can ndo_xdp_xmit).
> 
> Yes, we should work on improving this situation.  Auto enabling XDP
> when a netdev is added to a devmap is a good solution.  Currently this
> is tied to loading an XDP bpf_prog.  Do you propose loading a dummy
> bpf_prog on the netdev? (then we need to handle 1. not replacing
> existing bpf_prog, 2. on take-down don't remove "later" loaded
> bpf_prog).

right. these things need to be taken care of.
Technically for ndo_xdp_xmit to work the program doesn't need
to be attached, but the device needs to be in xdp mode with
configured xdp tx rings.
The easiest, of course, is just to document it :)
and maybe add some sort of check: if a netdev is added to devmap
while it's not in xdp mode, return a warning or error.
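
A rough user-space sketch of such a check, assuming a hypothetical toy
devmap. Nothing here is the kernel's devmap code: struct toy_netdev,
struct toy_devmap, the xdp_tx_ready flag and toy_devmap_update() are
invented for illustration, and returning -EINVAL for a device that is
not in xdp mode is an assumption as well.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DEVMAP_SLOTS 8

/* Hypothetical stand-in for a netdev: xdp_tx_ready models "the device
 * is in xdp mode with configured xdp tx rings". */
struct toy_netdev {
	const char *name;
	bool xdp_tx_ready;
};

/* Hypothetical stand-in for a devmap: a fixed array of device slots. */
struct toy_devmap {
	const struct toy_netdev *slots[TOY_DEVMAP_SLOTS];
};

/* Store dev at idx, or clear the slot when dev is NULL. Refuse a device
 * that could not transmit via ndo_xdp_xmit instead of failing later. */
int toy_devmap_update(struct toy_devmap *map, size_t idx,
		      const struct toy_netdev *dev)
{
	if (idx >= TOY_DEVMAP_SLOTS)
		return -E2BIG;
	if (dev && !dev->xdp_tx_ready)
		return -EINVAL;	/* assumed error code */
	map->slots[idx] = dev;
	return 0;
}
```

Rejecting the update up front surfaces the gotcha to the caller at map
update time; packets redirected to a device without xdp tx rings would
otherwise just fail to go out.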

^ permalink raw reply

* [PATCH net-next v2 02/10] net: mvpp2: fix the synchronization module bypass macro name
From: Antoine Tenart @ 2017-08-22 17:08 UTC (permalink / raw)
  To: davem, jason, andrew, gregory.clement, sebastian.hesselbarth
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, mw, stefanc,
	netdev, linux-arm-kernel
In-Reply-To: <20170822170830.32413-1-antoine.tenart@free-electrons.com>

The macro for the bit that controls whether the synchronization module
is bypassed is wrongly named: writing 1 disables the bypass.
This patch s/MVPP22_CTRL4_SYNC_BYPASS/MVPP22_CTRL4_SYNC_BYPASS_DIS/.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 34c679f25fec..03b7ced1082f 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -346,7 +346,7 @@
 #define MVPP22_GMAC_CTRL_4_REG			0x90
 #define     MVPP22_CTRL4_EXT_PIN_GMII_SEL	BIT(0)
 #define     MVPP22_CTRL4_DP_CLK_SEL		BIT(5)
-#define     MVPP22_CTRL4_SYNC_BYPASS		BIT(6)
+#define     MVPP22_CTRL4_SYNC_BYPASS_DIS	BIT(6)
 #define     MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE	BIT(7)
 
 /* Per-port XGMAC registers. PPv2.2 only, only for GOP port 0,
@@ -4269,7 +4269,7 @@ static void mvpp22_port_mii_set(struct mvpp2_port *port)
 	else
 		val &= ~MVPP22_CTRL4_EXT_PIN_GMII_SEL;
 	val &= ~MVPP22_CTRL4_DP_CLK_SEL;
-	val |= MVPP22_CTRL4_SYNC_BYPASS;
+	val |= MVPP22_CTRL4_SYNC_BYPASS_DIS;
 	val |= MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
 	writel(val, port->base + MVPP22_GMAC_CTRL_4_REG);
 }
-- 
2.13.5
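
For illustration only, a user-space sketch of the bit manipulation done
in mvpp22_port_mii_set() after the rename. The bit positions come from
the hunk above; the BIT() definition, the ctrl4_update() helper and its
ext_pin_gmii parameter are assumptions (the condition that selects
MVPP22_CTRL4_EXT_PIN_GMII_SEL is not visible in the diff context).

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)	(UINT32_C(1) << (n))

#define MVPP22_CTRL4_EXT_PIN_GMII_SEL		BIT(0)
#define MVPP22_CTRL4_DP_CLK_SEL			BIT(5)
#define MVPP22_CTRL4_SYNC_BYPASS_DIS		BIT(6)
#define MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE	BIT(7)

/* Compute the new GMAC CTRL4 value from the current one.
 * ext_pin_gmii is a hypothetical flag standing in for the driver's
 * interface-mode test. */
uint32_t ctrl4_update(uint32_t val, bool ext_pin_gmii)
{
	if (ext_pin_gmii)
		val |= MVPP22_CTRL4_EXT_PIN_GMII_SEL;
	else
		val &= ~MVPP22_CTRL4_EXT_PIN_GMII_SEL;
	val &= ~MVPP22_CTRL4_DP_CLK_SEL;
	val |= MVPP22_CTRL4_SYNC_BYPASS_DIS;	/* writing 1 disables the bypass */
	val |= MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
	return val;
}
```

The value written is the same before and after the patch; only the name
now matches what setting bit 6 does.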

^ permalink raw reply related

* [PATCH net-next v2 08/10] Documentation/bindings: net: marvell-pp2: add the system controller
From: Antoine Tenart @ 2017-08-22 17:08 UTC (permalink / raw)
  To: davem, jason, andrew, gregory.clement, sebastian.hesselbarth
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, mw, stefanc,
	netdev, linux-arm-kernel
In-Reply-To: <20170822170830.32413-1-antoine.tenart@free-electrons.com>

This patch documents the new marvell,system-controller property used by
the Marvell ppv2 network driver.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 Documentation/devicetree/bindings/net/marvell-pp2.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/marvell-pp2.txt b/Documentation/devicetree/bindings/net/marvell-pp2.txt
index 8918ad3ccf14..49484db81583 100644
--- a/Documentation/devicetree/bindings/net/marvell-pp2.txt
+++ b/Documentation/devicetree/bindings/net/marvell-pp2.txt
@@ -45,6 +45,7 @@ Optional properties (port):
                    be the name associated to the interrupts listed. Valid
                    names are: "tx-cpu0", "tx-cpu1", "tx-cpu2", "tx-cpu3",
 		   "rx-shared".
+- marvell,system-controller: a phandle to the system controller.
 
 Example for marvell,armada-375-pp2:
 
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v2 10/10] arm64: dts: marvell: mcbin: enable more networking ports
From: Antoine Tenart @ 2017-08-22 17:08 UTC (permalink / raw)
  To: davem, jason, andrew, gregory.clement, sebastian.hesselbarth
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, mw, stefanc,
	netdev, linux-arm-kernel
In-Reply-To: <20170822170830.32413-1-antoine.tenart@free-electrons.com>

This patch enables the two GE/SFP ports. They are configured in 10GKR
mode by default. To do this the cpm_xmdio is enabled as well, and two
phy descriptions are added.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 30 +++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index abd39d1c1739..6cb4b000e1ac 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -127,6 +127,30 @@
 	};
 };
 
+&cpm_xmdio {
+	status = "okay";
+
+	phy0: ethernet-phy@0 {
+		compatible = "ethernet-phy-ieee802.3-c45";
+		reg = <0>;
+	};
+
+	phy1: ethernet-phy@1 {
+		compatible = "ethernet-phy-ieee802.3-c45";
+		reg = <8>;
+	};
+};
+
+&cpm_ethernet {
+	status = "okay";
+};
+
+&cpm_eth0 {
+	status = "okay";
+	phy = <&phy0>;
+	phy-mode = "10gbase-kr";
+};
+
 &cpm_sata0 {
 	/* CPM Lane 0 - U29 */
 	status = "okay";
@@ -154,6 +178,12 @@
 	status = "okay";
 };
 
+&cps_eth0 {
+	status = "okay";
+	phy = <&phy1>;
+	phy-mode = "10gbase-kr";
+};
+
 &cps_eth1 {
 	/* CPS Lane 0 - J5 (Gigabit RJ45) */
 	status = "okay";
-- 
2.13.5

^ permalink raw reply related
