* [PATCH net-next v2 0/4] mvneta: Introduce RSS support
@ 2015-12-04 18:44 Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 1/4] net: mvneta: Make the default queue related for each port Gregory CLEMENT
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Gregory CLEMENT @ 2015-12-04 18:44 UTC
To: David S. Miller, linux-kernel, netdev, Thomas Petazzoni
Cc: Lior Amsalem, Andrew Lunn, Russell King - ARM Linux, Jason Cooper,
Arnd Bergmann, Boris BREZILLON, Simon Guinot, Nadav Haklai,
Ezequiel Garcia, Gregory CLEMENT, Maxime Ripard, Marcin Wojtas,
Willy Tarreau, linux-arm-kernel, Sebastian Hesselbarth
Hi,

this series is the first step to add RSS support on mvneta.

It will allow associating an Ethernet interface with a given CPU
through RSS by using "ethtool -X ethX weight". Currently I only enable
one entry in the RSS lookup table, so it is not really RSS yet, but it
brings back the IRQ affinity feature we lost by switching to the
percpu interrupt.
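As a usage sketch (the interface name and the eight queue weights are
hypothetical; mvneta exposes 8 RX queues), steering all traffic to
queue 2, and hence to the CPU that queue is statically mapped to:

  ethtool -X eth0 weight 0 0 1 0 0 0 0 0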
The main change compared to the first version is the addition of the
last patch. It is not related to RSS, but it applies on top of the
series, which is why I added it here. It was suggested by Arnd and
improves TX latency in some scenarios: for example, we see more
predictable throughput and latency when multiple threads send out
data, in particular different kinds of data, on different CPUs.

I also fixed some typos in the third patch.
The first patch makes the default queue a member of each port
structure instead of a global variable.
The second patch really associates the RX queues with the CPUs,
instead of achieving this by masking the percpu interrupts. All the
RX queues are enabled and statically assigned to the CPUs using a
modulo of the number of present CPUs. At this stage, however, only one
RX queue receives the stream.
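A quick shell illustration of this static assignment (the queue and
CPU counts are assumptions, e.g. 8 RX queues and the 4 CPUs of an
Armada XP):

  for q in 0 1 2 3 4 5 6 7; do echo "rxq $q -> cpu $((q % 4))"; done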
The third patch introduces a first level of RSS support through the
ethtool functions. As explained above, there is only one entry in the
RSS lookup table; because the queue-to-CPU mapping is static, this is
enough to associate an mvneta port with a CPU through its RX queue.
Thanks,
Gregory
Gregory CLEMENT (4):
net: mvneta: Make the default queue related for each port
net: mvneta: Associate RX queues with each CPU
net: mvneta: Add naive RSS support
net: mvneta: Spread out the TX queues management on all CPUs
drivers/net/ethernet/marvell/mvneta.c | 320 +++++++++++++++++++++++++++++-----
1 file changed, 276 insertions(+), 44 deletions(-)
--
2.5.0
* [PATCH net-next v2 1/4] net: mvneta: Make the default queue related for each port
2015-12-04 18:44 [PATCH net-next v2 0/4] mvneta: Introduce RSS support Gregory CLEMENT
@ 2015-12-04 18:44 ` Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 2/4] net: mvneta: Associate RX queues with each CPU Gregory CLEMENT
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Gregory CLEMENT @ 2015-12-04 18:44 UTC
To: David S. Miller, linux-kernel, netdev, Thomas Petazzoni
Cc: Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory CLEMENT,
Ezequiel Garcia, linux-arm-kernel, Lior Amsalem, Nadav Haklai,
Marcin Wojtas, Simon Guinot, Maxime Ripard, Boris BREZILLON,
Russell King - ARM Linux, Willy Tarreau, Arnd Bergmann
Instead of using the same default queue for all the ports, move it
into the port struct. This will allow having a different default queue
for each port.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e84c7f2634d3..1c7751917d58 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -354,6 +354,7 @@ struct mvneta_port {
struct mvneta_tx_queue *txqs;
struct net_device *dev;
struct notifier_block cpu_notifier;
+ int rxq_def;
/* Core clock */
struct clk *clk;
@@ -817,7 +818,7 @@ static void mvneta_port_up(struct mvneta_port *pp)
mvreg_write(pp, MVNETA_TXQ_CMD, q_map);
/* Enable all initialized RXQs. */
- mvreg_write(pp, MVNETA_RXQ_CMD, BIT(rxq_def));
+ mvreg_write(pp, MVNETA_RXQ_CMD, BIT(pp->rxq_def));
}
/* Stop the Ethernet port activity */
@@ -1027,7 +1028,7 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
mvreg_write(pp, MVNETA_ACC_MODE, val);
/* Update val of portCfg register accordingly with all RxQueue types */
- val = MVNETA_PORT_CONFIG_DEFL_VALUE(rxq_def);
+ val = MVNETA_PORT_CONFIG_DEFL_VALUE(pp->rxq_def);
mvreg_write(pp, MVNETA_PORT_CONFIG, val);
val = 0;
@@ -2076,19 +2077,19 @@ static void mvneta_set_rx_mode(struct net_device *dev)
if (dev->flags & IFF_PROMISC) {
/* Accept all: Multicast + Unicast */
mvneta_rx_unicast_promisc_set(pp, 1);
- mvneta_set_ucast_table(pp, rxq_def);
- mvneta_set_special_mcast_table(pp, rxq_def);
- mvneta_set_other_mcast_table(pp, rxq_def);
+ mvneta_set_ucast_table(pp, pp->rxq_def);
+ mvneta_set_special_mcast_table(pp, pp->rxq_def);
+ mvneta_set_other_mcast_table(pp, pp->rxq_def);
} else {
/* Accept single Unicast */
mvneta_rx_unicast_promisc_set(pp, 0);
mvneta_set_ucast_table(pp, -1);
- mvneta_mac_addr_set(pp, dev->dev_addr, rxq_def);
+ mvneta_mac_addr_set(pp, dev->dev_addr, pp->rxq_def);
if (dev->flags & IFF_ALLMULTI) {
/* Accept all multicast */
- mvneta_set_special_mcast_table(pp, rxq_def);
- mvneta_set_other_mcast_table(pp, rxq_def);
+ mvneta_set_special_mcast_table(pp, pp->rxq_def);
+ mvneta_set_other_mcast_table(pp, pp->rxq_def);
} else {
/* Accept only initialized multicast */
mvneta_set_special_mcast_table(pp, -1);
@@ -2097,7 +2098,7 @@ static void mvneta_set_rx_mode(struct net_device *dev)
if (!netdev_mc_empty(dev)) {
netdev_for_each_mc_addr(ha, dev) {
mvneta_mcast_addr_set(pp, ha->addr,
- rxq_def);
+ pp->rxq_def);
}
}
}
@@ -2180,7 +2181,7 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
* RX packets
*/
cause_rx_tx |= port->cause_rx_tx;
- rx_done = mvneta_rx(pp, budget, &pp->rxqs[rxq_def]);
+ rx_done = mvneta_rx(pp, budget, &pp->rxqs[pp->rxq_def]);
budget -= rx_done;
if (budget > 0) {
@@ -2393,17 +2394,17 @@ static void mvneta_cleanup_txqs(struct mvneta_port *pp)
/* Cleanup all Rx queues */
static void mvneta_cleanup_rxqs(struct mvneta_port *pp)
{
- mvneta_rxq_deinit(pp, &pp->rxqs[rxq_def]);
+ mvneta_rxq_deinit(pp, &pp->rxqs[pp->rxq_def]);
}
/* Init all Rx queues */
static int mvneta_setup_rxqs(struct mvneta_port *pp)
{
- int err = mvneta_rxq_init(pp, &pp->rxqs[rxq_def]);
+ int err = mvneta_rxq_init(pp, &pp->rxqs[pp->rxq_def]);
if (err) {
netdev_err(pp->dev, "%s: can't create rxq=%d\n",
- __func__, rxq_def);
+ __func__, pp->rxq_def);
mvneta_cleanup_rxqs(pp);
return err;
}
@@ -2609,7 +2610,7 @@ static int mvneta_set_mac_addr(struct net_device *dev, void *addr)
mvneta_mac_addr_set(pp, dev->dev_addr, -1);
/* Set new addr in hw */
- mvneta_mac_addr_set(pp, sockaddr->sa_data, rxq_def);
+ mvneta_mac_addr_set(pp, sockaddr->sa_data, pp->rxq_def);
eth_commit_mac_addr_change(dev, addr);
return 0;
@@ -2728,7 +2729,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
{
int online_cpu_idx, cpu, i = 0;
- online_cpu_idx = rxq_def % num_online_cpus();
+ online_cpu_idx = pp->rxq_def % num_online_cpus();
for_each_online_cpu(cpu) {
if (i == online_cpu_idx)
@@ -3306,6 +3307,8 @@ static int mvneta_probe(struct platform_device *pdev)
strcmp(managed, "in-band-status") == 0);
pp->cpu_notifier.notifier_call = mvneta_percpu_notifier;
+ pp->rxq_def = rxq_def;
+
pp->clk = devm_clk_get(&pdev->dev, NULL);
if (IS_ERR(pp->clk)) {
err = PTR_ERR(pp->clk);
--
2.5.0
* [PATCH net-next v2 2/4] net: mvneta: Associate RX queues with each CPU
2015-12-04 18:44 [PATCH net-next v2 0/4] mvneta: Introduce RSS support Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 1/4] net: mvneta: Make the default queue related for each port Gregory CLEMENT
@ 2015-12-04 18:44 ` Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 3/4] net: mvneta: Add naive RSS support Gregory CLEMENT
2015-12-04 18:45 ` [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs Gregory CLEMENT
3 siblings, 0 replies; 9+ messages in thread
From: Gregory CLEMENT @ 2015-12-04 18:44 UTC
To: David S. Miller, linux-kernel, netdev, Thomas Petazzoni
Cc: Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory CLEMENT,
Ezequiel Garcia, linux-arm-kernel, Lior Amsalem, Nadav Haklai,
Marcin Wojtas, Simon Guinot, Maxime Ripard, Boris BREZILLON,
Russell King - ARM Linux, Willy Tarreau, Arnd Bergmann
We enable the percpu interrupt for all the CPUs and simply associate
each CPU with a few queues at the neta level. The mapping between the
CPUs and the queues is static: the queues are assigned to the CPUs
modulo the number of CPUs. However, currently we only use one RX queue
for a given Ethernet port.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 150 ++++++++++++++++++++++++++--------
1 file changed, 115 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 1c7751917d58..8974ab084839 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -109,9 +109,16 @@
#define MVNETA_CPU_MAP(cpu) (0x2540 + ((cpu) << 2))
#define MVNETA_CPU_RXQ_ACCESS_ALL_MASK 0x000000ff
#define MVNETA_CPU_TXQ_ACCESS_ALL_MASK 0x0000ff00
+#define MVNETA_CPU_RXQ_ACCESS(rxq) BIT(rxq)
#define MVNETA_RXQ_TIME_COAL_REG(q) (0x2580 + ((q) << 2))
-/* Exception Interrupt Port/Queue Cause register */
+/* Exception Interrupt Port/Queue Cause register
+ *
+ * Their behavior depends on the mapping done using the PCPX2Q
+ * registers. For a given CPU, if the bit associated with a queue is
+ * not set, then reads of this register from that CPU will always
+ * return 0 and writes won't do anything
+ */
#define MVNETA_INTR_NEW_CAUSE 0x25a0
#define MVNETA_INTR_NEW_MASK 0x25a4
@@ -818,7 +825,13 @@ static void mvneta_port_up(struct mvneta_port *pp)
mvreg_write(pp, MVNETA_TXQ_CMD, q_map);
/* Enable all initialized RXQs. */
- mvreg_write(pp, MVNETA_RXQ_CMD, BIT(pp->rxq_def));
+ for (queue = 0; queue < rxq_number; queue++) {
+ struct mvneta_rx_queue *rxq = &pp->rxqs[queue];
+
+ if (rxq->descs != NULL)
+ q_map |= (1 << queue);
+ }
+ mvreg_write(pp, MVNETA_RXQ_CMD, q_map);
}
/* Stop the Ethernet port activity */
@@ -986,6 +999,7 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
int cpu;
int queue;
u32 val;
+ int max_cpu = num_present_cpus();
/* Clear all Cause registers */
mvreg_write(pp, MVNETA_INTR_NEW_CAUSE, 0);
@@ -1001,13 +1015,23 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
/* Enable MBUS Retry bit16 */
mvreg_write(pp, MVNETA_MBUS_RETRY, 0x20);
- /* Set CPU queue access map - all CPUs have access to all RX
- * queues and to all TX queues
+ /* Set CPU queue access map. CPUs are assigned to the RX
+ * queues modulo their number and all the TX queues are
+ * assigned to the CPU associated to the default RX queue.
*/
- for_each_present_cpu(cpu)
- mvreg_write(pp, MVNETA_CPU_MAP(cpu),
- (MVNETA_CPU_RXQ_ACCESS_ALL_MASK |
- MVNETA_CPU_TXQ_ACCESS_ALL_MASK));
+ for_each_present_cpu(cpu) {
+ int rxq_map = 0, txq_map = 0;
+ int rxq;
+
+ for (rxq = 0; rxq < rxq_number; rxq++)
+ if ((rxq % max_cpu) == cpu)
+ rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
+
+ if (cpu == rxq_def)
+ txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
+
+ mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
+ }
/* Reset RX and TX DMAs */
mvreg_write(pp, MVNETA_PORT_RX_RESET, MVNETA_PORT_RX_DMA_RESET);
@@ -2149,6 +2173,7 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
{
int rx_done = 0;
u32 cause_rx_tx;
+ int rx_queue;
struct mvneta_port *pp = netdev_priv(napi->dev);
struct mvneta_pcpu_port *port = this_cpu_ptr(pp->ports);
@@ -2180,8 +2205,15 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
/* For the case where the last mvneta_poll did not process all
* RX packets
*/
+ rx_queue = fls(((cause_rx_tx >> 8) & 0xff));
+
cause_rx_tx |= port->cause_rx_tx;
- rx_done = mvneta_rx(pp, budget, &pp->rxqs[pp->rxq_def]);
+
+ if (rx_queue) {
+ rx_queue = rx_queue - 1;
+ rx_done = mvneta_rx(pp, budget, &pp->rxqs[rx_queue]);
+ }
+
budget -= rx_done;
if (budget > 0) {
@@ -2394,19 +2426,27 @@ static void mvneta_cleanup_txqs(struct mvneta_port *pp)
/* Cleanup all Rx queues */
static void mvneta_cleanup_rxqs(struct mvneta_port *pp)
{
- mvneta_rxq_deinit(pp, &pp->rxqs[pp->rxq_def]);
+ int queue;
+
+ for (queue = 0; queue < rxq_number; queue++)
+ mvneta_rxq_deinit(pp, &pp->rxqs[queue]);
}
/* Init all Rx queues */
static int mvneta_setup_rxqs(struct mvneta_port *pp)
{
- int err = mvneta_rxq_init(pp, &pp->rxqs[pp->rxq_def]);
- if (err) {
- netdev_err(pp->dev, "%s: can't create rxq=%d\n",
- __func__, pp->rxq_def);
- mvneta_cleanup_rxqs(pp);
- return err;
+ int queue;
+
+ for (queue = 0; queue < rxq_number; queue++) {
+ int err = mvneta_rxq_init(pp, &pp->rxqs[queue]);
+
+ if (err) {
+ netdev_err(pp->dev, "%s: can't create rxq=%d\n",
+ __func__, queue);
+ mvneta_cleanup_rxqs(pp);
+ return err;
+ }
}
return 0;
@@ -2430,6 +2470,19 @@ static int mvneta_setup_txqs(struct mvneta_port *pp)
return 0;
}
+static void mvneta_percpu_unmask_interrupt(void *arg)
+{
+ struct mvneta_port *pp = arg;
+
+ /* All the queues are unmasked, but actually only the ones
+ * mapped to this CPU will be unmasked
+ */
+ mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+ MVNETA_RX_INTR_MASK_ALL |
+ MVNETA_TX_INTR_MASK_ALL |
+ MVNETA_MISCINTR_INTR_MASK);
+}
+
static void mvneta_start_dev(struct mvneta_port *pp)
{
unsigned int cpu;
@@ -2447,11 +2500,10 @@ static void mvneta_start_dev(struct mvneta_port *pp)
napi_enable(&port->napi);
}
- /* Unmask interrupts */
- mvreg_write(pp, MVNETA_INTR_NEW_MASK,
- MVNETA_RX_INTR_MASK(rxq_number) |
- MVNETA_TX_INTR_MASK(txq_number) |
- MVNETA_MISCINTR_INTR_MASK);
+ /* Unmask interrupts. It has to be done from each CPU */
+ for_each_online_cpu(cpu)
+ smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
+ pp, true);
mvreg_write(pp, MVNETA_INTR_MISC_MASK,
MVNETA_CAUSE_PHY_STATUS_CHANGE |
MVNETA_CAUSE_LINK_CHANGE |
@@ -2727,22 +2779,35 @@ static void mvneta_percpu_disable(void *arg)
static void mvneta_percpu_elect(struct mvneta_port *pp)
{
- int online_cpu_idx, cpu, i = 0;
+ int online_cpu_idx, max_cpu, cpu, i = 0;
online_cpu_idx = pp->rxq_def % num_online_cpus();
+ max_cpu = num_present_cpus();
for_each_online_cpu(cpu) {
- if (i == online_cpu_idx)
- /* Enable per-CPU interrupt on the one CPU we
- * just elected
+ int rxq_map = 0, txq_map = 0;
+ int rxq;
+
+ for (rxq = 0; rxq < rxq_number; rxq++)
+ if ((rxq % max_cpu) == cpu)
+ rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
+
+ if (i == online_cpu_idx) {
+ /* Map the default receive queue and transmit
+ * queue to the elected CPU
*/
- smp_call_function_single(cpu, mvneta_percpu_enable,
- pp, true);
- else
- /* Disable per-CPU interrupt on all the other CPU */
- smp_call_function_single(cpu, mvneta_percpu_disable,
- pp, true);
+ rxq_map |= MVNETA_CPU_RXQ_ACCESS(pp->rxq_def);
+ txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
+ }
+ mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
+
+ /* Update the interrupt mask on each CPU according to the
+ * new mapping
+ */
+ smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
+ pp, true);
i++;
+
}
};
@@ -2777,12 +2842,22 @@ static int mvneta_percpu_notifier(struct notifier_block *nfb,
mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
napi_enable(&port->napi);
+
+ /* Enable per-CPU interrupts on the CPU that is
+ * brought up.
+ */
+ smp_call_function_single(cpu, mvneta_percpu_enable,
+ pp, true);
+
/* Enable per-CPU interrupt on the one CPU we care
* about.
*/
mvneta_percpu_elect(pp);
- /* Unmask all ethernet port interrupts */
+ /* Unmask all ethernet port interrupts. As this
+ * notifier is called for each CPU, the CPU-to-queue
+ * mapping is applied
+ */
mvreg_write(pp, MVNETA_INTR_NEW_MASK,
MVNETA_RX_INTR_MASK(rxq_number) |
MVNETA_TX_INTR_MASK(txq_number) |
@@ -2833,7 +2908,7 @@ static int mvneta_percpu_notifier(struct notifier_block *nfb,
static int mvneta_open(struct net_device *dev)
{
struct mvneta_port *pp = netdev_priv(dev);
- int ret;
+ int ret, cpu;
pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
pp->frag_size = SKB_DATA_ALIGN(MVNETA_RX_BUF_SIZE(pp->pkt_size)) +
@@ -2863,8 +2938,13 @@ static int mvneta_open(struct net_device *dev)
*/
mvneta_percpu_disable(pp);
- /* Elect a CPU to handle our RX queue interrupt */
- mvneta_percpu_elect(pp);
+ /* Enable per-CPU interrupt on all the CPUs to handle our RX
+ * queue interrupts
+ */
+ for_each_online_cpu(cpu)
+ smp_call_function_single(cpu, mvneta_percpu_enable,
+ pp, true);
+
/* Register a CPU notifier to handle the case where our CPU
* might be taken offline.
--
2.5.0
* [PATCH net-next v2 3/4] net: mvneta: Add naive RSS support
2015-12-04 18:44 [PATCH net-next v2 0/4] mvneta: Introduce RSS support Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 1/4] net: mvneta: Make the default queue related for each port Gregory CLEMENT
2015-12-04 18:44 ` [PATCH net-next v2 2/4] net: mvneta: Associate RX queues with each CPU Gregory CLEMENT
@ 2015-12-04 18:44 ` Gregory CLEMENT
2015-12-04 18:45 ` [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs Gregory CLEMENT
3 siblings, 0 replies; 9+ messages in thread
From: Gregory CLEMENT @ 2015-12-04 18:44 UTC
To: David S. Miller, linux-kernel, netdev, Thomas Petazzoni
Cc: Lior Amsalem, Andrew Lunn, Russell King - ARM Linux, Jason Cooper,
Arnd Bergmann, Boris BREZILLON, Simon Guinot, Nadav Haklai,
Ezequiel Garcia, Gregory CLEMENT, Maxime Ripard, Marcin Wojtas,
Willy Tarreau, linux-arm-kernel, Sebastian Hesselbarth
This patch adds support for the RSS-related ethtool functions.
Currently only one entry of the indirection table is used, which
allows associating an mvneta interface with a given CPU.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
drivers/net/ethernet/marvell/mvneta.c | 127 +++++++++++++++++++++++++++++++++-
1 file changed, 126 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 8974ab084839..e0dba6869605 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -259,6 +259,11 @@
#define MVNETA_TX_MTU_MAX 0x3ffff
+/* The RSS lookup table actually has 256 entries but we do not use
+ * them all yet
+ */
+#define MVNETA_RSS_LU_TABLE_SIZE 1
+
/* TSO header size */
#define TSO_HEADER_SIZE 128
@@ -380,6 +385,8 @@ struct mvneta_port {
int use_inband_status:1;
u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
+
+ u32 indir[MVNETA_RSS_LU_TABLE_SIZE];
};
/* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -1027,7 +1034,7 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
if ((rxq % max_cpu) == cpu)
rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
- if (cpu == rxq_def)
+ if (cpu == pp->rxq_def)
txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
@@ -2483,6 +2490,18 @@ static void mvneta_percpu_unmask_interrupt(void *arg)
MVNETA_MISCINTR_INTR_MASK);
}
+static void mvneta_percpu_mask_interrupt(void *arg)
+{
+ struct mvneta_port *pp = arg;
+
+ /* All the queues are masked, but actually only the ones
+ * mapped to this CPU will be masked
+ */
+ mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
+ mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
+ mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+}
+
static void mvneta_start_dev(struct mvneta_port *pp)
{
unsigned int cpu;
@@ -3173,6 +3192,106 @@ static int mvneta_ethtool_get_sset_count(struct net_device *dev, int sset)
return -EOPNOTSUPP;
}
+static u32 mvneta_ethtool_get_rxfh_indir_size(struct net_device *dev)
+{
+ return MVNETA_RSS_LU_TABLE_SIZE;
+}
+
+static int mvneta_ethtool_get_rxnfc(struct net_device *dev,
+ struct ethtool_rxnfc *info,
+ u32 *rules __always_unused)
+{
+ switch (info->cmd) {
+ case ETHTOOL_GRXRINGS:
+ info->data = rxq_number;
+ return 0;
+ case ETHTOOL_GRXFH:
+ return -EOPNOTSUPP;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int mvneta_config_rss(struct mvneta_port *pp)
+{
+ int cpu;
+ u32 val;
+
+ netif_tx_stop_all_queues(pp->dev);
+
+ for_each_online_cpu(cpu)
+ smp_call_function_single(cpu, mvneta_percpu_mask_interrupt,
+ pp, true);
+
+ /* We have to synchronise on the napi of each CPU */
+ for_each_online_cpu(cpu) {
+ struct mvneta_pcpu_port *pcpu_port =
+ per_cpu_ptr(pp->ports, cpu);
+
+ napi_synchronize(&pcpu_port->napi);
+ napi_disable(&pcpu_port->napi);
+ }
+
+ pp->rxq_def = pp->indir[0];
+
+ /* Update unicast mapping */
+ mvneta_set_rx_mode(pp->dev);
+
+ /* Update val of portCfg register accordingly with all RxQueue types */
+ val = MVNETA_PORT_CONFIG_DEFL_VALUE(pp->rxq_def);
+ mvreg_write(pp, MVNETA_PORT_CONFIG, val);
+
+ /* Update the elected CPU matching the new rxq_def */
+ mvneta_percpu_elect(pp);
+
+ /* We have to synchronise on the napi of each CPU */
+ for_each_online_cpu(cpu) {
+ struct mvneta_pcpu_port *pcpu_port =
+ per_cpu_ptr(pp->ports, cpu);
+
+ napi_enable(&pcpu_port->napi);
+ }
+
+ netif_tx_start_all_queues(pp->dev);
+
+ return 0;
+}
+
+static int mvneta_ethtool_set_rxfh(struct net_device *dev, const u32 *indir,
+ const u8 *key, const u8 hfunc)
+{
+ struct mvneta_port *pp = netdev_priv(dev);
+ /* We require at least one supported parameter to be changed
+ * and no change in any of the unsupported parameters
+ */
+ if (key ||
+ (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+ return -EOPNOTSUPP;
+
+ if (!indir)
+ return 0;
+
+ memcpy(pp->indir, indir, MVNETA_RSS_LU_TABLE_SIZE * sizeof(u32));
+
+ return mvneta_config_rss(pp);
+}
+
+static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
+ u8 *hfunc)
+{
+ struct mvneta_port *pp = netdev_priv(dev);
+
+ if (hfunc)
+ *hfunc = ETH_RSS_HASH_TOP;
+
+ if (!indir)
+ return 0;
+
+ memcpy(indir, pp->indir, MVNETA_RSS_LU_TABLE_SIZE * sizeof(u32));
+
+ return 0;
+}
+
static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open = mvneta_open,
.ndo_stop = mvneta_stop,
@@ -3197,6 +3316,10 @@ const struct ethtool_ops mvneta_eth_tool_ops = {
.get_strings = mvneta_ethtool_get_strings,
.get_ethtool_stats = mvneta_ethtool_get_stats,
.get_sset_count = mvneta_ethtool_get_sset_count,
+ .get_rxfh_indir_size = mvneta_ethtool_get_rxfh_indir_size,
+ .get_rxnfc = mvneta_ethtool_get_rxnfc,
+ .get_rxfh = mvneta_ethtool_get_rxfh,
+ .set_rxfh = mvneta_ethtool_set_rxfh,
};
/* Initialize hw */
@@ -3389,6 +3512,8 @@ static int mvneta_probe(struct platform_device *pdev)
pp->rxq_def = rxq_def;
+ pp->indir[0] = rxq_def;
+
pp->clk = devm_clk_get(&pdev->dev, NULL);
if (IS_ERR(pp->clk)) {
err = PTR_ERR(pp->clk);
--
2.5.0
* [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
2015-12-04 18:44 [PATCH net-next v2 0/4] mvneta: Introduce RSS support Gregory CLEMENT
` (2 preceding siblings ...)
2015-12-04 18:44 ` [PATCH net-next v2 3/4] net: mvneta: Add naive RSS support Gregory CLEMENT
@ 2015-12-04 18:45 ` Gregory CLEMENT
2015-12-04 19:12 ` Eric Dumazet
2015-12-05 19:14 ` Marcin Wojtas
3 siblings, 2 replies; 9+ messages in thread
From: Gregory CLEMENT @ 2015-12-04 18:45 UTC
To: David S. Miller, linux-kernel, netdev, Thomas Petazzoni
Cc: Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory CLEMENT,
Ezequiel Garcia, linux-arm-kernel, Lior Amsalem, Nadav Haklai,
Marcin Wojtas, Simon Guinot, Maxime Ripard, Boris BREZILLON,
Russell King - ARM Linux, Willy Tarreau, Arnd Bergmann
With this patch each CPU is associated with its own set of TX queues.
At the same time, the SKB received in mvneta_tx is bound to the queue
associated with the CPU sending the data. Thanks to this, the next IRQ
will be received on the same CPU, allowing more data to be sent.

It will also allow more predictable behavior regarding throughput and
latency when multiple threads send out data on different CPUs.

As an example, on Armada XP GP with an iperf bound to one CPU and a
ping bound to another CPU, without this patch the ping round trip was
about 2.5ms (and could reach 3s!), whereas with this patch it was
around 0.7ms (and sometimes went up to 1.2ms).
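The test setup can be reproduced with something along these lines (a
sketch only; the server address and exact iperf/ping options are
hypothetical, they were not given above):

  taskset -c 0 iperf -c 192.168.1.1 &
  taskset -c 1 ping 192.168.1.1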
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 48 ++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e0dba6869605..bb5e29daac0b 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -110,6 +110,7 @@
#define MVNETA_CPU_RXQ_ACCESS_ALL_MASK 0x000000ff
#define MVNETA_CPU_TXQ_ACCESS_ALL_MASK 0x0000ff00
#define MVNETA_CPU_RXQ_ACCESS(rxq) BIT(rxq)
+#define MVNETA_CPU_TXQ_ACCESS(txq) BIT(txq + 8)
#define MVNETA_RXQ_TIME_COAL_REG(q) (0x2580 + ((q) << 2))
/* Exception Interrupt Port/Queue Cause register
@@ -1022,20 +1023,30 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
/* Enable MBUS Retry bit16 */
mvreg_write(pp, MVNETA_MBUS_RETRY, 0x20);
- /* Set CPU queue access map. CPUs are assigned to the RX
- * queues modulo their number and all the TX queues are
- * assigned to the CPU associated to the default RX queue.
+ /* Set CPU queue access map. CPUs are assigned to the RX and
+ * TX queues modulo their number. If there is only one TX
+ * queue then it is assigned to the CPU associated to the
+ * default RX queue.
*/
for_each_present_cpu(cpu) {
int rxq_map = 0, txq_map = 0;
- int rxq;
+ int rxq, txq;
for (rxq = 0; rxq < rxq_number; rxq++)
if ((rxq % max_cpu) == cpu)
rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
- if (cpu == pp->rxq_def)
- txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
+ for (txq = 0; txq < txq_number; txq++)
+ if ((txq % max_cpu) == cpu)
+ txq_map |= MVNETA_CPU_TXQ_ACCESS(txq);
+
+ /* With only one TX queue, we configure a special case
+ * which allows getting all the IRQs on a single
+ * CPU
+ */
+ if (txq_number == 1)
+ txq_map = (cpu == pp->rxq_def) ?
+ MVNETA_CPU_TXQ_ACCESS(1) : 0;
mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
}
@@ -1824,13 +1835,16 @@ error:
static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
{
struct mvneta_port *pp = netdev_priv(dev);
- u16 txq_id = skb_get_queue_mapping(skb);
+ u16 txq_id = smp_processor_id() % txq_number;
struct mvneta_tx_queue *txq = &pp->txqs[txq_id];
struct mvneta_tx_desc *tx_desc;
int len = skb->len;
int frags = 0;
u32 tx_cmd;
+ /* Use the tx queue bound to this CPU */
+ skb_set_queue_mapping(skb, txq_id);
+
if (!netif_running(dev))
goto out;
@@ -2811,13 +2825,23 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
if ((rxq % max_cpu) == cpu)
rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
- if (i == online_cpu_idx) {
- /* Map the default receive queue and transmit
- * queue to the elected CPU
+ if (i == online_cpu_idx)
+ /* Map the default receive queue to the
+ * elected CPU
*/
rxq_map |= MVNETA_CPU_RXQ_ACCESS(pp->rxq_def);
- txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
- }
+
+ /* We update the TX queue map only if we have a single
+ * queue. In this case we associate the TX queue with
+ * the CPU bound to the default RX queue
+ */
+ if (txq_number == 1)
+ txq_map = (i == online_cpu_idx) ?
+ MVNETA_CPU_TXQ_ACCESS(1) : 0;
+ else
+ txq_map = mvreg_read(pp, MVNETA_CPU_MAP(cpu)) &
+ MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
+
mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
/* Update the interrupt mask on each CPU according to the
--
2.5.0
* Re: [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
2015-12-04 18:45 ` [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs Gregory CLEMENT
@ 2015-12-04 19:12 ` Eric Dumazet
2015-12-04 21:30 ` Arnd Bergmann
2015-12-05 19:14 ` Marcin Wojtas
1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2015-12-04 19:12 UTC
To: Gregory CLEMENT
Cc: David S. Miller, linux-kernel, netdev, Thomas Petazzoni,
Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Ezequiel Garcia,
linux-arm-kernel, Lior Amsalem, Nadav Haklai, Marcin Wojtas,
Simon Guinot, Maxime Ripard, Boris BREZILLON,
Russell King - ARM Linux, Willy Tarreau, Arnd Bergmann
On Fri, 2015-12-04 at 19:45 +0100, Gregory CLEMENT wrote:
> With this patch each CPU is associated with its own set of TX queues.
> At the same time, the SKB received in mvneta_tx is bound to the queue
> associated with the CPU sending the data. Thanks to this, the next IRQ
> will be received on the same CPU, allowing more data to be sent.
>
> It will also allow more predictable behavior regarding throughput and
> latency when multiple threads send out data on different CPUs.
>
> As an example, on Armada XP GP with an iperf bound to one CPU and a
> ping bound to another CPU, without this patch the ping round trip was
> about 2.5ms (and could reach 3s!), whereas with this patch it was
> around 0.7ms (and sometimes went up to 1.2ms).
This really looks like you need something smarter than the pfifo_fast
qdisc, and maybe BQL (I did not check whether this driver already
implements it).
>
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
...
> @@ -1824,13 +1835,16 @@ error:
> static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
> {
> struct mvneta_port *pp = netdev_priv(dev);
> - u16 txq_id = skb_get_queue_mapping(skb);
> + u16 txq_id = smp_processor_id() % txq_number;
> struct mvneta_tx_queue *txq = &pp->txqs[txq_id];
> struct mvneta_tx_desc *tx_desc;
> int len = skb->len;
> int frags = 0;
> u32 tx_cmd;
>
> + /* Use the tx queue bound to this CPU */
> + skb_set_queue_mapping(skb, txq_id);
> +
We certainly do not want every driver implementing its own hacks.

We have a standard way to handle this: it is called XPS, and
eventually ndo_select_queue().
Documentation/networking/scaling.txt contains some hints.
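For illustration, XPS is configured per TX queue through sysfs with
hexadecimal CPU bitmasks; a hypothetical example pinning eth0's TX
queue 0 to CPU 0 and TX queue 1 to CPU 1:

  echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
  echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus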
* Re: [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
2015-12-04 19:12 ` Eric Dumazet
@ 2015-12-04 21:30 ` Arnd Bergmann
0 siblings, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2015-12-04 21:30 UTC
To: Eric Dumazet
Cc: Gregory CLEMENT, David S. Miller, linux-kernel, netdev,
Thomas Petazzoni, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Ezequiel Garcia, linux-arm-kernel,
Lior Amsalem, Nadav Haklai, Marcin Wojtas, Simon Guinot,
Maxime Ripard, Boris BREZILLON, Russell King - ARM Linux,
Willy Tarreau
On Friday 04 December 2015 11:12:30 Eric Dumazet wrote:
> On Fri, 2015-12-04 at 19:45 +0100, Gregory CLEMENT wrote:
> > With this patch each CPU is associated with its own set of TX queues.
> > At the same time, the SKB received in mvneta_tx is bound to the queue
> > associated with the CPU sending the data. Thanks to this, the next IRQ
> > will be received on the same CPU, allowing more data to be sent.
> >
> > It will also allow more predictable behavior regarding throughput and
> > latency when multiple threads send out data on different CPUs.
> >
> > As an example, on Armada XP GP with an iperf bound to one CPU and a
> > ping bound to another CPU, without this patch the ping round trip was
> > about 2.5ms (and could reach 3s!), whereas with this patch it was
> > around 0.7ms (and sometimes went up to 1.2ms).
>
> This really looks like you need something smarter than pfifo_fast qdisc,
> and maybe BQL (I did not check if this driver already implements this)
I suggested this change, as well as the BQL implementation that
Marcin did. I believe he hasn't posted it yet, as he's doing some more
testing, but it should come soon.
Arnd
* Re: [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
2015-12-04 18:45 ` [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs Gregory CLEMENT
2015-12-04 19:12 ` Eric Dumazet
@ 2015-12-05 19:14 ` Marcin Wojtas
2015-12-05 22:24 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Marcin Wojtas @ 2015-12-05 19:14 UTC
To: Gregory CLEMENT
Cc: David S. Miller, linux-kernel, netdev, Thomas Petazzoni,
Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Ezequiel Garcia,
linux-arm-kernel@lists.infradead.org, Lior Amsalem, Nadav Haklai,
Simon Guinot, Maxime Ripard, Boris BREZILLON,
Russell King - ARM Linux, Willy Tarreau, Arnd Bergmann
Hi Gregory,
> @@ -1824,13 +1835,16 @@ error:
> static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
> {
> struct mvneta_port *pp = netdev_priv(dev);
> - u16 txq_id = skb_get_queue_mapping(skb);
> + u16 txq_id = smp_processor_id() % txq_number;
I think it may be OK to bind TXQs to different CPUs, but I don't think
that replacing skb_get_queue_mapping() with what is effectively
smp_processor_id() is the best idea. This way you use only 2 TXQs on
A385 and 4 TXQs on AXP. There are HW mechanisms like WRR or EJP that
provide balancing for egress, so we'd better keep all 8.

As a compromise, I think it's enough to do the mapping: we would
achieve some offload by having TX processing done on different CPUs,
and let BQL do the balancing at a higher level. FYI, I've already
implemented BQL and will submit it asap; however, I still have some
weird problems after enabling it.
Best regards,
Marcin
* Re: [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
2015-12-05 19:14 ` Marcin Wojtas
@ 2015-12-05 22:24 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2015-12-05 22:24 UTC
To: mw
Cc: gregory.clement, linux-kernel, netdev, thomas.petazzoni, jason,
andrew, sebastian.hesselbarth, ezequiel.garcia, linux-arm-kernel,
alior, nadavh, simon.guinot, maxime.ripard, boris.brezillon,
linux, w, arnd
From: Marcin Wojtas <mw@semihalf.com>
Date: Sat, 5 Dec 2015 20:14:31 +0100
> Hi Gregory,
>
>> @@ -1824,13 +1835,16 @@ error:
>> static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>> {
>> struct mvneta_port *pp = netdev_priv(dev);
>> - u16 txq_id = skb_get_queue_mapping(skb);
>> + u16 txq_id = smp_processor_id() % txq_number;
>
> I think it may be ok to bind TXQs to different CPUs, but I don't think
> that replacing skb_get_queue_mapping by in fact smp_processor_id() is
> the best idea. This way you use only 2 TXQs on A385 and 4 TXQs on AXP.
> There are HW mechanisms like WRR or EJP that provide balancing for
> egress, so let's better keep all 8.
Also it is possible for other parts of the stack to set the SKB queue
mapping and you must respect that setting rather than override it.