* [PATCH net-next 0/2] gianfar: Tx timeout issue
@ 2014-03-05 8:28 Claudiu Manoil
2014-03-05 8:28 ` [PATCH net-next 1/2] gianfar: Separate out the Tx interrupt handling (Tx NAPI) Claudiu Manoil
2014-03-05 8:28 ` [PATCH net-next 2/2] gianfar: Make multi-queue polling optional Claudiu Manoil
0 siblings, 2 replies; 6+ messages in thread
From: Claudiu Manoil @ 2014-03-05 8:28 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
Hi David,
There's an older Tx timeout issue showing up on etsec2 devices
with 2 CPUs. I pinned this issue down to processing overhead
incurred by supporting multiple Tx/Rx rings, as explained in
the 2nd patch below. Before that, there's also a concurrency
issue leading to spurious Rx/Tx interrupts, addressed by the
'Tx NAPI' patch below.
The Tx timeout can be triggered with multiple parallel Tx flows
('iperf -c 172.16.1.3 -n 1000M -P 8') on a 2-CPU etsec2-based (P1020) board.
Before the patches:
"""
root@p1020rdb-pc:~# iperf -c 172.16.1.3 -n 1000M -P 8 &
[...]
root@p1020rdb-pc:~# NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 1 timed out
WARNING: at net/sched/sch_generic.c:279
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc3-03386-g89ea59c #23
task: ed84ef40 ti: ed868000 task.ti: ed868000
NIP: c04627a8 LR: c04627a8 CTR: c02fb270
REGS: ed869d00 TRAP: 0700 Not tainted (3.13.0-rc3-03386-g89ea59c)
MSR: 00029000 <CE,EE,ME> CR: 44000022 XER: 20000000
[...]
root@p1020rdb-pc:~# [ ID] Interval Transfer Bandwidth
[ 5] 0.0-19.3 sec 1000 MBytes 434 Mbits/sec
[ 8] 0.0-39.7 sec 1000 MBytes 211 Mbits/sec
[ 9] 0.0-40.1 sec 1000 MBytes 209 Mbits/sec
[ 3] 0.0-40.2 sec 1000 MBytes 209 Mbits/sec
[ 10] 0.0-59.0 sec 1000 MBytes 142 Mbits/sec
[ 7] 0.0-74.6 sec 1000 MBytes 112 Mbits/sec
[ 6] 0.0-74.7 sec 1000 MBytes 112 Mbits/sec
[ 4] 0.0-74.7 sec 1000 MBytes 112 Mbits/sec
[SUM] 0.0-74.7 sec 7.81 GBytes 898 Mbits/sec
root@p1020rdb-pc:~# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:04:9f:00:13:01
inet addr:172.16.1.1 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::204:9fff:fe00:1301/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:708722 errors:0 dropped:0 overruns:0 frame:0
TX packets:8717849 errors:6 dropped:0 overruns:1470 carrier:0
collisions:0 txqueuelen:1000
RX bytes:58118018 (55.4 MiB) TX bytes:274069482 (261.3 MiB)
Base address:0xa000
"""
After applying the patches:
"""
root@p1020rdb-pc:~# iperf -c 172.16.1.3 -n 1000M -P 8 &
[...]
root@p1020rdb-pc:~# [ ID] Interval Transfer Bandwidth
[ 9] 0.0-70.5 sec 1000 MBytes 119 Mbits/sec
[ 5] 0.0-70.5 sec 1000 MBytes 119 Mbits/sec
[ 6] 0.0-70.7 sec 1000 MBytes 119 Mbits/sec
[ 4] 0.0-71.0 sec 1000 MBytes 118 Mbits/sec
[ 8] 0.0-71.1 sec 1000 MBytes 118 Mbits/sec
[ 3] 0.0-71.2 sec 1000 MBytes 118 Mbits/sec
[ 10] 0.0-71.3 sec 1000 MBytes 118 Mbits/sec
[ 7] 0.0-71.3 sec 1000 MBytes 118 Mbits/sec
[SUM] 0.0-71.3 sec 7.81 GBytes 942 Mbits/sec
root@p1020rdb-pc:~# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:04:9f:00:13:01
inet addr:172.16.1.1 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::204:9fff:fe00:1301/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:728446 errors:0 dropped:0 overruns:0 frame:0
TX packets:8690057 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:59732650 (56.9 MiB) TX bytes:271554306 (258.9 MiB)
Base address:0xa000
"""
Thanks.
Claudiu Manoil (2):
gianfar: Separate out the Tx interrupt handling (Tx NAPI)
gianfar: Make multi-queue polling optional
drivers/net/ethernet/freescale/gianfar.c | 248 +++++++++++++++++++++++--------
drivers/net/ethernet/freescale/gianfar.h | 29 ++--
2 files changed, 202 insertions(+), 75 deletions(-)
--
1.7.11.7
* [PATCH net-next 1/2] gianfar: Separate out the Tx interrupt handling (Tx NAPI)
2014-03-05 8:28 [PATCH net-next 0/2] gianfar: Tx timeout issue Claudiu Manoil
@ 2014-03-05 8:28 ` Claudiu Manoil
2014-03-05 8:28 ` [PATCH net-next 2/2] gianfar: Make multi-queue polling optional Claudiu Manoil
1 sibling, 0 replies; 6+ messages in thread
From: Claudiu Manoil @ 2014-03-05 8:28 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
There are some concurrency issues on devices w/ 2 CPUs related
to the handling of Rx and Tx interrupts. eTSEC has separate
interrupt lines for Rx and Tx but a single imask register
to mask these interrupts and a single NAPI instance to handle
both Rx and Tx work. As a result, the Rx and Tx ISRs are
identical, both invoking gfar_schedule_cleanup(); however,
both handlers can be entered at the same time when the Rx and
Tx interrupts are taken by different CPUs. In this case,
spurious interrupts (SPU) show up (in /proc/interrupts),
indicating a concurrency issue. Also, Tx overruns followed
by Tx timeout have been observed under heavy Tx traffic load.
To address these issues, the schedule cleanup ISR part has
been changed to handle the Rx and Tx interrupts independently.
The patch adds a separate NAPI poll routine for Tx cleanup, to
be triggered independently by the Tx confirmation interrupts
only. Existing poll functions are modified to handle only
the Rx path processing. The Tx poll routine does not need a
budget, since Tx processing doesn't consume NAPI budget, and
hence it is registered with minimum NAPI weight.
NAPI scheduling does not require locking since there are
different NAPI instances between the Rx and Tx confirmation
paths now.
So, the patch fixes the occurrence of spurious Rx/Tx interrupts.
Tx overruns also occur less frequently now.
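The imask split introduced by this patch can be illustrated in plain C.
Note: the bit values and the simplified IMASK_DEFAULT below are hypothetical,
for illustration only; the real eTSEC register layout lives in gianfar.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments for illustration only. */
#define IMASK_RXFEN0 0x80000000u
#define IMASK_BSY    0x20000000u
#define IMASK_TXFEN  0x00800000u
#define IMASK_TXBEN  0x00200000u
/* Simplified: the real IMASK_DEFAULT enables more sources. */
#define IMASK_DEFAULT (IMASK_RXFEN0 | IMASK_BSY | IMASK_TXFEN | IMASK_TXBEN)

#define IMASK_RX_DEFAULT (IMASK_RXFEN0 | IMASK_BSY)
#define IMASK_TX_DEFAULT (IMASK_TXFEN | IMASK_TXBEN)

#define IMASK_RX_DISABLED ((~(IMASK_RX_DEFAULT)) & IMASK_DEFAULT)
#define IMASK_TX_DISABLED ((~(IMASK_TX_DEFAULT)) & IMASK_DEFAULT)

/* What the Rx ISR does before scheduling its NAPI instance:
 * mask only the Rx interrupt sources, leaving Tx sources enabled. */
static uint32_t mask_rx(uint32_t imask)
{
	return imask & IMASK_RX_DISABLED;
}

/* Likewise for the Tx ISR: mask only the Tx sources. */
static uint32_t mask_tx(uint32_t imask)
{
	return imask & IMASK_TX_DISABLED;
}
```

Masking Rx leaves exactly the Tx sources set (and vice versa), which is why
the two NAPI instances no longer step on each other's interrupt enables.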
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
drivers/net/ethernet/freescale/gianfar.c | 218 ++++++++++++++++++++++---------
drivers/net/ethernet/freescale/gianfar.h | 11 +-
2 files changed, 160 insertions(+), 69 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index c5b9320..1aa2d55 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -128,8 +128,10 @@ static void free_skb_resources(struct gfar_private *priv);
static void gfar_set_multi(struct net_device *dev);
static void gfar_set_hash_for_addr(struct net_device *dev, u8 *addr);
static void gfar_configure_serdes(struct net_device *dev);
-static int gfar_poll(struct napi_struct *napi, int budget);
-static int gfar_poll_sq(struct napi_struct *napi, int budget);
+static int gfar_poll_rx(struct napi_struct *napi, int budget);
+static int gfar_poll_tx(struct napi_struct *napi, int budget);
+static int gfar_poll_rx_sq(struct napi_struct *napi, int budget);
+static int gfar_poll_tx_sq(struct napi_struct *napi, int budget);
#ifdef CONFIG_NET_POLL_CONTROLLER
static void gfar_netpoll(struct net_device *dev);
#endif
@@ -614,16 +616,20 @@ static void disable_napi(struct gfar_private *priv)
{
int i;
- for (i = 0; i < priv->num_grps; i++)
- napi_disable(&priv->gfargrp[i].napi);
+ for (i = 0; i < priv->num_grps; i++) {
+ napi_disable(&priv->gfargrp[i].napi_rx);
+ napi_disable(&priv->gfargrp[i].napi_tx);
+ }
}
static void enable_napi(struct gfar_private *priv)
{
int i;
- for (i = 0; i < priv->num_grps; i++)
- napi_enable(&priv->gfargrp[i].napi);
+ for (i = 0; i < priv->num_grps; i++) {
+ napi_enable(&priv->gfargrp[i].napi_rx);
+ napi_enable(&priv->gfargrp[i].napi_tx);
+ }
}
static int gfar_parse_group(struct device_node *np,
@@ -1257,13 +1263,19 @@ static int gfar_probe(struct platform_device *ofdev)
dev->ethtool_ops = &gfar_ethtool_ops;
/* Register for napi ...We are registering NAPI for each grp */
- if (priv->mode == SQ_SG_MODE)
- netif_napi_add(dev, &priv->gfargrp[0].napi, gfar_poll_sq,
+ if (priv->mode == SQ_SG_MODE) {
+ netif_napi_add(dev, &priv->gfargrp[0].napi_rx, gfar_poll_rx_sq,
GFAR_DEV_WEIGHT);
- else
- for (i = 0; i < priv->num_grps; i++)
- netif_napi_add(dev, &priv->gfargrp[i].napi, gfar_poll,
- GFAR_DEV_WEIGHT);
+ netif_napi_add(dev, &priv->gfargrp[0].napi_tx, gfar_poll_tx_sq,
+ 2);
+ } else {
+ for (i = 0; i < priv->num_grps; i++) {
+ netif_napi_add(dev, &priv->gfargrp[i].napi_rx,
+ gfar_poll_rx, GFAR_DEV_WEIGHT);
+ netif_napi_add(dev, &priv->gfargrp[i].napi_tx,
+ gfar_poll_tx, 2);
+ }
+ }
if (priv->device_flags & FSL_GIANFAR_DEV_HAS_CSUM) {
dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG |
@@ -2538,31 +2550,6 @@ static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
netdev_tx_completed_queue(txq, howmany, bytes_sent);
}
-static void gfar_schedule_cleanup(struct gfar_priv_grp *gfargrp)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&gfargrp->grplock, flags);
- if (napi_schedule_prep(&gfargrp->napi)) {
- gfar_write(&gfargrp->regs->imask, IMASK_RTX_DISABLED);
- __napi_schedule(&gfargrp->napi);
- } else {
- /* Clear IEVENT, so interrupts aren't called again
- * because of the packets that have already arrived.
- */
- gfar_write(&gfargrp->regs->ievent, IEVENT_RTX_MASK);
- }
- spin_unlock_irqrestore(&gfargrp->grplock, flags);
-
-}
-
-/* Interrupt Handler for Transmit complete */
-static irqreturn_t gfar_transmit(int irq, void *grp_id)
-{
- gfar_schedule_cleanup((struct gfar_priv_grp *)grp_id);
- return IRQ_HANDLED;
-}
-
static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
struct sk_buff *skb)
{
@@ -2633,7 +2620,48 @@ static inline void count_errors(unsigned short status, struct net_device *dev)
irqreturn_t gfar_receive(int irq, void *grp_id)
{
- gfar_schedule_cleanup((struct gfar_priv_grp *)grp_id);
+ struct gfar_priv_grp *grp = (struct gfar_priv_grp *)grp_id;
+ unsigned long flags;
+ u32 imask;
+
+ if (likely(napi_schedule_prep(&grp->napi_rx))) {
+ spin_lock_irqsave(&grp->grplock, flags);
+ imask = gfar_read(&grp->regs->imask);
+ imask &= IMASK_RX_DISABLED;
+ gfar_write(&grp->regs->imask, imask);
+ spin_unlock_irqrestore(&grp->grplock, flags);
+ __napi_schedule(&grp->napi_rx);
+ } else {
+ /* Clear IEVENT, so interrupts aren't called again
+ * because of the packets that have already arrived.
+ */
+ gfar_write(&grp->regs->ievent, IEVENT_RX_MASK);
+ }
+
+ return IRQ_HANDLED;
+}
+
+/* Interrupt Handler for Transmit complete */
+static irqreturn_t gfar_transmit(int irq, void *grp_id)
+{
+ struct gfar_priv_grp *grp = (struct gfar_priv_grp *)grp_id;
+ unsigned long flags;
+ u32 imask;
+
+ if (likely(napi_schedule_prep(&grp->napi_tx))) {
+ spin_lock_irqsave(&grp->grplock, flags);
+ imask = gfar_read(&grp->regs->imask);
+ imask &= IMASK_TX_DISABLED;
+ gfar_write(&grp->regs->imask, imask);
+ spin_unlock_irqrestore(&grp->grplock, flags);
+ __napi_schedule(&grp->napi_tx);
+ } else {
+ /* Clear IEVENT, so interrupts aren't called again
+ * because of the packets that have already arrived.
+ */
+ gfar_write(&grp->regs->ievent, IEVENT_TX_MASK);
+ }
+
return IRQ_HANDLED;
}
@@ -2757,7 +2785,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
rx_queue->stats.rx_bytes += pkt_len;
skb_record_rx_queue(skb, rx_queue->qindex);
gfar_process_frame(dev, skb, amount_pull,
- &rx_queue->grp->napi);
+ &rx_queue->grp->napi_rx);
} else {
netif_warn(priv, rx_err, dev, "Missing skb!\n");
@@ -2786,55 +2814,81 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
return howmany;
}
-static int gfar_poll_sq(struct napi_struct *napi, int budget)
+static int gfar_poll_rx_sq(struct napi_struct *napi, int budget)
{
struct gfar_priv_grp *gfargrp =
- container_of(napi, struct gfar_priv_grp, napi);
+ container_of(napi, struct gfar_priv_grp, napi_rx);
struct gfar __iomem *regs = gfargrp->regs;
- struct gfar_priv_tx_q *tx_queue = gfargrp->priv->tx_queue[0];
struct gfar_priv_rx_q *rx_queue = gfargrp->priv->rx_queue[0];
int work_done = 0;
/* Clear IEVENT, so interrupts aren't called again
* because of the packets that have already arrived
*/
- gfar_write(&regs->ievent, IEVENT_RTX_MASK);
-
- /* run Tx cleanup to completion */
- if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx])
- gfar_clean_tx_ring(tx_queue);
+ gfar_write(&regs->ievent, IEVENT_RX_MASK);
work_done = gfar_clean_rx_ring(rx_queue, budget);
if (work_done < budget) {
+ u32 imask;
napi_complete(napi);
/* Clear the halt bit in RSTAT */
gfar_write(&regs->rstat, gfargrp->rstat);
- gfar_write(&regs->imask, IMASK_DEFAULT);
+ spin_lock_irq(&gfargrp->grplock);
+ imask = gfar_read(&regs->imask);
+ imask |= IMASK_RX_DEFAULT;
+ gfar_write(&regs->imask, imask);
+ spin_unlock_irq(&gfargrp->grplock);
}
return work_done;
}
-static int gfar_poll(struct napi_struct *napi, int budget)
+static int gfar_poll_tx_sq(struct napi_struct *napi, int budget)
{
struct gfar_priv_grp *gfargrp =
- container_of(napi, struct gfar_priv_grp, napi);
+ container_of(napi, struct gfar_priv_grp, napi_tx);
+ struct gfar __iomem *regs = gfargrp->regs;
+ struct gfar_priv_tx_q *tx_queue = gfargrp->priv->tx_queue[0];
+ u32 imask;
+
+ /* Clear IEVENT, so interrupts aren't called again
+ * because of the packets that have already arrived
+ */
+ gfar_write(&regs->ievent, IEVENT_TX_MASK);
+
+ /* run Tx cleanup to completion */
+ if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx])
+ gfar_clean_tx_ring(tx_queue);
+
+ napi_complete(napi);
+
+ spin_lock_irq(&gfargrp->grplock);
+ imask = gfar_read(&regs->imask);
+ imask |= IMASK_TX_DEFAULT;
+ gfar_write(&regs->imask, imask);
+ spin_unlock_irq(&gfargrp->grplock);
+
+ return 0;
+}
+
+static int gfar_poll_rx(struct napi_struct *napi, int budget)
+{
+ struct gfar_priv_grp *gfargrp =
+ container_of(napi, struct gfar_priv_grp, napi_rx);
struct gfar_private *priv = gfargrp->priv;
struct gfar __iomem *regs = gfargrp->regs;
- struct gfar_priv_tx_q *tx_queue = NULL;
struct gfar_priv_rx_q *rx_queue = NULL;
int work_done = 0, work_done_per_q = 0;
int i, budget_per_q = 0;
- int has_tx_work = 0;
unsigned long rstat_rxf;
int num_act_queues;
/* Clear IEVENT, so interrupts aren't called again
* because of the packets that have already arrived
*/
- gfar_write(&regs->ievent, IEVENT_RTX_MASK);
+ gfar_write(&regs->ievent, IEVENT_RX_MASK);
rstat_rxf = gfar_read(&regs->rstat) & RSTAT_RXF_MASK;
@@ -2842,15 +2896,6 @@ static int gfar_poll(struct napi_struct *napi, int budget)
if (num_act_queues)
budget_per_q = budget/num_act_queues;
- for_each_set_bit(i, &gfargrp->tx_bit_map, priv->num_tx_queues) {
- tx_queue = priv->tx_queue[i];
- /* run Tx cleanup to completion */
- if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
- gfar_clean_tx_ring(tx_queue);
- has_tx_work = 1;
- }
- }
-
for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
/* skip queue if not active */
if (!(rstat_rxf & (RSTAT_CLEAR_RXF0 >> i)))
@@ -2873,19 +2918,62 @@ static int gfar_poll(struct napi_struct *napi, int budget)
}
}
- if (!num_act_queues && !has_tx_work) {
-
+ if (!num_act_queues) {
+ u32 imask;
napi_complete(napi);
/* Clear the halt bit in RSTAT */
gfar_write(&regs->rstat, gfargrp->rstat);
- gfar_write(&regs->imask, IMASK_DEFAULT);
+ spin_lock_irq(&gfargrp->grplock);
+ imask = gfar_read(&regs->imask);
+ imask |= IMASK_RX_DEFAULT;
+ gfar_write(&regs->imask, imask);
+ spin_unlock_irq(&gfargrp->grplock);
}
return work_done;
}
+static int gfar_poll_tx(struct napi_struct *napi, int budget)
+{
+ struct gfar_priv_grp *gfargrp =
+ container_of(napi, struct gfar_priv_grp, napi_tx);
+ struct gfar_private *priv = gfargrp->priv;
+ struct gfar __iomem *regs = gfargrp->regs;
+ struct gfar_priv_tx_q *tx_queue = NULL;
+ int has_tx_work = 0;
+ int i;
+
+ /* Clear IEVENT, so interrupts aren't called again
+ * because of the packets that have already arrived
+ */
+ gfar_write(&regs->ievent, IEVENT_TX_MASK);
+
+ for_each_set_bit(i, &gfargrp->tx_bit_map, priv->num_tx_queues) {
+ tx_queue = priv->tx_queue[i];
+ /* run Tx cleanup to completion */
+ if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
+ gfar_clean_tx_ring(tx_queue);
+ has_tx_work = 1;
+ }
+ }
+
+ if (!has_tx_work) {
+ u32 imask;
+ napi_complete(napi);
+
+ spin_lock_irq(&gfargrp->grplock);
+ imask = gfar_read(&regs->imask);
+ imask |= IMASK_TX_DEFAULT;
+ gfar_write(&regs->imask, imask);
+ spin_unlock_irq(&gfargrp->grplock);
+ }
+
+ return 0;
+}
+
+
#ifdef CONFIG_NET_POLL_CONTROLLER
/* Polling 'interrupt' - used by things like netconsole to send skbs
* without having to re-enable interrupts. It's not called while
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 1e16216..1aeb34e 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -377,8 +377,11 @@ extern const char gfar_driver_version[];
IMASK_RXFEN0 | IMASK_BSY | IMASK_EBERR | IMASK_BABR | \
IMASK_XFUN | IMASK_RXC | IMASK_BABT | IMASK_DPE \
| IMASK_PERR)
-#define IMASK_RTX_DISABLED ((~(IMASK_RXFEN0 | IMASK_TXFEN | IMASK_BSY)) \
- & IMASK_DEFAULT)
+#define IMASK_RX_DEFAULT (IMASK_RXFEN0 | IMASK_BSY)
+#define IMASK_TX_DEFAULT (IMASK_TXFEN | IMASK_TXBEN)
+
+#define IMASK_RX_DISABLED ((~(IMASK_RX_DEFAULT)) & IMASK_DEFAULT)
+#define IMASK_TX_DISABLED ((~(IMASK_TX_DEFAULT)) & IMASK_DEFAULT)
/* Fifo management */
#define FIFO_TX_THR_MASK 0x01ff
@@ -1014,13 +1017,13 @@ struct gfar_irqinfo {
struct gfar_priv_grp {
spinlock_t grplock __attribute__ ((aligned (SMP_CACHE_BYTES)));
- struct napi_struct napi;
+ struct napi_struct napi_rx;
+ struct napi_struct napi_tx;
struct gfar_private *priv;
struct gfar __iomem *regs;
unsigned int rstat;
unsigned long num_rx_queues;
unsigned long rx_bit_map;
- /* cacheline 3 */
unsigned int tstat;
unsigned long num_tx_queues;
unsigned long tx_bit_map;
--
1.7.11.7
* [PATCH net-next 2/2] gianfar: Make multi-queue polling optional
2014-03-05 8:28 [PATCH net-next 0/2] gianfar: Tx timeout issue Claudiu Manoil
2014-03-05 8:28 ` [PATCH net-next 1/2] gianfar: Separate out the Tx interrupt handling (Tx NAPI) Claudiu Manoil
@ 2014-03-05 8:28 ` Claudiu Manoil
2014-03-06 21:52 ` David Miller
1 sibling, 1 reply; 6+ messages in thread
From: Claudiu Manoil @ 2014-03-05 8:28 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
For the newer controllers (etsec2 models) the driver currently
supports 8 Tx and Rx DMA rings (aka HW queues). However, there
are only 2 pairs of Rx/Tx interrupt lines, as these controllers
are integrated in low power SoCs with 2 CPUs at most. As a result,
there are at most 2 NAPI instances that have to service multiple
Tx and Rx queues for these devices. This complicates the NAPI
polling routine, which has to iterate over the multiple Rx/Tx queues
hooked to the same interrupt lines. There's also an overhead
at HW level, as the controller needs to service all 8 Tx rings
in a round-robin manner. The cumulative overhead shows up for multiple
parallel Tx flows transmitted by the kernel stack, where the driver
usually starts returning NETDEV_TX_BUSY, leading to NETDEV WATCHDOG
Tx timeouts if the Tx path is congested for too long.
As an alternative, this patch makes the driver support only one
Tx/Rx DMA ring per NAPI instance (per interrupt group, or pair
of Tx/Rx interrupt lines) by default. The simplified single-queue
polling routine (gfar_poll_sq) becomes the default NAPI poll routine
for the etsec2 devices too. Only small adjustments were needed
to link the Tx/Rx HW queues to each NAPI instance (2 in this case).
gfar_poll_sq() is already successfully used by the older SQ_SG (single
interrupt group) controllers. There's also a significant memory
footprint reduction from supporting 2 Rx/Tx DMA rings (at most) instead
of 8.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
drivers/net/ethernet/freescale/gianfar.c | 40 +++++++++++++++++++++++++++-----
drivers/net/ethernet/freescale/gianfar.h | 18 ++++++++++----
2 files changed, 47 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 1aa2d55..829eb34 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -128,8 +128,10 @@ static void free_skb_resources(struct gfar_private *priv);
static void gfar_set_multi(struct net_device *dev);
static void gfar_set_hash_for_addr(struct net_device *dev, u8 *addr);
static void gfar_configure_serdes(struct net_device *dev);
+#ifdef GFAR_MULTI_Q_POLL
static int gfar_poll_rx(struct napi_struct *napi, int budget);
static int gfar_poll_tx(struct napi_struct *napi, int budget);
+#endif
static int gfar_poll_rx_sq(struct napi_struct *napi, int budget);
static int gfar_poll_tx_sq(struct napi_struct *napi, int budget);
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -636,7 +638,6 @@ static int gfar_parse_group(struct device_node *np,
struct gfar_private *priv, const char *model)
{
struct gfar_priv_grp *grp = &priv->gfargrp[priv->num_grps];
- u32 *queue_mask;
int i;
for (i = 0; i < GFAR_NUM_IRQS; i++) {
@@ -665,12 +666,18 @@ static int gfar_parse_group(struct device_node *np,
grp->priv = priv;
spin_lock_init(&grp->grplock);
if (priv->mode == MQ_MG_MODE) {
+#ifdef GFAR_MULTI_Q_POLL
+ u32 *queue_mask;
queue_mask = (u32 *)of_get_property(np, "fsl,rx-bit-map", NULL);
grp->rx_bit_map = queue_mask ?
*queue_mask : (DEFAULT_MAPPING >> priv->num_grps);
queue_mask = (u32 *)of_get_property(np, "fsl,tx-bit-map", NULL);
grp->tx_bit_map = queue_mask ?
*queue_mask : (DEFAULT_MAPPING >> priv->num_grps);
+#else /* One Q per interrupt group: Q0 to G0, Q1 to G1 */
+ grp->rx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
+ grp->tx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
+#endif
} else {
grp->rx_bit_map = 0xFF;
grp->tx_bit_map = 0xFF;
@@ -686,6 +693,8 @@ static int gfar_parse_group(struct device_node *np,
* also assign queues to groups
*/
for_each_set_bit(i, &grp->rx_bit_map, priv->num_rx_queues) {
+ if (!grp->rx_queue)
+ grp->rx_queue = priv->rx_queue[i];
grp->num_rx_queues++;
grp->rstat |= (RSTAT_CLEAR_RHALT >> i);
priv->rqueue |= ((RQUEUE_EN0 | RQUEUE_EX0) >> i);
@@ -693,6 +702,8 @@ static int gfar_parse_group(struct device_node *np,
}
for_each_set_bit(i, &grp->tx_bit_map, priv->num_tx_queues) {
+ if (!grp->tx_queue)
+ grp->tx_queue = priv->tx_queue[i];
grp->num_tx_queues++;
grp->tstat |= (TSTAT_CLEAR_THALT >> i);
priv->tqueue |= (TQUEUE_EN0 >> i);
@@ -726,6 +737,9 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
/* parse the num of tx and rx queues */
tx_queues = (u32 *)of_get_property(np, "fsl,num_tx_queues", NULL);
num_tx_qs = tx_queues ? *tx_queues : 1;
+#ifndef GFAR_MULTI_Q_POLL
+ num_tx_qs = (num_tx_qs > 2) ? 2 : num_tx_qs; /* one q per int group */
+#endif
if (num_tx_qs > MAX_TX_QS) {
pr_err("num_tx_qs(=%d) greater than MAX_TX_QS(=%d)\n",
@@ -736,6 +750,9 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
rx_queues = (u32 *)of_get_property(np, "fsl,num_rx_queues", NULL);
num_rx_qs = rx_queues ? *rx_queues : 1;
+#ifndef GFAR_MULTI_Q_POLL
+ num_rx_qs = (num_rx_qs > 2) ? 2 : num_rx_qs; /* one q per int group */
+#endif
if (num_rx_qs > MAX_RX_QS) {
pr_err("num_rx_qs(=%d) greater than MAX_RX_QS(=%d)\n",
@@ -1271,9 +1288,19 @@ static int gfar_probe(struct platform_device *ofdev)
} else {
for (i = 0; i < priv->num_grps; i++) {
netif_napi_add(dev, &priv->gfargrp[i].napi_rx,
- gfar_poll_rx, GFAR_DEV_WEIGHT);
+#ifdef GFAR_MULTI_Q_POLL
+ gfar_poll_rx,
+#else
+ gfar_poll_rx_sq,
+#endif
+ GFAR_DEV_WEIGHT);
netif_napi_add(dev, &priv->gfargrp[i].napi_tx,
- gfar_poll_tx, 2);
+#ifdef GFAR_MULTI_Q_POLL
+ gfar_poll_tx,
+#else
+ gfar_poll_tx_sq,
+#endif
+ 2);
}
}
@@ -2819,7 +2846,7 @@ static int gfar_poll_rx_sq(struct napi_struct *napi, int budget)
struct gfar_priv_grp *gfargrp =
container_of(napi, struct gfar_priv_grp, napi_rx);
struct gfar __iomem *regs = gfargrp->regs;
- struct gfar_priv_rx_q *rx_queue = gfargrp->priv->rx_queue[0];
+ struct gfar_priv_rx_q *rx_queue = gfargrp->rx_queue;
int work_done = 0;
/* Clear IEVENT, so interrupts aren't called again
@@ -2850,7 +2877,7 @@ static int gfar_poll_tx_sq(struct napi_struct *napi, int budget)
struct gfar_priv_grp *gfargrp =
container_of(napi, struct gfar_priv_grp, napi_tx);
struct gfar __iomem *regs = gfargrp->regs;
- struct gfar_priv_tx_q *tx_queue = gfargrp->priv->tx_queue[0];
+ struct gfar_priv_tx_q *tx_queue = gfargrp->tx_queue;
u32 imask;
/* Clear IEVENT, so interrupts aren't called again
@@ -2873,6 +2900,7 @@ static int gfar_poll_tx_sq(struct napi_struct *napi, int budget)
return 0;
}
+#ifdef GFAR_MULTI_Q_POLL
static int gfar_poll_rx(struct napi_struct *napi, int budget)
{
struct gfar_priv_grp *gfargrp =
@@ -2972,7 +3000,7 @@ static int gfar_poll_tx(struct napi_struct *napi, int budget)
return 0;
}
-
+#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
/* Polling 'interrupt' - used by things like netconsole to send skbs
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 1aeb34e..a90c848 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -410,9 +410,14 @@ extern const char gfar_driver_version[];
#define FPR_FILER_MASK 0xFFFFFFFF
#define MAX_FILER_IDX 0xFF
+#ifdef GFAR_MULTI_Q_POLL
/* This default RIR value directly corresponds
* to the 3-bit hash value generated */
#define DEFAULT_RIR0 0x05397700
+#else /* only 2 Qs used */
+/* Map even hash values to Q0, and odd ones to Q1 */
+#define DEFAULT_RIR0 0x04104100
+#endif
/* RQFCR register bits */
#define RQFCR_GPI 0x80000000
@@ -1016,17 +1021,20 @@ struct gfar_irqinfo {
*/
struct gfar_priv_grp {
- spinlock_t grplock __attribute__ ((aligned (SMP_CACHE_BYTES)));
+ spinlock_t grplock __aligned(SMP_CACHE_BYTES);
struct napi_struct napi_rx;
struct napi_struct napi_tx;
- struct gfar_private *priv;
struct gfar __iomem *regs;
- unsigned int rstat;
- unsigned long num_rx_queues;
- unsigned long rx_bit_map;
+ struct gfar_priv_tx_q *tx_queue;
+ struct gfar_priv_rx_q *rx_queue;
unsigned int tstat;
+ unsigned int rstat;
+
+ struct gfar_private *priv;
unsigned long num_tx_queues;
unsigned long tx_bit_map;
+ unsigned long num_rx_queues;
+ unsigned long rx_bit_map;
struct gfar_irqinfo *irqinfo[GFAR_NUM_IRQS];
};
--
1.7.11.7
* Re: [PATCH net-next 2/2] gianfar: Make multi-queue polling optional
2014-03-05 8:28 ` [PATCH net-next 2/2] gianfar: Make multi-queue polling optional Claudiu Manoil
@ 2014-03-06 21:52 ` David Miller
2014-03-07 12:52 ` Claudiu Manoil
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2014-03-06 21:52 UTC (permalink / raw)
To: claudiu.manoil; +Cc: netdev
From: Claudiu Manoil <claudiu.manoil@freescale.com>
Date: Wed, 5 Mar 2014 10:28:39 +0200
> For the newer controllers (etsec2 models) the driver currently
> supports 8 Tx and Rx DMA rings (aka HW queues). However, there
> are only 2 pairs of Rx/Tx interrupt lines, as these controllers
> are integrated in low power SoCs with 2 CPUs at most. As a result,
> there are at most 2 NAPI instances that have to service multiple
> Tx and Rx queues for these devices. This complicates the NAPI
> polling routine, which has to iterate over the multiple Rx/Tx queues
> hooked to the same interrupt lines. There's also an overhead
> at HW level, as the controller needs to service all 8 Tx rings
> in a round-robin manner. The cumulative overhead shows up for multiple
> parallel Tx flows transmitted by the kernel stack, where the driver
> usually starts returning NETDEV_TX_BUSY, leading to NETDEV WATCHDOG
> Tx timeouts if the Tx path is congested for too long.
>
> As an alternative, this patch makes the driver support only one
> Tx/Rx DMA ring per NAPI instance (per interrupt group, or pair
> of Tx/Rx interrupt lines) by default. The simplified single-queue
> polling routine (gfar_poll_sq) becomes the default NAPI poll routine
> for the etsec2 devices too. Only small adjustments were needed
> to link the Tx/Rx HW queues to each NAPI instance (2 in this case).
> gfar_poll_sq() is already successfully used by the older SQ_SG (single
> interrupt group) controllers. There's also a significant memory
> footprint reduction from supporting 2 Rx/Tx DMA rings (at most) instead
> of 8.
>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
This patch is not OK.
First of all, you are disabling multi-queue for other devices.
You're adding a CPP check for a macro that is set by nothing.
Determine the condition to limit the number of queues at run-time.
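For illustration, a run-time cap along the lines David suggests could take
this shape (a hypothetical helper sketched here, not the actual follow-up
code): clamp the ring count to one ring per interrupt group unless
multi-queue polling is explicitly in effect.

```c
#include <assert.h>

/* Hypothetical run-time check: limit the number of Tx/Rx rings to one
 * per interrupt group, instead of deciding at compile time with a CPP
 * macro. 'multi_q_poll' would be a per-device runtime flag. */
static int gfar_cap_num_queues(int requested, int num_grps, int multi_q_poll)
{
	if (multi_q_poll)
		return requested; /* legacy multi-queue polling path */
	return requested > num_grps ? num_grps : requested;
}
```

With 2 interrupt groups and the flag off, a device-tree request for 8 rings
would be capped to 2 at probe time, with no #ifdef involved.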
* Re: [PATCH net-next 2/2] gianfar: Make multi-queue polling optional
2014-03-06 21:52 ` David Miller
@ 2014-03-07 12:52 ` Claudiu Manoil
2014-03-07 18:20 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Claudiu Manoil @ 2014-03-07 12:52 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On 3/6/2014 11:52 PM, David Miller wrote:
> From: Claudiu Manoil <claudiu.manoil@freescale.com>
> Date: Wed, 5 Mar 2014 10:28:39 +0200
>
>> For the newer controllers (etsec2 models) the driver currently
>> supports 8 Tx and Rx DMA rings (aka HW queues). However, there
>> are only 2 pairs of Rx/Tx interrupt lines, as these controllers
>> are integrated in low power SoCs with 2 CPUs at most. As a result,
>> there are at most 2 NAPI instances that have to service multiple
>> Tx and Rx queues for these devices. This complicates the NAPI
>> polling routine, which has to iterate over the multiple Rx/Tx queues
>> hooked to the same interrupt lines. There's also an overhead
>> at HW level, as the controller needs to service all 8 Tx rings
>> in a round-robin manner. The cumulative overhead shows up for multiple
>> parallel Tx flows transmitted by the kernel stack, where the driver
>> usually starts returning NETDEV_TX_BUSY, leading to NETDEV WATCHDOG
>> Tx timeouts if the Tx path is congested for too long.
>>
>> As an alternative, this patch makes the driver support only one
>> Tx/Rx DMA ring per NAPI instance (per interrupt group, or pair
>> of Tx/Rx interrupt lines) by default. The simplified single-queue
>> polling routine (gfar_poll_sq) becomes the default NAPI poll routine
>> for the etsec2 devices too. Only small adjustments were needed
>> to link the Tx/Rx HW queues to each NAPI instance (2 in this case).
>> gfar_poll_sq() is already successfully used by the older SQ_SG (single
>> interrupt group) controllers. There's also a significant memory
>> footprint reduction from supporting 2 Rx/Tx DMA rings (at most) instead
>> of 8.
>>
>> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
>
> This patch is not OK.
>
> First of all, you are disabling multi-queue for other devices.
>
> You're adding a CPP check for a macro that is set by nothing.
>
> Determine the condition to limit the number of queues at run-time.
>
>
Hi David,
Thanks for reviewing this and for the concerns.
I agree that using CPP defines is ugly. I reworked
the 2nd patch to do these checks at run-time.
For your first concern, the "fsl,etsec2" models are the
only devices for which multi-queue NAPI polling is used.
The other devices work in SQ_SG mode, they support a single
pair of Rx/Tx queues and already use single queue NAPI polling.
The "fsl,etsec2" models work in MQ_MG_MODE (Multi-Group) because
they have 2 separate pairs of Rx/Tx interrupt lines
(aka interrupt groups), so they can work with 2 (Rx/Tx) NAPI
instances serving one (Rx/Tx) queue each.
So, because these devices can support 2 Rx and 2 Tx queues,
they are still multi-queue, but they can be serviced with the
simplified SQ NAPI poll routine, per interrupt group
(NAPI instance).
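The one-queue-per-group assignment described above can be sketched in C.
The DEFAULT_MAPPING value below is hypothetical (the real one is defined in
gianfar.h); what matters is the shift-per-group pattern the driver uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value for illustration; bit 31 selects queue 0,
 * bit 30 selects queue 1, and so on. */
#define DEFAULT_MAPPING 0x80000000u

/* One queue per interrupt group: group 0 gets Q0, group 1 gets Q1,
 * mirroring grp->rx_bit_map = DEFAULT_MAPPING >> priv->num_grps. */
static uint32_t group_bit_map(int grp_index)
{
	return DEFAULT_MAPPING >> grp_index;
}
```

Each group's bit map selects exactly one ring, and the two maps are
disjoint, so the two NAPI instances never share a ring.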
However, enabling support for all 8 Rx and 8 Tx HW queues
turns out to be a problem, as the combined processing overhead is
too high: multi-queue polling per NAPI instance, plus the eTSEC
controller having to service the 8 Tx queues round-robin.
This results in serious Tx congestion (and Tx timeouts).
As I see it, multi-queue NAPI polling not only creates issues
but is also not justified for this Linux driver. However, instead
of removing this code altogether, I thought it would be safer
to keep it in the driver for a while and add some
checks to limit the number of queues at runtime.
Hopefully this explains the problem a bit clearer.
Thanks.
Claudiu
* Re: [PATCH net-next 2/2] gianfar: Make multi-queue polling optional
2014-03-07 12:52 ` Claudiu Manoil
@ 2014-03-07 18:20 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2014-03-07 18:20 UTC (permalink / raw)
To: claudiu.manoil; +Cc: netdev
From: Claudiu Manoil <claudiu.manoil@freescale.com>
Date: Fri, 7 Mar 2014 14:52:32 +0200
> Hopefully this explains the problem a bit clearer.
You haven't addressed my concerns at all.
You're disabling an entire body of code with no way to enable
it at all.