netdev.vger.kernel.org archive mirror
* [PATCH 0/5] forcedeth: several proposed updates for testing
@ 2007-10-06 15:12 Jeff Garzik
  2007-10-06 15:13 ` [PATCH 1/5] forcedeth: make NAPI unconditional Jeff Garzik
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:12 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


The 'fe-lock' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock

contains the following changes that I would like to get tested:

      [netdrvr] forcedeth: make NAPI unconditional
      [netdrvr] forcedeth: interrupt handling cleanup
      [netdrvr] forcedeth: process TX completions using NAPI
      [netdrvr] forcedeth: internal simplification and cleanups
      [netdrvr] forcedeth: timer overhaul

These are intended for feedback and testing, NOT for merging.

The goals of these changes are:
* move the driver toward a simpler, saner, easier-to-verify locking
  setup -- the irq handler would often acquire/release the lock twice
  for each interrupt -- and hopefully

* to eliminate a rarely exercised, apparently fragile locking scheme
  built around heavy use of disable_irq().  This technique is most often
  employed during NIC reset/reconfiguration, so satisfying this goal
  implies changing the way NIC reset and config are accomplished.

Miscellaneous notes:
* using the new napi_struct infrastructure in net-2.6.24, TX completion
  processing has been moved to a -separate- NAPI polling channel.  Thus
  there are now two napi_structs, one for RX and one for TX, independent
  of each other.

* I feel TX NAPI is a useful tool, because it provides an independent TX
  process control point and system load feedback point, which is why I
  felt it was slightly superior to tasklets.

* But who knows if this is a good idea?  :)  I am interested in
  feedback and criticism on this issue.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] forcedeth: make NAPI unconditional
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
@ 2007-10-06 15:13 ` Jeff Garzik
  2007-10-06 15:14 ` [PATCH 2/5] forcedeth: interrupt handling cleanup Jeff Garzik
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:13 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit 7bfc023b952e8e12c7333efccd2e78023c546a7c
Author: Jeff Garzik <jeff@garzik.org>
Date:   Fri Oct 5 20:50:24 2007 -0400

    [netdrvr] forcedeth: make NAPI unconditional
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/Kconfig     |   17 -----
 drivers/net/forcedeth.c |  149 +-----------------------------------------------
 2 files changed, 4 insertions(+), 162 deletions(-)

7bfc023b952e8e12c7333efccd2e78023c546a7c
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9c635a2..59eab61 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1430,23 +1430,6 @@ config FORCEDETH
 	  <file:Documentation/networking/net-modules.txt>.  The module will be
 	  called forcedeth.
 
-config FORCEDETH_NAPI
-	bool "Use Rx and Tx Polling (NAPI) (EXPERIMENTAL)"
-	depends on FORCEDETH && EXPERIMENTAL
-	help
-	  NAPI is a new driver API designed to reduce CPU and interrupt load
-	  when the driver is receiving lots of packets from the card. It is
-	  still somewhat experimental and thus not yet enabled by default.
-
-	  If your estimated Rx load is 10kpps or more, or if the card will be
-	  deployed on potentially unfriendly networks (e.g. in a firewall),
-	  then say Y here.
-
-	  See <file:Documentation/networking/NAPI_HOWTO.txt> for more
-	  information.
-
-	  If in doubt, say N.
-
 config CS89x0
 	tristate "CS89x0 support"
 	depends on NET_PCI && (ISA || MACH_IXDP2351 || ARCH_IXDP2X01 || ARCH_PNX010X)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index dae30b7..49906cc 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -123,12 +123,7 @@
  * DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few
  * superfluous timer interrupts from the nic.
  */
-#ifdef CONFIG_FORCEDETH_NAPI
-#define DRIVERNAPI "-NAPI"
-#else
-#define DRIVERNAPI
-#endif
-#define FORCEDETH_VERSION		"0.60"
+#define FORCEDETH_VERSION		"1.00"
 #define DRV_NAME			"forcedeth"
 
 #include <linux/module.h>
@@ -1587,7 +1582,6 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 }
 
 /* If rx bufs are exhausted called after 50ms to attempt to refresh */
-#ifdef CONFIG_FORCEDETH_NAPI
 static void nv_do_rx_refill(unsigned long data)
 {
 	struct net_device *dev = (struct net_device *) data;
@@ -1596,41 +1590,6 @@ static void nv_do_rx_refill(unsigned long data)
 	/* Just reschedule NAPI rx processing */
 	netif_rx_schedule(dev, &np->napi);
 }
-#else
-static void nv_do_rx_refill(unsigned long data)
-{
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
-	int retcode;
-
-	if (!using_multi_irqs(dev)) {
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			disable_irq(dev->irq);
-	} else {
-		disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
-	}
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
-		retcode = nv_alloc_rx(dev);
-	else
-		retcode = nv_alloc_rx_optimized(dev);
-	if (retcode) {
-		spin_lock_irq(&np->lock);
-		if (!np->in_shutdown)
-			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-		spin_unlock_irq(&np->lock);
-	}
-	if (!using_multi_irqs(dev)) {
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			enable_irq(dev->irq);
-	} else {
-		enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
-	}
-}
-#endif
 
 static void nv_init_rx(struct net_device *dev)
 {
@@ -2383,11 +2342,8 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb->protocol = eth_type_trans(skb, dev);
 		dprintk(KERN_DEBUG "%s: nv_rx_process: %d bytes, proto %d accepted.\n",
 					dev->name, len, skb->protocol);
-#ifdef CONFIG_FORCEDETH_NAPI
 		netif_receive_skb(skb);
-#else
-		netif_rx(skb);
-#endif
+
 		dev->last_rx = jiffies;
 		np->stats.rx_packets++;
 		np->stats.rx_bytes += len;
@@ -2480,28 +2436,14 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 				dev->name, len, skb->protocol);
 
 			if (likely(!np->vlangrp)) {
-#ifdef CONFIG_FORCEDETH_NAPI
 				netif_receive_skb(skb);
-#else
-				netif_rx(skb);
-#endif
 			} else {
 				vlanflags = le32_to_cpu(np->get_rx.ex->buflow);
-				if (vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
-#ifdef CONFIG_FORCEDETH_NAPI
+				if (vlanflags & NV_RX3_VLAN_TAG_PRESENT)
 					vlan_hwaccel_receive_skb(skb, np->vlangrp,
 								 vlanflags & NV_RX3_VLAN_TAG_MASK);
-#else
-					vlan_hwaccel_rx(skb, np->vlangrp,
-							vlanflags & NV_RX3_VLAN_TAG_MASK);
-#endif
-				} else {
-#ifdef CONFIG_FORCEDETH_NAPI
+				else
 					netif_receive_skb(skb);
-#else
-					netif_rx(skb);
-#endif
-				}
 			}
 
 			dev->last_rx = jiffies;
@@ -3001,7 +2943,6 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
 		nv_tx_done(dev);
 		spin_unlock(&np->lock);
 
-#ifdef CONFIG_FORCEDETH_NAPI
 		if (events & NVREG_IRQ_RX_ALL) {
 			netif_rx_schedule(dev, &np->napi);
 
@@ -3015,16 +2956,6 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
 				writel(np->irqmask, base + NvRegIrqMask);
 			spin_unlock(&np->lock);
 		}
-#else
-		if (nv_rx_process(dev, RX_WORK_PER_LOOP)) {
-			if (unlikely(nv_alloc_rx(dev))) {
-				spin_lock(&np->lock);
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-				spin_unlock(&np->lock);
-			}
-		}
-#endif
 		if (unlikely(events & NVREG_IRQ_LINK)) {
 			spin_lock(&np->lock);
 			nv_link_irq(dev);
@@ -3116,7 +3047,6 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
 		spin_unlock(&np->lock);
 
-#ifdef CONFIG_FORCEDETH_NAPI
 		if (events & NVREG_IRQ_RX_ALL) {
 			netif_rx_schedule(dev, &np->napi);
 
@@ -3130,16 +3060,6 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 				writel(np->irqmask, base + NvRegIrqMask);
 			spin_unlock(&np->lock);
 		}
-#else
-		if (nv_rx_process_optimized(dev, RX_WORK_PER_LOOP)) {
-			if (unlikely(nv_alloc_rx_optimized(dev))) {
-				spin_lock(&np->lock);
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-				spin_unlock(&np->lock);
-			}
-		}
-#endif
 		if (unlikely(events & NVREG_IRQ_LINK)) {
 			spin_lock(&np->lock);
 			nv_link_irq(dev);
@@ -3248,7 +3168,6 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 	return IRQ_RETVAL(i);
 }
 
-#ifdef CONFIG_FORCEDETH_NAPI
 static int nv_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct fe_priv *np = container_of(napi, struct fe_priv, napi);
@@ -3288,9 +3207,7 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 	}
 	return pkts;
 }
-#endif
 
-#ifdef CONFIG_FORCEDETH_NAPI
 static irqreturn_t nv_nic_irq_rx(int foo, void *data)
 {
 	struct net_device *dev = (struct net_device *) data;
@@ -3309,54 +3226,6 @@ static irqreturn_t nv_nic_irq_rx(int foo, void *data)
 	}
 	return IRQ_HANDLED;
 }
-#else
-static irqreturn_t nv_nic_irq_rx(int foo, void *data)
-{
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
-	u8 __iomem *base = get_hwbase(dev);
-	u32 events;
-	int i;
-	unsigned long flags;
-
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_rx\n", dev->name);
-
-	for (i=0; ; i++) {
-		events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_RX_ALL;
-		writel(NVREG_IRQ_RX_ALL, base + NvRegMSIXIrqStatus);
-		dprintk(KERN_DEBUG "%s: rx irq: %08x\n", dev->name, events);
-		if (!(events & np->irqmask))
-			break;
-
-		if (nv_rx_process_optimized(dev, RX_WORK_PER_LOOP)) {
-			if (unlikely(nv_alloc_rx_optimized(dev))) {
-				spin_lock_irqsave(&np->lock, flags);
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-				spin_unlock_irqrestore(&np->lock, flags);
-			}
-		}
-
-		if (unlikely(i > max_interrupt_work)) {
-			spin_lock_irqsave(&np->lock, flags);
-			/* disable interrupts on the nic */
-			writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
-			pci_push(base);
-
-			if (!np->in_shutdown) {
-				np->nic_poll_irq |= NVREG_IRQ_RX_ALL;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock_irqrestore(&np->lock, flags);
-			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_rx.\n", dev->name, i);
-			break;
-		}
-	}
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_rx completed\n", dev->name);
-
-	return IRQ_RETVAL(i);
-}
-#endif
 
 static irqreturn_t nv_nic_irq_other(int foo, void *data)
 {
@@ -4619,9 +4488,7 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 	if (test->flags & ETH_TEST_FL_OFFLINE) {
 		if (netif_running(dev)) {
 			netif_stop_queue(dev);
-#ifdef CONFIG_FORCEDETH_NAPI
 			napi_disable(&np->napi);
-#endif
 			netif_tx_lock_bh(dev);
 			spin_lock_irq(&np->lock);
 			nv_disable_hw_interrupts(dev, np->irqmask);
@@ -4680,9 +4547,7 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 			nv_start_rx(dev);
 			nv_start_tx(dev);
 			netif_start_queue(dev);
-#ifdef CONFIG_FORCEDETH_NAPI
 			napi_enable(&np->napi);
-#endif
 			nv_enable_hw_interrupts(dev, np->irqmask);
 		}
 	}
@@ -4910,9 +4775,7 @@ static int nv_open(struct net_device *dev)
 	nv_start_rx(dev);
 	nv_start_tx(dev);
 	netif_start_queue(dev);
-#ifdef CONFIG_FORCEDETH_NAPI
 	napi_enable(&np->napi);
-#endif
 
 	if (ret) {
 		netif_carrier_on(dev);
@@ -4943,9 +4806,7 @@ static int nv_close(struct net_device *dev)
 	spin_lock_irq(&np->lock);
 	np->in_shutdown = 1;
 	spin_unlock_irq(&np->lock);
-#ifdef CONFIG_FORCEDETH_NAPI
 	napi_disable(&np->napi);
-#endif
 	synchronize_irq(dev->irq);
 
 	del_timer_sync(&np->oom_kick);
@@ -5159,9 +5020,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	dev->poll_controller = nv_poll_controller;
 #endif
-#ifdef CONFIG_FORCEDETH_NAPI
 	netif_napi_add(dev, &np->napi, nv_napi_poll, RX_WORK_PER_LOOP);
-#endif
 	SET_ETHTOOL_OPS(dev, &ops);
 	dev->tx_timeout = nv_tx_timeout;
 	dev->watchdog_timeo = NV_WATCHDOG_TIMEO;


* [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
  2007-10-06 15:13 ` [PATCH 1/5] forcedeth: make NAPI unconditional Jeff Garzik
@ 2007-10-06 15:14 ` Jeff Garzik
  2007-10-07  4:43   ` Yinghai Lu
  2007-10-07  9:03   ` Ingo Molnar
  2007-10-06 15:14 ` [PATCH 3/5] forcedeth: process TX completions using NAPI Jeff Garzik
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:14 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
Author: Jeff Garzik <jeff@garzik.org>
Date:   Fri Oct 5 22:56:05 2007 -0400

    [netdrvr] forcedeth: interrupt handling cleanup
    
    * nv_nic_irq_optimized() and nv_nic_irq_other() were near-complete
      duplicates of nv_nic_irq(), differing only in one function call.
      Consolidate all three into a single interrupt handler, deleting a
      lot of redundant code.
    
    * greatly simplify irq handler locking.
    
      Prior to this change, the irq handler(s) would acquire and release
      np->lock for each action (RX, TX, other events).
    
      For the common case -- RX or TX work -- the lock is always acquired,
      making all successive acquire/release combinations largely redundant.
    
      Acquire the lock at the beginning of the irq handler, and release it at
      the end of the irq handler.  This is simple, easy, and obvious.
    
    * remove irq handler work loop.
    
      All interesting events emanating from the irq handler either have
      their own work loops, or they poke a timer into action.
    
      Therefore, delete the pointless master interrupt handler work loop.
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/forcedeth.c |  325 +++++++++++-------------------------------------
 1 file changed, 77 insertions(+), 248 deletions(-)

a606d2a111cdf948da5d69eb1de5526c5c2dafef
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 49906cc..1d1a5f5 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2917,208 +2917,110 @@ static void nv_link_irq(struct net_device *dev)
 	dprintk(KERN_DEBUG "%s: link change notification done.\n", dev->name);
 }
 
-static irqreturn_t nv_nic_irq(int foo, void *data)
+static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 {
-	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
-	int i;
+	int handled = 0;
 
-	dprintk(KERN_DEBUG "%s: nv_nic_irq\n", dev->name);
+	dprintk(KERN_DEBUG "%s: nv_nic_irq%s\n", dev->name,
+		optimized ? "_optimized" : "");
 
-	for (i=0; ; i++) {
-		if (!(np->msi_flags & NV_MSI_X_ENABLED)) {
-			events = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
-			writel(NVREG_IRQSTAT_MASK, base + NvRegIrqStatus);
-		} else {
-			events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
-			writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
-		}
-		dprintk(KERN_DEBUG "%s: irq: %08x\n", dev->name, events);
-		if (!(events & np->irqmask))
-			break;
+	spin_lock(&np->lock);
 
-		spin_lock(&np->lock);
-		nv_tx_done(dev);
-		spin_unlock(&np->lock);
+	if (!(np->msi_flags & NV_MSI_X_ENABLED)) {
+		events = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
+		writel(NVREG_IRQSTAT_MASK, base + NvRegIrqStatus);
+	} else {
+		events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
+		writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
+	}
 
-		if (events & NVREG_IRQ_RX_ALL) {
-			netif_rx_schedule(dev, &np->napi);
+	dprintk(KERN_DEBUG "%s: irq: %08x\n", dev->name, events);
 
-			/* Disable furthur receive irq's */
-			spin_lock(&np->lock);
-			np->irqmask &= ~NVREG_IRQ_RX_ALL;
+	if (!(events & np->irqmask))
+		goto out;
 
-			if (np->msi_flags & NV_MSI_X_ENABLED)
-				writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			spin_unlock(&np->lock);
-		}
-		if (unlikely(events & NVREG_IRQ_LINK)) {
-			spin_lock(&np->lock);
-			nv_link_irq(dev);
-			spin_unlock(&np->lock);
-		}
-		if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
-			spin_lock(&np->lock);
-			nv_linkchange(dev);
-			spin_unlock(&np->lock);
-			np->link_timeout = jiffies + LINK_TIMEOUT;
-		}
-		if (unlikely(events & (NVREG_IRQ_TX_ERR))) {
-			dprintk(KERN_DEBUG "%s: received irq with events 0x%x. Probably TX fail.\n",
-						dev->name, events);
-		}
-		if (unlikely(events & (NVREG_IRQ_UNKNOWN))) {
-			printk(KERN_DEBUG "%s: received irq with unknown events 0x%x. Please report\n",
-						dev->name, events);
-		}
-		if (unlikely(events & NVREG_IRQ_RECOVER_ERROR)) {
-			spin_lock(&np->lock);
-			/* disable interrupts on the nic */
-			if (!(np->msi_flags & NV_MSI_X_ENABLED))
-				writel(0, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			pci_push(base);
+	if (optimized)
+		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
+	else
+		nv_tx_done(dev);
 
-			if (!np->in_shutdown) {
-				np->nic_poll_irq = np->irqmask;
-				np->recover_error = 1;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock(&np->lock);
-			break;
-		}
-		if (unlikely(i > max_interrupt_work)) {
-			spin_lock(&np->lock);
-			/* disable interrupts on the nic */
-			if (!(np->msi_flags & NV_MSI_X_ENABLED))
-				writel(0, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			pci_push(base);
+	if (events & NVREG_IRQ_RX_ALL) {
+		netif_rx_schedule(dev, &np->napi);
 
-			if (!np->in_shutdown) {
-				np->nic_poll_irq = np->irqmask;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock(&np->lock);
-			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq.\n", dev->name, i);
-			break;
-		}
+		/* Disable furthur receive irq's */
+		np->irqmask &= ~NVREG_IRQ_RX_ALL;
 
+		if (np->msi_flags & NV_MSI_X_ENABLED)
+			writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
+		else
+			writel(np->irqmask, base + NvRegIrqMask);
 	}
-	dprintk(KERN_DEBUG "%s: nv_nic_irq completed\n", dev->name);
 
-	return IRQ_RETVAL(i);
-}
+	if (unlikely(events & NVREG_IRQ_LINK))
+		nv_link_irq(dev);
 
-/**
- * All _optimized functions are used to help increase performance
- * (reduce CPU and increase throughput). They use descripter version 3,
- * compiler directives, and reduce memory accesses.
- */
-static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
-{
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
-	u8 __iomem *base = get_hwbase(dev);
-	u32 events;
-	int i;
+	if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
+		nv_linkchange(dev);
+		np->link_timeout = jiffies + LINK_TIMEOUT;
+	}
 
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_optimized\n", dev->name);
+	if (unlikely(events & (NVREG_IRQ_TX_ERR))) {
+		dprintk(KERN_DEBUG "%s: received irq with events 0x%x. "
+			"Probably TX fail.\n",
+			dev->name, events);
+	}
 
-	for (i=0; ; i++) {
-		if (!(np->msi_flags & NV_MSI_X_ENABLED)) {
-			events = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
-			writel(NVREG_IRQSTAT_MASK, base + NvRegIrqStatus);
-		} else {
-			events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
-			writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
-		}
-		dprintk(KERN_DEBUG "%s: irq: %08x\n", dev->name, events);
-		if (!(events & np->irqmask))
-			break;
+	if (unlikely(events & (NVREG_IRQ_UNKNOWN))) {
+		printk(KERN_DEBUG "%s: received irq with unknown events 0x%x. "
+			"Please report\n",
+			dev->name, events);
+	}
 
-		spin_lock(&np->lock);
-		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
-		spin_unlock(&np->lock);
+	if (unlikely(events & NVREG_IRQ_RECOVER_ERROR)) {
+		/* disable interrupts on the nic */
+		if (!(np->msi_flags & NV_MSI_X_ENABLED))
+			writel(0, base + NvRegIrqMask);
+		else
+			writel(np->irqmask, base + NvRegIrqMask);
+		pci_push(base);
 
-		if (events & NVREG_IRQ_RX_ALL) {
-			netif_rx_schedule(dev, &np->napi);
+		if (!np->in_shutdown) {
+			np->nic_poll_irq = np->irqmask;
+			np->recover_error = 1;
+			mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
+		}
+	}
 
-			/* Disable furthur receive irq's */
-			spin_lock(&np->lock);
-			np->irqmask &= ~NVREG_IRQ_RX_ALL;
+	handled = 1;
 
-			if (np->msi_flags & NV_MSI_X_ENABLED)
-				writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			spin_unlock(&np->lock);
-		}
-		if (unlikely(events & NVREG_IRQ_LINK)) {
-			spin_lock(&np->lock);
-			nv_link_irq(dev);
-			spin_unlock(&np->lock);
-		}
-		if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
-			spin_lock(&np->lock);
-			nv_linkchange(dev);
-			spin_unlock(&np->lock);
-			np->link_timeout = jiffies + LINK_TIMEOUT;
-		}
-		if (unlikely(events & (NVREG_IRQ_TX_ERR))) {
-			dprintk(KERN_DEBUG "%s: received irq with events 0x%x. Probably TX fail.\n",
-						dev->name, events);
-		}
-		if (unlikely(events & (NVREG_IRQ_UNKNOWN))) {
-			printk(KERN_DEBUG "%s: received irq with unknown events 0x%x. Please report\n",
-						dev->name, events);
-		}
-		if (unlikely(events & NVREG_IRQ_RECOVER_ERROR)) {
-			spin_lock(&np->lock);
-			/* disable interrupts on the nic */
-			if (!(np->msi_flags & NV_MSI_X_ENABLED))
-				writel(0, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			pci_push(base);
+out:
+	spin_unlock(&np->lock);
 
-			if (!np->in_shutdown) {
-				np->nic_poll_irq = np->irqmask;
-				np->recover_error = 1;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock(&np->lock);
-			break;
-		}
+	dprintk(KERN_DEBUG "%s: nv_nic_irq%s completed\n", dev->name,
+		optimized ? "_optimized" : "");
 
-		if (unlikely(i > max_interrupt_work)) {
-			spin_lock(&np->lock);
-			/* disable interrupts on the nic */
-			if (!(np->msi_flags & NV_MSI_X_ENABLED))
-				writel(0, base + NvRegIrqMask);
-			else
-				writel(np->irqmask, base + NvRegIrqMask);
-			pci_push(base);
+	return IRQ_RETVAL(handled);
+}
 
-			if (!np->in_shutdown) {
-				np->nic_poll_irq = np->irqmask;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock(&np->lock);
-			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq.\n", dev->name, i);
-			break;
-		}
+static irqreturn_t nv_nic_irq(int foo, void *data)
+{
+	struct net_device *dev = data;
+	return __nv_nic_irq(dev, false);
+}
 
-	}
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_optimized completed\n", dev->name);
+static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
+{
+	struct net_device *dev = data;
+	return __nv_nic_irq(dev, true);
+}
 
-	return IRQ_RETVAL(i);
+static irqreturn_t nv_nic_irq_other(int foo, void *data)
+{
+	struct net_device *dev = (struct net_device *) data;
+	return __nv_nic_irq(dev, true);
 }
 
 static irqreturn_t nv_nic_irq_tx(int foo, void *data)
@@ -3227,79 +3129,6 @@ static irqreturn_t nv_nic_irq_rx(int foo, void *data)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t nv_nic_irq_other(int foo, void *data)
-{
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
-	u8 __iomem *base = get_hwbase(dev);
-	u32 events;
-	int i;
-	unsigned long flags;
-
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_other\n", dev->name);
-
-	for (i=0; ; i++) {
-		events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_OTHER;
-		writel(NVREG_IRQ_OTHER, base + NvRegMSIXIrqStatus);
-		dprintk(KERN_DEBUG "%s: irq: %08x\n", dev->name, events);
-		if (!(events & np->irqmask))
-			break;
-
-		/* check tx in case we reached max loop limit in tx isr */
-		spin_lock_irqsave(&np->lock, flags);
-		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
-		spin_unlock_irqrestore(&np->lock, flags);
-
-		if (events & NVREG_IRQ_LINK) {
-			spin_lock_irqsave(&np->lock, flags);
-			nv_link_irq(dev);
-			spin_unlock_irqrestore(&np->lock, flags);
-		}
-		if (np->need_linktimer && time_after(jiffies, np->link_timeout)) {
-			spin_lock_irqsave(&np->lock, flags);
-			nv_linkchange(dev);
-			spin_unlock_irqrestore(&np->lock, flags);
-			np->link_timeout = jiffies + LINK_TIMEOUT;
-		}
-		if (events & NVREG_IRQ_RECOVER_ERROR) {
-			spin_lock_irq(&np->lock);
-			/* disable interrupts on the nic */
-			writel(NVREG_IRQ_OTHER, base + NvRegIrqMask);
-			pci_push(base);
-
-			if (!np->in_shutdown) {
-				np->nic_poll_irq |= NVREG_IRQ_OTHER;
-				np->recover_error = 1;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock_irq(&np->lock);
-			break;
-		}
-		if (events & (NVREG_IRQ_UNKNOWN)) {
-			printk(KERN_DEBUG "%s: received irq with unknown events 0x%x. Please report\n",
-						dev->name, events);
-		}
-		if (unlikely(i > max_interrupt_work)) {
-			spin_lock_irqsave(&np->lock, flags);
-			/* disable interrupts on the nic */
-			writel(NVREG_IRQ_OTHER, base + NvRegIrqMask);
-			pci_push(base);
-
-			if (!np->in_shutdown) {
-				np->nic_poll_irq |= NVREG_IRQ_OTHER;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock_irqrestore(&np->lock, flags);
-			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_other.\n", dev->name, i);
-			break;
-		}
-
-	}
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_other completed\n", dev->name);
-
-	return IRQ_RETVAL(i);
-}
-
 static irqreturn_t nv_nic_irq_test(int foo, void *data)
 {
 	struct net_device *dev = (struct net_device *) data;


* [PATCH 3/5] forcedeth: process TX completions using NAPI
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
  2007-10-06 15:13 ` [PATCH 1/5] forcedeth: make NAPI unconditional Jeff Garzik
  2007-10-06 15:14 ` [PATCH 2/5] forcedeth: interrupt handling cleanup Jeff Garzik
@ 2007-10-06 15:14 ` Jeff Garzik
  2007-10-07 14:39   ` Jeff Garzik
  2007-10-06 15:14 ` [PATCH 4/5] forcedeth: internal simplification and cleanups Jeff Garzik
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:14 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit 57cbfacc00d69be2ba02b65d1021442273b76263
Author: Jeff Garzik <jeff@garzik.org>
Date:   Fri Oct 5 23:25:56 2007 -0400

    [netdrvr] forcedeth: process TX completions using NAPI
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/forcedeth.c |  143 +++++++++++++++++++++++++++---------------------
 1 file changed, 83 insertions(+), 60 deletions(-)

57cbfacc00d69be2ba02b65d1021442273b76263
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1d1a5f5..1c236e6 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -744,6 +744,7 @@ struct fe_priv {
 
 	struct net_device *dev;
 	struct napi_struct napi;
+	struct napi_struct tx_napi;
 
 	/* General data:
 	 * Locking: spin_lock(&np->lock); */
@@ -810,7 +811,6 @@ struct fe_priv {
 	union ring_type tx_ring;
 	u32 tx_flags;
 	int tx_ring_size;
-	int tx_stop;
 
 	/* vlan fields */
 	struct vlan_group *vlangrp;
@@ -1763,10 +1763,7 @@ static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	empty_slots = nv_get_empty_tx_slots(np);
 	if (unlikely(empty_slots <= entries)) {
-		spin_lock_irq(&np->lock);
 		netif_stop_queue(dev);
-		np->tx_stop = 1;
-		spin_unlock_irq(&np->lock);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1879,10 +1876,7 @@ static int nv_start_xmit_optimized(struct sk_buff *skb, struct net_device *dev)
 
 	empty_slots = nv_get_empty_tx_slots(np);
 	if (unlikely(empty_slots <= entries)) {
-		spin_lock_irq(&np->lock);
 		netif_stop_queue(dev);
-		np->tx_stop = 1;
-		spin_unlock_irq(&np->lock);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1987,12 +1981,16 @@ static int nv_start_xmit_optimized(struct sk_buff *skb, struct net_device *dev)
  *
  * Caller must own np->lock.
  */
-static void nv_tx_done(struct net_device *dev)
+static int nv_tx_done(struct net_device *dev, int limit)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u32 flags;
+	int orig_limit = limit;
 	struct ring_desc* orig_get_tx = np->get_tx.orig;
 
+	if (unlikely(limit < 1))
+		return 0;
+
 	while ((np->get_tx.orig != np->put_tx.orig) &&
 	       !((flags = le32_to_cpu(np->get_tx.orig->flaglen)) & NV_TX_VALID)) {
 
@@ -2039,19 +2037,28 @@ static void nv_tx_done(struct net_device *dev)
 			np->get_tx.orig = np->first_tx.orig;
 		if (unlikely(np->get_tx_ctx++ == np->last_tx_ctx))
 			np->get_tx_ctx = np->first_tx_ctx;
+
+		limit--;
+		if (limit == 0)
+			break;
 	}
-	if (unlikely((np->tx_stop == 1) && (np->get_tx.orig != orig_get_tx))) {
-		np->tx_stop = 0;
+
+	if (np->get_tx.orig != orig_get_tx)
 		netif_wake_queue(dev);
-	}
+
+	return orig_limit - limit;
 }
 
-static void nv_tx_done_optimized(struct net_device *dev, int limit)
+static int nv_tx_done_optimized(struct net_device *dev, int limit)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u32 flags;
+	int orig_limit = limit;
 	struct ring_desc_ex* orig_get_tx = np->get_tx.ex;
 
+	if (unlikely(limit < 1))
+		return 0;
+
 	while ((np->get_tx.ex != np->put_tx.ex) &&
 	       !((flags = le32_to_cpu(np->get_tx.ex->flaglen)) & NV_TX_VALID) &&
 	       (limit-- > 0)) {
@@ -2075,10 +2082,11 @@ static void nv_tx_done_optimized(struct net_device *dev, int limit)
 		if (unlikely(np->get_tx_ctx++ == np->last_tx_ctx))
 			np->get_tx_ctx = np->first_tx_ctx;
 	}
-	if (unlikely((np->tx_stop == 1) && (np->get_tx.ex != orig_get_tx))) {
-		np->tx_stop = 0;
+
+	if (np->get_tx.ex != orig_get_tx)
 		netif_wake_queue(dev);
-	}
+
+	return orig_limit - limit;
 }
 
 /*
@@ -2149,9 +2157,9 @@ static void nv_tx_timeout(struct net_device *dev)
 	/* 1) stop tx engine */
 	nv_stop_tx(dev);
 
-	/* 2) check that the packets were not sent already: */
+	/* 2) process all pending tx completions */
 	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
-		nv_tx_done(dev);
+		nv_tx_done(dev, np->tx_ring_size);
 	else
 		nv_tx_done_optimized(dev, np->tx_ring_size);
 
@@ -2923,6 +2931,7 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
 	int handled = 0;
+	u32 upd_mask = 0;
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq%s\n", dev->name,
 		optimized ? "_optimized" : "");
@@ -2942,19 +2951,25 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 	if (!(events & np->irqmask))
 		goto out;
 
-	if (optimized)
-		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
-	else
-		nv_tx_done(dev);
-
 	if (events & NVREG_IRQ_RX_ALL) {
 		netif_rx_schedule(dev, &np->napi);
 
 		/* Disable furthur receive irq's */
-		np->irqmask &= ~NVREG_IRQ_RX_ALL;
+		upd_mask |= NVREG_IRQ_RX_ALL;
+	}
+
+	if (events & NVREG_IRQ_TX_ALL) {
+		netif_rx_schedule(dev, &np->tx_napi);
+
+		/* Disable furthur xmit irq's */
+		upd_mask |= NVREG_IRQ_TX_ALL;
+	}
+
+	if (upd_mask) {
+		np->irqmask &= ~upd_mask;
 
 		if (np->msi_flags & NV_MSI_X_ENABLED)
-			writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
+			writel(upd_mask, base + NvRegIrqMask);
 		else
 			writel(np->irqmask, base + NvRegIrqMask);
 	}
@@ -3019,7 +3034,7 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 
 static irqreturn_t nv_nic_irq_other(int foo, void *data)
 {
-	struct net_device *dev = (struct net_device *) data;
+	struct net_device *dev = data;
 	return __nv_nic_irq(dev, true);
 }
 
@@ -3029,45 +3044,48 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
-	int i;
-	unsigned long flags;
 
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_tx\n", dev->name);
+	events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_TX_ALL;
+	writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);
 
-	for (i=0; ; i++) {
-		events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_TX_ALL;
-		writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);
-		dprintk(KERN_DEBUG "%s: tx irq: %08x\n", dev->name, events);
-		if (!(events & np->irqmask))
-			break;
+	if (events) {
+		netif_rx_schedule(dev, &np->tx_napi);
+		/* disable transmit interrupts on the nic */
+		writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);
+		pci_push(base);
+	}
+	return IRQ_HANDLED;
+}
 
-		spin_lock_irqsave(&np->lock, flags);
-		nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
-		spin_unlock_irqrestore(&np->lock, flags);
+static int nv_napi_tx_poll(struct napi_struct *napi, int budget)
+{
+	struct fe_priv *np = container_of(napi, struct fe_priv, tx_napi);
+	struct net_device *dev = np->dev;
+	u8 __iomem *base = get_hwbase(dev);
+	unsigned long flags;
+	int pkts;
 
-		if (unlikely(events & (NVREG_IRQ_TX_ERR))) {
-			dprintk(KERN_DEBUG "%s: received irq with events 0x%x. Probably TX fail.\n",
-						dev->name, events);
-		}
-		if (unlikely(i > max_interrupt_work)) {
-			spin_lock_irqsave(&np->lock, flags);
-			/* disable interrupts on the nic */
-			writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);
-			pci_push(base);
+	spin_lock_irqsave(&np->lock, flags);
 
-			if (!np->in_shutdown) {
-				np->nic_poll_irq |= NVREG_IRQ_TX_ALL;
-				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-			}
-			spin_unlock_irqrestore(&np->lock, flags);
-			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_tx.\n", dev->name, i);
-			break;
-		}
+	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+		pkts = nv_tx_done(dev, budget);
+	else
+		pkts = nv_tx_done_optimized(dev, budget);
 
+	if (pkts < budget) {
+		/* re-enable transmit interrupts */
+		__netif_rx_complete(dev, napi);
+
+		np->irqmask |= NVREG_IRQ_TX_ALL;
+		if (np->msi_flags & NV_MSI_X_ENABLED)
+			writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);
+		else
+			writel(np->irqmask, base + NvRegIrqMask);
 	}
-	dprintk(KERN_DEBUG "%s: nv_nic_irq_tx completed\n", dev->name);
 
-	return IRQ_RETVAL(i);
+	spin_unlock_irqrestore(&np->lock, flags);
+
+	return pkts;
 }
 
 static int nv_napi_poll(struct napi_struct *napi, int budget)
@@ -4316,7 +4334,8 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 
 	if (test->flags & ETH_TEST_FL_OFFLINE) {
 		if (netif_running(dev)) {
-			netif_stop_queue(dev);
+			netif_tx_disable(dev);
+			napi_disable(&np->tx_napi);
 			napi_disable(&np->napi);
 			netif_tx_lock_bh(dev);
 			spin_lock_irq(&np->lock);
@@ -4375,8 +4394,9 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 			/* restart rx engine */
 			nv_start_rx(dev);
 			nv_start_tx(dev);
-			netif_start_queue(dev);
 			napi_enable(&np->napi);
+			napi_enable(&np->tx_napi);
+			netif_start_queue(dev);
 			nv_enable_hw_interrupts(dev, np->irqmask);
 		}
 	}
@@ -4603,8 +4623,9 @@ static int nv_open(struct net_device *dev)
 	ret = nv_update_linkspeed(dev);
 	nv_start_rx(dev);
 	nv_start_tx(dev);
-	netif_start_queue(dev);
 	napi_enable(&np->napi);
+	napi_enable(&np->tx_napi);
+	netif_start_queue(dev);
 
 	if (ret) {
 		netif_carrier_on(dev);
@@ -4635,14 +4656,15 @@ static int nv_close(struct net_device *dev)
 	spin_lock_irq(&np->lock);
 	np->in_shutdown = 1;
 	spin_unlock_irq(&np->lock);
+	netif_stop_queue(dev);
 	napi_disable(&np->napi);
+	napi_disable(&np->tx_napi);
 	synchronize_irq(dev->irq);
 
 	del_timer_sync(&np->oom_kick);
 	del_timer_sync(&np->nic_poll);
 	del_timer_sync(&np->stats_poll);
 
-	netif_stop_queue(dev);
 	spin_lock_irq(&np->lock);
 	nv_stop_tx(dev);
 	nv_stop_rx(dev);
@@ -4850,6 +4872,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	dev->poll_controller = nv_poll_controller;
 #endif
 	netif_napi_add(dev, &np->napi, nv_napi_poll, RX_WORK_PER_LOOP);
+	netif_napi_add(dev, &np->tx_napi, nv_napi_tx_poll, RX_WORK_PER_LOOP);
 	SET_ETHTOOL_OPS(dev, &ops);
 	dev->tx_timeout = nv_tx_timeout;
 	dev->watchdog_timeo = NV_WATCHDOG_TIMEO;


* [PATCH 4/5] forcedeth: internal simplification and cleanups
@ 2007-10-06 15:14 ` Jeff Garzik
From: Jeff Garzik @ 2007-10-06 15:14 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit 39572457a4dfe9a9dc1efd6641e7a6467e5658a1
Author: Jeff Garzik <jeff@garzik.org>
Date:   Sat Oct 6 01:21:01 2007 -0400

    [netdrvr] forcedeth: internal simplification and cleanups
    
    * remove changelog from source; it's kept in the git repository
    
    * split guts of RX/TX DMA engine disable into disable portion,
      and wait/etc. portions.
    
    * consolidate descriptor version tests using nv_optimized()
    
    * consolidate NIC DMA start, stop and drain into
      nv_start_txrx(), nv_stop_txrx(), nv_drain_txrx()
    
    * change nv_poll_controller() to call interrupt handling function
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/forcedeth.c |  228 +++++++++++++++++-------------------------------
 1 file changed, 84 insertions(+), 144 deletions(-)

39572457a4dfe9a9dc1efd6641e7a6467e5658a1
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1c236e6..d6eacd7 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -29,89 +29,7 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  *
- * Changelog:
- * 	0.01: 05 Oct 2003: First release that compiles without warnings.
- * 	0.02: 05 Oct 2003: Fix bug for nv_drain_tx: do not try to free NULL skbs.
- * 			   Check all PCI BARs for the register window.
- * 			   udelay added to mii_rw.
- * 	0.03: 06 Oct 2003: Initialize dev->irq.
- * 	0.04: 07 Oct 2003: Initialize np->lock, reduce handled irqs, add printks.
- * 	0.05: 09 Oct 2003: printk removed again, irq status print tx_timeout.
- * 	0.06: 10 Oct 2003: MAC Address read updated, pff flag generation updated,
- * 			   irq mask updated
- * 	0.07: 14 Oct 2003: Further irq mask updates.
- * 	0.08: 20 Oct 2003: rx_desc.Length initialization added, nv_alloc_rx refill
- * 			   added into irq handler, NULL check for drain_ring.
- * 	0.09: 20 Oct 2003: Basic link speed irq implementation. Only handle the
- * 			   requested interrupt sources.
- * 	0.10: 20 Oct 2003: First cleanup for release.
- * 	0.11: 21 Oct 2003: hexdump for tx added, rx buffer sizes increased.
- * 			   MAC Address init fix, set_multicast cleanup.
- * 	0.12: 23 Oct 2003: Cleanups for release.
- * 	0.13: 25 Oct 2003: Limit for concurrent tx packets increased to 10.
- * 			   Set link speed correctly. start rx before starting
- * 			   tx (nv_start_rx sets the link speed).
- * 	0.14: 25 Oct 2003: Nic dependant irq mask.
- * 	0.15: 08 Nov 2003: fix smp deadlock with set_multicast_list during
- * 			   open.
- * 	0.16: 15 Nov 2003: include file cleanup for ppc64, rx buffer size
- * 			   increased to 1628 bytes.
- * 	0.17: 16 Nov 2003: undo rx buffer size increase. Substract 1 from
- * 			   the tx length.
- * 	0.18: 17 Nov 2003: fix oops due to late initialization of dev_stats
- * 	0.19: 29 Nov 2003: Handle RxNoBuf, detect & handle invalid mac
- * 			   addresses, really stop rx if already running
- * 			   in nv_start_rx, clean up a bit.
- * 	0.20: 07 Dec 2003: alloc fixes
- * 	0.21: 12 Jan 2004: additional alloc fix, nic polling fix.
- *	0.22: 19 Jan 2004: reprogram timer to a sane rate, avoid lockup
- *			   on close.
- *	0.23: 26 Jan 2004: various small cleanups
- *	0.24: 27 Feb 2004: make driver even less anonymous in backtraces
- *	0.25: 09 Mar 2004: wol support
- *	0.26: 03 Jun 2004: netdriver specific annotation, sparse-related fixes
- *	0.27: 19 Jun 2004: Gigabit support, new descriptor rings,
- *			   added CK804/MCP04 device IDs, code fixes
- *			   for registers, link status and other minor fixes.
- *	0.28: 21 Jun 2004: Big cleanup, making driver mostly endian safe
- *	0.29: 31 Aug 2004: Add backup timer for link change notification.
- *	0.30: 25 Sep 2004: rx checksum support for nf 250 Gb. Add rx reset
- *			   into nv_close, otherwise reenabling for wol can
- *			   cause DMA to kfree'd memory.
- *	0.31: 14 Nov 2004: ethtool support for getting/setting link
- *			   capabilities.
- *	0.32: 16 Apr 2005: RX_ERROR4 handling added.
- *	0.33: 16 May 2005: Support for MCP51 added.
- *	0.34: 18 Jun 2005: Add DEV_NEED_LINKTIMER to all nForce nics.
- *	0.35: 26 Jun 2005: Support for MCP55 added.
- *	0.36: 28 Jun 2005: Add jumbo frame support.
- *	0.37: 10 Jul 2005: Additional ethtool support, cleanup of pci id list
- *	0.38: 16 Jul 2005: tx irq rewrite: Use global flags instead of
- *			   per-packet flags.
- *	0.39: 18 Jul 2005: Add 64bit descriptor support.
- *	0.40: 19 Jul 2005: Add support for mac address change.
- *	0.41: 30 Jul 2005: Write back original MAC in nv_close instead
- *			   of nv_remove
- *	0.42: 06 Aug 2005: Fix lack of link speed initialization
- *			   in the second (and later) nv_open call
- *	0.43: 10 Aug 2005: Add support for tx checksum.
- *	0.44: 20 Aug 2005: Add support for scatter gather and segmentation.
- *	0.45: 18 Sep 2005: Remove nv_stop/start_rx from every link check
- *	0.46: 20 Oct 2005: Add irq optimization modes.
- *	0.47: 26 Oct 2005: Add phyaddr 0 in phy scan.
- *	0.48: 24 Dec 2005: Disable TSO, bugfix for pci_map_single
- *	0.49: 10 Dec 2005: Fix tso for large buffers.
- *	0.50: 20 Jan 2006: Add 8021pq tagging support.
- *	0.51: 20 Jan 2006: Add 64bit consistent memory allocation for rings.
- *	0.52: 20 Jan 2006: Add MSI/MSIX support.
- *	0.53: 19 Mar 2006: Fix init from low power mode and add hw reset.
- *	0.54: 21 Mar 2006: Fix spin locks for multi irqs and cleanup.
- *	0.55: 22 Mar 2006: Add flow control (pause frame).
- *	0.56: 22 Mar 2006: Additional ethtool config and moduleparam support.
- *	0.57: 14 May 2006: Mac address set in probe/remove and order corrections.
- *	0.58: 30 Oct 2006: Added support for sideband management unit.
- *	0.59: 30 Oct 2006: Added support for recoverable error.
- *	0.60: 20 Jan 2007: Code optimizations for rings, rx & tx data paths, and stats.
+ ****************************************************************************
  *
  * Known bugs:
  * We suspect that on some hardware no TX done interrupts are generated.
@@ -122,7 +40,9 @@
  * DEV_NEED_TIMERIRQ from the driver_data flags.
  * DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few
  * superfluous timer interrupts from the nic.
+ *
  */
+
 #define FORCEDETH_VERSION		"1.00"
 #define DRV_NAME			"forcedeth"
 
@@ -893,6 +813,12 @@ static inline void pci_push(u8 __iomem *base)
 	readl(base);
 }
 
+static bool nv_optimized(struct fe_priv *np)
+{
+	return np->desc_ver != DESC_VER_1 && np->desc_ver != DESC_VER_2;
+}
+
 static inline u32 nv_descr_getlength(struct ring_desc *prd, u32 v)
 {
 	return le32_to_cpu(prd->flaglen)
@@ -1342,18 +1268,28 @@ static void nv_start_rx(struct net_device *dev)
 	pci_push(base);
 }
 
-static void nv_stop_rx(struct net_device *dev)
+static void __nv_stop_rx(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 rx_ctrl = readl(base + NvRegReceiverControl);
 
-	dprintk(KERN_DEBUG "%s: nv_stop_rx\n", dev->name);
 	if (!np->mac_in_use)
 		rx_ctrl &= ~NVREG_RCVCTL_START;
 	else
 		rx_ctrl |= NVREG_RCVCTL_RX_PATH_EN;
 	writel(rx_ctrl, base + NvRegReceiverControl);
+}
+
+static void nv_stop_rx(struct net_device *dev)
+{
+	struct fe_priv *np = netdev_priv(dev);
+	u8 __iomem *base = get_hwbase(dev);
+
+	dprintk(KERN_DEBUG "%s: nv_stop_rx\n", dev->name);
+
+	__nv_stop_rx(dev);
+
 	reg_delay(dev, NvRegReceiverStatus, NVREG_RCVSTAT_BUSY, 0,
 			NV_RXSTOP_DELAY1, NV_RXSTOP_DELAY1MAX,
 			KERN_INFO "nv_stop_rx: ReceiverStatus remained busy");
@@ -1377,18 +1313,28 @@ static void nv_start_tx(struct net_device *dev)
 	pci_push(base);
 }
 
-static void nv_stop_tx(struct net_device *dev)
+static void __nv_stop_tx(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 tx_ctrl = readl(base + NvRegTransmitterControl);
 
-	dprintk(KERN_DEBUG "%s: nv_stop_tx\n", dev->name);
 	if (!np->mac_in_use)
 		tx_ctrl &= ~NVREG_XMITCTL_START;
 	else
 		tx_ctrl |= NVREG_XMITCTL_TX_PATH_EN;
 	writel(tx_ctrl, base + NvRegTransmitterControl);
+}
+
+static void nv_stop_tx(struct net_device *dev)
+{
+	struct fe_priv *np = netdev_priv(dev);
+	u8 __iomem *base = get_hwbase(dev);
+
+	dprintk(KERN_DEBUG "%s: nv_stop_tx\n", dev->name);
+
+	__nv_stop_tx(dev);
+
 	reg_delay(dev, NvRegTransmitterStatus, NVREG_XMITSTAT_BUSY, 0,
 			NV_TXSTOP_DELAY1, NV_TXSTOP_DELAY1MAX,
 			KERN_INFO "nv_stop_tx: TransmitterStatus remained busy");
@@ -1399,6 +1345,18 @@ static void nv_stop_tx(struct net_device *dev)
 		       base + NvRegTransmitPoll);
 }
 
+static void nv_stop_txrx(struct net_device *dev)
+{
+	nv_stop_rx(dev);
+	nv_stop_tx(dev);
+}
+
+static void nv_start_txrx(struct net_device *dev)
+{
+	nv_start_rx(dev);
+	nv_start_tx(dev);
+}
+
 static void nv_txrx_reset(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
@@ -1651,7 +1609,7 @@ static int nv_init_ring(struct net_device *dev)
 
 	nv_init_tx(dev);
 	nv_init_rx(dev);
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+	if (!nv_optimized(np))
 		return nv_alloc_rx(dev);
 	else
 		return nv_alloc_rx_optimized(dev);
@@ -1723,7 +1681,7 @@ static void nv_drain_rx(struct net_device *dev)
 	}
 }
 
-static void drain_ring(struct net_device *dev)
+static void nv_drain_txrx(struct net_device *dev)
 {
 	nv_drain_tx(dev);
 	nv_drain_rx(dev);
@@ -2158,7 +2116,7 @@ static void nv_tx_timeout(struct net_device *dev)
 	nv_stop_tx(dev);
 
 	/* 2) process all pending tx completions */
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+	if (!nv_optimized(np))
 		nv_tx_done(dev, np->tx_ring_size);
 	else
 		nv_tx_done_optimized(dev, np->tx_ring_size);
@@ -2514,12 +2472,10 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		netif_tx_lock_bh(dev);
 		spin_lock(&np->lock);
 		/* stop engines */
-		nv_stop_rx(dev);
-		nv_stop_tx(dev);
+		nv_stop_txrx(dev);
 		nv_txrx_reset(dev);
 		/* drain rx queue */
-		nv_drain_rx(dev);
-		nv_drain_tx(dev);
+		nv_drain_txrx(dev);
 		/* reinit driver view of the rx queue */
 		set_bufsize(dev);
 		if (nv_init_ring(dev)) {
@@ -2536,8 +2492,7 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		pci_push(base);
 
 		/* restart rx engine */
-		nv_start_rx(dev);
-		nv_start_tx(dev);
+		nv_start_txrx(dev);
 		spin_unlock(&np->lock);
 		netif_tx_unlock_bh(dev);
 		nv_enable_irq(dev);
@@ -3067,7 +3022,7 @@ static int nv_napi_tx_poll(struct napi_struct *napi, int budget)
 
 	spin_lock_irqsave(&np->lock, flags);
 
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+	if (!nv_optimized(np))
 		pkts = nv_tx_done(dev, budget);
 	else
 		pkts = nv_tx_done_optimized(dev, budget);
@@ -3096,7 +3051,7 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 	unsigned long flags;
 	int pkts, retcode;
 
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
+	if (!nv_optimized(np)) {
 		pkts = nv_rx_process(dev, budget);
 		retcode = nv_alloc_rx(dev);
 	} else {
@@ -3214,7 +3169,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
 	if (intr_test) {
 		handler = nv_nic_irq_test;
 	} else {
-		if (np->desc_ver == DESC_VER_3)
+		if (nv_optimized(np))
 			handler = nv_nic_irq_optimized;
 		else
 			handler = nv_nic_irq;
@@ -3363,12 +3318,10 @@ static void nv_do_nic_poll(unsigned long data)
 			netif_tx_lock_bh(dev);
 			spin_lock(&np->lock);
 			/* stop engines */
-			nv_stop_rx(dev);
-			nv_stop_tx(dev);
+			nv_stop_txrx(dev);
 			nv_txrx_reset(dev);
 			/* drain rx queue */
-			nv_drain_rx(dev);
-			nv_drain_tx(dev);
+			nv_drain_txrx(dev);
 			/* reinit driver view of the rx queue */
 			set_bufsize(dev);
 			if (nv_init_ring(dev)) {
@@ -3385,8 +3338,7 @@ static void nv_do_nic_poll(unsigned long data)
 			pci_push(base);
 
 			/* restart rx engine */
-			nv_start_rx(dev);
-			nv_start_tx(dev);
+			nv_start_txrx(dev);
 			spin_unlock(&np->lock);
 			netif_tx_unlock_bh(dev);
 		}
@@ -3398,7 +3350,7 @@ static void nv_do_nic_poll(unsigned long data)
 	pci_push(base);
 
 	if (!using_multi_irqs(dev)) {
-		if (np->desc_ver == DESC_VER_3)
+		if (nv_optimized(np))
 			nv_nic_irq_optimized(0, dev);
 		else
 			nv_nic_irq(0, dev);
@@ -3425,7 +3377,12 @@ static void nv_do_nic_poll(unsigned long data)
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void nv_poll_controller(struct net_device *dev)
 {
-	nv_do_nic_poll((unsigned long) dev);
+	struct fe_priv *np = netdev_priv(dev);
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__nv_nic_irq(dev, nv_optimized(np));
+	local_irq_restore(flags);
 }
 #endif
 
@@ -3595,8 +3552,7 @@ static int nv_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
 		netif_tx_lock_bh(dev);
 		spin_lock(&np->lock);
 		/* stop engines */
-		nv_stop_rx(dev);
-		nv_stop_tx(dev);
+		nv_stop_txrx(dev);
 		spin_unlock(&np->lock);
 		netif_tx_unlock_bh(dev);
 	}
@@ -3702,8 +3658,7 @@ static int nv_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
 	}
 
 	if (netif_running(dev)) {
-		nv_start_rx(dev);
-		nv_start_tx(dev);
+		nv_start_txrx(dev);
 		nv_enable_irq(dev);
 	}
 
@@ -3746,8 +3701,7 @@ static int nv_nway_reset(struct net_device *dev)
 			netif_tx_lock_bh(dev);
 			spin_lock(&np->lock);
 			/* stop engines */
-			nv_stop_rx(dev);
-			nv_stop_tx(dev);
+			nv_stop_txrx(dev);
 			spin_unlock(&np->lock);
 			netif_tx_unlock_bh(dev);
 			printk(KERN_INFO "%s: link down.\n", dev->name);
@@ -3767,8 +3721,7 @@ static int nv_nway_reset(struct net_device *dev)
 		}
 
 		if (netif_running(dev)) {
-			nv_start_rx(dev);
-			nv_start_tx(dev);
+			nv_start_txrx(dev);
 			nv_enable_irq(dev);
 		}
 		ret = 0;
@@ -3859,12 +3812,10 @@ static int nv_set_ringparam(struct net_device *dev, struct ethtool_ringparam* ri
 		netif_tx_lock_bh(dev);
 		spin_lock(&np->lock);
 		/* stop engines */
-		nv_stop_rx(dev);
-		nv_stop_tx(dev);
+		nv_stop_txrx(dev);
 		nv_txrx_reset(dev);
 		/* drain queues */
-		nv_drain_rx(dev);
-		nv_drain_tx(dev);
+		nv_drain_txrx(dev);
 		/* delete queues */
 		free_rings(dev);
 	}
@@ -3904,8 +3855,7 @@ static int nv_set_ringparam(struct net_device *dev, struct ethtool_ringparam* ri
 		pci_push(base);
 
 		/* restart engines */
-		nv_start_rx(dev);
-		nv_start_tx(dev);
+		nv_start_txrx(dev);
 		spin_unlock(&np->lock);
 		netif_tx_unlock_bh(dev);
 		nv_enable_irq(dev);
@@ -3946,8 +3896,7 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
 		netif_tx_lock_bh(dev);
 		spin_lock(&np->lock);
 		/* stop engines */
-		nv_stop_rx(dev);
-		nv_stop_tx(dev);
+		nv_stop_txrx(dev);
 		spin_unlock(&np->lock);
 		netif_tx_unlock_bh(dev);
 	}
@@ -3988,8 +3937,7 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
 	}
 
 	if (netif_running(dev)) {
-		nv_start_rx(dev);
-		nv_start_tx(dev);
+		nv_start_txrx(dev);
 		nv_enable_irq(dev);
 	}
 	return 0;
@@ -4225,8 +4173,7 @@ static int nv_loopback_test(struct net_device *dev)
 	pci_push(base);
 
 	/* restart rx engine */
-	nv_start_rx(dev);
-	nv_start_tx(dev);
+	nv_start_txrx(dev);
 
 	/* setup packet for tx */
 	pkt_len = ETH_DATA_LEN;
@@ -4304,12 +4251,10 @@ static int nv_loopback_test(struct net_device *dev)
 	dev_kfree_skb_any(tx_skb);
  out:
 	/* stop engines */
-	nv_stop_rx(dev);
-	nv_stop_tx(dev);
+	nv_stop_txrx(dev);
 	nv_txrx_reset(dev);
 	/* drain rx queue */
-	nv_drain_rx(dev);
-	nv_drain_tx(dev);
+	nv_drain_txrx(dev);
 
 	if (netif_running(dev)) {
 		writel(misc1_flags, base + NvRegMisc1);
@@ -4346,12 +4291,10 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 				writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
 			}
 			/* stop engines */
-			nv_stop_rx(dev);
-			nv_stop_tx(dev);
+			nv_stop_txrx(dev);
 			nv_txrx_reset(dev);
 			/* drain rx queue */
-			nv_drain_rx(dev);
-			nv_drain_tx(dev);
+			nv_drain_txrx(dev);
 			spin_unlock_irq(&np->lock);
 			netif_tx_unlock_bh(dev);
 		}
@@ -4392,8 +4335,7 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 			writel(NVREG_TXRXCTL_KICK|np->txrxctl_bits, get_hwbase(dev) + NvRegTxRxControl);
 			pci_push(base);
 			/* restart rx engine */
-			nv_start_rx(dev);
-			nv_start_tx(dev);
+			nv_start_txrx(dev);
 			napi_enable(&np->napi);
 			napi_enable(&np->tx_napi);
 			netif_start_queue(dev);
@@ -4621,8 +4563,7 @@ static int nv_open(struct net_device *dev)
 	 * to init hw */
 	np->linkspeed = 0;
 	ret = nv_update_linkspeed(dev);
-	nv_start_rx(dev);
-	nv_start_tx(dev);
+	nv_start_txrx(dev);
 	napi_enable(&np->napi);
 	napi_enable(&np->tx_napi);
 	netif_start_queue(dev);
@@ -4644,7 +4585,7 @@ static int nv_open(struct net_device *dev)
 
 	return 0;
 out_drain:
-	drain_ring(dev);
+	nv_drain_txrx(dev);
 	return ret;
 }
 
@@ -4666,8 +4607,7 @@ static int nv_close(struct net_device *dev)
 	del_timer_sync(&np->stats_poll);
 
 	spin_lock_irq(&np->lock);
-	nv_stop_tx(dev);
-	nv_stop_rx(dev);
+	nv_stop_txrx(dev);
 	nv_txrx_reset(dev);
 
 	/* disable interrupts on the nic or we will lock up */
@@ -4680,7 +4620,7 @@ static int nv_close(struct net_device *dev)
 
 	nv_free_irq(dev);
 
-	drain_ring(dev);
+	nv_drain_txrx(dev);
 
 	if (np->wolenabled) {
 		writel(NVREG_PFF_ALWAYS|NVREG_PFF_MYADDR, base + NvRegPacketFilterFlags);
@@ -4860,7 +4800,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 
 	dev->open = nv_open;
 	dev->stop = nv_close;
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+	if (!nv_optimized(np))
 		dev->hard_start_xmit = nv_start_xmit;
 	else
 		dev->hard_start_xmit = nv_start_xmit_optimized;


* [PATCH 5/5] forcedeth: timer overhaul
@ 2007-10-06 15:15 ` Jeff Garzik
From: Jeff Garzik @ 2007-10-06 15:15 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit d7c766113ee2ec66ae8975e0acbad086d2c23594
Author: Jeff Garzik <jeff@garzik.org>
Date:   Sat Oct 6 10:57:56 2007 -0400

    [netdrvr] forcedeth: timer overhaul
    
    * convert stats_poll timer to a delayed-work workqueue stats_task
    
    * protect hw stats update with a lock
    
    * now that recovery is the only remaining use of nv_do_nic_poll(),
      rename it to nv_reset_task(), move it from a timer to a workqueue,
      and delete all non-recovery-related code from the function.
    
    * kill np->in_shutdown; it mirrors netif_running().  Furthermore,
      the overwhelming majority of sites that tested np->in_shutdown
      were inside rtnl_lock() and guaranteed never to race against shutdown
      anyway.
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/forcedeth.c |  194 ++++++++++++++++++------------------------------
 1 file changed, 75 insertions(+), 119 deletions(-)

d7c766113ee2ec66ae8975e0acbad086d2c23594
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index d6eacd7..a037f49 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -62,6 +62,7 @@
 #include <linux/init.h>
 #include <linux/if_vlan.h>
 #include <linux/dma-mapping.h>
+#include <linux/workqueue.h>
 
 #include <asm/irq.h>
 #include <asm/io.h>
@@ -450,7 +451,6 @@ union ring_type {
 #define NV_PKTLIMIT_2	9100	/* Actual limit according to NVidia: 9202 */
 
 #define OOM_REFILL	(1+HZ/20)
-#define POLL_WAIT	(1+HZ/100)
 #define LINK_TIMEOUT	(3*HZ)
 #define STATS_INTERVAL	(10*HZ)
 
@@ -670,7 +670,6 @@ struct fe_priv {
 	 * Locking: spin_lock(&np->lock); */
 	struct net_device_stats stats;
 	struct nv_ethtool_stats estats;
-	int in_shutdown;
 	u32 linkspeed;
 	int duplex;
 	int autoneg;
@@ -681,7 +680,6 @@ struct fe_priv {
 	unsigned int phy_model;
 	u16 gigabit;
 	int intr_test;
-	int recover_error;
 
 	/* General data: RO fields */
 	dma_addr_t ring_addr;
@@ -710,9 +708,9 @@ struct fe_priv {
 	unsigned int rx_buf_sz;
 	unsigned int pkt_limit;
 	struct timer_list oom_kick;
-	struct timer_list nic_poll;
-	struct timer_list stats_poll;
-	u32 nic_poll_irq;
+	struct work_struct reset_task;
+	struct delayed_work stats_task;
+	u32 reset_task_irq;
 	int rx_ring_size;
 
 	/* media detection workaround.
@@ -1388,7 +1386,7 @@ static void nv_mac_reset(struct net_device *dev)
 	pci_push(base);
 }
 
-static void nv_get_hw_stats(struct net_device *dev)
+static void __nv_get_hw_stats(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
@@ -1443,6 +1441,16 @@ static void nv_get_hw_stats(struct net_device *dev)
 	}
 }
 
+static void nv_get_hw_stats(struct net_device *dev)
+{
+	struct fe_priv *np = netdev_priv(dev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&np->lock, flags);
+	__nv_get_hw_stats(dev);
+	spin_unlock_irqrestore(&np->lock, flags);
+}
+
 /*
  * nv_get_stats: dev->get_stats function
  * Get latest stats value from the nic.
@@ -2478,10 +2486,9 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		nv_drain_txrx(dev);
 		/* reinit driver view of the rx queue */
 		set_bufsize(dev);
-		if (nv_init_ring(dev)) {
-			if (!np->in_shutdown)
-				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-		}
+		if (nv_init_ring(dev))
+			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
+
 		/* reinit nic view of the rx queue */
 		writel(np->rx_buf_sz, base + NvRegOffloadConfig);
 		setup_hw_rings(dev, NV_SETUP_RX_RING | NV_SETUP_TX_RING);
@@ -2957,10 +2964,9 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 			writel(np->irqmask, base + NvRegIrqMask);
 		pci_push(base);
 
-		if (!np->in_shutdown) {
-			np->nic_poll_irq = np->irqmask;
-			np->recover_error = 1;
-			mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
+		if (netif_running(dev)) {
+			np->reset_task_irq = np->irqmask;
+			schedule_work(&np->reset_task);
 		}
 	}
 
@@ -3061,8 +3067,7 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 
 	if (retcode) {
 		spin_lock_irqsave(&np->lock, flags);
-		if (!np->in_shutdown)
-			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
+		mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
 		spin_unlock_irqrestore(&np->lock, flags);
 	}
 
@@ -3276,12 +3281,12 @@ static void nv_free_irq(struct net_device *dev)
 	}
 }
 
-static void nv_do_nic_poll(unsigned long data)
+static void nv_reset_task(struct work_struct *work)
 {
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
+	struct fe_priv *np = container_of(work, struct fe_priv, reset_task);
+	struct net_device *dev = np->dev;
 	u8 __iomem *base = get_hwbase(dev);
-	u32 mask = 0;
+	u32 mask;
 
 	/*
 	 * First disable irq(s) and then
@@ -3290,88 +3295,51 @@ static void nv_do_nic_poll(unsigned long data)
 	 */
 
 	if (!using_multi_irqs(dev)) {
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			disable_irq_lockdep(dev->irq);
 		mask = np->irqmask;
 	} else {
-		if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) {
-			disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
+		mask = 0;
+		if (np->reset_task_irq & NVREG_IRQ_RX_ALL)
 			mask |= NVREG_IRQ_RX_ALL;
-		}
-		if (np->nic_poll_irq & NVREG_IRQ_TX_ALL) {
-			disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
+		if (np->reset_task_irq & NVREG_IRQ_TX_ALL)
 			mask |= NVREG_IRQ_TX_ALL;
-		}
-		if (np->nic_poll_irq & NVREG_IRQ_OTHER) {
-			disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector);
+		if (np->reset_task_irq & NVREG_IRQ_OTHER)
 			mask |= NVREG_IRQ_OTHER;
-		}
 	}
-	np->nic_poll_irq = 0;
+	np->reset_task_irq = 0;
 
-	if (np->recover_error) {
-		np->recover_error = 0;
-		printk(KERN_INFO "forcedeth: MAC in recoverable error state\n");
-		if (netif_running(dev)) {
-			netif_tx_lock_bh(dev);
-			spin_lock(&np->lock);
-			/* stop engines */
-			nv_stop_txrx(dev);
-			nv_txrx_reset(dev);
-			/* drain rx queue */
-			nv_drain_txrx(dev);
-			/* reinit driver view of the rx queue */
-			set_bufsize(dev);
-			if (nv_init_ring(dev)) {
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-			}
-			/* reinit nic view of the rx queue */
-			writel(np->rx_buf_sz, base + NvRegOffloadConfig);
-			setup_hw_rings(dev, NV_SETUP_RX_RING | NV_SETUP_TX_RING);
-			writel( ((np->rx_ring_size-1) << NVREG_RINGSZ_RXSHIFT) + ((np->tx_ring_size-1) << NVREG_RINGSZ_TXSHIFT),
-				base + NvRegRingSizes);
-			pci_push(base);
-			writel(NVREG_TXRXCTL_KICK|np->txrxctl_bits, get_hwbase(dev) + NvRegTxRxControl);
-			pci_push(base);
+	printk(KERN_INFO "forcedeth: MAC in recoverable error state\n");
+	if (!netif_running(dev))
+		goto out;
 
-			/* restart rx engine */
-			nv_start_txrx(dev);
-			spin_unlock(&np->lock);
-			netif_tx_unlock_bh(dev);
-		}
-	}
+	netif_tx_lock_bh(dev);
+	spin_lock(&np->lock);
+	/* stop engines */
+	nv_stop_txrx(dev);
+	nv_txrx_reset(dev);
+	/* drain rx queue */
+	nv_drain_txrx(dev);
+	/* reinit driver view of the rx queue */
+	set_bufsize(dev);
+	if (nv_init_ring(dev))
+		mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
 
-	/* FIXME: Do we need synchronize_irq(dev->irq) here? */
+	/* reinit nic view of the rx queue */
+	writel(np->rx_buf_sz, base + NvRegOffloadConfig);
+	setup_hw_rings(dev, NV_SETUP_RX_RING | NV_SETUP_TX_RING);
+	writel( ((np->rx_ring_size-1) << NVREG_RINGSZ_RXSHIFT) + ((np->tx_ring_size-1) << NVREG_RINGSZ_TXSHIFT),
+		base + NvRegRingSizes);
+	pci_push(base);
+	writel(NVREG_TXRXCTL_KICK|np->txrxctl_bits, get_hwbase(dev) + NvRegTxRxControl);
+	pci_push(base);
+
+	/* restart rx engine */
+	nv_start_txrx(dev);
+	spin_unlock(&np->lock);
+	netif_tx_unlock_bh(dev);
 
+out:
 	writel(mask, base + NvRegIrqMask);
 	pci_push(base);
-
-	if (!using_multi_irqs(dev)) {
-		if (nv_optimized(np))
-			nv_nic_irq_optimized(0, dev);
-		else
-			nv_nic_irq(0, dev);
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			enable_irq_lockdep(dev->irq);
-	} else {
-		if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) {
-			nv_nic_irq_rx(0, dev);
-			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
-		}
-		if (np->nic_poll_irq & NVREG_IRQ_TX_ALL) {
-			nv_nic_irq_tx(0, dev);
-			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
-		}
-		if (np->nic_poll_irq & NVREG_IRQ_OTHER) {
-			nv_nic_irq_other(0, dev);
-			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector);
-		}
-	}
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -3386,15 +3354,15 @@ static void nv_poll_controller(struct net_device *dev)
 }
 #endif
 
-static void nv_do_stats_poll(unsigned long data)
+static void nv_stats_task(struct work_struct *_work)
 {
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
+	struct delayed_work *work = container_of(_work, struct delayed_work, work);
+	struct fe_priv *np = container_of(work, struct fe_priv, stats_task);
+	struct net_device *dev = np->dev;
 
 	nv_get_hw_stats(dev);
 
-	if (!np->in_shutdown)
-		mod_timer(&np->stats_poll, jiffies + STATS_INTERVAL);
+	schedule_delayed_work(work, STATS_INTERVAL);
 }
 
 static void nv_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
@@ -3840,10 +3808,8 @@ static int nv_set_ringparam(struct net_device *dev, struct ethtool_ringparam* ri
 	if (netif_running(dev)) {
 		/* reinit driver view of the queues */
 		set_bufsize(dev);
-		if (nv_init_ring(dev)) {
-			if (!np->in_shutdown)
-				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-		}
+		if (nv_init_ring(dev))
+			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
 
 		/* reinit nic view of the queues */
 		writel(np->rx_buf_sz, base + NvRegOffloadConfig);
@@ -4024,7 +3990,7 @@ static void nv_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *e
 	struct fe_priv *np = netdev_priv(dev);
 
 	/* update stats */
-	nv_do_stats_poll((unsigned long)dev);
+	nv_get_hw_stats(dev);
 
 	memcpy(buffer, &np->estats, nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
 }
@@ -4322,10 +4288,9 @@ static void nv_self_test(struct net_device *dev, struct ethtool_test *test, u64
 		if (netif_running(dev)) {
 			/* reinit driver view of the rx queue */
 			set_bufsize(dev);
-			if (nv_init_ring(dev)) {
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-			}
+			if (nv_init_ring(dev))
+				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
+
 			/* reinit nic view of the rx queue */
 			writel(np->rx_buf_sz, base + NvRegOffloadConfig);
 			setup_hw_rings(dev, NV_SETUP_RX_RING | NV_SETUP_TX_RING);
@@ -4473,8 +4438,6 @@ static int nv_open(struct net_device *dev)
 	nv_txrx_reset(dev);
 	writel(0, base + NvRegUnknownSetupReg6);
 
-	np->in_shutdown = 0;
-
 	/* give hw rings */
 	setup_hw_rings(dev, NV_SETUP_RX_RING | NV_SETUP_TX_RING);
 	writel( ((np->rx_ring_size-1) << NVREG_RINGSZ_RXSHIFT) + ((np->tx_ring_size-1) << NVREG_RINGSZ_TXSHIFT),
@@ -4579,7 +4542,7 @@ static int nv_open(struct net_device *dev)
 
 	/* start statistics timer */
 	if (np->driver_data & (DEV_HAS_STATISTICS_V1|DEV_HAS_STATISTICS_V2))
-		mod_timer(&np->stats_poll, jiffies + STATS_INTERVAL);
+		schedule_delayed_work(&np->stats_task, STATS_INTERVAL);
 
 	spin_unlock_irq(&np->lock);
 
@@ -4594,17 +4557,14 @@ static int nv_close(struct net_device *dev)
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base;
 
-	spin_lock_irq(&np->lock);
-	np->in_shutdown = 1;
-	spin_unlock_irq(&np->lock);
 	netif_stop_queue(dev);
 	napi_disable(&np->napi);
 	napi_disable(&np->tx_napi);
 	synchronize_irq(dev->irq);
 
 	del_timer_sync(&np->oom_kick);
-	del_timer_sync(&np->nic_poll);
-	del_timer_sync(&np->stats_poll);
+	cancel_rearming_delayed_work(&np->stats_task);
+	cancel_work_sync(&np->reset_task);
 
 	spin_lock_irq(&np->lock);
 	nv_stop_txrx(dev);
@@ -4658,12 +4618,8 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	init_timer(&np->oom_kick);
 	np->oom_kick.data = (unsigned long) dev;
 	np->oom_kick.function = &nv_do_rx_refill;	/* timer handler */
-	init_timer(&np->nic_poll);
-	np->nic_poll.data = (unsigned long) dev;
-	np->nic_poll.function = &nv_do_nic_poll;	/* timer handler */
-	init_timer(&np->stats_poll);
-	np->stats_poll.data = (unsigned long) dev;
-	np->stats_poll.function = &nv_do_stats_poll;	/* timer handler */
+	INIT_WORK(&np->reset_task, nv_reset_task);
+	INIT_DELAYED_WORK(&np->stats_task, nv_stats_task);
 
 	err = pci_enable_device(pci_dev);
 	if (err) {

^ permalink raw reply related	[flat|nested] 21+ messages in thread
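[Editor's note: the timer-to-delayed-work conversion in the patch above leans on container_of() to recover the driver private struct from the embedded work item. A minimal userspace sketch of that pointer recovery follows; the stub struct and function names are illustrative, not the driver's real types.]

```c
#include <assert.h>
#include <stddef.h>

/* container_of: recover the enclosing struct from a pointer to one member */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct delayed_work_stub {		/* stand-in for struct delayed_work */
	int pending;
};

struct fe_priv_stub {			/* stand-in for struct fe_priv */
	int stats;
	struct delayed_work_stub stats_task;
};

/* mirrors nv_stats_task(): the workqueue hands the handler only a pointer
 * to the embedded member, never to the enclosing private struct */
static struct fe_priv_stub *stats_task_priv(struct delayed_work_stub *work)
{
	return container_of(work, struct fe_priv_stub, stats_task);
}
```

The real handler receives a `struct work_struct *`; because `work` is the first member of `struct delayed_work`, the same arithmetic steps from one to the other.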

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
                   ` (4 preceding siblings ...)
  2007-10-06 15:15 ` [PATCH 5/5] forcedeth: timer overhaul Jeff Garzik
@ 2007-10-06 15:17 ` Jeff Garzik
  2007-10-06 15:24 ` Jeff Garzik
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:17 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton

On Sat, Oct 06, 2007 at 11:12:50AM -0400, Jeff Garzik wrote:
> 
> The 'fe-lock' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock


It should also be pointed out that these patches were generated on top
of davem's net-2.6.24.git tree.

They -probably- apply to -mm, but you might have to remove the forcedeth
patches -mm already has, before applying.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
                   ` (5 preceding siblings ...)
  2007-10-06 15:17 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
@ 2007-10-06 15:24 ` Jeff Garzik
  2007-10-07  9:08 ` Ingo Molnar
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-06 15:24 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton

Jeff Garzik wrote:
> The goals of these changes are:
> * move the driver towards a more sane, simple, easy to verify locking
>   setup -- irq handler would often acquire/release the lock twice
>   for each interrupt -- and hopefully

  s/and hopefully//   (it became the next bullet point)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-06 15:14 ` [PATCH 2/5] forcedeth: interrupt handling cleanup Jeff Garzik
@ 2007-10-07  4:43   ` Yinghai Lu
  2007-10-07 11:40     ` Jeff Garzik
  2007-10-07  9:03   ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2007-10-07  4:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton

On 10/6/07, Jeff Garzik <jeff@garzik.org> wrote:
>
> commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
> Author: Jeff Garzik <jeff@garzik.org>
> Date:   Fri Oct 5 22:56:05 2007 -0400
>
>     [netdrvr] forcedeth: interrupt handling cleanup
>
>     * nv_nic_irq_optimized() and nv_nic_irq_other() were complete duplicates
>       of nv_nic_irq(), with the exception of one function call.  Consolidate
>       all three into a single interrupt handler, deleting a lot of redundant
>       code.
>
>     * greatly simplify irq handler locking.
>
>       Prior to this change, the irq handler(s) would acquire and release
>       np->lock for each action (RX, TX, other events).
>
>       For the common case -- RX or TX work -- the lock is always acquired,
>       making all successive acquire/release combinations largely redundant.
>
>       Acquire the lock at the beginning of the irq handler, and release it at
>       the end of the irq handler.  This is simple, easy, and obvious.
>
>     * remove irq handler work loop.
>
>       All interesting events emanating from the irq handler either have
>       their own work loops, or they poke a timer into action.
>
>       Therefore, delete the pointless master interrupt handler work loop.
>
>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
>
>  drivers/net/forcedeth.c |  325 +++++++++++-------------------------------------
>  1 file changed, 77 insertions(+), 248 deletions(-)
>
any chance to create three versions of the irq handlers for ioapic, msi, msi-x...?

MACRO or inline function?

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-06 15:14 ` [PATCH 2/5] forcedeth: interrupt handling cleanup Jeff Garzik
  2007-10-07  4:43   ` Yinghai Lu
@ 2007-10-07  9:03   ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2007-10-07  9:03 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton


* Jeff Garzik <jeff@garzik.org> wrote:

> -			spin_unlock(&np->lock);
> -			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq.\n", dev->name, i);
> -			break;

i like that! One forcedeth annoyance that triggers frequently on one of 
my testboxes is:

[  120.955202] eth0: too many iterations (6) in nv_nic_irq.
[  121.233865] eth0: too many iterations (6) in nv_nic_irq.
[  129.215450] eth0: too many iterations (6) in nv_nic_irq.
[  139.734408] eth0: too many iterations (6) in nv_nic_irq.
[  144.546811] eth0: too many iterations (6) in nv_nic_irq.
[  153.811005] eth0: too many iterations (6) in nv_nic_irq.
[  154.695879] eth0: too many iterations (6) in nv_nic_irq.
[  155.455078] eth0: too many iterations (6) in nv_nic_irq.
[  173.912162] eth0: too many iterations (6) in nv_nic_irq.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
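[Editor's note: the warning Ingo quotes comes from the old handler's bounded work loop, which reprocessed events until either none remained or an iteration cap tripped. A schematic of the pattern being removed, in illustrative userspace C rather than driver code:]

```c
#include <assert.h>

#define MAX_INTERRUPT_WORK 6	/* the driver's default iteration cap */

/* old-style irq handler: loop over pending events, bail at the cap */
static int irq_loop(int pending, int *warned)
{
	int handled = 0;

	while (pending-- > 0) {
		handled++;		/* process one round of RX/TX/other work */
		if (handled >= MAX_INTERRUPT_WORK && pending > 0) {
			*warned = 1;	/* logs "too many iterations (6)" */
			break;
		}
	}
	return handled;
}
```

With the loop deleted, leftover work is instead picked up by the NAPI poll paths (or a kicked timer), so the cap and its log message disappear.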

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
                   ` (6 preceding siblings ...)
  2007-10-06 15:24 ` Jeff Garzik
@ 2007-10-07  9:08 ` Ingo Molnar
  2007-10-07 11:34   ` Jeff Garzik
  2007-10-07 11:23 ` [PATCH 6/n] forcedeth: protect slow path with mutex Jeff Garzik
  2007-10-07 14:47 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
  9 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2007-10-07  9:08 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton


* Jeff Garzik <jeff@garzik.org> wrote:

> * I feel TX NAPI is a useful tool, because it provides an independent TX
>   process control point and system load feedback point.
>   Thus I felt this was slightly superior to tasklets.

/me agrees violently

btw., when i played with this tunable under -rt:

 enum {
         NV_OPTIMIZATION_MODE_THROUGHPUT,
         NV_OPTIMIZATION_MODE_CPU
 };
 static int optimization_mode = NV_OPTIMIZATION_MODE_THROUGHPUT;

the MODE_CPU one gave (much) _higher_ bandwidth. The queueing model in 
forcedeth seemed to be not that robust and i think a single queueing 
model should be adopted instead of this tunable. (which i think just hid 
some bug/dependency) But i never got to the bottom of it so it's just 
the impression i got.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 6/n] forcedeth: protect slow path with mutex
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
                   ` (7 preceding siblings ...)
  2007-10-07  9:08 ` Ingo Molnar
@ 2007-10-07 11:23 ` Jeff Garzik
  2007-10-07 14:40   ` Jeff Garzik
  2007-10-07 14:47 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
  9 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 11:23 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton


commit abca163a14b28c234df9bf38034bc967ff81c3aa
Author: Jeff Garzik <jeff@garzik.org>
Date:   Sun Oct 7 07:22:14 2007 -0400

    [netdrvr] forcedeth: wrap slow path hw manipulation inside hw_mutex
    
    * This makes sure everybody who wants to start/stop the RX and TX engines
      first acquires this mutex.
    
    * tx_timeout code was deleted, replaced by scheduling reset_task.
    
    * linkchange moved to a workqueue (always inside hw_mutex)
    
    * simplified irq handling a bit
    
    * make sure to disable workqueues before NAPI
    
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

 drivers/net/forcedeth.c |  272 ++++++++++++++++++++++++++++++------------------
 1 file changed, 175 insertions(+), 97 deletions(-)

abca163a14b28c234df9bf38034bc967ff81c3aa
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a037f49..d1c1efa 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -63,6 +63,7 @@
 #include <linux/if_vlan.h>
 #include <linux/dma-mapping.h>
 #include <linux/workqueue.h>
+#include <linux/mutex.h>
 
 #include <asm/irq.h>
 #include <asm/io.h>
@@ -647,6 +648,12 @@ struct nv_skb_map {
 	unsigned int dma_len;
 };
 
+struct nv_mc_info {
+	u32 addr[2];
+	u32 mask[2];
+	u32 pff;
+};
+
 /*
  * SMP locking:
  * All hardware access under dev->priv->lock, except the performance
@@ -709,6 +716,8 @@ struct fe_priv {
 	unsigned int pkt_limit;
 	struct timer_list oom_kick;
 	struct work_struct reset_task;
+	struct work_struct linkchange_task;
+	struct work_struct mcast_task;
 	struct delayed_work stats_task;
 	u32 reset_task_irq;
 	int rx_ring_size;
@@ -731,14 +740,18 @@ struct fe_priv {
 	int tx_ring_size;
 
 	/* vlan fields */
-	struct vlan_group *vlangrp;
+	struct vlan_group	*vlangrp;
 
 	/* msi/msi-x fields */
-	u32 msi_flags;
-	struct msix_entry msi_x_entry[NV_MSI_X_MAX_VECTORS];
+	u32			msi_flags;
+	struct msix_entry	msi_x_entry[NV_MSI_X_MAX_VECTORS];
 
 	/* flow control */
-	u32 pause_flags;
+	u32			pause_flags;
+
+	struct mutex		hw_mutex;
+
+	struct nv_mc_info	mci;
 };
 
 /*
@@ -2120,27 +2133,9 @@ static void nv_tx_timeout(struct net_device *dev)
 
 	spin_lock_irq(&np->lock);
 
-	/* 1) stop tx engine */
-	nv_stop_tx(dev);
-
-	/* 2) process all pending tx completions */
-	if (!nv_optimized(np))
-		nv_tx_done(dev, np->tx_ring_size);
-	else
-		nv_tx_done_optimized(dev, np->tx_ring_size);
+	np->reset_task_irq = np->irqmask;
+	schedule_work(&np->reset_task);
 
-	/* 3) if there are dead entries: clear everything */
-	if (np->get_tx_ctx != np->put_tx_ctx) {
-		printk(KERN_DEBUG "%s: tx_timeout: dead entries!\n", dev->name);
-		nv_drain_tx(dev);
-		nv_init_tx(dev);
-		setup_hw_rings(dev, NV_SETUP_TX_RING);
-	}
-
-	netif_wake_queue(dev);
-
-	/* 4) restart tx engine */
-	nv_start_tx(dev);
 	spin_unlock_irq(&np->lock);
 }
 
@@ -2476,6 +2471,7 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		 * guessed, there is probably a simpler approach.
 		 * Changing the MTU is a rare event, it shouldn't matter.
 		 */
+		mutex_lock(&np->hw_mutex);
 		nv_disable_irq(dev);
 		netif_tx_lock_bh(dev);
 		spin_lock(&np->lock);
@@ -2503,6 +2499,7 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		spin_unlock(&np->lock);
 		netif_tx_unlock_bh(dev);
 		nv_enable_irq(dev);
+		mutex_unlock(&np->hw_mutex);
 	}
 	return 0;
 }
@@ -2535,6 +2532,8 @@ static int nv_set_mac_address(struct net_device *dev, void *addr)
 	/* synchronized against open : rtnl_lock() held by caller */
 	memcpy(dev->dev_addr, macaddr->sa_data, ETH_ALEN);
 
+	mutex_lock(&np->hw_mutex);
+
 	if (netif_running(dev)) {
 		netif_tx_lock_bh(dev);
 		spin_lock_irq(&np->lock);
@@ -2552,6 +2551,8 @@ static int nv_set_mac_address(struct net_device *dev, void *addr)
 	} else {
 		nv_copy_mac_to_hw(dev);
 	}
+
+	mutex_unlock(&np->hw_mutex);
 	return 0;
 }
 
@@ -2605,17 +2606,61 @@ static void nv_set_multicast(struct net_device *dev)
 	}
 	addr[0] |= NVREG_MCASTADDRA_FORCE;
 	pff |= NVREG_PFF_ALWAYS;
+
+	if (mutex_trylock(&np->hw_mutex)) {
+		spin_lock_irq(&np->lock);
+
+		nv_stop_rx(dev);
+
+		writel(addr[0], base + NvRegMulticastAddrA);
+		writel(addr[1], base + NvRegMulticastAddrB);
+		writel(mask[0], base + NvRegMulticastMaskA);
+		writel(mask[1], base + NvRegMulticastMaskB);
+		writel(pff, base + NvRegPacketFilterFlags);
+		dprintk(KERN_INFO "%s: reconfiguration for multicast lists.\n",
+			dev->name);
+
+		nv_start_rx(dev);
+
+		spin_unlock_irq(&np->lock);
+	} else {
+		spin_lock_irq(&np->lock);
+		np->mci.addr[0] = addr[0];
+		np->mci.addr[1] = addr[1];
+		np->mci.mask[0] = mask[0];
+		np->mci.mask[1] = mask[1];
+		np->mci.pff = pff;
+		spin_unlock_irq(&np->lock);
+
+		schedule_work(&np->mcast_task);
+	}
+}
+
+static void nv_mcast_task(struct work_struct *work)
+{
+	struct fe_priv *np = container_of(work, struct fe_priv, mcast_task);
+	struct net_device *dev = np->dev;
+	u8 __iomem *base = get_hwbase(dev);
+
+	mutex_lock(&np->hw_mutex);
+
 	spin_lock_irq(&np->lock);
+
 	nv_stop_rx(dev);
-	writel(addr[0], base + NvRegMulticastAddrA);
-	writel(addr[1], base + NvRegMulticastAddrB);
-	writel(mask[0], base + NvRegMulticastMaskA);
-	writel(mask[1], base + NvRegMulticastMaskB);
-	writel(pff, base + NvRegPacketFilterFlags);
+
+	writel(np->mci.addr[0], base + NvRegMulticastAddrA);
+	writel(np->mci.addr[1], base + NvRegMulticastAddrB);
+	writel(np->mci.mask[0], base + NvRegMulticastMaskA);
+	writel(np->mci.mask[1], base + NvRegMulticastMaskB);
+	writel(np->mci.pff, base + NvRegPacketFilterFlags);
 	dprintk(KERN_INFO "%s: reconfiguration for multicast lists.\n",
 		dev->name);
+
 	nv_start_rx(dev);
+
 	spin_unlock_irq(&np->lock);
+
+	mutex_unlock(&np->hw_mutex);
 }
 
 static void nv_update_pause(struct net_device *dev, u32 pause_flags)
@@ -2873,6 +2918,15 @@ static void nv_linkchange(struct net_device *dev)
 	}
 }
 
+static void nv_linkchange_task(struct work_struct *work)
+{
+	struct fe_priv *np = container_of(work, struct fe_priv, linkchange_task);
+
+	mutex_lock(&np->hw_mutex);
+	nv_linkchange(np->dev);
+	mutex_unlock(&np->hw_mutex);
+}
+
 static void nv_link_irq(struct net_device *dev)
 {
 	u8 __iomem *base = get_hwbase(dev);
@@ -2883,7 +2937,7 @@ static void nv_link_irq(struct net_device *dev)
 	dprintk(KERN_INFO "%s: link change irq, status 0x%x.\n", dev->name, miistat);
 
 	if (miistat & (NVREG_MIISTAT_LINKCHANGE))
-		nv_linkchange(dev);
+		schedule_work(&np->linkchange_task);
 	dprintk(KERN_DEBUG "%s: link change notification done.\n", dev->name);
 }
 
@@ -2894,34 +2948,39 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 	u32 events;
 	int handled = 0;
 	u32 upd_mask = 0;
+	bool msix = (np->msi_flags & NV_MSI_X_ENABLED);
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq%s\n", dev->name,
 		optimized ? "_optimized" : "");
 
-	spin_lock(&np->lock);
-
-	if (!(np->msi_flags & NV_MSI_X_ENABLED)) {
+	if (!msix)
 		events = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
-		writel(NVREG_IRQSTAT_MASK, base + NvRegIrqStatus);
-	} else {
+	else
 		events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
-		writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
-	}
 
 	dprintk(KERN_DEBUG "%s: irq: %08x\n", dev->name, events);
 
+	spin_lock(&np->lock);
+
 	if (!(events & np->irqmask))
 		goto out;
 
-	if (events & NVREG_IRQ_RX_ALL) {
-		netif_rx_schedule(dev, &np->napi);
+	if (!msix)
+		writel(NVREG_IRQSTAT_MASK, base + NvRegIrqStatus);
+	else
+		writel(NVREG_IRQSTAT_MASK, base + NvRegMSIXIrqStatus);
+
+	if ((events & NVREG_IRQ_RX_ALL) &&
+	    (netif_rx_schedule_prep(dev, &np->napi))) {
+		__netif_rx_schedule(dev, &np->napi);
 
+		/* Disable further receive irqs */
 		upd_mask |= NVREG_IRQ_RX_ALL;
 	}
 
-	if (events & NVREG_IRQ_TX_ALL) {
-		netif_rx_schedule(dev, &np->tx_napi);
+	if ((events & NVREG_IRQ_TX_ALL) &&
+	    (netif_rx_schedule_prep(dev, &np->tx_napi))) {
+		__netif_rx_schedule(dev, &np->tx_napi);
 
+		/* Disable further xmit irqs */
 		upd_mask |= NVREG_IRQ_TX_ALL;
@@ -2930,7 +2989,7 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 	if (upd_mask) {
 		np->irqmask &= ~upd_mask;
 
-		if (np->msi_flags & NV_MSI_X_ENABLED)
+		if (msix)
 			writel(upd_mask, base + NvRegIrqMask);
 		else
 			writel(np->irqmask, base + NvRegIrqMask);
@@ -2940,7 +2999,7 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 		nv_link_irq(dev);
 
 	if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
-		nv_linkchange(dev);
+		schedule_work(&np->linkchange_task);
 		np->link_timeout = jiffies + LINK_TIMEOUT;
 	}
 
@@ -2958,7 +3017,7 @@ static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
 
 	if (unlikely(events & NVREG_IRQ_RECOVER_ERROR)) {
 		/* disable interrupts on the nic */
-		if (!(np->msi_flags & NV_MSI_X_ENABLED))
+		if (!msix)
 			writel(0, base + NvRegIrqMask);
 		else
 			writel(np->irqmask, base + NvRegIrqMask);
@@ -3001,20 +3060,15 @@ static irqreturn_t nv_nic_irq_other(int foo, void *data)
 
 static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 {
-	struct net_device *dev = (struct net_device *) data;
+	struct net_device *dev = data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
-	u32 events;
 
-	events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_TX_ALL;
-	writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);
+	writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);	/* ack ints */
+	writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);	    /* disable ints */
+
+	netif_rx_schedule(dev, &np->tx_napi);
 
-	if (events) {
-		netif_rx_schedule(dev, &np->tx_napi);
-		/* disable receive interrupts on the nic */
-		writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);
-		pci_push(base);
-	}
 	return IRQ_HANDLED;
 }
 
@@ -3090,26 +3144,21 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 
 static irqreturn_t nv_nic_irq_rx(int foo, void *data)
 {
-	struct net_device *dev = (struct net_device *) data;
+	struct net_device *dev = data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
-	u32 events;
 
-	events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_RX_ALL;
-	writel(NVREG_IRQ_RX_ALL, base + NvRegMSIXIrqStatus);
+	writel(NVREG_IRQ_RX_ALL, base + NvRegMSIXIrqStatus);	/* ack ints */
+	writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);	    /* disable ints */
+
+	netif_rx_schedule(dev, &np->napi);
 
-	if (events) {
-		netif_rx_schedule(dev, &np->napi);
-		/* disable receive interrupts on the nic */
-		writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
-		pci_push(base);
-	}
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t nv_nic_irq_test(int foo, void *data)
 {
-	struct net_device *dev = (struct net_device *) data;
+	struct net_device *dev = data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
@@ -3287,12 +3336,17 @@ static void nv_reset_task(struct work_struct *work)
 	struct net_device *dev = np->dev;
 	u8 __iomem *base = get_hwbase(dev);
 	u32 mask;
+	unsigned long flags;
+
+	mutex_lock(&np->hw_mutex);
 
 	/*
 	 * First disable irq(s) and then
 	 * reenable interrupts on the nic, we have to do this before calling
 	 * nv_nic_irq because that may decide to do otherwise
 	 */
+	netif_tx_lock_bh(dev);
+	spin_lock_irqsave(&np->lock, flags);
 
 	if (!using_multi_irqs(dev)) {
 		mask = np->irqmask;
@@ -3308,11 +3362,7 @@ static void nv_reset_task(struct work_struct *work)
 	np->reset_task_irq = 0;
 
 	printk(KERN_INFO "forcedeth: MAC in recoverable error state\n");
-	if (!netif_running(dev))
-		goto out;
 
-	netif_tx_lock_bh(dev);
-	spin_lock(&np->lock);
 	/* stop engines */
 	nv_stop_txrx(dev);
 	nv_txrx_reset(dev);
@@ -3334,12 +3384,14 @@ static void nv_reset_task(struct work_struct *work)
 
 	/* restart rx engine */
 	nv_start_txrx(dev);
-	spin_unlock(&np->lock);
-	netif_tx_unlock_bh(dev);
 
-out:
 	writel(mask, base + NvRegIrqMask);
 	pci_push(base);
+
+	spin_unlock_irqrestore(&np->lock, flags);
+	netif_tx_unlock_bh(dev);
+
+	mutex_unlock(&np->hw_mutex);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -4321,29 +4373,45 @@ static void nv_get_strings(struct net_device *dev, u32 stringset, u8 *buffer)
 	}
 }
 
+static int nv_ethtool_begin(struct net_device *dev)
+{
+	struct fe_priv *np = get_nvpriv(dev);
+
+	mutex_lock(&np->hw_mutex);
+	return 0;
+}
+
+static void nv_ethtool_complete(struct net_device *dev)
+{
+	struct fe_priv *np = get_nvpriv(dev);
+
+	mutex_unlock(&np->hw_mutex);
+}
+
 static const struct ethtool_ops ops = {
-	.get_drvinfo = nv_get_drvinfo,
-	.get_link = ethtool_op_get_link,
-	.get_wol = nv_get_wol,
-	.set_wol = nv_set_wol,
-	.get_settings = nv_get_settings,
-	.set_settings = nv_set_settings,
-	.get_regs_len = nv_get_regs_len,
-	.get_regs = nv_get_regs,
-	.nway_reset = nv_nway_reset,
-	.set_tso = nv_set_tso,
-	.get_ringparam = nv_get_ringparam,
-	.set_ringparam = nv_set_ringparam,
-	.get_pauseparam = nv_get_pauseparam,
-	.set_pauseparam = nv_set_pauseparam,
-	.get_rx_csum = nv_get_rx_csum,
-	.set_rx_csum = nv_set_rx_csum,
-	.set_tx_csum = nv_set_tx_csum,
-	.set_sg = nv_set_sg,
-	.get_strings = nv_get_strings,
-	.get_ethtool_stats = nv_get_ethtool_stats,
-	.get_sset_count = nv_get_sset_count,
-	.self_test = nv_self_test,
+	.begin			= nv_ethtool_begin,
+	.complete		= nv_ethtool_complete,
+	.get_drvinfo		= nv_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_wol		= nv_get_wol,
+	.set_wol		= nv_set_wol,
+	.get_settings		= nv_get_settings,
+	.set_settings		= nv_set_settings,
+	.get_regs_len		= nv_get_regs_len,
+	.get_regs		= nv_get_regs,
+	.nway_reset		= nv_nway_reset,
+	.set_tso		= nv_set_tso,
+	.get_ringparam		= nv_get_ringparam,
+	.set_ringparam		= nv_set_ringparam,
+	.get_pauseparam		= nv_get_pauseparam,
+	.set_pauseparam		= nv_set_pauseparam,
+	.get_rx_csum		= nv_get_rx_csum,
+	.set_rx_csum		= nv_set_rx_csum,
+	.set_tx_csum		= nv_set_tx_csum,
+	.set_sg			= nv_set_sg,
+	.get_strings		= nv_get_strings,
+	.get_ethtool_stats	= nv_get_ethtool_stats,
+	.get_sset_count		= nv_get_sset_count,
+	.self_test		= nv_self_test,
 };
 
 static void nv_vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
@@ -4524,9 +4592,13 @@ static int nv_open(struct net_device *dev)
 	}
 	/* set linkspeed to invalid value, thus force nv_update_linkspeed
 	 * to init hw */
+	mutex_lock(&np->hw_mutex);
 	np->linkspeed = 0;
 	ret = nv_update_linkspeed(dev);
+	mutex_unlock(&np->hw_mutex);
+
 	nv_start_txrx(dev);
+
 	napi_enable(&np->napi);
 	napi_enable(&np->tx_napi);
 	netif_start_queue(dev);
@@ -4558,13 +4630,17 @@ static int nv_close(struct net_device *dev)
 	u8 __iomem *base;
 
 	netif_stop_queue(dev);
-	napi_disable(&np->napi);
-	napi_disable(&np->tx_napi);
-	synchronize_irq(dev->irq);
 
 	del_timer_sync(&np->oom_kick);
 	cancel_rearming_delayed_work(&np->stats_task);
 	cancel_work_sync(&np->reset_task);
+	cancel_work_sync(&np->linkchange_task);
+	cancel_work_sync(&np->mcast_task);
+
+	napi_disable(&np->napi);
+	napi_disable(&np->tx_napi);
+
+	synchronize_irq(dev->irq);
 
 	spin_lock_irq(&np->lock);
 	nv_stop_txrx(dev);
@@ -4619,6 +4695,8 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	np->oom_kick.data = (unsigned long) dev;
 	np->oom_kick.function = &nv_do_rx_refill;	/* timer handler */
 	INIT_WORK(&np->reset_task, nv_reset_task);
+	INIT_WORK(&np->linkchange_task, nv_linkchange_task);
+	INIT_WORK(&np->mcast_task, nv_mcast_task);
 	INIT_DELAYED_WORK(&np->stats_task, nv_stats_task);
 
 	err = pci_enable_device(pci_dev);

^ permalink raw reply related	[flat|nested] 21+ messages in thread
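[Editor's note: nv_set_multicast() in the patch above uses a trylock-or-defer pattern: program the hardware immediately if hw_mutex is free, otherwise stash the new filter state and let nv_mcast_task() apply it from process context. A minimal single-threaded sketch of that control flow; the lock stubs and field names are illustrative, not the driver's.]

```c
#include <assert.h>

/* stand-ins for mutex_trylock()/mutex_unlock() on hw_mutex */
static int hw_mutex_held;

static int hw_trylock(void)
{
	if (hw_mutex_held)
		return 0;	/* busy: caller must defer */
	hw_mutex_held = 1;
	return 1;
}

static void hw_unlock(void)
{
	hw_mutex_held = 0;
}

struct mc_state {
	int pending_pff;	/* stashed filter value (np->mci.pff stand-in) */
	int applied_pff;	/* what the "hardware" currently holds */
	int work_queued;	/* schedule_work() stand-in */
};

/* mirrors nv_set_multicast(): program now if the mutex is free, else defer */
static void set_multicast(struct mc_state *s, int pff)
{
	if (hw_trylock()) {
		s->applied_pff = pff;	/* write the filter registers */
		hw_unlock();
	} else {
		s->pending_pff = pff;	/* remember what to program later */
		s->work_queued = 1;	/* queue the mcast work item */
	}
}

/* mirrors nv_mcast_task(): runs later from process context, may sleep */
static void mcast_task(struct mc_state *s)
{
	while (!hw_trylock())
		;			/* mutex_lock() stand-in; real code sleeps */
	s->applied_pff = s->pending_pff;
	s->work_queued = 0;
	hw_unlock();
}
```

The trylock is what keeps nv_set_multicast() safe to call from contexts that must not sleep, while the deferred path gets the full sleeping mutex_lock().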

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-07  9:08 ` Ingo Molnar
@ 2007-10-07 11:34   ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 11:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton

Ingo Molnar wrote:
> * Jeff Garzik <jeff@garzik.org> wrote:
> 
>> * I feel TX NAPI is a useful tool, because it provides an independent TX
>>   process control point and system load feedback point.
>>   Thus I felt this was slightly superior to tasklets.
> 
> /me agrees violently
> 
> btw., when i played with this tunable under -rt:
> 
>  enum {
>          NV_OPTIMIZATION_MODE_THROUGHPUT,
>          NV_OPTIMIZATION_MODE_CPU
>  };
>  static int optimization_mode = NV_OPTIMIZATION_MODE_THROUGHPUT;
> 
> the MODE_CPU one gave (much) _higher_ bandwidth. The queueing model in 
> forcedeth seemed to be not that robust and i think a single queueing 
> model should be adopted instead of this tunable. (which i think just hid 
> some bug/dependency) But i never got to the bottom of it so it's just 
> the impression i got.

That's interesting.  It will be informative to narrow down the variables 
affected by this.  My changes stirred the pot quite a bit :)

* 'throughput' mode enables MSI-X, and separate interrupt vectors for RX 
and TX.  so, NVIDIA's MSI-X implementation, our generic MSI-X support, 
or "Known bugs" (see top of file) may be a factor here.

* 'throughput' mode also changes the NIC's timer interrupt frequency

* do you recall if you were running in NAPI mode?  It defaulted to off 
in Kconfig, but I turned it on unconditionally.

* I think TX NAPI has the potential to make the optimization_mode 
irrelevant (along with the other changes, most notably the interrupt 
handling change)

* and overall, yes, if we can have a single queueing model / 
optimization mode I am strongly in favor of that.

Testing welcome ;-)  Though these patches are raw and "hot off the 
presses", so unrelated bugs are practically a certainty.  And I am 
worrying about the "Known bugs" note at the top.  My gut feeling is that 
this was, in part, misunderstanding on the part of reverse-engineers, 
since corrected when NVIDIA started contributing to the driver.

	Jeff




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-07  4:43   ` Yinghai Lu
@ 2007-10-07 11:40     ` Jeff Garzik
  2007-10-07 19:36       ` Yinghai Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 11:40 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton

Yinghai Lu wrote:
> On 10/6/07, Jeff Garzik <jeff@garzik.org> wrote:
>> commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
>> Author: Jeff Garzik <jeff@garzik.org>
>> Date:   Fri Oct 5 22:56:05 2007 -0400
>>
>>     [netdrvr] forcedeth: interrupt handling cleanup
>>
>>     * nv_nic_irq_optimized() and nv_nic_irq_other() were complete duplicates
>>       of nv_nic_irq(), with the exception of one function call.  Consolidate
>>       all three into a single interrupt handler, deleting a lot of redundant
>>       code.
>>
>>     * greatly simplify irq handler locking.
>>
>>       Prior to this change, the irq handler(s) would acquire and release
>>       np->lock for each action (RX, TX, other events).
>>
>>       For the common case -- RX or TX work -- the lock is always acquired,
>>       making all successive acquire/release combinations largely redundant.
>>
>>       Acquire the lock at the beginning of the irq handler, and release it at
>>       the end of the irq handler.  This is simple, easy, and obvious.
>>
>>     * remove irq handler work loop.
>>
>>       All interesting events emanating from the irq handler either have
>>       their own work loops, or they poke a timer into action.
>>
>>       Therefore, delete the pointless master interrupt handler work loop.
>>
>>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
>>
>>  drivers/net/forcedeth.c |  325 +++++++++++-------------------------------------
>>  1 file changed, 77 insertions(+), 248 deletions(-)
>>
> any chance to create three versions of the irq handlers for ioapic, msi, msi-x...?
> 
> MACRO or inline function?

MSI-X already has its own separate interrupt handlers.  MSI and INTx 
call the same interrupt handling code, just as the unmodified driver 
does.  Creating an MSI-specific irq handler would not save very much 
AFAICS, but I might be missing something.

Do you have ideas/suggestions for a different method?

	Jeff
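
[Editor's note: the consolidation discussed here -- one shared handler parameterized on the code path, with thin wrappers kept for IRQ registration -- can be sketched in userspace C. The names and event bits below are illustrative, not the driver's.]

```c
#include <assert.h>
#include <stdbool.h>

#define EV_RX 0x1		/* illustrative RX event bit */
#define EV_TX 0x2		/* illustrative TX event bit */

/* shared body, mirroring __nv_nic_irq(dev, optimized): the two former
 * handlers differed in exactly one call, modeled here by 'optimized' */
static int shared_irq_body(int events, bool optimized, int *rx_work, int *tx_work)
{
	if (events & EV_RX)
		(*rx_work)++;
	if (events & EV_TX)
		*tx_work += optimized ? 2 : 1;	/* the one divergent step */
	return 1;				/* IRQ_HANDLED */
}

/* thin wrappers, one per registered handler, replacing the duplicates */
static int irq_plain(int ev, int *rx, int *tx)
{
	return shared_irq_body(ev, false, rx, tx);
}

static int irq_optimized(int ev, int *rx, int *tx)
{
	return shared_irq_body(ev, true, rx, tx);
}
```

The wrappers cost nothing at runtime worth noticing, and the duplicated event-decoding logic lives in exactly one place.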




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] forcedeth: process TX completions using NAPI
  2007-10-06 15:14 ` [PATCH 3/5] forcedeth: process TX completions using NAPI Jeff Garzik
@ 2007-10-07 14:39   ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 14:39 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton, Yinghai Lu, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

Jeff Garzik wrote:
> commit 57cbfacc00d69be2ba02b65d1021442273b76263
> Author: Jeff Garzik <jeff@garzik.org>
> Date:   Fri Oct 5 23:25:56 2007 -0400
> 
>     [netdrvr] forcedeth: process TX completions using NAPI
>     
>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
> 
>  drivers/net/forcedeth.c |  143 +++++++++++++++++++++++++++---------------------
>  1 file changed, 83 insertions(+), 60 deletions(-)

The attached patch fixes an obvious bug.  Once applied, TX NAPI actually 
works :)



[-- Attachment #2: patch.fe3 --]
[-- Type: text/plain, Size: 530 bytes --]

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1c236e6..e25c05e 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3059,7 +3059,7 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 
 static int nv_napi_tx_poll(struct napi_struct *napi, int budget)
 {
-	struct fe_priv *np = container_of(napi, struct fe_priv, napi);
+	struct fe_priv *np = container_of(napi, struct fe_priv, tx_napi);
 	struct net_device *dev = np->dev;
 	u8 __iomem *base = get_hwbase(dev);
 	unsigned long flags;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/n] forcedeth: protect slow path with mutex
  2007-10-07 11:23 ` [PATCH 6/n] forcedeth: protect slow path with mutex Jeff Garzik
@ 2007-10-07 14:40   ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 14:40 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

Jeff Garzik wrote:
> commit abca163a14b28c234df9bf38034bc967ff81c3aa
> Author: Jeff Garzik <jeff@garzik.org>
> Date:   Sun Oct 7 07:22:14 2007 -0400
> 
>     [netdrvr] forcedeth: wrap slow path hw manipulation inside hw_mutex
>     
>     * This makes sure everybody who wants to start/stop the RX and TX engines
>       first acquires this mutex.
>     
>     * tx_timeout code was deleted, replaced by scheduling reset_task.
>     
>     * linkchange moved to a workqueue (always inside hw_mutex)
>     
>     * simplified irq handling a bit
>     
>     * make sure to disable workqueues before NAPI
>     
>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
> 
>  drivers/net/forcedeth.c |  272 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 175 insertions(+), 97 deletions(-)

You will need the attached patch to even build (oops).

Also, testing shows there is a mutex deadlock somewhere.


[-- Attachment #2: patch.fe6 --]
[-- Type: text/plain, Size: 603 bytes --]

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index d1c1efa..91926b1 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2929,6 +2929,7 @@ static void nv_linkchange_task(struct work_struct *work)
 
 static void nv_link_irq(struct net_device *dev)
 {
+	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	u32 miistat;
 
@@ -4378,6 +4379,7 @@ static int nv_ethtool_begin (struct net_device *dev)
 	struct fe_priv *np = get_nvpriv(dev);
 
 	mutex_lock(&np->hw_mutex);
+	return 0;
 }
 
 static void nv_ethtool_complete (struct net_device *dev)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
                   ` (8 preceding siblings ...)
  2007-10-07 11:23 ` [PATCH 6/n] forcedeth: protect slow path with mutex Jeff Garzik
@ 2007-10-07 14:47 ` Jeff Garzik
  2007-10-07 19:39   ` Yinghai Lu
  9 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 14:47 UTC (permalink / raw)
  To: netdev, Ayaz Abdulla; +Cc: LKML, Andrew Morton, Yinghai Lu, Ingo Molnar

Jeff Garzik wrote:
> The 'fe-lock' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock
> 
> contains the following changes that I would like to get tested:
> 
>       [netdrvr] forcedeth: make NAPI unconditional
>       [netdrvr] forcedeth: interrupt handling cleanup
>       [netdrvr] forcedeth: process TX completions using NAPI
>       [netdrvr] forcedeth: internal simplification and cleanups
>       [netdrvr] forcedeth: timer overhaul

OK, I've successfully tested patches 1-5 on an AMD64 system with

00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)

A trivial s/napi/tx_napi/ fix must be applied to patch #3 (sent in a 
separate email).  Once that is done, the patchset works flawlessly here, 
passing simple RX, TX, and RX+TX TCP stress tests.

I never ran into any TX problems of the type hinted at by the "Known 
bugs" section at the top of forcedeth.c.  Either (a) my chip does not 
have that bug, (b) my chip needs DEV_NEED_TIMERIRQ for other reasons, or 
(c) the issue is not a hardware issue but a driver bug that is now fixed.

I'm going to hope it's (c). <grin>  But only testing will tell.

	Jeff


P.S.  Depending on when this gets fixed, you might have to revert 
net-2.6.24.git commit 5f5dace1ce001b24fb8286e09ffd3c4d2b547e09 ("[NET]: 
Various dst_ifdown routines to catch refcounting bugs").

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-07 11:40     ` Jeff Garzik
@ 2007-10-07 19:36       ` Yinghai Lu
  2007-10-07 20:07         ` Jeff Garzik
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2007-10-07 19:36 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton

On 10/7/07, Jeff Garzik <jeff@garzik.org> wrote:
> Yinghai Lu wrote:
> > On 10/6/07, Jeff Garzik <jeff@garzik.org> wrote:
> >> commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
> >> Author: Jeff Garzik <jeff@garzik.org>
> >> Date:   Fri Oct 5 22:56:05 2007 -0400
> >>
> >>     [netdrvr] forcedeth: interrupt handling cleanup
> >>
> >>     * nv_nic_irq_optimized() and nv_nic_irq_other() were complete duplicates
> >>       of nv_nic_irq(), with the exception of one function call.  Consolidate
> >>       all three into a single interrupt handler, deleting a lot of redundant
> >>       code.
> >>
> >>     * greatly simplify irq handler locking.
> >>
> >>       Prior to this change, the irq handler(s) would acquire and release
> >>       np->lock for each action (RX, TX, other events).
> >>
> >>       For the common case -- RX or TX work -- the lock is always acquired,
> >>       making all successive acquire/release combinations largely redundant.
> >>
> >>       Acquire the lock at the beginning of the irq handler, and release it at
> >>       the end of the irq handler.  This is simple, easy, and obvious.
> >>
> >>     * remove irq handler work loop.
> >>
> >>       All interesting events emanating from the irq handler either have
> >>       their own work loops, or they poke a timer into action.
> >>
> >>       Therefore, delete the pointless master interrupt handler work loop.
> >>
> >>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
> >>
> Do you have ideas/suggestions for a different method?

In the e1000 driver, there are separate handlers for MSI and IOAPIC.

But in forcedeth, nv_nic_irq_optimized keeps checking msi_flags..., and
with a large number of MSI interrupts that could push the CPU load a
little higher..., even though the network performance is ok.

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-07 14:47 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
@ 2007-10-07 19:39   ` Yinghai Lu
  2007-10-07 20:05     ` Jeff Garzik
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2007-10-07 19:39 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton, Ingo Molnar

On 10/7/07, Jeff Garzik <jeff@garzik.org> wrote:
> Jeff Garzik wrote:
> > The 'fe-lock' branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock
> OK, I've successfully tested patches 1-5 on an AMD64 system with
>
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)

Need to test that with an mcp55-based board, which will use MSI.

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] forcedeth: several proposed updates for testing
  2007-10-07 19:39   ` Yinghai Lu
@ 2007-10-07 20:05     ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 20:05 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton, Ingo Molnar

Yinghai Lu wrote:
> On 10/7/07, Jeff Garzik <jeff@garzik.org> wrote:
>> Jeff Garzik wrote:
>>> The 'fe-lock' branch of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lock
>> OK, I've successfully tested patches 1-5 on an AMD64 system with
>>
>> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
> 
> Need to test that with an mcp55-based board, which will use MSI.

I would love to see an MSI-X test too (might have to turn on throughput 
mode).

	Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] forcedeth: interrupt handling cleanup
  2007-10-07 19:36       ` Yinghai Lu
@ 2007-10-07 20:07         ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2007-10-07 20:07 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: netdev, Ayaz Abdulla, LKML, Andrew Morton

Yinghai Lu wrote:
> On 10/7/07, Jeff Garzik <jeff@garzik.org> wrote:
>> Yinghai Lu wrote:
>>> On 10/6/07, Jeff Garzik <jeff@garzik.org> wrote:
>>>> commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
>>>> Author: Jeff Garzik <jeff@garzik.org>
>>>> Date:   Fri Oct 5 22:56:05 2007 -0400
>>>>
>>>>     [netdrvr] forcedeth: interrupt handling cleanup
>>>>
>>>>     * nv_nic_irq_optimized() and nv_nic_irq_other() were complete duplicates
>>>>       of nv_nic_irq(), with the exception of one function call.  Consolidate
>>>>       all three into a single interrupt handler, deleting a lot of redundant
>>>>       code.
>>>>
>>>>     * greatly simplify irq handler locking.
>>>>
>>>>       Prior to this change, the irq handler(s) would acquire and release
>>>>       np->lock for each action (RX, TX, other events).
>>>>
>>>>       For the common case -- RX or TX work -- the lock is always acquired,
>>>>       making all successive acquire/release combinations largely redundant.
>>>>
>>>>       Acquire the lock at the beginning of the irq handler, and release it at
>>>>       the end of the irq handler.  This is simple, easy, and obvious.
>>>>
>>>>     * remove irq handler work loop.
>>>>
>>>>       All interesting events emanating from the irq handler either have
>>>>       their own work loops, or they poke a timer into action.
>>>>
>>>>       Therefore, delete the pointless master interrupt handler work loop.
>>>>
>>>>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
>>>>
>> Do you have ideas/suggestions for a different method?
> 
> In the e1000 driver, there are separate handlers for MSI and IOAPIC.
> 
> But in forcedeth, nv_nic_irq_optimized keeps checking msi_flags..., and
> with a large number of MSI interrupts that could push the CPU load a
> little higher..., even though the network performance is ok.

With all the activity in the interrupt handler, a few in-cache branches 
are definitely going to be lost in the noise.

Separating the interrupt handlers between MSI and non-MSI tends to be of 
more benefit when the separation is accompanied by more efficient 
locking in the MSI interrupt handler, or a different mode of interrupt 
clear, or some other attribute.

Though CPU usage would be a good thing to measure with these patches.

	Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2007-10-07 20:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-06 15:12 [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
2007-10-06 15:13 ` [PATCH 1/5] forcedeth: make NAPI unconditional Jeff Garzik
2007-10-06 15:14 ` [PATCH 2/5] forcedeth: interrupt handling cleanup Jeff Garzik
2007-10-07  4:43   ` Yinghai Lu
2007-10-07 11:40     ` Jeff Garzik
2007-10-07 19:36       ` Yinghai Lu
2007-10-07 20:07         ` Jeff Garzik
2007-10-07  9:03   ` Ingo Molnar
2007-10-06 15:14 ` [PATCH 3/5] forcedeth: process TX completions using NAPI Jeff Garzik
2007-10-07 14:39   ` Jeff Garzik
2007-10-06 15:14 ` [PATCH 4/5] forcedeth: internal simplification and cleanups Jeff Garzik
2007-10-06 15:15 ` [PATCH 5/5] forcedeth: timer overhaul Jeff Garzik
2007-10-06 15:17 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
2007-10-06 15:24 ` Jeff Garzik
2007-10-07  9:08 ` Ingo Molnar
2007-10-07 11:34   ` Jeff Garzik
2007-10-07 11:23 ` [PATCH 6/n] forcedeth: protect slow path with mutex Jeff Garzik
2007-10-07 14:40   ` Jeff Garzik
2007-10-07 14:47 ` [PATCH 0/5] forcedeth: several proposed updates for testing Jeff Garzik
2007-10-07 19:39   ` Yinghai Lu
2007-10-07 20:05     ` Jeff Garzik
