[PATCH 1/3] r8169: non-NAPI netif_poll

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal
@ 2004-11-02 18:03 Jon Mason
  2004-11-06 14:56 ` Francois Romieu
  0 siblings, 1 reply; 4+ messages in thread
From: Jon Mason @ 2004-11-02 18:03 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

rtl8169_close contains a call to netif_poll_disable.  This call sets and spins 
on the __LINK_STATE_RX_SCHED bit, which isn't really a problem if this 
function is only called in close (as that bit is off with NAPI off).  
However, if the driver hits a transmit timeout (or any other call that 
references rtl8169_wait_for_quiescence), the the module will hang 
indefinitely during any subsequent call to netif_poll_disable or dev_close 
function call because this bit is now set.

Since this function should not really be called with NAPI off, I chose to 
remove the calls if NAPI is not defined (See patch below).  The patch is 
against the 2.6.10-rc1-mm2 version of the driver.  I have verified that this 
fixes the problem on ppc64 and x86_64.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>

--- r8169.c.orig        2004-11-02 10:21:31.305104272 -0600
+++ r8169.c     2004-11-02 10:25:56.849735352 -0600
@@ -85,10 +85,12 @@ VERSION 1.6LK       <2004/04/14>
 #define rtl8169_rx_skb                 netif_receive_skb
 #define rtl8169_rx_hwaccel_skb         vlan_hwaccel_rx
 #define rtl8169_rx_quota(count, quota) min(count, quota)
+#define rtl8169_poll_disable(dev)      netif_poll_disable(dev)
 #else
 #define rtl8169_rx_skb                 netif_rx
 #define rtl8169_rx_hwaccel_skb         vlan_hwaccel_receive_skb
 #define rtl8169_rx_quota(count, quota) count
+#define rtl8169_poll_disable(dev)
 #endif

 /* media options */
@@ -1745,7 +1747,7 @@ static void rtl8169_wait_for_quiescence(
        synchronize_irq(dev->irq);

        /* Wait for any pending NAPI task to complete */
-       netif_poll_disable(dev);
+       rtl8169_poll_disable(dev);
 }

 static void rtl8169_reinit_task(void *_data)
@@ -2284,7 +2286,7 @@ rtl8169_close(struct net_device *dev)

        free_irq(dev->irq, dev);

-       netif_poll_disable(dev);
+       rtl8169_poll_disable(dev);

        rtl8169_tx_clear(tp);

[-- Attachment #2: r8169-close-1.patch --]
[-- Type: text/x-diff, Size: 997 bytes --]

--- r8169.c.orig	2004-11-02 10:21:31.305104272 -0600
+++ r8169.c	2004-11-02 10:25:56.849735352 -0600
@@ -85,10 +85,12 @@ VERSION 1.6LK	<2004/04/14>
 #define rtl8169_rx_skb			netif_receive_skb
 #define rtl8169_rx_hwaccel_skb		vlan_hwaccel_rx
 #define rtl8169_rx_quota(count, quota)	min(count, quota)
+#define rtl8169_poll_disable(dev)	netif_poll_disable(dev)
 #else
 #define rtl8169_rx_skb			netif_rx
 #define rtl8169_rx_hwaccel_skb		vlan_hwaccel_receive_skb
 #define rtl8169_rx_quota(count, quota)	count
+#define rtl8169_poll_disable(dev)
 #endif
 
 /* media options */
@@ -1745,7 +1747,7 @@ static void rtl8169_wait_for_quiescence(
 	synchronize_irq(dev->irq);
 
 	/* Wait for any pending NAPI task to complete */
-	netif_poll_disable(dev);
+	rtl8169_poll_disable(dev);
 }
 
 static void rtl8169_reinit_task(void *_data)
@@ -2284,7 +2286,7 @@ rtl8169_close(struct net_device *dev)
 
 	free_irq(dev->irq, dev);
 
-	netif_poll_disable(dev);
+	rtl8169_poll_disable(dev);
 
 	rtl8169_tx_clear(tp);
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal
  2004-11-06 14:56 ` Francois Romieu
@ 2004-11-06  9:30   ` Jon Mason
  2004-11-06 23:38     ` Francois Romieu
  0 siblings, 1 reply; 4+ messages in thread
From: Jon Mason @ 2004-11-06  9:30 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, jgarzik

On Saturday 06 November 2004 02:56 pm, Francois Romieu wrote:
[...]
> Yep but it seems to me that there is still a window for a race with NAPI
> if the relevant error path is taken:
>
> 1 - Tx timeout
> 2 - rtl8169_reset_task: netif_poll_disable is issued. Assume that the
>     refilling of the Rx buffers fails -> rtl8169_reset_task schedules
>     itself for later
> 3 - operator closes the device: netif_running(dev) == 0 and rtl8169_close()
>     is issued
> 4 - rtl8169_close() loops on netif_poll_disable()

There is an overall problem with the rtl8169_reset_task.  While the driver 
continues to receive packets, can't allcate any new rx buffers, or the rx dma 
engine is hung and no rx buffers are processed, the rtl8169_reset_task will 
schedule itself and loop forever (as dirty_rx != cur_rx).

It seems to me a better idea to just remove the dirty_rx == cur_rx check, and 
let the function clean-up the mess the driver/adapter is in.  

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal
  2004-11-02 18:03 [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal Jon Mason
@ 2004-11-06 14:56 ` Francois Romieu
  2004-11-06  9:30   ` Jon Mason
  0 siblings, 1 reply; 4+ messages in thread
From: Francois Romieu @ 2004-11-06 14:56 UTC (permalink / raw)
  To: Jon Mason; +Cc: netdev, jgarzik

Jon Mason <jdmason@us.ibm.com> :
> rtl8169_close contains a call to netif_poll_disable.  This call sets and spins 
> on the __LINK_STATE_RX_SCHED bit, which isn't really a problem if this 
> function is only called in close (as that bit is off with NAPI off).  
> However, if the driver hits a transmit timeout (or any other call that 
> references rtl8169_wait_for_quiescence), the the module will hang 
> indefinitely during any subsequent call to netif_poll_disable or dev_close 
> function call because this bit is now set.
> 
> Since this function should not really be called with NAPI off, I chose to 
> remove the calls if NAPI is not defined (See patch below).  The patch is 
> against the 2.6.10-rc1-mm2 version of the driver.  I have verified that this 
> fixes the problem on ppc64 and x86_64.

Yep but it seems to me that there is still a window for a race with NAPI
if the relevant error path is taken:

1 - Tx timeout
2 - rtl8169_reset_task: netif_poll_disable is issued. Assume that the
    refilling of the Rx buffers fails -> rtl8169_reset_task schedules
    itself for later
3 - operator closes the device: netif_running(dev) == 0 and rtl8169_close()
    is issued
4 - rtl8169_close() loops on netif_poll_disable()

Neither __netif_rx_schedule nor netif_poll_enable was issued. Deadlock.

It should be enough to change rtl8169_wait_for_quiescence() for:

static void rtl8169_wait_for_quiescence(struct net_device *dev)
{
        synchronize_irq(dev->irq);

        /* Wait for any pending NAPI task to complete */
        netif_poll_disable(dev);

	RTL_W16(IntrMask, 0x00);

        netif_poll_enable(dev);
}

There will be a few "interrupt XXX taken in poll" when it races with
the irq handler but they are harmless anyway [*].

The lack of netif_poll_enable() after netif_wake_queue() in
rtl8169_reset_task() which you suggested to me should also be fixed
with this change.

[*] Ok, it should be possible to have a BUG_ON due to the simultaneous
    invocation of rtl8169_reinit_task() and rtl8169_reset_task() under
    weird circumstances but this one will be easily fixed by an 
    adequate netif_queue_stopped() in rtl8169_pcierr_interrupt().

--
Ueimor

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal
  2004-11-06  9:30   ` Jon Mason
@ 2004-11-06 23:38     ` Francois Romieu
  0 siblings, 0 replies; 4+ messages in thread
From: Francois Romieu @ 2004-11-06 23:38 UTC (permalink / raw)
  To: Jon Mason; +Cc: netdev, jgarzik

Jon Mason <jdmason@us.ibm.com> :
[...]
> There is an overall problem with the rtl8169_reset_task.  While the driver 
> continues to receive packets, can't allcate any new rx buffers, or the rx dma 
> engine is hung and no rx buffers are processed, the rtl8169_reset_task will 
> schedule itself and loop forever (as dirty_rx != cur_rx).

rtl8169_reset_task() is issued after a rtl8169_hw_reset() which should disable
the Rx/Tx DMA activity. dirty_rx == cur_rx means a complete Rx ring refill.

At first sight, I'd say that a partially refilled Rx ring is still usable if
it verifies 0 < dirty_rx % NUM_RX_DESC < cur_rx % NUM_RX_DESC or
cur_rx % NUM_RX_DESC == 0. I prefer to simply test for a complete refill.

Regarding the initial issue, what about the second patch below (against
2.6.10-rc1-bk15 + latest Jeff's netdev + first patch) ?

It is a bit paranoid but an user already reported a PCI error interruption
on x86 so it should be possible to have a PCI error close to a Tx watchdog
event. Despite what I claimed before, netif_queue_stopped() is not enough to
synchronize the tasks.


diff -puN drivers/net/r8169.c~r8169-250 drivers/net/r8169.c
--- linux-2.6.10-rc1/drivers/net/r8169.c~r8169-250	2004-11-06 21:49:00.000000000 +0100
+++ linux-2.6.10-rc1-fr/drivers/net/r8169.c	2004-11-06 21:49:00.000000000 +0100
@@ -1742,10 +1742,17 @@ static void rtl8169_schedule_work(struct
 
 static void rtl8169_wait_for_quiescence(struct net_device *dev)
 {
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+
 	synchronize_irq(dev->irq);
 
 	/* Wait for any pending NAPI task to complete */
 	netif_poll_disable(dev);
+
+	RTL_W16(IntrMask, 0x0000);
+
+	netif_poll_enable(dev);
 }
 
 static void rtl8169_reinit_task(void *_data)

diff -puN drivers/net/r8169.c~r8169-260 drivers/net/r8169.c
--- linux-2.6.10-rc1/drivers/net/r8169.c~r8169-260	2004-11-06 23:50:59.000000000 +0100
+++ linux-2.6.10-rc1-fr/drivers/net/r8169.c	2004-11-07 00:19:02.000000000 +0100
@@ -150,6 +150,12 @@ enum phy_version {
 	RTL_GIGA_PHY_VER_G = 0x07, /* PHY Reg 0x03 bit0-3 == 0x0002 */
 };
 
+enum drv_state {
+	R8169_STATE_NONE = 0,
+	R8169_STATE_UP,
+	R8169_STATE_DOWN,
+	R8169_STATE_TX_TIMEOUT,
+};
 
 #define _R(NAME,MAC,MASK) \
 	{ .name = NAME, .mac_version = MAC, .RxConfigMask = MASK }
@@ -403,6 +409,7 @@ struct rtl8169_private {
 	unsigned int (*phy_reset_pending)(void __iomem *);
 	unsigned int (*link_ok)(void __iomem *);
 	struct work_struct task;
+	unsigned int state;
 };
 
 MODULE_AUTHOR("Realtek");
@@ -423,6 +430,7 @@ static void rtl8169_hw_start(struct net_
 static int rtl8169_close(struct net_device *dev);
 static void rtl8169_set_rx_mode(struct net_device *dev);
 static void rtl8169_tx_timeout(struct net_device *dev);
+static void rtl8169_work(void *);
 static struct net_device_stats *rtl8169_get_stats(struct net_device *netdev);
 static int rtl8169_rx_interrupt(struct net_device *, struct rtl8169_private *,
 				void __iomem *);
@@ -1480,13 +1488,15 @@ rtl8169_open(struct net_device *dev)
 	if (retval < 0)
 		goto err_free_rx;
 
-	INIT_WORK(&tp->task, NULL, dev);
+	INIT_WORK(&tp->task, rtl8169_work, dev);
 
 	rtl8169_hw_start(dev);
 
 	rtl8169_request_timer(dev);
 
 	rtl8169_check_link_status(dev, tp, tp->mmio_addr);
+
+	netif_poll_enable(dev);
 out:
 	return retval;
 
@@ -1506,6 +1516,8 @@ static void rtl8169_hw_reset(void __iome
 	/* Disable interrupts */
 	RTL_W16(IntrMask, 0x0000);
 
+	RTL_W16(IntrStatus, 0x0000);
+
 	/* Reset the chipset */
 	RTL_W8(ChipCmd, CmdReset);
 
@@ -1732,11 +1744,8 @@ static void rtl8169_tx_clear(struct rtl8
 	tp->cur_tx = tp->dirty_tx = 0;
 }
 
-static void rtl8169_schedule_work(struct net_device *dev, void (*task)(void *))
+static void rtl8169_schedule_work(struct rtl8169_private *tp)
 {
-	struct rtl8169_private *tp = netdev_priv(dev);
-
-	PREPARE_WORK(&tp->task, task, dev);
 	schedule_delayed_work(&tp->task, 4);
 }
 
@@ -1750,38 +1759,30 @@ static void rtl8169_wait_for_quiescence(
 	/* Wait for any pending NAPI task to complete */
 	netif_poll_disable(dev);
 
-	RTL_W16(IntrMask, 0x0000);
+	RTL_W16(IntrMask, 0x00);
 
 	netif_poll_enable(dev);
 }
 
-static void rtl8169_reinit_task(void *_data)
+static int rtl8169_down_task(struct net_device *dev)
 {
-	struct net_device *dev = _data;
-	int ret;
+	struct rtl8169_private *tp = netdev_priv(dev);
+	unsigned long flags;
 
-	if (netif_running(dev)) {
-		rtl8169_wait_for_quiescence(dev);
-		rtl8169_close(dev);
-	}
+	rtl8169_wait_for_quiescence(dev);
+	rtl8169_close(dev);
 
-	ret = rtl8169_open(dev);
-	if (unlikely(ret < 0)) {
-		if (net_ratelimit()) {
-			printk(PFX KERN_ERR "%s: reinit failure (status = %d)."
-			       " Rescheduling.\n", dev->name, ret);
-		}
-		rtl8169_schedule_work(dev, rtl8169_reinit_task);
-	}
+	spin_lock_irqsave(&tp->lock, flags);
+	tp->state = R8169_STATE_UP;
+	spin_unlock_irqrestore(&tp->lock, flags);
+
+	return 0;
 }
 
-static void rtl8169_reset_task(void *_data)
+static int rtl8169_reset_task(struct net_device *dev)
 {
-	struct net_device *dev = _data;
 	struct rtl8169_private *tp = netdev_priv(dev);
-
-	if (!netif_running(dev))
-		return;
+	int ret = 0;
 
 	rtl8169_wait_for_quiescence(dev);
 
@@ -1792,23 +1793,81 @@ static void rtl8169_reset_task(void *_da
 		rtl8169_init_ring_indexes(tp);
 		rtl8169_hw_start(dev);
 		netif_wake_queue(dev);
+	} else
+		ret = -EAGAIN;
+
+	return ret;
+}
+
+static void rtl8169_work(void *_data)
+{
+	struct net_device *dev = _data;
+	struct rtl8169_private *tp = netdev_priv(dev);
+	unsigned long flags;
+	static struct {
+		int (*action)(struct net_device *);
+		unsigned int state;
+	} rtl8169_recovery_task[] = {
+		{ rtl8169_reset_task,	R8169_STATE_TX_TIMEOUT },
+		{ rtl8169_down_task,	R8169_STATE_DOWN },
+		{ rtl8169_open,		R8169_STATE_UP },
+		{ NULL,			R8169_STATE_NONE }
+	}, *p;
+
+	if (!netif_running(dev))
+		return;
+
+	spin_lock_irqsave(&tp->lock, flags);
+	for (p = rtl8169_recovery_task; p->action; p++) {
+		if (p->state == tp->state)
+			break;
+	}
+	spin_unlock_irqrestore(&tp->lock, flags);
+
+	if (likely(p->action)) {
+		int ret;
+
+		ret = p->action(dev);
+
+		spin_lock_irqsave(&tp->lock, flags);
+
+		if ((ret < 0) || (p->state != tp->state)) {
+			if ((ret < 0) && net_ratelimit()) {
+				printk(PFX KERN_ERR
+				       "%s: task failed (%p). Rescheduling.\n",
+				       dev->name, p->action);
+			}
+			rtl8169_schedule_work(tp);
+		} else
+			tp->state = R8169_STATE_NONE;
+
+		spin_unlock_irqrestore(&tp->lock, flags);
 	} else {
-		if (net_ratelimit()) {
-			printk(PFX KERN_EMERG "%s: Rx buffers shortage\n",
-			       dev->name);
-		}
-		rtl8169_schedule_work(dev, rtl8169_reset_task);
+		printk(PFX KERN_ERR "%s: nothing to do in delayed task.\n",
+		       dev->name);
 	}
 }
 
 static void rtl8169_tx_timeout(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
+	unsigned long flags;
 
 	rtl8169_hw_reset(tp->mmio_addr);
 
-	/* Let's wait a bit while any (async) irq lands on */
-	rtl8169_schedule_work(dev, rtl8169_reset_task);
+	spin_lock_irqsave(&tp->lock, flags);
+
+	/*
+	 * Let's wait a bit while any (async) irq lands on.
+	 * A racing pci error recovery task should be able to recover
+	 * from a TX timeout as well: no need to insist.
+	 */
+	if (likely(tp->state == R8169_STATE_NONE)) {
+		tp->state = R8169_STATE_TX_TIMEOUT;
+		rtl8169_schedule_work(tp);
+	}
+
+	spin_unlock_irqrestore(&tp->lock, flags);
 }
 
 static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
@@ -1977,11 +2036,20 @@ static void rtl8169_pcierr_interrupt(str
 
 	/* The infamous DAC f*ckup only happens at boot time */
 	if ((tp->cp_cmd & PCIDAC) && (tp->dirty_rx == tp->cur_rx == 0)) {
+		unsigned long flags;
+
 		printk(KERN_INFO PFX "%s: disabling PCI DAC.\n", dev->name);
 		tp->cp_cmd &= ~PCIDAC;
 		RTL_W16(CPlusCmd, tp->cp_cmd);
 		dev->features &= ~NETIF_F_HIGHDMA;
-		rtl8169_schedule_work(dev, rtl8169_reinit_task);
+
+		spin_lock_irqsave(&tp->lock, flags);
+
+		if (likely(tp->state == R8169_STATE_NONE))
+			rtl8169_schedule_work(tp);
+		tp->state = R8169_STATE_DOWN;
+
+		spin_unlock_irqrestore(&tp->lock, flags);
 	}
 
 	rtl8169_hw_reset(ioaddr);

_

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-11-06 23:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-02 18:03 [PATCH 1/3] r8169: non-NAPI netif_poll_disable removal Jon Mason
2004-11-06 14:56 ` Francois Romieu
2004-11-06  9:30   ` Jon Mason
2004-11-06 23:38     ` Francois Romieu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).