[PATCH] natsemi: netpoll fixes

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] natsemi: netpoll fixes
@ 2007-03-05 20:10 Sergei Shtylyov
  2007-03-05 22:41 ` Mark Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-05 20:10 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev, mhuth

Fix two issues in this driver's netpoll path: one usual, with spin_unlock_irq()
enabling interrupts which nobody asks it to do (that has been fixed recently in
a number of drivers) and one unusual, with poll_controller() method possibly
causing loss of interrupts due to the interrupt status register being cleared
by a simple read and the interrpupt handler simply storing it, not accumulating.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

---
 drivers/net/natsemi.c |   24 +++++++++++++++++++-----
 1 files changed, 19 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/net/natsemi.c
===================================================================
--- linux-2.6.orig/drivers/net/natsemi.c
+++ linux-2.6/drivers/net/natsemi.c
@@ -2024,6 +2024,7 @@ static int start_tx(struct sk_buff *skb,
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem * ioaddr = ns_ioaddr(dev);
 	unsigned entry;
+	unsigned long flags;
 
 	/* Note: Ordering is important here, set the field with the
 	   "ownership" bit last, and only then increment cur_tx. */
@@ -2037,7 +2038,7 @@ static int start_tx(struct sk_buff *skb,
 
 	np->tx_ring[entry].addr = cpu_to_le32(np->tx_dma[entry]);
 
-	spin_lock_irq(&np->lock);
+	spin_lock_irqsave(&np->lock, flags);
 
 	if (!np->hands_off) {
 		np->tx_ring[entry].cmd_status = cpu_to_le32(DescOwn | skb->len);
@@ -2056,7 +2057,7 @@ static int start_tx(struct sk_buff *skb,
 		dev_kfree_skb_irq(skb);
 		np->stats.tx_dropped++;
 	}
-	spin_unlock_irq(&np->lock);
+	spin_unlock_irqrestore(&np->lock, flags);
 
 	dev->trans_start = jiffies;
 
@@ -2222,6 +2223,8 @@ static void netdev_rx(struct net_device 
 		pkt_len = (desc_status & DescSizeMask) - 4;
 		if ((desc_status&(DescMore|DescPktOK|DescRxLong)) != DescPktOK){
 			if (desc_status & DescMore) {
+				unsigned long flags;
+
 				if (netif_msg_rx_err(np))
 					printk(KERN_WARNING
 						"%s: Oversized(?) Ethernet "
@@ -2236,12 +2239,12 @@ static void netdev_rx(struct net_device 
 				 * reset procedure documented in
 				 * AN-1287. */
 
-				spin_lock_irq(&np->lock);
+				spin_lock_irqsave(&np->lock, flags);
 				reset_rx(dev);
 				reinit_rx(dev);
 				writel(np->ring_dma, ioaddr + RxRingPtr);
 				check_link(dev);
-				spin_unlock_irq(&np->lock);
+				spin_unlock_irqrestore(&np->lock, flags);
 
 				/* We'll enable RX on exit from this
 				 * function. */
@@ -2396,8 +2399,19 @@ static struct net_device_stats *get_stat
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void natsemi_poll_controller(struct net_device *dev)
 {
+	struct netdev_private *np = netdev_priv(dev);
+
 	disable_irq(dev->irq);
-	intr_handler(dev->irq, dev);
+
+	/*
+	 * A real interrupt might have already reached us at this point
+	 * but NAPI might still haven't called us back.  As the interrupt
+	 * status register is cleared by reading, we should prevent an
+	 * interrupt loss in this case...
+	 */
+	if (!np->intr_status)
+		intr_handler(dev->irq, dev);
+
 	enable_irq(dev->irq);
 }
 #endif


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-05 20:10 [PATCH] natsemi: netpoll fixes Sergei Shtylyov
@ 2007-03-05 22:41 ` Mark Brown
  2007-03-05 22:43 ` Mark Brown
  2007-03-06 11:10 ` Jeff Garzik
  2 siblings, 0 replies; 15+ messages in thread
From: Mark Brown @ 2007-03-05 22:41 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]

On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote:

>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  static void natsemi_poll_controller(struct net_device *dev)
>  {
> +	struct netdev_private *np = netdev_priv(dev);
> +
>  	disable_irq(dev->irq);
> -	intr_handler(dev->irq, dev);
> +
> +	/*
> +	 * A real interrupt might have already reached us at this point
> +	 * but NAPI might still haven't called us back.  As the interrupt
> +	 * status register is cleared by reading, we should prevent an
> +	 * interrupt loss in this case...
> +	 */
> +	if (!np->intr_status)
> +		intr_handler(dev->irq, dev);
> +
>  	enable_irq(dev->irq);

Is it possible for this to run at the same time as the NAPI poll?  If so
then it is possible for the netpoll poll to run between np->intr_status
being cleared and netif_rx_complete() being called.  If the hardware
asserts an interrupt at the wrong moment then this could cause the 

In any case, this is a problem independently of netpoll if the chip
shares an interrupt with anything so the interrupt handler should be
fixed to cope with this situation instead.

-- 
"You grabbed my hand and we fell into it, like a daydream - or a fever."

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-05 20:10 [PATCH] natsemi: netpoll fixes Sergei Shtylyov
  2007-03-05 22:41 ` Mark Brown
@ 2007-03-05 22:43 ` Mark Brown
  2007-03-06  4:10   ` Mark Huth
  2007-03-06 11:10 ` Jeff Garzik
  2 siblings, 1 reply; 15+ messages in thread
From: Mark Brown @ 2007-03-05 22:43 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: jgarzik, netdev, mhuth

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

[Once more with CCs]

On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote:

>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  static void natsemi_poll_controller(struct net_device *dev)
>  {
> +     struct netdev_private *np = netdev_priv(dev);
> +
>       disable_irq(dev->irq);
> -     intr_handler(dev->irq, dev);
> +
> +     /*
> +      * A real interrupt might have already reached us at this point
> +      * but NAPI might still haven't called us back.  As the
> interrupt
> +      * status register is cleared by reading, we should prevent an
> +      * interrupt loss in this case...
> +      */
> +     if (!np->intr_status)
> +             intr_handler(dev->irq, dev);
> +
>       enable_irq(dev->irq);

Is it possible for this to run at the same time as the NAPI poll?  If so
then it is possible for the netpoll poll to run between np->intr_status
being cleared and netif_rx_complete() being called.  If the hardware
asserts an interrupt at the wrong moment then this could cause the

In any case, this is a problem independently of netpoll if the chip
shares an interrupt with anything so the interrupt handler should be
fixed to cope with this situation instead.

--
"You grabbed my hand and we fell into it, like a daydream - or a fever."

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-05 22:43 ` Mark Brown
@ 2007-03-06  4:10   ` Mark Huth
  2007-03-10 20:25     ` Sergei Shtylyov
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Huth @ 2007-03-06  4:10 UTC (permalink / raw)
  To: Sergei Shtylyov, jgarzik, netdev, mhuth

Mark Brown wrote:
> [Once more with CCs]
>
> On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote:
>
>   
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>  static void natsemi_poll_controller(struct net_device *dev)
>>  {
>> +     struct netdev_private *np = netdev_priv(dev);
>> +
>>       disable_irq(dev->irq);
>> -     intr_handler(dev->irq, dev);
>> +
>> +     /*
>> +      * A real interrupt might have already reached us at this point
>> +      * but NAPI might still haven't called us back.  As the
>> interrupt
>> +      * status register is cleared by reading, we should prevent an
>> +      * interrupt loss in this case...
>> +      */
>> +     if (!np->intr_status)
>> +             intr_handler(dev->irq, dev);
>> +
>>       enable_irq(dev->irq);
>>     
>
> Is it possible for this to run at the same time as the NAPI poll?  If so
> then it is possible for the netpoll poll to run between np->intr_status
> being cleared and netif_rx_complete() being called.  If the hardware
> asserts an interrupt at the wrong moment then this could cause the
>   
Well, there is a whole task of analyzing the netpoll conditions under 
smp.  There appears to me to be a race with netpoll and NAPI on another 
processor, given that netpoll can be called with virtually any system 
condition on a debug breakpoint or crash dump initiation.  I'm spending 
some time looking into it, but don't have a smoking gun immediately.  
Regardless, if such a condition does exist, it is shared across many or 
all of the potential netpolled devices.  Since that is exactly the 
condition  the suggested patch purports to solve, it is pointless if the 
whole NAPI/netpoll race exists.  Such a race would lead to various and 
imaginative failures in the system.  So don't fix that problem in a 
particular driver.  If it exists, fix it generally in the netpoll/NAPI 
infrastructure.
> In any case, this is a problem independently of netpoll if the chip
> shares an interrupt with anything so the interrupt handler should be
> fixed to cope with this situation instead.
>   
Yes, that would appear so.  If an interrupt line is shared with this 
device, then the interrupt handler can be called again, even though the 
device's interrupts are disabled on the interface.  So, in the actual 
interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if 
it's set, leave immediately, it can't be our interrupt. If it's clear, 
read the irq enable hardware register.  If enabled, do the rest of the 
interrupt handler. Since the isr is disabled only by the interrupt 
handler, and enabled only by the poll routine, the race on the interrupt 
cause register is prevented.  And, as a byproduct, the netpoll race is 
also prevented.  You could just always read the isr enable hardware 
register, but that means you always do an operation to the chip, which 
can be painfully slow.  I guess the tradeoff depends on the probability 
of getting the isr called when NAPI is active for the device.

If this results in netpoll not getting a packet right away, that's okay, 
since the netpoll users should try again.

Mark Huth


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-05 20:10 [PATCH] natsemi: netpoll fixes Sergei Shtylyov
  2007-03-05 22:41 ` Mark Brown
  2007-03-05 22:43 ` Mark Brown
@ 2007-03-06 11:10 ` Jeff Garzik
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Garzik @ 2007-03-06 11:10 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, mhuth

Sergei Shtylyov wrote:
> Fix two issues in this driver's netpoll path: one usual, with spin_unlock_irq()
> enabling interrupts which nobody asks it to do (that has been fixed recently in
> a number of drivers) and one unusual, with poll_controller() method possibly
> causing loss of interrupts due to the interrupt status register being cleared
> by a simple read and the interrpupt handler simply storing it, not accumulating.
> 
> Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
> 
> ---
>  drivers/net/natsemi.c |   24 +++++++++++++++++++-----
>  1 files changed, 19 insertions(+), 5 deletions(-)

applied



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-06  4:10   ` Mark Huth
@ 2007-03-10 20:25     ` Sergei Shtylyov
  2007-03-11 12:16       ` Mark Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-10 20:25 UTC (permalink / raw)
  To: Mark Huth; +Cc: jgarzik, netdev

Hello.

Mark Huth wrote:

>>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>>  static void natsemi_poll_controller(struct net_device *dev)
>>>  {
>>> +     struct netdev_private *np = netdev_priv(dev);
>>> +
>>>       disable_irq(dev->irq);
>>> -     intr_handler(dev->irq, dev);
>>> +
>>> +     /*
>>> +      * A real interrupt might have already reached us at this point
>>> +      * but NAPI might still haven't called us back.  As the
>>> interrupt
>>> +      * status register is cleared by reading, we should prevent an
>>> +      * interrupt loss in this case...
>>> +      */
>>> +     if (!np->intr_status)
>>> +             intr_handler(dev->irq, dev);
>>> +
>>>       enable_irq(dev->irq);

    Oops, I was going to recast the patch but my attention switched elsewhere 
for couple of days, and it "slipped" into mainline. I'm now preparing a better 
patch to also protect...

>> Is it possible for this to run at the same time as the NAPI poll?  If so
>> then it is possible for the netpoll poll to run between np->intr_status
>> being cleared and netif_rx_complete() being called.  If the hardware
>> asserts an interrupt at the wrong moment then this could cause the

> Well, there is a whole task of analyzing the netpoll conditions under 
> smp.  There appears to me to be a race with netpoll and NAPI on another 
> processor, given that netpoll can be called with virtually any system 
> condition on a debug breakpoint or crash dump initiation.  I'm spending 
> some time looking into it, but don't have a smoking gun immediately.  
> Regardless, if such a condition does exist, it is shared across many or 
> all of the potential netpolled devices.  Since that is exactly the 
> condition  the suggested patch purports to solve, it is pointless if the 
> whole NAPI/netpoll race exists.  Such a race would lead to various and 
> imaginative failures in the system.  So don't fix that problem in a 
> particular driver.  If it exists, fix it generally in the netpoll/NAPI 
> infrastructure.

    Any takers? :-)

>> In any case, this is a problem independently of netpoll if the chip
>> shares an interrupt with anything so the interrupt handler should be
>> fixed to cope with this situation instead.

> Yes, that would appear so.  If an interrupt line is shared with this 
> device, then the interrupt handler can be called again, even though the 
> device's interrupts are disabled on the interface.  So, in the actual 
> interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if 
> it's set, leave immediately, it can't be our interrupt. If it's clear, 
> read the irq enable hardware register.  If enabled, do the rest of the 
> interrupt handler.

    It seems that there's no need to read it, as it gets set to 0 
"synchronously" with setting the 'hands_off' flag (except in NAPI callback)...

> Since the isr is disabled only by the interrupt 
> handler, and enabled only by the poll routine, the race on the interrupt 
> cause register is prevented.  And, as a byproduct, the netpoll race is 
> also prevented.  You could just always read the isr enable hardware 
> register, but that means you always do an operation to the chip, which 
> can be painfully slow.

    Yeah, it seems currently unjustified.  However IntrEnable would have been 
an ultimate criterion on taking or ignoring an interrupt otherwise...

> I guess the tradeoff depends on the probability 
> of getting the isr called when NAPI is active for the device.

    Didn't get it... :-/

> If this results in netpoll not getting a packet right away, that's okay, 
> since the netpoll users should try again.

    Well, in certain stuations (like KGDBoE), netpoll callback being called 
*while* NAPI callback is being executed would mean a deadlock, I think (as 
NAPI callback will never complete)...
    BTW, it seems I've found another interrupt lossage path in the driver:

netdev_poll() -> netdev_rx() -> reset_rx()

If the netdev_rx() detects an oversized packet, it will call reset_rx() which 
will spin in a loop "collecting" interrupt status until it sees RxResetDone 
there.  The issue is 'intr_status' field will get overwritten and interrupt 
status lost after netdev_rx() returns to netdev_poll().  How do you think, is 
this corner case worth fixing (by moving netdev_rx() call to the top of a 
do/while loop)?

> Mark Huth

WBR, Sergei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-10 20:25     ` Sergei Shtylyov
@ 2007-03-11 12:16       ` Mark Brown
  2007-03-12 13:05         ` Sergei Shtylyov
  2007-03-12 19:05         ` Mark Huth
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Brown @ 2007-03-11 12:16 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Mark Huth, jgarzik, netdev

[-- Attachment #1: Type: text/plain, Size: 8245 bytes --]

On Sat, Mar 10, 2007 at 11:25:05PM +0300, Sergei Shtylyov wrote:

>    Oops, I was going to recast the patch but my attention switched 
>    elsewhere for couple of days, and it "slipped" into mainline. I'm now 
> preparing a better patch to also protect...

Ah, I was also looking at it.  I enclose my current patch which appears
to work although I have not finished testing it yet.

> >interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if 
> >it's set, leave immediately, it can't be our interrupt. If it's clear, 
> >read the irq enable hardware register.  If enabled, do the rest of the 
> >interrupt handler.

>    It seems that there's no need to read it, as it gets set to 0 
> "synchronously" with setting the 'hands_off' flag (except in NAPI 
> callback)...

hands_off is stronger than that - it's used for sync with some of the
other code paths like suspend/resume and means "don't touch the chip".
I've added a new driver local flag instead.

>    Yeah, it seems currently unjustified.  However IntrEnable would have 
>    been an ultimate criterion on taking or ignoring an interrupt otherwise...

> >I guess the tradeoff depends on the probability 
> >of getting the isr called when NAPI is active for the device.

>    Didn't get it... :-/

Some systems can provoke this fairly easily - Sokeris have some boards
where at least three natsemis share a single interrupt line, for example
(the model I'm looking at has three, they look to have another
configuration where 5 would end up on the line).

>    BTW, it seems I've found another interrupt lossage path in the driver:

> netdev_poll() -> netdev_rx() -> reset_rx()

> If the netdev_rx() detects an oversized packet, it will call reset_rx() 
> which will spin in a loop "collecting" interrupt status until it sees 
> RxResetDone there.  The issue is 'intr_status' field will get overwritten 
> and interrupt status lost after netdev_rx() returns to netdev_poll().  How 
> do you think, is this corner case worth fixing (by moving netdev_rx() call
> to the top of a do/while loop)?

Moving netdev_rx() would fix that one but there's some others too -
there's one in the timer routine if the chip crashes.  In the case you
describe above the consequences shouldn't be too bad since it tends to
only occur at high volume so further traffic will tend to occur and
cause things to recover - all the testing of that patch was done with
the bug present and no ill effects.

Subject: natsemi: Fix NAPI for interrupt sharing
To: Jeff Garzik <jeff@garzik.org>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>, Simon Blake <simon@citylink.co.nz>, John Philips <johnphilips42@yahoo.com>, netdev@vger.kernel.org

The interrupt status register for the natsemi chips is clear on read and
was read unconditionally from both the interrupt and from the NAPI poll
routine, meaning that if the interrupt service routine was called (for 
example, due to a shared interrupt) while a NAPI poll was scheduled
interrupts could be missed.  This patch fixes that by ensuring that the
interrupt status register is only read when there is no poll scheduled.

It also reverts a workaround for this problem from the netpoll hook.

Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the
issue and Simon Blake <simon@citylink.co.nz> for testing resources.

Signed-Off-By: Mark Brown <broonie@sirena.org.uk>

Index: linux-2.6/drivers/net/natsemi.c
===================================================================
--- linux-2.6.orig/drivers/net/natsemi.c	2007-03-11 02:32:43.000000000 +0000
+++ linux-2.6/drivers/net/natsemi.c	2007-03-11 12:09:14.000000000 +0000
@@ -571,6 +571,8 @@
 	int oom;
 	/* Interrupt status */
 	u32 intr_status;
+	int poll_active;
+	spinlock_t intr_lock;
 	/* Do not touch the nic registers */
 	int hands_off;
 	/* Don't pay attention to the reported link state. */
@@ -812,9 +814,11 @@
 	pci_set_drvdata(pdev, dev);
 	np->iosize = iosize;
 	spin_lock_init(&np->lock);
+	spin_lock_init(&np->intr_lock);
 	np->msg_enable = (debug >= 0) ? (1<<debug)-1 : NATSEMI_DEF_MSG;
 	np->hands_off = 0;
 	np->intr_status = 0;
+	np->poll_active = 0;
 	np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size;
 	if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY)
 		np->ignore_phy = 1;
@@ -1406,6 +1410,8 @@
 	writel(rfcr, ioaddr + RxFilterAddr);
 }
 
+/* MUST be called so that both NAPI poll and ISR are excluded due to
+ * use of intr_status. */
 static void reset_rx(struct net_device *dev)
 {
 	int i;
@@ -2118,30 +2124,45 @@
 	struct net_device *dev = dev_instance;
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem * ioaddr = ns_ioaddr(dev);
+	unsigned long flags;
+	irqreturn_t status = IRQ_NONE;
 
 	if (np->hands_off)
 		return IRQ_NONE;
 
-	/* Reading automatically acknowledges. */
-	np->intr_status = readl(ioaddr + IntrStatus);
-
-	if (netif_msg_intr(np))
-		printk(KERN_DEBUG
-		       "%s: Interrupt, status %#08x, mask %#08x.\n",
-		       dev->name, np->intr_status,
-		       readl(ioaddr + IntrMask));
+	spin_lock_irqsave(&np->intr_lock, flags);
 
-	if (!np->intr_status)
-		return IRQ_NONE;
+	/* Reading IntrStatus automatically acknowledges so don't do
+	 * that while a poll is scheduled.  */
+	if (!np->poll_active) {
+		np->intr_status = readl(ioaddr + IntrStatus);
 
-	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
+		if (netif_msg_intr(np))
+			printk(KERN_DEBUG
+			       "%s: Interrupt, status %#08x, mask %#08x.\n",
+			       dev->name, np->intr_status,
+			       readl(ioaddr + IntrMask));
+
+		if (np->intr_status) {
+			prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
+
+			/* Disable interrupts and register for poll */
+			if (netif_rx_schedule_prep(dev)) {
+				natsemi_irq_disable(dev);
+				__netif_rx_schedule(dev);
+				np->poll_active = 1;
+			} else
+				printk(KERN_WARNING
+			       	       "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
+				       dev->name, np->intr_status,
+				       readl(ioaddr + IntrMask));
 
-	if (netif_rx_schedule_prep(dev)) {
-		/* Disable interrupts and register for poll */
-		natsemi_irq_disable(dev);
-		__netif_rx_schedule(dev);
+			status = IRQ_HANDLED;
+		}
 	}
-	return IRQ_HANDLED;
+
+	spin_unlock_irqrestore(&np->intr_lock, flags);
+	return status;
 }
 
 /* This is the NAPI poll routine.  As well as the standard RX handling
@@ -2154,8 +2175,15 @@
 
 	int work_to_do = min(*budget, dev->quota);
 	int work_done = 0;
+	unsigned long flags;
 
 	do {
+		if (netif_msg_intr(np))
+			printk(KERN_DEBUG
+			       "%s: Poll, status %#08x, mask %#08x.\n",
+			       dev->name, np->intr_status,
+			       readl(ioaddr + IntrMask));
+
 		if (np->intr_status &
 		    (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
 			spin_lock(&np->lock);
@@ -2182,14 +2210,19 @@
 		np->intr_status = readl(ioaddr + IntrStatus);
 	} while (np->intr_status);
 
+	/* We need to ensure that the ISR doesn't run between telling
+	 * NAPI we're done and enabling the interrupt. */
+	spin_lock_irqsave(&np->intr_lock, flags);
+
 	netif_rx_complete(dev);
+	np->poll_active = 0;
 
 	/* Reenable interrupts providing nothing is trying to shut
 	 * the chip down. */
-	spin_lock(&np->lock);
-	if (!np->hands_off && netif_running(dev))
+	if (!np->hands_off)
 		natsemi_irq_enable(dev);
-	spin_unlock(&np->lock);
+
+	spin_unlock_irqrestore(&np->intr_lock, flags);
 
 	return 0;
 }
@@ -2399,19 +2432,8 @@
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void natsemi_poll_controller(struct net_device *dev)
 {
-	struct netdev_private *np = netdev_priv(dev);
-
 	disable_irq(dev->irq);
-
-	/*
-	 * A real interrupt might have already reached us at this point
-	 * but NAPI might still haven't called us back.  As the interrupt
-	 * status register is cleared by reading, we should prevent an
-	 * interrupt loss in this case...
-	 */
-	if (!np->intr_status)
-		intr_handler(dev->irq, dev);
-
+	intr_handler(dev->irq, dev);
 	enable_irq(dev->irq);
 }
 #endif

-- 
"You grabbed my hand and we fell into it, like a daydream - or a fever."



[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-11 12:16       ` Mark Brown
@ 2007-03-12 13:05         ` Sergei Shtylyov
  2007-03-12 19:11           ` Mark Brown
  2007-03-12 19:12           ` Sergei Shtylyov
  2007-03-12 19:05         ` Mark Huth
  1 sibling, 2 replies; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-12 13:05 UTC (permalink / raw)
  To: Mark Brown; +Cc: Mark Huth, jgarzik, netdev

Hello.

Mark Brown wrote:

>>   Oops, I was going to recast the patch but my attention switched 
>>   elsewhere for couple of days, and it "slipped" into mainline. I'm now 
>>preparing a better patch to also protect...

> Ah, I was also looking at it.  I enclose my current patch which appears
> to work although I have not finished testing it yet.

>>>interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if 
>>>it's set, leave immediately, it can't be our interrupt. If it's clear, 
>>>read the irq enable hardware register.  If enabled, do the rest of the 
>>>interrupt handler.

>>   It seems that there's no need to read it, as it gets set to 0 
>>"synchronously" with setting the 'hands_off' flag (except in NAPI 
>>callback)...

> hands_off is stronger than that - it's used for sync with some of the
> other code paths like suspend/resume and means "don't touch the chip".
> I've added a new driver local flag instead.

    I'm not sure it was worth it -- we already had __LINK_STATE_RX_SCHED...

>>   Yeah, it seems currently unjustified.  However IntrEnable would have 
>>   been an ultimate criterion on taking or ignoring an interrupt otherwise...

>>>I guess the tradeoff depends on the probability 
>>>of getting the isr called when NAPI is active for the device.

>>   Didn't get it... :-/

    I mean I didn't understand why there's tradeoff and how it depends on the 
probability...

> Some systems can provoke this fairly easily - Sokeris have some boards
> where at least three natsemis share a single interrupt line, for example
> (the model I'm looking at has three, they look to have another
> configuration where 5 would end up on the line).

    PCI means IRQ sharing, generally. So, it should have been fixed anyway. :-)

>>   BTW, it seems I've found another interrupt lossage path in the driver:

>>netdev_poll() -> netdev_rx() -> reset_rx()

>>If the netdev_rx() detects an oversized packet, it will call reset_rx() 
>>which will spin in a loop "collecting" interrupt status until it sees 
>>RxResetDone there.  The issue is 'intr_status' field will get overwritten 
>>and interrupt status lost after netdev_rx() returns to netdev_poll().  How 
>>do you think, is this corner case worth fixing (by moving netdev_rx() call
>>to the top of a do/while loop)?

> Moving netdev_rx() would fix that one but there's some others too -
> there's one in the timer routine if the chip crashes.  In the case you

    Erm, sorry, I'm not seeing it -- could you "point with finger" please? :-)

> describe above the consequences shouldn't be too bad since it tends to
> only occur at high volume so further traffic will tend to occur and
> cause things to recover - all the testing of that patch was done with
> the bug present and no ill effects.

    Oversized packets occur only at high volume? Is it some errata?

> Subject: natsemi: Fix NAPI for interrupt sharing
> To: Jeff Garzik <jeff@garzik.org>
> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>, Simon Blake <simon@citylink.co.nz>, John Philips <johnphilips42@yahoo.com>, netdev@vger.kernel.org

> The interrupt status register for the natsemi chips is clear on read and
> was read unconditionally from both the interrupt and from the NAPI poll
> routine, meaning that if the interrupt service routine was called (for 
> example, due to a shared interrupt) while a NAPI poll was scheduled
> interrupts could be missed.  This patch fixes that by ensuring that the
> interrupt status register is only read when there is no poll scheduled.

> It also reverts a workaround for this problem from the netpoll hook.

> Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the
> issue and Simon Blake <simon@citylink.co.nz> for testing resources.

    Thanks for the patch!
    (If I only knew somebody else was working on that issue, it could have 
saved my cycles, sigh... but well, I should have said  that I was going to 
recast the patch. :-)

> Signed-Off-By: Mark Brown <broonie@sirena.org.uk>

> Index: linux-2.6/drivers/net/natsemi.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/natsemi.c	2007-03-11 02:32:43.000000000 +0000
> +++ linux-2.6/drivers/net/natsemi.c	2007-03-11 12:09:14.000000000 +0000
> @@ -571,6 +571,8 @@
>  	int oom;
>  	/* Interrupt status */
>  	u32 intr_status;
> +	int poll_active;
> +	spinlock_t intr_lock;
>  	/* Do not touch the nic registers */
>  	int hands_off;
>  	/* Don't pay attention to the reported link state. */
> @@ -812,9 +814,11 @@
>  	pci_set_drvdata(pdev, dev);
>  	np->iosize = iosize;
>  	spin_lock_init(&np->lock);
> +	spin_lock_init(&np->intr_lock);
>  	np->msg_enable = (debug >= 0) ? (1<<debug)-1 : NATSEMI_DEF_MSG;
>  	np->hands_off = 0;
>  	np->intr_status = 0;
> +	np->poll_active = 0;
>  	np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size;
>  	if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY)
>  		np->ignore_phy = 1;
> @@ -1406,6 +1410,8 @@
>  	writel(rfcr, ioaddr + RxFilterAddr);
>  }
>  
> +/* MUST be called so that both NAPI poll and ISR are excluded due to
> + * use of intr_status. */
>  static void reset_rx(struct net_device *dev)
>  {
>  	int i;
> @@ -2118,30 +2124,45 @@
>  	struct net_device *dev = dev_instance;
>  	struct netdev_private *np = netdev_priv(dev);
>  	void __iomem * ioaddr = ns_ioaddr(dev);
> +	unsigned long flags;
> +	irqreturn_t status = IRQ_NONE;
>  
>  	if (np->hands_off)
>  		return IRQ_NONE;
>  
> -	/* Reading automatically acknowledges. */
> -	np->intr_status = readl(ioaddr + IntrStatus);
> -
> -	if (netif_msg_intr(np))
> -		printk(KERN_DEBUG
> -		       "%s: Interrupt, status %#08x, mask %#08x.\n",
> -		       dev->name, np->intr_status,
> -		       readl(ioaddr + IntrMask));
> +	spin_lock_irqsave(&np->intr_lock, flags);

    Yeah, I've suspected that we need to grab np->lock here... but does that 
separate spinlock actually protect us from anything?

> -	if (!np->intr_status)
> -		return IRQ_NONE;
> +	/* Reading IntrStatus automatically acknowledges so don't do
> +	 * that while a poll is scheduled.  */
> +	if (!np->poll_active) {
> +		np->intr_status = readl(ioaddr + IntrStatus);
>  
> -	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
> +		if (netif_msg_intr(np))
> +			printk(KERN_DEBUG
> +			       "%s: Interrupt, status %#08x, mask %#08x.\n",
> +			       dev->name, np->intr_status,
> +			       readl(ioaddr + IntrMask));
> +
> +		if (np->intr_status) {
> +			prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
> +
> +			/* Disable interrupts and register for poll */
> +			if (netif_rx_schedule_prep(dev)) {
> +				natsemi_irq_disable(dev);
> +				__netif_rx_schedule(dev);
> +				np->poll_active = 1;
> +			} else
> +				printk(KERN_WARNING
> +			       	       "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
> +				       dev->name, np->intr_status,
> +				       readl(ioaddr + IntrMask));
>  
> -	if (netif_rx_schedule_prep(dev)) {
> -		/* Disable interrupts and register for poll */
> -		natsemi_irq_disable(dev);
> -		__netif_rx_schedule(dev);
> +			status = IRQ_HANDLED;
> +		}
>  	}
> -	return IRQ_HANDLED;
> +
> +	spin_unlock_irqrestore(&np->intr_lock, flags);
> +	return status;
>  }
>  
>  /* This is the NAPI poll routine.  As well as the standard RX handling
> @@ -2154,8 +2175,15 @@
>  
>  	int work_to_do = min(*budget, dev->quota);
>  	int work_done = 0;
> +	unsigned long flags;
>  
>  	do {
> +		if (netif_msg_intr(np))
> +			printk(KERN_DEBUG
> +			       "%s: Poll, status %#08x, mask %#08x.\n",
> +			       dev->name, np->intr_status,
> +			       readl(ioaddr + IntrMask));
> +
>  		if (np->intr_status &
>  		    (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
>  			spin_lock(&np->lock);
> @@ -2182,14 +2210,19 @@
>  		np->intr_status = readl(ioaddr + IntrStatus);
>  	} while (np->intr_status);
>  
> +	/* We need to ensure that the ISR doesn't run between telling
> +	 * NAPI we're done and enabling the interrupt. */

    Why? :-O

> +	spin_lock_irqsave(&np->intr_lock, flags);
> +
>  	netif_rx_complete(dev);
> +	np->poll_active = 0;
>  
>  	/* Reenable interrupts providing nothing is trying to shut
>  	 * the chip down. */
> -	spin_lock(&np->lock);
> -	if (!np->hands_off && netif_running(dev))
> +	if (!np->hands_off)
>  		natsemi_irq_enable(dev);
> -	spin_unlock(&np->lock);
> +
> +	spin_unlock_irqrestore(&np->intr_lock, flags);
>  
>  	return 0;
>  }

WBR, Sergei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-11 12:16       ` Mark Brown
  2007-03-12 13:05         ` Sergei Shtylyov
@ 2007-03-12 19:05         ` Mark Huth
  2007-03-13  0:14           ` Mark Brown
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Huth @ 2007-03-12 19:05 UTC (permalink / raw)
  To: Sergei Shtylyov, Mark Huth, jgarzik, netdev

Mark Brown wrote:
> On Sat, Mar 10, 2007 at 11:25:05PM +0300, Sergei Shtylyov wrote:
>
>   
>>    Oops, I was going to recast the patch but my attention switched 
>>    elsewhere for couple of days, and it "slipped" into mainline. I'm now 
>> preparing a better patch to also protect...
>>     
>
> Ah, I was also looking at it.  I enclose my current patch which appears
> to work although I have not finished testing it yet.
>
>   
>>> interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if 
>>> it's set, leave immediately, it can't be our interrupt. If it's clear, 
>>> read the irq enable hardware register.  If enabled, do the rest of the 
>>> interrupt handler.
>>>       
>
>   
>>    It seems that there's no need to read it, as it gets set to 0 
>> "synchronously" with setting the 'hands_off' flag (except in NAPI 
>> callback)...
>>     
>
> hands_off is stronger than that - it's used for sync with some of the
> other code paths like suspend/resume and means "don't touch the chip".
> I've added a new driver local flag instead.
>
>   
>>    Yeah, it seems currently unjustified.  However IntrEnable would have 
>>    been an ultimate criterion on taking or ignoring an interrupt otherwise...
>>     
>
>   
>>> I guess the tradeoff depends on the probability 
>>> of getting the isr called when NAPI is active for the device.
>>>       
>
>   
>>    Didn't get it... :-/
>>     
>
> Some systems can provoke this fairly easily - Sokeris have some boards
> where at least three natsemis share a single interrupt line, for example
> (the model I'm looking at has three, they look to have another
> configuration where 5 would end up on the line).
>
>   
>>    BTW, it seems I've found another interrupt lossage path in the driver:
>>     
>
>   
>> netdev_poll() -> netdev_rx() -> reset_rx()
>>     
>
>   
>> If the netdev_rx() detects an oversized packet, it will call reset_rx() 
>> which will spin in a loop "collecting" interrupt status until it sees 
>> RxResetDone there.  The issue is 'intr_status' field will get overwritten 
>> and interrupt status lost after netdev_rx() returns to netdev_poll().  How 
>> do you think, is this corner case worth fixing (by moving netdev_rx() call
>> to the top of a do/while loop)?
>>     
>
> Moving netdev_rx() would fix that one but there's some others too -
> there's one in the timer routine if the chip crashes.  In the case you
> describe above the consequences shouldn't be too bad since it tends to
> only occur at high volume so further traffic will tend to occur and
> cause things to recover - all the testing of that patch was done with
> the bug present and no ill effects.
>   
The proposed patch would seem to be way overkill.  The problem, as 
solved in this patch, at least is just the prevention of the ISR from 
reading the intr_status while another context (NAPI) is still active.  
The simplest fix is to revert the netpoll attempted fix and then modify 
the isr etnry criteria as follows:

--- natsemi.c   2006-09-19 20:42:06.000000000 -0700
+++ natsemi.c.new       2007-03-12 11:35:28.000000000 -0700
@@ -2094,7 +2094,7 @@
        struct netdev_private *np = netdev_priv(dev);
        void __iomem * ioaddr = ns_ioaddr(dev);

-        if (np->hands_off)
+       if (np->hands_off || !readl(ioaddr + IntrEnable))
                return IRQ_NONE;

        /* Reading automatically acknowledges. */

Since the interrupts are enabled as the NAPI-callback exits, and the 
interrupts are disabled in the isr after the callback is scheduled, this 
fully avoids the potential race conditions, and requires no locking.  If 
we want to avoid the ioaccess, then this could be augmented with a flag 
in the device private area which would be tested first, short-circuiting 
the ioaccess most of the time when the callback is already scheduled.  
But the IntrEnable still would need to be check to avoid the race on the 
enable/disable condition between the callback and the isr.

It is possible to entirely avoid the IntrEnable access and use only a 
flag, but that requires a local_irqsave/restore around the calback's 
enabling of the interrupt and and clearing of the poll_active flag.  
That is fully up-safe, but may have a theoretical possibility of an 
unserviced (spurious) interrupt on SMP systems.  Such a possibility 
would only exist if an architecture is able to accept an interrupt on a 
second processor and get to the natsemi isr before the the first 
proccesor can clear the poll_active flag.  That can be further minimized 
by omitting the readback of the IntrEnable register, or deferring it 
until after the flag is cleared.

If you would like me to prepare formal patches based on any of these 
concepts, let me know.

Mark Huth
> Subject: natsemi: Fix NAPI for interrupt sharing
> To: Jeff Garzik <jeff@garzik.org>
> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>, Simon Blake <simon@citylink.co.nz>, John Philips <johnphilips42@yahoo.com>, netdev@vger.kernel.org
>
> The interrupt status register for the natsemi chips is clear on read and
> was read unconditionally from both the interrupt and from the NAPI poll
> routine, meaning that if the interrupt service routine was called (for 
> example, due to a shared interrupt) while a NAPI poll was scheduled
> interrupts could be missed.  This patch fixes that by ensuring that the
> interrupt status register is only read when there is no poll scheduled.
>
> It also reverts a workaround for this problem from the netpoll hook.
>
> Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the
> issue and Simon Blake <simon@citylink.co.nz> for testing resources.
>
> Signed-Off-By: Mark Brown <broonie@sirena.org.uk>
>
> Index: linux-2.6/drivers/net/natsemi.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/natsemi.c	2007-03-11 02:32:43.000000000 +0000
> +++ linux-2.6/drivers/net/natsemi.c	2007-03-11 12:09:14.000000000 +0000
> @@ -571,6 +571,8 @@
>  	int oom;
>  	/* Interrupt status */
>  	u32 intr_status;
> +	int poll_active;
> +	spinlock_t intr_lock;
>  	/* Do not touch the nic registers */
>  	int hands_off;
>  	/* Don't pay attention to the reported link state. */
> @@ -812,9 +814,11 @@
>  	pci_set_drvdata(pdev, dev);
>  	np->iosize = iosize;
>  	spin_lock_init(&np->lock);
> +	spin_lock_init(&np->intr_lock);
>  	np->msg_enable = (debug >= 0) ? (1<<debug)-1 : NATSEMI_DEF_MSG;
>  	np->hands_off = 0;
>  	np->intr_status = 0;
> +	np->poll_active = 0;
>  	np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size;
>  	if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY)
>  		np->ignore_phy = 1;
> @@ -1406,6 +1410,8 @@
>  	writel(rfcr, ioaddr + RxFilterAddr);
>  }
>  
> +/* MUST be called so that both NAPI poll and ISR are excluded due to
> + * use of intr_status. */
>  static void reset_rx(struct net_device *dev)
>  {
>  	int i;
> @@ -2118,30 +2124,45 @@
>  	struct net_device *dev = dev_instance;
>  	struct netdev_private *np = netdev_priv(dev);
>  	void __iomem * ioaddr = ns_ioaddr(dev);
> +	unsigned long flags;
> +	irqreturn_t status = IRQ_NONE;
>  
>  	if (np->hands_off)
>  		return IRQ_NONE;
>  
> -	/* Reading automatically acknowledges. */
> -	np->intr_status = readl(ioaddr + IntrStatus);
> -
> -	if (netif_msg_intr(np))
> -		printk(KERN_DEBUG
> -		       "%s: Interrupt, status %#08x, mask %#08x.\n",
> -		       dev->name, np->intr_status,
> -		       readl(ioaddr + IntrMask));
> +	spin_lock_irqsave(&np->intr_lock, flags);
>  
> -	if (!np->intr_status)
> -		return IRQ_NONE;
> +	/* Reading IntrStatus automatically acknowledges so don't do
> +	 * that while a poll is scheduled.  */
> +	if (!np->poll_active) {
> +		np->intr_status = readl(ioaddr + IntrStatus);
>  
> -	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
> +		if (netif_msg_intr(np))
> +			printk(KERN_DEBUG
> +			       "%s: Interrupt, status %#08x, mask %#08x.\n",
> +			       dev->name, np->intr_status,
> +			       readl(ioaddr + IntrMask));
> +
> +		if (np->intr_status) {
> +			prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
> +
> +			/* Disable interrupts and register for poll */
> +			if (netif_rx_schedule_prep(dev)) {
> +				natsemi_irq_disable(dev);
> +				__netif_rx_schedule(dev);
> +				np->poll_active = 1;
> +			} else
> +				printk(KERN_WARNING
> +			       	       "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
> +				       dev->name, np->intr_status,
> +				       readl(ioaddr + IntrMask));
>  
> -	if (netif_rx_schedule_prep(dev)) {
> -		/* Disable interrupts and register for poll */
> -		natsemi_irq_disable(dev);
> -		__netif_rx_schedule(dev);
> +			status = IRQ_HANDLED;
> +		}
>  	}
> -	return IRQ_HANDLED;
> +
> +	spin_unlock_irqrestore(&np->intr_lock, flags);
> +	return status;
>  }
>  
>  /* This is the NAPI poll routine.  As well as the standard RX handling
> @@ -2154,8 +2175,15 @@
>  
>  	int work_to_do = min(*budget, dev->quota);
>  	int work_done = 0;
> +	unsigned long flags;
>  
>  	do {
> +		if (netif_msg_intr(np))
> +			printk(KERN_DEBUG
> +			       "%s: Poll, status %#08x, mask %#08x.\n",
> +			       dev->name, np->intr_status,
> +			       readl(ioaddr + IntrMask));
> +
>  		if (np->intr_status &
>  		    (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
>  			spin_lock(&np->lock);
> @@ -2182,14 +2210,19 @@
>  		np->intr_status = readl(ioaddr + IntrStatus);
>  	} while (np->intr_status);
>  
> +	/* We need to ensure that the ISR doesn't run between telling
> +	 * NAPI we're done and enabling the interrupt. */
> +	spin_lock_irqsave(&np->intr_lock, flags);
> +
>  	netif_rx_complete(dev);
> +	np->poll_active = 0;
>  
>  	/* Reenable interrupts providing nothing is trying to shut
>  	 * the chip down. */
> -	spin_lock(&np->lock);
> -	if (!np->hands_off && netif_running(dev))
> +	if (!np->hands_off)
>  		natsemi_irq_enable(dev);
> -	spin_unlock(&np->lock);
> +
> +	spin_unlock_irqrestore(&np->intr_lock, flags);
>  
>  	return 0;
>  }
> @@ -2399,19 +2432,8 @@
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  static void natsemi_poll_controller(struct net_device *dev)
>  {
> -	struct netdev_private *np = netdev_priv(dev);
> -
>  	disable_irq(dev->irq);
> -
> -	/*
> -	 * A real interrupt might have already reached us at this point
> -	 * but NAPI might still haven't called us back.  As the interrupt
> -	 * status register is cleared by reading, we should prevent an
> -	 * interrupt loss in this case...
> -	 */
> -	if (!np->intr_status)
> -		intr_handler(dev->irq, dev);
> -
> +	intr_handler(dev->irq, dev);
>  	enable_irq(dev->irq);
>  }
>  #endif
>
>   


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-12 13:05         ` Sergei Shtylyov
@ 2007-03-12 19:11           ` Mark Brown
  2007-03-13 13:53             ` Sergei Shtylyov
  2007-03-12 19:12           ` Sergei Shtylyov
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Brown @ 2007-03-12 19:11 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Mark Huth, jgarzik, netdev

[-- Attachment #1: Type: text/plain, Size: 3888 bytes --]

On Mon, Mar 12, 2007 at 04:05:48PM +0300, Sergei Shtylyov wrote:
> Mark Brown wrote:

> >hands_off is stronger than that - it's used for sync with some of the
> >other code paths like suspend/resume and means "don't touch the chip".
> >I've added a new driver local flag instead.

>    I'm not sure it was worth it -- we already had __LINK_STATE_RX_SCHED...

It would be nice but as well as being a general layering violation it
could cause problems when trying to quiesce the hardware since
netif_poll_disable() sets this without actually scheduling the poll.

> >>  Yeah, it seems currently unjustified.  However IntrEnable would have 
> >>  been an ultimate criterion on taking or ignoring an interrupt 
> >>  otherwise...

>>>I guess the tradeoff depends on the probability
>>>of getting the isr called when NAPI is active for the device.

>    I mean I didn't understand why there's tradeoff and how it depends on 
>    the probability...

Reading device registers means going over the PCI bus, which is
expensive.  Shadowing the interrupt mask state in the driver makes
the driver more complicated but means that we avoid synchronous PCI
accesses, reducing the number of cycles the CPU spends stalled doing
those.

The performance benefit from the extra code complexity depends on how
often we end up doing the extra read.  Since one of the things NAPI does
is provide some interrupt mitigation the extra work is most likely to be
noticable if some other device is generating interrupts that trigger the
natsemi interrupt handler.

> >Moving netdev_rx() would fix that one but there's some others too -
> >there's one in the timer routine if the chip crashes.  In the case you

>    Erm, sorry, I'm not seeing it -- could you "point with finger" please? 
>    :-)

In netdev_timer() when the device is using PORT_TP if the DspCfg read
back from the chip differs from the one we think we programmed into it
then the driver thinks the PHY fell over.  It then goes through an init
sequence, including init_registers() which will reset IntrEnable among
other things.

> >describe above the consequences shouldn't be too bad since it tends to
> >only occur at high volume so further traffic will tend to occur and
> >cause things to recover - all the testing of that patch was done with
> >the bug present and no ill effects.

>    Oversized packets occur only at high volume? Is it some errata?

It's an errata - AN 1287 which you can get from the National web site.
It's not actually that chip that's getting oversided packets, what
happens is that the state machine which reads data off the wire gets
confused and eventually locks up.  Before locking up it will usually
report one or more oversided packets so this is a useful hint that we
should reset the recieve state machine in order to recover from this.

>    (If I only knew somebody else was working on that issue, it could have 
> saved my cycles, sigh... but well, I should have said  that I was going to 
> recast the patch. :-)

Yeah, me too.  I'll submit this one once I've finished testing, then
audit other IntrStatus accesses.

> >-		       readl(ioaddr + IntrMask));
> >+	spin_lock_irqsave(&np->intr_lock, flags);

>    Yeah, I've suspected that we need to grab np->lock here... but does that 
> separate spinlock actually protect us from anything?

...

> >+	/* We need to ensure that the ISR doesn't run between telling
> >+	 * NAPI we're done and enabling the interrupt. */
> 
>    Why? :-O

This is to ensure that the interrupt handler can't run between us
marking that the poll has complete and reenabling the interrupt from the
chip.  That would mean that the chip would end up with interrupts from
the chip enabled while the poll is scheduled.

-- 
"You grabbed my hand and we fell into it, like a daydream - or a fever."

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-12 13:05         ` Sergei Shtylyov
  2007-03-12 19:11           ` Mark Brown
@ 2007-03-12 19:12           ` Sergei Shtylyov
  1 sibling, 0 replies; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-12 19:12 UTC (permalink / raw)
  To: Mark Brown; +Cc: Mark Huth, jgarzik, netdev

Hello, I wrote:

>> Subject: natsemi: Fix NAPI for interrupt sharing
>> To: Jeff Garzik <jeff@garzik.org>
>> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>, Simon Blake 
>> <simon@citylink.co.nz>, John Philips <johnphilips42@yahoo.com>, 
>> netdev@vger.kernel.org

>> The interrupt status register for the natsemi chips is clear on read and
>> was read unconditionally from both the interrupt and from the NAPI poll
>> routine, meaning that if the interrupt service routine was called (for 
>> example, due to a shared interrupt) while a NAPI poll was scheduled
>> interrupts could be missed.  This patch fixes that by ensuring that the
>> interrupt status register is only read when there is no poll scheduled.

>> It also reverts a workaround for this problem from the netpoll hook.

>> Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the

    Well, I've blithely overlooked it, and it's you who did spot it. :-)

>> issue and Simon Blake <simon@citylink.co.nz> for testing resources.

>    Thanks for the patch!
>    (If I only knew somebody else was working on that issue, it could 
> have saved my cycles, sigh... but well, I should have said  that I was 
> going to recast the patch. :-)

>> Signed-Off-By: Mark Brown <broonie@sirena.org.uk>

>> Index: linux-2.6/drivers/net/natsemi.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/net/natsemi.c    2007-03-11 
>> 02:32:43.000000000 +0000
>> +++ linux-2.6/drivers/net/natsemi.c    2007-03-11 12:09:14.000000000 
>> +0000
>> @@ -571,6 +571,8 @@
>>      int oom;
>>      /* Interrupt status */
>>      u32 intr_status;
>> +    int poll_active;
>> +    spinlock_t intr_lock;
>>      /* Do not touch the nic registers */
>>      int hands_off;
>>      /* Don't pay attention to the reported link state. */
>> @@ -812,9 +814,11 @@
>>      pci_set_drvdata(pdev, dev);
>>      np->iosize = iosize;
>>      spin_lock_init(&np->lock);
>> +    spin_lock_init(&np->intr_lock);
>>      np->msg_enable = (debug >= 0) ? (1<<debug)-1 : NATSEMI_DEF_MSG;
>>      np->hands_off = 0;
>>      np->intr_status = 0;
>> +    np->poll_active = 0;
>>      np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size;
>>      if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY)
>>          np->ignore_phy = 1;
>> @@ -1406,6 +1410,8 @@
>>      writel(rfcr, ioaddr + RxFilterAddr);
>>  }
>>  
>> +/* MUST be called so that both NAPI poll and ISR are excluded due to
>> + * use of intr_status. */
>>  static void reset_rx(struct net_device *dev)
>>  {
>>      int i;
>> @@ -2118,30 +2124,45 @@
>>      struct net_device *dev = dev_instance;
>>      struct netdev_private *np = netdev_priv(dev);
>>      void __iomem * ioaddr = ns_ioaddr(dev);
>> +    unsigned long flags;
>> +    irqreturn_t status = IRQ_NONE;
>>  
>>      if (np->hands_off)
>>          return IRQ_NONE;
>>  
>> -    /* Reading automatically acknowledges. */
>> -    np->intr_status = readl(ioaddr + IntrStatus);
>> -
>> -    if (netif_msg_intr(np))
>> -        printk(KERN_DEBUG
>> -               "%s: Interrupt, status %#08x, mask %#08x.\n",
>> -               dev->name, np->intr_status,
>> -               readl(ioaddr + IntrMask));
>> +    spin_lock_irqsave(&np->intr_lock, flags);

>    Yeah, I've suspected that we need to grab np->lock here... but does 
> that separate spinlock actually protect us from anything?

    I'm also not sure that we need to disable interrupts here.

>> -    if (!np->intr_status)
>> -        return IRQ_NONE;
>> +    /* Reading IntrStatus automatically acknowledges so don't do
>> +     * that while a poll is scheduled.  */
>> +    if (!np->poll_active) {
>> +        np->intr_status = readl(ioaddr + IntrStatus);
>>  
>> -    prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
>> +        if (netif_msg_intr(np))
>> +            printk(KERN_DEBUG
>> +                   "%s: Interrupt, status %#08x, mask %#08x.\n",
>> +                   dev->name, np->intr_status,
>> +                   readl(ioaddr + IntrMask));
>> +
>> +        if (np->intr_status) {
>> +            prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
>> +
>> +            /* Disable interrupts and register for poll */
>> +            if (netif_rx_schedule_prep(dev)) {
>> +                natsemi_irq_disable(dev);
>> +                __netif_rx_schedule(dev);
>> +                np->poll_active = 1;
>> +            } else
>> +                printk(KERN_WARNING
>> +                              "%s: Ignoring interrupt, status %#08x, 
>> mask %#08x.\n",
>> +                       dev->name, np->intr_status,
>> +                       readl(ioaddr + IntrMask));
>>  
>> -    if (netif_rx_schedule_prep(dev)) {
>> -        /* Disable interrupts and register for poll */
>> -        natsemi_irq_disable(dev);
>> -        __netif_rx_schedule(dev);
>> +            status = IRQ_HANDLED;
>> +        }
>>      }
>> -    return IRQ_HANDLED;
>> +
>> +    spin_unlock_irqrestore(&np->intr_lock, flags);
>> +    return status;
>>  }
>>  
>>  /* This is the NAPI poll routine.  As well as the standard RX handling
>> @@ -2154,8 +2175,15 @@
>>  
>>      int work_to_do = min(*budget, dev->quota);
>>      int work_done = 0;
>> +    unsigned long flags;
>>  
>>      do {
>> +        if (netif_msg_intr(np))
>> +            printk(KERN_DEBUG
>> +                   "%s: Poll, status %#08x, mask %#08x.\n",
>> +                   dev->name, np->intr_status,
>> +                   readl(ioaddr + IntrMask));
>> +
>>          if (np->intr_status &
>>              (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
>>              spin_lock(&np->lock);
>> @@ -2182,14 +2210,19 @@
>>          np->intr_status = readl(ioaddr + IntrStatus);
>>      } while (np->intr_status);
>>  
>> +    /* We need to ensure that the ISR doesn't run between telling
>> +     * NAPI we're done and enabling the interrupt. */

>    Why? :-O

    Ah, got it: intr_handler() may disable interrupts (if some have appeared 
since the last IntrStatus read) and upon return poll() will erroneously 
re-enable them again...  Good catch! :-)
    Could also been dealt with by checking if the interrupt is actually 
enabled in intr_handler() -- so, this would now seem a better solution to me 
as we don't have to introduce flags/spinlocks, and avoid interrupt-off latency...

>> +    spin_lock_irqsave(&np->intr_lock, flags);
>> +
>>      netif_rx_complete(dev);
>> +    np->poll_active = 0;
>>  
>>      /* Reenable interrupts providing nothing is trying to shut
>>       * the chip down. */
>> -    spin_lock(&np->lock);
>> -    if (!np->hands_off && netif_running(dev))
>> +    if (!np->hands_off)
>>          natsemi_irq_enable(dev);
>> -    spin_unlock(&np->lock);
>> +
>> +    spin_unlock_irqrestore(&np->intr_lock, flags);

    Not really sure we can replace one spinlock with another...

>>      return 0;
>>  }

WBR, Sergei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-12 19:05         ` Mark Huth
@ 2007-03-13  0:14           ` Mark Brown
  2007-03-13 13:45             ` Sergei Shtylyov
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Brown @ 2007-03-13  0:14 UTC (permalink / raw)
  To: Mark Huth; +Cc: Sergei Shtylyov, jgarzik, netdev

[-- Attachment #1: Type: text/plain, Size: 4347 bytes --]

On Mon, Mar 12, 2007 at 12:05:46PM -0700, Mark Huth wrote:

> Since the interrupts are enabled as the NAPI-callback exits, and the 
> interrupts are disabled in the isr after the callback is scheduled, this 
> fully avoids the potential race conditions, and requires no locking.  If 

I've benchmarked both approaches and they appear to be much of a
muchness for performance in my tests so I've updated my patch to use
this approach instead.  Thanks for the suggestion, it is simpler.

> If you would like me to prepare formal patches based on any of these 
> concepts, let me know.

That's OK - unless any further suggestions I'll submit the patch below
after further testing.  The improvements to trace and correctness of the
interupt handler seem useful.

Subject: natsemi: Fix NAPI for interrupt sharing

The interrupt status register for the natsemi chips is clear on read and
was read unconditionally from both the interrupt and from the NAPI poll
routine, meaning that if the interrupt service routine was called (for 
example, due to a shared interrupt) while a NAPI poll was scheduled
interrupts could be missed.  This patch fixes that by ensuring that the
interrupt status register is only read by the interrupt handler when
interrupts are enabled from the chip.

It also reverts a workaround for this problem from the netpoll hook and
improves the trace for interrupt events.

Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the
issue, Mark Huth <mhuth@mvista.com> for a simpler method and Simon
Blake <simon@citylink.co.nz> for testing resources.

Signed-Off-By: Mark Brown <broonie@sirena.org.uk>

Index: linux-2.6/drivers/net/natsemi.c
===================================================================
--- linux-2.6.orig/drivers/net/natsemi.c	2007-03-11 02:32:43.000000000 +0000
+++ linux-2.6/drivers/net/natsemi.c	2007-03-13 00:12:29.000000000 +0000
@@ -2119,10 +2119,12 @@
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem * ioaddr = ns_ioaddr(dev);
 
-	if (np->hands_off)
+	/* Reading IntrStatus automatically acknowledges so don't do
+	 * that while interrupts are disabled, (for example, while a
+	 * poll is scheduled).  */
+	if (np->hands_off || !readl(ioaddr + IntrEnable))
 		return IRQ_NONE;
 
-	/* Reading automatically acknowledges. */
 	np->intr_status = readl(ioaddr + IntrStatus);
 
 	if (netif_msg_intr(np))
@@ -2131,17 +2133,23 @@
 		       dev->name, np->intr_status,
 		       readl(ioaddr + IntrMask));
 
-	if (!np->intr_status)
-		return IRQ_NONE;
-
-	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
+	if (np->intr_status) {
+		prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
 
-	if (netif_rx_schedule_prep(dev)) {
 		/* Disable interrupts and register for poll */
-		natsemi_irq_disable(dev);
-		__netif_rx_schedule(dev);
+		if (netif_rx_schedule_prep(dev)) {
+			natsemi_irq_disable(dev);
+			__netif_rx_schedule(dev);
+		} else
+			printk(KERN_WARNING
+		       	       "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
+			       dev->name, np->intr_status,
+			       readl(ioaddr + IntrMask));
+
+		return IRQ_HANDLED;
 	}
-	return IRQ_HANDLED;
+
+	return IRQ_NONE;
 }
 
 /* This is the NAPI poll routine.  As well as the standard RX handling
@@ -2156,6 +2164,12 @@
 	int work_done = 0;
 
 	do {
+		if (netif_msg_intr(np))
+			printk(KERN_DEBUG
+			       "%s: Poll, status %#08x, mask %#08x.\n",
+			       dev->name, np->intr_status,
+			       readl(ioaddr + IntrMask));
+
 		if (np->intr_status &
 		    (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
 			spin_lock(&np->lock);
@@ -2399,19 +2413,8 @@
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void natsemi_poll_controller(struct net_device *dev)
 {
-	struct netdev_private *np = netdev_priv(dev);
-
 	disable_irq(dev->irq);
-
-	/*
-	 * A real interrupt might have already reached us at this point
-	 * but NAPI might still haven't called us back.  As the interrupt
-	 * status register is cleared by reading, we should prevent an
-	 * interrupt loss in this case...
-	 */
-	if (!np->intr_status)
-		intr_handler(dev->irq, dev);
-
+	intr_handler(dev->irq, dev);
 	enable_irq(dev->irq);
 }
 #endif

-- 
"You grabbed my hand and we fell into it, like a daydream - or a fever."

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-13  0:14           ` Mark Brown
@ 2007-03-13 13:45             ` Sergei Shtylyov
  0 siblings, 0 replies; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-13 13:45 UTC (permalink / raw)
  To: Mark Brown; +Cc: Mark Huth, jgarzik, netdev

Hello.

Mark Brown wrote:

> Subject: natsemi: Fix NAPI for interrupt sharing

> The interrupt status register for the natsemi chips is clear on read and
> was read unconditionally from both the interrupt and from the NAPI poll
> routine, meaning that if the interrupt service routine was called (for 
> example, due to a shared interrupt) while a NAPI poll was scheduled
> interrupts could be missed.  This patch fixes that by ensuring that the
> interrupt status register is only read by the interrupt handler when
> interrupts are enabled from the chip.
> 
> It also reverts a workaround for this problem from the netpoll hook and
> improves the trace for interrupt events.
> 
> Thanks to Sergei Shtylyov <sshtylyov@ru.mvista.com> for spotting the
> issue, Mark Huth <mhuth@mvista.com> for a simpler method and Simon
> Blake <simon@citylink.co.nz> for testing resources.

> Signed-Off-By: Mark Brown <broonie@sirena.org.uk>

> Index: linux-2.6/drivers/net/natsemi.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/natsemi.c	2007-03-11 02:32:43.000000000 +0000
> +++ linux-2.6/drivers/net/natsemi.c	2007-03-13 00:12:29.000000000 +0000
[...]
> @@ -2131,17 +2133,23 @@
>  		       dev->name, np->intr_status,
>  		       readl(ioaddr + IntrMask));
>  
> -	if (!np->intr_status)
> -		return IRQ_NONE;
> -
> -	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
> +	if (np->intr_status) {
> +		prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
>  
> -	if (netif_rx_schedule_prep(dev)) {
>  		/* Disable interrupts and register for poll */
> -		natsemi_irq_disable(dev);
> -		__netif_rx_schedule(dev);
> +		if (netif_rx_schedule_prep(dev)) {
> +			natsemi_irq_disable(dev);
> +			__netif_rx_schedule(dev);
> +		} else
> +			printk(KERN_WARNING
> +		       	       "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
> +			       dev->name, np->intr_status,
> +			       readl(ioaddr + IntrMask));
> +
> +		return IRQ_HANDLED;
>  	}
> -	return IRQ_HANDLED;
> +
> +	return IRQ_NONE;
>  }

    The only "complaint" I have is that this "restructuring" seems 
unnecessary: the only real change it does is an addition of else to the if 
statement.

WBR, Sergei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-12 19:11           ` Mark Brown
@ 2007-03-13 13:53             ` Sergei Shtylyov
  2007-03-13 19:31               ` Mark Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Sergei Shtylyov @ 2007-03-13 13:53 UTC (permalink / raw)
  To: Mark Brown; +Cc: Mark Huth, jgarzik, netdev

Hello.

Mark Brown wrote:

>>>Moving netdev_rx() would fix that one but there's some others too -
>>>there's one in the timer routine if the chip crashes.  In the case you

>>   Erm, sorry, I'm not seeing it -- could you "point with finger" please? 
>>   :-)

> In netdev_timer() when the device is using PORT_TP if the DspCfg read
> back from the chip differs from the one we think we programmed into it
> then the driver thinks the PHY fell over.  It then goes through an init
> sequence, including init_registers() which will reset IntrEnable among
> other things.

    What's more important for us, it will also clear IntrStatus (and ignore 
all pending interrupts).  Well, as it will also reinit the whole TX/RX rings, 
so that all packets will be lost...

>>>describe above the consequences shouldn't be too bad since it tends to
>>>only occur at high volume so further traffic will tend to occur and
>>>cause things to recover - all the testing of that patch was done with
>>>the bug present and no ill effects.

>>   Oversized packets occur only at high volume? Is it some errata?

> It's an errata - AN 1287 which you can get from the National web site.
> It's not actually that chip that's getting oversided packets, what
> happens is that the state machine which reads data off the wire gets
> confused and eventually locks up.  Before locking up it will usually
> report one or more oversided packets so this is a useful hint that we
> should reset the recieve state machine in order to recover from this.

    That's all good by why we need to completely lose TX and other interrupts
in the meantime? High inbound traffic doesn't necessarily mean a high outbound 
one, does it?

WBR, Sergei


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] natsemi: netpoll fixes
  2007-03-13 13:53             ` Sergei Shtylyov
@ 2007-03-13 19:31               ` Mark Brown
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Brown @ 2007-03-13 19:31 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Mark Huth, jgarzik, netdev

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

On Tue, Mar 13, 2007 at 04:53:54PM +0300, Sergei Shtylyov wrote:
> Mark Brown wrote:

> >confused and eventually locks up.  Before locking up it will usually
> >report one or more oversided packets so this is a useful hint that we
> >should reset the recieve state machine in order to recover from this.

>    That's all good by why we need to completely lose TX and other interrupts
> in the meantime? High inbound traffic doesn't necessarily mean a high 
> outbound one, does it?

While the code in the driver can cope if the chip takes a while to
respond to the reset as far as I have been able to tell in testing 
it does so close enough to immediately to avoid repeating the loop at
all.  The effect on transmit processing should be minimal.

-- 
"You grabbed my hand and we fell into it, like a daydream - or a fever."

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 307 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-03-13 19:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-05 20:10 [PATCH] natsemi: netpoll fixes Sergei Shtylyov
2007-03-05 22:41 ` Mark Brown
2007-03-05 22:43 ` Mark Brown
2007-03-06  4:10   ` Mark Huth
2007-03-10 20:25     ` Sergei Shtylyov
2007-03-11 12:16       ` Mark Brown
2007-03-12 13:05         ` Sergei Shtylyov
2007-03-12 19:11           ` Mark Brown
2007-03-13 13:53             ` Sergei Shtylyov
2007-03-13 19:31               ` Mark Brown
2007-03-12 19:12           ` Sergei Shtylyov
2007-03-12 19:05         ` Mark Huth
2007-03-13  0:14           ` Mark Brown
2007-03-13 13:45             ` Sergei Shtylyov
2007-03-06 11:10 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).