Q: natsemi.c spinlocks

public inbox for linux-kernel@vger.kernel.org

* Q: natsemi.c spinlocks
@ 2000-12-23 17:45 Manfred
  2000-12-24  0:54 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Manfred @ 2000-12-23 17:45 UTC
  To: mulder.abg, jgarzik; +Cc: linux-kernel

Hi Jeff, Tjeerd,

I spotted the spin_lock in natsemi.c, and I think it's bogus.

The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
(even Alan didn't remember it exactly when I asked him), thus a sane
driver can assume that an interrupt handler is never reentered.

Donald often uses dev->interrupt to hide other races, but I don't see
anything like that in this driver (tx_timeout and netdev_timer are both trivial).

--
  Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Q: natsemi.c spinlocks
  2000-12-23 17:45 Q: natsemi.c spinlocks Manfred
@ 2000-12-24  0:54 ` Andrew Morton
  2000-12-24 11:15   ` Manfred
  2001-01-21 14:43   ` [PATCH] " Jeff Garzik
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2000-12-24  0:54 UTC
  To: Manfred; +Cc: mulder.abg, jgarzik, linux-kernel

Manfred wrote:
> 
> Hi Jeff, Tjeerd,
> 
> I spotted the spin_lock in natsemi.c, and I think it's bogus.
> 
> The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
> (even Alan didn't remember it exactly when I asked him), thus a sane
> driver can assume that an interrupt handler is never reentered.
> 
> Donald often uses dev->interrupt to hide other races, but I don't see
> anything in this driver (tx_timeout and netdev_timer are both trivial)

Hi, Manfred.

I think you're right.  2.4's interrupt handling prevents
simultaneous entry of the same ISR.

However, natsemi.c's spinlock needs to be retained, and
extended into start_tx(), because this driver has
a race which has cropped up in a few others:

Current code:

start_tx()
{
	...
	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
		/* WINDOW HERE */
		np->tx_full = 1;
		netif_stop_queue(dev);
	}
	...
}

If the ring is currently full and an interrupt comes in
at the indicated window and reaps ALL the packets in the
ring, the driver ends up in state `tx_full = 1' and transmit
disabled, but with no outstanding transmit interrupts.

It's screwed.  You need another interrupt so tx_full
can be cleared and the queue can be restarted, but you can't
*get* another interrupt because there are no Tx packets outstanding.
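The window can be replayed deterministically in a few lines. This is a hypothetical userspace model, not natsemi.c code: `struct ring`, its `stopped` field (standing in for netif_stop_queue()/netif_wake_queue()), and TX_QUEUE_LEN = 10 are assumptions made for illustration. With the interrupt landing inside the window, the queue ends up stopped with nothing outstanding; with the interrupt landing after the stop, the queue is restarted.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_QUEUE_LEN 10 /* assumed ring capacity, illustration only */

/* Hypothetical stand-in for the driver's private Tx state. */
struct ring {
	unsigned int cur_tx;   /* producer index, advanced by start_tx() */
	unsigned int dirty_tx; /* consumer index, advanced by the interrupt */
	bool stopped;          /* models netif_stop_queue()/netif_wake_queue() */
};

/* The unlocked test in start_tx(): does the ring look full? */
static bool ring_full(const struct ring *r)
{
	return r->cur_tx - r->dirty_tx >= TX_QUEUE_LEN - 1;
}

/* Tx interrupt: reap every completed packet, restart only if stopped. */
static void tx_interrupt_reap_all(struct ring *r)
{
	r->dirty_tx = r->cur_tx;
	if (r->stopped)
		r->stopped = false;
}

/* Stopped with nothing outstanding: no Tx interrupt will ever restart it. */
static bool stuck(const struct ring *r)
{
	return r->stopped && r->cur_tx == r->dirty_tx;
}

/* Replay start_tx() with the interrupt inside the window or after it. */
bool replay_start_tx(bool irq_in_window)
{
	struct ring r = { .cur_tx = TX_QUEUE_LEN - 1, .dirty_tx = 0, .stopped = false };

	assert(ring_full(&r));                 /* scenario starts with a full ring */
	if (ring_full(&r)) {
		if (irq_in_window)
			tx_interrupt_reap_all(&r); /* lands at WINDOW HERE */
		r.stopped = true;              /* netif_stop_queue(dev) */
	}
	if (!irq_in_window)
		tx_interrupt_reap_all(&r);     /* lands after the stop instead */
	return stuck(&r);
}
```

Here `replay_start_tx(true)` reports the stuck state and `replay_start_tx(false)` does not.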

It's very unlikely to happen with this particular driver
because it's also polling the transmit queue within
receive interrupts.  Receiving a packet will clear
the condition.

If you were madly hosing out UDP packets and receiving nothing
then this could occur.  It was certainly triggerable in 3c59x.c,
which doesn't test the Tx queue state in Rx interrupts.

I currently have natsemi.c lying in pieces on my garage floor,
so I'll put this locking in if it's OK with everyone?


* Re: Q: natsemi.c spinlocks
  2000-12-24  0:54 ` Andrew Morton
@ 2000-12-24 11:15   ` Manfred
  2001-01-21 14:38     ` Jeff Garzik
  2001-01-21 14:43   ` [PATCH] " Jeff Garzik
  1 sibling, 1 reply; 7+ messages in thread
From: Manfred @ 2000-12-24 11:15 UTC
  To: Andrew Morton; +Cc: jgarzik, linux-kernel

Andrew Morton wrote:
> 
> start_tx()
> {

Yes, I overlooked start_tx.

Hmm. start_tx also assumes that the cpu commits writes in order; I'm
sure the driver is unreliable on RISC cpus.

Perhaps the driver should use pci_alloc_consistent and pci_map_single?
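A userspace analogue of the ordering problem, as a hypothetical sketch. It assumes, as the reap loop in Jeff's patch later in this thread suggests, that a descriptor is handed to the chip by setting DescOwn in its cmd_status word after the buffer fields have been filled in; if the ownership store becomes visible first, the reader can use a stale buffer address. Here a thread stands in for the NIC, `struct tx_desc` and `DESC_OWN` are invented for the example, and C11 release/acquire ordering enforces the order. Real DMA and cache coherence are not modeled; in the kernel that is the job of wmb() and coherent allocations such as pci_alloc_consistent().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u /* hypothetical ownership bit */

/* Hypothetical Tx descriptor: addr must be visible before cmd_status. */
struct tx_desc {
	uint32_t addr;               /* plain field, published via cmd_status */
	_Atomic uint32_t cmd_status; /* ownership bit | length */
};

static struct tx_desc desc;

/* start_tx() analogue: fill in the payload, then hand the descriptor over. */
static void *producer(void *arg)
{
	(void)arg;
	desc.addr = 0x1000;
	/* Release: the addr store cannot be moved after this store. */
	atomic_store_explicit(&desc.cmd_status, DESC_OWN | 60u,
			      memory_order_release);
	return NULL;
}

/* "NIC" analogue: wait until it owns the descriptor, then read addr. */
static void *consumer(void *arg)
{
	/* Acquire: the addr read below cannot be moved before this load. */
	while (!(atomic_load_explicit(&desc.cmd_status, memory_order_acquire)
		 & DESC_OWN))
		; /* spin until the ownership bit appears */
	*(uint32_t *)arg = desc.addr;
	return NULL;
}

/* Run one hand-over; returns the addr value the consumer observed. */
uint32_t handover_once(void)
{
	pthread_t p, c;
	uint32_t seen = 0;

	desc.addr = 0;
	atomic_store(&desc.cmd_status, 0);
	pthread_create(&c, NULL, consumer, &seen);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	assert(atomic_load(&desc.cmd_status) & DESC_OWN);
	return seen;
}
```

Every call to `handover_once()` should return 0x1000; with the release/acquire pair removed, C11 would no longer guarantee that.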

--
  Manfred

* Re: Q: natsemi.c spinlocks
  2000-12-24 11:15   ` Manfred
@ 2001-01-21 14:38     ` Jeff Garzik
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2001-01-21 14:38 UTC
  To: Manfred; +Cc: Andrew Morton, linux-kernel

Manfred wrote:
> 
> Andrew Morton wrote:
> >
> > start_tx()
> > {
> 
> Yes, I overlooked start_tx.
> 
> Hmm. start_tx also assumes that the cpu commits writes in order; I'm
> sure the driver is unreliable on RISC cpus.
> 
> Perhaps the driver should use pci_alloc_consistent and pci_map_single?

Eventually, all drivers which use PCI DMA of some sort -should- use
pci_alloc_consistent, etc.

	Jeff


-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

* [PATCH] Re: Q: natsemi.c spinlocks
  2000-12-24  0:54 ` Andrew Morton
  2000-12-24 11:15   ` Manfred
@ 2001-01-21 14:43   ` Jeff Garzik
  2001-01-22  7:06     ` Donald Becker
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2001-01-21 14:43 UTC
  To: Andrew Morton; +Cc: Manfred, mulder.abg, linux-kernel, Donald Becker

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

Andrew Morton wrote:
> 
> Manfred wrote:
> >
> > Hi Jeff, Tjeerd,
> >
> > I spotted the spin_lock in natsemi.c, and I think it's bogus.
> >
> > The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
> > (even Alan didn't remember it exactly when I asked him), thus a sane
> > driver can assume that an interrupt handler is never reentered.
> >
> > Donald often uses dev->interrupt to hide other races, but I don't see
> > anything in this driver (tx_timeout and netdev_timer are both trivial)
> 
> Hi, Manfred.
> 
> I think you're right.  2.4's interrupt handling prevents
> simultaneous entry of the same ISR.
> 
> However, natsemi.c's spinlock needs to be retained, and
> extended into start_tx(), because this driver has
> a race which has cropped up in a few others:
> 
> Current code:
> 
> start_tx()
> {
>         ...
>         if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
>         /* WINDOW HERE */
>                 np->tx_full = 1;
>                 netif_stop_queue(dev);
>         }
>         ...
> }
> 
> If the ring is currently full and an interrupt comes in
> at the indicated window and reaps ALL the packets in the
> ring, the driver ends up in state `tx_full = 1' and transmit
> disabled, but with no outstanding transmit interrupts.
> 
> It's screwed.  You need another interrupt so tx_full
> can be cleared and the queue can be restarted, but you can't
> *get* another interrupt because there are no Tx packets outstanding.
> 
> It's very unlikely to happen with this particular driver
> because it's also polling the transmit queue within
> receive interrupts.  Receiving a packet will clear
> the condition.
> 
> If you were madly hosing out UDP packets and receiving nothing
> then this could occur.  It was certainly triggerable in 3c59x.c,
> which doesn't test the Tx queue state in Rx interrupts.
> 
> I currently have natsemi.c lying in pieces on my garage floor,
> so I'll put this locking in if it's OK with everyone?

(entire message quoted, as it was from 12/23/2000)

Attached is a patch against 2.4.1-pre9, which includes the changes I
would prefer.  Comments?

The Tx locking is a bit conservative -- I think Donald suggested it
could be removed completely -- but I would prefer to have something I am
100% certain will work, and then test the driver without locking under
stress conditions to make sure no race or other bug exists.
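The locking in the attached patch can be modeled in userspace to check that it never strands the queue. This is a hypothetical sketch, not the driver: a pthread mutex stands in for np->lock, one thread plays start_tx() and another plays the Tx interrupt, TX_QUEUE_LEN = 16 is assumed, and the `TX_QUEUE_LEN - 4` wake threshold is taken from the patch. For simplicity every field is accessed under the mutex, which is stricter than the patch (there, np->cur_tx++ happens before the lock is taken).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_QUEUE_LEN 16 /* assumed ring size, illustration only */

/* Hypothetical model of the Tx ring; every field is guarded by lock. */
static struct {
	pthread_mutex_t lock;          /* stands in for np->lock */
	unsigned int cur_tx, dirty_tx; /* producer / consumer indices */
	bool stopped;                  /* netif_stop_queue() state */
	bool done;                     /* sender has queued everything */
} q = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* start_tx() analogue: queue a packet, then test-and-stop under the lock. */
static void *start_tx(void *arg)
{
	unsigned int n = *(unsigned int *)arg;

	for (unsigned int sent = 0; sent < n; ) {
		pthread_mutex_lock(&q.lock);
		if (!q.stopped) {
			q.cur_tx++;
			sent++;
			if (q.cur_tx - q.dirty_tx >= TX_QUEUE_LEN - 1)
				q.stopped = true;  /* netif_stop_queue(dev) */
		}
		pthread_mutex_unlock(&q.lock);
		sched_yield();
	}
	pthread_mutex_lock(&q.lock);
	q.done = true;
	pthread_mutex_unlock(&q.lock);
	return NULL;
}

/* Tx interrupt analogue: reap one packet and wake the queue if there is room. */
static void *tx_interrupt(void *arg)
{
	(void)arg;
	for (;;) {
		bool finished;

		pthread_mutex_lock(&q.lock);
		if (q.cur_tx != q.dirty_tx) {
			q.dirty_tx++;
			if (q.stopped && q.cur_tx - q.dirty_tx < TX_QUEUE_LEN - 4)
				q.stopped = false; /* netif_wake_queue(dev) */
		}
		finished = q.done && q.cur_tx == q.dirty_tx;
		pthread_mutex_unlock(&q.lock);
		if (finished)
			return NULL;
		sched_yield();
	}
}

/* Push n packets through the model; returns how many were reaped. */
unsigned int run_locked_model(unsigned int n)
{
	pthread_t tx, irq;

	q.cur_tx = q.dirty_tx = 0;
	q.stopped = q.done = false;
	pthread_create(&irq, NULL, tx_interrupt, NULL);
	pthread_create(&tx, NULL, start_tx, &n);
	pthread_join(tx, NULL);
	pthread_join(irq, NULL);
	assert(!q.stopped);
	return q.dirty_tx;
}
```

`run_locked_model(n)` should return n; a stranded queue would show up as a hang rather than a wrong count.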

	Jeff


-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

[-- Attachment #2: natsemi-2.4.1.9.patch --]
[-- Type: text/plain, Size: 4201 bytes --]

Index: linux_2_4/drivers/net/natsemi.c
diff -u linux_2_4/drivers/net/natsemi.c:1.1.1.6 linux_2_4/drivers/net/natsemi.c:1.1.1.6.24.1
--- linux_2_4/drivers/net/natsemi.c:1.1.1.6	Mon Dec 11 19:23:42 2000
+++ linux_2_4/drivers/net/natsemi.c	Sun Jan 21 06:39:02 2001
@@ -26,6 +26,11 @@
 		- Bug fixes and better intr performance (Tjeerd)
 	Version 1.0.2:
 		- Now reads correct MAC address from eeprom
+	Version 1.0.3:
+		- Eliminate redundant priv->tx_full flag
+		- Call netif_start_queue from dev->tx_timeout
+		- wmb() in start_tx() to flush data
+		- Update Tx locking
 
 */
 
@@ -35,7 +40,7 @@
 static const char version2[] =
 "  http://www.scyld.com/network/natsemi.html\n";
 static const char version3[] =
-"  (unofficial 2.4.x kernel port, version 1.0.2, October 6, 2000 Jeff Garzik, Tjeerd Mulder)\n";
+"  (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)\n";
 /* Updated to recommendations in pci-skeleton v2.03. */
 
 /* Automatically extracted configuration info:
@@ -187,13 +192,14 @@
 
 The send packet thread has partial control over the Tx ring and 'dev->tbusy'
 flag.  It sets the tbusy flag whenever it's queuing a Tx packet. If the next
-queue slot is empty, it clears the tbusy flag when finished otherwise it sets
-the 'lp->tx_full' flag.
+queue slot is empty, it clears the tbusy flag when finished.  Under 2.4, the
+"tbusy flag" is now controlled by netif_{start,stop,wake}_queue() and tested
+by netif_queue_stopped().
 
 The interrupt handler has exclusive control over the Rx ring and records stats
 from the Tx ring.  After reaping the stats, it marks the Tx queue entry as
-empty by incrementing the dirty_tx mark. Iff the 'lp->tx_full' flag is set, it
-clears both the tx_full and tbusy flags.
+empty by incrementing the dirty_tx mark. Iff Tx queueing is stopped and Tx
+entries were reaped, the Tx queue is started and scheduled.
 
 IV. Notes
 
@@ -319,7 +325,6 @@
 	unsigned int cur_rx, dirty_rx;		/* Producer/consumer ring indices */
 	unsigned int cur_tx, dirty_tx;
 	unsigned int rx_buf_sz;				/* Based on MTU+slack. */
-	unsigned int tx_full:1;				/* The Tx queue is full. */
 	/* These values are keep track of the transceiver/media in use. */
 	unsigned int full_duplex:1;			/* Full-duplex operation requested. */
 	unsigned int duplex_lock:1;
@@ -697,7 +702,7 @@
 
 	dev->trans_start = jiffies;
 	np->stats.tx_errors++;
-	return;
+	netif_start_queue(dev);
 }
 
 
@@ -707,7 +712,6 @@
 	struct netdev_private *np = (struct netdev_private *)dev->priv;
 	int i;
 
-	np->tx_full = 0;
 	np->cur_rx = np->cur_tx = 0;
 	np->dirty_rx = np->dirty_tx = 0;
 
@@ -763,11 +767,13 @@
 	np->cur_tx++;
 
 	/* StrongARM: Explicitly cache flush np->tx_ring and skb->data,skb->len. */
+	wmb();
 
-	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
-		np->tx_full = 1;
+	spin_lock_irq(&np->lock);
+	if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
 		netif_stop_queue(dev);
-	}
+	spin_unlock_irq(&np->lock);
+
 	/* Wake the potentially-idle transmit channel. */
 	writel(TxOn, dev->base_addr + ChipCmd);
 
@@ -798,9 +804,7 @@
 #endif
 
 	ioaddr = dev->base_addr;
-	np = (struct netdev_private *)dev->priv;
-
-	spin_lock(&np->lock);
+	np = dev->priv;
 
 	do {
 		u32 intr_status = readl(ioaddr + IntrStatus);
@@ -818,6 +822,8 @@
 		if (intr_status & (IntrRxDone | IntrRxErr | IntrRxIdle | IntrRxOverrun))
 			netdev_rx(dev);
 
+		spin_lock(&np->lock);
+
 		for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
 			int entry = np->dirty_tx % TX_RING_SIZE;
 			if (np->tx_ring[entry].cmd_status & cpu_to_le32(DescOwn))
@@ -839,13 +845,14 @@
 			dev_kfree_skb_irq(np->tx_skbuff[entry]);
 			np->tx_skbuff[entry] = 0;
 		}
-		if (np->tx_full
+		if (netif_queue_stopped(dev)
 			&& np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
 			/* The ring is no longer full, wake queue. */
-			np->tx_full = 0;
 			netif_wake_queue(dev);
 		}
 
+		spin_unlock(&np->lock);
+
 		/* Abnormal error summary/uncommon events handlers. */
 		if (intr_status & IntrAbnormalSummary)
 			netdev_error(dev, intr_status);
@@ -873,8 +880,6 @@
 		}
 	}
 #endif
-
-	spin_unlock(&np->lock);
 }
 
 /* This routine is logically part of the interrupt handler, but separated


* Re: [PATCH] Re: Q: natsemi.c spinlocks
  2001-01-21 14:43   ` [PATCH] " Jeff Garzik
@ 2001-01-22  7:06     ` Donald Becker
  2001-01-22  8:51       ` Manfred Spraul
  0 siblings, 1 reply; 7+ messages in thread
From: Donald Becker @ 2001-01-22  7:06 UTC
  To: Jeff Garzik; +Cc: Andrew Morton, Manfred, mulder.abg, linux-kernel

On Sun, 21 Jan 2001, Jeff Garzik wrote:
> Andrew Morton wrote:
> > Manfred wrote:
> > > Hi Jeff, Tjeerd,
> > > I spotted the spin_lock in natsemi.c, and I think it's bogus.
> > >
> > > The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
> > > (even Alan didn't remember it exactly when I asked him), thus a sane
> > > driver can assume that an interrupt handler is never reentered.
...
> > I think you're right.  2.4's interrupt handling prevents
> > simultaneous entry of the same ISR.

The bug (simultaneous calls to the interrupt handler on SMP) existed in
most 2.0 versions and was fixed before 2.2.  A driver that needs to work
with multiple kernel versions must have the check.
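On kernels that still have the bug, "the check" is a reentrancy guard at the top of the interrupt handler; Manfred's first message mentions Donald's drivers using dev->interrupt for this. A minimal userspace sketch of the idea with a C11 atomic_flag follows. The names `in_handler`, `netdev_interrupt()` and `netdev_intr_body()` are hypothetical, and an atomic test-and-set is used so that two CPUs entering at once cannot both see the flag clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device "already in the handler" flag (cf. dev->interrupt). */
static atomic_flag in_handler = ATOMIC_FLAG_INIT;
static unsigned int bodies_run; /* how many times the real work ran */

/* Hypothetical handler body: Tx reaping, Rx processing, etc. */
static void netdev_intr_body(void)
{
	bodies_run++;
}

/*
 * Interrupt entry point.  Returns false if another invocation already owns
 * the handler, which kernels from 2.2 on prevent but most 2.0 SMP kernels
 * did not.
 */
bool netdev_interrupt(void)
{
	if (atomic_flag_test_and_set(&in_handler))
		return false;           /* re-entered: leave the work to the owner */
	netdev_intr_body();
	atomic_flag_clear(&in_handler);
	return true;
}

/* Simulate a second CPU entering while the first is still in the handler. */
bool nested_entry_is_rejected(void)
{
	bool was_set = atomic_flag_test_and_set(&in_handler); /* "first CPU" */
	bool rejected;

	assert(!was_set);
	rejected = !netdev_interrupt();                       /* "second CPU" */
	atomic_flag_clear(&in_handler);
	return rejected;
}
```

The nested entry returns without running the body, so only the two uncontended calls do any work.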

> > However, natsemi.c's spinlock needs to be retained, and
> > extended into start_tx(), because this driver has
> > a race which has cropped up in a few others:
> >         ...
> >         if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
> >         /* WINDOW HERE */
> >                 np->tx_full = 1;
> >                 netif_stop_queue(dev);
> >         }
> > If the ring is currently full and an interrupt comes in
> > at the indicated window and reaps ALL the packets in the
> > ring, the driver ends up in state `tx_full = 1' and transmit
> > disabled, but with no outstanding transmit interrupts.

The better solution, which I've been adding to the drivers, is to check
again for a just-cleared Tx queue after setting tx_full.
That trades an extra comparison on a rarely followed path for a spinlock
that is taken for every transmit and interrupt.

Remember: spinlocks are expensive!

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993


* Re: [PATCH] Re: Q: natsemi.c spinlocks
  2001-01-22  7:06     ` Donald Becker
@ 2001-01-22  8:51       ` Manfred Spraul
  0 siblings, 0 replies; 7+ messages in thread
From: Manfred Spraul @ 2001-01-22  8:51 UTC
  To: Donald Becker; +Cc: Jeff Garzik, Andrew Morton, mulder.abg, linux-kernel

Donald Becker wrote:
> 
> > > However, natsemi.c's spinlock needs to be retained, and
> > > extended into start_tx(), because this driver has
> > > a race which has cropped up in a few others:
> > >         ...
> > >         if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
> > >         /* WINDOW HERE */
> > >                 np->tx_full = 1;
> > >                 netif_stop_queue(dev);
> > >         }
> > > If the ring is currently full and an interrupt comes in
> > > at the indicated window and reaps ALL the packets in the
> > > ring, the driver ends up in state `tx_full = 1' and transmit
> > > disabled, but with no outstanding transmit interrupts.
> 
> The better solution, which I've been adding to the drivers, is to check
> again for a just-cleared Tx queue after setting tx_full.
> That trades an extra comparison on a rarely followed path for a spinlock
> that is taken for every transmit and interrupt.
>
Please do not forget the memory barrier(s):
	
	tx_full = 1;
	if(condition)
		...;

That's exactly the sequence that caused deadlocks with wait_queues -
even a Pentium cpu will evaluate the condition before the write to
tx_full is committed. I have a test program (userspace) that reliably
locks up on my P II. I can send you the details if you are interested.
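The reordering described here is the classic store-buffering pattern: each side stores one flag and then loads the other, and even x86 allows a load to complete before an earlier store is visible to other CPUs, so both loads can return the old value. The sketch below is not Manfred's test program (which is not included in the thread); it is a hypothetical C11 litmus test in which `tx_full` and `reaped` stand for the transmit path's flag and the interrupt's progress. With a sequentially consistent fence between each store and the following load, C11 forbids the outcome where both sides read 0, so the function should always return 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flags: xmit sets tx_full, the interrupt sets reaped. */
static atomic_int tx_full, reaped;
static int seen_by_xmit, seen_by_isr; /* what each side loaded */

static void *xmit_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&tx_full, 1, memory_order_relaxed);
	/* Full barrier: orders the store above before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	seen_by_xmit = atomic_load_explicit(&reaped, memory_order_relaxed);
	return NULL;
}

static void *isr_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&reaped, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* same barrier on this side */
	seen_by_isr = atomic_load_explicit(&tx_full, memory_order_relaxed);
	return NULL;
}

/* Run the test n times; count runs in which both sides loaded 0. */
unsigned int count_both_missed(unsigned int n)
{
	unsigned int missed = 0;

	for (unsigned int i = 0; i < n; i++) {
		pthread_t a, b;

		atomic_store(&tx_full, 0);
		atomic_store(&reaped, 0);
		pthread_create(&a, NULL, xmit_side, NULL);
		pthread_create(&b, NULL, isr_side, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		assert(seen_by_xmit <= 1 && seen_by_isr <= 1);
		if (seen_by_xmit == 0 && seen_by_isr == 0)
			missed++; /* the lost-wakeup outcome */
	}
	return missed;
}
```

Removing the two fences makes the both-zero outcome legal; whether it actually shows up in a given run depends on the hardware and on timing, since thread start-up makes overlap rare.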

I think you also need a memory barrier in the tx_interrupt codepath.
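Combining Donald's re-check with the barriers Manfred asks for on both sides gives a lock-free variant. The sketch below is a hypothetical userspace model rather than driver code: C11 atomics with the default sequentially consistent ordering supply the full store-to-load barrier on each side, `stopped` stands in for the netif queue state, and TX_QUEUE_LEN = 16 and the `TX_QUEUE_LEN - 4` wake threshold are assumptions borrowed from the patch. The transmit side stops the queue and then re-checks for a just-drained ring; the interrupt side advances dirty_tx and then checks whether the queue is stopped. Because neither store can be reordered past the following load, at least one side sees the other's update, so the queue is never left stopped with nothing outstanding.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_QUEUE_LEN 16 /* assumed ring size, illustration only */

/*
 * Hypothetical lock-free model.  Every atomic access uses C11's default
 * memory_order_seq_cst, which provides the store->load barrier that a
 * plain "tx_full = 1; if (condition)" sequence lacks.
 */
static atomic_uint cur_tx;   /* written only by xmit() */
static atomic_uint dirty_tx; /* written only by tx_interrupt() */
static atomic_bool stopped;  /* models the netif queue state */
static atomic_bool sender_done;

/* Enough free slots to (re)start the queue? */
static bool room(void)
{
	return atomic_load(&cur_tx) - atomic_load(&dirty_tx) < TX_QUEUE_LEN - 4;
}

/* hard_start_xmit analogue: stop the queue, then re-check (Donald's idea). */
static void *xmit(void *arg)
{
	unsigned int n = *(unsigned int *)arg;

	for (unsigned int sent = 0; sent < n; ) {
		if (atomic_load(&stopped)) {
			sched_yield(); /* wait for a wake-up */
			continue;
		}
		unsigned int cur = atomic_fetch_add(&cur_tx, 1) + 1;
		sent++;
		if (cur - atomic_load(&dirty_tx) >= TX_QUEUE_LEN - 1) {
			atomic_store(&stopped, true);      /* netif_stop_queue() */
			if (room())                        /* drained meanwhile? */
				atomic_store(&stopped, false); /* restart ourselves */
		}
	}
	atomic_store(&sender_done, true);
	return NULL;
}

/* Tx interrupt analogue: advance dirty_tx, then look at the queue state. */
static void *tx_interrupt(void *arg)
{
	(void)arg;
	for (;;) {
		bool done = atomic_load(&sender_done); /* read before cur_tx */

		if (atomic_load(&dirty_tx) != atomic_load(&cur_tx)) {
			atomic_fetch_add(&dirty_tx, 1);    /* reap one packet */
			if (atomic_load(&stopped) && room())
				atomic_store(&stopped, false); /* netif_wake_queue() */
		} else if (done) {
			return NULL;                       /* everything reaped */
		} else {
			sched_yield();                     /* "no interrupt pending" */
		}
	}
}

/* Push n packets through the model; returns how many were reaped. */
unsigned int run_lockless_model(unsigned int n)
{
	pthread_t s, i;

	atomic_store(&cur_tx, 0);
	atomic_store(&dirty_tx, 0);
	atomic_store(&stopped, false);
	atomic_store(&sender_done, false);
	pthread_create(&i, NULL, tx_interrupt, NULL);
	pthread_create(&s, NULL, xmit, &n);
	pthread_join(s, NULL);
	pthread_join(i, NULL);
	assert(atomic_load(&dirty_tx) == atomic_load(&cur_tx));
	return atomic_load(&dirty_tx);
}
```

`run_lockless_model(n)` should return n. If the accesses were downgraded to memory_order_relaxed with no fences, the store-then-load reordering Manfred describes would become legal and the model could hang.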

> 
> Remember: spinlocks are expensive!
> 

But memory barriers are extremely error prone.

What about

tx_interrupt()

	if(netif_queue_stopped(dev)) {
		spin_lock(&np->lock);
		if(np->cur_tx - np->dirty_tx <= TX_QUEUE_LEN/2)
			netif_wake_queue(dev);
		spin_unlock(&np->lock);
	}

hard_xmit()

	if(np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN-1) {
		spin_lock_irq(&np->lock);
		if(np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN-1)
			netif_stop_queue(dev);
		spin_unlock_irq(&np->lock);
	}

--
	Manfred


Thread overview: 7+ messages
2000-12-23 17:45 Q: natsemi.c spinlocks Manfred
2000-12-24  0:54 ` Andrew Morton
2000-12-24 11:15   ` Manfred
2001-01-21 14:38     ` Jeff Garzik
2001-01-21 14:43   ` [PATCH] " Jeff Garzik
2001-01-22  7:06     ` Donald Becker
2001-01-22  8:51       ` Manfred Spraul
