bugfix patch to arch/ppc/8260_io/fcc

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* bugfix patch to arch/ppc/8260_io/fcc_enet.c
@ 2001-04-06 14:16 David Schleef
  2001-04-08  7:15 ` Dan Malek
  0 siblings, 1 reply; 5+ messages in thread
From: David Schleef @ 2001-04-06 14:16 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 480 bytes --]


The attached patch fixes a bug with 8260 fcc_enet driver that
is related to when the TX buffer becomes full.  Currently,
the driver relies on the BD_ENET_TX_READY for determining if
a ring slot is available for a tx buffer.  This is not a
valid criterion, because the interrupt handler may not have
cleared the slot from a previous tx buffer.  This bug is easy
to generate by writing files over NFS with large wsize.

The patch should apply cleanly to 2_4 and 2_5.




dave...



[-- Attachment #2: patch-txring --]
[-- Type: text/plain, Size: 1209 bytes --]

--- fcc_enet.c.25	Fri Apr  6 03:29:06 2001
+++ fcc_enet.c	Fri Apr  6 04:59:36 2001
@@ -219,8 +219,8 @@
 struct fcc_enet_private {
 	/* The saved address of a sent-in-place packet/buffer, for skfree(). */
 	struct	sk_buff* tx_skbuff[TX_RING_SIZE];
-	ushort	skb_cur;
-	ushort	skb_dirty;
+	uint	skb_cur;
+	uint	skb_dirty;

 	/* CPM dual port RAM relative addresses.
 	*/
@@ -329,10 +329,10 @@

 	/* Save skb pointer.
 	*/
-	cep->tx_skbuff[cep->skb_cur] = skb;
+	cep->tx_skbuff[(cep->skb_cur & TX_RING_MOD_MASK)] = skb;

 	cep->stats.tx_bytes += skb->len;
-	cep->skb_cur = (cep->skb_cur+1) & TX_RING_MOD_MASK;
+	cep->skb_cur++;

 	spin_lock_irq(&cep->lock);

@@ -355,7 +355,7 @@
 	else
 		bdp++;

-	if (bdp->cbd_sc & BD_ENET_TX_READY) {
+	if(cep->skb_cur - cep->skb_dirty >= TX_RING_SIZE){
 		netif_stop_queue(dev);
 		cep->tx_full = 1;
 	}
@@ -475,8 +475,8 @@

 		/* Free the sk buffer associated with this last transmit.
 		*/
-		dev_kfree_skb_irq(cep->tx_skbuff[cep->skb_dirty]);
-		cep->skb_dirty = (cep->skb_dirty + 1) & TX_RING_MOD_MASK;
+		dev_kfree_skb_irq(cep->tx_skbuff[(cep->skb_dirty & TX_RING_MOD_MASK)]);
+		cep->skb_dirty++;

 		/* Update pointer to next buffer descriptor to be transmitted.
 		*/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bugfix patch to arch/ppc/8260_io/fcc_enet.c
  2001-04-06 14:16 bugfix patch to arch/ppc/8260_io/fcc_enet.c David Schleef
@ 2001-04-08  7:15 ` Dan Malek
  2001-04-08 22:28   ` David Schleef
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Malek @ 2001-04-08  7:15 UTC (permalink / raw)
  To: David Schleef; +Cc: linuxppc-dev

David Schleef wrote:
>
> The attached patch fixes a bug with 8260 fcc_enet driver that
> is related to when the TX buffer becomes full.

Well, you need to prove to me you don't have a wrap-around
problem.  The line:
	if(cep->skb_cur - cep->skb_dirty >= TX_RING_SIZE){

is in big trouble, and I suspect you changed these from shorts
to ints because it didn't work right.  I suspect all you did
was postpone the problem until you hit 4G of packets instead of 64K.

> ..... Currently,
> the driver relies on the BD_ENET_TX_READY for determining if
> a ring slot is available for a tx buffer.  This is not a
> valid criterion, because the interrupt handler may not have
> cleared the slot from a previous tx buffer.

I beg to differ.  It is a valid criterion because the interrupt
handler isn't responsible for clearing the flag.  The transmit
function sets it, and the CPM will clear it when it is done sending
the buffer.

You are going to have big problems if you are writing into buffers
that have this flag set.

> ..... This bug is easy
> to generate by writing files over NFS with large wsize.

Then, there is some other bug that needs to be found and fixed.....

	-- Dan

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bugfix patch to arch/ppc/8260_io/fcc_enet.c
  2001-04-08  7:15 ` Dan Malek
@ 2001-04-08 22:28   ` David Schleef
  2001-04-09 17:57     ` Dan Malek
  2001-04-09 20:10     ` Dan Malek
  0 siblings, 2 replies; 5+ messages in thread
From: David Schleef @ 2001-04-08 22:28 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev


On Sun, Apr 08, 2001 at 03:15:29AM -0400, Dan Malek wrote:
>
> David Schleef wrote:
> >
> > The attached patch fixes a bug with 8260 fcc_enet driver that
> > is related to when the TX buffer becomes full.
>
> Well, you need to prove to me you don't have a wrap-around
> problem.  The line:
> 	if(cep->skb_cur - cep->skb_dirty >= TX_RING_SIZE){
>
> is in big trouble, and I suspect you changed these from shorts
> to ints because it didn't work right.  I suspect all you did
> was postpone the problem until you hit 4G of packets instead of 64K.

Check again.  The unsigned arithmetic works correctly.

The change from ushort to uint was a completely unnecessary change
that was based on prior experience with compilers generating bad
code for unsigned shorts.  After checking, I noticed that the
powerpc is actually completely sane in this respect.


> > ..... Currently,
> > the driver relies on the BD_ENET_TX_READY for determining if
> > a ring slot is available for a tx buffer.  This is not a
> > valid criterion, because the interrupt handler may not have
> > cleared the slot from a previous tx buffer.
>
> I beg to differ.  It is a valid criterion because the interrupt
> handler isn't responsible for clearing the flag.  The transmit
> function sets it, and the CPM will clear it when it is done sending
> the buffer.

(Sorry if I was being unclear.  It made sense to me... =) )

There are two possible sequences of events that I think is
occuring.  The one I was trying to explain, which isn't very
likely:

  (starting with a full queue)

  - tx slot N BD_ENET_TX_READY is cleared by the CPM

  - [before the interrupt is dispatched,] fcc_enet_start_xmit()
    sees the "empty" slot and stuffs a new buffer into it,
    overwriting the old buffer that hasn't been cleaned up.

  - interrupt handler is dispatched, and frees the new buffer
    instead of the old buffer.

  - kernel gets confused when the interrupt handler eventually
    tries to free the buffer for the second time.

The other (and more likely) is:

  - fcc_enet_start_xmit() fills up the TX ring, thus cep->skb_cur
    == cep->skb->dirty.

  - an interrupt occurs because TX is complete.  The interrupt
    handler doesn't clear the slot or restart the net device queue,
    because it fails on the test.

	if(cep->skb_cur == cep->skb_dirty)break;

  - the net device watchdog eventually realizes that something is
    wrong, and tries to reset the device, which doesn't work because
    fcc_enet_timeout doesn't do anything.





dave...


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bugfix patch to arch/ppc/8260_io/fcc_enet.c
  2001-04-08 22:28   ` David Schleef
@ 2001-04-09 17:57     ` Dan Malek
  2001-04-09 20:10     ` Dan Malek
  1 sibling, 0 replies; 5+ messages in thread
From: Dan Malek @ 2001-04-09 17:57 UTC (permalink / raw)
  To: David Schleef; +Cc: linuxppc-dev

David Schleef wrote:

> Check again.  The unsigned arithmetic works correctly.

OK.

>   (starting with a full queue)

This one can't happen because of the race conditions are avoided
by the use of cep->lock.

> The other (and more likely) is:
>
>   - fcc_enet_start_xmit() fills up the TX ring, thus cep->skb_cur
>     == cep->skb->dirty.

What driver are you using?  That line of code hasn't been in a driver
for a very long time.  This was a very obvious bug that has been
corrected long ago.  You should be using the FSM Labs linuxppc_2_5
bitkeeper tree for any of the 8xx/82xx development.  There are lots
of things that were placed in the 2_5 tree when the 2.4 tree was frozen.
We are now trying to merge them back, but there are still some pieces
left to go.

	-- Dan

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bugfix patch to arch/ppc/8260_io/fcc_enet.c
  2001-04-08 22:28   ` David Schleef
  2001-04-09 17:57     ` Dan Malek
@ 2001-04-09 20:10     ` Dan Malek
  1 sibling, 0 replies; 5+ messages in thread
From: Dan Malek @ 2001-04-09 20:10 UTC (permalink / raw)
  To: David Schleef; +Cc: linuxppc-dev

David Schleef wrote:

> The other (and more likely) is:

>         if(cep->skb_cur == cep->skb_dirty)break;

OK, I kind of found this too.  My last message indicated this
should have been fixed, and if this _exact_ line of code is in
your driver, it is too old.  The proper test checks for the buffer
pointers equal, and also adds the 'tx_full' condition so the
wrap around case is properly handled.

The current version of this driver has been subjected to the
University of New Hampshire Interoperability testing lab by
a very large communication company.  The test reports were sent
to me, I made some changes according to their recommendations,
and I believe they were happy with the results.  I would hope
they found all of the errors :-).

	-- Dan

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2001-04-09 20:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-04-06 14:16 bugfix patch to arch/ppc/8260_io/fcc_enet.c David Schleef
2001-04-08  7:15 ` Dan Malek
2001-04-08 22:28   ` David Schleef
2001-04-09 17:57     ` Dan Malek
2001-04-09 20:10     ` Dan Malek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).