* fealnx oopses
@ 2004-03-26 10:14 Denis Vlasenko
2004-03-26 19:22 ` Andreas Henriksson
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-26 10:14 UTC (permalink / raw)
To: netdev, OGAWA Hirofumi, Jeff Garzik; +Cc: andreas, Denis Vlasenko
I'll appreciate any comments from netdev@oss.sgi.com readers.
Especially on "or else?? np->cur_rx->skbuff stays NULL..." part.
> From: Andreas Henriksson <andreas@fjortis.info>
> To: vda@port.imtp.ilyichevsk.odessa.ua
>
> I've had the same problem for some time (although I've been using 2.6).
>
> Try the attached patch.... it works reliably for me with it (except that
> the controller sometimes get reset because of "transmit timed out").
> I guess there isn't much changed in the driver between 2.4 and 2.6 so it
> will hopefully apply cleanly.
> I have opened a bug about this (which I've already marked "resolved"):
> http://bugzilla.kernel.org/show_bug.cgi?id=1902
> Also see http://fjortis.info/pub/panic/fealnx/ for verbose information
> about my quest to solve the problem ... :)
> Please try this out and report to me if it helps.... I've been thinking
> of submitting this for some time but I haven't found anyone who can help
> me test the patch....
Andreas,
You concluded that your oops happened in
dev_kfree_skb_irq(np->cur_tx->skbuff);
because np->cur_tx->skbuff==NULL.
In my case I saw oops in
skb_put(skb = np->cur_rx->skbuff, pkt_len);
because np->cur_rx->skbuff==NULL.
(I am 100% sure, I traced it to exact assembler instruction
and C source line)
Seems similar but not exactly the same.
Another similarity is that you saw two do_IRQ's in your
oops traces, just like me: http://lkml.org/lkml/2004/3/21/119
Looks like it indeed can be caused by race between
network stack trying to stuff more tx data into card
and interrupts from card.
Another thread of me tried to guess how cur_rx->skbuff
can end up being NULL.
Here we are in interrupt context, processing rx intr:
static int netdev_rx(struct net_device *dev)
...
we just received large packet, pass it to network stack:
} else {
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL; <--------- set to NULL here
if (np->really_rx_count == RX_RING_SIZE)
np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
netif_rx(skb);
dev->last_rx = jiffies;
np->stats.rx_packets++;
np->stats.rx_bytes += pkt_len;
}
if (np->cur_rx->skbuff == NULL) {
aha. need to allocate new rx buffer
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
if (skb != NULL) {
skb->dev = dev; /* Mark as being used by this device. */
np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
np->cur_rx->skbuff = skb;
++np->really_rx_count;
}
or else?? np->cur_rx->skbuff stays NULL...
}
if (np->cur_rx->skbuff != NULL)
free_one_rx_descriptor(np);
this does np->cur_rx = np->cur_rx->next_desc_logical inside
bit ONLY if( !=NULL). This is the only place which can possibly
change ->cur_tx (I grepped entire driver). Will something change
cur_rx->skbuff later? lets see:
} /* end of while loop */
/* allocate skb for rx buffers */
allocate_rx_buffers(dev);
this function will try to allocate all remaining unallocated
rx buffers but may fail if low on memory, leaving cur_rx->skbuff == NULL!
return 0;
}
end of handler. If next rx intr struck us right now, we will oops.
Am I right?
BTW, my box is indeed slow and low on RAM, this fits.
Regarding your patch: I looked in start_tx(). Apart from latent
bug in commented out part of code:
next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
which must be
next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical;
I can't see anything racy there. The function just submits more
tx buffers for the card, it never touch cur_tx or cur_tx->skbuff...
If I miss something and it indeed races with interrupt, you
definitely need to add
spin_lock(&np->lock);
...
spin_unlock(&np->lock);
around interrupt handler body or at least around tx part
of it, or else your patch is incomplete (race will still
be possible on SMP).
Anyway, I applied your patch and flooded with UDP again.
My box did not oops. Unfortunately, it did not oops when
I reverted back to old, presumably buggy driver. I cannot
reproduce it anymore with old driver too! Bad. :(
--
vda
> --- old-fealnx.c 2004-03-23 12:43:21.000000000 +0100
> +++ fealnx.c 2004-03-23 12:44:17.000000000 +0100
> @@ -1311,6 +1311,9 @@ static void init_ring(struct net_device
> static int start_tx(struct sk_buff *skb, struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&np->lock, flags);
>
> np->cur_tx_copy->skbuff = skb;
>
> @@ -1377,6 +1380,8 @@ static int start_tx(struct sk_buff *skb,
> writel(0, dev->base_addr + TXPDR);
> dev->trans_start = jiffies;
>
> + spin_unlock_irqrestore(&np->lock, flags);
> +
> return 0;
> }
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-26 10:14 fealnx oopses Denis Vlasenko
@ 2004-03-26 19:22 ` Andreas Henriksson
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Andreas Henriksson @ 2004-03-26 19:22 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: netdev
On Fri, Mar 26, 2004 at 12:14:57PM +0200, Denis Vlasenko wrote:
<snip>
> BTW, my box is indeed slow and low on RAM, this fits.
>
I have only been looking at problems with races between the interrupt
handler and the rest of the driver code.. there might be a bunch of
problems with failed memory allocations that hasn't bitten me.
>
> Regarding your patch: I looked in start_tx(). Apart from latent
> bug in commented out part of code:
> next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
> which must be
> next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical;
> I can't see anything racy there. The function just submits more
> tx buffers for the card, it never touch cur_tx or cur_tx->skbuff...
Francois Romieu explains the race in a comment to the bug I opened in
the bugzilla.
http://bugzilla.kernel.org/show_bug.cgi?id=1902#c1
The problem is that really_tx_count and similar parts of the private
structure isn't atomically updated and both the interrupt handler and
the start_tx function updates them.
(they are regular integers instead of atomic_t)
>
> If I miss something and it indeed races with interrupt, you
> definitely need to add
> spin_lock(&np->lock);
> ...
> spin_unlock(&np->lock);
> around interrupt handler body or at least around tx part
> of it, or else your patch is incomplete (race will still
> be possible on SMP).
>
I came to the conclusion that there should be a spinlock in the
interrupt handler yesterday, but it won't effect me at all since I don't
have SMP (nor preempt) so I'll leave it for now anyway.
>
> Anyway, I applied your patch and flooded with UDP again.
> My box did not oops. Unfortunately, it did not oops when
> I reverted back to old, presumably buggy driver. I cannot
> reproduce it anymore with old driver too! Bad. :(
I haven't been able to reproduce a kernel panic with my patch eighter.
And I've been transfering Terabytes of traffic during the last weeks (or
maybee it's months, well anyway.. I've done enough testing to say that
the card works "good enough" in this machine atleast).
And I've even tried your udp test..
Although I now have the myson/fealnx card in my p3-900 (256Mb)
workstation instead of the old p-166 (40Mb) which served as a gateway before.
It might just be that it's harder to trigger on newer/bigger machines.
Maybee I should power up my p-166 again.. I actually have 2 of these
cards so I can have one in each machine.. :)
> --
> vda
>
--
Regards,
Andreas Henriksson
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH] Re: fealnx oopses
2004-03-26 19:22 ` Andreas Henriksson
@ 2004-03-26 19:33 ` Jeff Garzik
2004-03-26 20:05 ` Denis Vlasenko
2004-03-27 2:13 ` Andreas Henriksson
2004-03-26 19:57 ` Denis Vlasenko
2004-03-26 20:20 ` Francois Romieu
2 siblings, 2 replies; 31+ messages in thread
From: Jeff Garzik @ 2004-03-26 19:33 UTC (permalink / raw)
To: Andreas Henriksson; +Cc: Denis Vlasenko, netdev, Marcelo Tosatti, Linux Kernel
[-- Attachment #1: Type: text/plain, Size: 2797 bytes --]
Andreas Henriksson wrote:
> On Fri, Mar 26, 2004 at 12:14:57PM +0200, Denis Vlasenko wrote:
>
> <snip>
>
>>BTW, my box is indeed slow and low on RAM, this fits.
>>
>
>
> I have only been looking at problems with races between the interrupt
> handler and the rest of the driver code.. there might be a bunch of
> problems with failed memory allocations that hasn't bitten me.
>
>
>>Regarding your patch: I looked in start_tx(). Apart from latent
>>bug in commented out part of code:
>> next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
>>which must be
>> next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical;
>>I can't see anything racy there. The function just submits more
>>tx buffers for the card, it never touch cur_tx or cur_tx->skbuff...
>
>
> Francois Romieu explains the race in a comment to the bug I opened in
> the bugzilla.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=1902#c1
>
> The problem is that really_tx_count and similar parts of the private
> structure isn't atomically updated and both the interrupt handler and
> the start_tx function updates them.
> (they are regular integers instead of atomic_t)
>
>
>>If I miss something and it indeed races with interrupt, you
>>definitely need to add
>> spin_lock(&np->lock);
>>...
>> spin_unlock(&np->lock);
>>around interrupt handler body or at least around tx part
>>of it, or else your patch is incomplete (race will still
>>be possible on SMP).
>>
>
>
> I came to the conclusion that there should be a spinlock in the
> interrupt handler yesterday, but it won't effect me at all since I don't
> have SMP (nor preempt) so I'll leave it for now anyway.
>
>
>>Anyway, I applied your patch and flooded with UDP again.
>>My box did not oops. Unfortunately, it did not oops when
>>I reverted back to old, presumably buggy driver. I cannot
>>reproduce it anymore with old driver too! Bad. :(
>
>
> I haven't been able to reproduce a kernel panic with my patch eighter.
> And I've been transfering Terabytes of traffic during the last weeks (or
> maybee it's months, well anyway.. I've done enough testing to say that
> the card works "good enough" in this machine atleast).
> And I've even tried your udp test..
>
> Although I now have the myson/fealnx card in my p3-900 (256Mb)
> workstation instead of the old p-166 (40Mb) which served as a gateway before.
> It might just be that it's harder to trigger on newer/bigger machines.
> Maybee I should power up my p-166 again.. I actually have 2 of these
> cards so I can have one in each machine.. :)
Well really, somebody needs to port Donald Becker's myson driver to 2.6
APIs... I would like to get rid of fealnx, or somebody needs to spend a
decent amount of time fixing it.
Does the attached patch fix the issue?
Jeff
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1037 bytes --]
===== drivers/net/fealnx.c 1.34 vs edited =====
--- 1.34/drivers/net/fealnx.c Sun Mar 14 01:54:58 2004
+++ edited/drivers/net/fealnx.c Fri Mar 26 14:31:07 2004
@@ -1303,14 +1303,15 @@
/* for the last tx descriptor */
np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
-
- return;
}
static int start_tx(struct sk_buff *skb, struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ unsigned long flags;
+
+ spin_lock_irqsave(&np->lock, flags);
np->cur_tx_copy->skbuff = skb;
@@ -1377,6 +1378,7 @@
writel(0, dev->base_addr + TXPDR);
dev->trans_start = jiffies;
+ spin_unlock_irqrestore(&np->lock, flags);
return 0;
}
@@ -1423,6 +1425,8 @@
unsigned int num_tx = 0;
int handled = 0;
+ spin_lock(&np->lock);
+
writel(0, dev->base_addr + IMR);
ioaddr = dev->base_addr;
@@ -1564,6 +1568,8 @@
dev->name, readl(ioaddr + ISR));
writel(np->imrvalue, ioaddr + IMR);
+
+ spin_unlock(&np->lock);
return IRQ_RETVAL(handled);
}
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-26 19:22 ` Andreas Henriksson
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
@ 2004-03-26 19:57 ` Denis Vlasenko
[not found] ` <40648CAF.5010203@pobox.com>
2004-03-26 20:20 ` Francois Romieu
2 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-26 19:57 UTC (permalink / raw)
To: Andreas Henriksson; +Cc: netdev, romieu
On Friday 26 March 2004 21:22, Andreas Henriksson wrote:
> On Fri, Mar 26, 2004 at 12:14:57PM +0200, Denis Vlasenko wrote:
>
> <snip>
>
> > BTW, my box is indeed slow and low on RAM, this fits.
>
> I have only been looking at problems with races between the interrupt
> handler and the rest of the driver code.. there might be a bunch of
> problems with failed memory allocations that hasn't bitten me.
>
> > Regarding your patch: I looked in start_tx(). Apart from latent
> > bug in commented out part of code:
> > next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
> > which must be
> > next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical;
> > I can't see anything racy there. The function just submits more
> > tx buffers for the card, it never touch cur_tx or cur_tx->skbuff...
>
> Francois Romieu explains the race in a comment to the bug I opened in
> the bugzilla.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=1902#c1
> The problem is that really_tx_count and similar parts of the private
> structure isn't atomically updated and both the interrupt handler and
> the start_tx function updates them.
> (they are regular integers instead of atomic_t)
------- Additional Comment #1 From Francois Romieu 2004-02-27 14:53 -------
Afaiks, np->{free_tx_count/really_tx_count} update in start_xmit is not atomic
wrt the irq handler and the irq handler updates these variables as well.
So we could have:
- start_xmit reads really_tx_count (first step to update really_tx_count);
- first tx irq takes place in and updates really_tx_count
- start_xmit ends updating really_tx_count, ignoring the value set in the irq
handler
- second tx irq takes place and reads an erroneously high really_tx_count
value
- <censored>
Nice explanation, but on x86 uniprocessor it can't happen.
> > If I miss something and it indeed races with interrupt, you
> > definitely need to add
> > spin_lock(&np->lock);
> > ...
> > spin_unlock(&np->lock);
> > around interrupt handler body or at least around tx part
> > of it, or else your patch is incomplete (race will still
> > be possible on SMP).
>
> I came to the conclusion that there should be a spinlock in the
> interrupt handler yesterday, but it won't effect me at all since I don't
> have SMP (nor preempt) so I'll leave it for now anyway.
>
> > Anyway, I applied your patch and flooded with UDP again.
> > My box did not oops. Unfortunately, it did not oops when
> > I reverted back to old, presumably buggy driver. I cannot
> > reproduce it anymore with old driver too! Bad. :(
>
> I haven't been able to reproduce a kernel panic with my patch eighter.
> And I've been transfering Terabytes of traffic during the last weeks (or
> maybee it's months, well anyway.. I've done enough testing to say that
> the card works "good enough" in this machine atleast).
> And I've even tried your udp test..
>
> Although I now have the myson/fealnx card in my p3-900 (256Mb)
> workstation instead of the old p-166 (40Mb) which served as a gateway
> before. It might just be that it's harder to trigger on newer/bigger
> machines. Maybee I should power up my p-166 again.. I actually have 2 of
> these cards so I can have one in each machine.. :)
A Good Idea (tm)
I have a spare P75, do you need it? ;)
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] Re: fealnx oopses
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
@ 2004-03-26 20:05 ` Denis Vlasenko
2004-03-27 2:13 ` Andreas Henriksson
1 sibling, 0 replies; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-26 20:05 UTC (permalink / raw)
To: Jeff Garzik, Andreas Henriksson; +Cc: netdev, Marcelo Tosatti, Linux Kernel
On Friday 26 March 2004 21:33, Jeff Garzik wrote:
> > Although I now have the myson/fealnx card in my p3-900 (256Mb)
> > workstation instead of the old p-166 (40Mb) which served as a gateway
> > before. It might just be that it's harder to trigger on newer/bigger
> > machines. Maybee I should power up my p-166 again.. I actually have 2 of
> > these cards so I can have one in each machine.. :)
>
> Well really, somebody needs to port Donald Becker's myson driver to 2.6
> APIs... I would like to get rid of fealnx, or somebody needs to spend a
> decent amount of time fixing it.
>
> Does the attached patch fix the issue?
>
> Jeff
It may fix Andreas case, but I doubt it can fix mine -
mine was related to _rx_ code path.
I need to find a way to reliably reproduce oopses with unfixed driver first.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-26 19:22 ` Andreas Henriksson
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
2004-03-26 19:57 ` Denis Vlasenko
@ 2004-03-26 20:20 ` Francois Romieu
2 siblings, 0 replies; 31+ messages in thread
From: Francois Romieu @ 2004-03-26 20:20 UTC (permalink / raw)
To: Andreas Henriksson; +Cc: Denis Vlasenko, netdev
Andreas Henriksson <andreas@fjortis.info> :
[...]
> I came to the conclusion that there should be a spinlock in the
> interrupt handler yesterday, but it won't effect me at all since I don't
> have SMP (nor preempt) so I'll leave it for now anyway.
SMP is not needed. If an interrupt is fired at the wrong time, the race
takes place as well.
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
[not found] ` <40648CAF.5010203@pobox.com>
@ 2004-03-26 22:14 ` Denis Vlasenko
2004-03-26 22:35 ` Francois Romieu
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-26 22:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andreas Henriksson, netdev, romieu
On Friday 26 March 2004 22:03, Jeff Garzik wrote:
> Denis Vlasenko wrote:
> > ------- Additional Comment #1 From Francois Romieu 2004-02-27 14:53
> > ------- Afaiks, np->{free_tx_count/really_tx_count} update in start_xmit
> > is not atomic wrt the irq handler and the irq handler updates these
> > variables as well. So we could have:
> > - start_xmit reads really_tx_count (first step to update
> > really_tx_count); - first tx irq takes place in and updates
> > really_tx_count
> > - start_xmit ends updating really_tx_count, ignoring the value set in the
> > irq handler
> > - second tx irq takes place and reads an erroneously high really_tx_count
> > value
> > - <censored>
> >
> > Nice explanation, but on x86 uniprocessor it can't happen.
>
> An irq in the middle of start_tx() can definitely happen on uniprocessor...
Yes. But on x86 a++ is atomic vs interrupts - it's a single instruction
and interrupts happen on instruction boundaries only.
Entire start_tx() is included below sig. Where it can race
against intr? I see no such place... do you?
In other words: patch is needed for SMP, but Andreas was bitten
by something else.
> Can you try the patch I just posted?
I can. The problem is, I cannot reproduce oops _without_ patch,
and this renders testing useless.
I will work on reproducing.
>
> Jeff
--
vda
static int start_tx(struct sk_buff *skb, struct net_device *dev)
{
struct netdev_private *np = dev->priv;
np->cur_tx_copy->skbuff = skb;
#define one_buffer
#define BPT 1022
#if defined(one_buffer)
np->cur_tx_copy->buffer = pci_map_single(np->pci_dev, skb->data,
skb->len, PCI_DMA_TODEVICE);
np->cur_tx_copy->control = TXIC | TXLD | TXFD | CRCEnable | PADEnable;
np->cur_tx_copy->control |= (skb->len << PKTSShift); /* pkt size */
np->cur_tx_copy->control |= (skb->len << TBSShift); /* buffer size */
// 89/12/29 add,
if (np->pci_dev->device == 0x891)
np->cur_tx_copy->control |= ETIControl | RetryTxLC;
np->cur_tx_copy->status = TXOWN;
np->cur_tx_copy = np->cur_tx_copy->next_desc_logical;
--np->free_tx_count;
#elif defined(two_buffer)
...dead code...
#endif
if (np->free_tx_count < 2)
netif_stop_queue(dev);
++np->really_tx_count;
writel(0, dev->base_addr + TXPDR);
dev->trans_start = jiffies;
return 0;
}
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-26 22:14 ` Denis Vlasenko
@ 2004-03-26 22:35 ` Francois Romieu
2004-03-27 0:03 ` Denis Vlasenko
[not found] ` <4064BB35.4050301@pobox.com>
0 siblings, 2 replies; 31+ messages in thread
From: Francois Romieu @ 2004-03-26 22:35 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Jeff Garzik, Andreas Henriksson, netdev
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> Yes. But on x86 a++ is atomic vs interrupts - it's a single instruction
> and interrupts happen on instruction boundaries only.
Do you realize that you are saying that the CPU can atomically increment an
integer which sits _in memory_ ?
Oops :o)
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-26 22:35 ` Francois Romieu
@ 2004-03-27 0:03 ` Denis Vlasenko
2004-03-27 0:30 ` Francois Romieu
[not found] ` <4064BB35.4050301@pobox.com>
1 sibling, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-27 0:03 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jeff Garzik, Andreas Henriksson, netdev
On Saturday 27 March 2004 00:35, Francois Romieu wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
> [...]
>
> > Yes. But on x86 a++ is atomic vs interrupts - it's a single instruction
> > and interrupts happen on instruction boundaries only.
>
> Do you realize that you are saying that the CPU can atomically increment an
> integer which sits _in memory_ ?
Not exactly. CPU does not atomically increment memory.
I am saying that x86 CPU can't be interrupted in the middle
of read-modify-write operation, like
incl (%eax)
because it is single machine instruction. Interrupt either
happens before it, and handler see 'old' value, or after it,
and handler see 'new' value. Interrupt cannot happen 'inside' it.
Some RISC processors lack such single-instruction RMW ops.
Bug can thrigger on those CPUs:
load (%r1),r2
inc v2
<-------- interrupt here. Ouch!
store v2,(%r1)
Not on x86.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-27 0:03 ` Denis Vlasenko
@ 2004-03-27 0:30 ` Francois Romieu
0 siblings, 0 replies; 31+ messages in thread
From: Francois Romieu @ 2004-03-27 0:30 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Jeff Garzik, Andreas Henriksson, netdev
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> Not exactly. CPU does not atomically increment memory.
> I am saying that x86 CPU can't be interrupted in the middle
> of read-modify-write operation, like
Yep, that's exactly what I tought was not possible. Complete brain outage.
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] Re: fealnx oopses
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
2004-03-26 20:05 ` Denis Vlasenko
@ 2004-03-27 2:13 ` Andreas Henriksson
1 sibling, 0 replies; 31+ messages in thread
From: Andreas Henriksson @ 2004-03-27 2:13 UTC (permalink / raw)
To: Jeff Garzik
Cc: Andreas Henriksson, Denis Vlasenko, netdev, Marcelo Tosatti,
Linux Kernel
On Fri, Mar 26, 2004 at 02:33:18PM -0500, Jeff Garzik wrote:
> Andreas Henriksson wrote:
> >On Fri, Mar 26, 2004 at 12:14:57PM +0200, Denis Vlasenko wrote:
> >
> ><snip>
> >
> >>BTW, my box is indeed slow and low on RAM, this fits.
> >>
> >
> >
> >I have only been looking at problems with races between the interrupt
> >handler and the rest of the driver code.. there might be a bunch of
> >problems with failed memory allocations that hasn't bitten me.
> >
> >
> >>Regarding your patch: I looked in start_tx(). Apart from latent
> >>bug in commented out part of code:
> >> next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
> >>which must be
> >> next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical;
> >>I can't see anything racy there. The function just submits more
> >>tx buffers for the card, it never touch cur_tx or cur_tx->skbuff...
> >
> >
> >Francois Romieu explains the race in a comment to the bug I opened in
> >the bugzilla.
> >
> >http://bugzilla.kernel.org/show_bug.cgi?id=1902#c1
> >
> >The problem is that really_tx_count and similar parts of the private
> >structure isn't atomically updated and both the interrupt handler and
> >the start_tx function updates them.
> >(they are regular integers instead of atomic_t)
> >
> >
> >>If I miss something and it indeed races with interrupt, you
> >>definitely need to add
> >> spin_lock(&np->lock);
> >>...
> >> spin_unlock(&np->lock);
> >>around interrupt handler body or at least around tx part
> >>of it, or else your patch is incomplete (race will still
> >>be possible on SMP).
> >>
> >
> >
> >I came to the conclusion that there should be a spinlock in the
> >interrupt handler yesterday, but it won't effect me at all since I don't
> >have SMP (nor preempt) so I'll leave it for now anyway.
> >
> >
> >>Anyway, I applied your patch and flooded with UDP again.
> >>My box did not oops. Unfortunately, it did not oops when
> >>I reverted back to old, presumably buggy driver. I cannot
> >>reproduce it anymore with old driver too! Bad. :(
> >
> >
> >I haven't been able to reproduce a kernel panic with my patch eighter.
> >And I've been transfering Terabytes of traffic during the last weeks (or
> >maybee it's months, well anyway.. I've done enough testing to say that
> >the card works "good enough" in this machine atleast).
> >And I've even tried your udp test..
> >
> >Although I now have the myson/fealnx card in my p3-900 (256Mb)
> >workstation instead of the old p-166 (40Mb) which served as a gateway
> >before.
> >It might just be that it's harder to trigger on newer/bigger machines.
> >Maybee I should power up my p-166 again.. I actually have 2 of these
> >cards so I can have one in each machine.. :)
>
> Well really, somebody needs to port Donald Becker's myson driver to 2.6
> APIs... I would like to get rid of fealnx, or somebody needs to spend a
> decent amount of time fixing it.
>
Ok, didn't know someone actually had a better driver.. I though "fix up
the current to not panic and wait for people to forget that the hardware
ever existed" was the proper "fix". ;)
Although I doubt I'm able to even get donalds driver to even compile
without the middle layer... I'm not anywhere near a kernel developer.
But I'd certainly love to try to get it working.. but don't hold your
breath. :)
>
> Does the attached patch fix the issue?
>
Probably... it includes the lock I've added... though it could take me
up to 3 weeks to get a kernel panic on my p-166 (although it could
happen twice a day as well as it did when I wrote the bugreport in the bugzilla).
> Jeff
>
>
I'll try your patch and if you don't hear from me it works "good
enough". Anyway... adding a lock or two will probably not make the
situation worse then it is today..
> ===== drivers/net/fealnx.c 1.34 vs edited =====
> --- 1.34/drivers/net/fealnx.c Sun Mar 14 01:54:58 2004
> +++ edited/drivers/net/fealnx.c Fri Mar 26 14:31:07 2004
> @@ -1303,14 +1303,15 @@
> /* for the last tx descriptor */
> np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
> np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
> -
> - return;
> }
>
>
> static int start_tx(struct sk_buff *skb, struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&np->lock, flags);
>
> np->cur_tx_copy->skbuff = skb;
>
> @@ -1377,6 +1378,7 @@
> writel(0, dev->base_addr + TXPDR);
> dev->trans_start = jiffies;
>
> + spin_unlock_irqrestore(&np->lock, flags);
> return 0;
> }
>
> @@ -1423,6 +1425,8 @@
> unsigned int num_tx = 0;
> int handled = 0;
>
> + spin_lock(&np->lock);
> +
> writel(0, dev->base_addr + IMR);
>
> ioaddr = dev->base_addr;
> @@ -1564,6 +1568,8 @@
> dev->name, readl(ioaddr + ISR));
>
> writel(np->imrvalue, ioaddr + IMR);
> +
> + spin_unlock(&np->lock);
>
> return IRQ_RETVAL(handled);
> }
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
[not found] ` <4064BB35.4050301@pobox.com>
@ 2004-03-27 21:28 ` Denis Vlasenko
2004-03-27 23:55 ` Francois Romieu
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-27 21:28 UTC (permalink / raw)
To: Jeff Garzik, Francois Romieu; +Cc: Andreas Henriksson, netdev
On Saturday 27 March 2004 01:22, Jeff Garzik wrote:
> Francois Romieu wrote:
> > Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
> > [...]
> >
> >>Yes. But on x86 a++ is atomic vs interrupts - it's a single instruction
> >>and interrupts happen on instruction boundaries only.
> >
> > Do you realize that you are saying that the CPU can atomically increment
> > an integer which sits _in memory_ ?
>
> Sure you can...
>
> But start_tx() has two variables free_tx_count and
> really_tx_count, and about six or seven places those are used.
>
> And this comment...
>
> > I can. The problem is, I cannot reproduce oops _without_ patch,
> > and this renders testing useless.
>
> Sounds like it fixes the problem, to me.
>
> Jeff
Okay I can reproduce it now. 16 Mb of RAM and underclock to 125 MHz
did the trick ;)
Sorry Jeff your patch does not fix my problem.
I also verified that it happens this way:
(1) rx skb is processed and freed
(2) allocation failure for new skb
(3) next rx intr occurs, cur_rx->skbuff still == NULL
(4) oops
I modified code as follows and I did see all these printk()s
in sequence:
} else {
if(np->cur_rx->skbuff == NULL)
(3) printk(KERN_ERR "vda: np->cur_rx->skbuff == NULL!\n"); //vda
(4) skb_put(skb = np->cur_rx->skbuff, pkt_len);
(1) np->cur_rx->skbuff = NULL;
if (np->really_rx_count == RX_RING_SIZE)
np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
netif_rx(skb);
dev->last_rx = jiffies;
np->stats.rx_packets++;
np->stats.rx_bytes += pkt_len;
}
if (np->cur_rx->skbuff == NULL) {
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
if (skb != NULL) {
skb->dev = dev; /* Mark as being used by this device. */
np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
np->cur_rx->skbuff = skb;
++np->really_rx_count;
} else { //vda
(2) printk(KERN_ERR "vda: low on mem, cannot allocate skb!\n");
}
}
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-27 21:28 ` Denis Vlasenko
@ 2004-03-27 23:55 ` Francois Romieu
2004-03-28 20:19 ` Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Francois Romieu @ 2004-03-27 23:55 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Jeff Garzik, Andreas Henriksson, netdev
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> Sorry Jeff your patch does not fix my problem.
> I also verified that it happens this way:
> (1) rx skb is processed and freed
> (2) allocation failure for new skb
> (3) next rx intr occurs, cur_rx->skbuff still == NULL
> (4) oops
Your previous analysis made sense: the logic responsible for low memory
handling is broken in several ways.
Patch against2.6.5-rc2-mm3 below, without warranty
--- drivers/net/fealnx.c 2004-03-28 00:44:27.000000000 +0100
+++ drivers/net/fealnx.c 2004-03-28 00:52:46.000000000 +0100
@@ -1134,15 +1134,17 @@ static void allocate_rx_buffers(struct n
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
- np->lack_rxbuf->skbuff = skb;
-
if (skb == NULL)
break; /* Better luck next round. */
+ while (np->lack_rxbuf->skbuff)
+ np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->skbuff = skb;
+
skb->dev = dev; /* Mark as being used by this device. */
np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->status = RXOWN;
++np->really_rx_count;
}
}
@@ -1380,33 +1382,22 @@ static int start_tx(struct sk_buff *skb,
return 0;
}
-
-void free_one_rx_descriptor(struct netdev_private *np)
-{
- if (np->really_rx_count == RX_RING_SIZE)
- np->cur_rx->status = RXOWN;
- else {
- np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
- np->lack_rxbuf->buffer = np->cur_rx->buffer;
- np->lack_rxbuf->status = RXOWN;
- ++np->really_rx_count;
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
- }
- np->cur_rx = np->cur_rx->next_desc_logical;
-}
-
-
void reset_rx_descriptors(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ struct fealnx_desc *cur = np->cur_rx;
+ int i;
stop_nic_rx(dev->base_addr, np->crvalue);
- while (!(np->cur_rx->status & RXOWN))
- free_one_rx_descriptor(np);
-
allocate_rx_buffers(dev);
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ if (cur->skbuff)
+ cur->status = RXOWN;
+ cur = cur->next_desc_logical;
+ }
+
writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
dev->base_addr + RXLBA);
writel(np->crvalue, dev->base_addr + TCRRCR);
@@ -1576,7 +1567,7 @@ static int netdev_rx(struct net_device *
struct netdev_private *np = dev->priv;
/* If EOP is set on the next entry, it's a new packet. Send it up. */
- while (!(np->cur_rx->status & RXOWN)) {
+ while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
if (np->really_rx_count == 0)
@@ -1628,8 +1619,15 @@ static int netdev_rx(struct net_device *
np->stats.rx_length_errors++;
/* free all rx descriptors related this long pkt */
- for (i = 0; i < desno; ++i)
- free_one_rx_descriptor(np);
+ for (i = 0; i < desno; ++i) {
+ if (!np->cur_rx->skbuff) {
+ printk(KERN_DEBUG
+ "%s: I'm scared\n", dev->name);
+ break;
+ }
+ np->cur_rx->status = RXOWN;
+ np->cur_rx = np->cur_rx->next_desc_logical;
+ }
continue;
} else { /* something error, need to reset this chip */
reset_rx_descriptors(dev);
@@ -1679,8 +1677,6 @@ static int netdev_rx(struct net_device *
PCI_DMA_FROMDEVICE);
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
- if (np->really_rx_count == RX_RING_SIZE)
- np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
@@ -1689,25 +1685,7 @@ static int netdev_rx(struct net_device *
np->stats.rx_packets++;
np->stats.rx_bytes += pkt_len;
}
-
- if (np->cur_rx->skbuff == NULL) {
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(np->rx_buf_sz);
-
- if (skb != NULL) {
- skb->dev = dev; /* Mark as being used by this device. */
- np->cur_rx->buffer = pci_map_single(np->pci_dev,
- skb->tail,
- np->rx_buf_sz,
- PCI_DMA_FROMDEVICE);
- np->cur_rx->skbuff = skb;
- ++np->really_rx_count;
- }
- }
-
- if (np->cur_rx->skbuff != NULL)
- free_one_rx_descriptor(np);
+ np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
/* allocate skb for rx buffers */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-27 23:55 ` Francois Romieu
@ 2004-03-28 20:19 ` Denis Vlasenko
2004-03-28 23:27 ` Andreas Henriksson
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-28 20:19 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jeff Garzik, Andreas Henriksson, netdev
[-- Attachment #1: Type: text/plain, Size: 3847 bytes --]
On Sunday 28 March 2004 01:55, Francois Romieu wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
> [...]
>
> > Sorry Jeff your patch does not fix my problem.
> > I also verified that it happens this way:
> > (1) rx skb is processed and freed
> > (2) allocation failure for new skb
> > (3) next rx intr occurs, cur_rx->skbuff still == NULL
> > (4) oops
>
> Your previous analysis made sense: the logic responsible for low memory
> handling is broken in several ways.
>
> Patch against2.6.5-rc2-mm3 below, without warranty
I tested on 2.4.25, patch applied with minimal manual correction
in last hunk (rediffed patch attached).
Patched driver oopses in allocate_rx_buffers() shortly after boot,
before I can login. It dies here (sorry, only objdump for now,
I am -EBADLYNEEDTOSLEEP. Can track it to src line tomorrow):
0000065c <allocate_rx_buffers>:
65c: 55 push %ebp
65d: 89 e5 mov %esp,%ebp
65f: 56 push %esi
660: 53 push %ebx
661: 8b 75 08 mov 0x8(%ebp),%esi
664: 8b 5e 6c mov 0x6c(%esi),%ebx
667: 83 bb 9c 00 00 00 0c cmpl $0xc,0x9c(%ebx)
66e: 0f 84 8a 00 00 00 je 6fe <allocate_rx_buffers+0xa2>
674: 6a 20 push $0x20
676: 8b 83 b0 00 00 00 mov 0xb0(%ebx),%eax
67c: 83 c0 10 add $0x10,%eax
67f: 50 push %eax
680: e8 fc ff ff ff call 681 <allocate_rx_buffers+0x25>
685: 85 c0 test %eax,%eax
687: 5a pop %edx
688: 59 pop %ecx
689: 74 73 je 6fe <allocate_rx_buffers+0xa2>
68b: 83 80 84 00 00 00 10 addl $0x10,0x84(%eax)
692: 83 80 88 00 00 00 10 addl $0x10,0x88(%eax)
699: 85 c0 test %eax,%eax
69b: 74 61 je 6fe <allocate_rx_buffers+0xa2>
69d: 8b 93 98 00 00 00 mov 0x98(%ebx),%edx
6a3: 8b 4a 14 mov 0x14(%edx),%ecx
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
%edx == 0 here
6a6: 85 c9 test %ecx,%ecx
6a8: 74 12 je 6bc <allocate_rx_buffers+0x60>
6aa: 89 f6 mov %esi,%esi
6ac: 8b 52 10 mov 0x10(%edx),%edx
6af: 89 93 98 00 00 00 mov %edx,0x98(%ebx)
6b5: 8b 4a 14 mov 0x14(%edx),%ecx
6b8: 85 c9 test %ecx,%ecx
6ba: 75 f0 jne 6ac <allocate_rx_buffers+0x50>
6bc: 89 42 14 mov %eax,0x14(%edx)
6bf: 89 70 18 mov %esi,0x18(%eax)
6c2: 8b 93 98 00 00 00 mov 0x98(%ebx),%edx
6c8: 8b 80 88 00 00 00 mov 0x88(%eax),%eax
6ce: f0 83 44 24 00 00 lock addl $0x0,0x0(%esp,1)
6d4: 05 00 00 00 40 add $0x40000000,%eax
6d9: 89 42 08 mov %eax,0x8(%edx)
6dc: 8b 83 98 00 00 00 mov 0x98(%ebx),%eax
6e2: c7 00 00 00 00 80 movl $0x80000000,(%eax)
6e8: 8b 83 9c 00 00 00 mov 0x9c(%ebx),%eax
6ee: 40 inc %eax
6ef: 83 f8 0c cmp $0xc,%eax
6f2: 89 83 9c 00 00 00 mov %eax,0x9c(%ebx)
6f8: 0f 85 76 ff ff ff jne 674 <allocate_rx_buffers+0x18>
6fe: 8d 65 f8 lea 0xfffffff8(%ebp),%esp
...
--
vda
[-- Attachment #2: fealnx_oom.24.patch --]
[-- Type: text/x-diff, Size: 3451 bytes --]
--- fealnx.c.orig Fri Nov 28 20:26:20 2003
+++ fealnx.c Sun Mar 28 21:32:56 2004
@@ -1134,15 +1134,17 @@
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
- np->lack_rxbuf->skbuff = skb;
-
if (skb == NULL)
break; /* Better luck next round. */
+ while (np->lack_rxbuf->skbuff)
+ np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->skbuff = skb;
+
skb->dev = dev; /* Mark as being used by this device. */
np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->status = RXOWN;
++np->really_rx_count;
}
}
@@ -1380,33 +1382,22 @@
return 0;
}
-
-void free_one_rx_descriptor(struct netdev_private *np)
-{
- if (np->really_rx_count == RX_RING_SIZE)
- np->cur_rx->status = RXOWN;
- else {
- np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
- np->lack_rxbuf->buffer = np->cur_rx->buffer;
- np->lack_rxbuf->status = RXOWN;
- ++np->really_rx_count;
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
- }
- np->cur_rx = np->cur_rx->next_desc_logical;
-}
-
-
void reset_rx_descriptors(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ struct fealnx_desc *cur = np->cur_rx;
+ int i;
stop_nic_rx(dev->base_addr, np->crvalue);
- while (!(np->cur_rx->status & RXOWN))
- free_one_rx_descriptor(np);
-
allocate_rx_buffers(dev);
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ if (cur->skbuff)
+ cur->status = RXOWN;
+ cur = cur->next_desc_logical;
+ }
+
writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
dev->base_addr + RXLBA);
writel(np->crvalue, dev->base_addr + TCRRCR);
@@ -1576,7 +1567,7 @@
struct netdev_private *np = dev->priv;
/* If EOP is set on the next entry, it's a new packet. Send it up. */
- while (!(np->cur_rx->status & RXOWN)) {
+ while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
if (np->really_rx_count == 0)
@@ -1628,8 +1619,15 @@
np->stats.rx_length_errors++;
/* free all rx descriptors related this long pkt */
- for (i = 0; i < desno; ++i)
- free_one_rx_descriptor(np);
+ for (i = 0; i < desno; ++i) {
+ if (!np->cur_rx->skbuff) {
+ printk(KERN_DEBUG
+ "%s: I'm scared\n", dev->name);
+ break;
+ }
+ np->cur_rx->status = RXOWN;
+ np->cur_rx = np->cur_rx->next_desc_logical;
+ }
continue;
} else { /* something error, need to reset this chip */
reset_rx_descriptors(dev);
@@ -1671,8 +1669,6 @@
} else {
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
- if (np->really_rx_count == RX_RING_SIZE)
- np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
@@ -1682,22 +1678,7 @@
np->stats.rx_bytes += pkt_len;
}
- if (np->cur_rx->skbuff == NULL) {
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(np->rx_buf_sz);
-
- if (skb != NULL) {
- skb->dev = dev; /* Mark as being used by this device. */
- np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
- np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->cur_rx->skbuff = skb;
- ++np->really_rx_count;
- }
- }
-
- if (np->cur_rx->skbuff != NULL)
- free_one_rx_descriptor(np);
+ np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
/* allocate skb for rx buffers */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-28 20:19 ` Denis Vlasenko
@ 2004-03-28 23:27 ` Andreas Henriksson
2004-03-28 23:38 ` Francois Romieu
2004-03-29 7:52 ` fealnx oopses Denis Vlasenko
0 siblings, 2 replies; 31+ messages in thread
From: Andreas Henriksson @ 2004-03-28 23:27 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Francois Romieu, Jeff Garzik, Andreas Henriksson, netdev
On Sun, Mar 28, 2004 at 10:19:56PM +0200, Denis Vlasenko wrote:
> I tested on 2.4.25, patch applied with minimal manual correction
> in last hunk (rediffed patch attached).
I tested on 2.6.5-rc2-bk7 ... with both jeffs (which was exactly the
same patch I had except for removing the return at the end of the void
function) and francis patches...
Btw. I haven't seen any panics at all because of rx, just tx, and that
problem seems solved. But I thought testing additional patches that
is supposed to cure other problems won't hurt me.. ;)
>
> Patched driver oopses in allocate_rx_buffers() shortly after boot,
It oops as soon as the first packet is sent out....
> --- fealnx.c.orig Fri Nov 28 20:26:20 2003
> +++ fealnx.c Sun Mar 28 21:32:56 2004
> @@ -1134,15 +1134,17 @@
> struct sk_buff *skb;
>
> skb = dev_alloc_skb(np->rx_buf_sz);
> - np->lack_rxbuf->skbuff = skb;
> -
> if (skb == NULL)
> break; /* Better luck next round. */
>
np->lack_rxbuf == NULL here....
(verified by inserting a "BUG_ON(np->lack_rxbuf==NULL);"...)
> + while (np->lack_rxbuf->skbuff)
and now we are dead.. ;(
> + np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> + np->lack_rxbuf->skbuff = skb;
> +
> skb->dev = dev; /* Mark as being used by this device. */
> np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
> np->rx_buf_sz, PCI_DMA_FROMDEVICE);
> - np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> + np->lack_rxbuf->status = RXOWN;
> ++np->really_rx_count;
> }
> }
> @@ -1380,33 +1382,22 @@
> return 0;
> }
>
> -
> -void free_one_rx_descriptor(struct netdev_private *np)
> -{
> - if (np->really_rx_count == RX_RING_SIZE)
> - np->cur_rx->status = RXOWN;
> - else {
> - np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
> - np->lack_rxbuf->buffer = np->cur_rx->buffer;
> - np->lack_rxbuf->status = RXOWN;
> - ++np->really_rx_count;
> - np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> - }
> - np->cur_rx = np->cur_rx->next_desc_logical;
> -}
> -
> -
> void reset_rx_descriptors(struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
> + struct fealnx_desc *cur = np->cur_rx;
> + int i;
>
> stop_nic_rx(dev->base_addr, np->crvalue);
>
> - while (!(np->cur_rx->status & RXOWN))
> - free_one_rx_descriptor(np);
> -
> allocate_rx_buffers(dev);
>
> + for (i = 0; i < RX_RING_SIZE; i++) {
> + if (cur->skbuff)
> + cur->status = RXOWN;
> + cur = cur->next_desc_logical;
> + }
> +
> writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
> dev->base_addr + RXLBA);
> writel(np->crvalue, dev->base_addr + TCRRCR);
> @@ -1576,7 +1567,7 @@
> struct netdev_private *np = dev->priv;
>
> /* If EOP is set on the next entry, it's a new packet. Send it up. */
> - while (!(np->cur_rx->status & RXOWN)) {
> + while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
> s32 rx_status = np->cur_rx->status;
>
> if (np->really_rx_count == 0)
> @@ -1628,8 +1619,15 @@
> np->stats.rx_length_errors++;
>
> /* free all rx descriptors related this long pkt */
> - for (i = 0; i < desno; ++i)
> - free_one_rx_descriptor(np);
> + for (i = 0; i < desno; ++i) {
> + if (!np->cur_rx->skbuff) {
> + printk(KERN_DEBUG
> + "%s: I'm scared\n", dev->name);
> + break;
> + }
> + np->cur_rx->status = RXOWN;
> + np->cur_rx = np->cur_rx->next_desc_logical;
> + }
> continue;
> } else { /* something error, need to reset this chip */
> reset_rx_descriptors(dev);
> @@ -1671,8 +1669,6 @@
> } else {
> skb_put(skb = np->cur_rx->skbuff, pkt_len);
> np->cur_rx->skbuff = NULL;
> - if (np->really_rx_count == RX_RING_SIZE)
> - np->lack_rxbuf = np->cur_rx;
> --np->really_rx_count;
> }
> skb->protocol = eth_type_trans(skb, dev);
> @@ -1682,22 +1678,7 @@
> np->stats.rx_bytes += pkt_len;
> }
>
> - if (np->cur_rx->skbuff == NULL) {
> - struct sk_buff *skb;
> -
> - skb = dev_alloc_skb(np->rx_buf_sz);
> -
> - if (skb != NULL) {
> - skb->dev = dev; /* Mark as being used by this device. */
> - np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
> - np->rx_buf_sz, PCI_DMA_FROMDEVICE);
> - np->cur_rx->skbuff = skb;
> - ++np->really_rx_count;
> - }
> - }
> -
> - if (np->cur_rx->skbuff != NULL)
> - free_one_rx_descriptor(np);
> + np->cur_rx = np->cur_rx->next_desc_logical;
> } /* end of while loop */
>
> /* allocate skb for rx buffers */
Regards,
Andreas Henriksson
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-28 23:27 ` Andreas Henriksson
@ 2004-03-28 23:38 ` Francois Romieu
2004-03-29 17:01 ` Andreas Henriksson
2004-03-29 7:52 ` fealnx oopses Denis Vlasenko
1 sibling, 1 reply; 31+ messages in thread
From: Francois Romieu @ 2004-03-28 23:38 UTC (permalink / raw)
To: Andreas Henriksson; +Cc: Denis Vlasenko, Jeff Garzik, netdev
Andreas Henriksson <andreas@fjortis.info> :
[...]
> > --- fealnx.c.orig Fri Nov 28 20:26:20 2003
> > +++ fealnx.c Sun Mar 28 21:32:56 2004
> > @@ -1134,15 +1134,17 @@
> > struct sk_buff *skb;
> >
> > skb = dev_alloc_skb(np->rx_buf_sz);
> > - np->lack_rxbuf->skbuff = skb;
> > -
> > if (skb == NULL)
> > break; /* Better luck next round. */
> >
>
> np->lack_rxbuf == NULL here....
>
> (verified by inserting a "BUG_ON(np->lack_rxbuf==NULL);"...)
Oops, I forgot to initialize np->lack_rxbuf correctly.
If you have some time to spend, you can change in netdev_rx:
pci_unmap_single(np->pci_dev,
np->cur_rx->buffer,
np->rx_buf_sz,
PCI_DMA_FROMDEVICE);
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
--np->really_rx_count;
into:
pci_unmap_single(np->pci_dev,
np->cur_rx->buffer,
np->rx_buf_sz,
PCI_DMA_FROMDEVICE);
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
if (!np->lack_rxbuf) <<<
np->lack_rxbuf = np->cur_rx; <<<
np->cur_rx->skbuff = NULL;
--np->really_rx_count;
It may be simpler/safer to turn (init_ring):
np->lack_rxbuf = NULL;
into
np->lack_rxbuf = np->rx_ring;
I'll check the whole thing tomorrow. It's time to sleep now.
Thanks for your report.
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-28 23:27 ` Andreas Henriksson
2004-03-28 23:38 ` Francois Romieu
@ 2004-03-29 7:52 ` Denis Vlasenko
1 sibling, 0 replies; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-29 7:52 UTC (permalink / raw)
To: Andreas Henriksson
Cc: Francois Romieu, Jeff Garzik, Andreas Henriksson, netdev
On Monday 29 March 2004 01:27, Andreas Henriksson wrote:
> On Sun, Mar 28, 2004 at 10:19:56PM +0200, Denis Vlasenko wrote:
> > I tested on 2.4.25, patch applied with minimal manual correction
> > in last hunk (rediffed patch attached).
>
> I tested on 2.6.5-rc2-bk7 ... with both jeffs (which was exactly the
> same patch I had except for removing the return at the end of the void
> function) and francis patches...
>
> Btw. I haven't seen any panics at all because of rx, just tx, and that
> problem seems solved. But I thought testing additional patches that
> is supposed to cure other problems won't hurt me.. ;)
Thank you, you verified that _I_ did not caused it.
(I could, I was a bit tired at the time)
You need to make your box very RAM starved for this.
I boot with mem=16M and load all iptables NAT/conntrack
modules to make network processing more CPU/RAM intensive.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-28 23:38 ` Francois Romieu
@ 2004-03-29 17:01 ` Andreas Henriksson
2004-03-29 21:49 ` Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Andreas Henriksson @ 2004-03-29 17:01 UTC (permalink / raw)
To: Francois Romieu; +Cc: Andreas Henriksson, Denis Vlasenko, Jeff Garzik, netdev
On Mon, Mar 29, 2004 at 01:38:34AM +0200, Francois Romieu wrote:
>
> Oops, I forgot to initialize np->lack_rxbuf correctly.
>
> If you have some time to spend, you can change in netdev_rx:
> pci_unmap_single(np->pci_dev,
> np->cur_rx->buffer,
> np->rx_buf_sz,
> PCI_DMA_FROMDEVICE);
> skb_put(skb = np->cur_rx->skbuff, pkt_len);
> np->cur_rx->skbuff = NULL;
> --np->really_rx_count;
> into:
> pci_unmap_single(np->pci_dev,
> np->cur_rx->buffer,
> np->rx_buf_sz,
> PCI_DMA_FROMDEVICE);
> skb_put(skb = np->cur_rx->skbuff, pkt_len);
> np->cur_rx->skbuff = NULL;
> if (!np->lack_rxbuf) <<<
> np->lack_rxbuf = np->cur_rx; <<<
> np->cur_rx->skbuff = NULL;
> --np->really_rx_count;
>
I tried this one:
> It may be simpler/safer to turn (init_ring):
> np->lack_rxbuf = NULL;
> into
> np->lack_rxbuf = np->rx_ring;
>
> I'll check the whole thing tomorrow. It's time to sleep now.
>
And everything seems to work better then ever before.. :)
I transfered 2 GB of data over my lan and didn't even get a single "transmit
timed out" or anything.
I've attached a patch with is what I'm using at the moment that includes
both Jeff's and Francois changes below if anyone else wants to test it...
(Also available at
http://fjortis.info/pub/panic/fealnx/fealnx-jgarzik-fromieu.patch ... )
> Thanks for your report.
Thanks for your patch and useful suggestions.. :)
>
> --
> Ueimor
Regards,
Andreas Henriksson
--- org-fealnx.c 2004-03-28 18:30:37.000000000 +0200
+++ fealnx.c 2004-03-29 18:27:20.000000000 +0200
@@ -1134,15 +1134,17 @@ static void allocate_rx_buffers(struct n
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
- np->lack_rxbuf->skbuff = skb;
-
if (skb == NULL)
break; /* Better luck next round. */
+ while (np->lack_rxbuf->skbuff)
+ np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->skbuff = skb;
+
skb->dev = dev; /* Mark as being used by this device. */
np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->status = RXOWN;
++np->really_rx_count;
}
}
@@ -1251,7 +1253,7 @@ static void init_ring(struct net_device
/* initialize rx variables */
np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
np->cur_rx = &np->rx_ring[0];
- np->lack_rxbuf = NULL;
+ np->lack_rxbuf = np->rx_ring;
np->really_rx_count = 0;
/* initial rx descriptors. */
@@ -1303,14 +1305,15 @@ static void init_ring(struct net_device
/* for the last tx descriptor */
np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
-
- return;
}
static int start_tx(struct sk_buff *skb, struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ unsigned long flags;
+
+ spin_lock_irqsave(&np->lock, flags);
np->cur_tx_copy->skbuff = skb;
@@ -1340,7 +1343,7 @@ static int start_tx(struct sk_buff *skb,
np->cur_tx_copy->control |= (BPT << TBSShift); /* buffer size */
/* for the last descriptor */
- next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
+ next = np->cur_tx_copy.next_desc_logical;
next->skbuff = skb;
next->control = TXIC | TXLD | CRCEnable | PADEnable;
next->control |= (skb->len << PKTSShift); /* pkt size */
@@ -1377,36 +1380,26 @@ static int start_tx(struct sk_buff *skb,
writel(0, dev->base_addr + TXPDR);
dev->trans_start = jiffies;
+ spin_unlock_irqrestore(&np->lock, flags);
return 0;
}
-
-void free_one_rx_descriptor(struct netdev_private *np)
-{
- if (np->really_rx_count == RX_RING_SIZE)
- np->cur_rx->status = RXOWN;
- else {
- np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
- np->lack_rxbuf->buffer = np->cur_rx->buffer;
- np->lack_rxbuf->status = RXOWN;
- ++np->really_rx_count;
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
- }
- np->cur_rx = np->cur_rx->next_desc_logical;
-}
-
-
void reset_rx_descriptors(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ struct fealnx_desc *cur = np->cur_rx;
+ int i;
stop_nic_rx(dev->base_addr, np->crvalue);
- while (!(np->cur_rx->status & RXOWN))
- free_one_rx_descriptor(np);
-
allocate_rx_buffers(dev);
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ if (cur->skbuff)
+ cur->status = RXOWN;
+ cur = cur->next_desc_logical;
+ }
+
writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
dev->base_addr + RXLBA);
writel(np->crvalue, dev->base_addr + TCRRCR);
@@ -1423,6 +1416,8 @@ static irqreturn_t intr_handler(int irq,
unsigned int num_tx = 0;
int handled = 0;
+ spin_lock(&np->lock);
+
writel(0, dev->base_addr + IMR);
ioaddr = dev->base_addr;
@@ -1565,6 +1560,8 @@ static irqreturn_t intr_handler(int irq,
writel(np->imrvalue, ioaddr + IMR);
+ spin_unlock(&np->lock);
+
return IRQ_RETVAL(handled);
}
@@ -1576,7 +1573,7 @@ static int netdev_rx(struct net_device *
struct netdev_private *np = dev->priv;
/* If EOP is set on the next entry, it's a new packet. Send it up. */
- while (!(np->cur_rx->status & RXOWN)) {
+ while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
if (np->really_rx_count == 0)
@@ -1628,8 +1625,15 @@ static int netdev_rx(struct net_device *
np->stats.rx_length_errors++;
/* free all rx descriptors related this long pkt */
- for (i = 0; i < desno; ++i)
- free_one_rx_descriptor(np);
+ for (i = 0; i < desno; ++i) {
+ if (!np->cur_rx->skbuff) {
+ printk(KERN_DEBUG
+ "%s: I'm scared\n", dev->name);
+ break;
+ }
+ np->cur_rx->status = RXOWN;
+ np->cur_rx = np->cur_rx->next_desc_logical;
+ }
continue;
} else { /* something error, need to reset this chip */
reset_rx_descriptors(dev);
@@ -1679,8 +1683,6 @@ static int netdev_rx(struct net_device *
PCI_DMA_FROMDEVICE);
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
- if (np->really_rx_count == RX_RING_SIZE)
- np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
@@ -1689,25 +1691,7 @@ static int netdev_rx(struct net_device *
np->stats.rx_packets++;
np->stats.rx_bytes += pkt_len;
}
-
- if (np->cur_rx->skbuff == NULL) {
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(np->rx_buf_sz);
-
- if (skb != NULL) {
- skb->dev = dev; /* Mark as being used by this device. */
- np->cur_rx->buffer = pci_map_single(np->pci_dev,
- skb->tail,
- np->rx_buf_sz,
- PCI_DMA_FROMDEVICE);
- np->cur_rx->skbuff = skb;
- ++np->really_rx_count;
- }
- }
-
- if (np->cur_rx->skbuff != NULL)
- free_one_rx_descriptor(np);
+ np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
/* allocate skb for rx buffers */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-29 17:01 ` Andreas Henriksson
@ 2004-03-29 21:49 ` Denis Vlasenko
2004-03-29 22:20 ` Francois Romieu
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-29 21:49 UTC (permalink / raw)
To: Andreas Henriksson, Francois Romieu
Cc: Andreas Henriksson, Jeff Garzik, netdev, Denis Vlasenko
[CCed myself to have nice thread view in Kmail]
> I tried this one:
> > It may be simpler/safer to turn (init_ring):
> > np->lack_rxbuf = NULL;
> > into
> > np->lack_rxbuf = np->rx_ring;
> >
> > I'll check the whole thing tomorrow. It's time to sleep now.
>
> And everything seems to work better then ever before.. :)
>
> I transfered 2 GB of data over my lan and didn't even get a single
> "transmit timed out" or anything.
I tested this. Does not oops. Drowns in 'too much work in interrupt'
and 0-order alloc failures instead. I cannot ctrl-C my netcat
in order to stop UDP flood. I shall be able to do it.
I revieved patched driver code a bit and have a couple of questions.
I'll post code chunks and then questions below. Code contains
my debugging additions, all flagged with '//vda' comments.
I did not come up with a new patch because I don't feel
stupid (brave?) enough to decide what to do with rx
when there is no buffers. I don't fully understand how
this stuff interacts with PCI bus and card's hardware
(how does rx packet data end up in RAM).
struct fealnx_desc *cur_rx;
struct fealnx_desc *lack_rxbuf;
please, lack_rx without 'buf'. Sometimes I grep for
'_rx->something'.
np->cur_rx = &np->rx_ring[0];
np->lack_rxbuf = np->rx_ring;
same value, differently expressed.
/* for the last descriptor */
next = np->cur_tx_copy.next_desc_logical;
this code is #ifdef'ed out, but this ^^^ is still a bug.
shall be '->'
Now, more serious stuff:
in intr_handler():
if (--boguscnt < 0) {
printk(KERN_WARNING "%s: Too much work at interrupt, "
"status=0x%4.4x.\n", dev->name, intr_status);
break;
}
Shall we do something with this condition?
What if card is simply go mad? Maybe card reset?
static int netdev_rx(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
if( ! (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) ) { //vda:
printk(KERN_ERR "netdev_rx(): nothing to do?! (np->cur_rx->status & RXOWN) == 0x%04x, np->cur_rx->skbuff == %p\n"
,(np->cur_rx->status & RXOWN)
,np->cur_rx->skbuff
);
}
I added this. If we trigger this, netdev_rx won't enter
while() loop and will do essentially nothing
except for trying to allocate_rx_buffers(dev).
I have a suspicion that device will trigger intr
again _for the same packet_.
Do we need some code to tell device to drop this packet?
I did trigger this right before 'too much work'
(RXOWN was set, ->skbuff was not NULL).
What does it mean? Card received a packet but _not_
into this buffer? How card decides into which buffer
to receive? Shall we check them all?
if (np->really_rx_count == 0) { //vda:
printk(KERN_ERR "netdev_rx(): np->really_rx_count is 0 before while()\n");
}
did not trigger this, but if it happens, above comment
apply. There were no free rx buffers, yet we are in rx intr.
Time to cry out loud!
/* If EOP is set on the next entry, it's a new packet. Send it up. */
while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
...
further in netdev_rx():
if (pkt_len < rx_copybreak &&
(skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
skb->dev = dev;
skb_reserve(skb, 2); /* 16 byte align the IP header */
16 *bit* align maybe?
/* Call copy + cksum if available. */
#if ! defined(__alpha__)
eth_copy_and_sum(skb,
np->cur_rx->skbuff->tail, pkt_len, 0);
skb_put(skb, pkt_len);
#else
memcpy(skb_put(skb, pkt_len),
np->cur_rx->skbuff->tail, pkt_len);
#endif
} else {
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
netif_rx(skb);
dev->last_rx = jiffies;
np->stats.rx_packets++;
np->stats.rx_bytes += pkt_len;
}
np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
if(pkt_len < rx_copybreak...) path is taken, skbuff is still usable
for next rx, no? Then why np->cur_rx = np->cur_rx->next_desc_logical?
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-29 21:49 ` Denis Vlasenko
@ 2004-03-29 22:20 ` Francois Romieu
2004-03-29 22:50 ` Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Francois Romieu @ 2004-03-29 22:20 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Andreas Henriksson, Jeff Garzik, netdev
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> in intr_handler():
> if (--boguscnt < 0) {
> printk(KERN_WARNING "%s: Too much work at interrupt, "
> "status=0x%4.4x.\n", dev->name, intr_status);
> break;
> }
> Shall we do something with this condition?
> What if card is simply go mad? Maybe card reset?
1 - Yes.
2 - disable the offending interruption/NAPI (reset is not needed)
[...]
> static int netdev_rx(struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
>
> if( ! (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) ) { //vda:
> printk(KERN_ERR "netdev_rx(): nothing to do?! (np->cur_rx->status & RXOWN) == 0x%04x, np->cur_rx->skbuff == %p\n"
> ,(np->cur_rx->status & RXOWN)
> ,np->cur_rx->skbuff
> );
> }
> I added this. If we trigger this, netdev_rx won't enter
> while() loop and will do essentially nothing
> except for trying to allocate_rx_buffers(dev).
It is supposed to mean that there is an unallocated buffer in the ring and
that the driver has simply wrapped to the point where it met it again.
So there is only one thing to do: try to allocate.
[...]
> I did trigger this right before 'too much work'
> (RXOWN was set, ->skbuff was not NULL).
> What does it mean? Card received a packet but _not_
> into this buffer? How card decides into which buffer
> to receive? Shall we check them all?
It probably means that several packets were processed during a previous
interruption so when this interruption is triggered, there's nothing to
do.
[...]
> further in netdev_rx():
> if (pkt_len < rx_copybreak &&
> (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
> skb->dev = dev;
> skb_reserve(skb, 2); /* 16 byte align the IP header */
> 16 *bit* align maybe?
Classical typo. Don't change it :o)
[...]
> np->cur_rx = np->cur_rx->next_desc_logical;
> } /* end of while loop */
> if(pkt_len < rx_copybreak...) path is taken, skbuff is still usable
> for next rx, no? Then why np->cur_rx = np->cur_rx->next_desc_logical?
Not for the next Rx: the whole ring will have to be processed first. The
sole difference when copybreak does not apply is that an allocation should
be performed for the relevant descriptor. The descriptor are set up in a
circular list and the asic walks this list. So whatever happens, the driver
must consider the next descriptor as current for the upcoming interruption.
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-29 22:20 ` Francois Romieu
@ 2004-03-29 22:50 ` Denis Vlasenko
2004-03-29 23:16 ` Denis Vlasenko
[not found] ` <4068AC87.2030506@pobox.com>
0 siblings, 2 replies; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-29 22:50 UTC (permalink / raw)
To: Francois Romieu; +Cc: Andreas Henriksson, Jeff Garzik, netdev, Denis Vlasenko
I think bug can be considered fixed if I can start
netcat UDP flood, wait however long I want, then press
ctrl-C and get my bash prompt back. Local netcat
closes socket and exits, remote netcat gets its
icmp 'port unreachable' and exits too. Everybody's
happy.
Oopses are gone but it looks like box is so much interrupt
flooded that userspace has no chance of processing ctrl-C.
What can we do? I think driver can do something useful
whet it detects 'too much work in interrupt'. Disabling rx
for several ms seems like 'quick and dirty' way.
Francois what do you think? Can you code something up
for me to test?
On Tuesday 30 March 2004 00:20, Francois Romieu wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua>:
> [...]
>
> > in intr_handler():
> > if (--boguscnt < 0) {
> > printk(KERN_WARNING "%s: Too much work at
> > interrupt, " "status=0x%4.4x.\n", dev->name, intr_status); break;
> > }
> > Shall we do something with this condition?
> > What if card is simply go mad? Maybe card reset?
>
> 1 - Yes.
> 2 - disable the offending interruption/NAPI (reset is not needed)
Imagine that hardware got stuck with intr constantly asserted.
Reset can cure that. In any event, it might give us a needed
pause of several ms, just what I want.
If you worry about lost packets, that's not a concern -
if we reached this, we are dropping tons of them already.
> [...]
>
> > static int netdev_rx(struct net_device *dev)
> > {
> > struct netdev_private *np = dev->priv;
> >
> > if( ! (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) ) {
> > //vda: printk(KERN_ERR "netdev_rx(): nothing to do?! (np->cur_rx->status
> > & RXOWN) == 0x%04x, np->cur_rx->skbuff == %p\n" ,(np->cur_rx->status &
> > RXOWN)
> > ,np->cur_rx->skbuff
> > );
> > }
> > I added this. If we trigger this, netdev_rx won't enter
> > while() loop and will do essentially nothing
> > except for trying to allocate_rx_buffers(dev).
>
> It is supposed to mean that there is an unallocated buffer in the ring and
> that the driver has simply wrapped to the point where it met it again.
> So there is only one thing to do: try to allocate.
Hm, but why we got rx intr at all? Card couldn't receive packet into
non-allocated buffer, right?
> [...]
>
> > I did trigger this right before 'too much work'
> > (RXOWN was set, ->skbuff was not NULL).
> > What does it mean? Card received a packet but _not_
> > into this buffer? How card decides into which buffer
> > to receive? Shall we check them all?
>
> It probably means that several packets were processed during a previous
> interruption so when this interruption is triggered, there's nothing to
> do.
Aha, card didn't know that and prods CPU again. I got it.
> [...]
>
> > np->cur_rx = np->cur_rx->next_desc_logical;
> > } /* end of while loop */
> > if(pkt_len < rx_copybreak...) path is taken, skbuff is still usable
> > for next rx, no? Then why np->cur_rx = np->cur_rx->next_desc_logical?
>
> Not for the next Rx: the whole ring will have to be processed first. The
> sole difference when copybreak does not apply is that an allocation should
> be performed for the relevant descriptor. The descriptor are set up in a
> circular list and the asic walks this list. So whatever happens, the driver
> must consider the next descriptor as current for the upcoming interruption.
/me feels enlightened
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-29 22:50 ` Denis Vlasenko
@ 2004-03-29 23:16 ` Denis Vlasenko
2004-03-31 21:01 ` Francois Romieu
[not found] ` <4068AC87.2030506@pobox.com>
1 sibling, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-29 23:16 UTC (permalink / raw)
To: Francois Romieu; +Cc: Andreas Henriksson, Jeff Garzik, netdev, Denis Vlasenko
On Tuesday 30 March 2004 00:50, Denis Vlasenko wrote:
> I think bug can be considered fixed if I can start
> netcat UDP flood, wait however long I want, then press
> ctrl-C and get my bash prompt back. Local netcat
> closes socket and exits, remote netcat gets its
> icmp 'port unreachable' and exits too. Everybody's
> happy.
>
> Oopses are gone but it looks like box is so much interrupt
> flooded that userspace has no chance of processing ctrl-C.
> What can we do? I think driver can do something useful
> whet it detects 'too much work in interrupt'. Disabling rx
> for several ms seems like 'quick and dirty' way.
I just verified that even if I stop remote netcat,
box does not recover. Console fills with
"Too much work at interrupt, status=0x0020"
0x0020 is RBU = 0x00000020, /* receive buffer unavailable */
> Francois what do you think? Can you code something up
> for me to test?
>
> On Tuesday 30 March 2004 00:20, Francois Romieu wrote:
> > Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua>:
> > [...]
> >
> > > in intr_handler():
> > > if (--boguscnt < 0) {
> > > printk(KERN_WARNING "%s: Too much work at
> > > interrupt, " "status=0x%4.4x.\n", dev->name, intr_status); break;
> > > }
> > > Shall we do something with this condition?
> > > What if card is simply go mad? Maybe card reset?
> >
> > 1 - Yes.
> > 2 - disable the offending interruption/NAPI (reset is not needed)
>
> Imagine that hardware got stuck with intr constantly asserted.
> Reset can cure that. In any event, it might give us a needed
> pause of several ms, just what I want.
>
> If you worry about lost packets, that's not a concern -
> if we reached this, we are dropping tons of them already.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
[not found] ` <4068AC87.2030506@pobox.com>
@ 2004-03-29 23:18 ` Denis Vlasenko
2004-03-31 16:39 ` fealnx oopses (with [PATCH]) Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-29 23:18 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Francois Romieu, Andreas Henriksson, netdev, Denis Vlasenko
On Tuesday 30 March 2004 01:08, Jeff Garzik wrote:
> Denis Vlasenko wrote:
> > I think bug can be considered fixed if I can start
> > netcat UDP flood, wait however long I want, then press
> > ctrl-C and get my bash prompt back. Local netcat
> > closes socket and exits, remote netcat gets its
> > icmp 'port unreachable' and exits too. Everybody's
> > happy.
> >
> > Oopses are gone but it looks like box is so much interrupt
> > flooded that userspace has no chance of processing ctrl-C.
> > What can we do? I think driver can do something useful
> > whet it detects 'too much work in interrupt'. Disabling rx
> > for several ms seems like 'quick and dirty' way.
> >
> > Francois what do you think? Can you code something up
> > for me to test?
>
> Andreas had a francois+jgarzik patch, I thought...?
>
> Looking good...
Yes, I did all these tests with this patch.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-29 23:18 ` Denis Vlasenko
@ 2004-03-31 16:39 ` Denis Vlasenko
2004-03-31 19:24 ` Andreas Henriksson
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-31 16:39 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Francois Romieu, Andreas Henriksson, netdev, Denis Vlasenko
[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]
> > > Oopses are gone but it looks like box is so much interrupt
> > > flooded that userspace has no chance of processing ctrl-C.
> > > What can we do? I think driver can do something useful
> > > whet it detects 'too much work in interrupt'. Disabling rx
> > > for several ms seems like 'quick and dirty' way.
> > >
> > > Francois what do you think? Can you code something up
> > > for me to test?
> >
> > Andreas had a francois+jgarzik patch, I thought...?
> >
> > Looking good...
>
> Yes, I did all these tests with this patch.
Ok, here is what I have now.
What we had before:
original behaviour: Andreas had tx tomeouts, I had fatal oops.
francois+jgarzik patch: Andreas happy, I had lockup (endless 'Too much work')
Now, with the attached patch, I *don't* lock up. I can ctrl-C
netcat, UDP flood stops and card is alive, tested with pinging.
I modified 'Too much work in interrupt' code.
I added code which completely stops rx and tx and schedules
card reset a-la reset previously used in tx_timeout code path.
There is 1 second delay.
In testing, I saw this code triggering a number of times.
It works as intended.
I also have seen tx timeout events. (I touched that code a bit,
factoring out reset code, but without changing logic). They did
not hang the box, either, but box was unable to do any tx.
tx timeouts just happened again and again.
Remote side decided that we are gone and stopped UDP flood,
starting ARP address resolution instead, but test box could
not answer even that.
In this state I can ctrl-C local netcat and box recovers
after several seconds. Tx is working again, I can ping other
hosts, etc.
I conclude that tx timeout logic does not handle situation
when *local* process generates and submits tons of packets
at enormous rate.
Now, the patch itself. It is against 2.4.25.
Sorry, it contains some debugging stuff and some future work
(e.g. #defines). I decided to not remove it now to avoid
brown paper bugs. (I can remove and retest, but testing can
take another couple of hours...)
I can code tx timeout similarly to 'Too much work', but I
retained it 'as is' in hopes of your comments on both
code paths.
--
vda
[-- Attachment #2: fealnx-jgarzik-fromieu-vda.24.patch --]
[-- Type: text/x-diff, Size: 14732 bytes --]
--- fealnx.c.original Fri Nov 28 20:26:20 2003
+++ fealnx.c Wed Mar 31 17:26:57 2004
@@ -210,6 +210,26 @@
BPREMRPSR = 0x5c, /* bypass & receive error mask and phy status */
};
+//vda: TCRRCR values collected from code: W for writing, R for reading
+#define CR_W_ENH 0x02000000 /* enhanced mode */
+#define CR_W_FD 0x00100000 /* full duplex */
+#define CR_W_PS10 0x00080000 /* 10 mbit */
+#define CR_W_TXEN 0x00040000 /* tx enable */
+#define CR_W_PS1000 0x00010000 /* 1000 mbit */
+#define CR_W_RXMASK 0x000000e0
+#define CR_W_PROM 0x00000080 /* promiscuous mode */
+#define CR_W_AB 0x00000040 /* accept broadcast */
+#define CR_W_AM 0x00000020 /* accept mutlicast */
+#define CR_W_ARP 0x00000008 /* receive runt pkt */
+#define CR_W_ALP 0x00000004 /* receive long pkt */
+#define CR_W_SEP 0x00000002 /* receive error pkt */
+#define CR_W_RXEN 0x00000001 /* rx enable (unicast?) */
+
+#define CR_R_TXSTOP 0x04000000 /* tx stopped */
+#define CR_R_FD 0x00100000 /* full duplex detected */
+#define CR_R_PS10 0x00080000 /* 10 mbit detected */
+#define CR_R_RXSTOP 0x00008000 /* rx stopped */
+
/* Bits in the interrupt status/enable registers. */
/* The bits in the Intr Status/Enable registers, mostly interrupt sources. */
enum intr_status_bits {
@@ -401,6 +421,12 @@
/* Media monitoring timer. */
struct timer_list timer;
+ /* Reset timer */ //vda:
+ struct timer_list reset_timer;
+ int reset_timer_armed;
+ unsigned long crvalue_sv;
+ unsigned long imrvalue_sv;
+
/* Frequently used values: keep some adjacent for cache effect. */
int flags;
struct pci_dev *pci_dev;
@@ -436,6 +462,7 @@
static void getlinktype(struct net_device *dev);
static void getlinkstatus(struct net_device *dev);
static void netdev_timer(unsigned long data);
+static void reset_timer(unsigned long data); //vda:
static void tx_timeout(struct net_device *dev);
static void init_ring(struct net_device *dev);
static int start_tx(struct sk_buff *skb, struct net_device *dev);
@@ -478,6 +505,19 @@
}
+void stop_nic_rxtx(long ioaddr, long crvalue) //vda:
+{
+ int delay = 0x1000;
+ writel(crvalue & (~0x40001), ioaddr + TCRRCR);
+
+ /* wait for rx and tx stop */
+ while(delay--) {
+ int t = readl(ioaddr + TCRRCR);
+ if( (t & 0x04008000) == 0x04008000 ) break;
+ }
+}
+
+
static int __devinit fealnx_init_one(struct pci_dev *pdev,
const struct pci_device_id *ent)
@@ -666,7 +706,7 @@
dev->hard_start_xmit = &start_tx;
dev->stop = &netdev_close;
dev->get_stats = &get_stats;
- dev->set_multicast_list = &set_rx_mode;
+ dev->set_multicast_list = &set_rx_mode; //vda: hmm... set_rx_mode needs locking? TODO
dev->do_ioctl = &mii_ioctl;
dev->ethtool_ops = &netdev_ethtool_ops;
dev->tx_timeout = tx_timeout;
@@ -723,7 +763,8 @@
unsigned int m80x_read_tick(void)
/* function: Reads the Timer tick count register which decrements by 2 from */
-/* 65536 to 0 every 1/36.414 of a second. Each 2 decrements of the *//* count represents 838 nsec's. */
+/* 65536 to 0 every 1/36.414 of a second. Each 2 decrements of the */
+/* count represents 838 nsec's. */
/* input : none. */
/* output : none. */
{
@@ -985,6 +1026,11 @@
/* timer handler */
add_timer(&np->timer);
+ init_timer(&np->reset_timer); //vda:
+ np->reset_timer.data = (unsigned long) dev;
+ np->reset_timer.function = &reset_timer;
+ np->reset_timer_armed = 0;
+
return 0;
}
@@ -1134,15 +1180,17 @@
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
- np->lack_rxbuf->skbuff = skb;
-
if (skb == NULL)
break; /* Better luck next round. */
+ while (np->lack_rxbuf->skbuff)
+ np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->skbuff = skb;
+
skb->dev = dev; /* Mark as being used by this device. */
np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->status = RXOWN;
++np->really_rx_count;
}
}
@@ -1156,19 +1204,25 @@
int next_tick = 10 * HZ;
int old_crvalue = np->crvalue;
unsigned int old_linkok = np->linkok;
+ unsigned long flags;
if (debug)
printk(KERN_DEBUG "%s: Media selection timer tick, status %8.8x "
"config %8.8x.\n", dev->name, readl(ioaddr + ISR),
readl(ioaddr + TCRRCR));
+ spin_lock_irqsave(&np->lock, flags); //vda:
+
if (np->flags == HAS_MII_XCVR) {
getlinkstatus(dev);
if ((old_linkok == 0) && (np->linkok == 1)) { /* we need to detect the media type again */
getlinktype(dev);
if (np->crvalue != old_crvalue) {
- stop_nic_tx(ioaddr, np->crvalue);
- stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
+ //vda: stop_nic_tx(ioaddr, np->crvalue);
+ //stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
+ /* stop rx,tx and start again, */
+ /* this switches us to new mode */
+ stop_nic_rxtx(ioaddr, np->crvalue);
writel(np->crvalue, ioaddr + TCRRCR);
}
}
@@ -1176,36 +1230,22 @@
allocate_rx_buffers(dev);
+ spin_unlock_irqrestore(&np->lock, flags);
+
np->timer.expires = RUN_AT(next_tick);
add_timer(&np->timer);
}
-
-static void tx_timeout(struct net_device *dev)
+/* Reinit. Gross */
+static void vda_reset_disable_rxtx(struct net_device *dev) //vda:
{
struct netdev_private *np = dev->priv;
long ioaddr = dev->base_addr;
int i;
- printk(KERN_WARNING "%s: Transmit timed out, status %8.8x,"
- " resetting...\n", dev->name, readl(ioaddr + ISR));
-
- {
-
- printk(KERN_DEBUG " Rx ring %p: ", np->rx_ring);
- for (i = 0; i < RX_RING_SIZE; i++)
- printk(" %8.8x", (unsigned int) np->rx_ring[i].status);
- printk("\n" KERN_DEBUG " Tx ring %p: ", np->tx_ring);
- for (i = 0; i < TX_RING_SIZE; i++)
- printk(" %4.4x", np->tx_ring[i].status);
- printk("\n");
- }
-
- /* Reinit. Gross */
-
/* Reset the chip's Tx and Rx processes. */
- stop_nic_tx(ioaddr, 0);
- reset_rx_descriptors(dev);
+ stop_nic_rxtx(ioaddr, 0);
+ //reset_rx_descriptors(dev); //moved down: contains stop_nic_rx(ioaddr, np->crvalue)!
/* Disable interrupts by clearing the interrupt mask. */
writel(0x0000, ioaddr + IMR);
@@ -1219,6 +1259,14 @@
readl(ioaddr + BCR);
rmb();
}
+}
+
+static void vda_enable_rxtx(struct net_device *dev) //vda:
+{
+ struct netdev_private *np = dev->priv;
+ long ioaddr = dev->base_addr;
+
+ reset_rx_descriptors(dev); // contains stop_nic_rx(ioaddr, np->crvalue)
writel((np->cur_tx - np->tx_ring)*sizeof(struct fealnx_desc) +
np->tx_ring_dma, ioaddr + TXLBA);
@@ -1227,18 +1275,66 @@
writel(np->bcrvalue, ioaddr + BCR);
- writel(0, dev->base_addr + RXPDR);
- set_rx_mode(dev);
+ writel(0, ioaddr + RXPDR); // is order of these ok?
+ set_rx_mode(dev); // changes np->crvalue, writes TCRRCR
+
/* Clear and Enable interrupts by setting the interrupt mask. */
writel(FBE | TUNF | CNTOVF | RBU | TI | RI, ioaddr + ISR);
writel(np->imrvalue, ioaddr + IMR);
- writel(0, dev->base_addr + TXPDR);
+ writel(0, ioaddr + TXPDR);
+}
+
+static void reset_timer(unsigned long data) //vda:
+{
+ struct net_device *dev = (struct net_device *) data;
+ struct netdev_private *np = dev->priv;
+ unsigned long flags;
+
+ printk(KERN_WARNING "%s: resetting tx and rx machinery\n", dev->name);
+
+ spin_lock_irqsave(&np->lock, flags);
+ np->crvalue = np->crvalue_sv;
+ np->imrvalue = np->imrvalue_sv;
+
+ vda_reset_disable_rxtx(dev);
+ vda_enable_rxtx(dev);
+ netif_start_queue(dev); // netif_tx_enable(dev) doesn't exist, right?
+
+ np->reset_timer_armed = 0;
+
+ spin_unlock_irqrestore(&np->lock, flags);
+}
+
+
+static void tx_timeout(struct net_device *dev)
+{
+ struct netdev_private *np = dev->priv;
+ long ioaddr = dev->base_addr;
+ unsigned long flags;
+ int i;
+
+ printk(KERN_WARNING "%s: Transmit timed out, status %8.8x,"
+ " resetting...\n", dev->name, readl(ioaddr + ISR));
+
+ {
+
+ printk(KERN_DEBUG " Rx ring %p: ", np->rx_ring);
+ for (i = 0; i < RX_RING_SIZE; i++)
+ printk(" %8.8x", (unsigned int) np->rx_ring[i].status);
+ printk("\n" KERN_DEBUG " Tx ring %p: ", np->tx_ring);
+ for (i = 0; i < TX_RING_SIZE; i++)
+ printk(" %4.4x", np->tx_ring[i].status);
+ printk("\n");
+ }
+
+ spin_lock_irqsave(&np->lock, flags); //vda:
+ vda_reset_disable_rxtx(dev);
+ vda_enable_rxtx(dev);
+ spin_unlock_irqrestore(&np->lock, flags);
dev->trans_start = jiffies;
np->stats.tx_errors++;
-
- return;
}
@@ -1251,7 +1347,7 @@
/* initialize rx variables */
np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
np->cur_rx = &np->rx_ring[0];
- np->lack_rxbuf = NULL;
+ np->lack_rxbuf = np->rx_ring;
np->really_rx_count = 0;
/* initial rx descriptors. */
@@ -1303,14 +1399,15 @@
/* for the last tx descriptor */
np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
-
- return;
}
static int start_tx(struct sk_buff *skb, struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ unsigned long flags;
+
+ spin_lock_irqsave(&np->lock, flags);
np->cur_tx_copy->skbuff = skb;
@@ -1340,7 +1437,7 @@
np->cur_tx_copy->control |= (BPT << TBSShift); /* buffer size */
/* for the last descriptor */
- next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
+ next = np->cur_tx_copy.next_desc_logical;
next->skbuff = skb;
next->control = TXIC | TXLD | CRCEnable | PADEnable;
next->control |= (skb->len << PKTSShift); /* pkt size */
@@ -1377,36 +1474,26 @@
writel(0, dev->base_addr + TXPDR);
dev->trans_start = jiffies;
+ spin_unlock_irqrestore(&np->lock, flags);
return 0;
}
-
-void free_one_rx_descriptor(struct netdev_private *np)
-{
- if (np->really_rx_count == RX_RING_SIZE)
- np->cur_rx->status = RXOWN;
- else {
- np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
- np->lack_rxbuf->buffer = np->cur_rx->buffer;
- np->lack_rxbuf->status = RXOWN;
- ++np->really_rx_count;
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
- }
- np->cur_rx = np->cur_rx->next_desc_logical;
-}
-
-
void reset_rx_descriptors(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ struct fealnx_desc *cur = np->cur_rx;
+ int i;
stop_nic_rx(dev->base_addr, np->crvalue);
- while (!(np->cur_rx->status & RXOWN))
- free_one_rx_descriptor(np);
-
allocate_rx_buffers(dev);
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ if (cur->skbuff)
+ cur->status = RXOWN;
+ cur = cur->next_desc_logical;
+ }
+
writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
dev->base_addr + RXLBA);
writel(np->crvalue, dev->base_addr + TCRRCR);
@@ -1423,6 +1510,8 @@
unsigned int num_tx = 0;
int handled = 0;
+ spin_lock(&np->lock);
+
writel(0, dev->base_addr + IMR);
ioaddr = dev->base_addr;
@@ -1548,6 +1637,23 @@
if (--boguscnt < 0) {
printk(KERN_WARNING "%s: Too much work at interrupt, "
"status=0x%4.4x.\n", dev->name, intr_status);
+ if(!np->reset_timer_armed) { //vda:
+ printk(KERN_WARNING "scheduling card reset\n");
+ np->reset_timer.expires = RUN_AT(HZ/2);
+ add_timer(&np->reset_timer);
+ np->reset_timer_armed = 1;
+ //taken from reset code:
+ //stop_nic_tx(ioaddr, 0); //stop tx
+
+ // without this, I've seen one lockup and one 'forever stuck tx' event
+ stop_nic_rxtx(ioaddr, 0); //stop rx
+ netif_stop_queue(dev); // or netif_tx_disable(dev);??
+ /* Prevent other codepaths from enabling tx,rx,intrs */
+ np->crvalue_sv = np->crvalue;
+ np->crvalue &= ~(CR_W_TXEN | CR_W_RXEN); // or simply = 0?
+ np->imrvalue_sv = np->imrvalue;
+ np->imrvalue = 0;
+ }
break;
}
} while (1);
@@ -1564,6 +1670,8 @@
dev->name, readl(ioaddr + ISR));
writel(np->imrvalue, ioaddr + IMR);
+out_unlock:
+ spin_unlock(&np->lock);
return IRQ_RETVAL(handled);
}
@@ -1575,12 +1683,25 @@
{
struct netdev_private *np = dev->priv;
+ //if( ! (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) ) { //vda:
+ // printk(KERN_ERR "netdev_rx(): nothing to do?! (np->cur_rx->status & RXOWN) == 0x%04x, np->cur_rx->skbuff == %p\n"
+ // ,(np->cur_rx->status & RXOWN)
+ // ,np->cur_rx->skbuff
+ // );
+ //}
+
+ //if (np->really_rx_count == 0) { //vda:
+ // printk(KERN_ERR "netdev_rx(): np->really_rx_count is 0 before while()\n");
+ //}
+
/* If EOP is set on the next entry, it's a new packet. Send it up. */
- while (!(np->cur_rx->status & RXOWN)) {
+ while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
- if (np->really_rx_count == 0)
- break;
+ //if (np->really_rx_count == 0) {
+ // printk(KERN_ERR "netdev_rx(): np->really_rx_count reached 0\n"); //vda:
+ // break;
+ //}
if (debug)
printk(KERN_DEBUG " netdev_rx() status was %8.8x.\n", rx_status);
@@ -1628,8 +1749,15 @@
np->stats.rx_length_errors++;
/* free all rx descriptors related this long pkt */
- for (i = 0; i < desno; ++i)
- free_one_rx_descriptor(np);
+ for (i = 0; i < desno; ++i) {
+ if (!np->cur_rx->skbuff) {
+ printk(KERN_DEBUG
+ "%s: I'm scared\n", dev->name);
+ break;
+ }
+ np->cur_rx->status = RXOWN;
+ np->cur_rx = np->cur_rx->next_desc_logical;
+ }
continue;
} else { /* something error, need to reset this chip */
reset_rx_descriptors(dev);
@@ -1671,8 +1799,6 @@
} else {
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
- if (np->really_rx_count == RX_RING_SIZE)
- np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
@@ -1682,22 +1808,7 @@
np->stats.rx_bytes += pkt_len;
}
- if (np->cur_rx->skbuff == NULL) {
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(np->rx_buf_sz);
-
- if (skb != NULL) {
- skb->dev = dev; /* Mark as being used by this device. */
- np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
- np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->cur_rx->skbuff = skb;
- ++np->really_rx_count;
- }
- }
-
- if (np->cur_rx->skbuff != NULL)
- free_one_rx_descriptor(np);
+ np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
/* allocate skb for rx buffers */
@@ -1752,8 +1863,10 @@
rx_mode = AB | AM;
}
- stop_nic_tx(ioaddr, np->crvalue);
- stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
+ //vda:
+ //stop_nic_tx(ioaddr, np->crvalue);
+ //stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
+ stop_nic_rxtx(ioaddr, np->crvalue);
writel(mc_filter[0], ioaddr + MAR0);
writel(mc_filter[1], ioaddr + MAR1);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-31 16:39 ` fealnx oopses (with [PATCH]) Denis Vlasenko
@ 2004-03-31 19:24 ` Andreas Henriksson
2004-03-31 20:38 ` Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Andreas Henriksson @ 2004-03-31 19:24 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Jeff Garzik, Francois Romieu, Andreas Henriksson, netdev
On Wed, Mar 31, 2004 at 06:39:33PM +0200, Denis Vlasenko wrote:
>
> Ok, here is what I have now.
>
> What we had before:
> original behaviour: Andreas had tx tomeouts, I had fatal oops.
I had tx-related kernel panic with vanilla driver on my p-166.
> francois+jgarzik patch: Andreas happy, I had lockup (endless 'Too much work')
>
with spin_lock_irqsave in start_tx I haven't been able to trigger the
panic on my p3-600 .. (Although, I don't know if it actually fixed the
problem or if it's just harder to trigger on a faster machine.. but I've
been moving ALOT of data so even if it's 10 or 100 times hard I should
have triggered it by now..)
so jgarzik's patch is enough to make me happy... no more panic but I
still see this which is annoying when trying to do interactive stuff
over the network:
-- snip --
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Transmit timed out, status 00000000, resetting...
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Transmit timed out, status 00000000, resetting...
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Transmit timed out, status 00000000, resetting...
-- snip --
> Now, with the attached patch, I *don't* lock up. I can ctrl-C
> netcat, UDP flood stops and card is alive, tested with pinging.
>
> I modified 'Too much work in interrupt' code.
> I added code which completely stops rx and tx and schedules
> card reset a-la reset previously used in tx_timeout code path.
> There is 1 second delay.
>
To me it doesn't seem better nor worse then before... so I guess if it
helps you in any way, good.
>
> In testing, I saw this code triggering a number of times.
> It works as intended.
>
> I also have seen tx timeout events. (I touched that code a bit,
> factoring out reset code, but without changing logic). They did
> not hang the box, either, but box was unable to do any tx.
> tx timeouts just happened again and again.
> Remote side decided that we are gone and stopped UDP flood,
> starting ARP address resolution instead, but test box could
> not answer even that.
>
> In this state I can ctrl-C local netcat and box recovers
> after several seconds. Tx is working again, I can ping other
> hosts, etc.
>
> I conclude that tx timeout logic does not handle situation
> when *local* process generates and submits tons of packets
> at enormous rate.
>
> Now, the patch itself. It is against 2.4.25.
> Sorry, it contains some debugging stuff and some future work
> (e.g. #defines). I decided to not remove it now to avoid
> brown paper bugs. (I can remove and retest, but testing can
> take another couple of hours...)
>
applies easily to 2.6 too, with some offset and fuzz and 1 reject that
is really easy to fix manually.
>
> I can code tx timeout similarly to 'Too much work', but I
> retained it 'as is' in hopes of your comments on both
> code paths.
> --
> vda
>
> --- fealnx.c.original Fri Nov 28 20:26:20 2003
> +++ fealnx.c Wed Mar 31 17:26:57 2004
> @@ -210,6 +210,26 @@
> BPREMRPSR = 0x5c, /* bypass & receive error mask and phy status */
> };
>
> +//vda: TCRRCR values collected from code: W for writing, R for reading
> +#define CR_W_ENH 0x02000000 /* enhanced mode */
> +#define CR_W_FD 0x00100000 /* full duplex */
> +#define CR_W_PS10 0x00080000 /* 10 mbit */
> +#define CR_W_TXEN 0x00040000 /* tx enable */
> +#define CR_W_PS1000 0x00010000 /* 1000 mbit */
> +#define CR_W_RXMASK 0x000000e0
> +#define CR_W_PROM 0x00000080 /* promiscuous mode */
> +#define CR_W_AB 0x00000040 /* accept broadcast */
> +#define CR_W_AM 0x00000020 /* accept mutlicast */
> +#define CR_W_ARP 0x00000008 /* receive runt pkt */
> +#define CR_W_ALP 0x00000004 /* receive long pkt */
> +#define CR_W_SEP 0x00000002 /* receive error pkt */
> +#define CR_W_RXEN 0x00000001 /* rx enable (unicast?) */
> +
> +#define CR_R_TXSTOP 0x04000000 /* tx stopped */
> +#define CR_R_FD 0x00100000 /* full duplex detected */
> +#define CR_R_PS10 0x00080000 /* 10 mbit detected */
> +#define CR_R_RXSTOP 0x00008000 /* rx stopped */
> +
> /* Bits in the interrupt status/enable registers. */
> /* The bits in the Intr Status/Enable registers, mostly interrupt sources. */
> enum intr_status_bits {
> @@ -401,6 +421,12 @@
> /* Media monitoring timer. */
> struct timer_list timer;
>
> + /* Reset timer */ //vda:
> + struct timer_list reset_timer;
> + int reset_timer_armed;
> + unsigned long crvalue_sv;
> + unsigned long imrvalue_sv;
> +
> /* Frequently used values: keep some adjacent for cache effect. */
> int flags;
> struct pci_dev *pci_dev;
> @@ -436,6 +462,7 @@
> static void getlinktype(struct net_device *dev);
> static void getlinkstatus(struct net_device *dev);
> static void netdev_timer(unsigned long data);
> +static void reset_timer(unsigned long data); //vda:
> static void tx_timeout(struct net_device *dev);
> static void init_ring(struct net_device *dev);
> static int start_tx(struct sk_buff *skb, struct net_device *dev);
> @@ -478,6 +505,19 @@
> }
>
>
> +void stop_nic_rxtx(long ioaddr, long crvalue) //vda:
> +{
> + int delay = 0x1000;
> + writel(crvalue & (~0x40001), ioaddr + TCRRCR);
> +
> + /* wait for rx and tx stop */
> + while(delay--) {
> + int t = readl(ioaddr + TCRRCR);
> + if( (t & 0x04008000) == 0x04008000 ) break;
> + }
> +}
> +
> +
>
> static int __devinit fealnx_init_one(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> @@ -666,7 +706,7 @@
> dev->hard_start_xmit = &start_tx;
> dev->stop = &netdev_close;
> dev->get_stats = &get_stats;
> - dev->set_multicast_list = &set_rx_mode;
> + dev->set_multicast_list = &set_rx_mode; //vda: hmm... set_rx_mode needs locking? TODO
> dev->do_ioctl = &mii_ioctl;
> dev->ethtool_ops = &netdev_ethtool_ops;
> dev->tx_timeout = tx_timeout;
> @@ -723,7 +763,8 @@
>
> unsigned int m80x_read_tick(void)
> /* function: Reads the Timer tick count register which decrements by 2 from */
> -/* 65536 to 0 every 1/36.414 of a second. Each 2 decrements of the *//* count represents 838 nsec's. */
> +/* 65536 to 0 every 1/36.414 of a second. Each 2 decrements of the */
> +/* count represents 838 nsec's. */
> /* input : none. */
> /* output : none. */
> {
> @@ -985,6 +1026,11 @@
> /* timer handler */
> add_timer(&np->timer);
>
> + init_timer(&np->reset_timer); //vda:
> + np->reset_timer.data = (unsigned long) dev;
> + np->reset_timer.function = &reset_timer;
> + np->reset_timer_armed = 0;
> +
> return 0;
> }
>
> @@ -1134,15 +1180,17 @@
> struct sk_buff *skb;
>
> skb = dev_alloc_skb(np->rx_buf_sz);
> - np->lack_rxbuf->skbuff = skb;
> -
> if (skb == NULL)
> break; /* Better luck next round. */
>
> + while (np->lack_rxbuf->skbuff)
> + np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> + np->lack_rxbuf->skbuff = skb;
> +
> skb->dev = dev; /* Mark as being used by this device. */
> np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
> np->rx_buf_sz, PCI_DMA_FROMDEVICE);
> - np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> + np->lack_rxbuf->status = RXOWN;
> ++np->really_rx_count;
> }
> }
> @@ -1156,19 +1204,25 @@
> int next_tick = 10 * HZ;
> int old_crvalue = np->crvalue;
> unsigned int old_linkok = np->linkok;
> + unsigned long flags;
>
> if (debug)
> printk(KERN_DEBUG "%s: Media selection timer tick, status %8.8x "
> "config %8.8x.\n", dev->name, readl(ioaddr + ISR),
> readl(ioaddr + TCRRCR));
>
> + spin_lock_irqsave(&np->lock, flags); //vda:
> +
> if (np->flags == HAS_MII_XCVR) {
> getlinkstatus(dev);
> if ((old_linkok == 0) && (np->linkok == 1)) { /* we need to detect the media type again */
> getlinktype(dev);
> if (np->crvalue != old_crvalue) {
> - stop_nic_tx(ioaddr, np->crvalue);
> - stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
> + //vda: stop_nic_tx(ioaddr, np->crvalue);
> + //stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
> + /* stop rx,tx and start again, */
> + /* this switches us to new mode */
> + stop_nic_rxtx(ioaddr, np->crvalue);
> writel(np->crvalue, ioaddr + TCRRCR);
> }
> }
> @@ -1176,36 +1230,22 @@
>
> allocate_rx_buffers(dev);
>
> + spin_unlock_irqrestore(&np->lock, flags);
> +
> np->timer.expires = RUN_AT(next_tick);
> add_timer(&np->timer);
> }
>
> -
> -static void tx_timeout(struct net_device *dev)
> +/* Reinit. Gross */
> +static void vda_reset_disable_rxtx(struct net_device *dev) //vda:
> {
> struct netdev_private *np = dev->priv;
> long ioaddr = dev->base_addr;
> int i;
>
> - printk(KERN_WARNING "%s: Transmit timed out, status %8.8x,"
> - " resetting...\n", dev->name, readl(ioaddr + ISR));
> -
> - {
> -
> - printk(KERN_DEBUG " Rx ring %p: ", np->rx_ring);
> - for (i = 0; i < RX_RING_SIZE; i++)
> - printk(" %8.8x", (unsigned int) np->rx_ring[i].status);
> - printk("\n" KERN_DEBUG " Tx ring %p: ", np->tx_ring);
> - for (i = 0; i < TX_RING_SIZE; i++)
> - printk(" %4.4x", np->tx_ring[i].status);
> - printk("\n");
> - }
> -
> - /* Reinit. Gross */
> -
> /* Reset the chip's Tx and Rx processes. */
> - stop_nic_tx(ioaddr, 0);
> - reset_rx_descriptors(dev);
> + stop_nic_rxtx(ioaddr, 0);
> + //reset_rx_descriptors(dev); //moved down: contains stop_nic_rx(ioaddr, np->crvalue)!
>
> /* Disable interrupts by clearing the interrupt mask. */
> writel(0x0000, ioaddr + IMR);
> @@ -1219,6 +1259,14 @@
> readl(ioaddr + BCR);
> rmb();
> }
> +}
> +
> +static void vda_enable_rxtx(struct net_device *dev) //vda:
> +{
> + struct netdev_private *np = dev->priv;
> + long ioaddr = dev->base_addr;
> +
> + reset_rx_descriptors(dev); // contains stop_nic_rx(ioaddr, np->crvalue)
>
> writel((np->cur_tx - np->tx_ring)*sizeof(struct fealnx_desc) +
> np->tx_ring_dma, ioaddr + TXLBA);
> @@ -1227,18 +1275,66 @@
>
> writel(np->bcrvalue, ioaddr + BCR);
>
> - writel(0, dev->base_addr + RXPDR);
> - set_rx_mode(dev);
> + writel(0, ioaddr + RXPDR); // is order of these ok?
> + set_rx_mode(dev); // changes np->crvalue, writes TCRRCR
> +
> /* Clear and Enable interrupts by setting the interrupt mask. */
> writel(FBE | TUNF | CNTOVF | RBU | TI | RI, ioaddr + ISR);
> writel(np->imrvalue, ioaddr + IMR);
>
> - writel(0, dev->base_addr + TXPDR);
> + writel(0, ioaddr + TXPDR);
> +}
> +
> +static void reset_timer(unsigned long data) //vda:
> +{
> + struct net_device *dev = (struct net_device *) data;
> + struct netdev_private *np = dev->priv;
> + unsigned long flags;
> +
> + printk(KERN_WARNING "%s: resetting tx and rx machinery\n", dev->name);
> +
> + spin_lock_irqsave(&np->lock, flags);
> + np->crvalue = np->crvalue_sv;
> + np->imrvalue = np->imrvalue_sv;
> +
> + vda_reset_disable_rxtx(dev);
> + vda_enable_rxtx(dev);
> + netif_start_queue(dev); // netif_tx_enable(dev) doesn't exist, right?
> +
> + np->reset_timer_armed = 0;
> +
> + spin_unlock_irqrestore(&np->lock, flags);
> +}
> +
> +
> +static void tx_timeout(struct net_device *dev)
> +{
> + struct netdev_private *np = dev->priv;
> + long ioaddr = dev->base_addr;
> + unsigned long flags;
> + int i;
> +
> + printk(KERN_WARNING "%s: Transmit timed out, status %8.8x,"
> + " resetting...\n", dev->name, readl(ioaddr + ISR));
> +
> + {
> +
> + printk(KERN_DEBUG " Rx ring %p: ", np->rx_ring);
> + for (i = 0; i < RX_RING_SIZE; i++)
> + printk(" %8.8x", (unsigned int) np->rx_ring[i].status);
> + printk("\n" KERN_DEBUG " Tx ring %p: ", np->tx_ring);
> + for (i = 0; i < TX_RING_SIZE; i++)
> + printk(" %4.4x", np->tx_ring[i].status);
> + printk("\n");
> + }
> +
> + spin_lock_irqsave(&np->lock, flags); //vda:
> + vda_reset_disable_rxtx(dev);
> + vda_enable_rxtx(dev);
> + spin_unlock_irqrestore(&np->lock, flags);
>
> dev->trans_start = jiffies;
> np->stats.tx_errors++;
> -
> - return;
> }
>
>
> @@ -1251,7 +1347,7 @@
> /* initialize rx variables */
> np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
> np->cur_rx = &np->rx_ring[0];
> - np->lack_rxbuf = NULL;
> + np->lack_rxbuf = np->rx_ring;
> np->really_rx_count = 0;
>
> /* initial rx descriptors. */
> @@ -1303,14 +1399,15 @@
> /* for the last tx descriptor */
> np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
> np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
> -
> - return;
> }
>
>
> static int start_tx(struct sk_buff *skb, struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&np->lock, flags);
>
> np->cur_tx_copy->skbuff = skb;
>
> @@ -1340,7 +1437,7 @@
> np->cur_tx_copy->control |= (BPT << TBSShift); /* buffer size */
>
> /* for the last descriptor */
> - next = (struct fealnx *) np->cur_tx_copy.next_desc_logical;
> + next = np->cur_tx_copy.next_desc_logical;
> next->skbuff = skb;
> next->control = TXIC | TXLD | CRCEnable | PADEnable;
> next->control |= (skb->len << PKTSShift); /* pkt size */
> @@ -1377,36 +1474,26 @@
> writel(0, dev->base_addr + TXPDR);
> dev->trans_start = jiffies;
>
> + spin_unlock_irqrestore(&np->lock, flags);
> return 0;
> }
>
> -
> -void free_one_rx_descriptor(struct netdev_private *np)
> -{
> - if (np->really_rx_count == RX_RING_SIZE)
> - np->cur_rx->status = RXOWN;
> - else {
> - np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
> - np->lack_rxbuf->buffer = np->cur_rx->buffer;
> - np->lack_rxbuf->status = RXOWN;
> - ++np->really_rx_count;
> - np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
> - }
> - np->cur_rx = np->cur_rx->next_desc_logical;
> -}
> -
> -
> void reset_rx_descriptors(struct net_device *dev)
> {
> struct netdev_private *np = dev->priv;
> + struct fealnx_desc *cur = np->cur_rx;
> + int i;
>
> stop_nic_rx(dev->base_addr, np->crvalue);
>
> - while (!(np->cur_rx->status & RXOWN))
> - free_one_rx_descriptor(np);
> -
> allocate_rx_buffers(dev);
>
> + for (i = 0; i < RX_RING_SIZE; i++) {
> + if (cur->skbuff)
> + cur->status = RXOWN;
> + cur = cur->next_desc_logical;
> + }
> +
> writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
> dev->base_addr + RXLBA);
> writel(np->crvalue, dev->base_addr + TCRRCR);
> @@ -1423,6 +1510,8 @@
> unsigned int num_tx = 0;
> int handled = 0;
>
> + spin_lock(&np->lock);
> +
> writel(0, dev->base_addr + IMR);
>
> ioaddr = dev->base_addr;
> @@ -1548,6 +1637,23 @@
> if (--boguscnt < 0) {
> printk(KERN_WARNING "%s: Too much work at interrupt, "
> "status=0x%4.4x.\n", dev->name, intr_status);
> + if(!np->reset_timer_armed) { //vda:
> + printk(KERN_WARNING "scheduling card reset\n");
> + np->reset_timer.expires = RUN_AT(HZ/2);
> + add_timer(&np->reset_timer);
> + np->reset_timer_armed = 1;
> + //taken from reset code:
> + //stop_nic_tx(ioaddr, 0); //stop tx
> +
> + // without this, I've seen one lockup and one 'forever stuck tx' event
> + stop_nic_rxtx(ioaddr, 0); //stop rx
> + netif_stop_queue(dev); // or netif_tx_disable(dev);??
> + /* Prevent other codepaths from enabling tx,rx,intrs */
> + np->crvalue_sv = np->crvalue;
> + np->crvalue &= ~(CR_W_TXEN | CR_W_RXEN); // or simply = 0?
> + np->imrvalue_sv = np->imrvalue;
> + np->imrvalue = 0;
> + }
> break;
> }
> } while (1);
> @@ -1564,6 +1670,8 @@
> dev->name, readl(ioaddr + ISR));
>
> writel(np->imrvalue, ioaddr + IMR);
> +out_unlock:
> + spin_unlock(&np->lock);
>
> return IRQ_RETVAL(handled);
> }
> @@ -1575,12 +1683,25 @@
> {
> struct netdev_private *np = dev->priv;
>
> + //if( ! (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) ) { //vda:
> + // printk(KERN_ERR "netdev_rx(): nothing to do?! (np->cur_rx->status & RXOWN) == 0x%04x, np->cur_rx->skbuff == %p\n"
> + // ,(np->cur_rx->status & RXOWN)
> + // ,np->cur_rx->skbuff
> + // );
> + //}
> +
> + //if (np->really_rx_count == 0) { //vda:
> + // printk(KERN_ERR "netdev_rx(): np->really_rx_count is 0 before while()\n");
> + //}
> +
> /* If EOP is set on the next entry, it's a new packet. Send it up. */
> - while (!(np->cur_rx->status & RXOWN)) {
> + while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
> s32 rx_status = np->cur_rx->status;
>
> - if (np->really_rx_count == 0)
> - break;
> + //if (np->really_rx_count == 0) {
> + // printk(KERN_ERR "netdev_rx(): np->really_rx_count reached 0\n"); //vda:
> + // break;
> + //}
>
> if (debug)
> printk(KERN_DEBUG " netdev_rx() status was %8.8x.\n", rx_status);
> @@ -1628,8 +1749,15 @@
> np->stats.rx_length_errors++;
>
> /* free all rx descriptors related this long pkt */
> - for (i = 0; i < desno; ++i)
> - free_one_rx_descriptor(np);
> + for (i = 0; i < desno; ++i) {
> + if (!np->cur_rx->skbuff) {
> + printk(KERN_DEBUG
> + "%s: I'm scared\n", dev->name);
> + break;
> + }
> + np->cur_rx->status = RXOWN;
> + np->cur_rx = np->cur_rx->next_desc_logical;
> + }
> continue;
> } else { /* something error, need to reset this chip */
> reset_rx_descriptors(dev);
> @@ -1671,8 +1799,6 @@
> } else {
> skb_put(skb = np->cur_rx->skbuff, pkt_len);
> np->cur_rx->skbuff = NULL;
> - if (np->really_rx_count == RX_RING_SIZE)
> - np->lack_rxbuf = np->cur_rx;
> --np->really_rx_count;
> }
> skb->protocol = eth_type_trans(skb, dev);
> @@ -1682,22 +1808,7 @@
> np->stats.rx_bytes += pkt_len;
> }
-------------------------------------
this is the part that I got a reject for on 2.6:
pci_map_single's arguments where slit on one row for each argument.
>
> - if (np->cur_rx->skbuff == NULL) {
> - struct sk_buff *skb;
> -
> - skb = dev_alloc_skb(np->rx_buf_sz);
> -
> - if (skb != NULL) {
> - skb->dev = dev; /* Mark as being used by this device. */
> - np->cur_rx->buffer = pci_map_single(np->pci_dev, skb->tail,
> - np->rx_buf_sz, PCI_DMA_FROMDEVICE);
> - np->cur_rx->skbuff = skb;
> - ++np->really_rx_count;
> - }
> - }
> -
> - if (np->cur_rx->skbuff != NULL)
> - free_one_rx_descriptor(np);
> + np->cur_rx = np->cur_rx->next_desc_logical;
> } /* end of while loop */
------------------------------
>
> /* allocate skb for rx buffers */
> @@ -1752,8 +1863,10 @@
> rx_mode = AB | AM;
> }
>
> - stop_nic_tx(ioaddr, np->crvalue);
> - stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
> + //vda:
> + //stop_nic_tx(ioaddr, np->crvalue);
> + //stop_nic_rx(ioaddr, np->crvalue & (~0x40000));
> + stop_nic_rxtx(ioaddr, np->crvalue);
>
> writel(mc_filter[0], ioaddr + MAR0);
> writel(mc_filter[1], ioaddr + MAR1);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-31 19:24 ` Andreas Henriksson
@ 2004-03-31 20:38 ` Denis Vlasenko
2004-03-31 20:53 ` Jeff Garzik
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-03-31 20:38 UTC (permalink / raw)
To: Andreas Henriksson
Cc: Jeff Garzik, Francois Romieu, Andreas Henriksson, netdev,
Denis Vlasenko
Thank you for testing!
On Wednesday 31 March 2004 21:24, Andreas Henriksson wrote:
> On Wed, Mar 31, 2004 at 06:39:33PM +0200, Denis Vlasenko wrote:
> > Ok, here is what I have now.
> >
> > What we had before:
> > original behaviour: Andreas had tx tomeouts, I had fatal oops.
>
> I had tx-related kernel panic with vanilla driver on my p-166.
>
> > francois+jgarzik patch: Andreas happy, I had lockup (endless 'Too much
> > work')
>
> with spin_lock_irqsave in start_tx I haven't been able to trigger the
> panic on my p3-600 .. (Although, I don't know if it actually fixed the
> problem or if it's just harder to trigger on a faster machine.. but I've
> been moving ALOT of data so even if it's 10 or 100 times hard I should
> have triggered it by now..)
>
> so jgarzik's patch is enough to make me happy... no more panic but I
> still see this which is annoying when trying to do interactive stuff
> over the network:
>
> -- snip --
> NETDEV WATCHDOG: eth1: transmit timed out
> eth1: Transmit timed out, status 00000000, resetting...
> NETDEV WATCHDOG: eth1: transmit timed out
> eth1: Transmit timed out, status 00000000, resetting...
> NETDEV WATCHDOG: eth1: transmit timed out
> eth1: Transmit timed out, status 00000000, resetting...
Is it happens several times in a row? I see the same thing,
actually, when my netcat madly spews UDPs, it happens
endlessly, until I kill netcat.
So, maybe I can cure this too by replacing tx timeout code
with code which I use for 'too much work' codepath.
> > I modified 'Too much work in interrupt' code.
> > I added code which completely stops rx and tx and schedules
> > card reset a-la reset previously used in tx_timeout code path.
> > There is 1 second delay.
>
> To me it doesn't seem better nor worse then before... so I guess if it
> helps you in any way, good.
Yes, you do not trigger this. Your box use original code
still living in tx timeout path.
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-31 20:38 ` Denis Vlasenko
@ 2004-03-31 20:53 ` Jeff Garzik
2004-03-31 22:23 ` Francois Romieu
0 siblings, 1 reply; 31+ messages in thread
From: Jeff Garzik @ 2004-03-31 20:53 UTC (permalink / raw)
To: netdev; +Cc: Denis Vlasenko, Andreas Henriksson, Francois Romieu
FYI to those involved, I sent the TX patch upstream.
When there is a minimal RX patch people are happy with, let me know...
further changes are fine, but I would like to split up the changes as
much as is reasonable.
Jeff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses
2004-03-29 23:16 ` Denis Vlasenko
@ 2004-03-31 21:01 ` Francois Romieu
0 siblings, 0 replies; 31+ messages in thread
From: Francois Romieu @ 2004-03-31 21:01 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Andreas Henriksson, Jeff Garzik, netdev
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> I just verified that even if I stop remote netcat,
> box does not recover. Console fills with
> "Too much work at interrupt, status=0x0020"
> 0x0020 is RBU = 0x00000020, /* receive buffer unavailable */
There too much work in my daily interrupt handler as well so this get
scheduled for a few days/nights :o)
If anything else, I'll probably cook up a nice patch for the current changes
once things settle down a bit.
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-31 20:53 ` Jeff Garzik
@ 2004-03-31 22:23 ` Francois Romieu
2004-04-01 11:09 ` Denis Vlasenko
0 siblings, 1 reply; 31+ messages in thread
From: Francois Romieu @ 2004-03-31 22:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, Denis Vlasenko, Andreas Henriksson
[-- Attachment #1: Type: text/plain, Size: 540 bytes --]
Jeff Garzik <jgarzik@pobox.com> :
> FYI to those involved, I sent the TX patch upstream.
>
> When there is a minimal RX patch people are happy with, let me know...
> further changes are fine, but I would like to split up the changes as
> much as is reasonable.
Attached is the patch tested by Andreas which fixes single allocation
failures but will not protect against excessive work in irq handler nor
complete depletion of the Rx ring.
It applies on top of the Tx patch (as seen on bk-commit).
Earth-shattering, anyone ?
--
Ueimor
[-- Attachment #2: fealnx-allocation-failure.patch --]
[-- Type: text/plain, Size: 5137 bytes --]
Handle allocation failure with grace:
- defer instant allocation in netdev_rx() until the packet examination
loop is finished. This removes code duplication between netdev_rx()
and allocate_rx_buffers();
- protect netdev_rx() against wrap-around if it ends on an unallocated
descriptor;
- always proceed to the next Rx descriptor in netdev_rx() so as to stay in
sync with the chipset;
- positionning of the first dirty/unallocated Rx descriptor is of the sole
responsibility of allocate_rx_buffers();
- unconditionnaly give the allocated descriptors to the chipset during
reset_rx_descriptors().
drivers/net/fealnx.c | 69 ++++++++++++++++-----------------------------------
1 files changed, 22 insertions(+), 47 deletions(-)
diff -puN drivers/net/fealnx.c~fealnx-allocation-failure drivers/net/fealnx.c
--- linux-2.6.5-rc2/drivers/net/fealnx.c~fealnx-allocation-failure 2004-03-31 23:31:23.000000000 +0200
+++ linux-2.6.5-rc2-fr/drivers/net/fealnx.c 2004-03-31 23:50:36.000000000 +0200
@@ -1134,15 +1134,18 @@ static void allocate_rx_buffers(struct n
struct sk_buff *skb;
skb = dev_alloc_skb(np->rx_buf_sz);
- np->lack_rxbuf->skbuff = skb;
if (skb == NULL)
break; /* Better luck next round. */
+ while (np->lack_rxbuf->skbuff)
+ np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->skbuff = skb;
+
skb->dev = dev; /* Mark as being used by this device. */
np->lack_rxbuf->buffer = pci_map_single(np->pci_dev, skb->tail,
np->rx_buf_sz, PCI_DMA_FROMDEVICE);
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
+ np->lack_rxbuf->status = RXOWN;
++np->really_rx_count;
}
}
@@ -1250,8 +1253,7 @@ static void init_ring(struct net_device
/* initialize rx variables */
np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
- np->cur_rx = &np->rx_ring[0];
- np->lack_rxbuf = NULL;
+ np->lack_rxbuf = np->cur_rx = np->rx_ring;
np->really_rx_count = 0;
/* initial rx descriptors. */
@@ -1303,8 +1305,6 @@ static void init_ring(struct net_device
/* for the last tx descriptor */
np->tx_ring[i - 1].next_desc = np->tx_ring_dma;
np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0];
-
- return;
}
@@ -1381,32 +1381,22 @@ static int start_tx(struct sk_buff *skb,
}
-void free_one_rx_descriptor(struct netdev_private *np)
-{
- if (np->really_rx_count == RX_RING_SIZE)
- np->cur_rx->status = RXOWN;
- else {
- np->lack_rxbuf->skbuff = np->cur_rx->skbuff;
- np->lack_rxbuf->buffer = np->cur_rx->buffer;
- np->lack_rxbuf->status = RXOWN;
- ++np->really_rx_count;
- np->lack_rxbuf = np->lack_rxbuf->next_desc_logical;
- }
- np->cur_rx = np->cur_rx->next_desc_logical;
-}
-
-
void reset_rx_descriptors(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
+ struct fealnx_desc *cur = np->cur_rx;
+ int i;
stop_nic_rx(dev->base_addr, np->crvalue);
- while (!(np->cur_rx->status & RXOWN))
- free_one_rx_descriptor(np);
-
allocate_rx_buffers(dev);
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ if (cur->skbuff)
+ cur->status = RXOWN;
+ cur = cur->next_desc_logical;
+ }
+
writel(np->rx_ring_dma + (np->cur_rx - np->rx_ring),
dev->base_addr + RXLBA);
writel(np->crvalue, dev->base_addr + TCRRCR);
@@ -1580,7 +1570,7 @@ static int netdev_rx(struct net_device *
struct netdev_private *np = dev->priv;
/* If EOP is set on the next entry, it's a new packet. Send it up. */
- while (!(np->cur_rx->status & RXOWN)) {
+ while (!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff) {
s32 rx_status = np->cur_rx->status;
if (np->really_rx_count == 0)
@@ -1632,8 +1622,12 @@ static int netdev_rx(struct net_device *
np->stats.rx_length_errors++;
/* free all rx descriptors related this long pkt */
- for (i = 0; i < desno; ++i)
- free_one_rx_descriptor(np);
+ for (i = 0; i < desno; i++) {
+ if (!np->cur_rx->skbuff)
+ break;
+ np->cur_rx->status = RXOWN;
+ np->cur_rx = np->cur_rx->next_desc_logical;
+ }
continue;
} else { /* something error, need to reset this chip */
reset_rx_descriptors(dev);
@@ -1683,8 +1677,6 @@ static int netdev_rx(struct net_device *
PCI_DMA_FROMDEVICE);
skb_put(skb = np->cur_rx->skbuff, pkt_len);
np->cur_rx->skbuff = NULL;
- if (np->really_rx_count == RX_RING_SIZE)
- np->lack_rxbuf = np->cur_rx;
--np->really_rx_count;
}
skb->protocol = eth_type_trans(skb, dev);
@@ -1694,24 +1686,7 @@ static int netdev_rx(struct net_device *
np->stats.rx_bytes += pkt_len;
}
- if (np->cur_rx->skbuff == NULL) {
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(np->rx_buf_sz);
-
- if (skb != NULL) {
- skb->dev = dev; /* Mark as being used by this device. */
- np->cur_rx->buffer = pci_map_single(np->pci_dev,
- skb->tail,
- np->rx_buf_sz,
- PCI_DMA_FROMDEVICE);
- np->cur_rx->skbuff = skb;
- ++np->really_rx_count;
- }
- }
-
- if (np->cur_rx->skbuff != NULL)
- free_one_rx_descriptor(np);
+ np->cur_rx = np->cur_rx->next_desc_logical;
} /* end of while loop */
/* allocate skb for rx buffers */
_
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-03-31 22:23 ` Francois Romieu
@ 2004-04-01 11:09 ` Denis Vlasenko
2004-04-01 12:28 ` Francois Romieu
0 siblings, 1 reply; 31+ messages in thread
From: Denis Vlasenko @ 2004-04-01 11:09 UTC (permalink / raw)
To: Francois Romieu, Jeff Garzik; +Cc: netdev, Andreas Henriksson, Denis Vlasenko
On Thursday 01 April 2004 01:23, Francois Romieu wrote:
> Jeff Garzik <jgarzik@pobox.com> :
> > FYI to those involved, I sent the TX patch upstream.
> >
> > When there is a minimal RX patch people are happy with, let me know...
> > further changes are fine, but I would like to split up the changes as
> > much as is reasonable.
>
> Attached is the patch tested by Andreas which fixes single allocation
> failures but will not protect against excessive work in irq handler nor
> complete depletion of the Rx ring.
>
> It applies on top of the Tx patch (as seen on bk-commit).
>
> Earth-shattering, anyone ?
Yes, this thing improved life for me (from oopses
to uninterruptible excessive work in interrupt).
My RX coding was on top of it, so I think this should
be applied.
Jeff?
--
vda
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: fealnx oopses (with [PATCH])
2004-04-01 11:09 ` Denis Vlasenko
@ 2004-04-01 12:28 ` Francois Romieu
0 siblings, 0 replies; 31+ messages in thread
From: Francois Romieu @ 2004-04-01 12:28 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: Jeff Garzik, netdev, Andreas Henriksson
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> :
[...]
> Yes, this thing improved life for me (from oopses
> to uninterruptible excessive work in interrupt).
> My RX coding was on top of it, so I think this should
> be applied.
<bozo>
The patch includes an unneeded 'return' removal which will choke (it is
already done in the Tx patch sent by Jeff).
</bozo>
Should I rediff and send directly to Jeff ?
--
Ueimor
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2004-04-01 12:28 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-03-26 10:14 fealnx oopses Denis Vlasenko
2004-03-26 19:22 ` Andreas Henriksson
2004-03-26 19:33 ` [PATCH] " Jeff Garzik
2004-03-26 20:05 ` Denis Vlasenko
2004-03-27 2:13 ` Andreas Henriksson
2004-03-26 19:57 ` Denis Vlasenko
[not found] ` <40648CAF.5010203@pobox.com>
2004-03-26 22:14 ` Denis Vlasenko
2004-03-26 22:35 ` Francois Romieu
2004-03-27 0:03 ` Denis Vlasenko
2004-03-27 0:30 ` Francois Romieu
[not found] ` <4064BB35.4050301@pobox.com>
2004-03-27 21:28 ` Denis Vlasenko
2004-03-27 23:55 ` Francois Romieu
2004-03-28 20:19 ` Denis Vlasenko
2004-03-28 23:27 ` Andreas Henriksson
2004-03-28 23:38 ` Francois Romieu
2004-03-29 17:01 ` Andreas Henriksson
2004-03-29 21:49 ` Denis Vlasenko
2004-03-29 22:20 ` Francois Romieu
2004-03-29 22:50 ` Denis Vlasenko
2004-03-29 23:16 ` Denis Vlasenko
2004-03-31 21:01 ` Francois Romieu
[not found] ` <4068AC87.2030506@pobox.com>
2004-03-29 23:18 ` Denis Vlasenko
2004-03-31 16:39 ` fealnx oopses (with [PATCH]) Denis Vlasenko
2004-03-31 19:24 ` Andreas Henriksson
2004-03-31 20:38 ` Denis Vlasenko
2004-03-31 20:53 ` Jeff Garzik
2004-03-31 22:23 ` Francois Romieu
2004-04-01 11:09 ` Denis Vlasenko
2004-04-01 12:28 ` Francois Romieu
2004-03-29 7:52 ` fealnx oopses Denis Vlasenko
2004-03-26 20:20 ` Francois Romieu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).