e100 problems in .23rc8 ?

* e100 problems in .23rc8 ?
@ 2007-09-26 15:04 Dave Jones
  2007-09-26 18:10 ` Kok, Auke
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Jones @ 2007-09-26 15:04 UTC (permalink / raw)
  To: netdev

Last night, I hit this bug during boot up..
http://www.codemonkey.org.uk/junk/e100-2.jpg

This morning, I got a mail from a Fedora user of the same
.23-rc8 based kernel that has seen a different trace
also implicating e100..

http://www.codemonkey.org.uk/junk/e100.jpg

It may be that the two problems are unrelated, and it's
just coincidence that both reports happen to be on an e100,
but the timing is odd.  Have there been other reports
of similar problems recently ?

	Dave

-- 
http://www.codemonkey.org.uk

* Re: e100 problems in .23rc8 ?
  2007-09-26 15:04 e100 problems in .23rc8 ? Dave Jones
@ 2007-09-26 18:10 ` Kok, Auke
  2007-09-26 18:18   ` Dave Jones
  2007-09-27  6:58   ` Herbert Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Kok, Auke @ 2007-09-26 18:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev

Dave Jones wrote:
> Last night, I hit this bug during boot up..
> http://www.codemonkey.org.uk/junk/e100-2.jpg
> 
> This morning, I got a mail from a Fedora user of the same
> .23-rc8 based kernel that has seen a different trace
> also implicating e100..
> 
> http://www.codemonkey.org.uk/junk/e100.jpg
> 
> It may be that the two problems are unrelated, and it's
> just coincidence that both reports happen to be on an e100,
> but the timing is odd.  Have there been other reports
> of similar problems recently ?

There hasn't been a change to e100 in two months now. Perhaps something slipped
into the stack that broke it? If this reproduces, could you bisect?

Auke

* Re: e100 problems in .23rc8 ?
  2007-09-26 18:10 ` Kok, Auke
@ 2007-09-26 18:18   ` Dave Jones
  2007-09-27  6:58   ` Herbert Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Dave Jones @ 2007-09-26 18:18 UTC (permalink / raw)
  To: Kok, Auke; +Cc: netdev

On Wed, Sep 26, 2007 at 11:10:11AM -0700, Kok, Auke wrote:
 > Dave Jones wrote:
 > > Last night, I hit this bug during boot up..
 > > http://www.codemonkey.org.uk/junk/e100-2.jpg
 > > 
 > > This morning, I got a mail from a Fedora user of the same
 > > .23-rc8 based kernel that has seen a different trace
 > > also implicating e100..
 > > 
 > > http://www.codemonkey.org.uk/junk/e100.jpg
 > > 
 > > It may be that the two problems are unrelated, and it's
 > > just coincidence that both reports happen to be on an e100,
 > > but the timing is odd.  Have there been other reports
 > > of similar problems recently ?
 > 
 > there hasn't been a change to e100 in two months now - perhaps something slipped
 > into the stack that broke it? If this reproduces, could you bisect?

Yeah, I notice only 3 changes to e100 since .22
I'll see if I can reproduce the first one and bisect.

	Dave

-- 
http://www.codemonkey.org.uk

* Re: e100 problems in .23rc8 ?
  2007-09-26 18:10 ` Kok, Auke
  2007-09-26 18:18   ` Dave Jones
@ 2007-09-27  6:58   ` Herbert Xu
  2007-10-11  0:36     ` Dave Jones
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2007-09-27  6:58 UTC (permalink / raw)
  To: Kok, Auke; +Cc: davej, netdev

Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> Dave Jones wrote:
>> Last night, I hit this bug during boot up..
>> http://www.codemonkey.org.uk/junk/e100-2.jpg
>> 
>> This morning, I got a mail from a Fedora user of the same
>> .23-rc8 based kernel that has seen a different trace
>> also implicating e100..
>> 
>> http://www.codemonkey.org.uk/junk/e100.jpg
>> 
>> It may be that the two problems are unrelated, and it's
>> just coincidence that both reports happen to be on an e100,
>> but the timing is odd.  Have there been other reports
>> of similar problems recently ?
> 
> there hasn't been a change to e100 in two months now - perhaps something slipped
> into the stack that broke it? If this reproduces, could you bisect?

Well this looks exactly like the e1000 race that we fixed around
the time of the last kernel release.  That fix never made it into
e100 so it's no surprise that we get a similar crash here.

The problem is that if a spurious interrupt comes in between
request_irq and netif_poll_enable then you'll get a crash at
the next netif_rx_complete.

It'd be good if this were reproducible as it would allow us
to identify the source of the spurious interrupt, which may
well be caused by an unrelated bug somewhere else.

In any case, e100 should be prepared to deal with spurious
interrupts as e1000 has been fixed to do.
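
A rough user-space model of that window, for illustration only. The
struct and function names are hypothetical stand-ins, not kernel code.
The behaviour they mimic (netif_poll_enable clearing the RX_SCHED state
bit unconditionally, and netif_rx_complete hitting a BUG when that bit
is not set) is an assumption about the 2.6.23 helpers, not something
taken from the e100 source:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of net_device state involved. */
struct fake_dev {
	bool rx_sched;      /* models the RX_SCHED bit in dev->state */
	bool on_poll_list;  /* models membership of the softirq poll list */
};

/* Models the interrupt handler scheduling a poll.
 * Returns false where the real code would add the device to the
 * poll list a second time. */
bool fake_irq(struct fake_dev *d)
{
	if (d->rx_sched)
		return true;          /* already scheduled, nothing to do */
	d->rx_sched = true;
	if (d->on_poll_list)
		return false;         /* double add: list corruption */
	d->on_poll_list = true;
	return true;
}

/* Models netif_poll_enable(): assumed to clear the bit unconditionally. */
void fake_poll_enable(struct fake_dev *d)
{
	d->rx_sched = false;
}

/* Models netif_rx_complete(): returns false where the kernel would BUG. */
bool fake_rx_complete(struct fake_dev *d)
{
	if (!d->rx_sched)
		return false;
	d->on_poll_list = false;
	d->rx_sched = false;
	return true;
}

/* Models the open path followed by the first poll. spurious_irq selects
 * whether an interrupt lands between request_irq and netif_poll_enable. */
bool fake_open_then_poll(bool spurious_irq)
{
	struct fake_dev d = { false, false };

	/* request_irq() would be called here */
	if (spurious_irq)
		fake_irq(&d);         /* device is queued for polling */
	fake_poll_enable(&d);         /* ...and the bit is cleared under it */
	if (!d.on_poll_list)
		fake_irq(&d);         /* ordinary first interrupt after open */
	return fake_rx_complete(&d);
}
```

In this model fake_open_then_poll(false) completes normally, while
fake_open_then_poll(true) ends in the state where the real
netif_rx_complete would BUG.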

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: e100 problems in .23rc8 ?
  2007-09-27  6:58   ` Herbert Xu
@ 2007-10-11  0:36     ` Dave Jones
  2007-10-11  1:25       ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Jones @ 2007-10-11  0:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Kok, Auke, netdev, esandeen, dmack

On Thu, Sep 27, 2007 at 02:58:27PM +0800, Herbert Xu wrote:
 > Kok, Auke <auke-jan.h.kok@intel.com> wrote:
 > > Dave Jones wrote:
 > >> Last night, I hit this bug during boot up..
 > >> http://www.codemonkey.org.uk/junk/e100-2.jpg
 > >> 
 > >> This morning, I got a mail from a Fedora user of the same
 > >> .23-rc8 based kernel that has seen a different trace
 > >> also implicating e100..
 > >> 
 > >> http://www.codemonkey.org.uk/junk/e100.jpg
 > >> 
 > >> It may be that the two problems are unrelated, and it's
 > >> just coincidence that both reports happen to be on an e100,
 > >> but the timing is odd.  Have there been other reports
 > >> of similar problems recently ?
 > > 
 > > there hasn't been a change to e100 in two months now - perhaps something slipped
 > > into the stack that broke it? If this reproduces, could you bisect?

So I looked into this some more, after it reared its head again.
The problem with bisecting it is that it doesn't happen on every
boot, so it's difficult to determine the good/bad state.
I've never managed to reproduce it on 2.6.22 however.

 > Well this looks exactly like the e1000 race that we fixed around
 > the time of the last kernel release.  That fix never made it into
 > e100 so it's no surprise that we get a similar crash here.

We're starting to see more reports of this from Fedora users
now that 2.6.23 is final.  Once we push that as an update
for Fedora 7 users, it's likely we'll see even more.
(likewise for the soon-to-be released F8, based on 2.6.23)

The e1000 changes you reference above, is this the changeset you mean?

commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
Author: Auke Kok <auke-jan.h.kok@intel.com>
Date:   Fri Jun 1 10:22:39 2007 -0700

    e1000: disable polling before registering netdevice


 > The problem is that if a spurious interrupt comes in between
 > request_irq and netif_poll_enable then you'll get a crash at
 > the next netif_rx_complete.
 > 
 > It'd be good if this were reproducible as it would allow us
 > to identify the source of the spurious interrupt, which may
 > well be caused by an unrelated bug somewhere else.
 > 
 > In any case, e100 should be prepared to deal with spurious
 > interrupts as e1000 has been fixed to do.

Adding some of the other reporters of this bug to Cc,
in case they've found this more reproducible than I have
(maybe they'll have more luck bisecting).

	Dave

-- 
http://www.codemonkey.org.uk

* Re: e100 problems in .23rc8 ?
  2007-10-11  0:36     ` Dave Jones
@ 2007-10-11  1:25       ` Herbert Xu
  2007-10-11 16:10         ` Kok, Auke
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2007-10-11  1:25 UTC (permalink / raw)
  To: Dave Jones; +Cc: Kok, Auke, netdev, esandeen, dmack

On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
> 
> The e1000 changes you reference above, is this the changeset you mean?
> 
> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
> Author: Auke Kok <auke-jan.h.kok@intel.com>
> Date:   Fri Jun 1 10:22:39 2007 -0700
> 
>     e1000: disable polling before registering netdevice

Yep.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: e100 problems in .23rc8 ?
  2007-10-11  1:25       ` Herbert Xu
@ 2007-10-11 16:10         ` Kok, Auke
  2007-10-11 17:25           ` Dave Jones
  2007-10-11 23:24           ` Herbert Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Kok, Auke @ 2007-10-11 16:10 UTC (permalink / raw)
  To: Herbert Xu, Dave Jones; +Cc: Kok, Auke, netdev, esandeen, dmack

Herbert Xu wrote:
> On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
>> The e1000 changes you reference above, is this the changeset you mean?
>>
>> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
>> Author: Auke Kok <auke-jan.h.kok@intel.com>
>> Date:   Fri Jun 1 10:22:39 2007 -0700
>>
>>     e1000: disable polling before registering netdevice
> 
> Yep.

This patch actually called napi_disable() in the probe routine, which was wrong,
but e100 does not do that. Nonetheless, e100 doesn't call netif_carrier_off() and
netif_stop_queue(), so to make e100 match e1000 we should probably add those calls;
see below.

Dave, can you see if this resolves the issue for you? If so then we might want to
push this to -stable.

Auke


---
e100: disable netdevice explicitly to avoid rx irq oops

Several reported OOPS messages suggest that e100 has a race that was fixed in
e1000 before where incoming interrupts trigger an OOPS immediately after probe()
finishes.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 280313b..ded5f68 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2682,6 +2682,10 @@ static int __devinit e100_probe(struct pci_dev *pdev,
 	if (err)
 		DPRINTK(PROBE, ERR, "Error clearing wake event\n");

+	/* tell the stack to leave us alone until e100_open() is called */
+	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
+
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");

* Re: e100 problems in .23rc8 ?
  2007-10-11 16:10         ` Kok, Auke
@ 2007-10-11 17:25           ` Dave Jones
  2007-10-11 18:56             ` Eric Sandeen
  2007-10-12 14:54             ` David Mack
  2007-10-11 23:24           ` Herbert Xu
  1 sibling, 2 replies; 27+ messages in thread
From: Dave Jones @ 2007-10-11 17:25 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Herbert Xu, netdev, esandeen, dmack

On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote:
 > Herbert Xu wrote:
 > > On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
 > >> The e1000 changes you reference above, is this the changeset you mean?
 > >>
 > >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
 > >> Author: Auke Kok <auke-jan.h.kok@intel.com>
 > >> Date:   Fri Jun 1 10:22:39 2007 -0700
 > >>
 > >>     e1000: disable polling before registering netdevice
 > > 
 > > Yep.
 > 
 > this patch actually called napi_disable() in the probe routine which was wrong,
 > but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and
 > netif_stop_queue(), so to make e100 the same as e1000 we should probably do this,
 > see below.
 > 
 > Dave, can you see if this resolves the issue for you? If so then we might want to
 > push this to -stable.
 
Will do, thanks Auke.

Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you
want to give it a shot too.  It'll be at
http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done
building in an hour or so.

	Dave

-- 
http://www.codemonkey.org.uk

* Re: e100 problems in .23rc8 ?
  2007-10-11 17:25           ` Dave Jones
@ 2007-10-11 18:56             ` Eric Sandeen
  2007-10-12 14:54             ` David Mack
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2007-10-11 18:56 UTC (permalink / raw)
  To: Dave Jones; +Cc: Kok, Auke, Herbert Xu, netdev, esandeen, dmack

> Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you
> want to give it a shot too.  It'll be at
> http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done
> building in an hour or so.
> 
> 	Dave
> 

Thanks, I'll give it a whirl this evening.  I put a new net card in that 
box 'cause I got tired of resetting it :)

-Eric

* RE: e100 problems in .23rc8 ?
  2007-10-11 17:25           ` Dave Jones
  2007-10-11 18:56             ` Eric Sandeen
@ 2007-10-12 14:54             ` David Mack
  2007-10-12 15:35               ` Herbert Xu
  1 sibling, 1 reply; 27+ messages in thread
From: David Mack @ 2007-10-12 14:54 UTC (permalink / raw)
  To: Dave Jones, Kok, Auke; +Cc: Herbert Xu, netdev, esandeen

[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]

Still no joy here. See attached capture. What's really weird is that it
shows *two* kernel panics, one in e100_poll and one in __list_add.

Dave

> -----Original Message-----
> From: Dave Jones [mailto:davej@redhat.com] 
> Sent: Thursday, October 11, 2007 10:26 AM
> To: Kok, Auke
> Cc: Herbert Xu; netdev@vger.kernel.org; esandeen@redhat.com; 
> David Mack
> Subject: Re: e100 problems in .23rc8 ?
> 
> On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote:
>  > Herbert Xu wrote:
>  > > On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
>  > >> The e1000 changes you reference above, is this the 
> changeset you mean?
>  > >>
>  > >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
>  > >> Author: Auke Kok <auke-jan.h.kok@intel.com>
>  > >> Date:   Fri Jun 1 10:22:39 2007 -0700
>  > >>
>  > >>     e1000: disable polling before registering netdevice
>  > > 
>  > > Yep.
>  > 
>  > this patch actually called napi_disable() in the probe 
> routine which was wrong,
>  > but e100 does not do that. Nonetheless e100 doesn't call 
> netif_carrier_off() and
>  > netif_stop_queue(), so to make e100 the same as e1000 we 
> should probably do this,
>  > see below.
>  > 
>  > Dave, can you see if this resolves the issue for you? If 
> so then we might want to
>  > push this to -stable.
>  
> Will do, thanks Auke.
> 
> Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have 
> this if you
> want to give it a shot too.  It'll be at
> http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done
> building in an hour or so.
> 
> 	Dave
> 
> -- 
> http://www.codemonkey.org.uk
> 

[-- Attachment #2: capture.txt --]
[-- Type: text/plain, Size: 18055 bytes --]

Linux version 2.6.23-6.fc8 (kojibuilder@) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-31)) #1 SMP Thu Oct 11 14:54:16 EDT 2007
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ec000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff8000 (ACPI data)
 BIOS-e820: 000000003fff8000 - 0000000040000000 (ACPI NVS)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
Using x86 segment limits to approximate NX protection
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
  HighMem    229376 ->   262128
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 ->   262128
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP 000FA8D0, 0014 (r0 AMI   )
ACPI: RSDT 3FFF0000, 0028 (r1 AMIINT                10 MSFT       97)
ACPI: FACP 3FFF0030, 0074 (r1 AMIINT                10 MSFT       97)
ACPI: DSDT 3FFF00B0, 2AE4 (r1    VIA   VT8371     1000 MSFT  100000B)
ACPI: FACS 3FFF8000, 0040
ACPI: PM-Timer IO Port: 0x808
Allocating PCI resources starting at 50000000 (gap: 40000000:bfff0000)
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000ec000
swsusp: Registered nosave memory region: 00000000000ec000 - 0000000000100000
Built 1 zonelists in Zone order.  Total pages: 258545
Kernel command line: ro root=LABEL=/1 console=ttyS0,9600n8
Local APIC disabled by BIOS -- you can enable it with "lapic"
Enabling fast FPU save and restore... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0814000 soft=c07f4000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 952.200 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:    8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... CLASSHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 1024 kB
 per task-struct memory footprint: 1680 bytes
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1022396k/1048512k available (2271k kernel code, 25372k reserved, 1174k data, 568k init, 131008k highmem)
virtual kernel memory layout:
    fixmap  : 0xffc53000 - 0xfffff000   (3760 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xf8800000 - 0xff7fe000   ( 111 MB)
    lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
      .init : 0xc0763000 - 0xc07f1000   ( 568 kB)
      .data : 0xc0637e5f - 0xc075da44   (1174 kB)
      .text : 0xc0400000 - 0xc0637e5f   (2271 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
SLUB: Genslabs=22, HWalign=32, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 1907.59 BogoMIPS (lpj=953797)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 64K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 12k freed
ACPI: Core revision 20070126
ACPI: setting ELCR to 0800 (from 0e00)
CPU0: AMD Athlon(tm) Processor stepping 01
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
khelper used greatest stack depth: 3160 bytes left
Booting paravirtualized kernel on bare hardware
Time:  7:47:03  Date: 10/12/07
NET: Registered protocol family 16
khelper used greatest stack depth: 3084 bytes left
No dock devices found.
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfdb71, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S4 S5)
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
Disabling VIA memory write queue (PCI ID 0305, rev 03): [55] 89 & 1f -> 09
PCI quirk: region 0800-08ff claimed by vt82c586 ACPI
PCI quirk: region 0c00-0c7f claimed by vt82c686 HW-mon
PCI quirk: region 0400-040f claimed by vt82c686 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 11 devices
ACPI: ACPI bus type pnp unregistered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
Time: tsc clocksource has been installed.
PCI: Bridge: 0000:00:01.0
  IO window: 8000-8fff
  MEM window: dde00000-dfefffff
  PREFETCH window: cdc00000-ddcfffff
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 65536 (order: 9, 2621440 bytes)
TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
TCP: Hash tables configured (established 65536 bind 65536)
TCP reno registered
checking if image is initramfs... it is
Switched to high resolution mode on CPU 0
Freeing initrd memory: 2921k freed
khelper used greatest stack depth: 2956 bytes left
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
audit: initializing netlink socket (disabled)
audit(1192175220.146:1): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
ksign: Installing public key data
Loading keyring
- Added public key BA5046002C6A482A
- User ID: Red Hat, Inc. (Kernel Module GPG key)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
PCI: VIA PCI bridge detected. Disabling DAC.
PCI: Disabling Via external APIC routing
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Thermal Zone [THRM] (30 C)
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.102
agpgart: Detected VIA Twister-K/KT133x/KM133 chipset
agpgart: AGP aperture is 64M @ 0xe0000000
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
input: Macintosh mouse button emulation as /class/input/input0
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard as /class/input/input1
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
powernow-k8: Processor cpuid 631 not supported
Using IPI No-Shortcut mode
  Magic number: 11:900:773
Freeing unused kernel memory: 568k freed
Write protecting the kernel read-only data: 877k
Red Hat nash version 6.0.19 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading ehci-hcd.ko module
insmod used greatest stack depth: 2696 bytes left
Loading ohci-hcd.ko module
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9
ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
uhci_hcd 0000:00:07.2: UHCI Host Controller
uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:07.2: irq 9, io base 0x0000b800
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:07.3[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
uhci_hcd 0000:00:07.3: UHCI Host Controller
uhci_hcd 0000:00:07.3: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:07.3: irq 9, io base 0x0000bc00
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
insmod used greatest stack depth: 1892 bytes left
Loading mbcache.ko module
Loading jbd.ko module
input: ImPS/2 Generic Wheel Mouse as /class/input/input2
Loading ext3.ko module
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading ata_generic.ko module
Loading pata_via.ko module
scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/66 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001ffa0 irq 14
ata2: PATA max UDMA/66 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001ffa8 irq 15
ata1.00: ATA-5: IC35L060AVER07-0, ER6OA41A, max UDMA/100
ata1.00: 120103200 sectors, multi 16: LBA 
ata1.01: ATA-7: Maxtor 6Y250P0, YAR41BW0, max UDMA/133
ata1.01: 490234752 sectors, multi 16: LBA48 
ata1.00: configured for UDMA/66
ata1.01: configured for UDMA/66
ata2.00: ATAPI: PIONEER DVD-RW  DVR-106D, 1.05, max UDMA/33
ata2.00: configured for UDMA/33
scsi 0:0:0:0: Direct-Access     ATA      IC35L060AVER07-0 ER6O PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 120103200 512-byte hardware sectors (61493 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 120103200 512-byte hardware sectors (61493 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 0:0:1:0: Direct-Access     ATA      Maxtor 6Y250P0   YAR4 PQ: 0 ANSI: 5
sd 0:0:1:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:1:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 0:0:1:0: [sdb] Attached SCSI disk
scsi 1:0:0:0: CD-ROM            PIONEER  DVD-RW  DVR-106D 1.05 PQ: 0 ANSI: 5
insmod used greatest stack depth: 820 bytes left
Waiting for driver initialization.
Trying to resume from LABEL=SWAP-sda2
No suspend signature on swap, not resuming.
Creating root device.
Mounting root filesystem.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
SELinux:  Disabled at runtime.
audit(1192175231.628:2): selinux=0 auid=4294967295
INIT: version 2.86 booting
		Welcome to Fedora 
		Press 'I' to enter interactive startup.
Setting clock  (localtime): Fri Oct 12 07:47:21 PDT 2007 [  OK  ]
Starting udev: [  OK  ]
Loading default keymap (us): [  OK  ]
Setting hostname garnet.leviatron.com:  [  OK  ]
No devices found
Setting up Logical Volume Management:   No volume groups found
[  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda3 
/1: clean, 269209/14731200 files, 2239016/14725580 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1 
/boot: recovering journal
/boot: clean, 50/26104 files, 42169/104388 blocks
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Mounting local filesystems:  [  OK  ]
Enabling local filesystem quotas:  [  OK  ]
Enabling /etc/fstab swaps:  [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Checking for hardware changes [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0...------------[ cut here ]------------
kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [#1] SMP 
Modules linked in: ipv6 dm_mirror dm_multipath dm_mod floppy snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_via82xx gameport snd_via82xx_modem snd_seq_dummy snd_emu10k1 snd_hwdep snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart via686a hwmon snd_rawmidi i2c_viapro snd_timer e100 snd_util_mem snd_seq_device mii button parport_pc snd snd_page_alloc i2c_core soundcore parport sr_mod sg cdrom pata_via ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<f8936cff>]    Not tainted VLI
EFLAGS: 00210046   (2.6.23-6.fc8 #1)
EIP is at e100_poll+0x24e/0x2ba [e100]
eax: 00000016   ebx: 00200246   ecx: 00000234   edx: f7b26000
esi: f7b26600   edi: 00000000   ebp: c07f4fc0   esp: c07f4f84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 1487, ti=c07f4000 task=f73fed60 task.ti=f6c3b000)
Stack: c07f4f9c c0449484 00000001 f73fed60 c07f4fd0 f7b26000 00000010 00000000 
       00000000 00000046 f6c1e000 007f4fbc f7b26000 00000000 c1e588b0 c07f4fe0 
       c05d05fb c1e58880 fffbf2ec 0000012c 00000001 c0751b18 0000000a c07f4ff8 
Call Trace:
 [<c0406463>] show_trace_log_lvl+0x1a/0x2f
 [<c0406513>] show_stack_log_lvl+0x9b/0xa3
 [<c04066d3>] show_registers+0x1b8/0x289
 [<c04068af>] die+0x10b/0x23e
 [<c0634ce8>] do_trap+0x8a/0xa3
 [<c0406ca1>] do_invalid_op+0x88/0x92
 [<c0634ab2>] error_code+0x72/0x78
 [<c05d05fb>] net_rx_action+0xa4/0x1bc
 [<c0432a39>] __do_softirq+0x78/0xff
 [<c04075d4>] do_softirq+0x74/0xf7
 =======================
Code: 42 2c a8 02 75 73 9c 58 8d 04 05 00 00 00 00 90 89 c3 fa 8d 04 05 00 00 00 00 90 90 e8 7d 0a b1 c7 8b 55 d8 8b 42 2c a8 20 75 04 <0f> 0b eb fe 8b 45 d8 05 80 01 00 00 e8 b4 de bc c7 8b 45 d8 90 
EIP: [<f8936cff>] e100_poll+0x24e/0x2ba [e100] SS:ESP 0068:c07f4f84
Kernel panic - not syncing: Fatal exception in interrupt
list_add corruption. prev->next should be next (c1e588b0), but was 00100100. (prev=f7b26180).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [#2] SMP 
Modules linked in: ipv6 dm_mirror dm_multipath dm_mod floppy snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_via82xx gameport snd_via82xx_modem snd_seq_dummy snd_emu10k1 snd_hwdep snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart via686a hwmon snd_rawmidi i2c_viapro snd_timer e100 snd_util_mem snd_seq_device mii button parport_pc snd snd_page_alloc i2c_core soundcore parport sr_mod sg cdrom pata_via ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c0504c70>]    Tainted: G      D VLI
EFLAGS: 00210082   (2.6.23-6.fc8 #1)
EIP is at __list_add+0x4b/0x60
eax: 00000061   ebx: f7b26180   ecx: c042e1f4   edx: f73fed60
esi: 00100100   edi: f7b26600   ebp: c0814f94   esp: c0814f7c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 1487, ti=c0814000 task=f73fed60 task.ti=f6c3b000)
Stack: c06dab51 c1e588b0 00100100 f7b26180 f7b26000 00200046 c0814fa4 c05cd3a7 
       00000000 f7b26000 c0814fc8 f89355c4 00000000 c0814fc0 c052b9d1 f7ffd870 
       f70c6d70 00000000 00000000 c0814fe0 c046214e 0000000b c0752380 0000000b 
Call Trace:
 [<c0406463>] show_trace_log_lvl+0x1a/0x2f
 [<c0406513>] show_stack_log_lvl+0x9b/0xa3
 [<c04066d3>] show_registers+0x1b8/0x289
 [<c04068af>] die+0x10b/0x23e
 [<c0634ce8>] do_trap+0x8a/0xa3
 [<c0406ca1>] do_invalid_op+0x88/0x92
 [<c0634ab2>] error_code+0x72/0x78
 [<c05cd3a7>] __netif_rx_schedule+0x47/0xab
 [<f89355c4>] e100_intr+0x96/0xa5 [e100]
 [<c046214e>] handle_IRQ_event+0x1a/0x4f
 [<c0463696>] handle_level_irq+0x7f/0xc9
 [<c04076e8>] do_IRQ+0x91/0xbd
 =======================
Code: 01 ab 6d c0 e8 88 9a f2 ff 0f 0b eb fe 8b 32 39 ce 74 1c 89 54 24 0c 89 74 24 08 89 4c 24 04 c7 04 24 51 ab 6d c0 e8 66 9a f2 ff <0f> 0b eb fe 89 59 04 89 0b 89 43 04 89 18 83 c4 10 5b 5e 5d c3 
EIP: [<c0504c70>] __list_add+0x4b/0x60 SS:ESP 0068:c0814f7c
Kernel panic - not syncing: Fatal exception in interrupt

* Re: e100 problems in .23rc8 ?
  2007-10-12 14:54             ` David Mack
@ 2007-10-12 15:35               ` Herbert Xu
  2007-10-12 15:51                 ` David Mack
  2007-10-12 17:04                 ` Kok, Auke
  0 siblings, 2 replies; 27+ messages in thread
From: Herbert Xu @ 2007-10-12 15:35 UTC (permalink / raw)
  To: David Mack; +Cc: Dave Jones, Kok, Auke, netdev, esandeen

On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote:
> Still no joy here. See attached capture. What's really weird is that it
> shows *two* kernel panics, one in  e100_poll and one in _list_add.

Yes that's the symptom one would expect from that bug.  We really
need to apply the same fix that was done for e1000.
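
A note on the second trace: the value 00100100 in the "list_add corruption" message matches LIST_POISON1, which list_del() writes into the next pointer of a removed entry. The sketch below is a userspace model only, not the kernel code. All model_* names are made up for illustration, and the double-add sequence is an assumption about one way to reach that state, not a confirmed reconstruction of this crash. It adds the same entry to a list twice, deletes it once, and then shows the debug check in the next add tripping over the poisoned pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for struct list_head. */
struct list_node { struct list_node *next, *prev; };

/* Same numeric values as the kernel's LIST_POISON1/LIST_POISON2. */
#define MODEL_POISON1 ((struct list_node *)(uintptr_t)0x00100100)
#define MODEL_POISON2 ((struct list_node *)(uintptr_t)0x00200200)

/* Like the debug __list_add(): returns false where the kernel would BUG. */
static bool model_list_add(struct list_node *new, struct list_node *prev,
                           struct list_node *next)
{
    if (next->prev != prev || prev->next != next)
        return false;
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return true;
}

static bool model_list_add_tail(struct list_node *new, struct list_node *head)
{
    return model_list_add(new, head->prev, head);
}

/* Like list_del(): unlink, then poison the removed entry's pointers. */
static void model_list_del(struct list_node *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = MODEL_POISON1;
    e->prev = MODEL_POISON2;
}

/*
 * Hypothetical sequence: the same entry is queued twice, removed once,
 * then queued again. Returns the bad prev->next value seen by the
 * failing check, or 0 if the final add unexpectedly succeeds.
 */
static uintptr_t model_stale_tail_next(void)
{
    struct list_node head = { &head, &head };
    struct list_node dev = { 0, 0 };
    bool ok;

    ok = model_list_add_tail(&dev, &head);        /* first schedule */
    ok = model_list_add_tail(&dev, &head) && ok;  /* second schedule, same entry */
    assert(ok);                                   /* neither add is caught */
    (void)ok;

    model_list_del(&dev);                         /* head still points at dev */

    if (model_list_add_tail(&dev, &head))
        return 0;
    return (uintptr_t)head.prev->next;
}
```

In this model the second add is not caught by the consistency check, so the corruption only becomes visible on a later add, which would fit a crash that shows up in __list_add rather than at the point of the original mistake.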

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* RE: e100 problems in .23rc8 ?
  2007-10-12 15:35               ` Herbert Xu
@ 2007-10-12 15:51                 ` David Mack
  2007-10-13  2:35                   ` Herbert Xu
  2007-10-12 17:04                 ` Kok, Auke
  1 sibling, 1 reply; 27+ messages in thread
From: David Mack @ 2007-10-12 15:51 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Dave Jones, Kok, Auke, netdev, esandeen

If I understand the message Dave Jones sent yesterday, the patch you
mention *was* applied to the e100 driver in 2.6.23-6.fc8?

Dave 

> -----Original Message-----
> From: Herbert Xu [mailto:herbert@gondor.apana.org.au] 
> Sent: Friday, October 12, 2007 8:36 AM
> To: David Mack
> Cc: Dave Jones; Kok, Auke; netdev@vger.kernel.org; esandeen@redhat.com
> Subject: Re: e100 problems in .23rc8 ?
> 
> On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote:
> > Still no joy here. See attached capture. What's really 
> weird is that it
> > shows *two* kernel panics, one in  e100_poll and one in _list_add.
> 
> Yes that's the symptom one would expect from that bug.  We really
> need to apply the same fix that was done for e1000.
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> 

* Re: e100 problems in .23rc8 ?
  2007-10-12 15:51                 ` David Mack
@ 2007-10-13  2:35                   ` Herbert Xu
  2007-10-16 14:33                     ` Eric Sandeen
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2007-10-13  2:35 UTC (permalink / raw)
  To: David Mack; +Cc: herbert, davej, auke-jan.h.kok, netdev, esandeen

David Mack <dmack@juniper.net> wrote:
> If I understand the message Dave Jones sent yesterday, the patch you
> mention *was* applied to the e100 driver in 2.6.23-6.fc8?

Nope, he applied a different one which doesn't have the crucial
part to disable NAPI polls before registration.
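
To illustrate why that part matters, here is a small userspace model (not driver code; the model_* names are invented for this sketch). It assumes the 2.6.23 behaviour in which netif_poll_disable() takes the __LINK_STATE_RX_SCHED bit, netif_rx_schedule_prep() only succeeds when that bit is clear, and netif_poll_enable() clears it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model: one state bit plus poll-list membership. */
struct model_dev {
    bool rx_sched;      /* stands in for __LINK_STATE_RX_SCHED */
    bool on_poll_list;  /* stands in for being queued for net_rx_action */
};

/* Like netif_poll_disable() on an idle device: takes the bit. */
static void model_poll_disable(struct model_dev *d)
{
    assert(!d->rx_sched);
    d->rx_sched = true;
}

/* Like netif_poll_enable(): clears the bit unconditionally. */
static void model_poll_enable(struct model_dev *d)
{
    d->rx_sched = false;
}

/* Like the interrupt handler: schedule a poll only if the bit was clear. */
static void model_intr(struct model_dev *d)
{
    if (!d->rx_sched) {
        d->rx_sched = true;       /* netif_rx_schedule_prep() succeeded */
        d->on_poll_list = true;   /* __netif_rx_schedule() */
    }
}

/*
 * Returns true if the device ends up on the poll list with the bit
 * clear, which is the state netif_rx_complete() treats as a BUG.
 */
static bool model_bug_hit(bool disable_before_register)
{
    struct model_dev d = { false, false };

    if (disable_before_register)
        model_poll_disable(&d);   /* in probe, before register_netdev() */

    model_intr(&d);               /* spurious interrupt while opening */
    model_poll_enable(&d);        /* the open path enables polling */

    return d.on_poll_list && !d.rx_sched;
}
```

With polls disabled up front, the early interrupt cannot schedule the device, so the later enable has nothing to undo: model_bug_hit(true) returns false, while model_bug_hit(false) returns true.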

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: e100 problems in .23rc8 ?
  2007-10-13  2:35                   ` Herbert Xu
@ 2007-10-16 14:33                     ` Eric Sandeen
  2007-10-16 14:35                       ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sandeen @ 2007-10-16 14:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Mack, davej, auke-jan.h.kok, netdev

Herbert Xu wrote:
> David Mack <dmack@juniper.net> wrote:
>> If I understand the message Dave Jones sent yesterday, the patch you
>> mention *was* applied to the e100 driver in 2.6.23-6.fc8?
> 
> Nope, he applied a different one which doesn't have the crucial
> part to disable NAPI polls before registration.
> 
> Cheers,

Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without
a problem.  Before, I'd oops every 5 or so times I booted...

I now have another NIC in the box, disabled; I don't think that should
be affecting anything?

-Eric

* Re: e100 problems in .23rc8 ?
  2007-10-16 14:33                     ` Eric Sandeen
@ 2007-10-16 14:35                       ` Herbert Xu
  2007-10-16 15:47                         ` Eric Sandeen
  2007-10-16 16:39                         ` David Mack
  0 siblings, 2 replies; 27+ messages in thread
From: Herbert Xu @ 2007-10-16 14:35 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: David Mack, davej, auke-jan.h.kok, netdev

On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote:
> 
> Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without
> a problem.  Before, I'd oops every 5 or so times I booted...
> 
> I now have another NIC in the box, disabled; I don't think that should
> be affecting anything?

Well the original problem was caused by spurious interrupts on
the IRQ line where your e100 is so it could well be sporadic.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: e100 problems in .23rc8 ?
  2007-10-16 14:35                       ` Herbert Xu
@ 2007-10-16 15:47                         ` Eric Sandeen
  2007-10-16 16:39                         ` David Mack
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2007-10-16 15:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Mack, davej, auke-jan.h.kok, netdev

Herbert Xu wrote:
> On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote:
>> Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without
>> a problem.  Before, I'd oops every 5 or so times I booted...
>>
>> I now have another NIC in the box, disabled; I don't think that should
>> be affecting anything?
> 
> Well the original problem was caused by spurious interrupts on
> the IRQ line where your e100 is so it could well be sporadic.

Hah, well, I took the other NIC out and it didn't survive more than a
couple reboots on that kernel.

Now that I know I can still hit it, I'll do any testing that's needed.

Thanks,

-Eric

* RE: e100 problems in .23rc8 ?
  2007-10-16 14:35                       ` Herbert Xu
  2007-10-16 15:47                         ` Eric Sandeen
@ 2007-10-16 16:39                         ` David Mack
  1 sibling, 0 replies; 27+ messages in thread
From: David Mack @ 2007-10-16 16:39 UTC (permalink / raw)
  To: Herbert Xu, Eric Sandeen; +Cc: davej, auke-jan.h.kok, netdev

My problem is anything but sporadic. I have succeeded in booting a
2.6.23-based kernel exactly once since the roll toward F8 started early
last summer.

Dave

> -----Original Message-----
> From: Herbert Xu [mailto:herbert@gondor.apana.org.au] On 
> Behalf Of Herbert Xu
> Sent: Tuesday, October 16, 2007 7:35 AM
> To: Eric Sandeen
> Cc: David Mack; davej@redhat.com; auke-jan.h.kok@intel.com; 
> netdev@vger.kernel.org
> Subject: Re: e100 problems in .23rc8 ?
> 
> On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote:
> > 
> > Hm... running 2.6.23-6.fc8, I've been through 30+ reboot 
> cycles without
> > a problem.  Before, I'd oops every 5 or so times I booted...
> > 
> > I now have another NIC in the box, disabled; I don't think 
> that should
> > be affecting anything?
> 
> Well the original problem was caused by spurious interrupts on
> the IRQ line where your e100 is so it could well be sporadic.
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> 

* Re: e100 problems in .23rc8 ?
  2007-10-12 15:35               ` Herbert Xu
  2007-10-12 15:51                 ` David Mack
@ 2007-10-12 17:04                 ` Kok, Auke
  2007-10-18 17:51                   ` David Mack
  1 sibling, 1 reply; 27+ messages in thread
From: Kok, Auke @ 2007-10-12 17:04 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Mack, Dave Jones, netdev, esandeen

Herbert Xu wrote:
> On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote:
>> Still no joy here. See attached capture. What's really weird is that it
>> shows *two* kernel panics, one in  e100_poll and one in _list_add.
> 
> Yes that's the symptom one would expect from that bug.  We really
> need to apply the same fix that was done for e1000.

I feared that. It's not the same as that commit that floated around in this thread
and involves some reorganization in the init/probe code, so it's a bit more
involved than just a few lines. I'll need a little bit of time to generate this fix.

Auke

* RE: e100 problems in .23rc8 ?
  2007-10-12 17:04                 ` Kok, Auke
@ 2007-10-18 17:51                   ` David Mack
  2007-10-18 17:59                     ` Kok, Auke
  0 siblings, 1 reply; 27+ messages in thread
From: David Mack @ 2007-10-18 17:51 UTC (permalink / raw)
  To: Kok, Auke, Herbert Xu; +Cc: Dave Jones, netdev, esandeen

It appears that the needed e100 fix made it into the Fedora
2.6.23.1-23.fc8 kernel. Boots reliably now.

Huge thanks and great work, guys.

Dave

> -----Original Message-----
> From: Kok, Auke [mailto:auke-jan.h.kok@intel.com] 
> Sent: Friday, October 12, 2007 10:05 AM
> To: Herbert Xu
> Cc: David Mack; Dave Jones; netdev@vger.kernel.org; 
> esandeen@redhat.com
> Subject: Re: e100 problems in .23rc8 ?
> 
> Herbert Xu wrote:
> > On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote:
> >> Still no joy here. See attached capture. What's really 
> weird is that it
> >> shows *two* kernel panics, one in  e100_poll and one in _list_add.
> > 
> > Yes that's the symptom one would expect from that bug.  We really
> > need to apply the same fix that was done for e1000.
> 
> I feared that. its not the same as that commit that floated 
> around in this thread
> and involves some reorganization in the init/probe code, so 
> it's a bit more
> involved than just a few lines. I'll need a little bit of 
> time to generate this fix.
> 
> Auke
> 

* Re: e100 problems in .23rc8 ?
  2007-10-18 17:51                   ` David Mack
@ 2007-10-18 17:59                     ` Kok, Auke
  2007-10-18 18:17                       ` Chuck Ebbert
                                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Kok, Auke @ 2007-10-18 17:59 UTC (permalink / raw)
  To: Dave Jones; +Cc: David Mack, Herbert Xu, netdev, esandeen

David Mack wrote:
> It appears that the needed e100 fix made it into the Fedora
> 2.6.23.1-23.fc8 kernel. Boots reliably now.
> 
> Huge thanks and great work, guys.


DaveJ, I didn't push anything upstream. Can you verify this now works?

Auke


> 
> Dave
> 
>> -----Original Message-----
>> From: Kok, Auke [mailto:auke-jan.h.kok@intel.com] 
>> Sent: Friday, October 12, 2007 10:05 AM
>> To: Herbert Xu
>> Cc: David Mack; Dave Jones; netdev@vger.kernel.org; 
>> esandeen@redhat.com
>> Subject: Re: e100 problems in .23rc8 ?
>>
>> Herbert Xu wrote:
>>> On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote:
>>>> Still no joy here. See attached capture. What's really 
>> weird is that it
>>>> shows *two* kernel panics, one in  e100_poll and one in _list_add.
>>> Yes that's the symptom one would expect from that bug.  We really
>>> need to apply the same fix that was done for e1000.
>> I feared that. its not the same as that commit that floated 
>> around in this thread
>> and involves some reorganization in the init/probe code, so 
>> it's a bit more
>> involved than just a few lines. I'll need a little bit of 
>> time to generate this fix.
>>
>> Auke
>>

* Re: e100 problems in .23rc8 ?
  2007-10-18 17:59                     ` Kok, Auke
@ 2007-10-18 18:17                       ` Chuck Ebbert
  2007-10-22  1:04                       ` Dave Jones
  2007-10-22 14:44                       ` Chuck Ebbert
  2 siblings, 0 replies; 27+ messages in thread
From: Chuck Ebbert @ 2007-10-18 18:17 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Dave Jones, David Mack, Herbert Xu, netdev, esandeen

On 10/18/2007 01:59 PM, Kok, Auke wrote:
> David Mack wrote:
>> It appears that the needed e100 fix made it into the Fedora
>> 2.6.23.1-23.fc8 kernel. Boots reliably now.
>>
>> Huge thanks and great work, guys.
> 
> 
> DaveJ, I didn't push anything upstream. Can you verify this now works?
> 

We didn't put anything in Fedora recently...


* Re: e100 problems in .23rc8 ?
  2007-10-18 17:59                     ` Kok, Auke
  2007-10-18 18:17                       ` Chuck Ebbert
@ 2007-10-22  1:04                       ` Dave Jones
  2007-10-22  3:10                         ` Herbert Xu
  2007-10-22 14:05                         ` David Mack
  2007-10-22 14:44                       ` Chuck Ebbert
  2 siblings, 2 replies; 27+ messages in thread
From: Dave Jones @ 2007-10-22  1:04 UTC (permalink / raw)
  To: Kok, Auke; +Cc: David Mack, Herbert Xu, netdev, esandeen

On Thu, Oct 18, 2007 at 10:59:59AM -0700, Kok, Auke wrote:
 > David Mack wrote:
 > > It appears that the needed e100 fix made it into the Fedora
 > > 2.6.23.1-23.fc8 kernel. Boots reliably now.
 > > 
 > > Huge thanks and great work, guys.
 > 
 > DaveJ, I didn't push anything upstream. Can you verify this now works?

There were no e100 changes in the kernel above, so David just
got lucky. (The race doesn't always occur, so it sometimes appears
that something got fixed.)

I included the patch below in the latest build, but I've not had
a chance to try it on an e100 box yet.

	Dave

--- linux-2.6.23.noarch/drivers/net/e100.c~	2007-10-18 16:10:40.000000000 -0400
+++ linux-2.6.23.noarch/drivers/net/e100.c	2007-10-18 16:16:02.000000000 -0400
@@ -2682,6 +2682,8 @@ static int __devinit e100_probe(struct p
 	if (err)
 		DPRINTK(PROBE, ERR, "Error clearing wake event\n");
 
+	netif_poll_disable(netdev);
+
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");

-- 
http://www.codemonkey.org.uk

* Re: e100 problems in .23rc8 ?
  2007-10-22  1:04                       ` Dave Jones
@ 2007-10-22  3:10                         ` Herbert Xu
  2007-10-22 14:05                         ` David Mack
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2007-10-22  3:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: Kok, Auke, David Mack, netdev, esandeen

On Sun, Oct 21, 2007 at 09:04:40PM -0400, Dave Jones wrote:
>
> I included the patch below in the latest build, but I've not had
> chance to try it on an e100 box yet..

Looks good to me.  Thanks Dave!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* RE: e100 problems in .23rc8 ?
  2007-10-22  1:04                       ` Dave Jones
  2007-10-22  3:10                         ` Herbert Xu
@ 2007-10-22 14:05                         ` David Mack
  2007-10-22 14:59                           ` Eric Sandeen
  1 sibling, 1 reply; 27+ messages in thread
From: David Mack @ 2007-10-22 14:05 UTC (permalink / raw)
  To: Dave Jones, Kok, Auke; +Cc: Herbert Xu, netdev, esandeen

Then I got very, very lucky, since I have successfully rebooted
2.6.23.1-23.fc8 four times (zero panics) and this is the first time a
2.6.23 kernel has not panicked on me in months.

This does not fill me with confidence in the theory that the panics I've
been seeing are due to a race condition.

Dave

> -----Original Message-----
> From: Dave Jones [mailto:davej@redhat.com] 
> Sent: Sunday, October 21, 2007 6:05 PM
> To: Kok, Auke
> Cc: David Mack; Herbert Xu; netdev@vger.kernel.org; 
> esandeen@redhat.com
> Subject: Re: e100 problems in .23rc8 ?
> 
> On Thu, Oct 18, 2007 at 10:59:59AM -0700, Kok, Auke wrote:
>  > David Mack wrote:
>  > > It appears that the needed e100 fix made it into the Fedora
>  > > 2.6.23.1-23.fc8 kernel. Boots reliably now.
>  > > 
>  > > Huge thanks and great work, guys.
>  > 
>  > DaveJ, I didn't push anything upstream. Can you verify 
> this now works?
> 
> There was no e100 changes in the kernel above, so David just
> got lucky. (The race doesn't always occur, so it sometimes appears
> something got fixed.).
> 
> I included the patch below in the latest build, but I've not had
> chance to try it on an e100 box yet..
> 
> 	Dave
> 
> --- linux-2.6.23.noarch/drivers/net/e100.c~	2007-10-18 
> 16:10:40.000000000 -0400
> +++ linux-2.6.23.noarch/drivers/net/e100.c	2007-10-18 
> 16:16:02.000000000 -0400
> @@ -2682,6 +2682,8 @@ static int __devinit e100_probe(struct p
>  	if (err)
>  		DPRINTK(PROBE, ERR, "Error clearing wake event\n");
>  
> +	netif_poll_disable(netdev);
> +
>  	strcpy(netdev->name, "eth%d");
>  	if((err = register_netdev(netdev))) {
>  		DPRINTK(PROBE, ERR, "Cannot register net 
> device, aborting.\n");
> 
> -- 
> http://www.codemonkey.org.uk
> 

* Re: e100 problems in .23rc8 ?
  2007-10-22 14:05                         ` David Mack
@ 2007-10-22 14:59                           ` Eric Sandeen
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2007-10-22 14:59 UTC (permalink / raw)
  To: David Mack; +Cc: Dave Jones, Kok, Auke, Herbert Xu, netdev, esandeen

David Mack wrote:
> Then I got very, very lucky, since I have successfully rebooted
> 2.6.23.1-23.fc8 four times (zero panics) and this is the first time a
> 2.6.23 kernel has not panicked on me in months.
> 
> This does not fill me with confidence in the theory that the panics I've
> been seeing are due to a race condition.

I'll agree with the testing results, at least.  I booted successfully
*60* times with 2.6.23.1-23.fc8, whereas the stock F8test3 kernel would
oops every 5 or 6 boots.

It may well be a race, but if so something is apparently opening/closing
the window on us! :)

-Eric

* Re: e100 problems in .23rc8 ?
  2007-10-18 17:59                     ` Kok, Auke
  2007-10-18 18:17                       ` Chuck Ebbert
  2007-10-22  1:04                       ` Dave Jones
@ 2007-10-22 14:44                       ` Chuck Ebbert
  2 siblings, 0 replies; 27+ messages in thread
From: Chuck Ebbert @ 2007-10-22 14:44 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Dave Jones, David Mack, Herbert Xu, netdev, esandeen

On 10/18/2007 01:59 PM, Kok, Auke wrote:
> David Mack wrote:
>> It appears that the needed e100 fix made it into the Fedora
>> 2.6.23.1-23.fc8 kernel. Boots reliably now.
>>
>> Huge thanks and great work, guys.
> 
> 
> DaveJ, I didn't push anything upstream. Can you verify this now works?
> 

One of our users just posted this:


We observed the same panic on a Dell Dimension 5150 (E510), although not
limited to warm boots.  We noticed that the following trace is possible:

- when starting the interface, e100_up() gets called
- it calls e100_hw_init(), which disables e100 IRQ generation
  (e100_disable_irq())
- it registers the interrupt handler
- the interrupt handler (e100_intr()) gets called - this happens because the
  IRQ line is shared with another device (in this case, the SATA controller)
- the interrupt handler examines the stat_ack register of the interface: even
  though interrupts are disabled, an event is indicated and the interrupt
  handler proceeds
- the interrupt handler calls netif_rx_schedule_prep(), which sets the
  __LINK_STATE_RX_SCHED bit, and __netif_rx_schedule(), which adds the
  interface to the poll list
- when the interrupt handler returns, e100_up() calls netif_poll_enable(),
  thus clearing the __LINK_STATE_RX_SCHED bit
- now the NET RX softirq (net_rx_action) calls e100_poll(), which in turn
  calls netif_rx_complete()
- netif_rx_complete() checks whether the __LINK_STATE_RX_SCHED bit is set and
  triggers the panic
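
The ordering above can be replayed in a small userspace model. This is only a sketch: the model_* names are invented here, the struct reduces the device to one state bit and a poll-list flag, and nothing in it touches real hardware or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one state bit and a poll-list membership flag. */
struct model_dev {
    bool rx_sched;      /* stands in for __LINK_STATE_RX_SCHED */
    bool on_poll_list;  /* stands in for being on the softirq poll list */
};

/* Like netif_rx_schedule_prep(): succeeds only if the bit was clear. */
static bool model_schedule_prep(struct model_dev *d)
{
    if (d->rx_sched)
        return false;
    d->rx_sched = true;
    return true;
}

/* Like e100_intr() reacting to an interrupt on the shared line. */
static void model_intr(struct model_dev *d)
{
    if (model_schedule_prep(d))
        d->on_poll_list = true;   /* __netif_rx_schedule() */
}

/* Like netif_poll_enable(): clears the bit unconditionally. */
static void model_poll_enable(struct model_dev *d)
{
    d->rx_sched = false;
}

/* Like netif_rx_complete(): returns false where the kernel would BUG. */
static bool model_rx_complete(struct model_dev *d)
{
    if (!d->rx_sched)
        return false;
    d->on_poll_list = false;
    d->rx_sched = false;
    return true;
}

/* Replays the reported ordering; true means the BUG condition is hit. */
static bool model_replay_race(void)
{
    struct model_dev d = { false, false };

    model_intr(&d);          /* shared-IRQ interrupt arrives during e100_up() */
    assert(d.on_poll_list);  /* the device is now queued for polling */
    model_poll_enable(&d);   /* e100_up() continues and clears the bit */

    /* The softirq polls the device because it is still on the list. */
    return d.on_poll_list && !model_rx_complete(&d);
}
```

Calling model_replay_race() returns true: the device is still on the poll list, but the bit that netif_rx_complete() insists on has already been cleared.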

To avoid this situation, where the interrupt handler executes although e100
interrupts are disabled, we suggest the attached patch.  It lets the interrupt
handler check the interrupt mask bit before proceeding with the interrupt
handling.


Authors: Christof Efkemann <chref@tzi.de>, Kai Thomsen <kthomsen@tzi.de>
Description:
Avoid interrupt handler execution if e100 interrupts are disabled.
Checks the interrupt mask bit before proceeding with the interrupt handling.

--- drivers/net/e100.c.old	2007-10-20 18:32:40.000000000 +0200
+++ drivers/net/e100.c	2007-10-20 18:36:02.000000000 +0200
@@ -1960,11 +1960,13 @@
 	struct net_device *netdev = dev_id;
 	struct nic *nic = netdev_priv(netdev);
 	u8 stat_ack = ioread8(&nic->csr->scb.stat_ack);
+	u8 cmd_hi = ioread8(&nic->csr->scb.cmd_hi);
 
 	DPRINTK(INTR, DEBUG, "stat_ack = 0x%02X\n", stat_ack);
 
 	if(stat_ack == stat_ack_not_ours ||	/* Not our interrupt */
-	   stat_ack == stat_ack_not_present)	/* Hardware is ejected */
+	   stat_ack == stat_ack_not_present ||	/* Hardware is ejected */
+	   cmd_hi & irq_mask_all)		/* Interrupts masked */
 		return IRQ_NONE;
 
 	/* Ack interrupt(s) */

* Re: e100 problems in .23rc8 ?
  2007-10-11 16:10         ` Kok, Auke
  2007-10-11 17:25           ` Dave Jones
@ 2007-10-11 23:24           ` Herbert Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2007-10-11 23:24 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Dave Jones, netdev, esandeen, dmack

On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote:
> >>
> >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
> >> Author: Auke Kok <auke-jan.h.kok@intel.com>
> >> Date:   Fri Jun 1 10:22:39 2007 -0700
> >>
> >>     e1000: disable polling before registering netdevice
> 
> this patch actually called napi_disable() in the probe routine which was wrong,
> but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and
> netif_stop_queue(), so to make e100 the same as e1000 we should probably do this,
> see below.

Back then we didn't have napi_disable at all.  That patch calls
netif_poll_disable which has different semantics.

> Dave, can you see if this resolves the issue for you? If so then we might want to
> push this to -stable.

The problem is with netif_poll so this patch probably won't help.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thread overview: 27+ messages
2007-09-26 15:04 e100 problems in .23rc8 ? Dave Jones
2007-09-26 18:10 ` Kok, Auke
2007-09-26 18:18   ` Dave Jones
2007-09-27  6:58   ` Herbert Xu
2007-10-11  0:36     ` Dave Jones
2007-10-11  1:25       ` Herbert Xu
2007-10-11 16:10         ` Kok, Auke
2007-10-11 17:25           ` Dave Jones
2007-10-11 18:56             ` Eric Sandeen
2007-10-12 14:54             ` David Mack
2007-10-12 15:35               ` Herbert Xu
2007-10-12 15:51                 ` David Mack
2007-10-13  2:35                   ` Herbert Xu
2007-10-16 14:33                     ` Eric Sandeen
2007-10-16 14:35                       ` Herbert Xu
2007-10-16 15:47                         ` Eric Sandeen
2007-10-16 16:39                         ` David Mack
2007-10-12 17:04                 ` Kok, Auke
2007-10-18 17:51                   ` David Mack
2007-10-18 17:59                     ` Kok, Auke
2007-10-18 18:17                       ` Chuck Ebbert
2007-10-22  1:04                       ` Dave Jones
2007-10-22  3:10                         ` Herbert Xu
2007-10-22 14:05                         ` David Mack
2007-10-22 14:59                           ` Eric Sandeen
2007-10-22 14:44                       ` Chuck Ebbert
2007-10-11 23:24           ` Herbert Xu
