* e100 problems in .23rc8 ? @ 2007-09-26 15:04 Dave Jones 2007-09-26 18:10 ` Kok, Auke 0 siblings, 1 reply; 27+ messages in thread From: Dave Jones @ 2007-09-26 15:04 UTC (permalink / raw) To: netdev Last night, I hit this bug during boot up.. http://www.codemonkey.org.uk/junk/e100-2.jpg This morning, I got a mail from a Fedora user of the same .23-rc8 based kernel that has seen a different trace also implicating e100.. http://www.codemonkey.org.uk/junk/e100.jpg It may be that the two problems are unrelated, and it's just coincidence that both reports happen to be on an e100, but the timing is odd. Have there been other reports of similar problems recently ? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-09-26 15:04 e100 problems in .23rc8 ? Dave Jones @ 2007-09-26 18:10 ` Kok, Auke 2007-09-26 18:18 ` Dave Jones 2007-09-27 6:58 ` Herbert Xu 0 siblings, 2 replies; 27+ messages in thread From: Kok, Auke @ 2007-09-26 18:10 UTC (permalink / raw) To: Dave Jones; +Cc: netdev Dave Jones wrote: > Last night, I hit this bug during boot up.. > http://www.codemonkey.org.uk/junk/e100-2.jpg > > This morning, I got a mail from a Fedora user of the same > .23-rc8 based kernel that has seen a different trace > also implicating e100.. > > http://www.codemonkey.org.uk/junk/e100.jpg > > It may be that the two problems are unrelated, and it's > just coincidence that both reports happen to be on an e100, > but the timing is odd. Have there been other reports > of similar problems recently ? there hasn't been a change to e100 in two months now - perhaps something slipped into the stack that broke it? If this reproduces, could you bisect? Auke ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-09-26 18:10 ` Kok, Auke @ 2007-09-26 18:18 ` Dave Jones 2007-09-27 6:58 ` Herbert Xu 1 sibling, 0 replies; 27+ messages in thread From: Dave Jones @ 2007-09-26 18:18 UTC (permalink / raw) To: Kok, Auke; +Cc: netdev On Wed, Sep 26, 2007 at 11:10:11AM -0700, Kok, Auke wrote: > Dave Jones wrote: > > Last night, I hit this bug during boot up.. > > http://www.codemonkey.org.uk/junk/e100-2.jpg > > > > This morning, I got a mail from a Fedora user of the same > > .23-rc8 based kernel that has seen a different trace > > also implicating e100.. > > > > http://www.codemonkey.org.uk/junk/e100.jpg > > > > It may be that the two problems are unrelated, and it's > > just coincidence that both reports happen to be on an e100, > > but the timing is odd. Have there been other reports > > of similar problems recently ? > > there hasn't been a change to e100 in two months now - perhaps something slipped > into the stack that broke it? If this reproduces, could you bisect? Yeah, I notice only 3 changes to e100 since .22 I'll see if I can reproduce the first one and bisect. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ?
  2007-09-26 18:10 ` Kok, Auke
  2007-09-26 18:18   ` Dave Jones
@ 2007-09-27 6:58   ` Herbert Xu
  2007-10-11 0:36     ` Dave Jones
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2007-09-27 6:58 UTC
To: Kok, Auke; +Cc: davej, netdev

Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> Dave Jones wrote:
>> Last night, I hit this bug during boot up..
>> http://www.codemonkey.org.uk/junk/e100-2.jpg
>>
>> This morning, I got a mail from a Fedora user of the same
>> .23-rc8 based kernel that has seen a different trace
>> also implicating e100..
>>
>> http://www.codemonkey.org.uk/junk/e100.jpg
>>
>> It may be that the two problems are unrelated, and it's
>> just coincidence that both reports happen to be on an e100,
>> but the timing is odd.  Have there been other reports
>> of similar problems recently ?
>
> there hasn't been a change to e100 in two months now - perhaps something slipped
> into the stack that broke it? If this reproduces, could you bisect?

Well this looks exactly like the e1000 race that we fixed around
the time of the last kernel release.  That fix never made it into
e100 so it's no surprise that we get a similar crash here.

The problem is that if a spurious interrupt comes in between
request_irq and netif_poll_enable then you'll get a crash at
the next netif_rx_complete.

It'd be good if this were reproducible as it would allow us
to identify the source of the spurious interrupt, which may
well be caused by an unrelated bug somewhere else.

In any case, e100 should be prepared to deal with spurious
interrupts as e1000 has been fixed to do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
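The ordering Herbert describes can be replayed in a small userspace model. This is only a sketch under stated assumptions, not kernel code: model_dev, model_intr, model_poll_enable, model_rx_complete and model_open are hypothetical names, and one boolean stands in for the device's "poll scheduled" state bit, which (as far as the 2.6.23-era helpers are understood here) netif_poll_enable clears and netif_rx_complete insists on finding set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of net_device state that matter here. */
struct model_dev {
	bool rx_sched;	/* models the "poll scheduled" state bit */
	bool on_list;	/* models membership of the softirq poll list */
};

/* Interrupt path: queue a poll only if the bit is not already taken. */
void model_intr(struct model_dev *d)
{
	if (!d->rx_sched) {
		d->rx_sched = true;
		d->on_list = true;
	}
}

/* Open path: re-enabling polling drops the bit unconditionally. */
void model_poll_enable(struct model_dev *d)
{
	d->rx_sched = false;
}

/* End of a poll: returns false where the real helper would BUG(). */
bool model_rx_complete(struct model_dev *d)
{
	if (!d->rx_sched)
		return false;
	d->on_list = false;
	d->rx_sched = false;
	return true;
}

/*
 * Replays the open sequence. The IRQ handler is assumed to be live from
 * the start (request_irq has returned). Returns true if the poll that
 * eventually runs completes sanely.
 */
bool model_open(bool spurious_irq)
{
	struct model_dev d = { .rx_sched = false, .on_list = false };

	if (spurious_irq)
		model_intr(&d);		/* fires before polling is enabled */
	model_poll_enable(&d);		/* clears the bit under a queued poll */
	if (!spurious_irq)
		model_intr(&d);		/* first interrupt arrives after enable */

	/* The softirq now runs the queued poll, which ends in completion. */
	return model_rx_complete(&d);
}
```

In this model, model_open(false) returns true, while model_open(true) returns false: the queued poll finds its bit already cleared, which is the case that corresponds to the crash at netif_rx_complete.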
* Re: e100 problems in .23rc8 ? 2007-09-27 6:58 ` Herbert Xu @ 2007-10-11 0:36 ` Dave Jones 2007-10-11 1:25 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Dave Jones @ 2007-10-11 0:36 UTC (permalink / raw) To: Herbert Xu; +Cc: Kok, Auke, netdev, esandeen, dmack On Thu, Sep 27, 2007 at 02:58:27PM +0800, Herbert Xu wrote: > Kok, Auke <auke-jan.h.kok@intel.com> wrote: > > Dave Jones wrote: > >> Last night, I hit this bug during boot up.. > >> http://www.codemonkey.org.uk/junk/e100-2.jpg > >> > >> This morning, I got a mail from a Fedora user of the same > >> .23-rc8 based kernel that has seen a different trace > >> also implicating e100.. > >> > >> http://www.codemonkey.org.uk/junk/e100.jpg > >> > >> It may be that the two problems are unrelated, and it's > >> just coincidence that both reports happen to be on an e100, > >> but the timing is odd. Have there been other reports > >> of similar problems recently ? > > > > there hasn't been a change to e100 in two months now - perhaps something slipped > > into the stack that broke it? If this reproduces, could you bisect? So I looked into this some more, after it reared its head again. The problem with bisecting it is that it doesn't happen on every boot, so it's difficult to determine the good/bad state. I've never managed to reproduce it on 2.6.22 however. > Well this looks exactly like the e1000 race that we fixed around > the time of the last kernel release. That fix never made it into > e100 so it's no surprise that we get a similar crash here. We're starting to see more reports of this from Fedora users now that 2.6.23 is final. Once we push that as an update for Fedora 7 users, it's likely we'll see even more. (likewise for the soon-to-be released F8, based on 2.6.23) The e1000 changes you reference above, is this the changeset you mean? 

commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
Author: Auke Kok <auke-jan.h.kok@intel.com>
Date:   Fri Jun 1 10:22:39 2007 -0700

    e1000: disable polling before registering netdevice

 > The problem is that if a spurious interrupt comes in between
 > request_irq and netif_poll_enable then you'll get a crash at
 > the next netif_rx_complete.
 >
 > It'd be good if this were reproducible as it would allow us
 > to identify the source of the spurious interrupt, which may
 > well be caused by an unrelated bug somewhere else.
 >
 > In any case, e100 should be prepared to deal with spurious
 > interrupts as e1000 has been fixed to do.

Adding some of the other reporters of this bug to Cc, in case they've
found this more reproducible than myself (maybe they'll have more luck
bisecting).

	Dave

-- 
http://www.codemonkey.org.uk
* Re: e100 problems in .23rc8 ? 2007-10-11 0:36 ` Dave Jones @ 2007-10-11 1:25 ` Herbert Xu 2007-10-11 16:10 ` Kok, Auke 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2007-10-11 1:25 UTC (permalink / raw) To: Dave Jones; +Cc: Kok, Auke, netdev, esandeen, dmack On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote: > > The e1000 changes you reference above, is this the changeset you mean? > > commit 416b5d10afdc797c21c457ade3714e8f2f75edd9 > Author: Auke Kok <auke-jan.h.kok@intel.com> > Date: Fri Jun 1 10:22:39 2007 -0700 > > e1000: disable polling before registering netdevice Yep. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ?
  2007-10-11 1:25 ` Herbert Xu
@ 2007-10-11 16:10   ` Kok, Auke
  2007-10-11 17:25     ` Dave Jones
  2007-10-11 23:24     ` Herbert Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Kok, Auke @ 2007-10-11 16:10 UTC
To: Herbert Xu, Dave Jones; +Cc: Kok, Auke, netdev, esandeen, dmack

Herbert Xu wrote:
> On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
>> The e1000 changes you reference above, is this the changeset you mean?
>>
>> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
>> Author: Auke Kok <auke-jan.h.kok@intel.com>
>> Date:   Fri Jun 1 10:22:39 2007 -0700
>>
>>     e1000: disable polling before registering netdevice
>
> Yep.

this patch actually called napi_disable() in the probe routine which was wrong,
but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and
netif_stop_queue(), so to make e100 the same as e1000 we should probably do this,
see below.

Dave, can you see if this resolves the issue for you? If so then we might want to
push this to -stable.

Auke

---
e100: disable netdevice explicitly to avoid rx irq oops

Several reported OOPS messages suggest that e100 has a race that
was fixed in e1000 before where incoming interrupts trigger an
OOPS immediately after probe() finishes.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 280313b..ded5f68 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2682,6 +2682,10 @@ static int __devinit e100_probe(struct pci_dev *pdev,
 	if (err)
 		DPRINTK(PROBE, ERR, "Error clearing wake event\n");
 
+	/* tell the stack to leave us alone until e100_open() is called */
+	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
+
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");
* Re: e100 problems in .23rc8 ? 2007-10-11 16:10 ` Kok, Auke @ 2007-10-11 17:25 ` Dave Jones 2007-10-11 18:56 ` Eric Sandeen 2007-10-12 14:54 ` David Mack 2007-10-11 23:24 ` Herbert Xu 1 sibling, 2 replies; 27+ messages in thread From: Dave Jones @ 2007-10-11 17:25 UTC (permalink / raw) To: Kok, Auke; +Cc: Herbert Xu, netdev, esandeen, dmack On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote: > Herbert Xu wrote: > > On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote: > >> The e1000 changes you reference above, is this the changeset you mean? > >> > >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9 > >> Author: Auke Kok <auke-jan.h.kok@intel.com> > >> Date: Fri Jun 1 10:22:39 2007 -0700 > >> > >> e1000: disable polling before registering netdevice > > > > Yep. > > this patch actually called napi_disable() in the probe routine which was wrong, > but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and > netif_stop_queue(), so to make e100 the same as e1000 we should probably do this, > see below. > > Dave, can you see if this resolves the issue for you? If so then we might want to > push this to -stable. Will do, thanks Auke. Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you want to give it a shot too. It'll be at http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done building in an hour or so. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-11 17:25 ` Dave Jones @ 2007-10-11 18:56 ` Eric Sandeen 2007-10-12 14:54 ` David Mack 1 sibling, 0 replies; 27+ messages in thread From: Eric Sandeen @ 2007-10-11 18:56 UTC (permalink / raw) To: Dave Jones; +Cc: Kok, Auke, Herbert Xu, netdev, esandeen, dmack > Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you > want to give it a shot too. It'll be at > http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done > building in an hour or so. > > Dave > Thanks, I'll give it a whirl this evening. I put a new net card in that box 'cause I got tired of resetting it :) -Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: e100 problems in .23rc8 ? 2007-10-11 17:25 ` Dave Jones 2007-10-11 18:56 ` Eric Sandeen @ 2007-10-12 14:54 ` David Mack 2007-10-12 15:35 ` Herbert Xu 1 sibling, 1 reply; 27+ messages in thread From: David Mack @ 2007-10-12 14:54 UTC (permalink / raw) To: Dave Jones, Kok, Auke; +Cc: Herbert Xu, netdev, esandeen [-- Attachment #1: Type: text/plain, Size: 1657 bytes --] Still no joy here. See attached capture. What's really weird is that it shows *two* kernel panics, one in e100_poll and one in _list_add. Dave > -----Original Message----- > From: Dave Jones [mailto:davej@redhat.com] > Sent: Thursday, October 11, 2007 10:26 AM > To: Kok, Auke > Cc: Herbert Xu; netdev@vger.kernel.org; esandeen@redhat.com; > David Mack > Subject: Re: e100 problems in .23rc8 ? > > On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote: > > Herbert Xu wrote: > > > On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote: > > >> The e1000 changes you reference above, is this the > changeset you mean? > > >> > > >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9 > > >> Author: Auke Kok <auke-jan.h.kok@intel.com> > > >> Date: Fri Jun 1 10:22:39 2007 -0700 > > >> > > >> e1000: disable polling before registering netdevice > > > > > > Yep. > > > > this patch actually called napi_disable() in the probe > routine which was wrong, > > but e100 does not do that. Nonetheless e100 doesn't call > netif_carrier_off() and > > netif_stop_queue(), so to make e100 the same as e1000 we > should probably do this, > > see below. > > > > Dave, can you see if this resolves the issue for you? If > so then we might want to > > push this to -stable. > > Will do, thanks Auke. > > Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have > this if you > want to give it a shot too. It'll be at > http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done > building in an hour or so. 
> > Dave > > -- > http://www.codemonkey.org.uk > [-- Attachment #2: capture.txt --] [-- Type: text/plain, Size: 18055 bytes --] Linux version 2.6.23-6.fc8 (kojibuilder@) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-31)) #1 SMP Thu Oct 11 14:54:16 EDT 2007 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000ec000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fff0000 (usable) BIOS-e820: 000000003fff0000 - 000000003fff8000 (ACPI data) BIOS-e820: 000000003fff8000 - 0000000040000000 (ACPI NVS) BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved) 127MB HIGHMEM available. 896MB LOWMEM available. Using x86 segment limits to approximate NX protection Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 229376 HighMem 229376 -> 262128 Movable zone start PFN for each node early_node_map[1] active PFN ranges 0: 0 -> 262128 DMI 2.3 present. Using APIC driver default ACPI: RSDP 000FA8D0, 0014 (r0 AMI ) ACPI: RSDT 3FFF0000, 0028 (r1 AMIINT 10 MSFT 97) ACPI: FACP 3FFF0030, 0074 (r1 AMIINT 10 MSFT 97) ACPI: DSDT 3FFF00B0, 2AE4 (r1 VIA VT8371 1000 MSFT 100000B) ACPI: FACS 3FFF8000, 0040 ACPI: PM-Timer IO Port: 0x808 Allocating PCI resources starting at 50000000 (gap: 40000000:bfff0000) swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000 swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000ec000 swsusp: Registered nosave memory region: 00000000000ec000 - 0000000000100000 Built 1 zonelists in Zone order. Total pages: 258545 Kernel command line: ro root=LABEL=/1 console=ttyS0,9600n8 Local APIC disabled by BIOS -- you can enable it with "lapic" Enabling fast FPU save and restore... done. Initializing CPU#0 CPU 0 irqstacks, hard=c0814000 soft=c07f4000 PID hash table entries: 4096 (order: 12, 16384 bytes) Detected 952.200 MHz processor. 
Console: colour VGA+ 80x25 console [ttyS0] enabled Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 30 ... MAX_LOCKDEP_KEYS: 2048 ... CLASSHASH_SIZE: 1024 ... MAX_LOCKDEP_ENTRIES: 8192 ... MAX_LOCKDEP_CHAINS: 16384 ... CHAINHASH_SIZE: 8192 memory used by lock dependency info: 1024 kB per task-struct memory footprint: 1680 bytes Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1022396k/1048512k available (2271k kernel code, 25372k reserved, 1174k data, 568k init, 131008k highmem) virtual kernel memory layout: fixmap : 0xffc53000 - 0xfffff000 (3760 kB) pkmap : 0xff800000 - 0xffc00000 (4096 kB) vmalloc : 0xf8800000 - 0xff7fe000 ( 111 MB) lowmem : 0xc0000000 - 0xf8000000 ( 896 MB) .init : 0xc0763000 - 0xc07f1000 ( 568 kB) .data : 0xc0637e5f - 0xc075da44 (1174 kB) .text : 0xc0400000 - 0xc0637e5f (2271 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. SLUB: Genslabs=22, HWalign=32, Order=0-1, MinObjects=4, CPUs=1, Nodes=1 Calibrating delay using timer specific routine.. 1907.59 BogoMIPS (lpj=953797) Security Framework v1.0.0 initialized SELinux: Initializing. selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 512 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 64K (64 bytes/line) Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Compat vDSO mapped to ffffe000. Checking 'hlt' instruction... OK. SMP alternatives: switching to UP code Freeing SMP alternatives: 12k freed ACPI: Core revision 20070126 ACPI: setting ELCR to 0800 (from 0e00) CPU0: AMD Athlon(tm) Processor stepping 01 SMP motherboard not detected. Local APIC not detected. Using dummy APIC emulation. 
Brought up 1 CPUs khelper used greatest stack depth: 3160 bytes left Booting paravirtualized kernel on bare hardware Time: 7:47:03 Date: 10/12/07 NET: Registered protocol family 16 khelper used greatest stack depth: 3084 bytes left No dock devices found. ACPI: bus type pci registered PCI: PCI BIOS revision 2.10 entry at 0xfdb71, last bus=1 PCI: Using configuration type 1 Setting up standard PCI resources ACPI: Interpreter enabled ACPI: (supports S0 S1 S4 S5) ACPI: Using PIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) Disabling VIA memory write queue (PCI ID 0305, rev 03): [55] 89 & 1f -> 09 PCI quirk: region 0800-08ff claimed by vt82c586 ACPI PCI quirk: region 0c00-0c7f claimed by vt82c686 HW-mon PCI quirk: region 0400-040f claimed by vt82c686 SMB ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *9 10 11 12 14 15) ACPI: Power Resource [URP1] (off) ACPI: Power Resource [URP2] (off) ACPI: Power Resource [FDDP] (off) ACPI: Power Resource [LPTP] (off) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 11 devices ACPI: ACPI bus type pnp unregistered usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default Time: tsc clocksource has been installed. 
PCI: Bridge: 0000:00:01.0 IO window: 8000-8fff MEM window: dde00000-dfefffff PREFETCH window: cdc00000-ddcfffff NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 65536 (order: 9, 2621440 bytes) TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) TCP: Hash tables configured (established 65536 bind 65536) TCP reno registered checking if image is initramfs... it is Switched to high resolution mode on CPU 0 Freeing initrd memory: 2921k freed khelper used greatest stack depth: 2956 bytes left apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac) apm: overridden by ACPI. audit: initializing netlink socket (disabled) audit(1192175220.146:1): initialized highmem bounce pool size: 64 pages Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) ksign: Installing public key data Loading keyring - Added public key BA5046002C6A482A - User ID: Red Hat, Inc. (Kernel Module GPG key) Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) PCI: VIA PCI bridge detected. Disabling DAC. PCI: Disabling Via external APIC routing pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI: Thermal Zone [THRM] (30 C) isapnp: Scanning for PnP cards... 
isapnp: No Plug & Play device found Real Time Clock Driver v1.12ac Non-volatile memory driver v1.2 Linux agpgart interface v0.102 agpgart: Detected VIA Twister-K/KT133x/KM133 chipset agpgart: AGP aperture is 64M @ 0xe0000000 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled ÿserial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A 00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize input: Macintosh mouse button emulation as /class/input/input0 PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice input: AT Translated Set 2 keyboard as /class/input/input1 cpuidle: using governor menu usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver TCP cubic registered Initializing XFRM netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 powernow-k8: Processor cpuid 631 not supported Using IPI No-Shortcut mode Magic number: 11:900:773 Freeing unused kernel memory: 568k freed Write protecting the kernel read-only data: 877k Red Hat nash version 6.0.19 starting Mounting proc filesystem Mounting sysfs filesystem Creating /dev Creating initial device nodes Setting up hotplug. Creating block device nodes. 
Loinsmod used greatest stack depth: 2696 bytes left ading ehci-hcd.ko module Loading ohci-hcd.ko module Loading uhUSB Universal Host Controller Interface driver v3.0 ci-hcd.ko moduleACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9 ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9 uhci_hcd 0000:00:07.2: UHCI Host Controller uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:07.2: irq 9, io base 0x0000b800 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:07.3[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9 uhci_hcd 0000:00:07.3: UHCI Host Controller uhci_hcd 0000:00:07.3: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:07.3: irq 9, io base 0x0000bc00 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected insmod used greatest stack depth: 1892 bytes left Loading mbcache.ko module Loading jbd.ko minput: ImPS/2 Generic Wheel Mouse as /class/input/input2 odule Loading ext3.ko module Loading scsi_mod.ko module SCSI subsystem initialized Loading sd_mod.ko module Loading libata.ko module Loading ata_generic.ko module Loading pata_via.ko module scsi0 : pata_via scsi1 : pata_via ata1: PATA max UDMA/66 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001ffa0 irq 14 ata2: PATA max UDMA/66 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001ffa8 irq 15 ata1.00: ATA-5: IC35L060AVER07-0, ER6OA41A, max UDMA/100 ata1.00: 120103200 sectors, multi 16: LBA ata1.01: ATA-7: Maxtor 6Y250P0, YAR41BW0, max UDMA/133 ata1.01: 490234752 sectors, multi 16: LBA48 ata1.00: configured for UDMA/66 ata1.01: configured for UDMA/66 ata2.00: ATAPI: PIONEER DVD-RW DVR-106D, 1.05, max UDMA/33 ata2.00: configured for UDMA/33 scsi 0:0:0:0: Direct-Access ATA IC35L060AVER07-0 ER6O PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 120103200 512-byte hardware sectors (61493 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] 
Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 120103200 512-byte hardware sectors (61493 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sd 0:0:0:0: [sda] Attached SCSI disk scsi 0:0:1:0: Direct-Access ATA Maxtor 6Y250P0 YAR4 PQ: 0 ANSI: 5 sd 0:0:1:0: [sdb] 490234752 512-byte hardware sectors (251000 MB) sd 0:0:1:0: [sdb] Write Protect is off sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:1:0: [sdb] 490234752 512-byte hardware sectors (251000 MB) sd 0:0:1:0: [sdb] Write Protect is off sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdb: sdb1 sd 0:0:1:0: [sdb] Attached SCSI disk scsi 1:0:0:0: CD-ROM PIONEER DVD-RW DVR-106D 1.05 PQ: 0 ANSI: 5 insmod used greatest stack depth: 820 bytes left Waiting for driver initialization. Trying to resume from LABEL=SWAP-sda2 No suspend signature on swap, not resuming. Creating root device. Mounting root filesystem. EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. Setting up other filesystems. Setting up new root fs no fstab.sys, mounting internal defaults Switching to new root and running init. unmounting old /dev unmounting old /proc unmounting old /sys SELinux: Disabled at runtime. audit(1192175231.628:2): selinux=0 auid=4294967295 \rINIT: version 2.86 booting Welcome to Fedora Press 'I' to enter interactive startup. Setting clock (localtime): Fri Oct 12 07:47:21 PDT 2007 [ OK ] Starting udev: [ OK ] Loading default keymap (us): [ OK ] Setting hostname garnet.leviatron.com: [ OK ] No devices found Setting up Logical Volume Management: No volume groups found [ OK ] Checking filesystems Checking all file systems. 

[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda3
/1: clean, 269209/14731200 files, 2239016/14725580 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
/boot: recovering journal
/boot: clean, 50/26104 files, 42169/104388 blocks
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling /etc/fstab swaps: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Checking for hardware changes [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0...
------------[ cut here ]------------
kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [#1] SMP
Modules linked in: ipv6 dm_mirror dm_multipath dm_mod floppy snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_via82xx gameport snd_via82xx_modem snd_seq_dummy snd_emu10k1 snd_hwdep snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart via686a hwmon snd_rawmidi i2c_viapro snd_timer e100 snd_util_mem snd_seq_device mii button parport_pc snd snd_page_alloc i2c_core soundcore parport sr_mod sg cdrom pata_via ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<f8936cff>]    Not tainted VLI
EFLAGS: 00210046   (2.6.23-6.fc8 #1)
EIP is at e100_poll+0x24e/0x2ba [e100]
eax: 00000016   ebx: 00200246   ecx: 00000234   edx: f7b26000
esi: f7b26600   edi: 00000000   ebp: c07f4fc0   esp: c07f4f84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 1487, ti=c07f4000 task=f73fed60 task.ti=f6c3b000)
Stack: c07f4f9c c0449484 00000001 f73fed60 c07f4fd0 f7b26000 00000010 00000000
       00000000 00000046 f6c1e000 007f4fbc f7b26000 00000000 c1e588b0 c07f4fe0
       c05d05fb c1e58880 fffbf2ec 0000012c 00000001 c0751b18 0000000a c07f4ff8
Call Trace:
 [<c0406463>] show_trace_log_lvl+0x1a/0x2f
 [<c0406513>] show_stack_log_lvl+0x9b/0xa3
 [<c04066d3>] show_registers+0x1b8/0x289
 [<c04068af>] die+0x10b/0x23e
 [<c0634ce8>] do_trap+0x8a/0xa3
 [<c0406ca1>] do_invalid_op+0x88/0x92
 [<c0634ab2>] error_code+0x72/0x78
 [<c05d05fb>] net_rx_action+0xa4/0x1bc
 [<c0432a39>] __do_softirq+0x78/0xff
 [<c04075d4>] do_softirq+0x74/0xf7
 =======================
Code: 42 2c a8 02 75 73 9c 58 8d 04 05 00 00 00 00 90 89 c3 fa 8d 04 05 00 00 00 00 90 90 e8 7d 0a b1 c7 8b 55 d8 8b 42 2c a8 20 75 04 <0f> 0b eb fe 8b 45 d8 05 80 01 00 00 e8 b4 de bc c7 8b 45 d8 90
EIP: [<f8936cff>] e100_poll+0x24e/0x2ba [e100] SS:ESP 0068:c07f4f84
Kernel panic - not syncing: Fatal exception in interrupt
list_add corruption. prev->next should be next (c1e588b0), but was 00100100. (prev=f7b26180).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [#2] SMP
Modules linked in: ipv6 dm_mirror dm_multipath dm_mod floppy snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_via82xx gameport snd_via82xx_modem snd_seq_dummy snd_emu10k1 snd_hwdep snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart via686a hwmon snd_rawmidi i2c_viapro snd_timer e100 snd_util_mem snd_seq_device mii button parport_pc snd snd_page_alloc i2c_core soundcore parport sr_mod sg cdrom pata_via ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c0504c70>]    Tainted: G D VLI
EFLAGS: 00210082   (2.6.23-6.fc8 #1)
EIP is at __list_add+0x4b/0x60
eax: 00000061   ebx: f7b26180   ecx: c042e1f4   edx: f73fed60
esi: 00100100   edi: f7b26600   ebp: c0814f94   esp: c0814f7c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 1487, ti=c0814000 task=f73fed60 task.ti=f6c3b000)
Stack: c06dab51 c1e588b0 00100100 f7b26180 f7b26000 00200046 c0814fa4 c05cd3a7
       00000000 f7b26000 c0814fc8 f89355c4 00000000 c0814fc0 c052b9d1 f7ffd870
       f70c6d70 00000000 00000000 c0814fe0 c046214e 0000000b c0752380 0000000b
Call Trace:
 [<c0406463>] show_trace_log_lvl+0x1a/0x2f
 [<c0406513>] show_stack_log_lvl+0x9b/0xa3
 [<c04066d3>] show_registers+0x1b8/0x289
 [<c04068af>] die+0x10b/0x23e
 [<c0634ce8>] do_trap+0x8a/0xa3
 [<c0406ca1>] do_invalid_op+0x88/0x92
 [<c0634ab2>] error_code+0x72/0x78
 [<c05cd3a7>] __netif_rx_schedule+0x47/0xab
 [<f89355c4>] e100_intr+0x96/0xa5 [e100]
 [<c046214e>] handle_IRQ_event+0x1a/0x4f
 [<c0463696>] handle_level_irq+0x7f/0xc9
 [<c04076e8>] do_IRQ+0x91/0xbd
 =======================
Code: 01 ab 6d c0 e8 88 9a f2 ff 0f 0b eb fe 8b 32 39 ce 74 1c 89 54 24 0c 89 74 24 08 89 4c 24 04 c7 04 24 51 ab 6d c0 e8 66 9a f2 ff <0f> 0b eb fe 89 59 04 89 0b 89 43 04 89 18 83 c4 10 5b 5e 5d c3
EIP: [<c0504c70>] __list_add+0x4b/0x60 SS:ESP 0068:c0814f7c
Kernel panic - not syncing: Fatal exception in interrupt
* Re: e100 problems in .23rc8 ? 2007-10-12 14:54 ` David Mack @ 2007-10-12 15:35 ` Herbert Xu 2007-10-12 15:51 ` David Mack 2007-10-12 17:04 ` Kok, Auke 0 siblings, 2 replies; 27+ messages in thread From: Herbert Xu @ 2007-10-12 15:35 UTC (permalink / raw) To: David Mack; +Cc: Dave Jones, Kok, Auke, netdev, esandeen On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote: > Still no joy here. See attached capture. What's really weird is that it > shows *two* kernel panics, one in e100_poll and one in _list_add. Yes that's the symptom one would expect from that bug. We really need to apply the same fix that was done for e1000. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: e100 problems in .23rc8 ? 2007-10-12 15:35 ` Herbert Xu @ 2007-10-12 15:51 ` David Mack 2007-10-13 2:35 ` Herbert Xu 2007-10-12 17:04 ` Kok, Auke 1 sibling, 1 reply; 27+ messages in thread From: David Mack @ 2007-10-12 15:51 UTC (permalink / raw) To: Herbert Xu; +Cc: Dave Jones, Kok, Auke, netdev, esandeen If I understand the message Dave Jones sent yesterday, the patch you mention *was* applied to the e100 driver in 2.6.23-6.fc8? Dave > -----Original Message----- > From: Herbert Xu [mailto:herbert@gondor.apana.org.au] > Sent: Friday, October 12, 2007 8:36 AM > To: David Mack > Cc: Dave Jones; Kok, Auke; netdev@vger.kernel.org; esandeen@redhat.com > Subject: Re: e100 problems in .23rc8 ? > > On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote: > > Still no joy here. See attached capture. What's really > weird is that it > > shows *two* kernel panics, one in e100_poll and one in _list_add. > > Yes that's the symptom one would expect from that bug. We really > need to apply the same fix that was done for e1000. > > Cheers, > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-12 15:51 ` David Mack @ 2007-10-13 2:35 ` Herbert Xu 2007-10-16 14:33 ` Eric Sandeen 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2007-10-13 2:35 UTC (permalink / raw) To: David Mack; +Cc: herbert, davej, auke-jan.h.kok, netdev, esandeen David Mack <dmack@juniper.net> wrote: > If I understand the message Dave Jones sent yesterday, the patch you > mention *was* applied to the e100 driver in 2.6.23-6.fc8? Nope, he applied a different one which doesn't have the crucial part to disable NAPI polls before registration. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-13 2:35 ` Herbert Xu @ 2007-10-16 14:33 ` Eric Sandeen 2007-10-16 14:35 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Eric Sandeen @ 2007-10-16 14:33 UTC (permalink / raw) To: Herbert Xu; +Cc: David Mack, davej, auke-jan.h.kok, netdev Herbert Xu wrote: > David Mack <dmack@juniper.net> wrote: >> If I understand the message Dave Jones sent yesterday, the patch you >> mention *was* applied to the e100 driver in 2.6.23-6.fc8? > > Nope, he applied a different one which doesn't have the crucial > part to disable NAPI polls before registration. > > Cheers, Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without a problem. Before, I'd oops every 5 or so times I booted... I now have another NIC in the box, disabled; I don't think that should be affecting anything? -Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-16 14:33 ` Eric Sandeen @ 2007-10-16 14:35 ` Herbert Xu 2007-10-16 15:47 ` Eric Sandeen 2007-10-16 16:39 ` David Mack 0 siblings, 2 replies; 27+ messages in thread From: Herbert Xu @ 2007-10-16 14:35 UTC (permalink / raw) To: Eric Sandeen; +Cc: David Mack, davej, auke-jan.h.kok, netdev On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote: > > Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without > a problem. Before, I'd oops every 5 or so times I booted... > > I now have another NIC in the box, disabled; I don't think that should > be affecting anything? Well the original problem was caused by spurious interrupts on the IRQ line where your e100 is so it could well be sporadic. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-16 14:35 ` Herbert Xu @ 2007-10-16 15:47 ` Eric Sandeen 2007-10-16 16:39 ` David Mack 1 sibling, 0 replies; 27+ messages in thread From: Eric Sandeen @ 2007-10-16 15:47 UTC (permalink / raw) To: Herbert Xu; +Cc: David Mack, davej, auke-jan.h.kok, netdev Herbert Xu wrote: > On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote: >> Hm... running 2.6.23-6.fc8, I've been through 30+ reboot cycles without >> a problem. Before, I'd oops every 5 or so times I booted... >> >> I now have another NIC in the box, disabled; I don't think that should >> be affecting anything? > > Well the original problem was caused by spurious interrupts on > the IRQ line where your e100 is so it could well be sporadic. Hah, well, I took the other NIC out and it didn't survive more than a couple reboots on that kernel. Now that I know I can still hit it, I'll do any testing that's needed. Thanks, -Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: e100 problems in .23rc8 ? 2007-10-16 14:35 ` Herbert Xu 2007-10-16 15:47 ` Eric Sandeen @ 2007-10-16 16:39 ` David Mack 1 sibling, 0 replies; 27+ messages in thread From: David Mack @ 2007-10-16 16:39 UTC (permalink / raw) To: Herbert Xu, Eric Sandeen; +Cc: davej, auke-jan.h.kok, netdev My problem is anything but sporadic. I have succeeded in booting a 2.6.23-based kernel exactly once since the roll toward F8 started early last summer. Dave > -----Original Message----- > From: Herbert Xu [mailto:herbert@gondor.apana.org.au] On > Behalf Of Herbert Xu > Sent: Tuesday, October 16, 2007 7:35 AM > To: Eric Sandeen > Cc: David Mack; davej@redhat.com; auke-jan.h.kok@intel.com; > netdev@vger.kernel.org > Subject: Re: e100 problems in .23rc8 ? > > On Tue, Oct 16, 2007 at 09:33:15AM -0500, Eric Sandeen wrote: > > > > Hm... running 2.6.23-6.fc8, I've been through 30+ reboot > cycles without > > a problem. Before, I'd oops every 5 or so times I booted... > > > > I now have another NIC in the box, disabled; I don't think > that should > > be affecting anything? > > Well the original problem was caused by spurious interrupts on > the IRQ line where your e100 is so it could well be sporadic. > > Cheers, > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-12 15:35 ` Herbert Xu 2007-10-12 15:51 ` David Mack @ 2007-10-12 17:04 ` Kok, Auke 2007-10-18 17:51 ` David Mack 1 sibling, 1 reply; 27+ messages in thread From: Kok, Auke @ 2007-10-12 17:04 UTC (permalink / raw) To: Herbert Xu; +Cc: David Mack, Dave Jones, netdev, esandeen Herbert Xu wrote: > On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote: >> Still no joy here. See attached capture. What's really weird is that it >> shows *two* kernel panics, one in e100_poll and one in _list_add. > > Yes that's the symptom one would expect from that bug. We really > need to apply the same fix that was done for e1000. I feared that. It's not the same as the commit that floated around in this thread; it involves some reorganization of the init/probe code, so it's a bit more involved than just a few lines. I'll need a little bit of time to generate this fix. Auke ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: e100 problems in .23rc8 ? 2007-10-12 17:04 ` Kok, Auke @ 2007-10-18 17:51 ` David Mack 2007-10-18 17:59 ` Kok, Auke 0 siblings, 1 reply; 27+ messages in thread From: David Mack @ 2007-10-18 17:51 UTC (permalink / raw) To: Kok, Auke, Herbert Xu; +Cc: Dave Jones, netdev, esandeen It appears that the needed e100 fix made it into the Fedora 2.6.23.1-23.fc8 kernel. Boots reliably now. Huge thanks and great work, guys. Dave > -----Original Message----- > From: Kok, Auke [mailto:auke-jan.h.kok@intel.com] > Sent: Friday, October 12, 2007 10:05 AM > To: Herbert Xu > Cc: David Mack; Dave Jones; netdev@vger.kernel.org; > esandeen@redhat.com > Subject: Re: e100 problems in .23rc8 ? > > Herbert Xu wrote: > > On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote: > >> Still no joy here. See attached capture. What's really > weird is that it > >> shows *two* kernel panics, one in e100_poll and one in _list_add. > > > > Yes that's the symptom one would expect from that bug. We really > > need to apply the same fix that was done for e1000. > > I feared that. its not the same as that commit that floated > around in this thread > and involves some reorganization in the init/probe code, so > it's a bit more > involved than just a few lines. I'll need a little bit of > time to generate this fix. > > Auke > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-18 17:51 ` David Mack @ 2007-10-18 17:59 ` Kok, Auke 2007-10-18 18:17 ` Chuck Ebbert ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Kok, Auke @ 2007-10-18 17:59 UTC (permalink / raw) To: Dave Jones; +Cc: David Mack, Herbert Xu, netdev, esandeen David Mack wrote: > It appears that the needed e100 fix made it into the Fedora > 2.6.23.1-23.fc8 kernel. Boots reliably now. > > Huge thanks and great work, guys. DaveJ, I didn't push anything upstream. Can you verify this now works? Auke > > Dave > >> -----Original Message----- >> From: Kok, Auke [mailto:auke-jan.h.kok@intel.com] >> Sent: Friday, October 12, 2007 10:05 AM >> To: Herbert Xu >> Cc: David Mack; Dave Jones; netdev@vger.kernel.org; >> esandeen@redhat.com >> Subject: Re: e100 problems in .23rc8 ? >> >> Herbert Xu wrote: >>> On Fri, Oct 12, 2007 at 07:54:33AM -0700, David Mack wrote: >>>> Still no joy here. See attached capture. What's really >> weird is that it >>>> shows *two* kernel panics, one in e100_poll and one in _list_add. >>> Yes that's the symptom one would expect from that bug. We really >>> need to apply the same fix that was done for e1000. >> I feared that. its not the same as that commit that floated >> around in this thread >> and involves some reorganization in the init/probe code, so >> it's a bit more >> involved than just a few lines. I'll need a little bit of >> time to generate this fix. >> >> Auke >> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-18 17:59 ` Kok, Auke @ 2007-10-18 18:17 ` Chuck Ebbert 2007-10-22 1:04 ` Dave Jones 2007-10-22 14:44 ` Chuck Ebbert 2 siblings, 0 replies; 27+ messages in thread From: Chuck Ebbert @ 2007-10-18 18:17 UTC (permalink / raw) To: Kok, Auke; +Cc: Dave Jones, David Mack, Herbert Xu, netdev, esandeen On 10/18/2007 01:59 PM, Kok, Auke wrote: > David Mack wrote: >> It appears that the needed e100 fix made it into the Fedora >> 2.6.23.1-23.fc8 kernel. Boots reliably now. >> >> Huge thanks and great work, guys. > > > DaveJ, I didn't push anything upstream. Can you verify this now works? > We didn't put anything in Fedora recently... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ?
  2007-10-18 17:59 ` Kok, Auke
  2007-10-18 18:17   ` Chuck Ebbert
@ 2007-10-22  1:04   ` Dave Jones
  2007-10-22  3:10     ` Herbert Xu
  2007-10-22 14:05     ` David Mack
  2007-10-22 14:44   ` Chuck Ebbert
  2 siblings, 2 replies; 27+ messages in thread
From: Dave Jones @ 2007-10-22 1:04 UTC (permalink / raw)
  To: Kok, Auke; +Cc: David Mack, Herbert Xu, netdev, esandeen

On Thu, Oct 18, 2007 at 10:59:59AM -0700, Kok, Auke wrote:
 > David Mack wrote:
 > > It appears that the needed e100 fix made it into the Fedora
 > > 2.6.23.1-23.fc8 kernel. Boots reliably now.
 > >
 > > Huge thanks and great work, guys.
 >
 > DaveJ, I didn't push anything upstream. Can you verify this now works?

There was no e100 changes in the kernel above, so David just
got lucky. (The race doesn't always occur, so it sometimes appears
something got fixed.).

I included the patch below in the latest build, but I've not had
chance to try it on an e100 box yet..

	Dave

--- linux-2.6.23.noarch/drivers/net/e100.c~	2007-10-18 16:10:40.000000000 -0400
+++ linux-2.6.23.noarch/drivers/net/e100.c	2007-10-18 16:16:02.000000000 -0400
@@ -2682,6 +2682,8 @@ static int __devinit e100_probe(struct p
 	if (err)
 		DPRINTK(PROBE, ERR, "Error clearing wake event\n");
 
+	netif_poll_disable(netdev);
+
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");

-- 
http://www.codemonkey.org.uk

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-22 1:04 ` Dave Jones @ 2007-10-22 3:10 ` Herbert Xu 2007-10-22 14:05 ` David Mack 1 sibling, 0 replies; 27+ messages in thread From: Herbert Xu @ 2007-10-22 3:10 UTC (permalink / raw) To: Dave Jones; +Cc: Kok, Auke, David Mack, netdev, esandeen On Sun, Oct 21, 2007 at 09:04:40PM -0400, Dave Jones wrote: > > I included the patch below in the latest build, but I've not had > chance to try it on an e100 box yet.. Looks good to me. Thanks Dave! -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: e100 problems in .23rc8 ?
  2007-10-22  1:04 ` Dave Jones
  2007-10-22  3:10   ` Herbert Xu
@ 2007-10-22 14:05   ` David Mack
  2007-10-22 14:59     ` Eric Sandeen
  1 sibling, 1 reply; 27+ messages in thread
From: David Mack @ 2007-10-22 14:05 UTC (permalink / raw)
  To: Dave Jones, Kok, Auke; +Cc: Herbert Xu, netdev, esandeen

Then I got very, very lucky, since I have successfully rebooted
2.6.23.1-23.fc8 four times (zero panics) and this is the first time a
2.6.23 kernel has not panicked on me in months.

This does not fill me with confidence in the theory that the panics I've
been seeing are due to a race condition.

Dave

> -----Original Message-----
> From: Dave Jones [mailto:davej@redhat.com]
> Sent: Sunday, October 21, 2007 6:05 PM
> To: Kok, Auke
> Cc: David Mack; Herbert Xu; netdev@vger.kernel.org; esandeen@redhat.com
> Subject: Re: e100 problems in .23rc8 ?
>
> On Thu, Oct 18, 2007 at 10:59:59AM -0700, Kok, Auke wrote:
> > David Mack wrote:
> > > It appears that the needed e100 fix made it into the Fedora
> > > 2.6.23.1-23.fc8 kernel. Boots reliably now.
> > >
> > > Huge thanks and great work, guys.
> >
> > DaveJ, I didn't push anything upstream. Can you verify this now works?
>
> There was no e100 changes in the kernel above, so David just
> got lucky. (The race doesn't always occur, so it sometimes appears
> something got fixed.).
>
> I included the patch below in the latest build, but I've not had
> chance to try it on an e100 box yet..
>
> Dave
>
> --- linux-2.6.23.noarch/drivers/net/e100.c~	2007-10-18 16:10:40.000000000 -0400
> +++ linux-2.6.23.noarch/drivers/net/e100.c	2007-10-18 16:16:02.000000000 -0400
> @@ -2682,6 +2682,8 @@ static int __devinit e100_probe(struct p
>  	if (err)
>  		DPRINTK(PROBE, ERR, "Error clearing wake event\n");
>
> +	netif_poll_disable(netdev);
> +
>  	strcpy(netdev->name, "eth%d");
>  	if((err = register_netdev(netdev))) {
>  		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");
>
> --
> http://www.codemonkey.org.uk
>

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ? 2007-10-22 14:05 ` David Mack @ 2007-10-22 14:59 ` Eric Sandeen 0 siblings, 0 replies; 27+ messages in thread From: Eric Sandeen @ 2007-10-22 14:59 UTC (permalink / raw) To: David Mack; +Cc: Dave Jones, Kok, Auke, Herbert Xu, netdev, esandeen David Mack wrote: > Then I got very, very lucky, since I have successfully rebooted > 2.6.23.1-23.fc8 four times (zero panics) and this is the first time a > 2.6.23 kernel has not panicked on me in months. > > This does not fill me with confidence in the theory that the panics I've > been seeing are due to a race condition. I'll agree with the testing results, at least. I booted successfully *60* times with 2.6.23.1-23.fc8, booting the stock F8test3 kernel would oops every 5 or 6 boots. It may well be a race, but if so something is apparently opening/closing the window on us! :) -Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: e100 problems in .23rc8 ?
  2007-10-18 17:59 ` Kok, Auke
  2007-10-18 18:17   ` Chuck Ebbert
  2007-10-22  1:04   ` Dave Jones
@ 2007-10-22 14:44   ` Chuck Ebbert
  2 siblings, 0 replies; 27+ messages in thread
From: Chuck Ebbert @ 2007-10-22 14:44 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Dave Jones, David Mack, Herbert Xu, netdev, esandeen

On 10/18/2007 01:59 PM, Kok, Auke wrote:
> David Mack wrote:
>> It appears that the needed e100 fix made it into the Fedora
>> 2.6.23.1-23.fc8 kernel. Boots reliably now.
>>
>> Huge thanks and great work, guys.
>
>
> DaveJ, I didn't push anything upstream. Can you verify this now works?
>

One of our users just posted this:

We observed the same panic on a Dell Dimension 5150 (E510), although not
limited to warm boots. We noticed that the following trace is possible:

- when starting the interface, e100_up() gets called
- it calls e100_hw_init(), which disables e100 IRQ generation
  (e100_disable_irq())
- it registers the interrupt handler
- the interrupt handler (e100_intr()) gets called
- this happens because the IRQ line is shared with another device (in this
  case, the SATA controller)
- the interrupt handler examines the stat_ack register of the interface:
  even though interrupts are disabled, an event is indicated and the
  interrupt handler proceeds
- the interrupt handler calls netif_rx_schedule_prep(), which sets the
  __LINK_STATE_RX_SCHED bit, and __netif_rx_schedule(), which adds the
  interface to the poll list
- when the interrupt handler returns, e100_up() calls netif_poll_enable(),
  thus clearing the __LINK_STATE_RX_SCHED bit
- now the NET RX softirq (net_rx_action) calls e100_poll(), which in turn
  calls netif_rx_complete()
- netif_rx_complete() checks whether the __LINK_STATE_RX_SCHED bit is set
  and triggers the panic

To avoid this situation, where the interrupt handler executes although e100
interrupts are disabled, we suggest the attached patch. It lets the
interrupt handler check the interrupt mask bit before proceeding with the
interrupt handling.

Authors: Christof Efkemann <chref@tzi.de>, Kai Thomsen <kthomsen@tzi.de>
Description: Avoid interrupt handler execution if e100 interrupts are
disabled. Checks the interrupt mask bit before proceeding with the
interrupt handling.

--- drivers/net/e100.c.old	2007-10-20 18:32:40.000000000 +0200
+++ drivers/net/e100.c	2007-10-20 18:36:02.000000000 +0200
@@ -1960,11 +1960,13 @@
 	struct net_device *netdev = dev_id;
 	struct nic *nic = netdev_priv(netdev);
 	u8 stat_ack = ioread8(&nic->csr->scb.stat_ack);
+	u8 cmd_hi = ioread8(&nic->csr->scb.cmd_hi);
 
 	DPRINTK(INTR, DEBUG, "stat_ack = 0x%02X\n", stat_ack);
 
 	if(stat_ack == stat_ack_not_ours ||	/* Not our interrupt */
-	   stat_ack == stat_ack_not_present)	/* Hardware is ejected */
+	   stat_ack == stat_ack_not_present ||	/* Hardware is ejected */
+	   cmd_hi & irq_mask_all)		/* Interrupts masked */
 		return IRQ_NONE;
 
 	/* Ack interrupt(s) */

^ permalink raw reply [flat|nested] 27+ messages in thread
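[Editorial sketch, not part of the original message.] The sequence reported above can be replayed in a small user-space C model. This is only a rough illustration under stated assumptions: every name beginning with model_ is hypothetical, the struct is a stand-in for the few pieces of state the report mentions, and none of this is kernel or e100 driver code. It shows why clearing the scheduled bit after the interrupt handler has queued the device leads to the failed check in the completion path, and how the proposed interrupt-mask check would avoid queuing in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state discussed in the report. */
struct model_dev {
    bool rx_sched;      /* models the __LINK_STATE_RX_SCHED bit */
    bool on_poll_list;  /* models membership in the poll list */
    bool irq_masked;    /* models the e100 interrupt mask bit */
    bool event_pending; /* models an event indicated in stat_ack */
};

/* Models netif_rx_schedule_prep(): succeeds only if the bit was clear. */
static bool model_schedule_prep(struct model_dev *d)
{
    if (d->rx_sched)
        return false;
    d->rx_sched = true;
    return true;
}

/*
 * Models the interrupt handler. When check_mask is true it returns early
 * if interrupts are masked, which is the behaviour the proposed patch adds.
 * Returns true if the handler claimed the interrupt.
 */
static bool model_intr(struct model_dev *d, bool check_mask)
{
    if (!d->event_pending)
        return false;
    if (check_mask && d->irq_masked)
        return false;
    if (model_schedule_prep(d))
        d->on_poll_list = true;  /* models __netif_rx_schedule() */
    return true;
}

/* Models netif_poll_enable(): clears the bit unconditionally. */
static void model_poll_enable(struct model_dev *d)
{
    d->rx_sched = false;
}

/*
 * Models the softirq poll ending in netif_rx_complete().
 * Returns 0 on success and -1 where the kernel check would fire.
 */
static int model_poll_complete(struct model_dev *d)
{
    assert(d != NULL);
    if (!d->on_poll_list)
        return 0;            /* nothing was scheduled */
    if (!d->rx_sched)
        return -1;           /* bit cleared behind our back */
    d->on_poll_list = false;
    d->rx_sched = false;
    return 0;
}

/* Replays the reported sequence during interface bring-up. */
static int model_replay(bool check_mask)
{
    struct model_dev d = {
        .rx_sched = false,
        .on_poll_list = false,
        .irq_masked = true,     /* hardware init masked e100 interrupts */
        .event_pending = true,  /* yet stat_ack indicates an event */
    };

    model_intr(&d, check_mask);      /* shared IRQ line fires */
    model_poll_enable(&d);           /* bring-up enables polling */
    return model_poll_complete(&d);  /* softirq runs the poll */
}
```

In this model, model_replay(false) returns -1 (the failing path from the report) and model_replay(true) returns 0, because the handler never queues the device while interrupts are masked. The model does not cover the second panic in __list_add.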
* Re: e100 problems in .23rc8 ? 2007-10-11 16:10 ` Kok, Auke 2007-10-11 17:25 ` Dave Jones @ 2007-10-11 23:24 ` Herbert Xu 1 sibling, 0 replies; 27+ messages in thread From: Herbert Xu @ 2007-10-11 23:24 UTC (permalink / raw) To: Kok, Auke; +Cc: Dave Jones, netdev, esandeen, dmack On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote: > >> > >> commit 416b5d10afdc797c21c457ade3714e8f2f75edd9 > >> Author: Auke Kok <auke-jan.h.kok@intel.com> > >> Date: Fri Jun 1 10:22:39 2007 -0700 > >> > >> e1000: disable polling before registering netdevice > > this patch actually called napi_disable() in the probe routine which was wrong, > but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and > netif_stop_queue(), so to make e100 the same as e1000 we should probably do this, > see below. Back then we didn't have napi_disable at all. That patch calls netif_poll_disable which has different semantics. > Dave, can you see if this resolves the issue for you? If so then we might want to > push this to -stable. The problem is with netif_poll so this patch probably won't help. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread