* e1000: driver reboot/kexec bug.
2005-02-15 21:24 [PATCH net-drivers-2.6 0/10] e1000: driver update Malli Chilakala
@ 2005-02-16 12:25 ` Eric W. Biederman
0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2005-02-16 12:25 UTC (permalink / raw)
To: Malli Chilakala; +Cc: jgarzik@pobox.com, netdev
When I kexec a new kernel on hardware that includes
some revs of the e1000 (see below for lspci -n), the
e1000 driver is not able to reinitialize the NIC. I
have seen this in both 2.4.29 and 2.6.10.
Tracking it down, it appears to be some side effect of powering down
the NIC. If I remove the pci_set_power_state call in e1000_suspend,
or simply apply the attached patch so I get that effect when
rebooting, everything works. pci_enable_device brings the device
up to full power before the driver initialization code does anything
else, so I don't have a clue what is really going on, but something is.
Boot messages on failure:
> Intel(R) PRO/1000 Network Driver - version 5.6.10.1-k1
> Copyright (c) 1999-2004 Intel Corporation.
> PCI: Enabling device 03:04.0 (0000 -> 0003)
> e1000: 03:04.0: e1000_probe: The EEPROM Checksum Is Not Valid
> PCI: Enabling device 03:04.1 (0000 -> 0003)
> e1000: 03:04.1: e1000_probe: The EEPROM Checksum Is Not Valid
lspci -n of the problem onboard e1000 NIC.
> 03:04.0 Class 0200: 8086:1079 (rev 03)
> 03:04.1 Class 0200: 8086:1079 (rev 03)
Patch which avoids the problem.
diff -uNrX linux-exclude-files linux-2.4.29-kexec-apic-virtwire-on-shutdownx86_64/drivers/net/e1000/e1000_main.c linux-2.4.29-kexec7.build.x86_64/drivers/net/e1000/e1000_main.c
--- linux-2.4.29-kexec-apic-virtwire-on-shutdownx86_64/drivers/net/e1000/e1000_main.c	Tue Feb 15 14:17:09 2005
+++ linux-2.4.29-kexec7.build.x86_64/drivers/net/e1000/e1000_main.c	Wed Feb 16 04:58:18 2005
@@ -2777,7 +2777,7 @@
 	case SYS_POWER_OFF:
 		while((pdev = pci_find_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
 			if(pci_dev_driver(pdev) == &e1000_driver)
-				e1000_suspend(pdev, 3);
+				e1000_suspend(pdev, (event == SYS_DOWN)?0:3);
 		}
 	}
 	return NOTIFY_DONE;
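
The effect of the one-line change can be modelled outside the kernel. The following is only a minimal sketch: the SYS_* values mirror the reboot notifier events in the kernel's notifier header, the helper name e1000_reboot_power_state is hypothetical, and it assumes SYS_DOWN is the event delivered on the restart/kexec path, which is the case the patch singles out.

```c
/* Reboot notifier events; values mirror the kernel's <linux/notifier.h>. */
#define SYS_DOWN      0x0001UL  /* restart; assumed to be what the kexec path delivers */
#define SYS_HALT      0x0002UL
#define SYS_POWER_OFF 0x0003UL

/* PCI power states as the plain integers the 2.4 driver passes around. */
enum { PCI_STATE_D0 = 0, PCI_STATE_D3 = 3 };

/*
 * Hypothetical helper (not part of the driver): returns the power state
 * the patched notifier would hand to e1000_suspend() for a given event.
 * On a restart the NIC is left in D0 so the next kernel finds it powered;
 * on halt or power-off it is still put into D3 as before.
 */
int e1000_reboot_power_state(unsigned long event)
{
	return (event == SYS_DOWN) ? PCI_STATE_D0 : PCI_STATE_D3;
}
```

With this model, e1000_reboot_power_state(SYS_DOWN) yields 0 while SYS_HALT and SYS_POWER_OFF still yield 3, so only the restart path changes behaviour.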
Any help to track down why this is happening so we can apply
a clean fix would be appreciated.
Eric
* RE: e1000: driver reboot/kexec bug.
@ 2005-02-17 17:02 Venkatesan, Ganesh
2005-02-17 17:57 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Venkatesan, Ganesh @ 2005-02-17 17:02 UTC (permalink / raw)
To: eric; +Cc: jgarzik, Chilakala, Mallikarjuna, netdev
Hi Eric:
Could you send us 'lspci -vvv' output for 3:4.0? Could you also help us
with some more information on the platform you are using?
Thanks,
Ganesh.
* RE: e1000: driver reboot/kexec bug.
@ 2005-02-17 17:04 Venkatesan, Ganesh
0 siblings, 0 replies; 6+ messages in thread
From: Venkatesan, Ganesh @ 2005-02-17 17:04 UTC (permalink / raw)
To: Venkatesan, Ganesh, eric; +Cc: jgarzik, Chilakala, Mallikarjuna, netdev
Eric:
Please send me 'ethtool -e eth?' for the interface corresponding to one
of the 8086:1079 devices.
Ganesh.
* Re: e1000: driver reboot/kexec bug.
2005-02-17 17:02 e1000: driver reboot/kexec bug Venkatesan, Ganesh
@ 2005-02-17 17:57 ` Eric W. Biederman
0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2005-02-17 17:57 UTC (permalink / raw)
To: Venkatesan, Ganesh; +Cc: jgarzik, Chilakala, Mallikarjuna, netdev
"Venkatesan, Ganesh" <ganesh.venkatesan@intel.com> writes:
> Hi Eric:
>
> Could you send us 'lspci -vvv' output for 3:4.0? Could you also help us
> with some more information on the platform you are using?
> Please send me 'ethtool -e eth?' for the interface corresponding to one
> of the 8086:1079 devices.
The motherboard is an Intel Jarrell motherboard with the Lindenhurst
(aka E7520) chipset. I have seen the same symptoms on a recent
Supermicro system as well.
Do you know enough about kexec to attempt to reproduce this problem
that way?
Anyway hopefully this is enough to get you started.
Eric
03:04.0 Ethernet controller: Intel Corp.: Unknown device 1079 (rev 03)
	Subsystem: Intel Corp.: Unknown device 1079
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (63750ns min), cache line size 10
	Interrupt: pin A routed to IRQ 54
	Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at 1000 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] #07 [0002]
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000 Data: 0000
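
For reference, the "Status: D0" field in the Power Management capability above comes from the low two bits of the power management control/status register (PMCSR), which sits 4 bytes into the capability (0xdc + 4 = 0xe0 on this device). A minimal sketch of that decoding follows; the function names are hypothetical, and only the two-bit state mask and the PME-Enable bit position are taken from the PCI power management layout.

```c
#include <stdint.h>

/* Low two bits of PMCSR hold the current power state. */
#define PM_CTRL_STATE_MASK 0x0003u
/* Bit 8 of PMCSR is PME-Enable; it is not part of the power state. */
#define PM_CTRL_PME_ENABLE 0x0100u

/* Hypothetical helper: numeric power state (0..3) from a raw PMCSR value. */
unsigned pm_power_state(uint16_t pmcsr)
{
	return pmcsr & PM_CTRL_STATE_MASK;
}

/* Hypothetical helper: the name lspci would print for that state. */
const char *pm_state_name(uint16_t pmcsr)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };
	return names[pm_power_state(pmcsr)];
}
```

A raw value of 0x0000 decodes to D0, matching the output above; a device left in D3hot by e1000_suspend would read back with both low bits set.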
ethtool -e eth0
Offset Values
------ ------
0x0000 00 02 b3 e8 fa c8 b0 0c ff ff ff ff ff ff ff ff
0x0010 00 00 00 00 08 46 1a 34 86 80 79 10 86 80 e8 b2
0x0020 0c c3 00 00 00 00 05 01 88 1c ff ff ff ff ff ff
0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040 0c c3 21 00 0c 28 05 01 86 0c ff ff ff ff ff ff
0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
0x0060 0c 01 00 40 13 12 ff ff 0c 01 00 40 08 11 ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 5e 37
0x0080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01e0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 6f
0x01f0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
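
The "EEPROM Checksum Is Not Valid" message can be checked against this dump by hand. The sketch below assumes the e1000 convention, as far as I understand the driver, that the first 64 little-endian 16-bit words (0x00 through 0x3F) must sum to 0xBABA; the function and array names are hypothetical. The bytes are transcribed from the first eight rows of the dump above.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_CHECKSUM_WORDS 64       /* words 0x00..0x3F */
#define EEPROM_SUM            0xBABAu  /* expected 16-bit sum (assumed driver convention) */

/* Offsets 0x0000..0x007f of the ethtool -e dump above. */
const uint8_t eeprom_dump[128] = {
	0x00,0x02,0xb3,0xe8,0xfa,0xc8,0xb0,0x0c,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
	0x00,0x00,0x00,0x00,0x08,0x46,0x1a,0x34,0x86,0x80,0x79,0x10,0x86,0x80,0xe8,0xb2,
	0x0c,0xc3,0x00,0x00,0x00,0x00,0x05,0x01,0x88,0x1c,0xff,0xff,0xff,0xff,0xff,0xff,
	0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
	0x0c,0xc3,0x21,0x00,0x0c,0x28,0x05,0x01,0x86,0x0c,0xff,0xff,0xff,0xff,0xff,0xff,
	0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x02,0x06,
	0x0c,0x01,0x00,0x40,0x13,0x12,0xff,0xff,0x0c,0x01,0x00,0x40,0x08,0x11,0xff,0xff,
	0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x5e,0x37,
};

/* Sum nwords little-endian 16-bit words, wrapping modulo 2^16. */
uint16_t eeprom_word_sum(const uint8_t *bytes, size_t nwords)
{
	uint16_t sum = 0;
	for (size_t i = 0; i < nwords; i++) {
		uint16_t word = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
		sum = (uint16_t)(sum + word);
	}
	return sum;
}

/* Non-zero when the first 64 words sum to the expected value. */
int eeprom_checksum_ok(const uint8_t *bytes)
{
	return eeprom_word_sum(bytes, EEPROM_CHECKSUM_WORDS) == EEPROM_SUM;
}
```

Summed this way, the dump comes out to 0xBABA, so the EEPROM contents read here pass the check. A read that returned all 0xff bytes would sum to 0xFFC0 and fail. That would be consistent with the failure being in how the device is read after the power-down rather than in the stored data, though the dump alone does not prove it.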
* RE: e1000: driver reboot/kexec bug.
@ 2005-02-17 18:33 Venkatesan, Ganesh
2005-02-17 19:53 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Venkatesan, Ganesh @ 2005-02-17 18:33 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: jgarzik, Chilakala, Mallikarjuna, netdev
>Do you know enough about kexec to attempt to reproduce this problem
>that way?
Not much. All I have is an old paper by Andy Pfiffer. Could you point me
to more resources on this?
Ganesh.
* Re: e1000: driver reboot/kexec bug.
2005-02-17 18:33 Venkatesan, Ganesh
@ 2005-02-17 19:53 ` Eric W. Biederman
0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2005-02-17 19:53 UTC (permalink / raw)
To: Venkatesan, Ganesh; +Cc: jgarzik, Chilakala, Mallikarjuna, netdev, fastboot
"Venkatesan, Ganesh" <ganesh.venkatesan@intel.com> writes:
> >Do you know enough about kexec to attempt to reproduce this problem
> >that way?
>
> Not much. All I have is an old paper by Andy Pfiffer. Could you point me
> to more resources on this?
Short explanation:
The user space lives at:
http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
Other bits and pieces can be found at:
http://www.xmission.com/~ebiederm/files/kexec/
The latest patches are in the -mm tree.
Usually it is as simple as:
/sbin/kexec -l /path/to/bzImage --append='your command line options'
Then drop to single user mode and do:
/sbin/kexec -e
My patches have not made it into the initscripts yet, so doing a clean system
shutdown has not been fully automated.
i386 and x86-64 architectures should both work.
Now it is a matter of slowly digging into the hardware support code and
getting out the bugs that are revealed.
Eric