* [e1000e] BUG triggered when triggering LED blinking
@ 2010-11-09 8:39 Holger Eitzenberger
2010-11-10 18:32 ` Brandeburg, Jesse
0 siblings, 1 reply; 5+ messages in thread
From: Holger Eitzenberger @ 2010-11-09 8:39 UTC (permalink / raw)
To: e1000-devel; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 2673 bytes --]
Hi,
using e1000e driver version 1.2.10 and kernel version 2.6.32.24 I see
the kernel go BUG() sporadically at the time 'ethtool -p eth0 3' comes
back.
Network hardware is four times 'Intel Corporation 82583V Gigabit Network
Connection' (0x8086:0x150c) on Atom N450.
kernel BUG at kernel/workqueue.c:287!
invalid opcode: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod
pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata
evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core
rtc_lib e1000e parport ftdi_sio usbhid usbserial
Pid: 8, comm: events/1 Not tainted (2.6.32.24-62.gce8dff6-ai #1) To Be
Filled By O.E.M.
EIP: 0060:[<c102ed4f>] EFLAGS: 00010206 CPU: 1
EIP is at worker_thread+0xc5/0x144
EAX: c1c052e0 EBX: c1d052e0 ECX: f70cac1c EDX: f70cac1c
ESI: f81c75b0 EDI: f70cac18 EBP: f705e3b0 ESP: f7093f90
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/1 (pid: 8, ti=f7092000 task=f705e3b0 task.ti=f7092000)
Stack:
f705e5a0 c1d052ec c1d052e4 00000000 f705e3b0 c103151e f7093fa8 f7093fa8
<0> f7061f68 c1d052e0 c102ec8a 00000000 c103132b 00000000 00000000
00000000
<0> f7093fd0 f7093fd0 c10312ca 00000000 00000000 c100329f f7061f5c
00000000
Call Trace:
[<c103151e>] ? autoremove_wake_function+0x0/0x2d
[<c102ec8a>] ? worker_thread+0x0/0x144
[<c103132b>] ? kthread+0x61/0x66
[<c10312ca>] ? kthread+0x0/0x66
[<c100329f>] ? kernel_thread_helper+0x7/0x10
Code: e9 85 00 00 00 8d 79 fc 8b 77 0c 89 7b 18 8b 11 8b 41 04 89 42 04
89 10 89 09 89 49 04 f0 fe 03 fb 8b 41 fc 83 e0 fc 39 c3 74 04 <0f> 0b
eb fe f0 80 61 fc fe 89 f8 ff d6 89 e0 25 00 e0 ff ff 8b
EIP: [<c102ed4f>] worker_thread+0xc5/0x144 SS:ESP 0068:f7093f90
---[ end trace e297b781eb382c2f ]---
The full trace is attached, it may become clearer from that.
After taking a look I think this may be caused by initializing
adapter->led_blink_task several times in e1000_phys_id(), while possibly
led_blink_task is running:
if ((hw->phy.type == e1000_phy_ife) ||
(hw->mac.type == e1000_pchlan) ||
(hw->mac.type == e1000_82574)) {
INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
if (!adapter->blink_timer.function) {
I can't reproduce it after moving it inside the following if block,
but I'm not quite sure if this catches all races in there. Especially
the msleep_interruptible() may be too optimistic because it may
actually not wait long enough. Someone with more knowledge of the
driver should take a look.
I've attached a proposed fix for the double initialization, please check.
/holger
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: putty.log --]
[-- Type: text/plain; charset=unknown-8bit, Size: 4366 bytes --]
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2010.11.03 16:51:28 =~=~=~=~=~=~=~=~=~=~=~=
ÿ------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:287!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core rtc_lib e1000e parport ftdi_sio usbhid usbserial
Pid: 8, comm: events/1 Not tainted (2.6.32.24-62.gce8dff6-ai #1) To Be Filled By O.E.M.
EIP: 0060:[<c102ed4f>] EFLAGS: 00010206 CPU: 1
EIP is at worker_thread+0xc5/0x144
EAX: c1c052e0 EBX: c1d052e0 ECX: f70cac1c EDX: f70cac1c
ESI: f81c75b0 EDI: f70cac18 EBP: f705e3b0 ESP: f7093f90
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/1 (pid: 8, ti=f7092000 task=f705e3b0 task.ti=f7092000)
Stack:
f705e5a0 c1d052ec c1d052e4 00000000 f705e3b0 c103151e f7093fa8 f7093fa8
<0> f7061f68 c1d052e0 c102ec8a 00000000 c103132b 00000000 00000000 00000000
<0> f7093fd0 f7093fd0 c10312ca 00000000 00000000 c100329f f7061f5c 00000000
Call Trace:
[<c103151e>] ? autoremove_wake_function+0x0/0x2d
[<c102ec8a>] ? worker_thread+0x0/0x144
[<c103132b>] ? kthread+0x61/0x66
[<c10312ca>] ? kthread+0x0/0x66
[<c100329f>] ? kernel_thread_helper+0x7/0x10
Code: e9 85 00 00 00 8d 79 fc 8b 77 0c 89 7b 18 8b 11 8b 41 04 89 42 04 89 10 89 09 89 49 04 f0 fe 03 fb 8b 41 fc 83 e0 fc 39 c3 74 04 <0f> 0b eb fe f0 80 61 fc fe 89 f8 ff d6 89 e0 25 00 e0 ff ff 8b
EIP: [<c102ed4f>] worker_thread+0xc5/0x144 SS:ESP 0068:f7093f90
---[ end trace e297b781eb382c2f ]---
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:191!
invalid opcode: 0000 [#2] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core rtc_lib e1000e parport ftdi_sio usbhid usbserial
Pid: 1012, comm: klogd Tainted: G D (2.6.32.24-62.gce8dff6-ai #1) To Be Filled By O.E.M.
EIP: 0060:[<c102eeac>] EFLAGS: 00010283 CPU: 0
EIP is at queue_work_on+0x1b/0x44
EAX: f70cac1c EBX: 00000000 ECX: f70cac18 EDX: 00000000
ESI: f7001280 EDI: f81c7920 EBP: f71dbf60 ESP: f71dbf3c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process klogd (pid: 1012, ti=f71da000 task=f71d87a0 task.ti=f71da000)
Stack:
f70c8330 c133ca00 f81c7931 00000100 c1029710 c133d810 c133d610 c133d410
<0> c133d210 f71dbf60 f71dbf60 00000001 00000004 00000100 00000141 c102652e
<0> c1318010 0000000a 00000000 00000046 00000000 099251d0 bfb6e968 c10265be
Call Trace:
[<f81c7931>] ? e1000e_set_ethtool_ops+0x1d1/0x38b0 [e1000e]
[<c1029710>] ? run_timer_softirq+0x116/0x16a
[<c102652e>] ? __do_softirq+0x78/0xe5
[<c10265be>] ? do_softirq+0x23/0x27
[<c102668d>] ? irq_exit+0x26/0x58
[<c100f9ea>] ? smp_apic_timer_interrupt+0x6c/0x76
[<c10030f6>] ? apic_timer_interrupt+0x2a/0x30
Code: 2c c1 8b 00 03 04 8d c0 bb 2c c1 e9 3d ff ff ff 56 89 d6 53 89 c3 f0 0f ba 29 00 19 c0 31 d2 85 c0 75 2c 8d 41 04 39 41 04 74 04 <0f> 0b eb fe 83 7e 10 00 89 ca 0f 45 1d b4 bc 2c c1 8b 06 03 04
EIP: [<c102eeac>] queue_work_on+0x1b/0x44 SS:ESP 0068:f71dbf3c
---[ end trace e297b781eb382c30 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 1012, comm: klogd Tainted: G D 2.6.32.24-62.gce8dff6-ai #1
Call Trace:
[<c11e0a50>] ? panic+0x38/0xda
[<c11e3371>] ? oops_end+0x89/0x94
[<c1003628>] ? do_invalid_op+0x0/0x70
[<c100368f>] ? do_invalid_op+0x67/0x70
[<c102eeac>] ? queue_work_on+0x1b/0x44
[<c117bfb3>] ? sys_sendto+0xfc/0x127
[<c101c3cc>] ? dequeue_task_fair+0x3f/0x1bc
[<c1018e0b>] ? sched_slice+0x6d/0x79
[<c11e2ac6>] ? error_code+0x66/0x6c
[<f81c7920>] ? e1000e_set_ethtool_ops+0x1c0/0x38b0 [e1000e]
[<c1003628>] ? do_invalid_op+0x0/0x70
[<c102eeac>] ? queue_work_on+0x1b/0x44
[<f81c7931>] ? e1000e_set_ethtool_ops+0x1d1/0x38b0 [e1000e]
[<c1029710>] ? run_timer_softirq+0x116/0x16a
[<c102652e>] ? __do_softirq+0x78/0xe5
[<c10265be>] ? do_softirq+0x23/0x27
[<c102668d>] ? irq_exit+0x26/0x58
[<c100f9ea>] ? smp_apic_timer_interrupt+0x6c/0x76
[<c10030f6>] ? apic_timer_interrupt+0x2a/0x30
[-- Attachment #3: e1000e-fix.diff --]
[-- Type: text/x-diff, Size: 708 bytes --]
Index: linux-2.6.32.y/drivers/net/e1000e/ethtool.c
===================================================================
--- linux-2.6.32.y.orig/drivers/net/e1000e/ethtool.c 2010-11-08 15:34:35.000000000 +0100
+++ linux-2.6.32.y/drivers/net/e1000e/ethtool.c 2010-11-08 17:32:27.000000000 +0100
@@ -1833,8 +1833,8 @@
if ((hw->phy.type == e1000_phy_ife) ||
(hw->mac.type == e1000_pchlan) ||
(hw->mac.type == e1000_82574)) {
- INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
if (!adapter->blink_timer.function) {
+ INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
init_timer(&adapter->blink_timer);
adapter->blink_timer.function =
e1000_led_blink_callback;
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [e1000e] BUG triggered when triggering LED blinking
2010-11-09 8:39 [e1000e] BUG triggered when triggering LED blinking Holger Eitzenberger
@ 2010-11-10 18:32 ` Brandeburg, Jesse
2010-11-11 10:17 ` [e1000e] BUG triggered in blink path Holger Eitzenberger
0 siblings, 1 reply; 5+ messages in thread
From: Brandeburg, Jesse @ 2010-11-10 18:32 UTC (permalink / raw)
To: Holger Eitzenberger
Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
On Tue, 9 Nov 2010, Holger Eitzenberger wrote:
> Hi,
>
> using e1000e driver version 1.2.10 and kernel version 2.6.32.24 I see
> the kernel go BUG() sporadically at the time 'ethtool -p eth0 3' comes
> back.
>
> Network hardware is four times 'Intel Corporation 82583V Gigabit Network
> Connection' (0x8086:0x150c) on Atom N450.
>
> kernel BUG at kernel/workqueue.c:287!
<snip>
>
> The full trace is attached, it may become clearer from that.
>
> After taking a look I think this may be caused by initializing
> adapter->led_blink_task several times in e1000_phys_id(), while possibly
> led_blink_task is running:
>
> if ((hw->phy.type == e1000_phy_ife) ||
> (hw->mac.type == e1000_pchlan) ||
> (hw->mac.type == e1000_82574)) {
> INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
> if (!adapter->blink_timer.function) {
>
> I can't reproduce it after moving it inside the following if block,
> but I'm not quite sure if this catches all races in there. Especially
> the msleep_interruptible() may be too optimistic because it may
> actually not wait long enough. Someone with more knowledge of the
> driver should take a look.
thanks for your investigation and troubleshooting. I don't think it is
correct at all to be calling INIT_WORK more than once. In fact the
INIT_WORK should just be moved into probe, and then e1000_phys_id should
just do schedule_work.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [e1000e] BUG triggered in blink path
2010-11-10 18:32 ` Brandeburg, Jesse
@ 2010-11-11 10:17 ` Holger Eitzenberger
2010-11-11 23:59 ` Jeff Kirsher
2010-11-12 0:40 ` Brandeburg, Jesse
0 siblings, 2 replies; 5+ messages in thread
From: Holger Eitzenberger @ 2010-11-11 10:17 UTC (permalink / raw)
To: Brandeburg, Jesse; +Cc: e1000-devel, netdev
[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]
Hi Jesse,
I've attached the patch against net-next-2.6. Please check if it's ok
for you. I checked e1000, igb and ixgbe as well, they don't have that
problem.
/holger
> > After taking a look I think this may be caused by initializing
> > adapter->led_blink_task several times in e1000_phys_id(), while possibly
> > led_blink_task is running:
> >
> > if ((hw->phy.type == e1000_phy_ife) ||
> > (hw->mac.type == e1000_pchlan) ||
> > (hw->mac.type == e1000_82574)) {
> > INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
> > if (!adapter->blink_timer.function) {
> >
> > I can't reproduce it after moving it inside the following if block,
> > but I'm not quite sure if this catches all races in there. Especially
> > the msleep_interruptible() may be too optimistic because it may
> > actually not wait long enough. Someone with more knowledge of the
> > driver should take a look.
>
> thanks for your investigation and troubleshooting. I don't think it is
> correct at all to be calling INIT_WORK more than once. In fact the
> INIT_WORK should just be moved into probe, and then e1000_phys_id should
> just do schedule_work.
>
>
[-- Attachment #2: x.diff --]
[-- Type: text/x-diff, Size: 2637 bytes --]
e1000e: fix double initialization in blink path
The kernel goes BUG() at the time 'ethtool -p eth0 3' comes
back, which is due to adapter->led_blink_task initialized
several times. At the time it is still running this results
in a corrupted task_list of the associated workqueue.
The fix is to move the workqueue initialization to the
probe function instead.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Index: net-next-2.6/drivers/net/e1000e/ethtool.c
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/ethtool.c 2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/ethtool.c 2010-11-11 11:02:21.000000000 +0100
@@ -1860,7 +1860,7 @@
/* bit defines for adapter->led_status */
#define E1000_LED_ON 0
-static void e1000e_led_blink_task(struct work_struct *work)
+void e1000e_led_blink_task(struct work_struct *work)
{
struct e1000_adapter *adapter = container_of(work,
struct e1000_adapter, led_blink_task);
@@ -1892,7 +1892,6 @@
(hw->mac.type == e1000_pch2lan) ||
(hw->mac.type == e1000_82583) ||
(hw->mac.type == e1000_82574)) {
- INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
if (!adapter->blink_timer.function) {
init_timer(&adapter->blink_timer);
adapter->blink_timer.function =
Index: net-next-2.6/drivers/net/e1000e/netdev.c
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/netdev.c 2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/netdev.c 2010-11-11 11:01:17.000000000 +0100
@@ -5864,6 +5864,7 @@
INIT_WORK(&adapter->downshift_task, e1000e_downshift_workaround);
INIT_WORK(&adapter->update_phy_task, e1000e_update_phy_task);
INIT_WORK(&adapter->print_hang_task, e1000_print_hw_hang);
+ INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
/* Initialize link parameters. User can change them with ethtool */
adapter->hw.mac.autoneg = 1;
Index: net-next-2.6/drivers/net/e1000e/e1000.h
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/e1000.h 2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/e1000.h 2010-11-11 11:01:57.000000000 +0100
@@ -482,6 +482,7 @@
extern void e1000e_check_options(struct e1000_adapter *adapter);
extern void e1000e_set_ethtool_ops(struct net_device *netdev);
+extern void e1000e_led_blink_task(struct work_struct *work);
extern int e1000e_up(struct e1000_adapter *adapter);
extern void e1000e_down(struct e1000_adapter *adapter);
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [e1000e] BUG triggered in blink path
2010-11-11 10:17 ` [e1000e] BUG triggered in blink path Holger Eitzenberger
@ 2010-11-11 23:59 ` Jeff Kirsher
2010-11-12 0:40 ` Brandeburg, Jesse
1 sibling, 0 replies; 5+ messages in thread
From: Jeff Kirsher @ 2010-11-11 23:59 UTC (permalink / raw)
To: Brandeburg, Jesse, e1000-devel, netdev
On Thu, Nov 11, 2010 at 02:17, Holger Eitzenberger
<holger@eitzenberger.org> wrote:
> Hi Jesse,
>
> I've attached the patch against net-next-2.6. Please check if it's ok
> for you. I checked e1000, igb and ixgbe as well, they don't have that
> problem.
>
> /holger
>
Patch looks good, I have added the patch to my tree.
Cheers,
Jeff
>> > After taking a look I think this may be caused by initializing
>> > adapter->led_blink_task several times in e1000_phys_id(), while possibly
>> > led_blink_task is running:
>> >
>> > if ((hw->phy.type == e1000_phy_ife) ||
>> > (hw->mac.type == e1000_pchlan) ||
>> > (hw->mac.type == e1000_82574)) {
>> > INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
>> > if (!adapter->blink_timer.function) {
>> >
>> > I can't reproduce it after moving it inside the following if block,
>> > but I'm not quite sure if this catches all races in there. Especially
>> > the msleep_interruptible() may be too optimistic because it may
>> > actually not wait long enough. Someone with more knowledge of the
>> > driver should take a look.
>>
>> thanks for your investigation and troubleshooting. I don't think it is
>> correct at all to be calling INIT_WORK more than once. In fact the
>> INIT_WORK should just be moved into probe, and then e1000_phys_id should
>> just do schedule_work.
>>
>>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [e1000e] BUG triggered in blink path
2010-11-11 10:17 ` [e1000e] BUG triggered in blink path Holger Eitzenberger
2010-11-11 23:59 ` Jeff Kirsher
@ 2010-11-12 0:40 ` Brandeburg, Jesse
1 sibling, 0 replies; 5+ messages in thread
From: Brandeburg, Jesse @ 2010-11-12 0:40 UTC (permalink / raw)
To: Holger Eitzenberger
Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
On Thu, 11 Nov 2010, Holger Eitzenberger wrote:
> I've attached the patch against net-next-2.6. Please check if it's ok
> for you. I checked e1000, igb and ixgbe as well, they don't have that
> problem.
Holger,
I think the patch looks good and thanks. Great catch too! I see Jeff has
put it into our testing process.
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-11-12 0:40 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-09 8:39 [e1000e] BUG triggered when triggering LED blinking Holger Eitzenberger
2010-11-10 18:32 ` Brandeburg, Jesse
2010-11-11 10:17 ` [e1000e] BUG triggered in blink path Holger Eitzenberger
2010-11-11 23:59 ` Jeff Kirsher
2010-11-12 0:40 ` Brandeburg, Jesse
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).