netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [e1000e] BUG triggered when triggering LED blinking
@ 2010-11-09  8:39 Holger Eitzenberger
  2010-11-10 18:32 ` Brandeburg, Jesse
  0 siblings, 1 reply; 5+ messages in thread
From: Holger Eitzenberger @ 2010-11-09  8:39 UTC (permalink / raw)
  To: e1000-devel; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2673 bytes --]

Hi,

using e1000e driver version 1.2.10 and kernel version 2.6.32.24 I see
the kernel go BUG() sporadically at the time 'ethtool -p eth0 3' comes
back.

Network hardware is four times 'Intel Corporation 82583V Gigabit Network
Connection' (0x8086:0x150c) on Atom N450.

kernel BUG at kernel/workqueue.c:287!
invalid opcode: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod
pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata
evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core
rtc_lib e1000e parport ftdi_sio usbhid usbserial

Pid: 8, comm: events/1 Not tainted (2.6.32.24-62.gce8dff6-ai #1) To Be
Filled By O.E.M.
EIP: 0060:[<c102ed4f>] EFLAGS: 00010206 CPU: 1
EIP is at worker_thread+0xc5/0x144
EAX: c1c052e0 EBX: c1d052e0 ECX: f70cac1c EDX: f70cac1c
ESI: f81c75b0 EDI: f70cac18 EBP: f705e3b0 ESP: f7093f90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/1 (pid: 8, ti=f7092000 task=f705e3b0 task.ti=f7092000)
Stack:
 f705e5a0 c1d052ec c1d052e4 00000000 f705e3b0 c103151e f7093fa8 f7093fa8
<0> f7061f68 c1d052e0 c102ec8a 00000000 c103132b 00000000 00000000
00000000
<0> f7093fd0 f7093fd0 c10312ca 00000000 00000000 c100329f f7061f5c
00000000
Call Trace:
 [<c103151e>] ? autoremove_wake_function+0x0/0x2d
 [<c102ec8a>] ? worker_thread+0x0/0x144
 [<c103132b>] ? kthread+0x61/0x66
 [<c10312ca>] ? kthread+0x0/0x66
 [<c100329f>] ? kernel_thread_helper+0x7/0x10
Code: e9 85 00 00 00 8d 79 fc 8b 77 0c 89 7b 18 8b 11 8b 41 04 89 42 04
89 10 89 09 89 49 04 f0 fe 03 fb 8b 41 fc 83 e0 fc 39 c3 74 04 <0f> 0b
eb fe f0 80 61 fc fe 89 f8 ff d6 89 e0 25 00 e0 ff ff 8b
EIP: [<c102ed4f>] worker_thread+0xc5/0x144 SS:ESP 0068:f7093f90
---[ end trace e297b781eb382c2f ]---

The full trace is attached, it may become clearer from that.

After taking a look I think this may be caused by initializing
adapter->led_blink_task several times in e1000_phys_id(), while possibly
led_blink_task is running:

	if ((hw->phy.type == e1000_phy_ife) ||
	    (hw->mac.type == e1000_pchlan) ||
	    (hw->mac.type == e1000_82574)) {
		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
		if (!adapter->blink_timer.function) {

I can't reproduce it after moving it inside the following if block,
but I'm not quite sure if this catches all races in there.  Especially
the msleep_interruptible() may be too optimistic because it may
actually not wait long enough.  Someone with more knowledge of the
driver should take a look.

I've attached a proposed fix for the double initialization, please check.

 /holger


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: putty.log --]
[-- Type: text/plain; charset=unknown-8bit, Size: 4366 bytes --]

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2010.11.03 16:51:28 =~=~=~=~=~=~=~=~=~=~=~=
ÿ------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:287!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core rtc_lib e1000e parport ftdi_sio usbhid usbserial

Pid: 8, comm: events/1 Not tainted (2.6.32.24-62.gce8dff6-ai #1) To Be Filled By O.E.M.
EIP: 0060:[<c102ed4f>] EFLAGS: 00010206 CPU: 1
EIP is at worker_thread+0xc5/0x144
EAX: c1c052e0 EBX: c1d052e0 ECX: f70cac1c EDX: f70cac1c
ESI: f81c75b0 EDI: f70cac18 EBP: f705e3b0 ESP: f7093f90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/1 (pid: 8, ti=f7092000 task=f705e3b0 task.ti=f7092000)
Stack:
 f705e5a0 c1d052ec c1d052e4 00000000 f705e3b0 c103151e f7093fa8 f7093fa8
<0> f7061f68 c1d052e0 c102ec8a 00000000 c103132b 00000000 00000000 00000000
<0> f7093fd0 f7093fd0 c10312ca 00000000 00000000 c100329f f7061f5c 00000000
Call Trace:
 [<c103151e>] ? autoremove_wake_function+0x0/0x2d
 [<c102ec8a>] ? worker_thread+0x0/0x144
 [<c103132b>] ? kthread+0x61/0x66
 [<c10312ca>] ? kthread+0x0/0x66
 [<c100329f>] ? kernel_thread_helper+0x7/0x10
Code: e9 85 00 00 00 8d 79 fc 8b 77 0c 89 7b 18 8b 11 8b 41 04 89 42 04 89 10 89 09 89 49 04 f0 fe 03 fb 8b 41 fc 83 e0 fc 39 c3 74 04 <0f> 0b eb fe f0 80 61 fc fe 89 f8 ff d6 89 e0 25 00 e0 ff ff 8b 
EIP: [<c102ed4f>] worker_thread+0xc5/0x144 SS:ESP 0068:f7093f90
---[ end trace e297b781eb382c2f ]---
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:191!
invalid opcode: 0000 [#2] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core rtc_lib e1000e parport ftdi_sio usbhid usbserial

Pid: 1012, comm: klogd Tainted: G      D    (2.6.32.24-62.gce8dff6-ai #1) To Be Filled By O.E.M.
EIP: 0060:[<c102eeac>] EFLAGS: 00010283 CPU: 0
EIP is at queue_work_on+0x1b/0x44
EAX: f70cac1c EBX: 00000000 ECX: f70cac18 EDX: 00000000
ESI: f7001280 EDI: f81c7920 EBP: f71dbf60 ESP: f71dbf3c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process klogd (pid: 1012, ti=f71da000 task=f71d87a0 task.ti=f71da000)
Stack:
 f70c8330 c133ca00 f81c7931 00000100 c1029710 c133d810 c133d610 c133d410
<0> c133d210 f71dbf60 f71dbf60 00000001 00000004 00000100 00000141 c102652e
<0> c1318010 0000000a 00000000 00000046 00000000 099251d0 bfb6e968 c10265be
Call Trace:
 [<f81c7931>] ? e1000e_set_ethtool_ops+0x1d1/0x38b0 [e1000e]
 [<c1029710>] ? run_timer_softirq+0x116/0x16a
 [<c102652e>] ? __do_softirq+0x78/0xe5
 [<c10265be>] ? do_softirq+0x23/0x27
 [<c102668d>] ? irq_exit+0x26/0x58
 [<c100f9ea>] ? smp_apic_timer_interrupt+0x6c/0x76
 [<c10030f6>] ? apic_timer_interrupt+0x2a/0x30
Code: 2c c1 8b 00 03 04 8d c0 bb 2c c1 e9 3d ff ff ff 56 89 d6 53 89 c3 f0 0f ba 29 00 19 c0 31 d2 85 c0 75 2c 8d 41 04 39 41 04 74 04 <0f> 0b eb fe 83 7e 10 00 89 ca 0f 45 1d b4 bc 2c c1 8b 06 03 04 
EIP: [<c102eeac>] queue_work_on+0x1b/0x44 SS:ESP 0068:f71dbf3c
---[ end trace e297b781eb382c30 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 1012, comm: klogd Tainted: G      D    2.6.32.24-62.gce8dff6-ai #1
Call Trace:
 [<c11e0a50>] ? panic+0x38/0xda
 [<c11e3371>] ? oops_end+0x89/0x94
 [<c1003628>] ? do_invalid_op+0x0/0x70
 [<c100368f>] ? do_invalid_op+0x67/0x70
 [<c102eeac>] ? queue_work_on+0x1b/0x44
 [<c117bfb3>] ? sys_sendto+0xfc/0x127
 [<c101c3cc>] ? dequeue_task_fair+0x3f/0x1bc
 [<c1018e0b>] ? sched_slice+0x6d/0x79
 [<c11e2ac6>] ? error_code+0x66/0x6c
 [<f81c7920>] ? e1000e_set_ethtool_ops+0x1c0/0x38b0 [e1000e]
 [<c1003628>] ? do_invalid_op+0x0/0x70
 [<c102eeac>] ? queue_work_on+0x1b/0x44
 [<f81c7931>] ? e1000e_set_ethtool_ops+0x1d1/0x38b0 [e1000e]
 [<c1029710>] ? run_timer_softirq+0x116/0x16a
 [<c102652e>] ? __do_softirq+0x78/0xe5
 [<c10265be>] ? do_softirq+0x23/0x27
 [<c102668d>] ? irq_exit+0x26/0x58
 [<c100f9ea>] ? smp_apic_timer_interrupt+0x6c/0x76
 [<c10030f6>] ? apic_timer_interrupt+0x2a/0x30

[-- Attachment #3: e1000e-fix.diff --]
[-- Type: text/x-diff, Size: 708 bytes --]

Index: linux-2.6.32.y/drivers/net/e1000e/ethtool.c
===================================================================
--- linux-2.6.32.y.orig/drivers/net/e1000e/ethtool.c	2010-11-08 15:34:35.000000000 +0100
+++ linux-2.6.32.y/drivers/net/e1000e/ethtool.c	2010-11-08 17:32:27.000000000 +0100
@@ -1833,8 +1833,8 @@
 	if ((hw->phy.type == e1000_phy_ife) ||
 	    (hw->mac.type == e1000_pchlan) ||
 	    (hw->mac.type == e1000_82574)) {
-		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
 		if (!adapter->blink_timer.function) {
+			INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
 			init_timer(&adapter->blink_timer);
 			adapter->blink_timer.function =
 				e1000_led_blink_callback;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [e1000e] BUG triggered when triggering LED blinking
  2010-11-09  8:39 [e1000e] BUG triggered when triggering LED blinking Holger Eitzenberger
@ 2010-11-10 18:32 ` Brandeburg, Jesse
  2010-11-11 10:17   ` [e1000e] BUG triggered in blink path Holger Eitzenberger
  0 siblings, 1 reply; 5+ messages in thread
From: Brandeburg, Jesse @ 2010-11-10 18:32 UTC (permalink / raw)
  To: Holger Eitzenberger
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org



On Tue, 9 Nov 2010, Holger Eitzenberger wrote:

> Hi,
> 
> using e1000e driver version 1.2.10 and kernel version 2.6.32.24 I see
> the kernel go BUG() sporadically at the time 'ethtool -p eth0 3' comes
> back.
> 
> Network hardware is four times 'Intel Corporation 82583V Gigabit Network
> Connection' (0x8086:0x150c) on Atom N450.
> 
> kernel BUG at kernel/workqueue.c:287!

<snip>

> 
> The full trace is attached, it may become clearer from that.
> 
> After taking a look I think this may be caused by initializing
> adapter->led_blink_task several times in e1000_phys_id(), while possibly
> led_blink_task is running:
> 
> 	if ((hw->phy.type == e1000_phy_ife) ||
> 	    (hw->mac.type == e1000_pchlan) ||
> 	    (hw->mac.type == e1000_82574)) {
> 		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
> 		if (!adapter->blink_timer.function) {
> 
> I can't reproduce it after moving it inside the following if block,
> but I'm not quite sure if this catches all races in there.  Especially
> the msleep_interruptible() may be too optimistic because it may
> actually not wait long enough.  Someone with more knowledge of the
> driver should take a look.

thanks for your investigation and troubleshooting.  I don't think it is 
correct at all to be calling INIT_WORK more than once.  In fact the 
INIT_WORK should just be moved into probe, and then e1000_phys_id should 
just do schedule_work.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [e1000e] BUG triggered in blink path
  2010-11-10 18:32 ` Brandeburg, Jesse
@ 2010-11-11 10:17   ` Holger Eitzenberger
  2010-11-11 23:59     ` Jeff Kirsher
  2010-11-12  0:40     ` Brandeburg, Jesse
  0 siblings, 2 replies; 5+ messages in thread
From: Holger Eitzenberger @ 2010-11-11 10:17 UTC (permalink / raw)
  To: Brandeburg, Jesse; +Cc: e1000-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]

Hi Jesse,

I've attached the patch against net-next-2.6.  Please check if it's ok
for you.  I checked e1000, igb and ixgbe as well, they don't have that
problem.

 /holger

> > After taking a look I think this may be caused by initializing
> > adapter->led_blink_task several times in e1000_phys_id(), while possibly
> > led_blink_task is running:
> > 
> > 	if ((hw->phy.type == e1000_phy_ife) ||
> > 	    (hw->mac.type == e1000_pchlan) ||
> > 	    (hw->mac.type == e1000_82574)) {
> > 		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
> > 		if (!adapter->blink_timer.function) {
> > 
> > I can't reproduce it after moving it inside the following if block,
> > but I'm not quite sure if this catches all races in there.  Especially
> > the msleep_interruptible() may be too optimistic because it may
> > actually not wait long enough.  Someone with more knowledge of the
> > driver should take a look.
> 
> thanks for your investigation and troubleshooting.  I don't think it is 
> correct at all to be calling INIT_WORK more than once.  In fact the 
> INIT_WORK should just be moved into probe, and then e1000_phys_id should 
> just do schedule_work.
> 
> 

[-- Attachment #2: x.diff --]
[-- Type: text/x-diff, Size: 2637 bytes --]

e1000e: fix double initialization in blink path

The kernel goes BUG() at the time 'ethtool -p eth0 3' comes
back, which is due to adapter->led_blink_task initialized
several times.  At the time it is still running this results
in a corrupted task_list of the associated workqueue.

The fix is to move the workqueue initialization to the
probe function instead.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: net-next-2.6/drivers/net/e1000e/ethtool.c
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/ethtool.c	2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/ethtool.c	2010-11-11 11:02:21.000000000 +0100
@@ -1860,7 +1860,7 @@
 /* bit defines for adapter->led_status */
 #define E1000_LED_ON		0
 
-static void e1000e_led_blink_task(struct work_struct *work)
+void e1000e_led_blink_task(struct work_struct *work)
 {
 	struct e1000_adapter *adapter = container_of(work,
 	                                struct e1000_adapter, led_blink_task);
@@ -1892,7 +1892,6 @@
 	    (hw->mac.type == e1000_pch2lan) ||
 	    (hw->mac.type == e1000_82583) ||
 	    (hw->mac.type == e1000_82574)) {
-		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
 		if (!adapter->blink_timer.function) {
 			init_timer(&adapter->blink_timer);
 			adapter->blink_timer.function =
Index: net-next-2.6/drivers/net/e1000e/netdev.c
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/netdev.c	2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/netdev.c	2010-11-11 11:01:17.000000000 +0100
@@ -5864,6 +5864,7 @@
 	INIT_WORK(&adapter->downshift_task, e1000e_downshift_workaround);
 	INIT_WORK(&adapter->update_phy_task, e1000e_update_phy_task);
 	INIT_WORK(&adapter->print_hang_task, e1000_print_hw_hang);
+	INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
 
 	/* Initialize link parameters. User can change them with ethtool */
 	adapter->hw.mac.autoneg = 1;
Index: net-next-2.6/drivers/net/e1000e/e1000.h
===================================================================
--- net-next-2.6.orig/drivers/net/e1000e/e1000.h	2010-11-11 10:57:28.000000000 +0100
+++ net-next-2.6/drivers/net/e1000e/e1000.h	2010-11-11 11:01:57.000000000 +0100
@@ -482,6 +482,7 @@
 
 extern void e1000e_check_options(struct e1000_adapter *adapter);
 extern void e1000e_set_ethtool_ops(struct net_device *netdev);
+extern void e1000e_led_blink_task(struct work_struct *work);
 
 extern int e1000e_up(struct e1000_adapter *adapter);
 extern void e1000e_down(struct e1000_adapter *adapter);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [e1000e] BUG triggered in blink path
  2010-11-11 10:17   ` [e1000e] BUG triggered in blink path Holger Eitzenberger
@ 2010-11-11 23:59     ` Jeff Kirsher
  2010-11-12  0:40     ` Brandeburg, Jesse
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff Kirsher @ 2010-11-11 23:59 UTC (permalink / raw)
  To: Brandeburg, Jesse, e1000-devel, netdev

On Thu, Nov 11, 2010 at 02:17, Holger Eitzenberger
<holger@eitzenberger.org> wrote:
> Hi Jesse,
>
> I've attached the patch against net-next-2.6.  Please check if it's ok
> for you.  I checked e1000, igb and ixgbe as well, they don't have that
> problem.
>
>  /holger
>

Patch looks good, I have added the patch to my tree.

Cheers,
Jeff

>> > After taking a look I think this may be caused by initializing
>> > adapter->led_blink_task several times in e1000_phys_id(), while possibly
>> > led_blink_task is running:
>> >
>> >     if ((hw->phy.type == e1000_phy_ife) ||
>> >         (hw->mac.type == e1000_pchlan) ||
>> >         (hw->mac.type == e1000_82574)) {
>> >             INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
>> >             if (!adapter->blink_timer.function) {
>> >
>> > I can't reproduce it after moving it inside the following if block,
>> > but I'm not quite sure if this catches all races in there.  Especially
>> > the msleep_interruptible() may be too optimistic because it may
>> > actually not wait long enough.  Someone with more knowledge of the
>> > driver should take a look.
>>
>> thanks for your investigation and troubleshooting.  I don't think it is
>> correct at all to be calling INIT_WORK more than once.  In fact the
>> INIT_WORK should just be moved into probe, and then e1000_phys_id should
>> just do schedule_work.
>>
>>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [e1000e] BUG triggered in blink path
  2010-11-11 10:17   ` [e1000e] BUG triggered in blink path Holger Eitzenberger
  2010-11-11 23:59     ` Jeff Kirsher
@ 2010-11-12  0:40     ` Brandeburg, Jesse
  1 sibling, 0 replies; 5+ messages in thread
From: Brandeburg, Jesse @ 2010-11-12  0:40 UTC (permalink / raw)
  To: Holger Eitzenberger
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org



On Thu, 11 Nov 2010, Holger Eitzenberger wrote:
> I've attached the patch against net-next-2.6.  Please check if it's ok
> for you.  I checked e1000, igb and ixgbe as well, they don't have that
> problem.

Holger, 

I think the patch looks good and thanks.  Great catch too!  I see Jeff has 
put it into our testing process.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-11-12  0:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-09  8:39 [e1000e] BUG triggered when triggering LED blinking Holger Eitzenberger
2010-11-10 18:32 ` Brandeburg, Jesse
2010-11-11 10:17   ` [e1000e] BUG triggered in blink path Holger Eitzenberger
2010-11-11 23:59     ` Jeff Kirsher
2010-11-12  0:40     ` Brandeburg, Jesse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).