Re: 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?

* Re: 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
       [not found]       ` <20101021190208.GA7289@nik-comp.lan>
@ 2010-10-21 19:09         ` Brandeburg, Jesse
  2010-10-22 11:01           ` [E1000-devel] " Jussi Kivilinna
  2010-10-22 18:15           ` Ben Greear
  0 siblings, 2 replies; 3+ messages in thread
From: Brandeburg, Jesse @ 2010-10-21 19:09 UTC
  To: Nikola Ciprich
  Cc: Tantilov, Emil S, e1000-devel list, netdev, Linux kernel list,
	nikola.ciprich@linuxbox.cz, linux-net maillist


Adding netdev... beware the top post ordering in the thread.

On Thu, 21 Oct 2010, Nikola Ciprich wrote:

> OK, here are the steps to reproduce the problem:
> 
> ip link set up dev eth0
> vconfig add eth0 10
> ip link set up dev eth0.10
> brctl addbr brtest
> # all fine up to here; what brings it down is:
> brctl addif brtest eth0.10
> 
> The last command causes a panic within a few seconds.
> 
> Interestingly, it is reproducible only on eth0, not on
> eth1 (both are onboard 80003ES2LAN).
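
The quoted steps can be collected into one script. This is only a sketch: the DRY_RUN switch, the run() and reproduce() helpers, and the variable names are hypothetical and not from the report. It uses `addif`, which is brctl's subcommand for attaching a port to a bridge. Because the real sequence panics affected kernels, the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Reproducer sketch. Hypothetical names: DRY_RUN, run, reproduce.
PARENT=${PARENT:-eth0}      # physical interface from the report
VID=${VID:-10}              # VLAN id from the report
BRIDGE=${BRIDGE:-brtest}    # bridge name from the report
DRY_RUN=${DRY_RUN:-1}       # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    run ip link set up dev "$PARENT"
    run vconfig add "$PARENT" "$VID"
    run ip link set up dev "$PARENT.$VID"
    run brctl addbr "$BRIDGE"
    # The reported panic follows this step within a few seconds.
    run brctl addif "$BRIDGE" "$PARENT.$VID"
}

reproduce
```

Setting PARENT=eth1 gives the variant that reportedly does not crash, which may help when comparing the two ports.
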
> 
> here's the lspci for those:
> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
> 	Subsystem: Super Micro Computer Inc Unknown device 0000
> 	Flags: bus master, fast devsel, latency 0, IRQ 65
> 	Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
> 	I/O ports at 3000 [size=32]
> 	Capabilities: [c8] Power Management version 2
> 	Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
> 	Capabilities: [e0] Express Endpoint IRQ 0
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00
> 
> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
> 	Subsystem: Super Micro Computer Inc Unknown device 0000
> 	Flags: bus master, fast devsel, latency 0, IRQ 66
> 	Memory at d8320000 (32-bit, non-prefetchable) [size=128K]
> 	I/O ports at 3020 [size=32]
> 	Capabilities: [c8] Power Management version 2
> 	Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
> 	Capabilities: [e0] Express Endpoint IRQ 0
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00
> 
> Last but not least, I ran a bisect and it leads to:
> commit ae878ae280bea286ff2b1e1cb6e609dd8cb4501d
> Author: Maciej Żenczykowski <maze@google.com>
> Date:   Sun Oct 3 14:49:00 2010 -0700
> 
>     net: Fix IPv6 PMTU disc. w/ asymmetric routes
> 
> It doesn't seem related to me at all, but reverting it really does seem to fix the problem for me.
> I hope it helps.
> with best regards
> nik
> 
> 
> 
> On Wed, Oct 20, 2010 at 06:36:59PM +0200, Nikola Ciprich wrote:
> > so unfortunately I have to take back what I just wrote :(
> > the problem still persists, it just seems to be more random
> > so I'll try to separate the exact command that causes the panic..
> > n.
> > 
> > On Wed, Oct 20, 2010 at 04:46:40PM +0200, Nikola Ciprich wrote:
> > > Hello Emil,
> > > 
> > > I tried it now, I can still 100% reproduce with 2.6.36-rc7-git2, but
> > > with 2.6.36-rc8-git5 it works OK. So it certainly got fixed in the meantime!
> > > I'll therefore close the bug in BZ.
> > > 
> > > have a nice day!
> > > 
> > > best regards
> > > 
> > > nik
> > > 
> > > 
> > > On Fri, Oct 15, 2010 at 10:58:15AM -0600, Tantilov, Emil S wrote:
> > > > >-----Original Message-----
> > > > >From: Nikola Ciprich [mailto:extmaillist@linuxbox.cz]
> > > > >Sent: Wednesday, October 13, 2010 10:23 PM
> > > > >To: Linux kernel list; linux-net maillist; e1000-devel list
> > > > >Cc: nikola.ciprich@linuxbox.cz
> > > > >Subject: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
> > > > >
> > > > >Hi,
> > > > >when I try to boot 2.6.36-rc7-git2 on one of my machines, it crashes while
> > > > >setting up the network.
> > > > 
> > > > Thanks for letting us know!
> > > > 
> > > > >The setup is quite complex, with bonding, lots of vlans and 3 intel
> > > > >adapters (system is quad x86_64)
> > > > 
> > > > Could you provide more details about the exact setup and the sequence of 
> > > > commands that lead to the crash? If you can narrow it down to the actual 
> > > > command that caused the crash that would be very helpful.
> > > > 
> > > > Also - are you testing from Linus or net-next tree? If you are not using 
> > > > net-next, could you give it a try and see if you can reproduce it there?
> > > > 
> > > > >
> > > > >snip of lspci:
> > > > >06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
> > > > >Controller (Copper) (rev 01)
> > > > >06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
> > > > >Controller (Copper) (rev 01)
> > > > >09:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
> > > > >Connection
> > > > 
> > > > Could you provide the output of lspci -vvv and a kernel config?


* Re: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
  2010-10-21 19:09         ` 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans? Brandeburg, Jesse
@ 2010-10-22 11:01           ` Jussi Kivilinna
  2010-10-22 18:15           ` Ben Greear
  1 sibling, 0 replies; 3+ messages in thread
From: Jussi Kivilinna @ 2010-10-22 11:01 UTC
  To: Brandeburg, Jesse
  Cc: Nikola Ciprich, Tantilov, Emil S, linux-net maillist,
	e1000-devel list, nikola.ciprich@linuxbox.cz, Linux kernel list,
	netdev

Hello!

I seem to have the same problem, but with an r8169 device:

  02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
	Subsystem: ASUSTeK Computer Inc. Device 83a3
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 40
	Region 0: I/O ports at e800 [size=256]
	Region 2: Memory at f8fff000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at f8ff8000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at febf0000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
		Address: 00000000fee0100c  Data: 4161
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM+ Suprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [ac] MSI-X: Enable- Mask- TabSize=4
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [cc] Vital Product Data <?>
	Capabilities: [100] Advanced Error Reporting <?>
	Capabilities: [140] Virtual Channel <?>
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: r8169
	Kernel modules: r8169

When I tried to boot, the computer froze in vlan_hwaccel_do_receive
(GPF); it locked up completely and did not respond to SysRq. A quick
search turned up this thread, so I figured the problem must be in the
shared hardware VLAN acceleration code. I disabled VLAN hwaccel in the
r8169 driver (CONFIG_R8169_VLAN=n) and the computer now boots OK.
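
To check whether a given kernel build has the option enabled, something like the sketch below may help. The config_state helper is a hypothetical name, and the config file location is an assumption that varies by distribution. Note that an option set to "n" normally shows up in a .config as a "# ... is not set" comment rather than as "=n".

```shell
# config_state OPTION FILE
# Report how OPTION appears in the kernel config FILE.
# Hypothetical helper, not part of any kernel tooling.
config_state() {
    if grep -qs "^$1=[ym]" "$2"; then
        echo enabled
    elif grep -qs "^# $1 is not set" "$2"; then
        echo disabled
    else
        echo absent
    fi
}

# Example call (the path is an assumption; adjust for your system):
# config_state CONFIG_R8169_VLAN "/boot/config-$(uname -r)"
```

"absent" covers both a missing file and a kernel version that does not define the option at all.
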

Network is set up with two VLANs and two bridges:
  bridge name	bridge id		STP enabled	interfaces
  br0		8000.xxxxxxxxxxxx	no		dummy0
							dummy2
							dummy4
							dummy6
							eth0.0000
  wanbr0		8000.yyyyyyyyyyyy	no		dummy1
							dummy3
							eth0.0009

-Jussi


* Re: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
  2010-10-21 19:09         ` 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans? Brandeburg, Jesse
  2010-10-22 11:01           ` [E1000-devel] " Jussi Kivilinna
@ 2010-10-22 18:15           ` Ben Greear
  1 sibling, 0 replies; 3+ messages in thread
From: Ben Greear @ 2010-10-22 18:15 UTC
  To: Brandeburg, Jesse
  Cc: Nikola Ciprich, Tantilov, Emil S, linux-net maillist,
	e1000-devel list, nikola.ciprich@linuxbox.cz, Linux kernel list,
	netdev

On 10/21/2010 12:09 PM, Brandeburg, Jesse wrote:
>
> Adding netdev... beware the top post ordering in the thread.

Is there any more info, like a stack trace?  We just saw this on
one of our more complex setups.  Kernel is 2.6.36, with some patches,
including a proprietary module:

general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0f:01.0/class
CPU 2
Modules linked in: 8021q garp bridge veth arc4 michael_mic macvlan wanlink(P) pktgen iscsi_tcp libiscsi_]

Pid: 0, comm: kworker/0:1 Tainted: P            2.6.36-rc8+ #3 X7DBU/X7DBU
RIP: 0010:[<ffffffff813ccc35>]  [<ffffffff813ccc35>] vlan_hwaccel_do_receive+0x64/0xca
RSP: 0018:ffff880001a83c00  EFLAGS: 00010283
RAX: 0000000000000002 RBX: ffff880047c9ee00 RCX: ffff880074c18000
RDX: ffff8800ffffffff RSI: 0000000000004359 RDI: 0000000000000001
RBP: ffff880001a83c20 R08: 00000000000003eb R09: ffffffff810620af
R10: ffff880047c9ee28 R11: 00000000ffffffff R12: ffff880074c18000
R13: ffff1000766988d0 R14: ffffc900037e1dd8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000462073 CR3: 0000000074219000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff88007d030000, task ffff88007d76f700)
Stack:
  0000000000014400 ffff880047c9ee00 ffff880074c18948 ffff880047c9ee08
<0> ffff880001a83c90 ffffffff813456ed ffff880001a83c40 ffffffff8100fbba
<0> ffff880001a83c70 ffffffff81061dad ffff880001b102c0 ffff880047c9ee00
Call Trace:
  <IRQ>
  [<ffffffff813456ed>] __netif_receive_skb+0x4b/0x444
  [<ffffffff8100fbba>] ? read_tsc+0x9/0x1b
  [<ffffffff81061dad>] ? getnstimeofday+0x5e/0xb4
  [<ffffffff8134697a>] netif_receive_skb+0x7c/0x83
  [<ffffffff813470b5>] napi_skb_finish+0x24/0x3b
  [<ffffffff813ccf16>] vlan_gro_receive+0x7b/0x7d
  [<ffffffffa02bff4b>] e1000_receive_skb+0x54/0x70 [e1000e]
  [<ffffffffa02c1cc9>] e1000_clean_rx_irq+0x1fe/0x2aa [e1000e]
  [<ffffffff810651de>] ? clockevents_program_event+0x75/0x7e
  [<ffffffff810651de>] ? clockevents_program_event+0x75/0x7e
  [<ffffffffa02c20a7>] e1000_clean+0x75/0x221 [e1000e]
  [<ffffffff81346b67>] net_rx_action+0xad/0x1e9
  [<ffffffff8100fcd0>] ? native_sched_clock+0x3c/0x68
  [<ffffffff81048932>] __do_softirq+0xa8/0x135
  [<ffffffff8100a99c>] call_softirq+0x1c/0x30
  [<ffffffff8100c05d>] do_softirq+0x41/0x7e
  [<ffffffff81048ac4>] irq_exit+0x36/0x85
  [<ffffffff8100b797>] do_IRQ+0xad/0xc4
  [<ffffffff813efa13>] ret_from_intr+0x0/0x11
  <EOI>
  [<ffffffff81010840>] ? mwait_idle+0x7f/0x8c
  [<ffffffff81010833>] ? mwait_idle+0x72/0x8c
  [<ffffffff81008dd5>] cpu_idle+0x59/0xb5
  [<ffffffff813e97d6>] start_secondary+0x1a9/0x1ae
Code: 0d 0f b7 c0 41 8b 44 85 04 66 c7 83 c4 00 00 00 00 00 89 43 78 4d 8b ad d8 00 00 00 e8 11 87 e0 ff
RIP  [<ffffffff813ccc35>] vlan_hwaccel_do_receive+0x64/0xca
  RSP <ffff880001a83c00>
---[ end trace 64a9f9c2bdc31dcd ]---
Kernel panic - not syncing: Fatal exception in interrupt
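
One detail in the register dump may be worth noting: R13 (ffff1000766988d0) is not a canonical x86-64 address, while pointers such as RBX and RCX are. Dereferencing a non-canonical address raises a general protection fault rather than a page fault, so R13 is a plausible candidate for the bad pointer. That is a guess only, since the truncated Code: line does not show which instruction faulted. The sketch below assumes 48-bit virtual addresses, where bits 63..47 must all be equal; is_canonical is a hypothetical helper.

```shell
# is_canonical ADDR
# ADDR must be 16 lowercase hex digits without a 0x prefix.
# Assumes 48-bit virtual addresses: the top 16 bits must equal bit 47,
# i.e. "ffff" followed by a digit 8..f, or "0000" followed by 0..7.
is_canonical() {
    case "$1" in
        ffff[89abcdef]*|0000[01234567]*) echo canonical ;;
        *) echo non-canonical ;;
    esac
}

# Register values copied from the oops above.
for reg in RBX=ffff880047c9ee00 RCX=ffff880074c18000 R13=ffff1000766988d0; do
    printf '%s %s\n' "${reg%%=*}" "$(is_canonical "${reg#*=}")"
done
```
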

I recompiled this kernel with symbols, and the crash points here. We'll
try to reproduce with the newly compiled kernel, in case merely compiling
with symbols changes the offsets.

(gdb) l *(vlan_hwaccel_do_receive+0x64)
0xffffffff813ccc55 is in vlan_hwaccel_do_receive (/home/greearb/git/linux-2.6.dev.36.y/net/8021q/vlan_core.c:56).
51		skb->vlan_tci = 0;
52	
53		rx_stats = this_cpu_ptr(vlan_dev_info(dev)->vlan_rx_stats);
54	
55		u64_stats_update_begin(&rx_stats->syncp);
56		rx_stats->rx_packets++;
57		rx_stats->rx_bytes += skb->len;
58	
59		switch (skb->pkt_type) {
60		case PACKET_BROADCAST:
(gdb)
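
The two builds do appear to be laid out differently. The oops reports vlan_hwaccel_do_receive+0x64 at ffffffff813ccc35, while gdb resolves the same symbol and offset in the rebuilt image to 0xffffffff813ccc55. Subtracting the offset from each address gives function start addresses that differ, so line 56 may or may not be the exact faulting line in the original build. The sketch below does that arithmetic on the low 32 bits only, since the upper ffffffff half is identical in both addresses; sym_base is a hypothetical helper.

```shell
# sym_base ADDR OFFSET
# Print ADDR - OFFSET in hex. Both arguments are hex without a 0x prefix.
sym_base() {
    printf '%x\n' "$(( 0x$1 - 0x$2 ))"
}

oops_base=$(sym_base 813ccc35 64)   # RIP line of the oops
gdb_base=$(sym_base 813ccc55 64)    # address printed by gdb
delta=$(( 0x$gdb_base - 0x$oops_base ))

echo "oops build: function starts at ffffffff$oops_base"
echo "gdb build:  function starts at ffffffff$gdb_base"
printf 'difference: 0x%x bytes\n' "$delta"   # prints: difference: 0x20 bytes
```
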


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


