* Re: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
2010-10-21 19:09 ` 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans? Brandeburg, Jesse
@ 2010-10-22 11:01 ` Jussi Kivilinna
2010-10-22 18:15 ` Ben Greear
1 sibling, 0 replies; 3+ messages in thread
From: Jussi Kivilinna @ 2010-10-22 11:01 UTC (permalink / raw)
To: Brandeburg, Jesse
Cc: Nikola Ciprich, Tantilov, Emil S, linux-net maillist,
e1000-devel list, nikola.ciprich@linuxbox.cz, Linux kernel list,
netdev
Hello!
I seem to have the same problem, but with an r8169 device:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
        Subsystem: ASUSTeK Computer Inc. Device 83a3
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 40
        Region 0: I/O ports at e800 [size=256]
        Region 2: Memory at f8fff000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at f8ff8000 (64-bit, prefetchable) [size=16K]
        Expansion ROM at febf0000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0100c Data: 4161
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM+ Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [ac] MSI-X: Enable- Mask- TabSize=4
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [cc] Vital Product Data <?>
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: r8169
        Kernel modules: r8169
When I try to boot, the computer hits a GPF in vlan_hwaccel_do_receive
and freezes completely, not even responding to sysrq.
A quick search turned up this thread, so I figured the problem must be
in the shared hardware VLAN acceleration code. I disabled VLAN hwaccel
in the r8169 driver (CONFIG_R8169_VLAN=n) and the computer now boots fine.
The network is set up with two VLANs and two bridges:
bridge name     bridge id               STP enabled     interfaces
br0             8000.xxxxxxxxxxxx       no              dummy0
                                                        dummy2
                                                        dummy4
                                                        dummy6
                                                        eth0.0000
wanbr0          8000.yyyyyyyyyyyy       no              dummy1
                                                        dummy3
                                                        eth0.0009
-Jussi
Quoting "Brandeburg, Jesse" <jesse.brandeburg@intel.com>:
>
> Adding netdev... beware the top post ordering in the thread.
>
> On Thu, 21 Oct 2010, Nikola Ciprich wrote:
>
>> Ok, here're the steps to reproduce the problem:
>>
>> ip link set up dev eth0
>> vconfig add eth0 10
>> ip link set up dev eth0.10
>> brctl addbr brtest
>> # that was OK so far; this is what brings it down:
>> brctl addif brtest eth0.10
>>
>> The last command causes a panic within a few seconds.
>>
>> The interesting thing is that it's reproducible only for eth0, not for
>> eth1 (both are onboard 80003ES2LAN).
>>
>> here's the lspci for those:
>> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
>>         Subsystem: Super Micro Computer Inc Unknown device 0000
>>         Flags: bus master, fast devsel, latency 0, IRQ 65
>>         Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
>>         I/O ports at 3000 [size=32]
>>         Capabilities: [c8] Power Management version 2
>>         Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
>>         Capabilities: [e0] Express Endpoint IRQ 0
>>         Capabilities: [100] Advanced Error Reporting
>>         Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00
>>
>> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
>>         Subsystem: Super Micro Computer Inc Unknown device 0000
>>         Flags: bus master, fast devsel, latency 0, IRQ 66
>>         Memory at d8320000 (32-bit, non-prefetchable) [size=128K]
>>         I/O ports at 3020 [size=32]
>>         Capabilities: [c8] Power Management version 2
>>         Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
>>         Capabilities: [e0] Express Endpoint IRQ 0
>>         Capabilities: [100] Advanced Error Reporting
>>         Capabilities: [140] Device Serial Number 5a-19-34-ff-ff-48-30-00
>>
>> Last but not least, I bisected it, which leads to:
>> commit ae878ae280bea286ff2b1e1cb6e609dd8cb4501d
>> Author: Maciej Żenczykowski <maze@google.com>
>> Date:   Sun Oct 3 14:49:00 2010 -0700
>>
>>     net: Fix IPv6 PMTU disc. w/ asymmetric routes
>>
>> It doesn't seem related to me at all, but reverting it really does
>> seem to fix the problem for me.
>> I hope it helps..
>> with best regards
>> nik
>>
>>
>>
>> On Wed, Oct 20, 2010 at 06:36:59PM +0200, Nikola Ciprich wrote:
>> > so unfortunately I have to take back what I just wrote :(
>> > the problem still persists, it just seems to be more random
>> > so I'll try to separate the exact command that causes the panic..
>> > n.
>> >
>> > On Wed, Oct 20, 2010 at 04:46:40PM +0200, Nikola Ciprich wrote:
>> > > Hello Emil,
>> > >
>> > > I tried it now, I can still 100% reproduce with 2.6.36-rc7-git2, but
>> > > with 2.6.36-rc8-git5 it works OK. So it certainly got fixed in
>> > > the meantime!
>> > > I'll therefore close the bug in BZ.
>> > >
>> > > have a nice day!
>> > >
>> > > best regards
>> > >
>> > > nik
>> > >
>> > >
>> > > On Fri, Oct 15, 2010 at 10:58:15AM -0600, Tantilov, Emil S wrote:
>> > > > >-----Original Message-----
>> > > > >From: Nikola Ciprich [mailto:extmaillist@linuxbox.cz]
>> > > > >Sent: Wednesday, October 13, 2010 10:23 PM
>> > > > >To: Linux kernel list; linux-net maillist; e1000-devel list
>> > > > >Cc: nikola.ciprich@linuxbox.cz
>> > > > >Subject: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
>> > > > >
>> > > > >Hi,
>> > > > >when I try to boot 2.6.36-rc7-git2 on one of my machines, it
>> > > > >crashes while setting up the network.
>> > > >
>> > > > Thanks for letting us know!
>> > > >
>> > > > >The setup is quite complex, with bonding, lots of vlans and 3 intel
>> > > > >adapters (system is quad x86_64)
>> > > >
>> > > > Could you provide more details about the exact setup and the
>> > > > sequence of commands that lead to the crash? If you can narrow
>> > > > it down to the actual command that caused the crash that would
>> > > > be very helpful.
>> > > >
>> > > > Also - are you testing from Linus or net-next tree? If you are
>> > > > not using net-next, could you give it a try and see if you can
>> > > > reproduce it there?
>> > > >
>> > > > >
>> > > > >snip of lspci:
>> > > > >06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
>> > > > >06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
>> > > > >09:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>> > > >
>> > > > Could you provide the output of lspci -vvv and a kernel config?
* Re: [E1000-devel] 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans?
2010-10-21 19:09 ` 2.6.36-rc7-git2 - panic/GPF: e1000e/vlans? Brandeburg, Jesse
2010-10-22 11:01 ` [E1000-devel] " Jussi Kivilinna
@ 2010-10-22 18:15 ` Ben Greear
1 sibling, 0 replies; 3+ messages in thread
From: Ben Greear @ 2010-10-22 18:15 UTC (permalink / raw)
To: Brandeburg, Jesse
Cc: Nikola Ciprich, Tantilov, Emil S, linux-net maillist,
e1000-devel list, nikola.ciprich@linuxbox.cz, Linux kernel list,
netdev
On 10/21/2010 12:09 PM, Brandeburg, Jesse wrote:
>
> Adding netdev... beware the top post ordering in the thread.
Is there any more info, like a stack trace? We just saw this on
one of our more complex setups. Kernel is 2.6.36, with some patches,
including a proprietary module:
general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0f:01.0/class
CPU 2
Modules linked in: 8021q garp bridge veth arc4 michael_mic macvlan wanlink(P) pktgen iscsi_tcp libiscsi_]
Pid: 0, comm: kworker/0:1 Tainted: P 2.6.36-rc8+ #3 X7DBU/X7DBU
RIP: 0010:[<ffffffff813ccc35>] [<ffffffff813ccc35>] vlan_hwaccel_do_receive+0x64/0xca
RSP: 0018:ffff880001a83c00 EFLAGS: 00010283
RAX: 0000000000000002 RBX: ffff880047c9ee00 RCX: ffff880074c18000
RDX: ffff8800ffffffff RSI: 0000000000004359 RDI: 0000000000000001
RBP: ffff880001a83c20 R08: 00000000000003eb R09: ffffffff810620af
R10: ffff880047c9ee28 R11: 00000000ffffffff R12: ffff880074c18000
R13: ffff1000766988d0 R14: ffffc900037e1dd8 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000462073 CR3: 0000000074219000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff88007d030000, task ffff88007d76f700)
Stack:
0000000000014400 ffff880047c9ee00 ffff880074c18948 ffff880047c9ee08
<0> ffff880001a83c90 ffffffff813456ed ffff880001a83c40 ffffffff8100fbba
<0> ffff880001a83c70 ffffffff81061dad ffff880001b102c0 ffff880047c9ee00
Call Trace:
<IRQ>
[<ffffffff813456ed>] __netif_receive_skb+0x4b/0x444
[<ffffffff8100fbba>] ? read_tsc+0x9/0x1b
[<ffffffff81061dad>] ? getnstimeofday+0x5e/0xb4
[<ffffffff8134697a>] netif_receive_skb+0x7c/0x83
[<ffffffff813470b5>] napi_skb_finish+0x24/0x3b
[<ffffffff813ccf16>] vlan_gro_receive+0x7b/0x7d
[<ffffffffa02bff4b>] e1000_receive_skb+0x54/0x70 [e1000e]
[<ffffffffa02c1cc9>] e1000_clean_rx_irq+0x1fe/0x2aa [e1000e]
[<ffffffff810651de>] ? clockevents_program_event+0x75/0x7e
[<ffffffff810651de>] ? clockevents_program_event+0x75/0x7e
[<ffffffffa02c20a7>] e1000_clean+0x75/0x221 [e1000e]
[<ffffffff81346b67>] net_rx_action+0xad/0x1e9
[<ffffffff8100fcd0>] ? native_sched_clock+0x3c/0x68
[<ffffffff81048932>] __do_softirq+0xa8/0x135
[<ffffffff8100a99c>] call_softirq+0x1c/0x30
[<ffffffff8100c05d>] do_softirq+0x41/0x7e
[<ffffffff81048ac4>] irq_exit+0x36/0x85
[<ffffffff8100b797>] do_IRQ+0xad/0xc4
[<ffffffff813efa13>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff81010840>] ? mwait_idle+0x7f/0x8c
[<ffffffff81010833>] ? mwait_idle+0x72/0x8c
[<ffffffff81008dd5>] cpu_idle+0x59/0xb5
[<ffffffff813e97d6>] start_secondary+0x1a9/0x1ae
Code: 0d 0f b7 c0 41 8b 44 85 04 66 c7 83 c4 00 00 00 00 00 89 43 78 4d 8b ad d8 00 00 00 e8 11 87 e0 ff
RIP [<ffffffff813ccc35>] vlan_hwaccel_do_receive+0x64/0xca
RSP <ffff880001a83c00>
---[ end trace 64a9f9c2bdc31dcd ]---
Kernel panic - not syncing: Fatal exception in interrupt
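On x86_64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, so it may be worth checking which registers in the dump above hold non-canonical values. A minimal sketch, assuming 48-bit virtual addresses (4-level paging); the helper name is_canonical_48 is made up for illustration and is not a kernel function:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns 1 if addr is canonical
 * under 48-bit virtual addressing, i.e. bits 63..47 are all equal.
 * Assumption: 4-level paging with 48-bit virtual addresses.
 */
static int is_canonical_48(uint64_t addr)
{
    /* Keep only bits 63..47 (17 bits). */
    uint64_t top = addr >> 47;

    /* Canonical means those bits are all zeros or all ones. */
    return top == 0 || top == 0x1ffff;
}
```

Applied to the register values above, R13 (ffff1000766988d0) comes out non-canonical, while RBX, RCX and R14 are canonical. That is only a hint about which pointer may be bad, not a diagnosis.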
I recompiled this kernel with symbols, and the crash points to the line
below. We'll try to reproduce with the newly compiled kernel, in case merely
compiling with symbols changed the offsets.
(gdb) l *(vlan_hwaccel_do_receive+0x64)
0xffffffff813ccc55 is in vlan_hwaccel_do_receive (/home/greearb/git/linux-2.6.dev.36.y/net/8021q/vlan_core.c:56).
51 skb->vlan_tci = 0;
52
53 rx_stats = this_cpu_ptr(vlan_dev_info(dev)->vlan_rx_stats);
54
55 u64_stats_update_begin(&rx_stats->syncp);
56 rx_stats->rx_packets++;
57 rx_stats->rx_bytes += skb->len;
58
59 switch (skb->pkt_type) {
60 case PACKET_BROADCAST:
(gdb)
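The caveat about offsets can be checked with a little arithmetic: the oops reports vlan_hwaccel_do_receive+0x64 at ffffffff813ccc35, while gdb resolves the same symbol+offset in the rebuilt kernel to 0xffffffff813ccc55. A small sketch of that comparison follows; the helper name symbol_base is hypothetical and exists only for this illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper: given an address inside a function and its
 * offset from the function start, return the function's start address.
 */
static uint64_t symbol_base(uint64_t addr, uint64_t offset)
{
    return addr - offset;
}

/* Addresses taken from the oops and from the gdb output above. */
static const uint64_t oops_rip = 0xffffffff813ccc35ULL; /* crashed kernel */
static const uint64_t gdb_addr = 0xffffffff813ccc55ULL; /* rebuilt kernel */
static const uint64_t offset   = 0x64;

/* Difference between the two function start addresses. */
static uint64_t base_shift(void)
{
    return symbol_base(gdb_addr, offset) - symbol_base(oops_rip, offset);
}
```

The function starts at ffffffff813ccbd1 in the crashed kernel and at ffffffff813ccbf1 in the rebuilt one, a shift of 0x20 bytes. So the two images are not laid out identically, and the source line reported for +0x64 should be treated as approximate until the crash is reproduced on the rebuilt kernel.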
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com