* ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
@ 2009-07-15 23:00 Ben Greear
2009-07-15 23:35 ` Ben Hutchings
2009-07-16 19:13 ` Waskiewicz Jr, Peter P
0 siblings, 2 replies; 9+ messages in thread
From: Ben Greear @ 2009-07-15 23:00 UTC (permalink / raw)
To: NetDev
I just got a fancy new 10G NIC and tried it out in a (patched elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
First of all, it runs very fast: sustained 9.5Gbps tx + rx on two ports concurrently (using modified pktgen),
with 1500 byte pkts.
I did see a warning in the boot logs though.
Here is the lspci for one of these ports:
03:00.1 Ethernet controller: Intel Corporation Device 10fb (rev 01)
Subsystem: Device 0083:000c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 17
Region 0: Memory at f9680000 (64-bit, prefetchable) [size=512K]
Region 2: I/O ports at e880 [size=32]
Region 4: Memory at f9778000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Mask+ 64bit+ Count=1/1 Enable-
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Mask- TabSize=64
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM unknown, Latency L0 <1us, L1 <8us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [100] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140] Device Serial Number 00-00-00-ff-ff-00-00-00
Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
IOVSta: Migration-
Initial VFs: 64, Total VFs: 64, Number of VFs: 64, Function Dependency Link: 01
VF offset: 128, stride: 2, Device ID: 10d8
Supported Page Size: 00000553, System Page Size: 00000001
VF Migration: offset: 00000000, BIR: 0
Kernel driver in use: ixgbe
Kernel modules: ixgbe
selected dmesg output:
ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.34-k2
ixgbe: Copyright (c) 1999-2009 Intel Corporation.
ixgbe 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ixgbe 0000:03:00.0: setting latency timer to 64
alloc irq_desc for 35 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 35 for MSI/MSI-X
alloc irq_desc for 36 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 36 for MSI/MSI-X
alloc irq_desc for 37 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 37 for MSI/MSI-X
alloc irq_desc for 38 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 38 for MSI/MSI-X
alloc irq_desc for 39 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 39 for MSI/MSI-X
alloc irq_desc for 40 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 40 for MSI/MSI-X
alloc irq_desc for 41 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 41 for MSI/MSI-X
alloc irq_desc for 42 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 42 for MSI/MSI-X
alloc irq_desc for 43 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 43 for MSI/MSI-X
alloc irq_desc for 44 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 44 for MSI/MSI-X
alloc irq_desc for 45 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 45 for MSI/MSI-X
alloc irq_desc for 46 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 46 for MSI/MSI-X
alloc irq_desc for 47 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 47 for MSI/MSI-X
alloc irq_desc for 48 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 48 for MSI/MSI-X
alloc irq_desc for 49 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 49 for MSI/MSI-X
alloc irq_desc for 50 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 50 for MSI/MSI-X
alloc irq_desc for 51 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.0: irq 51 for MSI/MSI-X
ixgbe: 0000:03:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:03:00.0: (PCI Express:5.0Gb/s:Width x8) 00:0c:bd:00:90:1a
ixgbe 0000:03:00.0: MAC: 2, PHY: 9, SFP+: 5, PBA No: e57138-000
ixgbe 0000:03:00.0: This device is a pre-production adapter/LOM. Please be aware there may be issues associated with your hardware. If you are experiencing
problems please contact your Intel or hardware representative who provided you with this hardware.
ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection
alloc irq_desc for 17 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
ixgbe 0000:03:00.1: setting latency timer to 64
alloc irq_desc for 52 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 52 for MSI/MSI-X
alloc irq_desc for 53 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 53 for MSI/MSI-X
alloc irq_desc for 54 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 54 for MSI/MSI-X
alloc irq_desc for 55 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 55 for MSI/MSI-X
alloc irq_desc for 56 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 56 for MSI/MSI-X
alloc irq_desc for 57 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 57 for MSI/MSI-X
alloc irq_desc for 58 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 58 for MSI/MSI-X
alloc irq_desc for 59 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 59 for MSI/MSI-X
alloc irq_desc for 60 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 60 for MSI/MSI-X
alloc irq_desc for 61 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 61 for MSI/MSI-X
alloc irq_desc for 62 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 62 for MSI/MSI-X
alloc irq_desc for 63 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 63 for MSI/MSI-X
alloc irq_desc for 64 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 64 for MSI/MSI-X
alloc irq_desc for 65 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 65 for MSI/MSI-X
alloc irq_desc for 66 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 66 for MSI/MSI-X
alloc irq_desc for 67 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 67 for MSI/MSI-X
alloc irq_desc for 68 on node 0
alloc kstat_irqs on node 0
ixgbe 0000:03:00.1: irq 68 for MSI/MSI-X
ixgbe: 0000:03:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:03:00.1: (PCI Express:5.0Gb/s:Width x8) 00:0c:bd:00:90:1b
ixgbe 0000:03:00.1: MAC: 2, PHY: 9, SFP+: 5, PBA No: e57138-000
ixgbe 0000:03:00.1: This device is a pre-production adapter/LOM. Please be aware there may be issues associated with your hardware. If you are experiencing
problems please contact your Intel or hardware representative who provided you with this hardware.
ixgbe 0000:03:00.1: Intel(R) 10 Gigabit Network Connection
....
BUG: scheduling while atomic: S99lanforge/2133/0x00000002
Modules linked in: sco stp llc bnep l2cap bluetooth nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 dm_multipath uinput ixgbe i2c_i801 i2c_core dca mdio
e1000e iTCO_wdt iTCO_vendor_support pcspkr ata_generic pata_acpi [last unloaded: bridge]
Pid: 2133, comm: S99lanforge Not tainted 2.6.31-rc3 #2
Call Trace:
[<ffffffff81042456>] __schedule_bug+0x5c/0x60
[<ffffffff813e6712>] schedule+0xc1/0x85e
[<ffffffff8104488a>] ? check_preempt_wakeup+0x2d/0x1b7
[<ffffffff813e880b>] ? _spin_unlock_irqrestore+0x37/0x42
[<ffffffff813e7182>] schedule_timeout+0x97/0xbb
[<ffffffff8105857e>] ? process_timeout+0x0/0xb
[<ffffffff813e71bf>] schedule_timeout_uninterruptible+0x19/0x1b
[<ffffffff81058a25>] msleep+0x16/0x1d
[<ffffffffa005e160>] ixgbe_stop_adapter_generic+0x38/0x97 [ixgbe]
[<ffffffffa0063e5a>] ixgbe_reset_hw_82599+0x13/0x1a4 [ixgbe]
[<ffffffffa005cfc3>] ixgbe_init_hw_generic+0xf/0x1d [ixgbe]
[<ffffffffa0056f04>] ixgbe_reset+0x1e/0xef [ixgbe]
[<ffffffffa005ee71>] ixgbe_set_flags+0x5c/0x66 [ixgbe]
[<ffffffff81343fe2>] dev_disable_lro+0x4d/0x69
[<ffffffff81398191>] devinet_sysctl_forward+0xd7/0x1a4
[<ffffffff81136111>] proc_sys_call_handler+0x8d/0xb7
[<ffffffff8113614a>] proc_sys_write+0xf/0x11
[<ffffffff810e856d>] vfs_write+0xa9/0x106
[<ffffffff810e8680>] sys_write+0x45/0x69
[<ffffffff81011b42>] system_call_fastpath+0x16/0x1b
More info available if needed.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-15 23:00 ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3 Ben Greear
@ 2009-07-15 23:35 ` Ben Hutchings
2009-07-15 23:50 ` Ben Greear
2009-07-16 19:13 ` Waskiewicz Jr, Peter P
1 sibling, 1 reply; 9+ messages in thread
From: Ben Hutchings @ 2009-07-15 23:35 UTC (permalink / raw)
To: Ben Greear; +Cc: NetDev
On Wed, 2009-07-15 at 16:00 -0700, Ben Greear wrote:
> I just got a fancy new 10G NIC and tried it out in a (patched elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
And what exactly are those patches?
> First of all, it runs very fast: sustained 9.5Gbps tx + rx on two ports concurrently (using modified pktgen),
> with 1500 byte pkts.
>
> I did see a warning in the boot logs though.
[...]
> BUG: scheduling while atomic: S99lanforge/2133/0x00000002
> Modules linked in: sco stp llc bnep l2cap bluetooth nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 dm_multipath uinput ixgbe i2c_i801 i2c_core dca mdio
> e1000e iTCO_wdt iTCO_vendor_support pcspkr ata_generic pata_acpi [last unloaded: bridge]
> Pid: 2133, comm: S99lanforge Not tainted 2.6.31-rc3 #2
> Call Trace:
> [<ffffffff81042456>] __schedule_bug+0x5c/0x60
> [<ffffffff813e6712>] schedule+0xc1/0x85e
> [<ffffffff8104488a>] ? check_preempt_wakeup+0x2d/0x1b7
> [<ffffffff813e880b>] ? _spin_unlock_irqrestore+0x37/0x42
> [<ffffffff813e7182>] schedule_timeout+0x97/0xbb
> [<ffffffff8105857e>] ? process_timeout+0x0/0xb
> [<ffffffff813e71bf>] schedule_timeout_uninterruptible+0x19/0x1b
> [<ffffffff81058a25>] msleep+0x16/0x1d
> [<ffffffffa005e160>] ixgbe_stop_adapter_generic+0x38/0x97 [ixgbe]
> [<ffffffffa0063e5a>] ixgbe_reset_hw_82599+0x13/0x1a4 [ixgbe]
> [<ffffffffa005cfc3>] ixgbe_init_hw_generic+0xf/0x1d [ixgbe]
> [<ffffffffa0056f04>] ixgbe_reset+0x1e/0xef [ixgbe]
> [<ffffffffa005ee71>] ixgbe_set_flags+0x5c/0x66 [ixgbe]
> [<ffffffff81343fe2>] dev_disable_lro+0x4d/0x69
> [<ffffffff81398191>] devinet_sysctl_forward+0xd7/0x1a4
> [<ffffffff81136111>] proc_sys_call_handler+0x8d/0xb7
> [<ffffffff8113614a>] proc_sys_write+0xf/0x11
> [<ffffffff810e856d>] vfs_write+0xa9/0x106
> [<ffffffff810e8680>] sys_write+0x45/0x69
> [<ffffffff81011b42>] system_call_fastpath+0x16/0x1b
I introduced dev_disable_lro() and calls to it because LRO doesn't work
in conjunction with bridging or forwarding. (GRO does not have this
problem as it allows the original packets to be regenerated.)
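For reference, dev_disable_lro() in this era is roughly the following (a
sketch from memory, not the exact 2.6.31 source); the relevant point is that
it calls straight into the driver's ethtool set_flags hook, which is the
ixgbe_set_flags() frame in your trace:

void dev_disable_lro(struct net_device *dev)
{
	/* Ask the driver, via its ethtool ops, to clear the LRO flag;
	 * for ixgbe this ends up in ixgbe_set_flags(). */
	if (dev->ethtool_ops && dev->ethtool_ops->get_flags &&
	    dev->ethtool_ops->set_flags) {
		u32 flags = dev->ethtool_ops->get_flags(dev);

		if (flags & ETH_FLAG_LRO)
			dev->ethtool_ops->set_flags(dev, flags & ~ETH_FLAG_LRO);
	}
	WARN_ON(dev->features & NETIF_F_LRO);
}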
So far as I can see, none of the functions in this backtrace should be
entering atomic context, so I suspect that the patches "elsewhere" might
be doing something strange.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-15 23:35 ` Ben Hutchings
@ 2009-07-15 23:50 ` Ben Greear
2009-07-16 17:39 ` Ben Greear
0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2009-07-15 23:50 UTC (permalink / raw)
To: Ben Hutchings; +Cc: NetDev
On 07/15/2009 04:35 PM, Ben Hutchings wrote:
> On Wed, 2009-07-15 at 16:00 -0700, Ben Greear wrote:
>> I just got a fancy new 10G NIC and tried it out in a (patched elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
>
> And what exactly are those patches?
various and sundry, but so far I only see this problem with ixgbe
(the 1G NICs don't complain at least).
I'll build an un-patched kernel and see if I can reproduce them.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-15 23:50 ` Ben Greear
@ 2009-07-16 17:39 ` Ben Greear
0 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2009-07-16 17:39 UTC (permalink / raw)
To: Ben Hutchings; +Cc: NetDev
On 07/15/2009 04:50 PM, Ben Greear wrote:
> On 07/15/2009 04:35 PM, Ben Hutchings wrote:
>> On Wed, 2009-07-15 at 16:00 -0700, Ben Greear wrote:
>>> I just got a fancy new 10G NIC and tried it out in a (patched
>>> elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
>>
>> And what exactly are those patches?
>
> various and sundry, but so far I only see this problem with ixgbe
> (the 1G NICs don't complain at least).
>
> I'll build an un-patched kernel and see if I can reproduce them.
I see the same problem on a stock kernel. I'm running a pre-empt kernel... do you think
that might have some bearing on the problem?
Full dmesg here:
http://www.candelatech.com/oss/i7_dmesg.txt
Kernel config file here:
http://www.candelatech.com/oss/i7_config.txt
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-15 23:00 ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3 Ben Greear
2009-07-15 23:35 ` Ben Hutchings
@ 2009-07-16 19:13 ` Waskiewicz Jr, Peter P
2009-07-16 19:32 ` Ben Greear
1 sibling, 1 reply; 9+ messages in thread
From: Waskiewicz Jr, Peter P @ 2009-07-16 19:13 UTC (permalink / raw)
To: Ben Greear; +Cc: NetDev
On Wed, 15 Jul 2009, Ben Greear wrote:
> I just got a fancy new 10G NIC and tried it out in a (patched elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
>
> First of all, it runs very fast: sustained 9.5Gbps tx + rx on two ports concurrently (using modified pktgen),
> with 1500 byte pkts.
>
> I did see a warning in the boot logs though.
Yes, see below for an explanation.
> ixgbe: 0000:03:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
> ixgbe 0000:03:00.0: (PCI Express:5.0Gb/s:Width x8) 00:0c:bd:00:90:1a
> ixgbe 0000:03:00.0: MAC: 2, PHY: 9, SFP+: 5, PBA No: e57138-000
> ixgbe 0000:03:00.0: This device is a pre-production adapter/LOM. Please be aware there may be issues associated with your hardware. If you are experiencing
> problems please contact your Intel or hardware representative who provided you with this hardware.
It's self-explanatory; the EEPROM version on the NIC is not the
production-level EEPROM. If you run ethtool -i ethX on this interface,
you will see what the firmware (EEPROM) version is. My guess is it's
going to be 0.5-1 or something; the production firmware is 0.9-3. If you
received this NIC from an Intel rep, they can get you the production
EEPROM and tools necessary to reprogram the NIC.
> BUG: scheduling while atomic: S99lanforge/2133/0x00000002
> Modules linked in: sco stp llc bnep l2cap bluetooth nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 dm_multipath uinput ixgbe i2c_i801 i2c_core dca mdio
> e1000e iTCO_wdt iTCO_vendor_support pcspkr ata_generic pata_acpi [last unloaded: bridge]
> Pid: 2133, comm: S99lanforge Not tainted 2.6.31-rc3 #2
> Call Trace:
> [<ffffffff81042456>] __schedule_bug+0x5c/0x60
> [<ffffffff813e6712>] schedule+0xc1/0x85e
> [<ffffffff8104488a>] ? check_preempt_wakeup+0x2d/0x1b7
> [<ffffffff813e880b>] ? _spin_unlock_irqrestore+0x37/0x42
> [<ffffffff813e7182>] schedule_timeout+0x97/0xbb
> [<ffffffff8105857e>] ? process_timeout+0x0/0xb
> [<ffffffff813e71bf>] schedule_timeout_uninterruptible+0x19/0x1b
> [<ffffffff81058a25>] msleep+0x16/0x1d
> [<ffffffffa005e160>] ixgbe_stop_adapter_generic+0x38/0x97 [ixgbe]
> [<ffffffffa0063e5a>] ixgbe_reset_hw_82599+0x13/0x1a4 [ixgbe]
> [<ffffffffa005cfc3>] ixgbe_init_hw_generic+0xf/0x1d [ixgbe]
> [<ffffffffa0056f04>] ixgbe_reset+0x1e/0xef [ixgbe]
> [<ffffffffa005ee71>] ixgbe_set_flags+0x5c/0x66 [ixgbe]
> [<ffffffff81343fe2>] dev_disable_lro+0x4d/0x69
> [<ffffffff81398191>] devinet_sysctl_forward+0xd7/0x1a4
> [<ffffffff81136111>] proc_sys_call_handler+0x8d/0xb7
> [<ffffffff8113614a>] proc_sys_write+0xf/0x11
> [<ffffffff810e856d>] vfs_write+0xa9/0x106
> [<ffffffff810e8680>] sys_write+0x45/0x69
> [<ffffffff81011b42>] system_call_fastpath+0x16/0x1b
We haven't seen such a panic in our testing, but we don't heavily test
toggling the LRO flags. We lightly touch the flags, but nothing heavy.
Note that this device, an 82599 (assumed, since your lspci shows you're
linked at 5.0 GT/s), is different in that it has HW-based LRO running.
This is the preferred configuration the driver uses at
load; there may be something broken with how we switch between HW LRO +
GRO and just straight GRO.
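Roughly what that flags hook does in this driver version (a simplified
sketch with approximated flag names, not the exact 2.0.34-k2 source), and
why flipping LRO ends up in msleep() via a reset, matching the
ixgbe_set_flags() -> ixgbe_reset() leg of your trace:

static int ixgbe_set_flags(struct net_device *netdev, u32 data)
{
	struct ixgbe_adapter *adapter = netdev_priv(netdev);

	ethtool_op_set_flags(netdev, data);

	/* On 82599, NETIF_F_LRO really means hardware RSC; turning it on
	 * or off means reprogramming the MAC, i.e. a full reset. */
	if (!!(data & ETH_FLAG_LRO) !=
	    !!(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)) {
		adapter->flags2 ^= IXGBE_FLAG2_RSC_ENABLED;
		if (netif_running(netdev))
			ixgbe_reinit_locked(adapter); /* ixgbe_down() -> msleep() */
		else
			ixgbe_reset(adapter);         /* also reaches msleep() */
	}

	return 0;
}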
I will see if our validation guys can reproduce this. In the meantime,
can you try without preempt enabled? Also, it wasn't obvious to me if
this is 100% reproducible, or if it's racy. Can you comment on that?
Cheers,
-PJ
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-16 19:13 ` Waskiewicz Jr, Peter P
@ 2009-07-16 19:32 ` Ben Greear
2009-07-16 19:58 ` Jesper Dangaard Brouer
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Ben Greear @ 2009-07-16 19:32 UTC (permalink / raw)
To: Waskiewicz Jr, Peter P; +Cc: NetDev
On 07/16/2009 12:13 PM, Waskiewicz Jr, Peter P wrote:
> On Wed, 15 Jul 2009, Ben Greear wrote:
>
>> I just got a fancy new 10G NIC and tried it out in a (patched elsewhere, but stock ixgbe driver) 2.6.31-rc3 kernel.
>>
>> First of all, it runs very fast: sustained 9.5Gbps tx + rx on two ports concurrently (using modified pktgen),
>> with 1500 byte pkts.
>>
>> I did see a warning in the boot logs though.
>
> Yes, see below for an explanation.
>
>> ixgbe: 0000:03:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
>> ixgbe 0000:03:00.0: (PCI Express:5.0Gb/s:Width x8) 00:0c:bd:00:90:1a
>> ixgbe 0000:03:00.0: MAC: 2, PHY: 9, SFP+: 5, PBA No: e57138-000
>> ixgbe 0000:03:00.0: This device is a pre-production adapter/LOM. Please be aware there may be issues associated with your hardware. If you are experiencing
>> problems please contact your Intel or hardware representative who provided you with this hardware.
>
> It's self-explanatory; the EEPROM version on the NIC is not the
> production-level EEPROM. If you run ethtool -i ethX on this interface,
> you will see what the firmware (EEPROM) version is. My guess is it's
> going to be 0.5-1 or something; the production firmware is 0.9-3. If you
> received this NIC from an Intel rep, they can get you the production
> EEPROM and tools necessary to reprogram the NIC.
Yes, 0.5-1
I got it from interfacemasters.com, but they can probably help me do the same.
> We haven't seen such a panic in our testing, but we don't heavily test
> toggling the LRO flags. We lightly touch the flags, but nothing heavy.
> Note that this device, an 82599 (assumed, since your lspci shows you're
> linked at 5.0 GT/s), is different in that it has HW-based LRO running.
> This is the preferred configuration the driver uses at
> load; there may be something broken with how we switch between HW LRO +
> GRO and just straight GRO.
I believe the trigger for this is my script that enables ip_forward. I'm
not twiddling LRO settings directly as far as I can tell.
> I will see if our validation guys can reproduce this. In the meantime,
> can you try without preempt enabled? Also, it wasn't obvious to me if
> this is 100% reproducible, or if it's racy. Can you comment on that?
It is 100% reproducible on the system I'm testing. I haven't tried other servers
or other ixgbe NICs yet.
I'll try w/out pre-empt, should have results later today.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-16 19:32 ` Ben Greear
@ 2009-07-16 19:58 ` Jesper Dangaard Brouer
2009-07-16 20:17 ` David Miller
2009-07-16 21:08 ` Ben Greear
2 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2009-07-16 19:58 UTC (permalink / raw)
To: Ben Greear; +Cc: Waskiewicz Jr, Peter P, NetDev
On Thu, 16 Jul 2009, Ben Greear wrote:
> On 07/16/2009 12:13 PM, Waskiewicz Jr, Peter P wrote:
>> On Wed, 15 Jul 2009, Ben Greear wrote:
>>
>> > I just got a fancy new 10G NIC and tried it out in a (patched elsewhere,
>> > but stock ixgbe driver) 2.6.31-rc3 kernel.
>> >
>> > First of all, it runs very fast: sustained 9.5Gbps tx + rx on two ports
>> > concurrently (using modified pktgen),
>> > with 1500 byte pkts.
Yes, it's very fast! Using pktgen, I can generate 11 Mpps with 64-byte
packets, compared to my other 10GbE NIC, which can do 3-4 Mpps.
>> > ixgbe 0000:03:00.0: This device is a pre-production adapter/LOM. Please
>> > be aware there may be issues associated with your hardware. If you are
>> > experiencing
>> > problems please contact your Intel or hardware representative who
>> > provided you with this hardware.
>>
>> It's self-explanatory; the EEPROM version on the NIC is not the
>> production-level EEPROM. If you run ethtool -i ethX on this interface,
>> you will see what the firmware (EEPROM) version is. My guess is it's
>> going to be 0.5-1 or something; the production firmware is 0.9-3. If you
>> received this NIC from an Intel rep, they can get you the production
>> EEPROM and tools necessary to reprogram the NIC.
>
> Yes, 0.5-1
I'm running firmware-version: 0.9-3 (which I think Peter upgraded for me).
I'm also seeing the issue. (Sorry for not reporting it before now, Peter; I
was going to bring it up later, since I didn't think the NIC was shipping
yet.)
> I believe the trigger for this is my script that enables ip_forward. I'm
> not twiddling LRO settings directly as far as I can tell.
I also think it's when the boot scripts enable ip_forward.
See the devinet_sysctl_forward() call in the trace below.
root@firesoul:~# uname -a
Linux firesoul 2.6.31-rc1-net-2.6-00122-ge594e96 #8 SMP PREEMPT Fri Jul 10 17:01:40 CEST 2009 x86_64 GNU/Linux
[ 11.729520] ixgbe: eth33 NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 11.729541] ixgbe: eth33 NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 12.496856] BUG: scheduling while atomic: sysctl/3214/0x00000002
[ 12.497184] Modules linked in: asus_atk0110 ixgbe hwmon mdio r8169
[ 12.497190] Pid: 3214, comm: sysctl Not tainted 2.6.31-rc1-net-2.6-00122-ge594e96 #8
[ 12.497192] Call Trace:
[ 12.497198] [<ffffffff8102f8c2>] __schedule_bug+0x57/0x5c
[ 12.497202] [<ffffffff81428291>] schedule+0xcb/0x88f
[ 12.497206] [<ffffffff810aaedb>] ? __inc_zone_state+0x11/0x75
[ 12.497211] [<ffffffff81044301>] ? lock_timer_base+0x26/0x4a
[ 12.497214] [<ffffffff8142a10a>] ? _spin_unlock_irqrestore+0x2c/0x37
[ 12.497217] [<ffffffff81044872>] ? __mod_timer+0x102/0x114
[ 12.497220] [<ffffffff81428d7f>] schedule_timeout+0x98/0xbf
[ 12.497223] [<ffffffff81044413>] ? process_timeout+0x0/0xb
[ 12.497225] [<ffffffff81428d7a>] ? schedule_timeout+0x93/0xbf
[ 12.497228] [<ffffffff81428dbf>] schedule_timeout_uninterruptible+0x19/0x1b
[ 12.497231] [<ffffffff81044898>] msleep+0x14/0x1e
[ 12.497240] [<ffffffffa001d19f>] ixgbe_down+0xc7/0x25e [ixgbe]
[ 12.497248] [<ffffffffa001e512>] ixgbe_reinit_locked+0x59/0x70 [ixgbe]
[ 12.497256] [<ffffffffa0020691>] ixgbe_set_flags+0x52/0x66 [ixgbe]
[ 12.497260] [<ffffffff813882f6>] dev_disable_lro+0x4d/0x69
[ 12.497264] [<ffffffff813d904e>] devinet_sysctl_forward+0xd2/0x1a2
[ 12.497268] [<ffffffff8110aea9>] proc_sys_call_handler+0x96/0xbc
[ 12.497272] [<ffffffff811a9482>] ? __up_read+0x92/0x9c
[ 12.497274] [<ffffffff8110aede>] proc_sys_write+0xf/0x11
[ 12.497277] [<ffffffff810cd625>] vfs_write+0xab/0x105
[ 12.497280] [<ffffffff810cd743>] sys_write+0x47/0x6e
[ 12.497284] [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
Cheers,
Jesper Brouer
--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-16 19:32 ` Ben Greear
2009-07-16 19:58 ` Jesper Dangaard Brouer
@ 2009-07-16 20:17 ` David Miller
2009-07-16 21:08 ` Ben Greear
2 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-07-16 20:17 UTC (permalink / raw)
To: greearb; +Cc: peter.p.waskiewicz.jr, netdev
From: Ben Greear <greearb@candelatech.com>
Date: Thu, 16 Jul 2009 12:32:50 -0700
> I believe the trigger for this is my script that enables ip_forward.
> I'm not twiddling LRO settings directly as far as I can tell.
Turning on/off bridging or forwarding twiddles the LRO settings
as a side effect.
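Roughly, the forwarding sysctl path walks every device and disables LRO on
each one. A rough sketch of the 2.6.31-era inet_forward_change() (likely
inlined into the devinet_sysctl_forward() frame in both traces; body
approximated from memory):

static void inet_forward_change(struct net *net)
{
	struct net_device *dev;
	int on = IPV4_DEVCONF_ALL(net, FORWARDING);

	/* The real function holds locks around this walk, which is
	 * presumably where the atomic context in the traces comes from. */
	for_each_netdev(net, dev) {
		if (on)
			dev_disable_lro(dev); /* -> ixgbe_set_flags() -> reset -> msleep() */
	}
}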
* Re: ixgbe: schedule while atomic bug during dev_disable_lro 2.6.31-rc3
2009-07-16 19:32 ` Ben Greear
2009-07-16 19:58 ` Jesper Dangaard Brouer
2009-07-16 20:17 ` David Miller
@ 2009-07-16 21:08 ` Ben Greear
2 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2009-07-16 21:08 UTC (permalink / raw)
To: Waskiewicz Jr, Peter P; +Cc: NetDev
On 07/16/2009 12:32 PM, Ben Greear wrote:
> It is 100% reproducible on the system I'm testing. I haven't tried other
> servers
> or other ixgbe NICs yet.
>
> I'll try w/out pre-empt, should have results later today.
I set pre-empt to Voluntary and I no longer see these warnings
in dmesg.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com