netdev.vger.kernel.org archive mirror
* Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
@ 2010-03-31 12:55 Stuart Shelton
  2010-03-31 16:01 ` Michael Chan
  2012-05-17 20:21 ` Bo Mackey
  0 siblings, 2 replies; 4+ messages in thread
From: Stuart Shelton @ 2010-03-31 12:55 UTC
  To: mchan, netdev


Hi all,

The Broadcom NetXtreme II driver appears to have a severe regression in 
all kernels post 2.6.30 - I've observed problems with 2.6.31, 2.6.32, 
and 2.6.33.

The hardware impacted is an IBM Bladecenter LS21 Blade, model 7971.  We 
have a large number of these, and all are affected.

We use generic channel-bonding, with the following options in modprobe.conf:

alias bond0 bonding
options bond0 mode=0 miimon=100
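
For readers unfamiliar with the numeric options: in the Linux bonding driver, mode=0 selects round-robin ("balance-rr") and miimon=100 sets MII link monitoring to every 100 ms. The following is a minimal sketch in Python, not part of the original report, that decodes such a line. The helper name parse_options_line is hypothetical; the mode table lists the bonding driver's documented mode names.

```python
# Bonding mode numbers and their names, as documented for the Linux bonding driver.
BONDING_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def parse_options_line(line):
    """Split an 'options <alias> key=value ...' line into (alias, params).

    Hypothetical helper for illustration; it assumes every parameter
    after the alias has the form key=value.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError("not an options line: %r" % line)
    params = dict(field.split("=", 1) for field in fields[2:])
    return fields[1], params


# The options line quoted from modprobe.conf above.
alias, params = parse_options_line("options bond0 mode=0 miimon=100")
mode_name = BONDING_MODES[int(params["mode"])]
print(alias, mode_name, params["miimon"])  # prints: bond0 balance-rr 100
```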

With any kernel prior to 2.6.31, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
...
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bonding: bond0: enslaving eg0 as an active interface with a down link.
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bonding: bond0: enslaving eg1 as an active interface with a down link.
bonding: bond0: link status definitely up for interface eg0.
bonding: bond0: link status definitely up for interface eg1.
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON


... however, with kernels from 2.6.31 and later, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg0 NIC SerDes Link is Down
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
NET: Registered protocol family 17
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg0
udev: renamed network interface eth1 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
udev: renamed network interface eth0 to eg0
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete

... (this later output shows the initial attempt to raise the 
interfaces at boot, followed by me manually removing and re-inserting the 
bnx2 driver).  Alongside this, the console outputs "SIOCSIFFLAGS: Device 
or resource busy".

On these more recent kernels, the SIOCSIFFLAGS line is always output, 
but about 50% of the time the network interface comes up anyway.  When 
it does not, removing and re-inserting the bnx2 driver can sometimes 
result in usable non-bonded interfaces - but as often as not the NICs 
won't be usable even in a standard non-bonded configuration.
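
To check a number of blades for the same symptoms, the two failure messages in the dmesg output above can be counted from saved log text. This is a minimal sketch in Python under stated assumptions, not something from the original report: the function name count_bnx2_failures is hypothetical, and the only strings it looks for are the two messages copied from the log above.

```python
import re

# Failure signatures copied verbatim from the dmesg output in this report.
FAILURE_PATTERNS = {
    "chip_reset": re.compile(r"bnx2: Chip reset did not complete"),
    "fw_sync_timeout": re.compile(r"bnx2: fw sync timeout, reset code = [0-9a-fA-F]+"),
}


def count_bnx2_failures(dmesg_text):
    """Return a dict mapping each signature name to its number of matching lines."""
    counts = {name: 0 for name in FAILURE_PATTERNS}
    for line in dmesg_text.splitlines():
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines taken from the failing log above.
sample = """bnx2: eg1: using MSI
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2: Chip reset did not complete
"""
print(count_bnx2_failures(sample))
```

A non-zero count for either signature would suggest the machine is hitting the same failure, though it says nothing about the cause.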

With a simple reboot back to a 2.6.30 or earlier kernel, the problem 
goes away (even though the firmware file on disk is the same as that 
used with the later kernels).  Every blade we have is affected, so this 
is not a hardware problem (or at least, if it is, then it's a very 
common one!).  I thought that the problem might only occur when bonding 
is used - but I can't now recall what made me think this, and I've not 
been able to get the server down-time to extensively test the issue further.

Any advice/guidance greatly appreciated,

Stuart

* Re: Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
  2010-03-31 12:55 Severe regression in bnx2 driver with bonding in post 2.6.30 kernels Stuart Shelton
@ 2010-03-31 16:01 ` Michael Chan
  2010-03-31 18:04   ` Stuart Shelton
  2012-05-17 20:21 ` Bo Mackey
  1 sibling, 1 reply; 4+ messages in thread
From: Michael Chan @ 2010-03-31 16:01 UTC
  To: 'Stuart Shelton', netdev@vger.kernel.org

Stuart Shelton wrote:

> The Broadcom NetXtreme II driver appears to have a severe
> regression in
> all kernels post 2.6.30 - I've observed problems with 2.6.31, 2.6.32,
> and 2.6.33.
>
> The hardware impacted is an IBM Bladecenter LS21 Blade, model
> 7971.  We
> have a large number of these, and all are affected.
>

Can you provide me ethtool -i eth0 to see what the NVRAM-based
firmware version is?  Thanks.


* Re: Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
  2010-03-31 16:01 ` Michael Chan
@ 2010-03-31 18:04   ` Stuart Shelton
  0 siblings, 0 replies; 4+ messages in thread
From: Stuart Shelton @ 2010-03-31 18:04 UTC
  To: Michael Chan; +Cc: netdev@vger.kernel.org


Hi Michael,

Thanks for the reply.  When I first became aware of this problem, I checked ibm.com and applied the most recent Bladecenter/LS21 firmware updates - so the NIC firmware should be up to date with what IBM officially offers.

The ethtool output is:

$ sudo ethtool -i eg0
driver: bnx2
version: 2.0.1
firmware-version: 4.6.1 ipms 1.6.0
bus-info: 0000:02:04.0
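
When collecting this from many machines, the "key: value" layout of the output above is easy to parse. This is a minimal sketch in Python, not part of the original message: the function name parse_ethtool_info is hypothetical, and the sample text is the output quoted above.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines from `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the PCI
        # address in bus-info keep their own colons.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


# The ethtool output quoted above.
sample = """driver: bnx2
version: 2.0.1
firmware-version: 4.6.1 ipms 1.6.0
bus-info: 0000:02:04.0"""

info = parse_ethtool_info(sample)
print(info["firmware-version"])  # prints: 4.6.1 ipms 1.6.0
```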

The latest IBM firmware update appears to be:

http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5083668&brandind=5000020

Many thanks,

Stuart


On 31 Mar 2010, at 17:01, Michael Chan wrote:

> Stuart Shelton wrote:
> 
>> The Broadcom NetXtreme II driver appears to have a severe
>> regression in
>> all kernels post 2.6.30 - I've observed problems with 2.6.31, 2.6.32,
>> and 2.6.33.
>> 
>> The hardware impacted is an IBM Bladecenter LS21 Blade, model
>> 7971.  We
>> have a large number of these, and all are affected.
>> 
> 
> Can you provide me ethtool -i eth0 to see what the NVRAM-based
> firmware version is?  Thanks.
> 


* Re: Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
  2010-03-31 12:55 Severe regression in bnx2 driver with bonding in post 2.6.30 kernels Stuart Shelton
  2010-03-31 16:01 ` Michael Chan
@ 2012-05-17 20:21 ` Bo Mackey
  1 sibling, 0 replies; 4+ messages in thread
From: Bo Mackey @ 2012-05-17 20:21 UTC
  To: netdev



Stuart Shelton <stuart <at> openobjects.com> writes:

> 
> 
> Hi all,
> 
> The Broadcom NetXtreme II driver appears to have a severe regression in 
> all kernels post 2.6.30 - I've observed problems with 2.6.31, 2.6.32, 
> and 2.6.33.
> 
> The hardware impacted is an IBM Bladecenter LS21 Blade, model 7971.  We 
> have a large number of these, and all are affected.
> 
> We use generic channel-bonding, with the following options in modprobe.conf:
> 


Hi Stuart, et al.,

Has the above issue been fixed? If so, can you please share the root cause and
the diffs? Seeing a similar issue in version 2.0.8 of the bnx2 driver.

Thank you,
Bo
