Intel-Wired-Lan Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Ruppert <idl0r@qasl.de>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] igb firmware 1.63 broken / flapping on switch reboot - update or downgrade possible?
Date: Wed, 19 May 2021 13:57:01 +0200	[thread overview]
Message-ID: <181a441c7019d51ac1284523793b8d1c@qasl.de> (raw)

Hi List,

Problem: If we reboot a Switch that is connected to igb interfaces (we 
use bonding), the interface will flapp several times during the reboot 
of the switch
Setup: 2x 1GE I350 (igb) connected to 2x Juniper EX3330 for example
It's a active/backup Bonding with MIIMON being disabled and ARP check 
being configured

What we have figured out so far, it seems to be a bug in firmware 1.63 
while a system with 1.61 seems to work just fine:

We have a bunch of systems with:
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
	Subsystem: Super Micro Computer Inc Device 1521
	Kernel driver in use: igb
	Kernel modules: igb
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
	Subsystem: Super Micro Computer Inc Device 1521
	Kernel driver in use: igb
	Kernel modules: igb

Lets pick 2 of those systems, first the good one:
# ethtool -i net0
driver: igb
version: 5.6.0-k
firmware-version: 1.61, 0x8000090e
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

# uname -r
3.10.0-1160.25.1.el7.x86_64

CentOS 7.9

# dmesg
[627590.997603] igb 0000:02:00.0 net0: igb: net0 NIC Link is Down
[627598.277441] bond0: link status definitely down for interface net0, 
disabling it
[627598.278062] bond0: making interface net1 the new active one
[627598.278536] device net0 left promiscuous mode
[627598.279109] device net1 entered promiscuous mode
[627856.894229] igb 0000:02:00.0 net0: igb: net0 NIC Link is Up 1000 
Mbps Full Duplex, Flow Control: RX/TX
[627859.970951] bond0: link status definitely up for interface net0
[627859.971577] bond0: making interface net0 the new active one
[627859.972127] device net1 left promiscuous mode
[627859.972801] device net0 entered promiscuous mode


That's the complete switch reboot and that is how it's supposed to be.

Now the broken one (we have multiple broken ones, all the same firmware 
version):
# ethtool -i net0
driver: igb
version: 5.6.0-k
firmware-version: 1.63, 0x80000a05
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

# uname -r
3.10.0-1160.25.1.el7.x86_64

CentOS 7.9

# dmesg[451689.477836] igb 0000:01:00.0 net0: igb: net0 NIC Link is Down
[451697.112000] bond0: link status definitely down for interface net0, 
disabling it
[451697.113060] bond0: making interface net1 the new active one
[451697.113906] device net0 left promiscuous mode
[451697.114840] device net1 entered promiscuous mode
[451742.241325] bond0: link status definitely up for interface net0
[451742.242276] bond0: making interface net0 the new active one
[451742.243065] device net1 left promiscuous mode
[451742.243976] device net0 entered promiscuous mode
[451751.265579] bond0: link status definitely down for interface net0, 
disabling it
[451751.266503] bond0: making interface net1 the new active one
[451751.267300] device net0 left promiscuous mode
[451751.268166] device net1 entered promiscuous mode
[451817.443511] bond0: link status definitely up for interface net0
[451817.444428] bond0: making interface net0 the new active one
[451817.445216] device net1 left promiscuous mode
[451817.446100] device net0 entered promiscuous mode
[451826.467777] bond0: link status definitely down for interface net0, 
disabling it
[451826.468836] bond0: making interface net1 the new active one
[451826.469702] device net0 left promiscuous mode
[451826.470534] device net1 entered promiscuous mode
[451856.548666] bond0: link status definitely up for interface net0
[451856.549534] bond0: making interface net0 the new active one
[451856.550283] device net1 left promiscuous mode
[451856.551142] device net0 entered promiscuous mode
[451865.572959] bond0: link status definitely down for interface net0, 
disabling it
[451865.573892] bond0: making interface net1 the new active one
[451865.574671] device net0 left promiscuous mode
[451865.575504] device net1 entered promiscuous mode
[451874.597227] bond0: link status definitely up for interface net0
[451874.598273] bond0: making interface net0 the new active one
[451874.599057] device net1 left promiscuous mode
[451874.599901] device net0 entered promiscuous mode
[451883.621550] bond0: link status definitely down for interface net0, 
disabling it
[451883.622382] bond0: making interface net1 the new active one
[451883.623136] device net0 left promiscuous mode
[451883.623898] device net1 entered promiscuous mode
[451886.629557] bond0: link status definitely up for interface net0
[451886.630416] bond0: making interface net0 the new active one
[451886.631178] device net1 left promiscuous mode
[451886.632051] device net0 entered promiscuous mode
[451895.653860] bond0: link status definitely down for interface net0, 
disabling it
[451895.654792] bond0: making interface net1 the new active one
[451895.655548] device net0 left promiscuous mode
[451895.656372] device net1 entered promiscuous mode
[451898.661903] bond0: link status definitely up for interface net0
[451898.662789] bond0: making interface net0 the new active one
[451898.663582] device net1 left promiscuous mode
[451898.664464] device net0 entered promiscuous mode
[451907.686173] bond0: link status definitely down for interface net0, 
disabling it
[451907.687090] bond0: making interface net1 the new active one
[451907.687864] device net0 left promiscuous mode
[451907.688700] device net1 entered promiscuous mode
[451919.718549] bond0: link status definitely up for interface net0
[451919.719403] bond0: making interface net0 the new active one
[451919.720165] device net1 left promiscuous mode
[451919.721040] device net0 entered promiscuous mode
[451928.742836] bond0: link status definitely down for interface net0, 
disabling it
[451928.743834] bond0: making interface net1 the new active one
[451928.744601] device net0 left promiscuous mode
[451928.745452] device net1 entered promiscuous mode
[451949.799426] bond0: link status definitely up for interface net0
[451949.800297] bond0: making interface net0 the new active one
[451949.801080] device net1 left promiscuous mode
[451949.801978] device net0 entered promiscuous mode
[451954.463872] igb 0000:01:00.0 net0: igb: net0 NIC Link is Up 1000 
Mbps Full Duplex, Flow Control: RX/TX

This is the same reboot as on the good one. It's the same switch they're 
connected to. The same bonding config etc. So it doesn't seem to be 
related to the bonding.
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: net0 (primary_reselect always)
Currently Active Slave: net0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 3000
ARP IP target/s (n.n.n.n form): 192.168.99.105

Slave Interface: net0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 9
Permanent HW addr: 0c:c4:7a:ab:f2:30
Slave queue ID: 0

Slave Interface: net1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:ab:f2:31
Slave queue ID: 0


Is it possible to upgrade the firmware? Is there a more recent one at 
all? I couldn't find any info about that nor a changelog or something 
else so far. We'd do even a downgrade to get that fixed.
The firmware doesn't seem to be included into the driver so I would 
assume there's an external package for it?

-- 
Regards,
Christian Ruppert

             reply	other threads:[~2021-05-19 11:57 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-19 11:57 Christian Ruppert [this message]
2021-05-19 13:09 ` [Intel-wired-lan] igb firmware 1.63 broken / flapping on switch reboot - update or downgrade possible? Christian Ruppert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=181a441c7019d51ac1284523793b8d1c@qasl.de \
    --to=idl0r@qasl.de \
    --cc=intel-wired-lan@osuosl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox