From: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
To: netdev@vger.kernel.org
Cc: nik@linuxbox.cz
Subject: bnx2x - occasional high packet loss (on LAN)
Date: Tue, 15 Sep 2015 06:17:26 +0200
Message-ID: <20150915041726.GD6850@pcnci.linuxbox.cz>
Hello,
I'm trying to track down a strange issue with one of our servers and
would like to ask for recommendations.
I've got a three-node cluster (nodes A..C) interconnected with stacked Brocade
ICX6610 switches. eth0 of each box is connected to the first switch, eth1 to the second one,
with bonding set as follows: "mode=802.3ad lacp_rate=fast xmit_hash_policy=layer2+3 miimon=100"
A few times, eth1 on box A suddenly started misbehaving, and communication
with the other nodes (e.g. flood ping) started dropping up to 30% of packets. When this port
was shut down on both sides, the problem immediately vanished.
We've tried replacing the card and the cable and using a different port on the switch, but the problem
occurred again yesterday.
Since it's "only" packet loss, and not link loss, bonding doesn't help me much.
However, during the weekend, the port also had a strange link issue:
Sep 12 15:23:45 remrprv1a kernel: [676373.296786] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:23:46 remrprv1a kernel: [676373.356638] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:23:46 remrprv1a kernel: [676374.299571] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:23:47 remrprv1a kernel: [676374.364428] bond0: link status definitely up for interface eth1, 10000 Mbps full duplex
Sep 12 15:23:47 remrprv1a kernel: [676374.372902] bond0: first active interface up!
Sep 12 15:24:24 remrprv1a kernel: [676411.402511] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:24:24 remrprv1a kernel: [676411.407422] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:24:25 remrprv1a kernel: [676412.405311] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:24:25 remrprv1a kernel: [676412.408123] bond0: link status definitely up for interface eth1, 0 Mbps full duplex
Sep 12 15:24:51 remrprv1a kernel: [676438.477641] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:24:51 remrprv1a kernel: [676438.528513] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:24:52 remrprv1a kernel: [676439.480472] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:24:52 remrprv1a kernel: [676439.536282] bond0: link status definitely up for interface eth1, 10000 Mbps full duplex
The 0 Mbps link speed (reported at 15:24:25) is quite weird, I guess.
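To spot such events in a longer log, the bonding messages above could be scanned automatically. The following is only a minimal sketch: it assumes the message wording shown in the excerpt above, and the function name summarize_bond_events is hypothetical. It counts "definitely down" events per slave and collects "definitely up" reports that carry a speed of 0 Mbps.

```python
import re
from collections import defaultdict

# Matches bonding driver messages of the form seen in the log above, e.g.
#   bond0: link status definitely up for interface eth1, 10000 Mbps full duplex
#   bond0: link status definitely down for interface eth1, disabling it
BOND_RE = re.compile(
    r"(?P<bond>\w+): link status definitely (?P<state>up|down) "
    r"for interface (?P<iface>\w+)(?:, (?P<speed>\d+) Mbps)?"
)


def summarize_bond_events(lines):
    """Return (down events per interface, 'up' lines reporting 0 Mbps)."""
    downs = defaultdict(int)
    zero_speed = []
    for line in lines:
        m = BOND_RE.search(line)
        if not m:
            continue  # not a bonding link-status message
        if m["state"] == "down":
            downs[m["iface"]] += 1
        elif m["speed"] is not None and int(m["speed"]) == 0:
            zero_speed.append(line.strip())
    return dict(downs), zero_speed
```

It accepts any iterable of lines, so an open log file can be passed in directly.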
All three boxes are the same, running a CentOS 6 based system with a 4.0.5 x86_64 kernel.
The only difference I noticed between them is that irqbalance was enabled on the problematic
box and not on the others, so I disabled it and rebooted the box. The problem is,
I can't really wait for the problem to reappear, so I'd like to ask: has anybody
seen a similar problem? If so, was it fixed in some newer kernel release? I haven't
found any mention in the changelogs, but still. Or does somebody have a hint on what else
I should check?
I'll try to reproduce this on a test system (enabling irqbalance and running some network
benchmarks), but I'd be most happy if I could prevent it on this production system.
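While trying to reproduce, the per-interface counters could be sampled before and after a benchmark run to see whether the loss shows up as drops or errors on eth1 specifically. This is a rough sketch only: it assumes the standard /sys/class/net/<iface>/statistics files, the helper names are hypothetical, and the exact meaning of each counter depends on the driver, so the ratio is an indicator rather than a measurement.

```python
from pathlib import Path

# Counter files expected under /sys/class/net/<iface>/statistics/.
COUNTERS = (
    "rx_packets", "rx_dropped", "rx_errors", "rx_crc_errors",
    "tx_packets", "tx_dropped", "tx_errors",
)


def read_counters(iface, root="/sys/class/net"):
    """Read the selected counters for one interface as integers."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_deltas(before, after):
    """Difference between two snapshots taken with read_counters()."""
    return {name: after[name] - before[name] for name in before}


def rx_drop_ratio(delta):
    """Share of received packets counted as dropped or errored.

    Assumes rx_packets counts only good packets; this varies by driver.
    """
    bad = delta["rx_dropped"] + delta["rx_errors"]
    total = delta["rx_packets"] + bad
    return bad / total if total else 0.0
```

Taking one snapshot of eth0 and eth1 before a flood ping and one after, then comparing the deltas of the two slaves, would show whether only one port accumulates drops or errors.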
Thanks a lot for any advice.
With best regards,
Nikola Ciprich
PS: Should you need any further information, please let me know. Here's the lspci -vv output for the eth devices:
http://nik.lbox.cz/download/lspci.txt
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava
tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------