Setting large MTU size on slave interfaces may stall the whole system


From: Qing Huang <qing.huang@oracle.com>
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>
Subject: Setting large MTU size on slave interfaces may stall the whole system
Date: Mon, 11 Dec 2017 19:21:17 -0800
Message-ID: <ea15b7f9-4c02-3d4e-1c01-4e2a90d2a23c@oracle.com>
In-Reply-To: <9f95c2a0-e4fe-270d-790a-beeb6b3e7690@oracle.com>

(Resending this email in plain-text format.)


Hi,

We found an issue with the bonding driver when testing Mellanox devices.
The following test commands sometimes stall the whole system, with the
serial console flooded with log messages from bond_miimon_inspect().
Setting the MTU to 1500 seems okay, but very rarely it hits the same
problem too.

ip address flush dev ens3f0
ip link set dev ens3f0 down
ip address flush dev ens3f1
ip link set dev ens3f1 down
[root@ca-hcl629 etc]# modprobe bonding mode=0 miimon=250 use_carrier=1 updelay=500 downdelay=500
[root@ca-hcl629 etc]# ifconfig bond0 up
[root@ca-hcl629 etc]# ifenslave bond0 ens3f0 ens3f1
[root@ca-hcl629 etc]# ip link set bond0 mtu 4500 up


Serial console output:

** 4 printk messages dropped ** [ 3717.743761] bond0: link status down for interface ens3f0, disabling it in 500 ms

** 5 printk messages dropped ** [ 3717.755737] bond0: link status down for interface ens3f0, disabling it in 500 ms

** 5 printk messages dropped ** [ 3717.767758] bond0: link status down for interface ens3f0, disabling it in 500 ms

** 4 printk messages dropped ** [ 3717.777737] bond0: link status down for interface ens3f0, disabling it in 500 ms

or

** 4 printk messages dropped ** [274743.297863] bond0: link status down again after 500 ms for interface enp48s0f1
** 4 printk messages dropped ** [274743.307866] bond0: link status down again after 500 ms for interface enp48s0f1
** 4 printk messages dropped ** [274743.317857] bond0: link status down again after 500 ms for interface enp48s0f1
** 4 printk messages dropped ** [274743.327823] bond0: link status down again after 500 ms for interface enp48s0f1
** 4 printk messages dropped ** [274743.337817] bond0: link status down again after 500 ms for interface enp48s0f1


The root cause is the combined effect of commit
1f2cd845d3827412e82bf26dde0abca332ede402 (Revert "Merge branch
'bonding_monitor_locking'") and commit
de77ecd4ef02ca783f7762e04e92b3d0964be66b ("bonding: improve link-status
update in mii-monitoring"). For example, if we revert the second commit,
we don't see the problem.

It seems that when a large MTU is set on a RoCE interface, the slave
interface may hold the RTNL mutex for too long, causing
bond_mii_monitor() to be called repeatedly at an interval of 1 tick
(1 ms with a 1000 HZ kernel configuration) and the kernel to become
unresponsive.
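
As a rough illustration of the re-arm behavior described above, the
following user-space sketch counts how often a periodic work item would
run while another task holds a lock, if every failed trylock re-arms the
work after a fixed delay. It is a simplified model, not kernel code: the
function name monitor_runs_while_locked() and the 2000-tick hold time
used below are hypothetical, and only the 1-tick retry and the 250 ms
miimon value come from the report.

```c
#include <assert.h>

/*
 * Hypothetical model (not kernel code): a periodic work item first runs
 * at tick 0, fails its trylock because another task holds the lock for
 * hold_ticks ticks, and re-arms itself retry_delay_ticks later. Returns
 * the number of runs that happen while the lock is still held.
 */
static unsigned long monitor_runs_while_locked(unsigned long hold_ticks,
                                               unsigned long retry_delay_ticks)
{
    unsigned long runs = 0;
    unsigned long now;

    /* A zero delay would never advance time in this model. */
    assert(retry_delay_ticks > 0);

    for (now = 0; now < hold_ticks; now += retry_delay_ticks)
        runs++;

    return runs;
}
```

Under these assumptions, with the lock held for 2000 ticks (2 seconds
at 1000 HZ), monitor_runs_while_locked(2000, 1) gives 2000 runs, while
monitor_runs_while_locked(2000, 250) gives 8, which is what re-arming at
the miimon=250 interval would look like in this model.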


We found two possible solutions:

#1, don't re-arm the mii monitor work too quickly if we cannot get the
RTNL lock:
index b2db581..8fd587a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2266,7 +2266,6 @@ static void bond_mii_monitor(struct work_struct *work)

                 /* Race avoidance with bond_close cancel of workqueue */
                 if (!rtnl_trylock()) {
-                       delay = 1;
                         should_notify_peers = false;
                         goto re_arm;
                 }

#2, use printk_ratelimit() to avoid flooding the log with messages
generated by bond_miimon_inspect():

index b2db581..0183b7f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2054,7 +2054,7 @@ static int bond_miimon_inspect(struct bonding *bond)
                         bond_propose_link_state(slave, BOND_LINK_FAIL);
                         commit++;
                         slave->delay = bond->params.downdelay;
-                       if (slave->delay) {
+                       if (slave->delay && printk_ratelimit()) {
                                 netdev_info(bond->dev, "link status down for %sinterface %s, disabling it in %d ms\n",
                                             (BOND_MODE(bond) ==
                                              BOND_MODE_ACTIVEBACKUP) ?
@@ -2105,7 +2105,8 @@ static int bond_miimon_inspect(struct bonding *bond)
                 case BOND_LINK_BACK:
                         if (!link_state) {
                                 bond_propose_link_state(slave, BOND_LINK_DOWN);
-                               netdev_info(bond->dev, "link status down again after %d ms for interface %s\n",
+                               if (printk_ratelimit())
+                                       netdev_info(bond->dev, "link status down again after %d ms for interface %s\n",
                                             (bond->params.updelay - slave->delay) *
                                             bond->params.miimon,
                                             slave->dev->name);


Regarding the flooding messages, the netdev_info() output is misleading
anyway when bond_mii_monitor() is called at a 1-tick interval due to
lock contention.
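
To make the effect of solution #2 more concrete, here is a minimal
user-space sketch of an interval/burst rate limiter of the kind
printk_ratelimit() provides. It is not the kernel's implementation: the
struct and function names are hypothetical, and the 5000-tick window and
burst of 10 used below are assumed values chosen to resemble the usual
kernel defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space analog of an interval/burst rate limiter. */
struct ratelimit {
    unsigned long interval; /* window length in ticks */
    unsigned int burst;     /* messages allowed per window */
    unsigned long begin;    /* tick at which the current window started */
    unsigned int printed;   /* messages allowed so far in this window */
    bool started;           /* false until the first call */
};

/* Returns true if a message emitted at tick "now" should be printed. */
static bool ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
    assert(rl != NULL);

    /* Open a new window on first use or once the old one has expired. */
    if (!rl->started || now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
        rl->started = true;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    return false; /* over the burst limit: suppress this message */
}

/* Counts allowed messages when one is attempted on every tick. */
static unsigned int count_allowed(unsigned long ticks,
                                  unsigned long interval,
                                  unsigned int burst)
{
    struct ratelimit rl = { .interval = interval, .burst = burst };
    unsigned int allowed = 0;
    unsigned long now;

    for (now = 0; now < ticks; now++)
        if (ratelimit_allow(&rl, now))
            allowed++;

    return allowed;
}
```

In this model, a monitor that tries to log on every tick for 2000 ticks
gets only 10 messages through with count_allowed(2000, 5000, 10). Note
that this only limits the log output; the work item itself would still
run on every tick.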


Solution #1 looks simpler and cleaner to me. Are there any side effects of doing that?


Thanks,
Qing

