From: Qing Huang <qing.huang@oracle.com>
To: Or Gerlitz <gerlitz.or@gmail.com>
Cc: Linux Netdev List <netdev@vger.kernel.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Jay Vosburgh <j.vosburgh@gmail.com>,
Veaceslav Falico <vfalico@gmail.com>,
Andy Gospodarek <andy@greyhouse.net>,
Aviv Heller <avivh@mellanox.com>, Moni Shoua <monis@mellanox.com>,
Mahesh Bandewar <maheshb@google.com>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: Setting large MTU size on slave interfaces may stall the whole system
Date: Thu, 14 Dec 2017 16:00:05 -0800
Message-ID: <91b6e7f4-e1f5-fe9c-f694-91b9f62e1608@oracle.com>
In-Reply-To: <CAJ3xEMgqmeT7tQvJZ+5Daaz2a7wsC9rUyNstP0ZoTZkr5_p1gA@mail.gmail.com>

Hi Or,

On 12/13/2017 06:28 AM, Or Gerlitz wrote:
> On Tue, Dec 12, 2017 at 5:21 AM, Qing Huang <qing.huang@oracle.com> wrote:
>> Hi,
>>
>> We found an issue with the bonding driver when testing Mellanox devices.
>> The following test commands sometimes stall the whole system, with the
>> serial console flooded with log messages from the bond_miimon_inspect()
>> function. Setting the MTU to 1500 seems okay, but very rarely it may hit
>> the same problem too.
>>
>> ip address flush dev ens3f0
>> ip link set dev ens3f0 down
>> ip address flush dev ens3f1
>> ip link set dev ens3f1 down
>> [root@ca-hcl629 etc]# modprobe bonding mode=0 miimon=250 use_carrier=1 updelay=500 downdelay=500
>> [root@ca-hcl629 etc]# ifconfig bond0 up
>> [root@ca-hcl629 etc]# ifenslave bond0 ens3f0 ens3f1
>> [root@ca-hcl629 etc]# ip link set bond0 mtu 4500 up
>> Serial console output:
>>
>> ** 4 printk messages dropped ** [ 3717.743761] bond0: link status down for interface ens3f0, disabling it in 500 ms
> [..]
>
>
>> It seems that when setting a large MTU on a RoCE interface, the RTNL
>> mutex may be held too long by the slave interface, causing
>> bond_mii_monitor() to be called repeatedly at an interval of 1 tick
>> (1K HZ kernel configuration) and the kernel to become unresponsive.
> Did you try, or manage, to reproduce that with other NIC drivers as well?
This seems to be an issue with the bonding driver. Also, running older
kernels on the same hardware without commit de77ecd4ef02 seems to work
fine. We can try to reproduce the issue with other types of NIC hardware.
>
> Or.
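
To make the suspected behaviour concrete, here is a minimal user-space sketch of the retry pattern described above. It is not the bonding driver code. It assumes the monitor cannot take a shared lock while another task holds it and, in that case, re-arms itself one tick later instead of waiting for the normal miimon interval. The function and parameter names (simulate_monitor, hold_ticks, miimon_ticks, total_ticks) are hypothetical and chosen only for this illustration.

```python
def simulate_monitor(hold_ticks, miimon_ticks, total_ticks):
    """Count how often a polling monitor runs while a lock is contended.

    Illustrative model only (assumption, not kernel code):
    - another task holds the lock during ticks [0, hold_ticks)
    - the monitor retries after 1 tick when the lock is busy
    - otherwise it sleeps for the normal polling interval
    Returns (total_runs, runs_that_found_the_lock_busy).
    """
    runs = 0
    busy_runs = 0
    next_run = 0
    for tick in range(total_ticks):
        if tick < next_run:
            continue  # monitor is not scheduled on this tick
        runs += 1
        if tick < hold_ticks:
            # Lock is busy: re-arm after a single tick.
            busy_runs += 1
            next_run = tick + 1
        else:
            # Lock is free: go back to the regular interval.
            next_run = tick + miimon_ticks
    return runs, busy_runs


if __name__ == "__main__":
    # 1K HZ kernel: miimon=250 ms corresponds to 250 ticks.
    print(simulate_monitor(hold_ticks=1000, miimon_ticks=250, total_ticks=2000))
    print(simulate_monitor(hold_ticks=0, miimon_ticks=250, total_ticks=2000))
```

In this model, a lock held for one second at 1K HZ turns a handful of monitor runs into roughly a thousand. If each run also prints a link-status message, that would be in line with the flooded serial console, though the sketch does not prove that this is what happens in the driver.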