From: zhuyj <zyjzyj2000@gmail.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>,
	"Tantilov, Emil S" <emil.s.tantilov@intel.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"gospo@cumulusnetworks.com" <gospo@cumulusnetworks.com>,
	"jiri@mellanox.com" <jiri@mellanox.com>,
	zhuyj <zyjzyj2000@gmail.com>
Subject: Re: bonding reports interface up with 0 Mbps
Date: Thu, 4 Feb 2016 14:44:49 +0800
Message-ID: <56B2F361.80901@gmail.com>
In-Reply-To: <16238.1454565446@famine>

On 02/04/2016 01:57 PM, Jay Vosburgh wrote:
> Tantilov, Emil S <emil.s.tantilov@intel.com> wrote:
>
>> We are seeing an occasional issue where the bonding driver reports an interface as up with 0 Mbps:
>> bond0: link status definitely up for interface eth0, 0 Mbps full duplex
>>
>> So far, in all the failing traces I have collected, this happens on a NETDEV_CHANGELOWERSTATE event:
>>
>> <...>-20533 [000] .... 81811.041241: ixgbe_service_task: eth1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>> <...>-20533 [000] .... 81811.041257: ixgbe_check_vf_rate_limit <-ixgbe_service_task
>> <...>-20533 [000] .... 81811.041272: ixgbe_ping_all_vfs <-ixgbe_service_task
>> kworker/u48:0-7503  [010] .... 81811.041345: ixgbe_get_stats64 <-dev_get_stats
>> kworker/u48:0-7503  [010] .... 81811.041393: bond_netdev_event: eth1: event: 1b
>> kworker/u48:0-7503  [010] .... 81811.041394: bond_netdev_event: eth1: IFF_SLAVE
>> kworker/u48:0-7503  [010] .... 81811.041395: bond_netdev_event: eth1: slave->speed = ffffffff
>> <...>-20533 [000] .... 81811.041407: ixgbe_ptp_overflow_check <-ixgbe_service_task
>> kworker/u48:0-7503  [010] .... 81811.041407: bond_mii_monitor: bond0: link status definitely up for interface eth1, 0 Mbps full duplex
> 	From looking at the code that prints this, the "full" duplex is
> probably actually DUPLEX_UNKNOWN, but the netdev_info uses the
> expression slave->duplex ? "full" : "half", so DUPLEX_UNKNOWN at 0xff
> would print "full."
>
> 	This is what ixgbe_get_settings returns for speed and duplex if
> it is called when carrier is off.
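A minimal single-threaded sketch of the two points above, for illustration only. The ethtool constants are the real values from include/uapi/linux/ethtool.h; `struct mock_slave`, `mock_get_settings()` and `format_link_up()` are hypothetical stand-ins, not kernel code. The sketch also assumes that the log message maps an unknown speed to 0, which would explain why the trace shows `slave->speed = ffffffff` while the message says "0 Mbps".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Constants as defined in include/uapi/linux/ethtool.h. */
#define SPEED_UNKNOWN  (-1)
#define DUPLEX_HALF    0x00
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical stand-in for the speed/duplex fields of struct slave. */
struct mock_slave {
	uint32_t speed;
	uint8_t duplex;
};

/* Hypothetical driver query: with carrier off it reports unknown speed
 * and duplex, as described for ixgbe_get_settings(). The 10000 value
 * used for carrier on is illustrative only. */
static void mock_get_settings(int carrier_ok, struct mock_slave *s)
{
	if (carrier_ok) {
		s->speed = 10000;
		s->duplex = DUPLEX_FULL;
	} else {
		s->speed = (uint32_t)SPEED_UNKNOWN;
		s->duplex = DUPLEX_UNKNOWN;
	}
}

/* Builds the log line. Assumption: an unknown speed is printed as 0.
 * The duplex ternary is the one quoted above: any nonzero value,
 * including DUPLEX_UNKNOWN (0xff), is printed as "full". */
static void format_link_up(char *buf, size_t len, const char *ifname,
			   const struct mock_slave *s)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len,
		 "link status definitely up for interface %s, %u Mbps %s duplex",
		 ifname,
		 s->speed == (uint32_t)SPEED_UNKNOWN ? 0u : (unsigned int)s->speed,
		 s->duplex ? "full" : "half");
}
```

Calling `mock_get_settings(0, &s)` and then `format_link_up(buf, sizeof(buf), "eth1", &s)` produces the same "0 Mbps full duplex" text as the reported log line: with carrier off, both fields hold their UNKNOWN values, and the ternary cannot tell DUPLEX_UNKNOWN apart from DUPLEX_FULL.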

I agree with you completely. I think this is the root cause.

Best regards,
Zhu Yanjun

>
>> As a proof of concept I added NETDEV_CHANGELOWERSTATE in bond_slave_netdev_event() along with NETDEV_UP/CHANGE:
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 56b5605..a9dac4c 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -3014,6 +3014,7 @@ static int bond_slave_netdev_event(unsigned long event,
>> 		break;
>> 	case NETDEV_UP:
>> 	case NETDEV_CHANGE:
>> +	case NETDEV_CHANGELOWERSTATE:
>> 		bond_update_speed_duplex(slave);
>> 		if (BOND_MODE(bond) == BOND_MODE_8023AD)
>> 			bond_3ad_adapter_speed_duplex_changed(slave);
>>
>> With this change I have not seen 0 Mbps reported by the bonding driver (around 12 hours of testing so far,
>> vs. 2-3 hours otherwise). That said, I suppose it could also be some sort of race/timing issue with bond_mii_monitor().
> 	This change as a fix seems kind of odd, since CHANGELOWERSTATE
> is generated by bonding itself.  Perhaps the net effect is to add a
> delay and then update the speed and duplex, masking the actual problem.
>
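The "masking" argument can be illustrated with a hypothetical, single-threaded model. None of these names are kernel API. The model assumes three things: speed and duplex were last read while carrier was still off; carrier comes up before NETDEV_CHANGE is handled; and committing the link-up makes bonding itself emit CHANGELOWERSTATE synchronously, before the message is printed. The last assumption is consistent with the trace, where event 1b for eth1 is handled in the same kworker immediately before the bond_mii_monitor line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPEED_UNKNOWN  (-1)
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical events mirroring NETDEV_UP, NETDEV_CHANGE and
 * NETDEV_CHANGELOWERSTATE. */
enum model_event { EV_UP, EV_CHANGE, EV_CHANGELOWERSTATE };

/* Hypothetical model of one slave; not kernel structures. */
struct model {
	bool carrier;   /* device carrier right now */
	uint32_t speed; /* cached slave->speed */
	uint8_t duplex; /* cached slave->duplex */
	bool patched;   /* true: CHANGELOWERSTATE also refreshes speed */
};

/* Stands in for bond_update_speed_duplex(): re-query the device, which
 * reports unknown values while carrier is off. */
static void update_speed_duplex(struct model *m)
{
	m->speed = m->carrier ? 10000 : (uint32_t)SPEED_UNKNOWN;
	m->duplex = m->carrier ? DUPLEX_FULL : DUPLEX_UNKNOWN;
}

/* Models bond_slave_netdev_event() with and without the proof-of-concept
 * case for CHANGELOWERSTATE. */
static void slave_event(struct model *m, enum model_event ev)
{
	switch (ev) {
	case EV_CHANGELOWERSTATE:
		if (!m->patched)
			break;
		/* fall through */
	case EV_UP:
	case EV_CHANGE:
		update_speed_duplex(m);
		break;
	}
}

/* Models the miimon commit: marking the slave up makes bonding emit
 * CHANGELOWERSTATE (assumed synchronous), then the speed is logged with
 * an unknown speed shown as 0. Returns the logged speed. */
static uint32_t commit_link_up(struct model *m)
{
	slave_event(m, EV_CHANGELOWERSTATE);
	return m->speed == (uint32_t)SPEED_UNKNOWN ? 0 : m->speed;
}
```

In both variants the cached values are stale when the commit starts. The patched variant logs a valid speed only because the bond's own event happens to re-read the device a little later, after carrier is up. The stale read itself is not fixed.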
> 	Emil, if I recall correctly, the test patch I sent that uses the
> notifiers directly instead of miimon (specify miimon=0 and have bonding
> respond to the notifiers) handled everything properly, right?  If so I
> can split that up and submit it properly; it seems more like a feature
> than a straightforward bug fix, so I'm not sure it's appropriate for
> net.
>
> 	As a possibly less complex alternative for the miimon > 0 case,
> could you try the following:
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 56b560558884..ac8921e65f26 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2120,6 +2120,7 @@ static void bond_miimon_commit(struct bonding *bond)
>   {
>   	struct list_head *iter;
>   	struct slave *slave, *primary;
> +	int link_state;
>   
>   	bond_for_each_slave(bond, slave, iter) {
>   		switch (slave->new_link) {
> @@ -2127,6 +2128,10 @@ static void bond_miimon_commit(struct bonding *bond)
>   			continue;
>   
>   		case BOND_LINK_UP:
> +			link_state = bond_check_dev_link(bond, slave->dev, 0);
> +			if (!link_state)
> +				continue;
> +			bond_update_speed_duplex(slave);
>   			bond_set_slave_link_state(slave, BOND_LINK_UP,
>   						  BOND_SLAVE_NOTIFY_NOW);
>   			slave->last_link_up = jiffies;
>
>
> 	This will make bonding recheck the link state and update the
> speed and duplex after it acquires RTNL to commit a link change.  This
> probably still has a race, since the change of carrier state in the
> device is not mutexed by anything bonding can acquire (so it can always
> change as soon as it's checked).
>
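The revised BOND_LINK_UP path, together with the race that remains, can be sketched as a hypothetical model. The names only mirror bond_check_dev_link() and bond_update_speed_duplex() and are not kernel code. `race_hook` is an illustrative device for simulating the carrier changing after the recheck, which nothing bonding holds can prevent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPEED_UNKNOWN  (-1)
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical model of one slave; not kernel structures. */
struct slave_model {
	bool carrier;   /* device carrier; may change at any time */
	uint32_t speed;
	uint8_t duplex;
	bool link_up;   /* bonding's committed link state */
};

/* Stands in for bond_check_dev_link(): nonzero if carrier is up. */
static int check_dev_link(const struct slave_model *s)
{
	return s->carrier;
}

/* Stands in for bond_update_speed_duplex(). */
static void update_speed_duplex(struct slave_model *s)
{
	s->speed = s->carrier ? 10000 : (uint32_t)SPEED_UNKNOWN;
	s->duplex = s->carrier ? DUPLEX_FULL : DUPLEX_UNKNOWN;
}

/* Models the BOND_LINK_UP case of the revised commit: recheck the link,
 * skip the slave if it is down, refresh speed/duplex, then commit.
 * race_hook, if non-NULL, runs between the recheck and the refresh to
 * simulate a concurrent carrier change. Returns true if committed. */
static bool commit_link_up(struct slave_model *s,
			   void (*race_hook)(struct slave_model *))
{
	if (!check_dev_link(s))
		return false;   /* like the patch's "continue" */
	if (race_hook)
		race_hook(s);
	update_speed_duplex(s);
	s->link_up = true;
	return true;
}

/* Hypothetical hook: carrier drops right after it was checked. */
static void drop_carrier(struct slave_model *s)
{
	s->carrier = false;
}
```

Without the hook, the recheck covers the reported case: a slave whose carrier is down is not committed, and a committed slave always has fresh speed and duplex. With `drop_carrier` as the hook, the slave is still committed but with SPEED_UNKNOWN. This is the narrower race described above.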
> 	Thanks,
>
> 	-J
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

Thread overview: 14+ messages
2016-02-03 23:10 bonding reports interface up with 0 Mbps Tantilov, Emil S
2016-02-04  2:56 ` zhuyj
2016-02-04  5:57 ` Jay Vosburgh
2016-02-04  6:44   ` zhuyj [this message]
2016-02-04 15:47   ` Tantilov, Emil S
2016-02-04 20:19     ` Jay Vosburgh
2016-02-04 20:29 ` Jay Vosburgh
2016-02-05  0:07   ` Tantilov, Emil S
2016-02-05  0:37   ` Jay Vosburgh
2016-02-05  0:43     ` Tantilov, Emil S
2016-02-05  5:19       ` zhuyj
2016-02-05  3:24     ` zhuyj
2016-02-05 16:43     ` Tantilov, Emil S
2016-02-08 16:30     ` Tantilov, Emil S