netdev.vger.kernel.org archive mirror
* [PATCH] bonding:fix speed unknown,lacp bonding failed
@ 2013-07-05  6:32 Wangyufen
  2013-07-05  8:40 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Wangyufen @ 2013-07-05  6:32 UTC (permalink / raw)
  To: netdev, lizefan; +Cc: zhangdianfang

From: "Wang Yufen" <wangyufen@huawei.com>

We repeatedly set up bonding in LACP (802.3ad) mode, and occasionally
LACP bonding failed because the speed of a slave NIC port was unknown.
However, when we checked the slave NIC with ethtool, its status was
normal and the speed was 10000Mb/s.

Bonding reads the NIC speed from the NIC driver only when the NIC is
enslaved and when a NETDEV_CHANGE event is received. Call
bond_update_speed_duplex() to update speed and duplex when miimon
inspection finds that the slave link is up and the slave speed is
unknown.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
 drivers/net/bonding/bond_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f975696..d288a98 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)
 
 		switch (slave->link) {
 		case BOND_LINK_UP:
-			if (link_state)
+			if (link_state) {
+				if (slave->speed == SPEED_UNKNOWN)
+					bond_update_speed_duplex(slave);
 				continue;
+			}
 
 			slave->link = BOND_LINK_FAIL;
 			slave->delay = bond->params.downdelay;
-- 
1.8.0


* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
  2013-07-05  6:32 [PATCH] bonding:fix speed unknown,lacp bonding failed Wangyufen
@ 2013-07-05  8:40 ` David Miller
  2013-07-05 10:10   ` wangyufen
  2013-07-05  9:20 ` Veaceslav Falico
  2013-07-05 14:54 ` Ben Hutchings
  2 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2013-07-05  8:40 UTC (permalink / raw)
  To: wangyufen; +Cc: netdev, lizefan, zhangdianfang

From: Wangyufen <wangyufen@huawei.com>
Date: Fri, 5 Jul 2013 14:32:59 +0800

> @@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)

>  		switch (slave->link) {
>  		case BOND_LINK_UP:
> -			if (link_state)
> +			if (link_state) {
> +				if (slave->speed == SPEED_UNKNOWN)
> +					bond_update_speed_duplex(slave);
>  				continue;

bond_miimon_inspect() does not hold the RTNL mutex, and it is required
that the RTNL mutex is held when bond_update_speed_duplex() is called.

If you ran this new code, you should be hitting the assertion at the
beginning of __ethtool_get_settings() which reads:

	ASSERT_RTNL();

In fact, if you look at bond_miimon_inspect()'s caller it goes:

		if (!rtnl_trylock()) {

right after calling bond_miimon_inspect().
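
As a rough illustration of the constraint described above, here is a
small userspace C model of a two-phase monitor: a lock-free inspect step
that only records which slaves need a speed refresh, and a commit step
that performs the refresh after a successful trylock. This is a sketch
under stated assumptions, not kernel code: the atomic_flag stands in for
the RTNL mutex, and the names inspect(), commit(), update_speed(),
hw_speed and needs_refresh are hypothetical and do not appear in
bond_main.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPEED_UNKNOWN (-1)

/* Stand-in for the RTNL mutex (assumption: a simple try-lock flag). */
static atomic_flag rtnl = ATOMIC_FLAG_INIT;
static bool rtnl_held;

/* Hypothetical, simplified slave state for this model only. */
struct slave {
	bool link_up;
	int speed;		/* cached speed in Mb/s, or SPEED_UNKNOWN */
	int hw_speed;		/* what the simulated driver would report */
	bool needs_refresh;	/* set by inspect(), consumed by commit() */
};

/* Models a query that is only legal with the lock held. */
static void update_speed(struct slave *s)
{
	assert(rtnl_held);
	s->speed = s->hw_speed;
}

/* Lock-free phase: record what needs doing, never query the driver. */
static int inspect(struct slave *slaves, int n)
{
	int pending = 0;

	for (int i = 0; i < n; i++) {
		slaves[i].needs_refresh =
			slaves[i].link_up && slaves[i].speed == SPEED_UNKNOWN;
		if (slaves[i].needs_refresh)
			pending++;
	}
	return pending;
}

/* Locked phase: returns false if the lock is busy so the caller can retry. */
static bool commit(struct slave *slaves, int n)
{
	if (atomic_flag_test_and_set(&rtnl))
		return false;
	rtnl_held = true;

	for (int i = 0; i < n; i++) {
		if (slaves[i].needs_refresh) {
			update_speed(&slaves[i]);
			slaves[i].needs_refresh = false;
		}
	}

	rtnl_held = false;
	atomic_flag_clear(&rtnl);
	return true;
}
```

A caller would run inspect() on every polling interval and call commit()
only when it returns a non-zero count, rescheduling itself if commit()
reports that the lock was busy. Calling update_speed() directly from
inspect() would trip the assertion, which is the same kind of failure
the ASSERT_RTNL() check is there to catch.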


* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
  2013-07-05  6:32 [PATCH] bonding:fix speed unknown,lacp bonding failed Wangyufen
  2013-07-05  8:40 ` David Miller
@ 2013-07-05  9:20 ` Veaceslav Falico
  2013-07-05 10:08   ` wangyufen
  2013-07-05 14:54 ` Ben Hutchings
  2 siblings, 1 reply; 6+ messages in thread
From: Veaceslav Falico @ 2013-07-05  9:20 UTC (permalink / raw)
  To: Wangyufen; +Cc: netdev, lizefan, zhangdianfang

On Fri, Jul 5, 2013 at 8:32 AM, Wangyufen <wangyufen@huawei.com> wrote:
> From: "Wang Yufen" <wangyufen@huawei.com>
>
> We repeatedly set up bonding in LACP (802.3ad) mode, and occasionally
> LACP bonding failed because the speed of a slave NIC port was unknown.
> However, when we checked the slave NIC with ethtool, its status was
> normal and the speed was 10000Mb/s.

Can you give a bit more detail on how you tested? And which NIC was it?

I've tried to reproduce it with
while :; do echo +/- > /sys/.../bonding/slaves; done
but failed.

--
Best regards,
Veaceslav Falico


* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
  2013-07-05  9:20 ` Veaceslav Falico
@ 2013-07-05 10:08   ` wangyufen
  0 siblings, 0 replies; 6+ messages in thread
From: wangyufen @ 2013-07-05 10:08 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, lizefan, zhangdianfang

On 2013/7/5 17:20, Veaceslav Falico wrote:
> On Fri, Jul 5, 2013 at 8:32 AM, Wangyufen <wangyufen@huawei.com> wrote:
>> From: "Wang Yufen" <wangyufen@huawei.com>
>>
>> We repeatedly set up bonding in LACP (802.3ad) mode, and occasionally
>> LACP bonding failed because the speed of a slave NIC port was unknown.
>> However, when we checked the slave NIC with ethtool, its status was
>> normal and the speed was 10000Mb/s.
> 
> Can you give a bit more detail on how you tested? And which NIC was it?
> 
> I've tried to reproduce it with
> while :; do echo +/- > /sys/.../bonding/slaves; done
> but failed.
> 

This is my test script:

#!/bin/bash
# Uses bash-only syntax (function, for ((...)), ((j++))), so it needs bash.

# Recreate the bond vpa0 in 802.3ad mode and enslave four NICs.
function bond_enable()
{
        echo -vpa0 > /sys/class/net/bonding_masters
        sleep 2
        echo +vpa0 > /sys/class/net/bonding_masters
        echo 4 > /sys/class/net/vpa0/bonding/mode               # 4 = 802.3ad (LACP)
        echo 1 > /sys/class/net/vpa0/bonding/xmit_hash_policy   # 1 = layer3+4
        ifconfig vpa0 up
        sleep 1
        echo +eth0 > /sys/class/net/vpa0/bonding/slaves
        echo +eth5 > /sys/class/net/vpa0/bonding/slaves
        echo +eth7 > /sys/class/net/vpa0/bonding/slaves
        echo +eth9 > /sys/class/net/vpa0/bonding/slaves
        sleep 2
}

for ((i = 0; i < 5000; i++))
do
        bond_enable

        # Retry for up to 5 seconds until the address shows up in ifconfig.
        j=0
        while [ $j -lt 10 ]
        do
                sleep 0.5
                ifconfig vpa0 192.168.13.8/24
                out=$(ifconfig | grep "192.168.13.8")
                if [ -n "$out" ]; then
                        break
                fi
                ((j++))
        done
        out1=$(ping 192.168.13.8 -c 4)
        # The run passes only if all four ports joined the aggregator.
        out2=$(grep "Number of ports: 4" /proc/net/bonding/vpa0)
        if [ -n "$out1" ] && [ -n "$out2" ]; then
                echo $i PASS >> dep002.log
        else
                echo "out1 is $out1, out2 is $out2" >> dep002.log
                echo $i FAIL >> dep002.log
                # Stop at the first failure so the bond state can be inspected.
                exit 0
        fi
done


* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
  2013-07-05  8:40 ` David Miller
@ 2013-07-05 10:10   ` wangyufen
  0 siblings, 0 replies; 6+ messages in thread
From: wangyufen @ 2013-07-05 10:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, lizefan, zhangdianfang

On 2013/7/5 16:40, David Miller wrote:
> From: Wangyufen <wangyufen@huawei.com>
> Date: Fri, 5 Jul 2013 14:32:59 +0800
> 
>> @@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)
> 
>>  		switch (slave->link) {
>>  		case BOND_LINK_UP:
>> -			if (link_state)
>> +			if (link_state) {
>> +				if (slave->speed == SPEED_UNKNOWN)
>> +					bond_update_speed_duplex(slave);
>>  				continue;
> 
> bond_miimon_inspect() does not hold the RTNL mutex, and it is required
> that the RTNL mutex is held when bond_update_speed_duplex() is called.
> 
> If you ran this new code, you should be hitting the assertion at the
> beginning of __ethtool_get_settings() which reads:
> 
> 	ASSERT_RTNL();
> 
> In fact, if you look at bond_miimon_inspect()'s caller it goes:
> 
> 		if (!rtnl_trylock()) {
> 
> right after calling bond_miimon_inspect().

OK, thanks.


* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
  2013-07-05  6:32 [PATCH] bonding:fix speed unknown,lacp bonding failed Wangyufen
  2013-07-05  8:40 ` David Miller
  2013-07-05  9:20 ` Veaceslav Falico
@ 2013-07-05 14:54 ` Ben Hutchings
  2 siblings, 0 replies; 6+ messages in thread
From: Ben Hutchings @ 2013-07-05 14:54 UTC (permalink / raw)
  To: Wangyufen; +Cc: netdev, lizefan, zhangdianfang

On Fri, 2013-07-05 at 14:32 +0800, Wangyufen wrote:
> From: "Wang Yufen" <wangyufen@huawei.com>
> 
> We repeatedly set up bonding in LACP (802.3ad) mode, and occasionally
> LACP bonding failed because the speed of a slave NIC port was unknown.
> However, when we checked the slave NIC with ethtool, its status was
> normal and the speed was 10000Mb/s.
> 
> Bonding reads the NIC speed from the NIC driver only when the NIC is
> enslaved and when a NETDEV_CHANGE event is received. Call
> bond_update_speed_duplex() to update speed and duplex when miimon
> inspection finds that the slave link is up and the slave speed is
> unknown.

bond_update_speed_duplex() must not be called in atomic context.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

