* [PATCH] bonding:fix speed unknown,lacp bonding failed
From: Wangyufen @ 2013-07-05 6:32 UTC
To: netdev, lizefan; +Cc: zhangdianfang
From: "Wang Yufen" <wangyufen@huawei.com>

When we repeatedly bonded NICs in LACP mode, LACP bonding occasionally
failed because the speed of one slave NIC port was unknown. However, when
we checked the slave NIC with ethtool, its status was normal and its speed
was 10000Mb/s.

Bonding gets the NIC speed from the NIC driver only when the NIC is
enslaved and when a NETDEV_CHANGE event is received. Call
bond_update_speed_duplex() to update speed and duplex when miimon finds
that a slave's link is up but its speed is unknown.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
drivers/net/bonding/bond_main.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f975696..d288a98 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)
 
 		switch (slave->link) {
 		case BOND_LINK_UP:
-			if (link_state)
+			if (link_state) {
+				if (slave->speed == SPEED_UNKNOWN)
+					bond_update_speed_duplex(slave);
 				continue;
+			}
 
 			slave->link = BOND_LINK_FAIL;
 			slave->delay = bond->params.downdelay;
--
1.8.0
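
The commit message's argument can be modeled in a few lines of C. This is a minimal userspace sketch, not kernel code: struct slave_model, update_speed(), miimon_tick() and run_scenario() are hypothetical names, and the scenario (speed unknown at enslave time, no later NETDEV_CHANGE) is an assumption drawn from the report rather than a confirmed root cause. It deliberately ignores locking, which is what the reviewers object to later in the thread.

```c
/* Hypothetical userspace model of when bonding refreshes its cached
 * copy of a slave's speed.  Not kernel code; names are illustrative. */
#include <stdbool.h>

#define SPEED_UNKNOWN (-1)   /* same value as the ethtool constant */

struct slave_model {
	bool link_up;    /* what miimon observes */
	int  drv_speed;  /* what the driver would report if asked now */
	int  speed;      /* bonding's cached value */
};

/* Refresh the cache by asking the "driver". */
static void update_speed(struct slave_model *s)
{
	s->speed = s->drv_speed;
}

/* One miimon pass.  with_patch selects the proposed behaviour:
 * re-query while the link is up and the cached speed is unknown. */
static void miimon_tick(struct slave_model *s, bool with_patch)
{
	if (with_patch && s->link_up && s->speed == SPEED_UNKNOWN)
		update_speed(s);
}

/* Assumed scenario: the slave is enslaved before the driver knows
 * its speed, and no NETDEV_CHANGE follows. */
static int run_scenario(bool with_patch)
{
	struct slave_model s = { .link_up = true, .drv_speed = SPEED_UNKNOWN };

	update_speed(&s);        /* refresh at enslave time: caches -1 */
	s.drv_speed = 10000;     /* driver now knows 10000Mb/s */
	for (int i = 0; i < 3; i++)
		miimon_tick(&s, with_patch);
	return s.speed;          /* no NETDEV_CHANGE, so no other refresh */
}
```

In this model run_scenario(false) returns -1 (the stale cached value), while run_scenario(true) returns 10000.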
* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
From: David Miller @ 2013-07-05 8:40 UTC
To: wangyufen; +Cc: netdev, lizefan, zhangdianfang
From: Wangyufen <wangyufen@huawei.com>
Date: Fri, 5 Jul 2013 14:32:59 +0800
> @@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)
>
>  		switch (slave->link) {
>  		case BOND_LINK_UP:
> -			if (link_state)
> +			if (link_state) {
> +				if (slave->speed == SPEED_UNKNOWN)
> +					bond_update_speed_duplex(slave);
>  				continue;

bond_miimon_inspect() does not hold the RTNL mutex, and it is required
that the RTNL mutex is held when bond_update_speed_duplex() is called.

If you ran this new code, you should be hitting the assertion at the
beginning of __ethtool_get_settings() which reads:

	ASSERT_RTNL();

In fact, if you look at bond_miimon_inspect()'s caller it goes:

	if (!rtnl_trylock()) {

right after calling bond_miimon_inspect().
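
Miller's objection points to a two-phase pattern: the lock-free inspect pass can only note that a refresh is needed, and the refresh itself belongs in the pass that runs after rtnl_trylock() succeeds. Below is a minimal userspace sketch of that split, assuming a pthread mutex as a stand-in for RTNL and a boolean flag as a stand-in for ASSERT_RTNL(). The flag need_speed_update and every *_model function are hypothetical; this is one possible shape, not the bonding driver's actual code or fix.

```c
/* Hypothetical userspace model of the inspect/commit split.
 * rtnl_model stands in for RTNL; nothing here is kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SPEED_UNKNOWN (-1)

static pthread_mutex_t rtnl_model = PTHREAD_MUTEX_INITIALIZER;
static bool rtnl_held;            /* lets us mimic ASSERT_RTNL() */

struct slave_model {
	bool link_up;
	int  speed;
	bool need_speed_update;   /* set without the lock, consumed under it */
};

/* Stand-in for a query that, like bond_update_speed_duplex(),
 * must only run with the lock held. */
static void update_speed_duplex_model(struct slave_model *s)
{
	assert(rtnl_held);        /* models ASSERT_RTNL() */
	s->speed = 10000;         /* pretend the driver answered */
	s->need_speed_update = false;
}

/* Phase 1: no lock held, so only record what needs doing.
 * Returns true if a commit pass is required. */
static bool miimon_inspect_model(struct slave_model *s)
{
	if (s->link_up && s->speed == SPEED_UNKNOWN)
		s->need_speed_update = true;
	return s->need_speed_update;
}

/* Phase 2: take the lock with trylock; on contention, leave the
 * flag set so the next tick retries. */
static bool miimon_commit_model(struct slave_model *s)
{
	if (pthread_mutex_trylock(&rtnl_model) != 0)
		return false;
	rtnl_held = true;
	if (s->need_speed_update)
		update_speed_duplex_model(s);
	rtnl_held = false;
	pthread_mutex_unlock(&rtnl_model);
	return true;
}

/* One monitor tick: inspect, then commit only if needed. */
static int mii_monitor_tick_model(struct slave_model *s)
{
	if (miimon_inspect_model(s))
		miimon_commit_model(s);
	return s->speed;
}
```

If the mutex is busy, the trylock fails, the flag stays set and the speed stays unknown; the next tick retries and completes the refresh once the lock is free.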
* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
From: Veaceslav Falico @ 2013-07-05 9:20 UTC
To: Wangyufen; +Cc: netdev, lizefan, zhangdianfang
On Fri, Jul 5, 2013 at 8:32 AM, Wangyufen <wangyufen@huawei.com> wrote:
> From: "Wang Yufen" <wangyufen@huawei.com>
>
> When we repeatedly bonded NICs in LACP mode, LACP bonding occasionally
> failed because the speed of one slave NIC port was unknown. However, when
> we checked the slave NIC with ethtool, its status was normal and its speed
> was 10000Mb/s.

Can you give a few more details on how you tested? And which NIC was it?

I've tried to reproduce it with
while :; do echo +/- > /sys/.../bonding/slaves; done
but failed.

--
Best regards,
Veaceslav Falico
* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
From: wangyufen @ 2013-07-05 10:08 UTC
To: Veaceslav Falico; +Cc: netdev, lizefan, zhangdianfang
On 2013/7/5 17:20, Veaceslav Falico wrote:
> On Fri, Jul 5, 2013 at 8:32 AM, Wangyufen <wangyufen@huawei.com> wrote:
>> From: "Wang Yufen" <wangyufen@huawei.com>
>>
>> When we repeatedly bonded NICs in LACP mode, LACP bonding occasionally
>> failed because the speed of one slave NIC port was unknown. However, when
>> we checked the slave NIC with ethtool, its status was normal and its speed
>> was 10000Mb/s.
>
> Can you give a few more details on how you tested? And which NIC was it?
>
> I've tried to reproduce it with
> while :; do echo +/- > /sys/.../bonding/slaves; done
> but failed.
>

This is my test script:

#!/bin/bash
# Repeatedly recreate an 802.3ad (LACP) bond with four slaves and check
# that all four ports end up in the active aggregator.

bond_enable()
{
	# Destroy and recreate the bond master vpa0
	echo -vpa0 > /sys/class/net/bonding_masters
	sleep 2
	echo +vpa0 > /sys/class/net/bonding_masters
	echo 4 > /sys/class/net/vpa0/bonding/mode              # 802.3ad (LACP)
	echo 1 > /sys/class/net/vpa0/bonding/xmit_hash_policy  # layer3+4
	ifconfig vpa0 up
	sleep 1
	for nic in eth0 eth5 eth7 eth9; do
		echo +$nic > /sys/class/net/vpa0/bonding/slaves
	done
	sleep 2
}

for ((i = 0; i < 5000; i++)); do
	bond_enable

	# Retry address assignment for up to ~5 seconds
	j=0
	while [ $j -lt 10 ]; do
		sleep 0.5
		ifconfig vpa0 192.168.13.8/24
		if ifconfig vpa0 | grep -q "192.168.13.8"; then
			break
		fi
		j=$((j + 1))
	done

	# ping prints output even when it fails, so test its exit status
	out1=$(ping -c 4 192.168.13.8)
	ping_rc=$?
	out2=$(grep "Number of ports: 4" /proc/net/bonding/vpa0)
	if [ $ping_rc -eq 0 ] && [ -n "$out2" ]; then
		echo "$i PASS" >> dep002.log
	else
		echo "out1 is $out1, out2 is $out2" >> dep002.log
		echo "$i FAIL" >> dep002.log
		exit 1
	fi
done
* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
From: wangyufen @ 2013-07-05 10:10 UTC
To: David Miller; +Cc: netdev, lizefan, zhangdianfang
On 2013/7/5 16:40, David Miller wrote:
> From: Wangyufen <wangyufen@huawei.com>
> Date: Fri, 5 Jul 2013 14:32:59 +0800
>
>> @@ -2301,8 +2301,11 @@ static int bond_miimon_inspect(struct bonding *bond)
>>
>>  		switch (slave->link) {
>>  		case BOND_LINK_UP:
>> -			if (link_state)
>> +			if (link_state) {
>> +				if (slave->speed == SPEED_UNKNOWN)
>> +					bond_update_speed_duplex(slave);
>>  				continue;
>
> bond_miimon_inspect() does not hold the RTNL mutex, and it is required
> that the RTNL mutex is held when bond_update_speed_duplex() is called.
>
> If you ran this new code, you should be hitting the assertion at the
> beginning of __ethtool_get_settings() which reads:
>
> ASSERT_RTNL();
>
> In fact, if you look at bond_miimon_inspect()'s caller it goes:
>
> if (!rtnl_trylock()) {
>
> right after calling bond_miimon_inspect().
OK, thanks.
* Re: [PATCH] bonding:fix speed unknown,lacp bonding failed
From: Ben Hutchings @ 2013-07-05 14:54 UTC
To: Wangyufen; +Cc: netdev, lizefan, zhangdianfang
On Fri, 2013-07-05 at 14:32 +0800, Wangyufen wrote:
> From: "Wang Yufen" <wangyufen@huawei.com>
>
> When we repeatedly bonded NICs in LACP mode, LACP bonding occasionally
> failed because the speed of one slave NIC port was unknown. However, when
> we checked the slave NIC with ethtool, its status was normal and its speed
> was 10000Mb/s.
>
> Bonding gets the NIC speed from the NIC driver only when the NIC is
> enslaved and when a NETDEV_CHANGE event is received. Call
> bond_update_speed_duplex() to update speed and duplex when miimon finds
> that a slave's link is up but its speed is unknown.

bond_update_speed_duplex() must not be called in atomic context.

Ben.
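
Ben's point is the companion constraint: the speed query must not run in atomic context (presumably because the ethtool path it uses may sleep). The following sketch models the usual remedy in userspace: do only a cheap check while "atomic", then perform the sleeping query after leaving that section. atomic_depth, atomic_enter()/atomic_exit(), sleeping_speed_query() and refresh_speed() are all hypothetical; in the kernel the debug check would be might_sleep() rather than a return value.

```c
/* Hypothetical model of "must not sleep in atomic context".
 * atomic_depth stands in for the kernel's atomic state; nothing
 * here is kernel API. */
#include <stdbool.h>

#define SPEED_UNKNOWN (-1)

static int atomic_depth;          /* > 0 while a spinning lock is held */

static void atomic_enter(void) { atomic_depth++; }
static void atomic_exit(void)  { atomic_depth--; }

/* A query that may sleep.  It refuses to run while atomic; in the
 * kernel this would be a bug that might_sleep() can warn about. */
static bool sleeping_speed_query(int *speed)
{
	if (atomic_depth > 0)
		return false;
	*speed = 10000;           /* pretend the driver answered */
	return true;
}

/* Do only the cheap check while atomic; query after leaving. */
static int refresh_speed(int speed, bool *ok)
{
	bool pending;

	atomic_enter();
	pending = (speed == SPEED_UNKNOWN);
	atomic_exit();

	*ok = true;
	if (pending)
		*ok = sleeping_speed_query(&speed);
	return speed;
}
```

Calling sleeping_speed_query() between atomic_enter() and atomic_exit() fails and leaves the speed untouched, while refresh_speed() succeeds because the sleeping call happens only after the atomic section ends.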
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.