* [TEST] bond_options.sh looks flaky
@ 2024-01-22 21:55 Jakub Kicinski
2024-01-22 23:25 ` Jay Vosburgh
From: Jakub Kicinski @ 2024-01-22 21:55 UTC (permalink / raw)
To: Benjamin Poirier, Hangbin Liu, Jay Vosburgh; +Cc: netdev@vger.kernel.org
Hi folks,
looks like tools/testing/selftests/drivers/net/bonding/bond_options.sh
is a bit flaky. This error:
# TEST: prio (balance-alb arp_ip_target primary_reselect 1) [FAIL]
# Current active slave is eth2 but not eth1
https://netdev-2.bots.linux.dev/vmksft-bonding/results/432442/7-bond-options-sh
was gone on the next run, even though the only difference in the
content of the tree was:
$ git diff net-next-2024-01-22--18-00..net-next-2024-01-22--21-00 --stat
Documentation/devicetree/bindings/net/adi,adin.yaml | 7 ++-----
drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
drivers/net/phy/adin.c | 2 --
3 files changed, 3 insertions(+), 8 deletions(-)
So definitely nothing of relevance..
Any ideas?
* Re: [TEST] bond_options.sh looks flaky
From: Jay Vosburgh @ 2024-01-22 23:25 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Benjamin Poirier, Hangbin Liu, netdev@vger.kernel.org
Jakub Kicinski <kuba@kernel.org> wrote:
>Hi folks,
>
>looks like tools/testing/selftests/drivers/net/bonding/bond_options.sh
>is a bit flaky. This error:
>
># TEST: prio (balance-alb arp_ip_target primary_reselect 1) [FAIL]
># Current active slave is eth2 but not eth1
>
>https://netdev-2.bots.linux.dev/vmksft-bonding/results/432442/7-bond-options-sh
>
>was gone on the next run, even though the only difference in the
>content of the tree was:
>
>$ git diff net-next-2024-01-22--18-00..net-next-2024-01-22--21-00 --stat
> Documentation/devicetree/bindings/net/adi,adin.yaml | 7 ++-----
> drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
> drivers/net/phy/adin.c | 2 --
> 3 files changed, 3 insertions(+), 8 deletions(-)
>
>So definitely nothing of relevance..
>
>Any ideas?
I think I see a couple of things in the test logic:
1) in bond_options.sh:
prio_arp()
{
	local primary_reselect
	local mode=$1

	for primary_reselect in 0 1 2; do
		prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
		log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
	done
}
The above appears to always test with "mode active-backup"
regardless of what $mode contains, but logs that $mode was tested. The
same is true for the prio_ns test that is just after prio_arp in
bond_options.sh.
2) The balance-alb and balance-tlb modes don't work with the ARP
monitor. If the prio_arp or prio_ns tests were actually testing the
stated $mode with arp_interval, it should never succeed.
3) I'm not sure why this test fails, but the prior test that claims to
be active-backup does not, even though both appear to be actually
testing active-backup. The log entries for the actual "prio
(active-backup arp_ip_target primary_reselect 1)" test start at time
281.913374, and differ from the failing test starting at 715.597039.
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
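
The mismatch described in point 1 can be reproduced outside the selftest. The following is only a minimal sketch, not the real test: prio_test and log_test are hypothetical stubs that merely record their arguments, and the g_ip4 value is a placeholder documentation address. Only the body of prio_arp is taken from the message above.

```shell
#!/bin/sh
# Hypothetical stand-ins for the selftest helpers: they only record
# what they were asked to test and what they were asked to log.
g_ip4="192.0.2.254"   # placeholder address, not the value used by the selftest
last_opts=""
last_label=""

prio_test() { last_opts=$1; }
log_test()  { last_label=$2; }

# Copied from the message above: the bond options hard-code
# "mode active-backup", while the log label uses $mode.
prio_arp()
{
	local primary_reselect
	local mode=$1

	for primary_reselect in 0 1 2; do
		prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
		log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
	done
}

prio_arp balance-alb
echo "tested: $last_opts"
echo "logged: $last_label"
```

With these stubs, the "tested" line names active-backup while the "logged" line names balance-alb, which is the discrepancy Jay points out.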
* Re: [TEST] bond_options.sh looks flaky
From: Hangbin Liu @ 2024-01-23 3:56 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Jakub Kicinski, Benjamin Poirier, netdev@vger.kernel.org
On Mon, Jan 22, 2024 at 03:25:57PM -0800, Jay Vosburgh wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> >Hi folks,
> >
> >looks like tools/testing/selftests/drivers/net/bonding/bond_options.sh
> >is a bit flaky. This error:
> >
> ># TEST: prio (balance-alb arp_ip_target primary_reselect 1) [FAIL]
> ># Current active slave is eth2 but not eth1
> >
> >https://netdev-2.bots.linux.dev/vmksft-bonding/results/432442/7-bond-options-sh
> >
> >was gone on the next run, even though the only difference in the
> >content of the tree was:
> >
> >$ git diff net-next-2024-01-22--18-00..net-next-2024-01-22--21-00 --stat
> > Documentation/devicetree/bindings/net/adi,adin.yaml | 7 ++-----
> > drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
> > drivers/net/phy/adin.c | 2 --
> > 3 files changed, 3 insertions(+), 8 deletions(-)
> >
> >So definitely nothing of relevance..
> >
> >Any ideas?
>
> I think I see a couple of things in the test logic:
>
> 1) in bond_options.sh:
>
> prio_arp()
> {
> 	local primary_reselect
> 	local mode=$1
>
> 	for primary_reselect in 0 1 2; do
> 		prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
> 		log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
> 	done
> }
>
> The above appears to always test with "mode active-backup"
> regardless of what $mode contains, but logs that $mode was tested. The
> same is true for the prio_ns test that is just after prio_arp in
> bond_options.sh.
Ah, yes. I will post a fix for this issue.
>
> 2) The balance-alb and balance-tlb modes don't work with the ARP
> monitor. If the prio_arp or prio_ns tests were actually testing the
> stated $mode with arp_interval, it should never succeed.
Hmm, I forgot why I put prio_arp/prio_ns in the mode for loop but
only use active-backup for testing... But this is definitely a waste of time.
I will run them only for active-backup testing.
>
> 3) I'm not sure why this test fails, but the prior test that claims to
> be active-backup does not, even though both appear to be actually
> testing active-backup. The log entries for the actual "prio
> (active-backup arp_ip_target primary_reselect 1)" test start at time
> 281.913374, and differ from the failing test starting at 715.597039.
From the log of the passing test:
[ 505.516927] br0: port 2(s1) entered disabled state
[ 505.773009] bond0: (slave eth1): link status definitely down, disabling slave
[ 505.773593] bond0: (slave eth2): making interface the new active one
While the log of the failing test shows:
[ 723.603062] br0: port 4(s2) entered disabled state
[ 723.868750] bond0: (slave eth2): link status definitely down, disabling slave
[ 723.869104] bond0: (slave eth1): making interface the new active one
It looks like the wrong active link was set. It should be eth1 but was set to
eth2, so the later link operation set the eth2 link down. Not sure why eth2 was
set as the active interface. I need to print the log immediately if check_err
fails.
Thanks
Hangbin
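
One way the change described above ("run them only for active-backup") could look is sketched here. This is an assumption about the structure, not the actual patch: the prio() driver, the list of modes, the prio_miimon name, and all three stubs are hypothetical and only record which test would run for which mode.

```shell
#!/bin/sh
# Hypothetical stubs: each records "<test>:<mode>" instead of running a test.
ran=""
prio_miimon() { ran="$ran miimon:$1"; }
prio_arp()    { ran="$ran arp:$1"; }
prio_ns()     { ran="$ran ns:$1"; }

# Assumed driver loop: the ARP and NS variants are skipped for modes
# that do not work with the ARP monitor.
prio()
{
	local mode

	for mode in active-backup balance-tlb balance-alb; do
		prio_miimon "$mode"
		if [ "$mode" = "active-backup" ]; then
			prio_arp "$mode"
			prio_ns "$mode"
		fi
	done
}

prio
echo "ran:$ran"
```

In this sketch the ARP and NS variants run once, for active-backup, so the logged mode and the tested mode can no longer disagree.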
* Re: [TEST] bond_options.sh looks flaky
From: Hangbin Liu @ 2024-01-23 7:17 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Jakub Kicinski, Benjamin Poirier, netdev@vger.kernel.org
On Tue, Jan 23, 2024 at 11:56:15AM +0800, Hangbin Liu wrote:
> > 3) I'm not sure why this test fails, but the prior test that claims to
> > be active-backup does not, even though both appear to be actually
> > testing active-backup. The log entries for the actual "prio
> > (active-backup arp_ip_target primary_reselect 1)" test start at time
> > 281.913374, and differ from the failing test starting at 715.597039.
>
> From the log of the passing test:
>
> [ 505.516927] br0: port 2(s1) entered disabled state
> [ 505.773009] bond0: (slave eth1): link status definitely down, disabling slave
> [ 505.773593] bond0: (slave eth2): making interface the new active one
>
> While the log of the failing test shows:
> [ 723.603062] br0: port 4(s2) entered disabled state
> [ 723.868750] bond0: (slave eth2): link status definitely down, disabling slave
> [ 723.869104] bond0: (slave eth1): making interface the new active one
>
> It looks like the wrong active link was set. It should be eth1 but was set to
> eth2, so the later link operation set the eth2 link down. Not sure why eth2 was
> set as the active interface. I need to print the log immediately if check_err
> fails.
Ah, the log did print the error message:
# TEST: prio (balance-alb arp_ip_target primary_reselect 1) [FAIL]
# Current active slave is eth2 but not eth1
From the log, I'm not sure why eth0 and eth1 went down, which made eth2 become
the active one.
[ 716.115869] bond0: (slave eth1): making interface the new active one
[ 716.116914] bond0: (slave eth1): Enslaving as an active interface with an up link
[ 716.117792] br0: port 2(s1) entered blocking state
[ 716.118022] br0: port 2(s1) entered forwarding state
[ 716.234644] bond0: (slave eth2): Enslaving as a backup interface with an up link
[ 716.235716] br0: port 4(s2) entered blocking state
[ 716.235926] br0: port 4(s2) entered forwarding state
[ 716.373537] bond0: (slave eth0): link status definitely down, disabling slave
[ 716.374651] bond0: (slave eth1): link status definitely down, disabling slave
[ 716.374920] bond0: (slave eth2): making interface the new active one
[ 716.484168] bond0: (slave eth0): link status definitely up
[ 716.484909] bond0: (slave eth1): link status definitely up
In the other tests that passed, you can see that eth0/eth1 were not set down,
so eth1 stayed the active one.
[ 498.558083] bond0: (slave eth1): making interface the new active one
[ 498.558973] bond0: (slave eth1): Enslaving as an active interface with an up link
[ 498.559724] br0: port 2(s1) entered blocking state
[ 498.559962] br0: port 2(s1) entered forwarding state
[ 498.632107] bond0: (slave eth2): Enslaving as a backup interface with an up link
[ 498.636366] br0: port 4(s2) entered blocking state
[ 498.636684] br0: port 4(s2) entered forwarding state
Thanks
Hangbin
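
The comparison of the two excerpts can also be done mechanically. The following is a minimal sketch, assuming the kernel log is available as text: it takes part of the failing excerpt quoted above, lists the slaves reported "definitely down", and reports the last slave made active. The variable names are illustrative only.

```shell
#!/bin/sh
# Part of the failing excerpt, copied from the message above.
log='[ 716.115869] bond0: (slave eth1): making interface the new active one
[ 716.234644] bond0: (slave eth2): Enslaving as a backup interface with an up link
[ 716.373537] bond0: (slave eth0): link status definitely down, disabling slave
[ 716.374651] bond0: (slave eth1): link status definitely down, disabling slave
[ 716.374920] bond0: (slave eth2): making interface the new active one
[ 716.484168] bond0: (slave eth0): link status definitely up
[ 716.484909] bond0: (slave eth1): link status definitely up'

# Slaves that the bonding driver reported as down, comma separated.
down=$(printf '%s\n' "$log" |
	sed -n 's/.*(slave \([^)]*\)): link status definitely down.*/\1/p' |
	paste -sd, -)

# The slave most recently made active.
active=$(printf '%s\n' "$log" |
	sed -n 's/.*(slave \([^)]*\)): making interface the new active one.*/\1/p' |
	tail -n 1)

echo "down: $down"
echo "last active: $active"
```

Run against the passing excerpt instead, the same filter would report no slaves going down, since that excerpt contains no "definitely down" lines.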