From: Petr Machata <petrm@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Petr Machata <petrm@nvidia.com>,
Nikolay Aleksandrov <razor@blackwall.org>,
Hangbin Liu <liuhangbin@gmail.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [TEST] forwarding/router_bridge_lag.sh started to flake on Monday
Date: Fri, 23 Aug 2024 18:13:01 +0200
Message-ID: <87ttfbi5ce.fsf@nvidia.com>
In-Reply-To: <20240823080253.1c11c028@kernel.org>

Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 23 Aug 2024 13:28:11 +0200 Petr Machata wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > Looks like forwarding/router_bridge_lag.sh has gotten a lot more flaky
>> > this week. It flaked very occasionally (and in a different way) before:
>> >
>> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-forwarding&test=router-bridge-lag-sh&ld_cnt=250
>> >
>> > There doesn't seem to be any obvious commit that could have caused this.
>>
>> Hmm:
>> # 3.37 [+0.11] Error: Device is up. Set it down before adding it as a team port.
>>
>> How are the tests isolated, are they each run in their own vng, or are
>> instances shared? Could it be that the test that runs before this one
>> neglects to take a port down?
>
> Yes, each one has its own VM, but the VM is reused for multiple tests
> serially. The "info" file shows which VM was used (thr-id identifies
> the worker, vm-id identifies VM within the worker, worker will restart
> the VM if it detects a crash).

OK, so my guess would be that whatever ran before the test forgot to put
the port down.

>> In one failure case (I don't see further back or my browser would
>> apparently catch fire) the predecessor was no_forwarding.sh, and indeed
>> it looks like it raises the ports, but I don't see where it sets them
>> back down.
>>
>> Then router-bridge-lag's cleanup downs the ports, and on rerun it
>> succeeds. The issue would be probabilistic, because no_forwarding does
>> not always run before this test, and some tests do not care that the
>> ports are up. If that's the root cause, this should fix it:
>>
>> From 0baf91dc24b95ae0cadfdf5db05b74888e6a228a Mon Sep 17 00:00:00 2001
>> Message-ID: <0baf91dc24b95ae0cadfdf5db05b74888e6a228a.1724413545.git.petrm@nvidia.com>
>> From: Petr Machata <petrm@nvidia.com>
>> Date: Fri, 23 Aug 2024 14:42:48 +0300
>> Subject: [PATCH net-next mlxsw] selftests: forwarding: no_forwarding: Down
>> ports on cleanup
>> To: <nbu-linux-internal@nvidia.com>
>>
>> This test neglects to put ports down on cleanup. Fix it.
>>
>> Fixes: 476a4f05d9b8 ("selftests: forwarding: add a no_forwarding.sh test")
>> Signed-off-by: Petr Machata <petrm@nvidia.com>
>> ---
>> tools/testing/selftests/net/forwarding/no_forwarding.sh | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/tools/testing/selftests/net/forwarding/no_forwarding.sh b/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> index af3b398d13f0..9e677aa64a06 100755
>> --- a/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> +++ b/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> @@ -233,6 +233,9 @@ cleanup()
>>  {
>>  	pre_cleanup
>>
>> +	ip link set dev $swp2 down
>> +	ip link set dev $swp1 down
>> +
>>  	h2_destroy
>>  	h1_destroy
>>
>
> no_forwarding always runs in thread 0 because it's the slowest test
> and we try to run from the slowest as a basic bin packing heuristic.
> Clicking thru the failures I don't see them on thread 0.

Is there a way to see what ran before?

> But putting the ports down seems like a good cleanup regardless.

I'll send it as a proper patch.
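
To check the guess above that an earlier test left ports up, one option
would be to list the interfaces that are still administratively up just
before a test starts. A minimal sketch, assuming it is fed the output of
"ip -o link show" on stdin; ports_left_up is a hypothetical helper name,
not something that exists in the selftests' lib.sh:

```shell
#!/bin/bash

# Hypothetical helper (an assumption, not part of the forwarding selftests):
# read `ip -o link show` output on stdin and print the name of every
# interface whose flag list contains the administrative UP flag.
ports_left_up()
{
	# Fields split on ": " are: index, name[@peer], "<FLAGS> ...".
	# Match UP only as a whole flag, so LOWER_UP alone does not count.
	awk -F': ' '
		$3 ~ /^<([^>]*,)?UP(,[^>]*)?>/ {
			sub(/@.*/, "", $2)	# drop the "@peer" suffix of veth names
			print $2
		}'
}
```

Running "ip -o link show | ports_left_up" at the start of a test and
logging the result would show whether $swp1 or $swp2 were inherited in
the up state, which in turn hints at whether the predecessor skipped its
cleanup.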