netdev.vger.kernel.org archive mirror
From: Petr Machata <petrm@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Petr Machata <petrm@nvidia.com>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Hangbin Liu <liuhangbin@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [TEST] forwarding/router_bridge_lag.sh started to flake on Monday
Date: Fri, 23 Aug 2024 18:13:01 +0200
Message-ID: <87ttfbi5ce.fsf@nvidia.com>
In-Reply-To: <20240823080253.1c11c028@kernel.org>


Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 23 Aug 2024 13:28:11 +0200 Petr Machata wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>> 
>> > Looks like forwarding/router_bridge_lag.sh has gotten a lot more flaky
>> > this week. It flaked very occasionally (and in a different way) before:
>> >
>> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-forwarding&test=router-bridge-lag-sh&ld_cnt=250
>> >
>> > There doesn't seem to be any obvious commit that could have caused this.  
>> 
>> Hmm:
>>     # 3.37 [+0.11] Error: Device is up. Set it down before adding it as a team port.
>> 
>> How are the tests isolated, are they each run in their own vng, or are
>> instances shared? Could it be that the test that runs before this one
>> neglects to take a port down?
>
> Yes, each one has its own VM, but the VM is reused for multiple tests
> serially. The "info" file shows which VM was used (thr-id identifies
> the worker, vm-id identifies VM within the worker, worker will restart
> the VM if it detects a crash).

OK, so my guess would be that whatever ran before the test forgot to put
the port down.
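
A minimal sketch of how that guess could be checked by hand, assuming the
usual iproute2 one-line output format where the interface flags appear
between angle brackets. The helper name link_flags_have_up is hypothetical
and is not part of the selftest library:

```shell
# Hypothetical helper, not part of the forwarding selftests: succeed if the
# flags field of one line of `ip -o link show` output contains the
# administrative UP flag.
link_flags_have_up()
{
	local line=$1
	local flags

	# Keep only the text between the first '<' and the following '>',
	# e.g. "BROADCAST,MULTICAST,UP,LOWER_UP".
	flags=${line#*<}
	flags=${flags%%>*}

	# Match UP as a whole flag so that LOWER_UP alone does not count.
	case ",$flags," in
	*,UP,*) return 0 ;;
	*) return 1 ;;
	esac
}

# Example use before enslaving a port (assumes $swp1 names the interface):
#   if link_flags_have_up "$(ip -o link show dev "$swp1")"; then
#           echo "$swp1 was left up by an earlier test" >&2
#   fi
```

If the check fires at the start of router_bridge_lag.sh, that would point at
a predecessor that left the port up.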

>> In one failure case (I don't see further back or my browser would
>> apparently catch fire) the predecessor was no_forwarding.sh, and indeed
>> it looks like it raises the ports, but I don't see where it sets them
>> back down.
>> 
>> Then router-bridge-lag's cleanup downs the ports, and on rerun it
>> succeeds. The issue would be probabilistic, because no_forwarding does
>> not always run before this test, and some tests do not care that the
>> ports are up. If that's the root cause, this should fix it:
>> 
>> From 0baf91dc24b95ae0cadfdf5db05b74888e6a228a Mon Sep 17 00:00:00 2001
>> Message-ID: <0baf91dc24b95ae0cadfdf5db05b74888e6a228a.1724413545.git.petrm@nvidia.com>
>> From: Petr Machata <petrm@nvidia.com>
>> Date: Fri, 23 Aug 2024 14:42:48 +0300
>> Subject: [PATCH net-next mlxsw] selftests: forwarding: no_forwarding: Down
>>  ports on cleanup
>> To: <nbu-linux-internal@nvidia.com>
>> 
>> This test neglects to put ports down on cleanup. Fix it.
>> 
>> Fixes: 476a4f05d9b8 ("selftests: forwarding: add a no_forwarding.sh test")
>> Signed-off-by: Petr Machata <petrm@nvidia.com>
>> ---
>>  tools/testing/selftests/net/forwarding/no_forwarding.sh | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/tools/testing/selftests/net/forwarding/no_forwarding.sh b/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> index af3b398d13f0..9e677aa64a06 100755
>> --- a/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> +++ b/tools/testing/selftests/net/forwarding/no_forwarding.sh
>> @@ -233,6 +233,9 @@ cleanup()
>>  {
>>  	pre_cleanup
>>  
>> +	ip link set dev $swp2 down
>> +	ip link set dev $swp1 down
>> +
>>  	h2_destroy
>>  	h1_destroy
>>  
>
> no_forwarding always runs in thread 0 because it's the slowest test
> and we try to run from the slowest as a basic bin packing heuristic.
> Clicking thru the failures I don't see them on thread 0.

Is there a way to see what ran before?

> But putting the ports down seems like a good cleanup regardless.

I'll send it as a proper patch.

Thread overview: 6+ messages
2024-08-22 15:37 [TEST] forwarding/router_bridge_lag.sh started to flake on Monday Jakub Kicinski
2024-08-23 11:28 ` Petr Machata
2024-08-23 15:02   ` Jakub Kicinski
2024-08-23 16:13     ` Petr Machata [this message]
2024-08-24 21:27       ` Jakub Kicinski
2024-08-25  9:01         ` Petr Machata
