* [PATCH net-next v2] selftests: openvswitch: retry instead of sleep
@ 2024-07-10 9:04 Adrian Moreno
2024-07-11 9:41 ` Adrián Moreno
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Adrian Moreno @ 2024-07-10 9:04 UTC (permalink / raw)
To: netdev
Cc: Adrian Moreno, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shuah Khan, dev, linux-kselftest,
linux-kernel
There are a couple of places where the test script "sleep"s to wait for
some external condition to be met.
This is error prone, specially in slow systems (identified in CI by
"KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command
a few times until it succeeds. The timeout used is set to 5s for
"normal" systems and doubled if a slow CI machine is detected.
This should make the following work:
$ vng --build \
--config tools/testing/selftests/net/config \
--config kernel/configs/debug.config
$ vng --run . --user root -- "make -C tools/testing/selftests/ \
KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
.../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++----
.../selftests/net/openvswitch/ovs-dpctl.py | 1 +
2 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index bc71dbc18b21..cc0bfae2bafa 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -11,6 +11,11 @@ ksft_skip=4
PAUSE_ON_FAIL=no
VERBOSE=0
TRACING=0
+WAIT_TIMEOUT=5
+
+if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then
+ WAIT_TIMEOUT=10
+fi
tests="
arp_ping eth-arp: Basic arp ping between two NS
@@ -29,6 +34,30 @@ info() {
[ $VERBOSE = 0 ] || echo $*
}
+ovs_wait() {
+ info "waiting $WAIT_TIMEOUT s for: $@"
+
+ if "$@" ; then
+ info "wait succeeded immediately"
+ return 0
+ fi
+
+ # A quick re-check helps speed up small races in fast systems.
+ # However, fractional sleeps might not necessarily work.
+ local start=0
+ sleep 0.1 || { sleep 1; start=1; }
+
+ for (( i=start; i<WAIT_TIMEOUT; i++ )); do
+ if "$@" ; then
+ info "wait succeeded after $i seconds"
+ return 0
+ fi
+ sleep 1
+ done
+ info "wait failed after $i seconds"
+ return 1
+}
+
ovs_base=`pwd`
sbxs=
sbx_add () {
@@ -278,20 +307,19 @@ test_psample() {
# Record psample data.
ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events
+ ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
# Send a single ping.
- sleep 1
ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1
- sleep 1
# We should have received one userspace action upcall and 2 psample packets.
- grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1
+ ovs_wait grep -q "userspace action command" $ovs_dir/s0.out || return 1
# client -> server samples should only contain the first 14 bytes of the packet.
- grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
- $ovs_dir/stdout >/dev/null 2>&1 || return 1
- grep -E "rate:4294967295,group:2,cookie:eeff0c" \
- $ovs_dir/stdout >/dev/null 2>&1 || return 1
+ ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
+ $ovs_dir/stdout || return 1
+
+ ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout || return 1
return 0
}
@@ -711,7 +739,8 @@ test_upcall_interfaces() {
ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \
172.31.110.1/24 -u || return 1
- sleep 1
+ ovs_wait grep -q "listening on upcall packet handler" ${ovs_dir}/left0.out
+
info "sending arping"
ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
>$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1e15b0818074..8a0396bfaf99 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket):
marshal_class = psample_msg
def read_samples(self):
+ print("listening for psample events", flush=True)
while True:
try:
for msg in self.get():
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH net-next v2] selftests: openvswitch: retry instead of sleep
2024-07-10 9:04 [PATCH net-next v2] selftests: openvswitch: retry instead of sleep Adrian Moreno
@ 2024-07-11 9:41 ` Adrián Moreno
2024-07-11 15:44 ` Jakub Kicinski
2024-07-11 11:28 ` [ovs-dev] " Ilya Maximets
2024-07-12 1:20 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 5+ messages in thread
From: Adrián Moreno @ 2024-07-11 9:41 UTC (permalink / raw)
To: netdev
Cc: Pravin B Shelar, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Shuah Khan, dev, linux-kselftest, linux-kernel,
horms
On Wed, Jul 10, 2024 at 11:04:59AM GMT, Adrian Moreno wrote:
> There are a couple of places where the test script "sleep"s to wait for
> some external condition to be met.
>
> This is error prone, specially in slow systems (identified in CI by
> "KSFT_MACHINE_SLOW=yes").
>
> To fix this, add a "ovs_wait" function that tries to execute a command
> a few times until it succeeds. The timeout used is set to 5s for
> "normal" systems and doubled if a slow CI machine is detected.
>
> This should make the following work:
>
> $ vng --build \
> --config tools/testing/selftests/net/config \
> --config kernel/configs/debug.config
>
> $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
> .../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++----
> .../selftests/net/openvswitch/ovs-dpctl.py | 1 +
> 2 files changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> index bc71dbc18b21..cc0bfae2bafa 100755
> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> @@ -11,6 +11,11 @@ ksft_skip=4
> PAUSE_ON_FAIL=no
> VERBOSE=0
> TRACING=0
> +WAIT_TIMEOUT=5
> +
> +if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then
> + WAIT_TIMEOUT=10
> +fi
>
> tests="
> arp_ping eth-arp: Basic arp ping between two NS
> @@ -29,6 +34,30 @@ info() {
> [ $VERBOSE = 0 ] || echo $*
> }
>
> +ovs_wait() {
> + info "waiting $WAIT_TIMEOUT s for: $@"
> +
> + if "$@" ; then
> + info "wait succeeded immediately"
> + return 0
> + fi
> +
> + # A quick re-check helps speed up small races in fast systems.
> + # However, fractional sleeps might not necessarily work.
> + local start=0
> + sleep 0.1 || { sleep 1; start=1; }
> +
> + for (( i=start; i<WAIT_TIMEOUT; i++ )); do
> + if "$@" ; then
> + info "wait succeeded after $i seconds"
> + return 0
> + fi
> + sleep 1
> + done
> + info "wait failed after $i seconds"
> + return 1
> +}
> +
> ovs_base=`pwd`
> sbxs=
> sbx_add () {
> @@ -278,20 +307,19 @@ test_psample() {
>
> # Record psample data.
> ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events
> + ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
>
> # Send a single ping.
> - sleep 1
> ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1
> - sleep 1
>
> # We should have received one userspace action upcall and 2 psample packets.
> - grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1
> + ovs_wait grep -q "userspace action command" $ovs_dir/s0.out || return 1
>
> # client -> server samples should only contain the first 14 bytes of the packet.
> - grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
> - $ovs_dir/stdout >/dev/null 2>&1 || return 1
> - grep -E "rate:4294967295,group:2,cookie:eeff0c" \
> - $ovs_dir/stdout >/dev/null 2>&1 || return 1
> + ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
> + $ovs_dir/stdout || return 1
> +
> + ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout || return 1
>
> return 0
> }
> @@ -711,7 +739,8 @@ test_upcall_interfaces() {
> ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \
> 172.31.110.1/24 -u || return 1
>
> - sleep 1
> + ovs_wait grep -q "listening on upcall packet handler" ${ovs_dir}/left0.out
> +
> info "sending arping"
> ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
> >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 1e15b0818074..8a0396bfaf99 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket):
> marshal_class = psample_msg
>
> def read_samples(self):
> + print("listening for psample events", flush=True)
> while True:
> try:
> for msg in self.get():
> --
> 2.45.2
>
This patch is supposed to fix openvswitch selftests on "-dbg" machines.
However, as Simon points out, all recent rounds are failing [1]. I don't
see this patch being included in the batches and I was wondering why.
Also I see a (presumably unrelated) build error netdev/build_32bit.
Is there anything I can do?
[1]
https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh
Thanks.
Adrián
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH net-next v2] selftests: openvswitch: retry instead of sleep
2024-07-11 9:41 ` Adrián Moreno
@ 2024-07-11 15:44 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2024-07-11 15:44 UTC (permalink / raw)
To: Adrián Moreno
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Paolo Abeni, Shuah Khan, dev, linux-kselftest, linux-kernel,
horms
On Thu, 11 Jul 2024 09:41:08 +0000 Adrián Moreno wrote:
> This patch is supposed to fix openvswitch selftests on "-dbg" machines.
> However, as Simon points out, all recent rounds are failing [1]. I don't
> see this patch being included in the batches and I was wondering why.
>
> Also I see a (presumably unrelated) build error netdev/build_32bit.
> Is there anything I can do?
Hopefully fixed now, we'll see if it gets into the next run.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [ovs-dev] [PATCH net-next v2] selftests: openvswitch: retry instead of sleep
2024-07-10 9:04 [PATCH net-next v2] selftests: openvswitch: retry instead of sleep Adrian Moreno
2024-07-11 9:41 ` Adrián Moreno
@ 2024-07-11 11:28 ` Ilya Maximets
2024-07-12 1:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 5+ messages in thread
From: Ilya Maximets @ 2024-07-11 11:28 UTC (permalink / raw)
To: Adrian Moreno, netdev
Cc: dev, linux-kernel, Eric Dumazet, linux-kselftest, Jakub Kicinski,
Paolo Abeni, Shuah Khan, David S. Miller, i.maximets
On 7/10/24 11:04, Adrian Moreno wrote:
> There are a couple of places where the test script "sleep"s to wait for
> some external condition to be met.
>
> This is error prone, specially in slow systems (identified in CI by
> "KSFT_MACHINE_SLOW=yes").
>
> To fix this, add a "ovs_wait" function that tries to execute a command
> a few times until it succeeds. The timeout used is set to 5s for
> "normal" systems and doubled if a slow CI machine is detected.
>
> This should make the following work:
>
> $ vng --build \
> --config tools/testing/selftests/net/config \
> --config kernel/configs/debug.config
>
> $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
> .../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++----
> .../selftests/net/openvswitch/ovs-dpctl.py | 1 +
> 2 files changed, 38 insertions(+), 8 deletions(-)
Seem like we don't have a signal from CI for some reason yet,
but I tested this locally and it seem to work fine. Either
way it's a better way of doing things than sleep'n'hope.
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2] selftests: openvswitch: retry instead of sleep
2024-07-10 9:04 [PATCH net-next v2] selftests: openvswitch: retry instead of sleep Adrian Moreno
2024-07-11 9:41 ` Adrián Moreno
2024-07-11 11:28 ` [ovs-dev] " Ilya Maximets
@ 2024-07-12 1:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-07-12 1:20 UTC (permalink / raw)
To: =?utf-8?q?Adri=C3=A1n_Moreno_=3Camorenoz=40redhat=2Ecom=3E?=
Cc: netdev, pshelar, davem, edumazet, kuba, pabeni, shuah, dev,
linux-kselftest, linux-kernel
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 10 Jul 2024 11:04:59 +0200 you wrote:
> There are a couple of places where the test script "sleep"s to wait for
> some external condition to be met.
>
> This is error prone, specially in slow systems (identified in CI by
> "KSFT_MACHINE_SLOW=yes").
>
> To fix this, add a "ovs_wait" function that tries to execute a command
> a few times until it succeeds. The timeout used is set to 5s for
> "normal" systems and doubled if a slow CI machine is detected.
>
> [...]
Here is the summary with links:
- [net-next,v2] selftests: openvswitch: retry instead of sleep
https://git.kernel.org/netdev/net-next/c/5e724cb688a2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-12 1:20 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-10 9:04 [PATCH net-next v2] selftests: openvswitch: retry instead of sleep Adrian Moreno
2024-07-11 9:41 ` Adrián Moreno
2024-07-11 15:44 ` Jakub Kicinski
2024-07-11 11:28 ` [ovs-dev] " Ilya Maximets
2024-07-12 1:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).