* [PATCH net-next 1/5] selftests: mlxsw: sch_red_ets: Increase required backlog
From: Petr Machata @ 2024-10-07 16:26 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Amit Cohen, Ido Schimmel, linux-kselftest, Shuah Khan,
Petr Machata, mlxsw
Backlog fluctuates on Spectrum-4 much more than on earlier Spectrum machines.
Increasing the desired backlog seems to help, as the constant fluctuations
then no longer reach into the territory where packets are marked.
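For reference, the doubled backlogs are reflected in the new burst values
below, consistent with the rule of thumb from tc-red(8),
burst = (2*min + max) / (3*avpkt). A sketch of that computation (illustration
only, the helper is not part of the selftest):

	# Hypothetical helper, not part of the patch: compute RED's "burst"
	# from min/max/avpkt per the tc-red(8) rule of thumb,
	# burst = (2*min + max) / (3*avpkt), rounded up.
	red_burst()
	{
		local min=$1 max=$2 avpkt=$3

		echo $(( (2 * min + max + 3 * avpkt - 1) / (3 * avpkt) ))
	}

	red_burst  400000  400001 8000	# => 51, new burst for TC 0
	red_burst 1000000 1000001 8000	# => 126, new burst for TC 1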
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
index 8ecddafa79b3..576067b207a8 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
@@ -20,8 +20,8 @@ source sch_red_core.sh
# $BACKLOG2 are far enough not to overlap, so that we can assume that if we do
# see (do not see) marking, it is actually due to the configuration of that one
# TC, and not due to configuration of the other TC leaking over.
-BACKLOG1=200000
-BACKLOG2=500000
+BACKLOG1=400000
+BACKLOG2=1000000
install_root_qdisc()
{
@@ -35,7 +35,7 @@ install_qdisc_tc0()
tc qdisc add dev $swp3 parent 10:8 handle 108: red \
limit 1000000 min $BACKLOG1 max $((BACKLOG1 + 1)) \
- probability 1.0 avpkt 8000 burst 38 "${args[@]}"
+ probability 1.0 avpkt 8000 burst 51 "${args[@]}"
}
install_qdisc_tc1()
@@ -44,7 +44,7 @@ install_qdisc_tc1()
tc qdisc add dev $swp3 parent 10:7 handle 107: red \
limit 1000000 min $BACKLOG2 max $((BACKLOG2 + 1)) \
- probability 1.0 avpkt 8000 burst 63 "${args[@]}"
+ probability 1.0 avpkt 8000 burst 126 "${args[@]}"
}
install_qdisc()
--
2.45.0
* [PATCH net-next 2/5] selftests: mlxsw: sch_red_core: Increase backlog size tolerance
From: Petr Machata @ 2024-10-07 16:26 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Amit Cohen, Ido Schimmel, linux-kselftest, Shuah Khan,
Petr Machata, mlxsw
Backlog fluctuates on Spectrum-4 much more than on earlier Spectrum machines.
In practice we can sample queue depth values ranging from about -12% to about
+7% of the configured RED limit. The test that checks the queue size allows a
tolerance of +-10%, and as a result it often fails. We attempted to fix the
issue by busywaiting for several seconds in the hope of landing within the
bounds, but that still proved too noisy (or the wait time would have been
impractically long). Unfortunately we have to bump the value tolerance from
10% to 15%, which is what this patch does.
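Concretely, the check computes the relative distance between the measured
backlog and the configured limit. A worked example with a sample from the
observed range (the values are assumed for illustration):

	limit=1000000
	backlog=880000			# sampled ~12% below the limit
	diff=$((limit - backlog))
	pct=$((100 * diff / limit))	# => 12
	((-15 <= pct && pct <= 15))	# passes now; failed the old 10% bound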
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index 299e06a5808c..a25a15eb6d31 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -532,10 +532,11 @@ do_red_test()
check_fail $? "Traffic went into backlog instead of being early-dropped"
pct=$(check_marking get_nmarked $vlan "== 0")
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
+ backlog=$(get_qdisc_backlog $vlan)
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((-10 <= pct && pct <= 10))
- check_err $? "backlog $backlog / $limit expected <= 10% distance"
+ ((-15 <= pct && pct <= 15))
+ check_err $? "backlog $backlog / $limit expected <= 15% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.45.0
* [PATCH net-next 3/5] selftests: mlxsw: sch_red_core: Sleep before querying queue depth
From: Petr Machata @ 2024-10-07 16:26 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Amit Cohen, Ido Schimmel, linux-kselftest, Shuah Khan,
Petr Machata, mlxsw
The qdisc stats are taken from the port's periodic HW stats, which are
updated once a second. We try to accommodate the latency by using busywait
in build_backlog().
The issue seems to be that when do_mark_test() builds the backlog, it decides
whether to send more packets based on the first instance of the queue depth
stat exceeding its current value, when in fact more traffic is on the way and
the queue depth will increase further. This leads to failures in TC 1 of the
mark-mirror test, where we see the following:
TEST: TC 0: marked packets mirror'd [ OK ]
TEST: TC 1: marked packets mirror'd [FAIL]
Spurious packets (1680 -> 2290) observed without buffer pressure
Fix this by waiting a full second before reading the queue depth for the
first time, to make sure it reflects all in-flight traffic.
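A sketch of the failure mode (the timings are illustrative only):

	t=0.0s  burst sent                 backlog stat still stale
	t=1.0s  periodic HW stats update   busywait sees "> $cur", returns
	t=1.2s  in-flight traffic lands    stat rises further, but is unread

With the added sleep, the loop becomes (fragment of build_backlog()):

	while :; do
		sleep 1
		local cur=$(busywait 1100 until_counter_is "> $cur" \
			get_qdisc_backlog $vlan)
		...
	done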
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index a25a15eb6d31..b1ff395b3880 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -372,6 +372,7 @@ build_backlog()
local i=0
while :; do
+ sleep 1
local cur=$(busywait 1100 until_counter_is "> $cur" \
get_qdisc_backlog $vlan)
local diff=$((size - cur))
--
2.45.0
* [PATCH net-next 4/5] selftests: mlxsw: sch_red_core: Send more packets for drop tests
From: Petr Machata @ 2024-10-07 16:26 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Amit Cohen, Ido Schimmel, linux-kselftest, Shuah Khan,
Petr Machata, mlxsw
This test works by injecting a couple of packets into a port with a
maxed-out queue and checking that a corresponding number of packets is
dropped. This has worked well on Spectrum<4, but on Spectrum-4 it has been
noisy. That is in line with the observation that on Spectrum-4, queue size
tends to fluctuate more: a handful of packets can still be accepted into the
queue even though it was nominally full just recently.
In order to accommodate this behavior, send many more packets. The buffer can
fit a certain absolute number of extra packets, not a certain percentage of
the traffic sent, so this allows us to set wider absolute margins while
actually narrowing them in relative terms.
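In numbers, derived from the thresholds in the hunk below (the
spurious-packet headroom is approximate, as it depends on how many of the
injected packets were actually counted):

	          sent  missing tolerated  spurious tolerated
	before      11       1  (~9%)          ~9  (~82%)
	after      100       5   (5%)         ~10  (~10%)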
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
.../selftests/drivers/net/mlxsw/sch_red_core.sh | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b1ff395b3880..316444389c4e 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -653,20 +653,22 @@ do_drop_test()
build_backlog $vlan $((3 * limit / 2)) udp >/dev/null
base=$($fetch_counter)
- send_packets $vlan udp 11
+ send_packets $vlan udp 100
- now=$(busywait 1100 until_counter_is ">= $((base + 10))" $fetch_counter)
- check_err $? "Dropped packets not observed: 11 expected, $((now - base)) seen"
+ now=$(busywait 1100 until_counter_is ">= $((base + 95))" $fetch_counter)
+ check_err $? "${trigger}ped packets not observed: 100 expected, $((now - base)) seen"
# When no extra traffic is injected, there should be no mirroring.
- busywait 1100 until_counter_is ">= $((base + 20))" $fetch_counter >/dev/null
+ busywait 1100 until_counter_is ">= $((base + 110))" \
+ $fetch_counter >/dev/null
check_fail $? "Spurious packets observed"
# When the rule is uninstalled, there should be no mirroring.
qevent_rule_uninstall_$subtest
- send_packets $vlan udp 11
- busywait 1100 until_counter_is ">= $((base + 20))" $fetch_counter >/dev/null
- check_fail $? "Spurious packets observed after uninstall"
+ send_packets $vlan udp 100
+ now=$(busywait 1100 until_counter_is ">= $((base + 110))" \
+ $fetch_counter)
+ check_fail $? "$((now - base)) spurious packets observed after uninstall"
log_test "TC $((vlan - 10)): ${trigger}ped packets $subtest'd"
--
2.45.0
* [PATCH net-next 5/5] selftests: mlxsw: sch_red_core: Lower TBF rate
From: Petr Machata @ 2024-10-07 16:26 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Amit Cohen, Ido Schimmel, linux-kselftest, Shuah Khan,
Petr Machata, mlxsw
The RED test uses a pair of TBF shapers: the first to get a
predictably-sized stream of traffic, the second to get a 100% saturated
chokepoint. Into this chokepoint the test injects individual packets. Because
the chokepoint is saturated, these additional packets go straight to the
backlog. This allows the test to check RED behavior across various queue
sizes.
The shapers are rated at 1Gbps, for historical reasons (before mlxsw
supported TBF offload, the test used port speed to create the chokepoints).
Machines with a low-power CPU may have trouble consistently generating
1Gbps of traffic, and the test then spuriously fails.
Instead, drop the rate to 200Mbps (Spectrum has a guaranteed shaper rate
granularity of 200Mbps, so anything lower is not guaranteed to work well).
Because that means fewer packets will be mirrored in the ECN-mark test,
adjust the passing condition accordingly.
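The new +2500 condition can be sanity-checked with back-of-the-envelope
arithmetic (the ~1500-byte packet size is an assumption for illustration,
not taken from the patch):

	200 Mbit/s / 8 / 1500 B	~= 16600 packets/s through the chokepoint
	16600/s * 1.1 s		~= 18300 packets per busywait window

so requiring +2500 mirrored packets (down from +10000, a 1:4 cut for the
1:5 rate cut) stays comfortably below what the slower chokepoint can
deliver.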
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index 316444389c4e..f4c324957dcc 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -137,7 +137,7 @@ h2_create()
# Prevent this by adding a shaper which limits the traffic in $h2 to
# 1Gbps.
- tc qdisc replace dev $h2 root handle 10: tbf rate 1gbit \
+ tc qdisc replace dev $h2 root handle 10: tbf rate 200mbit \
burst 128K limit 1G
}
@@ -199,7 +199,7 @@ switch_create()
done
for intf in $swp3 $swp4; do
- tc qdisc replace dev $intf root handle 1: tbf rate 1gbit \
+ tc qdisc replace dev $intf root handle 1: tbf rate 200mbit \
burst 128K limit 1G
done
@@ -602,7 +602,7 @@ do_mark_test()
# Above limit, everything should be mirrored, we should see lots of
# packets.
build_backlog $vlan $((3 * limit / 2)) tcp tos=0x01 >/dev/null
- busywait_for_counter 1100 +10000 \
+ busywait_for_counter 1100 +2500 \
$fetch_counter > /dev/null
check_err_fail "$should_fail" $? "ECN-marked packets $subtest'd"
--
2.45.0
* Re: [PATCH net-next 0/5] selftests: mlxsw: Stabilize RED tests
From: patchwork-bot+netdevbpf @ 2024-10-08 23:20 UTC
To: Petr Machata
Cc: davem, edumazet, kuba, pabeni, netdev, amcohen, idosch,
linux-kselftest, shuah, mlxsw
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 7 Oct 2024 18:26:04 +0200 you wrote:
> Tweak the mlxsw-specific RED selftests to increase stability on
> Spectrum-3 and Spectrum-4 machines.
>
> Petr Machata (5):
> selftests: mlxsw: sch_red_ets: Increase required backlog
> selftests: mlxsw: sch_red_core: Increase backlog size tolerance
> selftests: mlxsw: sch_red_core: Sleep before querying queue depth
> selftests: mlxsw: sch_red_core: Send more packets for drop tests
> selftests: mlxsw: sch_red_core: Lower TBF rate
>
> [...]
Here is the summary with links:
- [net-next,1/5] selftests: mlxsw: sch_red_ets: Increase required backlog
https://git.kernel.org/netdev/net-next/c/870dd51117cb
- [net-next,2/5] selftests: mlxsw: sch_red_core: Increase backlog size tolerance
https://git.kernel.org/netdev/net-next/c/8fb5b6073456
- [net-next,3/5] selftests: mlxsw: sch_red_core: Sleep before querying queue depth
https://git.kernel.org/netdev/net-next/c/787f148cec34
- [net-next,4/5] selftests: mlxsw: sch_red_core: Send more packets for drop tests
https://git.kernel.org/netdev/net-next/c/7049166e51bc
- [net-next,5/5] selftests: mlxsw: sch_red_core: Lower TBF rate
https://git.kernel.org/netdev/net-next/c/501fa2426b5f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html