netdev.vger.kernel.org archive mirror
* [TEST] vxlan bridge test flakiness
@ 2025-12-03 17:50 Jakub Kicinski
  2025-12-04 17:46 ` Petr Machata
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-12-03 17:50 UTC (permalink / raw)
  To: Petr Machata, netdev

Hi!

We're seeing a few more flakes on vxlan-bridge-1q-mc-ul-sh and
vxlan-bridge-1q-mc-ul-sh in the new setup than we used to (tho
the former was always relatively flaky).

# 141.78 [+13.13] TEST: VXLAN MC flood IPv6 mcroute changelink                        [FAIL]
# 141.78 [+0.00] Expected 10 packets on H2, got 11
# 141.83 [+0.04] smcroutectl: mroute: deleting route from lo10 (192.0.2.33/32,233.252.0.1/32)

https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-forwarding&test=vxlan-bridge-1q-mc-ul-sh&pw-n=0

Perhaps we should make the filter more specific to the test traffic?

LMK if you need access to the system.


* Re: [TEST] vxlan bridge test flakiness
  2025-12-03 17:50 [TEST] vxlan bridge test flakiness Jakub Kicinski
@ 2025-12-04 17:46 ` Petr Machata
  2025-12-04 18:43   ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Machata @ 2025-12-04 17:46 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Petr Machata, netdev


Jakub Kicinski <kuba@kernel.org> writes:

> Hi!
>
> We're seeing a few more flakes on vxlan-bridge-1q-mc-ul-sh and
> vxlan-bridge-1q-mc-ul-sh in the new setup than we used to (tho
> the former was always relatively flaky).

You listed the same test twice, so that's the one that I'm looking into now.

> # 141.78 [+13.13] TEST: VXLAN MC flood IPv6 mcroute changelink                        [FAIL]
> # 141.78 [+0.00] Expected 10 packets on H2, got 11

I can probably reproduce the same by removing the vx_wait() sleep.

> # 141.83 [+0.04] smcroutectl: mroute: deleting route from lo10 (192.0.2.33/32,233.252.0.1/32)
>
> https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-forwarding&test=vxlan-bridge-1q-mc-ul-sh&pw-n=0
>
> Perhaps we should make the filter more specific to the test traffic?

Yeah, we match only on the outer VXLAN packets, not the payload,
because matching on the encapsulated packet is a bit annoying. The
count then gets thrown off by stray ndisc traffic in the overlay. We
can match on the inner packet using u32, but that needs tweaks in a
couple of places. I should be able to send a fix tomorrow.
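
For reference, a minimal sketch of the offset arithmetic that matching on
the inner packet with u32 involves. u32 offsets count bytes from the start
of the outer IP header. The sketch assumes an outer IPv4 header without
options, an outer IPv6 header without extension headers, and an untagged
inner Ethernet frame. The helper name vx_offsets and the variable names are
made up for illustration and are not part of the selftests.

```shell
#!/bin/sh
# Header sizes in bytes (assumptions: no IPv4 options, no IPv6 extension
# headers, untagged inner Ethernet frame).
IPV4_HLEN=20
IPV6_HLEN=40
UDP_HLEN=8
VXLAN_HLEN=8
ETH_HLEN=14

# vx_offsets OUTER_IP_HLEN
# Print the offsets, relative to the start of the outer IP header, of the
# UDP destination port, the inner EtherType and the inner L3 header.
vx_offsets() {
	outer=$1
	# The UDP destination port sits 2 bytes into the UDP header.
	udp_dport=$((outer + 2))
	# The inner Ethernet frame follows the UDP and VXLAN headers.
	inner_eth=$((outer + $UDP_HLEN + $VXLAN_HLEN))
	# The EtherType follows the two 6-byte MAC addresses.
	inner_etype=$((inner_eth + 12))
	inner_l3=$((inner_eth + $ETH_HLEN))
	printf 'udp_dport=0x%x inner_ethertype=0x%x inner_l3=0x%x\n' \
		"$udp_dport" "$inner_etype" "$inner_l3"
}

vx_offsets "$IPV4_HLEN"   # prints udp_dport=0x16 inner_ethertype=0x30 inner_l3=0x32
vx_offsets "$IPV6_HLEN"   # prints udp_dport=0x2a inner_ethertype=0x44 inner_l3=0x46
```

With an inner IPv6 header at 0x46, its Next Header field (6 bytes in) lands
at 0x4c, so a u8 match for 0x11 (UDP) at that offset would skip ICMPv6
traffic such as ndisc.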

> LMK if you need access to the system.


* Re: [TEST] vxlan bridge test flakiness
  2025-12-04 17:46 ` Petr Machata
@ 2025-12-04 18:43   ` Jakub Kicinski
  2025-12-05 16:16     ` Petr Machata
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-12-04 18:43 UTC (permalink / raw)
  To: Petr Machata; +Cc: netdev

On Thu, 4 Dec 2025 18:46:30 +0100 Petr Machata wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > We're seeing a few more flakes on vxlan-bridge-1q-mc-ul-sh and
> > vxlan-bridge-1q-mc-ul-sh in the new setup than we used to (tho
> > the former was always relatively flaky).  
> 
> You listed the same test twice, so that's the one that I'm looking into now.

Ah, I thought one of them was 1d, but indeed the CI was just reporting
it twice because different machines ran the test. It's just one
test case that's flaking on two setups.


* Re: [TEST] vxlan bridge test flakiness
  2025-12-04 18:43   ` Jakub Kicinski
@ 2025-12-05 16:16     ` Petr Machata
  2025-12-06  0:26       ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Machata @ 2025-12-05 16:16 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Petr Machata, netdev


Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 4 Dec 2025 18:46:30 +0100 Petr Machata wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>> > We're seeing a few more flakes on vxlan-bridge-1q-mc-ul-sh and
>> > vxlan-bridge-1q-mc-ul-sh in the new setup than we used to (tho
>> > the former was always relatively flaky).  
>> 
>> You listed the same test twice, so that's the one that I'm looking into now.
>
> Ah, I thought one of them was 1d, but indeed the CI was just reporting
> it twice because different machines ran the test. It's just one
> test case that's flaking on two setups.

OK, cool.

I think the following patch would fix the issue. But I think it should
be thematically split into two parts, the lib.sh fix needs its own
explanation. Then there is a third patch to get rid of the
now-unnecessary vx_wait() helper.

I think it makes sense to send it all as next material after you open it
in January. But if the issue is super annoying, I can send the two-part
fix now for net, and the cleanup in January for next.

Let me know what you prefer.

diff --git a/tools/testing/selftests/net/forwarding/config b/tools/testing/selftests/net/forwarding/config
index ce64518aaa11..93a61c217cc3 100644
--- a/tools/testing/selftests/net/forwarding/config
+++ b/tools/testing/selftests/net/forwarding/config
@@ -28,6 +28,7 @@ CONFIG_NET_ACT_TUNNEL_KEY=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_CLS_BASIC=m
 CONFIG_NET_CLS_FLOWER=m
+CONFIG_NET_CLS_U32=m
 CONFIG_NET_CLS_MATCHALL=m
 CONFIG_NET_EMATCH=y
 CONFIG_NET_EMATCH_META=m
diff --git a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh
index 6a570d256e07..5ce19ca08846 100755
--- a/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh
+++ b/tools/testing/selftests/net/forwarding/vxlan_bridge_1q_mc_ul.sh
@@ -138,13 +138,18 @@ install_capture()
 	defer tc qdisc del dev "$dev" clsact
 
 	tc filter add dev "$dev" ingress proto ip pref 104 \
-	   flower skip_hw ip_proto udp dst_port "$VXPORT" \
-	   action pass
+	   u32 match ip protocol 0x11 0xff \
+	       match u16 "$VXPORT" 0xffff at 0x16 \
+	       match u16 0x0800 0xffff at 0x30 \
+	       action pass
 	defer tc filter del dev "$dev" ingress proto ip pref 104
 
 	tc filter add dev "$dev" ingress proto ipv6 pref 106 \
-	   flower skip_hw ip_proto udp dst_port "$VXPORT" \
-	   action pass
+	   u32 match ip6 protocol 0x11 0xff \
+	       match u16 "$VXPORT" 0xffff at 0x2a \
+	       match u16 0x86dd 0xffff at 0x44 \
+	       match u8 0x11 0xff at 0x4c \
+	       action pass
 	defer tc filter del dev "$dev" ingress proto ipv6 pref 106
 }
 
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index a48f29b5f3b2..b7179b01b546 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -280,7 +280,8 @@ tc_rule_stats_get()
 	local selector=${1:-.packets}; shift
 
 	tc -j -s filter show dev $dev $dir pref $pref \
-	    | jq ".[1].options.actions[].stats$selector"
+	    | jq ".[] | select(.options.actions) |
+		  .options.actions[].stats$selector"
 }
 
 tc_rule_handle_stats_get()
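
A possible reading of the lib.sh hunk above, offered as an assumption since
the thread does not spell it out: for a u32 filter, the JSON listing from tc
can contain an extra hash-table entry ahead of the rule that carries the
actions, so a fixed index of 1 no longer points at the rule. The sample JSON
below is a hypothetical, heavily simplified stand-in for real
"tc -j -s filter show" output, and the function names are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical, simplified listing for one u32 rule: a header entry, a
# hash-table entry without actions, then the rule itself.
sample_u32_json() {
	cat <<'EOF'
[
  {"protocol": "ip", "pref": 104, "kind": "u32", "chain": 0},
  {"protocol": "ip", "pref": 104, "kind": "u32", "chain": 0,
   "options": {"fh": "800:", "ht_divisor": 1}},
  {"protocol": "ip", "pref": 104, "kind": "u32", "chain": 0,
   "options": {"fh": "800::800",
               "actions": [{"kind": "gact", "stats": {"packets": 10}}]}}
]
EOF
}

# Selector with a fixed index: element 1 has no actions here, so jq
# fails when it tries to iterate over null.
stats_by_index() {
	sample_u32_json | jq ".[1].options.actions[].stats$1"
}

# Selector from the patch: pick whichever entries carry actions.
stats_by_select() {
	sample_u32_json |
		jq ".[] | select(.options.actions) | .options.actions[].stats$1"
}

# Only run the demonstration when jq is installed.
if command -v jq >/dev/null 2>&1; then
	stats_by_select .packets
fi
```

On this sample, the select-based form yields the packet counter regardless of
how many entries without actions precede the rule.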


* Re: [TEST] vxlan bridge test flakiness
  2025-12-05 16:16     ` Petr Machata
@ 2025-12-06  0:26       ` Jakub Kicinski
  2025-12-08 10:27         ` Petr Machata
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-12-06  0:26 UTC (permalink / raw)
  To: Petr Machata; +Cc: netdev

On Fri, 5 Dec 2025 17:16:56 +0100 Petr Machata wrote:
> OK, cool.
> 
> I think the following patch would fix the issue. But I think it should
> be thematically split into two parts, the lib.sh fix needs its own
> explanation. Then there is a third patch to get rid of the
> now-unnecessary vx_wait() helper.
> 
> I think it makes sense to send it all as next material after you open it
> in January. But if the issue is super annoying, I can send the two-part
> fix now for net, and the cleanup in January for next.
> 
> Let me know what you prefer.

I think both the fix and the cleanup would be acceptable at this stage
of the merge window. But no strong preference, I queued up the diff you
shared as a local NIPA patch so we can see how it fares over the
weekend. And it will get auto-ejected when you post the real thing.


* Re: [TEST] vxlan bridge test flakiness
  2025-12-06  0:26       ` Jakub Kicinski
@ 2025-12-08 10:27         ` Petr Machata
  0 siblings, 0 replies; 6+ messages in thread
From: Petr Machata @ 2025-12-08 10:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Petr Machata, netdev


Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 5 Dec 2025 17:16:56 +0100 Petr Machata wrote:
>> OK, cool.
>> 
>> I think the following patch would fix the issue. But I think it should
>> be thematically split into two parts, the lib.sh fix needs its own
>> explanation. Then there is a third patch to get rid of the
>> now-unnecessary vx_wait() helper.
>> 
>> I think it makes sense to send it all as next material after you open it
>> in January. But if the issue is super annoying, I can send the two-part
>> fix now for net, and the cleanup in January for next.
>> 
>> Let me know what you prefer.
>
> I think both the fix and the cleanup would be acceptable at this stage
> of the merge window. But no strong preference, I queued up the diff you
> shared as a local NIPA patch so we can see how it fares over the
> weekend. And it will get auto-ejected when you post the real thing.

Looks good so far, but it's not conclusive: there was a similarly long
green streak before Dec 3. I'll check again tomorrow. If it's still
green, I'll send all three patches for net.
