* [PATCH net] selftests: rtnetlink: try double sleep to give WQ a chance
@ 2025-07-10 14:53 Jakub Kicinski
2025-07-11 2:14 ` Hangbin Liu
0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2025-07-10 14:53 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
liuhangbin, shuah, linux-kselftest

The rtnetlink test for preferred lifetime of an address is quite flaky.
Problems started around the 6.16 merge window in May. The test fails
with:

   FAIL: preferred_lft addresses remaining

and unlike most of our flakes this one fails on the "normal" kernel
builds, not the builds with kernel/configs/debug.config. I suspect
the flakes may be related to power saving, since the expirations
run from a "power efficient" workqueue. Adding a short sleep seems
to decrease the flakes by 8x but they still happen. With this
patch in place we get a flake every couple of weeks, not every
couple of days. Better ideas welcome..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: liuhangbin@gmail.com
CC: shuah@kernel.org
CC: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/rtnetlink.sh | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..b9e1497ea27a 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -299,6 +299,11 @@ kci_test_addrlft()
 	done
 
 	sleep 5
+	# Schedule out for a bit, address GC runs from the power efficient WQ
+	# if the long sleep above has put the whole system into sleep state
+	# the WQ may have not had a chance to run.
+	sleep 0.1
+
 	run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
 	if [ $? -eq 0 ]; then
 		check_err 1
--
2.50.0
* Re: [PATCH net] selftests: rtnetlink: try double sleep to give WQ a chance
From: Hangbin Liu @ 2025-07-11 2:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
linux-kselftest
On Thu, Jul 10, 2025 at 07:53:12AM -0700, Jakub Kicinski wrote:
> The rtnetlink test for preferred lifetime of an address is quite flaky.
> Problems started around the 6.16 merge window in May. The test fails
> with:
>
> FAIL: preferred_lft addresses remaining
>
> and unlike most of our flakes this one fails on the "normal" kernel
> builds, not the builds with kernel/configs/debug.config. I suspect
> the flakes may be related to power saving, since the expirations
> run from a "power efficient" workqueue. Adding a short sleep seems
> to decrease the flakes by 8x but they still happen. With this
> patch in place we get a flake every couple of weeks, not every
> couple of days. Better ideas welcome..
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: liuhangbin@gmail.com
> CC: shuah@kernel.org
> CC: linux-kselftest@vger.kernel.org
> ---
> tools/testing/selftests/net/rtnetlink.sh | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
> index 2e8243a65b50..b9e1497ea27a 100755
> --- a/tools/testing/selftests/net/rtnetlink.sh
> +++ b/tools/testing/selftests/net/rtnetlink.sh
> @@ -299,6 +299,11 @@ kci_test_addrlft()
> done
>
> sleep 5
> + # Schedule out for a bit, address GC runs from the power efficient WQ
> + # if the long sleep above has put the whole system into sleep state
> + # the WQ may have not had a chance to run.
> + sleep 0.1
> +

How about using slowwait to check whether the address still exists? E.g.:

check_addr_not_exist()
{
	local dev=$1
	local addr=$2

	# Succeed (return 0) only once no matching address is listed.
	if ip addr show dev "$dev" | grep -q "$addr"; then
		return 1
	else
		return 0
	fi
}

slowwait 5 check_addr_not_exist "$devdummy" "10.23.11."

> run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
> if [ $? -eq 0 ]; then
> check_err 1
> --
> 2.50.0
>
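
The polling idea suggested above can be sketched in a self-contained form. This is only an illustration under stated assumptions: `wait_until` and `addr_gone` are hypothetical names made up for this sketch, and `wait_until` is a stand-in for the selftests' slowwait helper, not its actual implementation.

```shell
#!/bin/bash
# Hypothetical stand-in for a slowwait-style helper (not the selftests' code):
# re-run a command every 0.1s until it succeeds or the timeout, given in
# whole seconds, has expired.
wait_until()
{
	local timeout_sec=$1; shift
	local deadline=$((SECONDS + timeout_sec))

	while true; do
		# Stop as soon as the condition holds.
		"$@" && return 0
		# Give up once the deadline has passed.
		[ "$SECONDS" -ge "$deadline" ] && return 1
		sleep 0.1
	done
}

# Hypothetical condition: succeeds once no address matching the prefix
# is listed on the device.
addr_gone()
{
	local dev=$1
	local prefix=$2

	! ip addr show dev "$dev" 2>/dev/null | grep -qF "$prefix"
}
```

A call such as `wait_until 5 addr_gone "$devdummy" "10.23.11."` would return as soon as the addresses are gone instead of always sleeping for the full period. The trade-off is that every poll reads the address list again, so the check itself keeps exercising the kernel while it waits.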
* Re: [PATCH net] selftests: rtnetlink: try double sleep to give WQ a chance
From: Jakub Kicinski @ 2025-07-11 14:17 UTC (permalink / raw)
To: Hangbin Liu
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
linux-kselftest
On Fri, 11 Jul 2025 02:14:03 +0000 Hangbin Liu wrote:
> > sleep 5
> > + # Schedule out for a bit, address GC runs from the power efficient WQ
> > + # if the long sleep above has put the whole system into sleep state
> > + # the WQ may have not had a chance to run.
> > + sleep 0.1
> > +
>
> How about use slowwait to check if the address still exists.

Weirdly, if we read the addresses twice they disappear. I haven't looked
into the code to find out why, but it seemed like using slowwait could
potentially mask the addresses sticking around when nobody runs
the Netlink handlers for a while? Dunno..

I queued this debug patch a couple of months ago:

 	sleep 5
-	run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
+	ip addr show dev "$devdummy" > /tmp/a
+	run_cmd_grep_fail "10.23.11." cat /tmp/a
 	if [ $? -eq 0 ]; then
-		check_err 1
-		end_test "FAIL: preferred_lft addresses remaining"
+		check_err 1
+		cat /tmp/a
+		echo "==="
+		ip addr show dev "$devdummy"
+		end_test "FAIL: preferred_lft addresses remaining ($lft)"
 		return
 	fi

And when it flakes the output looks like this:

# 7.23 [+7.00] 297: test-dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
# 7.23 [+0.00] link/ether 9e:a6:c4:c2:1b:16 brd ff:ff:ff:ff:ff:ff
# 7.23 [+0.00] inet 10.23.11.81/32 scope global deprecated dynamic test-dummy0
# 7.23 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.23 [+0.00] inet 10.23.11.84/32 scope global deprecated dynamic test-dummy0
# 7.24 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.24 [+0.00] inet 10.23.11.93/32 scope global deprecated dynamic test-dummy0
# 7.24 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.24 [+0.00] inet 10.23.11.94/32 scope global deprecated dynamic test-dummy0
# 7.24 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.24 [+0.00] inet 10.23.11.97/32 scope global deprecated dynamic test-dummy0
# 7.24 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.24 [+0.00] inet 10.23.11.99/32 scope global deprecated dynamic test-dummy0
# 7.24 [+0.00] valid_lft 0sec preferred_lft 0sec
# 7.24 [+0.00] inet6 fe80::9ca6:c4ff:fec2:1b16/64 scope link proto kernel_ll
# 7.24 [+0.00] valid_lft forever preferred_lft forever
# 7.24 [+0.00] ===
# 7.25 [+0.00] 297: test-dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
# 7.25 [+0.00] link/ether 9e:a6:c4:c2:1b:16 brd ff:ff:ff:ff:ff:ff
# 7.25 [+0.00] inet6 fe80::9ca6:c4ff:fec2:1b16/64 scope link proto kernel_ll
# 7.25 [+0.00] valid_lft forever preferred_lft forever
# 7.25 [+0.00] FAIL: preferred_lft addresses remaining (1)
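
The snapshot-then-reread technique used in the debug patch above can be written as a small standalone helper. This is a sketch only: `check_absent_with_dump` is a hypothetical name, and it uses `mktemp` instead of the fixed /tmp/a path from the debug patch.

```shell
#!/bin/bash
# Hypothetical helper: run a listing command once and keep its output as a
# snapshot. If the pattern is still present, print the snapshot followed by
# a fresh read of the same command so the two can be compared.
check_absent_with_dump()
{
	local pattern=$1; shift
	local snap

	snap=$(mktemp) || return 2
	"$@" > "$snap" 2>&1

	if grep -qF "$pattern" "$snap"; then
		echo "--- snapshot"
		cat "$snap"
		echo "--- second read"
		"$@"
		rm -f "$snap"
		return 1
	fi

	rm -f "$snap"
	return 0
}
```

It would be called as `check_absent_with_dump "10.23.11." ip addr show dev "$devdummy"`. Keeping the first read in a file matters here because, as the output above shows, the second read no longer lists the addresses.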
* Re: [PATCH net] selftests: rtnetlink: try double sleep to give WQ a chance
From: Hangbin Liu @ 2025-07-14 7:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
linux-kselftest
On Fri, Jul 11, 2025 at 07:17:29AM -0700, Jakub Kicinski wrote:
> On Fri, 11 Jul 2025 02:14:03 +0000 Hangbin Liu wrote:
> > > sleep 5
> > > + # Schedule out for a bit, address GC runs from the power efficient WQ
> > > + # if the long sleep above has put the whole system into sleep state
> > > + # the WQ may have not had a chance to run.
> > > + sleep 0.1
> > > +
> >
> > How about use slowwait to check if the address still exists.
>
> Weirdly if we read the addresses twice they disappear, I haven't looked
> into the code for the why, but seemed like using slowwait could
> potentially mask the addresses sticking around when nobody runs
> the Netlink handlers for a while? Dunno..

Not sure if I understand correctly. Do you mean the addresses will stay there
if we use slowwait?

Thanks
Hangbin
* Re: [PATCH net] selftests: rtnetlink: try double sleep to give WQ a chance
From: Jakub Kicinski @ 2025-07-14 22:30 UTC (permalink / raw)
To: Hangbin Liu
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
linux-kselftest
On Mon, 14 Jul 2025 07:19:09 +0000 Hangbin Liu wrote:
> > > How about use slowwait to check if the address still exists.
> >
> > Weirdly if we read the addresses twice they disappear, I haven't looked
> > into the code for the why, but seemed like using slowwait could
> > potentially mask the addresses sticking around when nobody runs
> > the Netlink handlers for a while? Dunno..
>
> Not sure if I understand correctly. Do you mean the addresses will keep there
> if we use slowwait?

No, I mean there may be false negatives, not false positives.
But maybe it's fine; it will definitely prevent flakes.
Could you post the slowwait patch officially?
--
pw-bot: cr