* [PATCH net-next] selftests/net: force synchronized GC for a test.
@ 2024-02-23 8:13 Kui-Feng Lee
2024-02-24 2:21 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Kui-Feng Lee @ 2024-02-23 8:13 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, kuba, davem, dsahern
Cc: sinquersw, kuifeng, Kui-Feng Lee
Due to the slowness of the test environment, always set off a synchronized
GC after waiting for GC. This can fix the problem that Fib6 garbage
collection test fails occasionally.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
tools/testing/selftests/net/fib_tests.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 3ec1050e47a2..0a82c9bc07bb 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -823,7 +823,9 @@ fib6_gc_test()
$IP -6 route add 2001:20::$i \
via 2001:10::2 dev dummy_10 expires $EXPIRE
done
+ # Wait for GC
sleep $(($EXPIRE * 2 + 1))
+ $NS_EXEC sysctl -wq net.ipv6.route.flush=1
check_rt_num 0 $($IP -6 route list |grep expires|wc -l)
log_test $ret 0 "ipv6 route garbage collection (with permanent routes)"
@@ -864,7 +866,7 @@ fib6_gc_test()
# Wait for GC
sleep $(($EXPIRE * 2 + 1))
-
+ $NS_EXEC sysctl -wq net.ipv6.route.flush=1
check_rt_num 5 $($IP -6 route list |grep -v expires|grep 2001:20::|wc -l)
log_test $ret 0 "ipv6 route garbage collection (replace with permanent)"
--
2.34.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-23 8:13 [PATCH net-next] selftests/net: force synchronized GC for a test Kui-Feng Lee
@ 2024-02-24 2:21 ` Jakub Kicinski
2024-02-24 3:15 ` David Ahern
2024-02-24 4:40 ` Kui-Feng Lee
0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2024-02-24 2:21 UTC (permalink / raw)
To: Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, sinquersw,
kuifeng
On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
> Due to the slowness of the test environment
Would be interesting if it's slowness, because it failed 2 times
on the debug runner but 5 times on the non-debug one. We'll see.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-24 2:21 ` Jakub Kicinski
@ 2024-02-24 3:15 ` David Ahern
2024-02-24 3:54 ` Kui-Feng Lee
2024-02-29 17:39 ` Kui-Feng Lee
2024-02-24 4:40 ` Kui-Feng Lee
1 sibling, 2 replies; 13+ messages in thread
From: David Ahern @ 2024-02-24 3:15 UTC (permalink / raw)
To: Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, sinquersw, kuifeng
On 2/23/24 7:21 PM, Jakub Kicinski wrote:
> On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
>> Due to the slowness of the test environment
>
> Would be interesting if it's slowness, because it failed 2 times
> on the debug runner but 5 times on the non-debug one. We'll see.
hmmm... that should be debugged. waiting 2*N + 1 and then requesting GC
and still failing suggests something else is at play
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-24 3:15 ` David Ahern
@ 2024-02-24 3:54 ` Kui-Feng Lee
2024-02-29 17:39 ` Kui-Feng Lee
1 sibling, 0 replies; 13+ messages in thread
From: Kui-Feng Lee @ 2024-02-24 3:54 UTC (permalink / raw)
To: David Ahern, Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, kuifeng
On 2/23/24 19:15, David Ahern wrote:
> On 2/23/24 7:21 PM, Jakub Kicinski wrote:
>> On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
>>> Due to the slowness of the test environment
>>
>> Would be interesting if it's slowness, because it failed 2 times
>> on the debug runner but 5 times on the non-debug one. We'll see.
>
> hmmm... that should be debugged. waiting 2*N + 1 and then requesting GC
> and still failing suggests something else is at play
Do you mean it is still happening even with this patch?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-24 2:21 ` Jakub Kicinski
2024-02-24 3:15 ` David Ahern
@ 2024-02-24 4:40 ` Kui-Feng Lee
2024-02-27 8:53 ` Paolo Abeni
1 sibling, 1 reply; 13+ messages in thread
From: Kui-Feng Lee @ 2024-02-24 4:40 UTC (permalink / raw)
To: Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, kuifeng
On 2/23/24 18:21, Jakub Kicinski wrote:
> On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
>> Due to the slowness of the test environment
>
> Would be interesting if it's slowness, because it failed 2 times
> on the debug runner but 5 times on the non-debug one. We'll see.
The code was landed on Feb 12, 2024.
It is actually 1 time with debug runner but 3 times on the non-debug
one. 1 time with debug runner and 2 times with the non-debug runner
happened in a 12 hours period.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-24 4:40 ` Kui-Feng Lee
@ 2024-02-27 8:53 ` Paolo Abeni
0 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-27 8:53 UTC (permalink / raw)
To: Kui-Feng Lee, Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, kuifeng
On Fri, 2024-02-23 at 20:40 -0800, Kui-Feng Lee wrote:
>
> On 2/23/24 18:21, Jakub Kicinski wrote:
> > On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
> > > Due to the slowness of the test environment
> >
> > Would be interesting if it's slowness, because it failed 2 times
> > on the debug runner but 5 times on the non-debug one. We'll see.
>
> The code was landed on Feb 12, 2024.
> It is actually 1 time with debug runner but 3 times on the non-debug
> one. 1 time with debug runner and 2 times with the non-debug runner
> happened in a 12 hours period.
The only failure in a debug run was causes by an unrelated issue:
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/467362/7-fib-tests-sh/stdout
(grep for UBSAN)
Basically the problems happens quite sporadically only in "fast" env. I
think it's worthy a deeper investigation (it sounds like a race
somewhere).
I guess/hope then raising the number of temporary and permanent routes
created by the relevant test-case could help making the issue more
easily reproducible.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-24 3:15 ` David Ahern
2024-02-24 3:54 ` Kui-Feng Lee
@ 2024-02-29 17:39 ` Kui-Feng Lee
2024-03-02 0:45 ` Kui-Feng Lee
1 sibling, 1 reply; 13+ messages in thread
From: Kui-Feng Lee @ 2024-02-29 17:39 UTC (permalink / raw)
To: David Ahern, Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, kuifeng
On 2/23/24 19:15, David Ahern wrote:
> On 2/23/24 7:21 PM, Jakub Kicinski wrote:
>> On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
>>> Due to the slowness of the test environment
>>
>> Would be interesting if it's slowness, because it failed 2 times
>> on the debug runner but 5 times on the non-debug one. We'll see.
>
> hmmm... that should be debugged. waiting 2*N + 1 and then requesting GC
> and still failing suggests something else is at play
I did some tests, and found fib6_run_gc() do round_jiffies()
for the gc timer. So, gc_interval can increase 0.75 seconds in
some case. I am doing more investigation on this.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-02-29 17:39 ` Kui-Feng Lee
@ 2024-03-02 0:45 ` Kui-Feng Lee
2024-03-04 15:44 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Kui-Feng Lee @ 2024-03-02 0:45 UTC (permalink / raw)
To: David Ahern, Jakub Kicinski, Kui-Feng Lee
Cc: netdev, ast, martin.lau, kernel-team, davem, kuifeng
On 2/29/24 09:39, Kui-Feng Lee wrote:
>
>
> On 2/23/24 19:15, David Ahern wrote:
>> On 2/23/24 7:21 PM, Jakub Kicinski wrote:
>>> On Fri, 23 Feb 2024 00:13:46 -0800 Kui-Feng Lee wrote:
>>>> Due to the slowness of the test environment
>>>
>>> Would be interesting if it's slowness, because it failed 2 times
>>> on the debug runner but 5 times on the non-debug one. We'll see.
>>
>> hmmm... that should be debugged. waiting 2*N + 1 and then requesting GC
>> and still failing suggests something else is at play
>
> I did some tests, and found fib6_run_gc() do round_jiffies()
> for the gc timer. So, gc_interval can increase 0.75 seconds in
> some case. I am doing more investigation on this.
My conclusion is routes going to expire in N seconds can stay
for 2*N + 1 seconds.
Adding a new route going to expire in N seconds, it will starts
gc timer for N seconds. In our case, we add several routes going to
expire in N seconds consecutively. The later routes may expires in
different ticks than the first route added. So, a route may wait nearly
2*N seconds. The test case waits for 2*N + 1 seconds, that should be
enough.
However, some extra waiting may be added to it.
There are two possible extra waiting. The first one is calling
round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
one is the granularity of waiting for 5 seconds (in our case) is 512ms
for HZ 1000 according to the comment at the very begin of timer.c.
In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
contribute up to 1144ms.
Does that make sense?
Debug build is slower. So, the test scripts will be slower than normal
build. That means the script is actually waiting longer with a debug build.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-03-02 0:45 ` Kui-Feng Lee
@ 2024-03-04 15:44 ` Jakub Kicinski
2024-03-04 17:27 ` Kui-Feng Lee
2024-03-05 2:41 ` David Ahern
0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2024-03-04 15:44 UTC (permalink / raw)
To: Kui-Feng Lee
Cc: David Ahern, Kui-Feng Lee, netdev, ast, martin.lau, kernel-team,
davem, kuifeng
On Fri, 1 Mar 2024 16:45:58 -0800 Kui-Feng Lee wrote:
> However, some extra waiting may be added to it.
> There are two possible extra waiting. The first one is calling
> round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
> one is the granularity of waiting for 5 seconds (in our case) is 512ms
> for HZ 1000 according to the comment at the very begin of timer.c.
> In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
> contribute up to 1144ms.
>
> Does that make sense?
>
> Debug build is slower. So, the test scripts will be slower than normal
> build. That means the script is actually waiting longer with a debug build.
Meaning bumping the wait to $((($EXPIRE + 1) * 2))
should be enough for the non-debug runner?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-03-04 15:44 ` Jakub Kicinski
@ 2024-03-04 17:27 ` Kui-Feng Lee
2024-03-05 2:41 ` David Ahern
1 sibling, 0 replies; 13+ messages in thread
From: Kui-Feng Lee @ 2024-03-04 17:27 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, Kui-Feng Lee, netdev, ast, martin.lau, kernel-team,
davem, kuifeng
On 3/4/24 07:44, Jakub Kicinski wrote:
> On Fri, 1 Mar 2024 16:45:58 -0800 Kui-Feng Lee wrote:
>> However, some extra waiting may be added to it.
>> There are two possible extra waiting. The first one is calling
>> round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
>> one is the granularity of waiting for 5 seconds (in our case) is 512ms
>> for HZ 1000 according to the comment at the very begin of timer.c.
>> In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
>> contribute up to 1144ms.
>>
>> Does that make sense?
>>
>> Debug build is slower. So, the test scripts will be slower than normal
>> build. That means the script is actually waiting longer with a debug build.
>
> Meaning bumping the wait to $((($EXPIRE + 1) * 2))
> should be enough for the non-debug runner?
Yes, it should be enough. I will send out another patch base on this.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-03-04 15:44 ` Jakub Kicinski
2024-03-04 17:27 ` Kui-Feng Lee
@ 2024-03-05 2:41 ` David Ahern
2024-03-05 2:49 ` David Ahern
1 sibling, 1 reply; 13+ messages in thread
From: David Ahern @ 2024-03-05 2:41 UTC (permalink / raw)
To: Jakub Kicinski, Kui-Feng Lee
Cc: Kui-Feng Lee, netdev, ast, martin.lau, kernel-team, davem,
kuifeng
On 3/4/24 8:44 AM, Jakub Kicinski wrote:
> On Fri, 1 Mar 2024 16:45:58 -0800 Kui-Feng Lee wrote:
>> However, some extra waiting may be added to it.
>> There are two possible extra waiting. The first one is calling
>> round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
>> one is the granularity of waiting for 5 seconds (in our case) is 512ms
>> for HZ 1000 according to the comment at the very begin of timer.c.
>> In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
>> contribute up to 1144ms.
>>
>> Does that make sense?
>>
>> Debug build is slower. So, the test scripts will be slower than normal
>> build. That means the script is actually waiting longer with a debug build.
>
> Meaning bumping the wait to $((($EXPIRE + 1) * 2))
> should be enough for the non-debug runner?
I have not had time to do a deep a dive on the timing, but it seems odd
to me that a 1 second timer can turn into 11 sec. That means for 10
seconds (10x the time the user requested) the route survived.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-03-05 2:41 ` David Ahern
@ 2024-03-05 2:49 ` David Ahern
2024-03-05 3:20 ` Kui-Feng Lee
0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2024-03-05 2:49 UTC (permalink / raw)
To: Jakub Kicinski, Kui-Feng Lee
Cc: Kui-Feng Lee, netdev, ast, martin.lau, kernel-team, davem,
kuifeng
On 3/4/24 7:41 PM, David Ahern wrote:
> On 3/4/24 8:44 AM, Jakub Kicinski wrote:
>> On Fri, 1 Mar 2024 16:45:58 -0800 Kui-Feng Lee wrote:
>>> However, some extra waiting may be added to it.
>>> There are two possible extra waiting. The first one is calling
>>> round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
>>> one is the granularity of waiting for 5 seconds (in our case) is 512ms
>>> for HZ 1000 according to the comment at the very begin of timer.c.
>>> In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
>>> contribute up to 1144ms.
>>>
>>> Does that make sense?
>>>
>>> Debug build is slower. So, the test scripts will be slower than normal
>>> build. That means the script is actually waiting longer with a debug build.
>>
>> Meaning bumping the wait to $((($EXPIRE + 1) * 2))
>> should be enough for the non-debug runner?
>
> I have not had time to do a deep a dive on the timing, but it seems odd
> to me that a 1 second timer can turn into 11 sec. That means for 10
> seconds (10x the time the user requested) the route survived.
Also, you added a direct call to ipv6_sysctl_rtcache_flush to force a
flush which is going to try to kick off gc at that moment. Is the delay
kicking in?
delay = net->ipv6.sysctl.flush_delay;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] selftests/net: force synchronized GC for a test.
2024-03-05 2:49 ` David Ahern
@ 2024-03-05 3:20 ` Kui-Feng Lee
0 siblings, 0 replies; 13+ messages in thread
From: Kui-Feng Lee @ 2024-03-05 3:20 UTC (permalink / raw)
To: David Ahern, Jakub Kicinski
Cc: Kui-Feng Lee, netdev, ast, martin.lau, kernel-team, davem,
kuifeng
On 3/4/24 18:49, David Ahern wrote:
> On 3/4/24 7:41 PM, David Ahern wrote:
>> On 3/4/24 8:44 AM, Jakub Kicinski wrote:
>>> On Fri, 1 Mar 2024 16:45:58 -0800 Kui-Feng Lee wrote:
>>>> However, some extra waiting may be added to it.
>>>> There are two possible extra waiting. The first one is calling
>>>> round_jiffies() in fib6_run_gc(), that may add 750ms at most. The second
>>>> one is the granularity of waiting for 5 seconds (in our case) is 512ms
>>>> for HZ 1000 according to the comment at the very begin of timer.c.
>>>> In fact, it can add 392ms for 5750ms (5000ms + 750ms). Overall, they may
>>>> contribute up to 1144ms.
>>>>
>>>> Does that make sense?
>>>>
>>>> Debug build is slower. So, the test scripts will be slower than normal
>>>> build. That means the script is actually waiting longer with a debug build.
>>>
>>> Meaning bumping the wait to $((($EXPIRE + 1) * 2))
>>> should be enough for the non-debug runner?
>>
>> I have not had time to do a deep a dive on the timing, but it seems odd
>> to me that a 1 second timer can turn into 11 sec. That means for 10
>> seconds (10x the time the user requested) the route survived.
>
> Also, you added a direct call to ipv6_sysctl_rtcache_flush to force a
> flush which is going to try to kick off gc at that moment. Is the delay
> kicking in?
>
> delay = net->ipv6.sysctl.flush_delay;
The delay doesn't cause any different here. The delay affects only
exceptions.
It doesn't 10x. It does 2 times.
In fib_tests.sh
sysctl -wq net.ipv6.route.gc_interval= $EXPIRE
The test sets the gc interval to 5 seconds.
The test install several routes in a round. So, some later routes may
miss the first GC that emitted after the first 5 seconds. These routes
have to wait for 5 more seconds for the next time GC. So, it is 10
seconds. However, due to calling round_jiffies() in fib6_run_gc(), the
waiting time of the second GC period can be longer. It could increase
750ms at most. And, due to the granularity of the timer, it can adds
another 512ms at the range 4s~32s.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-03-05 3:20 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-23 8:13 [PATCH net-next] selftests/net: force synchronized GC for a test Kui-Feng Lee
2024-02-24 2:21 ` Jakub Kicinski
2024-02-24 3:15 ` David Ahern
2024-02-24 3:54 ` Kui-Feng Lee
2024-02-29 17:39 ` Kui-Feng Lee
2024-03-02 0:45 ` Kui-Feng Lee
2024-03-04 15:44 ` Jakub Kicinski
2024-03-04 17:27 ` Kui-Feng Lee
2024-03-05 2:41 ` David Ahern
2024-03-05 2:49 ` David Ahern
2024-03-05 3:20 ` Kui-Feng Lee
2024-02-24 4:40 ` Kui-Feng Lee
2024-02-27 8:53 ` Paolo Abeni
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).