* [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
@ 2024-06-12 11:23 Cyril Hrubis
2024-06-12 12:21 ` Petr Vorel
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Cyril Hrubis @ 2024-06-12 11:23 UTC (permalink / raw)
To: ltp; +Cc: Steve
Instead of hardcoding the values we attempt to measure the CPU speed and
set the timeout accordingly. Given that the difference in the duration
of the test when the kernel is buggy is about 30x we do not have to have
a precise callibration, just very rough estimate if we are running on a
server or small ARM board would suffice.
So we attempt to measure how long does a bussy loop take and base the
default timeout on that. On x86_64 CPUs the resulting timeout is about
double of the value of the actual test runtime and works fine, but we
need to make sure that the coeficient we divide the result from
callibrate works for small boards too. So please run the test on as many
machines as you can and report if we need to make the dividor smaller or
not.
CC: Steve <xufeifei1@oppo.com>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
---
Changes in v2:
- use different style of the bussy loop
.../kernel/sched/cfs-scheduler/starvation.c | 38 +++++++++++++++++--
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/testcases/kernel/sched/cfs-scheduler/starvation.c b/testcases/kernel/sched/cfs-scheduler/starvation.c
index eb9fd6ff5..0bb820f2f 100644
--- a/testcases/kernel/sched/cfs-scheduler/starvation.c
+++ b/testcases/kernel/sched/cfs-scheduler/starvation.c
@@ -21,11 +21,40 @@
#include <sched.h>
#include "tst_test.h"
+#include "tst_safe_clocks.h"
+#include "tst_timer.h"
static char *str_loop;
-static long loop = 2000000;
+static long loop = 1000000;
static char *str_timeout;
-static int timeout = 240;
+static int timeout;
+
+#define CALLIBRATE_LOOPS 100000000
+
+static int callibrate(void)
+{
+ int i;
+ struct timespec start, stop;
+ long long diff;
+
+ for (i = 0; i < CALLIBRATE_LOOPS; i++) {
+ __asm__ __volatile__ ("" : "+g" (i) : :);
+ }
+
+ SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
+
+ for (i = 0; i < CALLIBRATE_LOOPS; i++) {
+ __asm__ __volatile__ ("" : "+g" (i) : :);
+ }
+
+ SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
+
+ diff = tst_timespec_diff_us(stop, start);
+
+ tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
+
+ return diff;
+}
static int wait_for_pid(pid_t pid)
{
@@ -59,8 +88,11 @@ static void setup(void)
if (tst_parse_long(str_loop, &loop, 1, LONG_MAX))
tst_brk(TBROK, "Invalid number of loop number '%s'", str_loop);
- if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
+ if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX)) {
tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
+ } else {
+ timeout = callibrate() / 1000;
+ }
tst_set_max_runtime(timeout);
}
--
2.44.2
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-12 11:23 [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Cyril Hrubis
@ 2024-06-12 12:21 ` Petr Vorel
2024-06-12 18:22 ` Cyril Hrubis
2024-06-13 11:22 ` Anders Roxell
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Petr Vorel @ 2024-06-12 12:21 UTC (permalink / raw)
To: Cyril Hrubis; +Cc: Steve, ltp
Hi Cyril,
> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.
> So we attempt to measure how long does a bussy loop take and base the
s/bussy/busy/
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.
This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
runtime), others (x68_64, s390x, aarch64) have ~ 1 min or less.
Tested-by: Petr Vorel <pvorel@suse.cz>
...
> +static int callibrate(void)
> +{
> + int i;
> + struct timespec start, stop;
> + long long diff;
> +
> + for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> + __asm__ __volatile__ ("" : "+g" (i) : :);
> + }
> +
> + SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
> +
> + for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> + __asm__ __volatile__ ("" : "+g" (i) : :);
> + }
> +
> + SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
> +
> + diff = tst_timespec_diff_us(stop, start);
> +
> + tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
> +
> + return diff;
> +}
> static int wait_for_pid(pid_t pid)
> {
> @@ -59,8 +88,11 @@ static void setup(void)
> if (tst_parse_long(str_loop, &loop, 1, LONG_MAX))
> tst_brk(TBROK, "Invalid number of loop number '%s'", str_loop);
> - if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
> + if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX)) {
nit: {} brackets are not needed.
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Kind regards,
Petr
> tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
> + } else {
> + timeout = callibrate() / 1000;
> + }
> tst_set_max_runtime(timeout);
> }
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-12 12:21 ` Petr Vorel
@ 2024-06-12 18:22 ` Cyril Hrubis
0 siblings, 0 replies; 10+ messages in thread
From: Cyril Hrubis @ 2024-06-12 18:22 UTC (permalink / raw)
To: Petr Vorel; +Cc: Steve, ltp
Hi!
> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.
>
> > So we attempt to measure how long does a bussy loop take and base the
> s/bussy/busy/
>
> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.
>
> This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
> runtime), others (x68_64, s390x, aarch64) have ~ 1 min or less.
>
> Tested-by: Petr Vorel <pvorel@suse.cz>
And I've realized that we also have to check if we had exhausted our
runtime in the testcase, because if someone overrides the timeout the
test will stil pass even when it shouldn't. I will send a separate patch
for that.
--
Cyril Hrubis
chrubis@suse.cz
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-12 11:23 [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Cyril Hrubis
2024-06-12 12:21 ` Petr Vorel
@ 2024-06-13 11:22 ` Anders Roxell
2024-06-27 9:28 ` joe.liu via ltp
2024-07-01 9:50 ` joe.liu via ltp
3 siblings, 0 replies; 10+ messages in thread
From: Anders Roxell @ 2024-06-13 11:22 UTC (permalink / raw)
To: Cyril Hrubis; +Cc: Steve, ltp
On Wed, 12 Jun 2024 at 13:23, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.
>
> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.
>
> CC: Steve <xufeifei1@oppo.com>
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
I've tested this patch on 1 arm HW and 2 arm64 HW's. see the results below.
arm HW:
===== [1;36mstarvation[0m =====
command: starvation
[ 43.278533] /usr/local/bin/kirk[356]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 89172us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 29s
tst_test.c:1618: TINFO: Timeout per run is 0h 06m 53s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed 2
failed 0
broken 0
skipped 0
warnings 0
Duration: 1m 2s
arm64 HW1:
===== [1;36mstarvation[0m =====
command: starvation
[ 49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
[ 49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 141688us
tst_test.c:1626: TINFO: Updating max runtime to 0h 02m 21s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 45s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed 2
failed 0
broken 0
skipped 0
warnings 0
Duration: 1m 32s
arm64 HW2:
===== [1;36mstarvation[0m =====
command: starvation
[ 25.384826] /usr/local/bin/kirk[775]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 113584us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 53s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 17s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed 2
failed 0
broken 0
skipped 0
warnings 0
Duration: 1m 15s
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cheers,
Anders
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-12 11:23 [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Cyril Hrubis
2024-06-12 12:21 ` Petr Vorel
2024-06-13 11:22 ` Anders Roxell
@ 2024-06-27 9:28 ` joe.liu via ltp
2024-06-28 12:49 ` Cyril Hrubis
2024-07-01 9:50 ` joe.liu via ltp
3 siblings, 1 reply; 10+ messages in thread
From: joe.liu via ltp @ 2024-06-27 9:28 UTC (permalink / raw)
To: chrubis; +Cc: alix.wu, xufeifei1, ltp, joe.liu
Hi Cyril,
> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.
> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.
While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
Test timeouted, sending SIGKILL!
After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-27 9:28 ` joe.liu via ltp
@ 2024-06-28 12:49 ` Cyril Hrubis
0 siblings, 0 replies; 10+ messages in thread
From: Cyril Hrubis @ 2024-06-28 12:49 UTC (permalink / raw)
To: joe.liu; +Cc: alix.wu, xufeifei1, ltp
Hi!
> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.
>
> > So we attempt to measure how long does a bussy loop take and base the
> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.
>
> While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
>
> external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> Test timeouted, sending SIGKILL!
>
> After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds
This is really strange. I tested the code on RPi Zero with 32bit BCM2835
where the CPU is even slower and runtime limit is set to about 5
minutes, yet the test finished in a minute. That's with resonably recent
kernel 6.1 though.
What kernel version do you use?
Do you run LTP with background tasks to keep the CPU bussy?
Do you have any custom patches for kernel scheduller applied?
It really looks like something unexpected is happening at your end.
--
Cyril Hrubis
chrubis@suse.cz
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-06-12 11:23 [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Cyril Hrubis
` (2 preceding siblings ...)
2024-06-27 9:28 ` joe.liu via ltp
@ 2024-07-01 9:50 ` joe.liu via ltp
2024-07-02 14:47 ` Cyril Hrubis
3 siblings, 1 reply; 10+ messages in thread
From: joe.liu via ltp @ 2024-07-01 9:50 UTC (permalink / raw)
To: chrubis; +Cc: alix.wu, xufeifei1, ltp, joe.liu
> Hi!
> > > Instead of hardcoding the values we attempt to measure the CPU speed and
> > > set the timeout accordingly. Given that the difference in the duration
> > > of the test when the kernel is buggy is about 30x we do not have to have
> > > a precise callibration, just very rough estimate if we are running on a
> > > server or small ARM board would suffice.
> > > So we attempt to measure how long does a bussy loop take and base the
> > > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > > double of the value of the actual test runtime and works fine, but we
> > > need to make sure that the coeficient we divide the result from
> > > callibrate works for small boards too. So please run the test on as many
> > > machines as you can and report if we need to make the dividor smaller or
> > > not.
> > While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
> >
> > external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> > external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> > Test timeouted, sending SIGKILL!
> >
> > After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds
> This is really strange. I tested the code on RPi Zero with 32bit BCM2835
> where the CPU is even slower and runtime limit is set to about 5
> minutes, yet the test finished in a minute. That's with resonably recent
> kernel 6.1 though.
> What kernel version do you use?
We are now using kernel 5.15
> Do you run LTP with background tasks to keep the CPU bussy?
Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.
> Do you have any custom patches for kernel scheduller applied?
I think there is no any patch applied for kernel scheduller.
> It really looks like something unexpected is happening at your end.
Actually, the same kernel version using 64bit kernel can pass the ltp test.
1. for 32bit kernel, we have this log
32bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
4524 root 20 0 14M 472K 0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
2. for 64bit kernel, we have this log
64bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 9753, current pid is 9754, go to child() start
main pid is 9753, child pid is 9754, and we see both main and child pid are working (checking by top)
9754 root 20 0 16M 472K 0 R 51.4 0.0 0:05.51 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
9753 root 20 0 16M 464K 0 R 45.7 0.0 0:05.70 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-07-01 9:50 ` joe.liu via ltp
@ 2024-07-02 14:47 ` Cyril Hrubis
2024-07-10 15:55 ` Petr Vorel
0 siblings, 1 reply; 10+ messages in thread
From: Cyril Hrubis @ 2024-07-02 14:47 UTC (permalink / raw)
To: joe.liu; +Cc: alix.wu, xufeifei1, ltp
Hi!
> > What kernel version do you use?
> We are now using kernel 5.15
>
> > Do you run LTP with background tasks to keep the CPU bussy?
> Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.
>
> > Do you have any custom patches for kernel scheduller applied?
> I think there is no any patch applied for kernel scheduller.
>
> > It really looks like something unexpected is happening at your end.
>
> Actually, the same kernel version using 64bit kernel can pass the ltp test.
>
>
>
> 1. for 32bit kernel, we have this log
>
> 32bit kernel
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
>
> main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
>
> 4524 root 20 0 14M 472K 0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
So this looks like we managed to reproduce the issue the test was trying
to catch. The child is hogging the CPU and the parent process cannot
proceed. And I suppose that this happens only on 32bit because 32bit is
less tested these days.
--
Cyril Hrubis
chrubis@suse.cz
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-07-02 14:47 ` Cyril Hrubis
@ 2024-07-10 15:55 ` Petr Vorel
2024-07-10 18:12 ` Cyril Hrubis
0 siblings, 1 reply; 10+ messages in thread
From: Petr Vorel @ 2024-07-10 15:55 UTC (permalink / raw)
To: Cyril Hrubis; +Cc: alix.wu, joe.liu, ltp, xufeifei1
Hi Cyril,
all,
> Hi!
> > > What kernel version do you use?
> > We are now using kernel 5.15
> > > Do you run LTP with background tasks to keep the CPU bussy?
> > Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.
> > > Do you have any custom patches for kernel scheduller applied?
> > I think there is no any patch applied for kernel scheduller.
> > > It really looks like something unexpected is happening at your end.
> > Actually, the same kernel version using 64bit kernel can pass the ltp test.
> > 1. for 32bit kernel, we have this log
> > 32bit kernel
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
> > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
> > 4524 root 20 0 14M 472K 0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
> So this looks like we managed to reproduce the issue the test was trying
> to catch. The child is hogging the CPU and the parent process cannot
> proceed. And I suppose that this happens only on 32bit because 32bit is
> less tested these days.
I guess we can merge this, right?
Or we still not sure whether this is really kernel bug not related to the
change?
Kind regards,
Petr
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout
2024-07-10 15:55 ` Petr Vorel
@ 2024-07-10 18:12 ` Cyril Hrubis
0 siblings, 0 replies; 10+ messages in thread
From: Cyril Hrubis @ 2024-07-10 18:12 UTC (permalink / raw)
To: Petr Vorel; +Cc: alix.wu, joe.liu, ltp, xufeifei1
Hi!
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
>
> > > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
>
> > > 4524 root 20 0 14M 472K 0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
>
> > So this looks like we managed to reproduce the issue the test was trying
> > to catch. The child is hogging the CPU and the parent process cannot
> > proceed. And I suppose that this happens only on 32bit because 32bit is
> > less tested these days.
>
> I guess we can merge this, right?
> Or we still not sure whether this is really kernel bug not related to the
> change?
It stil needs a runtime check at the end of the loop in a case that
timeout is multiplied. I will have to send another version, thanks for
reminding me.
--
Cyril Hrubis
chrubis@suse.cz
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-07-10 18:13 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-12 11:23 [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Cyril Hrubis
2024-06-12 12:21 ` Petr Vorel
2024-06-12 18:22 ` Cyril Hrubis
2024-06-13 11:22 ` Anders Roxell
2024-06-27 9:28 ` joe.liu via ltp
2024-06-28 12:49 ` Cyril Hrubis
2024-07-01 9:50 ` joe.liu via ltp
2024-07-02 14:47 ` Cyril Hrubis
2024-07-10 15:55 ` Petr Vorel
2024-07-10 18:12 ` Cyril Hrubis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox