* cyclictest vs. latmus
From: Robert Berger @ 2022-11-06 19:11 UTC
To: xenomai
Hi,
I have been running test cases with cyclictest, and with cyclictest built
for Xenomai 3, for a couple of years now and want to switch to Xenomai 4/EVL.
It looks like I managed to build an EVL kernel and libevl. I use latmus
instead of cyclictest (not sure if I'm doing that correctly), and I also
compare the results against cyclictest (not sure it's right to do that the
way I'm doing it).
Anyway, sorry for the lengthy document I came up with [1], which contains
histograms and questions.
Here are my questions, which you can find more nicely formatted in
the doc [1].
= evl Kernel - CONFIG_PREEMPT_NONE - cyclictest =
My understanding is that EVL works similarly to Xenomai 3, meaning that
you need to compile/link an application against libevl for EVL to kick
in. I would expect figures 15 and 16 on page 10 to look like figures 3
and 4 on page 4.
What’s odd is:
*) the outlier in the graph without load is bigger than the one with
load (1.3 ms vs. 550 us); I would expect the opposite
*) the outlier in the graph with load should be like the one in figure 4,
which is around 10 ms, but it is more like 550 us. Please note that the
kernel config contains CONFIG_PREEMPT_NONE=y and CONFIG_EVL=y
Does this observation imply that an EVL kernel modifies the behavior of
the "vanilla" Linux scheduler for processes which should run on the
"standard/vanilla" Linux scheduler?
= evl Kernel - CONFIG_EVL - latmus =
I am not quite sure if/how latmus compares to cyclictest. Ideally I
would like to compare histograms produced by latmus against those I
produce with cyclictest.
Let’s have a look at figures 17 and 18 on page 11.
*) the outlier in the graph without load is bigger than the one with
load (780 us vs. 700 us); I would expect the opposite
*) Xenomai 3 with cyclictest compiled for Xenomai (figures 13 and 14 on
page 9) performs significantly better than EVL with latmus
**) no load outlier: xenomai 3(cyclictest): 26 us - evl(latmus): 780 us
**) load outlier: xenomai 3(cyclictest): 65 us - evl(latmus): 700 us
*) a preempt-rt patched kernel with cyclictest (figures 9 and 10 on
page 7) performs significantly better than EVL with latmus
**) no load outlier: preempt-rt(cyclictest): 120 us - evl(latmus): 780 us
**) load outlier: preempt-rt(cyclictest): 119 us - evl(latmus): 700 us
*) a vanilla kernel with CONFIG_PREEMPT with cyclictest (figures 7 and
8 on page 6) performs similarly to EVL with latmus
**) no load outlier: CONFIG_PREEMPT(cyclictest): 880 us - evl(latmus): 780 us
**) load outlier: CONFIG_PREEMPT(cyclictest): 720 us - evl(latmus): 700 us
Please check the .pdf here[1] for more details:
[1]
https://drive.google.com/drive/folders/1_5PZ_4sQxvL5MbiQU1Y1kUXC5IjF5SSX?usp=sharing
Regards,
Robert
--
Robert Berger
Embedded Software Evangelist
Reliable Embedded Systems
Consulting Training Engineering
URL: https://www.reliableembeddedsystems.com
Schedule a web meeting:
https://calendly.com/reliableembeddedsystems/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
* Re: cyclictest vs. latmus
From: Philippe Gerum @ 2022-11-07 7:32 UTC
To: Robert Berger; +Cc: xenomai
Robert Berger <xenomai.list@gmail.com> writes:
> Hi,
>
> I run some test cases with cyclictest and cyclictest built for xenomai
> 3 for a couple of years now and want to switch to xenomai 4/evl.
>
> Looks like I managed to compile an evl kernel and the evllib and I use
> latmus instead of cyclictest (not sure if I'm doing that correctly)
> and also I compare the results against cyclictest (not sure it's right
> to do that they way I'm doing it).
>
> Anyways, sorry for the lengthy document I came of with[1] with
> histograms and questions.
>
> Here are my questions, which you can find in a nicer formatted way in
> the doc[1].
>
> = evl Kernel - CONFIG_PREEMPT_NONE - cyclictest =
>
> My understanding is, that evl works similar to xenomai 3, meaning that
> you need to compile/link an application against libevl for evl to kick
> in. I would expect figures 15 and 16 on page 10 to look like figures 3
> and 4 on page 4.
>
Linking is not enough with EVL. This is not a POSIX API, so there is no
silent wrapping via the real-time syscall library, and there is no
automatic bootstrap via the library constructor trick either. IOW, an EVL
application needs to explicitly attach to the core via a call to
evl_init(), and its real-time threads have to do so as well by calling
evl_attach_thread(); this is documented at [1]. So unless cyclictest.c
was modified to issue such calls, the performance figures you observed
would be those of threads managed by the vanilla kernel, not the
real-time core.
To make sure you are actually running EVL threads, you may want to check
with the libevl 'ps' command, e.g. this is a snapshot taken when a
latmus instance is running:
root@homelab-phytec-mira:~# evl ps -l
CPU  PID  SCHED  PRIO  ISW  CTXSW  SYS    RWA  STAT  TIMEOUT  %CPU  CPUTIME    WCHAN      NAME
0    407  fifo   98    0    20947  20948  0    Wt    -        0.0   0:125.134  &wf->wait  timer-responder:405
0    408  weak   0     1    2      1      0    W     -        0.0   0:000.023  &wf->wait  test-sitter:405
The kernel configuration can be checked [2] for known latency killers as
follows:
root@homelab-phytec-mira:~# evl check
root@homelab-phytec-mira:~#
i.e. this command should be silent, otherwise problematic Kconfig
option(s) would be dumped to stdout.
> What’s odd is
>
> *) the outlier in the graph without load is bigger than the one with
> load (1.3 ms vs. 550 us), which should be the opposite
>
> *) the outlier in the graph with load should be like the one in figure
> 4 which is around 10 ms, but it is more like 550 us - please note
> that the kernel config contains CONFIG_PREEMPT_NONE=y and CONFIG_EVL=y
>
> Does this observation imply, that an evl kernel modifies the behavior
> of the "vanilla" Linux scheduler for processes which should run on the
> "standard/vanilla" Linux scheduler?
No, it does not. It looks like none of these figures relate to EVL
threads; they relate to regular/vanilla threads instead.
>
> = evl Kernel - CONFIG_EVL - latmus =
>
> I am not quite sure if/how latmus compares to cyclictest. Ideally I
> would like to compare histograms produced by latmus against those I
> produce with cyclictest.
>
The purpose and behavior of latmus are detailed here [3].
> Let’s have a look at graphs 17 and 18 on page 11.
>
> *) the outlier in the graph without load is bigger than the one with
> load (780 us vs. 700 us), which should be the opposite
>
> *) xenomai 3 with cyclictest compiled for xenomai - figures 13 and 14
> on page 9 performs significantly better than evl with latmus
> **) no load outlier: xenomai 3(cyclictest): 26 us - evl(latmus): 780 us
> **) load outlier: xenomai 3(cyclictest): 65 us - evl(latmus): 700 us
>
> *) a preempt-rt patched kernel with cyclictest - figures 9 and 10 on
> page 7 performs significantly better than evl with latmus
> **) no load outlier: preempt-rt(cyclictest): 120 us - evl(latmus): 780 us
> **) load outlier: preempt-rt(cyclictest): 119 us - evl(latmus): 700 us
>
> *) a vanilla kernel with CONFIG_PREEMPT with cyclictest - figures 7
> and 8 on page 6 performs similar to evl with latmus
> **) no load outlier: preempt-rt(cyclictest): 880 us - evl(latmus): 780 us
> **) load outlier: preempt-rt(cyclictest): 720 us - evl(latmus): 700 us
>
> Please check the .pdf here[1] for more details:
>
FWIW, I have a phytec mira at hand here - this is actually my main
development board for some real-time application software ATM, so I ran
a couple of short latmus tests the same way you did.
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
[ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: PHYTEC phyBOARD-Mira QuadPlus Carrier-Board with NAND
root@homelab-phytec-mira:~# evl -v
evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100) [requires ABI 30]
The first test ran for 500 s on a non-isolated CPU (CPU0), the second one
on an isolated CPU (CPU1). Both ran with the same stress-ng loop you
mentioned in your document running in parallel to the latmus test:
root@homelab-phytec-mira:~# latmus -gnon-isolated.gp -T500 -p 500 --histogram=1000
warming up on CPU0 (not isolated)...
RTT| 00:00:01 (user, 500 us period, priority 98, CPU0-noisol)
...
root@homelab-phytec-mira:~# latmus -gisolated.gp -T500 -p 500 --histogram=1000
warming up on CPU1...
RTT| 00:00:01 (user, 500 us period, priority 98, CPU1)
...
root@homelab-phytec-mira:~# while :; do stress-ng --cpu 12 --io 4 --vm 2 --vm-bytes=500M --fork 4 --timeout 10s; done
stress-ng: info: [2732] dispatching hogs: 12 cpu, 4 io, 2 vm, 4 fork
stress-ng: info: [2732] successful run completed in 12.99s
...
The results are available from [4][5] and [6][7] respectively. To sum
up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
isolated. Both figures are in line with the expectations on this SoM.
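As a rough cross-check of these figures, the histogram files written by latmus (the .gp attachments [4] and [6]) can be summarized with a short script. This is only a minimal sketch, assuming the layout visible in the attachments: '#'-prefixed header lines followed by "bucket count" rows, with the bucket expressed in microseconds. The function names and the SAMPLE excerpt are illustrative and not part of latmus or libevl.

```python
# Illustrative excerpt: header lines plus the tail rows of isolated.gp.
SAMPLE = """\
# sampling period: 500 microseconds
# max latency: 36.697
33 1
34 0
35 1
36 2
"""


def parse_histogram(text):
    """Return {bucket_us: count} from a latmus-style histogram dump.

    Assumption: '#' lines are headers, data rows are '<bucket> <count>'.
    """
    buckets = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith("#"):
            continue  # skip headers, blank lines, anything unexpected
        buckets[int(fields[0])] = int(fields[1])
    return buckets


def worst_bucket(buckets):
    """Highest latency bucket that received at least one sample."""
    return max(b for b, count in buckets.items() if count > 0)


def percentile_bucket(buckets, fraction):
    """Smallest bucket at which the cumulative count reaches `fraction`."""
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for bucket in sorted(buckets):
        running += buckets[bucket]
        if running >= fraction * total:
            return bucket
    return max(buckets)


def expected_samples(duration_s, period_us):
    """Number of timer periods that fit into the test duration."""
    return duration_s * 1_000_000 // period_us


if __name__ == "__main__":
    hist = parse_histogram(SAMPLE)
    print("worst bucket (us):", worst_bucket(hist))
    print("median bucket (us):", percentile_bucket(hist, 0.5))
    print("expected samples for -T500 -p 500:", expected_samples(500, 500))
```

Run against the full attachments, the worst bucket should line up with the "max latency" header, and the expected sample count for a 500 s run at a 500 us period (1,000,000) is close to the reported sample counts of 1000003 and 1000004.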
To help figure out the reason for this behavior with latmus on your
test board, you may want to share your .config. However, I don't think
the results you observed with cyclictest are relevant to EVL.
[1] https://evlproject.org/core/user-api/thread/#thread-services
[2] https://evlproject.org/core/commands/#evl-check-command
[3] https://evlproject.org/core/benchmarks/#latmus-timer-response-time
[4]
[-- Attachment #2: non-isolated.gp --]
[-- Type: application/octet-stream, Size: 1287 bytes --]
# test started on: Tue Jun 18 04:49:31 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp maxcpus=4
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU0-noisol
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:06:15
# min latency: 1.000
# avg latency: 8.548
# max latency: 61.378
# sample count: 1000003
1 2416
2 44296
3 248740
4 73368
5 57164
6 55003
7 66124
8 68120
9 60403
10 51085
11 42603
12 35007
13 29758
14 25462
15 21965
16 19128
17 16399
18 13794
19 11761
20 9807
21 8395
22 6807
23 5949
24 4743
25 4025
26 3275
27 2651
28 2333
29 1827
30 1517
31 1241
32 1026
33 841
34 689
35 508
36 397
37 297
38 267
39 202
40 173
41 101
42 80
43 69
44 44
45 36
46 35
47 24
48 11
49 12
50 7
51 5
52 5
53 2
54 1
55 1
56 0
57 2
58 1
59 0
60 0
61 1
[5]
[-- Attachment #4: non-isolated.png --]
[-- Type: image/png, Size: 10218 bytes --]
[6]
[-- Attachment #6: isolated.gp --]
[-- Type: application/octet-stream, Size: 1108 bytes --]
# test started on: Tue Jun 18 05:10:05 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp isolcpus=1 evl.oobcpus=1
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU1
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:04:21
# min latency: 0.666
# avg latency: 2.453
# max latency: 36.697
# sample count: 1000004
0 22809
1 522366
2 180838
3 108240
4 60186
5 36583
6 22924
7 14890
8 9831
9 6650
10 4512
11 3258
12 2061
13 1536
14 993
15 746
16 471
17 347
18 205
19 140
20 100
21 74
22 59
23 44
24 33
25 21
26 34
27 16
28 11
29 9
30 2
31 9
32 2
33 1
34 0
35 1
36 2
[7]
[-- Attachment #8: isolated.png --]
[-- Type: image/png, Size: 10576 bytes --]
--
Philippe.
* Re: cyclictest vs. latmus
From: Robert Berger @ 2022-11-10 6:52 UTC
To: Philippe Gerum; +Cc: xenomai
Hi,
Thanks for the explanations and sorry for the late reply.
My comments are inline.
On 07/11/2022 08:32, Philippe Gerum wrote:
>
> The results are available from [4][5] and [6][7] respectively. To sum
> up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
> isolated. Both figures are in line with the expectations on this SoM.
Yep, and this is what I would expect, so most likely something is wrong
in my configuration.
>
> To help figuring out the reason for this behavior with latmus on your
> test board, you may want to share your .config. However, I don't think
> the results you observed with cyclictest are relevant to EVL.
>
Please note that I use multi_v7_defconfig. I guess you use something
else.
Here [1] you can find the config.gz, which I pulled from the board.
Those are the components I use:
root@multi-v7-ml:~# cat /proc/version
Linux version 5.15.64-evl-student-g96fa4a8fcde6 (student@HP-ZBook-15-G4-2) (arm-resy-linux-gnueabi-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #1 SMP IRQPIPE Fri Nov 4 01:10:57 CET 2022
git show 96fa4a8fcde6d9361777daf05caaa527a44e418e
commit 96fa4a8fcde6d9361777daf05caaa527a44e418e (HEAD -> v5.15.64-evl3-rebase_LOCAL)
Author: student <student@ReliableEmbeddedSystems.com>
Date: Fri Nov 4 01:10:35 2022 +0100
    commit not to be -dirty
diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index 33572998dbbe..f770b9681a98 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -1187,3 +1187,8 @@ CONFIG_DEBUG_FS=y
 CONFIG_CHROME_PLATFORMS=y
 CONFIG_CROS_EC=m
 CONFIG_CROS_EC_CHARDEV=m
+CONFIG_DA9062_WATCHDOG=y
+CONFIG_MFD_DA9062=y
+CONFIG_REGULATOR_DA9062=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
commit 5c75c622911d192dc8c2988679713576cff89d45 (HEAD -> r39_LOCAL, tag: r39, origin/next, origin/master, origin/HEAD, master)
Author: Philippe Gerum <rpm@xenomai.org>
Date: Mon Oct 31 12:16:38 2022 +0100
    libevl r39
    Signed-off-by: Philippe Gerum <rpm@xenomai.org>
[1]
https://drive.google.com/drive/folders/1yPOiKzfEgyXmdB-y4Pc4EzP15V4OtMyV?usp=sharing
Regards,
Robert
* Re: cyclictest vs. latmus
From: Philippe Gerum @ 2022-11-11 9:00 UTC
To: Robert Berger; +Cc: xenomai
Robert Berger <xenomai.list@gmail.com> writes:
> Hi,
>
> Thanks for the explanations and sorry for the late reply.
>
> My comments are inline.
>
> On 07/11/2022 08:32, Philippe Gerum wrote:
>> The results are available from [4][5] and [6][7] respectively. To
>> sum
>> up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
>> isolated. Both figures are in line with the expectations on this SoM.
>
> Yep and this is what I would expect, so most likely something is wrong
> in my configuration ;
>
>> To help figuring out the reason for this behavior with latmus on
>> your
>> test board, you may want to share your .config. However, I don't think
>> the results you observed with cyclictest are relevant to EVL.
>>
>
> Please note, that I use a multi_v7_defconfig. I guess you use
> something else.
>
No, I'm using multi_v7_defconfig as well. imx_v6_7_defconfig won't boot
that SoM.
The latency spikes you observed are related to CPU frequency scaling (see
[1]): the "performance" governor was not the default one in your
Kconfig; the on-demand governor was, leading to dynamic frequency
adjustment. This should be reported by the following command on your
system:
~ # evl check
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=n
CONFIG_FTRACE=y
CONFIG_FTRACE is also reported because on ARM it has a measurable impact
on the latency figures even when no tracer is enabled, although that
overhead is small compared to actual latency killers.
Setting CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y is enough to fix your
Kconfig as I did here.
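The same condition can also be checked offline against a saved config, for instance the config.gz pulled from the board. What follows is a minimal sketch, not how evl check is implemented. It assumes the usual .config syntax ("CONFIG_X=y" for enabled options, "# CONFIG_X is not set" otherwise) and that /proc/config.gz exists because CONFIG_IKCONFIG_PROC=y. The helper names and the SAMPLE_CONFIG text are made up for illustration.

```python
import gzip
import re

# Illustrative fragment only, not the actual config from this thread.
SAMPLE_CONFIG = """\
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_FTRACE=y
"""

_GOV_RE = re.compile(r"CONFIG_CPU_FREQ_DEFAULT_GOV_([A-Z0-9_]+)=y")


def default_governor(config_text):
    """Return the default cpufreq governor named in the config, or None."""
    for line in config_text.splitlines():
        match = _GOV_RE.fullmatch(line.strip())
        if match:
            return match.group(1).lower()
    return None


def is_enabled(config_text, option):
    """True if `option` is set to y or m in the config text."""
    wanted = {option + "=y", option + "=m"}
    return any(line.strip() in wanted for line in config_text.splitlines())


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (requires CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as f:
        return f.read()


if __name__ == "__main__":
    text = SAMPLE_CONFIG  # on the target: text = read_running_config()
    print("default governor:", default_governor(text))
    print("FTRACE enabled:", is_enabled(text, "CONFIG_FTRACE"))
```

Note that this only inspects the build-time default; the governor actually in use can still be changed at run time.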
HTH,
[1] https://evlproject.org/core/caveat/#caveat-cpufreq
--
Philippe.
* Re: cyclictest vs. latmus
From: Robert Berger @ 2022-11-21 22:17 UTC
To: Philippe Gerum; +Cc: xenomai
Hi,
Sorry for the late reply - I was on the road.
My comments are inline.
On 11/11/2022 10:00, Philippe Gerum wrote:
> The latency spikes you observed is related to CPU frequency scaling (see
> [1]), the "performance" governor was not the default one in your
> Kconfig, but the on-demand governor was, leading to dynamic frequency
> adjustment. This should be reported by the following command on your
> system:
>
> ~ # evl check
> CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=n
> CONFIG_FTRACE=y
>
> CONFIG_FTRACE is also reported because it does have a noticeable impact
> on the latency figures even when no tracer is enabled on ARM, although
> the overhead is still negligible compared to actual latency killers.
>
> Setting CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y is enough to fix your
> Kconfig as I did here.
>
> HTH,
>
> [1] https://evlproject.org/core/caveat/#caveat-cpufreq
Absolutely!
I fixed my kernel config and now the graphs look like those from Xenomai 3.
Thanks!
Regards,
Robert