* cyclictest vs. latmus
From: Robert Berger @ 2022-11-06 19:11 UTC
To: xenomai

Hi,

I have been running some test cases with cyclictest, and with cyclictest
built for xenomai 3, for a couple of years now and want to switch to
xenomai 4/evl.

It looks like I managed to compile an evl kernel and libevl, and I use
latmus instead of cyclictest (not sure if I'm doing that correctly). I
also compare the results against cyclictest (not sure it's right to do
that the way I'm doing it).

Anyway, sorry for the lengthy document I came up with[1], with
histograms and questions.

Here are my questions, which you can find in a more nicely formatted
way in the doc[1].

= evl Kernel - CONFIG_PREEMPT_NONE - cyclictest =

My understanding is that evl works similarly to xenomai 3, meaning that
you need to compile/link an application against libevl for evl to kick
in. I would expect figures 15 and 16 on page 10 to look like figures 3
and 4 on page 4.

What’s odd is

*) the outlier in the graph without load is bigger than the one with
   load (1.3 ms vs. 550 us), which should be the opposite

*) the outlier in the graph with load should be like the one in figure
   4, which is around 10 ms, but it is more like 550 us - please note
   that the kernel config contains CONFIG_PREEMPT_NONE=y and
   CONFIG_EVL=y

Does this observation imply that an evl kernel modifies the behavior of
the "vanilla" Linux scheduler for processes which should run on the
"standard/vanilla" Linux scheduler?

= evl Kernel - CONFIG_EVL - latmus =

I am not quite sure if/how latmus compares to cyclictest. Ideally I
would like to compare histograms produced by latmus against those I
produce with cyclictest.

Let’s have a look at graphs 17 and 18 on page 11.

*) the outlier in the graph without load is bigger than the one with
   load (780 us vs. 700 us), which should be the opposite

*) xenomai 3 with cyclictest compiled for xenomai - figures 13 and 14
   on page 9 - performs significantly better than evl with latmus
   **) no load outlier: xenomai 3(cyclictest): 26 us - evl(latmus): 780 us
   **) load outlier: xenomai 3(cyclictest): 65 us - evl(latmus): 700 us

*) a preempt-rt patched kernel with cyclictest - figures 9 and 10 on
   page 7 - performs significantly better than evl with latmus
   **) no load outlier: preempt-rt(cyclictest): 120 us - evl(latmus): 780 us
   **) load outlier: preempt-rt(cyclictest): 119 us - evl(latmus): 700 us

*) a vanilla kernel with CONFIG_PREEMPT with cyclictest - figures 7 and
   8 on page 6 - performs similarly to evl with latmus
   **) no load outlier: CONFIG_PREEMPT(cyclictest): 880 us - evl(latmus): 780 us
   **) load outlier: CONFIG_PREEMPT(cyclictest): 720 us - evl(latmus): 700 us

Please check the .pdf here[1] for more details:

[1] https://drive.google.com/drive/folders/1_5PZ_4sQxvL5MbiQU1Y1kUXC5IjF5SSX?usp=sharing

Regards,

Robert

--
Robert Berger
Embedded Software Evangelist

Reliable Embedded Systems
Consulting Training Engineering

URL: https://www.reliableembeddedsystems.com
Schedule a web meeting: https://calendly.com/reliableembeddedsystems/
* Re: cyclictest vs. latmus
From: Philippe Gerum @ 2022-11-07 7:32 UTC
To: Robert Berger; +Cc: xenomai

Robert Berger <xenomai.list@gmail.com> writes:

> Hi,
>
> I run some test cases with cyclictest and cyclictest built for xenomai
> 3 for a couple of years now and want to switch to xenomai 4/evl.
>
> Looks like I managed to compile an evl kernel and the evllib and I use
> latmus instead of cyclictest (not sure if I'm doing that correctly)
> and also I compare the results against cyclictest (not sure it's right
> to do that the way I'm doing it).
>
> Anyways, sorry for the lengthy document I came up with[1] with
> histograms and questions.
>
> Here are my questions, which you can find in a nicer formatted way in
> the doc[1].
>
> = evl Kernel - CONFIG_PREEMPT_NONE - cyclictest =
>
> My understanding is, that evl works similar to xenomai 3, meaning that
> you need to compile/link an application against libevl for evl to kick
> in. I would expect figures 15 and 16 on page 10 to look like figures 3
> and 4 on page 4.
>

Linking is not enough with EVL. Besides, this is not a POSIX API, so you
would not get any silent wrapping via the real-time syscall library, and
there is no automatic bootstrap via the library constructor trick
either. IOW, an EVL application needs to explicitly attach to the core
via a call to evl_init(), and its real-time threads have to do so as
well using the evl_attach_thread() syscall; this is documented at [1].
So unless cyclictest.c was modified to issue such a syscall, the
performance figures you observed would be those of threads managed by
the vanilla kernel, not the real-time core.

To make sure you are actually running EVL threads, you may want to
check with the libevl 'ps' command, e.g. this is a snapshot taken when a
latmus instance is running:

    root@homelab-phytec-mira:~# evl ps -l
    CPU  PID  SCHED  PRIO  ISW  CTXSW  SYS    RWA  STAT  TIMEOUT  %CPU  CPUTIME    WCHAN      NAME
    0    407  fifo   98    0    20947  20948  0    Wt    -        0.0   0:125.134  &wf->wait  timer-responder:405
    0    408  weak   0     1    2      1      0    W     -        0.0   0:000.023  &wf->wait  test-sitter:405

The kernel configuration can be checked [2] for known latency killers
as follows:

    root@homelab-phytec-mira:~# evl check
    root@homelab-phytec-mira:~#

i.e. this command should be silent, otherwise the problematic Kconfig
option(s) would be dumped to stdout.

> What’s odd is
>
> *) the outlier in the graph without load is bigger than the one with
> load (1.3 ms vs. 550 us), which should be the opposite
>
> *) the outlier in the graph with load should be like the one in figure
> 4 which is around 10 ms, but it is more like 550 us - please note
> that the kernel config contains CONFIG_PREEMPT_NONE=y and CONFIG_EVL=y
>
> Does this observation imply, that an evl kernel modifies the behavior
> of the "vanilla" Linux scheduler for processes which should run on the
> "standard/vanilla" Linux scheduler?

No, it does not. It looks like all these figures are not related to EVL
threads, but to regular/vanilla threads instead.

>
> = evl Kernel - CONFIG_EVL - latmus =
>
> I am not quite sure if/how latmus compares to cyclictest. Ideally I
> would like to compare histograms produced by latmus against those I
> produce with cyclictest.
>

The purpose and behavior of latmus are detailed here [3].

> Let’s have a look at graphs 17 and 18 on page 11.
>
> *) the outlier in the graph without load is bigger than the one with
> load (780 us vs. 700 us), which should be the opposite
>
> *) xenomai 3 with cyclictest compiled for xenomai - figures 13 and 14
> on page 9 performs significantly better than evl with latmus
> **) no load outlier: xenomai 3(cyclictest): 26 us - evl(latmus): 780 us
> **) load outlier: xenomai 3(cyclictest): 65 us - evl(latmus): 700 us
>
> *) a preempt-rt patched kernel with cyclictest - figures 9 and 10 on
> page 7 performs significantly better than evl with latmus
> **) no load outlier: preempt-rt(cyclictest): 120 us - evl(latmus): 780 us
> **) load outlier: preempt-rt(cyclictest): 119 us - evl(latmus): 700 us
>
> *) a vanilla kernel with CONFIG_PREEMPT with cyclictest - figures 7
> and 8 on page 6 performs similar to evl with latmus
> **) no load outlier: CONFIG_PREEMPT(cyclictest): 880 us - evl(latmus): 780 us
> **) load outlier: CONFIG_PREEMPT(cyclictest): 720 us - evl(latmus): 700 us
>
> Please check the .pdf here[1] for more details:
>

FWIW, I have a phytec mira at hand here - this is actually my main
development board for some real-time application software ATM, so I ran
a couple of short latmus tests the same way you did.

    [ 0.000000] Booting Linux on physical CPU 0x0
    [ 0.000000] Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
    [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
    [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
    [ 0.000000] OF: fdt: Machine model: PHYTEC phyBOARD-Mira QuadPlus Carrier-Board with NAND

    root@homelab-phytec-mira:~# evl -v
    evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100) [requires ABI 30]

The first test ran for 500s on a non-isolated CPU(0), the second one
isolated on its own CPU(1), both with the same stress-ng loop you
mentioned in your document, running in parallel to the latmus test:

    root@homelab-phytec-mira:~# latmus -gnon-isolated.gp -T500 -p 500 --histogram=1000
    warming up on CPU0 (not isolated)...
    RTT| 00:00:01 (user, 500 us period, priority 98, CPU0-noisol)
    ...

    root@homelab-phytec-mira:~# latmus -gisolated.gp -T500 -p 500 --histogram=1000
    warming up on CPU1...
    RTT| 00:00:01 (user, 500 us period, priority 98, CPU1)
    ...

    root@homelab-phytec-mira:~# while :; do stress-ng --cpu 12 --io 4 --vm 2 --vm-bytes=500M --fork 4 --timeout 10s; done
    stress-ng: info: [2732] dispatching hogs: 12 cpu, 4 io, 2 vm, 4 fork
    stress-ng: info: [2732] successful run completed in 12.99s
    ...

The results are available from [4][5] and [6][7] respectively. To sum
up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
isolated. Both figures are in line with the expectations on this SoM.

To help figure out the reason for this behavior with latmus on your
test board, you may want to share your .config. However, I don't think
the results you observed with cyclictest are relevant to EVL.

[1] https://evlproject.org/core/user-api/thread/#thread-services
[2] https://evlproject.org/core/commands/#evl-check-command
[3] https://evlproject.org/core/benchmarks/#latmus-timer-response-time

[4]
[-- Attachment #2: non-isolated.gp --]
[-- Type: application/octet-stream, Size: 1287 bytes --]

# test started on: Tue Jun 18 04:49:31 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp maxcpus=4
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU0-noisol
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:06:15
# min latency: 1.000
# avg latency: 8.548
# max latency: 61.378
# sample count: 1000003
1 2416
2 44296
3 248740
4 73368
5 57164
6 55003
7 66124
8 68120
9 60403
10 51085
11 42603
12 35007
13 29758
14 25462
15 21965
16 19128
17 16399
18 13794
19 11761
20 9807
21 8395
22 6807
23 5949
24 4743
25 4025
26 3275
27 2651
28 2333
29 1827
30 1517
31 1241
32 1026
33 841
34 689
35 508
36 397
37 297
38 267
39 202
40 173
41 101
42 80
43 69
44 44
45 36
46 35
47 24
48 11
49 12
50 7
51 5
52 5
53 2
54 1
55 1
56 0
57 2
58 1
59 0
60 0
61 1

[5]
[-- Attachment #4: non-isolated.png --]
[-- Type: image/png, Size: 10218 bytes --]

[6]
[-- Attachment #6: isolated.gp --]
[-- Type: application/octet-stream, Size: 1108 bytes --]

# test started on: Tue Jun 18 05:10:05 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp isolcpus=1 evl.oobcpus=1
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU1
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:04:21
# min latency: 0.666
# avg latency: 2.453
# max latency: 36.697
# sample count: 1000004
0 22809
1 522366
2 180838
3 108240
4 60186
5 36583
6 22924
7 14890
8 9831
9 6650
10 4512
11 3258
12 2061
13 1536
14 993
15 746
16 471
17 347
18 205
19 140
20 100
21 74
22 59
23 44
24 33
25 21
26 34
27 16
28 11
29 9
30 2
31 9
32 2
33 1
34 0
35 1
36 2

[7]
[-- Attachment #8: isolated.png --]
[-- Type: image/png, Size: 10576 bytes --]

--
Philippe.
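The two .gp attachments can be summarized without plotting them. The sketch below is not part of the thread: summarize_hist is a hypothetical helper name, and it assumes the file on disk holds header comments starting with '#' followed by one "<bucket_us> <count>" pair per line, in ascending bucket order, as the attachments suggest.

```shell
#!/bin/sh
# Summarize a latmus histogram file such as non-isolated.gp.
# Assumption: comment lines start with '#', data lines are
# "<bucket_us> <count>" and buckets are listed in ascending order.
summarize_hist() {
    awk '
        /^#/ || NF < 2 { next }        # skip header comments and blank lines
        {
            total += $2                # accumulate the number of samples
            if ($2 > 0) worst = $1     # remember the last non-empty bucket
        }
        END { printf "samples=%d worst_bucket_us=%d\n", total, worst }
    ' "$1"
}
```

Calling `summarize_hist non-isolated.gp` prints the number of samples found in the buckets and the highest bucket that received at least one sample. In the non-isolated attachment the last non-empty bucket is 61, which matches the 61.378 max latency reported in its header.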
* Re: cyclictest vs. latmus
From: Robert Berger @ 2022-11-10 6:52 UTC
To: Philippe Gerum; +Cc: xenomai

Hi,

Thanks for the explanations and sorry for the late reply.

My comments are inline.

On 07/11/2022 08:32, Philippe Gerum wrote:
>
> The results are available from [4][5] and [6][7] respectively. To sum
> up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
> isolated. Both figures are in line with the expectations on this SoM.

Yep, and this is what I would expect, so most likely something is wrong
in my configuration.

>
> To help figuring out the reason for this behavior with latmus on your
> test board, you may want to share your .config. However, I don't think
> the results you observed with cyclictest are relevant to EVL.
>

Please note that I use multi_v7_defconfig. I guess you use something
else.

Here[1] you can find the config.gz, which I pulled from the board.

These are the components I use:

    root@multi-v7-ml:~# cat /proc/version
    Linux version 5.15.64-evl-student-g96fa4a8fcde6 (student@HP-ZBook-15-G4-2) (arm-resy-linux-gnueabi-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #1 SMP IRQPIPE Fri Nov 4 01:10:57 CET 2022

    git show 96fa4a8fcde6d9361777daf05caaa527a44e418e
    commit 96fa4a8fcde6d9361777daf05caaa527a44e418e (HEAD -> v5.15.64-evl3-rebase_LOCAL)
    Author: student <student@ReliableEmbeddedSystems.com>
    Date:   Fri Nov 4 01:10:35 2022 +0100

        commit not to be -dirty

    diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
    index 33572998dbbe..f770b9681a98 100644
    --- a/arch/arm/configs/multi_v7_defconfig
    +++ b/arch/arm/configs/multi_v7_defconfig
    @@ -1187,3 +1187,8 @@ CONFIG_DEBUG_FS=y
     CONFIG_CHROME_PLATFORMS=y
     CONFIG_CROS_EC=m
     CONFIG_CROS_EC_CHARDEV=m
    +CONFIG_DA9062_WATCHDOG=y
    +CONFIG_MFD_DA9062=y
    +CONFIG_REGULATOR_DA9062=y
    +CONFIG_IKCONFIG=y
    +CONFIG_IKCONFIG_PROC=y

    commit 5c75c622911d192dc8c2988679713576cff89d45 (HEAD -> r39_LOCAL, tag: r39, origin/next, origin/master, origin/HEAD, master)
    Author: Philippe Gerum <rpm@xenomai.org>
    Date:   Mon Oct 31 12:16:38 2022 +0100

        libevl r39

        Signed-off-by: Philippe Gerum <rpm@xenomai.org>

[1] https://drive.google.com/drive/folders/1yPOiKzfEgyXmdB-y4Pc4EzP15V4OtMyV?usp=sharing

Regards,

Robert
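With CONFIG_IKCONFIG_PROC=y set as in the diff above, the running kernel exposes its configuration as /proc/config.gz, so a single option can be looked up directly on the board. The sketch below is not from the thread: config_state is a hypothetical helper name, and the "=n" it prints for a missing option is only a display convention chosen here.

```shell
#!/bin/sh
# Print the state of one Kconfig option from a gzipped kernel config.
# $1: option name, $2: config file (defaults to /proc/config.gz, which
#     exists only when the kernel was built with CONFIG_IKCONFIG_PROC=y).
config_state() {
    opt=$1
    cfg=${2:-/proc/config.gz}
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }
    # Enabled options appear as "CONFIG_FOO=y" (or "=m"); disabled ones are
    # absent or listed as "# CONFIG_FOO is not set".
    val=$(gzip -dc "$cfg" | sed -n "s/^$opt=//p")
    echo "$opt=${val:-n}"
}
```

For example, `config_state CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` reads /proc/config.gz, while passing a second argument checks a config.gz copied off the target.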
* Re: cyclictest vs. latmus
From: Philippe Gerum @ 2022-11-11 9:00 UTC
To: Robert Berger; +Cc: xenomai

Robert Berger <xenomai.list@gmail.com> writes:

> Hi,
>
> Thanks for the explanations and sorry for the late reply.
>
> My comments are inline.
>
> On 07/11/2022 08:32, Philippe Gerum wrote:
>> The results are available from [4][5] and [6][7] respectively. To sum
>> up, we have ~62 µs worst-case in non-isolated mode, 37 µs when
>> isolated. Both figures are in line with the expectations on this SoM.
>
> Yep and this is what I would expect, so most likely something is wrong
> in my configuration ;
>
>> To help figuring out the reason for this behavior with latmus on your
>> test board, you may want to share your .config. However, I don't think
>> the results you observed with cyclictest are relevant to EVL.
>>
>
> Please note, that I use a multi_v7_defconfig. I guess you use
> something else.
>

No, I'm using multi_v7_defconfig as well. imx_v6_7_defconfig won't boot
that SoM.

The latency spikes you observed are related to CPU frequency scaling
(see [1]): the "performance" governor was not the default one in your
Kconfig, but the on-demand governor was, leading to dynamic frequency
adjustment. This should be reported by the following command on your
system:

    ~ # evl check
    CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=n
    CONFIG_FTRACE=y

CONFIG_FTRACE is also reported because it does have a noticeable impact
on the latency figures even when no tracer is enabled on ARM, although
the overhead is still negligible compared to actual latency killers.

Setting CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y is enough to fix your
Kconfig, as I did here.

HTH,

[1] https://evlproject.org/core/caveat/#caveat-cpufreq

--
Philippe.
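The Kconfig option above only selects the governor the kernel starts with; the governor in effect at run time can be read from the standard cpufreq sysfs files. The sketch below is not from the thread: check_governors and its optional base-directory argument are hypothetical names used for illustration.

```shell
#!/bin/sh
# List every CPU whose cpufreq governor is not "performance".
# $1: sysfs CPU directory (defaults to /sys/devices/system/cpu).
# Returns 0 when nothing is reported, 1 otherwise.
check_governors() {
    base=${1:-/sys/devices/system/cpu}
    rc=0
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # no cpufreq support for this CPU
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f#"$base"/}          # strip the base directory
            cpu=${cpu%%/*}             # keep only the "cpuN" component
            echo "$cpu: $gov"
            rc=1
        fi
    done
    return $rc
}
```

On a board still running the on-demand governor, `check_governors` would print one line per affected CPU, for instance `cpu0: ondemand`. Like `evl check`, it stays silent when there is nothing to report.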
* Re: cyclictest vs. latmus
From: Robert Berger @ 2022-11-21 22:17 UTC
To: Philippe Gerum; +Cc: xenomai

Hi,

Sorry for the late reply - I was on the road.

My comments are inline.

On 11/11/2022 10:00, Philippe Gerum wrote:

> The latency spikes you observed are related to CPU frequency scaling (see
> [1]), the "performance" governor was not the default one in your
> Kconfig, but the on-demand governor was, leading to dynamic frequency
> adjustment. This should be reported by the following command on your
> system:
>
> ~ # evl check
> CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=n
> CONFIG_FTRACE=y
>
> CONFIG_FTRACE is also reported because it does have a noticeable impact
> on the latency figures even when no tracer is enabled on ARM, although
> the overhead is still negligible compared to actual latency killers.
>
> Setting CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y is enough to fix your
> Kconfig as I did here.
>
> HTH,
>
> [1] https://evlproject.org/core/caveat/#caveat-cpufreq

Absolutely!

I fixed my kernel config and now the graphs look like those from
Xenomai 3.

Thanks!

Regards,

Robert