From: Philippe Gerum <rpm@xenomai.org>
To: Robert Berger <xenomai.list@gmail.com>
Cc: xenomai@lists.linux.dev
Subject: Re: cyclictest vs. latmus
Date: Mon, 07 Nov 2022 08:32:48 +0100
Message-ID: <87a653p28w.fsf@xenomai.org>
In-Reply-To: <49fb01fa-abda-46eb-cf47-31d48810d7eb@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6945 bytes --]


Robert Berger <xenomai.list@gmail.com> writes:

> Hi,
>
> I run some test cases with cyclictest and cyclictest built for xenomai
> 3 for a couple of years now and want to switch to xenomai 4/evl.
>
> Looks like I managed to compile an evl kernel and the evllib and I use
> latmus instead of cyclictest (not sure if I'm doing that correctly)
> and also I compare the results against cyclictest (not sure it's right
> to do that the way I'm doing it).
>
> Anyways, sorry for the lengthy document I came up with[1] with
> histograms and questions.
>
> Here are my questions, which you can find in a nicer formatted way in
> the doc[1].
>
> = evl Kernel - CONFIG_PREEMPT_NONE - cyclictest =
>
> My understanding is, that evl works similar to xenomai 3, meaning that
> you need to compile/link an application against libevl for evl to kick
> in. I would expect figures 15 and 16 on page 10 to look like figures 3
> and 4 on page 4.
>

Linking is not enough with EVL. Besides, this is not a POSIX API, so
you would not get any silent wrapping via the real-time syscall library,
and there is no automatic bootstrap via the library constructor trick
either. IOW, an EVL application needs to explicitly attach to the core
via a call to evl_init(), and each of its real-time threads has to do
so as well by calling evl_attach_thread(); this is documented at
[1]. So unless cyclictest.c was modified to issue such calls, the
performance figures you observed would be those of threads managed by
the vanilla kernel, not by the real-time core.

To make sure you are actually running EVL threads, you may want to check
with the 'evl ps' command from libevl, e.g. this is a snapshot taken
while a latmus instance is running:

root@homelab-phytec-mira:~# evl ps -l
CPU   PID   SCHED   PRIO  ISW     CTXSW     SYS       RWA       STAT     TIMEOUT      %CPU      CPUTIME       WCHAN                 NAME
  0   407    fifo    98   0       20947     20948     0          Wt         -           0.0      0:125.134    &wf->wait             timer-responder:405
  0   408    weak     0   1       2         1         0          W          -           0.0      0:000.023    &wf->wait             test-sitter:405
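
As a rough illustration of that check, the snapshot above can be parsed
with a few lines of Python to tell whether a given thread shows up in
the EVL thread list. This is only a sketch: it assumes the column layout
shown above, with whitespace-separated fields that contain no embedded
spaces, and the helper names (parse_evl_ps, is_evl_thread) are made up
for this example.

```python
# Sketch only: assumes the 'evl ps -l' layout shown in the snapshot above,
# i.e. one header row followed by whitespace-separated fields.
SNAPSHOT = """\
CPU   PID   SCHED   PRIO  ISW     CTXSW     SYS       RWA       STAT     TIMEOUT      %CPU      CPUTIME       WCHAN                 NAME
  0   407    fifo    98   0       20947     20948     0          Wt         -           0.0      0:125.134    &wf->wait             timer-responder:405
  0   408    weak     0   1       2         1         0          W          -           0.0      0:000.023    &wf->wait             test-sitter:405
"""


def parse_evl_ps(text):
    """Return one dict per thread row, keyed by the header column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip shell prompts or anything not shaped like a row
        rows.append(dict(zip(header, fields)))
    return rows


def is_evl_thread(rows, name_prefix):
    """True if some listed thread has a NAME starting with name_prefix."""
    return any(row["NAME"].startswith(name_prefix) for row in rows)


rows = parse_evl_ps(SNAPSHOT)
print(is_evl_thread(rows, "timer-responder"))  # latmus sampling thread
print(is_evl_thread(rows, "cyclictest"))       # not in this snapshot
```

A thread that never attached to the core would simply be missing from
this list, which is the situation described above for an unmodified
cyclictest.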


The kernel configuration can be checked [2] for known latency killers as
follows:

root@homelab-phytec-mira:~# evl check
root@homelab-phytec-mira:~#

I.e. this command should be silent; otherwise, the problematic Kconfig
option(s) would be dumped to stdout.

> What’s odd is
>
> *) the outlier in the graph without load is bigger than the one with
>  load (1.3 ms vs. 550 us), which should be the opposite
>
> *) the outlier in the graph with load should be like the one in figure
>  4 which is around 10 ms, but it is more like 550 us - please note
>  that the kernel config contains CONFIG_PREEMPT_NONE=y and CONFIG_EVL=y
> Does this observation imply, that an evl kernel modifies the behavior
> of the "vanilla" Linux scheduler for processes which should run on the
> "standard/vanilla" Linux scheduler?

No, it does not. It looks like none of these figures relate to EVL
threads; they relate to regular/vanilla threads instead.

>
> = evl Kernel - CONFIG_EVL - latmus =
>
> I am not quite sure if/how latmus compares to cyclictest. Ideally I
> would like to compare histograms produced by latmus against those I
> produce with cyclictest.
>

The purpose and behavior of latmus are detailed here [3].

> Let’s have a look at graphs 17 and 18 on page 11.
>
> *) the outlier in the graph without load is bigger than the one with
>  load (780 us vs. 700 us), which should be the opposite
>
> *) xenomai 3 with cyclictest compiled for xenomai - figures 13 and 14
>  on page 9 performs significantly better than evl with latmus
> **) no load outlier: xenomai 3(cyclictest): 26 us - evl(latmus): 780 us
> **) load outlier: xenomai 3(cyclictest): 65 us - evl(latmus): 700 us
>
> *) a preempt-rt patched kernel with cyclictest - figures 9 and 10 on
>  page 7 performs significantly better than evl with latmus
> **) no load outlier: preempt-rt(cyclictest): 120 us - evl(latmus): 780 us
> **) load outlier: preempt-rt(cyclictest): 119 us - evl(latmus): 700 us
>
> *) a vanilla kernel with CONFIG_PREEMPT with cyclictest - figures 7
>  and 8 on page 6 performs similar to evl with latmus
> **) no load outlier: preempt-rt(cyclictest): 880 us - evl(latmus): 780 us
> **) load outlier: preempt-rt(cyclictest): 720 us - evl(latmus): 700 us
>
> Please check the .pdf here[1] for more details:
>

FWIW, I have a phytec mira at hand here - this is actually my main
development board for some real-time application software ATM, so I ran
a couple of short latmus tests the same way you did.

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
[    0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: PHYTEC phyBOARD-Mira QuadPlus Carrier-Board with NAND

root@homelab-phytec-mira:~# evl -v
evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100) [requires ABI 30]

The first test ran for 500 s on a non-isolated CPU (CPU0), the second
one on an isolated CPU (CPU1), both with the same stress-ng loop you
mentioned in your document running in parallel to the latmus test:

root@homelab-phytec-mira:~# latmus -gnon-isolated.gp -T500 -p 500 --histogram=1000
warming up on CPU0 (not isolated)...
RTT|  00:00:01  (user, 500 us period, priority 98, CPU0-noisol)
...

root@homelab-phytec-mira:~# latmus -gisolated.gp -T500 -p 500 --histogram=1000
warming up on CPU1...
RTT|  00:00:01  (user, 500 us period, priority 98, CPU1)
...

root@homelab-phytec-mira:~# while :; do stress-ng --cpu 12 --io 4 --vm 2 --vm-bytes=500M --fork 4 --timeout 10s; done 
stress-ng: info:  [2732] dispatching hogs: 12 cpu, 4 io, 2 vm, 4 fork
stress-ng: info:  [2732] successful run completed in 12.99s
...
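
A quick sanity check on these runs: a 500 s test sampling every 500 µs
should collect about one million samples, which is what the attached
histogram headers report (1000003 and 1000004 samples, duration
00:08:20). A minimal sketch of that arithmetic follows; the helper names
are hypothetical and only serve this example.

```python
def hhmmss_to_seconds(stamp):
    """Convert an 'hh:mm:ss' string, as found in the .gp header, to seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def expected_samples(duration_s, period_us):
    """Number of periods that fit in the test duration."""
    return duration_s * 1_000_000 // period_us


duration = hhmmss_to_seconds("00:08:20")   # duration line of the .gp header
nominal = expected_samples(duration, 500)  # 500 us sampling period
for reported in (1000003, 1000004):        # sample counts from both headers
    print(reported - nominal)              # surplus over the nominal count
```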

The results are available from [4][5] and [6][7] respectively. To sum
up, we have a worst-case latency of ~62 µs (61.378 µs measured) in
non-isolated mode and ~37 µs (36.697 µs measured) when isolated. Both
figures are in line with expectations for this SoM.
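
If you want to post-process such histogram files yourself, here is a
minimal Python sketch. It assumes the layout of the attached .gp files:
'#' header lines of the form "key: value", followed by "bucket count"
pairs, where bucket N is taken to hold the samples with a latency in
[N, N+1) µs (an assumption drawn from the attached data, not from the
latmus documentation). The function name and the excerpt are only for
illustration; the excerpt is the tail of the non-isolated histogram.

```python
def parse_latmus_gp(text):
    """Split a latmus histogram file into header fields and bucket counts."""
    header, hist = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if sep:  # header lines without a colon are ignored
                header[key.strip()] = value.strip()
            continue
        bucket, count = line.split()
        hist[int(bucket)] = int(count)
    return header, hist


# Tail of the non-isolated histogram attached to this mail.
EXCERPT = """\
# max latency: 61.378
# sample count: 1000003
50 7
51 5
52 5
53 2
54 1
55 1
56 0
57 2
58 1
59 0
60 0
61 1
"""

header, hist = parse_latmus_gp(EXCERPT)
worst_bucket = max(b for b, c in hist.items() if c > 0)
tail = sum(c for b, c in hist.items() if b >= 55)
print(worst_bucket, header["max latency"])  # highest non-empty bucket vs. header
print(tail)                                 # samples at or above 55 us
```

The highest non-empty bucket should agree with the integer part of the
"max latency" header line, which is a cheap consistency check before
comparing histograms from different tools.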

To help figure out the reason for the behavior you see with latmus on
your test board, you may want to share your .config. However, I don't
think the results you observed with cyclictest are relevant to EVL.

[1] https://evlproject.org/core/user-api/thread/#thread-services
[2] https://evlproject.org/core/commands/#evl-check-command
[3] https://evlproject.org/core/benchmarks/#latmus-timer-response-time

[4] 

[-- Attachment #2: non-isolated.gp --]
[-- Type: application/octet-stream, Size: 1287 bytes --]

# test started on: Tue Jun 18 04:49:31 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp maxcpus=4
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU0-noisol
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:06:15
# min latency: 1.000
# avg latency: 8.548
# max latency: 61.378
# sample count: 1000003
1 2416
2 44296
3 248740
4 73368
5 57164
6 55003
7 66124
8 68120
9 60403
10 51085
11 42603
12 35007
13 29758
14 25462
15 21965
16 19128
17 16399
18 13794
19 11761
20 9807
21 8395
22 6807
23 5949
24 4743
25 4025
26 3275
27 2651
28 2333
29 1827
30 1517
31 1241
32 1026
33 841
34 689
35 508
36 397
37 297
38 267
39 202
40 173
41 101
42 80
43 69
44 44
45 36
46 35
47 24
48 11
49 12
50 7
51 5
52 5
53 2
54 1
55 1
56 0
57 2
58 1
59 0
60 0
61 1

[-- Attachment #3: Type: text/plain, Size: 4 bytes --]

[5] 

[-- Attachment #4: non-isolated.png --]
[-- Type: image/png, Size: 10218 bytes --]

[-- Attachment #5: Type: text/plain, Size: 4 bytes --]

[6] 

[-- Attachment #6: isolated.gp --]
[-- Type: application/octet-stream, Size: 1108 bytes --]

# test started on: Tue Jun 18 05:10:05 2019
# Linux version 5.15.77-00705-gae6080e09d9a (rpm@pyro) (arm-linux-gnueabihf-gcc (GCC) 11.0.1 20210310 (experimental) [master revision 5987d8a79cda1069c774e5c302d5597310270026], GNU ld (Linaro_Binutils-2021.03) 2.36.50.20210310) #30 SMP PREEMPT IRQPIPE Mon Nov 7 08:55:08 CET 2022
# console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=/var/minilab/tftpboot/%s/switch/rootfs,v3,tcp isolcpus=1 evl.oobcpus=1
# libevl version: evl.0.40 -- #df5f221 (2022-11-06 13:00:54 +0100)
# sampling period: 500 microseconds
# clock gravity: 0i 6000k 6000u
# clocksource: mxc_timer1
# vDSO access: mmio
# context: user
# thread priority: 98
# thread affinity: CPU1
# C-state restricted
# duration (hhmmss): 00:08:20
# peak (hhmmss): 00:04:21
# min latency: 0.666
# avg latency: 2.453
# max latency: 36.697
# sample count: 1000004
0 22809
1 522366
2 180838
3 108240
4 60186
5 36583
6 22924
7 14890
8 9831
9 6650
10 4512
11 3258
12 2061
13 1536
14 993
15 746
16 471
17 347
18 205
19 140
20 100
21 74
22 59
23 44
24 33
25 21
26 34
27 16
28 11
29 9
30 2
31 9
32 2
33 1
34 0
35 1
36 2

[-- Attachment #7: Type: text/plain, Size: 4 bytes --]

[7] 

[-- Attachment #8: isolated.png --]
[-- Type: image/png, Size: 10576 bytes --]

[-- Attachment #9: Type: text/plain, Size: 15 bytes --]


-- 
Philippe.
