From: Philippe Gerum <rpm@xenomai.org>
To: "Łukasz Majewski" <lukma@nabladev.com>
Cc: Giulio Moro <giulio@bela.io>, Xenomai <xenomai@lists.linux.dev>
Subject: Re: Unexpected switches to in-band
Date: Sun, 26 Oct 2025 21:04:52 +0100 [thread overview]
Message-ID: <87a51djuor.fsf@xenomai.org> (raw)
In-Reply-To: <20251023155439.0170f987@wsk> ("Łukasz Majewski"'s message of "Thu, 23 Oct 2025 15:54:39 +0200")
Hi Łukasz,
Łukasz Majewski <lukma@nabladev.com> writes:
> Hi Philippe,
>
>> Giulio Moro <giulio@bela.io> writes:
>>
>> > Łukasz Majewski wrote on 20/10/2025 02:47:
>> >
>> >> Could you share which version of libevl do you use?
>> >
>> > I was using the latest release of libevl that was compatible with
>> > the kernel UAPI. Sorry I haven't provided more details on the
>> > issue; I am focusing on building an image around 6.1 for a deadline
>> > coming up next month so I haven't been able to get into tracing yet.
>> >
>> > The only additional finding I have so far is that there seems to be
>> > something on the Linux side that "breaks real-time" which affects
>> > both linux and evl. In the below list, "evl bad" means that the
>> > above mentioned ISW are observed. "Linux bad" means that I see a
>> > disproportionate number of underruns under stress on a Linux program
>> > running with SCHED_FIFO and priority 95 with a period of 360us. I
>> > understand "disproportionate" is a very subjective term but, to give
>> > an idea, over a 10 minutes test I get a couple of underruns with
>> > "linux good" and I get hundreds of underruns with "linux good"
>> >
>> > v6.12.y-evl-rebase: evl bad, linux bad
>> > v6.11.y-evl-rebase: evl bad, linux untested
>> > v6.10.y-evl-rebase: evl bad, linux untested
>> > v6.9.y-evl-rebase: evl bad, linux bad
>> > v6.6.y-evl-rebase: evl bad, linux bad
>> > v6.3.y-evl-rebase: evl bad, linux bad on startup only
>> > v6.2.y-evl-rebase: libevl r42, evl good, linux good
>> > v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
>> >
>> > Not sure if this is of any help; I hope to be able to get back on
>> > this soon. Best,
>> > Giulio
>>
>> If someone could send me the relevant portion of a trace file with a
>> 'latspot' tracepoint triggered on a latmus run, I could investigate
>> this issue. I'd need the function tracer active on all CPUs, with all
>> traces dumped to a single trace file ('evl trace -ef' should do).
>>
>
> Please find tar'ed output for the trace(s).
>
> Please, however be aware that - I've fall back to 6.6 (as it is the
> version in which I can reproduce the issue in the fastest way).
>
> Customer also reported, that they can reproduce with their SW stack the
> issue on 6.1-slts and 6.12, but it takes considerably longer than for
> 6.6 (in which I can use simple programs to "allocate" memory).
>
Ok, but 6.6 is definitely unmaintained Dovetail-wise, and has been so
for several months now. So although this issue was observed with
maintained releases too, debugging a current issue on an obsolete code
base is a fragile process nevertheless.
> I've used pretty standard set of ftrace CONFIG_* options enabled.
>
> However, it seems like there is a "hole" around the time when in-band
> switch has been reported (in dmesg) and in ftrace output.
>
> I'm going to do the same with all available tracers enabled.
>
> Last but not least, the latspot event is not present in my ftrace
> output (although I've enabled all the CONFIG_EVL*DEBUG options).
>
> Is there any special set of options to required for EVL tracig?
>
Yes, you need to enable the evl/evl_latspot tracepoint.
See [1].
> Tars with logs:
> https://nextcloud.swupdate.org/index.php/s/FgiMsHG9xG8frk3
> https://nextcloud.swupdate.org/index.php/s/XcW75xsQPMXm3zg
Ok, first observation, the logs reveal that we are in an OOM situation:
the kernel strategy is best-effort there, to keep the system in the best
possible state while sacrificing processes. But honestly, although the
VM_LOCKED pages are unevictable by definition, there are quite a few
spots in the mm which might trigger the OOM reaper, including the
inability to allocate page table information, insert new pages and so
on. Although all the memory of an oob application is committed, with its
VMAs populated once libevl has issued mlockall(), I genuinely don't know
how this fares with an OOM situation.
Anyway, I still see a common pattern between the two set of traces, the
unwanted inband switch happens during what seems to be time holes
(assuming that traces of all CPUs are merged into each log):
[ 285.166640] EVL: timer-responder:754 switching in-band [pid=756, excpt=14, user_pc=0x7be4dc59d5fe]
<idle>-0 [022] *..1. 148.913052: rcu_dyntick: Start 1 0 0x8ec
<idle>-0 [014] dN.1. 172.761906: rcu_dyntick: End 0 1 0x39c
[ 172.317009] EVL: timer-responder:743 switching in-band [pid=745, excpt=14, user_pc=0x7036049465fe]
<idle>-0 [022] *..1. 148.913052: rcu_dyntick: Start 1 0 0x8ec
<idle>-0 [014] dN.1. 172.761906: rcu_dyntick: End 0 1 0x39c
A lot can be done in 24 µs on such class of hardware, so either some
traces are missing, or something happens at hardware level which the
kernel does not know about, as it may be seen on x86 with some
uncooperative BIOS (e.g. SMIs, thermal events come to mind). Hopefully
this is not the case, but then we need to make sure that some traces are
indeed missing. If such time hole is confirmed though, then the issue
Giulio is seeing might be different.
[1] https://v4.xenomai.org/core/commands/index.html#evl-trace-command
--
Philippe.
next prev parent reply other threads:[~2025-10-26 20:10 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 4:33 Unexpected switches to in-band Giulio Moro
2025-10-09 13:17 ` Łukasz Majewski
2025-10-09 19:05 ` Giulio Moro
2025-10-10 10:24 ` Łukasz Majewski
2025-10-10 12:21 ` Giulio Moro
2025-10-10 13:08 ` Łukasz Majewski
2025-10-11 4:25 ` Giulio Moro
2025-10-11 15:55 ` Philippe Gerum
2025-10-11 16:10 ` Philippe Gerum
2025-10-11 16:47 ` Giulio Moro
2025-10-11 16:56 ` Philippe Gerum
2025-10-11 17:15 ` Philippe Gerum
2025-10-11 19:46 ` Giulio Moro
2025-10-12 8:54 ` Philippe Gerum
2025-10-12 14:44 ` Philippe Gerum
2025-10-20 7:47 ` Łukasz Majewski
2025-10-20 12:46 ` Giulio Moro
2025-10-20 14:01 ` Philippe Gerum
2025-10-21 11:13 ` Łukasz Majewski
2025-10-23 13:54 ` Łukasz Majewski
2025-10-26 20:04 ` Philippe Gerum [this message]
2025-10-27 11:05 ` Łukasz Majewski
2025-10-27 11:35 ` Philippe Gerum
2025-10-27 12:54 ` Łukasz Majewski
2025-10-27 16:25 ` Łukasz Majewski
2025-10-27 18:16 ` Giulio Moro
2025-10-27 22:42 ` Giulio Moro
2025-10-29 9:18 ` Philippe Gerum
2025-10-29 13:51 ` Łukasz Majewski
2025-10-30 12:26 ` Łukasz Majewski
2025-10-30 16:17 ` Philippe Gerum
2025-10-31 15:56 ` Łukasz Majewski
2025-10-31 16:30 ` Philippe Gerum
2025-10-31 17:34 ` Jan Kiszka
2025-10-31 18:09 ` Philippe Gerum
2025-10-31 18:11 ` Philippe Gerum
2025-11-01 11:32 ` Łukasz Majewski
2025-11-03 7:57 ` Florian Bezdeka
2025-11-03 9:29 ` Jan Kiszka
2025-11-01 11:31 ` Łukasz Majewski
2025-10-31 18:13 ` Philippe Gerum
2025-11-01 15:59 ` Łukasz Majewski
2025-11-01 16:33 ` Giulio Moro
2025-11-03 14:06 ` Philippe Gerum
2025-11-04 7:53 ` Łukasz Majewski
2025-11-04 8:19 ` Philippe Gerum
2025-11-03 14:00 ` Philippe Gerum
2025-10-30 16:26 ` Philippe Gerum
2025-10-11 17:43 ` Philippe Gerum
2025-10-11 15:37 ` Philippe Gerum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a51djuor.fsf@xenomai.org \
--to=rpm@xenomai.org \
--cc=giulio@bela.io \
--cc=lukma@nabladev.com \
--cc=xenomai@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.