From: "Łukasz Majewski" <lukma@nabladev.com>
To: Philippe Gerum <rpm@xenomai.org>
Cc: Giulio Moro <giulio@bela.io>, Xenomai <xenomai@lists.linux.dev>
Subject: Re: Unexpected switches to in-band
Date: Wed, 29 Oct 2025 14:51:25 +0100 [thread overview]
Message-ID: <20251029145125.71debaab@wsk> (raw)
In-Reply-To: <875xbyawx7.fsf@xenomai.org>
Hi Philippe,
> Hi Łukasz,
>
> Łukasz Majewski <lukma@nabladev.com> writes:
>
> > Hi Philippe,
> >
> >> > so either some
> >> > traces are missing,
> >>
> >> I will check if I can add more tracing.
> >
> > On one console:
> > ---------------
> > evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
> >
> >
> > On the second one:
> > ------------------
> > test_prog/stress_prog_NO_IO &
> > sleep 5
> > test_prog/stress_prog_NO_IO &
> >
> > echo 0 > /sys/kernel/tracing/tracing_on
> > evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
> >
> > First trace:
> > ############
> >
> > The relevant dmesg output:
> >
> > [ 75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> > excpt=14, user_pc=0x77d08492a5fe]
> >
> > When I do look into logs - there is also a "hole" when the ISW is
> > reported by dmesg.
> >
> >
> > However, there are some hints when we look around evl_latmus:
> >
> > timer-responder-745 [002] d.~2. 75.868139: evl_latspot: **
> > latency peak: 18843.649 us **
> >
> > On CPU 20:
> > stress_prog_NO_-752 [020] ...2. 75.868139: folio_add_lru_vma
> > <-__handle_mm_fault
> >
> > Which ends with folio_add_lru()...
> > And after that we do have the ~18ms latency peak.
> >
> > And the "folio_*" is a relatively new feature added to Linux kernel
> > [1].
> >
> > The logs can be fetched from:
> > https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
> >
> >
> > Second trace [*]:
> > #################
> >
> >
> > [ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
> > excpt=14, user_pc=0x72e9d35af5fe]
> >
> > <idle>-0 [008] *..1. 1610.076068: rcu_dyntick: Start 1 0 0x2ec
> > <idle>-0 [003] *.~1. 1610.102242: rcu_dyntick: Endirq 0 1 0x404
> > <idle>-0 [003] *.~2. 1610.102242: arch_handle_irq
> > <-arch_pipeline_entry <idle>-0 [003] *.~2. 1610.102242:
> > irq_to_desc <-arch_handle_irq <idle>-0 [003] *.~1. 1610.102242:
> > ma_read: mtree_load Node: 00000000c4071949 (0 18446744073709551615)
> > range: 4355-4355 <idle>-0 [003] *.~2. 1610.102242:
> > rcu_oob_prepare_lock <-mtree_load <idle>-0 [003] *.~2.
> > 1610.102242: __rcu_read_lock <-mtree_load <idle>-0 [003] *.~2.
> > 1610.102242: __rcu_read_unlock <-mtree_load
> >
> >
> > Now some questions:
> > 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
> > of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
> >
>
> Dovetail always switches up front to the oob stage before calling an
> irq handler, so this trace is normal (upgrading the execution stage
> for the current context is a lightweight operation, this is entirely
> different from switching a task from inband to oob mode which is
> heavyweight).
Thanks for the explanation.
>
> > 2. The mtree_load() is called in the OOB context... IMHO, this
> > shall be done before we do start the latmus?
> >
>
> mtree_load() is simply the lookup call into a maple tree. In this
> context, the call fetches the IRQ descriptor corresponding to the
> interrupt number in a descriptor table managed as a tree of such
> kind. No issue there.
Ok.
>
> > 3. In this case - there is also a "hole" between the ISW time
> > (1610.086961) and the ones between rcu_dyntick: Start / Endirq
> >
> >
> > Link to the full log:
> > https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
> >
> >
>
> Ok, so this trace is interesting. Since the timer responder thread was
> pinned on CPU2, the first investigation needs to focus on that CPU
> specifically. Filtering out all activity from other CPUs for now, here
> are some observations:
>
> <idle>-0 [002] *.~2. 75.864279: evl_core_tick
> <-lapic_oob_handler <idle>-0 [002] *.~2. 75.864279:
> do_clock_tick <-evl_core_tick <idle>-0 [002] *.~1.
> 75.864279: evl_timer_expire: timer=latmus_pulse_handler <idle>-0
> [002] *.~2. 75.864279: latmus_pulse_handler <-do_clock_tick
> <idle>-0 [002] *.~2. 75.864279: evl_flush_wait_locked
> <-latmus_pulse_handler <idle>-0 [002] *.~1. 75.864280:
> evl_flush_wait: wchan=&wf->wait
>
> CPU2 received an oob IRQ timer event for waking up the timer-responder
> thread which measures latency. evl_flush_wait() is the wake up call
> that posts an evl 'flag' which the responder waits on for the next
> event.
>
> <idle>-0 [002] *.~2. 75.864280: evl_insert_tnode
> <-do_clock_tick <idle>-0 [002] *.~2. 75.864280:
> evl_program_proxy_tick <-do_clock_tick <idle>-0 [002] *.~2.
> 75.864280: lapic_next_deadline <-evl_program_proxy_tick <idle>-0
> [002] *.~2. 75.864280: do_trace_write_msr <-lapic_next_deadline
> <idle>-0 [002] *.~1. 75.864280: write_msr: 6e0, value
> 340012ec56 <idle>-0 [002] *.~1. 75.864280: evl_timer_shot:
> [proxy-timer/2] at 75.864790 (delay: 510 us, 134693 cycles)
>
> On exit from the IRQ handling process, we program the timer for the
> next shot, which should be 510 us ahead. With a 1Khz frequency, this
> means that we consumed half of the time budget already, which is a
> lot. But some guy is pounding the hardware with OOM-prone activity,
> so this may make sense anyway. So far so good. The next oob tick
> should occur as close as possible to 75.864790.
>
> <idle>-0 [002] *.~1. 75.864280: local_timer_exit:
> vector=236 <idle>-0 [002] *.~1. 75.864280: irq_handler_exit:
> irq=4355 ret=handled <idle>-0 [002] *.~1. 75.864280:
> irq_pipeline_exit: irq=4355 <idle>-0 [002] *.~1. 75.864280:
> irq_enable: caller=arch_pipeline_entry+0x9f/0x110
> parent=sysvec_apic_timer_interrupt+0x12/0x20 <idle>-0 [002]
> #.~2. 75.864280: handle_irq_pipelined_finish <-arch_pipeline_entry
> <idle>-0 [002] #.~1. 75.864280: rcu_dyntick: --= 3 1 0xc
> <idle>-0 [002] #.~2. 75.864280: preempt_count_sub
> <-handle_irq_pipelined_finish <idle>-0 [002] #.~2.
> 75.864281: preempt_count_sub <-handle_irq_pipelined_finish
>
> At this point (i.e. on IRQ exit), I would expect to see a call to
> evl_schedule(), to switch to the timer-responder thread we have just
> woken up (the irq_exit_pipeline() hook in Dovetail ends up calling the
> rescheduling procedure the real-time core provides, for evl that would
> be evl_exit_irq()). It does not happen, so this must be that such
> thread did not get a chance to go back waiting on the evl flag yet,
> i.e. evl_flush_wait() did not wake up anyone. This is interesting:
> this would mean that after 500+ µs, the timer responder is still busy
> doing "something else" (in fact, some kernel activity is borrowing
> time from it). we don't have enough traces to figure out why yet.
>
> <idle>-0 [002] *.~2. 75.864791:
> generic_pipeline_irq_desc <-arch_handle_irq <idle>-0 [002]
> *.~1. 75.864791: irq_pipeline_entry: irq=4355 <idle>-0 [002]
> *.~2. 75.864791: handle_apic_irq <-generic_pipeline_irq_desc
> <idle>-0 [002] *.~2. 75.864791: handle_oob_irq
> <-handle_apic_irq <idle>-0 [002] *.~2. 75.864791: do_oob_irq
> <-handle_oob_irq <idle>-0 [002] *.~1. 75.864791:
> irq_handler_entry: irq=4355 name=Out-of-band LAPIC timer interrupt
> <idle>-0 [002] *.~2. 75.864791: lapic_oob_handler
> <-do_oob_irq <idle>-0 [002] *.~1. 75.864791:
> local_timer_entry: vector=236 <idle>-0 [002] *.~2.
> 75.864791: evl_core_tick <-lapic_oob_handler <idle>-0 [002]
> *.~2. 75.864791: do_clock_tick <-evl_core_tick
>
> As expected, the latest timer tick scheduled by evl occurs close to
> the target, only off by one microsecond (75.864791 vs 75.864790). So
> we have no timing issue here. Something else is at work.
>
> Could you get me a trace which goes back up to two secs before the
> time reported by evl about the inband switch of the timer responder?
> e.g.
>
> [ 75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> excpt=14, user_pc=0x77d08492a5fe]
>
> In the above case, I would need a trace going back to 73.214527 or so.
>
Apparently the ftrace buffer was too small. I will adjust it to store
more data and share results.
> TIA,
>
--
Best regards,
Lukasz Majewski
--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic
next prev parent reply other threads:[~2025-10-29 13:51 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 4:33 Unexpected switches to in-band Giulio Moro
2025-10-09 13:17 ` Łukasz Majewski
2025-10-09 19:05 ` Giulio Moro
2025-10-10 10:24 ` Łukasz Majewski
2025-10-10 12:21 ` Giulio Moro
2025-10-10 13:08 ` Łukasz Majewski
2025-10-11 4:25 ` Giulio Moro
2025-10-11 15:55 ` Philippe Gerum
2025-10-11 16:10 ` Philippe Gerum
2025-10-11 16:47 ` Giulio Moro
2025-10-11 16:56 ` Philippe Gerum
2025-10-11 17:15 ` Philippe Gerum
2025-10-11 19:46 ` Giulio Moro
2025-10-12 8:54 ` Philippe Gerum
2025-10-12 14:44 ` Philippe Gerum
2025-10-20 7:47 ` Łukasz Majewski
2025-10-20 12:46 ` Giulio Moro
2025-10-20 14:01 ` Philippe Gerum
2025-10-21 11:13 ` Łukasz Majewski
2025-10-23 13:54 ` Łukasz Majewski
2025-10-26 20:04 ` Philippe Gerum
2025-10-27 11:05 ` Łukasz Majewski
2025-10-27 11:35 ` Philippe Gerum
2025-10-27 12:54 ` Łukasz Majewski
2025-10-27 16:25 ` Łukasz Majewski
2025-10-27 18:16 ` Giulio Moro
2025-10-27 22:42 ` Giulio Moro
2025-10-29 9:18 ` Philippe Gerum
2025-10-29 13:51 ` Łukasz Majewski [this message]
2025-10-30 12:26 ` Łukasz Majewski
2025-10-30 16:17 ` Philippe Gerum
2025-10-31 15:56 ` Łukasz Majewski
2025-10-31 16:30 ` Philippe Gerum
2025-10-31 17:34 ` Jan Kiszka
2025-10-31 18:09 ` Philippe Gerum
2025-10-31 18:11 ` Philippe Gerum
2025-11-01 11:32 ` Łukasz Majewski
2025-11-03 7:57 ` Florian Bezdeka
2025-11-03 9:29 ` Jan Kiszka
2025-11-01 11:31 ` Łukasz Majewski
2025-10-31 18:13 ` Philippe Gerum
2025-11-01 15:59 ` Łukasz Majewski
2025-11-01 16:33 ` Giulio Moro
2025-11-03 14:06 ` Philippe Gerum
2025-11-04 7:53 ` Łukasz Majewski
2025-11-04 8:19 ` Philippe Gerum
2025-11-03 14:00 ` Philippe Gerum
2025-10-30 16:26 ` Philippe Gerum
2025-10-11 17:43 ` Philippe Gerum
2025-10-11 15:37 ` Philippe Gerum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251029145125.71debaab@wsk \
--to=lukma@nabladev.com \
--cc=giulio@bela.io \
--cc=rpm@xenomai.org \
--cc=xenomai@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.