All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Łukasz Majewski" <lukma@nabladev.com>
To: Philippe Gerum <rpm@xenomai.org>
Cc: Giulio Moro <giulio@bela.io>, Xenomai <xenomai@lists.linux.dev>
Subject: Re: Unexpected switches to in-band
Date: Sat, 1 Nov 2025 16:59:57 +0100	[thread overview]
Message-ID: <20251101165957.12432bf4@wsk> (raw)
In-Reply-To: <87bjlnc9u5.fsf@xenomai.org>

Hi Philippe,

> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> > Hi Philippe,
> >  
> >> Hi Łukasz,
> >> 
> >> Łukasz Majewski <lukma@nabladev.com> writes:
> >>   
> >> >> > Could you get me a trace which goes back up to two secs before
> >> >> > the time reported by evl about the inband switch of the timer
> >> >> > responder? e.g.    
> >> >
> >> > Why is it 2 seconds? Is it related to latmus "refresh time"?
> >> > (which is 1s)
> >> >    
> >> 
> >> Picking this delay is purely heuristical. Such a long backlog is
> >> likely going to include the smoking gun, if any.
> >>   
> >> >> > 
> >> >> > [   75.214527] EVL: timer-responder:743 switching in-band
> >> >> > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> >> >> > 
> >> >> > In the above case, I would need a trace going back to
> >> >> > 73.214527 or so.     
> >> >> 
> >> >> Apparently the ftrace buffer was too small. I will adjust it to
> >> >> store more data and share results.    
> >> >
> >> > It is also interesting, that:
> >> >
> >> > bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >> > 16387
> >> > bash-5.2# evl trace -eirq -f
> >> > tracing enabled
> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >> > 131
> >> >
> >> > Reduces the buffer size...
> >> >    
> >> 
> >> Yeah, the evl-trace script is not that smart. I can see the issue
> >> causing this, will fix.
> >>   
> >> > Hence, I've used the "normal" ftrace with enabled evl functions -
> >> > the    
> >> 
> >> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles
> >> with the sysfs entries manually, and may be a bit naive at times.
> >> As Jan pointed out long ago, relying on the trace-cmd front-end
> >> would be a better approach than tweaking ftrace's sysfs-based
> >> controls manually at the very least. evl-trace adds EVL semantics
> >> to the tracing configuration which is handy, but the way it is
> >> done today may deserve a revamping, or maybe drop it entirely in
> >> favor of trace-cmd if we have enough flexibility there in order to
> >> introduce some EVL specifics.
> >>   
> >> > detailed information about the ftrace config is with log files:
> >> > https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
> >> >
> >> >
> >> > To extract the cpu2 specific data:
> >> > bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
> >> > cpu2.txt
> >> >
> >> > Here again the ISW happens in the cpu2 execution "hole":
> >> > [  128.466459] EVL: timer-responder:736 switching in-band
> >> > [pid=738, excpt=14, user_pc=0x7588e71985fe]
> >> >
> >> > Happens for cpu2 between:
> >> >
> >> > 11832546:          <idle>-0     [002] *..2.   128.466438:
> >> > 	sched_idle_set_state <-cpuidle_enter_state
> >> > 11832547:          <idle>-0     [002] *..2.   128.466468:
> >> > 	handle_irq_pipelined_prepare <-arch_pipeline_entry
> >> >
> >> > However, now the log is large enough to inspect what has
> >> > happened in the system for ~4s before.
> >> >
> >> >
> >> > With the logs:
> >> > timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
> >> > [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
> >> >
> >> > and after ISW:
> >> >
> >> > <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> >> > latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
> >> >
> >> > Why there is such increase in "delay" (253 vs 739 us) ? 
> >> >
> >> > It looks like we have only 2us delay as the timer has fired on
> >> > 128.466469 vs. expected 128.466467.
> >> >    
> >> 
> >> Ok, so the issue is still the same: for some reason, the
> >> timer-responder thread is taking a (minor) fault when accessing
> >> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
> >> syscall, or right after. This seems to be valid memory though,
> >> just not immediately accessible, otherwise we'd have received
> >> SEGV, which we did not.  
> >
> > Yes, I also think so - like some "deferred allocation" despite of
> > mlockall.
> >
> > Please find the logs with the below patch applied:
> > https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
> >  
> 
> Thanks. I need to analyze the fault flags next week to see if I can
> make something of these, but I just stumbled on an obviously
> significant fact when looking at the multi-CPU view: the memory
> compaction daemon is running asynchronously on other cores while the
> responder is taking the fault. Compaction is all about moving memory
> around to free larger physical regions, so updating PTEs will happen
> as a result.
> 
> Moreover, I could not reproduce the issue at hand, but I did not have
> CONFIG_COMPACTION enabled. In the meantime, could you try disabling
> this on your end, to see if the fault still appears?
> 

CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n

From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT [=n]
(i.e. it can be only enabled when PREEMPT_RT is NOT)

I've just set CONFIG_COMPACTION=n (which also disabled the
CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
v6.6.) anymore.

It looks like those (enabled by default) config options have slipped in
silently and introduced the issue...


-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

  parent reply	other threads:[~2025-11-01 16:00 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-09  4:33 Unexpected switches to in-band Giulio Moro
2025-10-09 13:17 ` Łukasz Majewski
2025-10-09 19:05   ` Giulio Moro
2025-10-10 10:24     ` Łukasz Majewski
2025-10-10 12:21       ` Giulio Moro
2025-10-10 13:08         ` Łukasz Majewski
2025-10-11  4:25           ` Giulio Moro
2025-10-11 15:55     ` Philippe Gerum
2025-10-11 16:10       ` Philippe Gerum
2025-10-11 16:47         ` Giulio Moro
2025-10-11 16:56           ` Philippe Gerum
2025-10-11 17:15           ` Philippe Gerum
2025-10-11 19:46             ` Giulio Moro
2025-10-12  8:54               ` Philippe Gerum
2025-10-12 14:44               ` Philippe Gerum
2025-10-20  7:47           ` Łukasz Majewski
2025-10-20 12:46             ` Giulio Moro
2025-10-20 14:01               ` Philippe Gerum
2025-10-21 11:13                 ` Łukasz Majewski
2025-10-23 13:54                 ` Łukasz Majewski
2025-10-26 20:04                   ` Philippe Gerum
2025-10-27 11:05                     ` Łukasz Majewski
2025-10-27 11:35                       ` Philippe Gerum
2025-10-27 12:54                         ` Łukasz Majewski
2025-10-27 16:25                       ` Łukasz Majewski
2025-10-27 18:16                         ` Giulio Moro
2025-10-27 22:42                           ` Giulio Moro
2025-10-29  9:18                         ` Philippe Gerum
2025-10-29 13:51                           ` Łukasz Majewski
2025-10-30 12:26                             ` Łukasz Majewski
2025-10-30 16:17                               ` Philippe Gerum
2025-10-31 15:56                                 ` Łukasz Majewski
2025-10-31 16:30                                   ` Philippe Gerum
2025-10-31 17:34                                     ` Jan Kiszka
2025-10-31 18:09                                       ` Philippe Gerum
2025-10-31 18:11                                         ` Philippe Gerum
2025-11-01 11:32                                           ` Łukasz Majewski
2025-11-03  7:57                                           ` Florian Bezdeka
2025-11-03  9:29                                             ` Jan Kiszka
2025-11-01 11:31                                         ` Łukasz Majewski
2025-10-31 18:13                                       ` Philippe Gerum
2025-11-01 15:59                                     ` Łukasz Majewski [this message]
2025-11-01 16:33                                       ` Giulio Moro
2025-11-03 14:06                                         ` Philippe Gerum
2025-11-04  7:53                                           ` Łukasz Majewski
2025-11-04  8:19                                             ` Philippe Gerum
2025-11-03 14:00                                       ` Philippe Gerum
2025-10-30 16:26                               ` Philippe Gerum
2025-10-11 17:43         ` Philippe Gerum
2025-10-11 15:37 ` Philippe Gerum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251101165957.12432bf4@wsk \
    --to=lukma@nabladev.com \
    --cc=giulio@bela.io \
    --cc=rpm@xenomai.org \
    --cc=xenomai@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.