Unexpected switches to in-band

All of lore.kernel.org
 help / color / mirror / Atom feed

* Unexpected switches to in-band
@ 2025-10-09  4:33 Giulio Moro
  2025-10-09 13:17 ` Łukasz Majewski
  2025-10-11 15:37 ` Philippe Gerum
  0 siblings, 2 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-09  4:33 UTC (permalink / raw)
  To: Xenomai

Again, PocketBeagle 2 (TI AM6254). I get errors such as the below:

debian@BeagleBone:~$ sudo latmus -m -H20 -T 500s
warming up on CPU0 (not isolated)...
RTT|  00:00:01  (user, 1000 us period, priority 98, CPU0-noisol)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|      4.957|     10.135|     22.647|       0|     0|      4.957|     22.647
RTD|      4.977|     10.107|     23.112|       0|     0|      4.957|     23.112
RTD|      0.948|     14.831|     52.911|       0|     0|      0.948|     52.911
RTD|      4.084|      9.548|     24.404|       0|     0|      0.724|    122.540
RTD|      4.195|     10.080|     50.350|       0|     0|      0.724|    122.540
---|-----------|-----------|-----------|--------|------|-----------------------
RTS|   -982.336|     11.935|  12357.665|      22|     2|      00:02:36/00:08:20

*** WARNING: unexpected switches to in-band mode detected,
              latency figures displayed are NOT reliable.
              Please submit a bug report upstream.
-- aborting

Possibly unrelated, it also happens that when I run our program, which runs with a period of 360us using 15% of CPU, it normally works OK, but occasionally I get _large_ underruns (like 3 to 5 ms). I haven't started debugging that yet, and I am not entirely sure it's EVL's fault, but I thought it may be worth mentioning it here.

This is on a kernel I put together: I applied the EVL commits that are at the tip of v6.12.y-evl-rebase on top of the TI v6.12.43-ti-arm64-r53 kernel plus the beagleboard patches. Here's a link, FYI https://github.com/giuliomoro/linux/tree/v6.12.43-ti-arm64-r53-bb-evl

I appreciate this may be a bit too far from mainline to ask for specific help, but I am wondering what the general approach to problem solution is here: what would make a the latmus thread unexpectedly switch to in-band?

Thanks,
Giulio

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-09  4:33 Unexpected switches to in-band Giulio Moro
@ 2025-10-09 13:17 ` Łukasz Majewski
  2025-10-09 19:05   ` Giulio Moro
  2025-10-11 15:37 ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-09 13:17 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai

Hi Giulio,

> Again, PocketBeagle 2 (TI AM6254). I get errors such as the below:
> 
> debian@BeagleBone:~$ sudo latmus -m -H20 -T 500s
> warming up on CPU0 (not isolated)...
> RTT|  00:00:01  (user, 1000 us period, priority 98, CPU0-noisol)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat
> best|--lat worst RTD|      4.957|     10.135|     22.647|       0|
>  0|      4.957|     22.647 RTD|      4.977|     10.107|     23.112|
>     0|     0|      4.957|     23.112 RTD|      0.948|     14.831|
> 52.911|       0|     0|      0.948|     52.911 RTD|      4.084|
> 9.548|     24.404|       0|     0|      0.724|    122.540 RTD|
> 4.195|     10.080|     50.350|       0|     0|      0.724|    122.540
> ---|-----------|-----------|-----------|--------|------|-----------------------
> RTS|   -982.336|     11.935|  12357.665|      22|     2|
> 00:02:36/00:08:20
> 
> *** WARNING: unexpected switches to in-band mode detected,
>               latency figures displayed are NOT reliable.
>               Please submit a bug report upstream.
> -- aborting
> 
> Possibly unrelated, it also happens that when I run our program,
> which runs with a period of 360us using 15% of CPU, it normally works
> OK, but occasionally I get _large_ underruns (like 3 to 5 ms). I
> haven't started debugging that yet, and I am not entirely sure it's
> EVL's fault, but I thought it may be worth mentioning it here.
> 
> This is on a kernel I put together: I applied the EVL commits that
> are at the tip of v6.12.y-evl-rebase on top of the TI
> v6.12.43-ti-arm64-r53 kernel plus the beagleboard patches. Here's a
> link, FYI
> https://github.com/giuliomoro/linux/tree/v6.12.43-ti-arm64-r53-bb-evl
> 
> I appreciate this may be a bit too far from mainline to ask for
> specific help, but I am wondering what the general approach to
> problem solution is here: what would make a the latmus thread
> unexpectedly switch to in-band?

I can report the same issue on:
https://gitlab.com/Xenomai/xenomai4/linux-evl/-/commits/v6.6.69-evl1-rebase?ref_type=tags

With guidance (and huge help) from Phillipe I'm now ftracing the
evl core with latmus to check what can cause the issue.

In my case I do use two simple test programs to allocate C++ <vector> -
10 GiB of memory each.

It seems like latmus is trying at some point access some evicted from
cache memory page...

The reported switch to in-band (the execution path) seems to be correct
and expected. The real problem seems to be that at some point in time,
the raw_copy_from_user() (at linux-evl/drivers/evl/latmus.c):1126
causes the X86_PF_USER (user mode access) + X86_PF_INSTR (fault was an
instruction fetch) exceptions.

> 
> Thanks,
> Giulio
> 


-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-09 13:17 ` Łukasz Majewski
@ 2025-10-09 19:05   ` Giulio Moro
  2025-10-10 10:24     ` Łukasz Majewski
  2025-10-11 15:55     ` Philippe Gerum
  0 siblings, 2 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-09 19:05 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Xenomai

> It seems like latmus is trying at some point access some evicted from
> cache memory page...

> In my case I do use two simple test programs to allocate C++ <vector> -

Thanks, that put me on a path to reliably reproduce this.

Swap is disabled. I, then set up the system so that the oom-killer has is easy: it kills the allocating process. This is much faster at executing than going through the list of programs and picking the one with the worst score.

echo 2 | sudo tee /proc/sys/vm/overcommit_memory
echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task

I then have a C++ program allocating 50MiB, and I run 4 or more instances of it, one per core:

while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done

Furthermore, I have four instance of dd in the background:

dd if=/dev/zero of=/dev/null

With that, I can trigger latmus's inband switch pretty reliably within seconds (e.g.: latmus -m -K -p 360). If instead of latmus I run our application, it seems to be even faster and more reliable at triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX | T_HMSIG for the thread), and sigdebug_marked() confirms it is marked as sigdebug. While running it inside gdb I can inspect the backtrace upon receving the signal and it seems to be happening in seemingly harmless places. Most of the time it happens at some depth inside evl_usleep(), sometimes it happens inside libc's sinf(), sometimes somewhere else in our rt thread. I'd guess I just see it happen at random places, so the fact that it happens more often in evl_usleep() it's just because the thread spends 85% of the time in it. Note that our application never uses raw_copy_from_user(): the only call into the kernel from the real-time thread is via evl_usleep()

It may be of interest that if I disabled (T_WOSS | T_WOLI | T_WOSX | T_HMSIG) in our application and thus it's free to keep running when receiving an ISW, I can see the number of ISW grows quickly in the first few seconds of execution to something like 20 but then remains constant . Similarly, latmus with -K seems to accumulate several (5 to 10) ISW at the beginning and then proceed without any further ISW for several minutes. They eventually occasionally occur again, but much more sparingly than in the first few seconds.

For completeness, here's the C++ program I use for testing. I attempt to allocate memory in smaller chunks and get close as close as I can to filling up system memory across the four processes before the oom kills one of them.

#include <vector>

int main()
{
	std::vector<std::vector<char>> all;
	for(unsigned int n = 0; n < 5; ++n)
	{
		all.emplace_back();
		all.back().resize(10 * 1024 * 1024);
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-09 19:05   ` Giulio Moro
@ 2025-10-10 10:24     ` Łukasz Majewski
  2025-10-10 12:21       ` Giulio Moro
  2025-10-11 15:55     ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-10 10:24 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai

Hi Giulio,

> > It seems like latmus is trying at some point access some evicted
> > from cache memory page...  
> 
> > In my case I do use two simple test programs to allocate C++
> > <vector> -  
> 
> Thanks, that put me on a path to reliably reproduce this.
> 
> Swap is disabled. I, then set up the system so that the oom-killer
> has is easy: it kills the allocating process. This is much faster at
> executing than going through the list of programs and picking the one
> with the worst score.
> 
> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
> echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
> echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
> 
> I then have a C++ program allocating 50MiB, and I run 4 or more
> instances of it, one per core: 
> while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done
> 
> Furthermore, I have four instance of dd in the background:
> 
> dd if=/dev/zero of=/dev/null
> 
> With that, I can trigger latmus's inband switch pretty reliably
> within seconds (e.g.: latmus -m -K -p 360). If instead of latmus I
> run our application, it seems to be even faster and more reliable at
> triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX |
> T_HMSIG for the thread), and sigdebug_marked() confirms it is marked
> as sigdebug. While running it inside gdb I can inspect the backtrace
> upon receving the signal and it seems to be happening in seemingly
> harmless places. Most of the time it happens at some depth inside
> evl_usleep(), sometimes it happens inside libc's sinf(), sometimes
> somewhere else in our rt thread. I'd guess I just see it happen at
> random places, so the fact that it happens more often in evl_usleep()
> it's just because the thread spends 85% of the time in it. Note that
> our application never uses raw_copy_from_user(): the only call into
> the kernel from the real-time thread is via evl_usleep()
> 

I can also confirm that the issue appears at random places. Apparently
this is determined by the affected pages content.

Another viable observation is that:

1. This issue was not observed with Xenomai 3

2. After the "first" occurrence of the in-band switch in latmus, when
it is started again, the issue cannot be reproduced any more. The
conclusion here is that the issue is with "startup" or "fresh"
configuration of memory subsystem with dovetail.

> It may be of interest that if I disabled (T_WOSS | T_WOLI | T_WOSX |
> T_HMSIG) in our application and thus it's free to keep running when
> receiving an ISW, I can see the number of ISW grows quickly in the
> first few seconds of execution to something like 20 but then remains
> constant . Similarly, latmus with -K seems to accumulate several (5
> to 10) ISW at the beginning and then proceed without any further ISW
> for several minutes. They eventually occasionally occur again, but
> much more sparingly than in the first few seconds.
> 
> For completeness, here's the C++ program I use for testing. I attempt
> to allocate memory in smaller chunks and get close as close as I can
> to filling up system memory across the four processes before the oom
> kills one of them.
> 
> #include <vector>
> 
> int main()
> {
> 	std::vector<std::vector<char>> all;
> 	for(unsigned int n = 0; n < 5; ++n)
> 	{
> 		all.emplace_back();
> 		all.back().resize(10 * 1024 * 1024);
> 	}
> 	return 0;
> }



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-10 10:24     ` Łukasz Majewski
@ 2025-10-10 12:21       ` Giulio Moro
  2025-10-10 13:08         ` Łukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Giulio Moro @ 2025-10-10 12:21 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Xenomai

> 2. After the "first" occurrence of the in-band switch in latmus, when
> it is started again, the issue cannot be reproduced any more. The
> conclusion here is that the issue is with "startup" or "fresh"
> configuration of memory subsystem with dovetail

Do you mean that only the first instance of latmus that is started after a reboot is affected by this issue, while successive instances are not?

> With guidance (and huge help) from Phillipe I'm now ftracing the
> evl core with latmus to check what can cause the issue.

Would you be able to explain your setup here to see if I can reproduce?

Thanks,
Giulio

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-10 12:21       ` Giulio Moro
@ 2025-10-10 13:08         ` Łukasz Majewski
  2025-10-11  4:25           ` Giulio Moro
  0 siblings, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-10 13:08 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai

HI Giulio,

> > 2. After the "first" occurrence of the in-band switch in latmus,
> > when it is started again, the issue cannot be reproduced any more.
> > The conclusion here is that the issue is with "startup" or "fresh"
> > configuration of memory subsystem with dovetail  
> 
> Do you mean that only the first instance of latmus that is started
> after a reboot is affected by this issue, while successive instances
> are not?

Yes. That is what I do observe.

Just to mention - I do run latmus only on cpu2:
latmus -c 2 -p 1000 -P 99

Test programs are run on other cores as well

> 
> > With guidance (and huge help) from Phillipe I'm now ftracing the
> > evl core with latmus to check what can cause the issue.  
> 
> Would you be able to explain your setup here to see if I can
> reproduce?

Basically I do run in one ssh connected console:

evl trace -e 2 
latmus -c 2 -p 1000 -P 99
...
...



On the other ssh connected console:
<test_program_10GiB_vector> &
sleep 5;
<test_program_10GiB_vector> &

Then in latmus console I do see the switch to in-band notification.

From dmesg also I can read:
[   94.685063] EVL: timer-responder:745 switching in-band [pid=747,
excpt=14, user_pc=0x797b51227f20]


And te 94.685063 timestamp is crucial here. Unfortunately the ftrace
output doesn't match with it - i.e. there is no operation "around" this
time (there are ones 70 ms before and ~20 ms after).
This I'm investigating now.

> 
> Thanks,
> Giulio



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-10 13:08         ` Łukasz Majewski
@ 2025-10-11  4:25           ` Giulio Moro
  0 siblings, 0 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-11  4:25 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Xenomai

I've tried the same base kernel using cobalt v3.3.1 + dovetail instead of evl+dovetail and the issues are pretty similar, both when using `latency` and our application. Is there a modern-ish ipipe patch anywhere? I do not see it in the current documentation ...
Best,
Giulio

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-09  4:33 Unexpected switches to in-band Giulio Moro
  2025-10-09 13:17 ` Łukasz Majewski
@ 2025-10-11 15:37 ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 15:37 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai


Hi Giulio,

Giulio Moro <giulio@bela.io> writes:

> Again, PocketBeagle 2 (TI AM6254). I get errors such as the below:
>
> debian@BeagleBone:~$ sudo latmus -m -H20 -T 500s
> warming up on CPU0 (not isolated)...
> RTT|  00:00:01  (user, 1000 us period, priority 98, CPU0-noisol)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD|      4.957|     10.135|     22.647|       0|     0|      4.957|     22.647
> RTD|      4.977|     10.107|     23.112|       0|     0|      4.957|     23.112
> RTD|      0.948|     14.831|     52.911|       0|     0|      0.948|     52.911
> RTD|      4.084|      9.548|     24.404|       0|     0|      0.724|    122.540
> RTD|      4.195|     10.080|     50.350|       0|     0|      0.724|    122.540
> ---|-----------|-----------|-----------|--------|------|-----------------------
> RTS|   -982.336|     11.935|  12357.665|      22|     2|      00:02:36/00:08:20
>
> *** WARNING: unexpected switches to in-band mode detected,
>              latency figures displayed are NOT reliable.
>              Please submit a bug report upstream.
> -- aborting

Could you please checkout out the documentation regarding the 'latspot'
(f)tracepoint which the evl core implements [1]? This tracepoint is
specifically designed to help in finding the reason for delays in the
latmus program (you could use it in your own app as well).

>
> Possibly unrelated, it also happens that when I run our program, which runs with a period of 360us using 15% of CPU, it normally works OK, but occasionally I get _large_ underruns (like 3 to 5 ms). I haven't started debugging that yet, and I am not entirely sure it's EVL's fault, but I thought it may be worth mentioning it here.
>
> This is on a kernel I put together: I applied the EVL commits that are at the tip of v6.12.y-evl-rebase on top of the TI v6.12.43-ti-arm64-r53 kernel plus the beagleboard patches. Here's a link, FYI https://github.com/giuliomoro/linux/tree/v6.12.43-ti-arm64-r53-bb-evl
>
> I appreciate this may be a bit too far from mainline to ask for specific help, but I am wondering what the general approach to problem solution is here: what would make a the latmus thread unexpectedly switch to in-band?
>

Something happening in kernel space and/or at machine level (CPUfreq
event, NMI and so on), so the best way is to enable ftrace to track this
issue down.

> Thanks,
> Giulio

[1] https://v4.xenomai.org/core/commands/index.html

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-09 19:05   ` Giulio Moro
  2025-10-10 10:24     ` Łukasz Majewski
@ 2025-10-11 15:55     ` Philippe Gerum
  2025-10-11 16:10       ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 15:55 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Giulio Moro <giulio@bela.io> writes:

>> It seems like latmus is trying at some point access some evicted from
>> cache memory page...
>
>> In my case I do use two simple test programs to allocate C++ <vector> -
>
> Thanks, that put me on a path to reliably reproduce this.
>
> Swap is disabled. I, then set up the system so that the oom-killer has is easy: it kills the allocating process. This is much faster at executing than going through the list of programs and picking the one with the worst score.
>
> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
> echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
> echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
>
> I then have a C++ program allocating 50MiB, and I run 4 or more instances of it, one per core:
>  while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done
>
> Furthermore, I have four instance of dd in the background:
>
> dd if=/dev/zero of=/dev/null
>
> With that, I can trigger latmus's inband switch pretty reliably within seconds (e.g.: latmus -m -K -p 360). If instead of latmus I run our application, it seems to be even faster and more reliable at triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX | T_HMSIG for the thread), and sigdebug_marked() confirms it is marked as sigdebug. While running it inside gdb I can inspect the backtrace upon receving the signal and it seems to be happening in seemingly harmless places. Most of the time it happens at some depth inside evl_usleep(), sometimes it happens inside libc's sinf(), sometimes somewhere else in our rt thread. I'd guess I just see it happen at random places, so the fact that it happens more often in evl_usleep() it's just because the thread spends 85% of the time in it. Note that our application never uses raw_copy_from_user(): the only call into the kernel from the real-time thread is via evl_usleep()
>
> It may be of interest that if I disabled (T_WOSS | T_WOLI | T_WOSX | T_HMSIG) in our application and thus it's free to keep running when receiving an ISW, I can see the number of ISW grows quickly in the first few seconds of execution to something like 20 but then remains constant . Similarly, latmus with -K seems to accumulate several (5 to 10) ISW at the beginning and then proceed without any further ISW for several minutes. They eventually occasionally occur again, but much more sparingly than in the first few seconds.
>
> For completeness, here's the C++ program I use for testing. I attempt to allocate memory in smaller chunks and get close as close as I can to filling up system memory across the four processes before the oom kills one of them.
>
> #include <vector>
>
> int main()
> {
> 	std::vector<std::vector<char>> all;
> 	for(unsigned int n = 0; n < 5; ++n)
> 	{
> 		all.emplace_back();
> 		all.back().resize(10 * 1024 * 1024);
> 	}
> 	return 0;
> }

Ok, so it looks like both of you observe the same issue, which seems to
be arch-independent. Checking the code which takes care of preventing
COW for oob-enabled processes (a behavior which may cause unwanted minor
faults in a tricky way), I stumbled upon a really bad bug. Could you
check whether the patch below helps?

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 73de18353e79..c6b1efcbd833 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 
 #define MMF_VM_MERGE_ANY	30
 #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
-#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
+#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
 
 #define MMF_TOPDOWN		31	/* mm searches top down by default */
 #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)

-- 
Philippe.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 15:55     ` Philippe Gerum
@ 2025-10-11 16:10       ` Philippe Gerum
  2025-10-11 16:47         ` Giulio Moro
  2025-10-11 17:43         ` Philippe Gerum
  0 siblings, 2 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 16:10 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Philippe Gerum <rpm@xenomai.org> writes:

> Giulio Moro <giulio@bela.io> writes:
>
>>> It seems like latmus is trying at some point access some evicted from
>>> cache memory page...
>>
>>> In my case I do use two simple test programs to allocate C++ <vector> -
>>
>> Thanks, that put me on a path to reliably reproduce this.
>>
>> Swap is disabled. I, then set up the system so that the oom-killer has is easy: it kills the allocating process. This is much faster at executing than going through the list of programs and picking the one with the worst score.
>>
>> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
>> echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
>> echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
>>
>> I then have a C++ program allocating 50MiB, and I run 4 or more instances of it, one per core:
>>  while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done
>>
>> Furthermore, I have four instance of dd in the background:
>>
>> dd if=/dev/zero of=/dev/null
>>
>> With that, I can trigger latmus's inband switch pretty reliably within seconds (e.g.: latmus -m -K -p 360). If instead of latmus I run our application, it seems to be even faster and more reliable at triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX | T_HMSIG for the thread), and sigdebug_marked() confirms it is marked as sigdebug. While running it inside gdb I can inspect the backtrace upon receving the signal and it seems to be happening in seemingly harmless places. Most of the time it happens at some depth inside evl_usleep(), sometimes it happens inside libc's sinf(), sometimes somewhere else in our rt thread. I'd guess I just see it happen at random places, so the fact that it happens more often in evl_usleep() it's just because the thread spends 85% of the time in it. Note that our application never uses raw_copy_from_user(): the only call into the kernel from the real-time thread is via evl_usleep()
>>
>> It may be of interest that if I disabled (T_WOSS | T_WOLI | T_WOSX | T_HMSIG) in our application and thus it's free to keep running when receiving an ISW, I can see the number of ISW grows quickly in the first few seconds of execution to something like 20 but then remains constant . Similarly, latmus with -K seems to accumulate several (5 to 10) ISW at the beginning and then proceed without any further ISW for several minutes. They eventually occasionally occur again, but much more sparingly than in the first few seconds.
>>
>> For completeness, here's the C++ program I use for testing. I attempt to allocate memory in smaller chunks and get close as close as I can to filling up system memory across the four processes before the oom kills one of them.
>>
>> #include <vector>
>>
>> int main()
>> {
>> 	std::vector<std::vector<char>> all;
>> 	for(unsigned int n = 0; n < 5; ++n)
>> 	{
>> 		all.emplace_back();
>> 		all.back().resize(10 * 1024 * 1024);
>> 	}
>> 	return 0;
>> }
>
> Ok, so it looks like both of you observe the same issue, which seems to
> be arch-independent. Checking the code which takes care of preventing
> COW for oob-enabled processes (a behavior which may cause unwanted minor
> faults in a tricky way), I stumbled upon a really bad bug. Could you
> check whether the patch below helps?
>
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 73de18353e79..c6b1efcbd833 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>  
>  #define MMF_VM_MERGE_ANY	30
>  #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>  
>  #define MMF_TOPDOWN		31	/* mm searches top down by default */
>  #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)

Note: this issue only affects kernels from v6.10 and on

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 16:10       ` Philippe Gerum
@ 2025-10-11 16:47         ` Giulio Moro
  2025-10-11 16:56           ` Philippe Gerum
                             ` (2 more replies)
  2025-10-11 17:43         ` Philippe Gerum
  1 sibling, 3 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-11 16:47 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Łukasz Majewski, Xenomai

I realised that gitlab's v6.12.y-evl-rebase and v6.1.y-cip-evl-rebase can boot as-is on the PocketBeagle 2 and so I'll keep using those for tests going forward, without any third-party patches.

>> Ok, so it looks like both of you observe the same issue, which seems to
>> be arch-independent. Checking the code which takes care of preventing
>> COW for oob-enabled processes (a behavior which may cause unwanted minor
>> faults in a tricky way), I stumbled upon a really bad bug. Could you
>> check whether the patch below helps?
>>
>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>> index 73de18353e79..c6b1efcbd833 100644
>> --- a/include/linux/sched/coredump.h
>> +++ b/include/linux/sched/coredump.h
>> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>>   
>>   #define MMF_VM_MERGE_ANY	30
>>   #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
>> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
>> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>>   
>>   #define MMF_TOPDOWN		31	/* mm searches top down by default */
>>   #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)

Tried that on top of  v6.12.y-evl-rebase , it doesn't seem to fix the problem, neither for latmus nor for our application.

> Note: this issue only affects kernels from v6.10 and on

I can confirm that on v6.1.y-cip-evl-rebase I don't see that issue, even though MMF_DOVETAILED is 31 also there.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 16:47         ` Giulio Moro
@ 2025-10-11 16:56           ` Philippe Gerum
  2025-10-11 17:15           ` Philippe Gerum
  2025-10-20  7:47           ` Łukasz Majewski
  2 siblings, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 16:56 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Giulio Moro <giulio@bela.io> writes:

> I realised that gitlab's v6.12.y-evl-rebase and v6.1.y-cip-evl-rebase can boot as-is on the PocketBeagle 2 and so I'll keep using those for tests going forward, without any third-party patches.
>
>>> Ok, so it looks like both of you observe the same issue, which seems to
>>> be arch-independent. Checking the code which takes care of preventing
>>> COW for oob-enabled processes (a behavior which may cause unwanted minor
>>> faults in a tricky way), I stumbled upon a really bad bug. Could you
>>> check whether the patch below helps?
>>>
>>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>>> index 73de18353e79..c6b1efcbd833 100644
>>> --- a/include/linux/sched/coredump.h
>>> +++ b/include/linux/sched/coredump.h
>>> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>>>     #define MMF_VM_MERGE_ANY	30
>>>   #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
>>> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
>>> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>>>     #define MMF_TOPDOWN		31	/* mm searches top
>>> down by default */
>>>   #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)
>
> Tried that on top of  v6.12.y-evl-rebase , it doesn't seem to fix the problem, neither for latmus nor for our application.
>
>> Note: this issue only affects kernels from v6.10 and on
>
> I can confirm that on v6.1.y-cip-evl-rebase I don't see that issue, even though MMF_DOVETAILED is 31 also there.

Meanwhile, I've been trying to reproduce on x86 and armv8 (odroid c4)
using the posted recipe, no luck so far. I'll check armv7 as well.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 16:47         ` Giulio Moro
  2025-10-11 16:56           ` Philippe Gerum
@ 2025-10-11 17:15           ` Philippe Gerum
  2025-10-11 19:46             ` Giulio Moro
  2025-10-20  7:47           ` Łukasz Majewski
  2 siblings, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 17:15 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Giulio Moro <giulio@bela.io> writes:

> I realised that gitlab's v6.12.y-evl-rebase and v6.1.y-cip-evl-rebase can boot as-is on the PocketBeagle 2 and so I'll keep using those for tests going forward, without any third-party patches.
>
>>> Ok, so it looks like both of you observe the same issue, which seems to
>>> be arch-independent. Checking the code which takes care of preventing
>>> COW for oob-enabled processes (a behavior which may cause unwanted minor
>>> faults in a tricky way), I stumbled upon a really bad bug. Could you
>>> check whether the patch below helps?
>>>
>>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>>> index 73de18353e79..c6b1efcbd833 100644
>>> --- a/include/linux/sched/coredump.h
>>> +++ b/include/linux/sched/coredump.h
>>> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>>>     #define MMF_VM_MERGE_ANY	30
>>>   #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
>>> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
>>> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>>>     #define MMF_TOPDOWN		31	/* mm searches top
>>> down by default */
>>>   #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)
>
> Tried that on top of  v6.12.y-evl-rebase , it doesn't seem to fix the problem, neither for latmus nor for our application.
>
>> Note: this issue only affects kernels from v6.10 and on
>
> I can confirm that on v6.1.y-cip-evl-rebase I don't see that issue, even though MMF_DOVETAILED is 31 also there.

Quick check: does "evl check" run without any output (as expected)?

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 16:10       ` Philippe Gerum
  2025-10-11 16:47         ` Giulio Moro
@ 2025-10-11 17:43         ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-11 17:43 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Philippe Gerum <rpm@xenomai.org> writes:

> Philippe Gerum <rpm@xenomai.org> writes:
>
>> Giulio Moro <giulio@bela.io> writes:
>>
>>>> It seems like latmus is trying at some point access some evicted from
>>>> cache memory page...
>>>
>>>> In my case I do use two simple test programs to allocate C++ <vector> -
>>>
>>> Thanks, that put me on a path to reliably reproduce this.
>>>
>>> Swap is disabled. I, then set up the system so that the oom-killer has is easy: it kills the allocating process. This is much faster at executing than going through the list of programs and picking the one with the worst score.
>>>
>>> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
>>> echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
>>> echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
>>>
>>> I then have a C++ program allocating 50MiB, and I run 4 or more instances of it, one per core:
>>>  while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done
>>>
>>> Furthermore, I have four instance of dd in the background:
>>>
>>> dd if=/dev/zero of=/dev/null
>>>
>>> With that, I can trigger latmus's inband switch pretty reliably within seconds (e.g.: latmus -m -K -p 360). If instead of latmus I run our application, it seems to be even faster and more reliable at triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX | T_HMSIG for the thread), and sigdebug_marked() confirms it is marked as sigdebug. While running it inside gdb I can inspect the backtrace upon receving the signal and it seems to be happening in seemingly harmless places. Most of the time it happens at some depth inside evl_usleep(), sometimes it happens inside libc's sinf(), sometimes somewhere else in our rt thread. I'd guess I just see it happen at random places, so the fact that it happens more often in evl_usleep() it's just because the thread spends 85% of the time in it. Note that our application never uses raw_copy_from_user(): the only call into the kernel from the real-time thread is via evl_usleep()
>>>
>>> It may be of interest that if I disabled (T_WOSS | T_WOLI | T_WOSX | T_HMSIG) in our application and thus it's free to keep running when receiving an ISW, I can see the number of ISW grows quickly in the first few seconds of execution to something like 20 but then remains constant . Similarly, latmus with -K seems to accumulate several (5 to 10) ISW at the beginning and then proceed without any further ISW for several minutes. They eventually occasionally occur again, but much more sparingly than in the first few seconds.
>>>
>>> For completeness, here's the C++ program I use for testing. I attempt to allocate memory in smaller chunks and get close as close as I can to filling up system memory across the four processes before the oom kills one of them.
>>>
>>> #include <vector>
>>>
>>> int main()
>>> {
>>> 	std::vector<std::vector<char>> all;
>>> 	for(unsigned int n = 0; n < 5; ++n)
>>> 	{
>>> 		all.emplace_back();
>>> 		all.back().resize(10 * 1024 * 1024);
>>> 	}
>>> 	return 0;
>>> }
>>
>> Ok, so it looks like both of you observe the same issue, which seems to
>> be arch-independent. Checking the code which takes care of preventing
>> COW for oob-enabled processes (a behavior which may cause unwanted minor
>> faults in a tricky way), I stumbled upon a really bad bug. Could you
>> check whether the patch below helps?
>>
>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>> index 73de18353e79..c6b1efcbd833 100644
>> --- a/include/linux/sched/coredump.h
>> +++ b/include/linux/sched/coredump.h
>> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>>  
>>  #define MMF_VM_MERGE_ANY	30
>>  #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
>> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
>> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>>  
>>  #define MMF_TOPDOWN		31	/* mm searches top down by default */
>>  #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)
>
> Note: this issue only affects kernels from v6.10 and on

Actually, this issue only affects v6.12 among the kernel releases we
currently maintain, v6.16 is fine.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 17:15           ` Philippe Gerum
@ 2025-10-11 19:46             ` Giulio Moro
  2025-10-12  8:54               ` Philippe Gerum
  2025-10-12 14:44               ` Philippe Gerum
  0 siblings, 2 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-11 19:46 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

> Quick check: does "evl check" run without any output (as expected)?

Yes

Further update: tested v6.16-evl-rebase and it seems to have the same in-band switching issue. So far the only good one was v6.1.y-cip-evl-rebase.

> Meanwhile, I've been trying to reproduce on x86 and armv8 (odroid c4) using the posted recipe, no luck so far.

Thanks for giving it a try. Wanna have a look at/try my config? https://github.com/giuliomoro/linux/blob/v6.12.y-evl-rebase/.config

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 19:46             ` Giulio Moro
@ 2025-10-12  8:54               ` Philippe Gerum
  2025-10-12 14:44               ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-12  8:54 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai

Giulio Moro <giulio@bela.io> writes:

>> Quick check: does "evl check" run without any output (as expected)?
>
> Yes
>
> Further update: tested v6.16-evl-rebase and it seems to have the same in-band switching issue. So far the only good one was v6.1.y-cip-evl-rebase.
>
>> Meanwhile, I've been trying to reproduce on x86 and armv8 (odroid c4) using the posted recipe, no luck so far.
>
> Thanks for giving it a try. Wanna have a look at/try my config? https://github.com/giuliomoro/linux/blob/v6.12.y-evl-rebase/.config

Yep, thanks.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 19:46             ` Giulio Moro
  2025-10-12  8:54               ` Philippe Gerum
@ 2025-10-12 14:44               ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-12 14:44 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Xenomai

Giulio Moro <giulio@bela.io> writes:

>> Quick check: does "evl check" run without any output (as expected)?
>
> Yes
>
> Further update: tested v6.16-evl-rebase and it seems to have the same in-band switching issue. So far the only good one was v6.1.y-cip-evl-rebase.
>
>> Meanwhile, I've been trying to reproduce on x86 and armv8 (odroid c4) using the posted recipe, no luck so far.
>
> Thanks for giving it a try. Wanna have a look at/try my config? https://github.com/giuliomoro/linux/blob/v6.12.y-evl-rebase/.config

I could not reproduce this issue yet, neither on x86 nor arm64, trying a
bunch of configurations of linux-evl/v6.12.49. At this point, the best
approach would be to track down the latency spot on the TI board using
the 'latspot' (f)tracepoint which latmus triggers each time a new
latency maximum is reached, so that we might find out the reason for
taking those minor faults which are causing in-band switches. There is
some documentation about its usage at [1].

[1] https://v4.xenomai.org/core/commands/index.html

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-11 16:47         ` Giulio Moro
  2025-10-11 16:56           ` Philippe Gerum
  2025-10-11 17:15           ` Philippe Gerum
@ 2025-10-20  7:47           ` Łukasz Majewski
  2025-10-20 12:46             ` Giulio Moro
  2 siblings, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-20  7:47 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Philippe Gerum, Xenomai

Hi Giulio,

> I realised that gitlab's v6.12.y-evl-rebase and v6.1.y-cip-evl-rebase
> can boot as-is on the PocketBeagle 2 and so I'll keep using those for
> tests going forward, without any third-party patches.
> 
> >> Ok, so it looks like both of you observe the same issue, which
> >> seems to be arch-independent. Checking the code which takes care
> >> of preventing COW for oob-enabled processes (a behavior which may
> >> cause unwanted minor faults in a tricky way), I stumbled upon a
> >> really bad bug. Could you check whether the patch below helps?
> >>
> >> diff --git a/include/linux/sched/coredump.h
> >> b/include/linux/sched/coredump.h index 73de18353e79..c6b1efcbd833
> >> 100644 --- a/include/linux/sched/coredump.h
> >> +++ b/include/linux/sched/coredump.h
> >> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct
> >> *mm) 
> >>   #define MMF_VM_MERGE_ANY	30
> >>   #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
> >> -#define MMF_DOVETAILED		31	/* mm belongs to
> >> a dovetailed process */ +#define MMF_DOVETAILED
> >> 18	/* mm belongs to a dovetailed process */ 
> >>   #define MMF_TOPDOWN		31	/* mm searches top
> >> down by default */ #define MMF_TOPDOWN_MASK	(1 <<
> >> MMF_TOPDOWN)  
> 
> Tried that on top of  v6.12.y-evl-rebase , it doesn't seem to fix the
> problem, neither for latmus nor for our application.
> 
> > Note: this issue only affects kernels from v6.10 and on  
> 
> I can confirm that on v6.1.y-cip-evl-rebase I don't see that issue,
> even though MMF_DOVETAILED is 31 also there.

Could you share which version of libevl do you use?

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-20  7:47           ` Łukasz Majewski
@ 2025-10-20 12:46             ` Giulio Moro
  2025-10-20 14:01               ` Philippe Gerum
  0 siblings, 1 reply; 50+ messages in thread
From: Giulio Moro @ 2025-10-20 12:46 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Philippe Gerum, Xenomai

Łukasz Majewski wrote on 20/10/2025 02:47:

> Could you share which version of libevl do you use?

I was using the latest release of libevl that was compatible with the kernel UAPI.
Sorry I haven't provided more details on the issue; I am focusing on building an image around 6.1 for a deadline coming up next month so I haven't been able to get into tracing yet.

The only additional finding I have so far is that there seems to be something on the Linux side that "breaks real-time" which affects both linux and evl. In the below list, "evl bad" means that the above mentioned ISW are observed. "Linux bad" means that I see a disproportionate number of underruns under stress on a Linux program running with SCHED_FIFO and priority 95 with a period of 360us. I understand "disproportionate" is a very subjective term but, to give an idea, over a 10 minutes test I get a couple of underruns with "linux good" and I get hundreds of underruns with "linux good"

v6.12.y-evl-rebase: evl bad, linux bad
v6.11.y-evl-rebase: evl bad, linux untested
v6.10.y-evl-rebase: evl bad, linux untested
v6.9.y-evl-rebase: evl bad, linux bad
v6.6.y-evl-rebase: evl bad, linux bad
v6.3.y-evl-rebase: evl bad, linux bad on startup only
v6.2.y-evl-rebase: libevl r42, evl good, linux good
v6.1.y-cip-evl-rebase: libevl master, evl good, linux good

Not sure if this is of any help; I hope to be able to get back on this soon.
Best,
Giulio

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-20 12:46             ` Giulio Moro
@ 2025-10-20 14:01               ` Philippe Gerum
  2025-10-21 11:13                 ` Łukasz Majewski
  2025-10-23 13:54                 ` Łukasz Majewski
  0 siblings, 2 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-20 14:01 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Giulio Moro <giulio@bela.io> writes:

> Łukasz Majewski wrote on 20/10/2025 02:47:
>
>> Could you share which version of libevl do you use?
>
> I was using the latest release of libevl that was compatible with the kernel UAPI.
> Sorry I haven't provided more details on the issue; I am focusing on building an image around 6.1 for a deadline coming up next month so I haven't been able to get into tracing yet.
>
> The only additional finding I have so far is that there seems to be
> something on the Linux side that "breaks real-time" which affects both
> linux and evl. In the below list, "evl bad" means that the above
> mentioned ISW are observed. "Linux bad" means that I see a
> disproportionate number of underruns under stress on a Linux program
> running with SCHED_FIFO and priority 95 with a period of 360us. I
> understand "disproportionate" is a very subjective term but, to give
> an idea, over a 10 minutes test I get a couple of underruns with
> "linux good" and I get hundreds of underruns with "linux good"
>
> v6.12.y-evl-rebase: evl bad, linux bad
> v6.11.y-evl-rebase: evl bad, linux untested
> v6.10.y-evl-rebase: evl bad, linux untested
> v6.9.y-evl-rebase: evl bad, linux bad
> v6.6.y-evl-rebase: evl bad, linux bad
> v6.3.y-evl-rebase: evl bad, linux bad on startup only
> v6.2.y-evl-rebase: libevl r42, evl good, linux good
> v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
>
> Not sure if this is of any help; I hope to be able to get back on this soon.
> Best,
> Giulio

If someone could send me the relevant portion of a trace file with a
'latspot' tracepoint triggered on a latmus run, I could investigate this
issue. I'd need the function tracer active on all CPUs, with all traces
dumped to a single trace file ('evl trace -ef' should do).

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-20 14:01               ` Philippe Gerum
@ 2025-10-21 11:13                 ` Łukasz Majewski
  2025-10-23 13:54                 ` Łukasz Majewski
  1 sibling, 0 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-21 11:13 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe, Giulio,

> Giulio Moro <giulio@bela.io> writes:
> 
> > Łukasz Majewski wrote on 20/10/2025 02:47:
> >  
> >> Could you share which version of libevl do you use?  
> >
> > I was using the latest release of libevl that was compatible with
> > the kernel UAPI. Sorry I haven't provided more details on the
> > issue; I am focusing on building an image around 6.1 for a deadline
> > coming up next month so I haven't been able to get into tracing yet.
> >
> > The only additional finding I have so far is that there seems to be
> > something on the Linux side that "breaks real-time" which affects
> > both linux and evl. In the below list, "evl bad" means that the
> > above mentioned ISW are observed. "Linux bad" means that I see a
> > disproportionate number of underruns under stress on a Linux program
> > running with SCHED_FIFO and priority 95 with a period of 360us. I
> > understand "disproportionate" is a very subjective term but, to give
> > an idea, over a 10 minutes test I get a couple of underruns with
> > "linux good" and I get hundreds of underruns with "linux good"
> >
> > v6.12.y-evl-rebase: evl bad, linux bad

The evl ps -l -> gives value of ISW 0 all the time.

> > v6.11.y-evl-rebase: evl bad, linux untested
> > v6.10.y-evl-rebase: evl bad, linux untested
> > v6.9.y-evl-rebase: evl bad, linux bad
> > v6.6.y-evl-rebase: evl bad, linux bad
> > v6.3.y-evl-rebase: evl bad, linux bad on startup only
> > v6.2.y-evl-rebase: libevl r42, evl good, linux good
> > v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
> >
> > Not sure if this is of any help; I hope to be able to get back on
> > this soon. Best,
> > Giulio  

When switching to 6.12.y-evl-rebase [1] (including the MMF_DOVETAIL bit
fix with libevl r55) I cannot catch the issue anymore. For comparison -
I've fallback to 6.6 (with r50 libevl) with the same setup and I can
catch it.

Giulio, on what system do you run the test setup:

echo 2 | sudo tee /proc/sys/vm/overcommit_memory
echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task

I then have a C++ program allocating 50MiB, and I run 4 or more
instances of it, one per core:
  
while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done

Furthermore, I have four instance of dd in the background:

dd if=/dev/zero of=/dev/null

With that, I can trigger latmus's inband switch pretty reliably within
seconds (e.g.: latmus -m -K -p 360)

As on mine the write to:
echo 0 | sudo tee /proc/sys/vm/overcommit_ratio

just makes the system unusable:
./run_tst.sh: fork: Cannot allocate memory



> 
> If someone could send me the relevant portion of a trace file with a
> 'latspot' tracepoint triggered on a latmus run, I could investigate
> this issue. I'd need the function tracer active on all CPUs, with all
> traces dumped to a single trace file ('evl trace -ef' should do).
> 

I strive to have the issue reproductible on 6.12.

Links:
[1] -
https://gitlab.com/Xenomai/xenomai4/linux-evl/-/commits/v6.12.y-evl-rebase?ref_type=heads

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-20 14:01               ` Philippe Gerum
  2025-10-21 11:13                 ` Łukasz Majewski
@ 2025-10-23 13:54                 ` Łukasz Majewski
  2025-10-26 20:04                   ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-23 13:54 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Giulio Moro <giulio@bela.io> writes:
> 
> > Łukasz Majewski wrote on 20/10/2025 02:47:
> >  
> >> Could you share which version of libevl do you use?  
> >
> > I was using the latest release of libevl that was compatible with
> > the kernel UAPI. Sorry I haven't provided more details on the
> > issue; I am focusing on building an image around 6.1 for a deadline
> > coming up next month so I haven't been able to get into tracing yet.
> >
> > The only additional finding I have so far is that there seems to be
> > something on the Linux side that "breaks real-time" which affects
> > both linux and evl. In the below list, "evl bad" means that the
> > above mentioned ISW are observed. "Linux bad" means that I see a
> > disproportionate number of underruns under stress on a Linux program
> > running with SCHED_FIFO and priority 95 with a period of 360us. I
> > understand "disproportionate" is a very subjective term but, to give
> > an idea, over a 10 minutes test I get a couple of underruns with
> > "linux good" and I get hundreds of underruns with "linux good"
> >
> > v6.12.y-evl-rebase: evl bad, linux bad
> > v6.11.y-evl-rebase: evl bad, linux untested
> > v6.10.y-evl-rebase: evl bad, linux untested
> > v6.9.y-evl-rebase: evl bad, linux bad
> > v6.6.y-evl-rebase: evl bad, linux bad
> > v6.3.y-evl-rebase: evl bad, linux bad on startup only
> > v6.2.y-evl-rebase: libevl r42, evl good, linux good
> > v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
> >
> > Not sure if this is of any help; I hope to be able to get back on
> > this soon. Best,
> > Giulio  
> 
> If someone could send me the relevant portion of a trace file with a
> 'latspot' tracepoint triggered on a latmus run, I could investigate
> this issue. I'd need the function tracer active on all CPUs, with all
> traces dumped to a single trace file ('evl trace -ef' should do).
> 

Please find tar'ed output for the trace(s).

Please, however be aware that - I've fall back to 6.6 (as it is the
version in which I can reproduce the issue in the fastest way).

Customer also reported, that they can reproduce with their SW stack the
issue on 6.1-slts and 6.12, but it takes considerably longer than for
6.6 (in which I can use simple programs to "allocate" memory).

I've used pretty standard set of ftrace CONFIG_* options enabled.

However, it seems like there is a "hole" around the time when in-band
switch has been reported (in dmesg) and in ftrace output.

I'm going to do the same with all available tracers enabled.

Last but not least, the latspot event is not present in my ftrace
output (although I've enabled all the CONFIG_EVL*DEBUG options).

Is there any special set of options to required for EVL tracig?

Tars with logs:
https://nextcloud.swupdate.org/index.php/s/FgiMsHG9xG8frk3
https://nextcloud.swupdate.org/index.php/s/XcW75xsQPMXm3zg


-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-23 13:54                 ` Łukasz Majewski
@ 2025-10-26 20:04                   ` Philippe Gerum
  2025-10-27 11:05                     ` Łukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-26 20:04 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai


Hi Łukasz,

Łukasz Majewski <lukma@nabladev.com> writes:

> Hi Philippe,
>
>> Giulio Moro <giulio@bela.io> writes:
>> 
>> > Łukasz Majewski wrote on 20/10/2025 02:47:
>> >  
>> >> Could you share which version of libevl do you use?  
>> >
>> > I was using the latest release of libevl that was compatible with
>> > the kernel UAPI. Sorry I haven't provided more details on the
>> > issue; I am focusing on building an image around 6.1 for a deadline
>> > coming up next month so I haven't been able to get into tracing yet.
>> >
>> > The only additional finding I have so far is that there seems to be
>> > something on the Linux side that "breaks real-time" which affects
>> > both linux and evl. In the below list, "evl bad" means that the
>> > above mentioned ISW are observed. "Linux bad" means that I see a
>> > disproportionate number of underruns under stress on a Linux program
>> > running with SCHED_FIFO and priority 95 with a period of 360us. I
>> > understand "disproportionate" is a very subjective term but, to give
>> > an idea, over a 10 minutes test I get a couple of underruns with
>> > "linux good" and I get hundreds of underruns with "linux good"
>> >
>> > v6.12.y-evl-rebase: evl bad, linux bad
>> > v6.11.y-evl-rebase: evl bad, linux untested
>> > v6.10.y-evl-rebase: evl bad, linux untested
>> > v6.9.y-evl-rebase: evl bad, linux bad
>> > v6.6.y-evl-rebase: evl bad, linux bad
>> > v6.3.y-evl-rebase: evl bad, linux bad on startup only
>> > v6.2.y-evl-rebase: libevl r42, evl good, linux good
>> > v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
>> >
>> > Not sure if this is of any help; I hope to be able to get back on
>> > this soon. Best,
>> > Giulio  
>> 
>> If someone could send me the relevant portion of a trace file with a
>> 'latspot' tracepoint triggered on a latmus run, I could investigate
>> this issue. I'd need the function tracer active on all CPUs, with all
>> traces dumped to a single trace file ('evl trace -ef' should do).
>> 
>
> Please find tar'ed output for the trace(s).
>
> Please, however be aware that - I've fall back to 6.6 (as it is the
> version in which I can reproduce the issue in the fastest way).
>
> Customer also reported, that they can reproduce with their SW stack the
> issue on 6.1-slts and 6.12, but it takes considerably longer than for
> 6.6 (in which I can use simple programs to "allocate" memory).
>

Ok, but 6.6 is definitely unmaintained Dovetail-wise, and has been so
for several months now. So although this issue was observed with
maintained releases too, debugging a current issue on an obsolete code
base is a fragile process nevertheless.

> I've used pretty standard set of ftrace CONFIG_* options enabled.
>
> However, it seems like there is a "hole" around the time when in-band
> switch has been reported (in dmesg) and in ftrace output.
>
> I'm going to do the same with all available tracers enabled.
>
> Last but not least, the latspot event is not present in my ftrace
> output (although I've enabled all the CONFIG_EVL*DEBUG options).
>
> Is there any special set of options to required for EVL tracig?
>

Yes, you need to enable the evl/evl_latspot tracepoint.
See [1].

> Tars with logs:
> https://nextcloud.swupdate.org/index.php/s/FgiMsHG9xG8frk3
> https://nextcloud.swupdate.org/index.php/s/XcW75xsQPMXm3zg

Ok, first observation, the logs reveal that we are in an OOM situation:
the kernel strategy is best-effort there, to keep the system in the best
possible state while sacrificing processes. But honestly, although the
VM_LOCKED pages are unevictable by definition, there are quite a few
spots in the mm which might trigger the OOM reaper, including the
inability to allocate page table information, insert new pages and so
on. Although all the memory of an oob application is committed, with its
VMAs populated once libevl has issued mlockall(), I genuinely don't know
how this fares with an OOM situation.

Anyway, I still see a common pattern between the two set of traces, the
unwanted inband switch happens during what seems to be time holes
(assuming that traces of all CPUs are merged into each log):

[  285.166640] EVL: timer-responder:754 switching in-band [pid=756, excpt=14, user_pc=0x7be4dc59d5fe]
   <idle>-0       [022] *..1.   148.913052: rcu_dyntick: Start 1 0 0x8ec
   <idle>-0       [014] dN.1.   172.761906: rcu_dyntick: End 0 1 0x39c

[  172.317009] EVL: timer-responder:743 switching in-band [pid=745, excpt=14, user_pc=0x7036049465fe]
   <idle>-0       [022] *..1.   148.913052: rcu_dyntick: Start 1 0 0x8ec
   <idle>-0       [014] dN.1.   172.761906: rcu_dyntick: End 0 1 0x39c

A lot can be done in 24 µs on such class of hardware, so either some
traces are missing, or something happens at hardware level which the
kernel does not know about, as it may be seen on x86 with some
uncooperative BIOS (e.g. SMIs, thermal events come to mind). Hopefully
this is not the case, but then we need to make sure that some traces are
indeed missing. If such time hole is confirmed though, then the issue
Giulio is seeing might be different.

[1] https://v4.xenomai.org/core/commands/index.html#evl-trace-command

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-26 20:04                   ` Philippe Gerum
@ 2025-10-27 11:05                     ` Łukasz Majewski
  2025-10-27 11:35                       ` Philippe Gerum
  2025-10-27 16:25                       ` Łukasz Majewski
  0 siblings, 2 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-27 11:05 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Hi Łukasz,
> 
> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> > Hi Philippe,
> >  
> >> Giulio Moro <giulio@bela.io> writes:
> >>   
> >> > Łukasz Majewski wrote on 20/10/2025 02:47:
> >> >    
> >> >> Could you share which version of libevl do you use?    
> >> >
> >> > I was using the latest release of libevl that was compatible with
> >> > the kernel UAPI. Sorry I haven't provided more details on the
> >> > issue; I am focusing on building an image around 6.1 for a
> >> > deadline coming up next month so I haven't been able to get into
> >> > tracing yet.
> >> >
> >> > The only additional finding I have so far is that there seems to
> >> > be something on the Linux side that "breaks real-time" which
> >> > affects both linux and evl. In the below list, "evl bad" means
> >> > that the above mentioned ISW are observed. "Linux bad" means
> >> > that I see a disproportionate number of underruns under stress
> >> > on a Linux program running with SCHED_FIFO and priority 95 with
> >> > a period of 360us. I understand "disproportionate" is a very
> >> > subjective term but, to give an idea, over a 10 minutes test I
> >> > get a couple of underruns with "linux good" and I get hundreds
> >> > of underruns with "linux good"
> >> >
> >> > v6.12.y-evl-rebase: evl bad, linux bad
> >> > v6.11.y-evl-rebase: evl bad, linux untested
> >> > v6.10.y-evl-rebase: evl bad, linux untested
> >> > v6.9.y-evl-rebase: evl bad, linux bad
> >> > v6.6.y-evl-rebase: evl bad, linux bad
> >> > v6.3.y-evl-rebase: evl bad, linux bad on startup only
> >> > v6.2.y-evl-rebase: libevl r42, evl good, linux good
> >> > v6.1.y-cip-evl-rebase: libevl master, evl good, linux good
> >> >
> >> > Not sure if this is of any help; I hope to be able to get back on
> >> > this soon. Best,
> >> > Giulio    
> >> 
> >> If someone could send me the relevant portion of a trace file with
> >> a 'latspot' tracepoint triggered on a latmus run, I could
> >> investigate this issue. I'd need the function tracer active on all
> >> CPUs, with all traces dumped to a single trace file ('evl trace
> >> -ef' should do). 
> >
> > Please find tar'ed output for the trace(s).
> >
> > Please, however be aware that - I've fall back to 6.6 (as it is the
> > version in which I can reproduce the issue in the fastest way).
> >
> > Customer also reported, that they can reproduce with their SW stack
> > the issue on 6.1-slts and 6.12, but it takes considerably longer
> > than for 6.6 (in which I can use simple programs to "allocate"
> > memory). 
> 
> Ok, but 6.6 is definitely unmaintained Dovetail-wise, and has been so
> for several months now.

Yes, I'm aware of it ...

> So although this issue was observed with
> maintained releases too, debugging a current issue on an obsolete code
> base is a fragile process nevertheless.
> 

It is fragile, yes - but:

1. I can reproduce it with minutes with fairly easy way

2. I do assume, that when we find the issue on 6.6, then we can
forward port it for 6.12 (and I will be more than happy to switch to
6.12+).

> > I've used pretty standard set of ftrace CONFIG_* options enabled.
> >
> > However, it seems like there is a "hole" around the time when
> > in-band switch has been reported (in dmesg) and in ftrace output.
> >
> > I'm going to do the same with all available tracers enabled.
> >
> > Last but not least, the latspot event is not present in my ftrace
> > output (although I've enabled all the CONFIG_EVL*DEBUG options).
> >
> > Is there any special set of options to required for EVL tracig?
> >  
> 
> Yes, you need to enable the evl/evl_latspot tracepoint.
> See [1].
> 

I've been confused a bit.

With libevl r50 - I've used

evl trace -e -f
(and I've assumed that evl* specific traces will be added).

When I've check:
cat /sys/kernel/debug/tracing/events/evl/evl_latspot/enable

returned '1', so it shall be enabled as well.

However, when I run:

evl trace -eirq it now works as expected.


If I may made an observation - I would expect the following work flow
(according to [1]):

evl trace -eirq
evl trace -d
evl trace -p > foo.txt

However, evl trace -d disables tracing, yes, but also clears the buffer.
(instead of 'evl trace -d' I do use
'echo 0 > /sys/kernel/tracing/tracing_on')

> > Tars with logs:
> > https://nextcloud.swupdate.org/index.php/s/FgiMsHG9xG8frk3
> > https://nextcloud.swupdate.org/index.php/s/XcW75xsQPMXm3zg  
> 
> Ok, first observation, the logs reveal that we are in an OOM
> situation: the kernel strategy is best-effort there, to keep the
> system in the best possible state while sacrificing processes.

+1

> But
> honestly, although the VM_LOCKED pages are unevictable by definition,
> there are quite a few spots in the mm which might trigger the OOM
> reaper, including the inability to allocate page table information,
> insert new pages and so on. Although all the memory of an oob
> application is committed, with its VMAs populated once libevl has
> issued mlockall(), I genuinely don't know how this fares with an OOM
> situation.

I would expect that, the VM_LOCKED pages (and their entries to PTE)
would be the _last_ ones to be touched (if any).

However, maybe some "optimization" in mm was added to Linux kernel
recently and we do experience the consequences.

> 
> Anyway, I still see a common pattern between the two set of traces,
> the unwanted inband switch happens during what seems to be time holes
> (assuming that traces of all CPUs are merged into each log):

Yes, those were traces for all cpus - added to a single file.

> 
> [  285.166640] EVL: timer-responder:754 switching in-band [pid=756,
> excpt=14, user_pc=0x7be4dc59d5fe] <idle>-0       [022] *..1.
> 148.913052: rcu_dyntick: Start 1 0 0x8ec <idle>-0       [014] dN.1.
> 172.761906: rcu_dyntick: End 0 1 0x39c
> 
> [  172.317009] EVL: timer-responder:743 switching in-band [pid=745,
> excpt=14, user_pc=0x7036049465fe] <idle>-0       [022] *..1.
> 148.913052: rcu_dyntick: Start 1 0 0x8ec <idle>-0       [014] dN.1.
> 172.761906: rcu_dyntick: End 0 1 0x39c
> 
> A lot can be done in 24 µs on such class of hardware,

For the record. Both Xenomai3 and Xenomai4 have been run on the
identical hardware.

On Xenomai3 there are no issues...

> so either some
> traces are missing,

I will check if I can add more tracing.

> or something happens at hardware level which the
> kernel does not know about, as it may be seen on x86 with some
> uncooperative BIOS (e.g. SMIs, thermal events come to mind).

IIRC - Giulio reported the issue on some ARM system...

> Hopefully
> this is not the case, but then we need to make sure that some traces
> are indeed missing. If such time hole is confirmed though, then the
> issue Giulio is seeing might be different.
> 
> [1] https://v4.xenomai.org/core/commands/index.html#evl-trace-command
> 

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 11:05                     ` Łukasz Majewski
@ 2025-10-27 11:35                       ` Philippe Gerum
  2025-10-27 12:54                         ` Łukasz Majewski
  2025-10-27 16:25                       ` Łukasz Majewski
  1 sibling, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-27 11:35 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai

Łukasz Majewski <lukma@nabladev.com> writes:

>
> For the record. Both Xenomai3 and Xenomai4 have been run on the
> identical hardware.
>
> On Xenomai3 there are no issues...
>

Dovetail deals with those things, and this layer is shared between
xenomai3 and xenomai4. So the situation is still unclear.

>> so either some
>> traces are missing,
>
> I will check if I can add more tracing.
>
>> or something happens at hardware level which the
>> kernel does not know about, as it may be seen on x86 with some
>> uncooperative BIOS (e.g. SMIs, thermal events come to mind).
>
> IIRC - Giulio reported the issue on some ARM system...
>

Yes, but he also reported that he could observe the same issue on
xenomai3/cobalt as well. So we cannot rule out two distinct issues just
yet.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 11:35                       ` Philippe Gerum
@ 2025-10-27 12:54                         ` Łukasz Majewski
  0 siblings, 0 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-27 12:54 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> >
> > For the record. Both Xenomai3 and Xenomai4 have been run on the
> > identical hardware.
> >
> > On Xenomai3 there are no issues...
> >  
> 
> Dovetail deals with those things, and this layer is shared between
> xenomai3 and xenomai4. So the situation is still unclear.

Just to clarify - the Xenomai3/Cobalt setup with 5.4 kernel + ipipe
patch was used. That (old) setup was NOT using the dovetail.

> 
> >> so either some
> >> traces are missing,  
> >
> > I will check if I can add more tracing.
> >  
> >> or something happens at hardware level which the
> >> kernel does not know about, as it may be seen on x86 with some
> >> uncooperative BIOS (e.g. SMIs, thermal events come to mind).  
> >
> > IIRC - Giulio reported the issue on some ARM system...
> >  
> 
> Yes, but he also reported that he could observe the same issue on
> xenomai3/cobalt as well. So we cannot rule out two distinct issues
> just yet.
> 



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 11:05                     ` Łukasz Majewski
  2025-10-27 11:35                       ` Philippe Gerum
@ 2025-10-27 16:25                       ` Łukasz Majewski
  2025-10-27 18:16                         ` Giulio Moro
  2025-10-29  9:18                         ` Philippe Gerum
  1 sibling, 2 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-27 16:25 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> > so either some
> > traces are missing,  
> 
> I will check if I can add more tracing.

On one console:
---------------
evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99


On the second one:
------------------
test_prog/stress_prog_NO_IO &
sleep 5
test_prog/stress_prog_NO_IO &

echo 0 > /sys/kernel/tracing/tracing_on
evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt

First trace:
############

The relevant dmesg output:

[   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
excpt=14, user_pc=0x77d08492a5fe]

When I do look into logs - there is also a "hole" when the ISW is
reported by dmesg.


However, there are some hints when we look around evl_latmus:

timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
latency peak: 18843.649 us **

On CPU 20:
stress_prog_NO_-752     [020] ...2.    75.868139: folio_add_lru_vma
<-__handle_mm_fault

Which ends with folio_add_lru()...
And after that we do have the ~18ms latency peak.

And the "folio_*" is a relatively new feature added to Linux kernel [1].

The logs can be fetched from:
https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya


Second trace [*]:
#################


[ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
excpt=14, user_pc=0x72e9d35af5fe]

<idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
<idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
<idle>-0 [003] *.~2.  1610.102242: arch_handle_irq <-arch_pipeline_entry
<idle>-0 [003] *.~2.  1610.102242: irq_to_desc <-arch_handle_irq
<idle>-0 [003] *.~1.  1610.102242: ma_read: mtree_load  
	Node: 00000000c4071949 (0 18446744073709551615) range: 4355-4355
<idle>-0 [003] *.~2.  1610.102242: rcu_oob_prepare_lock <-mtree_load
<idle>-0 [003] *.~2.  1610.102242: __rcu_read_lock <-mtree_load
<idle>-0 [003] *.~2.  1610.102242: __rcu_read_unlock <-mtree_load


Now some questions:
1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
   of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?

2. The mtree_load() is called in the OOB context... IMHO, this shall be
done before we do start the latmus?

3. In this case - there is also a "hole" between the ISW time
(1610.086961) and the ones between rcu_dyntick: Start / Endirq


Link to the full log:
https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r


Note:
[*] - All ftrace options enabled in the kernel


Links:
[1] -
https://github.com/torvalds/linux/commit/49f8275c7d92?spm=a2c65.11461447.0.0.68003853Pbec3R

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 16:25                       ` Łukasz Majewski
@ 2025-10-27 18:16                         ` Giulio Moro
  2025-10-27 22:42                           ` Giulio Moro
  2025-10-29  9:18                         ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Giulio Moro @ 2025-10-27 18:16 UTC (permalink / raw)
  To: Łukasz Majewski, Philippe Gerum; +Cc: Xenomai

Sorry for my late response, here are my logs:

files.bela.io/Xenomai4-ISW-trace-6.12-logs-2.txt

this is the kernel in use (with committed .config): https://github.com/giuliomoro/linux/tree/v6.12.y-evl-rebase

Best,
Giulio

Łukasz Majewski wrote on 27/10/2025 11:25:
> Hi Philippe,
> 
>>> so either some
>>> traces are missing,
>>
>> I will check if I can add more tracing.
> 
> On one console:
> ---------------
> evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
> 
> 
> On the second one:
> ------------------
> test_prog/stress_prog_NO_IO &
> sleep 5
> test_prog/stress_prog_NO_IO &
> 
> echo 0 > /sys/kernel/tracing/tracing_on
> evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
> 
> First trace:
> ############
> 
> The relevant dmesg output:
> 
> [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> excpt=14, user_pc=0x77d08492a5fe]
> 
> When I do look into logs - there is also a "hole" when the ISW is
> reported by dmesg.
> 
> 
> However, there are some hints when we look around evl_latmus:
> 
> timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
> latency peak: 18843.649 us **
> 
> On CPU 20:
> stress_prog_NO_-752     [020] ...2.    75.868139: folio_add_lru_vma
> <-__handle_mm_fault
> 
> Which ends with folio_add_lru()...
> And after that we do have the ~18ms latency peak.
> 
> And the "folio_*" is a relatively new feature added to Linux kernel [1].
> 
> The logs can be fetched from:
> https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
> 
> 
> Second trace [*]:
> #################
> 
> 
> [ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
> excpt=14, user_pc=0x72e9d35af5fe]
> 
> <idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
> <idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
> <idle>-0 [003] *.~2.  1610.102242: arch_handle_irq <-arch_pipeline_entry
> <idle>-0 [003] *.~2.  1610.102242: irq_to_desc <-arch_handle_irq
> <idle>-0 [003] *.~1.  1610.102242: ma_read: mtree_load
> 	Node: 00000000c4071949 (0 18446744073709551615) range: 4355-4355
> <idle>-0 [003] *.~2.  1610.102242: rcu_oob_prepare_lock <-mtree_load
> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_lock <-mtree_load
> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_unlock <-mtree_load
> 
> 
> Now some questions:
> 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
>     of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
> 
> 2. The mtree_load() is called in the OOB context... IMHO, this shall be
> done before we do start the latmus?
> 
> 3. In this case - there is also a "hole" between the ISW time
> (1610.086961) and the ones between rcu_dyntick: Start / Endirq
> 
> 
> Link to the full log:
> https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
> 
> 
> Note:
> [*] - All ftrace options enabled in the kernel
> 
> 
> Links:
> [1] -
> https://github.com/torvalds/linux/commit/49f8275c7d92?spm=a2c65.11461447.0.0.68003853Pbec3R
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 18:16                         ` Giulio Moro
@ 2025-10-27 22:42                           ` Giulio Moro
  0 siblings, 0 replies; 50+ messages in thread
From: Giulio Moro @ 2025-10-27 22:42 UTC (permalink / raw)
  To: Łukasz Majewski, Philippe Gerum; +Cc: Xenomai

> Sorry for my late response, here are my logs:
> 
> files.bela.io/Xenomai4-ISW-trace-6.12-logs-2.txt
> 
> this is the kernel in use (with committed .config): https://github.com/giuliomoro/linux/tree/v6.12.y-evl-rebase

I managed to collect a few more traces.

files.bela.io/traces/traces-1.zip
files.bela.io/traces/full-test.sh << file used to generate them, it uses the `alloc` program I provided here https://lore.kernel.org/xenomai/87wm51sb3l.fsf@xenomai.org/T/#mef94253db4bf13198b1ca7ac30d268fbc1acde94

Interestingly:
- latmus tends to fail towards the beginning, or it won't fail at all, so full-test.sh kills it after 10 seconds if it hasn't failed yet
- sometimes I boot the system and latmus doesn't fail after 20 or more  10-second long attempts
- sometimes I boot the system and I can get a failure about 50% of the time
- sometimes (30% of the times) trace file is empty (it only contains evl trace's comments)

I hope these help, let me know if I need to change anything.

Best,
Giulio

Giulio Moro wrote on 27/10/2025 13:16:
> Sorry for my late response, here are my logs:
> 
> files.bela.io/Xenomai4-ISW-trace-6.12-logs-2.txt
> 
> this is the kernel in use (with committed .config): https://github.com/giuliomoro/linux/tree/v6.12.y-evl-rebase
> 
> Best,
> Giulio
> 
> Łukasz Majewski wrote on 27/10/2025 11:25:
>> Hi Philippe,
>>
>>>> so either some
>>>> traces are missing,
>>>
>>> I will check if I can add more tracing.
>>
>> On one console:
>> ---------------
>> evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
>>
>>
>> On the second one:
>> ------------------
>> test_prog/stress_prog_NO_IO &
>> sleep 5
>> test_prog/stress_prog_NO_IO &
>>
>> echo 0 > /sys/kernel/tracing/tracing_on
>> evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
>>
>> First trace:
>> ############
>>
>> The relevant dmesg output:
>>
>> [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
>> excpt=14, user_pc=0x77d08492a5fe]
>>
>> When I do look into logs - there is also a "hole" when the ISW is
>> reported by dmesg.
>>
>>
>> However, there are some hints when we look around evl_latmus:
>>
>> timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
>> latency peak: 18843.649 us **
>>
>> On CPU 20:
>> stress_prog_NO_-752     [020] ...2.    75.868139: folio_add_lru_vma
>> <-__handle_mm_fault
>>
>> Which ends with folio_add_lru()...
>> And after that we do have the ~18ms latency peak.
>>
>> And the "folio_*" is a relatively new feature added to Linux kernel [1].
>>
>> The logs can be fetched from:
>> https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
>>
>>
>> Second trace [*]:
>> #################
>>
>>
>> [ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
>> excpt=14, user_pc=0x72e9d35af5fe]
>>
>> <idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
>> <idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
>> <idle>-0 [003] *.~2.  1610.102242: arch_handle_irq <-arch_pipeline_entry
>> <idle>-0 [003] *.~2.  1610.102242: irq_to_desc <-arch_handle_irq
>> <idle>-0 [003] *.~1.  1610.102242: ma_read: mtree_load
>>     Node: 00000000c4071949 (0 18446744073709551615) range: 4355-4355
>> <idle>-0 [003] *.~2.  1610.102242: rcu_oob_prepare_lock <-mtree_load
>> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_lock <-mtree_load
>> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_unlock <-mtree_load
>>
>>
>> Now some questions:
>> 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
>>     of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
>>
>> 2. The mtree_load() is called in the OOB context... IMHO, this shall be
>> done before we do start the latmus?
>>
>> 3. In this case - there is also a "hole" between the ISW time
>> (1610.086961) and the ones between rcu_dyntick: Start / Endirq
>>
>>
>> Link to the full log:
>> https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
>>
>>
>> Note:
>> [*] - All ftrace options enabled in the kernel
>>
>>
>> Links:
>> [1] -
>> https://github.com/torvalds/linux/commit/49f8275c7d92?spm=a2c65.11461447.0.0.68003853Pbec3R
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-27 16:25                       ` Łukasz Majewski
  2025-10-27 18:16                         ` Giulio Moro
@ 2025-10-29  9:18                         ` Philippe Gerum
  2025-10-29 13:51                           ` Łukasz Majewski
  1 sibling, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-29  9:18 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai


Hi Łukasz,

Łukasz Majewski <lukma@nabladev.com> writes:

> Hi Philippe,
>
>> > so either some
>> > traces are missing,  
>> 
>> I will check if I can add more tracing.
>
> On one console:
> ---------------
> evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
>
>
> On the second one:
> ------------------
> test_prog/stress_prog_NO_IO &
> sleep 5
> test_prog/stress_prog_NO_IO &
>
> echo 0 > /sys/kernel/tracing/tracing_on
> evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
>
> First trace:
> ############
>
> The relevant dmesg output:
>
> [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> excpt=14, user_pc=0x77d08492a5fe]
>
> When I do look into logs - there is also a "hole" when the ISW is
> reported by dmesg.
>
>
> However, there are some hints when we look around evl_latmus:
>
> timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
> latency peak: 18843.649 us **
>
> On CPU 20:
> stress_prog_NO_-752     [020] ...2.    75.868139: folio_add_lru_vma
> <-__handle_mm_fault
>
> Which ends with folio_add_lru()...
> And after that we do have the ~18ms latency peak.
>
> And the "folio_*" is a relatively new feature added to Linux kernel [1].
>
> The logs can be fetched from:
> https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
>
>
> Second trace [*]:
> #################
>
>
> [ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
> excpt=14, user_pc=0x72e9d35af5fe]
>
> <idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
> <idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
> <idle>-0 [003] *.~2.  1610.102242: arch_handle_irq <-arch_pipeline_entry
> <idle>-0 [003] *.~2.  1610.102242: irq_to_desc <-arch_handle_irq
> <idle>-0 [003] *.~1.  1610.102242: ma_read: mtree_load  
> 	Node: 00000000c4071949 (0 18446744073709551615) range: 4355-4355
> <idle>-0 [003] *.~2.  1610.102242: rcu_oob_prepare_lock <-mtree_load
> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_lock <-mtree_load
> <idle>-0 [003] *.~2.  1610.102242: __rcu_read_unlock <-mtree_load
>
>
> Now some questions:
> 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
>    of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
>

Dovetail always switches up front to the oob stage before calling an irq
handler, so this trace is normal (upgrading the execution stage for the
current context is a lightweight operation, this is entirely different
from switching a task from inband to oob mode which is heavyweight).

> 2. The mtree_load() is called in the OOB context... IMHO, this shall be
> done before we do start the latmus?
>

mtree_load() is simply the lookup call into a maple tree. In this
context, the call fetches the IRQ descriptor corresponding to the
interrupt number in a descriptor table managed as a tree of such
kind. No issue there.

> 3. In this case - there is also a "hole" between the ISW time
> (1610.086961) and the ones between rcu_dyntick: Start / Endirq
>
>
> Link to the full log:
> https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
>
>

Ok, so this trace is interesting. Since the timer responder thread was
pinned on CPU2, the first investigation needs to focus on that CPU
specifically. Filtering out all activity from other CPUs for now, here
are some observations:

          <idle>-0       [002] *.~2.    75.864279: evl_core_tick <-lapic_oob_handler
          <idle>-0       [002] *.~2.    75.864279: do_clock_tick <-evl_core_tick
          <idle>-0       [002] *.~1.    75.864279: evl_timer_expire: timer=latmus_pulse_handler
          <idle>-0       [002] *.~2.    75.864279: latmus_pulse_handler <-do_clock_tick
          <idle>-0       [002] *.~2.    75.864279: evl_flush_wait_locked <-latmus_pulse_handler
          <idle>-0       [002] *.~1.    75.864280: evl_flush_wait: wchan=&wf->wait

CPU2 received an oob IRQ timer event for waking up the timer-responder
thread which measures latency. evl_flush_wait() is the wake up call that
posts an evl 'flag' which the responder waits on for the next event.

          <idle>-0       [002] *.~2.    75.864280: evl_insert_tnode <-do_clock_tick
          <idle>-0       [002] *.~2.    75.864280: evl_program_proxy_tick <-do_clock_tick
          <idle>-0       [002] *.~2.    75.864280: lapic_next_deadline <-evl_program_proxy_tick
          <idle>-0       [002] *.~2.    75.864280: do_trace_write_msr <-lapic_next_deadline
          <idle>-0       [002] *.~1.    75.864280: write_msr: 6e0, value 340012ec56
          <idle>-0       [002] *.~1.    75.864280: evl_timer_shot: [proxy-timer/2] at 75.864790 (delay: 510 us, 134693 cycles)

On exit from the IRQ handling process, we program the timer for the next
shot, which should be 510 us ahead. With a 1Khz frequency, this means
that we consumed half of the time budget already, which is a lot. But
some guy is pounding the hardware with OOM-prone activity, so this may
make sense anyway. So far so good. The next oob tick should occur as
close as possible to 75.864790.

          <idle>-0       [002] *.~1.    75.864280: local_timer_exit: vector=236
          <idle>-0       [002] *.~1.    75.864280: irq_handler_exit: irq=4355 ret=handled
          <idle>-0       [002] *.~1.    75.864280: irq_pipeline_exit: irq=4355
          <idle>-0       [002] *.~1.    75.864280: irq_enable: caller=arch_pipeline_entry+0x9f/0x110 parent=sysvec_apic_timer_interrupt+0x12/0x20
          <idle>-0       [002] #.~2.    75.864280: handle_irq_pipelined_finish <-arch_pipeline_entry
          <idle>-0       [002] #.~1.    75.864280: rcu_dyntick: --= 3 1 0xc
          <idle>-0       [002] #.~2.    75.864280: preempt_count_sub <-handle_irq_pipelined_finish
          <idle>-0       [002] #.~2.    75.864281: preempt_count_sub <-handle_irq_pipelined_finish

At this point (i.e. on IRQ exit), I would expect to see a call to
evl_schedule(), to switch to the timer-responder thread we have just
woken up (the irq_exit_pipeline() hook in Dovetail ends up calling the
rescheduling procedure the real-time core provides, for evl that would
be evl_exit_irq()). It does not happen, so this must be that such thread
did not get a chance to go back waiting on the evl flag yet,
i.e. evl_flush_wait() did not wake up anyone. This is interesting: this
would mean that after 500+ µs, the timer responder is still busy doing
"something else" (in fact, some kernel activity is borrowing time from
it). we don't have enough traces to figure out why yet.

          <idle>-0       [002] *.~2.    75.864791: generic_pipeline_irq_desc <-arch_handle_irq
          <idle>-0       [002] *.~1.    75.864791: irq_pipeline_entry: irq=4355
          <idle>-0       [002] *.~2.    75.864791: handle_apic_irq <-generic_pipeline_irq_desc
          <idle>-0       [002] *.~2.    75.864791: handle_oob_irq <-handle_apic_irq
          <idle>-0       [002] *.~2.    75.864791: do_oob_irq <-handle_oob_irq
          <idle>-0       [002] *.~1.    75.864791: irq_handler_entry: irq=4355 name=Out-of-band LAPIC timer interrupt
          <idle>-0       [002] *.~2.    75.864791: lapic_oob_handler <-do_oob_irq
          <idle>-0       [002] *.~1.    75.864791: local_timer_entry: vector=236
          <idle>-0       [002] *.~2.    75.864791: evl_core_tick <-lapic_oob_handler
          <idle>-0       [002] *.~2.    75.864791: do_clock_tick <-evl_core_tick

As expected, the latest timer tick scheduled by evl occurs close to the
target, only off by one microsecond (75.864791 vs 75.864790). So we have
no timing issue here. Something else is at work.

Could you get me a trace which goes back up to two secs before the time
reported by evl about the inband switch of the timer responder? e.g.

[   75.214527] EVL: timer-responder:743 switching in-band [pid=745, excpt=14, user_pc=0x77d08492a5fe]

In the above case, I would need a trace going back to 73.214527 or so.

TIA,

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-29  9:18                         ` Philippe Gerum
@ 2025-10-29 13:51                           ` Łukasz Majewski
  2025-10-30 12:26                             ` Łukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-29 13:51 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Hi Łukasz,
> 
> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> > Hi Philippe,
> >  
> >> > so either some
> >> > traces are missing,    
> >> 
> >> I will check if I can add more tracing.  
> >
> > On one console:
> > ---------------
> > evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
> >
> >
> > On the second one:
> > ------------------
> > test_prog/stress_prog_NO_IO &
> > sleep 5
> > test_prog/stress_prog_NO_IO &
> >
> > echo 0 > /sys/kernel/tracing/tracing_on
> > evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
> >
> > First trace:
> > ############
> >
> > The relevant dmesg output:
> >
> > [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> > excpt=14, user_pc=0x77d08492a5fe]
> >
> > When I do look into logs - there is also a "hole" when the ISW is
> > reported by dmesg.
> >
> >
> > However, there are some hints when we look around evl_latmus:
> >
> > timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
> > latency peak: 18843.649 us **
> >
> > On CPU 20:
> > stress_prog_NO_-752     [020] ...2.    75.868139: folio_add_lru_vma
> > <-__handle_mm_fault
> >
> > Which ends with folio_add_lru()...
> > And after that we do have the ~18ms latency peak.
> >
> > And the "folio_*" is a relatively new feature added to Linux kernel
> > [1].
> >
> > The logs can be fetched from:
> > https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
> >
> >
> > Second trace [*]:
> > #################
> >
> >
> > [ 1610.086961] EVL: timer-responder:768 switching in-band [pid=770,
> > excpt=14, user_pc=0x72e9d35af5fe]
> >
> > <idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
> > <idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
> > <idle>-0 [003] *.~2.  1610.102242: arch_handle_irq
> > <-arch_pipeline_entry <idle>-0 [003] *.~2.  1610.102242:
> > irq_to_desc <-arch_handle_irq <idle>-0 [003] *.~1.  1610.102242:
> > ma_read: mtree_load Node: 00000000c4071949 (0 18446744073709551615)
> > range: 4355-4355 <idle>-0 [003] *.~2.  1610.102242:
> > rcu_oob_prepare_lock <-mtree_load <idle>-0 [003] *.~2.
> > 1610.102242: __rcu_read_lock <-mtree_load <idle>-0 [003] *.~2.
> > 1610.102242: __rcu_read_unlock <-mtree_load
> >
> >
> > Now some questions:
> > 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
> >    of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
> >  
> 
> Dovetail always switches up front to the oob stage before calling an
> irq handler, so this trace is normal (upgrading the execution stage
> for the current context is a lightweight operation, this is entirely
> different from switching a task from inband to oob mode which is
> heavyweight).

Thanks for the explanation.

> 
> > 2. The mtree_load() is called in the OOB context... IMHO, this
> > shall be done before we do start the latmus?
> >  
> 
> mtree_load() is simply the lookup call into a maple tree. In this
> context, the call fetches the IRQ descriptor corresponding to the
> interrupt number in a descriptor table managed as a tree of such
> kind. No issue there.

Ok.

> 
> > 3. In this case - there is also a "hole" between the ISW time
> > (1610.086961) and the ones between rcu_dyntick: Start / Endirq
> >
> >
> > Link to the full log:
> > https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
> >
> >  
> 
> Ok, so this trace is interesting. Since the timer responder thread was
> pinned on CPU2, the first investigation needs to focus on that CPU
> specifically. Filtering out all activity from other CPUs for now, here
> are some observations:
> 
>           <idle>-0       [002] *.~2.    75.864279: evl_core_tick
> <-lapic_oob_handler <idle>-0       [002] *.~2.    75.864279:
> do_clock_tick <-evl_core_tick <idle>-0       [002] *.~1.
> 75.864279: evl_timer_expire: timer=latmus_pulse_handler <idle>-0
>  [002] *.~2.    75.864279: latmus_pulse_handler <-do_clock_tick
> <idle>-0       [002] *.~2.    75.864279: evl_flush_wait_locked
> <-latmus_pulse_handler <idle>-0       [002] *.~1.    75.864280:
> evl_flush_wait: wchan=&wf->wait
> 
> CPU2 received an oob IRQ timer event for waking up the timer-responder
> thread which measures latency. evl_flush_wait() is the wake up call
> that posts an evl 'flag' which the responder waits on for the next
> event.
> 
>           <idle>-0       [002] *.~2.    75.864280: evl_insert_tnode
> <-do_clock_tick <idle>-0       [002] *.~2.    75.864280:
> evl_program_proxy_tick <-do_clock_tick <idle>-0       [002] *.~2.
> 75.864280: lapic_next_deadline <-evl_program_proxy_tick <idle>-0
>  [002] *.~2.    75.864280: do_trace_write_msr <-lapic_next_deadline
> <idle>-0       [002] *.~1.    75.864280: write_msr: 6e0, value
> 340012ec56 <idle>-0       [002] *.~1.    75.864280: evl_timer_shot:
> [proxy-timer/2] at 75.864790 (delay: 510 us, 134693 cycles)
> 
> On exit from the IRQ handling process, we program the timer for the
> next shot, which should be 510 us ahead. With a 1Khz frequency, this
> means that we consumed half of the time budget already, which is a
> lot. But some guy is pounding the hardware with OOM-prone activity,
> so this may make sense anyway. So far so good. The next oob tick
> should occur as close as possible to 75.864790.
> 
>           <idle>-0       [002] *.~1.    75.864280: local_timer_exit:
> vector=236 <idle>-0       [002] *.~1.    75.864280: irq_handler_exit:
> irq=4355 ret=handled <idle>-0       [002] *.~1.    75.864280:
> irq_pipeline_exit: irq=4355 <idle>-0       [002] *.~1.    75.864280:
> irq_enable: caller=arch_pipeline_entry+0x9f/0x110
> parent=sysvec_apic_timer_interrupt+0x12/0x20 <idle>-0       [002]
> #.~2.    75.864280: handle_irq_pipelined_finish <-arch_pipeline_entry
> <idle>-0       [002] #.~1.    75.864280: rcu_dyntick: --= 3 1 0xc
> <idle>-0       [002] #.~2.    75.864280: preempt_count_sub
> <-handle_irq_pipelined_finish <idle>-0       [002] #.~2.
> 75.864281: preempt_count_sub <-handle_irq_pipelined_finish
> 
> At this point (i.e. on IRQ exit), I would expect to see a call to
> evl_schedule(), to switch to the timer-responder thread we have just
> woken up (the irq_exit_pipeline() hook in Dovetail ends up calling the
> rescheduling procedure the real-time core provides, for evl that would
> be evl_exit_irq()). It does not happen, so this must be that such
> thread did not get a chance to go back waiting on the evl flag yet,
> i.e. evl_flush_wait() did not wake up anyone. This is interesting:
> this would mean that after 500+ µs, the timer responder is still busy
> doing "something else" (in fact, some kernel activity is borrowing
> time from it). we don't have enough traces to figure out why yet.
> 
>           <idle>-0       [002] *.~2.    75.864791:
> generic_pipeline_irq_desc <-arch_handle_irq <idle>-0       [002]
> *.~1.    75.864791: irq_pipeline_entry: irq=4355 <idle>-0       [002]
> *.~2.    75.864791: handle_apic_irq <-generic_pipeline_irq_desc
> <idle>-0       [002] *.~2.    75.864791: handle_oob_irq
> <-handle_apic_irq <idle>-0       [002] *.~2.    75.864791: do_oob_irq
> <-handle_oob_irq <idle>-0       [002] *.~1.    75.864791:
> irq_handler_entry: irq=4355 name=Out-of-band LAPIC timer interrupt
> <idle>-0       [002] *.~2.    75.864791: lapic_oob_handler
> <-do_oob_irq <idle>-0       [002] *.~1.    75.864791:
> local_timer_entry: vector=236 <idle>-0       [002] *.~2.
> 75.864791: evl_core_tick <-lapic_oob_handler <idle>-0       [002]
> *.~2.    75.864791: do_clock_tick <-evl_core_tick
> 
> As expected, the latest timer tick scheduled by evl occurs close to
> the target, only off by one microsecond (75.864791 vs 75.864790). So
> we have no timing issue here. Something else is at work.
> 
> Could you get me a trace which goes back up to two secs before the
> time reported by evl about the inband switch of the timer responder?
> e.g.
> 
> [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> excpt=14, user_pc=0x77d08492a5fe]
> 
> In the above case, I would need a trace going back to 73.214527 or so.
> 

Apparently the ftrace buffer was too small. I will adjust it to store
more data and share results.

> TIA,
> 



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-29 13:51                           ` Łukasz Majewski
@ 2025-10-30 12:26                             ` Łukasz Majewski
  2025-10-30 16:17                               ` Philippe Gerum
  2025-10-30 16:26                               ` Philippe Gerum
  0 siblings, 2 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-30 12:26 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Hi Philippe,
> 
> > Hi Łukasz,
> > 
> > Łukasz Majewski <lukma@nabladev.com> writes:
> >   
> > > Hi Philippe,
> > >    
> > >> > so either some
> > >> > traces are missing,      
> > >> 
> > >> I will check if I can add more tracing.    
> > >
> > > On one console:
> > > ---------------
> > > evl trace -eirq -f; sleep 1; latmus -c 2 -p 1000 -P 99
> > >
> > >
> > > On the second one:
> > > ------------------
> > > test_prog/stress_prog_NO_IO &
> > > sleep 5
> > > test_prog/stress_prog_NO_IO &
> > >
> > > echo 0 > /sys/kernel/tracing/tracing_on
> > > evl trace -p > Xenomai4-ISW-trace-6.6-logs.txt
> > >
> > > First trace:
> > > ############
> > >
> > > The relevant dmesg output:
> > >
> > > [   75.214527] EVL: timer-responder:743 switching in-band
> > > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> > >
> > > When I do look into logs - there is also a "hole" when the ISW is
> > > reported by dmesg.
> > >
> > >
> > > However, there are some hints when we look around evl_latmus:
> > >
> > > timer-responder-745     [002] d.~2.    75.868139: evl_latspot: **
> > > latency peak: 18843.649 us **
> > >
> > > On CPU 20:
> > > stress_prog_NO_-752     [020] ...2.    75.868139:
> > > folio_add_lru_vma <-__handle_mm_fault
> > >
> > > Which ends with folio_add_lru()...
> > > And after that we do have the ~18ms latency peak.
> > >
> > > And the "folio_*" is a relatively new feature added to Linux
> > > kernel [1].
> > >
> > > The logs can be fetched from:
> > > https://nextcloud.swupdate.org/index.php/s/EjHo5EAwMTeY5ya
> > >
> > >
> > > Second trace [*]:
> > > #################
> > >
> > >
> > > [ 1610.086961] EVL: timer-responder:768 switching in-band
> > > [pid=770, excpt=14, user_pc=0x72e9d35af5fe]
> > >
> > > <idle>-0 [008] *..1.  1610.076068: rcu_dyntick: Start 1 0 0x2ec
> > > <idle>-0 [003] *.~1.  1610.102242: rcu_dyntick: Endirq 0 1 0x404
> > > <idle>-0 [003] *.~2.  1610.102242: arch_handle_irq
> > > <-arch_pipeline_entry <idle>-0 [003] *.~2.  1610.102242:
> > > irq_to_desc <-arch_handle_irq <idle>-0 [003] *.~1.  1610.102242:
> > > ma_read: mtree_load Node: 00000000c4071949 (0
> > > 18446744073709551615) range: 4355-4355 <idle>-0 [003] *.~2.
> > > 1610.102242: rcu_oob_prepare_lock <-mtree_load <idle>-0 [003]
> > > *.~2. 1610.102242: __rcu_read_lock <-mtree_load <idle>-0 [003]
> > > *.~2. 1610.102242: __rcu_read_unlock <-mtree_load
> > >
> > >
> > > Now some questions:
> > > 1. Why '~' (i.e. oob stage) is on CPU=3 as we pinpointed execution
> > >    of the 'latmus' to CPU=2 (and the command line has isolcpu=2) ?
> > >    
> > 
> > Dovetail always switches up front to the oob stage before calling an
> > irq handler, so this trace is normal (upgrading the execution stage
> > for the current context is a lightweight operation, this is entirely
> > different from switching a task from inband to oob mode which is
> > heavyweight).  
> 
> Thanks for the explanation.
> 
> >   
> > > 2. The mtree_load() is called in the OOB context... IMHO, this
> > > shall be done before we do start the latmus?
> > >    
> > 
> > mtree_load() is simply the lookup call into a maple tree. In this
> > context, the call fetches the IRQ descriptor corresponding to the
> > interrupt number in a descriptor table managed as a tree of such
> > kind. No issue there.  
> 
> Ok.
> 
> >   
> > > 3. In this case - there is also a "hole" between the ISW time
> > > (1610.086961) and the ones between rcu_dyntick: Start / Endirq
> > >
> > >
> > > Link to the full log:
> > > https://nextcloud.swupdate.org/index.php/s/nPRAnqCLFDibX6r
> > >
> > >    
> > 
> > Ok, so this trace is interesting. Since the timer responder thread
> > was pinned on CPU2, the first investigation needs to focus on that
> > CPU specifically. Filtering out all activity from other CPUs for
> > now, here are some observations:
> > 
> >           <idle>-0       [002] *.~2.    75.864279: evl_core_tick
> > <-lapic_oob_handler <idle>-0       [002] *.~2.    75.864279:
> > do_clock_tick <-evl_core_tick <idle>-0       [002] *.~1.
> > 75.864279: evl_timer_expire: timer=latmus_pulse_handler <idle>-0
> >  [002] *.~2.    75.864279: latmus_pulse_handler <-do_clock_tick
> > <idle>-0       [002] *.~2.    75.864279: evl_flush_wait_locked
> > <-latmus_pulse_handler <idle>-0       [002] *.~1.    75.864280:
> > evl_flush_wait: wchan=&wf->wait
> > 
> > CPU2 received an oob IRQ timer event for waking up the
> > timer-responder thread which measures latency. evl_flush_wait() is
> > the wake up call that posts an evl 'flag' which the responder waits
> > on for the next event.
> > 
> >           <idle>-0       [002] *.~2.    75.864280: evl_insert_tnode
> > <-do_clock_tick <idle>-0       [002] *.~2.    75.864280:
> > evl_program_proxy_tick <-do_clock_tick <idle>-0       [002] *.~2.
> > 75.864280: lapic_next_deadline <-evl_program_proxy_tick <idle>-0
> >  [002] *.~2.    75.864280: do_trace_write_msr <-lapic_next_deadline
> > <idle>-0       [002] *.~1.    75.864280: write_msr: 6e0, value
> > 340012ec56 <idle>-0       [002] *.~1.    75.864280: evl_timer_shot:
> > [proxy-timer/2] at 75.864790 (delay: 510 us, 134693 cycles)
> > 
> > On exit from the IRQ handling process, we program the timer for the
> > next shot, which should be 510 us ahead. With a 1Khz frequency, this
> > means that we consumed half of the time budget already, which is a
> > lot. But some guy is pounding the hardware with OOM-prone activity,
> > so this may make sense anyway. So far so good. The next oob tick
> > should occur as close as possible to 75.864790.
> > 
> >           <idle>-0       [002] *.~1.    75.864280: local_timer_exit:
> > vector=236 <idle>-0       [002] *.~1.    75.864280:
> > irq_handler_exit: irq=4355 ret=handled <idle>-0       [002] *.~1.
> >  75.864280: irq_pipeline_exit: irq=4355 <idle>-0       [002] *.~1.
> >   75.864280: irq_enable: caller=arch_pipeline_entry+0x9f/0x110
> > parent=sysvec_apic_timer_interrupt+0x12/0x20 <idle>-0       [002]
> > #.~2.    75.864280: handle_irq_pipelined_finish
> > <-arch_pipeline_entry <idle>-0       [002] #.~1.    75.864280:
> > rcu_dyntick: --= 3 1 0xc <idle>-0       [002] #.~2.    75.864280:
> > preempt_count_sub <-handle_irq_pipelined_finish <idle>-0
> > [002] #.~2. 75.864281: preempt_count_sub
> > <-handle_irq_pipelined_finish
> > 
> > At this point (i.e. on IRQ exit), I would expect to see a call to
> > evl_schedule(), to switch to the timer-responder thread we have just
> > woken up (the irq_exit_pipeline() hook in Dovetail ends up calling
> > the rescheduling procedure the real-time core provides, for evl
> > that would be evl_exit_irq()). It does not happen, so this must be
> > that such thread did not get a chance to go back waiting on the evl
> > flag yet, i.e. evl_flush_wait() did not wake up anyone. This is
> > interesting: this would mean that after 500+ µs, the timer
> > responder is still busy doing "something else" (in fact, some
> > kernel activity is borrowing time from it). we don't have enough
> > traces to figure out why yet.
> > 
> >           <idle>-0       [002] *.~2.    75.864791:
> > generic_pipeline_irq_desc <-arch_handle_irq <idle>-0       [002]
> > *.~1.    75.864791: irq_pipeline_entry: irq=4355 <idle>-0
> > [002] *.~2.    75.864791: handle_apic_irq
> > <-generic_pipeline_irq_desc <idle>-0       [002] *.~2.
> > 75.864791: handle_oob_irq <-handle_apic_irq <idle>-0       [002]
> > *.~2.    75.864791: do_oob_irq <-handle_oob_irq <idle>-0
> > [002] *.~1.    75.864791: irq_handler_entry: irq=4355
> > name=Out-of-band LAPIC timer interrupt <idle>-0       [002] *.~2.
> >  75.864791: lapic_oob_handler <-do_oob_irq <idle>-0       [002]
> > *.~1.    75.864791: local_timer_entry: vector=236 <idle>-0
> > [002] *.~2. 75.864791: evl_core_tick <-lapic_oob_handler <idle>-0
> >     [002] *.~2.    75.864791: do_clock_tick <-evl_core_tick
> > 
> > As expected, the latest timer tick scheduled by evl occurs close to
> > the target, only off by one microsecond (75.864791 vs 75.864790). So
> > we have no timing issue here. Something else is at work.
> > 
> > Could you get me a trace which goes back up to two secs before the
> > time reported by evl about the inband switch of the timer responder?
> > e.g.

Why is it 2 seconds? Is it related to latmus "refresh time"? (which is
1s)

> > 
> > [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
> > excpt=14, user_pc=0x77d08492a5fe]
> > 
> > In the above case, I would need a trace going back to 73.214527 or
> > so. 
> 
> Apparently the ftrace buffer was too small. I will adjust it to store
> more data and share results.

It is also interesting, that:

bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
16387
bash-5.2# evl trace -eirq -f
tracing enabled
bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
131

Reduces the buffer size...

Hence, I've used the "normal" ftrace with enabled evl functions - the
detailed information about the ftrace config is with log files:
https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7


To extract the cpu2 specific data:
bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt > cpu2.txt

Here again the ISW happens in the cpu2 execution "hole":
[  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
excpt=14, user_pc=0x7588e71985fe]

Happens for cpu2 between:

11832546:          <idle>-0     [002] *..2.   128.466438:
	sched_idle_set_state <-cpuidle_enter_state
11832547:          <idle>-0     [002] *..2.   128.466468:
	handle_irq_pipelined_prepare <-arch_pipeline_entry

However, now the log is large enough to inspect what has happened in
the system for ~4s before.


With the logs:
timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
[proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)

and after ISW:

<idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)

Why there is such increase in "delay" (253 vs 739 us) ? 

It looks like we have only 2us delay as the timer has fired on
128.466469 vs. expected 128.466467.

> 
> > TIA,
> >   
> 
> 
> 

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-30 12:26                             ` Łukasz Majewski
@ 2025-10-30 16:17                               ` Philippe Gerum
  2025-10-31 15:56                                 ` Łukasz Majewski
  2025-10-30 16:26                               ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-10-30 16:17 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai


Hi Łukasz,

Łukasz Majewski <lukma@nabladev.com> writes:

>> > Could you get me a trace which goes back up to two secs before the
>> > time reported by evl about the inband switch of the timer responder?
>> > e.g.
>
> Why is it 2 seconds? Is it related to latmus "refresh time"? (which is
> 1s)
>

Picking this delay is purely heuristical. Such a long backlog is likely
going to include the smoking gun, if any.

>> > 
>> > [   75.214527] EVL: timer-responder:743 switching in-band [pid=745,
>> > excpt=14, user_pc=0x77d08492a5fe]
>> > 
>> > In the above case, I would need a trace going back to 73.214527 or
>> > so. 
>> 
>> Apparently the ftrace buffer was too small. I will adjust it to store
>> more data and share results.
>
> It is also interesting, that:
>
> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> 16387
> bash-5.2# evl trace -eirq -f
> tracing enabled
> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> 131
>
> Reduces the buffer size...
>

Yeah, the evl-trace script is not that smart. I can see the issue
causing this, will fix.

> Hence, I've used the "normal" ftrace with enabled evl functions - the

FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
the sysfs entries manually, and may be a bit naive at times. As Jan
pointed out long ago, relying on the trace-cmd front-end would be a
better approach than tweaking ftrace's sysfs-based controls manually at
the very least. evl-trace adds EVL semantics to the tracing
configuration which is handy, but the way it is done today may deserve a
revamping, or maybe drop it entirely in favor of trace-cmd if we have
enough flexibility there in order to introduce some EVL specifics.

> detailed information about the ftrace config is with log files:
> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>
>
> To extract the cpu2 specific data:
> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt > cpu2.txt
>
> Here again the ISW happens in the cpu2 execution "hole":
> [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
> excpt=14, user_pc=0x7588e71985fe]
>
> Happens for cpu2 between:
>
> 11832546:          <idle>-0     [002] *..2.   128.466438:
> 	sched_idle_set_state <-cpuidle_enter_state
> 11832547:          <idle>-0     [002] *..2.   128.466468:
> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>
> However, now the log is large enough to inspect what has happened in
> the system for ~4s before.
>
>
> With the logs:
> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>
> and after ISW:
>
> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>
> Why there is such increase in "delay" (253 vs 739 us) ? 
>
> It looks like we have only 2us delay as the timer has fired on
> 128.466469 vs. expected 128.466467.
>

Ok, so the issue is still the same: for some reason, the timer-responder
thread is taking a (minor) fault when accessing some memory, on its way
back from the ioctl(EVL_LATIOC_PULSE) syscall, or right after. This
seems to be valid memory though, just not immediately accessible,
otherwise we'd have received SEGV, which we did not.

--
          <idle>-0       [002] #.~2.   129.084225: fpu__suspend_inband <-dovetail_context_switch
          <idle>-0       [002] #.~2.   129.084225: switch_mm_irqs_off <-dovetail_context_switch
 timer-responder-738     [002] *.~3.   129.084225: switch_fpu_return <-dovetail_context_switch
 timer-responder-738     [002] *.~2.   129.084225: evl_switch_tail: { current=timer-responder:736[738] }
 timer-responder-738     [002] d.~2.   129.084225: evl_finish_wait: wchan=&wf->wait
 timer-responder-738     [002] d.~2.   129.084225: evl_oob_sysexit: result=0
--

The responder is resuming from a wait on the EVL_LATIOC_PULSE flag,
leaving the kernel, all seems fine so far.. I'm trying to find some
rationale for taking a fault in the syscall return path from the
assembly section which would explain a lack of trace _and_ the faulting
instruction living in a user-space mapping (this is no vDSO call
anyway), but cannot find any so far. A corruption of the register frame
used in restoring the calling user context would most likely have
triggered a major fault, not a minor one. Hopefully.

--
 timer-responder-738     [002] *.~3.   129.084226: handle_oob_trap_entry <-__oob_trap_notify
 timer-responder-738     [002] *.~2.   129.084227: evl_thread_fault: ip=0x7588e71985fe trapnr=0xe
 timer-responder-738     [002] *.~3.   129.084227: _printk <-handle_oob_trap_entry
--

We need more information from the register file passed to us upon
trap. Could you apply the following quick patch to the test kernel? (not
to be upstreamed as-is, this is x86-specific unlike the file it applies
to). TIA,

diff --git a/include/trace/events/evl.h b/include/trace/events/evl.h
index 7136437d00227..3deeafb17e0fc 100644
--- a/include/trace/events/evl.h
+++ b/include/trace/events/evl.h
@@ -443,17 +443,26 @@ TRACE_EVENT(evl_thread_fault,
 	TP_ARGS(trapnr, regs),
 
 	TP_STRUCT__entry(
+		__field(long,	sp)
+		__field(long,	flags)
 		__field(long,	ip)
+		__field(long,	orig_ax)
+		__field(u16,	cs)
 		__field(unsigned int, trapnr)
 	),
 
 	TP_fast_assign(
+		__entry->sp = regs->sp;
+		__entry->flags = regs->flags;
 		__entry->ip = instruction_pointer(regs);
+		__entry->orig_ax = regs->orig_ax;
+		__entry->cs = regs->cs;
 		__entry->trapnr = trapnr;
 	),
 
-	TP_printk("ip=%#lx trapnr=%#x",
-		  __entry->ip, __entry->trapnr)
+	TP_printk("ip=%#lx trapnr=%#x, sp=%#lx, flags=%#lx, orig_ax=%#lx, cs=%#hx",
+		__entry->ip, __entry->trapnr,
+		__entry->sp, __entry->flags, __entry->orig_ax, __entry->cs)
 );
 
 TRACE_EVENT(evl_thread_set_current_prio,

-- 
Philippe.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-30 12:26                             ` Łukasz Majewski
  2025-10-30 16:17                               ` Philippe Gerum
@ 2025-10-30 16:26                               ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-30 16:26 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai

Łukasz Majewski <lukma@nabladev.com> writes:

>
> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>
> Why there is such increase in "delay" (253 vs 739 us) ?

This is the delay until the next timer tick in an absolute timeline,
therefore the larger the value, the lesser time we consumed to handle
the last tick prior to programming the next one (e.g. 128.466214s +
253us = 128.466467s). The time consumed is still huge for this class of
hardware, but ftrace + memory stressing do explain this.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-30 16:17                               ` Philippe Gerum
@ 2025-10-31 15:56                                 ` Łukasz Majewski
  2025-10-31 16:30                                   ` Philippe Gerum
  0 siblings, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-10-31 15:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Hi Łukasz,
> 
> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> >> > Could you get me a trace which goes back up to two secs before
> >> > the time reported by evl about the inband switch of the timer
> >> > responder? e.g.  
> >
> > Why is it 2 seconds? Is it related to latmus "refresh time"? (which
> > is 1s)
> >  
> 
> Picking this delay is purely heuristical. Such a long backlog is
> likely going to include the smoking gun, if any.
> 
> >> > 
> >> > [   75.214527] EVL: timer-responder:743 switching in-band
> >> > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> >> > 
> >> > In the above case, I would need a trace going back to 73.214527
> >> > or so.   
> >> 
> >> Apparently the ftrace buffer was too small. I will adjust it to
> >> store more data and share results.  
> >
> > It is also interesting, that:
> >
> > bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> > 16387
> > bash-5.2# evl trace -eirq -f
> > tracing enabled
> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> > 131
> >
> > Reduces the buffer size...
> >  
> 
> Yeah, the evl-trace script is not that smart. I can see the issue
> causing this, will fix.
> 
> > Hence, I've used the "normal" ftrace with enabled evl functions -
> > the  
> 
> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
> the sysfs entries manually, and may be a bit naive at times. As Jan
> pointed out long ago, relying on the trace-cmd front-end would be a
> better approach than tweaking ftrace's sysfs-based controls manually
> at the very least. evl-trace adds EVL semantics to the tracing
> configuration which is handy, but the way it is done today may
> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
> if we have enough flexibility there in order to introduce some EVL
> specifics.
> 
> > detailed information about the ftrace config is with log files:
> > https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
> >
> >
> > To extract the cpu2 specific data:
> > bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
> > cpu2.txt
> >
> > Here again the ISW happens in the cpu2 execution "hole":
> > [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
> > excpt=14, user_pc=0x7588e71985fe]
> >
> > Happens for cpu2 between:
> >
> > 11832546:          <idle>-0     [002] *..2.   128.466438:
> > 	sched_idle_set_state <-cpuidle_enter_state
> > 11832547:          <idle>-0     [002] *..2.   128.466468:
> > 	handle_irq_pipelined_prepare <-arch_pipeline_entry
> >
> > However, now the log is large enough to inspect what has happened in
> > the system for ~4s before.
> >
> >
> > With the logs:
> > timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
> > [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
> >
> > and after ISW:
> >
> > <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> > latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
> >
> > Why there is such increase in "delay" (253 vs 739 us) ? 
> >
> > It looks like we have only 2us delay as the timer has fired on
> > 128.466469 vs. expected 128.466467.
> >  
> 
> Ok, so the issue is still the same: for some reason, the
> timer-responder thread is taking a (minor) fault when accessing some
> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
> right after. This seems to be valid memory though, just not
> immediately accessible, otherwise we'd have received SEGV, which we
> did not.

Yes, I also think so - like some "deferred allocation" despite of
mlockall.

Please find the logs with the below patch applied:
https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E

> 
> --
>           <idle>-0       [002] #.~2.   129.084225:
> fpu__suspend_inband <-dovetail_context_switch <idle>-0       [002]
> #.~2.   129.084225: switch_mm_irqs_off <-dovetail_context_switch
> timer-responder-738     [002] *.~3.   129.084225: switch_fpu_return
> <-dovetail_context_switch timer-responder-738     [002] *.~2.
> 129.084225: evl_switch_tail: { current=timer-responder:736[738] }
> timer-responder-738     [002] d.~2.   129.084225: evl_finish_wait:
> wchan=&wf->wait timer-responder-738     [002] d.~2.   129.084225:
> evl_oob_sysexit: result=0 --
> 
> The responder is resuming from a wait on the EVL_LATIOC_PULSE flag,
> leaving the kernel, all seems fine so far.. I'm trying to find some
> rationale for taking a fault in the syscall return path from the
> assembly section which would explain a lack of trace _and_ the
> faulting instruction living in a user-space mapping (this is no vDSO
> call anyway), but cannot find any so far. A corruption of the
> register frame used in restoring the calling user context would most
> likely have triggered a major fault, not a minor one. Hopefully.
> 
> --
>  timer-responder-738     [002] *.~3.   129.084226:
> handle_oob_trap_entry <-__oob_trap_notify timer-responder-738
> [002] *.~2.   129.084227: evl_thread_fault: ip=0x7588e71985fe
> trapnr=0xe timer-responder-738     [002] *.~3.   129.084227: _printk
> <-handle_oob_trap_entry --
> 
> We need more information from the register file passed to us upon
> trap. Could you apply the following quick patch to the test kernel?
> (not to be upstreamed as-is, this is x86-specific unlike the file it
> applies to). TIA,
> 
> diff --git a/include/trace/events/evl.h b/include/trace/events/evl.h
> index 7136437d00227..3deeafb17e0fc 100644
> --- a/include/trace/events/evl.h
> +++ b/include/trace/events/evl.h
> @@ -443,17 +443,26 @@ TRACE_EVENT(evl_thread_fault,
>  	TP_ARGS(trapnr, regs),
>  
>  	TP_STRUCT__entry(
> +		__field(long,	sp)
> +		__field(long,	flags)
>  		__field(long,	ip)
> +		__field(long,	orig_ax)
> +		__field(u16,	cs)
>  		__field(unsigned int, trapnr)
>  	),
>  
>  	TP_fast_assign(
> +		__entry->sp = regs->sp;
> +		__entry->flags = regs->flags;
>  		__entry->ip = instruction_pointer(regs);
> +		__entry->orig_ax = regs->orig_ax;
> +		__entry->cs = regs->cs;
>  		__entry->trapnr = trapnr;
>  	),
>  
> -	TP_printk("ip=%#lx trapnr=%#x",
> -		  __entry->ip, __entry->trapnr)
> +	TP_printk("ip=%#lx trapnr=%#x, sp=%#lx, flags=%#lx,
> orig_ax=%#lx, cs=%#hx",
> +		__entry->ip, __entry->trapnr,
> +		__entry->sp, __entry->flags, __entry->orig_ax,
> __entry->cs) );
>  
>  TRACE_EVENT(evl_thread_set_current_prio,
> 



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 15:56                                 ` Łukasz Majewski
@ 2025-10-31 16:30                                   ` Philippe Gerum
  2025-10-31 17:34                                     ` Jan Kiszka
  2025-11-01 15:59                                     ` Łukasz Majewski
  0 siblings, 2 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-31 16:30 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai

Łukasz Majewski <lukma@nabladev.com> writes:

> Hi Philippe,
>
>> Hi Łukasz,
>> 
>> Łukasz Majewski <lukma@nabladev.com> writes:
>> 
>> >> > Could you get me a trace which goes back up to two secs before
>> >> > the time reported by evl about the inband switch of the timer
>> >> > responder? e.g.  
>> >
>> > Why is it 2 seconds? Is it related to latmus "refresh time"? (which
>> > is 1s)
>> >  
>> 
>> Picking this delay is purely heuristical. Such a long backlog is
>> likely going to include the smoking gun, if any.
>> 
>> >> > 
>> >> > [   75.214527] EVL: timer-responder:743 switching in-band
>> >> > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>> >> > 
>> >> > In the above case, I would need a trace going back to 73.214527
>> >> > or so.   
>> >> 
>> >> Apparently the ftrace buffer was too small. I will adjust it to
>> >> store more data and share results.  
>> >
>> > It is also interesting, that:
>> >
>> > bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>> > 16387
>> > bash-5.2# evl trace -eirq -f
>> > tracing enabled
>> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>> > 131
>> >
>> > Reduces the buffer size...
>> >  
>> 
>> Yeah, the evl-trace script is not that smart. I can see the issue
>> causing this, will fix.
>> 
>> > Hence, I've used the "normal" ftrace with enabled evl functions -
>> > the  
>> 
>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
>> the sysfs entries manually, and may be a bit naive at times. As Jan
>> pointed out long ago, relying on the trace-cmd front-end would be a
>> better approach than tweaking ftrace's sysfs-based controls manually
>> at the very least. evl-trace adds EVL semantics to the tracing
>> configuration which is handy, but the way it is done today may
>> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
>> if we have enough flexibility there in order to introduce some EVL
>> specifics.
>> 
>> > detailed information about the ftrace config is with log files:
>> > https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>> >
>> >
>> > To extract the cpu2 specific data:
>> > bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>> > cpu2.txt
>> >
>> > Here again the ISW happens in the cpu2 execution "hole":
>> > [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
>> > excpt=14, user_pc=0x7588e71985fe]
>> >
>> > Happens for cpu2 between:
>> >
>> > 11832546:          <idle>-0     [002] *..2.   128.466438:
>> > 	sched_idle_set_state <-cpuidle_enter_state
>> > 11832547:          <idle>-0     [002] *..2.   128.466468:
>> > 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>> >
>> > However, now the log is large enough to inspect what has happened in
>> > the system for ~4s before.
>> >
>> >
>> > With the logs:
>> > timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>> > [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>> >
>> > and after ISW:
>> >
>> > <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>> > latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>> >
>> > Why there is such increase in "delay" (253 vs 739 us) ? 
>> >
>> > It looks like we have only 2us delay as the timer has fired on
>> > 128.466469 vs. expected 128.466467.
>> >  
>> 
>> Ok, so the issue is still the same: for some reason, the
>> timer-responder thread is taking a (minor) fault when accessing some
>> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
>> right after. This seems to be valid memory though, just not
>> immediately accessible, otherwise we'd have received SEGV, which we
>> did not.
>
> Yes, I also think so - like some "deferred allocation" despite of
> mlockall.
>
> Please find the logs with the below patch applied:
> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>

Thanks. I need to analyze the fault flags next week to see if I can make
something of these, but I just stumbled on an obviously significant fact
when looking at the multi-CPU view: the memory compaction daemon is
running asynchronously on other cores while the responder is taking the
fault. Compaction is all about moving memory around to free larger
physical regions, so updating PTEs will happen as a result.

Moreover, I could not reproduce the issue at hand, but I did not have
CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
on your end, to see if the fault still appears?

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 16:30                                   ` Philippe Gerum
@ 2025-10-31 17:34                                     ` Jan Kiszka
  2025-10-31 18:09                                       ` Philippe Gerum
  2025-10-31 18:13                                       ` Philippe Gerum
  2025-11-01 15:59                                     ` Łukasz Majewski
  1 sibling, 2 replies; 50+ messages in thread
From: Jan Kiszka @ 2025-10-31 17:34 UTC (permalink / raw)
  To: Philippe Gerum, Łukasz Majewski; +Cc: Giulio Moro, Xenomai

On 31.10.25 17:30, Philippe Gerum wrote:
> Łukasz Majewski <lukma@nabladev.com> writes:
> 
>> Hi Philippe,
>>
>>> Hi Łukasz,
>>>
>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>>
>>>>>> Could you get me a trace which goes back up to two secs before
>>>>>> the time reported by evl about the inband switch of the timer
>>>>>> responder? e.g.  
>>>>
>>>> Why is it 2 seconds? Is it related to latmus "refresh time"? (which
>>>> is 1s)
>>>>  
>>>
>>> Picking this delay is purely heuristical. Such a long backlog is
>>> likely going to include the smoking gun, if any.
>>>
>>>>>>
>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>>>>>>
>>>>>> In the above case, I would need a trace going back to 73.214527
>>>>>> or so.   
>>>>>
>>>>> Apparently the ftrace buffer was too small. I will adjust it to
>>>>> store more data and share results.  
>>>>
>>>> It is also interesting, that:
>>>>
>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>> 16387
>>>> bash-5.2# evl trace -eirq -f
>>>> tracing enabled
>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>> 131
>>>>
>>>> Reduces the buffer size...
>>>>  
>>>
>>> Yeah, the evl-trace script is not that smart. I can see the issue
>>> causing this, will fix.
>>>
>>>> Hence, I've used the "normal" ftrace with enabled evl functions -
>>>> the  
>>>
>>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
>>> the sysfs entries manually, and may be a bit naive at times. As Jan
>>> pointed out long ago, relying on the trace-cmd front-end would be a
>>> better approach than tweaking ftrace's sysfs-based controls manually
>>> at the very least. evl-trace adds EVL semantics to the tracing
>>> configuration which is handy, but the way it is done today may
>>> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
>>> if we have enough flexibility there in order to introduce some EVL
>>> specifics.
>>>
>>>> detailed information about the ftrace config is with log files:
>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>>>>
>>>>
>>>> To extract the cpu2 specific data:
>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>>>> cpu2.txt
>>>>
>>>> Here again the ISW happens in the cpu2 execution "hole":
>>>> [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
>>>> excpt=14, user_pc=0x7588e71985fe]
>>>>
>>>> Happens for cpu2 between:
>>>>
>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
>>>> 	sched_idle_set_state <-cpuidle_enter_state
>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>>>>
>>>> However, now the log is large enough to inspect what has happened in
>>>> the system for ~4s before.
>>>>
>>>>
>>>> With the logs:
>>>> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>>>> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>>>>
>>>> and after ISW:
>>>>
>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>>>>
>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
>>>>
>>>> It looks like we have only 2us delay as the timer has fired on
>>>> 128.466469 vs. expected 128.466467.
>>>>  
>>>
>>> Ok, so the issue is still the same: for some reason, the
>>> timer-responder thread is taking a (minor) fault when accessing some
>>> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
>>> right after. This seems to be valid memory though, just not
>>> immediately accessible, otherwise we'd have received SEGV, which we
>>> did not.
>>
>> Yes, I also think so - like some "deferred allocation" despite of
>> mlockall.
>>
>> Please find the logs with the below patch applied:
>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>>
> 
> Thanks. I need to analyze the fault flags next week to see if I can make
> something of these, but I just stumbled on an obviously significant fact
> when looking at the multi-CPU view: the memory compaction daemon is
> running asynchronously on other cores while the responder is taking the
> fault. Compaction is all about moving memory around to free larger
> physical regions, so updating PTEs will happen as a result.
> 
> Moreover, I could not reproduce the issue at hand, but I did not have
> CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
> on your end, to see if the fault still appears?

CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
incompatible with RT. Xenomai 3 is warning about that that, EVL does not?

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 17:34                                     ` Jan Kiszka
@ 2025-10-31 18:09                                       ` Philippe Gerum
  2025-10-31 18:11                                         ` Philippe Gerum
  2025-11-01 11:31                                         ` Łukasz Majewski
  2025-10-31 18:13                                       ` Philippe Gerum
  1 sibling, 2 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-31 18:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Łukasz Majewski, Giulio Moro, Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 31.10.25 17:30, Philippe Gerum wrote:
>> Łukasz Majewski <lukma@nabladev.com> writes:
>> 
>>> Hi Philippe,
>>>
>>>> Hi Łukasz,
>>>>
>>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>>>
>>>>>>> Could you get me a trace which goes back up to two secs before
>>>>>>> the time reported by evl about the inband switch of the timer
>>>>>>> responder? e.g.  
>>>>>
>>>>> Why is it 2 seconds? Is it related to latmus "refresh time"? (which
>>>>> is 1s)
>>>>>  
>>>>
>>>> Picking this delay is purely heuristical. Such a long backlog is
>>>> likely going to include the smoking gun, if any.
>>>>
>>>>>>>
>>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
>>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>>>>>>>
>>>>>>> In the above case, I would need a trace going back to 73.214527
>>>>>>> or so.   
>>>>>>
>>>>>> Apparently the ftrace buffer was too small. I will adjust it to
>>>>>> store more data and share results.  
>>>>>
>>>>> It is also interesting, that:
>>>>>
>>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 16387
>>>>> bash-5.2# evl trace -eirq -f
>>>>> tracing enabled
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 131
>>>>>
>>>>> Reduces the buffer size...
>>>>>  
>>>>
>>>> Yeah, the evl-trace script is not that smart. I can see the issue
>>>> causing this, will fix.
>>>>
>>>>> Hence, I've used the "normal" ftrace with enabled evl functions -
>>>>> the  
>>>>
>>>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
>>>> the sysfs entries manually, and may be a bit naive at times. As Jan
>>>> pointed out long ago, relying on the trace-cmd front-end would be a
>>>> better approach than tweaking ftrace's sysfs-based controls manually
>>>> at the very least. evl-trace adds EVL semantics to the tracing
>>>> configuration which is handy, but the way it is done today may
>>>> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
>>>> if we have enough flexibility there in order to introduce some EVL
>>>> specifics.
>>>>
>>>>> detailed information about the ftrace config is with log files:
>>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>>>>>
>>>>>
>>>>> To extract the cpu2 specific data:
>>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>>>>> cpu2.txt
>>>>>
>>>>> Here again the ISW happens in the cpu2 execution "hole":
>>>>> [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
>>>>> excpt=14, user_pc=0x7588e71985fe]
>>>>>
>>>>> Happens for cpu2 between:
>>>>>
>>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
>>>>> 	sched_idle_set_state <-cpuidle_enter_state
>>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
>>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>>>>>
>>>>> However, now the log is large enough to inspect what has happened in
>>>>> the system for ~4s before.
>>>>>
>>>>>
>>>>> With the logs:
>>>>> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>>>>> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>>>>>
>>>>> and after ISW:
>>>>>
>>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>>>>>
>>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
>>>>>
>>>>> It looks like we have only 2us delay as the timer has fired on
>>>>> 128.466469 vs. expected 128.466467.
>>>>>  
>>>>
>>>> Ok, so the issue is still the same: for some reason, the
>>>> timer-responder thread is taking a (minor) fault when accessing some
>>>> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
>>>> right after. This seems to be valid memory though, just not
>>>> immediately accessible, otherwise we'd have received SEGV, which we
>>>> did not.
>>>
>>> Yes, I also think so - like some "deferred allocation" despite of
>>> mlockall.
>>>
>>> Please find the logs with the below patch applied:
>>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>>>
>> 
>> Thanks. I need to analyze the fault flags next week to see if I can make
>> something of these, but I just stumbled on an obviously significant fact
>> when looking at the multi-CPU view: the memory compaction daemon is
>> running asynchronously on other cores while the responder is taking the
>> fault. Compaction is all about moving memory around to free larger
>> physical regions, so updating PTEs will happen as a result.
>> 
>> Moreover, I could not reproduce the issue at hand, but I did not have
>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
>> on your end, to see if the fault still appears?
>
> CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
> incompatible with RT. Xenomai 3 is warning about that that, EVL does not?
>

Dovetail should do that when CONFIG_DOVETAIL is set, this is by
definition common to any rt core.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 18:09                                       ` Philippe Gerum
@ 2025-10-31 18:11                                         ` Philippe Gerum
  2025-11-01 11:32                                           ` Łukasz Majewski
  2025-11-03  7:57                                           ` Florian Bezdeka
  2025-11-01 11:31                                         ` Łukasz Majewski
  1 sibling, 2 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-31 18:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Łukasz Majewski, Giulio Moro, Xenomai

Philippe Gerum <rpm@xenomai.org> writes:

> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 31.10.25 17:30, Philippe Gerum wrote:
>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>> 
>>>> Hi Philippe,
>>>>
>>>>> Hi Łukasz,
>>>>>
>>>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>>>>
>>>>>>>> Could you get me a trace which goes back up to two secs before
>>>>>>>> the time reported by evl about the inband switch of the timer
>>>>>>>> responder? e.g.  
>>>>>>
>>>>>> Why is it 2 seconds? Is it related to latmus "refresh time"? (which
>>>>>> is 1s)
>>>>>>  
>>>>>
>>>>> Picking this delay is purely heuristical. Such a long backlog is
>>>>> likely going to include the smoking gun, if any.
>>>>>
>>>>>>>>
>>>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
>>>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>>>>>>>>
>>>>>>>> In the above case, I would need a trace going back to 73.214527
>>>>>>>> or so.   
>>>>>>>
>>>>>>> Apparently the ftrace buffer was too small. I will adjust it to
>>>>>>> store more data and share results.  
>>>>>>
>>>>>> It is also interesting, that:
>>>>>>
>>>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>>> 16387
>>>>>> bash-5.2# evl trace -eirq -f
>>>>>> tracing enabled
>>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>>> 131
>>>>>>
>>>>>> Reduces the buffer size...
>>>>>>  
>>>>>
>>>>> Yeah, the evl-trace script is not that smart. I can see the issue
>>>>> causing this, will fix.
>>>>>
>>>>>> Hence, I've used the "normal" ftrace with enabled evl functions -
>>>>>> the  
>>>>>
>>>>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
>>>>> the sysfs entries manually, and may be a bit naive at times. As Jan
>>>>> pointed out long ago, relying on the trace-cmd front-end would be a
>>>>> better approach than tweaking ftrace's sysfs-based controls manually
>>>>> at the very least. evl-trace adds EVL semantics to the tracing
>>>>> configuration which is handy, but the way it is done today may
>>>>> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
>>>>> if we have enough flexibility there in order to introduce some EVL
>>>>> specifics.
>>>>>
>>>>>> detailed information about the ftrace config is with log files:
>>>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>>>>>>
>>>>>>
>>>>>> To extract the cpu2 specific data:
>>>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>>>>>> cpu2.txt
>>>>>>
>>>>>> Here again the ISW happens in the cpu2 execution "hole":
>>>>>> [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
>>>>>> excpt=14, user_pc=0x7588e71985fe]
>>>>>>
>>>>>> Happens for cpu2 between:
>>>>>>
>>>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
>>>>>> 	sched_idle_set_state <-cpuidle_enter_state
>>>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
>>>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>>>>>>
>>>>>> However, now the log is large enough to inspect what has happened in
>>>>>> the system for ~4s before.
>>>>>>
>>>>>>
>>>>>> With the logs:
>>>>>> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>>>>>> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>>>>>>
>>>>>> and after ISW:
>>>>>>
>>>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>>>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>>>>>>
>>>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
>>>>>>
>>>>>> It looks like we have only 2us delay as the timer has fired on
>>>>>> 128.466469 vs. expected 128.466467.
>>>>>>  
>>>>>
>>>>> Ok, so the issue is still the same: for some reason, the
>>>>> timer-responder thread is taking a (minor) fault when accessing some
>>>>> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
>>>>> right after. This seems to be valid memory though, just not
>>>>> immediately accessible, otherwise we'd have received SEGV, which we
>>>>> did not.
>>>>
>>>> Yes, I also think so - like some "deferred allocation" despite of
>>>> mlockall.
>>>>
>>>> Please find the logs with the below patch applied:
>>>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>>>>
>>> 
>>> Thanks. I need to analyze the fault flags next week to see if I can make
>>> something of these, but I just stumbled on an obviously significant fact
>>> when looking at the multi-CPU view: the memory compaction daemon is
>>> running asynchronously on other cores while the responder is taking the
>>> fault. Compaction is all about moving memory around to free larger
>>> physical regions, so updating PTEs will happen as a result.
>>> 
>>> Moreover, I could not reproduce the issue at hand, but I did not have
>>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
>>> on your end, to see if the fault still appears?
>>
>> CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
>> incompatible with RT. Xenomai 3 is warning about that that, EVL does not?
>>
>
> Dovetail should do that when CONFIG_DOVETAIL is set, this is by
> definition common to any rt core.

Also, it would be much better to find a way to exclude any mm bearing
the MMF_DOVETAIL flag from the movable set, instead of excluding
everything that sets CONFIG_COMPACTION.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 17:34                                     ` Jan Kiszka
  2025-10-31 18:09                                       ` Philippe Gerum
@ 2025-10-31 18:13                                       ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-10-31 18:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Łukasz Majewski, Giulio Moro, Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 31.10.25 17:30, Philippe Gerum wrote:
>> Łukasz Majewski <lukma@nabladev.com> writes:
>> 
>>> Hi Philippe,
>>>
>>>> Hi Łukasz,
>>>>
>>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>>>
>>>>>>> Could you get me a trace which goes back up to two secs before
>>>>>>> the time reported by evl about the inband switch of the timer
>>>>>>> responder? e.g.  
>>>>>
>>>>> Why is it 2 seconds? Is it related to latmus "refresh time"? (which
>>>>> is 1s)
>>>>>  
>>>>
>>>> Picking this delay is purely heuristical. Such a long backlog is
>>>> likely going to include the smoking gun, if any.
>>>>
>>>>>>>
>>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
>>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>>>>>>>
>>>>>>> In the above case, I would need a trace going back to 73.214527
>>>>>>> or so.   
>>>>>>
>>>>>> Apparently the ftrace buffer was too small. I will adjust it to
>>>>>> store more data and share results.  
>>>>>
>>>>> It is also interesting, that:
>>>>>
>>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 16387
>>>>> bash-5.2# evl trace -eirq -f
>>>>> tracing enabled
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 131
>>>>>
>>>>> Reduces the buffer size...
>>>>>  
>>>>
>>>> Yeah, the evl-trace script is not that smart. I can see the issue
>>>> causing this, will fix.
>>>>
>>>>> Hence, I've used the "normal" ftrace with enabled evl functions -
>>>>> the  
>>>>
>>>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles with
>>>> the sysfs entries manually, and may be a bit naive at times. As Jan
>>>> pointed out long ago, relying on the trace-cmd front-end would be a
>>>> better approach than tweaking ftrace's sysfs-based controls manually
>>>> at the very least. evl-trace adds EVL semantics to the tracing
>>>> configuration which is handy, but the way it is done today may
>>>> deserve a revamping, or maybe drop it entirely in favor of trace-cmd
>>>> if we have enough flexibility there in order to introduce some EVL
>>>> specifics.
>>>>
>>>>> detailed information about the ftrace config is with log files:
>>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>>>>>
>>>>>
>>>>> To extract the cpu2 specific data:
>>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>>>>> cpu2.txt
>>>>>
>>>>> Here again the ISW happens in the cpu2 execution "hole":
>>>>> [  128.466459] EVL: timer-responder:736 switching in-band [pid=738,
>>>>> excpt=14, user_pc=0x7588e71985fe]
>>>>>
>>>>> Happens for cpu2 between:
>>>>>
>>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
>>>>> 	sched_idle_set_state <-cpuidle_enter_state
>>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
>>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>>>>>
>>>>> However, now the log is large enough to inspect what has happened in
>>>>> the system for ~4s before.
>>>>>
>>>>>
>>>>> With the logs:
>>>>> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>>>>> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>>>>>
>>>>> and after ISW:
>>>>>
>>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>>>>>
>>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
>>>>>
>>>>> It looks like we have only 2us delay as the timer has fired on
>>>>> 128.466469 vs. expected 128.466467.
>>>>>  
>>>>
>>>> Ok, so the issue is still the same: for some reason, the
>>>> timer-responder thread is taking a (minor) fault when accessing some
>>>> memory, on its way back from the ioctl(EVL_LATIOC_PULSE) syscall, or
>>>> right after. This seems to be valid memory though, just not
>>>> immediately accessible, otherwise we'd have received SEGV, which we
>>>> did not.
>>>
>>> Yes, I also think so - like some "deferred allocation" despite of
>>> mlockall.
>>>
>>> Please find the logs with the below patch applied:
>>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>>>
>> 
>> Thanks. I need to analyze the fault flags next week to see if I can make
>> something of these, but I just stumbled on an obviously significant fact
>> when looking at the multi-CPU view: the memory compaction daemon is
>> running asynchronously on other cores while the responder is taking the
>> fault. Compaction is all about moving memory around to free larger
>> physical regions, so updating PTEs will happen as a result.
>> 
>> Moreover, I could not reproduce the issue at hand, but I did not have
>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
>> on your end, to see if the fault still appears?
>
> CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
> incompatible with RT. Xenomai 3 is warning about that that, EVL does not?
>

Yep, CONFIG_{MIGRATION, COMPACTION} is not even tested by evl-check, and
there is no provision for issuing any warning in the EVL-specific
Kconfig bits. So it looks like we have a winner..

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 18:09                                       ` Philippe Gerum
  2025-10-31 18:11                                         ` Philippe Gerum
@ 2025-11-01 11:31                                         ` Łukasz Majewski
  1 sibling, 0 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-11-01 11:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Giulio Moro, Xenomai

Hi Philippe, Jan,

> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
> > On 31.10.25 17:30, Philippe Gerum wrote:  
> >> Łukasz Majewski <lukma@nabladev.com> writes:
> >>   
> >>> Hi Philippe,
> >>>  
> >>>> Hi Łukasz,
> >>>>
> >>>> Łukasz Majewski <lukma@nabladev.com> writes:
> >>>>  
> >>>>>>> Could you get me a trace which goes back up to two secs before
> >>>>>>> the time reported by evl about the inband switch of the timer
> >>>>>>> responder? e.g.    
> >>>>>
> >>>>> Why is it 2 seconds? Is it related to latmus "refresh time"?
> >>>>> (which is 1s)
> >>>>>    
> >>>>
> >>>> Picking this delay is purely heuristical. Such a long backlog is
> >>>> likely going to include the smoking gun, if any.
> >>>>  
> >>>>>>>
> >>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
> >>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> >>>>>>>
> >>>>>>> In the above case, I would need a trace going back to
> >>>>>>> 73.214527 or so.     
> >>>>>>
> >>>>>> Apparently the ftrace buffer was too small. I will adjust it to
> >>>>>> store more data and share results.    
> >>>>>
> >>>>> It is also interesting, that:
> >>>>>
> >>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> >>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >>>>> 16387
> >>>>> bash-5.2# evl trace -eirq -f
> >>>>> tracing enabled
> >>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >>>>> 131
> >>>>>
> >>>>> Reduces the buffer size...
> >>>>>    
> >>>>
> >>>> Yeah, the evl-trace script is not that smart. I can see the issue
> >>>> causing this, will fix.
> >>>>  
> >>>>> Hence, I've used the "normal" ftrace with enabled evl functions
> >>>>> - the    
> >>>>
> >>>> FWIW, EVL relies on the normal ftrace. However, evl-trace
> >>>> fiddles with the sysfs entries manually, and may be a bit naive
> >>>> at times. As Jan pointed out long ago, relying on the trace-cmd
> >>>> front-end would be a better approach than tweaking ftrace's
> >>>> sysfs-based controls manually at the very least. evl-trace adds
> >>>> EVL semantics to the tracing configuration which is handy, but
> >>>> the way it is done today may deserve a revamping, or maybe drop
> >>>> it entirely in favor of trace-cmd if we have enough flexibility
> >>>> there in order to introduce some EVL specifics.
> >>>>  
> >>>>> detailed information about the ftrace config is with log files:
> >>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
> >>>>>
> >>>>>
> >>>>> To extract the cpu2 specific data:
> >>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
> >>>>> cpu2.txt
> >>>>>
> >>>>> Here again the ISW happens in the cpu2 execution "hole":
> >>>>> [  128.466459] EVL: timer-responder:736 switching in-band
> >>>>> [pid=738, excpt=14, user_pc=0x7588e71985fe]
> >>>>>
> >>>>> Happens for cpu2 between:
> >>>>>
> >>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
> >>>>> 	sched_idle_set_state <-cpuidle_enter_state
> >>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
> >>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
> >>>>>
> >>>>> However, now the log is large enough to inspect what has
> >>>>> happened in the system for ~4s before.
> >>>>>
> >>>>>
> >>>>> With the logs:
> >>>>> timer-responder-738     [002] *.~2.   128.466214:
> >>>>> evl_timer_shot: [proxy-timer/2] at 128.466467 (delay: 253 us,
> >>>>> 66918 cycles)
> >>>>>
> >>>>> and after ISW:
> >>>>>
> >>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> >>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206
> >>>>> cycles)
> >>>>>
> >>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
> >>>>>
> >>>>> It looks like we have only 2us delay as the timer has fired on
> >>>>> 128.466469 vs. expected 128.466467.
> >>>>>    
> >>>>
> >>>> Ok, so the issue is still the same: for some reason, the
> >>>> timer-responder thread is taking a (minor) fault when accessing
> >>>> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
> >>>> syscall, or right after. This seems to be valid memory though,
> >>>> just not immediately accessible, otherwise we'd have received
> >>>> SEGV, which we did not.  
> >>>
> >>> Yes, I also think so - like some "deferred allocation" despite of
> >>> mlockall.
> >>>
> >>> Please find the logs with the below patch applied:
> >>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
> >>>  
> >> 
> >> Thanks. I need to analyze the fault flags next week to see if I
> >> can make something of these, but I just stumbled on an obviously
> >> significant fact when looking at the multi-CPU view: the memory
> >> compaction daemon is running asynchronously on other cores while
> >> the responder is taking the fault. Compaction is all about moving
> >> memory around to free larger physical regions, so updating PTEs
> >> will happen as a result.
> >> 
> >> Moreover, I could not reproduce the issue at hand, but I did not
> >> have CONFIG_COMPACTION enabled. In the meantime, could you try
> >> disabling this on your end, to see if the fault still appears?  
> >
> > CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known
> > to be incompatible with RT. Xenomai 3 is warning about that that,
> > EVL does not? 
> 
> Dovetail should do that when CONFIG_DOVETAIL is set, this is by
> definition common to any rt core.
> 

bash-5.2# evl check
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_EVL_DEBUG_CORE=y
CONFIG_EVL_DEBUG_MEMORY=y
CONFIG_FTRACE=y

And of course three last entries were enabled when I've been "hunting"
the ISW in latmus cause.

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 18:11                                         ` Philippe Gerum
@ 2025-11-01 11:32                                           ` Łukasz Majewski
  2025-11-03  7:57                                           ` Florian Bezdeka
  1 sibling, 0 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-11-01 11:32 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Giulio Moro, Xenomai

Hi Philippe,

> Philippe Gerum <rpm@xenomai.org> writes:
> 
> > Jan Kiszka <jan.kiszka@siemens.com> writes:
> >  
> >> On 31.10.25 17:30, Philippe Gerum wrote:  
> >>> Łukasz Majewski <lukma@nabladev.com> writes:
> >>>   
> >>>> Hi Philippe,
> >>>>  
> >>>>> Hi Łukasz,
> >>>>>
> >>>>> Łukasz Majewski <lukma@nabladev.com> writes:
> >>>>>  
> >>>>>>>> Could you get me a trace which goes back up to two secs
> >>>>>>>> before the time reported by evl about the inband switch of
> >>>>>>>> the timer responder? e.g.    
> >>>>>>
> >>>>>> Why is it 2 seconds? Is it related to latmus "refresh time"?
> >>>>>> (which is 1s)
> >>>>>>    
> >>>>>
> >>>>> Picking this delay is purely heuristical. Such a long backlog is
> >>>>> likely going to include the smoking gun, if any.
> >>>>>  
> >>>>>>>>
> >>>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
> >>>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> >>>>>>>>
> >>>>>>>> In the above case, I would need a trace going back to
> >>>>>>>> 73.214527 or so.     
> >>>>>>>
> >>>>>>> Apparently the ftrace buffer was too small. I will adjust it
> >>>>>>> to store more data and share results.    
> >>>>>>
> >>>>>> It is also interesting, that:
> >>>>>>
> >>>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> >>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >>>>>> 16387
> >>>>>> bash-5.2# evl trace -eirq -f
> >>>>>> tracing enabled
> >>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >>>>>> 131
> >>>>>>
> >>>>>> Reduces the buffer size...
> >>>>>>    
> >>>>>
> >>>>> Yeah, the evl-trace script is not that smart. I can see the
> >>>>> issue causing this, will fix.
> >>>>>  
> >>>>>> Hence, I've used the "normal" ftrace with enabled evl
> >>>>>> functions - the    
> >>>>>
> >>>>> FWIW, EVL relies on the normal ftrace. However, evl-trace
> >>>>> fiddles with the sysfs entries manually, and may be a bit naive
> >>>>> at times. As Jan pointed out long ago, relying on the trace-cmd
> >>>>> front-end would be a better approach than tweaking ftrace's
> >>>>> sysfs-based controls manually at the very least. evl-trace adds
> >>>>> EVL semantics to the tracing configuration which is handy, but
> >>>>> the way it is done today may deserve a revamping, or maybe drop
> >>>>> it entirely in favor of trace-cmd if we have enough flexibility
> >>>>> there in order to introduce some EVL specifics.
> >>>>>  
> >>>>>> detailed information about the ftrace config is with log files:
> >>>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
> >>>>>>
> >>>>>>
> >>>>>> To extract the cpu2 specific data:
> >>>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
> >>>>>> cpu2.txt
> >>>>>>
> >>>>>> Here again the ISW happens in the cpu2 execution "hole":
> >>>>>> [  128.466459] EVL: timer-responder:736 switching in-band
> >>>>>> [pid=738, excpt=14, user_pc=0x7588e71985fe]
> >>>>>>
> >>>>>> Happens for cpu2 between:
> >>>>>>
> >>>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
> >>>>>> 	sched_idle_set_state <-cpuidle_enter_state
> >>>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
> >>>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
> >>>>>>
> >>>>>> However, now the log is large enough to inspect what has
> >>>>>> happened in the system for ~4s before.
> >>>>>>
> >>>>>>
> >>>>>> With the logs:
> >>>>>> timer-responder-738     [002] *.~2.   128.466214:
> >>>>>> evl_timer_shot: [proxy-timer/2] at 128.466467 (delay: 253 us,
> >>>>>> 66918 cycles)
> >>>>>>
> >>>>>> and after ISW:
> >>>>>>
> >>>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> >>>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206
> >>>>>> cycles)
> >>>>>>
> >>>>>> Why there is such increase in "delay" (253 vs 739 us) ? 
> >>>>>>
> >>>>>> It looks like we have only 2us delay as the timer has fired on
> >>>>>> 128.466469 vs. expected 128.466467.
> >>>>>>    
> >>>>>
> >>>>> Ok, so the issue is still the same: for some reason, the
> >>>>> timer-responder thread is taking a (minor) fault when accessing
> >>>>> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
> >>>>> syscall, or right after. This seems to be valid memory though,
> >>>>> just not immediately accessible, otherwise we'd have received
> >>>>> SEGV, which we did not.  
> >>>>
> >>>> Yes, I also think so - like some "deferred allocation" despite of
> >>>> mlockall.
> >>>>
> >>>> Please find the logs with the below patch applied:
> >>>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
> >>>>  
> >>> 
> >>> Thanks. I need to analyze the fault flags next week to see if I
> >>> can make something of these, but I just stumbled on an obviously
> >>> significant fact when looking at the multi-CPU view: the memory
> >>> compaction daemon is running asynchronously on other cores while
> >>> the responder is taking the fault. Compaction is all about moving
> >>> memory around to free larger physical regions, so updating PTEs
> >>> will happen as a result.
> >>> 
> >>> Moreover, I could not reproduce the issue at hand, but I did not
> >>> have CONFIG_COMPACTION enabled. In the meantime, could you try
> >>> disabling this on your end, to see if the fault still appears?  
> >>
> >> CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known
> >> to be incompatible with RT. Xenomai 3 is warning about that that,
> >> EVL does not? 
> >
> > Dovetail should do that when CONFIG_DOVETAIL is set, this is by
> > definition common to any rt core.  
> 
> Also, it would be much better to find a way to exclude any mm bearing
> the MMF_DOVETAIL flag from the movable set, instead of excluding
> everything that sets CONFIG_COMPACTION.
> 

Yes, as we may be "hit" again in the future when either the CONFIG_*
name is changed or new, improved approach is silently introduced...

-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 16:30                                   ` Philippe Gerum
  2025-10-31 17:34                                     ` Jan Kiszka
@ 2025-11-01 15:59                                     ` Łukasz Majewski
  2025-11-01 16:33                                       ` Giulio Moro
  2025-11-03 14:00                                       ` Philippe Gerum
  1 sibling, 2 replies; 50+ messages in thread
From: Łukasz Majewski @ 2025-11-01 15:59 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Łukasz Majewski <lukma@nabladev.com> writes:
> 
> > Hi Philippe,
> >  
> >> Hi Łukasz,
> >> 
> >> Łukasz Majewski <lukma@nabladev.com> writes:
> >>   
> >> >> > Could you get me a trace which goes back up to two secs before
> >> >> > the time reported by evl about the inband switch of the timer
> >> >> > responder? e.g.    
> >> >
> >> > Why is it 2 seconds? Is it related to latmus "refresh time"?
> >> > (which is 1s)
> >> >    
> >> 
> >> Picking this delay is purely heuristical. Such a long backlog is
> >> likely going to include the smoking gun, if any.
> >>   
> >> >> > 
> >> >> > [   75.214527] EVL: timer-responder:743 switching in-band
> >> >> > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
> >> >> > 
> >> >> > In the above case, I would need a trace going back to
> >> >> > 73.214527 or so.     
> >> >> 
> >> >> Apparently the ftrace buffer was too small. I will adjust it to
> >> >> store more data and share results.    
> >> >
> >> > It is also interesting, that:
> >> >
> >> > bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >> > 16387
> >> > bash-5.2# evl trace -eirq -f
> >> > tracing enabled
> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
> >> > 131
> >> >
> >> > Reduces the buffer size...
> >> >    
> >> 
> >> Yeah, the evl-trace script is not that smart. I can see the issue
> >> causing this, will fix.
> >>   
> >> > Hence, I've used the "normal" ftrace with enabled evl functions -
> >> > the    
> >> 
> >> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles
> >> with the sysfs entries manually, and may be a bit naive at times.
> >> As Jan pointed out long ago, relying on the trace-cmd front-end
> >> would be a better approach than tweaking ftrace's sysfs-based
> >> controls manually at the very least. evl-trace adds EVL semantics
> >> to the tracing configuration which is handy, but the way it is
> >> done today may deserve a revamping, or maybe drop it entirely in
> >> favor of trace-cmd if we have enough flexibility there in order to
> >> introduce some EVL specifics.
> >>   
> >> > detailed information about the ftrace config is with log files:
> >> > https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
> >> >
> >> >
> >> > To extract the cpu2 specific data:
> >> > bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
> >> > cpu2.txt
> >> >
> >> > Here again the ISW happens in the cpu2 execution "hole":
> >> > [  128.466459] EVL: timer-responder:736 switching in-band
> >> > [pid=738, excpt=14, user_pc=0x7588e71985fe]
> >> >
> >> > Happens for cpu2 between:
> >> >
> >> > 11832546:          <idle>-0     [002] *..2.   128.466438:
> >> > 	sched_idle_set_state <-cpuidle_enter_state
> >> > 11832547:          <idle>-0     [002] *..2.   128.466468:
> >> > 	handle_irq_pipelined_prepare <-arch_pipeline_entry
> >> >
> >> > However, now the log is large enough to inspect what has
> >> > happened in the system for ~4s before.
> >> >
> >> >
> >> > With the logs:
> >> > timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
> >> > [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
> >> >
> >> > and after ISW:
> >> >
> >> > <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
> >> > latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
> >> >
> >> > Why there is such increase in "delay" (253 vs 739 us) ? 
> >> >
> >> > It looks like we have only 2us delay as the timer has fired on
> >> > 128.466469 vs. expected 128.466467.
> >> >    
> >> 
> >> Ok, so the issue is still the same: for some reason, the
> >> timer-responder thread is taking a (minor) fault when accessing
> >> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
> >> syscall, or right after. This seems to be valid memory though,
> >> just not immediately accessible, otherwise we'd have received
> >> SEGV, which we did not.  
> >
> > Yes, I also think so - like some "deferred allocation" despite of
> > mlockall.
> >
> > Please find the logs with the below patch applied:
> > https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
> >  
> 
> Thanks. I need to analyze the fault flags next week to see if I can
> make something of these, but I just stumbled on an obviously
> significant fact when looking at the multi-CPU view: the memory
> compaction daemon is running asynchronously on other cores while the
> responder is taking the fault. Compaction is all about moving memory
> around to free larger physical regions, so updating PTEs will happen
> as a result.
> 
> Moreover, I could not reproduce the issue at hand, but I did not have
> CONFIG_COMPACTION enabled. In the meantime, could you try disabling
> this on your end, to see if the fault still appears?
> 

CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n

From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT [=n]
(i.e. it can be only enabled when PREEMPT_RT is NOT)

I've just set CONFIG_COMPACTION=n (which also disabled the
CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
v6.6.) anymore.

It looks like those (enabled by default) config options have slipped in
silently and introduced the issue...


-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-01 15:59                                     ` Łukasz Majewski
@ 2025-11-01 16:33                                       ` Giulio Moro
  2025-11-03 14:06                                         ` Philippe Gerum
  2025-11-03 14:00                                       ` Philippe Gerum
  1 sibling, 1 reply; 50+ messages in thread
From: Giulio Moro @ 2025-11-01 16:33 UTC (permalink / raw)
  To: Łukasz Majewski, Philippe Gerum; +Cc: Xenomai

Hi everyone,

> 
> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
> 
>  From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT [=n]
> (i.e. it can be only enabled when PREEMPT_RT is NOT)
> 
> I've just set CONFIG_COMPACTION=n (which also disabled the
> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
> v6.6.) anymore.
> 
> It looks like those (enabled by default) config options have slipped in
> silently and introduced the issue...
> 

I can confirm disabling CONFIG_COMPACTION and CONFIG_MIGRATION fixed the issue for me as well. Fun fact is that I started looking over these options when I first raised the issue, but then realised that evl check did not flag them as problematic, giving me a false sense of security ...
Without knowing much about the internals, in hindsight it makes perfect sense that the process was more likely to fail early on in the lifteime of the process and/or of the kernel than after it had been running for a long time: once the memory layout has settled, it's unlikely it will get moved again.

By the way, I was thinking that replacing an in-menuconfig warning (a la Xenomai-3) would be nice to have, instead of having to manually run evl check on the config file.
Thanks everyone!

Best,
Giulio

Łukasz Majewski wrote on 01/11/2025 10:59:
> Hi Philippe,
> 
>> Łukasz Majewski <lukma@nabladev.com> writes:
>>
>>> Hi Philippe,
>>>   
>>>> Hi Łukasz,
>>>>
>>>> Łukasz Majewski <lukma@nabladev.com> writes:
>>>>    
>>>>>>> Could you get me a trace which goes back up to two secs before
>>>>>>> the time reported by evl about the inband switch of the timer
>>>>>>> responder? e.g.
>>>>>
>>>>> Why is it 2 seconds? Is it related to latmus "refresh time"?
>>>>> (which is 1s)
>>>>>     
>>>>
>>>> Picking this delay is purely heuristical. Such a long backlog is
>>>> likely going to include the smoking gun, if any.
>>>>    
>>>>>>>
>>>>>>> [   75.214527] EVL: timer-responder:743 switching in-band
>>>>>>> [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>>>>>>>
>>>>>>> In the above case, I would need a trace going back to
>>>>>>> 73.214527 or so.
>>>>>>
>>>>>> Apparently the ftrace buffer was too small. I will adjust it to
>>>>>> store more data and share results.
>>>>>
>>>>> It is also interesting, that:
>>>>>
>>>>> bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 16387
>>>>> bash-5.2# evl trace -eirq -f
>>>>> tracing enabled
>>>>> bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>>>>> 131
>>>>>
>>>>> Reduces the buffer size...
>>>>>     
>>>>
>>>> Yeah, the evl-trace script is not that smart. I can see the issue
>>>> causing this, will fix.
>>>>    
>>>>> Hence, I've used the "normal" ftrace with enabled evl functions -
>>>>> the
>>>>
>>>> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles
>>>> with the sysfs entries manually, and may be a bit naive at times.
>>>> As Jan pointed out long ago, relying on the trace-cmd front-end
>>>> would be a better approach than tweaking ftrace's sysfs-based
>>>> controls manually at the very least. evl-trace adds EVL semantics
>>>> to the tracing configuration which is handy, but the way it is
>>>> done today may deserve a revamping, or maybe drop it entirely in
>>>> favor of trace-cmd if we have enough flexibility there in order to
>>>> introduce some EVL specifics.
>>>>    
>>>>> detailed information about the ftrace config is with log files:
>>>>> https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>>>>>
>>>>>
>>>>> To extract the cpu2 specific data:
>>>>> bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>>>>> cpu2.txt
>>>>>
>>>>> Here again the ISW happens in the cpu2 execution "hole":
>>>>> [  128.466459] EVL: timer-responder:736 switching in-band
>>>>> [pid=738, excpt=14, user_pc=0x7588e71985fe]
>>>>>
>>>>> Happens for cpu2 between:
>>>>>
>>>>> 11832546:          <idle>-0     [002] *..2.   128.466438:
>>>>> 	sched_idle_set_state <-cpuidle_enter_state
>>>>> 11832547:          <idle>-0     [002] *..2.   128.466468:
>>>>> 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>>>>>
>>>>> However, now the log is large enough to inspect what has
>>>>> happened in the system for ~4s before.
>>>>>
>>>>>
>>>>> With the logs:
>>>>> timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>>>>> [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>>>>>
>>>>> and after ISW:
>>>>>
>>>>> <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>>>>> latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>>>>>
>>>>> Why there is such increase in "delay" (253 vs 739 us) ?
>>>>>
>>>>> It looks like we have only 2us delay as the timer has fired on
>>>>> 128.466469 vs. expected 128.466467.
>>>>>     
>>>>
>>>> Ok, so the issue is still the same: for some reason, the
>>>> timer-responder thread is taking a (minor) fault when accessing
>>>> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
>>>> syscall, or right after. This seems to be valid memory though,
>>>> just not immediately accessible, otherwise we'd have received
>>>> SEGV, which we did not.
>>>
>>> Yes, I also think so - like some "deferred allocation" despite of
>>> mlockall.
>>>
>>> Please find the logs with the below patch applied:
>>> https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>>>   
>>
>> Thanks. I need to analyze the fault flags next week to see if I can
>> make something of these, but I just stumbled on an obviously
>> significant fact when looking at the multi-CPU view: the memory
>> compaction daemon is running asynchronously on other cores while the
>> responder is taking the fault. Compaction is all about moving memory
>> around to free larger physical regions, so updating PTEs will happen
>> as a result.
>>
>> Moreover, I could not reproduce the issue at hand, but I did not have
>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling
>> this on your end, to see if the fault still appears?
>>
> 
> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
> 
>  From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT [=n]
> (i.e. it can be only enabled when PREEMPT_RT is NOT)
> 
> I've just set CONFIG_COMPACTION=n (which also disabled the
> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
> v6.6.) anymore.
> 
> It looks like those (enabled by default) config options have slipped in
> silently and introduced the issue...
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-10-31 18:11                                         ` Philippe Gerum
  2025-11-01 11:32                                           ` Łukasz Majewski
@ 2025-11-03  7:57                                           ` Florian Bezdeka
  2025-11-03  9:29                                             ` Jan Kiszka
  1 sibling, 1 reply; 50+ messages in thread
From: Florian Bezdeka @ 2025-11-03  7:57 UTC (permalink / raw)
  To: Philippe Gerum, Jan Kiszka; +Cc: Łukasz Majewski, Giulio Moro, Xenomai

On Fri, 2025-10-31 at 19:11 +0100, Philippe Gerum wrote:

[snip]

> > > 
> > > > Thanks. I need to analyze the fault flags next week to see if I can make
> > > > something of these, but I just stumbled on an obviously significant fact
> > > > when looking at the multi-CPU view: the memory compaction daemon is
> > > > running asynchronously on other cores while the responder is taking the
> > > > fault. Compaction is all about moving memory around to free larger
> > > > physical regions, so updating PTEs will happen as a result.
> > > > 
> > > > Moreover, I could not reproduce the issue at hand, but I did not have
> > > > CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
> > > > on your end, to see if the fault still appears?
> > > 
> > > CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
> > > incompatible with RT. Xenomai 3 is warning about that that, EVL does not?
> > > 
> > 
> > Dovetail should do that when CONFIG_DOVETAIL is set, this is by
> > definition common to any rt core.
> 
> Also, it would be much better to find a way to exclude any mm bearing
> the MMF_DOVETAIL flag from the movable set, instead of excluding
> everything that sets CONFIG_COMPACTION.

Are there any concrete ideas already? Asking because I'm expecting this
topic to appear on my radar soon, again. Some colleagues were fighting
with the opposite direction in the past: CONFIG_COMPACTION being
disabled. That caused some trouble with a memory intense application
running next to the RT workload. The OOM killer was invoked quite
regularly.

Florian


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-03  7:57                                           ` Florian Bezdeka
@ 2025-11-03  9:29                                             ` Jan Kiszka
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Kiszka @ 2025-11-03  9:29 UTC (permalink / raw)
  To: Florian Bezdeka, Philippe Gerum
  Cc: Łukasz Majewski, Giulio Moro, Xenomai

On 03.11.25 08:57, Florian Bezdeka wrote:
> On Fri, 2025-10-31 at 19:11 +0100, Philippe Gerum wrote:
> 
> [snip]
> 
>>>>
>>>>> Thanks. I need to analyze the fault flags next week to see if I can make
>>>>> something of these, but I just stumbled on an obviously significant fact
>>>>> when looking at the multi-CPU view: the memory compaction daemon is
>>>>> running asynchronously on other cores while the responder is taking the
>>>>> fault. Compaction is all about moving memory around to free larger
>>>>> physical regions, so updating PTEs will happen as a result.
>>>>>
>>>>> Moreover, I could not reproduce the issue at hand, but I did not have
>>>>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling this
>>>>> on your end, to see if the fault still appears?
>>>>
>>>> CONFIG_COMPACTION selects CONFIG_MIGRATION, and that is well-known to be
>>>> incompatible with RT. Xenomai 3 is warning about that that, EVL does not?
>>>>
>>>
>>> Dovetail should do that when CONFIG_DOVETAIL is set, this is by
>>> definition common to any rt core.
>>
>> Also, it would be much better to find a way to exclude any mm bearing
>> the MMF_DOVETAIL flag from the movable set, instead of excluding
>> everything that sets CONFIG_COMPACTION.
> 
> Are there any concrete ideas already? Asking because I'm expecting this
> topic to appear on my radar soon, again. Some colleagues were fighting
> with the opposite direction in the past: CONFIG_COMPACTION being
> disabled. That caused some trouble with a memory intense application
> running next to the RT workload. The OOM killer was invoked quite
> regularly.

I even heard that this could be a topic in the PREEMPT-RT context, thus
for upstream. Maybe worth to look left and right first before starting
own work.

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-01 15:59                                     ` Łukasz Majewski
  2025-11-01 16:33                                       ` Giulio Moro
@ 2025-11-03 14:00                                       ` Philippe Gerum
  1 sibling, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-11-03 14:00 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai

Łukasz Majewski <lukma@nabladev.com> writes:

> Hi Philippe,
>
>> Łukasz Majewski <lukma@nabladev.com> writes:
>> 
>> > Hi Philippe,
>> >  
>> >> Hi Łukasz,
>> >> 
>> >> Łukasz Majewski <lukma@nabladev.com> writes:
>> >>   
>> >> >> > Could you get me a trace which goes back up to two secs before
>> >> >> > the time reported by evl about the inband switch of the timer
>> >> >> > responder? e.g.    
>> >> >
>> >> > Why is it 2 seconds? Is it related to latmus "refresh time"?
>> >> > (which is 1s)
>> >> >    
>> >> 
>> >> Picking this delay is purely heuristical. Such a long backlog is
>> >> likely going to include the smoking gun, if any.
>> >>   
>> >> >> > 
>> >> >> > [   75.214527] EVL: timer-responder:743 switching in-band
>> >> >> > [pid=745, excpt=14, user_pc=0x77d08492a5fe]
>> >> >> > 
>> >> >> > In the above case, I would need a trace going back to
>> >> >> > 73.214527 or so.     
>> >> >> 
>> >> >> Apparently the ftrace buffer was too small. I will adjust it to
>> >> >> store more data and share results.    
>> >> >
>> >> > It is also interesting, that:
>> >> >
>> >> > bash-5.2# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
>> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>> >> > 16387
>> >> > bash-5.2# evl trace -eirq -f
>> >> > tracing enabled
>> >> > bash-5.2# cat /sys/kernel/debug/tracing/buffer_size_kb
>> >> > 131
>> >> >
>> >> > Reduces the buffer size...
>> >> >    
>> >> 
>> >> Yeah, the evl-trace script is not that smart. I can see the issue
>> >> causing this, will fix.
>> >>   
>> >> > Hence, I've used the "normal" ftrace with enabled evl functions -
>> >> > the    
>> >> 
>> >> FWIW, EVL relies on the normal ftrace. However, evl-trace fiddles
>> >> with the sysfs entries manually, and may be a bit naive at times.
>> >> As Jan pointed out long ago, relying on the trace-cmd front-end
>> >> would be a better approach than tweaking ftrace's sysfs-based
>> >> controls manually at the very least. evl-trace adds EVL semantics
>> >> to the tracing configuration which is handy, but the way it is
>> >> done today may deserve a revamping, or maybe drop it entirely in
>> >> favor of trace-cmd if we have enough flexibility there in order to
>> >> introduce some EVL specifics.
>> >>   
>> >> > detailed information about the ftrace config is with log files:
>> >> > https://nextcloud.swupdate.org/index.php/s/czBd3Ydq9B9noZ7
>> >> >
>> >> >
>> >> > To extract the cpu2 specific data:
>> >> > bash-5.2# grep -rnE '\[002\]' Xenomai4-ISW-trace-6.6-logs.txt >
>> >> > cpu2.txt
>> >> >
>> >> > Here again the ISW happens in the cpu2 execution "hole":
>> >> > [  128.466459] EVL: timer-responder:736 switching in-band
>> >> > [pid=738, excpt=14, user_pc=0x7588e71985fe]
>> >> >
>> >> > Happens for cpu2 between:
>> >> >
>> >> > 11832546:          <idle>-0     [002] *..2.   128.466438:
>> >> > 	sched_idle_set_state <-cpuidle_enter_state
>> >> > 11832547:          <idle>-0     [002] *..2.   128.466468:
>> >> > 	handle_irq_pipelined_prepare <-arch_pipeline_entry
>> >> >
>> >> > However, now the log is large enough to inspect what has
>> >> > happened in the system for ~4s before.
>> >> >
>> >> >
>> >> > With the logs:
>> >> > timer-responder-738     [002] *.~2.   128.466214: evl_timer_shot:
>> >> > [proxy-timer/2] at 128.466467 (delay: 253 us, 66918 cycles)
>> >> >
>> >> > and after ISW:
>> >> >
>> >> > <idle>-0       [002] *.~1.   128.466469: evl_timer_shot:
>> >> > latmus_pulse_handler at 128.467208 (delay: 739 us, 195206 cycles)
>> >> >
>> >> > Why there is such increase in "delay" (253 vs 739 us) ? 
>> >> >
>> >> > It looks like we have only 2us delay as the timer has fired on
>> >> > 128.466469 vs. expected 128.466467.
>> >> >    
>> >> 
>> >> Ok, so the issue is still the same: for some reason, the
>> >> timer-responder thread is taking a (minor) fault when accessing
>> >> some memory, on its way back from the ioctl(EVL_LATIOC_PULSE)
>> >> syscall, or right after. This seems to be valid memory though,
>> >> just not immediately accessible, otherwise we'd have received
>> >> SEGV, which we did not.  
>> >
>> > Yes, I also think so - like some "deferred allocation" despite of
>> > mlockall.
>> >
>> > Please find the logs with the below patch applied:
>> > https://nextcloud.swupdate.org/index.php/s/JXNW5bJrB3fQo5E
>> >  
>> 
>> Thanks. I need to analyze the fault flags next week to see if I can
>> make something of these, but I just stumbled on an obviously
>> significant fact when looking at the multi-CPU view: the memory
>> compaction daemon is running asynchronously on other cores while the
>> responder is taking the fault. Compaction is all about moving memory
>> around to free larger physical regions, so updating PTEs will happen
>> as a result.
>> 
>> Moreover, I could not reproduce the issue at hand, but I did not have
>> CONFIG_COMPACTION enabled. In the meantime, could you try disabling
>> this on your end, to see if the fault still appears?
>> 
>
> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
>
> From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT [=n]
> (i.e. it can be only enabled when PREEMPT_RT is NOT)
>
> I've just set CONFIG_COMPACTION=n (which also disabled the
> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
> v6.6.) anymore.
>
> It looks like those (enabled by default) config options have slipped in
> silently and introduced the issue...
>

Yeah, sorry guys. This silly issue should never have happened, I simply
overlooked the migration setting which has been well-known to be a
latency killer since 2015 or so if my memory serves me well, my bad.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-01 16:33                                       ` Giulio Moro
@ 2025-11-03 14:06                                         ` Philippe Gerum
  2025-11-04  7:53                                           ` Łukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Philippe Gerum @ 2025-11-03 14:06 UTC (permalink / raw)
  To: Giulio Moro; +Cc: Łukasz Majewski, Xenomai

Giulio Moro <giulio@bela.io> writes:

> Hi everyone,
>
>> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
>>  From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT
>> [=n]
>> (i.e. it can be only enabled when PREEMPT_RT is NOT)
>> I've just set CONFIG_COMPACTION=n (which also disabled the
>> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup (i.e.
>> v6.6.) anymore.
>> It looks like those (enabled by default) config options have slipped
>> in
>> silently and introduced the issue...
>> 
>
> I can confirm disabling CONFIG_COMPACTION and CONFIG_MIGRATION fixed the issue for me as well. Fun fact is that I started looking over these options when I first raised the issue, but then realised that evl check did not flag them as problematic, giving me a false sense of security ...
> Without knowing much about the internals, in hindsight it makes perfect sense that the process was more likely to fail early on in the lifteime of the process and/or of the kernel than after it had been running for a long time: once the memory layout has settled, it's unlikely it will get moved again.
>
> By the way, I was thinking that replacing an in-menuconfig warning (a
> la Xenomai-3) would be nice to have, instead of having to manually run
> evl check on the config file.

Actually, we want both. Some automated configuration process may not
give us any hint loud enough, a runtime check is always welcome,
especially when providing support.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-03 14:06                                         ` Philippe Gerum
@ 2025-11-04  7:53                                           ` Łukasz Majewski
  2025-11-04  8:19                                             ` Philippe Gerum
  0 siblings, 1 reply; 50+ messages in thread
From: Łukasz Majewski @ 2025-11-04  7:53 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Giulio Moro, Xenomai

Hi Philippe,

> Giulio Moro <giulio@bela.io> writes:
> 
> > Hi everyone,
> >  
> >> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
> >>  From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT
> >> [=n]
> >> (i.e. it can be only enabled when PREEMPT_RT is NOT)
> >> I've just set CONFIG_COMPACTION=n (which also disabled the
> >> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup
> >> (i.e. v6.6.) anymore.
> >> It looks like those (enabled by default) config options have
> >> slipped in
> >> silently and introduced the issue...
> >>   
> >
> > I can confirm disabling CONFIG_COMPACTION and CONFIG_MIGRATION
> > fixed the issue for me as well. Fun fact is that I started looking
> > over these options when I first raised the issue, but then realised
> > that evl check did not flag them as problematic, giving me a false
> > sense of security ... Without knowing much about the internals, in
> > hindsight it makes perfect sense that the process was more likely
> > to fail early on in the lifteime of the process and/or of the
> > kernel than after it had been running for a long time: once the
> > memory layout has settled, it's unlikely it will get moved again.
> >
> > By the way, I was thinking that replacing an in-menuconfig warning
> > (a la Xenomai-3) would be nice to have, instead of having to
> > manually run evl check on the config file.  
> 
> Actually, we want both. Some automated configuration process may not
> give us any hint loud enough, a runtime check is always welcome,
> especially when providing support.
> 

IMHO, it would be enough if we would have:

1. evl check complaining loudly about CONFIG_COMPACTION=y and
CONFIG_MIGRATION=y

2. Just adjust Kconfig to have:
CONFIG_MIGRATION if !DOVETAIL



-- 
Best regards,

Lukasz Majewski

--
Nabla Software Engineering GmbH
HRB 40522 Augsburg
Phone: +49 821 45592596
E-Mail: office@nabladev.com
Geschftsfhrer : Stefano Babic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Unexpected switches to in-band
  2025-11-04  7:53                                           ` Łukasz Majewski
@ 2025-11-04  8:19                                             ` Philippe Gerum
  0 siblings, 0 replies; 50+ messages in thread
From: Philippe Gerum @ 2025-11-04  8:19 UTC (permalink / raw)
  To: Łukasz Majewski; +Cc: Giulio Moro, Xenomai


Hi Łukasz,

Łukasz Majewski <lukma@nabladev.com> writes:

> Hi Philippe,
>
>> Giulio Moro <giulio@bela.io> writes:
>> 
>> > Hi everyone,
>> >  
>> >> CONFIG_COMPACTION=n - it also automatically set CONFIG_MIGRATION=n
>> >>  From the Kconfig dependencies CONFIG_MIGRATION=n -> !PREEMPT_RT
>> >> [=n]
>> >> (i.e. it can be only enabled when PREEMPT_RT is NOT)
>> >> I've just set CONFIG_COMPACTION=n (which also disabled the
>> >> CONFIG_MIGRATION) and now I cannot reproduce the issue on my setup
>> >> (i.e. v6.6.) anymore.
>> >> It looks like those (enabled by default) config options have
>> >> slipped in
>> >> silently and introduced the issue...
>> >>   
>> >
>> > I can confirm disabling CONFIG_COMPACTION and CONFIG_MIGRATION
>> > fixed the issue for me as well. Fun fact is that I started looking
>> > over these options when I first raised the issue, but then realised
>> > that evl check did not flag them as problematic, giving me a false
>> > sense of security ... Without knowing much about the internals, in
>> > hindsight it makes perfect sense that the process was more likely
>> > to fail early on in the lifteime of the process and/or of the
>> > kernel than after it had been running for a long time: once the
>> > memory layout has settled, it's unlikely it will get moved again.
>> >
>> > By the way, I was thinking that replacing an in-menuconfig warning
>> > (a la Xenomai-3) would be nice to have, instead of having to
>> > manually run evl check on the config file.  
>> 
>> Actually, we want both. Some automated configuration process may not
>> give us any hint loud enough, a runtime check is always welcome,
>> especially when providing support.
>> 
>
> IMHO, it would be enough if we would have:
>
> 1. evl check complaining loudly about CONFIG_COMPACTION=y and
> CONFIG_MIGRATION=y
>
> 2. Just adjust Kconfig to have:
> CONFIG_MIGRATION if !DOVETAIL

Unfortunately, CONFIG_MIGRATION is required via the CONFIG_CMA
dependency for some platforms such as i.MX with Vivante GPUs.  In this
case, we have to enable this feature, restricting page migration to
well-identified runtime phases. We might try to add per-{arch, platform}
conditionals in order to narrow the scope, but this might be a fragile
solution.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2025-11-04  8:25 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09  4:33 Unexpected switches to in-band Giulio Moro
2025-10-09 13:17 ` Łukasz Majewski
2025-10-09 19:05   ` Giulio Moro
2025-10-10 10:24     ` Łukasz Majewski
2025-10-10 12:21       ` Giulio Moro
2025-10-10 13:08         ` Łukasz Majewski
2025-10-11  4:25           ` Giulio Moro
2025-10-11 15:55     ` Philippe Gerum
2025-10-11 16:10       ` Philippe Gerum
2025-10-11 16:47         ` Giulio Moro
2025-10-11 16:56           ` Philippe Gerum
2025-10-11 17:15           ` Philippe Gerum
2025-10-11 19:46             ` Giulio Moro
2025-10-12  8:54               ` Philippe Gerum
2025-10-12 14:44               ` Philippe Gerum
2025-10-20  7:47           ` Łukasz Majewski
2025-10-20 12:46             ` Giulio Moro
2025-10-20 14:01               ` Philippe Gerum
2025-10-21 11:13                 ` Łukasz Majewski
2025-10-23 13:54                 ` Łukasz Majewski
2025-10-26 20:04                   ` Philippe Gerum
2025-10-27 11:05                     ` Łukasz Majewski
2025-10-27 11:35                       ` Philippe Gerum
2025-10-27 12:54                         ` Łukasz Majewski
2025-10-27 16:25                       ` Łukasz Majewski
2025-10-27 18:16                         ` Giulio Moro
2025-10-27 22:42                           ` Giulio Moro
2025-10-29  9:18                         ` Philippe Gerum
2025-10-29 13:51                           ` Łukasz Majewski
2025-10-30 12:26                             ` Łukasz Majewski
2025-10-30 16:17                               ` Philippe Gerum
2025-10-31 15:56                                 ` Łukasz Majewski
2025-10-31 16:30                                   ` Philippe Gerum
2025-10-31 17:34                                     ` Jan Kiszka
2025-10-31 18:09                                       ` Philippe Gerum
2025-10-31 18:11                                         ` Philippe Gerum
2025-11-01 11:32                                           ` Łukasz Majewski
2025-11-03  7:57                                           ` Florian Bezdeka
2025-11-03  9:29                                             ` Jan Kiszka
2025-11-01 11:31                                         ` Łukasz Majewski
2025-10-31 18:13                                       ` Philippe Gerum
2025-11-01 15:59                                     ` Łukasz Majewski
2025-11-01 16:33                                       ` Giulio Moro
2025-11-03 14:06                                         ` Philippe Gerum
2025-11-04  7:53                                           ` Łukasz Majewski
2025-11-04  8:19                                             ` Philippe Gerum
2025-11-03 14:00                                       ` Philippe Gerum
2025-10-30 16:26                               ` Philippe Gerum
2025-10-11 17:43         ` Philippe Gerum
2025-10-11 15:37 ` Philippe Gerum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.