* SCHED_SPORADIC in Xenomai 3
@ 2026-06-10  6:24 Jan Kiszka
  2026-06-10  7:21 ` Jan Kiszka
  2026-06-11  7:33 ` Philippe Gerum
  0 siblings, 2 replies; 12+ messages in thread
From: Jan Kiszka @ 2026-06-10  6:24 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

Hi Philippe,

while trying to port the signal-while-suspended fix to Xenomai 3, I ran
into XNHELD, a state that only exists there. I suppose it was once
forward-ported as EVL_T_HALT. The only user of XNHELD in Xenomai 3 is
SCHED_SPORADIC, so let's dive into that scheduling class.

It turned out that it was never documented, not even linked to the POSIX
standard. It also differs slightly from the standard (low_prio = -1 ->
suspend on depletion). There is no test case either, so I asked an AI
for one. That worked fairly well, as it seems to have revealed an issue:

Could it be that we are not properly suspending the budget tracking when
a higher-prio thread from a different scheduling class preempts a
sporadic thread? It looks like xnsched_sporadic_pick is not invoked if a
thread is selected from a higher-prio class first, namely sched-rt with
its weight 4 vs. 3 for sched-sporadic. Or is that an (undocumented)
limitation/misconfiguration? Does that issue even affect other
time-slicing classes as well?

That furthermore makes me wonder whether we actually have users of
sched-sporadic. Likely a hard-to-answer question, as usual. But such a
limitation should have been observed earlier under real workloads...

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center



* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10  6:24 SCHED_SPORADIC in Xenomai 3 Jan Kiszka
@ 2026-06-10  7:21 ` Jan Kiszka
  2026-06-10  7:33   ` Jan Kiszka
  2026-06-11  7:33 ` Philippe Gerum
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2026-06-10  7:21 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

Here is a trace showing that xnsched_sporadic_pick and, thus,
sporadic_suspend_activity are not called:

         disrupt-1682  [000] d..2.    94.171753: cobalt_head_sysentry: syscall=clock_nanosleep64
         disrupt-1682  [000] d..2.    94.171755: cobalt_clock_nanosleep: clock_id=1 flags=0() rqt=(0.060000000)
         disrupt-1682  [000] d..2.    94.171757: cobalt_thread_suspend: pid=1682 mask=0x4 timeout=60000001 timeout_mode=0 wchan=(nil)
         disrupt-1682  [000] d..2.    94.171759: cobalt_timer_start:   timer=0xffffc900008bbb00(smokey) value=60000001 interval=0 mode=0x0
         disrupt-1682  [000] d..2.    94.171761: cobalt_tick_shot:     next tick at 94.231756 (delay: 59995 us)
         disrupt-1682  [000] d..2.    94.171770: cobalt_schedule:      status=0x10000000
         disrupt-1682  [000] d..2.    94.171771: cobalt_trace_pid:     pid=1682, prio=30
         disrupt-1682  [000] d..2.    94.171776: bprint:               xnsched_sporadic_pick: xnsched_sporadic_pick, curr=1682 next=1681
         disrupt-1682  [000] d..2.    94.171777: bprint:               xnsched_sporadic_pick: sporadic_resume_activity, pss->budget 99964473
         disrupt-1682  [000] d..2.    94.171778: bprint:               sporadic_schedule_drop: sporadic_schedule_drop, pss->budget 99964473
         disrupt-1682  [000] d..2.    94.171778: cobalt_timer_start:   timer=0xffffc900008bc4d8(pss-drop) value=94216201725 interval=0 mode=0x1
         disrupt-1682  [000] d..2.    94.171779: cobalt_switch_context: prev_name=disrupt prev_pid=1682 prev_prio=30 prev_state=0x248044 ==> next_name=ss-d next_pid=1681 next_prio=20
            ss-d-1681  [000] d..2.    94.171784: cobalt_trace_pid:     pid=1681, prio=20
            ss-d-1681  [000] d..2.    94.171788: cobalt_synch_acquire: synch=0xffffc900008bd408
            ss-d-1681  [000] d..2.    94.171789: cobalt_head_sysexit:  result=0
            ss-d-1681  [000] d..2.    94.171799: cobalt_head_sysentry: syscall=mutex_unlock
            ss-d-1681  [000] d..2.    94.171801: cobalt_synch_release: synch=0xffffc900008bd408
            ss-d-1681  [000] d..2.    94.171801: cobalt_head_sysexit:  result=0
            ss-d-1681  [000] d..2.    94.231786: cobalt_timer_expire:  timer=0xffffc900008bbb00
            ss-d-1681  [000] d..2.    94.231789: cobalt_thread_resume: name=disrupt pid=1682 mask=0x4
            ss-d-1681  [000] d..2.    94.231790: cobalt_trace_pid:     pid=1682, prio=30
            ss-d-1681  [000] d..2.    94.231791: cobalt_timer_stop:    timer=0xffffc900008bbb00
            ss-d-1681  [000] d..2.    94.231794: cobalt_tick_shot:     next tick at 94.271741 (delay: 39948 us)
            ss-d-1681  [000] d..2.    94.231802: cobalt_schedule:      status=0x10000000
            ss-d-1681  [000] d..2.    94.231803: cobalt_trace_pid:     pid=1681, prio=20
            ss-d-1681  [000] d..2.    94.231805: cobalt_switch_context: prev_name=ss-d prev_pid=1681 prev_prio=20 prev_state=0x248048 ==> next_name=disrupt next_pid=1682 next_prio=30
         disrupt-1682  [000] d..2.    94.231810: cobalt_trace_pid:     pid=1682, prio=30
         disrupt-1682  [000] d..2.    94.231811: cobalt_head_sysexit:  result=0
         disrupt-1682  [000] d..2.    94.271767: cobalt_timer_expire:  timer=0xffffc900008bc4d8
         disrupt-1682  [000] d..2.    94.271771: cobalt_thread_suspend: pid=1681 mask=0x200 timeout=0 timeout_mode=0 wchan=(nil)
         disrupt-1682  [000] d..2.    94.271772: cobalt_tick_shot:     next tick at 95.130203 (delay: 858431 us)
         disrupt-1682  [000] d..2.    94.271914: cobalt_head_sysentry: syscall=clock_nanosleep64
         disrupt-1682  [000] d..2.    94.271916: cobalt_clock_nanosleep: clock_id=1 flags=0() rqt=(0.060000000)
         disrupt-1682  [000] d..2.    94.271917: cobalt_thread_suspend: pid=1682 mask=0x4 timeout=60000001 timeout_mode=0 wchan=(nil)

Jan

PS: I asked an AI (opus-4.7) to confirm or disprove this, and it failed
in its code analysis. It seems the way the code is structured and
commented misled it in its conclusions. Sooo statistically human...

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10  7:21 ` Jan Kiszka
@ 2026-06-10  7:33   ` Jan Kiszka
  2026-06-10 18:17     ` Jan Kiszka
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2026-06-10  7:33 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

Asked differently ("find a path where..."):

"You're right — I need to retract part of my previous answer. Looking at this scenario through the cascade rule, sporadic_suspend_activity() is not invoked when a SCHED_FIFO thread preempts a SCHED_SPORADIC thread.
                                                                                                                                                                                                                           
  The path

  1. A SCHED_SPORADIC thread runs at, say, normal_prio = 50. Its drop_timer was armed by sporadic_schedule_drop() at the last cascade, set to fire at resume_date + budget.
  2. A SCHED_FIFO thread at priority 80 becomes runnable (wakes from sleep, IRQ unblocks it, etc.) and xnsched_set_resched() marks the CPU.
  3. ___xnsched_run() → xnsched_pick_next() iterates for_each_xnsched_class (sched.c:322). RT has the highest weight, so xnsched_rt_pick(sched) runs first.
  4. In xnsched_rt_pick (sched.c:730–762), the head of the highest-priority queue is now the FIFO thread. Its sched_class == &xnsched_class_rt.
  5. The cascade check at sched.c:757 (if (unlikely(thread->sched_class != &xnsched_class_rt))) is false.
  6. xnsched_rt_pick returns the FIFO thread via del_q. The for_each_xnsched_class loop sees a non-NULL result and exits.
  7. xnsched_sporadic_pick is never called. sporadic_suspend_activity(curr) never runs on the outgoing sporadic thread."

This looks more useful.

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10  7:33   ` Jan Kiszka
@ 2026-06-10 18:17     ` Jan Kiszka
  2026-06-11  5:38       ` Jan Kiszka
  2026-06-11  7:36       ` Philippe Gerum
  0 siblings, 2 replies; 12+ messages in thread
From: Jan Kiszka @ 2026-06-10 18:17 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

The blast radius is wider:

 - I added a preempting fifo thread to the sched-quota test as well, and
   it completely broke the thread group accounting: the group is no
   longer throttled at all, rather than just losing time to the
   preemptions.

 - The evl core looks identical here and should be similarly affected
   regarding quota-based scheduling.

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10 18:17     ` Jan Kiszka
@ 2026-06-11  5:38       ` Jan Kiszka
  2026-06-11  7:49         ` Philippe Gerum
  2026-06-11  7:36       ` Philippe Gerum
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2026-06-11  5:38 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

Here is an attempt to fix sched-quota, along with the test case 
modifications:

diff --git a/include/cobalt/kernel/sched.h b/include/cobalt/kernel/sched.h
index 106e6e29a3..bf19391e2c 100644
--- a/include/cobalt/kernel/sched.h
+++ b/include/cobalt/kernel/sched.h
@@ -136,6 +136,7 @@ struct xnsched_class {
 	void (*sched_dequeue)(struct xnthread *thread);
 	void (*sched_requeue)(struct xnthread *thread);
 	struct xnthread *(*sched_pick)(struct xnsched *sched);
+	void (*sched_out)(struct xnthread *thread);
 	void (*sched_tick)(struct xnsched *sched);
 	void (*sched_rotate)(struct xnsched *sched,
 			     const union xnsched_policy_param *p);
diff --git a/kernel/cobalt/sched-quota.c b/kernel/cobalt/sched-quota.c
index 60b2c92b8f..d6e7022d99 100644
--- a/kernel/cobalt/sched-quota.c
+++ b/kernel/cobalt/sched-quota.c
@@ -405,27 +405,36 @@ static void xnsched_quota_requeue(struct xnthread *thread)
 	tg->nr_active++;
 }
 
+static void charge_usage(struct xnsched_quota_group *tg, xnticks_t now)
+{
+	xnticks_t elapsed;
+
+	elapsed = now - tg->run_start_ns;
+	if (elapsed < tg->run_budget_ns)
+		tg->run_budget_ns -= elapsed;
+	else
+		tg->run_budget_ns = 0;
+}
+
 static struct xnthread *xnsched_quota_pick(struct xnsched *sched)
 {
 	struct xnthread *next, *curr = sched->curr;
 	struct xnsched_quota *qs = &sched->quota;
 	struct xnsched_quota_group *otg, *tg;
-	xnticks_t now, elapsed;
+	xnticks_t now;
 	int ret;
 
 	now = xnclock_read_monotonic(&nkclock);
 	otg = curr->quota;
 	if (otg == NULL)
 		goto pick;
+
 	/*
 	 * Charge the time consumed by the outgoing thread to the
 	 * group it belongs to.
 	 */
-	elapsed = now - otg->run_start_ns;
-	if (elapsed < otg->run_budget_ns)
-		otg->run_budget_ns -= elapsed;
-	else
-		otg->run_budget_ns = 0;
+	charge_usage(otg, now);
+
 pick:
 	next = xnsched_getq(&sched->rt.runnable);
 	if (next == NULL) {
@@ -477,6 +486,14 @@ out:
 	return next;
 }
 
+static void xnsched_quota_out(struct xnthread *thread)
+{
+	struct xnsched_quota_group *tg = thread->quota;
+
+	if (tg)
+		charge_usage(tg, xnclock_read_monotonic(&nkclock));
+}
+
 static void xnsched_quota_migrate(struct xnthread *thread, struct xnsched *sched)
 {
 	union xnsched_policy_param param;
@@ -814,6 +831,7 @@ struct xnsched_class xnsched_class_quota = {
 	.sched_dequeue		=	xnsched_quota_dequeue,
 	.sched_requeue		=	xnsched_quota_requeue,
 	.sched_pick		=	xnsched_quota_pick,
+	.sched_out		=	xnsched_quota_out,
 	.sched_tick		=	NULL,
 	.sched_rotate		=	NULL,
 	.sched_migrate		=	xnsched_quota_migrate,
diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
index d527b6be2c..6dfbf83220 100644
--- a/kernel/cobalt/sched.c
+++ b/kernel/cobalt/sched.c
@@ -895,6 +895,7 @@ static inline void do_lazy_user_work(struct xnthread *curr)
 
 int ___xnsched_run(struct xnsched *sched)
 {
+	struct xnsched_class *prev_schedclass __maybe_unused;
 	bool switched = false, leaving_inband;
 	struct xnthread *prev, *next, *curr;
 	spl_t s;
@@ -933,6 +934,13 @@ int ___xnsched_run(struct xnsched *sched)
 
 	prev = curr;
 
+#ifdef CONFIG_XENO_OPT_SCHED_CLASSES
+	prev_schedclass = prev->sched_class;
+	if (prev_schedclass->weight < next->sched_class->weight &&
+	    prev_schedclass->sched_out)
+		prev_schedclass->sched_out(prev);
+#endif
+
 	trace_cobalt_switch_context(prev, next);
 
 	/*
diff --git a/testsuite/smokey/sched-quota/sched-quota.c b/testsuite/smokey/sched-quota/sched-quota.c
index f9e64e37f1..cebe629394 100644
--- a/testsuite/smokey/sched-quota/sched-quota.c
+++ b/testsuite/smokey/sched-quota/sched-quota.c
@@ -45,9 +45,9 @@ smokey_test_plugin(sched_quota,
 
 static unsigned long long crunch_per_sec, loops_per_sec;
 
-static pthread_t threads[MAX_THREADS];
+static pthread_t threads[MAX_THREADS + 1];
 
-static unsigned long counts[MAX_THREADS];
+static unsigned long counts[MAX_THREADS + 1];
 
 static int nrthreads;
 
@@ -107,6 +107,36 @@ static void *thread_body(void *arg)
 	return NULL;
 }
 
+static void *disruptor(void *arg)
+{
+	unsigned long *count_r = arg;
+	int oldstate, oldtype;
+	struct timespec req;
+
+	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate);
+	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
+	*count_r = 0;
+	sem_post(&ready);
+
+	pthread_mutex_lock(&lock);
+	for (;;) {
+		if (started)
+			break;
+		pthread_cond_wait(&barrier, &lock);
+	}
+	pthread_mutex_unlock(&lock);
+
+	while (!throttle) {
+		do_work(1000, count_r);
+
+		req.tv_sec = 0;
+		req.tv_nsec = 1000000;
+		clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
+	}
+
+	return NULL;
+}
+
 static void __create_quota_thread(pthread_t *tid, const char *name,
 				  int tgid, unsigned long *count_r)
 {
@@ -133,7 +163,7 @@ static void __create_quota_thread(pthread_t *tid, const char *name,
 	__create_quota_thread(&(__tid), __label, __tgid, &(__count))
 
 static void __create_fifo_thread(pthread_t *tid, const char *name,
-				 unsigned long *count_r)
+				 void *(*func)(void *), unsigned long *count_r)
 {
 	struct sched_param param;
 	pthread_attr_t attr;
@@ -143,9 +173,9 @@ static void __create_fifo_thread(pthread_t *tid, const char *name,
 	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
 	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
 	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
-	param.sched_priority = 1;
+	param.sched_priority = 2;
 	pthread_attr_setschedparam(&attr, &param);
-	ret = pthread_create(tid, &attr, thread_body, count_r);
+	ret = pthread_create(tid, &attr, func, count_r);
 	if (ret)
 		error(1, ret, "pthread_create(SCHED_FIFO)");
 
@@ -153,8 +183,8 @@ static void __create_fifo_thread(pthread_t *tid, const char *name,
 	pthread_setname_np(*tid, name);
 }
 
-#define create_fifo_thread(__tid, __label, __count)	\
-	__create_fifo_thread(&(__tid), __label, &(__count))
+#define create_fifo_thread(__tid, __label, __func, __count)	\
+	__create_fifo_thread(&(__tid), __label, __func, &(__count))
 
 static double run_quota(int quota)
 {
@@ -190,6 +220,10 @@ static double run_quota(int quota)
 		sem_wait(&ready);
 	}
 
+	create_fifo_thread(threads[nrthreads], "disruptor", disruptor,
+			   counts[nrthreads]);
+	sem_wait(&ready);
+
 	pthread_mutex_lock(&lock);
 	started = 1;
 	pthread_cond_broadcast(&barrier);
@@ -212,6 +246,7 @@ static double run_quota(int quota)
 		pthread_cancel(threads[n]);
 		pthread_join(threads[n], NULL);
 	}
+	pthread_join(threads[nrthreads], NULL);
 
 	cf.quota.op = sched_quota_remove;
 	cf.quota.remove.tgid = tgid;
@@ -243,7 +278,7 @@ static unsigned long long calibrate(void)
 
 	for (n = 0; n < nrthreads; n++) {
 		snprintf(label, sizeof(label), "t%d", n);
-		create_fifo_thread(threads[n], label, counts[n]);
+		create_fifo_thread(threads[n], label, thread_body, counts[n]);
 		sem_wait(&ready);
 	}
 

Something analogous for sched-sporadic does not help. Its algorithm looks 
broken, specifically in that it schedules the consumed budget for 
recharge on preemption, rather than reducing the remaining budget. But I 
might also be missing some case where this is actually needed.
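
For reference, the saturating charge that charge_usage() performs in the
sched-quota patch above can be tried out in user space. This is only a
minimal sketch: struct quota_group_model, charge_usage_model() and
budget_after_run() are hypothetical stand-ins with plain uint64_t ticks,
not the Cobalt definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xnsched_quota_group; only the two
 * fields touched by the accounting are modeled. */
struct quota_group_model {
	uint64_t run_start_ns;	/* when the group last got the CPU */
	uint64_t run_budget_ns;	/* budget left in the current period */
};

/* Same logic as charge_usage() in the patch: subtract the elapsed run
 * time from the budget, clamping at zero instead of wrapping around. */
static void charge_usage_model(struct quota_group_model *tg, uint64_t now)
{
	uint64_t elapsed = now - tg->run_start_ns;

	if (elapsed < tg->run_budget_ns)
		tg->run_budget_ns -= elapsed;
	else
		tg->run_budget_ns = 0;
}

/* Helper for experiments: run from 'start' to 'stop' and return the
 * budget that is left afterwards. */
static uint64_t budget_after_run(uint64_t budget, uint64_t start,
				 uint64_t stop)
{
	struct quota_group_model tg = {
		.run_start_ns = start,
		.run_budget_ns = budget,
	};

	charge_usage_model(&tg, stop);
	return tg.run_budget_ns;
}
```

Calling budget_after_run() with a run longer than the budget shows the
clamp: the budget ends at zero rather than wrapping to a huge unsigned
value.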

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10  6:24 SCHED_SPORADIC in Xenomai 3 Jan Kiszka
  2026-06-10  7:21 ` Jan Kiszka
@ 2026-06-11  7:33 ` Philippe Gerum
  2026-06-11  7:42   ` Jan Kiszka
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2026-06-11  7:33 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Hi Jan,

Jan Kiszka <jan.kiszka@siemens.com> writes:

> Hi Philippe,
>
> while trying to port the signal-while-suspended fix to Xenomai 3, I ran
> into XNHELD, a state only existing there. I suppose that was once
> forward-ported as EVL_T_HALT. The only user of XNHELD in Xenomai 3 is
> SCHED_SPORADIC - so let's dive into that scheduling class.
>
> Turned out it was never documented, not even linked to the POSIX
> standard. But it also slightly differs from it (low_prio = -1 -> suspend
> on depletion). There is also no test case, so I asked an AI for one.
> That worked fairly well as it seems to have revealed an issue:
>
> Could it be that we are not properly suspending the budget tracking when
> a higher-prio task from a different scheduling class is preempting a
> sporadic thread? It looks like that xnsched_sporadic_pick is not invoked
> if a thread is selected from a higher-prio class first, namely sched-rt
> with its weight 4 vs. 3 if sched-sporadic. Or is that an (undocumented)
> limitation/misconfiguration? Is that issue even affecting other
> time-slicing classes as well??

Yes, this is a systemic bug. We need to tell classes that some thread of
theirs is scheduling out.
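
The effect can be reproduced with a toy model of the pick loop. The
sketch below assumes a simplified loop that walks the classes by
descending weight and stops at the first one returning a thread; the
names class_model, class_pick() and pick_next() are made up for
illustration, only the weights 4 and 3 are taken from the report.

```c
#include <assert.h>
#include <stddef.h>

struct thread_model {
	const char *name;
};

/* Hypothetical, simplified scheduling class: a weight, at most one
 * runnable thread, and a counter recording how often pick() ran. */
struct class_model {
	int weight;
	struct thread_model *runnable;
	int pick_calls;
};

static struct thread_model *class_pick(struct class_model *c)
{
	c->pick_calls++;
	return c->runnable;
}

/* Walk the classes from highest to lowest weight and stop at the first
 * one that returns a thread. 'classes' must be sorted by descending
 * weight. */
static struct thread_model *pick_next(struct class_model *classes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct thread_model *t = class_pick(&classes[i]);

		if (t)
			return t;
	}
	return NULL;
}

/* Scenario helper: returns how often the sporadic class was consulted
 * during one pick, with or without a runnable FIFO thread. */
static int sporadic_pick_calls(int fifo_runnable)
{
	struct thread_model fifo = { "fifo" }, ss = { "ss" };
	struct class_model classes[] = {
		{ .weight = 4, .runnable = fifo_runnable ? &fifo : NULL },
		{ .weight = 3, .runnable = &ss },
	};

	pick_next(classes, 2);
	return classes[1].pick_calls;
}
```

With a runnable FIFO thread the lower-weight class is never consulted,
so any bookkeeping done only in its pick handler is skipped for the
outgoing thread.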

>
> That furthermore makes me wonder if we actually have users of
> sched-sporadic. Likely a hard to answer question, as usual. But such a
> limitation should have been observed earlier under real workload...
>

I only know of one user, back in 2009. Never received any feedback since
then.

-- 
Philippe.


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-10 18:17     ` Jan Kiszka
  2026-06-11  5:38       ` Jan Kiszka
@ 2026-06-11  7:36       ` Philippe Gerum
  1 sibling, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2026-06-11  7:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 10.06.26 09:33, Jan Kiszka wrote:
>> On 10.06.26 09:21, Jan Kiszka wrote:
>>> On 10.06.26 08:24, Jan Kiszka wrote:
>>>> Hi Philippe,
>>>>
>>>> while trying to port the signal-while-suspended fix to Xenomai 3, I ran
>>>> into XNHELD, a state only existing there. I suppose that was once
>>>> forward-ported as EVL_T_HALT. The only user of XNHELD in Xenomai 3 is
>>>> SCHED_SPORADIC - so let's dive into that scheduling class.
>>>>
>>>> Turned out it was never documented, not even linked to the POSIX
>>>> standard. But it also slightly differs from it (low_prio = -1 -> suspend
>>>> on depletion). There is also no test case, so I asked an AI for one.
>>>> That worked fairly well as it seems to have revealed an issue:
>>>>
>>>> Could it be that we are not properly suspending the budget tracking when
>>>> a higher-prio task from a different scheduling class is preempting a
>>>> sporadic thread? It looks like that xnsched_sporadic_pick is not invoked
>>>> if a thread is selected from a higher-prio class first, namely sched-rt
>>>> with its weight 4 vs. 3 if sched-sporadic. Or is that an (undocumented)
>>>> limitation/misconfiguration? Is that issue even affecting other
>>>> time-slicing classes as well??
>>>>
>>>> That furthermore makes me wonder if we actually have users of
>>>> sched-sporadic. Likely a hard to answer question, as usual. But such a
>>>> limitation should have been observed earlier under real workload...
>>>>
>>>> Jan
>>>>
>>>
>>> Here is a trace that proves how xnsched_sporadic_pick and, thus, 
>>> sporadic_suspend_activity are not called:
>>>
>>>          disrupt-1682  [000] d..2.    94.171753: cobalt_head_sysentry: syscall=clock_nanosleep64
>>>          disrupt-1682  [000] d..2.    94.171755: cobalt_clock_nanosleep: clock_id=1 flags=0() rqt=(0.060000000)
>>>          disrupt-1682  [000] d..2.    94.171757: cobalt_thread_suspend: pid=1682 mask=0x4 timeout=60000001 timeout_mode=0 wchan=(nil)
>>>          disrupt-1682  [000] d..2.    94.171759: cobalt_timer_start:   timer=0xffffc900008bbb00(smokey) value=60000001 interval=0 mode=0x0
>>>          disrupt-1682  [000] d..2.    94.171761: cobalt_tick_shot:     next tick at 94.231756 (delay: 59995 us)
>>>          disrupt-1682  [000] d..2.    94.171770: cobalt_schedule:      status=0x10000000
>>>          disrupt-1682  [000] d..2.    94.171771: cobalt_trace_pid:     pid=1682, prio=30
>>>          disrupt-1682  [000] d..2.    94.171776: bprint:               xnsched_sporadic_pick: xnsched_sporadic_pick, curr=1682 next=1681
>>>          disrupt-1682  [000] d..2.    94.171777: bprint:               xnsched_sporadic_pick: sporadic_resume_activity, pss->budget 99964473
>>>          disrupt-1682  [000] d..2.    94.171778: bprint:               sporadic_schedule_drop: sporadic_schedule_drop, pss->budget 99964473
>>>          disrupt-1682  [000] d..2.    94.171778: cobalt_timer_start:   timer=0xffffc900008bc4d8(pss-drop) value=94216201725 interval=0 mode=0x1
>>>          disrupt-1682  [000] d..2.    94.171779: cobalt_switch_context: prev_name=disrupt prev_pid=1682 prev_prio=30 prev_state=0x248044 ==> next_name=ss-d next_pid=1681 next_prio=20
>>>             ss-d-1681  [000] d..2.    94.171784: cobalt_trace_pid:     pid=1681, prio=20
>>>             ss-d-1681  [000] d..2.    94.171788: cobalt_synch_acquire: synch=0xffffc900008bd408
>>>             ss-d-1681  [000] d..2.    94.171789: cobalt_head_sysexit:  result=0
>>>             ss-d-1681  [000] d..2.    94.171799: cobalt_head_sysentry: syscall=mutex_unlock
>>>             ss-d-1681  [000] d..2.    94.171801: cobalt_synch_release: synch=0xffffc900008bd408
>>>             ss-d-1681  [000] d..2.    94.171801: cobalt_head_sysexit:  result=0
>>>             ss-d-1681  [000] d..2.    94.231786: cobalt_timer_expire:  timer=0xffffc900008bbb00
>>>             ss-d-1681  [000] d..2.    94.231789: cobalt_thread_resume: name=disrupt pid=1682 mask=0x4
>>>             ss-d-1681  [000] d..2.    94.231790: cobalt_trace_pid:     pid=1682, prio=30
>>>             ss-d-1681  [000] d..2.    94.231791: cobalt_timer_stop:    timer=0xffffc900008bbb00
>>>             ss-d-1681  [000] d..2.    94.231794: cobalt_tick_shot:     next tick at 94.271741 (delay: 39948 us)
>>>             ss-d-1681  [000] d..2.    94.231802: cobalt_schedule:      status=0x10000000
>>>             ss-d-1681  [000] d..2.    94.231803: cobalt_trace_pid:     pid=1681, prio=20
>>>             ss-d-1681  [000] d..2.    94.231805: cobalt_switch_context: prev_name=ss-d prev_pid=1681 prev_prio=20 prev_state=0x248048 ==> next_name=disrupt next_pid=1682 next_prio=30
>>>          disrupt-1682  [000] d..2.    94.231810: cobalt_trace_pid:     pid=1682, prio=30
>>>          disrupt-1682  [000] d..2.    94.231811: cobalt_head_sysexit:  result=0
>>>          disrupt-1682  [000] d..2.    94.271767: cobalt_timer_expire:  timer=0xffffc900008bc4d8
>>>          disrupt-1682  [000] d..2.    94.271771: cobalt_thread_suspend: pid=1681 mask=0x200 timeout=0 timeout_mode=0 wchan=(nil)
>>>          disrupt-1682  [000] d..2.    94.271772: cobalt_tick_shot:     next tick at 95.130203 (delay: 858431 us)
>>>          disrupt-1682  [000] d..2.    94.271914: cobalt_head_sysentry: syscall=clock_nanosleep64
>>>          disrupt-1682  [000] d..2.    94.271916: cobalt_clock_nanosleep: clock_id=1 flags=0() rqt=(0.060000000)
>>>          disrupt-1682  [000] d..2.    94.271917: cobalt_thread_suspend: pid=1682 mask=0x4 timeout=60000001 timeout_mode=0 wchan=(nil)
>>>
>>> Jan
>>>
>>> PS: I asked AI (opus-4.7) to confirm or disprove this, and it failed in 
>>> its code analysis. It seems like the way the code is structured and 
>>> commented misguided it in its conclusions. Sooo statistically human...
>>>
>> 
>> Asked different ("find a path where..."):
>> 
>> "You're right — I need to retract part of my previous
>> answer. Looking at this scenario through the cascade rule,
>> sporadic_suspend_activity() is not invoked when a SCHED_FIFO thread
>> preempts a SCHED_SPORADIC thread.
>>
>>   The path
>> 
>>   1. A SCHED_SPORADIC thread runs at, say, normal_prio = 50. Its
>> drop_timer was armed by sporadic_schedule_drop() at the last
>> cascade, set to fire at resume_date + budget.
>>   2. A SCHED_FIFO thread at priority 80 becomes runnable (wakes from sleep, IRQ unblocks it, etc.) and xnsched_set_resched() marks the CPU.
>>   3. ___xnsched_run() → xnsched_pick_next() iterates for_each_xnsched_class (sched.c:322). RT has the highest weight, so xnsched_rt_pick(sched) runs first.
>>   4. In xnsched_rt_pick (sched.c:730–762), the head of the highest-priority queue is now the FIFO thread. Its sched_class == &xnsched_class_rt.
>>   5. The cascade check at sched.c:757 (if (unlikely(thread->sched_class != &xnsched_class_rt))) is false.
>>   6. xnsched_rt_pick returns the FIFO thread via del_q. The for_each_xnsched_class loop sees a non-NULL result and exits.
>>   7. xnsched_sporadic_pick is never called. sporadic_suspend_activity(curr) never runs on the outgoing sporadic thread."
>> 
>> This looks more useful.
>> 
>> Jan
>> 
>
> The blast radius extends:
>
>  - I added a preempting fifo thread to the sched-quota test as well, and
>    it completely destroyed the thread group accounting: the group no
>    longer gets throttled, rather than getting time stolen by the
>    preemptions.
>
>  - The evl core looks identical here and should be similarly affected,
>    regarding quota-based scheduling.
>

The EVL core inherited the implementation of the quota scheduling class
from Cobalt, and therefore suffers from the same problem ATM.

-- 
Philippe.


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-11  7:33 ` Philippe Gerum
@ 2026-06-11  7:42   ` Jan Kiszka
  2026-06-11  7:56     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2026-06-11  7:42 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 11.06.26 09:33, Philippe Gerum wrote:
> 
> Hi Jan,
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> Hi Philippe,
>>
>> while trying to port the signal-while-suspended fix to Xenomai 3, I ran
>> into XNHELD, a state only existing there. I suppose that was once
>> forward-ported as EVL_T_HALT. The only user of XNHELD in Xenomai 3 is
>> SCHED_SPORADIC - so let's dive into that scheduling class.
>>
>> Turned out it was never documented, not even linked to the POSIX
>> standard. But it also slightly differs from it (low_prio = -1 -> suspend
>> on depletion). There is also no test case, so I asked an AI for one.
>> That worked fairly well as it seems to have revealed an issue:
>>
>> Could it be that we are not properly suspending the budget tracking when
>> a higher-prio task from a different scheduling class is preempting a
>> sporadic thread? It looks like that xnsched_sporadic_pick is not invoked
>> if a thread is selected from a higher-prio class first, namely sched-rt
>> with its weight 4 vs. 3 if sched-sporadic. Or is that an (undocumented)
>> limitation/misconfiguration? Is that issue even affecting other
>> time-slicing classes as well??
> 
> Yes, this is a systemic bug. We need to tell classes that some thread of
> theirs is scheduling out.
> 

OK, then please have a look at my proposal from today. If it makes
sense, I'm happy to submit it for x3 as well as a linux-evl patch.

>>
>> That furthermore makes me wonder if we actually have users of
>> sched-sporadic. Likely a hard to answer question, as usual. But such a
>> limitation should have been observed earlier under real workload...
>>
> 
> I only know of one user, back in 2009. Never received any feedback since
> then.
> 

Hmm, I'm inclined to retire it then.
@all: If there are users today, please speak up now!

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-11  5:38       ` Jan Kiszka
@ 2026-06-11  7:49         ` Philippe Gerum
  2026-06-11  7:56           ` Jan Kiszka
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2026-06-11  7:49 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
> index d527b6be2c..6dfbf83220 100644
> --- a/kernel/cobalt/sched.c
> +++ b/kernel/cobalt/sched.c
> @@ -895,6 +895,7 @@ static inline void do_lazy_user_work(struct xnthread *curr)
>  
>  int ___xnsched_run(struct xnsched *sched)
>  {
> +	struct xnsched_class *prev_schedclass __maybe_unused;
>  	bool switched = false, leaving_inband;
>  	struct xnthread *prev, *next, *curr;
>  	spl_t s;
> @@ -933,6 +934,13 @@ int ___xnsched_run(struct xnsched *sched)
>  
>  	prev = curr;
>  
> +#ifdef CONFIG_XENO_OPT_SCHED_CLASSES
> +	prev_schedclass = prev->sched_class;
> +	if (prev_schedclass->weight < next->sched_class->weight &&
> +	    prev_schedclass->sched_out)
> +		prev_schedclass->sched_out(prev);
> +#endif

I would call the scheduling-out hook unconditionally: the sched class
has all the information required to sort this out and do the right
thing, which the generic scheduler does not.
>
> Something analogous for sched-sporadic does not help. It looks broken in 
> its algorithm, specifically that is schedules consumed budget for 
> recharge on preemption, rather than reducing the remaining budget.

Since the original implementation did not account for the preemption
case in budget-tracking classes, that makes sense.

-- 
Philippe.


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-11  7:49         ` Philippe Gerum
@ 2026-06-11  7:56           ` Jan Kiszka
  2026-06-11  8:53             ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2026-06-11  7:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 11.06.26 09:49, Philippe Gerum wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
>> index d527b6be2c..6dfbf83220 100644
>> --- a/kernel/cobalt/sched.c
>> +++ b/kernel/cobalt/sched.c
>> @@ -895,6 +895,7 @@ static inline void do_lazy_user_work(struct xnthread *curr)
>>  
>>  int ___xnsched_run(struct xnsched *sched)
>>  {
>> +	struct xnsched_class *prev_schedclass __maybe_unused;
>>  	bool switched = false, leaving_inband;
>>  	struct xnthread *prev, *next, *curr;
>>  	spl_t s;
>> @@ -933,6 +934,13 @@ int ___xnsched_run(struct xnsched *sched)
>>  
>>  	prev = curr;
>>  
>> +#ifdef CONFIG_XENO_OPT_SCHED_CLASSES
>> +	prev_schedclass = prev->sched_class;
>> +	if (prev_schedclass->weight < next->sched_class->weight &&
>> +	    prev_schedclass->sched_out)
>> +		prev_schedclass->sched_out(prev);
>> +#endif
> 
> I would call the scheduling out hook unconditionally, the sched class
> has all the information required to sort this out, do the right thing,
> which the generic scheduler does not.

That would mean moving the accounting out of the pick callback
unconditionally as well, giving up some smaller synergies along the way.
I need to look into that more closely, though.
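
One possible reading of that split, as a rough sketch only: the pick
side merely stamps the start time, and the out side does all the
charging. The names group_model, model_pick() and model_sched_out() are
hypothetical, and this is not how sched-quota is structured today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a quota group; not the Cobalt structure. */
struct group_model {
	uint64_t run_start_ns;
	uint64_t run_budget_ns;
};

/* Pick side: only stamp the time at which the group gets the CPU. */
static void model_pick(struct group_model *g, uint64_t now)
{
	g->run_start_ns = now;
}

/* Out side: charge the time spent on the CPU since the last pick,
 * clamping the budget at zero. */
static void model_sched_out(struct group_model *g, uint64_t now)
{
	uint64_t elapsed = now - g->run_start_ns;

	g->run_budget_ns = elapsed < g->run_budget_ns ?
		g->run_budget_ns - elapsed : 0;
}

/* Two slices on the CPU, separated by a preemption; returns the budget
 * left after the second slice. */
static uint64_t budget_after_two_slices(uint64_t budget,
					uint64_t in1, uint64_t out1,
					uint64_t in2, uint64_t out2)
{
	struct group_model g = { .run_start_ns = 0, .run_budget_ns = budget };

	model_pick(&g, in1);
	model_sched_out(&g, out1);
	model_pick(&g, in2);
	model_sched_out(&g, out2);
	return g.run_budget_ns;
}
```

In this model the time between the first out and the second pick, i.e.
the preemption, is never charged to the group.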

>>
>> Something analogous for sched-sporadic does not help. It looks broken in 
>> its algorithm, specifically that is schedules consumed budget for 
>> recharge on preemption, rather than reducing the remaining budget.
> 
> Since the original implementation did not account for the preemption
> case in budget-tracking classes, that makes sense.
> 

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-11  7:42   ` Jan Kiszka
@ 2026-06-11  7:56     ` Philippe Gerum
  0 siblings, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2026-06-11  7:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 11.06.26 09:33, Philippe Gerum wrote:
>> 
>> Hi Jan,
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> Hi Philippe,
>>>
>>> while trying to port the signal-while-suspended fix to Xenomai 3, I ran
>>> into XNHELD, a state only existing there. I suppose that was once
>>> forward-ported as EVL_T_HALT. The only user of XNHELD in Xenomai 3 is
>>> SCHED_SPORADIC - so let's dive into that scheduling class.
>>>
>>> Turned out it was never documented, not even linked to the POSIX
>>> standard. But it also slightly differs from it (low_prio = -1 -> suspend
>>> on depletion). There is also no test case, so I asked an AI for one.
>>> That worked fairly well as it seems to have revealed an issue:
>>>
>>> Could it be that we are not properly suspending the budget tracking when
>>> a higher-prio task from a different scheduling class is preempting a
>>> sporadic thread? It looks like that xnsched_sporadic_pick is not invoked
>>> if a thread is selected from a higher-prio class first, namely sched-rt
>>> with its weight 4 vs. 3 if sched-sporadic. Or is that an (undocumented)
>>> limitation/misconfiguration? Is that issue even affecting other
>>> time-slicing classes as well??
>> 
>> Yes, this is a systemic bug. We need to tell classes that some thread of
>> theirs is scheduling out.
>> 
>
> OK, then please have a look of my proposal from today. If it makes
> sense, I'm happy to submit it for x3 as well as linux-evl patch.
>
>>>
>>> That furthermore makes me wonder if we actually have users of
>>> sched-sporadic. Likely a hard to answer question, as usual. But such a
>>> limitation should have been observed earlier under real workload...
>>>
>> 
>> I only know of one user, back in 2009. Never received any feedback since
>> then.
>> 
>
> Hmm, I'm inclined to retire it then.

The sched-sporadic implementation is certainly not quite right ATM in
x3, although it could be fixed. That said, IIRC even the POSIX standard
was originally broken and required some amendments. I did not port that
sched class to x4.

-- 
Philippe.


* Re: SCHED_SPORADIC in Xenomai 3
  2026-06-11  7:56           ` Jan Kiszka
@ 2026-06-11  8:53             ` Philippe Gerum
  0 siblings, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2026-06-11  8:53 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 11.06.26 09:49, Philippe Gerum wrote:
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
>>> index d527b6be2c..6dfbf83220 100644
>>> --- a/kernel/cobalt/sched.c
>>> +++ b/kernel/cobalt/sched.c
>>> @@ -895,6 +895,7 @@ static inline void do_lazy_user_work(struct xnthread *curr)
>>>  
>>>  int ___xnsched_run(struct xnsched *sched)
>>>  {
>>> +	struct xnsched_class *prev_schedclass __maybe_unused;
>>>  	bool switched = false, leaving_inband;
>>>  	struct xnthread *prev, *next, *curr;
>>>  	spl_t s;
>>> @@ -933,6 +934,13 @@ int ___xnsched_run(struct xnsched *sched)
>>>  
>>>  	prev = curr;
>>>  
>>> +#ifdef CONFIG_XENO_OPT_SCHED_CLASSES
>>> +	prev_schedclass = prev->sched_class;
>>> +	if (prev_schedclass->weight < next->sched_class->weight &&
>>> +	    prev_schedclass->sched_out)
>>> +		prev_schedclass->sched_out(prev);
>>> +#endif
>> 
>> I would call the scheduling out hook unconditionally, the sched class
>> has all the information required to sort this out, do the right thing,
>> which the generic scheduler does not.
>
> That would mean moving the accounting out of the pick callback
> unconditionally as well - leaving some smaller synergies on the road.

This hook would have to be called when the current thread blocks, which
should not be filtered out by the class weight check. Moreover, if next
!= curr, we know for sure that curr either blocked, yielded on a
round-robin tick or got preempted on a priority basis. In the last
case, the sched class weight is accounted for when picking next. If
curr is still runnable, prev_schedclass->weight has to be lower than
next->sched_class->weight for preemption to take place anyway.

IOW, the following change would be appropriate:

>>> +	    if (prev_schedclass->sched_out)
>>> +		prev_schedclass->sched_out(prev);
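
To make the difference between the two variants concrete, here is a
small user-space sketch. It assumes a reduced class descriptor with just
a weight and the optional sched_out hook; class_model, thread_model and
the weights 2 and 1 are invented for the example and do not correspond
to actual Cobalt classes.

```c
#include <assert.h>
#include <stddef.h>

struct thread_model;

/* Hypothetical, reduced scheduling class: only the weight and the
 * optional sched_out hook are modeled. */
struct class_model {
	int weight;
	void (*sched_out)(struct thread_model *thread);
};

struct thread_model {
	const struct class_model *sched_class;
	int out_calls;	/* how often the class was told we left the CPU */
};

static void count_out(struct thread_model *thread)
{
	thread->out_calls++;
}

/* Weight-filtered variant: notify only if the incoming thread belongs
 * to a heavier class. */
static void switch_filtered(struct thread_model *prev,
			    struct thread_model *next)
{
	const struct class_model *pc = prev->sched_class;

	if (pc->weight < next->sched_class->weight && pc->sched_out)
		pc->sched_out(prev);
}

/* Unconditional variant: notify on every switch away from prev and let
 * the class decide what to do. */
static void switch_unconditional(struct thread_model *prev,
				 struct thread_model *next)
{
	const struct class_model *pc = prev->sched_class;

	(void)next;
	if (pc->sched_out)
		pc->sched_out(prev);
}

/* Scenario: a budget-tracked thread blocks and a thread of a lighter
 * class takes over. Returns the number of sched_out notifications. */
static int out_calls_when_blocking(int unconditional)
{
	static const struct class_model budget_class = {
		.weight = 2, .sched_out = count_out,
	};
	static const struct class_model light_class = {
		.weight = 1, .sched_out = NULL,
	};
	struct thread_model prev = { &budget_class, 0 };
	struct thread_model next = { &light_class, 0 };

	if (unconditional)
		switch_unconditional(&prev, &next);
	else
		switch_filtered(&prev, &next);
	return prev.out_calls;
}
```

In the blocking scenario the filtered variant never notifies the class,
while the unconditional one does, which is the case the weight check
would miss.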

-- 
Philippe.


end of thread, other threads:[~2026-06-11  8:53 UTC | newest]

Thread overview: 12+ messages
2026-06-10  6:24 SCHED_SPORADIC in Xenomai 3 Jan Kiszka
2026-06-10  7:21 ` Jan Kiszka
2026-06-10  7:33   ` Jan Kiszka
2026-06-10 18:17     ` Jan Kiszka
2026-06-11  5:38       ` Jan Kiszka
2026-06-11  7:49         ` Philippe Gerum
2026-06-11  7:56           ` Jan Kiszka
2026-06-11  8:53             ` Philippe Gerum
2026-06-11  7:36       ` Philippe Gerum
2026-06-11  7:33 ` Philippe Gerum
2026-06-11  7:42   ` Jan Kiszka
2026-06-11  7:56     ` Philippe Gerum
