[Xenomai-core] [BUG] scheduling order of dying shadow threads


* [Xenomai-core] [BUG] scheduling order of dying shadow threads
@ 2005-11-08  8:38 Jan Kiszka
  2005-11-08  9:32 ` Philippe Gerum
  2005-11-08 15:51 ` Philippe Gerum
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kiszka @ 2005-11-08  8:38 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1.1: Type: text/plain, Size: 3124 bytes --]

Hi Philippe,

I think this one is for you: ;)

Sebastian almost went mad with his CAN driver while tracing a strange
scheduling behaviour during shadow thread deletion for several days(!) -
and I was well on the way to following him yesterday evening. Attached is
a simplified demonstration of the effect, consisting of an RTDM driver
and both a kernel and a user space application to trigger it.

Assume two or more user space RT-threads blocking on the same RTDM
semaphore inside a driver (I was not yet able to reproduce this with a
simple native user space application :/). All of them then get woken up by
rtdm_sem_destroy during device closure. They increment a global counter,
save the current value in a per-thread variable, and then terminate.
They had also passed another per-thread variable to the RTDM driver,
which was updated in the kernel using the same(!) counter.

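/* application */
void demo(void *arg)
{
    /* blocks on read_sem inside the driver until the device is closed */
    rt_dev_read(dev, &value_k[(int)(long)arg], 0);
    /* back in user space: record the next counter value */
    value_u[(int)(long)arg] = ++counter;
}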

/* driver */
int demo_read_rt(struct rtdm_dev_context    *context,
                 rtdm_user_info_t           *user_info,
                 void                       *buf,
                 size_t                     nbyte)
{
    struct demodrv_context  *my_context;
    int                     ret;

    my_context = (struct demodrv_context *)context->dev_private;

    /* returns once read_sem is signalled or, as here, destroyed */
    ret = rtdm_sem_down(&my_context->read_sem);
    /* record the wake-up order in the caller's per-thread variable */
    *(int *)buf = ++(*counter);

    return ret;
}

That global counter is also incremented during closure to visualise the
call order:

int demo_close_rt(struct rtdm_dev_context   *context,
                  rtdm_user_info_t          *user_info)
{
    struct demodrv_context  *my_context;

    my_context = (struct demodrv_context *)context->dev_private;

    /* print the caller's current priority before and after the destroy */
    printk("close 1: %d\n", xnpod_current_thread()->cprio);
    /* wakes up every thread blocked in demo_read_rt() */
    rtdm_sem_destroy(&my_context->read_sem);
    printk("close 2: %d\n", xnpod_current_thread()->cprio);
    (*counter)++;

    return 0;
}

Now one would expect the following contents of the involved variables
when running, e.g., 3 threads:

           thread 1      (prio 99)
         /   thread 2    (prio 98)
         |  /   thread 3 (prio 97)
         |  |  /
value_k: 1, 3, 5
value_u: 2, 4, 6
counter: 7

This is indeed what we get when the application is located in kernel
space, i.e. does not use shadow threads. But when it is a user space
application, the result looks like this:

           thread 1
         /   thread 2
         |  /   thread 3
         |  |  /
value_k: 1, 4, 6
value_u: 2, 5, 7
counter: 7

Which means that the first thread returns from kernel to user space and
terminates, then the close handler gets to run again, and only
afterwards the remaining threads!
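
The two orderings can be replayed with a small stand-alone C sketch. It is
not part of the attached demo: the event encoding and the names replay,
EV_KERNEL, EV_USER and EV_CLOSE are made up for illustration. Each event
bumps the shared counter once and, for the read path, stores the new value
the way demo_read_rt() and demo() do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types: each one increments the shared counter once. */
enum ev_type { EV_KERNEL, EV_USER, EV_CLOSE };

struct event {
    enum ev_type type;
    int thread;             /* 0-based thread index, unused for EV_CLOSE */
};

struct result {
    int value_k[3];
    int value_u[3];
    int counter;
};

/* Replay the events in order, mimicking the demo's bookkeeping. */
static struct result replay(const struct event *ev, size_t n)
{
    struct result r = { {0}, {0}, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        ++r.counter;
        if (ev[i].type == EV_KERNEL)
            r.value_k[ev[i].thread] = r.counter;    /* demo_read_rt() */
        else if (ev[i].type == EV_USER)
            r.value_u[ev[i].thread] = r.counter;    /* demo() */
        /* EV_CLOSE: demo_close_rt() only bumps the counter */
    }
    return r;
}

/* Expected order: all threads finish before the close handler continues. */
static const struct event expected_order[] = {
    { EV_KERNEL, 0 }, { EV_USER, 0 },
    { EV_KERNEL, 1 }, { EV_USER, 1 },
    { EV_KERNEL, 2 }, { EV_USER, 2 },
    { EV_CLOSE, 0 },
};

/* Observed order: the close handler continues right after thread 1. */
static const struct event observed_order[] = {
    { EV_KERNEL, 0 }, { EV_USER, 0 },
    { EV_CLOSE, 0 },
    { EV_KERNEL, 1 }, { EV_USER, 1 },
    { EV_KERNEL, 2 }, { EV_USER, 2 },
};
```

Calling replay(expected_order, 7) reproduces the first table above, and
replay(observed_order, 7) the second one; the final counter is 7 in both
cases, only the position of the close handler's increment differs.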

The reason is also displayed by demodrv:
close 1: 0	- prio of root thread before rtdm_sem_destroy
close 2: 99	- ... and after rtdm_sem_destroy

Which means that the non-RT thread calling rt_dev_close gets lifted to
prio 99 on calling rtdm_sem_destroy, the prio of the thread first woken
up. It seems to lose this prio quite soon again, but not soon enough to
avoid the inversion - very strange.

Any ideas?

Jan

[-- Attachment #1.2: cleanup-race.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 2039 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]


* Re: [Xenomai-core] [BUG] scheduling order of dying shadow threads
  2005-11-08  8:38 [Xenomai-core] [BUG] scheduling order of dying shadow threads Jan Kiszka
@ 2005-11-08  9:32 ` Philippe Gerum
  2005-11-08 15:51 ` Philippe Gerum
  1 sibling, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2005-11-08  9:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Hi Philippe,
> 
> I think this one is for you: ;)
> 
> Sebastian almost went mad with his CAN driver while tracing a strange
> scheduling behaviour during shadow thread deletion for several days(!) -
> and I was well on the way to following him yesterday evening. Attached is
> a simplified demonstration of the effect, consisting of an RTDM driver
> and both a kernel and a user space application to trigger it.
> 
> Assume two or more user space RT-threads blocking on the same RTDM
> semaphore inside a driver (I was not yet able to reproduce this with a
> simple native user space application :/). All of them then get woken up by
> rtdm_sem_destroy during device closure. They increment a global counter,
> save the current value in a per-thread variable, and then terminate.
> They had also passed another per-thread variable to the RTDM driver,
> which was updated in the kernel using the same(!) counter.
> 
> /* application */
> void demo(void *arg)
> {
>     rt_dev_read(dev, &value_k[(int)arg], 0);
>     value_u[(int)arg] = ++counter;
> }
> 
> /* driver */
> int demo_read_rt(struct rtdm_dev_context    *context,
>                  rtdm_user_info_t           *user_info,
>                  void                       *buf,
>                  size_t                     nbyte)
> {
>     struct demodrv_context  *my_context;
>     int                     ret;
> 
> 
>     my_context = (struct demodrv_context *)context->dev_private;
> 
>     ret = rtdm_sem_down(&my_context->read_sem);
>     *(int *)buf = ++(*counter);
> 
>     return ret;
> }
> 
> 
> That global counter is also incremented during closure to visualise the
> call order:
> 
> 
> int demo_close_rt(struct rtdm_dev_context   *context,
>                   rtdm_user_info_t          *user_info)
> {
>     struct demodrv_context  *my_context;
> 
> 
>     my_context = (struct demodrv_context *)context->dev_private;
> 
>     printk("close 1: %d\n", xnpod_current_thread()->cprio);
>     rtdm_sem_destroy(&my_context->read_sem);
>     printk("close 2: %d\n", xnpod_current_thread()->cprio);
>     (*counter)++;
> 
>     return 0;
> }
> 
> 
> Now one would expect the following contents of the involved variables
> when running, e.g., 3 threads:
> 
>            thread 1      (prio 99)
>          /   thread 2    (prio 98)
>          |  /   thread 3 (prio 97)
>          |  |  /
> value_k: 1, 3, 5
> value_u: 2, 4, 6
> counter: 7
> 
> This is indeed what we get when the application is located in kernel
> space, i.e. does not use shadow threads. But when it is a user space
> application, the result looks like this:
> 
>            thread 1
>          /   thread 2
>          |  /   thread 3
>          |  |  /
> value_k: 1, 4, 6
> value_u: 2, 5, 7
> counter: 7
> 
> Which means that the first thread returns from kernel to user space and
> terminates, then the close handler gets to run again, and only
> afterwards the remaining threads!
> 
> The reason is also displayed by demodrv:
> close 1: 0	- prio of root thread before rtdm_sem_destroy
> close 2: 99	- ... and after rtdm_sem_destroy
> 
> Which means that the non-RT thread calling rt_dev_close gets lifted to
> prio 99 on calling rtdm_sem_destroy, the prio of the thread first woken
> up. It seems to lose this prio quite soon again, but not soon enough to
> avoid the inversion - very strange.
> 
> Any ideas?

Not yet, but your analysis looks right: the main thread seems to spuriously
recycle the priority level of task1 as the latter enters secondary mode. The
good news is that such an issue likely only impacts the deletion process,
which involves some specific transitions. I'll try to reproduce this bug asap
and let you know. Thanks.

> 
> Jan
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core


-- 

Philippe.



* Re: [Xenomai-core] [BUG] scheduling order of dying shadow threads
  2005-11-08  8:38 [Xenomai-core] [BUG] scheduling order of dying shadow threads Jan Kiszka
  2005-11-08  9:32 ` Philippe Gerum
@ 2005-11-08 15:51 ` Philippe Gerum
  2005-11-08 16:21   ` Sebastian Smolorz
  2005-11-08 16:30   ` Jan Kiszka
  1 sibling, 2 replies; 10+ messages in thread
From: Philippe Gerum @ 2005-11-08 15:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Hi Philippe,
> 
> I think this one is for you: ;)
> 
> Sebastian almost went mad with his CAN driver while tracing a strange
> scheduling behaviour during shadow thread deletion for several days(!) -
> and I was well on the way to following him yesterday evening. Attached is
> a simplified demonstration of the effect, consisting of an RTDM driver
> and both a kernel and a user space application to trigger it.
> 

I've spotted the issue in nucleus/shadow.c. Basically, the root thread priority 
boost was leaking to a non-shadow thread due to a missing priority reset in the 
lostage APC handler, whilst a shadow was in the process of relaxing. Really 
funky bug, thanks! :o> Fixed in the repo hopefully for good. The scheduling 
sequence is now correct with your demo app on my box.

-- 

Philippe.



* Re: [Xenomai-core] [BUG] scheduling order of dying shadow threads
  2005-11-08 15:51 ` Philippe Gerum
@ 2005-11-08 16:21   ` Sebastian Smolorz
  2005-11-08 16:30   ` Jan Kiszka
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Smolorz @ 2005-11-08 16:21 UTC (permalink / raw)
  To: xenomai-core

On Tue, 8 Nov 2005, Philippe Gerum wrote:

> Jan Kiszka wrote:
> > Hi Philippe,
> >
> > I think this one is for you: ;)
> >
> > Sebastian almost went mad with his CAN driver while tracing a strange
> > scheduling behaviour during shadow thread deletion for several days(!) -
> > and I was well on the way to following him yesterday evening. Attached is
> > a simplified demonstration of the effect, consisting of an RTDM driver
> > and both a kernel and a user space application to trigger it.
> >
>
> I've spotted the issue in nucleus/shadow.c. Basically, the root thread priority
> boost was leaking to a non-shadow thread due to a missing priority reset in the
> lostage APC handler, whilst a shadow was in the process of relaxing. Really
> funky bug, thanks! :o> Fixed in the repo hopefully for good. The scheduling
> sequence is now correct with your demo app on my box.

Thank you, problem has disappeared here, too! :-)


Sebastian



* Re: [Xenomai-core] [BUG] scheduling order of dying shadow threads
  2005-11-08 15:51 ` Philippe Gerum
  2005-11-08 16:21   ` Sebastian Smolorz
@ 2005-11-08 16:30   ` Jan Kiszka
  2005-11-08 17:47     ` Philippe Gerum
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2005-11-08 16:30 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Hi Philippe,
>>
>> I think this one is for you: ;)
>>
>> Sebastian almost went mad with his CAN driver while tracing a strange
>> scheduling behaviour during shadow thread deletion for several days(!) -
>> and I was well on the way to following him yesterday evening. Attached is
>> a simplified demonstration of the effect, consisting of an RTDM driver
>> and both a kernel and a user space application to trigger it.
>>
> 
> I've spotted the issue in nucleus/shadow.c. Basically, the root thread
> priority boost was leaking to a non-shadow thread due to a missing
> priority reset in the lostage APC handler, whilst a shadow was in the
> process of relaxing. Really funky bug, thanks! :o> Fixed in the repo
> hopefully for good. The scheduling sequence is now correct with your
> demo app on my box.
> 

Yep, looks good here as well. Great and quick work! Just don't expect
that someone can follow your explanations easily. :)

I think this issue has some similarity with the one I once stumbled over
regarding non-RT signalling to Linux. I'm not going to repeat my general
concerns regarding the priority boosting of the root thread now... ;)

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-core] [BUG] scheduling order of dying shadow threads
  2005-11-08 16:30   ` Jan Kiszka
@ 2005-11-08 17:47     ` Philippe Gerum
  2005-11-14 12:45       ` [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h Ignacio García Pérez
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2005-11-08 17:47 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Philippe Gerum wrote:
> 
>>Jan Kiszka wrote:
>>
>>>Hi Philippe,
>>>
>>>I think this one is for you: ;)
>>>
>>>Sebastian almost went mad with his CAN driver while tracing a strange
>>>scheduling behaviour during shadow thread deletion for several days(!) -
>>>and I was well on the way to following him yesterday evening. Attached is
>>>a simplified demonstration of the effect, consisting of an RTDM driver
>>>and both a kernel and a user space application to trigger it.
>>>
>>
>>I've spotted the issue in nucleus/shadow.c. Basically, the root thread
>>priority boost was leaking to a non-shadow thread due to a missing
>>priority reset in the lostage APC handler, whilst a shadow was in the
>>process of relaxing. Really funky bug, thanks! :o> Fixed in the repo
>>hopefully for good. The scheduling sequence is now correct with your
>>demo app on my box.
>>
> 
> 
> Yep, looks good here as well. Great and quick work! Just don't expect
> that someone can follow your explanations easily. :)
> 

Well, sorry. Here is a more useful explanation:

The demo thread in your code calls sleep(1) before exiting, which causes the
underlying shadow thread to relax. The same would happen without sleeping,
since a terminating thread is silently relaxed by the nucleus in any case as
needed.

When relaxing the current thread, xnshadow_relax() first boosts the priority of
the root thread (i.e. the placeholder for Linux in the Xenomai scheduler) right
before the relaxing thread suspends itself. Before that, a wake-up request has
been scheduled (using an APC), so that lostage_handler will be called, which
will in turn invoke wake_up_process() for the relaxing thread. This is needed
because shadows running in primary mode are seen as suspended in the Linux
sense, in TASK_INTERRUPTIBLE state. The reason for this is that the Xenomai and
Linux schedulers must have mutually exclusive control over a shadow; they
should not be allowed to both fiddle concurrently with a single thread context.
Conversely, relaxed threads operating in secondary mode are seen as suspended
on the XNRELAX condition by the nucleus.
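
The mutual exclusion described above can be pictured with a toy state model.
This is only a sketch under stated assumptions: the struct and function names
(toy_shadow, toy_enter_primary, toy_enter_secondary, toy_exclusive) are
hypothetical and are not Xenomai or Linux data structures; the two booleans
merely stand in for TASK_INTERRUPTIBLE on the Linux side and XNRELAX on the
nucleus side.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are not Xenomai or Linux data structures. */
enum exec_mode { MODE_PRIMARY, MODE_SECONDARY };

struct toy_shadow {
    enum exec_mode mode;
    bool linux_sleeping;    /* stands in for TASK_INTERRUPTIBLE */
    bool xeno_relaxed;      /* stands in for the XNRELAX condition */
};

/* Primary mode: Xenomai schedules the thread, Linux sees it asleep. */
static void toy_enter_primary(struct toy_shadow *t)
{
    t->mode = MODE_PRIMARY;
    t->linux_sleeping = true;
    t->xeno_relaxed = false;
}

/* Secondary mode: Linux schedules the thread, the nucleus sees it suspended. */
static void toy_enter_secondary(struct toy_shadow *t)
{
    t->mode = MODE_SECONDARY;
    t->linux_sleeping = false;
    t->xeno_relaxed = true;
}

/* Invariant: exactly one of the two schedulers may run the thread. */
static bool toy_exclusive(const struct toy_shadow *t)
{
    bool linux_may_run = !t->linux_sleeping;
    bool xeno_may_run = !t->xeno_relaxed;

    return linux_may_run != xeno_may_run;
}
```

In this model a mode switch always flips both flags together, so
toy_exclusive() holds before and after every transition; the real code has to
achieve the same effect across two schedulers, which is why the hand-over
goes through the APC.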

IOW, what we want to do here is some kind of transition from the Xenomai to the
Linux scheduler for the relaxing shadow thread. This way, we make sure that the
Linux scheduler will get back in control of the awakened shadow thread, which
ends up running in secondary mode once it has been resumed by
wake_up_process().

The problem is that unless we actually reset the root thread priority to the
lowest one in lostage_handler, in order to revert the priority boost done in
xnshadow_relax, there is a short window of time during which a normal Linux
task that has been preempted by the APC request that runs lostage_handler
(e.g. your main() context) could run and wreck the scheduling sequence. The fix
is about downgrading the root thread priority and waking the relaxed shadow up
in the same move, so that the priority scheme is kept intact.
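
A toy model may help to see the window. It is a sketch, not the code from
nucleus/shadow.c: toy_sched, toy_relax_boost, toy_lostage and toy_root_wins
are hypothetical names, and the lowest priority is assumed to be 0 only
because that is what "close 1: 0" printed for the root thread in the demo.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOWEST_PRIO 0   /* assumed priority of the unboosted root thread */

/* Toy state; root_prio stands in for the root thread's current priority. */
struct toy_sched {
    int root_prio;
    bool shadow_woken;
};

/* Relaxing: boost root to the relaxing thread's priority, then suspend. */
static void toy_relax_boost(struct toy_sched *s, int thread_prio)
{
    s->root_prio = thread_prio;
    s->shadow_woken = false;    /* the wake-up is deferred to the APC handler */
}

/* APC handler: wake the shadow; with the fix, also drop the boost. */
static void toy_lostage(struct toy_sched *s, bool with_fix)
{
    if (with_fix)
        s->root_prio = TOY_LOWEST_PRIO;
    s->shadow_woken = true;
}

/* Would the root thread keep the CPU against a ready RT thread? */
static bool toy_root_wins(const struct toy_sched *s, int ready_rt_prio)
{
    return s->root_prio > ready_rt_prio;
}
```

With thread 1 at prio 99 relaxing and thread 2 at prio 98 ready to run, the
model without the reset leaves the root thread at 99, so whatever Linux task
happens to run there (main() in the demo) beats thread 2; with the reset, the
root thread drops back and thread 2 goes first.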

Now the question is: why does the root thread priority need to be upgraded while
relaxing a shadow? The answer is simple: when relaxing a shadow, you are not
expected to change tasks in a Xenomai or Linux sense, you are only changing the
Xenomai exec mode for a shadow, which means that we must ensure that giving
control back to the Linux kernel just for the purpose of changing the current
exec mode won't cause the current priority level of the relaxing thread to be
lost and spuriously downgraded to the lowest one of the system.

So we just boost the root thread priority to be equal to that of the relaxing
thread; this way, the Linux kernel code undergoes a Xenomai RT priority boost so
that Linux cannot be preempted by lower priority Xenomai threads. When a shadow
thread running in secondary mode is switched in, the root thread priority always
inherits the Xenomai priority level for that thread; conversely, when a
non-Xenomai/regular Linux task is scheduled in, the root thread priority is
downgraded to the lowest Xenomai priority.
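
The switch-in rule just described boils down to a single decision. The
following sketch states it in C; toy_task and toy_root_prio_for are
hypothetical names, not nucleus symbols, and the value 0 for the lowest
priority is an assumption taken from the "close 1: 0" output of the demo.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOWEST_PRIO 0   /* assumed lowest Xenomai priority */

/* Toy description of the Linux task being switched in. */
struct toy_task {
    bool is_shadow;     /* true for a Xenomai shadow in secondary mode */
    int xeno_prio;      /* only meaningful when is_shadow is true */
};

/* Priority the root thread takes when Linux switches `next` in. */
static int toy_root_prio_for(const struct toy_task *next)
{
    return next->is_shadow ? next->xeno_prio : TOY_LOWEST_PRIO;
}
```

A relaxed shadow at prio 98 would thus run with the root thread at 98, while
a regular Linux task would run with the root thread at the lowest priority.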

If one thinks a bit ahead now, having this scheme in place, we should be able to 
benefit from every improvement in the vanilla Linux kernel granularity toward 
real-time guarantees. Because we don't break the priority scheme moving in and 
out of the Linux domain, a Xenomai scheduling decision remains consistent with 
the Linux priority scheme, which is a necessary condition for providing a high 
integration level between Xeno and the vanilla kernel.

> I think this issue has some similarity with the one I once stumbled over
> regarding non-RT signalling to Linux. I'm not going to repeat my general
> concerns regarding the priority boosting of the root thread now... ;)
> 

Each time you spot a bug like this, your stack of concerns should lose at least
one element, shouldn't it? :o>

Until Linux is really able to provide a fine-grained, non-disruptive and not 
easily disrupted (e.g. locking semantics in drivers), and low-overhead core 
implementation for RT support, Xeno will need to provide its own primary 
scheduler for latency-critical duties, the seamless mode migration just 
described being there to guarantee a seamless integration.

The day Linux does provide this support, we will be able to focus on the abstract
RTOS core, RT interface skins, traditional RTOS emulators, and drivers, instead
of compensating for the current lack of determinism, rebasing Xeno's scheduling
support over native tasks and the native Linux scheduler. Oh, yeah... _That_
would be a great day. But for now, we still need two cooperating schedulers for
some time ahead, I think. Sigh...

> Jan
> 

-- 

Philippe.


* [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h
  2005-11-08 17:47     ` Philippe Gerum
@ 2005-11-14 12:45       ` Ignacio García Pérez
  2005-11-14 15:02         ` Philippe Gerum
  0 siblings, 1 reply; 10+ messages in thread
From: Ignacio García Pérez @ 2005-11-14 12:45 UTC (permalink / raw)
  To: xenomai-core

Hi,

The subject says it all.

Nacho.



* Re: [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h
  2005-11-14 12:45       ` [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h Ignacio García Pérez
@ 2005-11-14 15:02         ` Philippe Gerum
  2005-11-14 16:08           ` Ignacio García Pérez
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2005-11-14 15:02 UTC (permalink / raw)
  To: Ignacio García Pérez; +Cc: xenomai-core

Ignacio García Pérez wrote:
> Hi,
> 
> The subject says it all.
>

Fixed, thanks.

PS: please send patches when possible, it's faster for me to handle and less
likely to be forgotten in my job queue. TIA,

> Nacho.
> 


-- 

Philippe.



* Re: [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h
  2005-11-14 15:02         ` Philippe Gerum
@ 2005-11-14 16:08           ` Ignacio García Pérez
  2005-11-14 16:20             ` Philippe Gerum
  0 siblings, 1 reply; 10+ messages in thread
From: Ignacio García Pérez @ 2005-11-14 16:08 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 424 bytes --]

Philippe Gerum wrote:

> Ignacio García Pérez wrote:
>
>> Hi,
>>
>> The subject says it all.
>>
>
> Fixed, thanks.
>
> PS: please send patches when possible, it's faster for me to handle
> and less likely to be forgotten in my job queue. TIA,
>
I updated my source from the repository, and the
EXPORT_SYMBOL(rt_pipe_flush) in pipe.c is missing, so rt_pipe_flush is
not usable yet. Patch attached.

Nacho.

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 380 bytes --]

Index: skins/native/pipe.c
===================================================================
--- skins/native/pipe.c	(revision 143)
+++ skins/native/pipe.c	(working copy)
@@ -1050,5 +1050,6 @@
 EXPORT_SYMBOL(rt_pipe_read);
 EXPORT_SYMBOL(rt_pipe_write);
 EXPORT_SYMBOL(rt_pipe_stream);
+EXPORT_SYMBOL(rt_pipe_flush);
 EXPORT_SYMBOL(rt_pipe_alloc);
 EXPORT_SYMBOL(rt_pipe_free);


* Re: [Xenomai-core] [BUG] rt_pipe_flush declaration missing in skins/native/pipe.h
  2005-11-14 16:08           ` Ignacio García Pérez
@ 2005-11-14 16:20             ` Philippe Gerum
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2005-11-14 16:20 UTC (permalink / raw)
  To: Ignacio García Pérez; +Cc: xenomai-core

Ignacio García Pérez wrote:
> Philippe Gerum wrote:
> 
> 
>>Ignacio García Pérez wrote:
>>
>>
>>>Hi,
>>>
>>>The subject says it all.
>>>
>>
>>Fixed, thanks.
>>
>>PS: please send patches when possible, it's faster for me to handle
>>and less likely to be forgotten in my job queue. TIA,
>>
> 
> I updated my source from the repository, and the
> EXPORT_SYMBOL(rt_pipe_flush) in pipe.c is missing, so rt_pipe_flush is
> not usable yet. Patch attached.
> 

Applied, thanks.

> Nacho.
> 
> 
> ------------------------------------------------------------------------
> 
> Index: skins/native/pipe.c
> ===================================================================
> --- skins/native/pipe.c	(revision 143)
> +++ skins/native/pipe.c	(working copy)
> @@ -1050,5 +1050,6 @@
>  EXPORT_SYMBOL(rt_pipe_read);
>  EXPORT_SYMBOL(rt_pipe_write);
>  EXPORT_SYMBOL(rt_pipe_stream);
> +EXPORT_SYMBOL(rt_pipe_flush);
>  EXPORT_SYMBOL(rt_pipe_alloc);
>  EXPORT_SYMBOL(rt_pipe_free);


-- 

Philippe.

