Re: [Xenomai] Sleeping function called from invalid context


* Re: [Xenomai] Sleeping function called from invalid context
@ 2014-12-10 18:58 Stoidner, Christoph
  2014-12-10 19:01 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-10 18:58 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

   > That is strange, are these tasks
   > running with the SCHED_OTHER policy ?
   No, they are running with sched fifo.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-10 18:58 [Xenomai] Sleeping function called from invalid context Stoidner, Christoph
@ 2014-12-10 19:01 ` Gilles Chanteperdrix
  2014-12-11 10:00   ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-10 19:01 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 10, 2014 at 06:58:26PM +0000, Stoidner, Christoph wrote:
> > That is strange, are these tasks
> > running with the SCHED_OTHER policy ?
> No, they are running with sched fifo.

Ok, next, are they using
rt_timer_mode_set
or
RT drivers which return -ENOSYS in their rt handlers ?

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-10 19:01 ` Gilles Chanteperdrix
@ 2014-12-11 10:00   ` Stoidner, Christoph
  2014-12-11 10:05     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 10:00 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> > That is strange, are these tasks
>> > running with the SCHED_OTHER policy ?
>> No, they are running with sched fifo.
>
>Ok, next, are they using
>rt_timer_mode_set
>or
>RT drivers which return -ENOSYS in their rt handlers ?

Presumably you mean rt_timer_set_mode() rather than rt_timer_mode_set(). That function is not called, since we are using our own skin. Our skin does not use xntbase_switch() either. It may be relevant that our skin is configured with periodic timing (1 ms).

RT drivers are also not installed on our system. 

Regards,
Christoph


* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:00   ` Stoidner, Christoph
@ 2014-12-11 10:05     ` Gilles Chanteperdrix
  2014-12-11 10:18       ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 10:05 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 10:00:16AM +0000, Stoidner, Christoph wrote:
> 
> >> > That is strange, are these tasks
> >> > running with the SCHED_OTHER policy ?
> >> No, they are running with sched fifo.
> >
> >Ok, next, are they using
> >rt_timer_mode_set
> >or
> >RT drivers which return -ENOSYS in their rt handlers ?
> 
> Assumedly you mean rt_timer_set_mode() instead of rt_timer_mode_set(). This function is not called since we are using our own skin. Also our own skin does not use xntbase_switch(). Maybe it is interesting to know that our skin is configured with periodic timing (1ms). 
> 
> RT drivers are also not installed on our system. 

Maybe I am misreading nucleus/shadow.c, but I see only two reasons
for xnshadow_relax being called for a secondary mode syscall:

- a call requiring primary mode was made and the switchback flag is
set (which itself can have two reasons, either the fact that the
syscall uses it, the only one being currently rt_timer_set_mode, or
the fact that the caller runs with the SCHED_OTHER policy and does
not hold a mutex)

- a call with the adaptive flag was made, handled in primary mode,
but returned -ENOSYS, which means the call needs to be retried in
secondary mode.
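
As a rough illustration of the two cases above, here is a minimal user-space model in C. It is only a sketch of the decision as described in this message, not the code from nucleus/shadow.c: the SK_EXEC_* flag names, struct sk_caller and sk_relax_expected() are all hypothetical names made up for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; the real nucleus has its own
 * __xn_exec_* definitions, which are not reproduced here. */
enum {
    SK_EXEC_PRIMARY    = 1 << 0, /* call must run in primary mode */
    SK_EXEC_SWITCHBACK = 1 << 1, /* relax again once the call is done */
    SK_EXEC_ADAPTIVE   = 1 << 2  /* retry in secondary mode on -ENOSYS */
};

/* Hypothetical description of the calling thread. */
struct sk_caller {
    bool sched_other; /* caller runs with the SCHED_OTHER policy */
    bool holds_mutex; /* caller currently holds a mutex */
};

/* Returns true when, per the two cases listed above, a relax is expected. */
static bool sk_relax_expected(unsigned int flags, const struct sk_caller *c,
                              int handler_ret)
{
    /* Switchback is set either by the syscall itself or because the
     * caller is SCHED_OTHER and holds no mutex. */
    bool switchback = (flags & SK_EXEC_SWITCHBACK) ||
                      (c->sched_other && !c->holds_mutex);

    /* Case 1: primary-mode call with the switchback flag set. */
    if ((flags & SK_EXEC_PRIMARY) && switchback)
        return true;

    /* Case 2: adaptive call handled in primary mode returned -ENOSYS. */
    if ((flags & SK_EXEC_ADAPTIVE) && handler_ret == -ENOSYS)
        return true;

    return false;
}
```

With a SCHED_FIFO caller, no rt_timer_set_mode() and no RT drivers, this model only predicts a relax if the syscall itself carries the switchback flag.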

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:05     ` Gilles Chanteperdrix
@ 2014-12-11 10:18       ` Stoidner, Christoph
  2014-12-11 10:22         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 10:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>>
>> >> > That is strange, are these tasks
>> >> > running with the SCHED_OTHER policy ?
>> >> No, they are running with sched fifo.
>> >
>> >Ok, next, are they using
>> >rt_timer_mode_set
>> >or
>> >RT drivers which return -ENOSYS in their rt handlers ?
>>
>> Assumedly you mean rt_timer_set_mode() instead of rt_timer_mode_set(). This function is not called since we are using our own skin. Also our own skin does not use xntbase_switch(). Maybe it is interesting to know that our skin is configured with periodic timing (1ms).
>>
>> RT drivers are also not installed on our system.
>
> Maybe I look wrong at nucleus/shadow.c, but I see only two reasons
> for for xnshadow_relax being called for a secondary mode syscall:
>
> - a call requiring primary mode was made and the switchback flag is
> set (which itself can have two reasons, either the fact that the
> syscall uses it, the only one being currently rt_timer_set_mode, or
> the fact that the caller runs with the SCHED_OTHER policy and does
> not hold a mutex)
>
> - a call with the adaptive flag was made, handled in primary mode,
> but returned -ENOSYS, which means the call needs to be retried in
> secondary mode.
>

You are right. We have implemented some syscalls that require primary mode (since they could block) but should switch back to secondary mode afterwards. These syscalls use the attributes

    __xn_exec_primary | __xn_exec_switchback

Regards,
Christoph



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:18       ` Stoidner, Christoph
@ 2014-12-11 10:22         ` Gilles Chanteperdrix
  2014-12-11 10:29           ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 10:22 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 10:18:55AM +0000, Stoidner, Christoph wrote:
> >>
> >> >> > That is strange, are these tasks
> >> >> > running with the SCHED_OTHER policy ?
> >> >> No, they are running with sched fifo.
> >> >
> >> >Ok, next, are they using
> >> >rt_timer_mode_set
> >> >or
> >> >RT drivers which return -ENOSYS in their rt handlers ?
> >>
> >> Assumedly you mean rt_timer_set_mode() instead of rt_timer_mode_set(). This function is not called since we are using our own skin. Also our own skin does not use xntbase_switch(). Maybe it is interesting to know that our skin is configured with periodic timing (1ms).
> >>
> >> RT drivers are also not installed on our system.
> >
> > Maybe I look wrong at nucleus/shadow.c, but I see only two reasons
> > for for xnshadow_relax being called for a secondary mode syscall:
> >
> > - a call requiring primary mode was made and the switchback flag is
> > set (which itself can have two reasons, either the fact that the
> > syscall uses it, the only one being currently rt_timer_set_mode, or
> > the fact that the caller runs with the SCHED_OTHER policy and does
> > not hold a mutex)
> >
> > - a call with the adaptive flag was made, handled in primary mode,
> > but returned -ENOSYS, which means the call needs to be retried in
> > secondary mode.
> >
> 
> You are right. We have implemented some syscalls the require primary mode (since they could block) but should switch back to secondary. These syscalls are using attributes
> 
>     __xn_exec_primary | __xn_exec_switchback

Well then, the tasks showing a stack trace like the one you posted
are blocked while returning from such syscalls. Does this happen
often, or are these syscalls seldom used?

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:22         ` Gilles Chanteperdrix
@ 2014-12-11 10:29           ` Stoidner, Christoph
  2014-12-11 10:47             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 10:29 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> >> RT drivers are also not installed on our system.
>> >
>> > Maybe I look wrong at nucleus/shadow.c, but I see only two reasons
>> > for for xnshadow_relax being called for a secondary mode syscall:
>> >
>> > - a call requiring primary mode was made and the switchback flag is
>> > set (which itself can have two reasons, either the fact that the
>> > syscall uses it, the only one being currently rt_timer_set_mode, or
>> > the fact that the caller runs with the SCHED_OTHER policy and does
>> > not hold a mutex)
>> >
>> > - a call with the adaptive flag was made, handled in primary mode,
>> > but returned -ENOSYS, which means the call needs to be retried in
>> > secondary mode.
>> >
>>
>> You are right. We have implemented some syscalls the require primary mode (since they could block) but should switch back to secondary. These syscalls are using attributes
>>
>>     __xn_exec_primary | __xn_exec_switchback
>
> Well then the tasks blocked with a stack trace like what you showed
> are blocked when coming back from such syscalls. Does this happen
> often, or are these syscall seldom used?

In this application the switchback syscalls are used rather often. 

Regards,
Christoph


* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:29           ` Stoidner, Christoph
@ 2014-12-11 10:47             ` Gilles Chanteperdrix
  2014-12-11 11:17               ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 10:47 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 10:29:58AM +0000, Stoidner, Christoph wrote:
> >> >> RT drivers are also not installed on our system.
> >> >
> >> > Maybe I look wrong at nucleus/shadow.c, but I see only two reasons
> >> > for for xnshadow_relax being called for a secondary mode syscall:
> >> >
> >> > - a call requiring primary mode was made and the switchback flag is
> >> > set (which itself can have two reasons, either the fact that the
> >> > syscall uses it, the only one being currently rt_timer_set_mode, or
> >> > the fact that the caller runs with the SCHED_OTHER policy and does
> >> > not hold a mutex)
> >> >
> >> > - a call with the adaptive flag was made, handled in primary mode,
> >> > but returned -ENOSYS, which means the call needs to be retried in
> >> > secondary mode.
> >> >
> >>
> >> You are right. We have implemented some syscalls the require primary mode (since they could block) but should switch back to secondary. These syscalls are using attributes
> >>
> >>     __xn_exec_primary | __xn_exec_switchback
> >
> > Well then the tasks blocked with a stack trace like what you showed
> > are blocked when coming back from such syscalls. Does this happen
> > often, or are these syscall seldom used?
> 
> In this application the switchback syscalls are used rather often. 

Two things to note:

- mode switches come with some overhead. It is almost never a good
idea to force the switchback, since the nucleus will switch the
target thread to secondary mode when needed anyway. All that you
risk is introducing useless mode switches, so, if you do it, you
must have a good reason for it;

- there is a risk of overflowing the nucleus queue for relax
requests, you will see a message in the kernel logs when this
happens if you enable debugging of the nucleus
(CONFIG_XENO_OPT_DEBUG_NUCLEUS).
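
To make the overflow risk concrete, here is a minimal user-space sketch in C of a fixed-size request queue that counts overflows. It is an assumption-laden model, not the nucleus implementation: the slot count, struct sk_relax_queue, sk_post() and sk_consume() are hypothetical, and the sketch simply rejects a request when the queue is full, which may differ from what the nucleus does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary size for this sketch; it must be a power of two so that the
 * free-running unsigned counters index the array correctly after wrapping. */
#define SK_REQ_SLOTS 4

/* Hypothetical bounded queue of pending relax requests. */
struct sk_relax_queue {
    const void *task[SK_REQ_SLOTS];
    unsigned int in;        /* total number of requests posted */
    unsigned int out;       /* total number of requests consumed */
    unsigned int overflows; /* requests rejected because the queue was full */
};

/* Post a request; returns false and counts an overflow when full. */
static bool sk_post(struct sk_relax_queue *q, const void *task)
{
    if (q->in - q->out == SK_REQ_SLOTS) {
        q->overflows++;
        return false;
    }
    q->task[q->in++ % SK_REQ_SLOTS] = task;
    return true;
}

/* Take the oldest pending request, or NULL when the queue is empty. */
static const void *sk_consume(struct sk_relax_queue *q)
{
    if (q->in == q->out)
        return NULL;
    return q->task[q->out++ % SK_REQ_SLOTS];
}
```

If the consumer side stops running while requests keep being posted at a high rate, the overflow counter in such a model starts to climb, which is the situation the debug message is meant to reveal.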



-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 10:47             ` Gilles Chanteperdrix
@ 2014-12-11 11:17               ` Stoidner, Christoph
  2014-12-11 14:47                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 11:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>>
>> In this application the switchback syscalls are used rather often.
>
>Two things to note:
>
>- mode switches do not go without some overhead, it is almost never
>a good idea to force the switchback, since the nucleus will switch
>the target thread to secondary mode when needed anyway, all that you
>risk is to introduce useless mode switches, so, if you do it, you
>must have a good reason for it;

Yes, I know. However, we need to ensure that the task switches back to the secondary domain after using the skin's call, since otherwise it would affect other Linux tasks which actually have a higher Linux priority. In general we are trying to simulate something like "priority coupling", since that is marked as deprecated.

>- there is a risk of overflowing the nucleus queue for relax
>requests, you will see a message in the kernel logs when this
>happens if you enable debugging of the nucleus
>(CONFIG_XENO_OPT_DEBUG_NUCLEUS).

I have now enabled CONFIG_XENO_OPT_DEBUG_NUCLEUS, but I do not see any such message.

Regards,
Christoph



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 11:17               ` Stoidner, Christoph
@ 2014-12-11 14:47                 ` Gilles Chanteperdrix
  2014-12-11 15:47                   ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 14:47 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 11:17:42AM +0000, Stoidner, Christoph wrote:
> >>
> >> In this application the switchback syscalls are used rather often.
> >
> >Two things to note:
> >
> >- mode switches do not go without some overhead, it is almost never
> >a good idea to force the switchback, since the nucleus will switch
> >the target thread to secondary mode when needed anyway, all that you
> >risk is to introduce useless mode switches, so, if you do it, you
> >must have a good reason for it;
> 
> Yes I know. However we need to assure that the task switches back
> to secondary domain after using skin's call, since otherwise it
> would affect other linux tasks which actual have a higher
> linux-priority. In general we try to simulate something like
> "priority coupling" since this is marked as deprecated.

I understand, I guess this is the reason why xenomai-3 introduced
the XNWEAK scheduling class (which does the same thing by the way,
automatically switches back to secondary mode). I am not sure this
can compare to priority coupling though, since priority coupling
does the reverse: a task running in secondary mode may prevent other
tasks running in primary mode from running.

> 
> >- there is a risk of overflowing the nucleus queue for relax
> >requests, you will see a message in the kernel logs when this
> >happens if you enable debugging of the nucleus
> >(CONFIG_XENO_OPT_DEBUG_NUCLEUS).
> 
> I have enabled CONFIG_XENO_OPT_DEBUG_NUCLEUS now but I do not see
> any according message.

Ok, maybe we have some hope with the tracer though. What should
trigger a trace is the fact that a relax request has been sent, but
that the next Linux scheduling point does not wake up the said task.
This is all debug code, so it does not need to be clean. You can
define a per-cpu variable (if running on SMP systems, otherwise a
global variable will do) holding the last request posted. Then, when
a Linux scheduling event happens, test that the newly scheduled task
is the task that was passed to the relax; if that is not the case,
trigger a trace freeze. The point where the nucleus is informed of a
Linux task switch is do_schedule_event(). The trick is that if you
have some tasks with a higher priority than the relaxed task, it is
normal that the relaxed task is not scheduled immediately, so if you
want the condition to hold, you need to run the Xenomai tasks which
relax with the highest priority. Also, obviously, after the test,
the pointer should be reset to NULL, because there are several tasks
queued, and with a global variable you have no way of knowing which
is next.
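
The check described above could be modelled roughly as follows. This is a user-space sketch in C under stated assumptions, not kernel code: struct sk_task, sk_note_relax() and sk_check_switch() are hypothetical names, a plain global stands in for the per-cpu variable, and a boolean stands in for the actual trace freeze.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Linux task descriptor. */
struct sk_task {
    const char *name;
};

/* Last task passed to a relax request; a single global replaces the
 * per-cpu variable suggested for SMP systems. */
static const struct sk_task *sk_last_relaxed;

/* Set when the condition fires; stands in for freezing the tracer. */
static bool sk_trace_frozen;

/* To be called at the point where the relax request is posted. */
static void sk_note_relax(const struct sk_task *t)
{
    sk_last_relaxed = t;
}

/* To be called from the task-switch hook with the task about to run. */
static void sk_check_switch(const struct sk_task *next)
{
    /* A request is pending but a different task got scheduled. */
    if (sk_last_relaxed != NULL && next != sk_last_relaxed)
        sk_trace_frozen = true;

    /* Always reset: with one variable we cannot know which request
     * comes next when several tasks are queued. */
    sk_last_relaxed = NULL;
}
```

As noted above, the check is only meaningful when the relaxing tasks run with the highest priority, otherwise a mismatch is expected and harmless.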

-- 
					    Gilles.


* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 14:47                 ` Gilles Chanteperdrix
@ 2014-12-11 15:47                   ` Stoidner, Christoph
  2014-12-11 16:06                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 15:47 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


> Ok, maybe we have some hope with the tracer though. What should
> trigger a trace is the fact that a relax request has been sent, but
> that the next linux scheduling point does not wake up the said task.
> This is all debug code, so it does not need to be clean. You can
> define a per-cpu variable (if running on SMP systems, otherwise a
> global variable will do) with the last request posted. And when a
> linux scheduling happens, test that the newly scheduled task is the
> task that was passed to the relax, if that is not the case, trigger
> a trace freeze. The point where the nucleus is informed of a Linux
> task switch is do_schedule_event(). The trick is that if you have
> some tasks with a higher priority than the relaxed task, it is
> normal that the relaxed task is not scheduled immediately, so if you
> want the condition to hold, you need to run the xenomai tasks which
> relax with the highest priority. Also, obviously, after the test,
> the pointer should be reset to NULL, because there are several task
> queued, and with a global variable you have no way of knowing which
> is next.

I will do so. Do I have to change the gatekeeper's priority behind the relaxing
task to ensure that the gatekeeper would be scheduled before?

Regards,
Christoph



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 15:47                   ` Stoidner, Christoph
@ 2014-12-11 16:06                     ` Gilles Chanteperdrix
  2014-12-11 16:31                       ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 16:06 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 03:47:36PM +0000, Stoidner, Christoph wrote:
> 
> > Ok, maybe we have some hope with the tracer though. What should
> > trigger a trace is the fact that a relax request has been sent, but
> > that the next linux scheduling point does not wake up the said task.
> > This is all debug code, so it does not need to be clean. You can
> > define a per-cpu variable (if running on SMP systems, otherwise a
> > global variable will do) with the last request posted. And when a
> > linux scheduling happens, test that the newly scheduled task is the
> > task that was passed to the relax, if that is not the case, trigger
> > a trace freeze. The point where the nucleus is informed of a Linux
> > task switch is do_schedule_event(). The trick is that if you have
> > some tasks with a higher priority than the relaxed task, it is
> > normal that the relaxed task is not scheduled immediately, so if you
> > want the condition to hold, you need to run the xenomai tasks which
> > relax with the highest priority. Also, obviously, after the test,
> > the pointer should be reset to NULL, because there are several task
> > queued, and with a global variable you have no way of knowing which
> > is next.
> 
> I will do so. Do I have to change the gatekeeper's priority behind the relaxing
> task to assure that the gatekeeper would be scheduled before?

The gatekeeper is not involved.

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 16:06                     ` Gilles Chanteperdrix
@ 2014-12-11 16:31                       ` Stoidner, Christoph
  2014-12-11 16:38                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 16:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> > Ok, maybe we have some hope with the tracer though. What should
>> > trigger a trace is the fact that a relax request has been sent, but
>> > that the next linux scheduling point does not wake up the said task.
>> > This is all debug code, so it does not need to be clean. You can
>> > define a per-cpu variable (if running on SMP systems, otherwise a
>> > global variable will do) with the last request posted. And when a
>> > linux scheduling happens, test that the newly scheduled task is the
>> > task that was passed to the relax, if that is not the case, trigger
>> > a trace freeze. The point where the nucleus is informed of a Linux
>> > task switch is do_schedule_event(). The trick is that if you have
>> > some tasks with a higher priority than the relaxed task, it is
>> > normal that the relaxed task is not scheduled immediately, so if you
>> > want the condition to hold, you need to run the xenomai tasks which
>> > relax with the highest priority. Also, obviously, after the test,
>> > the pointer should be reset to NULL, because there are several task
>> > queued, and with a global variable you have no way of knowing which
>> > is next.
>>
>> I will do so. Do I have to change the gatekeeper's priority behind the relaxing
>> task to assure that the gatekeeper would be scheduled before?
>
> The gatekeeper is not involved.

Sorry, I meant to say: "that the gatekeeper would NOT be scheduled before?". Otherwise
we might see the gatekeeper instead of the relaxed task. Or am I wrong?

Regards,
Christoph


* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 16:31                       ` Stoidner, Christoph
@ 2014-12-11 16:38                         ` Gilles Chanteperdrix
  2014-12-11 19:23                           ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-11 16:38 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Thu, Dec 11, 2014 at 04:31:38PM +0000, Stoidner, Christoph wrote:
> >> > Ok, maybe we have some hope with the tracer though. What should
> >> > trigger a trace is the fact that a relax request has been sent, but
> >> > that the next linux scheduling point does not wake up the said task.
> >> > This is all debug code, so it does not need to be clean. You can
> >> > define a per-cpu variable (if running on SMP systems, otherwise a
> >> > global variable will do) with the last request posted. And when a
> >> > linux scheduling happens, test that the newly scheduled task is the
> >> > task that was passed to the relax, if that is not the case, trigger
> >> > a trace freeze. The point where the nucleus is informed of a Linux
> >> > task switch is do_schedule_event(). The trick is that if you have
> >> > some tasks with a higher priority than the relaxed task, it is
> >> > normal that the relaxed task is not scheduled immediately, so if you
> >> > want the condition to hold, you need to run the xenomai tasks which
> >> > relax with the highest priority. Also, obviously, after the test,
> >> > the pointer should be reset to NULL, because there are several task
> >> > queued, and with a global variable you have no way of knowing which
> >> > is next.
> >>
> >> I will do so. Do I have to change the gatekeeper's priority behind the relaxing
> >> task to assure that the gatekeeper would be scheduled before?
> >
> > The gatekeeper is not involved.
> 
> Sorry, I wanted to say: "that gatekeeper would NOT be scheduled before?". Otherwise
> we see possibly the gatekeeper instead of the relaxed task? Or am I wrong?

I meant to say the gatekeeper is not involved in the primary to
secondary mode switches, only in the secondary to primary mode
switches. But yes, since it runs with the highest priority, it may
be the one scheduled when back to primary mode. Though I would say
it is probably very unlikely, since the event activating the
gatekeeper is a secondary mode event, which by definition could not
have happened while another task was in primary mode. So what could
happen is that one task running in secondary mode tries to switch to
primary mode, at which point, before the gatekeeper is even
activated, another task running in primary mode is activated
(perhaps because it is waiting for an interrupt, be it a timer).

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 16:38                         ` Gilles Chanteperdrix
@ 2014-12-11 19:23                           ` Stoidner, Christoph
  2014-12-12 16:42                             ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-11 19:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> >> > Ok, maybe we have some hope with the tracer though. What should
>> >> > trigger a trace is the fact that a relax request has been sent, but
>> >> > that the next linux scheduling point does not wake up the said task.
>> >> > This is all debug code, so it does not need to be clean. You can
>> >> > define a per-cpu variable (if running on SMP systems, otherwise a
>> >> > global variable will do) with the last request posted. And when a
>> >> > linux scheduling happens, test that the newly scheduled task is the
>> >> > task that was passed to the relax, if that is not the case, trigger
>> >> > a trace freeze. The point where the nucleus is informed of a Linux
>> >> > task switch is do_schedule_event(). The trick is that if you have
>> >> > some tasks with a higher priority than the relaxed task, it is
>> >> > normal that the relaxed task is not scheduled immediately, so if you
>> >> > want the condition to hold, you need to run the xenomai tasks which
>> >> > relax with the highest priority. Also, obviously, after the test,
>> >> > the pointer should be reset to NULL, because there are several task
>> >> > queued, and with a global variable you have no way of knowing which
>> >> > is next.
>> >>
>> >> I will do so. Do I have to change the gatekeeper's priority behind the relaxing
>> >> task to assure that the gatekeeper would be scheduled before?
>> >
>> > The gatekeeper is not involved.
>>
>> Sorry, I wanted to say: "that gatekeeper would NOT be scheduled before?". Otherwise
>> we see possibly the gatekeeper instead of the relaxed task? Or am I wrong?
>
> I meant to say the gatekeeper is not involved in the primary to
> secondary mode switches, only in the secondary to primary mode
> switches. But yes, since it runs with the highest priority, it may
> be the one scheduled when back to primary mode. Though I would say
> it is probably very unlikely, since the events activating the
> gatekeeper is a secondary mode event, which by definition could not
> have happened while another task was in primary mode. So what could
> happen is that one task running in secondary mode tries to switch to
> primary mode, at which point, before the gatekeeper is even
> activated, another task running in primary mode is activated
> (perhaps because it is waiting for an interrupt, may it be atimer).

I have added debug code as you described. Unfortunately, I am not sure
whether I catch the correct point in time to trigger the trace. The problem is that
besides the gatekeeper there are also some kernel interrupt tasks that have a
higher priority.

Because of that, and to understand what is happening here, I have added
some other debug code. My goal was to verify whether all requests
posted by schedule_linux_call() are processed by lostage_handler(). This is
implemented as follows:

1) In schedule_linux_call() I check whether it is the request for my highest-priority
task. If so, I keep the task's pointer in a global variable.

2) In lostage_handler(), before the wake_up_process() call, I check whether the global
pointer is equal to the current request. If so, it is the wakeup for my
high-priority task. In that case I record that by resetting the global
pointer to NULL.

3) In do_schedule_event() I print a printk() message when the
global pointer is not NULL.

What I would expect now is to see no message from do_schedule_event(),
since lostage_handler() should be called before do_schedule_event().
However, after some run time I do see the printk() message from do_schedule_event().
After that, all tasks are frozen.

Is my assumption above wrong, or is the printk() message
the indication that the problem has occurred?

For better understanding, find my debug changes below:

*** tmp/xenomai-2.6.3/ksrc/nucleus/shadow.c     2013-08-20 13:14:38.000000000 +0200
--- xenomai-2.6.3/ksrc/nucleus/shadow.c 2014-12-11 20:03:22.885493756 +0100
***************
*** 752,757 ****
--- 752,758 ----
        }
  }
  
+ static volatile struct task_struct *__last_requested_task = NULL;
  static void lostage_handler(void *cookie)
  {
        int cpu, reqnum, type, arg, sig, sigarg;
***************
*** 795,800 ****
--- 796,803 ----
  
                        /* fall through */
                case LO_START_REQ:
+                       if (__last_requested_task && p==__last_requested_task)
+                               __last_requested_task = NULL;
                        wake_up_process(p);
                        break;
  
***************
*** 843,848 ****
--- 846,854 ----
        rq->req[reqnum].task = p;
        rq->req[reqnum].arg = arg;
  
+     if (strcmp(p->comm, "LApp") == 0)
+               __last_requested_task = p;
+ 
        __rthal_apc_schedule(lostage_apc);
  
        splexit(s);
***************
*** 2606,2611 ****
--- 2612,2623 ----
        if (!xnpod_active_p())
                return;
  
+       if (__last_requested_task)
+       {
+               printk("===================== lostage_handler() was not called before!\n");
+               __last_requested_task = NULL;
+       }
+ 
        prev_task = current;
        next = xnshadow_thread(next_task);
        set_switch_lock_owner(prev_task);


Regards,
Christoph






* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-11 19:23                           ` Stoidner, Christoph
@ 2014-12-12 16:42                             ` Stoidner, Christoph
  2014-12-15 11:42                               ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-12 16:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

Hi Gilles,

it seems to me that APC interrupts on ipipe are getting lost. I have added two counters: one is incremented in schedule_linux_call() when a request for a specific application task is queued. The other is incremented in lostage_handler() when that specific task is woken up. When I print the counter values after the tasks have frozen, the lostage_handler() counter is exactly one less than the schedule_linux_call() counter.
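
The two-counter check can be sketched in user-space C as below. This is only an illustration of the idea, not the instrumentation actually added to shadow.c: the names sk_posted, sk_handled and the helper functions are hypothetical, and C11 atomics are used here where kernel code would use its own atomic primitives.

```c
#include <stdatomic.h>

/* Hypothetical counters: one for requests queued, one for wakeups done. */
static atomic_ulong sk_posted;  /* bumped where the request is queued */
static atomic_ulong sk_handled; /* bumped where the task is woken up */

/* Call when a request for the watched task is queued. */
static void sk_count_posted(void)
{
    atomic_fetch_add(&sk_posted, 1);
}

/* Call when the watched task is actually woken up. */
static void sk_count_handled(void)
{
    atomic_fetch_add(&sk_handled, 1);
}

/* Number of requests queued but not yet handled. */
static unsigned long sk_outstanding(void)
{
    return atomic_load(&sk_posted) - atomic_load(&sk_handled);
}
```

A value that stays at one after the tasks have stopped making progress matches the observation above: one request was queued but its handler never ran.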

And then, when I wake up the APC thread manually, all frozen tasks continue for a moment, until they all freeze again.

Do you have any idea what could lead to that behaviour?

Regards,
Christoph


* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-12 16:42                             ` Stoidner, Christoph
@ 2014-12-15 11:42                               ` Stoidner, Christoph
  2014-12-15 13:23                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-15 11:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


> it seems to me as APC interrupts on ipipe got lost. I have added two counters: 
> One increments in schedule_linux_call() when a request for a specific application 
> task is queued. Another one is incremented in lostage_handler() when the 
> specific task was waked up. When I output the counter's values after tasks have
> freezed the counter of lostage_handler is exactly one value less than 
> schedule_linux_call's counter.
>
> And then, when I wake-up APC thread manually all freezed tasks continue for a 
> moment, until all are freezing again.

As said above, waking the APC thread using

  wake_up_process(rthal_apc_servers[smp_processor_id()]);

makes all Xenomai threads schedulable again. In contrast, using

  rthal_apc_schedule(lostage_apc);

does not help. So it seems that the APC interrupt is pending but never gets executed.

Any idea?

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 11:42                               ` Stoidner, Christoph
@ 2014-12-15 13:23                                 ` Gilles Chanteperdrix
  2014-12-15 13:29                                   ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-15 13:23 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Mon, Dec 15, 2014 at 11:42:40AM +0000, Stoidner, Christoph wrote:
> 
> > it seems to me as APC interrupts on ipipe got lost. I have added two counters: 
> > One increments in schedule_linux_call() when a request for a specific application 
> > task is queued. Another one is incremented in lostage_handler() when the 
> > specific task was waked up. When I output the counter's values after tasks have
> > freezed the counter of lostage_handler is exactly one value less than 
> > schedule_linux_call's counter.
> >
> > And then, when I wake-up APC thread manually all freezed tasks continue for a 
> > moment, until all are freezing again.
> 
> As said above waking APC thread using
> 
>   wake_up_process(rthal_apc_servers[smp_processor_id()]);
> 
> does reactivate all Xenomai threads for scheduling. Incontrast using
> 
>   rthal_apc_schedule(lostage_apc);
> 
> does not help. So it seems as APC interrupt is pending but won't be  executed.
> 
> Any idea?

Do you have the same issue if you run a kernel patched with Xenomai
and without the PREEMPT_RT patch?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 13:23                                 ` Gilles Chanteperdrix
@ 2014-12-15 13:29                                   ` Stoidner, Christoph
  2014-12-15 14:20                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-15 13:29 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> > it seems to me as APC interrupts on ipipe got lost. I have added two counters:
>> > One increments in schedule_linux_call() when a request for a specific application
>> > task is queued. Another one is incremented in lostage_handler() when the
>> > specific task was waked up. When I output the counter's values after tasks have
>> > freezed the counter of lostage_handler is exactly one value less than
>> > schedule_linux_call's counter.
>> >
>> > And then, when I wake-up APC thread manually all freezed tasks continue for a
>> > moment, until all are freezing again.
>>
>> As said above waking APC thread using
>>
>>   wake_up_process(rthal_apc_servers[smp_processor_id()]);
>>
>> does reactivate all Xenomai threads for scheduling. Incontrast using
>>
>>   rthal_apc_schedule(lostage_apc);
>>
>> does not help. So it seems as APC interrupt is pending but won't be  executed.
>>
>> Any idea?
>
> Do you have the same issue if you run a kernel patched with Xenomai
> and without the PREEMPT_RT patch?

No, without PREEMPT_RT the problem has not occurred so far.

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 13:29                                   ` Stoidner, Christoph
@ 2014-12-15 14:20                                     ` Gilles Chanteperdrix
  2014-12-15 15:11                                       ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-15 14:20 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Mon, Dec 15, 2014 at 01:29:19PM +0000, Stoidner, Christoph wrote:
> >> > it seems to me as APC interrupts on ipipe got lost. I have added two counters:
> >> > One increments in schedule_linux_call() when a request for a specific application
> >> > task is queued. Another one is incremented in lostage_handler() when the
> >> > specific task was waked up. When I output the counter's values after tasks have
> >> > freezed the counter of lostage_handler is exactly one value less than
> >> > schedule_linux_call's counter.
> >> >
> >> > And then, when I wake-up APC thread manually all freezed tasks continue for a
> >> > moment, until all are freezing again.
> >>
> >> As said above waking APC thread using
> >>
> >>   wake_up_process(rthal_apc_servers[smp_processor_id()]);
> >>
> >> does reactivate all Xenomai threads for scheduling. Incontrast using
> >>
> >>   rthal_apc_schedule(lostage_apc);
> >>
> >> does not help. So it seems as APC interrupt is pending but won't be  executed.
> >>
> >> Any idea?
> >
> > Do you have the same issue if you run a kernel patched with Xenomai
> > and without the PREEMPT_RT patch?
> 
> No, without PREEMPT_RT the problem did not yet occur.

Have you tried to follow the path from rthal_apc_schedule to
rthal_apc_handler to see where the notification gets lost? 

Also note that calling wake_up_process from primary mode is not a
good idea.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 14:20                                     ` Gilles Chanteperdrix
@ 2014-12-15 15:11                                       ` Stoidner, Christoph
  2014-12-15 15:19                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-15 15:11 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


>> >> > it seems to me as APC interrupts on ipipe got lost. I have added two counters:
>> >> > One increments in schedule_linux_call() when a request for a specific application
>> >> > task is queued. Another one is incremented in lostage_handler() when the
>> >> > specific task was waked up. When I output the counter's values after tasks have
>> >> > freezed the counter of lostage_handler is exactly one value less than
>> >> > schedule_linux_call's counter.
>> >> >
>> >> > And then, when I wake-up APC thread manually all freezed tasks continue for a
>> >> > moment, until all are freezing again.
>> >>
>> >> As said above waking APC thread using
>> >>
>> >>   wake_up_process(rthal_apc_servers[smp_processor_id()]);
>> >>
>> >> does reactivate all Xenomai threads for scheduling. Incontrast using
>> >>
>> >>   rthal_apc_schedule(lostage_apc);
>> >>
>> >> does not help. So it seems as APC interrupt is pending but won't be  executed.
>> >>
>> >> Any idea?
>> >
>> > Do you have the same issue if you run a kernel patched with Xenomai
>> > and without the PREEMPT_RT patch?
>>
>> No, without PREEMPT_RT the problem did not yet occur.
>
> Have you tried to follow the path from rthal_apc_schedule to
> rthal_apc_handler to see where the notification gets lost?

I am about to do that. Unfortunately I am not very familiar with the ipipe
implementation. Do you know of any tracing or debug
features that could help me here?

> Also note that calling wake_up_process from primary mode is not a
> good idea.

Thanks for the hint. Actually, I had used wake_up_process() only from the
secondary domain, to test whether the APC thread was broken or not.

Thanks and regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 15:11                                       ` Stoidner, Christoph
@ 2014-12-15 15:19                                         ` Gilles Chanteperdrix
  2014-12-17 12:24                                           ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-15 15:19 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Mon, Dec 15, 2014 at 03:11:45PM +0000, Stoidner, Christoph wrote:
> 
> >> >> > it seems to me as APC interrupts on ipipe got lost. I have added two counters:
> >> >> > One increments in schedule_linux_call() when a request for a specific application
> >> >> > task is queued. Another one is incremented in lostage_handler() when the
> >> >> > specific task was waked up. When I output the counter's values after tasks have
> >> >> > freezed the counter of lostage_handler is exactly one value less than
> >> >> > schedule_linux_call's counter.
> >> >> >
> >> >> > And then, when I wake-up APC thread manually all freezed tasks continue for a
> >> >> > moment, until all are freezing again.
> >> >>
> >> >> As said above waking APC thread using
> >> >>
> >> >>   wake_up_process(rthal_apc_servers[smp_processor_id()]);
> >> >>
> >> >> does reactivate all Xenomai threads for scheduling. Incontrast using
> >> >>
> >> >>   rthal_apc_schedule(lostage_apc);
> >> >>
> >> >> does not help. So it seems as APC interrupt is pending but won't be  executed.
> >> >>
> >> >> Any idea?
> >> >
> >> > Do you have the same issue if you run a kernel patched with Xenomai
> >> > and without the PREEMPT_RT patch?
> >>
> >> No, without PREEMPT_RT the problem did not yet occur.
> >
> > Have you tried to follow the path from rthal_apc_schedule to
> > rthal_apc_handler to see where the notification gets lost?
> 
> I am about to do that. Unfortunately I am not very familiar with ipipe
> implementation. Do you know if there are some tracing or debug 
> features that could help me here?

The APC code mostly happens in ksrc/arch/generic/hal.c and the
PREEMPT_RT path is distinct from the non PREEMPT_RT one, so, I would
tend to believe that the problem is in that code. As for tracing,
same as usual, you can use the I-pipe tracer, injecting some calls
to ipipe_trace_special() perhaps, to save some values in the trace.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-15 15:19                                         ` Gilles Chanteperdrix
@ 2014-12-17 12:24                                           ` Stoidner, Christoph
  2014-12-17 12:38                                             ` Gilles Chanteperdrix
  2014-12-17 13:22                                             ` Gilles Chanteperdrix
  0 siblings, 2 replies; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-17 12:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


After some research I have ended up in the APC thread handling (see the code snippet below, from ksrc/arch/generic/hal.c). From my point of view the "lost wakeup problem" could occur there. In detail: rthal_apc_kicker() calls wake_up_process() after rthal_apc_thread() has returned from rthal_apc_handler() but before it has called set_current_state(). Then, once the kicker has finished, the APC thread calls set_current_state() and goes to sleep. Thus the wakeup is lost. Or am I overlooking something? Maybe we should use a wait queue here?


static int rthal_apc_thread(void *data)
{
    unsigned cpu = (unsigned)(unsigned long)data;

    set_cpus_allowed(current, cpumask_of_cpu(cpu));
    sigfillset(&current->blocked);
    current->flags |= PF_NOFREEZE;
    /* Use highest priority here, since some apc handlers might
       require to run as soon as possible after the request has been
       pended. */
    rthal_setsched_root(current, SCHED_FIFO, MAX_RT_PRIO - 1);

    while (!kthread_should_stop()) {
        set_current_state(TASK_INTERRUPTIBLE);
        schedule();
        rthal_apc_handler(0, NULL);
    }

    __set_current_state(TASK_RUNNING);

    return 0;
}

void rthal_apc_kicker(unsigned virq, void *cookie)
{
    wake_up_process(rthal_apc_servers[smp_processor_id()]);
}

Regards,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-17 12:24                                           ` Stoidner, Christoph
@ 2014-12-17 12:38                                             ` Gilles Chanteperdrix
  2014-12-17 13:22                                             ` Gilles Chanteperdrix
  1 sibling, 0 replies; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-17 12:38 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 17, 2014 at 12:24:23PM +0000, Stoidner, Christoph wrote:
> 
> After some research I have ended up now in APC'c thread handling
> (see code snipped below, out of ksrc/arch/generic/hal.c). From my
> point of view there could be occur the "lost wakeup problem". That
> means in detail that rthal_kicker() calls wakeup when
> rthal_apc_thread() has returned from rthal_apc_handler() but not
> yet called set_current_state(). After that, when kicker has
> finished, the APC thread calls set_current_state() and goes to
> sleep. Thus, the wakeup is lost. Or do I overlook something? Maybe
> we should use a waitqueue here?

Indeed, this looks suspicious. Please try the wait queue.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-17 12:24                                           ` Stoidner, Christoph
  2014-12-17 12:38                                             ` Gilles Chanteperdrix
@ 2014-12-17 13:22                                             ` Gilles Chanteperdrix
  2014-12-17 15:46                                               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-17 13:22 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 17, 2014 at 12:24:23PM +0000, Stoidner, Christoph wrote:
> 
> After some research I have ended up now in APC'c thread handling (see code snipped below, out of ksrc/arch/generic/hal.c). From my point of view there could be occur the "lost wakeup problem". That means in detail that rthal_kicker() calls wakeup when rthal_apc_thread() has returned from rthal_apc_handler() but not yet called set_current_state(). After that, when kicker has finished, the APC thread calls set_current_state() and goes to sleep. Thus, the wakeup is lost. Or do I overlook something?  Maybe we should use a waitqueue here?
> 
> 
> static int rthal_apc_thread(void *data)
> {
>     unsigned cpu = (unsigned)(unsigned long)data;
> 
>     set_cpus_allowed(current, cpumask_of_cpu(cpu));
>     sigfillset(&current->blocked);
>     current->flags |= PF_NOFREEZE;
>     /* Use highest priority here, since some apc handlers might
>        require to run as soon as possible after the request has been
>        pended. */
>     rthal_setsched_root(current, SCHED_FIFO, MAX_RT_PRIO - 1);
> 
>     while (!kthread_should_stop()) {
>         set_current_state(TASK_INTERRUPTIBLE);
>         schedule();

You can obtain the same effect as with a wait queue by replacing the
schedule() above with:

	  if (rthal_apc_pending[cpu] == 0)
	     schedule();

However, using a wait queue will make the code easier to read.

>         rthal_apc_handler(0, NULL);
>     }
> 
>     __set_current_state(TASK_RUNNING);
> 
>     return 0;
> }
> 
> void rthal_apc_kicker(unsigned virq, void *cookie)
> {
>     wake_up_process(rthal_apc_servers[smp_processor_id()]);
> }

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-17 13:22                                             ` Gilles Chanteperdrix
@ 2014-12-17 15:46                                               ` Gilles Chanteperdrix
  2014-12-17 22:40                                                 ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-17 15:46 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 17, 2014 at 02:22:35PM +0100, Gilles Chanteperdrix wrote:
> On Wed, Dec 17, 2014 at 12:24:23PM +0000, Stoidner, Christoph wrote:
> > 
> > After some research I have ended up now in APC'c thread handling (see code snipped below, out of ksrc/arch/generic/hal.c). From my point of view there could be occur the "lost wakeup problem". That means in detail that rthal_kicker() calls wakeup when rthal_apc_thread() has returned from rthal_apc_handler() but not yet called set_current_state(). After that, when kicker has finished, the APC thread calls set_current_state() and goes to sleep. Thus, the wakeup is lost. Or do I overlook something?  Maybe we should use a waitqueue here?
> > 
> > 
> > static int rthal_apc_thread(void *data)
> > {
> >     unsigned cpu = (unsigned)(unsigned long)data;
> > 
> >     set_cpus_allowed(current, cpumask_of_cpu(cpu));
> >     sigfillset(&current->blocked);
> >     current->flags |= PF_NOFREEZE;
> >     /* Use highest priority here, since some apc handlers might
> >        require to run as soon as possible after the request has been
> >        pended. */
> >     rthal_setsched_root(current, SCHED_FIFO, MAX_RT_PRIO - 1);
> > 
> >     while (!kthread_should_stop()) {
> >         set_current_state(TASK_INTERRUPTIBLE);
> >         schedule();
> 
> You can obtain the same effect as with a wait queue by replacing the
> schedule() above with:
> 
> 	  if (rthal_apc_pending[cpu] == 0)
> 	     schedule();
	  else
	     set_current_state(TASK_RUNNING);
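
For illustration, the race and the effect of the pending check can be modelled deterministically in userspace C. This is only a sketch under stated assumptions: the model_* names and struct apc_model are hypothetical, the task state and rthal_apc_pending[cpu] are reduced to plain fields, and the racing kick is placed by hand just before set_current_state() instead of arriving asynchronously.

```c
#include <stdbool.h>

enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct apc_model {
    enum model_state state; /* stands in for current->state */
    int pending;            /* stands in for rthal_apc_pending[cpu] */
    bool asleep;            /* set when schedule() blocks the thread */
};

/* Models an APC request plus the kicker: mark work pending, wake the thread. */
static void model_kick(struct apc_model *m)
{
    m->pending = 1;
    m->state = MODEL_RUNNING; /* no visible effect if already running */
    m->asleep = false;
}

/* Models schedule(): the thread blocks only if it is not runnable. */
static void model_schedule(struct apc_model *m)
{
    if (m->state != MODEL_RUNNING)
        m->asleep = true;
}

/*
 * One loop iteration in which the kick lands just before
 * set_current_state(). Returns true if the thread ends up asleep
 * while work is still pending, i.e. the wakeup was lost.
 */
static bool wakeup_lost(bool check_pending)
{
    struct apc_model m = { MODEL_RUNNING, 0, false };

    model_kick(&m);                    /* the racing wakeup */
    m.state = MODEL_INTERRUPTIBLE;     /* set_current_state() */
    if (!check_pending || m.pending == 0)
        model_schedule(&m);            /* original loop: always schedule() */
    else
        m.state = MODEL_RUNNING;       /* suggested fix: skip the sleep */

    if (!m.asleep)
        m.pending = 0;                 /* the handler drains the work */

    return m.asleep && m.pending != 0;
}
```

In this model wakeup_lost(false) reports the lost wakeup of the original loop, while wakeup_lost(true) does not, because the pending flag is tested after the state has been set.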


-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-17 15:46                                               ` Gilles Chanteperdrix
@ 2014-12-17 22:40                                                 ` Stoidner, Christoph
  0 siblings, 0 replies; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-17 22:40 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

> > After some research I have ended up now in APC'c thread handling 
>> > (see code snipped below, out of ksrc/arch/generic/hal.c). From my 
>> > point of view there could be occur the "lost wakeup problem". That 
>> > means in detail that rthal_kicker() calls wakeup when rthal_apc_thread() 
>> > has returned from rthal_apc_handler() but not yet called set_current_state(). 
>> > After that, when kicker has finished, the APC thread calls set_current_state() 
>> > and goes to sleep. Thus, the wakeup is lost. Or do I overlook something?  
>> > Maybe we should use a waitqueue here?
>> >
>> >
>> > static int rthal_apc_thread(void *data)
>> > {
>> >     unsigned cpu = (unsigned)(unsigned long)data;
>> >
>> >     set_cpus_allowed(current, cpumask_of_cpu(cpu));
>> >     sigfillset(&current->blocked);
>> >     current->flags |= PF_NOFREEZE;
>> >     /* Use highest priority here, since some apc handlers might
>> >        require to run as soon as possible after the request has been
>> >        pended. */
>> >     rthal_setsched_root(current, SCHED_FIFO, MAX_RT_PRIO - 1);
>> >
>> >     while (!kthread_should_stop()) {
>> >         set_current_state(TASK_INTERRUPTIBLE);
>> >         schedule();
>>
>> You can obtain the same effect as with a wait queue by replacing the
>> schedule() above with:
>>
>>         if (rthal_apc_pending[cpu] == 0)
>>            schedule();
>         else
>             set_current_state(TASK_RUNNING);

Thanks for the hint, I like your approach. With it we do not need to allocate a
wait queue. Moreover, the task won't be scheduled for every wakeup call,
but only as often as required. I will check whether my application now runs stably with
that fix. If so, I will publish the patch here.

Regards,
Christoph
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [Xenomai] Sleeping function called from invalid context
@ 2014-12-06 14:19 Stoidner, Christoph
  2014-12-06 14:25 ` Gilles Chanteperdrix
  2014-12-07 12:40 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-06 14:19 UTC (permalink / raw)
  To: xenomai@xenomai.org


Hi all,

I am using Linux 3.10.18 and ipipe-core-3.10.18-arm-1 on a Freescale i.MX28. I have also merged PREEMPT_RT (rt14) into the kernel. FCSE is disabled.

I have enabled several debug options in the kernel. When a started program exits I get the message below:

[   33.187104] BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
[   33.187125] in_atomic(): 1, irqs_disabled(): 128, pid: 87, name: 
[   33.187138] 1 lock held by /87:
[   33.187213]  #0:  (rcu_read_lock){......}, at: [<c002add8>] __lock_task_sighand+0x24/0xf4
[   33.187232] Preemption disabled at:[<  (null)>]   (null)
[   33.187237] 
[   33.187262] CPU: 0 PID: 87 Comm:  Not tainted 3.10.18-rt14-arvero-rev01-ipipe #2
[   33.187337] [<c0013110>] (unwind_backtrace+0x0/0xf0) from [<c001158c>] (show_stack+0x10/0x14)
[   33.187398] [<c001158c>] (show_stack+0x10/0x14) from [<c04a0b70>] (rt_spin_lock+0x20/0x64)
[   33.187444] [<c04a0b70>] (rt_spin_lock+0x20/0x64) from [<c002ae28>] (__lock_task_sighand+0x74/0xf4)
[   33.187479] [<c002ae28>] (__lock_task_sighand+0x74/0xf4) from [<c002aecc>] (do_send_sig_info+0x24/0x64)
[   33.187522] [<c002aecc>] (do_send_sig_info+0x24/0x64) from [<c00b0878>] (lostage_handler+0xf8/0x128)
[   33.187568] [<c00b0878>] (lostage_handler+0xf8/0x128) from [<c0073860>] (rthal_apc_handler+0x60/0x84)
[   33.187615] [<c0073860>] (rthal_apc_handler+0x60/0x84) from [<c0066a58>] (__ipipe_do_sync_stage+0x1f8/0x288)
[   33.187655] [<c0066a58>] (__ipipe_do_sync_stage+0x1f8/0x288) from [<c00145c0>] (__ipipe_syscall_root+0xf4/0x13c)
[   33.187694] [<c00145c0>] (__ipipe_syscall_root+0xf4/0x13c) from [<c000ea34>] (vector_swi+0x54/0x74)

It seems to me that I have done something wrong when merging the spinlocks of PREEMPT_RT and ipipe. Is rt_spin_lock() not allowed to be called in that context? Does anyone have an idea why?

Regards,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-06 14:19 Stoidner, Christoph
@ 2014-12-06 14:25 ` Gilles Chanteperdrix
  2014-12-06 15:11   ` Stoidner, Christoph
  2014-12-07 12:40 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-06 14:25 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Sat, Dec 06, 2014 at 02:19:27PM +0000, Stoidner, Christoph wrote:
> 
> Hi at all,
> 
> I am using linux 3.10.18 and ipipe-core-3.10.18-arm-1 on a Freescale i.MX28. I have also merged PREEMPT RT rt14 into the kernel. FCSE is disabled.

If you have lockdep enabled, you should disable it, as it currently
does not work with xenomai on arm.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-06 14:25 ` Gilles Chanteperdrix
@ 2014-12-06 15:11   ` Stoidner, Christoph
  2014-12-07 12:32     ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-06 15:11 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

Hi Gilles,

> If you have lockdep enabled, you should disable it, as it currently
> does not work with xenomai on arm.

lockdep is disabled in my kernel config.

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-06 15:11   ` Stoidner, Christoph
@ 2014-12-07 12:32     ` Stoidner, Christoph
  0 siblings, 0 replies; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-07 12:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> If you have lockdep enabled, you should disable it, as it currently
>> does not work with xenomai on arm.

> lockdep is disabled in my kernel config.

From my understanding, the output

[   33.187125] in_atomic(): 1, irqs_disabled(): 128, pid: 87, name: 
[   33.187138] 1 lock held by /87:

tells me that preempt_count was non-zero and IRQs were disabled. In that case sleeping is not allowed. Am I right here?

This bug detection is enabled by the kernel config option DEBUG_ATOMIC_SLEEP. Do you think this debug option has the same problem as lockdep, and that the detected bug is not really a bug?
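
As a rough sketch of that reading: the check behind this message can be modelled as "complain if either value is non-zero". This is a simplified userspace model only; the function name is hypothetical, and the real kernel check also takes other state into account that is ignored here.

```c
#include <stdbool.h>

/*
 * Simplified model of the atomic-sleep debug check: a sleeping
 * function must not be entered while in atomic context or while
 * interrupts are disabled. The two arguments correspond to the
 * in_atomic() and irqs_disabled() values printed in the log line.
 */
static bool atomic_sleep_check_fires(int in_atomic, unsigned long irqs_disabled)
{
    return in_atomic != 0 || irqs_disabled != 0;
}
```

With the values from the log (in_atomic(): 1, irqs_disabled(): 128) this model fires, which matches the reported BUG line.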

Actually, I am hunting a bug that crashes my application after some hours and leads to a complete system freeze. Unfortunately no message appears when the crash happens. That is why I am using the debug options above. Do you think these problems could be related? Or do you have another idea for how to figure out what is happening there?

Regards and thanks in advance,
Christoph 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-06 14:19 Stoidner, Christoph
  2014-12-06 14:25 ` Gilles Chanteperdrix
@ 2014-12-07 12:40 ` Gilles Chanteperdrix
  2014-12-07 13:50   ` Stoidner, Christoph
  1 sibling, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-07 12:40 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Sat, Dec 06, 2014 at 02:19:27PM +0000, Stoidner, Christoph wrote:
> 
> Hi at all,
> 
> I am using linux 3.10.18 and ipipe-core-3.10.18-arm-1 on a Freescale i.MX28. I have also merged PREEMPT RT rt14 into the kernel. FCSE is disabled.
> 
> I have enabled several debug options in the kernel. When a started program exits I get the message below:
> 
> [   33.187104] BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> [   33.187125] in_atomic(): 1, irqs_disabled(): 128, pid: 87, name: 
> [   33.187138] 1 lock held by /87:
> [   33.187213]  #0:  (rcu_read_lock){......}, at: [<c002add8>] __lock_task_sighand+0x24/0xf4
> [   33.187232] Preemption disabled at:[<  (null)>]   (null)
> [   33.187237] 
> [   33.187262] CPU: 0 PID: 87 Comm:  Not tainted 3.10.18-rt14-arvero-rev01-ipipe #2
> [   33.187337] [<c0013110>] (unwind_backtrace+0x0/0xf0) from [<c001158c>] (show_stack+0x10/0x14)
> [   33.187398] [<c001158c>] (show_stack+0x10/0x14) from [<c04a0b70>] (rt_spin_lock+0x20/0x64)
> [   33.187444] [<c04a0b70>] (rt_spin_lock+0x20/0x64) from [<c002ae28>] (__lock_task_sighand+0x74/0xf4)
> [   33.187479] [<c002ae28>] (__lock_task_sighand+0x74/0xf4) from [<c002aecc>] (do_send_sig_info+0x24/0x64)
> [   33.187522] [<c002aecc>] (do_send_sig_info+0x24/0x64) from [<c00b0878>] (lostage_handler+0xf8/0x128)
> [   33.187568] [<c00b0878>] (lostage_handler+0xf8/0x128) from [<c0073860>] (rthal_apc_handler+0x60/0x84)
> [   33.187615] [<c0073860>] (rthal_apc_handler+0x60/0x84) from [<c0066a58>] (__ipipe_do_sync_stage+0x1f8/0x288)
> [   33.187655] [<c0066a58>] (__ipipe_do_sync_stage+0x1f8/0x288) from [<c00145c0>] (__ipipe_syscall_root+0xf4/0x13c)
> [   33.187694] [<c00145c0>] (__ipipe_syscall_root+0xf4/0x13c) from [<c000ea34>] (vector_swi+0x54/0x74)
> 
> For me it seems as I have done something wrong when merging spinlocks of PREEMPT RT and ipipe. Isn't rt_spin_lock() allowed to be called within that context? Does anyone has an idea why?

From what I understand of Xenomai 2.6 code (if this is the version
you are using), normally, the apc should be invoked from the
rthal_apc_thread context, not from rthal_apc_handler.

The virq handler should be rthal_apc_kicker, which wakes up the
process.

This code is under #ifdef CONFIG_PREEMPT_RT. Is CONFIG_PREEMPT_RT not
defined in your kernel?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-07 12:40 ` Gilles Chanteperdrix
@ 2014-12-07 13:50   ` Stoidner, Christoph
  2014-12-07 13:52     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-07 13:50 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org



> From what I understand of Xenomai 2.6 code (if this is the version
> you are using), 

Yes, I am using Xenomai 2.6

> normally, the apc should be invoked from the
> rthal_apc_thread context, not from rthal_apc_handler.
>
> The virq handler should be rthal_apc_kicker, which wakes up the
> process.
>
> This code is #ifdef CONFIG_PREEMPT_RT, is not CONFIG_PREEMPT_RT
> defined in your kernel ?

In my kernel CONFIG_PREEMPT_RT is not defined, only CONFIG_PREEMPT_RT_FULL. Moreover, apart from the Xenomai code, I did not find any CONFIG_PREEMPT_RT in the kernel code or in the PREEMPT_RT patch-3.10.18-rt14. In older versions of PREEMPT_RT (e.g. patch-2.6.33.9-rt31) I can find such a define/configuration option. It seems it was removed.

Regards,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-07 13:50   ` Stoidner, Christoph
@ 2014-12-07 13:52     ` Gilles Chanteperdrix
  2014-12-07 15:05       ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-07 13:52 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Sun, Dec 07, 2014 at 01:50:02PM +0000, Stoidner, Christoph wrote:
> 
> 
> > From what I understand of Xenomai 2.6 code (if this is the version
> > you are using), 
> 
> Yes, I am using Xenomai 2.6
> 
> > normally, the apc should be invoked from the
> > rthal_apc_thread context, not from rthal_apc_handler.
> >
> > The virq handler should be rthal_apc_kicker, which wakes up the
> > process.
> >
> > This code is #ifdef CONFIG_PREEMPT_RT, is not CONFIG_PREEMPT_RT
> > defined in your kernel ?
> 
> In my kernel CONFIG_PREEMPT_RT is not defined but
> CONFIG_PREEMPT_RT_FULL. Moreover except in Xenomai code I did not
> find any CONFIG_PREEMPT_RT in kernel code or preempt RT
> patch-3.10.18-rt14. In older versions of preempt RT (e.g.
> patch-2.6.33.9-rt31) I can found such a define/configuration. It
> seems as it was removed.

Well, then you need to replace #ifdef CONFIG_PREEMPT_RT with
#ifdef CONFIG_PREEMPT_RT_FULL, and test the kernel version in wrappers.h to
define CONFIG_PREEMPT_RT_FULL if CONFIG_PREEMPT_RT is defined for
old kernels.
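
A minimal sketch of such a compatibility shim, for illustration only. It maps the old symbol onto the new one by testing the macro directly rather than the kernel version, which is a simplification of what is described above. The first #define only simulates an old PREEMPT_RT kernel, and apc_trampoline_name() is a hypothetical helper that just shows which branch gets selected.

```c
#include <string.h>

/* Illustration only: pretend we build against an old PREEMPT_RT kernel. */
#define CONFIG_PREEMPT_RT 1

/* Compatibility shim: old kernels define CONFIG_PREEMPT_RT only. */
#if defined(CONFIG_PREEMPT_RT) && !defined(CONFIG_PREEMPT_RT_FULL)
#define CONFIG_PREEMPT_RT_FULL 1
#endif

/* Hypothetical helper: reports which APC trampoline would be chosen. */
static const char *apc_trampoline_name(void)
{
#ifdef CONFIG_PREEMPT_RT_FULL
    return "rthal_apc_kicker";   /* APCs run from the per-CPU thread */
#else
    return "rthal_apc_handler";  /* APCs run directly from the virq */
#endif
}
```

With the shim in place, code guarded by CONFIG_PREEMPT_RT_FULL is selected on both old and new PREEMPT_RT kernels.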

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-07 13:52     ` Gilles Chanteperdrix
@ 2014-12-07 15:05       ` Stoidner, Christoph
  2014-12-09 20:06         ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-07 15:05 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


> Well, then you need to replace #ifdef CONFIG_PREEMPT_RT with
> CONFIG_PREEMPT_RT_FULL, and test the kernel version in wrappers.h to
> define CONFIG_PREEMPT_RT_FULL if CONFIG_PREEMPT_RT is dedined for
> old kernels.

I have applied both changes and will test whether this solves my problem. If it works, I will publish the patch here.

Thanks,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-07 15:05       ` Stoidner, Christoph
@ 2014-12-09 20:06         ` Stoidner, Christoph
  2014-12-09 20:08           ` Gilles Chanteperdrix
  2014-12-09 20:49           ` Stoidner, Christoph
  0 siblings, 2 replies; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-09 20:06 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


Hi Gilles,

>> Well, then you need to replace #ifdef CONFIG_PREEMPT_RT with
>> CONFIG_PREEMPT_RT_FULL, and test the kernel version in wrappers.h to
>> define CONFIG_PREEMPT_RT_FULL if CONFIG_PREEMPT_RT is defined for
>> old kernels.

>I have applied both changes and will test whether this solves my problem. If it works, I will publish the patch here.

Below you can find the patch. It solves the "sleeping function called from invalid context" problem when using PREEMPT RT. Would it be reasonable to apply that change to the Git repository?

Index: ksrc/arch/generic/hal.c
===================================================================
--- ksrc/arch/generic/hal.c	(revision 1136)
+++ ksrc/arch/generic/hal.c	(working copy)
@@ -355,7 +355,7 @@
     rthal_spin_unlock(&rthal_apc_lock);
 }
 
-#ifdef CONFIG_PREEMPT_RT
+#ifdef CONFIG_PREEMPT_RT_FULL
 
 /* On PREEMPT_RT, we need to invoke the apc handlers over a process
    context, so that the latter can access non-atomic kernel services
@@ -398,11 +398,11 @@
 
 #define rthal_apc_trampoline rthal_apc_kicker
 
-#else /* !CONFIG_PREEMPT_RT */
+#else /* !CONFIG_PREEMPT_RT_FULL */
 
 #define rthal_apc_trampoline rthal_apc_handler
 
-#endif /* CONFIG_PREEMPT_RT */
+#endif /* CONFIG_PREEMPT_RT_FULL */
 
 /**
  * @fn int rthal_apc_alloc (const char *name,void (*handler)(void *cookie),void *cookie)
@@ -495,7 +495,7 @@
 	clear_bit(apc, &rthal_apc_map);
 }
 
-#ifdef CONFIG_PREEMPT_RT
+#ifdef CONFIG_PREEMPT_RT_FULL
 
 static inline int setup_apc_handler(void)
 {
@@ -524,7 +524,7 @@
 	}
 }
 
-#else  /* !CONFIG_PREEMPT_RT */
+#else  /* !CONFIG_PREEMPT_RT_FULL */
 
 static inline int setup_apc_handler(void)
 {
@@ -533,7 +533,7 @@
 
 static inline void cleanup_apc_handler(void) { }
 
-#endif  /* !CONFIG_PREEMPT_RT */
+#endif  /* !CONFIG_PREEMPT_RT_FULL */
 
 int rthal_init(void)
 {
Index: ksrc/nucleus/pipe.c
===================================================================
--- ksrc/nucleus/pipe.c	(revision 1136)
+++ ksrc/nucleus/pipe.c	(working copy)
@@ -173,7 +173,7 @@
 					xnlock_get_irqsave(&nklock, s);
 				}
 			}
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 			/*
 			 * Assume a waiter might have entered/left the
 			 * queue, so we need to refetch the sleep
@@ -198,7 +198,7 @@
 			xnlock_put_irqrestore(&nklock, s);
 			kill_fasync(&state->asyncq, xnpipe_asyncsig, POLL_IN);
 			xnlock_get_irqsave(&nklock, s);
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 			nh = getheadq(&xnpipe_asyncq);
 #endif
 		}
Index: include/asm-generic/wrappers.h
===================================================================
--- include/asm-generic/wrappers.h	(revision 1136)
+++ include/asm-generic/wrappers.h	(working copy)
@@ -43,6 +43,12 @@
 #include <asm/system.h>
 #endif /* kernel < 3.4.0 */
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,0,0)
+#ifdef CONFIG_PREEMPT_RT
+#define CONFIG_PREEMPT_RT_FULL /* after kernel 2.6 CONFIG_PREEMPT_RT was renamed */
+#endif
+#endif /* kernel < 3.0.0 */
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 
 #include <linux/wrapper.h>
@@ -664,8 +670,8 @@
 #ifndef DEFINE_SEMAPHORE
 /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
 #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
-#elif defined(CONFIG_PREEMPT_RT)
-#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
+#elif defined(CONFIG_PREEMPT_RT_FULL)
+#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
 #else
 #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
 #endif

Regards,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:06         ` Stoidner, Christoph
@ 2014-12-09 20:08           ` Gilles Chanteperdrix
  2014-12-09 20:18             ` Stoidner, Christoph
  2014-12-09 20:49           ` Stoidner, Christoph
  1 sibling, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-09 20:08 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Tue, Dec 09, 2014 at 08:06:03PM +0000, Stoidner, Christoph wrote:
> 
> Hi Gilles,
> 
> >> Well, then you need to replace #ifdef CONFIG_PREEMPT_RT with
> >> CONFIG_PREEMPT_RT_FULL, and test the kernel version in wrappers.h to
> >> define CONFIG_PREEMPT_RT_FULL if CONFIG_PREEMPT_RT is defined for
> >> old kernels.
> 
> >I have applied both changes and will test whether this solves my problem. If it works, I will publish the patch here.
> 
> Below you can find the patch. It solves the "sleeping function called from invalid context" problem when using PREEMPT RT. Would it be reasonable to apply that change to the Git repository?

Yes, of course.

>  #ifndef DEFINE_SEMAPHORE
>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
> -#elif defined(CONFIG_PREEMPT_RT)
> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
> +#elif defined(CONFIG_PREEMPT_RT_FULL)
> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>  #else
>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>  #endif

Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
anyway.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:08           ` Gilles Chanteperdrix
@ 2014-12-09 20:18             ` Stoidner, Christoph
  2014-12-09 20:24               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-09 20:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


>>  #ifndef DEFINE_SEMAPHORE
>>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
>>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
>> -#elif defined(CONFIG_PREEMPT_RT)
>> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
>> +#elif defined(CONFIG_PREEMPT_RT_FULL)
>> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>>  #else
>>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>>  #endif

> Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
> anyway.

If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.

Regards,
Christoph

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:18             ` Stoidner, Christoph
@ 2014-12-09 20:24               ` Gilles Chanteperdrix
  2014-12-09 20:34                 ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-09 20:24 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Tue, Dec 09, 2014 at 08:18:53PM +0000, Stoidner, Christoph wrote:
> 
> >>  #ifndef DEFINE_SEMAPHORE
> >>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
> >> -#elif defined(CONFIG_PREEMPT_RT)
> >> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
> >> +#elif defined(CONFIG_PREEMPT_RT_FULL)
> >> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >>  #else
> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >>  #endif
> 
> > Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
> > anyway.
> 
> If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.

Well, are not we in the "elif" case of the #ifndef DEFINE_SEMAPHORE,
so DEFINE_SEMAPHORE is defined, or am I missing something ?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:24               ` Gilles Chanteperdrix
@ 2014-12-09 20:34                 ` Stoidner, Christoph
  2014-12-09 20:37                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-09 20:34 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


>>
>> >>  #ifndef DEFINE_SEMAPHORE
>> >>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
>> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
>> >> -#elif defined(CONFIG_PREEMPT_RT)
>> >> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
>> >> +#elif defined(CONFIG_PREEMPT_RT_FULL)
>> >> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>> >>  #else
>> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>> >>  #endif
>>
>> > Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
>> > anyway.
>>
>> If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.

> Well, are not we in the "elif" case of the #ifndef DEFINE_SEMAPHORE,
> so DEFINE_SEMAPHORE is defined, or am I missing something ?

Let me explain that more in detail:

DEFINE_SEMAPHORE is defined for linux >= 2.6.37

So for linux 2.6.37 we are in 

#elif defined(CONFIG_PREEMPT_RT)

Also in linux 3.0.0 we are in that elif. Since CONFIG_PREEMPT_RT was renamed
to CONFIG_PREEMPT_RT_FULL in 3.0.0 and above, we have to use the latter one instead.

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:34                 ` Stoidner, Christoph
@ 2014-12-09 20:37                   ` Gilles Chanteperdrix
  2014-12-09 20:47                     ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-09 20:37 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Tue, Dec 09, 2014 at 08:34:11PM +0000, Stoidner, Christoph wrote:
> 
> >>
> >> >>  #ifndef DEFINE_SEMAPHORE
> >> >>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
> >> >> -#elif defined(CONFIG_PREEMPT_RT)
> >> >> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
> >> >> +#elif defined(CONFIG_PREEMPT_RT_FULL)
> >> >> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >> >>  #else
> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >> >>  #endif
> >>
> >> > Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
> >> > anyway.
> >>
> >> If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.
> 
> > Well, are not we in the "elif" case of the #ifndef DEFINE_SEMAPHORE,
> > so DEFINE_SEMAPHORE is defined, or am I missing something ?
> 
> Let me explain that more in detail:
> 
> DEFINE_SEMAPHORE is defined for linux >= 2.6.37
> 
> So for linux 2.6.37 we are in 
> 
> #elif defined(CONFIG_PREEMPT_RT)
> 
> Also in linux 3.0.0 we are in that elif. Since in 3.0.0 and above CONFIG_PREEMPT_RT 
> was changed to CONFIG_PREEMPT_RT_FULL, so we have to use the latter one instead.

Ok, but please use two ifdefs then, one for PREEMPT_RT and the other
for PREEMPT_RT_FULL. This will avoid breakage if someone decides to
change the header and moves the chunks around.

Also, I believe the hunk is still wrong, it replaces:
#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
with:
#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:37                   ` Gilles Chanteperdrix
@ 2014-12-09 20:47                     ` Stoidner, Christoph
  2014-12-09 20:55                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-09 20:47 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> >>
>> >> >>  #ifndef DEFINE_SEMAPHORE
>> >> >>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
>> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
>> >> >> -#elif defined(CONFIG_PREEMPT_RT)
>> >> >> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
>> >> >> +#elif defined(CONFIG_PREEMPT_RT_FULL)
>> >> >> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>> >> >>  #else
>> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
>> >> >>  #endif
>> >>
>> >> > Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
>> >> > anyway.
>> >>
>> >> If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.
>>
>> > Well, are not we in the "elif" case of the #ifndef DEFINE_SEMAPHORE,
>> > so DEFINE_SEMAPHORE is defined, or am I missing something ?
>>
>> Let me explain that more in detail:
>>
>> DEFINE_SEMAPHORE is defined for linux >= 2.6.37
>>
>> So for linux 2.6.37 we are in
>>
>> #elif defined(CONFIG_PREEMPT_RT)
>>
>> Also in linux 3.0.0 we are in that elif. Since in 3.0.0 and above CONFIG_PREEMPT_RT
>> was changed to CONFIG_PREEMPT_RT_FULL, so we have to use the latter one instead.

> Ok, but please use two ifdefs then, one for PREEMPT_RT and the other
> for PREEMPT_RT_FULL. This will avoid breakage if someone decides to
> change the header and moves the chunks around.

Sorry, maybe I am a little bit confused, but I do not understand what you mean. There is only one
ifdef using CONFIG_PREEMPT_RT. It is required for compatibility with older kernel versions,
where it introduces the newer CONFIG_PREEMPT_RT_FULL. All other code parts use
CONFIG_PREEMPT_RT_FULL afterwards.

> Also, I believe the hunk is still wrong, it replaces:
> #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
> with:
> #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)

You are right, that's true! In newer kernel versions DEFINE_SEMAPHORE accepts only
one argument, but I am not sure why or since which version. I will check how to handle this.

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:47                     ` Stoidner, Christoph
@ 2014-12-09 20:55                       ` Gilles Chanteperdrix
  0 siblings, 0 replies; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-09 20:55 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Tue, Dec 09, 2014 at 08:47:54PM +0000, Stoidner, Christoph wrote:
> >> >>
> >> >> >>  #ifndef DEFINE_SEMAPHORE
> >> >> >>  /* Legacy DECLARE_MUTEX vanished in 2.6.37 */
> >> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DECLARE_MUTEX(sem)
> >> >> >> -#elif defined(CONFIG_PREEMPT_RT)
> >> >> >> -#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem, 1)
> >> >> >> +#elif defined(CONFIG_PREEMPT_RT_FULL)
> >> >> >> +#define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >> >> >>  #else
> >> >> >>  #define DEFINE_BINARY_SEMAPHORE(sem) DEFINE_SEMAPHORE(sem)
> >> >> >>  #endif
> >> >>
> >> >> > Why that change? I mean in 2.6.37, the define was CONFIG_PREEMPT_RT
> >> >> > anyway.
> >> >>
> >> >> If DEFINE_SEMAPHORE is not defined we are below 2.6.37. So for 2.6.37 and above CONFIG_PREEMPT_RT was checked.
> >>
> >> > Well, are not we in the "elif" case of the #ifndef DEFINE_SEMAPHORE,
> >> > so DEFINE_SEMAPHORE is defined, or am I missing something ?
> >>
> >> Let me explain that more in detail:
> >>
> >> DEFINE_SEMAPHORE is defined for linux >= 2.6.37
> >>
> >> So for linux 2.6.37 we are in
> >>
> >> #elif defined(CONFIG_PREEMPT_RT)
> >>
> >> Also in linux 3.0.0 we are in that elif. Since in 3.0.0 and above CONFIG_PREEMPT_RT
> >> was changed to CONFIG_PREEMPT_RT_FULL, so we have to use the latter one instead.
> 
> > Ok, but please use two ifdefs then, one for PREEMPT_RT and the other
> > for PREEMPT_RT_FULL. This will avoid breakage if someone decides to
> > change the header and moves the chunks around.
> 
> Sorry, maybe I am a little bit confused but I do not understand
> what you mean. There is only one ifdef using CONFIG_PREEMPT_RT.
> This is required for compatibility to lower kernel versions to
> introduce newer CONFIG_PREEMPT_RT_FULL. All other code parts are
> using CONFIG_PREEMPT_RT_FULL afterwards.

This is what I asked for in the Xenomai code. But in wrappers.h, I would
prefer the code not to depend on the order of the definitions.

Currently, you have, at the beginning of the file

#if whatever
#ifdef CONFIG_PREEMPT_RT
#define CONFIG_PREEMPT_RT_FULL
#endif
#endif

And later:

#ifdef CONFIG_PREEMPT_RT_FULL

This works because if we are in the PREEMPT_RT case, the define at
the top of the file will get PREEMPT_RT_FULL defined.

Now, if someone decides to move the ifdef from the top of the file
to the bottom of the file, the binary semaphore define breaks.

Whereas if at the bottom of the file, you use:

#ifdef CONFIG_PREEMPT_RT
foo
#elif defined(CONFIG_PREEMPT_RT_FULL)
foo
#endif

You no longer have this issue.
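
A small sketch of why testing both symbols is more robust, assuming an
old RT kernel where only CONFIG_PREEMPT_RT exists and the compatibility
define has not been seen yet (for example because it was moved further
down the header). The BRANCH_* tags, SEM_BRANCH macros and the two
helper functions are illustrative names only, not kernel or Xenomai
identifiers.

```c
/* Simulated configuration: legacy symbol only, no compatibility define yet. */
#define CONFIG_PREEMPT_RT 1

/* Illustrative tags for the branch the preprocessor ends up selecting. */
#define BRANCH_LEGACY_RT 1
#define BRANCH_RT_FULL   2
#define BRANCH_NON_RT    3

/*
 * Robust variant: both symbols are tested, so the result does not depend
 * on where (or whether) CONFIG_PREEMPT_RT_FULL is derived from
 * CONFIG_PREEMPT_RT.
 */
#if defined(CONFIG_PREEMPT_RT)
#define SEM_BRANCH BRANCH_LEGACY_RT
#elif defined(CONFIG_PREEMPT_RT_FULL)
#define SEM_BRANCH BRANCH_RT_FULL
#else
#define SEM_BRANCH BRANCH_NON_RT
#endif

/*
 * Fragile variant: relies on the compatibility define having been
 * processed earlier in the header.
 */
#if defined(CONFIG_PREEMPT_RT_FULL)
#define SEM_BRANCH_FRAGILE BRANCH_RT_FULL
#else
#define SEM_BRANCH_FRAGILE BRANCH_NON_RT
#endif

static int sem_branch(void) { return SEM_BRANCH; }
static int sem_branch_fragile(void) { return SEM_BRANCH_FRAGILE; }
```

In this simulated configuration the robust variant still selects the RT
branch, while the fragile one silently falls through to the non-RT
branch, which is the kind of breakage that can go unnoticed.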

Beware of these ifdefs, breakage is easy and may stay unnoticed
for a long time.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:06         ` Stoidner, Christoph
  2014-12-09 20:08           ` Gilles Chanteperdrix
@ 2014-12-09 20:49           ` Stoidner, Christoph
  2014-12-09 20:59             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-09 20:49 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


Now, with the fix from above, I unfortunately ran into another problem.

After some undetermined situation the nucleus freezes. That means all tasks running in the primary domain are frozen. In contrast, all processes and tasks of the secondary domain keep running without any problem at the same time. The Xenomai task states look normal:

~ # cat /proc/xenomai/stat
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          13914      0     00500080   98.8  ROOT
  0  83     196        196        0     00300380    0.0  DOS4
  0  82     13         13         0     00300380    0.0  DOS5

I have enabled ipipe tracing to figure out what is happening here (see output below, created after the tasks froze). However, I am not sure how to interpret it. Does anyone have an idea?

~ # echo 1 > /proc/ipipe/trace/frozen 
~ # cat /proc/ipipe/trace/frozen 
I-pipe frozen back-tracing service on 3.10.18-rt14-arvero-rev01-ipipe/ipipe release #1
------------------------------------------------------------
CPU: 0, Freeze: 2248334328 cycles, Trace Points: 100 (+10)
Calibrated minimum trace-point overhead: 1.080 us

 +----- Hard IRQs ('|': locked)
 |+-- Xenomai
 ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
 |||			  +---------- Delay flag ('+': > 1 us, '!': > 10 us)
 |||			  |	   +- NMI noise ('N')
 |||			  |	   |
	  Type	  User Val.   Time    Delay  Function (Parent)
:    +func                -181+   1.040  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x1c)
:    +func                -180+   1.200  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x88)
:    +func                -179+   2.160  filp_close+0x10 (do_dup2+0xdc)
:    +func                -177+   1.760  locks_remove_posix+0x14 (filp_close+0x64)
:    +func                -175+   1.240  fput+0x10 (filp_close+0x6c)
:|   +begin   0x80000001  -174+   1.520  fput+0xe8 (filp_close+0x6c)
:|   +end     0x80000001  -172+   1.240  fput+0xfc (filp_close+0x6c)
:|   +func                -171	  0.880  __ipipe_bugon_irqs_enabled+0x10 (ret_fast_syscall+0x14)
:|   +end     0x90000000  -170+   2.080  ret_fast_syscall+0x2c (<0000b5bc>)
:|   +begin   0x90000000  -168+   2.320  vector_swi+0x3c (<0000b3d8>)
:    +func                -166+   1.360  __ipipe_syscall_root+0x10 (vector_swi+0x74)
:    +func                -164	  0.920  SyS_close+0x10 (ret_fast_syscall+0x0)
:    +func                -164+   1.080  __close_fd+0x10 (SyS_close+0x30)
:    +func                -162	  0.880  rt_spin_lock+0x10 (__close_fd+0x28)
:    +func                -162	  0.960  ipipe_root_only+0x10 (rt_spin_lock+0x1c)
:    +func                -161	  0.920  rt_spin_lock_slowlock+0x14 (rt_spin_lock+0x24)
:    +func                -160+   1.320  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x40)
:    +func                -158+   1.520  __try_to_take_rt_mutex+0x14 (rt_spin_lock_slowlock+0x70)
:    +func                -157+   2.080  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x238)
:    +func                -155	  0.880  rt_spin_unlock+0x10 (__close_fd+0xbc)
:    +func                -154	  0.880  rt_spin_lock_slowunlock+0x10 (rt_spin_unlock+0x18)
:    +func                -153	  1.000  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x1c)
:    +func                -152+   1.040  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x88)
:    +func                -151+   1.120  filp_close+0x10 (__close_fd+0xc8)
:    +func                -150+   1.120  locks_remove_posix+0x14 (filp_close+0x64)
:    +func                -149	  0.960  fput+0x10 (filp_close+0x6c)
:|   +begin   0x80000001  -148	  0.920  fput+0xe8 (filp_close+0x6c)
:|   +end     0x80000001  -147+   1.800  fput+0xfc (filp_close+0x6c)
:|   +func                -145	  1.000  __ipipe_bugon_irqs_enabled+0x10 (ret_fast_syscall+0x14)
:|   +end     0x90000000  -144!  18.320  ret_fast_syscall+0x2c (<0000b3dc>)
:|   +begin   0x90000000  -126+   2.360  vector_swi+0x3c (<0012097c>)
:    +func                -123+   1.800  __ipipe_syscall_root+0x10 (vector_swi+0x74)
:    +func                -122+   1.960  SyS_wait4+0x14 (ret_fast_syscall+0x0)
:    +func                -120+   2.440  do_wait+0x14 (SyS_wait4+0x94)
:    +func                -117+   1.320  add_wait_queue+0x10 (do_wait+0x6c)
:    +func                -116+   1.200  rt_spin_lock+0x10 (add_wait_queue+0x30)
:    +func                -115+   1.160  ipipe_root_only+0x10 (rt_spin_lock+0x1c)
:    +func                -113	  0.920  rt_spin_lock_slowlock+0x14 (rt_spin_lock+0x24)
:    +func                -113+   1.640  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x40)
:    +func                -111+   1.320  __try_to_take_rt_mutex+0x14 (rt_spin_lock_slowlock+0x70)
:    +func                -110+   2.040  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x238)
:    +func                -108+   1.040  rt_spin_unlock+0x10 (add_wait_queue+0x50)
:    +func                -107	  0.880  rt_spin_lock_slowunlock+0x10 (rt_spin_unlock+0x18)
:    +func                -106	  1.000  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x1c)
:    +func                -105+   1.920  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x88)
:    +func                -103+   1.080  rt_read_lock+0x10 (do_wait+0xb0)
:    +func                -102	  0.880  __rt_spin_lock+0x10 (rt_read_lock+0x3c)
:    +func                -101	  0.960  ipipe_root_only+0x10 (__rt_spin_lock+0x1c)
:    +func                -100	  0.920  rt_spin_lock_slowlock+0x14 (__rt_spin_lock+0x24)
:    +func                 -99+   1.160  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x40)
:    +func                 -98	  1.000  __try_to_take_rt_mutex+0x14 (rt_spin_lock_slowlock+0x70)
:    +func                 -97+   1.920  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x238)
:    +func                 -95+   2.440  wait_consider_task+0x14 (do_wait+0xec)
:    +func                 -92+   3.360  task_stopped_code+0x10 (wait_consider_task+0xe4)
:    +func                 -89+   1.160  rt_read_unlock+0x10 (do_wait+0x194)
:    +func                 -88	  0.880  __rt_spin_unlock+0x10 (rt_read_unlock+0x2c)
:    +func                 -87	  0.880  rt_spin_lock_slowunlock+0x10 (__rt_spin_unlock+0x18)
:    +func                 -86	  1.000  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x1c)
:    +func                 -85+   1.760  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x88)
:    +func                 -83	  0.880  remove_wait_queue+0x10 (do_wait+0x114)
:    +func                 -82	  0.880  rt_spin_lock+0x10 (remove_wait_queue+0x20)
:    +func                 -82	  0.960  ipipe_root_only+0x10 (rt_spin_lock+0x1c)
:    +func                 -81	  0.920  rt_spin_lock_slowlock+0x14 (rt_spin_lock+0x24)
:    +func                 -80	  1.000  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x40)
:    +func                 -79	  1.000  __try_to_take_rt_mutex+0x14 (rt_spin_lock_slowlock+0x70)
:    +func                 -78+   1.680  ipipe_root_only+0x10 (rt_spin_lock_slowlock+0x238)
:    +func                 -76	  0.880  rt_spin_unlock+0x10 (remove_wait_queue+0x48)
:    +func                 -75	  0.880  rt_spin_lock_slowunlock+0x10 (rt_spin_unlock+0x18)
:    +func                 -74	  1.000  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x1c)
:    +func                 -73+   1.560  ipipe_root_only+0x10 (rt_spin_lock_slowunlock+0x88)
:    +func                 -72+   1.480  put_pid+0x10 (SyS_wait4+0xa0)
:|   +func                 -70	  0.840  __ipipe_bugon_irqs_enabled+0x10 (ret_fast_syscall+0x14)
:|   +end     0x90000000   -69!  18.440  ret_fast_syscall+0x2c (<00120980>)
:|   +begin   0x90000000   -51+   2.240  vector_swi+0x3c (<0000b4f8>)
:    +func                 -49+   1.560  __ipipe_syscall_root+0x10 (vector_swi+0x74)
:    +func                 -47+   1.200  SyS_write+0x14 (ret_fast_syscall+0x0)
:    +func                 -46+   2.320  fget_light+0x10 (SyS_write+0x28)
:    +func                 -44+   2.640  vfs_write+0x14 (SyS_write+0x50)
:    +func                 -41+   2.640  rw_verify_area+0x14 (vfs_write+0x8c)
:    +func                 -38+   1.280  __sb_start_write+0x14 (vfs_write+0x1c8)
:    +func                 -37	  1.000  ipipe_root_only+0x10 (__sb_start_write+0x48)
:    +func                 -36+   1.920  ipipe_root_only+0x10 (__sb_start_write+0x70)
:    +func                 -34+   2.120  proc_reg_write+0x14 (vfs_write+0xcc)
:|   +begin   0x80000001   -32+   1.360  proc_reg_write+0xb8 (vfs_write+0xcc)
:|   +end     0x80000001   -31+   2.040  proc_reg_write+0xe0 (vfs_write+0xcc)
:    +func                 -29+   7.080  __ipipe_frozen_ctrl+0x14 (proc_reg_write+0x94)
:    +func                 -22	  0.920  _mutex_lock+0x10 (__ipipe_frozen_ctrl+0x9c)
:    +func                 -21+   1.080  rt_mutex_lock+0x10 (_mutex_lock+0x18)
:    +func                 -20+   1.440  ipipe_root_only+0x10 (rt_mutex_lock+0x1c)
:    +func                 -18+   1.080  rt_mutex_slowlock+0x14 (rt_mutex_lock+0x30)
:    +func                 -17+   1.800  ipipe_root_only+0x10 (rt_mutex_slowlock+0x38)
:    +func                 -15+   1.320  __try_to_take_rt_mutex+0x14 (rt_mutex_slowlock+0x74)
:    +func                 -14+   1.800  ipipe_root_only+0x10 (rt_mutex_slowlock+0x128)
:    +func                 -12+   1.240  ipipe_trace_frozen_reset+0x10 (__ipipe_frozen_ctrl+0xa0)
:    +func                 -11+   2.160  __ipipe_global_path_lock+0x10 (ipipe_trace_frozen_reset+0x18)
:    +func                  -9+   1.400  __ipipe_spin_lock_irqsave+0x10 (__ipipe_global_path_lock+0x1c)
:|   +begin   0x80000001    -7+   3.960  __ipipe_spin_lock_irqsave+0x5c (__ipipe_global_path_lock+0x1c)
:|   #func                  -3+   1.680  __ipipe_spin_unlock_irqcomplete+0x10 (__ipipe_global_path_unlock+0x6c)
:|   +end     0x80000001    -2+   2.120  __ipipe_spin_unlock_irqcomplete+0x48 (__ipipe_global_path_unlock+0x6c)
<    +freeze  0xffffffff     0	  1.440  __ipipe_frozen_ctrl+0xb0 (proc_reg_write+0x94)
     +func                   1	  1.200  _mutex_unlock+0x10 (__ipipe_frozen_ctrl+0xb8)
     +func                   2	  0.920  rt_mutex_unlock+0x10 (_mutex_unlock+0x18)
     +func                   3	  1.160  ipipe_root_only+0x10 (rt_mutex_unlock+0x1c)
     +func                   4	  1.560  ipipe_root_only+0x10 (rt_mutex_unlock+0x88)
     +func                   6	  1.280  unuse_pde+0x10 (proc_reg_write+0xa0)
 |   +begin   0x80000001     7	  1.120  unuse_pde+0x58 (proc_reg_write+0xa0)
 |   +end     0x80000001     8	  2.320  unuse_pde+0x6c (proc_reg_write+0xa0)
     +func                  11	  1.240  __fsnotify_parent+0x14 (vfs_write+0x168)
     +func                  12	  1.520  fsnotify+0x14 (vfs_write+0x184)
     +func                  13	  0.000  __srcu_read_lock+0x10 (fsnotify+0x1d0)
~ # 

Regards,
Christoph


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:49           ` Stoidner, Christoph
@ 2014-12-09 20:59             ` Gilles Chanteperdrix
  2014-12-10 16:23               ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-09 20:59 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Tue, Dec 09, 2014 at 08:49:49PM +0000, Stoidner, Christoph wrote:
> 
> Now, with the fix from above, I unfortunately ran into another problem.
> 
> After some undetermined situation the nucleus freezes. That means all
> tasks running in the primary domain are frozen. In contrast, all
> processes and tasks of the secondary domain keep running without
> any problem at the same time. The Xenomai task states look
> normal:
> 
> ~ # cat /proc/xenomai/stat
> CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
>   0  0      0          13914      0     00500080   98.8  ROOT
>   0  83     196        196        0     00300380    0.0  DOS4
>   0  82     13         13         0     00300380    0.0  DOS5
> 
> I have enabled ipipe tracing to figure out what is happening here
> (see output below, created after the tasks froze). However, I am
> not sure how to interpret it. Does anyone have an idea?

ipipe tracing will not help you, because you have no way to trigger
it at the right time.

Anyway, isn't it simply a deadlock in your application?

Please check that the Xenomai timer is still running by checking that
its counter increases across repeated runs of cat /proc/xenomai/irq.

If Xenomai and Linux timers are shared (I believe they are on
imx28), checking the same thing for Linux timer in /proc/interrupts
is sufficient, Xenomai timer can not possibly be jammed if Linux
timer still ticks.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-09 20:59             ` Gilles Chanteperdrix
@ 2014-12-10 16:23               ` Stoidner, Christoph
  2014-12-10 16:26                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-10 16:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org


>
> Anyway, is not it simply a deadlock in your application ?

I don't think so. Here is a list of all tasks:

~ # cat /proc/xenomai/stat 
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          7452       0     00500080   98.9  ROOT
  0  83     196        196        0     00300380    0.0  DOS4
  0  81     13         13         0     00300380    0.0  DOS5
  0  97     618        830        0     00300380    0.0  @WDG
  0  101    751        1127       0     00300380    0.0  HSSR
  0  128    1268       2186       0     00300380    0.0  DOS8
  0  129    1          1          0     00300380    0.0  @CGI
  0  130    7          10         0     00300182    0.0  LOG 
  0  131    1          2          0     00300184    0.0  IOXP
  0  142    20         42         0     00300380    0.0  KTMR
  0  151    1652       1901       0     00300380    0.0  Sdrv
  0  152    383        489        0     00300380    0.0  LApp
  0  156    0          1          0     00300380    0.0  DOS9
  0  0      0          14937556   0     00000000    1.0  IRQ16: [timer]

Most of them are in state 00300380, which means:
  XNSTARTED
  XNMAPPED
  XNRELAXED
  XNFPU
  XNSHADOW

For a deadlock I would expect all threads to be waiting for a semaphore or something else (state flag XNPEND). However, all tasks are frozen.
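
As a cross-check of the decoding above, here is a minimal Python sketch that splits a STAT value into flag names. The bit values are an assumption based on the Xenomai 2.x nucleus headers (include/nucleus/thread.h), where the relaxed bit appears to be named XNRELAX. Only a subset of the flags is listed, and decode_stat() is a hypothetical helper, not part of Xenomai.

```python
# Bit values ASSUMED from the Xenomai 2.x nucleus headers; verify them
# against include/nucleus/thread.h of the kernel actually running.
XN_FLAGS = {
    0x00000001: "XNSUSP",
    0x00000002: "XNPEND",
    0x00000004: "XNDELAY",
    0x00000008: "XNREADY",
    0x00000080: "XNSTARTED",
    0x00000100: "XNMAPPED",
    0x00000200: "XNRELAX",
    0x00100000: "XNFPU",
    0x00200000: "XNSHADOW",
    0x00400000: "XNROOT",
}


def decode_stat(stat_hex):
    """Return (known flag names, leftover bits) for a STAT column value."""
    value = int(stat_hex, 16)
    names = [name for bit, name in sorted(XN_FLAGS.items()) if value & bit]
    known = sum(bit for bit in XN_FLAGS if value & bit)
    return names, value & ~known


if __name__ == "__main__":
    # STAT values copied from the /proc/xenomai/stat listing above.
    for stat in ("00300380", "00300182", "00300184", "00500080"):
        names, leftover = decode_stat(stat)
        print(stat, " ".join(names), "leftover=0x%x" % leftover)
```

A non-zero leftover value would indicate bits that this partial table does not cover.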

> Please check that Xenomai timer is still running by checking that
> its counter increases in cat /proc/xenomai/irq result.

That's the output:

~ # cat /proc/xenomai/irq 
IRQ         CPU0
 16:    14155140         [timer]
1027:        5233         [virtual]
~ # cat /proc/xenomai/irq 
IRQ         CPU0
 16:    14158294         [timer]
1027:        5233         [virtual]
~ # cat /proc/xenomai/irq 
IRQ         CPU0
 16:    14159556         [timer]
1027:        5233         [virtual]
~ # 

The timer IRQ (16) is still counting. However, IRQ 1027, called "virtual", has stopped counting now that all Xenomai tasks are frozen. Do you know what that 'virtual' IRQ is for?
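
The comparison of successive samples can also be scripted. The following is a minimal Python sketch, assuming the single-CPU layout of /proc/xenomai/irq shown above; parse_irq_table() and stalled_irqs() are hypothetical helper names, and the two samples are copied from the output above.

```python
def parse_irq_table(text):
    """Map IRQ number -> CPU0 count from /proc/xenomai/irq style output."""
    counts = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the header line and anything unexpected
        fields = rest.split()
        if fields and fields[0].isdigit():
            counts[int(head)] = int(fields[0])
    return counts


def stalled_irqs(before, after):
    """Return the IRQ numbers whose count did not increase between samples."""
    return sorted(irq for irq in before if after.get(irq, 0) <= before[irq])


# Two samples copied from the transcript above.
SAMPLE_1 = """IRQ         CPU0
 16:    14155140         [timer]
1027:        5233         [virtual]
"""
SAMPLE_2 = """IRQ         CPU0
 16:    14158294         [timer]
1027:        5233         [virtual]
"""

if __name__ == "__main__":
    first = parse_irq_table(SAMPLE_1)
    second = parse_irq_table(SAMPLE_2)
    print("IRQs that did not advance:", stalled_irqs(first, second))
```

On the target, the two samples would instead be read from /proc/xenomai/irq a second or so apart.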

> If Xenomai and Linux timers are shared (I believe they are on
> imx28), checking the same thing for Linux timer in /proc/interrupts
> is sufficient, Xenomai timer can not possibly be jammed if Linux
> timer still ticks.

The Linux timer seems to be running as well:

~ # cat /proc/interrupts 
           CPU0       
 16:      49931         -  MXS Timer Tick
 18:         61         -  mxs-dma
 19:          0         -  mxs-dma
 25:         11         -  mxs-mmc
 26:          0         -  mxs-spi
207:          0         -  mxs-lradc-touchscreen
208:          0         -  mxs-lradc-thresh0
209:          0         -  mxs-lradc-thresh1
210:          0         -  mxs-lradc-channel0
211:          0         -  mxs-lradc-channel1
212:          0         -  mxs-lradc-channel2
213:          0         -  mxs-lradc-channel3
214:          0         -  mxs-lradc-channel4
215:          0         -  mxs-lradc-channel5
216:          0         -  mxs-lradc-channel6
217:          0         -  mxs-lradc-channel7
218:          0         -  mxs-lradc-button0
219:          0         -  mxs-lradc-button1
220:          0         -  RTC alarm
224:         90         -  80072000.serial
225:       2982         -  uart-pl011
226:          0         -  ci13xxx_imx
227:      40138         -  800f0000.ethernet
Err:          0
~ # cat /proc/interrupts 
           CPU0       
 16:      49958         -  MXS Timer Tick
 18:         61         -  mxs-dma
 19:          0         -  mxs-dma
 25:         11         -  mxs-mmc
 26:          0         -  mxs-spi
207:          0         -  mxs-lradc-touchscreen
208:          0         -  mxs-lradc-thresh0
209:          0         -  mxs-lradc-thresh1
210:          0         -  mxs-lradc-channel0
211:          0         -  mxs-lradc-channel1
212:          0         -  mxs-lradc-channel2
213:          0         -  mxs-lradc-channel3
214:          0         -  mxs-lradc-channel4
215:          0         -  mxs-lradc-channel5
216:          0         -  mxs-lradc-channel6
217:          0         -  mxs-lradc-channel7
218:          0         -  mxs-lradc-button0
219:          0         -  mxs-lradc-button1
220:          0         -  RTC alarm
224:         90         -  80072000.serial
225:       3115         -  uart-pl011
226:          0         -  ci13xxx_imx
227:      40154         -  800f0000.ethernet
Err:          0
~ # 


Regards,
Christoph



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-10 16:23               ` Stoidner, Christoph
@ 2014-12-10 16:26                 ` Gilles Chanteperdrix
  2014-12-10 18:23                   ` Stoidner, Christoph
  0 siblings, 1 reply; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-10 16:26 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 10, 2014 at 04:23:51PM +0000, Stoidner, Christoph wrote:
> 
> >
> > Anyway, is not it simply a deadlock in your application ?
> 
> I don't think so. Here is a list of all tasks:
> 
> ~ # cat /proc/xenomai/stat 
> CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
>   0  0      0          7452       0     00500080   98.9  ROOT
>   0  83     196        196        0     00300380    0.0  DOS4
>   0  81     13         13         0     00300380    0.0  DOS5
>   0  97     618        830        0     00300380    0.0  @WDG
>   0  101    751        1127       0     00300380    0.0  HSSR
>   0  128    1268       2186       0     00300380    0.0  DOS8
>   0  129    1          1          0     00300380    0.0  @CGI
>   0  130    7          10         0     00300182    0.0  LOG 
>   0  131    1          2          0     00300184    0.0  IOXP
>   0  142    20         42         0     00300380    0.0  KTMR
>   0  151    1652       1901       0     00300380    0.0  Sdrv
>   0  152    383        489        0     00300380    0.0  LApp
>   0  156    0          1          0     00300380    0.0  DOS9
>   0  0      0          14937556   0     00000000    1.0  IRQ16: [timer]
> 
> Most of them are in state 00300380, that means:
>   XNSTARTED
>   XNMAPPED
>   XNRELAXED
>   XNFPU
>   XNSHADOW
> 
> For a deadlock I would expect all threads to be waiting for a semaphore or something else (state flag XNPEND). However, all tasks are frozen.

Well, no: the XNRELAXED state means that the thread is suspended from
the Xenomai scheduler's point of view and is handled by the Linux
scheduler. So, if there is a deadlock, it happens in secondary mode.

-- 
					    Gilles.



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-10 16:26                 ` Gilles Chanteperdrix
@ 2014-12-10 18:23                   ` Stoidner, Christoph
  2014-12-10 18:41                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 49+ messages in thread
From: Stoidner, Christoph @ 2014-12-10 18:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

>> >
>> > Anyway, is not it simply a deadlock in your application ?
>>
>> I don't think so. Here is a list of all tasks:
>>
>> ~ # cat /proc/xenomai/stat
>> CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
>>   0  0      0          7452       0     00500080   98.9  ROOT
>>   0  83     196        196        0     00300380    0.0  DOS4
>>   0  81     13         13         0     00300380    0.0  DOS5
>>   0  97     618        830        0     00300380    0.0  @WDG
>>   0  101    751        1127       0     00300380    0.0  HSSR
>>   0  128    1268       2186       0     00300380    0.0  DOS8
>>   0  129    1          1          0     00300380    0.0  @CGI
>>   0  130    7          10         0     00300182    0.0  LOG
>>   0  131    1          2          0     00300184    0.0  IOXP
>>   0  142    20         42         0     00300380    0.0  KTMR
>>   0  151    1652       1901       0     00300380    0.0  Sdrv
>>   0  152    383        489        0     00300380    0.0  LApp
>>   0  156    0          1          0     00300380    0.0  DOS9
>>   0  0      0          14937556   0     00000000    1.0  IRQ16: [timer]
>>
>> Most of them are in state 00300380, that means:
>>   XNSTARTED
>>   XNMAPPED
>>   XNRELAXED
>>   XNFPU
>>   XNSHADOW
>>
>> For a deadlock I would expect all threads to be waiting for a semaphore or something else (state flag XNPEND). However, all tasks are frozen.
>
> Well, no: the XNRELAXED state means that the thread is suspended from
> the Xenomai scheduler's point of view and is handled by the Linux
> scheduler. So, if there is a deadlock, it happens in secondary mode.

The call stack in /proc/[PID]/task/[TID]/stack of every thread shows that they are all at the same address in __xnpod_schedule.

Call stack for the task with TID 128:

~ # cat /proc/128/task/128/stack
[<c00a8a78>] __xnpod_schedule+0x294/0x764
[<c00a92a4>] xnpod_suspend_thread+0x2c0/0x354
[<c00b6b48>] xnshadow_relax+0x94/0x168
[<c00b7384>] losyscall_event+0x240/0x26c
[<c007c6fc>] ipipe_syscall_hook+0x44/0x54
[<c0078474>] __ipipe_notify_syscall+0x70/0x1d0
[<c0016820>] __ipipe_syscall_root+0x6c/0x194
[<c000f734>] vector_swi+0x74/0x8c
[<ffffffff>] 0xffffffff

With addr2line I traced address c00a8a78 to the following if statement (kernel/xenomai/nucleus/pod.c:2256):

  if (shadow && xnarch_root_domain_p())

Just before this line is the call to xnpod_switch_to(sched, prev, next), so I assume the thread has not resumed from the context switch.
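
For reference, the start address of each function can be derived from the symbol+offset notation in the trace, which helps when passing addresses to addr2line or comparing them against a disassembly. The following is a minimal Python sketch, assuming the "[<addr>] symbol+0xoffset/0xsize" line format shown above; parse_frame() is a hypothetical helper.

```python
import re

# Matches lines such as "[<c00a8a78>] __xnpod_schedule+0x294/0x764".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frame(line):
    """Parse one stack line; return None if it has no symbol+offset part."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    addr = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    return {
        "addr": addr,
        "symbol": match.group(2),
        "offset": offset,
        "size": int(match.group(4), 16),
        "start": addr - offset,  # address where the function begins
    }


if __name__ == "__main__":
    # First two frames copied from the trace above.
    trace = [
        "[<c00a8a78>] __xnpod_schedule+0x294/0x764",
        "[<c00a92a4>] xnpod_suspend_thread+0x2c0/0x354",
    ]
    for line in trace:
        frame = parse_frame(line)
        print("%s starts at %x" % (frame["symbol"], frame["start"]))
```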

Do you have any other ideas?

Regards,
Christoph



* Re: [Xenomai] Sleeping function called from invalid context
  2014-12-10 18:23                   ` Stoidner, Christoph
@ 2014-12-10 18:41                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 49+ messages in thread
From: Gilles Chanteperdrix @ 2014-12-10 18:41 UTC (permalink / raw)
  To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Wed, Dec 10, 2014 at 06:23:03PM +0000, Stoidner, Christoph wrote:
> [<c00b6b48>] xnshadow_relax+0x94/0x168
> [<c00b7384>] losyscall_event+0x240/0x26c

That is strange. Are these tasks running with the SCHED_OTHER policy?

-- 
					    Gilles.


