[Xenomai-core] High latencies on ARM.

* [Xenomai-core] High latencies on ARM.
@ 2008-01-02 10:31 Gilles Chanteperdrix
  2008-01-17 10:42 ` Jan Kiszka
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-02 10:31 UTC (permalink / raw)
  To: xenomai-core

Hi,

after some (unsuccessful) time trying to instrument the code in a way
that does not change the latency results completely, I found the
reason for the high latency with latency -t 1 and latency -t 2 on ARM.
So, here comes an update on this issue. The culprit is the user-space
context switch, which flushes the processor cache with the nklock
locked, irqs off.

There are two things we could do:
- arrange for the ARM cache flush to happen with the nklock unlocked
and irqs enabled. This will improve interrupt latency (latency -t 2)
but obviously not scheduling latency (latency -t 1). If we go that
way, there are several problems we should solve:

  - we do not want interrupt handlers to reenter xnpod_schedule(). For
  this we can use the XNLOCK bit, set on whatever is
  xnpod_current_thread() when the cache flush occurs.

  - since the interrupt handler may modify the rescheduling bits, we
  need to test these bits in the xnpod_schedule() epilogue and restart
  xnpod_schedule() if need be.

  - we do not want xnpod_delete_thread() to delete one of the two
  threads involved in the context switch. The only solution I found is
  to add a bit to the thread mask meaning that the thread is currently
  switching, and to (re)test the XNZOMBIE bit in the xnpod_schedule()
  epilogue to delete whatever thread was marked for deletion.

  - in case of migration with xnpod_migrate_thread(), we do not want
  xnpod_schedule() on the target CPU to switch to the migrated thread
  before the context switch on the source CPU is finished. For this we
  can avoid setting the resched bit in xnpod_migrate_thread(), detect
  the condition in the xnpod_schedule() epilogue, set the rescheduling
  bits so that xnpod_schedule() is restarted, and send the IPI to the
  target CPU.

- avoid using user-space real-time tasks when running latency
kernel-space benches, i.e. at least in the latency -t 1 and
latency -t 2 cases. This means that we should change the timerbench
driver. There are at least two ways of doing this:

  - use an rt_pipe

  - modify the timerbench driver to implement only the nrt ioctl, using
  vanilla Linux services such as wait_event and wake_up.
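
To make the first option a bit more concrete, here is a rough user-space
toy model of the idea. It is not Xenomai code: every toy_* name, the
TOY_* bits and the simulated interrupt are assumptions made up for
illustration, and the nklock itself is only represented by comments.
It shows three of the points above: a lock bit that turns a reentrant
scheduler call into a no-op, an epilogue that re-tests the rescheduling
bit, and deletion deferred through a "switching" bit. Migration and the
IPI are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits, loosely named after XNLOCK and XNZOMBIE. */
enum { TOY_LOCK = 1, TOY_SWITCHING = 2, TOY_ZOMBIE = 4 };

struct toy_thread {
    unsigned state;
    bool deleted;
};

struct toy_sched {
    struct toy_thread *curr;        /* running thread */
    struct toy_thread *next;        /* thread elected to run next */
    bool resched;                   /* rescheduling bit */
    int switches;                   /* completed context switches */
    struct toy_thread *irq_wakes;   /* thread the simulated IRQ wakes up */
    struct toy_thread *irq_deletes; /* thread the simulated IRQ deletes */
    bool deleted_early;             /* set if a deletion was not deferred */
};

static void toy_schedule(struct toy_sched *s);

/* Deletion is deferred while the thread is part of a context switch. */
static void toy_delete(struct toy_thread *t)
{
    if (t->state & TOY_SWITCHING)
        t->state |= TOY_ZOMBIE;     /* reaped later, in the epilogue */
    else
        t->deleted = true;
}

/* Simulated interrupt handler, fired once inside the unlocked window. */
static void toy_irq(struct toy_sched *s)
{
    if (s->irq_deletes) {
        toy_delete(s->irq_deletes);
        if (s->irq_deletes->deleted)
            s->deleted_early = true;
        s->irq_deletes = NULL;
    }
    if (s->irq_wakes) {
        s->next = s->irq_wakes;
        s->resched = true;
        s->irq_wakes = NULL;
    }
    toy_schedule(s);                /* must not start a nested switch */
}

/* Epilogue helper: clear the switching bit and reap a marked thread. */
static void toy_reap(struct toy_thread *t)
{
    t->state &= ~(unsigned)TOY_SWITCHING;
    if (t->state & TOY_ZOMBIE) {
        t->state &= ~(unsigned)TOY_ZOMBIE;
        t->deleted = true;
    }
}

static void toy_schedule(struct toy_sched *s)
{
    if (s->curr->state & TOY_LOCK)
        return;                     /* reentered from an IRQ: do nothing */

    while (s->resched) {            /* re-tested after every epilogue */
        struct toy_thread *prev = s->curr, *next = s->next;

        s->resched = false;
        if (next == prev)
            continue;

        prev->state |= TOY_LOCK | TOY_SWITCHING;
        next->state |= TOY_SWITCHING;

        /* Lock dropped, irqs on: the cache flush would happen here. */
        toy_irq(s);
        assert(s->curr == prev);    /* no nested switch took place */
        /* Lock taken again, irqs off. */

        s->curr = next;
        s->switches++;

        /* Epilogue. A real scheduler would also have to elect another
         * thread if the incoming one was reaped; the toy does not. */
        prev->state &= ~(unsigned)TOY_LOCK;
        toy_reap(prev);
        toy_reap(next);
    }
}
```

With three threads a, b and c, a switch from a to b during which the
simulated interrupt wakes c and deletes a should end up with c running
after two switches, and with a deleted only in the epilogue.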

What do you think?

-- 
                                               Gilles Chanteperdrix

* Re: [Xenomai-core] High latencies on ARM.
  2008-01-02 10:31 [Xenomai-core] High latencies on ARM Gilles Chanteperdrix
@ 2008-01-17 10:42 ` Jan Kiszka
  2008-01-17 10:47   ` Gilles Chanteperdrix
  2008-01-22 20:36 ` Gilles Chanteperdrix
       [not found] ` <18315.63245.160672.547658@domain.hid>
  2 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2008-01-17 10:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Hi,
> 
> after some (unsuccessful) time trying to instrument the code in a way
> that does not change the latency results completely, I found the
> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
> So, here comes an update on this issue. The culprit is the user-space
> context switch, which flushes the processor cache with the nklock
> locked, irqs off.
> 
> There are two things we could do:
> - arrange for the ARM cache flush to happen with the nklock unlocked
> and irqs enabled. This will improve interrupt latency (latency -t 2)
> but obviously not scheduling latency (latency -t 1). If we go that
> way, there are several problems we should solve:
> 
> we do not want interrupt handlers to reenter xnpod_schedule(), for
> this we can use the XNLOCK bit, set on whatever is
> xnpod_current_thread() when the cache flush occurs
> 
> since the interrupt handler may modify the rescheduling bits, we need
> to test these bits in xnpod_schedule() epilogue and restart
> xnpod_schedule() if need be
> 
> we do not want xnpod_delete_thread() to delete one of the two threads
> involved in the context switch, for this the only solution I found is
> to add a bit to the thread mask meaning that the thread is currently
> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
> to delete whatever thread was marked for deletion
> 
> in case of migration with xnpod_migrate_thread, we do not want
> xnpod_schedule() on the target CPU to switch to the migrated thread
> before the context switch on the source CPU is finished, for this we
> can avoid setting the resched bit in xnpod_migrate_thread(), detect
> the condition in xnpod_schedule() epilogue and set the rescheduling
> bits so that xnpod_schedule is restarted and send the IPI to the
> target CPU.
> 
> - avoid using user-space real-time tasks when running latency
> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
> 2 case. This means that we should change the timerbench driver. There
> are at least two ways of doing this:
> use an rt_pipe
>  modify the timerbench driver to implement only the nrt ioctl, using
> vanilla linux services such as wait_event and wake_up.

[As you reminded me of this unanswered question:]
One may consider adding further modes _besides_ current kernel tests
that do not rely on RTDM & native userland support (e.g. when
CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
scenarios as well that must not be killed by such a change.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 10:42 ` Jan Kiszka
@ 2008-01-17 10:47   ` Gilles Chanteperdrix
  2008-01-17 11:55     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-17 10:47 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>
> Gilles Chanteperdrix wrote:
> > Hi,
> >
> > after some (unsuccessful) time trying to instrument the code in a way
> > that does not change the latency results completely, I found the
> > reason for the high latency with latency -t 1 and latency -t 2 on ARM.
> > So, here comes an update on this issue. The culprit is the user-space
> > context switch, which flushes the processor cache with the nklock
> > locked, irqs off.
> >
> > There are two things we could do:
> > - arrange for the ARM cache flush to happen with the nklock unlocked
> > and irqs enabled. This will improve interrupt latency (latency -t 2)
> > but obviously not scheduling latency (latency -t 1). If we go that
> > way, there are several problems we should solve:
> >
> > we do not want interrupt handlers to reenter xnpod_schedule(), for
> > this we can use the XNLOCK bit, set on whatever is
> > xnpod_current_thread() when the cache flush occurs
> >
> > since the interrupt handler may modify the rescheduling bits, we need
> > to test these bits in xnpod_schedule() epilogue and restart
> > xnpod_schedule() if need be
> >
> > we do not want xnpod_delete_thread() to delete one of the two threads
> > involved in the context switch, for this the only solution I found is
> > to add a bit to the thread mask meaning that the thread is currently
> > switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
> > to delete whatever thread was marked for deletion
> >
> > in case of migration with xnpod_migrate_thread, we do not want
> > xnpod_schedule() on the target CPU to switch to the migrated thread
> > before the context switch on the source CPU is finished, for this we
> > can avoid setting the resched bit in xnpod_migrate_thread(), detect
> > the condition in xnpod_schedule() epilogue and set the rescheduling
> > bits so that xnpod_schedule is restarted and send the IPI to the
> > target CPU.
> >
> > - avoid using user-space real-time tasks when running latency
> > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
> > 2 case. This means that we should change the timerbench driver. There
> > are at least two ways of doing this:
> > use an rt_pipe
> >  modify the timerbench driver to implement only the nrt ioctl, using
> > vanilla linux services such as wait_event and wake_up.
>
> [As you reminded me of this unanswered question:]
> One may consider adding further modes _besides_ current kernel tests
> that do not rely on RTDM & native userland support (e.g. when
> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
> scenarios as well that must not be killed by such a change.

I think the current test scenarios for latency -t 1 and latency -t 2
are a bit misleading: they measure kernel-space latencies in presence
of user-space real-time tasks. When one runs latency -t 1 or latency
-t 2, one would expect that there are only kernel-space real-time
tasks.

-- 
                                               Gilles Chanteperdrix


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 10:47   ` Gilles Chanteperdrix
@ 2008-01-17 11:55     ` Jan Kiszka
  2008-01-17 13:59       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2008-01-17 11:55 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Gilles Chanteperdrix wrote:
>>> Hi,
>>>
>>> after some (unsuccessful) time trying to instrument the code in a way
>>> that does not change the latency results completely, I found the
>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>>> So, here comes an update on this issue. The culprit is the user-space
>>> context switch, which flushes the processor cache with the nklock
>>> locked, irqs off.
>>>
>>> There are two things we could do:
>>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
>>> but obviously not scheduling latency (latency -t 1). If we go that
>>> way, there are several problems we should solve:
>>>
>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
>>> this we can use the XNLOCK bit, set on whatever is
>>> xnpod_current_thread() when the cache flush occurs
>>>
>>> since the interrupt handler may modify the rescheduling bits, we need
>>> to test these bits in xnpod_schedule() epilogue and restart
>>> xnpod_schedule() if need be
>>>
>>> we do not want xnpod_delete_thread() to delete one of the two threads
>>> involved in the context switch, for this the only solution I found is
>>> to add a bit to the thread mask meaning that the thread is currently
>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>>> to delete whatever thread was marked for deletion
>>>
>>> in case of migration with xnpod_migrate_thread, we do not want
>>> xnpod_schedule() on the target CPU to switch to the migrated thread
>>> before the context switch on the source CPU is finished, for this we
>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
>>> the condition in xnpod_schedule() epilogue and set the rescheduling
>>> bits so that xnpod_schedule is restarted and send the IPI to the
>>> target CPU.
>>>
>>> - avoid using user-space real-time tasks when running latency
>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>>> 2 case. This means that we should change the timerbench driver. There
>>> are at least two ways of doing this:
>>> use an rt_pipe
>>>  modify the timerbench driver to implement only the nrt ioctl, using
>>> vanilla linux services such as wait_event and wake_up.
>> [As you reminded me of this unanswered question:]
>> One may consider adding further modes _besides_ current kernel tests
>> that do not rely on RTDM & native userland support (e.g. when
>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
>> scenarios as well that must not be killed by such a change.
> 
> I think the current test scenarios for latency -t 1 and latency -t 2
> are a bit misleading: they measure kernel-space latencies in presence
> of user-space real-time tasks. When one runs latency -t 1 or latency
> -t 2, one would expect that there are only kernel-space real-time
> tasks.

Whether they are misleading depends on your perspective. In fact, they
measure in-kernel scenarios on the standard Xenomai setup, which
includes userland RT task activity these days. Those scenarios mainly
target driver use cases, not pure kernel-space applications.

But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
would benefit from an additional set of test cases.

Jan
-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 11:55     ` Jan Kiszka
@ 2008-01-17 13:59       ` Gilles Chanteperdrix
  2008-01-17 14:16         ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-17 13:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>
> Gilles Chanteperdrix wrote:
> > On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> Hi,
> >>>
> >>> after some (unsuccessful) time trying to instrument the code in a way
> >>> that does not change the latency results completely, I found the
> >>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
> >>> So, here comes an update on this issue. The culprit is the user-space
> >>> context switch, which flushes the processor cache with the nklock
> >>> locked, irqs off.
> >>>
> >>> There are two things we could do:
> >>> - arrange for the ARM cache flush to happen with the nklock unlocked
> >>> and irqs enabled. This will improve interrupt latency (latency -t 2)
> >>> but obviously not scheduling latency (latency -t 1). If we go that
> >>> way, there are several problems we should solve:
> >>>
> >>> we do not want interrupt handlers to reenter xnpod_schedule(), for
> >>> this we can use the XNLOCK bit, set on whatever is
> >>> xnpod_current_thread() when the cache flush occurs
> >>>
> >>> since the interrupt handler may modify the rescheduling bits, we need
> >>> to test these bits in xnpod_schedule() epilogue and restart
> >>> xnpod_schedule() if need be
> >>>
> >>> we do not want xnpod_delete_thread() to delete one of the two threads
> >>> involved in the context switch, for this the only solution I found is
> >>> to add a bit to the thread mask meaning that the thread is currently
> >>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
> >>> to delete whatever thread was marked for deletion
> >>>
> >>> in case of migration with xnpod_migrate_thread, we do not want
> >>> xnpod_schedule() on the target CPU to switch to the migrated thread
> >>> before the context switch on the source CPU is finished, for this we
> >>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
> >>> the condition in xnpod_schedule() epilogue and set the rescheduling
> >>> bits so that xnpod_schedule is restarted and send the IPI to the
> >>> target CPU.
> >>>
> >>> - avoid using user-space real-time tasks when running latency
> >>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
> >>> 2 case. This means that we should change the timerbench driver. There
> >>> are at least two ways of doing this:
> >>> use an rt_pipe
> >>>  modify the timerbench driver to implement only the nrt ioctl, using
> >>> vanilla linux services such as wait_event and wake_up.
> >> [As you reminded me of this unanswered question:]
> >> One may consider adding further modes _besides_ current kernel tests
> >> that do not rely on RTDM & native userland support (e.g. when
> >> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
> >> scenarios as well that must not be killed by such a change.
> >
> > I think the current test scenarios for latency -t 1 and latency -t 2
> > are a bit misleading: they measure kernel-space latencies in presence
> > of user-space real-time tasks. When one runs latency -t 1 or latency
> > -t 2, one would expect that there are only kernel-space real-time
> > tasks.
>
> Whether they are misleading depends on your perspective. In fact, they
> measure in-kernel scenarios on the standard Xenomai setup, which
> includes userland RT task activity these days. Those scenarios mainly
> target driver use cases, not pure kernel-space applications.
>
> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
> would benefit from an additional set of test cases.

Ok, I will not touch timerbench then, and implement another kernel module.

-- 
                                               Gilles Chanteperdrix


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 13:59       ` Gilles Chanteperdrix
@ 2008-01-17 14:16         ` Jan Kiszka
  2008-01-17 14:18           ` Jan Kiszka
  2008-01-17 14:20           ` Gilles Chanteperdrix
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2008-01-17 14:16 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Gilles Chanteperdrix wrote:
>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Hi,
>>>>>
>>>>> after some (unsuccessful) time trying to instrument the code in a way
>>>>> that does not change the latency results completely, I found the
>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>>>>> So, here comes an update on this issue. The culprit is the user-space
>>>>> context switch, which flushes the processor cache with the nklock
>>>>> locked, irqs off.
>>>>>
>>>>> There are two things we could do:
>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
>>>>> but obviously not scheduling latency (latency -t 1). If we go that
>>>>> way, there are several problems we should solve:
>>>>>
>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
>>>>> this we can use the XNLOCK bit, set on whatever is
>>>>> xnpod_current_thread() when the cache flush occurs
>>>>>
>>>>> since the interrupt handler may modify the rescheduling bits, we need
>>>>> to test these bits in xnpod_schedule() epilogue and restart
>>>>> xnpod_schedule() if need be
>>>>>
>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
>>>>> involved in the context switch, for this the only solution I found is
>>>>> to add a bit to the thread mask meaning that the thread is currently
>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>>>>> to delete whatever thread was marked for deletion
>>>>>
>>>>> in case of migration with xnpod_migrate_thread, we do not want
>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
>>>>> before the context switch on the source CPU is finished, for this we
>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
>>>>> target CPU.
>>>>>
>>>>> - avoid using user-space real-time tasks when running latency
>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>>>>> 2 case. This means that we should change the timerbench driver. There
>>>>> are at least two ways of doing this:
>>>>> use an rt_pipe
>>>>>  modify the timerbench driver to implement only the nrt ioctl, using
>>>>> vanilla linux services such as wait_event and wake_up.
>>>> [As you reminded me of this unanswered question:]
>>>> One may consider adding further modes _besides_ current kernel tests
>>>> that do not rely on RTDM & native userland support (e.g. when
>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
>>>> scenarios as well that must not be killed by such a change.
>>> I think the current test scenarios for latency -t 1 and latency -t 2
>>> are a bit misleading: they measure kernel-space latencies in presence
>>> of user-space real-time tasks. When one runs latency -t 1 or latency
>>> -t 2, one would expect that there are only kernel-space real-time
>>> tasks.
>> Whether they are misleading depends on your perspective. In fact, they
>> measure in-kernel scenarios on the standard Xenomai setup, which
>> includes userland RT task activity these days. Those scenarios mainly
>> target driver use cases, not pure kernel-space applications.
>>
>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
>> would benefit from an additional set of test cases.
> 
> Ok, I will not touch timerbench then, and implement another kernel module.
> 

[Without considering all details]
To achieve this independence from user-space RT threads, it should
suffice to implement a kernel-based frontend for timerbench. This
frontend would then either dump to syslog or open some pipe to tell
userland about the benchmark results. What do you think?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 14:16         ` Jan Kiszka
@ 2008-01-17 14:18           ` Jan Kiszka
  2008-01-17 14:20           ` Gilles Chanteperdrix
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2008-01-17 14:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>> Gilles Chanteperdrix wrote:
>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Hi,
>>>>>>
>>>>>> after some (unsuccessful) time trying to instrument the code in a way
>>>>>> that does not change the latency results completely, I found the
>>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>>>>>> So, here comes an update on this issue. The culprit is the user-space
>>>>>> context switch, which flushes the processor cache with the nklock
>>>>>> locked, irqs off.
>>>>>>
>>>>>> There are two things we could do:
>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
>>>>>> way, there are several problems we should solve:
>>>>>>
>>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
>>>>>> this we can use the XNLOCK bit, set on whatever is
>>>>>> xnpod_current_thread() when the cache flush occurs
>>>>>>
>>>>>> since the interrupt handler may modify the rescheduling bits, we need
>>>>>> to test these bits in xnpod_schedule() epilogue and restart
>>>>>> xnpod_schedule() if need be
>>>>>>
>>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
>>>>>> involved in the context switch, for this the only solution I found is
>>>>>> to add a bit to the thread mask meaning that the thread is currently
>>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>>>>>> to delete whatever thread was marked for deletion
>>>>>>
>>>>>> in case of migration with xnpod_migrate_thread, we do not want
>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
>>>>>> before the context switch on the source CPU is finished, for this we
>>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
>>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
>>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
>>>>>> target CPU.
>>>>>>
>>>>>> - avoid using user-space real-time tasks when running latency
>>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>>>>>> 2 case. This means that we should change the timerbench driver. There
>>>>>> are at least two ways of doing this:
>>>>>> use an rt_pipe
>>>>>>  modify the timerbench driver to implement only the nrt ioctl, using
>>>>>> vanilla linux services such as wait_event and wake_up.
>>>>> [As you reminded me of this unanswered question:]
>>>>> One may consider adding further modes _besides_ current kernel tests
>>>>> that do not rely on RTDM & native userland support (e.g. when
>>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
>>>>> scenarios as well that must not be killed by such a change.
>>>> I think the current test scenarios for latency -t 1 and latency -t 2
>>>> are a bit misleading: they measure kernel-space latencies in presence
>>>> of user-space real-time tasks. When one runs latency -t 1 or latency
>>>> -t 2, one would expect that there are only kernel-space real-time
>>>> tasks.
>>> Whether they are misleading depends on your perspective. In fact, they
>>> measure in-kernel scenarios on the standard Xenomai setup, which
>>> includes userland RT task activity these days. Those scenarios mainly
>>> target driver use cases, not pure kernel-space applications.
>>>
>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
>>> would benefit from an additional set of test cases.
>> Ok, I will not touch timerbench then, and implement another kernel module.
>>
> 
> [Without considering all details]
> To achieve this independence from user-space RT threads, it should
> suffice to implement a kernel-based frontend for timerbench. This
> frontend would then either dump to syslog or open some pipe to tell
> userland about the benchmark results. What do you think?
> 

(That is only in case you meant "reimplementing timerbench" by
"implement another kernel module". Just write a kernel-hosted RTDM user
of timerbench.)


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 14:16         ` Jan Kiszka
  2008-01-17 14:18           ` Jan Kiszka
@ 2008-01-17 14:20           ` Gilles Chanteperdrix
  2008-01-17 14:22             ` Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-17 14:20 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Jan 17, 2008 3:16 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>
> Gilles Chanteperdrix wrote:
> > On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >>>> Gilles Chanteperdrix wrote:
> >>>>> Hi,
> >>>>>
> >>>>> after some (unsuccessful) time trying to instrument the code in a way
> >>>>> that does not change the latency results completely, I found the
> >>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
> >>>>> So, here comes an update on this issue. The culprit is the user-space
> >>>>> context switch, which flushes the processor cache with the nklock
> >>>>> locked, irqs off.
> >>>>>
> >>>>> There are two things we could do:
> >>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
> >>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
> >>>>> but obviously not scheduling latency (latency -t 1). If we go that
> >>>>> way, there are several problems we should solve:
> >>>>>
> >>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
> >>>>> this we can use the XNLOCK bit, set on whatever is
> >>>>> xnpod_current_thread() when the cache flush occurs
> >>>>>
> >>>>> since the interrupt handler may modify the rescheduling bits, we need
> >>>>> to test these bits in xnpod_schedule() epilogue and restart
> >>>>> xnpod_schedule() if need be
> >>>>>
> >>>>> we do not want xnpod_delete_thread() to delete one of the two threads
> >>>>> involved in the context switch, for this the only solution I found is
> >>>>> to add a bit to the thread mask meaning that the thread is currently
> >>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
> >>>>> to delete whatever thread was marked for deletion
> >>>>>
> >>>>> in case of migration with xnpod_migrate_thread, we do not want
> >>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
> >>>>> before the context switch on the source CPU is finished, for this we
> >>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
> >>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
> >>>>> bits so that xnpod_schedule is restarted and send the IPI to the
> >>>>> target CPU.
> >>>>>
> >>>>> - avoid using user-space real-time tasks when running latency
> >>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
> >>>>> 2 case. This means that we should change the timerbench driver. There
> >>>>> are at least two ways of doing this:
> >>>>> use an rt_pipe
> >>>>>  modify the timerbench driver to implement only the nrt ioctl, using
> >>>>> vanilla linux services such as wait_event and wake_up.
> >>>> [As you reminded me of this unanswered question:]
> >>>> One may consider adding further modes _besides_ current kernel tests
> >>>> that do not rely on RTDM & native userland support (e.g. when
> >>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
> >>>> scenarios as well that must not be killed by such a change.
> >>> I think the current test scenarios for latency -t 1 and latency -t 2
> >>> are a bit misleading: they measure kernel-space latencies in presence
> >>> of user-space real-time tasks. When one runs latency -t 1 or latency
> >>> -t 2, one would expect that there are only kernel-space real-time
> >>> tasks.
> >> Whether they are misleading depends on your perspective. In fact, they
> >> measure in-kernel scenarios on the standard Xenomai setup, which
> >> includes userland RT task activity these days. Those scenarios mainly
> >> target driver use cases, not pure kernel-space applications.
> >>
> >> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
> >> would benefit from an additional set of test cases.
> >
> > Ok, I will not touch timerbench then, and implement another kernel module.
> >
>
> [Without considering all details]
> To achieve this independence from user-space RT threads, it should
> suffice to implement a kernel-based frontend for timerbench. This
> frontend would then either dump to syslog or open some pipe to tell
> userland about the benchmark results. What do you think?

My intent was to implement a protocol similar to that of timerbench,
but using an rt-pipe, and to keep using the latency test, adding new
options such as -t 3 and -t 4. But there may be a problem with this
approach: if we compile without CONFIG_XENO_OPT_PERVASIVE, latency
will not run at all. So it is probably simpler to implement a
klatency tool that just reads from the rt-pipe.
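
As a rough idea of what the reading side of such a klatency tool could
look like, here is a minimal sketch in plain C. The record layout
(struct klat_sample) and all klat_* names are assumptions for
illustration only; no wire format has been defined. The reader only
relies on read() on a file descriptor, so it can be tried on an
ordinary pipe, and it needs no real-time services on the user-space
side.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fixed-size record the kernel side would write per period. */
struct klat_sample {
    int32_t min_ns;
    int32_t avg_ns;
    int32_t max_ns;
    uint32_t overruns;
};

/* Running summary over all records read so far. */
struct klat_summary {
    int32_t min_ns;
    int32_t max_ns;
    uint64_t overruns;
    unsigned count;
};

/* Fold one record into the summary. */
static void klat_accumulate(struct klat_summary *sum,
                            const struct klat_sample *s)
{
    if (sum->count == 0 || s->min_ns < sum->min_ns)
        sum->min_ns = s->min_ns;
    if (sum->count == 0 || s->max_ns > sum->max_ns)
        sum->max_ns = s->max_ns;
    sum->overruns += s->overruns;
    sum->count++;
}

/*
 * Read whole records from fd until end of file.
 * Returns 0 on a clean end of file, -1 on a read error or if the
 * stream ends in the middle of a record.
 */
static int klat_read_all(int fd, struct klat_summary *sum)
{
    struct klat_sample s;
    size_t got = 0;

    for (;;) {
        ssize_t n = read(fd, (char *)&s + got, sizeof(s) - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return got == 0 ? 0 : -1;

        got += (size_t)n;
        if (got == sizeof(s)) {     /* a full record has arrived */
            klat_accumulate(sum, &s);
            got = 0;
        }
    }
}
```

The tool would open the Linux side of the rt-pipe (a character device
whose exact path depends on the setup), call klat_read_all() on the
descriptor with a zero-initialized summary, and print the summary once
the kernel side closes the pipe.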

-- 
                                               Gilles Chanteperdrix


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 14:20           ` Gilles Chanteperdrix
@ 2008-01-17 14:22             ` Jan Kiszka
  2008-01-17 15:37               ` Gilles Chanteperdrix
  2008-01-21 21:55               ` Gilles Chanteperdrix
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2008-01-17 14:22 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On Jan 17, 2008 3:16 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Gilles Chanteperdrix wrote:
>>> On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> after some (unsuccessful) time trying to instrument the code in a way
>>>>>>> that does not change the latency results completely, I found the
>>>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>>>>>>> So, here comes an update on this issue. The culprit is the user-space
>>>>>>> context switch, which flushes the processor cache with the nklock
>>>>>>> locked, irqs off.
>>>>>>>
>>>>>>> There are two things we could do:
>>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
>>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
>>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
>>>>>>> way, there are several problems we should solve:
>>>>>>>
>>>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
>>>>>>> this we can use the XNLOCK bit, set on whatever is
>>>>>>> xnpod_current_thread() when the cache flush occurs
>>>>>>>
>>>>>>> since the interrupt handler may modify the rescheduling bits, we need
>>>>>>> to test these bits in xnpod_schedule() epilogue and restart
>>>>>>> xnpod_schedule() if need be
>>>>>>>
>>>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
>>>>>>> involved in the context switch, for this the only solution I found is
>>>>>>> to add a bit to the thread mask meaning that the thread is currently
>>>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>>>>>>> to delete whatever thread was marked for deletion
>>>>>>>
>>>>>>> in case of migration with xnpod_migrate_thread, we do not want
>>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
>>>>>>> before the context switch on the source CPU is finished, for this we
>>>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
>>>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
>>>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
>>>>>>> target CPU.
>>>>>>>
>>>>>>> - avoid using user-space real-time tasks when running latency
>>>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>>>>>>> 2 case. This means that we should change the timerbench driver. There
>>>>>>> are at least two ways of doing this:
>>>>>>> use an rt_pipe
>>>>>>>  modify the timerbench driver to implement only the nrt ioctl, using
>>>>>>> vanilla linux services such as wait_event and wake_up.
>>>>>> [As you reminded me of this unanswered question:]
>>>>>> One may consider adding further modes _besides_ current kernel tests
>>>>>> that do not rely on RTDM & native userland support (e.g. when
>>>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
>>>>>> scenarios as well that must not be killed by such a change.
>>>>> I think the current test scenario for latency -t 1 and latency -t 2
>>>>> are a bit misleading: they measure kernel-space latencies in presence
>>>>> of user-space real-time tasks. When one runs latency -t 1 or latency
>>>>> -t 2, one would expect that there are only kernel-space real-time
>>>>> tasks.
>>>> Whether they are misleading depends on your perspective. In fact, they are
>>>> measuring in-kernel scenarios over the standard Xenomai setup, which
>>>> includes userland RT task activity these days. Those scenarios are mainly
>>>> targeting driver use cases, not pure kernel-space applications.
>>>>
>>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
>>>> would benefit from an additional set of test cases.
>>> Ok, I will not touch timerbench then, and implement another kernel module.
>>>
>> [Without considering all details]
>> To achieve this independence of user space RT thread, it should suffice
>> to implement a kernel-based frontend for timerbench. This frontend would
>> then either dump to syslog or open some pipe to tell userland about the
>> benchmark results. What do you think?
> 
> My intent was to implement a protocol similar to the one of
> timerbench, but using an rt-pipe, and continue to use the latency
> test, adding new options such as -t 3 and -t 4. But there may be
> problems with this approach: if we are compiling without
> CONFIG_XENO_OPT_PERVASIVE, latency will not run at all. So, it is
> probably simpler to implement a klatency that just reads from the
> rt-pipe.

But that klatency could perfectly reuse what timerbench already
provides, without code changes to the latter, in theory.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 14:22             ` Jan Kiszka
@ 2008-01-17 15:37               ` Gilles Chanteperdrix
  2008-01-31  7:43                 ` Gilles Chanteperdrix
  2008-01-21 21:55               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-17 15:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Jan 17, 2008 3:22 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>
> Gilles Chanteperdrix wrote:
> > My intent was to implement a protocol similar to the one of
> > timerbench, but using an rt-pipe, and continue to use the latency
> > test, adding new options such as -t 3 and -t 4. But there may be
> > problems with this approach: if we are compiling without
> > CONFIG_XENO_OPT_PERVASIVE, latency will not run at all. So, it is
> > probably simpler to implement a klatency that just reads from the
> > rt-pipe.
>
> But that klatency could perfectly reuse what timerbench already
> provides, without code changes to the latter, in theory.

That would be a kernel module then, but I also need some user-space
piece of software to do the computations and print the results.
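
As a rough illustration of what that user-space piece might look like,
the sketch below reads fixed-size samples from a file descriptor and
keeps min/max/sum. Everything in it is an assumption made for the
example: the struct lat_sample record layout, the lat_* names, and the
idea that the kernel side writes one record per sample. None of it is
taken from timerbench or from an actual klatency.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical wire format: one latency sample, in nanoseconds. */
struct lat_sample {
	int64_t latency_ns;
};

struct lat_stats {
	int64_t min_ns;
	int64_t max_ns;
	int64_t sum_ns;
	uint64_t count;
};

static void lat_stats_init(struct lat_stats *st)
{
	st->min_ns = INT64_MAX;
	st->max_ns = INT64_MIN;
	st->sum_ns = 0;
	st->count = 0;
}

static void lat_stats_add(struct lat_stats *st, int64_t latency_ns)
{
	if (latency_ns < st->min_ns)
		st->min_ns = latency_ns;
	if (latency_ns > st->max_ns)
		st->max_ns = latency_ns;
	st->sum_ns += latency_ns;
	st->count++;
}

/* Average over all samples seen so far; requires at least one sample. */
static int64_t lat_stats_avg(const struct lat_stats *st)
{
	assert(st->count > 0);
	return st->sum_ns / (int64_t)st->count;
}

/*
 * Consume whole records from fd until end of file.
 * Returns the number of samples read, or -1 on a read error or on a
 * truncated trailing record.
 */
static long lat_consume(int fd, struct lat_stats *st)
{
	struct lat_sample s;
	long n = 0;

	for (;;) {
		size_t got = 0;

		while (got < sizeof(s)) {
			ssize_t r = read(fd, (char *)&s + got, sizeof(s) - got);

			if (r < 0) {
				if (errno == EINTR)
					continue;	/* interrupted, retry */
				return -1;
			}
			if (r == 0)	/* EOF: fine only between records */
				return got == 0 ? n : -1;
			got += (size_t)r;
		}
		lat_stats_add(st, s.latency_ns);
		n++;
	}
}
```

Since lat_consume() only needs a descriptor, it can be exercised
off-target with an ordinary pipe(2); on the target the descriptor would
presumably come from opening the device node of the rt-pipe, and the
caller would print min/avg/max once the stream ends.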

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 14:22             ` Jan Kiszka
  2008-01-17 15:37               ` Gilles Chanteperdrix
@ 2008-01-21 21:55               ` Gilles Chanteperdrix
  1 sibling, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-21 21:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > Gilles Chanteperdrix wrote:
 > > My intent was to implement a protocol similar to the one of
 > > timerbench, but using an rt-pipe, and continue to use the latency
 > > test, adding new options such as -t 3 and -t 4. But there may be
 > > problems with this approach: if we are compiling without
 > > CONFIG_XENO_OPT_PERVASIVE, latency will not run at all. So, it is
 > > probably simpler to implement a klatency that just reads from the
 > > rt-pipe.
 > 
 > But that klatency could perfectly reuse what timerbench already
 > provides, without code changes to the latter, in theory.

In theory yes, but in practice timerbench's non-real-time ioctls use some
Linux services, so they cannot be called from the context of a
kernel-space task listening on a real-time pipe.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-02 10:31 [Xenomai-core] High latencies on ARM Gilles Chanteperdrix
  2008-01-17 10:42 ` Jan Kiszka
@ 2008-01-22 20:36 ` Gilles Chanteperdrix
  2008-01-22 21:46   ` Jan Kiszka
       [not found] ` <18315.63245.160672.547658@domain.hid>
  2 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-22 20:36 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 3021 bytes --]

Gilles Chanteperdrix wrote:
 > Hi,
 > 
 > after some (unsuccessful) time trying to instrument the code in a way
 > that does not change the latency results completely, I found the
 > reason for the high latency with latency -t 1 and latency -t 2 on ARM.
 > So, here comes an update on this issue. The culprit is the user-space
 > context switch, which flushes the processor cache with the nklock
 > locked, irqs off.
 > 
 > There are two things we could do:
 > - arrange for the ARM cache flush to happen with the nklock unlocked
 > and irqs enabled. This will improve interrupt latency (latency -t 2)
 > but obviously not scheduling latency (latency -t 1). If we go that
 > way, there are several problems we should solve:
 > 
 > we do not want interrupt handlers to reenter xnpod_schedule(), for
 > this we can use the XNLOCK bit, set on whatever is
 > xnpod_current_thread() when the cache flush occurs
 > 
 > since the interrupt handler may modify the rescheduling bits, we need
 > to test these bits in xnpod_schedule() epilogue and restart
 > xnpod_schedule() if need be
 > 
 > we do not want xnpod_delete_thread() to delete one of the two threads
 > involved in the context switch, for this the only solution I found is
 > to add a bit to the thread mask meaning that the thread is currently
 > switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
 > to delete whatever thread was marked for deletion
 > 
 > in case of migration with xnpod_migrate_thread, we do not want
 > xnpod_schedule() on the target CPU to switch to the migrated thread
 > before the context switch on the source CPU is finished, for this we
 > can avoid setting the resched bit in xnpod_migrate_thread(), detect
 > the condition in xnpod_schedule() epilogue and set the rescheduling
 > bits so that xnpod_schedule is restarted and send the IPI to the
 > target CPU.

Please find attached a patch implementing these ideas. This adds some
clutter, which I would be happy to reduce. Better ideas are welcome.
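
To make the control flow easier to follow, here is a toy, single-CPU,
user-space model of two of the ideas above: the "currently switching"
bit and the re-test of the resched bit in the epilogue. It is not the
nucleus code. toy_thread, toy_sched, T_SWLOCK and toy_irq are
hypothetical names, the ready queue is reduced to a single slot, and
the "lock" is only a flag recording when the model would hold nklock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define T_SWLOCK 0x1u	/* toy counterpart of XNSWLOCK: switch in progress */

struct toy_thread {
	unsigned int state;
};

struct toy_sched {
	struct toy_thread *current;	/* running thread */
	struct toy_thread *next;	/* one-slot "ready queue", NULL if empty */
	struct toy_thread *irq_wakes;	/* thread a simulated IRQ wakes, or NULL */
	bool resched;			/* rescheduling requested */
	bool locked;			/* true while the model "holds nklock" */
	int switches;			/* completed context switches */
	int bounced;			/* nested calls refused due to T_SWLOCK */
};

static void toy_schedule(struct toy_sched *s);

/* Simulated interrupt arriving while the lock is dropped. */
static void toy_irq(struct toy_sched *s)
{
	assert(!s->locked);
	s->next = s->irq_wakes;
	s->irq_wakes = NULL;
	s->resched = true;
	toy_schedule(s);	/* bounces: the current thread has T_SWLOCK */
}

static void toy_schedule(struct toy_sched *s)
{
	struct toy_thread *out, *in;

	s->locked = true;
restart:
	/* Refuse to reenter mid-switch; the resched bit stays set. */
	if (s->current->state & T_SWLOCK) {
		s->bounced++;
		goto unlock_and_exit;
	}
	if (!s->resched)
		goto unlock_and_exit;
	s->resched = false;
	if (s->next == NULL)
		goto unlock_and_exit;

	out = s->current;
	in = s->next;
	s->next = NULL;
	out->state |= T_SWLOCK;
	in->state |= T_SWLOCK;
	s->current = in;

	/* Window where the patch drops nklock around switch_mm(). */
	s->locked = false;
	if (s->irq_wakes != NULL)
		toy_irq(s);
	s->locked = true;

	/* Epilogue: clear the bits, then honour requests made meanwhile. */
	out->state &= ~T_SWLOCK;
	in->state &= ~T_SWLOCK;
	s->switches++;
	if (s->resched)
		goto restart;
unlock_and_exit:
	s->locked = false;
}
```

With threads a, b and c, a switch from a to b during which the
simulated interrupt wakes c ends with c running: the nested call made
from the interrupt is bounced once, and the outer call restarts and
performs a second switch. Zombie finalization and the SMP migration
case are deliberately left out of the model.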


 > 
 > - avoid using user-space real-time tasks when running latency
 > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
 > 2 case. This means that we should change the timerbench driver. There
 > are at least two ways of doing this:
 > use an rt_pipe
 >  modify the timerbench driver to implement only the nrt ioctl, using
 > vanilla linux services such as wait_event and wake_up.
 > 
 > What do you think ?

So, what do you think is the best way to change the timerbench driver?
* Use an rt_pipe? Pros: allows running latency -t 1 and latency -t 2 even
  if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off; cons: makes
  timerbench non-portable to other implementations of RTDM, e.g. RTDM
  over RTAI or the version of RTDM which runs over vanilla Linux.
* Modify the timerbench driver to implement only nrt ioctls? Pros:
  better driver portability; cons: latency would still need
  CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-unlocked-arm-ctx-switch.diff --]
[-- Type: text/plain, Size: 18051 bytes --]

Index: include/asm-arm/bits/pod.h
===================================================================
--- include/asm-arm/bits/pod.h	(revision 3405)
+++ include/asm-arm/bits/pod.h	(working copy)
@@ -67,41 +67,41 @@
 #endif /* TIF_MMSWITCH_INT */
 }
 
-static inline void xnarch_switch_to(xnarchtcb_t * out_tcb, xnarchtcb_t * in_tcb)
-{
-	struct task_struct *prev = out_tcb->active_task;
-	struct mm_struct *prev_mm = out_tcb->active_mm;
-	struct task_struct *next = in_tcb->user_task;
-
-
-	if (likely(next != NULL)) {
-		in_tcb->active_task = next;
-		in_tcb->active_mm = in_tcb->mm;
-		rthal_clear_foreign_stack(&rthal_domain);
-	} else {
-		in_tcb->active_task = prev;
-		in_tcb->active_mm = prev_mm;
-		rthal_set_foreign_stack(&rthal_domain);
-	}
-
-	if (prev_mm != in_tcb->active_mm) {
-		/* Switch to new user-space thread? */
-		if (in_tcb->active_mm)
-			switch_mm(prev_mm, in_tcb->active_mm, next);
-		if (!next->mm)
-			enter_lazy_tlb(prev_mm, next);
-	}
-
-	/* Kernel-to-kernel context switch. */
-	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);
+#define xnarch_switch_to(_out_tcb, _in_tcb, lock)			\
+{									\
+	xnarchtcb_t *in_tcb = (_in_tcb);				\
+	xnarchtcb_t *out_tcb = (_out_tcb);				\
+	struct task_struct *prev = out_tcb->active_task;		\
+	struct mm_struct *prev_mm = out_tcb->active_mm;			\
+	struct task_struct *next = in_tcb->user_task;			\
+									\
+									\
+	if (likely(next != NULL)) {					\
+		in_tcb->active_task = next;				\
+		in_tcb->active_mm = in_tcb->mm;				\
+		rthal_clear_foreign_stack(&rthal_domain);		\
+	} else {							\
+		in_tcb->active_task = prev;				\
+		in_tcb->active_mm = prev_mm;				\
+		rthal_set_foreign_stack(&rthal_domain);			\
+	}								\
+									\
+	if (prev_mm != in_tcb->active_mm) {				\
+		/* Switch to new user-space thread? */			\
+		if (in_tcb->active_mm) {				\
+			spl_t ignored;					\
+			xnlock_clear_irqon(lock);			\
+			switch_mm(prev_mm, in_tcb->active_mm, next);	\
+			xnlock_get_irqsave(lock, ignored);		\
+		}							\
+		if (!next->mm)						\
+			enter_lazy_tlb(prev_mm, next);			\
+	}								\
+									\
+	/* Kernel-to-kernel context switch. */				\
+	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);		\
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-arm/system.h
===================================================================
--- include/asm-arm/system.h	(revision 3405)
+++ include/asm-arm/system.h	(working copy)
@@ -31,6 +31,8 @@
 
 #define XNARCH_THREAD_STACKSZ   4096
 
+#define XNARCH_WANT_UNLOCKED_CTXSW
+
 #define xnarch_stack_size(tcb)  ((tcb)->stacksize)
 #define xnarch_user_task(tcb)   ((tcb)->user_task)
 #define xnarch_user_pid(tcb)    ((tcb)->user_task->pid)
Index: include/nucleus/thread.h
===================================================================
--- include/nucleus/thread.h	(revision 3405)
+++ include/nucleus/thread.h	(working copy)
@@ -61,6 +61,7 @@
 #define XNFPU     0x00100000 /**< Thread uses FPU */
 #define XNSHADOW  0x00200000 /**< Shadow thread */
 #define XNROOT    0x00400000 /**< Root thread (that is, Linux/IDLE) */
+#define XNSWLOCK  0x00800000 /**< Thread is currently switching context. */
 
 /*! @} */ /* Ends doxygen comment group: nucleus_state_flags */
 
Index: include/nucleus/pod.h
===================================================================
--- include/nucleus/pod.h	(revision 3405)
+++ include/nucleus/pod.h	(working copy)
@@ -139,6 +139,11 @@
 
 	xntimer_t htimer;	/*!< Host timer. */
 
+	xnqueue_t zombies;
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_t *lastthread;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 } xnsched_t;
 
 #define nkpod (&nkpod_struct)
@@ -238,6 +243,14 @@
 }
 #endif /* CONFIG_XENO_OPT_WATCHDOG */
 
+void __xnpod_finalize_zombies(xnsched_t *sched);
+
+static inline void xnpod_finalize_zombies(xnsched_t *sched)
+{
+	if (!emptyq_p(&sched->zombies))
+		__xnpod_finalize_zombies(sched);
+}
+
 	/* -- Beginning of the exported interface */
 
 #define xnpod_sched_slot(cpu) \
Index: ksrc/nucleus/pod.c
===================================================================
--- ksrc/nucleus/pod.c	(revision 3415)
+++ ksrc/nucleus/pod.c	(working copy)
@@ -292,6 +292,7 @@
 #endif /* CONFIG_SMP */
 		xntimer_set_name(&sched->htimer, htimer_name);
 		xntimer_set_sched(&sched->htimer, sched);
+		initq(&sched->zombies);
 	}
 
 	xnlock_put_irqrestore(&nklock, s);
@@ -545,63 +546,35 @@
 	__clrbits(sched->status, XNKCOUT);
 }
 
-static inline void xnpod_switch_zombie(xnthread_t *threadout,
-				       xnthread_t *threadin)
+void __xnpod_finalize_zombies(xnsched_t *sched)
 {
-	/* Must be called with nklock locked, interrupts off. */
-	xnsched_t *sched = xnpod_current_sched();
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	int shadow = xnthread_test_state(threadout, XNSHADOW);
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
+	xnholder_t *holder;
 
-	trace_mark(xn_nucleus_sched_finalize,
-		   "thread_out %p thread_out_name %s "
-		   "thread_in %p thread_in_name %s",
-		   threadout, xnthread_name(threadout),
-		   threadin, xnthread_name(threadin));
+	while ((holder = getq(&sched->zombies))) {
+		xnthread_t *thread = link2thread(holder, glink);
 
-	if (!emptyq_p(&nkpod->tdeleteq) && !xnthread_test_state(threadout, XNROOT)) {
-		trace_mark(xn_nucleus_thread_callout,
-			   "thread %p thread_name %s hook %s",
-			   threadout, xnthread_name(threadout), "DELETE");
-		xnpod_fire_callouts(&nkpod->tdeleteq, threadout);
-	}
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+		if (thread == sched->runthread) {
+			appendq(&sched->zombies, &thread->glink);
+			break;
+		}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 
-	sched->runthread = threadin;
+		/* Must be called with nklock locked, interrupts off. */
+		trace_mark(xn_nucleus_sched_finalize,
+			   "thread_out %p thread_out_name %s",
+			   thread, xnthread_name(thread));
 
-	if (xnthread_test_state(threadin, XNROOT)) {
-		xnpod_reset_watchdog(sched);
-		xnfreesync();
-		xnarch_enter_root(xnthread_archtcb(threadin));
+		if (!emptyq_p(&nkpod->tdeleteq)
+		    && !xnthread_test_state(thread, XNROOT)) {
+			trace_mark(xn_nucleus_thread_callout,
+				   "thread %p thread_name %s hook %s",
+				   thread, xnthread_name(thread), "DELETE");
+			xnpod_fire_callouts(&nkpod->tdeleteq, thread);
+		}
+
+		xnthread_cleanup_tcb(thread);
 	}
-
-	/* FIXME: Catch 22 here, whether we choose to run on an invalid
-	   stack (cleanup then hooks), or to access the TCB space shortly
-	   after it has been freed while non-preemptible (hooks then
-	   cleanup)... Option #2 is current. */
-
-	xnthread_cleanup_tcb(threadout);
-
-	xnstat_exectime_finalize(sched, &threadin->stat.account);
-
-	xnarch_finalize_and_switch(xnthread_archtcb(threadout),
-				   xnthread_archtcb(threadin));
-
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	xnarch_trace_pid(xnthread_user_task(threadin) ?
-			 xnarch_user_pid(xnthread_archtcb(threadin)) : -1,
-			 xnthread_current_priority(threadin));
-
-	if (shadow)
-		/* Reap the user-space mate of a deleted real-time shadow.
-		   The Linux task has resumed into the Linux domain at the
-		   last code location executed by the shadow. Remember
-		   that both sides use the Linux task's stack. */
-		xnshadow_exit();
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
-
-	xnpod_fatal("zombie thread %s (%p) would not die...", threadout->name,
-		    threadout);
 }
 
 /*! 
@@ -1211,11 +1184,17 @@
 
 	xnthread_set_state(thread, XNZOMBIE);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW		
+	if (sched->runthread == thread
+	    || xnthread_test_state(thread, XNSWLOCK)) {
+#else /* XNARCH_WANT_UNLOCKED_CTXSW */
 	if (sched->runthread == thread) {
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 		/* We first need to elect a new runthread before switching out
 		   the current one forever. Use the thread zombie state to go
 		   through the rescheduling procedure then actually destroy
 		   the thread object. */
+		appendq(&sched->zombies, &thread->glink);
 		xnsched_set_resched(sched);
 		xnpod_schedule();
 	} else {
@@ -1788,7 +1767,7 @@
 	xnlock_get_irqsave(&nklock, s);
 
 	trace_mark(xn_nucleus_thread_renice,
 		   "thread %p thread_name %s priority %d",
 		   thread, xnthread_name(thread), prio);
 
 	oldprio = thread->cprio;
@@ -1899,7 +1878,11 @@
 
 	/* Put thread in the ready queue of the destination CPU's scheduler. */
 	xnpod_resume_thread(thread, 0);
-
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	/* Clear the resched bit so that the migrated thread is not
+	   scheduled on the destination CPU yet. */
+	xnsched_clr_resched(thread->sched);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 	xnpod_schedule();
 
 	/* Reset execution time measurement period so that we don't mess up
@@ -2140,6 +2123,21 @@
 
 void xnpod_welcome_thread(xnthread_t *thread, int imask)
 {
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnsched_t *sched = thread->sched;
+
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched) {
+		xnsched_set_resched(sched);
+		xnsched_set_resched(sched->lastthread->sched);
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
+	xnpod_finalize_zombies(thread->sched);
+
 	trace_mark(xn_nucleus_thread_boot, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
@@ -2174,6 +2172,11 @@
 
 	xnlock_clear_irqoff(&nklock);
 	splexit(!!imask);
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_tst_resched(sched))
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 }
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -2373,6 +2376,7 @@
 	xnthread_t *threadout, *threadin, *runthread;
 	xnpholder_t *pholder;
 	xnsched_t *sched;
+	int zombie;
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	int need_resched;
 #endif /* CONFIG_SMP || XENO_DEBUG(NUCLEUS) */
@@ -2402,7 +2406,9 @@
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
 			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
 			 xnthread_current_priority(runthread));
-
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      restart:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	need_resched = xnsched_tst_resched(sched);
 #endif
@@ -2426,16 +2432,24 @@
 
 #endif /* CONFIG_SMP */
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnthread_test_state(runthread, XNSWLOCK))
+		goto unlock_and_exit;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	/* Clear the rescheduling bit */
 	xnsched_clr_resched(sched);
 
+	zombie = xnthread_test_state(runthread, XNZOMBIE);
 	if (!xnthread_test_state(runthread, XNTHREAD_BLOCK_BITS | XNZOMBIE)) {
 
 		/* Do not preempt the current thread if it holds the
 		 * scheduler lock. */
 
-		if (xnthread_test_state(runthread, XNLOCK))
+		if (xnthread_test_state(runthread, XNLOCK)) {
+			xnsched_set_resched(sched);
 			goto signal_unlock_and_exit;
+		}
 
 		pholder = sched_getheadpq(&sched->readyq);
 
@@ -2491,9 +2505,6 @@
 	shadow = xnthread_test_state(threadout, XNSHADOW);
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
-	if (xnthread_test_state(threadout, XNZOMBIE))
-		xnpod_switch_zombie(threadout, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(threadout, XNROOT))
@@ -2507,8 +2518,18 @@
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	sched->lastthread = threadout;
+	xnthread_set_state(threadout, XNSWLOCK);
+	xnthread_set_state(threadin, XNSWLOCK);
+
 	xnarch_switch_to(xnthread_archtcb(threadout),
+			 xnthread_archtcb(threadin),
+			 &nklock);
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */	
+	xnarch_switch_to(xnthread_archtcb(threadout),
 			 xnthread_archtcb(threadin));
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 
 #ifdef CONFIG_SMP
 	/* If threadout migrated while suspended, sched is no longer correct. */
@@ -2525,23 +2546,27 @@
 #ifdef CONFIG_XENO_OPT_PERVASIVE
 	/* Test whether we are relaxing a thread. In such a case, we are here the
 	   epilogue of Linux' schedule, and should skip xnpod_schedule epilogue. */
-	if (shadow && xnthread_test_state(runthread, XNROOT)) {
-		spl_t ignored;
-		/* Shadow on entry and root without shadow extension on exit? 
-		   Mmmm... This must be the user-space mate of a deleted real-time
-		   shadow we've just rescheduled in the Linux domain to have it
-		   exit properly.  Reap it now. */
-		if (xnshadow_thrptd(current) == NULL)
-			xnshadow_exit();
+	if (shadow && xnthread_test_state(runthread, XNROOT))
+		goto relax_epilogue;
+#endif /* CONFIG_XENO_OPT_PERVASIVE */
 
-		/* We need to relock nklock here, since it is not locked and
-		   the caller may expect it to be locked. */
-		xnlock_get_irqsave(&nklock, ignored);
-		xnlock_put_irqrestore(&nklock, s);
-		return;
+	if (zombie)
+		xnpod_fatal("zombie thread %s (%p) would not die...",
+			    threadout->name, threadout);
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched) {
+		xnsched_set_resched(sched);
+		xnsched_set_resched(sched->lastthread->sched);
 	}
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 
+	xnpod_finalize_zombies(sched);
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
@@ -2558,12 +2583,42 @@
 		xnpod_fire_callouts(&nkpod->tswitchq, runthread);
 	}
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_tst_resched(sched)) {
+		if (xnthread_signaled_p(runthread))
+			xnpod_dispatch_signals();
+		goto restart;
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
       signal_unlock_and_exit:
 
 	if (xnthread_signaled_p(runthread))
 		xnpod_dispatch_signals();
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      unlock_and_exit:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 	xnlock_put_irqrestore(&nklock, s);
+	return;
+
+#ifdef CONFIG_XENO_OPT_PERVASIVE
+      relax_epilogue:
+	{
+		spl_t ignored;
+		/* Shadow on entry and root without shadow extension on exit? 
+		   Mmmm... This must be the user-space mate of a deleted real-time
+		   shadow we've just rescheduled in the Linux domain to have it
+		   exit properly.  Reap it now. */
+		if (xnshadow_thrptd(current) == NULL)
+			xnshadow_exit();
+
+		/* We need to relock nklock here, since it is not locked and
+		   the caller may expect it to be locked. */
+		xnlock_get_irqsave(&nklock, ignored);
+		xnlock_put_irqrestore(&nklock, s);
+	}
+#endif /* CONFIG_XENO_OPT_PERVASIVE */
 }
 
 /*! 
@@ -2664,9 +2719,6 @@
 	if (threadin == runthread)
 		return;		/* No switch. */
 
-	if (xnthread_test_state(runthread, XNZOMBIE))
-		xnpod_switch_zombie(runthread, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(runthread, XNROOT))
@@ -2684,18 +2736,41 @@
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	sched->lastthread = runthread;
+	xnthread_set_state(runthread, XNSWLOCK);
+	xnthread_set_state(threadin, XNSWLOCK);
+
 	xnarch_switch_to(xnthread_archtcb(runthread),
+			 xnthread_archtcb(threadin),
+			 &nklock);
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */	
+	xnarch_switch_to(xnthread_archtcb(runthread),
 			 xnthread_archtcb(threadin));
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 
-	xnarch_trace_pid(xnthread_user_task(runthread) ?
-			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
-			 xnthread_current_priority(runthread));
-
 #ifdef CONFIG_SMP
 	/* If runthread migrated while suspended, sched is no longer correct. */
 	sched = xnpod_current_sched();
 #endif
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched) {
+		xnsched_set_resched(sched);
+		xnsched_set_resched(sched->lastthread->sched);
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
+	xnpod_finalize_zombies(sched);
+
+	xnarch_trace_pid(xnthread_user_task(runthread) ?
+			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
+			 xnthread_current_priority(runthread));
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
@@ -2704,6 +2779,11 @@
 	if (nkpod->schedhook && runthread == sched->runthread)
 		nkpod->schedhook(runthread, XNRUNNING);
 #endif /* __XENO_SIM__ */
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_tst_resched(sched))
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 }
 
 /*! 
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 3405)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -1059,6 +1059,7 @@
 	struct task_struct *this_task = current;
 	struct __gatekeeper *gk;
 	xnthread_t *thread;
+	xnsched_t *sched;
 	int gk_cpu;
 
 redo:
@@ -1124,9 +1125,23 @@
 	}
 
 	/* "current" is now running into the Xenomai domain. */
+	sched = xnpod_current_sched();
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched) {
+		xnsched_set_resched(sched);
+		xnsched_set_resched(sched->lastthread->sched);
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
+	xnpod_finalize_zombies(sched);
+
 #ifdef CONFIG_XENO_HW_FPU
-	xnpod_switch_fpu(xnpod_current_sched());
+	xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
 
 	xnarch_schedule_tail(this_task);
@@ -1149,6 +1164,11 @@
 	trace_mark(xn_nucleus_shadow_hardened, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_tst_resched(sched))
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	return 0;
 }
 


* Re: [Xenomai-core] High latencies on ARM.
       [not found] ` <18315.63245.160672.547658@domain.hid>
@ 2008-01-22 20:36   ` Gilles Chanteperdrix
  2008-01-23 17:48     ` Philippe Gerum
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-22 20:36 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 1355 bytes --]

Gilles Chanteperdrix wrote:
 > Please find attached a patch implementing these ideas. This adds some
 > clutter, which I would be happy to reduce. Better ideas are welcome.
 > 

OK. Here is a new version of the patch, this time split into two parts,
which should hopefully make it more readable.
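
To make the first part easier to follow: the idea of the self-deletion
rework is that a thread deleting itself is only queued, and is actually
cleaned up later, from the context of whichever thread runs next. Below
is a minimal sketch of that pattern in plain C. It is not the nucleus
code: struct thread, struct sched, sched_queue_zombie() and
sched_finalize_zombies() are hypothetical stand-ins for xnthread_t,
xnsched_t and the zombie queue handling in the attached diffs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for xnthread_t / xnsched_t; not the nucleus types. */
struct thread {
	const char *name;
	int cleaned;            /* set once the TCB would have been released */
	struct thread *next;    /* link in the per-scheduler zombie list */
};

struct sched {
	struct thread *zombies_head;
	struct thread *zombies_tail;
	struct thread *runthread;   /* thread currently running on this CPU */
};

/* Queue a self-deleting thread; it keeps running until it is switched out. */
static void sched_queue_zombie(struct sched *s, struct thread *t)
{
	t->next = NULL;
	if (s->zombies_tail)
		s->zombies_tail->next = t;
	else
		s->zombies_head = t;
	s->zombies_tail = t;
}

/* Meant to be called after a context switch, from the incoming thread.
 * Returns the number of threads reaped. */
static int sched_finalize_zombies(struct sched *s)
{
	int reaped = 0;
	struct thread *t;

	while ((t = s->zombies_head) != NULL) {
		/* Never reap the thread whose stack we are still running on. */
		if (t == s->runthread)
			break;
		s->zombies_head = t->next;
		if (s->zombies_head == NULL)
			s->zombies_tail = NULL;
		t->next = NULL;
		t->cleaned = 1;     /* placeholder for the real TCB cleanup */
		reaped++;
	}
	return reaped;
}
```

The point of the design is that the cleanup never runs on the stack of
the thread being destroyed, so there is no need for a special
"finalize and switch" path in the architecture code.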

 > 
 >  > 
 >  > - avoid using user-space real-time tasks when running latency
 >  > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
 >  > 2 case. This means that we should change the timerbench driver. There
 >  > are at least two ways of doing this:
 >  > use an rt_pipe
 >  >  modify the timerbench driver to implement only the nrt ioctl, using
 >  > vanilla linux services such as wait_event and wake_up.
 >  > 
 >  > What do you think ?
 > 
 > So, what do you think is the best way to change the timerbench driver?
 > * use an rt_pipe? Pros: allows running latency -t 1 and latency -t 2 even
 >  if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off; cons: makes
 >  the timerbench driver non-portable to other implementations of RTDM, e.g.
 >  RTDM over RTAI or the version of RTDM which runs over vanilla Linux
 > * modify the timerbench driver to implement only nrt ioctls? Pros:
 >   better driver portability; cons: latency would still need
 >   CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-rework-self-deletion.diff --]
[-- Type: text/plain, Size: 9127 bytes --]

Index: include/nucleus/pod.h
===================================================================
--- include/nucleus/pod.h	(revision 3405)
+++ include/nucleus/pod.h	(working copy)
@@ -139,6 +139,7 @@
 
 	xntimer_t htimer;	/*!< Host timer. */
 
+	xnqueue_t zombies;
 } xnsched_t;
 
 #define nkpod (&nkpod_struct)
@@ -238,6 +239,14 @@
 }
 #endif /* CONFIG_XENO_OPT_WATCHDOG */
 
+void __xnpod_finalize_zombies(xnsched_t *sched);
+
+static inline void xnpod_finalize_zombies(xnsched_t *sched)
+{
+	if (!emptyq_p(&sched->zombies))
+		__xnpod_finalize_zombies(sched);
+}
+
 	/* -- Beginning of the exported interface */
 
 #define xnpod_sched_slot(cpu) \
Index: ksrc/nucleus/pod.c
===================================================================
--- ksrc/nucleus/pod.c	(revision 3415)
+++ ksrc/nucleus/pod.c	(working copy)
@@ -292,6 +292,7 @@
 #endif /* CONFIG_SMP */
 		xntimer_set_name(&sched->htimer, htimer_name);
 		xntimer_set_sched(&sched->htimer, sched);
+		initq(&sched->zombies);
 	}
 
 	xnlock_put_irqrestore(&nklock, s);
@@ -545,63 +546,28 @@
 	__clrbits(sched->status, XNKCOUT);
 }
 
-static inline void xnpod_switch_zombie(xnthread_t *threadout,
-				       xnthread_t *threadin)
+void __xnpod_finalize_zombies(xnsched_t *sched)
 {
-	/* Must be called with nklock locked, interrupts off. */
-	xnsched_t *sched = xnpod_current_sched();
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	int shadow = xnthread_test_state(threadout, XNSHADOW);
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
+	xnholder_t *holder;
 
-	trace_mark(xn_nucleus_sched_finalize,
-		   "thread_out %p thread_out_name %s "
-		   "thread_in %p thread_in_name %s",
-		   threadout, xnthread_name(threadout),
-		   threadin, xnthread_name(threadin));
+	while ((holder = getq(&sched->zombies))) {
+		xnthread_t *thread = link2thread(holder, glink);
 
-	if (!emptyq_p(&nkpod->tdeleteq) && !xnthread_test_state(threadout, XNROOT)) {
-		trace_mark(xn_nucleus_thread_callout,
-			   "thread %p thread_name %s hook %s",
-			   threadout, xnthread_name(threadout), "DELETE");
-		xnpod_fire_callouts(&nkpod->tdeleteq, threadout);
-	}
+		/* Must be called with nklock locked, interrupts off. */
+		trace_mark(xn_nucleus_sched_finalize,
+			   "thread_out %p thread_out_name %s",
+			   thread, xnthread_name(thread));
 
-	sched->runthread = threadin;
+		if (!emptyq_p(&nkpod->tdeleteq)
+		    && !xnthread_test_state(thread, XNROOT)) {
+			trace_mark(xn_nucleus_thread_callout,
+				   "thread %p thread_name %s hook %s",
+				   thread, xnthread_name(thread), "DELETE");
+			xnpod_fire_callouts(&nkpod->tdeleteq, thread);
+		}
 
-	if (xnthread_test_state(threadin, XNROOT)) {
-		xnpod_reset_watchdog(sched);
-		xnfreesync();
-		xnarch_enter_root(xnthread_archtcb(threadin));
+		xnthread_cleanup_tcb(thread);
 	}
-
-	/* FIXME: Catch 22 here, whether we choose to run on an invalid
-	   stack (cleanup then hooks), or to access the TCB space shortly
-	   after it has been freed while non-preemptible (hooks then
-	   cleanup)... Option #2 is current. */
-
-	xnthread_cleanup_tcb(threadout);
-
-	xnstat_exectime_finalize(sched, &threadin->stat.account);
-
-	xnarch_finalize_and_switch(xnthread_archtcb(threadout),
-				   xnthread_archtcb(threadin));
-
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	xnarch_trace_pid(xnthread_user_task(threadin) ?
-			 xnarch_user_pid(xnthread_archtcb(threadin)) : -1,
-			 xnthread_current_priority(threadin));
-
-	if (shadow)
-		/* Reap the user-space mate of a deleted real-time shadow.
-		   The Linux task has resumed into the Linux domain at the
-		   last code location executed by the shadow. Remember
-		   that both sides use the Linux task's stack. */
-		xnshadow_exit();
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
-
-	xnpod_fatal("zombie thread %s (%p) would not die...", threadout->name,
-		    threadout);
 }
 
 /*! 
@@ -1216,6 +1182,7 @@
 		   the current one forever. Use the thread zombie state to go
 		   through the rescheduling procedure then actually destroy
 		   the thread object. */
+		appendq(&sched->zombies, &thread->glink);
 		xnsched_set_resched(sched);
 		xnpod_schedule();
 	} else {
@@ -2140,6 +2107,8 @@
 
 void xnpod_welcome_thread(xnthread_t *thread, int imask)
 {
+	xnpod_finalize_zombies(thread->sched);
+
 	trace_mark(xn_nucleus_thread_boot, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
@@ -2373,6 +2342,7 @@
 	xnthread_t *threadout, *threadin, *runthread;
 	xnpholder_t *pholder;
 	xnsched_t *sched;
+	int zombie;
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	int need_resched;
 #endif /* CONFIG_SMP || XENO_DEBUG(NUCLEUS) */
@@ -2402,7 +2372,6 @@
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
 			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
 			 xnthread_current_priority(runthread));
-
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	need_resched = xnsched_tst_resched(sched);
 #endif
@@ -2429,13 +2398,16 @@
 	/* Clear the rescheduling bit */
 	xnsched_clr_resched(sched);
 
+	zombie = xnthread_test_state(runthread, XNZOMBIE);
 	if (!xnthread_test_state(runthread, XNTHREAD_BLOCK_BITS | XNZOMBIE)) {
 
 		/* Do not preempt the current thread if it holds the
 		 * scheduler lock. */
 
-		if (xnthread_test_state(runthread, XNLOCK))
+		if (xnthread_test_state(runthread, XNLOCK)) {
+			xnsched_set_resched(sched);
 			goto signal_unlock_and_exit;
+		}
 
 		pholder = sched_getheadpq(&sched->readyq);
 
@@ -2491,9 +2463,6 @@
 	shadow = xnthread_test_state(threadout, XNSHADOW);
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
-	if (xnthread_test_state(threadout, XNZOMBIE))
-		xnpod_switch_zombie(threadout, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(threadout, XNROOT))
@@ -2525,23 +2494,16 @@
 #ifdef CONFIG_XENO_OPT_PERVASIVE
 	/* Test whether we are relaxing a thread. In such a case, we are here the
 	   epilogue of Linux' schedule, and should skip xnpod_schedule epilogue. */
-	if (shadow && xnthread_test_state(runthread, XNROOT)) {
-		spl_t ignored;
-		/* Shadow on entry and root without shadow extension on exit? 
-		   Mmmm... This must be the user-space mate of a deleted real-time
-		   shadow we've just rescheduled in the Linux domain to have it
-		   exit properly.  Reap it now. */
-		if (xnshadow_thrptd(current) == NULL)
-			xnshadow_exit();
-
-		/* We need to relock nklock here, since it is not locked and
-		   the caller may expect it to be locked. */
-		xnlock_get_irqsave(&nklock, ignored);
-		xnlock_put_irqrestore(&nklock, s);
-		return;
-	}
+	if (shadow && xnthread_test_state(runthread, XNROOT))
+		goto relax_epilogue;
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
+	if (zombie)
+		xnpod_fatal("zombie thread %s (%p) would not die...",
+			    threadout->name, threadout);
+
+	xnpod_finalize_zombies(sched);
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
@@ -2564,6 +2526,25 @@
 		xnpod_dispatch_signals();
 
 	xnlock_put_irqrestore(&nklock, s);
+	return;
+
+#ifdef CONFIG_XENO_OPT_PERVASIVE
+      relax_epilogue:
+	{
+		spl_t ignored;
+		/* Shadow on entry and root without shadow extension on exit? 
+		   Mmmm... This must be the user-space mate of a deleted real-time
+		   shadow we've just rescheduled in the Linux domain to have it
+		   exit properly.  Reap it now. */
+		if (xnshadow_thrptd(current) == NULL)
+			xnshadow_exit();
+
+		/* We need to relock nklock here, since it is not locked and
+		   the caller may expect it to be locked. */
+		xnlock_get_irqsave(&nklock, ignored);
+		xnlock_put_irqrestore(&nklock, s);
+	}
+#endif /* CONFIG_XENO_OPT_PERVASIVE */
 }
 
 /*! 
@@ -2664,9 +2645,6 @@
 	if (threadin == runthread)
 		return;		/* No switch. */
 
-	if (xnthread_test_state(runthread, XNZOMBIE))
-		xnpod_switch_zombie(runthread, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(runthread, XNROOT))
@@ -2687,15 +2665,17 @@
 	xnarch_switch_to(xnthread_archtcb(runthread),
 			 xnthread_archtcb(threadin));
 
-	xnarch_trace_pid(xnthread_user_task(runthread) ?
-			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
-			 xnthread_current_priority(runthread));
-
 #ifdef CONFIG_SMP
 	/* If runthread migrated while suspended, sched is no longer correct. */
 	sched = xnpod_current_sched();
 #endif
 
+	xnpod_finalize_zombies(sched);
+
+	xnarch_trace_pid(xnthread_user_task(runthread) ?
+			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
+			 xnthread_current_priority(runthread));
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 3405)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -1059,6 +1059,7 @@
 	struct task_struct *this_task = current;
 	struct __gatekeeper *gk;
 	xnthread_t *thread;
+	xnsched_t *sched;
 	int gk_cpu;
 
 redo:
@@ -1124,9 +1125,12 @@
 	}
 
 	/* "current" is now running into the Xenomai domain. */
+	sched = xnpod_current_sched();
 
+	xnpod_finalize_zombies(sched);
+
 #ifdef CONFIG_XENO_HW_FPU
-	xnpod_switch_fpu(xnpod_current_sched());
+	xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
 
 	xnarch_schedule_tail(this_task);

[-- Attachment #3: xeno-unlocked-arm-ctx-switch.2.diff --]
[-- Type: text/plain, Size: 11988 bytes --]

diff -Naurdp -x .svn -x '*~' rework_self_deletion/include/asm-arm/bits/pod.h trunk/include/asm-arm/bits/pod.h
--- rework_self_deletion/include/asm-arm/bits/pod.h	2008-01-15 21:14:03.000000000 +0100
+++ trunk/include/asm-arm/bits/pod.h	2008-01-15 00:43:50.000000000 +0100
@@ -67,39 +67,39 @@ static inline void xnarch_enter_root(xna
 #endif /* TIF_MMSWITCH_INT */
 }
 
-static inline void xnarch_switch_to(xnarchtcb_t * out_tcb, xnarchtcb_t * in_tcb)
-{
-	struct task_struct *prev = out_tcb->active_task;
-	struct mm_struct *prev_mm = out_tcb->active_mm;
-	struct task_struct *next = in_tcb->user_task;
-
-
-	if (likely(next != NULL)) {
-		in_tcb->active_task = next;
-		in_tcb->active_mm = in_tcb->mm;
-		rthal_clear_foreign_stack(&rthal_domain);
-	} else {
-		in_tcb->active_task = prev;
-		in_tcb->active_mm = prev_mm;
-		rthal_set_foreign_stack(&rthal_domain);
-	}
-
-	if (prev_mm != in_tcb->active_mm) {
-		/* Switch to new user-space thread? */
-		if (in_tcb->active_mm)
-			switch_mm(prev_mm, in_tcb->active_mm, next);
-		if (!next->mm)
-			enter_lazy_tlb(prev_mm, next);
-	}
-
-	/* Kernel-to-kernel context switch. */
-	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);
-}
-
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
+#define xnarch_switch_to(_out_tcb, _in_tcb, lock)			\
+{									\
+	xnarchtcb_t *in_tcb = (_in_tcb);				\
+	xnarchtcb_t *out_tcb = (_out_tcb);				\
+	struct task_struct *prev = out_tcb->active_task;		\
+	struct mm_struct *prev_mm = out_tcb->active_mm;			\
+	struct task_struct *next = in_tcb->user_task;			\
+									\
+									\
+	if (likely(next != NULL)) {					\
+		in_tcb->active_task = next;				\
+		in_tcb->active_mm = in_tcb->mm;				\
+		rthal_clear_foreign_stack(&rthal_domain);		\
+	} else {							\
+		in_tcb->active_task = prev;				\
+		in_tcb->active_mm = prev_mm;				\
+		rthal_set_foreign_stack(&rthal_domain);			\
+	}								\
+									\
+	if (prev_mm != in_tcb->active_mm) {				\
+		/* Switch to new user-space thread? */			\
+		if (in_tcb->active_mm) {				\
+			spl_t ignored;					\
+			xnlock_clear_irqon(lock);			\
+			switch_mm(prev_mm, in_tcb->active_mm, next);	\
+			xnlock_get_irqsave(lock, ignored);		\
+		}							\
+		if (!next->mm)						\
+			enter_lazy_tlb(prev_mm, next);			\
+	}								\
+									\
+	/* Kernel-to-kernel context switch. */				\
+	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);		\
 }
 
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
diff -Naurdp -x .svn -x '*~' rework_self_deletion/include/asm-arm/system.h trunk/include/asm-arm/system.h
--- rework_self_deletion/include/asm-arm/system.h	2008-01-15 21:13:47.000000000 +0100
+++ trunk/include/asm-arm/system.h	2008-01-15 00:30:47.000000000 +0100
@@ -31,6 +31,8 @@
 
 #define XNARCH_THREAD_STACKSZ   4096
 
+#define XNARCH_WANT_UNLOCKED_CTXSW
+
 #define xnarch_stack_size(tcb)  ((tcb)->stacksize)
 #define xnarch_user_task(tcb)   ((tcb)->user_task)
 #define xnarch_user_pid(tcb)    ((tcb)->user_task->pid)
diff -Naurdp -x .svn -x '*~' rework_self_deletion/include/nucleus/pod.h trunk/include/nucleus/pod.h
--- rework_self_deletion/include/nucleus/pod.h	2008-01-15 21:13:28.000000000 +0100
+++ trunk/include/nucleus/pod.h	2008-01-15 00:07:37.000000000 +0100
@@ -140,6 +140,10 @@ typedef struct xnsched {
 	xntimer_t htimer;	/*!< Host timer. */
 
 	xnqueue_t zombies;
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_t *lastthread;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 } xnsched_t;
 
 #define nkpod (&nkpod_struct)
diff -Naurdp -x .svn -x '*~' rework_self_deletion/include/nucleus/thread.h trunk/include/nucleus/thread.h
--- rework_self_deletion/include/nucleus/thread.h	2008-01-15 21:13:13.000000000 +0100
+++ trunk/include/nucleus/thread.h	2008-01-13 22:21:03.000000000 +0100
@@ -61,6 +61,7 @@
 #define XNFPU     0x00100000 /**< Thread uses FPU */
 #define XNSHADOW  0x00200000 /**< Shadow thread */
 #define XNROOT    0x00400000 /**< Root thread (that is, Linux/IDLE) */
+#define XNSWLOCK  0x00800000 /**< Thread is currently switching context. */
 
 /*! @} */ /* Ends doxygen comment group: nucleus_state_flags */
 
diff -Naurdp -x .svn -x '*~' rework_self_deletion/ksrc/nucleus/pod.c trunk/ksrc/nucleus/pod.c
--- rework_self_deletion/ksrc/nucleus/pod.c	2008-01-15 21:19:19.000000000 +0100
+++ trunk/ksrc/nucleus/pod.c	2008-01-15 21:25:48.000000000 +0100
@@ -395,6 +395,9 @@ int xnpod_init(void)
 		appendq(&pod->threadq, &sched->rootcb.glink);
 
 		sched->runthread = &sched->rootcb;
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+		sched->lastthread = &sched->rootcb;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 #ifdef CONFIG_XENO_HW_FPU
 		sched->fpuholder = &sched->rootcb;
 #endif /* CONFIG_XENO_HW_FPU */
@@ -553,6 +556,13 @@ void __xnpod_finalize_zombies(xnsched_t 
 	while ((holder = getq(&sched->zombies))) {
 		xnthread_t *thread = link2thread(holder, glink);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+		if (thread == sched->runthread) {
+			appendq(&sched->zombies, &thread->glink);
+			break;
+		}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 		/* Must be called with nklock locked, interrupts off. */
 		trace_mark(xn_nucleus_sched_finalize,
 			   "thread_out %p thread_out_name %s",
@@ -1177,7 +1187,12 @@ void xnpod_delete_thread(xnthread_t *thr
 
 	xnthread_set_state(thread, XNZOMBIE);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW		
+	if (sched->runthread == thread
+	    || xnthread_test_state(thread, XNSWLOCK)) {
+#else /* XNARCH_WANT_UNLOCKED_CTXSW */
 	if (sched->runthread == thread) {
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 		/* We first need to elect a new runthread before switching out
 		   the current one forever. Use the thread zombie state to go
 		   through the rescheduling procedure then actually destroy
@@ -1864,8 +1879,10 @@ int xnpod_migrate_thread(int cpu)
 	/* Migrate the thread periodic timer. */
 	xntimer_set_sched(&thread->ptimer, thread->sched);
 
+#ifndef XNARCH_WANT_UNLOCKED_CTXSW
 	/* Put thread in the ready queue of the destination CPU's scheduler. */
 	xnpod_resume_thread(thread, 0);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 
 	xnpod_schedule();
 
@@ -2107,6 +2124,17 @@ void xnpod_dispatch_signals(void)
 
 void xnpod_welcome_thread(xnthread_t *thread, int imask)
 {
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnsched_t *sched = thread->sched;
+
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched)
+		xnpod_resume_thread(sched->lastthread, 0);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	xnpod_finalize_zombies(thread->sched);
 
 	trace_mark(xn_nucleus_thread_boot, "thread %p thread_name %s",
@@ -2143,6 +2171,11 @@ void xnpod_welcome_thread(xnthread_t *th
 
 	xnlock_clear_irqoff(&nklock);
 	splexit(!!imask);
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_resched_p())
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 }
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -2372,6 +2405,9 @@ void xnpod_schedule(void)
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
 			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
 			 xnthread_current_priority(runthread));
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      restart:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	need_resched = xnsched_tst_resched(sched);
 #endif
@@ -2395,6 +2431,11 @@ void xnpod_schedule(void)
 
 #endif /* CONFIG_SMP */
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnthread_test_state(runthread, XNSWLOCK))
+		goto unlock_and_exit;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	/* Clear the rescheduling bit */
 	xnsched_clr_resched(sched);
 
@@ -2476,8 +2517,18 @@ void xnpod_schedule(void)
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	sched->lastthread = threadout;
+	xnthread_set_state(threadout, XNSWLOCK);
+	xnthread_set_state(threadin, XNSWLOCK);
+
+	xnarch_switch_to(xnthread_archtcb(threadout),
+			 xnthread_archtcb(threadin),
+			 &nklock);
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */	
 	xnarch_switch_to(xnthread_archtcb(threadout),
 			 xnthread_archtcb(threadin));
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 
 #ifdef CONFIG_SMP
 	/* If threadout migrated while suspended, sched is no longer correct. */
@@ -2502,6 +2553,15 @@ void xnpod_schedule(void)
 		xnpod_fatal("zombie thread %s (%p) would not die...",
 			    threadout->name, threadout);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched)
+		xnpod_resume_thread(sched->lastthread, 0);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	xnpod_finalize_zombies(sched);
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -2520,11 +2580,22 @@ void xnpod_schedule(void)
 		xnpod_fire_callouts(&nkpod->tswitchq, runthread);
 	}
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_resched_p()) {
+		if (xnthread_signaled_p(runthread))
+			xnpod_dispatch_signals();
+		goto restart;
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
       signal_unlock_and_exit:
 
 	if (xnthread_signaled_p(runthread))
 		xnpod_dispatch_signals();
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      unlock_and_exit:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 	xnlock_put_irqrestore(&nklock, s);
 	return;
 
@@ -2662,14 +2733,33 @@ void xnpod_schedule_runnable(xnthread_t 
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	sched->lastthread = runthread;
+	xnthread_set_state(runthread, XNSWLOCK);
+	xnthread_set_state(threadin, XNSWLOCK);
+
+	xnarch_switch_to(xnthread_archtcb(runthread),
+			 xnthread_archtcb(threadin),
+			 &nklock);
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */	
 	xnarch_switch_to(xnthread_archtcb(runthread),
 			 xnthread_archtcb(threadin));
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 
 #ifdef CONFIG_SMP
 	/* If runthread migrated while suspended, sched is no longer correct. */
 	sched = xnpod_current_sched();
 #endif
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched)
+		xnpod_resume_thread(sched->lastthread, 0);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	xnpod_finalize_zombies(sched);
 
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
@@ -2684,6 +2774,11 @@ void xnpod_schedule_runnable(xnthread_t 
 	if (nkpod->schedhook && runthread == sched->runthread)
 		nkpod->schedhook(runthread, XNRUNNING);
 #endif /* __XENO_SIM__ */
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_resched_p())
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 }
 
 /*! 
diff -Naurdp -x .svn -x '*~' rework_self_deletion/ksrc/nucleus/shadow.c trunk/ksrc/nucleus/shadow.c
--- rework_self_deletion/ksrc/nucleus/shadow.c	2008-01-15 21:14:36.000000000 +0100
+++ trunk/ksrc/nucleus/shadow.c	2008-01-15 20:44:18.000000000 +0100
@@ -1127,6 +1127,15 @@ redo:
 	/* "current" is now running into the Xenomai domain. */
 	sched = xnpod_current_sched();
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched)
+		xnpod_resume_thread(sched->lastthread, 0);
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	xnpod_finalize_zombies(sched);
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -1153,6 +1162,11 @@ redo:
 	trace_mark(xn_nucleus_shadow_hardened, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_resched_p())
+		xnpod_schedule();
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	return 0;
 }
 


* Re: [Xenomai-core] High latencies on ARM.
  2008-01-22 20:36 ` Gilles Chanteperdrix
@ 2008-01-22 21:46   ` Jan Kiszka
  2008-01-22 22:13     ` Gilles Chanteperdrix
  2008-01-22 22:22     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2008-01-22 21:46 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3847 bytes --]

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>  > Hi,
>  > 
>  > after some (unsuccessful) time trying to instrument the code in a way
>  > that does not change the latency results completely, I found the
>  > reason for the high latency with latency -t 1 and latency -t 2 on ARM.
>  > So, here comes an update on this issue. The culprit is the user-space
>  > context switch, which flushes the processor cache with the nklock
>  > locked, irqs off.
>  > 
>  > There are two things we could do:
>  > - arrange for the ARM cache flush to happen with the nklock unlocked
>  > and irqs enabled. This will improve interrupt latency (latency -t 2)
>  > but obviously not scheduling latency (latency -t 1). If we go that
>  > way, there are several problems we should solve:
>  > 
>  > we do not want interrupt handlers to reenter xnpod_schedule(), for
>  > this we can use the XNLOCK bit, set on whatever is
>  > xnpod_current_thread() when the cache flush occurs
>  > 
>  > since the interrupt handler may modify the rescheduling bits, we need
>  > to test these bits in xnpod_schedule() epilogue and restart
>  > xnpod_schedule() if need be
>  > 
>  > we do not want xnpod_delete_thread() to delete one of the two threads
>  > involved in the context switch, for this the only solution I found is
>  > to add a bit to the thread mask meaning that the thread is currently
>  > switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
>  > to delete whatever thread was marked for deletion
>  > 
>  > in case of migration with xnpod_migrate_thread, we do not want
>  > xnpod_schedule() on the target CPU to switch to the migrated thread
>  > before the context switch on the source CPU is finished, for this we
>  > can avoid setting the resched bit in xnpod_migrate_thread(), detect
>  > the condition in xnpod_schedule() epilogue and set the rescheduling
>  > bits so that xnpod_schedule is restarted and send the IPI to the
>  > target CPU.
> 
> Please find attached a patch implementing these ideas. This adds some
> clutter, which I would be happy to reduce. Better ideas are welcome.
> 

I tried to cross-read the patch (-p would have been nice) but failed - 
this needs to be applied on some tree. Does the patch improve ARM 
latencies already?
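
For anyone else trying to read the patch without applying it, the
control flow it adds to xnpod_schedule() can be approximated by the
small simulation below. This is only a sketch under assumptions, not the
nucleus code: struct sim_sched, sim_schedule(), irq_once() and the
SWLOCK flag are hypothetical names modelled on XNSWLOCK and the
rescheduling bit, and the "interrupt" is just a callback run at the
point where the real code would drop nklock and enable irqs.

```c
#include <stddef.h>

#define SWLOCK 0x1   /* hypothetical flag, modelled on XNSWLOCK */

struct sim_sched {
	unsigned state;        /* SWLOCK while a switch is in flight */
	int resched;           /* rescheduling request pending */
	int switches;          /* completed context switches */
	int nested_rejected;   /* schedule calls refused during a switch */
	/* Simulated interrupt, run while the "lock" is dropped. */
	void (*irq)(struct sim_sched *);
};

static void sim_schedule(struct sim_sched *s)
{
restart:
	if (s->state & SWLOCK) {
		/* Re-entered from an interrupt handler in mid-switch: back
		 * off. The request stays pending for the epilogue below. */
		s->nested_rejected++;
		return;
	}
	if (!s->resched)
		return;
	s->resched = 0;

	s->state |= SWLOCK;
	/* Lock dropped, irqs on: the costly cache flush would run here. */
	if (s->irq)
		s->irq(s);
	/* Lock taken again: epilogue of the switch. */
	s->state &= ~SWLOCK;
	s->switches++;

	/* An interrupt may have asked for another round in the meantime. */
	if (s->resched)
		goto restart;
}

/* Example interrupt: requests a reschedule once, then tries to schedule. */
static void irq_once(struct sim_sched *s)
{
	s->irq = NULL;      /* fire a single time */
	s->resched = 1;
	sim_schedule(s);    /* refused, because SWLOCK is set */
}
```

With resched set and irq_once installed, one call to sim_schedule()
performs two switches: the nested call from the interrupt is refused,
and the pending request is picked up by the restart in the epilogue.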

> 
>  > 
>  > - avoid using user-space real-time tasks when running latency
>  > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
>  > 2 case. This means that we should change the timerbench driver. There
>  > are at least two ways of doing this:
>  > use an rt_pipe
>  >  modify the timerbench driver to implement only the nrt ioctl, using
>  > vanilla linux services such as wait_event and wake_up.
>  > 
>  > What do you think ?
> 
> So, what do you think is the best way to change the timerbench driver,
> * use an rt_pipe ? Pros: allows to run latency -t 1 and latency -t 2 even
>  if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off; cons: make
>  the timerbench non portable on other implementations of rtdm, eg. rtdm
>  over rtai or the version of rtdm which runs over vanilla linux
> * modify the timerbench driver to implement only nrt ioctls ? Pros:
>   better driver portability; cons: latency would still need
>   CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.

I'm still voting for my third approach:

  -> Write latency as kernel application (klatency) against the
     timerbench device
  -> Call NRT IOCTLs of timerbench during module init/cleanup
  -> Use module parameters for customization
  -> Setup a low-prio kernel-based RT task to issue the RT IOCTLs
  -> Format the results nicely (similar to userland latency) in that RT
     task and stuff them into some rtpipe
  -> Use "cat /dev/rtpipeX" to display the results

Jan




* Re: [Xenomai-core] High latencies on ARM.
  2008-01-22 21:46   ` Jan Kiszka
@ 2008-01-22 22:13     ` Gilles Chanteperdrix
  2008-01-22 22:22     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-22 22:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > Gilles Chanteperdrix wrote:
 > > Gilles Chanteperdrix wrote:
 > >  > Hi,
 > >  > 
 > >  > after some (unsuccessful) time trying to instrument the code in a way
 > >  > that does not change the latency results completely, I found the
 > >  > reason for the high latency with latency -t 1 and latency -t 2 on ARM.
 > >  > So, here comes an update on this issue. The culprit is the user-space
 > >  > context switch, which flushes the processor cache with the nklock
 > >  > locked, irqs off.
 > >  > 
 > >  > There are two things we could do:
 > >  > - arrange for the ARM cache flush to happen with the nklock unlocked
 > >  > and irqs enabled. This will improve interrupt latency (latency -t 2)
 > >  > but obviously not scheduling latency (latency -t 1). If we go that
 > >  > way, there are several problems we should solve:
 > >  > 
 > >  > we do not want interrupt handlers to reenter xnpod_schedule(), for
 > >  > this we can use the XNLOCK bit, set on whatever is
 > >  > xnpod_current_thread() when the cache flush occurs
 > >  > 
 > >  > since the interrupt handler may modify the rescheduling bits, we need
 > >  > to test these bits in xnpod_schedule() epilogue and restart
 > >  > xnpod_schedule() if need be
 > >  > 
 > >  > we do not want xnpod_delete_thread() to delete one of the two threads
 > >  > involved in the context switch, for this the only solution I found is
 > >  > to add a bit to the thread mask meaning that the thread is currently
 > >  > switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
 > >  > to delete whatever thread was marked for deletion
 > >  > 
 > >  > in case of migration with xnpod_migrate_thread, we do not want
 > >  > xnpod_schedule() on the target CPU to switch to the migrated thread
 > >  > before the context switch on the source CPU is finished, for this we
 > >  > can avoid setting the resched bit in xnpod_migrate_thread(), detect
 > >  > the condition in xnpod_schedule() epilogue and set the rescheduling
 > >  > bits so that xnpod_schedule is restarted and send the IPI to the
 > >  > target CPU.
 > > 
 > > Please find attached a patch implementing these ideas. This adds some
 > > clutter, which I would be happy to reduce. Better ideas are welcome.
 > > 
 > 
 > I tried to cross-read the patch (-p would have been nice) but failed - 
 > this needs to be applied on some tree. Does the patch improve ARM 
 > latencies already?

I split the patch into two parts in another post; this should make it
easier to read.

 > 
 > > 
 > >  > 
 > >  > - avoid using user-space real-time tasks when running latency
 > >  > kernel-space benches, i.e. at least in the latency -t 1 and latency -t
 > >  > 2 case. This means that we should change the timerbench driver. There
 > >  > are at least two ways of doing this:
 > >  > use an rt_pipe
 > >  >  modify the timerbench driver to implement only the nrt ioctl, using
 > >  > vanilla linux services such as wait_event and wake_up.
 > >  > 
 > >  > What do you think ?
 > > 
 > > So, what do you think is the best way to change the timerbench driver,
 > > * use an rt_pipe ? Pros: allows to run latency -t 1 and latency -t 2 even
 > >  if Xenomai is compiled with CONFIG_XENO_OPT_PERVASIVE off; cons: make
 > >  the timerbench non portable on other implementations of rtdm, eg. rtdm
 > >  over rtai or the version of rtdm which runs over vanilla linux
 > > * modify the timerbench driver to implement only nrt ioctls ? Pros:
 > >   better driver portability; cons: latency would still need
 > >   CONFIG_XENO_OPT_PERVASIVE to run latency -t 1 and latency -t 2.
 > 
 > I'm still voting for my third approach:
 > 
 >   -> Write latency as kernel application (klatency) against the
 >      timerbench device
 >   -> Call NRT IOCTLs of timerbench during module init/cleanup
 >   -> Use module parameters for customization
 >   -> Setup a low-prio kernel-based RT task to issue the RT IOCTLs
 >   -> Format the results nicely (similar to userland latency) in that RT
 >      task and stuff them into some rtpipe
 >   -> Use "cat /dev/rtpipeX" to display the results

Sorry, this mail is older than your last reply to my question. I had
problems with my MTA, so I resent all the mails which had not been sent.
I hoped they would be sent with their original date preserved, but
unfortunately this is not the case.

Now, to answer your suggestion, I think that formatting the results
belongs to user-space, not to kernel-space. Besides, emitting NRT ioctls
from module initialization and cleanup routines makes this klatency
module quite inflexible. I was rather thinking about implementing the RT
versions of the ioctls so that they could be called from a kernel-space
real-time task.
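
To make the split concrete, here is a minimal user-space sketch of the formatting side, assuming the kernel side only ships raw numbers. struct raw_result and format_result() are hypothetical names chosen for illustration, not part of timerbench or of the latency tool, and the output layout is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical raw record, in nanoseconds, as a kernel-side task
 * might push it unformatted through a pipe. */
struct raw_result {
	long min_ns;
	long avg_ns;
	long max_ns;
	long overruns;
};

/* Format one record in microseconds with three decimals.
 * Assumes non-negative values. Returns snprintf()'s result. */
static int format_result(char *buf, size_t len, const struct raw_result *r)
{
	assert(buf != NULL && r != NULL);
	return snprintf(buf, len,
			"min %ld.%03ld us | avg %ld.%03ld us | "
			"max %ld.%03ld us | overruns %ld",
			r->min_ns / 1000, r->min_ns % 1000,
			r->avg_ns / 1000, r->avg_ns % 1000,
			r->max_ns / 1000, r->max_ns % 1000,
			r->overruns);
}
```

Keeping the record raw means the kernel-space task does no string handling at all, and the presentation can change without touching the module.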

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-22 21:46   ` Jan Kiszka
  2008-01-22 22:13     ` Gilles Chanteperdrix
@ 2008-01-22 22:22     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-22 22:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > Does the patch improve ARM latencies already?

Yes, it does. The (interrupt) latency goes from above 100us to
80us. This is not yet 50us, though.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-22 20:36   ` Gilles Chanteperdrix
@ 2008-01-23 17:48     ` Philippe Gerum
  2008-01-23 17:53       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Gerum @ 2008-01-23 17:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>  > Please find attached a patch implementing these ideas. This adds some
>  > clutter, which I would be happy to reduce. Better ideas are welcome.
>  > 
> 
> Ok. New version of the patch, this time split in two parts, should
> hopefully make it more readable.
> 

Ack. I'd suggest the following:

- let's have a rate limiter when walking the zombie queue in
__xnpod_finalize_zombies. We hold the superlock here, and what the patch
also introduces is the potential for flushing more than a single TCB at
a time, which might not always be a cheap operation, depending on which
cra^H^Hode runs on behalf of the deletion hooks for instance. We may
take for granted that no sane code would continuously create more
threads than we would be able to finalize in a given time frame anyway.

- We could move most of the code depending on XNARCH_WANT_UNLOCKED_CTXSW
 to conditional inlines in pod.h. This would reduce the visual pollution
a lot.

-- 
Philippe.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-23 17:48     ` Philippe Gerum
@ 2008-01-23 17:53       ` Gilles Chanteperdrix
  2008-01-23 18:34         ` Philippe Gerum
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-23 17:53 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Gilles Chanteperdrix wrote:
> > Gilles Chanteperdrix wrote:
> >  > Please find attached a patch implementing these ideas. This adds some
> >  > clutter, which I would be happy to reduce. Better ideas are welcome.
> >  >
> >
> > Ok. New version of the patch, this time split in two parts, should
> > hopefully make it more readable.
> >
>
> Ack. I'd suggest the following:
>
> - let's have a rate limiter when walking the zombie queue in
> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
> also introduces is the potential for flushing more than a single TCB at
> a time, which might not always be a cheap operation, depending on which
> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
> take for granted that no sane code would continuously create more
> threads than we would be able to finalize in a given time frame anyway.

The maximum number of zombies in the queue is
1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
bit is armed.

>
> - We could move most of the code depending on XNARCH_WANT_UNLOCKED_CTXSW
>  to conditional inlines in pod.h. This would reduce the visual pollution
> a lot.

Ok, will try that, especially since the code added to the 4 places
where a scheduling tail takes place is pretty repetitive.
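
The conditional-inline idea can be sketched as follows in stand-alone C. Every name here (TOY_WANT_UNLOCKED_CTXSW, struct toy_sched, toy_finish_unlocked_switch() and the two callers) is a hypothetical stand-in, and the hook body is a placeholder counter rather than the real epilogue work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature switch, standing in for XNARCH_WANT_UNLOCKED_CTXSW.
 * An architecture that does not need the hook would simply not define it. */
#define TOY_WANT_UNLOCKED_CTXSW

struct toy_sched {
	int tail_hooks_run;	/* how many scheduling tails ran the hook */
};

#ifdef TOY_WANT_UNLOCKED_CTXSW
static inline void toy_finish_unlocked_switch(struct toy_sched *sched)
{
	assert(sched != NULL);
	sched->tail_hooks_run++;	/* placeholder for the real epilogue */
}
#else /* !TOY_WANT_UNLOCKED_CTXSW */
/* Expands to a no-op that still evaluates as a single statement. */
#define toy_finish_unlocked_switch(sched) do { (void)(sched); } while (0)
#endif /* !TOY_WANT_UNLOCKED_CTXSW */

/* Two of the places where a scheduling tail takes place: neither
 * needs an #ifdef at the call site. */
static void toy_schedule_tail(struct toy_sched *sched)
{
	toy_finish_unlocked_switch(sched);
}

static void toy_welcome_thread(struct toy_sched *sched)
{
	toy_finish_unlocked_switch(sched);
}
```

The repetitive part lives once in the header, and each scheduling tail shrinks to a single call that disappears on architectures which do not define the switch.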

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-23 17:53       ` Gilles Chanteperdrix
@ 2008-01-23 18:34         ` Philippe Gerum
  2008-01-23 18:39           ` Gilles Chanteperdrix
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Philippe Gerum @ 2008-01-23 18:34 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> Gilles Chanteperdrix wrote:
>>> Gilles Chanteperdrix wrote:
>>>  > Please find attached a patch implementing these ideas. This adds some
>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
>>>  >
>>>
>>> Ok. New version of the patch, this time split in two parts, should
>>> hopefully make it more readable.
>>>
>> Ack. I'd suggest the following:
>>
>> - let's have a rate limiter when walking the zombie queue in
>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
>> also introduces is the potential for flushing more than a single TCB at
>> a time, which might not always be a cheap operation, depending on which
>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
>> take for granted that no sane code would continuously create more
>> threads than we would be able to finalize in a given time frame anyway.
> 
> The maximum number of zombies in the queue is
> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
> bit is armed.

Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
thread deletion isn't cheap already.
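
A rate limit of one combined with the queue bound stated above can be modelled in a few lines of user-space C. This is a sketch only: struct toy_zombieq, toy_append_zombie() and toy_finalize_zombies() are hypothetical names, the queue is a small ring of integers rather than an xnqueue_t, and "finalizing" just records the last id taken off the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the bound of two entries given in the thread for the
 * unlocked context switch case. */
#define TOY_MAX_ZOMBIES 2

struct toy_zombieq {
	int ids[TOY_MAX_ZOMBIES];	/* FIFO ring of zombie thread ids */
	size_t head;			/* index of the oldest entry */
	size_t count;			/* entries currently queued */
	int last_finalized;		/* id of the last finalized zombie */
};

static void toy_append_zombie(struct toy_zombieq *q, int id)
{
	assert(q->count < TOY_MAX_ZOMBIES);	/* the bound must hold */
	q->ids[(q->head + q->count) % TOY_MAX_ZOMBIES] = id;
	q->count++;
}

/* Finalize at most `limit` zombies per call, oldest first.
 * Returns how many were actually finalized. */
static size_t toy_finalize_zombies(struct toy_zombieq *q, size_t limit)
{
	size_t done = 0;

	while (q->count > 0 && done < limit) {
		q->last_finalized = q->ids[q->head];
		q->head = (q->head + 1) % TOY_MAX_ZOMBIES;
		q->count--;
		done++;
	}
	return done;
}
```

With a limit of one, each call costs at most one finalization while the lock is held, and a full queue of two drains over two consecutive scheduling tails.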

-- 
Philippe.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-23 18:34         ` Philippe Gerum
@ 2008-01-23 18:39           ` Gilles Chanteperdrix
  2008-01-23 22:38           ` Gilles Chanteperdrix
  2008-01-24 10:18           ` Gilles Chanteperdrix
  2 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-23 18:39 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Gilles Chanteperdrix wrote:
> > On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> Gilles Chanteperdrix wrote:
> >>>  > Please find attached a patch implementing these ideas. This adds some
> >>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
> >>>  >
> >>>
> >>> Ok. New version of the patch, this time split in two parts, should
> >>> hopefully make it more readable.
> >>>
> >> Ack. I'd suggest the following:
> >>
> >> - let's have a rate limiter when walking the zombie queue in
> >> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
> >> also introduces is the potential for flushing more than a single TCB at
> >> a time, which might not always be a cheap operation, depending on which
> >> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
> >> take for granted that no sane code would continuously create more
> >> threads than we would be able to finalize in a given time frame anyway.
> >
> > The maximum number of zombies in the queue is
> > 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
> > only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
> > bit is armed.
>
> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
> thread deletion isn't cheap already.

Ok, as you wish.

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-23 18:34         ` Philippe Gerum
  2008-01-23 18:39           ` Gilles Chanteperdrix
@ 2008-01-23 22:38           ` Gilles Chanteperdrix
  2008-01-24 10:18           ` Gilles Chanteperdrix
  2 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-23 22:38 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core


Philippe Gerum wrote:
 > Gilles Chanteperdrix wrote:
 > > On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
 > >> Gilles Chanteperdrix wrote:
 > >>> Gilles Chanteperdrix wrote:
 > >>>  > Please find attached a patch implementing these ideas. This adds some
 > >>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
 > >>>  >
 > >>>
 > >>> Ok. New version of the patch, this time split in two parts, should
 > >>> hopefully make it more readable.
 > >>>
 > >> Ack. I'd suggest the following:
 > >>
 > >> - let's have a rate limiter when walking the zombie queue in
 > >> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
 > >> also introduces is the potential for flushing more than a single TCB at
 > >> a time, which might not always be a cheap operation, depending on which
 > >> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
 > >> take for granted that no sane code would continuously create more
 > >> threads than we would be able to finalize in a given time frame anyway.
 > > 
 > > The maximum number of zombies in the queue is
 > > 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
 > > only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
 > > bit is armed.
 > 
 > Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
 > thread deletion isn't cheap already.

Here come new patches.

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-rework-self-deletion.2.diff --]
[-- Type: text/plain, Size: 13325 bytes --]

Index: include/asm-ia64/bits/pod.h
===================================================================
--- include/asm-ia64/bits/pod.h	(revision 3441)
+++ include/asm-ia64/bits/pod.h	(working copy)
@@ -100,12 +100,6 @@ static inline void xnarch_switch_to(xnar
 	}
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-blackfin/bits/pod.h
===================================================================
--- include/asm-blackfin/bits/pod.h	(revision 3441)
+++ include/asm-blackfin/bits/pod.h	(working copy)
@@ -67,12 +67,6 @@ static inline void xnarch_switch_to(xnar
 	rthal_thread_switch(out_tcb->tsp, in_tcb->tsp);
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-arm/bits/pod.h
===================================================================
--- include/asm-arm/bits/pod.h	(revision 3441)
+++ include/asm-arm/bits/pod.h	(working copy)
@@ -96,12 +96,6 @@ static inline void xnarch_switch_to(xnar
 	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-powerpc/bits/pod.h
===================================================================
--- include/asm-powerpc/bits/pod.h	(revision 3441)
+++ include/asm-powerpc/bits/pod.h	(working copy)
@@ -106,12 +106,6 @@ static inline void xnarch_switch_to(xnar
 	barrier();
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-x86/bits/pod_64.h
===================================================================
--- include/asm-x86/bits/pod_64.h	(revision 3441)
+++ include/asm-x86/bits/pod_64.h	(working copy)
@@ -96,12 +96,6 @@ static inline void xnarch_switch_to(xnar
 	stts();
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-x86/bits/pod_32.h
===================================================================
--- include/asm-x86/bits/pod_32.h	(revision 3441)
+++ include/asm-x86/bits/pod_32.h	(working copy)
@@ -123,12 +123,6 @@ static inline void xnarch_switch_to(xnar
 	stts();
 }
 
-static inline void xnarch_finalize_and_switch(xnarchtcb_t * dead_tcb,
-					      xnarchtcb_t * next_tcb)
-{
-	xnarch_switch_to(dead_tcb, next_tcb);
-}
-
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
 {
 	/* Empty */
Index: include/asm-sim/bits/pod.h
===================================================================
--- include/asm-sim/bits/pod.h	(revision 3441)
+++ include/asm-sim/bits/pod.h	(working copy)
@@ -38,12 +38,6 @@ static inline void xnarch_switch_to (xna
     __mvm_breakable(mvm_switch_threads)(out_tcb->vmthread,in_tcb->vmthread);
 }
 
-static inline void xnarch_finalize_and_switch (xnarchtcb_t *dead_tcb,
-					       xnarchtcb_t *next_tcb)
-{
-    mvm_finalize_switch_threads(dead_tcb->vmthread,next_tcb->vmthread);
-}
-
 static inline void xnarch_finalize_no_switch (xnarchtcb_t *dead_tcb)
 {
     if (dead_tcb->vmthread)	/* Might be unstarted. */
Index: include/nucleus/pod.h
===================================================================
--- include/nucleus/pod.h	(revision 3441)
+++ include/nucleus/pod.h	(working copy)
@@ -139,6 +139,7 @@ typedef struct xnsched {
 
 	xntimer_t htimer;	/*!< Host timer. */
 
+	xnqueue_t zombies;
 } xnsched_t;
 
 #define nkpod (&nkpod_struct)
@@ -238,6 +239,14 @@ static inline void xnpod_reset_watchdog(
 }
 #endif /* CONFIG_XENO_OPT_WATCHDOG */
 
+void __xnpod_finalize_zombies(xnsched_t *sched);
+
+static inline void xnpod_finalize_zombies(xnsched_t *sched)
+{
+	if (!emptyq_p(&sched->zombies))
+		__xnpod_finalize_zombies(sched);
+}
+
 	/* -- Beginning of the exported interface */
 
 #define xnpod_sched_slot(cpu) \
Index: ksrc/nucleus/pod.c
===================================================================
--- ksrc/nucleus/pod.c	(revision 3441)
+++ ksrc/nucleus/pod.c	(working copy)
@@ -292,6 +292,7 @@ int xnpod_init(void)
 #endif /* CONFIG_SMP */
 		xntimer_set_name(&sched->htimer, htimer_name);
 		xntimer_set_sched(&sched->htimer, sched);
+		initq(&sched->zombies);
 	}
 
 	xnlock_put_irqrestore(&nklock, s);
@@ -545,63 +546,26 @@ static inline void xnpod_fire_callouts(x
 	__clrbits(sched->status, XNKCOUT);
 }
 
-static inline void xnpod_switch_zombie(xnthread_t *threadout,
-				       xnthread_t *threadin)
+void __xnpod_finalize_zombies(xnsched_t *sched)
 {
-	/* Must be called with nklock locked, interrupts off. */
-	xnsched_t *sched = xnpod_current_sched();
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	int shadow = xnthread_test_state(threadout, XNSHADOW);
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
+	xnthread_t *thread = link2thread(getq(&sched->zombies), glink);
 
+	/* Must be called with nklock locked, interrupts off. */
 	trace_mark(xn_nucleus_sched_finalize,
-		   "thread_out %p thread_out_name %s "
-		   "thread_in %p thread_in_name %s",
-		   threadout, xnthread_name(threadout),
-		   threadin, xnthread_name(threadin));
+		   "thread_out %p thread_out_name %s",
+		   thread, xnthread_name(thread));
 
-	if (!emptyq_p(&nkpod->tdeleteq) && !xnthread_test_state(threadout, XNROOT)) {
+	if (!emptyq_p(&nkpod->tdeleteq)
+	    && !xnthread_test_state(thread, XNROOT)) {
 		trace_mark(xn_nucleus_thread_callout,
 			   "thread %p thread_name %s hook %s",
-			   threadout, xnthread_name(threadout), "DELETE");
-		xnpod_fire_callouts(&nkpod->tdeleteq, threadout);
+			   thread, xnthread_name(thread), "DELETE");
+		xnpod_fire_callouts(&nkpod->tdeleteq, thread);
 	}
 
-	sched->runthread = threadin;
-
-	if (xnthread_test_state(threadin, XNROOT)) {
-		xnpod_reset_watchdog(sched);
-		xnfreesync();
-		xnarch_enter_root(xnthread_archtcb(threadin));
-	}
-
-	/* FIXME: Catch 22 here, whether we choose to run on an invalid
-	   stack (cleanup then hooks), or to access the TCB space shortly
-	   after it has been freed while non-preemptible (hooks then
-	   cleanup)... Option #2 is current. */
-
-	xnthread_cleanup_tcb(threadout);
+	xnthread_cleanup_tcb(thread);
 
-	xnstat_exectime_finalize(sched, &threadin->stat.account);
-
-	xnarch_finalize_and_switch(xnthread_archtcb(threadout),
-				   xnthread_archtcb(threadin));
-
-#ifdef CONFIG_XENO_OPT_PERVASIVE
-	xnarch_trace_pid(xnthread_user_task(threadin) ?
-			 xnarch_user_pid(xnthread_archtcb(threadin)) : -1,
-			 xnthread_current_priority(threadin));
-
-	if (shadow)
-		/* Reap the user-space mate of a deleted real-time shadow.
-		   The Linux task has resumed into the Linux domain at the
-		   last code location executed by the shadow. Remember
-		   that both sides use the Linux task's stack. */
-		xnshadow_exit();
-#endif /* CONFIG_XENO_OPT_PERVASIVE */
-
-	xnpod_fatal("zombie thread %s (%p) would not die...", threadout->name,
-		    threadout);
+	xnarch_finalize_no_switch(xnthread_archtcb(thread));
 }
 
 /*! 
@@ -1216,6 +1180,7 @@ void xnpod_delete_thread(xnthread_t *thr
 		   the current one forever. Use the thread zombie state to go
 		   through the rescheduling procedure then actually destroy
 		   the thread object. */
+		appendq(&sched->zombies, &thread->glink);
 		xnsched_set_resched(sched);
 		xnpod_schedule();
 	} else {
@@ -2140,6 +2105,8 @@ void xnpod_dispatch_signals(void)
 
 void xnpod_welcome_thread(xnthread_t *thread, int imask)
 {
+	xnpod_finalize_zombies(thread->sched);
+
 	trace_mark(xn_nucleus_thread_boot, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
@@ -2373,6 +2340,7 @@ void xnpod_schedule(void)
 	xnthread_t *threadout, *threadin, *runthread;
 	xnpholder_t *pholder;
 	xnsched_t *sched;
+	int zombie;
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	int need_resched;
 #endif /* CONFIG_SMP || XENO_DEBUG(NUCLEUS) */
@@ -2402,7 +2370,6 @@ void xnpod_schedule(void)
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
 			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
 			 xnthread_current_priority(runthread));
-
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	need_resched = xnsched_tst_resched(sched);
 #endif
@@ -2429,13 +2396,16 @@ void xnpod_schedule(void)
 	/* Clear the rescheduling bit */
 	xnsched_clr_resched(sched);
 
+	zombie = xnthread_test_state(runthread, XNZOMBIE);
 	if (!xnthread_test_state(runthread, XNTHREAD_BLOCK_BITS | XNZOMBIE)) {
 
 		/* Do not preempt the current thread if it holds the
 		 * scheduler lock. */
 
-		if (xnthread_test_state(runthread, XNLOCK))
+		if (xnthread_test_state(runthread, XNLOCK)) {
+			xnsched_set_resched(sched);
 			goto signal_unlock_and_exit;
+		}
 
 		pholder = sched_getheadpq(&sched->readyq);
 
@@ -2491,9 +2461,6 @@ void xnpod_schedule(void)
 	shadow = xnthread_test_state(threadout, XNSHADOW);
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
-	if (xnthread_test_state(threadout, XNZOMBIE))
-		xnpod_switch_zombie(threadout, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(threadout, XNROOT))
@@ -2525,23 +2492,16 @@ void xnpod_schedule(void)
 #ifdef CONFIG_XENO_OPT_PERVASIVE
 	/* Test whether we are relaxing a thread. In such a case, we are here the
 	   epilogue of Linux' schedule, and should skip xnpod_schedule epilogue. */
-	if (shadow && xnthread_test_state(runthread, XNROOT)) {
-		spl_t ignored;
-		/* Shadow on entry and root without shadow extension on exit? 
-		   Mmmm... This must be the user-space mate of a deleted real-time
-		   shadow we've just rescheduled in the Linux domain to have it
-		   exit properly.  Reap it now. */
-		if (xnshadow_thrptd(current) == NULL)
-			xnshadow_exit();
-
-		/* We need to relock nklock here, since it is not locked and
-		   the caller may expect it to be locked. */
-		xnlock_get_irqsave(&nklock, ignored);
-		xnlock_put_irqrestore(&nklock, s);
-		return;
-	}
+	if (shadow && xnthread_test_state(runthread, XNROOT))
+		goto relax_epilogue;
 #endif /* CONFIG_XENO_OPT_PERVASIVE */
 
+	if (zombie)
+		xnpod_fatal("zombie thread %s (%p) would not die...",
+			    threadout->name, threadout);
+
+	xnpod_finalize_zombies(sched);
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
@@ -2564,6 +2524,25 @@ void xnpod_schedule(void)
 		xnpod_dispatch_signals();
 
 	xnlock_put_irqrestore(&nklock, s);
+	return;
+
+#ifdef CONFIG_XENO_OPT_PERVASIVE
+      relax_epilogue:
+	{
+		spl_t ignored;
+		/* Shadow on entry and root without shadow extension on exit? 
+		   Mmmm... This must be the user-space mate of a deleted real-time
+		   shadow we've just rescheduled in the Linux domain to have it
+		   exit properly.  Reap it now. */
+		if (xnshadow_thrptd(current) == NULL)
+			xnshadow_exit();
+
+		/* We need to relock nklock here, since it is not locked and
+		   the caller may expect it to be locked. */
+		xnlock_get_irqsave(&nklock, ignored);
+		xnlock_put_irqrestore(&nklock, s);
+	}
+#endif /* CONFIG_XENO_OPT_PERVASIVE */
 }
 
 /*! 
@@ -2664,9 +2643,6 @@ void xnpod_schedule_runnable(xnthread_t 
 	if (threadin == runthread)
 		return;		/* No switch. */
 
-	if (xnthread_test_state(runthread, XNZOMBIE))
-		xnpod_switch_zombie(runthread, threadin);
-
 	sched->runthread = threadin;
 
 	if (xnthread_test_state(runthread, XNROOT))
@@ -2687,15 +2663,17 @@ void xnpod_schedule_runnable(xnthread_t 
 	xnarch_switch_to(xnthread_archtcb(runthread),
 			 xnthread_archtcb(threadin));
 
-	xnarch_trace_pid(xnthread_user_task(runthread) ?
-			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
-			 xnthread_current_priority(runthread));
-
 #ifdef CONFIG_SMP
 	/* If runthread migrated while suspended, sched is no longer correct. */
 	sched = xnpod_current_sched();
 #endif
 
+	xnpod_finalize_zombies(sched);
+
+	xnarch_trace_pid(xnthread_user_task(runthread) ?
+			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
+			 xnthread_current_priority(runthread));
+
 #ifdef CONFIG_XENO_HW_FPU
 	__xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 3441)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -1059,6 +1059,7 @@ int xnshadow_harden(void)
 	struct task_struct *this_task = current;
 	struct __gatekeeper *gk;
 	xnthread_t *thread;
+	xnsched_t *sched;
 	int gk_cpu;
 
 redo:
@@ -1124,9 +1125,12 @@ redo:
 	}
 
 	/* "current" is now running into the Xenomai domain. */
+	sched = xnpod_current_sched();
+
+	xnpod_finalize_zombies(sched);
 
 #ifdef CONFIG_XENO_HW_FPU
-	xnpod_switch_fpu(xnpod_current_sched());
+	xnpod_switch_fpu(sched);
 #endif /* CONFIG_XENO_HW_FPU */
 
 	xnarch_schedule_tail(this_task);

[-- Attachment #3: xeno-unlocked-arm-ctx-switch.3.diff --]
[-- Type: text/plain, Size: 11458 bytes --]

diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/include/asm-arm/bits/pod.h trunk/include/asm-arm/bits/pod.h
--- rework_self_deletion/include/asm-arm/bits/pod.h	2008-01-23 22:54:20.000000000 +0100
+++ trunk/include/asm-arm/bits/pod.h	2008-01-15 00:43:50.000000000 +0100
@@ -67,33 +67,39 @@ static inline void xnarch_enter_root(xna
 #endif /* TIF_MMSWITCH_INT */
 }
 
-static inline void xnarch_switch_to(xnarchtcb_t * out_tcb, xnarchtcb_t * in_tcb)
-{
-	struct task_struct *prev = out_tcb->active_task;
-	struct mm_struct *prev_mm = out_tcb->active_mm;
-	struct task_struct *next = in_tcb->user_task;
-
-
-	if (likely(next != NULL)) {
-		in_tcb->active_task = next;
-		in_tcb->active_mm = in_tcb->mm;
-		rthal_clear_foreign_stack(&rthal_domain);
-	} else {
-		in_tcb->active_task = prev;
-		in_tcb->active_mm = prev_mm;
-		rthal_set_foreign_stack(&rthal_domain);
-	}
-
-	if (prev_mm != in_tcb->active_mm) {
-		/* Switch to new user-space thread? */
-		if (in_tcb->active_mm)
-			switch_mm(prev_mm, in_tcb->active_mm, next);
-		if (!next->mm)
-			enter_lazy_tlb(prev_mm, next);
-	}
-
-	/* Kernel-to-kernel context switch. */
-	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);
+#define xnarch_switch_to(_out_tcb, _in_tcb, lock)			\
+{									\
+	xnarchtcb_t *in_tcb = (_in_tcb);				\
+	xnarchtcb_t *out_tcb = (_out_tcb);				\
+	struct task_struct *prev = out_tcb->active_task;		\
+	struct mm_struct *prev_mm = out_tcb->active_mm;			\
+	struct task_struct *next = in_tcb->user_task;			\
+									\
+									\
+	if (likely(next != NULL)) {					\
+		in_tcb->active_task = next;				\
+		in_tcb->active_mm = in_tcb->mm;				\
+		rthal_clear_foreign_stack(&rthal_domain);		\
+	} else {							\
+		in_tcb->active_task = prev;				\
+		in_tcb->active_mm = prev_mm;				\
+		rthal_set_foreign_stack(&rthal_domain);			\
+	}								\
+									\
+	if (prev_mm != in_tcb->active_mm) {				\
+		/* Switch to new user-space thread? */			\
+		if (in_tcb->active_mm) {				\
+			spl_t ignored;					\
+			xnlock_clear_irqon(lock);			\
+			switch_mm(prev_mm, in_tcb->active_mm, next);	\
+			xnlock_get_irqsave(lock, ignored);		\
+		}							\
+		if (!next->mm)						\
+			enter_lazy_tlb(prev_mm, next);			\
+	}								\
+									\
+	/* Kernel-to-kernel context switch. */				\
+	rthal_thread_switch(prev, out_tcb->tip, in_tcb->tip);		\
 }
 
 static inline void xnarch_finalize_no_switch(xnarchtcb_t * dead_tcb)
diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/include/asm-arm/system.h trunk/include/asm-arm/system.h
--- rework_self_deletion/include/asm-arm/system.h	2008-01-15 21:13:47.000000000 +0100
+++ trunk/include/asm-arm/system.h	2008-01-15 00:30:47.000000000 +0100
@@ -31,6 +31,8 @@
 
 #define XNARCH_THREAD_STACKSZ   4096
 
+#define XNARCH_WANT_UNLOCKED_CTXSW
+
 #define xnarch_stack_size(tcb)  ((tcb)->stacksize)
 #define xnarch_user_task(tcb)   ((tcb)->user_task)
 #define xnarch_user_pid(tcb)    ((tcb)->user_task->pid)
diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/include/nucleus/pod.h trunk/include/nucleus/pod.h
--- rework_self_deletion/include/nucleus/pod.h	2008-01-15 21:13:28.000000000 +0100
+++ trunk/include/nucleus/pod.h	2008-01-23 22:35:45.000000000 +0100
@@ -140,6 +140,10 @@ typedef struct xnsched {
 	xntimer_t htimer;	/*!< Host timer. */
 
 	xnqueue_t zombies;
+
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	xnthread_t *lastthread;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 } xnsched_t;
 
 #define nkpod (&nkpod_struct)
@@ -457,6 +461,27 @@ static inline void xnpod_delete_self(voi
 	xnpod_delete_thread(xnpod_current_thread());
 }
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+static inline void xnpod_finish_unlocked_switch(xnsched_t *sched)
+{
+	xnthread_clear_state(sched->lastthread, XNSWLOCK);
+	xnthread_clear_state(sched->runthread, XNSWLOCK);
+
+	/* Detect a thread which called xnpod_migrate_thread */
+	if (sched->lastthread->sched != sched)
+		xnpod_resume_thread(sched->lastthread, 0);
+}
+
+static inline void xnpod_resched_after_unlocked_switch(void)
+{
+	if (xnsched_resched_p())
+		xnpod_schedule();
+}
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */
+#define xnpod_finish_unlocked_switch(sched)
+#define xnpod_resched_after_unlocked_switch()
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
+
 #ifdef __cplusplus
 }
 #endif
diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/include/nucleus/thread.h trunk/include/nucleus/thread.h
--- rework_self_deletion/include/nucleus/thread.h	2008-01-15 21:13:13.000000000 +0100
+++ trunk/include/nucleus/thread.h	2008-01-13 22:21:03.000000000 +0100
@@ -61,6 +61,7 @@
 #define XNFPU     0x00100000 /**< Thread uses FPU */
 #define XNSHADOW  0x00200000 /**< Shadow thread */
 #define XNROOT    0x00400000 /**< Root thread (that is, Linux/IDLE) */
+#define XNSWLOCK  0x00800000 /**< Thread is currently switching context. */
 
 /*! @} */ /* Ends doxygen comment group: nucleus_state_flags */
 
diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/ksrc/nucleus/pod.c trunk/ksrc/nucleus/pod.c
--- rework_self_deletion/ksrc/nucleus/pod.c	2008-01-23 22:48:38.000000000 +0100
+++ trunk/ksrc/nucleus/pod.c	2008-01-23 23:33:52.000000000 +0100
@@ -66,6 +66,22 @@ char *nkmsgbuf = NULL;
 
 xnarch_cpumask_t nkaffinity = XNPOD_ALL_CPUS;
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+static inline void xnpod_switch_to(xnsched_t *sched,
+				   xnthread_t *threadout, xnthread_t *threadin)
+{
+	sched->lastthread = threadout;
+	xnthread_set_state(threadout, XNSWLOCK);
+	xnthread_set_state(threadin, XNSWLOCK);
+
+	xnarch_switch_to(xnthread_archtcb(threadout),
+			 xnthread_archtcb(threadin), &nklock);
+}
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */
+#define xnpod_switch_to(sched, threadout, threadin) \
+	xnarch_switch_to(xnthread_archtcb(threadout), xnthread_archtcb(threadin))
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
+
 const char *xnpod_fatal_helper(const char *format, ...)
 {
 	const unsigned nr_cpus = xnarch_num_online_cpus();
@@ -395,6 +411,9 @@ int xnpod_init(void)
 		appendq(&pod->threadq, &sched->rootcb.glink);
 
 		sched->runthread = &sched->rootcb;
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+		sched->lastthread = &sched->rootcb;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 #ifdef CONFIG_XENO_HW_FPU
 		sched->fpuholder = &sched->rootcb;
 #endif /* CONFIG_XENO_HW_FPU */
@@ -550,6 +569,13 @@ void __xnpod_finalize_zombies(xnsched_t 
 {
 	xnthread_t *thread = link2thread(getq(&sched->zombies), glink);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (thread == sched->runthread) {
+		appendq(&sched->zombies, &thread->glink);
+		return;
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	/* Must be called with nklock locked, interrupts off. */
 	trace_mark(xn_nucleus_sched_finalize,
 		   "thread_out %p thread_out_name %s",
@@ -1175,7 +1201,12 @@ void xnpod_delete_thread(xnthread_t *thr
 
 	xnthread_set_state(thread, XNZOMBIE);
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (sched->runthread == thread
+	    || xnthread_test_state(thread, XNSWLOCK)) {
+#else /* !XNARCH_WANT_UNLOCKED_CTXSW */
 	if (sched->runthread == thread) {
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 		/* We first need to elect a new runthread before switching out
 		   the current one forever. Use the thread zombie state to go
 		   through the rescheduling procedure then actually destroy
@@ -1862,8 +1893,10 @@ int xnpod_migrate_thread(int cpu)
 	/* Migrate the thread periodic timer. */
 	xntimer_set_sched(&thread->ptimer, thread->sched);
 
+#ifndef XNARCH_WANT_UNLOCKED_CTXSW
 	/* Put thread in the ready queue of the destination CPU's scheduler. */
 	xnpod_resume_thread(thread, 0);
+#endif /* !XNARCH_WANT_UNLOCKED_CTXSW */
 
 	xnpod_schedule();
 
@@ -2105,6 +2138,8 @@ void xnpod_dispatch_signals(void)
 
 void xnpod_welcome_thread(xnthread_t *thread, int imask)
 {
+	xnpod_finish_unlocked_switch(thread->sched);
+
 	xnpod_finalize_zombies(thread->sched);
 
 	trace_mark(xn_nucleus_thread_boot, "thread %p thread_name %s",
@@ -2141,6 +2176,8 @@ void xnpod_welcome_thread(xnthread_t *th
 
 	xnlock_clear_irqoff(&nklock);
 	splexit(!!imask);
+
+	xnpod_resched_after_unlocked_switch();
 }
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -2370,6 +2407,9 @@ void xnpod_schedule(void)
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
 			 xnarch_user_pid(xnthread_archtcb(runthread)) : -1,
 			 xnthread_current_priority(runthread));
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      restart:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 #if defined(CONFIG_SMP) || XENO_DEBUG(NUCLEUS)
 	need_resched = xnsched_tst_resched(sched);
 #endif
@@ -2393,6 +2433,11 @@ void xnpod_schedule(void)
 
 #endif /* CONFIG_SMP */
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnthread_test_state(runthread, XNSWLOCK))
+		goto unlock_and_exit;
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
 	/* Clear the rescheduling bit */
 	xnsched_clr_resched(sched);
 
@@ -2474,8 +2519,7 @@ void xnpod_schedule(void)
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
-	xnarch_switch_to(xnthread_archtcb(threadout),
-			 xnthread_archtcb(threadin));
+	xnpod_switch_to(sched, threadout, threadin);
 
 #ifdef CONFIG_SMP
 	/* If threadout migrated while suspended, sched is no longer correct. */
@@ -2500,6 +2544,8 @@ void xnpod_schedule(void)
 		xnpod_fatal("zombie thread %s (%p) would not die...",
 			    threadout->name, threadout);
 
+	xnpod_finish_unlocked_switch(sched);
+
 	xnpod_finalize_zombies(sched);
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -2518,11 +2564,22 @@ void xnpod_schedule(void)
 		xnpod_fire_callouts(&nkpod->tswitchq, runthread);
 	}
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+	if (xnsched_resched_p()) {
+		if (xnthread_signaled_p(runthread))
+			xnpod_dispatch_signals();
+		goto restart;
+	}
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
+
       signal_unlock_and_exit:
 
 	if (xnthread_signaled_p(runthread))
 		xnpod_dispatch_signals();
 
+#ifdef XNARCH_WANT_UNLOCKED_CTXSW
+      unlock_and_exit:
+#endif /* XNARCH_WANT_UNLOCKED_CTXSW */
 	xnlock_put_irqrestore(&nklock, s);
 	return;
 
@@ -2660,14 +2717,15 @@ void xnpod_schedule_runnable(xnthread_t 
 	xnstat_exectime_switch(sched, &threadin->stat.account);
 	xnstat_counter_inc(&threadin->stat.csw);
 
-	xnarch_switch_to(xnthread_archtcb(runthread),
-			 xnthread_archtcb(threadin));
+	xnpod_switch_to(sched, runthread, threadin);
 
 #ifdef CONFIG_SMP
 	/* If runthread migrated while suspended, sched is no longer correct. */
 	sched = xnpod_current_sched();
 #endif
 
+	xnpod_finish_unlocked_switch(sched);
+
 	xnpod_finalize_zombies(sched);
 
 	xnarch_trace_pid(xnthread_user_task(runthread) ?
@@ -2682,6 +2740,8 @@ void xnpod_schedule_runnable(xnthread_t 
 	if (nkpod->schedhook && runthread == sched->runthread)
 		nkpod->schedhook(runthread, XNRUNNING);
 #endif /* __XENO_SIM__ */
+
+	xnpod_resched_after_unlocked_switch();
 }
 
 /*! 
diff -Naurdp -x .svn -x '*~' -x 'klat_mod.*' rework_self_deletion/ksrc/nucleus/shadow.c trunk/ksrc/nucleus/shadow.c
--- rework_self_deletion/ksrc/nucleus/shadow.c	2008-01-15 21:14:36.000000000 +0100
+++ trunk/ksrc/nucleus/shadow.c	2008-01-23 22:20:47.000000000 +0100
@@ -1127,6 +1127,8 @@ redo:
 	/* "current" is now running into the Xenomai domain. */
 	sched = xnpod_current_sched();
 
+	xnpod_finish_unlocked_switch(sched);
+
 	xnpod_finalize_zombies(sched);
 
 #ifdef CONFIG_XENO_HW_FPU
@@ -1153,6 +1155,8 @@ redo:
 	trace_mark(xn_nucleus_shadow_hardened, "thread %p thread_name %s",
 		   thread, xnthread_name(thread));
 
+	xnpod_resched_after_unlocked_switch();
+
 	return 0;
 }
 

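The locking pattern of the patch above can be illustrated outside the nucleus. The following is only a minimal user-space sketch, not the Xenomai code: `struct thread`, `struct sched`, `SWLOCK`, `schedule()` and `fake_irq()` are hypothetical stand-ins for the thread/scheduler structures, XNSWLOCK, xnpod_schedule() and an interrupt handler. It shows the two ideas the patch relies on: both threads are flagged before the lock is dropped, and a reentrant scheduling request made during the unlocked window is refused while the rescheduling bit stays set for the epilogue to notice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWLOCK 0x1u	/* hypothetical stand-in for XNSWLOCK */

struct thread {
	unsigned int state;
};

struct sched {
	bool locked;		/* stand-in for "nklock held, irqs off" */
	bool resched;		/* stand-in for the rescheduling bit */
	struct thread *runthread;
	struct thread *lastthread;
	int switches;
};

/* Work done in the unlocked window, e.g. the cache flush in switch_mm(). */
typedef void (*unlocked_work_fn)(struct sched *s);

/*
 * Returns true if a switch to next was performed, false if the call
 * was refused because a switch is already in flight on this scheduler.
 */
static bool schedule(struct sched *s, struct thread *next,
		     unlocked_work_fn work)
{
	if (s->runthread->state & SWLOCK)
		return false;	/* reentry: leave s->resched untouched */

	s->locked = true;
	s->resched = false;
	s->lastthread = s->runthread;
	s->lastthread->state |= SWLOCK;
	next->state |= SWLOCK;

	s->locked = false;	/* unlocked window: interrupts may fire */
	if (work != NULL)
		work(s);
	s->locked = true;

	/* Epilogue: complete the switch and drop the in-flight marks. */
	s->runthread = next;
	s->switches++;
	s->lastthread->state &= ~SWLOCK;
	s->runthread->state &= ~SWLOCK;
	s->locked = false;
	return true;
}

/* Set by fake_irq() when its nested scheduling attempt was refused. */
static bool irq_switch_refused;

/* Simulated interrupt handler: requests a reschedule, then tries one. */
static void fake_irq(struct sched *s)
{
	static struct thread other;

	s->resched = true;
	irq_switch_refused = !schedule(s, &other, NULL);
}
```

A caller would pass `fake_irq` as the unlocked work, then test `s->resched` after `schedule()` returns and call it again if the bit is set, which mirrors the `goto restart` added to the xnpod_schedule() epilogue.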

* Re: [Xenomai-core] High latencies on ARM.
  2008-01-23 18:34         ` Philippe Gerum
  2008-01-23 18:39           ` Gilles Chanteperdrix
  2008-01-23 22:38           ` Gilles Chanteperdrix
@ 2008-01-24 10:18           ` Gilles Chanteperdrix
  2008-01-26 18:17             ` Philippe Gerum
  2 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-24 10:18 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Gilles Chanteperdrix wrote:
> > On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> >> Gilles Chanteperdrix wrote:
> >>> Gilles Chanteperdrix wrote:
> >>>  > Please find attached a patch implementing these ideas. This adds some
> >>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
> >>>  >
> >>>
> >>> Ok. New version of the patch, this time split in two parts, should
> >>> hopefully make it more readable.
> >>>
> >> Ack. I'd suggest the following:
> >>
> >> - let's have a rate limiter when walking the zombie queue in
> >> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
> >> also introduces is the potential for flushing more than a single TCB at
> >> a time, which might not always be a cheap operation, depending on which
> >> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
> >> take for granted that no sane code would continuously create more
> >> threads than we would be able to finalize in a given time frame anyway.
> >
> > The maximum number of zombies in the queue is
> > 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
> > only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
> > bit is armed.
>
> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
> thread deletion isn't cheap already.

I am not sure that holding the nklock while we run the thread deletion
hooks is really needed.
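
For reference, the rate limiter suggested in the quoted text could look like the sketch below. This is only an illustration under stated assumptions: `struct zombie`, `struct zombie_queue`, `zq_append()`, `zq_get()` and `finalize_zombies()` are hypothetical names, not the nucleus queue API, and the place where deletion hooks would run is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the nucleus types. */
struct zombie {
	struct zombie *next;
	int finalized;
};

struct zombie_queue {
	struct zombie *head;
	struct zombie *tail;
};

/* Append a zombie at the tail of the queue. */
static void zq_append(struct zombie_queue *q, struct zombie *z)
{
	z->next = NULL;
	if (q->tail != NULL)
		q->tail->next = z;
	else
		q->head = z;
	q->tail = z;
}

/* Remove and return the zombie at the head, or NULL if empty. */
static struct zombie *zq_get(struct zombie_queue *q)
{
	struct zombie *z = q->head;

	if (z != NULL) {
		q->head = z->next;
		if (q->head == NULL)
			q->tail = NULL;
		z->next = NULL;
	}
	return z;
}

/*
 * Finalize at most rate_limit zombies per call and return how many
 * were finalized. Remaining entries stay queued for a later pass,
 * which bounds the time spent with the lock held.
 */
static int finalize_zombies(struct zombie_queue *q, int rate_limit)
{
	struct zombie *z;
	int done = 0;

	while (done < rate_limit && (z = zq_get(q)) != NULL) {
		z->finalized = 1;	/* deletion hooks would run here */
		done++;
	}
	return done;
}
```

With `rate_limit = 1` and two queued zombies, two passes are needed to empty the queue, so the per-pass cost stays that of a single finalization.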

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-24 10:18           ` Gilles Chanteperdrix
@ 2008-01-26 18:17             ` Philippe Gerum
  2008-01-26 18:43               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Gerum @ 2008-01-26 18:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> Gilles Chanteperdrix wrote:
>>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>  > Please find attached a patch implementing these ideas. This adds some
>>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
>>>>>  >
>>>>>
>>>>> Ok. New version of the patch, this time split in two parts, should
>>>>> hopefully make it more readable.
>>>>>
>>>> Ack. I'd suggest the following:
>>>>
>>>> - let's have a rate limiter when walking the zombie queue in
>>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
>>>> also introduces is the potential for flushing more than a single TCB at
>>>> a time, which might not always be a cheap operation, depending on which
>>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
>>>> take for granted that no sane code would continuously create more
>>>> threads than we would be able to finalize in a given time frame anyway.
>>> The maximum number of zombies in the queue is
>>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
>>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
>>> bit is armed.
>> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
>> thread deletion isn't cheap already.
> 
> I am not sure that holding the nklock while we run the thread deletion
> hooks is really needed.
> 

Deletion hooks may currently rely on the following assumptions when running:

- rescheduling is locked
- nklock is held, interrupts are off
- they run on behalf of the deletor context

The self-delete refactoring currently kills #3 because we now run the
hooks after the context switch, and would also kill #2 if we did not
hold the nklock (btw, enabling the nucleus debug while running with this
patch should raise an abort, from xnshadow_unmap, due to the second
assertion).

It should be possible to get rid of #3 for xnshadow_unmap (serious
testing needed here), but we would have to grab the nklock from this
routine anyway.
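
The three assumptions listed above can be written down as explicit checks. The sketch below is hypothetical throughout: `struct hook_ctx`, the `HOOK_*` flags and `check_hook_assumptions()` do not exist in the nucleus, they only model what a deletion hook might verify about its calling context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the context a deletion hook runs in. */
struct hook_ctx {
	bool sched_locked;	/* #1: rescheduling is locked */
	bool nklock_held;	/* #2: nklock is held ... */
	bool irqs_off;		/*     ... with interrupts off */
	int current_id;		/* thread running the hook */
	int deletor_id;		/* thread that requested the deletion */
};

enum {
	HOOK_OK = 0,
	HOOK_NO_SCHEDLOCK = 1,	/* assumption #1 broken */
	HOOK_NO_NKLOCK = 2,	/* assumption #2 broken */
	HOOK_NOT_DELETOR = 4	/* assumption #3 broken */
};

/* Return a bitmask of the assumptions that do not hold in ctx. */
static int check_hook_assumptions(const struct hook_ctx *ctx)
{
	int broken = HOOK_OK;

	if (!ctx->sched_locked)
		broken |= HOOK_NO_SCHEDLOCK;
	if (!ctx->nklock_held || !ctx->irqs_off)
		broken |= HOOK_NO_NKLOCK;
	if (ctx->current_id != ctx->deletor_id)
		broken |= HOOK_NOT_DELETOR;
	return broken;
}
```

In this model, running the hooks after the context switch with the nklock still held reports only `HOOK_NOT_DELETOR`; dropping the lock as well would add `HOOK_NO_NKLOCK`.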

-- 
Philippe.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-26 18:17             ` Philippe Gerum
@ 2008-01-26 18:43               ` Gilles Chanteperdrix
  2008-01-27  0:19                 ` Philippe Gerum
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-26 18:43 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

Philippe Gerum wrote:
 > Gilles Chanteperdrix wrote:
 > > On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
 > >> Gilles Chanteperdrix wrote:
 > >>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
 > >>>> Gilles Chanteperdrix wrote:
 > >>>>> Gilles Chanteperdrix wrote:
 > >>>>>  > Please find attached a patch implementing these ideas. This adds some
 > >>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
 > >>>>>  >
 > >>>>>
 > >>>>> Ok. New version of the patch, this time split in two parts, should
 > >>>>> hopefully make it more readable.
 > >>>>>
 > >>>> Ack. I'd suggest the following:
 > >>>>
 > >>>> - let's have a rate limiter when walking the zombie queue in
 > >>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
 > >>>> also introduces is the potential for flushing more than a single TCB at
 > >>>> a time, which might not always be a cheap operation, depending on which
 > >>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
 > >>>> take for granted that no sane code would continuously create more
 > >>>> threads than we would be able to finalize in a given time frame anyway.
 > >>> The maximum number of zombies in the queue is
 > >>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
 > >>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
 > >>> bit is armed.
 > >> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
 > >> thread deletion isn't cheap already.
 > > 
 > > I am not sure that holding the nklock while we run the thread deletion
 > > hooks is really needed.
 > > 
 > 
 > Deletion hooks may currently rely on the following assumptions when running:
 > 
 > - rescheduling is locked
 > - nklock is held, interrupts are off
 > - they run on behalf of the deletor context
 > 
 > The self-delete refactoring currently kills #3 because we now run the
 > hooks after the context switch, and would also kill #2 if we did not
 > hold the nklock (btw, enabling the nucleus debug while running with this
 > patch should raise an abort, from xnshadow_unmap, due to the second
 > assertion).
 > 
 > It should be possible to get rid of #3 for xnshadow_unmap (serious
 > testing needed here), but we would have to grab the nklock from this
 > routine anyway.

Since the unmapped task is no longer running on the current CPU, is
there no chance that it runs on another CPU by the time we get to
xnshadow_unmap?

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-26 18:43               ` Gilles Chanteperdrix
@ 2008-01-27  0:19                 ` Philippe Gerum
       [not found]                   ` <18333.3277.36164.63798@domain.hid>
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Gerum @ 2008-01-27  0:19 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > Gilles Chanteperdrix wrote:
>  > > On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>  > >> Gilles Chanteperdrix wrote:
>  > >>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>  > >>>> Gilles Chanteperdrix wrote:
>  > >>>>> Gilles Chanteperdrix wrote:
>  > >>>>>  > Please find attached a patch implementing these ideas. This adds some
>  > >>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
>  > >>>>>  >
>  > >>>>>
>  > >>>>> Ok. New version of the patch, this time split in two parts, should
>  > >>>>> hopefully make it more readable.
>  > >>>>>
>  > >>>> Ack. I'd suggest the following:
>  > >>>>
>  > >>>> - let's have a rate limiter when walking the zombie queue in
>  > >>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
>  > >>>> also introduces is the potential for flushing more than a single TCB at
>  > >>>> a time, which might not always be a cheap operation, depending on which
>  > >>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
>  > >>>> take for granted that no sane code would continuously create more
>  > >>>> threads than we would be able to finalize in a given time frame anyway.
>  > >>> The maximum number of zombies in the queue is
>  > >>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
>  > >>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
>  > >>> bit is armed.
>  > >> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
>  > >> thread deletion isn't cheap already.
>  > > 
>  > > I am not sure that holding the nklock while we run the thread deletion
>  > > hooks is really needed.
>  > > 
>  > 
>  > Deletion hooks may currently rely on the following assumptions when running:
>  > 
>  > - rescheduling is locked
>  > - nklock is held, interrupts are off
>  > - they run on behalf of the deletor context
>  > 
>  > The self-delete refactoring currently kills #3 because we now run the
>  > hooks after the context switch, and would also kill #2 if we did not
>  > hold the nklock (btw, enabling the nucleus debug while running with this
>  > patch should raise an abort, from xnshadow_unmap, due to the second
>  > assertion).
>  > 

Forget about this; shadows are always exited in secondary mode, so
that's fine, i.e. xnpod_current_thread() != deleted thread, hence we
should always run the deletion hooks immediately on behalf of the caller.

>  > It should be possible to get rid of #3 for xnshadow_unmap (serious
>  > testing needed here), but we would have to grab the nklock from this
>  > routine anyway.
> 
> Since the unmapped task is no longer running on the current CPU, is
> there no chance that it runs on another CPU by the time we get to
> xnshadow_unmap?
> 

The unmapped task is actually still running, and do_exit() may
reschedule until quite late, when kernel preemption is eventually
disabled, which happens long after the I-pipe notifier has fired. We
would need the nklock to protect the RPI management too.

-- 
Philippe.



* Re: [Xenomai-core] High latencies on ARM.
       [not found]                   ` <18333.3277.36164.63798@domain.hid>
@ 2008-01-27 23:34                     ` Philippe Gerum
  2008-01-28 11:02                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Gerum @ 2008-01-27 23:34 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > Gilles Chanteperdrix wrote:
>  > > Philippe Gerum wrote:
>  > >  > Gilles Chanteperdrix wrote:
>  > >  > > On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>  > >  > >> Gilles Chanteperdrix wrote:
>  > >  > >>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>  > >  > >>>> Gilles Chanteperdrix wrote:
>  > >  > >>>>> Gilles Chanteperdrix wrote:
>  > >  > >>>>>  > Please find attached a patch implementing these ideas. This adds some
>  > >  > >>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
>  > >  > >>>>>  >
>  > >  > >>>>>
>  > >  > >>>>> Ok. New version of the patch, this time split in two parts, should
>  > >  > >>>>> hopefully make it more readable.
>  > >  > >>>>>
>  > >  > >>>> Ack. I'd suggest the following:
>  > >  > >>>>
>  > >  > >>>> - let's have a rate limiter when walking the zombie queue in
>  > >  > >>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
>  > >  > >>>> also introduces is the potential for flushing more than a single TCB at
>  > >  > >>>> a time, which might not always be a cheap operation, depending on which
>  > >  > >>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
>  > >  > >>>> take for granted that no sane code would continuously create more
>  > >  > >>>> threads than we would be able to finalize in a given time frame anyway.
>  > >  > >>> The maximum number of zombies in the queue is
>  > >  > >>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
>  > >  > >>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
>  > >  > >>> bit is armed.
>  > >  > >> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
>  > >  > >> thread deletion isn't cheap already.
>  > >  > > 
>  > >  > > I am not sure that holding the nklock while we run the thread deletion
>  > >  > > hooks is really needed.
>  > >  > > 
>  > >  > 
>  > >  > Deletion hooks may currently rely on the following assumptions when running:
>  > >  > 
>  > >  > - rescheduling is locked
>  > >  > - nklock is held, interrupts are off
>  > >  > - they run on behalf of the deletor context
>  > >  > 
>  > >  > The self-delete refactoring currently kills #3 because we now run the
>  > >  > hooks after the context switch, and would also kill #2 if we did not
>  > >  > hold the nklock (btw, enabling the nucleus debug while running with this
>  > >  > patch should raise an abort, from xnshadow_unmap, due to the second
>  > >  > assertion).
>  > >  > 
>  > 
>  > Forget about this; shadows are always exited in secondary mode, so
>  > that's fine, i.e. xnpod_current_thread() != deleted thread, hence we
>  > should always run the deletion hooks immediately on behalf of the caller.
> 
> What happens if the watchdog kills a user-space thread which is
> currently running in primary mode ? If I read xnpod_delete_thread
> correctly, the SIGKILL signal is sent to the target thread only if it is
> not the current thread.
> 

I'd say: zombie queuing from xnpod_delete, then shadow unmap on behalf
of the next switched context which would trigger the lo-stage unmap
request -> wake_up_process against the Linux side and asbestos underwear
provided by the relax epilogue, which would eventually reap the guy
through do_exit(). As a matter of fact, we would still have the
unmap-over-non-current issue, that's true.
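
The first half of that sequence (zombie queuing, then finalization on behalf of the next switched context) can be modeled as follows. This is a simplified sketch with hypothetical names (`struct thread`, `struct sched`, `T_ZOMBIE`, `delete_thread()`, `switch_to()`), and it uses a single zombie slot where the nucleus keeps a queue. The Linux-side steps (lo-stage unmap, wake_up_process, do_exit) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define T_ZOMBIE 0x1u	/* hypothetical stand-in for XNZOMBIE */

struct thread {
	unsigned int state;
	int finalized;		/* set once the deletion hooks have run */
};

struct sched {
	struct thread *runthread;
	struct thread *zombie;	/* single slot instead of a queue */
};

/*
 * Delete a thread. A thread other than the running one is finalized
 * at once; the running thread is only marked and parked, since its
 * stack is still in use until the next context switch.
 */
static void delete_thread(struct sched *s, struct thread *t)
{
	t->state |= T_ZOMBIE;
	if (s->runthread == t) {
		s->zombie = t;
		return;
	}
	t->finalized = 1;
}

/* Switch to next, then reap the parked zombie from the new context. */
static void switch_to(struct sched *s, struct thread *next)
{
	s->runthread = next;
	if (s->zombie != NULL && s->zombie != s->runthread) {
		s->zombie->finalized = 1;	/* hooks run over non-current */
		s->zombie = NULL;
	}
}
```

The finalization in `switch_to()` runs while `runthread` is no longer the deleted thread, which is exactly the unmap-over-non-current situation mentioned above.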

Ok, could we try coding a damn Tetris instead? Pong, maybe? Gasp...

-- 
Philippe.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-27 23:34                     ` Philippe Gerum
@ 2008-01-28 11:02                       ` Gilles Chanteperdrix
  2008-01-28 12:18                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-28 11:02 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

On Jan 28, 2008 12:34 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>
> Gilles Chanteperdrix wrote:
> > Philippe Gerum wrote:
> >  > Gilles Chanteperdrix wrote:
> >  > > Philippe Gerum wrote:
> >  > >  > Gilles Chanteperdrix wrote:
> >  > >  > > On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> >  > >  > >> Gilles Chanteperdrix wrote:
> >  > >  > >>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> >  > >  > >>>> Gilles Chanteperdrix wrote:
> >  > >  > >>>>> Gilles Chanteperdrix wrote:
> >  > >  > >>>>>  > Please find attached a patch implementing these ideas. This adds some
> >  > >  > >>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
> >  > >  > >>>>>  >
> >  > >  > >>>>>
> >  > >  > >>>>> Ok. New version of the patch, this time split in two parts, should
> >  > >  > >>>>> hopefully make it more readable.
> >  > >  > >>>>>
> >  > >  > >>>> Ack. I'd suggest the following:
> >  > >  > >>>>
> >  > >  > >>>> - let's have a rate limiter when walking the zombie queue in
> >  > >  > >>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
> >  > >  > >>>> also introduces is the potential for flushing more than a single TCB at
> >  > >  > >>>> a time, which might not always be a cheap operation, depending on which
> >  > >  > >>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
> >  > >  > >>>> take for granted that no sane code would continuously create more
> >  > >  > >>>> threads than we would be able to finalize in a given time frame anyway.
> >  > >  > >>> The maximum number of zombies in the queue is
> >  > >  > >>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
> >  > >  > >>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
> >  > >  > >>> bit is armed.
> >  > >  > >> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
> >  > >  > >> thread deletion isn't cheap already.
> >  > >  > >
> >  > >  > > I am not sure that holding the nklock while we run the thread deletion
> >  > >  > > hooks is really needed.
> >  > >  > >
> >  > >  >
> >  > >  > Deletion hooks may currently rely on the following assumptions when running:
> >  > >  >
> >  > >  > - rescheduling is locked
> >  > >  > - nklock is held, interrupts are off
> >  > >  > - they run on behalf of the deletor context
> >  > >  >
> >  > >  > The self-delete refactoring currently kills #3 because we now run the
> >  > >  > hooks after the context switch, and would also kill #2 if we did not
> >  > >  > hold the nklock (btw, enabling the nucleus debug while running with this
> >  > >  > patch should raise an abort, from xnshadow_unmap, due to the second
> >  > >  > assertion).
> >  > >  >
> >  >
> >  > Forget about this; shadows are always exited in secondary mode, so
> >  > that's fine, i.e. xnpod_current_thread() != deleted thread, hence we
> >  > should always run the deletion hooks immediately on behalf of the caller.
> >
> > What happens if the watchdog kills a user-space thread which is
> > currently running in primary mode ? If I read xnpod_delete_thread
> > correctly, the SIGKILL signal is sent to the target thread only if it is
> > not the current thread.
> >
>
> I'd say: zombie queuing from xnpod_delete, then shadow unmap on behalf
> of the next switched context which would trigger the lo-stage unmap
> request -> wake_up_process against the Linux side and asbestos underwear
> provided by the relax epilogue, which would eventually reap the guy
> through do_exit(). As a matter of fact, we would still have the
> unmap-over-non-current issue, that's true.
>
> Ok, could we try coding a damn Tetris instead? Pong, maybe? Gasp...

Games for mobile phones then, because I am afraid games for consoles
or PCs are too complicated for me.

No, seriously, how do we solve this? Maybe we could relax from
xnpod_delete_thread?


-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-28 11:02                       ` Gilles Chanteperdrix
@ 2008-01-28 12:18                         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-28 12:18 UTC (permalink / raw)
  To: rpm, xenomai-core

Gilles Chanteperdrix wrote:
 > On Jan 28, 2008 12:34 AM, Philippe Gerum <rpm@xenomai.org> wrote:
 > >
 > > Gilles Chanteperdrix wrote:
 > > > Philippe Gerum wrote:
 > > >  > Gilles Chanteperdrix wrote:
 > > >  > > Philippe Gerum wrote:
 > > >  > >  > Gilles Chanteperdrix wrote:
 > > >  > >  > > On Jan 23, 2008 7:34 PM, Philippe Gerum <rpm@xenomai.org> wrote:
 > > >  > >  > >> Gilles Chanteperdrix wrote:
 > > >  > >  > >>> On Jan 23, 2008 6:48 PM, Philippe Gerum <rpm@xenomai.org> wrote:
 > > >  > >  > >>>> Gilles Chanteperdrix wrote:
 > > >  > >  > >>>>> Gilles Chanteperdrix wrote:
 > > >  > >  > >>>>>  > Please find attached a patch implementing these ideas. This adds some
 > > >  > >  > >>>>>  > clutter, which I would be happy to reduce. Better ideas are welcome.
 > > >  > >  > >>>>>  >
 > > >  > >  > >>>>>
 > > >  > >  > >>>>> Ok. New version of the patch, this time split in two parts, should
 > > >  > >  > >>>>> hopefully make it more readable.
 > > >  > >  > >>>>>
 > > >  > >  > >>>> Ack. I'd suggest the following:
 > > >  > >  > >>>>
 > > >  > >  > >>>> - let's have a rate limiter when walking the zombie queue in
 > > >  > >  > >>>> __xnpod_finalize_zombies. We hold the superlock here, and what the patch
 > > >  > >  > >>>> also introduces is the potential for flushing more than a single TCB at
 > > >  > >  > >>>> a time, which might not always be a cheap operation, depending on which
 > > >  > >  > >>>> cra^H^Hode runs on behalf of the deletion hooks for instance. We may
 > > >  > >  > >>>> take for granted that no sane code would continuously create more
 > > >  > >  > >>>> threads than we would be able to finalize in a given time frame anyway.
 > > >  > >  > >>> The maximum number of zombies in the queue is
 > > >  > >  > >>> 1 + XNARCH_WANT_UNLOCKED_CTXSW, since a zombie is added to the queue
 > > >  > >  > >>> only if a deleted thread is xnpod_current_thread(), or if the XNSWLOCK
 > > >  > >  > >>> bit is armed.
 > > >  > >  > >> Ack. rate_limit = 1? I'm really reluctant to increase the WCET here,
 > > >  > >  > >> thread deletion isn't cheap already.
 > > >  > >  > >
 > > >  > >  > > I am not sure that holding the nklock while we run the thread deletion
 > > >  > >  > > hooks is really needed.
 > > >  > >  > >
 > > >  > >  >
 > > >  > >  > Deletion hooks may currently rely on the following assumptions when running:
 > > >  > >  >
 > > >  > >  > - rescheduling is locked
 > > >  > >  > - nklock is held, interrupts are off
 > > >  > >  > - they run on behalf of the deletor context
 > > >  > >  >
 > > >  > >  > The self-delete refactoring currently kills #3 because we now run the
 > > >  > >  > hooks after the context switch, and would also kill #2 if we did not
 > > >  > >  > hold the nklock (btw, enabling the nucleus debug while running with this
 > > >  > >  > patch should raise an abort, from xnshadow_unmap, due to the second
 > > >  > >  > assertion).
 > > >  > >  >
 > > >  >
 > > >  > Forget about this; shadows are always exited in secondary mode, so
 > > >  > that's fine, i.e. xnpod_current_thread() != deleted thread, hence we
 > > >  > should always run the deletion hooks immediately on behalf of the caller.
 > > >
 > > > What happens if the watchdog kills a user-space thread which is
 > > > currently running in primary mode? If I read xnpod_delete_thread
 > > > correctly, the SIGKILL signal is sent to the target thread only if it is
 > > > not the current thread.
 > > >
 > >
 > > I'd say: zombie queuing from xnpod_delete, then shadow unmap on behalf
 > > of the next switched context which would trigger the lo-stage unmap
 > > request -> wake_up_process against the Linux side and asbestos underwear
 > > provided by the relax epilogue, which would eventually reap the guy
 > > through do_exit(). As a matter of fact, we would still have the
 > > unmap-over-non-current issue, that's true.
 > >
 > > Ok, could we try coding a damn Tetris instead? Pong, maybe? Gasp...
 > 
 > Games for mobile phones then, because I am afraid games for consoles
 > or PCs are too complicated for me.
 > 
 > No, seriously, how do we solve this? Maybe we could relax from
 > xnpod_delete_thread?

This will not work: xnpod_schedule() will not let xnshadow_relax() suspend
the current thread while in interrupt context.
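
To make the rate limiter suggested in the quoted discussion concrete,
here is a minimal sketch of a bounded zombie queue walk. It is a plain
user-space C model, not nucleus code: struct zombie, struct zombie_queue,
zombie_enqueue(), finalize_zombies() and ZOMBIE_RATE_LIMIT are all
hypothetical names, and the place where the deletion hooks would run is
only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCB waiting for finalization. */
struct zombie {
	struct zombie *next;
	int finalized;
};

/* Hypothetical singly linked zombie queue; head is the oldest entry. */
struct zombie_queue {
	struct zombie *head;
};

#define ZOMBIE_RATE_LIMIT 1	/* assumed bound: TCBs finalized per call */

/* Append a zombie at the tail of the queue. */
static void zombie_enqueue(struct zombie_queue *q, struct zombie *z)
{
	struct zombie **p = &q->head;

	while (*p)
		p = &(*p)->next;
	z->next = NULL;
	z->finalized = 0;
	*p = z;
}

/*
 * Finalize at most 'limit' zombies and return how many were processed.
 * Whatever is left stays queued for the next call, which bounds the
 * time spent with the (here imaginary) superlock held.
 */
static int finalize_zombies(struct zombie_queue *q, int limit)
{
	int done = 0;

	assert(limit >= 0);
	while (q->head && done < limit) {
		struct zombie *z = q->head;

		q->head = z->next;
		z->next = NULL;
		z->finalized = 1;	/* deletion hooks would run here */
		done++;
	}
	return done;
}
```

With a limit of 1, two queued zombies need two calls to
finalize_zombies(); since the queue is said above to hold at most
1 + XNARCH_WANT_UNLOCKED_CTXSW entries, such a bound would rarely
defer anything.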

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] High latencies on ARM.
  2008-01-17 15:37               ` Gilles Chanteperdrix
@ 2008-01-31  7:43                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2008-01-31  7:43 UTC (permalink / raw)
  To: Jan Kiszka, xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 5602 bytes --]

Gilles Chanteperdrix wrote:
 > On Jan 17, 2008 3:22 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > >
 > > Gilles Chanteperdrix wrote:
 > > > On Jan 17, 2008 3:16 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > > >> Gilles Chanteperdrix wrote:
 > > >>> On Jan 17, 2008 12:55 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > > >>>> Gilles Chanteperdrix wrote:
 > > >>>>> On Jan 17, 2008 11:42 AM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
 > > >>>>>> Gilles Chanteperdrix wrote:
 > > >>>>>>> Hi,
 > > >>>>>>>
 > > >>>>>>> after some (unsuccessful) time trying to instrument the code in a way
 > > >>>>>>> that does not change the latency results completely, I found the
 > > >>>>>>> reason for the high latency with latency -t 1 and latency -t 2 on ARM.
 > > >>>>>>> So, here comes an update on this issue. The culprit is the user-space
 > > >>>>>>> context switch, which flushes the processor cache with the nklock
 > > >>>>>>> locked, irqs off.
 > > >>>>>>>
 > > >>>>>>> There are two things we could do:
 > > >>>>>>> - arrange for the ARM cache flush to happen with the nklock unlocked
 > > >>>>>>> and irqs enabled. This will improve interrupt latency (latency -t 2)
 > > >>>>>>> but obviously not scheduling latency (latency -t 1). If we go that
 > > >>>>>>> way, there are several problems we should solve:
 > > >>>>>>>
 > > >>>>>>> we do not want interrupt handlers to reenter xnpod_schedule(), for
 > > >>>>>>> this we can use the XNLOCK bit, set on whatever is
 > > >>>>>>> xnpod_current_thread() when the cache flush occurs
 > > >>>>>>>
 > > >>>>>>> since the interrupt handler may modify the rescheduling bits, we need
 > > >>>>>>> to test these bits in xnpod_schedule() epilogue and restart
 > > >>>>>>> xnpod_schedule() if need be
 > > >>>>>>>
 > > >>>>>>> we do not want xnpod_delete_thread() to delete one of the two threads
 > > >>>>>>> involved in the context switch, for this the only solution I found is
 > > >>>>>>> to add a bit to the thread mask meaning that the thread is currently
 > > >>>>>>> switching, and to (re)test the XNZOMBIE bit in xnpod_schedule epilogue
 > > >>>>>>> to delete whatever thread was marked for deletion
 > > >>>>>>>
 > > >>>>>>> in case of migration with xnpod_migrate_thread, we do not want
 > > >>>>>>> xnpod_schedule() on the target CPU to switch to the migrated thread
 > > >>>>>>> before the context switch on the source CPU is finished, for this we
 > > >>>>>>> can avoid setting the resched bit in xnpod_migrate_thread(), detect
 > > >>>>>>> the condition in xnpod_schedule() epilogue and set the rescheduling
 > > >>>>>>> bits so that xnpod_schedule is restarted and send the IPI to the
 > > >>>>>>> target CPU.
 > > >>>>>>>
 > > >>>>>>> - avoid using user-space real-time tasks when running latency
 > > >>>>>>> kernel-space benches, i.e. at least in the latency -t 1 and latency -t
 > > >>>>>>> 2 case. This means that we should change the timerbench driver. There
 > > >>>>>>> are at least two ways of doing this:
 > > >>>>>>> use an rt_pipe
 > > >>>>>>>  modify the timerbench driver to implement only the nrt ioctl, using
 > > >>>>>>> vanilla linux services such as wait_event and wake_up.
 > > >>>>>> [As you reminded me of this unanswered question:]
 > > >>>>>> One may consider adding further modes _besides_ current kernel tests
 > > >>>>>> that do not rely on RTDM & native userland support (e.g. when
 > > >>>>>> CONFIG_XENO_OPT_PERVASIVE is disabled). But the current tests are valid
 > > >>>>>> scenarios as well that must not be killed by such a change.
 > > >>>>> I think the current test scenarios for latency -t 1 and latency -t 2
 > > >>>>> are a bit misleading: they measure kernel-space latencies in the
 > > >>>>> presence of user-space real-time tasks. When one runs latency -t 1
 > > >>>>> or latency -t 2, one would expect that there are only kernel-space
 > > >>>>> real-time tasks.
 > > >>>> Whether they are misleading depends on your perspective. In fact, they are
 > > >>>> measuring in-kernel scenarios over the standard Xenomai setup, which
 > > >>>> includes userland RT task activity these days. Those scenarios are mainly
 > > >>>> targeting driver use cases, not pure kernel-space applications.
 > > >>>>
 > > >>>> But I agree that, for !CONFIG_XENO_OPT_PERVASIVE-like scenarios, we
 > > >>>> would benefit from an additional set of test cases.
 > > >>> Ok, I will not touch timerbench then, and implement another kernel module.
 > > >>>
 > > >> [Without considering all details]
 > > >> To achieve this independence from user-space RT threads, it should suffice
 > > >> to implement a kernel-based frontend for timerbench. This frontend would
 > > >> then either dump to syslog or open some pipe to tell userland about the
 > > >> benchmark results. What do you think?
 > > >
 > > > My intent was to implement a protocol similar to the one of
 > > > timerbench, but using an rt-pipe, and continue to use the latency
 > > > test, adding new options such as -t 3 and -t 4. But there may be
 > > > problems with this approach: if we are compiling without
 > > > CONFIG_XENO_OPT_PERVASIVE, latency will not run at all. So, it is
 > > > probably simpler to implement a klatency that just reads from the
 > > > rt-pipe.
 > >
 > > But that klatency could perfectly reuse what timerbench already
 > > provides, without code changes to the latter, in theory.
 > 
 > That would be a kernel module then, but I also need some user-space
 > piece of software to do the computations and print the results.

Ok. Here comes, for review, a patch following your advice, which adds
the "klatency" test.
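
As a rough illustration of the data flow in the patch, the sketch below
models the pipe protocol in user space: the producer sends one
configuration record first, then a stream of result records, and the
consumer reads them back in that order. Everything here is an
assumption made for the example: struct demo_config, struct demo_result,
demo_produce() and demo_consume() are hypothetical simplifications of
the rttesting structures, and an ordinary POSIX pipe stands in for the
RT-pipe.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical, simplified stand-ins for the rttesting structures. */
struct demo_config {
	int mode;
	int priority;
	long long period_ns;
};

struct demo_result {
	long min_ns, avg_ns, max_ns;
};

/* Producer side: the role played by the kernel module in the patch. */
static int demo_produce(int wfd, const struct demo_config *cfg,
			const struct demo_result *res, int nres)
{
	int i;

	assert(nres >= 0);
	/* The configuration always goes first. */
	if (write(wfd, cfg, sizeof(*cfg)) != (ssize_t)sizeof(*cfg))
		return -1;
	for (i = 0; i < nres; i++)
		if (write(wfd, &res[i], sizeof(res[i]))
		    != (ssize_t)sizeof(res[i]))
			return -1;
	return 0;
}

/*
 * Consumer side: read the configuration once, then result records
 * until EOF or a short read. Returns the worst max latency seen,
 * or -1 if no complete result record was read.
 */
static long demo_consume(int rfd, struct demo_config *cfg)
{
	struct demo_result r;
	long worst = -1;
	ssize_t n;

	if (read(rfd, cfg, sizeof(*cfg)) != (ssize_t)sizeof(*cfg))
		return -1;
	while ((n = read(rfd, &r, sizeof(r))) == (ssize_t)sizeof(r))
		if (r.max_ns > worst)
			worst = r.max_ns;
	return worst;
}
```

A caller would create the pipe with pipe(), pass the write end to
demo_produce() and the read end to demo_consume(). Note that a POSIX
pipe is a byte stream, so the record boundaries only hold here because
the records are small and written whole.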

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-klatency.diff --]
[-- Type: text/plain, Size: 22898 bytes --]

Index: configure.in
===================================================================
--- configure.in	(revision 3452)
+++ configure.in	(working copy)
@@ -766,6 +766,7 @@ AC_CONFIG_FILES([ \
        	src/testsuite/switchtest/Makefile \
 	src/testsuite/irqbench/Makefile \
 	src/testsuite/clocktest/Makefile \
+       	src/testsuite/klatency/Makefile \
 	src/utils/Makefile \
 	src/utils/can/Makefile \
        	include/Makefile \
Index: src/testsuite/Makefile.am
===================================================================
--- src/testsuite/Makefile.am	(revision 3452)
+++ src/testsuite/Makefile.am	(working copy)
@@ -1 +1 @@
-SUBDIRS = latency switchbench cyclic switchtest irqbench clocktest
+SUBDIRS = latency switchbench cyclic switchtest irqbench clocktest klatency
Index: src/testsuite/klatency/runinfo.in
===================================================================
--- src/testsuite/klatency/runinfo.in	(revision 0)
+++ src/testsuite/klatency/runinfo.in	(revision 0)
@@ -0,0 +1 @@
+latency:native+rtdm+timerbench+klat:!@exec_prefix@domain.hid
Index: src/testsuite/klatency/Makefile.am
===================================================================
--- src/testsuite/klatency/Makefile.am	(revision 0)
+++ src/testsuite/klatency/Makefile.am	(revision 0)
@@ -0,0 +1,25 @@
+testdir = $(exec_prefix)/share/xenomai/testsuite/klatency
+
+bin_PROGRAMS = klatency
+
+klatency_SOURCES = klatency.c
+
+klatency_CPPFLAGS = \
+	@XENO_USER_CFLAGS@ \
+	-I$(top_srcdir)/include
+
+klatency_LDFLAGS = @XENO_USER_LDFLAGS@
+
+install-data-local:
+	$(mkinstalldirs) $(DESTDIR)$(testdir)
+	@sed -e's,@exec_prefix\@,$(exec_prefix),g' $(srcdir)/runinfo.in > $(DESTDIR)$(testdir)/.runinfo
+	@echo "\$${DESTDIR}$(exec_prefix)/bin/xeno-load \`dirname \$$0\` \$$*" > $(DESTDIR)$(testdir)/run
+	@chmod +x $(DESTDIR)$(testdir)/run
+
+uninstall-local:
+	$(RM) $(DESTDIR)$(testdir)/.runinfo $(DESTDIR)$(testdir)/run
+
+run: all
+	@$(top_srcdir)/scripts/xeno-load --verbose
+
+EXTRA_DIST = runinfo.in
Index: src/testsuite/klatency/klatency.c
===================================================================
--- src/testsuite/klatency/klatency.c	(revision 0)
+++ src/testsuite/klatency/klatency.c	(revision 0)
@@ -0,0 +1,217 @@
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+#include <getopt.h>
+#include <time.h>
+#include <errno.h>
+#include <rtdm/rttesting.h>
+
+long long period_ns = 0;
+int test_duration = 0;		/* sec of testing, via -T <sec>, 0 is inf */
+int data_lines = 21;		/* data lines per header line, -l <lines> to change */
+int quiet = 0;			/* suppress printing of RTH, RTD lines when -T given */
+int benchdev_no = -1;
+int benchdev = -1;
+int freeze_max;
+int priority;
+
+#define USER_TASK       0
+#define KERNEL_TASK     1
+#define TIMER_HANDLER   2
+
+int test_mode = USER_TASK;
+const char *test_mode_names[] = {
+	"periodic user-mode task",
+	"in-kernel periodic task",
+	"in-kernel timer handler"
+};
+
+time_t test_start, test_end;	/* report test duration */
+int test_loops = 0;		/* outer loop count */
+
+volatile sig_atomic_t finished = 0;	/* set from the signal handler */
+
+void display(void)
+{
+	struct rttst_interm_bench_res result;
+	int err, n = 0, got_results = 0;
+	time_t start, actual_duration;
+
+	time(&start);
+
+	printf("warming up...\n");
+
+	if (quiet)
+		fprintf(stderr, "running quietly for %d seconds\n",
+			test_duration);
+
+	while (!finished) {
+		long minj, gminj, maxj, gmaxj, avgj, goverrun;
+
+		err = read(benchdev, &result, sizeof(result));
+		if (err <= 0) {
+			fprintf(stderr, "read: %d, errno: %d\n", err, errno);
+			break;
+		}
+
+		got_results = 1;
+		minj = result.last.min;
+		gminj = result.overall.min;
+		avgj = result.last.avg;
+		maxj = result.last.max;
+		gmaxj = result.overall.max;
+		goverrun = result.overall.overruns;
+
+		if (!quiet) {
+			if (data_lines && (n++ % data_lines) == 0) {
+				time_t now, dt;
+				time(&now);
+				dt = now - start;
+				printf
+					("RTT|  %.2ld:%.2ld:%.2ld  (%s, %lld us period, "
+					 "priority %d)\n", dt / 3600,
+					 (dt / 60) % 60, dt % 60,
+					 test_mode_names[test_mode],
+					 period_ns / 1000, priority);
+				printf("RTH|%12s|%12s|%12s|%8s|%12s|%12s\n",
+				       "-----lat min", "-----lat avg",
+				       "-----lat max", "-overrun",
+				       "----lat best", "---lat worst");
+			}
+
+			printf("RTD|%12.3f|%12.3f|%12.3f|%8ld|%12.3f|%12.3f\n",
+			       (double)minj / 1000,
+			       (double)avgj / 1000,
+			       (double)maxj / 1000,
+			       goverrun,
+			       (double)gminj / 1000, (double)gmaxj / 1000);
+		}
+	}
+
+	time(&test_end);
+	actual_duration = test_end - test_start;
+	if (!test_duration)
+		test_duration = actual_duration;
+
+	if (got_results) {
+		long gminj, gmaxj, gavgj, goverrun;
+
+		gminj = result.overall.min;
+		gmaxj = result.overall.max;
+		goverrun = result.overall.overruns;
+		gavgj = result.overall.avg
+			/ ((result.overall.test_loops) > 1 ?
+			   result.overall.test_loops : 2) - 1;
+
+		printf("---|------------|------------|------------|--------|-------------------------\n"
+		       "RTS|%12.3f|%12.3f|%12.3f|%8ld|    %.2ld:%.2ld:%.2ld/%.2d:%.2d:%.2d\n",
+		       (double)gminj / 1000, (double)gavgj / 1000, (double)gmaxj / 1000,
+		       goverrun, actual_duration / 3600, (actual_duration / 60) % 60,
+		       actual_duration % 60, test_duration / 3600,
+		       (test_duration / 60) % 60, test_duration % 60);
+
+	}
+
+	if (benchdev >= 0)
+		close(benchdev);
+}
+
+void sighand(int sig __attribute__ ((unused)))
+{
+	finished = 1;
+}
+
+int main(int argc, char **argv)
+{
+	struct rttst_tmbench_config config;
+	int c;
+
+	while ((c = getopt(argc, argv, "l:T:qP:")) != EOF)
+		switch (c) {
+		case 'l':
+
+			data_lines = atoi(optarg);
+			break;
+
+		case 'T':
+
+			test_duration = atoi(optarg);
+			alarm(test_duration);
+			break;
+
+		case 'q':
+
+			quiet = 1;
+			break;
+
+		case 'P':
+
+			benchdev_no = atoi(optarg);
+			break;
+
+		default:
+
+			fprintf(stderr, "usage: klatency [options]\n"
+				"  [-l <data-lines per header>] # default=21, 0 to suppress headers\n"
+				"  [-T <test_duration_seconds>] # default=0, so ^C to end\n"
+				"  [-q]                         # suppresses RTD, RTH lines if -T is used\n"
+				"  [-P <rt_pipe_no>]            # number of testing pipe, default=auto\n");
+			exit(2);
+		}
+
+	if (!test_duration && quiet) {
+		fprintf(stderr,
+			"klatency: -q only works if -T has been given.\n");
+		quiet = 0;
+	}
+
+	time(&test_start);
+
+	signal(SIGINT, sighand);
+	signal(SIGTERM, sighand);
+	signal(SIGHUP, sighand);
+	signal(SIGALRM, sighand);
+
+	setlinebuf(stdout);
+
+	if (benchdev_no == -1) {
+		benchdev = open("/proc/xenomai/registry/native/pipes/klat_pipe",
+				O_RDONLY);
+		if (benchdev == -1) {
+			perror("open(/proc/xenomai/registry/native/pipes/klat_pipe)");
+			fprintf(stderr,
+				"modprobe klat_mod or try the -P option?\n");
+			exit(EXIT_FAILURE);
+		}
+	} else {
+		char devname[64];
+		snprintf(devname, sizeof(devname), "/dev/rtp%d", benchdev_no);
+		benchdev = open(devname, O_RDONLY);
+		if (benchdev == -1) {
+			fprintf(stderr, "open(%s): %s\n",
+				devname, strerror(errno));
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	if (read(benchdev, &config, sizeof(config)) == -1) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+
+	test_mode = config.mode;
+	priority = config.priority;
+	period_ns = config.period;
+	freeze_max = config.freeze_max;
+
+	printf("== Sampling period: %lld us\n"
+	       "== Test mode: %s\n"
+	       "== All results in microseconds\n",
+	       period_ns / 1000, test_mode_names[test_mode]);
+
+	display();
+
+	return 0;
+}
Index: ksrc/drivers/testing/Kconfig
===================================================================
--- ksrc/drivers/testing/Kconfig	(revision 3452)
+++ ksrc/drivers/testing/Kconfig	(working copy)
@@ -22,4 +22,10 @@ config XENO_DRIVERS_SWITCHTEST
 	Kernel-based driver for unit testing context switches and
 	FPU switches.
 
+config XENO_DRIVERS_KLATENCY
+	depends on XENO_DRIVERS_TIMERBENCH
+	tristate "Kernel-only latency measurement module"
+	help
+	Kernel module for kernel-only latency measurement.
+
 endmenu
Index: ksrc/drivers/testing/Config.in
===================================================================
--- ksrc/drivers/testing/Config.in	(revision 3452)
+++ ksrc/drivers/testing/Config.in	(working copy)
@@ -11,4 +11,6 @@ dep_tristate 'IRQ benchmark driver' CONF
 
 dep_tristate 'Context switches test driver' CONFIG_XENO_DRIVERS_SWITCHTEST $CONFIG_XENO_SKIN_RTDM
 
+dep_tristate 'Kernel-only latency measurement module' CONFIG_XENO_DRIVERS_KLATENCY $CONFIG_XENO_DRIVERS_TIMERBENCH
+
 endmenu
Index: ksrc/drivers/testing/klat.c
===================================================================
--- ksrc/drivers/testing/klat.c	(revision 0)
+++ ksrc/drivers/testing/klat.c	(revision 0)
@@ -0,0 +1,156 @@
+/*
+ * Copyright (C) 2008 Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>.
+ *
+ * Xenomai is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Xenomai is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Xenomai; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <native/pipe.h>
+#include <native/task.h>
+#include <rtdm/rttesting.h>
+
+#define DEV_NR_MAX 256
+
+static int pipe = P_MINOR_AUTO;
+module_param(pipe, int, 0400);
+MODULE_PARM_DESC(pipe, "Index of the RT-pipe used for first connection"
+		 " (-1, the default, means automatic minor allocation)");
+
+static int mode = 1;
+module_param(mode, int, 0400);
+MODULE_PARM_DESC(mode, "Test mode (1 for kernel task, 2 for timer handler)");
+
+static int priority = 99;
+module_param(priority, int, 0400);
+MODULE_PARM_DESC(priority, "Kernel task priority");
+
+static unsigned period = 100;
+module_param(period, uint, 0400);
+MODULE_PARM_DESC(period, "Sampling period, in microseconds");
+
+static int freeze_max = 0;
+module_param(freeze_max, int, 0400);
+MODULE_PARM_DESC(freeze_max, "Freeze trace for each new max latency");
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("gilles.chanteperdrix@xenomai.org");
+
+static RT_TASK klat_srvr;
+static RT_PIPE klat_pipe;
+static int fd;
+
+static void klat_server(void *cookie)
+{
+	struct rttst_interm_bench_res res;
+	int err;
+
+	for (;;) {
+		err = rt_dev_ioctl(fd, RTTST_RTIOC_INTERM_BENCH_RES, &res);
+		if (err) {
+			if (err != -EIDRM)
+				printk("rt_dev_ioctl(RTTST_RTIOC_INTERM_BENCH_RES): %d\n",
+				       err);
+			return;
+		}
+
+		/* Do not check rt_pipe_write return value, the pipe may well be
+		   full. */
+		rt_pipe_write(&klat_pipe, &res, sizeof(res), P_NORMAL);
+	}
+}
+
+static int __init klat_mod_init(void)
+{
+	char devname[RTDM_MAX_DEVNAME_LEN + 1];
+	struct rttst_tmbench_config config;
+	unsigned dev_nr;
+	int err;
+
+	err = rt_pipe_create(&klat_pipe, "klat_pipe", pipe, 4096);
+	if (err) {
+		printk("rt_pipe_create(klat_pipe): %d\n", err);
+		return err;
+	}
+
+	err = rt_task_create(&klat_srvr, "klat_srvr", 0, 0, 0);
+	if (err) {
+		printk("rt_task_create(klat_srvr): %d\n", err);
+		goto err_close_pipe;
+	}
+
+	config.mode = mode;
+	config.priority = priority;
+	config.period = period * 1000;
+	config.warmup_loops = 1;
+	config.histogram_size = 0;
+	config.freeze_max = freeze_max;
+
+	for (dev_nr = 0; dev_nr < DEV_NR_MAX; dev_nr++) {
+		snprintf(devname, sizeof(devname), "rttest%d", dev_nr);
+		fd = rt_dev_open(devname, O_RDONLY);
+		if (fd < 0)
+			continue;
+
+		err = rt_dev_ioctl(fd, RTTST_RTIOC_TMBENCH_START, &config);
+		if (err == -ENOTTY) {
+			rt_dev_close(fd);
+			continue;
+		}
+
+		if (err < 0) {
+			printk("rt_dev_ioctl(RTTST_RTIOC_TMBENCH_START): %d\n",
+			       err);
+			goto err_close_dev;
+		}
+
+		break;
+	}
+	if (fd < 0) {
+		printk("rt_dev_open: could not find rttest device\n"
+		       "(modprobe timerbench?)\n");
+		return fd;
+	}
+
+	err = rt_pipe_write(&klat_pipe, &config, sizeof(config), P_NORMAL);
+	if (err < 0) {
+		printk("rt_pipe_write: %d\n", err);
+		goto err_close_dev;
+	}
+
+	err = rt_task_start(&klat_srvr, &klat_server, NULL);
+	if (err) {
+		printk("rt_task_start: %d\n", err);
+		goto err_close_dev;
+	}
+	
+	return 0;
+
+  err_close_dev:
+	rt_dev_close(fd);
+	rt_task_delete(&klat_srvr);
+  err_close_pipe:
+	rt_pipe_delete(&klat_pipe);
+	return err;
+}
+
+
+static void klat_mod_exit(void)
+{
+	rt_dev_close(fd);
+	rt_task_delete(&klat_srvr);
+	rt_pipe_delete(&klat_pipe);
+}
+
+module_init(klat_mod_init);
+module_exit(klat_mod_exit);
Index: ksrc/drivers/testing/Makefile
===================================================================
--- ksrc/drivers/testing/Makefile	(revision 3452)
+++ ksrc/drivers/testing/Makefile	(working copy)
@@ -7,6 +7,7 @@ EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclu
 obj-$(CONFIG_XENO_DRIVERS_TIMERBENCH) += xeno_timerbench.o
 obj-$(CONFIG_XENO_DRIVERS_IRQBENCH)   += xeno_irqbench.o
 obj-$(CONFIG_XENO_DRIVERS_SWITCHTEST) += xeno_switchtest.o
+obj-$(CONFIG_XENO_DRIVERS_KLATENCY)   += xeno_klat.o
 
 xeno_timerbench-y := timerbench.o
 
@@ -14,6 +15,8 @@ xeno_irqbench-y := irqbench.o
 
 xeno_switchtest-y := switchtest.o
 
+xeno_klat-y := klat.o
+
 EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclude/xenomai
 
 else
@@ -25,14 +28,17 @@ O_TARGET := built-in.o
 obj-$(CONFIG_XENO_DRIVERS_TIMERBENCH) += xeno_timerbench.o
 obj-$(CONFIG_XENO_DRIVERS_IRQBENCH)   += xeno_irqbench.o
 obj-$(CONFIG_XENO_DRIVERS_SWITCHTEST) += xeno_switchtest.o
-
+obj-$(CONFIG_XENO_DRIVERS_KLATENCY)   += xeno_klat.o
 xeno_timerbench-objs := timerbench.o
 
 xeno_irqbench-objs := irqbench.o
 
 xeno_switchtest-objs := switchtest.o
 
-export-objs := $(xeno_timerbench-objs) $(xeno_irqbench-objs) $(xeno_switchtest-objs)
+xeno_klat-objs := klat.o
+
+export-objs := $(xeno_timerbench-objs) $(xeno_irqbench-objs) \
+	$(xeno_switchtest-objs) $(xeno_klat-objs)
 
 EXTRA_CFLAGS += -D__IN_XENOMAI__ -I$(TOPDIR)/include/xenomai -I$(TOPDIR)/include/xenomai/compat
 
@@ -47,4 +53,7 @@ xeno_irqbench.o: $(xeno_irqbench-objs)
 xeno_switchtest.o: $(xeno_switchtest-objs)
 	$(LD) -r -o $@ $(xeno_switchtest-objs)
 
+xeno_klat.o: $(xeno_klat-objs)
+	$(LD) -r -o $@ $(xeno_klat-objs)
+
 endif
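
One point worth checking in klat_mod_init() above: when no rttest device
is found, the function returns fd directly, whereas the other failure
paths unwind through the err_close_dev and err_close_pipe labels. The
sketch below is a small user-space model of a complete reverse-order
unwind. It is not Xenomai code: struct res_state, acquire(), release()
and demo_init() are hypothetical helpers standing in for the pipe, task
and device setup calls.

```c
#include <assert.h>

/* Hypothetical resource tracker: each flag models one acquired object. */
struct res_state {
	int pipe, task, dev;
};

/* Mark a resource as acquired unless the caller forces a failure. */
static int acquire(int *flag, int fail)
{
	if (fail)
		return -1;
	*flag = 1;
	return 0;
}

static void release(int *flag)
{
	assert(*flag);	/* releasing something never acquired is a bug */
	*flag = 0;
}

/*
 * Acquire pipe, task and device in that order. 'fail_at' (1..3) forces
 * the matching step to fail; 0 means success. Every failure path
 * releases exactly what was acquired before it, in reverse order.
 */
static int demo_init(struct res_state *s, int fail_at)
{
	int err;

	err = acquire(&s->pipe, fail_at == 1);
	if (err)
		return err;

	err = acquire(&s->task, fail_at == 2);
	if (err)
		goto err_delete_pipe;

	err = acquire(&s->dev, fail_at == 3);
	if (err)
		goto err_delete_task;

	return 0;

err_delete_task:
	release(&s->task);
err_delete_pipe:
	release(&s->pipe);
	return err;
}
```

In this model, a failure at the device step leaves no flag set, which is
the property the direct return in the patch does not provide for the
pipe and the task.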


Thread overview: 29+ messages
2008-01-02 10:31 [Xenomai-core] High latencies on ARM Gilles Chanteperdrix
2008-01-17 10:42 ` Jan Kiszka
2008-01-17 10:47   ` Gilles Chanteperdrix
2008-01-17 11:55     ` Jan Kiszka
2008-01-17 13:59       ` Gilles Chanteperdrix
2008-01-17 14:16         ` Jan Kiszka
2008-01-17 14:18           ` Jan Kiszka
2008-01-17 14:20           ` Gilles Chanteperdrix
2008-01-17 14:22             ` Jan Kiszka
2008-01-17 15:37               ` Gilles Chanteperdrix
2008-01-31  7:43                 ` Gilles Chanteperdrix
2008-01-21 21:55               ` Gilles Chanteperdrix
2008-01-22 20:36 ` Gilles Chanteperdrix
2008-01-22 21:46   ` Jan Kiszka
2008-01-22 22:13     ` Gilles Chanteperdrix
2008-01-22 22:22     ` Gilles Chanteperdrix
     [not found] ` <18315.63245.160672.547658@domain.hid>
2008-01-22 20:36   ` Gilles Chanteperdrix
2008-01-23 17:48     ` Philippe Gerum
2008-01-23 17:53       ` Gilles Chanteperdrix
2008-01-23 18:34         ` Philippe Gerum
2008-01-23 18:39           ` Gilles Chanteperdrix
2008-01-23 22:38           ` Gilles Chanteperdrix
2008-01-24 10:18           ` Gilles Chanteperdrix
2008-01-26 18:17             ` Philippe Gerum
2008-01-26 18:43               ` Gilles Chanteperdrix
2008-01-27  0:19                 ` Philippe Gerum
     [not found]                   ` <18333.3277.36164.63798@domain.hid>
2008-01-27 23:34                     ` Philippe Gerum
2008-01-28 11:02                       ` Gilles Chanteperdrix
2008-01-28 12:18                         ` Gilles Chanteperdrix
