* [Xenomai] Xenomai-forge: thread using 100% cpu load
@ 2013-02-28 19:19 Ronny Meeus
2013-02-28 20:10 ` Gilles Chanteperdrix
0 siblings, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-02-28 19:19 UTC (permalink / raw)
To: xenomai
Hello
we are using the PSOS interface of Xenomai forge, running completely
in user-space using the mercury code.
We deploy our application on different processors, one product is
running on PPC multicore (P4040, P4080, P4034) and another one on
Cavium (8 core device).
The Linux version we use is 2.6.32 but I would assume that this is not
so relevant.
Our Xenomai application is running on one of the cores (affinity is
set), while the other cores are running other code.
On both architectures we recently started to see one thread
consuming 100% of the core on which the application is pinned.
The thread that monopolizes the core is the thread internally used to
manage the timers, running at the highest priority.
The trigger for running into this behavior is currently unclear.
If we only start a part of the application (platform management only),
the issue is not observed.
We see this on both an old version of Xenomai and a very recent one
(pulled from the git repo yesterday).
I will continue to debug this issue in the coming days and try to isolate
the code that is triggering it, but I could use hints from the
community.
Debugging is complex since, once the load starts, the debugger no longer
responds.
If I put breakpoints in the functions that are called when the timer
expires (both oneshot and periodic), the process starts to clone
itself and I end up with tens of them.
Has anybody seen an issue like this before, or does somebody have some
hints on how to debug this problem?
Many thanks.
---
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 19:19 [Xenomai] Xenomai-forge: thread using 100% cpu load Ronny Meeus
@ 2013-02-28 20:10 ` Gilles Chanteperdrix
2013-02-28 20:22 ` Thomas De Schampheleire
2013-02-28 20:30 ` Ronny Meeus
0 siblings, 2 replies; 30+ messages in thread
From: Gilles Chanteperdrix @ 2013-02-28 20:10 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 02/28/2013 08:19 PM, Ronny Meeus wrote:
> Hello
>
> we are using the PSOS interface of Xenomai forge, running completely
> in user-space using the mercury code.
> We deploy our application on different processors, one product is
> running on PPC multicore (P4040, P4080, P4034) and another one on
> Cavium (8 core device).
> The Linux version we use is 2.6.32 but I would assume that this is not
> so relevant.
>
> Our Xenomai application is running on one of the cores (affinity is
> set), while the other cores are running other code.
>
> On both architectures we recently start to see issues that one thread
> is consuming 100% of the core on which the application is pinned.
> The thread that monopolizes the core is the thread internally used to
> manage the timers, running at the highest priority.
> The trigger for running into this behavior is currently unclear.
> If we only start a part of the application (platform management only),
> the issue is not observed.
> We see this on both an old version of Xenomai and a very recent one
> (pulled from the git repo yesterday).
>
> I will continue to debug this issue in the coming days and try isolate
> the code that is triggering it, but I can use hints from the
> community.
> Debugging is complex since once the load starts, the debugger is not
> reacting anymore.
> If I put breakpoints in the functions that are called when the timer
> expires (both oneshot and periodic), the process starts to clone
> itself and I endup with tens of them.
>
> Has anybody seen an issue like this before or does somebody has some
> hints on how to debug this problem?
First enable the watchdog. It will send a signal to the application when
detecting a problem, then you can use the watchdog to trigger an I-pipe
tracer trace when the bug happens. You will probably have to increase
the watchdog polling frequency, in order to have a meaningful trace.
--
Gilles.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 20:10 ` Gilles Chanteperdrix
@ 2013-02-28 20:22 ` Thomas De Schampheleire
2013-02-28 20:27 ` Gilles Chanteperdrix
2013-03-01 8:22 ` Philippe Gerum
2013-02-28 20:30 ` Ronny Meeus
1 sibling, 2 replies; 30+ messages in thread
From: Thomas De Schampheleire @ 2013-02-28 20:22 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>
>> Hello
>>
>> we are using the PSOS interface of Xenomai forge, running completely
>> in user-space using the mercury code.
>> We deploy our application on different processors, one product is
>> running on PPC multicore (P4040, P4080, P4034) and another one on
>> Cavium (8 core device).
>> The Linux version we use is 2.6.32 but I would assume that this is not
>> so relevant.
>>
>> Our Xenomai application is running on one of the cores (affinity is
>> set), while the other cores are running other code.
>>
>> On both architectures we recently start to see issues that one thread
>> is consuming 100% of the core on which the application is pinned.
>> The thread that monopolizes the core is the thread internally used to
>> manage the timers, running at the highest priority.
>> The trigger for running into this behavior is currently unclear.
>> If we only start a part of the application (platform management only),
>> the issue is not observed.
>> We see this on both an old version of Xenomai and a very recent one
>> (pulled from the git repo yesterday).
>>
>> I will continue to debug this issue in the coming days and try isolate
>> the code that is triggering it, but I can use hints from the
>> community.
>> Debugging is complex since once the load starts, the debugger is not
>> reacting anymore.
>> If I put breakpoints in the functions that are called when the timer
>> expires (both oneshot and periodic), the process starts to clone
>> itself and I endup with tens of them.
>>
>> Has anybody seen an issue like this before or does somebody has some
>> hints on how to debug this problem?
>
>
> First enable the watchdog. It will send a signal to the application when
> detecting a problem, then you can use the watchdog to trigger an I-pipe
> tracer trace when the bug happens. You will probably have to increase
> the watchdog polling frequency, in order to have a meaningful trace.
>
I don't think the I-pipe tracer can be used with the Mercury
core (xenomai-forge), right?
Best regards,
Thomas
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 20:22 ` Thomas De Schampheleire
@ 2013-02-28 20:27 ` Gilles Chanteperdrix
2013-03-01 8:22 ` Philippe Gerum
1 sibling, 0 replies; 30+ messages in thread
From: Gilles Chanteperdrix @ 2013-02-28 20:27 UTC (permalink / raw)
To: Thomas De Schampheleire; +Cc: xenomai
On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>
>>> Hello
>>>
>>> we are using the PSOS interface of Xenomai forge, running completely
>>> in user-space using the mercury code.
>>> We deploy our application on different processors, one product is
>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>> Cavium (8 core device).
>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>> so relevant.
>>>
>>> Our Xenomai application is running on one of the cores (affinity is
>>> set), while the other cores are running other code.
>>>
>>> On both architectures we recently start to see issues that one thread
>>> is consuming 100% of the core on which the application is pinned.
>>> The thread that monopolizes the core is the thread internally used to
>>> manage the timers, running at the highest priority.
>>> The trigger for running into this behavior is currently unclear.
>>> If we only start a part of the application (platform management only),
>>> the issue is not observed.
>>> We see this on both an old version of Xenomai and a very recent one
>>> (pulled from the git repo yesterday).
>>>
>>> I will continue to debug this issue in the coming days and try isolate
>>> the code that is triggering it, but I can use hints from the
>>> community.
>>> Debugging is complex since once the load starts, the debugger is not
>>> reacting anymore.
>>> If I put breakpoints in the functions that are called when the timer
>>> expires (both oneshot and periodic), the process starts to clone
>>> itself and I endup with tens of them.
>>>
>>> Has anybody seen an issue like this before or does somebody has some
>>> hints on how to debug this problem?
>>
>>
>> First enable the watchdog. It will send a signal to the application when
>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>> tracer trace when the bug happens. You will probably have to increase
>> the watchdog polling frequency, in order to have a meaningful trace.
>>
>
> I don't think an I-pipe tracer will be possible when using the Mercury
> core, right (xenomai-forge) ?
As the name indicates the "I-PIPE tracer" has nothing to do with the
version of Xenomai you use. Anyway, you can first try to trigger a
backtrace when the watchdog sends its signal, in case the bug is in the
application and not in the kernel. You never know...
--
Gilles.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 20:10 ` Gilles Chanteperdrix
2013-02-28 20:22 ` Thomas De Schampheleire
@ 2013-02-28 20:30 ` Ronny Meeus
2013-02-28 20:35 ` Gilles Chanteperdrix
1 sibling, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-02-28 20:30 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>
>> Hello
>>
>> we are using the PSOS interface of Xenomai forge, running completely
>> in user-space using the mercury code.
>> We deploy our application on different processors, one product is
>> running on PPC multicore (P4040, P4080, P4034) and another one on
>> Cavium (8 core device).
>> The Linux version we use is 2.6.32 but I would assume that this is not
>> so relevant.
>>
>> Our Xenomai application is running on one of the cores (affinity is
>> set), while the other cores are running other code.
>>
>> On both architectures we recently start to see issues that one thread
>> is consuming 100% of the core on which the application is pinned.
>> The thread that monopolizes the core is the thread internally used to
>> manage the timers, running at the highest priority.
>> The trigger for running into this behavior is currently unclear.
>> If we only start a part of the application (platform management only),
>> the issue is not observed.
>> We see this on both an old version of Xenomai and a very recent one
>> (pulled from the git repo yesterday).
>>
>> I will continue to debug this issue in the coming days and try isolate
>> the code that is triggering it, but I can use hints from the
>> community.
>> Debugging is complex since once the load starts, the debugger is not
>> reacting anymore.
>> If I put breakpoints in the functions that are called when the timer
>> expires (both oneshot and periodic), the process starts to clone
>> itself and I endup with tens of them.
>>
>> Has anybody seen an issue like this before or does somebody has some
>> hints on how to debug this problem?
>
>
> First enable the watchdog. It will send a signal to the application when
> detecting a problem, then you can use the watchdog to trigger an I-pipe
> tracer trace when the bug happens. You will probably have to increase
> the watchdog polling frequency, in order to have a meaningful trace.
>
> --
> Gilles.
Gilles,
We are running completely in user-space (mercury).
I thought that the watchdog and I-pipe tracer are only relevant when
using the cobalt code.
In case my assumption is wrong, please correct me and let me know how
to enable it.
---
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 20:30 ` Ronny Meeus
@ 2013-02-28 20:35 ` Gilles Chanteperdrix
0 siblings, 0 replies; 30+ messages in thread
From: Gilles Chanteperdrix @ 2013-02-28 20:35 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 02/28/2013 09:30 PM, Ronny Meeus wrote:
> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>
>>> Hello
>>>
>>> we are using the PSOS interface of Xenomai forge, running completely
>>> in user-space using the mercury code.
>>> We deploy our application on different processors, one product is
>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>> Cavium (8 core device).
>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>> so relevant.
>>>
>>> Our Xenomai application is running on one of the cores (affinity is
>>> set), while the other cores are running other code.
>>>
>>> On both architectures we recently start to see issues that one thread
>>> is consuming 100% of the core on which the application is pinned.
>>> The thread that monopolizes the core is the thread internally used to
>>> manage the timers, running at the highest priority.
>>> The trigger for running into this behavior is currently unclear.
>>> If we only start a part of the application (platform management only),
>>> the issue is not observed.
>>> We see this on both an old version of Xenomai and a very recent one
>>> (pulled from the git repo yesterday).
>>>
>>> I will continue to debug this issue in the coming days and try isolate
>>> the code that is triggering it, but I can use hints from the
>>> community.
>>> Debugging is complex since once the load starts, the debugger is not
>>> reacting anymore.
>>> If I put breakpoints in the functions that are called when the timer
>>> expires (both oneshot and periodic), the process starts to clone
>>> itself and I endup with tens of them.
>>>
>>> Has anybody seen an issue like this before or does somebody has some
>>> hints on how to debug this problem?
>>
>>
>> First enable the watchdog. It will send a signal to the application when
>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>> tracer trace when the bug happens. You will probably have to increase
>> the watchdog polling frequency, in order to have a meaningful trace.
>>
>> --
>> Gilles.
>
> Gilles,
>
> We are running completely in user-space (mercury) .
cobalt also runs in user-space.
> I thought that the watchdog and I-pipe tracer are only relevant when
> using the cobalt code.
> In case my assumption is wrong, please correct me and let me know how
> to enable it.
Yes, if you are using plain linux, there are even more tools to debug
the problem:
- you can enable RT throttling to avoid the machine lockup by the buggy
thread
- you can enable the kernel's lockup detection, which targets just your case
(CONFIG_LOCKUP_DETECTOR)
- if you are on x86 you can use the NMI watchdog
- you can use FTRACE instead of the I-pipe tracer
- or you can decide to compile the kernel with CONFIG_IPIPE and
CONFIG_IPIPE_TRACE to use the I-pipe tracer without Xenomai.
- maybe xenomai-forge's "slackspot" tool works for Mercury?
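
For the RT throttling item above, a small sketch that reports the current
setting, assuming the standard sched_rt_runtime_us and sched_rt_period_us
sysctl files under /proc/sys/kernel. The helper names are hypothetical. A
negative runtime means throttling is disabled, in which case a spinning
SCHED_FIFO thread can hold its core indefinitely.

```c
#include <stdio.h>

#define RT_RUNTIME_PATH "/proc/sys/kernel/sched_rt_runtime_us"
#define RT_PERIOD_PATH  "/proc/sys/kernel/sched_rt_period_us"

/* Read one decimal integer from a file; returns 0 on success, -1 on error. */
int read_long(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/*
 * Share of each period that real-time tasks may consume, in percent.
 * Returns -1.0 when throttling is disabled (negative runtime) or the
 * period is not positive.
 */
double rt_share_percent(long runtime_us, long period_us)
{
    if (runtime_us < 0 || period_us <= 0)
        return -1.0;
    return 100.0 * (double)runtime_us / (double)period_us;
}

/* Print the current RT throttling setting; returns 0 on success. */
int print_rt_throttling(void)
{
    long runtime_us, period_us;
    double share;

    if (read_long(RT_RUNTIME_PATH, &runtime_us) != 0 ||
        read_long(RT_PERIOD_PATH, &period_us) != 0)
        return -1;

    share = rt_share_percent(runtime_us, period_us);
    if (share < 0.0)
        printf("RT throttling is disabled\n");
    else
        printf("RT tasks may use %.1f%% of each %ld us period\n",
               share, period_us);
    return 0;
}
```

With the usual defaults of 950000 and 1000000, rt_share_percent() gives
95, which leaves 5% of each second for non-real-time tasks such as a
shell or a debugger.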
--
Gilles.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-02-28 20:22 ` Thomas De Schampheleire
2013-02-28 20:27 ` Gilles Chanteperdrix
@ 2013-03-01 8:22 ` Philippe Gerum
2013-03-01 8:26 ` Gilles Chanteperdrix
1 sibling, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-01 8:22 UTC (permalink / raw)
To: Thomas De Schampheleire; +Cc: xenomai
On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>
>>> Hello
>>>
>>> we are using the PSOS interface of Xenomai forge, running completely
>>> in user-space using the mercury code.
>>> We deploy our application on different processors, one product is
>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>> Cavium (8 core device).
>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>> so relevant.
>>>
>>> Our Xenomai application is running on one of the cores (affinity is
>>> set), while the other cores are running other code.
>>>
>>> On both architectures we recently start to see issues that one thread
>>> is consuming 100% of the core on which the application is pinned.
>>> The thread that monopolizes the core is the thread internally used to
>>> manage the timers, running at the highest priority.
>>> The trigger for running into this behavior is currently unclear.
>>> If we only start a part of the application (platform management only),
>>> the issue is not observed.
>>> We see this on both an old version of Xenomai and a very recent one
>>> (pulled from the git repo yesterday).
>>>
>>> I will continue to debug this issue in the coming days and try isolate
>>> the code that is triggering it, but I can use hints from the
>>> community.
>>> Debugging is complex since once the load starts, the debugger is not
>>> reacting anymore.
>>> If I put breakpoints in the functions that are called when the timer
>>> expires (both oneshot and periodic), the process starts to clone
>>> itself and I endup with tens of them.
>>>
>>> Has anybody seen an issue like this before or does somebody has some
>>> hints on how to debug this problem?
>>
>>
>> First enable the watchdog. It will send a signal to the application when
>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>> tracer trace when the bug happens. You will probably have to increase
>> the watchdog polling frequency, in order to have a meaningful trace.
>>
>
> I don't think an I-pipe tracer will be possible when using the Mercury
> core, right (xenomai-forge) ?
>
Correct.
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-01 8:22 ` Philippe Gerum
@ 2013-03-01 8:26 ` Gilles Chanteperdrix
2013-03-01 8:30 ` Philippe Gerum
0 siblings, 1 reply; 30+ messages in thread
From: Gilles Chanteperdrix @ 2013-03-01 8:26 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On 03/01/2013 09:22 AM, Philippe Gerum wrote:
> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>
>>>> Hello
>>>>
>>>> we are using the PSOS interface of Xenomai forge, running completely
>>>> in user-space using the mercury code.
>>>> We deploy our application on different processors, one product is
>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>> Cavium (8 core device).
>>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>>> so relevant.
>>>>
>>>> Our Xenomai application is running on one of the cores (affinity is
>>>> set), while the other cores are running other code.
>>>>
>>>> On both architectures we recently start to see issues that one thread
>>>> is consuming 100% of the core on which the application is pinned.
>>>> The thread that monopolizes the core is the thread internally used to
>>>> manage the timers, running at the highest priority.
>>>> The trigger for running into this behavior is currently unclear.
>>>> If we only start a part of the application (platform management only),
>>>> the issue is not observed.
>>>> We see this on both an old version of Xenomai and a very recent one
>>>> (pulled from the git repo yesterday).
>>>>
>>>> I will continue to debug this issue in the coming days and try isolate
>>>> the code that is triggering it, but I can use hints from the
>>>> community.
>>>> Debugging is complex since once the load starts, the debugger is not
>>>> reacting anymore.
>>>> If I put breakpoints in the functions that are called when the timer
>>>> expires (both oneshot and periodic), the process starts to clone
>>>> itself and I endup with tens of them.
>>>>
>>>> Has anybody seen an issue like this before or does somebody has some
>>>> hints on how to debug this problem?
>>>
>>>
>>> First enable the watchdog. It will send a signal to the application when
>>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>>> tracer trace when the bug happens. You will probably have to increase
>>> the watchdog polling frequency, in order to have a meaningful trace.
>>>
>>
>> I don't think an I-pipe tracer will be possible when using the Mercury
>> core, right (xenomai-forge) ?
>>
>
> Correct.
I do not think so. The way I see it, you can enable the I-pipe tracer
without CONFIG_XENOMAI.
--
Gilles.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-01 8:26 ` Gilles Chanteperdrix
@ 2013-03-01 8:30 ` Philippe Gerum
2013-03-01 8:30 ` Gilles Chanteperdrix
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-01 8:30 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>
>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> we are using the PSOS interface of Xenomai forge, running completely
>>>>> in user-space using the mercury code.
>>>>> We deploy our application on different processors, one product is
>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>> Cavium (8 core device).
>>>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>>>> so relevant.
>>>>>
>>>>> Our Xenomai application is running on one of the cores (affinity is
>>>>> set), while the other cores are running other code.
>>>>>
>>>>> On both architectures we recently start to see issues that one thread
>>>>> is consuming 100% of the core on which the application is pinned.
>>>>> The thread that monopolizes the core is the thread internally used to
>>>>> manage the timers, running at the highest priority.
>>>>> The trigger for running into this behavior is currently unclear.
>>>>> If we only start a part of the application (platform management only),
>>>>> the issue is not observed.
>>>>> We see this on both an old version of Xenomai and a very recent one
>>>>> (pulled from the git repo yesterday).
>>>>>
>>>>> I will continue to debug this issue in the coming days and try isolate
>>>>> the code that is triggering it, but I can use hints from the
>>>>> community.
>>>>> Debugging is complex since once the load starts, the debugger is not
>>>>> reacting anymore.
>>>>> If I put breakpoints in the functions that are called when the timer
>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>> itself and I endup with tens of them.
>>>>>
>>>>> Has anybody seen an issue like this before or does somebody has some
>>>>> hints on how to debug this problem?
>>>>
>>>>
>>>> First enable the watchdog. It will send a signal to the application when
>>>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>>>> tracer trace when the bug happens. You will probably have to increase
>>>> the watchdog polling frequency, in order to have a meaningful trace.
>>>>
>>>
>>> I don't think an I-pipe tracer will be possible when using the Mercury
>>> core, right (xenomai-forge) ?
>>>
>>
>> Correct.
>
>
> I do not think so. The way I see it, you can enable the I-pipe tracer
> without CONFIG_XENOMAI.
>
Mercury has NO pipeline in the kernel.
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-01 8:30 ` Philippe Gerum
@ 2013-03-01 8:30 ` Gilles Chanteperdrix
2013-03-01 8:41 ` Philippe Gerum
0 siblings, 1 reply; 30+ messages in thread
From: Gilles Chanteperdrix @ 2013-03-01 8:30 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On 03/01/2013 09:30 AM, Philippe Gerum wrote:
> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>
>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> we are using the PSOS interface of Xenomai forge, running completely
>>>>>> in user-space using the mercury code.
>>>>>> We deploy our application on different processors, one product is
>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>>> Cavium (8 core device).
>>>>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>>>>> so relevant.
>>>>>>
>>>>>> Our Xenomai application is running on one of the cores (affinity is
>>>>>> set), while the other cores are running other code.
>>>>>>
>>>>>> On both architectures we recently start to see issues that one thread
>>>>>> is consuming 100% of the core on which the application is pinned.
>>>>>> The thread that monopolizes the core is the thread internally used to
>>>>>> manage the timers, running at the highest priority.
>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>> If we only start a part of the application (platform management only),
>>>>>> the issue is not observed.
>>>>>> We see this on both an old version of Xenomai and a very recent one
>>>>>> (pulled from the git repo yesterday).
>>>>>>
>>>>>> I will continue to debug this issue in the coming days and try isolate
>>>>>> the code that is triggering it, but I can use hints from the
>>>>>> community.
>>>>>> Debugging is complex since once the load starts, the debugger is not
>>>>>> reacting anymore.
>>>>>> If I put breakpoints in the functions that are called when the timer
>>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>>> itself and I endup with tens of them.
>>>>>>
>>>>>> Has anybody seen an issue like this before or does somebody has some
>>>>>> hints on how to debug this problem?
>>>>>
>>>>>
>>>>> First enable the watchdog. It will send a signal to the application when
>>>>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>>>>> tracer trace when the bug happens. You will probably have to increase
>>>>> the watchdog polling frequency, in order to have a meaningful trace.
>>>>>
>>>>
>>>> I don't think an I-pipe tracer will be possible when using the Mercury
>>>> core, right (xenomai-forge) ?
>>>>
>>>
>>> Correct.
>>
>>
>> I do not think so. The way I see it, you can enable the I-pipe tracer
>> without CONFIG_XENOMAI.
>>
>
> Mercury has NO pipeline in the kernel.
>
You mean mercury can not run with an I-pipe kernel?
--
Gilles.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-01 8:30 ` Gilles Chanteperdrix
@ 2013-03-01 8:41 ` Philippe Gerum
2013-03-02 11:13 ` Ronny Meeus
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-01 8:41 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>
>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>
>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> we are using the PSOS interface of Xenomai forge, running completely
>>>>>>> in user-space using the mercury code.
>>>>>>> We deploy our application on different processors, one product is
>>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>>>> Cavium (8 core device).
>>>>>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>>>>>> so relevant.
>>>>>>>
>>>>>>> Our Xenomai application is running on one of the cores (affinity is
>>>>>>> set), while the other cores are running other code.
>>>>>>>
>>>>>>> On both architectures we recently start to see issues that one thread
>>>>>>> is consuming 100% of the core on which the application is pinned.
>>>>>>> The thread that monopolizes the core is the thread internally used to
>>>>>>> manage the timers, running at the highest priority.
>>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>>> If we only start a part of the application (platform management only),
>>>>>>> the issue is not observed.
>>>>>>> We see this on both an old version of Xenomai and a very recent one
>>>>>>> (pulled from the git repo yesterday).
>>>>>>>
>>>>>>> I will continue to debug this issue in the coming days and try isolate
>>>>>>> the code that is triggering it, but I can use hints from the
>>>>>>> community.
>>>>>>> Debugging is complex since once the load starts, the debugger is not
>>>>>>> reacting anymore.
>>>>>>> If I put breakpoints in the functions that are called when the timer
>>>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>>>> itself and I endup with tens of them.
>>>>>>>
>>>>>>> Has anybody seen an issue like this before or does somebody has some
>>>>>>> hints on how to debug this problem?
>>>>>>
>>>>>>
>>>>>> First enable the watchdog. It will send a signal to the application when
>>>>>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>>>>>> tracer trace when the bug happens. You will probably have to increase
>>>>>> the watchdog polling frequency, in order to have a meaningful trace.
>>>>>>
>>>>>
>>>>> I don't think an I-pipe tracer will be possible when using the Mercury
>>>>> core, right (xenomai-forge) ?
>>>>>
>>>>
>>>> Correct.
>>>
>>>
>>> I do not think so. The way I see it, you can enable the I-pipe tracer
>>> without CONFIG_XENOMAI.
>>>
>>
>> Mercury has NO pipeline in the kernel.
>>
>
> You mean mercury can not run with an I-pipe kernel?
>
I mean it does not care about the pipeline, it does not need it. So if
this is about observing kernel activity, then ftrace should be fine, or
possibly perf to find out where userland spends time.
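
One way to use ftrace for this, sketched under assumptions: once the
application notices the problem (for instance from a watchdog signal), it
stops the tracer so that the ring buffer keeps the events leading up to
the lockup. The tracing_on path below assumes debugfs is mounted at
/sys/kernel/debug, and write_string() and freeze_ftrace() are
hypothetical helper names.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumes debugfs is mounted at /sys/kernel/debug. */
#define TRACING_ON_PATH "/sys/kernel/debug/tracing/tracing_on"

/* Write a short string to an existing file; returns 0 on success, -1 on error. */
int write_string(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t written;
    int fd;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    written = write(fd, text, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}

/* Stop ftrace recording so the buffer can be read afterwards. */
int freeze_ftrace(void)
{
    return write_string(TRACING_ON_PATH, "0");
}
```

After freeze_ftrace() returns 0, the captured events can be read from the
trace file in the same directory. Writing there normally requires root.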
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-01 8:41 ` Philippe Gerum
@ 2013-03-02 11:13 ` Ronny Meeus
2013-03-05 12:43 ` Ronny Meeus
2013-03-06 13:49 ` Philippe Gerum
0 siblings, 2 replies; 30+ messages in thread
From: Ronny Meeus @ 2013-03-02 11:13 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
> I mean it does not care about the pipeline, it does not need it. So if this
> is about observing kernel activity, then ftrace should be fine, or possibly
> perf to find out where userland spends time.
>
> --
> Philippe.
>
>
> _______________________________________________
> Xenomai mailing list
> Xenomai@xenomai.org
> http://www.xenomai.org/mailman/listinfo/xenomai
Hello
An update on the investigation:
I was able to make this issue disappear by changing the timeout value
of the smallest timers we use.
We use a couple of timers with a timeout of 25ms. After enlarging these
to 25sec, the problem is gone.
Yesterday I was also able to see (using the "strace" tool) the process
constantly executing "clone" system calls.
Note that the process we use is large (2GB) and uses an mlockall call.
In http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
I see that a new thread is created when the timer_create is called for
the first time. This thread stays alive until the program exits and is
used to process the timer expiries.
I have the feeling that there is an issue during the creation of this
thread. For example what would happen if the clone operation takes
longer than the time needed to perform the clone operation?
In the past we already observed issues with the clone call that we
could not explain (creation of the clone simply failed on our
application while it was working fine on a smaller application).
Do you guys know whether this mlockall call has an impact on the clone
operation?
I will try to make a small test application on which the issue can be
reproduced.
---
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-02 11:13 ` Ronny Meeus
@ 2013-03-05 12:43 ` Ronny Meeus
2013-03-05 13:28 ` Philippe Gerum
2013-03-05 14:08 ` Philippe Gerum
2013-03-06 13:49 ` Philippe Gerum
1 sibling, 2 replies; 30+ messages in thread
From: Ronny Meeus @ 2013-03-05 12:43 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
I'm able to reproduce the issue on a small test build:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <psos.h>
#include <copperplate/init.h>
#include <stdlib.h>
#include <string.h>

static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
{
        u_long ret, ev = 0, tmid,tmid2;

        ret = tm_evevery(1,1,&tmid);
        ret = tm_evafter(30000,4,&tmid2);
        while (1) {
                ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
                if (ev & 4) {
                        printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
                        tm_evafter(30000,4,&tmid2);
                }
                ev = 0;
        }
        tm_wkafter(100);
}

int main(int argc, char * const *argv)
{
        u_long ret, tid = 0, args[4];

        mlockall(MCL_CURRENT | MCL_FUTURE);
        copperplate_init(&argc,&argv);

        ret = t_create("TEST",97, 0, 0, 0, &tid);
        printf("t_create(tid=%lu) = %lu\n", tid, ret);
        args[0] = 1;
        args[1] = 2;
        args[2] = 3;
        args[3] = 4;
        ret = t_start(tid, 0, foo, args);
        printf("t_start(tid=%lu) = %lu\n", tid, ret);

        while (1)
                tm_wkafter(100);
        return 0;
}
The TEST task starts 2 timers: one periodic and one one-shot timer.
Each time the one-shot timer expires, a print is done and the timer is
restarted.
The observation is that once the one-shot timer expires, the application
starts to use 100% CPU load on one core and the application code is not
executed anymore. So it looks like there is constant processing in
either Xenomai or the library code to process the timer handling. If
only periodic timers are used, the issue is not observed.
Best regards,
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 12:43 ` Ronny Meeus
@ 2013-03-05 13:28 ` Philippe Gerum
2013-03-05 14:08 ` Philippe Gerum
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Gerum @ 2013-03-05 13:28 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>
> I'm able to reproduce the issue on a small test build:
>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <psos.h>
> #include <copperplate/init.h>
> #include <stdlib.h>
> #include <string.h>
>
> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
> {
> u_long ret, ev = 0, tmid,tmid2;
>
> ret = tm_evevery(1,1,&tmid);
> ret = tm_evafter(30000,4,&tmid2);
> while (1) {
> ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
Can you check the return value here?
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 12:43 ` Ronny Meeus
2013-03-05 13:28 ` Philippe Gerum
@ 2013-03-05 14:08 ` Philippe Gerum
2013-03-05 14:25 ` Ronny Meeus
2013-03-06 10:55 ` Ronny Meeus
1 sibling, 2 replies; 30+ messages in thread
From: Philippe Gerum @ 2013-03-05 14:08 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
I can't reproduce this bug using that test code, over glibc 2.15/x86. We
might have a problem with SIGEV_THREAD. Which glibc release are you running?
Also, do you observe the same issue with a larger event interval for the
periodic timer (e.g. 1000 ticks)?
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 14:08 ` Philippe Gerum
@ 2013-03-05 14:25 ` Ronny Meeus
2013-03-05 14:47 ` Philippe Gerum
2013-03-06 10:55 ` Ronny Meeus
1 sibling, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-03-05 14:25 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>
> Also, do you observe the same issue with a larger event interval for the
> periodic timer (e.g. 1000 ticks)?
>
> --
> Philippe.
Philippe,
this is the output I see:
# taskset 4 /tmp/simple_tm_cancel.exe &
# 0"000.506| WARNING: [main] Xenomai compiled with partial debug enabled,
high latencies expected [--enable-debug=partial]
t_create(tid=273617536) = 0
t_start(tid=273617536) = 0
0 Restarting one-shot timer. ev=6
After this I see a CPU load of 100%.
The zero at the beginning of the line is the return value of ev_receive.
If I change the timeout value to a large value I also see the issue.
The load seems to start as soon as the one-shot timer expires /
gets restarted.
This is the information about the library we use:
GNU C Library stable release version 2.9, by Roland McGrath et al.
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.3.3.
Compiled on a Linux >>2.6.18-274.el5xen<< system on 2012-02-15.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
Native POSIX Threads Library by Ulrich Drepper et al
Support for some architectures added on, not maintained in glibc core.
BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 14:25 ` Ronny Meeus
@ 2013-03-05 14:47 ` Philippe Gerum
2013-03-05 14:53 ` Ronny Meeus
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-05 14:47 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/05/2013 03:25 PM, Ronny Meeus wrote:
> On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>>>
>>> On Sat, Mar 2, 2013 at 12:13 PM, Ronny Meeus <ronny.meeus@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>>
>>>>> On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>
>>>>>> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>>>>>>
>>>>>>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello
>>>>>>>>>>>>
>>>>>>>>>>>> we are using the PSOS interface of Xenomai forge, running
>>>>>>>>>>>> completely
>>>>>>>>>>>> in user-space using the mercury code.
>>>>>>>>>>>> We deploy our application on different processors, one product is
>>>>>>>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>>>>>>>>> Cavium (8 core device).
>>>>>>>>>>>> The Linux version we use is 2.6.32 but I would assume that this
>>>>>>>>>>>> is
>>>>>>>>>>>> not
>>>>>>>>>>>> so relevant.
>>>>>>>>>>>>
>>>>>>>>>>>> Our Xenomai application is running on one of the cores (affinity
>>>>>>>>>>>> is
>>>>>>>>>>>> set), while the other cores are running other code.
>>>>>>>>>>>>
>>>>>>>>>>>> On both architectures we recently start to see issues that one
>>>>>>>>>>>> thread
>>>>>>>>>>>> is consuming 100% of the core on which the application is pinned.
>>>>>>>>>>>> The thread that monopolizes the core is the thread internally
>>>>>>>>>>>> used
>>>>>>>>>>>> to
>>>>>>>>>>>> manage the timers, running at the highest priority.
>>>>>>>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>>>>>>>> If we only start a part of the application (platform management
>>>>>>>>>>>> only),
>>>>>>>>>>>> the issue is not observed.
>>>>>>>>>>>> We see this on both an old version of Xenomai and a very recent
>>>>>>>>>>>> one
>>>>>>>>>>>> (pulled from the git repo yesterday).
>>>>>>>>>>>>
>>>>>>>>>>>> I will continue to debug this issue in the coming days and try
>>>>>>>>>>>> isolate
>>>>>>>>>>>> the code that is triggering it, but I can use hints from the
>>>>>>>>>>>> community.
>>>>>>>>>>>> Debugging is complex since once the load starts, the debugger is
>>>>>>>>>>>> not
>>>>>>>>>>>> reacting anymore.
>>>>>>>>>>>> If I put breakpoints in the functions that are called when the
>>>>>>>>>>>> timer
>>>>>>>>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>>>>>>>>> itself and I endup with tens of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Has anybody seen an issue like this before or does somebody has
>>>>>>>>>>>> some
>>>>>>>>>>>> hints on how to debug this problem?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> First enable the watchdog. It will send a signal to the
>>>>>>>>>>> application
>>>>>>>>>>> when
>>>>>>>>>>> detecting a problem, then you can use the watchdog to trigger an
>>>>>>>>>>> I-pipe
>>>>>>>>>>> tracer trace when the bug happens. You will probably have to
>>>>>>>>>>> increase
>>>>>>>>>>> the watchdog polling frequency, in order to have a meaningful
>>>>>>>>>>> trace.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't think an I-pipe tracer will be possible when using the
>>>>>>>>>> Mercury
>>>>>>>>>> core, right (xenomai-forge) ?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Correct.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I do not think so. The way I see it, you can enable the I-pipe tracer
>>>>>>>> without CONFIG_XENOMAI.
>>>>>>>>
>>>>>>>
>>>>>>> Mercury has NO pipeline in the kernel.
>>>>>>>
>>>>>>
>>>>>> You mean mercury can not run with an I-pipe kernel?
>>>>>>
>>>>>
>>>>> I mean it does not care about the pipeline, it does not need it. So if
>>>>> this
>>>>> is about observing kernel activity, then ftrace should be fine, or
>>>>> possibly
>>>>> perf to find out where userland spends time.
>>>>>
>>>>> --
>>>>> Philippe.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Xenomai mailing list
>>>>> Xenomai@xenomai.org
>>>>> http://www.xenomai.org/mailman/listinfo/xenomai
>>>>
>>>>
>>>> Hello
>>>>
>>>> An update on the investigation:
>>>> I was able to make this issue disappear by changing the timeout value
>>>> of the smallest timers we use.
>>>> We use a couple of timers with a timeout of 25ms. By enlarging these
>>>> to 25sec and the problem is gone.
>>>>
>>>> Yesterday I was also able to see (using the"strace" tool) the process
>>>> executing constantly "clone" system calls.
>>>> Note that the process we use is large (2Gb) and uses an mlockall call.
>>>>
>>>> In
>>>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>>>> I see that a new thread is created when the timer_create is called for
>>>> the first time. This thread stays alive until the program exits and is
>>>> used to process the timer expiries.
>>>> I have the feeling that there is an issue during the creation of this
>>>> thread. For example what would happen if the clone operation takes
>>>> longer than the time needed to perform the clone operation?
>>>> In the past we already observed issues with the clone call that we
>>>> could not explain (creation of the clone simply failed on our
>>>> application while it was working fine on a smaller application).
>>>>
>>>> Do you guys know whether there is an impact on the clone operation by
>>>> this mlockall call?
>>>>
>>>> I will try to make a small test application on which the issue can be
>>>> reproduced.
>>>>
>>>> ---
>>>> Ronny
>>>
>>>
>>> I'm able to reproduce the issue on a small test build:
>>>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> #include <sys/types.h>
>>> #include <sys/mman.h>
>>> #include <psos.h>
>>> #include <copperplate/init.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>>
>>> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
>>> {
>>> u_long ret, ev = 0, tmid,tmid2;
>>>
>>> ret = tm_evevery(1,1,&tmid);
>>> ret = tm_evafter(30000,4,&tmid2);
>>> while (1) {
>>> ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
>>> if (ev & 4) {
>>> printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
>>> tm_evafter(30000,4,&tmid2);
>>> }
>>> ev = 0;
>>> }
>>> tm_wkafter(100);
>>> }
>>>
>>> int main(int argc, char * const *argv)
>>> {
>>> u_long ret, tid = 0, args[4];
>>>
>>> mlockall(MCL_CURRENT | MCL_FUTURE);
>>> copperplate_init(&argc,&argv);
>>>
>>> ret = t_create("TEST",97, 0, 0, 0, &tid);
>>> printf("t_create(tid=%lu) = %lu\n", tid, ret);
>>> args[0] = 1;
>>> args[1] = 2;
>>> args[2] = 3;
>>> args[3] = 4;
>>> ret = t_start(tid, 0, foo, args);
>>> printf("t_start(tid=%lu) = %lu\n", tid, ret);
>>>
>>> while (1)
>>> tm_wkafter(100);
>>> return 0;
>>> }
>>>
>>> The TEST task starts 2 timers: one periodic and one 1shot timer.
>>> Each time the one-shot timer expires, a print is done and the timer is
>>> restarted.
>>>
>>> Observation is that once the one-shot timer expires, the application
>>> starts to use 100% cpuload on one core and the application code is not
>>> executed anymore. So it looks like there is constant processing in
>>> either Xenomai or the library code to process the timer handling. If
>>> periodic timers are used the issue is not observed.
>>>
>>
>> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
>> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>>
>> Also, do you observe the same issue with a larger event interval for the
>> periodic timer (e.g. 1000 ticks)?
>>
>> --
>> Philippe.
>
> Philippe,
> this is the output I see:
>
> # taskset 4 /tmp/simple_tm_cancel.exe &
> # 0"000.506| WARNING: [main] Xenomai compiled with partial debug enabled,
> high latencies expected [--enable-debug=partial]
> t_create(tid=273617536) = 0
> t_start(tid=273617536) = 0
> 0 Restarting one-shot timer. ev=6
6? We are asking for events 4 or 1, so at best, we might get 0x5 if both
are pending at the same time. Or maybe it is a different test?
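
To make the mask arithmetic explicit, here is a small standalone C sketch. The macro and function names are illustrative only and are not part of the pSOS API; the bit values are the ones passed to tm_evevery() and tm_evafter() in the test program quoted above.

```c
#include <stdio.h>

/* Event bits requested by the quoted test program (names are hypothetical). */
#define EV_PERIODIC 0x1UL  /* posted by tm_evevery(1, 1, ...) */
#define EV_ONESHOT  0x4UL  /* posted by tm_evafter(30000, 4, ...) */

/* Return the bits in 'received' that neither of the two timers posts. */
static unsigned long unexpected_events(unsigned long received)
{
    return received & ~(EV_PERIODIC | EV_ONESHOT);
}

/* Print the received mask next to the expected and unexpected bits. */
static void report_events(unsigned long received)
{
    printf("ev=%#lx expected-mask=%#lx unexpected-bits=%#lx\n",
           received, EV_PERIODIC | EV_ONESHOT, unexpected_events(received));
}
```

With only these two timers armed, any non-zero result from unexpected_events() means some other event source is involved: a mask of 0x6 carries bit 0x2, which neither timer posts.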
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 14:47 ` Philippe Gerum
@ 2013-03-05 14:53 ` Ronny Meeus
0 siblings, 0 replies; 30+ messages in thread
From: Ronny Meeus @ 2013-03-05 14:53 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Tue, Mar 5, 2013 at 3:47 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/05/2013 03:25 PM, Ronny Meeus wrote:
>>
>> On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>
>>> On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>>>>
>>>>
>>>> On Sat, Mar 2, 2013 at 12:13 PM, Ronny Meeus <ronny.meeus@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>>>>>>>
>>>>>>>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>
>>>>>>>>>>>>> we are using the PSOS interface of Xenomai forge, running
>>>>>>>>>>>>> completely
>>>>>>>>>>>>> in user-space using the mercury code.
>>>>>>>>>>>>> We deploy our application on different processors, one product
>>>>>>>>>>>>> is
>>>>>>>>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one
>>>>>>>>>>>>> on
>>>>>>>>>>>>> Cavium (8 core device).
>>>>>>>>>>>>> The Linux version we use is 2.6.32 but I would assume that this
>>>>>>>>>>>>> is
>>>>>>>>>>>>> not
>>>>>>>>>>>>> so relevant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our Xenomai application is running on one of the cores
>>>>>>>>>>>>> (affinity
>>>>>>>>>>>>> is
>>>>>>>>>>>>> set), while the other cores are running other code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On both architectures we recently start to see issues that one
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> is consuming 100% of the core on which the application is
>>>>>>>>>>>>> pinned.
>>>>>>>>>>>>> The thread that monopolizes the core is the thread internally
>>>>>>>>>>>>> used
>>>>>>>>>>>>> to
>>>>>>>>>>>>> manage the timers, running at the highest priority.
>>>>>>>>>>>>> The trigger for running into this behavior is currently
>>>>>>>>>>>>> unclear.
>>>>>>>>>>>>> If we only start a part of the application (platform management
>>>>>>>>>>>>> only),
>>>>>>>>>>>>> the issue is not observed.
>>>>>>>>>>>>> We see this on both an old version of Xenomai and a very recent
>>>>>>>>>>>>> one
>>>>>>>>>>>>> (pulled from the git repo yesterday).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will continue to debug this issue in the coming days and try
>>>>>>>>>>>>> isolate
>>>>>>>>>>>>> the code that is triggering it, but I can use hints from the
>>>>>>>>>>>>> community.
>>>>>>>>>>>>> Debugging is complex since once the load starts, the debugger
>>>>>>>>>>>>> is
>>>>>>>>>>>>> not
>>>>>>>>>>>>> reacting anymore.
>>>>>>>>>>>>> If I put breakpoints in the functions that are called when the
>>>>>>>>>>>>> timer
>>>>>>>>>>>>> expires (both oneshot and periodic), the process starts to
>>>>>>>>>>>>> clone
>>>>>>>>>>>>> itself and I endup with tens of them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anybody seen an issue like this before or does somebody has
>>>>>>>>>>>>> some
>>>>>>>>>>>>> hints on how to debug this problem?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> First enable the watchdog. It will send a signal to the
>>>>>>>>>>>> application
>>>>>>>>>>>> when
>>>>>>>>>>>> detecting a problem, then you can use the watchdog to trigger an
>>>>>>>>>>>> I-pipe
>>>>>>>>>>>> tracer trace when the bug happens. You will probably have to
>>>>>>>>>>>> increase
>>>>>>>>>>>> the watchdog polling frequency, in order to have a meaningful
>>>>>>>>>>>> trace.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't think an I-pipe tracer will be possible when using the
>>>>>>>>>>> Mercury
>>>>>>>>>>> core, right (xenomai-forge) ?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Correct.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I do not think so. The way I see it, you can enable the I-pipe
>>>>>>>>> tracer
>>>>>>>>> without CONFIG_XENOMAI.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Mercury has NO pipeline in the kernel.
>>>>>>>>
>>>>>>>
>>>>>>> You mean mercury can not run with an I-pipe kernel?
>>>>>>>
>>>>>>
>>>>>> I mean it does not care about the pipeline, it does not need it. So if
>>>>>> this
>>>>>> is about observing kernel activity, then ftrace should be fine, or
>>>>>> possibly
>>>>>> perf to find out where userland spends time.
>>>>>>
>>>>>> --
>>>>>> Philippe.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hello
>>>>>
>>>>> An update on the investigation:
>>>>> I was able to make this issue disappear by changing the timeout value
>>>>> of the smallest timers we use.
>>>>> We use a couple of timers with a timeout of 25ms. By enlarging these
>>>>> to 25sec and the problem is gone.
>>>>>
>>>>> Yesterday I was also able to see (using the"strace" tool) the process
>>>>> executing constantly "clone" system calls.
>>>>> Note that the process we use is large (2Gb) and uses an mlockall call.
>>>>>
>>>>> In
>>>>>
>>>>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>>>>> I see that a new thread is created when the timer_create is called for
>>>>> the first time. This thread stays alive until the program exits and is
>>>>> used to process the timer expiries.
>>>>> I have the feeling that there is an issue during the creation of this
>>>>> thread. For example what would happen if the clone operation takes
>>>>> longer than the time needed to perform the clone operation?
>>>>> In the past we already observed issues with the clone call that we
>>>>> could not explain (creation of the clone simply failed on our
>>>>> application while it was working fine on a smaller application).
>>>>>
>>>>> Do you guys know whether there is an impact on the clone operation by
>>>>> this mlockall call?
>>>>>
>>>>> I will try to make a small test application on which the issue can be
>>>>> reproduced.
>>>>>
>>>>> ---
>>>>> Ronny
>>>>
>>>>
>>>>
>>>> I'm able to reproduce the issue on a small test build:
>>>>
>>>> #include <stdio.h>
>>>> #include <unistd.h>
>>>> #include <sys/types.h>
>>>> #include <sys/mman.h>
>>>> #include <psos.h>
>>>> #include <copperplate/init.h>
>>>> #include <stdlib.h>
>>>> #include <string.h>
>>>>
>>>> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
>>>> {
>>>> u_long ret, ev = 0, tmid,tmid2;
>>>>
>>>> ret = tm_evevery(1,1,&tmid);
>>>> ret = tm_evafter(30000,4,&tmid2);
>>>> while (1) {
>>>> ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
>>>> if (ev & 4) {
>>>> printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
>>>> tm_evafter(30000,4,&tmid2);
>>>> }
>>>> ev = 0;
>>>> }
>>>> tm_wkafter(100);
>>>> }
>>>>
>>>> int main(int argc, char * const *argv)
>>>> {
>>>> u_long ret, tid = 0, args[4];
>>>>
>>>> mlockall(MCL_CURRENT | MCL_FUTURE);
>>>> copperplate_init(&argc,&argv);
>>>>
>>>> ret = t_create("TEST",97, 0, 0, 0, &tid);
>>>> printf("t_create(tid=%lu) = %lu\n", tid, ret);
>>>> args[0] = 1;
>>>> args[1] = 2;
>>>> args[2] = 3;
>>>> args[3] = 4;
>>>> ret = t_start(tid, 0, foo, args);
>>>> printf("t_start(tid=%lu) = %lu\n", tid, ret);
>>>>
>>>> while (1)
>>>> tm_wkafter(100);
>>>> return 0;
>>>> }
>>>>
>>>> The TEST task starts 2 timers: one periodic and one 1shot timer.
>>>> Each time the one-shot timer expires, a print is done and the timer is
>>>> restarted.
>>>>
>>>> Observation is that once the one-shot timer expires, the application
>>>> starts to use 100% cpuload on one core and the application code is not
>>>> executed anymore. So it looks like there is constant processing in
>>>> either Xenomai or the library code to process the timer handling. If
>>>> periodic timers are used the issue is not observed.
>>>>
>>>
>>> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
>>> might have a problem with SIGEV_THREAD. Which glibc release are you
>>> running?
>>>
>>> Also, do you observe the same issue with a larger event interval for the
>>> periodic timer (e.g. 1000 ticks)?
>>>
>>> --
>>> Philippe.
>>
>>
>> Philippe,
>> this is the output I see:
>>
>> # taskset 4 /tmp/simple_tm_cancel.exe &
>> # 0"000.506| WARNING: [main] Xenomai compiled with partial debug
>> enabled,
>> high latencies expected
>> [--enable-debug=partial]
>> t_create(tid=273617536) = 0
>> t_start(tid=273617536) = 0
>> 0 Restarting one-shot timer. ev=6
>
>
> 6? We are asking for events 4 or 1, so at best, we might get 0x5 if both are
> pending at the same time. Or maybe is it a different test?
>
>
> --
> Philippe.
In the meantime I indeed added another timer. The events I receive are
as expected.
I just wanted to show the return value of the ev_receive call.
Sorry for the confusion.
---
Ronny
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-05 14:08 ` Philippe Gerum
2013-03-05 14:25 ` Ronny Meeus
@ 2013-03-06 10:55 ` Ronny Meeus
2013-03-06 11:09 ` Philippe Gerum
2013-03-06 13:24 ` Philippe Gerum
1 sibling, 2 replies; 30+ messages in thread
From: Ronny Meeus @ 2013-03-06 10:55 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>>
>> On Sat, Mar 2, 2013 at 12:13 PM, Ronny Meeus <ronny.meeus@gmail.com>
>> wrote:
>>>
>>> On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>
>>>> On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
>>>>>
>>>>>
>>>>> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>>>>>
>>>>>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>>>>>
>>>>>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello
>>>>>>>>>>>
>>>>>>>>>>> we are using the PSOS interface of Xenomai forge, running
>>>>>>>>>>> completely
>>>>>>>>>>> in user-space using the mercury code.
>>>>>>>>>>> We deploy our application on different processors, one product is
>>>>>>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>>>>>>>> Cavium (8 core device).
>>>>>>>>>>> The Linux version we use is 2.6.32 but I would assume that this
>>>>>>>>>>> is
>>>>>>>>>>> not
>>>>>>>>>>> so relevant.
>>>>>>>>>>>
>>>>>>>>>>> Our Xenomai application is running on one of the cores (affinity
>>>>>>>>>>> is
>>>>>>>>>>> set), while the other cores are running other code.
>>>>>>>>>>>
>>>>>>>>>>> On both architectures we recently start to see issues that one
>>>>>>>>>>> thread
>>>>>>>>>>> is consuming 100% of the core on which the application is pinned.
>>>>>>>>>>> The thread that monopolizes the core is the thread internally
>>>>>>>>>>> used
>>>>>>>>>>> to
>>>>>>>>>>> manage the timers, running at the highest priority.
>>>>>>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>>>>>>> If we only start a part of the application (platform management
>>>>>>>>>>> only),
>>>>>>>>>>> the issue is not observed.
>>>>>>>>>>> We see this on both an old version of Xenomai and a very recent
>>>>>>>>>>> one
>>>>>>>>>>> (pulled from the git repo yesterday).
>>>>>>>>>>>
>>>>>>>>>>> I will continue to debug this issue in the coming days and try
>>>>>>>>>>> isolate
>>>>>>>>>>> the code that is triggering it, but I can use hints from the
>>>>>>>>>>> community.
>>>>>>>>>>> Debugging is complex since once the load starts, the debugger is
>>>>>>>>>>> not
>>>>>>>>>>> reacting anymore.
>>>>>>>>>>> If I put breakpoints in the functions that are called when the
>>>>>>>>>>> timer
>>>>>>>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>>>>>>>> itself and I endup with tens of them.
>>>>>>>>>>>
>>>>>>>>>>> Has anybody seen an issue like this before or does somebody has
>>>>>>>>>>> some
>>>>>>>>>>> hints on how to debug this problem?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> First enable the watchdog. It will send a signal to the
>>>>>>>>>> application
>>>>>>>>>> when
>>>>>>>>>> detecting a problem, then you can use the watchdog to trigger an
>>>>>>>>>> I-pipe
>>>>>>>>>> tracer trace when the bug happens. You will probably have to
>>>>>>>>>> increase
>>>>>>>>>> the watchdog polling frequency, in order to have a meaningful
>>>>>>>>>> trace.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't think an I-pipe tracer will be possible when using the
>>>>>>>>> Mercury
>>>>>>>>> core, right (xenomai-forge) ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Correct.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I do not think so. The way I see it, you can enable the I-pipe tracer
>>>>>>> without CONFIG_XENOMAI.
>>>>>>>
>>>>>>
>>>>>> Mercury has NO pipeline in the kernel.
>>>>>>
>>>>>
>>>>> You mean mercury can not run with an I-pipe kernel?
>>>>>
>>>>
>>>> I mean it does not care about the pipeline, it does not need it. So if
>>>> this
>>>> is about observing kernel activity, then ftrace should be fine, or
>>>> possibly
>>>> perf to find out where userland spends time.
>>>>
>>>> --
>>>> Philippe.
>>>>
>>>>
>>>
>>>
>>> Hello
>>>
>>> An update on the investigation:
>>> I was able to make this issue disappear by changing the timeout value
>>> of the smallest timers we use.
>>> We use a couple of timers with a timeout of 25ms. By enlarging these
>>> to 25sec and the problem is gone.
>>>
>>> Yesterday I was also able to see (using the"strace" tool) the process
>>> executing constantly "clone" system calls.
>>> Note that the process we use is large (2Gb) and uses an mlockall call.
>>>
>>> In
>>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>>> I see that a new thread is created when the timer_create is called for
>>> the first time. This thread stays alive until the program exits and is
>>> used to process the timer expiries.
>>> I have the feeling that there is an issue during the creation of this
>>> thread. For example what would happen if the clone operation takes
>>> longer than the time needed to perform the clone operation?
>>> In the past we already observed issues with the clone call that we
>>> could not explain (creation of the clone simply failed on our
>>> application while it was working fine on a smaller application).
>>>
>>> Do you guys know whether there is an impact on the clone operation by
>>> this mlockall call?
>>>
>>> I will try to make a small test application on which the issue can be
>>> reproduced.
>>>
>>> ---
>>> Ronny
>>
>>
>> I'm able to reproduce the issue on a small test build:
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <sys/mman.h>
>> #include <psos.h>
>> #include <copperplate/init.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
>> {
>> u_long ret, ev = 0, tmid,tmid2;
>>
>> ret = tm_evevery(1,1,&tmid);
>> ret = tm_evafter(30000,4,&tmid2);
>> while (1) {
>> ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
>> if (ev & 4) {
>> printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
>> tm_evafter(30000,4,&tmid2);
>> }
>> ev = 0;
>> }
>> tm_wkafter(100);
>> }
>>
>> int main(int argc, char * const *argv)
>> {
>> u_long ret, tid = 0, args[4];
>>
>> mlockall(MCL_CURRENT | MCL_FUTURE);
>> copperplate_init(&argc,&argv);
>>
>> ret = t_create("TEST",97, 0, 0, 0, &tid);
>> printf("t_create(tid=%lu) = %lu\n", tid, ret);
>> args[0] = 1;
>> args[1] = 2;
>> args[2] = 3;
>> args[3] = 4;
>> ret = t_start(tid, 0, foo, args);
>> printf("t_start(tid=%lu) = %lu\n", tid, ret);
>>
>> while (1)
>> tm_wkafter(100);
>> return 0;
>> }
>>
>> The TEST task starts 2 timers: one periodic and one 1shot timer.
>> Each time the one-shot timer expires, a print is done and the timer is
>> restarted.
>>
>> Observation is that once the one-shot timer expires, the application
>> starts to use 100% cpuload on one core and the application code is not
>> executed anymore. So it looks like there is constant processing in
>> either Xenomai or the library code to process the timer handling. If
>> periodic timers are used the issue is not observed.
>>
>
> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>
Philippe,
do you have a reference to the issue that you are suspecting and a
view on which version of glibc we need to use to solve it?
Ronny
> Also, do you observe the same issue with a larger event interval for the
> periodic timer (e.g. 1000 ticks)?
>
> --
> Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 10:55 ` Ronny Meeus
@ 2013-03-06 11:09 ` Philippe Gerum
2013-03-06 11:18 ` Philippe Gerum
2013-03-06 13:24 ` Philippe Gerum
1 sibling, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-06 11:09 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/06/2013 11:55 AM, Ronny Meeus wrote:
> On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>>>
>>> On Sat, Mar 2, 2013 at 12:13 PM, Ronny Meeus <ronny.meeus@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>>
>>>>> On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>
>>>>>> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>>>>>>
>>>>>>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello
>>>>>>>>>>>>
>>>>>>>>>>>> we are using the PSOS interface of Xenomai forge, running
>>>>>>>>>>>> completely
>>>>>>>>>>>> in user-space using the mercury code.
>>>>>>>>>>>> We deploy our application on different processors, one product is
>>>>>>>>>>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>>>>>>>>>>> Cavium (8 core device).
>>>>>>>>>>>> The Linux version we use is 2.6.32 but I would assume that this
>>>>>>>>>>>> is
>>>>>>>>>>>> not
>>>>>>>>>>>> so relevant.
>>>>>>>>>>>>
>>>>>>>>>>>> Our Xenomai application is running on one of the cores (affinity
>>>>>>>>>>>> is
>>>>>>>>>>>> set), while the other cores are running other code.
>>>>>>>>>>>>
>>>>>>>>>>>> On both architectures we recently start to see issues that one
>>>>>>>>>>>> thread
>>>>>>>>>>>> is consuming 100% of the core on which the application is pinned.
>>>>>>>>>>>> The thread that monopolizes the core is the thread internally
>>>>>>>>>>>> used
>>>>>>>>>>>> to
>>>>>>>>>>>> manage the timers, running at the highest priority.
>>>>>>>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>>>>>>>> If we only start a part of the application (platform management
>>>>>>>>>>>> only),
>>>>>>>>>>>> the issue is not observed.
>>>>>>>>>>>> We see this on both an old version of Xenomai and a very recent
>>>>>>>>>>>> one
>>>>>>>>>>>> (pulled from the git repo yesterday).
>>>>>>>>>>>>
>>>>>>>>>>>> I will continue to debug this issue in the coming days and try
>>>>>>>>>>>> isolate
>>>>>>>>>>>> the code that is triggering it, but I can use hints from the
>>>>>>>>>>>> community.
>>>>>>>>>>>> Debugging is complex since once the load starts, the debugger is
>>>>>>>>>>>> not
>>>>>>>>>>>> reacting anymore.
>>>>>>>>>>>> If I put breakpoints in the functions that are called when the
>>>>>>>>>>>> timer
>>>>>>>>>>>> expires (both oneshot and periodic), the process starts to clone
>>>>>>>>>>>> itself and I endup with tens of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Has anybody seen an issue like this before or does somebody has
>>>>>>>>>>>> some
>>>>>>>>>>>> hints on how to debug this problem?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> First enable the watchdog. It will send a signal to the
>>>>>>>>>>> application
>>>>>>>>>>> when
>>>>>>>>>>> detecting a problem, then you can use the watchdog to trigger an
>>>>>>>>>>> I-pipe
>>>>>>>>>>> tracer trace when the bug happens. You will probably have to
>>>>>>>>>>> increase
>>>>>>>>>>> the watchdog polling frequency, in order to have a meaningful
>>>>>>>>>>> trace.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't think an I-pipe tracer will be possible when using the
>>>>>>>>>> Mercury
>>>>>>>>>> core, right (xenomai-forge) ?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Correct.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I do not think so. The way I see it, you can enable the I-pipe tracer
>>>>>>>> without CONFIG_XENOMAI.
>>>>>>>>
>>>>>>>
>>>>>>> Mercury has NO pipeline in the kernel.
>>>>>>>
>>>>>>
>>>>>> You mean mercury can not run with an I-pipe kernel?
>>>>>>
>>>>>
>>>>> I mean it does not care about the pipeline, it does not need it. So if
>>>>> this
>>>>> is about observing kernel activity, then ftrace should be fine, or
>>>>> possibly
>>>>> perf to find out where userland spends time.
>>>>>
>>>>> --
>>>>> Philippe.
>>>>>
>>>>>
>>>>
>>>>
>>>> Hello
>>>>
>>>> An update on the investigation:
>>>> I was able to make this issue disappear by changing the timeout value
>>>> of the smallest timers we use.
>>>> We use a couple of timers with a timeout of 25ms. By enlarging these
>>>> to 25sec and the problem is gone.
>>>>
>>>> Yesterday I was also able to see (using the"strace" tool) the process
>>>> executing constantly "clone" system calls.
>>>> Note that the process we use is large (2Gb) and uses an mlockall call.
>>>>
>>>> In
>>>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>>>> I see that a new thread is created when the timer_create is called for
>>>> the first time. This thread stays alive until the program exits and is
>>>> used to process the timer expiries.
>>>> I have the feeling that there is an issue during the creation of this
>>>> thread. For example what would happen if the clone operation takes
>>>> longer than the time needed to perform the clone operation?
>>>> In the past we already observed issues with the clone call that we
>>>> could not explain (creation of the clone simply failed on our
>>>> application while it was working fine on a smaller application).
>>>>
>>>> Do you guys know whether there is an impact on the clone operation by
>>>> this mlockall call?
>>>>
>>>> I will try to make a small test application on which the issue can be
>>>> reproduced.
>>>>
>>>> ---
>>>> Ronny
>>>
>>>
>>> I'm able to reproduce the issue on a small test build:
>>>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> #include <sys/types.h>
>>> #include <sys/mman.h>
>>> #include <psos.h>
>>> #include <copperplate/init.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>>
>>> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
>>> {
>>> u_long ret, ev = 0, tmid,tmid2;
>>>
>>> ret = tm_evevery(1,1,&tmid);
>>> ret = tm_evafter(30000,4,&tmid2);
>>> while (1) {
>>> ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
>>> if (ev & 4) {
>>> printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
>>> tm_evafter(30000,4,&tmid2);
>>> }
>>> ev = 0;
>>> }
>>> tm_wkafter(100);
>>> }
>>>
>>> int main(int argc, char * const *argv)
>>> {
>>> u_long ret, tid = 0, args[4];
>>>
>>> mlockall(MCL_CURRENT | MCL_FUTURE);
>>> copperplate_init(&argc,&argv);
>>>
>>> ret = t_create("TEST",97, 0, 0, 0, &tid);
>>> printf("t_create(tid=%lu) = %lu\n", tid, ret);
>>> args[0] = 1;
>>> args[1] = 2;
>>> args[2] = 3;
>>> args[3] = 4;
>>> ret = t_start(tid, 0, foo, args);
>>> printf("t_start(tid=%lu) = %lu\n", tid, ret);
>>>
>>> while (1)
>>> tm_wkafter(100);
>>> return 0;
>>> }
>>>
>>> The TEST task starts 2 timers: one periodic and one 1shot timer.
>>> Each time the one-shot timer expires, a print is done and the timer is
>>> restarted.
>>>
>>> Observation is that once the one-shot timer expires, the application
>>> starts to use 100% cpuload on one core and the application code is not
>>> executed anymore. So it looks like there is constant processing in
>>> either Xenomai or the library code to process the timer handling. If
>>> periodic timers are used the issue is not observed.
>>>
>>
>> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
>> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>>
>
> Philip,
> do you have a reference to the issue that you are suspecting and a
Nothing specific I can confirm yet.
> view on which version of the glib we need to use to solve it?
>
>
So far I have the test running fine over glibc 2.15(x86) and eglibc
2.13(ppc). I have an outdated glibc 2.8(arm) I'm about to test which
might give me a different status.
I don't think I'll keep running timers over SIGEV_THREAD in the upcoming
-forge work anyway, the spec leaves too much room for interpretation
with respect to the underlying implementation. Typically, one server
thread per timer would be quite a problem with legacy systems firing
tens of timeout timers used as plain watchdogs.
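
For readers unfamiliar with the mechanism, this is roughly what a one-shot POSIX timer with SIGEV_THREAD notification looks like. It is a generic sketch, not the copperplate code; on_expiry(), arm_oneshot_ms() and oneshot_fired() are hypothetical names. How the C library runs the callback (one helper thread, one thread per expiry, and so on) is left to the implementation. Older glibc releases need -lrt -lpthread at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Set by the notification callback, read by the caller. */
static atomic_int fired;

/* Runs in a thread supplied by the C library when the timer expires. */
static void on_expiry(union sigval sv)
{
    (void)sv;
    atomic_store(&fired, 1);
}

/* Create a one-shot timer expiring after 'ms' milliseconds; 0 on success. */
static int arm_oneshot_ms(timer_t *timer, long ms)
{
    struct sigevent sev;
    struct itimerspec its;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = on_expiry;
    sev.sigev_notify_attributes = NULL;   /* default thread attributes */

    if (timer_create(CLOCK_MONOTONIC, &sev, timer) != 0)
        return -1;

    memset(&its, 0, sizeof(its));         /* it_interval == 0: one-shot */
    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;

    if (timer_settime(*timer, 0, &its, NULL) != 0) {
        timer_delete(*timer);
        return -1;
    }
    return 0;
}

/* Non-zero once the callback has run. */
static int oneshot_fired(void)
{
    return atomic_load(&fired);
}
```

A caller would arm the timer, then poll oneshot_fired() or block on some other synchronization object until the callback has run, and finally release the timer with timer_delete().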
--
Philippe.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 11:09 ` Philippe Gerum
@ 2013-03-06 11:18 ` Philippe Gerum
0 siblings, 0 replies; 30+ messages in thread
From: Philippe Gerum @ 2013-03-06 11:18 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/06/2013 12:09 PM, Philippe Gerum wrote:
>> do you have a reference to the issue that you are suspecting and a
>
> Nothing specific I can confirm yet.
>
>> view on which version of the glib we need to use to solve it?
>>
>>
>
> So far I have the test running fine over glibc 2.15(x86) and eglibc
> 2.13(ppc). I have an outdated glibc 2.8(arm) I'm about to test which
> might give me a different status.
>
> I don't think I'll keep running timers over SIGEV_THREAD in the upcoming
> -forge work anyway, the spec leaves too much room for interpretation
> with respect to the underlying implementation. Typically, one server
> thread per-timer would be quite of a problem with legacy systems firing
> tenths of timeout timers used as plain watchdogs.
>
No luck, 2.8 works fine too. Could you strace your small example on the
failing system for a few seconds until the oneshot timer triggers,
sending me the bzipped output log (privately)?
TIA,
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 10:55 ` Ronny Meeus
2013-03-06 11:09 ` Philippe Gerum
@ 2013-03-06 13:24 ` Philippe Gerum
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Gerum @ 2013-03-06 13:24 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/06/2013 11:55 AM, Ronny Meeus wrote:
>> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
>> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>>
>
> Philip,
> do you have a reference to the issue that you are suspecting and a
> view on which version of the glib we need to use to solve it?
>
http://sourceware.org/bugzilla/show_bug.cgi?id=7094
I would definitely drop glibc 2.9 in our case.
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-02 11:13 ` Ronny Meeus
2013-03-05 12:43 ` Ronny Meeus
@ 2013-03-06 13:49 ` Philippe Gerum
2013-03-06 14:32 ` Ronny Meeus
1 sibling, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-06 13:49 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/02/2013 12:13 PM, Ronny Meeus wrote:
> An update on the investigation:
> I was able to make this issue disappear by changing the timeout value
> of the smallest timers we use.
> We use a couple of timers with a timeout of 25ms. By enlarging these
> to 25sec and the problem is gone.
>
> Yesterday I was also able to see (using the"strace" tool) the process
> executing constantly "clone" system calls.
> Note that the process we use is large (2Gb) and uses an mlockall call.
>
> In http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
> I see that a new thread is created when the timer_create is called for
> the first time. This thread stays alive until the program exits and is
> used to process the timer expiries.
Looking at the code, glibc 2.9 not only creates one helper thread once,
but also creates a dedicated short-lived thread for running the user
handler at each timer expiration. This implementation is still current
with 2.15. That makes far too many clones for my taste.
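
The per-expiration thread described above can be observed with a small probe. This is only a sketch, not code from Xenomai or glibc: the names `expiry_probe`, `on_expiry` and `probe_sigev_thread` are made up for illustration. It arms a one-shot SIGEV_THREAD timer and records whether the handler ran in a thread other than the caller.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical probe state, shared between the caller and the handler. */
struct expiry_probe {
	pthread_t caller;        /* thread that armed the timer */
	atomic_int fired;        /* number of expirations seen */
	atomic_int other_thread; /* 1 if the handler ran outside 'caller' */
};

/* Runs in whatever thread the C library provides for SIGEV_THREAD. */
static void on_expiry(union sigval sv)
{
	struct expiry_probe *p = sv.sival_ptr;

	if (!pthread_equal(pthread_self(), p->caller))
		atomic_store(&p->other_thread, 1);
	atomic_fetch_add(&p->fired, 1); /* last access to *p by the handler */
}

/* Arm a one-shot timer and wait for it; returns 0 if it fired, -1 otherwise. */
int probe_sigev_thread(struct expiry_probe *p, long delay_ms)
{
	struct timespec pause = { 0, 1000000 }; /* 1 ms polling step */
	struct itimerspec its;
	struct sigevent sev;
	timer_t tid;
	long spins;

	p->caller = pthread_self();
	atomic_init(&p->fired, 0);
	atomic_init(&p->other_thread, 0);

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_THREAD;
	sev.sigev_notify_function = on_expiry;
	sev.sigev_value.sival_ptr = p;
	if (timer_create(CLOCK_MONOTONIC, &sev, &tid))
		return -1;

	memset(&its, 0, sizeof(its)); /* zero it_interval: one-shot */
	its.it_value.tv_sec = delay_ms / 1000;
	its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;
	if (timer_settime(tid, 0, &its, NULL)) {
		timer_delete(tid);
		return -1;
	}

	/* Poll for up to delay_ms + 1000 steps of roughly 1 ms each. */
	for (spins = 0; spins < delay_ms + 1000 && !atomic_load(&p->fired); spins++)
		nanosleep(&pause, NULL);

	timer_delete(tid);
	return atomic_load(&p->fired) ? 0 : -1;
}
```

Calling `probe_sigev_thread(&p, 20)` and then reading `p.other_thread` shows whether the handler ran outside the calling thread. Older glibc releases need `-lrt -lpthread` at link time.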
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 13:49 ` Philippe Gerum
@ 2013-03-06 14:32 ` Ronny Meeus
2013-03-06 15:49 ` Philippe Gerum
0 siblings, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-03-06 14:32 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Wed, Mar 6, 2013 at 2:49 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/02/2013 12:13 PM, Ronny Meeus wrote:
>
>> An update on the investigation:
>> I was able to make this issue disappear by changing the timeout value
>> of the smallest timers we use.
>> We use a couple of timers with a timeout of 25ms. By enlarging these
>> to 25sec and the problem is gone.
>>
>> Yesterday I was also able to see (using the"strace" tool) the process
>> executing constantly "clone" system calls.
>> Note that the process we use is large (2Gb) and uses an mlockall call.
>>
>> In
>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>> I see that a new thread is created when the timer_create is called for
>> the first time. This thread stays alive until the program exits and is
>> used to process the timer expiries.
>
>
> Looking at the code, glibc 2.9 not only forks one helper thread once, but
> also creates a dedicated short-lived thread for running the user handler at
> each timer expiration. This implementation is still current with 2.15. Which
> makes quite too many clones out there for my taste.
>
> --
> Philippe.
It may be that there are too many clones, which is not good
performance-wise, but I would assume that the code should behave
correctly in any case.
Here is the last part of the strace information I collected:
11309 0.000069 futex(0x17ab6a4, 0x81, 0x1, 0, 0 <unfinished ...>
16541 0.000042 <... futex resumed> ) = 0
16541 0.000035 futex(0x17ab6a4, 0x81, 0x1, 0, 0x17ab6a4) = 0
16541 0.000064 rt_sigprocmask(SIG_SETMASK, [], NULL, 16) = 0
16541 0.000095 exit(0) = ?
11309 0.000054 <... futex resumed> ) = 1
11309 0.000038 rt_sigtimedwait([RT_0], <unfinished ...>
11308 0.000060 <... clock_gettime resumed> ) = 0
11308 0.000043 timer_create(0x1, 0x157ac68, 0x1050b654 <unfinished ...>
11309 0.000074 <... rt_sigtimedwait resumed> {si_signo=SIGRT_0,
si_code=SI_TIMER, si_pid=51, si_uid=0, si_value={int=273724880,
ptr=0x1050b5d0}}, NULL, 16) = 32
11309 0.000132 sched_get_priority_min(SCHED_FIFO) = 1
11309 0.000062 sched_get_priority_max(SCHED_FIFO) = 99
11309 0.000058 clone(child_stack=0x17ab020,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
, parent_tidptr=0, tls=0x10, child_tidptr=0) = 16543
16543 0.000117 SYS_6272() = 0
16543 0.000058 futex(0x17ab6a4, 0x80, 0x2, 0, 0x17ab6a4 <unfinished ...>
11309 0.000040 sched_setscheduler(16543, SCHED_FIFO, { 99 }) = 0
11309 0.000067 futex(0x17ab6a4, 0x81, 0x1, 0, 0 <unfinished ...>
16543 0.000043 <... futex resumed> ) = 0
16543 0.000035 futex(0x17ab6a4, 0x81, 0x1, 0, 0x17ab6a4) = 0
16543 0.000064 rt_sigprocmask(SIG_SETMASK, [], NULL, 16) = 0
16543 0.000095 exit(0) = ?
11309 0.000054 <... futex resumed> ) = 1
11309 0.000038 rt_sigtimedwait([RT_0], <unfinished ...>
11308 0.000059 <... timer_create resumed> ) = 0
11309 0.000058 <... rt_sigtimedwait resumed> {si_signo=SIGRT_0,
si_code=SI_TIMER, si_pid=51, si_uid=0, si_value={int=273724880,
ptr=0x1050b5d0}}, NULL, 16) = 32
11309 15.442127 +++ killed by SIGKILL +++
I'm not an expert in this, but to me it looks like the timer_create
call, executed in the context of the 11308 thread, gets interrupted
because a signal is received by thread 11309. This signal is generated
because of a timer expiry which creates a new thread and processes the
callback function.
Is it not possible to disable all signals of the process during the
creation of the timer in Xenomai? In this way we can avoid the race
condition in the library. This might not be a clean solution, but would
it work as a temporary one until you finish the timer handling you
talked about in one of your earlier mails? BTW can I have some
information about what you are planning to change?
Regards,
Ronny
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 14:32 ` Ronny Meeus
@ 2013-03-06 15:49 ` Philippe Gerum
2013-03-07 10:02 ` Ronny Meeus
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-06 15:49 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/06/2013 03:32 PM, Ronny Meeus wrote:
> si_code=SI_TIMER, si_pid=51, si_uid=0, si_value={int=273724880,
> ptr=0x1050b5d0}}, NULL, 16) = 32
> 11309 15.442127 +++ killed by SIGKILL +++
>
> I'm not an expert in this, but to me it looks like the create_timer
> call, executed in the context of the 11308 thread, gets interrupted
> because a signal is received by thread 11309. This signal is generated
> because of a timer expiry which creates a new thread and processes the
> callback function.
>
> Is it not possible to disable all signals of the process during the
> creation of the timer in xenomai. In this way we can avoid the race
> condition in the library. This might be not a clean solution but would
> it work as a temporary one until you finish the timer handling you
> talk about in one of your earlier mails.
Fiddling with signal masks to work around an underlying bug would be
asking for more trouble. Time is a scarce enough resource.

> BTW can I have some
> information about what you are planning to change?
>
This one:
http://git.xenomai.org/?p=xenomai-forge.git;a=commit;h=cf30fb5e22a1a3d66412f1024be1ba86260904cc
This implementation passes the psos testsuite in -forge, and runs your
testcase fine. Please pull this change and let me know if the situation
improves.
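
For readers who want a feel for what replacing SIGEV_THREAD can look like, here is a minimal sketch of one common alternative: a single long-lived server thread that waits for a timer signal and runs the handlers itself, so no thread is cloned per expiration. This is NOT the code from the commit above. The names `soft_timer`, `timer_server_start`, `soft_timer_start` and `count_hit`, and the choice of SIGRTMIN, are assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define TIMER_SIG SIGRTMIN /* assumption: any otherwise unused RT signal */

/* Hypothetical timer descriptor; its address travels in si_value. */
struct soft_timer {
	timer_t id;
	void (*handler)(void *arg);
	void *arg;
};

/* Example handler: counts expirations in an atomic_int. */
void count_hit(void *arg)
{
	atomic_fetch_add((atomic_int *)arg, 1);
}

/* The only thread that ever runs timer handlers. */
static void *timer_server(void *unused)
{
	sigset_t set;
	siginfo_t si;

	(void)unused;
	sigemptyset(&set);
	sigaddset(&set, TIMER_SIG);
	for (;;) {
		if (sigwaitinfo(&set, &si) < 0)
			continue; /* e.g. EINTR: just wait again */
		if (si.si_code == SI_TIMER) {
			struct soft_timer *t = si.si_value.sival_ptr;
			t->handler(t->arg);
		}
	}
}

/* Block TIMER_SIG in the caller, then spawn the server (mask is inherited). */
int timer_server_start(pthread_t *server)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, TIMER_SIG);
	if (pthread_sigmask(SIG_BLOCK, &set, NULL))
		return -1;
	return pthread_create(server, NULL, timer_server, NULL) ? -1 : 0;
}

/* Arm a timer; period_ms == 0 means one-shot. Returns 0 or -1. */
int soft_timer_start(struct soft_timer *t, long delay_ms, long period_ms)
{
	struct itimerspec its;
	struct sigevent sev;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = TIMER_SIG;
	sev.sigev_value.sival_ptr = t;
	if (timer_create(CLOCK_MONOTONIC, &sev, &t->id))
		return -1;

	memset(&its, 0, sizeof(its));
	its.it_value.tv_sec = delay_ms / 1000;
	its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;
	its.it_interval.tv_sec = period_ms / 1000;
	its.it_interval.tv_nsec = (period_ms % 1000) * 1000000L;
	if (timer_settime(t->id, 0, &its, NULL)) {
		timer_delete(t->id);
		return -1;
	}
	return 0;
}
```

The sketch assumes `timer_server_start()` is called before any other thread is created, so that every thread inherits the blocked signal. A real implementation would also have to deal with a timer being deleted while its signal is still pending.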
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-06 15:49 ` Philippe Gerum
@ 2013-03-07 10:02 ` Ronny Meeus
2013-03-07 10:32 ` Philippe Gerum
0 siblings, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-03-07 10:02 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Wed, Mar 6, 2013 at 4:49 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/06/2013 03:32 PM, Ronny Meeus wrote:
>>
>> si_code=SI_TIMER, si_pid=51, si_uid=0, si_value={int=273724880,
>> ptr=0x1050b5d0}}, NULL, 16) = 32
>> 11309 15.442127 +++ killed by SIGKILL +++
>>
>> I'm not an expert in this, but to me it looks like the create_timer
>> call, executed in the context of the 11308 thread, gets interrupted
>> because a signal is received by thread 11309. This signal is generated
>> because of a timer expiry which creates a new thread and processes the
>> callback function.
>>
>> Is it not possible to disable all signals of the process during the
>> creation of the timer in xenomai. In this way we can avoid the race
>> condition in the library. This might be not a clean solution but would
>> it work as a temporary one until you finish the timer handling you
>> talk about in one of your earlier mails.
>
>
> Fiddling with signal masks to work around an underlying bug would be asking
> for more trouble. Time is a scarce enough resource.
>
>
> BTW can I have some
>>
>> information about what you are planning to change?
>>
>
> This one:
> http://git.xenomai.org/?p=xenomai-forge.git;a=commit;h=cf30fb5e22a1a3d66412f1024be1ba86260904cc
>
> This implementation passes the psos testsuite in -forge, and runs your
> testcase fine. Please pull this change and let me know if the situation
> improves.
>
> --
> Philippe.
Philippe,
thanks for the patch.
With this patch the test application is working fine, no cpuload is
observed anymore.
Our real application, on the other hand, aborts after some time with
this report:
------------------------------------------------------------------------------
[ ERROR BACKTRACE: thread root ]
#0 EAGAIN in t_create(), task.c:320
#1 EAGAIN in copperplate_create_thread(), internal.c:138
------------------------------------------------------------------------------
I did not observe this when running with Xenomai before the patch.
Do you have a clue about why I see this?
Ronny
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-07 10:02 ` Ronny Meeus
@ 2013-03-07 10:32 ` Philippe Gerum
2013-03-07 15:56 ` Philippe Gerum
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-07 10:32 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/07/2013 11:02 AM, Ronny Meeus wrote:
> On Wed, Mar 6, 2013 at 4:49 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/06/2013 03:32 PM, Ronny Meeus wrote:
>>>
>>> si_code=SI_TIMER, si_pid=51, si_uid=0, si_value={int=273724880,
>>> ptr=0x1050b5d0}}, NULL, 16) = 32
>>> 11309 15.442127 +++ killed by SIGKILL +++
>>>
>>> I'm not an expert in this, but to me it looks like the create_timer
>>> call, executed in the context of the 11308 thread, gets interrupted
>>> because a signal is received by thread 11309. This signal is generated
>>> because of a timer expiry which creates a new thread and processes the
>>> callback function.
>>>
>>> Is it not possible to disable all signals of the process during the
>>> creation of the timer in xenomai. In this way we can avoid the race
>>> condition in the library. This might be not a clean solution but would
>>> it work as a temporary one until you finish the timer handling you
>>> talk about in one of your earlier mails.
>>
>>
>> Fiddling with signal masks to work around an underlying bug would be asking
>> for more trouble. Time is a scarce enough resource.
>>
>>
>> BTW can I have some
>>>
>>> information about what you are planning to change?
>>>
>>
>> This one:
>> http://git.xenomai.org/?p=xenomai-forge.git;a=commit;h=cf30fb5e22a1a3d66412f1024be1ba86260904cc
>>
>> This implementation passes the psos testsuite in -forge, and runs your
>> testcase fine. Please pull this change and let me know if the situation
>> improves.
>>
>> --
>> Philippe.
>
> Philippe
>
> thanks for the patch.
> With this patch the test application is working fine, no cpuload is
> observed anymore.
> Our real application on the other hand aborts after some time with
> this reporting:
> ------------------------------------------------------------------------------
> [ ERROR BACKTRACE: thread root ]
>
> #0 EAGAIN in t_create(), task.c:320
> #1 EAGAIN in copperplate_create_thread(), internal.c:138
> ------------------------------------------------------------------------------
>
> I did not observe this when running with the xenomai before the patch.
> Do you have a clue about why I see this?
>
>
There is likely a resource leakage hiding somewhere. I'll have a look.
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-07 10:32 ` Philippe Gerum
@ 2013-03-07 15:56 ` Philippe Gerum
2013-03-07 19:51 ` Ronny Meeus
0 siblings, 1 reply; 30+ messages in thread
From: Philippe Gerum @ 2013-03-07 15:56 UTC (permalink / raw)
To: Ronny Meeus; +Cc: xenomai
On 03/07/2013 11:32 AM, Philippe Gerum wrote:
> On 03/07/2013 11:02 AM, Ronny Meeus wrote:
>>
>> thanks for the patch.
>> With this patch the test application is working fine, no cpuload is
>> observed anymore.
>> Our real application on the other hand aborts after some time with
>> this reporting:
>> ------------------------------------------------------------------------------
>>
>> [ ERROR BACKTRACE: thread root ]
>>
>> #0 EAGAIN in t_create(), task.c:320
>> #1 EAGAIN in copperplate_create_thread(), internal.c:138
>> ------------------------------------------------------------------------------
>>
>>
>> I did not observe this when running with the xenomai before the patch.
>> Do you have a clue about why I see this?
>>
>>
>
> There is likely a resource leakage hiding somewhere. I'll have a look.
>
After hours running the test case, I didn't see any sign of leakage,
and code inspection found none either. Maybe this bug is visible now
that your app can run more code. There are several potential causes for
EAGAIN, however assuming this is not an rlimit issue, checking
/proc/vmstat while your app runs may give some hint.
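
To rule out the rlimit side, something along these lines could be called from inside the application shortly before the failing t_create(). It is only a sketch: `report_thread_limits` and `vm_size_kb` are hypothetical helper names, and reading VmSize from /proc/self/status is an additional per-process check chosen for this example, not the /proc/vmstat check itself.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft limit of one resource; returns 0 on success, -1 on error. */
static int print_limit(FILE *out, const char *name, int resource)
{
	struct rlimit rl;

	if (getrlimit(resource, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		fprintf(out, "%s: unlimited\n", name);
	else
		fprintf(out, "%s: %llu\n", name, (unsigned long long)rl.rlim_cur);
	return 0;
}

/* Virtual size of the calling process in kB, or -1 if it cannot be read. */
long vm_size_kb(void)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];
	long kb = -1;

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "VmSize: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

/* Dump the limits that commonly make thread creation fail with EAGAIN. */
int report_thread_limits(FILE *out)
{
	int ret = 0;

	ret |= print_limit(out, "RLIMIT_NPROC (processes/threads)", RLIMIT_NPROC);
	ret |= print_limit(out, "RLIMIT_AS (address space, bytes)", RLIMIT_AS);
	ret |= print_limit(out, "RLIMIT_STACK (bytes)", RLIMIT_STACK);
	fprintf(out, "VmSize: %ld kB\n", vm_size_kb());
	return ret ? -1 : 0;
}
```

A VmSize close to the address space limit of the platform, or close to RLIMIT_AS, would point at address space exhaustion rather than at a thread count limit.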
--
Philippe.
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-07 15:56 ` Philippe Gerum
@ 2013-03-07 19:51 ` Ronny Meeus
2013-03-08 7:44 ` Ronny Meeus
0 siblings, 1 reply; 30+ messages in thread
From: Ronny Meeus @ 2013-03-07 19:51 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Thu, Mar 7, 2013 at 4:56 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/07/2013 11:32 AM, Philippe Gerum wrote:
>>
>> On 03/07/2013 11:02 AM, Ronny Meeus wrote:
>>>
>>>
>>> thanks for the patch.
>>> With this patch the test application is working fine, no cpuload is
>>> observed anymore.
>>> Our real application on the other hand aborts after some time with
>>> this reporting:
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> [ ERROR BACKTRACE: thread root ]
>>>
>>> #0 EAGAIN in t_create(), task.c:320
>>> #1 EAGAIN in copperplate_create_thread(), internal.c:138
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> I did not observe this when running with the xenomai before the patch.
>>> Do you have a clue about why I see this?
>>>
>>>
>>
>> There is likely a resource leakage hiding somewhere. I'll have a look.
>>
>
> After hours running the test case, I didn't see any sign of leakage, same
> after code inspection. Maybe this bug is visible now that your app can run
> more code. There are several potential causes for EAGAIN, however assuming
> this is not a rlimit issue, checking /proc/vmstat while your app runs may
> give some hint.
>
> --
> Philippe.
I will try to debug this issue tomorrow.
Ronny
* Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
2013-03-07 19:51 ` Ronny Meeus
@ 2013-03-08 7:44 ` Ronny Meeus
0 siblings, 0 replies; 30+ messages in thread
From: Ronny Meeus @ 2013-03-08 7:44 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On Thu, Mar 7, 2013 at 8:51 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote:
> On Thu, Mar 7, 2013 at 4:56 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/07/2013 11:32 AM, Philippe Gerum wrote:
>>>
>>> On 03/07/2013 11:02 AM, Ronny Meeus wrote:
>>>>
>>>>
>>>> thanks for the patch.
>>>> With this patch the test application is working fine, no cpuload is
>>>> observed anymore.
>>>> Our real application on the other hand aborts after some time with
>>>> this reporting:
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> [ ERROR BACKTRACE: thread root ]
>>>>
>>>> #0 EAGAIN in t_create(), task.c:320
>>>> #1 EAGAIN in copperplate_create_thread(), internal.c:138
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> I did not observe this when running with the xenomai before the patch.
>>>> Do you have a clue about why I see this?
>>>>
>>>>
>>>
>>> There is likely a resource leakage hiding somewhere. I'll have a look.
>>>
>>
>> After hours running the test case, I didn't see any sign of leakage, same
>> after code inspection. Maybe this bug is visible now that your app can run
>> more code. There are several potential causes for EAGAIN, however assuming
>> this is not a rlimit issue, checking /proc/vmstat while your app runs may
>> give some hint.
>>
>> --
>> Philippe.
>
> I try to debug this issue tomorrow.
>
> Ronny
I found the issue, thanks to your /proc/vmstat hint.
The problem was that the complete virtual address space was depleted
and there was no space left to create some of the tasks.
By reducing the memory configuration of our application the build
starts up and the original load issue is resolved.
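
For anyone hitting the same EAGAIN: thread stacks are one consumer of virtual address space that is easy to overlook. The sketch below is not what was changed in our application (we reduced its memory configuration); it only illustrates, at the plain pthread level, how a thread can be given an explicit stack size instead of the default. `spawn_with_stack` and `echo_entry` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body for the example: hands its argument back. */
void *echo_entry(void *arg)
{
	return arg;
}

/*
 * Create a thread with an explicit stack size.
 * Returns 0 on success or an errno value (EAGAIN, EINVAL, ...) on failure.
 */
int spawn_with_stack(pthread_t *thread, size_t stack_bytes,
		     void *(*entry)(void *), void *arg)
{
	pthread_attr_t attr;
	int ret;

	ret = pthread_attr_init(&attr);
	if (ret)
		return ret;

	/* Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
	ret = pthread_attr_setstacksize(&attr, stack_bytes);
	if (ret == 0)
		ret = pthread_create(thread, &attr, entry, arg);

	pthread_attr_destroy(&attr);
	return ret;
}
```

For example, `spawn_with_stack(&th, 256 * 1024, echo_entry, NULL)` reserves 256 KiB for the stack of that thread.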
In the coming days I will do some further testing on this Xenomai
version and let you know the result later.
In a comment in the code it is indicated that a timer wheel will be
introduced later.
Do you have any plans for this in the near future?
Our application is a legacy one and I think it has something like 50
timers running.
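
To make clear what is meant by a timer wheel, here is a minimal sketch of a single-level hashed wheel. It is not Xenomai code and says nothing about how -forge would implement it; all names (`timer_wheel`, `wheel_timer`, `wheel_add`, `wheel_tick`, `count_expiry`) and the slot count are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define WHEEL_SLOTS 64 /* assumption: arbitrary slot count */

struct wheel_timer {
	struct wheel_timer *next;
	unsigned long rounds; /* full wheel turns left before expiry */
	void (*handler)(void *arg);
	void *arg;
};

struct timer_wheel {
	struct wheel_timer *slot[WHEEL_SLOTS];
	unsigned long now; /* current tick */
};

/* Example handler: counts expirations in an int. */
void count_expiry(void *arg)
{
	++*(int *)arg;
}

void wheel_init(struct timer_wheel *w)
{
	memset(w, 0, sizeof(*w));
}

/* Queue 't' to fire 'delay_ticks' ticks from now (at least one tick). */
void wheel_add(struct timer_wheel *w, struct wheel_timer *t,
	       unsigned long delay_ticks)
{
	unsigned long idx;

	if (delay_ticks == 0)
		delay_ticks = 1;
	idx = (w->now + delay_ticks) % WHEEL_SLOTS;
	t->rounds = (delay_ticks - 1) / WHEEL_SLOTS;
	t->next = w->slot[idx];
	w->slot[idx] = t;
}

/* Advance one tick, run the handlers that are due, return how many ran. */
unsigned int wheel_tick(struct timer_wheel *w)
{
	struct wheel_timer *list, *t;
	unsigned int fired = 0;
	unsigned long idx;

	w->now++;
	idx = w->now % WHEEL_SLOTS;

	/* Detach the slot first so handlers may safely re-add timers. */
	list = w->slot[idx];
	w->slot[idx] = NULL;
	while (list != NULL) {
		t = list;
		list = t->next;
		if (t->rounds > 0) {
			/* Not due in this turn: put it back for a later one. */
			t->rounds--;
			t->next = w->slot[idx];
			w->slot[idx] = t;
			continue;
		}
		t->next = NULL;
		t->handler(t->arg);
		fired++;
	}
	return fired;
}
```

With such a structure, a single periodic tick source can drive all 50 or so timers, and adding a timer has a constant cost regardless of how many are pending.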
Thanks for the great support.
Ronny
end of thread, other threads:[~2013-03-08 7:44 UTC | newest]
Thread overview: 30+ messages
2013-02-28 19:19 [Xenomai] Xenomai-forge: thread using 100% cpu load Ronny Meeus
2013-02-28 20:10 ` Gilles Chanteperdrix
2013-02-28 20:22 ` Thomas De Schampheleire
2013-02-28 20:27 ` Gilles Chanteperdrix
2013-03-01 8:22 ` Philippe Gerum
2013-03-01 8:26 ` Gilles Chanteperdrix
2013-03-01 8:30 ` Philippe Gerum
2013-03-01 8:30 ` Gilles Chanteperdrix
2013-03-01 8:41 ` Philippe Gerum
2013-03-02 11:13 ` Ronny Meeus
2013-03-05 12:43 ` Ronny Meeus
2013-03-05 13:28 ` Philippe Gerum
2013-03-05 14:08 ` Philippe Gerum
2013-03-05 14:25 ` Ronny Meeus
2013-03-05 14:47 ` Philippe Gerum
2013-03-05 14:53 ` Ronny Meeus
2013-03-06 10:55 ` Ronny Meeus
2013-03-06 11:09 ` Philippe Gerum
2013-03-06 11:18 ` Philippe Gerum
2013-03-06 13:24 ` Philippe Gerum
2013-03-06 13:49 ` Philippe Gerum
2013-03-06 14:32 ` Ronny Meeus
2013-03-06 15:49 ` Philippe Gerum
2013-03-07 10:02 ` Ronny Meeus
2013-03-07 10:32 ` Philippe Gerum
2013-03-07 15:56 ` Philippe Gerum
2013-03-07 19:51 ` Ronny Meeus
2013-03-08 7:44 ` Ronny Meeus
2013-02-28 20:30 ` Ronny Meeus
2013-02-28 20:35 ` Gilles Chanteperdrix