From: Philippe Gerum <rpm@xenomai.org>
To: Ronny Meeus <ronny.meeus@gmail.com>
Cc: xenomai@xenomai.org
Subject: Re: [Xenomai] Xenomai-forge: thread using 100% cpu load
Date: Wed, 06 Mar 2013 12:09:32 +0100
Message-ID: <513723EC.6070203@xenomai.org>
In-Reply-To: <CAMJ=MEccwvx-__TQZZdAzxmb=cuzivG_kUkKxU65V=R3a5yuTw@mail.gmail.com>

On 03/06/2013 11:55 AM, Ronny Meeus wrote:
> On Tue, Mar 5, 2013 at 3:08 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/05/2013 01:43 PM, Ronny Meeus wrote:
>>>
>>> On Sat, Mar 2, 2013 at 12:13 PM, Ronny Meeus <ronny.meeus@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 1, 2013 at 9:41 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>>
>>>>> On 03/01/2013 09:30 AM, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>
>>>>>> On 03/01/2013 09:30 AM, Philippe Gerum wrote:
>>>>>>
>>>>>>> On 03/01/2013 09:26 AM, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/01/2013 09:22 AM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 02/28/2013 09:22 PM, Thomas De Schampheleire wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
>>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello
>>>>>>>>>>>>
>>>>>>>>>>>> We are using the PSOS interface of Xenomai forge, running
>>>>>>>>>>>> completely in user-space using the mercury code.
>>>>>>>>>>>> We deploy our application on different processors: one product
>>>>>>>>>>>> is running on PPC multicore (P4040, P4080, P4034) and another
>>>>>>>>>>>> one on Cavium (8 core device).
>>>>>>>>>>>> The Linux version we use is 2.6.32, but I would assume that this
>>>>>>>>>>>> is not so relevant.
>>>>>>>>>>>>
>>>>>>>>>>>> Our Xenomai application is running on one of the cores (affinity
>>>>>>>>>>>> is set), while the other cores are running other code.
>>>>>>>>>>>>
>>>>>>>>>>>> On both architectures we recently started to see issues where
>>>>>>>>>>>> one thread is consuming 100% of the core on which the
>>>>>>>>>>>> application is pinned.
>>>>>>>>>>>> The thread that monopolizes the core is the thread internally
>>>>>>>>>>>> used to manage the timers, running at the highest priority.
>>>>>>>>>>>> The trigger for running into this behavior is currently unclear.
>>>>>>>>>>>> If we only start a part of the application (platform management
>>>>>>>>>>>> only), the issue is not observed.
>>>>>>>>>>>> We see this on both an old version of Xenomai and a very recent
>>>>>>>>>>>> one (pulled from the git repo yesterday).
>>>>>>>>>>>>
>>>>>>>>>>>> I will continue to debug this issue in the coming days and try
>>>>>>>>>>>> to isolate the code that is triggering it, but I can use hints
>>>>>>>>>>>> from the community.
>>>>>>>>>>>> Debugging is complex since once the load starts, the debugger
>>>>>>>>>>>> no longer reacts.
>>>>>>>>>>>> If I put breakpoints in the functions that are called when the
>>>>>>>>>>>> timer expires (both oneshot and periodic), the process starts to
>>>>>>>>>>>> clone itself and I end up with tens of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Has anybody seen an issue like this before, or does somebody have
>>>>>>>>>>>> some hints on how to debug this problem?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> First enable the watchdog. It will send a signal to the
>>>>>>>>>>> application when detecting a problem, then you can use the
>>>>>>>>>>> watchdog to trigger an I-pipe tracer trace when the bug happens.
>>>>>>>>>>> You will probably have to increase the watchdog polling
>>>>>>>>>>> frequency, in order to have a meaningful trace.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't think an I-pipe tracer will be possible when using the
>>>>>>>>>> Mercury core (xenomai-forge), right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Correct.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I do not think so. The way I see it, you can enable the I-pipe tracer
>>>>>>>> without CONFIG_XENOMAI.
>>>>>>>>
>>>>>>>
>>>>>>> Mercury has NO pipeline in the kernel.
>>>>>>>
>>>>>>
>>>>>> You mean mercury can not run with an I-pipe kernel?
>>>>>>
>>>>>
>>>>> I mean it does not care about the pipeline, it does not need it. So if
>>>>> this is about observing kernel activity, then ftrace should be fine, or
>>>>> possibly perf to find out where userland spends time.
>>>>>
>>>>> --
>>>>> Philippe.
>>>>>
>>>>
>>>>
>>>> Hello
>>>>
>>>> An update on the investigation:
>>>> I was able to make this issue disappear by changing the timeout value
>>>> of the smallest timers we use.
>>>> We use a couple of timers with a timeout of 25ms. After enlarging these
>>>> to 25sec, the problem is gone.
>>>>
>>>> Yesterday I was also able to see (using the "strace" tool) the process
>>>> constantly executing "clone" system calls.
>>>> Note that the process we use is large (2 GB) and uses an mlockall call.
>>>>
>>>> In
>>>> http://stackoverflow.com/questions/4263958/some-information-on-timer-helper-thread-of-librt-so-1/4935895#4935895
>>>> I see that a new thread is created when the timer_create is called for
>>>> the first time. This thread stays alive until the program exits and is
>>>> used to process the timer expiries.
>>>> I have the feeling that there is an issue during the creation of this
>>>> thread. For example what would happen if the clone operation takes
>>>> longer than the time needed to perform the clone operation?
>>>> In the past we already observed issues with the clone call that we
>>>> could not explain (creation of the clone simply failed on our
>>>> application while it was working fine on a smaller application).
>>>>
>>>> Do you guys know whether there is an impact on the clone operation by
>>>> this mlockall call?
>>>>
>>>> I will try to make a small test application on which the issue can be
>>>> reproduced.
>>>>
>>>> ---
>>>> Ronny
>>>
>>>
>>> I'm able to reproduce the issue on a small test build:
>>>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> #include <sys/types.h>
>>> #include <sys/mman.h>
>>> #include <psos.h>
>>> #include <copperplate/init.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>>
>>> static void foo (u_long a0, u_long a1, u_long a2, u_long a3)
>>> {
>>>           u_long ret, ev = 0, tmid,tmid2;
>>>
>>>           ret = tm_evevery(1,1,&tmid);
>>>           ret = tm_evafter(30000,4,&tmid2);
>>>           while (1) {
>>>                   ret = ev_receive(0xFF,EV_ANY|EV_WAIT,0,&ev);
>>>                   if (ev & 4) {
>>>                           printf("%lx Restarting one-shot timer. ev=%lx\n",ret,ev);
>>>                           tm_evafter(30000,4,&tmid2);
>>>                   }
>>>                   ev = 0;
>>>           }
>>>           tm_wkafter(100);
>>> }
>>>
>>> int main(int argc, char * const *argv)
>>> {
>>>           u_long ret, tid = 0, args[4];
>>>
>>>           mlockall(MCL_CURRENT | MCL_FUTURE);
>>>           copperplate_init(&argc,&argv);
>>>
>>>           ret = t_create("TEST",97, 0, 0, 0, &tid);
>>>           printf("t_create(tid=%lu) = %lu\n", tid, ret);
>>>           args[0] = 1;
>>>           args[1] = 2;
>>>           args[2] = 3;
>>>           args[3] = 4;
>>>           ret = t_start(tid, 0, foo, args);
>>>           printf("t_start(tid=%lu) = %lu\n", tid, ret);
>>>
>>>           while (1)
>>>                   tm_wkafter(100);
>>>           return 0;
>>> }
>>>
>>> The TEST task starts 2 timers: one periodic and one one-shot timer.
>>> Each time the one-shot timer expires, a print is done and the timer is
>>> restarted.
>>>
>>> Observation is that once the one-shot timer expires, the application
>>> starts to use 100% cpuload on one core and the application code is not
>>> executed anymore. So it looks like there is constant processing in
>>> either Xenomai or the library code to process the timer handling. If
>>> periodic timers are used the issue is not observed.
>>>
>>
>> I can't reproduce this bug using that test code, over glibc 2.15/x86. We
>> might have a problem with SIGEV_THREAD. Which glibc release are you running?
>>
>
> Philippe,
> do you have a reference to the issue that you are suspecting and a

Nothing specific I can confirm yet.

> view on which version of glibc we need to use to solve it?
>
>

So far I have the test running fine over glibc 2.15 (x86) and eglibc
2.13 (ppc). I have an outdated glibc 2.8 (arm) I'm about to test, which
might give a different result.
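
To report which C library release a given target actually runs, something
along these lines could be built for each board. This is only a minimal
sketch, assuming a glibc or eglibc toolchain that ships
<gnu/libc-version.h>; print_libc_version() is an illustrative name, not
part of any Xenomai API.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/*
 * Print the version and release status of the C library the process is
 * running against (the runtime library, not the build-time headers).
 */
void print_libc_version(void)
{
    printf("glibc %s (%s)\n",
           gnu_get_libc_version(), gnu_get_libc_release());
}
```

Calling print_libc_version() early in main() of the test program would put
the library version next to the rest of the output.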

I don't think I'll keep running timers over SIGEV_THREAD in the upcoming
-forge work anyway; the spec leaves too much room for interpretation
with respect to the underlying implementation. Typically, one server
thread per timer would be quite a problem with legacy systems firing
tens of timeout timers used as plain watchdogs.
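
To help tell a C library problem apart from a copperplate one, the pattern
from the test case (a one-shot timer restarted each time it expires) can
be reduced to plain POSIX calls. The following is a rough sketch under
stated assumptions, not the -forge implementation: every name in it
(arm_oneshot, on_expiry, count_oneshot_expiries) is made up for
illustration, and how many threads the library creates to run on_expiry()
is precisely the part the spec leaves open.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* All names below are hypothetical; this is not Xenomai or copperplate code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static timer_t oneshot;   /* the one-shot timer being restarted */
static int stopped;       /* set once the timer has been deleted */
static int expiries;      /* number of times the timer fired */

/* Arm the timer to fire once, delay_ms (> 0) from now; zero interval = one-shot. */
static int arm_oneshot(int delay_ms)
{
    struct itimerspec its;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = delay_ms / 1000;
    its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;
    return timer_settime(oneshot, 0, &its, NULL);
}

/* Called by the C library in a thread of its choosing, not the creator's. */
static void on_expiry(union sigval sv)
{
    pthread_mutex_lock(&lock);
    if (!stopped) {
        expiries++;
        arm_oneshot(sv.sival_int);  /* restart, as the test does with tm_evafter() */
    }
    pthread_mutex_unlock(&lock);
}

/* Run a self-restarting one-shot timer for run_ms; return the expiry count. */
int count_oneshot_expiries(int delay_ms, int run_ms)
{
    struct sigevent sev;
    struct timespec ts = {
        .tv_sec = run_ms / 1000,
        .tv_nsec = (run_ms % 1000) * 1000000L,
    };
    int n;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = on_expiry;
    sev.sigev_value.sival_int = delay_ms;

    if (timer_create(CLOCK_MONOTONIC, &sev, &oneshot) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    stopped = 0;
    expiries = 0;
    arm_oneshot(delay_ms);
    pthread_mutex_unlock(&lock);

    nanosleep(&ts, NULL);           /* let the timer fire and restart itself */

    pthread_mutex_lock(&lock);
    stopped = 1;                    /* late callbacks must not touch the timer */
    timer_delete(oneshot);
    n = expiries;
    pthread_mutex_unlock(&lock);
    return n;
}
```

Running count_oneshot_expiries() with a short delay under strace on the
failing target should show whether "clone" calls keep piling up without
any Xenomai code involved. On older glibc releases this needs to be linked
with -lrt -lpthread.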

-- 
Philippe.

