* [Xenomai] non-blocking rt_task_suspend(NULL)
@ 2014-04-15 12:42 Petr Cervenka
  2014-04-16  9:08 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Cervenka @ 2014-04-15 12:42 UTC (permalink / raw)
  To: Xenomai

Hello, I have a problem with the rt_task_suspend(NULL) call.
I'm using it to synchronize two tasks (producer/consumer style).
1) When the consumer task has no work to do, it suspends itself by calling rt_task_suspend(NULL).
2) When the producer creates new work for the consumer, it wakes it up by calling rt_task_resume(&consumerTask).
The problem is that the consumer occasionally gets into a state in which rt_task_suspend() no longer puts it to sleep, and the task then takes all the CPU time.
The return code is 0, but I have also seen a couple of -4 (-EINTR) values in the past.
The consumer task status was 00300380 before, and is 00300184 when a small safety sleep is present.
I could use, for example, an RT_EVENT object instead, but I'm curious whether you happen to know what is going on.
Xenomai 2.6.3, Linux 3.5.7

Petr Cervenka


* Re: [Xenomai] non-blocking rt_task_suspend(NULL)
@ 2014-04-16 14:20 Petr Cervenka
  2014-04-16 14:28 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Cervenka @ 2014-04-16 14:20 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>
> CC: "Xenomai" <xenomai@xenomai.org>
>On 04/16/2014 02:22 PM, Petr Cervenka wrote:
>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>
>>> CC: "Xenomai" <xenomai@xenomai.org> On 04/15/2014 02:42 PM, Petr
>>> Cervenka wrote:
>>>> Hello I have a problem with the rt_task_suspend(NULL) call. I'm
>>>> using it for synchronization of two (producer / consumer like)
>>>> tasks. 1) When the consumer task has no work to do, it stops
>>>> itself by calling of the rt_task_suspend(NULL). 2) When the
>>>> producer creates new work for consumer, it wakes it up by calling
>>>> of rt_task_resume(&consumerTask). The problem is, that consumer
>>>> seldom switches to a state, that it sleeps by rt_task_suspend no
>>>> more. And the task then takes all the CPU time. The return code
>>>> is 0. But I already have seen couple of -4 (-EINTR) values in the
>>>> past also. Consumer task status was 00300380 before and 00300184
>>>> (if there is small safety sleep present). I can use for example
>>>> RT_EVENT variable instead, but I'm curious if you by chance don't
>>>> know, what is happening? Xenomai 2.6.3, Linux 3.5.7
>>>
>>> Could you post the example of code you are using to get this
>>> issue?
>>>
>>
>> It's an application with many threads, mutexes and other objects. It
>> also depends on special measuring HW. I can post a simplified
>> example here, but I don't think it would be possible to reproduce the
>> same behavior easily. In my configuration it happens probably only
>> once per day and very unpredictably. But I have more details. I replaced
>> rt_task_suspend / rt_task_resume by rt_event_wait / rt_event_signal.
>> It failed in a similar way, but this time the result of the wait was -4
>> (-EINTR). And (after several million invocations) it recovered by
>> itself.
>
>-EINTR is a valid return value for both rt_event_wait and 
>rt_task_suspend. In case you get this error, you should loop to call 
>rt_event_wait again, and not call rt_event_clear, as you risk clearing 
>an event which has been signaled afterwards.
>
You are right. It was just a very quick replacement of the waiting and wake-up functions. But I'm checking the "work queue" anyway, and exact timing isn't needed here. My problem is that the slow consumer task seems to be "interrupted by signal" (or whatever) for several minutes. I mean that it no longer waits for the event and always returns immediately (with the -EINTR return code). I also hit one such situation half an hour ago, but the return code was 0 that time. Could you give me some advice on what to check when such a situation happens again?

Petr
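
The retry advice quoted above can be sketched in plain C. This is only an illustration under stated assumptions: wait_fn_t, struct wait_result, wait_retrying() and fake_wait() are hypothetical names, not part of Xenomai. In a real program the callback would wrap the actual blocking call (for example rt_event_wait()) and return its result. The sketch retries on -EINTR without clearing anything, and gives up after a caller-chosen number of consecutive interruptions so that the "returns -EINTR forever" state can be detected and diagnosed instead of burning CPU.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: performs ONE blocking wait and returns
 * 0 on success or a negative errno value such as -EINTR. */
typedef int (*wait_fn_t)(void *ctx);

/* Illustrative result type: last status plus the number of
 * consecutive -EINTR returns that were seen. */
struct wait_result {
    int  status;
    long eintr_count;
};

/* Call wait_once() again whenever it returns -EINTR, up to eintr_limit
 * consecutive interruptions. Nothing is cleared between retries, so an
 * event signaled in the meantime is not lost. */
struct wait_result wait_retrying(wait_fn_t wait_once, void *ctx,
                                 long eintr_limit)
{
    struct wait_result r = { 0, 0 };

    for (;;) {
        r.status = wait_once(ctx);
        if (r.status != -EINTR)
            return r;               /* success or a different error */
        if (++r.eintr_count >= eintr_limit)
            return r;               /* still -EINTR: collect diagnostics */
    }
}

/* Stand-in for a real wait, for demonstration only: reports -EINTR
 * until the counter pointed to by ctx reaches zero, then succeeds. */
int fake_wait(void *ctx)
{
    long *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EINTR;
    }
    return 0;
}
```

When wait_retrying() comes back with status == -EINTR, the limit was hit; that would be a reasonable point to record the task status or freeze a trace before continuing.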


* Re: [Xenomai] non-blocking rt_task_suspend(NULL)
@ 2014-04-16 16:02 Petr Cervenka
  2014-04-16 16:17 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Cervenka @ 2014-04-16 16:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>
> CC: "Xenomai" <xenomai@xenomai.org>
>On 04/16/2014 04:20 PM, Petr Cervenka wrote:
>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>
>>> CC: "Xenomai" <xenomai@xenomai.org> On 04/16/2014 02:22 PM, Petr
>>> Cervenka wrote:
>>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>>
>>>>> CC: "Xenomai" <xenomai@xenomai.org> On 04/15/2014 02:42 PM,
>>>>> Petr Cervenka wrote:
>>>>>> Hello I have a problem with the rt_task_suspend(NULL) call.
>>>>>> I'm using it for synchronization of two (producer / consumer
>>>>>> like) tasks. 1) When the consumer task has no work to do, it
>>>>>> stops itself by calling of the rt_task_suspend(NULL). 2) When
>>>>>> the producer creates new work for consumer, it wakes it up by
>>>>>> calling of rt_task_resume(&consumerTask). The problem is,
>>>>>> that consumer seldom switches to a state, that it sleeps by
>>>>>> rt_task_suspend no more. And the task then takes all the CPU
>>>>>> time. The return code is 0. But I already have seen couple of
>>>>>> -4 (-EINTR) values in the past also. Consumer task status was
>>>>>> 00300380 before and 00300184 (if there is small safety sleep
>>>>>> present). I can use for example RT_EVENT variable instead,
>>>>>> but I'm curious if you by chance don't know, what is
>>>>>> happening? Xenomai 2.6.3, Linux 3.5.7
>>>>>
>>>>> Could you post the example of code you are using to get this
>>>>> issue?
>>>>>
>>>>
>>>> It's an application with many threads, mutexes and other objects.
>>>> It also depends on special measuring HW. I can post a simplified
>>>> example here, but I don't think it would be possible to
>>>> reproduce the same behavior easily. In my configuration it
>>>> happens probably only once per day and very unpredictably.
>>>> But I have more details. I replaced rt_task_suspend /
>>>> rt_task_resume by rt_event_wait / rt_event_signal. It failed in a
>>>> similar way, but this time the result of the wait was -4 (-EINTR).
>>>> And (after several million invocations) it recovered by itself.
>>>
>>> -EINTR is a valid return value for both rt_event_wait and
>>> rt_task_suspend. In case you get this error, you should loop to
>>> call rt_event_wait again, and not call rt_event_clear, as you risk
>>> clearing an event which has been signaled afterwards.
>>>
>> You are right. It was just very quick replace of waiting and
>> waking-up functions. But I'm checking the "work queue" anyway and it
>> also doesn't need exact timing here. My problem it that the slow
>> consumer task seems to be "interrupted by signal" (or whatever) for
>> several minutes. I mean, that it doesn't wait for the event anymore
>> and it always returns immediately (with -EINTR return code).
>
>Are you running inside gdb? Does the task receive the SIGDEBUG signal? 
>Do you have the XNWARNSW bit armed?
>

gdb: No.
SIGDEBUG, XNWARNSW: I don't even know what they are ;-).
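
For context, a hedged sketch of how a program could find out whether it receives SIGDEBUG. To my knowledge, Xenomai 2.x defines SIGDEBUG as an alias for SIGXCPU and sends it to a real-time task that unexpectedly switches to secondary mode while the mode-switch warning bit is armed (in the native skin, presumably via rt_task_set_mode(0, T_WARNSW, NULL)); please verify both against your headers. The code below uses only POSIX calls; install_sigdebug_handler() and sigdebug_count are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Assumption: outside the Xenomai headers SIGDEBUG is not defined, and
 * Xenomai 2.x maps it to SIGXCPU. Check your installation. */
#ifndef SIGDEBUG
#define SIGDEBUG SIGXCPU
#endif

/* Number of SIGDEBUG deliveries seen so far (hypothetical helper state). */
volatile sig_atomic_t sigdebug_count;

/* Keep the handler minimal: only async-signal-safe work is allowed here. */
static void on_sigdebug(int sig)
{
    (void)sig;
    sigdebug_count++;
}

/* Install the handler; returns 0 on success, -1 on failure (see errno). */
int install_sigdebug_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigdebug;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGDEBUG, &sa, NULL);
}
```

If sigdebug_count stays at zero while the warning bit is armed and the problem occurs, unexpected mode switches reported through SIGDEBUG are probably not the cause.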

>> I also
>> already got one such situation half an hour ago. But the return code
>> was 0 that time. Could you give me some advice what to check when
>> such situation happens again?
>
>Well the task status should help.
>
Normally the task status (from /proc/xenomai/stat) is 00300182
(XNFPU | XNSHADOW | XNMAPPED | XNSTARTED | XNPEND, i.e. waiting for an event).
The task status during the last issue (from /proc/xenomai/stat) was 00300380
(XNFPU | XNSHADOW | XNRELAX | XNMAPPED | XNSTARTED).
The CPU load of the task was 23% (with more than 4 million MSW/CSW). Perhaps sending UDP packets (used for debugging) caused some sleeping and kept the computer from freezing completely.

After the next occurrence I will have more precise information from rt_task_inquire.

Petr
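
The status words in this thread can be decoded mechanically. The sketch below is an illustration, not Xenomai code: decode_status() and the table are hypothetical, and the bit values are what I recall from include/nucleus/thread.h in Xenomai 2.6. They agree with the decodings given in the message above, but should be verified against your own source tree. Only the bits relevant to this thread are listed.

```c
#include <stdio.h>
#include <string.h>

struct state_bit {
    unsigned long mask;
    const char   *name;
};

/* Assumed Xenomai 2.6 thread state bits (subset), lowest bit first. */
static const struct state_bit state_bits[] = {
    { 0x00000001UL, "XNSUSP"    },  /* suspended */
    { 0x00000002UL, "XNPEND"    },  /* pending on a resource/event */
    { 0x00000004UL, "XNDELAY"   },  /* sleeping on a timeout */
    { 0x00000080UL, "XNSTARTED" },
    { 0x00000100UL, "XNMAPPED"  },
    { 0x00000200UL, "XNRELAX"   },  /* relaxed, i.e. in secondary mode */
    { 0x00100000UL, "XNFPU"     },
    { 0x00200000UL, "XNSHADOW"  },
};

/* Write the names of all known bits set in status into buf, joined by
 * " | ". Returns the number of characters written (excluding the NUL). */
size_t decode_status(unsigned long status, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(state_bits) / sizeof(state_bits[0]); i++) {
        int n;

        if (!(status & state_bits[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " | " : "", state_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                  /* output truncated, stop here */
        used += (size_t)n;
    }
    return used;
}
```

Under these assumptions, 00300184 differs from the normal 00300182 only in carrying XNDELAY instead of XNPEND, while 00300380 has XNRELAX set and no blocking bit at all, which would fit a task that keeps running in secondary mode instead of sleeping.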


* Re: [Xenomai] non-blocking rt_task_suspend(NULL)
@ 2014-05-02 12:13 Petr Cervenka
  2014-05-02 12:30 ` Gilles Chanteperdrix
  2014-05-02 13:16 ` Philippe Gerum
  0 siblings, 2 replies; 27+ messages in thread
From: Petr Cervenka @ 2014-05-02 12:13 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

> From: "Petr Cervenka" <grugh@centrum.cz>
>
> CC: "Xenomai" <xenomai@xenomai.org>
>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>
>> CC: "Xenomai" <xenomai@xenomai.org>
>>On 04/24/2014 05:06 PM, Petr Cervenka wrote:
>>>> From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
>>>>
>>>>> The SIGDEBUG signal was not received. The task status from
>>>>> rt_task_inquire() was 0x300180 or 0x300380 (depending on where
>>>>> the call is placed). When the task is in the "wrong" state, calls
>>>>> to rt_task_sleep(100000) also keep returning -EINTR. Do
>>>>> you have any other idea what to check, or what could make perhaps
>>>>> every Xenomai call fail with -EINTR in one task?
>>>>
>>>> If I had to debug this issue, I would enable the I-pipe tracer and
>>>> trigger a trace freeze when the -EINTR code is received. With
>>>> enough trace points, it should be possible to understand what
>>>> happens.
>>>>
>>> I called a xntrace_user_freeze() immediately when the issue occurs,
>>> but I simply don't understand what is happening there. The trace
>>> output is in the attachment. Could you please help me to understand
>>> it?
>>>
>>> I also got some minor problem with xntrace_user_freeze, because the
>>> linker was not able to find it: asyncwriter.cpp:(.text+0x843):
>>> undefined reference to `xntrace_user_freeze(unsigned long, int)' It
>>> is defined in src/skins/common/trace.c and (should be) contained in
>>> libxenomai.so. But I was not successful and I had to define it myself
>>> (under different name). Version of xenomai is 2.6.3.
>>
>>We see that xnpod_suspend_thread returns immediately, likely because it 
>>has the XNKICKED bit set. Could you add more back trace points? So that 
>>we see what is setting the XNKICKED bit?
>>
>
>I have added (maybe too many) back trace points. But like last time, I'm not able to see (almost) anything in it ;-)
>The previous xnpod_suspend_thread (on line 3713, probably caused by rt_mutex_acquire) seems to be fine (to me ;-) ).
>

Could you please help me analyze what is happening in the trace log?
At the end I only see pieces of the log that already appear somewhere earlier.
For example, lines 3832-3993 are the same as 2901-3062.
Also 3697-3861 and 3065-3229.
Also 3596-3692 and 2935-3058.
Also 3065-3550 and 2402-2887.

There are also suspicious lines with "device_not_available" and "ipipe_handle_exception", but they seem to appear regularly.

Thank you in advance.
Petr



end of thread, other threads:[~2014-05-20 12:54 UTC | newest]

Thread overview: 27+ messages
2014-04-15 12:42 [Xenomai] non-blocking rt_task_suspend(NULL) Petr Cervenka
2014-04-16  9:08 ` Gilles Chanteperdrix
2014-04-16 12:22   ` Petr Cervenka
2014-04-16 12:26     ` Gilles Chanteperdrix
  -- strict thread matches above, loose matches on Subject: below --
2014-04-16 14:20 Petr Cervenka
2014-04-16 14:28 ` Gilles Chanteperdrix
2014-04-16 16:02 Petr Cervenka
2014-04-16 16:17 ` Gilles Chanteperdrix
2014-04-18  8:51   ` Petr Cervenka
2014-04-22 17:20     ` Gilles Chanteperdrix
2014-04-24 15:06       ` Petr Cervenka
2014-04-24 17:53         ` Gilles Chanteperdrix
2014-04-25  8:38           ` Petr Cervenka
2014-05-02 12:13 Petr Cervenka
2014-05-02 12:30 ` Gilles Chanteperdrix
2014-05-02 13:16 ` Philippe Gerum
2014-05-06  8:17   ` Petr Cervenka
2014-05-06  8:39     ` Philippe Gerum
2014-05-06  8:56     ` Philippe Gerum
2014-05-06  9:29       ` Petr Cervenka
2014-05-06 12:57         ` Philippe Gerum
2014-05-07 13:13           ` Petr Cervenka
2014-05-08 15:53             ` Philippe Gerum
2014-05-12 12:37               ` Petr Cervenka
2014-05-12 13:09                 ` Philippe Gerum
2014-05-20 12:27                   ` Petr Cervenka
2014-05-20 12:54                     ` Philippe Gerum
