Re: [Xenomai] Fwd: Re: Problem that the Linux scheduler is not called for some ms

From: Philippe Gerum <rpm@xenomai.org>
To: harald.fessl@sigmatek.at, xenomai@xenomai.org
Subject: Re: [Xenomai] Fwd: Re: Problem that the Linux scheduler is not called for some ms
Date: Fri, 23 Oct 2015 10:02:51 +0200
Message-ID: <5629E9AB.80709@xenomai.org>
In-Reply-To: <5628E3EB.5040109@sigmatek.at>

On 10/22/2015 03:26 PM, Harald Feßl wrote:
> Am 19.10.2015 um 11:37 schrieb Harald Feßl:
>> Am 14.10.2015 um 10:15 schrieb Philippe Gerum:
>>> On 10/14/2015 09:29 AM, Harald Feßl wrote:
>>>> Am 08.10.2015 um 09:29 schrieb Harald Feßl:
>>>>> Am 07.10.2015 um 17:27 schrieb Philippe Gerum:
>>>>>> On 10/07/2015 04:23 PM, Harald Feßl wrote:
>>>>>>> Am 07.10.2015 um 11:07 schrieb Philippe Gerum:
>>>>>>>> On 10/07/2015 11:01 AM, Johann Obermayr wrote:
>>>>>>>>> Am 05.10.2015 um 13:54 schrieb Philippe Gerum:
>>>>>>>>>> On 10/05/2015 12:53 PM, Johann Obermayr wrote:
>>>>>>>>>>> Am 29.09.2015 um 17:56 schrieb Philippe Gerum:
>>>>>>>>>>>> On 09/29/2015 02:38 PM, Johann Obermayr wrote:
>>>>>>>>>>>>> Am 25.09.2015 um 10:44 schrieb Harald Feßl:
>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have done an ipipe trace of some working cycles and one
>>>>>>>>>>>>>> non-working cycle.
>>>>>>>>>>>>>> The trace is stopped after the non-working cycle.
>>>>>>>>>>>>>> I have marked the working cycles in green and the non-working
>>>>>>>>>>>>>> cycle in red in my graphical trace.
>>>>>>>>>>>>>> The ipipe trace and the graphical trace are stopped at the same
>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After the non-working cycle the system works correctly again
>>>>>>>>>>>>>> for some seconds or minutes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the problem is that the migration of the task "cyclic"
>>>>>>>>>>>>>> from Xenomai to Linux sometimes takes a few ms.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Harald
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Harald Fessl
>>>>>>>>>>>>>> Betriebssystem
>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SIGMATEK GmbH & Co KG
>>>>>>>>>>>>>> Sigmatekstraße 1
>>>>>>>>>>>>>> 5112 Lamprechtshausen
>>>>>>>>>>>>>> Österreich / Austria
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tel.:  +43/6274/4321-0
>>>>>>>>>>>>>> Fax:  +43/6274/4321-18
>>>>>>>>>>>>>> E-Mail: harald.fessl@sigmatek.at
>>>>>>>>>>>>>> http://www.sigmatek-automation.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 23.09.2015 um 12:36 schrieb Jan Kiszka:
>>>>>>>>>>>>>>> On 2015-09-23 10:51, Harald Feßl wrote:
>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The linux tasks are not blocked (not all).
>>>>>>>>>>>>>>>> I think the problem is that the Linux scheduler function in
>>>>>>>>>>>>>>>> the kernel is not called for some ms.
>>>>>>>>>>>>>>>> I have also traced the calls to the scheduler function
>>>>>>>>>>>>>>>> "static int __sched __schedule(void)"
>>>>>>>>>>>>>>>> and sometimes, when the described problem occurs, this
>>>>>>>>>>>>>>>> function is not called while no Linux tasks are running.
>>>>>>>>>>>>>>> If no task is runnable, there is also no reason to invoke
>>>>>>>>>>>>>>> schedule.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please post a ftrace log of your system, covering both a working
>>>>>>>>>>>>>>> and a non-working cycle, including cobalt* and at least sched and
>>>>>>>>>>>>>>> irq events.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Philippe and Xenomai forum,
>>>>>>>>>>>>>
>>>>>>>>>>>>> we have some trouble with a Xenomai task (cyclic, prio 30) after
>>>>>>>>>>>>> switching to secondary domain.
>>>>>>>>>>>>> Linux ARM 3.0, Xenomai 2.6.2.1, and CONFIG_XENO_OPT_PRIOCPL=y.
>>>>>>>>>>>> PRIOCPL should be disabled, and all tests redone in this context.
>>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> the result is the same.
>>>>>>>>>>> Sometimes our cyclic task is not scheduled after switching to
>>>>>>>>>>> secondary domain.
>>>>>>>>>>>
>>>>>>>>>> I just noticed you were using 2.6.2.1. Several bugs in the domain
>>>>>>>>>> switch mechanism have been fixed between that release and 2.6.4,
>>>>>>>>>> and the latter still suffers from a recent SMP rescheduling issue
>>>>>>>>>> already fixed in the 2.6.x maintenance branch.
>>>>>>>>>>
>>>>>>>>>> You should try running your code on top of that branch before diving
>>>>>>>>>> any deeper; I suspect you might be facing a bug that has been fixed
>>>>>>>>>> already.
>>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> we have tested it with Linux 3.10.53 (Freescale) and Xenomai 2.6.4,
>>>>>>>>> but we see the same problem.
>>>>>>>>>
>>>>>>>> Ok, so please send a trace freeze with that configuration
>>>>>>>> illustrating the problem, with PRIOCPL disabled. The previous traces
>>>>>>>> you sent include RPI noise, which makes their interpretation
>>>>>>>> uncertain.
>>>>>>>>
>>>>>>> Hello Philippe
>>>>>>>
>>>>>>> I am working with Johann on the same problem.
>>>>>>> We don't know what you mean by RPI noise.
>>>>>>> Is there a kernel switch to turn off some trace records?
>>>>>> I mean the traces generated by the PRIOCPL option. This one needs to
>>>>>> be disabled.
>>>>> Ok, attached is an ipipe trace without PRIOCPL.
>>>>> The trace was stopped after the "cyclic" task was not called for more
>>>>> than 3 ms.
>>>>> The configuration is still the same as Johann described.
>>>>>
>>>> Hello Philippe
>>>>
>>>> Have you seen any problems in the ipipe trace I sent last week?
>>>>
>>> From those traces, the cyclic task is waiting for an event to happen
>>> before your watchdog pulls the brake:
>>>
>>> :   + func                -256      0.743  rtdm_event_wait+0x14 (lrtdrv_timing_wait+0xa0 [sigmatek_lrt])
>>> :   + func                -255+   1.193 rtdm_event_timedwait+0x14 (rtdm_event_wait+0x2c)
>>>
>>> Looking at the timestamps, I can't see any 3 ms stall period for that
>>> task; at the very least, it still runs a few hundred µs before your
>>> instrumentation code triggers an inactivity timeout.
>>>
>>> However, that cyclic task invoked a secondary mode call earlier from its
>>> processing loop, which caused a relax:
>>>
>>> :   +*func                -892+   1.083  hisyscall_event+0x14 (ipipe_syscall_hook+0x80)
>>> :   +*func                -891      0.843  xnshadow_relax+0x14 (hisyscall_event+0x238)
>>>
>>> which eventually ended by a switch back to primary mode:
>>>
>>> :|  *+func                -282+   1.130 xnpod_resume_thread+0x14 (gatekeeper_thread+0x208)
>>> :|  *+[  589] cyclic: 30  -281+   1.403 xnpod_resume_thread+0xe8 (gatekeeper_thread+0x208)
>>> :|  *+func                -280      0.984 __ipipe_restore_head+0x10 (gatekeeper_thread+0x2c0)
>>> :    +func                -279      0.947  __xnpod_schedule+0x14 (gatekeeper_thread+0x2a8)
>>>
>>> Nothing really bad in the traces at first sight, although your system
>>> seems quite loaded, and some patterns tend to favor thundering herd
>>> effects:
>>>
>>>     +*func                -986      0.889 __rtdm_synch_flush+0x10 (period_update+0xd8 [sigmatek_lrt])
>>> :|  #*func                -985      0.828  xnsynch_flush+0x14 (__rtdm_synch_flush+0xd0)
>>> :|  #*func                -985      0.851 xnpod_resume_thread+0x14 (xnsynch_flush+0x124)
>>> :|  #*[  589] cyclic: 30  -984      0.904 xnpod_resume_thread+0xe8 (xnsynch_flush+0x124)
>>> :|  #*func                -983+   1.120 xnpod_resume_thread+0x14 (xnsynch_flush+0x124)
>>> :|  #*[  587] Loader: 29  -982      0.891 xnpod_resume_thread+0xe8 (xnsynch_flush+0x124)
>>> :|  #*func                -981      0.780 xnpod_resume_thread+0x14 (xnsynch_flush+0x124)
>>> :|  #*[  593] backgrou 0  -980      0.992 xnpod_resume_thread+0xe8 (xnsynch_flush+0x124)
>>>
>> Hallo Philippe
>>
>> Sorry about the last ipipe trace I sent. The trace buffer was
>> configured too small, so the problem was not traced.
>> In this ipipe trace I have seen what occurs, but I don't know why.
>> At line 252366 the cyclic task is suspended:
>> :|  # [  610] cyclic: 30 -9119      0.677  __xnpod_schedule+0x14c (xnpod_suspend_thread+0x434)
>> At line 261530 the cyclic task is resumed again. About 10 ms elapse
>> between these two calls:
>> :|  *+[  610] cyclic: 30  -528+   1.486  xnpod_resume_thread+0xd4 (gatekeeper_thread+0x1d8)
>>
>>
>> Note: You wrote that the system seems to be quite loaded.
>> The CPU load is only very high while ipipe tracing is running.
>>
>> Harald
>>
>>
> Hello Philippe
> 
> Could you possibly take a brief look at the trace I sent on Monday?
> We don't know what could be wrong in that case.
> 

I will definitely look at this early next week. Unfortunately, a 23+ MB
trace file is not something I can have a quick look at, even when only
considering the trace points spanning a 10 ms time frame in microsecond
steps.
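
To narrow a large trace down to the interesting window, something along
these lines may help. It is only a minimal sketch, not a Xenomai tool: the
names EVENT_RE, thread_events and suspend_resume_gaps are made up for
illustration. It assumes each record sits on a single line in the trace
file (not wrapped as in the quoted mail), that a suspension shows up as a
thread record whose caller is xnpod_suspend_thread, and that a wake-up shows
up as a thread record emitted from xnpod_resume_thread, as in the two
records quoted above.

```python
import re

# Matches thread records such as:
# :|  # [  610] cyclic: 30 -9119      0.677  __xnpod_schedule+0x14c (xnpod_suspend_thread+0x434)
# The layout is inferred from the records quoted in this thread (assumption).
EVENT_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<name>\S+?):?\s+(?P<prio>-?\d+)\s+"
    r"(?P<time_us>-?\d+)[+!]?\s+[\d.]+\s+(?P<func>\S+)\s+\((?P<caller>[^\s)]+)"
)


def thread_events(lines, name):
    """Yield (time_us, func, caller) for every record of the named thread."""
    for line in lines:
        m = EVENT_RE.search(line)
        if m and m.group("name") == name:
            yield int(m.group("time_us")), m.group("func"), m.group("caller")


def suspend_resume_gaps(lines, name, min_gap_us=1000):
    """Return (suspended_at, resumed_at, gap_us) tuples of at least min_gap_us."""
    gaps = []
    suspended_at = None
    for t, func, caller in thread_events(lines, name):
        if caller.startswith("xnpod_suspend_thread"):
            suspended_at = t
        elif func.startswith("xnpod_resume_thread") and suspended_at is not None:
            if t - suspended_at >= min_gap_us:
                gaps.append((suspended_at, t, t - suspended_at))
            suspended_at = None
    return gaps


# The two records quoted in this thread, each joined onto a single line.
SAMPLE = [
    ":|  # [  610] cyclic: 30 -9119      0.677  __xnpod_schedule+0x14c (xnpod_suspend_thread+0x434)",
    ":|  *+[  610] cyclic: 30  -528+   1.486  xnpod_resume_thread+0xd4 (gatekeeper_thread+0x1d8)",
]

for start, end, gap in suspend_resume_gaps(SAMPLE, "cyclic"):
    print(f"suspended at {start} us, resumed at {end} us, gap {gap} us")
```

On the two sample records this reports a gap of 8591 us. For a real trace,
the lines would be read from the trace file instead of SAMPLE, and
min_gap_us raised or lowered to match the stall being hunted.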

-- 
Philippe.

Thread overview: 18+ messages
     [not found] <5601617F.3080800@sigmatek.at>
2015-09-22 14:13 ` [Xenomai] Problem that the Linux scheduler is not called for some ms Wolfgang Netbal
2015-09-22 18:44   ` Jan Kiszka
     [not found]     ` <560263A0.4080208@sigmatek.at>
2015-09-23  8:51       ` [Xenomai] Fwd: " Harald Feßl
2015-09-23 10:36         ` Jan Kiszka
2015-09-24 14:46           ` Harald Feßl
2015-09-25  8:44           ` Harald Feßl
2015-09-29 12:38             ` Johann Obermayr
2015-09-29 15:56               ` Philippe Gerum
     [not found]                 ` <561256C6.4070508@sigmatek.at>
     [not found]                   ` <561264DD.2020803@xenomai.org>
2015-10-07  9:01                     ` Johann Obermayr
2015-10-07  9:07                       ` Philippe Gerum
2015-10-07 14:23                         ` Harald Feßl
2015-10-07 15:27                           ` Philippe Gerum
2015-10-08  7:29                             ` Harald Feßl
2015-10-14  7:29                               ` Harald Feßl
2015-10-14  8:15                                 ` Philippe Gerum
2015-10-19  9:37                                   ` Harald Feßl
2015-10-22 13:26                                     ` Harald Feßl
2015-10-23  8:02                                       ` Philippe Gerum [this message]
