From mboxrd@z Thu Jan 1 00:00:00 1970 References: <5601A1A1.2020506@siemens.com> <560263A0.4080208@sigmatek.at> <56026810.8040300@sigmatek.at> <56028093.6050805@siemens.com> <56050985.2060002@sigmatek.at> <560A8632.3020107@sigmatek.at> <560AB4C7.3050508@xenomai.org> <561256C6.4070508@sigmatek.at> <561264DD.2020803@xenomai.org> <5614DF5C.200@sigmatek.at> <5614E0C7.6000309@xenomai.org> <56152AED.6050407@sigmatek.at> <561539EB.9090301@xenomai.org> <56161B52.1070903@sigmatek.at> <561E043C.1020002@sigmatek.at> <561E0F30.4000800@xenomai.org> <5624B9E5.2040305@sigmatek.at> From: =?UTF-8?Q?Harald_Fe=c3=9fl?= Message-ID: <5628E3EB.5040109@sigmatek.at> Date: Thu, 22 Oct 2015 15:26:03 +0200 MIME-Version: 1.0 In-Reply-To: <5624B9E5.2040305@sigmatek.at> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Subject: Re: [Xenomai] Fwd: Re: Problem that the Linux scheduler is not called for some ms Reply-To: harald.fessl@sigmatek.at List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: rpm@xenomai.org, xenomai@xenomai.org Am 19.10.2015 um 11:37 schrieb Harald Feßl: > Am 14.10.2015 um 10:15 schrieb Philippe Gerum: >> On 10/14/2015 09:29 AM, Harald Feßl wrote: >>> Am 08.10.2015 um 09:29 schrieb Harald Feßl: >>>> Am 07.10.2015 um 17:27 schrieb Philippe Gerum: >>>>> On 10/07/2015 04:23 PM, Harald Feßl wrote: >>>>>> Am 07.10.2015 um 11:07 schrieb Philippe Gerum: >>>>>>> On 10/07/2015 11:01 AM, Johann Obermayr wrote: >>>>>>>> Am 05.10.2015 um 13:54 schrieb Philippe Gerum: >>>>>>>>> On 10/05/2015 12:53 PM, Johann Obermayr wrote: >>>>>>>>>> Am 29.09.2015 um 17:56 schrieb Philippe Gerum: >>>>>>>>>>> On 09/29/2015 02:38 PM, Johann Obermayr wrote: >>>>>>>>>>>> Am 25.09.2015 um 10:44 schrieb Harald Feßl: >>>>>>>>>>>>> Hi >>>>>>>>>>>>> >>>>>>>>>>>>> I have done a ipipe trace for some working and one non >>>>>>>>>>>>> working >>>>>>>>>>>>> cycle. >>>>>>>>>>>>> The trace is stopped after the non working cycle. >>>>>>>>>>>>> I have marked the working cycles with green and the non >>>>>>>>>>>>> working >>>>>>>>>>>>> cycle >>>>>>>>>>>>> with red in my graphical trace. >>>>>>>>>>>>> The ipipe trace and graphical trace are stopped at the same >>>>>>>>>>>>> time. >>>>>>>>>>>>> >>>>>>>>>>>>> After the non working cycle the system is working correct >>>>>>>>>>>>> again for >>>>>>>>>>>>> some seconds or minutes. >>>>>>>>>>>>> >>>>>>>>>>>>> I think the problem is, that the migration of the task >>>>>>>>>>>>> "cyclic" >>>>>>>>>>>>> from >>>>>>>>>>>>> xenomai to linux, needs sometimes some ms. >>>>>>>>>>>>> >>>>>>>>>>>>> Harald >>>>>>>>>>>>> >>>>>>>>>>>>> Harald Fessl >>>>>>>>>>>>> Betriebssystem >>>>>>>>>>>>> ________________________________ >>>>>>>>>>>>> >>>>>>>>>>>>> SIGMATEK GmbH & Co KG >>>>>>>>>>>>> Sigmatekstraße 1 >>>>>>>>>>>>> 5112 Lamprechtshausen >>>>>>>>>>>>> Österreich / Austria >>>>>>>>>>>>> >>>>>>>>>>>>> Tel.: +43/6274/4321-0 >>>>>>>>>>>>> Fax: +43/6274/4321-18 >>>>>>>>>>>>> E-Mail: harald.fessl@sigmatek.at >>>>>>>>>>>>> http://www.sigmatek-automation.com >>>>>>>>>>>>> >>>>>>>>>>>>> ***********************Please >>>>>>>>>>>>> note:************************************ >>>>>>>>>>>>> This email and all attachments are confidential and intended >>>>>>>>>>>>> solely for >>>>>>>>>>>>> the person or entity to whom it is addressed. If you are >>>>>>>>>>>>> not the >>>>>>>>>>>>> named >>>>>>>>>>>>> addressee you must not make this email and all attachments >>>>>>>>>>>>> accessible >>>>>>>>>>>>> to any other person. If you have received this email in error >>>>>>>>>>>>> please >>>>>>>>>>>>> delete it together with all attachments. >>>>>>>>>>>>> *********************************************************************** >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 23.09.2015 um 12:36 schrieb Jan Kiszka: >>>>>>>>>>>>>> On 2015-09-23 10:51, Harald Feßl wrote: >>>>>>>>>>>>>>> Hi >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The linux tasks are not blocked (not all). >>>>>>>>>>>>>>> I think the problem is , that the linux scheduler function >>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>> kernel >>>>>>>>>>>>>>> is not called for some ms. >>>>>>>>>>>>>>> I have also traced the calls to the scheduler function >>>>>>>>>>>>>>> "static int __sched __schedule(void)" >>>>>>>>>>>>>>> and sometimes when the decribed problem occur this >>>>>>>>>>>>>>> function is >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>> called while no linux task are running. >>>>>>>>>>>>>> If no task is runnable, there is also no reason to invoke >>>>>>>>>>>>>> schedule. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please post a ftrace log of your system, covering both a >>>>>>>>>>>>>> working >>>>>>>>>>>>>> and a >>>>>>>>>>>>>> non-working cycle, including cobalt* and at least sched >>>>>>>>>>>>>> and irq >>>>>>>>>>>>>> events. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan >>>>>>>>>>>>>> >>>>>>>>>>>> Hello Philippe and Xenomai forum, >>>>>>>>>>>> >>>>>>>>>>>> we have some trouble with a xenomai task (cyclic with prio 30) >>>>>>>>>>>> after >>>>>>>>>>>> switching to secondary domain. >>>>>>>>>>>> Linux ARM 3.0, Xenomai 2.6.2.1, and CONFIG_XENO_OPT_PRIOCPL=y. >>>>>>>>>>> PRIOCPL should be disabled, and all tests redone in this >>>>>>>>>>> context. >>>>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> the result is the same. >>>>>>>>>> Some time your cyclic task will not schedule after switching to >>>>>>>>>> secondary domain. >>>>>>>>>> >>>>>>>>> I just noticed you were using 2.6.2.1. Several bugs in the domain >>>>>>>>> switch >>>>>>>>> mechanism have been fixed since then until 2.6.4, and the latter >>>>>>>>> still >>>>>>>>> suffers a recently SMP rescheduling issue already fixed in the >>>>>>>>> 2.6.x >>>>>>>>> maintenance branch. >>>>>>>>> >>>>>>>>> You should try running your code on top of that branch before >>>>>>>>> diving >>>>>>>>> any >>>>>>>>> deeper, I suspect you might be facing a bug that has been fixed >>>>>>>>> already. >>>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> we have test it with Linux 3.10.53 (Freescale) and Xenomai 2.6.4. >>>>>>>> But we see the same problem. >>>>>>>> >>>>>>> Ok, so please send a trace freeze with that configuration >>>>>>> illustrating >>>>>>> the problem, with PRIOCPL disabled. The previous traces you sent >>>>>>> include >>>>>>> RPI noise, which makes their interpretation uncertain. >>>>>>> >>>>>> Hello Philippe >>>>>> >>>>>> I am working with Johann at the same problem. >>>>>> We don't know what you mean with RPI noise. >>>>>> Is there a kernel switch to turn of some trace records. >>>>> I mean the traces generated by the PRIOCPL option. This one needs >>>>> to be >>>>> disabled. >>>> Ok, attached there is a ipipe trace without PRIOCPL. >>>> The trace was stopped after the "cyclic" task was not called for more >>>> than 3 ms. >>>> The configuration is still the same as Johann has described. >>>> >>> Hello Philippe >>> >>> Have you seen any problems in our ipipe trace, which I sent last week ? >>> >> >From those traces, the cyclic task is waiting for an event to happen >> before your watchdog pulls the break: >> >> : + func -256 0.743 rtdm_event_wait+0x14 >> (lrtdrv_timing_wait+0xa0 [sigmatek_lrt]) >> : + func -255+ 1.193 rtdm_event_timedwait+0x14 >> (rtdm_event_wait+0x2c) >> >> Looking at the timestamps, I can't see any 3 ms stall period for that >> task, at the very least it still happens to run a few hundreds of µs >> before your instrumentation code triggers an inactivity timeout. >> >> However, that cyclic task invoked a secondary mode call earlier from its >> processing loop, which caused a relax: >> >> : +*func -892+ 1.083 hisyscall_event+0x14 >> (ipipe_syscall_hook+0x80) >> : +*func -891 0.843 xnshadow_relax+0x14 >> (hisyscall_event+0x238) >> >> which eventually ended by a switch back to primary mode: >> >> :| *+func -282+ 1.130 xnpod_resume_thread+0x14 >> (gatekeeper_thread+0x208) >> :| *+[ 589] cyclic: 30 -281+ 1.403 xnpod_resume_thread+0xe8 >> (gatekeeper_thread+0x208) >> :| *+func -280 0.984 __ipipe_restore_head+0x10 >> (gatekeeper_thread+0x2c0) >> : +func -279 0.947 __xnpod_schedule+0x14 >> (gatekeeper_thread+0x2a8) >> >> Nothing really bad from the traces at first sight, although your system >> seems quite loaded, and some patterns tend to favor thundering herds >> effect: >> >> +*func -986 0.889 __rtdm_synch_flush+0x10 >> (period_update+0xd8 [sigmatek_lrt]) >> :| #*func -985 0.828 xnsynch_flush+0x14 >> (__rtdm_synch_flush+0xd0) >> :| #*func -985 0.851 xnpod_resume_thread+0x14 >> (xnsynch_flush+0x124) >> :| #*[ 589] cyclic: 30 -984 0.904 xnpod_resume_thread+0xe8 >> (xnsynch_flush+0x124) >> :| #*func -983+ 1.120 xnpod_resume_thread+0x14 >> (xnsynch_flush+0x124) >> :| #*[ 587] Loader: 29 -982 0.891 xnpod_resume_thread+0xe8 >> (xnsynch_flush+0x124) >> :| #*func -981 0.780 xnpod_resume_thread+0x14 >> (xnsynch_flush+0x124) >> :| #*[ 593] backgrou 0 -980 0.992 xnpod_resume_thread+0xe8 >> (xnsynch_flush+0x124) >> > Hallo Philippe > > Sorry about the last ipipe trace I sent. The trace buffer was > configured to small and so the problem was not traced. > In this ipipe trace I have seen what occur, but I don't know why. > At line 252366 the cyclic task will suspended. > :| # [ 610] cyclic: 30 -9119 0.677 __xnpod_schedule+0x14c > (xnpod_suspend_thread+0x434) > At line 261530 the cyclic task will resumed again. Between these two > calls are about 10ms. > :| *+[ 610] cyclic: 30 -528+ 1.486 xnpod_resume_thread+0xd4 > (gatekeeper_thread+0x1d8) > > > Note: You wrote the system seems to be very loaded. > The cpu load is only very high while the ipipe tracing is running. > > Harald > > Hello Philippe Is it possible that you can have a short view to the trace I have sent on monday. We dont know what can be wrong in that case. Harald