References: <574D9B03.8080706@sigmatek.at> <20160531141646.GG5951@hermes.click-hack.org> <574EE886.2020907@sigmatek.at> <20160601141238.GC14103@hermes.click-hack.org> <574FEB2D.5010509@sigmatek.at> <20160602082318.GB1801@hermes.click-hack.org> <5755204C.6090701@sigmatek.at> <20160606153545.GA376@hermes.click-hack.org> <5756D673.4080408@sigmatek.at> <20160607170050.GA13922@hermes.click-hack.org>
From: Wolfgang Netbal
Message-ID: <57714C60.4070407@sigmatek.at>
Date: Mon, 27 Jun 2016 17:55:12 +0200
MIME-Version: 1.0
In-Reply-To: <20160607170050.GA13922@hermes.click-hack.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Subject: Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
Reply-To: wolfgang.netbal@sigmatek.at
List-Id: Discussions about the Xenomai project
To: xenomai@xenomai.org

On 2016-06-07 at 19:00, Gilles Chanteperdrix wrote:
> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>
>> On 2016-06-06 at 17:35, Gilles Chanteperdrix wrote:
>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>> On 2016-06-02 at 10:23, Gilles Chanteperdrix wrote:
>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>> On 2016-06-01 at 16:12, Gilles Chanteperdrix wrote:
>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>> On 2016-05-31 at 16:16, Gilles Chanteperdrix wrote:
>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Dear all,
>>>>>>>>>>
>>>>>>>>>> We have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>> is now up and running and is stable. Unfortunately, we see a
>>>>>>>>>> difference in performance.
>>>>>>>>>> Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>
>>>>>>>>>> At the moment it looks like XENOMAI 2.6.4 calls
>>>>>>>>>> xnpod_schedule_handler much more often than XENOMAI 2.6.2.1 did in our
>>>>>>>>>> old system. Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>> XENOMAI task with priority = 95.
>> As I wrote above, I see IRQ 1037, handled by rthal_apc_handler(), and
>> IRQ 1038, handled by xnpod_schedule_handler(), while my real-time task
>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts except the
>> ones sent by my board via GPIOs. The virtual interrupts are assigned to
>> both Xenomai and Linux there as well, but I did not see a handler
>> installed.
>> I am pretty sure that these interrupts are slowing down my system, but
>> where do they come from?
>> Why don't I see them on kernel 3.0.43 with Xenomai 2.6.4?
>> How long do they take to process?
> What do you mean, you do not see them? If you are talking about the
> rescheduling IPI, it used not to be bound to a virq (so it would
> have a different irq number on Cortex-A9, something between 0 and 31,
> that would not show in the usual /proc files); I wonder whether 3.0 is
> before or after that change. Do you not see them in /proc at all, or do
> you see them but their count does not increase?

Sorry for the long delay; we ran a lot of tests to find out what could
be the reason for the performance difference.

If I run cat /proc/ipipe/Xenomai, I do not see the IRQ handler assigned to
the virtual IRQ on kernel 3.0.43, but it looks like that is an issue with
that kernel.

> As for where they come from, this is not a mystery: the reschedule
> IPI is triggered when code on one cpu changes the scheduler state
> (wakes up a thread, for instance) on another cpu. If you want to
> avoid it, do not do that.
> That means: do not share a mutex between
> threads running on different cpus, make sure timers run on the
> same cpu as the thread they signal, etc.
>
> The APC virq is used to multiplex several services, which you can
> find by grepping the sources for rthal_apc_alloc:
> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>
> It would be interesting to know which of these services is triggered
> a lot. One possibility I see would be root thread priority
> inheritance, so it would be caused by mode switches. This raises the
> question: does your application have threads migrating between primary
> and secondary mode, do you see the count of mode switches increase
> with the kernel change, and do you have root thread priority
> inheritance enabled?

Here is a short summary of our tests and results, with a few questions
at the end :-)

We are using a Freescale i.MX6DL on our hardware and upgraded our
operating system from

  Freescale kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04,
  compiled with GCC 4.7.2

to

  Freescale kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01,
  compiled with GCC 4.8.2.

CONFIG_SMP is set on both kernels.
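Gilles's point that a single APC virq multiplexes several services, and his question of which of them fires most often, can be illustrated with a small user-space model. This is a hypothetical sketch in plain C, not the Xenomai or I-pipe implementation: apc_alloc() only mimics the name/handler/cookie arguments visible in the rthal_apc_alloc() calls quoted above, and the per-service hit counters stand in for the kind of instrumentation one might add to find the busiest service.

```c
#include <stddef.h>

/* Hypothetical user-space model of one virq multiplexing several
 * deferred services; this is NOT the Xenomai/I-pipe implementation. */
#define MAX_APCS 8

typedef void (*apc_handler_t)(void *cookie);

struct apc_slot {
	const char *name;        /* e.g. "lostage_handler", "pipe_wakeup" */
	apc_handler_t handler;
	void *cookie;
	unsigned long hits;      /* how many times this service was run */
};

static struct apc_slot apc_table[MAX_APCS];
static unsigned long apc_pending;  /* bit n set: service n must run */
static unsigned long virq_count;   /* how many times the shared virq fired */

/* Register a service; returns its slot number, or -1 if the table is full. */
static int apc_alloc(const char *name, apc_handler_t handler, void *cookie)
{
	for (int n = 0; n < MAX_APCS; n++) {
		if (apc_table[n].handler == NULL) {
			apc_table[n] = (struct apc_slot){ name, handler, cookie, 0 };
			return n;
		}
	}
	return -1;
}

/* Request a service: only mark it pending. In a real system this is
 * where the shared virq would be raised towards the Linux domain. */
static void apc_schedule(int n)
{
	apc_pending |= 1UL << n;
}

/* The single virq handler: runs every pending service, even if several
 * requests were raised before this one interrupt was handled. */
static void apc_virq_handler(void)
{
	virq_count++;
	while (apc_pending != 0) {
		for (int n = 0; n < MAX_APCS; n++) {
			if (apc_pending & (1UL << n)) {
				apc_pending &= ~(1UL << n);
				apc_table[n].hits++;
				apc_table[n].handler(apc_table[n].cookie);
			}
		}
	}
}

/* Slot of the most frequently run service, or -1 if none has run yet. */
static int apc_busiest(void)
{
	int best = -1;

	for (int n = 0; n < MAX_APCS; n++) {
		if (apc_table[n].hits == 0)
			continue;
		if (best < 0 || apc_table[n].hits > apc_table[best].hits)
			best = n;
	}
	return best;
}

/* Example service that just counts its invocations through its cookie. */
static void count_calls(void *cookie)
{
	++*(unsigned long *)cookie;
}
```

For instance, scheduling a "lostage_handler" slot five times with one interrupt each, then scheduling it together with a "pipe_wakeup" slot before a single interrupt, gives six virq hits but seven service runs, and apc_busiest() points at "lostage_handler". The per-service counters, not the virq count alone, answer the question of which service is triggered a lot.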
What we see is that a customer project running in a Xenomai task with
priority 95 takes 40% of the CPU time on kernel 3.0.43 and 47% of the CPU
time on kernel 3.10.53, so the new system needs 7 percentage points more.
Scaled to a 100% CPU load (40/47 is about 0.85), this is a difference of
about 15%.

To find the reason for this difference, we tried to make the new system
faster by exchanging some of its components:

- Changing U-Boot on the new system -> still 7% slower
- Copying kernel 3.0.43 to the new system -> still 7% slower
- Building kernel 3.0.43 with Xenomai 2.6.4 and copying it to the new
  system -> still 7% slower
- Compiling the new system with the old GCC version -> still 7% slower
- Checking the RAM and CPU clock settings -> these are equal

It does not seem to be one of the big components, so we started testing
individual functions such as rt_timer_tsc().

In the following example we stay in the while loop for 800 µs and start
the loop again after a 200 µs delay. The task running this code has
priority 95. Here is a simplified code snippet:

    #include <native/timer.h>

    RTIME start, current, window = rt_timer_ns2tsc(800000); /* 800 µs in TSC ticks */
    unsigned long i = 0;

    start = rt_timer_tsc();
    do {
        current = rt_timer_tsc();
        i++;                       /* count loop iterations within the window */
    } while ((current - start) < window);

On the old system i reaches 1464, on the new system 1392; that is about
5% fewer iterations.

Is it possible that code prefetching behaves differently between the two
kernel versions? Since our customer application is bigger than our test
code, the difference there could be greater.

Any hints as to what could be the reason for this slowdown?

Kind regards
Wolfgang
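The percentages quoted in this mail can be checked with a few lines of C. The helper names below are hypothetical, and the code only does arithmetic on the figures reported above (40% vs. 47% CPU share, 1464 vs. 1392 loop iterations); it does not measure anything on the target.

```c
/* Arithmetic behind the slowdown figures in this mail (no measurement).
 * All helper names are hypothetical, chosen for this illustration. */

/* Additional CPU time for the same work, relative to the old share:
 * (new - old) / old, e.g. 40 % -> 47 % gives 0.175. */
static double extra_cpu_time(double old_share, double new_share)
{
	return (new_share - old_share) / old_share;
}

/* Gap relative to the new share, i.e. with the new load scaled to
 * 100 %: (new - old) / new, e.g. 40 % vs 47 % gives about 0.149. */
static double share_gap_at_full_load(double old_share, double new_share)
{
	return (new_share - old_share) / new_share;
}

/* Fraction of loop iterations lost in the same 800 us window:
 * (old - new) / old, e.g. 1464 vs 1392 gives about 0.049. */
static double iterations_lost(double old_iters, double new_iters)
{
	return (old_iters - new_iters) / old_iters;
}

/* Increase of the average cost of one loop iteration:
 * old / new - 1, e.g. 1464 vs 1392 gives about 0.052. */
static double per_iteration_slowdown(double old_iters, double new_iters)
{
	return old_iters / new_iters - 1.0;
}
```

The 7-point gap therefore reads as about 15% or 17.5% depending on which load is taken as the baseline, and the rt_timer_tsc() loop loses about 4.9% of its iterations, which means each iteration costs about 5.2% more time on the new system.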