From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <545FA90B.4040407@free.fr>
Date: Sun, 09 Nov 2014 18:48:59 +0100
From: Thierry Bultel
MIME-Version: 1.0
References: <20141107095222.GD6724@sisyphus.hd.free.fr>
 <586279251.109308096.1415364479248.JavaMail.root@zimbra90-e16.priv.proxad.net>
 <20141107195807.GD17476@sisyphus.hd.free.fr>
In-Reply-To: <20141107195807.GD17476@sisyphus.hd.free.fr>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
List-Id: Discussions about the Xenomai project
To: Gilles Chanteperdrix
Cc: nicolas Mabire, xenomai@xenomai.org

On 07/11/2014 20:58, Gilles Chanteperdrix wrote:
> On Fri, Nov 07, 2014 at 01:47:59PM +0100, tbultel@free.fr wrote:
>>
>>
>> ----- Original message -----
>>> From: "Gilles Chanteperdrix"
>>> To: tbultel@free.fr
>>> Cc: xenomai@xenomai.org, "Lennart Sorensen"
>>> Sent: Friday, 7 November 2014 10:52:22
>>> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
>>>
>>> On Fri, Nov 07, 2014 at 10:48:43AM +0100, tbultel@free.fr wrote:
>>>>
>>>>
>>>> ----- Original message -----
>>>>> From: "Gilles Chanteperdrix"
>>>>> To: "Lennart Sorensen"
>>>>> Cc: tbultel@free.fr, xenomai@xenomai.org
>>>>> Sent: Thursday, 6 November 2014 17:08:21
>>>>> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
>>>>> adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
>>>>>
>>>>> On Thu, Nov 06, 2014 at 11:04:57AM -0500, Lennart Sorensen wrote:
>>>>>> On Thu, Nov 06, 2014 at 03:41:47PM +0100, tbultel@free.fr wrote:
>>>>>>> Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled.
>>>>>>> It is -not- enabled in the evaluation kernel that is provided
>>>>>>> by the manufacturer.
>>>>>>> That erratum is said to be for CPU revs < r2p0
>>>>>>>
>>>>>>> I am a little bit puzzled about the naming conventions for
>>>>>>> the CPU revision,
>>>>>>> uboot says rev1.2, the kernel says
>>>>>>>
>>>>>>> Processor : ARMv7 Processor rev 10 (v7l)
>>>>>>> ...
>>>>>>>
>>>>>>> CPU implementer : 0x41
>>>>>>> CPU architecture: 7
>>>>>>> CPU variant : 0x2
>>>>>>> CPU part : 0xc09
>>>>>>> CPU revision : 10
>>>>>>>
>>>>>>> how can I do the matching ?
>>>>>>>
>>>>>>> Meanwhile, we noticed that compared to the evaluation kernel,
>>>>>>> we were missing
>>>>>>> CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419
>>>>>>>
>>>>>>> Adding them helps a lot, but the freeze still happens on one
>>>>>>> machine
>>>>>>>
>>>>>>> We are currently trying with 754322 + 769419 + 754327 + 5
>>>>>>> nops in fec ...
>>>>>>> but not sure if we need 754327.
>>>>>>>
>>>>>>> Regards
>>>>>>> Thierry
>>>>>>>
>>>>>>> PS: Regarding the thermal issue, we have changed our
>>>>>>> supplier, we now have
>>>>>>> a heat sink that is big enough (it is the AMOS820 from Via
>>>>>>> Embedded)
>>>>>>
>>>>>> I am not sure how to read the A9 revision. I have this on a
>>>>>> system:
>>>>>
>>>>> I have found the method I gave in ARM documentation. I am pretty
>>>>> sure this is how it works.
>>>>>
>>>>> --
>>>>> Gilles.
>>>>>
>>>>
>>>>
>>>> Gilles,
>>>> we agree that as we have a r2p10, the 754327 does not apply.
>>>> Thus the only errata I was missing are CONFIG_ARM_ERRATA_754322
>>>> and CONFIG_PL310_ERRATA_769419,
>>>> which are in ./arch/arm/configs/imx6_defconfig
>>>> They are now part of my config.
>>>> Unfortunately, the network stress test still makes the freeze
>>>> happen with CONFIG_IPIPE enabled
>>>>
>>>> How can that freeze happen only on -some- machines (they all
>>>> have the same CPU rev),
>>>> and how can the time they stay up depend on the machine ?
>>>> If the freeze was reproducible without CONFIG_IPIPE, we could
>>>> easily say that it is simply
>>>> a hardware bug, but unfortunately this is not the case.
>>>>
>>>> New information: the machine that freezes the most also freezes
>>>> with the FEC ethernet unplugged.
>>>> All these machines work fine with CONFIG_IPIPE disabled.
>>>
>>> Well, you told me that you had freezes because of the mb() in the
>>> FEC code, all that I can tell you is that the bug I know related to
>>> mb() would probably be fixed by adding nops before the mb(). It is
>>> not clear to me, have you tried that?
>>>
>>> --
>>
>> The freeze happens faster with the mb(), yes.
>> But it is still there without it, or when adding the 5 nops before.
>> And if the ethernet is unplugged (which normally means that the code
>> we mention is not called), we have the bug, too.
>> I have just made a test with a USB ethernet adapter and it freezes the same way.
>
> When the freeze happens, is the timer still ticking?

I will attempt to do some LED debugging by next week, because I do not
have a JTAG probe yet.

> Have you
> checked that all the tricks in the idle function are disabled, in
> particular the switch to timer broadcast mode?

Could you please be more specific?
Since you mention timer broadcast: I do not know if you remember, but in
a previous mail I said that I saw strange behaviour in the statistics of
/proc/interrupts. The 'iMX Timer Tick' interrupt, which is executed on
CPU0, increases its counter very slowly, less than 1 per minute.
We did not pay too much attention to it.
I see in /proc/timer_list that its handler is tick_handle_oneshot_broadcast.
Could that be related?

Also, one of our applications runs in the Linux domain (not linked with
Xenomai) and uses clock_nanosleep to be woken up every 30 ms.
We initially used CONFIG_NO_HZ, and found out that it sometimes took up
to 200 ms to be woken up.
LTTng showed that it was not a preemption, and that the thread was really
sched-switched, but that it got the CPU only after the next incoming
interrupt, for instance a network one.
Again, I probably should have looked deeper to understand why, but the
workaround of using CONFIG_HZ=1000 did it (which I guess hides the bug,
but means that the thread only loses 1 ms in the worst case).
I wonder whether that bug could be another symptom or not.

> Have you tried
> enabling I-pipe and xenomai debugs?
>

This is my next step.

Regards
Thierry
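
For reference, the revision matching asked about earlier in the thread can be sketched in C. This is only an illustration under assumptions: it takes the "CPU variant" and "CPU revision" fields printed by /proc/cpuinfo as the N and M of ARM's rNpM notation (the variant and revision fields of the Main ID Register), and the function names format_core_revision() and revision_is_before() are hypothetical helpers, not kernel or Xenomai APIs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Format the "CPU variant" and "CPU revision" values from /proc/cpuinfo
 * in ARM's rNpM notation: variant gives N, revision gives M.
 * Both fields are 4 bits wide in the Main ID Register, hence the masks.
 * Hypothetical helper, for illustration only. */
static int format_core_revision(unsigned variant, unsigned revision,
                                char *buf, size_t len)
{
    return snprintf(buf, len, "r%up%u", variant & 0xfu, revision & 0xfu);
}

/* Return true when the core (variant, revision) is strictly older than
 * the release (limit_variant, limit_revision). An erratum documented as
 * "prior to r2p0" would be checked with limit_variant = 2 and
 * limit_revision = 0. Hypothetical helper, for illustration only. */
static bool revision_is_before(unsigned variant, unsigned revision,
                               unsigned limit_variant,
                               unsigned limit_revision)
{
    if (variant != limit_variant)
        return variant < limit_variant;
    return revision < limit_revision;
}
```

With the values quoted in the thread (CPU variant 0x2, CPU revision 10), format_core_revision() writes "r2p10" into the buffer, and revision_is_before(0x2, 10, 2, 0) returns false, which matches the conclusion above that an erratum limited to revisions before r2p0 does not apply to this core.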