From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Dooks
Date: Fri, 14 Mar 2014 14:45:14 +0000
Subject: Re: [PATCH] [RFC] ARM: shmobile: koelsch-reference: Work around core clock issues
Message-Id: <532315FA.3@codethink.co.uk>
List-Id:
References: <1394720970-4749-1-git-send-email-geert@linux-m68k.org>
In-Reply-To: <1394720970-4749-1-git-send-email-geert@linux-m68k.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-sh@vger.kernel.org

On 14/03/14 14:43, Laurent Pinchart wrote:
> Hi Geert,
>
> On Friday 14 March 2014 14:02:59 Geert Uytterhoeven wrote:
>> On Fri, Mar 14, 2014 at 1:43 PM, Laurent Pinchart wrote:
>>>>>> This should do the job, but as you mentioned, it's a crude hack. As
>>>>>> we're targeting v3.16, is there a chance we could fix the problem
>>>>>> properly instead ?
>>>>
>>>> Of course the goal is to fix it for real, so the crude hack will no
>>>> longer be needed. But for now, it looks like a good short-term
>>>> workaround.
>>>>
>>>>> The best fix would be to re-enable the PM and find out what is
>>>>
>>>> Sure, but in a multiplatform-aware way.
>>>
>>> Of course. Are you working on that, or should I give it a try ? Would you
>>> like to discuss this ?
>>
>> Yes, I plan to work on this. But all input is welcome, of course.
>
> Any opinion on https://lkml.org/lkml/2014/1/31/290 ?
>
>>>>> actually causing the external abort. However currently there is
>>>>> no information in the manuals about anything we could find out from
>>>>> the AXI busses as to what the source actually is.
>>>>
>>>> I re-applied your patch "ARM: shmobile: compile drivers/sh for
>>>> CONFIG_ARCH_SHMOBILE_MULTI", and surprisingly, I no longer get the
>>>> external abort.
>>>>
>>>> Some experimenting revealed it's due to the "ether" clock in the
>>>> clk_enables[] array. As long as that's enabled early, the system seems to
>>>> boot fine with your patch.
>>>
>>> At what point do you get the external abort without the ether clock
>>> workaround ?
>>
>> When userspace starts:
>>
>> Freeing unused kernel memory: 204K (c042b000 - c045e000)
>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
>>
>> CPU: 1 PID: 1 Comm: init Not tainted
>> 3.14.0-rc6-koelsch-reference-00362-gf29bb90d4995-dirty #164
>> Backtrace:
>> [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
>> r6:eec799c0 r5:ee49ce40 r4:00000000 r3:00000204
>> [] (show_stack) from [] (dump_stack+0x70/0x8c)
>> [] (dump_stack) from [] (panic+0x90/0x1ec)
>> r4:eec799c0 r3:00000001
>> [] (panic) from [] (do_exit+0x494/0x8bc)
>> r3:eec73dc0 r2:00000000 r1:00000007 r0:c03d33ac
>> r7:ee49ce78
>> [] (do_exit) from [] (do_group_exit+0xa4/0xd0)
>> r7:ee431040
>> [] (do_group_exit) from []
>> (get_signal_to_deliver+0x4bc/0x520)
>> r7:ee431040 r6:eec7bee4 r5:eec7a000 r4:01060013
>> [] (get_signal_to_deliver) from []
>> (do_signal+0xa8/0x3c0) r10:00000000 r9:eec7a000 r8:00000000 r7:eec7a000
>> r6:00000000 r5:00000000 r4:eec7bfb0
>> [] (do_signal) from [] (do_work_pending+0x54/0x9c)
>> r10:00000000 r8:00000000 r7:00000000 r6:00000000 r5:eec7a000 r4:eec7bfb0
>> [] (do_work_pending) from [] (work_pending+0xc/0x20)
>> r6:ffffffff r5:00000030 r4:b6ef0bc0 r3:eec799c0
>> CPU0: stopping
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 3.14.0-rc6-koelsch-reference-00362-gf29bb90d4995-dirty #164
>> Backtrace:
>> [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
>> r6:c0468844 r5:00000000 r4:00000000 r3:00200000
>> [] (show_stack) from [] (dump_stack+0x70/0x8c)
>> [] (dump_stack) from [] (handle_IPI+0xcc/0x164)
>> r4:c0484b98 r3:c046eae0
>> [] (handle_IPI) from [] (gic_handle_irq+0x58/0x60)
>> r5:c0461f18 r4:f0002000
>> [] (gic_handle_irq) from [] (__irq_svc+0x40/0x50)
>> Exception stack(0xc0461f18 to 0xc0461f60)
>> 1f00: ef1ed698
>> 00000000 1f20: 006e076b 00000000 c045d698 2ed90000 60000113 ef1ed698
>> c0468380 413fc0f2 1f40: ef7fccc0 c0461f8c c0461f60 c0461f60 c0067e14
>> c0067e18 60000113 ffffffff r6:ffffffff r5:60000113 r4:c0067e18 r3:c0067e14
>> [] (rcu_idle_exit) from []
>> (cpu_startup_entry+0xe4/0x118) r8:c0468380 r7:c03357f4 r6:c0468454
>> r5:c0484780 r4:c0460000
>> [] (cpu_startup_entry) from [] (rest_init+0x68/0x80)
>> r7:c0454d90 r3:00000000
>> [] (rest_init) from [] (start_kernel+0x2fc/0x358)
>> [] (start_kernel) from [<40008074>] (0x40008074)
>
> As the external abort is imprecise the backtrace is pretty useless :-/ All we
> can tell from the DFSR value 0x1406 is that the fault was generated by a read
> access not related to a cache maintenance operation. Bit 12 is an
> implementation defined bit that might provide more information, but it isn't
> documented in the R8A7791 datasheet.
>
> Could you try to enable LPAE ? The DFSR format is slightly different in that
> case, it may provide more information.
>
>> Difference in clk_summary output between working and failed case just before
>> "Freeing unused kernel memory" is:
>>
>> - ether 2 2 65000000 0
>> + ether 1 1 65000000 0
>>
>> so at that point the clock is still enabled.
>>
>> You once mentioned that if you try to access a module's registers while its
>> MSTP clock is not running you may get an exception (on some SoCs).
>> Is this such an exception?
>
> Yes, those are the same symptoms.
>
>> Note that I never got exceptions when accessing QSPI or MSIOF on r8a7791
>> with the respective MSTP clocks disabled. I also didn't get one when
>> Ethernet stopped working after the is_enabled() MSTP fix. That was before
>> NFS root was mounted, though.
>>
>> Running actual executables after mounting is different. Demand paging is
>> involved there. Perhaps there's a bug somewhere in nfs root mmap() or in the
>> Ethernet driver, not propagating the errors due to the lost Ethernet clock,
>> so /sbin/init starts running an uninitialized page?
>
> I don't think so. According to
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211h/Caccdbdh.html,
> external aborts are errors "that occur in the memory system other than those
> that are detected by an MMU." That looks really device-related to me.

I've also had these when trying to access a bad address for one
of the AXI busses (IICC).

-- 
Ben Dooks                               http://www.codethink.co.uk/
Senior Engineer                         Codethink - Providing Genius
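
For reference, the reading of the DFSR value 0x1406 quoted above can be
checked with a small decoder. This is only a sketch: it assumes the ARMv7
short-descriptor DFSR layout (LPAE disabled, as in the failing boot), and the
function and field names are made up for illustration, not taken from the
kernel.

```python
# Sketch only: decode an ARMv7 DFSR value in the short-descriptor format.
# Assumed bit layout: FS[3:0] = bits 3:0, domain = bits 7:4, LPAE = bit 9,
# FS[4] = bit 10, WnR = bit 11, ExT = bit 12, CM = bit 13.
# decode_dfsr() and its dictionary keys are hypothetical names.

def decode_dfsr(dfsr: int) -> dict:
    # The 5-bit fault status is split: bit 10 supplies FS[4].
    fs = (dfsr & 0xF) | ((dfsr >> 6) & 0x10)
    return {
        "fs": fs,
        "domain": (dfsr >> 4) & 0xF,
        "lpae": bool(dfsr & (1 << 9)),               # long-descriptor format in use
        "write": bool(dfsr & (1 << 11)),             # WnR: set for a write access
        "ext": bool(dfsr & (1 << 12)),               # ExT: implementation defined
        "cache_maintenance": bool(dfsr & (1 << 13)), # CM: cache maintenance fault
    }


if __name__ == "__main__":
    for name, value in decode_dfsr(0x1406).items():
        print(f"{name}: {value}")
```

For 0x1406 this gives fs = 22 (0b10110, the status Linux reports as
"imprecise external abort"), a read access (WnR clear), CM clear, and bit 12
set, which is all the register tells us without SoC documentation for ExT.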