* set_fiq_handler: Bad mode in data abort handler detected
From: Russell King - ARM Linux @ 2014-04-24 10:31 UTC
To: linux-arm-kernel

Please address kernel-related problems to the linux-arm-kernel mailing
list in preference to linux-arm.  Thanks.

On Thu, Apr 24, 2014 at 11:46:15AM +0200, Tim Sander wrote:
> I have installed a FIQ handler with set_fiq_handler on a Xilinx Zynq.
> I had to enable the FIQ symbol in Kconfig for the Zynq as it's not enabled
> by default. As I was not able to boot a mainline kernel, I used the 3.12 kernel
> from the Xilinx repository on GitHub. But as there are no changes in the FIQ handler
> stuff, I guess that does not matter. The Zynq is a dual ARMv7 Cortex-A9.
> The handler works for a random timespan and then I see:

The first rule of FIQs is that they are not permitted to cause any
aborts whatsoever - any aborts can be fatal as they can cause
deadlock.
> Bad mode in data abort handler detected
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> Modules linked in: firq(O) ipv6
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.0-xilinx-dirty #54
> task: c05bd420 ti: c05b2000 task.ti: c05b2000
> PC is at 0xffff1224
> LR is at arch_cpu_idle+0x20/0x2c
> pc : [<ffff1224>] lr : [<c000f344>] psr: 600e01d1
> sp : c05b3f70 ip : 00000000 fp : 00000000
> r10: 00000000 r9 : 413fc090 r8 : c0a264c0
> r7 : c05a7720 r6 : c04080c8 r5 : c05f2500 r4 : c05b2000
> r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c0a299f8
> Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment kernel
> Control: 18c5387d Table: 1ec0404a DAC: 00000015
> Process swapper/0 (pid: 0, stack limit = 0xc05b2240)
> Stack: (0xc05b3f70 to 0xc05b4000)
> 3f60: c0a299f8 00000000 00000000 00000000
> 3f80: c05b2000 c05f2500 c04080c8 c05a7720 c0a264c0 413fc090 00000000 00000000
> 3fa0: 00000000 c05b3f70 c000f344 ffff1224 600e01d1 ffffffff 00000000 c0055fb8
> 3fc0: c040a7b0 c0584a5c ffffffff ffffffff c0584574 00000000 00000000 c05a7720
> 3fe0: 18c5387d c05ba3cc c05a771c c05be440 0000406a 00008074 00000000 00000000
> [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] ( (null))
> Code: e320f000 e320f000 e320f000 eafffffe (e5889000)

The faulting instruction was:

	str	r9, [r8]

However, the register dump above does not include the FIQ banked
registers, so we don't actually know what r8 was.

> My first guess would be that I had a cache page miss in the FIQ handler?

Yes.

> I guess the best way would be putting the FIQ handler in the On Chip
> Memory, but then I would still have the same problem that the code jumping
> to the OCM would have a cache miss?

I'm guessing that the address pointed to by r8 (the timer base) is
ioremapped after other threads are already started?  The problem with
that is other threads won't have the L1 page table pointers for these
mappings - we populate these lazily because trying to do it at
ioremap() time would be extremely painful.
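As a cross-check of the reading of the oops above, here is a small userspace sketch (not kernel code) that decodes the bracketed word from the `Code:` line and the `psr` value. It assumes the A32 immediate-offset LDR/STR encoding and the standard CPSR layout; the struct and function names are made up for this illustration. Fed `e5889000`, it should report a store of r9 to [r8] with a zero offset, and fed `600e01d1`, FIQ mode with IRQs and FIQs masked, agreeing with the `Flags:` line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an A32 single data transfer (LDR/STR) with an immediate offset.
 * Illustrative names; not taken from any kernel header. */
struct ldr_str_insn {
    unsigned cond;   /* bits 31-28, 0xE = always */
    int pre_index;   /* P, bit 24 */
    int add_offset;  /* U, bit 23 */
    int byte;        /* B, bit 22 */
    int writeback;   /* W, bit 21 */
    int is_load;     /* L, bit 20: 1 = LDR, 0 = STR */
    unsigned rn;     /* base register, bits 19-16 */
    unsigned rd;     /* transfer register, bits 15-12 */
    unsigned imm12;  /* offset, bits 11-0 */
};

/* Returns 1 and fills *out when bits 27-25 are 0b010 (LDR/STR with an
 * immediate offset); returns 0 for anything else, e.g. hints or branches. */
int decode_ldr_str_imm(uint32_t insn, struct ldr_str_insn *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    if (((insn >> 25) & 0x7u) != 0x2u)
        return 0;
    out->cond = insn >> 28;
    out->pre_index = (insn >> 24) & 1u;
    out->add_offset = (insn >> 23) & 1u;
    out->byte = (insn >> 22) & 1u;
    out->writeback = (insn >> 21) & 1u;
    out->is_load = (insn >> 20) & 1u;
    out->rn = (insn >> 16) & 0xFu;
    out->rd = (insn >> 12) & 0xFu;
    out->imm12 = insn & 0xFFFu;
    return 1;
}

/* Processor mode from CPSR bits 4-0, named the way the ARM oops prints them. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1B: return "UND_32";
    case 0x1F: return "SYSTEM_32";
    default:   return "unknown";
    }
}

/* I (bit 7) and F (bit 6) set mean IRQs / FIQs are masked. */
int cpsr_irqs_off(uint32_t cpsr) { return (cpsr >> 7) & 1u; }
int cpsr_fiqs_off(uint32_t cpsr) { return (cpsr >> 6) & 1u; }

/* Condition flags as the oops prints them: upper case means set. */
void cpsr_flags(uint32_t cpsr, char buf[5])
{
    buf[0] = (cpsr & (1u << 31)) ? 'N' : 'n';
    buf[1] = (cpsr & (1u << 30)) ? 'Z' : 'z';
    buf[2] = (cpsr & (1u << 29)) ? 'C' : 'c';
    buf[3] = (cpsr & (1u << 28)) ? 'V' : 'v';
    buf[4] = '\0';
}
```

The `e320f000` words in front of the faulting one are NOP hints and `eafffffe` is a branch to itself, so the sketch rejects them as not being LDR/STR.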
What might be possible is to have a function which can be called in
these circumstances which ensures that a kernel address is accessible
to all threads in the system, though while it's doing that, it would
have to stop any fork() or exit() activity to be sure that it updated
every thread.

In years gone by, I'd have recommended that the kernel mappings for
this stuff were done via static mappings, but with DT, that's no
longer acceptable.  So I guess we have a problem...

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
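To make the lazy population described above concrete, here is a toy userspace model, not the kernel's actual code. A per-thread L1 table starts as a copy of the kernel half of a reference table (standing in for init_mm), a later "ioremap" writes only the reference table, and a missing entry is filled in from the reference when an ordinary access faults. The model assumes, as the "Bad mode" oops suggests, that an abort taken in FIQ mode gets no such fixup and is fatal. Table sizes, the kernel/user split and all names are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_L1_ENTRIES 16   /* toy size; a real ARM short-descriptor L1 table has 4096 */
#define TOY_KERNEL_FIRST 8  /* hypothetical split: indices >= 8 are kernel space */

struct toy_pgd {
    unsigned long l1[TOY_L1_ENTRIES];   /* 0 = no entry */
};

/* Reference table, playing the role of init_mm's page tables. */
struct toy_pgd toy_init_pgd;

/* New address space: user half empty, kernel half copied from the reference,
 * as happens when a thread or process is created. */
void toy_pgd_alloc(struct toy_pgd *pgd)
{
    memset(pgd, 0, sizeof(*pgd));
    memcpy(&pgd->l1[TOY_KERNEL_FIRST], &toy_init_pgd.l1[TOY_KERNEL_FIRST],
           (TOY_L1_ENTRIES - TOY_KERNEL_FIRST) * sizeof(unsigned long));
}

/* Toy ioremap(): only the reference table is updated. */
void toy_ioremap(int idx, unsigned long entry)
{
    assert(idx >= TOY_KERNEL_FIRST && idx < TOY_L1_ENTRIES);
    toy_init_pgd.l1[idx] = entry;
}

enum toy_result { TOY_OK, TOY_FIXED_UP, TOY_FATAL };

/* Access kernel index idx through pgd. A missing entry is a translation
 * fault; outside FIQ mode it is repaired by copying from the reference. */
enum toy_result toy_access(struct toy_pgd *pgd, int idx, bool in_fiq)
{
    assert(idx >= TOY_KERNEL_FIRST && idx < TOY_L1_ENTRIES);
    if (pgd->l1[idx] != 0)
        return TOY_OK;
    if (in_fiq || toy_init_pgd.l1[idx] == 0)
        return TOY_FATAL;   /* no fixup from FIQ, or genuinely unmapped */
    pgd->l1[idx] = toy_init_pgd.l1[idx];
    return TOY_FIXED_UP;
}
```

In this model a mapping made before a table is allocated is always present, while one made afterwards fails in FIQ mode until that particular table has taken an ordinary fault on the address. The outcome therefore depends on which table happens to be live when the FIQ arrives, which is consistent with the intermittent failures reported in this thread.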
* set_fiq_handler: Bad mode in data abort handler detected
From: Tim Sander @ 2014-04-24 11:57 UTC
To: linux-arm-kernel

Hi Russell

On Thursday, 24 April 2014, 11:31:37, Russell King - ARM Linux wrote:
> Please address kernel-related problems to the linux-arm-kernel mailing
> list in preference to linux-arm.  Thanks.

Sorry, I thought linux-arm was the right list.

> On Thu, Apr 24, 2014 at 11:46:15AM +0200, Tim Sander wrote:
> > I have installed a FIQ handler with set_fiq_handler on a Xilinx Zynq.
> > I had to enable the FIQ symbol in Kconfig for the Zynq as it's not
> > enabled by default. As I was not able to boot a mainline kernel, I used
> > the 3.12 kernel from the Xilinx repository on GitHub. But as there are no
> > changes in the FIQ handler stuff, I guess that does not matter. The Zynq
> > is a dual ARMv7 Cortex-A9.
> > The handler works for a random timespan and then I see:
>
> The first rule of FIQs is that they are not permitted to cause any
> aborts whatsoever - any aborts can be fatal as they can cause
> deadlock.
>
> > Bad mode in data abort handler detected
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > Modules linked in: firq(O) ipv6
> > CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.0-xilinx-dirty #54
> > task: c05bd420 ti: c05b2000 task.ti: c05b2000
> > PC is at 0xffff1224
> > LR is at arch_cpu_idle+0x20/0x2c
> > pc : [<ffff1224>] lr : [<c000f344>] psr: 600e01d1
> > sp : c05b3f70 ip : 00000000 fp : 00000000
> > r10: 00000000 r9 : 413fc090 r8 : c0a264c0
> > r7 : c05a7720 r6 : c04080c8 r5 : c05f2500 r4 : c05b2000
> > r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c0a299f8
> > Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment kernel
> > Control: 18c5387d Table: 1ec0404a DAC: 00000015
> > Process swapper/0 (pid: 0, stack limit = 0xc05b2240)
> > Stack: (0xc05b3f70 to 0xc05b4000)
> > 3f60: c0a299f8 00000000 00000000 00000000
> > 3f80: c05b2000 c05f2500 c04080c8 c05a7720 c0a264c0 413fc090 00000000 00000000
> > 3fa0: 00000000 c05b3f70 c000f344 ffff1224 600e01d1 ffffffff 00000000 c0055fb8
> > 3fc0: c040a7b0 c0584a5c ffffffff ffffffff c0584574 00000000 00000000 c05a7720
> > 3fe0: 18c5387d c05ba3cc c05a771c c05be440 0000406a 00008074 00000000 00000000
> > [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] ( (null))
> > Code: e320f000 e320f000 e320f000 eafffffe (e5889000)
>
> The faulting instruction was:
>
> 	str	r9, [r8]

r8 indeed points to an ioremapped address.

> However, the register dump above does not include the FIQ banked registers,
> so we don't actually know what r8 was.
>
> > My first guess would be that I had a cache page miss in the FIQ handler?
>
> Yes.
>
> > I guess the best way would be putting the FIQ handler in the On Chip
> > Memory, but then I would still have the same problem that the code jumping
> > to the OCM would have a cache miss?
>
> I'm guessing that the address pointed to by r8 (the timer base) is
> ioremapped after other threads are already started?
Yes: for testing purposes I wrote a kernel module which is insmod'ed into the
kernel. So the ioremap for this address is surely executed after the kernel
threads are started (which I guess is what you mean by other threads).

> The problem with
> that is other threads won't have the L1 page table pointers for these
> mappings - we populate these lazily because trying to do it at
> ioremap() time would be extremely painful.

So the success of the FIQ depends on which kernel thread is running (or, more
precisely, on the L1 page table pointers of that particular thread) when the
FIQ hits?

> What might be possible is to have a function which can be called in
> these circumstances which ensures that a kernel address is accessible
> to all threads in the system, though while it's doing that, it would
> have to stop any fork() or exit() activity to be sure that it updated
> every thread.

Well, as this would be at the time of the FIQ installation, where timing is
not yet critical, that should at least work for this use case.

> In years gone by, I'd have recommended that the kernel mappings for
> this stuff were done via static mappings, but with DT, that's no
> longer acceptable.  So I guess we have a problem...

Oh my, I didn't mean to open a can of worms.

Best regards
Tim
* set_fiq_handler: Bad mode in data abort handler detected
From: Tim Sander @ 2014-04-24 14:33 UTC
To: linux-arm-kernel

Hi Russell and List
<snip>
> In years gone by, I'd have recommended that the kernel mappings for
> this stuff were done via static mappings, but with DT, that's no
> longer acceptable.  So I guess we have a problem...

To verify that your very plausible hypothesis is right, I tried:
timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY); //also tried MT_DEVICE
to map the memory at early boot in "zynq_init_late". But this fails and gives
the following error:

WARNING: CPU: 0 PID: 1 at arch/arm/mm/ioremap.c:301 __arm_ioremap_pfn_caller+0x100/0x184()

which seems to be the only WARN_ON there, and it indicates that the pfn is
invalid. Any hints why this call to __arm_ioremap fails?

I also tried to map with ioremap_nocache during module load, but I guess this
mapping also gets propagated lazily, so it didn't work either.

Best regards
Tim
* set_fiq_handler: Bad mode in data abort handler detected
From: Russell King - ARM Linux @ 2014-04-24 19:01 UTC
To: linux-arm-kernel

On Thu, Apr 24, 2014 at 04:33:38PM +0200, Tim Sander wrote:
> Hi Russell and List
> <snip>
> > In years gone by, I'd have recommended that the kernel mappings for
> > this stuff were done via static mappings, but with DT, that's no
> > longer acceptable.  So I guess we have a problem...
>
> To verify that your very plausible hypothesis is right, I tried:
> timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY); //also tried MT_DEVICE

This isn't going to help.  Any dynamically initialised mapping via any
of the ioremap functions is going to fail for the reason I outlined,
and it doesn't matter what type of mapping you use.  *All* dynamically
created mappings are populated to other threads lazily.

The reason for that is because it's _very_ expensive/racy to walk over
every single thread and update its page tables - Linux years ago used
to do that as standard with ioremap() and similar, and the code was
ripped out after it became too much of a burden.

When I talk about static mappings above, I'm talking about those which
are set up very early in boot via iotable_init().  However, these aren't
permitted with DT anymore.

> to map the memory at early boot in "zynq_init_late".

My kernel doesn't have zynq_init_late()...  I'm guessing that it's hooked
into the .init_late callback, which is certainly too late - this is called
towards the end of driver initialisation, after many threads have already
been spawned.

The places where you are called before any threads have been spawned are
unfortunately places where you can't use ioremap().
At the moment, I don't have an answer to this - the answers I have are
incompatible with the direction that arm-soc people want to go (which is
to have zero static mappings.)
* set_fiq_handler: Bad mode in data abort handler detected
From: Tim Sander @ 2014-04-25 13:36 UTC
To: linux-arm-kernel

Hi Russell and List

Thanks for your feedback!

On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > In years gone by, I'd have recommended that the kernel mappings for
> > > this stuff were done via static mappings, but with DT, that's no
> > > longer acceptable.  So I guess we have a problem...
> >
> > To verify that your very plausible hypothesis is right, I tried:
> > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > //also tried MT_DEVICE
> This isn't going to help.  Any dynamically initialised mapping via any
> of the ioremap functions is going to fail for the reason I outlined,
> and it doesn't matter what type of mapping you use.  *All* dynamically
> created mappings are populated to other threads lazily.

OK, I tried mapping statically at bootup, just to verify and understand the
problem. It seems to help somewhat (probably it does get into more threads),
but it doesn't remedy the problem completely:

static struct map_desc zynq_axi_gp0 __initdata = {
	.virtual = 0xe4000000,	//FIXME just arbitrary, which?
	.pfn = __phys_to_pfn(0x40000000),
	.length = SZ_128M,
	.type = MT_DEVICE,
};

static void __init zynq_axi_gp_init(void)
{
	iotable_init(&zynq_axi_gp0, 1);
	zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
	BUG_ON(!zynq_axi_gp0_base);
}

This was called in the .map_io callback. But it seems even this is too late to
propagate into all threads. Calling it earlier does not work (e.g. .init_early,
.init_timer or .init_irq)...

Thinking about it, if it's truly lazy, even an early initialization does not
help if mapping synchronisation is always done lazily via data abort.
> The reason for that is because it's _very_ expensive/racy to walk over
> every single thread and update its page tables - Linux years ago used
> to do that as standard with ioremap() and similar, and the code was
> ripped out after it became too much of a burden.

It seems as if this was before git times? At least it does not seem to be in
the git repository. Do you have a rough estimate of what year that was, or
which kernel version?

> When I talk about static mappings above, I'm talking about those which
> are set up very early in boot via iotable_init().  However, these aren't
> permitted with DT anymore.

As pointed out above, this call at least boots and works in the sense that I
see the ioremapped virtual address (0xe4000000) being used.

> > to map the memory at early boot in "zynq_init_late".
>
> My kernel doesn't have zynq_init_late()...  I'm guessing that it's hooked
> into the .init_late callback, which is certainly too late - this is called
> towards the end of driver initialisation, after many threads have already
> been spawned.

Is there a callback where iotable_init still works and that is early enough?

> The places where you are called before any threads have been spawned are
> unfortunately places where you can't use ioremap().
>
> At the moment, I don't have an answer to this - the answers I have are
> incompatible with the direction that arm-soc people want to go (which is
> to have zero static mappings.)

OK, you wrote in the earlier mail:
> What might be possible is to have a function which can be called in
> these circumstances which ensures that a kernel address is accessible
> to all threads in the system, though while it's doing that, it would
> have to stop any fork() or exit() activity to be sure that it updated
> every thread.

Would a solution that works that way be acceptable for mainline?

Besides that, I currently don't understand why the FIQ worked with Linux on
older, pre-Cortex-A9 cores.
There is a nice writeup at
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/
which works with an ARMv5 (which has the caches on the "wrong" side), and I
think it was also working on ARMv6 (aka ARM1136).

Best regards
Tim
* set_fiq_handler: Bad mode in data abort handler detected
From: Russell King - ARM Linux @ 2014-04-25 13:51 UTC
To: linux-arm-kernel

On Fri, Apr 25, 2014 at 03:36:48PM +0200, Tim Sander wrote:
> Hi Russell and List
>
> Thanks for your feedback!
> On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > > In years gone by, I'd have recommended that the kernel mappings for
> > > > this stuff were done via static mappings, but with DT, that's no
> > > > longer acceptable.  So I guess we have a problem...
> > >
> > > To verify that your very plausible hypothesis is right, I tried:
> > > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > > //also tried MT_DEVICE
> > This isn't going to help.  Any dynamically initialised mapping via any
> > of the ioremap functions is going to fail for the reason I outlined,
> > and it doesn't matter what type of mapping you use.  *All* dynamically
> > created mappings are populated to other threads lazily.
> OK, I tried mapping statically at bootup, just to verify and understand the
> problem. It seems to help somewhat (probably it does get into more threads),
> but it doesn't remedy the problem completely:
>
> static struct map_desc zynq_axi_gp0 __initdata = {
> 	.virtual = 0xe4000000,	//FIXME just arbitrary, which?
> 	.pfn = __phys_to_pfn(0x40000000),
> 	.length = SZ_128M,
> 	.type = MT_DEVICE,
> };
>
> static void __init zynq_axi_gp_init(void)
> {
> 	iotable_init(&zynq_axi_gp0, 1);
> 	zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
> 	BUG_ON(!zynq_axi_gp0_base);
> }
> This was called in the .map_io callback. But it seems even this is too late to
> propagate into all threads. Calling it earlier does not work (e.g. .init_early,
> .init_timer or .init_irq)...

It isn't too late.  .map_io is called as part of the very early kernel
initialisation, when the page tables are being set up with real mappings
for the very first time.  There are no interrupts, no real memory allocators,
in fact not much of anything at that point.

I'm afraid that I'm no longer that knowledgeable about whether ioremap
will take account of this stuff or not - other people have been hacking
in this area and my knowledge is outdated.

> Thinking about it, if it's truly lazy, even an early initialization does not
> help if mapping synchronisation is always done lazily via data abort.

This is how it works.

.map_io is called with the init_mm as the current mm structure.  This
contains the page tables.  Calling iotable_init() sets up mappings in
that page table.  No other threads exist at this point.

When a kernel thread is spawned, all L1 page tables for kernel mappings
are copied to the child's page tables.  Therefore, the mappings set up
via iotable_init() will propagate into the children without any data
aborts.

On ioremap(), the init_mm's page tables are updated with the L1 entries.
Other page tables are not updated until an access is performed, which
causes a data abort if there is no L1 page table entry.

So, .map_io should resolve the problem.  If it doesn't, something else
is going on - maybe ioremap() is trampling all over your static mappings...
though I thought we put the iotable_init()-created mappings into the
vmalloc list, which should prevent it.  I don't know anymore...

> > The reason for that is because it's _very_ expensive/racy to walk over
> > every single thread and update its page tables - Linux years ago used
> > to do that as standard with ioremap() and similar, and the code was
> > ripped out after it became too much of a burden.
>
> It seems as if this was before git times? At least it does not seem to be
> in the git repository.
> Do you have a rough estimate of what year that was, or which kernel version?

Yes, way before - a very long time ago, probably 1.2 or 2.0 kernel time (or
their development counterparts.)

I'm sorry, I don't think I can really help anymore with this problem.  I've
given you the best that my limited knowledge of the ARM kernel today allows,
which is reducing as I don't really hack on the ARM kernel very much anymore,
and I'm not involved with many of the changes which happen today.
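The eager alternative discussed in this thread (walking every thread and updating its page tables at mapping time, which Russell says Linux once did in ioremap() and later removed) can be sketched in the same toy style. This is a single-threaded, hypothetical model: the fixed-size array stands in for the system's list of address spaces, all names are invented, and the comment marks where real code would have to hold off fork() and exit(). It shows the trade-off Russell describes: every mapping touches every live table, so the cost grows with the number of address spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_L1_ENTRIES 16
#define TOY_MAX_MM 8

struct toy_mm {
    bool in_use;
    unsigned long l1[TOY_L1_ENTRIES];   /* 0 = no entry */
};

struct toy_mm toy_init;              /* reference tables, like init_mm */
struct toy_mm toy_mms[TOY_MAX_MM];   /* every other address space */

/* Create an address space by copying the reference table; NULL if full. */
struct toy_mm *toy_fork(void)
{
    for (int i = 0; i < TOY_MAX_MM; i++) {
        if (!toy_mms[i].in_use) {
            memcpy(toy_mms[i].l1, toy_init.l1, sizeof(toy_init.l1));
            toy_mms[i].in_use = true;
            return &toy_mms[i];
        }
    }
    return NULL;
}

void toy_exit(struct toy_mm *mm)
{
    assert(mm != NULL);
    mm->in_use = false;
}

/* Eager variant: install the entry in the reference table and in every live
 * address space, returning how many tables were written. In a real kernel,
 * fork() and exit() would have to be excluded for the whole loop, otherwise
 * an address space created or torn down concurrently could be missed. */
int toy_map_everywhere(int idx, unsigned long entry)
{
    int written = 1;

    assert(idx >= 0 && idx < TOY_L1_ENTRIES);
    toy_init.l1[idx] = entry;
    for (int i = 0; i < TOY_MAX_MM; i++) {
        if (toy_mms[i].in_use) {
            toy_mms[i].l1[idx] = entry;
            written++;
        }
    }
    return written;
}
```

Address spaces created after the call inherit the entry through the copy in `toy_fork()`, so only the ones alive at mapping time need the walk.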
* set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault)
From: Tim Sander @ 2014-05-12 07:02 UTC
To: linux-arm-kernel

Hi

I am still hunting the MMU faults during FIQ, but I have some new information
which seems to warrant a new mail. First, for reference, the start of the thread:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/250196.html
as I am also cc'ing linux-mm, since this seems to concern mm as well.

On Friday, 25 April 2014, 14:51:18, Russell King - ARM Linux wrote:
> On Fri, Apr 25, 2014 at 03:36:48PM +0200, Tim Sander wrote:
> > Hi Russell and List
> >
> > Thanks for your feedback!
> >
> > On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > > > In years gone by, I'd have recommended that the kernel mappings for
> > > > > this stuff were done via static mappings, but with DT, that's no
> > > > > longer acceptable.  So I guess we have a problem...
> > > >
> > > > To verify that your very plausible hypothesis is right, I tried:
> > > > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > > > //also tried MT_DEVICE
> > >
> > > This isn't going to help.  Any dynamically initialised mapping via any
> > > of the ioremap functions is going to fail for the reason I outlined,
> > > and it doesn't matter what type of mapping you use.  *All* dynamically
> > > created mappings are populated to other threads lazily.
> >
> > OK, I tried mapping statically at bootup, just to verify and understand the
> > problem. It seems to help somewhat (probably it does get into more threads),
> > but it doesn't remedy the problem completely:
> >
> > static struct map_desc zynq_axi_gp0 __initdata = {
> > 	.virtual = 0xe4000000,	//FIXME just arbitrary, which?
> > 	.pfn = __phys_to_pfn(0x40000000),
> > 	.length = SZ_128M,
> > 	.type = MT_DEVICE,
> > };
> >
> > static void __init zynq_axi_gp_init(void)
> > {
> > 	iotable_init(&zynq_axi_gp0, 1);
> > 	zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
> > 	BUG_ON(!zynq_axi_gp0_base);
> > }
> > This was called in the .map_io callback. But it seems even this is too
> > late to propagate into all threads. Calling it earlier does not work
> > (e.g. .init_early, .init_timer or .init_irq)...
>
> It isn't too late.  .map_io is called as part of the very early kernel
> initialisation, when the page tables are being set up with real mappings
> for the very first time.  There are no interrupts, no real memory allocators,
> in fact not much of anything at that point.
>
> I'm afraid that I'm no longer that knowledgeable about whether ioremap
> will take account of this stuff or not - other people have been hacking
> in this area and my knowledge is outdated.
>
> > Thinking about it, if it's truly lazy, even an early initialization does not
> > help if mapping synchronisation is always done lazily via data abort.
>
> This is how it works.
>
> .map_io is called with the init_mm as the current mm structure.  This
> contains the page tables.  Calling iotable_init() sets up mappings in
> that page table.  No other threads exist at this point.
>
> When a kernel thread is spawned, all L1 page tables for kernel mappings
> are copied to the child's page tables.  Therefore, the mappings set up
> via iotable_init() will propagate into the children without any data
> aborts.
>
> On ioremap(), the init_mm's page tables are updated with the L1 entries.
> Other page tables are not updated until an access is performed, which
> causes a data abort if there is no L1 page table entry.
>
> So, .map_io should resolve the problem.  If it doesn't, something else
> is going on - maybe ioremap() is trampling all over your static mappings...
> though I thought we put the iotable_init()-created mappings into the
> vmalloc list, which should prevent it.  I don't know anymore...

I did a prefault for each available process:

for_each_process(process)
{
	printk("process: %s [%d]\n",process->comm,process->pid);
	if(process->mm) {
		switch_mm(old_process->mm,process->mm,process);
		ioread32(priv->my_hardware); // access the memory, prefault mmu
		old_process = process;
	}
}

but I still get the "Bad mode in data abort":

Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
Modules linked in: firq(O+) ipv6
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.0-xilinx-00005-gc9455c0-dirty #97
task: c05cb420 ti: c05c0000 task.ti: c05c0000
PC is at 0xe3fc0000
LR is at arch_cpu_idle+0x20/0x2c
pc : [<e3fc0000>] lr : [<c000f344>] psr: 600701d1
sp : c05c1f70 ip : 00000000 fp : 00000000
r10: 00000000 r9 : 413fc090 r8 : c0a7b4c0
r7 : c05b6088 r6 : c0412348 r5 : c06008c0 r4 : c05c0000
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c0a7e9f8
Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment kernel
Control: 18c5387d Table: 1ec2c04a DAC: 00000015
Process swapper/0 (pid: 0, stack limit = 0xc05c0240)
Stack: (0xc05c1f70 to 0xc05c2000)
1f60: c0a7e9f8 00000000 00000000 00000000
1f80: c05c0000 c06008c0 c0412348 c05b6088 c0a7b4c0 413fc090 00000000 00000000
1fa0: 00000000 c05c1f70 c000f344 e3fc0000 600701d1 ffffffff 00000000 c0056748
1fc0: c0414a30 c0592a60 ffffffff ffffffff c0592574 00000000 00000000 c05b6088
1fe0: 18c5387d c05c83cc c05b6084 c05cc440 0000406a 00008074 00000000 00000000
[<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] ( (null))
Code: bad PC value
---[ end trace 38f263d4b2076bcb ]---

But then I realized that it's always swapper/0 which is faulting. But I don't
see a pid 0 process in my for_each_process loop.
So I tried some special handling for pid 0 to prefault it as well:

process = pid_task(&init_struct_pid, PIDTYPE_PID);
if(process) {
	printk("process: %s [%d]\n",process->comm,process->pid);
	switch_mm(current_task->mm,process->mm,process);
	ioread32(priv->my_hardware); // access the memory, prefault mmu
	switch_mm(process->mm,current_task->mm,current_task);
} else printk("process pid prefault failed\n"); //<this path is taken

But it seems that the scheduler pid struct has no process associated with it,
so it's not possible to get the mm_struct for pid 0. The structure can't be
implicit, otherwise there would be an MMU entry due to the prefaulting done or
due to the static mapping. So it seems there is an MMU table which is not
associated with any process and is used during scheduler/swapper work...
but where is it hiding?

I am sure that the error seen is an MMU translation fault, as the fault status
bits of the DFSR show 00101 or 00111, which is an MMU translation fault for a
section or a page. I have also verified that the address accessed by the FIQ
handler routine is my_hardware. The fact that the handler works *most* of the
time also fits well with an MMU translation fault.

Another interesting fact: if the interrupt rate is slow (e.g. once per second),
I see this problem; if it is faster (probably kernel HZ(?), but hard to tell as
the error is not deterministic), it seems to go away.

Best regards
Tim
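The abort status reading above can be checked mechanically. The sketch below assumes the ARMv7 short-descriptor fault status format, in which the 5-bit fault status is DFSR bit 10 followed by bits 3-0, and bit 11 (WnR) marks an abort caused by a write. Only a subset of encodings is named, and the function names are illustrative. `0b00101` and `0b00111` come out as translation faults on a section and on a page respectively.

```c
#include <assert.h>
#include <stdint.h>

/* FS[4] is DFSR bit 10, FS[3:0] are DFSR bits 3-0 (short-descriptor format). */
unsigned dfsr_fault_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
}

/* WnR, bit 11: 1 if the abort was caused by a write (e.g. a STR). */
int dfsr_is_write(uint32_t dfsr)
{
    return (dfsr >> 11) & 1u;
}

/* Names for a subset of the short-descriptor fault status encodings. */
const char *fault_status_name(unsigned fs)
{
    assert(fs < 32u);
    switch (fs) {
    case 0x01: return "alignment fault";
    case 0x03: return "access flag fault, section";
    case 0x05: return "translation fault, section";
    case 0x06: return "access flag fault, page";
    case 0x07: return "translation fault, page";
    case 0x08: return "synchronous external abort, non-translation";
    case 0x09: return "domain fault, section";
    case 0x0B: return "domain fault, page";
    case 0x0D: return "permission fault, section";
    case 0x0F: return "permission fault, page";
    case 0x16: return "asynchronous external abort";
    default:   return "other (not decoded here)";
    }
}

/* A missing first- or second-level descriptor: the case that the lazy
 * kernel-mapping fixup discussed in this thread is meant to handle. */
int is_translation_fault(uint32_t dfsr)
{
    unsigned fs = dfsr_fault_status(dfsr);
    return fs == 0x05u || fs == 0x07u;
}
```

For a store such as the `str r9, [r8]` in the first oops, WnR would also be expected to be set in the DFSR.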
* set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault) 2014-05-12 7:02 ` set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault) Tim Sander @ 2014-05-12 19:06 ` Nicolas Pitre 0 siblings, 0 replies; 8+ messages in thread From: Nicolas Pitre @ 2014-05-12 19:06 UTC (permalink / raw) To: linux-arm-kernel On Mon, 12 May 2014, Tim Sander wrote: > I did an prefaulting for each available processes: > for_each_process(process) > { > printk("process: %s [%d]\n",process->comm,process->pid); > if(process->mm) { > switch_mm(old_process->mm,process->mm,process); > ioread32(priv->my_hardware); // access the memory, prefault mmu > old_process = process; > } > } > but still i get the the "Bad mode in data abort": > Bad mode in data abort handler detected > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM > Modules linked in: firq(O+) ipv6 > CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.0-xilinx-00005-gc9455c0-dirty #97 > task: c05cb420 ti: c05c0000 task.ti: c05c0000 > PC is at 0xe3fc0000 > LR is at arch_cpu_idle+0x20/0x2c > pc : [<e3fc0000>] lr : [<c000f344>] psr: 600701d1 > sp : c05c1f70 ip : 00000000 fp : 00000000 > r10: 00000000 r9 : 413fc090 r8 : c0a7b4c0 > r7 : c05b6088 r6 : c0412348 r5 : c06008c0 r4 : c05c0000 > r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c0a7e9f8 > Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment kernel > Control: 18c5387d Table: 1ec2c04a DAC: 00000015 > Process swapper/0 (pid: 0, stack limit = 0xc05c0240) > Stack: (0xc05c1f70 to 0xc05c2000) > 1f60: c0a7e9f8 00000000 00000000 00000000 > 1f80: c05c0000 c06008c0 c0412348 c05b6088 c0a7b4c0 413fc090 00000000 00000000 > 1fa0: 00000000 c05c1f70 c000f344 e3fc0000 600701d1 ffffffff 00000000 c0056748 > 1fc0: c0414a30 c0592a60 ffffffff ffffffff c0592574 00000000 00000000 c05b6088 > 1fe0: 18c5387d c05c83cc c05b6084 c05cc440 0000406a 00008074 00000000 00000000 > [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] ( (null)) > Code: bad PC 
value > ---[ end trace 38f263d4b2076bcb ]--- > > But then i realized that its always swapper/0 which is faulting. But i don't see a pid 0 process > in my for_each_process loop. So i tried some special handling for pid 0 to also prefault it: > > process = pid_task(&init_struct_pid, PIDTYPE_PID); > if(process) { > printk("process: %s [%d]\n",process->comm,process->pid); > switch_mm(current_task->mm,process->mm,process); > ioread32(priv->my_hardware); // access the memory, prefault mmu > switch_mm(process->mm,current_task->mm,current_task); > } else printk("process pid prefault failed\n"); //<this path is taken > > But it seems that the scheduler pid struct has no process associated. So its > not possible to get the mmu_struct for the pid 0. The structure can't be > implicit or otherwise there should be an mmu entry due to the prefaulting done > or due to the static mapping. So it seems there is an MMU table which is not > associated with any process and is used during scheduler/swapper work... > but where is it hiding? The mmu_struct for PID 0 is at &init_mm. Try: switch_mm(current_task->mm, &init_mm, NULL); Nicolas ^ permalink raw reply [flat|nested] 8+ messages in thread
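Why the prefault loop never reached swapper/0 can be shown with a small model of the task list. In 3.x kernels, `for_each_process()` starts from `init_task` (PID 0, the boot CPU's idle task) and stops when it gets back to it, so `init_task` itself is never handed to the loop body. `init_task.mm` is also NULL, with `active_mm` pointing at `init_mm`, so the `if (process->mm)` test would have skipped it anyway. The structures below are simplified stand-ins with invented names, and the macro only mirrors the shape of the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int id; };

struct toy_task {
    int pid;
    struct toy_mm *mm;         /* NULL for kernel threads and the idle task */
    struct toy_mm *active_mm;  /* tables the task actually runs on */
    struct toy_task *next;     /* circular list, headed by the idle task */
};

struct toy_mm toy_init_mm = { 0 };

/* PID 0 is the list head: .mm is NULL, .active_mm is the init mm. */
struct toy_task toy_init_task = { 0, NULL, &toy_init_mm, &toy_init_task };

#define toy_next_task(p) ((p)->next)

/* Same shape as for_each_process(): begin at the head, stop on returning
 * to it, so the head (PID 0) is never visited. */
#define toy_for_each_process(p) \
    for ((p) = &toy_init_task; ((p) = toy_next_task(p)) != &toy_init_task; )

void toy_add_task(struct toy_task *t)
{
    assert(t != NULL);
    t->next = toy_init_task.next;
    toy_init_task.next = t;
}

/* Mirrors the prefault loop in this thread: records PIDs with a non-NULL mm.
 * Returns how many such tasks were found (at most max are stored). */
int toy_prefault_candidates(int *pids, int max)
{
    struct toy_task *p;
    int n = 0;

    toy_for_each_process(p) {
        if (p->mm) {
            if (n < max)
                pids[n] = p->pid;
            n++;
        }
    }
    return n;
}
```

With one user task and one kernel thread on the list, only the user task is reported; PID 0 never shows up, which is why the init mm has to be addressed directly, as in the `&init_mm` suggestion above.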