* [Xenomai] Kernel freezes in __ipipe_sync_stage @ 2014-06-20 9:11 Marco Tessore 2014-06-20 11:52 ` Gilles Chanteperdrix 0 siblings, 1 reply; 11+ messages in thread From: Marco Tessore @ 2014-06-20 9:11 UTC (permalink / raw) To: xenomai

Good morning,

I am fairly new to kernel code development, and I have recently started working on applications and device drivers for the Linux / I-pipe / Xenomai platform. I have a problem with a kernel that runs on devices we have in production and that was set up by another programmer.

The problem is that the kernel rarely, but often enough to matter, freezes at boot. From what I have seen with a hardware debugger (a Lauterbach T32), the freeze seems to be caused by the ipipe code.

The kernel is version 2.6.31 for the ARM architecture, specifically a Freescale iMX257 (ARM926EJ-S), with Xenomai 2.5.6 and a not very recent ipipe patch whose version I do not know; I presume it is the one included in the Xenomai archive.

The stall seems to occur in the function __ipipe_sync_stage, in kernel/ipipe/core.c, and can occur at various times during system boot.
As an example, here is the stack that I observed during one of these stalls:

    __ipipe_mach_get_tsc
    xntimer_tick_aperiodic
    xnintr_clock_handler
    __ipipe_sync_stage
    ipipe_suspend_domain
    __ipipe_walk_pipeline
    __ipipe_restore_pipeline
    xnarch_next_htick_shot
    clockevents_program_event
    tick_dev_program_event
    tick_program_event
    hrtimer_interrupt
    mxc_timer_interrupt
    handle_IRQ_event
    handle_level_irq
    asm_do_IRQ
    __ipipe_sync_stage    <-- loop
    ipipe_suspend_domain
    __ipipe_walk_pipeline
    __ipipe_restore_pipeline_head
    xnpod_enable_timesource
    xnpod_init
    __native_skin_init
    do_one_initcall
    kernel_init

The problem seems to occur within the first call to __ipipe_sync_stage (marked by the arrow): the condition of the innermost "while" loop, ((submask = p->irqpend_lomask[level]) != 0), never becomes false. It seems that after p->irqpend_lomask[level] is reset, and while the interrupt service routine runs (the timer interrupt, I think), that flag or some other flag in the variable gets set again, and this causes the lockup.

Although I have an idea of the general mechanisms that drive ipipe, I cannot grasp the implementation details; in particular I cannot tell when and where those flags are set, I presume when a hardware interrupt occurs.

I was wondering whether you could give me an idea of what could cause the stall, or any suggestion on how to get out of it so that the kernel can proceed in a clean state.

Thank you in advance for any suggestions you could give me,
kind regards
Marco Tessore

^ permalink raw reply [flat|nested] 11+ messages in thread
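The behaviour described above, a replay loop that clears a pending bit, runs the handler, and then finds the bit set again, can be modelled in a few lines of user-space C. This is a minimal sketch, not the I-pipe implementation: `struct stage_log`, `sync_stage`, `quiet_handler` and `re_raising_handler` are hypothetical names, the log is a single 32-bit mask instead of the himask/lomask pair, and a `max_rounds` cap is added so that the model terminates.

```c
#include <stdint.h>

/* Hypothetical stand-in for a stage's interrupt log: one bit per IRQ. */
struct stage_log {
    uint32_t pending;
};

typedef void (*irq_handler_t)(struct stage_log *log, unsigned int irq);

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned int lowest_bit(uint32_t mask)
{
    unsigned int bit = 0;

    while ((mask & 1u) == 0) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/*
 * Replay logged IRQs until the log is empty or max_rounds handlers have
 * run. Returns the number of handler invocations.
 */
unsigned int sync_stage(struct stage_log *log, irq_handler_t handler,
                        unsigned int max_rounds)
{
    unsigned int rounds = 0;
    uint32_t submask;

    while (rounds < max_rounds && (submask = log->pending) != 0) {
        unsigned int irq = lowest_bit(submask);

        log->pending &= ~(UINT32_C(1) << irq); /* clear before the handler runs */
        handler(log, irq);
        rounds++;
    }
    return rounds;
}

/* Handler that leaves the log alone: the loop drains normally. */
void quiet_handler(struct stage_log *log, unsigned int irq)
{
    (void)log;
    (void)irq;
}

/* Handler that logs the same IRQ again: the loop never drains. */
void re_raising_handler(struct stage_log *log, unsigned int irq)
{
    log->pending |= UINT32_C(1) << irq;
}
```

With `quiet_handler` the call returns as soon as every logged bit has been replayed. With `re_raising_handler` it only stops at the cap, which corresponds to the situation where the real loop would keep spinning.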
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-20 9:11 [Xenomai] Kernel freezes in __ipipe_sync_stage Marco Tessore @ 2014-06-20 11:52 ` Gilles Chanteperdrix 2014-06-20 12:18 ` Marco Tessore 2014-06-24 16:41 ` Marco Tessore 0 siblings, 2 replies; 11+ messages in thread From: Gilles Chanteperdrix @ 2014-06-20 11:52 UTC (permalink / raw) To: Marco Tessore, xenomai

On 06/20/2014 11:11 AM, Marco Tessore wrote:
> The kernel is version 2.6.31 for ARM architecture - specifically a

Do you have the same problem with a recent I-pipe patch, like the one for the 3.8 or 3.10 kernel?

--
Gilles.
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-20 11:52 ` Gilles Chanteperdrix @ 2014-06-20 12:18 ` Marco Tessore 2014-06-20 12:25 ` Gilles Chanteperdrix 2014-06-24 16:41 ` Marco Tessore 1 sibling, 1 reply; 11+ messages in thread From: Marco Tessore @ 2014-06-20 12:18 UTC (permalink / raw) To: Gilles Chanteperdrix, xenomai

On 20/06/2014 13:52, Gilles Chanteperdrix wrote:
> On 06/20/2014 11:11 AM, Marco Tessore wrote:
>> The kernel is version 2.6.31 for ARM architecture - specifically a
> Do you have the same problem with a recent I-pipe patches, like one for
> 3.8 or 3.10 kernel?
>

One note: Philippe had already sent me an email on this subject, full of tips on how to identify the cause, which is probably not attributable to ipipe; unfortunately that email went into my spam folder and I had not seen it.

For the moment I have enough material to work with. If the tests that I will run on 3.10 turn up something, I will let you know.

Thank you very much
Marco
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-20 12:18 ` Marco Tessore @ 2014-06-20 12:25 ` Gilles Chanteperdrix 0 siblings, 0 replies; 11+ messages in thread From: Gilles Chanteperdrix @ 2014-06-20 12:25 UTC (permalink / raw) To: Marco Tessore, xenomai On 06/20/2014 02:18 PM, Marco Tessore wrote: > Il 20/06/2014 13:52, Gilles Chanteperdrix ha scritto: >> On 06/20/2014 11:11 AM, Marco Tessore wrote: >>> The kernel is version 2.6.31 for ARM architecture - specifically a >> Do you have the same problem with a recent I-pipe patches, like one for >> 3.8 or 3.10 kernel? >> > > One note: > Philippe had already written an email about, full of tips on how to > identify the cause, and that is probably not attributable to ipipe, > unfortunately the email went into spam and I had not seen. > > For the moment I have enough material to work with, > if the tests, that I will do on the 3.10, were to emerge again something > I will inform you. Ok, there is a very old bug in the imx tsc emulation, which used a physical address as virtual address. See for instance: http://www.armadeus.com/wiki/index.php?title=Xenomai#Xenomai_kernel_space_support -- Gilles. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-20 11:52 ` Gilles Chanteperdrix 2014-06-20 12:18 ` Marco Tessore @ 2014-06-24 16:41 ` Marco Tessore 2014-06-24 17:10 ` Philippe Gerum 1 sibling, 1 reply; 11+ messages in thread From: Marco Tessore @ 2014-06-24 16:41 UTC (permalink / raw) To: Gilles Chanteperdrix, xenomai

Hi,

On 20/06/2014 13:52, Gilles Chanteperdrix wrote:
> On 06/20/2014 11:11 AM, Marco Tessore wrote:
>> The kernel is version 2.6.31 for ARM architecture - specifically a
> Do you have the same problem with a recent I-pipe patches, like one for
> 3.8 or 3.10 kernel?
>

I managed to run some tests on a 3.10 kernel, but on another board with an imx28 CPU. That kernel freezes too, but I have not debugged it with the JTAG debugger.

I do, however, have more information on the original problem, which is the one that worries me more.

In summary: I have a board based on the imx25, with kernel 2.6.31, Xenomai 2.5.6 and ipipe patch 1.16-02.

Rarely, but often enough to be a problem, the kernel freezes at boot. With a JTAG debugger I can observe the kernel in the following situation: it is in an infinite loop with this stack trace:

    __ipipe_set_irqpending
    xnintr_host_tick (__ipipe_propagate_irq)
    xnintr_clock_handler
    __ipipe_sync_stage    <- (1)
    ipipe_suspend_domain
    __ipipe_walk_pipeline
    __ipipe_restore_pipeline_head
    xnarch_next_tick_shot
    clockevents_program_event
    tick_dev_program_event
    hrtimer_interrupt
    mxc_interrupt
    handle_IRQ_event
    handle_level_irq
    asm_do_IRQ
    __ipipe_sync_stage    <- (2)
    ipipe_suspend_domain
    __ipipe_walk_pipeline
    __ipipe_restore_pipeline_head
    xnpod_enable_timesource
    xnpod_init
    __native_skin_init
    ...
    ...

Specifically, the first call to __ipipe_sync_stage, the one marked (2), is working on a stage that I cannot determine; for convenience call it stage S1. I think it is the Linux secondary domain, but I am not sure. That call invokes the interrupt handler of the system timer.
Continuing in the stack trace, there is a nested call to __ipipe_sync_stage, marked (1), but this call works on another stage, call it S2. This call in turn invokes a handler for the timer IRQ, which at some point invokes __ipipe_propagate_irq, which raises the flags for stage S1, so the first call to __ipipe_sync_stage (2) never gets out of its while loops.

I should add that I do not see a hardware interrupt for the timer in the function __ipipe_grab_IRQ. I have no idea how the cycle is triggered, but once the kernel is locked up it stays in the purely software infinite loop described above.

In the hope that you can help me understand what is going on, I was thinking of trying a patch like this:
- Store, for each nesting level of __ipipe_sync_stage, the IRQ number currently being handled and the stage on whose behalf it runs.
- Patch __ipipe_set_irqpending so that it does not set the flags for a pair (irq, stage) if that pair is already present at some level of the current stack trace. That is, if __ipipe_sync_stage is executing the handler for a stage and has already reset the flags in irqpend_himask and irqpend_lomask, it does not expect the handler to raise the same flag again for the same stage.

What do you think about this?

Thank you very much for any advice you can give me.
Sincerely
Marco Tessore
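The S1/S2 cycle described above can be reproduced in a small user-space model. This is only a sketch under assumptions: the stage names, `struct pipeline`, and the `budget` counter that stops the model are hypothetical, and the two handlers merely stand in for the timer handler that reprograms the next shot and for the relay done through __ipipe_propagate_irq.

```c
#include <stdbool.h>

enum { STAGE_S1, STAGE_S2, NR_STAGES };

/* Hypothetical model state: one timer IRQ, logged per stage. */
struct pipeline {
    bool pending[NR_STAGES];      /* timer IRQ logged for the stage */
    unsigned int runs[NR_STAGES]; /* handler invocations per stage */
    unsigned int budget;          /* cap so that the model terminates */
};

void sync_stage(struct pipeline *pl, int stage);

/* S2 handler: relays the tick back to S1 by logging it there. */
static void s2_handler(struct pipeline *pl)
{
    pl->pending[STAGE_S1] = true;
}

/* S1 handler: logs the tick for S2, then S2 is synchronised (nested call). */
static void s1_handler(struct pipeline *pl)
{
    pl->pending[STAGE_S2] = true;
    sync_stage(pl, STAGE_S2);
}

/* Replay the logged tick for one stage until its log stays empty. */
void sync_stage(struct pipeline *pl, int stage)
{
    while (pl->pending[stage] && pl->budget > 0) {
        pl->pending[stage] = false; /* cleared before the handler runs */
        pl->budget--;
        pl->runs[stage]++;
        if (stage == STAGE_S1)
            s1_handler(pl);
        else
            s2_handler(pl);
    }
}
```

Starting from a single tick logged for S1, the outer loop only ends when `budget` runs out, and the tick is still logged for S1 at that point. The nesting depth never exceeds two, which matches the two __ipipe_sync_stage frames in the trace.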
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-24 16:41 ` Marco Tessore @ 2014-06-24 17:10 ` Philippe Gerum 2014-06-25 7:50 ` Marco Tessore 0 siblings, 1 reply; 11+ messages in thread From: Philippe Gerum @ 2014-06-24 17:10 UTC (permalink / raw) To: Marco Tessore, Gilles Chanteperdrix, xenomai

On 06/24/2014 06:41 PM, Marco Tessore wrote:
> [...]
> What do you think about this?
>
> Thank you very much for any kind of advice you could give me
>

You mentioned random lockups during boot. Does your board ever lock up when passing xeno_hal.disable=1 on the kernel command line?

--
Philippe.
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-24 17:10 ` Philippe Gerum @ 2014-06-25 7:50 ` Marco Tessore 2014-06-25 8:39 ` Philippe Gerum 0 siblings, 1 reply; 11+ messages in thread From: Marco Tessore @ 2014-06-25 7:50 UTC (permalink / raw) To: Philippe Gerum, Gilles Chanteperdrix, xenomai

On 24/06/2014 19:10, Philippe Gerum wrote:
> On 06/24/2014 06:41 PM, Marco Tessore wrote:
>> [...]
>
> You mentioned random lockups during boot. Does you board ever lock up
> when passing xeno_hal.disable=1 on the kernel command line?
>

Yes, I mentioned random lockups, but the kernel always ends up in the infinite loop described above. Following your suggestion I tried passing the parameter xeno_hal.disable=1, but the kernel said "Unknown boot option `xeno_hal.disable=1': ignoring".

What is this option supposed to do anyway? If it disables the HAL, doesn't that inhibit the Xenomai real-time services?

What about the patch described above, which I would like to apply? That is, do not allow the interrupt handlers called from __ipipe_sync_stage to raise a pair (stage, irq) that is already being handled in the current stack.

Thank you
Marco Tessore
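The patch idea above amounts to remembering which (stage, irq) pairs are currently being replayed and refusing to log them again. A minimal user-space sketch of that bookkeeping follows; all names (`struct guard`, `guard_enter`, `guard_leave`, `guard_may_set_pending`) are hypothetical and none of this is I-pipe code. The `suppressed` counter makes visible how many events such a filter would discard.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NESTING 8 /* hypothetical bound on nested replay levels */

struct inflight {
    int stage;
    unsigned int irq;
};

struct guard {
    struct inflight stack[MAX_NESTING]; /* pairs being replayed, outermost first */
    size_t depth;
    unsigned int suppressed;            /* events the filter has dropped */
};

/* Record that a handler for (stage, irq) is about to run. */
bool guard_enter(struct guard *g, int stage, unsigned int irq)
{
    if (g->depth == MAX_NESTING)
        return false;
    g->stack[g->depth].stage = stage;
    g->stack[g->depth].irq = irq;
    g->depth++;
    return true;
}

/* Forget the innermost pair once its handler has returned. */
void guard_leave(struct guard *g)
{
    if (g->depth > 0)
        g->depth--;
}

/* Return true if (stage, irq) may be logged, false if it is in flight. */
bool guard_may_set_pending(struct guard *g, int stage, unsigned int irq)
{
    for (size_t i = 0; i < g->depth; i++) {
        if (g->stack[i].stage == stage && g->stack[i].irq == irq) {
            g->suppressed++;
            return false;
        }
    }
    return true;
}
```

In this model, a relay that targets a pair still in flight is counted in `suppressed` and not logged, so the replay loop terminates, but the relayed event is never delivered to that stage.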
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-25 7:50 ` Marco Tessore @ 2014-06-25 8:39 ` Philippe Gerum 2014-07-02 16:36 ` Marco Tessore 2014-07-09 16:09 ` Marco Tessore 0 siblings, 2 replies; 11+ messages in thread From: Philippe Gerum @ 2014-06-25 8:39 UTC (permalink / raw) To: Marco Tessore, Gilles Chanteperdrix, xenomai

On 06/25/2014 09:50 AM, Marco Tessore wrote:
> [...]
> Yes, I mentioned random lockups, but always the kernel enters in the
> infinite loop described above.
> Following your suggestion I tried to pass parameter xeno_hal.disable=1
> but kernel sayed
> "Unknown boot option `xeno_hal.disable=1': ignoring"
>

This is because you are running an outdated Xenomai 2.5.x release. A workaround is to build all the Xenomai skins as modules (native, posix, vxworks etc.), and to refrain from loading them during the boot process.

> What is supposed to do this option anyway? If it would disable HAL, does
> not this inhibits xenomai realtime services?
>

This is exactly what we want. When the real-time services start, control of the hardware timer is handed over to Xenomai, which enables pipelining of the clock source events to the co-kernel. We need to know whether this path is involved.

> What about the patch,described above, that I would apply? say, don't
> permit that the interrupt handlers called in __ipipe_sync_stage raise a
> couple (stage, irq) already handled in the current stack?
>

This won't work; it breaks an aspect of the pipeline core logic. It would be papering over the issue, not fixing it, and would open a can of worms down the road. We are not chasing a bug in the core logic at this point; we are more likely chasing a bug in the SoC-specific code which binds the hw timer to the pipeline.

The first step is to determine whether the system experiences an IRQ storm of some sort from the timer chip, and why. By focusing on the IRQ replay loop, which basically resyncs the current interrupt state with the past events logged, you may be looking at rays from an ancient sun.

> Thank you
> Marco Tessore

--
Philippe.
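One way to check for an IRQ storm offline, assuming timestamps of the timer interrupt have been captured somehow (a trace buffer, or a logic analyser on a pin toggled in the interrupt entry path), is to look for the densest window in the capture. The helper below is a hypothetical sketch in plain C, not part of Xenomai or the I-pipe; the capture format (nanosecond timestamps sorted in ascending order) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Largest number of events found in any half-open window of window_ns
 * nanoseconds. ts must be sorted in ascending order. Returns 0 for an
 * empty capture or a zero-length window.
 */
size_t max_irqs_in_window(const uint64_t *ts, size_t n, uint64_t window_ns)
{
    size_t best = 0;
    size_t start = 0;

    if (window_ns == 0)
        return 0;

    for (size_t end = 0; end < n; end++) {
        /* Drop events that are a full window or more older than ts[end]. */
        while (ts[end] - ts[start] >= window_ns)
            start++;

        size_t count = end - start + 1;
        if (count > best)
            best = count;
    }
    return best;
}
```

A regular periodic tick yields a small, constant count per window. A count far above what the programmed period allows would point at a storm; a flat count right up to the lockup would point elsewhere.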
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-25 8:39 ` Philippe Gerum @ 2014-07-02 16:36 ` Marco Tessore 2014-07-02 17:41 ` Gilles Chanteperdrix 2014-07-09 16:09 ` Marco Tessore 1 sibling, 1 reply; 11+ messages in thread From: Marco Tessore @ 2014-07-02 16:36 UTC (permalink / raw) To: Philippe Gerum, Gilles Chanteperdrix, xenomai

Good morning,

On 25/06/2014 10:39, Philippe Gerum wrote:
> [...]
> First step is to determine if the system experiences an IRQ storm of
> some sort from the timer chip, and why so. By focusing on the IRQ
> replay loop which basically resyncs the current interrupt state with
> the past events logged, you may be looking at rays from an ancient sun.
>

Still investigating the problem, I re-applied the ipipe patch to a clean kernel and compared the result with the problematic one, matching the versions of the kernel, the ipipe patch and Xenomai. I noticed one difference: the defective kernel has the following block of code at the end of the file /arch/arm/mach-mx25/devices.c:

#ifdef CONFIG_IPIPE
static int post_cpu_init(void)
{
	ipipe_mach_allow_hwtimer_uaccess(MX25_AIPS1_BASE_ADDR_VIRT, MX25_AIPS2_BASE_ADDR_VIRT);
	return 0;
}

postcore_initcall(post_cpu_init);
#endif /* CONFIG_IPIPE */

My question is: what is the function ipipe_mach_allow_hwtimer_uaccess supposed to do?

To follow up on the previous email:
- I analyzed the interrupts: the timer interrupts look fairly regular to me.
- Occasionally there are bursts of NAND memory interrupts, which I think is normal.
- I still see occasional kernel lockups, but no interrupt flood occurs, neither from the timer nor from the NAND memory, for at least one second before the deadlock (observed with an oscilloscope).

I am now trying a kernel in which I commented out the code block above. I hope the lockups no longer occur, but I would like to know what ipipe_mach_allow_hwtimer_uaccess does: the block was added by the person who produced the kernel I am debugging, it is not present in the ipipe patch shipped with the Xenomai distribution, and I do not know why it was inserted.

Thank you very much
kind regards
Marco Tessore
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-07-02 16:36 ` Marco Tessore @ 2014-07-02 17:41 ` Gilles Chanteperdrix 0 siblings, 0 replies; 11+ messages in thread From: Gilles Chanteperdrix @ 2014-07-02 17:41 UTC (permalink / raw) To: Marco Tessore, Philippe Gerum, xenomai On 07/02/2014 06:36 PM, Marco Tessore wrote: > ipipe_mach_allow_hwtimer_uaccess(MX25_AIPS1_BASE_ADDR_VIRT,MX25_AIPS2_BASE_ADDR_VIRT); > return 0; > } > > postcore_initcall(post_cpu_init); > #endif /* CONFIG_IPIPE */ > > > the question that I kindly ask is: what should do the function > ipipe_mach_allow_hwtimer_uaccess? As the name suggests, this function allows access to the hardware timer registers from user-space. Xenomai user-space libraries require this, unless you compile xenomai with --disable-arm-tsc, but this will increase (greatly) the latency of services such as rt_timer_tsc or clock_gettime(CLOCK_MONOTONIC) by requiring a system call. And it probably has nothing to do with your problem. -- Gilles. ^ permalink raw reply [flat|nested] 11+ messages in thread
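To get a feel for the cost of reading the clock, which is what the user-space timer access is meant to keep low, one can time back-to-back calls to clock_gettime(CLOCK_MONOTONIC). The sketch below uses only plain POSIX; `monotonic_ns` and `average_read_cost_ns` are hypothetical helper names, and how the call is actually routed (direct counter read, system call, Xenomai POSIX skin) depends on how the program is built and linked.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds, or 0 if the clock cannot be read. */
uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/*
 * Average cost, in nanoseconds, of one clock read, measured over
 * `iterations` back-to-back calls. Returns 0 when iterations is 0.
 */
uint64_t average_read_cost_ns(unsigned int iterations)
{
    uint64_t start, end;

    if (iterations == 0)
        return 0;

    start = monotonic_ns();
    for (unsigned int i = 0; i < iterations; i++)
        (void)monotonic_ns();
    end = monotonic_ns();

    return (end - start) / iterations;
}
```

Comparing the figure returned by `average_read_cost_ns` between a build with user-space timer access and one configured with --disable-arm-tsc would show the difference mentioned above; the absolute numbers depend on the board and are not predicted here.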
* Re: [Xenomai] Kernel freezes in __ipipe_sync_stage 2014-06-25 8:39 ` Philippe Gerum 2014-07-02 16:36 ` Marco Tessore @ 2014-07-09 16:09 ` Marco Tessore 1 sibling, 0 replies; 11+ messages in thread From: Marco Tessore @ 2014-07-09 16:09 UTC (permalink / raw) To: Philippe Gerum, Gilles Chanteperdrix, xenomai

Good morning,

I'm still investigating the deadlock that has been keeping me busy for quite some time.

The situation is as follows: the root domain is in its call to __ipipe_sync_stage, invoked indirectly by xnpod_enable_timesource { xnlock_put_irq_restore(lock, x = 0), lock is ignored, and this generates calls to __ipipe_restore_pipeline_head, __ipipe_walk_pipeline and ipipe_suspend_domain }. So here we are in __ipipe_sync_stage for the Linux domain. Inside it, the timer interrupt service routine runs, which in my case is the Freescale i.MX25 timer: mxc_timer_interrupt in arch/arm/plat-mxc/time.c. As a note: this file (time.c) has been corrected, since it previously did not take into account that the timer chip of the i.MX25 is the same as the one of the mx3 and mx5.

Following the chain from __ipipe_sync_stage, we have a call to xnarch_next_ht_shot and xntimer_start_aperiodic; finally, __ipipe_set_irq_pending is invoked for the Xenomai domain. Subsequently, the procedure __xnarch_next_htick_shot invokes ipipe_restore_pipeline_head, and then we have this call:

void __ipipe_restore_pipeline_head(unsigned long x)
{
	struct ipipe_percpu_domain_data *p = ipipe_head_cpudom_ptr();

	local_irq_disable_hw();

	if (x) {
#ifdef CONFIG_DEBUG_KERNEL
		static int warned;
		if (!warned && test_and_set_bit(IPIPE_STALL_FLAG, &p->status)) {
			/*
			 * Already stalled albeit ipipe_restore_pipeline_head()
			 * should have detected it? Send a warning once.
			 */
			warned = 1;
			printk(KERN_WARNING "I-pipe: ipipe_restore_pipeline_head() optimization failed.\n");
			dump_stack();
		}
#else /* !CONFIG_DEBUG_KERNEL */
		set_bit(IPIPE_STALL_FLAG, &p->status);
#endif /* CONFIG_DEBUG_KERNEL */
	} else {
		__clear_bit(IPIPE_STALL_FLAG, &p->status);
		if (unlikely(p->irqpend_himask != 0)) {
			struct ipipe_domain *head_domain = __ipipe_pipeline_head();
			if (likely(head_domain == __ipipe_current_domain))
				__ipipe_sync_pipeline(IPIPE_IRQMASK_ANY);
			else
				__ipipe_walk_pipeline(&head_domain->p_link); <-- THIS CALL
		}
		local_irq_enable_hw();
	}
}

(as we saw before, irqpend_himask for the Xenomai domain was set for the timer interrupt)

From here __ipipe_walk_pipeline is called, and from it __ipipe_sync_stage for the Xenomai domain. That leads to the calls xnintr_clock_handler, xntimer_tick_aperiodic, xntimer_next_local_shot, xnintr_host_tick and xnarch_relay_tick; these call __ipipe_set_irq_pending for the timer interrupt on the Linux domain.

Since we are already, deeper in the call stack, in __ipipe_sync_stage for the Linux domain, what happens at this level is the following: __ipipe_sync_stage clears the flags in the interrupt log for the timer and handles the timer interrupt; the chain described above in turn sets the flags in the interrupt log for the Xenomai domain, whose handler sets the interrupt log for the Linux domain again. In the next iteration the same thing happens, and this repeats endlessly, stalling the kernel.

Can you help me understand a bit more? In particular, how is it possible that the Linux domain triggers the Xenomai domain, which in turn triggers the Linux domain?

As I said in previous mails, this is not a frequent bug: it happens randomly when I boot the machine, but it still limits the use for which the device has been developed. I can capture the state with a hardware debugger when the deadlock happens, but I cannot find out what happened before.
I am sure there are no anomalies in the timer interrupt: by driving a pin from the function __ipipe_grab_irq, I can see that the timer interrupt is quite regular.

Thank you in advance for any help.

Kind regards
Marco Tessore

In reference to your past email:

On 25/06/2014 10:39, Philippe Gerum wrote:
> On 06/25/2014 09:50 AM, Marco Tessore wrote:
>> On 24/06/2014 19:10, Philippe Gerum wrote:
>>> On 06/24/2014 06:41 PM, Marco Tessore wrote:
>>>> Hi,
>>>>
>>>> On 20/06/2014 13:52, Gilles Chanteperdrix wrote:
>>>>> On 06/20/2014 11:11 AM, Marco Tessore wrote:
>>>>>> The kernel is version 2.6.31 for ARM architecture - specifically a
>>>>> Do you have the same problem with a recent I-pipe patches, like
>>>>> one for
>>>>> 3.8 or 3.10 kernel?
>>>>>
>>>>
>>>> I managed to do some tests on 3.10 kernel but on onother board with
>>>> imx28 CPU, actually it happens that that kernel freezes too,
>>>> but I haven't debugged it with the jtag debugger.
>>>>
> This is because you are running an outdated Xenomai 2.5.x release. A
> work around is to build all the Xenomai skins as modules in the kernel
> (native, posix, vxworks etc), refraining from modloading them during
> the boot process.

I tried this and the freeze described above did not occur. Instead, after hundreds of reboots, the kernel froze in idle_task and the init process stalled, I don't know where; this may or may not be related to the problem described above.

>
> First step is to determine if the system experiences an IRQ storm of
> some sort from the timer chip, and why so. By focusing on the IRQ
> replay loop which basically resyncs the current interrupt state with
> the past events logged, you may be looking at rays from an ancient sun.
>

This can be excluded: I haven't seen any interrupt storm, and the timer interrupt is quite regular.

^ permalink raw reply [flat|nested] 11+ messages in thread
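The ping-pong described in the message above can be reproduced in a toy model. The sketch below is not I-pipe code: `toy_pipeline`, `sync_root`, `sync_head` and the two flags are hypothetical names, each domain's interrupt log is reduced to a single boolean, and the iteration cap exists only so that the demonstration terminates. It shows only why a replay loop of this shape cannot exit when each domain's handler re-posts the interrupt to the other; it says nothing about why that happens on the real system.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: two domains, one pending bit each. */
enum { HEAD = 0, ROOT = 1 };

struct toy_pipeline {
    bool pending[2];         /* stand-in for each domain's interrupt log */
    bool head_relays_to_root;/* head handler re-posts the tick to root */
    bool root_posts_to_head; /* root handler posts a timer event to head */
};

/* Head-domain timer handler: may relay the tick to the root domain. */
static void head_timer_handler(struct toy_pipeline *pl)
{
    if (pl->head_relays_to_root)
        pl->pending[ROOT] = true;
}

/* Play every interrupt pending for the head domain. */
static void sync_head(struct toy_pipeline *pl)
{
    while (pl->pending[HEAD]) {
        pl->pending[HEAD] = false;
        head_timer_handler(pl);
    }
}

/* Root-domain timer handler: may post to head, which is played at once. */
static void root_timer_handler(struct toy_pipeline *pl)
{
    if (pl->root_posts_to_head) {
        pl->pending[HEAD] = true;
        sync_head(pl);
    }
}

/*
 * Replay loop for the root domain. Returns the number of passes needed
 * to drain the log, or -1 if max_iters passes were not enough.
 */
int sync_root(struct toy_pipeline *pl, int max_iters)
{
    int iters = 0;

    assert(max_iters > 0);

    while (pl->pending[ROOT]) {
        if (iters == max_iters)
            return -1;            /* log was refilled on every pass */
        pl->pending[ROOT] = false; /* clear before running the handler */
        root_timer_handler(pl);
        iters++;
    }
    return iters;
}
```

With both flags set and the root bit pending, `sync_root(&pl, 1000)` hits the cap and returns -1; clearing either flag lets the loop exit after one pass.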
end of thread, other threads:[~2014-07-09 16:09 UTC | newest]

Thread overview: 11+ messages
2014-06-20 9:11 [Xenomai] Kernel freezes in __ipipe_sync_stage Marco Tessore
2014-06-20 11:52 ` Gilles Chanteperdrix
2014-06-20 12:18 ` Marco Tessore
2014-06-20 12:25 ` Gilles Chanteperdrix
2014-06-24 16:41 ` Marco Tessore
2014-06-24 17:10 ` Philippe Gerum
2014-06-25 7:50 ` Marco Tessore
2014-06-25 8:39 ` Philippe Gerum
2014-07-02 16:36 ` Marco Tessore
2014-07-02 17:41 ` Gilles Chanteperdrix
2014-07-09 16:09 ` Marco Tessore