Re: [Xenomai] Kernel freezes in __ipipe_sync_stage

From: Marco Tessore <marco.tessore@axelsw.it>
To: Philippe Gerum <rpm@xenomai.org>,
	Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>,
	xenomai@xenomai.org
Subject: Re: [Xenomai] Kernel freezes in __ipipe_sync_stage
Date: Wed, 02 Jul 2014 18:36:04 +0200
Message-ID: <53B434F4.6060203@axelsw.it>
In-Reply-To: <53AA8AC3.7020100@xenomai.org>

Good morning,

On 25/06/2014 10:39, Philippe Gerum wrote:
> On 06/25/2014 09:50 AM, Marco Tessore wrote:
>> On 24/06/2014 19:10, Philippe Gerum wrote:
>>> On 06/24/2014 06:41 PM, Marco Tessore wrote:
>>>> Hi,
>>>>
>>>> On 20/06/2014 13:52, Gilles Chanteperdrix wrote:
>>>>> On 06/20/2014 11:11 AM, Marco Tessore wrote:
>>>>>> The kernel is version 2.6.31 for ARM architecture - specifically a
>>>>> Do you have the same problem with a recent I-pipe patch, like the
>>>>> one for the 3.8 or 3.10 kernel?
>>>>>
>>>>
>>>> I managed to run some tests on a 3.10 kernel, but on another board
>>>> with an imx28 CPU. That kernel freezes too,
>>>> but I haven't debugged it with the JTAG debugger.
>>>>
>>>> I do, however, have some information on the original problem, which
>>>> is the one that worries me more:
>>>>
>>>> In summary:
>>>> I have a board based on imx25, with kernel 2.6.31, Xenomai 2.5.6 and
>>>> ipipe patch 1.16-02.
>>>>
>>>> Rarely, but often enough to be a problem, the kernel freezes at boot.
>>>> Thanks to a JTAG debugger I'm able to observe the kernel in the
>>>> following situation:
>>>> I'm in an infinite loop with the following stack trace:
>>>> __ipipe_set_irqpending
>>>> xnintr_host_tick (__ipipe_propagate_irq)
>>>> xnintr_clock_handler
>>>> __ipipe_sync_stage    <- (1)
>>>> ipipe_suspend_domain
>>>> __ipipe_walk_pipeline
>>>> __ipipe_restore_pipeline_head
>>>> xnarch_next_tick_shot
>>>> clockevents_program_event
>>>> tick_dev_program_event
>>>> hrtimer_interrupt
>>>> mxc_interrupt
>>>> handle_IRQ_event
>>>> handle_level_irq
>>>> asm_do_IRQ
>>>> __ipipe_sync_stage <- (2)
>>>> ipipe_suspend_domain
>>>> __ipipe_walk_pipeline
>>>> __ipipe_restore_pipeline_head
>>>> xnpod_enable_timesource
>>>> xnpod_init
>>>> __native_skin_init
>>>> ...
>>>> ...
>>>>
>>>> Specifically, the first call to __ipipe_sync_stage, the one marked
>>>> with the number (2), is working on a stage that I cannot determine.
>>>> For convenience let's call it stage S1; I think it is the Linux
>>>> secondary domain, but I'm not sure.
>>>> This call invokes the interrupt handler of the system timer.
>>>> Further up the stack trace there is a nested call to
>>>> __ipipe_sync_stage, marked (1), which works on another stage; for
>>>> convenience, stage S2.
>>>> This nested call in turn invokes a handler for the timer irq, which
>>>> at some point calls __ipipe_propagate_irq, which raises the pending
>>>> flags for stage S1.
>>>> As a result, the first call to __ipipe_sync_stage (2) never gets out
>>>> of its while loops.
>>>>
>>>> I should add that I do not see a hardware interrupt for the timer in
>>>> the function __ipipe_grab_IRQ.
>>>> I have no idea how the cycle is triggered, but when the kernel is
>>>> locked up, it is spinning exclusively in the software loop
>>>> described above.
>>>>
>>>>
>>>> In the hope that you can help me understand what is going on,
>>>> I would like to try a patch along these lines:
>>>> - Store, for each nesting level of __ipipe_sync_stage, the irq number
>>>> currently being handled and the stage on whose behalf it runs.
>>>> - Patch the function __ipipe_set_irqpending so that it does not set
>>>> the flags for a pair (irq, stage) if that pair is already present at
>>>> some level of the current stack trace. In other words, if
>>>> __ipipe_sync_stage is executing the handler for a stage, and has
>>>> therefore already cleared the flags in irqpend_himask and
>>>> irqpend_lomask, it does not expect the handler to raise the same
>>>> flag again for the same stage.
>>>>
>>>> What do you think about this?
>>>>
>>>> Thank you very much for any kind of advice you could give me
>>>>
>>>
>>> You mentioned random lockups during boot. Does your board ever lock up
>>> when passing xeno_hal.disable=1 on the kernel command line?
>>>
>> Yes, I mentioned random lockups, but the kernel always ends up in the
>> infinite loop described above.
>> Following your suggestion I tried passing the parameter
>> xeno_hal.disable=1, but the kernel said
>> "Unknown boot option `xeno_hal.disable=1': ignoring"
>>
>
> This is because you are running an outdated Xenomai 2.5.x release. A 
> work around is to build all the Xenomai skins as modules in the kernel 
> (native, posix, vxworks etc), refraining from modloading them during 
> the boot process.
>
>> What is this option supposed to do anyway? If it disables the HAL,
>> doesn't that inhibit the Xenomai real-time services?
>>
>
> This is exactly what we want. When the real-time services commence, 
> control of the hardware timer is handed over to Xenomai, which enables 
> pipelining of the clock source events to the co-kernel. We need to 
> know if this path is involved.
>
>> What about the patch described above that I would like to apply? That
>> is, do not let the interrupt handlers called from __ipipe_sync_stage
>> raise a pair (stage, irq) that is already being handled in the
>> current stack?
>>
>
> This won't work, this breaks an aspect of the pipeline core logic. 
> This would be papering over the issue, not fixing it, opening a can of 
> worms down the road. We are not chasing a bug in the core logic at 
> this point, we are more likely chasing a bug in the SoC-specific code 
> which binds the hw timer to the pipeline.
>
> First step is to determine if the system experiences an IRQ storm of 
> some sort from the timer chip, and why so. By focusing on the IRQ 
> replay loop which basically resyncs the current interrupt state with 
> the past events logged, you may be looking at rays from an ancient sun.
>
>> Thank you
>> Marco Tessore
>>
>>
>
>
I am still investigating the problem. I re-applied the ipipe patch to a
clean kernel and compared the result with the problematic tree,
using the same versions of the kernel, the ipipe patch and Xenomai.

I noticed one difference between the defective tree and the freshly
patched one: the defective kernel has the following block of code at the
end of the file /arch/arm/mach-mx25/devices.c

The block:
#ifdef CONFIG_IPIPE
static int post_cpu_init(void)
{
     ipipe_mach_allow_hwtimer_uaccess(MX25_AIPS1_BASE_ADDR_VIRT,
                                      MX25_AIPS2_BASE_ADDR_VIRT);
     return 0;
}

postcore_initcall(post_cpu_init);
#endif /* CONFIG_IPIPE */


My question is: what is the function ipipe_mach_allow_hwtimer_uaccess
supposed to do?


To follow up on the previous email:
- I analyzed the interrupts: the timer interrupts look fairly regular to me.
- Occasionally there are bursts of NAND memory interrupts; I think this
is normal.
- I am still experiencing occasional kernel lockups, but there is no
interrupt flood, from either the timer or the NAND memory,
    for at least one second before the lockup (observed with an oscilloscope).
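
To illustrate how a lockup can happen without any hardware interrupt flood, the replay loop from my earlier description can be modelled outside the kernel. This is only a minimal sketch under my own assumptions, not the I-pipe code: struct stage, replay_pending and the two handlers are hypothetical names, and a single bitmask stands in for irqpend_himask and irqpend_lomask. The round limit exists only so that the model terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a stage: one bit per pending irq. */
struct stage {
    uint32_t pending;
};

typedef void (*irq_handler_t)(struct stage *st, unsigned int irq);

/* A handler that does its work and posts nothing back. */
static void quiet_handler(struct stage *st, unsigned int irq)
{
    (void)st;
    (void)irq;
}

/* A handler that re-posts its own irq to the stage being replayed,
 * mimicking a propagation that lands on the same stage again. */
static void reposting_handler(struct stage *st, unsigned int irq)
{
    st->pending |= 1u << irq;
}

/* Replay pending irqs until the mask is empty. Gives up after
 * max_rounds handler calls and returns false; the real loop has
 * no such limit. The number of handler calls is stored in *rounds. */
static bool replay_pending(struct stage *st, irq_handler_t handler,
                           unsigned int max_rounds, unsigned int *rounds)
{
    *rounds = 0;
    while (st->pending != 0) {
        unsigned int irq = 0;

        if (*rounds == max_rounds)
            return false;               /* would spin forever */

        while (!(st->pending & (1u << irq)))
            irq++;                      /* lowest pending irq */

        st->pending &= ~(1u << irq);    /* clear before dispatch */
        handler(st, irq);               /* may set the bit again */
        (*rounds)++;
    }
    return true;
}
```

Calling replay_pending() with quiet_handler drains the mask after one handler call per pending bit, while reposting_handler keeps the mask non-empty on every pass, so the loop only ends because of the artificial limit. No new hardware event is needed to keep it spinning.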


I am now testing a kernel with the code block above commented out.
I hope the lockups no longer occur, but I would still like to know what
the function ipipe_mach_allow_hwtimer_uaccess does:
the block was added by whoever produced the kernel I am debugging,
it is not present in the ipipe patch shipped with the Xenomai
distribution, and I do not know why it was inserted.

Thank you very much
kind regards
Marco Tessore
