[Xenomai] xnarch_xchg infinite loop


* [Xenomai] xnarch_xchg infinite loop
@ 2014-01-20  8:31 Henri Roosen
  2014-01-20  9:48 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 8+ messages in thread
From: Henri Roosen @ 2014-01-20  8:31 UTC (permalink / raw)
  To: Xenomai@xenomai.org

Hi all,

We have the problem that (hot-)rebooting our Xenomai system fails every 1
out of 10 times.

The system is an ARM iMX6Solo (Cortex-A9) running Xenomai 2.6.2.1 and
kernel 3.0 (freescale branch).

When the system hangs at reboot, it is in an infinite loop in the Xenomai
atomic exchange implementation, with STREX always returning 1:

__xnarch_xchg
S:0xC0094B2C : ADD      r3,r6,#0x890
S:0xC0094B30 : LDREX    r2,[r3]
S:0xC0094B34 : STREX    r1,r9,[r3]
S:0xC0094B38 : TEQ      r1,#0
S:0xC0094B3C : BNE      {pc}-0xc ; 0xc0094b30

Does anyone know what is causing the STREX to always return 1 and why it
might get into this state?
Is there a workaround for this problem?

Any help is appreciated!

Thanks,
Henri


* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-20  8:31 [Xenomai] xnarch_xchg infinite loop Henri Roosen
@ 2014-01-20  9:48 ` Gilles Chanteperdrix
  2014-01-20 10:02   ` Henri Roosen
  0 siblings, 1 reply; 8+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-20  9:48 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Xenomai@xenomai.org

On 01/20/2014 09:31 AM, Henri Roosen wrote:
> Hi all,
> 
> We have the problem that (hot-)rebooting our Xenomai system fails every 1
> out of 10 times.
> 
> The system is an ARM iMX6Solo (Cortex-A9) running Xenomai 2.6.2.1 and
> kernel 3.0 (freescale branch).
> 
> When the system hangs at reboot, it is in an infinite loop in the Xenomai
> atomic exchange implementation, with STREX always returning 1:
> 
> __xnarch_xchg
> S:0xC0094B2C : ADD      r3,r6,#0x890
> S:0xC0094B30 : LDREX    r2,[r3]
> S:0xC0094B34 : STREX    r1,r9,[r3]
> S:0xC0094B38 : TEQ      r1,#0
> S:0xC0094B3C : BNE      {pc}-0xc ; 0xc0094b30
> 
> Does anyone know what is causing the STREX to always return 1 and why it
> might get into this state?

Normally, strex fails if "something else" stores data in between ldrex
and strex. Do you have the full stack trace?
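
For illustration, the retry loop in the disassembly can be modelled in
plain C. This is a minimal sketch of a toy exclusive monitor, not the
Cortex-A9 hardware: struct monitor, ldrex(), strex(), clear_reservation()
and xchg_tries() are hypothetical names, and the retry cap exists only so
that the model terminates. It shows how the loop cannot exit when the
reservation is lost on every pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a local exclusive monitor (illustrative, not real hardware). */
struct monitor {
    bool exclusive;   /* true while a reservation from ldrex() is held */
};

/* Models LDREX: load the word and take a reservation. */
unsigned ldrex(struct monitor *m, const unsigned *addr)
{
    m->exclusive = true;
    return *addr;
}

/* Models STREX: returns 0 if the store happened, 1 if the reservation was lost. */
int strex(struct monitor *m, unsigned *addr, unsigned val)
{
    if (!m->exclusive)
        return 1;             /* store is not performed */
    *addr = val;
    m->exclusive = false;
    return 0;
}

/* Hypothetical disturbance: anything that clears the reservation. */
void clear_reservation(struct monitor *m)
{
    m->exclusive = false;
}

/*
 * Exchange built on the same retry loop as the disassembly. Returns the
 * number of attempts needed, or -1 if max_tries attempts all failed
 * (the real loop has no such cap and would spin forever).
 */
int xchg_tries(struct monitor *m, unsigned *addr, unsigned val,
               bool disturbed, int max_tries)
{
    assert(max_tries > 0);
    for (int tries = 1; tries <= max_tries; tries++) {
        (void)ldrex(m, addr);
        if (disturbed)
            clear_reservation(m);   /* reservation lost before the store */
        if (strex(m, addr, val) == 0)
            return tries;
    }
    return -1;
}
```

With no disturbance, xchg_tries() succeeds on the first attempt. When the
reservation is cleared between every load and store, it exhausts its
attempts and leaves the word untouched, which corresponds to the hang
described above.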

-- 
                                                                Gilles.



* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-20  9:48 ` Gilles Chanteperdrix
@ 2014-01-20 10:02   ` Henri Roosen
  2014-01-20 20:40     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 8+ messages in thread
From: Henri Roosen @ 2014-01-20 10:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai@xenomai.org

On Mon, Jan 20, 2014 at 10:48 AM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> On 01/20/2014 09:31 AM, Henri Roosen wrote:
> > Hi all,
> >
> > We have the problem that (hot-)rebooting our Xenomai system fails every 1
> > out of 10 times.
> >
> > The system is an ARM iMX6Solo (Cortex-A9) running Xenomai 2.6.2.1 and
> > kernel 3.0 (freescale branch).
> >
> > When the system hangs at reboot, it is in an infinite loop in the Xenomai
> > atomic exchange implementation, with STREX always returning 1:
> >
> > __xnarch_xchg
> > S:0xC0094B2C : ADD      r3,r6,#0x890
> > S:0xC0094B30 : LDREX    r2,[r3]
> > S:0xC0094B34 : STREX    r1,r9,[r3]
> > S:0xC0094B38 : TEQ      r1,#0
> > S:0xC0094B3C : BNE      {pc}-0xc ; 0xc0094b30
> >
> > Does anyone know what is causing the STREX to always return 1 and why it
> > might get into this state?
>
> Normally, strex fails if "something else" stores data in between ldrex
> and strex. Do you have the full stack trace?
>

Thank you for your reply Gilles. Please find the stacktrace below:

#0 __xnarch_xchg( size = 4, x = 3204479504, ptr = <Value optimised away by
compiler> ) at atomic_asm.h:79
#1 xnintr_irq_handler( irq = 260, cookie = (void*) 0xBF0079E8 ) at
atomic_asm.h:93
#2 __ipipe_sync_stage() at core.c:1301
#3 ipipe_suspend_domain() at core.c:856
#4 __ipipe_walk_pipeline( pos = (struct list_head*) 0xC0463884 ) at
core.c:797
#5 __ipipe_handle_irq( irq = 98, flags = 0 ) at ipipe.c:564
#6 __ipipe_grab_irq( irq = <Value optimised away by compiler>, regs =
(struct pt_regs*) 0xCC897DD8 ) at ipipe.c:618
#7 [__irq_svc+0x40]

Thanks,
Henri


> --
>                                                                 Gilles.
>


* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-20 10:02   ` Henri Roosen
@ 2014-01-20 20:40     ` Gilles Chanteperdrix
  2014-01-22  8:31       ` Henri Roosen
  0 siblings, 1 reply; 8+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-20 20:40 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Xenomai@xenomai.org

On 01/20/2014 11:02 AM, Henri Roosen wrote:
> On Mon, Jan 20, 2014 at 10:48 AM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
> 
>> On 01/20/2014 09:31 AM, Henri Roosen wrote:
>>> Hi all,
>>>
>>> We have the problem that (hot-)rebooting our Xenomai system fails every 1
>>> out of 10 times.
>>>
>>> The system is an ARM iMX6Solo (Cortex-A9) running Xenomai 2.6.2.1 and
>>> kernel 3.0 (freescale branch).
>>>
>>> When the system hangs at reboot, it is in an infinite loop in the Xenomai
>>> atomic exchange implementation, with STREX always returning 1:
>>>
>>> __xnarch_xchg
>>> S:0xC0094B2C : ADD      r3,r6,#0x890
>>> S:0xC0094B30 : LDREX    r2,[r3]
>>> S:0xC0094B34 : STREX    r1,r9,[r3]
>>> S:0xC0094B38 : TEQ      r1,#0
>>> S:0xC0094B3C : BNE      {pc}-0xc ; 0xc0094b30
>>>
>>> Does anyone know what is causing the STREX to always return 1 and why it
>>> might get into this state?
>>
>> Normally, strex fails if "something else" stores data in between ldrex
>> and strex. Do you have the full stack trace?
>>
> 
> Thank you for your reply Gilles. Please find the stacktrace below:
> 
> #0 __xnarch_xchg( size = 4, x = 3204479504, ptr = <Value optimised away by
> compiler> ) at atomic_asm.h:79
> #1 xnintr_irq_handler( irq = 260, cookie = (void*) 0xBF0079E8 ) at
> atomic_asm.h:93
> #2 __ipipe_sync_stage() at core.c:1301
> #3 ipipe_suspend_domain() at core.c:856
> #4 __ipipe_walk_pipeline( pos = (struct list_head*) 0xC0463884 ) at
> core.c:797
> #5 __ipipe_handle_irq( irq = 98, flags = 0 ) at ipipe.c:564
> #6 __ipipe_grab_irq( irq = <Value optimised away by compiler>, regs =
> (struct pt_regs*) 0xCC897DD8 ) at ipipe.c:618
> #7 [__irq_svc+0x40]

I do not really see what is going on. However, how come you have a
driver still running while rebooting? Do you not remove the drivers
before rebooting? Also, __irq_svc means you received an interrupt while
in svc mode, so it would be interesting to know what is below
__irq_svc. I believe you can find that out by adding a call to show_stack
(and preferably building the kernel with frame pointers). The trick will be
to avoid getting a show_stack on every interrupt.
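
One way to keep the dump from firing on every interrupt is a one-shot
guard. The following is only a userspace sketch of that idea, under the
assumption that a single dump taken during reboot is enough:
fake_show_stack(), dump_calls and dump_stack_once() are hypothetical
names, and fake_show_stack() merely stands in for the kernel's
show_stack().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Counts how often the (stand-in) stack dump ran. */
int dump_calls;

/* Hypothetical stand-in for the kernel's show_stack(). */
void fake_show_stack(void)
{
    assert(dump_calls == 0);   /* the guard below must make a second call impossible */
    dump_calls++;
}

/* Set by the first interrupt that dumps; never cleared. */
atomic_flag stack_dumped = ATOMIC_FLAG_INIT;

/*
 * Meant to be called from the interrupt path. Dumps the stack at most
 * once, and only while a reboot is in progress. Returns true if this
 * call produced the dump.
 */
bool dump_stack_once(bool rebooting)
{
    if (!rebooting)
        return false;
    if (atomic_flag_test_and_set(&stack_dumped))
        return false;          /* an earlier interrupt already dumped */
    fake_show_stack();
    return true;
}
```

The test-and-set makes the first caller win even if two interrupts race,
so the log gets one trace instead of one per interrupt.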

Regards.

-- 
                                                                Gilles.



* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-20 20:40     ` Gilles Chanteperdrix
@ 2014-01-22  8:31       ` Henri Roosen
  2014-01-22 10:47         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 8+ messages in thread
From: Henri Roosen @ 2014-01-22  8:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai@xenomai.org

On Mon, Jan 20, 2014 at 9:40 PM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> On 01/20/2014 11:02 AM, Henri Roosen wrote:
> > On Mon, Jan 20, 2014 at 10:48 AM, Gilles Chanteperdrix <
> > gilles.chanteperdrix@xenomai.org> wrote:
> >
> >> On 01/20/2014 09:31 AM, Henri Roosen wrote:
> >>> Hi all,
> >>>
> >>> We have the problem that (hot-)rebooting our Xenomai system fails every 1
> >>> out of 10 times.
> >>>
> >>> The system is an ARM iMX6Solo (Cortex-A9) running Xenomai 2.6.2.1 and
> >>> kernel 3.0 (freescale branch).
> >>>
> >>> When the system hangs at reboot, it is in an infinite loop in the Xenomai
> >>> atomic exchange implementation, with STREX always returning 1:
> >>>
> >>> __xnarch_xchg
> >>> S:0xC0094B2C : ADD      r3,r6,#0x890
> >>> S:0xC0094B30 : LDREX    r2,[r3]
> >>> S:0xC0094B34 : STREX    r1,r9,[r3]
> >>> S:0xC0094B38 : TEQ      r1,#0
> >>> S:0xC0094B3C : BNE      {pc}-0xc ; 0xc0094b30
> >>>
> >>> Does anyone know what is causing the STREX to always return 1 and why it
> >>> might get into this state?
> >>
> >> Normally, strex fails if "something else" stores data in between ldrex
> >> and strex. Do you have the full stack trace?
> >>
> >
> > Thank you for your reply Gilles. Please find the stacktrace below:
> >
> > > #0 __xnarch_xchg( size = 4, x = 3204479504, ptr = <Value optimised away by
> > > compiler> ) at atomic_asm.h:79
> > #1 xnintr_irq_handler( irq = 260, cookie = (void*) 0xBF0079E8 ) at
> > atomic_asm.h:93
> > #2 __ipipe_sync_stage() at core.c:1301
> > #3 ipipe_suspend_domain() at core.c:856
> > #4 __ipipe_walk_pipeline( pos = (struct list_head*) 0xC0463884 ) at
> > core.c:797
> > #5 __ipipe_handle_irq( irq = 98, flags = 0 ) at ipipe.c:564
> > #6 __ipipe_grab_irq( irq = <Value optimised away by compiler>, regs =
> > (struct pt_regs*) 0xCC897DD8 ) at ipipe.c:618
> > #7 [__irq_svc+0x40]
>
> I do not really see what is going on. However, how come you have a
> driver still running while rebooting? Do you not remove the drivers
>

Removing the device driver before reboot doesn't help: we still get a
LDREX/STREX lockup in the Xenomai timertick path. I'm still not sure what
actually causes this lockup, but my guess is somewhere late in the
shutdown-sequence the memory (or something else) is put into a state that
makes LDREX/STREX fail all the time. Any interrupt from the Xenomai domain
locks the system then.

As a quick (and dirty!) workaround, skipping __ipipe_handle_irq when
system_state == SYSTEM_REBOOT solves the problem.
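
The shape of that workaround can be sketched in a small userspace model.
Everything here is hypothetical and for illustration only
(model_system_state, model_state, MODEL_NR_IRQS, model_handle_irq()); it
is not the I-pipe code. It also assumes that the state meant above is the
one mainline Linux calls SYSTEM_RESTART.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the kernel's system_state; only the two values needed here. */
enum model_system_state {
    MODEL_SYSTEM_RUNNING,
    MODEL_SYSTEM_RESTART
};

#define MODEL_NR_IRQS 512u   /* hypothetical upper bound for the model */

enum model_system_state model_state = MODEL_SYSTEM_RUNNING;
int irqs_dispatched;         /* interrupts that reached the pipeline */

/*
 * Stand-in for the pipeline entry point. Returns false when the
 * interrupt is dropped because a reboot is already in progress.
 */
bool model_handle_irq(unsigned irq)
{
    assert(irq < MODEL_NR_IRQS);
    if (model_state == MODEL_SYSTEM_RESTART)
        return false;        /* workaround: no domain sees the interrupt */
    irqs_dispatched++;
    return true;
}
```

The guard hides the symptom rather than the cause: interrupts are still
being taken late in the shutdown sequence, they are just ignored.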

So thinking towards a proper solution: is there or should there be any
shutdown/de-initialization of Xenomai and its services in the Linux
shutdown sequence?

> before rebooting? Also, __irq_svc means you received an interrupt while
> in svc mode, so, it would be interesting to know what is below
> __irq_svc. I believe you can know that by adding a call to show_stack
> (and preferably build the kernel with frame pointers). The trick will be
> to avoid getting a show_stack on every interrupt.
>
> Regards.
>
> --
>                                                                 Gilles.
>


* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-22  8:31       ` Henri Roosen
@ 2014-01-22 10:47         ` Gilles Chanteperdrix
  2014-01-22 14:49           ` Henri Roosen
  0 siblings, 1 reply; 8+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-22 10:47 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Xenomai@xenomai.org

On 01/22/2014 09:31 AM, Henri Roosen wrote:
> Removing the device driver before reboot doesn't help: we still get a
> LDREX/STREX lockup in the Xenomai timertick path. I'm still not sure what
> actually causes this lockup, but my guess is somewhere late in the
> shutdown-sequence the memory (or something else) is put into a state that
> makes LDREX/STREX fail all the time. Any interrupt from the Xenomai domain
> locks the system then.
> 
> As a quick (and dirty!) workaround skipping __ipipe_handle_irq when
> system_state == SYSTEM_REBOOT solves the problem.
> 
> So thinking towards a proper solution: is there or should there be any
> shutdown/de-initialization of Xenomai and its services in the Linux
> shutdown sequence?

The simplest solution seems to be to find the Linux code involved, which
probably contains a local_irq_disable() to avoid this situation, and
replace it with hard_local_irq_disable() in the I-pipe case so as to
also block Xenomai interrupts.
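
To illustrate why the two calls differ, here is a toy model of a
two-stage interrupt pipeline. It is a sketch under simplifying
assumptions, not the I-pipe implementation: struct cpu_model and the
model_* functions are hypothetical names. The point it shows is that the
virtualized local_irq_disable() only stalls the Linux stage, while the
hard variant masks interrupts at the CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU under an interrupt pipeline. */
struct cpu_model {
    bool hw_irqs_masked;   /* CPU-level mask: nothing is taken at all */
    bool root_stalled;     /* virtual mask seen only by the Linux stage */
    int head_handled;      /* interrupts run by the real-time stage */
    int root_pending;      /* interrupts logged for Linux, replayed later */
};

/* Models local_irq_disable() with the pipeline: stalls Linux only. */
void model_local_irq_disable(struct cpu_model *c)
{
    assert(c);
    c->root_stalled = true;
}

/* Models hard_local_irq_disable(): masks interrupts at the CPU. */
void model_hard_local_irq_disable(struct cpu_model *c)
{
    assert(c);
    c->hw_irqs_masked = true;
}

/* Raise an interrupt owned by the real-time stage; true if its handler ran. */
bool model_raise_rt_irq(struct cpu_model *c)
{
    assert(c);
    if (c->hw_irqs_masked)
        return false;      /* never enters the pipeline */
    c->head_handled++;     /* runs even though Linux is stalled */
    return true;
}

/* Raise an interrupt owned by Linux; true if its handler ran immediately. */
bool model_raise_linux_irq(struct cpu_model *c)
{
    assert(c);
    if (c->hw_irqs_masked)
        return false;
    if (c->root_stalled) {
        c->root_pending++; /* deferred until Linux unstalls */
        return false;
    }
    return true;
}
```

In this model, after model_local_irq_disable() a Linux interrupt is only
logged, but a real-time interrupt still runs its handler; only
model_hard_local_irq_disable() stops both.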

-- 
                                                                Gilles.



* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-22 10:47         ` Gilles Chanteperdrix
@ 2014-01-22 14:49           ` Henri Roosen
  2014-01-22 20:20             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 8+ messages in thread
From: Henri Roosen @ 2014-01-22 14:49 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai@xenomai.org

On Wed, Jan 22, 2014 at 11:47 AM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> On 01/22/2014 09:31 AM, Henri Roosen wrote:
> > Removing the device driver before reboot doesn't help: we still get a
> > LDREX/STREX lockup in the Xenomai timertick path. I'm still not sure what
> > actually causes this lockup, but my guess is somewhere late in the
> > shutdown-sequence the memory (or something else) is put into a state that
> > makes LDREX/STREX fail all the time. Any interrupt from the Xenomai domain
> > locks the system then.
> >
> > As a quick (and dirty!) workaround skipping __ipipe_handle_irq when
> > system_state == SYSTEM_REBOOT solves the problem.
> >
> > So thinking towards a proper solution: is there or should there be any
> > shutdown/de-initialization of Xenomai and its services in the Linux
> > shutdown sequence?
>
> The simplest solution seems to find the Linux code involved, which
> probably contains a local_irq_disable() to avoid this situation, and
> replace it with hard_local_irq_disable() in the I-pipe case so as to
> also block xenomai interrupts.
>

Great, this seems to work. Thanks Gilles!
I replaced it with local_irq_disable_hw(); that is what you meant, right?

The ARM reboot code also disables the FIQ with local_fiq_disable(). Is it
necessary to also replace this with local_fiq_disable_hw() for the ipipe case?
Or is there no need for that?

Thanks,
Henri


> --
>                                                                 Gilles.
>


* Re: [Xenomai] xnarch_xchg infinite loop
  2014-01-22 14:49           ` Henri Roosen
@ 2014-01-22 20:20             ` Gilles Chanteperdrix
  0 siblings, 0 replies; 8+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-22 20:20 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Xenomai@xenomai.org

On 01/22/2014 03:49 PM, Henri Roosen wrote:
> On Wed, Jan 22, 2014 at 11:47 AM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
> 
>> On 01/22/2014 09:31 AM, Henri Roosen wrote:
>>> Removing the device driver before reboot doesn't help: we still get a
>>> LDREX/STREX lockup in the Xenomai timertick path. I'm still not sure what
>>> actually causes this lockup, but my guess is somewhere late in the
>>> shutdown-sequence the memory (or something else) is put into a state that
>>> makes LDREX/STREX fail all the time. Any interrupt from the Xenomai domain
>>> locks the system then.
>>>
>>> As a quick (and dirty!) workaround skipping __ipipe_handle_irq when
>>> system_state == SYSTEM_REBOOT solves the problem.
>>>
>>> So thinking towards a proper solution: is there or should there be any
>>> shutdown/de-initialization of Xenomai and its services in the Linux
>>> shutdown sequence?
>>
>> The simplest solution seems to find the Linux code involved, which
>> probably contains a local_irq_disable() to avoid this situation, and
>> replace it with hard_local_irq_disable() in the I-pipe case so as to
>> also block xenomai interrupts.
>>
> 
> Great, this seems to work. Thanks Gilles!
> I replaced with local_irq_disable_hw(), that is what you meant right?
> 
> The ARM reboot code also disables the fiq with local_fiq_disable(). Is it
> necessary to also change this by local_fiq_disable_hw() for the ipipe case?
> Or is there no need for that?

Right, hard_local_irq_disable is for I-pipe patches for 3.2 and later
kernels. Yes, if local_fiq_disable_hw exists, you should call it instead
of local_fiq_disable.

Regards.

-- 
                                                                Gilles.

