[Xenomai-core] kernel threads crash - possible race condition?


* [Xenomai-core] kernel threads crash - possible race condition?
@ 2011-04-14 13:04 Jesper Christensen
  2011-04-14 13:31 ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Jesper Christensen @ 2011-04-14 13:04 UTC (permalink / raw)
  To: xenomai@xenomai.org

I wrote earlier about some problems concerning stack corruption when running
Xenomai on PPC. I have found that if I disable hardware interrupts
while running "rthal_thread_switch", the problem seems to disappear,
at least partly. I saw a crash yesterday after 3 hours of running, and I'm
currently running another test (it has been up for 3 hours so far). Usually it
would fail after 30-40 minutes. My question is: could there be a problem
if we receive an interrupt between updating the stack pointer and updating
the SPRG3 register with the new thread pointer?
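
A minimal userspace C model of the window described above may help frame the question. It assumes, as the question implies, that the stack pointer (r1) is written before SPRG3. The structs and functions are hypothetical stand-ins, not Xenomai or kernel code; the real rthal_thread_switch is PowerPC assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not Xenomai or kernel code. */
struct thread_model {
    uintptr_t stack_base;   /* lowest address of this thread's stack */
    uintptr_t stack_top;    /* highest address (initial stack pointer) */
};

struct cpu_model {
    uintptr_t sp;                       /* stands in for r1 */
    const struct thread_model *sprg3;   /* stands in for SPRG3 ("current thread") */
};

/* What an interrupt arriving now would see: is sp inside the stack of
 * the thread that SPRG3 claims is current? */
static bool cpu_state_consistent(const struct cpu_model *cpu)
{
    const struct thread_model *t = cpu->sprg3;
    return cpu->sp > t->stack_base && cpu->sp <= t->stack_top;
}

/* Switch in two separate steps (assumed order: sp first, SPRG3 second).
 * Returns whether an interrupt taken between the two steps would observe
 * a consistent state. */
static bool switch_and_probe_window(struct cpu_model *cpu,
                                    const struct thread_model *next,
                                    uintptr_t next_sp)
{
    cpu->sp = next_sp;                           /* step 1: new stack */
    bool window_ok = cpu_state_consistent(cpu);  /* "interrupt" lands here */
    cpu->sprg3 = next;                           /* step 2: new thread pointer */
    assert(cpu_state_consistent(cpu));           /* consistent again afterwards */
    return window_ok;
}
```

With thread A's stack at 0x1000..0x2000 and thread B's at 0x3000..0x4000, switching to B with sp = 0x3800 makes switch_and_probe_window() return false: an interrupt taken between the two writes sees B's stack paired with A's thread pointer.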

/Jesper


* Re: [Xenomai-core] kernel threads crash - possible race condition?
  2011-04-14 13:04 [Xenomai-core] kernel threads crash - possible race condition? Jesper Christensen
@ 2011-04-14 13:31 ` Philippe Gerum
  2011-04-14 13:46   ` Jesper Christensen
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2011-04-14 13:31 UTC (permalink / raw)
  To: Jesper Christensen; +Cc: xenomai@xenomai.org

On Thu, 2011-04-14 at 15:04 +0200, Jesper Christensen wrote:
> I wrote earlier about some problems concerning stack corruption when running
> Xenomai on PPC. I have found that if I disable hardware interrupts
> while running "rthal_thread_switch", the problem seems to disappear,
> at least partly. I saw a crash yesterday after 3 hours of running, and I'm
> currently running another test (it has been up for 3 hours so far). Usually it
> would fail after 30-40 minutes. My question is: could there be a problem
> if we receive an interrupt between updating the stack pointer and updating
> the SPRG3 register with the new thread pointer?
> 

Normally, there should not be any issue (famous last words), since we
would run Xenomai-only code over the preempted context, and we don't
depend on SPRG3 to fetch the current phys address. In fact, at this
stage we simply don't care about the Linux context; we only refer to
the current Xenomai thread, which is obtained differently.

Try switching off CONFIG_XENO_HW_UNLOCKED_SWITCH in the "machine"
config area. If this ends up being rock-solid, that would hint
that something may be fishy in this area. In a separate test, raising your
kernel thread stack sizes may be worth checking too, if not already done.

> /Jesper
> 
> 
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core

-- 
Philippe.


* Re: [Xenomai-core] kernel threads crash - possible race condition?
  2011-04-14 13:31 ` Philippe Gerum
@ 2011-04-14 13:46   ` Jesper Christensen
  2011-04-14 14:09     ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Jesper Christensen @ 2011-04-14 13:46 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai@xenomai.org


Actually, I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
whole time, and I also raised the stack size from 4k to 8k. I do, however,
think there could be some fishiness in entry_32.S. In
"transfer_to_handler", SPRN_SPRG3 is used to check for stack overflow (at
least in my 2.6.29.6 kernel), but I must admit I haven't seen any overflow
report in the kernel log.
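
To make the entry_32.S concern concrete, here is a hedged C sketch of a check of the form "interrupted sp <= current thread's stack limit", where the limit is reached through whatever thread SPRG3 names. The struct and function names are hypothetical; the exact assembly and the field name (ksp_limit in some 2.6.2x trees) should be verified against the actual kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each thread records the lowest usable stack address. */
struct thread_limit_model {
    uintptr_t ksp_limit;   /* sp at or below this counts as overflow */
};

/* Shape of the entry-path test "if sp <= limit, the kernel stack overflowed",
 * with the limit fetched through the thread that SPRG3 currently points to. */
static bool looks_like_overflow(uintptr_t sp,
                                const struct thread_limit_model *sprg3_thread)
{
    assert(sprg3_thread != NULL);
    return sp <= sprg3_thread->ksp_limit;
}
```

With two stacks, "low" (limit 0x1000) and "high" (limit 0x3000): if SPRG3 still names the high thread while sp = 0x1800 is already on the low stack, the check reports a spurious overflow. In the opposite direction, sp = 0x2ff0 (just below the high stack's limit) passes against the low thread's limit, so a real overflow goes unnoticed. Either way, the outcome depends on which thread SPRG3 names, not on sp alone.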

/Jesper






* Re: [Xenomai-core] kernel threads crash - possible race condition?
  2011-04-14 13:46   ` Jesper Christensen
@ 2011-04-14 14:09     ` Philippe Gerum
  2011-04-14 14:16       ` Jesper Christensen
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2011-04-14 14:09 UTC (permalink / raw)
  To: Jesper Christensen; +Cc: xenomai@xenomai.org

On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote:
> Actually, I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
> whole time

You mean enabled?

>  and I also raised the stack size from 4k to 8k. I do, however,
> think there could be some fishiness in entry_32.S. In
> "transfer_to_handler", SPRN_SPRG3 is used to check for stack overflow (at
> least in my 2.6.29.6 kernel), but I must admit I haven't seen any overflow
> report in the kernel log.
> 

Mmm, you are right. In any case, what we want with the unmasked switch
feature is to allow interrupts while we flush the TLB and set the new mm
context, which may be lengthy on some low-end platforms. Allowing the
switch code to be preempted during the register swap brings no benefit
latency-wise.

Do you already have a patch at hand that flips MSR_EE in
rthal_thread_switch, which you could post?
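
The split described above (interrupts allowed during the TLB flush and mm switch, masked only across the register swap) can be sketched as follows. A software flag stands in for MSR_EE and interrupt delivery is simulated; every name here is hypothetical, and this is not the actual xnarch_switch_to().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: this flag stands in for MSR_EE (hardware IRQ enable). */
static bool hw_irqs_enabled = true;
static int irqs_taken_during_mm;    /* simulated IRQs served during the mm switch */
static int irqs_taken_during_swap;  /* simulated IRQs that hit the register swap */

/* Save the current enable state, then mask. */
static unsigned long hw_irq_save(void)
{
    unsigned long flags = hw_irqs_enabled;
    hw_irqs_enabled = false;
    return flags;
}

/* Restore whatever state hw_irq_save() recorded. */
static void hw_irq_restore(unsigned long flags)
{
    hw_irqs_enabled = (flags != 0);
}

/* A pending interrupt is only delivered while the enable flag is set. */
static bool irq_delivered(void)
{
    return hw_irqs_enabled;
}

/* Stand-in for the TLB flush and new mm context setup (possibly long). */
static void switch_mm_slow(void)
{
    if (irq_delivered())
        irqs_taken_during_mm++;
}

/* Stand-in for the stack pointer / SPRG3 / register swap. */
static void swap_registers(void)
{
    if (irq_delivered())
        irqs_taken_during_swap++;
}

static void switch_to_sketch(void)
{
    switch_mm_slow();                      /* interrupts may be served here */
    unsigned long flags = hw_irq_save();   /* mask only for the swap */
    swap_registers();
    hw_irq_restore(flags);
}
```

Calling switch_to_sketch() once with the flag initially set counts one interrupt served during the mm switch and none during the register swap, and leaves the flag set again on exit.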


-- 
Philippe.





* Re: [Xenomai-core] kernel threads crash - possible race condition?
  2011-04-14 14:09     ` Philippe Gerum
@ 2011-04-14 14:16       ` Jesper Christensen
  0 siblings, 0 replies; 5+ messages in thread
From: Jesper Christensen @ 2011-04-14 14:16 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai@xenomai.org


On 2011-04-14 16:09, Philippe Gerum wrote:
> On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote:
>   
>> Actually, I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
>> whole time
>>     
> You mean enabled?
>   
Disabled, sorry.
>   
>>  and I also raised the stack size from 4k to 8k. I do, however,
>> think there could be some fishiness in entry_32.S. In
>> "transfer_to_handler", SPRN_SPRG3 is used to check for stack overflow (at
>> least in my 2.6.29.6 kernel), but I must admit I haven't seen any overflow
>> report in the kernel log.
>>
>>     
> Mmm, you are right. In any case, what we want with the unmasked switch
> feature is to allow interrupts while we flush the TLB and set the new mm
> context, which may be lengthy on some low-end platforms. Allowing the
> switch code to be preempted during the register swap brings no benefit
> latency-wise.
>
> Do you already have a patch at hand that flips MSR_EE in
> rthal_thread_switch, which you could post?
>
>   
This one protects the whole call; a proper fix should flip the bit inside
rthal_thread_switch, as you suggest.

diff --git a/include/asm-powerpc/bits/pod.h b/include/asm-powerpc/bits/pod.h
index 6269907..e279647 100644
--- a/include/asm-powerpc/bits/pod.h
+++ b/include/asm-powerpc/bits/pod.h
@@ -106,6 +106,7 @@ static inline void xnarch_switch_to(xnarchtcb_t *out_tcb,
        struct mm_struct *prev_mm = out_tcb->active_mm, *next_mm;
        struct task_struct *prev = out_tcb->active_task;
        struct task_struct *next = in_tcb->user_task;
+       unsigned long flags;
 
        if (likely(next != NULL)) {
                in_tcb->active_task = next;
@@ -156,12 +157,14 @@ static inline void xnarch_switch_to(xnarchtcb_t *out_tcb,
 #endif /* PPC32 */
 #endif /* !__IPIPE_FEATURE_HARDENED_SWITCHMM */
 
+       rthal_local_irq_save_hw(flags);
 #ifdef CONFIG_PPC64
        rthal_thread_switch(out_tcb->tsp, in_tcb->tsp, next == NULL);
 #else
        rthal_thread_switch(out_tcb->tsp, in_tcb->tsp);
 #endif
        barrier();
+       rthal_local_irq_restore_hw(flags);
 }
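
The patch pairs rthal_local_irq_save_hw() with rthal_local_irq_restore_hw() rather than a plain disable/enable. Assuming these follow the usual save/restore convention, the sketch below shows why that matters: restoring the saved state keeps interrupts masked if the caller already had them masked, whereas an unconditional enable would silently unmask them. All names are hypothetical models, not the rthal API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hardware interrupt-enable bit (MSR_EE). */
static bool ee_bit = true;

/* Model of a save/restore pair: remember the old state, then mask. */
static unsigned long irq_save_model(void)
{
    unsigned long flags = ee_bit;
    ee_bit = false;
    return flags;
}

static void irq_restore_model(unsigned long flags)
{
    ee_bit = (flags != 0);
}

/* Guarding a critical section the way the patch does. */
static void guarded_with_save_restore(void)
{
    unsigned long flags = irq_save_model();
    /* ... register swap would run here with interrupts masked ... */
    irq_restore_model(flags);
}

/* Naive alternative: mask, then unconditionally unmask on exit. */
static void guarded_with_disable_enable(void)
{
    ee_bit = false;
    /* ... register swap would run here with interrupts masked ... */
    ee_bit = true;   /* wrong if the caller had interrupts masked */
}
```

If the caller enters with the flag cleared, guarded_with_save_restore() leaves it cleared, while guarded_with_disable_enable() leaves it set, losing the caller's masked state.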





end of thread, newest message: 2011-04-14 14:16 UTC

Thread overview: 5+ messages
2011-04-14 13:04 [Xenomai-core] kernel threads crash - possible race condition? Jesper Christensen
2011-04-14 13:31 ` Philippe Gerum
2011-04-14 13:46   ` Jesper Christensen
2011-04-14 14:09     ` Philippe Gerum
2011-04-14 14:16       ` Jesper Christensen
