* [Xenomai-core] kernel threads crash - possible race condition?
From: Jesper Christensen @ 2011-04-14 13:04 UTC
To: xenomai@xenomai.org

I wrote earlier about some problems with stack corruption when running
Xenomai on ppc. I have found that if I disable hardware interrupts
while running "rthal_thread_switch", the problem seems to disappear
somewhat. I saw a crash yesterday after running for 3 hours, and I'm
currently running a test (it has been running for 3 hours). Usually it
would fail after 30-40 minutes.

My question is: could there be a problem if we receive an interrupt
between updating the stack pointer and updating the sprg3 register with
the new thread pointer?

/Jesper

* Re: [Xenomai-core] kernel threads crash - possible race condition?
From: Philippe Gerum @ 2011-04-14 13:31 UTC
To: Jesper Christensen; +Cc: xenomai@xenomai.org

On Thu, 2011-04-14 at 15:04 +0200, Jesper Christensen wrote:
> I wrote about some problems concerning stack corruption when running
> xenomai on ppc. I have found out that if I disable hardware interrupts
> while running "rthal_thread_switch" the problem seems to disappear
> somewhat. I saw a crash yesterday after running for 3 hours, and I'm
> currently running a test (has been running for 3 hours). Usually it
> would fail after 30-40 minutes. My question is: could there be a problem
> if we receive an interrupt between updating the stack pointer and the
> sprg3 register with the new thread pointer?

Normally, there should not be any issue (famous last words), since we
would run Xenomai-only code over the preempted context, and we don't
depend on SPRG3 to fetch the current phys address. In fact, at this
stage we simply don't care about the Linux context; we only refer to
the current Xenomai thread, which is obtained differently.

Try switching off CONFIG_XENO_HW_UNLOCKED_SWITCH in the "machine"
config area. If this ends up being rock-solid, that would be a hint
that something may be fishy in this area. Raising your k-thread stack
sizes in a separate test may be worth checking too, if not already
done.

-- 
Philippe.

* Re: [Xenomai-core] kernel threads crash - possible race condition?
From: Jesper Christensen @ 2011-04-14 13:46 UTC
To: Philippe Gerum; +Cc: xenomai@xenomai.org

Actually I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
whole time, and I also raised the stack size from 4k to 8k. I do however
think there could be some fishiness in entry_32.S. In
"transfer_to_handler", SPRN_SPRG3 is used to check for stack overflow (at
least in my kernel, 2.6.29.6), but I must admit I haven't seen any stack
overflow report in the kernel log.

/Jesper
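
The window asked about in this thread can be sketched as a toy model. The code below is not Xenomai or kernel code: `model_cpu`, `model_switch` and the other names are hypothetical, `sp` merely stands in for r1 and `cur` for the thread pointer kept in SPRG3, and the range check is a simplification of the kind of stack check an interrupt entry path might do. It only shows that an interrupt delivered between the two updates can observe a stack pointer that does not belong to the thread it believes is current, and that masking interrupts across both updates closes that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in Xenomai or Linux. */
struct model_thread {
    unsigned long stack_base;   /* lowest valid stack address */
    unsigned long stack_size;   /* size of the stack in bytes */
};

struct model_cpu {
    unsigned long sp;                 /* stands in for r1 */
    const struct model_thread *cur;   /* stands in for SPRG3 */
    bool irqs_masked;                 /* stands in for MSR[EE] cleared */
    bool irq_pending;                 /* an interrupt is waiting */
    int bad_checks;                   /* entries that saw sp outside cur's stack */
};

/* Simplified stand-in for a stack check done on interrupt entry. */
static void model_irq_entry(struct model_cpu *cpu)
{
    unsigned long lo = cpu->cur->stack_base;
    unsigned long hi = lo + cpu->cur->stack_size;

    if (cpu->sp < lo || cpu->sp >= hi)
        cpu->bad_checks++;
}

/* Deliver the pending interrupt here, unless interrupts are masked. */
static void model_irq_window(struct model_cpu *cpu)
{
    if (cpu->irq_pending && !cpu->irqs_masked) {
        cpu->irq_pending = false;
        model_irq_entry(cpu);
    }
}

/* Two-step switch: new stack pointer first, thread pointer second. */
static void model_switch(struct model_cpu *cpu,
                         const struct model_thread *next, bool mask)
{
    bool saved = cpu->irqs_masked;

    assert(next != NULL);
    if (mask)
        cpu->irqs_masked = true;

    cpu->sp = next->stack_base + next->stack_size - 16;  /* step 1 */
    model_irq_window(cpu);        /* the window between the two updates */
    cpu->cur = next;                                      /* step 2 */

    cpu->irqs_masked = saved;
    model_irq_window(cpu);        /* a held-off interrupt is taken here */
}

/* Run one switch with an interrupt pending; return the mismatch count. */
int model_run(bool mask)
{
    struct model_thread a = { 0x1000, 0x2000 };
    struct model_thread b = { 0x8000, 0x2000 };
    struct model_cpu cpu = {
        .sp = a.stack_base + a.stack_size - 16,
        .cur = &a,
        .irqs_masked = false,
        .irq_pending = true,
        .bad_checks = 0,
    };

    model_switch(&cpu, &b, mask);
    return cpu.bad_checks;
}
```

In this model, `model_run(false)` reports one inconsistent check because the interrupt lands between the two updates, while `model_run(true)` reports none because the interrupt is only taken once both values match.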

* Re: [Xenomai-core] kernel threads crash - possible race condition?
From: Philippe Gerum @ 2011-04-14 14:09 UTC
To: Jesper Christensen; +Cc: xenomai@xenomai.org

On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote:
> Actually I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
> whole time

You mean enabled?

> and I also raised the stack size from 4k to 8k. I do however
> think there could be some fishiness in entry_32.S. In
> "transfer_to_handler" SPRN_SPRG3 is used to check for stack overflow (at
> least in my kernel 2.6.29.6), but I must admit I haven't seen any of
> that in the kernel log.

Mmm, you are right. In any case, what we want with the unmasked switch
feature is to allow interrupts while we flush the TLB and set the new mm
context, which may be lengthy on some low-end platforms. Allowing the
switch code to be preempted during the register swap is of no use wrt
latency.

Do you have a patch at hand which you could post that flips MSR_EE in
rthal_thread_switch already?

-- 
Philippe.
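
The latency argument above can be illustrated with a back-of-the-envelope model. This is a sketch only, not Xenomai code: the phase names are paraphrased from the message, and the cost figures (50 and 2 time units) are made-up assumptions with no measured basis. The point is simply that the longest interrupts-off window is dominated by the TLB/mm work, so masking just the register swap adds very little.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one phase of a context switch. */
struct phase {
    const char *name;
    unsigned cost;     /* arbitrary time units, assumed for illustration */
    int masked;        /* nonzero if hardware interrupts are off */
};

/* Longest run of consecutive masked phases: the worst-case delay the
 * switch can add to an interrupt arriving at the worst moment. */
unsigned longest_masked_window(const struct phase *p, size_t n)
{
    unsigned worst = 0, run = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        run = p[i].masked ? run + p[i].cost : 0;
        if (run > worst)
            worst = run;
    }
    return worst;
}

/* Whole switch runs with interrupts off. */
unsigned model_locked(void)
{
    const struct phase p[] = {
        { "flush TLB / set mm context", 50, 1 },
        { "register swap",               2, 1 },
    };
    return longest_masked_window(p, sizeof p / sizeof p[0]);
}

/* Interrupts stay on during the mm work, off only for the swap. */
unsigned model_unlocked_masked_swap(void)
{
    const struct phase p[] = {
        { "flush TLB / set mm context", 50, 0 },
        { "register swap",               2, 1 },
    };
    return longest_masked_window(p, sizeof p / sizeof p[0]);
}
```

With these assumed costs, `model_locked()` yields a 52-unit masked window, while `model_unlocked_masked_swap()` yields 2 units.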

* Re: [Xenomai-core] kernel threads crash - possible race condition?
From: Jesper Christensen @ 2011-04-14 14:16 UTC
To: Philippe Gerum; +Cc: xenomai@xenomai.org

On 2011-04-14 16:09, Philippe Gerum wrote:
> On Thu, 2011-04-14 at 15:46 +0200, Jesper Christensen wrote:
>> Actually I have been running with CONFIG_XENO_HW_UNLOCKED_SWITCH the
>> whole time
>
> You mean enabled?

Disabled, sorry.

>> and I also raised the stack size from 4k to 8k. I do however
>> think there could be some fishiness in entry_32.S. In
>> "transfer_to_handler" SPRN_SPRG3 is used to check for stack overflow (at
>> least in my kernel 2.6.29.6), but I must admit I haven't seen any of
>> that in the kernel log.
>
> Mmm, you are right. In any case, what we want with the unmasked switch
> feature is to allow interrupts while we flush the tlb and set the new mm
> context, which may be lengthy on some low end platforms. Allowing the
> switch code to be preempted during the register swap is of no use wrt
> latency.
>
> Do you have a patch at hand which you could post that flips MSR_EE in
> rthal_thread_switch already?

This patch protects the whole call to rthal_thread_switch; it should
instead flip the bit inside the function, as you suggest.

diff --git a/include/asm-powerpc/bits/pod.h b/include/asm-powerpc/bits/pod.h
old mode 100644
new mode 100755
index 6269907..e279647
--- a/include/asm-powerpc/bits/pod.h
+++ b/include/asm-powerpc/bits/pod.h
@@ -106,6 +106,7 @@ static inline void xnarch_switch_to(xnarchtcb_t *out_tcb,
 	struct mm_struct *prev_mm = out_tcb->active_mm, *next_mm;
 	struct task_struct *prev = out_tcb->active_task;
 	struct task_struct *next = in_tcb->user_task;
+	unsigned long flags;
 
 	if (likely(next != NULL)) {
 		in_tcb->active_task = next;
@@ -156,12 +157,14 @@ static inline void xnarch_switch_to(xnarchtcb_t *out_tcb,
 #endif /* PPC32 */
 #endif /* !__IPIPE_FEATURE_HARDENED_SWITCHMM */
 
+	rthal_local_irq_save_hw(flags);
 #ifdef CONFIG_PPC64
 	rthal_thread_switch(out_tcb->tsp, in_tcb->tsp, next == NULL);
 #else
 	rthal_thread_switch(out_tcb->tsp, in_tcb->tsp);
 #endif
 	barrier();
+	rthal_local_irq_restore_hw(flags);
 }