* [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Lennart Sorensen @ 2011-12-23 17:33 UTC
To: xenomai

After spending quite a while trying to explain how things like /bin/echo could possibly segfault, I finally discovered that the new feature in xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible context switches is what is corrupting the state of random linux processes once in a while.

After turning the option off, I haven't seen a single crash, just like 2.4.10.

So something subtle is wrong with this option.

It appears to be most likely to occur (possibly only likely) when xenomai is handling interrupts.

It seems that getting an interrupt in the middle of a context switch at the wrong time corrupts the process that is being switched to or from (no idea which it is).

Unless someone can think of a way to track down and fix this, I would certainly suggest making the option off by default instead of on.

With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

--
Len Sorensen
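For readers who want to confirm which setting a given kernel build actually uses, here is a minimal sketch that scans the text of a kernel .config file for the option named above. It relies only on the usual kernel .config conventions (`CONFIG_FOO=y` for an enabled option, `# CONFIG_FOO is not set` for a disabled one). The function name `option_state` and the sample text are illustrative assumptions, not part of Xenomai or the kernel tooling.

```python
import re

# Option name taken from the thread; everything else here is illustrative.
OPTION = "CONFIG_XENO_HW_UNLOCKED_SWITCH"


def option_state(config_text, option=OPTION):
    """Return the option's value as found in kernel .config text.

    Returns the literal value (for example "y") when the option is set,
    "n" when the file says "# <option> is not set", and None when the
    option is not mentioned at all. The last matching line wins.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    state = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        match = set_re.match(line)
        if match:
            state = match.group(1)
        elif unset_re.match(line):
            state = "n"
    return state


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of a real .config.
    sample = "CONFIG_PPC=y\n# CONFIG_XENO_HW_UNLOCKED_SWITCH is not set\n"
    print(OPTION, "->", option_state(sample))
```

To check a real build, read the .config of the kernel tree in question and pass its contents to `option_state`; a result of "y" means preemptible context switches are compiled in.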
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2011-12-23 18:17 UTC
To: Lennart Sorensen
Cc: xenomai

On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo could possibly segfault, I finally discovered that the new feature in xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible context switches is what is corrupting the state of random linux processes once in a while.
>
> After turning the option off, I haven't seen a single crash, just like 2.4.10.
>
> So something subtle is wrong with this option.
>
> It appears to be most likely to occur (possibly only likely) when xenomai is handling interrupts.
>
> It seems that getting an interrupt in the middle of a context switch at the wrong time corrupts the process that is being switched to or from (no idea which it is).
>
> Unless someone can think of a way to track down and fix this, I would certainly suggest making the option off by default instead of on.

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Which kernel version, what ppc hardware?

--
Philippe.
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Lennart Sorensen @ 2011-12-23 18:32 UTC
To: Philippe Gerum
Cc: xenomai

On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
> Papering over a bug this way is certainly not an option.

Long term it certainly isn't.

> Which kernel version, what ppc hardware?

3.0.13, 3.0.9, 3.0.8. mpc8360e.

xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04

--
Len Sorensen
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2011-12-23 20:08 UTC
To: Lennart Sorensen
Cc: xenomai

On 12/23/2011 07:32 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
>
> Long term it certainly isn't.
>
>> Which kernel version, what ppc hardware?
>
> 3.0.13, 3.0.9, 3.0.8. mpc8360e.
>
> xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04

Do you have a typical test scenario which triggers this bug?

--
Philippe.
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Lennart Sorensen @ 2011-12-23 20:25 UTC
To: Philippe Gerum
Cc: xenomai

On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
> Do you have a typical test scenario which triggers this bug?

It can take a couple of hours under pretty heavy load to get one occurrence. But with preemptible context switches off we haven't seen any in a week.

For sure xenomai tasks are handling interrupts quite a lot at the time.

I wish we had a simple test case to show it, but it seems to require triggering an interrupt in the middle of a context switch at exactly the wrong place.

--
Len Sorensen
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2011-12-23 21:48 UTC
To: Lennart Sorensen
Cc: xenomai

On 12/23/2011 09:25 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
>
> It can take a couple of hours under pretty heavy load to get one occurrence. But with preemptible context switches off we haven't seen any in a week.
>
> For sure xenomai tasks are handling interrupts quite a lot at the time.
>
> I wish we had a simple test case to show it, but it seems to require triggering an interrupt in the middle of a context switch at exactly the wrong place.

Is it reproducible with the basic latency or cyclic tests if you wait long enough? Running ltp in parallel would trigger a decent load, but sometimes two shell loops forking commands in the background are enough to trigger a variety of issues when something fragile exists in the mmu layer as modified by the I-Pipe.

--
Philippe.
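The "two loops forking commands in the background" idea can be sketched as a small load generator that also records children killed by a signal, since the symptom reported in this thread is short-lived commands such as /bin/echo dying unexpectedly. This is only an illustrative sketch: the helper names `run_once`, `fork_loop` and `parallel_loops` are assumptions made here, and nothing in the thread says this particular load reproduces the bug.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(argv):
    """Run one short-lived command; return the fatal signal number or None."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    # On POSIX, a negative return code -N means the child died from signal N.
    return -proc.returncode if proc.returncode < 0 else None


def fork_loop(argv, iterations):
    """Fork argv repeatedly and count signal deaths by signal name."""
    deaths = {}
    for _ in range(iterations):
        sig = run_once(argv)
        if sig is None:
            continue
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "signal %d" % sig  # number without a symbolic name
        deaths[name] = deaths.get(name, 0) + 1
    return deaths


def parallel_loops(argv, iterations, loops=2):
    """Run several fork loops concurrently and merge their death counts."""
    with ThreadPoolExecutor(max_workers=loops) as pool:
        results = list(pool.map(lambda _: fork_loop(argv, iterations),
                                range(loops)))
    total = {}
    for deaths in results:
        for name, count in deaths.items():
            total[name] = total.get(name, 0) + count
    return total
```

A call such as `parallel_loops(["/bin/echo", "hello"], 100000)` would run two loops side by side while the real-time workload is active. On a healthy system the returned dictionary should stay empty; any SIGSEGV or SIGILL entry for a trivial command is worth investigating.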
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Lennart Sorensen @ 2011-12-23 21:55 UTC
To: Philippe Gerum
Cc: xenomai

On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
> Is it reproducible with the basic latency or cyclic tests if waiting for long enough? Running ltp in parallel would trigger a decent load, but sometimes two shell loops forking commands in the background are enough to trigger a variety of issues when something fragile exists in the mmu layer as modified by the I-Pipe.

Well, we can try after I come back from vacation in a couple of weeks.

--
Len Sorensen
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2011-12-23 21:58 UTC
To: Lennart Sorensen
Cc: xenomai

On 12/23/2011 10:55 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if waiting for long enough? Running ltp in parallel would trigger a decent load, but sometimes two shell loops forking commands in the background are enough to trigger a variety of issues when something fragile exists in the mmu layer as modified by the I-Pipe.
>
> Well we can try after I come back from vacation in a couple of weeks.

Ok. I will try to reproduce on my side as well.

--
Philippe.
* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2012-01-04 13:34 UTC
To: Lennart Sorensen
Cc: xenomai

On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo could possibly segfault, I finally discovered that the new feature in xenomai 2.6.0 (new when moving from 2.4.10, that is) of having preemptible context switches is what is corrupting the state of random linux processes once in a while.
>
> After turning the option off, I haven't seen a single crash, just like 2.4.10.
>
> So something subtle is wrong with this option.
>
> It appears to be most likely to occur (possibly only likely) when xenomai is handling interrupts.
>
> It seems that getting an interrupt in the middle of a context switch at the wrong time corrupts the process that is being switched to or from (no idea which it is).
>
> Unless someone can think of a way to track down and fix this, I would certainly suggest making the option off by default instead of on.
>
> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

Does the patch below help?

http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=f38d0b2a820104411c5a33636f6dab634a9bffc1

--
Philippe.
* [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Frank Benkert @ 2018-03-21 15:40 UTC
To: xenomai

Sorry for bumping this old topic, but it needs some clarification:

For all of you who find this thread via search engines because you are looking for sporadic crashes on Xenomai PowerPC in relation to task switches:
This patch does not fix the problem, at least not in our case. After several years, the problem suddenly appeared for us, as we are now increasingly using the Ethernet interface on our old product. Maybe the new interrupt load triggers this old bug.

There is a patch in Xenomai 3 which removes the buggy feature because of problems with the MMU.

The only way to fix these sporadic crashes is to disable unlocked switching (CONFIG_XENO_HW_UNLOCKED_SWITCH=n).

See also
http://git.xenomai.org/?p=ipipe.git;a=commit;h=614aa59453dacf7693fbb18229c27676c2803dbb
http://git.xenomai.org/?p=ipipe.git;a=commit;h=04ea520ab96a16ec65529a2efed92c9a4a8bda34
* Re: [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Philippe Gerum @ 2018-03-21 16:40 UTC
To: Frank Benkert, xenomai

On 03/21/2018 04:40 PM, Frank Benkert wrote:
> Sorry for bumping this old topic, but it needs some clarification:
>
> For all of you who find this thread via search engines because you are looking for sporadic crashes on Xenomai PowerPC in relation to task switches:
> This patch does not fix the problem - at least not in our case. After several years, the problem suddenly appeared with us, as we are now increasingly using the Ethernet interface on our old product. Maybe the new interrupt load triggers this old bug.
>
> There is a patch in Xenomai 3 which removes the buggy feature because of problems with the MMU.

No, the commit you are referring to reads as follows:

commit 323824258692a6d175881d18a644b276858b353d
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Sat Nov 14 16:41:13 2015 +0100

    cobalt/powerpc: drop support for unlocked context switch

    This feature never actually brought any measurable gain on powerpc platforms, compared to the complexity of its implementation in the pipeline. It was primarily aimed at reducing latency for interrupt handlers when costly cache and TLB flushes are required to switch context, at the expense of increasing the scheduling latency. It turned out to be counter-productive on common powerpc platforms, with efficient MMUs.

    This feature has been default off for a while now, and 4.1+ pipelines won't provide support for it anymore. Time to drop support from Xenomai too.

This was a decision based on the unfavorable performance vs complexity ratio, not because of any pending bug that could not be fixed.
> The only way to fix these sporadic crashes is to disable the switch (CONFIG_XENO_HW_UNLOCKED_SWITCH=n).

Possibly not, because your reasoning assumes that only the IRQ pipeline might be involved in dealing with unlocked switching, which is wrong. The Xenomai core is involved too, as hinted by the commit log above. If you are actually running the stock 2.6.0 release, another attempt at addressing the random crash issue would be to merge this commit:

commit ffc58d175a4e6f335c0e42946fa45ca984a93ce4
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Wed Jan 4 14:14:11 2012 +0100

    hal/powerpc: plug race in thread context switch

    Since rthal_thread_switch() is entered with hw IRQs enabled when CONFIG_XENO_HW_UNLOCKED_SWITCH is in effect, we ought to mask them around the register swap.

    This is in essence an overdue fix for the issue spotted and solved quite some time ago by Jesper Christensen, see: https://mail.gna.org/public/xenomai-core/2011-04/msg00095.html.

    Configurations with CONFIG_XENO_HW_UNLOCKED_SWITCH disabled are immune to this issue.

--
Philippe.
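To make the phrase "mask them around the register swap" concrete, here is a deliberately simplified toy model written for this archive. It is not the rthal_thread_switch() code and makes no claim about the exact mechanism of the real bug; the classes `Thread` and `ToyCPU`, the two-register state and the handler `preempting_irq` are all invented for illustration. It only shows the general class of race: if an interrupt handler that saves context runs while the registers are half swapped, a mix of two threads' state gets stored.

```python
class Thread:
    """A thread with a saved register context (toy: just sp and pc)."""

    def __init__(self, name, sp, pc):
        self.name = name
        self.saved = {"sp": sp, "pc": pc}


class ToyCPU:
    """Toy CPU whose context switch swaps registers one at a time."""

    def __init__(self, first):
        self.regs = dict(first.saved)
        self.current = first
        self.irqs_masked = False
        self.pending_irq = None  # handler to run at the next open window

    def _irq_window(self):
        # A point where a pending interrupt is taken unless IRQs are masked.
        if self.pending_irq and not self.irqs_masked:
            handler, self.pending_irq = self.pending_irq, None
            handler(self)

    def switch(self, nxt, mask_irqs):
        prev = self.current
        if mask_irqs:
            self.irqs_masked = True
        prev.saved = dict(self.regs)      # save outgoing context
        for reg in ("sp", "pc"):          # load incoming context stepwise
            self.regs[reg] = nxt.saved[reg]
            self._irq_window()            # an IRQ may land mid-swap
        self.current = nxt
        if mask_irqs:
            self.irqs_masked = False
            self._irq_window()            # deferred IRQ runs after the swap


def preempting_irq(cpu):
    # Models a handler that saves the live registers into the thread that
    # the CPU still believes is current.
    cpu.current.saved = dict(cpu.regs)


def demo(mask_irqs):
    """Switch A -> B with an IRQ pending; return both saved contexts."""
    a = Thread("A", sp=0x1000, pc=0x10)
    b = Thread("B", sp=0x2000, pc=0x20)
    cpu = ToyCPU(a)
    cpu.pending_irq = preempting_irq
    cpu.switch(b, mask_irqs)
    return a.saved, b.saved


if __name__ == "__main__":
    print("unmasked:", demo(False)[0])  # A's context mixes A and B state
    print("masked:  ", demo(True)[0])   # A's context is intact
```

In the unmasked run, thread A ends up with B's stack pointer and its own program counter, which is the kind of state that would crash a process when it is resumed. With masking, the interrupt is simply taken after the swap has completed.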
* Re: [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
From: Frank Benkert @ 2018-03-22 7:22 UTC
To: xenomai

Hi Philippe,

thanks for responding that fast. I've only just realized that I posted on the wrong list. Sorry for that.

We are currently running Xenomai 2.6.5 and the problem still exists: random crashes of Xenomai and non-Xenomai processes with SIGSEGV and SIGILL at various positions, without any recognisable correlation.

This means that the patch

> commit ffc58d175a4e6f335c0e42946fa45ca984a93ce4
> Author: Philippe Gerum <rpm@xenomai.org>
> Date: Wed Jan 4 14:14:11 2012 +0100
>
> hal/powerpc: plug race in thread context switch

does not fix the problem in our case, sorry.

My recommendation, at least on an old MPC5200 processor, is to disable the unlocked-switch functionality to prevent these crashes. In the meantime, our test systems have been running ten times as long without any abnormalities.

This is what my original post should have mentioned for everyone stumbling over this thread while digging for these random crashes.

Now I understand that removing this feature in Xenomai 3 was not driven by bug reports. My mistake.

Thanks!