* [Xenomai-help] Handling Linux Signals in primary domain context
@ 2010-06-01 13:50 Tschaeche IT-Services
From: Tschaeche IT-Services @ 2010-06-01 13:50 UTC
To: xenomai
Hi,
we have the following scenario:
A high priority periodic primary domain task (H), which calls
rt_task_suspend(L) in each even period and rt_task_resume(L)
in each odd period on a low priority primary domain task (L).
L-task consumes all available CPU resources (while(1)).
Thus, the rest of each cycle (after H has got the CPU) is used
alternately by L-task, ROOT-task, L-task,...
In our debugging implementation, we send a SIGTRAP to L-task.
H-task notices this because rt_task_suspend(L) returns EINTR.
However, the while(1) in L-task is not interrupted although a SIGTRAP
is pending.
Our workaround could be to send an rt_signal when rt_task_suspend()
returns EINTR and then, in the rt-signal handler, migrate L-task
to the secondary domain (calling rt_task_set_mode(T_PRIMARY, 0, NULL)).
This invokes the Linux scheduler, which then initiates the SIGTRAP
handling in secondary domain context.
Is there a simpler way to get primary domain tasks interrupted
by Linux signals? Xenomai already knows about the pending signal
and, maybe, could initiate the secondary domain switch on a primary scheduler
event.
Thanks,
Olli
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Gilles Chanteperdrix @ 2010-06-01 13:52 UTC
To: Tschaeche IT-Services; +Cc: xenomai
Tschaeche IT-Services wrote:
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.
> [...]
Could you send us a self-contained minimal program which exhibits this
behaviour?
-- 
Gilles.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Gilles Chanteperdrix @ 2010-06-01 13:59 UTC
To: Tschaeche IT-Services; +Cc: xenomai
Tschaeche IT-Services wrote:
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.
That is expected: the automatic migration from primary mode to secondary
mode when receiving a signal only works if the task emits a syscall.
> Is there a simpler way to get primary domain tasks interrupted
> by Linux signals? Xenomai already knows about the pending signal
> and, maybe, could initiate the secondary domain switch on a primary scheduler
> event.
Xenomai tasks are expected to emit system calls from time to time. I am
afraid your use case is somewhat outside what Xenomai was made for.
-- 
Gilles.
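For contrast, under plain Linux a signal does reach a thread that spins without issuing syscalls, because the kernel checks for pending signals whenever it returns to user mode, including after a timer interrupt. The sketch below shows that baseline behaviour with standard POSIX calls only. It does not use Xenomai, and spin_until_alarm() and on_alarm() are names made up for illustration. A Xenomai task in primary mode is not scheduled through that Linux return path, which is consistent with the same kind of loop staying deaf to the signal there.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the handler; volatile so the spin loop re-reads it. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/*
 * Arm a one-shot timer, then spin in user space without any syscall.
 * Returns 1 once the SIGALRM handler has run, -1 on setup failure.
 * Illustrative helper, not part of any Xenomai API.
 */
int spin_until_alarm(long delay_us)
{
    struct sigaction sa;
    struct itimerval it;

    assert(delay_us > 0 && delay_us < 1000000);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    alarm_fired = 0;
    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = delay_us;   /* one shot, no reload interval */
    if (setitimer(ITIMER_REAL, &it, NULL) == -1)
        return -1;

    while (!alarm_fired)
        ;   /* pure user-space loop, comparable to work_l() */

    return 1;
}
```

Calling spin_until_alarm(10000) from an ordinary Linux process should return after roughly 10 ms, whereas the equivalent loop in a primary domain task is the situation this thread is about.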
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Philippe Gerum @ 2010-06-01 14:32 UTC
To: Tschaeche IT-Services; +Cc: xenomai
On Tue, 2010-06-01 at 15:50 +0200, Tschaeche IT-Services wrote:
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.
Using SIGTRAP will badly conflict with GDB. Hope this is ok.
> Is there a simpler way to get primary domain tasks interrupted
> by Linux signals? Xenomai already knows about the pending signal
> and, maybe, could initiate the secondary domain switch on a primary scheduler
> event.
Not in the absence of syscall. We thought about this once already, when
considering how a watchdog preempting a runaway task in primary mode
could force a secondary mode switch: there is no sane and easy solution
to this unfortunately.
If the basic idea is about throttling the activity of the L-task, then
you could use the sporadic server policy (enabled via
pthread_setschedparam_ex()).
-- 
Philippe.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Tschaeche IT-Services @ 2010-06-01 15:54 UTC
To: Philippe Gerum; +Cc: xenomai
[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]
On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> Not in the absence of syscall. We thought about this once already, when
> considering how a watchdog preempting a runaway task in primary mode
> could force a secondary mode switch: there is no sane and easy solution
> to this unfortunately.
This is exactly Sigmatek's problem: Our customers develop code
within our debugging/development environment. We want to catch
this situation (the developer implements a while(1)) with a
watchdog throwing SIGTRAP so that our debugger gets active
and can locate the problem according to the stack frame...
Find attached a separate test case (using SIGTERM, which should
terminate the application). When pressing <ENTER> the system freezes:
work_l() is in the while() loop with a pending signal, and work_h()
can no longer rt_task_suspend() it (the call returns EINTR).
Then, we implement a workaround sending a rt-signal
when rt_task_suspend() returns EINTR. In the rt-signal
handler we explicitly migrate the task to secondary
domain, where linux signal handling is triggered...
Thanks,
Olli
[-- Attachment #2: signal2xenomai.c --]
[-- Type: text/x-csrc, Size: 1996 bytes --]
/* compile with gcc -Wall -D_GNU_SOURCE -lpthread -o thisfile thisfile.c */
/*
 * Test case: a Linux signal sent to a busy-looping primary domain
 * task is never delivered.
 *
 * work_h() alternately suspends and resumes work_l() every period;
 * work_l() spins in while (1) without issuing any syscall.
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>   /* Needed for SIGTERM and pthread_kill() */
#include <sys/mman.h> /* Needed for mlockall() */
#include <limits.h>
#include <pthread.h>
#include "native/task.h"
#define MY_STACK_SIZE (100*1024) /* 100 kB is enough for now. */
static pthread_t ph, pl;
static RT_TASK xh, xl;
static volatile int state = 0;
/* High priority periodic task: toggles work_l() every 1 ms period. */
void *work_h(void *cookie)
{
    if (rt_task_shadow(&xh, "high", 50, 0)) {
        printf("failed to shadow high\n");
        return NULL;
    }
    if (rt_task_set_periodic(&xh, TM_NOW, 1000000)) {
        printf("failed to set high periodic\n");
        return NULL;
    }
    while (1) {
        if (rt_task_wait_period(NULL)) {
            printf("wait_period failed\n");
            break;
        }
        switch (state) {
        case -1:
            if (rt_task_suspend(&xl)) {
                /* work around??? */
            }
            state = 1;
            break;
        case 1:
            rt_task_resume(&xl);
            state = -1;
            break;
        default:
            break;
        }
    }
    return NULL;
}
/* Low priority task: enters primary mode, then spins forever. */
void *work_l(void *cookie)
{
    if (rt_task_shadow(&xl, "low", 25, 0)) {
        printf("failed to shadow low\n");
        return NULL;
    }
    if (rt_task_set_mode(0, T_PRIMARY, NULL)) {
        printf("failed to migrate low\n");
        return NULL;
    }
    state = -1;
    while (1)
        ;
    return NULL;
}
int main(void)
{
    pthread_attr_t threadattr;
    mlockall(MCL_CURRENT | MCL_FUTURE);
    pthread_attr_init(&threadattr);
    pthread_attr_setstacksize(&threadattr, MY_STACK_SIZE);
    pthread_create(&ph, &threadattr, work_h, NULL);
    printf("high prio watchdog started\n");
    pthread_create(&pl, &threadattr, work_l, NULL);
    printf("low prio work started\n");
    printf("Press <ENTER> to send a signal\n");
    getc(stdin);
    pthread_kill(pl, SIGTERM);
    /* you will not get here, because work_l() eats up your CPU */
    printf("Press <ENTER> to finish\n");
    getc(stdin);
    printf("main finished\n");
    return 0;
}
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Tschaeche IT-Services @ 2010-06-01 16:52 UTC
To: Tschaeche IT-Services; +Cc: xenomai
On Tue, Jun 01, 2010 at 05:54:04PM +0200, Tschaeche IT-Services wrote:
> Then, we implement a workaround sending a rt-signal
> when rt_task_suspend() returns EINTR. In the rt-signal
> handler we explicitely migrate the task to secondary
> domain, where linux signal handling is triggered...
This does not work: rt_task_catch() is only allowed for kernel based
tasks :-(
Is there any other possibility to interrupt the task and switch it to
secondary domain?
Thanks,
Olli
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Jan Kiszka @ 2010-06-01 16:58 UTC
To: Tschaeche IT-Services; +Cc: xenomai
Tschaeche IT-Services wrote:
> This is exactly Sigmatek's problem: Our customers develop code
> within our debugging/development environment. We want to catch
> this situation (the developer implements a while(1)) with a
> watchdog throwing SIGTRAP so that our debugger gets active
> and can locate the problem according to the stack frame...
CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
to catch "well-behaving" broken threads via SIGDEBUG and kills the
hopelessly broken rest - system alive again.
You can then debug the former and need to do code review on the latter.
Or you could also try to add some loop-breaking Xenomai syscalls (or
even more clever checks) to library services the code under suspect
usually invokes.
Jan
-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
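Jan's second suggestion, breaking long loops with an occasional call into the system, can be sketched as follows. This is only an illustration in plain C: struct loop_guard, loop_guard_tick(), counting_checkpoint() and run_guarded_loop() are hypothetical names, and the checkpoint here merely counts invocations. In a real application it would be replaced by whatever Xenomai service the library code can afford to call, so that a pending signal gets a chance to trigger the mode switch.

```c
#include <assert.h>
#include <stddef.h>

/* Callback invoked every `interval` iterations (hypothetical hook). */
typedef void (*checkpoint_fn)(void *arg);

struct loop_guard {
    unsigned long interval;   /* iterations between two checkpoints */
    unsigned long count;      /* iterations since the last checkpoint */
    checkpoint_fn checkpoint; /* would wrap a real syscall in practice */
    void *arg;
};

void loop_guard_init(struct loop_guard *g, unsigned long interval,
                     checkpoint_fn fn, void *arg)
{
    assert(g != NULL && interval > 0 && fn != NULL);
    g->interval = interval;
    g->count = 0;
    g->checkpoint = fn;
    g->arg = arg;
}

/* Call once per loop iteration from library code the loop relies on. */
static inline void loop_guard_tick(struct loop_guard *g)
{
    if (++g->count >= g->interval) {
        g->count = 0;
        g->checkpoint(g->arg);
    }
}

/* Stand-in checkpoint: only counts how often it was reached. */
static void counting_checkpoint(void *arg)
{
    unsigned long *hits = arg;
    (*hits)++;
}

/* Run a bounded loop and report how many checkpoints were hit. */
unsigned long run_guarded_loop(unsigned long iterations,
                               unsigned long interval)
{
    struct loop_guard guard;
    unsigned long hits = 0;
    unsigned long i;

    loop_guard_init(&guard, interval, counting_checkpoint, &hits);
    for (i = 0; i < iterations; i++)
        loop_guard_tick(&guard);
    return hits;
}
```

The interval trades overhead against reaction time: a small interval costs more calls per loop, a large one delays the point at which a runaway loop can be interrupted.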
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Gilles Chanteperdrix @ 2010-06-02 8:36 UTC
To: Jan Kiszka; +Cc: xenomai
Jan Kiszka wrote:
> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> hopelessly broken rest - system alive again.
>
> You can then debug the former and need to do code review on the latter.
> Or you could also try to add some loop-breaking Xenomai syscalls (or
> even more clever checks) to library services the code under suspect
> usually invokes.
I am afraid "well-behaving" means emitting syscalls. We have a radical
way to cause a SIGSEGV to be sent to a thread having run amok: set its
PC to an invalid address (after having printed the real PC). gdb will
not be able to print where the program stopped, but should be able to
print the backtrace.
-- 
Gilles.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Jan Kiszka @ 2010-06-02 9:14 UTC
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org
Gilles Chanteperdrix wrote:
> I am afraid "well-behaving" means emitting syscalls. We have a radical
> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> PC to an invalid address (after having printed the real PC). gdb will
> not be able to print where the program stopped, but should be able to
> print the backtrace.
Just discussing this with our customer raised spontaneous interest (due
to the yet unsolved switching issue with non-RT Xenomai threads). I'm
going to look into this, also trying to find some more sophisticated
approaches, e.g. simulating a call to preserve the call trace (which
would make it really useful) or jumping to some helper function that
issues a syscall.
Jan
-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Philippe Gerum @ 2010-06-02 9:15 UTC
To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai
On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> I am afraid "well-behaving" means emitting syscalls. We have a radical
> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> PC to an invalid address (after having printed the real PC). gdb will
> not be able to print where the program stopped, but should be able to
> print the backtrace.
Actually, we could extend this logic and forge a stack frame to return
to the preempted application code via some userland trampoline code,
doing the switch:
[watchdog trigger]
    forge_return_frame(on = regs->sp, to = regs->pc);
    regs->pc = __oops_I_did_it_again;
__oops_I_did_it_again:
    __xn_migrate(LINUX_DOMAIN);
    ret (via forged frame)
The thing is, that this brings in some arch-dep code to forge a stack
frame (like the kernel uses for signals), that should rather live in the
pipeline core.
-- 
Philippe.
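The detour described above can be imitated in user space with the ucontext API. This is a rough analogy under stated assumptions, not the kernel-side mechanism: swapcontext() stands in for the watchdog saving the preempted registers, uc_link plays the role of the forged return frame, and fake_migrate_to_linux() is a hypothetical placeholder for __xn_migrate(LINUX_DOMAIN). It assumes glibc, which provides makecontext() and swapcontext().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ucontext.h>

static ucontext_t interrupted_ctx;  /* stands in for the preempted regs */
static ucontext_t trampoline_ctx;   /* context that runs the trampoline */
static char trampoline_stack[64 * 1024];
static int migrations;              /* counts simulated domain switches */

/* Hypothetical placeholder for __xn_migrate(LINUX_DOMAIN). */
static void fake_migrate_to_linux(void)
{
    migrations++;
}

/* Plays the role of __oops_I_did_it_again in the pseudo-code. */
static void trampoline(void)
{
    fake_migrate_to_linux();
    /* Returning resumes uc_link, i.e. the "forged frame". */
}

/*
 * Detour through the trampoline, then resume right after the
 * swapcontext() call. Returns the number of simulated migrations so
 * far, or -1 on failure.
 */
int detour_via_trampoline(void)
{
    if (getcontext(&trampoline_ctx) == -1)
        return -1;
    trampoline_ctx.uc_stack.ss_sp = trampoline_stack;
    trampoline_ctx.uc_stack.ss_size = sizeof(trampoline_stack);
    trampoline_ctx.uc_link = &interrupted_ctx;  /* where "ret" lands */
    makecontext(&trampoline_ctx, trampoline, 0);

    /* Save the current state and jump into the trampoline. */
    if (swapcontext(&interrupted_ctx, &trampoline_ctx) == -1)
        return -1;

    /* Reached only after the trampoline returned via uc_link. */
    assert(migrations > 0);
    return migrations;
}
```

The difference from the proposal in the thread is that this detour is taken voluntarily by the running code, while the watchdog would have to impose it on a preempted task from kernel context, which is where the arch-dependent frame forging comes in.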
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Jan Kiszka @ 2010-06-02 9:20 UTC
To: Philippe Gerum; +Cc: xenomai@xenomai.org
Philippe Gerum wrote:
> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
>
> [watchdog trigger]
>     forge_return_frame(on = regs->sp, to = regs->pc);
>     regs->pc = __oops_I_did_it_again;
>
> __oops_I_did_it_again:
>     __xn_migrate(LINUX_DOMAIN);
>     ret (via forged frame)
Yep, that's what came to my mind as well. But the __oops_I_did_it_again
part has to reside in user space, no?
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.
Actually, we are then close to enabling signal delivery outside
syscalls...
Jan
-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Philippe Gerum @ 2010-06-02 9:28 UTC
To: Jan Kiszka; +Cc: xenomai@xenomai.org
On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
> part has to reside in user space, no?
Clearly, yes. Either we map this explicitly, or we just make sure to
compile it in each app, and pass its address at skin binding time. Our
text is mlocked anyway.
> Actually, we are then close to enabling signal delivery outside syscalls...
Yes, looks like.
-- 
Philippe.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
From: Gilles Chanteperdrix @ 2010-06-02 9:37 UTC
To: Philippe Gerum; +Cc: Jan Kiszka, xenomai@xenomai.org
Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> The thing is, that this brings in some arch-dep code to forge a stack
>>> frame (like the kernel uses for signals), that should rather live in the
>>> pipeline core.
>> Actually, we are then close to enabling signal delivery outside syscalls...
>
> Yes, looks like.
When thinking about this real-signals thing, I was thinking about
putting the forging code into Xenomai (the code is the same for all
kernel versions, so there is no reason to put it into the I-pipe, and we
may have to emit a special syscall to restore the context when handling
the signal is done). What we need the I-pipe for, however, is to trigger
some event on the way back to user-space.
-- 
Gilles.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 9:37 ` Gilles Chanteperdrix @ 2010-06-02 10:06 ` Philippe Gerum 2010-06-02 10:19 ` Gilles Chanteperdrix 2010-06-02 10:29 ` Gilles Chanteperdrix 0 siblings, 2 replies; 27+ messages in thread From: Philippe Gerum @ 2010-06-02 10:06 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai@xenomai.org On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote: > Philippe Gerum wrote: > > On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote: > >> Philippe Gerum wrote: > >>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: > >>>> Jan Kiszka wrote: > >>>>> Tschaeche IT-Services wrote: > >>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: > >>>>>>> Not in the absence of syscall. We thought about this once already, when > >>>>>>> considering how a watchdog preempting a runaway task in primary mode > >>>>>>> could force a secondary mode switch: there is no sane and easy solution > >>>>>>> to this unfortunately. > >>>>>> This is exactly Sigmatek's problem: Our customers develop code > >>>>>> within our debugging/development environment. We want to catch > >>>>>> this situation (the developer implements a while(1)) with a > >>>>>> watchdog throwing SIGTRAP so that our debugger gets active > >>>>>> and can locate the problem according to the stack frame... > >>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries > >>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the > >>>>> hopelessly broken rest - system alive again. > >>>>> > >>>>> You can then debug the former and need to do code review on the latter. > >>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or > >>>>> even more clever checks) to library services the code under suspect > >>>>> usually invokes. > >>>> I am afraid "well-behaving" means emitting syscalls. 
We have a radical > >>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its > >>>> PC to an invalid address (after having printed the real PC). gdb will > >>>> not be able to print where the program stopped, but should be able to > >>>> print the backtrace. > >>>> > >>> Actually, we could extend this logic and forge a stack frame to return > >>> to the preempted application code via some userland trampoline code, > >>> doing the switch: > >>> > >>> [watchdog trigger] > >>> forge_return_frame(on =regs->sp, to =regs->pc); > >>> regs->pc = __oops_I_did_it_again; > >>> > >>> __oops_I_did_it_again: > >>> __xn_migrate(LINUX_DOMAIN); > >>> ret (via forged frame) > >> Yep, that's what came to my mind as well. But the __oops_I_did_it_again > >> part has to reside in user space, no? > > > > Clearly, yes. Either we map this explictly, or we just make sure to > > compile it in each app, and pass its address at skin binding time. Our > > text is mmlocked anyway. > > > >>> The thing is, that this brings in some arch-dep code to forge a stack > >>> frame (like the kernel uses for signals), that should rather live in the > >>> pipeline core. > >> Actually, we are then close to enabling signal delivery outside syscalls... > >> > > > > Yes, looks like. > > When thinking about this real signals things, I was thinking about > putting the forging code into Xenomai (the code is the same for all > kernel versions, so there is no reason to put it into the I-pipe, and we > may have to emit a special syscall to restore the context when handling > the signal is done). What we need the I-pipe for, however, is to trigger > some event on the way back to user-space. > A reason to have this code in the pipeline core is because we would duplicate the setup_rt_frame code already available from the vanilla kernel. 
It's a bit like xnarch_switch_to: we used to open-code most of it in
our arch-dep code, mostly duplicating the vanilla switch code, but
having switch_mm() ironed out enough - on arm and powerpc at least - to
be callable from the Xenomai domain as well proved to be a serious
relief.

Granted, the signal code is unlikely to change a lot, given its strong
ABI requirements with respect to glibc, but I'm always reluctant to
introduce duplicates at both ends of the system; I would rather factor
out that code and make it available to both domains, if that makes
sense.

-- Philippe.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 10:06 ` Philippe Gerum @ 2010-06-02 10:19 ` Gilles Chanteperdrix 2010-06-02 10:42 ` Philippe Gerum 2010-06-02 10:29 ` Gilles Chanteperdrix 1 sibling, 1 reply; 27+ messages in thread From: Gilles Chanteperdrix @ 2010-06-02 10:19 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, xenomai@xenomai.org Philippe Gerum wrote: > On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote: >> Philippe Gerum wrote: >>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote: >>>> Philippe Gerum wrote: >>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: >>>>>> Jan Kiszka wrote: >>>>>>> Tschaeche IT-Services wrote: >>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: >>>>>>>>> Not in the absence of syscall. We thought about this once already, when >>>>>>>>> considering how a watchdog preempting a runaway task in primary mode >>>>>>>>> could force a secondary mode switch: there is no sane and easy solution >>>>>>>>> to this unfortunately. >>>>>>>> This is exactly Sigmatek's problem: Our customers develop code >>>>>>>> within our debugging/development environment. We want to catch >>>>>>>> this situation (the developer implements a while(1)) with a >>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active >>>>>>>> and can locate the problem according to the stack frame... >>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries >>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the >>>>>>> hopelessly broken rest - system alive again. >>>>>>> >>>>>>> You can then debug the former and need to do code review on the latter. >>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or >>>>>>> even more clever checks) to library services the code under suspect >>>>>>> usually invokes. >>>>>> I am afraid "well-behaving" means emitting syscalls. 
We have a radical >>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its >>>>>> PC to an invalid address (after having printed the real PC). gdb will >>>>>> not be able to print where the program stopped, but should be able to >>>>>> print the backtrace. >>>>>> >>>>> Actually, we could extend this logic and forge a stack frame to return >>>>> to the preempted application code via some userland trampoline code, >>>>> doing the switch: >>>>> >>>>> [watchdog trigger] >>>>> forge_return_frame(on =regs->sp, to =regs->pc); >>>>> regs->pc = __oops_I_did_it_again; >>>>> >>>>> __oops_I_did_it_again: >>>>> __xn_migrate(LINUX_DOMAIN); >>>>> ret (via forged frame) >>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again >>>> part has to reside in user space, no? >>> Clearly, yes. Either we map this explictly, or we just make sure to >>> compile it in each app, and pass its address at skin binding time. Our >>> text is mmlocked anyway. >>> >>>>> The thing is, that this brings in some arch-dep code to forge a stack >>>>> frame (like the kernel uses for signals), that should rather live in the >>>>> pipeline core. >>>> Actually, we are then close to enabling signal delivery outside syscalls... >>>> >>> Yes, looks like. >> When thinking about this real signals things, I was thinking about >> putting the forging code into Xenomai (the code is the same for all >> kernel versions, so there is no reason to put it into the I-pipe, and we >> may have to emit a special syscall to restore the context when handling >> the signal is done). What we need the I-pipe for, however, is to trigger >> some event on the way back to user-space. >> > > A reason to have this code in the pipeline core is because we would > duplicate the setup_rt_frame code already available from the vanilla > kernel. 
It's a bit like xnarch_switch_to: we used to open code most of
> it in our arch-dep code, mostly duplicating the vanilla switch code, but
> having switch_mm() ironed enough - on arm and powerpc at least - to be
> callable from the Xenomai domain as well proved to be a serious relief.
>
> Granted, the signal code is unlikely to change a lot, given the strong
> ABI requirements this has wrt the glibc, but I'm always reluctant to
> introduce duplicates at both ends of the system; I would rather factor
> out that code and make it available to both domains, if that makes
> sense.

I am not sure it really makes sense: the biggest part of the Linux code
sets up the special frame passed as the last void * argument to signal
handlers installed with the SA_SIGINFO option, which allows (among other
things) signal handlers to use setcontext() to implement co-routines,
and I am not sure we really want that. And if you do a major revamping
of the Linux stack frame build functions, you will have merge conflicts
every time you upgrade the I-pipe patch.

Besides, we still have the return-through-syscall issue: returning from
the signal handler cannot be a simple "return" instruction, since we
have to save and restore most registers.

-- Gilles.
^ permalink raw reply [flat|nested] 27+ messages in thread
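As background for the SA_SIGINFO frame mentioned in the message above, here is a minimal user-space sketch in plain POSIX C (not Xenomai code). It only shows that a handler installed with SA_SIGINFO receives a third void * argument, which POSIX documents as pointing to a ucontext_t holding the interrupted context. The names seen_signo, seen_context, on_signal and install_and_raise are made up for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Filled in by the handler; hypothetical names used only in this sketch. */
volatile sig_atomic_t seen_signo;
volatile sig_atomic_t seen_context;

/*
 * SA_SIGINFO handler. The third argument is the "special frame" pointer:
 * POSIX says it can be cast to ucontext_t *. Here we only record that
 * it was provided.
 */
void on_signal(int signo, siginfo_t *info, void *ctx)
{
	(void)info;
	seen_signo = signo;
	seen_context = (ctx != NULL);
}

/* Install the handler for signo, then deliver the signal to ourselves. */
int install_and_raise(int signo)
{
	struct sigaction sa;

	assert(signo > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_signal;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) != 0)
		return -1;

	/* In a single-threaded program the handler runs before raise() returns. */
	return raise(signo);
}
```

Calling install_and_raise(SIGUSR1) from a single-threaded program should leave seen_context set. Building and tearing down that context is the part of the kernel signal code the message argues Xenomai may not need.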
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 10:19 ` Gilles Chanteperdrix @ 2010-06-02 10:42 ` Philippe Gerum 2010-06-02 10:51 ` Gilles Chanteperdrix 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2010-06-02 10:42 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai@xenomai.org On Wed, 2010-06-02 at 12:19 +0200, Gilles Chanteperdrix wrote: > Philippe Gerum wrote: > > On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote: > >> Philippe Gerum wrote: > >>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote: > >>>> Philippe Gerum wrote: > >>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: > >>>>>> Jan Kiszka wrote: > >>>>>>> Tschaeche IT-Services wrote: > >>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: > >>>>>>>>> Not in the absence of syscall. We thought about this once already, when > >>>>>>>>> considering how a watchdog preempting a runaway task in primary mode > >>>>>>>>> could force a secondary mode switch: there is no sane and easy solution > >>>>>>>>> to this unfortunately. > >>>>>>>> This is exactly Sigmatek's problem: Our customers develop code > >>>>>>>> within our debugging/development environment. We want to catch > >>>>>>>> this situation (the developer implements a while(1)) with a > >>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active > >>>>>>>> and can locate the problem according to the stack frame... > >>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries > >>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the > >>>>>>> hopelessly broken rest - system alive again. > >>>>>>> > >>>>>>> You can then debug the former and need to do code review on the latter. > >>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or > >>>>>>> even more clever checks) to library services the code under suspect > >>>>>>> usually invokes. 
> >>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical > >>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its > >>>>>> PC to an invalid address (after having printed the real PC). gdb will > >>>>>> not be able to print where the program stopped, but should be able to > >>>>>> print the backtrace. > >>>>>> > >>>>> Actually, we could extend this logic and forge a stack frame to return > >>>>> to the preempted application code via some userland trampoline code, > >>>>> doing the switch: > >>>>> > >>>>> [watchdog trigger] > >>>>> forge_return_frame(on =regs->sp, to =regs->pc); > >>>>> regs->pc = __oops_I_did_it_again; > >>>>> > >>>>> __oops_I_did_it_again: > >>>>> __xn_migrate(LINUX_DOMAIN); > >>>>> ret (via forged frame) > >>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again > >>>> part has to reside in user space, no? > >>> Clearly, yes. Either we map this explictly, or we just make sure to > >>> compile it in each app, and pass its address at skin binding time. Our > >>> text is mmlocked anyway. > >>> > >>>>> The thing is, that this brings in some arch-dep code to forge a stack > >>>>> frame (like the kernel uses for signals), that should rather live in the > >>>>> pipeline core. > >>>> Actually, we are then close to enabling signal delivery outside syscalls... > >>>> > >>> Yes, looks like. > >> When thinking about this real signals things, I was thinking about > >> putting the forging code into Xenomai (the code is the same for all > >> kernel versions, so there is no reason to put it into the I-pipe, and we > >> may have to emit a special syscall to restore the context when handling > >> the signal is done). What we need the I-pipe for, however, is to trigger > >> some event on the way back to user-space. > >> > > > > A reason to have this code in the pipeline core is because we would > > duplicate the setup_rt_frame code already available from the vanilla > > kernel. 
It's a bit like xnarch_switch_to: we used to open code most of
> > it in our arch-dep code, mostly duplicating the vanilla switch code, but
> > having switch_mm() ironed enough - on arm and powerpc at least - to be
> > callable from the Xenomai domain as well proved to be a serious relief.
> >
> > Granted, the signal code is unlikely to change a lot, given the strong
> > ABI requirements this has wrt the glibc, but I'm always reluctant to
> > introduce duplicates at both ends of the system; I would rather factor
> > out that code and make it available to both domains, if that makes
> > sense.
>
> I am not sure it really makes sense: the biggest part of the linux code
> is used to setup the special frame passed as the last void * pointer of
> signal handlers with the SA_SIGINFO option, allowing (among others)
> signal handlers to use setcontext() to implement co-routines, and I am
> not sure we really want that.

It's not about wanting that; it is about getting it for free even
though we would not use it.

> And if you do some major revamping of
> Linux stack frame build functions, you will have merge conflicts every
> time you upgrade the I-pipe patch.
>

I don't think so, for the same reason that you expect the kernel code
not to change very often in that area.

> Besides, we still have the return through syscall issue: returning from
> the signal handler can not be a simple "return" instruction, since we
> have to save and restore most registers.
>

Sure, but this is not related to the place where you would put the
forging code. You may have a Xenomai syscall invoking a pipeline
service; we do that all the time, actually.

Anyway, this issue is not critical to me. If you can achieve that goal
in plain Xenomai space without ending up with two pages of hairy code
for each arch, then I won't be pigheaded.

-- Philippe.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 10:42 ` Philippe Gerum @ 2010-06-02 10:51 ` Gilles Chanteperdrix 0 siblings, 0 replies; 27+ messages in thread From: Gilles Chanteperdrix @ 2010-06-02 10:51 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, xenomai@xenomai.org Philippe Gerum wrote: > On Wed, 2010-06-02 at 12:19 +0200, Gilles Chanteperdrix wrote: >> Philippe Gerum wrote: >>> On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote: >>>> Philippe Gerum wrote: >>>>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote: >>>>>> Philippe Gerum wrote: >>>>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: >>>>>>>> Jan Kiszka wrote: >>>>>>>>> Tschaeche IT-Services wrote: >>>>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: >>>>>>>>>>> Not in the absence of syscall. We thought about this once already, when >>>>>>>>>>> considering how a watchdog preempting a runaway task in primary mode >>>>>>>>>>> could force a secondary mode switch: there is no sane and easy solution >>>>>>>>>>> to this unfortunately. >>>>>>>>>> This is exactly Sigmatek's problem: Our customers develop code >>>>>>>>>> within our debugging/development environment. We want to catch >>>>>>>>>> this situation (the developer implements a while(1)) with a >>>>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active >>>>>>>>>> and can locate the problem according to the stack frame... >>>>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries >>>>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the >>>>>>>>> hopelessly broken rest - system alive again. >>>>>>>>> >>>>>>>>> You can then debug the former and need to do code review on the latter. >>>>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or >>>>>>>>> even more clever checks) to library services the code under suspect >>>>>>>>> usually invokes. 
>>>>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical >>>>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its >>>>>>>> PC to an invalid address (after having printed the real PC). gdb will >>>>>>>> not be able to print where the program stopped, but should be able to >>>>>>>> print the backtrace. >>>>>>>> >>>>>>> Actually, we could extend this logic and forge a stack frame to return >>>>>>> to the preempted application code via some userland trampoline code, >>>>>>> doing the switch: >>>>>>> >>>>>>> [watchdog trigger] >>>>>>> forge_return_frame(on =regs->sp, to =regs->pc); >>>>>>> regs->pc = __oops_I_did_it_again; >>>>>>> >>>>>>> __oops_I_did_it_again: >>>>>>> __xn_migrate(LINUX_DOMAIN); >>>>>>> ret (via forged frame) >>>>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again >>>>>> part has to reside in user space, no? >>>>> Clearly, yes. Either we map this explictly, or we just make sure to >>>>> compile it in each app, and pass its address at skin binding time. Our >>>>> text is mmlocked anyway. >>>>> >>>>>>> The thing is, that this brings in some arch-dep code to forge a stack >>>>>>> frame (like the kernel uses for signals), that should rather live in the >>>>>>> pipeline core. >>>>>> Actually, we are then close to enabling signal delivery outside syscalls... >>>>>> >>>>> Yes, looks like. >>>> When thinking about this real signals things, I was thinking about >>>> putting the forging code into Xenomai (the code is the same for all >>>> kernel versions, so there is no reason to put it into the I-pipe, and we >>>> may have to emit a special syscall to restore the context when handling >>>> the signal is done). What we need the I-pipe for, however, is to trigger >>>> some event on the way back to user-space. >>>> >>> A reason to have this code in the pipeline core is because we would >>> duplicate the setup_rt_frame code already available from the vanilla >>> kernel. 
It's a bit like xnarch_switch_to: we used to open code most of >>> it in our arch-dep code, mostly duplicating the vanilla switch code, but >>> having switch_mm() ironed enough - on arm and powerpc at least - to be >>> callable from the Xenomai domain as well proved to be a serious relief. >>> >>> Granted, the signal code is unlikely to change a lot, given the strong >>> ABI requirements this has wrt the glibc, but I'm always reluctant to >>> introduce duplicates at both ends of the system; I would rather factor >>> out that code and make it available to both domains, if that makes >>> sense. >> I am not sure it really makes sense: the biggest part of the linux code >> is used to setup the special frame passed as the last void * pointer of >> signal handlers with the SA_SIGINFO option, allowing (among others) >> signal handlers to use setcontext() to implement co-routines, and I am >> not sure we really want that. > > It's not about wanting that, it is about having it for free despite we > would not use it. > >> And if you do some major revamping of >> Linux stack frame build functions, you will have merge conflicts every >> time you upgrade the I-pipe patch. >> > > I don't think so, for the same reason than you suspect that the kernel > code does not change ever so often in that area. > >> Besides, we still have the return through syscall issue: returning from >> the signal handler can not be a simple "return" instruction, since we >> have to save and restore most registers. >> > > Sure, but this is not related to the place where you would put the > forging code. You may have a Xenomai syscall invoking a pipeline > service, we do that all the time actually. Yes, OK. We can do this by implementing a trampoline for signals in user-space. > > Anyway, this issue is not critical to me. If you can achieve that goal > in plain Xenomai space without ending up with a two pages long hairy > code for each arch, then I won't not be pigheaded. 
I have posted what the code would look like from my point of view. It
looks pretty simple and linear to me, though it is two pages long.

-- Gilles.
^ permalink raw reply [flat|nested] 27+ messages in thread
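The forged-frame-plus-trampoline idea quoted throughout this sub-thread can be modelled in a few lines of ordinary C. This is a toy simulation under stated assumptions, not kernel or Xenomai code: struct toy_regs, the stack array and TRAMPOLINE_PC are all hypothetical, and only the pc is saved, whereas (as noted above) a real implementation has to save and restore most registers.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64

/* Hypothetical address standing in for the user-space trampoline. */
#define TRAMPOLINE_PC 0xdead0000ul

/* Toy register file: only the two registers this sketch needs. */
struct toy_regs {
	size_t sp;         /* index into stack[], grows downward */
	unsigned long pc;  /* simulated program counter */
};

unsigned long stack[STACK_WORDS];

/* Watchdog side: push the preempted pc, then redirect to the trampoline. */
void forge_return_frame(struct toy_regs *regs)
{
	assert(regs->sp > 0 && regs->sp <= STACK_WORDS);
	stack[--regs->sp] = regs->pc;
	regs->pc = TRAMPOLINE_PC;
}

/* Trampoline side: the final "ret" pops the saved pc and resumes there. */
void trampoline_return(struct toy_regs *regs)
{
	assert(regs->sp < STACK_WORDS);
	regs->pc = stack[regs->sp++];
}
```

In the real scheme, the domain migration would happen between these two steps, inside the trampoline; the model only shows that the preempted pc survives the detour.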
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 10:06 ` Philippe Gerum 2010-06-02 10:19 ` Gilles Chanteperdrix @ 2010-06-02 10:29 ` Gilles Chanteperdrix 1 sibling, 0 replies; 27+ messages in thread From: Gilles Chanteperdrix @ 2010-06-02 10:29 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, xenomai@xenomai.org Philippe Gerum wrote: > On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote: >> Philippe Gerum wrote: >>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote: >>>> Philippe Gerum wrote: >>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: >>>>>> Jan Kiszka wrote: >>>>>>> Tschaeche IT-Services wrote: >>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: >>>>>>>>> Not in the absence of syscall. We thought about this once already, when >>>>>>>>> considering how a watchdog preempting a runaway task in primary mode >>>>>>>>> could force a secondary mode switch: there is no sane and easy solution >>>>>>>>> to this unfortunately. >>>>>>>> This is exactly Sigmatek's problem: Our customers develop code >>>>>>>> within our debugging/development environment. We want to catch >>>>>>>> this situation (the developer implements a while(1)) with a >>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active >>>>>>>> and can locate the problem according to the stack frame... >>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries >>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the >>>>>>> hopelessly broken rest - system alive again. >>>>>>> >>>>>>> You can then debug the former and need to do code review on the latter. >>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or >>>>>>> even more clever checks) to library services the code under suspect >>>>>>> usually invokes. >>>>>> I am afraid "well-behaving" means emitting syscalls. 
We have a radical >>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its >>>>>> PC to an invalid address (after having printed the real PC). gdb will >>>>>> not be able to print where the program stopped, but should be able to >>>>>> print the backtrace. >>>>>> >>>>> Actually, we could extend this logic and forge a stack frame to return >>>>> to the preempted application code via some userland trampoline code, >>>>> doing the switch: >>>>> >>>>> [watchdog trigger] >>>>> forge_return_frame(on =regs->sp, to =regs->pc); >>>>> regs->pc = __oops_I_did_it_again; >>>>> >>>>> __oops_I_did_it_again: >>>>> __xn_migrate(LINUX_DOMAIN); >>>>> ret (via forged frame) >>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again >>>> part has to reside in user space, no? >>> Clearly, yes. Either we map this explictly, or we just make sure to >>> compile it in each app, and pass its address at skin binding time. Our >>> text is mmlocked anyway. >>> >>>>> The thing is, that this brings in some arch-dep code to forge a stack >>>>> frame (like the kernel uses for signals), that should rather live in the >>>>> pipeline core. >>>> Actually, we are then close to enabling signal delivery outside syscalls... >>>> >>> Yes, looks like. >> When thinking about this real signals things, I was thinking about >> putting the forging code into Xenomai (the code is the same for all >> kernel versions, so there is no reason to put it into the I-pipe, and we >> may have to emit a special syscall to restore the context when handling >> the signal is done). What we need the I-pipe for, however, is to trigger >> some event on the way back to user-space. >> > > A reason to have this code in the pipeline core is because we would > duplicate the setup_rt_frame code already available from the vanilla > kernel. 
It's a bit like xnarch_switch_to: we used to open code most of
> it in our arch-dep code, mostly duplicating the vanilla switch code, but
> having switch_mm() ironed enough - on arm and powerpc at least - to be
> callable from the Xenomai domain as well proved to be a serious relief.
>
> Granted, the signal code is unlikely to change a lot, given the strong
> ABI requirements this has wrt the glibc, but I'm always reluctant to
> introduce duplicates at both ends of the system; I would rather factor
> out that code and make it available to both domains, if that makes
> sense.

I had even written a piece of code for x86 (completely untested).

#include <linux/version.h>
#include <asm/ptrace.h>

/* EFLAGS bits that user-space is allowed to change across the callback. */
#define __FIX_EFLAGS	(X86_EFLAGS_AC | X86_EFLAGS_OF | \
			 X86_EFLAGS_DF | X86_EFLAGS_TF | X86_EFLAGS_SF | \
			 X86_EFLAGS_ZF | X86_EFLAGS_AF | X86_EFLAGS_PF | \
			 X86_EFLAGS_CF)

#ifdef CONFIG_X86_32
# define FIX_EFLAGS	(__FIX_EFLAGS | X86_EFLAGS_RF)
#else
# define FIX_EFLAGS	__FIX_EFLAGS
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 11)
#define hal_fpu_init_p(task)	((task)->used_math)
#define hal_set_fpu_init(task)	((task)->used_math = 1)
#else
#define hal_fpu_init_p(task)	tsk_used_math(task)
#define hal_set_fpu_init(task)	set_stopped_child_used_math(task)
#endif

/* Push a chunk of kernel data onto the user stack described by regs. */
void __user *hal_push(struct pt_regs *regs, void *chunk, size_t size)
{
	unsigned long sp = regs->sp;

	sp -= size;
	if (__xn_copy_to_user((void __user *)sp, chunk, size))
		return ERR_PTR(-EFAULT);
	regs->sp = sp;

	return (void __user *)sp;
}

#ifdef CONFIG_X86_32
/* Frame built on the user stack before jumping to the callback. */
struct sigtest_sigframe {
	u32 pretcoder;		/* return address seen by the callback */
	void *arg1;
	void *arg2;
	void __user *math;	/* saved FPU state, or NULL */
	struct pt_regs regs;	/* registers of the preempted context */
};

static unsigned long align_sigframe(unsigned long sp)
{
	return ((sp + 4) & -16ul) - 4;
}

void hal_save_fpu(x86_fpustate *fpup)
{
	if (cpu_has_fxsr)
		__asm__ __volatile__("fxsave %0; fnclex":"=m"(*fpup));
	else
		__asm__ __volatile__("fnsave %0; fwait":"=m"(*fpup));
}

void hal_restore_fpu(x86_fpustate *fpup)
{
	clts();
	if (cpu_has_fxsr)
		__asm__ __volatile__("fxrstor %0": /* no output */
				     :"m"(*fpup));
	else
		__asm__ __volatile__("frstor %0": /* no output */
				     :"m"(*fpup));
}

void hal_init_fpu(void)
{
	__asm__ __volatile__("clts; fninit");
	if (cpu_has_xmm) {
		unsigned long __mxcsr = 0x1f80UL & 0xffbfUL;
		__asm__ __volatile__("ldmxcsr %0"::"m"(__mxcsr));
	}
}

/* Build the frame on the user stack and redirect the thread to cb. */
int hal_trigger_cb(struct pt_regs *regs, x86_fpustate *fpup,
		   void __user *cb, void __user *ret,
		   void *arg1, void *arg2)
{
	struct sigtest_sigframe __user *frame;
	struct sigtest_sigframe k_frame;	/* kernel-side copy of the frame head */
	unsigned long sp = regs->sp;
	unsigned long flags;

	local_irq_save_hw(flags);
	if (wrap_test_fpu_used(current) || hal_fpu_init_p(current)) {
		if (wrap_test_fpu_used(current)) {
			hal_save_fpu(fpup);
			wrap_clear_fpu_used(current);
		}
		/* Make room for the FPU state below the current stack pointer. */
		sp -= sizeof(*fpup);
		if (__xn_copy_to_user((void __user *)sp, fpup, sizeof(*fpup))) {
			local_irq_restore_hw(flags);
			return -EFAULT;
		}
		k_frame.math = (void __user *)sp;
	} else
		k_frame.math = NULL;
	local_irq_restore_hw(flags);

	sp = align_sigframe(sp - sizeof(*frame));
	frame = (struct sigtest_sigframe __user *)sp;
	k_frame.pretcoder = (u32)(unsigned long)ret;
	k_frame.arg1 = arg1;
	k_frame.arg2 = arg2;

	/* Copy everything up to, but not including, the regs member. */
	if (__xn_copy_to_user(frame, &k_frame,
			      offsetof(struct sigtest_sigframe, regs)))
		return -EFAULT;

	if (__xn_copy_to_user(&frame->regs, regs, sizeof(*regs)))
		return -EFAULT;

	regs->sp = sp;
	regs->ip = (unsigned long)cb;
	regs->ax = (unsigned long)arg1;
	regs->dx = (unsigned long)arg2;
	regs->cx = 0;

	regs->ds = __USER_DS;
	regs->es = __USER_DS;
	regs->ss = __USER_DS;
	regs->cs = __USER_CS;

	return 0;
}

/* Undo hal_trigger_cb(): reload the registers saved in the frame. */
int hal_restore_regs(struct pt_regs *regs, x86_fpustate *fpup)
{
	struct sigtest_sigframe __user *frame;
	unsigned long orig_flags;
	unsigned long flags;
	void __user *math;

	frame = (struct sigtest_sigframe __user *)(regs->sp - 8);
	orig_flags = regs->flags;

	if (__xn_copy_from_user(&math, &frame->math, sizeof(math)))
		return -EFAULT;

	if (__xn_copy_from_user(regs, &frame->regs, sizeof(*regs)))
		return -EFAULT;

	set_user_gs(regs, regs->gs);
	regs->cs |= 3;
	regs->ss |= 3;
	/* Only take the user-changeable bits from the saved flags. */
	regs->flags = (orig_flags & ~FIX_EFLAGS) | (regs->flags & FIX_EFLAGS);

	local_irq_save_hw(flags);
	if (math) {
		if (__xn_copy_from_user(fpup, math, sizeof(*fpup))) {
			local_irq_restore_hw(flags);
			return -EFAULT;
		}
		hal_restore_fpu(fpup);
	} else if (hal_fpu_init_p(current)) {
		/* sighandler used fpu, restore the init state. */
		hal_init_fpu();
		wrap_set_fpu_used(current);
	}
	local_irq_restore_hw(flags);

	return 0;
}
#else /* CONFIG_X86_64 */
#endif /* CONFIG_X86_64 */

>

-- Gilles.
^ permalink raw reply [flat|nested] 27+ messages in thread
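Two bits of arithmetic in the code above are easy to check in isolation. The sketch below is ordinary user-space C with made-up names (TOY_FIX_EFLAGS, merge_flags). The flag constants are the architectural x86 EFLAGS bit positions, and the mask mirrors the 32-bit FIX_EFLAGS minus RF, which is an arbitrary simplification for the example. The first function reproduces align_sigframe(), which picks a frame address such that (sp + 4) is 16-byte aligned, as the i386 ABI expects at function entry; the second reproduces the flag merge done on the way back, which takes only the whitelisted bits from the user-supplied value.

```c
#include <assert.h>

/* Architectural x86 EFLAGS bits (same values as the kernel's X86_EFLAGS_*). */
#define TOY_EFLAGS_CF 0x00000001ul
#define TOY_EFLAGS_PF 0x00000004ul
#define TOY_EFLAGS_AF 0x00000010ul
#define TOY_EFLAGS_ZF 0x00000040ul
#define TOY_EFLAGS_SF 0x00000080ul
#define TOY_EFLAGS_TF 0x00000100ul
#define TOY_EFLAGS_DF 0x00000400ul
#define TOY_EFLAGS_OF 0x00000800ul
#define TOY_EFLAGS_AC 0x00040000ul

/* Bits a user frame may change; everything else keeps its old value. */
#define TOY_FIX_EFLAGS (TOY_EFLAGS_AC | TOY_EFLAGS_OF | TOY_EFLAGS_DF | \
			TOY_EFLAGS_TF | TOY_EFLAGS_SF | TOY_EFLAGS_ZF | \
			TOY_EFLAGS_AF | TOY_EFLAGS_PF | TOY_EFLAGS_CF)

/* Round sp down so that (result + 4) is a multiple of 16. */
unsigned long align_sigframe(unsigned long sp)
{
	assert(sp >= 16); /* keep the toy arithmetic away from wrap-around */
	return ((sp + 4) & -16ul) - 4;
}

/* Keep privileged bits from orig, take whitelisted bits from from_user. */
unsigned long merge_flags(unsigned long orig, unsigned long from_user)
{
	return (orig & ~TOY_FIX_EFLAGS) | (from_user & TOY_FIX_EFLAGS);
}
```

For instance, merge_flags(0x200, 0x3040) keeps IF (0x200) from the original value, accepts ZF (0x40) from the user frame, and drops the IOPL bits (0x3000) the user frame tried to set.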
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 9:15 ` Philippe Gerum 2010-06-02 9:20 ` Jan Kiszka @ 2010-06-02 9:21 ` Gilles Chanteperdrix 2010-06-02 9:23 ` Jan Kiszka 2010-06-02 9:34 ` Philippe Gerum 2010-06-02 12:02 ` Daniele Nicolodi 2 siblings, 2 replies; 27+ messages in thread From: Gilles Chanteperdrix @ 2010-06-02 9:21 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, xenomai Philippe Gerum wrote: > On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Tschaeche IT-Services wrote: >>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: >>>>> Not in the absence of syscall. We thought about this once already, when >>>>> considering how a watchdog preempting a runaway task in primary mode >>>>> could force a secondary mode switch: there is no sane and easy solution >>>>> to this unfortunately. >>>> This is exactly Sigmatek's problem: Our customers develop code >>>> within our debugging/development environment. We want to catch >>>> this situation (the developer implements a while(1)) with a >>>> watchdog throwing SIGTRAP so that our debugger gets active >>>> and can locate the problem according to the stack frame... >>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries >>> to catch "well-behaving" broken threads via SIGDEBUG and kills the >>> hopelessly broken rest - system alive again. >>> >>> You can then debug the former and need to do code review on the latter. >>> Or you could also try to add some loop-breaking Xenomai syscalls (or >>> even more clever checks) to library services the code under suspect >>> usually invokes. >> I am afraid "well-behaving" means emitting syscalls. We have a radical >> way to cause a SIGSEGV to be sent to a thread having run amok: set its >> PC to an invalid address (after having printed the real PC). gdb will >> not be able to print where the program stopped, but should be able to >> print the backtrace. 
>>
>
> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
>
> [watchdog trigger]
> forge_return_frame(on =regs->sp, to =regs->pc);
> regs->pc = __oops_I_did_it_again;
>
> __oops_I_did_it_again:
> __xn_migrate(LINUX_DOMAIN);
> ret (via forged frame)
>
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.

There seems to be a simple approach:
when the thread runs amok, save the real pc somewhere and set the pc to
an invalid address;
when relaxing to handle the exception (xnpod_trap_fault), if the amok
bit is set, restore the pc in the saved registers from the saved
location.

-- Gilles.
^ permalink raw reply [flat|nested] 27+ messages in thread
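The two steps proposed above can be sketched as a small state machine. This is a user-space model with hypothetical names (struct toy_thread, INVALID_PC, watchdog_mark_amok, fault_fixup), not the actual nucleus code; it only illustrates the bookkeeping, assuming the fault handler can tell a watchdog-induced fault from a genuine one by looking at the amok bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: any address that is guaranteed to fault when fetched. */
#define INVALID_PC 0ul

struct toy_thread {
	unsigned long pc;        /* pc in the thread's saved user registers */
	unsigned long saved_pc;  /* where the watchdog stashes the real pc */
	bool amok;               /* set while a watchdog-induced fault is pending */
};

/* Watchdog side: remember the real pc, then make the thread fault. */
void watchdog_mark_amok(struct toy_thread *t)
{
	assert(!t->amok);
	t->saved_pc = t->pc;
	t->pc = INVALID_PC;
	t->amok = true;
}

/*
 * Fault side (what would run while relaxing the thread): if the fault
 * was provoked by the watchdog, put the real pc back so the debugger
 * sees where the thread was actually looping. Returns true in that case.
 */
bool fault_fixup(struct toy_thread *t)
{
	if (!t->amok)
		return false;
	t->pc = t->saved_pc;
	t->amok = false;
	return true;
}
```

With the pc restored before the signal is delivered, a debugger attached to the relaxed thread would report the looping instruction rather than the invalid address.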
* Re: [Xenomai-help] Handling Linux Signals in primary domain context 2010-06-02 9:21 ` Gilles Chanteperdrix @ 2010-06-02 9:23 ` Jan Kiszka 2010-06-02 10:19 ` Tschaeche IT-Services 2010-06-02 9:34 ` Philippe Gerum 1 sibling, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2010-06-02 9:23 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org Gilles Chanteperdrix wrote: > Philippe Gerum wrote: >> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: >>>> Tschaeche IT-Services wrote: >>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote: >>>>>> Not in the absence of syscall. We thought about this once already, when >>>>>> considering how a watchdog preempting a runaway task in primary mode >>>>>> could force a secondary mode switch: there is no sane and easy solution >>>>>> to this unfortunately. >>>>> This is exactly Sigmatek's problem: Our customers develop code >>>>> within our debugging/development environment. We want to catch >>>>> this situation (the developer implements a while(1)) with a >>>>> watchdog throwing SIGTRAP so that our debugger gets active >>>>> and can locate the problem according to the stack frame... >>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries >>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the >>>> hopelessly broken rest - system alive again. >>>> >>>> You can then debug the former and need to do code review on the latter. >>>> Or you could also try to add some loop-breaking Xenomai syscalls (or >>>> even more clever checks) to library services the code under suspect >>>> usually invokes. >>> I am afraid "well-behaving" means emitting syscalls. We have a radical >>> way to cause a SIGSEGV to be sent to a thread having run amok: set its >>> PC to an invalid address (after having printed the real PC). gdb will >>> not be able to print where the program stopped, but should be able to >>> print the backtrace. 
>>> >> Actually, we could extend this logic and forge a stack frame to return >> to the preempted application code via some userland trampoline code, >> doing the switch: >> >> [watchdog trigger] >> forge_return_frame(on =regs->sp, to =regs->pc); >> regs->pc = __oops_I_did_it_again; >> >> __oops_I_did_it_again: >> __xn_migrate(LINUX_DOMAIN); >> ret (via forged frame) >> >> The thing is, that this brings in some arch-dep code to forge a stack >> frame (like the kernel uses for signals), that should rather live in the >> pipeline core. > > There seems to be a simple approach: > when the thread runs amok, set the pc to invalid address, save the real > pc somewhere > when relaxing for handling the exception (xnpod_trap_fault), if the amok > bit is set, restore the pc in the saved registers from the saved location. Sounds feasible, will give it a try. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 9:23 ` Jan Kiszka
@ 2010-06-02 10:19 ` Tschaeche IT-Services
2010-06-02 10:48 ` Gilles Chanteperdrix
0 siblings, 1 reply; 27+ messages in thread
From: Tschaeche IT-Services @ 2010-06-02 10:19 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org
On Wed, Jun 02, 2010 at 11:23:51AM +0200, Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> > Philippe Gerum wrote:
> >> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >>> Jan Kiszka wrote:
> >>>> Tschaeche IT-Services wrote:
> >>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>>> to this unfortunately.
> >>>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>>> within our debugging/development environment. We want to catch
> >>>>> this situation (the developer implements a while(1)) with a
> >>>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>>> and can locate the problem according to the stack frame...
> >>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>>> hopelessly broken rest - system alive again.
> >>>>
> >>>> You can then debug the former and need to do code review on the latter.
> >>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>>> even more clever checks) to library services the code under suspect
> >>>> usually invokes.
> >>> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >>> PC to an invalid address (after having printed the real PC). gdb will
> >>> not be able to print where the program stopped, but should be able to
> >>> print the backtrace.
> >>>
> >> Actually, we could extend this logic and forge a stack frame to return
> >> to the preempted application code via some userland trampoline code,
> >> doing the switch:
> >>
> >> [watchdog trigger]
> >>     forge_return_frame(on =regs->sp, to =regs->pc);
> >>     regs->pc = __oops_I_did_it_again;
> >>
> >> __oops_I_did_it_again:
> >>     __xn_migrate(LINUX_DOMAIN);
> >>     ret (via forged frame)
> >>
> >> The thing is, that this brings in some arch-dep code to forge a stack
> >> frame (like the kernel uses for signals), that should rather live in the
> >> pipeline core.
> >
> > There seems to be a simple approach:
> > when the thread runs amok, set the pc to invalid address, save the real
> > pc somewhere
> > when relaxing for handling the exception (xnpod_trap_fault), if the amok
> > bit is set, restore the pc in the saved registers from the saved location.
>
> Sounds feasible, will give it a try.
Looking at your discussion, it seems that handling asynchronous Linux signals
in a primary domain task *immediately*, by initiating the signal handling in
secondary domain right away, is not a "must" for Xenomai (but would be nice).
Another solution might be to check the state of the amok task when Xenomai
schedules it for execution: if Linux signals are pending, force a switch to
secondary domain. Asynchronous Linux signals would then be handled at the
latest on the next primary domain scheduler activity, which would be
sufficient for us...
Regards,
Olli
--
Tschaeche IT-Services          Tel.: +49/9134/9089850
Dr.-Ing. Oliver Tschäche       Mobil: +49/176/20435601
Welluckenweg 4                 Email: services@domain.hid
91077 Neunkirchen
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 10:19 ` Tschaeche IT-Services
@ 2010-06-02 10:48 ` Gilles Chanteperdrix
0 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 10:48 UTC (permalink / raw)
To: Tschaeche IT-Services; +Cc: Jan Kiszka, xenomai@xenomai.org
Tschaeche IT-Services wrote:
> Looking at your discussion, it seems that handling asynchronous Linux signals
> in a primary domain task *immediately*, by initiating the signal handling in
> secondary domain right away, is not a "must" for Xenomai (but would be nice).
>
> Another solution might be to check the state of the amok task when Xenomai
> schedules it for execution: if Linux signals are pending, force a switch to
> secondary domain. Asynchronous Linux signals would then be handled at the
> latest on the next primary domain scheduler activity, which would be
> sufficient for us...
As Philippe explained in the second answer you received to your
initial mail, that is impossible, because the function that migrates
threads from primary to secondary mode cannot be called at an arbitrary
time. This issue has been bugging us for some time; if that approach had
worked, we would have implemented it a long time ago.
--
Gilles.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 9:21 ` Gilles Chanteperdrix
2010-06-02 9:23 ` Jan Kiszka
@ 2010-06-02 9:34 ` Philippe Gerum
2010-06-02 9:43 ` Gilles Chanteperdrix
1 sibling, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02 9:34 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai
On Wed, 2010-06-02 at 11:21 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >> Jan Kiszka wrote:
> >>> Tschaeche IT-Services wrote:
> >>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>> to this unfortunately.
> >>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>> within our debugging/development environment. We want to catch
> >>>> this situation (the developer implements a while(1)) with a
> >>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>> and can locate the problem according to the stack frame...
> >>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>> hopelessly broken rest - system alive again.
> >>>
> >>> You can then debug the former and need to do code review on the latter.
> >>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>> even more clever checks) to library services the code under suspect
> >>> usually invokes.
> >> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >> PC to an invalid address (after having printed the real PC). gdb will
> >> not be able to print where the program stopped, but should be able to
> >> print the backtrace.
> >>
> >
> > Actually, we could extend this logic and forge a stack frame to return
> > to the preempted application code via some userland trampoline code,
> > doing the switch:
> >
> > [watchdog trigger]
> >     forge_return_frame(on =regs->sp, to =regs->pc);
> >     regs->pc = __oops_I_did_it_again;
> >
> > __oops_I_did_it_again:
> >     __xn_migrate(LINUX_DOMAIN);
> >     ret (via forged frame)
> >
> > The thing is, that this brings in some arch-dep code to forge a stack
> > frame (like the kernel uses for signals), that should rather live in the
> > pipeline core.
>
> There seems to be a simple approach:
> when the thread runs amok, set the pc to invalid address, save the real
> pc somewhere
> when relaxing for handling the exception (xnpod_trap_fault), if the amok
> bit is set, restore the pc in the saved registers from the saved location.
>
It's indeed simpler. The limit of this approach is to count on a correct
behaviour of the fault mechanism, since we would rely on it implicitly
to deal with the mode switch. By "correct", I mean: the instruction
fetch fault must be detectable and recoverable the same way, regardless
of the architecture.
--
Philippe.
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 9:34 ` Philippe Gerum
@ 2010-06-02 9:43 ` Gilles Chanteperdrix
0 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 9:43 UTC (permalink / raw)
To: Philippe Gerum; +Cc: Jan Kiszka, xenomai
Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:21 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Tschaeche IT-Services wrote:
>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>> to this unfortunately.
>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>> within our debugging/development environment. We want to catch
>>>>>> this situation (the developer implements a while(1)) with a
>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>> and can locate the problem according to the stack frame...
>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>> hopelessly broken rest - system alive again.
>>>>>
>>>>> You can then debug the former and need to do code review on the latter.
>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>> even more clever checks) to library services the code under suspect
>>>>> usually invokes.
>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>> not be able to print where the program stopped, but should be able to
>>>> print the backtrace.
>>>>
>>> Actually, we could extend this logic and forge a stack frame to return
>>> to the preempted application code via some userland trampoline code,
>>> doing the switch:
>>>
>>> [watchdog trigger]
>>>     forge_return_frame(on =regs->sp, to =regs->pc);
>>>     regs->pc = __oops_I_did_it_again;
>>>
>>> __oops_I_did_it_again:
>>>     __xn_migrate(LINUX_DOMAIN);
>>>     ret (via forged frame)
>>>
>>> The thing is, that this brings in some arch-dep code to forge a stack
>>> frame (like the kernel uses for signals), that should rather live in the
>>> pipeline core.
>> There seems to be a simple approach:
>> when the thread runs amok, set the pc to invalid address, save the real
>> pc somewhere
>> when relaxing for handling the exception (xnpod_trap_fault), if the amok
>> bit is set, restore the pc in the saved registers from the saved location.
>>
>
> It's indeed simpler. The limit of this approach is to count on a correct
> behaviour of the fault mechanism, since we would rely on it implicitly
> to deal with the mode switch. By "correct", I mean: the instruction
> fetch fault must be detectable and recoverable the same way, regardless
> of the architecture.
Yes, if the kernel looks at what is under the PC to handle the fault, we
are toast because it will probably do it after we have restored the real PC.
--
Gilles.
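The save-and-restore idea discussed in the messages above can be modelled in plain C. What follows is only a sketch of the bookkeeping, not kernel code: `struct amok_thread`, `amok_watchdog_trigger()`, `amok_trap_fault()` and `INVALID_PC` are hypothetical names made up for illustration, and the real change would have to live in the watchdog and in the relax path of `xnpod_trap_fault`. It also assumes that the invalid address reliably raises an instruction fetch fault, which is exactly the architecture-dependent point raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them is a Xenomai symbol. */
#define INVALID_PC ((uintptr_t)0)   /* assumed to fault on instruction fetch */

struct amok_thread {
    uintptr_t pc;        /* program counter in the saved register set */
    uintptr_t saved_pc;  /* real PC stashed by the watchdog */
    bool amok;           /* the "amok bit" from the discussion */
    bool relaxed;        /* true once the thread runs in secondary mode */
};

/* Watchdog side: stash the real PC, then point the thread at an invalid address. */
static void amok_watchdog_trigger(struct amok_thread *t)
{
    assert(t != NULL);
    t->saved_pc = t->pc;
    t->pc = INVALID_PC;
    t->amok = true;
}

/*
 * Fault side: models the check proposed for the relax path of the fault
 * handler. Returns true when the fault was the one forged by the watchdog.
 */
static bool amok_trap_fault(struct amok_thread *t)
{
    assert(t != NULL);
    t->relaxed = true;       /* handling the fault relaxes the thread */
    if (!t->amok)
        return false;        /* genuine fault: leave the PC untouched */
    t->pc = t->saved_pc;     /* resume where the loop was preempted */
    t->amok = false;
    return true;
}
```

In this model a thread preempted at some address ends up relaxed with its original PC after one trigger/fault round trip, so a debugger would see the real location. The model deliberately ignores the caveat above: a kernel that inspects the instruction under the PC after the restore would see the wrong address.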
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 9:15 ` Philippe Gerum
2010-06-02 9:20 ` Jan Kiszka
2010-06-02 9:21 ` Gilles Chanteperdrix
@ 2010-06-02 12:02 ` Daniele Nicolodi
2010-06-02 13:47 ` Gilles Chanteperdrix
2010-06-02 15:14 ` Philippe Gerum
2 siblings, 2 replies; 27+ messages in thread
From: Daniele Nicolodi @ 2010-06-02 12:02 UTC (permalink / raw)
To: xenomai
On 02/06/10 11:15, Philippe Gerum wrote:
> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
>
> [watchdog trigger]
>     forge_return_frame(on =regs->sp, to =regs->pc);
>     regs->pc = __oops_I_did_it_again;
>
> __oops_I_did_it_again:
>     __xn_migrate(LINUX_DOMAIN);
>     ret (via forged frame)
>
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.
Am I too naive in thinking that this solution would let user space
choose what to do when the watchdog interrupts the current thread? In
your example, it would be enough to assign to __oops_I_did_it_again a
function pointer to the function that has to be executed.
There will probably be hard constraints on what this function can do, but
it would be a nice feature for debugging and for solving application
specific issues.
Cheers,
--
Daniele
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 12:02 ` Daniele Nicolodi
@ 2010-06-02 13:47 ` Gilles Chanteperdrix
2010-06-02 15:14 ` Philippe Gerum
1 sibling, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 13:47 UTC (permalink / raw)
To: Daniele Nicolodi; +Cc: xenomai
Daniele Nicolodi wrote:
> On 02/06/10 11:15, Philippe Gerum wrote:
>
>> Actually, we could extend this logic and forge a stack frame to return
>> to the preempted application code via some userland trampoline code,
>> doing the switch:
>>
>> [watchdog trigger]
>>     forge_return_frame(on =regs->sp, to =regs->pc);
>>     regs->pc = __oops_I_did_it_again;
>>
>> __oops_I_did_it_again:
>>     __xn_migrate(LINUX_DOMAIN);
>>     ret (via forged frame)
>>
>> The thing is, that this brings in some arch-dep code to forge a stack
>> frame (like the kernel uses for signals), that should rather live in the
>> pipeline core.
>
> Am I too naive in thinking that this solution would let user space
> choose what to do when the watchdog interrupts the current thread? In
> your example, it would be enough to assign to __oops_I_did_it_again a
> function pointer to the function that has to be executed.
>
> There will probably be hard constraints on what this function can do, but
> it would be a nice feature for debugging and for solving application
> specific issues.
You already have that with SIGDEBUG. You can register whatever signal
handler you want for the SIGDEBUG signal. The same goes for SIGSEGV.
The only issue we are talking about here is that the SIGDEBUG mechanism
does not work when a piece of code is stuck in an infinite loop
without calling any syscall. But that should be a pretty rare case.
--
Gilles.
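Registering a handler for SIGDEBUG, as mentioned in the message above, is ordinary POSIX signal handling. The sketch below assumes that SIGDEBUG is an alias for SIGXCPU, as in the Xenomai 2.x headers; the fallback `#define` is there only so that the example builds without Xenomai installed. `install_sigdebug_handler()`, `on_sigdebug()` and `sigdebug_selftest()` are illustrative names, not Xenomai API. The self-test raises the signal itself; it does not exercise the watchdog, and the handler would still not run for a primary-mode loop that never issues a syscall.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assumption: Xenomai 2.x aliases SIGDEBUG to SIGXCPU. */
#ifndef SIGDEBUG
#define SIGDEBUG SIGXCPU
#endif

/* Set by the handler; sig_atomic_t keeps the store async-signal-safe. */
static volatile sig_atomic_t sigdebug_seen;

/* Hypothetical handler: only records that the signal arrived. */
static void on_sigdebug(int sig, siginfo_t *si, void *context)
{
    (void)si;
    (void)context;
    sigdebug_seen = sig;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
static int install_sigdebug_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigdebug;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGDEBUG, &sa, NULL);
}

/* Returns 1 if a self-raised SIGDEBUG reached the handler, 0 otherwise. */
static int sigdebug_selftest(void)
{
    int rc = install_sigdebug_handler();

    assert(rc == 0);
    (void)rc;
    sigdebug_seen = 0;
    raise(SIGDEBUG);    /* delivered before raise() returns in a single thread */
    return sigdebug_seen == SIGDEBUG;
}
```

A real handler would typically log or dump a backtrace; the same `sigaction()` pattern applies to SIGSEGV.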
* Re: [Xenomai-help] Handling Linux Signals in primary domain context
2010-06-02 12:02 ` Daniele Nicolodi
2010-06-02 13:47 ` Gilles Chanteperdrix
@ 2010-06-02 15:14 ` Philippe Gerum
1 sibling, 0 replies; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02 15:14 UTC (permalink / raw)
To: Daniele Nicolodi; +Cc: xenomai
On Wed, 2010-06-02 at 14:02 +0200, Daniele Nicolodi wrote:
> On 02/06/10 11:15, Philippe Gerum wrote:
>
> > Actually, we could extend this logic and forge a stack frame to return
> > to the preempted application code via some userland trampoline code,
> > doing the switch:
> >
> > [watchdog trigger]
> >     forge_return_frame(on =regs->sp, to =regs->pc);
> >     regs->pc = __oops_I_did_it_again;
> >
> > __oops_I_did_it_again:
> >     __xn_migrate(LINUX_DOMAIN);
> >     ret (via forged frame)
> >
> > The thing is, that this brings in some arch-dep code to forge a stack
> > frame (like the kernel uses for signals), that should rather live in the
> > pipeline core.
>
> Am I too naive in thinking that this solution would let user space
> choose what to do when the watchdog interrupts the current thread? In
> your example, it would be enough to assign to __oops_I_did_it_again a
> function pointer to the function that has to be executed.
>
> There will probably be hard constraints on what this function can do, but
> it would be a nice feature for debugging and for solving application
> specific issues.
If your question is related to handling a watchdog trigger in a
syscall-less runaway loop, that method would likely allow for a user
intercept via some hook, yes. Everything sensible that helps debugging
will do.
>
> Cheers,
--
Philippe.
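The forged-frame trampoline with a user intercept, as discussed in the messages above, can be modelled in portable C as follows. This is a toy model under loud assumptions: `struct cpu_model`, `TRAMPOLINE_PC`, `user_hook`, `count_hook()`, `frame_watchdog_trigger()` and `trampoline()` are all hypothetical, the "stack" is a small array that grows upward, and the domain switch is reduced to setting a flag where the pseudocode calls `__xn_migrate(LINUX_DOMAIN)`. Forging a real frame is architecture-dependent, which is the objection raised in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS   16
#define TRAMPOLINE_PC ((uintptr_t)0x1000)  /* arbitrary marker address */

typedef void (*watchdog_hook_t)(void);

/* Hypothetical model of the preempted thread's register state. */
struct cpu_model {
    uintptr_t pc;
    size_t sp;                     /* index of the next free stack slot */
    uintptr_t stack[STACK_WORDS];  /* grows upward in this model */
    int in_linux_domain;           /* 1 once "migrated" to secondary mode */
};

/* Optional user intercept, called from the trampoline when set. */
static watchdog_hook_t user_hook;

/* Example hook that only counts how often it ran. */
static int hook_calls;
static void count_hook(void)
{
    hook_calls++;
}

/* Push a return address, standing in for forge_return_frame(). */
static void forge_return_frame(struct cpu_model *c, uintptr_t return_to)
{
    assert(c != NULL && c->sp < STACK_WORDS);
    c->stack[c->sp++] = return_to;
}

/* Watchdog side: remember the preempted PC, then redirect to the trampoline. */
static void frame_watchdog_trigger(struct cpu_model *c)
{
    forge_return_frame(c, c->pc);
    c->pc = TRAMPOLINE_PC;
}

/* What would run in user space once the thread resumes at TRAMPOLINE_PC. */
static void trampoline(struct cpu_model *c)
{
    assert(c != NULL && c->pc == TRAMPOLINE_PC && c->sp > 0);
    c->in_linux_domain = 1;      /* stands in for __xn_migrate(LINUX_DOMAIN) */
    if (user_hook != NULL)
        user_hook();             /* user intercept */
    c->pc = c->stack[--c->sp];   /* "ret" via the forged frame */
}
```

Setting `user_hook = count_hook` before the trigger shows the intercept running exactly once per watchdog event, after which the model returns to the preempted PC.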
end of thread
Thread overview: 27+ messages
2010-06-01 13:50 [Xenomai-help] Handling Linux Signals in primary domain context Tschaeche IT-Services
2010-06-01 13:52 ` Gilles Chanteperdrix
2010-06-01 13:59 ` Gilles Chanteperdrix
2010-06-01 14:32 ` Philippe Gerum
2010-06-01 15:54 ` Tschaeche IT-Services
2010-06-01 16:52 ` Tschaeche IT-Services
2010-06-01 16:58 ` Jan Kiszka
2010-06-02 8:36 ` Gilles Chanteperdrix
2010-06-02 9:14 ` Jan Kiszka
2010-06-02 9:15 ` Philippe Gerum
2010-06-02 9:20 ` Jan Kiszka
2010-06-02 9:28 ` Philippe Gerum
2010-06-02 9:37 ` Gilles Chanteperdrix
2010-06-02 10:06 ` Philippe Gerum
2010-06-02 10:19 ` Gilles Chanteperdrix
2010-06-02 10:42 ` Philippe Gerum
2010-06-02 10:51 ` Gilles Chanteperdrix
2010-06-02 10:29 ` Gilles Chanteperdrix
2010-06-02 9:21 ` Gilles Chanteperdrix
2010-06-02 9:23 ` Jan Kiszka
2010-06-02 10:19 ` Tschaeche IT-Services
2010-06-02 10:48 ` Gilles Chanteperdrix
2010-06-02 9:34 ` Philippe Gerum
2010-06-02 9:43 ` Gilles Chanteperdrix
2010-06-02 12:02 ` Daniele Nicolodi
2010-06-02 13:47 ` Gilles Chanteperdrix
2010-06-02 15:14 ` Philippe Gerum