* [Xenomai-core] [PATCH] i386: switch to root domain on unhandled non-root faults @ 2007-11-05 19:16 Jan Kiszka 2007-11-05 19:23 ` [Xenomai-core] [Adeos-main] " Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-05 19:16 UTC (permalink / raw) To: adeos-main; +Cc: Xenomai-core This patch addresses the recently discovered issue that I-pipe actually need to deal with faults over non-root domain in which the current domain shows no interest in. Such faults could be triggered inside copy_*_user, thus can cleanly be handled by Linux - if we only allow for this. Currently, if debugging is on, we warn about a potential bug, and corrupt the pipeline states otherwise. The new approach is to unconditionally drop to root domain in such cases, but - for debugging purposes of non-fixable faults - keep track of the original domain and report it on oops. Similar patches are required for other archs. Maybe I can look into x86_64 later. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-05 19:16 [Xenomai-core] [PATCH] i386: switch to root domain on unhandled non-root faults Jan Kiszka @ 2007-11-05 19:23 ` Jan Kiszka 2007-11-05 22:28 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-05 19:23 UTC (permalink / raw) To: adeos-main; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 836 bytes --] Jan Kiszka wrote: > This patch addresses the recently discovered issue that I-pipe actually > need to deal with faults over non-root domain in which the current > domain shows no interest in. Such faults could be triggered inside > copy_*_user, thus can cleanly be handled by Linux - if we only allow for > this. Currently, if debugging is on, we warn about a potential bug, and > corrupt the pipeline states otherwise. > > The new approach is to unconditionally drop to root domain in such > cases, but - for debugging purposes of non-fixable faults - keep track > of the original domain and report it on oops. > > Similar patches are required for other archs. Maybe I can look into > x86_64 later. > > Jan [Damn it... now with attachment.] -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux [-- Attachment #2: handle-fixable-non-root-faults.patch --] [-- Type: text/x-patch, Size: 4388 bytes --] --- arch/i386/kernel/ipipe.c | 27 +++++++++++++++------------ arch/i386/kernel/traps.c | 3 ++- include/linux/ipipe.h | 1 + include/linux/ipipe_percpu.h | 1 + kernel/ipipe/core.c | 1 + 5 files changed, 20 insertions(+), 13 deletions(-) Index: linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c =================================================================== --- linux-2.6.23.1-xeno.orig/arch/i386/kernel/ipipe.c +++ linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c @@ -634,7 +634,7 @@ fastcall int __ipipe_handle_exception(st } #ifdef CONFIG_KGDB - /* catch exception KGDB is interested in over non-root domains */ + /* catch exceptions KGDB is interested in over non-root domains */ if (!ipipe_root_domain_p && __ipipe_xlate_signo[vector] >= 0 && !kgdb_handle_exception(vector, __ipipe_xlate_signo[vector], error_code, regs)) { @@ -643,18 +643,21 @@ fastcall int __ipipe_handle_exception(st } #endif /* CONFIG_KGDB */ - if (!ipipe_trap_notify(vector, regs)) { -#ifdef CONFIG_IPIPE_DEBUG - if (!ipipe_root_domain_p) { - /* Fix up domain so that Linux can handle this. */ + if (likely(!ipipe_trap_notify(vector, regs))) { + if (unlikely(!ipipe_root_domain_p)) { + struct ipipe_domain *prev_orig = ipipe_orig_domain; + + /* Drop to root domain so that Linux can handle the + * fault, but save the original domain for reporting + * purposes of oopses over non-root contexts. */ + ipipe_orig_domain = ipipe_current_domain; ipipe_current_domain = ipipe_root_domain; - ipipe_trace_panic_freeze(); - printk(KERN_ERR "BUG: Unhandled exception over domain" - " %s - switching to ROOT\n", - ipipe_current_domain->name); - } -#endif /* CONFIG_IPIPE_DEBUG */ - __ipipe_std_extable[vector](regs, error_code); + + __ipipe_std_extable[vector](regs, error_code); + + ipipe_orig_domain = prev_orig; + } else + __ipipe_std_extable[vector](regs, error_code); local_irq_restore(flags); __fixup_if(regs); return 0; Index: linux-2.6.23.1-xeno/arch/i386/kernel/traps.c =================================================================== --- linux-2.6.23.1-xeno.orig/arch/i386/kernel/traps.c +++ linux-2.6.23.1-xeno/arch/i386/kernel/traps.c @@ -321,7 +321,8 @@ void show_registers(struct pt_regs *regs TASK_COMM_LEN, current->comm, current->pid, current_thread_info(), current, task_thread_info(current)); #ifdef CONFIG_IPIPE - printk(KERN_EMERG "\nI-pipe domain %s", ipipe_current_domain->name); + printk(KERN_EMERG "\nI-pipe domain %s", ipipe_orig_domain ? + ipipe_orig_domain->name : ipipe_current_domain->name); #endif /* CONFIG_IPIPE */ /* * When in-kernel, we also print out the stack and code at the Index: linux-2.6.23.1-xeno/include/linux/ipipe.h =================================================================== --- linux-2.6.23.1-xeno.orig/include/linux/ipipe.h +++ linux-2.6.23.1-xeno/include/linux/ipipe.h @@ -83,6 +83,7 @@ #define IPIPE_NR_CPUS NR_CPUS #define ipipe_current_domain ipipe_cpu_var(ipipe_percpu_domain) +#define ipipe_orig_domain ipipe_cpu_var(ipipe_percpu_orig_domain) #define ipipe_virtual_irq_p(irq) ((irq) >= IPIPE_VIRQ_BASE && \ (irq) < IPIPE_NR_IRQS) Index: linux-2.6.23.1-xeno/include/linux/ipipe_percpu.h =================================================================== --- linux-2.6.23.1-xeno.orig/include/linux/ipipe_percpu.h +++ linux-2.6.23.1-xeno/include/linux/ipipe_percpu.h @@ -52,6 +52,7 @@ DECLARE_PER_CPU(struct ipipe_percpu_doma DECLARE_PER_CPU(struct ipipe_percpu_domain_data, ipipe_percpu_darray[CONFIG_IPIPE_DOMAINS]); DECLARE_PER_CPU(struct ipipe_domain *, ipipe_percpu_domain); +DECLARE_PER_CPU(struct ipipe_domain *, ipipe_percpu_orig_domain); #ifdef CONFIG_IPIPE_DEBUG_CONTEXT DECLARE_PER_CPU(int, ipipe_percpu_context_check); Index: linux-2.6.23.1-xeno/kernel/ipipe/core.c =================================================================== --- linux-2.6.23.1-xeno.orig/kernel/ipipe/core.c +++ linux-2.6.23.1-xeno/kernel/ipipe/core.c @@ -78,6 +78,7 @@ DEFINE_PER_CPU(struct ipipe_percpu_domai { [0] = { .status = IPIPE_STALL_MASK } }; /* Root domain stalled on each CPU at startup. */ DEFINE_PER_CPU(struct ipipe_domain *, ipipe_percpu_domain) = { &ipipe_root }; +DEFINE_PER_CPU(struct ipipe_domain *, ipipe_percpu_orig_domain) = { NULL }; static IPIPE_DEFINE_SPINLOCK(__ipipe_pipelock); ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-05 19:23 ` [Xenomai-core] [Adeos-main] " Jan Kiszka @ 2007-11-05 22:28 ` Philippe Gerum 2007-11-06 6:16 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-11-05 22:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1606 bytes --] Jan Kiszka wrote: > Jan Kiszka wrote: >> This patch addresses the recently discovered issue that I-pipe actually >> need to deal with faults over non-root domain in which the current >> domain shows no interest in. Such faults could be triggered inside >> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >> this. Currently, if debugging is on, we warn about a potential bug, and >> corrupt the pipeline states otherwise. >> >> The new approach is to unconditionally drop to root domain in such >> cases, but - for debugging purposes of non-fixable faults - keep track >> of the original domain and report it on oops. >> >> Similar patches are required for other archs. Maybe I can look into >> x86_64 later. >> Nak, this patch would not work as wanted. Again, what you need is to always fixup, and conditionally send a bug report to the kernel log if CONFIG_IPIPE_DEBUG is enabled, nothing more. This patch assumes that die() is always going to be fired for any in-kernel fault, so that all reports only need to go through this routine, which is wrong. Kernel fixups through exception tables may fix the fault early and silently, and this is particularly the case for copy_to_user helpers, which do include kernel fixup code. By being silent when fixing up things in __ipipe_handle_exception() like your patch currently is, we would be left with no trace at all that some unhandled fault just happened, except by looking at /proc/xenomai/faults. By sending the report immediately when fixing up in the latter routine, you also avoid the ugly ipipe_orig_domain stuff. [-- Attachment #2: fixup-unhandled-fault.patch --] [-- Type: text/x-patch, Size: 962 bytes --] diff --git a/arch/i386/kernel/ipipe.c b/arch/i386/kernel/ipipe.c index bf6443d..9323951 100644 --- a/arch/i386/kernel/ipipe.c +++ b/arch/i386/kernel/ipipe.c @@ -644,16 +644,20 @@ fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int #endif /* CONFIG_KGDB */ if (!ipipe_trap_notify(vector, regs)) { -#ifdef CONFIG_IPIPE_DEBUG if (!ipipe_root_domain_p) { /* Fix up domain so that Linux can handle this. */ +#ifdef CONFIG_IPIPE_DEBUG + struct ipipe_domain *ipd = ipipe_current_domain; ipipe_current_domain = ipipe_root_domain; ipipe_trace_panic_freeze(); printk(KERN_ERR "BUG: Unhandled exception over domain" " %s - switching to ROOT\n", - ipipe_current_domain->name); - } + ipd->name); + dump_stack(); +#else + ipipe_current_domain = ipipe_root_domain; #endif /* CONFIG_IPIPE_DEBUG */ + } __ipipe_std_extable[vector](regs, error_code); local_irq_restore(flags); __fixup_if(regs); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-05 22:28 ` Philippe Gerum @ 2007-11-06 6:16 ` Jan Kiszka 2007-11-06 9:01 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-06 6:16 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2922 bytes --] Philippe Gerum wrote: > Jan Kiszka wrote: >> Jan Kiszka wrote: >>> This patch addresses the recently discovered issue that I-pipe actually >>> need to deal with faults over non-root domain in which the current >>> domain shows no interest in. Such faults could be triggered inside >>> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >>> this. Currently, if debugging is on, we warn about a potential bug, and >>> corrupt the pipeline states otherwise. >>> >>> The new approach is to unconditionally drop to root domain in such >>> cases, but - for debugging purposes of non-fixable faults - keep track >>> of the original domain and report it on oops. >>> >>> Similar patches are required for other archs. Maybe I can look into >>> x86_64 later. >>> > > Nak, this patch would not work as wanted. Again, what you need is to > always fixup, and conditionally send a bug report to the kernel log if > CONFIG_IPIPE_DEBUG is enabled, nothing more. > > This patch assumes that die() is always going to be fired for any > in-kernel fault, so that all reports only need to go through this > routine, which is wrong. Kernel fixups through exception tables may fix > the fault early and silently, and this is particularly the case for > copy_to_user helpers, which do include kernel fixup code. By being > silent when fixing up things in __ipipe_handle_exception() like your > patch currently is, we would be left with no trace at all that some > unhandled fault just happened, except by looking at /proc/xenomai/faults. As you are still remain vague on the actual problematic scenarios, I will try to go through them, and maybe you can add/correct what I miss: - faults in user land => can be silently handled by Linux after dropping to root domain. This lowering is perfectly fine as the higher domains showed no interest in the fault, thus are currently running in domain-agnostic code paths anyway. - faults on fixable kernel addresses => same as above. If the high domains fail to evaluate the fix-up result, it's not I-pipe's fault. - minor faults on kernel addresses (more precisely: in the I-pipe core or some I-pipe user) => those would now went unnoticed and need further thoughts, granted. - major faults on kernel addresses => still generate major oopses and will thus be visible. Did I missed something? If not, I would now start addressing the remaining problematic scenario directly instead of throwing all into the same pot. > > By sending the report immediately when fixing up in the latter routine, > you also avoid the ugly ipipe_orig_domain stuff. It's not nice, but it is at least as ugly as reporting a kernel BUG when there is only a gracefully fixable bug in user code. I definitely do not agree with your approach as well, and I'm convinced we need to find a third way here. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 249 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-06 6:16 ` Jan Kiszka @ 2007-11-06 9:01 ` Jan Kiszka 2007-11-08 13:48 ` Jan Kiszka 2007-11-08 23:31 ` Philippe Gerum 0 siblings, 2 replies; 17+ messages in thread From: Jan Kiszka @ 2007-11-06 9:01 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 3235 bytes --] Jan Kiszka wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >>> Jan Kiszka wrote: >>>> This patch addresses the recently discovered issue that I-pipe actually >>>> need to deal with faults over non-root domain in which the current >>>> domain shows no interest in. Such faults could be triggered inside >>>> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >>>> this. Currently, if debugging is on, we warn about a potential bug, and >>>> corrupt the pipeline states otherwise. >>>> >>>> The new approach is to unconditionally drop to root domain in such >>>> cases, but - for debugging purposes of non-fixable faults - keep track >>>> of the original domain and report it on oops. >>>> >>>> Similar patches are required for other archs. Maybe I can look into >>>> x86_64 later. >>>> >> Nak, this patch would not work as wanted. Again, what you need is to >> always fixup, and conditionally send a bug report to the kernel log if >> CONFIG_IPIPE_DEBUG is enabled, nothing more. >> >> This patch assumes that die() is always going to be fired for any >> in-kernel fault, so that all reports only need to go through this >> routine, which is wrong. Kernel fixups through exception tables may fix >> the fault early and silently, and this is particularly the case for >> copy_to_user helpers, which do include kernel fixup code. By being >> silent when fixing up things in __ipipe_handle_exception() like your >> patch currently is, we would be left with no trace at all that some >> unhandled fault just happened, except by looking at /proc/xenomai/faults. > > As you are still remain vague on the actual problematic scenarios, I > will try to go through them, and maybe you can add/correct what I miss: > > - faults in user land => can be silently handled by Linux after > dropping to root domain. This lowering is perfectly fine as the > higher domains showed no interest in the fault, thus are currently > running in domain-agnostic code paths anyway. > > - faults on fixable kernel addresses => same as above. If the high > domains fail to evaluate the fix-up result, it's not I-pipe's fault. > > - minor faults on kernel addresses (more precisely: in the I-pipe core > or some I-pipe user) => those would now went unnoticed and need > further thoughts, granted. > + > - major faults on kernel addresses => still generate major oopses and > will thus be visible. > > Did I missed something? If not, I would now start addressing the > remaining problematic scenario directly instead of throwing all into the > same pot. > >> By sending the report immediately when fixing up in the latter routine, >> you also avoid the ugly ipipe_orig_domain stuff. > > It's not nice, but it is at least as ugly as reporting a kernel BUG when > there is only a gracefully fixable bug in user code. I definitely do not > agree with your approach as well, and I'm convinced we need to find a > third way here. OK, I think this should be acceptable for everyone. It's basically your approach, reworked to avoid false positives. Moreover, it unconditionally sets the root domain - should be faster. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux [-- Attachment #2: handle-fixable-non-root-faults-v2.patch --] [-- Type: text/x-patch, Size: 1910 bytes --] --- arch/i386/kernel/ipipe.c | 38 ++++++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 16 deletions(-) Index: linux-2.6.23-ipipe/arch/i386/kernel/ipipe.c =================================================================== --- linux-2.6.23-ipipe.orig/arch/i386/kernel/ipipe.c +++ linux-2.6.23-ipipe/arch/i386/kernel/ipipe.c @@ -643,26 +643,32 @@ fastcall int __ipipe_handle_exception(st } #endif /* CONFIG_KGDB */ - if (!ipipe_trap_notify(vector, regs)) { -#ifdef CONFIG_IPIPE_DEBUG - if (!ipipe_root_domain_p) { - /* Fix up domain so that Linux can handle this. */ - ipipe_current_domain = ipipe_root_domain; - ipipe_trace_panic_freeze(); - printk(KERN_ERR "BUG: Unhandled exception over domain" - " %s - switching to ROOT\n", - ipipe_current_domain->name); - } -#endif /* CONFIG_IPIPE_DEBUG */ - __ipipe_std_extable[vector](regs, error_code); + if (unlikely(ipipe_trap_notify(vector, regs))) { local_irq_restore(flags); - __fixup_if(regs); - return 0; + return 1; } - local_irq_restore(flags); +#ifdef CONFIG_IPIPE_DEBUG + /* Warn about unhandled faults over non-root domains in kernel space + * unless they occured at fixable locations. */ + if (unlikely(!ipipe_root_domain_p) && !(error_code & 4) && + !search_exception_tables(regs->eip)) { + struct ipipe_domain *ipd = ipipe_current_domain; + ipipe_current_domain = ipipe_root_domain; + ipipe_trace_panic_freeze(); + printk(KERN_ERR "BUG: Unhandled exception over domain" + " %s - switching to ROOT\n", ipd->name); + dump_stack(); + } +#endif /* CONFIG_IPIPE_DEBUG */ - return 1; + /* Always switch to root so that Linux can handle it cleanly. */ + ipipe_current_domain = ipipe_root_domain; + + __ipipe_std_extable[vector](regs, error_code); + local_irq_restore(flags); + __fixup_if(regs); + return 0; } fastcall int __ipipe_divert_exception(struct pt_regs *regs, int vector) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-06 9:01 ` Jan Kiszka @ 2007-11-08 13:48 ` Jan Kiszka 2007-11-08 23:31 ` Philippe Gerum 1 sibling, 0 replies; 17+ messages in thread From: Jan Kiszka @ 2007-11-08 13:48 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 3375 bytes --] Jan Kiszka wrote: > Jan Kiszka wrote: >> Philippe Gerum wrote: >>> Jan Kiszka wrote: >>>> Jan Kiszka wrote: >>>>> This patch addresses the recently discovered issue that I-pipe actually >>>>> need to deal with faults over non-root domain in which the current >>>>> domain shows no interest in. Such faults could be triggered inside >>>>> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >>>>> this. Currently, if debugging is on, we warn about a potential bug, and >>>>> corrupt the pipeline states otherwise. >>>>> >>>>> The new approach is to unconditionally drop to root domain in such >>>>> cases, but - for debugging purposes of non-fixable faults - keep track >>>>> of the original domain and report it on oops. >>>>> >>>>> Similar patches are required for other archs. Maybe I can look into >>>>> x86_64 later. >>>>> >>> Nak, this patch would not work as wanted. Again, what you need is to >>> always fixup, and conditionally send a bug report to the kernel log if >>> CONFIG_IPIPE_DEBUG is enabled, nothing more. >>> >>> This patch assumes that die() is always going to be fired for any >>> in-kernel fault, so that all reports only need to go through this >>> routine, which is wrong. Kernel fixups through exception tables may fix >>> the fault early and silently, and this is particularly the case for >>> copy_to_user helpers, which do include kernel fixup code. By being >>> silent when fixing up things in __ipipe_handle_exception() like your >>> patch currently is, we would be left with no trace at all that some >>> unhandled fault just happened, except by looking at /proc/xenomai/faults. >> As you are still remain vague on the actual problematic scenarios, I >> will try to go through them, and maybe you can add/correct what I miss: >> >> - faults in user land => can be silently handled by Linux after >> dropping to root domain. This lowering is perfectly fine as the >> higher domains showed no interest in the fault, thus are currently >> running in domain-agnostic code paths anyway. >> >> - faults on fixable kernel addresses => same as above. If the high >> domains fail to evaluate the fix-up result, it's not I-pipe's fault. >> >> - minor faults on kernel addresses (more precisely: in the I-pipe core >> or some I-pipe user) => those would now went unnoticed and need >> further thoughts, granted. >> + >> - major faults on kernel addresses => still generate major oopses and >> will thus be visible. >> >> Did I missed something? If not, I would now start addressing the >> remaining problematic scenario directly instead of throwing all into the >> same pot. >> >>> By sending the report immediately when fixing up in the latter routine, >>> you also avoid the ugly ipipe_orig_domain stuff. >> It's not nice, but it is at least as ugly as reporting a kernel BUG when >> there is only a gracefully fixable bug in user code. I definitely do not >> agree with your approach as well, and I'm convinced we need to find a >> third way here. > > OK, I think this should be acceptable for everyone. It's basically your > approach, reworked to avoid false positives. Moreover, it > unconditionally sets the root domain - should be faster. Attached the equivalent x86_64 version. Works fine here as well. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux [-- Attachment #2: handle-fixable-non-root-faults-v2.patch --] [-- Type: text/x-patch, Size: 1613 bytes --] --- arch/x86_64/kernel/ipipe.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) Index: linux-2.6.23.1-xeno_64/arch/x86_64/kernel/ipipe.c =================================================================== --- linux-2.6.23.1-xeno_64.orig/arch/x86_64/kernel/ipipe.c +++ linux-2.6.23.1-xeno_64/arch/x86_64/kernel/ipipe.c @@ -612,17 +612,32 @@ asmlinkage int __ipipe_handle_exception( } #endif /* CONFIG_KGDB */ - if (!ipipe_trap_notify(vector, regs)) { - __ipipe_exptr handler = __ipipe_std_extable[vector]; - handler(regs,error_code); + if (unlikely(ipipe_trap_notify(vector, regs))) { local_irq_restore(flags); - __fixup_if(regs); - return 0; + return 1; } - local_irq_restore(flags); +#ifdef CONFIG_IPIPE_DEBUG + /* Warn about unhandled faults over non-root domains in kernel space + * unless they occured at fixable locations. */ + if (unlikely(!ipipe_root_domain_p) && !(error_code & 4) && + !search_exception_tables(instruction_pointer(regs))) { + struct ipipe_domain *ipd = ipipe_current_domain; + ipipe_current_domain = ipipe_root_domain; + ipipe_trace_panic_freeze(); + printk(KERN_ERR "BUG: Unhandled exception over domain" + " %s - switching to ROOT\n", ipd->name); + dump_stack(); + } +#endif /* CONFIG_IPIPE_DEBUG */ - return 1; + /* Always switch to root so that Linux can handle it cleanly. */ + ipipe_current_domain = ipipe_root_domain; + + __ipipe_std_extable[vector](regs, error_code); + local_irq_restore(flags); + __fixup_if(regs); + return 0; } asmlinkage int __ipipe_divert_exception(struct pt_regs *regs, int vector) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-06 9:01 ` Jan Kiszka 2007-11-08 13:48 ` Jan Kiszka @ 2007-11-08 23:31 ` Philippe Gerum 2007-11-09 13:58 ` Jan Kiszka 1 sibling, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-11-08 23:31 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 3466 bytes --] Jan Kiszka wrote: > Jan Kiszka wrote: >> Philippe Gerum wrote: >>> Jan Kiszka wrote: >>>> Jan Kiszka wrote: >>>>> This patch addresses the recently discovered issue that I-pipe actually >>>>> need to deal with faults over non-root domain in which the current >>>>> domain shows no interest in. Such faults could be triggered inside >>>>> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >>>>> this. Currently, if debugging is on, we warn about a potential bug, and >>>>> corrupt the pipeline states otherwise. >>>>> >>>>> The new approach is to unconditionally drop to root domain in such >>>>> cases, but - for debugging purposes of non-fixable faults - keep track >>>>> of the original domain and report it on oops. >>>>> >>>>> Similar patches are required for other archs. Maybe I can look into >>>>> x86_64 later. >>>>> >>> Nak, this patch would not work as wanted. Again, what you need is to >>> always fixup, and conditionally send a bug report to the kernel log if >>> CONFIG_IPIPE_DEBUG is enabled, nothing more. >>> >>> This patch assumes that die() is always going to be fired for any >>> in-kernel fault, so that all reports only need to go through this >>> routine, which is wrong. Kernel fixups through exception tables may fix >>> the fault early and silently, and this is particularly the case for >>> copy_to_user helpers, which do include kernel fixup code. By being >>> silent when fixing up things in __ipipe_handle_exception() like your >>> patch currently is, we would be left with no trace at all that some >>> unhandled fault just happened, except by looking at /proc/xenomai/faults. >> As you are still remain vague on the actual problematic scenarios, I >> will try to go through them, and maybe you can add/correct what I miss: >> >> - faults in user land => can be silently handled by Linux after >> dropping to root domain. This lowering is perfectly fine as the >> higher domains showed no interest in the fault, thus are currently >> running in domain-agnostic code paths anyway. >> >> - faults on fixable kernel addresses => same as above. If the high >> domains fail to evaluate the fix-up result, it's not I-pipe's fault. >> >> - minor faults on kernel addresses (more precisely: in the I-pipe core >> or some I-pipe user) => those would now went unnoticed and need >> further thoughts, granted. >> + >> - major faults on kernel addresses => still generate major oopses and >> will thus be visible. >> >> Did I missed something? If not, I would now start addressing the >> remaining problematic scenario directly instead of throwing all into the >> same pot. >> >>> By sending the report immediately when fixing up in the latter routine, >>> you also avoid the ugly ipipe_orig_domain stuff. >> It's not nice, but it is at least as ugly as reporting a kernel BUG when >> there is only a gracefully fixable bug in user code. I definitely do not >> agree with your approach as well, and I'm convinced we need to find a >> third way here. > > OK, I think this should be acceptable for everyone. It's basically your > approach, reworked to avoid false positives. Moreover, it > unconditionally sets the root domain - should be faster. > Fine with me now. I made the fault report systematically dumped when CONFIG_IPIPE_DEBUG is enabled, to help us fixing the existing issues in the skins (e.g. syscall arguments escaping the required sanity checks for whatever reason). -- Philippe. [-- Attachment #2: handle-fixable-non-root-faults-v3.patch --] [-- Type: text/x-patch, Size: 2077 bytes --] diff --git a/arch/i386/kernel/ipipe.c b/arch/i386/kernel/ipipe.c index 9323951..8381ae5 100644 --- a/arch/i386/kernel/ipipe.c +++ b/arch/i386/kernel/ipipe.c @@ -616,6 +616,12 @@ static int __ipipe_xlate_signo[] = { }; #endif /* CONFIG_KGDB */ +#ifdef CONFIG_IPIPE_DEBUG +#define ipipe_may_dump_nonroot_fault(regs) 1 +#else +#define ipipe_may_dump_nonroot_fault(regs) (!search_exception_tables((regs)->eip)) +#endif + fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { unsigned long flags; @@ -643,30 +649,37 @@ fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int } #endif /* CONFIG_KGDB */ - if (!ipipe_trap_notify(vector, regs)) { - if (!ipipe_root_domain_p) { - /* Fix up domain so that Linux can handle this. */ -#ifdef CONFIG_IPIPE_DEBUG + if (unlikely(ipipe_trap_notify(vector, regs))) { + local_irq_restore(flags); + return 1; + } + + /* + * Detect unhandled faults from kernel space over non-root domains. + */ + if (unlikely(!ipipe_root_domain_p && !(error_code & 4))) { + /* + * Always warn about such faults when running in debug + * mode, otherwise avoid reporting the fixable ones. + */ + if (ipipe_may_dump_nonroot_fault(regs)) { struct ipipe_domain *ipd = ipipe_current_domain; ipipe_current_domain = ipipe_root_domain; ipipe_trace_panic_freeze(); printk(KERN_ERR "BUG: Unhandled exception over domain" - " %s - switching to ROOT\n", - ipd->name); + " %s - switching to ROOT\n", ipd->name); dump_stack(); -#else - ipipe_current_domain = ipipe_root_domain; -#endif /* CONFIG_IPIPE_DEBUG */ } - __ipipe_std_extable[vector](regs, error_code); - local_irq_restore(flags); - __fixup_if(regs); - return 0; } + /* Always switch to root so that Linux can handle it cleanly. */ + ipipe_current_domain = ipipe_root_domain; + + __ipipe_std_extable[vector](regs, error_code); local_irq_restore(flags); + __fixup_if(regs); - return 1; + return 0; } fastcall int __ipipe_divert_exception(struct pt_regs *regs, int vector) ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-08 23:31 ` Philippe Gerum @ 2007-11-09 13:58 ` Jan Kiszka 2007-11-09 16:46 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-09 13:58 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core Philippe Gerum wrote: > Jan Kiszka wrote: >> Jan Kiszka wrote: >>> Philippe Gerum wrote: >>>> Jan Kiszka wrote: >>>>> Jan Kiszka wrote: >>>>>> This patch addresses the recently discovered issue that I-pipe actually >>>>>> need to deal with faults over non-root domain in which the current >>>>>> domain shows no interest in. Such faults could be triggered inside >>>>>> copy_*_user, thus can cleanly be handled by Linux - if we only allow for >>>>>> this. Currently, if debugging is on, we warn about a potential bug, and >>>>>> corrupt the pipeline states otherwise. >>>>>> >>>>>> The new approach is to unconditionally drop to root domain in such >>>>>> cases, but - for debugging purposes of non-fixable faults - keep track >>>>>> of the original domain and report it on oops. >>>>>> >>>>>> Similar patches are required for other archs. Maybe I can look into >>>>>> x86_64 later. >>>>>> >>>> Nak, this patch would not work as wanted. Again, what you need is to >>>> always fixup, and conditionally send a bug report to the kernel log if >>>> CONFIG_IPIPE_DEBUG is enabled, nothing more. >>>> >>>> This patch assumes that die() is always going to be fired for any >>>> in-kernel fault, so that all reports only need to go through this >>>> routine, which is wrong. Kernel fixups through exception tables may fix >>>> the fault early and silently, and this is particularly the case for >>>> copy_to_user helpers, which do include kernel fixup code. By being >>>> silent when fixing up things in __ipipe_handle_exception() like your >>>> patch currently is, we would be left with no trace at all that some >>>> unhandled fault just happened, except by looking at /proc/xenomai/faults. >>> As you are still remain vague on the actual problematic scenarios, I >>> will try to go through them, and maybe you can add/correct what I miss: >>> >>> - faults in user land => can be silently handled by Linux after >>> dropping to root domain. This lowering is perfectly fine as the >>> higher domains showed no interest in the fault, thus are currently >>> running in domain-agnostic code paths anyway. >>> >>> - faults on fixable kernel addresses => same as above. If the high >>> domains fail to evaluate the fix-up result, it's not I-pipe's fault. >>> >>> - minor faults on kernel addresses (more precisely: in the I-pipe core >>> or some I-pipe user) => those would now went unnoticed and need >>> further thoughts, granted. >>> + >>> - major faults on kernel addresses => still generate major oopses and >>> will thus be visible. >>> >>> Did I missed something? If not, I would now start addressing the >>> remaining problematic scenario directly instead of throwing all into the >>> same pot. >>> >>>> By sending the report immediately when fixing up in the latter routine, >>>> you also avoid the ugly ipipe_orig_domain stuff. >>> It's not nice, but it is at least as ugly as reporting a kernel BUG when >>> there is only a gracefully fixable bug in user code. I definitely do not >>> agree with your approach as well, and I'm convinced we need to find a >>> third way here. >> OK, I think this should be acceptable for everyone. It's basically your >> approach, reworked to avoid false positives. Moreover, it >> unconditionally sets the root domain - should be faster. >> > > Fine with me now. I made the fault report systematically dumped when > CONFIG_IPIPE_DEBUG is enabled, to help us fixing the existing issues in > the skins (e.g. syscall arguments escaping the required sanity checks > for whatever reason). Well, this is practically your original version. I still don't see why we want debug code in production setups (WARN_ON, e.g., doesn't work this way either), and I still don't understand why you want to report user faults as kernel bugs. If you want to spot skin issues, just grep for __xn_copy_to/from_user or __xn_strncpy_from_user and check for missing return code evaluations. Really, that's nothing we need runtime checks in I-pipe for. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-09 13:58 ` Jan Kiszka @ 2007-11-09 16:46 ` Philippe Gerum 2007-11-09 17:07 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-11-09 16:46 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core Jan Kiszka wrote: > Well, this is practically your original version. I still don't see why > we want debug code in production setups (WARN_ON, e.g., doesn't work > this way either), Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? > and I still don't understand why you want to report > user faults as kernel bugs. > This is a red herring: we are currently in a situation where such messages would be actual kernel issues because some APIs are sloppy wrt kernel/user copies, this is the point. > If you want to spot skin issues, just grep for __xn_copy_to/from_user or > __xn_strncpy_from_user and check for missing return code evaluations. > Really, that's nothing we need runtime checks in I-pipe for. > It's a _debug_ tool until everything is fixed, because we have loads of APIs to fix for untested copy_from/to_user, and anything out-of-tree code may have used the same way. It's merely a conditional and passive check, far away from the hot path, which is there "just in case". The same way you may run with the domain context checker enabled although you may be reasonably confident that having some context mismatch still hiding in the Xenomai core is currently unlikely -- but you know nothing about what may be going on out-of-tree. This feature is mainly aimed at API writers, not users. Additionally, not everyone is going to switch to 2.4 with the ironed kernel/user copy code overnight, so we want this code in 2.6.20/x86 too, so that at least, we may tell people to upgrade their I-pipe patch if need be, then switch CONFIG_IPIPE_DEBUG on to trap those trivial issues, including over past 2.3.x releases. At least, we could tell them how and where to fix their code. At worst, the message will never trigger because all bugous spots will have been addressed within the skins during the next stable releases: fine. Otherwise, we will have more information to chase such kind of bugs. -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-09 16:46 ` Philippe Gerum @ 2007-11-09 17:07 ` Jan Kiszka 2007-11-09 17:31 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-09 17:07 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core Philippe Gerum wrote: > Jan Kiszka wrote: > >> Well, this is practically your original version. I still don't see why >> we want debug code in production setups (WARN_ON, e.g., doesn't work >> this way either), > > Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? Not /me, but your patch drags in the the full warning message even without CONFIG_IPIPE_DEBUG - if I virtually apply the patch correctly. > >> and I still don't understand why you want to report >> user faults as kernel bugs. >> > > This is a red herring: we are currently in a situation where such > messages would be actual kernel issues because some APIs are sloppy wrt > kernel/user copies, this is the point. > >> If you want to spot skin issues, just grep for __xn_copy_to/from_user or >> __xn_strncpy_from_user and check for missing return code evaluations. >> Really, that's nothing we need runtime checks in I-pipe for. >> > > It's a _debug_ tool until everything is fixed, because we have loads of > APIs to fix for untested copy_from/to_user, and anything out-of-tree > code may have used the same way. It's merely a conditional and passive > check, far away from the hot path, which is there "just in case". The > same way you may run with the domain context checker enabled although > you may be reasonably confident that having some context mismatch still > hiding in the Xenomai core is currently unlikely -- but you know nothing > about what may be going on out-of-tree. > > This feature is mainly aimed at API writers, not users. Additionally, If you want to support API writer: Let us tag __xn_copy* with __must_check - _that_ would be the right tool for pointing out remaining issues, not some runtime warning that a) only triggers if the user stumbled over it and b) causes false positives for those skin which already behave nicely. > not everyone is going to switch to 2.4 with the ironed kernel/user copy > code overnight, so we want this code in 2.6.20/x86 too, so that at > least, we may tell people to upgrade their I-pipe patch if need be, then > switch CONFIG_IPIPE_DEBUG on to trap those trivial issues, including > over past 2.3.x releases. At least, we could tell them how and where to > fix their code. > > At worst, the message will never trigger because all bugous spots will > have been addressed within the skins during the next stable releases: > fine. Otherwise, we will have more information to chase such kind of bugs. Nope, it will trigger also after the skins are converted. This is what makes it inappropriate IMHO. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-09 17:07 ` Jan Kiszka @ 2007-11-09 17:31 ` Philippe Gerum 2007-11-11 18:59 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-11-09 17:31 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core Jan Kiszka wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >> >>> Well, this is practically your original version. I still don't see why >>> we want debug code in production setups (WARN_ON, e.g., doesn't work >>> this way either), >> Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? > > Not /me, but your patch drags in the the full warning message even > without CONFIG_IPIPE_DEBUG - if I virtually apply the patch correctly. > Are you 100% sure of this? >>> and I still don't understand why you want to report >>> user faults as kernel bugs. >>> >> This is a red herring: we are currently in a situation where such >> messages would be actual kernel issues because some APIs are sloppy wrt >> kernel/user copies, this is the point. >> >>> If you want to spot skin issues, just grep for __xn_copy_to/from_user or >>> __xn_strncpy_from_user and check for missing return code evaluations. >>> Really, that's nothing we need runtime checks in I-pipe for. >>> >> It's a _debug_ tool until everything is fixed, because we have loads of >> APIs to fix for untested copy_from/to_user, and anything out-of-tree >> code may have used the same way. It's merely a conditional and passive >> check, far away from the hot path, which is there "just in case". The >> same way you may run with the domain context checker enabled although >> you may be reasonably confident that having some context mismatch still >> hiding in the Xenomai core is currently unlikely -- but you know nothing >> about what may be going on out-of-tree. >> >> This feature is mainly aimed at API writers, not users. Additionally, > > If you want to support API writer: Let us tag __xn_copy* with > __must_check - _that_ would be the right tool for pointing out remaining > issues, not some runtime warning that a) only triggers if the user What do you make of existing setups which will not upgrade to 2.4 in any foreseeable future? > stumbled over it and b) causes false positives for those skin which > already behave nicely. > The only case I see which could bother you is this one: people arming CONFIG_IPIPE_DEBUG because they are in development mode, and passing deadly spurious arguments to a syscall for reading/writing memory. I'm not exactly sure that their main problem would be the warning message per-se, right? Do you have any other illustration of so-called false positive I may have missed? >> not everyone is going to switch to 2.4 with the ironed kernel/user copy >> code overnight, so we want this code in 2.6.20/x86 too, so that at >> least, we may tell people to upgrade their I-pipe patch if need be, then >> switch CONFIG_IPIPE_DEBUG on to trap those trivial issues, including >> over past 2.3.x releases. At least, we could tell them how and where to >> fix their code. >> >> At worst, the message will never trigger because all bugous spots will >> have been addressed within the skins during the next stable releases: >> fine. Otherwise, we will have more information to chase such kind of bugs. > > Nope, it will trigger also after the skins are converted. This is what > makes it inappropriate IMHO. > What's really inappropriate, is more likely to pass bugous pointers to a syscall, I think. -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-09 17:31 ` Philippe Gerum @ 2007-11-11 18:59 ` Jan Kiszka 2007-11-13 10:28 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-11 18:59 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 6720 bytes --] Philippe Gerum wrote: > Jan Kiszka wrote: >> Philippe Gerum wrote: >>> Jan Kiszka wrote: >>> >>>> Well, this is practically your original version. I still don't see why >>>> we want debug code in production setups (WARN_ON, e.g., doesn't work >>>> this way either), >>> Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? >> Not /me, but your patch drags in the the full warning message even >> without CONFIG_IPIPE_DEBUG - if I virtually apply the patch correctly. >> > > Are you 100% sure of this? Yep, I am. The rationale here is to keep the Linux fault path free of tests, unless we need them urgently also for non-debug setups. We don't, we just loose the context information in case the test is actually true. The oops will come anyway. > >>>> and I still don't understand why you want to report >>>> user faults as kernel bugs. >>>> >>> This is a red herring: we are currently in a situation where such >>> messages would be actual kernel issues because some APIs are sloppy wrt >>> kernel/user copies, this is the point. >>> >>>> If you want to spot skin issues, just grep for __xn_copy_to/from_user or >>>> __xn_strncpy_from_user and check for missing return code evaluations. >>>> Really, that's nothing we need runtime checks in I-pipe for. >>>> >>> It's a _debug_ tool until everything is fixed, because we have loads of >>> APIs to fix for untested copy_from/to_user, and anything out-of-tree >>> code may have used the same way. It's merely a conditional and passive >>> check, far away from the hot path, which is there "just in case". The >>> same way you may run with the domain context checker enabled although >>> you may be reasonably confident that having some context mismatch still >>> hiding in the Xenomai core is currently unlikely -- but you know nothing >>> about what may be going on out-of-tree. >>> >>> This feature is mainly aimed at API writers, not users. Additionally, >> If you want to support API writer: Let us tag __xn_copy* with >> __must_check - _that_ would be the right tool for pointing out remaining >> issues, not some runtime warning that a) only triggers if the user > > What do you make of existing setups which will not upgrade to 2.4 in any > foreseeable future? 99% of the very same setups won't upgrade to new kernel patches either, so the effect is likely negligible. > >> stumbled over it and b) causes false positives for those skin which >> already behave nicely. >> > > The only case I see which could bother you is this one: people arming No, I'm still mumbling about both of the explained issues. > CONFIG_IPIPE_DEBUG because they are in development mode, and passing > deadly spurious arguments to a syscall for reading/writing memory. I'm They are no longer deadly (since we started switching to root). The skins already handle them, some just fail to inform the user who may then stumble differently about his own faults. See, I didn't introduce CONFIG_IPIPE_DEBUG for user space debugging, it has always been a kernel debug switch. So I would prefer to keep its semantics straight here as well. > not exactly sure that their main problem would be the warning message > per-se, right? > > Do you have any other illustration of so-called false positive I may > have missed? > >>> not everyone is going to switch to 2.4 with the ironed kernel/user copy >>> code overnight, so we want this code in 2.6.20/x86 too, so that at >>> least, we may tell people to upgrade their I-pipe patch if need be, then >>> switch CONFIG_IPIPE_DEBUG on to trap those trivial issues, including >>> over past 2.3.x releases. At least, we could tell them how and where to >>> fix their code. >>> >>> At worst, the message will never trigger because all bugous spots will >>> have been addressed within the skins during the next stable releases: >>> fine. Otherwise, we will have more information to chase such kind of bugs. >> Nope, it will trigger also after the skins are converted. This is what >> makes it inappropriate IMHO. >> > > What's really inappropriate, is more likely to pass bugous pointers to a > syscall, I think. Bogus user pointers don't need to worry us from the I-pipe perspective. I-pipe users either wrapped them in copy_to/from_user or will trap on them loudly anyway. If they wrap but fail to report, then this is still no excuse to pile up debugging code under I-pipe, because the kernel will remain unaffected. Even if it is a small test like this here, it is a question of principles. Jan PS: --- arch/i386/kernel/ipipe.c | 34 ++++++++++++---------------------- 1 file changed, 12 insertions(+), 22 deletions(-) Index: linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c =================================================================== --- linux-2.6.23.1-xeno.orig/arch/i386/kernel/ipipe.c +++ linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c @@ -616,12 +616,6 @@ static int __ipipe_xlate_signo[] = { }; #endif /* CONFIG_KGDB */ -#ifdef CONFIG_IPIPE_DEBUG -#define ipipe_may_dump_nonroot_fault(regs) 1 -#else -#define ipipe_may_dump_nonroot_fault(regs) (!search_exception_tables((regs)->eip)) -#endif - fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { unsigned long flags; @@ -654,23 +648,19 @@ fastcall int __ipipe_handle_exception(st return 1; } - /* - * Detect unhandled faults from kernel space over non-root domains. - */ - if (unlikely(!ipipe_root_domain_p && !(error_code & 4))) { - /* - * Always warn about such faults when running in debug - * mode, otherwise avoid reporting the fixable ones. - */ - if (ipipe_may_dump_nonroot_fault(regs)) { - struct ipipe_domain *ipd = ipipe_current_domain; - ipipe_current_domain = ipipe_root_domain; - ipipe_trace_panic_freeze(); - printk(KERN_ERR "DOMAIN FAULT: Unhandled exception over domain" - " %s - switching to ROOT\n", ipd->name); - dump_stack(); - } +#ifdef CONFIG_IPIPE_DEBUG + /* Warn about unhandled faults over non-root domains in kernel space + * unless they occured at fixable locations. */ + if (unlikely(!ipipe_root_domain_p) && !(error_code & 4) && + !search_exception_tables(instruction_pointer(regs))) { + struct ipipe_domain *ipd = ipipe_current_domain; + ipipe_current_domain = ipipe_root_domain; + ipipe_trace_panic_freeze(); + printk(KERN_ERR "BUG: Unhandled exception over domain" + " %s - switching to ROOT\n", ipd->name); + dump_stack(); } +#endif /* CONFIG_IPIPE_DEBUG */ /* Always switch to root so that Linux can handle it cleanly. */ ipipe_current_domain = ipipe_root_domain; [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 249 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-11 18:59 ` Jan Kiszka @ 2007-11-13 10:28 ` Philippe Gerum 2007-11-13 17:40 ` Jan Kiszka 2007-11-13 21:32 ` Gilles Chanteperdrix 0 siblings, 2 replies; 17+ messages in thread From: Philippe Gerum @ 2007-11-13 10:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 6038 bytes --] Jan Kiszka wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >>> Philippe Gerum wrote: >>>> Jan Kiszka wrote: >>>> >>>>> Well, this is practically your original version. I still don't see why >>>>> we want debug code in production setups (WARN_ON, e.g., doesn't work >>>>> this way either), >>>> Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? >>> Not /me, but your patch drags in the the full warning message even >>> without CONFIG_IPIPE_DEBUG - if I virtually apply the patch correctly. >>> >> Are you 100% sure of this? > > Yep, I am. The rationale here is to keep the Linux fault path free of > tests, unless we need them urgently also for non-debug setups. We don't, > we just loose the context information in case the test is actually true. > The oops will come anyway. > Ok, the intent of the patch may still be unclear to you because two changes have been merged in a single place, covering two distinct issues, both occurring when some Xenomai context raises an exception without any possibility for the nucleus to downgrade to the root domain. Issue #1 - sloppy copy_from/to_user from skins. Fixable faults may be charged to userland, but since a current problem within skins prevents from sending any sensible information back to the caller through normal return codes (i.e. -EFAULT), the warning is welcome to point the finger at the problematic syscall, only when IPIPE_DEBUG is enabled. This is aimed at skin writers, to help them identifying issues reported by users, when they are caused by unchecked copy_to/from_user calls: in such a case, the first step would be to ask the user to check for its syscall arguments, if he happens to use an unfixed Xenomai release. What one has to update to get this help feature is only the I-pipe, not the entire Xenomai release: this is what makes a significant difference for deployed systems. You call this user space debug, I don't: we are only giving user-space some support to work around our current in-kernel sloppiness, at least temporarily. Because we have a legacy to deal with, and out-of-tree code elsewhere, anything unexpensive which may save everyone debug time is welcome. Issue #2 - Unfixable exception from kernel space, typically from a real-time ISR. Since this issue is _not_ recoverable through the exception table mechanism, it has to be a severe kernel issue, and as such, we do want a big fat warning for it, regardless of the debug/non-debug state. This has exactly the same semantics than the standard BUG() assertion. But, in such a case, you don't want to rely on the Linux fault handling mechanism to eventually generate an oops later, because the underlying context is fundamentally unsafe Linux-wise, and the exception path may well not make it up to that point. What we want is an early warning, asap, so that at the very least, we may get a hint about what was going on before a possible lockup. Said differently, it's co-kernel world, and we have basically no guarantee that all the code which is going to be executed on behalf of the regular Linux exception handler will survive long enough to reach the oops handler. E.g. notifier call chain invoked, hardware interrupts enabled anew in the fault handler, scanning of the VMA list, and I'm only illustrating the page fault handling case here, albeit practically all kind of exception handlers may be involved. We may even want to add %eip to the initial warning output, in case dump_stack() fails miserably -- usually, experience shows that at least the first printk() works, fortunately. In future updates, we may want to rely on raw UART output when the platform provides for it (currently, all blackfin, mpc52xx and some? ARM do, IIRC). > See, I didn't introduce > CONFIG_IPIPE_DEBUG for user space debugging, it has always been a kernel > debug switch. So I would prefer to keep its semantics straight here as well. > Really, If you look at the facts and what has been written so far, this patch has never been meant as a user-space debugging feature. The fact that warnings might be issued upon user-space errors is merely an help to better identify current kernel issues in skins, before the lack of check for copy_to/from_user return values may generalize havoc, and possibly trigger even more faults. This is issue #1 as described. >>>> This feature is mainly aimed at API writers, not users. Additionally, >>> If you want to support API writer: Let us tag __xn_copy* with >>> __must_check - _that_ would be the right tool for pointing out remaining >>> issues, not some runtime warning that a) only triggers if the user >> What do you make of existing setups which will not upgrade to 2.4 in any >> foreseeable future? > > 99% of the very same setups won't upgrade to new kernel patches either, > so the effect is likely negligible. > I don't know where you got this figure from, but as far as I can tell, since I'm also regularly helping people to deploy Xenomai in production systems, they will upgrade their kernel more frequently than their Xenomai baseline, e.g. to grab the latest kernel fixes from the stable Linux branch, which is especially true regarding 2.6.20. At this chance, they will much more likely upgrade their I-pipe patch rather than the entire Xenomai release. If you take into account that 2.6.20/x86 now includes this feature, people maintaining 2.3.x setups will actually benefit from it when it comes to identify issue #1. The same way, other non-x86 archs may use very recent kernels with v2.3.x, like powerpc which may run the latest 2.6.23. Basically, only x86 needs to upgrade to 2.4 for running kernels beyond 2.6.20. This is why an equivalent feature is also needed for all other archs when it makes sense. I will amend my latest patch to address your concern about the initial test though, by saving the cycles your patch spends unconditionally resetting the domain to the root one on the fast path. This will put our respective implementations roughly on par with respect to this particular issue. -- Philippe. [-- Attachment #2: no-useless-domain-downgrade.patch --] [-- Type: text/x-patch, Size: 1735 bytes --] diff --git a/arch/i386/kernel/ipipe.c b/arch/i386/kernel/ipipe.c index 07e6a65..26fc04b 100644 --- a/arch/i386/kernel/ipipe.c +++ b/arch/i386/kernel/ipipe.c @@ -657,24 +657,25 @@ fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int /* * Detect unhandled faults from kernel space over non-root domains. */ - if (unlikely(!ipipe_root_domain_p && !(error_code & 4))) { - /* - * Always warn about such faults when running in debug - * mode, otherwise avoid reporting the fixable ones. - */ - if (ipipe_may_dump_nonroot_fault(regs)) { - struct ipipe_domain *ipd = ipipe_current_domain; - ipipe_current_domain = ipipe_root_domain; - ipipe_trace_panic_freeze(); - printk(KERN_ERR "DOMAIN FAULT: Unhandled exception over domain" - " %s - switching to ROOT\n", ipd->name); - dump_stack(); + if (unlikely(!ipipe_root_domain_p)) { + if (unlikely(!(error_code & 4))) { + /* + * Always warn about such faults when running in debug + * mode, otherwise avoid reporting the fixable ones. + */ + if (ipipe_may_dump_nonroot_fault(regs)) { + struct ipipe_domain *ipd = ipipe_current_domain; + ipipe_current_domain = ipipe_root_domain; + ipipe_trace_panic_freeze(); + printk(KERN_ERR "DOMAIN FAULT: Unhandled exception over domain" + " %s at 0x%lx - switching to ROOT\n", ipd->name, regs->eip); + dump_stack(); + } } + /* Switch to root so that Linux can handle the fault cleanly. */ + ipipe_current_domain = ipipe_root_domain; } - /* Always switch to root so that Linux can handle it cleanly. */ - ipipe_current_domain = ipipe_root_domain; - __ipipe_std_extable[vector](regs, error_code); local_irq_restore(flags); __fixup_if(regs); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-13 10:28 ` Philippe Gerum @ 2007-11-13 17:40 ` Jan Kiszka 2007-11-19 11:59 ` Philippe Gerum 2007-11-13 21:32 ` Gilles Chanteperdrix 1 sibling, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-11-13 17:40 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core [-- Attachment #1: Type: text/plain, Size: 7186 bytes --] Philippe Gerum wrote: > Jan Kiszka wrote: >> Philippe Gerum wrote: >>> Jan Kiszka wrote: >>>> Philippe Gerum wrote: >>>>> Jan Kiszka wrote: >>>>> >>>>>> Well, this is practically your original version. I still don't see why >>>>>> we want debug code in production setups (WARN_ON, e.g., doesn't work >>>>>> this way either), >>>>> Do you actually leave CONFIG_IPIPE_DEBUG on in production setups? >>>> Not /me, but your patch drags in the the full warning message even >>>> without CONFIG_IPIPE_DEBUG - if I virtually apply the patch correctly. >>>> >>> Are you 100% sure of this? >> Yep, I am. The rationale here is to keep the Linux fault path free of >> tests, unless we need them urgently also for non-debug setups. We don't, >> we just loose the context information in case the test is actually true. >> The oops will come anyway. >> > > Ok, the intent of the patch may still be unclear to you because two > changes have been merged in a single place, covering two distinct > issues, both occurring when some Xenomai context raises an exception > without any possibility for the nucleus to downgrade to the root domain. > > Issue #1 - sloppy copy_from/to_user from skins. Fixable faults may be > charged to userland, but since a current problem within skins prevents > from sending any sensible information back to the caller through normal > return codes (i.e. -EFAULT), the warning is welcome to point the finger > at the problematic syscall, only when IPIPE_DEBUG is enabled. This is > aimed at skin writers, to help them identifying issues reported by > users, when they are caused by unchecked copy_to/from_user calls: in > such a case, the first step would be to ask the user to check for its > syscall arguments, if he happens to use an unfixed Xenomai release. What > one has to update to get this help feature is only the I-pipe, not the > entire Xenomai release: this is what makes a significant difference for > deployed systems. You call this user space debug, I don't: we are only > giving user-space some support to work around our current in-kernel > sloppiness, at least temporarily. Because we have a legacy to deal with, > and out-of-tree code elsewhere, anything unexpensive which may save > everyone debug time is welcome. OK, if you insist on this, let's go for a clearer version, see attachment. > > Issue #2 - Unfixable exception from kernel space, typically from a > real-time ISR. Since this issue is _not_ recoverable through the > exception table mechanism, it has to be a severe kernel issue, and as > such, we do want a big fat warning for it, regardless of the > debug/non-debug state. This has exactly the same semantics than the > standard BUG() assertion. But, in such a case, you don't want to rely on > the Linux fault handling mechanism to eventually generate an oops later, > because the underlying context is fundamentally unsafe Linux-wise, and > the exception path may well not make it up to that point. What we want THAT is an argument I buy. > is an early warning, asap, so that at the very least, we may get a hint > about what was going on before a possible lockup. Said differently, it's > co-kernel world, and we have basically no guarantee that all the code > which is going to be executed on behalf of the regular Linux exception > handler will survive long enough to reach the oops handler. E.g. > notifier call chain invoked, hardware interrupts enabled anew in the > fault handler, scanning of the VMA list, and I'm only illustrating the > page fault handling case here, albeit practically all kind of exception > handlers may be involved. We may even want to add %eip to the initial > warning output, in case dump_stack() fails miserably -- usually, > experience shows that at least the first printk() works, fortunately. In > future updates, we may want to rely on raw UART output when the platform > provides for it (currently, all blackfin, mpc52xx and some? ARM do, IIRC). > >> See, I didn't introduce >> CONFIG_IPIPE_DEBUG for user space debugging, it has always been a kernel >> debug switch. So I would prefer to keep its semantics straight here as > well. > > Really, If you look at the facts and what has been written so far, > this patch has never been meant as a user-space debugging feature. The > fact that warnings might be issued upon user-space errors is merely an > help to better identify current kernel issues in skins, before the lack > of check for copy_to/from_user return values may generalize havoc, and > possibly trigger even more faults. This is issue #1 as described. > >>>>> This feature is mainly aimed at API writers, not users. Additionally, >>>> If you want to support API writer: Let us tag __xn_copy* with >>>> __must_check - _that_ would be the right tool for pointing out remaining >>>> issues, not some runtime warning that a) only triggers if the user >>> What do you make of existing setups which will not upgrade to 2.4 in any >>> foreseeable future? >> 99% of the very same setups won't upgrade to new kernel patches either, >> so the effect is likely negligible. >> > > I don't know where you got this figure from, but as far as I can tell, > since I'm also regularly helping people to deploy Xenomai in production > systems, they will upgrade their kernel more frequently than their > Xenomai baseline, e.g. to grab the latest kernel fixes from the stable > Linux branch, which is especially true regarding 2.6.20. At this chance, > they will much more likely upgrade their I-pipe patch rather than the > entire Xenomai release. This does not correlate with my experience, not regarding major Xenomai updates, but step-ups within the stable series. If upgrading the kernel anyway, obtaining the last Xenomai bug fixes is what I recommend and also observe (sometimes Xenomai updates are even mandatory due to required HAL updates). In any case, no one prevents us from fixing the skin issues also in the stable series. > > If you take into account that 2.6.20/x86 now includes this > feature, people maintaining 2.3.x setups will actually benefit from > it when it comes to identify issue #1. The same way, other non-x86 archs > may use very recent kernels with v2.3.x, like powerpc which may run the > latest 2.6.23. Basically, only x86 needs to upgrade to 2.4 for running > kernels beyond 2.6.20. This is why an equivalent feature is also needed > for all other archs when it makes sense. > > I will amend my latest patch to address your concern about the initial > test though, by saving the cycles your patch spends unconditionally > resetting the domain to the root one on the fast path. This will put The unconditional setting indeed only made sense as long as there was no test at all on the non-debug case. > our respective implementations roughly on par with respect to this > particular issue. > OK, attached is an attempt to follow your arguments, while also keeping topics separate which have different backgrounds (ie. clarify the messages). As we go for reporting user faults now, lets include those (unlikely and really buggy) cases over user mode as well. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux [-- Attachment #2: rework-fault-warnings.patch --] [-- Type: text/x-patch, Size: 2484 bytes --] --- arch/i386/kernel/ipipe.c | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-) Index: linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c =================================================================== --- linux-2.6.23.1-xeno.orig/arch/i386/kernel/ipipe.c +++ linux-2.6.23.1-xeno/arch/i386/kernel/ipipe.c @@ -616,12 +616,6 @@ static int __ipipe_xlate_signo[] = { }; #endif /* CONFIG_KGDB */ -#ifdef CONFIG_IPIPE_DEBUG -#define ipipe_may_dump_nonroot_fault(regs) 1 -#else -#define ipipe_may_dump_nonroot_fault(regs) (!search_exception_tables((regs)->eip)) -#endif - fastcall int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector) { unsigned long flags; @@ -655,25 +649,35 @@ fastcall int __ipipe_handle_exception(st } /* - * Detect unhandled faults from kernel space over non-root domains. + * Detect faults over non-root domains. */ - if (unlikely(!ipipe_root_domain_p && !(error_code & 4))) { - /* - * Always warn about such faults when running in debug - * mode, otherwise avoid reporting the fixable ones. - */ - if (ipipe_may_dump_nonroot_fault(regs)) { - struct ipipe_domain *ipd = ipipe_current_domain; - ipipe_current_domain = ipipe_root_domain; + if (unlikely(!ipipe_root_domain_p)) { + struct ipipe_domain *ipd = ipipe_current_domain; + + /* Switch to root so that Linux can handle it cleanly. */ + ipipe_current_domain = ipipe_root_domain; + + /* Always warn about user land and unfixable faults. */ + if ((error_code & 4) && !search_exception_tables(regs->eip)) { ipipe_trace_panic_freeze(); - printk(KERN_ERR "DOMAIN FAULT: Unhandled exception over domain" - " %s - switching to ROOT\n", ipd->name); + printk(KERN_ERR "BUG: Unhandled exception over domain" + " %s at 0x%lx - switching to ROOT\n", + ipd->name, regs->eip); dump_stack(); } - } +#ifdef CONFIG_IPIPE_DEBUG + /* Also report fixable ones when debugging is enabled. */ + else { + ipipe_trace_panic_freeze(); + printk(KERN_WARNING "WARNING: Fixable exception over " + "domain %s at 0x%lx - switching to ROOT\n", + ipd->name, regs->eip); + dump_stack(); + } +#endif /* CONFIG_IPIPE_DEBUG */ - /* Always switch to root so that Linux can handle it cleanly. */ - ipipe_current_domain = ipipe_root_domain; + ipipe_current_domain = ipipe_root_domain; + } __ipipe_std_extable[vector](regs, error_code); local_irq_restore(flags); ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-13 17:40 ` Jan Kiszka @ 2007-11-19 11:59 ` Philippe Gerum 2007-11-19 12:42 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-11-19 11:59 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Xenomai-core Jan Kiszka wrote: > > OK, attached is an attempt to follow your arguments, while also keeping > topics separate which have different backgrounds (ie. clarify the > messages). As we go for reporting user faults now, lets include those > (unlikely and really buggy) cases over user mode as well. I guess you meant this? - if ((error_code & 4) && !search_exception_tables(regs->eip)) + if ((error_code & 4) || !search_exception_tables(regs->eip)) In any case, I fully agree on the logic of your latest patch. Will commit, thanks. -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-19 11:59 ` Philippe Gerum @ 2007-11-19 12:42 ` Jan Kiszka 0 siblings, 0 replies; 17+ messages in thread From: Jan Kiszka @ 2007-11-19 12:42 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Xenomai-core Philippe Gerum wrote: > Jan Kiszka wrote: >> OK, attached is an attempt to follow your arguments, while also keeping >> topics separate which have different backgrounds (ie. clarify the >> messages). As we go for reporting user faults now, lets include those >> (unlikely and really buggy) cases over user mode as well. > > I guess you meant this? > > - if ((error_code & 4) && !search_exception_tables(regs->eip)) > + if ((error_code & 4) || !search_exception_tables(regs->eip)) Oh... yeah. > > In any case, I fully agree on the logic of your latest patch. Will > commit, thanks. > That's fine. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [Adeos-main] [PATCH] i386: switch to root domain on unhandled non-root faults 2007-11-13 10:28 ` Philippe Gerum 2007-11-13 17:40 ` Jan Kiszka @ 2007-11-13 21:32 ` Gilles Chanteperdrix 1 sibling, 0 replies; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-11-13 21:32 UTC (permalink / raw) To: rpm; +Cc: adeos-main, Jan Kiszka, Xenomai-core Philippe Gerum wrote: > (...) In > future updates, we may want to rely on raw UART output when the platform > provides for it (currently, all blackfin, mpc52xx and some? ARM do, IIRC). ARMs do have printascii, which outputs a string to the UART, it is defined from include/asm-arm/arch-*/debug-macro.S, so I guess all ARM architectures with a non-trivial debug-macro.S have a working printascii. Note that compilation of printascii depends on the option CONFIG_DEBUG_LL to be enabled in the kernel configuration. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-11-19 12:42 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-11-05 19:16 [Xenomai-core] [PATCH] i386: switch to root domain on unhandled non-root faults Jan Kiszka 2007-11-05 19:23 ` [Xenomai-core] [Adeos-main] " Jan Kiszka 2007-11-05 22:28 ` Philippe Gerum 2007-11-06 6:16 ` Jan Kiszka 2007-11-06 9:01 ` Jan Kiszka 2007-11-08 13:48 ` Jan Kiszka 2007-11-08 23:31 ` Philippe Gerum 2007-11-09 13:58 ` Jan Kiszka 2007-11-09 16:46 ` Philippe Gerum 2007-11-09 17:07 ` Jan Kiszka 2007-11-09 17:31 ` Philippe Gerum 2007-11-11 18:59 ` Jan Kiszka 2007-11-13 10:28 ` Philippe Gerum 2007-11-13 17:40 ` Jan Kiszka 2007-11-19 11:59 ` Philippe Gerum 2007-11-19 12:42 ` Jan Kiszka 2007-11-13 21:32 ` Gilles Chanteperdrix
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.