[Xenomai] Mayday issues again


* [Xenomai] Mayday issues again
@ 2015-06-20 14:34 Jan Kiszka
  2015-06-20 14:46 ` Gilles Chanteperdrix
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jan Kiszka @ 2015-06-20 14:34 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

Hi Philippe,

the mayday mechanism is causing me trouble again.

First of all, there is a bug wrt. restarting the mayday syscall. We
currently don't restart on pending signals and thus destroy the register
of the interrupted thread that contains the syscall return code. See my
for-forge branch for a fix proposal.

But actually I would like to get rid of the syscall trampoline
completely if somehow possible, at least for certain archs. The reason
is that it ruins debuggability. If a RT thread is stopped by gdb with
the help of the mayday mechanism, it ends up waiting for resumption on
the mayday page. Even worse, backtracing is broken, at least on x86.

That brings me to the key question: why do we need the syscall
trampoline? We set TIP_MAYDAY in the target task, the task causes
IPIPE_TRAP_MAYDAY to be reported via the trap hook, the hook sets the
trampoline code, the trampoline triggers the syscall, and only on
syscall return, we finally migrate the task to Linux. What prevents
doing the migration already in the trap hook, i.e. in
handle_mayday_event? The pattern seems similar to the migration we
trigger on userspace faults. And it seems to work, at least on x86,
and doesn't have the unwanted side effects.
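
To make the difference between the two paths concrete, here is a minimal
user-space sketch. It is a toy model, not Xenomai code: every name in it
(toy_thread, TOY_MAYDAY_PAGE, the toy_* functions) is hypothetical, and the
exact ordering of PC restoration and relaxing in the real code is not
modeled. It only shows where the user-space PC points at the moment the
thread gets relaxed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not Xenomai code. All names are hypothetical. */
enum toy_mode { TOY_PRIMARY, TOY_SECONDARY };

#define TOY_MAYDAY_PAGE 0xdead0000UL	/* made-up trampoline address */

struct toy_thread {
	enum toy_mode mode;
	unsigned long pc;	/* user-space program counter */
	unsigned long saved_pc;	/* PC to restore after the trampoline */
	bool mayday_pending;	/* stands in for TIP_MAYDAY */
};

/* Current scheme, step 1: the trap hook only redirects the PC. */
void toy_trap_hook_trampoline(struct toy_thread *t)
{
	assert(t->mayday_pending);
	t->saved_pc = t->pc;
	t->pc = TOY_MAYDAY_PAGE;
}

/*
 * Current scheme, step 2: the syscall issued by the trampoline relaxes
 * the thread. A debugger stopping the thread at this point finds the PC
 * on the mayday page, not in the interrupted code.
 */
void toy_mayday_syscall(struct toy_thread *t)
{
	assert(t->pc == TOY_MAYDAY_PAGE);
	t->mode = TOY_SECONDARY;
	t->mayday_pending = false;
}

/* Current scheme, step 3: the original PC comes back on resumption. */
void toy_mayday_syscall_return(struct toy_thread *t)
{
	t->pc = t->saved_pc;
}

/* Proposed scheme: relax right in the trap hook, PC left untouched. */
void toy_trap_hook_direct(struct toy_thread *t)
{
	assert(t->mayday_pending);
	t->mode = TOY_SECONDARY;
	t->mayday_pending = false;
}
```

In this model, the trampoline path leaves the relaxed thread with its PC at
TOY_MAYDAY_PAGE until it resumes, while the direct path keeps the PC in the
interrupted code, which is what a backtrace needs.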

The background of this work is improving gdb support, in particular
deterministic stopping and resuming of multi-threaded RT processes. I'm
still in the design & prototype phase, RFC patches will follow later.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-20 14:34 [Xenomai] Mayday issues again Jan Kiszka
@ 2015-06-20 14:46 ` Gilles Chanteperdrix
  2015-06-20 15:01   ` Jan Kiszka
  2015-06-20 18:15 ` Philippe Gerum
  2015-07-07  9:27 ` Philippe Gerum
  2 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2015-06-20 14:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Jan Kiszka wrote:
> Hi Philippe,
>
> the mayday mechanism is causing troubles to me again.
>
> First of all, there is a bug /wrt restarting the mayday syscall. We
> current don't restart on pending signals, thus destroy the register of
> the interrupted thread that contains the syscall return code. See my
> for-forge branch for a fix proposal.
>
> But actually I would like to get rid of the syscall trampoline
> completely if somehow possible, at least for certain archs. The reason
> is that it ruins debuggability. If a RT thread is stopped by gdb with
> the help of the mayday mechanism, it ends up waiting for resumption on
> the mayday page. Even worse, backtracing is broken, at least on x86.

The problem here seems to be mixing mayday with gdb. As its name
indicates, the mayday mechanism is something exceptional, that should not
happen during a casual debugging session with gdb.


-- 
                                            Gilles.
https://click-hack.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-20 14:46 ` Gilles Chanteperdrix
@ 2015-06-20 15:01   ` Jan Kiszka
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2015-06-20 15:01 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2015-06-20 16:46, Gilles Chanteperdrix wrote:
> 
> Jan Kiszka wrote:
>> Hi Philippe,
>>
>> the mayday mechanism is causing troubles to me again.
>>
>> First of all, there is a bug /wrt restarting the mayday syscall. We
>> current don't restart on pending signals, thus destroy the register of
>> the interrupted thread that contains the syscall return code. See my
>> for-forge branch for a fix proposal.
>>
>> But actually I would like to get rid of the syscall trampoline
>> completely if somehow possible, at least for certain archs. The reason
>> is that it ruins debuggability. If a RT thread is stopped by gdb with
>> the help of the mayday mechanism, it ends up waiting for resumption on
>> the mayday page. Even worse, backtracing is broken, at least on x86.
> 
> The problem here seems to be mixing mayday with gdb. As its name
> indicates, the mayday mechanism is something exceptional, that should not
> happen during a casual debugging session with gdb.

Yes, mayday is a misnomer today. It is also used to synchronously inject
Linux signals into the RT threads (handle_sigwake_event ->
__xnthread_kick -> ipipe_raise_mayday), like SIGSTOP.

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-20 14:34 [Xenomai] Mayday issues again Jan Kiszka
  2015-06-20 14:46 ` Gilles Chanteperdrix
@ 2015-06-20 18:15 ` Philippe Gerum
  2015-06-21 17:53   ` Jan Kiszka
  2015-07-07  9:27 ` Philippe Gerum
  2 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2015-06-20 18:15 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 06/20/2015 04:34 PM, Jan Kiszka wrote:
> Hi Philippe,
> 
> the mayday mechanism is causing troubles to me again.
> 
> First of all, there is a bug /wrt restarting the mayday syscall. We
> current don't restart on pending signals, thus destroy the register of
> the interrupted thread that contains the syscall return code. See my
> for-forge branch for a fix proposal.
> 
> But actually I would like to get rid of the syscall trampoline
> completely if somehow possible, at least for certain archs. The reason
> is that it ruins debuggability. If a RT thread is stopped by gdb with
> the help of the mayday mechanism, it ends up waiting for resumption on
> the mayday page. Even worse, backtracing is broken, at least on x86.
> 
> That brings me to the key question: why do we need the syscall
> trampoline? We set TIP_MAYDAY in the target task, the task causes
> IPIPE_TRAP_MAYDAY to be reported via the trap hook, the hook sets the
> trampoline code, the trampoline triggers the syscall, and only on
> syscall return, we finally migrate the task to Linux. What prevents
> doing the migration already in the trap hook, ie. in
> handle_mayday_event? The pattern seems similar to the migration we
> trigger on userspace faults. And it seems to works, at least for x86,
> and doesn't have the unwanted side effects.
> 
> The background of this work is improving gdb support, in particular
> deterministic stopping and resuming of multi-threaded RT processes. I'm
> still in the design & prototype phase, RFC patches will follow later.
> 

Ok. The proposed design has to cover the basic case solved by the mayday
mechanism, i.e. a runaway thread spinning into a syscall-less loop.

e.g. it should be able to switch such code back to secondary mode:

	for (;;) ;

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-20 18:15 ` Philippe Gerum
@ 2015-06-21 17:53   ` Jan Kiszka
  2015-06-21 18:57     ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-06-21 17:53 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 2015-06-20 20:15, Philippe Gerum wrote:
> On 06/20/2015 04:34 PM, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> the mayday mechanism is causing troubles to me again.
>>
>> First of all, there is a bug /wrt restarting the mayday syscall. We
>> current don't restart on pending signals, thus destroy the register of
>> the interrupted thread that contains the syscall return code. See my
>> for-forge branch for a fix proposal.
>>
>> But actually I would like to get rid of the syscall trampoline
>> completely if somehow possible, at least for certain archs. The reason
>> is that it ruins debuggability. If a RT thread is stopped by gdb with
>> the help of the mayday mechanism, it ends up waiting for resumption on
>> the mayday page. Even worse, backtracing is broken, at least on x86.
>>
>> That brings me to the key question: why do we need the syscall
>> trampoline? We set TIP_MAYDAY in the target task, the task causes
>> IPIPE_TRAP_MAYDAY to be reported via the trap hook, the hook sets the
>> trampoline code, the trampoline triggers the syscall, and only on
>> syscall return, we finally migrate the task to Linux. What prevents
>> doing the migration already in the trap hook, ie. in
>> handle_mayday_event? The pattern seems similar to the migration we
>> trigger on userspace faults. And it seems to works, at least for x86,
>> and doesn't have the unwanted side effects.
>>
>> The background of this work is improving gdb support, in particular
>> deterministic stopping and resuming of multi-threaded RT processes. I'm
>> still in the design & prototype phase, RFC patches will follow later.
>>
> 
> Ok. The proposed design has to cover the basic case solved by the mayday
> mechanism, i.e. a runaway thread spinning into a syscall-less loop.
> 
> e.g. it should be able to switch such code back to secondary mode:
> 
> 	for (;;) ;
> 

Related smokey test case passes, both on x86 and ARM. Other archs need
to be checked carefully, though. I've pushed a commit that radically
removes everything, but essentially does this:

diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
index 6110da6..85e9144 100644
--- a/kernel/cobalt/posix/process.c
+++ b/kernel/cobalt/posix/process.c
@@ -762,16 +761,9 @@ static inline int handle_exception(struct ipipe_trap_data *d)
 
 static int handle_mayday_event(struct pt_regs *regs)
 {
-	struct xnthread *thread = xnthread_current();
-	struct xnarchtcb *tcb = xnthread_archtcb(thread);
-	struct cobalt_ppd *sys_ppd;
-
-	XENO_BUG_ON(COBALT, !xnthread_test_state(thread, XNUSER));
+	XENO_BUG_ON(COBALT, !xnthread_test_state(xnthread_current(), XNUSER));
 
-	/* We enter the mayday handler with hw IRQs off. */
-	sys_ppd = cobalt_ppd_get(0);
-
-	xnarch_handle_mayday(tcb, regs, sys_ppd->mayday_tramp);
+	xnthread_relax(0, 0);
 
 	return KEVENT_PROPAGATE;
 }


Could you check this on the other archs?

Jan



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-21 17:53   ` Jan Kiszka
@ 2015-06-21 18:57     ` Philippe Gerum
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2015-06-21 18:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 06/21/2015 07:53 PM, Jan Kiszka wrote:
> On 2015-06-20 20:15, Philippe Gerum wrote:
>> On 06/20/2015 04:34 PM, Jan Kiszka wrote:
>>> Hi Philippe,
>>>
>>> the mayday mechanism is causing troubles to me again.
>>>
>>> First of all, there is a bug /wrt restarting the mayday syscall. We
>>> current don't restart on pending signals, thus destroy the register of
>>> the interrupted thread that contains the syscall return code. See my
>>> for-forge branch for a fix proposal.
>>>
>>> But actually I would like to get rid of the syscall trampoline
>>> completely if somehow possible, at least for certain archs. The reason
>>> is that it ruins debuggability. If a RT thread is stopped by gdb with
>>> the help of the mayday mechanism, it ends up waiting for resumption on
>>> the mayday page. Even worse, backtracing is broken, at least on x86.
>>>
>>> That brings me to the key question: why do we need the syscall
>>> trampoline? We set TIP_MAYDAY in the target task, the task causes
>>> IPIPE_TRAP_MAYDAY to be reported via the trap hook, the hook sets the
>>> trampoline code, the trampoline triggers the syscall, and only on
>>> syscall return, we finally migrate the task to Linux. What prevents
>>> doing the migration already in the trap hook, ie. in
>>> handle_mayday_event? The pattern seems similar to the migration we
>>> trigger on userspace faults. And it seems to works, at least for x86,
>>> and doesn't have the unwanted side effects.
>>>
>>> The background of this work is improving gdb support, in particular
>>> deterministic stopping and resuming of multi-threaded RT processes. I'm
>>> still in the design & prototype phase, RFC patches will follow later.
>>>
>>
>> Ok. The proposed design has to cover the basic case solved by the mayday
>> mechanism, i.e. a runaway thread spinning into a syscall-less loop.
>>
>> e.g. it should be able to switch such code back to secondary mode:
>>
>> 	for (;;) ;
>>
> 
> Related smokey test case passes, both on x86 and ARM. Other archs need
> to be checked carefully, though. I've pushed a commit that radically
> removes everything, but essentially does this:
> 
> diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
> index 6110da6..85e9144 100644
> --- a/kernel/cobalt/posix/process.c
> +++ b/kernel/cobalt/posix/process.c
> @@ -762,16 +761,9 @@ static inline int handle_exception(struct ipipe_trap_data *d)
>  
>  static int handle_mayday_event(struct pt_regs *regs)
>  {
> -	struct xnthread *thread = xnthread_current();
> -	struct xnarchtcb *tcb = xnthread_archtcb(thread);
> -	struct cobalt_ppd *sys_ppd;
> -
> -	XENO_BUG_ON(COBALT, !xnthread_test_state(thread, XNUSER));
> +	XENO_BUG_ON(COBALT, !xnthread_test_state(xnthread_current(), XNUSER));
>  
> -	/* We enter the mayday handler with hw IRQs off. */
> -	sys_ppd = cobalt_ppd_get(0);
> -
> -	xnarch_handle_mayday(tcb, regs, sys_ppd->mayday_tramp);
> +	xnthread_relax(0, 0);
>  
>  	return KEVENT_PROPAGATE;
>  }
> 
> 
> Could you check this on the other archs?
> 

I'll check on bfin and ppc. In the end, we won't release nios2 and sh
support for 3.0.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-06-20 14:34 [Xenomai] Mayday issues again Jan Kiszka
  2015-06-20 14:46 ` Gilles Chanteperdrix
  2015-06-20 18:15 ` Philippe Gerum
@ 2015-07-07  9:27 ` Philippe Gerum
  2015-07-07 12:53   ` Philippe Gerum
  2 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2015-07-07  9:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


Hi Jan,

On 06/20/2015 04:34 PM, Jan Kiszka wrote:
> Hi Philippe,
> 
> the mayday mechanism is causing troubles to me again.
> 
> First of all, there is a bug /wrt restarting the mayday syscall. We
> current don't restart on pending signals, thus destroy the register of
> the interrupted thread that contains the syscall return code. See my
> for-forge branch for a fix proposal.
> 
> But actually I would like to get rid of the syscall trampoline
> completely if somehow possible, at least for certain archs. The reason
> is that it ruins debuggability. If a RT thread is stopped by gdb with
> the help of the mayday mechanism, it ends up waiting for resumption on
> the mayday page. Even worse, backtracing is broken, at least on x86.
> 
> That brings me to the key question: why do we need the syscall
> trampoline? We set TIP_MAYDAY in the target task, the task causes
> IPIPE_TRAP_MAYDAY to be reported via the trap hook, the hook sets the
> trampoline code, the trampoline triggers the syscall, and only on
> syscall return, we finally migrate the task to Linux. What prevents
> doing the migration already in the trap hook, ie. in
> handle_mayday_event? The pattern seems similar to the migration we
> trigger on userspace faults. And it seems to works, at least for x86,
> and doesn't have the unwanted side effects.
> 
> The background of this work is improving gdb support, in particular
> deterministic stopping and resuming of multi-threaded RT processes. I'm
> still in the design & prototype phase, RFC patches will follow later.
> 

Ok, I had to think about it and do some testing. I now see a general
flaw in this direct relax approach, and an arch-specific roadblock too:

- we want the target thread to relax from a safe and sane location. What
if the IRQ which signals the mayday event preempts, e.g., the
xnthread_relax() prologue, or any kernel code supposed to run in primary
mode only? We would have xnthread_relax() stacking over that context,
which wouldn't be pretty. Redirecting the target thread by fixing up the
interrupt frame gives such a guarantee, by making sure that it will relax
on a regular user->kernel syscall transition asap, which is inherently safe.

- the direct relax over the mayday trap handler can't work by design on
blackfin, due to the requirement of delaying the rescheduling procedure
until the outer interrupt context is about to unwind. So if a mayday
event is generated in a nested interrupt context, xnthread_relax() will
fail to suspend the current thread immediately, delaying the operation
(kernel/cobalt/arch/blackfin/thread.c, xnarch_escalate()) which is
definitely not acceptable for relaxing. Granted, we might have a
"generic" mayday implementation living side-by-side with arch-specific
ones like blackfin's, but that would not solve the major issue above anyway.

I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
bug with the mayday handler now turning hw IRQs on, as a result of
relaxing over the low level IRQ trampoline, which makes some I-pipe call
in the irq_handler boilerplate code unhappy. The very same issue is
looming on x86, with an unprotected call to __ipipe_root_p from
__ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
is required at the very least.

Your patch assumes that when it comes to relaxing a thread, trap/fault
and IRQ contexts are equivalent, so we might relax over the latter as
well: actually they are not. Since traps and faults are synchronous
events which do not happen from kernel space for a Xenomai thread (or
something is really wrong anyway), relaxing is always safe from such
context: we must have taken the event from userland, so we can't have
preempted any kernel code. Obviously, IRQ contexts don't give such
guarantee.

If there is a backtracing issue with gdb due to the mayday indirection,
I would rather try fixing the call chain information appropriately.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-07  9:27 ` Philippe Gerum
@ 2015-07-07 12:53   ` Philippe Gerum
  2015-07-07 13:01     ` Jan Kiszka
  2015-07-08 10:31     ` Jan Kiszka
  0 siblings, 2 replies; 21+ messages in thread
From: Philippe Gerum @ 2015-07-07 12:53 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 07/07/2015 11:27 AM, Philippe Gerum wrote:
> 
> - we want the target thread to relax from a safe and sane location. What
> about the IRQ context which signals the mayday event preempting, e.g.
> xnthread_relax() prologue, or any kernel code supposed to run in primary
> mode only? We would have xnthread_relax() stacking over that context,
> this wouldn't be pretty. Redirecting the target thread by fixing up the
> interrupt frame gives such guarantee, by making sure that it will relax
> on a regular user->kernel syscall transition asap, which is inherently safe.
>

Actually, the code currently prevents mayday traps over non-user callers
to skip foreign stack contexts such as Xenomai 2.x kthreads, so this bad
scenario would not happen anyway. Besides, without such elimination the
indirect call mechanism would not fix the unsafe preemption issue
either. So, the remaining problem is with blackfin and its peculiar
requirement about rescheduling, which is a barrier to a generic mayday
handling.
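
As an illustration of the "no mayday trap over non-user callers" rule, here
is a small user-space sketch. It is a toy model under stated assumptions,
not the I-pipe code: toy_frame, toy_thread and toy_maybe_raise_mayday() are
hypothetical names, and keeping the request pending for a later exit is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not I-pipe code. All names are hypothetical. */
struct toy_frame {
	bool from_user;		/* did the event interrupt user-space code? */
};

struct toy_thread {
	bool mayday_pending;	/* a mayday request was posted */
	int traps_raised;	/* number of mayday traps delivered */
};

/*
 * Called on the interrupt exit path. The trap is raised only if the
 * interrupted frame is a user-mode one, so relaxing can never stack
 * over preempted kernel code. Otherwise the request stays pending
 * (assumption of this sketch) until a later exit to user mode.
 */
bool toy_maybe_raise_mayday(struct toy_thread *t, const struct toy_frame *f)
{
	if (!t->mayday_pending || !f->from_user)
		return false;

	t->mayday_pending = false;
	t->traps_raised++;
	return true;
}
```

With such a filter, an IRQ that preempts kernel code leaves the request
untouched, and the trap fires on the first exit that returns to user mode.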

> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
> bug with the mayday handler now turning hw IRQs on, as a result of
> relaxing over the low level IRQ trampoline, which makes some I-pipe call
> in the irq_handler boilerplate code unhappy. The very same issue is
> looming on x86, with an unprotected call to __ipipe_root_p from
> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
> is required at the very least.
> 

Looking further, ARM is affected because it does not invoke
__ipipe_call_mayday() for triggering the mayday trap, but still uses the
open-coded method. This routine preserves the current hw state across
the trap, which should make x86 safe in the end.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-07 12:53   ` Philippe Gerum
@ 2015-07-07 13:01     ` Jan Kiszka
  2015-07-07 13:24       ` Philippe Gerum
  2015-07-08 10:31     ` Jan Kiszka
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-07-07 13:01 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 2015-07-07 14:53, Philippe Gerum wrote:
> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>
>> - we want the target thread to relax from a safe and sane location. What
>> about the IRQ context which signals the mayday event preempting, e.g.
>> xnthread_relax() prologue, or any kernel code supposed to run in primary
>> mode only? We would have xnthread_relax() stacking over that context,
>> this wouldn't be pretty. Redirecting the target thread by fixing up the
>> interrupt frame gives such guarantee, by making sure that it will relax
>> on a regular user->kernel syscall transition asap, which is inherently safe.
>>
> 
> Actually, the code currently prevents mayday traps over non-user callers
> to skip foreign stack contexts such as Xenomai 2.x kthreads, so this bad
> scenario would not happen anyway. Besides, without such elimination the
> indirect call mechanism would not fix the unsafe preemption issue
> either. So, the remaining problem is with blackfin and its peculiar
> requirement about rescheduling, which is a barrier to a generic mayday
> handling.

Yes, that is the assumption I was building upon. And, BTW, faults do
happen over kernel contexts as well and cause relaxing then:
copy_to/from_user. So that has to work already.

But Blackfin was my concern as well, and you confirmed it. But how does
Linux address the need for rescheduling on IRQ return - which should be
similar to what we need for relaxing?

> 
>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>> bug with the mayday handler now turning hw IRQs on, as a result of
>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>> in the irq_handler boilerplate code unhappy. The very same issue is
>> looming on x86, with an unprotected call to __ipipe_root_p from
>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>> is required at the very least.
>>
> 
> Looking further, ARM is affected because it does not invoke
> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
> open-coded method. This routine preserves the current hw state across
> the trap, which should make x86 safe in the end.
> 

ARM is not yet properly tested, just a quick smoke test. I will
eventually look into this.

Another reason I'm trying to overcome the mayday trampoline is that it
prevents properly synchronized stopping and resuming of RT threads for
debugging purposes. I'm trying to address this requirement with a
userspace trap/irq return notifier, similar to what Linux has
(implemented on x86-only so far) but capable of hardening the context
before returning.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-07 13:01     ` Jan Kiszka
@ 2015-07-07 13:24       ` Philippe Gerum
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2015-07-07 13:24 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 07/07/2015 03:01 PM, Jan Kiszka wrote:
> On 2015-07-07 14:53, Philippe Gerum wrote:
>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>
>>> - we want the target thread to relax from a safe and sane location. What
>>> about the IRQ context which signals the mayday event preempting, e.g.
>>> xnthread_relax() prologue, or any kernel code supposed to run in primary
>>> mode only? We would have xnthread_relax() stacking over that context,
>>> this wouldn't be pretty. Redirecting the target thread by fixing up the
>>> interrupt frame gives such guarantee, by making sure that it will relax
>>> on a regular user->kernel syscall transition asap, which is inherently safe.
>>>
>>
>> Actually, the code currently prevents mayday traps over non-user callers
>> to skip foreign stack contexts such as Xenomai 2.x kthreads, so this bad
>> scenario would not happen anyway. Besides, without such elimination the
>> indirect call mechanism would not fix the unsafe preemption issue
>> either. So, the remaining problem is with blackfin and its peculiar
>> requirement about rescheduling, which is a barrier to a generic mayday
>> handling.
> 
> Yes, that is the assumption I was building upon. And, BTW, faults do
> happen over kernel contexts as well and cause relaxing then:
> copy_to/from_user. So that has to work already.
>

copy_* are specific: they should never happen over an unsafe or atomic
context by design. That context can be considered as an extension of the
plain userland context in our case.

> But Blackfin was my concern as well, and you confirmed it. But how does
> Linux address the need for rescheduling on IRQ return - which should be
> similar to what we need for relaxing?

The kernel schedules a delayed call from the IRQ/trap epilogue, which
will run at the lowest priority from vector EVT15, which guarantees the
absence of nesting, since other core events such as IRQs have higher
priority (e.g. schedule_and_signal_from_int, mach-common/entry.S).
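
The effect of that deferral can be sketched in user space as follows. This
is a toy model, not Blackfin code: it does not implement the EVT15 vector,
it only models the outcome, i.e. that the deferred work runs once the
outermost interrupt context unwinds. All names (toy_cpu, toy_irq_enter(),
toy_request_resched(), toy_irq_exit()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not Blackfin code. All names are hypothetical. */
struct toy_cpu {
	int irq_depth;		/* current interrupt nesting level */
	bool resched_pending;	/* a deferred reschedule was requested */
	int resched_runs;	/* how many times the deferred work ran */
};

void toy_irq_enter(struct toy_cpu *c)
{
	c->irq_depth++;
}

/* May be called at any nesting level; never reschedules on the spot. */
void toy_request_resched(struct toy_cpu *c)
{
	c->resched_pending = true;
}

/* The deferred work runs only when the outermost context unwinds. */
void toy_irq_exit(struct toy_cpu *c)
{
	assert(c->irq_depth > 0);
	if (--c->irq_depth == 0 && c->resched_pending) {
		c->resched_pending = false;
		c->resched_runs++;
	}
}
```

A request posted from a nested interrupt is therefore not served when that
inner interrupt returns, which is the delay that makes an immediate relax
from the mayday trap handler unworkable in this scheme.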

> 
>>
>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>> is required at the very least.
>>>
>>
>> Looking further, ARM is affected because it does not invoke
>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>> open-coded method. This routine preserves the current hw state across
>> the trap, which should make x86 safe in the end.
>>
> 
> ARM is not yet properly tested, just a quick smoke test. I will
> eventually look into this.
> 
> Another reason I'm trying to overcome the mayday trampoline is that it
> prevents properly synchronized stopping and resuming of RT threads for
> debugging purposes. I'm trying to address this requirement with a
> userspace trap/irq return notifier, similar to what Linux has
> (implemented on x86-only so far) but capable of hardening the context
> before returning.
> 
> Jan
> 


-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-07 12:53   ` Philippe Gerum
  2015-07-07 13:01     ` Jan Kiszka
@ 2015-07-08 10:31     ` Jan Kiszka
  2015-07-08 11:56       ` Philippe Gerum
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-07-08 10:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 2015-07-07 14:53, Philippe Gerum wrote:
> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>> bug with the mayday handler now turning hw IRQs on, as a result of
>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>> in the irq_handler boilerplate code unhappy. The very same issue is
>> looming on x86, with an unprotected call to __ipipe_root_p from
>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>> is required at the very least.
>>
> 
> Looking further, ARM is affected because it does not invoke
> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
> open-coded method. This routine preserves the current hw state across
> the trap, which should make x86 safe in the end.

Which kernel version are you testing? It's not reproducing on 3.14 for
Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
3.14 and 3.18 (fastcall_exit_check).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-08 10:31     ` Jan Kiszka
@ 2015-07-08 11:56       ` Philippe Gerum
  2015-07-08 12:24         ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2015-07-08 11:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 07/08/2015 12:31 PM, Jan Kiszka wrote:
> On 2015-07-07 14:53, Philippe Gerum wrote:
>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>> is required at the very least.
>>>
>>
>> Looking further, ARM is affected because it does not invoke
>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>> open-coded method. This routine preserves the current hw state across
>> the trap, which should make x86 safe in the end.
> 
> Which kernel version are you testing? It's not reproducing on 3.14 for
> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
> 3.14 and 3.18 (fastcall_exit_check).
> 

Looking at the code, any kernel version since 3.10 will have the same
issue, older ones likely too, tested on 3.18.12. This does not depend on
the ARM target.

irq_handler from entry-armv.S:
	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
MULTI_IRQ enabled)
		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
xnthread_relax() re-enables hw IRQs)
	=> __ipipe_check_root_interruptible (from irq_handler)
		BAD: __ipipe_root_p tested with CPU migration enabled
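
The difference between the open-coded notification and a call that
preserves the hw IRQ state can be sketched with a toy user-space model.
This is not I-pipe code: toy_cpu and the toy_* functions are hypothetical,
and toy_relax() merely stands in for the fact that relaxing re-enables hw
IRQs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not I-pipe code. All names are hypothetical. */
struct toy_cpu {
	bool hw_irqs_on;	/* simulated hardware interrupt flag */
};

/* Stand-in for relaxing: it returns with hw IRQs enabled. */
void toy_relax(struct toy_cpu *c)
{
	c->hw_irqs_on = true;
}

/* Open-coded notification: the caller gets back whatever relax left. */
void toy_mayday_open_coded(struct toy_cpu *c)
{
	toy_relax(c);
}

/* State-preserving notification: save the flag, restore it on exit. */
void toy_mayday_preserving(struct toy_cpu *c)
{
	bool saved = c->hw_irqs_on;

	toy_relax(c);
	c->hw_irqs_on = saved;
}

/* Models a check that is only valid with hw IRQs off. */
bool toy_check_is_safe(const struct toy_cpu *c)
{
	return !c->hw_irqs_on;
}
```

Entered with hw IRQs off, the open-coded variant returns with them on, so a
subsequent check that relies on IRQs being off is no longer safe; the
preserving variant returns in the state it was entered with.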

-- 
Philippe.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] Mayday issues again
  2015-07-08 11:56       ` Philippe Gerum
@ 2015-07-08 12:24         ` Jan Kiszka
  2015-07-08 12:29           ` Philippe Gerum
  2015-07-08 12:32           ` Gilles Chanteperdrix
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Kiszka @ 2015-07-08 12:24 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai

On 2015-07-08 13:56, Philippe Gerum wrote:
> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>> is required at the very least.
>>>>
>>>
>>> Looking further, ARM is affected because it does not invoke
>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>> open-coded method. This routine preserves the current hw state across
>>> the trap, which should make x86 safe in the end.
>>
>> Which kernel version are you testing? It's not reproducing on 3.14 for
>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>> 3.14 and 3.18 (fastcall_exit_check).
>>
> 
> Looking at the code, any kernel version since 3.10 will have the same
> issue, older ones likely too, tested on 3.18.12. This does not depend on
> the ARM target.
> 
> irq_handler from entry-armv.S:
> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
> MULTI_IRQ enabled)
> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
> xnthread_relax() re-enables hw IRQs)
> 	=> __ipipe_check_root_interruptible (from irq_handler)
> 		BAD: __ipipe_root_p tested with CPU migration enabled
> 

Maybe [1] makes the difference here? I think I had to fix this for a
different reason.

Jan

[1]
http://git.xenomai.org/ipipe-jki.git/commit/?h=for-upstream/3.14&id=4c81d6e63a4129e34b6fdc6a3854679535eed148

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:24         ` Jan Kiszka
@ 2015-07-08 12:29           ` Philippe Gerum
  2015-07-08 12:32           ` Gilles Chanteperdrix
  1 sibling, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2015-07-08 12:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 07/08/2015 02:24 PM, Jan Kiszka wrote:
> On 2015-07-08 13:56, Philippe Gerum wrote:
>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>> is required at the very least.
>>>>>
>>>>
>>>> Looking further, ARM is affected because it does not invoke
>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>> open-coded method. This routine preserves the current hw state across
>>>> the trap, which should make x86 safe in the end.
>>>
>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>> 3.14 and 3.18 (fastcall_exit_check).
>>>
>>
>> Looking at the code, any kernel version since 3.10 will have the same
>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>> the ARM target.
>>
>> irq_handler from entry-armv.S:
>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>> MULTI_IRQ enabled)
>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>> xnthread_relax() re-enables hw IRQs)
>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>
> 
> Maybe [1] makes the difference here? I think I had to fix this for a
> different reason.
>

Yes, that would make a significant difference, although the core issue
would remain.

-- 
Philippe.



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:24         ` Jan Kiszka
  2015-07-08 12:29           ` Philippe Gerum
@ 2015-07-08 12:32           ` Gilles Chanteperdrix
  2015-07-08 12:33             ` Jan Kiszka
  1 sibling, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2015-07-08 12:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
> On 2015-07-08 13:56, Philippe Gerum wrote:
> > On 07/08/2015 12:31 PM, Jan Kiszka wrote:
> >> On 2015-07-07 14:53, Philippe Gerum wrote:
> >>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
> >>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
> >>>> bug with the mayday handler now turning hw IRQs on, as a result of
> >>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
> >>>> in the irq_handler boilerplate code unhappy. The very same issue is
> >>>> looming on x86, with an unprotected call to __ipipe_root_p from
> >>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
> >>>> is required at the very least.
> >>>>
> >>>
> >>> Looking further, ARM is affected because it does not invoke
> >>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
> >>> open-coded method. This routine preserves the current hw state across
> >>> the trap, which should make x86 safe in the end.
> >>
> >> Which kernel version are you testing? It's not reproducing on 3.14 for
> >> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
> >> 3.14 and 3.18 (fastcall_exit_check).
> >>
> > 
> > Looking at the code, any kernel version since 3.10 will have the same
> > issue, older ones likely too, tested on 3.18.12. This does not depend on
> > the ARM target.
> > 
> > irq_handler from entry-armv.S:
> > 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
> > MULTI_IRQ enabled)
> > 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
> > xnthread_relax() re-enables hw IRQs)
> > 	=> __ipipe_check_root_interruptible (from irq_handler)
> > 		BAD: __ipipe_root_p tested with CPU migration enabled
> > 
> 
> Maybe [1] makes the difference here? I think I had to fix this for a
> different reason.

Except this does not work over legacy kernel thread stacks..

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:32           ` Gilles Chanteperdrix
@ 2015-07-08 12:33             ` Jan Kiszka
  2015-07-08 12:43               ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-07-08 12:33 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>> is required at the very least.
>>>>>>
>>>>>
>>>>> Looking further, ARM is affected because it does not invoke
>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>> open-coded method. This routine preserves the current hw state across
>>>>> the trap, which should make x86 safe in the end.
>>>>
>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>
>>>
>>> Looking at the code, any kernel version since 3.10 will have the same
>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>> the ARM target.
>>>
>>> irq_handler from entry-armv.S:
>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>> MULTI_IRQ enabled)
>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>> xnthread_relax() re-enables hw IRQs)
>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>
>>
>> Maybe [1] makes the difference here? I think I had to fix this for a
>> different reason.
> 
> Except this does not work over legacy kernel thread stacks..

Xenomai 2 is out of scope for these changes on mayday and for gdb
improvements.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:33             ` Jan Kiszka
@ 2015-07-08 12:43               ` Philippe Gerum
  2015-07-08 12:52                 ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2015-07-08 12:43 UTC (permalink / raw)
  To: Jan Kiszka, Gilles Chanteperdrix; +Cc: Xenomai

On 07/08/2015 02:33 PM, Jan Kiszka wrote:
> On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
>> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>>> is required at the very least.
>>>>>>>
>>>>>>
>>>>>> Looking further, ARM is affected because it does not invoke
>>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>>> open-coded method. This routine preserves the current hw state across
>>>>>> the trap, which should make x86 safe in the end.
>>>>>
>>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>>
>>>>
>>>> Looking at the code, any kernel version since 3.10 will have the same
>>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>>> the ARM target.
>>>>
>>>> irq_handler from entry-armv.S:
>>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>>> MULTI_IRQ enabled)
>>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>>> xnthread_relax() re-enables hw IRQs)
>>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>>
>>>
>>> Maybe [1] makes the difference here? I think I had to fix this for a
>>> different reason.
>>
>> Except this does not work over legacy kernel thread stacks..
> 
> Xenomai 2 is out of scope for these changes on mayday and for gdb
> improvements.
> 

Testing CONFIG_IPIPE_LEGACY for such change would be required then.

-- 
Philippe.



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:43               ` Philippe Gerum
@ 2015-07-08 12:52                 ` Jan Kiszka
  2015-07-08 13:00                   ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-07-08 12:52 UTC (permalink / raw)
  To: Philippe Gerum, Gilles Chanteperdrix; +Cc: Xenomai

On 2015-07-08 14:43, Philippe Gerum wrote:
> On 07/08/2015 02:33 PM, Jan Kiszka wrote:
>> On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
>>> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>>>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>>>> is required at the very least.
>>>>>>>>
>>>>>>>
>>>>>>> Looking further, ARM is affected because it does not invoke
>>>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>>>> open-coded method. This routine preserves the current hw state across
>>>>>>> the trap, which should make x86 safe in the end.
>>>>>>
>>>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>>>
>>>>>
>>>>> Looking at the code, any kernel version since 3.10 will have the same
>>>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>>>> the ARM target.
>>>>>
>>>>> irq_handler from entry-armv.S:
>>>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>>>> MULTI_IRQ enabled)
>>>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>>>> xnthread_relax() re-enables hw IRQs)
>>>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>>>
>>>>
>>>> Maybe [1] makes the difference here? I think I had to fix this for a
>>>> different reason.
>>>
>>> Except this does not work over legacy kernel thread stacks..
>>
>> Xenomai 2 is out of scope for these changes on mayday and for gdb
>> improvements.
>>
> 
> Testing CONFIG_IPIPE_LEGACY for such change would be required then.

Do you mean if I can access preempt_count() in
__ipipe_check_percpu_access? We are over the root domain at that point,
thus legacy RT kernel threads are implicitly excluded, no?

Otherwise, the changes affect only the Xenomai 3 code base anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Mayday issues again
  2015-07-08 12:52                 ` Jan Kiszka
@ 2015-07-08 13:00                   ` Philippe Gerum
  2015-07-08 13:04                     ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2015-07-08 13:00 UTC (permalink / raw)
  To: Jan Kiszka, Gilles Chanteperdrix; +Cc: Xenomai

On 07/08/2015 02:52 PM, Jan Kiszka wrote:
> On 2015-07-08 14:43, Philippe Gerum wrote:
>> On 07/08/2015 02:33 PM, Jan Kiszka wrote:
>>> On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
>>>> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>>>>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>>>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>>>>> is required at the very least.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking further, ARM is affected because it does not invoke
>>>>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>>>>> open-coded method. This routine preserves the current hw state across
>>>>>>>> the trap, which should make x86 safe in the end.
>>>>>>>
>>>>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>>>>
>>>>>>
>>>>>> Looking at the code, any kernel version since 3.10 will have the same
>>>>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>>>>> the ARM target.
>>>>>>
>>>>>> irq_handler from entry-armv.S:
>>>>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>>>>> MULTI_IRQ enabled)
>>>>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>>>>> xnthread_relax() re-enables hw IRQs)
>>>>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>>>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>>>>
>>>>>
>>>>> Maybe [1] makes the difference here? I think I had to fix this for a
>>>>> different reason.
>>>>
>>>> Except this does not work over legacy kernel thread stacks..
>>>
>>> Xenomai 2 is out of scope for these changes on mayday and for gdb
>>> improvements.
>>>
>>
>> Testing CONFIG_IPIPE_LEGACY for such change would be required then.
> 
> Do you mean if I can access preempt_count() in
> __ipipe_check_percpu_access? We are over the root domain at that point,
> thus legacy RT kernel threads are implicitly excluded, no?

Yes. I mean that if you intend to upstream changes that only care for
Xenomai 3, you should exclude them from legacy mode builds using
CONFIG_IPIPE_LEGACY, at least until we drop all support for Xenomai 2 in
the I-pipe for newer kernel releases.

-- 
Philippe.



* Re: [Xenomai] Mayday issues again
  2015-07-08 13:00                   ` Philippe Gerum
@ 2015-07-08 13:04                     ` Jan Kiszka
  2015-07-08 13:10                       ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2015-07-08 13:04 UTC (permalink / raw)
  To: Philippe Gerum, Gilles Chanteperdrix; +Cc: Xenomai

On 2015-07-08 15:00, Philippe Gerum wrote:
> On 07/08/2015 02:52 PM, Jan Kiszka wrote:
>> On 2015-07-08 14:43, Philippe Gerum wrote:
>>> On 07/08/2015 02:33 PM, Jan Kiszka wrote:
>>>> On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
>>>>> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>>>>>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>>>>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>>>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>>>>>> is required at the very least.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looking further, ARM is affected because it does not invoke
>>>>>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>>>>>> open-coded method. This routine preserves the current hw state across
>>>>>>>>> the trap, which should make x86 safe in the end.
>>>>>>>>
>>>>>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>>>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>>>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>>>>>
>>>>>>>
>>>>>>> Looking at the code, any kernel version since 3.10 will have the same
>>>>>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>>>>>> the ARM target.
>>>>>>>
>>>>>>> irq_handler from entry-armv.S:
>>>>>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>>>>>> MULTI_IRQ enabled)
>>>>>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>>>>>> xnthread_relax() re-enables hw IRQs)
>>>>>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>>>>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>>>>>
>>>>>>
>>>>>> Maybe [1] makes the difference here? I think I had to fix this for a
>>>>>> different reason.
>>>>>
>>>>> Except this does not work over legacy kernel thread stacks..
>>>>
>>>> Xenomai 2 is out of scope for these changes on mayday and for gdb
>>>> improvements.
>>>>
>>>
>>> Testing CONFIG_IPIPE_LEGACY for such change would be required then.
>>
>> Do you mean if I can access preempt_count() in
>> __ipipe_check_percpu_access? We are over the root domain at that point,
>> thus legacy RT kernel threads are implicitly excluded, no?
> 
> Yes. I mean that if you intend to upstream changes that only care for
> Xenomai 3, you should exclude them from legacy mode builds using
> CONFIG_IPIPE_LEGACY, at least until we drop all support for Xenomai 2 in
> the I-pipe for newer kernel releases.

Sure, if there are dependencies, I'll do this. Right now I see none,
thus I'd like to avoid cluttering up the code with #ifdefs.

What is in current for-upstream/3.14 and 3.18 is intended for upstream
as-is unless someone sees problems I missed.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Mayday issues again
  2015-07-08 13:04                     ` Jan Kiszka
@ 2015-07-08 13:10                       ` Philippe Gerum
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Gerum @ 2015-07-08 13:10 UTC (permalink / raw)
  To: Jan Kiszka, Gilles Chanteperdrix; +Cc: Xenomai

On 07/08/2015 03:04 PM, Jan Kiszka wrote:
> On 2015-07-08 15:00, Philippe Gerum wrote:
>> On 07/08/2015 02:52 PM, Jan Kiszka wrote:
>>> On 2015-07-08 14:43, Philippe Gerum wrote:
>>>> On 07/08/2015 02:33 PM, Jan Kiszka wrote:
>>>>> On 2015-07-08 14:32, Gilles Chanteperdrix wrote:
>>>>>> On Wed, Jul 08, 2015 at 02:24:41PM +0200, Jan Kiszka wrote:
>>>>>>> On 2015-07-08 13:56, Philippe Gerum wrote:
>>>>>>>> On 07/08/2015 12:31 PM, Jan Kiszka wrote:
>>>>>>>>> On 2015-07-07 14:53, Philippe Gerum wrote:
>>>>>>>>>> On 07/07/2015 11:27 AM, Philippe Gerum wrote:
>>>>>>>>>>> I tested the patch on ARM. Enabling IPIPE_DEBUG_INTERNAL there reveals a
>>>>>>>>>>> bug with the mayday handler now turning hw IRQs on, as a result of
>>>>>>>>>>> relaxing over the low level IRQ trampoline, which makes some I-pipe call
>>>>>>>>>>> in the irq_handler boilerplate code unhappy. The very same issue is
>>>>>>>>>>> looming on x86, with an unprotected call to __ipipe_root_p from
>>>>>>>>>>> __ipipe_handle_irq(). Disabling IRQs before leaving the mayday handler
>>>>>>>>>>> is required at the very least.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Looking further, ARM is affected because it does not invoke
>>>>>>>>>> __ipipe_call_mayday() for triggering the mayday trap, but still uses the
>>>>>>>>>> open-coded method. This routine preserves the current hw state across
>>>>>>>>>> the trap, which should make x86 safe in the end.
>>>>>>>>>
>>>>>>>>> Which kernel version are you testing? It's not reproducing on 3.14 for
>>>>>>>>> Cortex-A7/15 targets at least. And I find __ipipe_call_mayday in both
>>>>>>>>> 3.14 and 3.18 (fastcall_exit_check).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at the code, any kernel version since 3.10 will have the same
>>>>>>>> issue, older ones likely too, tested on 3.18.12. This does not depend on
>>>>>>>> the ARM target.
>>>>>>>>
>>>>>>>> irq_handler from entry-armv.S:
>>>>>>>> 	=> __ipipe_grab_irq (or indirectly via ipipe_handle_multi_irq with
>>>>>>>> MULTI_IRQ enabled)
>>>>>>>> 		=> __ipipe_exit_irq (open coded __ipipe_notify_trap(MAYDAY),
>>>>>>>> xnthread_relax() re-enables hw IRQs)
>>>>>>>> 	=> __ipipe_check_root_interruptible (from irq_handler)
>>>>>>>> 		BAD: __ipipe_root_p tested with CPU migration enabled
>>>>>>>>
>>>>>>>
>>>>>>> Maybe [1] makes the difference here? I think I had to fix this for a
>>>>>>> different reason.
>>>>>>
>>>>>> Except this does not work over legacy kernel thread stacks..
>>>>>
>>>>> Xenomai 2 is out of scope for these changes on mayday and for gdb
>>>>> improvements.
>>>>>
>>>>
>>>> Testing CONFIG_IPIPE_LEGACY for such change would be required then.
>>>
>>> Do you mean if I can access preempt_count() in
>>> __ipipe_check_percpu_access? We are over the root domain at that point,
>>> thus legacy RT kernel threads are implicitly excluded, no?
>>
>> Yes. I mean that if you intend to upstream changes that only care for
>> Xenomai 3, you should exclude them from legacy mode builds using
>> CONFIG_IPIPE_LEGACY, at least until we drop all support for Xenomai 2 in
>> the I-pipe for newer kernel releases.
> 
> Sure, if there are dependencies, I'll do this. Right now I see none,
> thus I'd like to avoid cluttering up the code with #ifdefs.
>

IS_ENABLED() would likely help nicely.
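
As a rough illustration of that suggestion, the sketch below contrasts
with an #ifdef block by keeping both branches visible to the compiler.
It is a userspace approximation only: the SKETCH_* macros re-create a
simplified form of the kernel's IS_ENABLED() (the real one in
<linux/kconfig.h> also handles =m options), CONFIG_IPIPE_LEGACY is
defined by hand to mimic a legacy build, and CONFIG_SKETCH_UNSET and
sketch_relax_from_trap_allowed() are hypothetical names.

```c
#include <stdbool.h>

/* Assumption for this sketch: pretend the build has legacy mode enabled. */
#define CONFIG_IPIPE_LEGACY 1
/* CONFIG_SKETCH_UNSET is deliberately left undefined (hypothetical option). */

/*
 * Simplified userspace re-creation of the kernel's IS_ENABLED() trick:
 * an option defined to 1 pastes into a placeholder that expands to "0,",
 * which shifts the literal 1 into the second argument position.
 */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED3(arg1_or_junk) SKETCH_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_DEFINED2(val) SKETCH_IS_DEFINED3(SKETCH_ARG_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED2(x)
#define IS_ENABLED(option) SKETCH_IS_DEFINED(option)

/* Hypothetical helper: may the Xenomai 3-only path be used in this build? */
static bool sketch_relax_from_trap_allowed(void)
{
	/* Both branches are parsed and type-checked in every configuration. */
	if (IS_ENABLED(CONFIG_IPIPE_LEGACY))
		return false;
	return true;
}
```

Because the condition is a compile-time constant, the compiler can drop
the unused branch while still checking it, which avoids the #ifdef
clutter mentioned in the quoted text.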

> What is in current for-upstream/3.14 and 3.18 is intended for upstream
> as-is unless someone sees problems I missed.
> 
> Jan
> 


-- 
Philippe.



Thread overview: 21+ messages
2015-06-20 14:34 [Xenomai] Mayday issues again Jan Kiszka
2015-06-20 14:46 ` Gilles Chanteperdrix
2015-06-20 15:01   ` Jan Kiszka
2015-06-20 18:15 ` Philippe Gerum
2015-06-21 17:53   ` Jan Kiszka
2015-06-21 18:57     ` Philippe Gerum
2015-07-07  9:27 ` Philippe Gerum
2015-07-07 12:53   ` Philippe Gerum
2015-07-07 13:01     ` Jan Kiszka
2015-07-07 13:24       ` Philippe Gerum
2015-07-08 10:31     ` Jan Kiszka
2015-07-08 11:56       ` Philippe Gerum
2015-07-08 12:24         ` Jan Kiszka
2015-07-08 12:29           ` Philippe Gerum
2015-07-08 12:32           ` Gilles Chanteperdrix
2015-07-08 12:33             ` Jan Kiszka
2015-07-08 12:43               ` Philippe Gerum
2015-07-08 12:52                 ` Jan Kiszka
2015-07-08 13:00                   ` Philippe Gerum
2015-07-08 13:04                     ` Jan Kiszka
2015-07-08 13:10                       ` Philippe Gerum