* [Adeos-main] Stall bit setting in __ipipe_handle_exception
@ 2009-02-20 12:26 Jan Kiszka
  2009-02-20 12:33 ` Jan Kiszka
  2009-02-23 12:03 ` Philippe Gerum
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Kiszka @ 2009-02-20 12:26 UTC
  To: Philippe Gerum; +Cc: adeos-main

Hi Philippe,

as already indicated, I'm starting to understand the ipipe bug Roman
sees. It seems to boil down to the following path:

- exception raised over non-root domain (__rt_event_wait...)
- root domain is stalled on entry to __ipipe_handle_exception
- fault-causing task is first relaxed, then scheduled away under Linux
- scheduled-in Linux task was interrupted in __ipipe_divert_exception,
  shortly before __fixup_if
- __fixup_if finds root domain stalled and propagates this to the
  register set of the interrupted context (user space task running on
  its first fpu instruction, having triggered device_not_available).
- return to user space task with irqs disabled - bang!
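
The path above can be replayed in a small user-space model. This is only
a sketch, not I-pipe code: every toy_* name, the struct layout and the
single global flag are assumptions made for illustration. It relies on
just two things, the behaviour of __fixup_if as described in the list
and the position of the x86 IF bit (0x200).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not I-pipe symbols. */
#define TOY_EFLAGS_IF 0x200UL   /* x86 interrupt-enable flag, bit 9 */

struct toy_regs {
    unsigned long flags;        /* saved flags of the interrupted context */
};

/* One global virtual interrupt mask for the root domain. */
static bool root_stalled;

/*
 * Models __fixup_if as described in the list above: whatever the root
 * stall bit is *right now* gets copied into the saved flags.
 */
static void toy_fixup_if(struct toy_regs *regs)
{
    assert(regs != NULL);
    if (root_stalled)
        regs->flags &= ~TOY_EFLAGS_IF;
    else
        regs->flags |= TOY_EFLAGS_IF;
}

/*
 * Replays the sequence. Returns true if task B would return to user
 * space with interrupts disabled.
 */
static bool toy_replay_race(void)
{
    /* Task B: user space context, interrupts enabled when it trapped. */
    struct toy_regs task_b = { .flags = TOY_EFLAGS_IF };

    root_stalled = false;
    /* Task B is preempted here, shortly before its fixup call. */

    /* Task A: exception over a non-root domain; root gets stalled on
     * entry, then A is relaxed and scheduled away with the bit set. */
    root_stalled = true;

    /* Task B is scheduled in again and picks up A's stall bit. */
    toy_fixup_if(&task_b);

    return (task_b.flags & TOY_EFLAGS_IF) == 0;
}
```

In this model toy_replay_race() reports that task B's saved flags end up
with IF cleared, although B itself never stalled the root domain.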

Two ways to approach this:
1. Do we actually have to stall the root domain in
   __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
   should be better off with doing this afterwards.
2. Prevent __ipipe_divert_exception from being interrupted and picking
   up the stall flag from a different Linux task. But I don't know
   whether there are more race windows like that.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] Stall bit setting in __ipipe_handle_exception
  2009-02-20 12:26 [Adeos-main] Stall bit setting in __ipipe_handle_exception Jan Kiszka
@ 2009-02-20 12:33 ` Jan Kiszka
  2009-02-23 12:03 ` Philippe Gerum
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kiszka @ 2009-02-20 12:33 UTC
  To: Philippe Gerum; +Cc: adeos-main

Jan Kiszka wrote:
> Hi Philippe,
> 
> as already indicated, I'm starting to understand the ipipe bug Roman
> sees. It seems to boil down to the following path:
> 
> - exception raised over non-root domain (__rt_event_wait...)
> - root domain is stalled on entry to __ipipe_handle_exception
> - fault-causing task is first relaxed, then scheduled away under Linux
> - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>   shortly before __fixup_if
> - __fixup_if finds root domain stalled and propagates this to the
>   register set of the interrupted context (user space task running on
>   its first fpu instruction, having triggered device_not_available).
> - return to user space task with irqs disabled - bang!
> 
> Two ways to approach this:
> 1. Do we actually have to stall the root domain in
>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>    should be better off with doing this afterwards.

...why we should *not* be better off...

> 2. Prevent __ipipe_divert_exception from being interrupted and picking
>    up the stall flag from a different Linux task. But I don't know
>    whether there are more race windows like that.
> 
> Jan
> 

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] Stall bit setting in __ipipe_handle_exception
  2009-02-20 12:26 [Adeos-main] Stall bit setting in __ipipe_handle_exception Jan Kiszka
  2009-02-20 12:33 ` Jan Kiszka
@ 2009-02-23 12:03 ` Philippe Gerum
  2009-02-23 12:24   ` Jan Kiszka
  1 sibling, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2009-02-23 12:03 UTC
  To: Jan Kiszka; +Cc: adeos-main

Jan Kiszka wrote:
> Hi Philippe,
> 
> as already indicated, I'm starting to understand the ipipe bug Roman
> sees. It seems to boil down to the following path:
> 
> - exception raised over non-root domain (__rt_event_wait...)
> - root domain is stalled on entry to __ipipe_handle_exception
> - fault-causing task is first relaxed, then scheduled away under Linux
> - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>   shortly before __fixup_if
> - __fixup_if finds root domain stalled and propagates this to the
>   register set of the interrupted context (user space task running on
>   its first fpu instruction, having triggered device_not_available).
> - return to user space task with irqs disabled - bang!
>

Good catch.

> Two ways to approach this:
> 1. Do we actually have to stall the root domain in
>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>    should be better off with doing this afterwards.

We do, because the root domain may install an I-pipe event handler on exceptions
as well, and the callee may assume that the virtual interrupt state is correct.
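
To illustrate the ordering argument, here is a toy model (not I-pipe
code; the toy_* names and the stall_first switch are hypothetical). A
handler standing in for a root-domain event handler records the virtual
interrupt state it observes, once with the stall applied before the
notification and once with it applied afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not I-pipe symbols. */
static bool root_stalled;          /* virtual interrupt mask of root */
static bool handler_saw_stalled;   /* what the handler observed */

typedef void (*toy_trap_handler)(void);

/* Stands in for an event handler installed by the root domain. */
static void toy_root_handler(void)
{
    handler_saw_stalled = root_stalled;
}

/*
 * Models the exception entry path with either ordering:
 * stall_first == true  -> stall root, then notify the handler
 * stall_first == false -> notify the handler, then stall root
 */
static void toy_handle_exception(toy_trap_handler handler, bool stall_first)
{
    assert(handler != NULL);
    root_stalled = false;
    if (stall_first)
        root_stalled = true;
    handler();
    if (!stall_first)
        root_stalled = true;
}

/* Returns whether the handler saw the root domain stalled. */
static bool toy_handler_sees_stall(bool stall_first)
{
    toy_handle_exception(toy_root_handler, stall_first);
    return handler_saw_stalled;
}
```

Only the stall-first ordering lets the handler see the root domain as
stalled; with the stall moved after the notification it would run with
a virtual state that does not match the exception context.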

> 2. Prevent __ipipe_divert_exception from being interrupted and picking
>    up the stall flag from a different Linux task. But I don't know
>    whether there are more race windows like that.
> 

Since the core of the issue is a preemption point that may be introduced
by a thread migration to secondary mode, the same goes for __ipipe_syscall_root;
this is what I stumbled upon in a different trace set.

The way to fix this properly is to decouple fixup_if() from the current global
interrupt state at call time, and rather make such state context-dependent, so
that iret emulation always uses the proper state value. A typical approach would
be to record the stall bit value on the caller's stack, and feed fixup_if() with it.
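
A minimal sketch of that idea, again as a user-space toy rather than the
actual patch: the toy_* names, the callback used to simulate a
preemption point, and the choice to sample the state at function entry
are all assumptions. The point is only that the fixup receives a value
held on the caller's stack instead of reading the global stall bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not I-pipe symbols. */
#define TOY_EFLAGS_IF 0x200UL   /* x86 interrupt-enable flag, bit 9 */

struct toy_regs {
    unsigned long flags;
};

static bool root_stalled;       /* global virtual interrupt mask of root */

/* The state to restore is an argument, not read from the global. */
static void toy_fixup_if(bool stalled, struct toy_regs *regs)
{
    assert(regs != NULL);
    if (stalled)
        regs->flags &= ~TOY_EFLAGS_IF;
    else
        regs->flags |= TOY_EFLAGS_IF;
}

/* Simulated preemption point: another task runs and stalls root. */
typedef void (*toy_preempt_fn)(void);

static void toy_other_task_stalls_root(void)
{
    root_stalled = true;
}

static void toy_divert_exception(struct toy_regs *regs, toy_preempt_fn preempt)
{
    /* Recorded on this context's stack before any preemption point. */
    bool saved_stall = root_stalled;

    if (preempt)
        preempt();              /* the global bit may change here */

    toy_fixup_if(saved_stall, regs);
}

/* Returns true if the preempted context keeps interrupts enabled. */
static bool toy_irqs_enabled_after_preemption(void)
{
    struct toy_regs task_b = { .flags = TOY_EFLAGS_IF };

    root_stalled = false;
    toy_divert_exception(&task_b, toy_other_task_stalls_root);
    return (task_b.flags & TOY_EFLAGS_IF) != 0;
}
```

Because saved_stall lives on the stack of the preempted context, a stall
bit set by another task in the meantime no longer leaks into the flags
that context returns with.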

> Jan
> 


-- 
Philippe.




* Re: [Adeos-main] Stall bit setting in __ipipe_handle_exception
  2009-02-23 12:03 ` Philippe Gerum
@ 2009-02-23 12:24   ` Jan Kiszka
  2009-02-23 12:50     ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2009-02-23 12:24 UTC
  To: rpm; +Cc: adeos-main

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Hi Philippe,
>>
>> as already indicated, I'm starting to understand the ipipe bug Roman
>> sees. It seems to boil down to the following path:
>>
>> - exception raised over non-root domain (__rt_event_wait...)
>> - root domain is stalled on entry to __ipipe_handle_exception
>> - fault-causing task is first relaxed, then scheduled away under Linux
>> - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>>   shortly before __fixup_if
>> - __fixup_if finds root domain stalled and propagates this to the
>>   register set of the interrupted context (user space task running on
>>   its first fpu instruction, having triggered device_not_available).
>> - return to user space task with irqs disabled - bang!
>>
> 
> Good catch.
> 
>> Two ways to approach this:
>> 1. Do we actually have to stall the root domain in
>>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>>    should be better off with doing this afterwards.
> 
> We do, because the root domain may install an I-pipe event handler on exceptions
> as well, and the callee may assume that the virtual interrupt state is correct.

But from that POV, you would have to stall all domains before calling
the hook, not just root.

> 
>> 2. Prevent __ipipe_divert_exception from being interrupted and picking
>>    up the stall flag from a different Linux task. But I don't know
>>    whether there are more race windows like that.
>>
> 
> Since the core of the issue is a preemption point that may be introduced
> by a thread migration to secondary mode, the same goes for __ipipe_syscall_root;
> this is what I stumbled upon in a different trace set.
> 
> The way to fix this properly is to decouple fixup_if() from the current global
> interrupt state at call time, and rather make such state context-dependent, so
> that iret emulation always uses the proper state value. A typical approach would
> be to record the stall bit value on the caller's stack, and feed fixup_if() with it.
> 

Didn't get yet how this should work, but I guess you've implemented it
in -06. Will check.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] Stall bit setting in __ipipe_handle_exception
  2009-02-23 12:24   ` Jan Kiszka
@ 2009-02-23 12:50     ` Philippe Gerum
  0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2009-02-23 12:50 UTC
  To: Jan Kiszka; +Cc: adeos-main

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Hi Philippe,
>>>
>>> as already indicated, I'm starting to understand the ipipe bug Roman
>>> sees. It seems to boil down to the following path:
>>>
>>> - exception raised over non-root domain (__rt_event_wait...)
>>> - root domain is stalled on entry to __ipipe_handle_exception
>>> - fault-causing task is first relaxed, then scheduled away under Linux
>>> - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>>>   shortly before __fixup_if
>>> - __fixup_if finds root domain stalled and propagates this to the
>>>   register set of the interrupted context (user space task running on
>>>   its first fpu instruction, having triggered device_not_available).
>>> - return to user space task with irqs disabled - bang!
>>>
>> Good catch.
>>
>>> Two ways to approach this:
>>> 1. Do we actually have to stall the root domain in
>>>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>>>    should be better off with doing this afterwards.
>> We do, because the root domain may install an I-pipe event handler on exceptions
>> as well, and the callee may assume that the virtual interrupt state is correct.
> 
> But from that POV, you would have to stall all domains before calling
> the hook, not just root.

Why? Non-root domains may not affect the root stall bit; that is simply
forbidden. So there is no point in making it consistent, since they may not act
upon it anyway.

> 
>>> 2. Prevent __ipipe_divert_exception from being interrupted and picking
>>>    up the stall flag from a different Linux task. But I don't know
>>>    whether there are more race windows like that.
>>>
>> Since the core of the issue is a preemption point that may be introduced
>> by a thread migration to secondary mode, the same goes for __ipipe_syscall_root;
>> this is what I stumbled upon in a different trace set.
>>
>> The way to fix this properly is to decouple fixup_if() from the current global
>> interrupt state at call time, and rather make such state context-dependent, so
>> that iret emulation always uses the proper state value. A typical approach would
>> be to record the stall bit value on the caller's stack, and feed fixup_if() with it.
>>
> 
> Didn't get yet how this should work, but I guess you've implemented it
> in -06. Will check.
> 
> Jan
> 


-- 
Philippe.


