[Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle

* [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
@ 2009-02-20 14:33 Jan Kiszka
  2009-02-23 12:04 ` [Xenomai-help] " Philippe Gerum
  2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-20 14:33 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: adeos-main

This is an attempt to fix the broken root domain state adjustment in
__ipipe_handle_exception. Patch below fixes the issues recently reported
by Roman Pisl. Also, it currently makes much more sense to me than what
we have so far.

In short, this patch propagates the hardware irq state into the root
domain's stall flag shortly before calling into the Linux handler, and
only then. This avoids spurious root domain stalls that end up over the
wrong Linux context due to context switches between entry and exit of
ipipe_handle_exception. Also, this patch drops the bogus
local_irq_save/restore pair that doesn't account for Linux irq state
changes inside its fault handler.
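
The ordering described above can be sketched as a userspace toy model.
This is only an illustration under stated assumptions: struct cpu_model,
its fields and model_handle_exception() are hypothetical stand-ins, not
I-pipe code. The point is that the hardware state is sampled on entry but
mirrored into the root stall flag only when the Linux handler is really
about to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not I-pipe API. */
struct cpu_model {
    bool hw_irqs_off;   /* stands in for irqs_disabled_hw() at fault entry */
    bool root_stalled;  /* stands in for the root domain's stall flag */
    bool linux_ran;     /* set when the modelled Linux handler is reached */
};

/*
 * Returns 1 when a hook (KGDB or a trap notifier in the patch) consumed the
 * fault, 0 when the modelled Linux handler ran.
 */
int model_handle_exception(struct cpu_model *cpu, bool hook_consumes_fault)
{
    assert(cpu);

    /* Sample the hardware state once, on entry. */
    bool irqs_disabled = cpu->hw_irqs_off;

    /* Early exit: the root stall flag has not been touched at all. */
    if (hook_consumes_fault)
        return 1;

    /* Only now mirror the sampled hardware state into the root domain. */
    if (irqs_disabled)
        cpu->root_stalled = true;

    cpu->linux_ran = true;  /* the real code calls the Linux handler here */
    return 0;
}
```

In this model an early return can never leave a stall behind, because the
flag is not written before the decision to call Linux has been made.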

But that doesn't mean that the ripped-out code may not have been there
for a reason. Maybe I overlook some corner case that regresses now. If
anyone knows such a case, please share your knowledge so that it can be
addressed and (very important!) properly documented in the code.

Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>

---
 arch/x86/kernel/ipipe.c |   34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

Index: b/arch/x86/kernel/ipipe.c
===================================================================
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -755,39 +755,18 @@ static int __ipipe_xlate_signo[] = {
 
 int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 {
-	unsigned long flags;
-
-	local_save_flags(flags);
-
-	/* Track the hw interrupt state before calling the Linux
-	 * exception handler, replicating it into the virtual mask. */
-
-	if (irqs_disabled_hw()) {
-		/* Do not trigger the alarm in ipipe_check_context() by using
-		 * plain local_irq_disable(). */
-		__ipipe_stall_root();
-		trace_hardirqs_off();
-		barrier();
-	}
+	int irqs_disabled = irqs_disabled_hw();
 
 #ifdef CONFIG_KGDB
 	/* catch exception KGDB is interested in over non-root domains */
 	if (!ipipe_root_domain_p &&
 	    __ipipe_xlate_signo[vector] >= 0 &&
-	    !kgdb_handle_exception(vector, __ipipe_xlate_signo[vector], error_code, regs)) {
-		if (flags & X86_EFLAGS_IF)
-			__clear_bit(IPIPE_STALL_FLAG,
-				    &ipipe_root_cpudom_var(status));
+	    !kgdb_handle_exception(vector, __ipipe_xlate_signo[vector], error_code, regs))
 		return 1;
-	}
 #endif /* CONFIG_KGDB */
 
-	if (unlikely(ipipe_trap_notify(vector, regs))) {
-		if (flags & X86_EFLAGS_IF)
-			__clear_bit(IPIPE_STALL_FLAG,
-				    &ipipe_root_cpudom_var(status));
+	if (unlikely(ipipe_trap_notify(vector, regs)))
 		return 1;
-	}
 
 	/* Detect unhandled faults over non-root domains. */
 
@@ -818,9 +797,14 @@ int __ipipe_handle_exception(struct pt_r
 		}
 	}
 
+	/* Track the hw interrupt state before calling the Linux
+	 * exception handler, replicating it into the virtual mask. */
+
+	if (irqs_disabled)
+		local_irq_disable();
+
 	__ipipe_std_extable[vector](regs, error_code);
 	local_irq_disable_hw();
-	local_irq_restore(flags);
 	__fixup_if(regs);
 
 	return 0;



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-20 14:33 [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception Jan Kiszka
@ 2009-02-23 12:04 ` Philippe Gerum
  2009-02-23 12:37   ` Jan Kiszka
  2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
  1 sibling, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2009-02-23 12:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-help, adeos-main

Jan Kiszka wrote:
> This is an attempt to fix the broken root domain state adjustment in
> __ipipe_handle_exception. Patch below fixes the issues recently reported
> by Roman Pisl. Also, it currently makes much more sense to me than what
> we have so far.
> 
> In short, this patch propagates the hardware irq state into the root
> domains stall flag shortly before calling into the Linux handler, and
> only then. This avoids spurious root domain stalls the end up over the
> wrong Linux context due to context switches between enter and exit of
> ipipe_handle_exception. Also, this patch drops the bogus
> local_irq_save/restore pair that doesn't account for Linux irq state
> changes inside its fault handler.
>

Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
restore the stall bit for the root domain properly there, we could end up
running with interrupts off in user-space.

However, the way the interrupt state is currently saved is wrong: we should not
local_irq_disable() over non-root domains. Here is some on-line documentation to
explain why:

The main difference between x86_32 and 64 is that the former does virtualize the
interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
not require (actually, we should not be doing) any fixup. So, to sum up:

- we use fixup_if() to restore the virtual interrupt state properly when control
is given back to the code that triggered the fault/exception (x86_32). We need
to do that because of task migrations between primary and secondary modes.

- we must clear the virtual interrupt flag before calling the I-pipe handler /
Linux regular exception handler, because our callee may/must run in the root
domain as well, and expect that interrupt state to reflect the hw one, as set by
the x86 exception gate / fault prologue in entry_*.S.

- because of the above, we must use local_irq_save()/local_irq_restore_nosync()
in our fault handler to make sure to restore the virtual interrupt flag properly
between this routine, and the exception return statement (i.e. during the Linux
fault epilogue in entry_*.S).
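
The save/restore policy summed up in the three points above can be
sketched as a userspace toy model. This is a sketch under assumptions, not
the actual I-pipe implementation: struct root_model, the handler functions
and call_with_saved_state() are hypothetical names, and the comments only
indicate which role local_irq_save() and local_irq_restore_nosync() would
play.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the root domain's virtual interrupt state (hypothetical). */
struct root_model {
    bool stalled;   /* true = virtual interrupts masked for Linux */
};

typedef void (*fault_handler_fn)(struct root_model *root);

/* A handler that unmasks virtual interrupts and leaves them that way. */
void handler_leaves_unstalled(struct root_model *root)
{
    root->stalled = false;
}

/* A handler that returns with virtual interrupts still masked. */
void handler_leaves_stalled(struct root_model *root)
{
    root->stalled = true;
}

/*
 * Save the virtual flag and stall the root domain, run the handler, then put
 * the saved flag back regardless of what the handler left behind.
 */
void call_with_saved_state(struct root_model *root, fault_handler_fn handler)
{
    assert(root && handler);

    bool saved = root->stalled;  /* role of local_irq_save() */
    root->stalled = true;        /* callee sees interrupts masked, like hw */

    handler(root);

    root->stalled = saved;       /* role of local_irq_restore_nosync() */
}
```

Whatever the handler does to the flag, the model returns with the state
found on entry, which is the behaviour argued for in the third point.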

> But that doesn't mean that the ripped-out code may not have been there
> for a reason. Maybe I oversee some corner case that regresses now. If
> anyone knows such a case, please share your knowledge so that it can be
> addressed and (very important!) properly documented in the code.
>

As mentioned earlier, ripping this code out breaks the interrupt state when an
event handler has been registered by the root domain itself.

I'm about to roll out a couple of patches that basically rewrite the exception
handling for x86*, since we had serious problems in that area, particularly in
x86_64 with CONFIG_PREEMPT enabled. They do solve the issue raised by Roman, and
I think they should also fix the issue I have been tracing lately that involves
system calls, and not page faults like in Roman's case. However, we still need
confirmation from Steven Seeger for the latter.

In any case, you discovered the logic of this brain-damaged bug, and actually
saved the day. Thanks a lot for this.

-- 
Philippe.


* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-20 14:33 [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception Jan Kiszka
  2009-02-23 12:04 ` [Xenomai-help] " Philippe Gerum
@ 2009-02-23 12:10 ` Philippe Gerum
  2009-02-23 17:21   ` Jan Kiszka
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Philippe Gerum @ 2009-02-23 12:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-help


Could anyone affected by the fs/buffer.c WARN_ON() message try at least one of
the patches below? It is an attempt to fix the exception handling bugs both for
x86_32 and x86_64.

TIA,

http://download.gna.org/adeos/patches/v2.6/x86/adeos-ipipe-2.6.27.19-x86-2.2-06.patch
http://download.gna.org/adeos/patches/v2.6/x86/adeos-ipipe-2.6.28.7-x86-2.2-06.patch


-- 
Philippe.



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:04 ` [Xenomai-help] " Philippe Gerum
@ 2009-02-23 12:37   ` Jan Kiszka
  2009-02-23 12:40     ` Jan Kiszka
  2009-02-23 12:55     ` Philippe Gerum
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 12:37 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-help, adeos-main

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> This is an attempt to fix the broken root domain state adjustment in
>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>> by Roman Pisl. Also, it currently makes much more sense to me than what
>> we have so far.
>>
>> In short, this patch propagates the hardware irq state into the root
>> domains stall flag shortly before calling into the Linux handler, and
>> only then. This avoids spurious root domain stalls the end up over the
>> wrong Linux context due to context switches between enter and exit of
>> ipipe_handle_exception. Also, this patch drops the bogus
>> local_irq_save/restore pair that doesn't account for Linux irq state
>> changes inside its fault handler.
>>
> 
> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
> restore the stall bit for the root domain properly there, we could end up
> running with interrupts off in user-space.
> 
> However, the way the interrupt state is currently saved is wrong: we should not
> local_irq_disable() over non-root domains. Here is some on-line documentation to
> explain why:
> 
> The main difference between x86_32 and 64 is that the former does virtualize the
> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
> not require (actually, we should not be doing) any fixup. So, to sum up:
> 
> - we use fixup_if() to restore the virtual interrupt state properly when control
> is given back to the code that triggered the fault/exception (x86_32). We need
> to do that because of task migrations between primary and secondary modes.
> 
> - we must clear the virtual interrupt flag before calling the I-pipe handler /
> Linux regular exception handler, because our callee may/must run in the root
> domain as well, and expect that interrupt state to reflect the hw one, as set by
> the x86 exception gate / fault prologue in entry_*.S.
> 
> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
> in our fault handler to make sure to restore the virtual interrupt flag properly
> between this routine, and the exception return statement (i.e. during the Linux
> fault epilogue in entry_*.S).

OK, if there is a reason to enforce a stalled root domain while calling
into the exception hook, this makes some sense. But I don't think it is
formally correct to save the root state on entry and blindly restore it
_after_ calling the Linux handler. I rather think we should keep the
state that Linux leaves behind to remain transparent to it. Maybe no
practical issue ATM, but it makes the code at least illogical.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:37   ` Jan Kiszka
@ 2009-02-23 12:40     ` Jan Kiszka
  2009-02-23 12:55     ` Philippe Gerum
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 12:40 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-help, adeos-main

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> This is an attempt to fix the broken root domain state adjustment in
>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>> we have so far.
>>>
>>> In short, this patch propagates the hardware irq state into the root
>>> domains stall flag shortly before calling into the Linux handler, and
>>> only then. This avoids spurious root domain stalls the end up over the
>>> wrong Linux context due to context switches between enter and exit of
>>> ipipe_handle_exception. Also, this patch drops the bogus
>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>> changes inside its fault handler.
>>>
>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>> restore the stall bit for the root domain properly there, we could end up
>> running with interrupts off in user-space.
>>
>> However, the way the interrupt state is currently saved is wrong: we should not
>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>> explain why:
>>
>> The main difference between x86_32 and 64 is that the former does virtualize the
>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>
>> - we use fixup_if() to restore the virtual interrupt state properly when control
>> is given back to the code that triggered the fault/exception (x86_32). We need
>> to do that because of task migrations between primary and secondary modes.
>>
>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>> Linux regular exception handler, because our callee may/must run in the root
>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>> the x86 exception gate / fault prologue in entry_*.S.
>>
>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>> in our fault handler to make sure to restore the virtual interrupt flag properly
>> between this routine, and the exception return statement (i.e. during the Linux
>> fault epilogue in entry_*.S).
> 
> OK, if there is a reason to enforce a stalled root domain while calling
> into the exception hook, this makes some sense. But I don't think it is
> formally correct to save the root state on entry and blindly restore it
> _after_ calling the Linux handler. I rather think we should keep the
> state that Linux leaves behind to remain transparent to it. Maybe no
> practical issue ATM, but it makes the code at least illogical.

And that reminds me: Please comment such considerations *in place*, i.e.
in the code. No one can otherwise understand why there is a
local_irq_save/restore unless (s)he browses all the postings on our
mailing lists (which tend to become outdated over time...).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:37   ` Jan Kiszka
  2009-02-23 12:40     ` Jan Kiszka
@ 2009-02-23 12:55     ` Philippe Gerum
  2009-02-23 13:36       ` [Adeos-main] " Jan Kiszka
  1 sibling, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2009-02-23 12:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-help, adeos-main

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> This is an attempt to fix the broken root domain state adjustment in
>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>> we have so far.
>>>
>>> In short, this patch propagates the hardware irq state into the root
>>> domains stall flag shortly before calling into the Linux handler, and
>>> only then. This avoids spurious root domain stalls the end up over the
>>> wrong Linux context due to context switches between enter and exit of
>>> ipipe_handle_exception. Also, this patch drops the bogus
>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>> changes inside its fault handler.
>>>
>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>> restore the stall bit for the root domain properly there, we could end up
>> running with interrupts off in user-space.
>>
>> However, the way the interrupt state is currently saved is wrong: we should not
>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>> explain why:
>>
>> The main difference between x86_32 and 64 is that the former does virtualize the
>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>
>> - we use fixup_if() to restore the virtual interrupt state properly when control
>> is given back to the code that triggered the fault/exception (x86_32). We need
>> to do that because of task migrations between primary and secondary modes.
>>
>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>> Linux regular exception handler, because our callee may/must run in the root
>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>> the x86 exception gate / fault prologue in entry_*.S.
>>
>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>> in our fault handler to make sure to restore the virtual interrupt flag properly
>> between this routine, and the exception return statement (i.e. during the Linux
>> fault epilogue in entry_*.S).
> 
> OK, if there is a reason to enforce a stalled root domain while calling
> into the exception hook, this makes some sense. But I don't think it is
> formally correct to save the root state on entry and blindly restore it
> _after_ calling the Linux handler. I rather think we should keep the
> state that Linux leaves behind to remain transparent to it. Maybe no
> practical issue ATM, but it makes the code at least illogical.
> 

Please re-read the explanations, and you will find the logic. I cannot do
anything more than re-hashing what I just said. What you perceive as illogical
is actually the only sane way to do this. Formally speaking, a linux fault
handler may NOT alter the interrupt state blindly, so we must be able to assume
that we ought to restore it the way the lower code set it.

> Jan
> 


-- 
Philippe.



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:55     ` Philippe Gerum
@ 2009-02-23 13:36       ` Jan Kiszka
  2009-02-23 17:42         ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 13:36 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> This is an attempt to fix the broken root domain state adjustment in
>>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>>> we have so far.
>>>>
>>>> In short, this patch propagates the hardware irq state into the root
>>>> domains stall flag shortly before calling into the Linux handler, and
>>>> only then. This avoids spurious root domain stalls the end up over the
>>>> wrong Linux context due to context switches between enter and exit of
>>>> ipipe_handle_exception. Also, this patch drops the bogus
>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>> changes inside its fault handler.
>>>>
>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>>> restore the stall bit for the root domain properly there, we could end up
>>> running with interrupts off in user-space.
>>>
>>> However, the way the interrupt state is currently saved is wrong: we should not
>>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>>> explain why:
>>>
>>> The main difference between x86_32 and 64 is that the former does virtualize the
>>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>>
>>> - we use fixup_if() to restore the virtual interrupt state properly when control
>>> is given back to the code that triggered the fault/exception (x86_32). We need
>>> to do that because of task migrations between primary and secondary modes.
>>>
>>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>>> Linux regular exception handler, because our callee may/must run in the root
>>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>>> the x86 exception gate / fault prologue in entry_*.S.
>>>
>>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>>> in our fault handler to make sure to restore the virtual interrupt flag properly
>>> between this routine, and the exception return statement (i.e. during the Linux
>>> fault epilogue in entry_*.S).
>> OK, if there is a reason to enforce a stalled root domain while calling
>> into the exception hook, this makes some sense. But I don't think it is
>> formally correct to save the root state on entry and blindly restore it
>> _after_ calling the Linux handler. I rather think we should keep the
>> state that Linux leaves behind to remain transparent to it. Maybe no
>> practical issue ATM, but it makes the code at least illogical.
>>
> 
> Please re-read the explanations, and you will find the logic. I cannot do
> anything more than re-hashing what I just said. What you perceive as illogical
> is actually the only sane way to do this. Formally speaking, a linux fault
> handler may NOT alter the interrupt state blindly, so we must be able to assume
> that we ought to restore it the way the lower code set it.

I got your first and second points, but they don't imply to me that the
third must be correct as well. "...to make sure to restore the virtual
interrupt flag properly" is not exactly a clear explanation (for me) of
why we have to restore the flag across calls to the _Linux_ handler. We
can demand that the hook handler leaves the root state untouched, but
requiring the same from Linux is a restriction that you don't find in
the ipipe-less case, nor do I see the reason for this under ipipe control.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
@ 2009-02-23 17:21   ` Jan Kiszka
  2009-02-23 23:32   ` Steven Seeger
  2009-02-23 23:35   ` Steven Seeger
  2 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 17:21 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-help

Philippe Gerum wrote:
> Could anyone affected by the fs/buffer.c WARN_ON() message try at least one of
> the patches below? It is an attempt to fix the exception handling bugs both for
> x86_32 and x86_64.
> 
> TIA,
> 
> http://download.gna.org/adeos/patches/v2.6/x86/adeos-ipipe-2.6.27.19-x86-2.2-06.patch
> http://download.gna.org/adeos/patches/v2.6/x86/adeos-ipipe-2.6.28.7-x86-2.2-06.patch
> 

For 2.6.28.7-x86-2.2-06 on 32 bit, I can report that Roman's test no
longer triggers an oops or a warning.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 13:36       ` [Adeos-main] " Jan Kiszka
@ 2009-02-23 17:42         ` Jan Kiszka
  2009-02-23 17:50           ` Jan Kiszka
  2009-02-23 18:57           ` Philippe Gerum
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 17:42 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> This is an attempt to fix the broken root domain state adjustment in
>>>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>>>> we have so far.
>>>>>
>>>>> In short, this patch propagates the hardware irq state into the root
>>>>> domains stall flag shortly before calling into the Linux handler, and
>>>>> only then. This avoids spurious root domain stalls the end up over the
>>>>> wrong Linux context due to context switches between enter and exit of
>>>>> ipipe_handle_exception. Also, this patch drops the bogus
>>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>>> changes inside its fault handler.
>>>>>
>>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>>>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>>>> restore the stall bit for the root domain properly there, we could end up
>>>> running with interrupts off in user-space.
>>>>
>>>> However, the way the interrupt state is currently saved is wrong: we should not
>>>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>>>> explain why:
>>>>
>>>> The main difference between x86_32 and 64 is that the former does virtualize the
>>>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>>>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>>>
>>>> - we use fixup_if() to restore the virtual interrupt state properly when control
>>>> is given back to the code that triggered the fault/exception (x86_32). We need
>>>> to do that because of task migrations between primary and secondary modes.
>>>>
>>>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>>>> Linux regular exception handler, because our callee may/must run in the root
>>>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>>>> the x86 exception gate / fault prologue in entry_*.S.
>>>>
>>>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>>>> in our fault handler to make sure to restore the virtual interrupt flag properly
>>>> between this routine, and the exception return statement (i.e. during the Linux
>>>> fault epilogue in entry_*.S).
>>> OK, if there is a reason to enforce a stalled root domain while calling
>>> into the exception hook, this makes some sense. But I don't think it is
>>> formally correct to save the root state on entry and blindly restore it
>>> _after_ calling the Linux handler. I rather think we should keep the
>>> state that Linux leaves behind to remain transparent to it. Maybe no
>>> practical issue ATM, but it makes the code at least illogical.
>>>
>> Please re-read the explanations, and you will find the logic. I cannot do
>> anything more than re-hashing what I just said. What you perceive as illogical
>> is actually the only sane way to do this. Formally speaking, a linux fault
>> handler may NOT alter the interrupt state blindly, so we must be able to assume
>> that we ought to restore it the way the lower code set it.
> 
> I got your first and second point, but they don't imply to me that the
> third shall be correct as well. "...to make sure to restore the virtual
> interrupt flag properly" is not directly an clear explanation (for me)
> why we have to restore the flag across calls to the _Linux_ handler. We
> can demand that the hook handler leaves the root state untouched, but
> requiring the same from Linux is a restriction that you don't find in
> the ipipe-less case, nor do I see the reason for this under ipipe control.
> 

To make my question a bit more concrete (and help me write the right
comments around these lines): What makes the following change bogus,
which scenario will fail?

Index: b/arch/x86/kernel/ipipe.c
===================================================================
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -685,7 +685,9 @@ int __ipipe_handle_exception(struct pt_r
 	}
 
 	__ipipe_std_extable[vector](regs, error_code);
-	local_irq_restore_nosync(flags);
+
+	__fixup_if(test_bit(IPIPE_STALL_FLAG, &ipipe_root_cpudom_var(status)),
+		   regs);
 
 	return 0;
 }

My reasoning behind this is: Once we call into the Linux handler (after
properly transferring the hardware IRQ state into a pipeline state, of
course), it's up to the Linux handler to decide about the root domain
state on exit. We really shouldn't overwrite it with what we found on
entry. That state is only to be replayed when we leave without calling
Linux.
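
The alternative reasoned about above can be sketched as a userspace toy
model. This is only a sketch under assumptions: struct root_model, struct
regs_model and call_and_keep_handler_state() are hypothetical names, and
the last step merely assumes that the proposed __fixup_if() call makes the
saved IF bit the inverse of the root stall flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the root domain state and the saved flags. */
struct root_model { bool stalled; };
struct regs_model { bool if_set; };  /* models the IF bit in saved flags */

typedef void (*fault_handler_fn)(struct root_model *root);

void handler_unstalls(struct root_model *root) { root->stalled = false; }
void handler_stalls(struct root_model *root)   { root->stalled = true; }

/*
 * Transfer the hardware state into the root domain, call the handler, and
 * keep whatever root state the handler leaves behind.
 */
void call_and_keep_handler_state(struct root_model *root,
                                 struct regs_model *regs,
                                 bool hw_irqs_off,
                                 fault_handler_fn handler)
{
    assert(root && regs && handler);

    if (hw_irqs_off)
        root->stalled = true;   /* hardware state becomes pipeline state */

    handler(root);

    /* Assumed role of __fixup_if(): IF on return mirrors the stall flag. */
    regs->if_set = !root->stalled;
}
```

Unlike a save/restore pair, nothing from fault entry is replayed here; the
state on return is decided by the handler alone.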

BTW, I'm currently failing to find the code path that enables hardware
IRQs before calling the Linux handler. There's no related change in this
particular patch, so I guess I'm just blind ATM.

Thanks for insights,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 17:42         ` Jan Kiszka
@ 2009-02-23 17:50           ` Jan Kiszka
  2009-02-23 18:57           ` Philippe Gerum
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 17:50 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main

Jan Kiszka wrote:
> ...
> BTW, I'm currently failing to find the code path that enables hardware
> IRQs before calling the Linux handler. There's no related change in this
> particular patch, so I guess I'm just blind ATM.

Found it: We have to call into do_page_fault with IRQs disabled in order
to get the right CR2. So the callees need to be patched to enable hard
IRQs again, we can't do it earlier.
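
Why the order matters can be sketched with a userspace toy model. It rests
on the assumption that a second fault taken while hardware IRQs are on
overwrites CR2; struct cpu_model and all function names are hypothetical
and only illustrate the ordering, not the real do_page_fault().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the bits of CPU state that matter here. */
struct cpu_model {
    uintptr_t cr2;      /* last faulting address, as the CPU records it */
    bool hw_irqs_on;
};

/*
 * Models a preemption window: with hardware IRQs on, some other context may
 * run, take its own page fault, and overwrite cr2.
 */
void preemption_window(struct cpu_model *cpu, uintptr_t nested_fault_addr)
{
    if (cpu->hw_irqs_on)
        cpu->cr2 = nested_fault_addr;
}

/* Safe order: fetch the address first, enable hardware IRQs afterwards. */
uintptr_t read_cr2_then_enable(struct cpu_model *cpu, uintptr_t nested)
{
    assert(cpu);
    uintptr_t address = cpu->cr2;   /* read while IRQs are still off */
    cpu->hw_irqs_on = true;
    preemption_window(cpu, nested);
    return address;
}

/* Unsafe order: the address may belong to somebody else's fault by now. */
uintptr_t enable_then_read_cr2(struct cpu_model *cpu, uintptr_t nested)
{
    assert(cpu);
    cpu->hw_irqs_on = true;
    preemption_window(cpu, nested);
    return cpu->cr2;
}
```

In the model, only the first variant returns the address of the original
fault, which matches the need to enter the handler with IRQs disabled.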

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 17:42         ` Jan Kiszka
  2009-02-23 17:50           ` Jan Kiszka
@ 2009-02-23 18:57           ` Philippe Gerum
  2009-02-23 22:20             ` Jan Kiszka
  1 sibling, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2009-02-23 18:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> This is an attempt to fix the broken root domain state adjustment in
>>>>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>>>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>>>>> we have so far.
>>>>>>
>>>>>> In short, this patch propagates the hardware irq state into the root
>>>>>> domains stall flag shortly before calling into the Linux handler, and
>>>>>> only then. This avoids spurious root domain stalls the end up over the
>>>>>> wrong Linux context due to context switches between enter and exit of
>>>>>> ipipe_handle_exception. Also, this patch drops the bogus
>>>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>>>> changes inside its fault handler.
>>>>>>
>>>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>>>>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>>>>> restore the stall bit for the root domain properly there, we could end up
>>>>> running with interrupts off in user-space.
>>>>>
>>>>> However, the way the interrupt state is currently saved is wrong: we should not
>>>>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>>>>> explain why:
>>>>>
>>>>> The main difference between x86_32 and 64 is that the former does virtualize the
>>>>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>>>>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>>>>
>>>>> - we use fixup_if() to restore the virtual interrupt state properly when control
>>>>> is given back to the code that triggered the fault/exception (x86_32). We need
>>>>> to do that because of task migrations between primary and secondary modes.
>>>>>
>>>>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>>>>> Linux regular exception handler, because our callee may/must run in the root
>>>>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>>>>> the x86 exception gate / fault prologue in entry_*.S.
>>>>>
>>>>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>>>>> in our fault handler to make sure to restore the virtual interrupt flag properly
>>>>> between this routine, and the exception return statement (i.e. during the Linux
>>>>> fault epilogue in entry_*.S).
>>>> OK, if there is a reason to enforce a stalled root domain while calling
>>>> into the exception hook, this makes some sense. But I don't think it is
>>>> formally correct to save the root state on entry and blindly restore it
>>>> _after_ calling the Linux handler. I rather think we should keep the
>>>> state that Linux leaves behind to remain transparent to it. Maybe no
>>>> practical issue ATM, but it makes the code at least illogical.
>>>>
>>> Please re-read the explanations, and you will find the logic. I cannot do
>>> anything more than re-hashing what I just said. What you perceive as illogical
>>> is actually the only sane way to do this. Formally speaking, a linux fault
>>> handler may NOT alter the interrupt state blindly, so we must be able to assume
>>> that we ought to restore it the way the lower code set it.
>> I got your first and second point, but they don't imply to me that the
>> third shall be correct as well. "...to make sure to restore the virtual
>> interrupt flag properly" is not directly a clear explanation (for me)
>> why we have to restore the flag across calls to the _Linux_ handler. We
>> can demand that the hook handler leaves the root state untouched, but
>> requiring the same from Linux is a restriction that you don't find in
>> the ipipe-less case, nor do I see the reason for this under ipipe control.
>>
> 
> To make my question a bit more concrete (and help me write the right
> comments around these lines): What makes the following change bogus,
> which scenario will fail?
> 
> Index: b/arch/x86/kernel/ipipe.c
> ===================================================================
> --- a/arch/x86/kernel/ipipe.c
> +++ b/arch/x86/kernel/ipipe.c
> @@ -685,7 +685,9 @@ int __ipipe_handle_exception(struct pt_r
>  	}
>  
>  	__ipipe_std_extable[vector](regs, error_code);
> -	local_irq_restore_nosync(flags);
> +
> +	__fixup_if(test_bit(IPIPE_STALL_FLAG, &ipipe_root_cpudom_var(status)),
> +		   regs);
>  
>  	return 0;
>  }
>


This would break the interrupt state on x86_64, because it is not virtualized by
the low-level code (latency-wise, this is not worth the burden). So your
exception path would stall the root domain, and never unstall it because you do
not have any iretq/sysretq emulation; actually, you do not have any fixup. This
would work on x86_32 for the converse reason though.
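
A toy user-space model of that difference, as a sketch only: struct toy_root
and the toy_* functions are made up for illustration and do not exist in the
I-pipe tree. The emulates_iret parameter stands for "the entry code
recomputes the stall bit on return" (x86_32), as opposed to leaving it
untouched (x86_64).

```c
#include <assert.h>
#include <stdbool.h>

/* One virtual interrupt-disable bit for the root domain. */
struct toy_root {
	bool stalled;
};

/* Stand-in for a Linux fault handler; it expects a stalled root domain. */
static void toy_linux_handler(struct toy_root *root)
{
	assert(root->stalled);
}

/* Save the stall bit on entry, restore it after the handler. */
static void toy_exception_save_restore(struct toy_root *root)
{
	bool saved = root->stalled;

	root->stalled = true;	/* mirror what the interrupt gate did in hw */
	toy_linux_handler(root);
	root->stalled = saved;
}

/* Stall for the handler, but leave the restore to the return path. */
static void toy_exception_no_restore(struct toy_root *root)
{
	root->stalled = true;
	toy_linux_handler(root);
}

/*
 * Return to the interrupted code. With iret emulation the stall bit is
 * recomputed from the saved IF bit; without it, nothing touches the bit.
 */
static void toy_return_path(struct toy_root *root, bool emulates_iret,
			    bool regs_if)
{
	if (emulates_iret)
		root->stalled = !regs_if;
}
```

With emulates_iret false, toy_exception_no_restore() leaves the root domain
stalled after the return, while toy_exception_save_restore() does not; with
emulates_iret true, both end up unstalled when the saved IF bit is set.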

Practically, here is the typical WARN_ON() you would get with your patch
applied on x86_64:

WARNING: at kernel/softirq.c:138 local_bh_enable_ip+0xab/0xe0()
Modules linked in:
Pid: 464, comm: switchtest Not tainted 2.6.28.7 #5
Call Trace:
 [<ffffffff8023d40f>] warn_on_slowpath+0x5f/0x90
 [<ffffffff80231fde>] ? __wake_up+0x4e/0x70
 [<ffffffff804474f1>] ? serial8250_handle_port+0x51/0x320
 [<ffffffff802cbc21>] ? mempool_alloc_slab+0x11/0x20
 [<ffffffff802cbd83>] ? mempool_alloc+0x53/0x130
 [<ffffffff8024373b>] local_bh_enable_ip+0xab/0xe0
 [<ffffffff80554f59>] _spin_unlock_bh+0x19/0x20
 [<ffffffff80524445>] xprt_prepare_transmit+0x85/0xc0
 [<ffffffff805221c2>] call_transmit+0x42/0x2a0
 [<ffffffff80529db2>] __rpc_execute+0xa2/0x290
 [<ffffffff80529fc8>] rpc_execute+0x28/0x30
 [<ffffffff80522f57>] rpc_run_task+0x37/0x80
 [<ffffffff8052309d>] rpc_call_sync+0x3d/0x60
 [<ffffffff803852ba>] nfs_proc_getattr+0x4a/0x90
 [<ffffffff8037ceaa>] __nfs_revalidate_inode+0xda/0x220
 [<ffffffff80222d0e>] ? __ipipe_handle_irq+0x11e/0x2d0
 [<ffffffff8020c9d6>] ? common_interrupt+0x66/0x82
 [<ffffffff8037d0a7>] nfs_revalidate_inode+0x37/0x60
 [<ffffffff8037d50f>] nfs_getattr+0xcf/0x130
 [<ffffffff802fef50>] vfs_getattr+0x20/0x40
 [<ffffffff802ff70a>] vfs_fstat+0x3a/0x60
 [<ffffffff802ff74f>] sys_newfstat+0x1f/0x40
 [<ffffffff80554a52>] ? __ipipe_syscall_root_thunk+0x35/0x6a
 [<ffffffff8020c40f>] system_call_fastpath+0x16/0x1b
---[ end trace 032fc619f80159ff ]---

> My reasoning behind is: Once we call into the Linux handler (after
> properly transferring the hardware IRQ state into a pipeline state, of
> course), it's up to the Linux handler to decide about the root domain
> state on exit. We really shouldn't overwrite it with what we found on
> entry. That state is only to be replayed when we leave without calling
> Linux.
>

As a matter of fact, we do own the virtual interrupt state, the regular low
level code in entry_* does not. Keeping it intact is a requirement of the
pipeline, hence the x86_64 behaviour for instance. In that case, the underlying
I-pipe code does assume that nobody is going to mess with that state; what the
regular Linux handlers think of the current interrupt state is irrelevant, we
only have to make it compatible with their logic on entry (e.g. stall the root
domain on page fault, because this is what an interrupt gate does with the hw
flag). If something goes wrong with the low level code upon return from the
exception handler because of the virtual state, we are the ones to blame,
because we do control it fully.

> BTW, I'm currently failing to find the code path that enables hardware
> IRQs before calling the Linux handler. There's no related change in this
> particular patch, so I guess I'm just blind ATM.
> 
> Thanks for insights,
> Jan
> 


-- 
Philippe.



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 18:57           ` Philippe Gerum
@ 2009-02-23 22:20             ` Jan Kiszka
  2009-02-24  9:03               ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2009-02-23 22:20 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main


Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe Gerum wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> This is an attempt to fix the broken root domain state adjustment in
>>>>>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>>>>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>>>>>> we have so far.
>>>>>>>
>>>>>>> In short, this patch propagates the hardware irq state into the root
>>>>>>> domain's stall flag shortly before calling into the Linux handler, and
>>>>>>> only then. This avoids spurious root domain stalls that end up over the
>>>>>>> wrong Linux context due to context switches between enter and exit of
>>>>>>> ipipe_handle_exception. Also, this patch drops the bogus
>>>>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>>>>> changes inside its fault handler.
>>>>>>>
>>>>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>>>>>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>>>>>> restore the stall bit for the root domain properly there, we could end up
>>>>>> running with interrupts off in user-space.
>>>>>>
>>>>>> However, the way the interrupt state is currently saved is wrong: we should not
>>>>>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>>>>>> explain why:
>>>>>>
>>>>>> The main difference between x86_32 and 64 is that the former does virtualize the
>>>>>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>>>>>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>>>>>
>>>>>> - we use fixup_if() to restore the virtual interrupt state properly when control
>>>>>> is given back to the code that triggered the fault/exception (x86_32). We need
>>>>>> to do that because of task migrations between primary and secondary modes.
>>>>>>
>>>>>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>>>>>> Linux regular exception handler, because our callee may/must run in the root
>>>>>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>>>>>> the x86 exception gate / fault prologue in entry_*.S.
>>>>>>
>>>>>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>>>>>> in our fault handler to make sure to restore the virtual interrupt flag properly
>>>>>> between this routine, and the exception return statement (i.e. during the Linux
>>>>>> fault epilogue in entry_*.S).
>>>>> OK, if there is a reason to enforce a stalled root domain while calling
>>>>> into the exception hook, this makes some sense. But I don't think it is
>>>>> formally correct to save the root state on entry and blindly restore it
>>>>> _after_ calling the Linux handler. I rather think we should keep the
>>>>> state that Linux leaves behind to remain transparent to it. Maybe no
>>>>> practical issue ATM, but it makes the code at least illogical.
>>>>>
>>>> Please re-read the explanations, and you will find the logic. I cannot do
>>>> anything more than re-hashing what I just said. What you perceive as illogical
>>>> is actually the only sane way to do this. Formally speaking, a linux fault
>>>> handler may NOT alter the interrupt state blindly, so we must be able to assume
>>>> that we ought to restore it the way the lower code set it.
>>> I got your first and second point, but they don't imply to me that the
>>> third shall be correct as well. "...to make sure to restore the virtual
>>> interrupt flag properly" is not directly a clear explanation (for me)
>>> why we have to restore the flag across calls to the _Linux_ handler. We
>>> can demand that the hook handler leaves the root state untouched, but
>>> requiring the same from Linux is a restriction that you don't find in
>>> the ipipe-less case, nor do I see the reason for this under ipipe control.
>>>
>> To make my question a bit more concrete (and help me write the right
>> comments around these lines): What makes the following change bogus,
>> which scenario will fail?
>>
>> Index: b/arch/x86/kernel/ipipe.c
>> ===================================================================
>> --- a/arch/x86/kernel/ipipe.c
>> +++ b/arch/x86/kernel/ipipe.c
>> @@ -685,7 +685,9 @@ int __ipipe_handle_exception(struct pt_r
>>  	}
>>  
>>  	__ipipe_std_extable[vector](regs, error_code);
>> -	local_irq_restore_nosync(flags);
>> +
>> +	__fixup_if(test_bit(IPIPE_STALL_FLAG, &ipipe_root_cpudom_var(status)),
>> +		   regs);
>>  
>>  	return 0;
>>  }
>>
> 
> 
> This would break the interrupt state on x86_64, because it is not virtualized by

Hmm, __fixup_if is void on x86-64, so this was way off what I was trying
to express.

> the low level code (latency wise, this is not worth the burden). So your
> exception path would stall the root domain, and never unstall it because you do
> not have any iretq/sysretq emulation; actually, you do not have any fixup. This
> would work on x86_32 for the converse reason though.

OK, I finally understood this difference. (I guess it comes from a
different code structure of entry_32.S compared to entry_64.S, right? So
nothing we could unify on our own?)

To clarify this for me: For 32 bit, the pipeline state after iret/sysret
is calculated in the entry layer (in __ipipe_unstall_iret_root more
precisely), so the local_irq_restore_nosync is actually of minor
importance here, isn't it? Couldn't we skip it for this arch then?

On 64 bit, we have to set the right pipeline state before returning to
the entry layer because it won't be touched there at all. We currently
do this based on the state found on exception entry, but we could also
do it based on the regs.flags state that the Linux handler left behind
(like we do on 32 bit). But the scenario I had in mind where this would
actually make a difference turned out to be a red herring. I don't think
Linux modifies regs.flags in its exception handlers. The ipipe pattern
remains inconsistent IMHO, but it is practically irrelevant.


Final question to explain the __fixup_if in __ipipe_handle_exception:
That's due to the scenario where we migrate to the root domain while
running the notify handler? We may return from that migration with some
IF state in regs.flags that no longer matches the one found on exception
entry, correct?

Will stuff all this into a few lines of comments soon.

Thanks for your patience,
Jan



* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
  2009-02-23 17:21   ` Jan Kiszka
@ 2009-02-23 23:32   ` Steven Seeger
  2009-02-23 23:35   ` Steven Seeger
  2 siblings, 0 replies; 15+ messages in thread
From: Steven Seeger @ 2009-02-23 23:32 UTC (permalink / raw)
  To: rpm, Jan Kiszka; +Cc: xenomai-help

I ran my old FPU test on this just for fun and got the following OOPS:

Eeek! Page_mapcount(page) went negative! (-1)
Page pfn = 0
Page->flags = 14
Page->count = ffffffff
Page->mapping = 00000000
Vm->vm_ops = 0
BUG at mm/rmap.c:684

In page_remove_rmap

I also see "fixing recursive fault but reboot needed"

Steven




* Re: [Xenomai-help] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
  2009-02-23 17:21   ` Jan Kiszka
  2009-02-23 23:32   ` Steven Seeger
@ 2009-02-23 23:35   ` Steven Seeger
  2 siblings, 0 replies; 15+ messages in thread
From: Steven Seeger @ 2009-02-23 23:35 UTC (permalink / raw)
  To: rpm, Jan Kiszka; +Cc: xenomai-help

I also see "BUG: scheduling while atomic: init/951/0x10000002"

I also saw the same message from an ls process that crashed.

Steven



* Re: [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception
  2009-02-23 22:20             ` Jan Kiszka
@ 2009-02-24  9:03               ` Philippe Gerum
  0 siblings, 0 replies; 15+ messages in thread
From: Philippe Gerum @ 2009-02-24  9:03 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main

Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> This is an attempt to fix the broken root domain state adjustment in
>>>>>>>> __ipipe_handle_exception. Patch below fixes the issues recently reported
>>>>>>>> by Roman Pisl. Also, it currently makes much more sense to me than what
>>>>>>>> we have so far.
>>>>>>>>
>>>>>>>> In short, this patch propagates the hardware irq state into the root
>>>>>>>> domain's stall flag shortly before calling into the Linux handler, and
>>>>>>>> only then. This avoids spurious root domain stalls that end up over the
>>>>>>>> wrong Linux context due to context switches between enter and exit of
>>>>>>>> ipipe_handle_exception. Also, this patch drops the bogus
>>>>>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>>>>>> changes inside its fault handler.
>>>>>>>>
>>>>>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given that we
>>>>>>> don't branch to any sysretq/iretq emulation unlike with x86_32. So if we don't
>>>>>>> restore the stall bit for the root domain properly there, we could end up
>>>>>>> running with interrupts off in user-space.
>>>>>>>
>>>>>>> However, the way the interrupt state is currently saved is wrong: we should not
>>>>>>> local_irq_disable() over non-root domains. Here is some on-line documentation to
>>>>>>> explain why:
>>>>>>>
>>>>>>> The main difference between x86_32 and 64 is that the former does virtualize the
>>>>>>> interrupt state in entry_32.S, unlike the latter. For that reason, x86_64 does
>>>>>>> not require (actually, we should not be doing) any fixup. So, to sum up:
>>>>>>>
>>>>>>> - we use fixup_if() to restore the virtual interrupt state properly when control
>>>>>>> is given back to the code that triggered the fault/exception (x86_32). We need
>>>>>>> to do that because of task migrations between primary and secondary modes.
>>>>>>>
>>>>>>> - we must clear the virtual interrupt flag before calling the I-pipe handler /
>>>>>>> Linux regular exception handler, because our callee may/must run in the root
>>>>>>> domain as well, and expect that interrupt state to reflect the hw one, as set by
>>>>>>> the x86 exception gate / fault prologue in entry_*.S.
>>>>>>>
>>>>>>> - because of the above, we must use local_irq_save()/local_irq_restore_nosync()
>>>>>>> in our fault handler to make sure to restore the virtual interrupt flag properly
>>>>>>> between this routine, and the exception return statement (i.e. during the Linux
>>>>>>> fault epilogue in entry_*.S).
>>>>>> OK, if there is a reason to enforce a stalled root domain while calling
>>>>>> into the exception hook, this makes some sense. But I don't think it is
>>>>>> formally correct to save the root state on entry and blindly restore it
>>>>>> _after_ calling the Linux handler. I rather think we should keep the
>>>>>> state that Linux leaves behind to remain transparent to it. Maybe no
>>>>>> practical issue ATM, but it makes the code at least illogical.
>>>>>>
>>>>> Please re-read the explanations, and you will find the logic. I cannot do
>>>>> anything more than re-hashing what I just said. What you perceive as illogical
>>>>> is actually the only sane way to do this. Formally speaking, a linux fault
>>>>> handler may NOT alter the interrupt state blindly, so we must be able to assume
>>>>> that we ought to restore it the way the lower code set it.
>>>> I got your first and second point, but they don't imply to me that the
>>>> third shall be correct as well. "...to make sure to restore the virtual
>>>> interrupt flag properly" is not directly a clear explanation (for me)
>>>> why we have to restore the flag across calls to the _Linux_ handler. We
>>>> can demand that the hook handler leaves the root state untouched, but
>>>> requiring the same from Linux is a restriction that you don't find in
>>>> the ipipe-less case, nor do I see the reason for this under ipipe control.
>>>>
>>> To make my question a bit more concrete (and help me write the right
>>> comments around these lines): What makes the following change bogus,
>>> which scenario will fail?
>>>
>>> Index: b/arch/x86/kernel/ipipe.c
>>> ===================================================================
>>> --- a/arch/x86/kernel/ipipe.c
>>> +++ b/arch/x86/kernel/ipipe.c
>>> @@ -685,7 +685,9 @@ int __ipipe_handle_exception(struct pt_r
>>>  	}
>>>  
>>>  	__ipipe_std_extable[vector](regs, error_code);
>>> -	local_irq_restore_nosync(flags);
>>> +
>>> +	__fixup_if(test_bit(IPIPE_STALL_FLAG, &ipipe_root_cpudom_var(status)),
>>> +		   regs);
>>>  
>>>  	return 0;
>>>  }
>>>
>>
>> This would break the interrupt state on x86_64, because it is not virtualized by
> 
> Hmm, __fixup_if is void on x86-64, so this was way off what I was trying
> to express.
> 
>> the low level code (latency wise, this is not worth the burden). So your
>> exception path would stall the root domain, and never unstall it because you do
>> not have any iretq/sysretq emulation; actually, you do not have any fixup. This
>> would work on x86_32 for the converse reason though.
> 
> OK, I finally understood this difference. (I guess it comes from a
> different code structure of entry_32.S compared to entry_64.S, right? So
> nothing we could unify on our own?)
>

Exactly, or at least something that would be overkill to unify when it comes to
full virtualization of the interrupt state in entry_64.S, for basically no gain,
and maybe worse performance.

> To clarify this for me: For 32 bit, the pipeline state after iret/sysret
> is calculated in the entry layer (in __ipipe_unstall_iret_root more
> precisely), so the local_irq_restore_nosync is actually of minor
> importance here, isn't it? Couldn't we skip it for this arch then?
> 

Yes, this would work with the current upstream code; but at the same time, not
restoring the stall bit the way we got it on entry of the exception handler
would break the assumption the I-pipe may make in further revisions that no
routine will interfere with the root domain state.

> On 64 bit, we have to set the right pipeline state before returning to
> the entry layer because it won't be touched there at all. We currently
> do this based on the state found on exception entry, but we could also
> do it based on the regs.flags state that the Linux handler left behind
> (like we do on 32 bit). But the scenario I had in mind where this would
> actually make a difference turned out to be a red herring. I don't think
> Linux modifies regs.flags in its exception handlers. The ipipe pattern
> remains inconsistent IMHO, but it is practically irrelevant.
> 

I don't think it is inconsistent, it is just about defining who owns the
interrupt state. When the I-pipe is enabled, the fundamental design choice is
that our low level code has precedence over what the upstream code does.
Obviously, we try to do that in a way that keeps Linux happy, but if you
consider the way x86_64 works for us, we actually choose to enforce interrupt
protection using the hw state like in the I-pipe-less case, instead of relying
on a virtualized state while running the epilogue code, with some exceptions
like calling preempt_schedule_irq() with (virtualized) interrupts off.

> 
> Final question to explain the __fixup_if in __ipipe_handle_exception:
> That's due to the scenario where we migrate to the root domain while
> running the notify handler? We may return from that migration with some
> IF state in regs.flags that no longer matches the one found on exception
> entry, correct?

Correct.
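
For the comment block, a minimal sketch of what such a fixup amounts to,
assuming it only has to make the saved IF bit agree with the root stall bit
after a migration. struct toy_regs and toy_fixup_if() are illustrative
stand-ins, not the real pt_regs or the real __fixup_if; the only hard fact
used is that IF is bit 9 (0x200) of EFLAGS.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EFLAGS_IF 0x200UL	/* bit 9 of EFLAGS: interrupt enable flag */

/* Minimal stand-in for struct pt_regs; only the saved flags matter here. */
struct toy_regs {
	unsigned long flags;
};

/*
 * Rewrite the saved IF bit from the root stall bit, so that the return
 * path sees a frame that agrees with the virtual interrupt state.
 */
static void toy_fixup_if(bool root_stalled, struct toy_regs *regs)
{
	assert(regs);

	if (root_stalled)
		regs->flags &= ~TOY_EFLAGS_IF;
	else
		regs->flags |= TOY_EFLAGS_IF;
}
```

If the task migrated to the root domain inside the notify handler and came
back with the root domain stalled, calling toy_fixup_if(true, &regs) clears
IF in the saved frame; every other flag bit is left alone.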

> 
> Will stuff all this into a few lines of comments soon.
> 
> Thanks for your patience,
> Jan
> 


-- 
Philippe.


Thread overview: 15+ messages
2009-02-20 14:33 [Adeos-main] [PATCH] x86: Proper root domain state management for ipipe_handle_exception Jan Kiszka
2009-02-23 12:04 ` [Xenomai-help] " Philippe Gerum
2009-02-23 12:37   ` Jan Kiszka
2009-02-23 12:40     ` Jan Kiszka
2009-02-23 12:55     ` Philippe Gerum
2009-02-23 13:36       ` [Adeos-main] " Jan Kiszka
2009-02-23 17:42         ` Jan Kiszka
2009-02-23 17:50           ` Jan Kiszka
2009-02-23 18:57           ` Philippe Gerum
2009-02-23 22:20             ` Jan Kiszka
2009-02-24  9:03               ` Philippe Gerum
2009-02-23 12:10 ` [Xenomai-help] " Philippe Gerum
2009-02-23 17:21   ` Jan Kiszka
2009-02-23 23:32   ` Steven Seeger
2009-02-23 23:35   ` Steven Seeger
