* [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
@ 2010-11-15 19:31 Jan Kiszka
2010-11-15 19:54 ` Jan Kiszka
2010-11-15 20:20 ` Philippe Gerum
0 siblings, 2 replies; 6+ messages in thread
From: Jan Kiszka @ 2010-11-15 19:31 UTC (permalink / raw)
To: Philippe Gerum; +Cc: adeos-main
Hi Philippe,
debugging some variant of I-pipe over an x86-32 target, I think I found
some fairly old flaw in the IRQ virtualization that causes rescheduling
delays (up to deadlocks) for Linux:
- we are in sysenter_tail (other exit paths should be affected as well)
- we DISABLE_INTERRUPTS, but only virtually
- we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
- an IRQ for Linux arrives, it is pushed to the backlog
- __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
return to have IF set (obviously, we return from a syscall)
- the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
work on return as __ipipe_sync_stage set the stall flag for the Linux
domain before calling the handler
- but now the preempted sysenter return also does no reschedule as it
already passed the check - bang!
Another variant of this Linux rescheduling issue:
- we are in a lengthy loop inside the kernel, but we are preemptible
most of the time
- after disabling Linux IRQs briefly, we are calling
local_irq_enable() again
- in the meantime, we received a Linux IRQ which is now pending in the
backlog
- __ipipe_unstall_root triggers __ipipe_sync_stage
- Linux handler is called, sets NEED_RESCHED but does not reschedule
(see above)
- we do not test for resched again as we are not returning to user
space, and that for quite some time - bang!
I think both issues are only related to virtualizing DISABLE_INTERRUPTS
for entry_32.S and I wonder if this doesn't finally qualify for a switch
to the 64-bit model. Or do you see simpler fixes?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
2010-11-15 19:31 [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more Jan Kiszka
@ 2010-11-15 19:54 ` Jan Kiszka
2010-11-15 20:53 ` Philippe Gerum
2010-11-15 20:20 ` Philippe Gerum
1 sibling, 1 reply; 6+ messages in thread
From: Jan Kiszka @ 2010-11-15 19:54 UTC (permalink / raw)
To: Philippe Gerum; +Cc: adeos-main
[-- Attachment #1: Type: text/plain, Size: 2229 bytes --]
Am 15.11.2010 20:31, Jan Kiszka wrote:
> Hi Philippe,
>
> debugging some variant of I-pipe over an x86-32 target, I think I found
> some fairly old flaw in the IRQ virtualization that causes rescheduling
> delays (up to deadlocks) for Linux:
>
> - we are in sysenter_tail (other exit paths should be affected as well)
> - we DISABLE_INTERRUPTS, but only virtually
> - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
> - an IRQ for Linux arrives, it is pushed to the backlog
> - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
> return to have IF set (obviously, we return from a syscall)
> - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
> work on return as __ipipe_sync_stage set the stall flag for the Linux
> domain before calling the handler
> - but now the preempted sysenter return also does no reschedule as it
> already passed the check - bang!
>
> Another variant of this Linux rescheduling issue:
>
> - we are in a lengthy loop inside the kernel, but we are preemptible
> most of the time
> - after disabling Linux IRQs briefly, we are calling
> local_irq_enable() again
> - in the meantime, we received a Linux IRQ which is now pending in the
> backlog
> - __ipipe_unstall_root triggers __ipipe_sync_stage
> - Linux handler is called, sets NEED_RESCHED but does not reschedule
> (see above)
> - we do not test for resched again as we are not returning to user
> space, and that for quite some time - bang!
And this one actually affects x86-64 as well: Here, ret_from_intr checks
for NEED_RESCHED only if IF is set in the flags of the preempted
context. But if the Linux domain is alone, we enter __ipipe_do_root_xirq
with hard IRQs disabled, thus we push this information incorrectly on
the Linux handler stack, preventing the required rescheduling check.
I guess the simplest fix for this is to drop the
"!__ipipe_pipeline_head_p(ipd)" check from __ipipe_sync_stage.
>
> I think both issues are only related to virtualizing DISABLE_INTERRUPTS
> for entry_32.S and I wonder if this doesn't finally qualify for a switch
> to the 64-bit model. Or do you see simpler fixes?
>
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
2010-11-15 19:31 [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more Jan Kiszka
2010-11-15 19:54 ` Jan Kiszka
@ 2010-11-15 20:20 ` Philippe Gerum
2010-11-15 22:20 ` Jan Kiszka
1 sibling, 1 reply; 6+ messages in thread
From: Philippe Gerum @ 2010-11-15 20:20 UTC (permalink / raw)
To: Jan Kiszka; +Cc: adeos-main
On Mon, 2010-11-15 at 20:31 +0100, Jan Kiszka wrote:
> Hi Philippe,
>
> debugging some variant of I-pipe over an x86-32 target, I think I found
> some fairly old flaw in the IRQ virtualization that causes rescheduling
> delays (up to deadlocks) for Linux:
>
> - we are in sysenter_tail (other exit paths should be affected as well)
> - we DISABLE_INTERRUPTS, but only virtually
> - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
> - an IRQ for Linux arrives, it is pushed to the backlog
> - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
> return to have IF set (obviously, we return from a syscall)
> - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
> work on return as __ipipe_sync_stage set the stall flag for the Linux
> domain before calling the handler
> - but now the preempted sysenter return also does no reschedule as it
> already passed the check - bang!
Ouch. You must have had a really busy Monday to find this one.
>
> Another variant of this Linux rescheduling issue:
>
> - we are in a lengthy loop inside the kernel, but we are preemptible
> most of the time
> - after disabling Linux IRQs briefly, we are calling
> local_irq_enable() again
> - in the meantime, we received a Linux IRQ which is now pending in the
> backlog
> - __ipipe_unstall_root triggers __ipipe_sync_stage
> - Linux handler is called, sets NEED_RESCHED but does not reschedule
> (see above)
> - we do not test for resched again as we are not returning to user
> space, and that for quite some time - bang!
>
> I think both issues are only related to virtualizing DISABLE_INTERRUPTS
> for entry_32.S and I wonder if this doesn't finally qualify for a switch
> to the 64-bit model. Or do you see simpler fixes?
>
We could probably use hw masking from sysenter_tail and on, but quite
frankly, I think this time, enough is enough and this bug calls for a
radical fix, which is indeed getting rid of interrupt virtualization in
the kernel entry/exit paths for x86_32, which no other arch ever
implemented anyway.
The decision to virtualize there as well was taken circa 2.4.18, when
upstream did not care that much about latency yet. Things have changed,
and there is no more reason to virtualize interrupts in very short
critical sections, at the expense of a lot more complexity.
- __ipipe_unstall_iret_root
- __ipipe_kpreempt_root
and much of the nonsense we do to track linux's interrupt state would go
away.
--
Philippe.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
2010-11-15 19:54 ` Jan Kiszka
@ 2010-11-15 20:53 ` Philippe Gerum
2010-11-15 22:23 ` Jan Kiszka
0 siblings, 1 reply; 6+ messages in thread
From: Philippe Gerum @ 2010-11-15 20:53 UTC (permalink / raw)
To: Jan Kiszka; +Cc: adeos-main
On Mon, 2010-11-15 at 20:54 +0100, Jan Kiszka wrote:
> Am 15.11.2010 20:31, Jan Kiszka wrote:
> > Hi Philippe,
> >
> > debugging some variant of I-pipe over an x86-32 target, I think I found
> > some fairly old flaw in the IRQ virtualization that causes rescheduling
> > delays (up to deadlocks) for Linux:
> >
> > - we are in sysenter_tail (other exit paths should be affected as well)
> > - we DISABLE_INTERRUPTS, but only virtually
> > - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
> > - an IRQ for Linux arrives, it is pushed to the backlog
> > - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
> > return to have IF set (obviously, we return from a syscall)
> > - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
> > work on return as __ipipe_sync_stage set the stall flag for the Linux
> > domain before calling the handler
> > - but now the preempted sysenter return also does no reschedule as it
> > already passed the check - bang!
> >
> > Another variant of this Linux rescheduling issue:
> >
> > - we are in a lengthy loop inside the kernel, but we are preemptible
> > most of the time
> > - after disabling Linux IRQs briefly, we are calling
> > local_irq_enable() again
> > - in the meantime, we received a Linux IRQ which is now pending in the
> > backlog
> > - __ipipe_unstall_root triggers __ipipe_sync_stage
> > - Linux handler is called, sets NEED_RESCHED but does not reschedule
> > (see above)
> > - we do not test for resched again as we are not returning to user
> > space, and that for quite some time - bang!
>
> And this one actually affects x86-64 as well: Here, ret_from_intr checks
> for NEED_RESCHED only if IF is set in the flags of the preempted
> context. But if the Linux domain is alone, we enter __ipipe_do_root_xirq
> with hard IRQs disabled, thus we push this information incorrectly on
> the Linux handler stack, preventing the required rescheduling check.
>
> I guess the simplest fix for this is to drop the
> "!__ipipe_pipeline_head_p(ipd)" check from __ipipe_sync_stage.
>
We can't do that unfortunately, we would introduce significant latency
when the interrupt is to be delivered to the high priority domain. This
check makes sure to keep irqs off when delivering to Xenomai for
instance, since we don't want the handler to be delayed for pushing
non-rt interrupts to the log while handling a rt event.
Not that pretty, but this would be innocuous latency-wise:
diff --git a/arch/x86/include/asm/ipipe_64.h b/arch/x86/include/asm/ipipe_64.h
index b9367f6..a3b4fed 100644
--- a/arch/x86/include/asm/ipipe_64.h
+++ b/arch/x86/include/asm/ipipe_64.h
@@ -71,6 +71,7 @@ static inline void __do_root_xirq(ipipe_irq_handler_t handler,
"pushq $0\n\t"
"pushq %%rax\n\t"
"pushfq\n\t"
+ "orq %[x86if],(%%rsp)\n\t"
"pushq %[kernel_cs]\n\t"
"pushq $__xirq_end\n\t"
"pushq %[vector]\n\t"
@@ -91,7 +92,8 @@ static inline void __do_root_xirq(ipipe_irq_handler_t handler,
: /* no output */
: [kernel_cs] "i" (__KERNEL_CS),
[vector] "rm" (regs->orig_ax),
- [handler] "r" (handler), "D" (regs)
+ [handler] "r" (handler), "D" (regs),
+ [x86if] "i" (X86_EFLAGS_IF)
: "rax");
}
@@ -109,6 +111,7 @@ static inline void __do_root_virq(ipipe_irq_handler_t handler,
"pushq $0\n\t"
"pushq %%rax\n\t"
"pushfq\n\t"
+ "orq %[x86if],(%%rsp)\n\t"
"pushq %[kernel_cs]\n\t"
"pushq $__virq_end\n\t"
"pushq $-1\n\t"
@@ -125,7 +128,8 @@ static inline void __do_root_virq(ipipe_irq_handler_t handler,
"call *%[handler]\n\t"
: /* no output */
: [kernel_cs] "i" (__KERNEL_CS),
- [handler] "r" (handler), "D" (irq), "S" (cookie)
+ [handler] "r" (handler), "D" (irq), "S" (cookie),
+ [x86if] "i" (X86_EFLAGS_IF)
: "rax");
irq_exit();
__asm__ __volatile__("cli\n\t"
> >
> > I think both issues are only related to virtualizing DISABLE_INTERRUPTS
> > for entry_32.S and I wonder if this doesn't finally qualify for a switch
> > to the 64-bit model. Or do you see simpler fixes?
> >
>
> Jan
>
--
Philippe.
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
2010-11-15 20:20 ` Philippe Gerum
@ 2010-11-15 22:20 ` Jan Kiszka
0 siblings, 0 replies; 6+ messages in thread
From: Jan Kiszka @ 2010-11-15 22:20 UTC (permalink / raw)
To: Philippe Gerum; +Cc: adeos-main
[-- Attachment #1: Type: text/plain, Size: 2788 bytes --]
Am 15.11.2010 21:20, Philippe Gerum wrote:
> On Mon, 2010-11-15 at 20:31 +0100, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> debugging some variant of I-pipe over an x86-32 target, I think I found
>> some fairly old flaw in the IRQ virtualization that causes rescheduling
>> delays (up to deadlocks) for Linux:
>>
>> - we are in sysenter_tail (other exit paths should be affected as well)
>> - we DISABLE_INTERRUPTS, but only virtually
>> - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
>> - an IRQ for Linux arrives, it is pushed to the backlog
>> - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
>> return to have IF set (obviously, we return from a syscall)
>> - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
>> work on return as __ipipe_sync_stage set the stall flag for the Linux
>> domain before calling the handler
>> - but now the preempted sysenter return also does no reschedule as it
>> already passed the check - bang!
>
> Ouch. You must have had a really busy Monday to find this one.
>
>>
>> Another variant of this Linux rescheduling issue:
>>
>> - we are in a lengthy loop inside the kernel, but we are preemptible
>> most of the time
>> - after disabling Linux IRQs briefly, we are calling
>> local_irq_enable() again
>> - in the meantime, we received a Linux IRQ which is now pending in the
>> backlog
>> - __ipipe_unstall_root triggers __ipipe_sync_stage
>> - Linux handler is called, sets NEED_RESCHED but does not reschedule
>> (see above)
>> - we do not test for resched again as we are not returning to user
>> space, and that for quite some time - bang!
>>
>> I think both issues are only related to virtualizing DISABLE_INTERRUPTS
>> for entry_32.S and I wonder if this doesn't finally qualify for a switch
>> to the 64-bit model. Or do you see simpler fixes?
>>
>
> We could probably use hw masking from sysenter_tail and on, but quite
> frankly, I think this time, enough is enough and this bug calls for a
> radical fix, which is indeed getting rid of interrupt virtualization in
> the kernel entry/exit paths for x86_32, which no other arch ever
> implemented anyway.
>
> The decision to virtualize there as well was taken circa 2.4.18, when
> upstream did not care that much about latency yet. Things have changed,
> and there is no more reason to virtualize interrupts in very short
> critical sections, at the expense of a lot more complexity.
>
> - __ipipe_unstall_iret_root
> - __ipipe_kpreempt_root
> and much of the nonsense we do to track linux's interrupt state would go
> away.
>
Much involved code is shared here, so I will check with $customer if and
how we can contribute to such a cleanup.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more
2010-11-15 20:53 ` Philippe Gerum
@ 2010-11-15 22:23 ` Jan Kiszka
0 siblings, 0 replies; 6+ messages in thread
From: Jan Kiszka @ 2010-11-15 22:23 UTC (permalink / raw)
To: Philippe Gerum; +Cc: adeos-main
[-- Attachment #1: Type: text/plain, Size: 4411 bytes --]
Am 15.11.2010 21:53, Philippe Gerum wrote:
> On Mon, 2010-11-15 at 20:54 +0100, Jan Kiszka wrote:
>> Am 15.11.2010 20:31, Jan Kiszka wrote:
>>> Hi Philippe,
>>>
>>> debugging some variant of I-pipe over an x86-32 target, I think I found
>>> some fairly old flaw in the IRQ virtualization that causes rescheduling
>>> delays (up to deadlocks) for Linux:
>>>
>>> - we are in sysenter_tail (other exit paths should be affected as well)
>>> - we DISABLE_INTERRUPTS, but only virtually
>>> - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
>>> - an IRQ for Linux arrives, it is pushed to the backlog
>>> - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
>>> return to have IF set (obviously, we return from a syscall)
>>> - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
>>> work on return as __ipipe_sync_stage set the stall flag for the Linux
>>> domain before calling the handler
>>> - but now the preempted sysenter return also does no reschedule as it
>>> already passed the check - bang!
>>>
>>> Another variant of this Linux rescheduling issue:
>>>
>>> - we are in a lengthy loop inside the kernel, but we are preemptible
>>> most of the time
>>> - after disabling Linux IRQs briefly, we are calling
>>> local_irq_enable() again
>>> - in the meantime, we received a Linux IRQ which is now pending in the
>>> backlog
>>> - __ipipe_unstall_root triggers __ipipe_sync_stage
>>> - Linux handler is called, sets NEED_RESCHED but does not reschedule
>>> (see above)
>>> - we do not test for resched again as we are not returning to user
>>> space, and that for quite some time - bang!
>>
>> And this one actually affects x86-64 as well: Here, ret_from_intr checks
>> for NEED_RESCHED only if IF is set in the flags of the preempted
>> context. But if the Linux domain is alone, we enter __ipipe_do_root_xirq
>> with hard IRQs disabled, thus we push this information incorrectly on
>> the Linux handler stack, preventing the required rescheduling check.
>>
>> I guess the simplest fix for this is to drop the
>> "!__ipipe_pipeline_head_p(ipd)" check from __ipipe_sync_stage.
>>
>
> We can't do that unfortunately, we would introduce significant latency
> when the interrupt is to be delivered to the high priority domain. This
> check makes sure to keep irqs off when delivering to Xenomai for
> instance, since we don't want the handler to be delayed for pushing
> non-rt interrupts to the log while handling a rt event.
>
> Not that pretty, but this would be innocuous latency-wise:
>
> diff --git a/arch/x86/include/asm/ipipe_64.h b/arch/x86/include/asm/ipipe_64.h
> index b9367f6..a3b4fed 100644
> --- a/arch/x86/include/asm/ipipe_64.h
> +++ b/arch/x86/include/asm/ipipe_64.h
> @@ -71,6 +71,7 @@ static inline void __do_root_xirq(ipipe_irq_handler_t handler,
> "pushq $0\n\t"
> "pushq %%rax\n\t"
> "pushfq\n\t"
> + "orq %[x86if],(%%rsp)\n\t"
> "pushq %[kernel_cs]\n\t"
> "pushq $__xirq_end\n\t"
> "pushq %[vector]\n\t"
> @@ -91,7 +92,8 @@ static inline void __do_root_xirq(ipipe_irq_handler_t handler,
> : /* no output */
> : [kernel_cs] "i" (__KERNEL_CS),
> [vector] "rm" (regs->orig_ax),
> - [handler] "r" (handler), "D" (regs)
> + [handler] "r" (handler), "D" (regs),
> + [x86if] "i" (X86_EFLAGS_IF)
> : "rax");
> }
>
> @@ -109,6 +111,7 @@ static inline void __do_root_virq(ipipe_irq_handler_t handler,
> "pushq $0\n\t"
> "pushq %%rax\n\t"
> "pushfq\n\t"
> + "orq %[x86if],(%%rsp)\n\t"
> "pushq %[kernel_cs]\n\t"
> "pushq $__virq_end\n\t"
> "pushq $-1\n\t"
> @@ -125,7 +128,8 @@ static inline void __do_root_virq(ipipe_irq_handler_t handler,
> "call *%[handler]\n\t"
> : /* no output */
> : [kernel_cs] "i" (__KERNEL_CS),
> - [handler] "r" (handler), "D" (irq), "S" (cookie)
> + [handler] "r" (handler), "D" (irq), "S" (cookie),
> + [x86if] "i" (X86_EFLAGS_IF)
> : "rax");
> irq_exit();
> __asm__ __volatile__("cli\n\t"
Of course, this works as well. The alternative would be
"!__ipipe_pipeline_head_p(ipd) || ipd == ipipe_root_domain" - just a bit
shorter.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-11-15 22:23 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-15 19:31 [Adeos-main] Flaw in x86-32 syscall/irq return path - and maybe more Jan Kiszka
2010-11-15 19:54 ` Jan Kiszka
2010-11-15 20:53 ` Philippe Gerum
2010-11-15 22:23 ` Jan Kiszka
2010-11-15 20:20 ` Philippe Gerum
2010-11-15 22:20 ` Jan Kiszka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.