* [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
@ 2007-11-12 15:39 Jeroen Van den Keybus
2007-11-12 16:47 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Jeroen Van den Keybus @ 2007-11-12 15:39 UTC (permalink / raw)
To: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 5330 bytes --]
I have a setup which apparently suffers from long interrupt-disabled
sections during heavy load (make -j16 of a kernel). I have instrumented the
__ipipe_unstall_iret_root() function with the tracer as follows:
asmlinkage void __ipipe_unstall_iret_root(struct pt_regs regs)
{
ipipe_trace_special(0x20, 0);
ipipe_declare_cpuid;
/* Emulate IRET's handling of the interrupt flag. */
ipipe_trace_special(0x21, 0);
local_irq_disable_hw();
ipipe_trace_special(0x22, 0);
ipipe_load_cpuid();
/* Restore the software state as it used to be on kernel
entry. CAUTION: NMIs must *not* return through this
emulation. */
ipipe_trace_special(0x23, 0);
if (!(regs.eflags & X86_EFLAGS_IF)) {
ipipe_trace_special(0x24, 0);
if (!__test_and_set_bit(IPIPE_STALL_FLAG,
&ipipe_root_domain->cpudata[cpuid].status))
trace_hardirqs_off();
regs.eflags |= X86_EFLAGS_IF;
ipipe_trace_special(0x25, 0);
} else {
ipipe_trace_special(0x26, 0);
if (test_bit(IPIPE_STALL_FLAG,
&ipipe_root_domain->cpudata[cpuid].status)) {
trace_hardirqs_on();
__clear_bit(IPIPE_STALL_FLAG,
&ipipe_root_domain->cpudata[cpuid].status);
}
ipipe_trace_special(0x27, 0);
/* Only sync virtual IRQs here, so that we don't recurse
indefinitely in case of an external interrupt flood. */
if ((ipipe_root_domain->cpudata[cpuid].
irq_pending_hi & IPIPE_IRQMASK_VIRT) != 0) {
ipipe_trace_special(0x28, 0);
__ipipe_sync_pipeline(IPIPE_IRQMASK_VIRT);
}
ipipe_trace_special(0x29, 0);
}
#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
ipipe_trace_end(0x8000000D);
#endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
}
While under heavy load, the following trace log is produced:
...
+ func -116 0.124 xnarch_get_cpu_time+0x8
(rtdm_toseq_init+0x21)
+ func -116 0.169 xnarch_tsc_to_ns+0x12
(xnarch_get_cpu_time+0xf)
+ func -116 0.284 rtdm_event_timedwait+0xe
(rt_e1000_read+0xf5 [rt_e1000])
| # func -115 0.189 xnsynch_sleep_on+0xe
(rtdm_event_timedwait+0x141)
| # func -115 0.184 xnpod_suspend_thread+0xe
(xnsynch_sleep_on+0x3bf)
| # func -115 0.194 xntimer_migrate+0xe
(xnpod_suspend_thread+0xa2)
| # func -115 0.144 xntimer_start_aperiodic+0xe
(xnpod_suspend_thread+0xc5)
| # func -115 0.334 xnarch_ns_to_tsc+0x14
(xntimer_start_aperiodic+0x63)
| # func -114 0.199 xnpod_schedule+0xe
(xnpod_suspend_thread+0x237)
| # [18006] -<?>- 99 -114 0.814 xnpod_schedule+0x176
(xnpod_suspend_thread+0x237)
| # func -113 0.874 __switch_to+0xe
(xnpod_schedule+0x77c)
| # [18019] -<?>- -1 -112 0.799 xnpod_schedule+0x887
(xnintr_irq_handler+0x18e)
| +func -112 0.319 __ipipe_walk_pipeline+0xb
(__ipipe_handle_irq+0x87)
| #func -111 0.204 __ipipe_stall_root+0x8
(resume_userspace+0x5)
#func -111 0.134 __ipipe_unstall_iret_root+0xa
(restore_nocheck_notrace+0x0)
#(0x20) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x19
(restore_nocheck_notrace+0x0)
#(0x21) 0x00000000 -111 0.124 __ipipe_unstall_iret_root+0x25
(restore_nocheck_notrace+0x0)
| #(0x22) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x32
(restore_nocheck_notrace+0x0)
| #(0x23) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x45
(restore_nocheck_notrace+0x0)
| #(0x26) 0x00000000 -110 0.134 __ipipe_unstall_iret_root+0x88
(restore_nocheck_notrace+0x0)
| +(0x27) 0x00000000 -110 0.109 __ipipe_unstall_iret_root+0x9e
(restore_nocheck_notrace+0x0)
| +(0x29) 0x00000000 -110 57.850 __ipipe_unstall_iret_root+0xb6
(restore_nocheck_notrace+0x0)
| +func -52 0.289 __ipipe_handle_irq+0xe
(ipipe_ipi3+0x26)
| +func -52 0.169 __ipipe_ack_apic+0x8
(__ipipe_handle_irq+0xbf)
| +func -52 0.154 __ipipe_dispatch_wired+0x14
(__ipipe_handle_irq+0x5f)
| # func -52 0.309 xnintr_clock_handler+0xe
(__ipipe_dispatch_wired+0xbe)
| # func -51 0.494 xntimer_tick_aperiodic+0xe
(xnintr_clock_handler+0x72)
| # func -51 0.459 xntimer_next_local_shot+0x16
(xntimer_tick_aperiodic+0x170)
| # func -50 0.154 rthal_irq_host_pend+0x8
(xnintr_clock_handler+0x131)
| # func -50 0.684 __ipipe_schedule_irq+0xb
(rthal_irq_host_pend+0x29)
| +func -50 0.294 __ipipe_walk_pipeline+0xb
(__ipipe_handle_irq+0x87)
| +func -49 0.309 __ipipe_sync_stage+0xe
(__ipipe_walk_pipeline+0xcb)
#func -49 0.299 smp_apic_timer_interrupt+0x11
(__ipipe_sync_stage+0x17b)
This system apparently has an irq-free section of 62 us. Strangely,
ipipe_special_trace(0x28) is not executed, so the delay is incurred only in
the evaluation of
(ipipe_root_domain->cpudata[cpuid].irq_pending_hi & IPIPE_IRQMASK_VIRT) != 0
I don't quite understand this. What could be going on here ?
Jeroen.
[-- Attachment #2: Type: text/html, Size: 10183 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
2007-11-12 15:39 [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root() Jeroen Van den Keybus
@ 2007-11-12 16:47 ` Jan Kiszka
2007-11-12 17:11 ` Jeroen Van den Keybus
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2007-11-12 16:47 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai-core
Jeroen Van den Keybus wrote:
> I have a setup which apparently suffers from long interrupt-disabled
> sections during heavy load (make -j16 of a kernel). I have instrumented the
> __ipipe_unstall_iret_root() function with the tracer as follows:
>
>
> asmlinkage void __ipipe_unstall_iret_root(struct pt_regs regs)
> {
> ipipe_trace_special(0x20, 0);
> ipipe_declare_cpuid;
>
> /* Emulate IRET's handling of the interrupt flag. */
>
> ipipe_trace_special(0x21, 0);
> local_irq_disable_hw();
>
> ipipe_trace_special(0x22, 0);
> ipipe_load_cpuid();
>
> /* Restore the software state as it used to be on kernel
> entry. CAUTION: NMIs must *not* return through this
> emulation. */
>
> ipipe_trace_special(0x23, 0);
> if (!(regs.eflags & X86_EFLAGS_IF)) {
> ipipe_trace_special(0x24, 0);
> if (!__test_and_set_bit(IPIPE_STALL_FLAG,
> &ipipe_root_domain->cpudata[cpuid].status))
> trace_hardirqs_off();
> regs.eflags |= X86_EFLAGS_IF;
> ipipe_trace_special(0x25, 0);
> } else {
> ipipe_trace_special(0x26, 0);
> if (test_bit(IPIPE_STALL_FLAG,
> &ipipe_root_domain->cpudata[cpuid].status)) {
> trace_hardirqs_on();
> __clear_bit(IPIPE_STALL_FLAG,
> &ipipe_root_domain->cpudata[cpuid].status);
> }
> ipipe_trace_special(0x27, 0);
>
> /* Only sync virtual IRQs here, so that we don't recurse
> indefinitely in case of an external interrupt flood. */
>
> if ((ipipe_root_domain->cpudata[cpuid].
> irq_pending_hi & IPIPE_IRQMASK_VIRT) != 0) {
> ipipe_trace_special(0x28, 0);
> __ipipe_sync_pipeline(IPIPE_IRQMASK_VIRT);
> }
> ipipe_trace_special(0x29, 0);
> }
> #ifdef CONFIG_IPIPE_TRACE_IRQSOFF
> ipipe_trace_end(0x8000000D);
> #endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
> }
>
> While under heavy load, the following trace log is produced:
>
> ...
>
>
>
> + func -116 0.124 xnarch_get_cpu_time+0x8
> (rtdm_toseq_init+0x21)
> + func -116 0.169 xnarch_tsc_to_ns+0x12
> (xnarch_get_cpu_time+0xf)
> + func -116 0.284 rtdm_event_timedwait+0xe
> (rt_e1000_read+0xf5 [rt_e1000])
> | # func -115 0.189 xnsynch_sleep_on+0xe
> (rtdm_event_timedwait+0x141)
> | # func -115 0.184 xnpod_suspend_thread+0xe
> (xnsynch_sleep_on+0x3bf)
> | # func -115 0.194 xntimer_migrate+0xe
> (xnpod_suspend_thread+0xa2)
> | # func -115 0.144 xntimer_start_aperiodic+0xe
> (xnpod_suspend_thread+0xc5)
> | # func -115 0.334 xnarch_ns_to_tsc+0x14
> (xntimer_start_aperiodic+0x63)
> | # func -114 0.199 xnpod_schedule+0xe
> (xnpod_suspend_thread+0x237)
> | # [18006] -<?>- 99 -114 0.814 xnpod_schedule+0x176
> (xnpod_suspend_thread+0x237)
> | # func -113 0.874 __switch_to+0xe
> (xnpod_schedule+0x77c)
> | # [18019] -<?>- -1 -112 0.799 xnpod_schedule+0x887
> (xnintr_irq_handler+0x18e)
> | +func -112 0.319 __ipipe_walk_pipeline+0xb
> (__ipipe_handle_irq+0x87)
> | #func -111 0.204 __ipipe_stall_root+0x8
> (resume_userspace+0x5)
> #func -111 0.134 __ipipe_unstall_iret_root+0xa
> (restore_nocheck_notrace+0x0)
> #(0x20) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x19
> (restore_nocheck_notrace+0x0)
> #(0x21) 0x00000000 -111 0.124 __ipipe_unstall_iret_root+0x25
> (restore_nocheck_notrace+0x0)
> | #(0x22) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x32
> (restore_nocheck_notrace+0x0)
> | #(0x23) 0x00000000 -111 0.119 __ipipe_unstall_iret_root+0x45
> (restore_nocheck_notrace+0x0)
> | #(0x26) 0x00000000 -110 0.134 __ipipe_unstall_iret_root+0x88
> (restore_nocheck_notrace+0x0)
> | +(0x27) 0x00000000 -110 0.109 __ipipe_unstall_iret_root+0x9e
> (restore_nocheck_notrace+0x0)
> | +(0x29) 0x00000000 -110 57.850 __ipipe_unstall_iret_root+0xb6
> (restore_nocheck_notrace+0x0)
The tracer accounts the full time between the previous and the next
kernel trace point to the above one - although we likely left to user
land here (iret will re-enable IRQs for us). So, unless the IRQ below
should have been raised much earlier, there is no issue here.
If there is a real delay, evil user land may have switched off IRQs
(given root privileges), or there is some hardware related IRQ delivery
latency. But this cannot be determined via the tracer as it only sees
kernel land.
> | +func -52 0.289 __ipipe_handle_irq+0xe
> (ipipe_ipi3+0x26)
> | +func -52 0.169 __ipipe_ack_apic+0x8
> (__ipipe_handle_irq+0xbf)
> | +func -52 0.154 __ipipe_dispatch_wired+0x14
> (__ipipe_handle_irq+0x5f)
> | # func -52 0.309 xnintr_clock_handler+0xe
> (__ipipe_dispatch_wired+0xbe)
> | # func -51 0.494 xntimer_tick_aperiodic+0xe
> (xnintr_clock_handler+0x72)
> | # func -51 0.459 xntimer_next_local_shot+0x16
> (xntimer_tick_aperiodic+0x170)
> | # func -50 0.154 rthal_irq_host_pend+0x8
> (xnintr_clock_handler+0x131)
> | # func -50 0.684 __ipipe_schedule_irq+0xb
> (rthal_irq_host_pend+0x29)
> | +func -50 0.294 __ipipe_walk_pipeline+0xb
> (__ipipe_handle_irq+0x87)
> | +func -49 0.309 __ipipe_sync_stage+0xe
> (__ipipe_walk_pipeline+0xcb)
> #func -49 0.299 smp_apic_timer_interrupt+0x11
> (__ipipe_sync_stage+0x17b)
>
> This system apparently has an irq-free section of 62 us. Strangely,
> ipipe_special_trace(0x28) is not executed, so the delay is incurred only in
> the evaluation of
>
> (ipipe_root_domain->cpudata[cpuid].irq_pending_hi & IPIPE_IRQMASK_VIRT) != 0
>
> I don't quite understand this. What could be going on here ?
>
>
>
> Jeroen.
>
>
HTH,
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
2007-11-12 16:47 ` Jan Kiszka
@ 2007-11-12 17:11 ` Jeroen Van den Keybus
2007-11-12 17:25 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Jeroen Van den Keybus @ 2007-11-12 17:11 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]
> So, unless the IRQ below
> should have been raised much earlier, there is no issue here.
The point is: I actually do have interrupts (not the IPI that is visible
here) that are being withheld (on average 80 us late) during this period. On
this particular setup, I have seen numbers exceeding 300 us. Strangely
enough, the latency test does not show these hiccups. I'm suspecting cache
issues, but I would think these numbers are a bit large for that.
What you say is that between tracepoint 0x27 and 0x29, we may have entered
userland. But where does this happen, given that point 0x28 is not executed
?
I also do understand that, if a cli or sti would still lurk somewhere in
this configuration, I'm going to have a real bad time... Although I doubt
it, since the unexpected latencies always happen at this particular point.
Jeroen.
If there is a real delay, evil user land may have switched off IRQs
> (given root privileges), or there is some hardware related IRQ delivery
> latency. But this cannot be determined via the tracer as it only sees
> kernel land.
>
>
[-- Attachment #2: Type: text/html, Size: 1432 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
2007-11-12 17:11 ` Jeroen Van den Keybus
@ 2007-11-12 17:25 ` Jan Kiszka
[not found] ` <fd6a47a90711120936m6c8eb691na49fae868b803470@domain.hid>
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2007-11-12 17:25 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai-core
Jeroen Van den Keybus wrote:
>> So, unless the IRQ below
>> should have been raised much earlier, there is no issue here.
>
>
> The point is: I actually do have interrupts (not the IPI that is visible
> here) that are being withheld (on average 80 us late) during this period. On
> this particular setup, I have seen numbers exceeding 300 us. Strangely
> enough, the latency test does not show these hiccups. I'm suspecting cache
> issues, but I would think these numbers are a bit large for that.
Indeed, 300 us are too long. Maybe long DMA bursts on your PCI bus? Can
you exclude certain devices from your load to check for this (no hardisk
load e.g., use nfs instead, then no network load, etc.)?
>
> What you say is that between tracepoint 0x27 and 0x29, we may have entered
> userland. But where does this happen, given that point 0x28 is not executed
> ?
Nope, user land is entered _after_ 0x29 (you need to read the numbers
with an offset of one: the delay is between
__ipipe_unstall_iret_root+0xb6 and _ipipe_handle_irq+0xe).
>
> I also do understand that, if a cli or sti would still lurk somewhere in
> this configuration, I'm going to have a real bad time... Although I doubt
> it, since the unexpected latencies always happen at this particular point.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
[not found] ` <fd6a47a90711120936m6c8eb691na49fae868b803470@domain.hid>
@ 2007-11-12 17:58 ` Jan Kiszka
2007-11-12 18:10 ` Jeroen Van den Keybus
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2007-11-12 17:58 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: Xenomai-core
[Save the CCs!]
Jeroen Van den Keybus wrote:
>> Indeed, 300 us are too long. Maybe long DMA bursts on your PCI bus? Can
>> you exclude certain devices from your load to check for this (no hardisk
>> load e.g., use nfs instead, then no network load, etc.)?
>
>
> I already tried setting PCI latency timers of everything except my card to
> e.g. 0x08. No difference though.
>
>>> What you say is that between tracepoint 0x27 and 0x29, we may have
>> entered
>>> userland. But where does this happen, given that point 0x28 is not
>> executed
>>> ?
>> Nope, user land is entered _after_ 0x29 (you need to read the numbers
>> with an offset of one: the delay is between
>> __ipipe_unstall_iret_root+0xb6 and _ipipe_handle_irq+0xe).
>
>
> Ah! Now I see. So basically I would have some user space code that is doing
> something that would delay my interrupts considerably, irrespective of the
> fact that IRQs are enabled.
>
> I once tried a WBINVD in a user space task and it has a really devastating
> effect on RT (basically locks the CPU for 200us). If there was a way of
> preventing this kind of instructions...
>
Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
instruction. Do we leak privileges to user land??? Please check if your
execution mode (privilege ring) is correct there.
But indeed, wbinvd is devastating if you can execute it, causing
typically around 300 us latencies, at worst even milliseconds
(cache-size and state dependent)!
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
2007-11-12 17:58 ` Jan Kiszka
@ 2007-11-12 18:10 ` Jeroen Van den Keybus
2007-11-12 18:27 ` Jan Kiszka
0 siblings, 1 reply; 7+ messages in thread
From: Jeroen Van den Keybus @ 2007-11-12 18:10 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai-core
[-- Attachment #1: Type: text/plain, Size: 869 bytes --]
>
> Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
> instruction. Do we leak privileges to user land??? Please check if your
> execution mode (privilege ring) is correct there.
No, I rather meant a kernel-mode program that was controlled from the user
space. Sorry for upsetting you.
But indeed, wbinvd is devastating if you can execute it, causing
> typically around 300 us latencies, at worst even milliseconds
> (cache-size and state dependent)!
If I recall correctly, some of the Linux AGP GART drivers use(d?) it.
This raises another interesting question: to what extent is the x86 actually
a viable and dependable realtime platform, with its SMI's and highly
uncontrollable caching architecture ? How would VT-based solutions compare ?
(BTW Intel should really have implemented a feature to use parts of the
cache as SRAM.)
Jeroen.
[-- Attachment #2: Type: text/html, Size: 1240 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root()
2007-11-12 18:10 ` Jeroen Van den Keybus
@ 2007-11-12 18:27 ` Jan Kiszka
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kiszka @ 2007-11-12 18:27 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: Xenomai-core
Jeroen Van den Keybus wrote:
>> Ouch, this shouldn't be allowed in user space! WBINVD is a privileged
>> instruction. Do we leak privileges to user land??? Please check if your
>> execution mode (privilege ring) is correct there.
>
>
> No, I rather meant a kernel-mode program that was controlled from the user
> space. Sorry for upsetting you.
Puh... fine, the world is round again. :)
>
> But indeed, wbinvd is devastating if you can execute it, causing
>> typically around 300 us latencies, at worst even milliseconds
>> (cache-size and state dependent)!
>
>
> If I recall correctly, some of the Linux AGP GART drivers use(d?) it.
Yep, some only require this during setup (while MTRR reconfig e.g.),
others also use it during runtime of X - making your graphical system a
real-time killer.
>
> This raises another interesting question: to what extent is the x86 actually
> a viable and dependable realtime platform, with its SMI's and highly
> uncontrollable caching architecture ? How would VT-based solutions compare ?
No surprise: x86 is a real-time minefield. Actually, I have been told
that PCI Express also requires cache flushes before starting DMA
transactions, but I haven't checked this yet.
Regarding fun with virtualisation over RT:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/8520/focus=8540
(There are even more threads about this, and some light at the end of
this tunnel, also for Intel. Check the archive.)
This reminds me that I still need to post my experimental kvm-over-ipipe
patch for VMX (yet Intel only, no AMD at hand :( ).
> (BTW Intel should really have implemented a feature to use parts of the
> cache as SRAM.)
Ack.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-11-12 18:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-12 15:39 [Xenomai-core] Long execution time inside __ipipe_unstall_iret_root() Jeroen Van den Keybus
2007-11-12 16:47 ` Jan Kiszka
2007-11-12 17:11 ` Jeroen Van den Keybus
2007-11-12 17:25 ` Jan Kiszka
[not found] ` <fd6a47a90711120936m6c8eb691na49fae868b803470@domain.hid>
2007-11-12 17:58 ` Jan Kiszka
2007-11-12 18:10 ` Jeroen Van den Keybus
2007-11-12 18:27 ` Jan Kiszka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.