* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-14 14:40 ` Arnd Bergmann
@ 2007-05-15 4:12 ` Tsutomu OWA
2007-05-15 7:34 ` Benjamin Herrenschmidt
2007-05-15 6:27 ` Tsutomu OWA
2007-05-15 7:23 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 16+ messages in thread
From: Tsutomu OWA @ 2007-05-15 4:12 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linuxppc-dev, Thomas Gleixner, mingo, linux-kernel
At Mon, 14 May 2007 16:40:02 +0200, Arnd Bergmann wrote:
>
> On Monday 14 May 2007, Tsutomu OWA wrote:
> > --- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h	2007-04-26 12:08:32.000000000 +0900
> > +++ rt/include/asm-powerpc/tlbflush.h	2007-05-14 16:12:47.000000000 +0900
> > @@ -25,7 +25,12 @@ struct mm_struct;
> >  #include <linux/percpu.h>
> >  #include <asm/page.h>
> >
> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes a long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1
> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.
Yeah, thank you for pointing it out.
I'll send revised patch later.
> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.
Hmm... Is it in linux-2.6.21? grep'ing for it did not help...
Is http://patchwork.ozlabs.org/linuxppc/patch?id=10361 the first place
where hpte_need_flush() appears?
-- owa
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-15 4:12 ` Tsutomu OWA
@ 2007-05-15 7:34 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2007-05-15 7:34 UTC (permalink / raw)
To: Tsutomu OWA
Cc: linuxppc-dev, Thomas Gleixner, mingo, Arnd Bergmann, linux-kernel
On Tue, 2007-05-15 at 13:12 +0900, Tsutomu OWA wrote:
> > With this code, you get silent side-effects of enabling PPC_CELLEB
> > along with another platform.
>
> Yeah, thank you for pointing it out.
> I'll send revised patch later.
>
> > Maybe instead you should change the hpte_need_flush() to always flush
> > when running on the celleb platform and PREEMPT_RT is enabled.
>
> Hmm... Is it in linux-2.6.21? grep'ing for it did not help...
>
> Is http://patchwork.ozlabs.org/linuxppc/patch?id=10361 the first place
> where hpte_need_flush() appears?
Yes, I added that when reworking the batching mechanism.
Ben.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-14 14:40 ` Arnd Bergmann
2007-05-15 4:12 ` Tsutomu OWA
@ 2007-05-15 6:27 ` Tsutomu OWA
2007-05-15 7:38 ` Benjamin Herrenschmidt
2007-05-15 7:23 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 16+ messages in thread
From: Tsutomu OWA @ 2007-05-15 6:27 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linuxppc-dev, Thomas Gleixner, mingo, linux-kernel
At Mon, 14 May 2007 16:40:02 +0200, Arnd Bergmann wrote:
> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes a long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1
> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.
> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.
OK, how about this one?
thanks a lot!
Since flushing the TLB needs expensive hypervisor call(s) on celleb,
always flush it on RT to reduce scheduling latency.
Signed-off-by: Tsutomu OWA <tsutomu.owa@toshiba.co.jp>
-- owa
--- linux-2.6.21-rt1/arch/powerpc/mm/tlb_64.c	2007-05-07 14:08:12.000000000 +0900
+++ rt/arch/powerpc/mm/tlb_64.c	2007-05-15 15:19:34.000000000 +0900
@@ -31,6 +31,7 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/bug.h>
+#include <asm/machdep.h>
 
 DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
 
@@ -180,6 +181,18 @@ void hpte_update(struct mm_struct *mm, u
 	batch->vaddr[i] = (vsid << 28 ) | (addr & 0x0fffffff);
 	batch->pte[i] = __real_pte(__pte(pte), ptep);
 	batch->index = ++i;
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Since flushing the TLB needs expensive hypervisor call(s) on celleb,
+	 * always flush it on RT to reduce scheduling latency.
+	 */
+	if (machine_is(celleb)) {
+		flush_tlb_pending();
+		return;
+	}
+#endif /* CONFIG_PREEMPT_RT */
+
 	if (i >= PPC64_TLB_BATCH_NR)
 		flush_tlb_pending();
 }
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-15 6:27 ` Tsutomu OWA
@ 2007-05-15 7:38 ` Benjamin Herrenschmidt
2007-05-15 8:08 ` Tsutomu OWA
0 siblings, 1 reply; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2007-05-15 7:38 UTC (permalink / raw)
To: Tsutomu OWA
Cc: linuxppc-dev, Thomas Gleixner, mingo, Arnd Bergmann, linux-kernel
> +
> +#ifdef CONFIG_PREEMPT_RT
> +	/*
> +	 * Since flushing the TLB needs expensive hypervisor call(s) on celleb,
> +	 * always flush it on RT to reduce scheduling latency.
> +	 */
> +	if (machine_is(celleb)) {
> +		flush_tlb_pending();
> +		return;
> +	}
> +#endif /* CONFIG_PREEMPT_RT */
> +
>  	if (i >= PPC64_TLB_BATCH_NR)
>  		flush_tlb_pending();
>  }
Any reason to do that only on celleb? :-)
Also, we might want to still batch, though in smaller amounts. Have you measured
the time it takes? We might want to modulate the amount based on whether we
are using native hash tables or a hypervisor.
Ben.
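[Editor's note: to make the suggestion concrete, here is a minimal C
sketch of the modulation Ben describes, batching less when every flush
is a hypervisor call. It is purely illustrative: tlb_batch_max, the
value 16, and using firmware_has_feature(FW_FEATURE_LPAR) as the
native-vs-hypervisor test are assumptions, not code from this thread.

#include <asm/firmware.h>	/* firmware_has_feature(), FW_FEATURE_LPAR */

/* hypothetical runtime batch limit replacing the compile-time constant */
static unsigned int tlb_batch_max __read_mostly = PPC64_TLB_BATCH_NR;

static int __init setup_tlb_batch_max(void)
{
	/* under a hypervisor each flush is a costly hcall: batch less */
	if (firmware_has_feature(FW_FEATURE_LPAR))
		tlb_batch_max = 16;	/* illustrative value */
	return 0;
}
core_initcall(setup_tlb_batch_max);

hpte_update() would then compare against the variable, i.e.
"if (i >= tlb_batch_max)" instead of "if (i >= PPC64_TLB_BATCH_NR)".]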
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-15 7:38 ` Benjamin Herrenschmidt
@ 2007-05-15 8:08 ` Tsutomu OWA
2007-05-15 8:40 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 16+ messages in thread
From: Tsutomu OWA @ 2007-05-15 8:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linuxppc-dev, Thomas Gleixner, mingo, Arnd Bergmann, linux-kernel
At Tue, 15 May 2007 17:38:27 +1000, Benjamin Herrenschmidt wrote:
> > +#ifdef CONFIG_PREEMPT_RT
> > +	/*
> > +	 * Since flushing the TLB needs expensive hypervisor call(s) on celleb,
> > +	 * always flush it on RT to reduce scheduling latency.
> > +	 */
> > +	if (machine_is(celleb)) {
> > +		flush_tlb_pending();
> > +		return;
> > +	}
> > +#endif /* CONFIG_PREEMPT_RT */
> Any reason to do that only on celleb? :-)
Well, at least it does not harm any other platforms :)
> Also, we might want to still batch, though in smaller amounts.
Yeah, it's a good point.
> Have you measured
> the time it takes? We might want to modulate the amount based on whether we
> are using native hash tables or a hypervisor.
Yes, here is the trace log. According to it, flushing 9 entries takes about 50us,
i.e. roughly 5.5us per entry, so flushing 192 (PPC64_TLB_BATCH_NR) entries would take about 1ms.
Please note that tracing causes *roughly* 20-30% overhead, though.
I'm afraid I don't have numbers w/ native version, but I suppose you have :)
# cat /proc/latency_trace
preemption latency trace v1.1.5 on 2.6.21-rt1
--------------------------------------------------------------------
latency: 60 us, #53/53, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:1)
-----------------
| task: inetd-358 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
=> started at: .__switch_to+0xa8/0x178 <c00000000000feec>
=> ended at: .__start+0x4000000000000000/0x8 <00000000>
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
inetd-358 0D..2 0us : .user_trace_start (.__switch_to)
inetd-358 0D..2 0us : .rt_up (.user_trace_start)
inetd-358 0D..3 1us : .rt_mutex_unlock (.rt_up)
inetd-358 0D..3 2us : .__flush_tlb_pending (.__switch_to)
inetd-358 0D..4 3us : .flush_hash_range (.__flush_tlb_pending)
inetd-358 0D..4 3us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 4us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 5us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 6us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 13us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 13us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 14us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 14us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 15us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 18us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 19us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 20us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 20us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 21us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 24us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 25us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 25us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 26us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 26us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 30us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 30us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 31us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 31us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 32us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 36us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 36us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 37us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 37us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 38us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 41us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 42us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 42us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 43us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 43us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 47us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 48us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 48us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 49us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 49us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 52us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 53us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 54us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 54us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 55us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 58us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..2 59us : .user_trace_stop (.__switch_to)
inetd-358 0D..2 60us : .user_trace_stop (.__switch_to)
-- owa
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-15 8:08 ` Tsutomu OWA
@ 2007-05-15 8:40 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2007-05-15 8:40 UTC (permalink / raw)
To: Tsutomu OWA
Cc: linuxppc-dev, Thomas Gleixner, mingo, Arnd Bergmann, linux-kernel
> > Have you measured
> > the time it takes? We might want to modulate the amount based on whether we
> > are using native hash tables or a hypervisor.
>
> Yes, here is the trace log. According to it, flushing 9 entries takes about 50us,
> i.e. roughly 5.5us per entry, so flushing 192 (PPC64_TLB_BATCH_NR) entries would take about 1ms.
> Please note that tracing causes *roughly* 20-30% overhead, though.
>
> I'm afraid I don't have numbers w/ native version, but I suppose you have :)
Actually I don't off hand but I'll try to get some.
> [full trace log snipped; identical to the one quoted in the previous message]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size
2007-05-14 14:40 ` Arnd Bergmann
2007-05-15 4:12 ` Tsutomu OWA
2007-05-15 6:27 ` Tsutomu OWA
@ 2007-05-15 7:23 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2007-05-15 7:23 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linuxppc-dev, Thomas Gleixner, mingo, linux-kernel
On Mon, 2007-05-14 at 16:40 +0200, Arnd Bergmann wrote:
> On Monday 14 May 2007, Tsutomu OWA wrote:
> > --- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
> > +++ rt/include/asm-powerpc/tlbflush.h 2007-05-14 16:12:47.000000000 +0900
> > @@ -25,7 +25,12 @@ struct mm_struct;
> >  #include <linux/percpu.h>
> >  #include <asm/page.h>
> >
> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes a long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1
> > +#else
> > #define PPC64_TLB_BATCH_NR 192
> > +#endif /* defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT) */
>
> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.
>
> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.
I think it's wrong either way... Maybe we can make it a variable and
measure how much we can reasonably do in a given time frame at
boot? :-) Since it depends on whether there's a hypervisor etc...
Ben.
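[Editor's note: an illustrative sketch of the boot-time calibration Ben
floats here; nothing like this was in the tree. The latency budget, the
tlb_batch_max variable, and timing invalidates in timebase ticks are
all assumptions.

#include <asm/time.h>	/* tb_ticks_per_usec */

#define TLB_FLUSH_BUDGET_US	50	/* assumed per-flush latency budget */

static unsigned int tlb_batch_max __read_mostly = PPC64_TLB_BATCH_NR;

/*
 * tb_per_entry would come from timing a few hpte invalidates early at
 * boot (measurement not shown). Size the batch so that one full flush
 * fits the budget, clamped to [1, PPC64_TLB_BATCH_NR].
 */
static void __init calibrate_tlb_batch(unsigned long tb_per_entry)
{
	unsigned long n;

	if (!tb_per_entry)
		return;		/* keep the default on bogus input */
	n = (TLB_FLUSH_BUDGET_US * tb_ticks_per_usec) / tb_per_entry;
	if (n < 1)
		n = 1;
	if (n > PPC64_TLB_BATCH_NR)
		n = PPC64_TLB_BATCH_NR;
	tlb_batch_max = n;
}

With owa's figures (~5.5us per entry under the Beat hypervisor) a 50us
budget gives a batch of about 9, while native hash invalidates should
allow a much larger one.]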
^ permalink raw reply [flat|nested] 16+ messages in thread