* [PATCH 00/13 v3] idle performance improvements
@ 2017-06-13 13:05 Nicholas Piggin
From: Nicholas Piggin @ 2017-06-13 13:05 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Vaidyanathan Srinivasan
Since last time, I have addressed the various review
comments, most importantly fixing the miscalculation of
the SRR1 bit for the wakeup interrupt. I verified that it
does the right thing and replays the right wakeup interrupt
(e.g., decrementer) from __replay_interrupt by stepping
through the instructions in the simulator.
I've found that performance testing is a little difficult
because the ping-pong test cases apparently sometimes get
into synchronization and run concurrently without sleeping.
Looking at context switch rates in vmstat, I ran the
context-switch ping-pong test with snooze idle disabled
on a POWER8, and the numbers before and after this series
are:
          different thread   different core
vanilla   600K/s             470K/s
patched   780K/s             550K/s
(these are 2x what's reported by the context_switch selftest
because each step switches to and from the idle thread)
It's still not a perfect measurement because if there is
some unwanted concurrency happening then you can get a
different amount of userspace work per context switch,
but there seems to be a decent speedup here.
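For a rough sense of scale, the relative improvement implied by
the figures quoted above can be worked out with a short script.
This is only a sketch of the arithmetic: the function names are
made up for illustration, and the halving step simply applies the
"2x what the selftest reports" note from this mail.

```python
# Context switch rates quoted in this mail, in thousands per second (vmstat).
rates = {
    "different thread": {"vanilla": 600, "patched": 780},
    "different core": {"vanilla": 470, "patched": 550},
}


def speedup_percent(vanilla, patched):
    """Relative improvement of patched over vanilla, in percent."""
    return (patched - vanilla) / vanilla * 100.0


def selftest_equivalent(vmstat_rate):
    """vmstat also counts the switches to and from the idle thread,
    so the selftest-style figure is assumed to be half of it."""
    return vmstat_rate / 2


for case, r in rates.items():
    pct = speedup_percent(r["vanilla"], r["patched"])
    half = selftest_equivalent(r["patched"])
    print(f"{case}: {pct:.1f}% faster, about {half:.0f}K/s in selftest terms")
```

That works out to roughly 30% for the different-thread case and
roughly 17% for the different-core case, subject to the
measurement caveats above.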
Thanks,
Nick
Nicholas Piggin (13):
  powerpc/64s: idle move soft interrupt mask logic into C code
  powerpc/64s: idle hotplug lazy-irq simplification
  powerpc/64s: idle process interrupts from system reset wakeup
  powerpc/64s: msgclr when handling doorbell exceptions
  powerpc/64s: interrupt replay balance the return branch predictor
  powerpc/64s: idle branch to handler with virtual mode offset
  powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  powerpc/64s: idle hmi wakeup is unlikely
  powerpc/64s: cpuidle set polling before enabling irqs
  powerpc/64s: cpuidle read mostly for common globals
  powerpc/64s: cpuidle no memory barrier after break from idle
  powerpc/64: runlatch CTRL[RUN] set optimisation
  powerpc/64s: idle runlatch switch is done with MSR[EE]=0
 arch/powerpc/include/asm/dbell.h         |  13 +++
 arch/powerpc/include/asm/exception-64s.h |  13 +++
 arch/powerpc/include/asm/hw_irq.h        |   5 ++
 arch/powerpc/include/asm/machdep.h       |   1 +
 arch/powerpc/include/asm/ppc-opcode.h    |   3 +
 arch/powerpc/include/asm/processor.h     |  10 +--
 arch/powerpc/kernel/asm-offsets.c        |   1 +
 arch/powerpc/kernel/exceptions-64s.S     |  38 +++++++--
 arch/powerpc/kernel/idle_book3s.S        | 135 +++++++++----------------------
 arch/powerpc/kernel/irq.c                |  62 +++++++++++++-
 arch/powerpc/kernel/process.c            |  12 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   8 +-
 arch/powerpc/platforms/powernv/idle.c    |  88 ++++++++++++++++++--
 arch/powerpc/platforms/powernv/smp.c     |  31 ++++---
 arch/powerpc/platforms/powernv/subcore.c |   3 +-
 drivers/cpuidle/cpuidle-powernv.c        |  37 +++++----
 drivers/cpuidle/cpuidle-pseries.c        |  22 +++--
 17 files changed, 315 insertions(+), 167 deletions(-)
--
2.11.0