* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient @ 2016-03-24 5:08 Jisheng Zhang 2016-03-24 5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Jisheng Zhang @ 2016-03-24 5:08 UTC (permalink / raw) To: linux-arm-kernel This series is to improve the arm_cpuidle_suspend() a bit by removing/moving out checks from this hot path. Jisheng Zhang (2): arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient arch/arm64/kernel/cpuidle.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) -- 2.8.0.rc3 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() 2016-03-24 5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang @ 2016-03-24 5:08 ` Jisheng Zhang 2016-03-24 5:08 ` [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient Jisheng Zhang [not found] ` <20160324111507.GB9323@arm.com> 2 siblings, 0 replies; 6+ messages in thread From: Jisheng Zhang @ 2016-03-24 5:08 UTC (permalink / raw) To: linux-arm-kernel If cpu_ops has not been registered, arm_cpuidle_init() will return -EOPNOTSUPP, so arm_cpuidle_suspend() will never have chance to run. In other word, the cpu_ops check can be avoid. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> --- arch/arm64/kernel/cpuidle.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c index 9047cab6..bd57c59 100644 --- a/arch/arm64/kernel/cpuidle.c +++ b/arch/arm64/kernel/cpuidle.c @@ -37,10 +37,9 @@ int arm_cpuidle_suspend(int index) int cpu = smp_processor_id(); /* - * If cpu_ops have not been registered or suspend - * has not been initialized, cpu_suspend call fails early. + * If suspend has not been initialized, cpu_suspend call fails early. */ - if (!cpu_ops[cpu] || !cpu_ops[cpu]->cpu_suspend) + if (!cpu_ops[cpu]->cpu_suspend) return -EOPNOTSUPP; return cpu_ops[cpu]->cpu_suspend(index); } -- 2.8.0.rc3 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient 2016-03-24 5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang 2016-03-24 5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang @ 2016-03-24 5:08 ` Jisheng Zhang [not found] ` <20160324111507.GB9323@arm.com> 2 siblings, 0 replies; 6+ messages in thread From: Jisheng Zhang @ 2016-03-24 5:08 UTC (permalink / raw) To: linux-arm-kernel Currently, we check cpu_ops->cpu_suspend every time when entering a low-power idle state. But this check could be avoided in this hot path by moving it into arm_cpuidle_init() to reduce arm_cpuidle_suspend() overhead a bit. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> --- arch/arm64/kernel/cpuidle.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c index bd57c59..e11857f 100644 --- a/arch/arm64/kernel/cpuidle.c +++ b/arch/arm64/kernel/cpuidle.c @@ -19,7 +19,8 @@ int __init arm_cpuidle_init(unsigned int cpu) { int ret = -EOPNOTSUPP; - if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_init_idle) + if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend && + cpu_ops[cpu]->cpu_init_idle) ret = cpu_ops[cpu]->cpu_init_idle(cpu); return ret; @@ -36,10 +37,5 @@ int arm_cpuidle_suspend(int index) { int cpu = smp_processor_id(); - /* - * If suspend has not been initialized, cpu_suspend call fails early. - */ - if (!cpu_ops[cpu]->cpu_suspend) - return -EOPNOTSUPP; return cpu_ops[cpu]->cpu_suspend(index); } -- 2.8.0.rc3 ^ permalink raw reply related [flat|nested] 6+ messages in thread
[parent not found: <20160324111507.GB9323@arm.com>]
* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient [not found] ` <20160324111507.GB9323@arm.com> @ 2016-03-24 13:18 ` Jisheng Zhang 2016-03-24 16:44 ` Lorenzo Pieralisi 0 siblings, 1 reply; 6+ messages in thread From: Jisheng Zhang @ 2016-03-24 13:18 UTC (permalink / raw) To: linux-arm-kernel Hi Will, On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote: > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote: > > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving > > out checks from this hot path. > > > > Jisheng Zhang (2): > > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() > > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient > > > > arch/arm64/kernel/cpuidle.c | 9 ++------- > > 1 file changed, 2 insertions(+), 7 deletions(-) > > These look fine to me, but do you have any rough numbers showing what > sort of improvement we get from this change? Good question. Here it is: I measured the 4096 * time from arm_cpuidle_suspend entry point to the cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board. 1. only one shell, no other process, hot-unplug secondary cpus, execute the following cmd while true do sleep 0.2 done before the patch: 1581220ns after the patch: 1579630ns reduced by 0.1% 2. only one shell, no other process, hot-unplug secondary cpus, execute the following cmd while true do md5sum /tmp/testfile sleep 0.2 done NOTE the testfile size should be larger than L1+L2 cache size before the patch: 1961960ns after the patch: 1912500ns reduced by 2.5% So the more complex the system load, the bigger the improvement. Thanks, Jisheng ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient 2016-03-24 13:18 ` [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() " Jisheng Zhang @ 2016-03-24 16:44 ` Lorenzo Pieralisi 2016-03-25 2:40 ` Jisheng Zhang 0 siblings, 1 reply; 6+ messages in thread From: Lorenzo Pieralisi @ 2016-03-24 16:44 UTC (permalink / raw) To: linux-arm-kernel On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote: > Hi Will, > > On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote: > > > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote: > > > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving > > > out checks from this hot path. > > > > > > Jisheng Zhang (2): > > > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() > > > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient > > > > > > arch/arm64/kernel/cpuidle.c | 9 ++------- > > > 1 file changed, 2 insertions(+), 7 deletions(-) > > > > These look fine to me, but do you have any rough numbers showing what > > sort of improvement we get from this change? > > Good question. Here it is: > > I measured the 4096 * time from arm_cpuidle_suspend entry point to the > cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board. > > 1. only one shell, no other process, hot-unplug secondary cpus, execute the > following cmd > > while true > do > sleep 0.2 > done > > before the patch: 1581220ns > > after the patch: 1579630ns > > reduced by 0.1% > > 2. only one shell, no other process, hot-unplug secondary cpus, execute the > following cmd > > while true > do > md5sum /tmp/testfile > sleep 0.2 > done > > NOTE the testfile size should be larger than L1+L2 cache size > > before the patch: 1961960ns > after the patch: 1912500ns > > reduced by 2.5% > > So the more complex the system load, the bigger the improvement. So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the checks that you are removing are almost the *only* code that is currently executed and this patch saves us best case 12ns per idle state entry (which is noise compared to CPU PM notifiers/FW execution time) if I am not mistaken, I can't wait to use that energy for something more useful :) Anyway, as a clean-up your patches are fine it is sloppy to check those pointers on every idle state entry (do you really need two patches ?), so: Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient 2016-03-24 16:44 ` Lorenzo Pieralisi @ 2016-03-25 2:40 ` Jisheng Zhang 0 siblings, 0 replies; 6+ messages in thread From: Jisheng Zhang @ 2016-03-25 2:40 UTC (permalink / raw) To: linux-arm-kernel Hi Lorenzo, On Thu, 24 Mar 2016 16:44:19 +0000 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote: > > Hi Will, > > > > On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote: > > > > > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote: > > > > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving > > > > out checks from this hot path. > > > > > > > > Jisheng Zhang (2): > > > > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() > > > > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient > > > > > > > > arch/arm64/kernel/cpuidle.c | 9 ++------- > > > > 1 file changed, 2 insertions(+), 7 deletions(-) > > > > > > These look fine to me, but do you have any rough numbers showing what > > > sort of improvement we get from this change? > > > > Good question. Here it is: > > > > I measured the 4096 * time from arm_cpuidle_suspend entry point to the > > cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board. > > > > 1. only one shell, no other process, hot-unplug secondary cpus, execute the > > following cmd > > > > while true > > do > > sleep 0.2 > > done > > > > before the patch: 1581220ns > > > > after the patch: 1579630ns > > > > reduced by 0.1% > > > > 2. only one shell, no other process, hot-unplug secondary cpus, execute the > > following cmd > > > > while true > > do > > md5sum /tmp/testfile > > sleep 0.2 > > done > > > > NOTE the testfile size should be larger than L1+L2 cache size > > > > before the patch: 1961960ns > > after the patch: 1912500ns > > > > reduced by 2.5% > > > > So the more complex the system load, the bigger the improvement. > > So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the > checks that you are removing are almost the *only* code that is > currently executed and this patch saves us best case 12ns per idle state > entry (which is noise compared to CPU PM notifiers/FW execution time) > if I am not mistaken, I can't wait to use that energy for something more > useful :) > > Anyway, as a clean-up your patches are fine it is sloppy to check those > pointers on every idle state entry (do you really need two patches ?), so: hmm, yes, it makes more sense to combined them into one patch. > > Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Thanks for reviewing, Jisheng ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-03-25 2:40 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-24 5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang 2016-03-24 5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang 2016-03-24 5:08 ` [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient Jisheng Zhang [not found] ` <20160324111507.GB9323@arm.com> 2016-03-24 13:18 ` [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() " Jisheng Zhang 2016-03-24 16:44 ` Lorenzo Pieralisi 2016-03-25 2:40 ` Jisheng Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).