linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient
@ 2016-03-24  5:08 Jisheng Zhang
  2016-03-24  5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jisheng Zhang @ 2016-03-24  5:08 UTC (permalink / raw)
  To: linux-arm-kernel

This series improves arm_cpuidle_suspend() a bit by removing checks
from this hot path or moving them out of it.
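
The net effect on the hot path, sketched from the two diffs in this
series (before vs. after, simplified):

/* before the series: two NULL checks on every idle entry */
int arm_cpuidle_suspend(int index)
{
	int cpu = smp_processor_id();

	if (!cpu_ops[cpu] || !cpu_ops[cpu]->cpu_suspend)
		return -EOPNOTSUPP;
	return cpu_ops[cpu]->cpu_suspend(index);
}

/* after the series: the checks are done once in arm_cpuidle_init() */
int arm_cpuidle_suspend(int index)
{
	int cpu = smp_processor_id();

	return cpu_ops[cpu]->cpu_suspend(index);
}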

Jisheng Zhang (2):
  arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
  arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient

 arch/arm64/kernel/cpuidle.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

-- 
2.8.0.rc3


* [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
  2016-03-24  5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang
@ 2016-03-24  5:08 ` Jisheng Zhang
  2016-03-24  5:08 ` [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient Jisheng Zhang
       [not found] ` <20160324111507.GB9323@arm.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2016-03-24  5:08 UTC (permalink / raw)
  To: linux-arm-kernel

If cpu_ops has not been registered, arm_cpuidle_init() will return
-EOPNOTSUPP, so arm_cpuidle_suspend() will never get the chance to
run. In other words, the cpu_ops check can be avoided.
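
For reference, the init-time gate that makes this safe already exists; a
sketch of the current arm_cpuidle_init() (taken from the context lines of
the next patch):

int __init arm_cpuidle_init(unsigned int cpu)
{
	int ret = -EOPNOTSUPP;

	/*
	 * Without registered cpu_ops this stays -EOPNOTSUPP, no idle
	 * states are set up and arm_cpuidle_suspend() is never reached.
	 */
	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_init_idle)
		ret = cpu_ops[cpu]->cpu_init_idle(cpu);

	return ret;
}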

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
---
 arch/arm64/kernel/cpuidle.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index 9047cab6..bd57c59 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -37,10 +37,9 @@ int arm_cpuidle_suspend(int index)
 	int cpu = smp_processor_id();
 
 	/*
-	 * If cpu_ops have not been registered or suspend
-	 * has not been initialized, cpu_suspend call fails early.
+	 * If suspend has not been initialized, cpu_suspend call fails early.
 	 */
-	if (!cpu_ops[cpu] || !cpu_ops[cpu]->cpu_suspend)
+	if (!cpu_ops[cpu]->cpu_suspend)
 		return -EOPNOTSUPP;
 	return cpu_ops[cpu]->cpu_suspend(index);
 }
-- 
2.8.0.rc3


* [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
  2016-03-24  5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang
  2016-03-24  5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang
@ 2016-03-24  5:08 ` Jisheng Zhang
       [not found] ` <20160324111507.GB9323@arm.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2016-03-24  5:08 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, we check cpu_ops->cpu_suspend every time we enter a
low-power idle state. This check can be avoided in the hot path by
moving it into arm_cpuidle_init(), reducing arm_cpuidle_suspend()
overhead a bit.
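
For clarity, the resulting code (assembled from the diff below) validates
cpu_suspend once per CPU at init time and leaves the hot path as a plain
indirect call:

int __init arm_cpuidle_init(unsigned int cpu)
{
	int ret = -EOPNOTSUPP;

	/* cpu_suspend is now checked here, once, instead of on every entry */
	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend &&
			cpu_ops[cpu]->cpu_init_idle)
		ret = cpu_ops[cpu]->cpu_init_idle(cpu);

	return ret;
}

int arm_cpuidle_suspend(int index)
{
	int cpu = smp_processor_id();

	return cpu_ops[cpu]->cpu_suspend(index);
}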

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
---
 arch/arm64/kernel/cpuidle.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index bd57c59..e11857f 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -19,7 +19,8 @@ int __init arm_cpuidle_init(unsigned int cpu)
 {
 	int ret = -EOPNOTSUPP;
 
-	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_init_idle)
+	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend &&
+			cpu_ops[cpu]->cpu_init_idle)
 		ret = cpu_ops[cpu]->cpu_init_idle(cpu);
 
 	return ret;
@@ -36,10 +37,5 @@ int arm_cpuidle_suspend(int index)
 {
 	int cpu = smp_processor_id();
 
-	/*
-	 * If suspend has not been initialized, cpu_suspend call fails early.
-	 */
-	if (!cpu_ops[cpu]->cpu_suspend)
-		return -EOPNOTSUPP;
 	return cpu_ops[cpu]->cpu_suspend(index);
 }
-- 
2.8.0.rc3


* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient
       [not found] ` <20160324111507.GB9323@arm.com>
@ 2016-03-24 13:18   ` Jisheng Zhang
  2016-03-24 16:44     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 6+ messages in thread
From: Jisheng Zhang @ 2016-03-24 13:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:

> On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> > This series improves arm_cpuidle_suspend() a bit by removing checks
> > from this hot path or moving them out of it.
> > 
> > Jisheng Zhang (2):
> >   arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> >   arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> > 
> >  arch/arm64/kernel/cpuidle.c | 9 ++-------
> >  1 file changed, 2 insertions(+), 7 deletions(-)  
> 
> These look fine to me, but do you have any rough numbers showing what
> sort of improvement we get from this change?

Good question. Here it is:

I measured the time from the arm_cpuidle_suspend() entry point to the
cpu_psci_cpu_suspend() entry point, summed over 4096 idle entries. The HW
platform is a Marvell BG4CT STB board.

1. only one shell, no other process, hot-unplug secondary cpus, execute the
following cmd

while true
do
	sleep 0.2
done

before the patch: 1581220ns

after the patch: 1579630ns

reduced by 0.1%

2. only one shell, no other process, hot-unplug secondary cpus, execute the
following cmd

while true
do
	md5sum /tmp/testfile
	sleep 0.2
done

NOTE the testfile size should be larger than L1+L2 cache size

before the patch: 1961960ns
after the patch: 1912500ns

reduced by 2.5%

So the more complex the system load, the bigger the improvement.

Thanks,
Jisheng


* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient
  2016-03-24 13:18   ` [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() " Jisheng Zhang
@ 2016-03-24 16:44     ` Lorenzo Pieralisi
  2016-03-25  2:40       ` Jisheng Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Lorenzo Pieralisi @ 2016-03-24 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote:
> Hi Will,
> 
> On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:
> 
> > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> > > This series improves arm_cpuidle_suspend() a bit by removing checks
> > > from this hot path or moving them out of it.
> > > 
> > > Jisheng Zhang (2):
> > >   arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> > >   arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> > > 
> > >  arch/arm64/kernel/cpuidle.c | 9 ++-------
> > >  1 file changed, 2 insertions(+), 7 deletions(-)  
> > 
> > These look fine to me, but do you have any rough numbers showing what
> > sort of improvement we get from this change?
> 
> Good question. Here it is:
> 
> I measured the time from the arm_cpuidle_suspend() entry point to the
> cpu_psci_cpu_suspend() entry point, summed over 4096 idle entries. The HW
> platform is a Marvell BG4CT STB board.
> 
> 1. only one shell, no other process, hot-unplug secondary cpus, execute the
> following cmd
> 
> while true
> do
> 	sleep 0.2
> done
> 
> before the patch: 1581220ns
> 
> after the patch: 1579630ns
> 
> reduced by 0.1%
> 
> 2. only one shell, no other process, hot-unplug secondary cpus, execute the
> following cmd
> 
> while true
> do
> 	md5sum /tmp/testfile
> 	sleep 0.2
> done
> 
> NOTE the testfile size should be larger than L1+L2 cache size
> 
> before the patch: 1961960ns
> after the patch: 1912500ns
> 
> reduced by 2.5%
> 
> So the more complex the system load, the bigger the improvement.

So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the
checks that you are removing are almost the *only* code that is
currently executed, and, if I am not mistaken, this patch saves us at
best 12ns per idle state entry (which is noise compared to the CPU PM
notifiers/FW execution time). I can't wait to use that energy for
something more useful :)

Anyway, as a clean-up your patches are fine; it is sloppy to check those
pointers on every idle state entry (do you really need two patches?), so:

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>


* [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient
  2016-03-24 16:44     ` Lorenzo Pieralisi
@ 2016-03-25  2:40       ` Jisheng Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2016-03-25  2:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Lorenzo,

On Thu, 24 Mar 2016 16:44:19 +0000
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:

> On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote:
> > Hi Will,
> > 
> > On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:
> >   
> > > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:  
> > > > This series improves arm_cpuidle_suspend() a bit by removing checks
> > > > from this hot path or moving them out of it.
> > > > 
> > > > Jisheng Zhang (2):
> > > >   arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> > > >   arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> > > > 
> > > >  arch/arm64/kernel/cpuidle.c | 9 ++-------
> > > >  1 file changed, 2 insertions(+), 7 deletions(-)    
> > > 
> > > These look fine to me, but do you have any rough numbers showing what
> > > sort of improvement we get from this change?  
> > 
> > Good question. Here it is:
> > 
> > I measured the time from the arm_cpuidle_suspend() entry point to the
> > cpu_psci_cpu_suspend() entry point, summed over 4096 idle entries. The HW
> > platform is a Marvell BG4CT STB board.
> > 
> > 1. only one shell, no other process, hot-unplug secondary cpus, execute the
> > following cmd
> > 
> > while true
> > do
> > 	sleep 0.2
> > done
> > 
> > before the patch: 1581220ns
> > 
> > after the patch: 1579630ns
> > 
> > reduced by 0.1%
> > 
> > 2. only one shell, no other process, hot-unplug secondary cpus, execute the
> > following cmd
> > 
> > while true
> > do
> > 	md5sum /tmp/testfile
> > 	sleep 0.2
> > done
> > 
> > NOTE the testfile size should be larger than L1+L2 cache size
> > 
> > before the patch: 1961960ns
> > after the patch: 1912500ns
> > 
> > reduced by 2.5%
> > 
> > So the more complex the system load, the bigger the improvement.  
> 
> So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the
> checks that you are removing are almost the *only* code that is
> currently executed, and, if I am not mistaken, this patch saves us at
> best 12ns per idle state entry (which is noise compared to the CPU PM
> notifiers/FW execution time). I can't wait to use that energy for
> something more useful :)
> 
> Anyway, as a clean-up your patches are fine; it is sloppy to check those
> pointers on every idle state entry (do you really need two patches?), so:

Hmm, yes, it makes more sense to combine them into one patch.

> 
> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Thanks for reviewing,
Jisheng


Thread overview: 6+ messages
2016-03-24  5:08 [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() more efficient Jisheng Zhang
2016-03-24  5:08 ` [PATCH 1/2] arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend() Jisheng Zhang
2016-03-24  5:08 ` [PATCH 2/2] arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient Jisheng Zhang
     [not found] ` <20160324111507.GB9323@arm.com>
2016-03-24 13:18   ` [PATCH 0/2] arm64: cpuidle: make arm_cpuidle_suspend() " Jisheng Zhang
2016-03-24 16:44     ` Lorenzo Pieralisi
2016-03-25  2:40       ` Jisheng Zhang
