* Re: [PATCH] powerpc/defconfigs: Set HZ=1000 on ppc64 and powernv defconfigs
2025-03-30 7:47 [PATCH] powerpc/defconfigs: Set HZ=1000 on ppc64 and powernv defconfigs Madadi Vineeth Reddy
@ 2025-03-31 7:08 ` Shrikanth Hegde
2025-04-01 5:43 ` Mukesh Kumar Chaurasiya
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Shrikanth Hegde @ 2025-03-31 7:08 UTC
To: Madadi Vineeth Reddy
Cc: linuxppc-dev, LKML, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Ard Biesheuvel,
Eric Biggers, Martin K . Petersen, Andrew Morton, Yosry Ahmed,
Tamir Duberstein, Srikar Dronamraju
On 3/30/25 13:17, Madadi Vineeth Reddy wrote:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
This makes sense, since base_slice is the only tunable available under EEVDF.
With this change, users can actually make use of it.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
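To illustrate the arithmetic behind the description, here is a minimal sketch. It assumes a simplified model in which a CPU-bound task that never yields is only switched out at a tick boundary, so its run time is the slice rounded up to a whole tick. The function names are hypothetical and this is not how the scheduler code is written.

```python
import math


def tick_interval_ms(hz: int) -> float:
    """Length of one scheduler tick in milliseconds for a given CONFIG_HZ."""
    return 1000.0 / hz


def effective_run_ms(slice_ms: float, hz: int) -> float:
    """Assumed model: slice expiry is only noticed on a tick, so the task
    runs for the requested slice rounded up to a whole number of ticks."""
    tick = tick_interval_ms(hz)
    return math.ceil(slice_ms / tick) * tick


if __name__ == "__main__":
    for hz in (100, 1000):
        for slice_ms in (3.0, 2.0):
            run = effective_run_ms(slice_ms, hz)
            print(f"HZ={hz} slice={slice_ms}ms -> runs ~{run:.0f}ms")
```

Under this model a 3ms or 2ms slice both stretch to ~10ms at HZ=100, while at HZ=1000 they stay at 3ms and 2ms, which matches the behaviour described in the patch.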
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler        CONFIG_HZ  Custom Slice  99th Percentile Latency (µs, relative to HZ=100)
> --------------------------------------------------------------------
> EEVDF            1000       No            0.30x
> EEVDF            1000       2 ms          0.29x
> EEVDF (default)  100        No            1.00x
>
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances)   Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i                    +3.01% (1.62%)
> 60 u, 1 i                    +1.46% (2.69%)
> 90 u, 1 i                    -1.33% (3.09%)
> 30 u, 2 i                    -1.20% (1.71%)
> 30 u, 3 i                    -0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric                       1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change           +2.16% (1.27%)
> Average Latency Change       -2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> ---
> arch/powerpc/configs/powernv_defconfig | 2 +-
> arch/powerpc/configs/ppc64_defconfig | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
> index 6b6d7467fecf..8abf17d26b3a 100644
> --- a/arch/powerpc/configs/powernv_defconfig
> +++ b/arch/powerpc/configs/powernv_defconfig
> @@ -46,7 +46,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_IDLE=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_BINFMT_MISC=m
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_PPC_UV=y
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 5fa154185efa..45d437e4c62e 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -57,7 +57,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_FREQ_PMAC64=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_KEXEC=y
> CONFIG_KEXEC_FILE=y
* Re: [PATCH] powerpc/defconfigs: Set HZ=1000 on ppc64 and powernv defconfigs
From: Mukesh Kumar Chaurasiya @ 2025-04-01 5:43 UTC
To: Madadi Vineeth Reddy
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Ard Biesheuvel, Eric Biggers,
Martin K . Petersen, Andrew Morton, Yosry Ahmed, Tamir Duberstein,
Srikar Dronamraju, Shrikanth Hegde, linuxppc-dev, LKML
On Sun, Mar 30, 2025 at 01:17:34PM +0530, Madadi Vineeth Reddy wrote:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler        CONFIG_HZ  Custom Slice  99th Percentile Latency (µs, relative to HZ=100)
> --------------------------------------------------------------------
> EEVDF            1000       No            0.30x
> EEVDF            1000       2 ms          0.29x
> EEVDF (default)  100        No            1.00x
>
Nit: listing the default (HZ=100) row first would make the table a little less confusing.
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances)   Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i                    +3.01% (1.62%)
> 60 u, 1 i                    +1.46% (2.69%)
> 90 u, 1 i                    -1.33% (3.09%)
> 30 u, 2 i                    -1.20% (1.71%)
> 30 u, 3 i                    -0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric                       1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change           +2.16% (1.27%)
> Average Latency Change       -2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> ---
> arch/powerpc/configs/powernv_defconfig | 2 +-
> arch/powerpc/configs/ppc64_defconfig | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
> index 6b6d7467fecf..8abf17d26b3a 100644
> --- a/arch/powerpc/configs/powernv_defconfig
> +++ b/arch/powerpc/configs/powernv_defconfig
> @@ -46,7 +46,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_IDLE=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_BINFMT_MISC=m
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_PPC_UV=y
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 5fa154185efa..45d437e4c62e 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -57,7 +57,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_FREQ_PMAC64=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_KEXEC=y
> CONFIG_KEXEC_FILE=y
> --
> 2.47.0
>
LGTM
Reviewed-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
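For anyone who wants to confirm which tick rate a generated config selects after applying the diff, a small sketch follows. It assumes the config is available as text (for example the contents of a `.config` file) and looks for a `CONFIG_HZ_<n>=y` line like the ones touched by the patch. The helper name `hz_from_config` is hypothetical.

```python
import re


def hz_from_config(text: str):
    """Return the tick rate chosen by a CONFIG_HZ_<n>=y line, or None
    if no such line is present (the Kconfig default would then apply)."""
    for line in text.splitlines():
        match = re.fullmatch(r"CONFIG_HZ_(\d+)=y", line.strip())
        if match:
            return int(match.group(1))
    return None


# Fragment modelled on the powernv_defconfig hunk in the patch.
before = "CONFIG_CPU_IDLE=y\nCONFIG_HZ_100=y\nCONFIG_BINFMT_MISC=m\n"
after = before.replace("CONFIG_HZ_100=y", "CONFIG_HZ_1000=y")

if __name__ == "__main__":
    print("before:", hz_from_config(before))
    print("after: ", hz_from_config(after))
```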
* Re: [PATCH] powerpc/defconfigs: Set HZ=1000 on ppc64 and powernv defconfigs
From: Srikar Dronamraju @ 2025-04-02 7:43 UTC
To: Madadi Vineeth Reddy
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Ard Biesheuvel, Eric Biggers,
Martin K . Petersen, Andrew Morton, Yosry Ahmed, Tamir Duberstein,
Shrikanth Hegde, linuxppc-dev, LKML
* Madadi Vineeth Reddy <vineethr@linux.ibm.com> [2025-03-30 13:17:34]:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler        CONFIG_HZ  Custom Slice  99th Percentile Latency (µs, relative to HZ=100)
> --------------------------------------------------------------------
> EEVDF            1000       No            0.30x
> EEVDF            1000       2 ms          0.29x
> EEVDF (default)  100        No            1.00x
>
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances)   Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i                    +3.01% (1.62%)
> 60 u, 1 i                    +1.46% (2.69%)
> 90 u, 1 i                    -1.33% (3.09%)
> 30 u, 2 i                    -1.20% (1.71%)
> 30 u, 3 i                    -0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric                       1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change           +2.16% (1.27%)
> Average Latency Change       -2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Good work, Vineeth.
As you pointed out, the base slice is 3ms, and having the base slice be a
multiple of the tick interval will help. The numbers also support this change.
Looks good to me.
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
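As a rough way to read the reported numbers, the sketch below converts the normalized schbench latency into a percentage reduction and flags which throughput and latency deltas exceed their reported standard deviation. The figures are copied from the patch description. The one-standard-deviation rule and the helper names are assumptions made for illustration, not a statistical test used in the thread.

```python
# (label, delta %, std dev %) as reported in the patch description.
DAYTRADER = [
    ("30 u, 1 i", +3.01, 1.62),
    ("60 u, 1 i", +1.46, 2.69),
    ("90 u, 1 i", -1.33, 3.09),
    ("30 u, 2 i", -1.20, 1.71),
    ("30 u, 3 i", -0.07, 1.33),
]
PGBENCH = [
    ("Average TPS", +2.16, 1.27),
    ("Average Latency", -2.21, 1.21),
]


def beyond_std_dev(delta_pct: float, std_dev_pct: float) -> bool:
    """Assumed rule: a change counts only if it exceeds one std dev."""
    return abs(delta_pct) > std_dev_pct


def latency_reduction_pct(normalized: float) -> float:
    """Turn a latency normalized to the HZ=100 baseline into a % reduction."""
    return (1.0 - normalized) * 100.0


if __name__ == "__main__":
    print(f"schbench p99 reduction: ~{latency_reduction_pct(0.30):.0f}%")
    for label, delta, std in DAYTRADER + PGBENCH:
        verdict = "beyond" if beyond_std_dev(delta, std) else "within"
        print(f"{label:16s} {delta:+.2f}% ({std:.2f}%) -> {verdict} std dev")
```

Under this rule both pgbench metrics land beyond the noise, and the Daytrader rows land within it except for 30 u, 1 i, which is a gain rather than a regression.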
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH] powerpc/defconfigs: Set HZ=1000 on ppc64 and powernv defconfigs
From: Madadi Vineeth Reddy @ 2025-04-16 6:40 UTC
To: Madhavan Srinivasan
Cc: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Ard Biesheuvel, Eric Biggers, Martin K . Petersen, Andrew Morton,
Yosry Ahmed, Tamir Duberstein, Srikar Dronamraju, Shrikanth Hegde,
linuxppc-dev, LKML, Madadi Vineeth Reddy
Hi Maddy,
Ping.
Any thoughts on this? Can it be picked up?
Thanks,
Madadi Vineeth Reddy
On 30/03/25 13:17, Madadi Vineeth Reddy wrote:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler        CONFIG_HZ  Custom Slice  99th Percentile Latency (µs, relative to HZ=100)
> --------------------------------------------------------------------
> EEVDF            1000       No            0.30x
> EEVDF            1000       2 ms          0.29x
> EEVDF (default)  100        No            1.00x
>
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances)   Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i                    +3.01% (1.62%)
> 60 u, 1 i                    +1.46% (2.69%)
> 90 u, 1 i                    -1.33% (3.09%)
> 30 u, 2 i                    -1.20% (1.71%)
> 30 u, 3 i                    -0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric                       1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change           +2.16% (1.27%)
> Average Latency Change       -2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> ---
> arch/powerpc/configs/powernv_defconfig | 2 +-
> arch/powerpc/configs/ppc64_defconfig | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
> index 6b6d7467fecf..8abf17d26b3a 100644
> --- a/arch/powerpc/configs/powernv_defconfig
> +++ b/arch/powerpc/configs/powernv_defconfig
> @@ -46,7 +46,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_IDLE=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_BINFMT_MISC=m
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_PPC_UV=y
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 5fa154185efa..45d437e4c62e 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -57,7 +57,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_FREQ_PMAC64=y
> -CONFIG_HZ_100=y
> +CONFIG_HZ_1000=y
> CONFIG_PPC_TRANSACTIONAL_MEM=y
> CONFIG_KEXEC=y
> CONFIG_KEXEC_FILE=y