* [PATCH 0/3] ARM: shmobile: Fix loop-based delays @ 2016-01-11 18:41 Geert Uytterhoeven 2016-01-11 18:41 ` [PATCH 1/3] ARM: shmobile: timer: Fix preset_lpj leading to too short delays Geert Uytterhoeven ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Geert Uytterhoeven @ 2016-01-11 18:41 UTC (permalink / raw) To: linux-sh Hi Simon, Magnus, I accidentally noticed that on r8a7740/armadillo, "mdelay(1000)" took only 333 ms with my personal config, while it took 3675 ms with shmobile_defconfig. Finding out why brought me an animating journey through obscure code and kernel config options... Apparently loop-based delays may complete early on all shmobile SoCs, which may lead to subtle driver failures. Fortunately(?) this was mitigated by: 1. the ARM arch timer on Cortex A7 and A15, which is used instead of loop-based delays if available, 2. shmobile_defconfig having CONFIG_CPU_BPREDICT_DISABLE enabled (why?), which makes loops up to 16 times slower. This patch series fixes both, and removes a related check for Cortex A8 CPU cores, which are no longer supported anyway. Tested on a collection of shmobile SoCs with Cortex A7, A9, and A15 CPU cores, with some verification on BeagleBone Black (Cortex A8). Thanks for applying! Geert Uytterhoeven (3): ARM: shmobile: timer: Fix preset_lpj leading to too short delays ARM: shmobile: timer: Drop support for Cortex A8 ARM: shmobile: defconfig: Do not enable CONFIG_CPU_BPREDICT_DISABLE arch/arm/configs/shmobile_defconfig | 1 - arch/arm/mach-shmobile/timer.c | 26 +++++++++----------------- 2 files changed, 9 insertions(+), 18 deletions(-) -- 1.9.1 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/3] ARM: shmobile: timer: Fix preset_lpj leading to too short delays @ 2016-01-11 18:41 ` Geert Uytterhoeven 2016-01-18 1:42 ` Simon Horman 0 siblings, 1 reply; 7+ messages in thread From: Geert Uytterhoeven @ 2016-01-11 18:41 UTC (permalink / raw) To: linux-sh On all shmobile ARM SoCs, loop-based delays may complete early, which can be after only 1/3 (Cortex A9) or 1/2 (Cortex A7 or A15) of the minimum required time. This is caused by calculating preset_lpj based on incorrect assumptions about the number of clock cycles per loop: - All of Cortex A7, A9, and A15 run __loop_delay() at 1 loop per CPU clock cycle, - As of commit 11d4bb1bd067f9d0 ("ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation"), Cortex A8 runs __loop_delay() at 1 loop per 2 instead of 3 CPU clock cycles. On SoCs with Cortex A7 and/or A15 CPU cores, this went unnoticed, as delays use the ARM arch timer if available. R-Car Gen2 doesn't work if the arch timer is disabled. However, APE6 can be used without the arch timer. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> --- Backporting note: older kernels need modifications in all calls to shmobile_setup_delay() or shmobile_setup_delay_hz(). --- arch/arm/mach-shmobile/timer.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/arch/arm/mach-shmobile/timer.c b/arch/arm/mach-shmobile/timer.c index c17d4d3881ffc45e..d61014a52b43e205 100644 --- a/arch/arm/mach-shmobile/timer.c +++ b/arch/arm/mach-shmobile/timer.c @@ -38,8 +38,7 @@ static void __init shmobile_setup_delay_hz(unsigned int max_cpu_core_hz, void __init shmobile_init_delay(void) { struct device_node *np, *cpus; - bool is_a7_a8_a9 = false; - bool is_a15 = false; + unsigned int div = 0; bool has_arch_timer = false; u32 max_freq = 0; @@ -53,27 +52,22 @@ void __init shmobile_init_delay(void) if (!of_property_read_u32(np, "clock-frequency", &freq)) max_freq = max(max_freq, freq); - if (of_device_is_compatible(np, "arm,cortex-a8") || - of_device_is_compatible(np, "arm,cortex-a9")) { - is_a7_a8_a9 = true; - } else if (of_device_is_compatible(np, "arm,cortex-a7")) { - is_a7_a8_a9 = true; - has_arch_timer = true; - } else if (of_device_is_compatible(np, "arm,cortex-a15")) { - is_a15 = true; + if (of_device_is_compatible(np, "arm,cortex-a8")) { + div = 2; + } else if (of_device_is_compatible(np, "arm,cortex-a9")) { + div = 1; + } else if (of_device_is_compatible(np, "arm,cortex-a7") || + of_device_is_compatible(np, "arm,cortex-a15")) { + div = 1; has_arch_timer = true; } } of_node_put(cpus); - if (!max_freq) + if (!max_freq || !div) return; - if (!has_arch_timer || !IS_ENABLED(CONFIG_ARM_ARCH_TIMER)) { - if (is_a7_a8_a9) - shmobile_setup_delay_hz(max_freq, 1, 3); - else if (is_a15) - shmobile_setup_delay_hz(max_freq, 2, 4); - } + if (!has_arch_timer || !IS_ENABLED(CONFIG_ARM_ARCH_TIMER)) + shmobile_setup_delay_hz(max_freq, 1, div); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] ARM: shmobile: timer: Fix preset_lpj leading to too short delays 2016-01-11 18:41 ` [PATCH 1/3] ARM: shmobile: timer: Fix preset_lpj leading to too short delays Geert Uytterhoeven @ 2016-01-18 1:42 ` Simon Horman 0 siblings, 0 replies; 7+ messages in thread From: Simon Horman @ 2016-01-18 1:42 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jan 11, 2016 at 07:41:12PM +0100, Geert Uytterhoeven wrote: > On all shmobile ARM SoCs, loop-based delays may complete early, which > can be after only 1/3 (Cortex A9) or 1/2 (Cortex A7 or A15) of the > minimum required time. > > This is caused by calculating preset_lpj based on incorrect assumptions > about the number of clock cycles per loop: > - All of Cortex A7, A9, and A15 run __loop_delay() at 1 loop per > CPU clock cycle, > - As of commit 11d4bb1bd067f9d0 ("ARM: 7907/1: lib: delay-loop: Add > align directive to fix BogoMIPS calculation"), Cortex A8 runs > __loop_delay() at 1 loop per 2 instead of 3 CPU clock cycles. > > On SoCs with Cortex A7 and/or A15 CPU cores, this went unnoticed, as > delays use the ARM arch timer if available. R-Car Gen2 doesn't work if > the arch timer is disabled. However, APE6 can be used without the arch > timer. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > --- > Backporting note: older kernels need modifications in all calls to > shmobile_setup_delay() or shmobile_setup_delay_hz(). Thanks, I have queued this up for v4.6. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] ARM: shmobile: timer: Drop support for Cortex A8 @ 2016-01-11 18:41 ` Geert Uytterhoeven 2016-01-18 1:45 ` Simon Horman 0 siblings, 1 reply; 7+ messages in thread From: Geert Uytterhoeven @ 2016-01-11 18:41 UTC (permalink / raw) To: linux-sh Commit edf4100906044225 ("ARM: shmobile: sh7372 dtsi: Remove Legacy file") removed the DTS for the last shmobile SoC with a Cortex A8 CPU core (sh7372 aka SH-Mobile AP4). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> --- arch/arm/mach-shmobile/timer.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/arm/mach-shmobile/timer.c b/arch/arm/mach-shmobile/timer.c index d61014a52b43e205..15fc9db797ea6359 100644 --- a/arch/arm/mach-shmobile/timer.c +++ b/arch/arm/mach-shmobile/timer.c @@ -52,9 +52,7 @@ void __init shmobile_init_delay(void) if (!of_property_read_u32(np, "clock-frequency", &freq)) max_freq = max(max_freq, freq); - if (of_device_is_compatible(np, "arm,cortex-a8")) { - div = 2; - } else if (of_device_is_compatible(np, "arm,cortex-a9")) { + if (of_device_is_compatible(np, "arm,cortex-a9")) { div = 1; } else if (of_device_is_compatible(np, "arm,cortex-a7") || of_device_is_compatible(np, "arm,cortex-a15")) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] ARM: shmobile: timer: Drop support for Cortex A8 2016-01-11 18:41 ` [PATCH 2/3] ARM: shmobile: timer: Drop support for Cortex A8 Geert Uytterhoeven @ 2016-01-18 1:45 ` Simon Horman 0 siblings, 0 replies; 7+ messages in thread From: Simon Horman @ 2016-01-18 1:45 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jan 11, 2016 at 07:41:13PM +0100, Geert Uytterhoeven wrote: > Commit edf4100906044225 ("ARM: shmobile: sh7372 dtsi: Remove Legacy > file") removed the DTS for the last shmobile SoC with a Cortex A8 CPU > core (sh7372 aka SH-Mobile AP4). > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Thanks, I have queued this up for v4.6. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/3] ARM: shmobile: defconfig: Do not enable CONFIG_CPU_BPREDICT_DISABLE @ 2016-01-11 18:41 ` Geert Uytterhoeven 2016-01-18 1:48 ` Simon Horman 0 siblings, 1 reply; 7+ messages in thread From: Geert Uytterhoeven @ 2016-01-11 18:41 UTC (permalink / raw) To: linux-sh Do not disable branch prediction, as this has a big performance impact. After re-enabling branch prediction, __loop_delay() runs 16 (Cortex A15) or 11 (Cortex A9) times faster than before. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> --- This used to be enabled in: - ag5evm_defconfig (sh73a0) - ape6evm_defconfig (r8a73a4) - armadillo800eva_defconfig (r8a7740) - koelsch_defconfig (r8a7791) - kota2_defconfig (sh73a0) Was there ever a good reason for that? --- arch/arm/configs/shmobile_defconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm/configs/shmobile_defconfig b/arch/arm/configs/shmobile_defconfig index 969738324a5d5f81..5208b5caacd1a4e2 100644 --- a/arch/arm/configs/shmobile_defconfig +++ b/arch/arm/configs/shmobile_defconfig @@ -20,7 +20,6 @@ CONFIG_ARCH_R8A7791=y CONFIG_ARCH_R8A7793=y CONFIG_ARCH_R8A7794=y CONFIG_ARCH_SH73A0=y -CONFIG_CPU_BPREDICT_DISABLE=y CONFIG_PL310_ERRATA_588369=y CONFIG_ARM_ERRATA_754322=y CONFIG_PCI=y -- 1.9.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] ARM: shmobile: defconfig: Do not enable CONFIG_CPU_BPREDICT_DISABLE 2016-01-11 18:41 ` [PATCH 3/3] ARM: shmobile: defconfig: Do not enable CONFIG_CPU_BPREDICT_DISABLE Geert Uytterhoeven @ 2016-01-18 1:48 ` Simon Horman 0 siblings, 0 replies; 7+ messages in thread From: Simon Horman @ 2016-01-18 1:48 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jan 11, 2016 at 07:41:14PM +0100, Geert Uytterhoeven wrote: > Do not disable branch prediction, as this has a big performance impact. > > After re-enabling branch prediction, __loop_delay() runs 16 (Cortex A15) > or 11 (Cortex A9) times faster than before. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > --- > This used to be enabled in: > > - ag5evm_defconfig (sh73a0) > - ape6evm_defconfig (r8a73a4) > - armadillo800eva_defconfig (r8a7740) > - koelsch_defconfig (r8a7791) > - kota2_defconfig (sh73a0) > > Was there ever a good reason for that? If that was a concious decision then I believe it was made before my time. I'm going ahead and queuing up this change for v4.6. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-01-18 1:48 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-01-11 18:41 [PATCH 0/3] ARM: shmobile: Fix loop-based delays Geert Uytterhoeven 2016-01-11 18:41 ` [PATCH 1/3] ARM: shmobile: timer: Fix preset_lpj leading to too short delays Geert Uytterhoeven 2016-01-18 1:42 ` Simon Horman 2016-01-11 18:41 ` [PATCH 2/3] ARM: shmobile: timer: Drop support for Cortex A8 Geert Uytterhoeven 2016-01-18 1:45 ` Simon Horman 2016-01-11 18:41 ` [PATCH 3/3] ARM: shmobile: defconfig: Do not enable CONFIG_CPU_BPREDICT_DISABLE Geert Uytterhoeven 2016-01-18 1:48 ` Simon Horman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).