* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
@ 2015-05-19 21:44 ` Heiko Stuebner
2015-05-19 21:55 ` Arnd Bergmann
2015-05-19 22:01 ` Florian Fainelli
` (7 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Heiko Stuebner @ 2015-05-19 21:44 UTC (permalink / raw)
To: linux-arm-kernel
Hi Russell,
Am Dienstag, 19. Mai 2015, 17:12:56 schrieb Russell King:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0]
> Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de>
>
> Tested on Radxa Rock Pro with RK3188.
> The kernel panics on reboot I had before and also a kernel BUG when running
> "memtester 1900M" went away and the rock seems to run stable now.
I've also tested the patch on a rk3288-based board, so both supported Rockchip
ARM-cores (A9 and A12/A17) seem to run fine with this change.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Rockchip-specific changes
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Thanks Heiko
[0] http://lists.infradead.org/pipermail/linux-rockchip/2015-May/003046.html
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner
@ 2015-05-19 21:55 ` Arnd Bergmann
2015-05-19 22:07 ` Russell King - ARM Linux
0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2015-05-19 21:55 UTC (permalink / raw)
To: linux-arm-kernel
On Tuesday 19 May 2015 23:44:58 Heiko Stuebner wrote:
>
> Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0]
> > Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de>
> >
> > Tested on Radxa Rock Pro with RK3188.
> > The kernel panics on reboot I had before and also a kernel BUG when running
> > "memtester 1900M" went away and the rock seems to run stable now.
>
We should probably create a separate fix for that and add it to the stable
kernels then. I would suggest something like the untested patch below,
which takes advantage of the fact that we already have separate assignments
for the start_secondary function pointer for A9 and A17.
Arnd
diff --git a/arch/arm/mach-rockchip/headsmp.S b/arch/arm/mach-rockchip/headsmp.S
index 46c22dedf632..ae0077e8fe98 100644
--- a/arch/arm/mach-rockchip/headsmp.S
+++ b/arch/arm/mach-rockchip/headsmp.S
@@ -16,9 +16,6 @@
#include <linux/init.h>
ENTRY(rockchip_secondary_startup)
- mrc p15, 0, r0, c0, c0, 0 @ read main ID register
- ldr r1, =0x00000c09 @ Cortex-A9 primary part number
- teq r0, r1
beq v7_invalidate_l1
b secondary_startup
ENDPROC(rockchip_secondary_startup)
diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
index 5b4ca3c3c879..5f46724cca2f 100644
--- a/arch/arm/mach-rockchip/platsmp.c
+++ b/arch/arm/mach-rockchip/platsmp.c
@@ -149,8 +149,7 @@ static int __cpuinit rockchip_boot_secondary(unsigned int cpu,
* sram_base_addr + 8: start address for pc
* */
udelay(10);
- writel(virt_to_phys(rockchip_secondary_startup),
- sram_base_addr + 8);
+ writel(virt_to_phys(secondary_startup), sram_base_addr + 8);
writel(0xDEADBEAF, sram_base_addr + 4);
dsb_sev();
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner
@ 2015-05-19 22:01 ` Florian Fainelli
2015-05-20 18:54 ` Dinh Nguyen
` (6 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Florian Fainelli @ 2015-05-19 22:01 UTC (permalink / raw)
To: linux-arm-kernel
On 19/05/15 09:12, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
For brcmstb:
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-05-19 21:55 ` Arnd Bergmann
@ 2015-05-19 22:07 ` Russell King - ARM Linux
2015-05-19 22:18 ` Arnd Bergmann
0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-05-19 22:07 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 19, 2015 at 11:55:12PM +0200, Arnd Bergmann wrote:
> On Tuesday 19 May 2015 23:44:58 Heiko Stuebner wrote:
> >
> > Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0]
> > > Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de>
> > >
> > > Tested on Radxa Rock Pro with RK3188.
> > > The kernel panics on reboot I had before and also a kernel BUG when running
> > > "memtester 1900M" went away and the rock seems to run stable now.
> >
>
> We should probably create a separate fix for that and add it to the stable
> kernels then. I would suggest something like the untested patch below,
> which takes advantage of the fact that we already have separate assignments
> for the start_secondary function pointer for A9 and A17.
Your patch is also continuing the pattern with this code...
> diff --git a/arch/arm/mach-rockchip/headsmp.S b/arch/arm/mach-rockchip/headsmp.S
> index 46c22dedf632..ae0077e8fe98 100644
> --- a/arch/arm/mach-rockchip/headsmp.S
> +++ b/arch/arm/mach-rockchip/headsmp.S
> @@ -16,9 +16,6 @@
> #include <linux/init.h>
>
> ENTRY(rockchip_secondary_startup)
> - mrc p15, 0, r0, c0, c0, 0 @ read main ID register
> - ldr r1, =0x00000c09 @ Cortex-A9 primary part number
> - teq r0, r1
> beq v7_invalidate_l1
> b secondary_startup
If you look carefully at this, you'll notice that it's utter crap.
(Sorry, but it is.) It has two problems:
1. It'll never match a Cortex-A9 CPU. Cortex-A9 has a MIDR value of
0x412fc09a, not 0x00000c09. The bit position of the part number
field isn't even right.
2. If it does match, then we branch to "v7_invalidate_l1" without setting
the link register: we'll never return back here (we'll return to whatever
random value the link register contains) and so we'll never make it to
secondary_startup. *Thankfully*, because of (1), this branch will
never be taken - this is it's saving grace.
Your patch introduces a /third/ form of crapiness:
3. If the PSR happens to have Z=1, the "beq" instruction will be taken,
thereby crashing the system because of (2).
The /simplest/ change which would fix this problem is to just change
proc-v7.S. The remainder is effectively a cleanup removing redundant
code.
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-05-19 22:07 ` Russell King - ARM Linux
@ 2015-05-19 22:18 ` Arnd Bergmann
2015-05-19 22:32 ` Russell King - ARM Linux
0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2015-05-19 22:18 UTC (permalink / raw)
To: linux-arm-kernel
On Tuesday 19 May 2015 23:07:21 Russell King - ARM Linux wrote:
> If you look carefully at this, you'll notice that it's utter crap.
> (Sorry, but it is.) It has two problems:
>
> 1. It'll never match a Cortex-A9 CPU. Cortex-A9 has a MIDR value of
> 0x412fc09a, not 0x00000c09. The bit position of the part number
> field isn't even right.
>
> 2. If it does match, then we branch to "v7_invalidate_l1" without setting
> the link register: we'll never return back here (we'll return to whatever
> random value the link register contains) and so we'll never make it to
> secondary_startup. *Thankfully*, because of (1), this branch will
> never be taken - this is it's saving grace.
Yes, I've understood both before.
> Your patch introduces a /third/ form of crapiness:
>
> 3. If the PSR happens to have Z=1, the "beq" instruction will be taken,
> thereby crashing the system because of (2).
Right, this was the result of sloppiness on my side when fat-fingering
a patch for illustration.
> The /simplest/ change which would fix this problem is to just change
> proc-v7.S. The remainder is effectively a cleanup removing redundant
> code.
Fair enough. I wasn't sure if we're confident enough about that
change already to put it into stable backports. If the risk is low
enough, that's fine.
Arnd
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-05-19 22:18 ` Arnd Bergmann
@ 2015-05-19 22:32 ` Russell King - ARM Linux
0 siblings, 0 replies; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-05-19 22:32 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 20, 2015 at 12:18:43AM +0200, Arnd Bergmann wrote:
> On Tuesday 19 May 2015 23:07:21 Russell King - ARM Linux wrote:
> > The /simplest/ change which would fix this problem is to just change
> > proc-v7.S. The remainder is effectively a cleanup removing redundant
> > code.
>
> Fair enough. I wasn't sure if we're confident enough about that
> change already to put it into stable backports. If the risk is low
> enough, that's fine.
When the kernel is entered from the decompressor, the last steps just
before calling the kernel is to clean and invalidate the caches, then
turn them off, and then jump to the kernel.
So, there _should_ be no dirty cache lines which we rely upon at this
point - if there was dirty data in the cache, then it hasn't been flushed
to memory, and we're potentially going to crash because some part of
the decompressed kernel image hasn't been flushed out to memory.
So I'll put a confidence value of 99.999% on this.
The case which might bite people is if they bypass the decompressor.
Even then, leaving bits of the kernel image in the caches is really
dodgy.
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner
2015-05-19 22:01 ` Florian Fainelli
@ 2015-05-20 18:54 ` Dinh Nguyen
2015-05-20 22:48 ` Sebastian Hesselbarth
` (5 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Dinh Nguyen @ 2015-05-20 18:54 UTC (permalink / raw)
To: linux-arm-kernel
On 05/19/2015 11:12 AM, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For socfpga:
Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Thanks,
Dinh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (2 preceding siblings ...)
2015-05-20 18:54 ` Dinh Nguyen
@ 2015-05-20 22:48 ` Sebastian Hesselbarth
2015-05-21 2:08 ` Shawn Guo
` (4 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Sebastian Hesselbarth @ 2015-05-20 22:48 UTC (permalink / raw)
To: linux-arm-kernel
On 19.05.2015 18:12, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
[...]
> arch/arm/mach-berlin/headsmp.S | 6 ------
> arch/arm/mach-berlin/platsmp.c | 3 +--
For berlin,
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (3 preceding siblings ...)
2015-05-20 22:48 ` Sebastian Hesselbarth
@ 2015-05-21 2:08 ` Shawn Guo
2015-05-21 8:30 ` Thierry Reding
` (3 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Shawn Guo @ 2015-05-21 2:08 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 19, 2015 at 05:12:56PM +0100, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
...
> arch/arm/mach-imx/headsmp.S | 1 -
Acked-by: Shawn Guo <shawn.guo@linaro.org>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (4 preceding siblings ...)
2015-05-21 2:08 ` Shawn Guo
@ 2015-05-21 8:30 ` Thierry Reding
2015-05-22 7:36 ` Geert Uytterhoeven
` (2 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Thierry Reding @ 2015-05-21 8:30 UTC (permalink / raw)
To: linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 2370 bytes --]
On Tue, May 19, 2015 at 05:12:56PM +0100, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
[...]
> arch/arm/mach-tegra/Makefile | 2 +-
> arch/arm/mach-tegra/headsmp.S | 12 ------------
> arch/arm/mach-tegra/reset.c | 2 +-
> arch/arm/mach-tegra/reset.h | 1 -
[...]
Russell,
I saw a couple of conflicts trying to apply this to v4.1-rc1..v4.1-rc4
or any of recent linux-next versions. I ended up applying this on top of
your for-next branch and tested using that. The conflicts seem very
minor and a test merge of Tegra's for-next branch resolved this fine. A
test merge of next-20150520 shows one conflict in mach-socfpga/core.h,
but it's a trivial one to resolve.
Anyway, the patch seems to work fine on TrimSlice (Tegra20), Beaver
(Tegra30), Dalmore (Tegra114) and Jetson TK1 (Tegra124). Tested on
Paul's boot farm:
Tested-by: Thierry Reding <treding@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Paul, thanks for setting up the boot test infrastructure, this saved me
a lot of time.
Thierry
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (5 preceding siblings ...)
2015-05-21 8:30 ` Thierry Reding
@ 2015-05-22 7:36 ` Geert Uytterhoeven
2015-06-01 10:41 ` Geert Uytterhoeven
2015-05-22 10:45 ` Michal Simek
2015-06-01 10:21 ` Wei Xu
8 siblings, 1 reply; 21+ messages in thread
From: Geert Uytterhoeven @ 2015-05-22 7:36 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 19, 2015 at 6:12 PM, Russell King
<rmk+kernel@arm.linux.org.uk> wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For various shmobile:
- sh73a0/kzm9g
- r8a7740/armadillo
- r8a73a4/ape6evm
- r8a7791/koelsch
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (6 preceding siblings ...)
2015-05-22 7:36 ` Geert Uytterhoeven
@ 2015-05-22 10:45 ` Michal Simek
2015-06-01 10:21 ` Wei Xu
8 siblings, 0 replies; 21+ messages in thread
From: Michal Simek @ 2015-05-22 10:45 UTC (permalink / raw)
To: linux-arm-kernel
On 05/19/2015 06:12 PM, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
...
> arch/arm/mach-zynq/common.h | 2 --
> arch/arm/mach-zynq/headsmp.S | 5 -----
> arch/arm/mach-zynq/platsmp.c | 5 ++---
For Zynq:
Tested-by: Michal Simek <michal.simek@xilinx.com>
Thanks,
Michal
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
` (7 preceding siblings ...)
2015-05-22 10:45 ` Michal Simek
@ 2015-06-01 10:21 ` Wei Xu
8 siblings, 0 replies; 21+ messages in thread
From: Wei Xu @ 2015-06-01 10:21 UTC (permalink / raw)
To: linux-arm-kernel
On 5/19/2015 5:12 PM, Russell King wrote:
> All ARMv5 and older CPUs invalidate their caches in the early assembly
> setup function, prior to enabling the MMU. This is because the L1
> cache should not contain any data relevant to the execution of the
> kernel at this point; all data should have been flushed out to memory.
>
> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
> these typically do not search their caches when caching is disabled (as
> it needs to be when the MMU is disabled) so this change should be safe.
>
> ARMv7 allows there to be CPUs which search their caches while caching is
> disabled, and it's permitted that the cache is uninitialised at boot;
> for these, the architecture reference manual requires that an
> implementation specific code sequence is used immediately after reset
> to ensure that the cache is placed into a sane state. Such
> functionality is definitely outside the remit of the Linux kernel, and
> must be done by the SoC's firmware before _any_ CPU gets to the Linux
> kernel.
>
> Changing the data cache clean+invalidate to a mere invalidate allows us
> to get rid of a lot of platform specific hacks around this issue for
> their secondary CPU bringup paths - some of which were buggy.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
Hi Russell,
[...]
> arch/arm/mach-hisi/Makefile | 2 +-
> arch/arm/mach-hisi/core.h | 1 -
> arch/arm/mach-hisi/headsmp.S | 16 ----------------
> arch/arm/mach-hisi/platsmp.c | 4 ++--
[...]
Sorry for the late reply.
For Hisilicon:
- hip01/ca9x2
- hix5hd2/dkb
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Thanks!
Best Regards,
Wei
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-05-22 7:36 ` Geert Uytterhoeven
@ 2015-06-01 10:41 ` Geert Uytterhoeven
2015-06-01 10:53 ` Russell King - ARM Linux
0 siblings, 1 reply; 21+ messages in thread
From: Geert Uytterhoeven @ 2015-06-01 10:41 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 22, 2015 at 9:36 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Tue, May 19, 2015 at 6:12 PM, Russell King
> <rmk+kernel@arm.linux.org.uk> wrote:
>> All ARMv5 and older CPUs invalidate their caches in the early assembly
>> setup function, prior to enabling the MMU. This is because the L1
>> cache should not contain any data relevant to the execution of the
>> kernel at this point; all data should have been flushed out to memory.
>>
>> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed,
>> these typically do not search their caches when caching is disabled (as
>> it needs to be when the MMU is disabled) so this change should be safe.
>>
>> ARMv7 allows there to be CPUs which search their caches while caching is
>> disabled, and it's permitted that the cache is uninitialised at boot;
>> for these, the architecture reference manual requires that an
>> implementation specific code sequence is used immediately after reset
>> to ensure that the cache is placed into a sane state. Such
>> functionality is definitely outside the remit of the Linux kernel, and
>> must be done by the SoC's firmware before _any_ CPU gets to the Linux
>> kernel.
>>
>> Changing the data cache clean+invalidate to a mere invalidate allows us
>> to get rid of a lot of platform specific hacks around this issue for
>> their secondary CPU bringup paths - some of which were buggy.
>>
>> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> For various shmobile:
> - sh73a0/kzm9g
> - r8a7740/armadillo
> - r8a73a4/ape6evm
> - r8a7791/koelsch
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
FWIW, I have the feeling this has a slight influence on boot reliability on
two of my boards:
- r8a7740/armadillo, which is known to suffer from a cache-related bug in
its bootloader, seems to have a higher change of booting successfully on
cold boot,
- sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
seems to have a lower chance of booting successfully.
No time to spend all week turning this into a statistical significant test
project... The reset button is my friend...
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-01 10:41 ` Geert Uytterhoeven
@ 2015-06-01 10:53 ` Russell King - ARM Linux
2015-06-01 11:50 ` Geert Uytterhoeven
0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-06-01 10:53 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
> FWIW, I have the feeling this has a slight influence on boot reliability on
> two of my boards:
> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
> its bootloader, seems to have a higher change of booting successfully on
> cold boot,
> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
> seems to have a lower chance of booting successfully.
>
> No time to spend all week turning this into a statistical significant test
> project... The reset button is my friend...
Damn it, you sent this right after I merged and pushed out this change in
my for-arm-soc branch, and was just about to send it to the arm-soc people.
What excellent timing you have. :)
What happens on the kzm9g if you revert the mach-shmobile changes?
For armadillo, do you use the decompressor? That should be doing all the
cache cleaning already, prior to the kernel being entered.
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-01 10:53 ` Russell King - ARM Linux
@ 2015-06-01 11:50 ` Geert Uytterhoeven
2015-06-17 20:35 ` Dinh Nguyen
0 siblings, 1 reply; 21+ messages in thread
From: Geert Uytterhoeven @ 2015-06-01 11:50 UTC (permalink / raw)
To: linux-arm-kernel
Hi Russell,
On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>> FWIW, I have the feeling this has a slight influence on boot reliability on
>> two of my boards:
>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>> its bootloader, seems to have a higher change of booting successfully on
>> cold boot,
>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>> seems to have a lower chance of booting successfully.
>>
>> No time to spend all week turning this into a statistical significant test
>> project... The reset button is my friend...
>
> Damn it, you sent this right after I merged and pushed out this change in
> my for-arm-soc branch, and was just about to send it to the arm-soc people.
> What excellent timing you have. :)
Don't worry, I didn't send that email to make you postpone this change.
Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
of the boot loader, and these are old SoCs, please go ahead.
> What happens on the kzm9g if you revert the mach-shmobile changes?
Seems to make no difference.
> For armadillo, do you use the decompressor? That should be doing all the
> cache cleaning already, prior to the kernel being entered.
I think so.
Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
machine ID", to booting almost completely, but lacking a few devices due to
a corrupted DTB. Been like that as long as I remember, i.e. since I got the
board ca. 1 year ago. Boots fine (100%) with kexec.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-01 11:50 ` Geert Uytterhoeven
@ 2015-06-17 20:35 ` Dinh Nguyen
2015-06-17 21:30 ` Russell King - ARM Linux
0 siblings, 1 reply; 21+ messages in thread
From: Dinh Nguyen @ 2015-06-17 20:35 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Russell,
>
> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>>> FWIW, I have the feeling this has a slight influence on boot reliability on
>>> two of my boards:
>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>>> its bootloader, seems to have a higher change of booting successfully on
>>> cold boot,
>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>>> seems to have a lower chance of booting successfully.
>>>
>>> No time to spend all week turning this into a statistical significant test
>>> project... The reset button is my friend...
>>
>> Damn it, you sent this right after I merged and pushed out this change in
>> my for-arm-soc branch, and was just about to send it to the arm-soc people.
>> What excellent timing you have. :)
>
> Don't worry, I didn't send that email to make you postpone this change.
> Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
> of the boot loader, and these are old SoCs, please go ahead.
>
>> What happens on the kzm9g if you revert the mach-shmobile changes?
>
> Seems to make no difference.
>
>> For armadillo, do you use the decompressor? That should be doing all the
>> cache cleaning already, prior to the kernel being entered.
>
> I think so.
>
> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
> machine ID", to booting almost completely, but lacking a few devices due to
> a corrupted DTB. Been like that as long as I remember, i.e. since I got the
> board ca. 1 year ago. Boots fine (100%) with kexec.
>
It seems like this patch is causing the SoCFPGA to not boot with SMP
reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
below. The error seems to only happen when I do a cold or warm reboot,
but never occurs during a power-up. If I revert this patch, or put
back the call to v7_invalidate_l1 in socfpga_secondary_startup , then
its able to boot 100% of the time.
Just wondering if anyone else is seeing something similar?
I am testing this on both linux-next and arm-soc/rmk/for-arm-soc. When
the failure happens, here's the log:
Booting Linux on physical CPU 0x0
Initializing cgroup subsys cpuset
Linux version 4.1.0-rc8-next-20150617-00002-gdd1f624
(dinguyen@linux-builds1) (gcc version 4.7.3 20130226 (prerelease)
(crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03)
) #1 SMP Wed Jun 17 14:22:59 CDT 2015
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr\x10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: Altera SOCFPGA Cyclone V SoC Development Kit
Truncating RAM at 0x00000000-0x40000000 to -0x2f800000
Memory policy: Data cache writealloc
On node 0 totalpages: 194560
free_area_init_node: node 0, pgdat c0692640, node_mem_map ef20b000
Normal zone: 1520 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 194560 pages, LIFO batch:31
PERCPU: Embedded 12 pages/cpu @ef1e1000 s19648 r8192 d21312 u49152
pcpu-alloc: s19648 r8192 d21312 u49152 alloc\x12*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 193040
Kernel command line: console=ttyS0,115200 root=/dev/mmcblk0p2 rw
rootwait ip=dhcp earlyprintk
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 764288K/778240K available (4782K kernel code, 286K rwdata,
1344K rodata, 304K init, 135K bss, 13952K reserved, 0K cma-reserved)
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
vmalloc : 0xf0000000 - 0xff000000 ( 240 MB)
lowmem : 0xc0000000 - 0xef800000 ( 760 MB)
modules : 0xbf000000 - 0xc0000000 ( 16 MB)
.text : 0xc0008000 - 0xc0603e78 (6128 kB)
.init : 0xc0604000 - 0xc0650000 ( 304 kB)
.data : 0xc0650000 - 0xc0697920 ( 287 kB)
.bss : 0xc0697920 - 0xc06b976c ( 136 kB)
SLUB: HWalignd, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Hierarchical RCU implementation.
Additional per-CPU info printed with stalls.
Build-time adjustment of leaf fanout to 32.
NR_IRQS:16 nr_irqs:16 16
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x46060001
clocksource: timer1: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 19112604467 ns
sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 21474836475ns
Console: colour dummy device 80x30
Calibrating delay loop... 1836.64 BogoMIPS (lpj‘83232)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x8280 - 0x82d8
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted
4.1.0-rc8-next-20150617-00002-gdd1f624 #1
Hardware name: Altera SOCFPGA
task: eecaeac0 ti: eecce000 task.ti: eecce000
PC is at vfp_notifier+0x58/0x12c
LR is at notifier_call_chain+0x44/0x84
pc : [<c000a6bc>] lr : [<c003d134>] psr: 80000193
sp : eeccff48 ip : c06563c8 fp : eeccffd4
r10: eecaef80 r9 : ef1f1300 r8 : 00000002
r7 : eecd0000 r6 : c0656bc0 r5 : 00000000 r4 : eecd0000
r3 : c000a664 r2 : eecd0000 r1 : 00000002 r0 : c06563c8
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5387d Table: 0000404a DAC: 00000015
Process swapper/1 (pid: 0, stack limit = 0xeecce218)
Stack: (0xeeccff48 to 0xeecd0000)
ff40: c000a664 ffffffff 00000000 c003d134 eecd0018 eecaeac0
ff60: c06648e0 0b52d2f9 c048cfa8 c003d18c 00000000 f0002100 00000001 c003d1ac
ff80: 00000000 eecaeac0 c064f300 c001369c c064b304 c0013140 00000000 ef1ed328
ffa0: eeccffe8 c001e760 c0486ec4 2eba2000 c06957c0 c06524dc 00000015 c06957c0
ffc0: c048c778 c064b304 c06957c0 00000000 eeccffdc c0486ec4 eeccffe4 c0487138
ffe0: 00000001 c00544e8 c0009494 c0697bc0 00000000 000094ac 7ef5bffd 3f39b3f8
[<c000a6bc>] (vfp_notifier) from [<c003d134>] (notifier_call_chain+0x44/0x84)
[<c003d134>] (notifier_call_chain) from [<c003d18c>]
(__atomic_notifier_call_chain+0x18/0x20)
[<c003d18c>] (__atomic_notifier_call_chain) from [<c003d1ac>]
(atomic_notifier_call_chain+0x18/0x20)
[<c003d1ac>] (atomic_notifier_call_chain) from [<c001369c>]
(__switch_to+0x34/0x58)
Code: e3a03002 e5843208 e3a00000 e8bd8038 (eef85a10)
---[ end trace 9eaea9661b3b550a ]---
Kernel panic - not syncing: Attempted to kill the idle task!
SMP: failed to stop secondary CPUs
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
Dinh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-17 20:35 ` Dinh Nguyen
@ 2015-06-17 21:30 ` Russell King - ARM Linux
2015-06-17 22:12 ` Dinh Nguyen
0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-06-17 21:30 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote:
> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > Hi Russell,
> >
> > On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> >> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
> >>> FWIW, I have the feeling this has a slight influence on boot reliability on
> >>> two of my boards:
> >>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
> >>> its bootloader, seems to have a higher change of booting successfully on
> >>> cold boot,
> >>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
> >>> seems to have a lower chance of booting successfully.
> >>>
> >>> No time to spend all week turning this into a statistical significant test
> >>> project... The reset button is my friend...
> >>
> >> Damn it, you sent this right after I merged and pushed out this change in
> >> my for-arm-soc branch, and was just about to send it to the arm-soc people.
> >> What excellent timing you have. :)
> >
> > Don't worry, I didn't send that email to make you postpone this change.
> > Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
> > of the boot loader, and these are old SoCs, please go ahead.
> >
> >> What happens on the kzm9g if you revert the mach-shmobile changes?
> >
> > Seems to make no difference.
> >
> >> For armadillo, do you use the decompressor? That should be doing all the
> >> cache cleaning already, prior to the kernel being entered.
> >
> > I think so.
> >
> > Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
> > machine ID", to booting almost completely, but lacking a few devices due to
> > a corrupted DTB. Been like that as long as I remember, i.e. since I got the
> > board ca. 1 year ago. Boots fine (100%) with kexec.
> >
>
> It seems like this patch is causing the SoCFPGA to not boot with SMP
> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
> below. The error seems to only happen when I do a cold or warm reboot,
> but never occurs during a power-up. If I revert this patch, or put
> back the call to v7_invalidate_l1 in socfpga_secondary_startup , then
> its able to boot 100% of the time.
It really sucks that you're only just testing this change now, because
I've frozen my tree, and removing it for the next merge window is going
to be an entirely non-trivial matter. You were copied on the original
patch, which you failed to test... I can't say I have _much_ sympathy
for a bug report at this point in time.
> Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
> Modules linked in:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 4.1.0-rc8-next-20150617-00002-gdd1f624 #1
> Hardware name: Altera SOCFPGA
> task: eecaeac0 ti: eecce000 task.ti: eecce000
> PC is at vfp_notifier+0x58/0x12c
> LR is at notifier_call_chain+0x44/0x84
This suggests that access to the VFP coprocessor is still disabled.
However, vfp_hotplug() should have been called for CPU1 before it
gets here, which should call vfp_enable(), which should enable access.
However, what I'm wondering is...
> [<c000a6bc>] (vfp_notifier) from [<c003d134>] (notifier_call_chain+0x44/0x84)
> [<c003d134>] (notifier_call_chain) from [<c003d18c>]
> (__atomic_notifier_call_chain+0x18/0x20)
> [<c003d18c>] (__atomic_notifier_call_chain) from [<c003d1ac>]
> (atomic_notifier_call_chain+0x18/0x20)
> [<c003d1ac>] (atomic_notifier_call_chain) from [<c001369c>]
> (__switch_to+0x34/0x58)
what the rest of the trace is. Unfortunately, we mark __switch_to() as
"cantunwind" which means the unwinder always stops here. It would be
really good to know what is responsible for this scheduling event,
whether it's due to a lock which is tried to be taken but is found to
be locked, but I don't think we can modify __switch_to() to allow it
to unwind (and I don't have the unwinder knowledge to hand to hack
something together.)
In order to see what's going on here, we do need to see the rest of the
trace... right now I don't have the time to be able to sort out
__switch_to() to achieve that.
As I say, you should have tested this earlier. About the only thing I
can do now is to revert the entire original patch, which is going to be
extremely disruptive as it'll cause yet more conflicts between trees -
again, something that we want to be avoiding at this stage in the game.
Please test patches earlier.
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-17 21:30 ` Russell King - ARM Linux
@ 2015-06-17 22:12 ` Dinh Nguyen
2015-06-17 22:31 ` Dinh Nguyen
0 siblings, 1 reply; 21+ messages in thread
From: Dinh Nguyen @ 2015-06-17 22:12 UTC (permalink / raw)
To: linux-arm-kernel
On 06/17/2015 04:30 PM, Russell King - ARM Linux wrote:
> On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote:
>> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>> Hi Russell,
>>>
>>> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
>>>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>>>>> FWIW, I have the feeling this has a slight influence on boot reliability on
>>>>> two of my boards:
>>>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>>>>> its bootloader, seems to have a higher change of booting successfully on
>>>>> cold boot,
>>>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>>>>> seems to have a lower chance of booting successfully.
>>>>>
>>>>> No time to spend all week turning this into a statistical significant test
>>>>> project... The reset button is my friend...
>>>>
>>>> Damn it, you sent this right after I merged and pushed out this change in
>>>> my for-arm-soc branch, and was just about to send it to the arm-soc people.
>>>> What excellent timing you have. :)
>>>
>>> Don't worry, I didn't send that email to make you postpone this change.
>>> Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
>>> of the boot loader, and these are old SoCs, please go ahead.
>>>
>>>> What happens on the kzm9g if you revert the mach-shmobile changes?
>>>
>>> Seems to make no difference.
>>>
>>>> For armadillo, do you use the decompressor? That should be doing all the
>>>> cache cleaning already, prior to the kernel being entered.
>>>
>>> I think so.
>>>
>>> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
>>> machine ID", to booting almost completely, but lacking a few devices due to
>>> a corrupted DTB. Been like that as long as I remember, i.e. since I got the
>>> board ca. 1 year ago. Boots fine (100%) with kexec.
>>>
>>
>> It seems like this patch is causing the SoCFPGA to not boot with SMP
>> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
>> below. The error seems to only happen when I do a cold or warm reboot,
>> but never occurs during a power-up. If I revert this patch, or put
>> back the call to v7_invalidate_l1 in socfpga_secondary_startup , then
>> its able to boot 100% of the time.
>
> It really sucks that you're only just testing this change now, because
> I've frozen my tree, and removing it for the next merge window is going
> to be an entirely non-trivial matter. You were copied on the original
> patch, which you failed to test... I can't say I have _much_ sympathy
> for a bug report at this point in time.
>
I apologize for not catching this error while testing this patch. But I
did test it when you first sent it out..I probably didn't do a stress
test. Sometimes the reboot fails in the 1st attempt, sometimes it fails
in the 9th attempt.
I only caught this error when I was testing my recent changes to use
CPU_METHOD_OF_DECLARE.
For me, I don't think you need to revert this patch or anything, but a
fix can go in for a -rcX?
Dinh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-17 22:12 ` Dinh Nguyen
@ 2015-06-17 22:31 ` Dinh Nguyen
2015-06-17 22:51 ` Russell King - ARM Linux
0 siblings, 1 reply; 21+ messages in thread
From: Dinh Nguyen @ 2015-06-17 22:31 UTC (permalink / raw)
To: linux-arm-kernel
On 06/17/2015 05:12 PM, Dinh Nguyen wrote:
> On 06/17/2015 04:30 PM, Russell King - ARM Linux wrote:
>> On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote:
>>> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>> Hi Russell,
>>>>
>>>> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
>>>> <linux@arm.linux.org.uk> wrote:
>>>>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>>>>>> FWIW, I have the feeling this has a slight influence on boot reliability on
>>>>>> two of my boards:
>>>>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>>>>>> its bootloader, seems to have a higher change of booting successfully on
>>>>>> cold boot,
>>>>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>>>>>> seems to have a lower chance of booting successfully.
>>>>>>
>>>>>> No time to spend all week turning this into a statistical significant test
>>>>>> project... The reset button is my friend...
>>>>>
>>>>> Damn it, you sent this right after I merged and pushed out this change in
>>>>> my for-arm-soc branch, and was just about to send it to the arm-soc people.
>>>>> What excellent timing you have. :)
>>>>
>>>> Don't worry, I didn't send that email to make you postpone this change.
>>>> Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
>>>> of the boot loader, and these are old SoCs, please go ahead.
>>>>
>>>>> What happens on the kzm9g if you revert the mach-shmobile changes?
>>>>
>>>> Seems to make no difference.
>>>>
>>>>> For armadillo, do you use the decompressor? That should be doing all the
>>>>> cache cleaning already, prior to the kernel being entered.
>>>>
>>>> I think so.
>>>>
>>>> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
>>>> machine ID", to booting almost completely, but lacking a few devices due to
>>>> a corrupted DTB. Been like that as long as I remember, i.e. since I got the
>>>> board ca. 1 year ago. Boots fine (100%) with kexec.
>>>>
>>>
>>> It seems like this patch is causing the SoCFPGA to not boot with SMP
>>> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
>>> below. The error seems to only happen when I do a cold or warm reboot,
>>> but never occurs during a power-up. If I revert this patch, or put
>>> back the call to v7_invalidate_l1 in socfpga_secondary_startup , then
>>> its able to boot 100% of the time.
>>
>> It really sucks that you're only just testing this change now, because
>> I've frozen my tree, and removing it for the next merge window is going
>> to be an entirely non-trivial matter. You were copied on the original
>> patch, which you failed to test... I can't say I have _much_ sympathy
>> for a bug report at this point in time.
>>
>
> I apologize for not catching this error while testing this patch. But I
> did test it when you first sent it out..I probably didn't do a stress
> test. Sometimes the reboot fails in the 1st attempt, sometimes it fails
> in the 9th attempt.
>
> I only caught this error when I was testing my recent changes to use
> CPU_METHOD_OF_DECLARE.
>
> For me, I don't think you need to revert this patch or anything, but a
> fix can go in for a -rcX?
>
Also, I am not seeing the error on the SoCFPGA Arria 10 platform at all.
This Arria10 platform is running a different version of bootloader than
the Cyclone5. Although, I also did test with the latest version of
U-Boot on the Cyclone5.
Dinh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache
2015-06-17 22:31 ` Dinh Nguyen
@ 2015-06-17 22:51 ` Russell King - ARM Linux
0 siblings, 0 replies; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-06-17 22:51 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 17, 2015 at 05:31:30PM -0500, Dinh Nguyen wrote:
> Also, I am not seeing the error on the SoCFPGA Arria 10 platform at all.
> This Arria10 platform is running a different version of bootloader than
> the Cyclone5. Although, I also did test with the latest version of
> U-Boot on the Cyclone5.
It _shouldn't_ cause any issue, unless secondary_startup is called with
caches already enabled (which it shouldn't.)
If caches are enabled, that's incredibly dodgy even with the old code -
cache lines could be migrated to the incoming CPU which v7_invalidate_l1()
would then invalidate, resulting in the loss of data in those migrated
cache lines.
If caches are not enabled, this change should have no effect: caches
should not be searched if the C bit is clear, and we still end up
invalidating the local L1 cache later in the normal boot sequence.
(If caches are searched with the C bit clear, and the caches initialise
in an unpredictable state, the ARM ARM requires an implementation specific
routine to be executed immediately on CPU startup. The idea is that if
this option were to be chosen, the vendor must guarantee a certain code
sequence can't be disrupted by the uninitialised state in the caches,
which, if you're searching the caches for instruction fetches, is pretty
much impossible to guarantee no matter what kind of code sequence you
specify: even the first instruction fetched from the CPU reset vector
could ultimately be sourced from the instruction cache if the i-cache
reports that it thinks it has valid data.)
--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2015-06-17 22:51 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner
2015-05-19 21:55 ` Arnd Bergmann
2015-05-19 22:07 ` Russell King - ARM Linux
2015-05-19 22:18 ` Arnd Bergmann
2015-05-19 22:32 ` Russell King - ARM Linux
2015-05-19 22:01 ` Florian Fainelli
2015-05-20 18:54 ` Dinh Nguyen
2015-05-20 22:48 ` Sebastian Hesselbarth
2015-05-21 2:08 ` Shawn Guo
2015-05-21 8:30 ` Thierry Reding
2015-05-22 7:36 ` Geert Uytterhoeven
2015-06-01 10:41 ` Geert Uytterhoeven
2015-06-01 10:53 ` Russell King - ARM Linux
2015-06-01 11:50 ` Geert Uytterhoeven
2015-06-17 20:35 ` Dinh Nguyen
2015-06-17 21:30 ` Russell King - ARM Linux
2015-06-17 22:12 ` Dinh Nguyen
2015-06-17 22:31 ` Dinh Nguyen
2015-06-17 22:51 ` Russell King - ARM Linux
2015-05-22 10:45 ` Michal Simek
2015-06-01 10:21 ` Wei Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).