* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> @ 2015-05-19 21:44 ` Heiko Stuebner 2015-05-19 21:55 ` Arnd Bergmann 2015-05-19 22:01 ` Florian Fainelli ` (7 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Heiko Stuebner @ 2015-05-19 21:44 UTC (permalink / raw) To: linux-arm-kernel Hi Russell, Am Dienstag, 19. Mai 2015, 17:12:56 schrieb Russell King: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0] > Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de> > > Tested on Radxa Rock Pro with RK3188. > The kernel panics on reboot I had before and also a kernel BUG when running > "memtester 1900M" went away and the rock seems to run stable now. I've also tested the patch on a rk3288-based board, so both supported Rockchip ARM-cores (A9 and A12/A17) seem to run fine with this change. Tested-by: Heiko Stuebner <heiko@sntech.de> Rockchip-specific changes Reviewed-by: Heiko Stuebner <heiko@sntech.de> Thanks Heiko [0] http://lists.infradead.org/pipermail/linux-rockchip/2015-May/003046.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner @ 2015-05-19 21:55 ` Arnd Bergmann 2015-05-19 22:07 ` Russell King - ARM Linux 0 siblings, 1 reply; 21+ messages in thread From: Arnd Bergmann @ 2015-05-19 21:55 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 19 May 2015 23:44:58 Heiko Stuebner wrote: > > Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0] > > Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de> > > > > Tested on Radxa Rock Pro with RK3188. > > The kernel panics on reboot I had before and also a kernel BUG when running > > "memtester 1900M" went away and the rock seems to run stable now. > We should probably create a separate fix for that and add it to the stable kernels then. I would suggest something like the untested patch below, which takes advantage of the fact that we already have separate assignments for the start_secondary function pointer for A9 and A17. Arnd diff --git a/arch/arm/mach-rockchip/headsmp.S b/arch/arm/mach-rockchip/headsmp.S index 46c22dedf632..ae0077e8fe98 100644 --- a/arch/arm/mach-rockchip/headsmp.S +++ b/arch/arm/mach-rockchip/headsmp.S @@ -16,9 +16,6 @@ #include <linux/init.h> ENTRY(rockchip_secondary_startup) - mrc p15, 0, r0, c0, c0, 0 @ read main ID register - ldr r1, =0x00000c09 @ Cortex-A9 primary part number - teq r0, r1 beq v7_invalidate_l1 b secondary_startup ENDPROC(rockchip_secondary_startup) diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 5b4ca3c3c879..5f46724cca2f 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -149,8 +149,7 @@ static int __cpuinit rockchip_boot_secondary(unsigned int cpu, * sram_base_addr + 8: start address for pc * */ udelay(10); - writel(virt_to_phys(rockchip_secondary_startup), - sram_base_addr + 8); + writel(virt_to_phys(secondary_startup), sram_base_addr + 8); writel(0xDEADBEAF, sram_base_addr + 4); dsb_sev(); } ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-05-19 21:55 ` Arnd Bergmann @ 2015-05-19 22:07 ` Russell King - ARM Linux 2015-05-19 22:18 ` Arnd Bergmann 0 siblings, 1 reply; 21+ messages in thread From: Russell King - ARM Linux @ 2015-05-19 22:07 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 19, 2015 at 11:55:12PM +0200, Arnd Bergmann wrote: > On Tuesday 19 May 2015 23:44:58 Heiko Stuebner wrote: > > > > Michael Niewoehner tested this on a rk3188 (Cortex-A9) and wrote in [0] > > > Tested-by: Michael Niewoehner <mniewoeh@stud.hs-offenburg.de> > > > > > > Tested on Radxa Rock Pro with RK3188. > > > The kernel panics on reboot I had before and also a kernel BUG when running > > > "memtester 1900M" went away and the rock seems to run stable now. > > > > We should probably create a separate fix for that and add it to the stable > kernels then. I would suggest something like the untested patch below, > which takes advantage of the fact that we already have separate assignments > for the start_secondary function pointer for A9 and A17. Your patch is also continuing the pattern with this code... > diff --git a/arch/arm/mach-rockchip/headsmp.S b/arch/arm/mach-rockchip/headsmp.S > index 46c22dedf632..ae0077e8fe98 100644 > --- a/arch/arm/mach-rockchip/headsmp.S > +++ b/arch/arm/mach-rockchip/headsmp.S > @@ -16,9 +16,6 @@ > #include <linux/init.h> > > ENTRY(rockchip_secondary_startup) > - mrc p15, 0, r0, c0, c0, 0 @ read main ID register > - ldr r1, =0x00000c09 @ Cortex-A9 primary part number > - teq r0, r1 > beq v7_invalidate_l1 > b secondary_startup If you look carefully at this, you'll notice that it's utter crap. (Sorry, but it is.) It has two problems: 1. It'll never match a Cortex-A9 CPU. Cortex-A9 has a MIDR value of 0x412fc09a, not 0x00000c09. The bit position of the part number field isn't even right. 2. If it does match, then we branch to "v7_invalidate_l1" without setting the link register: we'll never return back here (we'll return to whatever random value the link register contains) and so we'll never make it to secondary_startup. *Thankfully*, because of (1), this branch will never be taken - this is it's saving grace. Your patch introduces a /third/ form of crapiness: 3. If the PSR happens to have Z=1, the "beq" instruction will be taken, thereby crashing the system because of (2). The /simplest/ change which would fix this problem is to just change proc-v7.S. The remainder is effectively a cleanup removing redundant code. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-05-19 22:07 ` Russell King - ARM Linux @ 2015-05-19 22:18 ` Arnd Bergmann 2015-05-19 22:32 ` Russell King - ARM Linux 0 siblings, 1 reply; 21+ messages in thread From: Arnd Bergmann @ 2015-05-19 22:18 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 19 May 2015 23:07:21 Russell King - ARM Linux wrote: > If you look carefully at this, you'll notice that it's utter crap. > (Sorry, but it is.) It has two problems: > > 1. It'll never match a Cortex-A9 CPU. Cortex-A9 has a MIDR value of > 0x412fc09a, not 0x00000c09. The bit position of the part number > field isn't even right. > > 2. If it does match, then we branch to "v7_invalidate_l1" without setting > the link register: we'll never return back here (we'll return to whatever > random value the link register contains) and so we'll never make it to > secondary_startup. *Thankfully*, because of (1), this branch will > never be taken - this is it's saving grace. Yes, I've understood both before. > Your patch introduces a /third/ form of crapiness: > > 3. If the PSR happens to have Z=1, the "beq" instruction will be taken, > thereby crashing the system because of (2). Right, this was the result of sloppiness on my side when fat-fingering a patch for illustration. > The /simplest/ change which would fix this problem is to just change > proc-v7.S. The remainder is effectively a cleanup removing redundant > code. Fair enough. I wasn't sure if we're confident enough about that change already to put it into stable backports. If the risk is low enough, that's fine. Arnd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-05-19 22:18 ` Arnd Bergmann @ 2015-05-19 22:32 ` Russell King - ARM Linux 0 siblings, 0 replies; 21+ messages in thread From: Russell King - ARM Linux @ 2015-05-19 22:32 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 20, 2015 at 12:18:43AM +0200, Arnd Bergmann wrote: > On Tuesday 19 May 2015 23:07:21 Russell King - ARM Linux wrote: > > The /simplest/ change which would fix this problem is to just change > > proc-v7.S. The remainder is effectively a cleanup removing redundant > > code. > > Fair enough. I wasn't sure if we're confident enough about that > change already to put it into stable backports. If the risk is low > enough, that's fine. When the kernel is entered from the decompressor, the last steps just before calling the kernel is to clean and invalidate the caches, then turn them off, and then jump to the kernel. So, there _should_ be no dirty cache lines which we rely upon at this point - if there was dirty data in the cache, then it hasn't been flushed to memory, and we're potentially going to crash because some part of the decompressed kernel image hasn't been flushed out to memory. So I'll put a confidence value of 99.999% on this. The case which might bite people is if they bypass the decompressor. Even then, leaving bits of the kernel image in the caches is really dodgy. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> 2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner @ 2015-05-19 22:01 ` Florian Fainelli 2015-05-20 18:54 ` Dinh Nguyen ` (6 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Florian Fainelli @ 2015-05-19 22:01 UTC (permalink / raw) To: linux-arm-kernel On 19/05/15 09:12, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. For brcmstb: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> -- Florian ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> 2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner 2015-05-19 22:01 ` Florian Fainelli @ 2015-05-20 18:54 ` Dinh Nguyen 2015-05-20 22:48 ` Sebastian Hesselbarth ` (5 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Dinh Nguyen @ 2015-05-20 18:54 UTC (permalink / raw) To: linux-arm-kernel On 05/19/2015 11:12 AM, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> For socfpga: Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com> Thanks, Dinh ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (2 preceding siblings ...) 2015-05-20 18:54 ` Dinh Nguyen @ 2015-05-20 22:48 ` Sebastian Hesselbarth 2015-05-21 2:08 ` Shawn Guo ` (4 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Sebastian Hesselbarth @ 2015-05-20 22:48 UTC (permalink / raw) To: linux-arm-kernel On 19.05.2015 18:12, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- [...] > arch/arm/mach-berlin/headsmp.S | 6 ------ > arch/arm/mach-berlin/platsmp.c | 3 +-- For berlin, Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Thanks! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (3 preceding siblings ...) 2015-05-20 22:48 ` Sebastian Hesselbarth @ 2015-05-21 2:08 ` Shawn Guo 2015-05-21 8:30 ` Thierry Reding ` (3 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Shawn Guo @ 2015-05-21 2:08 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 19, 2015 at 05:12:56PM +0100, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- ... > arch/arm/mach-imx/headsmp.S | 1 - Acked-by: Shawn Guo <shawn.guo@linaro.org> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (4 preceding siblings ...) 2015-05-21 2:08 ` Shawn Guo @ 2015-05-21 8:30 ` Thierry Reding 2015-05-22 7:36 ` Geert Uytterhoeven ` (2 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Thierry Reding @ 2015-05-21 8:30 UTC (permalink / raw) To: linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 2370 bytes --] On Tue, May 19, 2015 at 05:12:56PM +0100, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- [...] > arch/arm/mach-tegra/Makefile | 2 +- > arch/arm/mach-tegra/headsmp.S | 12 ------------ > arch/arm/mach-tegra/reset.c | 2 +- > arch/arm/mach-tegra/reset.h | 1 - [...] Russell, I saw a couple of conflicts trying to apply this to v4.1-rc1..v4.1-rc4 or any of recent linux-next versions. I ended up applying this on top of your for-next branch and tested using that. The conflicts seem very minor and a test merge of Tegra's for-next branch resolved this fine. A test merge of next-20150520 shows one conflict in mach-socfpga/core.h, but it's a trivial one to resolve. Anyway, the patch seems to work fine on TrimSlice (Tegra20), Beaver (Tegra30), Dalmore (Tegra114) and Jetson TK1 (Tegra124). Tested on Paul's boot farm: Tested-by: Thierry Reding <treding@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Paul, thanks for setting up the boot test infrastructure, this saved me a lot of time. Thierry [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (5 preceding siblings ...) 2015-05-21 8:30 ` Thierry Reding @ 2015-05-22 7:36 ` Geert Uytterhoeven 2015-06-01 10:41 ` Geert Uytterhoeven 2015-05-22 10:45 ` Michal Simek 2015-06-01 10:21 ` Wei Xu 8 siblings, 1 reply; 21+ messages in thread From: Geert Uytterhoeven @ 2015-05-22 7:36 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 19, 2015 at 6:12 PM, Russell King <rmk+kernel@arm.linux.org.uk> wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> For various shmobile: - sh73a0/kzm9g - r8a7740/armadillo - r8a73a4/ape6evm - r8a7791/koelsch Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-05-22 7:36 ` Geert Uytterhoeven @ 2015-06-01 10:41 ` Geert Uytterhoeven 2015-06-01 10:53 ` Russell King - ARM Linux 0 siblings, 1 reply; 21+ messages in thread From: Geert Uytterhoeven @ 2015-06-01 10:41 UTC (permalink / raw) To: linux-arm-kernel On Fri, May 22, 2015 at 9:36 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Tue, May 19, 2015 at 6:12 PM, Russell King > <rmk+kernel@arm.linux.org.uk> wrote: >> All ARMv5 and older CPUs invalidate their caches in the early assembly >> setup function, prior to enabling the MMU. This is because the L1 >> cache should not contain any data relevant to the execution of the >> kernel at this point; all data should have been flushed out to memory. >> >> This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, >> these typically do not search their caches when caching is disabled (as >> it needs to be when the MMU is disabled) so this change should be safe. >> >> ARMv7 allows there to be CPUs which search their caches while caching is >> disabled, and it's permitted that the cache is uninitialised at boot; >> for these, the architecture reference manual requires that an >> implementation specific code sequence is used immediately after reset >> to ensure that the cache is placed into a sane state. Such >> functionality is definitely outside the remit of the Linux kernel, and >> must be done by the SoC's firmware before _any_ CPU gets to the Linux >> kernel. >> >> Changing the data cache clean+invalidate to a mere invalidate allows us >> to get rid of a lot of platform specific hacks around this issue for >> their secondary CPU bringup paths - some of which were buggy. >> >> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > > For various shmobile: > - sh73a0/kzm9g > - r8a7740/armadillo > - r8a73a4/ape6evm > - r8a7791/koelsch > > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> FWIW, I have the feeling this has a slight influence on boot reliability on two of my boards: - r8a7740/armadillo, which is known to suffer from a cache-related bug in its bootloader, seems to have a higher change of booting successfully on cold boot, - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, seems to have a lower chance of booting successfully. No time to spend all week turning this into a statistical significant test project... The reset button is my friend... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-01 10:41 ` Geert Uytterhoeven @ 2015-06-01 10:53 ` Russell King - ARM Linux 2015-06-01 11:50 ` Geert Uytterhoeven 0 siblings, 1 reply; 21+ messages in thread From: Russell King - ARM Linux @ 2015-06-01 10:53 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: > FWIW, I have the feeling this has a slight influence on boot reliability on > two of my boards: > - r8a7740/armadillo, which is known to suffer from a cache-related bug in > its bootloader, seems to have a higher change of booting successfully on > cold boot, > - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, > seems to have a lower chance of booting successfully. > > No time to spend all week turning this into a statistical significant test > project... The reset button is my friend... Damn it, you sent this right after I merged and pushed out this change in my for-arm-soc branch, and was just about to send it to the arm-soc people. What excellent timing you have. :) What happens on the kzm9g if you revert the mach-shmobile changes? For armadillo, do you use the decompressor? That should be doing all the cache cleaning already, prior to the kernel being entered. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-01 10:53 ` Russell King - ARM Linux @ 2015-06-01 11:50 ` Geert Uytterhoeven 2015-06-17 20:35 ` Dinh Nguyen 0 siblings, 1 reply; 21+ messages in thread From: Geert Uytterhoeven @ 2015-06-01 11:50 UTC (permalink / raw) To: linux-arm-kernel Hi Russell, On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: >> FWIW, I have the feeling this has a slight influence on boot reliability on >> two of my boards: >> - r8a7740/armadillo, which is known to suffer from a cache-related bug in >> its bootloader, seems to have a higher change of booting successfully on >> cold boot, >> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, >> seems to have a lower chance of booting successfully. >> >> No time to spend all week turning this into a statistical significant test >> project... The reset button is my friend... > > Damn it, you sent this right after I merged and pushed out this change in > my for-arm-soc branch, and was just about to send it to the arm-soc people. > What excellent timing you have. :) Don't worry, I didn't send that email to make you postpone this change. Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo) of the boot loader, and these are old SoCs, please go ahead. > What happens on the kzm9g if you revert the mach-shmobile changes? Seems to make no difference. > For armadillo, do you use the decompressor? That should be doing all the > cache cleaning already, prior to the kernel being entered. I think so. Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported machine ID", to booting almost completely, but lacking a few devices due to a corrupted DTB. Been like that as long as I remember, i.e. since I got the board ca. 1 year ago. Boots fine (100%) with kexec. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-01 11:50 ` Geert Uytterhoeven @ 2015-06-17 20:35 ` Dinh Nguyen 2015-06-17 21:30 ` Russell King - ARM Linux 0 siblings, 1 reply; 21+ messages in thread From: Dinh Nguyen @ 2015-06-17 20:35 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > Hi Russell, > > On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: >>> FWIW, I have the feeling this has a slight influence on boot reliability on >>> two of my boards: >>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in >>> its bootloader, seems to have a higher change of booting successfully on >>> cold boot, >>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, >>> seems to have a lower chance of booting successfully. >>> >>> No time to spend all week turning this into a statistical significant test >>> project... The reset button is my friend... >> >> Damn it, you sent this right after I merged and pushed out this change in >> my for-arm-soc branch, and was just about to send it to the arm-soc people. >> What excellent timing you have. :) > > Don't worry, I didn't send that email to make you postpone this change. > Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo) > of the boot loader, and these are old SoCs, please go ahead. > >> What happens on the kzm9g if you revert the mach-shmobile changes? > > Seems to make no difference. > >> For armadillo, do you use the decompressor? That should be doing all the >> cache cleaning already, prior to the kernel being entered. > > I think so. > > Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported > machine ID", to booting almost completely, but lacking a few devices due to > a corrupted DTB. Been like that as long as I remember, i.e. since I got the > board ca. 1 year ago. Boots fine (100%) with kexec. > It seems like this patch is causing the SoCFPGA to not boot with SMP reliably. About 1 out of every 10 reboots, I'm seeing the boot failure below. The error seems to only happen when I do a cold or warm reboot, but never occurs during a power-up. If I revert this patch, or put back the call to v7_invalidate_l1 in socfpga_secondary_startup , then its able to boot 100% of the time. Just wondering if anyone else is seeing something similar? I am testing this on both linux-next and arm-soc/rmk/for-arm-soc. When the failure happens, here's the log: Booting Linux on physical CPU 0x0 Initializing cgroup subsys cpuset Linux version 4.1.0-rc8-next-20150617-00002-gdd1f624 (dinguyen@linux-builds1) (gcc version 4.7.3 20130226 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03) ) #1 SMP Wed Jun 17 14:22:59 CDT 2015 CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr\x10c5387d CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache Machine model: Altera SOCFPGA Cyclone V SoC Development Kit Truncating RAM at 0x00000000-0x40000000 to -0x2f800000 Memory policy: Data cache writealloc On node 0 totalpages: 194560 free_area_init_node: node 0, pgdat c0692640, node_mem_map ef20b000 Normal zone: 1520 pages used for memmap Normal zone: 0 pages reserved Normal zone: 194560 pages, LIFO batch:31 PERCPU: Embedded 12 pages/cpu @ef1e1000 s19648 r8192 d21312 u49152 pcpu-alloc: s19648 r8192 d21312 u49152 alloc\x12*4096 pcpu-alloc: [0] 0 [0] 1 Built 1 zonelists in Zone order, mobility grouping on. Total pages: 193040 Kernel command line: console=ttyS0,115200 root=/dev/mmcblk0p2 rw rootwait ip=dhcp earlyprintk PID hash table entries: 4096 (order: 2, 16384 bytes) Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 764288K/778240K available (4782K kernel code, 286K rwdata, 1344K rodata, 304K init, 135K bss, 13952K reserved, 0K cma-reserved) Virtual kernel memory layout: vector : 0xffff0000 - 0xffff1000 ( 4 kB) fixmap : 0xffc00000 - 0xfff00000 (3072 kB) vmalloc : 0xf0000000 - 0xff000000 ( 240 MB) lowmem : 0xc0000000 - 0xef800000 ( 760 MB) modules : 0xbf000000 - 0xc0000000 ( 16 MB) .text : 0xc0008000 - 0xc0603e78 (6128 kB) .init : 0xc0604000 - 0xc0650000 ( 304 kB) .data : 0xc0650000 - 0xc0697920 ( 287 kB) .bss : 0xc0697920 - 0xc06b976c ( 136 kB) SLUB: HWalignd, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 Hierarchical RCU implementation. Additional per-CPU info printed with stalls. Build-time adjustment of leaf fanout to 32. NR_IRQS:16 nr_irqs:16 16 L2C-310 enabling early BRESP for Cortex-A9 L2C-310 full line of zeros enabled for Cortex-A9 L2C-310 dynamic clock gating enabled, standby mode enabled L2C-310 cache controller enabled, 8 ways, 512 kB L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x46060001 clocksource: timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 21474836475ns Console: colour dummy device 80x30 Calibrating delay loop... 1836.64 BogoMIPS (lpj‘83232) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 2048 (order: 1, 8192 bytes) Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes) CPU: Testing write buffer coherency: ok CPU0: thread -1, cpu 0, socket 0, mpidr 80000000 Setting up static identity map for 0x8280 - 0x82d8 CPU1: thread -1, cpu 1, socket 0, mpidr 80000001 Internal error: Oops - undefined instruction: 0 [#1] SMP ARM Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc8-next-20150617-00002-gdd1f624 #1 Hardware name: Altera SOCFPGA task: eecaeac0 ti: eecce000 task.ti: eecce000 PC is at vfp_notifier+0x58/0x12c LR is at notifier_call_chain+0x44/0x84 pc : [<c000a6bc>] lr : [<c003d134>] psr: 80000193 sp : eeccff48 ip : c06563c8 fp : eeccffd4 r10: eecaef80 r9 : ef1f1300 r8 : 00000002 r7 : eecd0000 r6 : c0656bc0 r5 : 00000000 r4 : eecd0000 r3 : c000a664 r2 : eecd0000 r1 : 00000002 r0 : c06563c8 Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 0000404a DAC: 00000015 Process swapper/1 (pid: 0, stack limit = 0xeecce218) Stack: (0xeeccff48 to 0xeecd0000) ff40: c000a664 ffffffff 00000000 c003d134 eecd0018 eecaeac0 ff60: c06648e0 0b52d2f9 c048cfa8 c003d18c 00000000 f0002100 00000001 c003d1ac ff80: 00000000 eecaeac0 c064f300 c001369c c064b304 c0013140 00000000 ef1ed328 ffa0: eeccffe8 c001e760 c0486ec4 2eba2000 c06957c0 c06524dc 00000015 c06957c0 ffc0: c048c778 c064b304 c06957c0 00000000 eeccffdc c0486ec4 eeccffe4 c0487138 ffe0: 00000001 c00544e8 c0009494 c0697bc0 00000000 000094ac 7ef5bffd 3f39b3f8 [<c000a6bc>] (vfp_notifier) from [<c003d134>] (notifier_call_chain+0x44/0x84) [<c003d134>] (notifier_call_chain) from [<c003d18c>] (__atomic_notifier_call_chain+0x18/0x20) [<c003d18c>] (__atomic_notifier_call_chain) from [<c003d1ac>] (atomic_notifier_call_chain+0x18/0x20) [<c003d1ac>] (atomic_notifier_call_chain) from [<c001369c>] (__switch_to+0x34/0x58) Code: e3a03002 e5843208 e3a00000 e8bd8038 (eef85a10) ---[ end trace 9eaea9661b3b550a ]--- Kernel panic - not syncing: Attempted to kill the idle task! SMP: failed to stop secondary CPUs ---[ end Kernel panic - not syncing: Attempted to kill the idle task! Dinh ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-17 20:35 ` Dinh Nguyen @ 2015-06-17 21:30 ` Russell King - ARM Linux 2015-06-17 22:12 ` Dinh Nguyen 0 siblings, 1 reply; 21+ messages in thread From: Russell King - ARM Linux @ 2015-06-17 21:30 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote: > On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Russell, > > > > On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux > > <linux@arm.linux.org.uk> wrote: > >> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: > >>> FWIW, I have the feeling this has a slight influence on boot reliability on > >>> two of my boards: > >>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in > >>> its bootloader, seems to have a higher change of booting successfully on > >>> cold boot, > >>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, > >>> seems to have a lower chance of booting successfully. > >>> > >>> No time to spend all week turning this into a statistical significant test > >>> project... The reset button is my friend... > >> > >> Damn it, you sent this right after I merged and pushed out this change in > >> my for-arm-soc branch, and was just about to send it to the arm-soc people. > >> What excellent timing you have. :) > > > > Don't worry, I didn't send that email to make you postpone this change. > > Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo) > > of the boot loader, and these are old SoCs, please go ahead. > > > >> What happens on the kzm9g if you revert the mach-shmobile changes? > > > > Seems to make no difference. > > > >> For armadillo, do you use the decompressor? That should be doing all the > >> cache cleaning already, prior to the kernel being entered. > > > > I think so. > > > > Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported > > machine ID", to booting almost completely, but lacking a few devices due to > > a corrupted DTB. Been like that as long as I remember, i.e. since I got the > > board ca. 1 year ago. Boots fine (100%) with kexec. > > > > It seems like this patch is causing the SoCFPGA to not boot with SMP > reliably. About 1 out of every 10 reboots, I'm seeing the boot failure > below. The error seems to only happen when I do a cold or warm reboot, > but never occurs during a power-up. If I revert this patch, or put > back the call to v7_invalidate_l1 in socfpga_secondary_startup , then > its able to boot 100% of the time. It really sucks that you're only just testing this change now, because I've frozen my tree, and removing it for the next merge window is going to be an entirely non-trivial matter. You were copied on the original patch, which you failed to test... I can't say I have _much_ sympathy for a bug report at this point in time. > Internal error: Oops - undefined instruction: 0 [#1] SMP ARM > Modules linked in: > CPU: 1 PID: 0 Comm: swapper/1 Not tainted > 4.1.0-rc8-next-20150617-00002-gdd1f624 #1 > Hardware name: Altera SOCFPGA > task: eecaeac0 ti: eecce000 task.ti: eecce000 > PC is at vfp_notifier+0x58/0x12c > LR is at notifier_call_chain+0x44/0x84 This suggests that access to the VFP coprocessor is still disabled. However, vfp_hotplug() should have been called for CPU1 before it gets here, which should call vfp_enable(), which should enable access. However, what I'm wondering is... > [<c000a6bc>] (vfp_notifier) from [<c003d134>] (notifier_call_chain+0x44/0x84) > [<c003d134>] (notifier_call_chain) from [<c003d18c>] > (__atomic_notifier_call_chain+0x18/0x20) > [<c003d18c>] (__atomic_notifier_call_chain) from [<c003d1ac>] > (atomic_notifier_call_chain+0x18/0x20) > [<c003d1ac>] (atomic_notifier_call_chain) from [<c001369c>] > (__switch_to+0x34/0x58) what the rest of the trace is. Unfortunately, we mark __switch_to() as "cantunwind" which means the unwinder always stops here. It would be really good to know what is responsible for this scheduling event, whether it's due to a lock which is tried to be taken but is found to be locked, but I don't think we can modify __switch_to() to allow it to unwind (and I don't have the unwinder knowledge to hand to hack something together.) In order to see what's going on here, we do need to see the rest of the trace... right now I don't have the time to be able to sort out __switch_to() to achieve that. As I say, you should have tested this earlier. About the only thing I can do now is to revert the entire original patch, which is going to be extremely disruptive as it'll cause yet more conflicts between trees - again, something that we want to be avoiding at this stage in the game. Please test patches earlier. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-17 21:30 ` Russell King - ARM Linux @ 2015-06-17 22:12 ` Dinh Nguyen 2015-06-17 22:31 ` Dinh Nguyen 0 siblings, 1 reply; 21+ messages in thread From: Dinh Nguyen @ 2015-06-17 22:12 UTC (permalink / raw) To: linux-arm-kernel On 06/17/2015 04:30 PM, Russell King - ARM Linux wrote: > On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote: >> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: >>> Hi Russell, >>> >>> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux >>> <linux@arm.linux.org.uk> wrote: >>>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: >>>>> FWIW, I have the feeling this has a slight influence on boot reliability on >>>>> two of my boards: >>>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in >>>>> its bootloader, seems to have a higher change of booting successfully on >>>>> cold boot, >>>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, >>>>> seems to have a lower chance of booting successfully. >>>>> >>>>> No time to spend all week turning this into a statistical significant test >>>>> project... The reset button is my friend... >>>> >>>> Damn it, you sent this right after I merged and pushed out this change in >>>> my for-arm-soc branch, and was just about to send it to the arm-soc people. >>>> What excellent timing you have. :) >>> >>> Don't worry, I didn't send that email to make you postpone this change. >>> Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo) >>> of the boot loader, and these are old SoCs, please go ahead. >>> >>>> What happens on the kzm9g if you revert the mach-shmobile changes? >>> >>> Seems to make no difference. >>> >>>> For armadillo, do you use the decompressor? That should be doing all the >>>> cache cleaning already, prior to the kernel being entered. >>> >>> I think so. >>> >>> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported >>> machine ID", to booting almost completely, but lacking a few devices due to >>> a corrupted DTB. Been like that as long as I remember, i.e. since I got the >>> board ca. 1 year ago. Boots fine (100%) with kexec. >>> >> >> It seems like this patch is causing the SoCFPGA to not boot with SMP >> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure >> below. The error seems to only happen when I do a cold or warm reboot, >> but never occurs during a power-up. If I revert this patch, or put >> back the call to v7_invalidate_l1 in socfpga_secondary_startup , then >> its able to boot 100% of the time. > > It really sucks that you're only just testing this change now, because > I've frozen my tree, and removing it for the next merge window is going > to be an entirely non-trivial matter. You were copied on the original > patch, which you failed to test... I can't say I have _much_ sympathy > for a bug report at this point in time. > I apologize for not catching this error while testing this patch. But I did test it when you first sent it out..I probably didn't do a stress test. Sometimes the reboot fails in the 1st attempt, sometimes it fails in the 9th attempt. I only caught this error when I was testing my recent changes to use CPU_METHOD_OF_DECLARE. For me, I don't think you need to revert this patch or anything, but a fix can go in for a -rcX? Dinh ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-17 22:12 ` Dinh Nguyen @ 2015-06-17 22:31 ` Dinh Nguyen 2015-06-17 22:51 ` Russell King - ARM Linux 0 siblings, 1 reply; 21+ messages in thread From: Dinh Nguyen @ 2015-06-17 22:31 UTC (permalink / raw) To: linux-arm-kernel On 06/17/2015 05:12 PM, Dinh Nguyen wrote: > On 06/17/2015 04:30 PM, Russell King - ARM Linux wrote: >> On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote: >>> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: >>>> Hi Russell, >>>> >>>> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux >>>> <linux@arm.linux.org.uk> wrote: >>>>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote: >>>>>> FWIW, I have the feeling this has a slight influence on boot reliability on >>>>>> two of my boards: >>>>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in >>>>>> its bootloader, seems to have a higher change of booting successfully on >>>>>> cold boot, >>>>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up, >>>>>> seems to have a lower chance of booting successfully. >>>>>> >>>>>> No time to spend all week turning this into a statistical significant test >>>>>> project... The reset button is my friend... >>>>> >>>>> Damn it, you sent this right after I merged and pushed out this change in >>>>> my for-arm-soc branch, and was just about to send it to the arm-soc people. >>>>> What excellent timing you have. :) >>>> >>>> Don't worry, I didn't send that email to make you postpone this change. >>>> Giving the fuzziness of reproduction, and the flakiness (esp. on Armadillo) >>>> of the boot loader, and these are old SoCs, please go ahead. >>>> >>>>> What happens on the kzm9g if you revert the mach-shmobile changes? >>>> >>>> Seems to make no difference. >>>> >>>>> For armadillo, do you use the decompressor? That should be doing all the >>>>> cache cleaning already, prior to the kernel being entered. >>>> >>>> I think so. >>>> >>>> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported >>>> machine ID", to booting almost completely, but lacking a few devices due to >>>> a corrupted DTB. Been like that as long as I remember, i.e. since I got the >>>> board ca. 1 year ago. Boots fine (100%) with kexec. >>>> >>> >>> It seems like this patch is causing the SoCFPGA to not boot with SMP >>> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure >>> below. The error seems to only happen when I do a cold or warm reboot, >>> but never occurs during a power-up. If I revert this patch, or put >>> back the call to v7_invalidate_l1 in socfpga_secondary_startup , then >>> its able to boot 100% of the time. >> >> It really sucks that you're only just testing this change now, because >> I've frozen my tree, and removing it for the next merge window is going >> to be an entirely non-trivial matter. You were copied on the original >> patch, which you failed to test... I can't say I have _much_ sympathy >> for a bug report at this point in time. >> > > I apologize for not catching this error while testing this patch. But I > did test it when you first sent it out..I probably didn't do a stress > test. Sometimes the reboot fails in the 1st attempt, sometimes it fails > in the 9th attempt. > > I only caught this error when I was testing my recent changes to use > CPU_METHOD_OF_DECLARE. > > For me, I don't think you need to revert this patch or anything, but a > fix can go in for a -rcX? > Also, I am not seeing the error on the SoCFPGA Arria 10 platform at all. This Arria10 platform is running a different version of bootloader than the Cyclone5. Although, I also did test with the latest version of U-Boot on the Cyclone5. Dinh ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache 2015-06-17 22:31 ` Dinh Nguyen @ 2015-06-17 22:51 ` Russell King - ARM Linux 0 siblings, 0 replies; 21+ messages in thread From: Russell King - ARM Linux @ 2015-06-17 22:51 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jun 17, 2015 at 05:31:30PM -0500, Dinh Nguyen wrote: > Also, I am not seeing the error on the SoCFPGA Arria 10 platform at all. > This Arria10 platform is running a different version of bootloader than > the Cyclone5. Although, I also did test with the latest version of > U-Boot on the Cyclone5. It _shouldn't_ cause any issue, unless secondary_startup is called with caches already enabled (which it shouldn't.) If caches are enabled, that's incredibly dodgy even with the old code - cache lines could be migrated to the incoming CPU which v7_invalidate_l1() would then invalidate, resulting in the loss of data in those migrated cache lines. If caches are not enabled, this change should have no effect: caches should not be searched if the C bit is clear, and we still end up invalidating the local L1 cache later in the normal boot sequence. (If caches are searched with the C bit clear, and the caches initialise in an unpredictable state, the ARM ARM requires an implementation specific routine to be executed immediately on CPU startup. The idea is that if this option were to be chosen, the vendor must guarantee a certain code sequence can't be disrupted by the uninitialised state in the caches, which, if you're searching the caches for instruction fetches, is pretty much impossible to guarantee no matter what kind of code sequence you specify: even the first instruction fetched from the CPU reset vector could ultimately be sourced from the instruction cache if the i-cache reports that it thinks it has valid data.) -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (6 preceding siblings ...) 2015-05-22 7:36 ` Geert Uytterhoeven @ 2015-05-22 10:45 ` Michal Simek 2015-06-01 10:21 ` Wei Xu 8 siblings, 0 replies; 21+ messages in thread From: Michal Simek @ 2015-05-22 10:45 UTC (permalink / raw) To: linux-arm-kernel On 05/19/2015 06:12 PM, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- ... > arch/arm/mach-zynq/common.h | 2 -- > arch/arm/mach-zynq/headsmp.S | 5 ----- > arch/arm/mach-zynq/platsmp.c | 5 ++--- For Zynq: Tested-by: Michal Simek <michal.simek@xilinx.com> Thanks, Michal ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] ARM: v7 setup function should invalidate L1 cache [not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk> ` (7 preceding siblings ...) 2015-05-22 10:45 ` Michal Simek @ 2015-06-01 10:21 ` Wei Xu 8 siblings, 0 replies; 21+ messages in thread From: Wei Xu @ 2015-06-01 10:21 UTC (permalink / raw) To: linux-arm-kernel On 5/19/2015 5:12 PM, Russell King wrote: > All ARMv5 and older CPUs invalidate their caches in the early assembly > setup function, prior to enabling the MMU. This is because the L1 > cache should not contain any data relevant to the execution of the > kernel at this point; all data should have been flushed out to memory. > > This requirement should also be true for ARMv6 and ARMv7 CPUs - indeed, > these typically do not search their caches when caching is disabled (as > it needs to be when the MMU is disabled) so this change should be safe. > > ARMv7 allows there to be CPUs which search their caches while caching is > disabled, and it's permitted that the cache is uninitialised at boot; > for these, the architecture reference manual requires that an > implementation specific code sequence is used immediately after reset > to ensure that the cache is placed into a sane state. Such > functionality is definitely outside the remit of the Linux kernel, and > must be done by the SoC's firmware before _any_ CPU gets to the Linux > kernel. > > Changing the data cache clean+invalidate to a mere invalidate allows us > to get rid of a lot of platform specific hacks around this issue for > their secondary CPU bringup paths - some of which were buggy. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- Hi Russell, [...] > arch/arm/mach-hisi/Makefile | 2 +- > arch/arm/mach-hisi/core.h | 1 - > arch/arm/mach-hisi/headsmp.S | 16 ---------------- > arch/arm/mach-hisi/platsmp.c | 4 ++-- [...] Sorry for the late reply. For Hisilicon: - hip01/ca9x2 - hix5hd2/dkb Tested-by: Wei Xu <xuwei5@hisilicon.com> Thanks! Best Regards, Wei ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2015-06-17 22:51 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E1Yuk8W-0001tC-IK@rmk-PC.arm.linux.org.uk>
2015-05-19 21:44 ` [PATCH] ARM: v7 setup function should invalidate L1 cache Heiko Stuebner
2015-05-19 21:55 ` Arnd Bergmann
2015-05-19 22:07 ` Russell King - ARM Linux
2015-05-19 22:18 ` Arnd Bergmann
2015-05-19 22:32 ` Russell King - ARM Linux
2015-05-19 22:01 ` Florian Fainelli
2015-05-20 18:54 ` Dinh Nguyen
2015-05-20 22:48 ` Sebastian Hesselbarth
2015-05-21 2:08 ` Shawn Guo
2015-05-21 8:30 ` Thierry Reding
2015-05-22 7:36 ` Geert Uytterhoeven
2015-06-01 10:41 ` Geert Uytterhoeven
2015-06-01 10:53 ` Russell King - ARM Linux
2015-06-01 11:50 ` Geert Uytterhoeven
2015-06-17 20:35 ` Dinh Nguyen
2015-06-17 21:30 ` Russell King - ARM Linux
2015-06-17 22:12 ` Dinh Nguyen
2015-06-17 22:31 ` Dinh Nguyen
2015-06-17 22:51 ` Russell King - ARM Linux
2015-05-22 10:45 ` Michal Simek
2015-06-01 10:21 ` Wei Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).