* [PATCH] socfpga_a10: fix a kexec boot issue @ 2017-05-10 5:13 yanjiang.jin at windriver.com 2017-05-10 5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com 0 siblings, 1 reply; 8+ messages in thread From: yanjiang.jin at windriver.com @ 2017-05-10 5:13 UTC (permalink / raw) To: linux-arm-kernel From: Yanjiang Jin <yanjiang.jin@windriver.com> I guess socfpga's other boards may need this patch, but I have Arria10 board only. So add a new function socfpga_a10_cpu_kill(), keep the old function socfpga_cpu_kill() for other boards. I also verified CPU_HOTPLUG after applying this patch, everything seems well. Test steps: 1. Enable kexec and build a SOCFPGA kernel; 2. Use zImage as 1st and 2nd kernel; 3. kexec -l /root/zImage --append="`cat /proc/cmdline`" 4. kexec -e Test env: U-Boot 2014.10 (Jan 13 2016 - 11:07:09) CPU : Altera SOCFPGA Arria 10 Platform BOARD : Altera SOCFPGA Arria 10 Dev Kit -- 1.9.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-10 5:13 [PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin at windriver.com @ 2017-05-10 5:13 ` yanjiang.jin at windriver.com 2017-05-12 14:25 ` Dinh Nguyen 2017-05-12 14:36 ` Mark Rutland 0 siblings, 2 replies; 8+ messages in thread From: yanjiang.jin at windriver.com @ 2017-05-10 5:13 UTC (permalink / raw) To: linux-arm-kernel From: Yanjiang Jin <yanjiang.jin@windriver.com> Kexec's second kernel would hang if CPU1 isn't reset. Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> --- arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c index 0ee7677..db3940e 100644 --- a/arch/arm/mach-socfpga/platsmp.c +++ b/arch/arm/mach-socfpga/platsmp.c @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu) { return 1; } + +static int socfpga_a10_cpu_kill(unsigned int cpu) +{ + /* This will put CPU #1 into reset. */ + if (socfpga_cpu1start_addr) + writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + + SOCFPGA_A10_RSTMGR_MODMPURST); + + return 1; +} #endif static const struct smp_operations socfpga_smp_ops __initconst = { @@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu) .smp_boot_secondary = socfpga_a10_boot_secondary, #ifdef CONFIG_HOTPLUG_CPU .cpu_die = socfpga_cpu_die, - .cpu_kill = socfpga_cpu_kill, + .cpu_kill = socfpga_a10_cpu_kill, #endif }; -- 1.9.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-10 5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com @ 2017-05-12 14:25 ` Dinh Nguyen 2017-05-12 14:36 ` Mark Rutland 1 sibling, 0 replies; 8+ messages in thread From: Dinh Nguyen @ 2017-05-12 14:25 UTC (permalink / raw) To: linux-arm-kernel On 05/10/2017 12:13 AM, yanjiang.jin at windriver.com wrote: > From: Yanjiang Jin <yanjiang.jin@windriver.com> > > Kexec's second kernel would hang if CPU1 isn't reset. > Can you please be a bit more descriptive on the commit log? Is it because when kexec starts, the SMP on the kexec's kernel try to run on CPU1? > Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> > --- > arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c > index 0ee7677..db3940e 100644 > --- a/arch/arm/mach-socfpga/platsmp.c > +++ b/arch/arm/mach-socfpga/platsmp.c > @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu) > { > return 1; > } > + > +static int socfpga_a10_cpu_kill(unsigned int cpu) > +{ > + /* This will put CPU #1 into reset. */ > + if (socfpga_cpu1start_addr) Do you need to check for socfpga_cpu1start_addr? Dinh ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-10 5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com 2017-05-12 14:25 ` Dinh Nguyen @ 2017-05-12 14:36 ` Mark Rutland 2017-05-15 9:05 ` yjin 2017-05-15 10:00 ` Russell King - ARM Linux 1 sibling, 2 replies; 8+ messages in thread From: Mark Rutland @ 2017-05-12 14:36 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 10, 2017 at 01:13:04AM -0400, yanjiang.jin at windriver.com wrote: > From: Yanjiang Jin <yanjiang.jin@windriver.com> > > Kexec's second kernel would hang if CPU1 isn't reset. > > Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> > --- > arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c > index 0ee7677..db3940e 100644 > --- a/arch/arm/mach-socfpga/platsmp.c > +++ b/arch/arm/mach-socfpga/platsmp.c > @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu) > { > return 1; > } > + > +static int socfpga_a10_cpu_kill(unsigned int cpu) > +{ > + /* This will put CPU #1 into reset. */ > + if (socfpga_cpu1start_addr) > + writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + > + SOCFPGA_A10_RSTMGR_MODMPURST); > + > + return 1; > +} > #endif I agree that currently, socfpga_cpu_die is completely bogus, as the CPU is just sat in WFI with the MMU, caches, etc enabled, and not actually off: static void socfpga_cpu_die(unsigned int cpu) { /* Do WFI. If we wake up early, go back into WFI */ while (1) cpu_do_idle(); } ... so that goes wrong as soon as the kerenl text gets ovewritten. However, AFAICT, this patch forcibly resets is without any teardown having happened. That will surely result in data being lost from the caches, for example. So I think this is incomplete. Thanks, Mark. > > static const struct smp_operations socfpga_smp_ops __initconst = { > @@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu) > .smp_boot_secondary = socfpga_a10_boot_secondary, > #ifdef CONFIG_HOTPLUG_CPU > .cpu_die = socfpga_cpu_die, > - .cpu_kill = socfpga_cpu_kill, > + .cpu_kill = socfpga_a10_cpu_kill, > #endif > }; > > -- > 1.9.1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-12 14:36 ` Mark Rutland @ 2017-05-15 9:05 ` yjin 2017-05-15 10:00 ` Russell King - ARM Linux 1 sibling, 0 replies; 8+ messages in thread From: yjin @ 2017-05-15 9:05 UTC (permalink / raw) To: linux-arm-kernel On 2017?05?12? 22:36, Mark Rutland wrote: > On Wed, May 10, 2017 at 01:13:04AM -0400, yanjiang.jin at windriver.com wrote: >> From: Yanjiang Jin <yanjiang.jin@windriver.com> >> >> Kexec's second kernel would hang if CPU1 isn't reset. >> >> Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> >> --- >> arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c >> index 0ee7677..db3940e 100644 >> --- a/arch/arm/mach-socfpga/platsmp.c >> +++ b/arch/arm/mach-socfpga/platsmp.c >> @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu) >> { >> return 1; >> } >> + >> +static int socfpga_a10_cpu_kill(unsigned int cpu) >> +{ >> + /* This will put CPU #1 into reset. */ >> + if (socfpga_cpu1start_addr) >> + writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + >> + SOCFPGA_A10_RSTMGR_MODMPURST); >> + >> + return 1; >> +} >> #endif > I agree that currently, socfpga_cpu_die is completely bogus, as the CPU is just > sat in WFI with the MMU, caches, etc enabled, and not actually off: > > static void socfpga_cpu_die(unsigned int cpu) > { > /* Do WFI. If we wake up early, go back into WFI */ > while (1) > cpu_do_idle(); > } > > ... so that goes wrong as soon as the kerenl text gets ovewritten. > > However, AFAICT, this patch forcibly resets is without any teardown > having happened. That will surely result in data being lost from the > caches, for example. > > So I think this is incomplete. Hi Mark, I think I got what you mean. So far, flush_cache is the only thing that I can think of. Can we just add flush_cache_all in cpu_die() now, and add other missing parts(if have) in the future. Thanks! Yanjiang > Thanks, > Mark. > >> >> static const struct smp_operations socfpga_smp_ops __initconst = { >> @@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu) >> .smp_boot_secondary = socfpga_a10_boot_secondary, >> #ifdef CONFIG_HOTPLUG_CPU >> .cpu_die = socfpga_cpu_die, >> - .cpu_kill = socfpga_cpu_kill, >> + .cpu_kill = socfpga_a10_cpu_kill, >> #endif >> }; >> >> -- >> 1.9.1 >> >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel at lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-12 14:36 ` Mark Rutland 2017-05-15 9:05 ` yjin @ 2017-05-15 10:00 ` Russell King - ARM Linux 1 sibling, 0 replies; 8+ messages in thread From: Russell King - ARM Linux @ 2017-05-15 10:00 UTC (permalink / raw) To: linux-arm-kernel On Fri, May 12, 2017 at 03:36:01PM +0100, Mark Rutland wrote: > However, AFAICT, this patch forcibly resets is without any teardown > having happened. That will surely result in data being lost from the > caches, for example. You're wrong on that point. Having each bloody platform implement the same friggin teardown is utter madness, and leads to all sorts of synchronisation issues. The generic code already deals with the cache issues, and has synchronisation to ensure that the dying CPU completes the cache handling before the requesting CPU continues with the killing. The only thing that platform code need concern itself with is doing is the "make the CPU die" thing. It's not perfect, but it's good enough for the majority of cases. In any case, encouraging people to add flush_cache_all() into their cpu_die() function is NOT the way forward if there is a problem - that introduces a new race between flush_cache_all() walking all the cache lines and cpu_kill() actually turning the power off to the CPU, which could very well happen either before or during the flush_cache_all() execution. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [V2 PATCH] socfpga_a10: fix a kexec boot issue @ 2017-05-15 9:13 yanjiang.jin at windriver.com 2017-05-15 9:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com 0 siblings, 1 reply; 8+ messages in thread From: yanjiang.jin at windriver.com @ 2017-05-15 9:13 UTC (permalink / raw) To: linux-arm-kernel From: Yanjiang Jin <yanjiang.jin@windriver.com> V1->V2: 1. Add flush_cache_all() in cpu_die(); 2. Add more descriptive on the commit log; Also validate kexec and CPU_HOTPLUG as before. I guess socfpga's other boards may need this patch, but I have Arria10 board only. So add a new function socfpga_a10_cpu_kill(), keep the old function socfpga_cpu_kill() for other boards. I also verified CPU_HOTPLUG after applying this patch, everything seems well. Test steps: 1. Enable kexec and build a SOCFPGA kernel; 2. Use zImage as 1st and 2nd kernel; 3. kexec -l /root/zImage --append="`cat /proc/cmdline`" 4. kexec -e Test env: U-Boot 2014.10 (Jan 13 2016 - 11:07:09) CPU : Altera SOCFPGA Arria 10 Platform BOARD : Altera SOCFPGA Arria 10 Dev Kit -- 1.9.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-15 9:13 [V2 PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin at windriver.com @ 2017-05-15 9:13 ` yanjiang.jin at windriver.com 2017-05-15 9:55 ` Russell King - ARM Linux 0 siblings, 1 reply; 8+ messages in thread From: yanjiang.jin at windriver.com @ 2017-05-15 9:13 UTC (permalink / raw) To: linux-arm-kernel From: Yanjiang Jin <yanjiang.jin@windriver.com> socfpga_cpu_die() just puts CPU in WFI, not actually off. So that Kexec's second kernel goes wrong since the kerenl text gets ovewritten. Now reset CPU1 in cpu_kill() to avoid this error. Also add flush_cache_all() to prevent data from being lost in cpu_die(). Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> --- arch/arm/mach-socfpga/platsmp.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c index 0ee7677..53f290e 100644 --- a/arch/arm/mach-socfpga/platsmp.c +++ b/arch/arm/mach-socfpga/platsmp.c @@ -102,6 +102,8 @@ static void __init socfpga_smp_prepare_cpus(unsigned int max_cpus) */ static void socfpga_cpu_die(unsigned int cpu) { + flush_cache_all(); + /* Do WFI. If we wake up early, go back into WFI */ while (1) cpu_do_idle(); @@ -117,6 +119,15 @@ static int socfpga_cpu_kill(unsigned int cpu) { return 1; } + +static int socfpga_a10_cpu_kill(unsigned int cpu) +{ + /* This will put CPU #1 into reset. */ + writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + + SOCFPGA_A10_RSTMGR_MODMPURST); + + return 1; +} #endif static const struct smp_operations socfpga_smp_ops __initconst = { @@ -133,7 +144,7 @@ static int socfpga_cpu_kill(unsigned int cpu) .smp_boot_secondary = socfpga_a10_boot_secondary, #ifdef CONFIG_HOTPLUG_CPU .cpu_die = socfpga_cpu_die, - .cpu_kill = socfpga_cpu_kill, + .cpu_kill = socfpga_a10_cpu_kill, #endif }; -- 1.9.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() 2017-05-15 9:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com @ 2017-05-15 9:55 ` Russell King - ARM Linux 0 siblings, 0 replies; 8+ messages in thread From: Russell King - ARM Linux @ 2017-05-15 9:55 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 15, 2017 at 05:13:36PM +0800, yanjiang.jin at windriver.com wrote: > From: Yanjiang Jin <yanjiang.jin@windriver.com> > > socfpga_cpu_die() just puts CPU in WFI, not actually off. So that Kexec's > second kernel goes wrong since the kerenl text gets ovewritten. > Now reset CPU1 in cpu_kill() to avoid this error. > Also add flush_cache_all() to prevent data from being lost in cpu_die(). You should not need this, as the common code does the cache flushing for you. It's rather unsafe too - you are completely unsynchronised between the CPU executing in cpu_die() and the CPU executing cpu_kill() - it's quite possible for the code in cpu_kill() to take effect when you're part-way through the flush_cache_all(). -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-05-15 10:00 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-10 5:13 [PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin at windriver.com 2017-05-10 5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com 2017-05-12 14:25 ` Dinh Nguyen 2017-05-12 14:36 ` Mark Rutland 2017-05-15 9:05 ` yjin 2017-05-15 10:00 ` Russell King - ARM Linux -- strict thread matches above, loose matches on Subject: below -- 2017-05-15 9:13 [V2 PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin at windriver.com 2017-05-15 9:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin at windriver.com 2017-05-15 9:55 ` Russell King - ARM Linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).