* [PATCH v3 0/3] ARM: rockchip: fix the SMP
@ 2015-06-05 15:11 Caesar Wang
2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Caesar Wang @ 2015-06-05 15:11 UTC (permalink / raw)
To: linux-arm-kernel
Verified on url =
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.14
Caesar Wang (3):
ARM: rockchip: fix the CPU soft reset
ARM: rockchip: ensure CPU to enter WFI/WFE state
ARM: rockchip: fix the SMP code style
arch/arm/mach-rockchip/platsmp.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset 2015-06-05 15:11 [PATCH v3 0/3] ARM: rockchip: fix the SMP Caesar Wang @ 2015-06-05 15:11 ` Caesar Wang 2015-06-05 15:52 ` Heiko Stübner 2015-06-05 16:28 ` Doug Anderson 2015-06-05 15:11 ` [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state Caesar Wang 2015-06-05 15:11 ` [PATCH v3 3/3] ARM: rockchip: fix the SMP code style Caesar Wang 2 siblings, 2 replies; 12+ messages in thread From: Caesar Wang @ 2015-06-05 15:11 UTC (permalink / raw) To: linux-arm-kernel We need different orderings when turning a core on and turning a core off. In one case we need to assert reset before turning power off. In ther other case we need to turn power on and the deassert reset. In general, the correct flow is: CPU off: reset_control_assert regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), BIT(pd)) CPU on: regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) reset_control_deassert This is needed for stressing CPU up/down, as per: cd /sys/devices/system/cpu/ for i in $(seq 1000); do echo "================= $i ============" for j in $(seq 100); do while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "000"" ]] echo 0 > cpu1/online echo 0 > cpu2/online echo 0 > cpu3/online done while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "111" ]]; do echo 1 > cpu1/online echo 1 > cpu2/online echo 1 > cpu3/online done done done The following is reproducile log: [34466.186812] PM: noirq suspend of devices complete after 0.669 msecs [34466.186824] Disabling non-boot CPUs ... [34466.187509] CPU1: shutdown [34466.188672] CPU2: shutdown [34473.736627] Kernel panic - not syncing:Watchdog detected hard LOCKUP on cpu 0 ....... Signed-off-by: Caesar Wang <wxt@rock-chips.com> --- arch/arm/mach-rockchip/platsmp.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 5b4ca3c..25da16f 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -88,18 +88,24 @@ static int pmu_set_power_domain(int pd, bool on) return PTR_ERR(rstc); } - if (on) - reset_control_deassert(rstc); - else + if (!on) reset_control_assert(rstc); - reset_control_put(rstc); - } + ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); + if (ret < 0) { + pr_err("%s: could not update power domain\n", __func__); + reset_control_put(rstc); + return ret; + } - ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); - if (ret < 0) { - pr_err("%s: could not update power domain\n", __func__); - return ret; + if (on) + reset_control_deassert(rstc); + } else { + ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); + if (ret < 0) { + pr_err("%s: could not update power domain\n", __func__); + return ret; + } } ret = -1; -- 1.9.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset 2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang @ 2015-06-05 15:52 ` Heiko Stübner 2015-06-05 16:17 ` Heiko Stübner 2015-06-05 16:28 ` Doug Anderson 1 sibling, 1 reply; 12+ messages in thread From: Heiko Stübner @ 2015-06-05 15:52 UTC (permalink / raw) To: linux-arm-kernel Hi Caesar, Am Freitag, 5. Juni 2015, 23:11:42 schrieb Caesar Wang: > We need different orderings when turning a core on and turning a core > off. In one case we need to assert reset before turning power off. > In ther other case we need to turn power on and the deassert reset. > > In general, the correct flow is: > > CPU off: > reset_control_assert > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), BIT(pd)) > CPU on: > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) > reset_control_deassert > > This is needed for stressing CPU up/down, as per: > cd /sys/devices/system/cpu/ > for i in $(seq 1000); do > echo "================= $i ============" > for j in $(seq 100); do > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat > cpu3/online)" != "000"" ]] echo 0 > cpu1/online > echo 0 > cpu2/online > echo 0 > cpu3/online > done > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat > cpu3/online)" != "111" ]]; do echo 1 > cpu1/online > echo 1 > cpu2/online > echo 1 > cpu3/online > done > done > done > > The following is reproducile log: > [34466.186812] PM: noirq suspend of devices complete after 0.669 msecs > [34466.186824] Disabling non-boot CPUs ... > [34466.187509] CPU1: shutdown > [34466.188672] CPU2: shutdown > [34473.736627] Kernel panic - not syncing:Watchdog detected hard LOCKUP > on cpu 0 ....... > > Signed-off-by: Caesar Wang <wxt@rock-chips.com> > --- could we do this something like the shown below instead? Here the deassertion of the reset really happens after we are sure the power-domain is on -------------- 8< --------------- diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 5b4ca3c..ee5dbad6 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -72,6 +72,7 @@ static struct reset_control *rockchip_get_core_reset(int cpu) static int pmu_set_power_domain(int pd, bool on) { u32 val = (on) ? 0 : BIT(pd); + struct reset_control *rstc = rockchip_get_core_reset(pd); int ret; /* @@ -80,20 +81,16 @@ static int pmu_set_power_domain(int pd, bool on) * processor is powered down. */ if (read_cpuid_part() != ARM_CPU_PART_CORTEX_A9) { - struct reset_control *rstc = rockchip_get_core_reset(pd); - + /* We only require the reset on the RK3288@the moment */ if (IS_ERR(rstc)) { pr_err("%s: could not get reset control for core %d\n", __func__, pd); return PTR_ERR(rstc); } - if (on) - reset_control_deassert(rstc); - else + if (!on) reset_control_assert(rstc); - reset_control_put(rstc); } ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); @@ -112,6 +109,12 @@ static int pmu_set_power_domain(int pd, bool on) } } + if (read_cpuid_part() != ARM_CPU_PART_CORTEX_A9 && on) + reset_control_deassert(rstc); + + if (!IS_ERR(rstc)) + reset_control_put(rstc); + return 0; } -------------- 8< --------------- ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset 2015-06-05 15:52 ` Heiko Stübner @ 2015-06-05 16:17 ` Heiko Stübner 0 siblings, 0 replies; 12+ messages in thread From: Heiko Stübner @ 2015-06-05 16:17 UTC (permalink / raw) To: linux-arm-kernel Hi Caesar, Am Freitag, 5. Juni 2015, 17:52:48 schrieb Heiko St?bner: > could we do this something like the shown below instead? > Here the deassertion of the reset really happens after we are sure the > power-domain is on As you mention in the gerrit cl that the hang does happen when you deassert the reset before the power-domain wait, ignore my last message please :-) But please make sure to also provide a changelog to the list submissions in the future. Thanks Heiko ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset 2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang 2015-06-05 15:52 ` Heiko Stübner @ 2015-06-05 16:28 ` Doug Anderson 1 sibling, 0 replies; 12+ messages in thread From: Doug Anderson @ 2015-06-05 16:28 UTC (permalink / raw) To: linux-arm-kernel Caesar, On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote: > We need different orderings when turning a core on and turning a core > off. In one case we need to assert reset before turning power off. > In ther other case we need to turn power on and the deassert reset. > > In general, the correct flow is: > > CPU off: > reset_control_assert > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), BIT(pd)) Add: "ensure power domain is on" to this list. > CPU on: > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) > reset_control_deassert Add: "ensure power domain is on" to this list. Adding the "ensure power domain is on" step helps document that patch set version 2 is not what you want and that you thought about it. > @@ -88,18 +88,24 @@ static int pmu_set_power_domain(int pd, bool on) > return PTR_ERR(rstc); > } > > - if (on) > - reset_control_deassert(rstc); > - else > + if (!on) > reset_control_assert(rstc); > > - reset_control_put(rstc); > - } > + ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > + if (ret < 0) { > + pr_err("%s: could not update power domain\n", __func__); > + reset_control_put(rstc); > + return ret; > + } > > - ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > - if (ret < 0) { > - pr_err("%s: could not update power domain\n", __func__); > - return ret; > + if (on) > + reset_control_deassert(rstc); I think you need a "reset_control_put(rstc);" here in the non-error case. Otherwise this looks reasonable to me and you can add my Reviewed-by tag. I'll also kick off some tests with this series today to confirm as well. -Doug ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-05 15:11 [PATCH v3 0/3] ARM: rockchip: fix the SMP Caesar Wang 2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang @ 2015-06-05 15:11 ` Caesar Wang 2015-06-05 17:49 ` Doug Anderson 2015-06-05 15:11 ` [PATCH v3 3/3] ARM: rockchip: fix the SMP code style Caesar Wang 2 siblings, 1 reply; 12+ messages in thread From: Caesar Wang @ 2015-06-05 15:11 UTC (permalink / raw) To: linux-arm-kernel In idle mode, core1/2/3 of Cortex-A17 should be either power off or in WFI/WFE state. we can delay 1ms to ensure the CPU enter WFI/WFE state. Signed-off-by: Caesar Wang <wxt@rock-chips.com> --- arch/arm/mach-rockchip/platsmp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 25da16f..6672fdd 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) #ifdef CONFIG_HOTPLUG_CPU static int rockchip_cpu_kill(unsigned int cpu) { + /* ensure CPU can enter the WFI/WFE state */ + mdelay(1); + pmu_set_power_domain(0 + cpu, false); return 1; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-05 15:11 ` [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state Caesar Wang @ 2015-06-05 17:49 ` Doug Anderson 2015-06-05 18:29 ` Russell King - ARM Linux 0 siblings, 1 reply; 12+ messages in thread From: Doug Anderson @ 2015-06-05 17:49 UTC (permalink / raw) To: linux-arm-kernel Caesar, On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote: > In idle mode, core1/2/3 of Cortex-A17 should be either power off or in > WFI/WFE state. > we can delay 1ms to ensure the CPU enter WFI/WFE state. > > Signed-off-by: Caesar Wang <wxt@rock-chips.com> > --- > > arch/arm/mach-rockchip/platsmp.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c > index 25da16f..6672fdd 100644 > --- a/arch/arm/mach-rockchip/platsmp.c > +++ b/arch/arm/mach-rockchip/platsmp.c > @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) > #ifdef CONFIG_HOTPLUG_CPU > static int rockchip_cpu_kill(unsigned int cpu) > { > + /* ensure CPU can enter the WFI/WFE state */ > + mdelay(1); This is a pretty weak assurance. Is there any stronger assurance you can give that we're in WFI/WFE state and won't come out of it? Do you actually see problems if you power off a CPU when it's not in WFI/WFE state? ...so I _think_ I see the path that is happening here and what you're trying to handle. Specifically, I see: On dying CPU: 1. cpu_die() calls 'complete(&cpu_died)' 2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die() 3. rockchip_cpu_die() does a bit more cache flushing before looping in cpu_do_idle() The problem is that the moment the completion happens in step #1 above the dying CPU can be killed. ...so you're trying to make sure the dying CPU makes it to cpu_do_idle(). In that case a fixed mdelay(1) might be OK since the time that the CPU takes to run through a few instructions (with no interrupts) is pretty predictable. It would be really nice if the commit message went through all this, though. ...but is there any chance that cpu_do_idle() could somehow return? We shouldn't send any events since we've marked the core offline, but perhaps some per-core interrupt (arch timer?) that didn't get migrated? -Doug ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-05 17:49 ` Doug Anderson @ 2015-06-05 18:29 ` Russell King - ARM Linux 2015-06-05 20:24 ` Doug Anderson 0 siblings, 1 reply; 12+ messages in thread From: Russell King - ARM Linux @ 2015-06-05 18:29 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jun 05, 2015 at 10:49:14AM -0700, Doug Anderson wrote: > On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote: > > In idle mode, core1/2/3 of Cortex-A17 should be either power off or in > > WFI/WFE state. > > we can delay 1ms to ensure the CPU enter WFI/WFE state. > > > > Signed-off-by: Caesar Wang <wxt@rock-chips.com> > > --- > > > > arch/arm/mach-rockchip/platsmp.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c > > index 25da16f..6672fdd 100644 > > --- a/arch/arm/mach-rockchip/platsmp.c > > +++ b/arch/arm/mach-rockchip/platsmp.c > > @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) > > #ifdef CONFIG_HOTPLUG_CPU > > static int rockchip_cpu_kill(unsigned int cpu) > > { > > + /* ensure CPU can enter the WFI/WFE state */ > > + mdelay(1); > > This is a pretty weak assurance. Is there any stronger assurance you > can give that we're in WFI/WFE state and won't come out of it? > > Do you actually see problems if you power off a CPU when it's not in > WFI/WFE state? I really don't like to see platforms pulling crap tricks in their CPU hotunplug code like this. If there's something wrong with the generic code, then the generic code needs to be fixed, not worked around in platform code. > ...so I _think_ I see the path that is happening here and what you're > trying to handle. Specifically, I see: > > On dying CPU: > 1. cpu_die() calls 'complete(&cpu_died)' > 2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die() > 3. rockchip_cpu_die() does a bit more cache flushing before looping in > cpu_do_idle() > > The problem is that the moment the completion happens in step #1 above > the dying CPU can be killed. ...so you're trying to make sure the > dying CPU makes it to cpu_do_idle(). In that case a fixed mdelay(1) > might be OK since the time that the CPU takes to run through a few > instructions (with no interrupts) is pretty predictable. It would be > really nice if the commit message went through all this, though. > > ...but is there any chance that cpu_do_idle() could somehow return? > We shouldn't send any events since we've marked the core offline, but > perhaps some per-core interrupt (arch timer?) that didn't get > migrated? How this is supposed to work is: CPU requesting death CPU dying stop_machine(take_down_cpu) take_down_cpu() -__cpu_disable() - platform_cpu_disable() - marks CPU offline - migrates IRQs away - caches flushed - tlbs flushed __cpu_die() - processes migrated away -wait_for_completion_timeout() -timekeeping migrated away returns to idle loop arch_cpu_idle_dead() -cpu_die() - idle_task_exit() - flush_cache_louis() At this point, dirty cache lines which matter are flushed from the dying CPUs cache. However, we still need the dying CPU to be coherent for the next step, which is to issue the completion. - complete() - flush_cache_louis() At some point during the above, the completion becomes visible to the other CPU, and it continues its execution. -pr_notice() -platform_cpu_kill() - smp_ops.cpu_die() The precise ordering of smp_ops.cpu_die() vs platform_cpu_kill() is pretty much indeterminant, and is subject to scheduling effects which could delay the requesting CPU. Now, the problem is, from what ARM has been saying, that cache lines can get migrated to the dying CPU which may contain the "master" copy of the data, which means that when the dying CPU actually exits coherency, that data is lost to the rest of the system. Ideally, what we would like to do is to exit coherency earlier, and then have some notification mechanism between the dying CPU and the rest of the system to say that it's finished exiting coherency. However, we face several issues. 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by generic code. 2) we have no way to perform that notification reliably once coherency has been exited. There's other reasons we need to change the notification mechanism, one of them is that completions use RCU, and RCU isn't actually valid in this context - and we can't make it valid because the dying CPU might loose power at any moment after it's called complete(). Paul McKenny created what was a generic infrastructure for this notification, but I object to it on principle using atomic_t... and if we've exited coherency (as we would want to), it won't work because it makes use of atomic_cmpxchg() on the dying CPU. I've been working on a solution to use an IPI to notify from the dying CPU, but that's fraught because of the separation of the IRQ controller code from the architecture code, and we now have spinlocks in the IPI-raising path due to the big.LITTLE stuff - and spinlocks need coherency to be working. So, we're actually in a very sticky position over taking CPUs offline. It seems to be something that the ARM architecture and kernel architecture doesn't actually allow to be done safely. So much so, that in a similar way to the original Keystone 2 physical address switch, I'm tempted to make taking a CPU offline taint the kernel! We're going to end up with tainted kernels through this path when Paul pushes his RCU correctness patches anyway anyway, so it's just a matter of the inevitable taint happening sooner or later. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-05 18:29 ` Russell King - ARM Linux @ 2015-06-05 20:24 ` Doug Anderson 2015-06-08 4:56 ` Caesar Wang 0 siblings, 1 reply; 12+ messages in thread From: Doug Anderson @ 2015-06-05 20:24 UTC (permalink / raw) To: linux-arm-kernel Russell, On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by > generic code. Oh, I see. So (I think) you're saying that perhaps the reason that Caesar needed his patch was that he needed the dying processor to execute v7_exit_coherency_flush(), NOT that he needed the dying processor to be in WFI/WFE. That actually makes a lot more sense to me! :) Thanks a lot for pointing that out, it's very helpful. > So, we're actually in a very sticky position over taking CPUs offline. > It seems to be something that the ARM architecture and kernel > architecture doesn't actually allow to be done safely. So much so, > that in a similar way to the original Keystone 2 physical address > switch, I'm tempted to make taking a CPU offline taint the kernel! Wow, that's going to suck. So if you want to suspend / resume you need to taint your kernel. So much for saving the planet by going into suspend... ...or are you thinking that it won't taint the kernel when the kernel takes CPUs offline for suspend/resume purposes? ...or are you thinking you've some solution that works for suspend/resume that doesn't work for the general cpu offlining problem? I'd be very interested to hear... I know I'm not a maintainer, but if I were and I knew that lots of smart people had thought about the problem of CPU offlining and they didn't have a solution and I could make my platform 99.99999999% reliable by allowing a very safe mdelay(1) where I had a pretty strong guarantee that the 1ms was enough time, I would probably accept that code... So since I'm not a maintainer and I certainly couldn't ack such code, I would certainly be happy to add my Reviewed-by to Caesar's patch if he changed it mention that he needed to make sure that v7_exit_coherency_flush() in rockchip_cpu_die() executed in time. -Doug ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-05 20:24 ` Doug Anderson @ 2015-06-08 4:56 ` Caesar Wang 2015-06-08 20:52 ` Doug Anderson 0 siblings, 1 reply; 12+ messages in thread From: Caesar Wang @ 2015-06-08 4:56 UTC (permalink / raw) To: linux-arm-kernel ? 2015?06?06? 04:24, Doug Anderson ??: > Russell, > > On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by >> generic code. > Oh, I see. So (I think) you're saying that perhaps the reason that > Caesar needed his patch was that he needed the dying processor to > execute v7_exit_coherency_flush(), NOT that he needed the dying > processor to be in WFI/WFE. That actually makes a lot more sense to > me! :) Thanks a lot for pointing that out, it's very helpful. > > >> So, we're actually in a very sticky position over taking CPUs offline. >> It seems to be something that the ARM architecture and kernel >> architecture doesn't actually allow to be done safely. So much so, >> that in a similar way to the original Keystone 2 physical address >> switch, I'm tempted to make taking a CPU offline taint the kernel! > Wow, that's going to suck. So if you want to suspend / resume you > need to taint your kernel. So much for saving the planet by going > into suspend... ...or are you thinking that it won't taint the kernel > when the kernel takes CPUs offline for suspend/resume purposes? ...or > are you thinking you've some solution that works for suspend/resume > that doesn't work for the general cpu offlining problem? I'd be very > interested to hear... > > > I know I'm not a maintainer, but if I were and I knew that lots of > smart people had thought about the problem of CPU offlining and they > didn't have a solution and I could make my platform 99.99999999% > reliable by allowing a very safe mdelay(1) where I had a pretty strong > guarantee that the 1ms was enough time, I would probably accept that > code... > > > So since I'm not a maintainer and I certainly couldn't ack such code, > I would certainly be happy to add my Reviewed-by to Caesar's patch if > he changed it mention that he needed to make sure that > v7_exit_coherency_flush() in rockchip_cpu_die() executed in time. OK. The dying processor to execute v7_exit_coherency_flush(),not that the dying processor to be in WFI/WFE. It's needed to enter WFI/WFE state from the ARM refer document when CPU down. But...... Here is my test: (won't to enter the WFI state) @@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu) static void rockchip_cpu_die(unsigned int cpu) { v7_exit_coherency_flush(louis); - while (1) - cpu_do_idle(); + while (1); + //cpu_do_idle(); } echo 0 > /sys/module/printk/parameters/console_suspend echo 1 > /sys/power/pm_print_times echo mem > sys/power/state You can play anything or do some test for CPU up/down: cd /sys/devices/system/cpu/ for i in $(seq 10000); do echo "================= $i ============" for j in $(seq 100); do while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "000" ]]; do echo 0 > cpu1/online echo 0 > cpu2/online echo 0 > cpu3/online done while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "111" ]]; do echo 1 > cpu1/online echo 1 > cpu2/online echo 1 > cpu3/online done done done Sometimes,the system will be restart when do the about test. I'm no sure what's happen, That maybe abnormal won't to enter the WFI state. > > -Doug > > > -- Thanks, - Caesar ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state 2015-06-08 4:56 ` Caesar Wang @ 2015-06-08 20:52 ` Doug Anderson 0 siblings, 0 replies; 12+ messages in thread From: Doug Anderson @ 2015-06-08 20:52 UTC (permalink / raw) To: linux-arm-kernel Caesar, On Sun, Jun 7, 2015 at 9:56 PM, Caesar Wang <wxt@rock-chips.com> wrote: > OK. > The dying processor to execute v7_exit_coherency_flush(),not that the dying > processor to be in WFI/WFE. > > It's needed to enter WFI/WFE state from the ARM refer document when CPU > down. > > But...... > > Here is my test: (won't to enter the WFI state) > @@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu) > static void rockchip_cpu_die(unsigned int cpu) > { > v7_exit_coherency_flush(louis); > - while (1) > - cpu_do_idle(); > + while (1); > + //cpu_do_idle(); > } > > echo 0 > /sys/module/printk/parameters/console_suspend > echo 1 > /sys/power/pm_print_times > echo mem > sys/power/state > > You can play anything > or do some test for CPU up/down: > > cd /sys/devices/system/cpu/ > for i in $(seq 10000); do > echo "================= $i ============" > for j in $(seq 100); do > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != > "000" ]]; do > > echo 0 > cpu1/online > echo 0 > cpu2/online > echo 0 > cpu3/online > done > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != > "111" ]]; do > echo 1 > cpu1/online > echo 1 > cpu2/online > echo 1 > cpu3/online > done > done > done > Sometimes,the system will be restart when do the about test. I assume you meant the _above_ test. So it sometimes works and sometimes doesn't? Strange... > I'm no sure what's happen, That maybe abnormal won't to enter the WFI state. Good test. I think I understood what you said above: basically if you don't get to the WFI state then the system will sometimes restart. I guess that seems a little strange to me since I would have thought that since you assert "reset" to the CPU before you power it off it wouldn't matter a whole lot what state the system was in. The processor has exited concurrency and isn't touching any memory, so nothing it does should be hurting anyone. ...but I also am only guessing about how all this works / trying to use common sense. If we really need to be in WFI/WFE then I guess that's what we need to do. It still makes me nervous to say both that we "need to be in WFI" and that we have a loop around cpu_do_idle(). The loop implies that cpu_do_idle() might return sometimes. That would be OK, except now we've learned that if we happen to kill the CPU while we're executing the loop that it might crash the system. That's pretty non-ideal. I know the chances are incredibly unlikely, but still something that worries me... -Doug ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 3/3] ARM: rockchip: fix the SMP code style 2015-06-05 15:11 [PATCH v3 0/3] ARM: rockchip: fix the SMP Caesar Wang 2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang 2015-06-05 15:11 ` [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state Caesar Wang @ 2015-06-05 15:11 ` Caesar Wang 2 siblings, 0 replies; 12+ messages in thread From: Caesar Wang @ 2015-06-05 15:11 UTC (permalink / raw) To: linux-arm-kernel Use the below scripts to check: scripts/checkpatch.pl -f --subject arch/arm/mach-rockchip/platsmp.c Although there is a check, it's no matter. CHECK: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt +167udelay(10); total: 0 errors, 0 warnings, 1 checks, 362 lines checked Changes in v3: - FIx the PATCH v2, it doesn't work on chromium 3.14. Changes in v2: - As Kever points out, Fix the subject typo WIF/WFI in PATCH [2/3]. - As Heiko suggestion, re-adjust the cpu on/off flow in PATCH [1/3]. - Use the checkpatch.pl -f --subjective to check in PATCH [3/3]. Signed-off-by: Caesar Wang <wxt@rock-chips.com> --- arch/arm/mach-rockchip/platsmp.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 6672fdd..ac9173e 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -113,7 +113,7 @@ static int pmu_set_power_domain(int pd, bool on) ret = pmu_power_domain_is_on(pd); if (ret < 0) { pr_err("%s: could not read power domain state\n", - __func__); + __func__); return ret; } } @@ -137,7 +137,7 @@ static int __cpuinit rockchip_boot_secondary(unsigned int cpu, if (cpu >= ncores) { pr_err("%s: cpu %d outside maximum number of cpus %d\n", - __func__, cpu, ncores); + __func__, cpu, ncores); return -ENXIO; } @@ -156,7 +156,7 @@ static int __cpuinit rockchip_boot_secondary(unsigned int cpu, * */ udelay(10); writel(virt_to_phys(rockchip_secondary_startup), - sram_base_addr + 8); + sram_base_addr + 8); writel(0xDEADBEAF, sram_base_addr + 4); dsb_sev(); } @@ -335,7 +335,7 @@ static int rockchip_cpu_kill(unsigned int cpu) static void rockchip_cpu_die(unsigned int cpu) { v7_exit_coherency_flush(louis); - while(1) + while (1) cpu_do_idle(); } #endif @@ -348,4 +348,5 @@ static struct smp_operations rockchip_smp_ops __initdata = { .cpu_die = rockchip_cpu_die, #endif }; + CPU_METHOD_OF_DECLARE(rk3066_smp, "rockchip,rk3066-smp", &rockchip_smp_ops); -- 1.9.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2015-06-08 20:52 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-06-05 15:11 [PATCH v3 0/3] ARM: rockchip: fix the SMP Caesar Wang 2015-06-05 15:11 ` [PATCH v3 1/3] ARM: rockchip: fix the CPU soft reset Caesar Wang 2015-06-05 15:52 ` Heiko Stübner 2015-06-05 16:17 ` Heiko Stübner 2015-06-05 16:28 ` Doug Anderson 2015-06-05 15:11 ` [PATCH v3 2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state Caesar Wang 2015-06-05 17:49 ` Doug Anderson 2015-06-05 18:29 ` Russell King - ARM Linux 2015-06-05 20:24 ` Doug Anderson 2015-06-08 4:56 ` Caesar Wang 2015-06-08 20:52 ` Doug Anderson 2015-06-05 15:11 ` [PATCH v3 3/3] ARM: rockchip: fix the SMP code style Caesar Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).