From mboxrd@z Thu Jan 1 00:00:00 1970 From: Heiko =?ISO-8859-1?Q?St=FCbner?= Subject: Re: [PATCH 1/3] ARM: rockchip: fix the CPU soft reset Date: Fri, 05 Jun 2015 11:46:21 +0200 Message-ID: <26548754.3lOPTtJQ55@diego> References: <1433479677-18086-1-git-send-email-wxt@rock-chips.com> <1604172.FydsUUUXNY@diego> <55716512.9050603@rock-chips.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: <55716512.9050603@rock-chips.com> Sender: linux-kernel-owner@vger.kernel.org To: Caesar Wang Cc: dianders@chromium.org, Dmitry Torokhov , linux-rockchip@lists.infradead.org, Russell King , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org List-Id: linux-rockchip.vger.kernel.org Am Freitag, 5. Juni 2015, 17:00:02 schrieb Caesar Wang: > Heiko, >=20 > =E5=9C=A8 2015=E5=B9=B406=E6=9C=8805=E6=97=A5 16:45, Heiko St=C3=BCbn= er =E5=86=99=E9=81=93: > > Hi Caesar, > >=20 > >=20 > > thanks for investigating this. > >=20 > > Am Freitag, 5. Juni 2015, 12:47:55 schrieb Caesar Wang: > >> In general, the correct flow is: > >>=20 > >> cpu off: > >> reset_control_assert > >> regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), BIT(pd)) > >>=20 > >> cpu on: > >> regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) > >> reset_control_deassert > >>=20 > >> You can repro it with bringing CPU up and down. > >> Says:(test scripts) > >>=20 > >> cd /sys/devices/system/cpu/ > >> for i in $(seq 1000); do > >> echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D $i = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" > >> =20 > >> for j in $(seq 100); do > >> =20 > >> while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat > >>=20 > >> cpu3/online)" !=3D "000" ]]; do echo 0 > cpu1/online > >>=20 > >> echo 0 > cpu2/online > >> echo 0 > cpu3/online > >> =20 > >> done > >> while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat > >>=20 > >> cpu3/online)" !=3D "111" ]]; do echo 1 > cpu1/online > >>=20 > >> echo 1 > cpu2/online > >> echo 1 > cpu3/online > >> =20 > >> done > >> =20 > >> done > >> =20 > >> done > >>=20 > >> The following is reproducile log: > >> [34466.186812] PM: noirq suspend of devices complete after 0.669 m= secs > >> [34466.186824] Disabling non-boot CPUs ... > >> [34466.187509] CPU1: shutdown > >> [34466.188672] CPU2: shutdown > >> [34473.736627] Kernel panic - not syncing: Watchdog detected hard = LOCKUP > >> on > >> cpu 0 [34473.736646] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.1= 4.0 #1 > >> [34473.736687] [] (unwind_backtrace) from [] > >> (show_stack+0x20/0x24) [34473.736711] [] (show_stack) fr= om > >> [] (dump_stack+0x70/0x8c) [34473.736731] [] > >> (dump_stack) from [] (panic+0xa8/0x1fc) [34473.736754] > >> [] (panic) from [] (watchdog_timer_fn+0x234/0x= 26c) > >> [34473.736777] [] (watchdog_timer_fn) from [] > >> (__run_hrtimer+0x118/0x1e0) [34473.736797] [] (__run_hrt= imer) > >> from [] (hrtimer_interrupt+0x148/0x2a0) [34473.736820] > >> [] (hrtimer_interrupt) from [] > >> (arch_timer_handler_phys+0x38/0x48) [34473.736844] [] > >> (arch_timer_handler_phys) from [] > >> (handle_percpu_devid_irq+0xb8/0x124) [34473.736867] [] > >> (handle_percpu_devid_irq) from [] > >> (generic_handle_irq+0x30/0x40) > >> [34473.736887] [] (generic_handle_irq) from [] > >> (__handle_domain_irq+0x8c/0xb0) [34473.736905] [] > >> (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x6c) > >> [34473.736922] [] (gic_handle_irq) from [] > >> (__irq_svc+0x40/0x50) [34473.736936] Exception stack(0xee127f70 to > >> 0xee127fb8) > >> [34473.736948] 7f60: ffffffed > >> 00000000 > >> 2dd6d000 00000000 [34473.736964] 7f80: ee126000 00000015 c0b46bac > >> c0b46bac > >> 0000406a 410fc0d1 00000000 ee127fc4 [34473.736979] 7fa0: ee127fb8 > >> ee127fb8 > >> c0107038 c010703c 600f0013 ffffffff [34473.736995] [] > >> (__irq_svc) > >> from [] (arch_cpu_idle+0x40/0x48) [34473.737013] [] > >> (arch_cpu_idle) from [] (cpu_startup_entry+0x170/0x1d0) > >> [34473.737031] [] (cpu_startup_entry) from [] > >> (secondary_start_kernel+0x138/0x160) [34473.737059] [] > >> (secondary_start_kernel) from [<00100464>] (0x100464) [34474.90374= 0] SMP: > >> failed to stop secondary CPUs > >> [34476.099964] SMP: failed to stop secondary CPUs > >> ... > >>=20 > >> Signed-off-by: Caesar Wang > >> --- > >>=20 > >> arch/arm/mach-rockchip/platsmp.c | 13 +++++-------- > >> 1 file changed, 5 insertions(+), 8 deletions(-) > >>=20 > >> diff --git a/arch/arm/mach-rockchip/platsmp.c > >> b/arch/arm/mach-rockchip/platsmp.c index 5b4ca3c..1230d3d 100644 > >> --- a/arch/arm/mach-rockchip/platsmp.c > >> +++ b/arch/arm/mach-rockchip/platsmp.c > >> @@ -88,20 +88,17 @@ static int pmu_set_power_domain(int pd, bool o= n) > >>=20 > >> return PTR_ERR(rstc); > >> =09 > >> } > >>=20 > >> - if (on) > >> + if (on) { > >> + regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > >>=20 > >> reset_control_deassert(rstc); > >>=20 > >> - else > >> + } else { > >>=20 > >> reset_control_assert(rstc); > >>=20 > >> + regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > >> + } > >=20 > > you're loosing the return value of regmap_update_bits here, I guess= it > > should look like below? > >=20 > > if (on) { > > =09 > > ret =3D regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); >=20 > I search the kernel, Nobody do that return value. counter-example: http://lxr.free-electrons.com/source/drivers/phy/phy-exynos5250-sata.c#= L76 ;-) > Need we are that? > I will fix it on next patch if it's need. yes, while not very probable, it is nevertheless still good practice to= handle=20 it. But when chaning the logic as we observed below, the regmap update = part=20 can already stay how it is now, as you can simply do something like: if (!on) reset_control_assert(rstc); regmap_update_bits wait_for_pd if (on) reset_control_deassert(rstc); Heiko > > if (ret < 0) { > > =09 > > pr_err("%s: could not update power domain\n", __func__); > > =09 > > reset_control_put(rstc); > > =09 > > return ret; > > =09 > > } > > =09 > > reset_control_deassert(rstc); > > =09 > > } else { > > =09 > > reset_control_assert(rstc); > > =09 > > ret =3D regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > > if (ret < 0) { > > =09 > > pr_err("%s: could not update power domain\n", __func__); > > =09 > > reset_control_put(rstc); > > =09 > > return ret; > > =09 > > } > > =09 > > } > > =09 > >> reset_control_put(rstc); > >> =09 > >> } > >>=20 > >> - ret =3D regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val); > >> - if (ret < 0) { > >> - pr_err("%s: could not update power domain\n", __func__); > >> - return ret; > >> - } > >> - > >>=20 > >> ret =3D -1; > >> while (ret !=3D on) { > >> =09 > >> ret =3D pmu_power_domain_is_on(pd); > >=20 > > second question - with this patch, what happens actually is > >=20 > > cpu off: > > reset_control_assert > > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), BIT(pd)) > > wait_for_power_domain_to_turn_off > > =20 > > cpu on: > > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) > > reset_control_deassert > > wait_for_power_domain_to_turn_on > >=20 > > So shouldn't the deassertion of the reset happen after the powerdom= ain > > sucessfull turned on? Like > >=20 > > cpu on: > > regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), 0) > > wait_for_power_domain_to_turn_on > > reset_control_deassert >=20 > You are right.