* omap4-panda-es boot issues with v3.15-rc4
@ 2014-05-08 12:53 Roger Quadros
  2014-05-08 15:31 ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-08 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing
similar issues, although less frequent, with v3.15-rc4 as well.

Configuration:

- kernel v3.15-rc4 or linux-next (20140507)
- multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
- u-boot/master 173d294b94cf

Observations:

- Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops.
e.g. http://www.hastebin.com/ebumojegoq.vhdl

- Hang more noticeable on linux-next (20140507) than on v3.15-rc4

- Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD.
Maybe related to when high speed interrupts occur in the boot process.

- On successful boots following warning is seen
[ 4.010375] gic_timer_retrigger: lost localtimer interrupt

- On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in
ON state as well. It does blink again when doing activity on console.

Workaround:

- Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.

I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

cheers,
-roger

^ permalink raw reply	[flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 12:53 omap4-panda-es boot issues with v3.15-rc4 Roger Quadros
@ 2014-05-08 15:31 ` Kevin Hilman
  2014-05-08 15:40   ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2014-05-08 15:31 UTC (permalink / raw)
  To: linux-arm-kernel

Roger Quadros <rogerq@ti.com> writes:

> Hi,
>
> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing
> similar issues, although less frequent, with v3.15-rc4 as well.
>
> Configuration:
>
> - kernel v3.15-rc4 or linux-next (20140507)
> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
> - u-boot/master 173d294b94cf
>
> Observations:
>
> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops.
> e.g. http://www.hastebin.com/ebumojegoq.vhdl
>
> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests. For
me, next-20140508 is failing most of the time now.

> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD.
> Maybe related to when high speed interrupts occur in the boot process.
>
> - On successful boots following warning is seen
> [ 4.010375] gic_timer_retrigger: lost localtimer interrupt
>
> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in
> ON state as well. It does blink again when doing activity on console.
>
> Workaround:
>
> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>
> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
go away. Hmm....

Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-08 15:31 ` Kevin Hilman @ 2014-05-08 15:40 ` Kevin Hilman 2014-05-08 16:55 ` Tony Lindgren 2014-05-08 17:12 ` Grygorii Strashko 0 siblings, 2 replies; 18+ messages in thread From: Kevin Hilman @ 2014-05-08 15:40 UTC (permalink / raw) To: linux-arm-kernel On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote: > Roger Quadros <rogerq@ti.com> writes: > >> Hi, >> >> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing >> similar issues, although less frequent, with v3.15-rc4 as well. >> >> Configuration: >> >> - kernel v3.15-rc4 or linux-next (20140507) >> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled >> - u-boot/master 173d294b94cf >> >> Observations: >> >> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. >> e.g. http://www.hastebin.com/ebumojegoq.vhdl >> >> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4 > > I've been noticing the same thing for a while with my boot tests. For > me, next-20140508 is failing most of the time now. > >> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. >> Maybe related to when high speed interrupts occur in the boot process. >> >> - On successful boots following warning is seen >> [ 4.010375] gic_timer_retrigger: lost localtimer interrupt >> >> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in >> ON state as well. It does blink again when doing activity on console. >> >> Workaround: >> >> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues. >> >> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR. > > I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem > go away. Hmm....
Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next. Kevin ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-08 15:40 ` Kevin Hilman @ 2014-05-08 16:55 ` Tony Lindgren 2014-05-08 18:40 ` Tony Lindgren 2014-05-09 8:20 ` Roger Quadros 2014-05-08 17:12 ` Grygorii Strashko 1 sibling, 2 replies; 18+ messages in thread From: Tony Lindgren @ 2014-05-08 16:55 UTC (permalink / raw) To: linux-arm-kernel * Kevin Hilman <khilman@linaro.org> [140508 08:40]: > On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote: > > Roger Quadros <rogerq@ti.com> writes: > > > >> Hi, > >> > >> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing > >> similar issues, although less frequent, with v3.15-rc4 as well. > >> > >> Configuration: > >> > >> - kernel v3.15-rc4 or linux-next (20140507) > >> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled > >> - u-boot/master 173d294b94cf > >> > >> Observations: > >> > >> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. > >> e.g. http://www.hastebin.com/ebumojegoq.vhdl > >> > >> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4 > > > > I've been noticing the same thing for a while with my boot tests. For > > me, next-20140508 is failing most of the time now. > > > >> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. > >> Maybe related to when high speed interrupts occur in the boot process. > >> > >> - On successful boots following warning is seen > >> [ 4.010375] gic_timer_retrigger: lost localtimer interrupt > >> > >> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in > >> ON state as well. It does blink again when doing activity on console. > >> > >> Workaround: > >> > >> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
> >> > >> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR. > > > > I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem > > go away. Hmm.... > > Another finger pointing in the same direction: omap2plus_defconfig + > CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's > -next. Booting today's next with multi_v7_defconfig (so cpuidle enabled) on omap4 sdp seems to boot reliably. And it's not producing these: gic_timer_retrigger: lost localtimer interrupt while panda is producing those errors like Roger mentioned. It seems that the USB networking is the main difference between omap4 sdp and panda? Regards, Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-08 16:55 ` Tony Lindgren @ 2014-05-08 18:40 ` Tony Lindgren 2014-05-08 22:15 ` Kevin Hilman 2014-05-09 8:20 ` Roger Quadros 1 sibling, 1 reply; 18+ messages in thread From: Tony Lindgren @ 2014-05-08 18:40 UTC (permalink / raw) To: linux-arm-kernel Added few cpuidle people to Cc on this regression. * Tony Lindgren <tony@atomide.com> [140508 09:57]: > * Kevin Hilman <khilman@linaro.org> [140508 08:40]: > > On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote: > > > Roger Quadros <rogerq@ti.com> writes: > > > > > >> Hi, > > >> > > >> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing > > >> similar issues, although less frequent, with v3.15-rc4 as well. > > >> > > >> Configuration: > > >> > > >> - kernel v3.15-rc4 or linux-next (20140507) > > >> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled > > >> - u-boot/master 173d294b94cf > > >> > > >> Observations: > > >> > > >> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. > > >> e.g. http://www.hastebin.com/ebumojegoq.vhdl > > >> > > >> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4 > > > > > > I've been noticing the same thing for a while with my boot tests. For > > > me, next-20140508 is failing most of the time now. > > > > > >> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. > > >> Maybe related to when high speed interrupts occur in the boot process. > > >> > > >> - On successful boots following warning is seen > > >> [ 4.010375] gic_timer_retrigger: lost localtimer interrupt > > >> > > >> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in > > >> ON state as well. It does blink again when doing activity on console.
> > >> > > >> Workaround: > > >> > > >> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues. > > >> > > >> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR. > > > > > > I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem > > > go away. Hmm.... > > > > Another finger pointing in the same direction: omap2plus_defconfig + > > CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's > > -next. > > Booting today's next with multi_v7_defconfig (so cpuidle enabled) on > omap4 sdp seems to boot reliably. And it's not producing these: > > gic_timer_retrigger: lost localtimer interrupt Still seeing the above, looks like the lost localtimer interrupt above is a separate issue.. > while panda is producing those errors like Roger mentioned. > > It seems that the USB networking is the main difference between > omap4 sdp and panda? ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell. Regards, Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-08 18:40 ` Tony Lindgren @ 2014-05-08 22:15 ` Kevin Hilman 2014-05-09 8:23 ` Roger Quadros 0 siblings, 1 reply; 18+ messages in thread From: Kevin Hilman @ 2014-05-08 22:15 UTC (permalink / raw) To: linux-arm-kernel Tony Lindgren <tony@atomide.com> writes: [...] > ..but I think I found the cause for recent hangs on panda, just a wild > guess based on looking at the recent cpuidle patches after v3.14. > > Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts > until all coupled CPUs leave idle) makes booting work reliably again > on panda. > > Can you guys confirm, so far no issues here after few boot tests, > but it might be too early to tell. Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.) Kevin ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 22:15 ` Kevin Hilman
@ 2014-05-09  8:23 ` Roger Quadros
  2014-05-09 23:45   ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-09 8:23 UTC (permalink / raw)
  To: linux-arm-kernel

Kevin,

On 05/09/2014 01:15 AM, Kevin Hilman wrote:
> Tony Lindgren <tony@atomide.com> writes:
>
> [...]
>
>> ..but I think I found the cause for recent hangs on panda, just a wild
>> guess based on looking at the recent cpuidle patches after v3.14.
>>
>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>> until all coupled CPUs leave idle) makes booting work reliably again
>> on panda.
>>
>> Can you guys confirm, so far no issues here after few boot tests,
>> but it might be too early to tell.
>
> Reverting that makes things a bit more stable, but it still eventually
> fails in the same way. For me it took 8 boots for it to eventually
> fail.
>
> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
> (20+ boots in a row and still going.)
>

Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
It worked for me 10/10 boots.

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..99362ff 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -206,7 +206,12 @@ static struct cpuidle_driver omap4_idle_driver = {
 			.desc = "CPUx OFF, MPUSS OSWR",
 		},
 	},
-	.state_count = ARRAY_SIZE(omap4_idle_data),
+/*
+ * Disable C3 state since it is unstable
+ *
+ * .state_count = ARRAY_SIZE(omap4_idle_data),
+ */
+	.state_count = 2,
 	.safe_state_index = 0,
 };

^ permalink raw reply related	[flat|nested] 18+ messages in thread
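As a rough illustration of why setting .state_count to 2 keeps C3 from ever being entered, here is a minimal userspace sketch. It is not kernel code. It assumes, as a simplification, that a governor only considers states whose index is below state_count. The struct toy_state type, the pick_deepest_state() helper and the C1 numbers are hypothetical; the C2 and C3 latency and residency values are the ones that appear in the driver table quoted elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Toy stand-in for struct cpuidle_state; not the kernel definition. */
struct toy_state {
	const char *name;
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

const struct toy_state toy_states[] = {
	{ "C1", 4, 5 },			/* hypothetical values */
	{ "C2", 328 + 440, 960 },	/* as in the quoted driver table */
	{ "C3", 460 + 518, 1100 },	/* as in the quoted driver table */
};

/*
 * Return the index of the deepest state whose target residency fits in
 * the predicted idle time, looking only at the first state_count
 * entries. Falls back to state 0 when nothing deeper fits.
 */
int pick_deepest_state(const struct toy_state *states, size_t state_count,
		       unsigned int predicted_idle_us)
{
	int best = 0;
	size_t i;

	assert(states != NULL && state_count > 0);

	for (i = 1; i < state_count; i++) {
		if (states[i].target_residency <= predicted_idle_us)
			best = (int)i;
	}
	return best;
}
```

With the full table, pick_deepest_state(toy_states, ARRAY_SIZE(toy_states), 2000) selects index 2 (C3). With state_count passed as 2, the same call can return at most index 1 (C2), which is the effect the test patch above relies on.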
* omap4-panda-es boot issues with v3.15-rc4 2014-05-09 8:23 ` Roger Quadros @ 2014-05-09 23:45 ` Kevin Hilman 2014-05-11 15:55 ` Tony Lindgren 0 siblings, 1 reply; 18+ messages in thread From: Kevin Hilman @ 2014-05-09 23:45 UTC (permalink / raw) To: linux-arm-kernel Roger Quadros <rogerq@ti.com> writes: > Kevin, > > On 05/09/2014 01:15 AM, Kevin Hilman wrote: >> Tony Lindgren <tony@atomide.com> writes: >> >> [...] >> >>> ..but I think I found the cause for recent hangs on panda, just a wild >>> guess based on looking at the recent cpuidle patches after v3.14. >>> >>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>> until all coupled CPUs leave idle) makes booting work reliably again >>> on panda. >>> >>> Can you guys confirm, so far no issues here after few boot tests, >>> but it might be too early to tell. >> >> Reverting that makes things a bit more stable, but it still eventually >> fails in the same way. For me it took 8 boots for it to eventually >> fail. >> >> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >> (20+ boots in a row and still going.) >> > > Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? > It worked for me 10/10 boots. Yup, it worked for me too for 10/10 boots in a row. Kevin ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-09 23:45 ` Kevin Hilman @ 2014-05-11 15:55 ` Tony Lindgren 2014-05-12 21:40 ` Santosh Shilimkar 0 siblings, 1 reply; 18+ messages in thread From: Tony Lindgren @ 2014-05-11 15:55 UTC (permalink / raw) To: linux-arm-kernel * Kevin Hilman <khilman@linaro.org> [140509 16:46]: > Roger Quadros <rogerq@ti.com> writes: > > > Kevin, > > > > On 05/09/2014 01:15 AM, Kevin Hilman wrote: > >> Tony Lindgren <tony@atomide.com> writes: > >> > >> [...] > >> > >>> ..but I think I found the cause for recent hangs on panda, just a wild > >>> guess based on looking at the recent cpuidle patches after v3.14. > >>> > >>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts > >>> until all coupled CPUs leave idle) makes booting work reliably again > >>> on panda. > >>> > >>> Can you guys confirm, so far no issues here after few boot tests, > >>> but it might be too early to tell. > >> > >> Reverting that makes things a bit more stable, but it still eventually > >> fails in the same way. For me it took 8 boots for it to eventually > >> fail. > >> > >> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable > >> (20+ boots in a row and still going.) > >> > > > > Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? > > It worked for me 10/10 boots. > > Yup, it worked for me too for 10/10 boots in a row. But what has caused this regression, does it work reliably with let's say v3.13 or v3.12? Regards, Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4 2014-05-11 15:55 ` Tony Lindgren @ 2014-05-12 21:40 ` Santosh Shilimkar 2014-05-12 22:07 ` Tony Lindgren 2014-05-12 23:56 ` Kevin Hilman 0 siblings, 2 replies; 18+ messages in thread From: Santosh Shilimkar @ 2014-05-12 21:40 UTC (permalink / raw) To: linux-arm-kernel On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: > * Kevin Hilman <khilman@linaro.org> [140509 16:46]: >> Roger Quadros <rogerq@ti.com> writes: >> >>> Kevin, >>> >>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: >>>> Tony Lindgren <tony@atomide.com> writes: >>>> >>>> [...] >>>> >>>>> ..but I think I found the cause for recent hangs on panda, just a wild >>>>> guess based on looking at the recent cpuidle patches after v3.14. >>>>> >>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>>>> until all coupled CPUs leave idle) makes booting work reliably again >>>>> on panda. >>>>> >>>>> Can you guys confirm, so far no issues here after few boot tests, >>>>> but it might be too early to tell. >>>> >>>> Reverting that makes things a bit more stable, but it still eventually >>>> fails in the same way. For me it took 8 boots for it to eventually >>>> fail. >>>> >>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >>>> (20+ boots in a row and still going.) >>>> >>> >>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? >>> It worked for me 10/10 boots. >> >> Yup, it worked for me too for 10/10 boots in a row. > > But what has caused this regression, does it work reliably with let's > say v3.13 or v3.12? > IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back. Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch. 
regards,
Santosh

From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Mon, 12 May 2014 17:37:59 -0400
Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

Conflicts:

	arch/arm/mach-omap2/cpuidle44xx.c
---
 arch/arm/mach-omap2/cpuidle44xx.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..aae3606 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 {
 	struct idle_statedata *cx = state_ptr + index;
 	u32 mpuss_can_lose_context = 0;
+	int cpu_id = smp_processor_id();

 	/*
 	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
@@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
 				 (cx->mpu_logic_state == PWRDM_POWER_OFF);

+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
+
 	/*
 	 * Call idle CPU PM enter notifier chain so that
 	 * VFP and per CPU interrupt context is saved.
@@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	if (dev->cpu == 0 && mpuss_can_lose_context)
 		cpu_cluster_pm_exit();

+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
+
 fail:
 	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
 	cpu_done[dev->cpu] = false;
@@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
 			.exit_latency = 328 + 440,
 			.target_residency = 960,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C2",
 			.desc = "CPUx OFF, MPUSS CSWR",
@@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
 			.exit_latency = 460 + 518,
 			.target_residency = 1100,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C3",
 			.desc = "CPUx OFF, MPUSS OSWR",
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread
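The revert above moves the timer broadcast notifications out of the cpuidle core (which issues them when a state carries CPUIDLE_FLAG_TIMER_STOP) and back into the OMAP4 driver. The toy model below only shows how the order of events differs between the two arrangements. It rests on an assumption not stated in this thread: that the core of that era issued the ENTER notification before the coupled-CPU rendezvous and the EXIT notification after the driver returned. All names here (TOY_FLAG_TIMER_STOP, toy_idle_call() and the event strings) are hypothetical, and nothing in it establishes that this ordering is what causes the hang.

```c
#include <string.h>

#define TOY_FLAG_TIMER_STOP 0x1u	/* stand-in for CPUIDLE_FLAG_TIMER_STOP */

static char trace[256];

/* Append one event name to the trace, separated by ';'. */
static void log_event(const char *ev)
{
	strcat(trace, ev);
	strcat(trace, ";");
}

/* Stand-in for the driver's enter callback (omap_enter_idle_coupled). */
static void toy_driver_enter(int notify_in_driver)
{
	if (notify_in_driver)
		log_event("broadcast-enter");
	log_event("low-power");
	if (notify_in_driver)
		log_event("broadcast-exit");
	log_event("exit-barrier");
}

/*
 * Stand-in for the core idle path. Returns the ordered list of events
 * for one idle entry as a ';'-separated string.
 */
const char *toy_idle_call(unsigned int flags, int notify_in_driver)
{
	trace[0] = '\0';

	if (flags & TOY_FLAG_TIMER_STOP)
		log_event("broadcast-enter");

	log_event("coupled-wait");	/* rendezvous with the other CPU */
	toy_driver_enter(notify_in_driver);

	if (flags & TOY_FLAG_TIMER_STOP)
		log_event("broadcast-exit");

	return trace;
}
```

In this model, toy_idle_call(TOY_FLAG_TIMER_STOP, 0) switches to the broadcast timer before waiting for the other CPU and switches back only after the exit barrier, whereas toy_idle_call(0, 1), the arrangement restored by the revert, brackets only the low-power entry itself.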
* omap4-panda-es boot issues with v3.15-rc4 2014-05-12 21:40 ` Santosh Shilimkar @ 2014-05-12 22:07 ` Tony Lindgren 2014-05-13 8:10 ` Roger Quadros 2014-05-12 23:56 ` Kevin Hilman 1 sibling, 1 reply; 18+ messages in thread From: Tony Lindgren @ 2014-05-12 22:07 UTC (permalink / raw) To: linux-arm-kernel * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]: > On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: > > * Kevin Hilman <khilman@linaro.org> [140509 16:46]: > >> Roger Quadros <rogerq@ti.com> writes: > >> > >>> Kevin, > >>> > >>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: > >>>> Tony Lindgren <tony@atomide.com> writes: > >>>> > >>>> [...] > >>>> > >>>>> ..but I think I found the cause for recent hangs on panda, just a wild > >>>>> guess based on looking at the recent cpuidle patches after v3.14. > >>>>> > >>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts > >>>>> until all coupled CPUs leave idle) makes booting work reliably again > >>>>> on panda. > >>>>> > >>>>> Can you guys confirm, so far no issues here after few boot tests, > >>>>> but it might be too early to tell. > >>>> > >>>> Reverting that makes things a bit more stable, but it still eventually > >>>> fails in the same way. For me it took 8 boots for it to eventually > >>>> fail. > >>>> > >>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable > >>>> (20+ boots in a row and still going.) > >>>> > >>> > >>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? > >>> It worked for me 10/10 boots. > >> > >> Yup, it worked for me too for 10/10 boots in a row. > > > > But what has caused this regression, does it work reliably with let's > > say v3.13 or v3.12? > > > IIRC things were stable till some CPUIDLE code consolidation happened. > I don't recall exactly but some one did discuss about it a while back. OK that's good to hear. > Can you re-run your test-cases with patch at end of the email. 
This > is just a hunch so don't blame me if I waste your time testing the > patch. Seems to work after adding "#include <linux/clockchips.h>". I did about 10 reboots and they all succeeded for me. Without your revert, I'm getting a hang (with sysrq not working) about 1/3 of the boots. Kevin, Roger, does the revert from Santosh work for you too? BTW, I think the RCU stall was/is a separate issue. That's different where the system actually recovers after about a minute, or after sysrq ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the older kernels around v3.10 time, or if it's still also happening. Regards, Tony > From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001 > From: Santosh Shilimkar <santosh.shilimkar@ti.com> > Date: Mon, 12 May 2014 17:37:59 -0400 > Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag" > > This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78. > > Conflicts: > > arch/arm/mach-omap2/cpuidle44xx.c > --- > arch/arm/mach-omap2/cpuidle44xx.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c > index 01fc710..aae3606 100644 > --- a/arch/arm/mach-omap2/cpuidle44xx.c > +++ b/arch/arm/mach-omap2/cpuidle44xx.c > @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > { > struct idle_statedata *cx = state_ptr + index; > u32 mpuss_can_lose_context = 0; > + int cpu_id = smp_processor_id(); > > /* > * CPU0 has to wait and stay ON until CPU1 is OFF state. > @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && > (cx->mpu_logic_state == PWRDM_POWER_OFF); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); > + > /* > * Call idle CPU PM enter notifier chain so that > * VFP and per CPU interrupt context is saved.
> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > if (dev->cpu == 0 && mpuss_can_lose_context) > cpu_cluster_pm_exit(); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); > + > fail: > cpuidle_coupled_parallel_barrier(dev, &abort_barrier); > cpu_done[dev->cpu] = false; > @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ > .exit_latency = 328 + 440, > .target_residency = 960, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C2", > .desc = "CPUx OFF, MPUSS CSWR", > @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ > .exit_latency = 460 + 518, > .target_residency = 1100, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C3", > .desc = "CPUx OFF, MPUSS OSWR", > -- > 1.7.9.5 > > ^ permalink raw reply [flat|nested] 18+ messages in thread
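A small worked calculation helps put "about 10 reboots and they all succeeded" next to "a hang about 1/3 of the boots". The sketch below assumes each boot fails independently with the same probability, which is an idealisation; prob_all_boots_ok() is a hypothetical helper, not something from the kernel or from this thread.

```c
/*
 * Probability that n boots in a row all succeed when each boot hangs
 * independently with probability p_fail.
 */
double prob_all_boots_ok(double p_fail, unsigned int n)
{
	double p = 1.0;
	unsigned int i;

	for (i = 0; i < n; i++)
		p *= 1.0 - p_fail;
	return p;
}
```

Under that assumption, prob_all_boots_ok(1.0 / 3.0, 10) is about 0.017, so ten clean boots would be unlikely if the 1/3 hang rate were still present. For a rarer failure, say a hypothetical 1 in 10, prob_all_boots_ok(0.1, 10) is about 0.35, so a 10/10 run alone would not rule it out, which matches the earlier experience in this thread of a hang only showing up on the 8th boot.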
* omap4-panda-es boot issues with v3.15-rc4 2014-05-12 22:07 ` Tony Lindgren @ 2014-05-13 8:10 ` Roger Quadros 2014-05-13 14:19 ` Santosh Shilimkar 0 siblings, 1 reply; 18+ messages in thread From: Roger Quadros @ 2014-05-13 8:10 UTC (permalink / raw) To: linux-arm-kernel On 05/13/2014 01:07 AM, Tony Lindgren wrote: > * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]: >> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: >>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]: >>>> Roger Quadros <rogerq@ti.com> writes: >>>> >>>>> Kevin, >>>>> >>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: >>>>>> Tony Lindgren <tony@atomide.com> writes: >>>>>> >>>>>> [...] >>>>>> >>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild >>>>>>> guess based on looking at the recent cpuidle patches after v3.14. >>>>>>> >>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>>>>>> until all coupled CPUs leave idle) makes booting work reliably again >>>>>>> on panda. >>>>>>> >>>>>>> Can you guys confirm, so far no issues here after few boot tests, >>>>>>> but it might be too early to tell. >>>>>> >>>>>> Reverting that makes things a bit more stable, but it still eventually >>>>>> fails in the same way. For me it took 8 boots for it to eventually >>>>>> fail. >>>>>> >>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >>>>>> (20+ boots in a row and still going.) >>>>>> >>>>> >>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? >>>>> It worked for me 10/10 boots. >>>> >>>> Yup, it worked for me too for 10/10 boots in a row. >>> >>> But what has caused this regression, does it work reliably with let's >>> say v3.13 or v3.12? >>> >> IIRC things were stable till some CPUIDLE code consolidation happened. >> I don't recall exactly but some one did discuss about it a while back. > > OK that's good to hear. > >> Can you re-run your test-cases with patch at end of the email. 
This >> is just a hunch so don't blame me if I waste your time testing the >> patch. > > Seems to work after adding "#include <linux/clockchips.h>". I did about 10 > reboots and they all succeeded for me. Without your revert, I'm getting > a hang (with sysrq not working) about 1/3 of the boots. > > Kevin, Roger, does the revert from Santosh work for you too? > next-20140508 worked for me 10/10 times with Santosh's patch. The heartbeat LED behaves normally as well. So I like it :). cheers, -roger > BTW, I think the RCU stall was/is a separate issue. That's different > where the system actually recovers after about a minute, or after sysrq > ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the > older kernels around v3.10 time, or if it's still also happening. > > Regards, > > Tony > >> From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001 >> From: Santosh Shilimkar <santosh.shilimkar@ti.com> >> Date: Mon, 12 May 2014 17:37:59 -0400 >> Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag" >> >> This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78. >> >> Conflicts: >> >> arch/arm/mach-omap2/cpuidle44xx.c >> --- >> arch/arm/mach-omap2/cpuidle44xx.c | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c >> index 01fc710..aae3606 100644 >> --- a/arch/arm/mach-omap2/cpuidle44xx.c >> +++ b/arch/arm/mach-omap2/cpuidle44xx.c >> @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> { >> struct idle_statedata *cx = state_ptr + index; >> u32 mpuss_can_lose_context = 0; >> + int cpu_id = smp_processor_id(); >> >> /* >> * CPU0 has to wait and stay ON until CPU1 is OFF state.
>> @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && >> (cx->mpu_logic_state == PWRDM_POWER_OFF); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >> + >> /* >> * Call idle CPU PM enter notifier chain so that >> * VFP and per CPU interrupt context is saved. >> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> if (dev->cpu == 0 && mpuss_can_lose_context) >> cpu_cluster_pm_exit(); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >> + >> fail: >> cpuidle_coupled_parallel_barrier(dev, &abort_barrier); >> cpu_done[dev->cpu] = false; >> @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ >> .exit_latency = 328 + 440, >> .target_residency = 960, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C2", >> .desc = "CPUx OFF, MPUSS CSWR", >> @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ >> .exit_latency = 460 + 518, >> .target_residency = 1100, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C3", >> .desc = "CPUx OFF, MPUSS OSWR", >> -- >> 1.7.9.5 >> >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-13  8:10 ` Roger Quadros
@ 2014-05-13 14:19   ` Santosh Shilimkar
  0 siblings, 0 replies; 18+ messages in thread
From: Santosh Shilimkar @ 2014-05-13 14:19 UTC
To: linux-arm-kernel

On Tuesday 13 May 2014 04:10 AM, Roger Quadros wrote:
> On 05/13/2014 01:07 AM, Tony Lindgren wrote:
>> * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]:
>>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>>>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>>>>> Roger Quadros <rogerq@ti.com> writes:
>>>>>
>>>>>> Kevin,
>>>>>>
>>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>>>> Tony Lindgren <tony@atomide.com> writes:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild
>>>>>>>> guess based on looking at the recent cpuidle patches after v3.14.
>>>>>>>>
>>>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>>>>>>>> until all coupled CPUs leave idle) makes booting work reliably again
>>>>>>>> on panda.
>>>>>>>>
>>>>>>>> Can you guys confirm, so far no issues here after few boot tests,
>>>>>>>> but it might be too early to tell.
>>>>>>>
>>>>>>> Reverting that makes things a bit more stable, but it still eventually
>>>>>>> fails in the same way. For me it took 8 boots for it to eventually
>>>>>>> fail.
>>>>>>>
>>>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
>>>>>>> (20+ boots in a row and still going.)
>>>>>>>
>>>>>>
>>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
>>>>>> It worked for me 10/10 boots.
>>>>>
>>>>> Yup, it worked for me too for 10/10 boots in a row.
>>>>
>>>> But what has caused this regression, does it work reliably with let's
>>>> say v3.13 or v3.12?
>>>>
>>> IIRC things were stable till some CPUIDLE code consolidation happened.
>>> I don't recall exactly but some one did discuss about it a while back.
>>
>> OK that's good to hear.
>>
>>> Can you re-run your test-cases with patch at end of the email. This
>>> is just a hunch so don't blame me if I waste your time testing the
>>> patch.
>>
>> Seems to work after adding "#include <linux/clockchips.h>". I did about 10
>> reboots and they all succeeded for me. Without your revert, I'm getting
>> a hang (with sysrq not working) about 1/3 of the boots.
>>
>> Kevin, Roger, does the revert from Santosh work for you too?
>>
>
> next-20140508 worked for me 10/10 times with Santosh's patch.
> The heartbeat LED behaves normally as well. So I like it :).
>
Great. Will post the patch with change log updated and cc you guys.

Regards,
Santosh
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-12 21:40 ` Santosh Shilimkar
  2014-05-12 22:07   ` Tony Lindgren
@ 2014-05-12 23:56   ` Kevin Hilman
  1 sibling, 0 replies; 18+ messages in thread
From: Kevin Hilman @ 2014-05-12 23:56 UTC
To: linux-arm-kernel

Santosh Shilimkar <santosh.shilimkar@ti.com> writes:

> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>>> Roger Quadros <rogerq@ti.com> writes:
>>>
>>>> Kevin,
>>>>
>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>> Tony Lindgren <tony@atomide.com> writes:
>>>>>
>>>>> [...]
>>>>>
>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild
>>>>>> guess based on looking at the recent cpuidle patches after v3.14.
>>>>>>
>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>>>>>> until all coupled CPUs leave idle) makes booting work reliably again
>>>>>> on panda.
>>>>>>
>>>>>> Can you guys confirm, so far no issues here after few boot tests,
>>>>>> but it might be too early to tell.
>>>>>
>>>>> Reverting that makes things a bit more stable, but it still eventually
>>>>> fails in the same way. For me it took 8 boots for it to eventually
>>>>> fail.
>>>>>
>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
>>>>> (20+ boots in a row and still going.)
>>>>>
>>>>
>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
>>>> It worked for me 10/10 boots.
>>>
>>> Yup, it worked for me too for 10/10 boots in a row.
>>
>> But what has caused this regression, does it work reliably with let's
>> say v3.13 or v3.12?
>>
> IIRC things were stable till some CPUIDLE code consolidation happened.
> I don't recall exactly but some one did discuss about it a while back.
>
> Can you re-run your test-cases with patch at end of the email. This
> is just a hunch so don't blame me if I waste your time testing the
> patch.

With your patch applied on top of next-20140512, my 4460 Panda-ES has
booted 25 times in a row, and still going.

Kevin
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 16:55 ` Tony Lindgren
  2014-05-08 18:40   ` Tony Lindgren
@ 2014-05-09  8:20   ` Roger Quadros
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Quadros @ 2014-05-09 8:20 UTC
To: linux-arm-kernel

On 05/08/2014 07:55 PM, Tony Lindgren wrote:
> * Kevin Hilman <khilman@linaro.org> [140508 08:40]:
>> On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote:
>>> Roger Quadros <rogerq@ti.com> writes:
>>>
>>>> Hi,
>>>>
>>>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing
>>>> similar issues, although less frequent, with v3.15-rc4 as well.
>>>>
>>>> Configuration:
>>>>
>>>> - kernel v3.15-rc4 or linux-next (20140507)
>>>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>>>> - u-boot/master 173d294b94cf
>>>>
>>>> Observations:
>>>>
>>>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops.
>>>>   e.g. http://www.hastebin.com/ebumojegoq.vhdl
>>>>
>>>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
>>>
>>> I've been noticing the same thing for awhile with my boot tests. For
>>> me, next-20140508 is failing most of the time now.
>>>
>>>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD.
>>>>   Maybe related to when high speed interrupts occur in the boot process.
>>>>
>>>> - On successful boots following warning is seen
>>>>   [ 4.010375] gic_timer_retrigger: lost localtimer interrupt
>>>>
>>>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in
>>>>   ON state as well. It does blink again when doing activity on console.
>>>>
>>>> Workaround:
>>>>
>>>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>>>
>>>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
>>>
>>> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
>>> go away. Hmm....
>>
>> Another finger pointing in the same direction: omap2plus_defconfig +
>> CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
>> -next.
>
> Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
> omap4 sdp seems to boot reliably. And it's not producing these:
>
> gic_timer_retrigger: lost localtimer interrupt
>
> while panda is producing those errors like Roger mentioned.
>
> It seems that the USB networking is the main difference between
> omap4 sdp and panda?

Is your sdp using omap4430?

To confirm 4430 vs 4460 I ran 10 tests each on omap4430 panda and omap4460 panda.
4430panda fails 2/10 times.
4460panda fails 7/10 times.

cheers,
-roger
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 15:40 ` Kevin Hilman
  2014-05-08 16:55   ` Tony Lindgren
@ 2014-05-08 17:12   ` Grygorii Strashko
  2014-05-09  8:30     ` Roger Quadros
  1 sibling, 1 reply; 18+ messages in thread
From: Grygorii Strashko @ 2014-05-08 17:12 UTC
To: linux-arm-kernel

Hi,

On 05/08/2014 06:40 PM, Kevin Hilman wrote:
> On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote:
>> Roger Quadros <rogerq@ti.com> writes:
>>
>>> Hi,
>>>
>>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing
>>> similar issues, although less frequent, with v3.15-rc4 as well.
>>>
>>> Configuration:
>>>
>>> - kernel v3.15-rc4 or linux-next (20140507)
>>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>>> - u-boot/master 173d294b94cf
>>>
>>> Observations:
>>>
>>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops.
>>>   e.g. http://www.hastebin.com/ebumojegoq.vhdl
>>>
>>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
>>
>> I've been noticing the same thing for awhile with my boot tests. For
>> me, next-20140508 is failing most of the time now.
>>
>>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD.
>>>   Maybe related to when high speed interrupts occur in the boot process.
>>>
>>> - On successful boots following warning is seen
>>>   [ 4.010375] gic_timer_retrigger: lost localtimer interrupt
>>>
>>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in
>>>   ON state as well. It does blink again when doing activity on console.
>>>
>>> Workaround:
>>>
>>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>>
>>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
>>
>> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
>> go away. Hmm....
>
> Another finger pointing in the same direction: omap2plus_defconfig +
> CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
> -next.

Is it observed on OMAP4460 only?
if no - it's something new.
if yes - maybe some race condition is still present.

Roger, is it possible to connect a debugger and check the GIC distributor status
(gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?

According to the current code (OMAP4460), CPU0 can get stuck only
if CPU1 is kicked out of the PWRDM_POWER_OFF state somehow, but not by CPU0.
The code assumes that CPU1 can exit the PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]);

Sorry, but I'm not able to debug it now.

Regards,
-grygorii
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 17:12 ` Grygorii Strashko
@ 2014-05-09  8:30   ` Roger Quadros
  2014-05-09 12:33     ` Nishanth Menon
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-09 8:30 UTC
To: linux-arm-kernel

Grygorii,

On 05/08/2014 08:12 PM, Grygorii Strashko wrote:
> Hi,
>
> On 05/08/2014 06:40 PM, Kevin Hilman wrote:
>> On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote:
>>> Roger Quadros <rogerq@ti.com> writes:
>>>
>>>> Hi,
>>>>
>>>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing
>>>> similar issues, although less frequent, with v3.15-rc4 as well.
>>>>
>>>> Configuration:
>>>>
>>>> - kernel v3.15-rc4 or linux-next (20140507)
>>>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>>>> - u-boot/master 173d294b94cf
>>>>
>>>> Observations:
>>>>
>>>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops.
>>>>   e.g. http://www.hastebin.com/ebumojegoq.vhdl
>>>>
>>>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
>>>
>>> I've been noticing the same thing for awhile with my boot tests. For
>>> me, next-20140508 is failing most of the time now.
>>>
>>>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD.
>>>>   Maybe related to when high speed interrupts occur in the boot process.
>>>>
>>>> - On successful boots following warning is seen
>>>>   [ 4.010375] gic_timer_retrigger: lost localtimer interrupt
>>>>
>>>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in
>>>>   ON state as well. It does blink again when doing activity on console.
>>>>
>>>> Workaround:
>>>>
>>>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>>>
>>>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
>>>
>>> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
>>> go away. Hmm....
>>
>> Another finger pointing in the same direction: omap2plus_defconfig +
>> CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
>> -next.
>
> Is it observed on OMAP4460 only?
> if no - it's smth new.
> if yes - may be some racing condition is still present.

I could observe it on 4430 as well, just less frequently:
2/10 times on 4430 vs 7/10 times on 4460.

>
> Roger, is it possible to connect debugger and check GIC distributor status
> (gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?

Sorry, I do not have a debugger with me at the moment.

>
> According to the current code (OMAP4460) it's possible that CPU0 will stuck only in case
> if CPU1 is kicked off from PWRDM_POWER_OFF state somehow but not by CPU0.
> Code assumes that CPU1 can exit from PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]);
>
> Sorry, but I'm not able to debug it now.

Stupid question, is the heartbeat LED even supposed to stop blinking in C3 state?
It would make a user think that the board is dead.

cheers,
-roger
* omap4-panda-es boot issues with v3.15-rc4
  2014-05-09  8:30 ` Roger Quadros
@ 2014-05-09 12:33   ` Nishanth Menon
  0 siblings, 0 replies; 18+ messages in thread
From: Nishanth Menon @ 2014-05-09 12:33 UTC
To: linux-arm-kernel

On Fri, May 9, 2014 at 3:30 AM, Roger Quadros <rogerq@ti.com> wrote:
>
> Stupid question, is hearbeat LED even supposed to stop blinking in C3 state?
> It would make a user think that the board is dead.

I believe yes - we have tick suppression. Otherwise we'd be wasting
power by waking up just to blink an LED. Some deeper C states need
higher latencies.

Regards,
Nishanth Menon
end of thread, other threads:[~2014-05-13 14:19 UTC | newest]

Thread overview: 18+ messages
2014-05-08 12:53 omap4-panda-es boot issues with v3.15-rc4 Roger Quadros
2014-05-08 15:31 ` Kevin Hilman
2014-05-08 15:40   ` Kevin Hilman
2014-05-08 16:55     ` Tony Lindgren
2014-05-08 18:40       ` Tony Lindgren
2014-05-08 22:15         ` Kevin Hilman
2014-05-09  8:23           ` Roger Quadros
2014-05-09 23:45             ` Kevin Hilman
2014-05-11 15:55               ` Tony Lindgren
2014-05-12 21:40                 ` Santosh Shilimkar
2014-05-12 22:07                   ` Tony Lindgren
2014-05-13  8:10                     ` Roger Quadros
2014-05-13 14:19                       ` Santosh Shilimkar
2014-05-12 23:56                   ` Kevin Hilman
2014-05-09  8:20       ` Roger Quadros
2014-05-08 17:12     ` Grygorii Strashko
2014-05-09  8:30       ` Roger Quadros
2014-05-09 12:33         ` Nishanth Menon