linux-arm-kernel.lists.infradead.org archive mirror
* omap4-panda-es boot issues with v3.15-rc4
@ 2014-05-08 12:53 Roger Quadros
  2014-05-08 15:31 ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-08 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Nishanth pointed me to a boot issue with omap4-panda-es on linux-next, but I'm observing
similar issues, although less frequently, with v3.15-rc4 as well.

Configuration:

- kernel v3.15-rc4 or linux-next (20140507)
- multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
- u-boot/master	173d294b94cf

Observations:

- Out of 10 boots, a few may fail and hang midway without any warnings. The heartbeat LED stops.
e.g. http://www.hastebin.com/ebumojegoq.vhdl

- The hang is more noticeable on linux-next (20140507) than on v3.15-rc4.

- The hang is more noticeable with USB_EHCI_HCD enabled, but it is observed even without USB_EHCI_HCD.
It may be related to when high-speed interrupts occur in the boot process.

- On successful boots, the following warning is seen:
[    4.010375] gic_timer_retrigger: lost localtimer interrupt

- On successful boots, the heartbeat LED stops blinking once the boot process completes and the board is left idle.
The LED can also remain stuck in the ON state. It does blink again when there is activity on the console.

Workaround:

- Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
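
A possible runtime variant of the C3 workaround, sketched under assumptions: it relies on the per-CPU cpuidle sysfs attribute /sys/devices/system/cpu/cpuN/cpuidle/stateK/disable being present in this kernel, and on state index 2 being C3 (the OMAP4 driver lists C1, C2, C3 in that order). The helper names are hypothetical. Because it only takes effect once userspace is running, it cannot help with hangs that occur earlier in boot; it is mainly useful for checking the idle-time symptoms with C3 out of the picture.

```c
#include <stdio.h>

/*
 * Hypothetical helpers for the per-CPU cpuidle sysfs "disable" knob.
 * Assumption: the attribute lives at
 *   /sys/devices/system/cpu/cpuN/cpuidle/stateK/disable
 * and state index 2 is C3 on OMAP4 (states are listed C1, C2, C3).
 */

/* Format the knob's path into buf; returns what snprintf returns. */
int cpuidle_disable_path(char *buf, size_t len, int cpu, int state)
{
	return snprintf(buf, len,
			"/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
			cpu, state);
}

/* Write "1" to the knob for one CPU/state. Returns 0 on success, -1 on error. */
int cpuidle_disable_state(int cpu, int state)
{
	char path[128];
	FILE *f;
	int ok;

	int n = cpuidle_disable_path(path, sizeof(path), cpu, state);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* formatting failed or path truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;	/* no such state, no cpuidle, or not root */

	ok = fputs("1", f) >= 0;
	if (fclose(f) != 0)	/* sysfs write errors can surface at close */
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, calling cpuidle_disable_state(cpu, 2) for CPU 0 and CPU 1 from a small program run as root would disable C3 on both cores; since C3 is a coupled state, disabling it on every CPU is the conservative choice.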

I don't really know what exactly the issue is, but it seems to be specific to OMAP4, the GIC and MPU OSWR.

cheers,
-roger

^ permalink raw reply	[flat|nested] 18+ messages in thread

* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 12:53 omap4-panda-es boot issues with v3.15-rc4 Roger Quadros
@ 2014-05-08 15:31 ` Kevin Hilman
  2014-05-08 15:40   ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2014-05-08 15:31 UTC
  To: linux-arm-kernel

Roger Quadros <rogerq@ti.com> writes:

> [...]
> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests. For
me, next-20140508 is failing most of the time now.

> [...]
>
> Workaround:
>
> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>
> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
go away.  Hmm....

Kevin


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 15:31 ` Kevin Hilman
@ 2014-05-08 15:40   ` Kevin Hilman
  2014-05-08 16:55     ` Tony Lindgren
  2014-05-08 17:12     ` Grygorii Strashko
  0 siblings, 2 replies; 18+ messages in thread
From: Kevin Hilman @ 2014-05-08 15:40 UTC
  To: linux-arm-kernel

On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman <khilman@linaro.org> wrote:
> Roger Quadros <rogerq@ti.com> writes:
>
>> [...]
>
> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
> go away.  Hmm....

Another finger pointing in the same direction: omap2plus_defconfig +
CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
-next.

Kevin


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 15:40   ` Kevin Hilman
@ 2014-05-08 16:55     ` Tony Lindgren
  2014-05-08 18:40       ` Tony Lindgren
  2014-05-09  8:20       ` Roger Quadros
  2014-05-08 17:12     ` Grygorii Strashko
  1 sibling, 2 replies; 18+ messages in thread
From: Tony Lindgren @ 2014-05-08 16:55 UTC
  To: linux-arm-kernel

* Kevin Hilman <khilman@linaro.org> [140508 08:40]:
> [...]
> 
> Another finger pointing in the same direction: omap2plus_defconfig +
> CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
> -next.

Today's next with multi_v7_defconfig (so cpuidle enabled) seems to boot
reliably on omap4 sdp. And it's not producing these:

gic_timer_retrigger: lost localtimer interrupt

while panda is producing those errors, as Roger mentioned.

It seems that the USB networking is the main difference between
omap4 sdp and panda?

Regards,

Tony


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 15:40   ` Kevin Hilman
  2014-05-08 16:55     ` Tony Lindgren
@ 2014-05-08 17:12     ` Grygorii Strashko
  2014-05-09  8:30       ` Roger Quadros
  1 sibling, 1 reply; 18+ messages in thread
From: Grygorii Strashko @ 2014-05-08 17:12 UTC
  To: linux-arm-kernel

Hi,

On 05/08/2014 06:40 PM, Kevin Hilman wrote:
> [...]
> 
> Another finger pointing in the same direction: omap2plus_defconfig +
> CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
> -next.

Is it observed on OMAP4460 only?
If not, it's something new.
If yes, maybe some race condition is still present.

Roger, is it possible to connect a debugger and check the GIC distributor status
(gic_dist_base_addr + GIC_DIST_CTRL) when the failure happens?

According to the current code (OMAP4460), CPU0 can get stuck only if CPU1
is kicked out of the PWRDM_POWER_OFF state somehow, but not by CPU0.
The code assumes that CPU1 can exit the PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]).
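
For reference, the check being asked for boils down to reading one 32-bit word and testing one bit, as in the sketch below. GIC_DIST_CTRL is offset 0 in the distributor, and bit 0 is the enable bit the Linux GIC driver sets at init. The OMAP4 distributor base address is an assumption (check omap44xx.h or the TRM), and the helper names are hypothetical. If I recall correctly, the OMAP4 code has a gic_dist_disabled() helper that tests the same bit, so a value with bit 0 clear at the time of the hang would point at the distributor having been left disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Distributor control register offset (GIC_DIST_CTRL in Linux's arm-gic.h). */
#define GIC_DIST_CTRL		0x000u
/* Assumption: OMAP4 GIC distributor physical base; verify before use. */
#define OMAP4_GIC_DIST_BASE	0x48241000u

/* Physical address to read from the debugger. */
static inline uint32_t gic_dist_ctrl_addr(uint32_t dist_base)
{
	return dist_base + GIC_DIST_CTRL;
}

/* Bit 0 is the distributor enable bit; true means the distributor is enabled. */
static inline bool gic_dist_ctrl_enabled(uint32_t ctrl)
{
	return (ctrl & 0x1u) != 0;
}
```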

Sorry, but I'm not able to debug it now.

Regards,
-grygorii


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 16:55     ` Tony Lindgren
@ 2014-05-08 18:40       ` Tony Lindgren
  2014-05-08 22:15         ` Kevin Hilman
  2014-05-09  8:20       ` Roger Quadros
  1 sibling, 1 reply; 18+ messages in thread
From: Tony Lindgren @ 2014-05-08 18:40 UTC
  To: linux-arm-kernel

Added a few cpuidle people to Cc on this regression.

* Tony Lindgren <tony@atomide.com> [140508 09:57]:
> [...]
> 
> Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
> omap4 sdp seems to boot reliably. And it's not producing these:
> 
> gic_timer_retrigger: lost localtimer interrupt 

I'm still seeing the above, so it looks like the lost localtimer interrupt
is a separate issue..
 
> while panda is producing those errors like Roger mentioned.
> 
> It seems that the USB networking is the main difference between
> omap4 sdp and panda?

..but I think I found the cause for recent hangs on panda, just a wild
guess based on looking at the recent cpuidle patches after v3.14.

Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
until all coupled CPUs leave idle) makes booting work reliably again
on panda.

Can you guys confirm? So far no issues here after a few boot tests,
but it might be too early to tell.

Regards,

Tony


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 18:40       ` Tony Lindgren
@ 2014-05-08 22:15         ` Kevin Hilman
  2014-05-09  8:23           ` Roger Quadros
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2014-05-08 22:15 UTC
  To: linux-arm-kernel

Tony Lindgren <tony@atomide.com> writes:

[...]

> ..but I think I found the cause for recent hangs on panda, just a wild
> guess based on looking at the recent cpuidle patches after v3.14.
>
> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
> until all coupled CPUs leave idle) makes booting work reliably again
> on panda.
>
> Can you guys confirm, so far no issues here after few boot tests,
> but it might be too early to tell.

Reverting that makes things a bit more stable, but it still eventually
fails in the same way. For me, it took 8 boots before it failed.

However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
(20+ boots in a row and still going.)

Kevin


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 16:55     ` Tony Lindgren
  2014-05-08 18:40       ` Tony Lindgren
@ 2014-05-09  8:20       ` Roger Quadros
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Quadros @ 2014-05-09  8:20 UTC
  To: linux-arm-kernel

On 05/08/2014 07:55 PM, Tony Lindgren wrote:
> [...]
> 
> Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
> omap4 sdp seems to boot reliably. And it's not producing these:
> 
> gic_timer_retrigger: lost localtimer interrupt 
> 
> while panda is producing those errors like Roger mentioned.
> 
> It seems that the USB networking is the main difference between
> omap4 sdp and panda?

Is your sdp using omap4430?

To compare 4430 vs 4460, I ran 10 boot tests each on an omap4430 panda and an omap4460 panda.

The 4430 panda fails 2/10 times.
The 4460 panda fails 7/10 times.
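
One rough way to read these numbers, assuming each boot fails independently with a fixed probability p estimated from the runs above (about 0.2 on 4430, about 0.7 on 4460): the chance of n clean boots in a row with the bug still present is (1 - p)^n. The sketch below computes that, and the smallest n that pushes it below a chosen threshold. The function names are illustrative only.

```c
/*
 * Assumption: boots are independent trials, each failing with probability p.
 * Helper names are illustrative, not from any existing tool.
 */

/* Probability that n consecutive boots all succeed: (1 - p)^n. */
double p_all_clean(double p, int n)
{
	double q = 1.0;

	for (int i = 0; i < n; i++)
		q *= 1.0 - p;
	return q;
}

/*
 * Smallest number of consecutive clean boots whose probability, if the bug
 * were still present at rate p, drops below alpha. Requires 0 < p <= 1 and
 * 0 < alpha <= 1 so the loop terminates.
 */
int boots_needed(double p, double alpha)
{
	double q = 1.0;
	int n = 0;

	while (q >= alpha) {
		q *= 1.0 - p;
		n++;
	}
	return n;
}
```

With these estimates, 10 clean boots would still happen about 11% of the time on a 4430 with the bug present, but only about 6 in a million times on a 4460; boots_needed(0.2, 0.05) returns 14 and boots_needed(0.7, 0.05) returns 3. So 10/10 results are much more telling on a 4460 board.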

cheers,
-roger


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 22:15         ` Kevin Hilman
@ 2014-05-09  8:23           ` Roger Quadros
  2014-05-09 23:45             ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-09  8:23 UTC
  To: linux-arm-kernel

Kevin,

On 05/09/2014 01:15 AM, Kevin Hilman wrote:
> Tony Lindgren <tony@atomide.com> writes:
> 
> [...]
> 
>> ..but I think I found the cause for recent hangs on panda, just a wild
>> guess based on looking at the recent cpuidle patches after v3.14.
>>
>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>> until all coupled CPUs leave idle) makes booting work reliably again
>> on panda.
>>
>> Can you guys confirm, so far no issues here after few boot tests,
>> but it might be too early to tell.
> 
> Reverting that makes things a bit more stable, but it still eventually
> fails in the same way.  For me it took 8 boots for it to eventually
> fail.
> 
> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
> (20+ boots in a row and still going.)
> 

Can you please test with CPU_IDLE enabled but C3 disabled, as in the patch below?
It worked for me in 10/10 boots.

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..99362ff 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -206,7 +206,12 @@ static struct cpuidle_driver omap4_idle_driver = {
 			.desc = "CPUx OFF, MPUSS OSWR",
 		},
 	},
-	.state_count = ARRAY_SIZE(omap4_idle_data),
+/*
+ * 	Disable C3 state since it is unstable
+ *
+ *	.state_count = ARRAY_SIZE(omap4_idle_data),
+ */
+	.state_count = 2,
 	.safe_state_index = 0,
 };
 


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-08 17:12     ` Grygorii Strashko
@ 2014-05-09  8:30       ` Roger Quadros
  2014-05-09 12:33         ` Nishanth Menon
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-09  8:30 UTC
  To: linux-arm-kernel

Grygorii,

On 05/08/2014 08:12 PM, Grygorii Strashko wrote:
> [...]
> 
> Is it observed on OMAP4460 only?
> if no - it's smth new.
> if yes - may be some racing condition is still present.

I could observe it on 4430 as well, just less frequently: 2/10 times on 4430 vs 7/10 times on 4460.

> 
> Roger, is it possible to connect debugger and check GIC distributor status
> (gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?

Sorry, I do not have a debugger with me at the moment.
> 
> According to the current code (OMAP4460) it's possible that CPU0 will stuck only in case
> if CPU1 is kicked off from PWRDM_POWER_OFF state somehow but not by CPU0. 
> Code assumes that CPU1 can exit from PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]); 
> 
> Sorry, but I'm not able to debug it now.

Stupid question: is the heartbeat LED even supposed to stop blinking in the C3 state?
It would make a user think that the board is dead.

cheers,
-roger


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-09  8:30       ` Roger Quadros
@ 2014-05-09 12:33         ` Nishanth Menon
  0 siblings, 0 replies; 18+ messages in thread
From: Nishanth Menon @ 2014-05-09 12:33 UTC
  To: linux-arm-kernel

On Fri, May 9, 2014 at 3:30 AM, Roger Quadros <rogerq@ti.com> wrote:
>
> Stupid question, is hearbeat LED even supposed to stop blinking in C3 state?
> It would make a user think that the board is dead.

I believe yes: we have tick suppression. Otherwise we'd just be wasting
power by waking up only to blink an LED. Some deeper C states need
higher latencies.

Regards,
Nishanth Menon


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-09  8:23           ` Roger Quadros
@ 2014-05-09 23:45             ` Kevin Hilman
  2014-05-11 15:55               ` Tony Lindgren
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2014-05-09 23:45 UTC
  To: linux-arm-kernel

Roger Quadros <rogerq@ti.com> writes:

> Kevin,
>
> [...]
>
> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
> It worked for me 10/10 boots.

Yup, it worked for me too for 10/10 boots in a row.
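
Roger's patch works by lowering .state_count from ARRAY_SIZE(omap4_idle_data) to 2, so only the first two table entries (C1 and C2) are ever visible to the governor. The userspace model below illustrates that effect. Its selection rule (deepest state whose target_residency fits the predicted idle time and whose exit_latency fits a latency limit) is a simplification, not the actual menu governor; the C2 and C3 numbers come from the cpuidle44xx.c table quoted in this thread, while the C1 numbers are assumed.

```c
/* Simplified model of an idle-state table; field names mirror struct cpuidle_state. */
struct idle_state {
	const char *name;
	unsigned int exit_latency;	/* us */
	unsigned int target_residency;	/* us */
};

/* C2/C3 values from cpuidle44xx.c as quoted in the thread; C1 values are assumed. */
static const struct idle_state omap4_states[] = {
	{ "C1", 2 + 2, 5 },		/* assumption */
	{ "C2", 328 + 440, 960 },
	{ "C3", 460 + 518, 1100 },
};

/*
 * Pick the deepest state among the first state_count entries whose
 * target residency fits the predicted idle time and whose exit latency
 * fits the latency limit. Falls back to index 0 (the safe state).
 */
int select_state(const struct idle_state *states, int state_count,
		 unsigned int predicted_us, unsigned int latency_limit_us)
{
	int best = 0;

	for (int i = 0; i < state_count; i++) {
		if (states[i].target_residency <= predicted_us &&
		    states[i].exit_latency <= latency_limit_us)
			best = i;
	}
	return best;
}
```

With state_count = 3 and a long predicted idle, the model picks C3 (index 2); with state_count = 2, as in the patch, the same inputs can at most reach C2 (index 1).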

Kevin


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-09 23:45             ` Kevin Hilman
@ 2014-05-11 15:55               ` Tony Lindgren
  2014-05-12 21:40                 ` Santosh Shilimkar
  0 siblings, 1 reply; 18+ messages in thread
From: Tony Lindgren @ 2014-05-11 15:55 UTC
  To: linux-arm-kernel

* Kevin Hilman <khilman@linaro.org> [140509 16:46]:
> Roger Quadros <rogerq@ti.com> writes:
> 
> > [...]
> 
> Yup, it worked for me too for 10/10 boots in a row.

But what has caused this regression? Does it work reliably with, let's
say, v3.13 or v3.12?

Regards,

Tony


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-11 15:55               ` Tony Lindgren
@ 2014-05-12 21:40                 ` Santosh Shilimkar
  2014-05-12 22:07                   ` Tony Lindgren
  2014-05-12 23:56                   ` Kevin Hilman
  0 siblings, 2 replies; 18+ messages in thread
From: Santosh Shilimkar @ 2014-05-12 21:40 UTC
  To: linux-arm-kernel

On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>> [...]
>> Yup, it worked for me too for 10/10 boots in a row.
> 
> But what has caused this regression, does it work reliably with let's
> say v3.13 or v3.12?
> 
IIRC, things were stable until some CPUIDLE code consolidation happened.
I don't recall exactly, but someone did discuss it a while back.

Can you re-run your test cases with the patch at the end of this email? This
is just a hunch, so don't blame me if I waste your time testing the
patch.

regards,
Santosh

From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Mon, 12 May 2014 17:37:59 -0400
Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

Conflicts:

	arch/arm/mach-omap2/cpuidle44xx.c
---
 arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..aae3606 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 {
 	struct idle_statedata *cx = state_ptr + index;
 	u32 mpuss_can_lose_context = 0;
+	int cpu_id = smp_processor_id();
 
 	/*
 	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
@@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
 				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
+
 	/*
 	 * Call idle CPU PM enter notifier chain so that
 	 * VFP and per CPU interrupt context is saved.
@@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	if (dev->cpu == 0 && mpuss_can_lose_context)
 		cpu_cluster_pm_exit();
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
+
 fail:
 	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
 	cpu_done[dev->cpu] = false;
@@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
 			.exit_latency = 328 + 440,
 			.target_residency = 960,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-			         CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C2",
 			.desc = "CPUx OFF, MPUSS CSWR",
@@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
 			.exit_latency = 460 + 518,
 			.target_residency = 1100,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-			         CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C3",
 			.desc = "CPUx OFF, MPUSS OSWR",
-- 
1.7.9.5
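
A userspace model of one thing this revert changes: where the broadcast enter/exit notifications sit relative to the coupled entry. It is not kernel code, and all event names are made up. The core side reflects my reading of the 3.15-era idle loop, in which CPUIDLE_FLAG_TIMER_STOP makes the core issue BROADCAST_ENTER before entering the (coupled) state and BROADCAST_EXIT after it returns. With the explicit calls restored, the broadcast window covers only the low-power section of omap_enter_idle_coupled(), closes before the fail: label and cpuidle_coupled_parallel_barrier(), and is skipped on any jump to fail: taken before BROADCAST_ENTER. Whether this difference matters for the hang is what the test run is meant to show.

```c
#include <stdbool.h>
#include <string.h>

/* Tiny event log for the model (hypothetical names throughout). */
#define MAX_EVENTS 16
static const char *events[MAX_EVENTS];
static int n_events;

static void log_event(const char *e)
{
	if (n_events < MAX_EVENTS)
		events[n_events++] = e;
}

static void reset_log(void)
{
	n_events = 0;
}

/* Position of the first occurrence of e in the log, or -1 if absent. */
static int index_of(const char *e)
{
	for (int i = 0; i < n_events; i++)
		if (strcmp(events[i], e) == 0)
			return i;
	return -1;
}

/*
 * Driver body. explicit_notify models the reverted code: notifications
 * inside the driver, with the exit placed before fail: and the barrier.
 * early_abort models a goto fail taken before BROADCAST_ENTER.
 */
static void driver_enter(bool explicit_notify, bool early_abort)
{
	if (!early_abort) {
		if (explicit_notify)
			log_event("broadcast_enter");
		log_event("low_power");
		if (explicit_notify)
			log_event("broadcast_exit");
	}
	/* fail: */
	log_event("parallel_barrier");
}

/* Core side: with the flag set, the core brackets the whole coupled entry. */
static void core_enter(bool timer_stop_flag, bool early_abort)
{
	if (timer_stop_flag)
		log_event("broadcast_enter");
	log_event("coupled_wait");	/* waiting for the other CPU */
	driver_enter(!timer_stop_flag, early_abort);
	if (timer_stop_flag)
		log_event("broadcast_exit");
}
```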


* omap4-panda-es boot issues with v3.15-rc4
  2014-05-12 21:40                 ` Santosh Shilimkar
@ 2014-05-12 22:07                   ` Tony Lindgren
  2014-05-13  8:10                     ` Roger Quadros
  2014-05-12 23:56                   ` Kevin Hilman
  1 sibling, 1 reply; 18+ messages in thread
From: Tony Lindgren @ 2014-05-12 22:07 UTC
  To: linux-arm-kernel

* Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]:
> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
> > * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
> >> Roger Quadros <rogerq@ti.com> writes:
> >>
> >>> Kevin,
> >>>
> >>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
> >>>> Tony Lindgren <tony@atomide.com> writes:
> >>>>
> >>>> [...]
> >>>>
> >>>>> ..but I think I found the cause for recent hangs on panda, just a wild
> >>>>> guess based on looking at the recent cpuidle patches after v3.14.
> >>>>>
> >>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
> >>>>> until all coupled CPUs leave idle) makes booting work reliably again
> >>>>> on panda.
> >>>>>
> >>>>> Can you guys confirm? So far no issues here after a few boot tests,
> >>>>> but it might be too early to tell.
> >>>>
> >>>> Reverting that makes things a bit more stable, but it still eventually
> >>>> fails in the same way.  For me it took 8 boots for it to eventually
> >>>> fail.
> >>>>
> >>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
> >>>> (20+ boots in a row and still going.)
> >>>>
> >>>
> >>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
> >>> It worked for me 10/10 boots.
> >>
> >> Yup, it worked for me too for 10/10 boots in a row.
> > 
> > But what has caused this regression? Does it work reliably with, let's
> > say, v3.13 or v3.12?
> > 
> IIRC things were stable till some CPUIDLE code consolidation happened.
> I don't recall exactly but someone did discuss it a while back.

OK that's good to hear.
 
> Can you re-run your test-cases with the patch at the end of this email? This
> is just a hunch so don't blame me if I waste your time testing the
> patch.

Seems to work after adding "#include <linux/clockchips.h>". I did about 10
reboots and they all succeeded for me. Without your revert, I'm getting
a hang (with sysrq not working) about 1/3 of the boots.

Kevin, Roger, does the revert from Santosh work for you too?

BTW, I think the RCU stall was/is a separate issue. In that case the
system actually recovers after about a minute, or after sysrq
ctrl-a f h or l. Sorry, I no longer know if the RCU stall only happens with
the older kernels from around v3.10, or if it's still happening too.

Regards,

Tony
 
> From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Date: Mon, 12 May 2014 17:37:59 -0400
> Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"
> 
> This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.
> 
> Conflicts:
> 
> 	arch/arm/mach-omap2/cpuidle44xx.c
> ---
>  arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
> index 01fc710..aae3606 100644
> --- a/arch/arm/mach-omap2/cpuidle44xx.c
> +++ b/arch/arm/mach-omap2/cpuidle44xx.c
> @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  {
>  	struct idle_statedata *cx = state_ptr + index;
>  	u32 mpuss_can_lose_context = 0;
> +	int cpu_id = smp_processor_id();
>  
>  	/*
>  	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
> @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
>  				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
>  
> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
> +
>  	/*
>  	 * Call idle CPU PM enter notifier chain so that
>  	 * VFP and per CPU interrupt context is saved.
> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  	if (dev->cpu == 0 && mpuss_can_lose_context)
>  		cpu_cluster_pm_exit();
>  
> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
> +
>  fail:
>  	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
>  	cpu_done[dev->cpu] = false;
> @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>  			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
>  			.exit_latency = 328 + 440,
>  			.target_residency = 960,
> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
> -			         CPUIDLE_FLAG_TIMER_STOP,
> +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
>  			.enter = omap_enter_idle_coupled,
>  			.name = "C2",
>  			.desc = "CPUx OFF, MPUSS CSWR",
> @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>  			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
>  			.exit_latency = 460 + 518,
>  			.target_residency = 1100,
> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
> -			         CPUIDLE_FLAG_TIMER_STOP,
> +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
>  			.enter = omap_enter_idle_coupled,
>  			.name = "C3",
>  			.desc = "CPUx OFF, MPUSS OSWR",
> -- 
> 1.7.9.5
> 
> 

* omap4-panda-es boot issues with v3.15-rc4
  2014-05-12 21:40                 ` Santosh Shilimkar
  2014-05-12 22:07                   ` Tony Lindgren
@ 2014-05-12 23:56                   ` Kevin Hilman
  1 sibling, 0 replies; 18+ messages in thread
From: Kevin Hilman @ 2014-05-12 23:56 UTC
  To: linux-arm-kernel

Santosh Shilimkar <santosh.shilimkar@ti.com> writes:

> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>>> Roger Quadros <rogerq@ti.com> writes:
>>>
>>>> Kevin,
>>>>
>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>> Tony Lindgren <tony@atomide.com> writes:
>>>>>
>>>>> [...]
>>>>>
>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild
>>>>>> guess based on looking at the recent cpuidle patches after v3.14.
>>>>>>
>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>>>>>> until all coupled CPUs leave idle) makes booting work reliably again
>>>>>> on panda.
>>>>>>
>>>>>> Can you guys confirm? So far no issues here after a few boot tests,
>>>>>> but it might be too early to tell.
>>>>>
>>>>> Reverting that makes things a bit more stable, but it still eventually
>>>>> fails in the same way.  For me it took 8 boots for it to eventually
>>>>> fail.
>>>>>
>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
>>>>> (20+ boots in a row and still going.)
>>>>>
>>>>
>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
>>>> It worked for me 10/10 boots.
>>>
>>> Yup, it worked for me too for 10/10 boots in a row.
>> 
>> But what has caused this regression? Does it work reliably with, let's
>> say, v3.13 or v3.12?
>> 
> IIRC things were stable till some CPUIDLE code consolidation happened.
> I don't recall exactly but someone did discuss it a while back.
>
> Can you re-run your test-cases with the patch at the end of this email? This
> is just a hunch so don't blame me if I waste your time testing the
> patch.

With your patch applied on top of next-20140512, my 4460 Panda-ES has
booted 25 times in a row, and still going.

Kevin

* omap4-panda-es boot issues with v3.15-rc4
  2014-05-12 22:07                   ` Tony Lindgren
@ 2014-05-13  8:10                     ` Roger Quadros
  2014-05-13 14:19                       ` Santosh Shilimkar
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Quadros @ 2014-05-13  8:10 UTC
  To: linux-arm-kernel

On 05/13/2014 01:07 AM, Tony Lindgren wrote:
> * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]:
>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>>>> Roger Quadros <rogerq@ti.com> writes:
>>>>
>>>>> Kevin,
>>>>>
>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>>> Tony Lindgren <tony@atomide.com> writes:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild
>>>>>>> guess based on looking at the recent cpuidle patches after v3.14.
>>>>>>>
>>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>>>>>>> until all coupled CPUs leave idle) makes booting work reliably again
>>>>>>> on panda.
>>>>>>>
>>>>>>> Can you guys confirm? So far no issues here after a few boot tests,
>>>>>>> but it might be too early to tell.
>>>>>>
>>>>>> Reverting that makes things a bit more stable, but it still eventually
>>>>>> fails in the same way.  For me it took 8 boots for it to eventually
>>>>>> fail.
>>>>>>
>>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
>>>>>> (20+ boots in a row and still going.)
>>>>>>
>>>>>
>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
>>>>> It worked for me 10/10 boots.
>>>>
>>>> Yup, it worked for me too for 10/10 boots in a row.
>>>
>>> But what has caused this regression? Does it work reliably with, let's
>>> say, v3.13 or v3.12?
>>>
>> IIRC things were stable till some CPUIDLE code consolidation happened.
>> I don't recall exactly but someone did discuss it a while back.
> 
> OK that's good to hear.
>  
>> Can you re-run your test-cases with the patch at the end of this email? This
>> is just a hunch so don't blame me if I waste your time testing the
>> patch.
> 
> Seems to work after adding "#include <linux/clockchips.h>". I did about 10
> reboots and they all succeeded for me. Without your revert, I'm getting
> a hang (with sysrq not working) about 1/3 of the boots.
> 
> Kevin, Roger, does the revert from Santosh work for you too?
> 

next-20140508 worked for me 10/10 times with Santosh's patch.
The heartbeat LED behaves normally as well. So I like it :).

cheers,
-roger

> BTW, I think the RCU stall was/is a separate issue. In that case the
> system actually recovers after about a minute, or after sysrq
> ctrl-a f h or l. Sorry, I no longer know if the RCU stall only happens with
> the older kernels from around v3.10, or if it's still happening too.
> 
> Regards,
> 
> Tony

* omap4-panda-es boot issues with v3.15-rc4
  2014-05-13  8:10                     ` Roger Quadros
@ 2014-05-13 14:19                       ` Santosh Shilimkar
  0 siblings, 0 replies; 18+ messages in thread
From: Santosh Shilimkar @ 2014-05-13 14:19 UTC
  To: linux-arm-kernel

On Tuesday 13 May 2014 04:10 AM, Roger Quadros wrote:
> On 05/13/2014 01:07 AM, Tony Lindgren wrote:
>> * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]:
>>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>>>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]:
>>>>> Roger Quadros <rogerq@ti.com> writes:
>>>>>
>>>>>> Kevin,
>>>>>>
>>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>>>> Tony Lindgren <tony@atomide.com> writes:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild
>>>>>>>> guess based on looking at the recent cpuidle patches after v3.14.
>>>>>>>>
>>>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
>>>>>>>> until all coupled CPUs leave idle) makes booting work reliably again
>>>>>>>> on panda.
>>>>>>>>
>>>>>>>> Can you guys confirm? So far no issues here after a few boot tests,
>>>>>>>> but it might be too early to tell.
>>>>>>>
>>>>>>> Reverting that makes things a bit more stable, but it still eventually
>>>>>>> fails in the same way.  For me it took 8 boots for it to eventually
>>>>>>> fail.
>>>>>>>
>>>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
>>>>>>> (20+ boots in a row and still going.)
>>>>>>>
>>>>>>
>>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
>>>>>> It worked for me 10/10 boots.
>>>>>
>>>>> Yup, it worked for me too for 10/10 boots in a row.
>>>>
>>>> But what has caused this regression? Does it work reliably with, let's
>>>> say, v3.13 or v3.12?
>>>>
>>> IIRC things were stable till some CPUIDLE code consolidation happened.
>>> I don't recall exactly but someone did discuss it a while back.
>>
>> OK that's good to hear.
>>  
>>> Can you re-run your test-cases with the patch at the end of this email? This
>>> is just a hunch so don't blame me if I waste your time testing the
>>> patch.
>>
>> Seems to work after adding "#include <linux/clockchips.h>". I did about 10
>> reboots and they all succeeded for me. Without your revert, I'm getting
>> a hang (with sysrq not working) about 1/3 of the boots.
>>
>> Kevin, Roger, does the revert from Santosh work for you too?
>>
> 
> next-20140508 worked for me 10/10 times with Santosh's patch.
> The heartbeat LED behaves normally as well. So I like it :).
> 
Great. I will post the patch with an updated changelog and cc
you guys.

Regards,
Santosh

end of thread, newest message: 2014-05-13 14:19 UTC

Thread overview: 18+ messages
2014-05-08 12:53 omap4-panda-es boot issues with v3.15-rc4 Roger Quadros
2014-05-08 15:31 ` Kevin Hilman
2014-05-08 15:40   ` Kevin Hilman
2014-05-08 16:55     ` Tony Lindgren
2014-05-08 18:40       ` Tony Lindgren
2014-05-08 22:15         ` Kevin Hilman
2014-05-09  8:23           ` Roger Quadros
2014-05-09 23:45             ` Kevin Hilman
2014-05-11 15:55               ` Tony Lindgren
2014-05-12 21:40                 ` Santosh Shilimkar
2014-05-12 22:07                   ` Tony Lindgren
2014-05-13  8:10                     ` Roger Quadros
2014-05-13 14:19                       ` Santosh Shilimkar
2014-05-12 23:56                   ` Kevin Hilman
2014-05-09  8:20       ` Roger Quadros
2014-05-08 17:12     ` Grygorii Strashko
2014-05-09  8:30       ` Roger Quadros
2014-05-09 12:33         ` Nishanth Menon
