public inbox for linux-arm-kernel@lists.infradead.org
* OMAP4 panda gets stuck during reboot
@ 2011-01-05 14:08 Felipe Balbi
  2011-01-05 14:14 ` Santosh Shilimkar
  0 siblings, 1 reply; 5+ messages in thread
From: Felipe Balbi @ 2011-01-05 14:08 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

It seems like there's a bug in flush_pmd_entry() for multicore ARM CPUs.

I'm testing 2.6.37 on a PandaBoard, and when running "reboot" it gets
stuck on:

asm("mcr	p15, 0, %0, c7, c10, 1	@ flush_pmd"
			: : "r" (pmd) : "cc");

I added a few printk()s around the code to find the specific spot where
the reset sequence was getting stuck and here's the output:

[  112.497253] ===> kernel_restart (line 312)
[  112.501525] ===> kernel_restart_prepare (line 294)
[  112.506561] ===> device_shutdown (line 1661)
[  112.511871] ===> device_shutdown (line 1692)
[  112.516387] ===> sysdev_shutdown (line 323)
[  112.520782] ===> sysdev_shutdown (line 349)
[  112.525146] ===> kernel_restart_prepare (line 299)
[  112.530181] Restarting system.
[  112.533386] ===> machine_restart (line 254)
[  112.537750] ===> machine_shutdown (line 232)
[  112.542236] ===> machine_shutdown (line 236)
[  112.546722] ===> arm_machine_restart (line 96)
[  112.551391] ===> setup_mm_for_reboot (line 1059)
[  112.556213] ===> setup_mm_for_reboot (line 1066)
[  112.561035] ===> setup_mm_for_reboot (line 1071)
[  112.565856] ===> flush_pmd_entry (line 522)
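
For reference, markers in this format can be produced with a small macro
built on __func__ and __LINE__. The debugging patch itself was not posted,
so the following is only a userspace sketch under that assumption:
trace_format(), TRACE_HERE() and flush_pmd_entry_demo() are hypothetical
names, and in the kernel a printk() call would take the place of the
buffer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one marker the way the log above shows it,
 * e.g. "===> flush_pmd_entry (line 522)". Returns snprintf()'s result.
 */
int trace_format(char *buf, size_t len, const char *func, int line)
{
	return snprintf(buf, len, "===> %s (line %d)", func, line);
}

/* Hypothetical macro: captures the enclosing function and source line. */
#define TRACE_HERE(buf, len) trace_format((buf), (len), __func__, __LINE__)

/* Example caller; its marker names this function and the line of the macro. */
int flush_pmd_entry_demo(char *buf, size_t len)
{
	return TRACE_HERE(buf, len);
}
```

Placing one marker at function entry and one just before each suspicious
statement is what narrows the hang down to a single line, as in the log
above.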

What could it be? Is there any more debugging I could do to help?

-- 
balbi


* OMAP4 panda gets stuck during reboot
  2011-01-05 14:08 OMAP4 panda gets stuck during reboot Felipe Balbi
@ 2011-01-05 14:14 ` Santosh Shilimkar
  2011-01-05 14:18   ` Felipe Balbi
  0 siblings, 1 reply; 5+ messages in thread
From: Santosh Shilimkar @ 2011-01-05 14:14 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> From: linux-omap-owner at vger.kernel.org [mailto:linux-omap-
> owner at vger.kernel.org] On Behalf Of Felipe Balbi
> Sent: Wednesday, January 05, 2011 7:38 PM
> To: Russell King; Tony Lindgren
> Cc: Linux ARM Kernel Mailing List; Linux OMAP Mailing List
> Subject: OMAP4 panda gets stuck during reboot
>
> Hi,
>
> It seems like there's a bug on flush_pmd_entry() for multicore ARM
> CPUs.
>
> I'm testing 2.6.37 with pandaboard and when running "reboot" it gets
> stuck on:
>
> asm("mcr	p15, 0, %0, c7, c10, 1	@ flush_pmd"
> 			: : "r" (pmd) : "cc");
>
> I added a few printk()s around the code to find the specific spot
> where
> the reset sequence was getting stuck and here's the output:
>
> [  112.497253] ===> kernel_restart (line 312)
> [  112.501525] ===> kernel_restart_prepare (line 294)
> [  112.506561] ===> device_shutdown (line 1661)
> [  112.511871] ===> device_shutdown (line 1692)
> [  112.516387] ===> sysdev_shutdown (line 323)
> [  112.520782] ===> sysdev_shutdown (line 349)
> [  112.525146] ===> kernel_restart_prepare (line 299)
> [  112.530181] Restarting system.
> [  112.533386] ===> machine_restart (line 254)
> [  112.537750] ===> machine_shutdown (line 232)
> [  112.542236] ===> machine_shutdown (line 236)
> [  112.546722] ===> arm_machine_restart (line 96)
> [  112.551391] ===> setup_mm_for_reboot (line 1059)
> [  112.556213] ===> setup_mm_for_reboot (line 1066)
> [  112.561035] ===> setup_mm_for_reboot (line 1071)
> [  112.565856] ===> flush_pmd_entry (line 522)
>
> What could it be ? Any more debugging I could do to help ?
>
This is a known issue and seems to be OMAP-specific. A test patch and
the relevant thread are here:

http://www.spinics.net/lists/arm-kernel/msg103493.html


* OMAP4 panda gets stuck during reboot
  2011-01-05 14:14 ` Santosh Shilimkar
@ 2011-01-05 14:18   ` Felipe Balbi
  2011-01-05 14:22     ` Santosh Shilimkar
  0 siblings, 1 reply; 5+ messages in thread
From: Felipe Balbi @ 2011-01-05 14:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, Jan 05, 2011 at 07:44:31PM +0530, Santosh Shilimkar wrote:
> > What could it be ? Any more debugging I could do to help ?
> >
> This is known and seems to OMAP specific issue. Test patch and

It doesn't look OMAP-specific from the patch description. It looks like
CPU1 is turned off and the instruction to flush the PMD entry then fails.
Could it be that all ARM SMP platforms are affected?

-- 
balbi


* OMAP4 panda gets stuck during reboot
  2011-01-05 14:18   ` Felipe Balbi
@ 2011-01-05 14:22     ` Santosh Shilimkar
  2011-01-05 14:26       ` Felipe Balbi
  0 siblings, 1 reply; 5+ messages in thread
From: Santosh Shilimkar @ 2011-01-05 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> From: Felipe Balbi [mailto:balbi at ti.com]
> Sent: Wednesday, January 05, 2011 7:48 PM
> To: Santosh Shilimkar
> Cc: balbi at ti.com; Russell King; Tony Lindgren; Linux ARM Kernel
> Mailing List; Linux OMAP Mailing List
> Subject: Re: OMAP4 panda gets stuck during reboot
>
> Hi,
>
> On Wed, Jan 05, 2011 at 07:44:31PM +0530, Santosh Shilimkar wrote:
> > > What could it be ? Any more debugging I could do to help ?
> > >
> > This is known and seems to OMAP specific issue. Test patch and
>
> Doesn't look like omap-specific from patch description. Looks like
> like
> CPU1 is turned off and the instruction to flush PMD entry fails.
> Could
> it be that all ARM SMPs are affected ?
>
The thread is broken somehow. There were more emails on this one.
Russell confirmed that he doesn't see the issue on his A9 Versatile
platform, and no one besides OMAP users has complained.

Copy-pasting some of the last updates:

------------------------------------------------
> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas at arm.com]
> Sent: Wednesday, November 10, 2010 7:35 PM
> To: Russell King - ARM Linux
> Cc: Shilimkar, Santosh; linux-arm-kernel at lists.infradead.org; Gadiyar,
> Anand
> Subject: Re: [PATCH] ARM: Temporary fix for broken arch reboot
>
> On Wed, 2010-11-10 at 10:06 +0000, Russell King - ARM Linux wrote:
> > On Wed, Nov 10, 2010 at 11:25:21AM +0530, Shilimkar, Santosh wrote:
> > > > -----Original Message-----
> > > > From: Catalin Marinas [mailto:catalin.marinas at arm.com]
> > > > Sent: Tuesday, November 09, 2010 10:08 PM
> > > > To: Russell King - ARM Linux
> > > > Cc: Shilimkar, Santosh; linux-arm-kernel at lists.infradead.org;
> > > > Gadiyar, Anand
> > > > Subject: Re: [PATCH] ARM: Temporary fix for broken arch reboot
> > > >
> > > > On Tue, 2010-11-09 at 13:18 +0000, Russell King - ARM Linux wrote:
> > > > > On Tue, Nov 09, 2010 at 06:40:39PM +0530, Shilimkar, Santosh wrote:
> > > > > > With commit 3d3f78d752bf, reboot seems to broken on ARM
> > > > > > machines. CPU dies while doing flush_pmd_entry() as part of
> > > > > > setup_mm_for_reboot()
> > > >
> > > > What do you mean by 'dies'? Can you still connect with a debugger
> > > > or it got to some weird state?
> > > >
> > > It goes to some weird state. Basically the emulation connection
> > > dies, and debugger gets disconnected.
> > >
> > > > > > I know this is not the fix but intention is to report the
> > > > > > issue and also provide temporary fix till it get fixed
> > > > > > correctly
> > > > >
> > > > > So you're now rebooting with the secondary CPUs still running.
> > > > > I guess that the secondary CPUs end up crashing and don't restart.
> > > > >
> > > > > I think more the question is why the CP15 cache clean/flush is
> > > > > hanging with the other CPUs taken down.  All the other CPUs will be
> > > > > doing is sitting in a loop doing nothing.
> > > >
> > > > I can't think of anything. Did the other CPUs print 'stopping'?
> > > No it doesn't not print anything.
> >
> > The processing of the IPI is asynchronous to the CPU which is
> > rebooting continuing - which means that if there is some kind of bus
> > lockup, you won't get anything from any of the CPUs.
>
> The printing only happens for SYSTEM_BOOTING or SYSTEM_RUNNING. I
> suspect in this case we have SYSTEM_RESTARTING and the condition in
> ipi_cpu_stop() is false, therefore no printing. It may be worth putting
> some printks outside the 'if' to see whether the secondary CPUs get
> there.
>
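
The condition described in the quoted text above, together with the
suggested printk outside the 'if', can be modelled in userspace roughly as
follows. This is only a sketch and not the kernel's ipi_cpu_stop(): the
enum, both function names and the "reached" message are made up for
illustration, and MODEL_SYSTEM_RESTART stands for the state the quoted
text calls SYSTEM_RESTARTING.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's system_state values (model only). */
enum model_system_state {
	MODEL_SYSTEM_BOOTING,
	MODEL_SYSTEM_RUNNING,
	MODEL_SYSTEM_RESTART,	/* what the quoted text calls SYSTEM_RESTARTING */
};

/* The 'stopping' message is only printed while booting or running. */
bool model_prints_stopping(enum model_system_state state)
{
	return state == MODEL_SYSTEM_BOOTING || state == MODEL_SYSTEM_RUNNING;
}

/*
 * Hypothetical instrumented stop path: an unconditional marker is written
 * first, so a secondary CPU that gets here is visible in every state.
 * Returns the number of lines written to 'out'.
 */
int model_ipi_cpu_stop(unsigned int cpu, enum model_system_state state,
		       FILE *out)
{
	int lines = 0;

	/* Outside the 'if': shows whether this CPU reached the handler. */
	fprintf(out, "CPU%u: reached ipi_cpu_stop\n", cpu);
	lines++;

	if (model_prints_stopping(state)) {
		fprintf(out, "CPU%u: stopping\n", cpu);
		lines++;
	}

	return lines;
}
```

With the restart state the model writes only the unconditional marker,
which would be consistent with the secondary CPU printing nothing during
reboot even if it did reach the handler.
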
While doing some experiments on this issue, I made one interesting
observation. It looks like there is a race between the two cores
which makes the system misbehave in the reboot path.

Just adding a delay in ipi_cpu_stop() makes the reboot work
as well:

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 8c19595..f7dadbf 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -526,6 +526,8 @@ static void ipi_cpu_stop(unsigned int cpu)
                spin_unlock(&stop_lock);
        }

+       udelay(500);
+
        set_cpu_online(cpu, false);

        local_fiq_disable();

------------------------------------------------


* OMAP4 panda gets stuck during reboot
  2011-01-05 14:22     ` Santosh Shilimkar
@ 2011-01-05 14:26       ` Felipe Balbi
  0 siblings, 0 replies; 5+ messages in thread
From: Felipe Balbi @ 2011-01-05 14:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, Jan 05, 2011 at 07:52:34PM +0530, Santosh Shilimkar wrote:
> > On Wed, Jan 05, 2011 at 07:44:31PM +0530, Santosh Shilimkar wrote:
> > > > What could it be ? Any more debugging I could do to help ?
> > > >
> > > This is known and seems to OMAP specific issue. Test patch and
> >
> > Doesn't look like omap-specific from patch description. Looks like
> > like
> > CPU1 is turned off and the instruction to flush PMD entry fails.
> > Could
> > it be that all ARM SMPs are affected ?
> >
> Thread is broken some how. There were more emails on this one...
> Russell confirmed that he don't see the issue on his A9 Versatile
> platform and no one else complained except OMAP.

I see...

> Copy pasting some last updates..

Just finished reading the thread. Quite a problem on our side :-(

-- 
balbi

