linux-arm-kernel.lists.infradead.org archive mirror
* [BUG] i.MX25: soft lockups/freezes while getnstimeofday
@ 2013-01-29 16:12 Steffen Trumtrar
  2013-01-29 16:38 ` Fabio Estevam
  0 siblings, 1 reply; 5+ messages in thread
From: Steffen Trumtrar @ 2013-01-29 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi!

I have a problem with an i.MX25 running a 3.7.2 kernel.

* Scenario

The scenario is as follows:

Under normal circumstances (i.e. the system is running some daemons, but apart from
that idles most of the time), soft lockups happen after several hours. Once
watchdog_timer_fn has printed its stack dump, the system hangs for about 6 minutes
and then continues to run as if nothing had ever happened.
I am able to force the lockup by running the following little code snippet:

	while(1)
		syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp);
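
For anyone who wants to try this, the loop above can be wrapped into a
compilable helper. This is only a minimal sketch: the function name
hammer_clock_gettime and the bounded iteration count are assumptions added
here so the helper can return; the original test simply loops forever.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Issue the raw clock_gettime syscall `iterations` times and return how
 * many of the calls failed. Going through syscall() means every call
 * really enters the kernel, as in the loop above.
 */
long hammer_clock_gettime(long iterations)
{
	struct timespec tp;
	long failures = 0;

	assert(iterations >= 0);
	for (long i = 0; i < iterations; i++) {
		if (syscall(SYS_clock_gettime, CLOCK_REALTIME, &tp) != 0)
			failures++;
	}
	return failures;
}
```

Calling it from main() with a very large count (or in an endless outer
loop) gives the same load as the two-line snippet.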

With this running on the system, the lockup happens after 10-30 minutes:

[ 1175.247095] BUG: soft lockup - CPU#0 stuck for 22s! [time-test:268]
[ 1175.253421] Modules linked in:
[ 1175.256537] irq event stamp: 555315876
[ 1175.260318] hardirqs last  enabled at (555315875): [<c000df28>] __irq_svc+0x48/0x54
[ 1175.268073] hardirqs last disabled at (555315876): [<c000df14>] __irq_svc+0x34/0x54
[ 1175.275802] softirqs last  enabled at (555315874): [<c00243dc>] __do_softirq+0x210/0x2a0
[ 1175.283977] softirqs last disabled at (555315867): [<c0024854>] irq_exit+0x64/0xc8
[ 1175.291610]
[ 1175.293134] Pid: 268, comm:            time-test
[ 1175.297788] CPU: 0    Not tainted  (3.7.2-Katara-00060-g49acf87-dirty #25)
[ 1175.304707] PC is at getnstimeofday+0xc4/0xf0
[ 1175.309104] LR is at getnstimeofday+0x98/0xf0
[ 1175.313503] pc : [<c0052b4c>]    lr : [<c0052b20>]    psr: 80000013
[ 1175.313503] sp : d104bf38  ip : 00000005  fp : d104bf74
[ 1175.325022] r10: 43ecbb48  r9 : d104bf88  r8 : d1811020
[ 1175.330280] r7 : ffffffff  r6 : c4653600  r5 : f02f5ec5  r4 : ce61ad4b
[ 1175.336840] r3 : fffffffb  r2 : 00001243  r1 : 00000000  r0 : 3b9ac9ff
[ 1175.343402] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1175.350572] Control: 0005317f  Table: 912ec000  DAC: 00000015
[ 1175.356408] [<c00130dc>] (unwind_backtrace+0x0/0xec) from [<c0391514>] (dump_stack+0x20/0x24)
[ 1175.365020] [<c0391514>] (dump_stack+0x20/0x24) from [<c000f2f0>] (show_regs+0x4c/0x58)
[ 1175.373115] [<c000f2f0>] (show_regs+0x4c/0x58) from [<c0071c18>] (watchdog_timer_fn+0x108/0x15c)
[ 1175.381991] [<c0071c18>] (watchdog_timer_fn+0x108/0x15c) from [<c0043470>] (__run_hrtimer+0x11c/0x250)
[ 1175.391378] [<c0043470>] (__run_hrtimer+0x11c/0x250) from [<c0043d40>] (hrtimer_interrupt+0x104/0x268)
[ 1175.400775] [<c0043d40>] (hrtimer_interrupt+0x104/0x268) from [<c00196a0>] (mxc_timer_interrupt+0x34/0x44)
[ 1175.410515] [<c00196a0>] (mxc_timer_interrupt+0x34/0x44) from [<c00726a0>] (handle_irq_event_percpu+0x88/0x274)
[ 1175.420677] [<c00726a0>] (handle_irq_event_percpu+0x88/0x274) from [<c00728d8>] (handle_irq_event+0x4c/0x6c)
[ 1175.430583] [<c00728d8>] (handle_irq_event+0x4c/0x6c) from [<c0074fc8>] (handle_level_irq+0xe0/0xf8)
[ 1175.439793] [<c0074fc8>] (handle_level_irq+0xe0/0xf8) from [<c0071ec4>] (generic_handle_irq+0x30/0x40)
[ 1175.449179] [<c0071ec4>] (generic_handle_irq+0x30/0x40) from [<c000ecec>] (handle_IRQ+0x70/0x94)
[ 1175.458036] [<c000ecec>] (handle_IRQ+0x70/0x94) from [<c0008740>] (avic_handle_irq+0x44/0x50)
[ 1175.466630] [<c0008740>] (avic_handle_irq+0x44/0x50) from [<c000df24>] (__irq_svc+0x44/0x54)
[ 1175.475104] Exception stack(0xd104bef0 to 0xd104bf38)
[ 1175.480202] bee0:                                     3b9ac9ff 00000000 00001243 fffffffb
[ 1175.488436] bf00: ce61ad4b f02f5ec5 c4653600 ffffffff d1811020 d104bf88 43ecbb48 d104bf74
[ 1175.496660] bf20: 00000005 d104bf38 c0052b20 c0052b4c 80000013 ffffffff
[ 1175.503346] [<c000df24>] (__irq_svc+0x44/0x54) from [<c0052b4c>] (getnstimeofday+0xc4/0xf0)
[ 1175.511785] [<c0052b4c>] (getnstimeofday+0xc4/0xf0) from [<c003d1b0>] (posix_clock_realtime_get+0x1c/0x24)
[ 1175.521524] [<c003d1b0>] (posix_clock_realtime_get+0x1c/0x24) from [<c003e5a8>] (sys_clock_gettime+0x3c/0x9c)
[ 1175.531520] [<c003e5a8>] (sys_clock_gettime+0x3c/0x9c) from [<c000e300>] (ret_fast_syscall+0x0/0x38)
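
As a reading aid for the register dump, the "Flags:" line can be derived
from the psr value. The sketch below decodes the standard ARM status
register bits (N/Z/C/V in bits 31-28, I in bit 7, F in bit 6, T in bit 5,
mode in bits 4-0). The struct and function names are hypothetical, and the
short mode names differ slightly from the kernel's "SVC_32" style strings.

```c
#include <assert.h>
#include <stdint.h>

struct psr_fields {
	int n, z, c, v;      /* condition flags */
	int irqs_on;         /* 1 if the I bit is clear */
	int fiqs_on;         /* 1 if the F bit is clear */
	int thumb;           /* T bit */
	unsigned int mode;   /* 5-bit processor mode */
};

struct psr_fields decode_psr(uint32_t psr)
{
	struct psr_fields f;

	f.n = (psr >> 31) & 1;
	f.z = (psr >> 30) & 1;
	f.c = (psr >> 29) & 1;
	f.v = (psr >> 28) & 1;
	f.irqs_on = !((psr >> 7) & 1);  /* a set I bit masks IRQs */
	f.fiqs_on = !((psr >> 6) & 1);  /* a set F bit masks FIQs */
	f.thumb = (psr >> 5) & 1;
	f.mode = psr & 0x1f;
	return f;
}

/* Short name for the classic ARM processor modes. */
const char *psr_mode_name(unsigned int mode)
{
	assert(mode < 32);  /* mode is a 5-bit field */
	switch (mode) {
	case 0x10: return "USR";
	case 0x11: return "FIQ";
	case 0x12: return "IRQ";
	case 0x13: return "SVC";
	case 0x17: return "ABT";
	case 0x1b: return "UND";
	case 0x1f: return "SYS";
	default:   return "unknown";
	}
}
```

For psr 80000013 this gives N set, Z/C/V clear, IRQs and FIQs enabled, ARM
state and mode 0x13 (SVC), which matches the "Nzcv  IRQs on  FIQs on
Mode SVC_32  ISA ARM" line in the dump.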

The board itself supposedly worked up until v3.4.

The mxc-timer is set up to use ipg_clk_highfreq with a per5_div set to 2,
therefore it is clocked with 120MHz. I tried to set the per5_div to 4 to have
a 60MHz clock, but this didn't change anything.
On the other hand, I tried parenting the ipg_clk to the per5_clk to get a
66MHz clock. This seems to be working fine, but I only have it running for 4h now.
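
The divider arithmetic behind these numbers can be sketched as follows.
The 240 MHz parent rate is not stated above; it is inferred from
"per5_div set to 2" giving 120 MHz and should be treated as an
assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Output rate of an integer clock divider. */
uint32_t divided_rate_hz(uint32_t parent_hz, uint32_t div)
{
	assert(div != 0);
	return parent_hz / div;
}

/* Length of one timer tick in picoseconds (truncated) at the given rate. */
uint64_t tick_period_ps(uint32_t rate_hz)
{
	assert(rate_hz != 0);
	return UINT64_C(1000000000000) / rate_hz;
}
```

With the assumed 240 MHz parent, dividers of 2 and 4 yield the 120 MHz and
60 MHz rates mentioned above.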


* Suspects

The current suspect is arch/arm/plat-mxc/time.c, or rather the GPT.
Is it okay to clock the GPT from ipg_clk_highfreq? It is a valid clock source according
to the datasheet, but that doesn't mean that the timer is stable then ;-)
It seems that using the highfreq clock is not correct, but I can't be absolutely
sure from the datasheet. Maybe someone with access to the Verilog/VHDL code can shed
some light on this?!

Or is it valid, but leads to some very obscure and rare condition in the timer?
What I don't understand, then, is why it worked with older kernels if the
clocks are not okay. And what is happening during those 6 minutes when the system is
hanging?

It does not appear to be:
	- some timer wrap around (should happen more often)
	- some race condition with the set_next_event
	- something with getnstimeofday itself
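
The "should happen more often" argument for the wraparound can be checked
with a quick calculation. The sketch below assumes the GPT is a
free-running 32-bit up-counter (the counter width is not stated in this
mail); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running 32-bit up-counter wraps at rate_hz. */
double wrap_period_seconds(uint32_t rate_hz)
{
	assert(rate_hz != 0);
	/* 2^32 counter states divided by ticks per second */
	return 4294967296.0 / (double)rate_hz;
}
```

At 120 MHz such a counter would wrap roughly every 36 seconds, so it would
wrap well over a dozen times before a lockup that takes 10-30 minutes to
appear.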

I hope I didn't forget anything of importance and that Shawn (or someone else) has an idea,
or can tell me that ipg_clk_highfreq is definitely wrong because of $reason.


Thanks,
Steffen

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* [BUG] i.MX25: soft lockups/freezes while getnstimeofday
  2013-01-29 16:12 [BUG] i.MX25: soft lockups/freezes while getnstimeofday Steffen Trumtrar
@ 2013-01-29 16:38 ` Fabio Estevam
  2013-01-30  7:03   ` Robert Schwebel
  2013-01-30  9:24   ` Steffen Trumtrar
  0 siblings, 2 replies; 5+ messages in thread
From: Fabio Estevam @ 2013-01-29 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Steffen,

On Tue, Jan 29, 2013 at 2:12 PM, Steffen Trumtrar
<s.trumtrar@pengutronix.de> wrote:

> The board itself supposedly worked up until v3.4.
>
> The mxc-timer is set up to use ipg_clk_highfreq with a per5_div set to 2,
> therefore it is clocked with 120MHz. I tried to set the per5_div to 4 to have
> a 60MHz clock, but this didn't change anything.
> On the other hand, I tried parenting the ipg_clk to the per5_clk to get a
> 66MHz clock. This seems to be working fine, but I only have it running for 4h now.

Can you dump the clock tree in 3.4 and 3.7.2, so that we can compare them?

Just looked at the FSL BSP and they have the following:

	/* GPT clock must be derived from AHB clock */
	clk_set_rate(&per_clk[5], ahb_clk.rate / 10);

Regards,

Fabio Estevam


* [BUG] i.MX25: soft lockups/freezes while getnstimeofday
  2013-01-29 16:38 ` Fabio Estevam
@ 2013-01-30  7:03   ` Robert Schwebel
  2013-01-30 11:03     ` Fabio Estevam
  2013-01-30  9:24   ` Steffen Trumtrar
  1 sibling, 1 reply; 5+ messages in thread
From: Robert Schwebel @ 2013-01-30  7:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 29, 2013 at 02:38:59PM -0200, Fabio Estevam wrote:
> On Tue, Jan 29, 2013 at 2:12 PM, Steffen Trumtrar <s.trumtrar@pengutronix.de> wrote:
> > The board itself supposedly worked up until v3.4.
> >
> > The mxc-timer is set up to use ipg_clk_highfreq with a per5_div set to 2,
> > therefore it is clocked with 120MHz. I tried to set the per5_div to 4 to have
> > a 60MHz clock, but this didn't change anything.
> > On the other hand, I tried parenting the ipg_clk to the per5_clk to get a
> > 66MHz clock. This seems to be working fine, but I only have it running for 4h now.
>
> Can you dump the clock tree in 3.4 and 3.7.2, so that we can compare them?
>
> Just looked at the FSL BSP and they have the following:
>
> 	/* GPT clock must be derived from AHB clock */
> 	clk_set_rate(&per_clk[5], ahb_clk.rate / 10);

Do your hardware guys have documentation about *why* it "must be derived
from AHB clock"?

rsc


* [BUG] i.MX25: soft lockups/freezes while getnstimeofday
  2013-01-29 16:38 ` Fabio Estevam
  2013-01-30  7:03   ` Robert Schwebel
@ 2013-01-30  9:24   ` Steffen Trumtrar
  1 sibling, 0 replies; 5+ messages in thread
From: Steffen Trumtrar @ 2013-01-30  9:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 29, 2013 at 02:38:59PM -0200, Fabio Estevam wrote:
> Hi Steffen,
> 
> On Tue, Jan 29, 2013 at 2:12 PM, Steffen Trumtrar
> <s.trumtrar@pengutronix.de> wrote:
> 
> > The board itself supposedly worked up until v3.4.
> >
> > The mxc-timer is set up to use ipg_clk_highfreq with a per5_div set to 2,
> > therefore it is clocked with 120MHz. I tried to set the per5_div to 4 to have
> > a 60MHz clock, but this didn't change anything.
> > On the other hand, I tried parenting the ipg_clk to the per5_clk to get a
> > 66MHz clock. This seems to be working fine, but I only have it running for 4h now.
> 
> Can you dump the clock tree in 3.4 and 3.7.2, so that we can compare them?
> 
> Just looked at the FSL BSP and they have the following:
> 
> 	/* GPT clock must be derived from AHB clock */
> 	clk_set_rate(&per_clk[5], ahb_clk.rate / 10);
> 

Well, v3.4 has commit 2012d9ca2a1381ae3e733330a7f0d1d2f1988bba:

	/* Clock source for gpt is ahb_div */
	__raw_writel(__raw_readl(CRM_BASE+0x64) & ~(1 << 5), CRM_BASE + 0x64);

It seems that this got lost somewhere, so the clock tree differs between v3.4
and v3.7.2.
I will send a patch that uses clk_set_parent to reparent per5 to ahb.
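
The v3.4 line quoted above is a plain read-modify-write that clears bit 5
of the register at CRM_BASE + 0x64, which according to its comment selects
the AHB-derived source for the GPT. A minimal sketch of that operation on
an ordinary variable, with hypothetical helper names and without the
kernel's __raw_readl/__raw_writel accessors:

```c
#include <assert.h>
#include <stdint.h>

/* Return reg with the given bit cleared; all other bits are preserved. */
uint32_t clear_bit32(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return reg & ~(UINT32_C(1) << bit);
}

/* Apply the same read-modify-write through a register-style pointer. */
void reg_clear_bit(volatile uint32_t *reg, unsigned int bit)
{
	*reg = clear_bit32(*reg, bit);
}
```

Only the selected bit changes, so the source selection of the other
per clocks in the same register is left untouched.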

Thanks,
Steffen



* [BUG] i.MX25: soft lockups/freezes while getnstimeofday
  2013-01-30  7:03   ` Robert Schwebel
@ 2013-01-30 11:03     ` Fabio Estevam
  0 siblings, 0 replies; 5+ messages in thread
From: Fabio Estevam @ 2013-01-30 11:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 30, 2013 at 5:03 AM, Robert Schwebel
<r.schwebel@pengutronix.de> wrote:

> Do your hardware guys have documentation about *why* it "must be derived
> from AHB clock"?

I have asked for more clarification about this.

