[Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot

* [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
@ 2014-11-05 20:38 Thierry Bultel
  2014-11-05 20:59 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-05 20:38 UTC
  To: xenomai, jean-Baptiste, TREDEZ (GE Healthcare, non-ge)

Hi,
I have applied the adeos-ipipe-3.0.43-mx6q-1.18-14.patch provided with
xenomai-2.6.4 onto the rel_imx_3.0.35_4.1.0 kernel.

When CONFIG_IPIPE is enabled, the kernel is very slow to boot. Once it
has booted, it seems all right.
It boots fast when CONFIG_IPIPE is disabled.

I have CONFIG_CPU_FREQ disabled.

It hangs for about 45 seconds after the
"console [tty0] enabled, bootconsole disabled" message.

It is very slow to shut down, too.

Where can I look to find the issue, please?

Best regards
Thierry




* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-05 20:38 [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot Thierry Bultel
@ 2014-11-05 20:59 ` Gilles Chanteperdrix
  2014-11-06 10:57   ` tbultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-05 20:59 UTC
  To: Thierry Bultel; +Cc: xenomai

On Wed, Nov 05, 2014 at 09:38:47PM +0100, Thierry Bultel wrote:
> Hi,
> I have applied the adeos-ipipe-3.0.43-mx6q-1.18-14.patch provided with
> xenomai-2.6.4 ,
> onto the rel_imx_3.0.35_4.1.0 kernel

Have you tried commit d97b871cc396738ff62293df6b4ba78ade44b6d2
for which the patch is made?

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-05 20:59 ` Gilles Chanteperdrix
@ 2014-11-06 10:57   ` tbultel
  2014-11-06 11:47     ` Gilles Chanteperdrix
  2014-11-06 12:48     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 46+ messages in thread
From: tbultel @ 2014-11-06 10:57 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Thierry Bultel" <tbultel@free.fr>
> Cc: xenomai@xenomai.org, "TREDEZ jean-Baptiste (GE Healthcare, non-ge)" <Jean-baptiste.tredez@basystemes.fr>
> Sent: Wednesday, November 5, 2014 21:59:17
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
>
> On Wed, Nov 05, 2014 at 09:38:47PM +0100, Thierry Bultel wrote:
> > Hi,
> > I have applied the adeos-ipipe-3.0.43-mx6q-1.18-14.patch provided with
> > xenomai-2.6.4 ,
> > onto the rel_imx_3.0.35_4.1.0 kernel
>
> Have you tried commit d97b871cc396738ff62293df6b4ba78ade44b6d2
> for which the patch is made?

Hi Gilles,
I can't afford to go back to that commit: it is too old a kernel and my BSP
patch does not apply to it.
Meanwhile, I have fixed the slowness issue: I had not paid enough
attention to some rejections when applying the ipipe patch.

But this leads to the reason why I was interested in taking your latest
ipipe patch.
We used to work with the previous one, which was in 3 pieces (pre-x-post).

We are facing random kernel crashes when using Ethernet intensively
(ping -i 0.01 -s 65000).
These crashes happen with both versions of the ipipe patch.

The random freeze only happens on some machines (3 machines out of a total
of 17 are affected), which at first led us to suspect a hardware bug.
(There are no CPU revision differences between those machines.)
But after some testing, the crash does not happen with a non-patched
3.0.35_4.1.0 kernel, or with a patched kernel that has CONFIG_IPIPE
disabled.
The freeze sometimes gives a backtrace on the serial port, sometimes not.
See a sample backtrace below. It rarely has the same shape.

By re-reading the code and bisecting,
I found out that removing the mb() call in bufdesc_read_status() ....

static inline unsigned short bufdesc_read_status(struct bufdesc *bdp)
{
#if 0
#ifdef CONFIG_ARCH_MX6
	mb();
#endif /* CONFIG_ARCH_MX6 */
#endif /* 0 */
	return bdp->cbd_sc;

}

... makes the freeze much harder to reproduce (a machine that froze in less
than 10 seconds stayed alive for 2h30).

I noticed that this mb() call is not present in more recent kernels.
It is introduced by the I-pipe patch.
I understand neither why it makes the freeze happen, nor why it
is part of the ipipe patch.
What is your opinion about that?

Regards,
Thierry

The backtrace:

[root@SGT_AGV2 ~]# Unable to handle kernel NULL pointer dereference at virtual address 0000011c
pgd = 80004000
[0000011c] *pgd=00000000
Internal error: Oops: 17 [#2] PREEMPT SMP
Modules linked in: adv7180_tvin
CPU: 3    Tainted: G      D      (3.0.35_4.1.0 #1)
PC is at dev_alloc_skb+0x20/0x3c
LR is at dev_alloc_skb+0x18/0x3c
pc : [<8054bd2c>]    lr : [<8054bd24>]    psr: 20000113
sp : bff01e48  ip : 00000066  fp : 00000040
r10: 00000000  r9 : 00000040  r8 : 00000084
r7 : 00000001  r6 : bffec000  r5 : 00000020  r4 : 00000800
r3 : 80c89204  r2 : 00000000  r1 : 00000020  r0 : 00000084
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 4449004a  DAC: 00000015
Process swapper (pid: 0, stack limit = 0xbff002f0)
Stack: (0xbff01e48 to 0xbff02000)
1e40:                   ffffffff 00000800 ffdfd108 bffec000 00000001 bffecec8
1e60: 80c88890 8054bd24 00000005 803f2430 bffece50 00000066 4446d020 b446d020
1e80: bffec250 80cbc320 00000080 bffecec8 00000001 0000012c 8c02a080 80c700a0
1ea0: 00000040 8c02a088 bff00000 80556648 ffff85a3 bff00000 00000020 00000001
1ec0: 00000003 0000000c bff00000 80c70070 00000010 80c7006c 80cd9ae0 800913d0
1ee0: 80c75f34 bfc21cc0 00000102 bff00020 00000003 0000000a 80060b74 80c88da8
1f00: 800499cc 00000096 00000000 80c88da8 00000010 80c9cfc0 00000004 8009195c
1f20: 00000270 80051e18 800461b8 00000099 80ce8960 00400000 00000001 800d1f38
1f40: 80d120a0 800467e8 80ce9c80 800494c8 b45201a0 bff01f80 f2a00100 00000096
1f60: 80c8c08c 1000406a bff00000 00000000 00000000 8005bc78 ffffffff 80050ec0
1f80: 20000000 0000001d 00000000 f40dc010 8c0267e8 80cdacc4 80667b04 80c8c08c
1fa0: 1000406a 412fc09a 00000000 00000000 00000001 bff01fc8 80061db0 800605ec
1fc0: 20000013 ffffffff bff00000 00000003 80667b04 800520f4 bff00000 800522d4
1fe0: 80659ffc 4ff0406a 00000015 10c03c7d 80cdae78 1065a014 00000000 00000000
[<8054bd2c>] (dev_alloc_skb+0x20/0x3c) from [<00000800>] (0x800)
Code: e3e03000 ebffffc1 e3500000 08bd8008 (e5902098)
---[ end trace 072d847414ccbc44 ]---
Kernel panic - not syncing: Fatal exception in interrupt
[<800586ac>] (unwind_backtrace+0x0/0xf8) from [<8065de10>] (panic+0x64/0x180)
[<8065de10>] (panic+0x64/0x180) from [<80054efc>] (die+0x228/0x28c)
[<80054efc>] (die+0x228/0x28c) from [<8065d064>] (__do_kernel_fault.part.4+0x54/0x74)
[<8065d064>] (__do_kernel_fault.part.4+0x54/0x74) from [<8005d454>] (do_page_fault+0x214/0x3e4)
[<8005d454>] (do_page_fault+0x214/0x3e4) from [<8004b404>] (do_DataAbort+0x34/0x17c)
[<8004b404>] (do_DataAbort+0x34/0x17c) from [<80050e50>] (__dabt_svc+0x70/0xa0)
Exception stack(0xbff01e00 to 0xbff01e48)
1e00: 00000084 00000020 00000000 80c89204 00000800 00000020 bffec000 00000001
1e20: 00000084 00000040 00000000 00000040 00000066 bff01e48 8054bd24 8054bd2c
1e40: 20000113 ffffffff
[<80050e50>] (__dabt_svc+0x70/0xa0) from [<8054bd2c>] (dev_alloc_skb+0x20/0x3c)
[<8054bd2c>] (dev_alloc_skb+0x20/0x3c) from [<00000800>] (0x800)
CPU2: stopping
[<800586ac>] (unwind_backtrace+0x0/0xf8) from [<8004b3a8>] (do_IPI+0x12c/0x154)
[<8004b3a8>] (do_IPI+0x12c/0x154) from [<800d1f38>] (__ipipe_sync_stage+0x254/0x270)
Exception stack(0xbffb5f40 to 0xbffb5f88)
5f40: 80d120a0 800467e8 80ce8a80 800494c8 8c01c7e8 bffb5f80 f2a00100 00000c06
5f60: 80c8c08c 1000406a bffb4000 00000000 00000000 8004b214 ffffffff 80050ec0
5f80: 20000000 0000001d
[<800d1f38>] (__ipipe_sync_stage+0x254/0x270) from [<8004b214>] (__ipipe_grab_ipi+0x30/0x98)
[<8004b214>] (__ipipe_grab_ipi+0x30/0x98) from [<80050ec0>] (__irq_svc+0x40/0xd4)
Exception stack(0xbffb5f80 to 0xbffb5fc8)
5f80: 20000000 0000001d 00000000 f40dc010 8c01c7e8 80cdacc4 80667b04 80c8c08c
5fa0: 1000406a 412fc09a 00000000 00000000 00000001 bffb5fc8 80061db0 800605ec
5fc0: 20000013 ffffffff
[<80050ec0>] (__irq_svc+0x40/0xd4) from [<800605ec>] (cpu_v7_do_idle+0x8/0xc)
[<800605ec>] (cpu_v7_do_idle+0x8/0xc) from [<4ffb806a>] (0x4ffb806a)
CPU0: stopping
[<800586ac>] (unwind_backtrace+0x0/0xf8) from [<8004b3a8>] (do_IPI+0x12c/0x154)
[<8004b3a8>] (do_IPI+0x12c/0x154) from [<800d1f38>] (__ipipe_sync_stage+0x254/0x270)
Exception stack(0x80c6ff18 to 0x80c6ff60)
ff00:                                                       80d120a0 800467e8
ff20: 80ce8a80 800494c8 8c0087e8 80c6ff58 f2a00100 00000c06 80c8c08c 1000406a
ff40: 80c6e000 00000000 00000000 8004b214 ffffffff 80050ec0 20000000 0000001d
[<800d1f38>] (__ipipe_sync_stage+0x254/0x270) from [<8004b214>] (__ipipe_grab_ipi+0x30/0x98)
[<8004b214>] (__ipipe_grab_ipi+0x30/0x98) from [<80050ec0>] (__irq_svc+0x40/0xd4)
Exception stack(0x80c6ff58 to 0x80c6ffa0)
ff40:                                                       20000000 0000001d
ff60: 00000000 f40dc010 8c0087e8 80cdacc4 80667b04 80c8c08c 1000406a 412fc09a
ff80: 00000000 00000000 00000001 80c6ffa0 80061db0 800605ec 200f0013 ffffffff
[<80050ec0>] (__irq_svc+0x40/0xd4) from [<800605ec>] (cpu_v7_do_idle+0x8/0xc)
[<800605ec>] (cpu_v7_do_idle+0x8/0xc) from [<80c88c24>] (0x80c88c24)
CPU1: stopping
[<800586ac>] (unwind_backtrace+0x0/0xf8) from [<8004b3a8>] (do_IPI+0x12c/0x154)
[<8004b3a8>] (do_IPI+0x12c/0x154) from [<800d1f38>] (__ipipe_sync_stage+0x254/0x270)
Exception stack(0xbffadf40 to 0xbffadf88)
df40: 80d120a0 800467e8 80ce8a80 800494c8 8c0127e8 bffadf80 f2a00100 00000c06
df60: 80c8c08c 1000406a bffac000 00000000 00000000 8004b214 ffffffff 80050ec0
df80: 20000000 0000001d
[<800d1f38>] (__ipipe_sync_stage+0x254/0x270) from [<8004b214>] (__ipipe_grab_ipi+0x30/0x98)
[<8004b214>] (__ipipe_grab_ipi+0x30/0x98) from [<80050ec0>] (__irq_svc+0x40/0xd4)
Exception stack(0xbffadf80 to 0xbffadfc8)
df80: 20000000 0000001d 00000000 f40dc010 8c0127e8 80cdacc4 80667b04 80c8c08c
dfa0: 1000406a 412fc09a 00000000 00000000 00000001 bffadfc8 80061db0 800605ec
dfc0: 200f0013 ffffffff
[<80050ec0>] (__irq_svc+0x40/0xd4) from [<800605ec>] (cpu_v7_do_idle+0x8/0xc)
[<800605ec>] (cpu_v7_do_idle+0x8/0xc) from [<4ffb006a>] (0x4ffb006a)



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 10:57   ` tbultel
@ 2014-11-06 11:47     ` Gilles Chanteperdrix
  2014-11-06 12:34       ` Gilles Chanteperdrix
  2014-11-06 12:48     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 11:47 UTC
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 11:57:20AM +0100, tbultel@free.fr wrote:
> Hi Gilles,
> I can't afford going on that commit, it is a too old kernel and my BSP
> patch does not apply on it.
> Meanwhile, I have fixed the slowness issue, I had not payed enough
> attention to some rejections when applying the ipipe patch.
> 
> But this leads to the reason why I was interested in taking your latest
> ipipe patch.
> We used to work with the previous one, which was in 3 pieces (pre-x-post).

Yes, it is one piece now, because only mx6 users use the I-pipe
patch for Linux 3.0, so this reduces the maintenance cost.

I am surprised to hear that you are using this I-pipe patch, as I had
heard it caused an overheating issue with your setup.

> 
> We are facing random kernel crashes when using the eth intensively.
> (ping -i 0.01 -s 65000)
> That crashs happen with both ipipe patches versions.
> 
> That random freeze only happen on some machines ( 3 machines on a total
> of 17 are impacted), that first had let us think about an hardware bug.
> (There are no CPU revs differences between those machines)
> But after some testing, the crash does not happen with a non-patched
> 3.0.35_4.1.0 kernel, or with a patched kernel that has CONFIG_IPIPE
> disabled.
> The freeze sometimes gives a backtrace on the serial port, sometimes not.
> See a backtrace sample below. It has rarely the same shape.
> 
> By doing some code re-reading, and performing some dichotomy,
> I found out that removing the mb() call in bufdesc_read_status() ....
> 
> static inline unsigned short bufdesc_read_status(struct bufdesc *bdp)
> {
> #if 0
> #ifdef CONFIG_ARCH_MX6
> 	mb();
> #endif /* CONFIG_ARCH_MX6 */
> #endif /* 0 */
> 	return bdp->cbd_sc;
> 
> }
> 
> ... makes the freeze less reproducible (a machine that froze in less
> than 10 secs stayed alive for 2h30)
> 
> I noticed that the call mb() is not done on more recent kernels.
> It is introduced by the Ipipe patch.
> I do not understand neither why it makes the freeze happen,

I believe I know why this happens. In order to work around some
hardware issue, the imx6 idle loop issues calls to mb(), and
they cause starvation if a non-idle CPU is attempting to call mb().
The first fix that was posted on the lakml was to add a bunch of
"nops" in front of the mb() in the idle loop; I have not followed
how this finally got fixed. I would not be surprised if the fix for
this issue has not been backported to 3.0 kernels.

> nor why it is part of the ipipe patch.

It is part of the patch because it was in the tree I used
when I started working on imx6. Ironically, I believe it is a fix for
an eth lockup issue. Anyway, I do not believe this patch itself is a
problem: inserting an mb() should not cause any lockup, unless this
read_status function is called in a tight loop, which would cause
the issue mentioned above.

-- 
					    Gilles.


* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 11:47     ` Gilles Chanteperdrix
@ 2014-11-06 12:34       ` Gilles Chanteperdrix
  2014-11-06 12:52         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 12:34 UTC
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 12:47:35PM +0100, Gilles Chanteperdrix wrote:
> I believe I know why this happens. In order to workaround some
> hardware issue, the imx6 idle loop is issuing calls to mb(), and
> they cause starvation if a non-idle cpu is attempting to call mb().
> The first fix that was posted on the lakml is to add a bunch of
> "nops" in front of the mb() in the idle loop, I have not followed
> how this got fixed finally. I would not be surprised if the fix for
> this issue has not been backported to 3.0 kernels.

Actually, I remembered incorrectly. The problem is not in the idle
loop (unless it calls cpu_relax() when CONFIG_IPIPE is on, instead
of wfi), but in cpu_relax(). And this happens only if you have
enabled support for armv6 processors (like imx31) like the imx_v6_v7
defconfig does, or if you enable CONFIG_ARM_ERRATA_754327.

If that is the case, please try the patch here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114646.html

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 10:57   ` tbultel
  2014-11-06 11:47     ` Gilles Chanteperdrix
@ 2014-11-06 12:48     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 12:48 UTC
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 11:57:20AM +0100, tbultel@free.fr wrote:
> nor why it is part of the ipipe patch.

So, the reason it is part of the ipipe patch is that without it, we
get high latencies on imx6.

http://git.xenomai.org/ipipe-gch.git/commit/?h=ipipe-3.0-imx6q&id=46342aa94d2635cb575311fb3626b7393ffe78dd

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 12:34       ` Gilles Chanteperdrix
@ 2014-11-06 12:52         ` Gilles Chanteperdrix
  2014-11-06 14:41           ` tbultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 12:52 UTC
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 01:34:55PM +0100, Gilles Chanteperdrix wrote:
> Actually, I remembered incorrectly. The problem is not in the idle
> loop (unless it calls cpu_relax() when CONFIG_IPIPE is on, instead
> of wfi), but in cpu_relax(). And this happens only if you have
> enabled support for armv6 processors (like imx31) like the imx_v6_v7
> defconfig does, or if you enable CONFIG_ARM_ERRATA_754327.
> 
> If that is the case, please try the patch here:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114646.html

I already backported this patch:
http://git.xenomai.org/ipipe-gch.git/commit/?h=ipipe-3.0-imx6q&id=803f75460e99d4b4dbf029f57cc7cf5eb8dd7338

So, now, if we assume that the FEC patch is what is causing the
lockup, you could try adding the magic 5 nops before the mb() in the
bufdesc_read_status() function.

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 12:52         ` Gilles Chanteperdrix
@ 2014-11-06 14:41           ` tbultel
  2014-11-06 14:51             ` Gilles Chanteperdrix
  2014-11-06 16:04             ` Lennart Sorensen
  0 siblings, 2 replies; 46+ messages in thread
From: tbultel @ 2014-11-06 14:41 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: tbultel@free.fr
> Cc: xenomai@xenomai.org
> Sent: Thursday, November 6, 2014 13:52:17
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> I already backported this patch:
> http://git.xenomai.org/ipipe-gch.git/commit/?h=ipipe-3.0-imx6q&id=803f75460e99d4b4dbf029f57cc7cf5eb8dd7338
> 
> So, now, if we assume what is causing the problem is the fact that
> the FEC patch is causing the lockup, you could try and add the magic
> 5 nops before the mb() in the bufdesc_read_status() function
> 
> --

Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled.
It is -not- enabled in the evaluation kernel that is provided by the
manufacturer.
That erratum is said to apply to CPU revisions < r2p0.

I am a little bit puzzled by the naming conventions for the CPU revision:
U-Boot says rev1.2, while the kernel says

Processor	: ARMv7 Processor rev 10 (v7l)
...

CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

How can I match the two?

Meanwhile, we noticed that, compared to the evaluation kernel, we were missing
CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419.

Adding them helps a lot, but the freeze still happens on one machine.

We are currently trying with 754322 + 769419 + 754327 + 5 nops in the FEC driver,
but we are not sure whether we need 754327.

Regards
Thierry

PS: Regarding the thermal issue, we have changed our supplier; we now have
a heat sink that is big enough (it is the AMOS820 from Via Embedded).


>                                             Gilles.
> 



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 14:41           ` tbultel
@ 2014-11-06 14:51             ` Gilles Chanteperdrix
  2014-11-06 16:04             ` Lennart Sorensen
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 14:51 UTC
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 03:41:47PM +0100, tbultel@free.fr wrote:
> 
> 
> ----- Mail original -----
> > De: "Gilles Chanteperdrix" chanteperdrix@xenomai.org>
> > À: tbultel@free.fr
> > Cc: xenomai@xenomai.org
> > Envoyé: Jeudi 6 Novembre 2014 13:52:17
> > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > 
> > On Thu, Nov 06, 2014 at 01:34:55PM +0100, Gilles Chanteperdrix wrote:
> > > On Thu, Nov 06, 2014 at 12:47:35PM +0100, Gilles Chanteperdrix
> > > wrote:
> > > > On Thu, Nov 06, 2014 at 11:57:20AM +0100, tbultel@free.fr wrote:
> > > > > Hi Gilles,
> > > > > I can't afford going on that commit, it is a too old kernel and
> > > > > my BSP
> > > > > patch does not apply on it.
> > > > > Meanwhile, I have fixed the slowness issue, I had not payed
> > > > > enough
> > > > > attention to some rejections when applying the ipipe patch.
> > > > > 
> > > > > But this leads to the reason why I was interested in taking
> > > > > your latest
> > > > > ipipe patch.
> > > > > We used to work with the previous one, which was in 3 pieces
> > > > > (pre-x-post).
> > > > 
> > > > Yes, it is one piece now, because only mx6 users use the I-pipe
> > > > patch for Linux 3.0, so this reduces the maintenance cost.
> > > > 
> > > > I am surprised to hear that you are using this I-pipe patch as I
> > > > had
> > > > heard it caused over-heating issue with your setup.
> > > > 
> > > > > 
> > > > > We are facing random kernel crashes when using the eth
> > > > > intensively.
> > > > > (ping -i 0.01 -s 65000)
> > > > > That crashs happen with both ipipe patches versions.
> > > > > 
> > > > > That random freeze only happen on some machines ( 3 machines on
> > > > > a total
> > > > > of 17 are impacted), that first had let us think about an
> > > > > hardware bug.
> > > > > (There are no CPU revs differences between those machines)
> > > > > But after some testing, the crash does not happen with a
> > > > > non-patched
> > > > > 3.0.35_4.1.0 kernel, or with a patched kernel that has
> > > > > CONFIG_IPIPE
> > > > > disabled.
> > > > > The freeze sometimes gives a backtrace on the serial port,
> > > > > sometimes not.
> > > > > See a backtrace sample below. It has rarely the same shape.
> > > > > 
> > > > > By doing some code re-reading, and performing some dichotomy,
> > > > > I found out that removing the mb() call in
> > > > > bufdesc_read_status() ....
> > > > > 
> > > > > static inline unsigned short bufdesc_read_status(struct bufdesc
> > > > > *bdp)
> > > > > {
> > > > > #if 0
> > > > > #ifdef CONFIG_ARCH_MX6
> > > > >         mb();
> > > > > #endif /* CONFIG_ARCH_MX6 */
> > > > > #endif /* 0 */
> > > > >         return bdp->cbd_sc;
> > > > > 
> > > > > }
> > > > > 
> > > > > ... makes the freeze less reproducible (a machine that froze in
> > > > > less
> > > > > than 10 secs stayed alive for 2h30)
> > > > > 
> > > > > I noticed that the call mb() is not done on more recent
> > > > > kernels.
> > > > > It is introduced by the Ipipe patch.
> > > > > I do not understand neither why it makes the freeze happen,
> > > > 
> > > > I believe I know why this happens. In order to work around some
> > > > hardware issue, the imx6 idle loop is issuing calls to mb(), and
> > > > they cause starvation if a non-idle cpu is attempting to call
> > > > mb().
> > > > The first fix that was posted on the LAKML was to add a bunch of
> > > > "nops" in front of the mb() in the idle loop; I have not followed
> > > > how this finally got fixed. I would not be surprised if the fix
> > > > for this issue has not been backported to 3.0 kernels.
> > > 
> > > Actually, I remembered incorrectly. The problem is not in the idle
> > > loop (unless it calls cpu_relax() when CONFIG_IPIPE is on, instead
> > > of wfi), but in cpu_relax(). And this happens only if you have
> > > enabled support for armv6 processors (like imx31) like the
> > > imx_v6_v7
> > > defconfig does, or if you enable CONFIG_ARM_ERRATA_754327.
> > > 
> > > If that is the case, please try the patch here:
> > > http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114646.html
> > 
> > I already backported this patch:
> > http://git.xenomai.org/ipipe-gch.git/commit/?h=ipipe-3.0-imx6q&id=803f75460e99d4b4dbf029f57cc7cf5eb8dd7338
> > 
> > So, now, if we assume that the FEC patch is what causes the
> > lockup, you could try adding the magic 5 nops before the mb() in
> > the bufdesc_read_status() function
> > 
> > --
> 
> Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled

It should not be a problem, since the I-pipe patch contains the nop
workaround. 

> It is -not- enabled in the evaluation kernel that is provided by the 
> manufacturer.
> That errata is said to be for CPU revs < r2p0
> 
> I am a little bit puzzled about the naming conventions for the CPU revision,
> uboot says rev1.2, the kernel says
> 
> Processor	: ARMv7 Processor rev 10 (v7l)
> ...
> 
> CPU implementer	: 0x41
> CPU architecture: 7
> CPU variant	: 0x2
> CPU part	: 0xc09
> CPU revision	: 10
> 
> how can I do the matching ?

I would say this is an r2p10.

Looking at the linux boot logs for omap4, I get:

CPU: ARMv7 Processor [411fc092] revision 2 (ARMv7), cr=50c53c7d

This is split as:
41 1 f c09 2

41 is the implementer, it is ARM
f is the architecture field, and for ARM processors the part number
c09 identifies a Cortex-A9
the remaining 1 and 2 are the variant and the revision, meaning r1p2
from what I understand. This looks similar to what u-boot is telling
you.
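
The split above can be sketched in C. This is only an illustration: the
names midr_fields, midr_decode and midr_format_rnpm are made up for this
example, and the bit positions are the usual layout of the ARM Main ID
Register as I understand it (implementer in bits 31:24, variant in 23:20,
architecture in 19:16, part number in 15:4, revision in 3:0).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of the ARM Main ID Register (MIDR); names are illustrative. */
struct midr_fields {
	unsigned implementer;  /* bits [31:24], 0x41 means ARM */
	unsigned variant;      /* bits [23:20], the N in rNpM */
	unsigned architecture; /* bits [19:16], 0xf in the value above */
	unsigned part;         /* bits [15:4], 0xc09 is a Cortex-A9 */
	unsigned revision;     /* bits [3:0], the M in rNpM */
};

static struct midr_fields midr_decode(uint32_t midr)
{
	struct midr_fields f;

	f.implementer  = (midr >> 24) & 0xff;
	f.variant      = (midr >> 20) & 0xf;
	f.architecture = (midr >> 16) & 0xf;
	f.part         = (midr >> 4) & 0xfff;
	f.revision     = midr & 0xf;
	return f;
}

/* Write "rNpM" into buf; returns the number of characters written. */
static int midr_format_rnpm(uint32_t midr, char *buf, size_t len)
{
	struct midr_fields f = midr_decode(midr);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "r%up%u", f.variant, f.revision);
}
```

Feeding it 0x411fc092 from the omap4 boot log gives variant 1 and
revision 2. A value rebuilt from the /proc/cpuinfo fields quoted above
(variant 0x2, part 0xc09, revision 10, and assuming the architecture field
is 0xf) would be 0x412fc09a, which formats as r2p10.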

> 
> Meanwhile, we noticed that compared to the evaluation kernel, we were missing
> CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419
> 
> Adding them helps a lot, but the freeze still happens on one machine
> 
> We are currently trying with 754322 + 769419 + 754327 + 5 nops in fec ... 
> but not sure if we need 754327.

Normally, the workarounds are only enabled if the processor requires
them, but there are exceptions, with 754327 being one.

-- 
			
		    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 14:41           ` tbultel
  2014-11-06 14:51             ` Gilles Chanteperdrix
@ 2014-11-06 16:04             ` Lennart Sorensen
  2014-11-06 16:08               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: Lennart Sorensen @ 2014-11-06 16:04 UTC (permalink / raw)
  To: tbultel; +Cc: xenomai

On Thu, Nov 06, 2014 at 03:41:47PM +0100, tbultel@free.fr wrote:
> Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled
> It is -not- enabled in the evaluation kernel that is provided by the 
> manufacturer.
> That errata is said to be for CPU revs < r2p0
> 
> I am a little bit puzzled about the naming conventions for the CPU revision,
> uboot says rev1.2, the kernel says
> 
> Processor	: ARMv7 Processor rev 10 (v7l)
> ...
> 
> CPU implementer	: 0x41
> CPU architecture: 7
> CPU variant	: 0x2
> CPU part	: 0xc09
> CPU revision	: 10
> 
> how can I do the matching ?
> 
> Meanwhile, we noticed that compared to the evaluation kernel, we were missing
> CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419
> 
> Adding them helps a lot, but the freeze still happens on one machine
> 
> We are currently trying with 754322 + 769419 + 754327 + 5 nops in fec ... 
> but not sure if we need 754327.
> 
> Regards
> Thierry
> 
> PS: Regarding the thermal issue, we have changed our supplier, we now have
> a dissipator that is big enough (it is the AMOS820 from Via Embedded)

I am not sure how to read the A9 revision.  I have this on a system:

processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc0f
CPU revision    : 2

That's an A15, and on it the variant is the number after r, and the
revision is the number after p, so this one is r2p2 as far as I know.

Another one I have is:

processor       : 0
model name      : ARMv7 Processor rev 3 (v7l)
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc0f
CPU revision    : 3

That should be an r2p3 (also an A15).

Quite how revision 10 converts I am not sure.

Looking at some Freescale documents on the i.MX6, the core is in fact
expected to be r2p10, so it really is as simple as it looks. That of
course means you are way past r2p0 and hence the erratum should not
apply.
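
As a rough sketch of that reading, the two /proc/cpuinfo fields can be
pulled out of the text and compared against the "< r2p0" bound mentioned
earlier in the thread. The helper names cpuinfo_field and rev_older_than
are invented for this example, and the parser assumes each key sits on
its own "key : value" line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key ... : value" in a /proc/cpuinfo-style text and parse the value.
 * Base 0 lets strtoul accept both "0x2" and "10". Returns -1 if the key
 * (or a colon after it) is not found.
 */
static long cpuinfo_field(const char *text, const char *key)
{
	const char *p, *colon;

	assert(text != NULL && key != NULL);
	p = strstr(text, key);
	if (p == NULL)
		return -1;
	colon = strchr(p, ':');
	if (colon == NULL)
		return -1;
	return (long)strtoul(colon + 1, NULL, 0);
}

/* Nonzero if rNpM (variant N, revision M) is strictly older than r<major>p<minor>. */
static int rev_older_than(long variant, long revision, long major, long minor)
{
	return variant < major || (variant == major && revision < minor);
}
```

With "CPU variant : 0x2" and "CPU revision : 10" this yields 2 and 10,
and rev_older_than(2, 10, 2, 0) is false, which matches the conclusion
that an r2p10 core is past r2p0.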

-- 
Len Sorensen



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 16:04             ` Lennart Sorensen
@ 2014-11-06 16:08               ` Gilles Chanteperdrix
  2014-11-07  9:48                 ` tbultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-06 16:08 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: tbultel, xenomai

On Thu, Nov 06, 2014 at 11:04:57AM -0500, Lennart Sorensen wrote:
> I am not sure how to read the A9 revision.  I have this on a system:

I found the method I gave in the ARM documentation. I am pretty
sure this is how it works.

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-06 16:08               ` Gilles Chanteperdrix
@ 2014-11-07  9:48                 ` tbultel
  2014-11-07  9:52                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-07  9:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
> Cc: tbultel@free.fr, xenomai@xenomai.org
> Sent: Thursday, 6 November 2014 17:08:21
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> I have found the method I gave in ARM documentation. I am pretty
> sure this is how it works.
> 
> --
> 					    Gilles.
> 


Gilles,
we agree that since we have an r2p10, erratum 754327 does not apply.
Thus the only errata workarounds I was missing are CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419,
which are in ./arch/arm/configs/imx6_defconfig
They are now part of my config.
Unfortunately, the network stress test still makes the freeze happen with CONFIG_IPIPE enabled.

How come the freeze only happens on -some- machines (they all have the same CPU rev),
and why does the time they stay up depend on the machine?
If the freeze were reproducible without CONFIG_IPIPE, we could easily say that it is simply
a hardware bug, but unfortunately this is not the case.

One new piece of information: the machine that freezes the most also freezes with the FEC Ethernet unplugged.
All these machines work fine with CONFIG_IPIPE disabled.

Regards
Thierry




* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-07  9:48                 ` tbultel
@ 2014-11-07  9:52                   ` Gilles Chanteperdrix
  2014-11-07  9:59                     ` Gilles Chanteperdrix
  2014-11-07 12:47                     ` tbultel
  0 siblings, 2 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-07  9:52 UTC (permalink / raw)
  To: tbultel; +Cc: xenomai

On Fri, Nov 07, 2014 at 10:48:43AM +0100, tbultel@free.fr wrote:
> Gilles,
> we agree that since we have an r2p10, erratum 754327 does not apply.
> Thus the only errata workarounds I was missing are CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419,
> which are in ./arch/arm/configs/imx6_defconfig
> They are now part of my config.
> Unfortunately, the network stress test still makes the freeze happen with CONFIG_IPIPE enabled.
> 
> How come the freeze only happens on -some- machines (they all have the same CPU rev),
> and why does the time they stay up depend on the machine?
> If the freeze were reproducible without CONFIG_IPIPE, we could easily say that it is simply
> a hardware bug, but unfortunately this is not the case.
> 
> One new piece of information: the machine that freezes the most also freezes with the FEC Ethernet unplugged.
> All these machines work fine with CONFIG_IPIPE disabled.

Well, you told me that you had freezes because of the mb() in the
FEC code. All that I can tell you is that the bug I know of related
to mb() would probably be fixed by adding nops before the mb(). It is
not clear to me: have you tried that?
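
To make the suggestion concrete, here is a userspace sketch of what "5
nops before the mb()" could look like. It is not the I-pipe patch: the
struct bufdesc_sketch type, the mb_sketch() macro (a compiler builtin
standing in for the kernel's mb()) and the _sketch function name are all
stand-ins invented for this example, and it assumes a GCC-compatible
compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the FEC buffer descriptor: status word only. */
struct bufdesc_sketch {
	volatile unsigned short cbd_sc;
};

/* Userspace stand-in for the kernel's mb(): a full memory barrier builtin. */
#define mb_sketch() __sync_synchronize()

static inline unsigned short bufdesc_read_status_sketch(struct bufdesc_sketch *bdp)
{
	assert(bdp != NULL);
	/* The "magic 5 nops" placed in front of the barrier. */
	__asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop\n\tnop");
	mb_sketch();
	return bdp->cbd_sc;
}
```

In the kernel the same idea would apply to the existing
bufdesc_read_status(), inside the CONFIG_ARCH_MX6 block that already
wraps the mb() call.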

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-07  9:52                   ` Gilles Chanteperdrix
@ 2014-11-07  9:59                     ` Gilles Chanteperdrix
  2014-11-07 12:47                     ` tbultel
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-07  9:59 UTC (permalink / raw)
  To: tbultel; +Cc: xenomai

On Fri, Nov 07, 2014 at 10:52:22AM +0100, Gilles Chanteperdrix wrote:
> On Fri, Nov 07, 2014 at 10:48:43AM +0100, tbultel@free.fr wrote:
> > > Gilles,
> > > we agree that since we have an r2p10, erratum 754327 does not apply.

Nothing says that the starvation due to calling mb() in a tight loop
is related to erratum 754327; it just happens that the workaround for
this erratum triggers this bug, because it results in calling mb() in
a tight loop.

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-07  9:52                   ` Gilles Chanteperdrix
  2014-11-07  9:59                     ` Gilles Chanteperdrix
@ 2014-11-07 12:47                     ` tbultel
  2014-11-07 19:58                       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-07 12:47 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: tbultel@free.fr
> Cc: xenomai@xenomai.org, "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
> Sent: Friday, 7 November 2014 10:52:22
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> Well, you told me that you had freezes because of the mb() in the
> FEC code, all that I can tell you is that the bug I know related to
> mb() would probably be fixed by adding nops before the mb(). It is
> not clear to me, have you tried that?
> 
> --

The freeze happens faster with the mb(), yes.
But it is still there without it, or when adding the 5 nops before it.
And if the Ethernet is unplugged (which normally means the code
we mentioned is not called), we have the bug, too.
I have just run a test with a USB Ethernet adapter and it freezes the same way.

Thierry



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-07 12:47                     ` tbultel
@ 2014-11-07 19:58                       ` Gilles Chanteperdrix
  2014-11-09 17:48                         ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-07 19:58 UTC (permalink / raw)
  To: tbultel; +Cc: xenomai

On Fri, Nov 07, 2014 at 01:47:59PM +0100, tbultel@free.fr wrote:
> > Well, you told me that you had freezes because of the mb() in the
> > FEC code, all that I can tell you is that the bug I know related to
> > mb() would probably be fixed by adding nops before the mb(). It is
> > not clear to me, have you tried that?
> > 
> > --
> 
> The freeze happens faster with the mb(), yes.
> But it is still there without it, or when adding the 5 nops before it.
> And if the Ethernet is unplugged (which normally means the code
> we mentioned is not called), we have the bug, too.
> I have just run a test with a USB Ethernet adapter and it freezes the same way.

When the freeze happens, is the timer still ticking? Have you
checked that all the tricks in the idle function are disabled, in
particular the switch to timer broadcast mode? Have you tried
enabling the I-pipe and Xenomai debug options?

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-07 19:58                       ` Gilles Chanteperdrix
@ 2014-11-09 17:48                         ` Thierry Bultel
  2014-11-10 12:36                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-09 17:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

On 07/11/2014 20:58, Gilles Chanteperdrix wrote:
> On Fri, Nov 07, 2014 at 01:47:59PM +0100, tbultel@free.fr wrote:
>> The freeze happens faster with the mb(), yes.
>> But it is still there without it, or when adding the 5 nops before it.
>> And if the Ethernet is unplugged (which normally means the code
>> we mentioned is not called), we have the bug, too.
>> I have just run a test with a USB Ethernet adapter and it freezes the same way.
>
> When the freeze happens, is the timer still ticking?

I will attempt some LED debugging next week, because I do not
have a JTAG probe yet.

> Have you
> checked that all the tricks in the idle function are disabled, in
> particular the switch to timer broadcast mode?

Could you please be more specific?

Since you mention timer broadcast: I do not know if you
remember, but in a previous mail I said that I saw strange behaviour
in the statistics of /proc/interrupts.

The 'iMX Timer Tick' interrupt, which is handled on CPU0,
increases its counter very slowly, less than once per minute.
We did not pay much attention to it.
I see in /proc/timer_list that its handler is tick_handle_oneshot_broadcast.

Could that be related?
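
One way to put a number on that slow counter is to total the per-CPU
columns of the timer line in /proc/interrupts at two points in time and
compare. The sketch below only shows the line parsing; the function name
irq_line_total is made up for this example, and it assumes the usual
"label: count count ... chip name" layout.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters of one /proc/interrupts line, for instance
 * " 87:   12   0   0   0   GIC  iMX Timer Tick".
 * Parsing stops at the first column that does not start with a digit,
 * which is assumed to be the interrupt controller or handler name.
 * Returns 0 if the line has no "label:" prefix.
 */
static unsigned long irq_line_total(const char *line)
{
	const char *p;
	unsigned long total = 0;

	assert(line != NULL);
	p = strchr(line, ':');
	if (p == NULL)
		return 0;
	p++;
	for (;;) {
		char *end;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;
		total += strtoul(p, &end, 10);
		p = end;
	}
	return total;
}
```

Reading the file twice, a known interval apart, and subtracting the two
totals gives the tick rate actually seen for that interrupt.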

Also, one of our applications runs in the Linux domain (not linked with
Xenomai), and uses clock_nanosleep to be woken up every 30 ms.
We initially used CONFIG_NO_HZ, and found out that it sometimes took
up to 200 ms to be woken up. LTTng showed that it was not a preemption,
and that the thread was really sched-switched, but that it got the CPU
only after the next incoming interrupt, for instance a network one.
Again, I probably should have looked deeper to understand why, but
the workaround of using CONFIG_HZ=1000 did it (which I guess hides the
bug, but means that the thread only loses 1 ms in the worst case).
I wonder whether that bug could be another symptom or not.


Have you tried
> enabling I-pipe and xenomai debugs?
>
This is my next step

Regards
Thierry


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-09 17:48                         ` Thierry Bultel
@ 2014-11-10 12:36                           ` Gilles Chanteperdrix
  2014-11-11 19:57                             ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 12:36 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Sun, Nov 09, 2014 at 06:48:59PM +0100, Thierry Bultel wrote:
> Le 07/11/2014 20:58, Gilles Chanteperdrix a écrit :
> >On Fri, Nov 07, 2014 at 01:47:59PM +0100, tbultel@free.fr wrote:
> >>
> >>
> >>----- Mail original -----
> >>>De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> >>>À: tbultel@free.fr
> >>>Cc: xenomai@xenomai.org, "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
> >>>Envoyé: Vendredi 7 Novembre 2014 10:52:22
> >>>Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> >>>
> >>>On Fri, Nov 07, 2014 at 10:48:43AM +0100, tbultel@free.fr wrote:
> >>>>
> >>>>
> >>>>----- Mail original -----
> >>>>>De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> >>>>>À: "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
> >>>>>Cc: tbultel@free.fr, xenomai@xenomai.org
> >>>>>Envoyé: Jeudi 6 Novembre 2014 17:08:21
> >>>>>Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> >>>>>adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> >>>>>
> >>>>>On Thu, Nov 06, 2014 at 11:04:57AM -0500, Lennart Sorensen wrote:
> >>>>>>On Thu, Nov 06, 2014 at 03:41:47PM +0100, tbultel@free.fr
> >>>>>>wrote:
> >>>>>>>Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled
> >>>>>>>It is -not- enabled in the evaluation kernel that is provided
> >>>>>>>by
> >>>>>>>the
> >>>>>>>manufacturer.
> >>>>>>>That errata is said to be for CPU revs < r2p0
> >>>>>>>
> >>>>>>>I am a little bit puzzled about the naming conventions for
> >>>>>>>the
> >>>>>>>CPU revision,
> >>>>>>>uboot says rev1.2, the kernel says
> >>>>>>>
> >>>>>>>Processor	: ARMv7 Processor rev 10 (v7l)
> >>>>>>>...
> >>>>>>>
> >>>>>>>CPU implementer	: 0x41
> >>>>>>>CPU architecture: 7
> >>>>>>>CPU variant	: 0x2
> >>>>>>>CPU part	: 0xc09
> >>>>>>>CPU revision	: 10
> >>>>>>>
> >>>>>>>how can I do the matching ?
> >>>>>>>
> >>>>>>>Meanwhile, we noticed that compared to the evaluation kernel,
> >>>>>>>we
> >>>>>>>were missing
> >>>>>>>CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419
> >>>>>>>
> >>>>>>>Adding them helps a lot, but the freeze still happens on one
> >>>>>>>machine
> >>>>>>>
> >>>>>>>We are currently trying with 754322 + 769419 + 754327 + 5
> >>>>>>>nops in
> >>>>>>>fec ...
> >>>>>>>but not sure if we need 754327.
> >>>>>>>
> >>>>>>>Regards
> >>>>>>>Thierry
> >>>>>>>
> >>>>>>>PS: Regarding the thermal issue, we have changed our
> >>>>>>>supplier, we
> >>>>>>>now have
> >>>>>>>a dissipator that is big enough (it is the AMOS820 from Via
> >>>>>>>Embedded)
> >>>>>>
> >>>>>>I am not sure how to read the A9 revision.  I have this on a
> >>>>>>system:
> >>>>>
> >>>>>I have found the method I gave in ARM documentation. I am pretty
> >>>>>sure this is how it works.
> >>>>>
> >>>>>--
> >>>>>					    Gilles.
> >>>>>
> >>>>
> >>>>
> >>>>Gille,
> >>>>we agree that as we have a r2p10, the 754327 does not apply.
> >>>>Thus the only erratas I was missing are CONFIG_ARM_ERRATA_754322
> >>>>and CONFIG_PL310_ERRATA_769419,
> >>>>that are in ./arch/arm/configs/imx6_defconfig
> >>>>They are now part of my config.
> >>>>Unfortunately, the network stress test still makes the freeze
> >>>>happen with CONFIG_IPIPE enabled
> >>>>
> >>>>How come can that freeze only happen on -some- machines (they all
> >>>>have the same CPU rev),
> >>>>and that the time they stay up is dependent on them ?
> >>>>If the freeze was reproducible without CONFIG_IPIPE, we could
> >>>>easily say that it is simply
> >>>>an hardware bug but unfortunately with is not the case.
> >>>>
> >>>>A new info: the machine that freezes the most also freezes with
> >>>>ethernet fec unplugged.
> >>>>All these machines work fine with CONFIG_IPIPE disabled.
> >>>
> >>>Well, you told me that you had freezes because of the mb() in the
> >>>FEC code, all that I can tell you is that the bug I know related to
> >>>mb() would probably be fixed by adding nops before the mb(). It is
> >>>not clear to me, have you tried that?
> >>>
> >>>--
> >>
> >>The freeze happens faster with he mb(), yes.
> >>But it is still there without it, or when adding the 5 nops before.
> >>And if the ethernet is unplugged (which normally leads to the code
> >>we mention not to be called),  we have the bug, too.
> >>I have just made a test with a ethernet on USB adapter and it freezes the same way.
> >
> >When the freeze happens, is the timer still ticking?
> 
> I will attempt to do some led debugging by next week, because I do
> not have a JTAG yet

You can use printascii in the timer interrupt acknowledge routine to
print a character every HZ ticks, this will give bad latency, but
should work.

> 
> Have you
> >checked that all the tricks in the idle function are disabled, in
> >particular the switch to timer broadcast mode?
> 
> Could you please be more specific ?

On imx6, as on all Cortex-A9, Xenomai uses the TWD timers as local
timers. Imx6 can be configured so that TWD interrupts do not wake up
a processor from WFI. So, the idle routine switches to "broadcast
mode", that is, it disables the local timers and gets another timer to
send IPIs to all CPUs when ticking. Since Xenomai relies on local
timers only, this breaks Xenomai. So, we try to avoid that by
setting enable_wait_mode to false in arch/arm/mach-mx6/cpu.c and
putting a BUG() in the function which switches to broadcast mode,
just in case it is invoked another way.


> 
> But as you are talking about timer broadcast, I do not know if you
> remember, but in a previous mail, I said that I saw strange
> behaviour
> in the statistics of /proc/interrupts.
> 
> The 'iMX Timer Tick' interrupt, which is executed on CPU0,
> increases its counter very slowly, less than 1 per minute.
> We did not pay too much attention to it.
> I see in /proc/timer_list that its handler is tick_handle_oneshot_broadcast
> 
> Could that be related ?

If this is indeed the broadcast timer, it should never tick, because
we should never switch to broadcast mode.

> 
> Also, one of our application runs in linux domain (not linked with
> xenomai), and uses clock_nanosleep to be woken up each 30 ms.
> We initially used CONFIG_NO_HZ, and found out that sometimes it took
> up to 200ms to be woken up. LTTng showed that it was not a
> preemption, and that the thread was really sched-switched, but that
> it took the CPU only after the next coming interrupt, for instance a
> network one.
> Again, I probably should have looked deeper to understand why, but
> the workaround of using CONFIG_HZ=1000 did it (which I guess hides the
> bug, but makes that the thread only looses 1 ms in the worse case)
> I wonder if that bug could be another symptom or not.

This seems to be something different. This usually happens when a
scheduling of the softirqs at the end of irqs is missing. If you can
obtain a trace with the I-pipe tracer between the moment the timer
ticks and the moment the task is really scheduled, we can probably
find where the softirqs are missing.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-10 12:36                           ` Gilles Chanteperdrix
@ 2014-11-11 19:57                             ` Thierry Bultel
  2014-11-11 20:03                               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-11 19:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> On Sun, Nov 09, 2014 at 06:48:59PM +0100, Thierry Bultel wrote:
>> Le 07/11/2014 20:58, Gilles Chanteperdrix a écrit :
>>> On Fri, Nov 07, 2014 at 01:47:59PM +0100, tbultel@free.fr wrote:
>>>>
>>>>
>>>> ----- Mail original -----
>>>>> De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
>>>>> À: tbultel@free.fr
>>>>> Cc: xenomai@xenomai.org, "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
>>>>> Envoyé: Vendredi 7 Novembre 2014 10:52:22
>>>>> Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
>>>>>
>>>>> On Fri, Nov 07, 2014 at 10:48:43AM +0100, tbultel@free.fr wrote:
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>>> De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
>>>>>>> À: "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
>>>>>>> Cc: tbultel@free.fr, xenomai@xenomai.org
>>>>>>> Envoyé: Jeudi 6 Novembre 2014 17:08:21
>>>>>>> Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
>>>>>>> adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
>>>>>>>
>>>>>>> On Thu, Nov 06, 2014 at 11:04:57AM -0500, Lennart Sorensen wrote:
>>>>>>>> On Thu, Nov 06, 2014 at 03:41:47PM +0100, tbultel@free.fr
>>>>>>>> wrote:
>>>>>>>>> Gilles, we do not have CONFIG_ARM_ERRATA_754327 enabled
>>>>>>>>> It is -not- enabled in the evaluation kernel that is provided
>>>>>>>>> by
>>>>>>>>> the
>>>>>>>>> manufacturer.
>>>>>>>>> That errata is said to be for CPU revs < r2p0
>>>>>>>>>
>>>>>>>>> I am a little bit puzzled about the naming conventions for
>>>>>>>>> the
>>>>>>>>> CPU revision,
>>>>>>>>> uboot says rev1.2, the kernel says
>>>>>>>>>
>>>>>>>>> Processor	: ARMv7 Processor rev 10 (v7l)
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> CPU implementer	: 0x41
>>>>>>>>> CPU architecture: 7
>>>>>>>>> CPU variant	: 0x2
>>>>>>>>> CPU part	: 0xc09
>>>>>>>>> CPU revision	: 10
>>>>>>>>>
>>>>>>>>> how can I do the matching ?
>>>>>>>>>
>>>>>>>>> Meanwhile, we noticed that compared to the evaluation kernel,
>>>>>>>>> we
>>>>>>>>> were missing
>>>>>>>>> CONFIG_ARM_ERRATA_754322 and CONFIG_PL310_ERRATA_769419
>>>>>>>>>
>>>>>>>>> Adding them helps a lot, but the freeze still happens on one
>>>>>>>>> machine
>>>>>>>>>
>>>>>>>>> We are currently trying with 754322 + 769419 + 754327 + 5
>>>>>>>>> nops in
>>>>>>>>> fec ...
>>>>>>>>> but not sure if we need 754327.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Thierry
>>>>>>>>>
>>>>>>>>> PS: Regarding the thermal issue, we have changed our
>>>>>>>>> supplier, we
>>>>>>>>> now have
>>>>>>>>> a dissipator that is big enough (it is the AMOS820 from Via
>>>>>>>>> Embedded)
>>>>>>>>
>>>>>>>> I am not sure how to read the A9 revision.  I have this on a
>>>>>>>> system:
>>>>>>>
>>>>>>> I have found the method I gave in ARM documentation. I am pretty
>>>>>>> sure this is how it works.
>>>>>>>
>>>>>>> --
>>>>>>> 					    Gilles.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Gille,
>>>>>> we agree that as we have a r2p10, the 754327 does not apply.
>>>>>> Thus the only erratas I was missing are CONFIG_ARM_ERRATA_754322
>>>>>> and CONFIG_PL310_ERRATA_769419,
>>>>>> that are in ./arch/arm/configs/imx6_defconfig
>>>>>> They are now part of my config.
>>>>>> Unfortunately, the network stress test still makes the freeze
>>>>>> happen with CONFIG_IPIPE enabled
>>>>>>
>>>>>> How come can that freeze only happen on -some- machines (they all
>>>>>> have the same CPU rev),
>>>>>> and that the time they stay up is dependent on them ?
>>>>>> If the freeze was reproducible without CONFIG_IPIPE, we could
>>>>>> easily say that it is simply
>>>>>> an hardware bug but unfortunately with is not the case.
>>>>>>
>>>>>> A new info: the machine that freezes the most also freezes with
>>>>>> ethernet fec unplugged.
>>>>>> All these machines work fine with CONFIG_IPIPE disabled.
>>>>>
>>>>> Well, you told me that you had freezes because of the mb() in the
>>>>> FEC code, all that I can tell you is that the bug I know related to
>>>>> mb() would probably be fixed by adding nops before the mb(). It is
>>>>> not clear to me, have you tried that?
>>>>>
>>>>> --
>>>>
>>>> The freeze happens faster with he mb(), yes.
>>>> But it is still there without it, or when adding the 5 nops before.
>>>> And if the ethernet is unplugged (which normally leads to the code
>>>> we mention not to be called),  we have the bug, too.
>>>> I have just made a test with a ethernet on USB adapter and it freezes the same way.
>>>
>>> When the freeze happens, is the timer still ticking?
>>
>> I will attempt to do some led debugging by next week, because I do
>> not have a JTAG yet
>
> You can use printascii in the timer interrupt acknowledge routine to
> print a character every HZ ticks, this will give bad latency, but
> should work.
>

For some unknown reason, the kernel gets stuck after
"console [tty0] enabled, bootconsole disabled" if I use printascii in
do_local_timer().
earlyprintk seems broken as well.

>>
>> Have you
>>> checked that all the tricks in the idle function are disabled, in
>>> particular the switch to timer broadcast mode?
>>
>> Could you please be more specific ?
>
> On imx6, as on all cortex a9, Xenomai uses twd timers as local
> timers. Imx6 can be configured so that twd interrupts do not wake up
> a processor from wfi. So, the idle routine switches to "broadcast
> mode", that is disables the local timers, and gets another timer to
> send ipis to all cups when ticking. Since xenomai relies on local
> timers only, this breaks xenomai. So, we try to avoid that, by
> setting enable_wait_mode to false in arch/arm/mach-mx6/cpu.c and
> putting a BUG() in the function which switches to broadcast mode,
> just in case it is invoked another way.
>
>
>>
>> But as you are talking about timer broadcast, I do not know if you
>> remember, but in a previous mail, I said that I saw strange
>> behaviour
>> in the statistics of /proc/interrupts.
>>
>> The 'iMX Timer Tick' interrupt, which is executed on CPU0,
>> increases its counter very slowly, less than 1 per minute.
>> We did not pay too much attention to it.
>> I see in /proc/timer_list that its handler is tick_handle_oneshot_broadcast
>>
>> Could that be related ?
>
> If this is indeed the broadcast timer, it should never tick, because
> we should never switch to broadcast mode.

I have found out why it was ticking.
This is due to tick_broadcast_switch_to_oneshot() in 
kernel/time/tick-broadcast.c

This switches the timer to oneshot mode, and leads to a call of
mxc_set_mode()

In that function, there is this comment:
	if (mode != clockevent_mode) {
		/* Set event time into far-far future */
		if (timer_is_v2())

... and I estimate "far-far future" to be about 20 minutes.

As a correction, I have made that change to 
tick_broadcast_switch_to_oneshot():

@@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
  {
         int cpu = smp_processor_id();

+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+       printk(KERN_ALERT "%s cpu %d -> dev %s IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
+       return;
+#endif

... and that does the job: the iMX Timer is no longer armed.
What do you think about it?

Still currently stress-testing to see if things are getting better.

>
>>
>> Also, one of our application runs in linux domain (not linked with
>> xenomai), and uses clock_nanosleep to be woken up each 30 ms.
>> We initially used CONFIG_NO_HZ, and found out that sometimes it took
>> up to 200ms to be woken up. LTTng showed that it was not a
>> preemption, and that the thread was really sched-switched, but that
>> it took the CPU only after the next coming interrupt, for instance a
>> network one.
>> Again, I probably should have looked deeper to understand why, but
>> the workaround of using CONFIG_HZ=1000 did it (which I guess hides the
>> bug, but makes that the thread only looses 1 ms in the worse case)
>> I wonder if that bug could be another symptom or not.
>
> This seems to be something different. This usually happens when a
> scheduling of the softirqs at the end of irqs is missing. If you can
> obtain a trace with the I-pipe tracer between the moment the timer
> ticks and the moment the task is really scheduled, we can probably
> find where the softirqs are missing.
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-11 19:57                             ` Thierry Bultel
@ 2014-11-11 20:03                               ` Gilles Chanteperdrix
  2014-11-12 13:17                                 ` Thierry Bultel
  2014-11-13 14:44                                 ` tbultel
  0 siblings, 2 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-11 20:03 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> >You can use printascii in the timer interrupt acknowledge routine to
> >print a character every HZ ticks, this will give bad latency, but
> >should work.
> >
> 
> For unknown reason, the kernel gets stuck after
> "console [tty0] enabled, bootconsole disabled" if I use printascii
> in do_local_timer().
> earlyprintk seems broken as well.

Without doing this, does earlyprintk work?

> >If this is indeed the broadcast timer, it should never tick, because
> >we should never switch to broadcast mode.
> 
> I have found out why it was ticking.
> This is due to tick_broadcast_switch_to_oneshot() in
> kernel/time/tick-broadcast.c
> 
> This sets the oneshot mode to the time, and leads to a call of
> mxc_set_mode()
> 
> In that function, there is that comment:
> 	if (mode != clockevent_mode) {
> 		/* Set event time into far-far future */
> 		if (timer_is_v2())
> 
> ... and I estimate "far-far future" to be about 20 minutes.
> 
> As a correction, I have made that change to
> tick_broadcast_switch_to_oneshot():
> 
> @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
> clock_event_device *bc)
>  {
>         int cpu = smp_processor_id();
> 
> +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> +       printk(KERN_ALERT "%s cpu %d -> dev %s
> IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> +       return;
> +#endif
> 
> ... and that makes the job, the iMX Timer is no longer armed.
> What do you think about it ?
> 
> Still currently stress-testing to see if things are getting better.

I am afraid this should not change anything. This timer ticking is
not a problem by itself, it is a problem if the twd gets disabled.

Note that we discovered in another thread that CONFIG_TRACE_IRQFLAGS
should not be enabled. So, if it is enabled, you should disable it.

Also, are you running with all I-pipe and Xenomai debugs?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-11 20:03                               ` Gilles Chanteperdrix
@ 2014-11-12 13:17                                 ` Thierry Bultel
  2014-11-12 13:34                                   ` Gilles Chanteperdrix
  2014-11-13 14:44                                 ` tbultel
  1 sibling, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-12 13:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

Le 11/11/2014 21:03, Gilles Chanteperdrix a écrit :
> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
>> Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
>>> You can use printascii in the timer interrupt acknowledge routine to
>>> print a character every HZ ticks, this will give bad latency, but
>>> should work.
>>>
>>
>> For unknown reason, the kernel gets stuck after
>> "console [tty0] enabled, bootconsole disabled" if I use printascii
>> in do_local_timer().
>> earlyprintk seems broken as well.
>
> Without doing this, does earlyprintk work?

No, it does not. In fact, this kernel behaves strangely with early debug.
Namely, even without earlyprintk, when it disables the
bootconsole to switch to the normal one, it re-prints everything that was
printed before, which makes it look as if it had restarted from the
beginning.

I confirm that calling printascii in do_local_timer() leads to a kernel
panic. Same thing if I use __ipipe_serial_debug instead.
I have used a counter (one per CPU) to start logging after 30000 ticks,
and it crashes after that delay.

>
>>> If this is indeed the broadcast timer, it should never tick, because
>>> we should never switch to broadcast mode.
>>
>> I have found out why it was ticking.
>> This is due to tick_broadcast_switch_to_oneshot() in
>> kernel/time/tick-broadcast.c
>>
>> This sets the oneshot mode to the time, and leads to a call of
>> mxc_set_mode()
>>
>> In that function, there is that comment:
>> 	if (mode != clockevent_mode) {
>> 		/* Set event time into far-far future */
>> 		if (timer_is_v2())
>>
>> ... and I estimate "far-far future" to be about 20 minutes.
>>
>> As a correction, I have made that change to
>> tick_broadcast_switch_to_oneshot():
>>
>> @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
>> clock_event_device *bc)
>>   {
>>          int cpu = smp_processor_id();
>>
>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
>> +       printk(KERN_ALERT "%s cpu %d -> dev %s
>> IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
>> +       return;
>> +#endif
>>
>> ... and that makes the job, the iMX Timer is no longer armed.
>> What do you think about it ?
>>
>> Still currently stress-testing to see if things are getting better.
>
> I am afraid this should not change anything. This timer ticking is
> not a problem by itself, it is a problem if the twd gets disabled.
>
> Note that we discovered in another thread that CONFIG_TRACE_IRQFLAGS
> should not be enabled. So, if it is enabled, you should disable it.
>
> Also, are you running with all I-pipe and Xenomai debugs?

That is how I am running now. For now, I am unable to reproduce the 
freeze. Still testing.

>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 13:17                                 ` Thierry Bultel
@ 2014-11-12 13:34                                   ` Gilles Chanteperdrix
  2014-11-12 14:27                                     ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 13:34 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
> Le 11/11/2014 21:03, Gilles Chanteperdrix a écrit :
> >On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> >>Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> >>>You can use printascii in the timer interrupt acknowledge routine to
> >>>print a character every HZ ticks, this will give bad latency, but
> >>>should work.
> >>>
> >>
> >>For unknown reason, the kernel gets stuck after
> >>"console [tty0] enabled, bootconsole disabled" if I use printascii
> >>in do_local_timer().
> >>earlyprintk seems broken as well.
> >
> >Without doing this, does earlyprintk work?
> 
> No it does not. In fact, this kernel is strange with early debug.
> Namely, even without earlyprintk, when it comes to disable the
> bootconsole to use the normal one, it uses then re-prints everything
> was printed before, making think that it has restarted from the
> beginning.
> 
> I confirm that calling printascii in do_local_timer() leads to a
> kernel panic. Same thing if I use __ipipe_serial_debug instead.
> I have used a counter (one per cpu) to start logging after 30000
> ticks and it crashes after that delay.

Having looked at the sources, I do not find a debug-macro.S for
imx6. So, I doubt printascii can work at all. Maybe a first step is
to implement this missing support.

> >Also, are you running with all I-pipe and Xenomai debugs?
> 
> That is how I am running now. For now, I am unable to reproduce the
> freeze. Still testing.

Bad news...

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 13:34                                   ` Gilles Chanteperdrix
@ 2014-11-12 14:27                                     ` Thierry Bultel
  2014-11-12 14:30                                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-12 14:27 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

Le 12/11/2014 14:34, Gilles Chanteperdrix a écrit :
> On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
>> Le 11/11/2014 21:03, Gilles Chanteperdrix a écrit :
>>> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
>>>> Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
>>>>> You can use printascii in the timer interrupt acknowledge routine to
>>>>> print a character every HZ ticks, this will give bad latency, but
>>>>> should work.
>>>>>
>>>>
>>>> For unknown reason, the kernel gets stuck after
>>>> "console [tty0] enabled, bootconsole disabled" if I use printascii
>>>> in do_local_timer().
>>>> earlyprintk seems broken as well.
>>>
>>> Without doing this, does earlyprintk work?
>>
>> No it does not. In fact, this kernel is strange with early debug.
>> Namely, even without earlyprintk, when it comes to disable the
>> bootconsole to use the normal one, it uses then re-prints everything
>> was printed before, making think that it has restarted from the
>> beginning.
>>
>> I confirm that calling printascii in do_local_timer() leads to a
>> kernel panic. Same thing if I use __ipipe_serial_debug instead.
>> I have used a counter (one per cpu) to start logging after 30000
>> ticks and it crashes after that delay.
>
> Having looked at the sources, I do not find a debug-macro.S for
> imx6. So, I doubt printascii can work at all. Maybe a first step is
> to implement this missing support.

I think that the implementation is in
arch/arm/plat-mxc/include/mach/debug-macro.S
>
>>> Also, are you running with all I-pipe and Xenomai debugs?
>>
>> That is how I am running now. For now, I am unable to reproduce the
>> freeze. Still testing.
>
> Bad news...
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 14:27                                     ` Thierry Bultel
@ 2014-11-12 14:30                                       ` Gilles Chanteperdrix
  2014-11-12 15:20                                         ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 14:30 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
> Le 12/11/2014 14:34, Gilles Chanteperdrix a écrit :
> >On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
> >>Le 11/11/2014 21:03, Gilles Chanteperdrix a écrit :
> >>>On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> >>>>Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> >>>>>You can use printascii in the timer interrupt acknowledge routine to
> >>>>>print a character every HZ ticks, this will give bad latency, but
> >>>>>should work.
> >>>>>
> >>>>
> >>>>For unknown reason, the kernel gets stuck after
> >>>>"console [tty0] enabled, bootconsole disabled" if I use printascii
> >>>>in do_local_timer().
> >>>>earlyprintk seems broken as well.
> >>>
> >>>Without doing this, does earlyprintk work?
> >>
> >>No it does not. In fact, this kernel is strange with early debug.
> >>Namely, even without earlyprintk, when it comes to disable the
> >>bootconsole to use the normal one, it uses then re-prints everything
> >>was printed before, making think that it has restarted from the
> >>beginning.
> >>
> >>I confirm that calling printascii in do_local_timer() leads to a
> >>kernel panic. Same thing if I use __ipipe_serial_debug instead.
> >>I have used a counter (one per cpu) to start logging after 30000
> >>ticks and it crashes after that delay.
> >
> >Having looked at the sources, I do not find a debug-macro.S for
> >imx6. So, I doubt printascii can work at all. Maybe a first step is
> >to implement this missing support.
> 
> I think that the implementation is in arch/arm/plat-mxc/include
> /mach/debug-macro.S

And the debug UART you are using is UART2 ?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 14:30                                       ` Gilles Chanteperdrix
@ 2014-11-12 15:20                                         ` Thierry Bultel
  2014-11-12 15:29                                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-12 15:20 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

Le 12/11/2014 15:30, Gilles Chanteperdrix a écrit :
> On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
>> Le 12/11/2014 14:34, Gilles Chanteperdrix a écrit :
>>> On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
>>>> Le 11/11/2014 21:03, Gilles Chanteperdrix a écrit :
>>>>> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
>>>>>> Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
>>>>>>> You can use printascii in the timer interrupt acknowledge routine to
>>>>>>> print a character every HZ ticks, this will give bad latency, but
>>>>>>> should work.
>>>>>>>
>>>>>>
>>>>>> For unknown reason, the kernel gets stuck after
>>>>>> "console [tty0] enabled, bootconsole disabled" if I use printascii
>>>>>> in do_local_timer().
>>>>>> earlyprintk seems broken as well.
>>>>>
>>>>> Without doing this, does earlyprintk work?
>>>>
>>>> No it does not. In fact, this kernel is strange with early debug.
>>>> Namely, even without earlyprintk, when it comes to disable the
>>>> bootconsole to use the normal one, it uses then re-prints everything
>>>> was printed before, making think that it has restarted from the
>>>> beginning.
>>>>
>>>> I confirm that calling printascii in do_local_timer() leads to a
>>>> kernel panic. Same thing if I use __ipipe_serial_debug instead.
>>>> I have used a counter (one per cpu) to start logging after 30000
>>>> ticks and it crashes after that delay.
>>>
>>> Having looked at the sources, I do not find a debug-macro.S for
>>> imx6. So, I doubt printascii can work at all. Maybe a first step is
>>> to implement this missing support.
>>
>> I think that the implementation is in arch/arm/plat-mxc/include
>> /mach/debug-macro.S
>
> And the debug UART you are using is UART2 ?
>
Yes,
and I have found out why the logs before the console switch were 
displayed twice. This is because the BSP directly calls

early_console_setup(UART2_BASE_ADDR, uart_clk);

Removing it does not make earlyprintk or printascii work any better,
unfortunately.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:20                                         ` Thierry Bultel
@ 2014-11-12 15:29                                           ` Gilles Chanteperdrix
  2014-11-12 15:44                                             ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 15:29 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 04:20:50PM +0100, Thierry Bultel wrote:
> On 12/11/2014 15:30, Gilles Chanteperdrix wrote:
> >On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
> >>On 12/11/2014 14:34, Gilles Chanteperdrix wrote:
> >>>On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
> >>>>On 11/11/2014 21:03, Gilles Chanteperdrix wrote:
> >>>>>On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> >>>>>>On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
> >>>>>>>You can use printascii in the timer interrupt acknowledge routine to
> >>>>>>>print a character every HZ ticks, this will give bad latency, but
> >>>>>>>should work.
> >>>>>>>
> >>>>>>
> >>>>>>For unknown reason, the kernel gets stuck after
> >>>>>>"console [tty0] enabled, bootconsole disabled" if I use printascii
> >>>>>>in do_local_timer().
> >>>>>>earlyprintk seems broken as well.
> >>>>>
> >>>>>Without doing this, does earlyprintk work?
> >>>>
> >>>>No it does not. In fact, this kernel is strange with early debug.
> >>>>Namely, even without earlyprintk, when it comes to disable the
> >>>>bootconsole to use the normal one, it uses then re-prints everything
> >>>>was printed before, making think that it has restarted from the
> >>>>beginning.
> >>>>
> >>>>I confirm that calling printascii in do_local_timer() leads to a
> >>>>kernel panic. Same thing if I use __ipipe_serial_debug instead.
> >>>>I have used a counter (one per cpu) to start logging after 30000
> >>>>ticks and it crashes after that delay.
> >>>
> >>>Having looked at the sources, I do not find a debug-macro.S for
> >>>imx6. So, I doubt printascii can work at all. Maybe a first step is
> >>>to implement this missing support.
> >>
> >>I think that the implementation is in arch/arm/plat-mxc/include
> >>/mach/debug-macro.S
> >
> >And the debug UART you are using is UART2 ?
> >
> Yes,
> and I have found out why the logs before the console switch were
> displayed twice. This is because the BSP directly calls
> 
> early_console_setup(UART2_BASE_ADDR, uart_clk);
> 
> removing it does not make earlyprintk or printascii work better,
> unfortunately.

Do you get the "Uncompressing kernel..." message on the console?

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:29                                           ` Gilles Chanteperdrix
@ 2014-11-12 15:44                                             ` Thierry Bultel
  2014-11-12 15:55                                               ` Gilles Chanteperdrix
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Thierry Bultel @ 2014-11-12 15:44 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

On 12/11/2014 16:29, Gilles Chanteperdrix wrote:
> On Wed, Nov 12, 2014 at 04:20:50PM +0100, Thierry Bultel wrote:
>> On 12/11/2014 15:30, Gilles Chanteperdrix wrote:
>>> On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
>>>> On 12/11/2014 14:34, Gilles Chanteperdrix wrote:
>>>>> On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
>>>>>> On 11/11/2014 21:03, Gilles Chanteperdrix wrote:
>>>>>>> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
>>>>>>>> On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
>>>>>>>>> You can use printascii in the timer interrupt acknowledge routine to
>>>>>>>>> print a character every HZ ticks, this will give bad latency, but
>>>>>>>>> should work.
>>>>>>>>>
>>>>>>>>
>>>>>>>> For unknown reason, the kernel gets stuck after
>>>>>>>> "console [tty0] enabled, bootconsole disabled" if I use printascii
>>>>>>>> in do_local_timer().
>>>>>>>> earlyprintk seems broken as well.
>>>>>>>
>>>>>>> Without doing this, does earlyprintk work?
>>>>>>
>>>>>> No it does not. In fact, this kernel is strange with early debug.
>>>>>> Namely, even without earlyprintk, when it comes to disable the
>>>>>> bootconsole to use the normal one, it uses then re-prints everything
>>>>>> was printed before, making think that it has restarted from the
>>>>>> beginning.
>>>>>>
>>>>>> I confirm that calling printascii in do_local_timer() leads to a
>>>>>> kernel panic. Same thing if I use __ipipe_serial_debug instead.
>>>>>> I have used a counter (one per cpu) to start logging after 30000
>>>>>> ticks and it crashes after that delay.
>>>>>
>>>>> Having looked at the sources, I do not find a debug-macro.S for
>>>>> imx6. So, I doubt printascii can work at all. Maybe a first step is
>>>>> to implement this missing support.
>>>>
>>>> I think that the implementation is in arch/arm/plat-mxc/include
>>>> /mach/debug-macro.S
>>>
>>> And the debug UART you are using is UART2 ?
>>>
>> Yes,
>> and I have found out why the logs before the console switch were
>> displayed twice. This is because the BSP directly calls
>>
>> early_console_setup(UART2_BASE_ADDR, uart_clk);
>>
>> removing it does not make earlyprintk or printascii work better,
>> unfortunately.
>
> Do you get the "Uncompressing kernel..." message on the console?
>
This is all I get when adding earlyprintk=serial,ttymxc1,115200 to
the bootargs:


Uncompressing Linux... done, booting the kernel.
Linux version 3.0.43_4.1.0 (localuser@thierry-desktop) (gcc version 4.7.3 (Buildroot 2014.02-rc3-g7ebc513) ) #20 SMP PREEMPT Wed Nov 12 16:10:51 CET 2014
CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c53c7d
CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine: Freescale i.MX 6Quad VAB-820 Board
bootconsole [earlycon0] enabled
Memory policy: ECC disabled, Data cache writealloc
CPU identified as i.MX6Q, silicon rev 1.2

<nothing after>



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:44                                             ` Thierry Bultel
@ 2014-11-12 15:55                                               ` Gilles Chanteperdrix
  2014-11-12 16:17                                                 ` Thierry Bultel
  2014-11-12 16:15                                               ` Gilles Chanteperdrix
  2014-11-12 18:53                                               ` Lennart Sorensen
  2 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 15:55 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 04:44:24PM +0100, Thierry Bultel wrote:
> On 12/11/2014 16:29, Gilles Chanteperdrix wrote:
> >On Wed, Nov 12, 2014 at 04:20:50PM +0100, Thierry Bultel wrote:
> >>On 12/11/2014 15:30, Gilles Chanteperdrix wrote:
> >>>On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
> >>>>On 12/11/2014 14:34, Gilles Chanteperdrix wrote:
> >>>>>On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
> >>>>>>On 11/11/2014 21:03, Gilles Chanteperdrix wrote:
> >>>>>>>On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> >>>>>>>>On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
> >>>>>>>>>You can use printascii in the timer interrupt acknowledge routine to
> >>>>>>>>>print a character every HZ ticks, this will give bad latency, but
> >>>>>>>>>should work.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>For unknown reason, the kernel gets stuck after
> >>>>>>>>"console [tty0] enabled, bootconsole disabled" if I use printascii
> >>>>>>>>in do_local_timer().
> >>>>>>>>earlyprintk seems broken as well.
> >>>>>>>
> >>>>>>>Without doing this, does earlyprintk work?
> >>>>>>
> >>>>>>No it does not. In fact, this kernel is strange with early debug.
> >>>>>>Namely, even without earlyprintk, when it comes to disable the
> >>>>>>bootconsole to use the normal one, it uses then re-prints everything
> >>>>>>was printed before, making think that it has restarted from the
> >>>>>>beginning.
> >>>>>>
> >>>>>>I confirm that calling printascii in do_local_timer() leads to a
> >>>>>>kernel panic. Same thing if I use __ipipe_serial_debug instead.
> >>>>>>I have used a counter (one per cpu) to start logging after 30000
> >>>>>>ticks and it crashes after that delay.
> >>>>>
> >>>>>Having looked at the sources, I do not find a debug-macro.S for
> >>>>>imx6. So, I doubt printascii can work at all. Maybe a first step is
> >>>>>to implement this missing support.
> >>>>
> >>>>I think that the implementation is in arch/arm/plat-mxc/include
> >>>>/mach/debug-macro.S
> >>>
> >>>And the debug UART you are using is UART2 ?
> >>>
> >>Yes,
> >>and I have found out why the logs before the console switch were
> >>displayed twice. This is because the BSP directly calls
> >>
> >>early_console_setup(UART2_BASE_ADDR, uart_clk);
> >>
> >>removing it does not make earlyprintk or printascii work better,
> >>unfortunately.
> >
> >Do you get the "Uncompressing kernel..." message on the console?
> >
> This is all I get when adding earlyprintk=serial,ttymxc1,115200
> to the bootargs:
> 
> 
> Uncompressing Linux... done, booting the kernel.
> Linux version 3.0.43_4.1.0 (localuser@thierry-desktop) (gcc version
> 4.7.3 (Buildroot 2014.02-rc3-g7ebc513) ) #20 SMP PREEMPT Wed Nov 12
> 16:10:51 CET 2014
> CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c53c7d
> CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
> Machine: Freescale i.MX 6Quad VAB-820 Board
> bootconsole [earlycon0] enabled
> Memory policy: ECC disabled, Data cache writealloc
> CPU identified as i.MX6Q, silicon rev 1.2
> 
> <nothing after>

Your problem seems to be that a static mapping for the UART2
registers is missing: without one, printascii only works as long as
the MMU is not enabled.

Please try the following patch:
diff --git a/arch/arm/mach-mx6/mm.c b/arch/arm/mach-mx6/mm.c
index ad66a94..789d265 100644
--- a/arch/arm/mach-mx6/mm.c
+++ b/arch/arm/mach-mx6/mm.c
@@ -56,6 +56,11 @@ static struct map_desc mx6_io_desc[] __initdata = {
 	.pfn = __phys_to_pfn(ARM_PERIPHBASE),
 	.length = ARM_PERIPHBASE_SIZE,
 	.type = MT_DEVICE},
+	{
+	.virtual = IMX_IO_ADDRESS(MX6Q_UART2_BASE_ADDR),
+	.pfn = __phys_to_pfn(MX6Q_UART2_BASE_ADDR),
+	.length = 4096,
+	.type = MT_DEVICE},
 };
 
 static void mx6_set_cpu_type(void)
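
As an illustration of why the missing entry matters, here is a minimal user-space sketch in C of a static I/O mapping table. It is not kernel code: `struct static_map` and `static_map_lookup()` are hypothetical names, and any addresses fed to it are made-up values rather than the real i.MX6 memory map. The idea it models is that, once the MMU is on, a peripheral register can only be reached through a virtual address that some table entry covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counterpart of a map_desc entry. */
struct static_map {
	uint32_t virt;   /* virtual base address of the window */
	uint32_t phys;   /* physical base address of the window */
	uint32_t length; /* size of the window in bytes */
};

/*
 * Translate a physical register address through the table.
 * Returns the matching virtual address, or 0 when no entry covers
 * the address, which is the situation where an access would fault.
 */
static uint32_t static_map_lookup(const struct static_map *tbl, size_t n,
				  uint32_t phys)
{
	assert(tbl != NULL);

	for (size_t i = 0; i < n; i++) {
		if (phys >= tbl[i].phys && phys - tbl[i].phys < tbl[i].length)
			return tbl[i].virt + (phys - tbl[i].phys);
	}
	return 0;
}
```

With a table that lacks the UART window, the lookup for a UART register returns 0; adding a 4096-byte entry for that window, as the patch does for UART2, makes the same lookup succeed.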


-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:44                                             ` Thierry Bultel
  2014-11-12 15:55                                               ` Gilles Chanteperdrix
@ 2014-11-12 16:15                                               ` Gilles Chanteperdrix
  2014-11-12 18:53                                               ` Lennart Sorensen
  2 siblings, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 16:15 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 04:44:24PM +0100, Thierry Bultel wrote:
> CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c53c7d

BTW:
41 2 fc09 a
so this is an ARM Cortex-A9 r2p10.
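
The split of the ID value above can be reproduced with a few shifts and masks. The following is a small sketch in C, assuming the usual ARMv7 Main ID Register layout (implementer, variant, architecture, part number, revision); `struct midr_fields` and `midr_decode()` are names chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the ARM Main ID Register (MIDR) on ARMv7 cores. */
struct midr_fields {
	unsigned int implementer;  /* bits [31:24], 0x41 is ARM Ltd. */
	unsigned int variant;      /* bits [23:20], the N in "rN" */
	unsigned int architecture; /* bits [19:16], 0xf on ARMv7 */
	unsigned int part;         /* bits [15:4], 0xc09 is Cortex-A9 */
	unsigned int revision;     /* bits [3:0], the N in "pN" */
};

/* Split a raw MIDR value into its fields. */
static void midr_decode(uint32_t midr, struct midr_fields *out)
{
	assert(out != NULL);

	out->implementer  = (midr >> 24) & 0xff;
	out->variant      = (midr >> 20) & 0xf;
	out->architecture = (midr >> 16) & 0xf;
	out->part         = (midr >> 4) & 0xfff;
	out->revision     = midr & 0xf;
}
```

Feeding it 0x412fc09a, the value printed in the boot log, gives implementer 0x41, variant 2, part 0xc09 and revision 10, hence r2p10.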

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:55                                               ` Gilles Chanteperdrix
@ 2014-11-12 16:17                                                 ` Thierry Bultel
  0 siblings, 0 replies; 46+ messages in thread
From: Thierry Bultel @ 2014-11-12 16:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

On 12/11/2014 16:55, Gilles Chanteperdrix wrote:
> On Wed, Nov 12, 2014 at 04:44:24PM +0100, Thierry Bultel wrote:
>> On 12/11/2014 16:29, Gilles Chanteperdrix wrote:
>>> On Wed, Nov 12, 2014 at 04:20:50PM +0100, Thierry Bultel wrote:
>>>> On 12/11/2014 15:30, Gilles Chanteperdrix wrote:
>>>>> On Wed, Nov 12, 2014 at 03:27:11PM +0100, Thierry Bultel wrote:
>>>>>> On 12/11/2014 14:34, Gilles Chanteperdrix wrote:
>>>>>>> On Wed, Nov 12, 2014 at 02:17:11PM +0100, Thierry Bultel wrote:
>>>>>>>> On 11/11/2014 21:03, Gilles Chanteperdrix wrote:
>>>>>>>>> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
>>>>>>>>>> On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
>>>>>>>>>>> You can use printascii in the timer interrupt acknowledge routine to
>>>>>>>>>>> print a character every HZ ticks, this will give bad latency, but
>>>>>>>>>>> should work.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For unknown reason, the kernel gets stuck after
>>>>>>>>>> "console [tty0] enabled, bootconsole disabled" if I use printascii
>>>>>>>>>> in do_local_timer().
>>>>>>>>>> earlyprintk seems broken as well.
>>>>>>>>>
>>>>>>>>> Without doing this, does earlyprintk work?
>>>>>>>>
>>>>>>>> No it does not. In fact, this kernel is strange with early debug.
>>>>>>>> Namely, even without earlyprintk, when it comes to disable the
>>>>>>>> bootconsole to use the normal one, it uses then re-prints everything
>>>>>>>> was printed before, making think that it has restarted from the
>>>>>>>> beginning.
>>>>>>>>
>>>>>>>> I confirm that calling printascii in do_local_timer() leads to a
>>>>>>>> kernel panic. Same thing if I use __ipipe_serial_debug instead.
>>>>>>>> I have used a counter (one per cpu) to start logging after 30000
>>>>>>>> ticks and it crashes after that delay.
>>>>>>>
>>>>>>> Having looked at the sources, I do not find a debug-macro.S for
>>>>>>> imx6. So, I doubt printascii can work at all. Maybe a first step is
>>>>>>> to implement this missing support.
>>>>>>
>>>>>> I think that the implementation is in arch/arm/plat-mxc/include
>>>>>> /mach/debug-macro.S
>>>>>
>>>>> And the debug UART you are using is UART2 ?
>>>>>
>>>> Yes,
>>>> and I have found out why the logs before the console switch were
>>>> displayed twice. This is because the BSP directly calls
>>>>
>>>> early_console_setup(UART2_BASE_ADDR, uart_clk);
>>>>
>>>> removing it does not make earlyprintk or printascii work better,
>>>> unfortunately.
>>>
>>> Do you get the "Uncompressing kernel..." message on the console?
>>>
>> This is all I get when adding earlyprintk=serial,ttymxc1,115200
>> to the bootargs:
>>
>>
>> Uncompressing Linux... done, booting the kernel.
>> Linux version 3.0.43_4.1.0 (localuser@thierry-desktop) (gcc version
>> 4.7.3 (Buildroot 2014.02-rc3-g7ebc513) ) #20 SMP PREEMPT Wed Nov 12
>> 16:10:51 CET 2014
>> CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c53c7d
>> CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
>> Machine: Freescale i.MX 6Quad VAB-820 Board
>> bootconsole [earlycon0] enabled
>> Memory policy: ECC disabled, Data cache writealloc
>> CPU identified as i.MX6Q, silicon rev 1.2
>>
>> <nothing after>
>
> Your problem seems to be that a static mapping for the UART2
> registers is missing: printascii only works so long as the MMU is
> not enabled.
>
> Please try the following patch:
> diff --git a/arch/arm/mach-mx6/mm.c b/arch/arm/mach-mx6/mm.c
> index ad66a94..789d265 100644
> --- a/arch/arm/mach-mx6/mm.c
> +++ b/arch/arm/mach-mx6/mm.c
> @@ -56,6 +56,11 @@ static struct map_desc mx6_io_desc[] __initdata = {
>   	.pfn = __phys_to_pfn(ARM_PERIPHBASE),
>   	.length = ARM_PERIPHBASE_SIZE,
>   	.type = MT_DEVICE},
> +	{
> +	.virtual = IMX_IO_ADDRESS(MX6Q_UART2_BASE_ADDR),
> +	.pfn = __phys_to_pfn(MX6Q_UART2_BASE_ADDR),
> +	.length = 4096,
> +	.type = MT_DEVICE},
>   };
>
>   static void mx6_set_cpu_type(void)
>
>
Many thanks Gilles, that fixes both my printascii call and
earlyprintk in the bootargs.
I will keep trying to reproduce the freeze.

Thierry



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 15:44                                             ` Thierry Bultel
  2014-11-12 15:55                                               ` Gilles Chanteperdrix
  2014-11-12 16:15                                               ` Gilles Chanteperdrix
@ 2014-11-12 18:53                                               ` Lennart Sorensen
  2014-11-12 19:06                                                 ` Gilles Chanteperdrix
  2 siblings, 1 reply; 46+ messages in thread
From: Lennart Sorensen @ 2014-11-12 18:53 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 04:44:24PM +0100, Thierry Bultel wrote:
> This is all I get when adding earlyprintk=serial,ttymxc1,115200
> to the bootargs:

I have never used earlyprintk with an argument before.

I just set console= correctly, and then add earlyprintk as a separate
argument, and it does the right thing (assuming you configure the right
UART for debugging if your platform requires that).

-- 
Len Sorensen



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 18:53                                               ` Lennart Sorensen
@ 2014-11-12 19:06                                                 ` Gilles Chanteperdrix
  2014-11-12 19:13                                                   ` Lennart Sorensen
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 19:06 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Thierry Bultel, nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 01:53:01PM -0500, Lennart Sorensen wrote:
> On Wed, Nov 12, 2014 at 04:44:24PM +0100, Thierry Bultel wrote:
> > This is all I get when adding earlyprintk=serial,ttymxc1,115200
> > to the bootargs:
> 
> I have never used earlyprintk with an argument before.
> 
> I just set console= correctly, and then add earlyprintk as a separate
> argument, and it does the right thing (assuming you configure the right
> UART for debugging if your platform requires that).

Thierry uses the documented syntax.

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 19:06                                                 ` Gilles Chanteperdrix
@ 2014-11-12 19:13                                                   ` Lennart Sorensen
  2014-11-12 19:28                                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Lennart Sorensen @ 2014-11-12 19:13 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 08:06:52PM +0100, Gilles Chanteperdrix wrote:
> Thierry uses the documented syntax.

Nifty.  Never seen that syntax before.  I should check it out.

-- 
Len Sorensen



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 19:13                                                   ` Lennart Sorensen
@ 2014-11-12 19:28                                                     ` Gilles Chanteperdrix
  2014-11-12 19:35                                                       ` Lennart Sorensen
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 19:28 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Thierry Bultel, nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 02:13:07PM -0500, Lennart Sorensen wrote:
> On Wed, Nov 12, 2014 at 08:06:52PM +0100, Gilles Chanteperdrix wrote:
> > Thierry uses the documented syntax.
> 
> Nifty.  Never seen that syntax before.  I should check it out.

Well, if you look at the code, what is passed to earlyprintk is not
used to set up the early console, and neither is what is passed to
console. The early console uses printascii, whose behaviour is chosen
at compilation time.
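
To make that concrete, here is a user-space model in C of how such an early console behaves. It is a sketch under assumptions, not the kernel source: `printch()` stands in for the low-level character output routine and simply records bytes in a buffer, and `setup_early_printk()` models an option handler that accepts its argument without parsing it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the low-level UART output: bytes are captured in a buffer. */
static char uart_log[256];
static size_t uart_len;

static void printch(int c)
{
	if (uart_len < sizeof(uart_log) - 1)
		uart_log[uart_len++] = (char)c;
}

/* Write n bytes, expanding "\n" to "\r\n" as a serial console expects. */
static void early_write(const char *s, size_t n)
{
	assert(s != NULL);

	while (n-- > 0) {
		if (*s == '\n')
			printch('\r');
		printch(*s);
		s++;
	}
}

/*
 * Model of the earlyprintk= handler: the option string is ignored,
 * so "serial,ttymxc1,115200" and an empty string behave identically.
 * Which UART is used depends only on how printch() was built.
 */
static int setup_early_printk(const char *arg)
{
	(void)arg;
	return 0;
}
```

In this model, changing the string passed to `setup_early_printk()` has no effect on where `early_write()` sends its output, which is the behaviour described above.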

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-12 19:28                                                     ` Gilles Chanteperdrix
@ 2014-11-12 19:35                                                       ` Lennart Sorensen
  0 siblings, 0 replies; 46+ messages in thread
From: Lennart Sorensen @ 2014-11-12 19:35 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, nicolas Mabire, xenomai

On Wed, Nov 12, 2014 at 08:28:23PM +0100, Gilles Chanteperdrix wrote:
> On Wed, Nov 12, 2014 at 02:13:07PM -0500, Lennart Sorensen wrote:
> > On Wed, Nov 12, 2014 at 08:06:52PM +0100, Gilles Chanteperdrix wrote:
> > > Thierry uses the documented syntax.
> > 
> > Nifty.  Never seen that syntax before.  I should check it out.
> 
> Well, if you look at the code, what is passed to earlyprintk is not
> used to set up the early console, and neither is what is passed to
> console. The early console uses printascii, whose behaviour is chosen
> at compilation time.

Oh.  OMAP explicitly has a CONFIG option for which port to use for debug,
so I think that controls it.

-- 
Len Sorensen



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-11 20:03                               ` Gilles Chanteperdrix
  2014-11-12 13:17                                 ` Thierry Bultel
@ 2014-11-13 14:44                                 ` tbultel
  2014-11-13 14:51                                   ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-13 14:44 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai



----- Original Message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Thierry Bultel" <tbultel@free.fr>
> Cc: xenomai@xenomai.org, "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> <nicolas.mabire@basystemes.fr>
> Sent: Tuesday, 11 November 2014 21:03:58
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> > On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
> > >You can use printascii in the timer interrupt acknowledge routine to
> > >print a character every HZ ticks, this will give bad latency, but
> > >should work.
> > >
> > 
> > For unknown reason, the kernel gets stuck after
> > "console [tty0] enabled, bootconsole disabled" if I use printascii
> > in do_local_timer().
> > earlyprintk seems broken as well.
> 
> Without doing this, does earlyprintk work?
> 
> > >If this is indeed the broadcast timer, it should never tick, because
> > >we should never switch to broadcast mode.
> > 
> > I have found out why it was ticking.
> > This is due to tick_broadcast_switch_to_oneshot() in
> > kernel/time/tick-broadcast.c
> > 
> > This sets the oneshot mode on the timer, and leads to a call of
> > mxc_set_mode()
> > 
> > In that function, there is that comment:
> > 	if (mode != clockevent_mode) {
> > 		/* Set event time into far-far future */
> > 		if (timer_is_v2())
> > 
> > ... and I estimate "far-far future" to be about 20 minutes.
> > 
> > As a correction, I have made that change to
> > tick_broadcast_switch_to_oneshot():
> > 
> > @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
> >  {
> >         int cpu = smp_processor_id();
> > 
> > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > +       printk(KERN_ALERT "%s cpu %d -> dev %s IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > +       return;
> > +#endif
> > 
> > ... and that does the job, the iMX timer is no longer armed.
> > What do you think about it?
> > 
> > Still currently stress-testing to see if things are getting better.
> 
> I am afraid this should not change anything. This timer ticking is
> not a problem by itself, it is a problem if the twd gets disabled.
> 

Unfortunately, you were right.
Even with that fix, we have 2 failure cases.

I have wired 4 LEDs to GPIOs and toggle them in do_local_timer(), every 100 ticks
so that the blinking is visible to a human. I find it more convenient, as it keeps my console clean.
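
For reference, the per-CPU heartbeat logic amounts to something like the sketch below, written as plain user-space C. `led_set()` is a hypothetical stand-in for the real GPIO write, `heartbeat_tick()` is an illustrative name for the code called from the local timer handler, and the CPU count of 4 is assumed from the quad-core part.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS        4   /* assumed: one LED per core of the quad-core SoC */
#define TICKS_PER_FLIP 100 /* toggle period, in local timer ticks */

static unsigned int tick_count[NR_CPUS];
static bool led_state[NR_CPUS];

/* Hypothetical stand-in for the GPIO write; here it only records the state. */
static void led_set(unsigned int cpu, bool on)
{
	led_state[cpu] = on;
}

/* Called once per local timer tick on the given CPU. */
static void heartbeat_tick(unsigned int cpu)
{
	assert(cpu < NR_CPUS);

	if (++tick_count[cpu] < TICKS_PER_FLIP)
		return;

	/* Every TICKS_PER_FLIP ticks, restart the count and flip the LED. */
	tick_count[cpu] = 0;
	led_set(cpu, !led_state[cpu]);
}
```

Because each CPU has its own counter and LED, an LED that stops blinking points at the core whose local timer no longer fires.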

1) On one machine, the freeze does not produce a backtrace on the serial console:
 a) in most cases, the LEDs stop blinking, so there are no more timer interrupts
 b) in one case, a single LED kept blinking for a while before stopping

I first suspected graphics hardware acceleration, because the display gets corrupted when it is enabled,
but the freeze happens the same way without it.

2) On another machine, I get the backtrace below. The local timers keep ticking.

Since the problem was hard to reproduce with IPIPE_DEBUG, it was unfortunately disabled for that run.
I am trying again with IPIPE_DEBUG enabled.

SGT_AGV2 login: Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 2    Not tainted  (3.0.43_4.1.0 #8)
PC is at flush_tlb_page+0x20/0xe8
LR is at ptep_set_access_flags+0x4c/0x8c
pc : [<80057da8>]    lr : [<8019a748>]    psr: 20000113
sp : ba989d90  ip : 00000800  fp : 00000002
r10: 00000000  r9 : 00000000  r8 : ba882380
r7 : 8bf5104c  r6 : ba97d108  r5 : ba97d108  r4 : 00000001
r3 : 2aae0000  r2 : 00000001  r1 : 2aae0000  r0 : ba97d108
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 4a06004a  DAC: 00000015
Process watch (pid: 3904, stack limit = 0xba9882f0)
Stack: (0xba989d90 to 0xba98a000)
9d80:                                     00000000 00000000 00000000 00000000
9da0: 00000001 2aae0000 ba97d108 8019a748 4f6b075f ba882380 00000001 ba97d108
9dc0: 2aae0000 8018d828 00000001 ba988000 ba989e54 80083a50 8004c094 ba5a7ba0
9de0: 00000000 00000000 00000002 bffb4000 800494c8 4f6b07df ba882380 ba97d108
9e00: ba060aa8 2aae0000 8bf5104c ba5a7ba0 00000002 8018f598 ba060aa8 8bf5104c
9e20: 4f6b07df bffb4000 00000002 800ae980 00000000 ba060000 00000155 2aae0000
9e40: ba97d108 00000001 ba060aa8 ba5a7ba0 00000002 8018f8b8 ba060aa8 00000001
9e60: ba989fb0 2aae0000 0000081f ba5a7ba0 bac1ae80 ba97d108 ba5a7bdc 8005d64c
9e80: 80046204 80049b80 ba989e9c 00000001 e96703c8 0000037d 8c01c4f8 80668000
9ea0: 8c01c4f8 800ad3d0 00000001 00000000 00000001 ba988000 ba989ecc 80082754
9ec0: ba989f38 00000000 ba989f98 80668000 8c01c4f8 800ace68 00000001 00000000
9ee0: 00000000 0000040f 0000000f 0000081f ba989fb0 80c8ba88 0006b08e 2aae0000
9f00: 2aad54c0 8004b4e0 3b9b8d50 00000000 00000001 00000000 ba988000 0000c350
9f20: 7e9977a4 800adcf4 0000c350 00000000 3b9b8d50 00000000 ba989f39 8c01c5a8
9f40: 00000000 00000000 e967c718 0000037d e96703c8 0000037d 800ac958 8c01c4f8
9f60: 00000000 00000000 00000000 8009da50 00000000 7e9977a4 00000000 00010000
9f80: 000000a2 800517a8 ba988000 00000000 00000000 0000040f 0000000f 00000000
9fa0: 0006b094 00000000 0005767b 80051544 2aae0000 0006b092 00000006 1b485b1b
9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e 0005767b 2aad54c0
9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff 00000000 00000000
[<80057da8>] (flush_tlb_page+0x20/0xe8) from [<8019a748>] (ptep_set_access_flags+0x4c/0x8c)
[<8019a748>] (ptep_set_access_flags+0x4c/0x8c) from [<8018d828>] (do_wp_page+0x2a4/0x728)
[<8018d828>] (do_wp_page+0x2a4/0x728) from [<8018f598>] (handle_pte_fault+0x1bc/0x410)
[<8018f598>] (handle_pte_fault+0x1bc/0x410) from [<8018f8b8>] (handle_mm_fault+0xcc/0x11c)
[<8018f8b8>] (handle_mm_fault+0xcc/0x11c) from [<8005d64c>] (do_page_fault+0x26c/0x3e4)
[<8005d64c>] (do_page_fault+0x26c/0x3e4) from [<8004b4e0>] (do_DataAbort+0x34/0x17c)
[<8004b4e0>] (do_DataAbort+0x34/0x17c) from [<80051544>] (ret_from_exception+0x0/0x40)
Exception stack(0xba989fb0 to 0xba989ff8)
9fa0:                                     2aae0000 0006b092 00000006 1b485b1b
9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e 0005767b 2aad54c0
9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
Code: e5922000 e24dd010 e3520000 0a00000c (ee102ff1)
---[ end trace 1a38ad1feb34b78d ]---
note: watch[3904] exited with preempt_count 1
BUG: scheduling while atomic: watch/3904/0x40000002
Modules linked in:
[<8005876c>] (unwind_backtrace+0x0/0xf8) from [<80665ce8>] (__schedule+0x610/0x894)
[<80665ce8>] (__schedule+0x610/0x894) from [<80084200>] (__cond_resched+0x14/0x20)
[<80084200>] (__cond_resched+0x14/0x20) from [<8066600c>] (_cond_resched+0x3c/0x44)
[<8066600c>] (_cond_resched+0x3c/0x44) from [<8018e8d8>] (unmap_vmas+0x520/0x68c)
[<8018e8d8>] (unmap_vmas+0x520/0x68c) from [<80193dec>] (exit_mmap+0x108/0x248)
[<80193dec>] (exit_mmap+0x108/0x248) from [<800888b8>] (mmput+0x48/0x160)
[<800888b8>] (mmput+0x48/0x160) from [<8008ce08>] (exit_mm+0x128/0x168)
[<8008ce08>] (exit_mm+0x128/0x168) from [<8008e3f0>] (do_exit+0x130/0x7a0)
[<8008e3f0>] (do_exit+0x130/0x7a0) from [<80054fc4>] (die+0x218/0x28c)
[<80054fc4>] (die+0x218/0x28c) from [<8004b120>] (do_undefinstr+0x11c/0x154)
[<8004b120>] (do_undefinstr+0x11c/0x154) from [<800510c0>] (__und_svc+0x60/0x80)
Exception stack(0xba989d48 to 0xba989d90)
9d40:                   ba97d108 2aae0000 00000001 2aae0000 00000001 ba97d108
9d60: ba97d108 8bf5104c ba882380 00000000 00000000 00000002 00000800 ba989d90
9d80: 8019a748 80057da8 20000113 ffffffff
[<800510c0>] (__und_svc+0x60/0x80) from [<80057da8>] (flush_tlb_page+0x20/0xe8)
[<80057da8>] (flush_tlb_page+0x20/0xe8) from [<8019a748>] (ptep_set_access_flags+0x4c/0x8c)
[<8019a748>] (ptep_set_access_flags+0x4c/0x8c) from [<8018d828>] (do_wp_page+0x2a4/0x728)
[<8018d828>] (do_wp_page+0x2a4/0x728) from [<8018f598>] (handle_pte_fault+0x1bc/0x410)
[<8018f598>] (handle_pte_fault+0x1bc/0x410) from [<8018f8b8>] (handle_mm_fault+0xcc/0x11c)
[<8018f8b8>] (handle_mm_fault+0xcc/0x11c) from [<8005d64c>] (do_page_fault+0x26c/0x3e4)
[<8005d64c>] (do_page_fault+0x26c/0x3e4) from [<8004b4e0>] (do_DataAbort+0x34/0x17c)
[<8004b4e0>] (do_DataAbort+0x34/0x17c) from [<80051544>] (ret_from_exception+0x0/0x40)
Exception stack(0xba989fb0 to 0xba989ff8)
9fa0:                                     2aae0000 0006b092 00000006 1b485b1b
9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e 0005767b 2aad54c0
9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff


> Note that we discovered in another thread that CONFIG_TRACE_IRQFLAGS
> should not be enabled. So, if it is enabled, you should disable it.
> 

It is disabled, yes.

> Also, are you running with all I-pipe and Xenomai debugs?
> 
> --
> 					    Gilles.
> 



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 14:44                                 ` tbultel
@ 2014-11-13 14:51                                   ` Gilles Chanteperdrix
  2014-11-13 15:03                                     ` tbultel
  2014-11-14 10:15                                     ` tbultel
  0 siblings, 2 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-13 14:51 UTC (permalink / raw)
  To: tbultel; +Cc: nicolas Mabire, xenomai

On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> 
> 
> ----- Mail original -----
> > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > À: "Thierry Bultel" <tbultel@free.fr>
> > Cc: xenomai@xenomai.org, "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > <nicolas.mabire@basystemes.fr>
> > Envoyé: Mardi 11 Novembre 2014 21:03:58
> > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > 
> > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> > > Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> > > >You can use printascii in the timer interrupt acknowledge routine
> > > >to
> > > >print a character every HZ ticks, this will give bad latency, but
> > > >should work.
> > > >
> > > 
> > > For unknown reason, the kernel gets stuck after
> > > "console [tty0] enabled, bootconsole disabled" if I use printascii
> > > in do_local_timer().
> > > earlyprintk seems broken as well.
> > 
> > Without doing this, does earlyprintk work?
> > 
> > > >If this is indeed the broadcast timer, it should never tick,
> > > >because
> > > >we should never switch to broadcast mode.
> > > 
> > > I have found out why it was ticking.
> > > This is due to tick_broadcast_switch_to_oneshot() in
> > > kernel/time/tick-broadcast.c
> > > 
> > > This sets the oneshot mode to the time, and leads to a call of
> > > mxc_set_mode()
> > > 
> > > In that function, there is that comment:
> > > 	if (mode != clockevent_mode) {
> > > 		/* Set event time into far-far future */
> > > 		if (timer_is_v2())
> > > 
> > > ... and I estimate "far-far future" to be about 20 minutes.
> > > 
> > > As a correction, I have made that change to
> > > tick_broadcast_switch_to_oneshot():
> > > 
> > > @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
> > > clock_event_device *bc)
> > >  {
> > >         int cpu = smp_processor_id();
> > > 
> > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > +       return;
> > > +#endif
> > > 
> > > ... and that makes the job, the iMX Timer is no longer armed.
> > > What do you think about it ?
> > > 
> > > Still currently stress-testing to see if things are getting better.
> > 
> > I am afraid this should not change anything. This timer ticking is
> > not a problem by itself, it is a problem if the twd gets disabled.
> > 
> 
> Unfortunately, you were right.
> Even with that fix, we have 2 fail cases
> 
> I have added 4 leds to the GPIOs, and toggle them in do_local_timer, each 100 ticks
> for being human visible. I found it more convenient at keeps my console clean.
> 
> 1) on one machine, the freeze does not make a backtrace on the serial console
>  a) in most cases, the leds do not blink, no more interrupts
>  b) in one case, a single one kept blinking for a while before stopping


do_local_timer is a bit too high level; try putting the LED toggle in
twd_timer_ack. Also, try setting the Linux timer to periodic
(disable HIGHRES_TIMERS); this should make the Xenomai timer tick
periodically, no matter what happens on the Linux side.
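The per-CPU "toggle every N ticks" logic discussed above could look
roughly like the sketch below. It is plain C so it can be tried outside
the kernel; all names (tick_blinker, NR_TEST_CPUS, ...) are hypothetical,
and the actual GPIO write that would happen in the timer acknowledge
path is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_TEST_CPUS 4	/* assumption: one LED per core on a quad-core part */

struct tick_blinker {
	uint32_t period;	/* ticks between two toggles */
	uint32_t count;		/* ticks seen since the last toggle */
	bool led_on;		/* state that would be written to the GPIO */
};

static struct tick_blinker blinkers[NR_TEST_CPUS];

static void tick_blinker_init(uint32_t period)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_TEST_CPUS; cpu++) {
		blinkers[cpu].period = period;
		blinkers[cpu].count = 0;
		blinkers[cpu].led_on = false;
	}
}

/*
 * To be called once per timer tick for the given CPU. Returns true
 * when the LED state changed, i.e. when the caller should update
 * the GPIO.
 */
static bool tick_blinker_step(unsigned int cpu)
{
	struct tick_blinker *b;

	if (cpu >= NR_TEST_CPUS)
		return false;

	b = &blinkers[cpu];
	if (b->period == 0)
		return false;

	if (++b->count < b->period)
		return false;

	b->count = 0;
	b->led_on = !b->led_on;
	return true;
}
```

After tick_blinker_init(100), the 100th call to tick_blinker_step() for
a given CPU is the first one that returns true. Keeping one counter per
CPU means a LED that stops blinking points at the core whose tick
stopped.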


> 
> I first thought about graphics hardware acceleration because with it the display gets corrupted,
> but even without it, the freeze happens the same way. 
> 
> 2) on another machine, I am getting this backtrace. The local timers keep ticking.
> 
> Since it was hard to reproduce with IPIPE_DEBUG, it was unfortunately disabled for that run.
> Attempting again with IPIPE_DEBUG enabled.
> 
> SGT_AGV2 login: Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2    Not tainted  (3.0.43_4.1.0 #8)
> PC is at flush_tlb_page+0x20/0xe8
> LR is at ptep_set_access_flags+0x4c/0x8c
> pc : [<80057da8>]    lr : [<8019a748>]    psr: 20000113
> sp : ba989d90  ip : 00000800  fp : 00000002
> r10: 00000000  r9 : 00000000  r8 : ba882380
> r7 : 8bf5104c  r6 : ba97d108  r5 : ba97d108  r4 : 00000001
> r3 : 2aae0000  r2 : 00000001  r1 : 2aae0000  r0 : ba97d108
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c53c7d  Table: 4a06004a  DAC: 00000015
> Process watch (pid: 3904, stack limit = 0xba9882f0)
> Stack: (0xba989d90 to 0xba98a000)
> 9d80:                                     00000000 00000000 00000000 00000000
> 9da0: 00000001 2aae0000 ba97d108 8019a748 4f6b075f ba882380 00000001 ba97d108
> 9dc0: 2aae0000 8018d828 00000001 ba988000 ba989e54 80083a50 8004c094 ba5a7ba0
> 9de0: 00000000 00000000 00000002 bffb4000 800494c8 4f6b07df ba882380 ba97d108
> 9e00: ba060aa8 2aae0000 8bf5104c ba5a7ba0 00000002 8018f598 ba060aa8 8bf5104c
> 9e20: 4f6b07df bffb4000 00000002 800ae980 00000000 ba060000 00000155 2aae0000
> 9e40: ba97d108 00000001 ba060aa8 ba5a7ba0 00000002 8018f8b8 ba060aa8 00000001
> 9e60: ba989fb0 2aae0000 0000081f ba5a7ba0 bac1ae80 ba97d108 ba5a7bdc 8005d64c
> 9e80: 80046204 80049b80 ba989e9c 00000001 e96703c8 0000037d 8c01c4f8 80668000
> 9ea0: 8c01c4f8 800ad3d0 00000001 00000000 00000001 ba988000 ba989ecc 80082754
> 9ec0: ba989f38 00000000 ba989f98 80668000 8c01c4f8 800ace68 00000001 00000000
> 9ee0: 00000000 0000040f 0000000f 0000081f ba989fb0 80c8ba88 0006b08e 2aae0000
> 9f00: 2aad54c0 8004b4e0 3b9b8d50 00000000 00000001 00000000 ba988000 0000c350
> 9f20: 7e9977a4 800adcf4 0000c350 00000000 3b9b8d50 00000000 ba989f39 8c01c5a8
> 9f40: 00000000 00000000 e967c718 0000037d e96703c8 0000037d 800ac958 8c01c4f8
> 9f60: 00000000 00000000 00000000 8009da50 00000000 7e9977a4 00000000 00010000
> 9f80: 000000a2 800517a8 ba988000 00000000 00000000 0000040f 0000000f 00000000
> 9fa0: 0006b094 00000000 0005767b 80051544 2aae0000 0006b092 00000006 1b485b1b
> 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e 0005767b 2aad54c0
> 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff 00000000 00000000
> [<80057da8>] (flush_tlb_page+0x20/0xe8) from [<8019a748>] (ptep_set_access_flags+0x4c/0x8c)
> [<8019a748>] (ptep_set_access_flags+0x4c/0x8c) from [<8018d828>] (do_wp_page+0x2a4/0x728)
> [<8018d828>] (do_wp_page+0x2a4/0x728) from [<8018f598>] (handle_pte_fault+0x1bc/0x410)
> [<8018f598>] (handle_pte_fault+0x1bc/0x410) from [<8018f8b8>] (handle_mm_fault+0xcc/0x11c)
> [<8018f8b8>] (handle_mm_fault+0xcc/0x11c) from [<8005d64c>] (do_page_fault+0x26c/0x3e4)
> [<8005d64c>] (do_page_fault+0x26c/0x3e4) from [<8004b4e0>] (do_DataAbort+0x34/0x17c)
> [<8004b4e0>] (do_DataAbort+0x34/0x17c) from [<80051544>] (ret_from_exception+0x0/0x40)
> Exception stack(0xba989fb0 to 0xba989ff8)
> 9fa0:                                     2aae0000 0006b092 00000006 1b485b1b
> 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e 0005767b 2aad54c0
> 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
> Code: e5922000 e24dd010 e3520000 0a00000c (ee102ff1)
> ---[ end trace 1a38ad1feb34b78d ]---

Do you have unlocked context switch enabled? If yes, turn it off.

Also, if enabling all debugs does not work, try enabling only the
I-pipe tracer, define sufficient back trace points (1000), and try
to get a trace when the problem happens.

You should try and debug the issue for which you have a trace of
course, but I do not see how it could be related to the other
issues.
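For the Oops quoted above, the word in parentheses on the "Code:" line
(ee102ff1) is the instruction that trapped. A small sketch for pulling
apart its fields, assuming it is an ARM (A1 encoding) MRC/MCR
coprocessor access; the struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

struct cp_access {
	bool is_mrc;		/* true: MRC (read), false: MCR (write) */
	unsigned int coproc;	/* coprocessor number, 15 for CP15 */
	unsigned int opc1;
	unsigned int crn;
	unsigned int crm;
	unsigned int opc2;
	unsigned int rt;	/* ARM core register transferred */
};

/*
 * Decode an ARM-state MRC/MCR instruction word. Returns false when
 * the word does not have the MRC/MCR bit pattern (bits 27:24 == 1110
 * and bit 4 == 1). The condition field is not examined.
 */
static bool decode_mrc_mcr(uint32_t insn, struct cp_access *out)
{
	if ((insn & 0x0f000010u) != 0x0e000010u)
		return false;

	out->is_mrc = (insn >> 20) & 0x1;
	out->opc1   = (insn >> 21) & 0x7;
	out->crn    = (insn >> 16) & 0xf;
	out->rt     = (insn >> 12) & 0xf;
	out->coproc = (insn >> 8) & 0xf;
	out->opc2   = (insn >> 5) & 0x7;
	out->crm    = insn & 0xf;
	return true;
}
```

Fed with 0xee102ff1, this yields "mrc p15, 0, r2, c0, c1, 7". As far as
I can tell from the ARMv7 CP15 register map, that encoding is a read of
ID_MMFR3, which would fit a check made on the TLB flush path, but that
reading is mine and should be confirmed against a disassembly of the
actual vmlinux.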

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 14:51                                   ` Gilles Chanteperdrix
@ 2014-11-13 15:03                                     ` tbultel
  2014-11-13 15:10                                       ` Gilles Chanteperdrix
  2014-11-14 10:15                                     ` tbultel
  1 sibling, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-13 15:03 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai



----- Mail original -----
> De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> À: tbultel@free.fr
> Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> Envoyé: Jeudi 13 Novembre 2014 15:51:15
> Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> > 
> > 
> > ----- Mail original -----
> > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > À: "Thierry Bultel" <tbultel@free.fr>
> > > Cc: xenomai@xenomai.org, "Lennart Sorensen"
> > > <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > > <nicolas.mabire@basystemes.fr>
> > > Envoyé: Mardi 11 Novembre 2014 21:03:58
> > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > 
> > > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> > > > Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> > > > >You can use printascii in the timer interrupt acknowledge
> > > > >routine
> > > > >to
> > > > >print a character every HZ ticks, this will give bad latency,
> > > > >but
> > > > >should work.
> > > > >
> > > > 
> > > > For unknown reason, the kernel gets stuck after
> > > > "console [tty0] enabled, bootconsole disabled" if I use
> > > > printascii
> > > > in do_local_timer().
> > > > earlyprintk seems broken as well.
> > > 
> > > Without doing this, does earlyprintk work?
> > > 
> > > > >If this is indeed the broadcast timer, it should never tick,
> > > > >because
> > > > >we should never switch to broadcast mode.
> > > > 
> > > > I have found out why it was ticking.
> > > > This is due to tick_broadcast_switch_to_oneshot() in
> > > > kernel/time/tick-broadcast.c
> > > > 
> > > > This sets the oneshot mode to the time, and leads to a call of
> > > > mxc_set_mode()
> > > > 
> > > > In that function, there is that comment:
> > > > 	if (mode != clockevent_mode) {
> > > > 		/* Set event time into far-far future */
> > > > 		if (timer_is_v2())
> > > > 
> > > > ... and I estimate "far-far future" to be about 20 minutes.
> > > > 
> > > > As a correction, I have made that change to
> > > > tick_broadcast_switch_to_oneshot():
> > > > 
> > > > @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
> > > > clock_event_device *bc)
> > > >  {
> > > >         int cpu = smp_processor_id();
> > > > 
> > > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > > +       return;
> > > > +#endif
> > > > 
> > > > ... and that makes the job, the iMX Timer is no longer armed.
> > > > What do you think about it ?
> > > > 
> > > > Still currently stress-testing to see if things are getting
> > > > better.
> > > 
> > > I am afraid this should not change anything. This timer ticking
> > > is
> > > not a problem by itself, it is a problem if the twd gets
> > > disabled.
> > > 
> > 
> > Unfortunately, you were right.
> > Even with that fix, we have 2 fail cases
> > 
> > I have added 4 leds to the GPIOs, and toggle them in
> > do_local_timer, each 100 ticks
> > for being human visible. I found it more convenient at keeps my
> > console clean.
> > 
> > 1) on one machine, the freeze does not make a backtrace on the
> > serial console
> >  a) in most cases, the leds do not blink, no more interrupts
> >  b) in one case, a single one kept blinking for a while before
> >  stopping
> 
> 
> do_local_timer is a bit to high level, try putting it in
> twd_timer_ack. Also, try setting the Linux timer to periodic
> (disable HIGHRES_TIMERS), this should make Xenomai timer tick
> periodically, not matter what happens on linux side.

I can try that, but that would break my Linux application, which relies on it.

> 
> 
> > 
> > I first thought about graphics hardware acceleration because with
> > it the display gets corrupted,
> > but even without it, the freeze happens the same way.
> > 
> > 2) on another machine, I am getting this backtrace. The local
> > timers keep ticking.
> > 
> > Since it was hard to reproduce with IPIPE_DEBUG, it was
> > unfortunately disabled for that run.
> > Attempting again with IPIPE_DEBUG enabled.
> > 
> > SGT_AGV2 login: Internal error: Oops - undefined instruction: 0
> > [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 2    Not tainted  (3.0.43_4.1.0 #8)
> > PC is at flush_tlb_page+0x20/0xe8
> > LR is at ptep_set_access_flags+0x4c/0x8c
> > pc : [<80057da8>]    lr : [<8019a748>]    psr: 20000113
> > sp : ba989d90  ip : 00000800  fp : 00000002
> > r10: 00000000  r9 : 00000000  r8 : ba882380
> > r7 : 8bf5104c  r6 : ba97d108  r5 : ba97d108  r4 : 00000001
> > r3 : 2aae0000  r2 : 00000001  r1 : 2aae0000  r0 : ba97d108
> > Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> > Control: 10c53c7d  Table: 4a06004a  DAC: 00000015
> > Process watch (pid: 3904, stack limit = 0xba9882f0)
> > Stack: (0xba989d90 to 0xba98a000)
> > 9d80:                                     00000000 00000000
> > 00000000 00000000
> > 9da0: 00000001 2aae0000 ba97d108 8019a748 4f6b075f ba882380
> > 00000001 ba97d108
> > 9dc0: 2aae0000 8018d828 00000001 ba988000 ba989e54 80083a50
> > 8004c094 ba5a7ba0
> > 9de0: 00000000 00000000 00000002 bffb4000 800494c8 4f6b07df
> > ba882380 ba97d108
> > 9e00: ba060aa8 2aae0000 8bf5104c ba5a7ba0 00000002 8018f598
> > ba060aa8 8bf5104c
> > 9e20: 4f6b07df bffb4000 00000002 800ae980 00000000 ba060000
> > 00000155 2aae0000
> > 9e40: ba97d108 00000001 ba060aa8 ba5a7ba0 00000002 8018f8b8
> > ba060aa8 00000001
> > 9e60: ba989fb0 2aae0000 0000081f ba5a7ba0 bac1ae80 ba97d108
> > ba5a7bdc 8005d64c
> > 9e80: 80046204 80049b80 ba989e9c 00000001 e96703c8 0000037d
> > 8c01c4f8 80668000
> > 9ea0: 8c01c4f8 800ad3d0 00000001 00000000 00000001 ba988000
> > ba989ecc 80082754
> > 9ec0: ba989f38 00000000 ba989f98 80668000 8c01c4f8 800ace68
> > 00000001 00000000
> > 9ee0: 00000000 0000040f 0000000f 0000081f ba989fb0 80c8ba88
> > 0006b08e 2aae0000
> > 9f00: 2aad54c0 8004b4e0 3b9b8d50 00000000 00000001 00000000
> > ba988000 0000c350
> > 9f20: 7e9977a4 800adcf4 0000c350 00000000 3b9b8d50 00000000
> > ba989f39 8c01c5a8
> > 9f40: 00000000 00000000 e967c718 0000037d e96703c8 0000037d
> > 800ac958 8c01c4f8
> > 9f60: 00000000 00000000 00000000 8009da50 00000000 7e9977a4
> > 00000000 00010000
> > 9f80: 000000a2 800517a8 ba988000 00000000 00000000 0000040f
> > 0000000f 00000000
> > 9fa0: 0006b094 00000000 0005767b 80051544 2aae0000 0006b092
> > 00000006 1b485b1b
> > 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e
> > 0005767b 2aad54c0
> > 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
> > 00000000 00000000
> > > [<80057da8>] (flush_tlb_page+0x20/0xe8) from [<8019a748>] (ptep_set_access_flags+0x4c/0x8c)
> > > [<8019a748>] (ptep_set_access_flags+0x4c/0x8c) from [<8018d828>] (do_wp_page+0x2a4/0x728)
> > > [<8018d828>] (do_wp_page+0x2a4/0x728) from [<8018f598>] (handle_pte_fault+0x1bc/0x410)
> > > [<8018f598>] (handle_pte_fault+0x1bc/0x410) from [<8018f8b8>] (handle_mm_fault+0xcc/0x11c)
> > > [<8018f8b8>] (handle_mm_fault+0xcc/0x11c) from [<8005d64c>] (do_page_fault+0x26c/0x3e4)
> > > [<8005d64c>] (do_page_fault+0x26c/0x3e4) from [<8004b4e0>] (do_DataAbort+0x34/0x17c)
> > > [<8004b4e0>] (do_DataAbort+0x34/0x17c) from [<80051544>] (ret_from_exception+0x0/0x40)
> > Exception stack(0xba989fb0 to 0xba989ff8)
> > 9fa0:                                     2aae0000 0006b092
> > 00000006 1b485b1b
> > 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e
> > 0005767b 2aad54c0
> > 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
> > Code: e5922000 e24dd010 e3520000 0a00000c (ee102ff1)
> > ---[ end trace 1a38ad1feb34b78d ]---
> 
> Do you have unlocked context switch enabled? If yes, turn it off.

I can't: 
 #error "Xenomai: ARM SMP systems require unlocked context switch prior to Linux 3.8"



> 
> Also, if enabling all debugs do not work, try only enabling the
> I-pipe tracer, define sufficient back trace points (1000), and try
> and get a trace when the problem happens.
> 
> You should try and debug the issue for which you have a trace of
> course, but I do not see how it could be related to the other
> issues.
> 
> --
> 					    Gilles.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 15:03                                     ` tbultel
@ 2014-11-13 15:10                                       ` Gilles Chanteperdrix
  2014-11-13 15:23                                         ` tbultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-13 15:10 UTC (permalink / raw)
  To: tbultel; +Cc: nicolas Mabire, xenomai

On Thu, Nov 13, 2014 at 04:03:31PM +0100, tbultel@free.fr wrote:
> 
> 
> ----- Mail original -----
> > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > À: tbultel@free.fr
> > Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> > Envoyé: Jeudi 13 Novembre 2014 15:51:15
> > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > 
> > On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> > > 
> > > 
> > > ----- Mail original -----
> > > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > > À: "Thierry Bultel" <tbultel@free.fr>
> > > > Cc: xenomai@xenomai.org, "Lennart Sorensen"
> > > > <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > > > <nicolas.mabire@basystemes.fr>
> > > > Envoyé: Mardi 11 Novembre 2014 21:03:58
> > > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > > 
> > > > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> > > > > Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> > > > > >You can use printascii in the timer interrupt acknowledge
> > > > > >routine
> > > > > >to
> > > > > >print a character every HZ ticks, this will give bad latency,
> > > > > >but
> > > > > >should work.
> > > > > >
> > > > > 
> > > > > For unknown reason, the kernel gets stuck after
> > > > > "console [tty0] enabled, bootconsole disabled" if I use
> > > > > printascii
> > > > > in do_local_timer().
> > > > > earlyprintk seems broken as well.
> > > > 
> > > > Without doing this, does earlyprintk work?
> > > > 
> > > > > >If this is indeed the broadcast timer, it should never tick,
> > > > > >because
> > > > > >we should never switch to broadcast mode.
> > > > > 
> > > > > I have found out why it was ticking.
> > > > > This is due to tick_broadcast_switch_to_oneshot() in
> > > > > kernel/time/tick-broadcast.c
> > > > > 
> > > > > This sets the oneshot mode to the time, and leads to a call of
> > > > > mxc_set_mode()
> > > > > 
> > > > > In that function, there is that comment:
> > > > > 	if (mode != clockevent_mode) {
> > > > > 		/* Set event time into far-far future */
> > > > > 		if (timer_is_v2())
> > > > > 
> > > > > ... and I estimate "far-far future" to be about 20 minutes.
> > > > > 
> > > > > As a correction, I have made that change to
> > > > > tick_broadcast_switch_to_oneshot():
> > > > > 
> > > > > @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
> > > > > clock_event_device *bc)
> > > > >  {
> > > > >         int cpu = smp_processor_id();
> > > > > 
> > > > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > > > +       return;
> > > > > +#endif
> > > > > 
> > > > > ... and that makes the job, the iMX Timer is no longer armed.
> > > > > What do you think about it ?
> > > > > 
> > > > > Still currently stress-testing to see if things are getting
> > > > > better.
> > > > 
> > > > I am afraid this should not change anything. This timer ticking
> > > > is
> > > > not a problem by itself, it is a problem if the twd gets
> > > > disabled.
> > > > 
> > > 
> > > Unfortunately, you were right.
> > > Even with that fix, we have 2 fail cases
> > > 
> > > I have added 4 leds to the GPIOs, and toggle them in
> > > do_local_timer, each 100 ticks
> > > for being human visible. I found it more convenient at keeps my
> > > console clean.
> > > 
> > > 1) on one machine, the freeze does not make a backtrace on the
> > > serial console
> > >  a) in most cases, the leds do not blink, no more interrupts
> > >  b) in one case, a single one kept blinking for a while before
> > >  stopping
> > 
> > 
> > do_local_timer is a bit to high level, try putting it in
> > twd_timer_ack. Also, try setting the Linux timer to periodic
> > (disable HIGHRES_TIMERS), this should make Xenomai timer tick
> > periodically, not matter what happens on linux side.
> 
> I can try that, but that would break my linux application that relies on it.

I mean for testing. Does the application exit with an error, or does
it just not work quite correctly?

> > Do you have unlocked context switch enabled? If yes, turn it off.
> 
> I can't: 
>  #error "Xenomai: ARM SMP systems require unlocked context switch prior to Linux 3.8"

Oh yeah, there is that too. I completely forgot about that.

We would need to backport the changes that have been made for
switching contexts, but that is not a small change.

Is there any way you can run your tests without the specific hardware
of your board, say with a recent mainline kernel? Just to see whether
the bug is gone with what we have done since then, or is still there?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 15:10                                       ` Gilles Chanteperdrix
@ 2014-11-13 15:23                                         ` tbultel
  2014-11-13 15:26                                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-13 15:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai



----- Mail original -----
> De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> À: tbultel@free.fr
> Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> Envoyé: Jeudi 13 Novembre 2014 16:10:22
> Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> On Thu, Nov 13, 2014 at 04:03:31PM +0100, tbultel@free.fr wrote:
> > 
> > 
> > ----- Mail original -----
> > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > À: tbultel@free.fr
> > > Cc: xenomai@xenomai.org, "nicolas Mabire"
> > > <nicolas.mabire@basystemes.fr>
> > > Envoyé: Jeudi 13 Novembre 2014 15:51:15
> > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > 
> > > On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> > > > 
> > > > 
> > > > ----- Mail original -----
> > > > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > > > À: "Thierry Bultel" <tbultel@free.fr>
> > > > > Cc: xenomai@xenomai.org, "Lennart Sorensen"
> > > > > <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > > > > <nicolas.mabire@basystemes.fr>
> > > > > Envoyé: Mardi 11 Novembre 2014 21:03:58
> > > > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > > > 
> > > > > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel
> > > > > wrote:
> > > > > > Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> > > > > > >You can use printascii in the timer interrupt acknowledge
> > > > > > >routine
> > > > > > >to
> > > > > > >print a character every HZ ticks, this will give bad
> > > > > > >latency,
> > > > > > >but
> > > > > > >should work.
> > > > > > >
> > > > > > 
> > > > > > For unknown reason, the kernel gets stuck after
> > > > > > "console [tty0] enabled, bootconsole disabled" if I use
> > > > > > printascii
> > > > > > in do_local_timer().
> > > > > > earlyprintk seems broken as well.
> > > > > 
> > > > > Without doing this, does earlyprintk work?
> > > > > 
> > > > > > >If this is indeed the broadcast timer, it should never
> > > > > > >tick,
> > > > > > >because
> > > > > > >we should never switch to broadcast mode.
> > > > > > 
> > > > > > I have found out why it was ticking.
> > > > > > This is due to tick_broadcast_switch_to_oneshot() in
> > > > > > kernel/time/tick-broadcast.c
> > > > > > 
> > > > > > This sets the oneshot mode to the time, and leads to a call
> > > > > > of
> > > > > > mxc_set_mode()
> > > > > > 
> > > > > > In that function, there is that comment:
> > > > > > 	if (mode != clockevent_mode) {
> > > > > > 		/* Set event time into far-far future */
> > > > > > 		if (timer_is_v2())
> > > > > > 
> > > > > > ... and I estimate "far-far future" to be about 20 minutes.
> > > > > > 
> > > > > > As a correction, I have made that change to
> > > > > > tick_broadcast_switch_to_oneshot():
> > > > > > 
> > > > > > @@ -603,11 +610,21 @@ void
> > > > > > tick_broadcast_setup_oneshot(struct
> > > > > > clock_event_device *bc)
> > > > > >  {
> > > > > >         int cpu = smp_processor_id();
> > > > > > 
> > > > > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > > > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > > > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > > > > +       return;
> > > > > > +#endif
> > > > > > 
> > > > > > ... and that makes the job, the iMX Timer is no longer
> > > > > > armed.
> > > > > > What do you think about it ?
> > > > > > 
> > > > > > Still currently stress-testing to see if things are getting
> > > > > > better.
> > > > > 
> > > > > I am afraid this should not change anything. This timer
> > > > > ticking
> > > > > is
> > > > > not a problem by itself, it is a problem if the twd gets
> > > > > disabled.
> > > > > 
> > > > 
> > > > Unfortunately, you were right.
> > > > Even with that fix, we have 2 fail cases
> > > > 
> > > > I have added 4 leds to the GPIOs, and toggle them in
> > > > do_local_timer, each 100 ticks
> > > > for being human visible. I found it more convenient at keeps my
> > > > console clean.
> > > > 
> > > > 1) on one machine, the freeze does not make a backtrace on the
> > > > serial console
> > > >  a) in most cases, the leds do not blink, no more interrupts
> > > >  b) in one case, a single one kept blinking for a while before
> > > >  stopping
> > > 
> > > 
> > > do_local_timer is a bit to high level, try putting it in
> > > twd_timer_ack. Also, try setting the Linux timer to periodic
> > > (disable HIGHRES_TIMERS), this should make Xenomai timer tick
> > > periodically, not matter what happens on linux side.
> > 
> > I can try that, but that would break my linux application that
> > relies on it.
> 
> I mean for testing. Does the application exit in error, or does not
> work very correctly?

Fine for testing. It should behave badly but should stay alive.
Let's try that.

> 
> > > Do you have unlocked context switch enabled? If yes, turn it off.
> > 
> > I can't:
> >  #error "Xenomai: ARM SMP systems require unlocked context switch
> >  prior to Linux 3.8"
> 
> Oh yeah, there is that too. I completely forgot about that.
> 
> We would need to backport the changes that have been made for
> switching contexts, but that is not a small change.
> 
> Is there anyway you can run your tests without the specific hardware
> of your board, say with a recent mainline kernel? Just to see if the
> bug is gone with what we have made since then, or if it is still
> there?


Well, I have already thought about that.
Basically, the AMOS820 is based on the 'nitrogen6x' board, formerly called 'sabrelite'.
Even without supporting all the hardware, that would mean
converting VIA's BSP, written in C, into a device tree, and I do not feel
completely comfortable doing this at the moment.

> 
> --
> 					    Gilles.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 15:23                                         ` tbultel
@ 2014-11-13 15:26                                           ` Gilles Chanteperdrix
  0 siblings, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-13 15:26 UTC (permalink / raw)
  To: tbultel; +Cc: nicolas Mabire, xenomai

On Thu, Nov 13, 2014 at 04:23:17PM +0100, tbultel@free.fr wrote:
> 
> 
> ----- Mail original -----
> > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > À: tbultel@free.fr
> > Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> > Envoyé: Jeudi 13 Novembre 2014 16:10:22
> > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > 
> > On Thu, Nov 13, 2014 at 04:03:31PM +0100, tbultel@free.fr wrote:
> > > 
> > > 
> > > ----- Mail original -----
> > > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > > À: tbultel@free.fr
> > > > Cc: xenomai@xenomai.org, "nicolas Mabire"
> > > > <nicolas.mabire@basystemes.fr>
> > > > Envoyé: Jeudi 13 Novembre 2014 15:51:15
> > > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > > 
> > > > On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> > > > > 
> > > > > 
> > > > > ----- Mail original -----
> > > > > > De: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > > > > À: "Thierry Bultel" <tbultel@free.fr>
> > > > > > Cc: xenomai@xenomai.org, "Lennart Sorensen"
> > > > > > <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > > > > > <nicolas.mabire@basystemes.fr>
> > > > > > Envoyé: Mardi 11 Novembre 2014 21:03:58
> > > > > > Objet: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > > > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > > > > 
> > > > > > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel
> > > > > > wrote:
> > > > > > > Le 10/11/2014 13:36, Gilles Chanteperdrix a écrit :
> > > > > > > >You can use printascii in the timer interrupt acknowledge
> > > > > > > >routine
> > > > > > > >to
> > > > > > > >print a character every HZ ticks, this will give bad
> > > > > > > >latency,
> > > > > > > >but
> > > > > > > >should work.
> > > > > > > >
> > > > > > > 
> > > > > > > For unknown reason, the kernel gets stuck after
> > > > > > > "console [tty0] enabled, bootconsole disabled" if I use
> > > > > > > printascii
> > > > > > > in do_local_timer().
> > > > > > > earlyprintk seems broken as well.
> > > > > > 
> > > > > > Without doing this, does earlyprintk work?
> > > > > > 
> > > > > > > >If this is indeed the broadcast timer, it should never
> > > > > > > >tick,
> > > > > > > >because
> > > > > > > >we should never switch to broadcast mode.
> > > > > > > 
> > > > > > > I have found out why it was ticking.
> > > > > > > This is due to tick_broadcast_switch_to_oneshot() in
> > > > > > > kernel/time/tick-broadcast.c
> > > > > > > 
> > > > > > > This sets the oneshot mode to the time, and leads to a call
> > > > > > > of
> > > > > > > mxc_set_mode()
> > > > > > > 
> > > > > > > In that function, there is that comment:
> > > > > > > 	if (mode != clockevent_mode) {
> > > > > > > 		/* Set event time into far-far future */
> > > > > > > 		if (timer_is_v2())
> > > > > > > 
> > > > > > > ... and I estimate "far-far future" to be about 20 minutes.
> > > > > > > 
> > > > > > > As a correction, I have made that change to
> > > > > > > tick_broadcast_switch_to_oneshot():
> > > > > > > 
> > > > > > > @@ -603,11 +610,21 @@ void
> > > > > > > tick_broadcast_setup_oneshot(struct
> > > > > > > clock_event_device *bc)
> > > > > > >  {
> > > > > > >         int cpu = smp_processor_id();
> > > > > > > 
> > > > > > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > > > > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > > > > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > > > > > +       return;
> > > > > > > +#endif
> > > > > > > 
> > > > > > > ... and that makes the job, the iMX Timer is no longer
> > > > > > > armed.
> > > > > > > What do you think about it ?
> > > > > > > 
> > > > > > > Still currently stress-testing to see if things are getting
> > > > > > > better.
> > > > > > 
> > > > > > I am afraid this should not change anything. This timer
> > > > > > ticking
> > > > > > is
> > > > > > not a problem by itself, it is a problem if the twd gets
> > > > > > disabled.
> > > > > > 
> > > > > 
> > > > > Unfortunately, you were right.
> > > > > Even with that fix, we have 2 fail cases
> > > > > 
> > > > > I have added 4 LEDs to the GPIOs, and toggle them in
> > > > > do_local_timer every 100 ticks
> > > > > so that they are visible to the eye. I found it more convenient, as it keeps my
> > > > > console clean.
> > > > > 
> > > > > 1) on one machine, the freeze does not make a backtrace on the
> > > > > serial console
> > > > >  a) in most cases, the leds do not blink, no more interrupts
> > > > >  b) in one case, a single one kept blinking for a while before
> > > > >  stopping
> > > > 
> > > > 
> > > > do_local_timer is a bit too high level, try putting it in
> > > > twd_timer_ack. Also, try setting the Linux timer to periodic
> > > > (disable HIGHRES_TIMERS), this should make the Xenomai timer tick
> > > > periodically, no matter what happens on the Linux side.
> > > 
> > > I can try that, but that would break my linux application that
> > > relies on it.
> > 
> > I mean for testing. Does the application exit in error, or does it just not
> > work correctly?
> 
> Fine for testing. It should behave badly but stay alive.
> Let's try that.
> 
> > 
> > > > Do you have unlocked context switch enabled? If yes, turn it off.
> > > 
> > > I can't:
> > >  #error "Xenomai: ARM SMP systems require unlocked context switch
> > >  prior to Linux 3.8"
> > 
> > Oh yeah, there is that too. I completely forgot about that.
> > 
> > We would need to backport the changes that have been made for
> > switching contexts, but that is not a small change.
> > 
> > Is there any way you can run your tests without the specific hardware
> > of your board, say with a recent mainline kernel? Just to see if the
> > bug is gone with what we have made since then, or if it is still
> > there?
> 
> 
> Well, I already thought about that. 
> Basically, the AMOS820 is based on the 'nitrogen6x' board, formerly called 'sabrelite'.
> Even without supporting all the hardware, that would mean
> converting VIA's BSP, written in C, into a device tree, and I do not feel
> completely comfortable doing this for the moment.

Both the sabrelite and nitrogen6x device trees are delivered with
the mainline kernel. Would they not be sufficient to boot your board?

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-13 14:51                                   ` Gilles Chanteperdrix
  2014-11-13 15:03                                     ` tbultel
@ 2014-11-14 10:15                                     ` tbultel
  2014-11-14 10:28                                       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: tbultel @ 2014-11-14 10:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: tbultel@free.fr
> Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> Sent: Thursday 13 November 2014 15:51:15
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> On Thu, Nov 13, 2014 at 03:44:34PM +0100, tbultel@free.fr wrote:
> > 
> > 
> > ----- Original message -----
> > > From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> > > To: "Thierry Bultel" <tbultel@free.fr>
> > > Cc: xenomai@xenomai.org, "Lennart Sorensen"
> > > <lsorense@csclub.uwaterloo.ca>, "nicolas Mabire"
> > > <nicolas.mabire@basystemes.fr>
> > > Sent: Tuesday 11 November 2014 21:03:58
> > > Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 +
> > > adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> > > 
> > > On Tue, Nov 11, 2014 at 08:57:34PM +0100, Thierry Bultel wrote:
> > > > On 10/11/2014 13:36, Gilles Chanteperdrix wrote:
> > > > >You can use printascii in the timer interrupt acknowledge
> > > > >routine
> > > > >to
> > > > >print a character every HZ ticks, this will give bad latency,
> > > > >but
> > > > >should work.
> > > > >
> > > > 
> > > > For unknown reason, the kernel gets stuck after
> > > > "console [tty0] enabled, bootconsole disabled" if I use
> > > > printascii
> > > > in do_local_timer().
> > > > earlyprintk seems broken as well.
> > > 
> > > Without doing this, does earlyprintk work?
> > > 
> > > > >If this is indeed the broadcast timer, it should never tick,
> > > > >because
> > > > >we should never switch to broadcast mode.
> > > > 
> > > > I have found out why it was ticking.
> > > > This is due to tick_broadcast_switch_to_oneshot() in
> > > > kernel/time/tick-broadcast.c
> > > > 
> > > > This sets the oneshot mode to the time, and leads to a call of
> > > > mxc_set_mode()
> > > > 
> > > > In that function, there is that comment:
> > > > 	if (mode != clockevent_mode) {
> > > > 		/* Set event time into far-far future */
> > > > 		if (timer_is_v2())
> > > > 
> > > > ... and I estimate "far-far future" to be about 20 minutes.
> > > > 
> > > > As a correction, I have made that change to
> > > > tick_broadcast_switch_to_oneshot():
> > > > 
> > > > @@ -603,11 +610,21 @@ void tick_broadcast_setup_oneshot(struct
> > > > clock_event_device *bc)
> > > >  {
> > > >         int cpu = smp_processor_id();
> > > > 
> > > > +#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
> > > > +       printk(KERN_ALERT "%s cpu %d -> dev %s
> > > > IGNORED\n",__PRETTY_FUNCTION__, cpu, bc->name);
> > > > +       return;
> > > > +#endif
> > > > 
> > > > ... and that makes the job, the iMX Timer is no longer armed.
> > > > What do you think about it ?
> > > > 
> > > > Still currently stress-testing to see if things are getting
> > > > better.
> > > 
> > > I am afraid this should not change anything. This timer ticking
> > > is
> > > not a problem by itself, it is a problem if the twd gets
> > > disabled.
> > > 
> > 
> > Unfortunately, you were right.
> > Even with that fix, we have 2 fail cases
> > 
> > I have added 4 LEDs to the GPIOs, and toggle them in
> > do_local_timer every 100 ticks
> > so that they are visible to the eye. I found it more convenient, as it keeps my
> > console clean.
> > 
> > 1) on one machine, the freeze does not make a backtrace on the
> > serial console
> >  a) in most cases, the leds do not blink, no more interrupts
> >  b) in one case, a single one kept blinking for a while before
> >  stopping
> 
> 
> do_local_timer is a bit too high level, try putting it in
> twd_timer_ack. Also, try setting the Linux timer to periodic
> (disable HIGHRES_TIMERS), this should make the Xenomai timer tick
> periodically, no matter what happens on the Linux side.
> 
> 
> > 
> > I first thought about graphics hardware acceleration because with
> > it the display gets corrupted,
> > but even without it, the freeze happens the same way.
> > 
> > 2) on another machine, I am getting this backtrace. The local
> > timers keep ticking.
> > 
> > Since it was hard to reproduce with IPIPE_DEBUG, it was
> > unfortunately disabled for that run.
> > Attempting again with IPIPE_DEBUG enabled.
> > 
> > SGT_AGV2 login: Internal error: Oops - undefined instruction: 0
> > [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 2    Not tainted  (3.0.43_4.1.0 #8)
> > PC is at flush_tlb_page+0x20/0xe8
> > LR is at ptep_set_access_flags+0x4c/0x8c
> > pc : [<80057da8>]    lr : [<8019a748>]    psr: 20000113
> > sp : ba989d90  ip : 00000800  fp : 00000002
> > r10: 00000000  r9 : 00000000  r8 : ba882380
> > r7 : 8bf5104c  r6 : ba97d108  r5 : ba97d108  r4 : 00000001
> > r3 : 2aae0000  r2 : 00000001  r1 : 2aae0000  r0 : ba97d108
> > Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> > Control: 10c53c7d  Table: 4a06004a  DAC: 00000015
> > Process watch (pid: 3904, stack limit = 0xba9882f0)
> > Stack: (0xba989d90 to 0xba98a000)
> > 9d80:                                     00000000 00000000
> > 00000000 00000000
> > 9da0: 00000001 2aae0000 ba97d108 8019a748 4f6b075f ba882380
> > 00000001 ba97d108
> > 9dc0: 2aae0000 8018d828 00000001 ba988000 ba989e54 80083a50
> > 8004c094 ba5a7ba0
> > 9de0: 00000000 00000000 00000002 bffb4000 800494c8 4f6b07df
> > ba882380 ba97d108
> > 9e00: ba060aa8 2aae0000 8bf5104c ba5a7ba0 00000002 8018f598
> > ba060aa8 8bf5104c
> > 9e20: 4f6b07df bffb4000 00000002 800ae980 00000000 ba060000
> > 00000155 2aae0000
> > 9e40: ba97d108 00000001 ba060aa8 ba5a7ba0 00000002 8018f8b8
> > ba060aa8 00000001
> > 9e60: ba989fb0 2aae0000 0000081f ba5a7ba0 bac1ae80 ba97d108
> > ba5a7bdc 8005d64c
> > 9e80: 80046204 80049b80 ba989e9c 00000001 e96703c8 0000037d
> > 8c01c4f8 80668000
> > 9ea0: 8c01c4f8 800ad3d0 00000001 00000000 00000001 ba988000
> > ba989ecc 80082754
> > 9ec0: ba989f38 00000000 ba989f98 80668000 8c01c4f8 800ace68
> > 00000001 00000000
> > 9ee0: 00000000 0000040f 0000000f 0000081f ba989fb0 80c8ba88
> > 0006b08e 2aae0000
> > 9f00: 2aad54c0 8004b4e0 3b9b8d50 00000000 00000001 00000000
> > ba988000 0000c350
> > 9f20: 7e9977a4 800adcf4 0000c350 00000000 3b9b8d50 00000000
> > ba989f39 8c01c5a8
> > 9f40: 00000000 00000000 e967c718 0000037d e96703c8 0000037d
> > 800ac958 8c01c4f8
> > 9f60: 00000000 00000000 00000000 8009da50 00000000 7e9977a4
> > 00000000 00010000
> > 9f80: 000000a2 800517a8 ba988000 00000000 00000000 0000040f
> > 0000000f 00000000
> > 9fa0: 0006b094 00000000 0005767b 80051544 2aae0000 0006b092
> > 00000006 1b485b1b
> > 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e
> > 0005767b 2aad54c0
> > 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
> > 00000000 00000000
> > [<80057da8>] (flush_tlb_page+0x20/0xe8) from [<8019a748>]
> > (ptep_set_access_flags+0x4c
> >                                                             /0x8c)
> > [<8019a748>] (ptep_set_access_flags+0x4c/0x8c) from [<8018d828>]
> > (do_wp_page+0x2a4/0x
> >                                                             728)
> > [<8018d828>] (do_wp_page+0x2a4/0x728) from [<8018f598>]
> > (handle_pte_fault+0x1bc/0x410
> >                                                             )
> > [<8018f598>] (handle_pte_fault+0x1bc/0x410) from [<8018f8b8>]
> > (handle_mm_fault+0xcc/0
> >                                                             x11c)
> > [<8018f8b8>] (handle_mm_fault+0xcc/0x11c) from [<8005d64c>]
> > (do_page_fault+0x26c/0x3e
> >                                                             4)
> > [<8005d64c>] (do_page_fault+0x26c/0x3e4) from [<8004b4e0>]
> > (do_DataAbort+0x34/0x17c)
> > [<8004b4e0>] (do_DataAbort+0x34/0x17c) from [<80051544>]
> > (ret_from_exception+0x0/0x40
> >                                                             )
> > Exception stack(0xba989fb0 to 0xba989ff8)
> > 9fa0:                                     2aae0000 0006b092
> > 00000006 1b485b1b
> > 9fc0: 00000006 2ac39a90 00000000 0006b094 00000000 0006b08e
> > 0005767b 2aad54c0
> > 9fe0: 2aae0000 7e9973b8 2abb98f9 2abbd158 00000010 ffffffff
> > Code: e5922000 e24dd010 e3520000 0a00000c (ee102ff1)
> > ---[ end trace 1a38ad1feb34b78d ]---
> 
> Do you have unlocked context switch enabled? If yes, turn it off.
> 
> Also, if enabling all debugs do not work, try only enabling the
> I-pipe tracer, define sufficient back trace points (1000), and try
> and get a trace when the problem happens.
> 
> You should try and debug the issue for which you have a trace of
> course, but I do not see how it could be related to the other
> issues.
> 

Gilles, it took a while (about 3 hours), but I finally got a backtrace
with IPIPE_DEBUG enabled (I have disabled CONFIG_IPIPE_TRACE_VMALLOC).

Before the trace reached the serial console, I saw that the timer IRQs
had stopped on 2 CPUs (my ssh session froze). The trace came up some 3 or
4 seconds later. It is not clear to me why there are only 100 points,
since I have CONFIG_IPIPE_TRACE_SHIFT=14.

[root@SGT_AGV2 ~]#
[root@SGT_AGV2 ~]# Unable to handle kernel paging request at virtual address 0c2d3000
pgd = 80004000
[0c2d3000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP
Modules linked in:
CPU: 2    Not tainted  (3.0.43_4.1.0 #10)
PC is at timekeeping_resume+0x98/0x120
LR is at __timekeeping_inject_sleeptime+0xb0/0xe0
pc : [<802407fc>]    lr : [<8023f990>]    psr: 600f0013
sp : bffb5e58  ip : 00000000  fp : bffb5ea4
r10: 80ee80e0  r9 : 00f6f581  r8 : 80ee80e0
r7 : 8c31c560  r6 : 00000000  r5 : 0c2d3000  r4 : 80f59af0
r3 : 02d5c6da  r2 : 87e00000  r1 : 3b9aca00  r0 : 0c2d3000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 499b404a  DAC: 00000015
Process swapper (pid: 0, stack limit = 0xbffb42f0)
Stack: (0xbffb5e58 to 0xbffb6000)
5e40:                                                       80f58380 00000001
5e60: 00000000 8c31c4c0 8c31c598 bffb5ed8 8020bf38 80262728 0c2d3000 00000000
5e80: 0c2d3000 00000000 80f58380 00000001 00000000 8c31c4c0 bffb5f14 bffb5ea8
5ea0: 8023afb0 802417bc 80f58380 8c31c598 00000000 00000000 bffb5ed4 bffb5ec8
5ec0: 80267ee8 80266e94 8c31c560 bffb5ed8 80269cd0 80267ec0 b1f34262 365a3198
5ee0: 801d5000 80269cb4 00000002 80049040 0c2d3000 00000000 80f58380 00000001
5f00: 00000000 00000000 bffb5f2c bffb5f18 801db9b8 8023af40 80efc100 801ccb24
5f20: bffb5f4c bffb5f30 801ce5ec 801db988 80f5c380 80efc100 80f5c380 80f5c780
5f40: bffb5f5c bffb5f50 801dbe48 801ce588 bffb5f9c bffb5f60 802640d0 801dbe1c
5f60: 80267ee8 80266e94 80049800 80049808 80269cd0 8c31c808 80f4e6e4 80efbb10
5f80: 8088cfc4 80eff42c 412fc09a 00000000 bffb5fb4 bffb5fa0 801d5c44 80263f40
5fa0: bffb4000 80f4e6e4 bffb5fdc bffb5fb8 801d5e90 801d5ba8 00000002 80f13b30
5fc0: 00000002 10c03c7d 80f4e898 1000406a bffb5ff4 bffb5fe0 8087e59c 801d5de8
5fe0: 4ffb806a 00000015 00000000 bffb5ff8 1087ded4 8087e478 00000000 00000000
[<802407fc>] (timekeeping_resume+0x98/0x120) from [<8023afb0>] (hrtimer_interrupt+0x7c/0x2cc)
[<8023afb0>] (hrtimer_interrupt+0x7c/0x2cc) from [<801db9b8>] (ipi_timer+0x3c/0x40)
[<801db9b8>] (ipi_timer+0x3c/0x40) from [<801ce5ec>] (do_local_timer+0x70/0x9c)
[<801ce5ec>] (do_local_timer+0x70/0x9c) from [<801dbe48>] (__ipipe_root_localtimer+0x38/0x3c)
Exception stack(0xbffb5f50 to 0xbffb5f98)
5f40:                                     bffb5f9c bffb5f60 802640d0 801dbe1c
5f60: 80267ee8 80266e94 80049800 80049808 80269cd0 8c31c808 80f4e6e4 80efbb10
5f80: 8088cfc4 80eff42c 412fc09a 00000000 bffb5fb4 bffb5fa0
[<801dbe48>] (__ipipe_root_localtimer+0x38/0x3c) from [<802640d0>] (__ipipe_sync_stage+0x19c/0x330)
[<802640d0>] (__ipipe_sync_stage+0x19c/0x330) from [<801d5c44>] (default_idle+0xa8/0xc0)
[<801d5c44>] (default_idle+0xa8/0xc0) from [<801d5e90>] (cpu_idle+0xb4/0x11c)
[<801d5e90>] (cpu_idle+0xb4/0x11c) from [<8087e59c>] (secondary_start_kernel+0x130/0x154)
[<8087e59c>] (secondary_start_kernel+0x130/0x154) from [<1087ded4>] (0x1087ded4)
Code: e50b302c e51b3020 e50b3028 ebfffc38 (e5956000)
I-pipe tracer log (100 points):
     #func                    0 ipipe_trace_panic_freeze+0x10 (oops_enter+0x1c)
     #func                   -1 oops_enter+0x10 (die+0x30)
     #func                   -1 die+0x14 (__do_kernel_fault.part.3+0x64)
     #func                   -3 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                   -3 sub_preempt_count+0x10 (vprintk+0x21c)
     #func                   -4 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                   -5 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                   -5 __ipipe_restore_root+0x10 (vprintk+0x214)
     #func                   -6 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                   -6 sub_preempt_count+0x10 (wake_up_klogd+0x50)
     #func                   -7 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                   -8 add_preempt_count+0x10 (wake_up_klogd+0x30)
     #func                   -8 wake_up_klogd+0x10 (console_unlock+0x194)
     #func                   -9 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                   -9 sub_preempt_count+0x10 (_raw_spin_unlock_irqrestore+0x38)
     #func                  -10 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                  -11 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                  -11 __ipipe_restore_root+0x10 (_raw_spin_unlock_irqrestore+0x30)
     #func                  -12 _raw_spin_unlock_irqrestore+0x10 (console_unlock+0x188)
     #func                  -12 __ipipe_spin_unlock_debug+0x10 (console_unlock+0x17c)
     #func                  -13 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                  -13 sub_preempt_count+0x10 (_raw_spin_unlock_irqrestore+0x38)
     #func                  -14 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                  -15 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                  -15 __ipipe_restore_root+0x10 (_raw_spin_unlock_irqrestore+0x30)
     #func                  -16 _raw_spin_unlock_irqrestore+0x10 (up+0x50)
     #func                  -16 __ipipe_spin_unlock_debug+0x10 (up+0x44)
     #func                  -17 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                  -17 add_preempt_count+0x10 (_raw_spin_lock_irqsave+0x28)
     #func                  -18 __ipipe_test_and_stall_root+0x10 (_raw_spin_lock_irqsave+0x1c)
     #func                  -19 _raw_spin_lock_irqsave+0x10 (up+0x1c)
     #func                  -19 up+0x10 (console_unlock+0x174)
     #func                  -20 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                  -20 add_preempt_count+0x10 (_raw_spin_lock_irqsave+0x28)
     #func                  -21 __ipipe_test_and_stall_root+0x10 (_raw_spin_lock_irqsave+0x1c)
     #func                  -22 _raw_spin_lock_irqsave+0x10 (console_unlock+0x54)
     #func                  -22 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                  -23 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                  -23 __ipipe_restore_root+0x10 (console_unlock+0x48)
     #func                  -24 _call_console_drivers+0x10 (console_unlock+0x40)
     #func                  -24 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                  -25 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                  -26 __ipipe_restore_root+0x10 (imx_console_write+0x154)
     #func                  -27 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                  -27 sub_preempt_count+0x10 (_raw_spin_unlock+0x30)
     #func                  -28 _raw_spin_unlock+0x10 (imx_console_write+0x178)
 |   #func                  -28 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                  -29 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                  -29 l2x0_cache_sync+0x10 (imx_console_write+0x134)
 |   #func                  -30 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                  -31 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                  -31 l2x0_cache_sync+0x10 (imx_console_write+0x118)
 |   #func                  -32 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                  -32 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                  -33 l2x0_cache_sync+0x10 (imx_console_write+0xfc)
 |   #func                 -212 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                 -213 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                 -213 l2x0_cache_sync+0x10 (imx_console_putchar+0x4c)
     #func                 -214 imx_console_putchar+0x10 (uart_console_write+0x64)
 |   #func                 -215 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                 -215 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                 -216 l2x0_cache_sync+0x10 (imx_console_putchar+0x4c)
     #func                 -217 imx_console_putchar+0x10 (uart_console_write+0x58)
     #func                 -217 uart_console_write+0x10 (imx_console_write+0xd4)
 |   #func                 -218 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                 -219 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                 -219 l2x0_cache_sync+0x10 (imx_console_write+0xb4)
 |   #func                 -220 __ipipe_spin_unlock_irqrestore+0x10 (l2x0_cache_sync+0x3c)
     #func                 -220 __ipipe_spin_lock_irqsave+0x10 (l2x0_cache_sync+0x24)
     #func                 -221 l2x0_cache_sync+0x10 (imx_console_write+0x94)
     #func                 -222 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                 -223 add_preempt_count+0x10 (_raw_spin_lock+0x20)
     #func                 -224 _raw_spin_lock+0x10 (imx_console_write+0x16c)
     #func                 -224 __ipipe_test_and_stall_root+0x10 (imx_console_write+0x30)
     #func                 -225 imx_console_write+0x14 (__call_console_drivers+0xdc)
     #func                 -225 __call_console_drivers+0x10 (_call_console_drivers+0x70)
     #func                 -226 _call_console_drivers+0x10 (console_unlock+0xf8)
     #func                 -227 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                 -227 sub_preempt_count+0x10 (_raw_spin_unlock+0x30)
     #func                 -228 _raw_spin_unlock+0x10 (console_unlock+0x84)
     #func                 -228 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                 -229 add_preempt_count+0x10 (_raw_spin_lock_irqsave+0x28)
     #func                 -230 __ipipe_test_and_stall_root+0x10 (_raw_spin_lock_irqsave+0x1c)
     #func                 -230 _raw_spin_lock_irqsave+0x10 (console_unlock+0x54)
     #func                 -231 console_unlock+0x10 (vprintk+0x208)
     #func                 -231 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                 -232 sub_preempt_count+0x10 (_raw_spin_unlock+0x30)
     #func                 -232 _raw_spin_unlock+0x10 (vprintk+0x204)
     #func                 -233 ipipe_check_context+0x14 (sub_preempt_count+0x20)
     #func                 -234 sub_preempt_count+0x10 (_raw_spin_unlock_irqrestore+0x38)
     #func                 -234 __ipipe_stall_root+0x10 (__ipipe_restore_root+0x34)
     #func                 -235 ipipe_check_context+0x14 (__ipipe_restore_root+0x20)
     #func                 -236 __ipipe_restore_root+0x10 (_raw_spin_unlock_irqrestore+0x30)
     #func                 -236 _raw_spin_unlock_irqrestore+0x10 (down_trylock+0x40)
     #func                 -237 __ipipe_spin_unlock_debug+0x10 (down_trylock+0x34)
     #func                 -237 ipipe_check_context+0x14 (add_preempt_count+0x20)
     #func                 -238 add_preempt_count+0x10 (_raw_spin_lock_irqsave+0x28)
     #func                 -239 __ipipe_test_and_stall_root+0x10 (_raw_spin_lock_irqsave+0x1c)
     #func                 -239 _raw_spin_lock_irqsave+0x10 (down_trylock+0x1c)
     #func                 -240 down_trylock+0x10 (console_trylock+0x1c)
---[ end trace 01c68765b125b83c ]---
Kernel panic - not syncing: Fatal exception in interrupt
[<801dd1d8>] (unwind_backtrace+0x0/0x104) from [<80881184>] (dump_stack+0x20/0x24)
[<80881184>] (dump_stack+0x20/0x24) from [<80882028>] (panic+0x74/0x190)
[<80882028>] (panic+0x74/0x190) from [<801d90a8>] (die+0x248/0x2b8)
[<801d90a8>] (die+0x248/0x2b8) from [<808811ec>] (__do_kernel_fault.part.3+0x64/0x84)
[<808811ec>] (__do_kernel_fault.part.3+0x64/0x84) from [<801e2db4>] (do_page_fault+0x240/0x4b8)
[<801e2db4>] (do_page_fault+0x240/0x4b8) from [<801e3200>] (do_translation_fault+0x144/0x254)
[<801e3200>] (do_translation_fault+0x144/0x254) from [<801ce29c>] (do_DataAbort+0x44/0x260)
[<801ce29c>] (do_DataAbort+0x44/0x260) from [<801d4750>] (__dabt_svc+0x70/0xa0)
Exception stack(0xbffb5e10 to 0xbffb5e58)
5e00:                                     0c2d3000 3b9aca00 87e00000 02d5c6da
5e20: 80f59af0 0c2d3000 00000000 8c31c560 80ee80e0 00f6f581 80ee80e0 bffb5ea4
5e40: 00000000 bffb5e58 8023f990 802407fc 600f0013 ffffffff
[<801d4750>] (__dabt_svc+0x70/0xa0) from [<802407fc>] (timekeeping_resume+0x98/0x120)
[<802407fc>] (timekeeping_resume+0x98/0x120) from [<8023afb0>] (hrtimer_interrupt+0x7c/0x2cc)
[<8023afb0>] (hrtimer_interrupt+0x7c/0x2cc) from [<801db9b8>] (ipi_timer+0x3c/0x40)
[<801db9b8>] (ipi_timer+0x3c/0x40) from [<801ce5ec>] (do_local_timer+0x70/0x9c)
[<801ce5ec>] (do_local_timer+0x70/0x9c) from [<801dbe48>] (__ipipe_root_localtimer+0x38/0x3c)
Exception stack(0xbffb5f50 to 0xbffb5f98)
5f40:                                     bffb5f9c bffb5f60 802640d0 801dbe1c
5f60: 80267ee8 80266e94 80049800 80049808 80269cd0 8c31c808 80f4e6e4 80efbb10
5f80: 8088cfc4 80eff42c 412fc09a 00000000 bffb5fb4 bffb5fa0
[<801dbe48>] (__ipipe_root_localtimer+0x38/0x3c) from [<802640d0>] (__ipipe_sync_stage+0x19c/0x330)
[<802640d0>] (__ipipe_sync_stage+0x19c/0x330) from [<801d5c44>] (default_idle+0xa8/0xc0)
[<801d5c44>] (default_idle+0xa8/0xc0) from [<801d5e90>] (cpu_idle+0xb4/0x11c)
[<801d5e90>] (cpu_idle+0xb4/0x11c) from [<8087e59c>] (secondary_start_kernel+0x130/0x154)
[<8087e59c>] (secondary_start_kernel+0x130/0x154) from [<1087ded4>] (0x1087ded4)
CPU1: stopping
[<801dd1d8>] (unwind_backtrace+0x0/0x104) from [<80881184>] (dump_stack+0x20/0x24)
[<80881184>] (dump_stack+0x20/0x24) from [<801ce754>] (do_IPI+0x13c/0x164)
[<801ce754>] (do_IPI+0x13c/0x164) from [<801dbe84>] (__ipipe_root_ipi+0x38/0x3c)
Exception stack(0xbffade48 to 0xbffade90)
de40:                   bffade94 bffade58 802640d0 801dbe58 80f5c380 8c192808
de60: 80049800 80049808 80269cd0 80f5c384 80efc100 80049808 80f5c380 80f103a0
de80: 80049800 80049808 bffaded4 bffade98
[<801dbe84>] (__ipipe_root_ipi+0x38/0x3c) from [<802640d0>] (__ipipe_sync_stage+0x19c/0x330)
[<802640d0>] (__ipipe_sync_stage+0x19c/0x330) from [<80264b7c>] (__ipipe_walk_pipeline+0x190/0x2e4)
[<80264b7c>] (__ipipe_walk_pipeline+0x190/0x2e4) from [<801e1014>] (__ipipe_handle_irq+0x1e0/0x29c)
[<801e1014>] (__ipipe_handle_irq+0x1e0/0x29c) from [<801ce1b4>] (__ipipe_grab_ipi+0x3c/0xe0)
[<801ce1b4>] (__ipipe_grab_ipi+0x3c/0xe0) from [<801d47c0>] (__irq_svc+0x40/0xd4)
Exception stack(0xbffadf40 to 0xbffadf88)
df40: 20000000 0000001d 00000000 f40dc010 8c192808 80f4e6e4 80efbb10 8088cfc4
df60: 80eff42c 412fc09a 00000000 bffadf9c 801e7c28 bffadf88 801e8214 801e670c
df80: 20000013 ffffffff
[<801d47c0>] (__irq_svc+0x40/0xd4) from [<801e670c>] (cpu_v7_do_idle+0x8/0xc)
SMP: failed to stop secondary CPUs
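
As a reading aid for the oops above: the word in parentheses on the
"Code:" line is the instruction at the faulting PC. Below is a minimal
sketch of decoding it, assuming the standard ARM A1 encoding of the
word-sized LDR (immediate) instruction in its offset/pre-indexed forms.
The struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct arm_ldr_imm {
    bool     valid;   /* true only for a word-sized LDR with immediate offset */
    unsigned rn;      /* base register number */
    unsigned rt;      /* destination register number */
    int32_t  offset;  /* signed byte offset applied to the base register */
};

/* Decode one 32-bit ARM instruction word; anything that is not a
 * word-sized LDR (immediate, P=1) is reported as not valid. */
struct arm_ldr_imm decode_arm_ldr_imm(uint32_t insn)
{
    struct arm_ldr_imm d = { false, 0, 0, 0 };

    if ((insn >> 28) == 0xf)            /* unconditional space, not LDR */
        return d;
    if (((insn >> 25) & 0x7) != 0x2)    /* bits 27..25 = 010: load/store, immediate */
        return d;
    if (!((insn >> 24) & 1))            /* P = 0: post-indexed forms, not handled */
        return d;
    if ((insn >> 22) & 1)               /* B = 1: byte access, not handled */
        return d;
    if (!((insn >> 20) & 1))            /* L = 0: this is a store */
        return d;

    d.valid  = true;
    d.rn     = (insn >> 16) & 0xf;
    d.rt     = (insn >> 12) & 0xf;
    d.offset = (int32_t)(insn & 0xfff);
    if (!((insn >> 23) & 1))            /* U = 0: offset is subtracted */
        d.offset = -d.offset;
    return d;
}
```

For the word (e5956000) this gives rt = 6, rn = 5, offset = 0, that is
"ldr r6, [r5]". The register dump shows r5 = 0c2d3000, which is the
virtual address reported in the paging request failure.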




> --
> 					    Gilles.
> 



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-14 10:15                                     ` tbultel
@ 2014-11-14 10:28                                       ` Gilles Chanteperdrix
  2014-11-16 20:44                                         ` Thierry Bultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-14 10:28 UTC (permalink / raw)
  To: tbultel; +Cc: nicolas Mabire, xenomai

On Fri, Nov 14, 2014 at 11:15:00AM +0100, tbultel@free.fr wrote:
> Gilles, it has taken time (about 3 hours), but I finally got
> a backtrace with IPIPE_DEBUG: (I have disabled
> CONFIG_IPIPE_TRACE_VMALLOC)

Normally CONFIG_IPIPE_TRACE_VMALLOC is required on ARM. kmalloc fails
to allocate large enough areas, which causes the kernel to fail booting.

> 
> Before the trace went to the serial console, I saw that there were no more timer irq
> on 2 CPUs (my ssh session froze). Some 3 or 4 seconds later it came up.
> Not clear to me why there are only 100 points, because I have CONFIG_IPIPE_TRACE_SHIFT=14

You have to write the number of back_trace_points you want to
/proc/ipipe/trace/back_trace_points

Also your trace contains a lot of writing to the console, you should
move the ipipe_trace_panic_freeze to before any console output.

Also note that the UART being slow, a lot of trace points will take
a lot of time to print out.

You may want to enable the tracer verbose mode to have the timestamps.
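
For reference, a minimal sketch of the settings discussed above, written
as two small C helpers; from a shell, redirecting the number into the
file has the same effect. The back_trace_points path is the one given
above. The helper names are hypothetical, and treating
CONFIG_IPIPE_TRACE_SHIFT as the log2 of the buffer depth is an
assumption based on the tracer's Kconfig help text.

```c
#include <stdio.h>

/* Assumption: CONFIG_IPIPE_TRACE_SHIFT is log2 of the number of trace
 * points kept per buffer, so a shift of 14 leaves room for 16384 points. */
unsigned long ipipe_trace_capacity(unsigned int shift)
{
    return 1UL << shift;
}

/* Hypothetical helper: write a decimal value to a tracer control file
 * such as /proc/ipipe/trace/back_trace_points.
 * Returns 0 on success, -1 on any error. */
int ipipe_write_setting(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    int rc;

    if (f == NULL)
        return -1;
    rc = (fprintf(f, "%ld\n", value) < 0) ? -1 : 0;
    if (fclose(f) != 0)   /* buffered data is flushed here, so check it too */
        rc = -1;
    return rc;
}
```

Calling ipipe_write_setting("/proc/ipipe/trace/back_trace_points", 1000)
as root before reproducing the freeze should make the panic dump print
1000 points instead of 100, which is still far below the 16384 slots a
shift of 14 would provide under the assumption stated. The verbose
switch is assumed to be a sibling file in the same directory and could
be set the same way.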

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-14 10:28                                       ` Gilles Chanteperdrix
@ 2014-11-16 20:44                                         ` Thierry Bultel
  2014-11-17 10:12                                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: Thierry Bultel @ 2014-11-16 20:44 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai

On 14/11/2014 11:28, Gilles Chanteperdrix wrote:
> On Fri, Nov 14, 2014 at 11:15:00AM +0100, tbultel@free.fr wrote:
>> Gilles, it has taken time (about 3 hours), but I finally got
>> a backtrace with IPIPE_DEBUG: (I have disabled
>> CONFIG_IPIPE_TRACE_VMALLOC)
>
> Normally CONFIG_IPIPE_TRACE_VMALLOC is required on ARM. kmalloc fails
> to allocate large enough areas, which causes the kernel to fail booting.
>
>>
>> Before the trace went to the serial console, I saw that there were no more timer irq
>> on 2 CPUs (my ssh session froze). Some 3 or 4 seconds later it came up.
>> Not clear to me why there are only 100 points, because I have CONFIG_IPIPE_TRACE_SHIFT=14
>
> You have to write the number of back_trace_points you want to
> /proc/ipipe/trace/back_trace_points
>
> Also your trace contains a lot of writing to the console, you should
> move the ipipe_trace_panic_freeze to before any console output.
>
> Also note that the UART being slow, a lot of trace points will take
> a lot of time to print out.
>
> You may want to enable the tracer verbose mode to have the timestamps.
>

FYI, the kernel without IPIPE but with the PREEMPT_RT patch freezes as
well when flooded with ping, mostly when booted after a power-off.
I have not (yet) been able to freeze it after a reboot.
I will consider looking deeper into the bootloader.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-16 20:44                                         ` Thierry Bultel
@ 2014-11-17 10:12                                           ` Gilles Chanteperdrix
  2014-11-17 10:43                                             ` tbultel
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-17 10:12 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: nicolas Mabire, xenomai

On Sun, Nov 16, 2014 at 09:44:57PM +0100, Thierry Bultel wrote:
> On 14/11/2014 11:28, Gilles Chanteperdrix wrote:
> >On Fri, Nov 14, 2014 at 11:15:00AM +0100, tbultel@free.fr wrote:
> >>Gilles, it has taken time (about 3 hours), but I finally got
> >>a backtrace with IPIPE_DEBUG: (I have disabled
> >>CONFIG_IPIPE_TRACE_VMALLOC)
> >
> >Normally CONFIG_IPIPE_TRACE_VMALLOC is required on ARM. kmalloc fails
> >to allocate large enough areas, which causes the kernel to fail booting.
> >
> >>
> >>Before the trace went to the serial console, I saw that there were no more timer irq
> >>on 2 CPUs (my ssh session froze). Some 3 or 4 seconds later it came up.
> >>Not clear to me why there are only 100 points, because I have CONFIG_IPIPE_TRACE_SHIFT=14
> >
> >You have to write the number of back_trace_points you want to
> >/proc/ipipe/trace/back_trace_points
> >
> >Also your trace contains a lot of writing to the console, you should
> >move the ipipe_trace_panic_freeze to before any console output.
> >
> >Also note that the UART being slow, a lot of trace points will take
> >a lot of time to print out.
> >
> >You may want to enable the tracer verbose mode to have the timestamps.
> >
> 
> FYI, the kernel without IPIPE but with the PREEMPT_RT patch freezes
> as well, when flooding with ping, and mostly when started after a
> poweroff.
> I have not been able (yet) to freeze it after a reboot.
> I will consider looking deeper in the bootloader.

Is it still with the 3.0 kernel, or with something more current?

-- 
					    Gilles.



* Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
  2014-11-17 10:12                                           ` Gilles Chanteperdrix
@ 2014-11-17 10:43                                             ` tbultel
  0 siblings, 0 replies; 46+ messages in thread
From: tbultel @ 2014-11-17 10:43 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: nicolas Mabire, xenomai



----- Original message -----
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Thierry Bultel" <tbultel@free.fr>
> Cc: xenomai@xenomai.org, "nicolas Mabire" <nicolas.mabire@basystemes.fr>
> Sent: Monday, 17 November 2014 11:12:17
> Subject: Re: [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot
> 
> On Sun, Nov 16, 2014 at 09:44:57PM +0100, Thierry Bultel wrote:
> > On 14/11/2014 11:28, Gilles Chanteperdrix wrote:
> > >On Fri, Nov 14, 2014 at 11:15:00AM +0100, tbultel@free.fr wrote:
> > >>Gilles, it has taken time (about 3 hours), but I finally got
> > >>a backtrace with IPIPE_DEBUG: (I have disabled
> > >>CONFIG_IPIPE_TRACE_VMALLOC)
> > >
> > >Normally CONFIG_IPIPE_TRACE_VMALLOC is required on ARM. kmalloc
> > >fails
> > >to allocate large enough areas, which causes the kernel to fail
> > >booting.
> > >
> > >>
> > >>Before the trace went to the serial console, I saw that there
> > >>were no more timer irq
> > >>on 2 CPUs (my ssh session froze). Some 3 or 4 seconds later it
> > >>came up.
> > >>Not clear to me why there are only 100 points, because I have
> > >>CONFIG_IPIPE_TRACE_SHIFT=14
> > >
> > >You have to write the number of back_trace_points you want to
> > >/proc/ipipe/trace/back_trace_points
> > >
> > >Also your trace contains a lot of writing to the console, you
> > >should
> > >move the ipipe_trace_panic_freeze to before any console output.
> > >
> > >Also note that the UART being slow, a lot of trace points will
> > >take
> > >a lot of time to print out.
> > >
> > >You may want to enable the tracer verbose mode to have the
> > >timestamps.
> > >
> > 
> > FYI, the kernel without IPIPE but with the PREEMPT_RT patch freezes
> > as well, when flooding with ping, and mostly when started after a
> > poweroff.
> > I have not been able (yet) to freeze it after a reboot.
> > I will consider looking deeper in the bootloader.
> 
> Is it still with the 3.0 kernel, or with something more current?
> 
> --

Still with 3.0.35-4.1.0, plus the official PREEMPT_RT patch for 3.0.35
(I had to resolve a few minor patch rejects).


> 					    Gilles.
> 



end of thread, other threads:[~2014-11-17 10:43 UTC | newest]

Thread overview: 46+ messages
2014-11-05 20:38 [Xenomai] IMX kernel 3.0.35_4.1.0 + adeos-ipipe-3.0.43-mx6q-1.18-14 -> very slow boot Thierry Bultel
2014-11-05 20:59 ` Gilles Chanteperdrix
2014-11-06 10:57   ` tbultel
2014-11-06 11:47     ` Gilles Chanteperdrix
2014-11-06 12:34       ` Gilles Chanteperdrix
2014-11-06 12:52         ` Gilles Chanteperdrix
2014-11-06 14:41           ` tbultel
2014-11-06 14:51             ` Gilles Chanteperdrix
2014-11-06 16:04             ` Lennart Sorensen
2014-11-06 16:08               ` Gilles Chanteperdrix
2014-11-07  9:48                 ` tbultel
2014-11-07  9:52                   ` Gilles Chanteperdrix
2014-11-07  9:59                     ` Gilles Chanteperdrix
2014-11-07 12:47                     ` tbultel
2014-11-07 19:58                       ` Gilles Chanteperdrix
2014-11-09 17:48                         ` Thierry Bultel
2014-11-10 12:36                           ` Gilles Chanteperdrix
2014-11-11 19:57                             ` Thierry Bultel
2014-11-11 20:03                               ` Gilles Chanteperdrix
2014-11-12 13:17                                 ` Thierry Bultel
2014-11-12 13:34                                   ` Gilles Chanteperdrix
2014-11-12 14:27                                     ` Thierry Bultel
2014-11-12 14:30                                       ` Gilles Chanteperdrix
2014-11-12 15:20                                         ` Thierry Bultel
2014-11-12 15:29                                           ` Gilles Chanteperdrix
2014-11-12 15:44                                             ` Thierry Bultel
2014-11-12 15:55                                               ` Gilles Chanteperdrix
2014-11-12 16:17                                                 ` Thierry Bultel
2014-11-12 16:15                                               ` Gilles Chanteperdrix
2014-11-12 18:53                                               ` Lennart Sorensen
2014-11-12 19:06                                                 ` Gilles Chanteperdrix
2014-11-12 19:13                                                   ` Lennart Sorensen
2014-11-12 19:28                                                     ` Gilles Chanteperdrix
2014-11-12 19:35                                                       ` Lennart Sorensen
2014-11-13 14:44                                 ` tbultel
2014-11-13 14:51                                   ` Gilles Chanteperdrix
2014-11-13 15:03                                     ` tbultel
2014-11-13 15:10                                       ` Gilles Chanteperdrix
2014-11-13 15:23                                         ` tbultel
2014-11-13 15:26                                           ` Gilles Chanteperdrix
2014-11-14 10:15                                     ` tbultel
2014-11-14 10:28                                       ` Gilles Chanteperdrix
2014-11-16 20:44                                         ` Thierry Bultel
2014-11-17 10:12                                           ` Gilles Chanteperdrix
2014-11-17 10:43                                             ` tbultel
2014-11-06 12:48     ` Gilles Chanteperdrix
