* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-23 23:59 UTC
To: linux-arm-kernel
Hi Nishanth,

Olof told me about a new L3 error happening on omap5-uevm with
v4.0-rc1:

WARNING: CPU: 0 PID: 0 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
4000000.ocp:L3 Custom Error: MASTER MPU TARGET L4PER2 (Idle): Data Access in Supervisor mode during Functional access
...

I tried bisecting this with no luck, but narrowed it down to
having CONFIG_CPUFREQ_DT=y causing it, while =m won't trigger
it. This got changed by commit 40d1746d2eee ("ARM:
omap2plus_defconfig: use CONFIG_CPUFREQ_DT").

Any ideas?

Regards,

Tony
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-24 1:59 UTC
To: linux-arm-kernel

* Tony Lindgren <tony@atomide.com> [150223 16:09]:
> I tried bisecting this with no luck, but narrowed it down to
> having CONFIG_CPUFREQ_DT=y causing it, while =m won't trigger
> it. This got changed by commit 40d1746d2eee ("ARM:
> omap2plus_defconfig: use CONFIG_CPUFREQ_DT").
>
> Any ideas?

Hmm, so setting CONFIG_CPUFREQ_DT=m in arch/arm/configs/omap2plus_defconfig
produces the same output with make omap2plus_defconfig as with =y. So
CPUFREQ_DT can't be the real cause of the problem.

It's now looking like the l3-noc warning does not get triggered on
every boot.

It also seems the zImage triggering the error does not trigger the
error on every boot. To trigger the error, it seems the device needs to
be powered down for at least 10 or so seconds between the boots.
So far no luck reproducing the error on v3.19.

The easy way to reproduce is to power down omap5 for at least 10 seconds,
make omap2plus_defconfig on v4.0-rc1 and boot it.

And so far it looks like next-20150204 works and next-20150209
failed at once so far. But of course I would not trust anything
at this point :)

Regards,

Tony
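Because the warning does not fire on every boot, a single clean boot says little about whether a kernel is good, which may be part of why the bisect went nowhere. The sketch below is only a back-of-the-envelope model: it assumes each cold boot of a bad kernel shows the warning independently with some probability p_warn, a number the thread does not give. The function names are made up for illustration.

```python
import math


def false_good_probability(p_warn: float, clean_boots: int) -> float:
    """Chance that a kernel which really has the bug shows no warning
    in `clean_boots` independent cold boots (p_warn is an assumed rate)."""
    return (1.0 - p_warn) ** clean_boots


def boots_needed(p_warn: float, max_risk: float) -> int:
    """Smallest number of clean cold boots so that the chance of
    mislabelling a bad kernel as good is at most `max_risk`."""
    if not (0.0 < p_warn < 1.0 and 0.0 < max_risk < 1.0):
        raise ValueError("p_warn and max_risk must be strictly between 0 and 1")
    # Solve (1 - p_warn) ** n <= max_risk for the smallest integer n.
    return math.ceil(math.log(max_risk) / math.log(1.0 - p_warn))
```

For example, boots_needed(0.5, 0.05) gives the number of clean cold boots to require before marking a bisect step as good, if the warning were to show up on roughly half of the bad boots. The independence assumption is a simplification; the power-down time between boots clearly matters here.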
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Felipe Balbi @ 2015-02-24 2:24 UTC
To: linux-arm-kernel

Hi,

On Mon, Feb 23, 2015 at 05:59:04PM -0800, Tony Lindgren wrote:
> It's now looking like the l3-noc warning does not get triggered on
> every boot.
>
> It also seems the zImage triggering the error does not trigger the
> error on every boot. To trigger the error, it seems the device needs to
> be powered down for at least 10 or so seconds between the boots.
> So far no luck reproducing the error on v3.19.
>
> The easy way to reproduce is to power down omap5 for at least 10 seconds,
> make omap2plus_defconfig on v4.0-rc1 and boot it.
>
> And so far it looks like next-20150204 works and next-20150209
> failed at once so far. But of course I would not trust anything
> at this point :)

got a log of the failure ? Is it pointing to a device or one of the L4s?

Might be worth to boot with just the bare minimum (UART & timers) and
disable everything else. You might need to build busybox and append that
to the kernel so you don't need to rely on MMC/USB/etc for rootfs.

After that, you could start enabling modules one by one (as modules, not
built-in) and loading them one by one to see which one causes the
failure. Big PITA, I know, but I can't think of any other way to go
about this.

--
balbi
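The one-module-at-a-time procedure described above amounts to a linear search. A minimal sketch, assuming the caller supplies a check that loads one module on a minimal system and reports whether the l3-noc warning then appeared; the function name and the callback are hypothetical, and nothing here talks to a real board:

```python
from typing import Callable, Iterable, Optional


def first_faulting_module(
    modules: Iterable[str],
    warning_after_loading: Callable[[str], bool],
) -> Optional[str]:
    """Load modules in order and return the first one after which the
    warning shows up, or None if none of them triggers it.

    `warning_after_loading` is a caller-supplied (hypothetical) hook, e.g.
    something that loads the module and then greps the kernel log.
    """
    for name in modules:
        if warning_after_loading(name):
            return name
    return None
```

Since the warning is intermittent, a None result from one pass would not prove much; the check would likely need to be repeated across several cold boots.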
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-24 2:35 UTC
To: linux-arm-kernel

* Felipe Balbi <balbi@ti.com> [150223 18:28]:
> got a log of the failure ? Is it pointing to a device or one of the L4s?

Well mostly the MASTER MPU TARGET L4PER2, the following stack dump is
really the stack dump of the l3_interrupt_handler.

> Might be worth to boot with just the bare minimum (UART & timers) and
> disable everything else. You might need to build busybox and append that
> to the kernel so you don't need to rely on MMC/USB/etc for rootfs.
>
> After that, you could start enabling modules one by one (as modules, not
> built-in) and loading them one by one to see which one causes the
> failure. Big PITA, I know, but I can't think of any other way to go
> about this.

It seems the best way to deal with this is to make the l3_handle_target
actually show the address where the error happened to limit it down
to a single device..

Regards,

Tony
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-24 3:01 UTC
To: linux-arm-kernel

* Tony Lindgren <tony@atomide.com> [150223 18:43]:
> It seems the best way to deal with this is to make the l3_handle_target
> actually show the address where the error happened to limit it down
> to a single device..

Looks like the address is 0 for "Custom Error". Anyways, reverting
a fix for similar issue found on omap3 so far seems to help, that's
3d009c8c61f9 ("gpio: omap: Fix bad device access with setup_irq()").
That got merged very late for v3.19, so it certainly explains why it
was not noticed until now.

Regards,

Tony
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Felipe Balbi @ 2015-02-24 3:15 UTC
To: linux-arm-kernel

Hi,

On Mon, Feb 23, 2015 at 07:01:42PM -0800, Tony Lindgren wrote:
> Looks like the address is 0 for "Custom Error". Anyways, reverting

yeah, that's because the error comes from l4per2, not l3 :-)

> a fix for similar issue found on omap3 so far seems to help, that's
> 3d009c8c61f9 ("gpio: omap: Fix bad device access with setup_irq()").

if we revert that, we regress omap3, right ?

--
balbi
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-24 3:21 UTC
To: linux-arm-kernel

* Felipe Balbi <balbi@ti.com> [150223 19:19]:
> On Mon, Feb 23, 2015 at 07:01:42PM -0800, Tony Lindgren wrote:
> > * Tony Lindgren <tony@atomide.com> [150223 18:43]:
> >
> > Looks like the address is 0 for "Custom Error". Anyways, reverting
>
> yeah, that's because the error comes from l4per2, not l3 :-)

Right so it seems :)

> > a fix for similar issue found on omap3 so far seems to help, that's
> > 3d009c8c61f9 ("gpio: omap: Fix bad device access with setup_irq()").
>
> if we revert that, we regress omap3, right ?

Yes we saw that getting triggered on omap3/4 before 3d009c8c61f9.

Now looking at the stack trace again, it actually has something:

...
[<c05999a4>] (__irq_svc) from [<c0599164>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[<c0599164>] (_raw_spin_unlock_irqrestore) from [<c0027120>] (omap_hwmod_idle+0x34/0x44)
[<c0027120>] (omap_hwmod_idle) from [<c00283f8>] (omap_device_idle+0x38/0x78)
[<c00283f8>] (omap_device_idle) from [<c0028454>] (_od_runtime_suspend+0x1c/0x24)
[<c0028454>] (_od_runtime_suspend) from [<c03b5214>] (__rpm_callback+0x2c/0x60)
[<c03b5214>] (__rpm_callback) from [<c03b5268>] (rpm_callback+0x20/0x80)
[<c03b5268>] (rpm_callback) from [<c03b56b8>] (rpm_suspend+0xe8/0x55c)
[<c03b56b8>] (rpm_suspend) from [<c03b6bdc>] (pm_runtime_work+0x74/0xa8)
[<c03b6bdc>] (pm_runtime_work) from [<c0054608>] (process_one_work+0x1b0/0x4a0)
...

If it's the gpio-omap, there's probably some confusion still in
the driver regarding the GPIO bank idle status. Anyways, will look
more into it tomorrow.

Regards,

Tony
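The trace above runs through the runtime PM suspend path into omap_hwmod_idle, and the warning itself says the target was "(Idle)". The class of bug being suspected, a register access landing on a device that has already been idled, can be illustrated with a toy userspace model. This is not the gpio-omap driver or the kernel's runtime PM API; ToyDevice, BusError, pm_get and pm_put are invented names, and the thread does not confirm that this is the actual cause.

```python
class BusError(Exception):
    """Stands in for the interconnect reporting a bad access."""


class ToyDevice:
    """Toy model of a device whose clocks follow a usage count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.usage_count = 0       # > 0 means the device is kept active
        self.regs = {0x0: 0}       # a single made-up register

    @property
    def active(self) -> bool:
        return self.usage_count > 0

    def pm_get(self) -> None:
        """Take a reference; the device must stay active while it is held."""
        self.usage_count += 1

    def pm_put(self) -> None:
        """Drop a reference; at zero the device may be idled."""
        if self.usage_count == 0:
            raise RuntimeError(f"{self.name}: unbalanced pm_put")
        self.usage_count -= 1

    def read(self, offset: int) -> int:
        """Register read; touching an idled device is reported as a bus error."""
        if not self.active:
            raise BusError(f"{self.name}: access at {offset:#x} while idle")
        return self.regs[offset]
```

In this model a read between pm_get() and pm_put() succeeds, while a read after the last pm_put() raises BusError, which is roughly the shape of failure a mismatch in a driver's idea of its idle status would produce.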
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Tony Lindgren @ 2015-02-24 3:24 UTC
To: linux-arm-kernel

* Tony Lindgren <tony@atomide.com> [150223 19:29]:
> If it's the gpio-omap, there's probably some confusion still in
> the driver regarding the GPIO bank idle status. Anyways, will look
> more into it tomorrow.

And now I'm seeing the error with 3d009c8c61f9 reverted, so it
does not seem to be that either..

Regards,

Tony
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Felipe Balbi @ 2015-02-24 14:46 UTC
To: linux-arm-kernel

Hi,

On Mon, Feb 23, 2015 at 07:24:50PM -0800, Tony Lindgren wrote:
> And now I'm seeing the error with 3d009c8c61f9 reverted, so it
> does not seem to be that either..

well, that's good :-)

--
balbi
* New l3-noc error with CPUFREQ_DT built-in with v4.0-rc1
From: Felipe Balbi @ 2015-02-24 3:12 UTC
To: linux-arm-kernel

On Mon, Feb 23, 2015 at 06:35:06PM -0800, Tony Lindgren wrote:
> It seems the best way to deal with this is to make the l3_handle_target
> actually show the address where the error happened to limit it down
> to a single device..

you can't really do that from within l3. It doesn't have enough
information to figure that out. Since it pointed you to l4per2, then you
need to decode l4per2's debug registers. That has never been
implemented, though.

What happened here is that l4per2 detected the bogus access from one of
the devices attached to it and passed the error up to l3. Since we only
have l3 decoding, that's what you see and it ends up being really
cryptic. If you decode l4per2's registers, I'm sure it'll point you to a
real device.

I guess just to prove the concept, you could just hack it inside the l3
irq handler, though ideally we would have a real drivers/bus/omap-l4.c,
or something like that.

--
balbi
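The two-level reporting described in the message above can be pictured with a small toy model: the L4 records which attached device was involved and forwards only its own name to L3, so a decoder that looks at L3 alone can name the L4 but not the device. Everything below is hypothetical (ToyL4, ToyL3, describe and their fields); it does not reflect the real L3 or L4 register layout, which the thread does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyL4:
    """Toy second-level interconnect that remembers the faulting device."""
    name: str
    last_faulting_device: Optional[str] = None

    def report_fault(self, device: str) -> str:
        # Keep the detail locally; only our own name travels up to L3.
        self.last_faulting_device = device
        return self.name


@dataclass
class ToyL3:
    """Toy top-level interconnect that only learns the target's name."""
    last_target: Optional[str] = None

    def receive(self, target: str) -> None:
        self.last_target = target


def describe(l3: ToyL3, l4: Optional[ToyL4] = None) -> str:
    """Build an error string; the device appears only if L4 state is decoded."""
    msg = f"MASTER MPU TARGET {l3.last_target}"
    if l4 is not None and l4.name == l3.last_target and l4.last_faulting_device:
        msg += f" (device {l4.last_faulting_device})"
    return msg
```

With only the L3 state, describe() can say no more than "TARGET L4PER2", which matches how cryptic the warning in this thread is; passing the L4 state as well is the toy equivalent of decoding l4per2's own registers.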