* [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
@ 2011-05-09 14:45 sricharan
2011-07-09 22:25 ` Paul Walmsley
0 siblings, 1 reply; 6+ messages in thread
From: sricharan @ 2011-05-09 14:45 UTC (permalink / raw)
To: linux-omap; +Cc: sricharan, Paul Walmsley, Santosh Shilimkar
Paul Walmsley reported a kernel hang issue with beagle board during
boot. This is an intermittent bug and the execution was found to be
stuck at the l3 interrupt handler.
This was due to a dss initiator agent timeout occurring during
the boot even when there is no actual interconnect access made by the
dss. Since the reason for the dss timeout is not root caused yet,
the timeout feature is disabled at the interconnect level.
Note that this is a temporary fix that should be removed once
the dss interconnect agent timeout issue is resolved.
Thanks to Paul Walmsley for reporting and helping in reproducing
this issue.
Signed-off-by: sricharan <r.sricharan@ti.com>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
arch/arm/mach-omap2/omap_l3_smx.c | 11 +++++++++++
arch/arm/mach-omap2/omap_l3_smx.h | 2 ++
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mach-omap2/omap_l3_smx.c b/arch/arm/mach-omap2/omap_l3_smx.c
index 4321e79..4ea7dcd 100644
--- a/arch/arm/mach-omap2/omap_l3_smx.c
+++ b/arch/arm/mach-omap2/omap_l3_smx.c
@@ -248,6 +248,17 @@ static int __init omap3_l3_probe(struct platform_device *pdev)
 		goto err2;
 	}
 
+	/*
+	 * FIX ME: dss interconnect timeout error.
+	 * Disable the l3 timeout reporting feature for all modules.
+	 * Also reset the dss initiator agent with which the error is seen
+	 * to clear the interrupt. This is a temporary fix and should be
+	 * removed after root causing the issue.
+	 */
+	omap3_l3_writell(l3->rt, L3_RT_NETWORK_CONTROL, 0x0);
+	omap3_l3_writell(l3->rt + L3_DSS_IA_CONTROL, L3_AGENT_CONTROL, 0x1);
+	omap3_l3_writell(l3->rt + L3_DSS_IA_CONTROL, L3_AGENT_CONTROL, 0x0);
+
 	l3->debug_irq = platform_get_irq(pdev, 0);
 	ret = request_irq(l3->debug_irq, omap3_l3_app_irq,
 		IRQF_DISABLED | IRQF_TRIGGER_RISING,
diff --git a/arch/arm/mach-omap2/omap_l3_smx.h b/arch/arm/mach-omap2/omap_l3_smx.h
index ba2ed9a..96fff9d 100644
--- a/arch/arm/mach-omap2/omap_l3_smx.h
+++ b/arch/arm/mach-omap2/omap_l3_smx.h
@@ -35,6 +35,8 @@
#define L3_ERROR_LOG_SECONDARY (1 << 30)
#define L3_ERROR_LOG_ADDR 0x060
+#define L3_RT_NETWORK_CONTROL 0x078
+#define L3_DSS_IA_CONTROL 0x5400
/* Register definitions for Sideband Interconnect */
#define L3_SI_CONTROL 0x020
--
1.7.0.4
* Re: [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
2011-05-09 14:45 [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board sricharan
@ 2011-07-09 22:25 ` Paul Walmsley
2011-07-10 0:22 ` Santosh Shilimkar
0 siblings, 1 reply; 6+ messages in thread
From: Paul Walmsley @ 2011-07-09 22:25 UTC (permalink / raw)
To: sricharan, Santosh Shilimkar; +Cc: linux-omap
Hi
On Mon, 9 May 2011, sricharan wrote:
> Paul Walmsley reported a kernel hang issue with beagle board during
> boot. This is an intermittent bug and the execution was found to be
> stuck at the l3 interrupt handler.
>
> This was due to a dss initiator agent timeout occurring during
> the boot even when there is no actual interconnect access made by the
> dss. Since the reason for the dss timeout is not root caused yet,
> the timeout feature is disabled at the interconnect level.
> Note that this is a temporary fix that should be removed once
> the dss interconnect agent timeout issue is resolved.
So it's been two months since this bug was reported. Any progress on
root-causing it?
I don't see how I can upstream this temporary patch with a straight face.
First, it tries to unconditionally reset the L3 DSS interconnect agent,
even if there's no problem on the L3 DSS IA that requires a reset. It
should only try to reset an IA if it's in a bad state.
Second, are you sure that reset sequence is correct? Writing a 1 and then
a 0 to that reset bit, without any barrier or delay in between? Could you
please confirm that this is a correct reset sequence with the L3 IA
designers and cc me on the E-mails, or send me an extract from the
relevant documentation?
Third, the patch disables L3 timeout reporting. This effectively reacts
to an error by pretending that the error did not exist. This isn't right.
If there's an L3 timeout, it needs to be reported, if at all possible. It
should never happen and it indicates something is wrong with the software
or the hardware.
- Paul
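
The first and second review points above (reset an initiator agent only when it actually reports a timeout, report the error rather than hide it, and order the two reset writes) can be sketched as follows. This is a minimal user-space simulation in plain C, not driver code: the register layout, the `SIM_IA_*` names and bit positions, and the `sim_ia_recover()` helper are all hypothetical and do not come from the patch or from OMAP3 documentation. The fence plus read-back only stands in for whatever barrier or delay the hardware really requires, which the thread leaves unconfirmed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical word indices and bit masks, for illustration only. */
#define SIM_IA_CONTROL	0		/* agent control register */
#define SIM_IA_STATUS	1		/* agent status register */
#define SIM_IA_RESET	(1u << 0)	/* assumed reset bit */
#define SIM_IA_TIMEOUT	(1u << 8)	/* assumed timeout flag */

/*
 * Reset the simulated agent only if it reports a timeout.
 * Returns true if a reset was issued, false if the agent was healthy.
 */
static bool sim_ia_recover(volatile uint32_t *ia)
{
	assert(ia != NULL);

	/* Healthy agent: leave it alone. */
	if (!(ia[SIM_IA_STATUS] & SIM_IA_TIMEOUT))
		return false;

	/* Report the error instead of masking it. */
	fprintf(stderr, "IA timeout reported, status=0x%08x\n",
		(unsigned)ia[SIM_IA_STATUS]);

	/* Assert reset, force ordering, read back, then deassert. */
	ia[SIM_IA_CONTROL] = SIM_IA_RESET;
	atomic_thread_fence(memory_order_seq_cst);
	(void)ia[SIM_IA_CONTROL];
	ia[SIM_IA_CONTROL] = 0;
	return true;
}
```

In this sketch a healthy agent is never touched, so a spurious reset cannot disturb it, and any timeout that is present is logged before it is cleared.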
* Re: [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
2011-07-09 22:25 ` Paul Walmsley
@ 2011-07-10 0:22 ` Santosh Shilimkar
2011-07-10 0:30 ` Paul Walmsley
0 siblings, 1 reply; 6+ messages in thread
From: Santosh Shilimkar @ 2011-07-10 0:22 UTC (permalink / raw)
To: Paul Walmsley; +Cc: sricharan, linux-omap, Valkeinen, Tomi
+ Tomi,
On 7/9/2011 3:25 PM, Paul Walmsley wrote:
> Hi
>
> On Mon, 9 May 2011, sricharan wrote:
>
>> Paul Walmsley reported a kernel hang issue with beagle board
>> during boot. This is an intermittent bug and the execution was
>> found to be stuck at the l3 interrupt handler.
>>
>> This was due to a dss initiator agent timeout occurring during the
>> boot even when there is no actual interconnect access made by the
>> dss. Since the reason for the dss timeout is not root caused yet,
>> the timeout feature is disabled at the interconnect level. Note
>> that this is a temporary fix that should be removed once the dss
>> interconnect agent timeout issue is resolved.
>
> So it's been two months since this bug was reported. Any progress
> on root-causing it?
>
> I don't see how I can upstream this temporary patch with a straight
> face.
>
Sorry for not closing the loop on this thread, but I thought Tomi had
root-caused the DSS timeout issue to an incorrect reset sequence of the
DSS IP. With that fixed, I thought we shouldn't see that issue.
> First, it tries to unconditionally reset the L3 DSS interconnect
> agent, even if there's no problem on the L3 DSS IA that requires a
> reset. It should only try to reset an IA if it's in a bad state.
>
This was to clear the error in case the issue had already happened
during the boot-loader DSS reset sequence. But I agree with your comments.
> Second, are you sure that reset sequence is correct? Writing a 1 and
> then a 0 to that reset bit, without any barrier or delay in between?
> Could you please confirm that this is a correct reset sequence with
> the L3 IA designers and cc me on the E-mails, or send me an extract
> from the relevant documentation?
>
> Third, the patch disables L3 timeout reporting. This effectively
> reacts to an error by pretending that the error did not exist. This
> isn't right. If there's an L3 timeout, it needs to be reported, if at
> all possible. It should never happen and it indicates something is
> wrong with the software or the hardware.
>
Will come back to you on above queries.
Regards
Santosh
* Re: [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
2011-07-10 0:22 ` Santosh Shilimkar
@ 2011-07-10 0:30 ` Paul Walmsley
2011-08-01 6:01 ` Tomi Valkeinen
0 siblings, 1 reply; 6+ messages in thread
From: Paul Walmsley @ 2011-07-10 0:30 UTC (permalink / raw)
To: Santosh Shilimkar; +Cc: sricharan, linux-omap, Valkeinen, Tomi
Hi Santosh,
On Sat, 9 Jul 2011, Santosh Shilimkar wrote:
> Sorry for not closing the loop on this thread, but I thought Tomi had
> root-caused the DSS timeout issue to an incorrect reset sequence of the
> DSS IP. With that fixed, I thought we shouldn't see that issue.
OK great, happy to hear that it was tracked down!
Tomi, do you have patches to fix the reset bug?
> > First, it tries to unconditionally reset the L3 DSS interconnect
> > agent, even if there's no problem on the L3 DSS IA that requires a
> > reset. It should only try to reset an IA if it's in a bad state.
> >
> This was to clear the error in case the issue had already happened
> during the boot-loader DSS reset sequence. But I agree with your comments.
That's a good idea, but the patch should only do that if the L3 DSS IA is
reporting a timeout error.
> > Second, are you sure that reset sequence is correct? Writing a 1 and
> > then a 0 to that reset bit, without any barrier or delay in between?
> > Could you please confirm that this is a correct reset sequence with
> > the L3 IA designers and cc me on the E-mails, or send me an extract
> > from the relevant documentation?
> >
> > Third, the patch disables L3 timeout reporting. This effectively
> > reacts to an error by pretending that the error did not exist. This
> > isn't right. If there's an L3 timeout, it needs to be reported, if at
> > all possible. It should never happen and it indicates something is
> > wrong with the software or the hardware.
> >
> Will come back to you on above queries.
regards,
- Paul
* Re: [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
2011-07-10 0:30 ` Paul Walmsley
@ 2011-08-01 6:01 ` Tomi Valkeinen
2011-08-01 6:13 ` Santosh Shilimkar
0 siblings, 1 reply; 6+ messages in thread
From: Tomi Valkeinen @ 2011-08-01 6:01 UTC (permalink / raw)
To: Paul Walmsley; +Cc: Santosh Shilimkar, sricharan, linux-omap
On Sat, 2011-07-09 at 18:30 -0600, Paul Walmsley wrote:
> Hi Santosh,
>
> On Sat, 9 Jul 2011, Santosh Shilimkar wrote:
>
> > Sorry for not closing the loop on this thread, but I thought Tomi had
> > root-caused the DSS timeout issue to an incorrect reset sequence of the
> > DSS IP. With that fixed, I thought we shouldn't see that issue.
>
> OK great, happy to hear that it was tracked down!
>
> Tomi, do you have patches to fix the reset bug?
I have to say I'm not sure what this is about... I haven't seen any
hangs.
There were (or perhaps still are) problems with the hwmod code resetting
DSS. This was because the hwmod code didn't enable all the DSS clocks
before doing the reset. However, this shouldn't cause any problems in
the current mainline kernel, as the DSS driver there does a reset
itself. This will change when the DSS starts using pmruntime.
Tomi
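
The clock dependency described above can be illustrated with a small user-space simulation in C. This is a sketch under stated assumptions only: `struct sim_module`, the clock count, the poll limit, and both helper functions are hypothetical and model neither the hwmod code nor the DSS driver. The only behavior taken from the thread is that a reset issued before all of the module's clocks are enabled does not complete.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NUM_CLOCKS		3	/* hypothetical number of module clocks */
#define SIM_RESET_POLL_MAX	100	/* hypothetical poll limit */

struct sim_module {
	bool clk_enabled[SIM_NUM_CLOCKS];
	bool reset_done;
};

/* Simulated hardware: the soft reset completes only if every clock runs. */
static void sim_trigger_softreset(struct sim_module *m)
{
	m->reset_done = true;
	for (size_t i = 0; i < SIM_NUM_CLOCKS; i++)
		if (!m->clk_enabled[i])
			m->reset_done = false;
}

/*
 * Enable all clocks first, then reset and poll for completion.
 * Returns 0 on success, -1 if the reset never completes.
 */
static int sim_module_reset(struct sim_module *m)
{
	for (size_t i = 0; i < SIM_NUM_CLOCKS; i++)
		m->clk_enabled[i] = true;

	m->reset_done = false;
	sim_trigger_softreset(m);

	/* Bounded poll so a stuck reset is reported instead of hanging. */
	for (int tries = 0; tries < SIM_RESET_POLL_MAX; tries++)
		if (m->reset_done)
			return 0;
	return -1;
}
```

Calling `sim_trigger_softreset()` directly on a module with any clock disabled leaves `reset_done` false, which is the simulated equivalent of the incomplete reset discussed in the thread.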
* Re: [PATCH] omap3: l3: Temporary fix to avoid the kernel hang with beagle board.
2011-08-01 6:01 ` Tomi Valkeinen
@ 2011-08-01 6:13 ` Santosh Shilimkar
0 siblings, 0 replies; 6+ messages in thread
From: Santosh Shilimkar @ 2011-08-01 6:13 UTC (permalink / raw)
To: Tomi Valkeinen; +Cc: Paul Walmsley, sricharan, linux-omap, Archit Taneja
Tomi,
On 8/1/2011 11:31 AM, Tomi Valkeinen wrote:
> On Sat, 2011-07-09 at 18:30 -0600, Paul Walmsley wrote:
>> Hi Santosh,
>>
>> On Sat, 9 Jul 2011, Santosh Shilimkar wrote:
>>
>>> Sorry for not closing the loop on this thread, but I thought Tomi had
>>> root-caused the DSS timeout issue to an incorrect reset sequence of the
>>> DSS IP. With that fixed, I thought we shouldn't see that issue.
>>
>> OK great, happy to hear that it was tracked down!
>>
>> Tomi, do you have patches to fix the reset bug?
>
> I have to say I'm not sure what this is about... I haven't seen any
> hangs.
>
> There were (or perhaps still are) problems with the hwmod code resetting
> DSS. This was because the hwmod code didn't enable all the DSS clocks
> before doing the reset. However, this shouldn't cause any problems in
> the current mainline kernel, as the DSS driver there does a reset
> itself. This will change when the DSS starts using pmruntime.
>
During your vacation, Archit and Sricharan looked into the issue further.
The issue is indeed related to the DSS reset. Archit has posted the patch
for internal review. Please have a look at it.
Regards
Santosh