* am335x: cpsw: interrupt failure
@ 2014-12-04 16:41 Yegor Yefremov
2014-12-04 16:56 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-04 16:41 UTC (permalink / raw)
To: netdev; +Cc: N, Mugunthan V, Felipe Balbi
I have the following problem: my system reboots under high network
load after this commit (found via git bisect):
commit 55601c9f24670ba926ebdd4d712ac3b177232330
Author: Felipe Balbi <balbi@ti.com>
Date: Mon Sep 8 17:54:58 2014 -0700
arm: omap: intc: switch over to linear irq domain
now that we don't need to support legacy board-files,
we can completely switch over to a linear irq domain
and make use of irq_alloc_domain_generic_chips() to
allocate all generic irq chips for us.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
and I get the following error messages:
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
->action(): (null)
IRQ_NOPROBE set
IRQ_NOREQUEST set
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
->action(): (null)
IRQ_NOPROBE set
IRQ_NOREQUEST set
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
->action(): (null)
IRQ_NOPROBE set
IRQ_NOREQUEST set
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
->action(): (null)
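For anyone comparing captures from several boards, here is a minimal
sketch (not part of the original report) that counts how often each
IRQ number shows up in reports of this shape. The function name
count_bad_irqs and the SAMPLE text are made up for illustration; the
regular expression assumes the first line of every report looks exactly
like the "irq 0, desc: ..." lines above.

```python
import re
from collections import Counter

# Matches the first line of a bad-IRQ report as it appears in the log above,
# e.g. "irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0".
BAD_IRQ_RE = re.compile(
    r"irq (?P<irq>\d+), desc: (?P<desc>[0-9a-f]+), depth: (?P<depth>\d+), "
    r"count: (?P<count>\d+), unhandled: (?P<unhandled>\d+)"
)


def count_bad_irqs(log_text):
    """Return a Counter mapping IRQ number -> number of reports seen."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BAD_IRQ_RE.search(line)
        if match:
            counts[int(match.group("irq"))] += 1
    return counts


# Hypothetical capture, copied from the shape of the messages above.
SAMPLE = """\
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
->action(): (null)
IRQ_NOPROBE set
IRQ_NOREQUEST set
irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
"""

if __name__ == "__main__":
    for irq, n in sorted(count_bad_irqs(SAMPLE).items()):
        print(f"irq {irq}: {n} report(s)")
```

Feeding it a saved serial console log instead of SAMPLE would show
whether irq 0 is the only number ever reported.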
My system: an am335x with Fast Ethernet on the first CPSW slave and
Gigabit Ethernet on the second. The issue occurs when I run nuttcp
with default settings.
With the commit above I can at least see these messages, but 3.18-rc7,
for example, reboots without any messages.
Any ideas?
Yegor
* Re: am335x: cpsw: interrupt failure
2014-12-04 16:41 am335x: cpsw: interrupt failure Yegor Yefremov
@ 2014-12-04 16:56 ` Felipe Balbi
2014-12-05 10:03 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-04 16:56 UTC (permalink / raw)
To: Yegor Yefremov; +Cc: netdev, N, Mugunthan V, Felipe Balbi
[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]
Hi,
On Thu, Dec 04, 2014 at 05:41:38PM +0100, Yegor Yefremov wrote:
> I have following problem. My systems reboots at high network load
> after this commit (found via git bissect):
>
> commit 55601c9f24670ba926ebdd4d712ac3b177232330
> Author: Felipe Balbi <balbi@ti.com>
> Date: Mon Sep 8 17:54:58 2014 -0700
>
> arm: omap: intc: switch over to linear irq domain
>
> now that we don't need to support legacy board-files,
> we can completely switch over to a linear irq domain
> and make use of irq_alloc_domain_generic_chips() to
> allocate all generic irq chips for us.
>
> Signed-off-by: Felipe Balbi <balbi@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
>
> and I get following error messages:
>
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
irq 0 ? Weird, that's not a valid IRQ.
> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action(): (null)
> IRQ_NOPROBE set
> IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action(): (null)
> IRQ_NOPROBE set
> IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action(): (null)
> IRQ_NOPROBE set
> IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action(): (null)
>
> My system: am335x with fast ethernet on the first slave and gigabit
> Ethernet on second CPSW slave. This issue occurs, when I ran nuttcp
> with default settings.
>
> With commit above I can at least see these messages, but 3.18-rc7 for
> example reboots without any messages.
>
> Any idea?
if you take v3.18-rc7 and just revert that commit, does the problem go
away ?
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-04 16:56 ` Felipe Balbi
@ 2014-12-05 10:03 ` Yegor Yefremov
2014-12-10 17:17 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-05 10:03 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
[-- Attachment #1: Type: text/plain, Size: 2645 bytes --]
On Thu, Dec 4, 2014 at 5:56 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Thu, Dec 04, 2014 at 05:41:38PM +0100, Yegor Yefremov wrote:
>> I have following problem. My systems reboots at high network load
>> after this commit (found via git bissect):
>>
>> commit 55601c9f24670ba926ebdd4d712ac3b177232330
>> Author: Felipe Balbi <balbi@ti.com>
>> Date: Mon Sep 8 17:54:58 2014 -0700
>>
>> arm: omap: intc: switch over to linear irq domain
>>
>> now that we don't need to support legacy board-files,
>> we can completely switch over to a linear irq domain
>> and make use of irq_alloc_domain_generic_chips() to
>> allocate all generic irq chips for us.
>>
>> Signed-off-by: Felipe Balbi <balbi@ti.com>
>> Signed-off-by: Tony Lindgren <tony@atomide.com>
>>
>> and I get following error messages:
>>
>> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>
> irq 0 ? Weird, that's not a valid IRQ.
>
>> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> ->action(): (null)
>> IRQ_NOPROBE set
>> IRQ_NOREQUEST set
>> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> ->action(): (null)
>> IRQ_NOPROBE set
>> IRQ_NOREQUEST set
>> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> ->action(): (null)
>> IRQ_NOPROBE set
>> IRQ_NOREQUEST set
>> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> ->action(): (null)
>>
>> My system: am335x with fast ethernet on the first slave and gigabit
>> Ethernet on second CPSW slave. This issue occurs, when I ran nuttcp
>> with default settings.
>>
>> With commit above I can at least see these messages, but 3.18-rc7 for
>> example reboots without any messages.
>>
>> Any idea?
>
> if you take v3.18-rc7 and just revert that commit, does the problem go
> away ?
git revert failed, either because the driver has changed further in
the meantime or because I'm missing some parameters. I've tried to
force the driver to use the legacy routines, but then I don't get past
U-Boot's "Starting kernel ...". See the attached patch.
Compiler used:
Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
(prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
10:59:38 CET 2014
Btw, what am335x-based hardware do you have? I can run tests on both
the BBB and the am335x-evmsk.
Yegor
[-- Attachment #2: 0001-Irq-patch-revert.patch --]
[-- Type: application/octet-stream, Size: 2338 bytes --]
From 889a0588e884083399e644909ec39e834f37c4d3 Mon Sep 17 00:00:00 2001
From: Yegor Yefremov <yegorslists@googlemail.com>
Date: Fri, 5 Dec 2014 10:54:37 +0100
Subject: [PATCH] Irq patch revert
---
drivers/irqchip/irq-omap-intc.c | 21 +++++++++++++++------
1 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
index 28718d3..a60a0ac 100644
--- a/drivers/irqchip/irq-omap-intc.c
+++ b/drivers/irqchip/irq-omap-intc.c
@@ -186,6 +186,7 @@ void omap3_intc_suspend(void)
omap_ack_irq(NULL);
}
+#if 0
static int __init omap_alloc_gc_of(struct irq_domain *d, void __iomem *base)
{
int ret;
@@ -222,7 +223,7 @@ static int __init omap_alloc_gc_of(struct irq_domain *d, void __iomem *base)
return 0;
}
-
+#endif
static void __init omap_alloc_gc_legacy(void __iomem *base,
unsigned int irq_start, unsigned int num)
{
@@ -243,6 +244,7 @@ static void __init omap_alloc_gc_legacy(void __iomem *base,
IRQ_NOREQUEST | IRQ_NOPROBE, 0);
}
+#if 0
static int __init omap_init_irq_of(struct device_node *node)
{
int ret;
@@ -262,6 +264,7 @@ static int __init omap_init_irq_of(struct device_node *node)
return ret;
}
+#endif
static int __init omap_init_irq_legacy(u32 base)
{
@@ -301,13 +304,13 @@ static int __init omap_init_irq(u32 base, struct device_node *node)
{
int ret;
- if (node)
+ /*if (node)
ret = omap_init_irq_of(node);
- else
+ else*/
ret = omap_init_irq_legacy(base);
- if (ret == 0)
- omap_irq_enable_protection();
+ /*if (ret == 0)
+ omap_irq_enable_protection();*/
return ret;
}
@@ -376,6 +379,7 @@ static int __init intc_of_init(struct device_node *node,
struct device_node *parent)
{
int ret;
+ struct resource res;
omap_nr_pending = 3;
omap_nr_irqs = 96;
@@ -383,12 +387,17 @@ static int __init intc_of_init(struct device_node *node,
if (WARN_ON(!node))
return -ENODEV;
+ if (of_address_to_resource(node, 0, &res)) {
+ WARN(1, "unable to get intc registers\n");
+ return -EINVAL;
+ }
+
if (of_device_is_compatible(node, "ti,am33xx-intc")) {
omap_nr_irqs = 128;
omap_nr_pending = 4;
}
- ret = omap_init_irq(-1, of_node_get(node));
+ ret = omap_init_irq(res.start, of_node_get(node));
if (ret < 0)
return ret;
--
1.7.7
* Re: am335x: cpsw: interrupt failure
2014-12-05 10:03 ` Yegor Yefremov
@ 2014-12-10 17:17 ` Felipe Balbi
2014-12-10 20:58 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-10 17:17 UTC (permalink / raw)
To: Yegor Yefremov; +Cc: Felipe Balbi, netdev, N, Mugunthan V
[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]
Hi,
On Fri, Dec 05, 2014 at 11:03:44AM +0100, Yegor Yefremov wrote:
> On Thu, Dec 4, 2014 at 5:56 PM, Felipe Balbi <balbi@ti.com> wrote:
> > Hi,
> >
> > On Thu, Dec 04, 2014 at 05:41:38PM +0100, Yegor Yefremov wrote:
> >> I have following problem. My systems reboots at high network load
> >> after this commit (found via git bissect):
> >>
> >> commit 55601c9f24670ba926ebdd4d712ac3b177232330
> >> Author: Felipe Balbi <balbi@ti.com>
> >> Date: Mon Sep 8 17:54:58 2014 -0700
> >>
> >> arm: omap: intc: switch over to linear irq domain
> >>
> >> now that we don't need to support legacy board-files,
> >> we can completely switch over to a linear irq domain
> >> and make use of irq_alloc_domain_generic_chips() to
> >> allocate all generic irq chips for us.
> >>
> >> Signed-off-by: Felipe Balbi <balbi@ti.com>
> >> Signed-off-by: Tony Lindgren <tony@atomide.com>
> >>
> >> and I get following error messages:
> >>
> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> >
> > irq 0 ? Weird, that's not a valid IRQ.
> >
> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> >> ->action(): (null)
> >> IRQ_NOPROBE set
> >> IRQ_NOREQUEST set
> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> >> ->action(): (null)
> >> IRQ_NOPROBE set
> >> IRQ_NOREQUEST set
> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> >> ->action(): (null)
> >> IRQ_NOPROBE set
> >> IRQ_NOREQUEST set
> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> >> ->action(): (null)
> >>
> >> My system: am335x with fast ethernet on the first slave and gigabit
> >> Ethernet on second CPSW slave. This issue occurs, when I ran nuttcp
> >> with default settings.
> >>
> >> With commit above I can at least see these messages, but 3.18-rc7 for
> >> example reboots without any messages.
> >>
> >> Any idea?
> >
> > if you take v3.18-rc7 and just revert that commit, does the problem go
> > away ?
>
> git revert failed as the driver has more changes meanwhile or I'm
> missing some params. I've tried to force the driver to use legacy
> routines, but then I don't get pass U-Boot's "Starting kernel ..." See
> attached patch.
>
> Compiler used:
>
> Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
> (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
> 10:59:38 CET 2014
>
> Btw, what am335x based hardware do you have? I can run tests on both
> BBB and am335x-evmsk.
coming back to this. I have BBB only. Can you provide some extra
information on how I can trigger this problem here ?
cheers
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-10 17:17 ` Felipe Balbi
@ 2014-12-10 20:58 ` Yegor Yefremov
2014-12-10 21:02 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-10 20:58 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
On Wed, Dec 10, 2014 at 6:17 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Fri, Dec 05, 2014 at 11:03:44AM +0100, Yegor Yefremov wrote:
>> On Thu, Dec 4, 2014 at 5:56 PM, Felipe Balbi <balbi@ti.com> wrote:
>> > Hi,
>> >
>> > On Thu, Dec 04, 2014 at 05:41:38PM +0100, Yegor Yefremov wrote:
>> >> I have following problem. My systems reboots at high network load
>> >> after this commit (found via git bissect):
>> >>
>> >> commit 55601c9f24670ba926ebdd4d712ac3b177232330
>> >> Author: Felipe Balbi <balbi@ti.com>
>> >> Date: Mon Sep 8 17:54:58 2014 -0700
>> >>
>> >> arm: omap: intc: switch over to linear irq domain
>> >>
>> >> now that we don't need to support legacy board-files,
>> >> we can completely switch over to a linear irq domain
>> >> and make use of irq_alloc_domain_generic_chips() to
>> >> allocate all generic irq chips for us.
>> >>
>> >> Signed-off-by: Felipe Balbi <balbi@ti.com>
>> >> Signed-off-by: Tony Lindgren <tony@atomide.com>
>> >>
>> >> and I get following error messages:
>> >>
>> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> >
>> > irq 0 ? Weird, that's not a valid IRQ.
>> >
>> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> >> ->action(): (null)
>> >> IRQ_NOPROBE set
>> >> IRQ_NOREQUEST set
>> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> >> ->action(): (null)
>> >> IRQ_NOPROBE set
>> >> IRQ_NOREQUEST set
>> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> >> ->action(): (null)
>> >> IRQ_NOPROBE set
>> >> IRQ_NOREQUEST set
>> >> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
>> >> ->handle_irq(): c0087fc0, handle_bad_irq+0x0/0x258
>> >> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
>> >> ->action(): (null)
>> >>
>> >> My system: am335x with fast ethernet on the first slave and gigabit
>> >> Ethernet on second CPSW slave. This issue occurs, when I ran nuttcp
>> >> with default settings.
>> >>
>> >> With commit above I can at least see these messages, but 3.18-rc7 for
>> >> example reboots without any messages.
>> >>
>> >> Any idea?
>> >
>> > if you take v3.18-rc7 and just revert that commit, does the problem go
>> > away ?
>>
>> git revert failed as the driver has more changes meanwhile or I'm
>> missing some params. I've tried to force the driver to use legacy
>> routines, but then I don't get pass U-Boot's "Starting kernel ..." See
>> attached patch.
>>
>> Compiler used:
>>
>> Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
>> (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
>> 10:59:38 CET 2014
>>
>> Btw, what am335x based hardware do you have? I can run tests on both
>> BBB and am335x-evmsk.
>
> coming back to this. I have BBB only. Can you provide some extra
> information on how I can trigger this problem here ?
I basically have two am335x-based boards on which I can trigger this
problem via nuttcp (I think iperf would do the job too). The first
system stalls almost immediately; the second one ran for about
7 minutes. I have tried the same kernel on the am335x-evmsk, and that
system didn't stall. I could provide dts files for both systems.
I've tried to reduce my dts as much as I could to match the
am335x-evmsk dts; I have even removed the entries for the PMIC, but
the system still stalls. Btw, the PMIC's INT line is connected to a
GPIO pin on the processor.
I've used omap2plus_defconfig for all 3 devices. Any other info I can supply?
Yegor
* Re: am335x: cpsw: interrupt failure
2014-12-10 20:58 ` Yegor Yefremov
@ 2014-12-10 21:02 ` Felipe Balbi
2014-12-10 22:56 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-10 21:02 UTC (permalink / raw)
To: Yegor Yefremov; +Cc: Felipe Balbi, netdev, N, Mugunthan V
[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]
Hi,
On Wed, Dec 10, 2014 at 09:58:23PM +0100, Yegor Yefremov wrote:
[snip]
> >> git revert failed as the driver has more changes meanwhile or I'm
> >> missing some params. I've tried to force the driver to use legacy
> >> routines, but then I don't get pass U-Boot's "Starting kernel ..." See
> >> attached patch.
> >>
> >> Compiler used:
> >>
> >> Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
> >> (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
> >> 10:59:38 CET 2014
> >>
> >> Btw, what am335x based hardware do you have? I can run tests on both
> >> BBB and am335x-evmsk.
> >
> > coming back to this. I have BBB only. Can you provide some extra
> > information on how I can trigger this problem here ?
>
> I have basically two am335x based boards, where I can trigger this
too bad, I have a single am335x board available here and that's my BBB.
Do both boards stall or only the server or only the client ?
If only one of them fails, then I'll try connecting my BBB to my
AM437x SK and see if that dies too.
> problem via nuttcp (I think iperf would do the job too). The first
> system stalls almost immediately, the second one was working for about
> 7 minutes. I have tried the same kernel on am335x-evmsk - and this
> system didn't stall. I could provide dts files for both systems.
>
> I've tried to reduce my dts as much as I could to match am335x-evmsk
> dts, I have even removed entries for the PMIC, but still the system
> stalls. Btw PMIC's INT line is connected to a GPIO pin on processor.
>
> I've used omap2plus_defconfig for all 3 devices. Any other info I can
> supply?
just the extra bit of info above.
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-10 21:02 ` Felipe Balbi
@ 2014-12-10 22:56 ` Yegor Yefremov
2014-12-12 12:00 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-10 22:56 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
On Wed, Dec 10, 2014 at 10:02 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Wed, Dec 10, 2014 at 09:58:23PM +0100, Yegor Yefremov wrote:
>
> [snip]
>
>> >> git revert failed as the driver has more changes meanwhile or I'm
>> >> missing some params. I've tried to force the driver to use legacy
>> >> routines, but then I don't get pass U-Boot's "Starting kernel ..." See
>> >> attached patch.
>> >>
>> >> Compiler used:
>> >>
>> >> Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
>> >> (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
>> >> 10:59:38 CET 2014
>> >>
>> >> Btw, what am335x based hardware do you have? I can run tests on both
>> >> BBB and am335x-evmsk.
>> >
>> > coming back to this. I have BBB only. Can you provide some extra
>> > information on how I can trigger this problem here ?
>>
>> I have basically two am335x based boards, where I can trigger this
>
> too bad, I have a single am335x board available here and that's my BBB.
>
> Do both boards stall or only the server or only the client ?
>
> If only one of them fail, then I try connecting my BBB to my AM437x SK
> and see if that'll die too.
I have the following setup: all three devices (including the
am335x-evmsk) are connected to my development host via a Gigabit
switch. On all of them I run "nuttcp -S", and my dev host is the
client (nuttcp -t). It's just the normal setup you'd use to measure
am335x network performance, so the am335x devices communicate only
with my host and not with each other.
If I use 3.18, I can only observe that the system freezes, but when I
use the kernel with the above-mentioned commit, I see interrupt errors
like those I described in my first e-mail.
>> problem via nuttcp (I think iperf would do the job too). The first
>> system stalls almost immediately, the second one was working for about
>> 7 minutes. I have tried the same kernel on am335x-evmsk - and this
>> system didn't stall. I could provide dts files for both systems.
>>
>> I've tried to reduce my dts as much as I could to match am335x-evmsk
>> dts, I have even removed entries for the PMIC, but still the system
>> stalls. Btw PMIC's INT line is connected to a GPIO pin on processor.
>>
>> I've used omap2plus_defconfig for all 3 devices. Any other info I can
>> supply?
>
> just the extra bit of info above.
>
> --
> balbi
* Re: am335x: cpsw: interrupt failure
2014-12-10 22:56 ` Yegor Yefremov
@ 2014-12-12 12:00 ` Yegor Yefremov
2014-12-12 17:32 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-12 12:00 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
On Wed, Dec 10, 2014 at 11:56 PM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Wed, Dec 10, 2014 at 10:02 PM, Felipe Balbi <balbi@ti.com> wrote:
>> Hi,
>>
>> On Wed, Dec 10, 2014 at 09:58:23PM +0100, Yegor Yefremov wrote:
>>
>> [snip]
>>
>>> >> git revert failed as the driver has more changes meanwhile or I'm
>>> >> missing some params. I've tried to force the driver to use legacy
>>> >> routines, but then I don't get pass U-Boot's "Starting kernel ..." See
>>> >> attached patch.
>>> >>
>>> >> Compiler used:
>>> >>
>>> >> Linux version 3.18.0-rc7 (...) (gcc version 4.8.3 20140320
>>> >> (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #309 SMP Fri Dec 5
>>> >> 10:59:38 CET 2014
>>> >>
>>> >> Btw, what am335x based hardware do you have? I can run tests on both
>>> >> BBB and am335x-evmsk.
>>> >
>>> > coming back to this. I have BBB only. Can you provide some extra
>>> > information on how I can trigger this problem here ?
>>>
>>> I have basically two am335x based boards, where I can trigger this
>>
>> too bad, I have a single am335x board available here and that's my BBB.
>>
>> Do both boards stall or only the server or only the client ?
>>
>> If only one of them fail, then I try connecting my BBB to my AM437x SK
>> and see if that'll die too.
>
> I have following setup: all three devices (am335x-evmsk inclusively)
> are connected to my development host via Gigabit switch. And on all of
> them I run "nuttcp -S" and my dev host is client (nuttcp -t). Just
> normal setup, when you try to measure am335x network performance. So
> am335x devices communicate only with my host and not with each other.
>
> If I use 3.18 I can only observe, that the system freezes, but when I
> use the kernel with above mentioned commit I see interrupt errors,
> like I described in my first e-mail.
>
>>> problem via nuttcp (I think iperf would do the job too). The first
>>> system stalls almost immediately, the second one was working for about
>>> 7 minutes. I have tried the same kernel on am335x-evmsk - and this
>>> system didn't stall. I could provide dts files for both systems.
>>>
>>> I've tried to reduce my dts as much as I could to match am335x-evmsk
>>> dts, I have even removed entries for the PMIC, but still the system
>>> stalls. Btw PMIC's INT line is connected to a GPIO pin on processor.
>>>
>>> I've used omap2plus_defconfig for all 3 devices. Any other info I can
>>> supply?
>>
>> just the extra bit of info above.
I've got the BBB to stall. I could do it from two different hosts: a
Windows 7 64-bit machine with a Xubuntu VM and an older native Linux
server. Command used on both PCs (not simultaneously):
nuttcp -t -N 4 -T30m 192.168.1.235
The BBB stalled after a few minutes.
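To make the reproduction easy to repeat against several boards, the
two invocations mentioned in this thread could be wrapped as below.
This is only a sketch: the helper names are invented, and the only
nuttcp flags used are the ones quoted in the thread (-S on the board,
-t -N <streams> -T<duration> on the host).

```python
import shlex


def nuttcp_server_cmd():
    """Command to run on the am335x board ("nuttcp -S" in the thread)."""
    return ["nuttcp", "-S"]


def nuttcp_client_cmd(target, streams=4, duration="30m"):
    """Command to run on the host; defaults mirror the command above."""
    return ["nuttcp", "-t", "-N", str(streams), f"-T{duration}", target]


if __name__ == "__main__":
    # Print the commands rather than executing them, so the sketch can be
    # checked without nuttcp installed.
    print(shlex.join(nuttcp_server_cmd()))
    print(shlex.join(nuttcp_client_cmd("192.168.1.235")))
```

Passing the returned list to subprocess.run() on the host would start
the actual load; the board address is the one from this message and
will differ on other networks.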
Kernel defconfig:
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_FHANDLE=y
CONFIG_AUDIT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
CONFIG_CGROUPS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
CONFIG_NAMESPACES=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_PARTITION_ADVANCED=y
CONFIG_ARCH_MULTI_V6=y
CONFIG_POWER_AVS_OMAP=y
CONFIG_POWER_AVS_OMAP_CLASS3=y
CONFIG_OMAP_RESET_CLOCKS=y
CONFIG_OMAP_MUX_DEBUG=y
CONFIG_ARCH_OMAP2=y
CONFIG_ARCH_OMAP3=y
CONFIG_ARCH_OMAP4=y
CONFIG_SOC_OMAP5=y
CONFIG_SOC_AM33XX=y
CONFIG_SOC_AM43XX=y
CONFIG_SOC_DRA7XX=y
CONFIG_ARM_THUMBEE=y
CONFIG_ARM_ERRATA_411920=y
CONFIG_ARM_ERRATA_430973=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_CMA=y
CONFIG_SECCOMP=y
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CMDLINE="root=/dev/mmcblk0p2 rootwait console=ttyO2,115200"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
# CONFIG_ARM_OMAP2PLUS_CPUFREQ is not set
CONFIG_CPU_IDLE=y
CONFIG_BINFMT_MISC=y
CONFIG_PM_DEBUG=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_NET_KEY=y
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
# CONFIG_INET_LRO is not set
CONFIG_NETFILTER=y
CONFIG_CAN=m
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_BT=m
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_LL=y
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
CONFIG_CFG80211=m
CONFIG_MAC80211=m
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_DMA_CMA=y
CONFIG_OMAP_OCP2SCP=y
CONFIG_CONNECTOR=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_OOPS=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_MTD_NAND_OMAP2=y
CONFIG_MTD_ONENAND=y
CONFIG_MTD_ONENAND_VERIFY_WRITE=y
CONFIG_MTD_ONENAND_OMAP2=y
CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_SENSORS_TSL2550=m
CONFIG_BMP085_I2C=m
CONFIG_SRAM=y
CONFIG_SENSORS_LIS3_I2C=m
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_MD=y
CONFIG_NETDEVICES=y
CONFIG_KS8851=y
CONFIG_KS8851_MLL=y
CONFIG_SMC91X=y
CONFIG_SMSC911X=y
CONFIG_TI_CPSW=y
CONFIG_AT803X_PHY=y
CONFIG_SMSC_PHY=y
CONFIG_USB_USBNET=y
CONFIG_USB_NET_SMSC95XX=y
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
CONFIG_LIBERTAS=m
CONFIG_LIBERTAS_USB=m
CONFIG_LIBERTAS_SDIO=m
CONFIG_LIBERTAS_DEBUG=y
CONFIG_WL_TI=y
CONFIG_WL12XX=m
CONFIG_WL18XX=m
CONFIG_WLCORE_SPI=m
CONFIG_WLCORE_SDIO=m
CONFIG_MWIFIEX=m
CONFIG_MWIFIEX_SDIO=m
CONFIG_MWIFIEX_USB=m
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
CONFIG_KEYBOARD_GPIO=y
CONFIG_KEYBOARD_MATRIX=m
CONFIG_KEYBOARD_TWL4030=y
CONFIG_INPUT_TOUCHSCREEN=y
CONFIG_TOUCHSCREEN_ADS7846=m
CONFIG_TOUCHSCREEN_TSC2005=m
CONFIG_TOUCHSCREEN_TSC2007=m
CONFIG_INPUT_MISC=y
CONFIG_INPUT_TWL4030_PWRBUTTON=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_OMAP=y
CONFIG_SERIAL_OMAP_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_I2C_CHARDEV=y
CONFIG_SPI=y
CONFIG_SPI_OMAP24XX=y
CONFIG_PINCTRL_SINGLE=y
CONFIG_DEBUG_GPIO=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_TWL4030=y
CONFIG_W1=y
CONFIG_BATTERY_BQ27x00=m
CONFIG_CHARGER_ISP1704=m
CONFIG_CHARGER_TWL4030=m
CONFIG_CHARGER_BQ2415X=m
CONFIG_CHARGER_BQ24190=m
CONFIG_CHARGER_BQ24735=m
CONFIG_POWER_RESET=y
CONFIG_POWER_AVS=y
CONFIG_SENSORS_LM75=m
CONFIG_THERMAL=y
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_USER_SPACE=y
CONFIG_CPU_THERMAL=y
CONFIG_TI_SOC_THERMAL=y
CONFIG_TI_THERMAL=y
CONFIG_OMAP4_THERMAL=y
CONFIG_OMAP5_THERMAL=y
CONFIG_DRA752_THERMAL=y
CONFIG_WATCHDOG=y
CONFIG_OMAP_WATCHDOG=y
CONFIG_TWL4030_WATCHDOG=y
CONFIG_MFD_PALMAS=y
CONFIG_MFD_TPS65217=y
CONFIG_MFD_TPS65218=y
CONFIG_MFD_TPS65910=y
CONFIG_TWL6040_CORE=y
CONFIG_REGULATOR_PALMAS=y
CONFIG_REGULATOR_PBIAS=y
CONFIG_REGULATOR_TI_ABB=y
CONFIG_REGULATOR_TPS65023=y
CONFIG_REGULATOR_TPS6507X=y
CONFIG_REGULATOR_TPS65217=y
CONFIG_REGULATOR_TPS65218=y
CONFIG_REGULATOR_TPS65910=y
CONFIG_REGULATOR_TWL4030=y
CONFIG_FB=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
CONFIG_OMAP2_DSS=m
CONFIG_OMAP5_DSS_HDMI=y
CONFIG_OMAP2_DSS_SDI=y
CONFIG_OMAP2_DSS_DSI=y
CONFIG_FB_OMAP2=m
CONFIG_DISPLAY_ENCODER_TFP410=m
CONFIG_DISPLAY_ENCODER_TPD12S015=m
CONFIG_DISPLAY_CONNECTOR_DVI=m
CONFIG_DISPLAY_CONNECTOR_HDMI=m
CONFIG_DISPLAY_CONNECTOR_ANALOG_TV=m
CONFIG_DISPLAY_PANEL_DPI=m
CONFIG_DISPLAY_PANEL_DSI_CM=m
CONFIG_DISPLAY_PANEL_SONY_ACX565AKM=m
CONFIG_DISPLAY_PANEL_LGPHILIPS_LB035Q02=m
CONFIG_DISPLAY_PANEL_SHARP_LS037V7DW01=m
CONFIG_DISPLAY_PANEL_TPO_TD028TTEC1=m
CONFIG_DISPLAY_PANEL_TPO_TD043MTEA1=m
CONFIG_DISPLAY_PANEL_NEC_NL8048HL11=m
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_PWM=m
CONFIG_BACKLIGHT_PANDORA=m
CONFIG_BACKLIGHT_GPIO=m
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_LOGO=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_SOC=m
CONFIG_SND_OMAP_SOC=m
CONFIG_SND_OMAP_SOC_OMAP_TWL4030=m
CONFIG_SND_OMAP_SOC_OMAP_ABE_TWL6040=m
CONFIG_SND_OMAP_SOC_OMAP3_PANDORA=m
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_MON=y
CONFIG_USB_ACM=y
CONFIG_USB_WDM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_MUSB_HDRC=y
CONFIG_USB_MUSB_DSPS=y
CONFIG_USB_TI_CPPI41_DMA=y
CONFIG_USB_DWC3=m
CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_FTDI_SIO=y
CONFIG_USB_TEST=y
CONFIG_AM335X_PHY_USB=y
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DEBUG=y
CONFIG_USB_GADGET_DEBUG_FILES=y
CONFIG_USB_GADGET_DEBUG_FS=y
CONFIG_USB_ZERO=m
CONFIG_MMC=y
CONFIG_SDIO_UART=y
CONFIG_MMC_OMAP=y
CONFIG_MMC_OMAP_HS=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_ONESHOT=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_BACKLIGHT=y
CONFIG_LEDS_TRIGGER_CPU=y
CONFIG_LEDS_TRIGGER_GPIO=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_TWL92330=y
CONFIG_RTC_DRV_TWL4030=y
CONFIG_RTC_DRV_OMAP=y
CONFIG_DMADEVICES=y
CONFIG_TI_EDMA=y
CONFIG_DMA_OMAP=y
CONFIG_EXTCON=y
CONFIG_EXTCON_PALMAS=y
CONFIG_PWM=y
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
CONFIG_OMAP_USB2=y
CONFIG_TI_PIPE3=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
CONFIG_FANOTIFY=y
CONFIG_QUOTA=y
CONFIG_QFMT_V2=y
CONFIG_AUTOFS4_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
CONFIG_JFFS2_COMPRESSION_OPTIONS=y
CONFIG_JFFS2_LZO=y
CONFIG_JFFS2_RUBIN=y
CONFIG_UBIFS_FS=y
CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
CONFIG_PROVE_LOCKING=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_SECURITY=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRC_CCITT=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
Buildroot defconfig:
BR2_arm=y
BR2_cortex_a8=y
BR2_JLEVEL=24
BR2_CCACHE=y
BR2_TOOLCHAIN_EXTERNAL=y
BR2_TARGET_GENERIC_GETTY_PORT="ttyO0"
BR2_LINUX_KERNEL=y
BR2_LINUX_KERNEL_DEFCONFIG="omap2plus"
BR2_LINUX_KERNEL_ZIMAGE=y
BR2_LINUX_KERNEL_DTS_SUPPORT=y
BR2_LINUX_KERNEL_INTREE_DTS_NAME="am335x-evmsk am335x-boneblack"
BR2_LINUX_KERNEL_INSTALL_TARGET=y
BR2_PACKAGE_NUTTCP=y
BR2_PACKAGE_OPENSSH=y
BR2_PACKAGE_SER2NET=y
BR2_TARGET_ROOTFS_TAR_BZIP2=y
BR2_TARGET_UBOOT=y
BR2_TARGET_UBOOT_BOARDNAME="am335x_evm"
BR2_TARGET_UBOOT_FORMAT_IMG=y
BR2_TARGET_UBOOT_SPL=y
BR2_TARGET_UBOOT_SPL_NAME="MLO"
U-Boot version: 2014.07
Kernel config is omap2plus with USB enabled
# cat /proc/version
Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
Mon Dec 8 22:47:43 CET 2014
* Re: am335x: cpsw: interrupt failure
2014-12-12 12:00 ` Yegor Yefremov
@ 2014-12-12 17:32 ` Felipe Balbi
2014-12-12 19:19 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-12 17:32 UTC (permalink / raw)
To: Yegor Yefremov; +Cc: Felipe Balbi, netdev, N, Mugunthan V
Hi,
On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> U-Boot version: 2014.07
> Kernel config is omap2plus with enabled USB
>
> # cat /proc/version
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014
Wasn't GCC 4.8.x total crap for building ARM kernels? IIRC it was even
blacklisted. Can you try with 4.9.x just to make sure?
I'll try to run your test case here once I'm back from vacation.
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-12 17:32 ` Felipe Balbi
@ 2014-12-12 19:19 ` Yegor Yefremov
2014-12-29 9:33 ` Yegor Yefremov
0 siblings, 1 reply; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-12 19:19 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>> U-Boot version: 2014.07
>> Kernel config is omap2plus with enabled USB
>>
>> # cat /proc/version
>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>> Mon Dec 8 22:47:43 CET 2014
>
> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> blacklisted. Can you try with 4.9.x just to make sure ?
Will do.
> I'll try to run your test case here once i'm back from vacations.
Have a good vacation!
Yegor
* Re: am335x: cpsw: interrupt failure
2014-12-12 19:19 ` Yegor Yefremov
@ 2014-12-29 9:33 ` Yegor Yefremov
2014-12-29 13:46 ` Peter Hurley
2014-12-29 15:50 ` Felipe Balbi
0 siblings, 2 replies; 16+ messages in thread
From: Yegor Yefremov @ 2014-12-29 9:33 UTC (permalink / raw)
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
>> Hi,
>>
>> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>>> U-Boot version: 2014.07
>>> Kernel config is omap2plus with enabled USB
>>>
>>> # cat /proc/version
>>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>>> Mon Dec 8 22:47:43 CET 2014
>>
>> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
>> blacklisted. Can you try with 4.9.x just to make sure ?
>
> Will do.
Adding linux-omap. Beginning of this discussion:
http://comments.gmane.org/gmane.linux.network/341427
Quick summary: starting with kernel 3.18, or more precisely with commit
55601c9f24670ba926ebdd4d712ac3b177232330, am335x (at least the BBB and some
custom boards) stalls under high network load. This is reproducible with
nuttcp within a few minutes:
nuttcp -S (on BBB)
nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
but both show the same behavior.
Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
Mon Dec 8 22:47:43 CET 2014
Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
(Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
CET 2014
Let me know if you can reproduce this issue.
Thanks.
Yegor
* Re: am335x: cpsw: interrupt failure
2014-12-29 9:33 ` Yegor Yefremov
@ 2014-12-29 13:46 ` Peter Hurley
2014-12-29 15:50 ` Felipe Balbi
1 sibling, 0 replies; 16+ messages in thread
From: Peter Hurley @ 2014-12-29 13:46 UTC (permalink / raw)
To: Yegor Yefremov, Felipe Balbi
Cc: netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On 12/29/2014 04:33 AM, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
>> On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
>>> Hi,
>>>
>>> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>>>> U-Boot version: 2014.07
>>>> Kernel config is omap2plus with enabled USB
>>>>
>>>> # cat /proc/version
>>>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>>>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>>>> Mon Dec 8 22:47:43 CET 2014
>>>
>>> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
>>> blacklisted. Can you try with 4.9.x just to make sure ?
>>
>> Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> CET 2014
>
> Let me know, if you can reproduce this issue.
I have seen the irq 0 error messages on the BeagleBone Black since 3.18, but
haven't bisected it yet. For me, these errors occurred with a slightly misconfigured
emacs24-nox, which drove the cpu load way up - over 50% - with just
cursor movement (it still gets above 20% which seems unacceptably high).
I'm not sure if all the crashes were over ssh; I hadn't considered
the cpsw relevant until reading this. I'll retest over the serial console.
I have seen abrupt resets without messages on earlier kernels so perhaps
the commit is not the root cause.
Regards,
Peter Hurley
* Re: am335x: cpsw: interrupt failure
2014-12-29 9:33 ` Yegor Yefremov
2014-12-29 13:46 ` Peter Hurley
@ 2014-12-29 15:50 ` Felipe Balbi
2014-12-29 16:51 ` Tony Lindgren
1 sibling, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-29 15:50 UTC (permalink / raw)
To: Yegor Yefremov
Cc: Felipe Balbi, netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
> > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> >> Hi,
> >>
> >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> >>> U-Boot version: 2014.07
> >>> Kernel config is omap2plus with enabled USB
> >>>
> >>> # cat /proc/version
> >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> >>> Mon Dec 8 22:47:43 CET 2014
> >>
> >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> >> blacklisted. Can you try with 4.9.x just to make sure ?
> >
> > Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> CET 2014
>
> Let me know, if you can reproduce this issue.
Finally managed to reproduce this; it took quite a bit of effort, though.
I'll see if I can gather more information about the problem.
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-29 15:50 ` Felipe Balbi
@ 2014-12-29 16:51 ` Tony Lindgren
2014-12-29 17:13 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2014-12-29 16:51 UTC (permalink / raw)
To: Felipe Balbi
Cc: Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
* Felipe Balbi <balbi@ti.com> [141229 07:53]:
> On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> > <yegorslists@googlemail.com> wrote:
> > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> > >> Hi,
> > >>
> > >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > >>> U-Boot version: 2014.07
> > >>> Kernel config is omap2plus with enabled USB
> > >>>
> > >>> # cat /proc/version
> > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > >>> Mon Dec 8 22:47:43 CET 2014
> > >>
> > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > >
> > > Will do.
> >
> > Adding linux-omap. Beginning of this discussion:
> > http://comments.gmane.org/gmane.linux.network/341427
> >
> > Quick summary: starting with kernel 3.18 or commit
> > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > custom boards) stalls at high network load. Reproducible via nuttcp
> > within some minutes
> >
> > nuttcp -S (on BBB)
> > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> >
> > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > but both show the same behavior.
> >
> > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > Mon Dec 8 22:47:43 CET 2014
> > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > CET 2014
> >
> > Let me know, if you can reproduce this issue.
>
> finally managed to reproduce this, it took quite a bit of effort though.
> I'll see if I can gather more information about the problem.
Maybe check if the irqnr is 127 (or the last reserved interrupt)
in irq-omap-intc.c. If so, also print out the previous interrupt.
It seems the intc uses the last reserved interrupt to signal a
spurious interrupt for the previous irqnr, so we should probably
add some handling for that.
If the previous interrupt is a cpsw interrupt, then there's probably
something wrong with cpsw interrupt handling: either a missing
read-back to flush a posted write in the cpsw interrupt handler,
or the EOI registers are written at the wrong time.
Regards,
Tony
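A minimal user-space sketch of the check suggested above; it is not code
from irq-omap-intc.c or from this thread. It assumes the active IRQ number
sits in the low 7 bits of the value read from the controller, so that 127 is
the last reserved line. The struct irq_tracker type and the decode_irq()
helper are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the active IRQ number is the low 7 bits of the raw value,
 * which makes 127 the last (reserved) interrupt line. */
#define ACTIVE_IRQ_MASK   0x7fu
#define LAST_RESERVED_IRQ 127u

/* Hypothetical tracker: remembers the last real IRQ that was decoded. */
struct irq_tracker {
    int prev_irq;            /* -1 until a real IRQ has been seen */
    unsigned spurious_count; /* how often the reserved line showed up */
};

/*
 * Decode one raw status value. Returns the IRQ number to dispatch, or -1
 * if the value is the last reserved interrupt. In that case the previously
 * decoded IRQ is printed, so the driver that handled it can be inspected.
 */
static int decode_irq(struct irq_tracker *t, uint32_t raw)
{
    uint32_t irqnr = raw & ACTIVE_IRQ_MASK;

    if (irqnr == LAST_RESERVED_IRQ) {
        t->spurious_count++;
        printf("spurious irq %u, previous irq was %d\n",
               (unsigned)irqnr, t->prev_irq);
        return -1; /* nothing to dispatch */
    }

    t->prev_irq = (int)irqnr;
    return (int)irqnr;
}
```

To try it, initialise a tracker with { -1, 0 } and feed it a sequence of raw
values. If the IRQ printed as "previous" turns out to be a cpsw interrupt,
that would point at the cpsw handler, as described above.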
* Re: am335x: cpsw: interrupt failure
2014-12-29 16:51 ` Tony Lindgren
@ 2014-12-29 17:13 ` Felipe Balbi
2014-12-30 23:22 ` Felipe Balbi
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Balbi @ 2014-12-29 17:13 UTC (permalink / raw)
To: Tony Lindgren
Cc: Felipe Balbi, Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
On Mon, Dec 29, 2014 at 08:51:04AM -0800, Tony Lindgren wrote:
> * Felipe Balbi <balbi@ti.com> [141229 07:53]:
> > On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> > > <yegorslists@googlemail.com> wrote:
> > > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> > > >> Hi,
> > > >>
> > > >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > >>> U-Boot version: 2014.07
> > > >>> Kernel config is omap2plus with enabled USB
> > > >>>
> > > >>> # cat /proc/version
> > > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > >>> Mon Dec 8 22:47:43 CET 2014
> > > >>
> > > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > > >
> > > > Will do.
> > >
> > > Adding linux-omap. Beginning of this discussion:
> > > http://comments.gmane.org/gmane.linux.network/341427
> > >
> > > Quick summary: starting with kernel 3.18 or commit
> > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > > custom boards) stalls at high network load. Reproducible via nuttcp
> > > within some minutes
> > >
> > > nuttcp -S (on BBB)
> > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > >
> > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > but both show the same behavior.
> > >
> > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > Mon Dec 8 22:47:43 CET 2014
> > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > > CET 2014
> > >
> > > Let me know, if you can reproduce this issue.
> >
> > finally managed to reproduce this, it took quite a bit of effort though.
> > I'll see if I can gather more information about the problem.
>
> Maybe check if the irqnr is 127 (or the last reserved interrupt)
> in irq-omap-intc.c. If so, also print out the previous interrupt.
> It seems the intc uses the last reserved interrupt to signal a
> spurious interrupt for the previous irqnr, so we should probably
> add some handling for that.
>
> If the previous interrupt is a cpsw interrupt, then there's probably
> something wrong with cpsw interrupt handling. Either a missing
> read-back to flush posted write in the cpsw interrupt handler,
> or the EOI registers are written at a wrong time.
yeah, I'll go over it, but I first need to reproduce it again. Just
rebooted to try again and after half an hour, couldn't reproduce it
anymore. Interesting race to end the year :-)
cheers
--
balbi
* Re: am335x: cpsw: interrupt failure
2014-12-29 17:13 ` Felipe Balbi
@ 2014-12-30 23:22 ` Felipe Balbi
0 siblings, 0 replies; 16+ messages in thread
From: Felipe Balbi @ 2014-12-30 23:22 UTC (permalink / raw)
To: Felipe Balbi
Cc: Tony Lindgren, Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
Hi,
On Mon, Dec 29, 2014 at 11:13:55AM -0600, Felipe Balbi wrote:
> > > > >>> U-Boot version: 2014.07
> > > > >>> Kernel config is omap2plus with enabled USB
> > > > >>>
> > > > >>> # cat /proc/version
> > > > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > > >>> Mon Dec 8 22:47:43 CET 2014
> > > > >>
> > > > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > > > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > > > >
> > > > > Will do.
> > > >
> > > > Adding linux-omap. Beginning of this discussion:
> > > > http://comments.gmane.org/gmane.linux.network/341427
> > > >
> > > > Quick summary: starting with kernel 3.18 or commit
> > > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > > > custom boards) stalls at high network load. Reproducible via nuttcp
> > > > within some minutes
> > > >
> > > > nuttcp -S (on BBB)
> > > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > > >
> > > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > > but both show the same behavior.
> > > >
> > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > > Mon Dec 8 22:47:43 CET 2014
> > > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > > > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > > > CET 2014
> > > >
> > > > Let me know, if you can reproduce this issue.
> > >
> > > finally managed to reproduce this, it took quite a bit of effort though.
> > > I'll see if I can gather more information about the problem.
> >
> > Maybe check if the irqnr is 127 (or the last reserved interrupt)
> > in irq-omap-intc.c. If so, also print out the previous interrupt.
> > It seems the intc uses the last reserved interrupt to signal a
> > spurious interrupt for the previous irqnr, so we should probably
> > add some handling for that.
> >
> > If the previous interrupt is a cpsw interrupt, then there's probably
> > something wrong with cpsw interrupt handling. Either a missing
> > read-back to flush posted write in the cpsw interrupt handler,
> > or the EOI registers are written at a wrong time.
>
> yeah, I'll go over it, but I first need to reproduce it again. Just
> rebooted to try again and after half an hour, couldn't reproduce it
> anymore. Interesting race to end the year :-)
Alright, I managed to reproduce it multiple times and I'm pretty confident I've
found the bug. Right now I'm testing with AM437x and AM335x to make sure
it's really working. If it's still running tomorrow I'll send a
preliminary patch, but I want to leave this running for quite a few days
before calling it "fixed".
--
balbi