* Re: am335x: cpsw: interrupt failure
From: Yegor Yefremov @ 2014-12-29 9:33 UTC
To: Felipe Balbi; +Cc: netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
>> Hi,
>>
>> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>>> U-Boot version: 2014.07
>>> Kernel config is omap2plus with enabled USB
>>>
>>> # cat /proc/version
>>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>>> Mon Dec 8 22:47:43 CET 2014
>>
>> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
>> blacklisted. Can you try with 4.9.x just to make sure ?
>
> Will do.
Adding linux-omap. Beginning of this discussion:
http://comments.gmane.org/gmane.linux.network/341427
Quick summary: starting with kernel 3.18 or commit
55601c9f24670ba926ebdd4d712ac3b177232330, am335x (at least BBB and some
custom boards) stalls under high network load. Reproducible via nuttcp
within a few minutes:
nuttcp -S (on BBB)
nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
but both show the same behavior.
Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
Mon Dec 8 22:47:43 CET 2014
Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
(Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
CET 2014
Let me know if you can reproduce this issue.
Thanks.
Yegor
* Re: am335x: cpsw: interrupt failure
From: Peter Hurley @ 2014-12-29 13:46 UTC
To: Yegor Yefremov, Felipe Balbi
Cc: netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On 12/29/2014 04:33 AM, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
>> On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
>>> Hi,
>>>
>>> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>>>> U-Boot version: 2014.07
>>>> Kernel config is omap2plus with enabled USB
>>>>
>>>> # cat /proc/version
>>>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>>>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>>>> Mon Dec 8 22:47:43 CET 2014
>>>
>>> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
>>> blacklisted. Can you try with 4.9.x just to make sure ?
>>
>> Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> CET 2014
>
> Let me know, if you can reproduce this issue.
I have seen the irq 0 error messages on the BeagleBone Black since 3.18+, but
haven't bisected it yet. For me, these errors occurred with a slightly misconfigured
emacs24-nox, which drove the cpu load way up - over 50% - with just
cursor movement (it still gets above 20%, which seems unacceptably high).
I'm not sure if all the crashes were over ssh; I hadn't considered
the cpsw relevant until reading this. I'll retest over the serial console.
I have seen abrupt resets without messages on earlier kernels so perhaps
the commit is not the root cause.
Regards,
Peter Hurley
* Re: am335x: cpsw: interrupt failure
From: Felipe Balbi @ 2014-12-29 15:50 UTC
To: Yegor Yefremov
Cc: Felipe Balbi, netdev, N, Mugunthan V, linux-omap@vger.kernel.org
On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
> > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> >> Hi,
> >>
> >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> >>> U-Boot version: 2014.07
> >>> Kernel config is omap2plus with enabled USB
> >>>
> >>> # cat /proc/version
> >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> >>> Mon Dec 8 22:47:43 CET 2014
> >>
> >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> >> blacklisted. Can you try with 4.9.x just to make sure ?
> >
> > Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> CET 2014
>
> Let me know, if you can reproduce this issue.
finally managed to reproduce this, it took quite a bit of effort though.
I'll see if I can gather more information about the problem.
--
balbi
* Re: am335x: cpsw: interrupt failure
From: Tony Lindgren @ 2014-12-29 16:51 UTC
To: Felipe Balbi
Cc: Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
* Felipe Balbi <balbi@ti.com> [141229 07:53]:
> On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> > <yegorslists@googlemail.com> wrote:
> > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> > >> Hi,
> > >>
> > >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > >>> U-Boot version: 2014.07
> > >>> Kernel config is omap2plus with enabled USB
> > >>>
> > >>> # cat /proc/version
> > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > >>> Mon Dec 8 22:47:43 CET 2014
> > >>
> > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > >
> > > Will do.
> >
> > Adding linux-omap. Beginning of this discussion:
> > http://comments.gmane.org/gmane.linux.network/341427
> >
> > Quick summary: starting with kernel 3.18 or commit
> > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > custom boards) stalls at high network load. Reproducible via nuttcp
> > within some minutes
> >
> > nuttcp -S (on BBB)
> > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> >
> > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > but both show the same behavior.
> >
> > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > Mon Dec 8 22:47:43 CET 2014
> > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > CET 2014
> >
> > Let me know, if you can reproduce this issue.
>
> finally managed to reproduce this, it took quite a bit of effort though.
> I'll see if I can gather more information about the problem.
Maybe check if the irqnr is 127 (or the last reserved interrupt)
in irq-omap-intc.c. If so, also print out the previous interrupt.
It seems the intc uses the last reserved interrupt to signal a
spurious interrupt for the previous irqnr, so we should probably
add some handling for that.
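
A rough user-space sketch of that check, in C. It only models the bookkeeping: remember the last real irqnr and report it when the spurious number shows up. The names `irq_tracker`, `irq_tracker_init` and `irq_tracker_feed` are hypothetical and do not exist in irq-omap-intc.c; the value 127 is taken from the suggestion above and has not been checked against the hardware documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption from the suggestion above: 127 is the last reserved
 * interrupt number and is what shows up for a spurious interrupt. */
#define SPURIOUS_IRQNR 127
#define NO_PREV_IRQ    (-1)

/* Hypothetical helper state; not part of any real driver. */
struct irq_tracker {
    int prev_irqnr;          /* last non-spurious irqnr, or NO_PREV_IRQ */
    unsigned long spurious;  /* count of spurious events seen so far */
};

static void irq_tracker_init(struct irq_tracker *t)
{
    assert(t != NULL);
    t->prev_irqnr = NO_PREV_IRQ;
    t->spurious = 0;
}

/*
 * Feed one irqnr as it would be read in the interrupt entry path.
 * Returns the irqnr that preceded a spurious event, or NO_PREV_IRQ
 * if this irqnr was not spurious or nothing preceded it.
 */
static int irq_tracker_feed(struct irq_tracker *t, int irqnr)
{
    assert(t != NULL);
    if (irqnr == SPURIOUS_IRQNR) {
        t->spurious++;
        fprintf(stderr, "spurious irq %d, previous irqnr was %d\n",
                irqnr, t->prev_irqnr);
        return t->prev_irqnr;
    }
    t->prev_irqnr = irqnr;
    return NO_PREV_IRQ;
}
```

Feeding 41 and then 127 (41 is an arbitrary example number) makes the second call return 41, which is the value one would then compare against the cpsw interrupt numbers.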
If the previous interrupt is a cpsw interrupt, then there's probably
something wrong with cpsw interrupt handling. Either a missing
read-back to flush posted write in the cpsw interrupt handler,
or the EOI registers are written at a wrong time.
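
The first possibility, a missing read-back, can be illustrated with a small C sketch. `reg_write_flush` is a hypothetical helper, and a plain variable stands in for a device register so the snippet compiles in user space; in a driver the same pattern would use the kernel's MMIO accessors on the real register. On a bus with posted writes, the read typically cannot complete until the preceding write to the same device has landed, which is the point of the idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write a value to a register and read it back immediately.
 * Hypothetical helper for illustration only: the write may be posted
 * (still in flight when the store instruction retires), while the
 * read-back has to wait for it, so the caller knows the write has
 * reached the device before it returns from the handler.
 */
static uint32_t reg_write_flush(volatile uint32_t *reg, uint32_t val)
{
    assert(reg != NULL);
    *reg = val;     /* may be posted */
    return *reg;    /* read-back flushes the posted write */
}
```

Without such a read-back, an interrupt acknowledged in the device may still look asserted at the interrupt controller when the handler returns, which is one way to end up with a spurious interrupt.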
Regards,
Tony
* Re: am335x: cpsw: interrupt failure
From: Felipe Balbi @ 2014-12-29 17:13 UTC
To: Tony Lindgren
Cc: Felipe Balbi, Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
On Mon, Dec 29, 2014 at 08:51:04AM -0800, Tony Lindgren wrote:
> * Felipe Balbi <balbi@ti.com> [141229 07:53]:
> > On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
> > > <yegorslists@googlemail.com> wrote:
> > > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> > > >> Hi,
> > > >>
> > > >> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > >>> U-Boot version: 2014.07
> > > >>> Kernel config is omap2plus with enabled USB
> > > >>>
> > > >>> # cat /proc/version
> > > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > >>> Mon Dec 8 22:47:43 CET 2014
> > > >>
> > > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > > >
> > > > Will do.
> > >
> > > Adding linux-omap. Beginning of this discussion:
> > > http://comments.gmane.org/gmane.linux.network/341427
> > >
> > > Quick summary: starting with kernel 3.18 or commit
> > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > > custom boards) stalls at high network load. Reproducible via nuttcp
> > > within some minutes
> > >
> > > nuttcp -S (on BBB)
> > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > >
> > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > but both show the same behavior.
> > >
> > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > Mon Dec 8 22:47:43 CET 2014
> > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > > CET 2014
> > >
> > > Let me know, if you can reproduce this issue.
> >
> > finally managed to reproduce this, it took quite a bit of effort though.
> > I'll see if I can gather more information about the problem.
>
> Maybe check if the irqnr is 127 (or the last reserved interrupt)
> in irq-omap-intc.c. If so, also print out the previous interrupt.
> It seems the intc uses the last reserved interrupt to signal a
> spurious interrupt for the previous irqnr, so we should probably
> add some handling for that.
>
> If the previous interrupt is a cpsw interrupt, then there's probably
> something wrong with cpsw interrupt handling. Either a missing
> read-back to flush posted write in the cpsw interrupt handler,
> or the EOI registers are written at a wrong time.
yeah, I'll go over it, but I first need to reproduce it again. Just
rebooted to try again and after half an hour, couldn't reproduce it
anymore. Interesting race to end the year :-)
cheers
--
balbi
* Re: am335x: cpsw: interrupt failure
From: Felipe Balbi @ 2014-12-30 23:22 UTC
To: Felipe Balbi
Cc: Tony Lindgren, Yegor Yefremov, netdev, N, Mugunthan V,
linux-omap@vger.kernel.org
Hi,
On Mon, Dec 29, 2014 at 11:13:55AM -0600, Felipe Balbi wrote:
> > > > >>> U-Boot version: 2014.07
> > > > >>> Kernel config is omap2plus with enabled USB
> > > > >>>
> > > > >>> # cat /proc/version
> > > > >>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > > >>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > > >>> Mon Dec 8 22:47:43 CET 2014
> > > > >>
> > > > >> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> > > > >> blacklisted. Can you try with 4.9.x just to make sure ?
> > > > >
> > > > > Will do.
> > > >
> > > > Adding linux-omap. Beginning of this discussion:
> > > > http://comments.gmane.org/gmane.linux.network/341427
> > > >
> > > > Quick summary: starting with kernel 3.18 or commit
> > > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> > > > custom boards) stalls at high network load. Reproducible via nuttcp
> > > > within some minutes
> > > >
> > > > nuttcp -S (on BBB)
> > > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > > >
> > > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > > but both show the same behavior.
> > > >
> > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> > > > 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> > > > Mon Dec 8 22:47:43 CET 2014
> > > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
> > > > (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
> > > > CET 2014
> > > >
> > > > Let me know, if you can reproduce this issue.
> > >
> > > finally managed to reproduce this, it took quite a bit of effort though.
> > > I'll see if I can gather more information about the problem.
> >
> > Maybe check if the irqnr is 127 (or the last reserved interrupt)
> > in irq-omap-intc.c. If so, also print out the previous interrupt.
> > It seems the intc uses the last reserved interrupt to signal a
> > spurious interrupt for the previous irqnr, so we should probably
> > add some handling for that.
> >
> > If the previous interrupt is a cpsw interrupt, then there's probably
> > something wrong with cpsw interrupt handling. Either a missing
> > read-back to flush posted write in the cpsw interrupt handler,
> > or the EOI registers are written at a wrong time.
>
> yeah, I'll go over it, but I first need to reproduce it again. Just
> rebooted to try again and after half an hour, couldn't reproduce it
> anymore. Interesting race to end the year :-)
alright, managed to reproduce it multiple times and I'm pretty confident I've
found the bug. Right now I'm testing with AM437x and AM335x to make sure
it's really working. If it's still running until tomorrow I'll send a
preliminary patch but I want to leave this running for quite a few days
before calling it "fixed".
--
balbi
Thread overview: 6+ messages
2014-12-29 9:33 ` am335x: cpsw: interrupt failure Yegor Yefremov
2014-12-29 13:46 ` Peter Hurley
2014-12-29 15:50 ` Felipe Balbi
2014-12-29 16:51 ` Tony Lindgren
2014-12-29 17:13 ` Felipe Balbi
2014-12-30 23:22 ` Felipe Balbi