linuxppc-dev.lists.ozlabs.org archive mirror
* Gianfar driver crashes in Kernel v3.10
@ 2013-10-04 12:03 Thomas Hühn
  2013-10-04 13:45 ` Claudiu Manoil
  2013-10-08 22:09 ` Scott Wood
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Hühn @ 2013-10-04 12:03 UTC (permalink / raw)
  To: claudiu.manoil@freescale.com; +Cc: linuxppc-dev@lists.ozlabs.org

Hi all,

We are several OpenWrt users with the TPlink 4900 device, and we suffer from a crashing gianfar driver.
We have narrowed the problem down to the fact that a 3.8 Linux kernel works while v3.10 crashes, but there is
no reproducible test case yet. The driver crashes after a couple of minutes, but the crash cannot be triggered by high network load or by routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gianfar.c.
All the information and logs we have collected so far are here: https://forum.openwrt.org/viewtopic.php?pid=213901#p213901

I cc'd the linuxppc-dev mailing list, but I am not sure this is the right one.
Please let us know how we could help to find that bug within the gianfar NAPI code.

Greetings Thomas

PS: here is my last troubleshooting log from the OpenWrt mailing list:

I just hooked up a serial line to my tplink4900, used a recent trunk image, and could catch the output of the crash.
The problem comes from the Ethernet driver gfar.

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
 compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125]
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000000
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000008
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff800
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000001
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR = 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c10050 7ffafb78
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 48000060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628]
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup in gianfar.o shows that the problem appears in the function "gfar_poll":

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=x86_64-linux-gnu --target=powerpc-openwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv = gfargrp->priv;
2833            struct gfar __iomem *regs = gfargrp->regs;
(gdb) q

[/code]
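
A note on the lookup above: in the oops, "gfar_poll+0x424/0x520" means offset 0x424 into a function that is 0x520 bytes long, so the part after the slash is a size and not part of the address. gdb evaluates its argument as a C expression, so "gfar_poll+0x2f8/0x520" most likely becomes gfar_poll + (0x2f8 / 0x520) = gfar_poll + 0, which would explain why the listing lands on the opening line of the function (2829) and not on the code that was running. A minimal Python sketch for splitting such references and building a gdb expression without the size follows. The helper names are made up for illustration, and the input lines are copied from the oops.

```python
import re

# Matches kernel oops symbol references such as "gfar_poll+0x424/0x520":
# symbol name, offset into the function, total size of the function.
SYMREF = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_symref(text):
    """Return (symbol, offset, size) for the first reference in text, or None."""
    m = SYMREF.search(text)
    if m is None:
        return None
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def gdb_list_expr(text):
    """Build a gdb 'list' command for the reference, dropping the '/size' part."""
    ref = parse_symref(text)
    if ref is None:
        raise ValueError("no symbol+offset/size reference found")
    sym, off, _size = ref
    return "l *(%s+0x%x)" % (sym, off)


if __name__ == "__main__":
    # Lines copied from the oops above.
    print(gdb_list_expr("NIP [c018c7a0] gfar_poll+0x424/0x520"))
    print(gdb_list_expr("LR [c018c794] gfar_poll+0x418/0x520"))
    # What integer division makes of the offset as it was typed into gdb:
    print(0x2f8 // 0x520)
```

Pasting the printed command into the same cross-gdb session should list the source around the NIP rather than the function prologue, assuming the object file matches the running kernel.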


The changes from Linux kernel 3.8, which seems to have properly working Ethernet, to the current 3.10 seem to introduce a bug in the gianfar driver: drivers/net/ethernet/freescale/gianfar.c
Several changes were made to the NAPI code of the gianfar driver between the two kernel versions.
You can have a look at them by running "git whatchanged -p v3.8..v3.10 drivers/net/ethernet/freescale/gianfar.c" in a recent Linux kernel tree.

So let us all have a look at those changes to find the bug!

The maintainer of the gianfar driver should probably be included here: Claudiu Manoil <claudiu.manoil@freescale.com>


So much for the troubleshooting so far.

Greetings Bluse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Gianfar driver crashes in Kernel v3.10
@ 2013-10-04 12:28 Thomas Hühn
  2013-10-10 11:07 ` Claudiu Manoil
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Hühn @ 2013-10-04 12:28 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

Hi all,

We are several OpenWrt users with the TPlink 4900 device, and we suffer from a crashing gianfar driver.
We have narrowed the problem down to the fact that a 3.8 Linux kernel works while v3.10 crashes, but there is
no reproducible test case yet. The driver crashes after a couple of minutes, but the crash cannot be triggered by high network load or by routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gianfar.c.
All the information and logs we have collected so far are in the OpenWrt forum: https://forum.openwrt.org/viewtopic.php?pid=213901#p213901

Here is my last troubleshooting log from the OpenWrt mailing list:

I just hooked up a serial line to my tplink4900, used a recent trunk image, and could catch the output of the crash.
The problem comes from the Ethernet driver gfar.

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125] 
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000000 
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000008 
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff800 
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000001 
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR = 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c10050 7ffafb78 
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 48000060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628] 
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup in gianfar.o shows that the problem appears in the function "gfar_poll":

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=x86_64-linux-gnu --target=powerpc-openwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv = gfargrp->priv;
2833            struct gfar __iomem *regs = gfargrp->regs;
(gdb) q

[/code]


The changes from Linux kernel 3.8, which seems to have properly working Ethernet, to the current 3.10 seem to introduce a bug in the gianfar driver: drivers/net/ethernet/freescale/gianfar.c
Several changes were made to the NAPI code of the gianfar driver between the two kernel versions.
Please let us know which troubleshooting step you would recommend next to nail down the issue.

So much for the troubleshooting so far.

Greetings Thomas

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-04 12:03 Gianfar driver crashes in Kernel v3.10 Thomas Hühn
@ 2013-10-04 13:45 ` Claudiu Manoil
  2013-10-08 22:09 ` Scott Wood
  1 sibling, 0 replies; 13+ messages in thread
From: Claudiu Manoil @ 2013-10-04 13:45 UTC (permalink / raw)
  To: Thomas Hühn; +Cc: linuxppc-dev@lists.ozlabs.org

On 10/4/2013 3:03 PM, Thomas Hühn wrote:
> Hi all,
>
> We are several OpenWrt users with the TPlink 4900 device, and we suffer from a crashing gianfar driver.
> We have narrowed the problem down to the fact that a 3.8 Linux kernel works while v3.10 crashes, but there is
> no reproducible test case yet.

I'll run some low-traffic tests with the upstream 3.10 kernel on a
p1010rdb, but since you say there's no "reproducible test case" the crash
may not be easy to spot on my side.  I wouldn't jump to conclusions
too fast; the fact that it manifests now doesn't necessarily mean
that a bug was introduced between v3.8 and v3.10.
I'll let you know should I find something.  Meanwhile, if you have
additional information about how to reproduce it, please share.

Thanks,
Claudiu

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-04 12:03 Gianfar driver crashes in Kernel v3.10 Thomas Hühn
  2013-10-04 13:45 ` Claudiu Manoil
@ 2013-10-08 22:09 ` Scott Wood
  2013-10-11  9:17   ` Thomas Hühn
  1 sibling, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-10-08 22:09 UTC (permalink / raw)
  To: Thomas Hühn
  Cc: linuxppc-dev@lists.ozlabs.org, claudiu.manoil@freescale.com

On Fri, 2013-10-04 at 12:03 +0000, Thomas Hühn wrote:
> [code]
> [ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 2671.847141] Freescale P1014
> [ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
> e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
> mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
> ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
> ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
>  compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
> skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
> r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
> [ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
> [ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
> [ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
> [ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)

Trap 0x3202 is a watchdog timer.

Did you get a "Bad trap at..." line before the above dump?  Do you have
any idea why the watchdog would have been armed without CONFIG_BOOKE_WDT
being set?  Is CONFIG_BOOKE_WDT set?
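
The trap number comes from the "REGS:" line of the dump. When several crash logs have to be compared, a small script can pull the relevant fields out of each capture. The following is a rough Python sketch under stated assumptions: the function name and the one-entry lookup table are illustrative, and the only mapping it contains is the one given in this thread.

```python
import re

# Only the mapping stated in this thread; extend it from the kernel sources.
KNOWN_TRAPS = {0x3202: "watchdog timer"}

# "NAME: hexvalue" pairs as they appear in the register header of the oops.
FIELD = re.compile(r"\b(NIP|LR|CTR|TRAP): ([0-9a-f]+)")


def oops_registers(log):
    """Collect NIP/LR/CTR/TRAP values (as integers) from oops text."""
    regs = {}
    for name, value in FIELD.findall(log):
        regs.setdefault(name, int(value, 16))  # keep the first occurrence
    return regs


if __name__ == "__main__":
    # Two lines copied from the dump quoted above.
    sample = (
        "[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070\n"
        "[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)\n"
    )
    regs = oops_registers(sample)
    trap = regs["TRAP"]
    print("TRAP 0x%04x: %s" % (trap, KNOWN_TRAPS.get(trap, "unknown")))
```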

-Scott

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-04 12:28 Thomas Hühn
@ 2013-10-10 11:07 ` Claudiu Manoil
  2013-10-10 21:41   ` Scott Wood
  0 siblings, 1 reply; 13+ messages in thread
From: Claudiu Manoil @ 2013-10-10 11:07 UTC (permalink / raw)
  To: Thomas Hühn; +Cc: linuxppc-dev@lists.ozlabs.org

Hi,

Does this show up on a half-duplex (100 Mb/s) link?
Could you provide the output of the following for the gianfar interface on your setup:
# ethtool ethX
and
# ethtool -d ethX | grep 500

Is there any other indication before this Oops, like a tx timeout WARN?

Thanks,
Claudiu

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-10 11:07 ` Claudiu Manoil
@ 2013-10-10 21:41   ` Scott Wood
  2013-10-11  7:49     ` Claudiu Manoil
  0 siblings, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-10-10 21:41 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Thomas Hühn, linuxppc-dev@lists.ozlabs.org

On Thu, 2013-10-10 at 14:07 +0300, Claudiu Manoil wrote:
> Hi,
>
> Does this show up on a half-duplex (100 Mb/s) link?
> Could you provide the output of the following for the gianfar interface on your setup:
> # ethtool ethX
> and
> # ethtool -d ethX | grep 500
>
> Is there any other indication before this Oops, like a tx timeout WARN?

It's a watchdog interrupt (CPU watchdog, not netdev).  I think it's only
showing up in the gianfar code because that's what's running (unless the
gianfar code is causing the watchdog daemon to not run).

-Scott

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-10 21:41   ` Scott Wood
@ 2013-10-11  7:49     ` Claudiu Manoil
  0 siblings, 0 replies; 13+ messages in thread
From: Claudiu Manoil @ 2013-10-11  7:49 UTC (permalink / raw)
  To: Scott Wood; +Cc: Thomas Hühn, linuxppc-dev@lists.ozlabs.org



On 10/11/2013 12:41 AM, Scott Wood wrote:
> On Thu, 2013-10-10 at 14:07 +0300, Claudiu Manoil wrote:
>> Hi,
>>
>> Does this show up on a half-duplex (100 Mb/s) link?
>> Could you provide the output of the following for the gianfar interface on your setup:
>> # ethtool ethX
>> and
>> # ethtool -d ethX | grep 500
>>
>> Is there any other indication before this Oops, like a tx timeout WARN?
>
> It's a watchdog interrupt (CPU watchdog, not netdev).  I think it's only
> showing up in the gianfar code because that's what's running (unless the
> gianfar code is causing the watchdog daemon to not run).
>

Hi Scott,
Good to know that the exception is triggered by the watchdog, and at
this point I assume they simply enabled the watchdog support in kernel
(as you know, it's not enabled by the default config) and that the
exception triggered as the system froze.  Since this reportedly happens
under certain traffic conditions (not "high network load, or routing
traffic") I think that information about the link state (whether
it's 100 Mb/s half duplex or not) is relevant here.  Any other
indication on top of that (if there is any) is also useful.

Thanks.

Claudiu

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-08 22:09 ` Scott Wood
@ 2013-10-11  9:17   ` Thomas Hühn
  2013-10-11 17:49     ` Scott Wood
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Hühn @ 2013-10-11  9:17 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev@lists.ozlabs.org, claudiu.manoil@freescale.com

Hi Scott,

On 09.10.2013, at 00:09, Scott Wood <scottwood@freescale.com> wrote:

> Trap 0x3202 is a watchdog timer.
>
> Did you get a "Bad trap at..." line before the above dump?

I need to set up my test scenario again, as I just copied the crash lines out of it without saving the full log.

> Do you have
> any idea why the watchdog would have been armed without CONFIG_BOOKE_WDT
> being set?

> Is CONFIG_BOOKE_WDT set?
>
This config option is not set. Should I try with CONFIG_BOOKE_WDT enabled?

Greetings Thomas

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-11  9:17   ` Thomas Hühn
@ 2013-10-11 17:49     ` Scott Wood
  2013-10-11 17:49       ` Scott Wood
  0 siblings, 1 reply; 13+ messages in thread
From: Scott Wood @ 2013-10-11 17:49 UTC (permalink / raw)
  To: Thomas Hühn
  Cc: linuxppc-dev@lists.ozlabs.org, claudiu.manoil@freescale.com

On Fri, 2013-10-11 at 11:17 +0200, Thomas Hühn wrote:
> Hi Scott,
>
> On 09.10.2013, at 00:09, Scott Wood <scottwood@freescale.com> wrote:
>
> > Trap 0x3202 is a watchdog timer.
> >
> > Did you get a "Bad trap at..." line before the above dump?
>
> I need to set up my test scenario again, as I just copied the crash lines out of it without saving the full log.
>
> > Do you have
> > any idea why the watchdog would have been armed without CONFIG_BOOKE_WDT
> > being set?
>
> > Is CONFIG_BOOKE_WDT set?
> >
> This config option is not set. Should I try with CONFIG_BOOKE_WDT enabled?

Instead, could you try to track down where TCR[WE] is getting set, and
dump TCR in unknown_exception()?

-Scott

* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-11 17:49     ` Scott Wood
@ 2013-10-11 17:49       ` Scott Wood
  0 siblings, 0 replies; 13+ messages in thread
From: Scott Wood @ 2013-10-11 17:49 UTC (permalink / raw)
  To: Thomas Hühn
  Cc: linuxppc-dev@lists.ozlabs.org, claudiu.manoil@freescale.com

On Fri, 2013-10-11 at 12:49 -0500, Scott Wood wrote:
> Instead, could you try to track down where TCR[WE] is getting set, and
> dump TCR in unknown_exception()?

Sorry, that should be TCR[WIE].

-Scott
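
Once TCR has been dumped from unknown_exception() as suggested above (in the kernel that would presumably be a printk of mfspr(SPRN_TCR)), the raw value still has to be read bit by bit. The following is a minimal host-side sketch for doing that, not anything taken from this thread. The helper name decode_tcr is hypothetical, and the masks are assumed to match the TCR_* definitions in arch/powerpc/include/asm/reg_booke.h; check them against the kernel tree in use. The sketch also ignores the e500 watchdog period extension bits.

```python
# Assumed Book E Timer Control Register (TCR) layout, mirroring the TCR_*
# definitions in arch/powerpc/include/asm/reg_booke.h. Verify before relying
# on it; the e500 WPEXT/FPEXT extension fields are deliberately not decoded.
TCR_WP_SHIFT = 30        # watchdog timer period (2 bits)
TCR_WRC_SHIFT = 28       # watchdog reset control (2 bits)
TCR_WIE = 0x08000000     # watchdog interrupt enable
TCR_DIE = 0x04000000     # decrementer interrupt enable
TCR_FIE = 0x00800000     # fixed-interval timer interrupt enable
TCR_ARE = 0x00400000     # decrementer auto-reload enable


def decode_tcr(tcr):
    """Split a raw 32-bit TCR value into named fields (hypothetical helper)."""
    return {
        "WP": (tcr >> TCR_WP_SHIFT) & 0x3,
        "WRC": (tcr >> TCR_WRC_SHIFT) & 0x3,
        "WIE": bool(tcr & TCR_WIE),
        "DIE": bool(tcr & TCR_DIE),
        "FIE": bool(tcr & TCR_FIE),
        "ARE": bool(tcr & TCR_ARE),
    }


if __name__ == "__main__":
    import sys

    # Pass the dumped value as hex on the command line; the default is only
    # an illustrative value with WIE set, not a value observed in this thread.
    value = int(sys.argv[1], 16) if len(sys.argv) > 1 else TCR_WIE
    for name, field in decode_tcr(value).items():
        print(f"{name:>3} = {field}")
```

If WIE shows up as set while CONFIG_BOOKE_WDT is disabled, that would point at something other than the watchdog driver (for example the bootloader) having armed it, which is the question raised above.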


* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-16  7:10   ` Claudiu Manoil
@ 2013-10-16 12:44     ` Thomas Hühn
  2013-10-31 11:51     ` Thomas Hühn
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Hühn @ 2013-10-16 12:44 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: linuxppc-dev

Hi,

Together with other OpenWrt users, we currently use this workaround patch
(https://dev.openwrt.org/changeset/38409/trunk), which downgrades the
gianfar driver to kernel version 3.9, as 3.10 keeps crashing.
With this workaround, several users with TPLink 4900 routers reported
that their systems are running stably and without issues.

>
> Please try the following patch:
> http://patchwork.ozlabs.org/patch/283235/
>
> It should help with your issue.
>

Thank you for your patch.
I have adapted it so that it applies to the current OpenWrt trunk.
I posted it in our current forum thread, where several users besides me
will test it over the next few days.
(https://forum.openwrt.org/viewtopic.php?pid=214931#p214931)
You can expect a report after the weekend.

Greetings Thomas

> claudiu


* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-16  7:10   ` Claudiu Manoil
  2013-10-16 12:44     ` Thomas Hühn
@ 2013-10-31 11:51     ` Thomas Hühn
  2013-11-01 11:51       ` Claudiu Manoil
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Hühn @ 2013-10-31 11:51 UTC (permalink / raw)
  To: claudiu.manoil@freescale.com; +Cc: linuxppc-dev@lists.ozlabs.org

Hi Claudiu,


> Please try the following patch:
> http://patchwork.ozlabs.org/patch/283235/
> 
> It should help with your issue.
> 

Several OpenWrt users, including myself, have tested your patch on TPLink-4900 routers.
The feedback is positive: no crash or system freeze was reported under different
network loads and router setups.
All the scenarios, details, and two-digit uptimes are in this forum thread:
https://forum.openwrt.org/viewtopic.php?id=42062&p=13

Thanks again for your work; I hope to see this patch merged upstream.

Greetings Thomas



* Re: Gianfar driver crashes in Kernel v3.10
  2013-10-31 11:51     ` Thomas Hühn
@ 2013-11-01 11:51       ` Claudiu Manoil
  0 siblings, 0 replies; 13+ messages in thread
From: Claudiu Manoil @ 2013-11-01 11:51 UTC (permalink / raw)
  To: Thomas Hühn; +Cc: linuxppc-dev@lists.ozlabs.org

Hi Thomas,

On 10/31/2013 1:51 PM, Thomas Hühn wrote:
> Hi Claudiu,
>
>
>> Please try the following patch:
>> http://patchwork.ozlabs.org/patch/283235/
>>
>> It should help with your issue.
>>
>
> Several OpenWrt users including myself have tested your patch on
> TPLink-4900 routers.
> We do have positive feedback, as no crash nor system freeze was reported
> for different network loads and router setups.
> All different scenarios / details and two digit uptimes are in this
> forum thread:
> https://forum.openwrt.org/viewtopic.php?id=42062&p=13
>
> Thanks again for your work and I hope to see this patch merged upstream.
>
> Greetings Thomas
>

Thanks for the testing and feedback.

The patch has been merged into davem/net-next.git:

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=3ba405db1c1b05d157474c71e559393f7ea436ad

I expect it will be merged from there into the next kernel
release and, in time, back-ported to stable kernel versions
too. In some cases, requests are made to the netdev mailing
list to speed up the inclusion of fixes (such as this one) in
certain stable kernel versions.

Thanks,
Claudiu


Thread overview: 13+ messages
2013-10-04 12:03 Gianfar driver crashes in Kernel v3.10 Thomas Hühn
2013-10-04 13:45 ` Claudiu Manoil
2013-10-08 22:09 ` Scott Wood
2013-10-11  9:17   ` Thomas Hühn
2013-10-11 17:49     ` Scott Wood
2013-10-11 17:49       ` Scott Wood
  -- strict thread matches above, loose matches on Subject: below --
2013-10-04 12:28 Thomas Hühn
2013-10-10 11:07 ` Claudiu Manoil
2013-10-10 21:41   ` Scott Wood
2013-10-11  7:49     ` Claudiu Manoil
     [not found] <90BE8C5D-E23B-41B5-BB52-8A0C0758931D@net.t-labs.tu-berlin.de>
2013-10-11  8:59 ` Fwd: " Thomas Hühn
2013-10-16  7:10   ` Claudiu Manoil
2013-10-16 12:44     ` Thomas Hühn
2013-10-31 11:51     ` Thomas Hühn
2013-11-01 11:51       ` Claudiu Manoil
