linuxppc-dev.lists.ozlabs.org archive mirror
* mal_probe crash
@ 2009-01-07 20:44 Sean MacLennan
  2009-01-08 20:46 ` Josh Boyer
  0 siblings, 1 reply; 18+ messages in thread
From: Sean MacLennan @ 2009-01-07 20:44 UTC (permalink / raw)
  To: linuxppc-dev

With Linus' latest git, mal_probe crashes. It calls netif_napi_add with
the first parameter NULL. This was ok since the parameter, a net
device, was only used if CONFIG_NETPOLL was set.

Now it is always de-referenced. A quick check shows that ibm_newemac is
the only driver that passed NULL as the first parameter to this call in
2.6.28.

I don't really follow ibm_newemac changes, so the patch may be waiting
to be applied. This is really just a heads up.

Cheers,
   Sean
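
The failure mode Sean describes can be mimicked in user space. The sketch below is an illustration only: the `mock_*` types and functions are assumptions made up for this example and are not the kernel's definitions. Where the real code would dereference NULL and oops, the mock returns -1 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types: stand-ins for the kernel structures, not the real ones. */
struct mock_net_device {
    int napi_count;            /* hypothetical field touched on every add */
};

struct mock_napi {
    struct mock_net_device *dev;
    int weight;
};

/* Old behaviour as described: dev is only touched when netpoll is enabled. */
static int mock_napi_add_old(struct mock_net_device *dev,
                             struct mock_napi *napi,
                             int weight, int netpoll_enabled)
{
    napi->weight = weight;
    napi->dev = NULL;
    if (netpoll_enabled) {
        if (dev == NULL)
            return -1;         /* the real code would oops here */
        napi->dev = dev;
        dev->napi_count++;
    }
    return 0;
}

/* New behaviour as described: dev is always used, so NULL never works. */
static int mock_napi_add_new(struct mock_net_device *dev,
                             struct mock_napi *napi, int weight)
{
    if (dev == NULL)
        return -1;             /* the real code would oops here */
    napi->weight = weight;
    napi->dev = dev;
    dev->napi_count++;
    return 0;
}
```

With netpoll disabled, the old variant accepts a NULL device; the new variant rejects it in every configuration, which matches the reported regression.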


* Re: mal_probe crash
  2009-01-08 20:46 ` Josh Boyer
@ 2009-01-07 22:50   ` Benjamin Herrenschmidt
  2009-01-09 14:42     ` Geert Uytterhoeven
  2009-01-09 14:49     ` Matthias Fuchs
  0 siblings, 2 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-07 22:50 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev, Sean MacLennan

On Thu, 2009-01-08 at 15:46 -0500, Josh Boyer wrote:
> On Wed, Jan 07, 2009 at 03:44:34PM -0500, Sean MacLennan wrote:
> >With Linus' latest git, mal_probe crashes. It calls netif_napi_add with
> >the first parameter NULL. This was ok since the parameter, a net
> >device, was only used if CONFIG_NETPOLL was set.
> >
> >Now it is always de-referenced. A quick check shows that ibm_newemac is
> >the only driver that passed NULL as the first parameter to this call in
> >2.6.28.
> >
> >I don't really follow ibm_newemac changes, so the patch may be waiting
> >to be applied. This is really just a heads up.
> 
> I haven't heard of that, so I doubt there's a patch pending.  *Sigh*

There isn't that I know of. The EMAC code creates a single NAPI instance
for all EMACs and I think used to completely disconnect things. The old
code created a fake netdev just for NAPI, that became unnecessary with
the new NAPI stuff.... but it looks like the way we do things now
displeases some changes in the network stack. I'll have to dig.

Cheers,
Ben.
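
Ben's point about a single NAPI instance shared by all EMACs can be pictured with a small user-space sketch. Everything named `mock_*` here is hypothetical, and the budget-sharing logic is an assumption chosen for illustration, not the driver's actual policy. The one detail borrowed from the driver is the `container_of` idiom, which the poll callback uses to get from the embedded poll context back to its owner.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copy of the usual container_of idiom. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_CHANNELS 4

struct mock_napi { int weight; };

/* Hypothetical per-EMAC state: packets waiting to be processed. */
struct mock_channel { int pending; };

/* One shared instance embeds the single poll context for all channels. */
struct mock_mal {
    struct mock_channel *chan[MAX_CHANNELS];
    int nchan;
    struct mock_napi napi;
};

static int mock_mal_attach(struct mock_mal *mal, struct mock_channel *c)
{
    if (mal->nchan >= MAX_CHANNELS)
        return -1;
    mal->chan[mal->nchan++] = c;
    return 0;
}

/* Poll callback: recover the owner from the embedded member, then
 * service each attached channel until the budget is used up. */
static int mock_mal_poll(struct mock_napi *napi, int budget)
{
    struct mock_mal *mal = container_of(napi, struct mock_mal, napi);
    int done = 0;

    for (int i = 0; i < mal->nchan && done < budget; i++) {
        struct mock_channel *c = mal->chan[i];
        int n = c->pending;

        if (n > budget - done)
            n = budget - done;
        c->pending -= n;
        done += n;
    }
    return done;
}
```

Because the poll context belongs to the shared instance rather than to any one EMAC, there is no obvious single net device to hand to the registration call, which is the tension this thread is about.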


* Re: mal_probe crash
  2009-01-07 20:44 mal_probe crash Sean MacLennan
@ 2009-01-08 20:46 ` Josh Boyer
  2009-01-07 22:50   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Josh Boyer @ 2009-01-08 20:46 UTC (permalink / raw)
  To: Sean MacLennan; +Cc: linuxppc-dev

On Wed, Jan 07, 2009 at 03:44:34PM -0500, Sean MacLennan wrote:
>With Linus' latest git, mal_probe crashes. It calls netif_napi_add with
>the first parameter NULL. This was ok since the parameter, a net
>device, was only used if CONFIG_NETPOLL was set.
>
>Now it is always de-referenced. A quick check shows that ibm_newemac is
>the only driver that passed NULL as the first parameter to this call in
>2.6.28.
>
>I don't really follow ibm_newemac changes, so the patch may be waiting
>to be applied. This is really just a heads up.

I haven't heard of that, so I doubt there's a patch pending.  *Sigh*

josh


* Re: mal_probe crash
  2009-01-07 22:50   ` Benjamin Herrenschmidt
@ 2009-01-09 14:42     ` Geert Uytterhoeven
  2009-01-09 22:34       ` Herbert Xu
  2009-01-09 14:49     ` Matthias Fuchs
  1 sibling, 1 reply; 18+ messages in thread
From: Geert Uytterhoeven @ 2009-01-09 14:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Herbert Xu, David S. Miller
  Cc: linuxppc-dev, netdev, Sean MacLennan

On Thu, 8 Jan 2009, Benjamin Herrenschmidt wrote:
> On Thu, 2009-01-08 at 15:46 -0500, Josh Boyer wrote:
> > On Wed, Jan 07, 2009 at 03:44:34PM -0500, Sean MacLennan wrote:
> > >With Linus' latest git, mal_probe crashes. It calls netif_napi_add with
> > >the first parameter NULL. This was ok since the parameter, a net
> > >device, was only used if CONFIG_NETPOLL was set.
> > >
> > >Now it is always de-referenced. A quick check shows that ibm_newemac is
> > >the only driver that passed NULL as the first parameter to this call in
> > >2.6.28.
> > >
> > >I don't really follow ibm_newemac changes, so the patch may be waiting
> > >to be applied. This is really just a heads up.
> > 
> > I haven't heard of that, so I doubt there's a patch pending.  *Sigh*
> 
> There isn't that I know of. The EMAC code creates a single NAPI instance
> for all EMACs and I think used to completely disconnect things. The old
> code created a fake netdev just for NAPI, that became unnecessary with
> the new NAPI stuff.... but it looks like the way we do things now
> displeases some changes in the network stack. I'll have to dig.

Verified on my Sequoia (which has now lost its network :-( ).

The regression/problem (requiring a valid net_device in netif_napi_add(), even
if CONFIG_NETPOLL=n) seems to have been introduced by commit
d565b0a1a9b6ee7dff46e1f68b26b526ac11ae50 ("net: Add Generic Receive Offload
infrastructure").

However, it was already broken before that in the CONFIG_NETPOLL=y case.

So mal_probe() (triggered by mal_init()) needs to know about the net_device
before it has been allocated by emac_probe() (triggered by
of_register_platform_driver(&emac_driver)):

| static int __init emac_init(void)
| {
| 	int rc;
| 
| 	printk(KERN_INFO DRV_DESC ", version " DRV_VERSION "\n");
| 
| 	/* Init debug stuff */
| 	emac_init_debug();
| 
| 	/* Build EMAC boot list */
| 	emac_make_bootlist();
| 
| 	/* Init submodules */
| 	rc = mal_init();
| 	if (rc)
| 		goto err;
| 	rc = zmii_init();
| 	if (rc)
| 		goto err_mal;
| 	rc = rgmii_init();
| 	if (rc)
| 		goto err_zmii;
| 	rc = tah_init();
| 	if (rc)
| 		goto err_rgmii;
| 	rc = of_register_platform_driver(&emac_driver);
| 	if (rc)
| 		goto err_tah;
| 
| 	return 0;
| 
|  err_tah:
| 	tah_exit();
|  err_rgmii:
| 	rgmii_exit();
|  err_zmii:
| 	zmii_exit();
|  err_mal:
| 	mal_exit();
|  err:
| 	return rc;
| }

Can the order of mal_init() and of_register_platform_driver(&emac_driver) be
reversed? If so, some link still has to be made between the mal and the emac
devices.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010
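
The ordering issue Geert points out follows from the shape of emac_init(): submodules come up in a fixed sequence and are torn down in reverse on failure. The user-space sketch below reproduces only that shape. The step names, the `fail_at` knob and the trace buffer are assumptions for illustration; none of them exist in the driver.

```c
#include <assert.h>
#include <string.h>

/* Records the order of calls so the unwinding can be inspected. */
static char trace[32];
static int trace_len;
static int fail_at;            /* hypothetical knob: step that fails, 0 = none */

static void note(char c)
{
    if (trace_len < (int)sizeof(trace) - 1)
        trace[trace_len++] = c;
}

static int step_init(int id, char tag)
{
    if (fail_at == id)
        return -1;
    note(tag);
    return 0;
}

static void step_exit(char tag) { note(tag); }

/* Same shape as emac_init(): bring steps up in order, undo in reverse. */
static int mock_init(void)
{
    int rc;

    rc = step_init(1, 'M');    /* stands in for mal_init() */
    if (rc)
        goto err;
    rc = step_init(2, 'Z');    /* stands in for zmii_init() */
    if (rc)
        goto err_mal;
    rc = step_init(3, 'E');    /* stands in for registering the EMAC driver */
    if (rc)
        goto err_zmii;
    return 0;

err_zmii:
    step_exit('z');
err_mal:
    step_exit('m');
err:
    return rc;
}

/* Helper: run once with a chosen failing step and return the call trace. */
static const char *mock_run(int fail, int *rc)
{
    fail_at = fail;
    trace_len = 0;
    *rc = mock_init();
    trace[trace_len] = '\0';
    return trace;
}
```

In the success trace the MAL step always precedes the EMAC step, so anything done at MAL probe time runs before an EMAC net_device has been allocated.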


* Re: mal_probe crash
  2009-01-07 22:50   ` Benjamin Herrenschmidt
  2009-01-09 14:42     ` Geert Uytterhoeven
@ 2009-01-09 14:49     ` Matthias Fuchs
  2009-01-09 15:02       ` Matthias Fuchs
  2009-01-09 21:09       ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 18+ messages in thread
From: Matthias Fuchs @ 2009-01-09 14:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Sean MacLennan

On Wednesday 07 January 2009 23:50, Benjamin Herrenschmidt wrote:
> On Thu, 2009-01-08 at 15:46 -0500, Josh Boyer wrote:
> > On Wed, Jan 07, 2009 at 03:44:34PM -0500, Sean MacLennan wrote:
> > >With Linus' latest git, mal_probe crashes. It calls netif_napi_add with
> > >the first parameter NULL. This was ok since the parameter, a net
> > >device, was only used if CONFIG_NETPOLL was set.
> > >
> > >Now it is always de-referenced. A quick check shows that ibm_newemac is
> > >the only driver that passed NULL as the first parameter to this call in
> > >2.6.28.
> > >
> > >I don't really follow ibm_newemac changes, so the patch may be waiting
> > >to be applied. This is really just a heads up.
> > 
> > I haven't heard of that, so I doubt there's a patch pending.  *Sigh*
> 
> There isn't that I know of. The EMAC code creates a single NAPI instance
> for all EMACs and I think used to completely disconnect things. The old
> code created a fake netdev just for NAPI, that became unnecessary with
> the new NAPI stuff.... but it looks like the way we do things now
> displeases some changes in the network stack. I'll have to dig.
> 
Could it be that simple? Probably not. It works at first glance on
a 405EP and GPr board. But it might cause problems when more than
one EMAC is up at the same time.

Matthias

[PATCH] powerpc: Fix ibm_newemac driver

Since commit d565b0a1a9b6ee7d, netif_napi_add must be called
with a proper net_device pointer (!= NULL).

Signed-off-by: Matthias Fuchs <matthias.fuchs@esd-electronics.com>
---
 drivers/net/ibm_newemac/core.c |    3 +++
 drivers/net/ibm_newemac/mal.c  |    5 +----
 drivers/net/ibm_newemac/mal.h  |    1 +
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 87a7066..9bd4d6d 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -2767,6 +2767,9 @@ static int __devinit emac_probe(struct of_device *ofdev,
 	if (dev->mdio_dev != NULL)
 		dev->mdio_instance = dev_get_drvdata(&dev->mdio_dev->dev);
 
+	netif_napi_add(ndev, &dev->mal->napi, mal_poll,
+		       CONFIG_IBM_NEW_EMAC_POLL_WEIGHT);
+
 	/* Register with MAL */
 	dev->commac.ops = &emac_commac_ops;
 	dev->commac.dev = dev;
diff --git a/drivers/net/ibm_newemac/mal.c b/drivers/net/ibm_newemac/mal.c
index ecf9798..d5306ae 100644
--- a/drivers/net/ibm_newemac/mal.c
+++ b/drivers/net/ibm_newemac/mal.c
@@ -391,7 +391,7 @@ void mal_poll_enable(struct mal_instance *mal, struct mal_commac *commac)
 	napi_schedule(&mal->napi);
 }
 
-static int mal_poll(struct napi_struct *napi, int budget)
+int mal_poll(struct napi_struct *napi, int budget)
 {
 	struct mal_instance *mal = container_of(napi, struct mal_instance, napi);
 	struct list_head *l;
@@ -613,9 +613,6 @@ static int __devinit mal_probe(struct of_device *ofdev,
 	INIT_LIST_HEAD(&mal->list);
 	spin_lock_init(&mal->lock);
 
-	netif_napi_add(NULL, &mal->napi, mal_poll,
-		       CONFIG_IBM_NEW_EMAC_POLL_WEIGHT);
-
 	/* Load power-on reset defaults */
 	mal_reset(mal);
 
diff --git a/drivers/net/ibm_newemac/mal.h b/drivers/net/ibm_newemac/mal.h
index 2f0a873..51597bd 100644
--- a/drivers/net/ibm_newemac/mal.h
+++ b/drivers/net/ibm_newemac/mal.h
@@ -282,6 +282,7 @@ void mal_disable_rx_channel(struct mal_instance *mal, int channel);
 
 void mal_poll_disable(struct mal_instance *mal, struct mal_commac *commac);
 void mal_poll_enable(struct mal_instance *mal, struct mal_commac *commac);
+int mal_poll(struct napi_struct *napi, int budget);
 
 /* Add/remove EMAC to/from MAL polling list */
 void mal_poll_add(struct mal_instance *mal, struct mal_commac *commac);
-- 
1.5.6.3


* Re: mal_probe crash
  2009-01-09 14:49     ` Matthias Fuchs
@ 2009-01-09 15:02       ` Matthias Fuchs
  2009-01-09 15:24         ` Geert Uytterhoeven
  2009-01-09 21:09       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 18+ messages in thread
From: Matthias Fuchs @ 2009-01-09 15:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Sean MacLennan

Forget my last posting! It's just a dirty workaround that only works with a
single EMAC. It does not work with two EMACs, as on the Sequoia.

Matthias
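
The thread does not say why the patch breaks with two EMACs. One plausible hazard, sketched below under that assumption, is that the same poll context gets registered once per EMAC although it has only one back-pointer and one list link. The `mock_*` types are hypothetical and deliberately simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;

struct mock_napi {
    struct mock_dev *dev;      /* single back-pointer: one owner only */
    struct mock_napi *next;    /* single link: one list only */
};

struct mock_dev {
    struct mock_napi *napi_list;
};

/* Mock registration: link the context into the device's list. */
static void mock_napi_add(struct mock_dev *dev, struct mock_napi *napi)
{
    napi->dev = dev;
    napi->next = dev->napi_list;
    dev->napi_list = napi;
}

/* Count list entries whose back-pointer disagrees with the list they are on. */
static int mock_count_mismatched(const struct mock_dev *dev)
{
    int bad = 0;

    for (const struct mock_napi *n = dev->napi_list; n != NULL; n = n->next)
        if (n->dev != dev)
            bad++;
    return bad;
}
```

Registering the shared context for a second device silently rewrites its owner, so the first device is left with a list entry that claims to belong to someone else.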


* Re: mal_probe crash
  2009-01-09 15:02       ` Matthias Fuchs
@ 2009-01-09 15:24         ` Geert Uytterhoeven
  2009-01-09 21:30           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Geert Uytterhoeven @ 2009-01-09 15:24 UTC (permalink / raw)
  To: Matthias Fuchs; +Cc: linuxppc-dev, Sean MacLennan

On Fri, 9 Jan 2009, Matthias Fuchs wrote:
> Forget my last posting! It's just a dirty work around when having a single EMAC.
> It does not work with two EMACs like on sequoia.

Indeed. It doesn't on my sequoia :-(

I also tried reviving connectivity by adding an Intel PRO/1000 GT network card,
but I got a machine check exception. Don't know if this is a problem with the
PPC44x PCI code or with the e1000 driver.

U-Boot 1.2.0-gc0c292b2 (Jun  5 2007 - 07:16:12)

CPU:   AMCC PowerPC 440EPx Rev. A at 666.666 MHz (PLB=166, OPB=83, EBC=55 MHz)
       Security/Kasumi support
       I2C boot EEPROM enabled
       Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
       Internal PCI arbiter enabled, PCI async ext clock used
       32 kB I-Cache 32 kB D-Cache
Board: Sequoia - AMCC PPC440EPx Evaluation Board, Rev. F, PCI=33 MHz
I2C:   ready
DTT:   1 is 223 C
DRAM:  256 MB
FLASH: 64 MB
NAND:  32 MiB
PCI:   Bus Dev VenId DevId Class Int
        00  0c  8086  107c  0200  00
In:    serial
Out:   serial
Err:   serial
USB:   Host(int phy) Device(ext phy)
Net:   ppc_4xx_eth0, ppc_4xx_eth1

Type "run flash_nfs" to mount root filesystem over NFS

Hit any key to stop autoboot:  0 
Waiting for PHY auto negotiation to complete.. done
ENET Speed is 100 Mbps - FULL duplex connection (EMAC0)
BOOTP broadcast 1
DHCP client bound to address 192.168.106.188
Using ppc_4xx_eth0 device
TFTP from server 192.168.106.200; our IP address is 192.168.106.188
Filename '/sequoia/cuImage.sequoia'.
Load address: 0x100000
Loading: #################################################################
         #################################################################
         #################################################################
         #################################################################
         #############################################
done
Bytes transferred = 1556529 (17c031 hex)
## Booting image at 00100000 ...
   Image Name:   Linux-2.6.28-07939-g2150edc-dirt
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    1556465 Bytes =  1.5 MB
   Load Address: 00400000
   Entry Point:  00400458
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
CPU clock-frequency <- 0x27bc86a4 (667MHz)
CPU timebase-frequency <- 0x27bc86a4 (667MHz)
/plb: clock-frequency <- 9ef21a9 (167MHz)
/plb/opb: clock-frequency <- 4f790d4 (83MHz)
/plb/opb/ebc: clock-frequency <- 34fb5e3 (56MHz)
/plb/opb/serial@ef600300: clock-frequency <- a8c000 (11MHz)
/plb/opb/serial@ef600400: clock-frequency <- a8c000 (11MHz)
/plb/opb/serial@ef600500: clock-frequency <- 42ecac (4MHz)
/plb/opb/serial@ef600600: clock-frequency <- 42ecac (4MHz)
Memory <- <0x0 0x0 0xffff000> (255MB)
ethernet0: local-mac-address <- 00:10:ec:00:f1:df
ethernet1: local-mac-address <- 00:10:ec:80:f1:df

zImage starting: loaded at 0x00400000 (sp: 0x0ff2ba18)
Allocating 0x333834 bytes for kernel ...
gunzipping (0x00000000 <- 0x0040e000:0x00735820)...done 0x31417c bytes

Linux/PowerPC load: ip=on root=/dev/nfs
Finalizing device tree... flat tree at 0x742300
Using PowerPC 44x Platform machine description
Linux version 2.6.28-07939-g2150edc-dirty (geert@vixen) (gcc version 4.3.2 (GCC) ) #4 Fri Jan 9 16:05:53 CET 2009
console [udbg0] enabled
setup_arch: bootmem
arch: exit
Zone PFN ranges:
  DMA      0x00000000 -> 0x0000ffff
  Normal   0x0000ffff -> 0x0000ffff
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0: 0x00000000 -> 0x0000ffff
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65023
Kernel command line: ip=on root=/dev/nfs
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
UIC2 (32 IRQ sources) at DCR 0xe0
PID hash table entries: 1024 (order: 10, 4096 bytes)
clocksource: timebase mult[600000] shift[22] registered
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256256k/262140k available (2996k kernel code, 5572k reserved, 128k data, 122k bss, 156k init)
SLUB: Genslabs=10, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Calibrating delay loop... 1331.20 BogoMIPS (lpj=2662400)
Mount-cache hash table entries: 512
net_namespace: 716 bytes
NET: Registered protocol family 16
             
PCI host bridge /plb/pci@1ec000000 (primary) ranges:
 MEM 0x0000000180000000..0x00000001bfffffff -> 0x0000000080000000 
  IO 0x00000001e8000000..0x00000001e800ffff -> 0x0000000000000000
  IO 0x00000001e8800000..0x00000001ebffffff -> 0x0000000000000000
 \--> Skipped (too many) !
4xx PCI DMA offset set to 0x00000000
/plb/pci@1ec000000: Resource out of range
PCI: Probing PCI hardware
PCI: Hiding 4xx host bridge resources 0000:00:00.0
pci 0000:00:0c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:0c.0: PME# disabled
bio: create slab <bio-0> at 0
NET: Registered protocol family 2
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP reno registered
NET: Registered protocol family 1
JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
msgmni has been set to 501
alg: No test for stdrng (krng)
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0x1ef600300 (irq = 17) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
serial8250.0: ttyS1 at MMIO 0x1ef600400 (irq = 18) is a 16550A
serial8250.0: ttyS2 at MMIO 0x1ef600500 (irq = 19) is a 16550A
serial8250.0: ttyS3 at MMIO 0x1ef600600 (irq = 20) is a 16550A
1ef600300.serial: ttyS0 at MMIO 0x1ef600300 (irq = 17) is a 16550A
1ef600400.serial: ttyS1 at MMIO 0x1ef600400 (irq = 18) is a 16550A
1ef600500.serial: ttyS2 at MMIO 0x1ef600500 (irq = 19) is a 16550A
1ef600600.serial: ttyS3 at MMIO 0x1ef600600 (irq = 20) is a 16550A
brd: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
Copyright (c) 1999-2006 Intel Corporation.
e1000 0000:00:0c.0: enabling device (0000 -> 0003)
Machine check in kernel mode.
Data Read PLB Error
Oops: Machine check, sig: 7 [#1]
PowerPC 44x Platform
Modules linked in:
NIP: c0187cb8 LR: c0236300 CTR: c0187bb0
REGS: cfff7f10 TRAP: 0214   Not tainted  (2.6.28-07939-g2150edc-dirty)
MSR: 00029000 <EE,ME,CE>  CR: 28d6cb24  XER: 20000000
TASK = cf818400[1] 'swapper' THREAD: cf828000
GPR00: 00000000 cf829db0 cf818400 cf8114fc 00000004 00000000 00000002 cf829d88 
GPR08: 00000000 d10c0008 00000000 0000000b 00001000 00108000 0ffb2400 00000001 
GPR16: 007fff13 00400458 00800000 c032d69c c024bfc4 c0330000 cf8114fc 00000001 
GPR24: 00000000 00000001 00000047 cf811000 cf811320 cf811000 00000001 cf83d400 
NIP [c0187cb8] e1000_set_media_type+0x64/0xe4
LR [c0236300] e1000_probe+0x334/0xd5c
Call Trace:
[cf829db0] [c02362b4] e1000_probe+0x2e8/0xd5c (unreliable)
[cf829e10] [c015c018] local_pci_probe+0x24/0x34
[cf829e20] [c015c240] pci_device_probe+0x84/0xa8
[cf829e50] [c017b948] driver_probe_device+0xb4/0x1e8
[cf829e70] [c017bb20] __driver_attach+0xa4/0xa8
[cf829e90] [c017b0fc] bus_for_each_dev+0x70/0xac
[cf829ec0] [c017b760] driver_attach+0x24/0x34
[cf829ed0] [c017aa04] bus_add_driver+0x1d0/0x244
[cf829ef0] [c017bd40] driver_register+0x70/0x160
[cf829f10] [c015c4e8] __pci_register_driver+0x4c/0xac
[cf829f30] [c02dfb30] e1000_init_module+0x58/0xa8
[cf829f50] [c00013d8] do_one_initcall+0x34/0x1b0
[cf829fc0] [c02c6178] kernel_init+0x94/0x100
[cf829ff0] [c000da64] kernel_thread+0x50/0x6c
Instruction dump:
409c0080 2f8b0010 419e006c 2b8b0010 419d005c 380bffff 2b800001 409d0074 
81230000 39290008 7c0004ac 7c004c2c <0c000000> 4c00012c 70000020 40820060 
---[ end trace 85643a8ae0783f0b ]---
Kernel panic - not syncing: Attempted to kill init!
Rebooting in 180 seconds..


With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010


* Re: mal_probe crash
  2009-01-09 14:49     ` Matthias Fuchs
  2009-01-09 15:02       ` Matthias Fuchs
@ 2009-01-09 21:09       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-09 21:09 UTC (permalink / raw)
  To: Matthias Fuchs; +Cc: linuxppc-dev, Sean MacLennan


> Could it be that simple? Probably not. It works at first glance on
> a 405EP and GPr board. But it might cause problems when more than
> one EMAC is up at the same time.

I talked with the network folks and that should be ok.

We only need to be a bit careful in case, for some reason, the EMAC we
linked to NAPI gets removed/destroyed... We only remove them all at once
for now, but heh..

Ben.



* Re: mal_probe crash
  2009-01-09 15:24         ` Geert Uytterhoeven
@ 2009-01-09 21:30           ` Benjamin Herrenschmidt
  2009-01-09 22:01             ` Roland Dreier
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-09 21:30 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Sean MacLennan, linuxppc-dev

On Fri, 2009-01-09 at 16:24 +0100, Geert Uytterhoeven wrote:
> On Fri, 9 Jan 2009, Matthias Fuchs wrote:
> > Forget my last posting! It's just a dirty work around when having a single EMAC.
> > It does not work with two EMACs like on sequoia.
> 
> Indeed. It doesn't on my sequoia :-(
> 
> I also tried reviving connectivity by adding an Intel PRO/1000 GT network card,
> but I got a machine check exception. Don't know if this is a problem with the
> PPC44x PCI code or with the e1000 driver.

Can you double check that the e1000 isn't copying the PCI resources into
an unsigned long before ioremap'ing the result, thus cropping the top
bits?

It had a bug like that for which I sent a fix a while ago but maybe that
crept back in...

Cheers,
Ben.
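
The truncation Ben is asking about is easy to show in user space. In this sketch `mock_ulong32` models a 32-bit unsigned long and `mock_resource_size` models a resource type wide enough for the full physical address; both typedefs and both functions are assumptions for illustration. The test address is the start of the PCI MEM window printed in the boot log. Whether this is what actually bites the e1000 here is not established in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit unsigned long on a platform with wider physical addresses. */
typedef uint32_t mock_ulong32;
/* Model of a resource type wide enough for the full physical address. */
typedef uint64_t mock_resource_size;

/* Buggy pattern: squeeze the resource start through a 32-bit variable. */
static mock_resource_size mock_start_via_ulong(mock_resource_size start)
{
    mock_ulong32 tmp = (mock_ulong32)start;   /* top bits are dropped here */

    return tmp;
}

/* Safe pattern: keep the wide type end to end. */
static mock_resource_size mock_start_wide(mock_resource_size start)
{
    return start;
}
```

Passing 0x0000000180000000 through the narrow variable yields 0x80000000, which would point the mapping at the wrong physical address.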

> U-Boot 1.2.0-gc0c292b2 (Jun  5 2007 - 07:16:12)
> 
> CPU:   AMCC PowerPC 440EPx Rev. A at 666.666 MHz (PLB=166, OPB=83, EBC=55 MHz)
>        Security/Kasumi support
>        I2C boot EEPROM enabled
>        Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
>        Internal PCI arbiter enabled, PCI async ext clock used
>        32 kB I-Cache 32 kB D-Cache
> Board: Sequoia - AMCC PPC440EPx Evaluation Board, Rev. F, PCI=33 MHz
> I2C:   ready
> DTT:   1 is 223 C
> DRAM:  256 MB
> FLASH: 64 MB
> NAND:  32 MiB
> PCI:   Bus Dev VenId DevId Class Int
>         00  0c  8086  107c  0200  00
> In:    serial
> Out:   serial
> Err:   serial
> USB:   Host(int phy) Device(ext phy)
> Net:   ppc_4xx_eth0, ppc_4xx_eth1
> 
> Type "run flash_nfs" to mount root filesystem over NFS
> 
> Hit any key to stop autoboot:  0 
> Waiting for PHY auto negotiation to complete.. done
> ENET Speed is 100 Mbps - FULL duplex connection (EMAC0)
> BOOTP broadcast 1
> DHCP client bound to address 192.168.106.188
> Using ppc_4xx_eth0 device
> TFTP from server 192.168.106.200; our IP address is 192.168.106.188
> Filename '/sequoia/cuImage.sequoia'.
> Load address: 0x100000
> Loading: #################################################################
>          #################################################################
>          #################################################################
>          #################################################################
>          #############################################
> done
> Bytes transferred = 1556529 (17c031 hex)
> ## Booting image at 00100000 ...
>    Image Name:   Linux-2.6.28-07939-g2150edc-dirt
>    Image Type:   PowerPC Linux Kernel Image (gzip compressed)
>    Data Size:    1556465 Bytes =  1.5 MB
>    Load Address: 00400000
>    Entry Point:  00400458
>    Verifying Checksum ... OK
>    Uncompressing Kernel Image ... OK
> CPU clock-frequency <- 0x27bc86a4 (667MHz)
> CPU timebase-frequency <- 0x27bc86a4 (667MHz)
> /plb: clock-frequency <- 9ef21a9 (167MHz)
> /plb/opb: clock-frequency <- 4f790d4 (83MHz)
> /plb/opb/ebc: clock-frequency <- 34fb5e3 (56MHz)
> /plb/opb/serial@ef600300: clock-frequency <- a8c000 (11MHz)
> /plb/opb/serial@ef600400: clock-frequency <- a8c000 (11MHz)
> /plb/opb/serial@ef600500: clock-frequency <- 42ecac (4MHz)
> /plb/opb/serial@ef600600: clock-frequency <- 42ecac (4MHz)
> Memory <- <0x0 0x0 0xffff000> (255MB)
> ethernet0: local-mac-address <- 00:10:ec:00:f1:df
> ethernet1: local-mac-address <- 00:10:ec:80:f1:df
> 
> zImage starting: loaded at 0x00400000 (sp: 0x0ff2ba18)
> Allocating 0x333834 bytes for kernel ...
> gunzipping (0x00000000 <- 0x0040e000:0x00735820)...done 0x31417c bytes
> 
> Linux/PowerPC load: ip=on root=/dev/nfs
> Finalizing device tree... flat tree at 0x742300
> Using PowerPC 44x Platform machine description
> Linux version 2.6.28-07939-g2150edc-dirty (geert@vixen) (gcc version 4.3.2 (GCC) ) #4 Fri Jan 9 16:05:53 CET 2009
> console [udbg0] enabled
> setup_arch: bootmem
> arch: exit
> Zone PFN ranges:
>   DMA      0x00000000 -> 0x0000ffff
>   Normal   0x0000ffff -> 0x0000ffff
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     0: 0x00000000 -> 0x0000ffff
> MMU: Allocated 1088 bytes of context maps for 255 contexts
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65023
> Kernel command line: ip=on root=/dev/nfs
> UIC0 (32 IRQ sources) at DCR 0xc0
> UIC1 (32 IRQ sources) at DCR 0xd0
> UIC2 (32 IRQ sources) at DCR 0xe0
> PID hash table entries: 1024 (order: 10, 4096 bytes)
> clocksource: timebase mult[600000] shift[22] registered
> Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> Memory: 256256k/262140k available (2996k kernel code, 5572k reserved, 128k data, 122k bss, 156k init)
> SLUB: Genslabs=10, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> Calibrating delay loop... 1331.20 BogoMIPS (lpj=2662400)
> Mount-cache hash table entries: 512
> net_namespace: 716 bytes
> NET: Registered protocol family 16
>              
> PCI host bridge /plb/pci@1ec000000 (primary) ranges:
>  MEM 0x0000000180000000..0x00000001bfffffff -> 0x0000000080000000 
>   IO 0x00000001e8000000..0x00000001e800ffff -> 0x0000000000000000
>   IO 0x00000001e8800000..0x00000001ebffffff -> 0x0000000000000000
>  \--> Skipped (too many) !
> 4xx PCI DMA offset set to 0x00000000
> /plb/pci@1ec000000: Resource out of range
> PCI: Probing PCI hardware
> PCI: Hiding 4xx host bridge resources 0000:00:00.0
> pci 0000:00:0c.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:0c.0: PME# disabled
> bio: create slab <bio-0> at 0
> NET: Registered protocol family 2
> IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
> TCP established hash table entries: 8192 (order: 4, 65536 bytes)
> TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
> TCP: Hash tables configured (established 8192 bind 8192)
> TCP reno registered
> NET: Registered protocol family 1
> JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
> msgmni has been set to 501
> alg: No test for stdrng (krng)
> io scheduler noop registered
> io scheduler anticipatory registered (default)
> io scheduler deadline registered
> io scheduler cfq registered
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> serial8250.0: ttyS0 at MMIO 0x1ef600300 (irq = 17) is a 16550A
> console handover: boot [udbg0] -> real [ttyS0]
> serial8250.0: ttyS1 at MMIO 0x1ef600400 (irq = 18) is a 16550A
> serial8250.0: ttyS2 at MMIO 0x1ef600500 (irq = 19) is a 16550A
> serial8250.0: ttyS3 at MMIO 0x1ef600600 (irq = 20) is a 16550A
> 1ef600300.serial: ttyS0 at MMIO 0x1ef600300 (irq = 17) is a 16550A
> 1ef600400.serial: ttyS1 at MMIO 0x1ef600400 (irq = 18) is a 16550A
> 1ef600500.serial: ttyS2 at MMIO 0x1ef600500 (irq = 19) is a 16550A
> 1ef600600.serial: ttyS3 at MMIO 0x1ef600600 (irq = 20) is a 16550A
> brd: module loaded
> Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
> Copyright (c) 1999-2006 Intel Corporation.
> e1000 0000:00:0c.0: enabling device (0000 -> 0003)
> Machine check in kernel mode.
> Data Read PLB Error
> Oops: Machine check, sig: 7 [#1]
> PowerPC 44x Platform
> Modules linked in:
> NIP: c0187cb8 LR: c0236300 CTR: c0187bb0
> REGS: cfff7f10 TRAP: 0214   Not tainted  (2.6.28-07939-g2150edc-dirty)
> MSR: 00029000 <EE,ME,CE>  CR: 28d6cb24  XER: 20000000
> TASK = cf818400[1] 'swapper' THREAD: cf828000
> GPR00: 00000000 cf829db0 cf818400 cf8114fc 00000004 00000000 00000002 cf829d88 
> GPR08: 00000000 d10c0008 00000000 0000000b 00001000 00108000 0ffb2400 00000001 
> GPR16: 007fff13 00400458 00800000 c032d69c c024bfc4 c0330000 cf8114fc 00000001 
> GPR24: 00000000 00000001 00000047 cf811000 cf811320 cf811000 00000001 cf83d400 
> NIP [c0187cb8] e1000_set_media_type+0x64/0xe4
> LR [c0236300] e1000_probe+0x334/0xd5c
> Call Trace:
> [cf829db0] [c02362b4] e1000_probe+0x2e8/0xd5c (unreliable)
> [cf829e10] [c015c018] local_pci_probe+0x24/0x34
> [cf829e20] [c015c240] pci_device_probe+0x84/0xa8
> [cf829e50] [c017b948] driver_probe_device+0xb4/0x1e8
> [cf829e70] [c017bb20] __driver_attach+0xa4/0xa8
> [cf829e90] [c017b0fc] bus_for_each_dev+0x70/0xac
> [cf829ec0] [c017b760] driver_attach+0x24/0x34
> [cf829ed0] [c017aa04] bus_add_driver+0x1d0/0x244
> [cf829ef0] [c017bd40] driver_register+0x70/0x160
> [cf829f10] [c015c4e8] __pci_register_driver+0x4c/0xac
> [cf829f30] [c02dfb30] e1000_init_module+0x58/0xa8
> [cf829f50] [c00013d8] do_one_initcall+0x34/0x1b0
> [cf829fc0] [c02c6178] kernel_init+0x94/0x100
> [cf829ff0] [c000da64] kernel_thread+0x50/0x6c
> Instruction dump:
> 409c0080 2f8b0010 419e006c 2b8b0010 419d005c 380bffff 2b800001 409d0074 
> 81230000 39290008 7c0004ac 7c004c2c <0c000000> 4c00012c 70000020 40820060 
> ---[ end trace 85643a8ae0783f0b ]---
> Kernel panic - not syncing: Attempted to kill init!
> Rebooting in 180 seconds..
> 
> 
> With kind regards,
> 
> Geert Uytterhoeven
> Software Architect
> 
> Sony Techsoft Centre Europe
> The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
> 
> Phone:    +32 (0)2 700 8453
> Fax:      +32 (0)2 700 8622
> E-mail:   Geert.Uytterhoeven@sonycom.com
> Internet: http://www.sony-europe.com/
> 
> A division of Sony Europe (Belgium) N.V.
> VAT BE 0413.825.160 · RPR Brussels
> Fortis · BIC GEBABEBB · IBAN BE41293037680010

* Re: mal_probe crash
  2009-01-09 21:30           ` Benjamin Herrenschmidt
@ 2009-01-09 22:01             ` Roland Dreier
  2009-01-12 13:37               ` Geert Uytterhoeven
  0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2009-01-09 22:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Geert Uytterhoeven, linuxppc-dev, Sean MacLennan

 > Can you double check that the e1000 isn't copying the PCI resources into
 > a unsigned long before ioremap'ing the result, thus cropping the top
 > bits ?

as far as I can see, e1000 is using pci_ioremap_bar(), which should do
the right thing as long as resource_size_t is the right type (which it
looks like it is on PowerPC 44x).

 - R.
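
The truncation being asked about can be sketched in a few lines. This is only a model, assuming a 32-bit `unsigned long` and a 64-bit resource_size_t; the address is a hypothetical 36-bit BAR value and the helper `store()` is illustrative, not anything from the e1000 driver.

```python
# Sketch: storing a 36-bit physical address in a 32-bit variable loses the top bits.
# The widths and the address below are illustrative assumptions.

MASK_32 = (1 << 32) - 1   # models a 32-bit `unsigned long`
MASK_64 = (1 << 64) - 1   # models a 64-bit resource_size_t


def store(value: int, mask: int) -> int:
    """Model assigning `value` to a fixed-width unsigned variable."""
    return value & mask


bar_start = 0x1_8000_0000  # hypothetical 36-bit PCI memory BAR address

as_unsigned_long = store(bar_start, MASK_32)   # top bits cropped
as_resource_size = store(bar_start, MASK_64)   # full address preserved

print(hex(as_unsigned_long))
print(hex(as_resource_size))
```

If a driver ioremap'ed the cropped value it would map the wrong physical range, which is why the type used to carry the resource matters here.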

* Re: mal_probe crash
  2009-01-09 14:42     ` Geert Uytterhoeven
@ 2009-01-09 22:34       ` Herbert Xu
  2009-01-09 23:13         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Xu @ 2009-01-09 22:34 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev, netdev, Sean MacLennan, David S. Miller

On Fri, Jan 09, 2009 at 03:42:25PM +0100, Geert Uytterhoeven wrote:
> On Thu, 8 Jan 2009, Benjamin Herrenschmidt wrote:
>
> > There isn't that I know of. The EMAC code creates a single NAPI instance
> > for all EMACs and I think used to completely disconnect things. The old
> > code created a fake netdev just for NAPI, that became unnecessary with
> > the new NAPI stuff.... but it looks like the way we do things now
> > displeases some changes in the network stack. I'll have to dig.
> 
> Verified on my Sequoia (which now lost its network :-(
> 
> The regression/problem (requiring a valid net_device in netif_napi_add(), even
> if CONFIG_NETPOLL=n) seems to be introduced by commit
> d565b0a1a9b6ee7dff46e1f68b26b526ac11ae50 ("net: Add Generic Receive Offload
> infrastructure").

Yes, EMAC just needs to go back to the old fake dev setup.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: mal_probe crash
  2009-01-09 22:34       ` Herbert Xu
@ 2009-01-09 23:13         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-09 23:13 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, linuxppc-dev, Sean MacLennan, Geert Uytterhoeven,
	David S. Miller

On Sat, 2009-01-10 at 09:34 +1100, Herbert Xu wrote:
> On Fri, Jan 09, 2009 at 03:42:25PM +0100, Geert Uytterhoeven wrote:
> > On Thu, 8 Jan 2009, Benjamin Herrenschmidt wrote:
> >
> > > There isn't that I know of. The EMAC code creates a single NAPI instance
> > > for all EMACs and I think used to completely disconnect things. The old
> > > code created a fake netdev just for NAPI, that became unnecessary with
> > > the new NAPI stuff.... but it looks like the way we do things now
> > > displeases some changes in the network stack. I'll have to dig.
> > 
> > Verified on my Sequoia (which now lost its network :-(
> > 
> > The regression/problem (requiring a valid net_device in netif_napi_add(), even
> > if CONFIG_NETPOLL=n) seems to be introduced by commit
> > d565b0a1a9b6ee7dff46e1f68b26b526ac11ae50 ("net: Add Generic Receive Offload
> > infrastructure").
> 
> Yes EMAC just needs to go back to the old fake dev setup.

One thing I wanted to do back then, which triggered the discussion
with Stephen just before he split NAPI off from netdev, was to add a core
function that creates such a dummy netdev, so that drivers don't
break every time some internal field changes.

I'll give that a spin asap, though it might have to wait until Monday.

Cheers,
Ben.

* Re: mal_probe crash
  2009-01-09 22:01             ` Roland Dreier
@ 2009-01-12 13:37               ` Geert Uytterhoeven
  2009-01-12 21:36                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Geert Uytterhoeven @ 2009-01-12 13:37 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Linux/PPC Development, Sean MacLennan

On Fri, 9 Jan 2009, Roland Dreier wrote:
>  > Can you double check that the e1000 isn't copying the PCI resources into
>  > a unsigned long before ioremap'ing the result, thus cropping the top
>  > bits ?
> 
> as far as I can see, e1000 is using pci_ioremap_bar(), which should do
> the right thing as long as resource_size_t is the right type (which it
> looks like it is on PowerPC 44x).

Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
as evidenced from the additional debug output below (see [1]).

As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
2.0 card in the second PCI slot, and got a similar crash (see [2]).

Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
never used them before...

[1] E1000 probe with more debug info:
| Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
| Copyright (c) 1999-2006 Intel Corporation.
| e1000 0000:00:0a.0: enabling device (0000 -> 0003)
| resource 0: [0x180000000-0x18001ffff]
| resource 1: [0x180020000-0x18003ffff]
| resource 2: [0x1000-0x103f]
| resource 3: [0x0-0x0]
| resource 4: [0x0-0x0]
| resource 5: [0x0-0x0]
| __ioremap: addr 0x180000000, size 131072, flags 0x500
|   v = 0xd10c0000
|   ((unsigned long)addr & ~PAGE_MASK) = 0x0
|   return d10c0000
| hw->hw_addr = d10c0000
| e1000_set_media_type:502: hw = cf8114fc
| e1000_set_media_type:503: hw->hw_addr = d10c0000
| e1000_set_media_type:509: 
| e1000_set_media_type:534: er32(STATUS) will do a readl() on d10c0008
| Machine check in kernel mode.
| Data Read PLB Error
| Oops: Machine check, sig: 7 [#1]
| PowerPC 44x Platform
| Modules linked in:
| NIP: c0188b48 LR: c0188b38 CTR: c01732f8
| REGS: cfff7f10 TRAP: 0214   Not tainted  (2.6.28-07939-g2150edc-dirty)
| MSR: 00029000 <EE,ME,CE>  CR: 28f60b22  XER: 20000000
| TASK = cf818400[1] 'swapper' THREAD: cf828000
| GPR00: c0188b38 cf829d90 cf818400 00000048 00001a88 ffffffff c0173d7c 00003fff 
| GPR08: 00000000 d10c0008 00003fff 00001a88 28f60b22 01000030 0ffb2400 00000001 
| GPR16: 007fff13 00400458 00800000 c032f69c c024cfc4 c0330000 cf8114fc 00000001 
| GPR24: cf811000 cf811320 00000001 00000047 c02a27dc 00000000 c024d524 cf8114fc 
| NIP [c0188b48] e1000_set_media_type+0xf4/0x1ec
| LR [c0188b38] e1000_set_media_type+0xe4/0x1ec
| Call Trace:
| [cf829d90] [c0188b38] e1000_set_media_type+0xe4/0x1ec (unreliable)
| [cf829db0] [c0236c84] e1000_probe+0x3c0/0xdf4
| [cf829e10] [c015c0bc] local_pci_probe+0x24/0x34
| [cf829e20] [c015c2e4] pci_device_probe+0x84/0xa8
| [cf829e50] [c017b9ec] driver_probe_device+0xb4/0x1e8
| [cf829e70] [c017bbc4] __driver_attach+0xa4/0xa8
| [cf829e90] [c017b1a0] bus_for_each_dev+0x70/0xac
| [cf829ec0] [c017b804] driver_attach+0x24/0x34
| [cf829ed0] [c017aaa8] bus_add_driver+0x1d0/0x244
| [cf829ef0] [c017bde4] driver_register+0x70/0x160
| [cf829f10] [c015c58c] __pci_register_driver+0x4c/0xac
| [cf829f30] [c02e1b30] e1000_init_module+0x58/0xa8
| [cf829f50] [c00013d8] do_one_initcall+0x34/0x1b0
| [cf829fc0] [c02c8178] kernel_init+0x94/0x100
| [cf829ff0] [c000da64] kernel_thread+0x50/0x6c
| Instruction dump:
| 409d00e8 80df0000 3c60c02a 7fc4f378 38a00216 386327e8 38c60008 480aaa19 
| 813f0000 39290008 7c0004ac 7fa04c2c <0c1d0000> 4c00012c 3c60c02a 7fa6eb78 
| ---[ end trace 6c682238ca36f67d ]---
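
The addresses in the debug output of [1] can be cross-checked with a little arithmetic. This is a sketch only: the numeric values are copied from the log lines above, while the 4 KiB page size is an assumption (the thread does not state it) and the variable names are illustrative.

```python
# Values copied from the debug output in [1]; PAGE_SIZE is an assumption.
PAGE_SIZE = 4096                       # assumed 4 KiB pages
PAGE_MASK = ~(PAGE_SIZE - 1)           # same convention as the kernel macro

phys_addr = 0x180000000                # "__ioremap: addr 0x180000000"
virt_base = 0xD10C0000                 # "v = 0xd10c0000"
status_off = 0x8                       # implied by hw_addr d10c0000 -> readl() on d10c0008

page_offset = phys_addr & ~PAGE_MASK   # offset of the BAR within its page
hw_addr = virt_base + page_offset      # value returned by __ioremap()
status_reg = hw_addr + status_off      # address read by er32(STATUS)

print(hex(page_offset), hex(hw_addr), hex(status_reg))
```

The computed register address agrees with the d10c0008 held in GPR09 in the register dump, so the mapping itself looks consistent and the machine check happens on the access to the device.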

[2] EHCI probe:
| ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
| ehci_hcd 0000:00:0c.2: enabling device (0000 -> 0002)
| ehci_hcd 0000:00:0c.2: EHCI Host Controller
| ehci_hcd 0000:00:0c.2: new USB bus registered, assigned bus number 1
| Machine check in kernel mode.
| Data Read PLB Error
| Oops: Machine check, sig: 7 [#1]
| PowerPC 44x Platform
| Modules linked in:
| NIP: c01baebc LR: c01a8264 CTR: c01badcc
| REGS: cfff7f10 TRAP: 0214   Not tainted  (2.6.28-07939-g2150edc-dirty)
| MSR: 00029000 <EE,ME,CE>  CR: 28d60324  XER: 20000000
| TASK = cf818400[1] 'swapper' THREAD: cf828000
| GPR00: 00000000 cf829d80 cf818400 cfa0f800 00000001 00000000 c026dd0c fffffffd 
| GPR08: 00000000 d1098000 00000000 d1098000 48d60342 01208030 0ffb2400 00000001 
| GPR16: 007fff13 00400458 00800000 ffffffff 007fff00 0ffadd68 00000000 00000001 
| GPR24: 00000000 c032de18 00000010 000000a0 cf8c6400 cf84c000 cfa0f8b8 cfa0f800 
| NIP [c01baebc] ehci_pci_setup+0xf0/0x600
| LR [c01a8264] usb_add_hcd+0x1a8/0x5e8
| Call Trace:
| [cf829d80] [00000001] 0x1 (unreliable)
| [cf829db0] [c01a8264] usb_add_hcd+0x1a8/0x5e8
| [cf829de0] [c01b395c] usb_hcd_pci_probe+0x158/0x2e4
| [cf829e10] [c015c0bc] local_pci_probe+0x24/0x34
| [cf829e20] [c015c2e4] pci_device_probe+0x84/0xa8
| [cf829e50] [c017b9ec] driver_probe_device+0xb4/0x1e8
| [cf829e70] [c017bbc4] __driver_attach+0xa4/0xa8
| [cf829e90] [c017b1a0] bus_for_each_dev+0x70/0xac
| [cf829ec0] [c017b804] driver_attach+0x24/0x34
| [cf829ed0] [c017aaa8] bus_add_driver+0x1d0/0x244
| [cf829ef0] [c017bde4] driver_register+0x70/0x160
| [cf829f10] [c015c58c] __pci_register_driver+0x4c/0xac
| [cf829f30] [c03061a0] ehci_hcd_init+0xb0/0xf0
| [cf829f50] [c00013d8] do_one_initcall+0x34/0x1b0
| [cf829fc0] [c02ec178] kernel_init+0x94/0x100
| [cf829ff0] [c000da64] kernel_thread+0x50/0x6c
| Instruction dump:
| 2f8001b5 409eff68 801e00e0 64002000 901e00e0 74082000 813f008c 7d2b4b78 
| 913f00b8 40a2ff60 7c0004ac 7c004c2c <0c000000> 4c00012c 5400063e 7c0b0214 
| ---[ end trace 8e7aeede5368187f ]---

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

* Re: mal_probe crash
  2009-01-12 13:37               ` Geert Uytterhoeven
@ 2009-01-12 21:36                 ` Benjamin Herrenschmidt
  2009-01-12 22:48                   ` Josh Boyer
  2009-01-12 22:51                   ` Re[2]: " Yuri Tikhonov
  0 siblings, 2 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-12 21:36 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linux/PPC Development, Roland Dreier, Sean MacLennan

On Mon, 2009-01-12 at 14:37 +0100, Geert Uytterhoeven wrote:
> On Fri, 9 Jan 2009, Roland Dreier wrote:
> >  > Can you double check that the e1000 isn't copying the PCI resources into
> >  > a unsigned long before ioremap'ing the result, thus cropping the top
> >  > bits ?
> > 
> > as far as I can see, e1000 is using pci_ioremap_bar(), which should do
> > the right thing as long as resource_size_t is the right type (which it
> > looks like it is on PowerPC 44x).
> 
> Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
> as evidenced from the additional debug output below (see [1]).
> 
> As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
> 2.0 card in the second PCI slot, and got a similar crash (see [2]).
> 
> Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
> never used them before...

Hrm, something is indeed wrong, though it's hard to say what. My Canyonlands
works fine (460EPx) and I can try a Taishan one of these days (440GX
iirc). What is in the Sequoia? I think it's a GX, no?

Could be something in the device-tree ?

Cheers,
Ben.

* Re: mal_probe crash
  2009-01-12 21:36                 ` Benjamin Herrenschmidt
@ 2009-01-12 22:48                   ` Josh Boyer
  2009-01-12 22:51                   ` Re[2]: " Yuri Tikhonov
  1 sibling, 0 replies; 18+ messages in thread
From: Josh Boyer @ 2009-01-12 22:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Geert Uytterhoeven, Linux/PPC Development, Roland Dreier,
	Sean MacLennan

On Tue, Jan 13, 2009 at 08:36:32AM +1100, Benjamin Herrenschmidt wrote:
>On Mon, 2009-01-12 at 14:37 +0100, Geert Uytterhoeven wrote:
>> On Fri, 9 Jan 2009, Roland Dreier wrote:
>> >  > Can you double check that the e1000 isn't copying the PCI resources into
>> >  > a unsigned long before ioremap'ing the result, thus cropping the top
>> >  > bits ?
>> > 
>> > as far as I can see, e1000 is using pci_ioremap_bar(), which should do
>> > the right thing as long as resource_size_t is the right type (which it
>> > looks like it is on PowerPC 44x).
>> 
>> Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
>> as evidenced from the additional debug output below (see [1]).
>> 
>> As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
>> 2.0 card in the second PCI slot, and got a similar crash (see [2]).
>> 
>> Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
>> never used them before...
>
>Hrm, something is indeed wrong, hard to say what tho. My canyonlands
>works fine (460EPx) and I can try a Taishan one of these days (440GX
>iirc). What is in sequoia ? I think it's a GX no ?

440EPx.

josh

* Re[2]: mal_probe crash
  2009-01-12 21:36                 ` Benjamin Herrenschmidt
  2009-01-12 22:48                   ` Josh Boyer
@ 2009-01-12 22:51                   ` Yuri Tikhonov
  2009-01-13  2:52                     ` Benjamin Herrenschmidt
  2009-01-13 16:19                     ` Geert Uytterhoeven
  1 sibling, 2 replies; 18+ messages in thread
From: Yuri Tikhonov @ 2009-01-12 22:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Geert Uytterhoeven, Linux/PPC Development, Roland Dreier,
	Sean MacLennan

On Tuesday, January 13, 2009 you wrote:

> On Mon, 2009-01-12 at 14:37 +0100, Geert Uytterhoeven wrote:
>> On Fri, 9 Jan 2009, Roland Dreier wrote:
>> >  > Can you double check that the e1000 isn't copying the PCI resources into
>> >  > a unsigned long before ioremap'ing the result, thus cropping the top
>> >  > bits ?
>> >
>> > as far as I can see, e1000 is using pci_ioremap_bar(), which should do
>> > the right thing as long as resource_size_t is the right type (which it
>> > looks like it is on PowerPC 44x).
>>
>> Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
>> as evidenced from the additional debug output below (see [1]).
>>
>> As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
>> 2.0 card in the second PCI slot, and got a similar crash (see [2]).
>>
>> Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
>> never used them before...

> Hrm, something is indeed wrong, hard to say what tho. My canyonlands
> works fine (460EPx) and I can try a Taishan one of these days (440GX
> iirc). What is in sequoia ? I think it's a GX no ?

 Sequoia is equipped with 440EPx.

 I observe the 'mal_probe' crash on the Katmai board too (based on
440SPe):

PPC 4xx OCP EMAC driver, version 3.54
Unable to handle kernel paging request for data at address 0x0000003c
Faulting instruction address: 0xc01becb8
Oops: Kernel access of bad area, sig: 11 [#1]
PowerPC 44x Platform
Modules linked in:
NIP: c01becb8 LR: c0232200 CTR: c0014d68
REGS: cfe47d30 TRAP: 0300   Not tainted  (2.6.29-rc1-00014-g58a813f-dirty)
MSR: 00029000 <EE,ME,CE>  CR: 42144084  XER: 20000000
DEAR: 0000003c, ESR: 00000000
TASK = cfe00000[1] 'swapper' THREAD: cfe40000
GPR00: c08ce244 cfe47de0 cfe00000 00000000 c08ce22c c0158864 00000020 00029000
GPR08: 000000d0 c08ce254 000000d0 00000744 82144082 89003000 7ffe4300 00000000
GPR16: 7ffd901c 7ffde640 00000000 00000000 00000000 00000000 00000000 0000000d
GPR24: c0751e80 c0415e0c dfec1e00 c0751e64 c04161c8 00000000 dff46a00 c08ce200
NIP [c01becb8] netif_napi_add+0x1c/0x58
LR [c0232200] mal_probe+0x1cc/0x668
Call Trace:
[cfe47de0] [c0232194] mal_probe+0x160/0x668 (unreliable)
[cfe47e10] [c01abe38] of_platform_device_probe+0x5c/0x36c
[cfe47e30] [c0152c24] driver_probe_device+0xb8/0x1e8
[cfe47e50] [c0152df8] __driver_attach+0xa4/0xa8
[cfe47e70] [c0151fa8] bus_for_each_dev+0x5c/0x98
[cfe47ea0] [c0152a2c] driver_attach+0x24/0x34
[cfe47eb0] [c0152788] bus_add_driver+0x1d8/0x258
[cfe47ee0] [c0153008] driver_register+0x5c/0x158
[cfe47f00] [c01abd00] of_register_driver+0x54/0x70
[cfe47f10] [c031a160] mal_init+0x20/0x30
[cfe47f20] [c031a2c8] emac_init+0x158/0x1b4
[cfe47f60] [c0001170] do_one_initcall+0x34/0x1a0
[cfe47fd0] [c0300168] kernel_init+0x88/0xf4
[cfe47ff0] [c000d8c4] kernel_thread+0x4c/0x68
Instruction dump:
7fe3fb78 7c0803a6 bb210014 38210030 4e800020 38000000 90040024 90040020
90a40010 90c4000c 90840000 38040018 <8123003c> 3963003c 91240018 90840004
---[ end trace 22428c4f73106ff5 ]---
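
The faulting address in this oops can be recomputed from the dump itself. The sketch below decodes the instruction word marked `<8123003c>` as a PowerPC D-form load and adds its displacement to the base register value from the register dump. The helper name `decode_d_form` is illustrative, and only GPR03 is modelled.

```python
# Decode a PowerPC D-form instruction word and recompute its effective address.
# D-form layout: opcode (6 bits), RT (5 bits), RA (5 bits), D (16-bit signed).

def decode_d_form(word: int):
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:            # sign-extend the 16-bit displacement
        d -= 0x10000
    return opcode, rt, ra, d


LWZ_OPCODE = 32               # primary opcode of lwz (load word and zero)

insn = 0x8123003C             # the word marked <...> in the instruction dump
gpr = {3: 0x00000000}         # GPR03 as shown in the register dump

opcode, rt, ra, d = decode_d_form(insn)
base = 0 if ra == 0 else gpr[ra]   # RA=0 means a literal zero base for D-form loads
ea = (base + d) & 0xFFFFFFFF       # 32-bit effective address

print(opcode == LWZ_OPCODE, rt, ra, hex(d), hex(ea))
```

The word decodes to `lwz r9,0x3c(r3)`, and with r3 equal to zero the effective address is 0x3c, the same value as DEAR and the "paging request" address. Under the 32-bit PowerPC ELF ABI r3 carries the first function argument, so this is consistent with netif_napi_add() being called with a NULL net_device, as originally reported. Which field sits at offset 0x3c is not stated in the thread.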


 This is with Linus's tree, head ae0..e10bb.

 The work-around from Matthias Fuchs (Jan 09, 2009) helps though.

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com

* Re: Re[2]: mal_probe crash
  2009-01-12 22:51                   ` Re[2]: " Yuri Tikhonov
@ 2009-01-13  2:52                     ` Benjamin Herrenschmidt
  2009-01-13 16:19                     ` Geert Uytterhoeven
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2009-01-13  2:52 UTC (permalink / raw)
  To: Yuri Tikhonov
  Cc: Geert Uytterhoeven, Linux/PPC Development, Roland Dreier,
	Sean MacLennan

On Tue, 2009-01-13 at 01:51 +0300, Yuri Tikhonov wrote:
> On Tuesday, January 13, 2009 you wrote:
> 
> > On Mon, 2009-01-12 at 14:37 +0100, Geert Uytterhoeven wrote:
> >> On Fri, 9 Jan 2009, Roland Dreier wrote:
> >> >  > Can you double check that the e1000 isn't copying the PCI resources into
> >> >  > a unsigned long before ioremap'ing the result, thus cropping the top
> >> >  > bits ?
> >> > 
> >> > as far as I can see, e1000 is using pci_ioremap_bar(), which should do
> >> > the right thing as long as resource_size_t is the right type (which it
> >> > looks like it is on PowerPC 44x).
> >> 
> >> Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
> >> as evidenced from the additional debug output below (see [1]).
> >> 
> >> As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
> >> 2.0 card in the second PCI slot, and got a similar crash (see [2]).
> >> 
> >> Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
> >> never used them before...
> 
> > Hrm, something is indeed wrong, hard to say what tho. My canyonlands
> > works fine (460EPx) and I can try a Taishan one of these days (440GX
> > iirc). What is in sequoia ? I think it's a GX no ?
> 
>  Sequoia is equipped with 440EPx.
> 
>  I observe the 'mal_probe' crash on the Katmai board too (based on 
> 440SPe):

Yes, EMAC is currently busted. We'll fix it asap.

Cheers,
Ben.

* Re[2]: mal_probe crash
  2009-01-12 22:51                   ` Re[2]: " Yuri Tikhonov
  2009-01-13  2:52                     ` Benjamin Herrenschmidt
@ 2009-01-13 16:19                     ` Geert Uytterhoeven
  1 sibling, 0 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2009-01-13 16:19 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: Roland Dreier, Sean MacLennan, Linux/PPC Development

On Tue, 13 Jan 2009, Yuri Tikhonov wrote:
> On Tuesday, January 13, 2009 you wrote:
> > On Mon, 2009-01-12 at 14:37 +0100, Geert Uytterhoeven wrote:
> >> On Fri, 9 Jan 2009, Roland Dreier wrote:
> >> >  > Can you double check that the e1000 isn't copying the PCI resources into
> >> >  > a unsigned long before ioremap'ing the result, thus cropping the top
> >> >  > bits ?
> >> > 
> >> > as far as I can see, e1000 is using pci_ioremap_bar(), which should do
> >> > the right thing as long as resource_size_t is the right type (which it
> >> > looks like it is on PowerPC 44x).
> >> 
> >> Indeed, the full 36-bit address is passed to __ioremap() via pci_ioremap_bar(),
> >> as evidenced from the additional debug output below (see [1]).
> >> 
> >> As I don't have any other 3.3V PCI Ethernet cards, I plugged in a 3.3V PCI USB
> >> 2.0 card in the second PCI slot, and got a similar crash (see [2]).
> >> 
> >> Are the PCI slots on the Sequoia known broken under recent Linux kernels? I've
> >> never used them before...
> 
> > Hrm, something is indeed wrong, hard to say what tho. My canyonlands
> > works fine (460EPx) and I can try a Taishan one of these days (440GX
> > iirc). What is in sequoia ? I think it's a GX no ?
> 
>  Sequoia is equipped with 440EPx.
> 
>  I observe the 'mal_probe' crash on the Katmai board too (based on 
> 440SPe):

Do PCI cards in the Katmai's PCI-X slot work?

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

end of thread

Thread overview: 18+ messages
2009-01-07 20:44 mal_probe crash Sean MacLennan
2009-01-08 20:46 ` Josh Boyer
2009-01-07 22:50   ` Benjamin Herrenschmidt
2009-01-09 14:42     ` Geert Uytterhoeven
2009-01-09 22:34       ` Herbert Xu
2009-01-09 23:13         ` Benjamin Herrenschmidt
2009-01-09 14:49     ` Matthias Fuchs
2009-01-09 15:02       ` Matthias Fuchs
2009-01-09 15:24         ` Geert Uytterhoeven
2009-01-09 21:30           ` Benjamin Herrenschmidt
2009-01-09 22:01             ` Roland Dreier
2009-01-12 13:37               ` Geert Uytterhoeven
2009-01-12 21:36                 ` Benjamin Herrenschmidt
2009-01-12 22:48                   ` Josh Boyer
2009-01-12 22:51                   ` Re[2]: " Yuri Tikhonov
2009-01-13  2:52                     ` Benjamin Herrenschmidt
2009-01-13 16:19                     ` Geert Uytterhoeven
2009-01-09 21:09       ` Benjamin Herrenschmidt