LinuxPPC-Dev Archive on lore.kernel.org
* Re: advice on reading a call trace
From: Benjamin Herrenschmidt @ 2010-10-08  1:29 UTC (permalink / raw)
  To: Jean-Mickael Guerin; +Cc: linuxppc-dev
In-Reply-To: <4CAAB5F4.9030002@6wind.com>

On Tue, 2010-10-05 at 07:21 +0200, Jean-Mickael Guerin wrote:
> SLUB is turned on, and the oops does not seem to happen when SLAB replaces SLUB.
> I just got lucky with SLAB, or does it sound familiar on ppc?

Nope, they should both work fine. Maybe try enabling SLAB_DEBUG or
SLUB_DEBUG? You might have some memory corruption (bad driver?)

Cheers,
Ben.

> Regards,
> Jean-Mickael
> 
> On 10/4/2010 1:24 PM, Jean-Mickael Guerin wrote:
> > Hello,
> > I'm stepping into ppc world and I'd like to know how to read this call trace,
> > I enabled debug options but I'm not able to track the origin of this bug, I mean
> > what happens before handle_page_fault():
> > 
> > Unable to handle kernel paging request for data at address 0x00000008
> > Faulting instruction address: 0xc00abcd8
> > 
> > [c1d0dd40] [c00ab2e4] swapin_readahead+0x34/0xbc
> > [c1d0dd80] [c009e91c] handle_mm_fault+0x724/0x9b0
> > [c1d0dde0] [c0014e10] do_page_fault+0x2e8/0x55c
> > [c1d0df40] [c000fc8c] handle_page_fault+0xc/0x80
> > Instruction dump:
> > 80010034 7d435378 bb210014 38210030 7c0803a6 4e800020 7d695b78 4bffff00
> > 387a007c 4847616d 38000001 7c00f830 <817b0008> 7d3c0214 7f895840 7f9ee378
> > Kernel panic - not syncing: Fatal exception
> > 
> > Another one:
> > NIP: c00b6980 LR: c00b6978 CTR: 0f9ffb20
> > REGS: d6a9dc60 TRAP: 0300   Not tainted  (2.6.34.6-00392-g31e1857)
> > MSR: 10029002   CR: 24004442  XER: 00000000
> > DEAR: 00000008, ESR: 00000000
> > TASK = da6a7440[4915] 'smrd' THREAD: d6a9c000 CPU: 0
> > GPR00: 00000008 d6a9dd10 da6a7440 c0738340 00000000 00000000 00000000 00000001
> > GPR08: 00000000 00000000 c00b6978 00000000 44004442 10027f04 c077790c d79f94a8
> > GPR16: d6a9dd88 c07a0000 d72301f0 d6a9c000 000001ff 00000000 0f9ffb20 da5c6f78
> > GPR24: d79f9440 c0738340 d6a9dd48 00000000 07832400 00000001 00000003 07832400
> > NIP [c00b6980] valid_swaphandles+0x19c/0x1d0
> > LR [c00b6978] valid_swaphandles+0x194/0x1d0
> > Call Trace:
> > [d6a9dd10] [c00b6978] valid_swaphandles+0x194/0x1d0 (unreliable)
> > [d6a9dd40] [c00b5a10] swapin_readahead+0x34/0xbc
> > [d6a9dd80] [c00a9064] handle_mm_fault+0x7c0/0x868
> > [d6a9dde0] [c00158d8] do_page_fault+0x2fc/0x570
> > [d6a9df40] [c001061c] handle_page_fault+0xc/0x80
> > Instruction dump:
> > 38210030 7c0803a6 4e800020 7d695b78 4bffff04 3d20c074 39298300 3b290040
> > 7f23cb78 484843a9 38000001 7c00f030 <817b0008> 7d3f0214 7f895840 7ffdfb78
> > Kernel panic - not syncing: Fatal exception
> > Call Trace:
> > [d6a9dba0] [c000765c] show_stack+0x40/0x15c (unreliable)
> > [d6a9dbd0] [c053b780] panic+0x94/0x118
> > [d6a9dc20] [c000d630] die+0x15c/0x1bc
> > [d6a9dc40] [c00155a4] bad_page_fault+0x90/0xc8
> > [d6a9dc50] [c001068c] handle_page_fault+0x7c/0x80
> > [d6a9dd10] [c00b6978] valid_swaphandles+0x194/0x1d0
> > [d6a9dd40] [c00b5a10] swapin_readahead+0x34/0xbc
> > [d6a9dd80] [c00a9064] handle_mm_fault+0x7c0/0x868
> > [d6a9dde0] [c00158d8] do_page_fault+0x2fc/0x570
> > [d6a9df40] [c001061c] handle_page_fault+0xc/0x80
> > Rebooting in 5 seconds..
> > 
> > Thanks,
> > Jean-Mickael
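
One way to get more out of the dumps quoted above is to decode the instruction
marked with <...> in the "Instruction dump" lines. The sketch below is only an
illustration: it assumes the marked word 817b0008 is a PowerPC D-form
instruction, and the names decode_dform() and dform_ea() are made up for this
example, not taken from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (the format used by lwz, stw, ...). */
struct dform {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned rt;		/* target (or source) register */
	unsigned ra;		/* base register */
	int16_t d;		/* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
	struct dform f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.d = (int16_t)(insn & 0xffff);
	return f;
}

/* Effective address of a D-form load/store: (RA|0) + D. */
static uint32_t dform_ea(struct dform f, const uint32_t gpr[32])
{
	assert(f.ra < 32);
	return (f.ra ? gpr[f.ra] : 0) + (uint32_t)(int32_t)f.d;
}
```

Fed with 0x817b0008, this gives primary opcode 32 (lwz), RT = 11, RA = 27 and
D = 8, i.e. "lwz r11,8(r27)". In the second oops GPR27 is 00000000, so the
load goes to address 0x8, which matches the DEAR value. That pattern usually
means a field at offset 8 was read through a NULL pointer; the dump alone does
not say which pointer it was.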

^ permalink raw reply

* Re: in_atomic() check in page_cache_get_speculative()
From: Benjamin Herrenschmidt @ 2010-10-08  1:31 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Nick Piggin
In-Reply-To: <20101004153753.2cda1abf@udp111988uds.am.freescale.net>

On Mon, 2010-10-04 at 15:37 -0500, Scott Wood wrote:
> [Updated with Nick's current address; previous one bounced]
> 
> On Mon, 4 Oct 2010 15:22:59 -0500
> Scott Wood <scottwood@freescale.com> wrote:
> 
> > I'm seeing the in_atomic() check in page_cache_get_speculative()
> > (linux/pagemap.h:138) fail when running e500 KVM with .  
> 
> Sorry, that should finish as "with CONFIG_DEBUG_VM."
> 
> > It's coming
> > from get_user_pages_fast(), from KVM's hva_to_pfn().  This is on kvm.git
> > plus a few local patches that should be completely unrelated, but it
> > looks like this code hasn't changed much in a couple years.
> > 
> > Interrupts are disabled by get_user_pages_fast(), but apparently
> > preemption was not separately disabled.  The comment in
> > page_cache_get_speculative() says that preemption disabling is done by
> > rcu_read_lock(), and that "this function must be called inside the same
> > rcu_read_lock() section as has been used to lookup the page in the
> > pagecache radix-tree (or page table)".
> > 
> > Where is this RCU lock supposed to be acquired?  I don't see any RCU in
> > arch/powerpc/mm/gup.c.  Is it buried in some macro or function call?
> > 

Well, we shouldn't need the rcu lock if interrupts are off, at least
that's my understanding...

Cheers,
Ben.

^ permalink raw reply

* RE: [PATCH v3 6/7] mtd: m25p80: add a read function to read page by page
From: Hu Mingkai-B21284 @ 2010-10-08  2:13 UTC (permalink / raw)
  To: David Brownell, linuxppc-dev, spi-devel-general, linux-mtd
  Cc: Gala Kumar-B11780, Zang Roy-R61911
In-Reply-To: <841976.76219.qm@web180306.mail.gq1.yahoo.com>



> -----Original Message-----
> From: David Brownell [mailto:david-b@pacbell.net]
> Sent: Thursday, September 30, 2010 6:46 PM
> To: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net;
> linux-mtd@lists.infradead.org; Hu Mingkai-B21284
> Cc: Gala Kumar-B11780; Zang Roy-R61911; Hu Mingkai-B21284
> Subject: Re: [PATCH v3 6/7] mtd: m25p80: add a read function to read page by
> page
>
>
> --- On Thu, 9/30/10, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
>
> > From: Mingkai Hu <Mingkai.hu@freescale.com>
> > Subject: [PATCH v3 6/7] mtd: m25p80: add a read function to read page
> > by page
>
> NAK.
>
> We went over this before.
>
>   The bug is in your SPI master controller driver, and the fix there involves
> mapping large reads  into multiple smaller reads.  (Example, 128K read as two
> 64K reads instead of one of 128K.
>
> It's *NEVER* appropriate to commit to patching all upper level drivers in order
> to work around bugs in lower level ones.  The set of such upper level drivers
> that may need bugfixing is quite large, most will never be used with your buggy
> controller driver, and all such patches will need testing (but the test
> resources are probably not available).
>
> Whatever SPI controller driver you're working with is clearly buggy ... but not
> unfixably so.
>
> DO NOT head down the path of requiring every SPI device driver to include
> workarounds for this odd little SPI master driver bug.
>
> - Dave
>

Thanks for your comments. The controller driver is the proper place to
handle this; I'll fix it.

Thanks,
Mingkai
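
The fix discussed above (splitting one large read into several smaller ones
inside the controller driver) can be sketched in plain C. This is a
hypothetical illustration, not the eSPI driver: MAX_XFER, xfer_fn,
chunked_read() and the fake_flash backend are all names invented here, and the
64K limit is simply taken from the example in the quoted mail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_XFER 65536u	/* assumed per-transfer limit of the controller */

/* Hypothetical low-level transfer; must not be asked for more than MAX_XFER. */
typedef int (*xfer_fn)(void *ctx, size_t offset, void *buf, size_t len);

/* Serve a read of any length as a sequence of transfers of at most MAX_XFER. */
static int chunked_read(xfer_fn xfer, void *ctx, size_t offset,
			void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		size_t n = len < MAX_XFER ? len : MAX_XFER;
		int ret = xfer(ctx, offset, p, n);

		if (ret)
			return ret;	/* stop at the first failing chunk */
		offset += n;
		p += n;
		len -= n;
	}
	return 0;
}

/* Memory-backed stand-in for a flash device, for demonstration only. */
struct fake_flash {
	const unsigned char *data;
	size_t size;
	unsigned calls;
};

static int fake_xfer(void *ctx, size_t offset, void *buf, size_t len)
{
	struct fake_flash *f = ctx;

	assert(len <= MAX_XFER);	/* the limit the caller must respect */
	if (offset > f->size || len > f->size - offset)
		return -1;
	memcpy(buf, f->data + offset, len);
	f->calls++;
	return 0;
}
```

With this shape the upper-level driver keeps issuing a single 128K read, and
only the controller side knows that it is carried out as two 64K transfers.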

^ permalink raw reply

* RE: [PATCH v3 6/7] mtd: m25p80: add a read function to read page by page
From: Hu Mingkai-B21284 @ 2010-10-08  2:15 UTC (permalink / raw)
  To: Anton Vorontsov, Grant Likely
  Cc: David Brownell, linuxppc-dev, Gala Kumar-B11780, linux-mtd,
	spi-devel-general
In-Reply-To: <20100930150633.GA13741@oksana.dev.rtsoft.ru>

> -----Original Message-----
> From: Anton Vorontsov [mailto:cbouatmailru@gmail.com]
> Sent: Thursday, September 30, 2010 11:07 PM
> To: Grant Likely
> Cc: David Brownell; linuxppc-dev@ozlabs.org; Hu Mingkai-B21284;
> linux-mtd@lists.infradead.org; Gala Kumar-B11780;
> spi-devel-general@lists.sourceforge.net
> Subject: Re: [PATCH v3 6/7] mtd: m25p80: add a read function to read page by
> page
>
> On Thu, Sep 30, 2010 at 11:41:40PM +0900, Grant Likely wrote:
> > On Thu, Sep 30, 2010 at 11:16 PM, Grant Likely
> > <grant.likely@secretlab.ca> wrote:
> > > On Thu, Sep 30, 2010 at 7:46 PM, David Brownell <david-b@pacbell.net> wrote:
> > >>
> > >> --- On Thu, 9/30/10, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > >>
> > >>> From: Mingkai Hu <Mingkai.hu@freescale.com>
> > >>> Subject: [PATCH v3 6/7] mtd: m25p80: add a read function to read
> > >>> page by page
> > >>
> > >> NAK.
> > >>
> > >> We went over this before.
> > >
> > > Yes, I agree with David on this.  If large transfers don't work,
> > > then it is the SPI master driver that is buggy.
> >
> > By the way, does this fix your problem?
> >
> > https://patchwork.kernel.org/patch/184752/
>
> It shouldn't. AFAIK, eSPI is PIO-only controller, and the overrun fix is for the
> DMA mode.
>
> Thanks,
>
> p.s. Btw, in patch 3/7, is_dma_mapped argument of fsl_espi_bufs() is unneeded.
>

Yes, the is_dma_mapped isn't needed, I'll remove it.

Thanks,
Mingkai

^ permalink raw reply

* RE: [PATCH v3 5/7] mtd: m25p80: add support to parse the SPI flash's partitions
From: Hu Mingkai-B21284 @ 2010-10-08  2:42 UTC (permalink / raw)
  To: Grant Likely
  Cc: linuxppc-dev, Gala Kumar-B11780, linux-mtd, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <AANLkTi=eBA+i7kCOrEPao1AXofKUjBQokWLXFpgx3WND@mail.gmail.com>



> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant
> Likely
> Sent: Friday, October 01, 2010 5:35 AM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net;
> linux-mtd@lists.infradead.org; Gala Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v3 5/7] mtd: m25p80: add support to parse the SPI flash's
> partitions
>
> On Thu, Sep 30, 2010 at 5:00 PM, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
> > ---
> > v3:
> >  - Move the SPI flash partition code to the probe function.
> >
> >  drivers/mtd/devices/m25p80.c |   39 +++++++++++++++++++++++++++------------
> >  1 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> > index 6f512b5..47d53c7 100644
> > --- a/drivers/mtd/devices/m25p80.c
> > +++ b/drivers/mtd/devices/m25p80.c
> > @@ -772,7 +772,7 @@ static const struct spi_device_id *__devinit jedec_probe(struct spi_device *spi)
> >  static int __devinit m25p_probe(struct spi_device *spi)
> >  {
> >        const struct spi_device_id      *id = spi_get_device_id(spi);
> > -       struct flash_platform_data      *data;
> > +       struct flash_platform_data      data, *pdata;
> >        struct m25p                     *flash;
> >        struct flash_info               *info;
> >        unsigned                        i;
> > @@ -782,13 +782,27 @@ static int __devinit m25p_probe(struct spi_device *spi)
> >         * a chip ID, try the JEDEC id commands; they'll work for most
> >         * newer chips, even if we don't recognize the particular chip.
> >         */
> > -       data = spi->dev.platform_data;
> > -       if (data && data->type) {
> > +       pdata = spi->dev.platform_data;
> > +       if (!pdata && spi->dev.of_node) {
> > +               int nr_parts;
> > +               struct mtd_partition *parts;
> > +               struct device_node *np = spi->dev.of_node;
> > +
> > +               nr_parts = of_mtd_parse_partitions(&spi->dev, np, &parts);
> > +               if (nr_parts) {
> > +                       pdata = &data;
> > +                       memset(pdata, 0, sizeof(*pdata));
> > +                       pdata->parts = parts;
> > +                       pdata->nr_parts = nr_parts;
> > +               }
> > +       }
>
> Yes, this is the correct way to go about adding the partitions.
> However, this patch can be made simpler by not renaming 'data' to
> 'pdata' and by moving the above code down to just before the partition
> information is actually used.  in the OF case, only the parts and the
> nr_parts values written into data, and those values aren't used until
> the last part of the probe function.
>
> Regardless, in principle this patch is correct:
>
> Acked-by: Grant Likely <grant.likely@secretlab.ca>
>
> > +
> > +       if (pdata && pdata->type) {
> >                const struct spi_device_id *plat_id;
> >
> >                for (i = 0; i < ARRAY_SIZE(m25p_ids) - 1; i++) {
> >                        plat_id = &m25p_ids[i];
> > -                       if (strcmp(data->type, plat_id->name))
> > +                       if (strcmp(pdata->type, plat_id->name))
> >                                continue;
> >                        break;
> >                }
> > @@ -796,7 +810,8 @@ static int __devinit m25p_probe(struct spi_device *spi)
> >                if (i < ARRAY_SIZE(m25p_ids) - 1)
> >                        id = plat_id;
> >                else
> > -                       dev_warn(&spi->dev, "unrecognized id %s\n", data->type);
> > +                       dev_warn(&spi->dev, "unrecognized id %s\n",
> > +                                       pdata->type);
> >        }
> >
> >        info = (void *)id->driver_data;
> > @@ -847,8 +862,8 @@ static int __devinit m25p_probe(struct spi_device *spi)
> >                write_sr(flash, 0);
> >        }
> >
> > -       if (data && data->name)
> > -               flash->mtd.name = data->name;
> > +       if (pdata && pdata->name)
> > +               flash->mtd.name = pdata->name;
> >        else
> >                flash->mtd.name = dev_name(&spi->dev);
> >
> > @@ -919,9 +934,9 @@ static int __devinit m25p_probe(struct spi_device *spi)
> >                                        part_probes, &parts, 0);
> >                }
> >
> > -               if (nr_parts <= 0 && data && data->parts) {
> > -                       parts = data->parts;
> > -                       nr_parts = data->nr_parts;
> > +               if (nr_parts <= 0 && pdata && pdata->parts) {
> > +                       parts = pdata->parts;
> > +                       nr_parts = pdata->nr_parts;
> >                }
>
> As per my comment earlier; since parts and nr_parts isn't needed
> before this point, this block could simply be:
>
> if (nr_parts <= 0 && data && data->parts) {
>         parts = data->parts;
>         nr_parts = data->nr_parts;
> }
> if (nr_parts <= 0 && spi->dev.of_node)
>         nr_parts = of_mtd_parse_partitions(&spi->dev, np, &parts);
>
> And most of the other changes to this file goes away.  Simpler, yes?
>

Yes, you're right, I'll fix it. Also thanks for your suggestion and ACK.

Thanks,
Mingkai

^ permalink raw reply

* Problem with Infiniband adapter on IBM p550
From: Patrick Finnegan @ 2010-10-08  2:57 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

I seem to be running into a problem getting a Mellanox Infinihost
Infiniband adapter working on my IBM p550 (a 9113-550).  I'm using
Debian squeeze, and tried upgrading to the 2.6.35.7 kernel, but that
did not help.

I get the following messages in dmesg:
[    4.972548] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[    4.972564] ib_mthca: Initializing 0000:c1:00.0
[    4.972674] ib_mthca 0000:c1:00.0: Missing DCS, aborting.


The problem looks the same as a problem I ran into with OpenFirmware on
a Sun V880, which was fixed with this patch by Dave Miller:
http://ns3.spinics.net/lists/linux-rdma/msg01779.html

I spent some time looking at the equivalent function on powerpc, but
didn't find a block of code that looked similar.

Any suggestions?

I have dmesg, the dev .properties from openfirmware, and lspci -v from 
the machine:

http://ned.rcac.purdue.edu/p550-ib/dmesg
http://ned.rcac.purdue.edu/p550-ib/ib-of-device
http://ned.rcac.purdue.edu/p550-ib/lspci-v

Pat
-- 
Purdue University ITaP/Research Systems -- http://www.rcac.purdue.edu

^ permalink raw reply

* Problem with Infiniband adapter on IBM p550
From: Patrick Finnegan @ 2010-10-08  3:24 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

I seem to be running into a problem getting a Mellanox Infinihost
Infiniband adapter working on my IBM p550 (a 9113-550).  I'm using
Debian squeeze, and tried upgrading to the 2.6.35.7 kernel, but that
did not help.

I get the following messages in dmesg:
[    4.972548] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[    4.972564] ib_mthca: Initializing 0000:c1:00.0
[    4.972674] ib_mthca 0000:c1:00.0: Missing DCS, aborting.


The problem looks the same as a problem I ran into with OpenFirmware on
a Sun V880, which was fixed with this patch by Dave Miller:
http://ns3.spinics.net/lists/linux-rdma/msg01779.html

I spent some time looking at the equivalent function on powerpc, but
didn't find a block of code that looked similar.

Any suggestions?

I have dmesg, the dev .properties from openfirmware, and lspci -v from 
the machine:

http://ned.rcac.purdue.edu/p550-ib/dmesg
http://ned.rcac.purdue.edu/p550-ib/ib-of-device
http://ned.rcac.purdue.edu/p550-ib/lspci-v

Pat
-- 
Purdue University Research Computing ---  http://www.rcac.purdue.edu/
The Computer Refuge                  ---  http://computer-refuge.org

^ permalink raw reply

* Re: Problem with Infiniband adapter on IBM p550
From: Benjamin Herrenschmidt @ 2010-10-08  5:41 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <201010072324.33062.pat@computer-refuge.org>

On Thu, 2010-10-07 at 23:24 -0400, Patrick Finnegan wrote:
> I seem to be running into a problem getting a Mellanox Infinihost
> Infiniband adapter working on my IBM p550 (a 9113-550).  I'm using
> Debian squeeze, and tried upgrading to the 2.6.35.7 kernel, but that
> did not help.
>
> I get the following messages in dmesg:
> [    4.972548] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
> [    4.972564] ib_mthca: Initializing 0000:c1:00.0
> [    4.972674] ib_mthca 0000:c1:00.0: Missing DCS, aborting.

Ok, so from what I can tell, the driver is unhappy because either BAR 0
hasn't been assigned a memory resource or the size doesn't match what
the driver expects.

Let's see...

> The problem looks the same as a problem I ran into with OpenFirmware on 
> a Sun V880, which was fixed with this patch by Dave Miller:
> http://ns3.spinics.net/lists/linux-rdma/msg01779.html
> 
> I spent some time looking at the equivalent function on powerpc, but
> didn't find a block of code that looked similar.

I don't think we are hitting the same problem. I believe our code in
that area differs enough.

In your lspci, however, I see:

	Memory at <unassigned> (64-bit, non-prefetchable)
	Memory at <unassigned> (64-bit, prefetchable)

Which doesn't look good...

From your OF log

> Any suggestions?
> 
> I have dmesg, the dev .properties from openfirmware, and lspci -v from 
> the machine:
> 
> http://ned.rcac.purdue.edu/p550-ib/dmesg
> http://ned.rcac.purdue.edu/p550-ib/ib-of-device
> http://ned.rcac.purdue.edu/p550-ib/lspci-v
> 
> Pat

^ permalink raw reply

* Re: Problem with Infiniband adapter on IBM p550
From: Benjamin Herrenschmidt @ 2010-10-08  5:45 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: paulus, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1286516470.2463.403.camel@pasglop>


> Ok, so from what I can tell, the driver is unhappy because either BAR 0
> hasn't been assigned a memory resource or the size doesn't match what
> the driver expects.
> 
Oops, accidentally sent too quickly...

From your OF log I see:

reg                     00c10000 00000000 00000000  00000000 00000000 
                        03c10010 00000000 00000000  00000000 00100000 
                        43c10018 00000000 00000000  00000000 00800000 
                        43c10020 00000000 00000000  00000000 08000000 
assigned-addresses      83c10020 00000000 e8000000  00000000 08000000 

Now, I think this is the problem.

The "assigned-addresses" property seems to indicate that the firmware only
assigned BAR 4 and didn't assign anything to the other ones.

I don't know why, but it definitely looks like a firmware bug to me. On those
machines, PCI resource assignment is under hypervisor control and so Linux
cannot re-assign missing resources itself.

I'll see if I can find a FW person to shed some light on this.

Can you provide me (privately maybe) with the FW version on the machine?

Cheers,
Ben.
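
For readers who want to check the "reg" and "assigned-addresses" cells above
by hand, here is a small decoding sketch. It assumes the usual Open Firmware
PCI binding layout of the first cell (npt000ss bbbbbbbb dddddfff rrrrrrrr);
the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of the first ("phys.hi") cell of an OF PCI address. */
struct pci_phys_hi {
	unsigned non_relocatable;	/* n, bit 31 */
	unsigned prefetchable;		/* p, bit 30 */
	unsigned aliased;		/* t, bit 29 */
	unsigned space;			/* ss: 0 config, 1 I/O, 2 mem32, 3 mem64 */
	unsigned bus, dev, fn;
	unsigned reg;			/* config-space register offset */
};

static struct pci_phys_hi decode_phys_hi(uint32_t hi)
{
	struct pci_phys_hi a;

	a.non_relocatable = (hi >> 31) & 1;
	a.prefetchable = (hi >> 30) & 1;
	a.aliased = (hi >> 29) & 1;
	a.space = (hi >> 24) & 3;
	a.bus = (hi >> 16) & 0xff;
	a.dev = (hi >> 11) & 0x1f;
	a.fn = (hi >> 8) & 7;
	a.reg = hi & 0xff;
	return a;
}

/* BAR number for a register offset in 0x10..0x24, or -1 if it is not a BAR. */
static int bar_index(unsigned reg)
{
	assert(reg <= 0xff);	/* the register field is 8 bits wide */
	if (reg < 0x10 || reg > 0x24 || (reg & 3))
		return -1;
	return (int)(reg - 0x10) / 4;
}
```

Under that assumption, 03c10010, 43c10018 and 43c10020 in "reg" describe
64-bit memory BARs 0, 2 and 4 of device c1:00.0 (sizes 0x100000, 0x800000 and
0x08000000), while the single "assigned-addresses" entry 83c10020 refers to
register 0x20, i.e. BAR 4 only.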

^ permalink raw reply

* Re: [PATCH v3 6/7] mtd: m25p80: add a read function to read page by page
From: Kumar Gala @ 2010-10-08  6:11 UTC (permalink / raw)
  To: Hu Mingkai-B21284
  Cc: David Brownell, linuxppc-dev, linux-mtd, spi-devel-general,
	Gala Kumar-B11780
In-Reply-To: <73839B4A0818E747864426270AC332C3057F6138@zmy16exm20.fsl.freescale.net>


On Oct 7, 2010, at 9:15 PM, Hu Mingkai-B21284 wrote:

>>>> Yes, I agree with David on this.  If large transfers don't work,
>>>> then it is the SPI master driver that is buggy.
>>>
>>> By the way, does this fix your problem?
>>>
>>> https://patchwork.kernel.org/patch/184752/
>>
>> It shouldn't. AFAIK, eSPI is PIO-only controller, and the overrun fix is for the
>> DMA mode.
>>
>> Thanks,
>>
>> p.s. Btw, in patch 3/7, is_dma_mapped argument of fsl_espi_bufs() is unneeded.
>>
>
> Yes, the is_dma_mapped isn't needed, I'll remove it.
>
> Thanks,
> Mingkai

It'd be really nice if we could close on this patchset in time for .37
acceptance.  I'm guessing that cutoff is quickly approaching.

- k

^ permalink raw reply

* RE: [PATCH v3 3/7] eSPI: add eSPI controller support
From: Hu Mingkai-B21284 @ 2010-10-08  6:35 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: linuxppc-dev, Gala Kumar-B11780, linux-mtd, spi-devel-general
In-Reply-To: <20101001112204.GA16783@oksana.dev.rtsoft.ru>

> -----Original Message-----
> From: Anton Vorontsov [mailto:cbouatmailru@gmail.com]
> Sent: Friday, October 01, 2010 7:22 PM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net;
> linux-mtd@lists.infradead.org; Gala Kumar-B11780
> Subject: Re: [PATCH v3 3/7] eSPI: add eSPI controller support
>
> Hello Mingkai,
>
> There are mostly cosmetic comments down below.
>
> > +		/* spin until TX is done */
> > +		while (((events = mpc8xxx_spi_read_reg(&reg_base->event))
> > +					& SPIE_NF) == 0)
> > +			cpu_relax();
>
> This is dangerous. There's a handy spin_event_timeout() in asm/delay.h.
>

When timeout, can I use return in the interrupt function directly like this?

if (!(events & SPIE_NF)) {
        int ret;
        /* spin until TX is done */
        ret = spin_event_timeout(((events = mpc8xxx_spi_read_reg(
                        &reg_base->event)) & SPIE_NF) == 0, 1000, 0);
        if (!ret) {
                dev_err(mspi->dev, "tired waiting for SPIE_NF\n");
                return;
        }
}

> > +}
> > +
> > +static const struct of_device_id of_fsl_espi_match[] = {
> > +	{ .compatible = "fsl,mpc8536-espi" },
> > +	{}
> > +};
> > +MODULE_DEVICE_TABLE(of, of_fsl_espi_match);
> > +
> > +static struct of_platform_driver fsl_espi_driver = {
> > +	.driver = {
> > +		.name = "fsl_espi",
> > +		.owner = THIS_MODULE,
> > +		.of_match_table = of_fsl_espi_match,
> > +	},
> > +	.probe		= of_fsl_espi_probe,
> > +	.remove		= __devexit_p(of_fsl_espi_remove),
> > +};
> > +
> > +static int __init fsl_espi_init(void) {
> > +	return of_register_platform_driver(&fsl_espi_driver);
> > +}
> > +module_init(fsl_espi_init);
> > +
> > +static void __exit fsl_espi_exit(void) {
> > +	of_unregister_platform_driver(&fsl_espi_driver);
> > +}
> > +module_exit(fsl_espi_exit);
> > +
> > +MODULE_AUTHOR("Mingkai Hu");
> > +MODULE_DESCRIPTION("Enhanced Freescale SPI Driver");
>
> This sounds like that this is an enhanced version of the Freescale SPI driver,
> which it is not. ;-)
>

I quoted from the UM, maybe the enhancement is the controller takes over the
CS signal from the HW point view.

I changed all the other code according to your comments.

Thanks,
Mingkai
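
The review point in this message is that a busy-wait on a status bit should
be bounded. A minimal user-space sketch of that pattern follows; it is not
the kernel's spin_event_timeout() and not the eSPI driver. The SPIE_NF value,
read_reg_fn, poll_until_set() and the fake register are assumptions made for
this example, and the wait condition is written as "bit is set", matching the
original loop that spins while the bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPIE_NF 0x00000100u	/* assumed bit value, for illustration only */

/* Hypothetical register accessor. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll a register until (value & mask) is non-zero or max_polls reads have
 * been made. Returns true if the bit was seen; *last holds the final value.
 */
static bool poll_until_set(read_reg_fn read_reg, void *ctx, uint32_t mask,
			   unsigned max_polls, uint32_t *last)
{
	assert(read_reg && last);
	*last = 0;
	for (unsigned i = 0; i < max_polls; i++) {
		*last = read_reg(ctx);
		if (*last & mask)
			return true;
	}
	return false;	/* give up instead of spinning forever */
}

/* Fake event register that reports SPIE_NF from the set_after-th read on. */
struct fake_reg {
	unsigned reads;
	unsigned set_after;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	r->reads++;
	return r->reads >= r->set_after ? SPIE_NF : 0;
}
```

A caller would log an error and bail out when poll_until_set() returns false,
which is the role the dev_err()/return pair plays in the snippet above.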

^ permalink raw reply

* RE: [PATCH v3 2/7] spi/mpc8xxx: refactor the common code for SPI/eSPI controller
From: Hu Mingkai-B21284 @ 2010-10-08  6:37 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: linuxppc-dev, Gala Kumar-B11780, linux-mtd, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <20101001112212.GA17505@oksana.dev.rtsoft.ru>

> -----Original Message-----
> From: Anton Vorontsov [mailto:cbouatmailru@gmail.com]
> Sent: Friday, October 01, 2010 7:22 PM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net;
> linux-mtd@lists.infradead.org; Gala Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v3 2/7] spi/mpc8xxx: refactor the common code for SPI/eSPI
> controller
>
> On Thu, Sep 30, 2010 at 04:00:41PM +0800, Mingkai Hu wrote:
> [...]
> > -static void mpc8xxx_spi_change_mode(struct spi_device *spi)
> > +static void fsl_spi_change_mode(struct spi_device *spi)
> >  {
> >  	struct mpc8xxx_spi *mspi = spi_master_get_devdata(spi->master);
> >  	struct spi_mpc8xxx_cs *cs = spi->controller_state;
> > -	__be32 __iomem *mode = &mspi->base->mode;
> > +	struct fsl_spi_reg *reg_base = (struct fsl_spi_reg *)mspi->reg_base;
>
> No need for these type casts (the same is for the whole patch).
>

Fix it.

Thanks,
Mingkai

^ permalink raw reply

* Re: powerpc, fs_enet: scanning PHY after Linux is up
From: Holger brunck @ 2010-10-08  8:50 UTC (permalink / raw)
  To: Grant Likely; +Cc: linuxppc-dev, devicetree-discuss, hs, Detlev Zundel, netdev
In-Reply-To: <AANLkTi=GkkD_-Vu-NswNedhgVuPaYePOHWa_2ytQgMf_@mail.gmail.com>

Hi Grant,

On 10/06/2010 06:52 PM, Grant Likely wrote:
> On Wed, Oct 6, 2010 at 3:53 AM, Heiko Schocher <hs@denx.de> wrote:
>>>> So, the question is, is there a possibility to solve this problem?
>>>>
>>>> If there is no standard option, what would be with adding a
>>>> "scan_phy" file in
>>>>
>>>> /proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40
>>>> (or better destination?)
>>>>
>>>> which with we could rescan a PHY with
>>>> "echo addr > /proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40/scan_phy"
>>>> (so there is no need for using of_find_node_by_path(), as we should
>>>>  have the associated device node here, and can step through the child
>>>>  nodes with "for_each_child_of_node(np, child)" and check if reg == addr)
>>>>
>>>> or shouldn't we at least, if the phy couldn't be found when opening
>>>> the port, retrigger a scan to see whether the phy is now accessible?
>>>
>>> One option would be to still register a phy_device for each phy
>>> described in the device tree, but defer binding a driver to each phy
>>> that doesn't respond.  Then at of_phy_find_device() time, if it
>>
>> Maybe I didn't get the trick, but the problem is that
>> you can't register a phy_device in drivers/of/of_mdio.c
>> of_mdiobus_register() if the phy didn't respond with the
>> phy_id ... and of_phy_find_device() is not (yet) used in fs_enet
> 
> I'm suggesting modifying the phy layer so that it is possible to
> register a phy_device that doesn't (yet) respond.
> 

Yes, this sounds reasonable.

>>> matches with a phy_device that isn't bound to a driver yet, then
>>> re-trigger the binding operation.  At which point the phy id can be
>>> probed and the correct driver can be chosen.  If binding succeeds,
>>> then return the phy_device handle.  If not, then fail as it currently
>>> does.
>>
>> Wouldn't it be good, just when we need a PHY (on calling fs_enet_open),
>> to look whether there is one?
>>
>> Something like that (not tested):
>>
>> in drivers/net/fs_enet/fs_enet-main.c in fs_init_phy()
>> called from fs_enet_open():
>>
>> Do first:
>> phydev =  of_phy_find_device(fep->fpi->phy_node);
>>
>> Look if there is a driver (phy_dev->drv == NULL ?)
>>
>> If not, call new function
>> of_mdiobus_register_phy(mii_bus, fep->fpi->phy_node)
>> see below patch for it.
>>
>> If this succeeds, all is OK, and we can use this phy,
>> otherwise ethernet will not work.
> 
> I don't like this approach because it muddies the concept of which
> device is actually responsible for managing the phys on the bus.  Is
> it managed by the mdio bus device or the Ethernet device?  It also has
> a potential race condition.  Whereas triggering a late driver bind
> will be safe.
> 
> Alternately, I'd also be okay with a common method to trigger a
> reprobe of a particular phy from userspace, but I fear that would be a
> significantly more complex solution.
> 
>>
>> !!just no idea, how to get mii_bus pointer ...
> 
> You'd have to get the parent of the phy node, and then loop over all
> the registered mdio busses looking for a bus that uses that node.
> 

You say that you don't like the approach of probing the PHY again in
fs_enet_open(), but I don't currently see what the alternative trigger point
for rescanning the MDIO bus would be.

I made a first patch that extends the phy_device structure and rescans the MDIO
bus when fs_enet_open() is called (because I didn't see a better trigger point).
The advantage is that the mii_bus pointer and the PHY address are already stored
in the previously created phy_device structure, so they are easy to use. See the
patch below for these modifications. What's currently missing in the patch is
setting the phy_id if the PHY is only detected after the phy_device was created.
For the mgcoge board it seems to solve our problem, but maybe I'm missing
something important.

Best regards
Holger Brunck

diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c
index ec2f503..6bc117f 100644
--- a/drivers/net/fs_enet/fs_enet-main.c
+++ b/drivers/net/fs_enet/fs_enet-main.c
@@ -775,7 +774,8 @@ static int fs_enet_open(struct net_device *dev)
 {
        struct fs_enet_private *fep = netdev_priv(dev);
        int r;
-       int err;
+       int err = 0;
+       u32 phy_id = 0;

        /* to initialize the fep->cur_rx,... */
        /* not doing this, will cause a crash in fs_enet_rx_napi */
@@ -795,13 +795,23 @@ static int fs_enet_open(struct net_device *dev)
                return -EINVAL;
        }

-       err = fs_init_phy(dev);
-       if (err) {
+       if (fep->phydev == NULL)
+               err = fs_init_phy(dev);
+
+       if (!err && (fep->phydev->available == false))
+               r = get_phy_id(fep->phydev->bus, fep->phydev->addr, &phy_id);
+
+       if (err || (phy_id == 0xffffffff)) {
                free_irq(fep->interrupt, dev);
                if (fep->fpi->use_napi)
                        napi_disable(&fep->napi);
-               return err;
+               if (err)
+                       return err;
+               else
+                       return -EINVAL;
        }
+       else
+               fep->phydev->available = true;
        phy_start(fep->phydev);

        netif_start_queue(dev);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index adbc0fd..1f443cb 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -173,6 +173,10 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
        dev->dev.bus = &mdio_bus_type;
        dev->irq = bus->irq != NULL ? bus->irq[addr] : PHY_POLL;
        dev_set_name(&dev->dev, PHY_ID_FMT, bus->id, addr);
+       if (phy_id == 0xffffffff)
+               dev->available = false;
+       else
+               dev->available = true;

        dev->state = PHY_DOWN;

@@ -232,13 +236,11 @@ struct phy_device * get_phy_device(struct mii_bus *bus, int addr)
        int r;

        r = get_phy_id(bus, addr, &phy_id);
-       if (r)
-               return ERR_PTR(r);

        /* If the phy_id is mostly Fs, there is no device there */
-       if ((phy_id & 0x1fffffff) == 0x1fffffff)
-               return NULL;
-
+       if (((phy_id & 0x1fffffff) == 0x1fffffff) || r)
+               phy_id = 0xffffffff;
+       /* create phy even if the phy is currently not available */
        dev = phy_device_create(bus, addr, phy_id);

        return dev;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6a7eb40..12dc3e4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -303,6 +303,9 @@ struct phy_device {

        int link_timeout;

+       /* Flag to support delayed availability */
+       bool available;
+
        /*
         * Interrupt number for this PHY
         * -1 means no interrupt
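
For readers following the patch, here is a minimal user-space sketch of the
ID handling it relies on. It assumes, as the kernel's get_phy_id() does, that
the 32-bit ID is built from the two 16-bit PHY ID registers (PHYSID1 in the
upper half, PHYSID2 in the lower half). The helper names are made up for
illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: combine the two 16-bit PHY ID registers
 * (PHYSID1 = upper half, PHYSID2 = lower half) into one 32-bit ID. */
static uint32_t phy_id_compose(uint16_t physid1, uint16_t physid2)
{
	return ((uint32_t)physid1 << 16) | physid2;
}

/* Same test as in get_phy_device(): if the low 29 bits are all set,
 * nothing answered at this address. */
static bool phy_id_absent(uint32_t phy_id)
{
	return (phy_id & 0x1fffffff) == 0x1fffffff;
}
```

Note that the masked test also treats values such as 0x3fffffff as "absent",
while an exact comparison with 0xffffffff (as in the fs_enet_open hunk) does
not. Using one helper in both places would avoid that difference.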

^ permalink raw reply related

* Re: P1020RDB PCI-E Interrupt problem
From: tiejun.chen @ 2010-10-08  9:10 UTC (permalink / raw)
  To: Fabian Bertholm; +Cc: linuxppc-dev
In-Reply-To: <AANLkTikDsENpooCJZe68_WF4sb5qrQpjFN9ZqanUDMRx@mail.gmail.com>

Fabian Bertholm wrote:
> Hi,
> 
> I try to run ath9k on a P1020RDB Freescale board.
> I run into the problem similar to the Bug/Patch here:
> http://patchwork.ozlabs.org/patch/52137/
> 
> I get irq 16: nobody cared....

First, you should check whether 'irq 16' is actually raised by your PCIe device.

> 
> I tried to fix the dts file in the same manner but this does not help.
> Currently I am using 2.6.33.7
> 
> Any hints? Anybody?
> 

If so, I think it's unnecessary to add any interrupt-map to the dts, since MSI
is used as the interrupt mode on P1020RDB, not legacy interrupts. So please
remove the interrupt-map from both pci nodes, then enable CONFIG_PCI_MSI and
build/boot again.

> The modified pci section from my dts:
> 
> 	pci0: pcie@ffe09000 {
> 		cell-index = <1>;
> 		compatible = "fsl,mpc8548-pcie";
> 		device_type = "pci";
> 		#interrupt-cells = <1>;
> 		#size-cells = <2>;
> 		#address-cells = <3>;
> 		reg = <0 0xffe09000 0 0x1000>;
> 		bus-range = <0 255>;
> 		ranges = <0x2000000 0x0 0xa0000000 0 0xa0000000 0x0 0x20000000
> 			  0x1000000 0x0 0x00000000 0 0xffc30000 0x0 0x10000>;
> 		clock-frequency = <33333333>;
> 		interrupt-parent = <&mpic>;
> 		interrupts = <16 2>;
> 		interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> 		interrupt-map = <
> 			/* IDSEL 0x0 */
> 			0000 0x0 0x0 0x1 &mpic 0x4 0x2
> 			0000 0x0 0x0 0x2 &mpic 0x5 0x2
> 			0000 0x0 0x0 0x3 &mpic 0x6 0x2
> 			0000 0x0 0x0 0x4 &mpic 0x7 0x2

I don't know how you generated these interrupt-map entries. Even if you really
want to use legacy interrupts, you should verify your PCIe bus number/device
number/function number and the actual interrupt number. In particular, please
check the interrupt trigger sense. An incorrect sense can cause an interrupt
storm from your PCIe device. As a result, no interrupt handler deals with these
spurious interrupts, which is what you saw with 'irq 16: nobody cared'.
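
To make the bus/device/function check concrete, here is a rough user-space
sketch of how an interrupt-map lookup works with the mask quoted above. It
assumes the standard PCI unit-address encoding (bus in bits 16-23, device in
bits 11-15, function in bits 8-10 of the first cell). The table is copied from
the quoted pci0 node. The struct and function names are invented for this
example.

```c
#include <stddef.h>
#include <stdint.h>

struct imap_entry {
	uint32_t phys_hi;	/* child unit address, first cell (already masked) */
	uint32_t pin;		/* 1 = INTA ... 4 = INTD */
	uint32_t parent_irq;	/* interrupt number at the MPIC */
	uint32_t sense;		/* trigger sense cell, copied verbatim */
};

/* interrupt-map-mask = <0xf800 0x0 0x0 0x7>; the middle cells mask to zero */
#define IMAP_MASK_PHYS_HI	0xf800u
#define IMAP_MASK_PIN		0x7u

/* Entries from the quoted pci0 node. */
static const struct imap_entry pci0_imap[] = {
	{ 0x0000, 1, 0x4, 0x2 },
	{ 0x0000, 2, 0x5, 0x2 },
	{ 0x0000, 3, 0x6, 0x2 },
	{ 0x0000, 4, 0x7, 0x2 },
};

/* Build the first unit-address cell from bus/device/function. */
static uint32_t pci_phys_hi(unsigned int bus, unsigned int dev, unsigned int fn)
{
	return ((bus & 0xffu) << 16) | ((dev & 0x1fu) << 11) | ((fn & 0x7u) << 8);
}

/* Mask the child address and pin, then look for an identical map entry. */
static const struct imap_entry *imap_lookup(const struct imap_entry *map,
					    size_t n, uint32_t phys_hi,
					    uint32_t pin)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((phys_hi & IMAP_MASK_PHYS_HI) == map[i].phys_hi &&
		    (pin & IMAP_MASK_PIN) == map[i].pin)
			return &map[i];
	return NULL;
}
```

With this mask only the device number takes part in the match, so any device
numbered 0 on any bus hits these entries, while a device numbered 1 finds no
entry at all.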

> 			>;
> 		pcie@0 {
> 			reg = <0x0 0x0 0x0 0x0 0x0>;
> 			#size-cells = <2>;
> 			#address-cells = <3>;
> 			device_type = "pci";
> 			ranges = <0x2000000 0x0 0xa0000000
> 				  0x2000000 0x0 0xa0000000
> 				  0x0 0x20000000
> 
> 				  0x1000000 0x0 0x0
> 				  0x1000000 0x0 0x0
> 				  0x0 0x100000>;
> 		};
> 	};
> 
> 	pci1: pcie@ffe0a000 {
> 		cell-index = <2>;
> 		compatible = "fsl,mpc8548-pcie";
> 		device_type = "pci";
> 		#interrupt-cells = <1>;
> 		#size-cells = <2>;
> 		#address-cells = <3>;
> 		reg = <0 0xffe0a000 0 0x1000>;
> 		bus-range = <0 255>;
> 		ranges = <0x2000000 0x0 0xc0000000 0 0xc0000000 0x0 0x20000000
> 			  0x1000000 0x0 0x00000000 0 0xffc20000 0x0 0x10000>;
> 		clock-frequency = <33333333>;
> 		interrupt-parent = <&mpic>;
> 		interrupts = <16 2>;
> 		interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
> 		interrupt-map = <
> 			/* IDSEL 0x0 */
> 			0000 0x0 0x0 0x1 &mpic 0x0 0x1
> 			0000 0x0 0x0 0x2 &mpic 0x1 0x1
> 			0000 0x0 0x0 0x3 &mpic 0x2 0x1
> 			0000 0x0 0x0 0x4 &mpic 0x3 0x1
> 			>;
> 

There are two PCIe controllers, so you also have to find out which controller
your device sits behind.

-Tiejun

> 		pcie@0 {
> 			reg = <0x0 0x0 0x0 0x0 0x0>;
> 			#size-cells = <2>;
> 			#address-cells = <3>;
> 			device_type = "pci";
> 			ranges = <0x2000000 0x0 0xc0000000
> 				  0x2000000 0x0 0xc0000000
> 				  0x0 0x20000000
> 
> 				  0x1000000 0x0 0x0
> 				  0x1000000 0x0 0x0
> 				  0x0 0x100000>;
> 		};
> 	};
> 
> 
> Best Regards,
> Fabian
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: Parsing a bus fault message?
From: tiejun.chen @ 2010-10-08  9:22 UTC (permalink / raw)
  To: david.hagood; +Cc: Scott Wood, linuxppc-dev, Ira W. Snyder
In-Reply-To: <af75f13d9ed8beb29470dffe56d295f5.squirrel@localhost>

david.hagood@gmail.com wrote:
>> Scott Wood wrote:
>> I also meet machine check exception if configure LAW improperly for PCI.
>> (i.e.
>> unmatched PCIe controller id.)
>>
>> From you log looks 0xexxxxxxx should be your PCI space. So you can check
>> if that
>>  fall into appropriate LAW configuration. Maybe you can post your boot log
>> and
>> error log here.
> 
> Using EP8641A machine description
> Total memory = 1024MB; using 2048kB for hash table (at cfa00000)
> Linux version 2.6.26.2-ep1.10 (SRWhite@WIC-102333) (gcc version 4.0.0
> (DENX ELDK 4.1 4.0.0)) #269 SMP PREEMPT Tue Sep 28 15:48:43 CDT 2010
> Found initrd at 0xcfdfa000:0xcffa9663
> Found legacy serial port 0 for /soc8641@e0000000/serial@4500
>   mem=e0004500, taddr=e0004500, irq=0, clk=500000000, speed=0
> Found legacy serial port 1 for /soc8641@e0000000/serial@4600
>   mem=e0004600, taddr=e0004600, irq=0, clk=500000000, speed=0
> CPU maps initialized for 1 thread per core
>  (thread shift is 0)
> console [udbg0] enabled
> Entering add_active_range(0, 0, 262144) 0 entries of 256 used
> EP8641A board from Embedded Planet
> Top of RAM: 0x40000000, Total RAM: 0x40000000
> Memory hole size: 0MB
> Zone PFN ranges:
>   DMA             0 ->   196608
>   Normal     196608 ->   196608
>   HighMem    196608 ->   262144
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     0:        0 ->   262144
> On node 0 totalpages: 262144
>   DMA zone: 1536 pages used for memmap
>   DMA zone: 0 pages reserved
>   DMA zone: 195072 pages, LIFO batch:31
>   Normal zone: 0 pages used for memmap
>   HighMem zone: 512 pages used for memmap
>   HighMem zone: 65024 pages, LIFO batch:15
>   Movable zone: 0 pages used for memmap
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
> Kernel command line: root=/dev/ram rw console=ttyS0,115200
> ip=10.200.120.158:::255.255.0.0::eth0
> mtdparts=physmap-flash.0:0x1300000(linux)ro,0x6bc0000(jffs),-(rsvd)ro
> mpic: Setting up MPIC " MPIC     " version 1.2 at e0040000, max 2 CPUs
> mpic: ISU size: 16, shift: 4, mask: f
> mpic: Initializing for 80 sources
> PID hash table entries: 4096 (order: 12, 16384 bytes)
> time_init: decrementer frequency = 125.000000 MHz
> time_init: processor frequency   = 1500.000000 MHz
> clocksource: timebase mult[2000000] shift[22] registered
> clockevent: decrementer mult[2000] shift[16] cpu[0]
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> High memory: 262144k
> Memory: 1031248k/1048576k available (3064k kernel code, 16476k reserved,
> 84k data, 149k bss, 164k init)
> Calibrating delay loop... 249.85 BogoMIPS (lpj=499712)
> Mount-cache hash table entries: 512
> mpic: requesting IPIs ...
> Processor 1 found.
> Synchronizing timebase
> Got ack
> score 299, offset 1000
> score 299, offset 500
> score 299, offset 250
> score 299, offset 125
> score 299, offset 62
> score 297, offset 31
> score -299, offset 15
> score 297, offset 23
> score 253, offset 19
> score -299, offset 17
> score -269, offset 18
> Min 18 (score -269), Max 19 (score 253)
> Final offset: 19 (269/300)
> clockevent: decrementer mult[2000] shift[16] cpu[1]
> Brought up 2 CPUs
> net_namespace: 208 bytes
> NET: Registered protocol family 16
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 7, 786432 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> NET: Registered protocol family 1
> checking if image is initramfs...it isn't (no cpio magic); looks like an
> initrd
> Freeing initrd memory: 1725k freed
> setup_kcore: restrict size=3fffffff
> of-fsl-dma e0021300.dma: Probe the Freescale DMA driver for
> fsl,eloplus-dma controller at e0021300...
> of-fsl-dma-channel e0021100.dma-channe: #0 (fsl,eloplus-dma-channel), irq 20
> of-fsl-dma-channel e0021180.dma-channe: #1 (fsl,eloplus-dma-channel), irq 21
> of-fsl-dma-channel e0021200.dma-channe: #2 (fsl,eloplus-dma-channel), irq 22
> of-fsl-dma-channel e0021280.dma-channe: #3 (fsl,eloplus-dma-channel), irq 23
> Setting up RapidIO peer-to-peer network /rapidio@e00c0000
> fsl-of-rio e00c0000.rapidio: Of-device full name /rapidio@e00c0000
> fsl-of-rio e00c0000.rapidio: Regs start 0xe00c0000 size 0x20000
> fsl-of-rio e00c0000.rapidio: LAW start 0x00000000c0000000, size
> 0x0000000020000000.
> fsl-of-rio e00c0000.rapidio: bellirq: 50, txirq: 53, rxirq 54
> fsl-of-rio e00c0000.rapidio: RapidIO PHY type: serial
> fsl-of-rio e00c0000.rapidio: Hardware port width: 4
> fsl-of-rio e00c0000.rapidio: Training connection status: Four-lane
> fsl-of-rio e00c0000.rapidio: RapidIO Common Transport System size: 256
> highmem bounce pool size: 64 pages
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> JFFS2 version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
> msgmni has been set to 1507
> io scheduler noop registered (default)
> Generic RTC Driver v1.07
> Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
> serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
> console handover: boot [udbg0] -> real [ttyS0]
> serial8250.0: ttyS1 at MMIO 0xe0004600 (irq = 28) is a 16550A
> brd: module loaded
> loop: module loaded
> Gianfar MII Bus: probed
> eth0: Gianfar Ethernet Controller Version 1.2, 00:10:ec:01:1a:d3
> eth0: Running with NAPI enabled
> eth0: 256/256 RX/TX BD ring size
> eth1: Gianfar Ethernet Controller Version 1.2, 00:10:ec:81:1a:d3
> eth1: Running with NAPI enabled
> eth1: 256/256 RX/TX BD ring size
> eth2: Gianfar Ethernet Controller Version 1.2, 00:10:ec:41:1a:d3
> eth2: Running with NAPI enabled
> eth2: 256/256 RX/TX BD ring size
> eth3: Gianfar Ethernet Controller Version 1.2, 00:10:ec:c1:1a:d3
> eth3: Running with NAPI enabled
> eth3: 256/256 RX/TX BD ring size
> physmap platform flash device: 08000000 at f8000000
> physmap-flash.0: Found 2 x16 devices at 0x0 in 32-bit bank
>  Amd/Fujitsu Extended Query Table at 0x0040
> physmap-flash.0: CFI does not contain boot bank location. Assuming top.
> number of CFI chips: 1
> cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
> 3 cmdlinepart partitions found on MTD device physmap-flash.0
> Creating 3 MTD partitions on "physmap-flash.0":
> 0x00000000-0x01300000 : "linux"
> 0x01300000-0x07ec0000 : "jffs"
> 0x07ec0000-0x08000000 : "rsvd"
> TCP cubic registered
> NET: Registered protocol family 17
> RPC: Registered udp transport module.
> RPC: Registered tcp transport module.
> IP-Config: Complete:
>      device=eth0, addr=10.200.120.158, mask=255.255.0.0, gw=255.255.255.255,
>      host=10.200.120.158, domain=, nis-domain=(none),
>      bootserver=255.255.255.255, rootserver=255.255.255.255, rootpath=
> RAMDISK: Compressed image found at block 0
> VFS: Mounted root (ext2 filesystem).
> Freeing unused kernel memory: 164k init
> PHY: e0024520:10 - Link is Up - 100/Full
> RIO: discover master port 0, RIO0 mport
> rionetlink_init: receive handler registration suceeded!!!!
> rionetlink_init: rio_register_driver suceeded!!!!
> 
> Besides, in my setup, there are 2 LAWS programmed to point at the PEX: one
> mapping A0000000 to BFFFFFFF to the PEX, and one mapping  E2000000 to
> E2FFFFFF. My code directly scans the LAWS and picks the first one it sees
> mapped to the PEX, so it is picking A00000000, and using that. Is there an
> issue with having 2 LAWs mapping the same device to different locations?
> 

I think using two LAWs to map different address ranges for PCIe should be
allowed, since I have seen more than one LAW created for different RAM ranges.
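
As a rough illustration of the two-window setup described above, the sketch
below models the LAWs as plain (base, size, target) records and shows both
"first window for a target" and "window containing an address". This is not
the real LAWBAR/LAWAR register layout. The struct, the function names and the
TARGET_PEX value are assumptions made for the example. Only the two address
ranges come from the quoted mail.

```c
#include <stddef.h>
#include <stdint.h>

#define TARGET_PEX	1u	/* hypothetical target id, not a real LAWAR value */

struct law_window {
	uint32_t base;
	uint32_t size;
	unsigned int target;
};

static const struct law_window laws[] = {
	{ 0xa0000000u, 0x20000000u, TARGET_PEX },	/* A0000000..BFFFFFFF */
	{ 0xe2000000u, 0x01000000u, TARGET_PEX },	/* E2000000..E2FFFFFF */
};

/* What the quoted code does: take the first window routed to the target. */
static const struct law_window *law_first_for_target(const struct law_window *w,
						     size_t n,
						     unsigned int target)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (w[i].target == target)
			return &w[i];
	return NULL;
}

/* Alternative: find the window that actually contains a given address. */
static const struct law_window *law_for_addr(const struct law_window *w,
					     size_t n, uint32_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr - w[i].base < w[i].size)	/* unsigned wrap rejects addr < base */
			return &w[i];
	return NULL;
}
```

Picking the first match is fine as long as the code only ever uses addresses
inside that window. Looking up by address avoids depending on the order in
which the windows were programmed.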

Sorry for the delayed response.

-Tiejun

> 
> Being unfortunately a man of many hats, I haven't had a lot of time today
> to work on this particular fire. Hopefully in the next couple of days I
> can get some more time to look into it more.
> 
> I do thank all of you for the pointers, and I'll look into them.
> 
> (BTW: Anybody near Wichita, and want to earn some extra $$$ helping me out?)
> 
> 
> 

^ permalink raw reply

* Re: Parsing a bus fault message?
From: David Hagood @ 2010-10-08 11:16 UTC (permalink / raw)
  To: tiejun.chen; +Cc: Scott Wood, linuxppc-dev, Ira W. Snyder
In-Reply-To: <4CAEE2EC.2020103@windriver.com>

On Fri, 2010-10-08 at 17:22 +0800, tiejun.chen wrote:
> 
> I think using two LAWs to map different address ranges for PCIe should be
> allowed, since I have seen more than one LAW created for different RAM ranges.
> 
> Sorry for the delayed response.
> 
> -Tiejun

It's my bad, as I'd worked out the problem: the host root complex had to
be forced to rescan the device and enable bus mastering on that side of
things. Once that happened, I could access PCI space.

I should have posted my success.

^ permalink raw reply

* RTC rtc-cmos.c : Fix warning on PowerPC
From: srikanth krishnakar @ 2010-10-08 12:49 UTC (permalink / raw)
  To: Linuxppc-dev, linuxppc-dev; +Cc: Linuxppc-embedded


[-- Attachment #1.1: Type: text/plain, Size: 218 bytes --]

Hello,

I noticed a compilation warning in the RTC driver on PowerPC. The fix already
exists for MIPS but not for PowerPC.

-- 
"The Good You Do, The Best You GET"

Regards
Srikanth Krishnakar
**********************

[-- Attachment #1.2: Type: text/html, Size: 276 bytes --]

[-- Attachment #2: 0001-RTC-rtc-cmos.c-Fix-warning-on-PowerPC.patch --]
[-- Type: text/x-patch, Size: 1218 bytes --]

From 8435e5876fc3d629406c63b85bff82c25fc4bf75 Mon Sep 17 00:00:00 2001
From: Srikanth Krishnakar <skrishna@mvista.com>
Date: Fri, 8 Oct 2010 18:07:06 +0530
Subject: [PATCH] RTC: rtc-cmos.c: Fix warning on PowerPC

The following warning is seen when compiling a PowerPC kernel:

 CC      drivers/rtc/rtc-cmos.o
drivers/rtc/rtc-cmos.c:697:2: warning: #warning Assuming 128 bytes
of RTC+NVRAM address space, not 64 bytes.

Fix it by adding defined(__powerpc__).

Signed-off-by: Srikanth Krishnakar <skrishna@mvista.com>
---
 drivers/rtc/rtc-cmos.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 5856167..7e6ce62 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -687,7 +687,8 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 #if	defined(CONFIG_ATARI)
 	address_space = 64;
 #elif defined(__i386__) || defined(__x86_64__) || defined(__arm__) \
-			|| defined(__sparc__) || defined(__mips__)
+			|| defined(__sparc__) || defined(__mips__) \
+			|| defined(__powerpc__)
 	address_space = 128;
 #else
 #warning Assuming 128 bytes of RTC+NVRAM address space, not 64 bytes.
-- 
1.6.3.3.333.gdf68


^ permalink raw reply related

* Re: [PATCH 1/3] Powerpc/4xx: Add suspend and idle support
From: Josh Boyer @ 2010-10-08 13:35 UTC (permalink / raw)
  To: Victor Gallardo; +Cc: linuxppc-dev
In-Reply-To: <1286478436-9942-1-git-send-email-vgallardo@apm.com>

On Thu, Oct 07, 2010 at 12:07:16PM -0700, Victor Gallardo wrote:
>Add suspend/resume support for all 4xx compatible CPUs.
>See /sys/power/state for available power states configured in.
>
>Add two different idle states (idle-wait and idle-doze)
>controlled via sysfs. Default is idle-wait.
>	cat /sys/devices/system/cpu/cpu0/idle
>	[wait] doze
>
>To save additional power, use idle-doze.
>	echo doze > /sys/devices/system/cpu/cpu0/idle
>	cat /sys/devices/system/cpu/cpu0/idle
>	wait [doze]
>
>Signed-off-by: Victor Gallardo <vgallardo@apm.com>
>---
> Documentation/powerpc/dts-bindings/4xx/cpm.txt |   43 +++
> arch/powerpc/Kconfig                           |   13 +-
> arch/powerpc/platforms/44x/Makefile            |    5 +-
> arch/powerpc/sysdev/Makefile                   |    1 +
> arch/powerpc/sysdev/ppc4xx_cpm.c               |  339 ++++++++++++++++++++++++
> 5 files changed, 397 insertions(+), 4 deletions(-)
> create mode 100644 Documentation/powerpc/dts-bindings/4xx/cpm.txt
> create mode 100644 arch/powerpc/sysdev/ppc4xx_cpm.c
>
>diff --git a/Documentation/powerpc/dts-bindings/4xx/cpm.txt b/Documentation/powerpc/dts-bindings/4xx/cpm.txt
>new file mode 100644
>index 0000000..9635df8
>--- /dev/null
>+++ b/Documentation/powerpc/dts-bindings/4xx/cpm.txt
>@@ -0,0 +1,43 @@
>+PPC4xx Clock Power Management (CPM) node
>+
>+Required properties:
>+	- compatible		: compatible list, currently only "ibm,cpm"
>+	- dcr-access-method	: "native"
>+	- dcr-reg		: < DCR register range >
>+
>+Optional properties:
>+	- er-offset		: All 4xx SoCs with a CPM controller have
>+				  one of two different order for the CPM
>+				  registers. Some have the CPM registers
>+				  in the following order (ER,FR,SR). The
>+				  others have them in the following order
>+				  (SR,ER,FR). For the second case set
>+				  er-offset = <1>.
>+	- unused-units		: specifier consist of one cell. For each
>+				  bit in the cell, the corresponding bit
>+				  in CPM will be set to turn off unused
>+				  devices.
>+	- idle-doze		: specifier consist of one cell. For each
>+				  bit in the cell, the corresponding bit
>+				  in CPM will be set to turn off unused
>+				  devices. This is usually just CPM[CPU].
>+	- standby		: specifier consist of one cell. For each
>+				  bit in the cell, the corresponding bit
>+				  in CPM will be set on standby and
>+				  restored on resume.
>+	- suspend		: specifier consist of one cell. For each
>+				  bit in the cell, the corresponding bit
>+				  in CPM will be set on suspend (mem) and
>+				  restored on resume.

So the difference, from what I can tell, between standby and suspend is
really only what devices are turned off.  I don't see any code to put
the DRAM into self-refresh mode, etc.  If that is the case, perhaps we
could add a bit of description as to the different kinds of devices that
may be disabled in one mode but not the other.

>+
>+Example:
>+        CPM0: cpm {
>+                compatible = "ibm,cpm";
>+                dcr-access-method = "native";
>+                dcr-reg = <0x160 0x003>;
>+		er-offset = <0>;
>+                unused-units = <0x00000100>;
>+                idle-doze = <0x02000000>;
>+                standby = <0xfeff0000>;
>+                standby = <0xfeff791d>;

One of these two should be illustrating suspend.

josh

^ permalink raw reply

* Re: [PATCH 1/3] Powerpc/4xx: Add suspend and idle support
From: Josh Boyer @ 2010-10-08 13:46 UTC (permalink / raw)
  To: Victor Gallardo; +Cc: linuxppc-dev
In-Reply-To: <1286478436-9942-1-git-send-email-vgallardo@apm.com>

On Thu, Oct 07, 2010 at 12:07:16PM -0700, Victor Gallardo wrote:

I hit send too soon on my earlier response.  My apologies.

>--- /dev/null
>+++ b/arch/powerpc/sysdev/ppc4xx_cpm.c
>@@ -0,0 +1,339 @@
>+/*
>+ * PowerPC 4xx Clock and Power Management
>+ *
>+ * Copyright (C) 2010, Applied Micro Circuits Corporation
>+ * Victor Gallardo (vgallardo@apm.com)
>+ *
>+ * Based on arch/powerpc/platforms/44x/idle.c:
>+ * Jerone Young <jyoung5@us.ibm.com>
>+ * Copyright 2008 IBM Corp.
>+ *
>+ * Based on arch/powerpc/sysdev/fsl_pmc.c:
>+ * Anton Vorontsov <avorontsov@ru.mvista.com>
>+ * Copyright 2009  MontaVista Software, Inc.
>+ *
>+ * See file CREDITS for list of people who contributed to this
>+ * project.
>+ *
>+ * This program is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU General Public License as
>+ * published by the Free Software Foundation; either version 2 of
>+ * the License, or (at your option) any later version.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public License
>+ * along with this program; if not, write to the Free Software
>+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
>+ * MA 02111-1307 USA
>+ */
>+
>+#include <linux/kernel.h>
>+#include <linux/of_platform.h>
>+#include <linux/sysfs.h>
>+#include <linux/cpu.h>
>+#include <linux/suspend.h>
>+#include <asm/dcr.h>
>+#include <asm/dcr-native.h>
>+#include <asm/machdep.h>
>+
>+#define CPM_ER	0
>+#define CPM_FR	1
>+#define CPM_SR	2
>+
>+#define CPM_IDLE_WAIT	0
>+#define CPM_IDLE_DOZE	1
>+
>+struct cpm {
>+	dcr_host_t	dcr_host;
>+	unsigned int	dcr_offset[3];
>+	unsigned int	powersave_off;
>+	unsigned int	unused;
>+	unsigned int	idle_doze;
>+	unsigned int	standby;
>+	unsigned int	suspend;
>+};
>+
>+static struct cpm cpm;
>+
>+struct cpm_idle_mode {
>+	unsigned int enabled;
>+	const char  *name;
>+};
>+
>+static struct cpm_idle_mode idle_mode[] = {
>+	[CPM_IDLE_WAIT] = { 1, "wait" }, /* default */
>+	[CPM_IDLE_DOZE] = { 0, "doze" },
>+};
>+
>+static void cpm_set(unsigned int cpm_reg, unsigned int mask)
>+{
>+	unsigned int value;
>+
>+	value = dcr_read(cpm.dcr_host, cpm.dcr_offset[cpm_reg]);
>+	value |= mask;
>+	dcr_write(cpm.dcr_host, cpm.dcr_offset[cpm_reg], value);
>+}

This doesn't seem to do any sort of verification that the class 2 and
class 3 devices actually got disabled.  Or at least I don't see where we
verify anything in the SR.

Maybe in practice it doesn't matter, but we should add a comment that says
we just expect them to eventually go off when they can.
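
If verification were wanted, it could look roughly like the sketch below:
set the bits, then poll the status register a bounded number of times. This is
a user-space model only. The fake_* accessors stand in for dcr_read() and
dcr_write(), and it assumes that a set SR bit means the unit has gone to
sleep, which I have not checked against the manuals.

```c
#include <stdbool.h>
#include <stdint.h>

enum { CPM_ER, CPM_FR, CPM_SR, CPM_NREGS };

/* Stand-in for the DCR-mapped CPM registers; not the real accessors. */
static uint32_t fake_cpm[CPM_NREGS];

static uint32_t fake_dcr_read(unsigned int reg)
{
	return fake_cpm[reg];
}

static void fake_dcr_write(unsigned int reg, uint32_t val)
{
	fake_cpm[reg] = val;
	if (reg == CPM_ER)
		fake_cpm[CPM_SR] = val;	/* model: units follow ER immediately */
}

/* Set mask in the given register, then poll SR up to max_polls times.
 * Returns true once every requested bit shows up in SR. */
static bool cpm_set_and_verify(unsigned int reg, uint32_t mask,
			       unsigned int max_polls)
{
	fake_dcr_write(reg, fake_dcr_read(reg) | mask);

	while (max_polls--) {
		if ((fake_dcr_read(CPM_SR) & mask) == mask)
			return true;
	}
	return false;
}
```

In the model a write to ER shows up in SR at once, while a write to FR never
does, so the second case exercises the timeout path.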

>+
>+static void cpm_idle_wait(void)
>+{
>+	unsigned long msr_save;
>+
>+	/* save off initial state */
>+	msr_save = mfmsr();
>+	/* sync required when CPM0_ER[CPU] is set */
>+	mb();
>+	/* set wait state MSR */
>+	mtmsr(msr_save|MSR_WE|MSR_EE|MSR_CE|MSR_DE);
>+	isync();
>+	/* return to initial state */
>+	mtmsr(msr_save);
>+	isync();
>+}
>+
>+static void cpm_idle_sleep(unsigned int mask)
>+{
>+	unsigned int er_save;
>+
>+	/* update CPM_ER state */
>+	er_save = dcr_read(cpm.dcr_host, cpm.dcr_offset[CPM_ER]);
>+	dcr_write(cpm.dcr_host, cpm.dcr_offset[CPM_ER],
>+		  er_save | mask);
>+
>+	/* go to wait state */
>+	cpm_idle_wait();
>+
>+	/* restore CPM_ER state */
>+	dcr_write(cpm.dcr_host, cpm.dcr_offset[CPM_ER], er_save);
>+}

For my clarification, the CPU is a class 2 device and we expect that the
logic between the CPU and the CPM allows us to do the write to make it
sleep, but not actually sleep until we set the idle state?

josh

^ permalink raw reply

* [PATCH] powerpc/ppc64e: Fix link problem when building ppc64e_defconfig
From: Kumar Gala @ 2010-10-08 16:00 UTC (permalink / raw)
  To: linuxppc-dev

arch/powerpc/platforms/built-in.o:(.toc1+0x18): undefined reference to `__early_start'

This is due to 85xx/smp.c not handling the 64-bit side properly.  We
need to set the entry point for secondary cores on ppc64e to
generic_secondary_smp_init instead of the __early_start that we use on ppc32.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
---
 arch/powerpc/platforms/85xx/smp.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index a6b1065..bd38b6a 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -79,6 +79,7 @@ smp_85xx_kick_cpu(int nr)
 	local_irq_save(flags);
 
 	out_be32(bptr_vaddr + BOOT_ENTRY_PIR, nr);
+#ifdef CONFIG_PPC32
 	out_be32(bptr_vaddr + BOOT_ENTRY_ADDR_LOWER, __pa(__early_start));
 
 	if (!ioremappable)
@@ -88,6 +89,12 @@ smp_85xx_kick_cpu(int nr)
 	/* Wait a bit for the CPU to ack. */
 	while ((__secondary_hold_acknowledge != nr) && (++n < 1000))
 		mdelay(1);
+#else
+	out_be64((u64 *)(bptr_vaddr + BOOT_ENTRY_ADDR_UPPER),
+		__pa((u64)*((unsigned long long *) generic_secondary_smp_init)));
+
+	smp_generic_kick_cpu(nr);
+#endif
 
 	local_irq_restore(flags);
 
-- 
1.7.2.3

^ permalink raw reply related

* [git pull] Please pull powerpc.git merge branch
From: Kumar Gala @ 2010-10-08 16:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Ben,

This isn't critical, but it does make ppc64e_defconfig build cleanly.

- k

The following changes since commit 6b0cd00bc396daf5c2dcf17a8d82055335341f46:

  Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 (2010-10-07 13:59:32 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git merge

Kumar Gala (1):
      powerpc/ppc64e: Fix link problem when building ppc64e_defconfig

 arch/powerpc/platforms/85xx/smp.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

^ permalink raw reply

* Re: powerpc, fs_enet: scanning PHY after Linux is up
From: Grant Likely @ 2010-10-08 17:06 UTC (permalink / raw)
  To: Holger brunck; +Cc: linuxppc-dev, devicetree-discuss, hs, Detlev Zundel, netdev
In-Reply-To: <4CAEDB6A.70600@keymile.com>

On Fri, Oct 08, 2010 at 10:50:50AM +0200, Holger brunck wrote:
> Hi Grant,
> 
> On 10/06/2010 06:52 PM, Grant Likely wrote:
> > On Wed, Oct 6, 2010 at 3:53 AM, Heiko Schocher <hs@denx.de> wrote:
> >>>> So, the question is, is there a possibility to solve this problem?
> >>>>
> >>>> If there is no standard option, what would be with adding a
> >>>> "scan_phy" file in
> >>>>
> >>>> /proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40
> >>>> (or better destination?)
> >>>>
> >>>> which with we could rescan a PHY with
> >>>> "echo addr > /proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40/scan_phy"
> >>>> (so there is no need for using of_find_node_by_path(), as we should
> >>>>  have the associated device node here, and can step through the child
> >>>>  nodes with "for_each_child_of_node(np, child)" and check if reg == addr)
> >>>>
> >>>> or shouldn't be at least, if the phy couldn't be found when opening
> >>>> the port, retrigger a scanning, if the phy now is accessible?
> >>>
> >>> One option would be to still register a phy_device for each phy
> >>> described in the device tree, but defer binding a driver to each phy
> >>> that doesn't respond.  Then at of_phy_find_device() time, if it
> >>
> >> Maybe I didn't get the trick, but the problem is, that
> >> you can't register a phy_device in drivers/of/of_mdio.c
> >> of_mdiobus_register(), if the phy didn't respond with the
> >> phy_id ... and of_phy_find_device() is not (yet) used in fs_enet
> > 
> > I'm suggesting modifying the phy layer so that it is possible to
> > register a phy_device that doesn't (yet) respond.
> > 
> 
> yes this sounds reasonable.
> 
> >>> matches with a phy_device that isn't bound to a driver yet, then
> >>> re-trigger the binding operation.  At which point the phy id can be
> >>> probed and the correct driver can be chosen.  If binding succeeds,
> >>> then return the phy_device handle.  If not, then fail as it currently
> >>> does.
> >>
> >> Wouldn't it be good, just if we need a PHY (on calling fs_enet_open)
> >> to look if there is one?
> >>
> >> Something like that (not tested):
> >>
> >> in drivers/net/fs_enet/fs_enet-main.c in fs_init_phy()
> >> called from fs_enet_open():
> >>
> >> Do first:
> >> phydev =  of_phy_find_device(fep->fpi->phy_node);
> >>
> >> Look if there is a driver (phy_dev->drv == NULL ?)
> >>
> >> If not, call new function
> >> of_mdiobus_register_phy(mii_bus, fep->fpi->phy_node)
> >> see below patch for it.
> >>
> >> If this succeeds, all is OK, and we can use this phy,
> >> else ethernet not work.
> > 
> > I don't like this approach because it muddies the concept of which
> > device is actually responsible for managing the phys on the bus.  Is
> > it managed by the mdio bus device or the Ethernet device?  It also has
> > a potential race condition.  Whereas triggering a late driver bind
> > will be safe.
> > 
> > Alternately, I'd also be okay with a common method to trigger a
> > reprobe of a particular phy from userspace, but I fear that would be a
> > significantly more complex solution.
> > 
> >>
> >> !!just no idea, how to get mii_bus pointer ...
> > 
> > You'd have to get the parent of the phy node, and then loop over all
> > the registered mdio busses looking for a bus that uses that node.
> > 
> 
> You say that you don't like the approach of probing the phy again in fs_enet_open,
> but I currently don't understand what the alternative trigger point for
> rescanning the mdio bus would be.

Same trigger point, but different operation.  At fs_enet_open time,
instead of registering the phy_device, the phy layer could sanity
check the already registered phy_device, and refuse to connect to it
if the phy isn't responding.  If it is responding, then it could
re-attempt binding a phy_driver to it (although I just realized that
this has other problems, such as correct module loading.  See below)
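
Very roughly, and only as a user-space model of the idea (none of these names
exist in the phy layer), the open-time check could behave like this. The
read_id callback stands in for an MDIO ID read, the "bound" flag is a
placeholder for a real driver bind, and the ID 0x00221619 is an arbitrary
example value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct model_phy {
	int addr;
	uint32_t phy_id;
	bool bound;			/* true once a driver has been matched */
	uint32_t (*read_id)(int addr);	/* stand-in for an MDIO ID read */
};

/* Called at open time on an already registered phy. */
static int model_phy_check_on_open(struct model_phy *phy)
{
	uint32_t id = phy->read_id(phy->addr);

	if ((id & 0x1fffffff) == 0x1fffffff)
		return -ENODEV;		/* still silent: refuse to connect */

	phy->phy_id = id;
	phy->bound = true;		/* placeholder for re-attempting the bind */
	return 0;
}

/* Two fake buses: one where nothing answers, one with a live phy. */
static uint32_t model_read_silent(int addr)
{
	(void)addr;
	return 0xffffffffu;
}

static uint32_t model_read_alive(int addr)
{
	(void)addr;
	return 0x00221619u;
}
```

The phy_device stays registered in both cases. Only the connect step fails
while the phy is silent, so the mdio bus remains the owner of the device.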

> I made a first patch that extends the phy_device structure and rescans the mdio bus
> when fs_enet_open is called (because I didn't see a better trigger point). The
> advantage is that the mii_bus pointer and the phy addr are stored in the
> already created phy_device structure and are therefore easy to use. See the patch
> below for these modifications. What's currently missing in the patch is setting the
> phy_id if the phy was scanned after phy_device creation. For the mgcoge
> board it seems to solve our problem, but maybe I am missing something important.
> 
> Best regards
> Holger Brunck
> 
> diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c
> index ec2f503..6bc117f 100644
> --- a/drivers/net/fs_enet/fs_enet-main.c
> +++ b/drivers/net/fs_enet/fs_enet-main.c
> @@ -775,7 +774,8 @@ static int fs_enet_open(struct net_device *dev)
>  {
>         struct fs_enet_private *fep = netdev_priv(dev);
>         int r;
> -       int err;
> +       int err = 0;
> +       u32 phy_id = 0;
> 
>         /* to initialize the fep->cur_rx,... */
>         /* not doing this, will cause a crash in fs_enet_rx_napi */
> @@ -795,13 +795,23 @@ static int fs_enet_open(struct net_device *dev)
>                 return -EINVAL;
>         }
> 
> -       err = fs_init_phy(dev);
> -       if (err) {
> +       if (fep->phydev == NULL)
> +               err = fs_init_phy(dev);
> +
> +       if (!err && (fep->phydev->available == false))
> +               r = get_phy_id(fep->phydev->bus, fep->phydev->addr, &phy_id);
> +
> +       if (err || (phy_id == 0xffffffff)) {
>                 free_irq(fep->interrupt, dev);
>                 if (fep->fpi->use_napi)
>                         napi_disable(&fep->napi);
> -               return err;
> +               if (err)
> +                       return err;
> +               else
> +                       return -EINVAL;
>         }
> +       else
> +               fep->phydev->available = true;
>         phy_start(fep->phydev);
> 
>         netif_start_queue(dev);
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index adbc0fd..1f443cb 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -173,6 +173,10 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
>         dev->dev.bus = &mdio_bus_type;
>         dev->irq = bus->irq != NULL ? bus->irq[addr] : PHY_POLL;
>         dev_set_name(&dev->dev, PHY_ID_FMT, bus->id, addr);
> +       if (phy_id == 0xffffffff)
> +               dev->available = false;
> +       else
> +               dev->available = true;

This flag shouldn't be necessary.  Just check whether or not
phy_device->phy_id is sane at phy_attach_direct() time.  If it is
mostly f's, then don't attach.
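
As a rough illustration of that check, here is a minimal userspace sketch.
The mask is the one used by the get_phy_device() test quoted further down
in this mail; the helper names phy_id_is_absent() and phy_attach_check()
are hypothetical and are not part of phylib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask from the "mostly Fs" test in get_phy_device(), as quoted in this thread. */
#define PHY_ID_ABSENT_MASK 0x1fffffffu

/* Hypothetical helper: true when the ID registers read back as mostly Fs. */
static bool phy_id_is_absent(uint32_t phy_id)
{
	return (phy_id & PHY_ID_ABSENT_MASK) == PHY_ID_ABSENT_MASK;
}

/* Hypothetical attach-time gate: 0 means proceed, -1 means refuse to attach. */
static int phy_attach_check(uint32_t phy_id)
{
	return phy_id_is_absent(phy_id) ? -1 : 0;
}

/* Small self-check covering the cases discussed above. */
static void phy_id_selftest(void)
{
	assert(phy_id_is_absent(0xffffffffu));	/* bus read returned all ones */
	assert(phy_id_is_absent(0x3fffffffu));	/* top bits are ignored */
	assert(!phy_id_is_absent(0x1ffffffeu));	/* one low bit clear: a device answered */
	assert(!phy_id_is_absent(0x12345678u));	/* arbitrary non-F value */
}
```

With a test like this at attach time, the extra "available" flag would
carry no information beyond what phy_id already holds.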

> 
>         dev->state = PHY_DOWN;
> 
> @@ -232,13 +236,11 @@ struct phy_device * get_phy_device(struct mii_bus *bus, int addr)
>         int r;
> 
>         r = get_phy_id(bus, addr, &phy_id);
> -       if (r)
> -               return ERR_PTR(r);
> 
>         /* If the phy_id is mostly Fs, there is no device there */
> -       if ((phy_id & 0x1fffffff) == 0x1fffffff)
> -               return NULL;
> -
> +       if (((phy_id & 0x1fffffff) == 0x1fffffff) || r)
> +               phy_id = 0xffffffff;
> +       /* create phy even if the phy is currently not available */
>         dev = phy_device_create(bus, addr, phy_id);

Cannot do it this way because many phylib users probe the bus for phys
instead of the explicit creation used with the device tree.  There
needs to be a method to explicitly skip this test when creating a phy;
possibly by having the device tree code call phy_device_create()
directly.

Hmmm.... I see another problem.  Deferred probing of the phy will
potentially cause problems with module loading.  If the binding is
deferred to phy connect time; then the phy driver may not have time to
get loaded before the phy layer decides there is no driver and binds
it to the generic one.  Blech.

Okay, so it seems like a method of explicitly triggering a phy_device
rebind from userspace is necessary.  This could be done with a
per-phy_device sysfs file I suppose.  Just an empty file that when
read triggers a re-read of the phy id registers, and retries binding a
driver, including the request_module call in phy_device_create().

> 
>         return dev;
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 6a7eb40..12dc3e4 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -303,6 +303,9 @@ struct phy_device {
> 
>         int link_timeout;
> 
> +       /* Flag to support delayed availability */
> +       bool available;
> +
>         /*
>          * Interrupt number for this PHY
>          * -1 means no interrupt
> 

^ permalink raw reply

* [RFC PATCH 05/11] ppc: do not search for dma-window property on dlpar remove
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev
In-Reply-To: <1286559192-10898-1-git-send-email-nacc@us.ibm.com>

The iommu_table pointer in the pci auxiliary struct of device_node has
not been used by the iommu ops since the dma refactor of
12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
it to find tables for dlpar. By only setting the PCI_DN iommu_table
pointer on nodes with dma window properties, we will be able to quickly
find the node for later checks, and can remove the table without looking
for the dma window property on dlpar remove.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9184db3..8ab32da 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -455,9 +455,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
 		pr_debug("  created table: %p\n", ppci->iommu_table);
 	}
-
-	if (pdn != dn)
-		PCI_DN(dn)->iommu_table = ppci->iommu_table;
 }
 
 
@@ -571,8 +568,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
-		if (pci && pci->iommu_table &&
-		    of_get_property(np, "ibm,dma-window", NULL))
+		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
 		break;
 	default:
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH 08/11] ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev
In-Reply-To: <1286559192-10898-1-git-send-email-nacc@us.ibm.com>

The work done in pci_dma_bus_setup_pSeriesLP will be done in
pci_dma_dev_setup_pSeriesLP, and therefore we can remove the bus setup
function for lpar.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |   43 --------------------------------
 1 files changed, 0 insertions(+), 43 deletions(-)
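
For readers unfamiliar with the lookup the removed function performed, the
hunk below shows it walking from the bus node toward the root until a node
with "ibm,dma-window" is found. A minimal userspace sketch of that walk
follows; struct toy_node is a hypothetical stand-in for struct device_node,
and nearest_dma_window() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device_node; not the kernel type. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
	const void *dma_window;	/* non-NULL when "ibm,dma-window" is present */
};

/* Return the nearest node (self or ancestor) that carries a DMA window. */
static struct toy_node *nearest_dma_window(struct toy_node *dn)
{
	struct toy_node *pdn;

	for (pdn = dn; pdn != NULL; pdn = pdn->parent)
		if (pdn->dma_window != NULL)
			return pdn;
	return NULL;
}

/* Build a three-level toy tree and check the walk. */
static void nearest_dma_window_selftest(void)
{
	static const int window = 1;	/* placeholder property payload */
	struct toy_node phb = { "phb", NULL, NULL };
	struct toy_node pe = { "pe", &phb, &window };
	struct toy_node dev = { "dev", &pe, NULL };
	struct toy_node orphan = { "orphan", &phb, NULL };

	assert(nearest_dma_window(&dev) == &pe);	/* found on the parent */
	assert(nearest_dma_window(&pe) == &pe);		/* found on the node itself */
	assert(nearest_dma_window(&orphan) == NULL);	/* no window up to the root */
}
```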

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9d564b9..d8bb9be 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -417,47 +417,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
 }
 
-
-static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
-{
-	struct iommu_table *tbl;
-	struct device_node *dn, *pdn;
-	struct pci_dn *ppci;
-	const void *dma_window = NULL;
-
-	dn = pci_bus_to_OF_node(bus);
-
-	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %s\n",
-		 dn->full_name);
-
-	/* Find nearest ibm,dma-window, walking up the device tree */
-	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
-		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
-		if (dma_window != NULL)
-			break;
-	}
-
-	if (dma_window == NULL) {
-		pr_debug("  no ibm,dma-window property !\n");
-		return;
-	}
-
-	ppci = PCI_DN(pdn);
-
-	pr_debug("  parent is %s, iommu_table: 0x%p\n",
-		 pdn->full_name, ppci->iommu_table);
-
-	if (!ppci->iommu_table) {
-		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-				   ppci->phb->node);
-		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window,
-			bus->number);
-		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
-		pr_debug("  created table: %p\n", ppci->iommu_table);
-	}
-}
-
-
 static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 {
 	struct device_node *dn;
@@ -547,7 +506,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
 #define pci_dma_dev_setup_pSeries	NULL
-#define pci_dma_bus_setup_pSeriesLP	NULL
 #define pci_dma_dev_setup_pSeriesLP	NULL
 #endif /* !CONFIG_PCI */
 
@@ -588,7 +546,6 @@ void iommu_init_early_pSeries(void)
 			ppc_md.tce_free	 = tce_free_pSeriesLP;
 		}
 		ppc_md.tce_get   = tce_get_pSeriesLP;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
 		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
 	} else {
 		ppc_md.tce_build = tce_build_pSeries;
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH 11/11] ppc: add dynamic dma window support
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev
In-Reply-To: <1286559192-10898-1-git-send-email-nacc@us.ibm.com>

If firmware allows us to map all of a partition's memory for DMA on a
particular bridge, create a 1:1 mapping of that memory. Add hooks for
dealing with hotplug events. Dynamic DMA windows can use page sizes larger
than the default, and we use the largest one possible.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |  319 +++++++++++++++++++++++++++++++-
 1 files changed, 315 insertions(+), 4 deletions(-)
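
The sizing decisions in check_ddr_windowLP() can be tried outside the kernel.
The sketch below mirrors the page-size selection and the coverage test from
the diff that follows; the function names are hypothetical, and
order_base_2_u64() is a plain-C approximation of the kernel's order_base_2()
written for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the largest page shift advertised in the page-size mask, or -1. */
static int ddw_pick_page_shift(uint32_t mask)
{
	if (mask & 4)
		return 24;	/* 16MB */
	if (mask & 2)
		return 16;	/* 64kB */
	if (mask & 1)
		return 12;	/* 4kB */
	return -1;
}

/*
 * Mirrors the patch's test: the largest block must hold at least
 * max_addr >> page_shift pages for the window to map the partition.
 */
static int ddw_covers(uint32_t max_blocks, uint64_t max_addr, int page_shift)
{
	return max_blocks >= (max_addr >> page_shift);
}

/* Smallest order such that (1 << order) >= n; returns 0 for n <= 1. */
static unsigned int order_base_2_u64(uint64_t n)
{
	unsigned int order = 0;

	while (order < 64 && (UINT64_C(1) << order) < n)
		order++;
	return order;
}

/* Worked example: a 16GB partition with 16MB pages needs 1024 entries. */
static void ddw_selftest(void)
{
	uint64_t max_addr = UINT64_C(1) << 34;	/* 16GB */

	assert(ddw_pick_page_shift(7) == 24);
	assert(ddw_pick_page_shift(3) == 16);
	assert(ddw_pick_page_shift(1) == 12);
	assert(ddw_pick_page_shift(0) == -1);
	assert(ddw_covers(1024, max_addr, 24));
	assert(!ddw_covers(1023, max_addr, 24));
	assert(order_base_2_u64(max_addr) == 34);
}
```

Checking the largest size first is what makes the window use the fewest
TCEs; the window shift passed to firmware is then the rounded-up log2 of
the highest address that memory hotplug can reach.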

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 451d2d1..23ca0d1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/dma-mapping.h>
 #include <linux/crash_dump.h>
+#include <linux/memory.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -45,6 +46,7 @@
 #include <asm/tce.h>
 #include <asm/ppc-pci.h>
 #include <asm/udbg.h>
+#include <asm/mmzone.h>
 
 #include "plpar_wrappers.h"
 
@@ -278,10 +280,19 @@ struct dynamic_dma_window_prop {
 	__be32	window_shift;	/* ilog2(tce_window_size) */
 };
 
+struct direct_window {
+	struct device_node *device;
+	const struct dynamic_dma_window_prop *prop;
+	struct list_head list;
+};
+static LIST_HEAD(direct_window_list);
+static DEFINE_SPINLOCK(direct_window_list_lock);
+#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
-					unsigned long num_pfn, void *arg)
+					unsigned long num_pfn, const void *arg)
 {
-	struct dynamic_dma_window_prop *maprange = arg;
+	const struct dynamic_dma_window_prop *maprange = arg;
 	int rc;
 	u64 tce_size, num_tce, dma_offset;
 	u32 tce_shift;
@@ -305,9 +316,9 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 }
 
 static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
-					unsigned long num_pfn, void *arg)
+					unsigned long num_pfn, const void *arg)
 {
-	struct dynamic_dma_window_prop *maprange = arg;
+	const struct dynamic_dma_window_prop *maprange = arg;
 	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce;
 	u32 tce_shift;
 	long rc = 0;
@@ -368,6 +379,12 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 	return rc;
 }
 
+static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
+					unsigned long num_pfn, void *arg)
+{
+	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
+}
+
 #ifdef CONFIG_PCI
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
@@ -553,6 +570,246 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		       pci_name(dev));
 }
 
+/*
+ * If the PE supports dynamic dma windows, and there is space for a table
+ * that can map all pages in a linear offset, then setup such a table,
+ * and record the dma-offset in the struct device.
+ *
+ * dev: the pci device we are checking
+ * pdn: the parent pe node with the ibm,dma-window property
+ * Future: also check if we can remap the base window for our base page size
+ */
+static void check_ddr_windowLP(struct pci_dev *dev, struct device_node *pdn)
+{
+	int len, ret;
+	u32 query[4], create[3], cfg_addr;
+	int page_shift;
+	u64 dma_addr, buid, max_addr;
+	struct pci_dn *pcidn;
+	const u32 *uninitialized_var(ddr_avail);
+	struct direct_window *window;
+	struct property *uninitialized_var(win64);
+	struct dynamic_dma_window_prop *ddwprop;
+	const struct dynamic_dma_window_prop *direct64;
+
+	spin_lock(&direct_window_list_lock);
+
+	/* check if we already created a window */
+	list_for_each_entry(window, &direct_window_list, list) {
+		if (window->device == pdn) {
+			direct64 = window->prop;
+			goto set_device;
+		}
+	}
+	/* check if we kexec'd with a window */
+	direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
+	if (direct64)
+		goto create_window_listent;
+
+	ddr_avail = of_get_property(pdn, "ibm,ddw-applicable", &len);
+
+	if (!ddr_avail || len < 4 * sizeof(u32))
+		goto out_unlock;
+	/*
+	 * the ibm,ddw-applicable property holds the tokens for:
+	 * ibm,query-pe-dma-window
+	 * ibm,create-pe-dma-window
+	 * ibm,remove-pe-dma-window
+	 * for the given node in that order.
+	 *
+	 * Query if there is a second window of size to map the
+	 * whole partition.  Query returns number of windows, largest
+	 * block assigned to PE (partition endpoint), and two bitmasks
+	 * of page sizes: supported and supported for migrate-dma.
+	 */
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the node with the dma window property.
+	 */
+	pcidn = PCI_DN(pdn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+	ret = rtas_call(ddr_avail[0], 3, 5, &query[0],
+		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
+	if (ret != 0) {
+		dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
+			" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
+			BUID_LO(buid), ret);
+		goto out_unlock;
+	}
+
+	if (!query[0]) {
+		/*
+		 * no additional windows are available for this device.
+		 * We might be able to reallocate the existing window,
+		 * trading in for a larger page size.
+		 */
+		dev_dbg(&dev->dev, "no free dynamic windows\n");
+		goto out_unlock;
+	}
+	if (query[2] & 4) {
+		page_shift = 24; /* 16MB */
+	} else if (query[2] & 2) {
+		page_shift = 16; /* 64kB */
+	} else if (query[2] & 1) {
+		page_shift = 12; /* 4kB */
+	} else {
+		dev_dbg(&dev->dev, "no supported direct page size in mask %x\n",
+			  query[2]);
+		goto out_unlock;
+	}
+	/* verify the window * number of ptes will map the partition */
+	/* check largest block * page size > max memory hotplug addr */
+	max_addr = memory_hotplug_max();
+	if (query[1] < (max_addr >> page_shift)) {
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+			  "%llu-sized pages\n", max_addr,  query[1],
+			  1ULL << page_shift);
+		goto out_unlock;
+	}
+	len = order_base_2(max_addr);
+	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
+	if (!win64) {
+		dev_info(&dev->dev,
+			"couldn't allocate property for 64bit dma window\n");
+		goto out_unlock;
+	}
+	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
+	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
+	if (!win64->name || !win64->value) {
+		dev_info(&dev->dev,
+			"couldn't allocate property name and value\n");
+		goto out_free_prop;
+	}
+	do {
+		/* extra outputs are LIOBN and dma-addr (hi, lo) */
+		ret = rtas_call(ddr_avail[1], 7, 4, &create[0], cfg_addr,
+				BUID_HI(buid), BUID_LO(buid), len, page_shift);
+	} while(rtas_busy_delay(ret));
+	if (ret) {
+		dev_info(&dev->dev,
+			"failed to create direct window: rtas returned %d"
+			" to ibm,create-pe-dma-window(%x) %x %x %x %x %x\n",
+			ret, ddr_avail[1], cfg_addr, BUID_HI(buid),
+			BUID_LO(buid), len, page_shift);
+		goto out_free_prop;
+	}
+
+	*ddwprop = (struct dynamic_dma_window_prop) {
+		.liobn = cpu_to_be32(create[0]),
+		.dma_base = {cpu_to_be32(create[1]), cpu_to_be32(create[2])},
+		.tce_shift = cpu_to_be32(page_shift),
+		.window_shift = cpu_to_be32(len)
+	};
+
+	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
+		  create[0], pdn->full_name);
+
+	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+			win64->value, tce_setrange_multi_pSeriesLP_walk);
+	if (ret) {
+		dev_info(&dev->dev, "failed to map direct window for %s\n",
+			 pdn->full_name);
+
+		goto out_clear_window;
+	}
+
+	ret = prom_add_property(pdn, win64);
+	if (ret) {
+		pr_err("%s: unable to add dma window property: %d\n",
+			 pdn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	direct64 = ddwprop;
+
+create_window_listent:
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		goto out_clear_window;
+	window->device = pdn;
+	window->prop = direct64;
+	list_add(&window->list, &direct_window_list);
+
+set_device:
+	dma_addr = of_read_number(&direct64->dma_base[0], 2);
+	set_dma_offset(&dev->dev, dma_addr);
+	set_dma_ops(&dev->dev, &dma_choose64_ops);
+
+	dev_dbg(&dev->dev, "Can use direct dma at %s (offset %llx)\n",
+		pdn->full_name, dma_addr);
+
+out_unlock:
+	spin_unlock(&direct_window_list_lock);
+	return;
+
+out_clear_window:
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		memblock_end_of_DRAM() >> PAGE_SHIFT, win64->value);
+	if (ret)
+		dev_info(&dev->dev,
+			"failed to clear partial window for %s\n",
+			 pdn->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, direct64->liobn);
+	if (ret) {
+		dev_info(&dev->dev,
+			"failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %x\n",
+			ret, ddr_avail[2], direct64->liobn);
+	}
+
+out_free_prop:
+	kfree(win64->name);
+	kfree(win64->value);
+	kfree(win64);
+
+	goto out_unlock;
+}
+
+#if 1 //def CLEAN_WINDOW_ON_REMOVE
+static void remove_ddr_windowLP(struct device_node *np)
+{
+	struct dynamic_dma_window_prop *dwp;
+	struct property *win64;
+	const u32 *ddr_avail;
+	int len, ret;
+
+	ddr_avail = of_get_property(np, "ibm,ddw-applicable", &len);
+
+	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
+
+	if (!win64 || !ddr_avail || len < 4 * sizeof(u32))
+		return;
+
+	dwp = win64->value;
+
+	/* clear the whole window, note the arg is in kernel pages */
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		1ULL << (dwp->window_shift - PAGE_SHIFT), dwp);
+	if (ret)
+		pr_warning("%s failed to clear tces in window.\n",
+			 np->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, dwp->liobn);
+	if (ret)
+		pr_warning("%s: failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %x\n",
+			np->full_name, ret, ddr_avail[2], dwp->liobn);
+
+	ret = prom_remove_property(np, win64);
+	if (ret)
+		pr_warning("%s: failed to remove direct window property (%i)\n",
+			np->full_name, ret);
+}
+#else
+static void remove_ddr_windowLP(struct device_node *np) {}
+#endif
+
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 {
 	struct device_node *pdn, *dn;
@@ -598,6 +855,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 
 	set_iommu_table_base(&dev->dev, pci->iommu_table);
+	check_ddr_windowLP(dev, pdn);
 }
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
@@ -605,16 +863,68 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 #define pci_dma_dev_setup_pSeriesLP	NULL
 #endif /* !CONFIG_PCI */
 
+static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data)
+{
+	struct direct_window *window;
+	struct memory_notify *arg = data;
+	int ret = 0;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	case MEM_CANCEL_ONLINE:
+	case MEM_OFFLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	default:
+		break;
+	}
+	if (ret && action != MEM_CANCEL_ONLINE)
+		return NOTIFY_BAD;
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block iommu_mem_nb = {
+	.notifier_call = iommu_mem_notifier,
+};
+
 static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
 {
 	int err = NOTIFY_OK;
 	struct device_node *np = node;
 	struct pci_dn *pci = PCI_DN(np);
+	struct direct_window *window;
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
 		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
+
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			if (window->device == np) {
+				list_del(&window->list);
+				break;
+			}
+		}
+		spin_unlock(&direct_window_list_lock);
+
+		remove_ddr_windowLP(np);
 		break;
 	default:
 		err = NOTIFY_DONE;
@@ -653,6 +963,7 @@ void iommu_init_early_pSeries(void)
 
 
 	pSeries_reconfig_notifier_register(&iommu_reconfig_nb);
+	register_memory_notifier(&iommu_mem_nb);
 
 	set_pci_dma_ops(&dma_iommu_ops);
 }
-- 
1.7.1

^ permalink raw reply related

