LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH] RapidIO: documentation update
From: Micha Nelissen @ 2011-09-17 10:05 UTC
  To: Alexandre Bounine; +Cc: Liu Gang, akpm, linuxppc-dev, linux-kernel
In-Reply-To: <1316092759-5460-1-git-send-email-alexandre.bounine@idt.com>

Alexandre Bounine wrote:
>  After the host has completed enumeration of the entire network it releases
>  devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
> -in the system, it sets the Master Enable bit in the Port General Control CSR
> +in the system, it sets the Discovered bit in the Port General Control CSR
>  to indicate that enumeration is completed and agents are allowed to execute
>  passive discovery of the network.

The host needs to set both. Without Master Enable an agent is not
supposed to initiate transactions on the "bus", that's the meaning of
that bit.

Micha

* mount file system using offset.
From: F. Heitkamp @ 2011-09-18 19:21 UTC
  To: linuxppc-dev

Hi,

First of all please accept my apologies for posting to this list on this 
particular topic, but it was the closest match I could find.

I pulled the harddisk out of my ps3, because I have forgotten the root 
password on the linux partition.

After spending some time googling I see that the harddisk is encrypted 
and there are no linux utilities that can read/write it.

However I did find that the "testdisk" program finds the linux 
partitions on the disk (see below.).

It seems I should be able to mount the partitions on my linux pc box 
somehow since I now know their location on the disk.

Does anyone know how one might go about that?

Thanks!

Fred

TestDisk 6.12, Data Recovery Utility, May 2011
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Disk /dev/sdc - 80 GB / 74 GiB - CHS 9729 255 63

The harddisk (80 GB / 74 GiB) seems too small! (< 13 TB / 11 TiB)
Check the harddisk size: HD jumpers settings, BIOS detection...

The following partition can't be recovered:
      Partition               Start        End    Size in sectors
 >  Linux SWAP 2          9675 127 42 1593896 192  5 25450514424


TestDisk 6.12, Data Recovery Utility, May 2011
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Disk /dev/sdc - 80 GB / 74 GiB - CHS 9729 255 63
      Partition               Start        End    Size in sectors
 >P ext3                  1566 128 42  1579 127 35     208776 [/boot]
  P ext3                  1579 127 42  9675 127 41  130062240 [/]

* Re: mount file system using offset.
From: Geert Uytterhoeven @ 2011-09-18 20:06 UTC
  To: F. Heitkamp; +Cc: linuxppc-dev
In-Reply-To: <4E7644C9.4070802@ameritech.net>

On Sun, Sep 18, 2011 at 21:21, F. Heitkamp <heitkamp@ameritech.net> wrote:
> First of all please accept my apologies for posting to this list on this
> particular topic, but it was the closest match I could find.
>
> I pulled the harddisk out of my ps3, because I have forgotten the root
> password on the linux partition.
>
> After spending some time googling I see that the harddisk is encrypted and
> there are no linux utilities that can read/write it.
>
> However I did find that the "testdisk" program finds the linux partitions on
> the disk (see below.).
>
> It seems I should be able to mount the partitions on my linux pc box somehow
> since I now know their location on the disk.
>
> Does anyone know how one might go about that?

Now you know the offset of the Linux "partition" on the disk (i.e. the part of
the disk that was visible to Linux), you can use dm-linear to map this part
onto a new block device.
After that you can use kpartx to create block devices representing the
individual partitions on the above new block device.
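
For illustration only (this is not part of Geert's reply): dm-linear, losetup
and mount all want a linear offset, while TestDisk reports CHS triples. A
minimal sketch of the conversion follows, assuming the geometry TestDisk
printed (255 heads, 63 sectors per track) and 512-byte sectors. The function
names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry reported by TestDisk for /dev/sdc: CHS 9729 255 63. */
#define HEADS_PER_CYLINDER 255u
#define SECTORS_PER_TRACK   63u
#define SECTOR_SIZE        512u	/* assumption: 512-byte sectors */

/* Convert a cylinder/head/sector triple to a 0-based LBA sector number. */
uint64_t chs_to_lba(uint64_t c, uint64_t h, uint64_t s)
{
	assert(h < HEADS_PER_CYLINDER);
	assert(s >= 1 && s <= SECTORS_PER_TRACK);	/* CHS sectors count from 1 */
	return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1);
}

/* Byte offset of the same position, for tools that take bytes. */
uint64_t chs_to_byte_offset(uint64_t c, uint64_t h, uint64_t s)
{
	return chs_to_lba(c, h, s) * SECTOR_SIZE;
}
```

With the /boot entry (start 1566 128 42, end 1579 127 35) this gives a start
sector of 25165895, and end minus start plus one comes to 208776 sectors,
which agrees with the size TestDisk printed. The sector number is the unit a
dm-linear table expects, and the byte offset is what "losetup -o" or
"mount -o loop,offset=" take. Whether the data at that offset is readable
still depends on the encryption question raised in the original post.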

> TestDisk 6.12, Data Recovery Utility, May 2011
> Christophe GRENIER <grenier@cgsecurity.org>
> http://www.cgsecurity.org
>
> Disk /dev/sdc - 80 GB / 74 GiB - CHS 9729 255 63
>
> The harddisk (80 GB / 74 GiB) seems too small! (< 13 TB / 11 TiB)
> Check the harddisk size: HD jumpers settings, BIOS detection...
>
> The following partition can't be recovered:
>     Partition               Start        End    Size in sectors
>>  Linux SWAP 2          9675 127 42 1593896 192  5 25450514424
>
>
> TestDisk 6.12, Data Recovery Utility, May 2011
> Christophe GRENIER <grenier@cgsecurity.org>
> http://www.cgsecurity.org
>
> Disk /dev/sdc - 80 GB / 74 GiB - CHS 9729 255 63
>     Partition               Start        End    Size in sectors
>>P ext3                  1566 128 42  1579 127 35     208776 [/boot]
>  P ext3                  1579 127 42  9675 127 41  130062240 [/]

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: mount file system using offset.
From: Christoph Hellwig @ 2011-09-18 22:28 UTC
  To: F. Heitkamp; +Cc: linuxppc-dev
In-Reply-To: <4E7644C9.4070802@ameritech.net>

On Sun, Sep 18, 2011 at 03:21:45PM -0400, F. Heitkamp wrote:
> Hi,
>
> First of all please accept my apologies for posting to this list on this 
> particular topic, but it was the closest match I could find.
>
> I pulled the harddisk out of my ps3, because I have forgotten the root 
> password on the linux partition.

Why doesn't init=/bin/sh work on the ps3?

* RE: [PATCH] RapidIO: documentation update
From: Bounine, Alexandre @ 2011-09-19  0:20 UTC
  To: Micha Nelissen; +Cc: Liu Gang, akpm, linuxppc-dev, linux-kernel
In-Reply-To: <4E7470E6.2000404@neli.hopto.org>

Micha Nelissen wrote:
>
> Alexandre Bounine wrote:
> >  After the host has completed enumeration of the entire network it releases
> >  devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
> > -in the system, it sets the Master Enable bit in the Port General Control CSR
> > +in the system, it sets the Discovered bit in the Port General Control CSR
> >  to indicate that enumeration is completed and agents are allowed to execute
> >  passive discovery of the network.
>
> The host needs to set both. Without Master Enable an agent is not
> supposed to initiate transactions on the "bus", that's the meaning of
> that bit.

This is correct, and the host sets both bits: Master_Enable and Discovered.
The documentation change above relates to the event that triggers the
discovery process.
In this context the change shown above is sufficient.
Please see the original code patch submitted by Liu Gang.
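
As a rough illustration of the point both mails agree on (not code from the
patch): the host sets Master Enable and Discovered together, and an agent
keys its passive discovery off Discovered. The bit values below are as I
recall them from include/linux/rio_regs.h and should be checked there. The
two helper functions are hypothetical and only model the register value.

```c
#include <stdint.h>

/* Port General Control CSR bits; values recalled from rio_regs.h, verify there. */
#define RIO_PORT_GEN_MASTER      0x40000000u	/* Master Enable */
#define RIO_PORT_GEN_DISCOVERED  0x20000000u	/* Discovered */

/* Hypothetical helper: CSR value the host writes when releasing an endpoint. */
uint32_t port_gen_ctl_release(uint32_t csr)
{
	return csr | RIO_PORT_GEN_MASTER | RIO_PORT_GEN_DISCOVERED;
}

/* Hypothetical helper: an agent may start passive discovery once this is true. */
int port_gen_ctl_discovered(uint32_t csr)
{
	return (csr & RIO_PORT_GEN_DISCOVERED) != 0;
}
```

An endpoint that has only Master Enable set would still report "not
discovered" here, which matches the documentation change: Discovered is the
trigger, Master Enable is the permission to initiate transactions.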

Alex.

* Re: [PATCH 2/2] [PowerPC Book3E] Introduce new ptrace debug feature flag
From: David Gibson @ 2011-09-19  1:10 UTC
  To: Thiago Jung Bauermann; +Cc: linuxppc-dev, K.Prasad, Edjunior Barbosa Machado
In-Reply-To: <1314750461.20347.1.camel@hactar>

On Tue, Aug 30, 2011 at 09:27:41PM -0300, Thiago Jung Bauermann wrote:
> On Fri, 2011-08-26 at 14:41 +1000, David Gibson wrote:
> > On Wed, Aug 24, 2011 at 09:41:43PM -0300, Thiago Jung Bauermann wrote:
> > > On Wed, 2011-08-24 at 14:00 +1000, David Gibson wrote:
> > > > On Tue, Aug 23, 2011 at 02:57:56PM +0530, K.Prasad wrote:
> > > > > On Tue, Aug 23, 2011 at 03:09:31PM +1000, David Gibson wrote:
> > > > > > On Fri, Aug 19, 2011 at 01:23:38PM +0530, K.Prasad wrote:
> > > > > > > 
> > > > > > > While PPC_PTRACE_SETHWDEBUG ptrace flag in PowerPC accepts
> > > > > > > PPC_BREAKPOINT_MODE_EXACT mode of breakpoint, the same is not intimated to the
> > > > > > > user-space debuggers (like GDB) who may want to use it. Hence we introduce a
> > > > > > > new PPC_DEBUG_FEATURE_DATA_BP_EXACT flag which will be populated on the
> > > > > > > "features" member of "struct ppc_debug_info" to advertise support for the
> > > > > > > same on Book3E PowerPC processors.
> > > > > > 
> > > > > > I thought the idea was that the BP_EXACT mode was the default - if the
> > > > > > new interface was supported at all, then BP_EXACT was always
> > > > > > supported.  So, why do you need a new flag?
> > > > > > 
> > > > > 
> > > > > Yes, BP_EXACT was always supported but not advertised through
> > > > > PPC_PTRACE_GETHWDBGINFO. We're now doing that.
> > > > 
> > > > I can see that.  But you haven't answered why.
> > > 
> > > BookS doesn't support BP_EXACT, that's why I suggested this flag.
> > 
> > Surely you can support it with exactly the same sort of filtering
> > you're using for the 8-byte ranges now?
> 
> Yes, but to detect that the processor doesn't support BP_EXACT in
> hardware I'd have to send a ptrace request, and have it rejected. Only
> then I'd step back and simulate one with ranges. Considering that it's
> easy and backwards compatible to add a new flag to signal that BP_EXACT
> is not supported, I don't know why it would be better to go with the
> more convoluted process.

No, I'm saying why not implement BP_EXACT on server.
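
To make the disagreement concrete, here is a sketch of the debugger-side
decision Thiago describes, assuming the "features" word has already been
fetched with PPC_PTRACE_GETHWDBGINFO. PPC_DEBUG_FEATURE_DATA_BP_RANGE is an
existing flag (value as I recall it from asm/ptrace.h). The value given to
PPC_DEBUG_FEATURE_DATA_BP_EXACT is a placeholder, since the thread does not
state it, and the enum and function names are invented for this example.

```c
#include <stdint.h>

/* Existing ptrace feature flag; value recalled from asm/ptrace.h, verify there. */
#define PPC_DEBUG_FEATURE_DATA_BP_RANGE  0x4u
/* Flag proposed in this patch; 0x10 is a placeholder, not taken from the thread. */
#define PPC_DEBUG_FEATURE_DATA_BP_EXACT  0x10u

enum wp_strategy { WP_EXACT, WP_EMULATE_WITH_RANGE, WP_UNSUPPORTED };

/* Pick a watchpoint strategy from the advertised features, without probing. */
enum wp_strategy choose_watchpoint_strategy(uint64_t features)
{
	if (features & PPC_DEBUG_FEATURE_DATA_BP_EXACT)
		return WP_EXACT;
	if (features & PPC_DEBUG_FEATURE_DATA_BP_RANGE)
		return WP_EMULATE_WITH_RANGE;
	return WP_UNSUPPORTED;
}
```

If BP_EXACT were implemented on server processors as well, as suggested
above, the first branch would always be taken and the flag would carry no
information.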

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* RFC [PATCH] fsl pci quirk to __devinit Re: PCIe end-point on FPGA doesn't show up on PCI bus when configured
From: Matias Garcia @ 2011-09-19 15:35 UTC
  To: Elie De Brauwer, linuxppc-dev
In-Reply-To: <4D4321E1.7060506@gmail.com>

On Fri, 2011-01-28 at 21:06 +0100, Elie De Brauwer wrote:
> On 01/28/11 19:37, Matias Garcia wrote:
> > I'm running a vanilla linux 2.6.37 kernel on a Freescale P2020 dual-core
> > processor, and have the following conundrum: I configure the FPGA which
> > brings up a PCIe interface to the processor. I scan both PCI buses on
> > the system (I believe the second bus is behind the Freescale integrated
> > bridge on the first), and it doesn't show up. I initiate a reset on the
> > processor, and both U-boot and Linux now see the FPGA PCI device at
> > 0000:01:00.00. I've noticed some of the memory mappings in the PCI
> > bridge windows are different between the two boot sequences. I've tried
> > all manner of pci calls (including the pcibios_fixup routines) on the
> > bridge device (including removing and re-scanning it), and on bus 1,
> > which is otherwise empty, to no avail. Following are some debug listings
> > from dmesg; any help/ideas in tracking down the problem (hardware or
> > software) is greatly appreciated.
> >
> > #Boot without FPGA configured:
> > <snip>
> > Found FSL PCI host bridge at 0x00000008ff70a000. Firmware bus number: 0->255
> > PCI host bridge /pcie@8ff70a000 ranges:
> > MEM 0x0000000880000000..0x000000088fffffff -> 0x0000000080000000
> > IO 0x00000008a0000000..0x00000008a000ffff -> 0x0000000000000000
> > /pcie@8ff70a000: PCICSRBAR @ 0xfff00000
> > /pcie@8ff70a000: WARNING: Outbound window cfg leaves gaps in memory map.
> > Adjusting the memory map could reduce unnecessary bounce buffering.
> > /pcie@8ff70a000: DMA window size is 0x80000000
> > MPC85xx RDB board from Freescale Semiconductor
> > <...>
> > PCI: Probing PCI hardware
> > pci 0000:00:00.0: [1957:0070] type 1 class 0x000b20
> > pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
> > pci 0000:00:00.0: supports D1 D2
> > pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> > pci 0000:00:00.0: PME# disabled
> > pci 0000:00:00.0: PCI bridge to [bus 01-ff]
> > pci 0000:00:00.0: bridge window [io 0x0000-0x0000] (disabled)
> > pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] (disabled)
> > pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff pref] (disabled)
> > PCI 0000:00 Cannot reserve Legacy IO [io 0xffbed000-0xffbedfff]
> > pci 0000:00:00.0: PCI bridge to [bus 01-01]
> > pci 0000:00:00.0: bridge window [io 0xffbed000-0xffbfcfff]
> > pci 0000:00:00.0: bridge window [mem 0x880000000-0x88fffffff]
> > pci 0000:00:00.0: bridge window [mem pref disabled]
> > pci 0000:00:00.0: enabling device (0106 -> 0107)
> > pci_bus 0000:00: resource 0 [io 0xffbed000-0xffbfcfff]
> > pci_bus 0000:00: resource 1 [mem 0x880000000-0x88fffffff]
> > pci_bus 0000:01: resource 0 [io 0xffbed000-0xffbfcfff]
> > pci_bus 0000:01: resource 1 [mem 0x880000000-0x88fffffff]
> >
> > #Reset with FPGA configured:
> > <snip>
> > Found FSL PCI host bridge at 0x00000008ff70a000. Firmware bus number: 0->255
> > PCI host bridge /pcie@8ff70a000 ranges:
> > MEM 0x0000000880000000..0x000000088fffffff -> 0x0000000080000000
> > IO 0x00000008a0000000..0x00000008a000ffff -> 0x0000000000000000
> > /pcie@8ff70a000: PCICSRBAR @ 0xfff00000
> > /pcie@8ff70a000: WARNING: Outbound window cfg leaves gaps in memory map.
> > Adjusting the memory map could reduce unnecessary bounce buffering.
> > /pcie@8ff70a000: DMA window size is 0x80000000
> > MPC85xx RDB board from Freescale Semiconductor
> > <...>
> > PCI: Probing PCI hardware
> > pci 0000:00:00.0: [1957:0070] type 1 class 0x000b20
> > pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
> > pci 0000:00:00.0: supports D1 D2
> > pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> > pci 0000:00:00.0: PME# disabled
> > pci 0000:01:00.0: [1172:0004] type 0 class 0x001000
> > pci 0000:01:00.0: reg 10: [mem 0x80000000-0x80ffffff]
> > pci 0000:01:00.0: reg 14: [mem 0x81000000-0x81ffffff]
> > pci 0000:01:00.0: reg 18: [mem 0x82000000-0x82ffffff]
> > pci 0000:00:00.0: PCI bridge to [bus 01-ff]
> > pci 0000:00:00.0: bridge window [io 0x0000-0x0000] (disabled)
> > pci 0000:00:00.0: bridge window [mem 0x80000000-0x82ffffff]
> > pci 0000:00:00.0: bridge window [mem 0x10000000-0x000fffff pref] (disabled)
> > irq: irq 0 on host /soc@8ff700000/pic@40000 mapped to virtual irq 16
> > PCI 0000:00 Cannot reserve Legacy IO [io 0xffbed000-0xffbedfff]
> > pci 0000:00:00.0: PCI bridge to [bus 01-01]
> > pci 0000:00:00.0: bridge window [io 0xffbed000-0xffbfcfff]
> > pci 0000:00:00.0: bridge window [mem 0x880000000-0x88fffffff]
> > pci 0000:00:00.0: bridge window [mem pref disabled]
> > pci 0000:00:00.0: enabling device (0106 -> 0107)
> > pci_bus 0000:00: resource 0 [io 0xffbed000-0xffbfcfff]
> > pci_bus 0000:00: resource 1 [mem 0x880000000-0x88fffffff]
> > pci_bus 0000:01: resource 0 [io 0xffbed000-0xffbfcfff]
> > pci_bus 0000:01: resource 1 [mem 0x880000000-0x88fffffff]
> 
> 
> Hi Mattias,
> 
> I'm doing the same on a similar setup, also a P2020 but a 2.6.36 and 
> with me it works just fine. However I encountered one problem. I 
> understand it as follows, if there is no physical PCIe link then 
> somewhere a flag PPC_INDIRECT_TYPE_NO_PCIE_LINK gets set. This has as 
> result that reading the PCIe config space will fail with a 
> PCIBIOS_DEVICE_NOT_FOUND (ref 
> http://lxr.linux.no/#linux+v2.6.37/arch/powerpc/sysdev/indirect_pci.c#L24 )
> 
> 
> At 
> http://lxr.linux.no/#linux+v2.6.37/arch/powerpc/include/asm/pci-bridge.h#L105 
> they specify this as a workaround since the PCIe might hang if there is 
> no physical link. So my workaround for this issue was:
> 
> - load the fpga
> - travel down the pci bus to the correct bus where the fpga is attached,
>   use a pci_bus_to_host() to obtain a struct pci_controller, unset the
>   PPC_INDIRECT_TYPE_NO_PCIE_LINK flag and call pci_rescan_bus() on that bus.
>
> After doing this I can find and access the FPGA, and reload it if needed.
> Not a clue if this is 'the proper way' to do it, but it works for me.
> 
> gr
> E.

Elie et al,

Thanks again for the find. I've been using this method successfully
until now (programming the FPGA from U-Boot and resetting the PCIe
controller as Tiejun suggested was not practical). If anyone has found a
less weird solution, I'd love to hear it. That controller just doesn't
like booting without an end-point on the bus.

We started seeing intermittent failures at the call to
quirk_fsl_pcie_header, particularly on one unit. I finally clued in that
the function carries an __init tag, so it may be getting called after
initialization, when the init sections have already been freed. Would it be
uncouth to ask that it be changed to __devinit? I gather it's only
supposed to be called once, but in my case, it gets called more than
once when the controller is re-added in my fpga loader.

Here's the patch against 2.6.37:

Change quirk_fsl_pcie_header from __init to __devinit.

Signed-off-by: Matias Garcia <mgarcia@rossvideo.com>
---
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 818f7c6..8807d77 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -36,7 +36,7 @@
 
 static int fsl_pcie_bus_fixup, is_mpc83xx_pci;
 
-static void __init quirk_fsl_pcie_header(struct pci_dev *dev)
+static void __devinit quirk_fsl_pcie_header(struct pci_dev *dev)
 {
        /* if we aren't a PCIe don't bother */
        if (!pci_find_capability(dev, PCI_CAP_ID_EXP))

* WARNING: vmlinux.o (.PPC.EMB.apuinfo): unexpected non-allocatable section.
From: Timur Tabi @ 2011-09-19 20:18 UTC
  To: linuxppc-dev

When trying to build a 64-bit kernel, I'm getting this warning message:

WARNING: vmlinux.o (.PPC.EMB.apuinfo): unexpected non-allocatable section.
Did you forget to use "ax"/"aw" in a .S file?
Note that for example <linux/init.h> contains
section definitions for use in .S files.

This kernel continues to compile and link, but when I try to boot it, it stops
here (this is under the hypervisor):

Linux version 3.0.4-31-00029-g06d86c8-dirty (b04825@efes) (gcc version 4.5.1
(Sourcery G++ Lite 2010.09-55) ) #1 SMP Mon Sep 19 14:49:48 CDT 2011
[boot]0012 Setup Arch
Found FSL PCI host bridge at 0x0000000ffe200000. Firmware bus number: 0->1
PCI host bridge /devices/pci0  ranges:
 MEM 0x0000000c00000000..0x0000000c1fffffff -> 0x00000000e0000000
  IO 0x0000000ff8000000..0x0000000ff800ffff -> 0x0000000000000000
/devices/pci0: PCICSRBAR @ 0xdf000000
Found FSL PCI host bridge at 0x0000000ffe202000. Firmware bus number: 0->1
PCI host bridge /devices/pci2  ranges:
 MEM 0x0000000c40000000..0x0000000c5fffffff -> 0x00000000e0000000
  IO 0x0000000ff8020000..0x0000000ff802ffff -> 0x0000000000000000
/devices/pci2: PCICSRBAR @ 0xdf000000
P5020 DS board from Freescale Semiconductor
Zone PFN ranges:
  DMA      0x00000000 -> 0x00020000
  Normal   empty
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0: 0x00000000 -> 0x00020000
MMU: Allocated 2112 bytes of context maps for 255 contexts
[boot]0015 Setup Done
PERCPU: Embedded 13 pages/cpu @c000000002d00000 s23872 r0 d29376 u524288
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129280
Kernel command line:
PID hash table entries: 2048 (order: 2, 16384 bytes)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Memory: 425104k/524288k available (8500k kernel code, 99184k reserved, 968k
data, 702k bss, 300k init)
Hierarchical RCU implementation.
	RCU debugfs-based tracing is enabled.
	CONFIG_RCU_FANOUT set to non-default value of 32
NR_IRQS:512 nr_irqs:512 16
clocksource: timebase mult[a000000] shift[22] registered
Console: colour dummy device 80x25
console [tty0] enabled
console [ttyEHV0] enabled
ehv-bc: registered console driver for byte channel 300
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 256
e500 family performance monitor hardware support registered

If I disable performance counter support, it just stops at the "Mount-cache hash
table entries: 256" line.


-- 
Timur Tabi
Linux kernel developer at Freescale

* Re: WARNING: vmlinux.o (.PPC.EMB.apuinfo): unexpected non-allocatable section.
From: Scott Wood @ 2011-09-19 20:25 UTC
  To: Timur Tabi; +Cc: linuxppc-dev
In-Reply-To: <4E77A378.3040306@freescale.com>

On 09/19/2011 03:18 PM, Timur Tabi wrote:
> This kernel continues to compile and link, but when I try to boot it, it stops
> here (this is under the hypervisor):

"The hypervisor" is ambiguous.  We have Topaz, and KVM.

> Linux version 3.0.4-31-00029-g06d86c8-dirty (b04825@efes) (gcc version 4.5.1
> (Sourcery G++ Lite 2010.09-55) ) #1 SMP Mon Sep 19 14:49:48 CDT 2011
> [boot]0012 Setup Arch
> Found FSL PCI host bridge at 0x0000000ffe200000. Firmware bus number: 0->1
> PCI host bridge /devices/pci0  ranges:
>  MEM 0x0000000c00000000..0x0000000c1fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8000000..0x0000000ff800ffff -> 0x0000000000000000
> /devices/pci0: PCICSRBAR @ 0xdf000000
> Found FSL PCI host bridge at 0x0000000ffe202000. Firmware bus number: 0->1
> PCI host bridge /devices/pci2  ranges:
>  MEM 0x0000000c40000000..0x0000000c5fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8020000..0x0000000ff802ffff -> 0x0000000000000000
> /devices/pci2: PCICSRBAR @ 0xdf000000
> P5020 DS board from Freescale Semiconductor
> Zone PFN ranges:
>   DMA      0x00000000 -> 0x00020000
>   Normal   empty
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     0: 0x00000000 -> 0x00020000
> MMU: Allocated 2112 bytes of context maps for 255 contexts
> [boot]0015 Setup Done
> PERCPU: Embedded 13 pages/cpu @c000000002d00000 s23872 r0 d29376 u524288
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129280
> Kernel command line:
> PID hash table entries: 2048 (order: 2, 16384 bytes)
> Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
> Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
> Memory: 425104k/524288k available (8500k kernel code, 99184k reserved, 968k
> data, 702k bss, 300k init)
> Hierarchical RCU implementation.
> 	RCU debugfs-based tracing is enabled.
> 	CONFIG_RCU_FANOUT set to non-default value of 32
> NR_IRQS:512 nr_irqs:512 16
> clocksource: timebase mult[a000000] shift[22] registered
> Console: colour dummy device 80x25
> console [tty0] enabled
> console [ttyEHV0] enabled
> ehv-bc: registered console driver for byte channel 300
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 256
> e500 family performance monitor hardware support registered
> 
> If I disable performance counter support, it just stops at the "Mount-cache hash
> table entries: 256" line.

These messages generally tell you what the kernel just did, not what
it's about to do (and thus what hung).

Have you tried using CCS (or Topaz's debug functionality) to see where
it's hung?

-Scott

* Re: [PATCH 0/5] ppc64 scheduler fixes
From: Anton Blanchard @ 2011-09-20  0:19 UTC
  To: Peter Zijlstra; +Cc: mingo, linuxppc-dev, linux-kernel
In-Reply-To: <1311597702.24752.1.camel@twins>


Hi Peter,

> On Mon, 2011-07-25 at 12:33 +1000, Anton Blanchard wrote:
> > Here are a set of ppc64 scheduler fixes that help with some
> > multi node performance issues.
> 
> They look fine to me. I'll probably ping you when I'll rip out all
> that SD_NODES_PER_DOMAIN crap for good, but until then I'm fine with
> you fiddling it for ppc64.
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Are you OK if Ben takes this, or would you prefer to send it via the
scheduler tree?

Anton
--

[2/5] sched: Allow SD_NODES_PER_DOMAIN to be overridden

We want to override the default value of SD_NODES_PER_DOMAIN on ppc64,
so move it into linux/topology.h.

Signed-off-by: Anton Blanchard <anton@samba.org>

---

Index: linux-2.6-work/include/linux/topology.h
===================================================================
--- linux-2.6-work.orig/include/linux/topology.h	2011-07-25 11:20:02.588717796 +1000
+++ linux-2.6-work/include/linux/topology.h	2011-07-25 11:26:50.616468376 +1000
@@ -201,6 +201,10 @@ int arch_update_cpu_topology(void);
 	.balance_interval	= 64,					\
 }
 
+#ifndef SD_NODES_PER_DOMAIN
+#define SD_NODES_PER_DOMAIN 16
+#endif
+
 #ifdef CONFIG_SCHED_BOOK
 #ifndef SD_BOOK_INIT
 #error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
Index: linux-2.6-work/kernel/sched.c
===================================================================
--- linux-2.6-work.orig/kernel/sched.c	2011-07-25 11:20:09.538850173 +1000
+++ linux-2.6-work/kernel/sched.c	2011-07-25 11:26:50.626468565 +1000
@@ -6938,8 +6938,6 @@ static int __init isolated_cpu_setup(cha
 
 __setup("isolcpus=", isolated_cpu_setup);
 
-#define SD_NODES_PER_DOMAIN 16
-
 #ifdef CONFIG_NUMA
 
 /**
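
A self-contained sketch of the override pattern this patch relies on (an
illustration, not part of the patch): an architecture header defines
SD_NODES_PER_DOMAIN first, and the generic header supplies 16 only when
nothing has been defined. The value 32 and the function name are
illustrative assumptions.

```c
/* Stand-in for an arch header such as asm/topology.h; 32 is illustrative. */
#define SD_NODES_PER_DOMAIN 32

/* Stand-in for the generic fallback the patch adds to linux/topology.h. */
#ifndef SD_NODES_PER_DOMAIN
#define SD_NODES_PER_DOMAIN 16
#endif

/* Code that uses the macro sees the arch value when one was provided. */
int sd_nodes_per_domain(void)
{
	return SD_NODES_PER_DOMAIN;
}
```

Deleting the first #define makes the function return 16, which is the
behaviour every other architecture keeps after the move.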

* Re: [PATCH 0/5] ppc64 scheduler fixes
From: Benjamin Herrenschmidt @ 2011-09-20  1:38 UTC
  To: Anton Blanchard; +Cc: Peter Zijlstra, mingo, linuxppc-dev, linux-kernel
In-Reply-To: <20110920101938.121098ed@kryten>

On Tue, 2011-09-20 at 10:19 +1000, Anton Blanchard wrote:
> Hi Peter,
> 
> > On Mon, 2011-07-25 at 12:33 +1000, Anton Blanchard wrote:
> > > Here are a set of ppc64 scheduler fixes that help with some
> > > multi node performance issues.
> > 
> > They look fine to me. I'll probably ping you when I'll rip out all
> > that SD_NODES_PER_DOMAIN crap for good, but until then I'm fine with
> > you fiddling it for ppc64.
> > 
> > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Are you OK if Ben takes this, or would you prefer to send it via the
> scheduler tree?

I've already put it in my next branch that I'll stick on github later
today hopefully :-)

Cheers,
Ben.
 
> Anton
> --
> 
> [2/5] sched: Allow SD_NODES_PER_DOMAIN to be overridden
> 
> We want to override the default value of SD_NODES_PER_DOMAIN on ppc64,
> so move it into linux/topology.h.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> 
> ---
> 
> Index: linux-2.6-work/include/linux/topology.h
> ===================================================================
> --- linux-2.6-work.orig/include/linux/topology.h	2011-07-25 11:20:02.588717796 +1000
> +++ linux-2.6-work/include/linux/topology.h	2011-07-25 11:26:50.616468376 +1000
> @@ -201,6 +201,10 @@ int arch_update_cpu_topology(void);
>  	.balance_interval	= 64,					\
>  }
>  
> +#ifndef SD_NODES_PER_DOMAIN
> +#define SD_NODES_PER_DOMAIN 16
> +#endif
> +
>  #ifdef CONFIG_SCHED_BOOK
>  #ifndef SD_BOOK_INIT
>  #error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> Index: linux-2.6-work/kernel/sched.c
> ===================================================================
> --- linux-2.6-work.orig/kernel/sched.c	2011-07-25 11:20:09.538850173 +1000
> +++ linux-2.6-work/kernel/sched.c	2011-07-25 11:26:50.626468565 +1000
> @@ -6938,8 +6938,6 @@ static int __init isolated_cpu_setup(cha
>  
>  __setup("isolcpus=", isolated_cpu_setup);
>  
> -#define SD_NODES_PER_DOMAIN 16
> -
>  #ifdef CONFIG_NUMA
>  
>  /**

* [PATCH 01/20] powerpc/udbg: Fix Kconfig entry for avoiding 44x early debug with KVM
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC
  To: linuxppc-dev

It was preventing the global early debug selection whenever KVM was enabled
instead of only preventing the 440 specific one.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/Kconfig.debug |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 067cb84..cc01f1d 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -141,9 +141,6 @@ config BOOTX_TEXT
 
 config PPC_EARLY_DEBUG
 	bool "Early debugging (dangerous)"
-	# PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water
-	# mark, which doesn't work with current 440 KVM.
-	depends on !KVM
 	help
 	  Say Y to enable some early debugging facilities that may be available
 	  for your processor/board combination. Those facilities are hacks
@@ -222,7 +219,9 @@ config PPC_EARLY_DEBUG_BEAT
 
 config PPC_EARLY_DEBUG_44x
 	bool "Early serial debugging for IBM/AMCC 44x CPUs"
-	depends on 44x
+	# PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water
+	# mark, which doesn't work with current 440 KVM.
+	depends on 44x && !KVM
 	help
 	  Select this to enable early debugging for IBM 44x chips via the
 	  inbuilt serial port.  If you enable this, ensure you set
-- 
1.7.4.1

* [PATCH 02/20] powerpc/smp: More generic support for "soft hotplug"
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

This adds more generic support for doing CPU hotplug with a simple
idle loop and no actual reset of the processors. The generic
smp_generic_kick_cpu() does the hotplug bringup trick if the PACA
shows that the CPU has already been started at boot and we provide
an accessor for the CPU state.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/smp.h |    1 +
 arch/powerpc/kernel/smp.c      |   30 +++++++++++++++++++++++++-----
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 15a70b7..adba970 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -65,6 +65,7 @@ int generic_cpu_disable(void);
 void generic_cpu_die(unsigned int cpu);
 void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
+int generic_check_cpu_restart(unsigned int cpu);
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 7bf2187..af7e772 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -70,6 +70,10 @@
 static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
 #define get_idle_for_cpu(x)      (per_cpu(idle_thread_array, x))
 #define set_idle_for_cpu(x, p)   (per_cpu(idle_thread_array, x) = (p))
+
+/* State of each CPU during hotplug phases */
+static DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
 #else
 static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
 #define get_idle_for_cpu(x)      (idle_thread_array[(x)])
@@ -104,12 +108,25 @@ int __devinit smp_generic_kick_cpu(int nr)
 	 * cpu_start field to become non-zero After we set cpu_start,
 	 * the processor will continue on to secondary_start
 	 */
-	paca[nr].cpu_start = 1;
-	smp_mb();
+	if (!paca[nr].cpu_start) {
+		paca[nr].cpu_start = 1;
+		smp_mb();
+		return 0;
+	}
+
+#ifdef CONFIG_HOTPLUG_CPU
+	/*
+	 * Ok it's not there, so it might be soft-unplugged, let's
+	 * try to bring it back
+	 */
+	per_cpu(cpu_state, nr) = CPU_UP_PREPARE;
+	smp_wmb();
+	smp_send_reschedule(nr);
+#endif /* CONFIG_HOTPLUG_CPU */
 
 	return 0;
 }
-#endif
+#endif /* CONFIG_PPC64 */
 
 static irqreturn_t call_function_action(int irq, void *data)
 {
@@ -357,8 +374,6 @@ void __devinit smp_prepare_boot_cpu(void)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-/* State of each CPU during hotplug phases */
-static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 
 int generic_cpu_disable(void)
 {
@@ -406,6 +421,11 @@ void generic_set_cpu_dead(unsigned int cpu)
 {
 	per_cpu(cpu_state, cpu) = CPU_DEAD;
 }
+
+int generic_check_cpu_restart(unsigned int cpu)
+{
+	return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
+}
 #endif
 
 struct create_idle {
-- 
1.7.4.1
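
The control flow of this patch can be modelled in user space as follows. This
is a sketch under stated assumptions, not kernel code: the struct, the enum
and the model_* names are invented stand-ins for paca[nr].cpu_start, the
per-CPU cpu_state and smp_send_reschedule(), and the memory barriers from the
patch are left out.

```c
#include <stdbool.h>

/* Stand-ins for the hotplug states the patch touches. */
enum cpu_state { CPU_STATE_NONE = 0, CPU_STATE_DEAD, CPU_STATE_UP_PREPARE };

struct model_cpu {
	int cpu_start;		/* non-zero once the CPU was released at boot */
	enum cpu_state state;	/* hotplug phase */
	bool resched_ipi_sent;	/* records the wake-up an IPI would deliver */
};

/* Mirrors the branches of the patched smp_generic_kick_cpu(). */
int model_kick_cpu(struct model_cpu *cpu)
{
	if (!cpu->cpu_start) {
		/* First bring-up: release the CPU from its boot-time spin loop. */
		cpu->cpu_start = 1;
		return 0;
	}
	/* Started before, so treat it as soft-unplugged and wake it up. */
	cpu->state = CPU_STATE_UP_PREPARE;
	cpu->resched_ipi_sent = true;
	return 0;
}

/* Mirrors generic_check_cpu_restart(): polled by the idling "dead" CPU. */
int model_check_cpu_restart(const struct model_cpu *cpu)
{
	return cpu->state == CPU_STATE_UP_PREPARE;
}
```

The first kick only sets cpu_start. A later kick on the same CPU flips the
state to "up prepare", which is what the idle loop of a soft-unplugged CPU
would poll for through the new accessor.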

* [PATCH 03/20] powerpc/pci: Call pcie_bus_configure_settings()
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

This new function is used to properly setup the PCI Express Max Payload Size
(and in some circumstances Max Read Request Size).

Some systems will not operate properly if these aren't set correctly and
the firmware doesn't always do it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
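For readers unfamiliar with the encoding: PCI Express stores payload sizes
as a small field in which the value n means 128 << n bytes, and pcie_mpss
holds the device's supported size in that encoded form. The following is a
minimal userspace sketch of the decoding, not kernel code. The helper names
are hypothetical, and picking the smallest value common to all devices is
only one conservative policy, assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a PCIe payload-size field: 0 -> 128 bytes, ..., 5 -> 4096 bytes. */
static unsigned int mps_field_to_bytes(uint8_t field)
{
	return 128u << field;
}

/* Hypothetical helper: smallest encoded payload size among n devices. */
static uint8_t smallest_common_mpss(const uint8_t *mpss, size_t n)
{
	uint8_t min = 5;	/* largest encoding used here (4096 bytes) */
	size_t i;

	for (i = 0; i < n; i++)
		if (mpss[i] < min)
			min = mpss[i];
	return min;
}
```

With devices reporting fields 2, 1 and 3, the common value is 1, which
decodes to 256 bytes.
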
 arch/powerpc/kernel/pci-common.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 32656f1..1bd47f3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1730,6 +1730,17 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 
 	if (mode == PCI_PROBE_NORMAL)
 		hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
+
+	/* Configure PCI Express settings */
+	if (bus) {
+		struct pci_bus *child;
+		list_for_each_entry(child, &bus->children, node) {
+			struct pci_dev *self = child->self;
+			if (!self)
+				continue;
+			pcie_bus_configure_settings(child, self->pcie_mpss);
+		}
+	}
 }
 
 static void fixup_hide_host_resource_fsl(struct pci_dev *dev)
-- 
1.7.4.1


* [PATCH 04/20] powerpc/powernv: Don't clobber r9 in relative_toc()
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

With OPAL, r8 and r9 will be used to pass the OPAL base and entry
for debugging purposes (that information is also in the
device-tree). We don't want to clobber those registers that
early.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/head_64.S |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 3564c49..e708abe 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -674,9 +674,9 @@ _GLOBAL(enable_64b_mode)
 _GLOBAL(relative_toc)
 	mflr	r0
 	bcl	20,31,$+4
-0:	mflr	r9
-	ld	r2,(p_toc - 0b)(r9)
-	add	r2,r2,r9
+0:	mflr	r11
+	ld	r2,(p_toc - 0b)(r11)
+	add	r2,r2,r11
 	mtlr	r0
 	blr
 
-- 
1.7.4.1


* [PATCH 06/20] of: Change logic to overwrite cmd_line with CONFIG_CMDLINE
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

We used to overwrite with CONFIG_CMDLINE if we found a chosen
node but failed to get bootargs out of it or they were empty,
unless CONFIG_CMDLINE_FORCE is set.

Instead change that to overwrite if cmd_line is empty after
the bootargs check. It allows arch code to have other mechanisms
to retrieve the command line prior to parsing the device-tree.

Note: CONFIG_CMDLINE_FORCE case should ideally be handled elsewhere
as it won't work as is if the device-tree has no /chosen node.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
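To illustrate the difference between the two conditions in the hunk below,
here is a small userspace sketch. It models both tests as predicates,
assuming CONFIG_CMDLINE is set and CONFIG_CMDLINE_FORCE is not. The function
names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old test: looks only at the "bootargs" property (pointer p, length l). */
static bool old_use_config_cmdline(const char *p, size_t l)
{
	return p == NULL || l == 0 || (l == 1 && *p == 0);
}

/* New test: looks at the command line buffer after the bootargs check. */
static bool new_use_config_cmdline(const char *cmd_line)
{
	return cmd_line[0] == '\0';
}
```

The interesting case is a missing bootargs property with a command line
that arch code filled in beforehand: the old test would replace it with
CONFIG_CMDLINE, while the new test leaves it alone.
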
 drivers/of/fdt.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 65200af..d382163 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -683,7 +683,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 
 #ifdef CONFIG_CMDLINE
 #ifndef CONFIG_CMDLINE_FORCE
-	if (p == NULL || l == 0 || (l == 1 && (*p) == 0))
+	if (!cmd_line[0])
 #endif
 		strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
 #endif /* CONFIG_CMDLINE */
-- 
1.7.4.1


* [PATCH 05/20] powerpc: Add skeleton PowerNV platform
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

This adds a skeleton for the new Power "Non Virtualized"
platform which will be used by machines supporting running
without a hypervisor, for example in order to run KVM.

These machines will be using a new firmware called OPAL
for which the support will be provided by later patches.

The PowerNV platform is intended to be also usable under
the BML environment used internally for early CPU bringup
which is why the code also supports using RTAS instead of
OPAL in various places.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
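The platform is selected through a probe callback that checks the root
node for "ibm,powernv". The sketch below shows that general pattern in
plain userspace C. It is a simplification: the struct, the table and every
function name are hypothetical, and a single string stands in for the
device-tree compatible list.

```c
#include <stddef.h>
#include <string.h>

struct machine {
	const char *name;
	int (*probe)(const char *compatible);	/* nonzero means "this is me" */
};

static int other_probe(const char *compatible)
{
	return strcmp(compatible, "example,other") == 0;
}

static int pnv_like_probe(const char *compatible)
{
	return strcmp(compatible, "ibm,powernv") == 0;
}

static const struct machine machines[] = {
	{ "Other",   other_probe },
	{ "PowerNV", pnv_like_probe },
};

/* Return the first machine whose probe accepts the string, or NULL. */
static const struct machine *select_machine(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(machines) / sizeof(machines[0]); i++)
		if (machines[i].probe(compatible))
			return &machines[i];
	return NULL;
}
```

Calling select_machine("ibm,powernv") yields the "PowerNV" entry, and an
unrecognised string yields NULL.
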
 arch/powerpc/boot/Makefile               |    1 +
 arch/powerpc/platforms/Kconfig           |    1 +
 arch/powerpc/platforms/Makefile          |    1 +
 arch/powerpc/platforms/powernv/Kconfig   |   12 +++
 arch/powerpc/platforms/powernv/Makefile  |    2 +
 arch/powerpc/platforms/powernv/powernv.h |   10 ++
 arch/powerpc/platforms/powernv/setup.c   |  153 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c     |   83 ++++++++++++++++
 8 files changed, 263 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/Kconfig
 create mode 100644 arch/powerpc/platforms/powernv/Makefile
 create mode 100644 arch/powerpc/platforms/powernv/powernv.h
 create mode 100644 arch/powerpc/platforms/powernv/setup.c
 create mode 100644 arch/powerpc/platforms/powernv/smp.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index c26200b..52cde90 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -171,6 +171,7 @@ quiet_cmd_wrap	= WRAP    $@
 		$(if $3, -s $3)$(if $4, -d $4)$(if $5, -i $5) vmlinux
 
 image-$(CONFIG_PPC_PSERIES)		+= zImage.pseries
+image-$(CONFIG_PPC_POWERNV)		+= zImage.pseries
 image-$(CONFIG_PPC_MAPLE)		+= zImage.maple
 image-$(CONFIG_PPC_IBM_CELL_BLADE)	+= zImage.pseries
 image-$(CONFIG_PPC_PS3)			+= dtbImage.ps3
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index b9ba861..6de27d2 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -1,5 +1,6 @@
 menu "Platform support"
 
+source "arch/powerpc/platforms/powernv/Kconfig"
 source "arch/powerpc/platforms/pseries/Kconfig"
 source "arch/powerpc/platforms/iseries/Kconfig"
 source "arch/powerpc/platforms/chrp/Kconfig"
diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile
index 73e2116..2635a22 100644
--- a/arch/powerpc/platforms/Makefile
+++ b/arch/powerpc/platforms/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PPC_82xx)		+= 82xx/
 obj-$(CONFIG_PPC_83xx)		+= 83xx/
 obj-$(CONFIG_FSL_SOC_BOOKE)	+= 85xx/
 obj-$(CONFIG_PPC_86xx)		+= 86xx/
+obj-$(CONFIG_PPC_POWERNV)	+= powernv/
 obj-$(CONFIG_PPC_PSERIES)	+= pseries/
 obj-$(CONFIG_PPC_ISERIES)	+= iseries/
 obj-$(CONFIG_PPC_MAPLE)		+= maple/
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
new file mode 100644
index 0000000..5cd04f5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -0,0 +1,12 @@
+config PPC_POWERNV
+	depends on PPC64 && PPC_BOOK3S
+	bool "IBM PowerNV (Non-Virtualized) platform support"	
+	select PPC_RTAS
+	select PPC_NATIVE
+	select PPC_XICS
+	select PPC_ICP_NATIVE
+	select PPC_ICS_RTAS
+	select PPC_P7_NAP
+	select PPC_PCI_CHOICE if EMBEDDED
+	default y
+
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
new file mode 100644
index 0000000..1c43250
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -0,0 +1,2 @@
+obj-y			+= setup.o
+obj-$(CONFIG_SMP)	+= smp.o
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
new file mode 100644
index 0000000..35b7160
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -0,0 +1,10 @@
+#ifndef _POWERNV_H
+#define _POWERNV_H
+
+#ifdef CONFIG_SMP
+extern void pnv_smp_init(void);
+#else
+static inline void pnv_smp_init(void) { }
+#endif
+
+#endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
new file mode 100644
index 0000000..569f9cc
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -0,0 +1,153 @@
+/*
+ * PowerNV setup code.
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/cpu.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/tty.h>
+#include <linux/reboot.h>
+#include <linux/init.h>
+#include <linux/console.h>
+#include <linux/delay.h>
+#include <linux/irq.h>
+#include <linux/seq_file.h>
+#include <linux/of.h>
+#include <linux/interrupt.h>
+#include <linux/bug.h>
+
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+#include <asm/xics.h>
+
+#include "powernv.h"
+
+static void __init pnv_setup_arch(void)
+{
+	/* Force console to hvc for now until we have sorted out the
+	 * real console situation for the platform. This will make
+	 * hvc_udbg work at least.
+	 */
+	add_preferred_console("hvc", 0, NULL);
+
+	/* Initialize SMP */
+	pnv_smp_init();
+
+	/* XXX PCI */
+
+	/* XXX NVRAM */
+
+	/* Enable NAP mode */
+	powersave_nap = 1;
+
+	/* XXX PMCS */
+}
+
+static void __init pnv_init_early(void)
+{
+	/* XXX IOMMU */
+}
+
+static void __init pnv_init_IRQ(void)
+{
+	xics_init();
+
+	WARN_ON(!ppc_md.get_irq);
+}
+
+static void pnv_show_cpuinfo(struct seq_file *m)
+{
+	struct device_node *root;
+	const char *model = "";
+
+	root = of_find_node_by_path("/");
+	if (root)
+		model = of_get_property(root, "model", NULL);
+	seq_printf(m, "machine\t\t: PowerNV %s\n", model);
+	of_node_put(root);
+}
+
+static void pnv_restart(char *cmd)
+{
+	for (;;);
+}
+
+static void pnv_power_off(void)
+{
+	for (;;);
+}
+
+static void pnv_halt(void)
+{
+	for (;;);
+}
+
+static unsigned long __init pnv_get_boot_time(void)
+{
+	return 0;
+}
+
+static void pnv_get_rtc_time(struct rtc_time *rtc_tm)
+{
+}
+
+static int pnv_set_rtc_time(struct rtc_time *tm)
+{
+	return 0;
+}
+
+static void pnv_progress(char *s, unsigned short hex)
+{
+}
+
+#ifdef CONFIG_KEXEC
+static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
+{
+	xics_kexec_teardown_cpu(secondary);
+}
+#endif /* CONFIG_KEXEC */
+
+static int __init pnv_probe(void)
+{
+	unsigned long root = of_get_flat_dt_root();
+
+	if (!of_flat_dt_is_compatible(root, "ibm,powernv"))
+		return 0;
+
+	hpte_init_native();
+
+	pr_debug("PowerNV detected !\n");
+
+	return 1;
+}
+
+define_machine(powernv) {
+	.name			= "PowerNV",
+	.probe			= pnv_probe,
+	.setup_arch		= pnv_setup_arch,
+	.init_early		= pnv_init_early,
+	.init_IRQ		= pnv_init_IRQ,
+	.show_cpuinfo		= pnv_show_cpuinfo,
+	.restart		= pnv_restart,
+	.power_off		= pnv_power_off,
+	.halt			= pnv_halt,
+	.get_boot_time		= pnv_get_boot_time,
+	.get_rtc_time		= pnv_get_rtc_time,
+	.set_rtc_time		= pnv_set_rtc_time,
+	.progress		= pnv_progress,
+	.power_save             = power7_idle,
+	.calibrate_decr		= generic_calibrate_decr,
+#ifdef CONFIG_KEXEC
+	.kexec_cpu_down		= pnv_kexec_cpu_down,
+#endif
+};
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
new file mode 100644
index 0000000..36c7151
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -0,0 +1,83 @@
+/*
+ * SMP support for PowerNV machines.
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/cpu.h>
+
+#include <asm/irq.h>
+#include <asm/smp.h>
+#include <asm/paca.h>
+#include <asm/machdep.h>
+#include <asm/cputable.h>
+#include <asm/firmware.h>
+#include <asm/system.h>
+#include <asm/rtas.h>
+#include <asm/vdso_datapage.h>
+#include <asm/cputhreads.h>
+#include <asm/xics.h>
+
+#include "powernv.h"
+
+static void __devinit pnv_smp_setup_cpu(int cpu)
+{
+	if (cpu != boot_cpuid)
+		xics_setup_cpu();
+}
+
+static int pnv_smp_cpu_bootable(unsigned int nr)
+{
+	/* Special case - we inhibit secondary thread startup
+	 * during boot if the user requests it.
+	 */
+	if (system_state < SYSTEM_RUNNING && cpu_has_feature(CPU_FTR_SMT)) {
+		if (!smt_enabled_at_boot && cpu_thread_in_core(nr) != 0)
+			return 0;
+		if (smt_enabled_at_boot
+		    && cpu_thread_in_core(nr) >= smt_enabled_at_boot)
+			return 0;
+	}
+
+	return 1;
+}
+
+static struct smp_ops_t pnv_smp_ops = {
+	.message_pass	= smp_muxed_ipi_message_pass,
+	.cause_ipi	= NULL,	/* Filled at runtime by xics_smp_probe() */
+	.probe		= xics_smp_probe,
+	.kick_cpu	= smp_generic_kick_cpu,
+	.setup_cpu	= pnv_smp_setup_cpu,
+	.cpu_bootable	= pnv_smp_cpu_bootable,
+};
+
+/* This is called very early during platform setup_arch */
+void __init pnv_smp_init(void)
+{
+	smp_ops = &pnv_smp_ops;
+
+	/* XXX We don't yet have a proper entry point from HAL, for
+	 * now we rely on kexec-style entry from BML
+	 */
+
+#ifdef CONFIG_PPC_RTAS
+	/* Non-lpar has additional take/give timebase */
+	if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) {
+		smp_ops->give_timebase = rtas_give_timebase;
+		smp_ops->take_timebase = rtas_take_timebase;
+	}
+#endif /* CONFIG_PPC_RTAS */
+}
-- 
1.7.4.1


* [PATCH 07/20] powerpc/powernv: Add CPU hotplug support
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

Unplugged CPUs go into NAP mode in a loop until woken up.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
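The offline loop naps, then re-checks whether a restart was requested, and
treats any other wakeup as spurious. Below is a deterministic,
single-threaded userspace sketch of that control flow. Everything in it is
a stand-in: the sim_* names are hypothetical, and a scripted array of
wakeups replaces real interrupts and power7_idle().

```c
#include <stdbool.h>
#include <stddef.h>

enum sim_cpu_state { SIM_CPU_DEAD, SIM_CPU_UP_PREPARE };

struct sim_cpu {
	enum sim_cpu_state state;
	const bool *wake_is_restart;	/* scripted wakeups, true = restart */
	size_t nr_wakes;
	size_t next_wake;
};

/* Stand-in for napping: returns when the next scripted wakeup arrives. */
static void sim_nap(struct sim_cpu *c)
{
	if (c->next_wake >= c->nr_wakes) {
		/* Script exhausted: force a restart so the sketch terminates. */
		c->state = SIM_CPU_UP_PREPARE;
		return;
	}
	if (c->wake_is_restart[c->next_wake++])
		c->state = SIM_CPU_UP_PREPARE;
}

/* Mark the CPU dead, nap until restarted, and count spurious wakeups. */
static size_t sim_offline_loop(struct sim_cpu *c)
{
	size_t spurious = 0;

	c->state = SIM_CPU_DEAD;
	while (c->state != SIM_CPU_UP_PREPARE) {
		sim_nap(c);
		if (c->state != SIM_CPU_UP_PREPARE)
			spurious++;
	}
	return spurious;
}
```

With a script of two non-restart wakeups followed by a restart, the loop
exits after three naps and reports two spurious wakeups.
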
 arch/powerpc/Kconfig                 |    2 +-
 arch/powerpc/platforms/powernv/smp.c |   78 +++++++++++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6926b61..85baae5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -323,7 +323,7 @@ config SWIOTLB
 
 config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
-	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC)
+	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC || PPC_POWERNV)
 	---help---
 	  Say Y here to be able to disable and re-enable individual
 	  CPUs at runtime on SMP machines.
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 36c7151..4f4ec37 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -33,7 +33,14 @@
 
 #include "powernv.h"
 
-static void __devinit pnv_smp_setup_cpu(int cpu)
+#ifdef DEBUG
+#include <asm/udbg.h>
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+static void __cpuinit pnv_smp_setup_cpu(int cpu)
 {
 	if (cpu != boot_cpuid)
 		xics_setup_cpu();
@@ -55,6 +62,67 @@ static int pnv_smp_cpu_bootable(unsigned int nr)
 	return 1;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+
+static int pnv_smp_cpu_disable(void)
+{
+	int cpu = smp_processor_id();
+
+	/* This is identical to pSeries... might consolidate by
+	 * moving migrate_irqs_away to a ppc_md with default to
+	 * the generic fixup_irqs. --BenH.
+	 */
+	set_cpu_online(cpu, false);
+	vdso_data->processorCount--;
+	if (cpu == boot_cpuid)
+		boot_cpuid = cpumask_any(cpu_online_mask);
+	xics_migrate_irqs_away();
+	return 0;
+}
+
+static void pnv_smp_cpu_kill_self(void)
+{
+	unsigned int cpu;
+
+	/* If powersave_nap is enabled, use NAP mode, else just
+	 * spin aimlessly
+	 */
+	if (!powersave_nap) {
+		generic_mach_cpu_die();
+		return;
+	}
+
+	/* Standard hot unplug procedure */
+	local_irq_disable();
+	idle_task_exit();
+	current->active_mm = NULL; /* for sanity */
+	cpu = smp_processor_id();
+	DBG("CPU%d offline\n", cpu);
+	generic_set_cpu_dead(cpu);
+	smp_wmb();
+
+	/* We don't want to take decrementer interrupts while we are offline,
+	 * so clear LPCR:PECE1. We keep PECE2 enabled.
+	 */
+	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
+	while (!generic_check_cpu_restart(cpu)) {
+		power7_idle();
+		if (!generic_check_cpu_restart(cpu)) {
+			DBG("CPU%d Unexpected exit while offline !\n", cpu);
+			/* We may be getting an IPI, so we re-enable
+			 * interrupts to process it, it will be ignored
+			 * since we aren't online (hopefully)
+			 */
+			local_irq_enable();
+			local_irq_disable();
+		}
+	}
+	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
+	DBG("CPU%d coming online...\n", cpu);
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
+
 static struct smp_ops_t pnv_smp_ops = {
 	.message_pass	= smp_muxed_ipi_message_pass,
 	.cause_ipi	= NULL,	/* Filled at runtime by xics_smp_probe() */
@@ -62,6 +130,10 @@ static struct smp_ops_t pnv_smp_ops = {
 	.kick_cpu	= smp_generic_kick_cpu,
 	.setup_cpu	= pnv_smp_setup_cpu,
 	.cpu_bootable	= pnv_smp_cpu_bootable,
+#ifdef CONFIG_HOTPLUG_CPU
+	.cpu_disable	= pnv_smp_cpu_disable,
+	.cpu_die	= generic_cpu_die,
+#endif /* CONFIG_HOTPLUG_CPU */
 };
 
 /* This is called very early during platform setup_arch */
@@ -80,4 +152,8 @@ void __init pnv_smp_init(void)
 		smp_ops->take_timebase = rtas_take_timebase;
 	}
 #endif /* CONFIG_PPC_RTAS */
+
+#ifdef CONFIG_HOTPLUG_CPU
+	ppc_md.cpu_die	= pnv_smp_cpu_kill_self;
+#endif
 }
-- 
1.7.4.1


* [PATCH 09/20] powerpc/powernv: Get kernel command line across OPAL takeover
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

We stash it in boot_command_line which isn't in BSS and so won't
be overwritten. We then use that as a default cmd_line before
we walk the device-tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/prom.c             |    7 +++++++
 arch/powerpc/kernel/prom_init.c        |    4 ++++
 arch/powerpc/kernel/prom_init_check.sh |    3 ++-
 3 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 174e1e9..7b90c56 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -712,6 +712,13 @@ void __init early_init_devtree(void *params)
 	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 #endif
 
+	/* Pre-initialize the cmd_line with the content of boot_command_line,
+	 * which will be empty except when the content of the variable has
+	 * been overridden by a bootloading mechanism. This happens typically
+	 * with HAL takeover
+	 */
+	strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
+
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
 	 * size, TCE reserve, and more ...
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 9369287..e3f3904 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1449,6 +1449,10 @@ static void prom_opal_takeover(void)
 		opal_addr = top_addr;
 	args->hal_addr = opal_addr;
 
+	/* Copy the command line to the kernel image */
+	strlcpy(RELOC(boot_command_line), RELOC(prom_cmd_line),
+		COMMAND_LINE_SIZE);
+
 	prom_debug("  k_image    = 0x%lx\n", args->k_image);
 	prom_debug("  k_size     = 0x%lx\n", args->k_size);
 	prom_debug("  k_entry    = 0x%lx\n", args->k_entry);
diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 20af6aa..70f4286 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -21,7 +21,8 @@ _end enter_prom memcpy memset reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
 strcmp strcpy strlcpy strlen strncmp strstr logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
-opal_query_takeover opal_do_takeover opal_enter_rtas opal_secondary_entry"
+opal_query_takeover opal_do_takeover opal_enter_rtas opal_secondary_entry
+boot_command_line"
 
 NM="$1"
 OBJ="$2"
-- 
1.7.4.1


* [PATCH 10/20] powerpc/powernv: Basic support for OPAL
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

Add definitions of the OPAL interfaces along with the wrappers to call
into the OPAL runtime and the early device-tree parsing hook to locate
the OPAL runtime firmware.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
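As a reading aid for the return codes defined in opal.h below, here is a
small userspace sketch that names a subset of them. The numeric values are
copied from this patch. The two helper functions are hypothetical and not
part of the patch, and grouping OPAL_BUSY with OPAL_BUSY_EVENT is only an
assumption about how a caller might want to treat them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return codes, values copied from the opal.h hunk in this patch. */
#define OPAL_SUCCESS		0
#define OPAL_PARAMETER		-1
#define OPAL_BUSY		-2
#define OPAL_HARDWARE		-6
#define OPAL_UNSUPPORTED	-7
#define OPAL_BUSY_EVENT		-12

/* Hypothetical helper: printable name for a subset of return codes. */
static const char *opal_rc_name(int64_t rc)
{
	switch (rc) {
	case OPAL_SUCCESS:	return "OPAL_SUCCESS";
	case OPAL_PARAMETER:	return "OPAL_PARAMETER";
	case OPAL_BUSY:		return "OPAL_BUSY";
	case OPAL_HARDWARE:	return "OPAL_HARDWARE";
	case OPAL_UNSUPPORTED:	return "OPAL_UNSUPPORTED";
	case OPAL_BUSY_EVENT:	return "OPAL_BUSY_EVENT";
	default:		return "unknown";
	}
}

/* Hypothetical helper: true for the two "busy" codes. */
static bool opal_rc_is_busy(int64_t rc)
{
	return rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT;
}
```

A caller could use opal_rc_name() in debug output and opal_rc_is_busy() to
separate "try again later" results from hard failures.
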
 arch/powerpc/include/asm/firmware.h            |   10 +
 arch/powerpc/include/asm/opal.h                |  382 +++++++++++++++++++++++-
 arch/powerpc/kernel/prom.c                     |    7 +
 arch/powerpc/platforms/powernv/Kconfig         |    8 +-
 arch/powerpc/platforms/powernv/Makefile        |    2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |  101 +++++++
 arch/powerpc/platforms/powernv/opal.c          |  154 ++++++++++
 arch/powerpc/platforms/powernv/setup.c         |    6 +
 arch/powerpc/platforms/powernv/smp.c           |   25 ++-
 9 files changed, 690 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-wrappers.S
 create mode 100644 arch/powerpc/platforms/powernv/opal.c

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 3a6c586..14db29b 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -48,6 +48,8 @@
 #define FW_FEATURE_CMO		ASM_CONST(0x0000000002000000)
 #define FW_FEATURE_VPHN		ASM_CONST(0x0000000004000000)
 #define FW_FEATURE_XCMO		ASM_CONST(0x0000000008000000)
+#define FW_FEATURE_OPAL		ASM_CONST(0x0000000010000000)
+#define FW_FEATURE_OPALv2	ASM_CONST(0x0000000020000000)
 
 #ifndef __ASSEMBLY__
 
@@ -65,6 +67,8 @@ enum {
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_ISERIES_POSSIBLE = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
 	FW_FEATURE_ISERIES_ALWAYS = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
+	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2,
+	FW_FEATURE_POWERNV_ALWAYS = 0,
 	FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
 	FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
 	FW_FEATURE_CELLEB_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_BEAT,
@@ -78,6 +82,9 @@ enum {
 #ifdef CONFIG_PPC_ISERIES
 		FW_FEATURE_ISERIES_POSSIBLE |
 #endif
+#ifdef CONFIG_PPC_POWERNV
+		FW_FEATURE_POWERNV_POSSIBLE |
+#endif
 #ifdef CONFIG_PPC_PS3
 		FW_FEATURE_PS3_POSSIBLE |
 #endif
@@ -95,6 +102,9 @@ enum {
 #ifdef CONFIG_PPC_ISERIES
 		FW_FEATURE_ISERIES_ALWAYS &
 #endif
+#ifdef CONFIG_PPC_POWERNV
+		FW_FEATURE_POWERNV_ALWAYS &
+#endif
 #ifdef CONFIG_PPC_PS3
 		FW_FEATURE_PS3_ALWAYS &
 #endif
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ecdb283..53cda41 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -37,14 +37,394 @@ extern long opal_query_takeover(u64 *hal_size, u64 *hal_align);
 
 extern long opal_do_takeover(struct opal_takeover_args *args);
 
+struct rtas_args;
 extern int opal_enter_rtas(struct rtas_args *args,
 			   unsigned long data,
 			   unsigned long entry);
 
-
 #endif /* __ASSEMBLY__ */
 
 /****** OPAL APIs ******/
 
+/* Return codes */
+#define OPAL_SUCCESS 		0
+#define OPAL_PARAMETER		-1
+#define OPAL_BUSY		-2
+#define OPAL_PARTIAL		-3
+#define OPAL_CONSTRAINED	-4
+#define OPAL_CLOSED		-5
+#define OPAL_HARDWARE		-6
+#define OPAL_UNSUPPORTED	-7
+#define OPAL_PERMISSION		-8
+#define OPAL_NO_MEM		-9
+#define OPAL_RESOURCE		-10
+#define OPAL_INTERNAL_ERROR	-11
+#define OPAL_BUSY_EVENT		-12
+#define OPAL_HARDWARE_FROZEN	-13
+
+/* API Tokens (in r0) */
+#define OPAL_CONSOLE_WRITE			1
+#define OPAL_CONSOLE_READ			2
+#define OPAL_RTC_READ				3
+#define OPAL_RTC_WRITE				4
+#define OPAL_CEC_POWER_DOWN			5
+#define OPAL_CEC_REBOOT				6
+#define OPAL_READ_NVRAM				7
+#define OPAL_WRITE_NVRAM			8
+#define OPAL_HANDLE_INTERRUPT			9
+#define OPAL_POLL_EVENTS			10
+#define OPAL_PCI_SET_HUB_TCE_MEMORY		11
+#define OPAL_PCI_SET_PHB_TCE_MEMORY		12
+#define OPAL_PCI_CONFIG_READ_BYTE		13
+#define OPAL_PCI_CONFIG_READ_HALF_WORD  	14
+#define OPAL_PCI_CONFIG_READ_WORD		15
+#define OPAL_PCI_CONFIG_WRITE_BYTE		16
+#define OPAL_PCI_CONFIG_WRITE_HALF_WORD		17
+#define OPAL_PCI_CONFIG_WRITE_WORD		18
+#define OPAL_SET_XIVE				19
+#define OPAL_GET_XIVE				20
+#define OPAL_GET_COMPLETION_TOKEN_STATUS	21 /* obsolete */
+#define OPAL_REGISTER_OPAL_EXCEPTION_HANDLER	22
+#define OPAL_PCI_EEH_FREEZE_STATUS		23
+#define OPAL_PCI_SHPC				24
+#define OPAL_CONSOLE_WRITE_BUFFER_SPACE		25
+#define OPAL_PCI_EEH_FREEZE_CLEAR		26
+#define OPAL_PCI_PHB_MMIO_ENABLE		27
+#define OPAL_PCI_SET_PHB_MEM_WINDOW		28
+#define OPAL_PCI_MAP_PE_MMIO_WINDOW		29
+#define OPAL_PCI_SET_PHB_TABLE_MEMORY		30
+#define OPAL_PCI_SET_PE				31
+#define OPAL_PCI_SET_PELTV			32
+#define OPAL_PCI_SET_MVE			33
+#define OPAL_PCI_SET_MVE_ENABLE			34
+#define OPAL_PCI_GET_XIVE_REISSUE		35
+#define OPAL_PCI_SET_XIVE_REISSUE		36
+#define OPAL_PCI_SET_XIVE_PE			37
+#define OPAL_GET_XIVE_SOURCE			38
+#define OPAL_GET_MSI_32				39
+#define OPAL_GET_MSI_64				40
+#define OPAL_START_CPU				41
+#define OPAL_QUERY_CPU_STATUS			42
+#define OPAL_WRITE_OPPANEL			43
+#define OPAL_PCI_MAP_PE_DMA_WINDOW		44
+#define OPAL_PCI_MAP_PE_DMA_WINDOW_REAL		45
+#define OPAL_PCI_RESET				49
+
+#ifndef __ASSEMBLY__
+
+/* Other enums */
+enum OpalVendorApiTokens {
+	OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
+};
+enum OpalFreezeState {
+	OPAL_EEH_STOPPED_NOT_FROZEN = 0,
+	OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
+	OPAL_EEH_STOPPED_DMA_FREEZE = 2,
+	OPAL_EEH_STOPPED_MMIO_DMA_FREEZE = 3,
+	OPAL_EEH_STOPPED_RESET = 4,
+	OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
+	OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
+};
+enum OpalEehFreezeActionToken {
+	OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
+	OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
+	OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
+};
+enum OpalPciStatusToken {
+	OPAL_EEH_PHB_NO_ERROR = 0,
+	OPAL_EEH_PHB_FATAL = 1,
+	OPAL_EEH_PHB_RECOVERABLE = 2,
+	OPAL_EEH_PHB_BUS_ERROR = 3,
+	OPAL_EEH_PCI_NO_DEVSEL = 4,
+	OPAL_EEH_PCI_TA = 5,
+	OPAL_EEH_PCIEX_UR = 6,
+	OPAL_EEH_PCIEX_CA = 7,
+	OPAL_EEH_PCI_MMIO_ERROR = 8,
+	OPAL_EEH_PCI_DMA_ERROR = 9
+};
+enum OpalShpcAction {
+	OPAL_SHPC_GET_LINK_STATE = 0,
+	OPAL_SHPC_GET_SLOT_STATE = 1
+};
+enum OpalShpcLinkState {
+	OPAL_SHPC_LINK_DOWN = 0,
+	OPAL_SHPC_LINK_UP = 1
+};
+enum OpalMmioWindowType {
+	OPAL_M32_WINDOW_TYPE = 1,
+	OPAL_M64_WINDOW_TYPE = 2,
+	OPAL_IO_WINDOW_TYPE = 3
+};
+enum OpalShpcSlotState {
+	OPAL_SHPC_DEV_NOT_PRESENT = 0,
+	OPAL_SHPC_DEV_PRESENT = 1
+};
+enum OpalExceptionHandler {
+	OPAL_MACHINE_CHECK_HANDLER = 1,
+	OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
+	OPAL_SOFTPATCH_HANDLER = 3
+};
+enum OpalPendingState {
+	OPAL_EVENT_OPAL_INTERNAL = 0x1,
+	OPAL_EVENT_NVRAM = 0x2,
+	OPAL_EVENT_RTC = 0x4,
+	OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
+	OPAL_EVENT_CONSOLE_INPUT = 0x10
+};
+
+/* Machine check related definitions */
+enum OpalMCE_Version {
+	OpalMCE_V1 = 1,
+};
+
+enum OpalMCE_Severity {
+	OpalMCE_SEV_NO_ERROR = 0,
+	OpalMCE_SEV_WARNING = 1,
+	OpalMCE_SEV_ERROR_SYNC = 2,
+	OpalMCE_SEV_FATAL = 3,
+};
+
+enum OpalMCE_Disposition {
+	OpalMCE_DISPOSITION_RECOVERED = 0,
+	OpalMCE_DISPOSITION_NOT_RECOVERED = 1,
+};
+
+enum OpalMCE_Initiator {
+	OpalMCE_INITIATOR_UNKNOWN = 0,
+	OpalMCE_INITIATOR_CPU = 1,
+};
+
+enum OpalMCE_ErrorType {
+	OpalMCE_ERROR_TYPE_UNKNOWN = 0,
+	OpalMCE_ERROR_TYPE_UE = 1,
+	OpalMCE_ERROR_TYPE_SLB = 2,
+	OpalMCE_ERROR_TYPE_ERAT = 3,
+	OpalMCE_ERROR_TYPE_TLB = 4,
+};
+
+enum OpalMCE_UeErrorType {
+	OpalMCE_UE_ERROR_INDETERMINATE = 0,
+	OpalMCE_UE_ERROR_IFETCH = 1,
+	OpalMCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH = 2,
+	OpalMCE_UE_ERROR_LOAD_STORE = 3,
+	OpalMCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 4,
+};
+
+enum OpalMCE_SlbErrorType {
+	OpalMCE_SLB_ERROR_INDETERMINATE = 0,
+	OpalMCE_SLB_ERROR_PARITY = 1,
+	OpalMCE_SLB_ERROR_MULTIHIT = 2,
+};
+
+enum OpalMCE_EratErrorType {
+	OpalMCE_ERAT_ERROR_INDETERMINATE = 0,
+	OpalMCE_ERAT_ERROR_PARITY = 1,
+	OpalMCE_ERAT_ERROR_MULTIHIT = 2,
+};
+
+enum OpalMCE_TlbErrorType {
+	OpalMCE_TLB_ERROR_INDETERMINATE = 0,
+	OpalMCE_TLB_ERROR_PARITY = 1,
+	OpalMCE_TLB_ERROR_MULTIHIT = 2,
+};
+
+enum OpalThreadStatus {
+	OPAL_THREAD_INACTIVE = 0x0,
+	OPAL_THREAD_STARTED = 0x1
+};
+
+enum OpalPciBusCompare {
+	OpalPciBusAny	= 0,	/* Any bus number match */
+	OpalPciBus3Bits	= 2,	/* Match top 3 bits of bus number */
+	OpalPciBus4Bits	= 3,	/* Match top 4 bits of bus number */
+	OpalPciBus5Bits	= 4,	/* Match top 5 bits of bus number */
+	OpalPciBus6Bits	= 5,	/* Match top 6 bits of bus number */
+	OpalPciBus7Bits	= 6,	/* Match top 7 bits of bus number */
+	OpalPciBusAll	= 7,	/* Match bus number exactly */
+};
+
+enum OpalDeviceCompare {
+	OPAL_IGNORE_RID_DEVICE_NUMBER = 0,
+	OPAL_COMPARE_RID_DEVICE_NUMBER = 1
+};
+
+enum OpalFuncCompare {
+	OPAL_IGNORE_RID_FUNCTION_NUMBER = 0,
+	OPAL_COMPARE_RID_FUNCTION_NUMBER = 1
+};
+
+enum OpalPeAction {
+	OPAL_UNMAP_PE = 0,
+	OPAL_MAP_PE = 1
+};
+
+enum OpalPciResetAndReinitScope { 
+	OPAL_PHB_COMPLETE = 1, OPAL_PCI_LINK = 2, OPAL_PHB_ERROR = 3,
+	OPAL_PCI_HOT_RESET = 4, OPAL_PCI_FUNDAMENTAL_RESET = 5,
+	OPAL_PCI_IODA_RESET = 6,
+};
+
+enum OpalPciResetState { OPAL_DEASSERT_RESET = 0, OPAL_ASSERT_RESET = 1 };
+
+struct opal_machine_check_event {
+	enum OpalMCE_Version	version:8;	/* 0x00 */
+	uint8_t			in_use;		/* 0x01 */
+	enum OpalMCE_Severity	severity:8;	/* 0x02 */
+	enum OpalMCE_Initiator	initiator:8;	/* 0x03 */
+	enum OpalMCE_ErrorType	error_type:8;	/* 0x04 */
+	enum OpalMCE_Disposition disposition:8; /* 0x05 */
+	uint8_t			reserved_1[2];	/* 0x06 */
+	uint64_t		gpr3;		/* 0x08 */
+	uint64_t		srr0;		/* 0x10 */
+	uint64_t		srr1;		/* 0x18 */
+	union {					/* 0x20 */
+		struct {
+			enum OpalMCE_UeErrorType ue_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		physical_address_provided;
+			uint8_t		reserved_1[5];
+			uint64_t	effective_address;
+			uint64_t	physical_address;
+			uint8_t		reserved_2[8];
+		} ue_error;
+
+		struct {
+			enum OpalMCE_SlbErrorType slb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} slb_error;
+
+		struct {
+			enum OpalMCE_EratErrorType erat_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} erat_error;
+
+		struct {
+			enum OpalMCE_TlbErrorType tlb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} tlb_error;
+	} u;
+};
+
+typedef struct oppanel_line {
+	/* XXX */
+} oppanel_line_t;
+
+/* API functions */
+int64_t opal_console_write(int64_t term_number, int64_t *length,
+			   const uint8_t *buffer);
+int64_t opal_console_read(int64_t term_number, int64_t *length,
+			  uint8_t *buffer);
+int64_t opal_console_write_buffer_space(int64_t term_number,
+					int64_t *length);
+int64_t opal_rtc_read(uint32_t *year_month_day,
+		      uint64_t *hour_minute_second_millisecond);
+int64_t opal_rtc_write(uint32_t year_month_day,
+		       uint64_t hour_minute_second_millisecond);
+int64_t opal_cec_power_down(uint64_t request);
+int64_t opal_cec_reboot(void);
+int64_t opal_read_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
+int64_t opal_write_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
+int64_t opal_handle_interrupt(uint64_t isn, uint64_t *outstanding_event_mask);
+int64_t opal_poll_events(uint64_t *outstanding_event_mask);
+int64_t opal_pci_set_hub_tce_memory(uint64_t hub_id, uint64_t tce_mem_addr,
+				    uint64_t tce_mem_size);
+int64_t opal_pci_set_phb_tce_memory(uint64_t phb_id, uint64_t tce_mem_addr,
+				    uint64_t tce_mem_size);
+int64_t opal_pci_config_read_byte(uint64_t phb_id, uint64_t bus_dev_func,
+				  uint64_t offset, uint8_t *data);
+int64_t opal_pci_config_read_half_word(uint64_t phb_id, uint64_t bus_dev_func,
+				       uint64_t offset, uint16_t *data);
+int64_t opal_pci_config_read_word(uint64_t phb_id, uint64_t bus_dev_func,
+				  uint64_t offset, uint32_t *data);
+int64_t opal_pci_config_write_byte(uint64_t phb_id, uint64_t bus_dev_func,
+				   uint64_t offset, uint8_t data);
+int64_t opal_pci_config_write_half_word(uint64_t phb_id, uint64_t bus_dev_func,
+					uint64_t offset, uint16_t data);
+int64_t opal_pci_config_write_word(uint64_t phb_id, uint64_t bus_dev_func,
+				   uint64_t offset, uint32_t data);
+int64_t opal_set_xive(uint32_t isn, uint16_t server, uint8_t priority);
+int64_t opal_get_xive(uint32_t isn, uint16_t *server, uint8_t *priority);
+int64_t opal_register_exception_handler(uint64_t opal_exception,
+					uint64_t handler_address,
+					uint64_t glue_cache_line);
+int64_t opal_pci_eeh_freeze_status(uint64_t phb_id, uint64_t pe_number,
+				   uint8_t *freeze_state,
+				   uint16_t *pci_error_type,
+				   uint64_t *phb_status);
+int64_t opal_pci_eeh_freeze_clear(uint64_t phb_id, uint64_t pe_number,
+				  uint64_t eeh_action_token);
+int64_t opal_pci_shpc(uint64_t phb_id, uint64_t shpc_action, uint8_t *state);
+
+
+
+int64_t opal_pci_phb_mmio_enable(uint64_t phb_id, uint16_t window_type,
+				 uint16_t window_num, uint16_t enable);
+int64_t opal_pci_set_phb_mem_window(uint64_t phb_id, uint16_t window_type,
+				    uint16_t window_num,
+				    uint64_t starting_real_address,
+				    uint64_t starting_pci_address,
+				    uint16_t segment_size);
+int64_t opal_pci_map_pe_mmio_window(uint64_t phb_id, uint16_t pe_number,
+				    uint16_t window_type, uint16_t window_num,
+				    uint16_t segment_num);
+int64_t opal_pci_set_phb_table_memory(uint64_t phb_id, uint64_t rtt_addr,
+				      uint64_t ivt_addr, uint64_t ivt_len,
+				      uint64_t reject_array_addr,
+				      uint64_t peltv_addr);
+int64_t opal_pci_set_pe(uint64_t phb_id, uint64_t pe_number, uint64_t bus_dev_func,
+			uint8_t bus_compare, uint8_t dev_compare, uint8_t func_compare,
+			uint8_t pe_action);
+int64_t opal_pci_set_peltv(uint64_t phb_id, uint32_t parent_pe, uint32_t child_pe,
+			   uint8_t state);
+int64_t opal_pci_set_mve(uint64_t phb_id, uint32_t mve_number, uint32_t pe_number);
+int64_t opal_pci_set_mve_enable(uint64_t phb_id, uint32_t mve_number,
+				uint32_t state);
+int64_t opal_pci_get_xive_reissue(uint64_t phb_id, uint32_t xive_number,
+				  uint8_t *p_bit, uint8_t *q_bit);
+int64_t opal_pci_set_xive_reissue(uint64_t phb_id, uint32_t xive_number,
+				  uint8_t p_bit, uint8_t q_bit);
+int64_t opal_pci_set_xive_pe(uint64_t phb_id, uint32_t pe_number,
+			     uint32_t xive_num);
+int64_t opal_get_xive_source(uint64_t phb_id, uint32_t xive_num,
+			     int32_t *interrupt_source_number);
+int64_t opal_get_msi_32(uint64_t phb_id, uint32_t mve_number, uint32_t xive_num,
+			uint8_t msi_range, uint32_t *msi_address,
+			uint32_t *message_data);
+int64_t opal_get_msi_64(uint64_t phb_id, uint32_t mve_number,
+			uint32_t xive_num, uint8_t msi_range,
+			uint64_t *msi_address, uint32_t *message_data);
+int64_t opal_start_cpu(uint64_t thread_number, uint64_t start_address);
+int64_t opal_query_cpu_status(uint64_t thread_number, uint8_t *thread_status);
+int64_t opal_write_oppanel(oppanel_line_t *lines, uint64_t num_lines);
+int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t window_id,
+				   uint16_t tce_levels, uint64_t tce_table_addr,
+				   uint64_t tce_table_size, uint64_t tce_page_size);
+int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
+					uint16_t dma_window_number, uint64_t pci_start_addr,
+					uint64_t pci_mem_size);
+int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+
+/* Internal functions */
+extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
+
+extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
+extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
+
+extern void hvc_opal_init_early(void);
+
+/* Internal functions */
+extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
+				   int depth, void *data);
+
+#endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_H */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 7b90c56..831a201 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -54,6 +54,8 @@
 #include <asm/pci-bridge.h>
 #include <asm/phyp_dump.h>
 #include <asm/kexec.h>
+#include <asm/opal.h>
+
 #include <mm/mmu_decl.h>
 
 #ifdef DEBUG
@@ -707,6 +709,11 @@ void __init early_init_devtree(void *params)
 	of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
 #endif
 
+#ifdef CONFIG_PPC_POWERNV
+	/* Some machines might need OPAL info for debugging, grab it now. */
+	of_scan_flat_dt(early_init_dt_scan_opal, NULL);
+#endif
+
 #ifdef CONFIG_PHYP_DUMP
 	/* scan tree to see if dump occurred during last boot */
 	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 5cd04f5..268cadb 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -1,12 +1,16 @@
 config PPC_POWERNV
 	depends on PPC64 && PPC_BOOK3S
 	bool "IBM PowerNV (Non-Virtualized) platform support"	
-	select PPC_RTAS
 	select PPC_NATIVE
 	select PPC_XICS
 	select PPC_ICP_NATIVE
-	select PPC_ICS_RTAS
 	select PPC_P7_NAP
 	select PPC_PCI_CHOICE if EMBEDDED
 	default y
 
+config PPC_POWERNV_RTAS
+	depends on PPC_POWERNV
+	bool "Support for RTAS based PowerNV platforms such as BML"
+	default y
+	select PPC_ICS_RTAS
+	select PPC_RTAS
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 4971330..8f69c0d 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,2 +1,2 @@
-obj-y			+= setup.o opal-takeover.o
+obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
 obj-$(CONFIG_SMP)	+= smp.o
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
new file mode 100644
index 0000000..4a3f46d
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -0,0 +1,101 @@
+/*
+ * PowerNV OPAL API wrappers
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/hvcall.h>
+#include <asm/asm-offsets.h>
+#include <asm/opal.h>
+
+/* TODO:
+ *
+ * - Trace irqs on/off (needs saving/restoring all args, argh...)
+ * - Get r11 freed up by Dave so I can have better register usage
+ */
+#define OPAL_CALL(name, token)		\
+ _GLOBAL(name);				\
+	mflr	r0;			\
+	mfcr	r12;			\
+	std	r0,16(r1);		\
+	std	r12,8(r1);		\
+	std	r1,PACAR1(r13);		\
+	li	r0,0;			\
+	mfmsr	r12;			\
+	ori	r0,r0,MSR_EE;		\
+	std	r12,PACASAVEDMSR(r13);	\
+	andc	r12,r12,r0;		\
+	mtmsrd	r12,1;			\
+	LOAD_REG_ADDR(r0,.opal_return);	\
+	mtlr	r0;			\
+	li	r0,MSR_DR|MSR_IR;	\
+	andc	r12,r12,r0;		\
+	li	r0,token;		\
+	mtspr	SPRN_HSRR1,r12;		\
+	LOAD_REG_ADDR(r11,opal);	\
+	ld	r12,8(r11);		\
+	ld	r2,0(r11);		\
+	mtspr	SPRN_HSRR0,r12;		\
+	hrfid
+
+_STATIC(opal_return)
+	ld	r2,PACATOC(r13);
+	ld	r4,8(r1);
+	ld	r5,16(r1);
+	ld	r6,PACASAVEDMSR(r13);
+	mtspr	SPRN_SRR0,r5;
+	mtspr	SPRN_SRR1,r6;
+	mtcr	r4;
+	rfid
+
+OPAL_CALL(opal_console_write,			OPAL_CONSOLE_WRITE);
+OPAL_CALL(opal_console_read,			OPAL_CONSOLE_READ);
+OPAL_CALL(opal_console_write_buffer_space,	OPAL_CONSOLE_WRITE_BUFFER_SPACE);
+OPAL_CALL(opal_rtc_read,			OPAL_RTC_READ);
+OPAL_CALL(opal_rtc_write,			OPAL_RTC_WRITE);
+OPAL_CALL(opal_cec_power_down,			OPAL_CEC_POWER_DOWN);
+OPAL_CALL(opal_cec_reboot,			OPAL_CEC_REBOOT);
+OPAL_CALL(opal_read_nvram,			OPAL_READ_NVRAM);
+OPAL_CALL(opal_write_nvram,			OPAL_WRITE_NVRAM);
+OPAL_CALL(opal_handle_interrupt,		OPAL_HANDLE_INTERRUPT);
+OPAL_CALL(opal_poll_events,			OPAL_POLL_EVENTS);
+OPAL_CALL(opal_pci_set_hub_tce_memory,		OPAL_PCI_SET_HUB_TCE_MEMORY);
+OPAL_CALL(opal_pci_set_phb_tce_memory,		OPAL_PCI_SET_PHB_TCE_MEMORY);
+OPAL_CALL(opal_pci_config_read_byte,		OPAL_PCI_CONFIG_READ_BYTE);
+OPAL_CALL(opal_pci_config_read_half_word,	OPAL_PCI_CONFIG_READ_HALF_WORD);
+OPAL_CALL(opal_pci_config_read_word,		OPAL_PCI_CONFIG_READ_WORD);
+OPAL_CALL(opal_pci_config_write_byte,		OPAL_PCI_CONFIG_WRITE_BYTE);
+OPAL_CALL(opal_pci_config_write_half_word,	OPAL_PCI_CONFIG_WRITE_HALF_WORD);
+OPAL_CALL(opal_pci_config_write_word,		OPAL_PCI_CONFIG_WRITE_WORD);
+OPAL_CALL(opal_set_xive,			OPAL_SET_XIVE);
+OPAL_CALL(opal_get_xive,			OPAL_GET_XIVE);
+OPAL_CALL(opal_register_exception_handler,	OPAL_REGISTER_OPAL_EXCEPTION_HANDLER);
+OPAL_CALL(opal_pci_eeh_freeze_status,		OPAL_PCI_EEH_FREEZE_STATUS);
+OPAL_CALL(opal_pci_eeh_freeze_clear,		OPAL_PCI_EEH_FREEZE_CLEAR);
+OPAL_CALL(opal_pci_shpc,			OPAL_PCI_SHPC);
+OPAL_CALL(opal_pci_phb_mmio_enable,		OPAL_PCI_PHB_MMIO_ENABLE);
+OPAL_CALL(opal_pci_set_phb_mem_window,		OPAL_PCI_SET_PHB_MEM_WINDOW);
+OPAL_CALL(opal_pci_map_pe_mmio_window,		OPAL_PCI_MAP_PE_MMIO_WINDOW);
+OPAL_CALL(opal_pci_set_phb_table_memory,	OPAL_PCI_SET_PHB_TABLE_MEMORY);
+OPAL_CALL(opal_pci_set_pe,			OPAL_PCI_SET_PE);
+OPAL_CALL(opal_pci_set_peltv,			OPAL_PCI_SET_PELTV);
+OPAL_CALL(opal_pci_set_mve,			OPAL_PCI_SET_MVE);
+OPAL_CALL(opal_pci_set_mve_enable,		OPAL_PCI_SET_MVE_ENABLE);
+OPAL_CALL(opal_pci_get_xive_reissue,		OPAL_PCI_GET_XIVE_REISSUE);
+OPAL_CALL(opal_pci_set_xive_reissue,		OPAL_PCI_SET_XIVE_REISSUE);
+OPAL_CALL(opal_pci_set_xive_pe,			OPAL_PCI_SET_XIVE_PE);
+OPAL_CALL(opal_get_xive_source,			OPAL_GET_XIVE_SOURCE);
+OPAL_CALL(opal_get_msi_32,			OPAL_GET_MSI_32);
+OPAL_CALL(opal_get_msi_64,			OPAL_GET_MSI_64);
+OPAL_CALL(opal_start_cpu,			OPAL_START_CPU);
+OPAL_CALL(opal_query_cpu_status,		OPAL_QUERY_CPU_STATUS);
+OPAL_CALL(opal_write_oppanel,			OPAL_WRITE_OPPANEL);
+OPAL_CALL(opal_pci_map_pe_dma_window,		OPAL_PCI_MAP_PE_DMA_WINDOW);
+OPAL_CALL(opal_pci_map_pe_dma_window_real,	OPAL_PCI_MAP_PE_DMA_WINDOW_REAL);
+OPAL_CALL(opal_pci_reset,			OPAL_PCI_RESET);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
new file mode 100644
index 0000000..8d55107
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -0,0 +1,154 @@
+/*
+ * PowerNV OPAL high level interfaces
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/types.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <asm/opal.h>
+#include <asm/firmware.h>
+
+#include "powernv.h"
+
+struct opal {
+	u64 base;
+	u64 entry;
+} opal;
+
+static struct device_node *opal_node;
+static DEFINE_SPINLOCK(opal_write_lock);
+
+int __init early_init_dt_scan_opal(unsigned long node,
+				   const char *uname, int depth, void *data)
+{
+	const void *basep, *entryp;
+	unsigned long basesz, entrysz;
+
+	if (depth != 1 || strcmp(uname, "ibm,opal") != 0)
+		return 0;
+
+	basep  = of_get_flat_dt_prop(node, "opal-base-address", &basesz);
+	entryp = of_get_flat_dt_prop(node, "opal-entry-address", &entrysz);
+
+	if (!basep || !entryp)
+		return 1;
+
+	opal.base = of_read_number(basep, basesz/4);
+	opal.entry = of_read_number(entryp, entrysz/4);
+
+	pr_debug("OPAL Base  = 0x%llx (basep=%p basesz=%ld)\n",
+		 opal.base, basep, basesz);
+	pr_debug("OPAL Entry = 0x%llx (entryp=%p entrysz=%ld)\n",
+		 opal.entry, entryp, entrysz);
+
+	powerpc_firmware_features |= FW_FEATURE_OPAL;
+	if (of_flat_dt_is_compatible(node, "ibm,opal-v2")) {
+		powerpc_firmware_features |= FW_FEATURE_OPALv2;
+		printk("OPAL V2 detected !\n");
+	} else {
+		printk("OPAL V1 detected !\n");
+	}
+
+	return 1;
+}
+
+int opal_get_chars(uint32_t vtermno, char *buf, int count)
+{
+	s64 len, rc;
+	u64 evt;
+
+	if (!opal.entry)
+		return 0;
+	opal_poll_events(&evt);
+	if ((evt & OPAL_EVENT_CONSOLE_INPUT) == 0)
+		return 0;
+	len = count;
+	rc = opal_console_read(vtermno, &len, buf);
+	if (rc == OPAL_SUCCESS)
+		return len;
+	return 0;
+}
+
+int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
+{
+	int written = 0;
+	s64 len, rc = OPAL_BUSY;
+	unsigned long flags;
+	u64 evt;
+
+	if (!opal.entry)
+		return 0;
+
+	/* We want put_chars to be atomic to avoid mangling of hvsi
+	 * packets. To do that, we first test for room and return
+	 * -EAGAIN if there isn't enough
+	 */
+	spin_lock_irqsave(&opal_write_lock, flags);
+	rc = opal_console_write_buffer_space(vtermno, &len);
+	if (rc || len < total_len) {
+		spin_unlock_irqrestore(&opal_write_lock, flags);
+		/* Closed -> drop characters */
+		if (rc)
+			return total_len;
+		opal_poll_events(&evt);
+		return -EAGAIN;
+	}
+
+	/* We still try to handle partial completions, though they
+	 * should no longer happen.
+	 */
+	while(total_len > 0 && (rc == OPAL_BUSY ||
+				rc == OPAL_BUSY_EVENT || rc == OPAL_SUCCESS)) {
+		len = total_len;
+		rc = opal_console_write(vtermno, &len, data);
+		if (rc == OPAL_SUCCESS) {
+			total_len -= len;
+			data += len;
+			written += len;
+		}
+		/* This is a bit nasty but we need that for the console to
+		 * flush when there aren't any interrupts. We will clean
+		 * things a bit later to limit that to synchronous path
+		 * such as the kernel console and xmon/udbg
+		 */
+		do
+			opal_poll_events(&evt);
+		while(rc == OPAL_SUCCESS && (evt & OPAL_EVENT_CONSOLE_OUTPUT));
+	}
+	spin_unlock_irqrestore(&opal_write_lock, flags);
+	return written;
+}
+
+static int __init opal_init(void)
+{
+	struct device_node *np, *consoles;
+
+	opal_node = of_find_node_by_path("/ibm,opal");
+	if (!opal_node) {
+		pr_warn("opal: Node not found\n");
+		return -ENODEV;
+	}
+	if (firmware_has_feature(FW_FEATURE_OPALv2))
+		consoles = of_find_node_by_path("/ibm,opal/consoles");
+	else
+		consoles = of_node_get(opal_node);
+
+	/* Register serial ports */
+	for_each_child_of_node(consoles, np) {
+		if (strcmp(np->name, "serial"))
+			continue;
+		of_platform_device_create(np, NULL, NULL);
+	}
+	of_node_put(consoles);
+	return 0;
+}
+subsys_initcall(opal_init);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 569f9cc..b6e5ff8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -74,6 +74,12 @@ static void pnv_show_cpuinfo(struct seq_file *m)
 	if (root)
 		model = of_get_property(root, "model", NULL);
 	seq_printf(m, "machine\t\t: PowerNV %s\n", model);
+	if (firmware_has_feature(FW_FEATURE_OPALv2))
+		seq_printf(m, "firmware\t: OPAL v2\n");
+	else if (firmware_has_feature(FW_FEATURE_OPAL))
+		seq_printf(m, "firmware\t: OPAL v1\n");
+	else
+		seq_printf(m, "firmware\t: BML\n");
 	of_node_put(root);
 }
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 4f4ec37..e877366 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -30,6 +30,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/cputhreads.h>
 #include <asm/xics.h>
+#include <asm/opal.h>
 
 #include "powernv.h"
 
@@ -62,6 +63,28 @@ static int pnv_smp_cpu_bootable(unsigned int nr)
 	return 1;
 }
 
+int __devinit pnv_smp_kick_cpu(int nr)
+{
+	unsigned int pcpu = get_hard_smp_processor_id(nr);
+	unsigned long start_here = __pa(*((unsigned long *)
+					  generic_secondary_smp_init));
+	long rc;
+
+	BUG_ON(nr < 0 || nr >= NR_CPUS);
+
+	/* On OPAL v2 the CPUs are still spinning inside OPAL itself,
+	 * get them back now
+	 */
+	if (firmware_has_feature(FW_FEATURE_OPALv2)) {
+		pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu);
+		rc = opal_start_cpu(pcpu, start_here);
+		if (rc != OPAL_SUCCESS)
+			pr_warn("OPAL Error %ld starting CPU %d\n",
+				rc, nr);
+	}
+	return smp_generic_kick_cpu(nr);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 static int pnv_smp_cpu_disable(void)
@@ -127,7 +150,7 @@ static struct smp_ops_t pnv_smp_ops = {
 	.message_pass	= smp_muxed_ipi_message_pass,
 	.cause_ipi	= NULL,	/* Filled at runtime by xics_smp_probe() */
 	.probe		= xics_smp_probe,
-	.kick_cpu	= smp_generic_kick_cpu,
+	.kick_cpu	= pnv_smp_kick_cpu,
 	.setup_cpu	= pnv_smp_setup_cpu,
 	.cpu_bootable	= pnv_smp_cpu_bootable,
 #ifdef CONFIG_HOTPLUG_CPU
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 14/20] powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks
From: Benjamin Herrenschmidt @ 2011-09-20  3:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

Implement OPAL RTC and NVRAM support and wire it all up to
the powernv platform.

We use RTAS for the RTC as a fallback if available. Using RTAS for NVRAM
is not supported yet, pending some rework/cleanup and generalization
of the pSeries & CHRP code. We also use RTAS fallbacks for power off
and reboot.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
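Note on the NVRAM accessors: both opal_nvram_read() and opal_nvram_write()
trim the request so it never runs past the end of the device. The userspace
sketch below illustrates only that clamping step. The helper name
sketch_nvram_clamp() is hypothetical and not part of the patch, and the
subtraction form used here is a choice made for the sketch (it avoids
overflow in "off + count"), not what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return how many bytes of a
 * request starting at 'index' fit inside a device of 'size' bytes.
 */
static inline size_t sketch_nvram_clamp(uint64_t index, size_t count,
					size_t size)
{
	/* Starting at or past the end: nothing to transfer. */
	if (index >= size)
		return 0;

	/* Trim the tail so that index + count never exceeds size. */
	if (count > size - index)
		count = (size_t)(size - index);

	return count;
}
```

With a 64-byte device, a 16-byte request at offset 60 is trimmed to 4 bytes,
and a request at offset 64 transfers nothing.
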
 arch/powerpc/include/asm/opal.h             |    6 ++
 arch/powerpc/platforms/powernv/Makefile     |    2 +
 arch/powerpc/platforms/powernv/opal-nvram.c |   88 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-rtc.c   |   97 +++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/setup.c      |   57 ++++++++++------
 5 files changed, 229 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-nvram.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-rtc.c
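
Note on the RTC encoding: opal_to_tm() and opal_set_rtc_time() exchange the
date and time with firmware as packed BCD, with century, year, month and day
in one 32-bit word and hour, minute and second in the top three bytes of a
64-bit word. The userspace sketch below mirrors that packing as read from the
patch. The sketch_* names are hypothetical stand-ins (the kernel code uses
bin2bcd()/bcd2bin() from <linux/bcd.h>), and the millisecond bits are simply
left at zero here, as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's bin2bcd(); valid for 0..99 only. */
static inline uint8_t sketch_bin2bcd(unsigned int val)
{
	assert(val < 100);
	return (uint8_t)(((val / 10) << 4) | (val % 10));
}

/* Stand-in for the kernel's bcd2bin(). */
static inline unsigned int sketch_bcd2bin(uint8_t val)
{
	return (unsigned int)(val >> 4) * 10 + (val & 0x0f);
}

/* Pack a full year (e.g. 2011), month 1..12 and day 1..31 as CC YY MM DD. */
static inline uint32_t sketch_pack_ymd(unsigned int year, unsigned int mon,
				       unsigned int mday)
{
	uint32_t y_m_d = 0;

	y_m_d |= (uint32_t)sketch_bin2bcd(year / 100) << 24;
	y_m_d |= (uint32_t)sketch_bin2bcd(year % 100) << 16;
	y_m_d |= (uint32_t)sketch_bin2bcd(mon) << 8;
	y_m_d |= (uint32_t)sketch_bin2bcd(mday);
	return y_m_d;
}

/* Pack hour, minute and second into bits 63..40; lower bits stay zero. */
static inline uint64_t sketch_pack_hms(unsigned int hour, unsigned int min,
				       unsigned int sec)
{
	uint64_t h_m_s_ms = 0;

	h_m_s_ms |= (uint64_t)sketch_bin2bcd(hour) << 56;
	h_m_s_ms |= (uint64_t)sketch_bin2bcd(min) << 48;
	h_m_s_ms |= (uint64_t)sketch_bin2bcd(sec) << 40;
	return h_m_s_ms;
}
```

Because each BCD byte reads like its decimal value, 20 September 2011 packs
to 0x20110920, which makes register dumps easy to eyeball.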

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index dbeabea..9bb0efd 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -430,6 +430,12 @@ extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
 
 extern void hvc_opal_init_early(void);
 
+struct rtc_time;
+extern int opal_set_rtc_time(struct rtc_time *tm);
+extern void opal_get_rtc_time(struct rtc_time *tm);
+extern unsigned long opal_get_boot_time(void);
+extern void opal_nvram_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 8f69c0d..618ad83 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,2 +1,4 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
+obj-y			+= opal-rtc.o opal-nvram.o
+
 obj-$(CONFIG_SMP)	+= smp.o
diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
new file mode 100644
index 0000000..3f83e1a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-nvram.c
@@ -0,0 +1,88 @@
+/*
+ * PowerNV nvram code.
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define DEBUG
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/of.h>
+
+#include <asm/opal.h>
+#include <asm/machdep.h>
+
+static unsigned int nvram_size;
+
+static ssize_t opal_nvram_size(void)
+{
+	return nvram_size;
+}
+
+static ssize_t opal_nvram_read(char *buf, size_t count, loff_t *index)
+{
+	s64 rc;
+	int off;
+
+	if (*index >= nvram_size)
+		return 0;
+	off = *index;
+	if ((off + count) > nvram_size)
+		count = nvram_size - off;
+	rc = opal_read_nvram(__pa(buf), count, off);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+	*index += count;
+	return count;
+}
+
+static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
+{
+	s64 rc = OPAL_BUSY;
+	int off;
+
+	if (*index >= nvram_size)
+		return 0;
+	off = *index;
+	if ((off + count) > nvram_size)
+		count = nvram_size - off;
+
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_write_nvram(__pa(buf), count, off);
+		if (rc == OPAL_BUSY_EVENT)
+			opal_poll_events(NULL);
+	}
+	*index += count;
+	return count;
+}
+
+void __init opal_nvram_init(void)
+{
+	struct device_node *np;
+	const u32 *nbytes_p;
+
+	np = of_find_compatible_node(NULL, NULL, "ibm,opal-nvram");
+	if (np == NULL)
+		return;
+
+	nbytes_p = of_get_property(np, "#bytes", NULL);
+	if (!nbytes_p) {
+		of_node_put(np);
+		return;
+	}
+	nvram_size = *nbytes_p;
+
+	printk(KERN_INFO "OPAL nvram setup, %u bytes\n", nvram_size);
+	of_node_put(np);
+
+	ppc_md.nvram_read = opal_nvram_read;
+	ppc_md.nvram_write = opal_nvram_write;
+	ppc_md.nvram_size = opal_nvram_size;
+}
+
diff --git a/arch/powerpc/platforms/powernv/opal-rtc.c b/arch/powerpc/platforms/powernv/opal-rtc.c
new file mode 100644
index 0000000..2aa7641
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-rtc.c
@@ -0,0 +1,97 @@
+/*
+ * PowerNV Real Time Clock.
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/kernel.h>
+#include <linux/time.h>
+#include <linux/bcd.h>
+#include <linux/rtc.h>
+#include <linux/delay.h>
+
+#include <asm/opal.h>
+#include <asm/firmware.h>
+
+static void opal_to_tm(u32 y_m_d, u64 h_m_s_ms, struct rtc_time *tm)
+{
+	tm->tm_year	= ((bcd2bin(y_m_d >> 24) * 100) +
+			   bcd2bin((y_m_d >> 16) & 0xff)) - 1900;
+	tm->tm_mon	= bcd2bin((y_m_d >> 8) & 0xff) - 1;
+	tm->tm_mday	= bcd2bin(y_m_d & 0xff);
+	tm->tm_hour	= bcd2bin((h_m_s_ms >> 56) & 0xff);
+	tm->tm_min	= bcd2bin((h_m_s_ms >> 48) & 0xff);
+	tm->tm_sec	= bcd2bin((h_m_s_ms >> 40) & 0xff);
+
+	GregorianDay(tm);
+}
+
+unsigned long __init opal_get_boot_time(void)
+{
+	struct rtc_time tm;
+	u32 y_m_d;
+	u64 h_m_s_ms;
+	long rc = OPAL_BUSY;
+
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_rtc_read(&y_m_d, &h_m_s_ms);
+		if (rc == OPAL_BUSY_EVENT)
+			opal_poll_events(NULL);
+		else
+			mdelay(10);
+	}
+	if (rc != OPAL_SUCCESS)
+		return 0;
+	opal_to_tm(y_m_d, h_m_s_ms, &tm);
+	return mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+		      tm.tm_hour, tm.tm_min, tm.tm_sec);
+}
+
+void opal_get_rtc_time(struct rtc_time *tm)
+{
+	long rc = OPAL_BUSY;
+	u32 y_m_d;
+	u64 h_m_s_ms;
+
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_rtc_read(&y_m_d, &h_m_s_ms);
+		if (rc == OPAL_BUSY_EVENT)
+			opal_poll_events(NULL);
+		else
+			mdelay(10);
+	}
+	if (rc != OPAL_SUCCESS)
+		return;
+	opal_to_tm(y_m_d, h_m_s_ms, tm);
+}
+
+int opal_set_rtc_time(struct rtc_time *tm)
+{
+	long rc = OPAL_BUSY;
+	u32 y_m_d = 0;
+	u64 h_m_s_ms = 0;
+
+	y_m_d |= ((u32)bin2bcd((tm->tm_year + 1900) / 100)) << 24;
+	y_m_d |= ((u32)bin2bcd((tm->tm_year + 1900) % 100)) << 16;
+	y_m_d |= ((u32)bin2bcd((tm->tm_mon + 1))) << 8;
+	y_m_d |= ((u32)bin2bcd(tm->tm_mday));
+
+	h_m_s_ms |= ((u64)bin2bcd(tm->tm_hour)) << 56;
+	h_m_s_ms |= ((u64)bin2bcd(tm->tm_min)) << 48;
+	h_m_s_ms |= ((u64)bin2bcd(tm->tm_sec)) << 40;
+
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_rtc_write(y_m_d, h_m_s_ms);
+		if (rc == OPAL_BUSY_EVENT)
+			opal_poll_events(NULL);
+		else
+			mdelay(10);
+	}
+	return rc == OPAL_SUCCESS ? 0 : -EIO;
+}
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 0fac0a6..4a2b2e2 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -29,7 +29,9 @@
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/xics.h>
+#include <asm/rtas.h>
 #include <asm/opal.h>
+#include <asm/xics.h>
 
 #include "powernv.h"
 
@@ -40,7 +42,9 @@ static void __init pnv_setup_arch(void)
 
 	/* XXX PCI */
 
-	/* XXX NVRAM */
+	/* Setup RTC and NVRAM callbacks */
+	if (firmware_has_feature(FW_FEATURE_OPAL))
+		opal_nvram_init();
 
 	/* Enable NAP mode */
 	powersave_nap = 1;
@@ -118,30 +122,40 @@ static void __noreturn pnv_halt(void)
 	pnv_power_off();
 }
 
-static unsigned long __init pnv_get_boot_time(void)
-{
-	return 0;
-}
-
-static void pnv_get_rtc_time(struct rtc_time *rtc_tm)
+static void pnv_progress(char *s, unsigned short hex)
 {
 }
 
-static int pnv_set_rtc_time(struct rtc_time *tm)
+#ifdef CONFIG_KEXEC
+static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 {
-	return 0;
+	xics_kexec_teardown_cpu(secondary);
 }
+#endif /* CONFIG_KEXEC */
 
-static void pnv_progress(char *s, unsigned short hex)
+static void __init pnv_setup_machdep_opal(void)
 {
+	ppc_md.get_boot_time = opal_get_boot_time;
+	ppc_md.get_rtc_time = opal_get_rtc_time;
+	ppc_md.set_rtc_time = opal_set_rtc_time;
+	ppc_md.restart = pnv_restart;
+	ppc_md.power_off = pnv_power_off;
+	ppc_md.halt = pnv_halt;
 }
 
-#ifdef CONFIG_KEXEC
-static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
+#ifdef CONFIG_PPC_POWERNV_RTAS
+static void __init pnv_setup_machdep_rtas(void)
 {
-	xics_kexec_teardown_cpu(secondary);
+	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
+		ppc_md.get_boot_time = rtas_get_boot_time;
+		ppc_md.get_rtc_time = rtas_get_rtc_time;
+		ppc_md.set_rtc_time = rtas_set_rtc_time;
+	}
+	ppc_md.restart = rtas_restart;
+	ppc_md.power_off = rtas_power_off;
+	ppc_md.halt = rtas_halt;
 }
-#endif /* CONFIG_KEXEC */
+#endif /* CONFIG_PPC_POWERNV_RTAS */
 
 static int __init pnv_probe(void)
 {
@@ -152,6 +166,13 @@ static int __init pnv_probe(void)
 
 	hpte_init_native();
 
+	if (firmware_has_feature(FW_FEATURE_OPAL))
+		pnv_setup_machdep_opal();
+#ifdef CONFIG_PPC_POWERNV_RTAS
+	else if (rtas.base)
+		pnv_setup_machdep_rtas();
+#endif /* CONFIG_PPC_POWERNV_RTAS */
+
 	pr_debug("PowerNV detected !\n");
 
 	return 1;
@@ -160,16 +181,10 @@ static int __init pnv_probe(void)
 define_machine(powernv) {
 	.name			= "PowerNV",
 	.probe			= pnv_probe,
-	.setup_arch		= pnv_setup_arch,
 	.init_early		= pnv_init_early,
+	.setup_arch		= pnv_setup_arch,
 	.init_IRQ		= pnv_init_IRQ,
 	.show_cpuinfo		= pnv_show_cpuinfo,
-	.restart		= pnv_restart,
-	.power_off		= pnv_power_off,
-	.halt			= pnv_halt,
-	.get_boot_time		= pnv_get_boot_time,
-	.get_rtc_time		= pnv_get_rtc_time,
-	.set_rtc_time		= pnv_set_rtc_time,
 	.progress		= pnv_progress,
 	.power_save             = power7_idle,
 	.calibrate_decr		= generic_calibrate_decr,
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 15/20] powerpc/powernv: Add OPAL ICS backend
From: Benjamin Herrenschmidt @ 2011-09-20  3:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

OPAL handles HW access to the various ICS or equivalent chips
for us (with the exception of p5ioc2 based HEA which uses a
different backend), similarly to what RTAS does on pSeries.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/xics.h        |   19 +++
 arch/powerpc/sysdev/xics/Makefile      |    1 +
 arch/powerpc/sysdev/xics/ics-opal.c    |  244 ++++++++++++++++++++++++++++++++
 arch/powerpc/sysdev/xics/xics-common.c |    8 +-
 4 files changed, 266 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/sysdev/xics/ics-opal.c
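
Note on server numbers: the backend shifts the XICS server number left by two
bits before handing it to opal_set_xive(), and masks an interrupt by writing
priority 0xff. The userspace sketch below shows only that transformation. The
sketch_* names and the SKETCH_MASKED_PRIORITY macro are hypothetical, and
reading the two low bits as reserved for a link field is an assumption based
on the "No link for now" comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Priority value the patch writes to mask an interrupt source. */
#define SKETCH_MASKED_PRIORITY	0xff

/* Server number as passed to firmware: low two bits left clear. */
static inline uint16_t sketch_mangle_server(unsigned int server)
{
	/* Keep the sketch honest about the 16-bit firmware argument. */
	assert(server <= (0xffffu >> 2));
	return (uint16_t)(server << 2);
}

/* Inverse transformation: drop the two low bits again. */
static inline unsigned int sketch_unmangle_server(uint16_t server)
{
	return (unsigned int)server >> 2;
}
```

For example, server 5 is passed to firmware as 0x14, and unmangling 0x14
gives 5 back.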

diff --git a/arch/powerpc/include/asm/xics.h b/arch/powerpc/include/asm/xics.h
index b183a40..bd6c401 100644
--- a/arch/powerpc/include/asm/xics.h
+++ b/arch/powerpc/include/asm/xics.h
@@ -27,10 +27,18 @@
 #define MAX_NUM_PRIORITIES	3
 
 /* Native ICP */
+#ifdef CONFIG_PPC_ICP_NATIVE
 extern int icp_native_init(void);
+#else
+static inline int icp_native_init(void) { return -ENODEV; }
+#endif
 
 /* PAPR ICP */
+#ifdef CONFIG_PPC_ICP_HV
 extern int icp_hv_init(void);
+#else
+static inline int icp_hv_init(void) { return -ENODEV; }
+#endif
 
 /* ICP ops */
 struct icp_ops {
@@ -51,7 +59,18 @@ extern const struct icp_ops *icp_ops;
 extern int ics_native_init(void);
 
 /* RTAS ICS */
+#ifdef CONFIG_PPC_ICS_RTAS
 extern int ics_rtas_init(void);
+#else
+static inline int ics_rtas_init(void) { return -ENODEV; }
+#endif
+
+/* OPAL ICS */
+#ifdef CONFIG_PPC_POWERNV
+extern int ics_opal_init(void);
+#else
+static inline int ics_opal_init(void) { return -ENODEV; }
+#endif
 
 /* ICS instance, hooked up to chip_data of an irq */
 struct ics {
diff --git a/arch/powerpc/sysdev/xics/Makefile b/arch/powerpc/sysdev/xics/Makefile
index b75a605..c606aa8 100644
--- a/arch/powerpc/sysdev/xics/Makefile
+++ b/arch/powerpc/sysdev/xics/Makefile
@@ -4,3 +4,4 @@ obj-y				+= xics-common.o
 obj-$(CONFIG_PPC_ICP_NATIVE)	+= icp-native.o
 obj-$(CONFIG_PPC_ICP_HV)	+= icp-hv.o
 obj-$(CONFIG_PPC_ICS_RTAS)	+= ics-rtas.o
+obj-$(CONFIG_PPC_POWERNV)	+= ics-opal.o
diff --git a/arch/powerpc/sysdev/xics/ics-opal.c b/arch/powerpc/sysdev/xics/ics-opal.c
new file mode 100644
index 0000000..f7e8609
--- /dev/null
+++ b/arch/powerpc/sysdev/xics/ics-opal.c
@@ -0,0 +1,244 @@
+/*
+ * ICS backend for OPAL managed interrupts.
+ *
+ * Copyright 2011 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/smp.h>
+#include <linux/interrupt.h>
+#include <linux/init.h>
+#include <linux/cpu.h>
+#include <linux/of.h>
+#include <linux/spinlock.h>
+#include <linux/msi.h>
+
+#include <asm/prom.h>
+#include <asm/smp.h>
+#include <asm/machdep.h>
+#include <asm/irq.h>
+#include <asm/errno.h>
+#include <asm/xics.h>
+#include <asm/opal.h>
+#include <asm/firmware.h>
+
+static int ics_opal_mangle_server(int server)
+{
+	/* No link for now */
+	return server << 2;
+}
+
+static int ics_opal_unmangle_server(int server)
+{
+	/* No link for now */
+	return server >> 2;
+}
+
+static void ics_opal_unmask_irq(struct irq_data *d)
+{
+	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
+	int64_t rc;
+	int server;
+
+	pr_devel("ics-hal: unmask virq %d [hw 0x%x]\n", d->irq, hw_irq);
+
+	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
+		return;
+
+	server = xics_get_irq_server(d->irq, d->affinity, 0);
+	server = ics_opal_mangle_server(server);
+
+	rc = opal_set_xive(hw_irq, server, DEFAULT_PRIORITY);
+	if (rc != OPAL_SUCCESS)
+		pr_err("%s: opal_set_xive(irq=%d [hw 0x%x] server=%x)"
+		       " error %lld\n",
+		       __func__, d->irq, hw_irq, server, rc);
+}
+
+static unsigned int ics_opal_startup(struct irq_data *d)
+{
+#ifdef CONFIG_PCI_MSI
+	/*
+	 * The generic MSI code returns with the interrupt disabled on the
+	 * card, using the MSI mask bits. Firmware doesn't appear to unmask
+	 * at that level, so we do it here by hand.
+	 */
+	if (d->msi_desc)
+		unmask_msi_irq(d);
+#endif
+
+	/* unmask it */
+	ics_opal_unmask_irq(d);
+	return 0;
+}
+
+static void ics_opal_mask_real_irq(unsigned int hw_irq)
+{
+	int server = ics_opal_mangle_server(xics_default_server);
+	int64_t rc;
+
+	if (hw_irq == XICS_IPI)
+		return;
+
+	/* Have to set XIVE to 0xff to be able to remove a slot */
+	rc = opal_set_xive(hw_irq, server, 0xff);
+	if (rc != OPAL_SUCCESS)
+		pr_err("%s: opal_set_xive(0xff) irq=%u returned %lld\n",
+		       __func__, hw_irq, rc);
+}
+
+static void ics_opal_mask_irq(struct irq_data *d)
+{
+	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
+
+	pr_devel("ics-hal: mask virq %d [hw 0x%x]\n", d->irq, hw_irq);
+
+	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
+		return;
+	ics_opal_mask_real_irq(hw_irq);
+}
+
+static int ics_opal_set_affinity(struct irq_data *d,
+				 const struct cpumask *cpumask,
+				 bool force)
+{
+	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
+	int16_t server;
+	int8_t priority;
+	int64_t rc;
+	int wanted_server;
+
+	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
+		return -1;
+
+	rc = opal_get_xive(hw_irq, &server, &priority);
+	if (rc != OPAL_SUCCESS) {
+		pr_err("%s: opal_get_xive(irq=%d [hw 0x%x])"
+		       " error %lld\n",
+		       __func__, d->irq, hw_irq, rc);
+		return -1;
+	}
+
+	wanted_server = xics_get_irq_server(d->irq, cpumask, 1);
+	if (wanted_server < 0) {
+		char cpulist[128];
+		cpumask_scnprintf(cpulist, sizeof(cpulist), cpumask);
+		pr_warning("%s: No online cpus in the mask %s for irq %d\n",
+			   __func__, cpulist, d->irq);
+		return -1;
+	}
+	server = ics_opal_mangle_server(wanted_server);
+
+	pr_devel("ics-hal: set-affinity irq %d [hw 0x%x] server: 0x%x/0x%x\n",
+		 d->irq, hw_irq, wanted_server, server);
+
+	rc = opal_set_xive(hw_irq, server, priority);
+	if (rc != OPAL_SUCCESS) {
+		pr_err("%s: opal_set_xive(irq=%d [hw 0x%x] server=%x)"
+		       " error %lld\n",
+		       __func__, d->irq, hw_irq, server, rc);
+		return -1;
+	}
+	return 0;
+}
+
+static struct irq_chip ics_opal_irq_chip = {
+	.name = "OPAL ICS",
+	.irq_startup = ics_opal_startup,
+	.irq_mask = ics_opal_mask_irq,
+	.irq_unmask = ics_opal_unmask_irq,
+	.irq_eoi = NULL, /* Patched at init time */
+	.irq_set_affinity = ics_opal_set_affinity
+};
+
+static int ics_opal_map(struct ics *ics, unsigned int virq);
+static void ics_opal_mask_unknown(struct ics *ics, unsigned long vec);
+static long ics_opal_get_server(struct ics *ics, unsigned long vec);
+
+static int ics_opal_host_match(struct ics *ics, struct device_node *node)
+{
+	return 1;
+}
+
+/* Only one global & state struct ics */
+static struct ics ics_hal = {
+	.map		= ics_opal_map,
+	.mask_unknown	= ics_opal_mask_unknown,
+	.get_server	= ics_opal_get_server,
+	.host_match	= ics_opal_host_match,
+};
+
+static int ics_opal_map(struct ics *ics, unsigned int virq)
+{
+	unsigned int hw_irq = (unsigned int)virq_to_hw(virq);
+	int64_t rc;
+	int16_t server;
+	int8_t priority;
+
+	if (WARN_ON(hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS))
+		return -EINVAL;
+
+	/* Check if HAL knows about this interrupt */
+	rc = opal_get_xive(hw_irq, &server, &priority);
+	if (rc != OPAL_SUCCESS)
+		return -ENXIO;
+
+	irq_set_chip_and_handler(virq, &ics_opal_irq_chip, handle_fasteoi_irq);
+	irq_set_chip_data(virq, &ics_hal);
+
+	return 0;
+}
+
+static void ics_opal_mask_unknown(struct ics *ics, unsigned long vec)
+{
+	int64_t rc;
+	int16_t server;
+	int8_t priority;
+
+	/* Check if HAL knows about this interrupt */
+	rc = opal_get_xive(vec, &server, &priority);
+	if (rc != OPAL_SUCCESS)
+		return;
+
+	ics_opal_mask_real_irq(vec);
+}
+
+static long ics_opal_get_server(struct ics *ics, unsigned long vec)
+{
+	int64_t rc;
+	int16_t server;
+	int8_t priority;
+
+	/* Check if HAL knows about this interrupt */
+	rc = opal_get_xive(vec, &server, &priority);
+	if (rc != OPAL_SUCCESS)
+		return -1;
+	return ics_opal_unmangle_server(server);
+}
+
+int __init ics_opal_init(void)
+{
+	if (!firmware_has_feature(FW_FEATURE_OPAL))
+		return -ENODEV;
+
+	/* We need to patch our irq chip's EOI to point to the
+	 * right ICP
+	 */
+	ics_opal_irq_chip.irq_eoi = icp_ops->eoi;
+
+	/* Register ourselves */
+	xics_register_ics(&ics_hal);
+
+	pr_info("ICS OPAL backend registered\n");
+
+	return 0;
+}
diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 445c5a0..3d93a8d 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -409,14 +409,10 @@ void __init xics_init(void)
 	int rc = -1;
 
 	/* Fist locate ICP */
-#ifdef CONFIG_PPC_ICP_HV
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		rc = icp_hv_init();
-#endif
-#ifdef CONFIG_PPC_ICP_NATIVE
 	if (rc < 0)
 		rc = icp_native_init();
-#endif
 	if (rc < 0) {
 		pr_warning("XICS: Cannot find a Presentation Controller !\n");
 		return;
@@ -429,9 +425,9 @@ void __init xics_init(void)
 	xics_ipi_chip.irq_eoi = icp_ops->eoi;
 
 	/* Now locate ICS */
-#ifdef CONFIG_PPC_ICS_RTAS
 	rc = ics_rtas_init();
-#endif
+	if (rc < 0)
+		rc = ics_opal_init();
 	if (rc < 0)
 		pr_warning("XICS: Cannot find a Source Controller !\n");
 
-- 
1.7.4.1

* [PATCH 16/20] powerpc/powernv: Register and handle OPAL interrupts
From: Benjamin Herrenschmidt @ 2011-09-20  3:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

We do the minimum, which is to "pass" interrupts to HAL. This
makes the console smoother and will allow us to implement
interrupt-based completion and console handling.
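
The registration loop in the diff below reads the "opal-interrupts"
property as an array of big-endian 32-bit cells, one hardware interrupt
number per cell. What follows is a minimal user-space sketch of that
decoding step only, not the kernel code: decode_be32() and
parse_opal_interrupts() are hypothetical names, and the property bytes in
the example are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell, the job be32_to_cpup() does in the kernel. */
static uint32_t decode_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: copy at most max hwirq numbers out of a raw
 * property of len bytes. A missing property (NULL) yields zero entries,
 * matching the "irqs ? (irqlen / 4) : 0" logic in the patch.
 */
static size_t parse_opal_interrupts(const unsigned char *prop, size_t len,
				    uint32_t *out, size_t max)
{
	size_t n = prop ? len / 4 : 0;
	size_t i;

	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		out[i] = decode_be32(prop + 4 * i);
	return n;
}

/* Usage example with made-up interrupt numbers. */
static void parse_opal_interrupts_example(void)
{
	static const unsigned char prop[] = { 0, 0, 0, 0x10, 0, 0, 0x01, 0x23 };
	uint32_t irqs[4];

	assert(parse_opal_interrupts(prop, sizeof(prop), irqs, 4) == 2);
	assert(irqs[0] == 0x10 && irqs[1] == 0x123);
}
```

In the patch itself, each decoded number is then handed to
irq_create_mapping() and request_irq(); the sketch stops before that step
because those calls only exist inside the kernel.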

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/platforms/powernv/opal.c |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 7887733..5a598ca 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -14,6 +14,7 @@
 #include <linux/types.h>
 #include <linux/of.h>
 #include <linux/of_platform.h>
+#include <linux/interrupt.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
 
@@ -135,9 +136,22 @@ int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 	return written;
 }
 
+static irqreturn_t opal_interrupt(int irq, void *data)
+{
+	uint64_t events;
+
+	opal_handle_interrupt(virq_to_hw(irq), &events);
+
+	/* XXX TODO: Do something with the events */
+
+	return IRQ_HANDLED;
+}
+
 static int __init opal_init(void)
 {
 	struct device_node *np, *consoles;
+	const u32 *irqs;
+	int rc, i, irqlen;
 
 	opal_node = of_find_node_by_path("/ibm,opal");
 	if (!opal_node) {
@@ -156,6 +170,23 @@ static int __init opal_init(void)
 		of_platform_device_create(np, NULL, NULL);
 	}
 	of_node_put(consoles);
+
+	/* Find all OPAL interrupts and request them */
+	irqs = of_get_property(opal_node, "opal-interrupts", &irqlen);
+	pr_debug("opal: Found %d interrupts reserved for OPAL\n",
+		 irqs ? (irqlen / 4) : 0);
+	for (i = 0; irqs && i < (irqlen / 4); i++, irqs++) {
+		unsigned int hwirq = be32_to_cpup(irqs);
+		unsigned int irq = irq_create_mapping(NULL, hwirq);
+		if (irq == NO_IRQ) {
+			pr_warning("opal: Failed to map irq 0x%x\n", hwirq);
+			continue;
+		}
+		rc = request_irq(irq, opal_interrupt, 0, "opal", NULL);
+		if (rc)
+			pr_warning("opal: Error %d requesting irq %d"
+				   " (0x%x)\n", rc, irq, hwirq);
+	}
 	return 0;
 }
 subsys_initcall(opal_init);
-- 
1.7.4.1

* [PATCH 12/20] powerpc/powernv: Support for OPAL console
From: Benjamin Herrenschmidt @ 2011-09-20  3:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

This adds a udbg and an hvc console backend for supporting a console
using the OPAL console interfaces.

On OPAL v1 we have hvc0 mapped to whatever console the system was
configured for (network or hvsi serial port) via the service
processor.

On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
which generally corresponds to:

	hvc0 : network console (raw protocol)
	hvc1 : serial port S1 (hvsi)
	hvc2 : serial port S2 (hvsi)

Note: At this point, early debug console only works with OPAL v1
and shouldn't be enabled in a normal kernel.
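
As an illustration of the mapping described above, here is a small
user-space sketch in C. It is not part of the patch: the enum and the
function opal_console_for_hvc() are hypothetical names, and the v2 table
encodes only the "generally corresponds to" layout listed above, which a
given machine may not follow.

```c
#include <assert.h>

/* Hypothetical labels for the consoles named in the commit message. */
enum opal_console_kind {
	OPAL_CON_NONE,		/* no console at this index */
	OPAL_CON_SYSTEM,	/* v1: whatever the service processor selected */
	OPAL_CON_NETWORK_RAW,	/* v2 hvc0: network console, raw protocol */
	OPAL_CON_SERIAL_S1,	/* v2 hvc1: serial port S1, hvsi */
	OPAL_CON_SERIAL_S2	/* v2 hvc2: serial port S2, hvsi */
};

/*
 * Map (OPAL major version, hvc index) to the console it usually names.
 * Assumption: version is 1 or 2; anything else yields OPAL_CON_NONE.
 */
static enum opal_console_kind opal_console_for_hvc(int version,
						   unsigned int index)
{
	if (version == 1)
		return index == 0 ? OPAL_CON_SYSTEM : OPAL_CON_NONE;
	if (version != 2)
		return OPAL_CON_NONE;

	switch (index) {
	case 0:
		return OPAL_CON_NETWORK_RAW;
	case 1:
		return OPAL_CON_SERIAL_S1;
	case 2:
		return OPAL_CON_SERIAL_S2;
	default:
		return OPAL_CON_NONE;
	}
}

/* Usage example: the early debug vterm choices from the Kconfig help. */
static void opal_console_example(void)
{
	assert(opal_console_for_hvc(1, 0) == OPAL_CON_SYSTEM);
	assert(opal_console_for_hvc(2, 1) == OPAL_CON_SERIAL_S1);
}
```

The index here is the same number that PPC_EARLY_DEBUG_OPAL_VTERMNO
selects in the Kconfig hunk below.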

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/Kconfig.debug             |   31 +++
 arch/powerpc/include/asm/opal.h        |    5 +
 arch/powerpc/include/asm/udbg.h        |    2 +
 arch/powerpc/kernel/head_64.S          |   14 +-
 arch/powerpc/kernel/udbg.c             |    4 +
 arch/powerpc/platforms/powernv/opal.c  |   31 ++-
 arch/powerpc/platforms/powernv/setup.c |   14 +-
 drivers/tty/hvc/Kconfig                |    9 +
 drivers/tty/hvc/Makefile               |    1 +
 drivers/tty/hvc/hvc_opal.c             |  424 ++++++++++++++++++++++++++++++++
 drivers/tty/hvc/hvsi_lib.c             |    4 +-
 11 files changed, 517 insertions(+), 22 deletions(-)
 create mode 100644 drivers/tty/hvc/hvc_opal.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index cc01f1d..e61830f 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -257,8 +257,27 @@ config PPC_EARLY_DEBUG_WSP
 	depends on PPC_WSP
 	select PPC_UDBG_16550
 
+config PPC_EARLY_DEBUG_OPAL_RAW
+	bool "OPAL raw console"
+	depends on HVC_OPAL
+	help
+	  Select this to enable early debugging for the PowerNV platform
+	  using a "raw" console
+
+config PPC_EARLY_DEBUG_OPAL_HVSI
+	bool "OPAL hvsi console"
+	depends on HVC_OPAL
+	help
+	  Select this to enable early debugging for the PowerNV platform
+	  using an "hvsi" console
+
 endchoice
 
+config PPC_EARLY_DEBUG_OPAL
+	def_bool y
+	depends on PPC_EARLY_DEBUG_OPAL_RAW || PPC_EARLY_DEBUG_OPAL_HVSI
+
+
 config PPC_EARLY_DEBUG_HVSI_VTERMNO
 	hex "vterm number to use with early debug HVSI"
 	depends on PPC_EARLY_DEBUG_LPAR_HVSI
@@ -267,6 +286,18 @@ config PPC_EARLY_DEBUG_HVSI_VTERMNO
 	  You probably want 0x30000000 for your first serial port and
 	  0x30000001 for your second one
 
+config PPC_EARLY_DEBUG_OPAL_VTERMNO
+	hex "vterm number to use with OPAL early debug"
+	depends on PPC_EARLY_DEBUG_OPAL
+	default "0"
+	help
+	  This corresponds to which /dev/hvcN you want to use for early
+	  debug.
+
+	  On OPAL v1 (takeover) this should always be 0
+	  On OPAL v2, this will be 0 for network console and 1 or 2 for
+	  the machine built-in serial ports.
+
 config PPC_EARLY_DEBUG_44x_PHYSLOW
 	hex "Low 32 bits of early debug UART physical address"
 	depends on PPC_EARLY_DEBUG_44x
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 53cda41..dbeabea 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -425,6 +425,11 @@ extern void hvc_opal_init_early(void);
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
 
+extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
+extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
+
+extern void hvc_opal_init_early(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_H */
diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
index 93e05d1..2ac1753 100644
--- a/arch/powerpc/include/asm/udbg.h
+++ b/arch/powerpc/include/asm/udbg.h
@@ -54,6 +54,8 @@ extern void __init udbg_init_40x_realmode(void);
 extern void __init udbg_init_cpm(void);
 extern void __init udbg_init_usbgecko(void);
 extern void __init udbg_init_wsp(void);
+extern void __init udbg_init_debug_opal_raw(void);
+extern void __init udbg_init_debug_opal_hvsi(void);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_UDBG_H */
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index dea8191..06c7251 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -53,7 +53,8 @@
  *   2. The kernel is entered at __start
  * -or- For OPAL entry:
  *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
- *      with device-tree in gpr3
+ *      with device-tree in gpr3. We also get OPAL base in r8 and
+ *	entry in r9 for debugging purposes
  *   2. Secondary processors enter at 0x60 with PIR in gpr3
  *
  *  For iSeries:
@@ -335,6 +336,11 @@ _GLOBAL(__start_initialization_multiplatform)
 	/* Save parameters */
 	mr	r31,r3
 	mr	r30,r4
+#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL
+	/* Save OPAL entry */
+	mr	r28,r8
+	mr	r29,r9
+#endif
 
 #ifdef CONFIG_PPC_BOOK3E
 	bl	.start_initialization_book3e
@@ -711,6 +717,12 @@ _INIT_STATIC(start_here_multiplatform)
 	bdnz	3b
 4:
 
+#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL
+	/* Setup OPAL entry */
+	std	r28,0(r11);
+	std	r29,8(r11);
+#endif
+
 #ifndef CONFIG_PPC_BOOK3E
 	mfmsr	r6
 	ori	r6,r6,MSR_RI
diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index faa82c1..ea82faf 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -67,6 +67,10 @@ void __init udbg_early_init(void)
 	udbg_init_usbgecko();
 #elif defined(CONFIG_PPC_EARLY_DEBUG_WSP)
 	udbg_init_wsp();
+#elif defined(CONFIG_PPC_EARLY_DEBUG_OPAL_RAW)
+	udbg_init_debug_opal_raw();
+#elif defined(CONFIG_PPC_EARLY_DEBUG_OPAL_HVSI)
+	udbg_init_debug_opal_hvsi();
 #endif
 
 #ifdef CONFIG_PPC_EARLY_DEBUG
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 8d55107..7887733 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -67,7 +67,7 @@ int opal_get_chars(uint32_t vtermno, char *buf, int count)
 	u64 evt;
 
 	if (!opal.entry)
-		return 0;
+		return -ENODEV;
 	opal_poll_events(&evt);
 	if ((evt & OPAL_EVENT_CONSOLE_INPUT) == 0)
 		return 0;
@@ -81,31 +81,38 @@ int opal_get_chars(uint32_t vtermno, char *buf, int count)
 int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 {
 	int written = 0;
-	s64 len, rc = OPAL_BUSY;
+	s64 len, rc;
 	unsigned long flags;
 	u64 evt;
 
 	if (!opal.entry)
-		return 0;
+		return -ENODEV;
 
 	/* We want put_chars to be atomic to avoid mangling of hvsi
 	 * packets. To do that, we first test for room and return
-	 * -EAGAIN if there isn't enough
+	 * -EAGAIN if there isn't enough.
+	 *
+	 * Unfortunately, opal_console_write_buffer_space() doesn't
+	 * appear to work on opal v1, so we just assume there is
+	 * enough room and be done with it
 	 */
 	spin_lock_irqsave(&opal_write_lock, flags);
-	rc = opal_console_write_buffer_space(vtermno, &len);
-	if (rc || len < total_len) {
-		spin_unlock_irqrestore(&opal_write_lock, flags);
-		/* Closed -> drop characters */
-		if (rc)
-			return total_len;
-		opal_poll_events(&evt);
-		return -EAGAIN;
+	if (firmware_has_feature(FW_FEATURE_OPALv2)) {
+		rc = opal_console_write_buffer_space(vtermno, &len);
+		if (rc || len < total_len) {
+			spin_unlock_irqrestore(&opal_write_lock, flags);
+			/* Closed -> drop characters */
+			if (rc)
+				return total_len;
+			opal_poll_events(&evt);
+			return -EAGAIN;
+		}
 	}
 
 	/* We still try to handle partial completions, though they
 	 * should no longer happen.
 	 */
+	rc = OPAL_BUSY;
 	while(total_len > 0 && (rc == OPAL_BUSY ||
 				rc == OPAL_BUSY_EVENT || rc == OPAL_SUCCESS)) {
 		len = total_len;
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index b6e5ff8..07ba1ec 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -29,17 +29,12 @@
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/xics.h>
+#include <asm/opal.h>
 
 #include "powernv.h"
 
 static void __init pnv_setup_arch(void)
 {
-	/* Force console to hvc for now until we have sorted out the
-	 * real console situation for the platform. This will make
-	 * hvc_udbg work at least.
-	 */
-	add_preferred_console("hvc", 0, NULL);
-
 	/* Initialize SMP */
 	pnv_smp_init();
 
@@ -55,7 +50,12 @@ static void __init pnv_setup_arch(void)
 
 static void __init pnv_init_early(void)
 {
-	/* XXX IOMMU */
+#ifdef CONFIG_HVC_OPAL
+	if (firmware_has_feature(FW_FEATURE_OPAL))
+		hvc_opal_init_early();
+	else
+#endif
+		add_preferred_console("hvc", 0, NULL);
 }
 
 static void __init pnv_init_IRQ(void)
diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index e371753..4222035 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -34,6 +34,15 @@ config HVC_ISERIES
 	help
 	  iSeries machines support a hypervisor virtual console.
 
+config HVC_OPAL
+	bool "OPAL Console support"
+	depends on PPC_POWERNV
+	select HVC_DRIVER
+	select HVC_IRQ
+	default y
+	help
+	  PowerNV machines running under OPAL need this driver to get a console
+
 config HVC_RTAS
 	bool "IBM RTAS Console support"
 	depends on PPC_RTAS
diff --git a/drivers/tty/hvc/Makefile b/drivers/tty/hvc/Makefile
index e292053..89abf40b 100644
--- a/drivers/tty/hvc/Makefile
+++ b/drivers/tty/hvc/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_HVC_CONSOLE)	+= hvc_vio.o hvsi_lib.o
+obj-$(CONFIG_HVC_OPAL)		+= hvc_opal.o hvsi_lib.o
 obj-$(CONFIG_HVC_OLD_HVSI)	+= hvsi.o
 obj-$(CONFIG_HVC_ISERIES)	+= hvc_iseries.o
 obj-$(CONFIG_HVC_RTAS)		+= hvc_rtas.o
diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c
new file mode 100644
index 0000000..7b38512
--- /dev/null
+++ b/drivers/tty/hvc/hvc_opal.c
@@ -0,0 +1,424 @@
+/*
+ * opal driver interface to hvc_console.c
+ *
+ * Copyright 2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ */
+
+#undef DEBUG
+
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/console.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+
+#include <asm/hvconsole.h>
+#include <asm/prom.h>
+#include <asm/firmware.h>
+#include <asm/hvsi.h>
+#include <asm/udbg.h>
+#include <asm/opal.h>
+
+#include "hvc_console.h"
+
+static const char hvc_opal_name[] = "hvc_opal";
+
+static struct of_device_id hvc_opal_match[] __devinitdata = {
+	{ .name = "serial", .compatible = "ibm,opal-console-raw" },
+	{ .name = "serial", .compatible = "ibm,opal-console-hvsi" },
+	{ },
+};
+
+typedef enum hv_protocol {
+	HV_PROTOCOL_RAW,
+	HV_PROTOCOL_HVSI
+} hv_protocol_t;
+
+struct hvc_opal_priv {
+	hv_protocol_t		proto;	/* Raw data or HVSI packets */
+	struct hvsi_priv	hvsi;	/* HVSI specific data */
+};
+static struct hvc_opal_priv *hvc_opal_privs[MAX_NR_HVC_CONSOLES];
+
+/* For early boot console */
+static struct hvc_opal_priv hvc_opal_boot_priv;
+static u32 hvc_opal_boot_termno;
+
+static const struct hv_ops hvc_opal_raw_ops = {
+	.get_chars = opal_get_chars,
+	.put_chars = opal_put_chars,
+	.notifier_add = notifier_add_irq,
+	.notifier_del = notifier_del_irq,
+	.notifier_hangup = notifier_hangup_irq,
+};
+
+static int hvc_opal_hvsi_get_chars(uint32_t vtermno, char *buf, int count)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[vtermno];
+
+	if (WARN_ON(!pv))
+		return -ENODEV;
+
+	return hvsilib_get_chars(&pv->hvsi, buf, count);
+}
+
+static int hvc_opal_hvsi_put_chars(uint32_t vtermno, const char *buf, int count)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[vtermno];
+
+	if (WARN_ON(!pv))
+		return -ENODEV;
+
+	return hvsilib_put_chars(&pv->hvsi, buf, count);
+}
+
+static int hvc_opal_hvsi_open(struct hvc_struct *hp, int data)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
+	int rc;
+
+	pr_devel("HVSI@%x: do open !\n", hp->vtermno);
+
+	rc = notifier_add_irq(hp, data);
+	if (rc)
+		return rc;
+
+	return hvsilib_open(&pv->hvsi, hp);
+}
+
+static void hvc_opal_hvsi_close(struct hvc_struct *hp, int data)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
+
+	pr_devel("HVSI@%x: do close !\n", hp->vtermno);
+
+	hvsilib_close(&pv->hvsi, hp);
+
+	notifier_del_irq(hp, data);
+}
+
+void hvc_opal_hvsi_hangup(struct hvc_struct *hp, int data)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
+
+	pr_devel("HVSI@%x: do hangup !\n", hp->vtermno);
+
+	hvsilib_close(&pv->hvsi, hp);
+
+	notifier_hangup_irq(hp, data);
+}
+
+static int hvc_opal_hvsi_tiocmget(struct hvc_struct *hp)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
+
+	if (!pv)
+		return -EINVAL;
+	return pv->hvsi.mctrl;
+}
+
+static int hvc_opal_hvsi_tiocmset(struct hvc_struct *hp, unsigned int set,
+				unsigned int clear)
+{
+	struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
+
+	pr_devel("HVSI@%x: Set modem control, set=%x,clr=%x\n",
+		 hp->vtermno, set, clear);
+
+	if (set & TIOCM_DTR)
+		hvsilib_write_mctrl(&pv->hvsi, 1);
+	else if (clear & TIOCM_DTR)
+		hvsilib_write_mctrl(&pv->hvsi, 0);
+
+	return 0;
+}
+
+static const struct hv_ops hvc_opal_hvsi_ops = {
+	.get_chars = hvc_opal_hvsi_get_chars,
+	.put_chars = hvc_opal_hvsi_put_chars,
+	.notifier_add = hvc_opal_hvsi_open,
+	.notifier_del = hvc_opal_hvsi_close,
+	.notifier_hangup = hvc_opal_hvsi_hangup,
+	.tiocmget = hvc_opal_hvsi_tiocmget,
+	.tiocmset = hvc_opal_hvsi_tiocmset,
+};
+
+static int __devinit hvc_opal_probe(struct platform_device *dev)
+{
+	const struct hv_ops *ops;
+	struct hvc_struct *hp;
+	struct hvc_opal_priv *pv;
+	hv_protocol_t proto;
+	unsigned int termno, boot = 0;
+	const __be32 *reg;
+
+	if (of_device_is_compatible(dev->dev.of_node, "ibm,opal-console-raw")) {
+		proto = HV_PROTOCOL_RAW;
+		ops = &hvc_opal_raw_ops;
+	} else if (of_device_is_compatible(dev->dev.of_node,
+					   "ibm,opal-console-hvsi")) {
+		proto = HV_PROTOCOL_HVSI;
+		ops = &hvc_opal_hvsi_ops;
+	} else {
+		pr_err("hvc_opal: Unknown protocol for %s\n",
+		       dev->dev.of_node->full_name);
+		return -ENXIO;
+	}
+
+	reg = of_get_property(dev->dev.of_node, "reg", NULL);
+	termno = reg ? be32_to_cpup(reg) : 0;
+
+	/* Is it our boot one ? */
+	if (hvc_opal_privs[termno] == &hvc_opal_boot_priv) {
+		pv = hvc_opal_privs[termno];
+		boot = 1;
+	} else if (hvc_opal_privs[termno] == NULL) {
+		pv = kzalloc(sizeof(struct hvc_opal_priv), GFP_KERNEL);
+		if (!pv)
+			return -ENOMEM;
+		pv->proto = proto;
+		hvc_opal_privs[termno] = pv;
+		if (proto == HV_PROTOCOL_HVSI)
+			hvsilib_init(&pv->hvsi, opal_get_chars, opal_put_chars,
+				     termno, 0);
+
+		/* Instantiate now to establish a mapping index==vtermno */
+		hvc_instantiate(termno, termno, ops);
+	} else {
+		pr_err("hvc_opal: Device %s has duplicate terminal number #%d\n",
+		       dev->dev.of_node->full_name, termno);
+		return -ENXIO;
+	}
+
+	pr_info("hvc%d: %s protocol on %s%s\n", termno,
+		proto == HV_PROTOCOL_RAW ? "raw" : "hvsi",
+		dev->dev.of_node->full_name,
+		boot ? " (boot console)" : "");
+
+	/* We don't do IRQ yet */
+	hp = hvc_alloc(termno, 0, ops, MAX_VIO_PUT_CHARS);
+	if (IS_ERR(hp))
+		return PTR_ERR(hp);
+	dev_set_drvdata(&dev->dev, hp);
+
+	return 0;
+}
+
+static int __devexit hvc_opal_remove(struct platform_device *dev)
+{
+	struct hvc_struct *hp = dev_get_drvdata(&dev->dev);
+	int rc, termno;
+
+	termno = hp->vtermno;
+	rc = hvc_remove(hp);
+	if (rc == 0) {
+		if (hvc_opal_privs[termno] != &hvc_opal_boot_priv)
+			kfree(hvc_opal_privs[termno]);
+		hvc_opal_privs[termno] = NULL;
+	}
+	return rc;
+}
+
+static struct platform_driver hvc_opal_driver = {
+	.probe		= hvc_opal_probe,
+	.remove		= __devexit_p(hvc_opal_remove),
+	.driver		= {
+		.name	= hvc_opal_name,
+		.owner	= THIS_MODULE,
+		.of_match_table	= hvc_opal_match,
+	}
+};
+
+static int __init hvc_opal_init(void)
+{
+	if (!firmware_has_feature(FW_FEATURE_OPAL))
+		return -ENODEV;
+
+	/* Register as a platform driver to receive probe callbacks */
+	return platform_driver_register(&hvc_opal_driver);
+}
+module_init(hvc_opal_init);
+
+static void __exit hvc_opal_exit(void)
+{
+	platform_driver_unregister(&hvc_opal_driver);
+}
+module_exit(hvc_opal_exit);
+
+static void udbg_opal_putc(char c)
+{
+	unsigned int termno = hvc_opal_boot_termno;
+	int count = -1;
+
+	if (c == '\n')
+		udbg_opal_putc('\r');
+
+	do {
+		switch(hvc_opal_boot_priv.proto) {
+		case HV_PROTOCOL_RAW:
+			count = opal_put_chars(termno, &c, 1);
+			break;
+		case HV_PROTOCOL_HVSI:
+			count = hvc_opal_hvsi_put_chars(termno, &c, 1);
+			break;
+		}
+	} while(count == 0 || count == -EAGAIN);
+}
+
+static int udbg_opal_getc_poll(void)
+{
+	unsigned int termno = hvc_opal_boot_termno;
+	int rc = 0;
+	char c;
+
+	switch(hvc_opal_boot_priv.proto) {
+	case HV_PROTOCOL_RAW:
+		rc = opal_get_chars(termno, &c, 1);
+		break;
+	case HV_PROTOCOL_HVSI:
+		rc = hvc_opal_hvsi_get_chars(termno, &c, 1);
+		break;
+	}
+	if (rc <= 0)
+		return -1;
+	return c;
+}
+
+static int udbg_opal_getc(void)
+{
+	int ch;
+	for (;;) {
+		ch = udbg_opal_getc_poll();
+		if (ch == -1) {
+			/* This shouldn't be needed...but... */
+			volatile unsigned long delay;
+			for (delay=0; delay < 2000000; delay++)
+				;
+		} else {
+			return ch;
+		}
+	}
+}
+
+static void udbg_init_opal_common(void)
+{
+	udbg_putc = udbg_opal_putc;
+	udbg_getc = udbg_opal_getc;
+	udbg_getc_poll = udbg_opal_getc_poll;
+	tb_ticks_per_usec = 0x200; /* Make udelay not suck */
+}
+
+void __init hvc_opal_init_early(void)
+{
+	struct device_node *stdout_node = NULL;
+	const u32 *termno;
+	const char *name = NULL;
+	const struct hv_ops *ops;
+	u32 index;
+
+	/* find the boot console from /chosen/stdout */
+	if (of_chosen)
+		name = of_get_property(of_chosen, "linux,stdout-path", NULL);
+	if (name) {
+		stdout_node = of_find_node_by_path(name);
+		if (!stdout_node) {
+			pr_err("hvc_opal: Failed to locate default console!\n");
+			return;
+		}
+	} else {
+		struct device_node *opal, *np;
+
+		/* Current OPAL takeover doesn't provide the stdout
+		 * path, so we hard wire it
+		 */
+		opal = of_find_node_by_path("/ibm,opal/consoles");
+		if (opal)
+			pr_devel("hvc_opal: Found consoles in new location\n");
+		if (!opal) {
+			opal = of_find_node_by_path("/ibm,opal");
+			if (opal)
+				pr_devel("hvc_opal: "
+					 "Found consoles in old location\n");
+		}
+		if (!opal)
+			return;
+		for_each_child_of_node(opal, np) {
+			if (!strcmp(np->name, "serial")) {
+				stdout_node = np;
+				break;
+			}
+		}
+		of_node_put(opal);
+	}
+	if (!stdout_node)
+		return;
+	termno = of_get_property(stdout_node, "reg", NULL);
+	index = termno ? *termno : 0;
+	if (index >= MAX_NR_HVC_CONSOLES)
+		return;
+	hvc_opal_privs[index] = &hvc_opal_boot_priv;
+
+	/* Check the protocol */
+	if (of_device_is_compatible(stdout_node, "ibm,opal-console-raw")) {
+		hvc_opal_boot_priv.proto = HV_PROTOCOL_RAW;
+		ops = &hvc_opal_raw_ops;
+		pr_devel("hvc_opal: Found RAW console\n");
+	}
+	else if (of_device_is_compatible(stdout_node,"ibm,opal-console-hvsi")) {
+		hvc_opal_boot_priv.proto = HV_PROTOCOL_HVSI;
+		ops = &hvc_opal_hvsi_ops;
+		hvsilib_init(&hvc_opal_boot_priv.hvsi, opal_get_chars,
+			     opal_put_chars, index, 1);
+		/* HVSI, perform the handshake now */
+		hvsilib_establish(&hvc_opal_boot_priv.hvsi);
+		pr_devel("hvc_opal: Found HVSI console\n");
+	} else
+		goto out;
+	hvc_opal_boot_termno = index;
+	udbg_init_opal_common();
+	add_preferred_console("hvc", index, NULL);
+	hvc_instantiate(index, index, ops);
+out:
+	of_node_put(stdout_node);
+}
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL_RAW
+void __init udbg_init_debug_opal_raw(void)
+{
+	u32 index = CONFIG_PPC_EARLY_DEBUG_OPAL_VTERMNO;
+	hvc_opal_privs[index] = &hvc_opal_boot_priv;
+	hvc_opal_boot_priv.proto = HV_PROTOCOL_RAW;
+	hvc_opal_boot_termno = index;
+	udbg_init_opal_common();
+}
+#endif /* CONFIG_PPC_EARLY_DEBUG_OPAL_RAW */
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_OPAL_HVSI
+void __init udbg_init_debug_opal_hvsi(void)
+{
+	u32 index = CONFIG_PPC_EARLY_DEBUG_OPAL_VTERMNO;
+	hvc_opal_privs[index] = &hvc_opal_boot_priv;
+	hvc_opal_boot_termno = index;
+	udbg_init_opal_common();
+	hvsilib_init(&hvc_opal_boot_priv.hvsi, opal_get_chars, opal_put_chars,
+		     index, 1);
+	hvsilib_establish(&hvc_opal_boot_priv.hvsi);
+}
+#endif /* CONFIG_PPC_EARLY_DEBUG_OPAL_HVSI */
diff --git a/drivers/tty/hvc/hvsi_lib.c b/drivers/tty/hvc/hvsi_lib.c
index bd9b098..6f4dd83 100644
--- a/drivers/tty/hvc/hvsi_lib.c
+++ b/drivers/tty/hvc/hvsi_lib.c
@@ -183,7 +183,7 @@ int hvsilib_get_chars(struct hvsi_priv *pv, char *buf, int count)
 	unsigned int tries, read = 0;
 
 	if (WARN_ON(!pv))
-		return 0;
+		return -ENXIO;
 
 	/* If we aren't open, don't do anything in order to avoid races
 	 * with connection establishment. The hvc core will call this
@@ -234,7 +234,7 @@ int hvsilib_put_chars(struct hvsi_priv *pv, const char *buf, int count)
 	int rc, adjcount = min(count, HVSI_MAX_OUTGOING_DATA);
 
 	if (WARN_ON(!pv))
-		return 0;
+		return -ENODEV;
 
 	dp.hdr.type = VS_DATA_PACKET_HEADER;
 	dp.hdr.len = adjcount + sizeof(struct hvsi_header);
-- 
1.7.4.1

* [PATCH 19/20] powerpc/powernv: Implement MSI support for p5ioc2 PCIe
From: Benjamin Herrenschmidt @ 2011-09-20  3:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1316490307-28030-1-git-send-email-benh@kernel.crashing.org>

This implements support for MSIs on p5ioc2 PHBs. We only support
MSIs on the PCIe PHBs, not the PCI-X ones, as the latter haven't been
properly verified in HW.
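
The MSI numbers handed out by the code below come from a per-PHB bitmap
covering the range given by "ibm,opal-msi-ranges" (base and count). The
following is a rough user-space model of that allocator, assuming a byte
array instead of a kernel bitmap and no locking; struct msi_pool,
msi_get(), msi_put() and MSI_MAX are hypothetical names, not kernel API.
As in the patch, a return value of 0 from the allocator means "no free
MSI".

```c
#include <assert.h>

#define MSI_MAX 64	/* hypothetical upper bound for this sketch */

struct msi_pool {
	unsigned int base;	/* first hardware IRQ number of the range */
	unsigned int count;	/* number of MSIs in the range, <= MSI_MAX */
	unsigned int next;	/* search hint, like phb->msi_next */
	unsigned char used[MSI_MAX];
};

/* Find a free slot starting at the hint, wrapping once; 0 means none left. */
static unsigned int msi_get(struct msi_pool *p)
{
	unsigned int i, id;

	assert(p->count <= MSI_MAX);
	for (i = 0; i < p->count; i++) {
		id = (p->next + i) % p->count;
		if (!p->used[id]) {
			p->used[id] = 1;
			return p->base + id;
		}
	}
	return 0;
}

/* Release a hardware IRQ; reject numbers outside the pool's range. */
static int msi_put(struct msi_pool *p, unsigned int hwirq)
{
	if (hwirq < p->base || hwirq >= p->base + p->count)
		return -1;
	p->used[hwirq - p->base] = 0;
	return 0;
}

/* Usage example with a made-up two-entry range starting at 0x100. */
static void msi_pool_example(void)
{
	struct msi_pool p = { .base = 0x100, .count = 2, .next = 0 };

	assert(msi_get(&p) == 0x100);
	assert(msi_get(&p) == 0x101);
	assert(msi_get(&p) == 0);	/* pool exhausted */
}
```

One consequence of using 0 as the failure value is that a range whose
base is 0 could not hand out its first entry unambiguously; the sketch
inherits that convention from the patch rather than changing it.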

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   49 ++++++++++++
 arch/powerpc/platforms/powernv/pci.c        |  109 +++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h        |   10 +++
 3 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index afabc2b..5209160 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -19,6 +19,7 @@
 #include <linux/bootmem.h>
 #include <linux/irq.h>
 #include <linux/io.h>
+#include <linux/msi.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -39,6 +40,51 @@
  */
 #define P5IOC2_TCE_MEMORY	0x01000000
 
+#ifdef CONFIG_PCI_MSI
+static int pnv_pci_p5ioc2_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
+				    unsigned int hwirq, unsigned int is_64,
+				    struct msi_msg *msg)
+{
+	if (WARN_ON(!is_64))
+		return -ENXIO;
+	msg->data = hwirq - phb->msi_base;
+	msg->address_hi = 0x10000000;
+	msg->address_lo = 0;
+
+	return 0;
+}
+
+static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
+{
+	unsigned int bmap_size;
+	const __be32 *prop = of_get_property(phb->hose->dn,
+					     "ibm,opal-msi-ranges", NULL);
+	if (!prop)
+		return;
+
+	/* Don't do MSIs on p5ioc2 PCI-X as they are not properly
+	 * verified in HW
+	 */
+	if (of_device_is_compatible(phb->hose->dn, "ibm,p5ioc2-pcix"))
+		return;
+	phb->msi_base = be32_to_cpup(prop);
+	phb->msi_count = be32_to_cpup(prop + 1);
+	bmap_size = BITS_TO_LONGS(phb->msi_count) * sizeof(unsigned long);
+	phb->msi_map = zalloc_maybe_bootmem(bmap_size, GFP_KERNEL);
+	if (!phb->msi_map) {
+		pr_err("PCI %d: Failed to allocate MSI bitmap !\n",
+		       phb->hose->global_number);
+		return;
+	}
+	phb->msi_setup = pnv_pci_p5ioc2_msi_setup;
+	phb->msi32_support = 0;
+	pr_info(" Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
+		phb->msi_count, phb->msi_base);
+}
+#else
+static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
+#endif /* CONFIG_PCI_MSI */
+
 static void __devinit pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 						   struct pci_dev *pdev)
 {
@@ -117,6 +163,9 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
 
 	phb->hose->ops = &pnv_pci_ops;
 
+	/* Setup MSI support */
+	pnv_pci_init_p5ioc2_msis(phb);
+
 	/* Setup TCEs */
 	phb->dma_dev_setup = pnv_pci_p5ioc2_dma_dev_setup;
 	pnv_pci_setup_iommu_table(&phb->p5ioc2.iommu_table,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b512489..d3df7fd 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -19,6 +19,7 @@
 #include <linux/bootmem.h>
 #include <linux/irq.h>
 #include <linux/io.h>
+#include <linux/msi.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -38,6 +39,108 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+#ifdef CONFIG_PCI_MSI
+static int pnv_msi_check_device(struct pci_dev* pdev, int nvec, int type)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return (phb && phb->msi_map) ? 0 : -ENODEV;
+}
+
+static unsigned int pnv_get_one_msi(struct pnv_phb *phb)
+{
+	unsigned int id;
+
+	spin_lock(&phb->lock);
+	id = find_next_zero_bit(phb->msi_map, phb->msi_count, phb->msi_next);
+	if (id >= phb->msi_count && phb->msi_next)
+		id = find_next_zero_bit(phb->msi_map, phb->msi_count, 0);
+	if (id >= phb->msi_count) {
+		spin_unlock(&phb->lock);
+		return 0;
+	}
+	__set_bit(id, phb->msi_map);
+	spin_unlock(&phb->lock);
+	return id + phb->msi_base;
+}
+
+static void pnv_put_msi(struct pnv_phb *phb, unsigned int hwirq)
+{
+	unsigned int id;
+
+	if (WARN_ON(hwirq < phb->msi_base ||
+		    hwirq >= (phb->msi_base + phb->msi_count)))
+		return;
+	id = hwirq - phb->msi_base;
+	spin_lock(&phb->lock);
+	__clear_bit(id, phb->msi_map);
+	spin_unlock(&phb->lock);
+}
+
+static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+	struct msi_msg msg;
+	unsigned int hwirq, virq;
+	int rc;
+
+	if (WARN_ON(!phb))
+		return -ENODEV;
+
+	list_for_each_entry(entry, &pdev->msi_list, list) {
+		if (!entry->msi_attrib.is_64 && !phb->msi32_support) {
+			pr_warn("%s: Supports only 64-bit MSIs\n",
+				pci_name(pdev));
+			return -ENXIO;
+		}
+		hwirq = pnv_get_one_msi(phb);
+		if (!hwirq) {
+			pr_warn("%s: Failed to find a free MSI\n",
+				pci_name(pdev));
+			return -ENOSPC;
+		}
+		virq = irq_create_mapping(NULL, hwirq);
+		if (virq == NO_IRQ) {
+			pr_warn("%s: Failed to map MSI to linux irq\n",
+				pci_name(pdev));
+			pnv_put_msi(phb, hwirq);
+			return -ENOMEM;
+		}
+		rc = phb->msi_setup(phb, pdev, hwirq, entry->msi_attrib.is_64,
+				    &msg);
+		if (rc) {
+			pr_warn("%s: Failed to setup MSI\n", pci_name(pdev));
+			irq_dispose_mapping(virq);
+			pnv_put_msi(phb, hwirq);
+			return rc;
+		}
+		irq_set_msi_desc(virq, entry);
+		write_msi_msg(virq, &msg);
+	}
+	return 0;
+}
+
+static void pnv_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+
+	if (WARN_ON(!phb))
+		return;
+
+	list_for_each_entry(entry, &pdev->msi_list, list) {
+		if (entry->irq == NO_IRQ)
+			continue;
+		irq_set_msi_desc(entry->irq, NULL);
+		pnv_put_msi(phb, virq_to_hw(entry->irq));
+		irq_dispose_mapping(entry->irq);
+	}
+}
+#endif /* CONFIG_PCI_MSI */
 
 static void pnv_pci_config_check_eeh(struct pnv_phb *phb, struct pci_bus *bus,
 				     u32 bdfn)
@@ -283,4 +386,10 @@ void __init pnv_pci_init(void)
 	ppc_md.tce_free = pnv_tce_free;
 	set_pci_dma_ops(&dma_iommu_ops);
 
+	/* Configure MSIs */
+#ifdef CONFIG_PCI_MSI
+	ppc_md.msi_check_device = pnv_msi_check_device;
+	ppc_md.setup_msi_irqs = pnv_setup_msi_irqs;
+	ppc_md.teardown_msi_irqs = pnv_teardown_msi_irqs;
+#endif
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index b8be721..a468c9b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -16,6 +16,16 @@ struct pnv_phb {
 	void __iomem		*regs;
 	spinlock_t		lock;
 
+#ifdef CONFIG_PCI_MSI
+	unsigned long		*msi_map;
+	unsigned int		msi_base;
+	unsigned int		msi_count;
+	unsigned int		msi_next;
+	unsigned int		msi32_support;
+#endif
+	int (*msi_setup)(struct pnv_phb *phb, struct pci_dev *dev,
+			 unsigned int hwirq, unsigned int is_64,
+			 struct msi_msg *msg);
 	void (*dma_dev_setup)(struct pnv_phb *phb, struct pci_dev *pdev);
 	void (*fixup_phb)(struct pci_controller *hose);
 	u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
-- 
1.7.4.1
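
For readers following the allocator logic in pnv_get_one_msi() and
pnv_put_msi() above, here is a minimal user-space sketch of the same
wrap-around bitmap search. It is an illustration only, not part of the
patch: struct msi_pool, msi_get(), msi_put(), find_zero_from() and the
fixed MSI_POOL_WORDS capacity are hypothetical names invented for this
example, and the phb->lock spinlock taken by the kernel code is left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed capacity for this sketch; the patch sizes its
 * bitmap at runtime from the "ibm,opal-msi-ranges" property. */
#define MSI_POOL_WORDS 4
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)

struct msi_pool {
	unsigned long map[MSI_POOL_WORDS]; /* one bit per MSI, 1 = in use */
	unsigned int base;   /* first hardware IRQ number of the range */
	unsigned int count;  /* number of MSIs in the range */
	unsigned int next;   /* search start hint */
};

/* Return the first clear bit at or after 'start', or 'size' if none. */
unsigned int find_zero_from(const unsigned long *map, unsigned int size,
			    unsigned int start)
{
	for (unsigned int i = start; i < size; i++)
		if (!(map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD))))
			return i;
	return size;
}

/* Allocate one MSI. Returns the hardware IRQ number, or 0 on failure.
 * Because 0 doubles as the failure value, this assumes base > 0. */
unsigned int msi_get(struct msi_pool *p)
{
	unsigned int id = find_zero_from(p->map, p->count, p->next);

	/* Nothing free after the hint: wrap around and retry from 0. */
	if (id >= p->count && p->next)
		id = find_zero_from(p->map, p->count, 0);
	if (id >= p->count)
		return 0;
	p->map[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
	return id + p->base;
}

/* Release one MSI. Returns false if hwirq lies outside the range. */
bool msi_put(struct msi_pool *p, unsigned int hwirq)
{
	unsigned int id;

	if (hwirq < p->base || hwirq >= p->base + p->count)
		return false;
	id = hwirq - p->base;
	p->map[id / BITS_PER_WORD] &= ~(1UL << (id % BITS_PER_WORD));
	return true;
}
```

With a pool of base 16 and count 3, successive msi_get() calls hand out
16, 17 and 18 and then report failure; releasing 17 makes it available
again. Returning 0 on exhaustion mirrors the patch and only works
because the MSI range is assumed not to start at hardware IRQ 0.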
