LinuxPPC-Dev Archive on lore.kernel.org
* Re: While(1) in kernel space
From: Chris Friesen @ 2008-07-08 14:44 UTC (permalink / raw)
  To: Paolo Doz; +Cc: linuxppc-dev
In-Reply-To: <5468b7f30807080055p69571979i9a565653e523c2c0@mail.gmail.com>

Paolo Doz wrote:
> Hi folks,
> I'm developing a custom SPI driver (char device) on a MPC5200b, the 
> microcontroller linked as slave implements a protocol that must follow 
> strict timing constraints. I need to receive and send messages every 
> 6msec.

What are your timing requirements?  How much over/under 6ms can the 
protocol handle?

Kernel threads might work, but then you're at the mercy of the 
scheduler.  You'd probably be better off using a timer or softirq.

If the latency requirements are really strict, your best bet would 
probably be to use the -rt patches for the kernel.  That requires 
building a custom kernel though.
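
Whichever mechanism is used, the bookkeeping for a 6 ms period looks much the same. The following is a minimal userspace sketch, not kernel code: each deadline is derived from the previous deadline rather than from "now", so wake-up latency does not accumulate as drift. The names `advance_deadline` and `PERIOD_NS` are illustrative assumptions and not part of any kernel API.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS    6000000L   /* 6 ms, the period from the original question */

/* Advance an absolute deadline by one period, normalising tv_nsec. */
static void advance_deadline(struct timespec *t, long period_ns)
{
	t->tv_nsec += period_ns;
	while (t->tv_nsec >= NSEC_PER_SEC) {
		t->tv_nsec -= NSEC_PER_SEC;
		t->tv_sec++;
	}
}
```

In userspace the resulting deadline would typically be handed to `clock_nanosleep()` with `TIMER_ABSTIME`; in the kernel the analogous pattern is a periodic timer re-armed relative to its previous expiry.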

Chris

^ permalink raw reply

* RE: booting an ML405
From: John Linn @ 2008-07-08 14:58 UTC (permalink / raw)
  To: Lorenzo T. Flores, linuxppc-embedded
In-Reply-To: <486D774A.7000102@alphagolf.com>

Hi Lorenzo,

I'm assuming you're trying to use the reference design bit stream for
the ML405 that we have had out on the http://git.xilinx.com site?

Since the bootstrap loader is working, the UART appears to be OK.
Assuming you have the kernel configuration right with the 8250 driver
and the console, it sounds like something else must be wrong and you're
headed in the right direction with using the __log_buf.

The __log_buf should tell you if the kernel is really hung, or if it
booted and you just don't have a console working.

The ml405_defconfig sets up the kernel configuration so it should work
on the board just building it for the reference design.

One thing I notice is the available ram looks wrong to me for the ML405
based on my notes.

> avail ram:     005B0000 80000000

I had a problem once with the kernel and I found the DDR_SIZE in
xparameters.h (arch/ppc/platforms/40x/xparameters) was not right, and
this does hang the kernel in my experience.

My old kernel output shows

> avail ram:     00519000 04000000
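
To put numbers on the difference, here is a small sketch that converts the second column to MiB, assuming that column is the end address of available RAM (the helper name `ram_end_mib` is made up for illustration). 0x80000000 works out to 2048 MiB, while 0x04000000 is 64 MiB.

```c
#include <stdint.h>

/* Convert the end address printed by the bootstrap loader to MiB. */
static uint32_t ram_end_mib(uint32_t end_addr)
{
	return end_addr >> 20;   /* 1 MiB = 2^20 bytes */
}
```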

Hope that might help,
John


> -----Original Message-----
> From: linuxppc-embedded-bounces+john.linn=xilinx.com@ozlabs.org
> [mailto:linuxppc-embedded-bounces+john.linn=xilinx.com@ozlabs.org]
> On Behalf Of Lorenzo T. Flores
> Sent: Thursday, July 03, 2008 7:05 PM
> To: linuxppc-embedded@ozlabs.org
> Subject: booting an ML405
>
> Hey all,
>
> I did a little preliminary poking through the thread archives, but
> didn't turn anything up. If anyone could just point me in the right
> direction as far as troubleshooting, that would be great!
>
> I'm trying to compile the Xilinx patched kernel tree (v2.6.25-rc9) and
> run it on an ML405
>
> So far, I get to the following point in the boot sequence:
>
> loaded at:     00400000 005AF5A0
> board data at: 005AD524 005AD5A0
> relocated to:  00405058 004050D4
> zimage at:     00405E90 0051D6CC
> initrd at:     0051E000 005ACC0D
> avail ram:     005B0000 80000000
>
> Linux/PPC load: console=ttyS0,9600 ip=on root=/dev/ram rw
> Uncompressing Linux...done.
> Now booting the kernel
>
> The system hangs after it says "Now booting the kernel."
>
> Once again, any input would be appreciated.
>
> thank you,
> Lorenzo
> _______________________________________________
> Linuxppc-embedded mailing list
> Linuxppc-embedded@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded



^ permalink raw reply

* [PATCH v2] ibm_newemac: Add MII mode support to the EMAC RGMII bridge.
From: Grant Erickson @ 2008-07-08 15:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sr, jgarzik
In-Reply-To: <C497F939.104DB%gerickson@nuovations.com>

This patch adds support to the RGMII handler in the EMAC driver for
the MII PHY mode such that device tree entries of the form `phy-mode = "mii";'
are recognized and handled appropriately.

While logically, in software, "gmii" and "mii" modes are the same,
they are wired differently, so it makes sense to allow DTS authors to
specify each explicitly.

Signed-off-by: Grant Erickson <gerickson@nuovations.com>
---
 drivers/net/ibm_newemac/rgmii.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)
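
For readers who want to check the resulting register values, the mode-to-mask mapping in the diff below can be modelled in userspace. This is only a sketch: the field values are the ones in the patch (with an unsigned suffix added), but the `demo_*` enum and function are hypothetical stand-ins for the driver's `PHY_MODE_*` constants and `rgmii_mode_mask()`, and RTBI is left out because its value is not shown here.

```c
#include <stdint.h>

/* Field values copied from the patch; each input owns a 4-bit field. */
#define RGMII_FER_RGMII(idx)	(0x5u << ((idx) * 4))
#define RGMII_FER_TBI(idx)	(0x6u << ((idx) * 4))
#define RGMII_FER_GMII(idx)	(0x7u << ((idx) * 4))
#define RGMII_FER_MII(idx)	RGMII_FER_GMII(idx)

/* Hypothetical stand-ins for the driver's PHY_MODE_* constants. */
enum demo_phy_mode {
	DEMO_MODE_RGMII,
	DEMO_MODE_TBI,
	DEMO_MODE_GMII,
	DEMO_MODE_MII
};

/* Return the function-enable bits for one bridge input, or 0 if unknown. */
static uint32_t demo_mode_mask(enum demo_phy_mode mode, int input)
{
	switch (mode) {
	case DEMO_MODE_RGMII:
		return RGMII_FER_RGMII(input);
	case DEMO_MODE_TBI:
		return RGMII_FER_TBI(input);
	case DEMO_MODE_GMII:
		return RGMII_FER_GMII(input);
	case DEMO_MODE_MII:
		return RGMII_FER_MII(input);
	}
	return 0;
}
```

As the commit message says, "mii" and "gmii" produce the same bits; only the device tree spelling differs.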

diff --git a/drivers/net/ibm_newemac/rgmii.c b/drivers/net/ibm_newemac/rgmii.c
index e32da3d..5acb006 100644
--- a/drivers/net/ibm_newemac/rgmii.c
+++ b/drivers/net/ibm_newemac/rgmii.c
@@ -39,6 +39,7 @@
 #define RGMII_FER_RGMII(idx)	(0x5 << ((idx) * 4))
 #define RGMII_FER_TBI(idx)	(0x6 << ((idx) * 4))
 #define RGMII_FER_GMII(idx)	(0x7 << ((idx) * 4))
+#define RGMII_FER_MII(idx)	RGMII_FER_GMII(idx)
 
 /* RGMIIx_SSR */
 #define RGMII_SSR_MASK(idx)	(0x7 << ((idx) * 8))
@@ -49,6 +50,7 @@
 static inline int rgmii_valid_mode(int phy_mode)
 {
 	return  phy_mode == PHY_MODE_GMII ||
+		phy_mode == PHY_MODE_MII ||
 		phy_mode == PHY_MODE_RGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
@@ -63,6 +65,8 @@ static inline const char *rgmii_mode_name(int mode)
 		return "TBI";
 	case PHY_MODE_GMII:
 		return "GMII";
+	case PHY_MODE_MII:
+		return "MII";
 	case PHY_MODE_RTBI:
 		return "RTBI";
 	default:
@@ -79,6 +83,8 @@ static inline u32 rgmii_mode_mask(int mode, int input)
 		return RGMII_FER_TBI(input);
 	case PHY_MODE_GMII:
 		return RGMII_FER_GMII(input);
+	case PHY_MODE_MII:
+		return RGMII_FER_MII(input);
 	case PHY_MODE_RTBI:
 		return RGMII_FER_RTBI(input);
 	default:
--
1.5.4.3

^ permalink raw reply related

* Re: powerpc/cell/cpufreq: add spu aware cpufreq governor
From: Dave Jones @ 2008-07-08 15:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Stephen Rothwell, cpufreq, linuxppc-dev, Jeremy Kerr, cbe-oss-dev
In-Reply-To: <200807080843.43674.arnd@arndb.de>

On Tue, Jul 08, 2008 at 08:43:43AM +0200, Arnd Bergmann wrote:
 > On Monday 07 July 2008, Dave Jones wrote:
 > > One question I do have though, is how userspace scripts are supposed
 > > to know they're to echo cbe_spu_governor into the relevant parts of
 > > sysfs.  I've not used anything with a cell. Do they expose the SPUs
 > > as regular CPUs, or do they show up in a different part of the tree?
 > 
 > An SPU is very different from a CPU from the user perspective.
 > SPUs show up in /sys/devices/system/spus, and if a user wants to access
 > them, the "spufs" file system needs to be mounted in the system, by
 > convention on /spu. 

Ok, that should be fairly simple to write scripts for.
All sounds good to me.

	Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply

* Re: [PATCH v2] ibm_newemac: Add MII mode support to the EMAC RGMII bridge.
From: Stefan Roese @ 2008-07-08 15:35 UTC (permalink / raw)
  To: Grant Erickson; +Cc: linuxppc-dev, jgarzik
In-Reply-To: <1215529386-12749-1-git-send-email-gerickson@nuovations.com>

On Tuesday 08 July 2008, Grant Erickson wrote:
> This patch adds support to the RGMII handler in the EMAC driver for
> the MII PHY mode such that device tree entries of the form `phy-mode =
> "mii";' are recognized and handled appropriately.
>
> While logically, in software, "gmii" and "mii" modes are the same,
> they are wired differently, so it makes sense to allow DTS authors to
> specify each explicitly.
>
> Signed-off-by: Grant Erickson <gerickson@nuovations.com>

Acked-by: Stefan Roese <sr@denx.de>

Best regards,
Stefan

^ permalink raw reply

* Re: [PATCH] irda: driver for Freescale FIRI controller
From: Anton Vorontsov @ 2008-07-08 15:44 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: linuxppc-dev, netdev, Timur Tabi, Zhang Wei
In-Reply-To: <20080611222524.GA6710@sortiz.org>

On Thu, Jun 12, 2008 at 12:25:25AM +0200, Samuel Ortiz wrote:
> Hi Anton,
> 
> On Wed, Jun 04, 2008 at 07:45:10PM +0400, Anton Vorontsov wrote:
> > From: Zhang Wei <wei.zhang@freescale.com>
> > 
> > The driver supports SIR, MIR, FIR modes and maximum 4000000bps rate.
> > 
> > Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
> > [AV: few small fixes, plus had made platform ops passing via node->data
> >      to avoid #ifdef stuff in the fsl_soc (think DIU). ]
> > Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
> Some comments below:

Sorry it took me so long to answer. :-/

Many thanks for the comments, I will fix them all. Your review
work isn't in vain; I much appreciate it.

Thanks again,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply

* Re: [PATCH] Restore PERR/SERR bit settings during EEH device recovery
From: Mike Mason @ 2008-07-08 15:56 UTC (permalink / raw)
  To: linasvepstas; +Cc: paulus, linuxppc-dev
In-Reply-To: <3ae3aa420807080638t41e8851bx2ad061dd5a4e0279@mail.gmail.com>

Linas Vepstas wrote:
> 2008/7/7 Mike Mason <mmlnx@us.ibm.com>:
>> The following patch restores the PERR and SERR bits in the PCI
>> command register during an EEH device recovery.
>> We have found at least one case (an Agilent test card) where the
>> PERR/SERR bits are set to 1 by firmware at boot time, but are
>> not restored to 1 during EEH recovery.
> 
> Any chance they should be zero, and were accidentally set to 1?
> In which case, you'd need an else clause, below.

I suppose it's possible.  I'll add your suggestion and resubmit.
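
With the else clauses added, the two bits always end up equal to the saved copy, whether that means setting or clearing them, and the other command bits are left alone. A compact sketch of that logic as a pure function follows. The name `restore_err_bits` is hypothetical; the patch itself uses explicit if/else on `pdn->config_space[1]`, and the bit values are the ones from the Linux PCI register header.

```c
#include <stdint.h>

#define PCI_COMMAND_PARITY 0x40   /* parity error response (PERR) */
#define PCI_COMMAND_SERR   0x100  /* SERR# enable */

/* Copy only the PERR/SERR bits from saved into cmd. */
static uint32_t restore_err_bits(uint32_t cmd, uint32_t saved)
{
	const uint32_t mask = PCI_COMMAND_PARITY | PCI_COMMAND_SERR;

	return (cmd & ~mask) | (saved & mask);
}
```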

Mike

> 
>> The patch fixes the
>> Agilent card problem.  It has been tested on several other EEH-enabled cards
>> with no regressions.
>>
>> Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
>>
>> --- linux-2.6.26-rc9/arch/powerpc/platforms/pseries/eeh.c       2008-07-07
>> 16:06:57.000000000 -0700
>> +++ linux-2.6.26-rc9-new/arch/powerpc/platforms/pseries/eeh.c   2008-07-07
>> 16:11:10.000000000 -0700
>> @@ -812,6 +812,7 @@
>> static inline void __restore_bars (struct pci_dn *pdn)
>> {
>>        int i;
>> +       u32 cmd;
>>
>>        if (NULL==pdn->phb) return;
>>        for (i=4; i<10; i++) {
>> @@ -832,6 +833,15 @@
>>
>>        /* max latency, min grant, interrupt pin and line */
>>        rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]);
>> +
>> +       /* Restore PERR & SERR bits, some devices require it,
>> +          don't touch the other command bits */
>> +       rtas_read_config(pdn, PCI_COMMAND, 4, &cmd);
>> +       if (pdn->config_space[1] & PCI_COMMAND_PARITY)
>> +               cmd |= PCI_COMMAND_PARITY;
> 
> else cmd &= ~PCI_COMMAND_PARITY;
> 
>> +       if (pdn->config_space[1] & PCI_COMMAND_SERR)
>> +               cmd |= PCI_COMMAND_SERR;
> 
> else cmd &= ~PCI_COMMAND_SERR;
> 
>> +       rtas_write_config(pdn, PCI_COMMAND, 4, cmd);
>> }
> 
> Other than that, I'll add an
> 
> Acked-by: Linas Vepstas <linasvepstas@gmail.com>
> 
> --linas

^ permalink raw reply

* Re: [PATCH] Restore PERR/SERR bit settings during EEH device recovery
From: Mike Mason @ 2008-07-08 16:04 UTC (permalink / raw)
  To: linasvepstas, paulus, benh, linuxppc-dev
In-Reply-To: <3ae3aa420807080638t41e8851bx2ad061dd5a4e0279@mail.gmail.com>

Here's a resubmission of the patch with Linas' suggestion.

The following patch restores the PERR and SERR bits in the PCI
command register during an EEH device recovery. We have found
at least one case (an Agilent test card) where the PERR/SERR
bits are set to 1 by firmware at boot time, but are not restored
to 1 during EEH recovery.  The patch fixes the Agilent card
problem.  It has been tested on several other EEH-enabled cards
with no regressions.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com> 
Acked-by: Linas Vepstas <linasvepstas@gmail.com>

--- linux-2.6.26-rc9/arch/powerpc/platforms/pseries/eeh.c	2008-07-07 16:06:57.000000000 -0700
+++ linux-2.6.26-rc9-new/arch/powerpc/platforms/pseries/eeh.c	2008-07-08 03:56:35.000000000 -0700
@@ -812,6 +812,7 @@
 static inline void __restore_bars (struct pci_dn *pdn)
 {
 	int i;
+	u32 cmd;
 
 	if (NULL==pdn->phb) return;
 	for (i=4; i<10; i++) {
@@ -832,6 +833,19 @@
 
 	/* max latency, min grant, interrupt pin and line */
 	rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]);
+
+	/* Restore PERR & SERR bits, some devices require it,
+	   don't touch the other command bits */
+	rtas_read_config(pdn, PCI_COMMAND, 4, &cmd);
+	if (pdn->config_space[1] & PCI_COMMAND_PARITY)
+		cmd |= PCI_COMMAND_PARITY;
+	else
+		cmd &= ~PCI_COMMAND_PARITY;
+	if (pdn->config_space[1] & PCI_COMMAND_SERR)
+		cmd |= PCI_COMMAND_SERR;
+	else
+		cmd &= ~PCI_COMMAND_SERR;
+	rtas_write_config(pdn, PCI_COMMAND, 4, cmd);
 }
 
 /**

^ permalink raw reply

* powerpc: Add missing  reference to coherent_dma_mask
From: Vitaly Bordug @ 2008-07-08 16:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, David Gibson

There is dma_mask in of_device upon of_platform_device_create()
but we don't actually set coherent_dma_mask. This may cause weird 
behavior of USB subsystem using of_device USB host drivers.

Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
---

 arch/powerpc/kernel/of_platform.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
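
One way to picture why an unset (zero) coherent mask is a problem is the simplified model below. It is not the kernel's actual allocation check, just the usual "does this address fit the mask" test, and the `demo_*` names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DMA_32BIT_MASK 0x00000000ffffffffULL

/* True if every set bit of addr is permitted by mask. */
static bool demo_addr_fits_mask(uint64_t addr, uint64_t mask)
{
	return (addr & ~mask) == 0;
}
```

With a mask of zero, no non-zero address passes the test, which is consistent with the odd behaviour described above.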

diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index e79ad8a..3f37a6e 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -76,6 +76,8 @@ struct of_device* of_platform_device_create(struct device_node *np,
 		return NULL;
 
 	dev->dma_mask = 0xffffffffUL;
+	dev->dev.coherent_dma_mask = DMA_32BIT_MASK;
+
 	dev->dev.bus = &of_platform_bus_type;
 
 	/* We do not fill the DMA ops for platform devices by default.

^ permalink raw reply related

* Re: booting an ML405
From: David Baird @ 2008-07-08 17:12 UTC (permalink / raw)
  To: Lorenzo T. Flores; +Cc: linuxppc-embedded
In-Reply-To: <4872C4D0.4060006@alphagolf.com>

On Mon, Jul 7, 2008 at 7:37 PM, Lorenzo T. Flores <lorenzo@alphagolf.com> wrote:
> If I stop the processor after it hangs:
>
> XMD% mrd 0xc0259fa4 10
> C0259FA4:   3C353E5B
> C0259FA8:   20202020
> C0259FAC:   302E3030
> C0259FB0:   30303030
> C0259FB4:   5D204C69
> C0259FB8:   6E757820
> C0259FBC:   76657273
> C0259FC0:   696F6E20
> C0259FC4:   322E362E
> C0259FC8:   32352D72

Since XMD is also a Tcl shell, you can easily download __log_buf like this:

    set fd [open xmd.log w]
    puts $fd [mrd 0xc0259fa4 10]
    close $fd

Then, with a little tweaking, you could use xxd or some other tool to
convert it to ASCII.
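
As an alternative to xxd, the conversion is only a few lines of C. This sketch assumes the words are laid out most significant byte first, as on this big-endian PowerPC; `words_to_ascii` is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack 32-bit words, most significant byte first, into a
 * NUL-terminated string. Non-printable bytes become '.'.
 * out must have room for 4 * nwords + 1 bytes.
 * Returns the string length.
 */
static size_t words_to_ascii(const uint32_t *words, size_t nwords, char *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nwords; i++) {
		for (int shift = 24; shift >= 0; shift -= 8) {
			unsigned char c = (words[i] >> shift) & 0xff;

			out[n++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
		}
	}
	out[n] = '\0';
	return n;
}
```

Fed the ten words quoted above, this yields `<5>[    0.000000] Linux version 2.6.25-r`, i.e. the start of the kernel log.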

> If I cut off the 0xc0000000:
>
> XMD% mrd 0x259fa4 10
>  259FA4:   FFFFFFFF
>  259FA8:   FFFFFFFF
>  259FAC:   FFFFFFFF
>  259FB0:   FFFFFFFF
>  259FB4:   FFFFFFFF
>  259FB8:   FFFFFFFF
>  259FBC:   FFFFFFFF
>  259FC0:   FFFFFFFF
>  259FC4:   FFFFFFFF
>  259FC8:   FFFFFFFF

My guess is that issuing a rst (reset) command will take the processor
out of virtual mode, and then you can strip off the 0xc0000000.  But it
looks like you got what you need anyway, so no need for this :-)

-David

^ permalink raw reply

* [PATCH 1/2] powerpc/83xx: fix ULPI setup for MPC8315 processors
From: Anton Vorontsov @ 2008-07-08 17:36 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

We must not use MPC831X_SICR[HL]_* definitions for the MPC8315 processors,
because SICR USB bits locations are not compatible with MPC8313.

This patch fixes ULPI workability on MPC8315E-RDB boards.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 arch/powerpc/platforms/83xx/mpc83xx.h |    4 ++++
 arch/powerpc/platforms/83xx/usb.c     |   24 +++++++++++++++---------
 2 files changed, 19 insertions(+), 9 deletions(-)
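
The diff below also replaces the open-coded read/mask/or/write sequence with clrsetbits_be32(). The value transformation can be sketched on a plain integer rather than a memory-mapped register; `demo_clrsetbits` is a hypothetical userspace model, and the two constants are the MPC8315 SICRL values from this patch with an unsigned suffix added.

```c
#include <stdint.h>

#define MPC8315_SICRL_USB_MASK 0x000000fcu
#define MPC8315_SICRL_USB_ULPI 0x00000054u

/* Clear the bits in clear, then set the bits in set. */
static uint32_t demo_clrsetbits(uint32_t val, uint32_t clear, uint32_t set)
{
	return (val & ~clear) | set;
}
```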

diff --git a/arch/powerpc/platforms/83xx/mpc83xx.h b/arch/powerpc/platforms/83xx/mpc83xx.h
index 88a3b5c..a8dcb81 100644
--- a/arch/powerpc/platforms/83xx/mpc83xx.h
+++ b/arch/powerpc/platforms/83xx/mpc83xx.h
@@ -26,6 +26,8 @@
 #define MPC834X_SICRL_USB1         0x20000000
 #define MPC831X_SICRL_USB_MASK     0x00000c00
 #define MPC831X_SICRL_USB_ULPI     0x00000800
+#define MPC8315_SICRL_USB_MASK     0x000000fc
+#define MPC8315_SICRL_USB_ULPI     0x00000054
 #define MPC837X_SICRL_USB_MASK     0xf0000000
 #define MPC837X_SICRL_USB_ULPI     0x50000000
 
@@ -34,6 +36,8 @@
 #define MPC834X_SICRH_USB_UTMI     0x00020000
 #define MPC831X_SICRH_USB_MASK     0x000000e0
 #define MPC831X_SICRH_USB_ULPI     0x000000a0
+#define MPC8315_SICRH_USB_MASK     0x0000ff00
+#define MPC8315_SICRH_USB_ULPI     0x00000000
 
 /* USB Control Register */
 #define FSL_USB2_CONTROL_OFFS      0x500
diff --git a/arch/powerpc/platforms/83xx/usb.c b/arch/powerpc/platforms/83xx/usb.c
index 64bcf0a..cc99c28 100644
--- a/arch/powerpc/platforms/83xx/usb.c
+++ b/arch/powerpc/platforms/83xx/usb.c
@@ -137,15 +137,21 @@ int mpc831x_usb_cfg(void)
 
 	/* Configure pin mux for ULPI.  There is no pin mux for UTMI */
 	if (prop && !strcmp(prop, "ulpi")) {
-		temp = in_be32(immap + MPC83XX_SICRL_OFFS);
-		temp &= ~MPC831X_SICRL_USB_MASK;
-		temp |= MPC831X_SICRL_USB_ULPI;
-		out_be32(immap + MPC83XX_SICRL_OFFS, temp);
-
-		temp = in_be32(immap + MPC83XX_SICRH_OFFS);
-		temp &= ~MPC831X_SICRH_USB_MASK;
-		temp |= MPC831X_SICRH_USB_ULPI;
-		out_be32(immap + MPC83XX_SICRH_OFFS, temp);
+		if (of_device_is_compatible(immr_node, "fsl,mpc8315-immr")) {
+			clrsetbits_be32(immap + MPC83XX_SICRL_OFFS,
+					MPC8315_SICRL_USB_MASK,
+					MPC8315_SICRL_USB_ULPI);
+			clrsetbits_be32(immap + MPC83XX_SICRH_OFFS,
+					MPC8315_SICRH_USB_MASK,
+					MPC8315_SICRH_USB_ULPI);
+		} else {
+			clrsetbits_be32(immap + MPC83XX_SICRL_OFFS,
+					MPC831X_SICRL_USB_MASK,
+					MPC831X_SICRL_USB_ULPI);
+			clrsetbits_be32(immap + MPC83XX_SICRH_OFFS,
+					MPC831X_SICRH_USB_MASK,
+					MPC831X_SICRH_USB_ULPI);
+		}
 	}
 
 	iounmap(immap);
-- 
1.5.5.4

^ permalink raw reply related

* [PATCH 2/2] powerpc/fsl_soc: gianfar: don't probe disabled devices
From: Anton Vorontsov @ 2008-07-08 17:36 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

Freescale ships MPC8315E-RDB boards in two variants:

1. With TSEC1 ethernet support and USB UTMI PHY;
2. Without TSEC1 support, but with USB ULPI PHY in addition.

For the second case U-Boot will add status = "disabled"; property
into the TSEC1 node, so Linux should not try to probe it.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 arch/powerpc/sysdev/fsl_soc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
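
For reference, the rule the new check relies on can be sketched as follows. This reflects my understanding of the convention behind of_device_is_available() (a missing status property, "okay" or "ok" means usable); the function below is a hypothetical userspace model, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* status is the node's "status" property value, or NULL if absent. */
static bool demo_status_available(const char *status)
{
	if (status == NULL)
		return true;   /* no property: device is assumed usable */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```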

diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index 7d4cf00..2a19769 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -296,6 +296,9 @@ static int __init gfar_of_init(void)
 		const phandle *ph;
 		int n_res = 2;
 
+		if (!of_device_is_available(np))
+			continue;
+
 		memset(r, 0, sizeof(r));
 		memset(&gfar_data, 0, sizeof(gfar_data));
 
-- 
1.5.5.4

^ permalink raw reply related

* Re: [PATCH 15/16 v3] ibmvscsi: driver enablement for CMO
From: Robert Jennings @ 2008-07-08 17:41 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi, linuxppc-dev, David Darrington, paulus
In-Reply-To: <4872298D.8050802@linux.vnet.ibm.com>

* Brian King (brking@linux.vnet.ibm.com) wrote:
> Robert Jennings wrote:
> > @@ -1613,6 +1624,26 @@ static struct scsi_host_template driver_
> >  };
> >  
> >  /**
> > + * ibmvscsi_get_desired_dma - Calculate IO entitlement needed by the driver
> > + *
> > + * @vdev: struct vio_dev for the device whose entitlement is to be returned
> > + *
> > + * Return value:
> > + *	Number of bytes of IO data the driver will need to perform well.
> > + */
> > +static unsigned long ibmvscsi_get_desired_dma(struct vio_dev *vdev)
> > +{
> > +	/* iu_storage data allocated in initialize_event_pool */
> > +	unsigned long io_entitlement = max_requests * sizeof(union viosrp_iu);
> 
> Since you are removing the use of "entitlement" in the function description,
> you should probably remove it everywhere in this patch.

I'll clean this up.

> > +
> > +	/* add io space for sg data */
> > +	io_entitlement += (IBMVSCSI_MAX_SECTORS_DEFAULT *
> > +	                     IBMVSCSI_CMDS_PER_LUN_DEFAULT);
> > +
> > +	return IOMMU_PAGE_ALIGN(io_entitlement);
> 
> I really think this function should just return the number of bytes and
> let the caller round it up to any boundary requirements it might have.

I agree.  I'll be posting a new patch after I work out another issue I'm
having.  Hope to have that out soon.
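
The split being discussed could look roughly like this: the callee reports a raw byte count and the caller applies whatever alignment it needs. Everything here is a sketch under assumptions; the `demo_*` names and the 4096-byte page size are illustrative and not taken from the driver.

```c
#define DEMO_IOMMU_PAGE_SIZE 4096UL   /* assumed page size for illustration */

/* Round n up to a multiple of align, which must be a power of two. */
static unsigned long demo_align_up(unsigned long n, unsigned long align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Hypothetical callee: reports the raw byte count, no rounding. */
static unsigned long demo_desired_bytes(unsigned long requests,
					unsigned long bytes_per_request)
{
	return requests * bytes_per_request;
}
```

A caller would then write something like `demo_align_up(demo_desired_bytes(n, size), DEMO_IOMMU_PAGE_SIZE)`, keeping the rounding policy in one place.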

--Rob Jennings

^ permalink raw reply

* Freescale SEC driver with cryptoloop on MPC8567e
From: mike zheng @ 2008-07-08 18:03 UTC (permalink / raw)
  To: linux-crypto, linuxppc-dev

Hello,

Does anyone know how I can integrate Freescale's SEC driver with
cryptoloop in kernel 2.4 on the MPC8567e? Or which kernel version should
I take if it is already there?

Thanks for your help,

Mike

^ permalink raw reply

* Re: [PATCH 1/2] elf loader support for auxvec base platform string
From: Steven Munroe @ 2008-07-08 18:35 UTC (permalink / raw)
  To: benh
  Cc: Steve Munroe, linux-kernel, linuxppc-dev, Paul Mackerras,
	Nathan Lynch, Roland McGrath
In-Reply-To: <1215478091.8970.183.camel@pasglop>

On Tue, 2008-07-08 at 10:48 +1000, Benjamin Herrenschmidt wrote:
> Adding Steve to the CC list as I'd like his input from the
> glibc/powerpc side as he's the requester of that feature in the first
> place.
> 
> Steve: Roland is proposing to use dsocaps instead of AT_BASE_PLATFORM.
> 

I am willing to discuss better solutions with Roland. It seems like I am
finally on the air for linuxppc-dev, but it seems some of my earlier
notes got lost.

So I will restate. AT_BASE_PLATFORM is a proposed solution to several
problems, including CPU-tuned library selection. If dsocaps is a better
solution for library selection, I am happy to consider and discuss this.

However, it is not clear that dsocaps is a solution to all the
requirements we need to address for virtualization and partition
migration of applications. These require a durable and public API
accessible from any application or library.

First the problem:

We want to support migration of running partitions (including the kernel
and all running applications), and we have to deal with mixed platform
clusters. If we want to migrate freely between POWER5+ and POWER6 (or
POWER7) systems, then we need to make sure the application and its
libraries restrict themselves to the lowest ISA Version level (2.04 in
this case).

So the hardware and hypervisor support and enforce CPU compatibility
modes. For example, a partition can be created on a POWER6 to run in
POWER5+ mode. There are HID bits set to restrict the instruction set to
the POWER5+ subset. So running a program that uses new POWER6
instructions on this partition will SIGILL.

So while this is really a POWER6 machine it is wrong for the kernel to
return AT_PLATFORM=power6. The /lib/power6/libc.so and libm.so do use
the new ISA V2.05 instructions that will SIGILL in this (POWER5+
compatible) partition. 

In this case the kernel should return AT_PLATFORM=power5+
because /lib/power5+/libc.so is built --with-cpu=power5+ and only uses
the ISA V2.04 instructions.

But that introduces some new problems. The processor, internal pipeline
(micro-architecture), and performance monitor unit (PMU events have to
match the pipeline structure) have not changed (still POWER6/7). This
has implications for application performance and many performance tools.

For example, oProfile/PAPI/libpfm need to know what the processor really
is, because misprogramming the PMU gets bogus results or can even crash
the system. Another example is a JVM/JIT compiler, which needs to know
what the supported ISA level is (from AT_PLATFORM and AT_HWCAP), but can
generate better code if it knows that the base platform is different,
and what the actual micro-architecture is. For these examples the
AT_PLATFORM/AT_HWCAP based library selection mechanism does not apply.
And except for oProfile, these examples are user mode
applications/libraries that need this information from a simple,
durable, and public API. To me AT_BASE_PLATFORM seems like the minimal,
simplest, and most general solution to these problems.

OK, now back to library selection and dsocaps. Running power5+ libraries
on a power6 will execute (will not SIGILL) but may not be optimal. The
best performance also requires careful instruction selection and
scheduling. For example, the performance of memset/memcpy/memcmp depends
on tuning to the detailed timing of the Load/Store pipelines, Store Queue
depth, and L2 cache clocking. This can be very different between
processor generations.

For these power5+ compatible partitions, we would like the option to
build libraries for -mcpu=power5+ -mtune=power6, etc. The details of
how this will work are TBD. I put forth AT_BASE_PLATFORM with the thought
that it could be a search modifier in addition to AT_PLATFORM
(i.e. /lib/power5+/power6/libc.so).

If dsocaps is a better mechanism for library selection, I am more than
willing to discuss how dsocaps works and how it can be applied to this
specific case.
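
To make the proposal concrete, here is a rough sketch of how a user-mode program might consume the two strings. It assumes a current glibc, where `getauxval()` is available (that interface postdates this thread), and the directory layout is only the one floated above, whose details are explicitly TBD. The `demo_*` helpers are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

#ifndef AT_BASE_PLATFORM
#define AT_BASE_PLATFORM 24   /* value used by Linux for this auxv entry */
#endif

/* Return an auxv string entry, or fallback if the kernel did not supply it. */
static const char *demo_auxv_string(unsigned long type, const char *fallback)
{
	const char *s = (const char *)getauxval(type);

	return s ? s : fallback;
}

/*
 * Build the tuned-library directory floated in the message:
 * /lib/<platform>/<base platform>, or just /lib/<platform> when the
 * two match. Returns the snprintf() result.
 */
static int demo_tuned_lib_dir(char *buf, size_t len,
			      const char *platform, const char *base)
{
	if (strcmp(platform, base) == 0)
		return snprintf(buf, len, "/lib/%s", platform);
	return snprintf(buf, len, "/lib/%s/%s", platform, base);
}
```

A tool would call `demo_auxv_string(AT_PLATFORM, "unknown")` and `demo_auxv_string(AT_BASE_PLATFORM, ...)`, falling back to the AT_PLATFORM value when no base platform is reported.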

^ permalink raw reply

* Freescale SEC driver with cryptoloop on MPC8567e
From: mike zheng @ 2008-07-08 18:42 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <5c9cd53b0807081103y583d0013ic0d22fb46490be2b@mail.gmail.com>

Hello,

Does anyone know how I can integrate Freescale's SEC driver with
cryptoloop in kernel 2.4 on the MPC8567e? Or which kernel version should
I take if it is already there?

Thanks for your help,

Mike

^ permalink raw reply

* [PATCH] powerpc: i2c-ibm_iic register child nodes
From: Sean MacLennan @ 2008-07-08 18:46 UTC (permalink / raw)
  To: linuxppc-dev, Jean Delvare

This patch completes the conversion of the IBM IIC driver to an
of-platform driver.

It removes the index from the IBM IIC driver and makes it an unnumbered
driver. It then calls of_register_i2c_devices to properly register all
the child nodes in the DTS.

Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
---

diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
index 070f078..651f2f1 100644
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -43,6 +43,7 @@
 #include <linux/i2c.h>
 #include <linux/i2c-id.h>
 #include <linux/of_platform.h>
+#include <linux/of_i2c.h>
 
 #include "i2c-ibm_iic.h"
 
@@ -696,7 +697,7 @@ static int __devinit iic_probe(struct of_device *ofdev,
 	struct device_node *np = ofdev->node;
 	struct ibm_iic_private *dev;
 	struct i2c_adapter *adap;
-	const u32 *indexp, *freq;
+	const u32 *freq;
 	int ret;
 
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
@@ -707,14 +708,6 @@ static int __devinit iic_probe(struct of_device *ofdev,
 
 	dev_set_drvdata(&ofdev->dev, dev);
 
-	indexp = of_get_property(np, "index", NULL);
-	if (!indexp) {
-		dev_err(&ofdev->dev, "no index specified\n");
-		ret = -EINVAL;
-		goto error_cleanup;
-	}
-	dev->idx = *indexp;
-
 	dev->vaddr = of_iomap(np, 0);
 	if (dev->vaddr == NULL) {
 		dev_err(&ofdev->dev, "failed to iomap device\n");
@@ -757,14 +750,16 @@ static int __devinit iic_probe(struct of_device *ofdev,
 	adap->class = I2C_CLASS_HWMON | I2C_CLASS_SPD;
 	adap->algo = &iic_algo;
 	adap->timeout = 1;
-	adap->nr = dev->idx;
 
-	ret = i2c_add_numbered_adapter(adap);
+	ret = i2c_add_adapter(adap);
 	if (ret  < 0) {
 		dev_err(&ofdev->dev, "failed to register i2c adapter\n");
 		goto error_cleanup;
 	}
 
+	/* Now register all the child nodes */
+	of_register_i2c_devices(adap, np);
+
 	dev_info(&ofdev->dev, "using %s mode\n",
 		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
 

^ permalink raw reply related

* [PATCH] powerpc: support nand boot for rev A warps
From: Sean MacLennan @ 2008-07-08 19:00 UTC (permalink / raw)
  To: linuxppc-dev

This patch is against linux-next.

Allow the Rev A Warp boards to boot from NAND.

Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
---
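
A quick sanity check on the numbers in the diff below: partition 2 starts at 0x3200000 and is 0x0e00000 long, which ends exactly at 0x4000000 (64M). The sketch below checks that a partition list is contiguous and fills the chip. Note that the 0x0200000 start of partition 1 used in the example is an assumption inferred from partition 2's offset; this patch does not show it, and the `demo_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_part {
	uint32_t offset;
	uint32_t size;
};

/* True if the partitions are contiguous and the last one ends at chip_size. */
static bool demo_parts_tile(const struct demo_part *parts, int n,
			    uint32_t chip_size)
{
	uint32_t next = parts[0].offset;

	for (int i = 0; i < n; i++) {
		if (parts[i].offset != next)
			return false;
		next += parts[i].size;
	}
	return next == chip_size;
}
```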

diff --git a/arch/powerpc/platforms/44x/warp-nand.c b/arch/powerpc/platforms/44x/warp-nand.c
index 7bec281..e55746b 100644
--- a/arch/powerpc/platforms/44x/warp-nand.c
+++ b/arch/powerpc/platforms/44x/warp-nand.c
@@ -113,9 +113,14 @@ static int warp_setup_nand_flash(void)
 		pp = of_find_property(np, "reg", NULL);
 		if (pp && (pp->length == 12)) {
 			u32 *v = pp->value;
-			if (v[2] == 0x4000000)
+			if (v[2] == 0x4000000) {
 				/* Rev A = 64M NAND */
-				warp_nand_chip0.nr_partitions = 2;
+				warp_nand_chip0.nr_partitions = 3;
+
+				nand_parts[1].size   = 0x3000000;
+				nand_parts[2].offset = 0x3200000;
+				nand_parts[2].size   = 0x0e00000;
+			}
 		}
 		of_node_put(np);
 	}

^ permalink raw reply related

* [PATCH v2] Add PPC_FEATURE_PSERIES_PMU_COMPAT
From: Nathan Lynch @ 2008-07-08 20:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Olof Johansson, Paul Mackerras
In-Reply-To: <20080703232001.GB9594@localdomain>

Background from Maynard Johnson:
As of POWER6, a set of 32 common events is defined that must be
supported on all future POWER processors.  The main impetus for this
compat set is the need to support partition migration, especially from
processor P(n) to processor P(n+1), where performance software that's
running in the new partition may not be knowledgeable about processor
P(n+1).  If a performance tool determines it does not support the
physical processor, but is told (via the PPC_FEATURE_PSERIES_PMU_COMPAT
bit) that the processor supports the notion of the PMU compat set,
then the performance tool can surface just those events to the user
of the tool.

PPC_FEATURE_PSERIES_PMU_COMPAT indicates that the PMU supports at
least this basic subset of events which is compatible across POWER
processor lines.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
---

Changes since v1:
- make name of feature bit less generic
- provide more complete changelog

 arch/powerpc/kernel/cputable.c |    6 ++++--
 include/asm-powerpc/cputable.h |    1 +
 2 files changed, 5 insertions(+), 2 deletions(-)
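
A performance tool's decision, as described in the background above, might be sketched like this. The bit value is the one added by this patch; the enum and function names are hypothetical, and in a real tool `hwcap` would come from the AT_HWCAP auxv entry.

```c
#include <stdbool.h>

#define PPC_FEATURE_PSERIES_PMU_COMPAT 0x00000040   /* from the patch */

enum demo_event_set {
	DEMO_EVENTS_NONE,     /* unknown CPU, no compat set: expose nothing */
	DEMO_EVENTS_COMPAT,   /* unknown CPU, but the common events are safe */
	DEMO_EVENTS_FULL      /* CPU is known to the tool */
};

static enum demo_event_set demo_pick_events(bool cpu_known, unsigned long hwcap)
{
	if (cpu_known)
		return DEMO_EVENTS_FULL;
	if (hwcap & PPC_FEATURE_PSERIES_PMU_COMPAT)
		return DEMO_EVENTS_COMPAT;
	return DEMO_EVENTS_NONE;
}
```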

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 817cea1..c4eb377 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -70,10 +70,12 @@ extern void __restore_cpu_power7(void);
 				 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP)
 #define COMMON_USER_POWER6	(COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_05 |\
 				 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-				 PPC_FEATURE_TRUE_LE)
+				 PPC_FEATURE_TRUE_LE | \
+				 PPC_FEATURE_PSERIES_PMU_COMPAT)
 #define COMMON_USER_POWER7	(COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_06 |\
 				 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-				 PPC_FEATURE_TRUE_LE)
+				 PPC_FEATURE_TRUE_LE | \
+				 PPC_FEATURE_PSERIES_PMU_COMPAT)
 #define COMMON_USER_PA6T	(COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
 				 PPC_FEATURE_TRUE_LE | \
 				 PPC_FEATURE_HAS_ALTIVEC_COMP)
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 3171ac9..d1492a2 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -26,6 +26,7 @@
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
 #define PPC_FEATURE_HAS_VSX		0x00000080
+#define PPC_FEATURE_PSERIES_PMU_COMPAT	0x00000040
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
-- 
1.5.6.2

^ permalink raw reply related

* Re: [PATCH] powerpc: i2c-ibm_iic register child nodes
From: Sean MacLennan @ 2008-07-08 20:22 UTC (permalink / raw)
  Cc: linuxppc-dev
In-Reply-To: <20080708144609.14bae1de@lappy.seanm.ca>

Update the warp to use the new IBM IIC driver. We no longer need to
register the I2C devices in the platform code.

Also adds a small bugfix for the i2c code if the i2c read fails.

Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
---
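
On the read-failure fix: the old loop byte-swapped whatever the SMBus read returned, so a negative error code would have been swapped and written to the FPGA as if it were a temperature. The check in the diff below can be modelled in userspace as follows; the `demo_*` functions are hypothetical stand-ins, with `demo_swab16` doing what the kernel's swab16() does.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * val models an SMBus word read: negative on failure, otherwise a
 * 16-bit word in the low bits. Returns 0 and stores the swapped word
 * on success, -1 on failure (leaving *out untouched).
 */
static int demo_decode_temp(int val, int16_t *out)
{
	if (val < 0)
		return -1;
	*out = (int16_t)demo_swab16((uint16_t)val);
	return 0;
}
```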

diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c
index 9565995..960edf8 100644
--- a/arch/powerpc/platforms/44x/warp.c
+++ b/arch/powerpc/platforms/44x/warp.c
@@ -30,18 +30,6 @@ static __initdata struct of_device_id warp_of_bus[] = {
 	{},
 };
 
-static __initdata struct i2c_board_info warp_i2c_info[] = {
-	{ I2C_BOARD_INFO("ad7414", 0x4a) }
-};
-
-static int __init warp_arch_init(void)
-{
-	/* This should go away once support is moved to the dts. */
-	i2c_register_board_info(0, warp_i2c_info, ARRAY_SIZE(warp_i2c_info));
-	return 0;
-}
-machine_arch_initcall(warp, warp_arch_init);
-
 static int __init warp_device_probe(void)
 {
 	of_platform_bus_probe(NULL, warp_of_bus, NULL);
@@ -223,7 +211,7 @@ static void pika_setup_critical_temp(struct i2c_client *client)
 
 	/* These registers are in 1 degree increments. */
 	i2c_smbus_write_byte_data(client, 2, 65); /* Thigh */
-	i2c_smbus_write_byte_data(client, 3, 55); /* Tlow */
+	i2c_smbus_write_byte_data(client, 3,  0); /* Tlow */
 
 	np = of_find_compatible_node(NULL, NULL, "adi,ad7414");
 	if (np == NULL) {
@@ -289,8 +277,15 @@ found_it:
 	printk(KERN_INFO "PIKA DTM thread running.\n");
 
 	while (!kthread_should_stop()) {
-		u16 temp = swab16(i2c_smbus_read_word_data(client, 0));
-		out_be32(fpga + 0x20, temp);
+		int val;
+
+		val = i2c_smbus_read_word_data(client, 0);
+		if (val < 0)
+			dev_dbg(&client->dev, "DTM read temp failed.\n");
+		else {
+			s16 temp = swab16(val);
+			out_be32(fpga + 0x20, temp);
+		}
 
 		pika_dtm_check_fan(fpga);
 

^ permalink raw reply related

* Re: Freescale SEC driver with cryptoloop on MPC8567e
From: Kim Phillips @ 2008-07-08 20:28 UTC (permalink / raw)
  To: mike zheng; +Cc: linuxppc-dev, linux-crypto
In-Reply-To: <5c9cd53b0807081103y583d0013ic0d22fb46490be2b@mail.gmail.com>

On Tue, 8 Jul 2008 14:03:28 -0400
"mike zheng" <mail4mz@gmail.com> wrote:

> Hello,
> 
> Any one know how can I integrate the Freescale's SEC driver with
> cryptoloop in Kernel2.4 on MPC8567e? Or which version of kernel shall
> I take if it is already there?

Take Herbert's cryptodev-2.6 tree and add support in the talitos driver
for the ablkcipher algorithm you're using.

Kim

^ permalink raw reply

* Re: [PATCH 15/16 v3] [v2] ibmvscsi: driver enablement for CMO
From: Robert Jennings @ 2008-07-08 20:35 UTC (permalink / raw)
  To: paulus, benh, linuxppc-dev, linux-scsi, Brian King,
	Nathan Fontenot, David Darrington
In-Reply-To: <20080704125631.GR1310@linux.vnet.ibm.com>

From: Robert Jennings <rcj@linux.vnet.ibm.com>

I removed references to 'entitlement' after having changed the function
'get_io_entitlement' to 'get_desired_dma' to correctly indicate what the
function was doing.  Also, this function does not need to page-align the
return value; the VIO bus is responsible for this.

(We would like to take this patch through linuxppc-dev with the full
change set for this feature.  We are copying linux-scsi for review and ack)

Enable the driver to function in a Cooperative Memory Overcommitment (CMO)
environment.

The following changes are made to enable the driver for CMO:
 * DMA mapping errors will not result in error messages if entitlement has
   been exceeded and resources were not available.
 * The driver defines a get_desired_dma function for use in a CMO
   environment.  It reports how much IO memory the driver would like
   in order to perform well.
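
As a rough worked example of the desired-DMA arithmetic, here is a userspace
sketch.  The two defaults are the values from this patch; iu_size is passed in
as a parameter because sizeof(union viosrp_iu) is not shown here, and the 4096
byte IOMMU page size used for the alignment step (which the VIO bus performs,
not the driver) is an assumption for illustration.

```c
#include <stddef.h>

#define MODEL_MAX_SECTORS_DEFAULT  256    /* IBMVSCSI_MAX_SECTORS_DEFAULT */
#define MODEL_CMDS_PER_LUN_DEFAULT 16     /* IBMVSCSI_CMDS_PER_LUN_DEFAULT */
#define MODEL_IOMMU_PAGE           4096UL /* assumed IOMMU page size */

/* Desired IO memory: event pool storage plus space for sg data. */
static unsigned long desired_dma_model(unsigned long max_requests,
				       size_t iu_size)
{
	unsigned long desired = max_requests * iu_size;

	desired += MODEL_MAX_SECTORS_DEFAULT * MODEL_CMDS_PER_LUN_DEFAULT;
	return desired;
}

/* Round up to the next page boundary, as the bus does with the result. */
static unsigned long page_align_model(unsigned long n)
{
	return (n + MODEL_IOMMU_PAGE - 1) & ~(MODEL_IOMMU_PAGE - 1);
}
```

With 100 requests and an assumed 256 byte IU, the driver would ask for
25600 + 4096 = 29696 bytes, which the bus would round up to 32768.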

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

---
 drivers/scsi/ibmvscsi/ibmvscsi.c |   45 +++++++++++++++++++++++++++++++++------
 drivers/scsi/ibmvscsi/ibmvscsi.h |    2 ++
 2 files changed, 40 insertions(+), 7 deletions(-)

Index: b/drivers/scsi/ibmvscsi/ibmvscsi.c
===================================================================
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -72,6 +72,7 @@
 #include <linux/delay.h>
 #include <asm/firmware.h>
 #include <asm/vio.h>
+#include <asm/firmware.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_host.h>
@@ -426,8 +427,10 @@ static int map_sg_data(struct scsi_cmnd
 					   SG_ALL * sizeof(struct srp_direct_buf),
 					   &evt_struct->ext_list_token, 0);
 		if (!evt_struct->ext_list) {
-			sdev_printk(KERN_ERR, cmd->device,
-				    "Can't allocate memory for indirect table\n");
+			if (!firmware_has_feature(FW_FEATURE_CMO))
+				sdev_printk(KERN_ERR, cmd->device,
+				            "Can't allocate memory "
+				            "for indirect table\n");
 			return 0;
 		}
 	}
@@ -743,7 +746,9 @@ static int ibmvscsi_queuecommand(struct
 	srp_cmd->lun = ((u64) lun) << 48;
 
 	if (!map_data_for_srp_cmd(cmnd, evt_struct, srp_cmd, hostdata->dev)) {
-		sdev_printk(KERN_ERR, cmnd->device, "couldn't convert cmd to srp_cmd\n");
+		if (!firmware_has_feature(FW_FEATURE_CMO))
+			sdev_printk(KERN_ERR, cmnd->device,
+			            "couldn't convert cmd to srp_cmd\n");
 		free_event_struct(&hostdata->pool, evt_struct);
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
@@ -855,7 +860,10 @@ static void send_mad_adapter_info(struct
 					    DMA_BIDIRECTIONAL);
 
 	if (dma_mapping_error(req->buffer)) {
-		dev_err(hostdata->dev, "Unable to map request_buffer for adapter_info!\n");
+		if (!firmware_has_feature(FW_FEATURE_CMO))
+			dev_err(hostdata->dev,
+			        "Unable to map request_buffer for "
+			        "adapter_info!\n");
 		free_event_struct(&hostdata->pool, evt_struct);
 		return;
 	}
@@ -1400,7 +1408,9 @@ static int ibmvscsi_do_host_config(struc
 						    DMA_BIDIRECTIONAL);
 
 	if (dma_mapping_error(host_config->buffer)) {
-		dev_err(hostdata->dev, "dma_mapping error getting host config\n");
+		if (!firmware_has_feature(FW_FEATURE_CMO))
+			dev_err(hostdata->dev,
+			        "dma_mapping error getting host config\n");
 		free_event_struct(&hostdata->pool, evt_struct);
 		return -1;
 	}
@@ -1604,7 +1614,7 @@ static struct scsi_host_template driver_
 	.eh_host_reset_handler = ibmvscsi_eh_host_reset_handler,
 	.slave_configure = ibmvscsi_slave_configure,
 	.change_queue_depth = ibmvscsi_change_queue_depth,
-	.cmd_per_lun = 16,
+	.cmd_per_lun = IBMVSCSI_CMDS_PER_LUN_DEFAULT,
 	.can_queue = IBMVSCSI_MAX_REQUESTS_DEFAULT,
 	.this_id = -1,
 	.sg_tablesize = SG_ALL,
@@ -1613,6 +1623,26 @@ static struct scsi_host_template driver_
 };
 
 /**
+ * ibmvscsi_get_desired_dma - Calculate IO memory desired by the driver
+ *
+ * @vdev: struct vio_dev for the device whose desired IO mem is to be returned
+ *
+ * Return value:
+ *	Number of bytes of IO data the driver will need to perform well.
+ */
+static unsigned long ibmvscsi_get_desired_dma(struct vio_dev *vdev)
+{
+	/* iu_storage data allocated in initialize_event_pool */
+	unsigned long desired_io = max_requests * sizeof(union viosrp_iu);
+
+	/* add io space for sg data */
+	desired_io += (IBMVSCSI_MAX_SECTORS_DEFAULT *
+	                     IBMVSCSI_CMDS_PER_LUN_DEFAULT);
+
+	return desired_io;
+}
+
+/**
  * Called by bus code for each adapter
  */
 static int ibmvscsi_probe(struct vio_dev *vdev, const struct vio_device_id *id)
@@ -1641,7 +1671,7 @@ static int ibmvscsi_probe(struct vio_dev
 	hostdata->host = host;
 	hostdata->dev = dev;
 	atomic_set(&hostdata->request_limit, -1);
-	hostdata->host->max_sectors = 32 * 8; /* default max I/O 32 pages */
+	hostdata->host->max_sectors = IBMVSCSI_MAX_SECTORS_DEFAULT;
 
 	rc = ibmvscsi_ops->init_crq_queue(&hostdata->queue, hostdata, max_requests);
 	if (rc != 0 && rc != H_RESOURCE) {
@@ -1735,6 +1765,7 @@ static struct vio_driver ibmvscsi_driver
 	.id_table = ibmvscsi_device_table,
 	.probe = ibmvscsi_probe,
 	.remove = ibmvscsi_remove,
+	.get_desired_dma = ibmvscsi_get_desired_dma,
 	.driver = {
 		.name = "ibmvscsi",
 		.owner = THIS_MODULE,
Index: b/drivers/scsi/ibmvscsi/ibmvscsi.h
===================================================================
--- a/drivers/scsi/ibmvscsi/ibmvscsi.h
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.h
@@ -45,6 +45,8 @@ struct Scsi_Host;
 #define MAX_INDIRECT_BUFS 10
 
 #define IBMVSCSI_MAX_REQUESTS_DEFAULT 100
+#define IBMVSCSI_CMDS_PER_LUN_DEFAULT 16
+#define IBMVSCSI_MAX_SECTORS_DEFAULT 256 /* 32 * 8 = default max I/O 32 pages */
 #define IBMVSCSI_MAX_CMDS_PER_LUN 64
 
 /* ------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH 14/16 v3] [v2] ibmveth: enable driver for CMO
From: Robert Jennings @ 2008-07-08 20:38 UTC (permalink / raw)
  To: paulus, benh, linuxppc-dev, netdev, Brian King, Nathan Fontenot,
	David Darrington
In-Reply-To: <20080704125615.GQ1310@linux.vnet.ibm.com>

I removed references to 'entitlement' after having changed the function
'get_io_entitlement' to 'get_desired_dma' to correctly indicate what the
function was doing.  Also, this function does not need to page-align the
return value; the VIO bus is responsible for this.

(We would like to take this patch through linuxppc-dev with the full
 change set for this feature.  We are copying netdev for review and ack)

Enable ibmveth for Cooperative Memory Overcommitment (CMO).  For this driver
it means calculating a desired amount of IO memory based on the current MTU
and updating this value with the bus when MTU changes occur.  Because DMA
mappings can fail, we have added a bounce buffer for temporary cases where
the driver can not map IO memory for the buffer pool.

The following changes are made to enable the driver for CMO:
 * DMA mapping errors will not result in error messages if entitlement has
   been exceeded and resources were not available.
 * DMA mapping errors are handled gracefully: ibmveth_replenish_buffer_pool()
   is corrected to check the return from dma_map_single and fail gracefully.
 * The driver defines a get_desired_dma function for use in a CMO
   environment.
 * When the MTU is changed, the driver will update the device's desired IO
   entitlement.
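
The bounce-buffer fallback on the transmit path can be sketched in userspace
as follows.  This is only a model of the idea, not the driver code: struct
tx_model, tx_pick_buffer() and the map_should_fail knob are hypothetical, and
the 64 byte buffer is an arbitrary size chosen for the example.

```c
#include <stddef.h>
#include <string.h>

struct tx_model {
	unsigned char bounce[64];    /* mapped once at open time in the driver */
	int map_should_fail;         /* knob: simulate a failed DMA mapping */
	unsigned long tx_map_failed; /* statistics counter */
};

/*
 * Pick the buffer the device should transmit from.  On a mapping failure
 * the frame is copied into the bounce buffer instead of being dropped.
 * Returns NULL only if the frame does not fit the model's bounce buffer.
 */
static const unsigned char *tx_pick_buffer(struct tx_model *m,
					   const unsigned char *data,
					   size_t len, int *used_bounce)
{
	*used_bounce = 0;
	if (!m->map_should_fail)
		return data;		/* normal path: mapping succeeded */
	if (len > sizeof(m->bounce))
		return NULL;
	memcpy(m->bounce, data, len);
	m->tx_map_failed++;
	*used_bounce = 1;		/* caller must skip the unmap step */
	return m->bounce;
}
```

The used_bounce flag matters because the bounce buffer stays mapped for the
lifetime of the open device, so only a per-frame mapping is unmapped after
the send.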

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Santiago Leon <santil@us.ibm.com>

---

 drivers/net/ibmveth.c |  169 ++++++++++++++++++++++++++++++++++++++++----------
 drivers/net/ibmveth.h |    5 +
 2 files changed, 140 insertions(+), 34 deletions(-)

Index: b/drivers/net/ibmveth.c
===================================================================
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -33,6 +33,7 @@
 */
 
 #include <linux/module.h>
+#include <linux/moduleparam.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>
@@ -52,7 +53,9 @@
 #include <asm/hvcall.h>
 #include <asm/atomic.h>
 #include <asm/vio.h>
+#include <asm/iommu.h>
 #include <asm/uaccess.h>
+#include <asm/firmware.h>
 #include <linux/seq_file.h>
 
 #include "ibmveth.h"
@@ -94,8 +97,10 @@ static void ibmveth_proc_register_adapte
 static void ibmveth_proc_unregister_adapter(struct ibmveth_adapter *adapter);
 static irqreturn_t ibmveth_interrupt(int irq, void *dev_instance);
 static void ibmveth_rxq_harvest_buffer(struct ibmveth_adapter *adapter);
+static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev);
 static struct kobj_type ktype_veth_pool;
 
+
 #ifdef CONFIG_PROC_FS
 #define IBMVETH_PROC_DIR "ibmveth"
 static struct proc_dir_entry *ibmveth_proc_dir;
@@ -226,16 +231,16 @@ static void ibmveth_replenish_buffer_poo
 	u32 i;
 	u32 count = pool->size - atomic_read(&pool->available);
 	u32 buffers_added = 0;
+	struct sk_buff *skb;
+	unsigned int free_index, index;
+	u64 correlator;
+	unsigned long lpar_rc;
+	dma_addr_t dma_addr;
 
 	mb();
 
 	for(i = 0; i < count; ++i) {
-		struct sk_buff *skb;
-		unsigned int free_index, index;
-		u64 correlator;
 		union ibmveth_buf_desc desc;
-		unsigned long lpar_rc;
-		dma_addr_t dma_addr;
 
 		skb = alloc_skb(pool->buff_size, GFP_ATOMIC);
 
@@ -255,6 +260,9 @@ static void ibmveth_replenish_buffer_poo
 		dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
 				pool->buff_size, DMA_FROM_DEVICE);
 
+		if (dma_mapping_error(dma_addr))
+			goto failure;
+
 		pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
 		pool->dma_addr[index] = dma_addr;
 		pool->skbuff[index] = skb;
@@ -267,20 +275,9 @@ static void ibmveth_replenish_buffer_poo
 
 		lpar_rc = h_add_logical_lan_buffer(adapter->vdev->unit_address, desc.desc);
 
-		if(lpar_rc != H_SUCCESS) {
-			pool->free_map[free_index] = index;
-			pool->skbuff[index] = NULL;
-			if (pool->consumer_index == 0)
-				pool->consumer_index = pool->size - 1;
-			else
-				pool->consumer_index--;
-			dma_unmap_single(&adapter->vdev->dev,
-					pool->dma_addr[index], pool->buff_size,
-					DMA_FROM_DEVICE);
-			dev_kfree_skb_any(skb);
-			adapter->replenish_add_buff_failure++;
-			break;
-		} else {
+		if (lpar_rc != H_SUCCESS)
+			goto failure;
+		else {
 			buffers_added++;
 			adapter->replenish_add_buff_success++;
 		}
@@ -288,6 +285,24 @@ static void ibmveth_replenish_buffer_poo
 
 	mb();
 	atomic_add(buffers_added, &(pool->available));
+	return;
+
+failure:
+	pool->free_map[free_index] = index;
+	pool->skbuff[index] = NULL;
+	if (pool->consumer_index == 0)
+		pool->consumer_index = pool->size - 1;
+	else
+		pool->consumer_index--;
+	if (!dma_mapping_error(dma_addr))
+		dma_unmap_single(&adapter->vdev->dev,
+		                 pool->dma_addr[index], pool->buff_size,
+		                 DMA_FROM_DEVICE);
+	dev_kfree_skb_any(skb);
+	adapter->replenish_add_buff_failure++;
+
+	mb();
+	atomic_add(buffers_added, &(pool->available));
 }
 
 /* replenish routine */
@@ -297,7 +312,7 @@ static void ibmveth_replenish_task(struc
 
 	adapter->replenish_task_cycles++;
 
-	for(i = 0; i < IbmVethNumBufferPools; i++)
+	for (i = (IbmVethNumBufferPools - 1); i >= 0; i--)
 		if(adapter->rx_buff_pool[i].active)
 			ibmveth_replenish_buffer_pool(adapter,
 						     &adapter->rx_buff_pool[i]);
@@ -472,6 +487,18 @@ static void ibmveth_cleanup(struct ibmve
 		if (adapter->rx_buff_pool[i].active)
 			ibmveth_free_buffer_pool(adapter,
 						 &adapter->rx_buff_pool[i]);
+
+	if (adapter->bounce_buffer != NULL) {
+		if (!dma_mapping_error(adapter->bounce_buffer_dma)) {
+			dma_unmap_single(&adapter->vdev->dev,
+					adapter->bounce_buffer_dma,
+					adapter->netdev->mtu + IBMVETH_BUFF_OH,
+					DMA_BIDIRECTIONAL);
+			adapter->bounce_buffer_dma = DMA_ERROR_CODE;
+		}
+		kfree(adapter->bounce_buffer);
+		adapter->bounce_buffer = NULL;
+	}
 }
 
 static int ibmveth_register_logical_lan(struct ibmveth_adapter *adapter,
@@ -607,6 +634,24 @@ static int ibmveth_open(struct net_devic
 		return rc;
 	}
 
+	adapter->bounce_buffer =
+	    kmalloc(netdev->mtu + IBMVETH_BUFF_OH, GFP_KERNEL);
+	if (!adapter->bounce_buffer) {
+		ibmveth_error_printk("unable to allocate bounce buffer\n");
+		ibmveth_cleanup(adapter);
+		napi_disable(&adapter->napi);
+		return -ENOMEM;
+	}
+	adapter->bounce_buffer_dma =
+	    dma_map_single(&adapter->vdev->dev, adapter->bounce_buffer,
+			   netdev->mtu + IBMVETH_BUFF_OH, DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(adapter->bounce_buffer_dma)) {
+		ibmveth_error_printk("unable to map bounce buffer\n");
+		ibmveth_cleanup(adapter);
+		napi_disable(&adapter->napi);
+		return -ENOMEM;
+	}
+
 	ibmveth_debug_printk("initial replenish cycle\n");
 	ibmveth_interrupt(netdev->irq, netdev);
 
@@ -853,10 +898,12 @@ static int ibmveth_start_xmit(struct sk_
 	unsigned int tx_packets = 0;
 	unsigned int tx_send_failed = 0;
 	unsigned int tx_map_failed = 0;
+	int used_bounce = 0;
+	unsigned long data_dma_addr;
 
 	desc.fields.flags_len = IBMVETH_BUF_VALID | skb->len;
-	desc.fields.address = dma_map_single(&adapter->vdev->dev, skb->data,
-					     skb->len, DMA_TO_DEVICE);
+	data_dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
+				       skb->len, DMA_TO_DEVICE);
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL &&
 	    ip_hdr(skb)->protocol != IPPROTO_TCP && skb_checksum_help(skb)) {
@@ -875,12 +922,16 @@ static int ibmveth_start_xmit(struct sk_
 		buf[1] = 0;
 	}
 
-	if (dma_mapping_error(desc.fields.address)) {
-		ibmveth_error_printk("tx: unable to map xmit buffer\n");
+	if (dma_mapping_error(data_dma_addr)) {
+		if (!firmware_has_feature(FW_FEATURE_CMO))
+			ibmveth_error_printk("tx: unable to map xmit buffer\n");
+		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+					  skb->len);
+		desc.fields.address = adapter->bounce_buffer_dma;
 		tx_map_failed++;
-		tx_dropped++;
-		goto out;
-	}
+		used_bounce = 1;
+	} else
+		desc.fields.address = data_dma_addr;
 
 	/* send the frame. Arbitrarily set retrycount to 1024 */
 	correlator = 0;
@@ -904,8 +955,9 @@ static int ibmveth_start_xmit(struct sk_
 		netdev->trans_start = jiffies;
 	}
 
-	dma_unmap_single(&adapter->vdev->dev, desc.fields.address,
-			 skb->len, DMA_TO_DEVICE);
+	if (!used_bounce)
+		dma_unmap_single(&adapter->vdev->dev, data_dma_addr,
+				 skb->len, DMA_TO_DEVICE);
 
 out:	spin_lock_irqsave(&adapter->stats_lock, flags);
 	netdev->stats.tx_dropped += tx_dropped;
@@ -1053,8 +1105,9 @@ static void ibmveth_set_multicast_list(s
 static int ibmveth_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct ibmveth_adapter *adapter = dev->priv;
+	struct vio_dev *viodev = adapter->vdev;
 	int new_mtu_oh = new_mtu + IBMVETH_BUFF_OH;
-	int i, rc;
+	int i;
 
 	if (new_mtu < IBMVETH_MAX_MTU)
 		return -EINVAL;
@@ -1085,10 +1138,15 @@ static int ibmveth_change_mtu(struct net
 				ibmveth_close(adapter->netdev);
 				adapter->pool_config = 0;
 				dev->mtu = new_mtu;
-				if ((rc = ibmveth_open(adapter->netdev)))
-					return rc;
-			} else
-				dev->mtu = new_mtu;
+				vio_cmo_set_dev_desired(viodev,
+						ibmveth_get_desired_dma
+						(viodev));
+				return ibmveth_open(adapter->netdev);
+			}
+			dev->mtu = new_mtu;
+			vio_cmo_set_dev_desired(viodev,
+						ibmveth_get_desired_dma
+						(viodev));
 			return 0;
 		}
 	}
@@ -1103,6 +1161,46 @@ static void ibmveth_poll_controller(stru
 }
 #endif
 
+/**
+ * ibmveth_get_desired_dma - Calculate IO memory desired by the driver
+ *
+ * @vdev: struct vio_dev for the device whose desired IO mem is to be returned
+ *
+ * Return value:
+ *	Number of bytes of IO data the driver will need to perform well.
+ */
+static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev)
+{
+	struct net_device *netdev = dev_get_drvdata(&vdev->dev);
+	struct ibmveth_adapter *adapter;
+	unsigned long ret;
+	int i;
+	int rxqentries = 1;
+
+	/* netdev inits at probe time along with the structures we need below*/
+	if (netdev == NULL)
+		return IOMMU_PAGE_ALIGN(IBMVETH_IO_ENTITLEMENT_DEFAULT);
+
+	adapter = netdev_priv(netdev);
+
+	ret = IBMVETH_BUFF_LIST_SIZE + IBMVETH_FILT_LIST_SIZE;
+	ret += IOMMU_PAGE_ALIGN(netdev->mtu);
+
+	for (i = 0; i < IbmVethNumBufferPools; i++) {
+		/* add the size of the active receive buffers */
+		if (adapter->rx_buff_pool[i].active)
+			ret +=
+			    adapter->rx_buff_pool[i].size *
+			    IOMMU_PAGE_ALIGN(adapter->rx_buff_pool[i].
+			            buff_size);
+		rxqentries += adapter->rx_buff_pool[i].size;
+	}
+	/* add the size of the receive queue entries */
+	ret += IOMMU_PAGE_ALIGN(rxqentries * sizeof(struct ibmveth_rx_q_entry));
+
+	return ret;
+}
+
 static int __devinit ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 {
 	int rc, i;
@@ -1247,6 +1345,8 @@ static int __devexit ibmveth_remove(stru
 	ibmveth_proc_unregister_adapter(adapter);
 
 	free_netdev(netdev);
+	dev_set_drvdata(&dev->dev, NULL);
+
 	return 0;
 }
 
@@ -1491,6 +1591,7 @@ static struct vio_driver ibmveth_driver 
 	.id_table	= ibmveth_device_table,
 	.probe		= ibmveth_probe,
 	.remove		= ibmveth_remove,
+	.get_desired_dma = ibmveth_get_desired_dma,
 	.driver		= {
 		.name	= ibmveth_driver_name,
 		.owner	= THIS_MODULE,
Index: b/drivers/net/ibmveth.h
===================================================================
--- a/drivers/net/ibmveth.h
+++ b/drivers/net/ibmveth.h
@@ -93,9 +93,12 @@ static inline long h_illan_attributes(un
   plpar_hcall_norets(H_CHANGE_LOGICAL_LAN_MAC, ua, mac)
 
 #define IbmVethNumBufferPools 5
+#define IBMVETH_IO_ENTITLEMENT_DEFAULT 4243456 /* MTU of 1500 needs 4.2Mb */
 #define IBMVETH_BUFF_OH 22 /* Overhead: 14 ethernet header + 8 opaque handle */
 #define IBMVETH_MAX_MTU 68
 #define IBMVETH_MAX_POOL_COUNT 4096
+#define IBMVETH_BUFF_LIST_SIZE 4096
+#define IBMVETH_FILT_LIST_SIZE 4096
 #define IBMVETH_MAX_BUF_SIZE (1024 * 128)
 
 static int pool_size[] = { 512, 1024 * 2, 1024 * 16, 1024 * 32, 1024 * 64 };
@@ -143,6 +146,8 @@ struct ibmveth_adapter {
     struct ibmveth_rx_q rx_queue;
     int pool_config;
     int rx_csum;
+    void *bounce_buffer;
+    dma_addr_t bounce_buffer_dma;
 
     /* adapter specific stats */
     u64 replenish_task_cycles;

^ permalink raw reply

* Re: [PATCH 10/16 v3] [v2] powerpc: iommu enablement for CMO
From: Robert Jennings @ 2008-07-08 20:48 UTC (permalink / raw)
  To: paulus, benh, linuxppc-dev, Brian King, Nathan Fontenot,
	David Darrington
In-Reply-To: <20080704125459.GM1310@linux.vnet.ibm.com>

From: Robert Jennings <rcj@linux.vnet.ibm.com>

Minor change to add a call to align the return from the device's
get_desired_dma() function with IOMMU_PAGE_ALIGN().  Also removed a
comment referring to a non-existent structure member.

This is a large patch but the normal code path is not affected.  For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path.  Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions.  The firmware will balance memory between the partitions
and page in/out memory as needed.  Based on the number and type of IO
adapters present, each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices
present.  The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs.  A set of dma_ops functions is added to
the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hot-plug events.
If the minimum needs cannot be met, the system boot will be halted.
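
The boot-time check described above amounts to simple arithmetic.  A minimal
sketch, assuming the spare allocation equals one minimum entitlement (as the
"VIO bus changes" section states); cmo_min_needs_met() and its parameters are
hypothetical names, and min_ent stands in for the system defined minimum.

```c
#include <stddef.h>

/*
 * Compute the minimum entitlement the bus needs: one minimum entitlement
 * per device plus one spare for hot-plug.  Returns 1 if the system
 * entitlement covers it, 0 otherwise (where the real code halts the boot).
 */
static int cmo_min_needs_met(size_t entitled, unsigned int ndevs,
			     size_t min_ent, size_t *needed)
{
	*needed = (size_t)ndevs * min_ent + min_ent;
	return entitled >= *needed;
}
```

For three devices and a minimum of 1000 bytes each, the bus needs 4000 bytes;
an entitlement of 3999 bytes would fail the check.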

Device changes

The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory.  The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.

Each device will be guaranteed a minimum entitlement that can always
be mapped.  Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this.  Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which
it can provide memory to the devices.  The bus defines two pools of memory
within this entitlement, the reserved and excess pools.  Each device is
provided with its own entitlement no less than a system defined minimum
entitlement and no greater than what the device has specified as its
desired entitlement.  The entitlement provided to devices comes from the
reserve pool.  The reserve pool can also contain a spare allocation as
large as the system defined minimum entitlement which is used for device
hot-plug events.  Any entitlement not needed to fulfill the needs of the
reserve pool is placed in the excess pool.  Each device is guaranteed
that it can map up to its entitled level; additional mappings are possible
as long as there is unmapped memory in the excess pool.
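
The reserve/excess rule can be modelled in a few lines of userspace C.  This
is a simplified sketch rather than the kernel function: the struct and
function names are hypothetical, locking and statistics are left out, and the
spare is assumed to be fully funded so the excess pool is usable.

```c
#include <stddef.h>

struct dev_model { size_t entitled, allocated; };
struct bus_model { size_t excess_free; };

/*
 * Try to account for a mapping of "size" bytes.  The device's own unused
 * entitlement is consumed first; anything beyond that comes out of the
 * shared excess pool.  Returns 0 on success, -1 if neither can cover it.
 */
static int cmo_alloc_model(struct bus_model *bus, struct dev_model *dev,
			   size_t size)
{
	size_t reserve_free = 0;
	size_t from_reserve;

	if (dev->entitled > dev->allocated)
		reserve_free = dev->entitled - dev->allocated;
	if (reserve_free + bus->excess_free < size)
		return -1;

	from_reserve = size < reserve_free ? size : reserve_free;
	dev->allocated += size;
	bus->excess_free -= size - from_reserve;
	return 0;
}
```

A device entitled to 100 bytes with 80 mapped can map 60 more when the excess
pool holds 50: 20 bytes come from its reserve and 40 from the excess pool,
leaving 10 bytes of excess.  A further 20 byte request then fails.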

Bus probe

As the system starts, each device is given an entitlement equal only
to the system defined minimum entitlement.  The reserve pool is equal
to the sum of these entitlements, plus a spare allocation.  The VIO bus
also tracks the aggregate desired entitlement of all the devices.  If the
system desired entitlement is greater than the size of the reserve pool,
when devices unmap IO memory it will be reserved and a balance operation
will be scheduled for some time in the future.

Entitlement balancing

The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with its
desired amount of entitlement.  Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory.  A fair distribution of entitlement can take several
balance operations to achieve.  Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.
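
To make the idea of chunked, fair distribution concrete, here is a
deliberately simplified userspace model.  It is not the balance function from
this patch: struct bal_dev and balance_model() are hypothetical, and the
round-robin policy is an assumption chosen for illustration.  Only the chunk
size is taken from the patch (VIO_CMO_BALANCE_CHUNK).

```c
#include <stddef.h>

#define BALANCE_CHUNK 131072 /* VIO_CMO_BALANCE_CHUNK in the patch */

struct bal_dev { size_t entitled, desired; };

/*
 * Hand out "avail" bytes round-robin, at most one chunk per device per
 * pass, until every device reaches its desired level or nothing is left.
 * Returns the bytes that could not be placed (they remain excess).
 */
static size_t balance_model(struct bal_dev *devs, size_t ndevs, size_t avail)
{
	int progress = 1;

	while (avail > 0 && progress) {
		progress = 0;
		for (size_t i = 0; i < ndevs && avail > 0; i++) {
			size_t want, give;

			if (devs[i].entitled >= devs[i].desired)
				continue;
			want = devs[i].desired - devs[i].entitled;
			give = want < BALANCE_CHUNK ? want : BALANCE_CHUNK;
			if (give > avail)
				give = avail;
			devs[i].entitled += give;
			avail -= give;
			progress = 1;
		}
	}
	return avail;
}
```

Handing out one chunk at a time keeps a single device with a large desired
value from absorbing all available entitlement before the others are served.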

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'.  When devices are added the hot-plug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.

The following changes are made to the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has
   memory entitlement query function defined.  Fail if function not defined.
 * fail to register driver if io entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations
   and return DMA failures once entitlement is reached.  Entitlement will be
   limited by overall system entitlement.  Devices will have a reserved
   quantity of memory that is guaranteed, the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the
   allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run time
   for devices as an exported function and sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hot-plug

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

---
 arch/powerpc/kernel/vio.c | 1024 ++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-powerpc/vio.h |   27 +
 2 files changed, 1043 insertions(+), 8 deletions(-)

Index: b/arch/powerpc/kernel/vio.c
===================================================================
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1,11 +1,12 @@
 /*
  * IBM PowerPC Virtual I/O Infrastructure Support.
  *
- *    Copyright (c) 2003-2005 IBM Corp.
+ *    Copyright (c) 2003,2008 IBM Corp.
  *     Dave Engebretsen engebret@us.ibm.com
  *     Santiago Leon santil@us.ibm.com
  *     Hollis Blanchard <hollisb@us.ibm.com>
  *     Stephen Rothwell
+ *     Robert Jennings <rcjenn@us.ibm.com>
  *
  *      This program is free software; you can redistribute it and/or
  *      modify it under the terms of the GNU General Public License
@@ -46,6 +47,987 @@ static struct vio_dev vio_bus_device  =
 	.dev.bus = &vio_bus_type,
 };
 
+#ifdef CONFIG_PPC_PSERIES
+/**
+ * vio_cmo_pool - A pool of IO memory for CMO use
+ *
+ * @size: The size of the pool in bytes
+ * @free: The amount of free memory in the pool
+ */
+struct vio_cmo_pool {
+	size_t size;
+	size_t free;
+};
+
+/* How many ms to delay queued balance work */
+#define VIO_CMO_BALANCE_DELAY 100
+
+/* Portion out IO memory to CMO devices by this chunk size */
+#define VIO_CMO_BALANCE_CHUNK 131072
+
+/**
+ * vio_cmo_dev_entry - A device that is CMO-enabled and requires entitlement
+ *
+ * @vio_dev: struct vio_dev pointer
+ * @list: pointer to other devices on bus that are being tracked
+ */
+struct vio_cmo_dev_entry {
+	struct vio_dev *viodev;
+	struct list_head list;
+};
+
+/**
+ * vio_cmo - VIO bus accounting structure for CMO entitlement
+ *
+ * @lock: spinlock for entire structure
+ * @balance_q: work queue for balancing system entitlement
+ * @device_list: list of CMO-enabled devices requiring entitlement
+ * @entitled: total system entitlement in bytes
+ * @reserve: pool of memory from which devices reserve entitlement, incl. spare
+ * @excess: pool of excess entitlement not needed for device reserves or spare
+ * @spare: IO memory for device hotplug functionality
+ * @min: minimum necessary for system operation
+ * @desired: desired memory for system operation
+ * @curr: bytes currently allocated
+ * @high: high water mark for IO data usage
+ */
+struct vio_cmo {
+	spinlock_t lock;
+	struct delayed_work balance_q;
+	struct list_head device_list;
+	size_t entitled;
+	struct vio_cmo_pool reserve;
+	struct vio_cmo_pool excess;
+	size_t spare;
+	size_t min;
+	size_t desired;
+	size_t curr;
+	size_t high;
+} vio_cmo;
+
+/**
+ * vio_cmo_OF_devices - Count the number of OF devices that have DMA windows
+ */
+static int vio_cmo_num_OF_devs(void)
+{
+	struct device_node *node_vroot;
+	int count = 0;
+
+	/*
+	 * Count the number of vdevice entries with an
+	 * ibm,my-dma-window OF property
+	 */
+	node_vroot = of_find_node_by_name(NULL, "vdevice");
+	if (node_vroot) {
+		struct device_node *of_node;
+		struct property *prop;
+
+		for_each_child_of_node(node_vroot, of_node) {
+			prop = of_find_property(of_node, "ibm,my-dma-window",
+			                       NULL);
+			if (prop)
+				count++;
+		}
+	}
+	of_node_put(node_vroot);
+	return count;
+}
+
+/**
+ * vio_cmo_alloc - allocate IO memory for CMO-enable devices
+ *
+ * @viodev: VIO device requesting IO memory
+ * @size: size of allocation requested
+ *
+ * Allocations come from memory reserved for the devices and any excess
+ * IO memory available to all devices.  The spare pool used to service
+ * hotplug must be equal to %VIO_CMO_MIN_ENT for the excess pool to be
+ * made available.
+ *
+ * Return codes:
+ *  0 for successful allocation and -ENOMEM for a failure
+ */
+static inline int vio_cmo_alloc(struct vio_dev *viodev, size_t size)
+{
+	unsigned long flags;
+	size_t reserve_free = 0;
+	size_t excess_free = 0;
+	int ret = -ENOMEM;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+
+	/* Determine the amount of free entitlement available in reserve */
+	if (viodev->cmo.entitled > viodev->cmo.allocated)
+		reserve_free = viodev->cmo.entitled - viodev->cmo.allocated;
+
+	/* If spare is not fulfilled, the excess pool can not be used. */
+	if (vio_cmo.spare >= VIO_CMO_MIN_ENT)
+		excess_free = vio_cmo.excess.free;
+
+	/* The request can be satisfied */
+	if ((reserve_free + excess_free) >= size) {
+		vio_cmo.curr += size;
+		if (vio_cmo.curr > vio_cmo.high)
+			vio_cmo.high = vio_cmo.curr;
+		viodev->cmo.allocated += size;
+		size -= min(reserve_free, size);
+		vio_cmo.excess.free -= size;
+		ret = 0;
+	}
+
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+	return ret;
+}
+
+/**
+ * vio_cmo_dealloc - deallocate IO memory from CMO-enabled devices
+ * @viodev: VIO device freeing IO memory
+ * @size: size of deallocation
+ *
+ * IO memory is freed by the device back to the correct memory pools.
+ * The spare pool is replenished first from either memory pool, then
+ * the reserve pool is used to reduce device entitlement, the excess
+ * pool is used to increase the reserve pool toward the desired entitlement
+ * target, and then the remaining memory is returned to the pools.
+ *
+ */
+static inline void vio_cmo_dealloc(struct vio_dev *viodev, size_t size)
+{
+	unsigned long flags;
+	size_t spare_needed = 0;
+	size_t excess_freed = 0;
+	size_t reserve_freed = size;
+	size_t tmp;
+	int balance = 0;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+	vio_cmo.curr -= size;
+
+	/* Amount of memory freed from the excess pool */
+	if (viodev->cmo.allocated > viodev->cmo.entitled) {
+		excess_freed = min(reserve_freed, (viodev->cmo.allocated -
+		                                   viodev->cmo.entitled));
+		reserve_freed -= excess_freed;
+	}
+
+	/* Remove allocation from device */
+	viodev->cmo.allocated -= (reserve_freed + excess_freed);
+
+	/* Spare is a subset of the reserve pool, replenish it first. */
+	spare_needed = VIO_CMO_MIN_ENT - vio_cmo.spare;
+
+	/*
+	 * Replenish the spare in the reserve pool from the excess pool.
+	 * This moves entitlement into the reserve pool.
+	 */
+	if (spare_needed && excess_freed) {
+		tmp = min(excess_freed, spare_needed);
+		vio_cmo.excess.size -= tmp;
+		vio_cmo.reserve.size += tmp;
+		vio_cmo.spare += tmp;
+		excess_freed -= tmp;
+		spare_needed -= tmp;
+		balance = 1;
+	}
+
+	/*
+	 * Replenish the spare in the reserve pool from the reserve pool.
+	 * This removes entitlement from the device down to VIO_CMO_MIN_ENT,
+	 * if needed, and gives it to the spare pool. The amount of used
+	 * memory in this pool does not change.
+	 */
+	if (spare_needed && reserve_freed) {
+		tmp = min(spare_needed, min(reserve_freed,
+		                            (viodev->cmo.entitled -
+		                             VIO_CMO_MIN_ENT)));
+
+		vio_cmo.spare += tmp;
+		viodev->cmo.entitled -= tmp;
+		reserve_freed -= tmp;
+		spare_needed -= tmp;
+		balance = 1;
+	}
+
+	/*
+	 * Increase the reserve pool until the desired allocation is met.
+	 * Move an allocation freed from the excess pool into the reserve
+	 * pool and schedule a balance operation.
+	 */
+	if (excess_freed && (vio_cmo.desired > vio_cmo.reserve.size)) {
+		tmp = min(excess_freed, (vio_cmo.desired - vio_cmo.reserve.size));
+
+		vio_cmo.excess.size -= tmp;
+		vio_cmo.reserve.size += tmp;
+		excess_freed -= tmp;
+		balance = 1;
+	}
+
+	/* Return memory from the excess pool to that pool */
+	if (excess_freed)
+		vio_cmo.excess.free += excess_freed;
+
+	if (balance)
+		schedule_delayed_work(&vio_cmo.balance_q, VIO_CMO_BALANCE_DELAY);
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+}
+
+/**
+ * vio_cmo_entitlement_update - Manage system entitlement changes
+ *
+ * @new_entitlement: new system entitlement to attempt to accommodate
+ *
+ * Increases in entitlement will be used to fulfill the spare entitlement
+ * and the rest is given to the excess pool.  Decreases, if they are
+ * possible, come from the excess pool and from unused device entitlement
+ *
+ * Returns: 0 on success, -ENOMEM when change can not be made
+ */
+int vio_cmo_entitlement_update(size_t new_entitlement)
+{
+	struct vio_dev *viodev;
+	struct vio_cmo_dev_entry *dev_ent;
+	unsigned long flags;
+	size_t avail, delta, tmp;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+
+	/* Entitlement increases */
+	if (new_entitlement > vio_cmo.entitled) {
+		delta = new_entitlement - vio_cmo.entitled;
+
+		/* Fulfill spare allocation */
+		if (vio_cmo.spare < VIO_CMO_MIN_ENT) {
+			tmp = min(delta, (VIO_CMO_MIN_ENT - vio_cmo.spare));
+			vio_cmo.spare += tmp;
+			vio_cmo.reserve.size += tmp;
+			delta -= tmp;
+		}
+
+		/* Remaining new allocation goes to the excess pool */
+		vio_cmo.entitled += delta;
+		vio_cmo.excess.size += delta;
+		vio_cmo.excess.free += delta;
+
+		goto out;
+	}
+
+	/* Entitlement decreases */
+	delta = vio_cmo.entitled - new_entitlement;
+	avail = vio_cmo.excess.free;
+
+	/*
+	 * Need to check how much unused entitlement each device can
+	 * sacrifice to fulfill entitlement change.
+	 */
+	list_for_each_entry(dev_ent, &vio_cmo.device_list, list) {
+		if (avail >= delta)
+			break;
+
+		viodev = dev_ent->viodev;
+		if ((viodev->cmo.entitled > viodev->cmo.allocated) &&
+		    (viodev->cmo.entitled > VIO_CMO_MIN_ENT))
+				avail += viodev->cmo.entitled -
+				         max_t(size_t, viodev->cmo.allocated,
+				               VIO_CMO_MIN_ENT);
+	}
+
+	if (delta <= avail) {
+		vio_cmo.entitled -= delta;
+
+		/* Take entitlement from the excess pool first */
+		tmp = min(vio_cmo.excess.free, delta);
+		vio_cmo.excess.size -= tmp;
+		vio_cmo.excess.free -= tmp;
+		delta -= tmp;
+
+		/*
+		 * Remove all but VIO_CMO_MIN_ENT bytes from devices
+		 * until entitlement change is served
+		 */
+		list_for_each_entry(dev_ent, &vio_cmo.device_list, list) {
+			if (!delta)
+				break;
+
+			viodev = dev_ent->viodev;
+			tmp = 0;
+			if ((viodev->cmo.entitled > viodev->cmo.allocated) &&
+			    (viodev->cmo.entitled > VIO_CMO_MIN_ENT))
+				tmp = viodev->cmo.entitled -
+				      max_t(size_t, viodev->cmo.allocated,
+				            VIO_CMO_MIN_ENT);
+			viodev->cmo.entitled -= min(tmp, delta);
+			delta -= min(tmp, delta);
+		}
+	} else {
+		spin_unlock_irqrestore(&vio_cmo.lock, flags);
+		return -ENOMEM;
+	}
+
+out:
+	schedule_delayed_work(&vio_cmo.balance_q, 0);
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+	return 0;
+}
+
+/**
+ * vio_cmo_balance - Balance entitlement among devices
+ *
+ * @work: work queue structure for this operation
+ *
+ * Any system entitlement above the minimum needed for devices, or
+ * already allocated to devices, can be distributed to the devices.
+ * The list of devices is iterated through to recalculate the desired
+ * entitlement level and to determine how much entitlement above the
+ * minimum entitlement is allocated to devices.
+ *
+ * Small chunks of the available entitlement are given to devices until
+ * their requirements are fulfilled or there is no entitlement left to give.
+ * Upon completion sizes of the reserve and excess pools are calculated.
+ *
+ * The system minimum entitlement level is also recalculated here.
+ * Entitlement will be reserved for devices even after vio_bus_remove to
+ * accommodate reloading the driver.  The OF tree is walked to count the
+ * number of devices present and this will remove entitlement for devices
+ * that have actually left the system after having vio_bus_remove called.
+ */
+static void vio_cmo_balance(struct work_struct *work)
+{
+	struct vio_cmo *cmo;
+	struct vio_dev *viodev;
+	struct vio_cmo_dev_entry *dev_ent;
+	unsigned long flags;
+	size_t avail = 0, level, chunk, need;
+	int devcount = 0, fulfilled;
+
+	cmo = container_of(work, struct vio_cmo, balance_q.work);
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+
+	/* Calculate minimum entitlement and fulfill spare */
+	cmo->min = vio_cmo_num_OF_devs() * VIO_CMO_MIN_ENT;
+	BUG_ON(cmo->min > cmo->entitled);
+	cmo->spare = min_t(size_t, VIO_CMO_MIN_ENT, (cmo->entitled - cmo->min));
+	cmo->min += cmo->spare;
+	cmo->desired = cmo->min;
+
+	/*
+	 * Determine how much entitlement is available and reset device
+	 * entitlements
+	 */
+	avail = cmo->entitled - cmo->spare;
+	list_for_each_entry(dev_ent, &vio_cmo.device_list, list) {
+		viodev = dev_ent->viodev;
+		devcount++;
+		viodev->cmo.entitled = VIO_CMO_MIN_ENT;
+		cmo->desired += (viodev->cmo.desired - VIO_CMO_MIN_ENT);
+		avail -= max_t(size_t, viodev->cmo.allocated, VIO_CMO_MIN_ENT);
+	}
+
+	/*
+	 * Having provided each device with the minimum entitlement, loop
+	 * over the devices portioning out the remaining entitlement
+	 * until there is nothing left.
+	 */
+	level = VIO_CMO_MIN_ENT;
+	while (avail) {
+		fulfilled = 0;
+		list_for_each_entry(dev_ent, &vio_cmo.device_list, list) {
+			viodev = dev_ent->viodev;
+
+			if (viodev->cmo.desired <= level) {
+				fulfilled++;
+				continue;
+			}
+
+			/*
+			 * Give the device up to VIO_CMO_BALANCE_CHUNK
+			 * bytes of entitlement, but do not exceed the
+			 * desired level of entitlement for the device.
+			 */
+			chunk = min_t(size_t, avail, VIO_CMO_BALANCE_CHUNK);
+			chunk = min(chunk, (viodev->cmo.desired -
+			                    viodev->cmo.entitled));
+			viodev->cmo.entitled += chunk;
+
+			/*
+			 * If the memory for this entitlement increase was
+			 * already allocated to the device it does not come
+			 * from the available pool being portioned out.
+			 */
+			need = max(viodev->cmo.allocated, viodev->cmo.entitled)-
+			       max(viodev->cmo.allocated, level);
+			avail -= need;
+
+		}
+		if (fulfilled == devcount)
+			break;
+		level += VIO_CMO_BALANCE_CHUNK;
+	}
+
+	/* Calculate new reserve and excess pool sizes */
+	cmo->reserve.size = cmo->min;
+	cmo->excess.free = 0;
+	cmo->excess.size = 0;
+	need = 0;
+	list_for_each_entry(dev_ent, &vio_cmo.device_list, list) {
+		viodev = dev_ent->viodev;
+		/* Calculated reserve size above the minimum entitlement */
+		if (viodev->cmo.entitled)
+			cmo->reserve.size += (viodev->cmo.entitled -
+			                      VIO_CMO_MIN_ENT);
+		/* Calculated used excess entitlement */
+		if (viodev->cmo.allocated > viodev->cmo.entitled)
+			need += viodev->cmo.allocated - viodev->cmo.entitled;
+	}
+	cmo->excess.size = cmo->entitled - cmo->reserve.size;
+	cmo->excess.free = cmo->excess.size - need;
+
+	cancel_delayed_work(container_of(work, struct delayed_work, work));
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+}
+
+static void *vio_dma_iommu_alloc_coherent(struct device *dev, size_t size,
+                                          dma_addr_t *dma_handle, gfp_t flag)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	void *ret;
+
+	if (vio_cmo_alloc(viodev, roundup(size, PAGE_SIZE))) {
+		atomic_inc(&viodev->cmo.allocs_failed);
+		return NULL;
+	}
+
+	ret = dma_iommu_ops.alloc_coherent(dev, size, dma_handle, flag);
+	if (unlikely(ret == NULL)) {
+		vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
+		atomic_inc(&viodev->cmo.allocs_failed);
+	}
+
+	return ret;
+}
+
+static void vio_dma_iommu_free_coherent(struct device *dev, size_t size,
+                                        void *vaddr, dma_addr_t dma_handle)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+
+	dma_iommu_ops.free_coherent(dev, size, vaddr, dma_handle);
+
+	vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
+}
+
+static dma_addr_t vio_dma_iommu_map_single(struct device *dev, void *vaddr,
+                                           size_t size,
+                                           enum dma_data_direction direction)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	dma_addr_t ret = DMA_ERROR_CODE;
+
+	if (vio_cmo_alloc(viodev, roundup(size, PAGE_SIZE))) {
+		atomic_inc(&viodev->cmo.allocs_failed);
+		return ret;
+	}
+
+	ret = dma_iommu_ops.map_single(dev, vaddr, size, direction);
+	if (unlikely(dma_mapping_error(ret))) {
+		vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
+		atomic_inc(&viodev->cmo.allocs_failed);
+	}
+
+	return ret;
+}
+
+static void vio_dma_iommu_unmap_single(struct device *dev,
+		dma_addr_t dma_handle, size_t size,
+		enum dma_data_direction direction)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+
+	dma_iommu_ops.unmap_single(dev, dma_handle, size, direction);
+
+	vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
+}
+
+static int vio_dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
+                                int nelems, enum dma_data_direction direction)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	struct scatterlist *sgl;
+	int ret, count = 0;
+	size_t alloc_size = 0;
+
+	for (sgl = sglist; count < nelems; count++, sgl++)
+		alloc_size += roundup(sgl->length, PAGE_SIZE);
+
+	if (vio_cmo_alloc(viodev, alloc_size)) {
+		atomic_inc(&viodev->cmo.allocs_failed);
+		return 0;
+	}
+
+	ret = dma_iommu_ops.map_sg(dev, sglist, nelems, direction);
+
+	if (unlikely(!ret)) {
+		vio_cmo_dealloc(viodev, alloc_size);
+		atomic_inc(&viodev->cmo.allocs_failed);
+	}
+
+	return ret;
+}
+
+static void vio_dma_iommu_unmap_sg(struct device *dev,
+		struct scatterlist *sglist, int nelems,
+		enum dma_data_direction direction)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	struct scatterlist *sgl;
+	size_t alloc_size = 0;
+	int count = 0;
+
+	for (sgl = sglist; count < nelems; count++, sgl++)
+		alloc_size += roundup(sgl->length, PAGE_SIZE);
+
+	dma_iommu_ops.unmap_sg(dev, sglist, nelems, direction);
+
+	vio_cmo_dealloc(viodev, alloc_size);
+}
+
+struct dma_mapping_ops vio_dma_mapping_ops = {
+	.alloc_coherent = vio_dma_iommu_alloc_coherent,
+	.free_coherent  = vio_dma_iommu_free_coherent,
+	.map_single     = vio_dma_iommu_map_single,
+	.unmap_single   = vio_dma_iommu_unmap_single,
+	.map_sg         = vio_dma_iommu_map_sg,
+	.unmap_sg       = vio_dma_iommu_unmap_sg,
+};
+
+/**
+ * vio_cmo_set_dev_desired - Set desired entitlement for a device
+ *
+ * @viodev: struct vio_dev for device to alter
+ * @desired: new desired entitlement level in bytes
+ *
+ * For use by devices to request a change to their entitlement at runtime or
+ * through sysfs.  The desired entitlement level is changed and a balancing
+ * of system resources is scheduled to run in the future.
+ */
+void vio_cmo_set_dev_desired(struct vio_dev *viodev, size_t desired)
+{
+	unsigned long flags;
+	struct vio_cmo_dev_entry *dev_ent;
+	int found = 0;
+
+	if (!firmware_has_feature(FW_FEATURE_CMO))
+		return;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+	if (desired < VIO_CMO_MIN_ENT)
+		desired = VIO_CMO_MIN_ENT;
+
+	/*
+	 * Changes will not be made for devices not in the device list.
+	 * If it is not in the device list, then no driver is loaded
+	 * for the device and it can not receive entitlement.
+	 */
+	list_for_each_entry(dev_ent, &vio_cmo.device_list, list)
+		if (viodev == dev_ent->viodev) {
+			found = 1;
+			break;
+		}
+	if (!found)
+		return;
+
+	/* Increase/decrease in desired device entitlement */
+	if (desired >= viodev->cmo.desired) {
+		/* Just bump the bus and device values prior to a balance */
+		vio_cmo.desired += desired - viodev->cmo.desired;
+		viodev->cmo.desired = desired;
+	} else {
+		/* Decrease bus and device values for desired entitlement */
+		vio_cmo.desired -= viodev->cmo.desired - desired;
+		viodev->cmo.desired = desired;
+		/*
+		 * If less entitlement is desired than current entitlement, move
+		 * any reserve memory in the change region to the excess pool.
+		 */
+		if (viodev->cmo.entitled > desired) {
+			vio_cmo.reserve.size -= viodev->cmo.entitled - desired;
+			vio_cmo.excess.size += viodev->cmo.entitled - desired;
+			/*
+			 * If entitlement moving from the reserve pool to the
+			 * excess pool is currently unused, add to the excess
+			 * free counter.
+			 */
+			if (viodev->cmo.allocated < viodev->cmo.entitled)
+				vio_cmo.excess.free += viodev->cmo.entitled -
+				                       max(viodev->cmo.allocated, desired);
+			viodev->cmo.entitled = desired;
+		}
+	}
+	schedule_delayed_work(&vio_cmo.balance_q, 0);
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+}
+
+/**
+ * vio_cmo_bus_probe - Handle CMO specific bus probe activities
+ *
+ * @viodev: pointer to struct vio_dev for device
+ *
+ * Determine the device's IO memory entitlement needs, attempting
+ * to satisfy the system minimum entitlement at first and scheduling
+ * a balance operation to take care of the rest at a later time.
+ *
+ * Returns: 0 on success, -EINVAL when device doesn't support CMO, and
+ *          -ENOMEM when entitlement is not available for device or
+ *          device entry.
+ *
+ */
+static int vio_cmo_bus_probe(struct vio_dev *viodev)
+{
+	struct vio_cmo_dev_entry *dev_ent;
+	struct device *dev = &viodev->dev;
+	struct vio_driver *viodrv = to_vio_driver(dev->driver);
+	unsigned long flags;
+	size_t size;
+
+	/*
+	 * Check to see that device has a DMA window and configure
+	 * entitlement for the device.
+	 */
+	if (of_get_property(viodev->dev.archdata.of_node,
+	                    "ibm,my-dma-window", NULL)) {
+		/* Check that the driver is CMO enabled and get desired DMA */
+		if (!viodrv->get_desired_dma) {
+			dev_err(dev, "%s: device driver does not support CMO\n",
+			        __func__);
+			return -EINVAL;
+		}
+
+		viodev->cmo.desired = IOMMU_PAGE_ALIGN(viodrv->get_desired_dma(viodev));
+		if (viodev->cmo.desired < VIO_CMO_MIN_ENT)
+			viodev->cmo.desired = VIO_CMO_MIN_ENT;
+		size = VIO_CMO_MIN_ENT;
+
+		dev_ent = kmalloc(sizeof(struct vio_cmo_dev_entry),
+		                  GFP_KERNEL);
+		if (!dev_ent)
+			return -ENOMEM;
+
+		dev_ent->viodev = viodev;
+		spin_lock_irqsave(&vio_cmo.lock, flags);
+		list_add(&dev_ent->list, &vio_cmo.device_list);
+	} else {
+		viodev->cmo.desired = 0;
+		size = 0;
+		spin_lock_irqsave(&vio_cmo.lock, flags);
+	}
+
+	/*
+	 * If the needs for vio_cmo.min have not changed since they
+	 * were last set, the number of devices in the OF tree has
+	 * been constant and the IO memory for this is already in
+	 * the reserve pool.
+	 */
+	if (vio_cmo.min == ((vio_cmo_num_OF_devs() + 1) *
+	                    VIO_CMO_MIN_ENT)) {
+		/* Updated desired entitlement if device requires it */
+		if (size)
+			vio_cmo.desired += (viodev->cmo.desired -
+		                        VIO_CMO_MIN_ENT);
+	} else {
+		size_t tmp;
+
+		tmp = vio_cmo.spare + vio_cmo.excess.free;
+		if (tmp < size) {
+			dev_err(dev, "%s: insufficient free "
+			        "entitlement to add device. "
+			        "Need %lu, have %lu\n", __func__,
+				size, tmp);
+			spin_unlock_irqrestore(&vio_cmo.lock, flags);
+			return -ENOMEM;
+		}
+
+		/* Use excess pool first to fulfill request */
+		tmp = min(size, vio_cmo.excess.free);
+		vio_cmo.excess.free -= tmp;
+		vio_cmo.excess.size -= tmp;
+		vio_cmo.reserve.size += tmp;
+
+		/* Use spare if excess pool was insufficient */
+		vio_cmo.spare -= size - tmp;
+
+		/* Update bus accounting */
+		vio_cmo.min += size;
+		vio_cmo.desired += viodev->cmo.desired;
+	}
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+	return 0;
+}
+
+/**
+ * vio_cmo_bus_remove - Handle CMO specific bus removal activities
+ *
+ * @viodev: pointer to struct vio_dev for device
+ *
+ * Remove the device from the cmo device list.  The minimum entitlement
+ * will be reserved for the device as long as it is in the system.  The
+ * rest of the entitlement the device had been allocated will be returned
+ * to the system.
+ */
+static void vio_cmo_bus_remove(struct vio_dev *viodev)
+{
+	struct vio_cmo_dev_entry *dev_ent;
+	unsigned long flags;
+	size_t tmp;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+	if (viodev->cmo.allocated) {
+		dev_err(&viodev->dev, "%s: device had %lu bytes of IO "
+		        "allocated after remove operation.\n",
+		        __func__, viodev->cmo.allocated);
+		BUG();
+	}
+
+	/*
+	 * Remove the device from the device list being maintained for
+	 * CMO enabled devices.
+	 */
+	list_for_each_entry(dev_ent, &vio_cmo.device_list, list)
+		if (viodev == dev_ent->viodev) {
+			list_del(&dev_ent->list);
+			kfree(dev_ent);
+			break;
+		}
+
+	/*
+	 * Devices may not require any entitlement and they do not need
+	 * to be processed.  Otherwise, return the device's entitlement
+	 * back to the pools.
+	 */
+	if (viodev->cmo.entitled) {
+		/*
+		 * This device has not yet left the OF tree, its
+		 * minimum entitlement remains in vio_cmo.min and
+		 * vio_cmo.desired
+		 */
+		vio_cmo.desired -= (viodev->cmo.desired - VIO_CMO_MIN_ENT);
+
+		/*
+		 * Save min allocation for device in reserve as long
+		 * as it exists in OF tree as determined by later
+		 * balance operation
+		 */
+		viodev->cmo.entitled -= VIO_CMO_MIN_ENT;
+
+		/* Replenish spare from freed reserve pool */
+		if (viodev->cmo.entitled && (vio_cmo.spare < VIO_CMO_MIN_ENT)) {
+			tmp = min(viodev->cmo.entitled, (VIO_CMO_MIN_ENT -
+			                                 vio_cmo.spare));
+			vio_cmo.spare += tmp;
+			viodev->cmo.entitled -= tmp;
+		}
+
+		/* Remaining reserve goes to excess pool */
+		vio_cmo.excess.size += viodev->cmo.entitled;
+		vio_cmo.excess.free += viodev->cmo.entitled;
+		vio_cmo.reserve.size -= viodev->cmo.entitled;
+
+		/*
+		 * Until the device is removed it will keep a
+		 * minimum entitlement; this will guarantee that
+		 * a module unload/load will result in a success.
+		 */
+		viodev->cmo.entitled = VIO_CMO_MIN_ENT;
+		viodev->cmo.desired = VIO_CMO_MIN_ENT;
+		atomic_set(&viodev->cmo.allocs_failed, 0);
+	}
+
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+}
+
+static void vio_cmo_set_dma_ops(struct vio_dev *viodev)
+{
+	vio_dma_mapping_ops.dma_supported = dma_iommu_ops.dma_supported;
+	viodev->dev.archdata.dma_ops = &vio_dma_mapping_ops;
+}
+
+/**
+ * vio_cmo_bus_init - CMO entitlement initialization at bus init time
+ *
+ * Set up the reserve and excess entitlement pools based on available
+ * system entitlement and the number of devices in the OF tree that
+ * require entitlement in the reserve pool.
+ */
+static void vio_cmo_bus_init(void)
+{
+	struct hvcall_mpp_data mpp_data;
+	int err;
+
+	memset(&vio_cmo, 0, sizeof(struct vio_cmo));
+	spin_lock_init(&vio_cmo.lock);
+	INIT_LIST_HEAD(&vio_cmo.device_list);
+	INIT_DELAYED_WORK(&vio_cmo.balance_q, vio_cmo_balance);
+
+	/* Get current system entitlement */
+	err = h_get_mpp(&mpp_data);
+
+	/*
+	 * On failure, continue with entitlement set to 0, will panic()
+	 * later when spare is reserved.
+	 */
+	if (err != H_SUCCESS) {
+		printk(KERN_ERR "%s: unable to determine system IO "\
+		       "entitlement. (%d)\n", __func__, err);
+		vio_cmo.entitled = 0;
+	} else {
+		vio_cmo.entitled = mpp_data.entitled_mem;
+	}
+
+	/* Set reservation and check against entitlement */
+	vio_cmo.spare = VIO_CMO_MIN_ENT;
+	vio_cmo.reserve.size = vio_cmo.spare;
+	vio_cmo.reserve.size += (vio_cmo_num_OF_devs() *
+	                         VIO_CMO_MIN_ENT);
+	if (vio_cmo.reserve.size > vio_cmo.entitled) {
+		printk(KERN_ERR "%s: insufficient system entitlement\n",
+		       __func__);
+		panic("%s: Insufficient system entitlement", __func__);
+	}
+
+	/* Set the remaining accounting variables */
+	vio_cmo.excess.size = vio_cmo.entitled - vio_cmo.reserve.size;
+	vio_cmo.excess.free = vio_cmo.excess.size;
+	vio_cmo.min = vio_cmo.reserve.size;
+	vio_cmo.desired = vio_cmo.reserve.size;
+}
+
+/* sysfs device functions and data structures for CMO */
+
+#define viodev_cmo_rd_attr(name)                                        \
+static ssize_t viodev_cmo_##name##_show(struct device *dev,             \
+                                        struct device_attribute *attr,  \
+                                         char *buf)                     \
+{                                                                       \
+	return sprintf(buf, "%lu\n", to_vio_dev(dev)->cmo.name);        \
+}
+
+static ssize_t viodev_cmo_allocs_failed_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	return sprintf(buf, "%d\n", atomic_read(&viodev->cmo.allocs_failed));
+}
+
+static ssize_t viodev_cmo_allocs_failed_reset(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	atomic_set(&viodev->cmo.allocs_failed, 0);
+	return count;
+}
+
+static ssize_t viodev_cmo_desired_set(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct vio_dev *viodev = to_vio_dev(dev);
+	size_t new_desired;
+	int ret;
+
+	ret = strict_strtoul(buf, 10, &new_desired);
+	if (ret)
+		return ret;
+
+	vio_cmo_set_dev_desired(viodev, new_desired);
+	return count;
+}
+
+viodev_cmo_rd_attr(desired);
+viodev_cmo_rd_attr(entitled);
+viodev_cmo_rd_attr(allocated);
+
+static ssize_t name_show(struct device *, struct device_attribute *, char *);
+static ssize_t devspec_show(struct device *, struct device_attribute *, char *);
+static struct device_attribute vio_cmo_dev_attrs[] = {
+	__ATTR_RO(name),
+	__ATTR_RO(devspec),
+	__ATTR(cmo_desired,       S_IWUSR|S_IRUSR|S_IWGRP|S_IRGRP|S_IROTH,
+	       viodev_cmo_desired_show, viodev_cmo_desired_set),
+	__ATTR(cmo_entitled,      S_IRUGO, viodev_cmo_entitled_show,      NULL),
+	__ATTR(cmo_allocated,     S_IRUGO, viodev_cmo_allocated_show,     NULL),
+	__ATTR(cmo_allocs_failed, S_IWUSR|S_IRUSR|S_IWGRP|S_IRGRP|S_IROTH,
+	       viodev_cmo_allocs_failed_show, viodev_cmo_allocs_failed_reset),
+	__ATTR_NULL
+};
+
+/* sysfs bus functions and data structures for CMO */
+
+#define viobus_cmo_rd_attr(name)                                        \
+static ssize_t                                                          \
+viobus_cmo_##name##_show(struct bus_type *bt, char *buf)                \
+{                                                                       \
+	return sprintf(buf, "%lu\n", vio_cmo.name);                     \
+}
+
+#define viobus_cmo_pool_rd_attr(name, var)                              \
+static ssize_t                                                          \
+viobus_cmo_##name##_pool_show_##var(struct bus_type *bt, char *buf)     \
+{                                                                       \
+	return sprintf(buf, "%lu\n", vio_cmo.name.var);                 \
+}
+
+static ssize_t viobus_cmo_high_reset(struct bus_type *bt, const char *buf,
+                                     size_t count)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&vio_cmo.lock, flags);
+	vio_cmo.high = vio_cmo.curr;
+	spin_unlock_irqrestore(&vio_cmo.lock, flags);
+
+	return count;
+}
+
+viobus_cmo_rd_attr(entitled);
+viobus_cmo_pool_rd_attr(reserve, size);
+viobus_cmo_pool_rd_attr(excess, size);
+viobus_cmo_pool_rd_attr(excess, free);
+viobus_cmo_rd_attr(spare);
+viobus_cmo_rd_attr(min);
+viobus_cmo_rd_attr(desired);
+viobus_cmo_rd_attr(curr);
+viobus_cmo_rd_attr(high);
+
+static struct bus_attribute vio_cmo_bus_attrs[] = {
+	__ATTR(cmo_entitled, S_IRUGO, viobus_cmo_entitled_show, NULL),
+	__ATTR(cmo_reserve_size, S_IRUGO, viobus_cmo_reserve_pool_show_size, NULL),
+	__ATTR(cmo_excess_size, S_IRUGO, viobus_cmo_excess_pool_show_size, NULL),
+	__ATTR(cmo_excess_free, S_IRUGO, viobus_cmo_excess_pool_show_free, NULL),
+	__ATTR(cmo_spare,   S_IRUGO, viobus_cmo_spare_show,   NULL),
+	__ATTR(cmo_min,     S_IRUGO, viobus_cmo_min_show,     NULL),
+	__ATTR(cmo_desired, S_IRUGO, viobus_cmo_desired_show, NULL),
+	__ATTR(cmo_curr,    S_IRUGO, viobus_cmo_curr_show,    NULL),
+	__ATTR(cmo_high,    S_IWUSR|S_IRUSR|S_IWGRP|S_IRGRP|S_IROTH,
+	       viobus_cmo_high_show, viobus_cmo_high_reset),
+	__ATTR_NULL
+};
+
+static void vio_cmo_sysfs_init(void)
+{
+	vio_bus_type.dev_attrs = vio_cmo_dev_attrs;
+	vio_bus_type.bus_attrs = vio_cmo_bus_attrs;
+}
+#else /* CONFIG_PPC_PSERIES */
+/* Dummy functions for iSeries platform */
+int vio_cmo_entitlement_update(size_t new_entitlement) { return 0; }
+void vio_cmo_set_dev_desired(struct vio_dev *viodev, size_t desired) {}
+static int vio_cmo_bus_probe(struct vio_dev *viodev) { return 0; }
+static void vio_cmo_bus_remove(struct vio_dev *viodev) {}
+static void vio_cmo_set_dma_ops(struct vio_dev *viodev) {}
+static void vio_cmo_bus_init(void) {}
+static void vio_cmo_sysfs_init(void) { }
+#endif /* CONFIG_PPC_PSERIES */
+EXPORT_SYMBOL(vio_cmo_entitlement_update);
+EXPORT_SYMBOL(vio_cmo_set_dev_desired);
+
 static struct iommu_table *vio_build_iommu_table(struct vio_dev *dev)
 {
 	const unsigned char *dma_window;
@@ -114,8 +1096,17 @@ static int vio_bus_probe(struct device *
 		return error;
 
 	id = vio_match_device(viodrv->id_table, viodev);
-	if (id)
+	if (id) {
+		memset(&viodev->cmo, 0, sizeof(viodev->cmo));
+		if (firmware_has_feature(FW_FEATURE_CMO)) {
+			error = vio_cmo_bus_probe(viodev);
+			if (error)
+				return error;
+		}
 		error = viodrv->probe(viodev, id);
+		if (error)
+			vio_cmo_bus_remove(viodev);
+	}
 
 	return error;
 }
@@ -125,12 +1116,23 @@ static int vio_bus_remove(struct device 
 {
 	struct vio_dev *viodev = to_vio_dev(dev);
 	struct vio_driver *viodrv = to_vio_driver(dev->driver);
+	struct device *devptr;
+	int ret = 1;
+
+	/*
+	 * Hold a reference to the device after the remove function is called
+	 * to allow for CMO accounting cleanup for the device.
+	 */
+	devptr = get_device(dev);
 
 	if (viodrv->remove)
-		return viodrv->remove(viodev);
+		ret = viodrv->remove(viodev);
 
-	/* driver can't remove */
-	return 1;
+	if (!ret && firmware_has_feature(FW_FEATURE_CMO))
+		vio_cmo_bus_remove(viodev);
+
+	put_device(devptr);
+	return ret;
 }
 
 /**
@@ -215,7 +1217,11 @@ struct vio_dev *vio_register_device_node
 			viodev->unit_address = *unit_address;
 	}
 	viodev->dev.archdata.of_node = of_node_get(of_node);
-	viodev->dev.archdata.dma_ops = &dma_iommu_ops;
+
+	if (firmware_has_feature(FW_FEATURE_CMO))
+		vio_cmo_set_dma_ops(viodev);
+	else
+		viodev->dev.archdata.dma_ops = &dma_iommu_ops;
 	viodev->dev.archdata.dma_data = vio_build_iommu_table(viodev);
 	viodev->dev.archdata.numa_node = of_node_to_nid(of_node);
 
@@ -245,6 +1251,9 @@ static int __init vio_bus_init(void)
 	int err;
 	struct device_node *node_vroot;
 
+	if (firmware_has_feature(FW_FEATURE_CMO))
+		vio_cmo_sysfs_init();
+
 	err = bus_register(&vio_bus_type);
 	if (err) {
 		printk(KERN_ERR "failed to register VIO bus\n");
@@ -262,6 +1271,9 @@ static int __init vio_bus_init(void)
 		return err;
 	}
 
+	if (firmware_has_feature(FW_FEATURE_CMO))
+		vio_cmo_bus_init();
+
 	node_vroot = of_find_node_by_name(NULL, "vdevice");
 	if (node_vroot) {
 		struct device_node *of_node;
Index: b/include/asm-powerpc/vio.h
===================================================================
--- a/include/asm-powerpc/vio.h
+++ b/include/asm-powerpc/vio.h
@@ -39,16 +39,32 @@
 #define VIO_IRQ_DISABLE		0UL
 #define VIO_IRQ_ENABLE		1UL
 
+/*
+ * VIO CMO minimum entitlement for all devices and spare entitlement
+ */
+#define VIO_CMO_MIN_ENT 1562624
+
 struct iommu_table;
 
-/*
- * The vio_dev structure is used to describe virtual I/O devices.
+/**
+ * vio_dev - This structure is used to describe virtual I/O devices.
+ *
+ * @desired: set from return of driver's get_desired_dma() function
+ * @entitled: bytes of IO data that has been reserved for this device.
+ * @allocated: bytes of IO data currently in use by the device.
+ * @allocs_failed: number of DMA failures due to insufficient entitlement.
  */
 struct vio_dev {
 	const char *name;
 	const char *type;
 	uint32_t unit_address;
 	unsigned int irq;
+	struct {
+		size_t desired;
+		size_t entitled;
+		size_t allocated;
+		atomic_t allocs_failed;
+	} cmo;
 	struct device dev;
 };
 
@@ -56,12 +72,19 @@ struct vio_driver {
 	const struct vio_device_id *id_table;
 	int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
 	int (*remove)(struct vio_dev *dev);
+	/* A driver must have a get_desired_dma() function to
+	 * be loaded in a CMO environment if it uses DMA.
+	 */
+	unsigned long (*get_desired_dma)(struct vio_dev *dev);
 	struct device_driver driver;
 };
 
 extern int vio_register_driver(struct vio_driver *drv);
 extern void vio_unregister_driver(struct vio_driver *drv);
 
+extern int vio_cmo_entitlement_update(size_t);
+extern void vio_cmo_set_dev_desired(struct vio_dev *viodev, size_t desired);
+
 extern void __devinit vio_unregister_device(struct vio_dev *dev);
 
 struct device_node;
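
For readers following the entitlement accounting in vio_cmo_alloc() in the
patch above, here is a minimal userspace sketch of the same decision. It is
not part of the patch. The names pool_model, dev_model and model_alloc are
hypothetical, MIN_ENT mirrors VIO_CMO_MIN_ENT, and the locking and the
curr/high counters are left out on purpose.

```c
#include <stddef.h>

/* Hypothetical stand-in for VIO_CMO_MIN_ENT from the patch. */
#define MIN_ENT 1562624UL

/* Hypothetical model of the bus-wide fields consulted by vio_cmo_alloc(). */
struct pool_model {
	size_t spare;		/* spare entitlement kept for hotplug */
	size_t excess_free;	/* unused bytes in the excess pool */
};

/* Hypothetical model of the per-device cmo fields. */
struct dev_model {
	size_t entitled;	/* bytes reserved for this device */
	size_t allocated;	/* bytes currently in use by this device */
};

/* Returns 0 on success and -1 where the kernel code returns -ENOMEM. */
int model_alloc(struct pool_model *bus, struct dev_model *dev, size_t size)
{
	size_t reserve_free = 0, excess_free = 0, from_reserve;

	/* Unused part of the device's own entitlement. */
	if (dev->entitled > dev->allocated)
		reserve_free = dev->entitled - dev->allocated;

	/* The excess pool is usable only while the spare is fully funded. */
	if (bus->spare >= MIN_ENT)
		excess_free = bus->excess_free;

	if (reserve_free + excess_free < size)
		return -1;

	dev->allocated += size;
	/* Whatever the device reserve cannot cover is charged to excess. */
	from_reserve = size < reserve_free ? size : reserve_free;
	bus->excess_free -= size - from_reserve;
	return 0;
}
```

With spare equal to MIN_ENT, 4096 bytes free in the excess pool, and a device
that has 4096 bytes of its entitlement unused, an 8192-byte request succeeds
and drains the excess pool; a further 4096-byte request is then refused.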

^ permalink raw reply

* Re: While(1) in kernel space
From: Paolo Doz @ 2008-07-08 20:53 UTC (permalink / raw)
  To: Grant Likely; +Cc: linuxppc-dev, Arnd Bergmann
In-Reply-To: <fa686aa40807080747u3e1f4f57of1a2cddc67b446b9@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

Thanks for the info. I'll try a kernel thread or a timer/softirq over the
next few days and let you know which of them fits my problem.

I actually have about +/- 1 ms of tolerance, though that still needs more
investigation.

Paolo

On Tue, Jul 8, 2008 at 4:47 PM, Grant Likely <grant.likely@secretlab.ca>
wrote:

> On Tue, Jul 8, 2008 at 8:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tuesday 08 July 2008, Grant Likely wrote:
> >
> >>
> >> You can use a kernel thread.
> >>
> >> I'm not sure how accurate this is, but here is some information about
> them:
> >>
> >>
> http://www.linuxquestions.org/linux/articles/Technical/Linux_Kernel_Thread
> >
> > Not accurate at all. New code should use kthread_create, as documented in
> >
> > http://lwn.net/Articles/65178/
>
> Teach me to blindly use google.  Thanks Arnd.
>
> g.
>
> --
> Grant Likely, B.Sc., P.Eng.
> Secret Lab Technologies Ltd.
>

