LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Anton Blanchard @ 2011-12-14  3:17 UTC (permalink / raw)
  To: Fushen Chen; +Cc: Peter Zijlstra, Thomas Gleixner, Linuxppc-dev Development
In-Reply-To: <CAEu=RPgnh-i-SFt4kiiAtDjLZ3A0cAHhk7ch56Uvv0uG_+HfCg@mail.gmail.com>


Hi,

> This is 2.6.32, but I think 2.6.36 is the same.

Sounds a bit like this, merged in 2.6.39.

Anton
--

commit ad5d1c888e556bc00c4e86f452cad4a3a87d22c1
Author: Anton Blanchard <anton@samba.org>
Date:   Sun Mar 20 15:28:03 2011 +0000

    powerpc: Fix accounting of softirq time when idle
    
    commit cf9efce0ce31 (powerpc: Account time using timebase rather
    than PURR) used in_irq() to detect if the time was spent in
    interrupt processing. This only catches hardirq context so if we
    are in softirq context and in the idle loop we end up accounting it
    as idle time. If we instead use in_interrupt() we catch both softirq
    and hardirq time.
    
    The issue was found when running a network intensive workload. top
    showed the following:
    
    0.0%us,  1.1%sy,  0.0%ni, 85.7%id,  0.0%wa,  9.9%hi,  3.3%si,  0.0%st
    
    85.7% idle. But this was wildly different to the perf events data.
    To confirm the suspicion I ran something to keep the core busy:
    
    # yes > /dev/null &
    
    8.2%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 10.3%hi, 81.4%si,  0.0%st
    
    We only got 8.2% of the CPU for the userspace task and softirq has
    shot up to 81.4%.
    
    With the patch below top shows the correct stats:
    
    0.0%us,  0.0%sy,  0.0%ni,  5.3%id,  0.0%wa, 13.3%hi, 81.3%si,  0.0%st
    
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Cc: stable@kernel.org
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

^ permalink raw reply

* Re: [PATCH] powerpc: Fix swiotlb ops for ppc64
From: Benjamin Herrenschmidt @ 2011-12-14  3:00 UTC (permalink / raw)
  To: Becky Bruce; +Cc: linuxppc-dev list
In-Reply-To: <C9D58E47-96F4-45ED-BE54-758D71B361FD@kernel.crashing.org>

On Tue, 2011-12-13 at 20:53 -0600, Becky Bruce wrote:
> The non-coherent "specialness" is in the dma sync stuff and no, I
> don't think the iotlb stuff deals with that properly.
> 
> Do you not have a problem with 1)?  If not then I think we can look at
> switching over; 2) was more of a secondary thing.

So let's switch over, avoid an #ifdef which is always a good thing...

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: Fix swiotlb ops for ppc64
From: Becky Bruce @ 2011-12-14  2:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list
In-Reply-To: <1323750427.19891.59.camel@pasglop>


On Dec 12, 2011, at 10:27 PM, Benjamin Herrenschmidt wrote:

> On Mon, 2011-12-12 at 21:55 -0600, Becky Bruce wrote:
>> 1) dma_direct_alloc_coherent strips GFP_HIGHMEM out of the flags =
field
>> when calling the actual allocator and the iotlb version does not.  I
>> don't know how much this matters - I did a quick grep and I don't see
>> any users that specify that, but somebody went through the trouble of
>> putting it in there in the first place and without knowing why I
>> wasn't willing to get rid of it.  Now, since my patch it looks like
>> someone added a VM_BUG_ON into __get_free_pages() if GFP_HIGHMEM so
>> this would get caught.  However, I don't know if we really want to
>> throw a bug there.
>>=20
>> 2)  The iotlb code doesn't deal with the !coherent parts like 8xx.  =
We
>> can work around that by setting up the dma_ops differently for that
>> case instead.
>=20
> Does the rest of it handle them ? I mean swiotlb_map_sg_attrs etc...

The non-coherent "specialness" is in the dma sync stuff and no, I don't =
think the iotlb stuff deals with that properly.

Do you not have a problem with 1)?  If not then I think we can look at =
switching over; 2) was more of a secondary thing.

-B

^ permalink raw reply

* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Fushen Chen @ 2011-12-14  1:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, Linuxppc-dev Development, Thomas Gleixner
In-Reply-To: <1323819246.7671.4.camel@pasglop>

[-- Attachment #1: Type: text/plain, Size: 1524 bytes --]

This is 2.6.32, but I think 2.6.36 is the same.
Thanks,
Fushen

On Tue, Dec 13, 2011 at 3:34 PM, Benjamin Herrenschmidt <
benh@kernel.crashing.org> wrote:

> On Wed, 2011-12-14 at 00:28 +0100, Thomas Gleixner wrote:
> > On Wed, 14 Dec 2011, Benjamin Herrenschmidt wrote:
> >
> > > On Tue, 2011-12-13 at 12:42 -0800, Fushen Chen wrote:
> > > > On APM82181,  "vmstat" (/proc/stat)  doesn't show correct idle
> > > > percent, if kernel enables "CONFIG_NO_HZ" (Tickless System / Dynamic
> > > > Tick).
> > > >
> > > > When I run wireless throughput test with heavy traffic, "vmstat"
> shows
> > > > very high idle percent while "oprofile" shows very low idle percent.
> > > > During the test, the system is idle, but network traffic uses a lot
> of
> > > > hard IRQ and soft-irq time. "vmstat" would have the correct stats if
> > > > account_idle_ticks(ticks) in kernel/time/tick-sched.c doesn't add
> more
> > > > idle time in "vmstat". In the same test, if I disable "CONFIG_NO_HZ"
> > > > in kernel, idle percent in "vmstat" and "oprofile" would match.
> > > >
> > > > My APM82181 kernel configuration is "CONFIG_NO_HZ",
> "CONFIG_HZ_250=y",
> > > > "CONFIG_HZ=250", and "CONFIG_HIGH_RES_TIMERS".
> > > >
> > > > My question is that if kernel enables "CONFIG_NO_HZ", how would
> kernel
> > > > report correct stats.
> > >
> > > Hi Thomas ! Any idea what we're doing wrong ? :-)
> >
> > Not really, that had been an issue before and had been fixed. Peter ????
>
> Fusen, what kernel version is this ?
>
> Cheers,
> Ben.
>
>
>

[-- Attachment #2: Type: text/html, Size: 2237 bytes --]

^ permalink raw reply

* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Andreas Schwab @ 2011-12-13 23:57 UTC (permalink / raw)
  To: Fushen Chen; +Cc: Linuxppc-dev Development
In-Reply-To: <CAEu=RPirE=H1N=KjHNjNRBM6H1fRvrugCw6ojqWaTNm2=WTfng__4707.66240400753$1323813396$gmane$org@mail.gmail.com>

Does this help?

<http://permalink.gmane.org/gmane.linux.ports.ppc.embedded/47530>

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Thomas Gleixner @ 2011-12-13 23:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, Linuxppc-dev Development, Fushen Chen
In-Reply-To: <1323818650.7671.3.camel@pasglop>

On Wed, 14 Dec 2011, Benjamin Herrenschmidt wrote:

> On Tue, 2011-12-13 at 12:42 -0800, Fushen Chen wrote:
> > On APM82181,  "vmstat" (/proc/stat)  doesn't show correct idle
> > percent, if kernel enables "CONFIG_NO_HZ" (Tickless System / Dynamic
> > Tick).
> > 
> > When I run wireless throughput test with heavy traffic, "vmstat" shows
> > very high idle percent while "oprofile" shows very low idle percent.
> > During the test, the system is idle, but network traffic uses a lot of
> > hard IRQ and soft-irq time. "vmstat" would have the correct stats if
> > account_idle_ticks(ticks) in kernel/time/tick-sched.c doesn't add more
> > idle time in "vmstat". In the same test, if I disable "CONFIG_NO_HZ"
> > in kernel, idle percent in "vmstat" and "oprofile" would match.
> > 
> > My APM82181 kernel configuration is "CONFIG_NO_HZ", "CONFIG_HZ_250=y",
> > "CONFIG_HZ=250", and "CONFIG_HIGH_RES_TIMERS".
> > 
> > My question is that if kernel enables "CONFIG_NO_HZ", how would kernel
> > report correct stats.
> 
> Hi Thomas ! Any idea what we're doing wrong ? :-)

Not really, that had been an issue before and had been fixed. Peter ????

Thanks,

	tglx

^ permalink raw reply

* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Benjamin Herrenschmidt @ 2011-12-13 23:34 UTC (permalink / raw)
  To: Fushen Chen; +Cc: Peter Zijlstra, Linuxppc-dev Development, Thomas Gleixner
In-Reply-To: <alpine.LFD.2.02.1112140027230.3020@ionos>

On Wed, 2011-12-14 at 00:28 +0100, Thomas Gleixner wrote:
> On Wed, 14 Dec 2011, Benjamin Herrenschmidt wrote:
> 
> > On Tue, 2011-12-13 at 12:42 -0800, Fushen Chen wrote:
> > > On APM82181,  "vmstat" (/proc/stat)  doesn't show correct idle
> > > percent, if kernel enables "CONFIG_NO_HZ" (Tickless System / Dynamic
> > > Tick).
> > > 
> > > When I run wireless throughput test with heavy traffic, "vmstat" shows
> > > very high idle percent while "oprofile" shows very low idle percent.
> > > During the test, the system is idle, but network traffic uses a lot of
> > > hard IRQ and soft-irq time. "vmstat" would have the correct stats if
> > > account_idle_ticks(ticks) in kernel/time/tick-sched.c doesn't add more
> > > idle time in "vmstat". In the same test, if I disable "CONFIG_NO_HZ"
> > > in kernel, idle percent in "vmstat" and "oprofile" would match.
> > > 
> > > My APM82181 kernel configuration is "CONFIG_NO_HZ", "CONFIG_HZ_250=y",
> > > "CONFIG_HZ=250", and "CONFIG_HIGH_RES_TIMERS".
> > > 
> > > My question is that if kernel enables "CONFIG_NO_HZ", how would kernel
> > > report correct stats.
> > 
> > Hi Thomas ! Any idea what we're doing wrong ? :-)
> 
> Not really, that had been an issue before and had been fixed. Peter ????

Fusen, what kernel version is this ?

Cheers,
Ben.

^ permalink raw reply

* Re: CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Benjamin Herrenschmidt @ 2011-12-13 23:24 UTC (permalink / raw)
  To: Fushen Chen; +Cc: Linuxppc-dev Development, Thomas Gleixner
In-Reply-To: <CAEu=RPirE=H1N=KjHNjNRBM6H1fRvrugCw6ojqWaTNm2=WTfng@mail.gmail.com>

On Tue, 2011-12-13 at 12:42 -0800, Fushen Chen wrote:
> On APM82181,  "vmstat" (/proc/stat)  doesn't show correct idle
> percent, if kernel enables "CONFIG_NO_HZ" (Tickless System / Dynamic
> Tick).
> 
> When I run wireless throughput test with heavy traffic, "vmstat" shows
> very high idle percent while "oprofile" shows very low idle percent.
> During the test, the system is idle, but network traffic uses a lot of
> hard IRQ and soft-irq time. "vmstat" would have the correct stats if
> account_idle_ticks(ticks) in kernel/time/tick-sched.c doesn't add more
> idle time in "vmstat". In the same test, if I disable "CONFIG_NO_HZ"
> in kernel, idle percent in "vmstat" and "oprofile" would match.
> 
> My APM82181 kernel configuration is "CONFIG_NO_HZ", "CONFIG_HZ_250=y",
> "CONFIG_HZ=250", and "CONFIG_HIGH_RES_TIMERS".
> 
> My question is that if kernel enables "CONFIG_NO_HZ", how would kernel
> report correct stats.

Hi Thomas ! Any idea what we're doing wrong ? :-)

Cheers,
Ben.

> Thanks,
> Fushen
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [PATCH] powerpc: fix compile error with 85xx/p1022_ds.c
From: Tabi Timur-B04825 @ 2011-12-13 20:56 UTC (permalink / raw)
  To: Michael Neuling, Gala Kumar-B11780
  Cc: linux-next@vger.kernel.org, linuxppc-dev, Kyle Moffett,
	sfr@canb.auug.org.au
In-Reply-To: <22284.1323643754@neuling.org>

On Sun, Dec 11, 2011 at 4:49 PM, Michael Neuling <mikey@neuling.org> wrote:
> Current linux-next compiled with mpc85xx_defconfig causes this:
> =A0arch/powerpc/platforms/85xx/p1022_ds.c:341:14: error: 'udbg_progress' =
undeclared here (not in a function)
>
> Add include to fix this.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Acked-by: Timur Tabi <timur@freescale.com>

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply

* [PATCH] [v4] powerpc/fsl: add MSI support for the Freescale hypervisor
From: Timur Tabi @ 2011-12-13 20:51 UTC (permalink / raw)
  To: kumar.gala, scottwood, linuxppc-dev

Add support for vmpic-msi nodes to the fsl_msi driver.  The MSI is
virtualized by the hypervisor, so the vmpic-msi does not contain a 'reg'
property.  Instead, the driver uses hcalls.

Add support for the "msi-address-64" property to the fsl_pci driver.
The Freescale hypervisor typically puts the virtualized MSIIR register
in the page after the end of DDR, so we extend the DDR ATMU to cover it.
Any other location for MSIIR is not supported, for now.

Signed-off-by: Timur Tabi <timur@freescale.com>
---

v4: fix checkpatch complaint.  I know, I suck.

 arch/powerpc/sysdev/fsl_msi.c |   68 +++++++++++++++++++++++++++++------------
 arch/powerpc/sysdev/fsl_msi.h |    7 ++--
 arch/powerpc/sysdev/fsl_pci.c |   29 +++++++++++++++++
 3 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 89548e0..ecb5c19 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -23,6 +23,8 @@
 #include <asm/hw_irq.h>
 #include <asm/ppc-pci.h>
 #include <asm/mpic.h>
+#include <asm/fsl_hcalls.h>
+
 #include "fsl_msi.h"
 #include "fsl_pci.h"
 
@@ -163,11 +165,13 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 	 */
 	np = of_parse_phandle(hose->dn, "fsl,msi", 0);
 	if (np) {
-		if (of_device_is_compatible(np, "fsl,mpic-msi"))
+		if (of_device_is_compatible(np, "fsl,mpic-msi") ||
+		    of_device_is_compatible(np, "fsl,vmpic-msi"))
 			phandle = np->phandle;
 		else {
-			dev_err(&pdev->dev, "node %s has an invalid fsl,msi"
-				" phandle\n", hose->dn->full_name);
+			dev_err(&pdev->dev,
+				"node %s has an invalid fsl,msi phandle %u\n",
+				hose->dn->full_name, np->phandle);
 			return -EINVAL;
 		}
 	}
@@ -196,16 +200,14 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 
 		if (hwirq < 0) {
 			rc = hwirq;
-			pr_debug("%s: fail allocating msi interrupt\n",
-					__func__);
+			dev_err(&pdev->dev, "could not allocate MSI interrupt\n");
 			goto out_free;
 		}
 
 		virq = irq_create_mapping(msi_data->irqhost, hwirq);
 
 		if (virq == NO_IRQ) {
-			pr_debug("%s: fail mapping hwirq 0x%x\n",
-					__func__, hwirq);
+			dev_err(&pdev->dev, "fail mapping hwirq %i\n", hwirq);
 			msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 			rc = -ENOSPC;
 			goto out_free;
@@ -234,6 +236,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	u32 intr_index;
 	u32 have_shift = 0;
 	struct fsl_msi_cascade_data *cascade_data;
+	unsigned int ret;
 
 	cascade_data = irq_get_handler_data(irq);
 	msi_data = cascade_data->msi_data;
@@ -265,6 +268,14 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	case FSL_PIC_IP_IPIC:
 		msir_value = fsl_msi_read(msi_data->msi_regs, msir_index * 0x4);
 		break;
+	case FSL_PIC_IP_VMPIC:
+		ret = fh_vmpic_get_msir(virq_to_hw(irq), &msir_value);
+		if (ret) {
+			pr_err("fsl-msi: fh_vmpic_get_msir() failed for "
+			       "irq %u (ret=%u)\n", irq, ret);
+			msir_value = 0;
+		}
+		break;
 	}
 
 	while (msir_value) {
@@ -282,6 +293,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 
 	switch (msi_data->feature & FSL_PIC_IP_MASK) {
 	case FSL_PIC_IP_MPIC:
+	case FSL_PIC_IP_VMPIC:
 		chip->irq_eoi(idata);
 		break;
 	case FSL_PIC_IP_IPIC:
@@ -311,7 +323,8 @@ static int fsl_of_msi_remove(struct platform_device *ofdev)
 	}
 	if (msi->bitmap.bitmap)
 		msi_bitmap_free(&msi->bitmap);
-	iounmap(msi->msi_regs);
+	if ((msi->feature & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC)
+		iounmap(msi->msi_regs);
 	kfree(msi);
 
 	return 0;
@@ -383,26 +396,32 @@ static int __devinit fsl_of_msi_probe(struct platform_device *dev)
 		goto error_out;
 	}
 
-	/* Get the MSI reg base */
-	err = of_address_to_resource(dev->dev.of_node, 0, &res);
-	if (err) {
-		dev_err(&dev->dev, "%s resource error!\n",
+	/*
+	 * Under the Freescale hypervisor, the msi nodes don't have a 'reg'
+	 * property.  Instead, we use hypercalls to access the MSI.
+	 */
+	if ((features->fsl_pic_ip & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC) {
+		err = of_address_to_resource(dev->dev.of_node, 0, &res);
+		if (err) {
+			dev_err(&dev->dev, "invalid resource for node %s\n",
 				dev->dev.of_node->full_name);
-		goto error_out;
-	}
+			goto error_out;
+		}
 
-	msi->msi_regs = ioremap(res.start, resource_size(&res));
-	if (!msi->msi_regs) {
-		dev_err(&dev->dev, "ioremap problem failed\n");
-		goto error_out;
+		msi->msi_regs = ioremap(res.start, resource_size(&res));
+		if (!msi->msi_regs) {
+			dev_err(&dev->dev, "could not map node %s\n",
+				dev->dev.of_node->full_name);
+			goto error_out;
+		}
+		msi->msiir_offset =
+			features->msiir_offset + (res.start & 0xfffff);
 	}
 
 	msi->feature = features->fsl_pic_ip;
 
 	msi->irqhost->host_data = msi;
 
-	msi->msiir_offset = features->msiir_offset + (res.start & 0xfffff);
-
 	/*
 	 * Remember the phandle, so that we can match with any PCI nodes
 	 * that have an "fsl,msi" property.
@@ -476,6 +495,11 @@ static const struct fsl_msi_feature ipic_msi_feature = {
 	.msiir_offset = 0x38,
 };
 
+static const struct fsl_msi_feature vmpic_msi_feature = {
+	.fsl_pic_ip = FSL_PIC_IP_VMPIC,
+	.msiir_offset = 0,
+};
+
 static const struct of_device_id fsl_of_msi_ids[] = {
 	{
 		.compatible = "fsl,mpic-msi",
@@ -485,6 +509,10 @@ static const struct of_device_id fsl_of_msi_ids[] = {
 		.compatible = "fsl,ipic-msi",
 		.data = (void *)&ipic_msi_feature,
 	},
+	{
+		.compatible = "fsl,vmpic-msi",
+		.data = (void *)&vmpic_msi_feature,
+	},
 	{}
 };
 
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index b5d25ba..f6c646a 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -20,9 +20,10 @@
 #define IRQS_PER_MSI_REG	32
 #define NR_MSI_IRQS	(NR_MSI_REG * IRQS_PER_MSI_REG)
 
-#define FSL_PIC_IP_MASK	0x0000000F
-#define FSL_PIC_IP_MPIC	0x00000001
-#define FSL_PIC_IP_IPIC	0x00000002
+#define FSL_PIC_IP_MASK   0x0000000F
+#define FSL_PIC_IP_MPIC   0x00000001
+#define FSL_PIC_IP_IPIC   0x00000002
+#define FSL_PIC_IP_VMPIC  0x00000003
 
 struct fsl_msi {
 	struct irq_host *irqhost;
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4ce547e..5204a10 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -113,6 +113,8 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 	u32 piwar = PIWAR_EN | PIWAR_PF | PIWAR_TGI_LOCAL |
 			PIWAR_READ_SNOOP | PIWAR_WRITE_SNOOP;
 	char *name = hose->dn->full_name;
+	const u64 *reg;
+	int len;
 
 	pr_debug("PCI memory map start 0x%016llx, size 0x%016llx\n",
 		 (u64)rsrc->start, (u64)resource_size(rsrc));
@@ -205,6 +207,33 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 
 	/* Setup inbound mem window */
 	mem = memblock_end_of_DRAM();
+
+	/*
+	 * The msi-address-64 property, if it exists, indicates the physical
+	 * address of the MSIIR register.  Normally, this register is located
+	 * inside CCSR, so the ATMU that covers all of CCSR is used. But if
+	 * this property exists, then we normally need to create a new ATMU
+	 * for it.  For now, however, we cheat.  The only entity that creates
+	 * this property is the Freescale hypervisor, and the address is
+	 * specified in the partition configuration.  Typically, the address
+	 * is located in the page immediately after the end of DDR.  If so, we
+	 * can avoid allocating a new ATMU by extending the DDR ATMU by one
+	 * page.
+	 */
+	reg = of_get_property(hose->dn, "msi-address-64", &len);
+	if (reg && (len == sizeof(u64))) {
+		u64 address = be64_to_cpup(reg);
+
+		if ((address >= mem) && (address < (mem + PAGE_SIZE))) {
+			pr_info("%s: extending DDR ATMU to cover MSIIR", name);
+			mem += PAGE_SIZE;
+		} else {
+			/* TODO: Create a new ATMU for MSIIR */
+			pr_warn("%s: msi-address-64 address of %llx is "
+				"unsupported\n", name, address);
+		}
+	}
+
 	sz = min(mem, paddr_lo);
 	mem_log = __ilog2_u64(sz);
 
-- 
1.7.3.4

^ permalink raw reply related

* CONFIG_NO_HZ added too much idle time in /proc/stat during throughput test.
From: Fushen Chen @ 2011-12-13 20:42 UTC (permalink / raw)
  To: Linuxppc-dev Development

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

On APM82181,  "vmstat" (/proc/stat)  doesn't show correct idle percent, if
kernel enables "CONFIG_NO_HZ" (Tickless System / Dynamic Tick).

When I run wireless throughput test with heavy traffic, "vmstat" shows very
high idle percent while "oprofile" shows very low idle percent. During the
test, the system is idle, but network traffic uses a lot of hard IRQ and
soft-irq time. "vmstat" would have the correct stats if
account_idle_ticks(ticks) in kernel/time/tick-sched.c doesn't add more idle
time in "vmstat". In the same test, if I disable "CONFIG_NO_HZ" in kernel,
idle percent in "vmstat" and "oprofile" would match.

My APM82181 kernel configuration is "CONFIG_NO_HZ", "CONFIG_HZ_250=y",
"CONFIG_HZ=250", and "CONFIG_HIGH_RES_TIMERS".

My question is that if kernel enables "CONFIG_NO_HZ", how would kernel
report correct stats.

Thanks,
Fushen

[-- Attachment #2: Type: text/html, Size: 1037 bytes --]

^ permalink raw reply

* [PATCH] [v3] powerpc/fsl: add MSI support for the Freescale hypervisor
From: Timur Tabi @ 2011-12-13 20:37 UTC (permalink / raw)
  To: kumar.gala, scottwood, linuxppc-dev

Add support for vmpic-msi nodes to the fsl_msi driver.  The MSI is
virtualized by the hypervisor, so the vmpic-msi does not contain a 'reg'
property.  Instead, the driver uses hcalls.

Add support for the "msi-address-64" property to the fsl_pci driver.
The Freescale hypervisor typically puts the virtualized MSIIR register
in the page after the end of DDR, so we extend the DDR ATMU to cover it.
Any other location for MSIIR is not supported, for now.

Signed-off-by: Timur Tabi <timur@freescale.com>
---

v3: fix a pr_warn message

v2: fix a comment and patch description

 arch/powerpc/sysdev/fsl_msi.c |   68 +++++++++++++++++++++++++++++------------
 arch/powerpc/sysdev/fsl_msi.h |    7 ++--
 arch/powerpc/sysdev/fsl_pci.c |   29 +++++++++++++++++
 3 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 89548e0..7dc473f 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -23,6 +23,8 @@
 #include <asm/hw_irq.h>
 #include <asm/ppc-pci.h>
 #include <asm/mpic.h>
+#include <asm/fsl_hcalls.h>
+
 #include "fsl_msi.h"
 #include "fsl_pci.h"
 
@@ -163,11 +165,13 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 	 */
 	np = of_parse_phandle(hose->dn, "fsl,msi", 0);
 	if (np) {
-		if (of_device_is_compatible(np, "fsl,mpic-msi"))
+		if (of_device_is_compatible(np, "fsl,mpic-msi") ||
+		    of_device_is_compatible(np, "fsl,vmpic-msi"))
 			phandle = np->phandle;
 		else {
-			dev_err(&pdev->dev, "node %s has an invalid fsl,msi"
-				" phandle\n", hose->dn->full_name);
+			dev_err(&pdev->dev,
+				"node %s has an invalid fsl,msi phandle %u\n",
+				hose->dn->full_name, np->phandle);
 			return -EINVAL;
 		}
 	}
@@ -196,16 +200,14 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 
 		if (hwirq < 0) {
 			rc = hwirq;
-			pr_debug("%s: fail allocating msi interrupt\n",
-					__func__);
+			dev_err(&pdev->dev, "could not allocate MSI interrupt\n");
 			goto out_free;
 		}
 
 		virq = irq_create_mapping(msi_data->irqhost, hwirq);
 
 		if (virq == NO_IRQ) {
-			pr_debug("%s: fail mapping hwirq 0x%x\n",
-					__func__, hwirq);
+			dev_err(&pdev->dev, "fail mapping hwirq %i\n", hwirq);
 			msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 			rc = -ENOSPC;
 			goto out_free;
@@ -234,6 +236,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	u32 intr_index;
 	u32 have_shift = 0;
 	struct fsl_msi_cascade_data *cascade_data;
+	unsigned int ret;
 
 	cascade_data = irq_get_handler_data(irq);
 	msi_data = cascade_data->msi_data;
@@ -265,6 +268,14 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	case FSL_PIC_IP_IPIC:
 		msir_value = fsl_msi_read(msi_data->msi_regs, msir_index * 0x4);
 		break;
+	case FSL_PIC_IP_VMPIC:
+		ret = fh_vmpic_get_msir(virq_to_hw(irq), &msir_value);
+		if (ret) {
+			pr_err("fsl-msi: fh_vmpic_get_msir() failed for "
+			       "irq %u (ret=%u)\n", irq, ret);
+			msir_value = 0;
+		}
+		break;
 	}
 
 	while (msir_value) {
@@ -282,6 +293,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 
 	switch (msi_data->feature & FSL_PIC_IP_MASK) {
 	case FSL_PIC_IP_MPIC:
+	case FSL_PIC_IP_VMPIC:
 		chip->irq_eoi(idata);
 		break;
 	case FSL_PIC_IP_IPIC:
@@ -311,7 +323,8 @@ static int fsl_of_msi_remove(struct platform_device *ofdev)
 	}
 	if (msi->bitmap.bitmap)
 		msi_bitmap_free(&msi->bitmap);
-	iounmap(msi->msi_regs);
+	if ((msi->feature & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC)
+		iounmap(msi->msi_regs);
 	kfree(msi);
 
 	return 0;
@@ -383,26 +396,32 @@ static int __devinit fsl_of_msi_probe(struct platform_device *dev)
 		goto error_out;
 	}
 
-	/* Get the MSI reg base */
-	err = of_address_to_resource(dev->dev.of_node, 0, &res);
-	if (err) {
-		dev_err(&dev->dev, "%s resource error!\n",
+	/*
+	 * Under the Freescale hypervisor, the msi nodes don't have a 'reg'
+	 * property.  Instead, we use hypercalls to access the MSI.
+	 */
+	if ((features->fsl_pic_ip & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC) {
+		err = of_address_to_resource(dev->dev.of_node, 0, &res);
+		if (err) {
+			dev_err(&dev->dev, "invalid resource for node %s\n",
 				dev->dev.of_node->full_name);
-		goto error_out;
-	}
+			goto error_out;
+		}
 
-	msi->msi_regs = ioremap(res.start, resource_size(&res));
-	if (!msi->msi_regs) {
-		dev_err(&dev->dev, "ioremap problem failed\n");
-		goto error_out;
+		msi->msi_regs = ioremap(res.start, resource_size(&res));
+		if (!msi->msi_regs) {
+			dev_err(&dev->dev, "could not map node %s\n",
+				dev->dev.of_node->full_name);
+			goto error_out;
+		}
+		msi->msiir_offset =
+			features->msiir_offset + (res.start & 0xfffff);
 	}
 
 	msi->feature = features->fsl_pic_ip;
 
 	msi->irqhost->host_data = msi;
 
-	msi->msiir_offset = features->msiir_offset + (res.start & 0xfffff);
-
 	/*
 	 * Remember the phandle, so that we can match with any PCI nodes
 	 * that have an "fsl,msi" property.
@@ -476,6 +495,11 @@ static const struct fsl_msi_feature ipic_msi_feature = {
 	.msiir_offset = 0x38,
 };
 
+static const struct fsl_msi_feature vmpic_msi_feature = {
+	.fsl_pic_ip = FSL_PIC_IP_VMPIC,
+	.msiir_offset = 0,
+};
+
 static const struct of_device_id fsl_of_msi_ids[] = {
 	{
 		.compatible = "fsl,mpic-msi",
@@ -485,6 +509,10 @@ static const struct of_device_id fsl_of_msi_ids[] = {
 		.compatible = "fsl,ipic-msi",
 		.data = (void *)&ipic_msi_feature,
 	},
+ 	{
+		.compatible = "fsl,vmpic-msi",
+		.data = (void *)&vmpic_msi_feature,
+	},
 	{}
 };
 
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index b5d25ba..f6c646a 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -20,9 +20,10 @@
 #define IRQS_PER_MSI_REG	32
 #define NR_MSI_IRQS	(NR_MSI_REG * IRQS_PER_MSI_REG)
 
-#define FSL_PIC_IP_MASK	0x0000000F
-#define FSL_PIC_IP_MPIC	0x00000001
-#define FSL_PIC_IP_IPIC	0x00000002
+#define FSL_PIC_IP_MASK   0x0000000F
+#define FSL_PIC_IP_MPIC   0x00000001
+#define FSL_PIC_IP_IPIC   0x00000002
+#define FSL_PIC_IP_VMPIC  0x00000003
 
 struct fsl_msi {
 	struct irq_host *irqhost;
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4ce547e..5204a10 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -113,6 +113,8 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 	u32 piwar = PIWAR_EN | PIWAR_PF | PIWAR_TGI_LOCAL |
 			PIWAR_READ_SNOOP | PIWAR_WRITE_SNOOP;
 	char *name = hose->dn->full_name;
+	const u64 *reg;
+	int len;
 
 	pr_debug("PCI memory map start 0x%016llx, size 0x%016llx\n",
 		 (u64)rsrc->start, (u64)resource_size(rsrc));
@@ -205,6 +207,33 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 
 	/* Setup inbound mem window */
 	mem = memblock_end_of_DRAM();
+
+	/*
+	 * The msi-address-64 property, if it exists, indicates the physical
+	 * address of the MSIIR register.  Normally, this register is located
+	 * inside CCSR, so the ATMU that covers all of CCSR is used. But if
+	 * this property exists, then we normally need to create a new ATMU
+	 * for it.  For now, however, we cheat.  The only entity that creates
+	 * this property is the Freescale hypervisor, and the address is
+	 * specified in the partition configuration.  Typically, the address
+	 * is located in the page immediately after the end of DDR.  If so, we
+	 * can avoid allocating a new ATMU by extending the DDR ATMU by one
+	 * page.
+	 */
+	reg = of_get_property(hose->dn, "msi-address-64", &len);
+	if (reg && (len == sizeof(u64))) {
+		u64 address = be64_to_cpup(reg);
+
+		if ((address >= mem) && (address < (mem + PAGE_SIZE))) {
+			pr_info("%s: extending DDR ATMU to cover MSIIR", name);
+			mem += PAGE_SIZE;
+		} else {
+			/* TODO: Create a new ATMU for MSIIR */
+			pr_warn("%s: msi-address-64 address of %llx is "
+				"unsupported\n", name, address);
+		}
+	}
+
 	sz = min(mem, paddr_lo);
 	mem_log = __ilog2_u64(sz);
 
-- 
1.7.3.4

^ permalink raw reply related

* Re: sam460ex, rtc-m41t80 incorrect behaviour with kernel >=linux-2.6.38
From: acrux @ 2011-12-13 20:13 UTC (permalink / raw)
  To: John Stultz; +Cc: linuxppc-dev, Alexander Bigga, rtc-linux
In-Reply-To: <1323730073.4078.97.camel@work-vm>

On Mon, 12 Dec 2011 14:47:53 -0800
John Stultz <john.stultz@linaro.org> wrote:

> On Sun, 2011-12-04 at 03:29 +0100, acrux_it@libero.it wrote:
> > Acube Sam460ex is a 460ex SoC (PowerPC) based board with m41t81
> > like RTC. The wrong beahviour it seems to be caused this commit:
> > 
> > Thu, 3 Feb 2011 21:02:35 +0000 (13:02 -0800)
> > commit	16380c153a69c3784d2afaddfe0a22f353046cf6
> > RTC: Convert rtc drivers to use the alarm_irq_enable method
> > http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;
> > h=16380c153a69c3784d2afaddfe0a22f353046cf6
> 
> Yea. This was reported earlier, but it seems none of the proposed
> solutions have been implemented.
> 
> The cause is that the m41t80 driver's alarm functionality is somehow
> broken. The alarm mode is actually now used for UIE, so where before
> 2.6.38, UIE mode returned -EINVAL, it will now succeed, setting the
> alarm to fire a second away.
> 
> However, since the alarm code is broken, no alarm arrives and you get
> the timeout you see.
> 
> Could you test the following patch to see if it resolves the issue for
> you? If it does, I'll queue it for inclusion.
> 
> thanks
> -john
> 


hallo John,
CC to linuxppc-dev@lists.ozlabs.org


thanks for your time, this escamotage works fine.
I tested your patch against linux-2.6.38.8 and linux-3.1.5.

Tested-by: Nico Macrionitis <acrux@cruxppc.org>


cheers,
Nico



> 
> [PATCH] rtc: m41t80: Workaround broken alarm functionality
> 
> The m41t80 driver doesn't seem to have a functional alarm.
> 
> This causes failures when the generic core sees alarm functions,
> but then cannot use them properly for things like UIE mode.
> 
> Disabling the alarm functions allows proper error reporting,
> and possible fallback to emulated modes. Once someone fixes
> the alarm functions, this can be restored.
> 
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/rtc/rtc-m41t80.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c
> index eda128f..64aedd8 100644
> --- a/drivers/rtc/rtc-m41t80.c
> +++ b/drivers/rtc/rtc-m41t80.c
> @@ -357,10 +357,19 @@ static int m41t80_rtc_read_alarm(struct device
> *dev, struct rtc_wkalrm *t) static struct rtc_class_ops
> m41t80_rtc_ops = { .read_time = m41t80_rtc_read_time,
>  	.set_time = m41t80_rtc_set_time,
> +	/*
> +	 * XXX - m41t80 alarm functionality is reported broken.
> +	 * until it is fixed, don't register alarm functions.
> +	 *
>  	.read_alarm = m41t80_rtc_read_alarm,
>  	.set_alarm = m41t80_rtc_set_alarm,
> +	*/
>  	.proc = m41t80_rtc_proc,
> +	/*
> +	 * See above comment on broken alarm
> +	 *
>  	.alarm_irq_enable = m41t80_rtc_alarm_irq_enable,
> +	*/
>  };
>  
>  #if defined(CONFIG_RTC_INTF_SYSFS) || defined
>  #(CONFIG_RTC_INTF_SYSFS_MODULE)
> -- 
> 1.7.3.2.146.gca209
> 
> 
> 


-- 
GNU/Linux on Power Architecture
CRUX PPC - http://cruxppc.org/

^ permalink raw reply

* Re: Linux port availability for p5010 processor
From: Scott Wood @ 2011-12-13 19:28 UTC (permalink / raw)
  To: Vineeth; +Cc: linuxppc-dev, Senthil CGC, rajarama.b
In-Reply-To: <CAFbQSaALfZ-6-_g5aC1GgbUaoHyuBwaT342OFMTSf0iqtakDGA@mail.gmail.com>

On 12/12/2011 11:33 PM, Vineeth wrote:
> Do we have a linux port available for freescale P5010 processor (with
> single E5500 core) ?
> /(found arch/powerpc/platforms/pseries ; and a some details on
> kernel/cputable.c /)

p5010 is basically a p5020 with one core and "memory complex" instead of
two.  The same Linux code should work for both.

pseries is something completely different.

> Is there any reference board which uses this processor ?

p5020ds has a p5020.

Linux support is in arch/powerpc/platforms/85xx/p5020_ds.c (and
scattered in other places).

> any reference in DTS file also will be helpful.

arch/powerpc/boot/dts/p5020ds.dts

-Scott

^ permalink raw reply

* [PATCH] [v2] powerpc/fsl: add MSI support for the Freescale hypervisor
From: Timur Tabi @ 2011-12-13 18:52 UTC (permalink / raw)
  To: kumar.gala, scottwood, linuxppc-dev

Add support for vmpic-msi nodes to the fsl_msi driver.  The MSI is
virtualized by the hypervisor, so the vmpic-msi does not contain a 'reg'
property.  Instead, the driver uses hcalls.

Add support for the "msi-address-64" property to the fsl_pci driver.
The Freescale hypervisor typically puts the virtualized MSIIR register
in the page after the end of DDR, so we extend the DDR ATMU to cover it.
Any other location for MSIIR is not supported, for now.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 arch/powerpc/sysdev/fsl_msi.c |   68 +++++++++++++++++++++++++++++------------
 arch/powerpc/sysdev/fsl_msi.h |    7 ++--
 arch/powerpc/sysdev/fsl_pci.c |   30 ++++++++++++++++++
 3 files changed, 82 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 89548e0..7dc473f 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -23,6 +23,8 @@
 #include <asm/hw_irq.h>
 #include <asm/ppc-pci.h>
 #include <asm/mpic.h>
+#include <asm/fsl_hcalls.h>
+
 #include "fsl_msi.h"
 #include "fsl_pci.h"
 
@@ -163,11 +165,13 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 	 */
 	np = of_parse_phandle(hose->dn, "fsl,msi", 0);
 	if (np) {
-		if (of_device_is_compatible(np, "fsl,mpic-msi"))
+		if (of_device_is_compatible(np, "fsl,mpic-msi") ||
+		    of_device_is_compatible(np, "fsl,vmpic-msi"))
 			phandle = np->phandle;
 		else {
-			dev_err(&pdev->dev, "node %s has an invalid fsl,msi"
-				" phandle\n", hose->dn->full_name);
+			dev_err(&pdev->dev,
+				"node %s has an invalid fsl,msi phandle %u\n",
+				hose->dn->full_name, np->phandle);
 			return -EINVAL;
 		}
 	}
@@ -196,16 +200,14 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 
 		if (hwirq < 0) {
 			rc = hwirq;
-			pr_debug("%s: fail allocating msi interrupt\n",
-					__func__);
+			dev_err(&pdev->dev, "could not allocate MSI interrupt\n");
 			goto out_free;
 		}
 
 		virq = irq_create_mapping(msi_data->irqhost, hwirq);
 
 		if (virq == NO_IRQ) {
-			pr_debug("%s: fail mapping hwirq 0x%x\n",
-					__func__, hwirq);
+			dev_err(&pdev->dev, "fail mapping hwirq %i\n", hwirq);
 			msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 			rc = -ENOSPC;
 			goto out_free;
@@ -234,6 +236,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	u32 intr_index;
 	u32 have_shift = 0;
 	struct fsl_msi_cascade_data *cascade_data;
+	unsigned int ret;
 
 	cascade_data = irq_get_handler_data(irq);
 	msi_data = cascade_data->msi_data;
@@ -265,6 +268,14 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 	case FSL_PIC_IP_IPIC:
 		msir_value = fsl_msi_read(msi_data->msi_regs, msir_index * 0x4);
 		break;
+	case FSL_PIC_IP_VMPIC:
+		ret = fh_vmpic_get_msir(virq_to_hw(irq), &msir_value);
+		if (ret) {
+			pr_err("fsl-msi: fh_vmpic_get_msir() failed for "
+			       "irq %u (ret=%u)\n", irq, ret);
+			msir_value = 0;
+		}
+		break;
 	}
 
 	while (msir_value) {
@@ -282,6 +293,7 @@ static void fsl_msi_cascade(unsigned int irq, struct irq_desc *desc)
 
 	switch (msi_data->feature & FSL_PIC_IP_MASK) {
 	case FSL_PIC_IP_MPIC:
+	case FSL_PIC_IP_VMPIC:
 		chip->irq_eoi(idata);
 		break;
 	case FSL_PIC_IP_IPIC:
@@ -311,7 +323,8 @@ static int fsl_of_msi_remove(struct platform_device *ofdev)
 	}
 	if (msi->bitmap.bitmap)
 		msi_bitmap_free(&msi->bitmap);
-	iounmap(msi->msi_regs);
+	if ((msi->feature & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC)
+		iounmap(msi->msi_regs);
 	kfree(msi);
 
 	return 0;
@@ -383,26 +396,32 @@ static int __devinit fsl_of_msi_probe(struct platform_device *dev)
 		goto error_out;
 	}
 
-	/* Get the MSI reg base */
-	err = of_address_to_resource(dev->dev.of_node, 0, &res);
-	if (err) {
-		dev_err(&dev->dev, "%s resource error!\n",
+	/*
+	 * Under the Freescale hypervisor, the msi nodes don't have a 'reg'
+	 * property.  Instead, we use hypercalls to access the MSI.
+	 */
+	if ((features->fsl_pic_ip & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC) {
+		err = of_address_to_resource(dev->dev.of_node, 0, &res);
+		if (err) {
+			dev_err(&dev->dev, "invalid resource for node %s\n",
 				dev->dev.of_node->full_name);
-		goto error_out;
-	}
+			goto error_out;
+		}
 
-	msi->msi_regs = ioremap(res.start, resource_size(&res));
-	if (!msi->msi_regs) {
-		dev_err(&dev->dev, "ioremap problem failed\n");
-		goto error_out;
+		msi->msi_regs = ioremap(res.start, resource_size(&res));
+		if (!msi->msi_regs) {
+			dev_err(&dev->dev, "could not map node %s\n",
+				dev->dev.of_node->full_name);
+			goto error_out;
+		}
+		msi->msiir_offset =
+			features->msiir_offset + (res.start & 0xfffff);
 	}
 
 	msi->feature = features->fsl_pic_ip;
 
 	msi->irqhost->host_data = msi;
 
-	msi->msiir_offset = features->msiir_offset + (res.start & 0xfffff);
-
 	/*
 	 * Remember the phandle, so that we can match with any PCI nodes
 	 * that have an "fsl,msi" property.
@@ -476,6 +495,11 @@ static const struct fsl_msi_feature ipic_msi_feature = {
 	.msiir_offset = 0x38,
 };
 
+static const struct fsl_msi_feature vmpic_msi_feature = {
+	.fsl_pic_ip = FSL_PIC_IP_VMPIC,
+	.msiir_offset = 0,
+};
+
 static const struct of_device_id fsl_of_msi_ids[] = {
 	{
 		.compatible = "fsl,mpic-msi",
@@ -485,6 +509,10 @@ static const struct of_device_id fsl_of_msi_ids[] = {
 		.compatible = "fsl,ipic-msi",
 		.data = (void *)&ipic_msi_feature,
 	},
+ 	{
+		.compatible = "fsl,vmpic-msi",
+		.data = (void *)&vmpic_msi_feature,
+	},
 	{}
 };
 
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index b5d25ba..f6c646a 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -20,9 +20,10 @@
 #define IRQS_PER_MSI_REG	32
 #define NR_MSI_IRQS	(NR_MSI_REG * IRQS_PER_MSI_REG)
 
-#define FSL_PIC_IP_MASK	0x0000000F
-#define FSL_PIC_IP_MPIC	0x00000001
-#define FSL_PIC_IP_IPIC	0x00000002
+#define FSL_PIC_IP_MASK   0x0000000F
+#define FSL_PIC_IP_MPIC   0x00000001
+#define FSL_PIC_IP_IPIC   0x00000002
+#define FSL_PIC_IP_VMPIC  0x00000003
 
 struct fsl_msi {
 	struct irq_host *irqhost;
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4ce547e..cd82613 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -113,6 +113,8 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 	u32 piwar = PIWAR_EN | PIWAR_PF | PIWAR_TGI_LOCAL |
 			PIWAR_READ_SNOOP | PIWAR_WRITE_SNOOP;
 	char *name = hose->dn->full_name;
+	const u64 *reg;
+	int len;
 
 	pr_debug("PCI memory map start 0x%016llx, size 0x%016llx\n",
 		 (u64)rsrc->start, (u64)resource_size(rsrc));
@@ -205,6 +207,34 @@ static void __init setup_pci_atmu(struct pci_controller *hose,
 
 	/* Setup inbound mem window */
 	mem = memblock_end_of_DRAM();
+
+	/*
+	 * The msi-address-64 property, if it exists, indicates the physical
+	 * address of the MSIIR register.  Normally, this register is located
+	 * inside CCSR, so the ATMU that covers all of CCSR is used. But if
+	 * this property exists, then we normally need to create a new ATMU
+	 * for it.  For now, however, we cheat.  The only entity that creates
+	 * this property is the Freescale hypervisor, and the address is
+	 * specified in the partition configuration.  Typically, the address
+	 * is located in the page immediately after the end of DDR.  If so, we
+	 * can avoid allocating a new ATMU by extending the DDR ATMU by one
+	 * page.
+	 */
+	reg = of_get_property(hose->dn, "msi-address-64", &len);
+	if (reg && (len == sizeof(u64))) {
+		u64 address = be64_to_cpup(reg);
+
+		if ((address >= mem) && (address < (mem + PAGE_SIZE))) {
+			pr_info("%s: extending DDR ATMU to cover MSIIR", name);
+			mem += PAGE_SIZE;
+		} else {
+			/* TODO: Create a new ATMU for MSIIR */
+			pr_warn("%s: msi-address-64 address of %llx in node %s"
+				" is unsupported\n", name, address,
+				hose->dn->full_name);
+		}
+	}
+
 	sz = min(mem, paddr_lo);
 	mem_log = __ilog2_u64(sz);
 
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH RESEND] gpio: mpc8xxx: don't allow input-only pins to be output for MPC5121
From: Wolfram Sang @ 2011-12-13 18:23 UTC (permalink / raw)
  To: Grant Likely
  Cc: linuxppc-dev, Anatolij Gustschin, linux-kernel, Linus Walleij
In-Reply-To: <20111213181659.GA29243@ponder.secretlab.ca>

[-- Attachment #1: Type: text/plain, Size: 1681 bytes --]

On Tue, Dec 13, 2011 at 11:16:59AM -0700, Grant Likely wrote:
> On Tue, Dec 13, 2011 at 10:12:48AM +0100, Wolfram Sang wrote:
> > Add a 5121-custom reject if an input-only pin is requested to be output
> > (see 18.3.1.1 in the refman). Also, rewrite mach-specific quirk setup to
> > consume less lines which scales better.
> > 
> > Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> > ---
> >  drivers/gpio/gpio-mpc8xxx.c |   17 ++++++++++++-----
> >  1 files changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
> > index ec3fcf0..25dc736 100644
> > --- a/drivers/gpio/gpio-mpc8xxx.c
> > +++ b/drivers/gpio/gpio-mpc8xxx.c
> > @@ -115,6 +115,14 @@ static int mpc8xxx_gpio_dir_in(struct gpio_chip *gc, unsigned int gpio)
> >  	return 0;
> >  }
> >  
> > +static int mpc5121_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
> > +{
> > +	/* GPIO 28..31 are input only on MPC5121 */
> > +	if (gpio >= 28)
> > +		return -EINVAL;
> > +
> > +	return mpc8xxx_gpio_dir_out(gc, gpio, val);
> > +}
> >  static int mpc8xxx_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
> 
> This actually caused a build failure.  mpc8xxx_gpio_dir_out() was used before
> it was declared.  I moved the symbol to immediately below and applied anyway,
> but how did it compile for you?  Should I drop this patch until you retest?

Huh, I am surprised as well. Will investigate tomorrow. Sorry for the
inconvenience.

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH RESEND] gpio: mpc8xxx: don't allow input-only pins to be output for MPC5121
From: Grant Likely @ 2011-12-13 18:16 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: linuxppc-dev, Anatolij Gustschin, linux-kernel, Linus Walleij
In-Reply-To: <1323767568-9565-1-git-send-email-w.sang@pengutronix.de>

On Tue, Dec 13, 2011 at 10:12:48AM +0100, Wolfram Sang wrote:
> Add a 5121-custom reject if an input-only pin is requested to be output
> (see 18.3.1.1 in the refman). Also, rewrite mach-specific quirk setup to
> consume less lines which scales better.
> 
> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> ---
>  drivers/gpio/gpio-mpc8xxx.c |   17 ++++++++++++-----
>  1 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
> index ec3fcf0..25dc736 100644
> --- a/drivers/gpio/gpio-mpc8xxx.c
> +++ b/drivers/gpio/gpio-mpc8xxx.c
> @@ -115,6 +115,14 @@ static int mpc8xxx_gpio_dir_in(struct gpio_chip *gc, unsigned int gpio)
>  	return 0;
>  }
>  
> +static int mpc5121_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
> +{
> +	/* GPIO 28..31 are input only on MPC5121 */
> +	if (gpio >= 28)
> +		return -EINVAL;
> +
> +	return mpc8xxx_gpio_dir_out(gc, gpio, val);
> +}
>  static int mpc8xxx_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)

This actually caused a build failure.  mpc8xxx_gpio_dir_out() was used before
it was declared.  I moved the symbol to immediately below and applied anyway,
but how did it compile for you?  Should I drop this patch until you retest?

g.

^ permalink raw reply

* Re: [PATCH RESEND] gpio: mpc8xxx: don't allow input-only pins to be output for MPC5121
From: Grant Likely @ 2011-12-13 18:07 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: linuxppc-dev, Anatolij Gustschin, linux-kernel, Linus Walleij
In-Reply-To: <1323767568-9565-1-git-send-email-w.sang@pengutronix.de>

On Tue, Dec 13, 2011 at 10:12:48AM +0100, Wolfram Sang wrote:
> Add a 5121-custom reject if an input-only pin is requested to be output
> (see 18.3.1.1 in the refman). Also, rewrite mach-specific quirk setup to
> consume less lines which scales better.
> 
> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

Applied, thanks.

g.

> ---
>  drivers/gpio/gpio-mpc8xxx.c |   17 ++++++++++++-----
>  1 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
> index ec3fcf0..25dc736 100644
> --- a/drivers/gpio/gpio-mpc8xxx.c
> +++ b/drivers/gpio/gpio-mpc8xxx.c
> @@ -115,6 +115,14 @@ static int mpc8xxx_gpio_dir_in(struct gpio_chip *gc, unsigned int gpio)
>  	return 0;
>  }
>  
> +static int mpc5121_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
> +{
> +	/* GPIO 28..31 are input only on MPC5121 */
> +	if (gpio >= 28)
> +		return -EINVAL;
> +
> +	return mpc8xxx_gpio_dir_out(gc, gpio, val);
> +}
>  static int mpc8xxx_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
>  {
>  	struct of_mm_gpio_chip *mm = to_of_mm_gpio_chip(gc);
> @@ -340,11 +348,10 @@ static void __init mpc8xxx_add_controller(struct device_node *np)
>  	mm_gc->save_regs = mpc8xxx_gpio_save_regs;
>  	gc->ngpio = MPC8XXX_GPIO_PINS;
>  	gc->direction_input = mpc8xxx_gpio_dir_in;
> -	gc->direction_output = mpc8xxx_gpio_dir_out;
> -	if (of_device_is_compatible(np, "fsl,mpc8572-gpio"))
> -		gc->get = mpc8572_gpio_get;
> -	else
> -		gc->get = mpc8xxx_gpio_get;
> +	gc->direction_output = of_device_is_compatible(np, "fsl,mpc5121-gpio") ?
> +		mpc5121_gpio_dir_out : mpc8xxx_gpio_dir_out;
> +	gc->get = of_device_is_compatible(np, "fsl,mpc8572-gpio") ?
> +		mpc8572_gpio_get : mpc8xxx_gpio_get;
>  	gc->set = mpc8xxx_gpio_set;
>  	gc->to_irq = mpc8xxx_gpio_to_irq;
>  
> -- 
> 1.7.7.1
> 

^ permalink raw reply

* [PATCH V2] gpio: mpc8xxx: don't allow input-only pins to be output for MPC5121
From: Wolfram Sang @ 2011-12-13 17:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anatolij Gustschin, linuxppc-dev, Linus Walleij

Add a 5121-custom reject if an input-only pin is requested to be output
(see 18.3.1.1 in the refman). Also, rewrite mach-specific quirk setup to
consume less lines which scales better.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
---

Since V1, added an empty line after the new function.

 drivers/gpio/gpio-mpc8xxx.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
index ec3fcf0..bb4975f 100644
--- a/drivers/gpio/gpio-mpc8xxx.c
+++ b/drivers/gpio/gpio-mpc8xxx.c
@@ -115,6 +115,15 @@ static int mpc8xxx_gpio_dir_in(struct gpio_chip *gc, unsigned int gpio)
 	return 0;
 }
 
+static int mpc5121_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
+{
+	/* GPIO 28..31 are input only on MPC5121 */
+	if (gpio >= 28)
+		return -EINVAL;
+
+	return mpc8xxx_gpio_dir_out(gc, gpio, val);
+}
+
 static int mpc8xxx_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
 {
 	struct of_mm_gpio_chip *mm = to_of_mm_gpio_chip(gc);
@@ -340,11 +349,10 @@ static void __init mpc8xxx_add_controller(struct device_node *np)
 	mm_gc->save_regs = mpc8xxx_gpio_save_regs;
 	gc->ngpio = MPC8XXX_GPIO_PINS;
 	gc->direction_input = mpc8xxx_gpio_dir_in;
-	gc->direction_output = mpc8xxx_gpio_dir_out;
-	if (of_device_is_compatible(np, "fsl,mpc8572-gpio"))
-		gc->get = mpc8572_gpio_get;
-	else
-		gc->get = mpc8xxx_gpio_get;
+	gc->direction_output = of_device_is_compatible(np, "fsl,mpc5121-gpio") ?
+		mpc5121_gpio_dir_out : mpc8xxx_gpio_dir_out;
+	gc->get = of_device_is_compatible(np, "fsl,mpc8572-gpio") ?
+		mpc8572_gpio_get : mpc8xxx_gpio_get;
 	gc->set = mpc8xxx_gpio_set;
 	gc->to_irq = mpc8xxx_gpio_to_irq;
 
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH v2 01/19] mxc_udc: add workaround for ENGcm09152 for i.MX25
From: Felipe Balbi @ 2011-12-13 14:18 UTC (permalink / raw)
  To: Eric Bénard
  Cc: Greg Kroah-Hartman, open list:FREESCALE USB PER..., open list,
	Felipe Balbi, Sascha Hauer, open list:FREESCALE USB PER...,
	linux-arm-kernel
In-Reply-To: <1323785377-22463-1-git-send-email-eric@eukrea.com>

[-- Attachment #1: Type: text/plain, Size: 2468 bytes --]

Hi,

On Tue, Dec 13, 2011 at 03:09:37PM +0100, Eric Bénard wrote:
> this patch gives the possibility to workaround bug ENGcm09152
> on i.MX25 when the hardware workaround is also implemented on
> the board.
> It covers the workaround described on page 42 of the following Errata,
> titled "USB: UTMI_USBPHY VBUS input impedance implementation error" :
> http://cache.freescale.com/files/dsp/doc/errata/IMX25CE.pdf
> 
> Signed-off-by: Eric Bénard <eric@eukrea.com>
> Cc: Sascha Hauer <kernel@pengutronix.de>
> Cc: Greg Kroah-Hartman <gregkh@suse.de>
> Cc: Li Yang <leoli@freescale.com>

Sascha, are you taking this one or should I take it ? One comment below
though.

> ---
>  drivers/usb/gadget/fsl_mxc_udc.c |   22 +++++++++++++---------
>  1 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/usb/gadget/fsl_mxc_udc.c b/drivers/usb/gadget/fsl_mxc_udc.c
> index dcbc0a2..4aff05d 100644
> --- a/drivers/usb/gadget/fsl_mxc_udc.c
> +++ b/drivers/usb/gadget/fsl_mxc_udc.c
> @@ -23,7 +23,7 @@
>  static struct clk *mxc_ahb_clk;
>  static struct clk *mxc_usb_clk;
>  
> -/* workaround ENGcm09152 for i.MX35 */
> +/* workaround ENGcm09152 for i.MX25/35 */
>  #define USBPHYCTRL_OTGBASE_OFFSET	0x608
>  #define USBPHYCTRL_EVDO			(1 << 23)
>  
> @@ -89,16 +89,20 @@ eenahb:
>  void fsl_udc_clk_finalize(struct platform_device *pdev)
>  {
>  	struct fsl_usb2_platform_data *pdata = pdev->dev.platform_data;
> -	if (cpu_is_mx35()) {
> +	if (cpu_is_mx25() || cpu_is_mx35()) {
>  		unsigned int v;
> -
> -		/* workaround ENGcm09152 for i.MX35 */
> +		void __iomem *otgbase;
> +		if (cpu_is_mx25())
> +			otgbase = MX25_IO_ADDRESS(MX25_USB_BASE_ADDR +
> +					USBPHYCTRL_OTGBASE_OFFSET);
> +		else if (cpu_is_mx35())
> +			otgbase = MX35_IO_ADDRESS(MX35_USB_BASE_ADDR +
> +					USBPHYCTRL_OTGBASE_OFFSET);

I don't want to see cpu_is_* or machine_is_* or <plat/*> or <mach/*>
anywhere on drivers, please make sure to remove them for 3.4 merge
window the latest. You should also make sure that this driver has no
static globals. IOW, I would like to be able to have multiple instances
of this driver working fine.

Granted, you have no boards with such setup, but we made that mistake on
musb driver and now we have a bunch of other stuff to rework in order to
get our dual-MUSB board to work ;-)

I also see that fsl_udc_core.c is still not using dev_pm_ops, please fix
that too.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH v2 01/19] mxc_udc: add workaround for ENGcm09152 for i.MX25
From: Eric Bénard @ 2011-12-13 14:09 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Greg Kroah-Hartman, open list:FREESCALE USB PER..., open list,
	Felipe Balbi, Sascha Hauer, open list:FREESCALE USB PER...
In-Reply-To: <20111213101846.GC2619@pengutronix.de>

this patch gives the possibility to workaround bug ENGcm09152
on i.MX25 when the hardware workaround is also implemented on
the board.
It covers the workaround described on page 42 of the following Errata,
titled "USB: UTMI_USBPHY VBUS input impedance implementation error" :
http://cache.freescale.com/files/dsp/doc/errata/IMX25CE.pdf

Signed-off-by: Eric Bénard <eric@eukrea.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Li Yang <leoli@freescale.com>
---
 drivers/usb/gadget/fsl_mxc_udc.c |   22 +++++++++++++---------
 1 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/gadget/fsl_mxc_udc.c b/drivers/usb/gadget/fsl_mxc_udc.c
index dcbc0a2..4aff05d 100644
--- a/drivers/usb/gadget/fsl_mxc_udc.c
+++ b/drivers/usb/gadget/fsl_mxc_udc.c
@@ -23,7 +23,7 @@
 static struct clk *mxc_ahb_clk;
 static struct clk *mxc_usb_clk;
 
-/* workaround ENGcm09152 for i.MX35 */
+/* workaround ENGcm09152 for i.MX25/35 */
 #define USBPHYCTRL_OTGBASE_OFFSET	0x608
 #define USBPHYCTRL_EVDO			(1 << 23)
 
@@ -89,16 +89,20 @@ eenahb:
 void fsl_udc_clk_finalize(struct platform_device *pdev)
 {
 	struct fsl_usb2_platform_data *pdata = pdev->dev.platform_data;
-	if (cpu_is_mx35()) {
+	if (cpu_is_mx25() || cpu_is_mx35()) {
 		unsigned int v;
-
-		/* workaround ENGcm09152 for i.MX35 */
+		void __iomem *otgbase;
+		if (cpu_is_mx25())
+			otgbase = MX25_IO_ADDRESS(MX25_USB_BASE_ADDR +
+					USBPHYCTRL_OTGBASE_OFFSET);
+		else if (cpu_is_mx35())
+			otgbase = MX35_IO_ADDRESS(MX35_USB_BASE_ADDR +
+					USBPHYCTRL_OTGBASE_OFFSET);
+
+		/* workaround ENGcm09152 for i.MX25/35 */
 		if (pdata->workaround & FLS_USB2_WORKAROUND_ENGCM09152) {
-			v = readl(MX35_IO_ADDRESS(MX35_USB_BASE_ADDR +
-					USBPHYCTRL_OTGBASE_OFFSET));
-			writel(v | USBPHYCTRL_EVDO,
-				MX35_IO_ADDRESS(MX35_USB_BASE_ADDR +
-					USBPHYCTRL_OTGBASE_OFFSET));
+			v = readl(otgbase);
+			writel(v | USBPHYCTRL_EVDO, otgbase);
 		}
 	}
 
-- 
1.7.6.4

^ permalink raw reply related

* Re: [PATCH 3/4] ppc32/kprobe: complete kprobe and migrate exception frame
From: tiejun.chen @ 2011-12-13 10:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <4EE6DA8C.8090107@windriver.com>

>> You need to hook into "resume_kernel" instead.
>

I regenerate this with hooking into resume_kernel in below.

> Maybe I'm misunderstanding what you mean since as I recall you suggestion we
> should do this at the end of do_work.
> 
>> Also, we may want to simplify the whole thing, instead of checking user
>> vs. kernel first etc... we could instead have a single _TIF_WORK_MASK
>> which includes both the bits for user work and the new bit for kernel
>> work. With preempt, the kernel work bits would also include
>> _TIF_NEED_RESCHED.
>>
>> Then you have in the common exit path, a single test for that, with a
>> fast path that skips everything and just goes to "restore" for both
>> kernel and user.
>>
>> The only possible issue is the setting of dbcr0 for BookE and 44x and we
>> can keep that as a special case keyed of MSR_PR in the resume path under
>> ifdef BOOKE (we'll probably sanitize that later with some different
>> rework anyway). 
>>
>> So the exit path because something like:
>>
>> ret_from_except:
>> 	.. hard disable interrupts (unchanged) ...
>> 	read TIF flags
>> 	andi with _TIF_WORK_MASK
>> 		nothing set -> restore
>> 	check PR
>> 		set -> do_work_user
>> 		no set -> do_work_kernel (kprobes & preempt)
>> 		(both loop until relevant _TIF flags are all clear)
>> restore:
>> 	#ifdef BOOKE & 44x test PR & do dbcr0 stuff if needed
>> 	... nornal restore ...
> 
> Do you mean we should reorganize current ret_from_except for ppc32 as well?
> 

I assume it may not necessary to reorganize ret_from_except for *ppc32* .

>>>  do_user_signal:			/* r10 contains MSR_KERNEL here */
>>>  	ori	r10,r10,MSR_EE
>>>  	SYNC
>>> @@ -1202,6 +1204,30 @@ do_user_signal:			/* r10 contains MSR_KERNEL here */
>>>  	REST_NVGPRS(r1)
>>>  	b	recheck
>>>  
>>> +restore_kprobe:
>>> +	lwz	r3,GPR1(r1)
>>> +	subi    r3,r3,INT_FRAME_SIZE; /* Allocate a trampoline exception frame */
>>> +	mr	r4,r1
>>> +	bl	copy_exc_stack	/* Copy from the original to the trampoline */
>>> +
>>> +	/* Do real stw operation to complete stwu */
>>> +	mr	r4,r1
>>> +	addi	r4,r4,INT_FRAME_SIZE	/* Get kprobed entry */
>>> +	lwz	r5,GPR1(r1)		/* Backup r1 */
>>> +	stw	r4,GPR1(r1)		/* Now store that safely */
>> The above confuses me. Shouldn't you do instead something like
>>
>> 	lwz	r4,GPR1(r1)

Now GPR1(r1) is already pointed with new r1 in emulate_step().

>> 	subi	r3,r4,INT_FRAME_SIZE

Here we need this, 'mr r4,r1', since r1 holds current exception stack.

>> 	li	r5,INT_FRAME_SIZE
>> 	bl	memcpy

Then the current exception stack is migrated below the kprobed function stack.

stack flow:

--------------------------  -> old r1 when hit 'stwu r1, -AA(r1)' in our
        ^       ^           kprobed function entry.
        |       |
        |       |------------> AA allocated for the kprobed function
        |       |
        |       v
--------|-----------------  -> new r1, also GPR1(r1). It holds the kprobed
   ^    |                   function stack , -AA(r1).
   |    |
   |    |--------------------> INT_FRAME_SIZE for program exception
   |    |
   |    v
---|---------------------  -> r1 is updated to hold program exception stack.
   |
   |
   |------------------------> migrate the exception stack (r1) below the
   |                        kprobed after memcpy with INT_FRAME_SIZE.
   v
-------------------------  -> reroute this as r1 for program exception stack.

>>
> 
> Anyway I'll try this if you think memcpy is fine/safe in exception return codes.
> 
>> To start with, then you need to know the "old" r1 value which may or may
>> not be related to your current r1. The emulation code should stash it
> 
> If the old r1 is not related to our current r1, it shouldn't be possible to go
> restore_kprob since we set that new flag only for the current.
> 
> If I'm wrong please correct me :)

If you agree what I say above, and its also avoid any issue introduced with
orig_gpr3, please check the follow:
=========
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 56212bc..277029d 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -813,9 +813,40 @@ restore_user:

 #ifdef CONFIG_PREEMPT
        b       restore
+#endif

-/* N.B. the only way to get here is from the beq following ret_from_except. */
 resume_kernel:
+#ifdef CONFIG_KPROBES
+       /* check current_thread_info, _TIF_EMULATE_STACK_STORE */
+       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
+       lwz     r0,TI_FLAGS(r9)
+       andis.  r0,r0,_TIF_EMULATE_STACK_STORE@h
+       beq+    restore_kernel
+
+       addi    r9,r1,INT_FRAME_SIZE    /* Get the kprobed function entry */
+
+       lwz     r3,GPR1(r1)
+       subi    r3,r3,INT_FRAME_SIZE    /* dst: Allocate a trampoline exception
frame */
+       mr      r4,r1                   /* src:  current exception frame */
+       li      r5,INT_FRAME_SIZE       /* size: INT_FRAME_SIZE */
+       mr      r1,r3                   /* Reroute the trampoline frame to r1 */
+       bl      memcpy                  /* Copy from the original to the
trampoline */
+
+       /* Do real store operation to complete stwu */
+       lwz     r5,GPR1(r1)
+       stw     r9,0(r5)
+
+       /* Clear _TIF_EMULATE_STACK_STORE flag */
+       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
+       lwz     r0,TI_FLAGS(r9)
+       rlwinm  r0,r0,0,_TIF_EMULATE_STACK_STORE
+       stw     r0,TI_FLAGS(r9)
+
+restore_kernel:
+#endif
+
+#ifdef CONFIG_PREEMPT
+/* N.B. the only way to get here is from the beq following ret_from_except. */
        /* check current_thread_info->preempt_count */
        rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
        lwz     r0,TI_PREEMPT(r9)
@@ -844,8 +875,6 @@ resume_kernel:
         */
        bl      trace_hardirqs_on
 #endif
-#else
-resume_kernel:
 #endif /* CONFIG_PREEMPT */

        /* interrupts are hard-disabled at this point */

Tiejun

^ permalink raw reply related

* Re: [PATCH 01/19] mxc_udc: add workaround for ENGcm09152 for i.MX25
From: Wolfram Sang @ 2011-12-13 10:18 UTC (permalink / raw)
  To: Eric Bénard
  Cc: Greg Kroah-Hartman, open list:FREESCALE USB PER..., open list,
	Felipe Balbi, Sascha Hauer, open list:FREESCALE USB PER...,
	linux-arm-kernel
In-Reply-To: <1323757911-25217-1-git-send-email-eric@eukrea.com>

[-- Attachment #1: Type: text/plain, Size: 628 bytes --]

On Tue, Dec 13, 2011 at 07:31:33AM +0100, Eric Bénard wrote:
> this patch gives the possibility to workaround bug ENGcm09152
> on i.MX25 when the hardware workaround is also implemented on
> the board.
> It covers the workaround described on page 42 of the following Errata :
> http://cache.freescale.com/files/dsp/doc/errata/IMX25CE.pdf

Nit: The title of the ENG would help reviewing, like it is preferred to
cite patch titles when mentioning the commit-id.

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH 01/19] mxc_udc: add workaround for ENGcm09152 for i.MX25
From: Sascha Hauer @ 2011-12-13 10:17 UTC (permalink / raw)
  To: Eric Bénard
  Cc: Greg Kroah-Hartman, open list:FREESCALE USB PER..., open list,
	Felipe Balbi, Sascha Hauer, open list:FREESCALE USB PER...,
	linux-arm-kernel
In-Reply-To: <1323757911-25217-1-git-send-email-eric@eukrea.com>

Hi Eric,

To make the handling of your patches easier, please

- send a cover letter for multiple patches
- separate them at least by fixes and features
- if possible, do not send patches aimed at multiple
  maintainers in a single series.

I picked every patch looking like a fix for now myself,
but please remember next time. I'll have a look at the non-fix
patches later.

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH 3/4] ppc32/kprobe: complete kprobe and migrate exception frame
From: tiejun.chen @ 2011-12-13 10:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <4EE70B13.7000102@windriver.com>

Sorry please ignore this email since I'm missing something here :(

Tiejun

tiejun.chen wrote:
>>> You need to hook into "resume_kernel" instead.
>> Maybe I'm misunderstanding what you mean since as I recall you suggestion we
>> should do this at the end of do_work.
>>
> 
> I regenerate this with hooking into resume_kernel in below.
> 
>>> Also, we may want to simplify the whole thing, instead of checking user
>>> vs. kernel first etc... we could instead have a single _TIF_WORK_MASK
>>> which includes both the bits for user work and the new bit for kernel
>>> work. With preempt, the kernel work bits would also include
>>> _TIF_NEED_RESCHED.
>>>
>>> Then you have in the common exit path, a single test for that, with a
>>> fast path that skips everything and just goes to "restore" for both
>>> kernel and user.
>>>
>>> The only possible issue is the setting of dbcr0 for BookE and 44x and we
>>> can keep that as a special case keyed of MSR_PR in the resume path under
>>> ifdef BOOKE (we'll probably sanitize that later with some different
>>> rework anyway). 
>>>
>>> So the exit path because something like:
>>>
>>> ret_from_except:
>>> 	.. hard disable interrupts (unchanged) ...
>>> 	read TIF flags
>>> 	andi with _TIF_WORK_MASK
>>> 		nothing set -> restore
>>> 	check PR
>>> 		set -> do_work_user
>>> 		no set -> do_work_kernel (kprobes & preempt)
>>> 		(both loop until relevant _TIF flags are all clear)
>>> restore:
>>> 	#ifdef BOOKE & 44x test PR & do dbcr0 stuff if needed
>>> 	... nornal restore ...
>> Do you mean we should reorganize current ret_from_except for ppc32 as well?
> 
> I assume it may not necessary to reorganize ret_from_except for *ppc32*.
> 
>>>>  do_user_signal:			/* r10 contains MSR_KERNEL here */
>>>>  	ori	r10,r10,MSR_EE
>>>>  	SYNC
>>>> @@ -1202,6 +1204,30 @@ do_user_signal:			/* r10 contains MSR_KERNEL here */
>>>>  	REST_NVGPRS(r1)
>>>>  	b	recheck
>>>>  
>>>> +restore_kprobe:
>>>> +	lwz	r3,GPR1(r1)
>>>> +	subi    r3,r3,INT_FRAME_SIZE; /* Allocate a trampoline exception frame */
>>>> +	mr	r4,r1
>>>> +	bl	copy_exc_stack	/* Copy from the original to the trampoline */
>>>> +
>>>> +	/* Do real stw operation to complete stwu */
>>>> +	mr	r4,r1
>>>> +	addi	r4,r4,INT_FRAME_SIZE	/* Get kprobed entry */
>>>> +	lwz	r5,GPR1(r1)		/* Backup r1 */
>>>> +	stw	r4,GPR1(r1)		/* Now store that safely */
>>> The above confuses me. Shouldn't you do instead something like
>>>
>>> 	lwz	r4,GPR1(r1)
> 
> Now GPR1(r1) is already pointed with new r1 in emulate_step().
> 
>>> 	subi	r3,r4,INT_FRAME_SIZE
> 
> Here we need this, 'mr r4,r1', since r1 holds current exception stack.
> 
>>> 	li	r5,INT_FRAME_SIZE
>>> 	bl	memcpy
> 
> Then the current exception stack is migrated below the kprobed function stack.
> 
> stack flow:
> 
> --------------------------  -> old r1 when hit 'stwu r1, -AA(r1)' in our
>         ^       ^           kprobed function entry.
>         |       |
>         |       |------------> AA allocated for the kprobed function
>         |       |
>         |       v
> --------|-----------------  -> new r1, also GPR1(r1). It holds the kprobed
>    ^    |                   function stack , -AA(r1).
>    |    |
>    |    |--------------------> INT_FRAME_SIZE for program exception
>    |    |
>    |    v
> ---|---------------------  -> r1 is updated to hold program exception stack.
>    |
>    |
>    |------------------------> migrate the exception stack (r1) below the
>    |                        kprobed after memcpy with INT_FRAME_SIZE.
>    v
> -------------------------  -> reroute this as r1 for program exception stack.
> 
>> Anyway I'll try this if you think memcpy is fine/safe in exception return codes.
>>
>>> To start with, then you need to know the "old" r1 value which may or may
>>> not be related to your current r1. The emulation code should stash it
>> If the old r1 is not related to our current r1, it shouldn't be possible to go
>> restore_kprob since we set that new flag only for the current.
> 		
> If you agree what I say above, please check the follow:
> ======
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 56212bc..b6554c1 100644
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 56212bc..b6554c1 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -813,12 +813,40 @@ restore_user:
> 
>  #ifdef CONFIG_PREEMPT
>         b       restore
> +#endif
> 
> -/* N.B. the only way to get here is from the beq following ret_from_except. */
>  resume_kernel:
>         /* check current_thread_info->preempt_count */
>         rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
>         lwz     r0,TI_PREEMPT(r9)
> +       andis.  r0,r0,_TIF_EMULATE_STACK_STORE@h
> +       beq+    restore
> +
> +       lwz     r3,GPR1(r1)
> +       subi    r3,r3,INT_FRAME_SIZE    /* Allocate a trampoline exception frame */
> +       mr      r4,r1
> +       li      r5,INT_FRAME_SIZE
> +       bl      memcpy                  /* Copy from the original to the
> trampoline */
> +
> +       /* Do real store operation to complete stwu */
> +       addi    r4,r1,INT_FRAME_SIZE    /* Get the kprobed function entry */
> +       lwz     r5,GPR1(r1)
> +       stw     r4,0(r5)                /* Now store that safely */
> +
> +       /* Reroute the trampoline frame to r1 */
> +       subi    r1,r5,INT_FRAME_SIZE
> +
> +       /* Clear _TIF_EMULATE_STACK_STORE flag */
> +       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
> +       lwz     r0,TI_FLAGS(r9)
> +       rlwinm  r0,r0,0,_TIF_EMULATE_STACK_STORE
> +       stw     r0,TI_FLAGS(r9)
> +
> +#ifdef CONFIG_PREEMPT
> +/* N.B. the only way to get here is from the beq following ret_from_except. */
> +       /* check current_thread_info->preempt_count */
> +       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
> +       lwz     r0,TI_PREEMPT(r9)
>         cmpwi   0,r0,0          /* if non-zero, just restore regs and return */
>         bne     restore
>         lwz     r0,TI_FLAGS(r9)
> @@ -844,8 +872,6 @@ resume_kernel:
>          */
>         bl      trace_hardirqs_on
>  #endif
> -#else
> -resume_kernel:
>  #endif /* CONFIG_PREEMPT */
> 
>         /* interrupts are hard-disabled at this point */
> 
> Tiejun
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox