LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ppc476: Enable a linker work around for IBM errata #46
From: Alistair Popple @ 2014-02-24  7:00 UTC (permalink / raw)
  To: benh; +Cc: Alistair Popple, linuxppc-dev

This patch adds an option to enable a work around for an icache bug on
476 that can cause execution of stale instructions when falling
through pages (IBM errata #46). It requires a recent version of
binutils which supports the --ppc476-workaround option.

The work around enables the appropriate linker options and ensures
that all module output sections are aligned to 4K page boundaries. The
work around is only required when building modules.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/Makefile                         |    5 +++++
 arch/powerpc/platforms/44x/Kconfig            |   14 ++++++++++++++
 arch/powerpc/platforms/44x/ppc476_modules.lds |   15 +++++++++++++++
 3 files changed, 34 insertions(+)
 create mode 100644 arch/powerpc/platforms/44x/ppc476_modules.lds

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 0f4344e..2b13616 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -158,6 +158,11 @@ CHECKFLAGS	+= -m$(CONFIG_WORD_SIZE) -D__powerpc__ -D__powerpc$(CONFIG_WORD_SIZE)
 
 KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
 
+ifeq ($(CONFIG_476FPE_ERR46),y)
+	KBUILD_LDFLAGS_MODULE += --ppc476-workaround \
+		-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
+endif
+
 # No AltiVec or VSX instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
 KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index d6c7506..b817bf58 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -324,6 +324,20 @@ config APM821xx
 	select IBM_EMAC_EMAC4
 	select IBM_EMAC_TAH
 
+config 476FPE_ERR46
+	depends on 476FPE
+	bool "Enable linker work around for PPC476FPE errata #46"
+	help
+	  This option enables a work around for an icache bug on 476
+	  that can cause execution of stale instructions when falling
+	  through pages (IBM errata #46). It requires a recent version
+	  of binutils which supports the --ppc476-workaround option.
+
+	  The work around enables the appropriate linker options and
+	  ensures that all module output sections are aligned to 4K
+	  page boundaries. The work around is only required when
+	  building modules.
+
 # 44x errata/workaround config symbols, selected by the CPU models above
 config IBM440EP_ERR42
 	bool
diff --git a/arch/powerpc/platforms/44x/ppc476_modules.lds b/arch/powerpc/platforms/44x/ppc476_modules.lds
new file mode 100644
index 0000000..9fec5d3
--- /dev/null
+++ b/arch/powerpc/platforms/44x/ppc476_modules.lds
@@ -0,0 +1,15 @@
+SECTIONS
+{
+	.text : ALIGN(4096)
+	{
+		*(.text .text.* .fixup)
+	}
+	.init.text : ALIGN(4096)
+	{
+		*(.init.text .init.text.*)
+	}
+	.exit.text : ALIGN(4096)
+	{
+		*(.exit.text .exit.text.*)
+	}
+}
-- 
1.7.10.4

^ permalink raw reply related

* Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
From: Joonsoo Kim @ 2014-02-24  5:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Han Pingtian, Nishanth Aravamudan, Matt Mackall, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Anton Blanchard,
	David Rientjes, linuxppc-dev, Wanpeng Li
In-Reply-To: <alpine.DEB.2.10.1402181033480.28964@nuc>

On Tue, Feb 18, 2014 at 10:38:01AM -0600, Christoph Lameter wrote:
> On Mon, 17 Feb 2014, Joonsoo Kim wrote:
> 
> > On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
> > > Here is another patch with some fixes. The additional logic is only
> > > compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
> > >
> > > Subject: slub: Memoryless node support
> > >
> > > Support memoryless nodes by tracking which allocations are failing.
> >
> > I still don't understand why this tracking is needed.
> 
> Its an optimization to avoid calling the page allocator to figure out if
> there is memory available on a particular node.
> 
> > All we need for allcation targeted to memoryless node is to fallback proper
> > node, that it, numa_mem_id() node of targeted node. My previous patch
> > implements it and use proper fallback node on every allocation code path.
> > Why this tracking is needed? Please elaborate more on this.
> 
> Its too slow to do that on every alloc. One needs to be able to satisfy
> most allocations without switching percpu slabs for optimal performance.

I don't think that we need to switch percpu slabs on every alloc.
Allocation targeted to specific node is rare. And most of these allocations
may be targeted to either numa_node_id() or numa_mem_id(). My patch considers
these cases, so most of allocations are processed by percpu slabs. There is
no suboptimal performance.

> 
> > > Allocations targeted to the nodes without memory fall back to the
> > > current available per cpu objects and if that is not available will
> > > create a new slab using the page allocator to fallback from the
> > > memoryless node to some other node.
> 
> And what about the next alloc? Assuem there are N allocs from a memoryless
> node this means we push back the partial slab on each alloc and then fall
> back?
> 
> > >  {
> > >  	void *object;
> > > -	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > > +	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> > >
> > >  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > >  	if (object || node != NUMA_NO_NODE)
> >
> > This isn't enough.
> > Consider that allcation targeted to memoryless node.
> 
> It will not common get there because of the tracking. Instead a per cpu
> object will be used.
> > get_partial_node() always fails even if there are some partial slab on
> > memoryless node's neareast node.
> 
> Correct and that leads to a page allocator action whereupon the node will
> be marked as empty.

Why do we need to request to a page allocator if there is partial slab?
Checking whether node is memoryless or not is really easy, so we don't need
to skip this. To skip this is suboptimal solution.

> > We should fallback to some proper node in this case, since there is no slab
> > on memoryless node.
> 
> NUMA is about optimization of memory allocations. It is often *not* about
> correctness but heuristics are used in many cases. F.e. see the zone
> reclaim logic, zone reclaim mode, fallback scenarios in the page allocator
> etc etc.

Okay. But, 'do our best' is preferable to me.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/7] IBM Akebono: Add support for a new PHY interface to the IBM emac driver
From: Alistair Popple @ 2014-02-24  2:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: netdev@vger.kernel.org, David S. Miller,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org
In-Reply-To: <20140221111823.GA8783@e106331-lin.cambridge.arm.com>

On Fri, 21 Feb 2014 11:18:23 Mark Rutland wrote:
> On Fri, Feb 21, 2014 at 06:31:28AM +0000, Alistair Popple wrote:

[...]

> 
> > +       /* Check for RGMII flags */
> > +       if (of_get_property(ofdev->dev.of_node, "has-mdio", NULL))
> > +               dev->flags |= EMAC_RGMII_FLAG_HAS_MDIO;
> 
> You can use of_property_read_bool here.

Thanks. I will update it.

Regards,

Alistair

> 
> Cheers,
> Mark.

^ permalink raw reply

* Re: [PATCH v4 0/3] powerpc/pseries: fix issues in suspend/resume code
From: Benjamin Herrenschmidt @ 2014-02-24  1:53 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: nfont, linuxppc-dev
In-Reply-To: <1392843415-17153-1-git-send-email-tyreld@linux.vnet.ibm.com>

On Wed, 2014-02-19 at 12:56 -0800, Tyrel Datwyler wrote:
> This patchset fixes a couple of issues encountered in the suspend/resume code
> base. First when using the kernel device tree update code update-nodes is
> unnecessarily called more than once. Second the cpu cache lists are not
> updated after a suspend/resume which under certain conditions may cause a
> panic. Finally, since the cache list fix utilzes in kernel device tree update
> code a means for telling drmgr that updating the device tree from userspace
> is unnecessary.

This series breaks the !SMP build.

Cheers,
Ben.

> Changes from v3:
> - Updated patch descriptions to better reflect the behavior/interface changes
>   and why they are acceptable per Ben's concerns.
> 
> Changes from v2:
> - Moved dynamic configuration update code into pseries specific routine
>   per Nathan's suggestion.
> 
> Changes from v1:
> - Fixed several commit message typos
> - Fixed authorship of first two patches
> 
> Haren Myneni (2):
>   powerpc/pseries: Device tree should only be updated once after
>     suspend/migrate
>   powerpc/pseries: Update dynamic cache nodes for suspend/resume
>     operation
> 
> Tyrel Datwyler (1):
>   powerpc/pseries: Report in kernel device tree update to drmgr
> 
>  arch/powerpc/include/asm/rtas.h           |  1 +
>  arch/powerpc/platforms/pseries/mobility.c | 26 +++++++-----------
>  arch/powerpc/platforms/pseries/suspend.c  | 44 ++++++++++++++++++++++++++++++-
>  3 files changed, 54 insertions(+), 17 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH 1/7] IBM Akebono: Add a SDHCI platform driver
From: Alistair Popple @ 2014-02-24  0:38 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: devicetree, cjb, linux-mmc, linux-kernel, linuxppc-dev
In-Reply-To: <1773685.WfxOP0FuHM@wuerfel>

On Fri, 21 Feb 2014 15:14:30 Arnd Bergmann wrote:
> On Friday 21 February 2014 17:31:27 Alistair Popple wrote:
> > +config MMC_SDHCI_OF_476GTR
> > +	tristate "SDHCI OF support for the IBM PPC476GTR SoC"
> > +	depends on MMC_SDHCI_PLTFM
> > +	depends on PPC_OF
> > +	help
> > +	  This selects the Secure Digital Host Controller Interface (SDHCI)
> > +	  found on the PPC476GTR SoC.
> > +
> > +	  If you have a controller with this interface, say Y or M here.
> > +
> > +	  If unsure, say N.
> 
> Your driver doesn't actually do anything beyond what is in the common
> sdhci-pltfm.c infrastructure. IMHO you really shoulnd't need a SoC
> specific abstraction for it at all and instead add a generic
> platform driver registration into sdhci-pltfm.c. I'd suggest
> you use "generic-sdhci" (similar to what we do for usb-ohci and usb-ehci
> now) as the compatible string and change your device tree to claim
> compatibility with that and your soc-specific string.

That's a reasonable point. I guess I was just following the example set by the 
other sdhci-* drivers. However on review they're not as generic as this one so 
I will merge this into sdhci-pltfm.c as suggested.

- Alistair

> 	Arnd

^ permalink raw reply

* Re: [PATCH 4/7] ECHI Platform: Merge ppc-of EHCI driver into the ehci-platform driver
From: Alistair Popple @ 2014-02-24  0:28 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mark Rutland, devicetree, gregkh, linux-usb, linux-kernel,
	Tony Prisk, linuxppc-dev
In-Reply-To: <Pine.LNX.4.44L0.1402211037300.1273-100000@iolanthe.rowland.org>

On Fri, 21 Feb 2014 10:41:50 Alan Stern wrote:
> On Fri, 21 Feb 2014, Alistair Popple wrote:
> > Currently the ppc-of driver uses the compatibility string
> > "usb-ehci". This means platforms that use device-tree and implement an
> > EHCI compatible interface have to either use the ppc-of driver or add
> > a compatible line to the ehci-platform driver. It would be more
> > appropriate for the platform driver to be compatible with "usb-ehci"
> > as non-powerpc platforms are also beginning to utilise device-tree.
> > 
> > This patch merges the device tree property parsing from ehci-ppc-of
> > into the platform driver and adds a "usb-ehci" compatibility
> > string. The existing ehci-ppc-of driver is removed and the 440EPX
> > specific quirks are added to the ehci-platform driver.
> > 
> > Signed-off-by: Alistair Popple <alistair@popple.id.au>
> 
> This patch is also out of date.  The compatibility string used by the
> ehci-platform driver is "generic-ehci".

Ok. I will switch to using "generic-ehci" in the Akebono device tree.

> There remains the question of whether to merge ehci-ppc-of into
> ehci-platform.  This would be a rather invasive change, but I suppose
> we could do it.  With adjustments along the lines suggested by Mark
> Rutland.

As I can use "generic-ehci" to support Akebono I will drop this patch. I can 
look at merging ehci-ppc-of at some later date if desired (thanks Mark & Tony 
for reviewing though).

- Alistair

> Alan Stern

^ permalink raw reply

* Re: [PATCH 3/7] IBM Akebono: Add support to the OHCI platform driver for PPC476GTR
From: Alistair Popple @ 2014-02-24  0:20 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: gregkh, linux-usb, linuxppc-dev, linux-kernel, devicetree
In-Reply-To: <16991392.luyzl1NYP3@wuerfel>

On Fri, 21 Feb 2014 15:16:52 Arnd Bergmann wrote:
> On Friday 21 February 2014 17:31:29 Alistair Popple wrote:
> > +static const struct of_device_id ohci_of_match[] = {
> > +       { .compatible = "usb-ohci", },
> > +       {},
> > +};
> > +
> > 
> >  static const struct platform_device_id ohci_platform_table[] = {
> >  
> >         { "ohci-platform", 0 },
> >         { }
> > 
> > @@ -198,6 +209,7 @@ static struct platform_driver ohci_platform_driver = {
> > 
> >                 .owner  = THIS_MODULE,
> >                 .name   = "ohci-platform",
> >                 .pm     = &ohci_platform_pm_ops,
> > 
> > +               .of_match_table = ohci_of_match,
> > 
> >         }
> >  
> >  };
> 
> Linux-next already has a patch to add an of_match_table in this driver,
> using
> 
> static const struct of_device_id ohci_platform_ids[] = {
>         { .compatible = "generic-ohci", },
>         { }
> };
> 
> I think you should just use that string on your platform.

Excellent! I will drop this patch and use "generic-ohci" instead. Thanks for 
pointing this out.

- Alistair

> 	Arnd

^ permalink raw reply

* Re: [patch 03/26] powerpc: eeh: Kill another abuse of irq_desc
From: Benjamin Herrenschmidt @ 2014-02-23 22:26 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Peter Zijlstra, Ingo Molnar, ppc, LKML, Gavin Shan
In-Reply-To: <20140223212736.562906212@linutronix.de>

The problems iirc have to do with drivers doing stupid things, but I'll
let Gavin comment further and fix that up properly.

Cheers,
Ben.

^ permalink raw reply

* [patch 03/26] powerpc: eeh: Kill another abuse of irq_desc
From: Thomas Gleixner @ 2014-02-23 21:40 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, ppc, Gavin Shan
In-Reply-To: <20140223212703.511977310@linutronix.de>

commit 91150af3a (powerpc/eeh: Fix unbalanced enable for IRQ) is
another brilliant example of trainwreck engineering.

The patch "fixes" the issue of an unbalanced call to irq_enable()
which causes a prominent warning by checking the disabled state of the
interrupt line and call conditionally into the core code.

This is wrong in two aspects:

1) The warning is there to tell users, that they need to fix their
   asymetric enable/disable patterns by finding the root cause and
   solving it there.

   It's definitely not meant to work around it by conditionally
   calling into the core code depending on the random state of the irq
   line.

   Asymetric irq_disable/enable calls are a clear sign of wrong usage
   of the interfaces which have to be cured at the root and not by
   somehow hacking around it.

2) The abuse of core internal data structure instead of using the
   proper interfaces for retrieving the information for the 'hack
   around'

   irq_desc is core internal and it's clear enough stated.

Replace at least the irq_desc abuse with the proper functions and add
a big fat comment why this is absurd and completely wrong.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: ppc <linuxppc-dev@lists.ozlabs.org>
---
 arch/powerpc/kernel/eeh_driver.c |   26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Index: tip/arch/powerpc/kernel/eeh_driver.c
===================================================================
--- tip.orig/arch/powerpc/kernel/eeh_driver.c
+++ tip/arch/powerpc/kernel/eeh_driver.c
@@ -143,15 +143,31 @@ static void eeh_disable_irq(struct pci_d
 static void eeh_enable_irq(struct pci_dev *dev)
 {
 	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-	struct irq_desc *desc;
 
 	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
 		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
-
-		desc = irq_to_desc(dev->irq);
-		if (desc && desc->depth > 0)
+		/*
+		 * FIXME !!!!!
+		 *
+		 * This is just ass backwards. This maze has
+		 * unbalanced irq_enable/disable calls. So instead of
+		 * finding the root cause it works around the warning
+		 * in the irq_enable code by conditionally calling
+		 * into it.
+		 *
+		 * That's just wrong.The warning in the core code is
+		 * there to tell people to fix their assymetries in
+		 * their own code, not by abusing the core information
+		 * to avoid it.
+		 *
+		 * I so wish that the assymetry would be the other way
+		 * round and a few more irq_disable calls render that
+		 * shit unusable forever.
+		 *
+		 *	tglx
+		 */
+		if (irqd_irq_disabled(irq_get_irq_data(dev->irq))
 			enable_irq(dev->irq);
-	}
 }
 
 /**

^ permalink raw reply

* [patch 02/26] powerpc:evh_pic: Kill irq_desc abuse
From: Thomas Gleixner @ 2014-02-23 21:40 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ashish Kalra, Ingo Molnar, ppc, Timur Tabi
In-Reply-To: <20140223212703.511977310@linutronix.de>

I'm really grumpy about this one. The line:

#include "../../../kernel/irq/settings.h"

should have been an alarm sign for all people who added their SOB to
this trainwreck.

When I cleaned up the mess people made with interrupt descriptors a
few years ago, I warned that I'm going to hunt down new offenders and
treat them with stinking trouts. In this case I'll use frozen shark
for a better educational value.

The whole idiocy which was done there could have been avoided with two
lines of perfectly fine code. And do not complain about the lack of
correct examples in tree.

The solution is simple:

  Remove the brainfart and use the proper functions, which should
  have been used in the first place

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ashish Kalra <ashish.kalra@freescale.com>
Cc: Timur Tabi <timur@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: ppc <linuxppc-dev@lists.ozlabs.org>
---
 arch/powerpc/sysdev/ehv_pic.c |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

Index: tip/arch/powerpc/sysdev/ehv_pic.c
===================================================================
--- tip.orig/arch/powerpc/sysdev/ehv_pic.c
+++ tip/arch/powerpc/sysdev/ehv_pic.c
@@ -28,8 +28,6 @@
 #include <asm/ehv_pic.h>
 #include <asm/fsl_hcalls.h>
 
-#include "../../../kernel/irq/settings.h"
-
 static struct ehv_pic *global_ehv_pic;
 static DEFINE_SPINLOCK(ehv_pic_lock);
 
@@ -113,17 +111,13 @@ static unsigned int ehv_pic_type_to_vecp
 int ehv_pic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 {
 	unsigned int src = virq_to_hw(d->irq);
-	struct irq_desc *desc = irq_to_desc(d->irq);
 	unsigned int vecpri, vold, vnew, prio, cpu_dest;
 	unsigned long flags;
 
 	if (flow_type == IRQ_TYPE_NONE)
 		flow_type = IRQ_TYPE_LEVEL_LOW;
 
-	irq_settings_clr_level(desc);
-	irq_settings_set_trigger_mask(desc, flow_type);
-	if (flow_type & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW))
-		irq_settings_set_level(desc);
+	irqd_set_trigger_type(d, flow_type);
 
 	vecpri = ehv_pic_type_to_vecpri(flow_type);
 
@@ -144,7 +138,7 @@ int ehv_pic_set_irq_type(struct irq_data
 	ev_int_set_config(src, vecpri, prio, cpu_dest);
 
 	spin_unlock_irqrestore(&ehv_pic_lock, flags);
-	return 0;
+	return IRQ_SET_MASK_OK_NOCOPY;
 }
 
 static struct irq_chip ehv_pic_irq_chip = {

^ permalink raw reply

* [patch 01/26] powerpc: irq: Use generic_handle_irq
From: Thomas Gleixner @ 2014-02-23 21:40 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, ppc
In-Reply-To: <20140223212703.511977310@linutronix.de>

No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: ppc <linuxppc-dev@lists.ozlabs.org>
---
 arch/powerpc/kernel/irq.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Index: tip/arch/powerpc/kernel/irq.c
===================================================================
--- tip.orig/arch/powerpc/kernel/irq.c
+++ tip/arch/powerpc/kernel/irq.c
@@ -465,7 +465,6 @@ static inline void check_stack_overflow(
 
 void __do_irq(struct pt_regs *regs)
 {
-	struct irq_desc *desc;
 	unsigned int irq;
 
 	irq_enter();
@@ -487,11 +486,8 @@ void __do_irq(struct pt_regs *regs)
 	/* And finally process it */
 	if (unlikely(irq == NO_IRQ))
 		__get_cpu_var(irq_stat).spurious_irqs++;
-	else {
-		desc = irq_to_desc(irq);
-		if (likely(desc))
-			desc->handle_irq(irq, desc);
-	}
+	else
+		generic_handle_irq(irq);
 
 	trace_irq_exit(regs);
 

^ permalink raw reply

* Re: [PATCH 5/5] powerpc/powernv: Make PHB diag-data output short
From: Gavin Shan @ 2014-02-23  4:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1393013115.6771.98.camel@pasglop>

On Sat, Feb 22, 2014 at 07:05:15AM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2014-02-21 at 19:53 +0800, Gavin Shan wrote:
>> According to Ben's suggestion, the patch makes the PHB diag-data
>> dump looks a bit short by printing multiple values in one line
>> and outputing "-" for zero fields.
>> 
>> After the patch applied, the PHB diag-data dump looks like:
>
>Actually, I wouldn't do that "-" thing, I would leave zeros as
>zeros but I would remove lines that have all zeros.
>

Ok. I'll change it in next revision :-)

>Additionally, we might want to consider what if we can get rid
>of more fields for INF, or maybe even not dump them by default
>and just count them (should we have counters in sysfs ?)
>

Yes, I'll remove dumping for INF and have a sysfs entry for the
INF counter, which would be separate patch in next revision.

>One thing I'm tempted to do is turn the full logs into actual
>error logs (sent to FSP) and only display a "analyzed" version
>in the kernel, something that decodes the PEST for example
>and indicates if it's an DMA or MMIO error, the address, etc...
>

Ok. I'll try to do it in next revision :-)

Thanks,
Gavin

>> PHB3 PHB#3 Diag-data (Version: 1)
>> 
>>   brdgCtl:     00000002
>>   UtlSts:      - - -
>>   RootSts:     0000000f 00400000 b0830008 00100147 00002000
>>   RootErrSts:  - - -
>>   RootErrLog:  - - - -
>>   RootErrLog1: - - -
>>   nFir:        - 0030006e00000000 -
>>   PhbSts:      0000001c00000000 -
>>   Lem:         0000000000100000 42498e327f502eae -
>>   PhbErr:      - - - -
>>   OutErr:      - - - -
>>   InAErr:      8000000000000000 8000000000000000 0402030000000000 -
>>   InBErr:      - - - -
>>   PE[  8] A/B: 8480002b00000000 8000000000000000
>> 
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/pci.c |  238 ++++++++++++++++++++--------------
>>  1 file changed, 143 insertions(+), 95 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 67b2254..a5f236a 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -124,67 +124,103 @@ static void pnv_teardown_msi_irqs(struct pci_dev *pdev)
>>  }
>>  #endif /* CONFIG_PCI_MSI */
>>  
>> +static char *pnv_pci_diag_field(char *buf, int fmt, u64 val64)
>> +{
>> +	u32 val32 = (u32)val64;
>> +
>> +	memset(buf, 0, 24);
>> +	switch (fmt) {
>> +	case 8:
>> +		if (val32)
>> +			sprintf(buf, "%08x", val32);
>> +		else
>> +			sprintf(buf, "%s", "-");
>> +		break;
>> +	case 16:
>> +		if (val64)
>> +			sprintf(buf, "%016llx", val64);
>> +		else
>> +			sprintf(buf, "%s", "-");
>> +		break;
>> +	default:
>> +		sprintf(buf, "%s", "-");
>> +	}
>> +
>> +	return buf;
>> +}
>> +
>>  static void pnv_pci_dump_p7ioc_diag_data(struct pci_controller *hose,
>>  					 struct OpalIoPhbErrorCommon *common)
>>  {
>>  	struct OpalIoP7IOCPhbErrorData *data;
>> +	char buf[120];
>>  	int i;
>>  
>>  	data = (struct OpalIoP7IOCPhbErrorData *)common;
>>  	pr_info("P7IOC PHB#%d Diag-data (Version: %d)\n\n",
>>  		hose->global_number, common->version);
>>  
>> -	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
>> -
>> -	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
>> -	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
>> -	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
>> -
>> -	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
>> -	pr_info("  slotStatus:           %08x\n", data->slotStatus);
>> -	pr_info("  linkStatus:           %08x\n", data->linkStatus);
>> -	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
>> -	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
>> -
>> -	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
>> -	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
>> -	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
>> -	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
>> -	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
>> -	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
>> -	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
>> -	pr_info("  sourceId:             %08x\n", data->sourceId);
>> -	pr_info("  errorClass:           %016llx\n", data->errorClass);
>> -	pr_info("  correlator:           %016llx\n", data->correlator);
>> -	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
>> -	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
>> -	pr_info("  lemFir:               %016llx\n", data->lemFir);
>> -	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
>> -	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
>> -	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
>> -	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
>> -	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
>> -	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
>> -	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
>> -	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
>> -	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
>> -	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
>> -	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
>> -	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
>> -	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
>> -	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
>> -	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
>> -	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
>> -	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
>> -	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
>> +	pr_info("  brdgCtl:     %s\n",
>> +		pnv_pci_diag_field(&buf[0], 8, data->brdgCtl));
>> +	pr_info("  UtlSts:      %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->portStatusReg),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->rootCmplxStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->busAgentStatus));
>> +	pr_info("  RootSts:     %s %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->deviceStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->slotStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->linkStatus),
>> +		pnv_pci_diag_field(&buf[3 * 24], 8, data->devCmdStatus),
>> +		pnv_pci_diag_field(&buf[4 * 24], 8, data->devSecStatus));
>> +	pr_info("  RootErrSts:  %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->rootErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->uncorrErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->corrErrorStatus));
>> +	pr_info("  RootErrLog:  %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->tlpHdr1),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->tlpHdr2),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->tlpHdr3),
>> +		pnv_pci_diag_field(&buf[3 * 24], 8, data->tlpHdr4));
>> +	pr_info("  RootErrLog1: %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],       8, data->sourceId),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->errorClass),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->correlator));
>> +	pr_info("  PhbSts:      %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->p7iocPlssr),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->p7iocCsr));
>> +	pr_info("  Lem:         %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->lemFir),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->lemErrorMask),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->lemWOF));
>> +	pr_info("  PhbErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->phbErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->phbFirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->phbErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->phbErrorLog1));
>> +	pr_info("  OutErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->mmioErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->mmioFirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->mmioErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->mmioErrorLog1));
>> +	pr_info("  InAErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->dma0ErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->dma0FirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->dma0ErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->dma0ErrorLog1));
>> +	pr_info("  InBErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->dma1ErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->dma1FirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->dma1ErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->dma1ErrorLog1));
>>  
>>  	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
>>  		if ((data->pestA[i] >> 63) == 0 &&
>>  		    (data->pestB[i] >> 63) == 0)
>>  			continue;
>>  
>> -		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
>> -		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
>> +		pr_info("  PE[%3d] A/B: %s %s\n",
>> +			i, pnv_pci_diag_field(&buf[0], 16, data->pestA[i]),
>> +			pnv_pci_diag_field(&buf[1 * 24], 16, data->pestB[i]));
>>  	}
>>  }
>>  
>> @@ -192,67 +228,79 @@ static void pnv_pci_dump_phb3_diag_data(struct pci_controller *hose,
>>  					struct OpalIoPhbErrorCommon *common)
>>  {
>>  	struct OpalIoPhb3ErrorData *data;
>> -	int i;
>> +	char buf[120];
>> +	int i = 0;
>>  
>> +	memset(buf, 0, 120);
>>  	data = (struct OpalIoPhb3ErrorData*)common;
>>  	pr_info("PHB3 PHB#%d Diag-data (Version: %d)\n\n",
>>  		hose->global_number, common->version);
>>  
>> -	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
>> -
>> -	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
>> -	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
>> -	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
>> -
>> -	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
>> -	pr_info("  slotStatus:           %08x\n", data->slotStatus);
>> -	pr_info("  linkStatus:           %08x\n", data->linkStatus);
>> -	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
>> -	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
>> -
>> -	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
>> -	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
>> -	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
>> -	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
>> -	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
>> -	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
>> -	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
>> -	pr_info("  sourceId:             %08x\n", data->sourceId);
>> -	pr_info("  errorClass:           %016llx\n", data->errorClass);
>> -	pr_info("  correlator:           %016llx\n", data->correlator);
>> -
>> -	pr_info("  nFir:                 %016llx\n", data->nFir);
>> -	pr_info("  nFirMask:             %016llx\n", data->nFirMask);
>> -	pr_info("  nFirWOF:              %016llx\n", data->nFirWOF);
>> -	pr_info("  PhbPlssr:             %016llx\n", data->phbPlssr);
>> -	pr_info("  PhbCsr:               %016llx\n", data->phbCsr);
>> -	pr_info("  lemFir:               %016llx\n", data->lemFir);
>> -	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
>> -	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
>> -	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
>> -	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
>> -	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
>> -	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
>> -	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
>> -	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
>> -	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
>> -	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
>> -	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
>> -	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
>> -	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
>> -	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
>> -	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
>> -	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
>> -	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
>> -	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
>> +	pr_info("  brdgCtl:     %s\n",
>> +		pnv_pci_diag_field(&buf[0], 8, data->brdgCtl));
>> +	pr_info("  UtlSts:      %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->portStatusReg),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->rootCmplxStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->busAgentStatus));
>> +	pr_info("  RootSts:     %s %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->deviceStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->slotStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->linkStatus),
>> +		pnv_pci_diag_field(&buf[3 * 24], 8, data->devCmdStatus),
>> +		pnv_pci_diag_field(&buf[4 * 24], 8, data->devSecStatus));
>> +	pr_info("  RootErrSts:  %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->rootErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->uncorrErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->corrErrorStatus));
>> +	pr_info("  RootErrLog:  %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      8, data->tlpHdr1),
>> +		pnv_pci_diag_field(&buf[1 * 24], 8, data->tlpHdr2),
>> +		pnv_pci_diag_field(&buf[2 * 24], 8, data->tlpHdr3),
>> +		pnv_pci_diag_field(&buf[3 * 24], 8, data->tlpHdr4));
>> +	pr_info("  RootErrLog1: %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],       8, data->sourceId),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->errorClass),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->correlator));
>> +	pr_info("  nFir:        %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->nFir),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->nFirMask),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->nFirWOF));
>> +	pr_info("  PhbSts:      %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->phbPlssr),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->phbCsr));
>> +	pr_info("  Lem:         %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->lemFir),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->lemErrorMask),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->lemWOF));
>> +	pr_info("  PhbErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->phbErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->phbFirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->phbErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->phbErrorLog1));
>> +	pr_info("  OutErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->mmioErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->mmioFirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->mmioErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->mmioErrorLog1));
>> +	pr_info("  InAErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->dma0ErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->dma0FirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->dma0ErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->dma0ErrorLog1));
>> +	pr_info("  InBErr:      %s %s %s %s\n",
>> +		pnv_pci_diag_field(&buf[0],      16, data->dma1ErrorStatus),
>> +		pnv_pci_diag_field(&buf[1 * 24], 16, data->dma1FirstErrorStatus),
>> +		pnv_pci_diag_field(&buf[2 * 24], 16, data->dma1ErrorLog0),
>> +		pnv_pci_diag_field(&buf[3 * 24], 16, data->dma1ErrorLog1));
>>  
>>  	for (i = 0; i < OPAL_PHB3_NUM_PEST_REGS; i++) {
>>  		if ((data->pestA[i] >> 63) == 0 &&
>>  		    (data->pestB[i] >> 63) == 0)
>>  			continue;
>>  
>> -		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
>> -		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
>> +		pr_info("  PE[%3d] A/B: %s %s\n",
>> +			i, pnv_pci_diag_field(&buf[0], 16, data->pestA[i]),
>> +			pnv_pci_diag_field(&buf[1 * 24], 16, data->pestB[i]));
>>  	}
>>  }
>>  
>
>

^ permalink raw reply

* Re: [PATCH 4/5] powerpc/powernv: Cache PHB diag-data
From: Gavin Shan @ 2014-02-23  4:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1393012887.6771.94.camel@pasglop>

On Sat, Feb 22, 2014 at 07:01:27AM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2014-02-21 at 19:53 +0800, Gavin Shan wrote:
>> EEH core tries to recover from fenced PHB or frozen PE. Unfortunately,
>> the log isn't consistent with the site because enabling IO path for
>> the frozen PE before collecting log would ruin the site. The patch
>> solves the problem to cache the PHB diag-data in advance with the
>> help of additional flag PNV_PHB_FLAG_DIAG to pnv_phb::flags.
>
>Ok, so correct me if I'm wrong, but you are
>
>  - Collecting the diag data in get_state, as a sort of side effect
>(what happens if get_state is called multiple times ?)
>
>  - Dumping it later on
>

Yeah, the patch would have some problems when get_state gets called
for multiple times: the log could be much more than what we expected
in case that frozen PE#1 detected and in progress to handle it (we
don't dump it yet), frozen PE#2 detected. The log would include all
information for frozen PE#1 and PE#2 and it's not expected.

Another case is get_state called for multiple times on frozen PE#1
and we can check EEH_PE_ISOLATED to avoid diag-data over-writting.

I'm thinking of a new mechanism (please refer to the reply below).

>Any reason why we can't instead dump it immediately ? Also do we have a
>clean trigger for when we detect an error condition ? It can either be
>the result of an interrupt or a driver called get_state following an
>ffffffff. Are both path eventually going into the same function to
>handle a "new" error condition ? That's where I would both collect and
>dump the EEH state..
>

The reason I don't want dump it immediately is that I would keep
struct pnv_eeh_ops::get_log() to dump diag-data to guest in the
future.

The problem is that we have only one PHB diag-data instance in
struct pnv_phb::diag.blob[]. I'm thinking of to have each PE
to have diag-data for itself and the things would look like
followings. Ben, please comment :-)

- Extend "struct eeh_pe" to have a platform pointer (void *data).
  And we still collect diag-data in get_state() or next_error(),
  which will be dumped in pnv_eeh_ops->get_log(). The disadvantage
  could be lots of memory (extra 8KB usually) consumed by each PE
  instance.
- For PCI config accessors and informative (also dead PHB, dead
  P7IOC), we still use pnv_phb::diag.blob[].

Thanks,
Gavin

>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/eeh-ioda.c |   65 ++++++++++++++++++-----------
>>  arch/powerpc/platforms/powernv/pci.c      |   21 ++++++----
>>  arch/powerpc/platforms/powernv/pci.h      |    1 +
>>  3 files changed, 55 insertions(+), 32 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> index 04b4710..3ed8d22 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> @@ -114,6 +114,27 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, ioda_eeh_inbB_dbgfs_get,
>>  			ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
>>  #endif /* CONFIG_DEBUG_FS */
>>  
>> +static void ioda_eeh_phb_diag(struct pci_controller *hose)
>> +{
>> +	struct pnv_phb *phb = hose->private_data;
>> +	unsigned long flags;
>> +	long rc;
>> +
>> +	spin_lock_irqsave(&phb->lock, flags);
>> +
>> +	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
>> +					 PNV_PCI_DIAG_BUF_SIZE);
>> +	if (rc == OPAL_SUCCESS) {
>> +		phb->flags |= PNV_PHB_FLAG_DIAG;
>> +	} else {
>> +		pr_warn("%s: Can't get diag-data for PHB#%x (%ld)\n",
>> +			__func__, hose->global_number, rc);
>> +		phb->flags &= ~PNV_PHB_FLAG_DIAG;
>> +	}
>> +
>> +	spin_unlock_irqrestore(&phb->lock, flags);
>> +}
>> +
>>  /**
>>   * ioda_eeh_post_init - Chip dependent post initialization
>>   * @hose: PCI controller
>> @@ -272,6 +293,8 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
>>  			result |= EEH_STATE_DMA_ACTIVE;
>>  			result |= EEH_STATE_MMIO_ENABLED;
>>  			result |= EEH_STATE_DMA_ENABLED;
>> +		} else {
>> +			ioda_eeh_phb_diag(hose);
>>  		}
>>  
>>  		return result;
>> @@ -541,24 +564,13 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
>>  static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
>>  			    char *drv_log, unsigned long len)
>>  {
>> -	s64 ret;
>> +	struct pnv_phb *phb = pe->phb->private_data;
>>  	unsigned long flags;
>> -	struct pci_controller *hose = pe->phb;
>> -	struct pnv_phb *phb = hose->private_data;
>>  
>>  	spin_lock_irqsave(&phb->lock, flags);
>>  
>> -	ret = opal_pci_get_phb_diag_data2(phb->opal_id,
>> -			phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
>> -	if (ret) {
>> -		spin_unlock_irqrestore(&phb->lock, flags);
>> -		pr_warning("%s: Can't get log for PHB#%x-PE#%x (%lld)\n",
>> -			   __func__, hose->global_number, pe->addr, ret);
>> -		return -EIO;
>> -	}
>> -
>> -	/* The PHB diag-data is always indicative */
>> -	pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
>> +	pnv_pci_dump_phb_diag_data(pe->phb, phb->diag.blob);
>> +	phb->flags &= ~PNV_PHB_FLAG_DIAG;
>>  
>>  	spin_unlock_irqrestore(&phb->lock, flags);
>>  
>> @@ -646,19 +658,11 @@ static void ioda_eeh_hub_diag(struct pci_controller *hose)
>>  	}
>>  }
>>  
>> -static void ioda_eeh_phb_diag(struct pci_controller *hose)
>> +static void ioda_eeh_phb_diag_dump(struct pci_controller *hose)
>>  {
>>  	struct pnv_phb *phb = hose->private_data;
>> -	long rc;
>> -
>> -	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
>> -					 PNV_PCI_DIAG_BUF_SIZE);
>> -	if (rc != OPAL_SUCCESS) {
>> -		pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n",
>> -			    __func__, hose->global_number, rc);
>> -		return;
>> -	}
>>  
>> +	ioda_eeh_phb_diag(hose);
>>  	pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
>>  }
>>  
>> @@ -778,7 +782,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
>>  				pr_info("EEH: PHB#%x informative error "
>>  					"detected\n",
>>  					hose->global_number);
>> -				ioda_eeh_phb_diag(hose);
>> +				ioda_eeh_phb_diag_dump(hose);
>>  				ret = EEH_NEXT_ERR_NONE;
>>  			}
>>  
>> @@ -809,6 +813,17 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
>>  		}
>>  
>>  		/*
>> +		 * EEH core will try recover from fenced PHB or
>> +		 * frozen PE. In the time for frozen PE, EEH core
>> +		 * enable IO path for that before collecting logs,
>> +		 * but it ruins the site. So we have to cache the
>> +		 * log in advance here.
>> +		 */
>> +		if (ret == EEH_NEXT_ERR_FROZEN_PE ||
>> +		    ret == EEH_NEXT_ERR_FENCED_PHB)
>> +			ioda_eeh_phb_diag(hose);
>> +
>> +		/*
>>  		 * If we have no errors on the specific PHB or only
>>  		 * informative error there, we continue poking it.
>>  		 * Otherwise, we need actions to be taken by upper
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 437c37d..67b2254 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -259,11 +259,15 @@ static void pnv_pci_dump_phb3_diag_data(struct pci_controller *hose,
>>  void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
>>  				unsigned char *log_buff)
>>  {
>> +	struct pnv_phb *phb = hose->private_data;
>>  	struct OpalIoPhbErrorCommon *common;
>>  
>>  	if (!hose || !log_buff)
>>  		return;
>>  
>> +	if (!(phb->flags & PNV_PHB_FLAG_DIAG))
>> +		return;
>> +
>>  	common = (struct OpalIoPhbErrorCommon *)log_buff;
>>  	switch (common->ioType) {
>>  	case OPAL_PHB_ERROR_DATA_TYPE_P7IOC:
>> @@ -281,13 +285,19 @@ void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
>>  static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>>  {
>>  	unsigned long flags, rc;
>> -	int has_diag;
>> +	bool has_diag = false;
>>  
>>  	spin_lock_irqsave(&phb->lock, flags);
>>  
>> -	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
>> -					 PNV_PCI_DIAG_BUF_SIZE);
>> -	has_diag = (rc == OPAL_SUCCESS);
>> +	if (!(phb->flags & PNV_PHB_FLAG_DIAG)) {
>> +		rc = opal_pci_get_phb_diag_data2(phb->opal_id,
>> +						 phb->diag.blob,
>> +						 PNV_PCI_DIAG_BUF_SIZE);
>> +		if (rc == OPAL_SUCCESS) {
>> +			phb->flags |= PNV_PHB_FLAG_DIAG;
>> +			has_diag = true;
>> +		}
>> +	}
>>  
>>  	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
>>  				       OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
>> @@ -303,9 +313,6 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>>  		 */
>>  		if (has_diag)
>>  			pnv_pci_dump_phb_diag_data(phb->hose, phb->diag.blob);
>> -		else
>> -			pr_warning("PCI %d: No diag data available\n",
>> -				   phb->hose->global_number);
>>  	}
>>  
>>  	spin_unlock_irqrestore(&phb->lock, flags);
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index adeb3c4..153af9a 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -82,6 +82,7 @@ struct pnv_eeh_ops {
>>  #endif /* CONFIG_EEH */
>>  
>>  #define PNV_PHB_FLAG_EEH	(1 << 0)
>> +#define PNV_PHB_FLAG_DIAG	(1 << 1)
>>  
>>  struct pnv_phb {
>>  	struct pci_controller	*hose;
>
>

^ permalink raw reply

* Re: [PATCH 3/5] powerpc/powernv: Cleanup on PNV_EEH_STATE_ENABLED
From: Gavin Shan @ 2014-02-23  2:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1393012609.6771.90.camel@pasglop>

On Sat, Feb 22, 2014 at 06:56:49AM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2014-02-21 at 19:53 +0800, Gavin Shan wrote:
>> The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state, which
>> is protected by CONFIG_EEH. We needn't that. Instead, we can have
>> pnv_phb::flags and maintain all flags there, which is the purpose
>> of the patch.
>
>Can you explain a bit more why we want to create a new flag set
>that didn't exist before ? This adds confusion so we need a very
>good reason... Do we need to know about the enable state of EEH
>even when CNFIG_EEH is not set ?
>

The commit log was a bit confusing. We didn't create a new flag
here and I just renamed PNV_EEH_STATE_ENABLED to PNV_PHB_FLAG_EEH.
I'll say more in the commit log about it in next revision.

The flag is needed even we have CONFIG_EEH set because we need
switch to EEH, instead of detecting frozen PE and clearing it
in PCI config accessors after EEH is initialized and loaded :-)

Thanks,
Gavin

>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/eeh-ioda.c |    2 +-
>>  arch/powerpc/platforms/powernv/pci.c      |    8 ++------
>>  arch/powerpc/platforms/powernv/pci.h      |    7 +++----
>>  3 files changed, 6 insertions(+), 11 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> index 0d1d424..04b4710 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> @@ -153,7 +153,7 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
>>  	}
>>  #endif
>>  
>> -	phb->eeh_state |= PNV_EEH_STATE_ENABLED;
>> +	phb->flags |= PNV_PHB_FLAG_EEH;
>>  
>>  	return 0;
>>  }
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index b555ebc..437c37d 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -396,7 +396,7 @@ int pnv_pci_cfg_read(struct device_node *dn,
>>  	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
>>  		return PCIBIOS_SUCCESSFUL;
>>  
>> -	if (phb->eeh_state & PNV_EEH_STATE_ENABLED) {
>> +	if (phb->flags & PNV_PHB_FLAG_EEH) {
>>  		if (*val == EEH_IO_ERROR_VALUE(size) &&
>>  		    eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
>>  			return PCIBIOS_DEVICE_NOT_FOUND;
>> @@ -434,12 +434,8 @@ int pnv_pci_cfg_write(struct device_node *dn,
>>  	}
>>  
>>  	/* Check if the PHB got frozen due to an error (no response) */
>> -#ifdef CONFIG_EEH
>> -	if (!(phb->eeh_state & PNV_EEH_STATE_ENABLED))
>> +	if (!(phb->flags & PNV_PHB_FLAG_EEH))
>>  		pnv_pci_config_check_eeh(phb, dn);
>> -#else
>> -	pnv_pci_config_check_eeh(phb, dn);
>> -#endif
>>  
>>  	return PCIBIOS_SUCCESSFUL;
>>  }
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index dbeba3d..adeb3c4 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -79,24 +79,23 @@ struct pnv_eeh_ops {
>>  	int (*configure_bridge)(struct eeh_pe *pe);
>>  	int (*next_error)(struct eeh_pe **pe);
>>  };
>> -
>> -#define PNV_EEH_STATE_ENABLED	(1 << 0)	/* EEH enabled	*/
>> -
>>  #endif /* CONFIG_EEH */
>>  
>> +#define PNV_PHB_FLAG_EEH	(1 << 0)
>> +
>>  struct pnv_phb {
>>  	struct pci_controller	*hose;
>>  	enum pnv_phb_type	type;
>>  	enum pnv_phb_model	model;
>>  	u64			hub_id;
>>  	u64			opal_id;
>> +	int			flags;
>>  	void __iomem		*regs;
>>  	int			initialized;
>>  	spinlock_t		lock;
>>  
>>  #ifdef CONFIG_EEH
>>  	struct pnv_eeh_ops	*eeh_ops;
>> -	int			eeh_state;
>>  #endif
>>  
>>  #ifdef CONFIG_DEBUG_FS
>
>

^ permalink raw reply

* Re: [PATCH 4/7] ECHI Platform: Merge ppc-of EHCI driver into the ehci-platform driver
From: Tony Prisk @ 2014-02-22  2:32 UTC (permalink / raw)
  To: Mark Rutland, Alistair Popple
  Cc: gregkh@linuxfoundation.org, linux-usb@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org
In-Reply-To: <20140221114803.GB8783@e106331-lin.cambridge.arm.com>


On 22/02/14 00:48, Mark Rutland wrote:
> [Adding Tony Prisk to Cc]
>
> On Fri, Feb 21, 2014 at 06:31:30AM +0000, Alistair Popple wrote:
>> Currently the ppc-of driver uses the compatibility string
>> "usb-ehci". This means platforms that use device-tree and implement an
>> EHCI compatible interface have to either use the ppc-of driver or add
>> a compatible line to the ehci-platform driver. It would be more
>> appropriate for the platform driver to be compatible with "usb-ehci"
>> as non-powerpc platforms are also beginning to utilise device-tree.
>>
>> This patch merges the device tree property parsing from ehci-ppc-of
>> into the platform driver and adds a "usb-ehci" compatibility
>> string. The existing ehci-ppc-of driver is removed and the 440EPX
>> specific quirks are added to the ehci-platform driver.
>>
>> Signed-off-by: Alistair Popple <alistair@popple.id.au>
>> ---
>>   drivers/usb/host/Kconfig         |    7 +-
>>   drivers/usb/host/ehci-hcd.c      |    5 -
>>   drivers/usb/host/ehci-platform.c |   87 +++++++++++++-
>>   drivers/usb/host/ehci-ppc-of.c   |  238 --------------------------------------
>>   4 files changed, 89 insertions(+), 248 deletions(-)
>>   delete mode 100644 drivers/usb/host/ehci-ppc-of.c
> Please use of_property_read_bool for these.
>
> This driver already handles "via,vt8500-ehci" and "wm,prizm-ehci" which
> aren't documented to handle these properties, but now gain support for
> them. It might be worth unifying the binding documents if there's
> nothing special about those two host controllers.
>
> We seem to have two binding documents for "via,vt8500-ehci", so some
> cleanup is definitely in order.
>
> Tony, you seem to have written both documents judging by 95e9fd10f06c
> and 8ad551d150e3. Do you have any issue with merging both of these into
> a common usb-ehci document?
......
> Cheers,
> Mark.

I'm not sure how we ended up with two bindings for the driver anyway. I 
think this was an error on my part somewhere.

None of the in-tree dts files use "wm,prizm-ehci".

I have no issue with merging all the documentation into a single 
usb-ehci binding.

Regards
Tony Prisk

^ permalink raw reply

* Re: [PATCH] PPC: KVM: Introduce hypervisor call H_GET_TCE
From: Alexey Kardashevskiy @ 2014-02-22  0:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Benjamin Herrenschmidt, Laurent Dufour
  Cc: kvm, Gleb Natapov, Alexander Graf, kvm-ppc, Paul Mackerras,
	Paolo Bonzini, linuxppc-dev
In-Reply-To: <5307EF2F.9090100@ozlabs.ru>

On 02/22/2014 11:28 AM, Alexey Kardashevskiy wrote:
> On 02/22/2014 06:23 AM, Benjamin Herrenschmidt wrote:
>> On Fri, 2014-02-21 at 16:31 +0100, Laurent Dufour wrote:
>>> This fix introduces the H_GET_TCE hypervisor call which is basically the
>>> reverse of H_PUT_TCE, as defined in the Power Architecture Platform
>>> Requirements (PAPR).
>>>
>>> The hcall H_GET_TCE is required by the kdump kernel which is calling it to
>>> retrieve the TCE set up by the panicing kernel.
>>
>> Alexey, will that work for VFIO ?
> 
> Yes.


Oh! My bad, this is _G_et. Not, this won't support VFIO but this should not
break the current "slow" VFIO support in upstream.


>> Or are those patches *still* not
>> upstream ?
> 
> Yes.

This part is still true. I cannot get Alex Graf attention even on much
simpler things for several months.



> 
> 
>>
>>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/include/asm/kvm_ppc.h      |    2 ++
>>>  arch/powerpc/kvm/book3s_64_vio_hv.c     |   28 ++++++++++++++++++++++++++++
>>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |    2 +-
>>>  3 files changed, 31 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>>> index fcd53f0..4096f16 100644
>>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>>> @@ -129,6 +129,8 @@ extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>>  				struct kvm_create_spapr_tce *args);
>>>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>>  			     unsigned long ioba, unsigned long tce);
>>> +extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>> +			     unsigned long ioba);
>>>  extern struct kvm_rma_info *kvm_alloc_rma(void);
>>>  extern void kvm_release_rma(struct kvm_rma_info *ri);
>>>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>>> index 2c25f54..89e96b3 100644
>>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>>> @@ -75,3 +75,31 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>>  	return H_TOO_HARD;
>>>  }
>>>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>>> +
>>> +long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>> +		      unsigned long ioba)
>>> +{
>>> +	struct kvm *kvm = vcpu->kvm;
>>> +	struct kvmppc_spapr_tce_table *stt;
>>> +
>>> +	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
>>> +		if (stt->liobn == liobn) {
>>> +			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
>>> +			struct page *page;
>>> +			u64 *tbl;
>>> +
>>> +			if (ioba >= stt->window_size)
>>> +				return H_PARAMETER;
>>> +
>>> +			page = stt->pages[idx / TCES_PER_PAGE];
>>> +			tbl = (u64 *)page_address(page);
>>> +
>>> +			vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
>>> +			return H_SUCCESS;
>>> +		}
>>> +	}
>>> +
>>> +	/* Didn't find the liobn, punt it to userspace */
>>> +	return H_TOO_HARD;
>>> +}
>>> +EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
>>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>> index e66d4ec..7d4fe2a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>> @@ -1758,7 +1758,7 @@ hcall_real_table:
>>>  	.long	0		/* 0x10 - H_CLEAR_MOD */
>>>  	.long	0		/* 0x14 - H_CLEAR_REF */
>>>  	.long	.kvmppc_h_protect - hcall_real_table
>>> -	.long	0		/* 0x1c - H_GET_TCE */
>>> +	.long	.kvmppc_h_get_tce - hcall_real_table
>>>  	.long	.kvmppc_h_put_tce - hcall_real_table
>>>  	.long	0		/* 0x24 - H_SET_SPRG0 */
>>>  	.long	.kvmppc_h_set_dabr - hcall_real_table
>>
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
> 
> 


-- 
Alexey Kardashevskiy
IBM OzLabs, LTC Team

e-mail: aik@au1.ibm.com
notes: Alexey Kardashevskiy/Australia/IBM

^ permalink raw reply

* Re: [PATCH] PPC: KVM: Introduce hypervisor call H_GET_TCE
From: Alexey Kardashevskiy @ 2014-02-22  0:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Laurent Dufour
  Cc: Alexey Kardashevskiy, kvm, Gleb Natapov, Alexander Graf, kvm-ppc,
	Paul Mackerras, Paolo Bonzini, linuxppc-dev
In-Reply-To: <1393010638.6771.74.camel@pasglop>

On 02/22/2014 06:23 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2014-02-21 at 16:31 +0100, Laurent Dufour wrote:
>> This fix introduces the H_GET_TCE hypervisor call which is basically the
>> reverse of H_PUT_TCE, as defined in the Power Architecture Platform
>> Requirements (PAPR).
>>
>> The hcall H_GET_TCE is required by the kdump kernel which is calling it to
>> retrieve the TCE set up by the panicing kernel.
> 
> Alexey, will that work for VFIO ?

Yes.

> Or are those patches *still* not
> upstream ?

Yes.


> 
>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h      |    2 ++
>>  arch/powerpc/kvm/book3s_64_vio_hv.c     |   28 ++++++++++++++++++++++++++++
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |    2 +-
>>  3 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index fcd53f0..4096f16 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -129,6 +129,8 @@ extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>  				struct kvm_create_spapr_tce *args);
>>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  			     unsigned long ioba, unsigned long tce);
>> +extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +			     unsigned long ioba);
>>  extern struct kvm_rma_info *kvm_alloc_rma(void);
>>  extern void kvm_release_rma(struct kvm_rma_info *ri);
>>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 2c25f54..89e96b3 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -75,3 +75,31 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  	return H_TOO_HARD;
>>  }
>>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +		      unsigned long ioba)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
>> +		if (stt->liobn == liobn) {
>> +			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
>> +			struct page *page;
>> +			u64 *tbl;
>> +
>> +			if (ioba >= stt->window_size)
>> +				return H_PARAMETER;
>> +
>> +			page = stt->pages[idx / TCES_PER_PAGE];
>> +			tbl = (u64 *)page_address(page);
>> +
>> +			vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
>> +			return H_SUCCESS;
>> +		}
>> +	}
>> +
>> +	/* Didn't find the liobn, punt it to userspace */
>> +	return H_TOO_HARD;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index e66d4ec..7d4fe2a 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1758,7 +1758,7 @@ hcall_real_table:
>>  	.long	0		/* 0x10 - H_CLEAR_MOD */
>>  	.long	0		/* 0x14 - H_CLEAR_REF */
>>  	.long	.kvmppc_h_protect - hcall_real_table
>> -	.long	0		/* 0x1c - H_GET_TCE */
>> +	.long	.kvmppc_h_get_tce - hcall_real_table
>>  	.long	.kvmppc_h_put_tce - hcall_real_table
>>  	.long	0		/* 0x24 - H_SET_SPRG0 */
>>  	.long	.kvmppc_h_set_dabr - hcall_real_table
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 


-- 
Alexey

^ permalink raw reply

* [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Cody P Schafer @ 2014-02-22  0:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Cody P Schafer, Madhavan Srinivasan,
	Olof Johansson, Paul Gortmaker, Wang Dongsheng
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel

/sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
nothing. Add a pr_warn() to convince any users that they should stop
using it.

The commit message from the removing commit notes that this
functionality should move into the cpuidle driver, essentially by
adjusting target_residency to the specified value. At the moment,
target_residency is not exposed by cpuidle's sysfs, so there isn't a
drop in replacement for this.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 97e1dc9..84097b4 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
 	if (ret != 1)
 		return -EINVAL;
 
+	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
+		  current->comm, task_pid_nr(current), cpu->dev.id);
+
 	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
 	return count;
 }
@@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 
+	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
+		  current->comm, task_pid_nr(current), cpu->dev.id);
+
 	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
 }
 
-- 
1.9.0

^ permalink raw reply related

* Re: [PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup
From: Nishanth Aravamudan @ 2014-02-21 23:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, linux-mm, Mel Gorman, David Rientjes,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Anton Blanchard
In-Reply-To: <20140221144203.8d7b0d7039846c0304f86141@linux-foundation.org>

On 21.02.2014 [14:42:03 -0800], Andrew Morton wrote:
> On Thu, 20 Feb 2014 10:28:47 -0800 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > On 20.02.2014 [10:05:39 -0600], Christoph Lameter wrote:
> > > On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> > > 
> > > > We can call local_memory_node() before the zonelists are setup. In that
> > > > case, first_zones_zonelist() will not set zone and the reference to
> > > > zone->node will Oops. Catch this case, and, since we presumably running
> > > > very early, just return that any node will do.
> > > 
> > > Really? Isnt there some way to avoid this call if zonelists are not setup
> > > yet?
> > 
> > How do I best determine if zonelists aren't setup yet?
> > 
> > The call-path in question (after my series is applied) is:
> > 
> > arch/powerpc/kernel/setup_64.c::setup_arch ->
> > 	arch/powerpc/mm/numa.c::do_init_bootmem() ->
> > 		cpu_numa_callback() ->
> > 			numa_setup_cpu() ->
> > 				map_cpu_to_node() ->
> > 					update_numa_cpu_node() ->
> > 						set_cpu_numa_mem()
> > 
> > and setup_arch() is called before build_all_zonelists(NULL, NULL) in
> > start_kernel(). This seemed like the most reasonable path, as it's used
> > on hotplug as well.
> > 
> 
> But the call to local_memory_node() you added was in start_secondary(),
> which isn't in that trace.

I added two calls to local_memory_node(), I *think* both are necessary,
but am willing to be corrected.

One is in map_cpu_to_node() and one is in start_secondary(). The
start_secondary() path is fine, AFAICT, as we are up & running at that
point. But in [the renamed function] update_numa_cpu_node() which is
used by hotplug, we get called from do_init_bootmem(), which is before
the zonelists are setup.

I think both calls are necessary because I believe the
arch_update_cpu_topology() is used for supporting firmware-driven
home-noding, which does not invoke start_secondary() again (the
processor is already running, we're just updating the topology in that
situation).

Then again, I could special-case the do_init_bootmem callpath, which is
only called at kernel init time?

> I do agree that calling local_memory_node() too early then trying to
> fudge around the consequences seems rather wrong.

If the answer is to simply not call local_memory_node() early, I'll
submit a patch to at least add a comment, as there's nothing in the code
itself to prevent this from happening and is guaranteed to oops.

Thanks,
Nish

^ permalink raw reply

* Re: [PATCH 2/9] ps3-vuart: don't use PREPARE_WORK
From: Geoff Levand @ 2014-02-21 23:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cbe-oss-dev, linuxppc-dev, laijs, linux-kernel
In-Reply-To: <1392929071-16555-3-git-send-email-tj@kernel.org>

Hi,

On Thu, 2014-02-20 at 15:44 -0500, Tejun Heo wrote:
> ps3_vuart wasn't overriding the work item with multiple work functions
> but was using NULL for INIT_WORK() and then single PREPARE_WORK() to
> set the work function.  We can simply invoke INIT_WORK() with the work
> function and remove the PREPARE_WORK() usage.
> 
> It would probably be best to route this with other related updates
> through the workqueue tree.

I tested this on ps3 (DECR-1400) and it seems to work OK.  Please queue
with your other workqueue changes.  Thanks!

Acked-by: Geoff Levand <geoff@infradead.org>

^ permalink raw reply

* AUTO: Michael Barry is out of the office (returning 25/02/2014)
From: Michael Barry @ 2014-02-21 22:56 UTC (permalink / raw)
  To: linuxppc-dev


I am out of the office until 25/02/2014.




Note: This is an automated response to your message  "Linuxppc-dev Digest,
Vol 114, Issue 105" sent on 21/02/2014 19:24:30.

This is the only notification you will receive while this person is away.

^ permalink raw reply

* Re: [PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup
From: Andrew Morton @ 2014-02-21 22:42 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Michal Hocko, linux-mm, Mel Gorman, David Rientjes,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Anton Blanchard
In-Reply-To: <20140220182847.GA24745@linux.vnet.ibm.com>

On Thu, 20 Feb 2014 10:28:47 -0800 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> On 20.02.2014 [10:05:39 -0600], Christoph Lameter wrote:
> > On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> > 
> > > We can call local_memory_node() before the zonelists are setup. In that
> > > case, first_zones_zonelist() will not set zone and the reference to
> > > zone->node will Oops. Catch this case, and, since we presumably running
> > > very early, just return that any node will do.
> > 
> > Really? Isnt there some way to avoid this call if zonelists are not setup
> > yet?
> 
> How do I best determine if zonelists aren't setup yet?
> 
> The call-path in question (after my series is applied) is:
> 
> arch/powerpc/kernel/setup_64.c::setup_arch ->
> 	arch/powerpc/mm/numa.c::do_init_bootmem() ->
> 		cpu_numa_callback() ->
> 			numa_setup_cpu() ->
> 				map_cpu_to_node() ->
> 					update_numa_cpu_node() ->
> 						set_cpu_numa_mem()
> 
> and setup_arch() is called before build_all_zonelists(NULL, NULL) in
> start_kernel(). This seemed like the most reasonable path, as it's used
> on hotplug as well.
> 

But the call to local_memory_node() you added was in start_secondary(),
which isn't in that trace.

I do agree that calling local_memory_node() too early then trying to
fudge around the consequences seems rather wrong.

^ permalink raw reply

* Re: [PATCH 7/7] powerpc: Added PCI MSI support using the HSTA module
From: Arnd Bergmann @ 2014-02-21 21:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alistair Popple, linuxppc-dev, linux-kernel, devicetree
In-Reply-To: <1393015286.6771.110.camel@pasglop>

On Friday 21 February 2014, Benjamin Herrenschmidt wrote:
> On Fri, 2014-02-21 at 15:33 +0100, Arnd Bergmann wrote:
> 
> > > @@ -242,8 +264,10 @@
> > >  			ranges = <0x02000000 0x00000000 0x80000000 0x00000110 0x80000000 0x0 0x80000000
> > >  			          0x01000000 0x0        0x0        0x00000140 0x0        0x0 0x00010000>;
> > >  
> > > -			/* Inbound starting at 0 to memsize filled in by zImage */
> > > -			dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x0 0x0>;
> > > +			/* Inbound starting at 0x0 to 0x40000000000. In order to use MSI
> > > +			 * PCI devices must be able to write to the HSTA module.
> > > +			 */
> > > +			dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x400 0x0>;
> 
> Should we (provided it's possible in HW) create two ranges instead ? One
> covering RAM and one covering MSIs ? To avoid stray DMAs whacking random
> HW registers in the chip ...
> 
> > >  			/* This drives busses 0 to 0xf */
> > >  			bus-range = <0x0 0xf>;
> > 
> > Ah, I first only saw the line you are removing and was about
> > to suggest what you do anyway. Great!
> > 
> > > diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
> > > index 54ec1d5..7cc3acc 100644
> > > --- a/arch/powerpc/sysdev/ppc4xx_pci.c
> > > +++ b/arch/powerpc/sysdev/ppc4xx_pci.c
> > > @@ -176,8 +176,12 @@ static int __init ppc4xx_parse_dma_ranges(struct pci_controller *hose,
> > >  		return -ENXIO;
> > >  	}
> > >  
> > > -	/* Check that we are fully contained within 32 bits space */
> > > -	if (res->end > 0xffffffff) {
> > > +	/* Check that we are fully contained within 32 bits space if we are not
> > > +	 * running on a 460sx or 476fpe which have 64 bit bus addresses.
> > > +	 */
> > > +	if (res->end > 0xffffffff &&
> > > +	    !(of_device_is_compatible(hose->dn, "ibm,plb-pciex-460sx")
> > > +	      || of_device_is_compatible(hose->dn, "ibm,plb-pciex-476fpe"))) {
> > >  		printk(KERN_ERR "%s: dma-ranges outside of 32 bits space\n",
> > >  		       hose->dn->full_name);
> > >  		return -ENXIO;
> > 
> > A more general question for BenH: Apparently this PCI implementation is
> > getting reused on arm64 for APM X-Gene. Do you see any value in trying to
> > share host controller drivers like this one across powerpc and arm64?
> 
> I would start duplicating, and see how much common code remains... Then
> eventually merge.

Ok.

> > It's possible we are going to see the same situation with fsl_pci in the
> > future, if arm and powerpc qoriq chips use the same peripherals. My
> > plan for arm64 right now is to make PCI work without any code in arch/,
> > just using new helper functions in drivers/pci and sticking the host
> > drivers into drivers/pci/host as we started doing for arm32, but it
> > can require significant work to make those drivers compatible with
> > the powerpc pci-common.c.
> 
> powerpc pci-common.c is shrinking :-) At least the address remapping is
> all in the core now, we could move more over I suppose...

Ah, good. We're currently trying to work out a generic way to parse
the DT and ioremap the I/O windows. That could probably be shared
and while I hope what we need on arm64 is compatible with what you
need on powerpc, I may always miss something. I'll make sure to add
you to the discussions.

Some parts are easier because we assume that we always scan the
entire PCI bus ourselves and don't do PCI_PROBE_DEVTREE. 
Other parts are harder because for the generic case we actually
want to support loading and unloading host bridge drivers,
as well as supporting any combination of host bridges in the
same system without platform specific code.

	Arnd

^ permalink raw reply

* Re: Question about EHCI on P4080
From: Scott Wood @ 2014-02-21 21:08 UTC (permalink / raw)
  To: Ruchika; +Cc: linuxppc-dev
In-Reply-To: <53064FC9.8070908@servergy.com>

On Thu, 2014-02-20 at 12:56 -0600, Ruchika wrote:
> linuxppc,
> Is there another mailing list which is more appropriate ?
> If so I'd be happy to get a recommendation.

If you're asking about U-Boot support, the list is u-boot@lists.denx.de

Be sure to mention what version of U-Boot you're using, and that you're
using a recent U-Boot.  Also, don't top post or post the same thing
twice.

If you're using U-Boot from the Freescale SDK rather than from upstream,
then instead get support from support@freescale.com or
https://community.freescale.com/  It might be a good idea to go that
route regardless, given that this sounds like it may be a hardware
compatibility issue.

I don't think there is a separate host controller or driver for USB 1.1
(or lower speed USB 2.0) devices on p4080 (though this is common on
PCs).

-Scott

> 
> Thank you
> Regards
> ruchika
> 
> 
> On 2/19/2014 11:55 PM, Ruchika wrote:
> > Hi,
> > I've been trying to understand why the uboot code is unable to work with
> > USB1.1 devices.
> > On a USB analyzer:
> > I notice on an analyzer that there are no SOF or IN tokens seen on the
> > bus at all. The only  after the very first setup packet is sent and ACK'd.
> >
> > The board has a hub chip on it connected to the USB controller via a
> > Ulpi interface.
> >
> > Uboot debug indicates that the transfer times out and was still "Active"
> >
> > The kernel code indicates that a companion driver handles 1.1 devices.
> > Does this mean there are 2 HCD's, one for HS and one for FS/LS in the
> > kernel ?
> > Thank you
> >
> >
> > ruchika
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 

^ permalink raw reply

* Re: [PATCH 7/7] powerpc: Added PCI MSI support using the HSTA module
From: Benjamin Herrenschmidt @ 2014-02-21 20:41 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Alistair Popple, linuxppc-dev, linux-kernel, devicetree
In-Reply-To: <1574137.lzDhNaVqIe@wuerfel>

On Fri, 2014-02-21 at 15:33 +0100, Arnd Bergmann wrote:

> > @@ -242,8 +264,10 @@
> >  			ranges = <0x02000000 0x00000000 0x80000000 0x00000110 0x80000000 0x0 0x80000000
> >  			          0x01000000 0x0        0x0        0x00000140 0x0        0x0 0x00010000>;
> >  
> > -			/* Inbound starting at 0 to memsize filled in by zImage */
> > -			dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x0 0x0>;
> > +			/* Inbound starting at 0x0 to 0x40000000000. In order to use MSI
> > +			 * PCI devices must be able to write to the HSTA module.
> > +			 */
> > +			dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x400 0x0>;

Should we (provided it's possible in HW) create two ranges instead ? One
covering RAM and one covering MSIs ? To avoid stray DMAs whacking random
HW registers in the chip ...

> >  			/* This drives busses 0 to 0xf */
> >  			bus-range = <0x0 0xf>;
> 
> Ah, I first only saw the line you are removing and was about
> to suggest what you do anyway. Great!
> 
> > diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
> > index 54ec1d5..7cc3acc 100644
> > --- a/arch/powerpc/sysdev/ppc4xx_pci.c
> > +++ b/arch/powerpc/sysdev/ppc4xx_pci.c
> > @@ -176,8 +176,12 @@ static int __init ppc4xx_parse_dma_ranges(struct pci_controller *hose,
> >  		return -ENXIO;
> >  	}
> >  
> > -	/* Check that we are fully contained within 32 bits space */
> > -	if (res->end > 0xffffffff) {
> > +	/* Check that we are fully contained within 32 bits space if we are not
> > +	 * running on a 460sx or 476fpe which have 64 bit bus addresses.
> > +	 */
> > +	if (res->end > 0xffffffff &&
> > +	    !(of_device_is_compatible(hose->dn, "ibm,plb-pciex-460sx")
> > +	      || of_device_is_compatible(hose->dn, "ibm,plb-pciex-476fpe"))) {
> >  		printk(KERN_ERR "%s: dma-ranges outside of 32 bits space\n",
> >  		       hose->dn->full_name);
> >  		return -ENXIO;
> 
> A more general question for BenH: Apparently this PCI implementation is
> getting reused on arm64 for APM X-Gene. Do you see any value in trying to
> share host controller drivers like this one across powerpc and arm64?

I would start duplicating, and see how much common code remains... Then
eventually merge.

> It's possible we are going to see the same situation with fsl_pci in the
> future, if arm and powerpc qoriq chips use the same peripherals. My
> plan for arm64 right now is to make PCI work without any code in arch/,
> just using new helper functions in drivers/pci and sticking the host
> drivers into drivers/pci/host as we started doing for arm32, but it
> can require significant work to make those drivers compatible with
> the powerpc pci-common.c.

powerpc pci-common.c is shrinking :-) At least the address remapping is
all in the core now, we could move more over I suppose...

Cheers,
Ben.

> 	Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox