LinuxPPC-Dev Archive on lore.kernel.org


* [PATCHv4 1/2] ppc64: perform proper max_bus_speed detection
From: lucaskt @ 2013-04-24 22:54 UTC
  To: linuxppc-dev, dri-devel, Benjamin Herrenschmidt, Bjorn Helgaas,
	David Airlie <airlied@linux.ie> Michael Ellerman
  Cc: Kleber Sacilotto de Souza, Alex Deucher, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Lucas Kannebley Tavares,
	Brian King
In-Reply-To: <1366844090-5492-1-git-send-email-lucaskt@linux.vnet.ibm.com>

From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>

On pseries machines, max_bus_speed should be detected through an
OpenFirmware property. This patch adds a function that performs this
detection, plus a ppc_md hook so that the function is installed only on
pseries. This is done by overriding the weak pcibios_root_bridge_prepare()
function, which is called by pci_create_root_bus().
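The hook dispatch described above can be modelled in plain user-space C. This is a minimal sketch, not kernel code: `struct platform_ops`, `root_bridge_prepare()`, `pseries_like_prepare()` and `platform_setup()` are hypothetical stand-ins for `ppc_md`, `pcibios_root_bridge_prepare()`, `pseries_root_bridge_prepare()` and `pSeries_setup_arch()`. It shows that platforms which never set the hook fall through to a harmless `return 0`.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_host_bridge. */
struct host_bridge {
	int max_speed;
};

/* Minimal model of a machdep-style table: each platform may fill in
 * an optional hook; hooks that are never set stay NULL. */
struct platform_ops {
	int (*root_bridge_prepare)(struct host_bridge *bridge);
};

static struct platform_ops md; /* zero-initialised: all hooks NULL */

/* Arch-level function: forward to the platform hook if one was
 * registered, otherwise succeed without doing anything. */
int root_bridge_prepare(struct host_bridge *bridge)
{
	if (md.root_bridge_prepare)
		return md.root_bridge_prepare(bridge);
	return 0;
}

/* Example platform hook, installed only by that platform's setup code. */
static int pseries_like_prepare(struct host_bridge *bridge)
{
	bridge->max_speed = 5;
	return 0;
}

void platform_setup(void)
{
	md.root_bridge_prepare = pseries_like_prepare;
}
```

Keeping the hook in a per-platform table means the generic weak symbol is overridden once for the whole architecture, and each platform opts in at setup time.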

Signed-off-by: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h       |  2 ++
 arch/powerpc/kernel/pci-common.c         |  8 +++++
 arch/powerpc/platforms/pseries/pci.c     | 51 ++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h |  4 +++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 5 files changed, 67 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 3d6b410..8f558bf 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -107,6 +107,8 @@ struct machdep_calls {
 	void		(*pcibios_fixup)(void);
 	int		(*pci_probe_mode)(struct pci_bus *);
 	void		(*pci_irq_fixup)(struct pci_dev *dev);
+	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
+				*bridge);
 
 	/* To setup PHBs when using automatic OF platform driver for PCI */
 	int		(*pci_setup_phb)(struct pci_controller *host);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index fa12ae4..80986cf 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -844,6 +844,14 @@ int pci_proc_domain(struct pci_bus *bus)
 	return 1;
 }
 
+int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	if (ppc_md.pcibios_root_bridge_prepare)
+		return ppc_md.pcibios_root_bridge_prepare(bridge);
+
+	return 0;
+}
+
 /* This header fixup will do the resource fixup for all devices as they are
  * probed, but not for bridge ranges
  */
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 0b580f4..7f9c956 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -108,3 +108,54 @@ static void fixup_winbond_82c105(struct pci_dev* dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
 			 fixup_winbond_82c105);
+
+int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct device_node *dn, *pdn;
+	struct pci_bus *bus;
+	const uint32_t *pcie_link_speed_stats;
+
+	bus = bridge->bus;
+
+	dn = pcibios_get_phb_of_node(bus);
+	if (!dn)
+		return 0;
+
+	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
+		pcie_link_speed_stats = (const uint32_t *) of_get_property(pdn,
+			"ibm,pcie-link-speed-stats", NULL);
+		if (pcie_link_speed_stats)
+			break;
+	}
+
+	if (!pcie_link_speed_stats) {
+		pr_err("no ibm,pcie-link-speed-stats property\n");
+		return 0;
+	}
+
+	switch (pcie_link_speed_stats[0]) {
+	case 0x01:
+		bus->max_bus_speed = PCIE_SPEED_2_5GT;
+		break;
+	case 0x02:
+		bus->max_bus_speed = PCIE_SPEED_5_0GT;
+		break;
+	default:
+		bus->max_bus_speed = PCI_SPEED_UNKNOWN;
+		break;
+	}
+
+	switch (pcie_link_speed_stats[1]) {
+	case 0x01:
+		bus->cur_bus_speed = PCIE_SPEED_2_5GT;
+		break;
+	case 0x02:
+		bus->cur_bus_speed = PCIE_SPEED_5_0GT;
+		break;
+	default:
+		bus->cur_bus_speed = PCI_SPEED_UNKNOWN;
+		break;
+	}
+
+	return 0;
+}
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 9a3dda0..b79393d 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -60,4 +60,8 @@ extern int dlpar_detach_node(struct device_node *);
 /* Snooze Delay, pseries_idle */
 DECLARE_PER_CPU(long, smt_snooze_delay);
 
+/* PCI root bridge prepare function override for pseries */
+struct pci_host_bridge;
+int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 8bcc9ca..bf34cc9 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -466,6 +466,8 @@ static void __init pSeries_setup_arch(void)
 	else
 		ppc_md.enable_pmcs = power4_enable_pmcs;
 
+	ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+
 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
 		long rc;
 		if ((rc = pSeries_enable_reloc_on_exc()) != H_SUCCESS) {
-- 
1.8.1.4
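The property lookup in pseries_root_bridge_prepare() amounts to walking from the PHB node towards the root and decoding the first two cells. A user-space sketch under simplifying assumptions: `struct dt_node` is a hypothetical replacement for `struct device_node`, the property is a direct pointer rather than an `of_get_property()` lookup, node reference counting is ignored, and cells are treated as host-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device-tree node carrying one optional
 * property ("ibm,pcie-link-speed-stats"). */
struct dt_node {
	struct dt_node *parent;
	const uint32_t *link_speed_stats; /* NULL if property absent */
};

enum bus_speed { SPEED_2_5GT, SPEED_5_0GT, SPEED_UNKNOWN };

/* Walk from @dn towards the root; on each iteration examine the
 * loop cursor (pdn), not the starting node. */
static const uint32_t *find_link_speed_stats(const struct dt_node *dn)
{
	const struct dt_node *pdn;

	for (pdn = dn; pdn != NULL; pdn = pdn->parent)
		if (pdn->link_speed_stats)
			return pdn->link_speed_stats;
	return NULL;
}

/* Map the firmware encoding used by the patch (1 = 2.5GT/s,
 * 2 = 5.0GT/s) to a speed; anything else is unknown. */
static enum bus_speed decode_speed(uint32_t code)
{
	switch (code) {
	case 0x01:
		return SPEED_2_5GT;
	case 0x02:
		return SPEED_5_0GT;
	default:
		return SPEED_UNKNOWN;
	}
}
```

Cell 0 feeds max_bus_speed and cell 1 feeds cur_bus_speed, as in the two switch statements of the patch.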

* [PATCHv4 0/2] Speed Cap fixes for ppc64
From: lucaskt @ 2013-04-24 22:54 UTC
  To: linuxppc-dev, dri-devel, Benjamin Herrenschmidt, Bjorn Helgaas,
	David Airlie <airlied@linux.ie> Michael Ellerman
  Cc: Kleber Sacilotto de Souza, Alex Deucher, Jerome Glisse,
	Lucas Kannebley Tavares, Thadeu Lima de Souza Cascardo,
	Brian King

From: Lucas Kannebley Tavares <lucaskt@vnet.linux.ibm.com>

This patch series does:
  1. max_bus_speed is used to set the device to gen2 speeds
  2. on POWER there's no longer a conflict between the pseries call and other architectures, because the override is done via a ppc_md hook
  3. radeon is using bus->max_bus_speed instead of drm_pcie_get_speed_cap_mask for gen2 capability detection

And I've also added the changes proposed by Michael Ellerman:
  1. Corrected Patch 1's comments
  2. Moved forward function declarations to pseries.h header
  3. Added a forward declaration of struct pci_host_bridge, preventing compilation failures.

The first patch consists of some architecture changes, such as adding a hook on powerpc for pcibios_root_bridge_prepare, which pseries initializes to a function while all other powerpc platforms leave it NULL. That way, whenever pci_create_root_bus() is called, max_bus_speed is set up properly from OpenFirmware.

The second patch consists of simple radeon changes so that it no longer calls drm_pcie_get_speed_cap_mask. I assume that on x86 machines the max_bus_speed field is already set properly.
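With max_bus_speed populated, a driver's gen2 check reduces to inspecting one enum value. A hedged sketch: `can_try_gen2()` is a hypothetical helper, not the radeon code, and the enum mirrors only a subset of `enum pci_bus_speed` from `<linux/pci.h>` (values as found there; treat them as an assumption for other kernel versions). It illustrates one pitfall: the "unknown" value is numerically the largest, so an ordered comparison would misclassify it.

```c
#include <stdbool.h>

/* Subset of enum pci_bus_speed; in the kernel, unknown is 0xff. */
enum pci_bus_speed {
	PCIE_SPEED_2_5GT  = 0x14,
	PCIE_SPEED_5_0GT  = 0x15,
	PCIE_SPEED_8_0GT  = 0x16,
	PCI_SPEED_UNKNOWN = 0xff,
};

/* Hypothetical helper: may the driver try gen2 (5.0GT/s) link speeds? */
static bool can_try_gen2(enum pci_bus_speed max_bus_speed)
{
	/* Match explicitly: "max_bus_speed >= PCIE_SPEED_5_0GT" would
	 * also accept PCI_SPEED_UNKNOWN, because 0xff sorts above 0x15. */
	return max_bus_speed == PCIE_SPEED_5_0GT ||
	       max_bus_speed == PCIE_SPEED_8_0GT;
}
```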

Lucas Kannebley Tavares (2):
  ppc64: perform proper max_bus_speed detection
  radeon: use max_bus_speed to activate gen2 speeds

 arch/powerpc/include/asm/machdep.h       |  2 ++
 arch/powerpc/kernel/pci-common.c         |  8 +++++
 arch/powerpc/platforms/pseries/pci.c     | 51 ++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h |  4 +++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 drivers/gpu/drm/radeon/evergreen.c       | 10 ++-----
 drivers/gpu/drm/radeon/r600.c            |  9 ++----
 drivers/gpu/drm/radeon/rv770.c           |  9 ++----
 8 files changed, 74 insertions(+), 21 deletions(-)

-- 
1.8.1.4

* Re: [PATCH v2 12/15] powerpc/85xx: add time base sync support for e6500
From: Scott Wood @ 2013-04-24 22:38 UTC
  To: Zhao Chenhui; +Cc: linuxppc-dev, linux-kernel, r58472
In-Reply-To: <20130424112929.GC3172@localhost.localdomain>

On 04/24/2013 06:29:29 AM, Zhao Chenhui wrote:
> On Tue, Apr 23, 2013 at 07:04:06PM -0500, Scott Wood wrote:
> > On 04/19/2013 05:47:45 AM, Zhao Chenhui wrote:
> > >From: Chen-Hui Zhao <chenhui.zhao@freescale.com>
> > >
> > >For e6500, two threads in one core share one time base. Just need
> > >to do time base sync on first thread of one core, and skip it on
> > >the other thread.
> > >
> > >Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
> > >Signed-off-by: Li Yang <leoli@freescale.com>
> > >Signed-off-by: Andy Fleming <afleming@freescale.com>
> > >---
> > > arch/powerpc/platforms/85xx/smp.c |   52 +++++++++++++++++++++++++++++++-----
> > > 1 files changed, 44 insertions(+), 8 deletions(-)
> > >
> > >diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
> > >index 74d8cde..5f3eee3 100644
> > >--- a/arch/powerpc/platforms/85xx/smp.c
> > >+++ b/arch/powerpc/platforms/85xx/smp.c
> > >@@ -26,6 +26,7 @@
> > > #include <asm/cacheflush.h>
> > > #include <asm/dbell.h>
> > > #include <asm/fsl_guts.h>
> > >+#include <asm/cputhreads.h>
> > >
> > > #include <sysdev/fsl_soc.h>
> > > #include <sysdev/mpic.h>
> > >@@ -45,6 +46,7 @@ static u64 timebase;
> > > static int tb_req;
> > > static int tb_valid;
> > > static u32 cur_booting_core;
> > >+static bool rcpmv2;
> > >
> > > #ifdef CONFIG_PPC_E500MC
> > > /* get a physical mask of online cores and booting core */
> > >@@ -53,26 +55,40 @@ static inline u32 get_phy_cpu_mask(void)
> > > 	u32 mask;
> > > 	int cpu;
> > >
> > >-	mask = 1 << cur_booting_core;
> > >-	for_each_online_cpu(cpu)
> > >-		mask |= 1 << get_hard_smp_processor_id(cpu);
> > >+	if (smt_capable()) {
> > >+		/* two threads in one core share one time base */
> > >+		mask = 1 << cpu_core_index_of_thread(cur_booting_core);
> > >+		for_each_online_cpu(cpu)
> > >+			mask |= 1 << cpu_core_index_of_thread(
> > >+					get_hard_smp_processor_id(cpu));
> > >+	} else {
> > >+		mask = 1 << cur_booting_core;
> > >+		for_each_online_cpu(cpu)
> > >+			mask |= 1 << get_hard_smp_processor_id(cpu);
> > >+	}
> >
> > Where is smt_capable defined()?  I assume somewhere in the patchset
> > but it's a pain to search 12 patches...
> >
>
> It is defined in arch/powerpc/include/asm/topology.h.
> 	#define smt_capable()           (cpu_has_feature(CPU_FTR_SMT))
>=20
> Thanks for your review again.

We shouldn't base it on CPU_FTR_SMT.  For example, e6500 doesn't claim
that feature yet, except in our SDK kernel.  That doesn't change the
topology of CPU numbering.

> > Is this really about whether we're SMT-capable or whether we have
> > rcpm v2?
> >
> > -Scott
>
> I think this "if" statement can be removed. The cpu_core_index_of_thread()
> can return the correct cpu number with thread or without thread.
>
> Like this:
> static inline u32 get_phy_cpu_mask(void)
> {
> 	u32 mask;
> 	int cpu;
>
> 	mask = 1 << cpu_core_index_of_thread(cur_booting_core);
> 	for_each_online_cpu(cpu)
> 		mask |= 1 << cpu_core_index_of_thread(
> 				get_hard_smp_processor_id(cpu));
>
> 	return mask;
> }

Likewise, this will get it wrong if SMT is disabled or not yet
implemented on a core.

-Scott
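The point about CPU numbering can be made concrete: the mask should be computed from the hardware's fixed thread numbering, not from whether SMT is enabled. A minimal sketch, assuming a hypothetical `hw_threads_per_core` that stands in for wherever the platform records its thread numbering (for example the device tree); on e6500 it would stay 2 even with SMT disabled. `core_of_hw_thread()` and `phys_core_mask()` are illustrative names, not kernel functions.

```c
#include <stdint.h>

/* Hypothetical: hardware threads per core as fixed by the silicon's
 * CPU numbering, independent of whether Linux enables SMT. */
static unsigned int hw_threads_per_core = 2;

/* Physical core index of a hardware thread ID. */
static unsigned int core_of_hw_thread(unsigned int hw_id)
{
	return hw_id / hw_threads_per_core;
}

/* Build a bitmask of physical cores from a list of hardware thread IDs. */
static uint32_t phys_core_mask(const unsigned int *hw_ids, int n)
{
	uint32_t mask = 0;

	for (int i = 0; i < n; i++)
		mask |= UINT32_C(1) << core_of_hw_thread(hw_ids[i]);
	return mask;
}
```

With two threads per core, IDs 0, 1 and 4 map to cores 0, 0 and 2, giving mask 0x5. If only first threads 0, 2 and 6 are online (SMT off), the result is still correct (cores 0, 1, 3, mask 0xb), whereas treating the IDs as core numbers would set bits 0, 2 and 6.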

* Re: [PATCH 1/3] rapidio: make enumeration/discovery configurable
From: Andrew Morton @ 2013-04-24 21:37 UTC
  To: Alexandre Bounine
  Cc: Micha Nelissen, linux-kernel, Andre van Herk, linuxppc-dev
In-Reply-To: <1366813919-13766-2-git-send-email-alexandre.bounine@idt.com>

On Wed, 24 Apr 2013 10:31:57 -0400 Alexandre Bounine <alexandre.bounine@idt.com> wrote:

> Rework to implement RapidIO enumeration/discovery method selection
> combined with ability to use enumeration/discovery as a kernel module.
> 
> This patch adds ability to introduce new RapidIO enumeration/discovery methods
> using kernel configuration options or loadable modules. Configuration option
> mechanism allows to select built-in or modular enumeration/discovery method from
> the list of existing methods or use external modules.
> If a modular enumeration/discovery is selected each RapidIO mport device can
> have its own method attached to it.
> 
> The currently existing enumeration/discovery code was updated to be used
> as built-in or modular method. This configuration option is named "Basic
> enumeration/discovery" method.
> 
> Several common routines have been moved from rio-scan.c to make them available
> to other enumeration methods and reduce number of exported symbols.
> 
> ...
>
> @@ -1421,3 +1295,46 @@ enum_done:
>  bail:
>  	return -EBUSY;
>  }
> +
> +struct rio_scan rio_scan_ops = {
> +	.enumerate = rio_enum_mport,
> +	.discover = rio_disc_mport,
> +};
> +
> +
> +#ifdef MODULE

Why the `ifdef MODULE'?  The module parameters are still accessible if
the driver is statically linked and we do want the driver to behave in
the same way regardless of how it was linked and loaded.

> +static bool scan;
> +module_param(scan, bool, 0);
> +MODULE_PARM_DESC(scan, "Start RapidIO network enumeration/discovery "
> +			"(default = 1)");
> +
> +/**
> + * rio_basic_attach:
> + *
> + * When this enumeration/discovery method is loaded as a module this function
> + * registers its specific enumeration and discover routines for all available
> + * RapidIO mport devices. The "scan" command line parameter controls ability of
> + * the module to start RapidIO enumeration/discovery automatically.
> + *
> + * Returns 0 for success or -EIO if unable to register itself.
> + *
> + * This enumeration/discovery method cannot be unloaded and therefore does not
> + * provide a matching cleanup_module routine.
> + */
> +
> +int __init rio_basic_attach(void)

static

> +{
> +	if (rio_register_scan(RIO_MPORT_ANY, &rio_scan_ops))
> +		return -EIO;
> +	if (scan)
> +		rio_init_mports();
> +	return 0;
> +}
> +
> +module_init(rio_basic_attach);
> +
> +MODULE_DESCRIPTION("Basic RapidIO enumeration/discovery");
> +MODULE_LICENSE("GPL");
> +
> +#endif /* MODULE */
> diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
> index d553b5d..e36628a 100644
> --- a/drivers/rapidio/rio.c
> +++ b/drivers/rapidio/rio.c
> @@ -31,6 +31,9 @@
>  
>  #include "rio.h"
>  
> +LIST_HEAD(rio_devices);

static?

> +DEFINE_SPINLOCK(rio_global_list_lock);

static?

> +
>  static LIST_HEAD(rio_mports);
>  static unsigned char next_portid;
>  static DEFINE_SPINLOCK(rio_mmap_lock);
> 
> ...
>
> +/**
> + * rio_switch_init - Sets switch operations for a particular vendor switch
> + * @rdev: RIO device
> + * @do_enum: Enumeration/Discovery mode flag
> + *
> + * Searches the RIO switch ops table for known switch types. If the vid
> + * and did match a switch table entry, then call switch initialization
> + * routine to setup switch-specific routines.
> + */
> +void rio_switch_init(struct rio_dev *rdev, int do_enum)
> +{
> +	struct rio_switch_ops *cur = __start_rio_switch_ops;
> +	struct rio_switch_ops *end = __end_rio_switch_ops;

huh, I hadn't noticed that RIO has its very own vmlinux section.  How
peculiar.

> +	while (cur < end) {
> +		if ((cur->vid == rdev->vid) && (cur->did == rdev->did)) {
> +			pr_debug("RIO: calling init routine for %s\n",
> +				 rio_name(rdev));
> +			cur->init_hook(rdev, do_enum);
> +			break;
> +		}
> +		cur++;
> +	}
> +
> +	if ((cur >= end) && (rdev->pef & RIO_PEF_STD_RT)) {
> +		pr_debug("RIO: adding STD routing ops for %s\n",
> +			rio_name(rdev));
> +		rdev->rswitch->add_entry = rio_std_route_add_entry;
> +		rdev->rswitch->get_entry = rio_std_route_get_entry;
> +		rdev->rswitch->clr_table = rio_std_route_clr_table;
> +	}
> +
> +	if (!rdev->rswitch->add_entry || !rdev->rswitch->get_entry)
> +		printk(KERN_ERR "RIO: missing routing ops for %s\n",
> +		       rio_name(rdev));
> +}
> +EXPORT_SYMBOL_GPL(rio_switch_init);
> 
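The `__start_rio_switch_ops`/`__end_rio_switch_ops` bounds follow a common ELF idiom: entries are dropped into a named section and then scanned as an array. In the kernel the bounds come from the vmlinux linker script; in user space, GNU ld synthesises `__start_SEC`/`__stop_SEC` for any section whose name is a valid C identifier. A sketch assuming a GNU toolchain on ELF; the section name `swops`, the macro, and the IDs are hypothetical.

```c
#include <stddef.h>

struct switch_ops {
	unsigned short vid, did;
	int (*init_hook)(int do_enum);
};

/* Emit one table entry into the "swops" section; "used" stops the
 * compiler discarding the otherwise unreferenced static object. */
#define DECLARE_SWITCH_OPS(name, v, d, fn)				\
	static const struct switch_ops name				\
	__attribute__((used, section("swops"))) = { (v), (d), (fn) }

/* Bounds provided by the linker (note: __stop_, not __end_). */
extern const struct switch_ops __start_swops[];
extern const struct switch_ops __stop_swops[];

static int init_a(int do_enum) { return 100 + do_enum; }
static int init_b(int do_enum) { return 200 + do_enum; }

/* Made-up vendor/device IDs for illustration. */
DECLARE_SWITCH_OPS(ops_a, 0x1111, 0x0001, init_a);
DECLARE_SWITCH_OPS(ops_b, 0x2222, 0x0002, init_b);

/* Linear scan of the section, in the spirit of rio_switch_init(). */
static const struct switch_ops *find_switch_ops(unsigned short vid,
						 unsigned short did)
{
	for (const struct switch_ops *cur = __start_swops;
	     cur < __stop_swops; cur++)
		if (cur->vid == vid && cur->did == did)
			return cur;
	return NULL;
}
```

The linker may order entries arbitrarily, which is fine for a lookup like this but means the table must not be relied upon for priority.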
> ...
>
> +int rio_register_scan(int mport_id, struct rio_scan *scan_ops)
> +{
> +	struct rio_mport *port;
> +	int rc = -EBUSY;
> +
> +	list_for_each_entry(port, &rio_mports, node) {

How come the driver has no locking for rio_mports?  If a bugfix isn't
needed here then a code comment is!

> +		if (port->id == mport_id || mport_id == RIO_MPORT_ANY) {
> +			if (port->nscan && mport_id == RIO_MPORT_ANY)
> +				continue;
> +			else if (port->nscan)
> +				break;
> +
> +			port->nscan = scan_ops;
> +			rc = 0;
> +
> +			if (mport_id != RIO_MPORT_ANY)
> +				break;
> +		}
> +	}
> +
> +	return rc;
> +}
> 
> ...
>

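One way to address the locking question is to hold a single lock across every walk of the port list, including in rio_register_scan(). A user-space sketch with a pthread mutex; `struct mport`, `register_scan()` and `MPORT_ANY` are hypothetical stand-ins, and a kernel mutex would be the likely in-kernel equivalent (an assumption, not what the patch does). The claiming rules copy the quoted loop: a specific port fails with -EBUSY if already claimed, while "any" skips claimed ports.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MPORT_ANY (-1)  /* hypothetical stand-in for RIO_MPORT_ANY */

struct scan_ops {
	int (*enumerate)(void);
};

struct mport {
	int id;
	const struct scan_ops *nscan;
	struct mport *next;
};

static struct mport *mports;  /* singly linked list of ports */
static pthread_mutex_t mports_lock = PTHREAD_MUTEX_INITIALIZER;

int register_scan(int mport_id, const struct scan_ops *ops)
{
	int rc = -EBUSY;

	pthread_mutex_lock(&mports_lock);
	for (struct mport *p = mports; p; p = p->next) {
		if (p->id != mport_id && mport_id != MPORT_ANY)
			continue;
		if (p->nscan) {
			/* Already claimed: skip for "any", fail for one port. */
			if (mport_id == MPORT_ANY)
				continue;
			break;
		}
		p->nscan = ops;
		rc = 0;
		if (mport_id != MPORT_ANY)
			break;
	}
	pthread_mutex_unlock(&mports_lock);
	return rc;
}
```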
* Re: [PATCH 5/7] powerpc/powernv: TCE invalidation for PHB3
From: Benjamin Herrenschmidt @ 2013-04-24 20:52 UTC
  To: Gavin Shan; +Cc: linuxppc-dev
In-Reply-To: <1366796259-29412-6-git-send-email-shangw@linux.vnet.ibm.com>


> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index cbfe678..0db308e 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -76,6 +76,7 @@ struct iommu_table {
>  	struct iommu_pool large_pool;
>  	struct iommu_pool pools[IOMMU_NR_POOLS];
>  	unsigned long *it_map;       /* A simple allocation bitmap for now */
> +	void *sysdata;
>  };

You should be able to avoid adding that field by using the container_of
trick to get to the PE and moving the iommu ops for ioda into pci-ioda.c
instead of sharing them with the non-ioda stuff.

Cheers,
Ben.
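The container_of trick works when the table is embedded in the PE rather than pointed to from it. A minimal user-space sketch: `struct pe_like`, `struct iommu_table_like` and `table_to_pe_number()` are hypothetical, and the macro is the classic offsetof-based form.

```c
#include <stddef.h>

/* Classic container_of: recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct iommu_table_like {       /* hypothetical stand-in for iommu_table */
	unsigned long it_offset;
};

struct pe_like {                /* hypothetical stand-in for a PE */
	int pe_number;
	struct iommu_table_like tce32_table;  /* embedded, not a pointer */
};

/* IODA-only op: receives the table, recovers the owning PE. */
static int table_to_pe_number(struct iommu_table_like *tbl)
{
	struct pe_like *pe = container_of(tbl, struct pe_like, tce32_table);

	return pe->pe_number;
}
```

This is only valid if every table passed to these ops really is embedded in a PE, which is why the ops would need to live in pci-ioda.c instead of being shared with non-IODA tables.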

* Re: [PATCH 4/7] powerpc/powernv: Patch MSI EOI handler on P8
From: Benjamin Herrenschmidt @ 2013-04-24 20:49 UTC
  To: Gavin Shan; +Cc: linuxppc-dev
In-Reply-To: <1366796259-29412-5-git-send-email-shangw@linux.vnet.ibm.com>

On Wed, 2013-04-24 at 17:37 +0800, Gavin Shan wrote:
> The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional
> steps to handle the P/Q bits in IVE before EOIing the corresponding
> interrupt. The patch changes the EOI handler to cover that.

 .../...

>  static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
>  {
>  	unsigned int count;
> @@ -667,6 +681,8 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
>  	}
>  
>  	phb->msi_setup = pnv_pci_ioda_msi_setup;
> +	if (phb->type == PNV_PHB_IODA2)
> +		phb->msi_eoi = pnv_pci_ioda_msi_eoi;

Ouch, another function pointer call in a hot path...

>  	phb->msi32_support = 1;
>  	pr_info("  Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
>  		count, phb->msi_base);
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index a11b5a6..ea6a93d 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -115,6 +115,25 @@ static void pnv_teardown_msi_irqs(struct pci_dev *pdev)
>  		irq_dispose_mapping(entry->irq);
>  	}
>  }
> +
> +int pnv_pci_msi_eoi(unsigned int hw_irq)
> +{
> +	struct pci_controller *hose, *tmp;
> +	struct pnv_phb *phb = NULL;
> +
> +	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> +		phb = hose->private_data;
> +		if (hw_irq >= phb->msi_base &&
> +		    hw_irq < phb->msi_base + phb->msi_bmp.irq_count) {
> +			if (!phb->msi_eoi)
> +				return -EEXIST;
> +			return phb->msi_eoi(phb, hw_irq);
> +		}
> +	}
> +
> +	/* For LSI interrupts, we needn't do it */
> +	return 0;
> +}

And a list walk ... that's not right.

Also, you do it for all XICS interrupts, including the non-PCI ones, the
LSIs, etc... only to figure out that some might not be MSIs later in
the loop.

Why not instead look at changing the irq_chip for the MSIs ?

IE. When setting up the MSIs for IODA2, use a different irq_chip which
is a copy of the original one with a different ->eoi callback, which
does the original xics eoi and then the OPAL stuff ?

You might even be able to use something like container_of to get back
to the struct phb, no need to iterate them all.

Cheers,
Ben.
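The suggested alternative, a per-PHB copy of the irq_chip with its own ->eoi, can be sketched in user space. Hedged: `struct chip`, `struct phb`, `xics_eoi()` and `ioda2_msi_eoi()` are hypothetical simplifications (real irq_chip callbacks take a `struct irq_data *`), and the counters only stand in for the XICS EOI and the extra P/Q handling. Only IRQs wired to the copied chip pay for the extra step; LSIs and non-PCI interrupts keep the original chip, and no list walk is needed.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct chip {                    /* hypothetical stand-in for irq_chip */
	void (*eoi)(struct chip *c, unsigned int hwirq);
};

static int xics_eois;            /* counts calls to the generic EOI */

static void xics_eoi(struct chip *c, unsigned int hwirq)
{
	(void)c;
	(void)hwirq;
	xics_eois++;
}

static struct chip xics_chip = { xics_eoi };

struct phb {
	int pq_resets;               /* counts the extra PHB3 step */
	struct chip msi_chip;        /* per-PHB copy used only for MSIs */
};

static void ioda2_msi_eoi(struct chip *c, unsigned int hwirq)
{
	struct phb *phb = container_of(c, struct phb, msi_chip);

	xics_chip.eoi(c, hwirq);     /* the original EOI first ... */
	phb->pq_resets++;            /* ... then the per-PHB extra step */
}

static void phb_setup_msi_chip(struct phb *phb)
{
	phb->msi_chip = xics_chip;          /* copy the generic chip */
	phb->msi_chip.eoi = ioda2_msi_eoi;  /* and override ->eoi only */
}
```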

* Re: [PATCH v2 0/8] powerpc/pseries: Nvram-to-pstore
From: Kees Cook @ 2013-04-24 20:45 UTC
  To: Aruna Balakrishnaiah
  Cc: jkenisto, Tony Luck, Colin Cross, LKML, Anton Vorontsov,
	linuxppc-dev, paulus, anton, mahesh
In-Reply-To: <20130424061807.7341.909.stgit@aruna-ThinkPad-T420>

On Tue, Apr 23, 2013 at 11:19 PM, Aruna Balakrishnaiah
<aruna@linux.vnet.ibm.com> wrote:
> Currently the kernel provides the contents of p-series NVRAM only as a
> simple stream of bytes via /dev/nvram, which must be interpreted in user
> space by the nvram command in the powerpc-utils package. This patch set
> exploits the pstore subsystem to expose each partition in NVRAM as a
> separate file in /dev/pstore. For instance Oops messages will be stored in a
> file named [dmesg-nvram-2].
>
> Changes from v1:
>         - Reduce #ifdefs and remove forward declarations of pstore callbacks
>         - Handle return value of nvram_write_os_partition
>         - Remove empty pstore callbacks and register pstore only when pstore
>           is configured
>
> ---
>
> Aruna Balakrishnaiah (8):
>       powerpc/pseries: Remove syslog prefix in uncompressed oops text
>       powerpc/pseries: Add version and timestamp to oops header
>       powerpc/pseries: Introduce generic read function to read nvram-partitions
>       powerpc/pseries: Read/Write oops nvram partition via pstore
>       powerpc/pseries: Read rtas partition via pstore
>       powerpc/pseries: Distinguish between a os-partition and non-os partition
>       powerpc/pseries: Read of-config partition via pstore
>       powerpc/pseries: Read common partition via pstore
>
>
>  arch/powerpc/platforms/pseries/nvram.c |  353 +++++++++++++++++++++++++++-----
>  fs/pstore/inode.c                      |    9 +
>  include/linux/pstore.h                 |    4
>  3 files changed, 313 insertions(+), 53 deletions(-)

This series looks good! Other than the naming conventions (are these
new pstore types really PPC-only?) I think it's a fine addition.

Thanks!

-Kees

--
Kees Cook
Chrome OS Security

* Re: [PATCH v2 7/8] powerpc/pseries: Read of-config partition via pstore
From: Kees Cook @ 2013-04-24 20:43 UTC
  To: Aruna Balakrishnaiah
  Cc: jkenisto, Tony Luck, Colin Cross, LKML, Anton Vorontsov,
	linuxppc-dev, paulus, anton, mahesh
In-Reply-To: <20130424062052.7341.18551.stgit@aruna-ThinkPad-T420>

On Tue, Apr 23, 2013 at 11:20 PM, Aruna Balakrishnaiah
<aruna@linux.vnet.ibm.com> wrote:
> This patch set exploits the pstore subsystem to read details of
> of-config partition in NVRAM to a separate file in /dev/pstore.
> For instance, of-config partition details will be stored in a
> file named [of-nvram-5].
>
> Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/nvram.c |   55 +++++++++++++++++++++++++++-----
>  fs/pstore/inode.c                      |    3 ++
>  include/linux/pstore.h                 |    1 +
>  3 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
> index b118382..de448af 100644
> --- a/arch/powerpc/platforms/pseries/nvram.c
> +++ b/arch/powerpc/platforms/pseries/nvram.c
> @@ -132,9 +132,16 @@ static size_t oops_data_sz;
>  static struct z_stream_s stream;
>
>  #ifdef CONFIG_PSTORE
> +static struct nvram_os_partition of_config_partition = {
> +       .name = "of-config",
> +       .index = -1,
> +       .os_partition = false
> +};
> +
>  static enum pstore_type_id nvram_type_ids[] = {
>         PSTORE_TYPE_DMESG,
>         PSTORE_TYPE_RTAS,
> +       PSTORE_TYPE_OF,
>         -1
>  };
>  static int read_type;
> @@ -332,10 +339,15 @@ int nvram_read_partition(struct nvram_os_partition *part, char *buff,
>
>         tmp_index = part->index;
>
> -       rc = ppc_md.nvram_read((char *)&info, sizeof(struct err_log_info), &tmp_index);
> -       if (rc <= 0) {
> -               pr_err("%s: Failed nvram_read (%d)\n", __FUNCTION__, rc);
> -               return rc;
> +       if (part->os_partition) {
> +               rc = ppc_md.nvram_read((char *)&info,
> +                                       sizeof(struct err_log_info),
> +                                       &tmp_index);
> +               if (rc <= 0) {
> +                       pr_err("%s: Failed nvram_read (%d)\n", __FUNCTION__,
> +                                                                       rc);
> +                       return rc;
> +               }
>         }
>
>         rc = ppc_md.nvram_read(buff, length, &tmp_index);
> @@ -344,8 +356,10 @@ int nvram_read_partition(struct nvram_os_partition *part, char *buff,
>                 return rc;
>         }
>
> -       *error_log_cnt = info.seq_num;
> -       *err_type = info.error_type;
> +       if (part->os_partition) {
> +               *error_log_cnt = info.seq_num;
> +               *err_type = info.error_type;
> +       }
>
>         return 0;
>  }
> @@ -516,7 +530,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
>  }
>
>  /*
> - * Reads the oops/panic report and ibm,rtas-log partition.
> + * Reads the oops/panic report, rtas and of-config partition.
>   * Returns the length of the data we read from each partition.
>   * Returns 0 if we've been called before.
>   */
> @@ -525,9 +539,11 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
>                                 struct pstore_info *psi)
>  {
>         struct oops_log_info *oops_hdr;
> -       unsigned int err_type, id_no;
> +       unsigned int err_type, id_no, size = 0;
>         struct nvram_os_partition *part = NULL;
>         char *buff = NULL;
> +       int sig = 0;
> +       loff_t p;
>
>         read_type++;
>
> @@ -542,10 +558,29 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
>                 time->tv_sec = last_rtas_event;
>                 time->tv_nsec = 0;
>                 break;
> +       case PSTORE_TYPE_OF:
> +               sig = NVRAM_SIG_OF;
> +               part = &of_config_partition;
> +               *type = PSTORE_TYPE_OF;
> +               *id = PSTORE_TYPE_OF;
> +               time->tv_sec = 0;
> +               time->tv_nsec = 0;
> +               break;
>         default:
>                 return 0;
>         }
>
> +       if (!part->os_partition) {
> +               p = nvram_find_partition(part->name, sig, &size);
> +               if (p <= 0) {
> +                       pr_err("nvram: Failed to find partition %s, "
> +                               "err %d\n", part->name, (int)p);
> +                       return 0;
> +               }
> +               part->index = p;
> +               part->size = size;
> +       }
> +
>         buff = kmalloc(part->size, GFP_KERNEL);
>
>         if (!buff)
> @@ -557,7 +592,9 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
>         }
>
>         *count = 0;
> -       *id = id_no;
> +
> +       if (part->os_partition)
> +               *id = id_no;
>
>         if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) {
>                 oops_hdr = (struct oops_log_info *)buff;
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index ec24f9c..8d4fb65 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -327,6 +327,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
>         case PSTORE_TYPE_PPC_RTAS:
>                 sprintf(name, "rtas-%s-%lld", psname, id);
>                 break;
> +       case PSTORE_TYPE_PPC_OF:
> +               sprintf(name, "of-%s-%lld", psname, id);
> +               break;
>         case PSTORE_TYPE_UNKNOWN:
>                 sprintf(name, "unknown-%s-%lld", psname, id);
>                 break;
> diff --git a/include/linux/pstore.h b/include/linux/pstore.h
> index d7a8fe9..615dc18 100644
> --- a/include/linux/pstore.h
> +++ b/include/linux/pstore.h
> @@ -37,6 +37,7 @@ enum pstore_type_id {
>         PSTORE_TYPE_FTRACE      = 3,
>         /* PPC64 partition types */
>         PSTORE_TYPE_PPC_RTAS    = 4,
> +       PSTORE_TYPE_PPC_OF      = 5,
>         PSTORE_TYPE_UNKNOWN     = 255
>  };
>
>

Should this be named just "PSTORE_TYPE_OF" instead of "...PPC_OF"?

-Kees

--
Kees Cook
Chrome OS Security
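The file names in /dev/pstore (such as [of-nvram-5] or [dmesg-nvram-2]) come from a type-to-prefix switch like the one in pstore_mkfile(). A hedged user-space sketch: `enum ps_type` mirrors only the values relevant here (DMESG is 0; 4 and 5 are the PPC types from this series), and `pstore_file_name()` is a hypothetical helper that uses a bounded snprintf() and `%llu` for the unsigned id instead of sprintf() with `%lld`.

```c
#include <stdio.h>

/* Hypothetical mirror of the relevant pstore_type_id values. */
enum ps_type { PS_DMESG = 0, PS_PPC_RTAS = 4, PS_PPC_OF = 5, PS_UNKNOWN = 255 };

/* Build "<prefix>-<backend>-<id>"; returns snprintf()'s result. */
static int pstore_file_name(char *buf, size_t len, enum ps_type type,
			    const char *psname, unsigned long long id)
{
	const char *prefix;

	switch (type) {
	case PS_DMESG:
		prefix = "dmesg";
		break;
	case PS_PPC_RTAS:
		prefix = "rtas";
		break;
	case PS_PPC_OF:
		prefix = "of";
		break;
	default:
		prefix = "unknown";
		break;
	}
	return snprintf(buf, len, "%s-%s-%llu", prefix, psname, id);
}
```

Usage: with backend name "nvram" and id 5, the OF type yields "of-nvram-5".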

* Re: [PATCH] powerpc: Add HWCAP2 aux entry
From: Andrew Morton @ 2013-04-24 19:36 UTC
  To: Benjamin Herrenschmidt
  Cc: Michael Neuling, vda.linux, Nishanth Aravamudan, linux-kernel,
	Steve Munroe, paulus, viro, Ryan Arnold, linuxppc-dev
In-Reply-To: <1366677702.2886.9.camel@pasglop>

On Tue, 23 Apr 2013 10:41:42 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2013-04-18 at 13:41 +1000, Michael Neuling wrote:
> > akpm,
> > 
> > If you're happy with this, is it something you can take in your tree?
> 
> Andrew ? Or give me an ack ? :-) I'm happy to carry this, we need that
> rather urgently and we have the glibc folks on board.

Looks good to me - please proceed with the patch.

* Re: [PATCH] powerpc: Add isync to copy_and_flush
From: Benjamin Herrenschmidt @ 2013-04-24 16:36 UTC
  To: Michael Neuling; +Cc: Linux PPC dev, miltonm, Nishanth Aravamudan
In-Reply-To: <29244.1366799409@ale.ozlabs.ibm.com>

On Wed, 2013-04-24 at 20:30 +1000, Michael Neuling wrote:
> benh: we should get this in 3.9 ASAP.

Considering that the bug has been there *forever* I don't think I have a
real standing to try to shove it down Linus' throat as a "regression
fix" :-)

I'll put the fix in 3.10 and let it trickle down to stable.

Cheers,
Ben.

* [PATCH v4 12/13] Enable PRRN handling
From: Nathan Fontenot @ 2013-04-24 16:06 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

The Linux kernel and platform firmware negotiate their mutual support
of the PRRN option via the ibm,client-architecture-support interface.
This patch simply sets the appropriate fields in the client architecture
vector to indicate Linux support for PRRN and will allow the firmware to
report PRRN events via the RTAS event-scan mechanism.
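Since the patch ORs OV5_PRRN into the same vector slot as OV5_TYPE1_AFFINITY, both options must live in the same byte of option vector 5. A sketch of that packing under an assumed encoding: each option code carries its byte index in the high byte and its bit in the low byte. The `OPT_*` macros, their values, and `opt_advertised()` are illustrative, not copied from prom_init.c.

```c
#include <stdint.h>

/* Assumed encoding (illustrative): high byte = index of the byte in
 * option vector 5, low byte = the feature bit within that byte. */
#define OPT_INDX(x) ((x) >> 8)
#define OPT_FEAT(x) ((x) & 0xff)

/* Hypothetical option codes sharing one vector byte. */
#define OPT_TYPE1_AFFINITY 0x0580
#define OPT_PRRN           0x0540

/* Does the vector byte for @opt advertise that option? */
static int opt_advertised(const uint8_t *vec, unsigned int opt)
{
	return (vec[OPT_INDX(opt)] & OPT_FEAT(opt)) == OPT_FEAT(opt);
}
```

Under these assumptions, setting byte 5 to OPT_FEAT(OPT_TYPE1_AFFINITY) | OPT_FEAT(OPT_PRRN) gives 0xc0, advertising both options at once.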

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/prom_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-04-23 14:46:23.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-04-23 15:30:36.000000000 -0500
@@ -698,7 +698,7 @@
 #else
 	0,
 #endif
-	OV5_FEAT(OV5_TYPE1_AFFINITY),
+	OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
 	0,
 	0,
 	0,

* [PATCH v4 13/13] Add /proc interface to control topology updates
From: Nathan Fontenot @ 2013-04-24 16:07 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

There are instances in which we do not want topology updates to occur.
In order to allow this a /proc interface (/proc/powerpc/topology_updates)
is introduced so that topology updates can be enabled and disabled.

This patch also adds a prrn_is_enabled() call so that PRRN events are
handled in the kernel only if topology updating is enabled.
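The on/off parsing done by the write handler can be exercised on its own in user space. A sketch: `parse_on_off()` is a hypothetical helper and memcpy() stands in for copy_from_user(). Like topology_write(), it copies at most 3 bytes into a 4-byte buffer and NUL-terminates it, so a trailing newline from `echo on > /proc/powerpc/topology_updates` is tolerated.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 for "on", 0 for "off", -1 for anything else. */
static int parse_on_off(const char *buf, size_t count)
{
	char kbuf[4];                        /* "off" plus NUL */
	size_t len = count < 3 ? count : 3;

	memcpy(kbuf, buf, len);              /* copy_from_user() in-kernel */
	kbuf[len] = '\0';

	if (!strncmp(kbuf, "on", 2))
		return 1;
	if (!strncmp(kbuf, "off", 3))
		return 0;
	return -1;
}
```

Because only prefixes are compared, input such as "onion" is also treated as "on", matching the behaviour of the handler in the patch.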

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h |    5 ++
 arch/powerpc/kernel/rtasd.c         |    7 ++--
 arch/powerpc/mm/numa.c              |   62 +++++++++++++++++++++++++++++++++++-
 3 files changed, 71 insertions(+), 3 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 15:29:39.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 19:30:33.000000000 -0500
@@ -23,6 +23,9 @@
 #include <linux/cpuset.h>
 #include <linux/node.h>
 #include <linux/stop_machine.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
 #include <asm/sparsemem.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
@@ -1585,7 +1588,6 @@
 
 	return rc;
 }
-__initcall(start_topology_update);
 
 /*
  * Disable polling for VPHN associativity changes.
@@ -1604,4 +1606,62 @@
 
 	return rc;
 }
+
+int prrn_is_enabled(void)
+{
+	return prrn_enabled;
+}
+
+static int topology_read(struct seq_file *file, void *v)
+{
+	if (vphn_enabled || prrn_enabled)
+		seq_puts(file, "on\n");
+	else
+		seq_puts(file, "off\n");
+
+	return 0;
+}
+
+static int topology_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, topology_read, NULL);
+}
+
+static ssize_t topology_write(struct file *file, const char __user *buf,
+			      size_t count, loff_t *off)
+{
+	char kbuf[4]; /* "on" or "off" plus null. */
+	int read_len;
+
+	read_len = count < 3 ? count : 3;
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+
+	kbuf[read_len] = '\0';
+
+	if (!strncmp(kbuf, "on", 2))
+		start_topology_update();
+	else if (!strncmp(kbuf, "off", 3))
+		stop_topology_update();
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations topology_ops = {
+	.read = seq_read,
+	.write = topology_write,
+	.open = topology_open,
+	.release = single_release
+};
+
+static int topology_update_init(void)
+{
+	start_topology_update();
+	proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops);
+
+	return 0;
+}
+device_initcall(topology_update_init);
 #endif /* CONFIG_PPC_SPLPAR */
Index: powerpc/arch/powerpc/include/asm/topology.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/topology.h	2013-04-23 12:54:22.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/topology.h	2013-04-23 19:31:57.000000000 -0500
@@ -71,6 +71,7 @@
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
 extern int start_topology_update(void);
 extern int stop_topology_update(void);
+extern int prrn_is_enabled(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -80,6 +81,10 @@
 {
 	return 0;
 }
+static inline int prrn_is_enabled(void)
+{
+	return 0;
+}
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include <asm-generic/topology.h>
Index: powerpc/arch/powerpc/kernel/rtasd.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-04-23 13:52:08.000000000 -0500
+++ powerpc/arch/powerpc/kernel/rtasd.c	2013-04-23 17:47:09.000000000 -0500
@@ -29,6 +29,7 @@
 #include <asm/nvram.h>
 #include <linux/atomic.h>
 #include <asm/machdep.h>
+#include <asm/topology.h>
 
 
 static DEFINE_SPINLOCK(rtasd_log_lock);
@@ -292,11 +293,13 @@
 
 static void handle_rtas_event(const struct rtas_error_log *log)
 {
-	if (log->type == RTAS_TYPE_PRRN)
+	if (log->type == RTAS_TYPE_PRRN) {
 		/* For PRRN Events the extended log length is used to denote
 		 * the scope for calling rtas update-nodes.
 		 */
-		prrn_schedule_update(log->extended_log_length);
+		if (prrn_is_enabled())
+			prrn_schedule_update(log->extended_log_length);
+	}
 
 	return;
 }


* [PATCH v4 11/13] Re-enable Virtual Processor Home Node updating
From: Nathan Fontenot @ 2013-04-24 16:05 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

The new PRRN firmware feature provides a more convenient and event-driven
interface than VPHN for notifying Linux of changes to the NUMA affinity of
platform resources. However, for practical reasons, it may not be feasible
for some customers to update to the latest firmware. For these customers,
the VPHN feature supported on previous firmware versions may still be the
best option.

The VPHN feature was previously disabled due to races with the load
balancing code when accessing the NUMA cpu maps, but the new stop_machine()
approach protects the NUMA cpu maps from these concurrent accesses. It
should be safe to re-enable this feature now.
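
The resulting preference order in start_topology_update() can be summarised
as a small decision function. This is a hypothetical sketch. The enum and
the boolean parameters stand in for firmware_has_feature(FW_FEATURE_PRRN),
firmware_has_feature(FW_FEATURE_VPHN) and get_lppaca()->shared_proc, and the
"already enabled" checks on prrn_enabled/vphn_enabled are omitted:

```c
#include <assert.h>
#include <stdbool.h>

enum topo_mode { TOPO_NONE, TOPO_PRRN, TOPO_VPHN };

/* Sketch of the choice start_topology_update() makes once the
 * "0 &&" guard on the VPHN branch is removed. */
static enum topo_mode select_topology_mode(bool has_prrn, bool has_vphn,
					   bool shared_proc)
{
	if (has_prrn)
		return TOPO_PRRN;	/* event driven: DT update notifier */
	if (has_vphn && shared_proc)
		return TOPO_VPHN;	/* polling: 60 s deferrable timer */
	return TOPO_NONE;
}
```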

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 15:28:21.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 15:29:39.000000000 -0500
@@ -1572,9 +1572,8 @@
 			vphn_enabled = 0;
 			rc = of_reconfig_notifier_register(&dt_update_nb);
 		}
-	} else if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
+	} else if (firmware_has_feature(FW_FEATURE_VPHN) &&
 		   get_lppaca()->shared_proc) {
-		/* Disabled until races with load balancing are fixed */
 		if (!vphn_enabled) {
 			prrn_enabled = 0;
 			vphn_enabled = 1;


* [PATCH v4 10/13] Update NUMA VDSO information when updating CPU maps
From: Nathan Fontenot @ 2013-04-24 16:03 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds
vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3.

This patch ensures that this information is also updated when the NUMA
affinity of a cpu changes.
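
The node stored by vdso_getcpu_init() is what userspace sees through
getcpu(). The sketch below shows a minimal way to query the current cpu and
node, assuming a Linux userspace where the raw getcpu system call is
available. current_cpu_and_node() is a hypothetical helper. Note that
syscall(SYS_getcpu, ...) enters the kernel rather than the vDSO, whereas a C
library wrapper such as getcpu() or sched_getcpu() may use the vDSO path that
this patch keeps up to date.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel which cpu and NUMA node the caller is running on.
 * The third (tcache) argument is unused by modern kernels. */
static int current_cpu_and_node(unsigned int *cpu, unsigned int *node)
{
	return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```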

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 15:26:54.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 15:28:21.000000000 -0500
@@ -30,6 +30,7 @@
 #include <asm/paca.h>
 #include <asm/hvcall.h>
 #include <asm/setup.h>
+#include <asm/vdso.h>
 
 static int numa_enabled = 1;
 
@@ -1434,6 +1435,7 @@
 		unregister_cpu_under_node(update->cpu, update->old_nid);
 		unmap_cpu_from_node(update->cpu);
 		map_cpu_to_node(update->cpu, update->new_nid);
+		vdso_getcpu_init();
 		register_cpu_under_node(update->cpu, update->new_nid);
 	}
 
@@ -1449,6 +1451,7 @@
 	unsigned int cpu, changed = 0;
 	struct topology_update_data *updates, *ud;
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	cpumask_t updated_cpus;
 	struct device *dev;
 	int weight, i = 0;
 
@@ -1460,6 +1463,8 @@
 	if (!updates)
 		return 0;
 
+	cpumask_clear(&updated_cpus);
+
 	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
 		ud = &updates[i++];
 		ud->cpu = cpu;
@@ -1470,12 +1475,13 @@
 			ud->new_nid = first_online_node;
 
 		ud->old_nid = numa_cpu_lookup_table[cpu];
+		cpumask_set_cpu(cpu, &updated_cpus);
 
 		if (i < weight)
 			ud->next = &updates[i];
 	}
 
-	stop_machine(update_cpu_topology, &updates[0], cpu_online_mask);
+	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
 
 	for (ud = &updates[0]; ud; ud = ud->next) {
 		dev = get_cpu_device(ud->cpu);


* [PATCH v4 9/13] Use stop machine to update cpu maps
From: Nathan Fontenot @ 2013-04-24 16:02 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to these
structures. Since cpumask_of_node() hands out a pointer to these
structures, callers can still read them outside of the lock while they are
being modified. Furthermore, tracking down each usage of these pointers and
adding locks would be quite invasive and difficult to maintain.

The approach used is to build a list of the affected cpus and call
stop_machine() to run the update routine on the cpus, allowing them to
update themselves: each cpu looks for its own entry in the list and makes
the appropriate updates. Each cpu must do this for itself in order to
handle the calls to vdso_getcpu_init() added in a subsequent patch.
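
The following is a userspace sketch of the list walk that each cpu performs
inside the stop_machine() callback. The structure matches the one added in
the diff below. sketch_cpu_nid[] and the helper names are hypothetical. A
sequential loop over cpus in a caller would only simulate what
stop_machine() runs concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the structure added by this patch. */
struct topology_update_data {
	struct topology_update_data *next;
	unsigned int cpu;
	int old_nid;
	int new_nid;
};

#define SKETCH_NR_CPUS 8
/* Hypothetical stand-in for numa_cpu_lookup_table[] */
static int sketch_cpu_nid[SKETCH_NR_CPUS];

/* Chain the array entries, as arch_update_cpu_topology() does. */
static void link_updates(struct topology_update_data *u, int weight)
{
	for (int i = 0; i < weight; i++)
		u[i].next = (i + 1 < weight) ? &u[i + 1] : NULL;
}

/* Per-cpu work: walk the whole list, apply only this cpu's entry. */
static int apply_updates_for_cpu(unsigned int this_cpu,
				 const struct topology_update_data *list)
{
	int applied = 0;

	for (const struct topology_update_data *u = list; u; u = u->next) {
		if (u->cpu != this_cpu)
			continue;
		sketch_cpu_nid[u->cpu] = u->new_nid;
		applied++;
	}
	return applied;
}
```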

Situations like these are best handled using stop_machine(). Since the NUMA
affinity updates are exceptionally rare events, this approach has the
benefit of not adding any overhead while accessing the NUMA maps during
normal operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   82 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 18 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 15:20:18.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 15:26:54.000000000 -0500
@@ -22,6 +22,7 @@
 #include <linux/pfn.h>
 #include <linux/cpuset.h>
 #include <linux/node.h>
+#include <linux/stop_machine.h>
 #include <asm/sparsemem.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
@@ -1254,6 +1255,13 @@
 
 /* Virtual Processor Home Node (VPHN) support */
 #ifdef CONFIG_PPC_SPLPAR
+struct topology_update_data {
+	struct topology_update_data *next;
+	unsigned int cpu;
+	int old_nid;
+	int new_nid;
+};
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
@@ -1405,41 +1413,79 @@
 }
 
 /*
+ * Update the CPU maps and sysfs entries for a single CPU when its NUMA
+ * characteristics change. This function doesn't perform any locking and is
+ * only safe to call from stop_machine().
+ */
+static int update_cpu_topology(void *data)
+{
+	struct topology_update_data *update;
+	unsigned long cpu;
+
+	if (!data)
+		return -EINVAL;
+
+	cpu = smp_processor_id();
+
+	for (update = data; update; update = update->next) {
+		if (cpu != update->cpu)
+			continue;
+
+		unregister_cpu_under_node(update->cpu, update->old_nid);
+		unmap_cpu_from_node(update->cpu);
+		map_cpu_to_node(update->cpu, update->new_nid);
+		register_cpu_under_node(update->cpu, update->new_nid);
+	}
+
+	return 0;
+}
+
+/*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
  */
 int arch_update_cpu_topology(void)
 {
-	int cpu, nid, old_nid, changed = 0;
+	unsigned int cpu, changed = 0;
+	struct topology_update_data *updates, *ud;
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	struct device *dev;
+	int weight, i = 0;
+
+	weight = cpumask_weight(&cpu_associativity_changes_mask);
+	if (!weight)
+		return 0;
+
+	updates = kzalloc(weight * (sizeof(*updates)), GFP_KERNEL);
+	if (!updates)
+		return 0;
 
 	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
+		ud = &updates[i++];
+		ud->cpu = cpu;
 		vphn_get_associativity(cpu, associativity);
-		nid = associativity_to_nid(associativity);
+		ud->new_nid = associativity_to_nid(associativity);
 
-		if (nid < 0 || !node_online(nid))
-			nid = first_online_node;
+		if (ud->new_nid < 0 || !node_online(ud->new_nid))
+			ud->new_nid = first_online_node;
 
-		old_nid = numa_cpu_lookup_table[cpu];
+		ud->old_nid = numa_cpu_lookup_table[cpu];
 
-		/* Disable hotplug while we update the cpu
-		 * masks and sysfs.
-		 */
-		get_online_cpus();
-		unregister_cpu_under_node(cpu, old_nid);
-		unmap_cpu_from_node(cpu);
-		map_cpu_to_node(cpu, nid);
-		register_cpu_under_node(cpu, nid);
-		put_online_cpus();
+		if (i < weight)
+			ud->next = &updates[i];
+	}
+
+	stop_machine(update_cpu_topology, &updates[0], cpu_online_mask);
 
-		dev = get_cpu_device(cpu);
+	for (ud = &updates[0]; ud; ud = ud->next) {
+		dev = get_cpu_device(ud->cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
-		cpumask_clear_cpu(cpu, &cpu_associativity_changes_mask);
+		cpumask_clear_cpu(ud->cpu, &cpu_associativity_changes_mask);
 		changed = 1;
 	}
 
+	kfree(updates);
 	return changed;
 }
 
@@ -1488,10 +1534,10 @@
 	int rc = NOTIFY_DONE;
 
 	switch (action) {
-	case OF_RECONFIG_ADD_PROPERTY:
 	case OF_RECONFIG_UPDATE_PROPERTY:
 		update = (struct of_prop_reconfig *)data;
-		if (!of_prop_cmp(update->dn->type, "cpu")) {
+		if (!of_prop_cmp(update->dn->type, "cpu") &&
+		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
 			u32 core_id;
 			of_property_read_u32(update->dn, "reg", &core_id);
 			stage_topology_update(core_id);


* [PATCH v4 8/13] Update CPU maps when device tree is updated
From: Nathan Fontenot @ 2013-04-24 16:00 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

Platform events such as partition migration or the new PRRN firmware
feature can cause the NUMA characteristics of a CPU to change, and these
changes will be reflected in the device tree nodes for the affected
CPUs.

This patch registers a handler for Open Firmware device tree updates
and reconfigures the CPU and node maps whenever the associativity
changes. Currently, this is accomplished by marking the affected CPUs in
the cpu_associativity_changes_mask and allowing
arch_update_cpu_topology() to retrieve the new associativity information
using hcall_vphn().
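
The staging and consumption of cpu_associativity_changes_mask can be
modelled with a 64-bit bitmask. This is a sketch under stated assumptions.
SKETCH_THREADS_PER_CORE and the helper names are hypothetical, and
sibling_mask() only approximates cpu_sibling_mask() by assuming that the
threads of a core are numbered consecutively.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SMT width; the kernel uses cpu_sibling_mask() instead. */
#define SKETCH_THREADS_PER_CORE 4

static uint64_t changes_mask;	/* models cpu_associativity_changes_mask */

static uint64_t sibling_mask(unsigned int cpu)
{
	unsigned int first = cpu - cpu % SKETCH_THREADS_PER_CORE;
	uint64_t threads = (UINT64_C(1) << SKETCH_THREADS_PER_CORE) - 1;

	return threads << first;
}

/* Like stage_topology_update(): OR in every thread of the core. */
static void stage_update(unsigned int cpu)
{
	changes_mask |= sibling_mask(cpu);
}

/* Like the consumer loop: clear each cpu's bit once it is handled,
 * instead of clearing the whole mask up front. */
static int consume_updates(void)
{
	int handled = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		if (!(changes_mask & (UINT64_C(1) << cpu)))
			continue;
		/* ...re-read associativity and move the cpu here... */
		changes_mask &= ~(UINT64_C(1) << cpu);
		handled++;
	}
	return handled;
}
```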

Protecting the NUMA cpu maps from concurrent access during an update
operation will be addressed in a subsequent patch in this series.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |    3 
 arch/powerpc/include/asm/prom.h           |    1 
 arch/powerpc/mm/numa.c                    |   99 ++++++++++++++++++++++--------
 arch/powerpc/platforms/pseries/firmware.c |    1 
 4 files changed, 79 insertions(+), 25 deletions(-)

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-04-23 14:46:23.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-04-23 15:20:18.000000000 -0500
@@ -128,6 +128,7 @@
 #define OV5_CMO			0x0480	/* Cooperative Memory Overcommitment */
 #define OV5_XCMO		0x0440	/* Page Coalescing */
 #define OV5_TYPE1_AFFINITY	0x0580	/* Type 1 NUMA affinity */
+#define OV5_PRRN		0x0540	/* Platform Resource Reassignment */
 #define OV5_PFO_HW_RNG		0x0E80	/* PFO Random Number Generator */
 #define OV5_PFO_HW_842		0x0E40	/* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR		0x0E20	/* PFO Encryption Accelerator */
Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 15:19:15.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 15:20:18.000000000 -0500
@@ -1257,7 +1257,8 @@
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
-static void set_topology_timer(void);
+static int prrn_enabled;
+static void reset_topology_timer(void);
 
 /*
  * Store the current values of the associativity change counters in the
@@ -1293,11 +1294,9 @@
  */
 static int update_cpu_associativity_changes_mask(void)
 {
-	int cpu, nr_cpus = 0;
+	int cpu;
 	cpumask_t *changes = &cpu_associativity_changes_mask;
 
-	cpumask_clear(changes);
-
 	for_each_possible_cpu(cpu) {
 		int i, changed = 0;
 		u8 *counts = vphn_cpu_change_counts[cpu];
@@ -1311,11 +1310,10 @@
 		}
 		if (changed) {
 			cpumask_set_cpu(cpu, changes);
-			nr_cpus++;
 		}
 	}
 
-	return nr_cpus;
+	return cpumask_weight(changes);
 }
 
 /*
@@ -1416,7 +1414,7 @@
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	struct device *dev;
 
-	for_each_cpu(cpu,&cpu_associativity_changes_mask) {
+	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
 		vphn_get_associativity(cpu, associativity);
 		nid = associativity_to_nid(associativity);
 
@@ -1438,6 +1436,7 @@
 		dev = get_cpu_device(cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+		cpumask_clear_cpu(cpu, &cpu_associativity_changes_mask);
 		changed = 1;
 	}
 
@@ -1457,37 +1456,80 @@
 
 static void topology_timer_fn(unsigned long ignored)
 {
-	if (!vphn_enabled)
-		return;
-	if (update_cpu_associativity_changes_mask() > 0)
+	if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
 		topology_schedule_update();
-	set_topology_timer();
+	else if (vphn_enabled) {
+		if (update_cpu_associativity_changes_mask() > 0)
+			topology_schedule_update();
+		reset_topology_timer();
+	}
 }
 static struct timer_list topology_timer =
 	TIMER_INITIALIZER(topology_timer_fn, 0, 0);
 
-static void set_topology_timer(void)
+static void reset_topology_timer(void)
 {
 	topology_timer.data = 0;
 	topology_timer.expires = jiffies + 60 * HZ;
-	add_timer(&topology_timer);
+	mod_timer(&topology_timer, topology_timer.expires);
+}
+
+static void stage_topology_update(int core_id)
+{
+	cpumask_or(&cpu_associativity_changes_mask,
+		&cpu_associativity_changes_mask, cpu_sibling_mask(core_id));
+	reset_topology_timer();
 }
 
+static int dt_update_callback(struct notifier_block *nb,
+				unsigned long action, void *data)
+{
+	struct of_prop_reconfig *update;
+	int rc = NOTIFY_DONE;
+
+	switch (action) {
+	case OF_RECONFIG_ADD_PROPERTY:
+	case OF_RECONFIG_UPDATE_PROPERTY:
+		update = (struct of_prop_reconfig *)data;
+		if (!of_prop_cmp(update->dn->type, "cpu")) {
+			u32 core_id;
+			of_property_read_u32(update->dn, "reg", &core_id);
+			stage_topology_update(core_id);
+			rc = NOTIFY_OK;
+		}
+		break;
+	}
+
+	return rc;
+}
+
+static struct notifier_block dt_update_nb = {
+	.notifier_call = dt_update_callback,
+};
+
 /*
- * Start polling for VPHN associativity changes.
+ * Start polling for associativity changes.
  */
 int start_topology_update(void)
 {
 	int rc = 0;
 
-	/* Disabled until races with load balancing are fixed */
-	if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
-	    get_lppaca()->shared_proc) {
-		vphn_enabled = 1;
-		setup_cpu_associativity_change_counters();
-		init_timer_deferrable(&topology_timer);
-		set_topology_timer();
-		rc = 1;
+	if (firmware_has_feature(FW_FEATURE_PRRN)) {
+		if (!prrn_enabled) {
+			prrn_enabled = 1;
+			vphn_enabled = 0;
+			rc = of_reconfig_notifier_register(&dt_update_nb);
+		}
+	} else if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
+		   get_lppaca()->shared_proc) {
+		/* Disabled until races with load balancing are fixed */
+		if (!vphn_enabled) {
+			prrn_enabled = 0;
+			vphn_enabled = 1;
+			setup_cpu_associativity_change_counters();
+			init_timer_deferrable(&topology_timer);
+			reset_topology_timer();
+		}
 	}
 
 	return rc;
@@ -1499,7 +1541,16 @@
  */
 int stop_topology_update(void)
 {
-	vphn_enabled = 0;
-	return del_timer_sync(&topology_timer);
+	int rc = 0;
+
+	if (prrn_enabled) {
+		prrn_enabled = 0;
+		rc = of_reconfig_notifier_unregister(&dt_update_nb);
+	} else if (vphn_enabled) {
+		vphn_enabled = 0;
+		rc = del_timer_sync(&topology_timer);
+	}
+
+	return rc;
 }
 #endif /* CONFIG_PPC_SPLPAR */
Index: powerpc/arch/powerpc/include/asm/firmware.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/firmware.h	2013-04-23 14:46:23.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/firmware.h	2013-04-23 15:20:18.000000000 -0500
@@ -51,6 +51,7 @@
 #define FW_FEATURE_SET_MODE	ASM_CONST(0x0000000040000000)
 #define FW_FEATURE_BEST_ENERGY	ASM_CONST(0x0000000080000000)
 #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
+#define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
 
 #ifndef __ASSEMBLY__
 
@@ -66,7 +67,7 @@
 		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
 		FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
 		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-		FW_FEATURE_TYPE1_AFFINITY,
+		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
Index: powerpc/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 14:56:46.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 15:20:18.000000000 -0500
@@ -110,6 +110,7 @@
 static __initdata struct vec5_fw_feature
 vec5_fw_features_table[] = {
 	{FW_FEATURE_TYPE1_AFFINITY,	OV5_TYPE1_AFFINITY},
+	{FW_FEATURE_PRRN,		OV5_PRRN},
 };
 
 void __init fw_vec5_feature_init(const char *vec5, unsigned long len)


* [PATCH v4 7/13] Update numa.c to use updated firmware_has_feature()
From: Nathan Fontenot @ 2013-04-24 15:58 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

Update the numa code to use the updated firmware_has_feature() when checking
for type 1 affinity.
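
To confirm that the conversion preserves behaviour, this check shows that the
removed open-coded constants (byte 5, bit 0x80) decode to the same position
as OV5_TYPE1_AFFINITY (0x0580) under the OV5_INDX()/OV5_FEAT() encoding
from patch 6/13. old_check() and new_check() are hypothetical helpers
written only for this comparison.

```c
#include <assert.h>

/* Encoding introduced in patch 6/13 */
#define OV5_FEAT(x)		((x) & 0xff)
#define OV5_INDX(x)		((x) >> 8)
#define OV5_TYPE1_AFFINITY	0x0580

/* Open-coded constants removed from find_min_common_depth() */
#define VEC5_AFFINITY_BYTE	5
#define VEC5_AFFINITY		0x80

/* Both spellings test the same byte and bit of ibm,architecture-vec-5. */
static int old_check(const unsigned char *vec5)
{
	return (vec5[VEC5_AFFINITY_BYTE] & VEC5_AFFINITY) != 0;
}

static int new_check(const unsigned char *vec5)
{
	return (vec5[OV5_INDX(OV5_TYPE1_AFFINITY)] &
		OV5_FEAT(OV5_TYPE1_AFFINITY)) != 0;
}
```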

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-23 12:54:23.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-23 15:19:15.000000000 -0500
@@ -291,9 +291,7 @@
 static int __init find_min_common_depth(void)
 {
 	int depth;
-	struct device_node *chosen;
 	struct device_node *root;
-	const char *vec5;
 
 	if (firmware_has_feature(FW_FEATURE_OPAL))
 		root = of_find_node_by_path("/ibm,opal");
@@ -325,24 +323,10 @@
 
 	distance_ref_points_depth /= sizeof(int);
 
-#define VEC5_AFFINITY_BYTE	5
-#define VEC5_AFFINITY		0x80
-
-	if (firmware_has_feature(FW_FEATURE_OPAL))
+	if (firmware_has_feature(FW_FEATURE_OPAL) ||
+	    firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY)) {
+		dbg("Using form 1 affinity\n");
 		form1_affinity = 1;
-	else {
-		chosen = of_find_node_by_path("/chosen");
-		if (chosen) {
-			vec5 = of_get_property(chosen,
-					       "ibm,architecture-vec-5", NULL);
-			if (vec5 && (vec5[VEC5_AFFINITY_BYTE] &
-							VEC5_AFFINITY)) {
-				dbg("Using form 1 affinity\n");
-				form1_affinity = 1;
-			}
-
-			of_node_put(chosen);
-		}
 	}
 
 	if (form1_affinity) {


* [PATCH v4 6/13] Update firmware_has_feature() to check architecture vector 5 bits
From: Nathan Fontenot @ 2013-04-24 15:57 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

The firmware_has_feature() function makes it easy to check for supported
features of the hypervisor. This patch extends the capability of
firmware_has_feature() to include checking for specified bits
in vector 5 of the architecture vector as reported in the device tree.

As part of this, the #defines used for the architecture vector are redefined
so that each option encodes both its byte index into vector 5 and its
feature bit. This makes checking for architecture bits when initializing
the data for firmware_has_feature() much easier.
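
The sketch below illustrates the encoding and the table-driven
initialisation it enables, modelled on fw_vec5_feature_init() in the diff
below. The FW_FEATURE_* and OV5_* values are copied from this series.
fw_features, vec5_table, vec5_feature_init() and has_feature() are
hypothetical simplifications, and the index < len bounds check is an
addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)		(sizeof(a) / sizeof((a)[0]))
#define OV5_FEAT(x)		((x) & 0xff)	/* feature bit within the byte */
#define OV5_INDX(x)		((x) >> 8)	/* byte index into vector 5 */
#define OV5_TYPE1_AFFINITY	0x0580
#define OV5_PRRN		0x0540		/* added in patch 8/13 */

#define FW_FEATURE_TYPE1_AFFINITY	UINT64_C(0x0000000100000000)
#define FW_FEATURE_PRRN			UINT64_C(0x0000000200000000)

static uint64_t fw_features;	/* models powerpc_firmware_features */

static const struct {
	uint64_t	val;
	unsigned int	feature;
} vec5_table[] = {
	{ FW_FEATURE_TYPE1_AFFINITY,	OV5_TYPE1_AFFINITY },
	{ FW_FEATURE_PRRN,		OV5_PRRN },
};

/* Decode each entry into (byte index, bit) and test the property. */
static void vec5_feature_init(const unsigned char *vec5, unsigned long len)
{
	for (size_t i = 0; i < ARRAY_SIZE(vec5_table); i++) {
		unsigned int index = OV5_INDX(vec5_table[i].feature);
		unsigned int feat = OV5_FEAT(vec5_table[i].feature);

		if (index < len && (vec5[index] & feat))
			fw_features |= vec5_table[i].val;
	}
}

/* Simplified firmware_has_feature(): ignores the ALWAYS/POSSIBLE masks. */
static int has_feature(uint64_t feature)
{
	return (fw_features & feature) != 0;
}
```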

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |    4 +-
 arch/powerpc/include/asm/prom.h           |   45 ++++++++++++---------------
 arch/powerpc/kernel/prom_init.c           |   23 ++++++++++----
 arch/powerpc/platforms/pseries/firmware.c |   49 +++++++++++++++++++++++-------
 arch/powerpc/platforms/pseries/pseries.h  |    5 ++-
 arch/powerpc/platforms/pseries/setup.c    |   40 ++++++++++++++++--------
 6 files changed, 111 insertions(+), 55 deletions(-)

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-04-23 14:46:23.000000000 -0500
@@ -110,31 +110,28 @@
 /* Option vector 4: IBM PAPR implementation */
 #define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
 
-/* Option vector 5: PAPR/OF options supported */
-#define OV5_LPAR		0x80	/* logical partitioning supported */
-#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
+/* Option vector 5: PAPR/OF options supported
+ * These bits are also used in firmware_has_feature() to validate
+ * the capabilities reported for vector 5 in the device tree so we
+ * encode the vector index in the define and use the OV5_FEAT()
+ * and OV5_INDX() macros to extract the desired information.
+ */
+#define OV5_FEAT(x)	((x) & 0xff)
+#define OV5_INDX(x)	((x) >> 8)
+#define OV5_LPAR		0x0280	/* logical partitioning supported */
+#define OV5_SPLPAR		0x0240	/* shared-processor LPAR supported */
 /* ibm,dynamic-reconfiguration-memory property supported */
-#define OV5_DRCONF_MEMORY	0x20
-#define OV5_LARGE_PAGES		0x10	/* large pages supported */
-#define OV5_DONATE_DEDICATE_CPU	0x02	/* donate dedicated CPU support */
-/* PCIe/MSI support.  Without MSI full PCIe is not supported */
-#ifdef CONFIG_PCI_MSI
-#define OV5_MSI			0x01	/* PCIe/MSI support */
-#else
-#define OV5_MSI			0x00
-#endif /* CONFIG_PCI_MSI */
-#ifdef CONFIG_PPC_SMLPAR
-#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
-#define OV5_XCMO		0x40	/* Page Coalescing */
-#else
-#define OV5_CMO			0x00
-#define OV5_XCMO		0x00
-#endif
-#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
-#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
-#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS	0x01	/* 1,2,or 4 Sub-Processors supported */
+#define OV5_DRCONF_MEMORY	0x0220
+#define OV5_LARGE_PAGES		0x0210	/* large pages supported */
+#define OV5_DONATE_DEDICATE_CPU	0x0202	/* donate dedicated CPU support */
+#define OV5_MSI			0x0201	/* PCIe/MSI support */
+#define OV5_CMO			0x0480	/* Cooperative Memory Overcommitment */
+#define OV5_XCMO		0x0440	/* Page Coalescing */
+#define OV5_TYPE1_AFFINITY	0x0580	/* Type 1 NUMA affinity */
+#define OV5_PFO_HW_RNG		0x0E80	/* PFO Random Number Generator */
+#define OV5_PFO_HW_842		0x0E40	/* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR		0x0E20	/* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS	0x0F01	/* 1,2,or 4 Sub-Processors supported */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX		0x02	/* Linux is our OS */
Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-04-23 14:46:23.000000000 -0500
@@ -684,11 +684,21 @@
 	/* option vector 5: PAPR/OF options */
 	19 - 2,				/* length */
 	0,				/* don't ignore, don't halt */
-	OV5_LPAR | OV5_SPLPAR | OV5_LARGE_PAGES | OV5_DRCONF_MEMORY |
-	OV5_DONATE_DEDICATE_CPU | OV5_MSI,
+	OV5_FEAT(OV5_LPAR) | OV5_FEAT(OV5_SPLPAR) | OV5_FEAT(OV5_LARGE_PAGES) |
+	OV5_FEAT(OV5_DRCONF_MEMORY) | OV5_FEAT(OV5_DONATE_DEDICATE_CPU) |
+#ifdef CONFIG_PCI_MSI
+	/* PCIe/MSI support.  Without MSI full PCIe is not supported */
+	OV5_FEAT(OV5_MSI),
+#else
+	0,
+#endif
+	0,
+#ifdef CONFIG_PPC_SMLPAR
+	OV5_FEAT(OV5_CMO) | OV5_FEAT(OV5_XCMO),
+#else
 	0,
-	OV5_CMO | OV5_XCMO,
-	OV5_TYPE1_AFFINITY,
+#endif
+	OV5_FEAT(OV5_TYPE1_AFFINITY),
 	0,
 	0,
 	0,
@@ -702,8 +712,9 @@
 	0,
 	0,
 	0,
-	OV5_PFO_HW_RNG | OV5_PFO_HW_ENCR | OV5_PFO_HW_842,
-	OV5_SUB_PROCESSORS,
+	OV5_FEAT(OV5_PFO_HW_RNG) | OV5_FEAT(OV5_PFO_HW_ENCR) |
+	OV5_FEAT(OV5_PFO_HW_842),
+	OV5_FEAT(OV5_SUB_PROCESSORS),
 	/* option vector 6: IBM PAPR hints */
 	4 - 2,				/* length */
 	0,
Index: powerpc/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/setup.c	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/setup.c	2013-04-23 14:46:23.000000000 -0500
@@ -628,25 +628,39 @@
  * Called very early, MMU is off, device-tree isn't unflattened
  */
 
-static int __init pSeries_probe_hypertas(unsigned long node,
-					 const char *uname, int depth,
-					 void *data)
+static int __init pseries_probe_fw_features(unsigned long node,
+					    const char *uname, int depth,
+					    void *data)
 {
-	const char *hypertas;
+	const char *prop;
 	unsigned long len;
+	static int hypertas_found;
+	static int vec5_found;
 
-	if (depth != 1 ||
-	    (strcmp(uname, "rtas") != 0 && strcmp(uname, "rtas@0") != 0))
+	if (depth != 1)
 		return 0;
 
-	hypertas = of_get_flat_dt_prop(node, "ibm,hypertas-functions", &len);
-	if (!hypertas)
-		return 1;
+	if (!strcmp(uname, "rtas") || !strcmp(uname, "rtas@0")) {
+		prop = of_get_flat_dt_prop(node, "ibm,hypertas-functions",
+					   &len);
+		if (prop) {
+			powerpc_firmware_features |= FW_FEATURE_LPAR;
+			fw_hypertas_feature_init(prop, len);
+		}
+
+		hypertas_found = 1;
+	}
+
+	if (!strcmp(uname, "chosen")) {
+		prop = of_get_flat_dt_prop(node, "ibm,architecture-vec-5",
+					   &len);
+		if (prop)
+			fw_vec5_feature_init(prop, len);
 
-	powerpc_firmware_features |= FW_FEATURE_LPAR;
-	fw_feature_init(hypertas, len);
+		vec5_found = 1;
+	}
 
-	return 1;
+	return hypertas_found && vec5_found;
 }
 
 static int __init pSeries_probe(void)
@@ -669,7 +683,7 @@
 	pr_debug("pSeries detected, looking for LPAR capability...\n");
 
 	/* Now try to figure out if we are running on LPAR */
-	of_scan_flat_dt(pSeries_probe_hypertas, NULL);
+	of_scan_flat_dt(pseries_probe_fw_features, NULL);
 
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		hpte_init_lpar();
Index: powerpc/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 14:46:10.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 14:56:46.000000000 -0500
@@ -28,18 +28,18 @@
 
 #include "pseries.h"
 
-typedef struct {
+struct hypertas_fw_feature {
     unsigned long val;
     char * name;
-} firmware_feature_t;
+};
 
 /*
  * The names in this table match names in rtas/ibm,hypertas-functions.  If the
  * entry ends in a '*', only upto the '*' is matched.  Otherwise the entire
  * string must match.
  */
-static __initdata firmware_feature_t
-firmware_features_table[] = {
+static __initdata struct hypertas_fw_feature
+hypertas_fw_features_table[] = {
 	{FW_FEATURE_PFT,		"hcall-pft"},
 	{FW_FEATURE_TCE,		"hcall-tce"},
 	{FW_FEATURE_SPRG0,		"hcall-sprg0"},
@@ -69,16 +69,16 @@
  * device-tree/ibm,hypertas-functions.  Ultimately this functionality may
  * be moved into prom.c prom_init().
  */
-void __init fw_feature_init(const char *hypertas, unsigned long len)
+void __init fw_hypertas_feature_init(const char *hypertas, unsigned long len)
 {
 	const char *s;
 	int i;
 
-	pr_debug(" -> fw_feature_init()\n");
+	pr_debug(" -> fw_hypertas_feature_init()\n");
 
 	for (s = hypertas; s < hypertas + len; s += strlen(s) + 1) {
-		for (i = 0; i < ARRAY_SIZE(firmware_features_table); i++) {
-			const char *name = firmware_features_table[i].name;
+		for (i = 0; i < ARRAY_SIZE(hypertas_fw_features_table); i++) {
+			const char *name = hypertas_fw_features_table[i].name;
 			size_t size;
 
 			/*
@@ -94,10 +94,39 @@
 
 			/* we have a match */
 			powerpc_firmware_features |=
-				firmware_features_table[i].val;
+				hypertas_fw_features_table[i].val;
 			break;
 		}
 	}
 
-	pr_debug(" <- fw_feature_init()\n");
+	pr_debug(" <- fw_hypertas_feature_init()\n");
+}
+
+struct vec5_fw_feature {
+	unsigned long	val;
+	unsigned int	feature;
+};
+
+static __initdata struct vec5_fw_feature
+vec5_fw_features_table[] = {
+	{FW_FEATURE_TYPE1_AFFINITY,	OV5_TYPE1_AFFINITY},
+};
+
+void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
+{
+	unsigned int index, feat;
+	int i;
+
+	pr_debug(" -> fw_vec5_feature_init()\n");
+
+	for (i = 0; i < ARRAY_SIZE(vec5_fw_features_table); i++) {
+		index = OV5_INDX(vec5_fw_features_table[i].feature);
+		feat = OV5_FEAT(vec5_fw_features_table[i].feature);
+
+		if (index < len && (vec5[index] & feat))
+			powerpc_firmware_features |=
+				vec5_fw_features_table[i].val;
+	}
+
+	pr_debug(" <- fw_vec5_feature_init()\n");
 }
Index: powerpc/arch/powerpc/include/asm/firmware.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/firmware.h	2013-04-23 14:19:54.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/firmware.h	2013-04-23 14:46:23.000000000 -0500
@@ -50,6 +50,7 @@
 #define FW_FEATURE_OPALv2	ASM_CONST(0x0000000020000000)
 #define FW_FEATURE_SET_MODE	ASM_CONST(0x0000000040000000)
 #define FW_FEATURE_BEST_ENERGY	ASM_CONST(0x0000000080000000)
+#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
 
 #ifndef __ASSEMBLY__
 
@@ -64,7 +65,8 @@
 		FW_FEATURE_BULK_REMOVE | FW_FEATURE_XDABR |
 		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
 		FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
-		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY,
+		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
+		FW_FEATURE_TYPE1_AFFINITY,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
Index: powerpc/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/pseries.h	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/pseries.h	2013-04-23 14:46:23.000000000 -0500
@@ -19,7 +19,10 @@
 
 #include <linux/of.h>
 
-extern void __init fw_feature_init(const char *hypertas, unsigned long len);
+extern void __init fw_hypertas_feature_init(const char *hypertas,
+					    unsigned long len);
+extern void __init fw_vec5_feature_init(const char *hypertas,
+					unsigned long len);
 
 struct pt_regs;
 


* [PATCH v4 5/13] Use ARRAY_SIZE to iterate over firmware_features_table array
From: Nathan Fontenot @ 2013-04-24 15:55 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

When iterating over the entries in firmware_features_table, we only need
to visit the entries actually present in the array. There is no need to
declare it with FIRMWARE_MAX_FEATURES slots and check each slot for a valid
(non-NULL) name.

This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the
array looping with the use of ARRAY_SIZE().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |    1 -
 arch/powerpc/platforms/pseries/firmware.c |    8 +++-----
 2 files changed, 3 insertions(+), 6 deletions(-)

Index: powerpc/arch/powerpc/include/asm/firmware.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/firmware.h	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/firmware.h	2013-04-23 14:19:54.000000000 -0500
@@ -18,7 +18,6 @@
 #include <asm/feature-fixups.h>
 
 /* firmware feature bitmask values */
-#define FIRMWARE_MAX_FEATURES 63
 
 #define FW_FEATURE_PFT		ASM_CONST(0x0000000000000001)
 #define FW_FEATURE_TCE		ASM_CONST(0x0000000000000002)
Index: powerpc/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 14:17:16.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/firmware.c	2013-04-23 14:46:10.000000000 -0500
@@ -39,7 +39,7 @@
  * string must match.
  */
 static __initdata firmware_feature_t
-firmware_features_table[FIRMWARE_MAX_FEATURES] = {
+firmware_features_table[] = {
 	{FW_FEATURE_PFT,		"hcall-pft"},
 	{FW_FEATURE_TCE,		"hcall-tce"},
 	{FW_FEATURE_SPRG0,		"hcall-sprg0"},
@@ -77,12 +77,10 @@
 	pr_debug(" -> fw_feature_init()\n");
 
 	for (s = hypertas; s < hypertas + len; s += strlen(s) + 1) {
-		for (i = 0; i < FIRMWARE_MAX_FEATURES; i++) {
+		for (i = 0; i < ARRAY_SIZE(firmware_features_table); i++) {
 			const char *name = firmware_features_table[i].name;
 			size_t size;
-			/* check value against table of strings */
-			if (!name)
-				continue;
+
 			/*
 			 * If there is a '*' at the end of name, only check
 			 * upto there


* [PATCH v4 4/13] Move architecture vector definitions to prom.h
From: Nathan Fontenot @ 2013-04-24 15:53 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

As part of handling PRRN events, we need to check the option vector 5 bits
of the architecture vector reported in the device tree to ensure PRRN event
handling is enabled. To do this, a subsequent patch updates
firmware_has_feature() to also check vector 5 bits. To avoid re-defining
the architecture vector bits there, their definitions are moved to
prom.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

---
 arch/powerpc/include/asm/prom.h |   71 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/prom_init.c |   75 +++-------------------------------------
 2 files changed, 77 insertions(+), 69 deletions(-)

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-04-23 12:54:23.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-04-23 13:56:15.000000000 -0500
@@ -74,6 +74,77 @@
 #define DRCONF_MEM_AI_INVALID	0x00000040
 #define DRCONF_MEM_RESERVED	0x00000080
 
+/*
+ * There are two methods for telling firmware what our capabilities are.
+ * Newer machines have an "ibm,client-architecture-support" method on the
+ * root node.  For older machines, we have to call the "process-elf-header"
+ * method in the /packages/elf-loader node, passing it a fake 32-bit
+ * ELF header containing a couple of PT_NOTE sections that contain
+ * structures that contain various information.
+ */
+
+/* New method - extensible architecture description vector. */
+
+/* Option vector bits - generic bits in byte 1 */
+#define OV_IGNORE		0x80	/* ignore this vector */
+#define OV_CESSATION_POLICY	0x40	/* halt if unsupported option present*/
+
+/* Option vector 1: processor architectures supported */
+#define OV1_PPC_2_00		0x80	/* set if we support PowerPC 2.00 */
+#define OV1_PPC_2_01		0x40	/* set if we support PowerPC 2.01 */
+#define OV1_PPC_2_02		0x20	/* set if we support PowerPC 2.02 */
+#define OV1_PPC_2_03		0x10	/* set if we support PowerPC 2.03 */
+#define OV1_PPC_2_04		0x08	/* set if we support PowerPC 2.04 */
+#define OV1_PPC_2_05		0x04	/* set if we support PowerPC 2.05 */
+#define OV1_PPC_2_06		0x02	/* set if we support PowerPC 2.06 */
+#define OV1_PPC_2_07		0x01	/* set if we support PowerPC 2.07 */
+
+/* Option vector 2: Open Firmware options supported */
+#define OV2_REAL_MODE		0x20	/* set if we want OF in real mode */
+
+/* Option vector 3: processor options supported */
+#define OV3_FP			0x80	/* floating point */
+#define OV3_VMX			0x40	/* VMX/Altivec */
+#define OV3_DFP			0x20	/* decimal FP */
+
+/* Option vector 4: IBM PAPR implementation */
+#define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
+
+/* Option vector 5: PAPR/OF options supported */
+#define OV5_LPAR		0x80	/* logical partitioning supported */
+#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
+/* ibm,dynamic-reconfiguration-memory property supported */
+#define OV5_DRCONF_MEMORY	0x20
+#define OV5_LARGE_PAGES		0x10	/* large pages supported */
+#define OV5_DONATE_DEDICATE_CPU	0x02	/* donate dedicated CPU support */
+/* PCIe/MSI support.  Without MSI full PCIe is not supported */
+#ifdef CONFIG_PCI_MSI
+#define OV5_MSI			0x01	/* PCIe/MSI support */
+#else
+#define OV5_MSI			0x00
+#endif /* CONFIG_PCI_MSI */
+#ifdef CONFIG_PPC_SMLPAR
+#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
+#define OV5_XCMO		0x40	/* Page Coalescing */
+#else
+#define OV5_CMO			0x00
+#define OV5_XCMO		0x00
+#endif
+#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
+#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
+#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS	0x01	/* 1,2,or 4 Sub-Processors supported */
+
+/* Option Vector 6: IBM PAPR hints */
+#define OV6_LINUX		0x02	/* Linux is our OS */
+
+/*
+ * The architecture vector has an array of PVR mask/value pairs,
+ * followed by # option vectors - 1, followed by the option vectors.
+ */
+extern unsigned char ibm_architecture_vec[];
+
 /* These includes are put at the bottom because they may contain things
  * that are overridden by this file.  Ideally they shouldn't be included
  * by this file, but there are a bunch of .c files that currently depend
Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-04-23 12:54:23.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-04-23 13:55:21.000000000 -0500
@@ -627,16 +627,11 @@
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
- * There are two methods for telling firmware what our capabilities are.
- * Newer machines have an "ibm,client-architecture-support" method on the
- * root node.  For older machines, we have to call the "process-elf-header"
- * method in the /packages/elf-loader node, passing it a fake 32-bit
- * ELF header containing a couple of PT_NOTE sections that contain
- * structures that contain various information.
- */
-
-/*
- * New method - extensible architecture description vector.
+ * The architecture vector has an array of PVR mask/value pairs,
+ * followed by # option vectors - 1, followed by the option vectors.
+ *
+ * See prom.h for the definition of the bits specified in the
+ * architecture vector.
  *
  * Because the description vector contains a mix of byte and word
  * values, we declare it as an unsigned char array, and use this
@@ -645,65 +640,7 @@
 #define W(x)	((x) >> 24) & 0xff, ((x) >> 16) & 0xff, \
 		((x) >> 8) & 0xff, (x) & 0xff
 
-/* Option vector bits - generic bits in byte 1 */
-#define OV_IGNORE		0x80	/* ignore this vector */
-#define OV_CESSATION_POLICY	0x40	/* halt if unsupported option present*/
-
-/* Option vector 1: processor architectures supported */
-#define OV1_PPC_2_00		0x80	/* set if we support PowerPC 2.00 */
-#define OV1_PPC_2_01		0x40	/* set if we support PowerPC 2.01 */
-#define OV1_PPC_2_02		0x20	/* set if we support PowerPC 2.02 */
-#define OV1_PPC_2_03		0x10	/* set if we support PowerPC 2.03 */
-#define OV1_PPC_2_04		0x08	/* set if we support PowerPC 2.04 */
-#define OV1_PPC_2_05		0x04	/* set if we support PowerPC 2.05 */
-#define OV1_PPC_2_06		0x02	/* set if we support PowerPC 2.06 */
-#define OV1_PPC_2_07		0x01	/* set if we support PowerPC 2.07 */
-
-/* Option vector 2: Open Firmware options supported */
-#define OV2_REAL_MODE		0x20	/* set if we want OF in real mode */
-
-/* Option vector 3: processor options supported */
-#define OV3_FP			0x80	/* floating point */
-#define OV3_VMX			0x40	/* VMX/Altivec */
-#define OV3_DFP			0x20	/* decimal FP */
-
-/* Option vector 4: IBM PAPR implementation */
-#define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
-
-/* Option vector 5: PAPR/OF options supported */
-#define OV5_LPAR		0x80	/* logical partitioning supported */
-#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
-/* ibm,dynamic-reconfiguration-memory property supported */
-#define OV5_DRCONF_MEMORY	0x20
-#define OV5_LARGE_PAGES		0x10	/* large pages supported */
-#define OV5_DONATE_DEDICATE_CPU 0x02	/* donate dedicated CPU support */
-/* PCIe/MSI support.  Without MSI full PCIe is not supported */
-#ifdef CONFIG_PCI_MSI
-#define OV5_MSI			0x01	/* PCIe/MSI support */
-#else
-#define OV5_MSI			0x00
-#endif /* CONFIG_PCI_MSI */
-#ifdef CONFIG_PPC_SMLPAR
-#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
-#define OV5_XCMO			0x40	/* Page Coalescing */
-#else
-#define OV5_CMO			0x00
-#define OV5_XCMO			0x00
-#endif
-#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
-#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
-#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS	0x01    /* 1,2,or 4 Sub-Processors supported */
-
-/* Option Vector 6: IBM PAPR hints */
-#define OV6_LINUX		0x02	/* Linux is our OS */
-
-/*
- * The architecture vector has an array of PVR mask/value pairs,
- * followed by # option vectors - 1, followed by the option vectors.
- */
-static unsigned char ibm_architecture_vec[] = {
+unsigned char ibm_architecture_vec[] = {
 	W(0xfffe0000), W(0x003a0000),	/* POWER5/POWER5+ */
 	W(0xffff0000), W(0x003e0000),	/* POWER6 */
 	W(0xffff0000), W(0x003f0000),	/* POWER7 */


* [PATCH v4 2/13] Correct buffer parsing in update_dt_node()
From: Nathan Fontenot @ 2013-04-24 15:49 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

Correct parsing of the buffer returned from ibm,update-properties. The first
element is the length and path of the node being updated. It is formatted
differently from the property entries that follow, so it needs to be handled
specifically.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: powerpc/arch/powerpc/platforms/pseries/mobility.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/mobility.c	2013-04-23 13:22:05.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/mobility.c	2013-04-23 13:39:36.000000000 -0500
@@ -134,6 +134,7 @@
 	char *prop_data;
 	char *rtas_buf;
 	int update_properties_token;
+	u32 vd;
 
 	update_properties_token = rtas_token("ibm,update-properties");
 	if (update_properties_token == RTAS_UNKNOWN_SERVICE)
@@ -160,13 +161,24 @@
 
 		prop_data = rtas_buf + sizeof(*upwa);
 
-		for (i = 0; i < upwa->nprops; i++) {
+		/* The first element of the buffer is the path of the node
+		 * being updated in the form of a 8 byte string length
+		 * followed by the string. Skip past this to get to the
+		 * properties being updated.
+		 */
+		vd = *prop_data++;
+		prop_data += vd;
+
+		/* The path we skipped over is counted as one of the elements
+		 * returned so start counting at one.
+		 */
+		for (i = 1; i < upwa->nprops; i++) {
 			char *prop_name;
-			u32 vd;
 
-			prop_name = prop_data + 1;
+			prop_name = prop_data;
 			prop_data += strlen(prop_name) + 1;
-			vd = *prop_data++;
+			vd = *(u32 *)prop_data;
+			prop_data += sizeof(vd);
 
 			switch (vd) {
 			case 0x00000000:


* [PATCH v4 3/13] Add PRRN RTAS event handler
From: Nathan Fontenot @ 2013-04-24 15:51 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

A PRRN event is signaled via the RTAS event-scan mechanism, which
returns a Hot Plug Event message "fixed part" indicating "Platform
Resource Reassignment". In response to the Hot Plug Event message,
we must call ibm,update-nodes to determine which resources were
reassigned and then ibm,update-properties to obtain the new affinity
information about those resources.

The PRRN event-scan RTAS message contains only the "fixed part" with
the "Type" field set to the value 160 and no Extended Event Log. The
four-byte Extended Event Log Length field is re-purposed (since no
Extended Event Log message is included) to pass the "scope" parameter
that causes the ibm,update-nodes to return the nodes affected by the
specific resource reassignment.

This patch adds a handler for RTAS events. The function
pseries_devicetree_update() (from mobility.c) is used to make the
ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
(handled by a subsequent patch) will require significant processing,
so pseries_devicetree_update() is called from an asynchronous workqueue
to allow event processing to continue.

PRRN RTAS events on pseries systems are rare events that have to be
initiated from the system's HMC (Hardware Management Console) by an IBM
technician. This allows us to assume that these events are widely spaced.
Additionally, any pending PRRN work is flushed before new work is
scheduled, to ensure that only one event is being handled at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |    2 +
 arch/powerpc/kernel/rtasd.c     |   46 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

Index: powerpc/arch/powerpc/include/asm/rtas.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/rtas.h	2013-04-23 13:22:37.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/rtas.h	2013-04-23 13:40:36.000000000 -0500
@@ -143,6 +143,8 @@
 #define RTAS_TYPE_PMGM_TIME_ALARM	0x6f
 #define RTAS_TYPE_PMGM_CONFIG_CHANGE	0x70
 #define RTAS_TYPE_PMGM_SERVICE_PROC	0x71
+/* Platform Resource Reassignment Notification */
+#define RTAS_TYPE_PRRN			0xA0

 /* RTAS check-exception vector offset */
 #define RTAS_VECTOR_EXTERNAL_INTERRUPT	0x500
Index: powerpc/arch/powerpc/kernel/rtasd.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-04-23 12:54:23.000000000 -0500
+++ powerpc/arch/powerpc/kernel/rtasd.c	2013-04-23 13:52:08.000000000 -0500
@@ -87,6 +87,8 @@
 			return "Resource Deallocation Event";
 		case RTAS_TYPE_DUMP:
 			return "Dump Notification Event";
+		case RTAS_TYPE_PRRN:
+			return "Platform Resource Reassignment Event";
 	}

 	return rtas_type[0];
@@ -265,9 +267,49 @@
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		return;
 	}
+}
+
+#ifdef CONFIG_PPC_PSERIES
+static s32 prrn_update_scope;
+
+static void prrn_work_fn(struct work_struct *work)
+{
+	/*
+	 * For PRRN, we must pass the negative of the scope value in
+	 * the RTAS event.
+	 */
+	pseries_devicetree_update(-prrn_update_scope);
+}
+
+static DECLARE_WORK(prrn_work, prrn_work_fn);
+
+void prrn_schedule_update(u32 scope)
+{
+	flush_work(&prrn_work);
+	prrn_update_scope = scope;
+	schedule_work(&prrn_work);
+}
+
+static void handle_rtas_event(const struct rtas_error_log *log)
+{
+	if (log->type == RTAS_TYPE_PRRN)
+		/* For PRRN Events the extended log length is used to denote
+		 * the scope for calling rtas update-nodes.
+		 */
+		prrn_schedule_update(log->extended_log_length);
+
+	return;
+}
+
+#else

+static void handle_rtas_event(const struct rtas_error_log *log)
+{
+	return;
 }

+#endif
+
 static int rtas_log_open(struct inode * inode, struct file * file)
 {
 	return 0;
@@ -388,8 +430,10 @@
 			break;
 		}

-		if (error == 0)
+		if (error == 0) {
 			pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0);
+			handle_rtas_event((struct rtas_error_log *)logdata);
+		}

 	} while(error == 0);
 }


* [PATCH v4 1/13] Expose pseries devicetree_update()
From: Nathan Fontenot @ 2013-04-24 15:47 UTC
  To: linuxppc-dev
In-Reply-To: <5177FB74.2050709@linux.vnet.ibm.com>

Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

When rtasd receives a PRRN event, it needs to make a series of RTAS
calls (ibm,update-nodes and ibm,update-properties) to retrieve the
updated device tree information. These calls are already handled in the
pseries_devicetree_update() routine used in partition migration.

This patch exposes pseries_devicetree_update() so that other pseries
routines can call it. It also changes pseries_devicetree_update() to take
a 32-bit scope parameter. The scope value, previously hard-coded to 1 for
partition migration, is passed to the ibm,update-nodes and
ibm,update-properties RTAS calls that update the device tree.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h           |    4 ++++
 arch/powerpc/platforms/pseries/mobility.c |   21 ++++++++++++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

Index: powerpc/arch/powerpc/platforms/pseries/mobility.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/mobility.c	2013-04-15 09:18:10.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/mobility.c	2013-04-23 13:22:05.000000000 -0500
@@ -37,14 +37,16 @@
 #define UPDATE_DT_NODE	0x02000000
 #define ADD_DT_NODE	0x03000000
 
-static int mobility_rtas_call(int token, char *buf)
+#define MIGRATION_SCOPE	(1)
+
+static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
 	int rc;
 
 	spin_lock(&rtas_data_buf_lock);
 
 	memcpy(rtas_data_buf, buf, RTAS_DATA_BUF_SIZE);
-	rc = rtas_call(token, 2, 1, NULL, rtas_data_buf, 1);
+	rc = rtas_call(token, 2, 1, NULL, rtas_data_buf, scope);
 	memcpy(buf, rtas_data_buf, RTAS_DATA_BUF_SIZE);
 
 	spin_unlock(&rtas_data_buf_lock);
@@ -123,7 +125,7 @@
 	return 0;
 }
 
-static int update_dt_node(u32 phandle)
+static int update_dt_node(u32 phandle, s32 scope)
 {
 	struct update_props_workarea *upwa;
 	struct device_node *dn;
@@ -151,7 +153,8 @@
 	upwa->phandle = phandle;
 
 	do {
-		rc = mobility_rtas_call(update_properties_token, rtas_buf);
+		rc = mobility_rtas_call(update_properties_token, rtas_buf,
+					scope);
 		if (rc < 0)
 			break;
 
@@ -219,7 +222,7 @@
 	return rc;
 }
 
-static int pseries_devicetree_update(void)
+int pseries_devicetree_update(s32 scope)
 {
 	char *rtas_buf;
 	u32 *data;
@@ -235,7 +238,7 @@
 		return -ENOMEM;
 
 	do {
-		rc = mobility_rtas_call(update_nodes_token, rtas_buf);
+		rc = mobility_rtas_call(update_nodes_token, rtas_buf, scope);
 		if (rc && rc != 1)
 			break;
 
@@ -256,7 +259,7 @@
 					delete_dt_node(phandle);
 					break;
 				case UPDATE_DT_NODE:
-					update_dt_node(phandle);
+					update_dt_node(phandle, scope);
 					break;
 				case ADD_DT_NODE:
 					drc_index = *data++;
@@ -276,7 +279,7 @@
 	int rc;
 	int activate_fw_token;
 
-	rc = pseries_devicetree_update();
+	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc) {
 		printk(KERN_ERR "Initial post-mobility device tree update "
 		       "failed: %d\n", rc);
@@ -292,7 +295,7 @@
 
 	rc = rtas_call(activate_fw_token, 0, 1, NULL);
 	if (!rc) {
-		rc = pseries_devicetree_update();
+		rc = pseries_devicetree_update(MIGRATION_SCOPE);
 		if (rc)
 			printk(KERN_ERR "Secondary post-mobility device tree "
 			       "update failed: %d\n", rc);
Index: powerpc/arch/powerpc/include/asm/rtas.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/rtas.h	2013-04-15 09:18:10.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/rtas.h	2013-04-23 13:22:37.000000000 -0500
@@ -277,6 +277,10 @@
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
+#ifdef CONFIG_PPC_PSERIES
+extern int pseries_devicetree_update(s32 scope);
+#endif
+
 #ifdef CONFIG_PPC_RTAS_DAEMON
 extern void rtas_cancel_event_scan(void);
 #else


* [PATCH v4 0/13] NUMA CPU Reconfiguration using PRRN
From: Nathan Fontenot @ 2013-04-24 15:34 UTC
  To: linuxppc-dev

Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

PRRN events are RTAS events delivered through the event-scan mechanism on
Power. When one of these events is received, the system can obtain the
updated device tree affinity information for the affected CPUs/memory via
the RTAS ibm,update-nodes and ibm,update-properties calls. This information
is then used to update the NUMA affinity of the CPUs/memory in the kernel.

This patch set adds the ability to recognize PRRN events, update the device
tree and kernel information for CPUs (memory will be handled in a later
patch), and add an interface to enable/disable topology updates from /proc.

Additionally, these updates solve an existing problem with the VPHN (Virtual
Processor Home Node) capability and allow us to re-enable this feature.

Nathan Fontenot

 arch/powerpc/include/asm/firmware.h               |    7 
 arch/powerpc/include/asm/prom.h                   |   46 ++--
 arch/powerpc/include/asm/rtas.h                   |    2 
 arch/powerpc/kernel/prom_init.c                   |   98 ++--------
 arch/powerpc/kernel/rtasd.c                       |   46 ++++
 arch/powerpc/mm/numa.c                            |  214 +++++++++++++++-------
 arch/powerpc/platforms/pseries/firmware.c         |   50 ++++-
 arch/powerpc/platforms/pseries/mobility.c         |   21 +-
 powerpc/arch/powerpc/include/asm/firmware.h       |    1 
 powerpc/arch/powerpc/include/asm/prom.h           |   71 +++++++
 powerpc/arch/powerpc/include/asm/rtas.h           |    4 
 powerpc/arch/powerpc/include/asm/topology.h       |    5 
 powerpc/arch/powerpc/kernel/prom_init.c           |    2 
 powerpc/arch/powerpc/kernel/rtasd.c               |    7 
 powerpc/arch/powerpc/mm/numa.c                    |   62 ++++++
 powerpc/arch/powerpc/platforms/pseries/firmware.c |    8 
 powerpc/arch/powerpc/platforms/pseries/mobility.c |   20 +-
 powerpc/arch/powerpc/platforms/pseries/pseries.h  |    5 
 powerpc/arch/powerpc/platforms/pseries/setup.c    |   40 ++--
 19 files changed, 496 insertions(+), 213 deletions(-)

Updates for v4 of the patchset:
------------------------------
1/13 - Remove the hook in ppc_md for updating the device tree.

3/13 - Put the rtas code to handle PRRN events in #ifdef CONFIG_PPC_PSERIES

4/13 - New patch. Update the iteration over arrays in firmware.c to use
ARRAY_SIZE()

5/13 (was 4/12) - Remove the unnecessary #ifdef

6/13 (was 5/12) - Removed the references to platform_has_feature() and
updated the firmware.c changes to use ARRAY_SIZE() for iteration.

8/13 (was 7/12) - Correct subject.

13/13 (was 12/12) - Remove inlining of prrn_is_enabled().

Updates for v3 of the patchset:
------------------------------
1/12 - Updated to use a ppc_md interface to invoke device tree updates, this
corrects the build break previously seen in patch 2/12 for non-pseries
platforms.

2/12 - New patch in the series to correct the parsing of the buffer returned
from ibm,update-properties rtas call.

5/12 - The parsing of architecture vector 5 has been made more efficient.

7/12 - Correct the #define used in the call to firmware_has_feature()

8/12 - Updated calling of stop_machine() to only call it once per PRRN event.

12/12 - Added inclusion of topology.h to rtasd.c to correct a build failure
on non-pseries platforms.


* Re: [PATCH -V6 18/27] mm/THP: withdraw the pgtable after pmdp related operations
From: Andrea Arcangeli @ 2013-04-24 15:14 UTC
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, David Gibson
In-Reply-To: <87a9oo2t5a.fsf@linux.vnet.ibm.com>

Hi,

On Wed, Apr 24, 2013 at 02:38:01PM +0530, Aneesh Kumar K.V wrote:
> From 7444a5eda33c00eea465b51c405cb830c57513b7 Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Date: Wed, 6 Mar 2013 12:50:37 +0530
> Subject: [PATCH] mm/THP: withdraw the pgtable after pmdp related operations
> 
> For architectures like ppc64 we look at deposited pgtable when
> calling pmdp_get_and_clear. So do the pgtable_trans_huge_withdraw
> after finishing pmdp related operations.
> 
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/huge_memory.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 84f3180..21c5ebd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1363,9 +1363,15 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		struct page *page;
>  		pgtable_t pgtable;
>  		pmd_t orig_pmd;
> -		pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
> +		/*
> +		 * For architectures like ppc64 we look at deposited pgtable
> +		 * when calling pmdp_get_and_clear. So do the
> +		 * pgtable_trans_huge_withdraw after finishing pmdp related
> +		 * operations.
> +		 */
>  		orig_pmd = pmdp_get_and_clear(tlb->mm, addr, pmd);
>  		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
> +		pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);

So I assume you're going to check the pmdp pointer address in
_withdraw, as the *pmd content is already clear. And that you're
checking the deposited pmd earlier in pmdp_get_and_clear. A bit
strange overall not seeing how exactly you're using the new parameter
and the deposited pmds, but safe.

Thanks,
Andrea

