devicetree.vger.kernel.org archive mirror
* [PATCH v6 00/42] powerpc/powernv: PCI hotplug support
@ 2015-08-06  4:11 Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 01/42] PCI: Add pcibios_setup_bridge() Gavin Shan
                   ` (36 more replies)
  0 siblings, 37 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This series of patches adds support for PCI slots on the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The patchset requires
corresponding skiboot changes, which have been sent to skiboot@lists.ozlabs.org
for review. skiboot exposes PCI slots through device-node properties, and the
kernel uses those properties to populate the PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support hotplug
because PEs are assigned at PHB fixup time, which happens only once during
system boot. For this reason, the PCI infrastructure on the PowerNV platform
has been substantially reworked. After the rework, a PE and its resources
(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned when the PCI
bridge's resources are updated, which may decide the PE# assigned to the PE
(e.g. M64 resources, strictly speaking on P8). Each PE maintains a reference
count, which is (number of child PCI devices + 1). When the last child PCI
device leaves the PE, the PE and its resources are released and put back into
the free pool. With this design, the PE is released when the EEH PE is
released. PATCH[1 - 23] are related to this part.
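
A minimal sketch of the reference-counting rule described above, in plain C.
All names (struct sketch_pe, sketch_pe_create(), sketch_pe_get(),
sketch_pe_put()) are hypothetical and not taken from the series. The sketch
also assumes that the extra "+ 1" reference is held by the PE's owner (for
instance the EEH PE) and is dropped when that owner goes away.

```c
#include <stdbool.h>

/* Hypothetical PE# count for the sketch; real PHBs expose many more. */
#define SKETCH_TOTAL_PE 8

/* Simplified PE descriptor; not the kernel's struct pnv_ioda_pe. */
struct sketch_pe {
	int  refcount;   /* number of child PCI devices + 1 */
	bool in_use;     /* PE# has been taken from the free pool */
};

static struct sketch_pe sketch_pes[SKETCH_TOTAL_PE];

/* Take a free PE# from the pool; the initial reference is the "+ 1". */
static int sketch_pe_create(void)
{
	for (int i = 0; i < SKETCH_TOTAL_PE; i++) {
		if (!sketch_pes[i].in_use) {
			sketch_pes[i].in_use = true;
			sketch_pes[i].refcount = 1;
			return i;
		}
	}
	return -1;   /* pool exhausted */
}

/* A child PCI device joins the PE. */
static void sketch_pe_get(int pe)
{
	sketch_pes[pe].refcount++;
}

/*
 * Drop one reference (a child device leaving, or the owner's "+ 1").
 * At zero the PE# returns to the free pool; real code would also release
 * the PE's IO/M32/M64 segments and DMA windows here.
 */
static void sketch_pe_put(int pe)
{
	if (--sketch_pes[pe].refcount == 0)
		sketch_pes[pe].in_use = false;
}
```

A PE created for a bus with two devices reaches a count of 3; its PE# only
becomes reusable after both devices and the owner reference are gone.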

From the skiboot perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on
a particular PCI slot through its device-tree node. If it does, EEH uses the
functionality provided by skiboot. Besides, the device-tree nodes have to change
in order to support PCI hotplug. For example, when a PCI adapter is inserted into
a slot, its device-tree node should be added to the system dynamically.
Conversely, the device-tree node should be removed from the system when the PCI
adapter goes offline. Since pci_dn and eeh_dev have the same life cycle as the
PCI device nodes, they should be added/removed accordingly during PCI hotplug.
PATCH[24 - 36] do the related work.

The OF driver is changed to support unflattening an FDT blob into a sub-tree,
which is covered by PATCH[37 - 41].

The last patch is the standalone PCI hotplug driver for the PowerNV platform.
When a PCI adapter is removed from a PCI slot, triggered by a command from
userland, skiboot powers off the slot to save power and removes the device-tree
nodes for all PCI devices behind the slot. Conversely, when power to the slot
is turned on, the PCI devices behind the slot are rescanned, and skiboot builds
the device-tree nodes for the newly detected PCI devices. In both cases, skiboot
sends a message to the kernel so that the kernel can adjust the device tree
accordingly. At the same time, the kernel also has to deallocate or allocate
the PE# and its related resources for the removed/added PCI devices.

Changelog
=========
v6:
   * Patch reorder, split, squash - Alexey.
   * Minor coding style - Alexey.
   * Better function names for pcibios_{add,remove}_pci_devices - Bjorn
   * Replace pr_warn() with dev_warn() in PowerNV hotplug driver - Bjorn
   * Current depth as parameter passed to __unflatten_dt_node() - Grant / Alexey
   * Replace overlay with of_changeset - Grant
v5:
   * Rebased to 4.1.rc6 and some unmerged patches as below:
     Alexey's DDW patchset (v11);
     Gavin's EEH error injection support (in mpe's next branch);
     Richard's EEH cleanup patches (in mpe's next branch);
     Richard's EEH support for VF (v7);
     Gavin's misc EEH fixes for 4.2;
   * This revision is based on the corresponding skiboot patches (v7):
     https://patchwork.ozlabs.org/patch/480437/
   * Utilize OF overlay to update device-tree with the help of the newly
     introduced OPAL API opal_get_overlay_dt().
   * Split patches for easy review according to aik's comments.
   * Fix coding style issues reported by checkpatch.pl, as pointed out by aik.
   * Code cleanup and misc fixup according to aik's input.
v4:
   * Rebased to 4.1.RC1
   * Added API to unflatten FDT blob to device node sub-tree, which is attached
     to the indicated parent device node. The original mechanism based on a
     formatted string stream has been dropped.
   * The PATCH[v3 09/21] ("powerpc/eeh: Delay probing EEH device during hotplug")
     was picked up and sent to linux-ppc@ separately for review, as Richard's
     "VF EEH Support" depends on it.
v3:
   * Rebased to 4.1.RC0
   * PowerNV PCI infrastructure is totally refactored in order to support PCI
     hotplug. The PowerNV hotplug driver is also reworked substantially because
     of the changes in skiboot to support PCI hotplug.


Gavin Shan (42):
  PCI: Add pcibios_setup_bridge()
  powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  powerpc/powernv: Enable M64 on P7IOC
  powerpc/powernv: Reorder fields in struct pnv_phb
  powerpc/powernv: Track IO/M32/M64 segments from PE
  powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  powerpc/powernv: Improve IO and M32 mapping
  powerpc/powernv: Calculate PHB's DMA weight dynamically
  powerpc/powernv: DMA32 cleanup
  powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  powerpc/powernv: Trace DMA32 segments consumed by PE
  powerpc/powernv: Increase PE# capacity
  powerpc/pci: Cleanup on pci_controller_ops
  powerpc/pci: Override pcibios_setup_bridge()
  powerpc/powernv: PE oriented during configuration
  powerpc/powernv: Helper function pnv_ioda_init_pe()
  powerpc/powernv: Rename PE# fields in PHB
  powerpc/powernv: Allocate PE# in descending order
  powerpc/powernv: Reserve PE# for root bus
  powerpc/powernv: Create PEs dynamically
  powerpc/powernv: Remove DMA32 list of PEs
  powerpc/powernv: Move functions around
  powerpc/powernv: Release PEs dynamically
  powerpc/powernv: Supports slot ID
  powerpc/powernv: Use PCI slot reset infrastructure
  powerpc/powernv: Simplify pnv_eeh_reset()
  powerpc/powernv: Don't cover root bus in pnv_pci_reset_secondary_bus()
  powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
  powerpc/pci: Don't scan empty slot
  powerpc/pci: Move pcibios_find_pci_bus() around
  powerpc/pci: Rename pcibios_{add,remove}_pci_devices
  powerpc/powernv: Introduce pnv_pci_poll()
  powerpc/powernv: Functions to get/reset PCI slot status
  powerpc/pci: Delay creating pci_dn
  powerpc/pci: Export traverse_pci_device_nodes()
  powerpc/pci: Update bridge windows on PCI plugging
  powerpc/powernv: Select OF_DYNAMIC
  drivers/of: Unflatten subordinate nodes after specified level
  drivers/of: Allow to specify root node in of_fdt_unflatten_tree()
  drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
  drivers/of: Export OF changeset functions
  pci/hotplug: PowerPC PowerNV PCI hotplug driver

 MAINTAINERS                                    |    6 +
 arch/powerpc/include/asm/eeh.h                 |    2 +-
 arch/powerpc/include/asm/opal-api.h            |    8 +-
 arch/powerpc/include/asm/opal.h                |    9 +-
 arch/powerpc/include/asm/pci-bridge.h          |   25 +-
 arch/powerpc/include/asm/pnv-pci.h             |    7 +
 arch/powerpc/include/asm/ppc-pci.h             |    9 +-
 arch/powerpc/kernel/eeh_dev.c                  |   19 +-
 arch/powerpc/kernel/eeh_driver.c               |   12 +-
 arch/powerpc/kernel/pci-common.c               |   16 +-
 arch/powerpc/kernel/pci-hotplug.c              |   48 +-
 arch/powerpc/kernel/pci_dn.c                   |   71 +-
 arch/powerpc/platforms/maple/pci.c             |   34 +-
 arch/powerpc/platforms/pasemi/pci.c            |    3 -
 arch/powerpc/platforms/powermac/pci.c          |   40 +-
 arch/powerpc/platforms/powernv/Kconfig         |    1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c   |  181 +--
 arch/powerpc/platforms/powernv/opal-wrappers.S |    4 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 1661 ++++++++++++++----------
 arch/powerpc/platforms/powernv/pci.c           |   92 +-
 arch/powerpc/platforms/powernv/pci.h           |   63 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c     |   32 -
 arch/powerpc/platforms/pseries/setup.c         |    9 +-
 drivers/of/dynamic.c                           |   65 +-
 drivers/of/fdt.c                               |   69 +-
 drivers/of/overlay.c                           |    8 +-
 drivers/of/unittest.c                          |    6 +-
 drivers/pci/hotplug/Kconfig                    |   12 +
 drivers/pci/hotplug/Makefile                   |    4 +
 drivers/pci/hotplug/powernv_php.c              |  140 ++
 drivers/pci/hotplug/powernv_php.h              |   92 ++
 drivers/pci/hotplug/powernv_php_slot.c         |  722 ++++++++++
 drivers/pci/hotplug/rpadlpar_core.c            |    8 +-
 drivers/pci/hotplug/rpaphp_core.c              |    4 +-
 drivers/pci/hotplug/rpaphp_pci.c               |    4 +-
 drivers/pci/setup-bus.c                        |    5 +
 include/linux/of.h                             |    2 +
 include/linux/of_fdt.h                         |    3 +-
 include/linux/pci.h                            |    1 +
 39 files changed, 2536 insertions(+), 961 deletions(-)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

-- 
2.1.0

* [PATCH v6 01/42] PCI: Add pcibios_setup_bridge()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Currently, the PowerPC PowerNV platform uses ppc_md.pcibios_fixup(),
which is called only once after PCI probing and resource assignment
have completed, to allocate platform-required resources for PCI
devices: PE#, IO and MMIO mappings, the DMA address translation (TCE)
table, etc. Obviously, that isn't hotplug friendly.

The patch adds the weak function pcibios_setup_bridge(), which is
called by pci_setup_bridge(). The PowerPC PowerNV platform will
override the function to assign the above platform-required resources
to newly added PCI devices, in order to support PCI hotplug in
subsequent patches.
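
A minimal sketch of the weak-hook pattern in plain C, assuming GCC or
Clang on an ELF target (the kernel's __weak macro expands to
__attribute__((weak))). struct sketch_bus, the sketch_* functions and the
0x7 type mask are hypothetical stand-ins, not kernel code; unlike the
kernel's empty default, this default only records when it ran, so the
call order in pci_setup_bridge() can be observed.

```c
/* Minimal stand-in for struct pci_bus; hypothetical. */
struct sketch_bus {
	int number;
};

/* Call-order bookkeeping so the sequence can be observed. */
static int sketch_seq;
static int sketch_hook_order;
static int sketch_window_order;

/*
 * Generic weak default. A platform can supply a strong definition with
 * the same name in another translation unit, and the linker will pick
 * that one instead of this body.
 */
__attribute__((weak)) void sketch_pcibios_setup_bridge(struct sketch_bus *bus,
							unsigned long type)
{
	(void)bus;
	(void)type;
	sketch_hook_order = ++sketch_seq;
}

/* Stand-in for __pci_setup_bridge(): program the bridge windows. */
static void sketch_program_windows(struct sketch_bus *bus, unsigned long type)
{
	(void)bus;
	(void)type;
	sketch_window_order = ++sketch_seq;
}

void sketch_pci_setup_bridge(struct sketch_bus *bus)
{
	/* Placeholder for IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH. */
	unsigned long type = 0x7UL;

	/* Platform hook first, then the generic window programming. */
	sketch_pcibios_setup_bridge(bus, type);
	sketch_program_windows(bus, type);
}
```

Calling the hook before the windows are written gives a platform the
chance to set up per-PE state for the bridge before the generic code
programs it.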

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-bus.c | 5 +++++
 include/linux/pci.h     | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 508cc56..a69eae1 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -696,11 +696,16 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
 	unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH;
 
+	pcibios_setup_bridge(bus, type);
 	__pci_setup_bridge(bus, type);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3fed437..0fa9712 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -816,6 +816,7 @@ void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);
 void pci_setup_cardbus(struct pci_bus *bus);
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
-- 
2.1.0

* [PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 01/42] PCI: Add pcibios_setup_bridge() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Nobody uses this function, which is already compiled out with #if 0.
The patch drops it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 71 -------------------------------
 1 file changed, 71 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 07666ec..38b5405 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -923,77 +923,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */
 
-#if 0
-static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct pnv_ioda_pe *pe;
-	int pe_num;
-
-	if (!pdn) {
-		pr_err("%s: Device tree node not associated properly\n",
-			   pci_name(dev));
-		return NULL;
-	}
-	if (pdn->pe_number != IODA_INVALID_PE)
-		return NULL;
-
-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available, disabling device\n",
-			   pci_name(dev));
-		return NULL;
-	}
-
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
-	 *
-	 * At some point we want to remove the PDN completely anyways
-	 */
-	pe = &phb->ioda.pe_array[pe_num];
-	pci_dev_get(dev);
-	pdn->pcidev = dev;
-	pdn->pe_number = pe_num;
-	pe->pdev = dev;
-	pe->pbus = NULL;
-	pe->tce32_seg = -1;
-	pe->mve_number = -1;
-	pe->rid = dev->bus->number << 8 | pdn->devfn;
-
-	pe_info(pe, "Associated device to PE\n");
-
-	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
-		pdn->pe_number = IODA_INVALID_PE;
-		pe->pdev = NULL;
-		pci_dev_put(dev);
-		return NULL;
-	}
-
-	/* Assign a DMA weight to the device */
-	pe->dma_weight = pnv_ioda_dma_weight(dev);
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
-	return pe;
-}
-#endif /* Useful for SRIOV case */
-
 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *dev;
-- 
2.1.0

* [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 01/42] PCI: Add pcibios_setup_bridge() Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  6:30   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 04/42] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
                   ` (33 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch enables the M64 window on P7IOC, as has already been done on
PHB3. On PHB3 there are 16 M64 BARs, and each of them can be owned
exclusively by one particular PE# or divided evenly into 256 segments.
Each P7IOC PHB, by contrast, has 16 M64 BARs, each divided into 8
segments, so each P7IOC PHB supports only 128 M64 segments. Also,
P7IOC has an M64DT, which helps map a particular M64 segment# to an
arbitrary PE#. PHB3 doesn't have an M64DT, meaning that an M64 segment
can only be pinned to a fixed PE#. In order to share the M64 logic
between PHB3 and P7IOC, we provide 128 M64 segments (16 BARs) with a
fixed mapping between PE# and M64 segment# on P7IOC. As a result, only
the phb->init_m64() hook differs between P7IOC and PHB3.
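
A worked sketch of the fixed P7IOC mapping, in plain C. The arithmetic
follows the patch: BAR seg/8 is programmed at m64_base + seg * m64_segsize
with size 8 * m64_segsize, and a PE is mapped with window pe_number / 8 and
segment pe_number % 8. struct m64_slot and p7ioc_m64_slot() are
hypothetical names, and the base and segment size used below are made-up
values for illustration only.

```c
#include <stdint.h>

#define P7IOC_M64_BARS      16
#define P7IOC_SEGS_PER_BAR   8
#define P7IOC_M64_SEGS      (P7IOC_M64_BARS * P7IOC_SEGS_PER_BAR) /* 128 */

/* Where a given PE# lands in the M64 space (hypothetical helper type). */
struct m64_slot {
	unsigned int bar;   /* which M64 BAR */
	unsigned int seg;   /* segment index within that BAR */
	uint64_t     base;  /* start address of the segment */
};

/* Fixed PE# -> M64 segment mapping enforced on P7IOC. */
static struct m64_slot p7ioc_m64_slot(unsigned int pe, uint64_t m64_base,
				      uint64_t segsize)
{
	struct m64_slot s;

	s.bar = pe / P7IOC_SEGS_PER_BAR;
	s.seg = pe % P7IOC_SEGS_PER_BAR;
	/*
	 * BAR n starts at m64_base + n * 8 * segsize and its segments are
	 * contiguous, so the segment base is simply linear in the PE#.
	 */
	s.base = m64_base + (uint64_t)pe * segsize;
	return s;
}
```

For example, with m64_base = 0x100000000 and a 256MB segment size, PE#13
falls in BAR 1, segment 5, at 0x1D0000000 (BAR 1 starts at 0x180000000).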

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++++++++++++++++++++++++++----
 1 file changed, 104 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 38b5405..e4ac703 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
 	clear_bit(pe, phb->ioda.pe_alloc);
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+	struct resource *r;
+	int seg;
+
+	/* There are as many M64 segments as the maximum number
+	 * of PEs, which is 128.
+	 */
+	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
+		unsigned long base;
+		int64_t rc;
+
+		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+						 OPAL_M64_WINDOW_TYPE,
+						 seg / 8,
+						 base,
+						 0, /* unused */
+						 8 * phb->ioda.m64_segsize);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, seg / 8);
+			goto fail;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+					      OPAL_M64_WINDOW_TYPE,
+					      seg / 8,
+					      OPAL_ENABLE_M64_SPLIT);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+				rc, phb->hose->global_number, seg / 8);
+			goto fail;
+		}
+	}
+
+	/* Strip off the segment used by the reserved PE, which
+	 * is expected to be 0 or the last supported PE#. The PHB's
+	 * first memory window tracks the 32-bit MMIO range
+	 * while the second one tracks the 64-bit prefetchable
+	 * MMIO range that the PHB supports.
+	 */
+	r = &phb->hose->mem_resources[1];
+	if (phb->ioda.reserved_pe == 0)
+		r->start += phb->ioda.m64_segsize;
+	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+		r->end -= phb->ioda.m64_segsize;
+	else
+		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
+			phb->ioda.reserved_pe);
+
+	return 0;
+
+fail:
+	for ( ; seg >= 0; seg -= 8)
+		opal_pci_phb_mmio_enable(phb->opal_id,
+					 OPAL_M64_WINDOW_TYPE,
+					 seg / 8,
+					 OPAL_DISABLE_M64);
+
+	return -EIO;
+}
+
 /* The default M64 BAR is shared by all PEs */
 static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 {
@@ -256,9 +319,9 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
 	}
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
-				     unsigned long *pe_bitmap,
-				     bool all)
+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
+				    unsigned long *pe_bitmap,
+				    bool all)
 {
 	struct pci_dev *pdev;
 
@@ -266,12 +329,12 @@ static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
 		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
 
 		if (all && pdev->subordinate)
-			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
-						 pe_bitmap, all);
+			pnv_ioda_reserve_m64_pe(pdev->subordinate,
+						pe_bitmap, all);
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -293,7 +356,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	/* Figure out reserved PE numbers by the PE */
-	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
+	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
 
 	/*
 	 * the current bus might not own M64 window and that's all
@@ -324,6 +387,26 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
 			pe->master = master_pe;
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
+
+		/* P7IOC supports M64DT, which helps mapping M64 segment
+		 * to one particular PE#. However, PHB3 has fixed mapping
+		 * between M64 segment and PE#. In order to have same logic
+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
+		 * segment and PE# on P7IOC.
+		 */
+		if (phb->type == PNV_PHB_IODA1) {
+			int64_t rc;
+
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+							 pe->pe_number,
+							 OPAL_M64_WINDOW_TYPE,
+							 pe->pe_number / 8,
+							 pe->pe_number % 8);
+			if (rc != OPAL_SUCCESS)
+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
+					__func__, rc, phb->hose->global_number,
+					pe->pe_number);
+		}
 	}
 
 	kfree(pe_alloc);
@@ -338,8 +421,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	const u32 *r;
 	u64 pci_addr;
 
-	/* FIXME: Support M64 for P7IOC */
-	if (phb->type != PNV_PHB_IODA2) {
+	if (phb->type != PNV_PHB_IODA1 &&
+	    phb->type != PNV_PHB_IODA2) {
 		pr_info("  Not support M64 window\n");
 		return;
 	}
@@ -372,9 +455,18 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 
 	/* Use last M64 BAR to cover M64 window */
 	phb->ioda.m64_bar_idx = 15;
-	phb->init_m64 = pnv_ioda2_init_m64;
-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		phb->init_m64 = pnv_ioda1_init_m64;
+		break;
+	case PNV_PHB_IODA2:
+		phb->init_m64 = pnv_ioda2_init_m64;
+		break;
+	default:
+		pr_debug("   M64 not supported\n");
+	}
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0

* [PATCH v6 04/42] powerpc/powernv: Reorder fields in struct pnv_phb
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (2 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE Gavin Shan
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch moves the fields of struct pnv_phb that are related to PE#
allocation so they are grouped together. No logical change.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e891ff4..62239b1 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -145,15 +145,14 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
-			/* PE allocation bitmap */
-			unsigned long		*pe_alloc;
-			/* PE allocation mutex */
+			/* PE allocation */
 			struct mutex		pe_alloc_mutex;
+			unsigned long		*pe_alloc;
+			struct pnv_ioda_pe	*pe_array;
 
 			/* M32 & IO segment maps */
 			unsigned int		*m32_segmap;
 			unsigned int		*io_segmap;
-			struct pnv_ioda_pe	*pe_array;
 
 			/* IRQ chip */
 			int			irq_chip_init;
-- 
2.1.0

* [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (3 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 04/42] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  7:16   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 06/42] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
                   ` (31 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch adds 6 bitmaps, three to the PE and three to the PHB, to track
the IO, M32 and M64 segments consumed by one particular PE, so that they
can be released once the PE is destroyed at PCI unplug time. Also, we
use a fixed number of bits to track the IO, M32 and M64 segments used
by PEs in one particular PHB.
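
A minimal sketch of the dual-bitmap bookkeeping, in plain C, showing only
the IO map. bm_set(), bm_clear() and bm_test() are simplified,
non-atomic stand-ins for the kernel's set_bit(), clear_bit() and
test_bit(); struct seg_pe, struct seg_phb and seg_release_all() are
hypothetical, and the release path is an assumption about how the maps
would be used at unplug time, not code from this patch.

```c
#include <limits.h>
#include <stdbool.h>

#define SEGMAP_LONGS 8
#define BITS_PER_UL  (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins for set_bit()/clear_bit()/test_bit(). */
static void bm_set(unsigned long *map, unsigned int nr)
{
	map[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

static void bm_clear(unsigned long *map, unsigned int nr)
{
	map[nr / BITS_PER_UL] &= ~(1UL << (nr % BITS_PER_UL));
}

static bool bm_test(const unsigned long *map, unsigned int nr)
{
	return map[nr / BITS_PER_UL] & (1UL << (nr % BITS_PER_UL));
}

struct seg_pe  { unsigned long io_segmap[SEGMAP_LONGS]; };
struct seg_phb { unsigned long io_segmap[SEGMAP_LONGS]; };

/* Record that @pe consumes IO segment @seg: set the bit in both maps. */
static void seg_assign(struct seg_phb *phb, struct seg_pe *pe, unsigned int seg)
{
	bm_set(pe->io_segmap, seg);
	bm_set(phb->io_segmap, seg);
}

/* On PE teardown, hand every segment the PE owned back to the PHB. */
static void seg_release_all(struct seg_phb *phb, struct seg_pe *pe)
{
	for (unsigned int seg = 0; seg < SEGMAP_LONGS * BITS_PER_UL; seg++) {
		if (bm_test(pe->io_segmap, seg)) {
			bm_clear(phb->io_segmap, seg);
			bm_clear(pe->io_segmap, seg);
		}
	}
}
```

Because the PE's own map records exactly which bits it set in the PHB map,
teardown needs no search over other PEs to find what to free.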

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
 2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e4ac703..78b49a1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 			list_add_tail(&pe->list, &master_pe->slaves);
 		}
 
+		/* M64 segments consumed by slave PEs are tracked
+		 * by master PE
+		 */
+		set_bit(pe->pe_number, master_pe->m64_segmap);
+		set_bit(pe->pe_number, phb->ioda.m64_segmap);
+
 		/* P7IOC supports M64DT, which helps mapping M64 segment
 		 * to one particular PE#. However, PHB3 has fixed mapping
 		 * between M64 segment and PE#. In order to have same logic
@@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 
 			while (index < phb->ioda.total_pe &&
 			       region.start <= region.end) {
-				phb->ioda.io_segmap[index] = pe->pe_number;
+				set_bit(index, pe->io_segmap);
+				set_bit(index, phb->ioda.io_segmap);
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
+					pe->pe_number, OPAL_IO_WINDOW_TYPE,
+					0, index);
 				if (rc != OPAL_SUCCESS) {
 					pr_err("%s: OPAL error %d when mapping IO "
 					       "segment #%d to PE#%d\n",
@@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 
 			while (index < phb->ioda.total_pe &&
 			       region.start <= region.end) {
-				phb->ioda.m32_segmap[index] = pe->pe_number;
+				set_bit(index, pe->m32_segmap);
+				set_bit(index, phb->ioda.m32_segmap);
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
+					pe->pe_number, OPAL_M32_WINDOW_TYPE,
+					0, index);
 				if (rc != OPAL_SUCCESS) {
 					pr_err("%s: OPAL error %d when mapping M32 "
 					       "segment#%d to PE#%d",
@@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+	unsigned long size, pemap_off;
 	const __be64 *prop64;
 	const __be32 *prop32;
 	int len;
@@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
-	m32map_off = size;
-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
-	if (phb->type == PNV_PHB_IODA1) {
-		iomap_off = size;
-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
-	}
 	pemap_off = size;
 	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
-	phb->ioda.m32_segmap = aux + m32map_off;
-	if (phb->type == PNV_PHB_IODA1)
-		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
 	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 62239b1..08a4e57 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -49,6 +49,15 @@ struct pnv_ioda_pe {
 	/* PE number */
 	unsigned int		pe_number;
 
+	/* IO/M32/M64 segments consumed by the PE. Each PE can
+	 * have one M64 segment at most, but M64 segments consumed
+	 * by slave PEs will be contributed to the master PE. One
+	 * PE can own multiple IO and M32 segments.
+	 */
+	unsigned long		io_segmap[8];
+	unsigned long		m32_segmap[8];
+	unsigned long		m64_segmap[8];
+
 	/* "Weight" assigned to the PE for the sake of DMA resource
 	 * allocations
 	 */
@@ -145,15 +154,16 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
+			/* IO, M32, M64 segment maps */
+			unsigned long		io_segmap[8];
+			unsigned long		m32_segmap[8];
+			unsigned long		m64_segmap[8];
+
 			/* PE allocation */
 			struct mutex		pe_alloc_mutex;
 			unsigned long		*pe_alloc;
 			struct pnv_ioda_pe	*pe_array;
 
-			/* M32 & IO segment maps */
-			unsigned int		*m32_segmap;
-			unsigned int		*io_segmap;
-
 			/* IRQ chip */
 			int			irq_chip_init;
 			struct irq_chip		irq_chip;
-- 
2.1.0

* [PATCH v6 06/42] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (4 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping Gavin Shan
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The original implementation of pnv_ioda_setup_pe_seg() configures
IO and M32 segments with separate logic, which can be merged by
caching @segsize, @segmap, @pe_segmap and @win in advance. The patch
shouldn't cause any behavioural change.
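
A minimal sketch of the merged loop, in plain C: the per-window values
are chosen once per resource, and a single loop derives the starting
index from the cached segment size of the window being mapped and walks
the covered segments. enum sketch_win, struct sketch_win_params and
sketch_map_region() are hypothetical; the real code sets bits in both
segmaps and calls opal_pci_map_pe_mmio_window() where this sketch only
records the index.

```c
#include <stdint.h>

/* Hypothetical window types standing in for OPAL_IO/M32_WINDOW_TYPE. */
enum sketch_win { SKETCH_IO_WIN, SKETCH_M32_WIN };

/* Values cached once per resource, as the patch does for IO vs M32. */
struct sketch_win_params {
	uint64_t        segsize;
	enum sketch_win win;
};

/* Record every segment index covered by [start, end]; return the count. */
static int sketch_map_region(const struct sketch_win_params *p,
			     uint64_t start, uint64_t end,
			     unsigned int total_pe,
			     unsigned int *out, int max_out)
{
	/* Starting index comes from the segment size of this window. */
	unsigned int index = (unsigned int)(start / p->segsize);
	int n = 0;

	while (index < total_pe && start <= end && n < max_out) {
		/* Real code: set_bit() in both segmaps, then
		 * opal_pci_map_pe_mmio_window(..., p->win, 0, index). */
		out[n++] = index;
		start += p->segsize;
		index++;
	}
	return n;
}
```

With a 4KB IO segment size, the range 0x3000-0x5fff covers segments 3, 4
and 5; with a 1MB M32 segment size, 0x200000-0x2fffff covers segment 2
only.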

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 68 ++++++++++++++-----------------
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 78b49a1..488a53e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2856,7 +2856,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	struct pci_bus_region region;
 	struct resource *res;
 	int i, index;
-	int rc;
+	unsigned int segsize;
+	unsigned long *segmap, *pe_segmap;
+	uint16_t win;
+	int64_t rc;
 
 	/*
 	 * NOTE: We only care PCI bus based PE for now. For PCI
@@ -2873,25 +2876,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 		if (res->flags & IORESOURCE_IO) {
 			region.start = res->start - phb->ioda.io_pci_base;
 			region.end   = res->end - phb->ioda.io_pci_base;
-			index = region.start / phb->ioda.io_segsize;
-
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				set_bit(index, pe->io_segmap);
-				set_bit(index, phb->ioda.io_segmap);
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_IO_WINDOW_TYPE,
-					0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping IO "
-					       "segment #%d to PE#%d\n",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
-
-				region.start += phb->ioda.io_segsize;
-				index++;
-			}
+			segsize      = phb->ioda.io_segsize;
+			segmap       = phb->ioda.io_segmap;
+			pe_segmap    = pe->io_segmap;
+			win          = OPAL_IO_WINDOW_TYPE;
 		} else if ((res->flags & IORESOURCE_MEM) &&
 			   !pnv_pci_is_mem_pref_64(res->flags)) {
 			region.start = res->start -
@@ -2900,25 +2888,31 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 			region.end   = res->end -
 				       hose->mem_offset[0] -
 				       phb->ioda.m32_pci_base;
-			index = region.start / phb->ioda.m32_segsize;
-
-			while (index < phb->ioda.total_pe &&
-			       region.start <= region.end) {
-				set_bit(index, pe->m32_segmap);
-				set_bit(index, phb->ioda.m32_segmap);
-				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, OPAL_M32_WINDOW_TYPE,
-					0, index);
-				if (rc != OPAL_SUCCESS) {
-					pr_err("%s: OPAL error %d when mapping M32 "
-					       "segment#%d to PE#%d",
-					       __func__, rc, index, pe->pe_number);
-					break;
-				}
+			segsize      = phb->ioda.m32_segsize;
+			segmap       = phb->ioda.m32_segmap;
+			pe_segmap    = pe->m32_segmap;
+			win          = OPAL_M32_WINDOW_TYPE;
+		} else {
+			continue;
+		}
 
-				region.start += phb->ioda.m32_segsize;
-				index++;
+		index = region.start / phb->ioda.io_segsize;
+		while (index < phb->ioda.total_pe &&
+		       region.start <= region.end) {
+			set_bit(index, segmap);
+			set_bit(index, pe_segmap);
+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+					pe->pe_number, win, 0, index);
+			if (rc != OPAL_SUCCESS) {
+				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+					__func__, rc, win, index,
+					pe->phb->hose->global_number,
+					pe->pe_number);
+				break;
 			}
+
+			region.start += segsize;
+			index++;
 		}
 	}
 }
-- 
2.1.0


* [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (5 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 06/42] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
       [not found]   ` <1438834307-26960-8-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2015-08-06  4:11 ` [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically Gavin Shan
                   ` (29 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

There are 3 windows (IO, M32 and M64) for the PHB, the root port and
the upstream port of the PCIe switch behind the root port. In order to
support PCI hotplug, we extend the start/end addresses of those 3
windows of the root port or upstream port to the start/end addresses
of the PHB's 3 windows. The current implementation, which assigns IO
or M32 segments based on the bridge's windows, therefore isn't
reliable: an extended bridge window can cover segments that no device
behind the bridge actually uses.

This patch fixes the above issue by calculating the IO or M32 segments
consumed by a PE from the resources of its contained devices. No PCI
bridge windows are involved if the PE doesn't contain all of its
subordinate PCI buses. Otherwise, the windows of the child PCI bridges
still contribute to the PE's consumed IO or M32 segments.
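
Which resources feed into a PE's segments can be sketched as a
simplified model. The names struct res_sketch, struct dev_sketch,
NUM_DEV_RES, NUM_BRIDGE_RES, res_wanted(), setup_pe_seg() and
count_map() are hypothetical. The "assigned" flag stands in for the
kernel's res->parent check, and the callback stands in for mapping one
resource to IO or M32 segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DEV_RES    7   /* hypothetical: BARs 0-5 plus the ROM BAR */
#define NUM_BRIDGE_RES 4   /* hypothetical: the bridge windows */

struct res_sketch { unsigned long start, end; bool assigned; };

struct dev_sketch {
	bool is_bridge;
	struct res_sketch dev_res[NUM_DEV_RES];
	struct res_sketch bridge_res[NUM_BRIDGE_RES];
};

typedef void (*map_fn)(const struct res_sketch *res, void *ctx);

/* Only assigned, non-empty resources are mapped. */
static bool res_wanted(const struct res_sketch *r)
{
	return r->assigned && r->start <= r->end;
}

/*
 * Device resources always count toward the PE's segments; bridge
 * windows count only when the PE owns all subordinate buses (bus_all).
 */
static void setup_pe_seg(const struct dev_sketch *devs, size_t ndevs,
			 bool bus_all, map_fn map, void *ctx)
{
	for (size_t d = 0; d < ndevs; d++) {
		for (size_t i = 0; i < NUM_DEV_RES; i++)
			if (res_wanted(&devs[d].dev_res[i]))
				map(&devs[d].dev_res[i], ctx);

		if (!bus_all || !devs[d].is_bridge)
			continue;

		/* Exactly NUM_BRIDGE_RES windows: indices 0..NUM_BRIDGE_RES-1. */
		for (size_t i = 0; i < NUM_BRIDGE_RES; i++)
			if (res_wanted(&devs[d].bridge_res[i]))
				map(&devs[d].bridge_res[i], ctx);
	}
}

/* Example callback: count how many resources would be mapped. */
static void count_map(const struct res_sketch *res, void *ctx)
{
	assert(res != NULL && ctx != NULL);
	++*(int *)ctx;
}
```

With one plain device and one bridge on the bus, a PNV_IODA_PE_BUS-style
PE maps only the two device BARs, while a PNV_IODA_PE_BUS_ALL-style PE
also maps the bridge's assigned window.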

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 136 +++++++++++++++++-------------
 1 file changed, 79 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 488a53e..713f4b4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
-				  struct pnv_ioda_pe *pe)
+static int pnv_ioda_setup_one_res(struct pci_controller *hose,
+				  struct pnv_ioda_pe *pe,
+				  struct resource *res)
 {
 	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus_region region;
-	struct resource *res;
-	int i, index;
-	unsigned int segsize;
+	unsigned int index, segsize;
 	unsigned long *segmap, *pe_segmap;
 	uint16_t win;
 	int64_t rc;
 
-	/*
-	 * NOTE: We only care PCI bus based PE for now. For PCI
-	 * device based PE, for example SRIOV sensitive VF should
-	 * be figured out later.
-	 */
-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+	/* Check if we need map the resource */
+	if (!res->parent || !res->flags || res->start > res->end)
+		return 0;
 
-	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
-		    res->start > res->end)
-			continue;
+	if (res->flags & IORESOURCE_IO) {
+		region.start = res->start - phb->ioda.io_pci_base;
+		region.end   = res->end - phb->ioda.io_pci_base;
+		segsize      = phb->ioda.io_segsize;
+		segmap       = phb->ioda.io_segmap;
+		pe_segmap    = pe->io_segmap;
+		win          = OPAL_IO_WINDOW_TYPE;
+	} else if ((res->flags & IORESOURCE_MEM) &&
+		   !pnv_pci_is_mem_pref_64(res->flags)) {
+		region.start = res->start -
+			       hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		region.end   = res->end -
+			       hose->mem_offset[0] -
+			       phb->ioda.m32_pci_base;
+		segsize      = phb->ioda.m32_segsize;
+		segmap       = phb->ioda.m32_segmap;
+		pe_segmap    = pe->m32_segmap;
+		win          = OPAL_M32_WINDOW_TYPE;
+	} else {
+		return 0;
+	}
 
-		if (res->flags & IORESOURCE_IO) {
-			region.start = res->start - phb->ioda.io_pci_base;
-			region.end   = res->end - phb->ioda.io_pci_base;
-			segsize      = phb->ioda.io_segsize;
-			segmap       = phb->ioda.io_segmap;
-			pe_segmap    = pe->io_segmap;
-			win          = OPAL_IO_WINDOW_TYPE;
-		} else if ((res->flags & IORESOURCE_MEM) &&
-			   !pnv_pci_is_mem_pref_64(res->flags)) {
-			region.start = res->start -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			region.end   = res->end -
-				       hose->mem_offset[0] -
-				       phb->ioda.m32_pci_base;
-			segsize      = phb->ioda.m32_segsize;
-			segmap       = phb->ioda.m32_segmap;
-			pe_segmap    = pe->m32_segmap;
-			win          = OPAL_M32_WINDOW_TYPE;
-		} else {
-			continue;
+	region.start = _ALIGN_DOWN(region.start, segsize);
+	region.end   = _ALIGN_UP(region.end, segsize);
+	index = region.start / segsize;
+	while (index < phb->ioda.total_pe &&
+	       region.start < region.end) {
+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+				pe->pe_number, win, 0, index);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+				__func__, rc, win, index,
+				pe->phb->hose->global_number,
+				pe->pe_number);
+			return -EIO;
 		}
 
-		index = region.start / phb->ioda.io_segsize;
-		while (index < phb->ioda.total_pe &&
-		       region.start <= region.end) {
-			set_bit(index, segmap);
-			set_bit(index, pe_segmap);
-			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-					pe->pe_number, win, 0, index);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
-					__func__, rc, win, index,
-					pe->phb->hose->global_number,
-					pe->pe_number);
-				break;
-			}
+		set_bit(index, segmap);
+		set_bit(index, pe_segmap);
+		region.start += segsize;
+		index++;
+	}
+
+	return 0;
+}
+
+static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
+				  struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *pdev;
+	struct resource *res;
+	int i;
+
+	/* This function only works for bus dependent PE */
+	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+
+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
+			res = &pdev->resource[i];
+			if (pnv_ioda_setup_one_res(hose, pe, res))
+				return;
+		}
+
+		/* If the PE contains all subordinate PCI buses, the
+		 * resources of the child bridges should be mapped
+		 * to the PE as well.
+		 */
+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
+		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
+			continue;
 
-			region.start += segsize;
-			index++;
+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
+			if (pnv_ioda_setup_one_res(hose, pe, res))
+				return;
 		}
 	}
 }
-- 
2.1.0


* [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (6 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  7:48   ` Alexey Kardashevskiy
  2015-08-10  9:21   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup Gavin Shan
                   ` (28 subsequent siblings)
  36 siblings, 2 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

On P7IOC, the whole available DMA32 space, which lies below the
MEM32 space, is divided evenly into 256MB segments. The number of
contiguous segments assigned to a particular PE depends on the PE's
DMA weight, which is calculated from the type of each PCI device
contained in the PE, and on the PHB's DMA weight, which is the
accumulated DMA weight of all PEs contained in the PHB. This means
the PHB's DMA weight calculation depends on the PEs that already
exist: that works fine today, but isn't hotplug friendly. PHB3
doesn't have this issue, because the whole available DMA32 space
can be assigned to one PE there.

This patch calculates the PHB's DMA weight dynamically from the PCI
devices contained in the PHB, so that it's hotplug friendly.
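
The weight arithmetic can be illustrated with a minimal sketch. The
device categories and function names (enum dev_kind, dev_weight(),
total_weight(), pe_segments()) are hypothetical. The per-class
multipliers (3 for USB host controllers, 15 for RAID, 10 by default,
each scaled by the DMA32 segment count) and the segment formula follow
the diff below. Both the PE weight and the PHB weight are sums of
device weights, so one helper serves both. As a worked example with 8
segments, a PE holding a RAID controller and one other device weighs
120 + 80 = 200 and a PE holding a USB controller weighs 24, so the PHB
weighs 224. The first PE gets 200 * 8 / 224 = 7 segments; the second is
below 224 / 8 = 28 and gets 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device categories standing in for the PCI class checks. */
enum dev_kind { DEV_NO_DMA, DEV_USB, DEV_RAID, DEV_OTHER };

/* Per-device DMA weight, multiplied by the PHB's DMA32 segment count. */
static unsigned int dev_weight(enum dev_kind kind, unsigned int segcount)
{
	switch (kind) {
	case DEV_NO_DMA:
		return 0;
	case DEV_USB:
		return 3 * segcount;
	case DEV_RAID:
		return 15 * segcount;
	default:
		return 10 * segcount;
	}
}

/* Sum of device weights, like walking every device below a PE or PHB. */
static unsigned int total_weight(const enum dev_kind *devs, size_t n,
				 unsigned int segcount)
{
	unsigned int w = 0;

	for (size_t i = 0; i < n; i++)
		w += dev_weight(devs[i], segcount);
	return w;
}

/* Segments for one PE: one if lightly weighted, else proportional. */
static unsigned int pe_segments(unsigned int pe_weight, unsigned int phb_weight,
				unsigned int segcount)
{
	assert(phb_weight > 0 && segcount > 0);
	if (pe_weight < phb_weight / segcount)
		return 1;
	return (pe_weight * segcount) / phb_weight;
}
```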

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++++++++++++++----------------
 arch/powerpc/platforms/powernv/pci.h      |  6 ---
 2 files changed, 43 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 713f4b4..7342cfd 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -927,6 +927,9 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
 
 static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
 	/* This is quite simplistic. The "base" weight of a device
 	 * is 10. 0 means no DMA is to be accounted for it.
 	 */
@@ -939,14 +942,34 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3;
+		return 3 * phb->ioda.tce32_count;
 
 	/* Increase the weight of RAID (includes Obsidian) */
 	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15;
+		return 15 * phb->ioda.tce32_count;
 
 	/* Default */
-	return 10;
+	return 10 * phb->ioda.tce32_count;
+}
+
+static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
+{
+	unsigned int *dma_weight = data;
+
+	*dma_weight += pnv_ioda_dma_weight(pdev);
+	return 0;
+}
+
+static unsigned int pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
+{
+	unsigned int dma_weight = 0;
+
+	if (!phb->hose->bus)
+		return 0;
+
+	pci_walk_bus(phb->hose->bus,
+		     __pnv_ioda_phb_dma_weight, &dma_weight);
+	return dma_weight;
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -1097,14 +1120,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
 
-	/* Account for one DMA PE if at least one DMA capable device exist
-	 * below the bridge
-	 */
-	if (pe->dma_weight != 0) {
-		phb->ioda.dma_weight += pe->dma_weight;
-		phb->ioda.dma_pe_count++;
-	}
-
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
 }
@@ -2431,24 +2446,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int residual, remaining, segs, tw, base;
 	struct pnv_ioda_pe *pe;
+	unsigned int dma_weight;
 
-	/* If we have more PE# than segments available, hand out one
-	 * per PE until we run out and let the rest fail. If not,
-	 * then we assign at least one segment per PE, plus more based
-	 * on the amount of devices under that PE
-	 */
-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
-		residual = 0;
-	else
-		residual = phb->ioda.tce32_count -
-			phb->ioda.dma_pe_count;
-
-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
-		hose->global_number, phb->ioda.tce32_count);
-	pr_info("PCI: %d PE# for a total weight of %d\n",
-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
+	/* Calculate the PHB's DMA weight */
+	dma_weight = pnv_ioda_phb_dma_weight(phb);
+	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
+		hose->global_number, phb->ioda.tce32_count, dma_weight);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
@@ -2456,22 +2460,9 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 	 * out one base segment plus any residual segments based on
 	 * weight
 	 */
-	remaining = phb->ioda.tce32_count;
-	tw = phb->ioda.dma_weight;
-	base = 0;
 	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
 		if (!pe->dma_weight)
 			continue;
-		if (!remaining) {
-			pe_warn(pe, "No DMA32 resources available\n");
-			continue;
-		}
-		segs = 1;
-		if (residual) {
-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
-			if (segs > remaining)
-				segs = remaining;
-		}
 
 		/*
 		 * For IODA2 compliant PHB3, we needn't care about the weight.
@@ -2479,17 +2470,24 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 		 * the specific PE.
 		 */
 		if (phb->type == PNV_PHB_IODA1) {
-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
+			unsigned int segs, base = 0;
+
+			if (pe->dma_weight <
+			    dma_weight / phb->ioda.tce32_count)
+				segs = 1;
+			else
+				segs = (pe->dma_weight *
+					phb->ioda.tce32_count) / dma_weight;
+
+			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
 				pe->dma_weight, segs);
 			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+
+			base += segs;
 		} else {
 			pe_info(pe, "Assign DMA32 space\n");
-			segs = 0;
 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
 		}
-
-		remaining -= segs;
-		base += segs;
 	}
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 08a4e57..addd3f7 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -183,12 +183,6 @@ struct pnv_phb {
 			/* 32-bit TCE tables allocation */
 			unsigned long		tce32_count;
 
-			/* Total "weight" for the sake of DMA resources
-			 * allocation
-			 */
-			unsigned int		dma_weight;
-			unsigned int		dma_pe_count;
-
 			/* Sorted list of used PE's, sorted at
 			 * boot for resource allocation purposes
 			 */
-- 
2.1.0


* [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (7 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  8:07   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only Gavin Shan
                   ` (27 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This patch cleans up the DMA32 code in pci-ioda.c. It shouldn't
introduce any behavioural changes:

   * Rename various fields in "struct pnv_phb" and "struct pnv_ioda_pe",
     since 32-bit DMA fields should be named after "DMA", not "TCE".
   * Remove the unused struct pnv_ioda_pe::tce32_segcount.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 48 +++++++++++++++----------------
 arch/powerpc/platforms/powernv/pci.h      |  7 ++---
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7342cfd..8456f37 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -917,7 +917,7 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
 	struct pnv_ioda_pe *lpe;
 
 	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-		if (lpe->dma_weight < pe->dma_weight) {
+		if (lpe->dma32_weight < pe->dma32_weight) {
 			list_add_tail(&pe->dma_link, &lpe->dma_link);
 			return;
 		}
@@ -942,14 +942,14 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
 	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		return 3 * phb->ioda.tce32_count;
+		return 3 * phb->ioda.dma32_segcount;
 
 	/* Increase the weight of RAID (includes Obsidian) */
 	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-		return 15 * phb->ioda.tce32_count;
+		return 15 * phb->ioda.dma32_segcount;
 
 	/* Default */
-	return 10 * phb->ioda.tce32_count;
+	return 10 * phb->ioda.dma32_segcount;
 }
 
 static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
@@ -1057,7 +1057,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 			continue;
 		}
 		pdn->pe_number = pe->pe_number;
-		pe->dma_weight += pnv_ioda_dma_weight(dev);
+		pe->dma32_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_same_PE(dev->subordinate, pe);
 	}
@@ -1094,10 +1094,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->tce32_seg = -1;
+	pe->dma32_seg = -1;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
-	pe->dma_weight = 0;
+	pe->dma32_weight = 0;
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
@@ -1460,7 +1460,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->tce32_seg = -1;
+		pe->dma32_seg = -1;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -1936,7 +1936,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	if (WARN_ON(pe->dma32_seg >= 0))
 		return;
 
 	tbl = pnv_pci_table_alloc(phb->hose->node);
@@ -1945,7 +1945,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
 	/* Grab a 32-bit TCE table */
-	pe->tce32_seg = base;
+	pe->dma32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		(base << 28), ((base + segs) << 28) - 1);
 
@@ -2006,8 +2006,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
+	if (pe->dma32_seg >= 0)
+		pe->dma32_seg = -1;
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 	if (tbl) {
@@ -2405,7 +2405,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	int64_t rc;
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->tce32_seg >= 0))
+	if (WARN_ON(pe->dma32_seg >= 0))
 		return;
 
 	/* TVE #1 is selected by PCI address bit 59 */
@@ -2415,7 +2415,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 			pe->pe_number);
 
 	/* The PE will reserve all possible 32-bits space */
-	pe->tce32_seg = 0;
+	pe->dma32_seg = 0;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		phb->ioda.m32_pci_base);
 
@@ -2432,8 +2432,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
 	rc = pnv_pci_ioda2_setup_default_config(pe);
 	if (rc) {
-		if (pe->tce32_seg >= 0)
-			pe->tce32_seg = -1;
+		if (pe->dma32_seg >= 0)
+			pe->dma32_seg = -1;
 		return;
 	}
 
@@ -2452,7 +2452,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 	/* Calculate the PHB's DMA weight */
 	dma_weight = pnv_ioda_phb_dma_weight(phb);
 	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
-		hose->global_number, phb->ioda.tce32_count, dma_weight);
+		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
 
 	pnv_pci_ioda_setup_opal_tce_kill(phb);
 
@@ -2461,7 +2461,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 	 * weight
 	 */
 	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma_weight)
+		if (!pe->dma32_weight)
 			continue;
 
 		/*
@@ -2472,15 +2472,15 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 		if (phb->type == PNV_PHB_IODA1) {
 			unsigned int segs, base = 0;
 
-			if (pe->dma_weight <
-			    dma_weight / phb->ioda.tce32_count)
+			if (pe->dma32_weight <
+			    dma_weight / phb->ioda.dma32_segcount)
 				segs = 1;
 			else
-				segs = (pe->dma_weight *
-					phb->ioda.tce32_count) / dma_weight;
+				segs = (pe->dma32_weight *
+					phb->ioda.dma32_segcount) / dma_weight;
 
 			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
-				pe->dma_weight, segs);
+				pe->dma32_weight, segs);
 			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
 
 			base += segs;
@@ -3211,7 +3211,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
+	phb->ioda.dma32_segcount = phb->ioda.m32_pci_base >> 28;
 
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index addd3f7..574fe43 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -61,11 +61,10 @@ struct pnv_ioda_pe {
 	/* "Weight" assigned to the PE for the sake of DMA resource
 	 * allocations
 	 */
-	unsigned int		dma_weight;
+	unsigned int		dma32_weight;
 
 	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
-	int			tce32_seg;
-	int			tce32_segcount;
+	int			dma32_seg;
 	struct iommu_table_group table_group;
 
 	/* 64-bit TCE bypass region */
@@ -181,7 +180,7 @@ struct pnv_phb {
 			unsigned char		pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
-			unsigned long		tce32_count;
+			unsigned long		dma32_segcount;
 
 			/* Sorted list of used PE's, sorted at
 			 * boot for resource allocation purposes
-- 
2.1.0


* [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (8 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  9:31   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE Gavin Shan
                   ` (26 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The original implementation of pnv_ioda_setup_dma() iterates over the
list of PEs and configures their DMA32 space one by one. The function
was designed to be called at PHB fixup time. To support PCI hotplug,
each PE's DMA32 space will be configured in pcibios_setup_bridge(), so
the function has to become PE oriented.

This patch renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and
adds two arguments to it, "struct pnv_ioda_pe *pe" and "unsigned int
base"; it now returns the number of DMA32 segments assigned to the PE.
The caller, pnv_pci_ioda_setup_DMA(), walks the PE list and passes each
PE either to it (IODA1) or to pnv_pci_ioda2_setup_dma_pe() (IODA2). The
patch shouldn't cause any behavioural changes.
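
The resulting split of responsibilities can be sketched as follows: a
per-PE IODA1 helper reports how many segments it consumed, the caller
advances the base across all PEs of one PHB, and IODA2 PEs are given
the whole 32-bit space. The types and helpers (enum phb_type, struct
pe_sketch, ioda1_setup_one(), setup_dma_all()) are hypothetical, and
each PE's segment count is assumed to have been computed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PNV_PHB_IODA1/PNV_PHB_IODA2. */
enum phb_type { PHB_IODA1, PHB_IODA2 };

struct pe_sketch {
	unsigned int want;   /* precomputed DMA32 segment count; 0 => skip */
	unsigned int base;   /* first assigned segment (IODA1 only) */
	unsigned int segs;   /* number of assigned segments (IODA1 only) */
	bool whole_space;    /* IODA2: PE owns the whole 32-bit space */
};

/* Configure one IODA1 PE at 'base' and return the segments it consumed. */
static unsigned int ioda1_setup_one(struct pe_sketch *pe, unsigned int base)
{
	pe->base = base;
	pe->segs = pe->want;
	return pe->segs;
}

/* Walk one PHB's PEs; 'base' is initialised once per PHB, not per PE. */
static void setup_dma_all(enum phb_type type, struct pe_sketch *pes, size_t n)
{
	unsigned int base = 0;

	assert(pes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!pes[i].want)
			continue;

		switch (type) {
		case PHB_IODA1:
			base += ioda1_setup_one(&pes[i], base);
			break;
		case PHB_IODA2:
			pes[i].whole_space = true;
			break;
		}
	}
}
```

Keeping the base in the caller means the per-PE helper needs no
knowledge of other PEs, which is what a later hotplug path would need.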

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++++++++++++++----------------
 1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8456f37..cd22002 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
 
-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
+static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
+					struct pnv_ioda_pe *pe,
+					unsigned int base)
 {
 	struct pci_controller *hose = phb->hose;
-	struct pnv_ioda_pe *pe;
-	unsigned int dma_weight;
+	unsigned int dma_weight, segs;
 
 	/* Calculate the PHB's DMA weight */
 	dma_weight = pnv_ioda_phb_dma_weight(phb);
 	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
 		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
 
-	pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-	/* Walk our PE list and configure their DMA segments, hand them
-	 * out one base segment plus any residual segments based on
-	 * weight
-	 */
-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-		if (!pe->dma32_weight)
-			continue;
-
-		/*
-		 * For IODA2 compliant PHB3, we needn't care about the weight.
-		 * The all available 32-bits DMA space will be assigned to
-		 * the specific PE.
-		 */
-		if (phb->type == PNV_PHB_IODA1) {
-			unsigned int segs, base = 0;
-
-			if (pe->dma32_weight <
-			    dma_weight / phb->ioda.dma32_segcount)
-				segs = 1;
-			else
-				segs = (pe->dma32_weight *
-					phb->ioda.dma32_segcount) / dma_weight;
-
-			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
-				pe->dma32_weight, segs);
-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+	if (pe->dma32_weight <
+	    dma_weight / phb->ioda.dma32_segcount)
+		segs = 1;
+	else
+		segs = (pe->dma32_weight *
+			phb->ioda.dma32_segcount) / dma_weight;
+	pe_info(pe, "DMA weight %d, assigned %d segments\n",
+		pe->dma32_weight, segs);
+	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
 
-			base += segs;
-		} else {
-			pe_info(pe, "Assign DMA32 space\n");
-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
-		}
-	}
+	return segs;
 }
 
 #ifdef CONFIG_PCI_MSI
@@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void)
 {
 	struct pci_controller *hose, *tmp;
 	struct pnv_phb *phb;
+	struct pnv_ioda_pe *pe;
+	unsigned int base;
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pnv_ioda_setup_dma(hose->private_data);
+		phb = hose->private_data;
+		pnv_pci_ioda_setup_opal_tce_kill(phb);
+
+		base = 0;
+		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
+			if (!pe->dma32_weight)
+				continue;
+
+			switch (phb->type) {
+			case PNV_PHB_IODA1:
+				base += pnv_ioda1_setup_dma(phb, pe, base);
+				break;
+			case PNV_PHB_IODA2:
+				pnv_pci_ioda2_setup_dma_pe(phb, pe);
+				break;
+			default:
+				pr_warn("%s: No DMA for PHB type %d\n",
+					__func__, phb->type);
+			}
+		}
 
 		/* Mark the PHB initialization done */
-		phb = hose->private_data;
 		phb->initialized = 1;
 	}
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (9 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10  9:43   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 13/42] powerpc/pci: Cleanup on pci_controller_ops Gavin Shan
                   ` (25 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

On P7IOC, the whole DMA32 space is divided evenly into 256MB segments.
Each PE can consume one or more DMA32 segments. The current code
doesn't track which DMA32 segments are available, nor which ones are
consumed by a particular PE, which conflicts with PCI hotplug.

This patch introduces a bitmap to the PHB to track the DMA32 segments
available for allocation, and a field (dma32_segcount) to "struct
pnv_ioda_pe" to track the DMA32 segments consumed by the PE, so that
they can be released when the PE is destroyed at PCI unplug time.
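
A user-space sketch of the allocation scheme: find a run of free
segments, shrink the request until a run fits, and clear the bits again
when the PE goes away. The kernel code uses bitmap_find_next_zero_area(),
bitmap_set() and bitmap_clear(); here they are approximated with a single
64-bit word and the hypothetical helpers range_mask(), find_zero_area(),
alloc_segs() and free_segs(), so at most 64 segments are supported.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [start, start + len) set; requires 1 <= len and start + len <= 64. */
static uint64_t range_mask(unsigned int start, unsigned int len)
{
	uint64_t bits;

	assert(len >= 1 && start + len <= 64);
	bits = (len == 64) ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
	return bits << start;
}

/* First start of 'len' clear bits within the low 'size' bits, or 'size'. */
static unsigned int find_zero_area(uint64_t map, unsigned int size,
				   unsigned int len)
{
	for (unsigned int start = 0; start + len <= size; start++)
		if (!(map & range_mask(start, len)))
			return start;
	return size;
}

/*
 * Try to reserve 'segs' segments, shrinking the request until a free
 * run fits. Returns the number reserved (0 if none); start goes in *base.
 */
static unsigned int alloc_segs(uint64_t *map, unsigned int size,
			       unsigned int segs, unsigned int *base)
{
	assert(size <= 64);
	for (; segs > 0; segs--) {
		unsigned int start = find_zero_area(*map, size, segs);

		if (start < size) {
			*map |= range_mask(start, segs);
			*base = start;
			return segs;
		}
	}
	return 0;
}

/* Release a PE's segments, e.g. when the PE is destroyed on unplug. */
static void free_segs(uint64_t *map, unsigned int base, unsigned int segs)
{
	if (segs)
		*map &= ~range_mask(base, segs);
}
```

With 8 segments, a request for 5 gets segments 0-4; a second request for
5 shrinks to the 3 remaining segments 5-7; once the first PE is freed,
segments 0-4 become available again.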

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++++++++++--------
 arch/powerpc/platforms/powernv/pci.h      |  4 +++-
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cd22002..57ba8fd 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1946,6 +1946,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 	/* Grab a 32-bit TCE table */
 	pe->dma32_seg = base;
+	pe->dma32_segcount = segs;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 		(base << 28), ((base + segs) << 28) - 1);
 
@@ -2006,8 +2007,13 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->dma32_seg >= 0)
+	if (pe->dma32_seg >= 0) {
+		bitmap_clear(phb->ioda.dma32_segmap,
+			     pe->dma32_seg, pe->dma32_segcount);
 		pe->dma32_seg = -1;
+		pe->dma32_segcount = 0;
+	}
+
 	if (tce_mem)
 		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 	if (tbl) {
@@ -2443,12 +2449,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
 
-static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
-					struct pnv_ioda_pe *pe,
-					unsigned int base)
+static void pnv_ioda1_setup_dma(struct pnv_phb *phb,
+					struct pnv_ioda_pe *pe)
 {
 	struct pci_controller *hose = phb->hose;
-	unsigned int dma_weight, segs;
+	unsigned int dma_weight, base, segs;
 
 	/* Calculate the PHB's DMA weight */
 	dma_weight = pnv_ioda_phb_dma_weight(phb);
@@ -2461,11 +2466,28 @@ static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
 	else
 		segs = (pe->dma32_weight *
 			phb->ioda.dma32_segcount) / dma_weight;
+
+	/*
+	 * Allocate DMA32 segments. We might not have enough
+	 * resources available. However we expect at least one
+	 * to be available.
+	 */
+	do {
+		base = bitmap_find_next_zero_area(phb->ioda.dma32_segmap,
+						  phb->ioda.dma32_segcount,
+						  0, segs, 0);
+		if (base < phb->ioda.dma32_segcount) {
+			bitmap_set(phb->ioda.dma32_segmap, base, segs);
+			break;
+		}
+	} while (--segs);
+
+	if (WARN_ON(!segs))
+		return;
+
 	pe_info(pe, "DMA weight %d, assigned %d segments\n",
 		pe->dma32_weight, segs);
 	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
-
-	return segs;
 }
 
 #ifdef CONFIG_PCI_MSI
@@ -2933,20 +2955,18 @@ static void pnv_pci_ioda_setup_DMA(void)
 	struct pci_controller *hose, *tmp;
 	struct pnv_phb *phb;
 	struct pnv_ioda_pe *pe;
-	unsigned int base;
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 		pnv_pci_ioda_setup_opal_tce_kill(phb);
 
-		base = 0;
 		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
 			if (!pe->dma32_weight)
 				continue;
 
 			switch (phb->type) {
 			case PNV_PHB_IODA1:
-				base += pnv_ioda1_setup_dma(phb, pe, base);
+				pnv_ioda1_setup_dma(phb, pe);
 				break;
 			case PNV_PHB_IODA2:
 				pnv_pci_ioda2_setup_dma_pe(phb, pe);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 574fe43..1dc9578 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -65,6 +65,7 @@ struct pnv_ioda_pe {
 
 	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
 	int			dma32_seg;
+	int			dma32_segcount;
 	struct iommu_table_group table_group;
 
 	/* 64-bit TCE bypass region */
@@ -153,10 +154,11 @@ struct pnv_phb {
 			unsigned int		io_segsize;
 			unsigned int		io_pci_base;
 
-			/* IO, M32, M64 segment maps */
+			/* IO, M32, M64, DMA32 segment maps */
 			unsigned long		io_segmap[8];
 			unsigned long		m32_segmap[8];
 			unsigned long		m64_segmap[8];
+			unsigned long		dma32_segmap[8];
 
 			/* PE allocation */
 			struct mutex		pe_alloc_mutex;
-- 
2.1.0

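The allocation policy in the patch above can be tried outside the kernel. It searches phb->ioda.dma32_segmap for a run of segs free segments and, if no run fits, retries with one segment fewer. On the failure path it clears the bits again. What follows is a minimal user-space sketch built on these assumptions: a plain bool array stands in for the kernel bitmap, find_free_run() is a simplified stand-in for bitmap_find_next_zero_area(), and seg_map, alloc_segs(), free_segs() and SEG_COUNT are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define SEG_COUNT 16	/* assumed number of DMA32 segments on the PHB */

/* Hypothetical stand-in for phb->ioda.dma32_segmap. */
struct seg_map {
	bool used[SEG_COUNT];
};

/* Lowest start of 'nr' consecutive free segments, or SEG_COUNT if none
 * (bitmap_find_next_zero_area() likewise returns a value >= size on failure). */
static unsigned int find_free_run(const struct seg_map *map, unsigned int nr)
{
	unsigned int start, i;

	for (start = 0; start + nr <= SEG_COUNT; start++) {
		for (i = 0; i < nr; i++)
			if (map->used[start + i])
				break;
		if (i == nr)
			return start;
	}
	return SEG_COUNT;
}

/* Ask for 'want' segments and shrink the request by one after each failed
 * search, like the do { ... } while (--segs) loop. Returns the number of
 * segments claimed (0 if none) and stores the first one in *base. */
static unsigned int alloc_segs(struct seg_map *map, unsigned int want,
			       unsigned int *base)
{
	unsigned int segs, b, i;

	for (segs = want; segs > 0; segs--) {
		b = find_free_run(map, segs);
		if (b < SEG_COUNT) {
			for (i = 0; i < segs; i++)
				map->used[b + i] = true;
			*base = b;
			return segs;
		}
	}
	return 0;
}

/* Give segments back, as the bitmap_clear() on the fail: path does. */
static void free_segs(struct seg_map *map, unsigned int base, unsigned int segs)
{
	unsigned int i;

	assert(base + segs <= SEG_COUNT);
	for (i = 0; i < segs; i++)
		map->used[base + i] = false;
}
```

For example, suppose 10 of 16 segments are already taken. A second request for 10 then ends up with the 6 remaining segments, starting at segment 10. A third request gets nothing, which is the case the patch reports with WARN_ON(!segs).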

* [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-06  4:11   ` Gavin Shan
       [not found]     ` <1438834307-26960-13-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2015-08-06  4:11   ` [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB Gavin Shan
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Each PHB maintains an array that translates an RID (Requester
ID) to a PE#. The array assumes that a PE# fits in 8 bits, which
means we can't have more than 256 PEs. However, pci_dn->pe_number
already uses 4 bytes for the PE#.

The patch widens each entry of phb->ioda.pe_rmap[] to 4 bytes so
that the PE# capacity matches. IODA_INVALID_PE can then be used to
check whether an entry in phb->ioda.pe_rmap[] is valid.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++--
 arch/powerpc/platforms/powernv/pci.h      | 7 +++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 57ba8fd..3094c61 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	/* Clear the reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = 0;
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
 	/* Release from all parents PELT-V */
 	while (parent) {
@@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	unsigned long size, pemap_off;
 	const __be64 *prop64;
 	const __be32 *prop32;
-	int len;
+	int len, i;
 	u64 phb_id;
 	void *aux;
 	long rc;
@@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (prop32)
 		phb->ioda.reserved_pe = be32_to_cpup(prop32);
 
+	/* Invalidate RID to PE# mapping */
+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
+
 	/* Parse 64-bit MMIO range */
 	pnv_ioda_parse_m64_window(phb);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1dc9578..6f8568e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -175,11 +175,10 @@ struct pnv_phb {
 			struct list_head	pe_list;
 			struct mutex            pe_list_mutex;
 
-			/* Reverse map of PEs, will have to extend if
-			 * we are to support more than 256 PEs, indexed
-			 * bus { bus, devfn }
+			/* Reverse map of PEs, indexed by
+			 * { bus, devfn }
 			 */
-			unsigned char		pe_rmap[0x10000];
+			int			pe_rmap[0x10000];
 
 			/* 32-bit TCE tables allocation */
 			unsigned long		dma32_segcount;
-- 
2.1.0

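This patch makes two changes. First, an entry can now hold a PE# above 255. Second, an unused entry is marked with IODA_INVALID_PE instead of 0, so PE#0 is no longer indistinguishable from an empty slot. A minimal user-space sketch of such a reverse map follows. INVALID_PE stands in for IODA_INVALID_PE, and its value of -1 is an assumption here, as are the helper names make_rid(), rmap_init(), rmap_set() and rmap_lookup(). The index follows the usual Requester ID layout: bus number in bits 15..8 and devfn in bits 7..0.

```c
#include <assert.h>

#define INVALID_PE	(-1)		/* assumed value of IODA_INVALID_PE */
#define RMAP_SIZE	0x10000		/* one entry per { bus, devfn } */

static int pe_rmap[RMAP_SIZE];		/* int, not unsigned char */

/* Requester ID: bus number in bits 15..8, devfn in bits 7..0. */
static unsigned int make_rid(unsigned int bus, unsigned int devfn)
{
	assert(bus <= 0xff && devfn <= 0xff);
	return (bus << 8) | devfn;
}

/* Mark every RID as unmapped, like the loop added to pnv_pci_init_ioda_phb(). */
static void rmap_init(void)
{
	unsigned int i;

	for (i = 0; i < RMAP_SIZE; i++)
		pe_rmap[i] = INVALID_PE;
}

static void rmap_set(unsigned int bus, unsigned int devfn, int pe_no)
{
	pe_rmap[make_rid(bus, devfn)] = pe_no;
}

static int rmap_lookup(unsigned int bus, unsigned int devfn)
{
	return pe_rmap[make_rid(bus, devfn)];
}
```

With the old unsigned char entries, storing PE#300 would silently truncate to 44 (300 mod 256). A lookup returning 0 also could not tell PE#0 apart from an unmapped RID.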

* [PATCH v6 13/42] powerpc/pci: Cleanup on pci_controller_ops
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (10 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 14/42] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan, Daniel Axtens

Each PHB maintains one instance of "struct pci_controller_ops",
which contains various callbacks invoked by the PCI subsystem. In
the definition of this struct, some callbacks name their arguments
explicitly, but the others don't.

This adds explicit argument names to all callbacks in
"struct pci_controller_ops" so that the code looks consistent.

Cc: Daniel Axtens <dja@axtens.net>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c927d5b..d627abf 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -21,18 +21,19 @@ struct pci_controller_ops {
 	void		(*dma_dev_setup)(struct pci_dev *dev);
 	void		(*dma_bus_setup)(struct pci_bus *bus);
 
-	int		(*probe_mode)(struct pci_bus *);
+	int		(*probe_mode)(struct pci_bus *bus);
 
 	/* Called when pci_enable_device() is called. Returns true to
 	 * allow assignment/enabling of the device. */
-	bool		(*enable_device_hook)(struct pci_dev *);
+	bool		(*enable_device_hook)(struct pci_dev *dev);
 
-	void		(*disable_device)(struct pci_dev *);
+	void		(*disable_device)(struct pci_dev *dev);
 
-	void		(*release_device)(struct pci_dev *);
+	void		(*release_device)(struct pci_dev *dev);
 
 	/* Called during PCI resource reassignment */
-	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
+	resource_size_t (*window_alignment)(struct pci_bus *bus,
+					    unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 
 #ifdef CONFIG_PCI_MSI
@@ -43,7 +44,7 @@ struct pci_controller_ops {
 
 	int             (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
 
-	void		(*shutdown)(struct pci_controller *);
+	void		(*shutdown)(struct pci_controller *hose);
 };
 
 /*
-- 
2.1.0

* [PATCH v6 14/42] powerpc/pci: Override pcibios_setup_bridge()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (11 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 13/42] powerpc/pci: Cleanup on pci_controller_ops Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration Gavin Shan
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch overrides pcibios_setup_bridge(), which is called to update
PCI bridge windows once PCI resource assignment completes. The next
patch uses it to assign PEs and set up the various resource mappings.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h | 2 ++
 arch/powerpc/kernel/pci-common.c      | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index d627abf..65357a9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -34,6 +34,8 @@ struct pci_controller_ops {
 	/* Called during PCI resource reassignment */
 	resource_size_t (*window_alignment)(struct pci_bus *bus,
 					    unsigned long type);
+	void		(*setup_bridge)(struct pci_bus *bus,
+					unsigned long type);
 	void		(*reset_secondary_bus)(struct pci_dev *dev);
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index b9de34d..9c88dcd1 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -123,6 +123,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 	return 1;
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+
+	if (hose->controller_ops.setup_bridge)
+		hose->controller_ops.setup_bridge(bus, type);
+}
+
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
 	struct pci_controller *phb = pci_bus_to_host(dev->bus);
-- 
2.1.0

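The hook added here follows a common pattern. The generic code calls pcibios_setup_bridge() (introduced in PATCH 01/42 of this series). The powerpc version forwards the call only when the PHB has installed a setup_bridge callback in its pci_controller_ops, so controllers without one do nothing extra. A minimal user-space sketch of that dispatch follows. struct controller, struct controller_ops, setup_bridge() and demo_setup_bridge() are hypothetical stand-ins for the kernel types and functions.

```c
#include <assert.h>
#include <stddef.h>

struct bus;	/* opaque stand-in for struct pci_bus */

/* Trimmed-down stand-in for struct pci_controller_ops. */
struct controller_ops {
	void (*setup_bridge)(struct bus *bus, unsigned long type);
};

/* Trimmed-down stand-in for struct pci_controller. */
struct controller {
	struct controller_ops controller_ops;
};

/* Forward to the platform callback only when one is installed. */
static void setup_bridge(struct controller *hose, struct bus *bus,
			 unsigned long type)
{
	if (hose->controller_ops.setup_bridge)
		hose->controller_ops.setup_bridge(bus, type);
}

/* Example platform callback that just records what it was asked to do. */
static int calls;
static unsigned long last_type;

static void demo_setup_bridge(struct bus *bus, unsigned long type)
{
	(void)bus;
	calls++;
	last_type = type;
}
```

In the kernel, the controller is looked up with pci_bus_to_host(bus) rather than passed in. The sketch passes it explicitly to stay self-contained. The type value used when exercising it (for example 0x200) is arbitrary.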

* [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (12 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 14/42] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10 10:02   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 16/42] powerpc/powernv: Helper function pnv_ioda_init_pe() Gavin Shan
                   ` (22 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Several functions used to configure a PE take a pe_number to identify
the PE instance. Since the pe_number is stored in the PE instance once
the PE is reserved or allocated, it's more convenient for those
functions to return the PE instance itself, which carries the pe_number.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 51 ++++++++++++++++---------------
 arch/powerpc/platforms/powernv/pci.h      |  2 +-
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3094c61..9f53682 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -132,12 +132,12 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
-static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
+static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
 	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
 		pr_warn("%s: Invalid PE %d on PHB#%x\n",
 			__func__, pe_no, phb->hose->global_number);
-		return;
+		return NULL;
 	}
 
 	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc))
@@ -146,9 +146,11 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 
 	phb->ioda.pe_array[pe_no].phb = phb;
 	phb->ioda.pe_array[pe_no].pe_number = pe_no;
+
+	return &phb->ioda.pe_array[pe_no];
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
 
@@ -156,12 +158,12 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
 					phb->ioda.total_pe, 0);
 		if (pe >= phb->ioda.total_pe)
-			return IODA_INVALID_PE;
+			return NULL;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
 	phb->ioda.pe_array[pe].phb = phb;
 	phb->ioda.pe_array[pe].pe_number = pe;
-	return pe;
+	return &phb->ioda.pe_array[pe];
 }
 
 static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
@@ -334,7 +336,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
 	}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -344,7 +346,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 
 	/* Root bus shouldn't use M64 */
 	if (pci_is_root_bus(bus))
-		return IODA_INVALID_PE;
+		return NULL;
 
 	/* Allocate bitmap */
 	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
@@ -352,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
 			__func__);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/* Figure out reserved PE numbers by the PE */
@@ -365,7 +367,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
 		kfree(pe_alloc);
-		return IODA_INVALID_PE;
+		return NULL;
 	}
 
 	/*
@@ -416,7 +418,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	}
 
 	kfree(pe_alloc);
-	return master_pe->pe_number;
+	return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -1069,28 +1071,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  * subordinate PCI devices and buses. The second type of PE is normally
  * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
  */
-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
-	int pe_num = IODA_INVALID_PE;
+	struct pnv_ioda_pe *pe = NULL;
 
 	/* Check if PE is determined by M64 */
 	if (phb->pick_m64_pe)
-		pe_num = phb->pick_m64_pe(bus, all);
+		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
-	if (pe_num == IODA_INVALID_PE)
-		pe_num = pnv_ioda_alloc_pe(phb);
+	if (!pe)
+		pe = pnv_ioda_alloc_pe(phb);
 
-	if (pe_num == IODA_INVALID_PE) {
-		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
+	if (!pe) {
+		pr_warning("%s: No enough PE# for PCI bus %04x:%02x\n",
 			__func__, pci_domain_nr(bus), bus->number);
-		return;
+		return NULL;
 	}
 
-	pe = &phb->ioda.pe_array[pe_num];
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
@@ -1101,17 +1101,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	if (all)
 		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
-			bus->busn_res.start, bus->busn_res.end, pe_num);
+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
 	else
 		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
-			bus->busn_res.start, pe_num);
+			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
 		/* XXX What do we do here ? */
-		if (pe_num)
-			pnv_ioda_free_pe(phb, pe_num);
+		pnv_ioda_free_pe(phb, pe->pe_number);
 		pe->pbus = NULL;
-		return;
+		return NULL;
 	}
 
 	/* Associate it with all child devices */
@@ -1122,6 +1121,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
 	/* Link the PE */
 	pnv_ioda_link_pe_by_weight(phb, pe);
+
+	return pe;
 }
 
 static void pnv_ioda_setup_PEs(struct pci_bus *bus)
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 6f8568e..c0bc57f 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -121,7 +121,7 @@ struct pnv_phb {
 	int (*init_m64)(struct pnv_phb *phb);
 	void (*reserve_m64_pe)(struct pci_bus *bus,
 			       unsigned long *pe_bitmap, bool all);
-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
+	struct pnv_ioda_pe* (*pick_m64_pe)(struct pci_bus *bus, bool all);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
 	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
-- 
2.1.0

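After this patch, the allocators return a pointer into phb->ioda.pe_array, with NULL meaning failure. Previously they returned a PE# that every caller had to compare against IODA_INVALID_PE and convert with &phb->ioda.pe_array[pe_num]. The fallback in pnv_ioda_setup_bus_PE() then reads as "take the M64-pinned PE if there is one, otherwise allocate". A minimal user-space sketch of that shape follows. struct phb, struct pe, alloc_pe(), setup_pe(), pick_none() and TOTAL_PE are hypothetical simplifications, and a byte array replaces the PE allocation bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_PE 4	/* assumed, kept small for the example */

struct pe {
	int pe_number;
};

struct phb {
	unsigned char pe_alloc[TOTAL_PE];	/* 1 = PE# in use */
	struct pe pe_array[TOTAL_PE];
};

typedef struct pe *(*pick_fn)(struct phb *phb);

/* Claim the lowest free PE# and return its instance, or NULL if none. */
static struct pe *alloc_pe(struct phb *phb)
{
	int i;

	for (i = 0; i < TOTAL_PE; i++) {
		if (!phb->pe_alloc[i]) {
			phb->pe_alloc[i] = 1;
			phb->pe_array[i].pe_number = i;
			return &phb->pe_array[i];
		}
	}
	return NULL;
}

/* Same shape as the patched pnv_ioda_setup_bus_PE(): an optional picker
 * (the M64 case) gets the first chance, then the generic allocator. */
static struct pe *setup_pe(struct phb *phb, pick_fn pick)
{
	struct pe *pe = NULL;

	if (pick)
		pe = pick(phb);
	if (!pe)
		pe = alloc_pe(phb);
	return pe;	/* NULL: no PE# available */
}

/* A picker that never pins a PE, e.g. a bus without M64 resources. */
static struct pe *pick_none(struct phb *phb)
{
	(void)phb;
	return NULL;
}
```

Returning the instance means callers can read pe->pe_number for log messages and pass pe straight to the configuration helpers. A NULL check replaces the sentinel comparison.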

* [PATCH v6 16/42] powerpc/powernv: Helper function pnv_ioda_init_pe()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (13 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order Gavin Shan
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch introduces the helper function pnv_ioda_init_pe(), which
initializes the PE instance after its PE# has been reserved or
allocated, to simplify the code. The patch doesn't introduce
behavioural changes.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9f53682..9cccf2d5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -132,6 +132,17 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
+{
+	struct pnv_ioda_pe *pe = &phb->ioda.pe_array[pe_no];
+
+	pe->phb = phb;
+	pe->pe_number = pe_no;
+	INIT_LIST_HEAD(&pe->list);
+
+	return pe;
+}
+
 static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
 	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
@@ -144,10 +155,7 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 		pr_debug("%s: PE %d was reserved on PHB#%x\n",
 			 __func__, pe_no, phb->hose->global_number);
 
-	phb->ioda.pe_array[pe_no].phb = phb;
-	phb->ioda.pe_array[pe_no].pe_number = pe_no;
-
-	return &phb->ioda.pe_array[pe_no];
+	return pnv_ioda_init_pe(phb, pe_no);
 }
 
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
@@ -161,9 +169,7 @@ static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 			return NULL;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
-	phb->ioda.pe_array[pe].phb = phb;
-	phb->ioda.pe_array[pe].pe_number = pe;
-	return &phb->ioda.pe_array[pe];
+	return pnv_ioda_init_pe(phb, pe);
 }
 
 static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
-- 
2.1.0

* [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2015-08-06  4:11   ` [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  2015-08-10 14:21     ` Alexey Kardashevskiy
  2015-08-06  4:11   ` [PATCH v6 28/42] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus() Gavin Shan
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This renames the PE#-related fields in "struct pnv_phb" (total_pe
to total_pe_num, reserved_pe to reserved_pe_idx) to better reflect
their usage, as Alexey suggested. It doesn't introduce behavioural
changes.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c    | 58 ++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.c         |  2 +-
 arch/powerpc/platforms/powernv/pci.h         |  4 +-
 4 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e5e0d0b..347b1cf 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -81,7 +81,7 @@ static int pnv_eeh_init(void)
 		 * and P7IOC separately. So we should regard
 		 * PE#0 as valid for P7IOC.
 		 */
-		if (phb->ioda.reserved_pe != 0)
+		if (phb->ioda.reserved_pe_idx != 0)
 			eeh_add_flag(EEH_VALID_PE_ZERO);
 
 		break;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9cccf2d5..56b058c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -145,7 +145,7 @@ static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 
 static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
-	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
 		pr_warn("%s: Invalid PE %d on PHB#%x\n",
 			__func__, pe_no, phb->hose->global_number);
 		return NULL;
@@ -164,8 +164,8 @@ static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 
 	do {
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe, 0);
-		if (pe >= phb->ioda.total_pe)
+					phb->ioda.total_pe_num, 0);
+		if (pe >= phb->ioda.total_pe_num)
 			return NULL;
 	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
@@ -188,7 +188,7 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 	/* There are as many M64 segments as the maximum number
 	 * of PEs, which is 128.
 	 */
-	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
+	for (seg = 0; seg < phb->ioda.total_pe_num; seg += 8) {
 		unsigned long base;
 		int64_t rc;
 
@@ -223,13 +223,13 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 	 * MMIO range that the PHB supports.
 	 */
 	r = &phb->hose->mem_resources[1];
-	if (phb->ioda.reserved_pe == 0)
+	if (phb->ioda.reserved_pe_idx == 0)
 		r->start += phb->ioda.m64_segsize;
-	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
 		r->end -= phb->ioda.m64_segsize;
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-			phb->ioda.reserved_pe);
+			phb->ioda.reserved_pe_idx);
 
 	return 0;
 
@@ -280,13 +280,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 	 * expected to be 0 or last one of PE capabicity.
 	 */
 	r = &phb->hose->mem_resources[1];
-	if (phb->ioda.reserved_pe == 0)
+	if (phb->ioda.reserved_pe_idx == 0)
 		r->start += phb->ioda.m64_segsize;
-	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+	else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
 		r->end -= phb->ioda.m64_segsize;
 	else
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-			phb->ioda.reserved_pe);
+			phb->ioda.reserved_pe_idx);
 
 	return 0;
 
@@ -355,7 +355,7 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 		return NULL;
 
 	/* Allocate bitmap */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	pe_alloc = kzalloc(size, GFP_KERNEL);
 	if (!pe_alloc) {
 		pr_warn("%s: Out of memory !\n",
@@ -371,7 +371,7 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	 * contributed by its child buses. For the case, we needn't
 	 * pick M64 dependent PE#.
 	 */
-	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
+	if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
 		kfree(pe_alloc);
 		return NULL;
 	}
@@ -382,8 +382,8 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 	 */
 	master_pe = NULL;
 	i = -1;
-	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
-		phb->ioda.total_pe) {
+	while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
+		phb->ioda.total_pe_num) {
 		pe = &phb->ioda.pe_array[i];
 
 		if (!master_pe) {
@@ -461,7 +461,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
 	hose->mem_offset[1] = res->start - pci_addr;
 
 	phb->ioda.m64_size = resource_size(res);
-	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
+	phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
 	phb->ioda.m64_base = pci_addr;
 
 	pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
@@ -571,7 +571,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 	s64 rc;
 
 	/* Sanity check on PE number */
-	if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
+	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
 		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
 
 	/*
@@ -1567,9 +1567,9 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		/* Calculate available PE for required VFs */
 		mutex_lock(&phb->ioda.pe_alloc_mutex);
 		pdn->offset = bitmap_find_next_zero_area(
-			phb->ioda.pe_alloc, phb->ioda.total_pe,
+			phb->ioda.pe_alloc, phb->ioda.total_pe_num,
 			0, num_vfs, 0);
-		if (pdn->offset >= phb->ioda.total_pe) {
+		if (pdn->offset >= phb->ioda.total_pe_num) {
 			mutex_unlock(&phb->ioda.pe_alloc_mutex);
 			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
 			pdn->offset = 0;
@@ -2803,7 +2803,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 	pdn->m64_per_iov = 1;
-	mul = phb->ioda.total_pe;
+	mul = phb->ioda.total_pe_num;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -2889,7 +2889,7 @@ static int pnv_ioda_setup_one_res(struct pci_controller *hose,
 	region.start = _ALIGN_DOWN(region.start, segsize);
 	region.end   = _ALIGN_UP(region.end, segsize);
 	index = region.start / segsize;
-	while (index < phb->ioda.total_pe &&
+	while (index < phb->ioda.total_pe_num &&
 	       region.start < region.end) {
 		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
 				pe->pe_number, win, 0, index);
@@ -3200,13 +3200,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		pr_err("  Failed to map registers !\n");
 
 	/* Initialize more IODA stuff */
-	phb->ioda.total_pe = 1;
+	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
 	if (prop32)
-		phb->ioda.total_pe = be32_to_cpup(prop32);
+		phb->ioda.total_pe_num = be32_to_cpup(prop32);
 	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
 	if (prop32)
-		phb->ioda.reserved_pe = be32_to_cpup(prop32);
+		phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
 	/* Invalidate RID to PE# mapping */
 	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
@@ -3219,20 +3219,20 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	/* FW Has already off top 64k of M32 space (MSI space) */
 	phb->ioda.m32_size += 0x10000;
 
-	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe;
+	phb->ioda.m32_segsize = phb->ioda.m32_size / phb->ioda.total_pe_num;
 	phb->ioda.m32_pci_base = hose->mem_resources[0].start - hose->mem_offset[0];
 	phb->ioda.io_size = hose->pci_io_size;
-	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe;
+	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
 
 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
-	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
 	pemap_off = size;
-	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
+	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
+	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -3251,7 +3251,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 #endif
 
 	pr_info("  %03d (%03d) PE's M32: 0x%x [segment=0x%x]\n",
-		phb->ioda.total_pe, phb->ioda.reserved_pe,
+		phb->ioda.total_pe_num, phb->ioda.reserved_pe_idx,
 		phb->ioda.m32_size, phb->ioda.m32_segsize);
 	if (phb->ioda.m64_size)
 		pr_info("                 M64: 0x%lx [segment=0x%lx]\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index f3aead0..6c350a2 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -384,7 +384,7 @@ static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 		if (phb->type == PNV_PHB_P5IOC2)
 			pe_no = 0;
 		else
-			pe_no = phb->ioda.reserved_pe;
+			pe_no = phb->ioda.reserved_pe_idx;
 	}
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c0bc57f..fc899cd 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -134,8 +134,8 @@ struct pnv_phb {
 
 		struct {
 			/* Global bridge info */
-			unsigned int		total_pe;
-			unsigned int		reserved_pe;
+			unsigned int		total_pe_num;
+			unsigned int		reserved_pe_idx;
 
 			/* 32-bit MMIO window */
 			unsigned int		m32_size;
-- 
2.1.0

* [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (14 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 16/42] powerpc/powernv: Helper function pnv_ioda_init_pe() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10 14:39   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 19/42] powerpc/powernv: Reserve PE# for root bus Gavin Shan
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The available PE#s, represented by a bitmap in the PHB, are allocated
in ascending order. This conflicts with M64 segments, which are
assigned in the same order. To avoid the conflict, this patch
allocates PE#s in descending order.
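A minimal user-space sketch of descending allocation, assuming a single-threaded caller. It models the PHB's PE bitmap as a plain `unsigned long` array and scans from the highest PE# down. The names `pe_alloc`, `pe_test_and_set()` and `pe_alloc_descending()`, and the PE count of 256, are hypothetical; the patch itself walks a shrinking `limit` with `find_next_zero_bit()` and claims bits with the atomic `test_and_set_bit()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOTAL_PE_NUM  256  /* hypothetical; a real PHB reads "ibm,opal-num-pes" */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define NO_FREE_PE    (-1)

/* One bit per PE#; a set bit means the PE# is in use. */
static unsigned long pe_alloc[(TOTAL_PE_NUM + BITS_PER_LONG - 1) / BITS_PER_LONG];

/* Non-atomic stand-in for the kernel's test_and_set_bit(). */
static bool pe_test_and_set(unsigned int pe)
{
	unsigned long mask = 1UL << (pe % BITS_PER_LONG);
	unsigned long *word = &pe_alloc[pe / BITS_PER_LONG];
	bool was_set;

	assert(pe < TOTAL_PE_NUM);
	was_set = (*word & mask) != 0;
	*word |= mask;
	return was_set;
}

/* Claim the highest free PE#, leaving the low PE#s to M64 segments. */
static int pe_alloc_descending(void)
{
	for (int pe = TOTAL_PE_NUM - 1; pe >= 0; pe--) {
		if (!pe_test_and_set(pe))
			return pe;
	}
	return NO_FREE_PE;
}
```

Since M64 segments pin PE#s from the bottom, handing out PE#s from the top keeps the two allocators apart for as long as possible.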

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 56b058c..1c950e8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
 	unsigned long pe;
+	unsigned long limit = phb->ioda.total_pe_num - 1;
 
 	do {
 		pe = find_next_zero_bit(phb->ioda.pe_alloc,
-					phb->ioda.total_pe_num, 0);
-		if (pe >= phb->ioda.total_pe_num)
+					phb->ioda.total_pe_num, limit);
+		if (pe < phb->ioda.total_pe_num &&
+		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
+			break;
+
+		if (--limit >= phb->ioda.total_pe_num)
 			return NULL;
-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+	} while (1);
 
 	return pnv_ioda_init_pe(phb, pe);
 }
-- 
2.1.0

* [PATCH v6 19/42] powerpc/powernv: Reserve PE# for root bus
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (15 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 18/42] powerpc/powernv: Allocate PE# in descending order Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically Gavin Shan
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

pcibios_setup_bridge() is normally called to update PCI bridge
windows, and it allocates PEs for PCI buses. However, it is not
called for a root bus, which has no upstream bridge.

This patch reserves a PE# for the root bus in advance. The
subsequent patch uses it to set up the root bus PE.
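The reservation rule and its effect on the M64 window can be sketched as follows. This is a simplified model, not the patch code: it assumes M64 segment i backs PE# i (as the patch's "Strip off the M64 segment corresponding to the PE#" comment implies), that the reserved PE's segment is stripped from whichever end of the window it sits at, and it reuses the name `IODA_INVALID_PE` with an assumed value of -1. `struct m64_window`, `choose_root_pe()` and `usable_m64()` are hypothetical.

```c
#include <assert.h>

#define IODA_INVALID_PE (-1)	/* same role as the kernel constant; value assumed */

/* Inclusive byte range of the M64 window, like struct resource. */
struct m64_window {
	unsigned long long start;
	unsigned long long end;
};

/* Root-bus PE#: kept next to the reserved PE# at the top of the range. */
static int choose_root_pe(int total_pe_num, int reserved_pe_idx)
{
	assert(total_pe_num > 1);
	if (reserved_pe_idx == 0)
		return total_pe_num - 1;
	if (reserved_pe_idx == total_pe_num - 1)
		return reserved_pe_idx - 1;
	return IODA_INVALID_PE;	/* left to dynamic allocation */
}

/*
 * The window starts with one segment per PE#. Strip the reserved PE's
 * segment from its end of the window, then the root PE's segment from
 * the top end.
 */
static struct m64_window usable_m64(unsigned long long base,
				    unsigned long long segsize,
				    int total_pe_num, int reserved_pe_idx)
{
	struct m64_window w = {
		.start = base,
		.end = base + segsize * total_pe_num - 1,
	};

	if (reserved_pe_idx == 0)
		w.start += segsize;
	else if (reserved_pe_idx == total_pe_num - 1)
		w.end -= segsize;

	if (choose_root_pe(total_pe_num, reserved_pe_idx) != IODA_INVALID_PE)
		w.end -= segsize;
	return w;
}
```

With a four-segment window at 0x0 and 0x1000-byte segments, reserved PE#0 gives root PE#3 and a usable range of 0x1000 to 0x2fff, while reserved PE#3 gives root PE#2 and a usable range of 0x0 to 0x1fff.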

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.h      |  1 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1c950e8..8aa6ab8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -236,6 +236,13 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
 
+	/* Strip off the M64 segment corresponding to the PE#
+	 * for PCI root bus, which is last supported PE# or
+	 * (reserved PE# - 1).
+	 */
+	if (phb->ioda.root_pe_idx != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -293,6 +300,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
 			phb->ioda.reserved_pe_idx);
 
+	/* Strip off the M64 segment corresponding to the PE#
+	 * for PCI root bus, which is last supported PE# or
+	 * (reserved PE# - 1).
+	 */
+	if (phb->ioda.root_pe_idx != IODA_INVALID_PE)
+		r->end -= phb->ioda.m64_segsize;
+
 	return 0;
 
 fail:
@@ -3237,7 +3251,21 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	aux = memblock_virt_alloc(size, 0);
 	phb->ioda.pe_alloc = aux;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
+
+	/* Choose the PE# for the root bus, which shouldn't consume
+	 * any M64 resource. We avoid picking a low-end PE#, since
+	 * low-end PE#s are usually bound closely to M64 resources.
+	 */
+	pnv_ioda_reserve_pe(phb, phb->ioda.reserved_pe_idx);
+	if (phb->ioda.reserved_pe_idx == 0) {
+		phb->ioda.root_pe_idx = phb->ioda.total_pe_num - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1)) {
+		phb->ioda.root_pe_idx = phb->ioda.reserved_pe_idx - 1;
+		pnv_ioda_reserve_pe(phb, phb->ioda.root_pe_idx);
+	} else {
+		phb->ioda.root_pe_idx = IODA_INVALID_PE;
+	}
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index fc899cd..e93a489 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -135,6 +135,7 @@ struct pnv_phb {
 		struct {
 			/* Global bridge info */
 			unsigned int		total_pe_num;
+			unsigned int		root_pe_idx;
 			unsigned int		reserved_pe_idx;
 
 			/* 32-bit MMIO window */
-- 
2.1.0

* [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (16 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 19/42] powerpc/powernv: Reserve PE# for root bus Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-14 13:52   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 21/42] powerpc/powernv: Remove DMA32 list of PEs Gavin Shan
                   ` (18 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(), except those consumed by SR-IOV VFs.
That function is called only once, after PCI probing and resource
assignment have finished, which isn't hotplug friendly.

This patch creates PEs dynamically in ppc_md.pcibios_setup_bridge(),
which is called whenever a PCI bridge's windows are updated after
resource assignment or reassignment, both during system boot and
on PCI hotplug. In the partial hotplug case, where not all PCI
devices belonging to a PE are unplugged and plugged again, we only
need to unbind and rebind the affected PCI devices to the existing
PE instead of creating a new one.
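The partial hotplug check relies on the PHB's reverse map from requester ID (RID: bus number in the high byte, devfn in the low byte) to PE#. A minimal sketch, assuming a flat 64K-entry table standing in for `phb->ioda.pe_rmap`; `pe_rmap_bind_bus()` and `existing_bus_pe()` are hypothetical helpers, and `IODA_INVALID_PE` is assumed to be -1.

```c
#include <assert.h>
#include <stdint.h>

#define IODA_INVALID_PE (-1)	/* value assumed */
#define NUM_RIDS	(1 << 16)	/* 16-bit RID: bus (8 bits) | devfn (8 bits) */

/* Hypothetical stand-in for phb->ioda.pe_rmap: RID -> PE#. */
static int pe_rmap[NUM_RIDS];

static void pe_rmap_init(void)
{
	for (int rid = 0; rid < NUM_RIDS; rid++)
		pe_rmap[rid] = IODA_INVALID_PE;
}

/* Record that every RID on @busno belongs to @pe_num. */
static void pe_rmap_bind_bus(uint8_t busno, int pe_num)
{
	assert(pe_num != IODA_INVALID_PE);
	for (int devfn = 0; devfn < 256; devfn++)
		pe_rmap[(busno << 8) | devfn] = pe_num;
}

/*
 * Like the lookup of pe_rmap[bus->number << 8] in this patch: a valid
 * PE# means the bus PE survived a partial hotplug and is reused, so
 * only the devices are rebound; IODA_INVALID_PE means a new PE is needed.
 */
static int existing_bus_pe(uint8_t busno)
{
	return pe_rmap[busno << 8];
}
```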

Besides, when the adapter in a PCI slot is unplugged and a
different one is inserted, the new adapter might require more
resources (e.g. M32) than the windows of its parent bridge hold.
The slot is assumed to sit behind the root port, or behind a
downstream port of the PCIe switch below the root port. The
parent bridge would reject the request to grow its windows,
leading to hotplug failure. To fix that, this patch extends the
windows of the root port, or of the upstream port of the PCIe
switch behind the root port, to the PHB's windows when
ppc_md.pcibios_setup_bridge() is called.
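Which bridges get their windows extended comes down to a depth test. The sketch below mirrors the condition in `pnv_pci_fixup_bridge_resources()` using a hypothetical, stripped-down topology model: `struct bus`, `struct bridge` and `is_root_bus()` are not the kernel's types, and a root bus is modelled simply as a bus with no upstream bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bus;

/* A bridge sits on @bus, its primary (upstream-side) bus. */
struct bridge {
	struct bus *bus;
};

/* @self is the bridge leading to this bus; NULL for the root bus. */
struct bus {
	struct bridge *self;
};

static bool is_root_bus(const struct bus *b)
{
	return b->self == NULL;
}

/*
 * True for the root port (sitting on the root bus) and for a bridge one
 * level below it, such as a PCIe switch's upstream port. The second
 * check only runs when br->bus is not the root bus, so self is valid.
 */
static bool needs_window_fixup(const struct bridge *br)
{
	assert(br && br->bus);
	return is_root_bus(br->bus) || is_root_bus(br->bus->self->bus);
}
```

A root port and a switch upstream port directly below it qualify, while a switch downstream port one level further down does not.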

The root bus has no upstream bridge, so its PE has to be set up
before any other PE is created, because the root bus PE is the
ancestor of all other PEs.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 226 ++++++++++++++++++------------
 arch/powerpc/platforms/powernv/pci.h      |   1 +
 2 files changed, 137 insertions(+), 90 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8aa6ab8..37847a3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1083,6 +1083,13 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
+
+		/* The PCI device might not be detached from the
+		 * PE in the partial hotplug case.
+		 */
+		if (pdn->pe_number != IODA_INVALID_PE)
+			continue;
+
 		pdn->pe_number = pe->pe_number;
 		pe->dma32_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1101,9 +1108,27 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct pnv_ioda_pe *pe = NULL;
+	int pe_num;
+
+	/* For partial hotplug case, the PE instance hasn't been destroyed
+	 * yet. We shouldn't allocate a new one and assign resources to
+	 * it. The existing PE instance should be reused, but we should
+	 * associate the devices to the PE.
+	 */
+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
+	if (pe_num != IODA_INVALID_PE) {
+		pe = &phb->ioda.pe_array[pe_num];
+		pnv_ioda_setup_same_PE(bus, pe);
+		return NULL;
+	}
+
+	/* PE number for root bus should have been reserved */
+	if (pci_is_root_bus(bus) &&
+	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
 
 	/* Check if PE is determined by M64 */
-	if (phb->pick_m64_pe)
+	if (!pe && phb->pick_m64_pe)
 		pe = phb->pick_m64_pe(bus, all);
 
 	/* The PE number isn't pinned by M64 */
@@ -1150,46 +1175,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	return pe;
 }
 
-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	pnv_ioda_setup_bus_PE(bus, false);
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->subordinate) {
-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
-				pnv_ioda_setup_bus_PE(dev->subordinate, true);
-			else
-				pnv_ioda_setup_PEs(dev->subordinate);
-		}
-	}
-}
-
-/*
- * Configure PEs so that the downstream PCI buses and devices
- * could have their associated PE#. Unfortunately, we didn't
- * figure out the way to identify the PLX bridge yet. So we
- * simply put the PCI bus and the subordinate behind the root
- * port to PE# here. The game rule here is expected to be changed
- * as soon as we can detected PLX bridge correctly.
- */
-static void pnv_pci_ioda_setup_PEs(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		/* M64 layout might affect PE allocation */
-		if (phb->reserve_m64_pe)
-			phb->reserve_m64_pe(hose->bus, NULL, true);
-
-		pnv_ioda_setup_PEs(hose->bus);
-	}
-}
-
 #ifdef CONFIG_PCI_IOV
 static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
 {
@@ -2962,52 +2947,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	}
 }
 
-static void pnv_pci_ioda_setup_seg(void)
-{
-	struct pci_controller *tmp, *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			pnv_ioda_setup_pe_seg(hose, pe);
-		}
-	}
-}
-
-static void pnv_pci_ioda_setup_DMA(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		phb = hose->private_data;
-		pnv_pci_ioda_setup_opal_tce_kill(phb);
-
-		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
-			if (!pe->dma32_weight)
-				continue;
-
-			switch (phb->type) {
-			case PNV_PHB_IODA1:
-				pnv_ioda1_setup_dma(phb, pe);
-				break;
-			case PNV_PHB_IODA2:
-				pnv_pci_ioda2_setup_dma_pe(phb, pe);
-				break;
-			default:
-				pr_warn("%s: No DMA for PHB type %d\n",
-					__func__, phb->type);
-			}
-		}
-
-		/* Mark the PHB initialization done */
-		phb->initialized = 1;
-	}
-}
-
 static void pnv_pci_ioda_create_dbgfs(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -3029,9 +2968,8 @@ static void pnv_pci_ioda_create_dbgfs(void)
 
 static void pnv_pci_ioda_fixup(void)
 {
-	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_seg();
-	pnv_pci_ioda_setup_DMA();
+	struct pci_controller *hose, *tmp;
+	struct pnv_phb *phb;
 
 	pnv_pci_ioda_create_dbgfs();
 
@@ -3039,6 +2977,12 @@ static void pnv_pci_ioda_fixup(void)
 	eeh_init();
 	eeh_addr_cache_build();
 #endif
+
+	/* Notify initialization of PHB done */
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+		phb->initialized = 1;
+	}
 }
 
 /*
@@ -3082,6 +3026,105 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+/*
+ * Extend the windows of the root port, or of the upstream bridge
+ * behind the root port, to the PHB's windows so that they can
+ * accommodate changes in required resources during PCI (slot)
+ * hotplug. The slot is connected to either the root port or a
+ * downstream port of the PCIe switch behind the root port.
+ */
+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
+					   unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct resource *r, *w;
+	int i;
+
+	/* Check if we need to apply fixup to the bridge's windows */
+	if (!pci_is_root_bus(bridge->bus) &&
+	    !pci_is_root_bus(bridge->bus->self->bus))
+		return;
+
+	/* Fix up the resources */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
+		if (!r->flags || !r->parent)
+			continue;
+
+		w = NULL;
+		if (r->flags & type & IORESOURCE_IO)
+			w = &hose->io_resource;
+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
+			 (type & IORESOURCE_PREFETCH) &&
+			 phb->ioda.m64_segsize)
+			w = &hose->mem_resources[1];
+		else if (r->flags & type & IORESOURCE_MEM)
+			w = &hose->mem_resources[0];
+
+		r->start = w->start;
+		r->end = w->end;
+	}
+}
+
+static void pnv_pci_setup_bridge(struct pci_bus *bus,
+				 unsigned long type)
+{
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dev *bridge = bus->self;
+	struct pnv_ioda_pe *pe;
+	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
+
+	/* The root bus (ancestor PE) should be finalized
+	 * before anyone else
+	 */
+	if (!phb->ioda.root_pe_is_populated) {
+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
+		if (pe && phb->ioda.root_pe_idx == IODA_INVALID_PE)
+			phb->ioda.root_pe_idx = pe->pe_number;
+		phb->ioda.root_pe_is_populated = true;
+	}
+
+	/* Extend bridge's windows if necessary */
+	pnv_pci_fixup_bridge_resources(bus, type);
+
+	/* Don't assign PE to bus which doesn't have any
+	 * subordinate PCI devices.
+	 */
+	if (list_empty(&bus->devices))
+		return;
+
+	/* Reserve PEs for M64 resource */
+	if (phb->reserve_m64_pe)
+		phb->reserve_m64_pe(bus, NULL, all);
+
+	/* Assign PE. We might run here because of partial hotplug.
+	 * For the case, we just pick up the existing PE and should
+	 * not allocate resources again.
+	 */
+	pe = pnv_ioda_setup_bus_PE(bus, all);
+	if (!pe)
+		return;
+
+	/* Setup MMIO mapping */
+	pnv_ioda_setup_pe_seg(hose, pe);
+
+	/* Setup DMA */
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_ioda1_setup_dma(phb, pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+		break;
+	default:
+		pr_warn("%s: No DMA for PHB type %d\n",
+			__func__, phb->type);
+	}
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -3147,6 +3190,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 #endif
        .enable_device_hook = pnv_pci_enable_device_hook,
        .window_alignment = pnv_pci_window_alignment,
+	.setup_bridge = pnv_pci_setup_bridge,
        .reset_secondary_bus = pnv_pci_reset_secondary_bus,
        .dma_set_mask = pnv_pci_ioda_dma_set_mask,
        .shutdown = pnv_pci_ioda_shutdown,
@@ -3218,6 +3262,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->regs == NULL)
 		pr_err("  Failed to map registers !\n");
 
+	pnv_pci_ioda_setup_opal_tce_kill(phb);
+
 	/* Initialize more IODA stuff */
 	phb->ioda.total_pe_num = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index e93a489..a160491 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -136,6 +136,7 @@ struct pnv_phb {
 			/* Global bridge info */
 			unsigned int		total_pe_num;
 			unsigned int		root_pe_idx;
+			bool			root_pe_is_populated;
 			unsigned int		reserved_pe_idx;
 
 			/* 32-bit MMIO window */
-- 
2.1.0

* [PATCH v6 21/42] powerpc/powernv: Remove DMA32 list of PEs
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (17 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 22/42] powerpc/powernv: Move functions around Gavin Shan
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Every PHB maintains a list of PEs sorted by their DMA32 weight. After
patch "powerpc/powernv: Create PEs dynamically", the list is no longer
used, so it's safe to remove it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 ------------------
 arch/powerpc/platforms/powernv/pci.h      |  6 ------
 2 files changed, 24 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 37847a3..84b771e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -938,20 +938,6 @@ out:
 	return 0;
 }
 
-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
-				       struct pnv_ioda_pe *pe)
-{
-	struct pnv_ioda_pe *lpe;
-
-	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-		if (lpe->dma32_weight < pe->dma32_weight) {
-			list_add_tail(&pe->dma_link, &lpe->dma_link);
-			return;
-		}
-	}
-	list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
-}
-
 static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
@@ -1169,9 +1155,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	/* Put PE to the list */
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
 
-	/* Link the PE */
-	pnv_ioda_link_pe_by_weight(phb, pe);
-
 	return pe;
 }
 
@@ -3313,7 +3296,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->ioda.root_pe_idx = IODA_INVALID_PE;
 	}
 
-	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 	mutex_init(&phb->ioda.pe_list_mutex);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index a160491..f8e6022 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -83,7 +83,6 @@ struct pnv_ioda_pe {
 	struct list_head	slaves;
 
 	/* Link in list of PE#s */
-	struct list_head	dma_link;
 	struct list_head	list;
 };
 
@@ -185,11 +184,6 @@ struct pnv_phb {
 			/* 32-bit TCE tables allocation */
 			unsigned long		dma32_segcount;
 
-			/* Sorted list of used PE's, sorted at
-			 * boot for resource allocation purposes
-			 */
-			struct list_head	pe_dma_list;
-
 			/* TCE cache invalidate registers (physical and
 			 * remapped)
 			 */
-- 
2.1.0

* [PATCH v6 22/42] powerpc/powernv: Move functions around
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug support Gavin Shan
                   ` (18 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 21/42] powerpc/powernv: Remove DMA32 list of PEs Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically Gavin Shan
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This patch moves the functions related to releasing PEs so that no
extra forward declarations are needed in subsequent patches. It also
fixes warnings reported by scripts/checkpatch.pl. It doesn't
introduce any behavioural changes.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 743 +++++++++++++++---------------
 1 file changed, 377 insertions(+), 366 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 84b771e..d2697a3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -132,6 +132,295 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
+{
+	/* 01xb - invalidate TCEs that match the specified PE# */
+	unsigned long val = (0x4ull << 60) | (pe->pe_number & 0xFF);
+	struct pnv_phb *phb = pe->phb;
+
+	if (!phb->ioda.tce_inval_reg)
+		return;
+
+	mb(); /* Ensure above stores are visible */
+	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
+}
+
+#if defined(CONFIG_IOMMU_API) || defined(CONFIG_PCI_IOV)
+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+		int num)
+{
+	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+			table_group);
+	struct pnv_phb *phb = pe->phb;
+	long ret;
+
+	pe_info(pe, "Removing DMA window #%d\n", num);
+
+	ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+			(pe->pe_number << 1) + num,
+			0/* levels */, 0/* table address */,
+			0/* table size */, 0/* page size */);
+	if (ret)
+		pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
+	else
+		pnv_pci_ioda2_tce_invalidate_entire(pe);
+
+	pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
+
+	return ret;
+}
+#endif /* CONFIG_IOMMU_API || CONFIG_PCI_IOV */
+
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
+{
+	uint16_t window_id = (pe->pe_number << 1) + 1;
+	int64_t rc;
+
+	pe_info(pe, "%sabling 64-bit DMA bypass\n", enable ? "En" : "Dis");
+	if (enable) {
+		phys_addr_t top = memblock_end_of_DRAM();
+
+		top = roundup_pow_of_two(top);
+		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
+						     pe->pe_number,
+						     window_id,
+						     pe->tce_bypass_base,
+						     top);
+	} else {
+		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
+						     pe->pe_number,
+						     window_id,
+						     pe->tce_bypass_base,
+						     0);
+	}
+	if (rc)
+		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
+	else
+		pe->tce_bypass_enabled = enable;
+}
+
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
+					 struct pnv_ioda_pe *pe)
+{
+	struct iommu_table    *tbl;
+	int64_t               rc;
+
+	tbl = pe->table_group.tables[0];
+	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
+	if (rc)
+		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+	pnv_pci_ioda2_set_bypass(pe, false);
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		BUG_ON(pe->table_group.group);
+	}
+	pnv_pci_ioda2_table_free_pages(tbl);
+	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
+}
+#endif /* CONFIG_PCI_IOV */
+
+static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
+				  struct pnv_ioda_pe *parent,
+				  struct pnv_ioda_pe *child,
+				  bool is_add)
+{
+	const char *desc = is_add ? "adding" : "removing";
+	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
+			      OPAL_REMOVE_PE_FROM_DOMAIN;
+	struct pnv_ioda_pe *slave;
+	long rc;
+
+	/* Parent PE affects child PE */
+	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+				child->pe_number, op);
+	if (rc != OPAL_SUCCESS) {
+		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
+			rc, desc);
+		return -ENXIO;
+	}
+
+	if (!(child->flags & PNV_IODA_PE_MASTER))
+		return 0;
+
+	/* Compound case: parent PE affects slave PEs */
+	list_for_each_entry(slave, &child->slaves, list) {
+		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
+					slave->pe_number, op);
+		if (rc != OPAL_SUCCESS) {
+			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
+				rc, desc);
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
+static int pnv_ioda_set_peltv(struct pnv_phb *phb,
+			      struct pnv_ioda_pe *pe,
+			      bool is_add)
+{
+	struct pnv_ioda_pe *slave;
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	/*
+	 * Clear PE frozen state. If it's master PE, we need
+	 * clear slave PE frozen state as well.
+	 */
+	if (is_add) {
+		opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
+					  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+		if (pe->flags & PNV_IODA_PE_MASTER) {
+			list_for_each_entry(slave, &pe->slaves, list)
+				opal_pci_eeh_freeze_clear(phb->opal_id,
+					slave->pe_number,
+					OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+		}
+	}
+
+	/*
+	 * Associate PE in PELT. We need add the PE into the
+	 * corresponding PELT-V as well. Otherwise, the error
+	 * originated from the PE might contribute to other
+	 * PEs.
+	 */
+	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
+	if (ret)
+		return ret;
+
+	/* For compound PEs, any one affects all of them */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry(slave, &pe->slaves, list) {
+			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
+			if (ret)
+				return ret;
+		}
+	}
+
+	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
+		pdev = pe->pbus->self;
+	else if (pe->flags & PNV_IODA_PE_DEV)
+		pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		pdev = pe->parent_dev;
+#endif /* CONFIG_PCI_IOV */
+	while (pdev) {
+		struct pci_dn *pdn = pci_get_pdn(pdev);
+		struct pnv_ioda_pe *parent;
+
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			parent = &phb->ioda.pe_array[pdn->pe_number];
+			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
+			if (ret)
+				return ret;
+		}
+
+		pdev = pdev->bus->self;
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *parent;
+	uint8_t bcomp, dcomp, fcomp;
+	int64_t rc;
+	long rid_end, rid;
+
+	/* Currently, we just deconfigure VF PE. Bus PE will always there.*/
+	if (pe->pbus) {
+		int count;
+
+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+		parent = pe->pbus->self;
+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
+			count = pe->pbus->busn_res.end -
+				pe->pbus->busn_res.start + 1;
+		else
+			count = 1;
+
+		switch (count) {
+		case  1:
+			bcomp = OpalPciBusAll;
+			break;
+		case  2:
+			bcomp = OpalPciBus7Bits;
+			break;
+		case  4:
+			bcomp = OpalPciBus6Bits;
+			break;
+		case  8:
+			bcomp = OpalPciBus5Bits;
+			break;
+		case 16:
+			bcomp = OpalPciBus4Bits;
+			break;
+		case 32:
+			bcomp = OpalPciBus3Bits;
+			break;
+		default:
+			dev_err(&pe->pbus->dev, "Subordinate buses %d unsupported\n",
+				count);
+			/* Do an exact match only */
+			bcomp = OpalPciBusAll;
+		}
+		rid_end = pe->rid + (count << 8);
+	} else {
+		if (pe->flags & PNV_IODA_PE_VF)
+			parent = pe->parent_dev;
+		else
+			parent = pe->pdev->bus->self;
+		bcomp = OpalPciBusAll;
+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
+		rid_end = pe->rid + 1;
+	}
+
+	/* Clear the reverse map */
+	for (rid = pe->rid; rid < rid_end; rid++)
+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
+
+	/* Release from all parents PELT-V */
+	while (parent) {
+		struct pci_dn *pdn = pci_get_pdn(parent);
+
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			rc = opal_pci_set_peltv(phb->opal_id,
+					pdn->pe_number, pe->pe_number,
+					OPAL_REMOVE_PE_FROM_DOMAIN);
+			/* XXX What to do in case of error ? */
+		}
+		parent = parent->bus->self;
+	}
+
+	opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
+				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+
+	/* Disassociate PE in PELT */
+	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
+				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
+	if (rc)
+		pe_warn(pe, "OPAL error %ld remove self from PELTV\n", rc);
+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
+	if (rc)
+		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
+
+	pe->pbus = NULL;
+	pe->pdev = NULL;
+	pe->parent_dev = NULL;
+
+	return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
 static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 {
 	struct pnv_ioda_pe *pe = &phb->ioda.pe_array[pe_no];
@@ -547,305 +836,117 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
 	struct pnv_ioda_pe *pe, *slave;
 	s64 rc;
 
-	/* Find master PE */
-	pe = &phb->ioda.pe_array[pe_no];
-	if (pe->flags & PNV_IODA_PE_SLAVE) {
-		pe = pe->master;
-		WARN_ON(!pe || !(pe->flags & PNV_IODA_PE_MASTER));
-		pe_no = pe->pe_number;
-	}
-
-	/* Clear frozen state for master PE */
-	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, opt);
-	if (rc != OPAL_SUCCESS) {
-		pr_warn("%s: Failure %lld clear %d on PHB#%x-PE#%x\n",
-			__func__, rc, opt, phb->hose->global_number, pe_no);
-		return -EIO;
-	}
-
-	if (!(pe->flags & PNV_IODA_PE_MASTER))
-		return 0;
-
-	/* Clear frozen state for slave PEs */
-	list_for_each_entry(slave, &pe->slaves, list) {
-		rc = opal_pci_eeh_freeze_clear(phb->opal_id,
-					     slave->pe_number,
-					     opt);
-		if (rc != OPAL_SUCCESS) {
-			pr_warn("%s: Failure %lld clear %d on PHB#%x-PE#%x\n",
-				__func__, rc, opt, phb->hose->global_number,
-				slave->pe_number);
-			return -EIO;
-		}
-	}
-
-	return 0;
-}
-
-static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
-{
-	struct pnv_ioda_pe *slave, *pe;
-	u8 fstate, state;
-	__be16 pcierr;
-	s64 rc;
-
-	/* Sanity check on PE number */
-	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
-		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
-
-	/*
-	 * Fetch the master PE and the PE instance might be
-	 * not initialized yet.
-	 */
-	pe = &phb->ioda.pe_array[pe_no];
-	if (pe->flags & PNV_IODA_PE_SLAVE) {
-		pe = pe->master;
-		WARN_ON(!pe || !(pe->flags & PNV_IODA_PE_MASTER));
-		pe_no = pe->pe_number;
-	}
-
-	/* Check the master PE */
-	rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
-					&state, &pcierr, NULL);
-	if (rc != OPAL_SUCCESS) {
-		pr_warn("%s: Failure %lld getting "
-			"PHB#%x-PE#%x state\n",
-			__func__, rc,
-			phb->hose->global_number, pe_no);
-		return OPAL_EEH_STOPPED_TEMP_UNAVAIL;
-	}
-
-	/* Check the slave PE */
-	if (!(pe->flags & PNV_IODA_PE_MASTER))
-		return state;
-
-	list_for_each_entry(slave, &pe->slaves, list) {
-		rc = opal_pci_eeh_freeze_status(phb->opal_id,
-						slave->pe_number,
-						&fstate,
-						&pcierr,
-						NULL);
-		if (rc != OPAL_SUCCESS) {
-			pr_warn("%s: Failure %lld getting "
-				"PHB#%x-PE#%x state\n",
-				__func__, rc,
-				phb->hose->global_number, slave->pe_number);
-			return OPAL_EEH_STOPPED_TEMP_UNAVAIL;
-		}
-
-		/*
-		 * Override the result based on the ascending
-		 * priority.
-		 */
-		if (fstate > state)
-			state = fstate;
-	}
-
-	return state;
-}
-
-/* Currently those 2 are only used when MSIs are enabled, this will change
- * but in the meantime, we need to protect them to avoid warnings
- */
-#ifdef CONFIG_PCI_MSI
-static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pci_dn *pdn = pci_get_pdn(dev);
-
-	if (!pdn)
-		return NULL;
-	if (pdn->pe_number == IODA_INVALID_PE)
-		return NULL;
-	return &phb->ioda.pe_array[pdn->pe_number];
-}
-#endif /* CONFIG_PCI_MSI */
-
-static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
-				  struct pnv_ioda_pe *parent,
-				  struct pnv_ioda_pe *child,
-				  bool is_add)
-{
-	const char *desc = is_add ? "adding" : "removing";
-	uint8_t op = is_add ? OPAL_ADD_PE_TO_DOMAIN :
-			      OPAL_REMOVE_PE_FROM_DOMAIN;
-	struct pnv_ioda_pe *slave;
-	long rc;
-
-	/* Parent PE affects child PE */
-	rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-				child->pe_number, op);
-	if (rc != OPAL_SUCCESS) {
-		pe_warn(child, "OPAL error %ld %s to parent PELTV\n",
-			rc, desc);
-		return -ENXIO;
-	}
-
-	if (!(child->flags & PNV_IODA_PE_MASTER))
-		return 0;
-
-	/* Compound case: parent PE affects slave PEs */
-	list_for_each_entry(slave, &child->slaves, list) {
-		rc = opal_pci_set_peltv(phb->opal_id, parent->pe_number,
-					slave->pe_number, op);
-		if (rc != OPAL_SUCCESS) {
-			pe_warn(slave, "OPAL error %ld %s to parent PELTV\n",
-				rc, desc);
-			return -ENXIO;
-		}
-	}
-
-	return 0;
-}
-
-static int pnv_ioda_set_peltv(struct pnv_phb *phb,
-			      struct pnv_ioda_pe *pe,
-			      bool is_add)
-{
-	struct pnv_ioda_pe *slave;
-	struct pci_dev *pdev = NULL;
-	int ret;
-
-	/*
-	 * Clear PE frozen state. If it's master PE, we need
-	 * clear slave PE frozen state as well.
-	 */
-	if (is_add) {
-		opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
-					  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		if (pe->flags & PNV_IODA_PE_MASTER) {
-			list_for_each_entry(slave, &pe->slaves, list)
-				opal_pci_eeh_freeze_clear(phb->opal_id,
-							  slave->pe_number,
-							  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
-		}
-	}
-
-	/*
-	 * Associate PE in PELT. We need add the PE into the
-	 * corresponding PELT-V as well. Otherwise, the error
-	 * originated from the PE might contribute to other
-	 * PEs.
-	 */
-	ret = pnv_ioda_set_one_peltv(phb, pe, pe, is_add);
-	if (ret)
-		return ret;
+	/* Find master PE */
+	pe = &phb->ioda.pe_array[pe_no];
+	if (pe->flags & PNV_IODA_PE_SLAVE) {
+		pe = pe->master;
+		WARN_ON(!pe || !(pe->flags & PNV_IODA_PE_MASTER));
+		pe_no = pe->pe_number;
+	}
 
-	/* For compound PEs, any one affects all of them */
-	if (pe->flags & PNV_IODA_PE_MASTER) {
-		list_for_each_entry(slave, &pe->slaves, list) {
-			ret = pnv_ioda_set_one_peltv(phb, slave, pe, is_add);
-			if (ret)
-				return ret;
-		}
+	/* Clear frozen state for master PE */
+	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, opt);
+	if (rc != OPAL_SUCCESS) {
+		pr_warn("%s: Failure %lld clear %d on PHB#%x-PE#%x\n",
+			__func__, rc, opt, phb->hose->global_number, pe_no);
+		return -EIO;
 	}
 
-	if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
-		pdev = pe->pbus->self;
-	else if (pe->flags & PNV_IODA_PE_DEV)
-		pdev = pe->pdev->bus->self;
-#ifdef CONFIG_PCI_IOV
-	else if (pe->flags & PNV_IODA_PE_VF)
-		pdev = pe->parent_dev;
-#endif /* CONFIG_PCI_IOV */
-	while (pdev) {
-		struct pci_dn *pdn = pci_get_pdn(pdev);
-		struct pnv_ioda_pe *parent;
+	if (!(pe->flags & PNV_IODA_PE_MASTER))
+		return 0;
 
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			parent = &phb->ioda.pe_array[pdn->pe_number];
-			ret = pnv_ioda_set_one_peltv(phb, parent, pe, is_add);
-			if (ret)
-				return ret;
+	/* Clear frozen state for slave PEs */
+	list_for_each_entry(slave, &pe->slaves, list) {
+		rc = opal_pci_eeh_freeze_clear(phb->opal_id,
+					     slave->pe_number,
+					     opt);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld clear %d on PHB#%x-PE#%x\n",
+				__func__, rc, opt, phb->hose->global_number,
+				slave->pe_number);
+			return -EIO;
 		}
-
-		pdev = pdev->bus->self;
 	}
 
 	return 0;
 }
 
-#ifdef CONFIG_PCI_IOV
-static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 {
-	struct pci_dev *parent;
-	uint8_t bcomp, dcomp, fcomp;
-	int64_t rc;
-	long rid_end, rid;
+	struct pnv_ioda_pe *slave, *pe;
+	u8 fstate, state;
+	__be16 pcierr;
+	s64 rc;
 
-	/* Currently, we just deconfigure VF PE. Bus PE will always there.*/
-	if (pe->pbus) {
-		int count;
+	/* Sanity check on PE number */
+	if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
+		return OPAL_EEH_STOPPED_PERM_UNAVAIL;
 
-		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
-		parent = pe->pbus->self;
-		if (pe->flags & PNV_IODA_PE_BUS_ALL)
-			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
-		else
-			count = 1;
+	/*
+	 * Fetch the master PE and the PE instance might be
+	 * not initialized yet.
+	 */
+	pe = &phb->ioda.pe_array[pe_no];
+	if (pe->flags & PNV_IODA_PE_SLAVE) {
+		pe = pe->master;
+		WARN_ON(!pe || !(pe->flags & PNV_IODA_PE_MASTER));
+		pe_no = pe->pe_number;
+	}
 
-		switch(count) {
-		case  1: bcomp = OpalPciBusAll;         break;
-		case  2: bcomp = OpalPciBus7Bits;       break;
-		case  4: bcomp = OpalPciBus6Bits;       break;
-		case  8: bcomp = OpalPciBus5Bits;       break;
-		case 16: bcomp = OpalPciBus4Bits;       break;
-		case 32: bcomp = OpalPciBus3Bits;       break;
-		default:
-			dev_err(&pe->pbus->dev, "Number of subordinate buses %d unsupported\n",
-			        count);
-			/* Do an exact match only */
-			bcomp = OpalPciBusAll;
-		}
-		rid_end = pe->rid + (count << 8);
-	} else {
-		if (pe->flags & PNV_IODA_PE_VF)
-			parent = pe->parent_dev;
-		else
-			parent = pe->pdev->bus->self;
-		bcomp = OpalPciBusAll;
-		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
-		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
-		rid_end = pe->rid + 1;
+	/* Check the master PE */
+	rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+					&state, &pcierr, NULL);
+	if (rc != OPAL_SUCCESS) {
+		pr_warn("%s: Failure %lld getting PHB#%x-PE#%x state\n",
+			__func__, rc, phb->hose->global_number, pe_no);
+		return OPAL_EEH_STOPPED_TEMP_UNAVAIL;
 	}
 
-	/* Clear the reverse map */
-	for (rid = pe->rid; rid < rid_end; rid++)
-		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
+	/* Check the slave PE */
+	if (!(pe->flags & PNV_IODA_PE_MASTER))
+		return state;
 
-	/* Release from all parents PELT-V */
-	while (parent) {
-		struct pci_dn *pdn = pci_get_pdn(parent);
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
-						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-			/* XXX What to do in case of error ? */
+	list_for_each_entry(slave, &pe->slaves, list) {
+		rc = opal_pci_eeh_freeze_status(phb->opal_id,
+						slave->pe_number,
+						&fstate,
+						&pcierr,
+						NULL);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Failure %lld getting PHB#%x-PE#%x state\n",
+				__func__, rc, phb->hose->global_number,
+				slave->pe_number);
+			return OPAL_EEH_STOPPED_TEMP_UNAVAIL;
 		}
-		parent = parent->bus->self;
-	}
 
-	opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
-				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+		/*
+		 * Override the result based on the ascending
+		 * priority.
+		 */
+		if (fstate > state)
+			state = fstate;
+	}
 
-	/* Disassociate PE in PELT */
-	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
-				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld remove self from PELTV\n", rc);
-	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
-			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
-	if (rc)
-		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
+	return state;
+}
 
-	pe->pbus = NULL;
-	pe->pdev = NULL;
-	pe->parent_dev = NULL;
+/* Currently those 2 are only used when MSIs are enabled, this will change
+ * but in the meantime, we need to protect them to avoid warnings
+ */
+#ifdef CONFIG_PCI_MSI
+static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(dev);
 
-	return 0;
+	if (!pdn)
+		return NULL;
+	if (pdn->pe_number == IODA_INVALID_PE)
+		return NULL;
+	return &phb->ioda.pe_array[pdn->pe_number];
 }
-#endif /* CONFIG_PCI_IOV */
+#endif /* CONFIG_PCI_MSI */
 
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
@@ -1293,29 +1394,6 @@ m64_failed:
 	return -EBUSY;
 }
 
-static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
-		int num);
-static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
-
-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
-{
-	struct iommu_table    *tbl;
-	int64_t               rc;
-
-	tbl = pe->table_group.tables[0];
-	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
-	if (rc)
-		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
-
-	pnv_pci_ioda2_set_bypass(pe, false);
-	if (pe->table_group.group) {
-		iommu_group_put(pe->table_group.group);
-		BUG_ON(pe->table_group.group);
-	}
-	pnv_pci_ioda2_table_free_pages(tbl);
-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
-}
-
 static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 {
 	struct pci_bus        *bus;
@@ -1804,19 +1882,6 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 	.get = pnv_tce_get,
 };
 
-static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
-{
-	/* 01xb - invalidate TCEs that match the specified PE# */
-	unsigned long val = (0x4ull << 60) | (pe->pe_number & 0xFF);
-	struct pnv_phb *phb = pe->phb;
-
-	if (!phb->ioda.tce_inval_reg)
-		return;
-
-	mb(); /* Ensure above stores are visible */
-	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
-}
-
 static void pnv_pci_ioda2_do_tce_invalidate(unsigned pe_number, bool rm,
 		__be64 __iomem *invalidate, unsigned shift,
 		unsigned long index, unsigned long npages)
@@ -2055,34 +2120,6 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 	return 0;
 }
 
-static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
-{
-	uint16_t window_id = (pe->pe_number << 1 ) + 1;
-	int64_t rc;
-
-	pe_info(pe, "%sabling 64-bit DMA bypass\n", enable ? "En" : "Dis");
-	if (enable) {
-		phys_addr_t top = memblock_end_of_DRAM();
-
-		top = roundup_pow_of_two(top);
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     top);
-	} else {
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     0);
-	}
-	if (rc)
-		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
-	else
-		pe->tce_bypass_enabled = enable;
-}
-
 static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 		__u32 page_shift, __u64 window_size, __u32 levels,
 		struct iommu_table *tbl);
@@ -2162,32 +2199,6 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	return 0;
 }
 
-#if defined(CONFIG_IOMMU_API) || defined(CONFIG_PCI_IOV)
-static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
-		int num)
-{
-	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
-			table_group);
-	struct pnv_phb *phb = pe->phb;
-	long ret;
-
-	pe_info(pe, "Removing DMA window #%d\n", num);
-
-	ret = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-			(pe->pe_number << 1) + num,
-			0/* levels */, 0/* table address */,
-			0/* table size */, 0/* page size */);
-	if (ret)
-		pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
-	else
-		pnv_pci_ioda2_tce_invalidate_entire(pe);
-
-	pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
-
-	return ret;
-}
-#endif
-
 #ifdef CONFIG_IOMMU_API
 static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
 		__u64 window_size, __u32 levels)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (19 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 22/42] powerpc/powernv: Move functions around Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-11 13:03   ` Alexey Kardashevskiy
  2015-08-06  4:11 ` [PATCH v6 24/42] powerpc/powernv: Supports slot ID Gavin Shan
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This adds a reference count to each PE, representing the number
of PCI devices contained in the PE. When the last device leaves
the PE, the PE and the resources it consumes (IO, DMA, PELTM,
PELTV) are released, which is required to support PCI hotplug.
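
The get/put lifecycle described above can be sketched in user space
as follows. This is a minimal illustration, not the patch's code:
struct pe, pe_get(), pe_put() and release_pe() are hypothetical,
simplified stand-ins for struct pnv_ioda_pe, pnv_ioda_pe_get(),
pnv_ioda_pe_put() and pnv_ioda_release_pe(), and the release step
only records that it ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pnv_ioda_pe. */
struct pe {
	int pe_number;
	int device_count;	/* number of PCI devices in this PE */
	bool released;		/* set once the PE has been torn down */
};

/*
 * Stand-in for pnv_ioda_release_pe(). The patch releases DMA, the
 * IO/M32/M64 segments and the PELT entries, then frees the PE number;
 * here we only record that it happened.
 */
static void release_pe(struct pe *pe)
{
	pe->released = true;
}

/* Called once for every device that is assigned to the PE. */
static struct pe *pe_get(struct pe *pe)
{
	if (!pe)
		return NULL;
	pe->device_count++;
	return pe;
}

/* Called when a device leaves the PE; the last put releases it. */
static void pe_put(struct pe *pe)
{
	if (!pe)
		return;
	assert(pe->device_count > 0);	/* mirrors the BUG_ON() in the patch */
	if (--pe->device_count == 0)
		release_pe(pe);
}
```

In the patch itself, pnv_ioda_setup_same_PE() takes a reference as
each device is attached, and pnv_pci_release_device(), wired up as
the .release_device hook, drops it again.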

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 233 +++++++++++++++++++++++++++---
 arch/powerpc/platforms/powernv/pci.h      |   3 +
 2 files changed, 217 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d2697a3..13d8a5b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -132,6 +132,53 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
 		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static void pnv_pci_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct iommu_table *tbl;
+	int seg;
+	int64_t rc;
+
+	/* No DMA32 segments allocated */
+	if (pe->dma32_seg == PNV_INVALID_SEGMENT ||
+	    pe->dma32_segcount <= 0) {
+		pe->dma32_seg = PNV_INVALID_SEGMENT;
+		pe->dma32_segcount = 0;
+		return;
+	}
+
+	/* Unlink IOMMU table from group */
+	tbl = pe->table_group.tables[0];
+	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
+	if (pe->table_group.group) {
+		iommu_group_put(pe->table_group.group);
+		BUG_ON(pe->table_group.group);
+	}
+
+	/* Release IOMMU table */
+	free_pages(tbl->it_base,
+		get_order(TCE32_TABLE_SIZE * pe->dma32_segcount));
+	iommu_free_table(tbl,
+		of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
+
+	/* Disable TVE */
+	for (seg = pe->dma32_seg;
+	     seg < pe->dma32_seg + pe->dma32_segcount;
+	     seg++) {
+		rc = opal_pci_map_pe_dma_window(phb->opal_id,
+				pe->pe_number, seg, 0, 0ul, 0ul, 0ul);
+		if (rc)
+			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
+				rc, seg);
+	}
+
+	/* Free the DMA32 segments */
+	bitmap_clear(phb->ioda.dma32_segmap,
+		pe->dma32_seg, pe->dma32_segcount);
+	pe->dma32_seg = PNV_INVALID_SEGMENT;
+	pe->dma32_segcount = 0;
+}
+
 static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
 {
 	/* 01xb - invalidate TCEs that match the specified PE# */
@@ -199,13 +246,15 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 		pe->tce_bypass_enabled = enable;
 }
 
-#ifdef CONFIG_PCI_IOV
-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
-					 struct pnv_ioda_pe *pe)
+static void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table    *tbl;
+	struct device_node    *dn;
 	int64_t               rc;
 
+	if (pe->dma32_seg == PNV_INVALID_SEGMENT)
+		return;
+
 	tbl = pe->table_group.tables[0];
 	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
 	if (rc)
@@ -216,10 +265,91 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
 		iommu_group_put(pe->table_group.group);
 		BUG_ON(pe->table_group.group);
 	}
+
+	if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
+		dn = pci_bus_to_OF_node(pe->pbus);
+	else if (pe->flags & PNV_IODA_PE_DEV)
+		dn = pci_device_to_OF_node(pe->pdev);
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		dn = pci_device_to_OF_node(pe->parent_dev);
+#endif
+	else
+		dn = NULL;
+
 	pnv_pci_ioda2_table_free_pages(tbl);
-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
+	iommu_free_table(tbl, of_node_full_name(dn));
+	pe->dma32_seg = PNV_INVALID_SEGMENT;
+}
+
+static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+
+	switch (phb->type) {
+	case PNV_PHB_IODA1:
+		pnv_pci_ioda_release_pe_dma(pe);
+		break;
+	case PNV_PHB_IODA2:
+		pnv_pci_ioda2_release_pe_dma(pe);
+		break;
+	default:
+		pr_warn("%s: Cannot release DMA for PHB type %d\n",
+			__func__, phb->type);
+	}
+}
+
+static void pnv_ioda_release_pe_one_seg(struct pnv_ioda_pe *pe, int win)
+{
+	struct pnv_phb *phb = pe->phb;
+	unsigned long *segmap = NULL;
+	unsigned long *pe_segmap = NULL;
+	int segno, limit, mod = 0;
+
+	switch (win) {
+	case OPAL_IO_WINDOW_TYPE:
+		segmap = phb->ioda.io_segmap;
+		pe_segmap = pe->io_segmap;
+		break;
+	case OPAL_M32_WINDOW_TYPE:
+		segmap = phb->ioda.m32_segmap;
+		pe_segmap = pe->m32_segmap;
+		break;
+	case OPAL_M64_WINDOW_TYPE:
+		if (phb->type != PNV_PHB_IODA1)
+			return;
+		segmap = phb->ioda.m64_segmap;
+		pe_segmap = pe->m64_segmap;
+		mod = 8;
+		break;
+	default:
+		return;
+	}
+
+	segno = -1;
+	limit = phb->ioda.total_pe_num;
+	while ((segno = find_next_bit(pe_segmap, limit, segno + 1)) < limit) {
+		if (mod > 0)
+			opal_pci_map_pe_mmio_window(phb->opal_id,
+				phb->ioda.reserved_pe_idx, win,
+				segno / mod, segno % mod);
+		else
+			opal_pci_map_pe_mmio_window(phb->opal_id,
+					phb->ioda.reserved_pe_idx, win,
+					0, segno);
+
+		clear_bit(segno, pe_segmap);
+		clear_bit(segno, segmap);
+	}
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+	int win;
+
+	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++)
+		pnv_ioda_release_pe_one_seg(pe, win);
 }
-#endif /* CONFIG_PCI_IOV */
 
 static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
 				  struct pnv_ioda_pe *parent,
@@ -325,7 +455,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 	return 0;
 }
 
-#ifdef CONFIG_PCI_IOV
 static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -373,9 +502,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		}
 		rid_end = pe->rid + (count << 8);
 	} else {
+#ifdef CONFIG_PCI_IOV
 		if (pe->flags & PNV_IODA_PE_VF)
 			parent = pe->parent_dev;
 		else
+#endif
 			parent = pe->pdev->bus->self;
 		bcomp = OpalPciBusAll;
 		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
@@ -415,11 +546,72 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	pe->pbus = NULL;
 	pe->pdev = NULL;
+#ifdef CONFIG_PCI_IOV
 	pe->parent_dev = NULL;
+#endif
 
 	return 0;
 }
-#endif /* CONFIG_PCI_IOV */
+
+static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
+{
+	struct pnv_phb *phb = pe->phb;
+	struct pnv_ioda_pe *tmp, *slave;
+
+	/* Release slave PEs in compound PE */
+	if (pe->flags & PNV_IODA_PE_MASTER) {
+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
+			pnv_ioda_release_pe(slave);
+	}
+
+	/* Remove the PE from the list */
+	list_del(&pe->list);
+
+	/* Release resources */
+	pnv_ioda_release_pe_dma(pe);
+	pnv_ioda_release_pe_seg(pe);
+	pnv_ioda_deconfigure_pe(pe->phb, pe);
+
+	/* Release PE number */
+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
+}
+
+static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
+{
+	if (!pe)
+		return NULL;
+
+	pe->device_count++;
+	return pe;
+}
+
+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
+{
+	if (!pe)
+		return;
+
+	pe->device_count--;
+	BUG_ON(pe->device_count < 0);
+	if (pe->device_count == 0)
+		pnv_ioda_release_pe(pe);
+}
+
+static void pnv_pci_release_device(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe;
+
+	if (pdev->is_virtfn)
+		return;
+
+	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
+		return;
+
+	pe = &phb->ioda.pe_array[pdn->pe_number];
+	pnv_ioda_pe_put(pe);
+}
 
 static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 {
@@ -466,6 +658,7 @@ static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 	return pnv_ioda_init_pe(phb, pe);
 }
 
+#ifdef CONFIG_PCI_IOV
 static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
 {
 	WARN_ON(phb->ioda.pe_array[pe].pdev);
@@ -473,6 +666,7 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
 	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
 	clear_bit(pe, phb->ioda.pe_alloc);
 }
+#endif
 
 static int pnv_ioda1_init_m64(struct pnv_phb *phb)
 {
@@ -1177,6 +1371,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 		if (pdn->pe_number != IODA_INVALID_PE)
 			continue;
 
+		pnv_ioda_pe_get(pe);
 		pdn->pe_number = pe->pe_number;
 		pe->dma32_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1231,7 +1426,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
 	pe->pbus = bus;
 	pe->pdev = NULL;
-	pe->dma32_seg = -1;
+	pe->dma32_seg = PNV_INVALID_SEGMENT;
 	pe->mve_number = -1;
 	pe->rid = bus->busn_res.start << 8;
 	pe->dma32_weight = 0;
@@ -1244,9 +1439,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 			bus->busn_res.start, pe->pe_number);
 
 	if (pnv_ioda_configure_pe(phb, pe)) {
-		/* XXX What do we do here ? */
-		pnv_ioda_free_pe(phb, pe->pe_number);
 		pe->pbus = NULL;
+		pnv_ioda_release_pe(pe);
 		return NULL;
 	}
 
@@ -1449,14 +1643,14 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		if ((pe->flags & PNV_IODA_PE_MASTER) &&
 		    (pe->flags & PNV_IODA_PE_VF)) {
 			list_for_each_entry_safe(s, sn, &pe->slaves, list) {
-				pnv_pci_ioda2_release_dma_pe(pdev, s);
+				pnv_pci_ioda2_release_pe_dma(s);
 				list_del(&s->list);
 				pnv_ioda_deconfigure_pe(phb, s);
 				pnv_ioda_free_pe(phb, s->pe_number);
 			}
 		}
 
-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
+		pnv_pci_ioda2_release_pe_dma(pe);
 
 		/* Remove from list */
 		mutex_lock(&phb->ioda.pe_list_mutex);
@@ -1532,7 +1726,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->flags = PNV_IODA_PE_VF;
 		pe->pbus = NULL;
 		pe->parent_dev = pdev;
-		pe->dma32_seg = -1;
+		pe->dma32_seg = PNV_INVALID_SEGMENT;
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
@@ -1995,7 +2189,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->dma32_seg >= 0))
+	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
 		return;
 
 	tbl = pnv_pci_table_alloc(phb->hose->node);
@@ -2066,10 +2260,10 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	return;
  fail:
 	/* XXX Failure: Try to fallback to 64-bit only ? */
-	if (pe->dma32_seg >= 0) {
+	if (pe->dma32_seg != PNV_INVALID_SEGMENT) {
 		bitmap_clear(phb->ioda.dma32_segmap,
 			     pe->dma32_seg, pe->dma32_segcount);
-		pe->dma32_seg = -1;
+		pe->dma32_seg = PNV_INVALID_SEGMENT;
 		pe->dma32_segcount = 0;
 	}
 
@@ -2416,7 +2610,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	int64_t rc;
 
 	/* We shouldn't already have a 32-bit DMA associated */
-	if (WARN_ON(pe->dma32_seg >= 0))
+	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
 		return;
 
 	/* TVE #1 is selected by PCI address bit 59 */
@@ -2443,8 +2637,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
 	rc = pnv_pci_ioda2_setup_default_config(pe);
 	if (rc) {
-		if (pe->dma32_seg >= 0)
-			pe->dma32_seg = -1;
+		if (pe->dma32_seg != PNV_INVALID_SEGMENT)
+			pe->dma32_seg = PNV_INVALID_SEGMENT;
 		return;
 	}
 
@@ -3183,6 +3377,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
        .teardown_msi_irqs = pnv_teardown_msi_irqs,
 #endif
        .enable_device_hook = pnv_pci_enable_device_hook,
+	.release_device = pnv_pci_release_device,
        .window_alignment = pnv_pci_window_alignment,
 	.setup_bridge = pnv_pci_setup_bridge,
        .reset_secondary_bus = pnv_pci_reset_secondary_bus,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index f8e6022..2058f06 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -25,11 +25,14 @@ enum pnv_phb_model {
 #define PNV_IODA_PE_SLAVE	(1 << 4)	/* Slave PE in compound case	*/
 #define PNV_IODA_PE_VF		(1 << 5)	/* PE for one VF 		*/
 
+#define PNV_INVALID_SEGMENT	(-1)
+
 /* Data associated with a PE, including IOMMU tracking etc.. */
 struct pnv_phb;
 struct pnv_ioda_pe {
 	unsigned long		flags;
 	struct pnv_phb		*phb;
+	int			device_count;
 
 	/* A PE can be associated with a single device or an
 	 * entire bus (& children). In the former case, pdev
-- 
2.1.0

* [PATCH v6 24/42] powerpc/powernv: Supports slot ID
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (20 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 25/42] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The PowerNV platform runs on top of the skiboot firmware, which
has been changed to support PCI slots. A PCI slot is identified
either by the PHB's OPAL ID (for the PHB slot) or by a combination
of that ID and a PCI slot ID. This patch renames the first argument
of opal_pci_reset() and opal_pci_poll() from "phb_id" to "id", and
adds an output argument to opal_pci_poll(), to reflect the firmware
change. For the same reason, pnv_eeh_phb_poll() is renamed to
pnv_eeh_poll() and now takes the ID rather than the PHB.
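
Because pnv_eeh_poll() now needs only the ID, the same loop can
serve a PHB or a slot. Below is a minimal user-space sketch of that
polling pattern, assuming (as the kernel loop treats it) that a
positive return value from the firmware is a delay in milliseconds
before the next poll, zero means done and a negative value is an
error. The callback types and the fake firmware are hypothetical
stand-ins for opal_pci_poll() and udelay()/msleep().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical callback types. poll_cb models opal_pci_poll(id, NULL):
 * > 0 means "busy, wait this many ms", 0 means done, < 0 is an error.
 */
typedef int64_t (*poll_cb)(uint64_t id, void *ctx);
typedef void (*sleep_cb)(int64_t ms, void *ctx);

/* Poll a PHB or slot, identified only by its ID, until it settles. */
static int64_t poll_until_done(uint64_t id, poll_cb poll_fn,
			       sleep_cb sleep_fn, void *ctx)
{
	int64_t rc;

	assert(poll_fn && sleep_fn);
	for (;;) {
		rc = poll_fn(id, ctx);
		if (rc <= 0)
			break;		/* completed (0) or failed (< 0) */
		sleep_fn(rc, ctx);	/* firmware asked us to wait rc ms */
	}
	return rc;
}

/* A fake firmware for demonstration: busy for a few polls, then done. */
struct fake_fw {
	int busy_polls;		/* polls that still report "busy" */
	int64_t slept_ms;	/* total delay the loop has waited */
};

static int64_t fake_poll(uint64_t id, void *ctx)
{
	struct fake_fw *fw = ctx;

	(void)id;
	if (fw->busy_polls > 0) {
		fw->busy_polls--;
		return 10;
	}
	return 0;
}

static void fake_sleep(int64_t ms, void *ctx)
{
	struct fake_fw *fw = ctx;

	fw->slept_ms += ms;
}
```

With a fake firmware that reports "busy" three times, the loop waits
3 x 10 ms and then returns 0.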

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h              | 4 ++--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a091c27..bbb3aa6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -129,7 +129,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
@@ -146,7 +146,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 			    __be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *val);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 347b1cf..0350dab 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -745,12 +745,12 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
+static s64 pnv_eeh_poll(uint64_t id)
 {
 	s64 rc = OPAL_HARDWARE;
 
 	while (1) {
-		rc = opal_pci_poll(phb->opal_id);
+		rc = opal_pci_poll(id, NULL);
 		if (rc <= 0)
 			break;
 
@@ -790,7 +790,7 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 	 * reset followed by hot reset on root bus. So we also
 	 * need the PCI bus settlement delay.
 	 */
-	rc = pnv_eeh_phb_poll(phb);
+	rc = pnv_eeh_poll(phb->opal_id);
 	if (option == EEH_RESET_DEACTIVATE) {
 		if (system_state < SYSTEM_RUNNING)
 			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
@@ -833,7 +833,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 		goto out;
 
 	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_phb_poll(phb);
+	rc = pnv_eeh_poll(phb->opal_id);
 	if (option == EEH_RESET_DEACTIVATE)
 		msleep(EEH_PE_RST_SETTLE_TIME);
 out:
-- 
2.1.0

* [PATCH v6 25/42] powerpc/powernv: Use PCI slot reset infrastructure
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (21 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 24/42] powerpc/powernv: Supports slot ID Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 26/42] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The skiboot firmware may be able to reset a PCI slot, which it
advertises through the "ibm,reset-by-firmware" property on the
device node associated with the slot. This patch checks for that
property and routes the reset to firmware if it exists. Otherwise,
we fall back to the old path as before.
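
The slot ID handed to opal_pci_reset() and pnv_eeh_poll() below is
built from the PHB's OPAL ID plus the bridge's bus number and devfn.
The sketch mirrors the bit layout read off this patch's code (bit 60
as a slot flag, bus in bits 24-31, devfn in bits 16-23, PHB OPAL ID
in the low bits). The helper make_slot_id() is hypothetical, and the
sketch assumes the PHB OPAL ID fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit this patch sets to mark the ID as a PCI slot rather than a PHB. */
#define SLOT_ID_FLAG	(UINT64_C(1) << 60)

/*
 * Hypothetical helper mirroring the ID built in pnv_eeh_bridge_reset():
 * bits 24-31 hold the bus number, bits 16-23 the devfn and the low bits
 * the PHB's OPAL ID.
 */
static uint64_t make_slot_id(uint64_t phb_opal_id, uint8_t bus, uint8_t devfn)
{
	/* The sketch assumes the PHB OPAL ID fits below bit 16. */
	assert(phb_opal_id <= 0xffff);

	/* Widen before shifting: an int-typed bus << 24 overflows for bus >= 0x80. */
	return SLOT_ID_FLAG | ((uint64_t)bus << 24) |
	       ((uint64_t)devfn << 16) | phb_opal_id;
}
```

If the device node lacks "ibm,reset-by-firmware", the patch never
builds such an ID and calls __pnv_eeh_bridge_reset() instead.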

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 44 +++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 0350dab..7be2ebf 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -843,7 +843,7 @@ out:
 	return 0;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
 	struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -894,6 +894,48 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 	return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
+	uint64_t id = (0x1ul << 60);
+	uint8_t scope;
+	int64_t rc;
+
+	/*
+	 * If the firmware can't handle it, we will issue hot reset
+	 * on the secondary bus despite the requested reset type.
+	 */
+	if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+		return __pnv_eeh_bridge_reset(pdev, option);
+
+	/* The firmware can handle the request */
+	switch (option) {
+	case EEH_RESET_HOT:
+		scope = OPAL_RESET_PCI_HOT;
+		break;
+	case EEH_RESET_FUNDAMENTAL:
+		scope = OPAL_RESET_PCI_FUNDAMENTAL;
+		break;
+	case EEH_RESET_DEACTIVATE:
+		return 0;
+	default:
+		dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
+			 __func__, option);
+		return -EINVAL;
+	}
+
+	hose = pci_bus_to_host(pdev->bus);
+	phb = hose->private_data;
+	id |= ((uint64_t)pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
+	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+	if (rc > 0)
+		rc = pnv_eeh_poll(id);
+
+	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+}
+
 static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
 				     u16 mask, bool af_flr_rst)
 {
-- 
2.1.0

* [PATCH v6 26/42] powerpc/powernv: Simplify pnv_eeh_reset()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (22 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 25/42] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 27/42] powerpc/powernv: Don't cover root bus in pnv_pci_reset_secondary_bus() Gavin Shan
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This simplifies pnv_eeh_reset() by avoiding the unnecessary nested
"if" statement. No logical changes are introduced.

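For orientation, the reset-path selection that remains after flattening can be written as a small pure function. This is a sketch, not the kernel code: PE_TYPE_PHB and PE_TYPE_VF are hypothetical stand-ins for EEH_PE_PHB and EEH_PE_VF, the two booleans model pci_is_root_bus() on the PE's bus and on its parent, and the P7IOC error-injection clearing that sits between the two steps is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the EEH_PE_PHB / EEH_PE_VF type bits. */
#define PE_TYPE_PHB	0x1
#define PE_TYPE_VF	0x2

enum reset_path { PATH_PHB, PATH_VF, PATH_ROOT, PATH_BRIDGE };

/*
 * Same shape as the flattened pnv_eeh_reset(): the PHB case returns
 * early, then the VF, root-port or bridge path is chosen.
 */
enum reset_path pick_reset_path(unsigned int pe_type, bool bus_is_root,
				bool parent_is_root)
{
	if (pe_type & PE_TYPE_PHB)
		return PATH_PHB;

	if (pe_type & PE_TYPE_VF)
		return PATH_VF;
	if (bus_is_root || parent_is_root)
		return PATH_ROOT;
	return PATH_BRIDGE;
}
```

The early return for the PHB case is what removes one level of nesting from the rest of the function.
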
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 65 +++++++++++++---------------
 1 file changed, 31 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 7be2ebf..95332e9 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1086,7 +1086,9 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
 	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
 	struct pci_bus *bus;
+	int64_t rc;
 	int ret;
 
 	/*
@@ -1103,44 +1105,39 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 	 * reset. The side effect is that EEH core has to clear the frozen
 	 * state explicitly after BAR restore.
 	 */
-	if (pe->type & EEH_PE_PHB) {
-		ret = pnv_eeh_phb_reset(hose, option);
-	} else {
-		struct pnv_phb *phb;
-		s64 rc;
+	if (pe->type & EEH_PE_PHB)
+		return pnv_eeh_phb_reset(hose, option);
 
-		/*
-		 * The frozen PE might be caused by PAPR error injection
-		 * registers, which are expected to be cleared after hitting
-		 * frozen PE as stated in the hardware spec. Unfortunately,
-		 * that's not true on P7IOC. So we have to clear it manually
-		 * to avoid recursive EEH errors during recovery.
-		 */
-		phb = hose->private_data;
-		if (phb->model == PNV_PHB_MODEL_P7IOC &&
-		    (option == EEH_RESET_HOT ||
-		    option == EEH_RESET_FUNDAMENTAL)) {
-			rc = opal_pci_reset(phb->opal_id,
-					    OPAL_RESET_PHB_ERROR,
-					    OPAL_ASSERT_RESET);
-			if (rc != OPAL_SUCCESS) {
-				pr_warn("%s: Failure %lld clearing "
-					"error injection registers\n",
-					__func__, rc);
-				return -EIO;
-			}
+	/*
+	 * The frozen PE might be caused by PAPR error injection
+	 * registers, which are expected to be cleared after hitting
+	 * frozen PE as stated in the hardware spec. Unfortunately,
+	 * that's not true on P7IOC. So we have to clear it manually
+	 * to avoid recursive EEH errors during recovery.
+	 */
+	phb = hose->private_data;
+	if (phb->model == PNV_PHB_MODEL_P7IOC &&
+	    (option == EEH_RESET_HOT ||
+	    option == EEH_RESET_FUNDAMENTAL)) {
+		rc = opal_pci_reset(phb->opal_id,
+				    OPAL_RESET_PHB_ERROR,
+				    OPAL_ASSERT_RESET);
+		if (rc != OPAL_SUCCESS) {
+			pr_warn("%s: Error %lld clearing errinjct registers\n",
+				__func__, rc);
+			return -EIO;
 		}
-
-		bus = eeh_pe_bus_get(pe);
-		if (pe->type & EEH_PE_VF)
-			ret = pnv_eeh_vf_pe_reset(pe, option);
-		else if (pci_is_root_bus(bus) ||
-			pci_is_root_bus(bus->parent))
-			ret = pnv_eeh_root_reset(hose, option);
-		else
-			ret = pnv_eeh_bridge_reset(bus->self, option);
 	}
 
+	bus = eeh_pe_bus_get(pe);
+	if (pe->type & EEH_PE_VF)
+		ret = pnv_eeh_vf_pe_reset(pe, option);
+	else if (pci_is_root_bus(bus) ||
+		 pci_is_root_bus(bus->parent))
+		ret = pnv_eeh_root_reset(hose, option);
+	else
+		ret = pnv_eeh_bridge_reset(bus->self, option);
+
 	return ret;
 }
 
-- 
2.1.0

* [PATCH v6 27/42] powerpc/powernv: Don't cover root bus in pnv_pci_reset_secondary_bus()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (23 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 26/42] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 29/42] powerpc/pci: Don't scan empty slot Gavin Shan
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

pnv_pci_reset_secondary_bus() is invoked by pcibios_reset_secondary_bus()
on the PowerNV platform. The latter can't be called on the root bus, so
the former needn't cover the root bus either.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 95332e9..19cb947 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1059,16 +1059,8 @@ static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
 
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-	struct pci_controller *hose;
-
-	if (pci_is_root_bus(dev->bus)) {
-		hose = pci_bus_to_host(dev->bus);
-		pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-		pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
-	} else {
-		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
-	}
+	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
 /**
-- 
2.1.0

* [PATCH v6 28/42] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2015-08-06  4:11   ` [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 32/42] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Some PCI devices behind a particular PCI bus might require a
fundamental reset because the default (hot) reset isn't sufficient
for them to come up successfully afterwards.

This iterates over all PCI devices behind the specified PCI bus and
issues a fundamental reset if any of them requests one. Otherwise, a
hot reset is still issued.
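
The decision described above can be modelled outside the kernel. In this sketch, struct fake_dev and walk_devs() are hypothetical simplifications of struct pci_dev and pci_walk_bus() over a flat array, and the reset constants are illustrative values. The callback follows the same convention as pci_walk_bus(): returning non-zero stops the walk, so the first device with needs_freset set ends the iteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal device record; only the flag we need. */
struct fake_dev {
	bool needs_freset;
};

/* Accumulate the flag; a non-zero return stops the walk. */
int dev_reset_type(struct fake_dev *dev, void *data)
{
	int *freset = data;

	*freset |= dev->needs_freset;
	return *freset;
}

/* Simplified stand-in for pci_walk_bus() over a flat array. */
void walk_devs(struct fake_dev *devs, size_t n,
	       int (*cb)(struct fake_dev *, void *), void *data)
{
	for (size_t i = 0; i < n; i++)
		if (cb(&devs[i], data))
			break;
}

enum { RESET_HOT = 1, RESET_FUNDAMENTAL = 2 };	/* illustrative values */

/* Fundamental reset if any device asks for it, hot reset otherwise. */
int choose_bus_reset(struct fake_dev *devs, size_t n)
{
	int freset = 0;

	walk_devs(devs, n, dev_reset_type, &freset);
	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```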

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 19cb947..4ae48ff 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1057,8 +1057,31 @@ static int pnv_eeh_vf_pe_reset(struct eeh_pe *pe, int option)
 	return 0;
 }
 
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
+{
+	int *freset = data;
+
+	/*
+	 * Stop the iteration immediately if there is any
+	 * one PCI device requesting fundamental reset
+	 */
+	*freset |= pdev->needs_freset;
+	return *freset;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
+	int option = EEH_RESET_HOT;
+
+	if (dev->subordinate) {
+		int freset = 0;
+
+		pci_walk_bus(dev->subordinate,
+			     pnv_pci_dev_reset_type,
+			     &freset);
+		option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
+	}
+
 	pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
 	pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
-- 
2.1.0

* [PATCH v6 29/42] powerpc/pci: Don't scan empty slot
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (24 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 27/42] powerpc/powernv: Don't cover root bus in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 30/42] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

In the hotplug case, pcibios_add_pci_devices() is called to rescan
the specified PCI bus, which might not have any child devices.
Accessing the child device node of such a bus crashes the kernel.

This adds a check so that the legacy probe path skips a PCI bus
without child devices, avoiding the crash.
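
The added guard can be sketched in isolation. struct fake_node and its pci_data field are hypothetical stand-ins for struct device_node and PCI_DN(); the point is only the order of the checks, which must confirm that a child exists before looking at its PCI data. The device-tree probe path (PCI_PROBE_DEVTREE) is not affected by the change and is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node: first child plus attached PCI data. */
struct fake_node {
	struct fake_node *child;
	void *pci_data;		/* stands in for PCI_DN(node) */
};

/*
 * Allow the legacy scan only when the bus's node has a child and that
 * child carries PCI data; otherwise there is nothing to scan and
 * dereferencing the child would be unsafe.
 */
bool should_legacy_scan(const struct fake_node *dn)
{
	return dn && dn->child && dn->child->pci_data;
}
```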

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 59c4361..c307d9a 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -92,7 +92,8 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	if (mode == PCI_PROBE_DEVTREE) {
 		/* use ofdt-based probe */
 		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
+	} else if (mode == PCI_PROBE_NORMAL &&
+		   dn->child && PCI_DN(dn->child)) {
 		/*
 		 * Use legacy probe. In the partial hotplug case, we
 		 * probably have grandchildren devices unplugged. So
-- 
2.1.0

* [PATCH v6 30/42] powerpc/pci: Move pcibios_find_pci_bus() around
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (25 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 29/42] powerpc/pci: Don't scan empty slot Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 31/42] powerpc/pci: Rename pcibios_{add,remove}_pci_devices Gavin Shan
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This moves pcibios_find_pci_bus() to the PowerPC kernel directory
so that it can be shared by the hotplug code of both the pSeries and
PowerNV platforms. The function is also renamed to
of_node_to_pci_bus().
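
The lookup that of_node_to_pci_bus() performs is a depth-first search of the bus hierarchy, starting at the PHB's root bus and returning the bus whose device node matches. A stand-alone sketch, using a hypothetical struct fake_bus (a device node pointer plus a NULL-terminated child array in place of the kernel's bus->children list):

```c
#include <stddef.h>

/* Hypothetical bus tree: each bus knows its device node and children. */
struct fake_bus {
	const void *of_node;		/* stands in for pci_bus_to_OF_node() */
	struct fake_bus **children;	/* NULL-terminated array, may be NULL */
};

/* Depth-first search for the bus whose device node is dn. */
struct fake_bus *find_bus(struct fake_bus *bus, const void *dn)
{
	if (bus->of_node == dn)
		return bus;

	for (struct fake_bus **c = bus->children; c && *c; c++) {
		struct fake_bus *hit = find_bus(*c, dn);

		if (hit)
			return hit;
	}
	return NULL;
}
```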

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/pci-bridge.h      |  2 +-
 arch/powerpc/kernel/pci-hotplug.c          | 30 ++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 32 ------------------------------
 drivers/pci/hotplug/rpadlpar_core.c        |  6 +++---
 drivers/pci/hotplug/rpaphp_pci.c           |  2 +-
 5 files changed, 35 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 65357a9..84dee1e 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -259,7 +259,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 #endif
 
 /** Find the bus corresponding to the indicated device node */
-extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
+extern struct pci_bus *of_node_to_pci_bus(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
 extern void pcibios_remove_pci_devices(struct pci_bus *bus);
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index c307d9a..692beca 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,36 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+static struct pci_bus *find_pci_bus(struct pci_bus *bus,
+				    struct device_node *dn)
+{
+	struct pci_bus *tmp, *child = NULL;
+	struct device_node *busdn;
+
+	busdn = pci_bus_to_OF_node(bus);
+	if (busdn == dn)
+		return bus;
+
+	list_for_each_entry(tmp, &bus->children, node) {
+		child = find_pci_bus(tmp, dn);
+		if (child)
+			break;
+	}
+
+	return child;
+}
+
+struct pci_bus *of_node_to_pci_bus(struct device_node *dn)
+{
+	struct pci_dn *pdn = dn->data;
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_pci_bus(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(of_node_to_pci_bus);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,38 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-                        struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct pci_bus *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each_entry(tmp, &bus->children, node) {
-		child = find_bus_among_children(tmp, dn);
-		if (child)
-			break;
-	};
-	return child;
-}
-
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index e12bafd..f57a293 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -176,7 +176,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
 	struct pci_dev *dev;
 	struct pci_controller *phb;
 
-	if (pcibios_find_pci_bus(dn))
+	if (of_node_to_pci_bus(dn))
 		return -EINVAL;
 
 	/* Add pci bus */
@@ -213,7 +213,7 @@ static int dlpar_remove_phb(char *drc_name, struct device_node *dn)
 	struct pci_dn *pdn;
 	int rc = 0;
 
-	if (!pcibios_find_pci_bus(dn))
+	if (!of_node_to_pci_bus(dn))
 		return -EINVAL;
 
 	/* If pci slot is hotpluggable, use hotplug to remove it */
@@ -357,7 +357,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 
 	pci_lock_rescan_remove();
 
-	bus = pcibios_find_pci_bus(dn);
+	bus = of_node_to_pci_bus(dn);
 	if (!bus) {
 		ret = -EINVAL;
 		goto out;
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 9243f3e7..293bd86 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
 	if (rc)
 		return rc;
 
-	bus = pcibios_find_pci_bus(slot->dn);
+	bus = of_node_to_pci_bus(slot->dn);
 	if (!bus) {
 		err("%s: no pci_bus for dn %s\n", __func__, slot->dn->full_name);
 		return -EINVAL;
-- 
2.1.0

* [PATCH v6 31/42] powerpc/pci: Rename pcibios_{add,remove}_pci_devices
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (26 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 30/42] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
with the names of weak functions in the PCI subsystem. No logical
changes are introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h |  4 ++--
 arch/powerpc/kernel/eeh_driver.c      | 12 ++++++------
 arch/powerpc/kernel/pci-hotplug.c     | 15 +++++++--------
 drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
 drivers/pci/hotplug/rpaphp_core.c     |  4 ++--
 drivers/pci/hotplug/rpaphp_pci.c      |  2 +-
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 84dee1e..787a879 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -262,10 +262,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 extern struct pci_bus *of_node_to_pci_bus(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
+extern void pci_remove_pci_devices(struct pci_bus *bus);
 
 /** Discover new pci devices under this bus, and add them */
-extern void pcibios_add_pci_devices(struct pci_bus *bus);
+extern void pci_add_pci_devices(struct pci_bus *bus);
 
 
 extern void isa_bridge_find_early(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 99868e2..290a9df 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -600,7 +600,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 * We don't remove the corresponding PE instances because
 	 * we need the information afterwords. The attached EEH
 	 * devices are expected to be attached soon when calling
-	 * into pcibios_add_pci_devices().
+	 * into pci_add_pci_devices().
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
@@ -608,7 +608,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
 		else {
 			pci_lock_rescan_remove();
-			pcibios_remove_pci_devices(bus);
+			pci_remove_pci_devices(bus);
 			pci_unlock_rescan_remove();
 		}
 	} else if (frozen_bus)
@@ -658,7 +658,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		if (pe->type & EEH_PE_VF)
 			eeh_add_virt_device(edev, NULL);
 		else
-			pcibios_add_pci_devices(bus);
+			pci_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
@@ -668,7 +668,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		if (pe->type & EEH_PE_VF)
 			eeh_add_virt_device(edev, NULL);
 		else
-			pcibios_add_pci_devices(frozen_bus);
+			pci_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -852,7 +852,7 @@ perm_error:
 		} else {
 			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 			pci_lock_rescan_remove();
-			pcibios_remove_pci_devices(frozen_bus);
+			pci_remove_pci_devices(frozen_bus);
 			pci_unlock_rescan_remove();
 		}
 	}
@@ -936,7 +936,7 @@ static void eeh_handle_special_event(void)
 				bus = eeh_pe_bus_get(phb_pe);
 				eeh_pe_dev_traverse(pe,
 					eeh_report_failure, NULL);
-				pcibios_remove_pci_devices(bus);
+				pci_remove_pci_devices(bus);
 			}
 			pci_unlock_rescan_remove();
 		}
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 692beca..00f193b 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -68,20 +68,20 @@ void pcibios_release_device(struct pci_dev *dev)
 }
 
 /**
- * pcibios_remove_pci_devices - remove all devices under this bus
+ * pci_remove_pci_devices - remove all devices under this bus
  * @bus: the indicated PCI bus
  *
  * Remove all of the PCI devices under this bus both from the
  * linux pci device tree, and from the powerpc EEH address cache.
  */
-void pcibios_remove_pci_devices(struct pci_bus *bus)
+void pci_remove_pci_devices(struct pci_bus *bus)
 {
 	struct pci_dev *dev, *tmp;
 	struct pci_bus *child_bus;
 
 	/* First go down child busses */
 	list_for_each_entry(child_bus, &bus->children, node)
-		pcibios_remove_pci_devices(child_bus);
+		pci_remove_pci_devices(child_bus);
 
 	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 		 pci_domain_nr(bus),  bus->number);
@@ -90,11 +90,10 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 		pci_stop_and_remove_bus_device(dev);
 	}
 }
-
-EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
+EXPORT_SYMBOL_GPL(pci_remove_pci_devices);
 
 /**
- * pcibios_add_pci_devices - adds new pci devices to bus
+ * pci_add_pci_devices - adds new pci devices to bus
  * @bus: the indicated PCI bus
  *
  * This routine will find and fixup new pci devices under
@@ -104,7 +103,7 @@ EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
  * is how this routine differs from other, similar pcibios
  * routines.)
  */
-void pcibios_add_pci_devices(struct pci_bus * bus)
+void pci_add_pci_devices(struct pci_bus *bus)
 {
 	int slotno, mode, pass, max;
 	struct pci_dev *dev;
@@ -145,4 +144,4 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 	}
 	pcibios_finish_adding_to_bus(bus);
 }
-EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
+EXPORT_SYMBOL_GPL(pci_add_pci_devices);
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index f57a293..8870557 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -381,7 +381,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
 	}
 
 	/* Remove all devices below slot */
-	pcibios_remove_pci_devices(bus);
+	pci_remove_pci_devices(bus);
 
 	/* Unmap PCI IO space */
 	if (pcibios_unmap_io_space(bus)) {
diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index f2945fa..3034693 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -405,7 +405,7 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
 
 	if (state == PRESENT) {
 		pci_lock_rescan_remove();
-		pcibios_add_pci_devices(slot->bus);
+		pci_add_pci_devices(slot->bus);
 		pci_unlock_rescan_remove();
 		slot->state = CONFIGURED;
 	} else if (state == EMPTY) {
@@ -427,7 +427,7 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
 		return -EINVAL;
 
 	pci_lock_rescan_remove();
-	pcibios_remove_pci_devices(slot->bus);
+	pci_remove_pci_devices(slot->bus);
 	pci_unlock_rescan_remove();
 	vm_unmap_aliases();
 
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 293bd86..2b99d48 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -116,7 +116,7 @@ int rpaphp_enable_slot(struct slot *slot)
 		}
 
 		if (list_empty(&bus->devices))
-			pcibios_add_pci_devices(bus);
+			pci_add_pci_devices(bus);
 
 		if (!list_empty(&bus->devices)) {
 			info->adapter_status = CONFIGURED;
-- 
2.1.0

* [PATCH v6 32/42] powerpc/powernv: Introduce pnv_pci_poll()
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-08-06  4:11   ` [PATCH v6 28/42] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus() Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 33/42] powerpc/powernv: Functions to get/reset PCI slot status Gavin Shan
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This converts pnv_eeh_poll() to pnv_pci_poll() in order to:

   * Return a Linux error code rather than an OPAL error code.
   * Take the return value of the preceding OPAL call (the requested
     delay) as an argument and delay accordingly, which saves one
     call to opal_pci_poll().
   * Return more information (e.g. PCI slot power status) if the
     last argument isn't NULL.
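
A user-space sketch of the polling contract, under stated assumptions: the firmware is replaced by a hypothetical scripted mock (mock_poll() and struct mock_fw), sleeping is modelled by adding up the requested delays, and the pval path that polls once more for extra information is left out. It shows why passing in the first OPAL return value saves a poll: the loop waits on that value before polling again.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scripted firmware: a positive value means "busy, retry
 * after this many ms", 0 means success and a negative value is an error.
 */
struct mock_fw {
	const int64_t *script;
	size_t pos;
	unsigned int calls;
	int64_t slept_ms;
};

int64_t mock_poll(struct mock_fw *fw)
{
	fw->calls++;
	return fw->script[fw->pos++];
}

/*
 * rval is the result of the call that started the operation, so the
 * loop sleeps first and only then polls again. Any non-zero final
 * value is reported as -EIO, as in pnv_pci_poll().
 */
int poll_until_done(struct mock_fw *fw, int64_t rval)
{
	while (rval > 0) {
		fw->slept_ms += rval;	/* models msleep(rval) */
		rval = mock_poll(fw);
	}
	return rval ? -EIO : 0;
}
```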

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 47 ++++++----------------------
 arch/powerpc/platforms/powernv/pci.c         | 21 +++++++++++++
 arch/powerpc/platforms/powernv/pci.h         |  1 +
 3 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 4ae48ff..e664542 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -745,28 +745,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
 	return ret;
 }
 
-static s64 pnv_eeh_poll(uint64_t id)
-{
-	s64 rc = OPAL_HARDWARE;
-
-	while (1) {
-		rc = opal_pci_poll(id, NULL);
-		if (rc <= 0)
-			break;
-
-		if (system_state < SYSTEM_RUNNING)
-			udelay(1000 * rc);
-		else
-			msleep(rc);
-	}
-
-	return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -781,8 +764,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PHB_COMPLETE,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/*
 	 * Poll state of the PHB until the request is done
@@ -790,24 +771,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 	 * reset followed by hot reset on root bus. So we also
 	 * need the PCI bus settlement delay.
 	 */
-	rc = pnv_eeh_poll(phb->opal_id);
-	if (option == EEH_RESET_DEACTIVATE) {
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret) {
 		if (system_state < SYSTEM_RUNNING)
 			udelay(1000 * EEH_PE_RST_SETTLE_TIME);
 		else
 			msleep(EEH_PE_RST_SETTLE_TIME);
 	}
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 {
 	struct pnv_phb *phb = hose->private_data;
 	s64 rc = OPAL_HARDWARE;
+	int ret;
 
 	pr_debug("%s: Reset PHB#%x, option=%d\n",
 		 __func__, hose->global_number, option);
@@ -829,18 +808,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 		rc = opal_pci_reset(phb->opal_id,
 				    OPAL_RESET_PCI_HOT,
 				    OPAL_DEASSERT_RESET);
-	if (rc < 0)
-		goto out;
 
 	/* Poll state of the PHB until the request is done */
-	rc = pnv_eeh_poll(phb->opal_id);
-	if (option == EEH_RESET_DEACTIVATE)
+	ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+	if (option == EEH_RESET_DEACTIVATE && !ret)
 		msleep(EEH_PE_RST_SETTLE_TIME);
-out:
-	if (rc != OPAL_SUCCESS)
-		return -EIO;
 
-	return 0;
+	return ret;
 }
 
 static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
@@ -930,10 +904,7 @@ static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
 	phb = hose->private_data;
 	id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
 	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
-	if (rc > 0)
-		rc = pnv_eeh_poll(id);
-
-	return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+	return pnv_pci_poll(id, rc, NULL);
 }
 
 static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 6c350a2..801e3e8 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -44,6 +44,27 @@
 #define cfg_dbg(fmt...)	do { } while(0)
 //#define cfg_dbg(fmt...)	printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
+{
+	while (rval > 0) {
+		if (system_state < SYSTEM_RUNNING)
+			udelay(1000 * rval);
+		else
+			msleep(rval);
+
+		rval = opal_pci_poll(id, pval);
+	}
+
+	/*
+	 * The caller expects to retrieve additional information
+	 * if the last argument is valid.
+	 */
+	if (rval == OPAL_SUCCESS && pval)
+		rval = opal_pci_poll(id, pval);
+
+	return rval ? -EIO : 0;
+}
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 2058f06..99d2da6 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -214,6 +214,7 @@ extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
 		unsigned long *hpa, enum dma_data_direction *direction);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
 				unsigned char *log_buff);
 int pnv_pci_cfg_read(struct pci_dn *pdn,
-- 
2.1.0

* [PATCH v6 33/42] powerpc/powernv: Functions to get/reset PCI slot status
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-08-06  4:11   ` [PATCH v6 32/42] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 34/42] powerpc/pci: Delay creating pci_dn Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 37/42] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
  6 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The patch exports 4 functions, built on the corresponding OPAL
APIs, to get or set PCI slot status. These functions will be used
by the PCI hotplug module in subsequent patches:

   pnv_pci_get_device_tree()      opal_get_device_tree()
   pnv_pci_get_presence_status()  opal_pci_get_presence_status()
   pnv_pci_get_power_status()     opal_pci_get_power_status()
   pnv_pci_set_power_status()     opal_pci_set_power_status()
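The four wrappers share one shape: check that the firmware implements the OPAL token, issue the call, and fold the OPAL return code into an errno value (-ENXIO for a missing token, -EIO for a firmware error). A minimal user-space sketch of that shape for the presence query follows. The mock_* firmware functions, MOCK_OPAL_ERROR and the single valid slot ID 0x1000 are hypothetical; only the token number and OPAL_SUCCESS come from opal-api.h. The power-status wrappers additionally hand the return code to pnv_pci_poll(), which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* OPAL_SUCCESS and the token number match opal-api.h; MOCK_OPAL_ERROR and
 * every mock_/sketch_ name are hypothetical user-space stand-ins. */
#define OPAL_SUCCESS                 0
#define MOCK_OPAL_ERROR              (-1)
#define OPAL_PCI_GET_PRESENCE_STATUS 118
#define OPAL_LAST                    120

/* Tokens the mock firmware implements (all off by default). */
static bool mock_token_supported[OPAL_LAST + 1];

static bool mock_opal_check_token(uint64_t token)
{
	return token <= OPAL_LAST && mock_token_supported[token];
}

/* Mock firmware call: only slot 0x1000 exists, and it reports "present". */
static int64_t mock_opal_pci_get_presence_status(uint64_t id, uint8_t *status)
{
	if (id != 0x1000)
		return MOCK_OPAL_ERROR;
	*status = 1;
	return OPAL_SUCCESS;
}

/* Same shape as pnv_pci_get_presence_status(): missing token -> -ENXIO,
 * firmware failure -> -EIO, success -> 0. */
int sketch_get_presence_status(uint64_t id, uint8_t *status)
{
	int64_t rc;

	assert(status != NULL);
	if (!mock_opal_check_token(OPAL_PCI_GET_PRESENCE_STATUS))
		return -ENXIO;

	rc = mock_opal_pci_get_presence_status(id, status);
	if (rc != OPAL_SUCCESS)
		return -EIO;

	return 0;
}
```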

The patch also exports pnv_pci_hotplug_notifier_{register,
unregister}() so that a PCI hotplug notifier can be registered and
unregistered. The notifier receives PCI hotplug messages
(OPAL_MSG_PCI_HOTPLUG) from the skiboot firmware.
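The notifier helpers bind a callback to the OPAL_MSG_PCI_HOTPLUG message type through opal_message_notifier_register(). The sketch below models that idea in user space with a per-type, singly linked callback chain. struct sketch_nb, the SKETCH_MSG_* values and the sketch_* functions are hypothetical simplifications (no priorities, no locking), not the kernel's notifier_block implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types; only the hotplug one matters here. */
enum sketch_msg_type { SKETCH_MSG_OTHER, SKETCH_MSG_PCI_HOTPLUG, SKETCH_MSG_MAX };

/* Simplified stand-in for struct notifier_block. */
struct sketch_nb {
	int (*call)(struct sketch_nb *nb, unsigned long type, void *msg);
	struct sketch_nb *next;
};

/* One callback chain per message type. */
static struct sketch_nb *sketch_chains[SKETCH_MSG_MAX];

int sketch_notifier_register(enum sketch_msg_type type, struct sketch_nb *nb)
{
	assert(type < SKETCH_MSG_MAX && nb != NULL);
	nb->next = sketch_chains[type];
	sketch_chains[type] = nb;
	return 0;
}

int sketch_notifier_unregister(enum sketch_msg_type type, struct sketch_nb *nb)
{
	struct sketch_nb **pp;

	assert(type < SKETCH_MSG_MAX);
	for (pp = &sketch_chains[type]; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;	/* unlink from the chain */
			nb->next = NULL;
			return 0;
		}
	}
	return -1;	/* not registered */
}

/* Deliver one firmware message to every callback registered for its type;
 * returns the number of callbacks invoked. */
int sketch_dispatch(enum sketch_msg_type type, void *msg)
{
	struct sketch_nb *nb;
	int calls = 0;

	for (nb = sketch_chains[type]; nb; nb = nb->next) {
		nb->call(nb, type, msg);
		calls++;
	}
	return calls;
}

/* Example driver callback (hypothetical): count hotplug messages. */
static int sketch_hotplug_events;

static int sketch_hotplug_cb(struct sketch_nb *nb, unsigned long type, void *msg)
{
	(void)nb;
	(void)type;
	(void)msg;
	sketch_hotplug_events++;
	return 0;
}
```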

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h            |  8 +++-
 arch/powerpc/include/asm/opal.h                |  5 ++
 arch/powerpc/include/asm/pnv-pci.h             |  7 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
 arch/powerpc/platforms/powernv/pci.c           | 66 ++++++++++++++++++++++++++
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 442995b..33c67ee 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,11 @@
 #define OPAL_FLASH_WRITE			111
 #define OPAL_FLASH_ERASE			112
 #define OPAL_PRD_MSG				113
-#define OPAL_LAST				113
+#define OPAL_GET_DEVICE_TREE			117
+#define OPAL_PCI_GET_PRESENCE_STATUS		118
+#define OPAL_PCI_GET_POWER_STATUS		119
+#define OPAL_PCI_SET_POWER_STATUS		120
+#define OPAL_LAST				120
 
 /* Device tree flags */
 
@@ -361,6 +365,8 @@ enum opal_msg_type {
 	OPAL_MSG_HMI_EVT,
 	OPAL_MSG_DPO,
 	OPAL_MSG_PRD,
+	OPAL_MSG_OCC,
+	OPAL_MSG_PCI_HOTPLUG,
 	OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index bbb3aa6..53b8528 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -203,6 +203,11 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
 		uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
 		uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_get_power_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_set_power_status(uint64_t id, uint8_t status);
+
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..7efa87f 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
 #include <linux/pci.h>
 #include <misc/cxl-base.h>
 
+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_get_power_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_set_power_status(uint64_t id, uint8_t status);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 			   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 88e4333..804f8cc 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -298,3 +298,7 @@ OPAL_CALL(opal_flash_read,			OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,			OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,			OPAL_FLASH_ERASE);
 OPAL_CALL(opal_prd_msg,				OPAL_PRD_MSG);
+OPAL_CALL(opal_get_device_tree,			OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_status,		OPAL_PCI_GET_PRESENCE_STATUS);
+OPAL_CALL(opal_pci_get_power_status,		OPAL_PCI_GET_POWER_STATUS);
+OPAL_CALL(opal_pci_set_power_status,		OPAL_PCI_SET_POWER_STATUS);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 801e3e8..5982110 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -65,6 +65,72 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
 	return rval ? -EIO : 0;
 }
 
+int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
+		return -ENXIO;
+
+	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_device_tree);
+
+int pnv_pci_get_presence_status(uint64_t id, uint8_t *status)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_presence_status(id, status);
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_status);
+
+int pnv_pci_get_power_status(uint64_t id, uint8_t *status)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_GET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_get_power_status(id, status);
+	return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_status);
+
+int pnv_pci_set_power_status(uint64_t id, uint8_t status)
+{
+	int64_t rc;
+
+	if (!opal_check_token(OPAL_PCI_SET_POWER_STATUS))
+		return -ENXIO;
+
+	rc = opal_pci_set_power_status(id, status);
+	return pnv_pci_poll(id, rc, NULL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_set_power_status);
+
+int pnv_pci_hotplug_notifier_register(struct notifier_block *nb)
+{
+	return opal_message_notifier_register(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_register);
+
+int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb)
+{
+	return opal_message_notifier_unregister(OPAL_MSG_PCI_HOTPLUG, nb);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_hotplug_notifier_unregister);
+
 #ifdef CONFIG_PCI_MSI
 int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
2.1.0

* [PATCH v6 34/42] powerpc/pci: Delay creating pci_dn
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-08-06  4:11   ` [PATCH v6 33/42] powerpc/powernv: Functions to get/reset PCI slot status Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  2015-08-06  4:11   ` [PATCH v6 37/42] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
  6 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

The pci_dn instances are allocated from memblock or bootmem when
the PCI controllers (hoses) are created in setup_arch(). PCI
hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pci_dn on unplug events,
and memory chunks allocated from memblock or bootmem are hard to
reuse after being released.

This patch delays creating pci_dn to core_initcall() so that they
can be allocated from slab, and their memory can be reused without
problem after being released. Since pci_dn and eeh_dev have the
same life cycle, the eeh_dev is created when the pci_dn is
populated, so no separate initcall is needed to create eeh_dev.
Creation of the PHB PEs is delayed slightly, from core_initcall()
to core_initcall_sync().
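The slab allocation described above, together with the rollback added to update_dn_pci_info() in this patch, can be modelled in user space as follows. This is a sketch under stated assumptions: calloc()/free() stand in for kzalloc()/kfree(), the sketch_* types are trimmed-down hypothetical versions of pci_dn and eeh_dev, and the sketch_fail_eeh_alloc flag exists only to exercise the failure path. A pci_dn is freed again if its eeh_dev cannot be created, and removal frees both objects, in the spirit of remove_pci_device_node_info() from the next patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_pci_dn;

/* Hypothetical, trimmed-down stand-ins for eeh_dev and pci_dn. */
struct sketch_eeh_dev {
	struct sketch_pci_dn *pdn;
};

struct sketch_pci_dn {
	int busno;
	int devfn;
	struct sketch_eeh_dev *edev;
};

/* Test hook: simulate an allocation failure in the EEH step. */
static bool sketch_fail_eeh_alloc;

static struct sketch_eeh_dev *sketch_eeh_dev_init(struct sketch_pci_dn *pdn)
{
	struct sketch_eeh_dev *edev;

	if (sketch_fail_eeh_alloc)
		return NULL;
	edev = calloc(1, sizeof(*edev));
	if (!edev)
		return NULL;
	edev->pdn = pdn;	/* link both directions */
	pdn->edev = edev;
	return edev;
}

/* Both objects come from the heap (slab in the kernel), so both can be
 * freed and their memory reused later. */
struct sketch_pci_dn *sketch_add_pdn(int busno, int devfn)
{
	struct sketch_pci_dn *pdn = calloc(1, sizeof(*pdn));

	if (!pdn)
		return NULL;
	pdn->busno = busno;
	pdn->devfn = devfn;

	/* Same rollback as the patch: no eeh_dev means no pci_dn either. */
	if (!sketch_eeh_dev_init(pdn)) {
		free(pdn);
		return NULL;
	}
	return pdn;
}

/* Shared life cycle: the eeh_dev goes away together with its pci_dn. */
void sketch_remove_pdn(struct sketch_pci_dn *pdn)
{
	if (!pdn)
		return;
	free(pdn->edev);
	free(pdn);
}
```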

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h         |  2 +-
 arch/powerpc/include/asm/ppc-pci.h     |  1 -
 arch/powerpc/kernel/eeh_dev.c          | 19 ++++------------
 arch/powerpc/kernel/pci_dn.c           | 20 +++++++++++++++--
 arch/powerpc/platforms/maple/pci.c     | 34 ++++++++++++++++++-----------
 arch/powerpc/platforms/pasemi/pci.c    |  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 40 ++++++++++++++++++++--------------
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  7 +-----
 9 files changed, 69 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index ea1f13c4..19b6050 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -272,7 +272,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn, struct pci_controller *phb);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index ca0c5bf..916775d 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -40,7 +40,6 @@ void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
 
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index aabba94..7a135c1 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,13 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
+ * @phb: PCI controller
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn, struct pci_controller *phb)
 {
-	struct pci_controller *phb = data;
 	struct eeh_dev *edev;
 
 	/* Allocate EEH device */
@@ -68,7 +67,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
 	edev->phb = phb;
 	INIT_LIST_HEAD(&edev->list);
 
-	return NULL;
+	return edev;
 }
 
 /**
@@ -80,16 +79,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-	struct pci_dn *root = phb->pci_data;
-
 	/* EEH PE for PHB */
 	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(root, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -105,9 +96,7 @@ static int __init eeh_dev_phb_init(void)
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(phb);
 
-	pr_info("EEH: devices created\n");
-
 	return 0;
 }
 
-core_initcall(eeh_dev_phb_init);
+core_initcall_sync(eeh_dev_phb_init);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f0ddde7..53a11e9 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -290,8 +290,11 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	const __be32 *regs;
 	struct device_node *parent;
 	struct pci_dn *pdn;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev;
+#endif
 
-	pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL);
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
 	if (pdn == NULL)
 		return NULL;
 	dn->data = pdn;
@@ -320,6 +323,15 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 	/* Extended config space */
 	pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1);
 
+	/* Initialize EEH device */
+#ifdef CONFIG_EEH
+	edev = eeh_dev_init(pdn, phb);
+	if (!edev) {
+		kfree(pdn);
+		return NULL;
+	}
+#endif
+
 	/* Attach to parent node */
 	INIT_LIST_HEAD(&pdn->child_list);
 	INIT_LIST_HEAD(&pdn->list);
@@ -465,15 +477,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
  * pci device found underneath.  This routine runs once,
  * early in the boot sequence.
  */
-void __init pci_devs_phb_init(void)
+static int __init pci_devs_phb_init(void)
 {
 	struct pci_controller *phb, *tmp;
 
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
+
+	return 0;
 }
 
+core_initcall(pci_devs_phb_init);
+
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn;
diff --git a/arch/powerpc/platforms/maple/pci.c b/arch/powerpc/platforms/maple/pci.c
index a923230..a2f89e6 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -568,6 +568,26 @@ void maple_pci_irq_fixup(struct pci_dev *dev)
 	DBG(" <- maple_pci_irq_fixup\n");
 }
 
+static int maple_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumption, hopefully.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+
 void __init maple_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -605,19 +625,7 @@ void __init maple_pci_init(void)
 	if (ht && maple_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */ 
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions hopefully.
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
+	ppc_md.pcibios_root_bridge_prepare = maple_pci_root_bridge_prepare;
 
 	/* Tell pci.c to not change any resource allocations.  */
 	pci_add_flags(PCI_PROBE_ONLY);
diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index f3a68a0..10c4e8f 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -229,9 +229,6 @@ void __init pas_pci_init(void)
 			of_node_get(np);
 
 	of_node_put(root);
-
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
 }
 
 void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
diff --git a/arch/powerpc/platforms/powermac/pci.c b/arch/powerpc/platforms/powermac/pci.c
index 59ab16f..20d7bde 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -878,6 +878,29 @@ void pmac_pci_irq_fixup(struct pci_dev *dev)
 #endif /* CONFIG_PPC32 */
 }
 
+#ifdef CONFIG_PPC64
+static int pmac_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct pci_controller *hose = pci_bus_to_host(bridge->bus);
+	struct device_node *np, *child;
+
+	if (hose != u3_agp)
+		return 0;
+
+	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
+	 * assume there is no P2P bridge on the AGP bus, which should be a
+	 * safe assumption for now. We should do something better in the
+	 * future, though.
+	 */
+	np = hose->dn;
+	PCI_DN(np)->busno = 0xf0;
+	for_each_child_of_node(np, child)
+		PCI_DN(child)->busno = 0xf0;
+
+	return 0;
+}
+#endif /* CONFIG_PPC64 */
+
 void __init pmac_pci_init(void)
 {
 	struct device_node *np, *root;
@@ -914,22 +937,7 @@ void __init pmac_pci_init(void)
 	if (ht && pmac_add_bridge(ht) != 0)
 		of_node_put(ht);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
-	/* Fixup the PCI<->OF mapping for U3 AGP due to bus renumbering. We
-	 * assume there is no P2P bridge on the AGP bus, which should be a
-	 * safe assumptions for now. We should do something better in the
-	 * future though
-	 */
-	if (u3_agp) {
-		struct device_node *np = u3_agp->dn;
-		PCI_DN(np)->busno = 0xf0;
-		for (np = np->child; np; np = np->sibling)
-			PCI_DN(np)->busno = 0xf0;
-	}
-	/* pmac_check_ht_link(); */
-
+	ppc_md.pcibios_root_bridge_prepare = pmac_pci_root_bridge_prepare;
 #else /* CONFIG_PPC64 */
 	init_p2pbridge();
 	init_second_ohare();
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 5982110..b13186d 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -935,9 +935,6 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);
 
-	/* Setup the linkage between OF nodes and PHBs */
-	pci_devs_phb_init();
-
 	/* Configure IOMMU DMA hooks */
 	set_pci_dma_ops(&dma_iommu_ops);
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index df6a704..92974aa 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -261,12 +261,8 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	switch (action) {
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
-		if (pci) {
+		if (pci)
 			update_dn_pci_info(np, pci->phb);
-
-			/* Create EEH device for the OF node */
-			eeh_dev_init(PCI_DN(np), pci->phb);
-		}
 		break;
 	default:
 		err = NOTIFY_DONE;
@@ -482,7 +478,6 @@ static void __init find_and_init_phbs(void)
 	}
 
 	of_node_put(root);
-	pci_devs_phb_init();
 
 	/*
 	 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
-- 
2.1.0

* [PATCH v6 35/42] powerpc/pci: Export traverse_pci_device_nodes()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (28 preceding siblings ...)
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 36/42] powerpc/pci: Update bridge windows on PCI plugging Gavin Shan
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Previously, pdn was never removed because PCI hotplug wasn't
supported: update_dn_pci_info() is called at boot time to create
pdn for PCI device nodes. This changes once PCI hotplug is
supported.

This renames update_dn_pci_info() to add_pci_device_node_info()
and traverse_pci_devices() to traverse_pci_device_nodes(), and
changes the traverse_func callback to take a struct pci_controller *
instead of void *. It also adds remove_pci_device_node_info(),
which a subsequent patch uses when PCI devices are unplugged. All
of these functions are exported for use by the PowerNV hotplug
driver.
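A simplified user-space model of a traversal with a typed callback follows. The sketch_node and sketch_phb types are hypothetical: a boolean is_bridge flag replaces the class-code check, and the walk is written recursively, whereas traverse_pci_device_nodes() walks iteratively through parent and sibling links. As in the kernel function, the callback runs on each node in pre-order, a non-NULL return stops the walk, and only bridges are descended into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device node: child/sibling links plus a flag
 * standing in for the "class code says PCI bridge" check. */
struct sketch_node {
	const char *name;
	bool is_bridge;
	struct sketch_node *child;
	struct sketch_node *sibling;
};

/* Hypothetical context, playing the role of struct pci_controller *phb. */
struct sketch_phb {
	int visited;
};

typedef void *(*sketch_traverse_func)(struct sketch_node *dn,
				      struct sketch_phb *phb);

/* Pre-order walk below @start: call @pre on each node, stop as soon as it
 * returns non-NULL, and only descend into nodes flagged as bridges. */
void *sketch_traverse(struct sketch_node *start, sketch_traverse_func pre,
		      struct sketch_phb *phb)
{
	struct sketch_node *dn;
	void *ret;

	assert(start != NULL);
	for (dn = start->child; dn; dn = dn->sibling) {
		if (pre && (ret = pre(dn, phb)) != NULL)
			return ret;
		if (dn->is_bridge && dn->child) {
			ret = sketch_traverse(dn, pre, phb);
			if (ret)
				return ret;
		}
	}
	return NULL;
}

/* Example callback (hypothetical): count nodes, stop at one named "stop". */
static void *sketch_count_until_stop(struct sketch_node *dn,
				     struct sketch_phb *phb)
{
	phb->visited++;
	return strcmp(dn->name, "stop") == 0 ? dn : NULL;
}
```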

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h  |  4 ++-
 arch/powerpc/include/asm/ppc-pci.h     |  8 ++++--
 arch/powerpc/kernel/pci_dn.c           | 51 +++++++++++++++++++++++++++++-----
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 4 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 787a879..010eb54 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -237,7 +237,9 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern void *add_pci_device_node_info(struct device_node *dn,
+				      struct pci_controller *phb);
+extern void remove_pci_device_node_info(struct device_node *np);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
 					  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 916775d..c87ed42 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,11 @@ extern struct pci_dev *isa_bridge_pcidev;	/* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data);
+typedef void *(*traverse_func)(struct device_node *me,
+			       struct pci_controller *phb);
+void *traverse_pci_device_nodes(struct device_node *start,
+				traverse_func pre,
+				struct pci_controller *phb);
 void *traverse_pci_dn(struct pci_dn *root,
 		      void *(*fn)(struct pci_dn *, void *),
 		      void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 53a11e9..3a38a55 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -283,9 +283,9 @@ void remove_dev_pci_data(struct pci_dev *pdev)
  * Traverse_func that inits the PCI fields of the device node.
  * NOTE: this *must* be done before read/write config to the device.
  */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+void *add_pci_device_node_info(struct device_node *dn,
+			       struct pci_controller *phb)
 {
-	struct pci_controller *phb = data;
 	const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
 	const __be32 *regs;
 	struct device_node *parent;
@@ -342,6 +342,42 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
 
 	return NULL;
 }
+EXPORT_SYMBOL(add_pci_device_node_info);
+
+/**
+ * remove_pci_device_node_info - Remove pci_dn from PCI device node
+ * @np: PCI device node
+ *
+ * Remove pci_dn from PCI device node. The pci_dn is also removed
+ * from the child list of the parent pci_dn.
+ */
+void remove_pci_device_node_info(struct device_node *np)
+{
+	struct pci_dn *pdn = np ? PCI_DN(np) : NULL;
+#ifdef CONFIG_EEH
+	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+#endif
+
+	if (!pdn)
+		return;
+
+#ifdef CONFIG_EEH
+	if (edev) {
+		pdn->edev = NULL;
+		kfree(edev);
+	}
+#endif
+
+	BUG_ON(!list_empty(&pdn->child_list));
+	list_del(&pdn->list);
+	if (pdn->parent)
+		of_node_put(pdn->parent->node);
+
+	np->data = NULL;
+	kfree(pdn);
+}
+EXPORT_SYMBOL(remove_pci_device_node_info);
+
 
 /*
  * Traverse a device tree stopping each PCI device in the tree.
@@ -361,8 +397,8 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-		void *data)
+void *traverse_pci_device_nodes(struct device_node *start, traverse_func pre,
+				struct pci_controller *phb)
 {
 	struct device_node *dn, *nextdn;
 	void *ret;
@@ -377,7 +413,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 		if (classp)
 			class = of_read_number(classp, 1);
 
-		if (pre && ((ret = pre(dn, data)) != NULL))
+		if (pre && ((ret = pre(dn, phb)) != NULL))
 			return ret;
 
 		/* If we are a PCI bridge, go down */
@@ -400,6 +436,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
 	}
 	return NULL;
 }
+EXPORT_SYMBOL(traverse_pci_device_nodes);
 
 static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
 				      struct pci_dn *pdn)
@@ -455,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	struct pci_dn *pdn;
 
 	/* PHB nodes themselves must not match */
-	update_dn_pci_info(dn, phb);
+	add_pci_device_node_info(dn, phb);
 	pdn = dn->data;
 	if (pdn) {
 		pdn->devfn = pdn->busno = -1;
@@ -465,7 +502,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
 	}
 
 	/* Update dn->phb ptrs for new phb and children devices */
-	traverse_pci_devices(dn, update_dn_pci_info, phb);
+	traverse_pci_device_nodes(dn, add_pci_device_node_info, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 92974aa..ed8c894 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -262,7 +262,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
 	case OF_RECONFIG_ATTACH_NODE:
 		pci = np->parent->data;
 		if (pci)
-			update_dn_pci_info(np, pci->phb);
+			add_pci_device_node_info(np, pci->phb);
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.1.0

* [PATCH v6 36/42] powerpc/pci: Update bridge windows on PCI plugging
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (29 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 35/42] powerpc/pci: Export traverse_pci_device_nodes() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level Gavin Shan
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

During a PCI plugging event, the PCI devices are rescanned and
their IO and MMIO resources are reassigned. The PowerNV platform
assigns PE# based on those resources, which requires the windows
of the bridge leading to the PE's primary bus to be updated.

This patch updates the windows of the bridge of the PE's primary
bus when a valid bridge exists. Otherwise, the bus is assumed to be
a root bus or an SR-IOV virtual bus, and no PE is assigned at PCI
plugging time.
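The choice added to pcibios_finish_adding_to_bus(), and the reason a bridge window has to move when child resources are reassigned, can be sketched as below. This is an illustration only and does not reproduce the PCI core's sizing algorithm: the sketch_* names are hypothetical, and the window is simply the smallest range covering every child BAR, widened to the 1 MiB granularity of a PCI-to-PCI bridge memory window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SZ_1M 0x100000ULL

struct sketch_res {
	uint64_t start;
	uint64_t end;	/* inclusive, as in struct resource */
};

enum sketch_assign { SKETCH_ASSIGN_BRIDGE, SKETCH_ASSIGN_BUS };

/* Mirrors the new branch: a bus with an upstream bridge device (bus->self)
 * gets its bridge windows (re)assigned; otherwise only bus resources. */
enum sketch_assign sketch_pick_assign(const void *bus_self)
{
	return bus_self ? SKETCH_ASSIGN_BRIDGE : SKETCH_ASSIGN_BUS;
}

/* Smallest window covering all child BARs, widened to 1 MiB granularity.
 * Returns -1 when there are no BARs to cover. */
int sketch_bridge_window(const struct sketch_res *bars, size_t n,
			 struct sketch_res *win)
{
	uint64_t lo = UINT64_MAX, hi = 0;
	size_t i;

	assert(win != NULL);
	if (n == 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (bars[i].start < lo)
			lo = bars[i].start;
		if (bars[i].end > hi)
			hi = bars[i].end;
	}
	win->start = lo & ~(SKETCH_SZ_1M - 1);	/* round base down */
	win->end = hi | (SKETCH_SZ_1M - 1);	/* round limit up */
	return 0;
}
```

If a hot-plugged device's BARs land outside the old window, the recomputed window changes, which is why the bridge (and not only the bus) has to be updated before a PE# can be derived from it.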

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 9c88dcd1..713559d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1473,8 +1473,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
 	/* Allocate bus and devices resources */
 	pcibios_allocate_bus_resources(bus);
 	pcibios_claim_one_bus(bus);
-	if (!pci_has_flag(PCI_PROBE_ONLY))
-		pci_assign_unassigned_bus_resources(bus);
+	if (!pci_has_flag(PCI_PROBE_ONLY)) {
+		if (bus->self)
+			pci_assign_unassigned_bridge_resources(bus->self);
+		else
+			pci_assign_unassigned_bus_resources(bus);
+	}
 
 	/* Fixup EEH */
 	eeh_add_device_tree_late(bus);
-- 
2.1.0

* [PATCH v6 37/42] powerpc/powernv: Select OF_DYNAMIC
       [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-08-06  4:11   ` [PATCH v6 34/42] powerpc/pci: Delay creating pci_dn Gavin Shan
@ 2015-08-06  4:11   ` Gavin Shan
  6 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

Device tree nodes will be changed dynamically on PCI hotplug
events on the PowerNV platform. This selects CONFIG_OF_DYNAMIC
for PowerNV to support that.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 604190c..e7b1ad7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,6 +18,7 @@ config PPC_POWERNV
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select PPC_DOORBELL
+	select OF_DYNAMIC
 	default y
 
 config OPAL_PRD
-- 
2.1.0

* [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (30 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 36/42] powerpc/pci: Update bridge windows on PCI plugging Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06 14:09   ` Rob Herring
  2015-11-03 23:16   ` Gavin Shan
  2015-08-06  4:11 ` [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree() Gavin Shan
                   ` (4 subsequent siblings)
  36 siblings, 2 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

unflatten_dt_node() is called recursively to unflatten FDT nodes
with the assumption that the FDT blob has only one root node, which
isn't true when the blob represents a device sub-tree. This patch
extends the function to support device sub-trees that have multiple
nodes at the first level:

   * Rename the original unflatten_dt_node() to __unflatten_dt_node(),
     which now takes the current depth by pointer instead of keeping
     it in a function-local static variable.
   * Add a wrapper, unflatten_dt_node(), which calls
     __unflatten_dt_node() with the current node depth initialized
     to 1 to avoid underflow.
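The parent selection in the new loop can be checked with a small user-space model in which the depth that fdt_next_node() would report for each node is given as an array. The sketch_* names are hypothetical; the control flow follows the __unflatten_dt_node() hunk: a node at the same depth is a sibling and gets the same parent (dad), a deeper node becomes a child of the current node, and a shallower node ends the current level. The input {1, 2, 2, 1, 2} describes a sub-tree with two first-level nodes, the case the single-root code could not handle.

```c
#include <assert.h>

/* Hypothetical model of an FDT walk: depths[i] is the depth reported on
 * arriving at node i (first node at depth 1, like the wrapper's start). */
struct sketch_walk {
	const int *depths;
	int n;		/* number of nodes */
	int pos;	/* index of the node being visited */
	int depth;	/* shared depth, standing in for *depth */
	int *parent;	/* output: parent index, -1 for a first-level node */
};

/* Move to the next node and update the shared depth, like fdt_next_node(). */
static void sketch_advance(struct sketch_walk *w)
{
	w->pos++;
	if (w->pos < w->n)
		w->depth = w->depths[w->pos];
}

/* Same control flow as the new loop in __unflatten_dt_node(). */
static void sketch_unflatten(struct sketch_walk *w, int dad)
{
	int me = w->pos;
	int old_depth = w->depth;

	assert(me < w->n);
	w->parent[me] = dad;
	sketch_advance(w);
	while (w->pos < w->n) {
		if (w->depth < old_depth)
			break;			/* back up a level */
		if (w->depth == old_depth)
			sketch_unflatten(w, dad);	/* sibling: same parent */
		else
			sketch_unflatten(w, me);	/* child of this node */
	}
}

/* Wrapper: like unflatten_dt_node(), start the shared depth at 1. */
void sketch_unflatten_all(const int *depths, int n, int *parent)
{
	struct sketch_walk w = { depths, n, 0, 1, parent };

	if (n > 0)
		sketch_unflatten(&w, -1);
}
```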

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 53 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 0749656..a18a2ce 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -161,7 +161,7 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
 }
 
 /**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * __unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
  * @poffset: pointer to node in flat tree
@@ -171,20 +171,20 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
  * @dryrun: If true, do not allocate device nodes but still calculate needed
  * memory size
  */
-static void * unflatten_dt_node(const void *blob,
+static void *__unflatten_dt_node(const void *blob,
 				void *mem,
 				int *poffset,
 				struct device_node *dad,
 				struct device_node **nodepp,
 				unsigned long fpsize,
-				bool dryrun)
+				bool dryrun,
+				int *depth)
 {
 	const __be32 *p;
 	struct device_node *np;
 	struct property *pp, **prev_pp = NULL;
 	const char *pathp;
 	unsigned int l, allocl;
-	static int depth = 0;
 	int old_depth;
 	int offset;
 	int has_name = 0;
@@ -337,13 +337,25 @@ static void * unflatten_dt_node(const void *blob,
 			np->type = "<NULL>";
 	}
 
-	old_depth = depth;
-	*poffset = fdt_next_node(blob, *poffset, &depth);
-	if (depth < 0)
-		depth = 0;
-	while (*poffset > 0 && depth > old_depth)
-		mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
-					fpsize, dryrun);
+	/* Multiple nodes may sit at the first level if the
+	 * device tree is a sub-tree. All nodes at the current
+	 * or a deeper level are unflattened by the time this returns.
+	 */
+	old_depth = *depth;
+	*poffset = fdt_next_node(blob, *poffset, depth);
+	while (*poffset > 0) {
+		if (*depth < old_depth)
+			break;
+
+		if (*depth == old_depth)
+			mem = __unflatten_dt_node(blob, mem, poffset,
+						  dad, NULL, fpsize,
+						  dryrun, depth);
+		else if (*depth > old_depth)
+			mem = __unflatten_dt_node(blob, mem, poffset,
+						  np, NULL, fpsize,
+						  dryrun, depth);
+	}
 
 	if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
 		pr_err("unflatten: error %d processing FDT\n", *poffset);
@@ -369,6 +381,20 @@ static void * unflatten_dt_node(const void *blob,
 	return mem;
 }
 
+static void *unflatten_dt_node(const void *blob,
+			       void *mem,
+			       int *poffset,
+			       struct device_node *dad,
+			       struct device_node **nodepp,
+			       bool dryrun)
+{
+	int depth = 1;
+
+	return __unflatten_dt_node(blob, mem, poffset,
+				   dad, nodepp, 0,
+				   dryrun, &depth);
+}
+
 /**
  * __unflatten_device_tree - create tree of device_nodes from flat blob
  *
@@ -408,7 +434,8 @@ static void __unflatten_device_tree(const void *blob,
 
 	/* First pass, scan for size */
 	start = 0;
-	size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
+	size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
+						NULL, NULL, true);
 	size = ALIGN(size, 4);
 
 	pr_debug("  size is %lx, allocating...\n", size);
@@ -423,7 +450,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	/* Second pass, do actual unflattening */
 	start = 0;
-	unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
+	unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
-- 
2.1.0

* [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (31 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-10 22:42   ` Frank Rowand
  2015-08-06  4:11 ` [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree() Gavin Shan
                   ` (3 subsequent siblings)
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This adds an argument to of_fdt_unflatten_tree() to specify the root
node to which the unflattened FDT blob is attached. As a result, the
PowerNV hotplug driver can use the function to unflatten an FDT blob
that represents a device sub-tree.
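The new fpsize seed matters because a node's full name is its parent's full name followed by "/" and its unit name. A minimal sketch of that composition, assuming a hypothetical build_full_name() helper rather than the actual allocation logic in __unflatten_dt_node(), shows why the parent's path length has to be known when nodes are attached beneath an existing parent:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<parent>/<unit>" the way a node's full
 * name is derived from its parent. A NULL or "/" parent means the node
 * sits directly under the root, so no prefix is copied. The prefix
 * length plays the role the patch now seeds into fpsize.
 */
static char *build_full_name(const char *parent_full, const char *unit)
{
	size_t fpsize = 0;
	char *fn;

	assert(unit != NULL);
	if (parent_full && strcmp(parent_full, "/") != 0)
		fpsize = strlen(parent_full);

	/* prefix + '/' + unit name + NUL */
	fn = malloc(fpsize + 1 + strlen(unit) + 1);
	if (!fn)
		return NULL;

	if (fpsize)
		memcpy(fn, parent_full, fpsize);
	fn[fpsize] = '/';
	strcpy(fn + fpsize + 1, unit);
	return fn;
}
```

With a hypothetical parent of "/pciex@3fff000400000" and unit name "pci@0", the result is "/pciex@3fff000400000/pci@0"; a NULL or "/" parent yields "/pci@0".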

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c       | 13 ++++++++-----
 drivers/of/unittest.c  |  2 +-
 include/linux/of_fdt.h |  1 +
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index a18a2ce..074870a 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -388,10 +388,11 @@ static void *unflatten_dt_node(const void *blob,
 			       struct device_node **nodepp,
 			       bool dryrun)
 {
+	unsigned long fpsize = dad ? strlen(of_node_full_name(dad)) : 0;
 	int depth = 1;
 
 	return __unflatten_dt_node(blob, mem, poffset,
-				   dad, nodepp, 0,
+				   dad, nodepp, fpsize,
 				   dryrun, &depth);
 }
 
@@ -408,6 +409,7 @@ static void *unflatten_dt_node(const void *blob,
  * for the resulting tree
  */
 static void __unflatten_device_tree(const void *blob,
+			     struct device_node *dad,
 			     struct device_node **mynodes,
 			     void * (*dt_alloc)(u64 size, u64 align))
 {
@@ -435,7 +437,7 @@ static void __unflatten_device_tree(const void *blob,
 	/* First pass, scan for size */
 	start = 0;
 	size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
-						NULL, NULL, true);
+						dad, NULL, true);
 	size = ALIGN(size, 4);
 
 	pr_debug("  size is %lx, allocating...\n", size);
@@ -450,7 +452,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	/* Second pass, do actual unflattening */
 	start = 0;
-	unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
+	unflatten_dt_node(blob, mem, &start, dad, mynodes, false);
 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
 		pr_warning("End of tree marker overwritten: %08x\n",
 			   be32_to_cpup(mem + size));
@@ -472,9 +474,10 @@ static void *kernel_tree_alloc(u64 size, u64 align)
  * can be used.
  */
 void of_fdt_unflatten_tree(const unsigned long *blob,
+			struct device_node *dad,
 			struct device_node **mynodes)
 {
-	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
+	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
 
@@ -1125,7 +1128,7 @@ bool __init early_init_dt_scan(void *params)
  */
 void __init unflatten_device_tree(void)
 {
-	__unflatten_device_tree(initial_boot_params, &of_root,
+	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
 				early_init_dt_alloc_memory_arch);
 
 	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 1801634..2270830 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -907,7 +907,7 @@ static int __init unittest_data_add(void)
 			"not running tests\n", __func__);
 		return -ENOMEM;
 	}
-	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
+	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
 	if (!unittest_data_node) {
 		pr_warn("%s: No tree to attach; not running tests\n", __func__);
 		return -ENODATA;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index df9ef38..3644960 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
 extern void of_fdt_unflatten_tree(const unsigned long *blob,
+			       struct device_node *dad,
 			       struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
-- 
2.1.0

* [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (32 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
       [not found]   ` <1438834307-26960-41-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2015-08-10 22:42   ` Frank Rowand
  2015-08-06  4:11 ` [PATCH v6 41/42] drivers/of: Export OF changeset functions Gavin Shan
                   ` (2 subsequent siblings)
  36 siblings, 2 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This changes of_fdt_unflatten_tree() so that it returns the memory
chunk allocated for the unflattened device tree, allowing the caller
to release it once it is no longer needed.
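The returned pointer is the single chunk that holds every unflattened node and property, which is what makes it releasable in one call (the hotplug driver later frees it with kfree(slot->dt)). The sketch below mirrors the two-pass structure of __unflatten_device_tree() in user space under simplifying assumptions: struct node and unflatten_pass() are hypothetical stand-ins for the real device_node/property layout, calloc() replaces dt_alloc(), and the end marker is stored in native rather than big-endian byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define END_MARKER 0xdeadbeefu

struct node {
	char name[16];
};

/* Hypothetical pass: 'nr' stands in for the flat blob. With mem == NULL
 * (dry run) only the required size is computed. */
static size_t unflatten_pass(void *mem, int nr)
{
	struct node *np = mem;
	int i;

	if (mem)
		for (i = 0; i < nr; i++)
			snprintf(np[i].name, sizeof(np[i].name), "node%d", i);
	return (size_t)nr * sizeof(struct node);
}

/* Size, allocate with a trailing marker, fill, verify the marker, then
 * return the chunk so the caller can free it when the tree is obsolete. */
static void *unflatten_tree(int nr, struct node **mynodes)
{
	uint32_t marker = END_MARKER;
	unsigned char *mem;
	size_t size;

	assert(nr >= 0);
	size = unflatten_pass(NULL, nr);     /* first pass: size only */
	size = (size + 3) & ~(size_t)3;      /* ALIGN(size, 4) */

	mem = calloc(1, size + sizeof(marker));
	if (!mem)
		return NULL;
	memcpy(mem + size, &marker, sizeof(marker));

	unflatten_pass(mem, nr);             /* second pass: fill */

	memcpy(&marker, mem + size, sizeof(marker));
	if (marker != END_MARKER)
		fprintf(stderr, "End of tree marker overwritten: %08x\n",
			(unsigned)marker);

	*mynodes = nr ? (struct node *)mem : NULL;
	return mem;
}
```

A caller keeps the returned pointer next to the node pointer and passes it to free() on teardown; without the return value, only the node pointer would survive and the chunk's ownership would be implicit.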

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/fdt.c       | 11 ++++++-----
 include/linux/of_fdt.h |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 074870a..8e1ba7e 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob,
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
  */
-static void __unflatten_device_tree(const void *blob,
+static void *__unflatten_device_tree(const void *blob,
 			     struct device_node *dad,
 			     struct device_node **mynodes,
 			     void * (*dt_alloc)(u64 size, u64 align))
@@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (!blob) {
 		pr_debug("No device tree pointer\n");
-		return;
+		return NULL;
 	}
 
 	pr_debug("Unflattening device tree:\n");
@@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob,
 
 	if (fdt_check_header(blob)) {
 		pr_err("Invalid device tree blob header\n");
-		return;
+		return NULL;
 	}
 
 	/* First pass, scan for size */
@@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob,
 			   be32_to_cpup(mem + size));
 
 	pr_debug(" <- unflatten_device_tree()\n");
+	return mem;
 }
 
 static void *kernel_tree_alloc(u64 size, u64 align)
@@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align)
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
  */
-void of_fdt_unflatten_tree(const unsigned long *blob,
+void *of_fdt_unflatten_tree(const unsigned long *blob,
 			struct device_node *dad,
 			struct device_node **mynodes)
 {
-	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
+	return __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
 
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 3644960..00db279 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 				 unsigned long node);
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
-extern void of_fdt_unflatten_tree(const unsigned long *blob,
+extern void *of_fdt_unflatten_tree(const unsigned long *blob,
 			       struct device_node *dad,
 			       struct device_node **mynodes);
 
-- 
2.1.0

* [PATCH v6 41/42] drivers/of: Export OF changeset functions
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (33 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree() Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-06 13:48   ` Rob Herring
  2015-08-06  4:11 ` [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
  2015-08-10  6:05 ` [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Alexey Kardashevskiy
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

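The PowerNV PCI hotplug driver is going to use OF changesets to
manage the modified device sub-tree, which requires the OF changeset
functions to be exported. of_changeset_apply() and of_changeset_revert()
are also split into unlocked __ variants and wrappers that take of_mutex.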
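Splitting each operation into an unlocked helper and a locking wrapper lets external users call the exported functions directly, while callers that already hold of_mutex across a larger update (as the overlay code and the updated unittest do) use the __ variants. A user-space sketch of the pattern, with a pthread mutex standing in for of_mutex and a hypothetical one-field changeset:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: tree_mutex plays the role of of_mutex. */
static pthread_mutex_t tree_mutex = PTHREAD_MUTEX_INITIALIZER;
static int tree_state;

struct changeset {
	int delta;
};

/* Caller must hold tree_mutex (the kernel spells this __of_changeset_apply). */
static int cs_apply_locked(struct changeset *cs)
{
	assert(cs != NULL);
	tree_state += cs->delta;
	return 0;
}

/* Exported entry point: takes the lock around the unlocked helper. */
int cs_apply(struct changeset *cs)
{
	int ret;

	pthread_mutex_lock(&tree_mutex);
	ret = cs_apply_locked(cs);
	pthread_mutex_unlock(&tree_mutex);
	return ret;
}

/* A caller holding the lock for a larger update uses the _locked variant. */
int cs_apply_pair(struct changeset *a, struct changeset *b)
{
	int ret;

	pthread_mutex_lock(&tree_mutex);
	ret = cs_apply_locked(a);
	if (!ret)
		ret = cs_apply_locked(b);
	pthread_mutex_unlock(&tree_mutex);
	return ret;
}
```

The non-recursive mutex is the reason both entry points are needed: calling cs_apply() from inside cs_apply_pair() would block forever on a lock the caller already holds.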

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/of/dynamic.c  | 65 ++++++++++++++++++++++++++++++++++++---------------
 drivers/of/overlay.c  |  8 +++----
 drivers/of/unittest.c |  4 ++--
 include/linux/of.h    |  2 ++
 4 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 53826b8..af65b5b 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
 	memset(ocs, 0, sizeof(*ocs));
 	INIT_LIST_HEAD(&ocs->entries);
 }
+EXPORT_SYMBOL(of_changeset_init);
 
 /**
  * of_changeset_destroy - Destroy a changeset
@@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
 	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
 		__of_changeset_entry_destroy(ce);
 }
+EXPORT_SYMBOL(of_changeset_destroy);
 
-/**
- * of_changeset_apply - Applies a changeset
- *
- * @ocs:	changeset pointer
- *
- * Applies a changeset to the live tree.
- * Any side-effects of live tree state changes are applied here on
- * sucess, like creation/destruction of devices and side-effects
- * like creation of sysfs properties and directories.
- * Returns 0 on success, a negative error value in case of an error.
- * On error the partially applied effects are reverted.
- */
-int of_changeset_apply(struct of_changeset *ocs)
+int __of_changeset_apply(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce;
 	int ret;
@@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
 }
 
 /**
- * of_changeset_revert - Reverts an applied changeset
+ * of_changeset_apply - Applies a changeset
  *
  * @ocs:	changeset pointer
  *
- * Reverts a changeset returning the state of the tree to what it
- * was before the application.
- * Any side-effects like creation/destruction of devices and
- * removal of sysfs properties and directories are applied.
+ * Applies a changeset to the live tree.
+ * Any side-effects of live tree state changes are applied here on
+ * success, like creation/destruction of devices and side-effects
+ * like creation of sysfs properties and directories.
  * Returns 0 on success, a negative error value in case of an error.
+ * On error the partially applied effects are reverted.
  */
-int of_changeset_revert(struct of_changeset *ocs)
+int of_changeset_apply(struct of_changeset *ocs)
+{
+	int ret;
+
+	mutex_lock(&of_mutex);
+	ret = __of_changeset_apply(ocs);
+	mutex_unlock(&of_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL(of_changeset_apply);
+
+int __of_changeset_revert(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce;
 	int ret;
@@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
 }
 
 /**
+ * of_changeset_revert - Reverts an applied changeset
+ *
+ * @ocs:	changeset pointer
+ *
+ * Reverts a changeset returning the state of the tree to what it
+ * was before the application.
+ * Any side-effects like creation/destruction of devices and
+ * removal of sysfs properties and directories are applied.
+ * Returns 0 on success, a negative error value in case of an error.
+ */
+int of_changeset_revert(struct of_changeset *ocs)
+{
+	int ret;
+
+	mutex_lock(&of_mutex);
+	ret = __of_changeset_revert(ocs);
+	mutex_unlock(&of_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL(of_changeset_revert);
+
+/**
  * of_changeset_action - Perform a changeset action
  *
  * @ocs:	changeset pointer
@@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
 	list_add_tail(&ce->node, &ocs->entries);
 	return 0;
 }
+EXPORT_SYMBOL(of_changeset_action);
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 24e025f..804ea33 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
 	}
 
 	/* apply the changeset */
-	err = of_changeset_apply(&ov->cset);
+	err = __of_changeset_apply(&ov->cset);
 	if (err) {
-		pr_err("%s: of_changeset_apply() failed for tree@%s\n",
+		pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
 				__func__, tree->full_name);
 		goto err_revert_overlay;
 	}
@@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
 
 
 	list_del(&ov->node);
-	of_changeset_revert(&ov->cset);
+	__of_changeset_revert(&ov->cset);
 	of_free_overlay_info(ov);
 	idr_remove(&ov_idr, id);
 	of_changeset_destroy(&ov->cset);
@@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
 	/* the tail of list is guaranteed to be safe to remove */
 	list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
 		list_del(&ov->node);
-		of_changeset_revert(&ov->cset);
+		__of_changeset_revert(&ov->cset);
 		of_free_overlay_info(ov);
 		idr_remove(&ov_idr, ov->id);
 		kfree(ov);
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 2270830..06eb3e5 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -527,7 +527,7 @@ static void __init of_unittest_changeset(void)
 	unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
 	unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
 	mutex_lock(&of_mutex);
-	unittest(!of_changeset_apply(&chgset), "apply failed\n");
+	unittest(!__of_changeset_apply(&chgset), "apply failed\n");
 	mutex_unlock(&of_mutex);
 
 	/* Make sure node names are constructed correctly */
@@ -536,7 +536,7 @@ static void __init of_unittest_changeset(void)
 	of_node_put(np);
 
 	mutex_lock(&of_mutex);
-	unittest(!of_changeset_revert(&chgset), "revert failed\n");
+	unittest(!__of_changeset_revert(&chgset), "revert failed\n");
 	mutex_unlock(&of_mutex);
 
 	of_changeset_destroy(&chgset);
diff --git a/include/linux/of.h b/include/linux/of.h
index edc068d..5c030e1 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1001,7 +1001,9 @@ extern int of_reconfig_get_state_change(unsigned long action,
 
 extern void of_changeset_init(struct of_changeset *ocs);
 extern void of_changeset_destroy(struct of_changeset *ocs);
+extern int __of_changeset_apply(struct of_changeset *ocs);
 extern int of_changeset_apply(struct of_changeset *ocs);
+extern int __of_changeset_revert(struct of_changeset *ocs);
 extern int of_changeset_revert(struct of_changeset *ocs);
 extern int of_changeset_action(struct of_changeset *ocs,
 		unsigned long action, struct device_node *np,
-- 
2.1.0

* [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (34 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 41/42] drivers/of: Export OF changeset functions Gavin Shan
@ 2015-08-06  4:11 ` Gavin Shan
  2015-08-15  3:13   ` Alexey Kardashevskiy
  2015-08-10  6:05 ` [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Alexey Kardashevskiy
  36 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-06  4:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto, aik, Gavin Shan

This patch adds a standalone driver to support PCI hotplug on the
PowerPC PowerNV platform, which runs on top of skiboot firmware.
The firmware identifies hotpluggable slots and marks their device
tree nodes with the "ibm,slot-pluggable" and "ibm,reset-by-firmware"
properties. The driver scans the device tree and creates/registers a
PCI hotplug slot for each such node.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node won't have the "ibm,reset-by-firmware" property. In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't yet export the capability to access
attention LEDs; that is left as TBD.
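The slot discovery rules described above (both properties present and non-zero; parent slots registered before their children and unregistered after them) can be sketched with a toy tree. struct dn and the registered[]/unregistered[] logs are hypothetical stand-ins for struct device_node, of_get_property()/of_read_number() and the hotplug core; the traversal order mirrors powernv_php_register() and powernv_php_unregister().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device-tree node: property values are 0 when absent. */
struct dn {
	const char *name;
	int slot_pluggable;     /* "ibm,slot-pluggable" */
	int reset_by_firmware;  /* "ibm,reset-by-firmware" */
	struct dn *child;       /* first child */
	struct dn *sibling;     /* next sibling */
};

#define MAX_SLOTS 16
static const char *registered[MAX_SLOTS];
static int nr_registered;
static const char *unregistered[MAX_SLOTS];
static int nr_unregistered;

/* A node becomes a slot only if both properties are present and non-zero. */
static int is_hotplug_slot(const struct dn *dn)
{
	return dn->slot_pluggable && dn->reset_by_firmware;
}

/* Pre-order: a parent slot is registered before its child slots. */
static void register_slots(struct dn *parent)
{
	struct dn *c;

	for (c = parent->child; c; c = c->sibling) {
		if (is_hotplug_slot(c)) {
			assert(nr_registered < MAX_SLOTS);
			registered[nr_registered++] = c->name;
		}
		register_slots(c);
	}
}

/* Post-order: child slots go before their parent slot. */
static void unregister_slots(struct dn *parent)
{
	struct dn *c;

	for (c = parent->child; c; c = c->sibling) {
		unregister_slots(c);
		if (is_hotplug_slot(c)) {
			assert(nr_unregistered < MAX_SLOTS);
			unregistered[nr_unregistered++] = c->name;
		}
	}
}
```

For a PHB with child slot A (which has child slot B) and a sibling node C lacking "ibm,reset-by-firmware", registration logs A then B, C is skipped, and unregistration logs B then A.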

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 MAINTAINERS                            |   6 +
 drivers/pci/hotplug/Kconfig            |  12 +
 drivers/pci/hotplug/Makefile           |   4 +
 drivers/pci/hotplug/powernv_php.c      | 140 +++++++
 drivers/pci/hotplug/powernv_php.h      |  92 +++++
 drivers/pci/hotplug/powernv_php_slot.c | 722 +++++++++++++++++++++++++++++++++
 6 files changed, 976 insertions(+)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

diff --git a/MAINTAINERS b/MAINTAINERS
index fd60784..3b75c92 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7747,6 +7747,12 @@ L:	linux-pci@vger.kernel.org
 S:	Supported
 F:	Documentation/PCI/pci-error-recovery.txt
 
+PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
+M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
+L:	linux-pci@vger.kernel.org
+S:	Supported
+F:	drivers/pci/hotplug/powernv_php*
+
 PCI SUBSYSTEM
 M:	Bjorn Helgaas <bhelgaas@google.com>
 L:	linux-pci@vger.kernel.org
diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..ef55dae 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run a PowerPC PowerNV platform that supports
+	  PCI hotplug.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called powernv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..fd51d65 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+powernv-php-objs	:=	powernv_php.o	\
+				powernv_php_slot.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
new file mode 100644
index 0000000..4cbff7a
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.c
@@ -0,0 +1,140 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+static struct notifier_block php_msg_nb = {
+	.notifier_call	= powernv_php_msg_handler,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int powernv_php_register_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return -ENXIO;
+
+	/* Allocate slot */
+	slot = powernv_php_slot_alloc(dn);
+	if (!slot)
+		return -ENODEV;
+
+	/* Register it */
+	ret = powernv_php_slot_register(slot);
+	if (ret) {
+		powernv_php_slot_put(slot);
+		return ret;
+	}
+
+	return powernv_php_slot_enable(slot->php_slot, false);
+}
+
+int powernv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	/*
+	 * The parent slots should be registered before their
+	 * child slots.
+	 */
+	for_each_child_of_node(dn, child) {
+		powernv_php_register_one(child);
+		powernv_php_register(child);
+	}
+
+	return ret;
+}
+
+static void powernv_php_unregister_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+
+	slot = powernv_php_slot_find(dn);
+	if (!slot)
+		return;
+
+	pci_hp_deregister(slot->php_slot);
+}
+
+void powernv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	/* The child slots should go before their parent slots */
+	for_each_child_of_node(dn, child) {
+		powernv_php_unregister(child);
+		powernv_php_unregister_one(child);
+	}
+}
+
+static int __init powernv_php_init(void)
+{
+	struct device_node *dn;
+	int ret;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Register hotplug message handler */
+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
+	if (ret) {
+		pr_warn("%s: Error %d registering hotplug notifier\n",
+			__func__, ret);
+		return ret;
+	}
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit powernv_php_exit(void)
+{
+	struct device_node *dn;
+
+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_unregister(dn);
+}
+
+module_init(powernv_php_init);
+module_exit(powernv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
new file mode 100644
index 0000000..8034cc6
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.h
@@ -0,0 +1,92 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _POWERNV_PHP_H
+#define _POWERNV_PHP_H
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/of.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <asm/opal-api.h>
+
+/* Slot power status */
+#define POWERNV_PHP_SLOT_POWER_OFF	0
+#define POWERNV_PHP_SLOT_POWER_ON	1
+
+/* Slot presence status */
+#define POWERNV_PHP_SLOT_EMPTY		0
+#define POWERNV_PHP_SLOT_PRESENT	1
+
+/* Slot attention status */
+#define POWERNV_PHP_SLOT_ATTEN_OFF	0
+#define POWERNV_PHP_SLOT_ATTEN_ON	1
+#define POWERNV_PHP_SLOT_ATTEN_IND	2
+#define POWERNV_PHP_SLOT_ATTEN_ACT	3
+
+struct powernv_php_slot {
+	char			*name;
+	struct device_node	*dn;
+	struct pci_dev		*pdev;
+	struct pci_bus		*bus;
+	uint64_t		id;
+	int			slot_no;
+	struct kref		kref;
+#define POWERNV_PHP_SLOT_STATE_INIT		0
+#define POWERNV_PHP_SLOT_STATE_REGISTER		1
+#define POWERNV_PHP_SLOT_STATE_POPULATED	2
+	int			state;
+	int			check_power_status;
+	int			status_confirmed;
+	struct opal_msg		*msg;
+	void			*fdt;
+	void			*dt;
+	struct of_changeset	ocs;
+	struct work_struct	work;
+	wait_queue_head_t	queue;
+	struct hotplug_slot	*php_slot;
+	struct powernv_php_slot	*parent;
+	struct list_head	children;
+	struct list_head	link;
+};
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message);
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
+void powernv_php_slot_free(struct kref *kref);
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
+int powernv_php_slot_register(struct powernv_php_slot *slot);
+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan);
+int powernv_php_register(struct device_node *dn);
+void powernv_php_unregister(struct device_node *dn);
+
+#define to_powernv_php_slot(kref) \
+	container_of(kref, struct powernv_php_slot, kref)
+
+static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
+{
+	if (slot)
+		kref_get(&slot->kref);
+}
+
+static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
+{
+	if (slot)
+		return kref_put(&slot->kref, powernv_php_slot_free);
+
+	return 0;
+}
+
+#endif /* !_POWERNV_PHP_H */
diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
new file mode 100644
index 0000000..73a93a2
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php_slot.c
@@ -0,0 +1,722 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv_php.h"
+
+static LIST_HEAD(php_slot_list);
+static DEFINE_SPINLOCK(php_slot_lock);
+
+/*
+ * Remove firmware data for all child device nodes of the
+ * indicated one.
+ */
+static void remove_child_pdn(struct device_node *np)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(np, child) {
+		/* In depth first */
+		remove_child_pdn(child);
+
+		remove_pci_device_node_info(child);
+	}
+}
+
+/*
+ * Remove all subordinate device nodes of the indicated one.
+ * The deepest device nodes are released first.
+ */
+static int remove_child_device_nodes(struct device_node *parent)
+{
+	struct device_node *np, *child;
+	int ret = 0;
+
+	/* If the device node has children, remove them first */
+	for_each_child_of_node(parent, np) {
+		ret = remove_child_device_nodes(np);
+		if (ret)
+			return ret;
+
+		/* The device shouldn't have alive children */
+		child = of_get_next_child(np, NULL);
+		if (child) {
+			pr_err("%s: Alive children of node <%s>\n",
+			       __func__, of_node_full_name(np));
+			of_node_put(child);
+			of_node_put(np);
+			return -EBUSY;
+		}
+
+		/* Detach the device node */
+		of_detach_node(np);
+		of_node_put(np);
+	}
+
+	return 0;
+}
+
+/*
+ * Process the firmware message that asks to remove all device
+ * tree nodes beneath the slot's node, along with their
+ * associated auxiliary data.
+ */
+static void slot_power_off_handler(struct powernv_php_slot *slot)
+{
+	int ret, status = 1;
+
+	/* Release the firmware data for the child device nodes */
+	remove_child_pdn(slot->dn);
+
+	/*
+	 * Release the child device nodes. If the sub-tree was
+	 * built with a changeset, we only need to destroy the
+	 * changeset.
+	 */
+	if (slot->fdt) {
+		of_changeset_destroy(&slot->ocs);
+		kfree(slot->dt);
+		slot->dt = NULL;
+		slot->dn->child = NULL;
+		kfree(slot->fdt);
+		slot->fdt = NULL;
+	} else {
+		ret = remove_child_device_nodes(slot->dn);
+		if (ret) {
+			status = 2;
+			dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
+				 ret);
+		}
+	}
+
+	/* Confirm status change */
+	slot->status_confirmed = status;
+	wake_up_interruptible(&slot->queue);
+}
+
+static int slot_populate_changeset(struct of_changeset *ocs,
+				    struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	for_each_child_of_node(dn, child) {
+		ret = of_changeset_attach_node(ocs, child);
+		if (ret)
+			return ret;
+
+		ret = slot_populate_changeset(ocs, child);
+	}
+
+	return ret;
+}
+
+static void slot_power_on_handler(struct powernv_php_slot *slot)
+{
+	void *fdt, *dt;
+	uint64_t len;
+	int ret, status = 1;
+
+	/* The FDT blob size isn't known in advance, so retry
+	 * with progressively larger buffers.
+	 */
+	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
+		fdt = kzalloc(len, GFP_KERNEL);
+		if (!fdt)
+			break;
+
+		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
+		if (!ret)
+			break;
+
+		kfree(fdt);
+	}
+
+	if (len > 0x10000) {
+		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
+		goto out;
+	}
+
+	/* Unflatten device tree blob */
+	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
+	if (!dt) {
+		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
+		goto free_fdt;
+	}
+
+	/* Initialize and apply the changeset */
+	of_changeset_init(&slot->ocs);
+	ret = slot_populate_changeset(&slot->ocs, slot->dn);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
+			 ret);
+		goto free_dt;
+	}
+
+	slot->dn->child = NULL;
+	ret = of_changeset_apply(&slot->ocs);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
+			 ret);
+		goto destroy_changeset;
+	}
+
+	/* Add device node firmware data */
+	traverse_pci_device_nodes(slot->dn,
+				  add_pci_device_node_info,
+				  pci_bus_to_host(slot->bus));
+	slot->fdt = fdt;
+	slot->dt = dt;
+	goto out;
+
+destroy_changeset:
+	of_changeset_destroy(&slot->ocs);
+free_dt:
+	kfree(dt);
+	slot->dn->child = NULL;
+free_fdt:
+	kfree(fdt);
+	status = 2;
+out:
+	/* Confirm status change */
+	slot->status_confirmed = status;
+	wake_up_interruptible(&slot->queue);
+}
+
+static void powernv_php_slot_work(struct work_struct *data)
+{
+	struct powernv_php_slot *slot = container_of(data,
+						     struct powernv_php_slot,
+						     work);
+	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
+
+	switch (php_event) {
+	case 0: /* Slot power off */
+		slot_power_off_handler(slot);
+		break;
+	case 1: /* Slot power on */
+		slot_power_on_handler(slot);
+		break;
+	default:
+		dev_warn(&slot->pdev->dev, "Unsupported hotplug event %lld\n",
+			 php_event);
+	}
+
+	of_node_put(slot->dn);
+}
+
+int powernv_php_msg_handler(struct notifier_block *nb,
+			    unsigned long type, void *message)
+{
+	phandle h;
+	struct device_node *np;
+	struct powernv_php_slot *slot;
+	struct opal_msg *msg = message;
+
+	/* Check the message type */
+	if (type != OPAL_MSG_PCI_HOTPLUG) {
+		pr_warn("%s: Wrong message type %ld received!\n",
+			__func__, type);
+		return NOTIFY_DONE;
+	}
+
+	/* Find the device node */
+	h = (phandle)be64_to_cpu(msg->params[1]);
+	np = of_find_node_by_phandle(h);
+	if (!np) {
+		pr_warn("%s: No device node for phandle 0x%08x\n",
+			__func__, h);
+		return NOTIFY_DONE;
+	}
+
+	/* Find the slot */
+	slot = powernv_php_slot_find(np);
+	if (!slot) {
+		pr_warn("%s: No slot found for node <%s>\n",
+			__func__, of_node_full_name(np));
+		of_node_put(np);
+		return NOTIFY_DONE;
+	}
+
+	/* Schedule the work */
+	slot->msg = msg;
+	schedule_work(&slot->work);
+	return NOTIFY_OK;
+}
+
+static int set_power_status(struct hotplug_slot *php_slot, u8 val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	int ret;
+
+	/* Set power status */
+	slot->status_confirmed = 0;
+	ret = pnv_pci_set_power_status(slot->id, val);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
+			 ret, val ? "on" : "off");
+		return ret;
+	}
+
+	/*
+	 * Continue with PCI probing once the device tree has been
+	 * finalized. The update may already be complete at this
+	 * point, in which case there is no need to wait for it.
+	 */
+	if (slot->status_confirmed == 1)
+		return 0;
+	else if (slot->status_confirmed > 0)
+		return -EBUSY;
+
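+	/*
+	 * The handlers wake the queue with wake_up_interruptible(),
+	 * so sleep interruptibly; an uninterruptible sleep would only
+	 * notice the confirmation when the timeout expires.
+	 */
+	ret = wait_event_interruptible_timeout(slot->queue,
+					       slot->status_confirmed,
+					       10 * HZ);
+	if (ret <= 0) {
+		dev_warn(&slot->pdev->dev, "%s waiting for power-%s\n",
+			 ret ? "Interrupted" : "Timeout",
+			 val ? "on" : "off");
+		return ret ? ret : -EBUSY;
+	}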
+
+	/* Check the result */
+	if (slot->status_confirmed == 1)
+		return 0;
+
+	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
+		 slot->status_confirmed, val ? "on" : "off");
+	return -EBUSY;
+}
+
+static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve power status from firmware. If that fails,
+	 * fall back to reporting the power as on.
+	 */
+	ret = pnv_pci_get_power_status(slot->id, &state);
+	if (ret) {
+		*val = POWERNV_PHP_SLOT_POWER_ON;
+		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
+			 ret);
+	} else {
+		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
+			       POWERNV_PHP_SLOT_POWER_OFF;
+		php_slot->info->power_status = *val;
+	}
+
+	return 0;
+}
+
+static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/*
+	 * Retrieve presence status from firmware. If that fails,
+	 * fall back to reporting the slot as empty.
+	 */
+	ret = pnv_pci_get_presence_status(slot->id, &state);
+	if (ret >= 0) {
+		*val = state ? POWERNV_PHP_SLOT_PRESENT :
+			       POWERNV_PHP_SLOT_EMPTY;
+		php_slot->info->adapter_status = *val;
+		ret = 0;
+	} else {
+		*val = POWERNV_PHP_SLOT_EMPTY;
+		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
+			 ret);
+	}
+
+	return ret;
+}
+
+static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+
+	/* Reject attention values other than the four known states */
+	switch (val) {
+	case POWERNV_PHP_SLOT_ATTEN_OFF:
+	case POWERNV_PHP_SLOT_ATTEN_ON:
+	case POWERNV_PHP_SLOT_ATTEN_IND:
+	case POWERNV_PHP_SLOT_ATTEN_ACT:
+		break;
+	default:
+		dev_warn(&slot->pdev->dev, "Invalid attention %d\n", val);
+		return -EINVAL;
+	}
+
+	/* FIXME: Make it real once firmware supports it */
+	php_slot->info->attention_status = val;
+
+	return 0;
+}
+
+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t presence, power_status;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
+	if (ret)
+		return ret;
+
+	/* Nothing behind the slot: skip the power check and scan */
+	if (presence == POWERNV_PHP_SLOT_EMPTY)
+		goto scan;
+
+	/*
+	 * Something is detected behind the slot, so we need to make
+	 * sure the slot's power supply is on. Otherwise, the slot's
+	 * downstream PCIe link would be down.
+	 *
+	 * On the first pass, we don't change the power status, to
+	 * speed up system boot. This assumes that the firmware
+	 * reports consistent slot power status: an empty slot always
+	 * has its power off and a non-empty slot has its power on.
+	 */
+	if (!slot->check_power_status) {
+		slot->check_power_status = 1;
+		goto scan;
+	}
+
+	/* Check the power status. Scan the slot if that's already on */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret)
+		return ret;
+
+	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
+		goto scan;
+
+	/* Power is off, turn it on and then scan the slot */
+	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
+	if (ret)
+		return ret;
+
+scan:
+	switch (presence) {
+	case POWERNV_PHP_SLOT_PRESENT:
+		if (rescan) {
+			pci_lock_rescan_remove();
+			pci_add_pci_devices(slot->bus);
+			pci_unlock_rescan_remove();
+		}
+
+		/* Rescan for child hotpluggable slots */
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		if (rescan)
+			powernv_php_register(slot->dn);
+		break;
+	case POWERNV_PHP_SLOT_EMPTY:
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+		break;
+	default:
+		dev_warn(&slot->pdev->dev, "Invalid presence status %d\n",
+			 presence);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int enable_slot(struct hotplug_slot *php_slot)
+{
+	return powernv_php_slot_enable(php_slot, true);
+}
+
+static int disable_slot(struct hotplug_slot *php_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t power_status;
+	int ret;
+
+	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
+		return 0;
+
+	/* Remove all devices behind the slot */
+	pci_lock_rescan_remove();
+	pci_remove_pci_devices(slot->bus);
+	pci_unlock_rescan_remove();
+
+	/* Detach the child hotpluggable slots */
+	powernv_php_unregister(slot->dn);
+
+	/*
+	 * Check the power status and turn it off if necessary. If we
+	 * fail to get the power status, the power will be forced to
+	 * be off.
+	 */
+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
+	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
+		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
+		if (ret)
+			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
+				 ret);
+	}
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= get_power_status,
+	.get_adapter_status	= get_adapter_status,
+	.set_attention_status	= set_attention_status,
+	.enable_slot		= enable_slot,
+	.disable_slot		= disable_slot,
+};
+
+static struct powernv_php_slot *php_slot_match(struct device_node *dn,
+					       struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *target, *tmp;
+
+	if (slot->dn == dn)
+		return slot;
+
+	list_for_each_entry(tmp, &slot->children, link) {
+		target = php_slot_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
+{
+	struct powernv_php_slot *slot, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_for_each_entry(tmp, &php_slot_list, link) {
+		slot = php_slot_match(dn, tmp);
+		if (slot) {
+			spin_unlock_irqrestore(&php_slot_lock, flags);
+			return slot;
+		}
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	return NULL;
+}
+
+void powernv_php_slot_free(struct kref *kref)
+{
+	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
+
+	WARN_ON(!list_empty(&slot->children));
+	kfree(slot->name);
+	kfree(slot);
+}
+
+static void php_slot_release(struct hotplug_slot *hp_slot)
+{
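+	struct powernv_php_slot *slot = hp_slot->private;
+	struct powernv_php_slot *parent = slot->parent;
+	unsigned long flags;
+
+	/* Remove from global or child list */
+	spin_lock_irqsave(&php_slot_lock, flags);
+	list_del(&slot->link);
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/*
+	 * Drop the slot's reference, then the one it held on its
+	 * parent. The parent is cached above because the first
+	 * put may free the slot; top-level slots have no parent.
+	 */
+	powernv_php_slot_put(slot);
+	if (parent)
+		powernv_php_slot_put(parent);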
+}
+
+static bool php_slot_get_id(struct device_node *dn,
+			    uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound ID, which
+	 * consists of a 16-bit PHB ID, a 16-bit bus/slot/function
+	 * number and the compound indicator (bit 63).
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return false;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return false;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return true;
+	}
+
+	return false;
+}
+
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
+{
+	struct eeh_dev *edev = pdn_to_eeh_dev(PCI_DN(dn));
+	struct pci_bus *bus;
+	struct powernv_php_slot *slot;
+	const char *label;
+	uint64_t id;
+	int slot_no;
+	size_t size;
+	void *pmem;
+
+	/* Slot name */
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	/* Slot identifier */
+	if (!php_slot_get_id(dn, &id))
+		return NULL;
+
+	/* PCI bus */
+	bus = of_node_to_pci_bus(dn);
+	if (!bus)
+		return NULL;
+
+	/* Slot number */
+	if (dn->child && PCI_DN(dn->child))
+		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		slot_no = -1;
+
+	/* Allocate slot */
+	size = sizeof(struct powernv_php_slot) +
+	       sizeof(struct hotplug_slot) +
+	       sizeof(struct hotplug_slot_info);
+	pmem = kzalloc(size, GFP_KERNEL);
+	if (!pmem) {
+		pr_warn("%s: Cannot allocate slot for node %s\n",
+			__func__, dn->full_name);
+		return NULL;
+	}
+
+	/* Assign memory blocks */
+	slot = pmem;
+	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
+	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
+			      sizeof(struct hotplug_slot);
+	slot->name = kstrdup(label, GFP_KERNEL);
+	if (!slot->name) {
+		pr_warn("%s: Cannot populate name for node %s\n",
+			__func__, dn->full_name);
+		kfree(pmem);
+		return NULL;
+	}
+
+	/* Initialize slot */
+	kref_init(&slot->kref);
+	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
+	slot->dn = dn;
+	slot->pdev = eeh_dev_to_pci_dev(edev);
+	slot->bus = bus;
+	slot->id = id;
+	slot->slot_no = slot_no;
+	INIT_WORK(&slot->work, powernv_php_slot_work);
+	init_waitqueue_head(&slot->queue);
+	slot->check_power_status = 0;
+	slot->status_confirmed = 0;
+	slot->php_slot->ops = &php_slot_ops;
+	slot->php_slot->release = php_slot_release;
+	slot->php_slot->private = slot;
+	INIT_LIST_HEAD(&slot->children);
+	INIT_LIST_HEAD(&slot->link);
+
+	return slot;
+}
+
+int powernv_php_slot_register(struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *parent = NULL;
+	struct device_node *dn = slot->dn;
+	unsigned long flags;
+	int ret;
+
+	/* Avoid registering the same slot twice */
+	if (powernv_php_slot_find(slot->dn))
+		return -EEXIST;
+
+	/* Register slot */
+	ret = pci_hp_register(slot->php_slot, slot->bus,
+			      slot->slot_no, slot->name);
+	if (ret) {
+		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
+			 ret);
+		return ret;
+	}
+
+	/* Put into global or parent list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = powernv_php_slot_find(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+	}
+
+	spin_lock_irqsave(&php_slot_lock, flags);
+	if (parent) {
+		powernv_php_slot_get(parent);
+		slot->parent = parent;
+		list_add_tail(&slot->link, &parent->children);
+	} else {
+		list_add_tail(&slot->link, &php_slot_list);
+	}
+	spin_unlock_irqrestore(&php_slot_lock, flags);
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions
  2015-08-06  4:11 ` [PATCH v6 41/42] drivers/of: Export OF changeset functions Gavin Shan
@ 2015-08-06 13:48   ` Rob Herring
  2015-08-07  1:43     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Rob Herring @ 2015-08-06 13:48 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, Bjorn Helgaas, Grant Likely, Pantelis Antoniou,
	aik

On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> The PowerNV PCI hotplug driver is going to use the OF changeset
> to manage the changed device sub-tree, which requires those OF
> changeset functions to be exported.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/dynamic.c  | 65 ++++++++++++++++++++++++++++++++++++---------------
>  drivers/of/overlay.c  |  8 +++----
>  drivers/of/unittest.c |  4 ++--
>  include/linux/of.h    |  2 ++
>  4 files changed, 54 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 53826b8..af65b5b 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
>         memset(ocs, 0, sizeof(*ocs));
>         INIT_LIST_HEAD(&ocs->entries);
>  }
> +EXPORT_SYMBOL(of_changeset_init);

We probably want these to be the _GPL variant.

>
>  /**
>   * of_changeset_destroy - Destroy a changeset
> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
>         list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>                 __of_changeset_entry_destroy(ce);
>  }
> +EXPORT_SYMBOL(of_changeset_destroy);
>
> -/**
> - * of_changeset_apply - Applies a changeset
> - *
> - * @ocs:       changeset pointer
> - *
> - * Applies a changeset to the live tree.
> - * Any side-effects of live tree state changes are applied here on
> - * sucess, like creation/destruction of devices and side-effects
> - * like creation of sysfs properties and directories.
> - * Returns 0 on success, a negative error value in case of an error.
> - * On error the partially applied effects are reverted.
> - */
> -int of_changeset_apply(struct of_changeset *ocs)
> +int __of_changeset_apply(struct of_changeset *ocs)
>  {
>         struct of_changeset_entry *ce;
>         int ret;
> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>  }
>
>  /**
> - * of_changeset_revert - Reverts an applied changeset
> + * of_changeset_apply - Applies a changeset
>   *
>   * @ocs:       changeset pointer
>   *
> - * Reverts a changeset returning the state of the tree to what it
> - * was before the application.
> - * Any side-effects like creation/destruction of devices and
> - * removal of sysfs properties and directories are applied.
> + * Applies a changeset to the live tree.
> + * Any side-effects of live tree state changes are applied here on
> + * sucess, like creation/destruction of devices and side-effects

s/sucess/success/

> + * like creation of sysfs properties and directories.
>   * Returns 0 on success, a negative error value in case of an error.
> + * On error the partially applied effects are reverted.
>   */
> -int of_changeset_revert(struct of_changeset *ocs)
> +int of_changeset_apply(struct of_changeset *ocs)
> +{
> +       int ret;
> +
> +       mutex_lock(&of_mutex);
> +       ret = __of_changeset_apply(ocs);
> +       mutex_unlock(&of_mutex);
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL(of_changeset_apply);
> +
> +int __of_changeset_revert(struct of_changeset *ocs)
>  {
>         struct of_changeset_entry *ce;
>         int ret;
> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>  }
>
>  /**
> + * of_changeset_revert - Reverts an applied changeset
> + *
> + * @ocs:       changeset pointer
> + *
> + * Reverts a changeset returning the state of the tree to what it
> + * was before the application.
> + * Any side-effects like creation/destruction of devices and
> + * removal of sysfs properties and directories are applied.
> + * Returns 0 on success, a negative error value in case of an error.
> + */
> +int of_changeset_revert(struct of_changeset *ocs)
> +{
> +       int ret;
> +
> +       mutex_lock(&of_mutex);
> +       ret = __of_changeset_revert(ocs);
> +       mutex_unlock(&of_mutex);
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL(of_changeset_revert);
> +
> +/**
>   * of_changeset_action - Perform a changeset action
>   *
>   * @ocs:       changeset pointer
> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
>         list_add_tail(&ce->node, &ocs->entries);
>         return 0;
>  }
> +EXPORT_SYMBOL(of_changeset_action);
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 24e025f..804ea33 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
>         }
>
>         /* apply the changeset */
> -       err = of_changeset_apply(&ov->cset);
> +       err = __of_changeset_apply(&ov->cset);
>         if (err) {
> -               pr_err("%s: of_changeset_apply() failed for tree@%s\n",
> +               pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
>                                 __func__, tree->full_name);
>                 goto err_revert_overlay;
>         }
> @@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
>
>
>         list_del(&ov->node);
> -       of_changeset_revert(&ov->cset);
> +       __of_changeset_revert(&ov->cset);
>         of_free_overlay_info(ov);
>         idr_remove(&ov_idr, id);
>         of_changeset_destroy(&ov->cset);
> @@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
>         /* the tail of list is guaranteed to be safe to remove */
>         list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
>                 list_del(&ov->node);
> -               of_changeset_revert(&ov->cset);
> +               __of_changeset_revert(&ov->cset);
>                 of_free_overlay_info(ov);
>                 idr_remove(&ov_idr, ov->id);
>                 kfree(ov);
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 2270830..06eb3e5 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -527,7 +527,7 @@ static void __init of_unittest_changeset(void)
>         unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
>         unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
>         mutex_lock(&of_mutex);
> -       unittest(!of_changeset_apply(&chgset), "apply failed\n");
> +       unittest(!__of_changeset_apply(&chgset), "apply failed\n");

You can just remove the mutex here.

>         mutex_unlock(&of_mutex);
>
>         /* Make sure node names are constructed correctly */
> @@ -536,7 +536,7 @@ static void __init of_unittest_changeset(void)
>         of_node_put(np);
>
>         mutex_lock(&of_mutex);
> -       unittest(!of_changeset_revert(&chgset), "revert failed\n");
> +       unittest(!__of_changeset_revert(&chgset), "revert failed\n");

And here.

>         mutex_unlock(&of_mutex);
>
>         of_changeset_destroy(&chgset);
> diff --git a/include/linux/of.h b/include/linux/of.h
> index edc068d..5c030e1 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -1001,7 +1001,9 @@ extern int of_reconfig_get_state_change(unsigned long action,
>
>  extern void of_changeset_init(struct of_changeset *ocs);
>  extern void of_changeset_destroy(struct of_changeset *ocs);
> +extern int __of_changeset_apply(struct of_changeset *ocs);
>  extern int of_changeset_apply(struct of_changeset *ocs);
> +extern int __of_changeset_revert(struct of_changeset *ocs);

These should go in of_private.h.

>  extern int of_changeset_revert(struct of_changeset *ocs);
>  extern int of_changeset_action(struct of_changeset *ocs,
>                 unsigned long action, struct device_node *np,
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level
  2015-08-06  4:11 ` [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level Gavin Shan
@ 2015-08-06 14:09   ` Rob Herring
  2015-11-03 23:16   ` Gavin Shan
  1 sibling, 0 replies; 102+ messages in thread
From: Rob Herring @ 2015-08-06 14:09 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, Bjorn Helgaas, Grant Likely, Pantelis Antoniou,
	aik

On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> unflatten_dt_node() is called recursively to unflatten FDT nodes
> on the assumption that the FDT blob has only one root node, which
> isn't true when the FDT blob represents a device sub-tree. This
> improves the function to support device sub-trees that have
> multiple nodes at the first level:
>
>    * Rename the original unflatten_dt_node() to __unflatten_dt_node().
>    * The wrapper unflatten_dt_node() calls __unflatten_dt_node() with
>      the current node depth set to 1 to avoid underflow.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Acked-by: Rob Herring <robh@kernel.org>

> ---
>  drivers/of/fdt.c | 53 ++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 40 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 0749656..a18a2ce 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -161,7 +161,7 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
>  }
>
>  /**
> - * unflatten_dt_node - Alloc and populate a device_node from the flat tree
> + * __unflatten_dt_node - Alloc and populate a device_node from the flat tree
>   * @blob: The parent device tree blob
>   * @mem: Memory chunk to use for allocating device nodes and properties
>   * @poffset: pointer to node in flat tree
> @@ -171,20 +171,20 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>   * memory size
>   */
> -static void * unflatten_dt_node(const void *blob,
> +static void *__unflatten_dt_node(const void *blob,
>                                 void *mem,
>                                 int *poffset,
>                                 struct device_node *dad,
>                                 struct device_node **nodepp,
>                                 unsigned long fpsize,
> -                               bool dryrun)
> +                               bool dryrun,
> +                               int *depth)
>  {
>         const __be32 *p;
>         struct device_node *np;
>         struct property *pp, **prev_pp = NULL;
>         const char *pathp;
>         unsigned int l, allocl;
> -       static int depth = 0;
>         int old_depth;
>         int offset;
>         int has_name = 0;
> @@ -337,13 +337,25 @@ static void * unflatten_dt_node(const void *blob,
>                         np->type = "<NULL>";
>         }
>
> -       old_depth = depth;
> -       *poffset = fdt_next_node(blob, *poffset, &depth);
> -       if (depth < 0)
> -               depth = 0;
> -       while (*poffset > 0 && depth > old_depth)
> -               mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
> -                                       fpsize, dryrun);
> +       /* Multiple nodes might be at the first depth level if
> +        * the device tree is a sub-tree. All nodes at the current
> +        * or a deeper level are unflattened before it returns.
> +        */
> +       old_depth = *depth;
> +       *poffset = fdt_next_node(blob, *poffset, depth);
> +       while (*poffset > 0) {
> +               if (*depth < old_depth)
> +                       break;
> +
> +               if (*depth == old_depth)
> +                       mem = __unflatten_dt_node(blob, mem, poffset,
> +                                                 dad, NULL, fpsize,
> +                                                 dryrun, depth);
> +               else if (*depth > old_depth)
> +                       mem = __unflatten_dt_node(blob, mem, poffset,
> +                                                 np, NULL, fpsize,
> +                                                 dryrun, depth);
> +       }
>
>         if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
>                 pr_err("unflatten: error %d processing FDT\n", *poffset);
> @@ -369,6 +381,20 @@ static void * unflatten_dt_node(const void *blob,
>         return mem;
>  }
>
> +static void *unflatten_dt_node(const void *blob,
> +                              void *mem,
> +                              int *poffset,
> +                              struct device_node *dad,
> +                              struct device_node **nodepp,
> +                              bool dryrun)
> +{
> +       int depth = 1;
> +
> +       return __unflatten_dt_node(blob, mem, poffset,
> +                                  dad, nodepp, 0,
> +                                  dryrun, &depth);
> +}
> +
>  /**
>   * __unflatten_device_tree - create tree of device_nodes from flat blob
>   *
> @@ -408,7 +434,8 @@ static void __unflatten_device_tree(const void *blob,
>
>         /* First pass, scan for size */
>         start = 0;
> -       size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
> +       size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
> +                                               NULL, NULL, true);
>         size = ALIGN(size, 4);
>
>         pr_debug("  size is %lx, allocating...\n", size);
> @@ -423,7 +450,7 @@ static void __unflatten_device_tree(const void *blob,
>
>         /* Second pass, do actual unflattening */
>         start = 0;
> -       unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
> +       unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
>         if (be32_to_cpup(mem + size) != 0xdeadbeef)
>                 pr_warning("End of tree marker overwritten: %08x\n",
>                            be32_to_cpup(mem + size));
> --
> 2.1.0
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
       [not found]   ` <1438834307-26960-41-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-06 14:19     ` Rob Herring
  0 siblings, 0 replies; 102+ messages in thread
From: Rob Herring @ 2015-08-06 14:19 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org,
	Benjamin Herrenschmidt, Michael Ellerman, Bjorn Helgaas,
	Grant Likely, Pantelis Antoniou, aik

On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> This changes of_fdt_unflatten_tree() so that it returns the allocated
> memory chunk for the unflattened device tree, which can be released
> once it is no longer needed.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Acked-by: Rob Herring <robh@kernel.org>

> ---
>  drivers/of/fdt.c       | 11 ++++++-----
>  include/linux/of_fdt.h |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 074870a..8e1ba7e 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob,
>   * @dt_alloc: An allocator that provides a virtual address to memory
>   * for the resulting tree
>   */
> -static void __unflatten_device_tree(const void *blob,
> +static void *__unflatten_device_tree(const void *blob,
>                              struct device_node *dad,
>                              struct device_node **mynodes,
>                              void * (*dt_alloc)(u64 size, u64 align))
> @@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob,
>
>         if (!blob) {
>                 pr_debug("No device tree pointer\n");
> -               return;
> +               return NULL;
>         }
>
>         pr_debug("Unflattening device tree:\n");
> @@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob,
>
>         if (fdt_check_header(blob)) {
>                 pr_err("Invalid device tree blob header\n");
> -               return;
> +               return NULL;
>         }
>
>         /* First pass, scan for size */
> @@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob,
>                            be32_to_cpup(mem + size));
>
>         pr_debug(" <- unflatten_device_tree()\n");
> +       return mem;
>  }
>
>  static void *kernel_tree_alloc(u64 size, u64 align)
> @@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>   * pointers of the nodes so the normal device-tree walking functions
>   * can be used.
>   */
> -void of_fdt_unflatten_tree(const unsigned long *blob,
> +void *of_fdt_unflatten_tree(const unsigned long *blob,
>                         struct device_node *dad,
>                         struct device_node **mynodes)
>  {
> -       __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
> +       return __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>  }
>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>
> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
> index 3644960..00db279 100644
> --- a/include/linux/of_fdt.h
> +++ b/include/linux/of_fdt.h
> @@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>                                  unsigned long node);
>  extern int of_fdt_match(const void *blob, unsigned long node,
>                         const char *const *compat);
> -extern void of_fdt_unflatten_tree(const unsigned long *blob,
> +extern void *of_fdt_unflatten_tree(const unsigned long *blob,
>                                struct device_node *dad,
>                                struct device_node **mynodes);
>
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions
  2015-08-06 13:48   ` Rob Herring
@ 2015-08-07  1:43     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-07  1:43 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gavin Shan, linuxppc-dev, linux-pci@vger.kernel.org,
	devicetree@vger.kernel.org, Benjamin Herrenschmidt,
	Michael Ellerman, Bjorn Helgaas, Grant Likely, Pantelis Antoniou,
	aik

On Thu, Aug 06, 2015 at 08:48:10AM -0500, Rob Herring wrote:
>On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:

Thanks, Rob. All your comments will be covered in the next revision.

Thanks,
Gavin

>> The PowerNV PCI hotplug driver is going to use the OF changeset
>> to manage the changed device sub-tree, which requires those OF
>> changeset functions are exported.
>>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/dynamic.c  | 65 ++++++++++++++++++++++++++++++++++++---------------
>>  drivers/of/overlay.c  |  8 +++----
>>  drivers/of/unittest.c |  4 ++--
>>  include/linux/of.h    |  2 ++
>>  4 files changed, 54 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index 53826b8..af65b5b 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
>>         memset(ocs, 0, sizeof(*ocs));
>>         INIT_LIST_HEAD(&ocs->entries);
>>  }
>> +EXPORT_SYMBOL(of_changeset_init);
>
>We probably want these to be the _GPL variant.
>
>>
>>  /**
>>   * of_changeset_destroy - Destroy a changeset
>> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
>>         list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>>                 __of_changeset_entry_destroy(ce);
>>  }
>> +EXPORT_SYMBOL(of_changeset_destroy);
>>
>> -/**
>> - * of_changeset_apply - Applies a changeset
>> - *
>> - * @ocs:       changeset pointer
>> - *
>> - * Applies a changeset to the live tree.
>> - * Any side-effects of live tree state changes are applied here on
>> - * sucess, like creation/destruction of devices and side-effects
>> - * like creation of sysfs properties and directories.
>> - * Returns 0 on success, a negative error value in case of an error.
>> - * On error the partially applied effects are reverted.
>> - */
>> -int of_changeset_apply(struct of_changeset *ocs)
>> +int __of_changeset_apply(struct of_changeset *ocs)
>>  {
>>         struct of_changeset_entry *ce;
>>         int ret;
>> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> - * of_changeset_revert - Reverts an applied changeset
>> + * of_changeset_apply - Applies a changeset
>>   *
>>   * @ocs:       changeset pointer
>>   *
>> - * Reverts a changeset returning the state of the tree to what it
>> - * was before the application.
>> - * Any side-effects like creation/destruction of devices and
>> - * removal of sysfs properties and directories are applied.
>> + * Applies a changeset to the live tree.
>> + * Any side-effects of live tree state changes are applied here on
>> + * sucess, like creation/destruction of devices and side-effects
>
>s/sucess/success/
>
>> + * like creation of sysfs properties and directories.
>>   * Returns 0 on success, a negative error value in case of an error.
>> + * On error the partially applied effects are reverted.
>>   */
>> -int of_changeset_revert(struct of_changeset *ocs)
>> +int of_changeset_apply(struct of_changeset *ocs)
>> +{
>> +       int ret;
>> +
>> +       mutex_lock(&of_mutex);
>> +       ret = __of_changeset_apply(ocs);
>> +       mutex_unlock(&of_mutex);
>> +
>> +       return ret;
>> +}
>> +EXPORT_SYMBOL(of_changeset_apply);
>> +
>> +int __of_changeset_revert(struct of_changeset *ocs)
>>  {
>>         struct of_changeset_entry *ce;
>>         int ret;
>> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> + * of_changeset_revert - Reverts an applied changeset
>> + *
>> + * @ocs:       changeset pointer
>> + *
>> + * Reverts a changeset returning the state of the tree to what it
>> + * was before the application.
>> + * Any side-effects like creation/destruction of devices and
>> + * removal of sysfs properties and directories are applied.
>> + * Returns 0 on success, a negative error value in case of an error.
>> + */
>> +int of_changeset_revert(struct of_changeset *ocs)
>> +{
>> +       int ret;
>> +
>> +       mutex_lock(&of_mutex);
>> +       ret = __of_changeset_revert(ocs);
>> +       mutex_unlock(&of_mutex);
>> +
>> +       return ret;
>> +}
>> +EXPORT_SYMBOL(of_changeset_revert);
>> +
>> +/**
>>   * of_changeset_action - Perform a changeset action
>>   *
>>   * @ocs:       changeset pointer
>> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
>>         list_add_tail(&ce->node, &ocs->entries);
>>         return 0;
>>  }
>> +EXPORT_SYMBOL(of_changeset_action);
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index 24e025f..804ea33 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
>>         }
>>
>>         /* apply the changeset */
>> -       err = of_changeset_apply(&ov->cset);
>> +       err = __of_changeset_apply(&ov->cset);
>>         if (err) {
>> -               pr_err("%s: of_changeset_apply() failed for tree@%s\n",
>> +               pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
>>                                 __func__, tree->full_name);
>>                 goto err_revert_overlay;
>>         }
>> @@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
>>
>>
>>         list_del(&ov->node);
>> -       of_changeset_revert(&ov->cset);
>> +       __of_changeset_revert(&ov->cset);
>>         of_free_overlay_info(ov);
>>         idr_remove(&ov_idr, id);
>>         of_changeset_destroy(&ov->cset);
>> @@ -539,7 +539,7 @@ int of_overlay_destroy_all(void)
>>         /* the tail of list is guaranteed to be safe to remove */
>>         list_for_each_entry_safe_reverse(ov, ovn, &ov_list, node) {
>>                 list_del(&ov->node);
>> -               of_changeset_revert(&ov->cset);
>> +               __of_changeset_revert(&ov->cset);
>>                 of_free_overlay_info(ov);
>>                 idr_remove(&ov_idr, ov->id);
>>                 kfree(ov);
>> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
>> index 2270830..06eb3e5 100644
>> --- a/drivers/of/unittest.c
>> +++ b/drivers/of/unittest.c
>> @@ -527,7 +527,7 @@ static void __init of_unittest_changeset(void)
>>         unittest(!of_changeset_update_property(&chgset, parent, ppupdate), "fail update prop\n");
>>         unittest(!of_changeset_remove_property(&chgset, parent, ppremove), "fail remove prop\n");
>>         mutex_lock(&of_mutex);
>> -       unittest(!of_changeset_apply(&chgset), "apply failed\n");
>> +       unittest(!__of_changeset_apply(&chgset), "apply failed\n");
>
>You can just remove the mutex here.
>
>>         mutex_unlock(&of_mutex);
>>
>>         /* Make sure node names are constructed correctly */
>> @@ -536,7 +536,7 @@ static void __init of_unittest_changeset(void)
>>         of_node_put(np);
>>
>>         mutex_lock(&of_mutex);
>> -       unittest(!of_changeset_revert(&chgset), "revert failed\n");
>> +       unittest(!__of_changeset_revert(&chgset), "revert failed\n");
>
>And here.
>
>>         mutex_unlock(&of_mutex);
>>
>>         of_changeset_destroy(&chgset);
>> diff --git a/include/linux/of.h b/include/linux/of.h
>> index edc068d..5c030e1 100644
>> --- a/include/linux/of.h
>> +++ b/include/linux/of.h
>> @@ -1001,7 +1001,9 @@ extern int of_reconfig_get_state_change(unsigned long action,
>>
>>  extern void of_changeset_init(struct of_changeset *ocs);
>>  extern void of_changeset_destroy(struct of_changeset *ocs);
>> +extern int __of_changeset_apply(struct of_changeset *ocs);
>>  extern int of_changeset_apply(struct of_changeset *ocs);
>> +extern int __of_changeset_revert(struct of_changeset *ocs);
>
>These should go in of_private.h.
>
>>  extern int of_changeset_revert(struct of_changeset *ocs);
>>  extern int of_changeset_action(struct of_changeset *ocs,
>>                 unsigned long action, struct device_node *np,
>> --
>> 2.1.0
>>
>
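The split reviewed above follows a common kernel locking pattern. The exported of_changeset_apply() and of_changeset_revert() take of_mutex themselves, while the __ variants expect the caller to hold it already, as the overlay code does. Below is a minimal userspace sketch of that pattern. It assumes a pthread mutex in place of of_mutex and a counter in place of the live tree, and all names are hypothetical.

```c
#include <pthread.h>

/* Hypothetical stand-ins: tree_mutex models of_mutex, and
 * tree_generation models the state of the live tree. */
static pthread_mutex_t tree_mutex = PTHREAD_MUTEX_INITIALIZER;
static int tree_generation;

struct changeset {
	int nr_entries;
};

/* Caller must already hold tree_mutex (the kernel's __ variant). */
static int changeset_apply_locked(struct changeset *cs)
{
	if (cs->nr_entries < 0)
		return -1;	/* reject a malformed changeset */
	tree_generation += cs->nr_entries;
	return 0;
}

/* Public variant: takes the lock itself, like of_changeset_apply(). */
int changeset_apply(struct changeset *cs)
{
	int ret;

	pthread_mutex_lock(&tree_mutex);
	ret = changeset_apply_locked(cs);
	pthread_mutex_unlock(&tree_mutex);
	return ret;
}

/* A caller that holds the lock across several steps, like the overlay
 * code, must use the locked variant to avoid relocking the mutex. */
int overlay_apply_pair(struct changeset *a, struct changeset *b)
{
	int ret;

	pthread_mutex_lock(&tree_mutex);
	ret = changeset_apply_locked(a);
	if (!ret)
		ret = changeset_apply_locked(b);
	pthread_mutex_unlock(&tree_mutex);
	return ret;
}
```

Kernel mutexes are not recursive. A path that already holds the lock and then calls the locking variant would deadlock, and that is why the overlay paths switch to the __ variants.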

* Re: [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport
  2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
                   ` (35 preceding siblings ...)
  2015-08-06  4:11 ` [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
@ 2015-08-10  6:05 ` Alexey Kardashevskiy
  2015-08-10  7:17   ` Gavin Shan
  36 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  6:05 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The series of patches intend to support PCI slot for PowerPC PowerNV platform,
> which is running on top of skiboot firmware. The patchset requires corresponding
> changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
> for review. The PCI slots are exposed by skiboot with device node properties,
> and kernel utilizes those properties to populate PCI slots accordingly.


This does not apply on top of any of the trees I have: torvalds/master, 
powerpc/master, powerpc/next.

The problem patches are (at least):
powerpc/powernv: Enable M64 on P7IOC
powerpc/powernv: Release PEs dynamically

What did you base them on (sha1)? It is always worth mentioning.




-- 
Alexey

* Re: [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
  2015-08-06  4:11 ` [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
@ 2015-08-10  6:30   ` Alexey Kardashevskiy
  2015-08-10 23:45     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  6:30 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The patch enables M64 window on P7IOC, which has been enabled on
> PHB3. Different from PHB3 where 16 M64 BARs are supported and each
> of them can be owned by one particular PE# exclusively or divided
> evenly to 256 segments, each P7IOC PHB has 16 M64 BARs and each
> of them are divided into 8 segments.

Is this a limitation of the POWER7 chip, or does it come from IODA1?


> So each P7IOC PHB can support
> 128 M64 segments only. Also, P7IOC has M64DT, which helps mapping
> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
> M64DT, indicating that one M64 segment can only be pinned to the
> fixed PE#. In order to have similar logic to support M64 for PHB3
> and P7IOC, we just provide 128 M64 (16 BARs) segments and fixed
> mapping between PE# and M64 segment# on P7IOC. In turn, we just
> need different phb->init_m64() hooks for P7IOC and PHB3 to support
> M64.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++++++++++++++++++++++++++----
>   1 file changed, 104 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 38b5405..e4ac703 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>   	clear_bit(pe, phb->ioda.pe_alloc);
>   }
>
> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
> +{
> +	struct resource *r;
> +	int seg;
> +
> +	/* There are as many M64 segments as the maximum number
> +	 * of PEs, which is 128.
> +	 */
> +	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {


This "8" is used a lot across the patch; please make it a macro 
(PNV_PHB_P7IOC_SEGNUM or PNV_PHB_IODA1_SEGNUM or whatever you think it is) 
with a short comment explaining why it is "8", or make it a pnv_phb member.
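Here is a minimal sketch of what such a macro could look like. It also pulls the segment arithmetic that the patch open-codes as "seg / 8", "pe_number % 8" and "8 * m64_segsize" into small helpers. The macro and helper names are hypothetical. The value 8 comes from the patch description: 16 BARs of 8 segments each give 128 segments.

```c
#include <stdint.h>

/* Hypothetical macro (one of the names suggested above): each P7IOC
 * M64 BAR is split into 8 equal segments, one per PE#. */
#define PNV_PHB_IODA1_SEGNUM	8

/* BAR holding the segment for a PE#, i.e. "pe_number / 8". */
static inline unsigned int ioda1_m64_bar(unsigned int pe_number)
{
	return pe_number / PNV_PHB_IODA1_SEGNUM;
}

/* Segment index inside that BAR, i.e. "pe_number % 8". */
static inline unsigned int ioda1_m64_seg_in_bar(unsigned int pe_number)
{
	return pe_number % PNV_PHB_IODA1_SEGNUM;
}

/* Base of the BAR whose first segment is 'seg'
 * ("m64_base + seg * m64_segsize"). */
static inline uint64_t ioda1_m64_bar_base(uint64_t m64_base,
					  uint64_t segsize, unsigned int seg)
{
	return m64_base + (uint64_t)seg * segsize;
}

/* Size programmed into one BAR ("8 * m64_segsize"). */
static inline uint64_t ioda1_m64_bar_size(uint64_t segsize)
{
	return PNV_PHB_IODA1_SEGNUM * segsize;
}
```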


> +		unsigned long base;
> +		int64_t rc;
> +
> +		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
> +						 OPAL_M64_WINDOW_TYPE,
> +						 seg / 8,
> +						 base,
> +						 0, /* unused */
> +						 8 * phb->ioda.m64_segsize);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, seg / 8);
> +			goto fail;
> +		}
> +
> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
> +					      OPAL_M64_WINDOW_TYPE,
> +					      seg / 8,
> +					      OPAL_ENABLE_M64_SPLIT);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
> +				rc, phb->hose->global_number, seg / 8);
> +			goto fail;
> +		}
> +	}
> +
> +	/* Strip off the segment used by the reserved PE, which

What is this reserved PE on P7IOC? Does "strip off" mean "exclude" here?


> +	 * is expected to be 0 or last supported PE#. The PHB's
> +	 * first memory window traces the 32-bits MMIO range

s/traces/filters/ ? Or did I misunderstand this comment?


> +	 * while the second one traces the 64-bits prefetchable
> +	 * MMIO range that the PHB supports.

32/64 ranges comment seems irrelevant here.
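The hunk below only trims the window when the reserved PE is the first or the last one. Here is a minimal sketch of why, assuming the M64 window is one contiguous, inclusive range like struct resource. Dropping a segment at either end keeps the range contiguous. A segment in the middle cannot be excluded by moving start or end alone. The names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the start/end of hose->mem_resources[1]. */
struct m64_window {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

/*
 * Exclude the reserved PE's segment from the window. Returns 0 on
 * success, or -1 if that segment lies strictly inside the window and
 * cannot be removed by adjusting start or end.
 */
static int strip_reserved_seg(struct m64_window *w, uint64_t segsize,
			      int reserved_pe, int total_pe)
{
	if (reserved_pe == 0) {
		w->start += segsize;	/* drop the first segment */
		return 0;
	}
	if (reserved_pe == total_pe - 1) {
		w->end -= segsize;	/* drop the last segment */
		return 0;
	}
	return -1;
}
```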


> +	 */
> +	r = &phb->hose->mem_resources[1];
> +	if (phb->ioda.reserved_pe == 0)
> +		r->start += phb->ioda.m64_segsize;
> +	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
> +		r->end -= phb->ioda.m64_segsize;
> +	else
> +		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
> +			phb->ioda.reserved_pe);
> +
> +	return 0;
> +
> +fail:
> +	for ( ; seg >= 0; seg -= 8)
> +		opal_pci_phb_mmio_enable(phb->opal_id,
> +					 OPAL_M64_WINDOW_TYPE,
> +					 seg / 8,
> +					 OPAL_DISABLE_M64);
> +
> +	return -EIO;
> +}
> +
>   /* The default M64 BAR is shared by all PEs */
>   static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>   {
> @@ -256,9 +319,9 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
>   	}
>   }
>
> -static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
> -				     unsigned long *pe_bitmap,
> -				     bool all)
> +static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
> +				    unsigned long *pe_bitmap,
> +				    bool all)
>   {
>   	struct pci_dev *pdev;
>
> @@ -266,12 +329,12 @@ static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>   		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
>
>   		if (all && pdev->subordinate)
> -			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
> -						 pe_bitmap, all);
> +			pnv_ioda_reserve_m64_pe(pdev->subordinate,
> +						pe_bitmap, all);
>   	}
>   }
>
> -static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
> +static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> @@ -293,7 +356,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   	}
>
>   	/* Figure out reserved PE numbers by the PE */
> -	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
> +	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
>
>   	/*
>   	 * the current bus might not own M64 window and that's all
> @@ -324,6 +387,26 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>   			pe->master = master_pe;
>   			list_add_tail(&pe->list, &master_pe->slaves);
>   		}
> +
> +		/* P7IOC supports M64DT, which helps mapping M64 segment
> +		 * to one particular PE#. However, PHB3 has fixed mapping
> +		 * between M64 segment and PE#. In order to have same logic
> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
> +		 * segment and PE# on P7IOC.
> +		 */
> +		if (phb->type == PNV_PHB_IODA1) {
> +			int64_t rc;
> +
> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +							 pe->pe_number,
> +							 OPAL_M64_WINDOW_TYPE,
> +							 pe->pe_number / 8,
> +							 pe->pe_number % 8);
> +			if (rc != OPAL_SUCCESS)
> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
> +					__func__, rc, phb->hose->global_number,
> +					pe->pe_number);
> +		}
>   	}
>
>   	kfree(pe_alloc);
> @@ -338,8 +421,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>   	const u32 *r;
>   	u64 pci_addr;
>
> -	/* FIXME: Support M64 for P7IOC */
> -	if (phb->type != PNV_PHB_IODA2) {
> +	if (phb->type != PNV_PHB_IODA1 &&
> +	    phb->type != PNV_PHB_IODA2) {
>   		pr_info("  Not support M64 window\n");
>   		return;


You are adding P7IOC support, so at least the "fixme" should go. Also, 
pnv_ioda_parse_m64_window() is only called from pnv_pci_init_ioda_phb(), 
which is only called with PNV_PHB_IODA1 or PNV_PHB_IODA2 (no other value 
is passed there as a type), so the check above can never succeed; just remove it.



>   	}
> @@ -372,9 +455,18 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>
>   	/* Use last M64 BAR to cover M64 window */
>   	phb->ioda.m64_bar_idx = 15;
> -	phb->init_m64 = pnv_ioda2_init_m64;
> -	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
> -	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
> +	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
> +	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		phb->init_m64 = pnv_ioda1_init_m64;
> +		break;
> +	case PNV_PHB_IODA2:
> +		phb->init_m64 = pnv_ioda2_init_m64;
> +		break;
> +	default:
> +		pr_debug("   M64 not supported\n");
> +	}
>   }
>
>   static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
>


-- 
Alexey

* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-06  4:11 ` [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE Gavin Shan
@ 2015-08-10  7:16   ` Alexey Kardashevskiy
  2015-08-11  0:03     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  7:16 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The patch is adding 6 bitmaps, three to PE and three to PHB, to track

The patch is also removing 2 arrays (io_segmap and m32_segmap); what is 
that all about? Also, there was no m64_segmap and now there is, which may 
need an explanation.


> the consumed by one particular PE, which can be released once the PE
> is destroyed during PCI unplugging time. Also, we're using fixed
> quantity of bits to trace the used IO, M32 and M64 segments by PEs
> in one particular PHB.
>

Out of curiosity: have you considered having just 3 arrays in the PHB, 
storing PE numbers, and ditching the PE's arrays? Does the PE itself need to know 
what PEs it is using? Not sure about these master/slave PEs though.
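Here is a minimal sketch of the alternative suggested above. It assumes one owner array per window type in the PHB, indexed by segment number and holding the owning PE# (or -1 when the segment is free). A PE's segments are found by scanning, and releasing a PE clears its entries. Master/slave PE handling is not modelled. All names and the segment bound are hypothetical.

```c
#define SKETCH_MAX_SEGS	256	/* hypothetical upper bound on segments */
#define SEG_FREE	(-1)

/* Hypothetical per-PHB owner table for one window type (IO, M32 or M64). */
struct seg_owner_map {
	int owner[SKETCH_MAX_SEGS];	/* owning PE# per segment, or SEG_FREE */
	unsigned int nr_segs;
};

static void seg_map_init(struct seg_owner_map *m, unsigned int nr_segs)
{
	unsigned int i;

	m->nr_segs = nr_segs < SKETCH_MAX_SEGS ? nr_segs : SKETCH_MAX_SEGS;
	for (i = 0; i < m->nr_segs; i++)
		m->owner[i] = SEG_FREE;
}

/* Assign a segment to a PE; fails if out of range or already owned. */
static int seg_map_assign(struct seg_owner_map *m, unsigned int seg, int pe)
{
	if (seg >= m->nr_segs || m->owner[seg] != SEG_FREE)
		return -1;
	m->owner[seg] = pe;
	return 0;
}

/* Count a PE's segments by scanning: no per-PE bitmap is needed. */
static unsigned int seg_map_count(const struct seg_owner_map *m, int pe)
{
	unsigned int i, n = 0;

	for (i = 0; i < m->nr_segs; i++)
		if (m->owner[i] == pe)
			n++;
	return n;
}

/* Release every segment owned by a PE, e.g. when the PE is destroyed. */
static void seg_map_release(struct seg_owner_map *m, int pe)
{
	unsigned int i;

	for (i = 0; i < m->nr_segs; i++)
		if (m->owner[i] == pe)
			m->owner[i] = SEG_FREE;
}
```

The trade-off is a linear scan per query or release in exchange for dropping the per-PE storage.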

It would be easier to review if this patch came right before
[PATCH v6 23/42] powerpc/powernv: Release PEs dynamically



> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>   arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>   2 files changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e4ac703..78b49a1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   			list_add_tail(&pe->list, &master_pe->slaves);
>   		}
>
> +		/* M64 segments consumed by slave PEs are tracked
> +		 * by master PE
> +		 */
> +		set_bit(pe->pe_number, master_pe->m64_segmap);
> +		set_bit(pe->pe_number, phb->ioda.m64_segmap);
> +
>   		/* P7IOC supports M64DT, which helps mapping M64 segment
>   		 * to one particular PE#. However, PHB3 has fixed mapping
>   		 * between M64 segment and PE#. In order to have same logic
> @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>
>   			while (index < phb->ioda.total_pe &&
>   			       region.start <= region.end) {
> -				phb->ioda.io_segmap[index] = pe->pe_number;
> +				set_bit(index, pe->io_segmap);
> +				set_bit(index, phb->ioda.io_segmap);
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
> +					pe->pe_number, OPAL_IO_WINDOW_TYPE,
> +					0, index);

Unrelated change.


>   				if (rc != OPAL_SUCCESS) {
>   					pr_err("%s: OPAL error %d when mapping IO "
>   					       "segment #%d to PE#%d\n",
> @@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>
>   			while (index < phb->ioda.total_pe &&
>   			       region.start <= region.end) {
> -				phb->ioda.m32_segmap[index] = pe->pe_number;
> +				set_bit(index, pe->m32_segmap);
> +				set_bit(index, phb->ioda.m32_segmap);
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
> +					pe->pe_number, OPAL_M32_WINDOW_TYPE,
> +					0, index);

Unrelated change.


>   				if (rc != OPAL_SUCCESS) {
>   					pr_err("%s: OPAL error %d when mapping M32 "
>   					       "segment#%d to PE#%d",
> @@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   {
>   	struct pci_controller *hose;
>   	struct pnv_phb *phb;
> -	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
> +	unsigned long size, pemap_off;
>   	const __be64 *prop64;
>   	const __be32 *prop32;
>   	int len;
> @@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>
>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */


This comment came with the if (IODA1) below. Since you are removing that 
condition, it makes sense to remove the comment as well, or move it to where 
people will look for it (arch/powerpc/platforms/powernv/pci.h?).


>   	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
> -	m32map_off = size;
> -	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
> -	if (phb->type == PNV_PHB_IODA1) {
> -		iomap_off = size;
> -		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
> -	}
>   	pemap_off = size;
>   	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>   	aux = memblock_virt_alloc(size, 0);


After adding static arrays to PE and PHB, do you still need this "aux"?
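For reference, after this patch the single "aux" allocation only carries the PE allocation bitmap and the PE array. Here is a minimal userspace sketch of that layout. It assumes calloc() in place of memblock_virt_alloc() and a cut-down PE struct, and the names are hypothetical. Like the patch, it assumes total_pe is a multiple of 8.

```c
#include <stdlib.h>

/* Round x up to a multiple of a (a > 0). */
#define SKETCH_ALIGN_UP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Hypothetical, cut-down stand-in for struct pnv_ioda_pe. */
struct sketch_pe {
	unsigned int pe_number;
};

struct sketch_aux {
	void *base;			/* one allocation, freed as one block */
	unsigned long *pe_alloc;	/* PE allocation bitmap */
	struct sketch_pe *pe_array;	/* one entry per PE */
};

/* Assumes total_pe is a multiple of 8 (e.g. 128 or 256). */
static int sketch_aux_alloc(struct sketch_aux *aux, size_t total_pe)
{
	size_t size, pemap_off;

	/* Bitmap bytes, rounded up so the PE array is long-aligned */
	size = SKETCH_ALIGN_UP(total_pe / 8, sizeof(unsigned long));
	pemap_off = size;
	size += total_pe * sizeof(struct sketch_pe);

	aux->base = calloc(1, size);
	if (!aux->base)
		return -1;
	aux->pe_alloc = aux->base;
	aux->pe_array = (struct sketch_pe *)((char *)aux->base + pemap_off);
	return 0;
}

static void sketch_aux_free(struct sketch_aux *aux)
{
	free(aux->base);
	aux->base = NULL;
	aux->pe_alloc = NULL;
	aux->pe_array = NULL;
}
```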


>   	phb->ioda.pe_alloc = aux;
> -	phb->ioda.m32_segmap = aux + m32map_off;
> -	if (phb->type == PNV_PHB_IODA1)
> -		phb->ioda.io_segmap = aux + iomap_off;
>   	phb->ioda.pe_array = aux + pemap_off;
>   	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 62239b1..08a4e57 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>   	/* PE number */
>   	unsigned int		pe_number;
>
> +	/* IO/M32/M64 segments consumed by the PE. Each PE can
> +	 * have one M64 segment at most, but M64 segments consumed
> +	 * by slave PEs will be contributed to the master PE. One
> +	 * PE can own multiple IO and M32 segments.


A PE can have multiple IO and M32 segments but just one M64 segment? Is 
this correct for IODA1, IODA2 or both? Is this a limitation of this 
implementation, or does it come from the P7IOC/PHB3 hardware?


> +	 */
> +	unsigned long		io_segmap[8];
> +	unsigned long		m32_segmap[8];
> +	unsigned long		m64_segmap[8];

Magic constant "8": 64 bits * 8 = 512 PEs. Where did this come from?

Anyway,

#define PNV_IODA_MAX_PE_NUM	512

unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG];
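Here is a minimal userspace sketch of that sizing, assuming the 512-PE bound from the example above. The kernel's DECLARE_BITMAP() and BITS_TO_LONGS() helpers already do this. Here they are re-derived under local, hypothetical names so the block builds outside the kernel. With a 64-bit unsigned long, each array has 8 entries, which matches the hard-coded constant.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_PE_NUM	512	/* hypothetical bound, as in the example */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Per-PE segment maps sized from the PE bound, not a magic "8". */
struct sketch_pe_segmaps {
	unsigned long io_segmap[SKETCH_BITS_TO_LONGS(SKETCH_MAX_PE_NUM)];
	unsigned long m32_segmap[SKETCH_BITS_TO_LONGS(SKETCH_MAX_PE_NUM)];
	unsigned long m64_segmap[SKETCH_BITS_TO_LONGS(SKETCH_MAX_PE_NUM)];
};

/* Same argument order as the kernel's set_bit(nr, addr). */
static void sketch_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(unsigned int nr, const unsigned long *map)
{
	return map[nr / SKETCH_BITS_PER_LONG] &
	       (1UL << (nr % SKETCH_BITS_PER_LONG));
}
```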




> +
>   	/* "Weight" assigned to the PE for the sake of DMA resource
>   	 * allocations
>   	 */
> @@ -145,15 +154,16 @@ struct pnv_phb {
>   			unsigned int		io_segsize;
>   			unsigned int		io_pci_base;
>
> +			/* IO, M32, M64 segment maps */
> +			unsigned long		io_segmap[8];
> +			unsigned long		m32_segmap[8];
> +			unsigned long		m64_segmap[8];
> +
>   			/* PE allocation */
>   			struct mutex		pe_alloc_mutex;
>   			unsigned long		*pe_alloc;
>   			struct pnv_ioda_pe	*pe_array;
>
> -			/* M32 & IO segment maps */
> -			unsigned int		*m32_segmap;
> -			unsigned int		*io_segmap;
> -
>   			/* IRQ chip */
>   			int			irq_chip_init;
>   			struct irq_chip		irq_chip;
>


-- 
Alexey

* Re: [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport
  2015-08-10  6:05 ` [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Alexey Kardashevskiy
@ 2015-08-10  7:17   ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-10  7:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 04:05:40PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The series of patches intend to support PCI slot for PowerPC PowerNV platform,
>>which is running on top of skiboot firmware. The patchset requires corresponding
>>changes from skiboot firmware, which is sent to skiboot@lists.ozlabs.org
>>for review. The PCI slots are exposed by skiboot with device node properties,
>>and kernel utilizes those properties to populate PCI slots accordingly.
>
>
>This does not apply on top of any actual trees I have - torvalds/master,
>powerpc/master, powerpc/next.
>
>The problem patches are (at least):
>powerpc/powernv: Enable M64 on P7IOC
>powerpc/powernv: Release PEs dynamically
>
>What did you base them on (sha1)? It is always worth mentioning.
>

The patchset is based on powerpc/next plus the patches below, which I think
will be merged before this patchset. I tried to avoid conflicts as much as
I could:

e14f70b powerpc/powernv: compound PE for VFs				<<< EEH Support for VF - END
42f59ac powerpc/eeh: Support error recovery for VF PE
9c1c221 powerpc/powernv: Support PCI config restore for VFs
8ac2231 powerpc/powernv: Support EEH reset for VF PE
a636ce5 powerpc/eeh: Create PE for VFs
a4e56fc powerpc/powernv: EEH device for VF
2f02884 powerpc/eeh: Cache only BARs, not windows or IOV BARs
1888e95 powerpc/pci: Remove VFs prior to PF
0dab41d powerpc/pci: Cache VF index in pci_dn
fdc2d8a PCI: Add pcibios_bus_add_device() weak function
2bcc609 PCI/IOV: Rename and export virtfn_add/virtfn_remove		<<< EEH Support for VF - START
efde611 powerpc/eeh: Disable automatically blocked PCI config

All the above patches can be found in the linux-ppc mail archive.

Thanks,
Gavin

* Re: [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
       [not found]   ` <1438834307-26960-8-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-10  7:40     ` Alexey Kardashevskiy
  2015-08-11  0:12       ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  7:40 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> There're 3 windows (IO, M32 and M64) for PHB, root port and upstream

These are actually the IO, non-prefetchable and prefetchable windows, which 
happen to be IO, 32-bit and 64-bit windows, but this has nothing to do with 
the M32/M64 BAR registers in P7IOC/PHB3. Do I understand this correctly?
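Here is a minimal sketch of the classification that the new pnv_ioda_setup_one_res() performs, which decides the PHB window type for a resource:

- IO resources go to the IO window.
- Memory resources that are not both 64-bit and prefetchable go to M32.
- The rest, 64-bit prefetchable, are left for M64.

The sketch assumes pnv_pci_is_mem_pref_64() requires both the 64-bit and the prefetch flags. The flag bits and names below are hypothetical stand-ins for the kernel's IORESOURCE_* flags, not their real values.

```c
/* Hypothetical stand-ins for IORESOURCE_IO/MEM/PREFETCH/MEM_64. */
enum {
	SK_RES_IO	= 1u << 0,
	SK_RES_MEM	= 1u << 1,
	SK_RES_PREFETCH	= 1u << 2,
	SK_RES_MEM_64	= 1u << 3,
};

enum sk_window { SK_WIN_NONE, SK_WIN_IO, SK_WIN_M32, SK_WIN_M64 };

/* True only when a memory resource is both 64-bit and prefetchable. */
static int sk_is_mem_pref_64(unsigned int flags)
{
	return (flags & (SK_RES_MEM_64 | SK_RES_PREFETCH)) ==
	       (SK_RES_MEM_64 | SK_RES_PREFETCH);
}

/* Map resource flags to the PHB window type that would back them. */
static enum sk_window sk_classify(unsigned int flags)
{
	if (flags & SK_RES_IO)
		return SK_WIN_IO;
	if (flags & SK_RES_MEM)
		return sk_is_mem_pref_64(flags) ? SK_WIN_M64 : SK_WIN_M32;
	return SK_WIN_NONE;
}
```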


> port of the PCIE switch behind root port. In order to support PCI
> hotplug, we extend the start/end address of those 3 windows of root
> port or upstream port to the start/end address of the 3 PHB's windows.
> The current implementation, assigning IO or M32 segment based on the
> bridge's windows, isn't reliable.
>
> The patch fixes above issue by calculating PE's consumed IO or M32
> segments from its contained devices, no PCI bridge windows involved
> if the PE doesn't contain all the subordinate PCI buses.

Please rephrase this. How can PCI bridges be involved in PE consumption?


> Otherwise,
> the PCI bridge windows still contribute to PE's consumed IO or M32
> segments.

PCI bridge windows themselves consume PEs? Is that correct?


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 136 +++++++++++++++++-------------
>   1 file changed, 79 insertions(+), 57 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 488a53e..713f4b4 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   }
>   #endif /* CONFIG_PCI_IOV */
>
> -/*
> - * This function is supposed to be called on basis of PE from top
> - * to bottom style. So the the I/O or MMIO segment assigned to
> - * parent PE could be overrided by its child PEs if necessary.
> - */
> -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
> -				  struct pnv_ioda_pe *pe)
> +static int pnv_ioda_setup_one_res(struct pci_controller *hose,
> +				  struct pnv_ioda_pe *pe,
> +				  struct resource *res)
>   {
>   	struct pnv_phb *phb = hose->private_data;
>   	struct pci_bus_region region;
> -	struct resource *res;
> -	int i, index;
> -	unsigned int segsize;
> +	unsigned int index, segsize;
>   	unsigned long *segmap, *pe_segmap;
>   	uint16_t win;
>   	int64_t rc;
>
> -	/*
> -	 * NOTE: We only care PCI bus based PE for now. For PCI
> -	 * device based PE, for example SRIOV sensitive VF should
> -	 * be figured out later.
> -	 */
> -	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
> +	/* Check if we need map the resource */
> +	if (!res->parent || !res->flags || res->start > res->end)

res->start >= res->end ?
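For reference, struct resource ranges are inclusive: resource_size() returns end - start + 1, so a resource with start == end covers one byte. Here is a minimal sketch of counting the segments that an inclusive range touches. It assumes a fixed segment size and a table of total_segs entries. sk_covered_segs() is a hypothetical helper, not the patch's code, which instead aligns region.start and region.end to the segment size.

```c
#include <stdint.h>

/*
 * Number of 'segsize' segments touched by the inclusive range
 * [start, end], clamped to 'total_segs'. On success the first index is
 * stored in *first. Returns 0 for an empty range (start > end) or for a
 * range that starts past the table.
 */
static unsigned int sk_covered_segs(uint64_t start, uint64_t end,
				    uint64_t segsize, unsigned int total_segs,
				    unsigned int *first)
{
	uint64_t first_seg, last_seg;

	if (start > end || segsize == 0)
		return 0;
	first_seg = start / segsize;
	last_seg = end / segsize;	/* end is inclusive */
	if (first_seg >= total_segs)
		return 0;
	if (last_seg >= total_segs)
		last_seg = total_segs - 1;
	*first = (unsigned int)first_seg;
	return (unsigned int)(last_seg - first_seg + 1);
}
```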


> +		return 0;
>
> -	pci_bus_for_each_resource(pe->pbus, res, i) {
> -		if (!res || !res->flags ||
> -		    res->start > res->end)
> -			continue;
> +	if (res->flags & IORESOURCE_IO) {
> +		region.start = res->start - phb->ioda.io_pci_base;
> +		region.end   = res->end - phb->ioda.io_pci_base;
> +		segsize      = phb->ioda.io_segsize;
> +		segmap       = phb->ioda.io_segmap;
> +		pe_segmap    = pe->io_segmap;
> +		win          = OPAL_IO_WINDOW_TYPE;
> +	} else if ((res->flags & IORESOURCE_MEM) &&
> +		   !pnv_pci_is_mem_pref_64(res->flags)) {
> +		region.start = res->start -
> +			       hose->mem_offset[0] -
> +			       phb->ioda.m32_pci_base;
> +		region.end   = res->end -
> +			       hose->mem_offset[0] -
> +			       phb->ioda.m32_pci_base;
> +		segsize      = phb->ioda.m32_segsize;
> +		segmap       = phb->ioda.m32_segmap;
> +		pe_segmap    = pe->m32_segmap;
> +		win          = OPAL_M32_WINDOW_TYPE;
> +	} else {
> +		return 0;
> +	}
>
> -		if (res->flags & IORESOURCE_IO) {
> -			region.start = res->start - phb->ioda.io_pci_base;
> -			region.end   = res->end - phb->ioda.io_pci_base;
> -			segsize      = phb->ioda.io_segsize;
> -			segmap       = phb->ioda.io_segmap;
> -			pe_segmap    = pe->io_segmap;
> -			win          = OPAL_IO_WINDOW_TYPE;
> -		} else if ((res->flags & IORESOURCE_MEM) &&
> -			   !pnv_pci_is_mem_pref_64(res->flags)) {
> -			region.start = res->start -
> -				       hose->mem_offset[0] -
> -				       phb->ioda.m32_pci_base;
> -			region.end   = res->end -
> -				       hose->mem_offset[0] -
> -				       phb->ioda.m32_pci_base;
> -			segsize      = phb->ioda.m32_segsize;
> -			segmap       = phb->ioda.m32_segmap;
> -			pe_segmap    = pe->m32_segmap;
> -			win          = OPAL_M32_WINDOW_TYPE;
> -		} else {
> -			continue;
> +	region.start = _ALIGN_DOWN(region.start, segsize);
> +	region.end   = _ALIGN_UP(region.end, segsize);
> +	index = region.start / segsize;
> +	while (index < phb->ioda.total_pe &&
> +	       region.start < region.end) {
> +		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> +				pe->pe_number, win, 0, index);
> +		if (rc != OPAL_SUCCESS) {
> +			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
> +				__func__, rc, win, index,
> +				pe->phb->hose->global_number,
> +				pe->pe_number);
> +			return -EIO;
>   		}
>
> -		index = region.start / phb->ioda.io_segsize;
> -		while (index < phb->ioda.total_pe &&
> -		       region.start <= region.end) {
> -			set_bit(index, segmap);
> -			set_bit(index, pe_segmap);
> -			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> -					pe->pe_number, win, 0, index);
> -			if (rc != OPAL_SUCCESS) {
> -				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
> -					__func__, rc, win, index,
> -					pe->phb->hose->global_number,
> -					pe->pe_number);
> -				break;
> -			}
> +		set_bit(index, segmap);
> +		set_bit(index, pe_segmap);
> +		region.start += segsize;
> +		index++;
> +	}
> +
> +	return 0;
> +}
> +
> +static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
> +				  struct pnv_ioda_pe *pe)
> +{
> +	struct pci_dev *pdev;
> +	struct resource *res;
> +	int i;
> +
> +	/* This function only works for bus dependent PE */
> +	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
> +
> +	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
> +		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
> +			res = &pdev->resource[i];
> +			if (pnv_ioda_setup_one_res(hose, pe, res))
> +				return;
> +		}
> +
> +		/* If the PE contains all subordinate PCI buses, the
> +		 * resources of the child bridges should be mapped
> +		 * to the PE as well.
> +		 */
> +		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
> +		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
> +			continue;
>
> -			region.start += segsize;
> -			index++;
> +		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
> +			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
> +			if (pnv_ioda_setup_one_res(hose, pe, res))
> +				return;
>   		}
>   	}
>   }
>
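Here is a standalone sketch of the index arithmetic behind the segment walk in pnv_ioda_setup_one_res() above. It assumes the region has already been translated to a PCI bus address range with an inclusive `end`, as for `struct resource`. `seg_range()` is a hypothetical helper and is not part of the patch. Instead of aligning both ends with `_ALIGN_DOWN()`/`_ALIGN_UP()` and looping, it derives the first and last segment directly from the inclusive bounds. It then clamps the result to the number of segments, which is `total_pe` in the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute the first and last segment index touched by
 * the inclusive region [start, end], using segments of 'segsize' bytes and
 * clamping to 'total_segs' segments. Returns the number of segments covered,
 * or 0 if the region starts beyond the segmented space.
 */
static unsigned int seg_range(unsigned long start, unsigned long end,
			      unsigned long segsize, unsigned int total_segs,
			      unsigned int *first, unsigned int *last)
{
	unsigned long f, l;

	assert(segsize != 0 && start <= end);

	f = start / segsize;
	l = end / segsize;		/* end is inclusive */
	if (f >= total_segs)
		return 0;
	if (l >= total_segs)
		l = total_segs - 1;	/* clamp like 'index < total_pe' */

	*first = (unsigned int)f;
	*last = (unsigned int)l;
	return (unsigned int)(l - f + 1);
}
```

A caller would then map each index in `[first, last]` with opal_pci_map_pe_mmio_window() and set the matching bit in both `segmap` and `pe_segmap`, as the loop in the patch does.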


-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically
  2015-08-06  4:11 ` [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically Gavin Shan
@ 2015-08-10  7:48   ` Alexey Kardashevskiy
  2015-08-10  9:21   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  7:48 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> For P7IOC, the whole available DMA32 space, which is below the
> MEM32 space, is divided evenly into 256MB segments. The number
> of continuous segments assigned to one particular PE depends on
> the PE's DMA weight that is calculated based on the type of each
> PCI devices contained in the PE, and PHB's DMA weight which is
> accumulative DMA weight of PEs contained in the PHB. It means
> that the PHB's DMA weight calculation depends on existing PEs,
> which works perfectly now, but not hotplug friendly. As the
> whole available DMA32 space can be assigned to one PE on PHB3,
> so we don't have the issue on PHB3.
>
> The patch calculates PHB's DMA weight based on the PCI devices
> contained in the PHB dynamically so that it's hotplug friendly.

It does not look like the patch changes anything about when the weights are
calculated; that was done, and still is done, in pnv_ioda_setup_dma().

What the patch seems to be doing is changing weights by multiplying them by 
phb->ioda.tce32_count but it is unclear why you do this.
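To make the effect of the `tce32_count` scaling concrete, here is a small sketch of the IODA1 segment computation from the quoted hunk. It uses made-up example weights (3 for USB, 10 by default, 15 for RAID, as in pnv_ioda_dma_weight()) and an assumed 8 DMA32 segments. `pe_segs()` is a hypothetical helper that mirrors the `segs = ...` expression. Because every weight is scaled by the same factor N, `(w * N) / W` does not change. The scaling only affects the `W / N` threshold, which is otherwise truncated.

```c
#include <assert.h>

/* Per-device base weights from pnv_ioda_dma_weight() in the quoted patch. */
enum { W_USB = 3, W_DEFAULT = 10, W_RAID = 15 };

/*
 * Mirrors the IODA1 branch of the quoted hunk: a PE whose weight is below
 * the average weight per segment gets one segment; otherwise it gets its
 * proportional share, truncated (which can come out as 0).
 */
static unsigned int pe_segs(unsigned int pe_weight, unsigned int phb_weight,
			    unsigned int segcount)
{
	assert(segcount != 0 && phb_weight != 0);
	if (pe_weight < phb_weight / segcount)
		return 1;
	return (pe_weight * segcount) / phb_weight;
}
```

With unscaled weights {3, 10, 15} (total 28) and 8 segments, `28 / 8` truncates to 3. The USB PE is therefore not below the threshold, and it gets `3 * 8 / 28 == 0` segments. With every weight scaled by 8 (total 224), the threshold is exactly 28, and the same PE gets 1 segment. This is only an observation about the arithmetic, not the rationale given in the patch.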


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++++++++++++++----------------
>   arch/powerpc/platforms/powernv/pci.h      |  6 ---
>   2 files changed, 43 insertions(+), 51 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 713f4b4..7342cfd 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -927,6 +927,9 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>
>   static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>   {
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +
>   	/* This is quite simplistic. The "base" weight of a device
>   	 * is 10. 0 means no DMA is to be accounted for it.
>   	 */
> @@ -939,14 +942,34 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>   	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		return 3;
> +		return 3 * phb->ioda.tce32_count;
>
>   	/* Increase the weight of RAID (includes Obsidian) */
>   	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> -		return 15;
> +		return 15 * phb->ioda.tce32_count;
>
>   	/* Default */
> -	return 10;
> +	return 10 * phb->ioda.tce32_count;
> +}
> +
> +static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
> +{
> +	unsigned int *dma_weight = data;
> +
> +	*dma_weight += pnv_ioda_dma_weight(pdev);
> +	return 0;
> +}
> +
> +static unsigned int pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
> +{
> +	unsigned int dma_weight = 0;
> +
> +	if (!phb->hose->bus)
> +		return 0;
> +
> +	pci_walk_bus(phb->hose->bus,
> +		     __pnv_ioda_phb_dma_weight, &dma_weight);
> +	return dma_weight;
>   }
>
>   #ifdef CONFIG_PCI_IOV
> @@ -1097,14 +1120,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
>
> -	/* Account for one DMA PE if at least one DMA capable device exist
> -	 * below the bridge
> -	 */
> -	if (pe->dma_weight != 0) {
> -		phb->ioda.dma_weight += pe->dma_weight;
> -		phb->ioda.dma_pe_count++;
> -	}
> -
>   	/* Link the PE */
>   	pnv_ioda_link_pe_by_weight(phb, pe);
>   }
> @@ -2431,24 +2446,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int residual, remaining, segs, tw, base;
>   	struct pnv_ioda_pe *pe;
> +	unsigned int dma_weight;
>
> -	/* If we have more PE# than segments available, hand out one
> -	 * per PE until we run out and let the rest fail. If not,
> -	 * then we assign at least one segment per PE, plus more based
> -	 * on the amount of devices under that PE
> -	 */
> -	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
> -		residual = 0;
> -	else
> -		residual = phb->ioda.tce32_count -
> -			phb->ioda.dma_pe_count;
> -
> -	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
> -		hose->global_number, phb->ioda.tce32_count);
> -	pr_info("PCI: %d PE# for a total weight of %d\n",
> -		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
> +	/* Calculate the PHB's DMA weight */
> +	dma_weight = pnv_ioda_phb_dma_weight(phb);
> +	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
> +		hose->global_number, phb->ioda.tce32_count, dma_weight);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> @@ -2456,22 +2460,9 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   	 * out one base segment plus any residual segments based on
>   	 * weight
>   	 */
> -	remaining = phb->ioda.tce32_count;
> -	tw = phb->ioda.dma_weight;
> -	base = 0;
>   	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>   		if (!pe->dma_weight)
>   			continue;
> -		if (!remaining) {
> -			pe_warn(pe, "No DMA32 resources available\n");
> -			continue;
> -		}
> -		segs = 1;
> -		if (residual) {
> -			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
> -			if (segs > remaining)
> -				segs = remaining;
> -		}
>
>   		/*
>   		 * For IODA2 compliant PHB3, we needn't care about the weight.
> @@ -2479,17 +2470,24 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   		 * the specific PE.
>   		 */
>   		if (phb->type == PNV_PHB_IODA1) {
> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> +			unsigned int segs, base = 0;
> +
> +			if (pe->dma_weight <
> +			    dma_weight / phb->ioda.tce32_count)
> +				segs = 1;
> +			else
> +				segs = (pe->dma_weight *
> +					phb->ioda.tce32_count) / dma_weight;
> +
> +			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>   				pe->dma_weight, segs);
>   			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
> +
> +			base += segs;
>   		} else {
>   			pe_info(pe, "Assign DMA32 space\n");
> -			segs = 0;
>   			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>   		}
> -
> -		remaining -= segs;
> -		base += segs;
>   	}
>   }
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 08a4e57..addd3f7 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -183,12 +183,6 @@ struct pnv_phb {
>   			/* 32-bit TCE tables allocation */
>   			unsigned long		tce32_count;
>
> -			/* Total "weight" for the sake of DMA resources
> -			 * allocation
> -			 */
> -			unsigned int		dma_weight;
> -			unsigned int		dma_pe_count;
> -
>   			/* Sorted list of used PE's, sorted at
>   			 * boot for resource allocation purposes
>   			 */
>


-- 
Alexey


* Re: [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup
  2015-08-06  4:11 ` [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup Gavin Shan
@ 2015-08-10  8:07   ` Alexey Kardashevskiy
  2015-08-11  0:19     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  8:07 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The patch cleans up DMA32 in pci-ioda.c. It shouldn't introduce
> behavioural changes:
>
>     * Rename various fields in "struct pnv_phb" and "struct pnv_ioda_pe"
>       as 32-bits DMA should be related to "DMA", not "TCE".

s/dma_weight/dma32_weight/ is ok (it does not add much, though), but the rest
is not. The "tce32_" fields are still TCEs (translation entries), while DMA is
a process initiated by a device, which does not know exactly how DMA
addresses are translated later. Since we are on the host side and we
actually manage TCE tables here, I suggest keeping the "tce32_" prefix for
TCE tables and the memory they use.

>     * Removed struct pnv_ioda_pe::tce32_segcount.

That's confusing - I had to walk through the patches to find out where you
stopped using it. It would be simpler if you put this particular change into

[PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()

where you remove dead code.


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 48 +++++++++++++++----------------
>   arch/powerpc/platforms/powernv/pci.h      |  7 ++---
>   2 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 7342cfd..8456f37 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -917,7 +917,7 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>   	struct pnv_ioda_pe *lpe;
>
>   	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (lpe->dma_weight < pe->dma_weight) {
> +		if (lpe->dma32_weight < pe->dma32_weight) {
>   			list_add_tail(&pe->dma_link, &lpe->dma_link);
>   			return;
>   		}
> @@ -942,14 +942,14 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>   	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		return 3 * phb->ioda.tce32_count;
> +		return 3 * phb->ioda.dma32_segcount;
>
>   	/* Increase the weight of RAID (includes Obsidian) */
>   	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> -		return 15 * phb->ioda.tce32_count;
> +		return 15 * phb->ioda.dma32_segcount;
>
>   	/* Default */
> -	return 10 * phb->ioda.tce32_count;
> +	return 10 * phb->ioda.dma32_segcount;
>   }
>
>   static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
> @@ -1057,7 +1057,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   			continue;
>   		}
>   		pdn->pe_number = pe->pe_number;
> -		pe->dma_weight += pnv_ioda_dma_weight(dev);
> +		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>   			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>   	}
> @@ -1094,10 +1094,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> -	pe->tce32_seg = -1;
> +	pe->dma32_seg = -1;
>   	pe->mve_number = -1;
>   	pe->rid = bus->busn_res.start << 8;
> -	pe->dma_weight = 0;
> +	pe->dma32_weight = 0;
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> @@ -1460,7 +1460,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   		pe->flags = PNV_IODA_PE_VF;
>   		pe->pbus = NULL;
>   		pe->parent_dev = pdev;
> -		pe->tce32_seg = -1;
> +		pe->dma32_seg = -1;
>   		pe->mve_number = -1;
>   		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>   			   pci_iov_virtfn_devfn(pdev, vf_index);
> @@ -1936,7 +1936,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>
>   	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> +	if (WARN_ON(pe->dma32_seg >= 0))
>   		return;
>
>   	tbl = pnv_pci_table_alloc(phb->hose->node);
> @@ -1945,7 +1945,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>
>   	/* Grab a 32-bit TCE table */
> -	pe->tce32_seg = base;
> +	pe->dma32_seg = base;
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>   		(base << 28), ((base + segs) << 28) - 1);
>
> @@ -2006,8 +2006,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	return;
>    fail:
>   	/* XXX Failure: Try to fallback to 64-bit only ? */
> -	if (pe->tce32_seg >= 0)
> -		pe->tce32_seg = -1;
> +	if (pe->dma32_seg >= 0)
> +		pe->dma32_seg = -1;
>   	if (tce_mem)
>   		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>   	if (tbl) {
> @@ -2405,7 +2405,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   	int64_t rc;
>
>   	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->tce32_seg >= 0))
> +	if (WARN_ON(pe->dma32_seg >= 0))
>   		return;
>
>   	/* TVE #1 is selected by PCI address bit 59 */
> @@ -2415,7 +2415,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   			pe->pe_number);
>
>   	/* The PE will reserve all possible 32-bits space */
> -	pe->tce32_seg = 0;
> +	pe->dma32_seg = 0;
>   	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>   		phb->ioda.m32_pci_base);
>
> @@ -2432,8 +2432,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>
>   	rc = pnv_pci_ioda2_setup_default_config(pe);
>   	if (rc) {
> -		if (pe->tce32_seg >= 0)
> -			pe->tce32_seg = -1;
> +		if (pe->dma32_seg >= 0)
> +			pe->dma32_seg = -1;
>   		return;
>   	}
>
> @@ -2452,7 +2452,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   	/* Calculate the PHB's DMA weight */
>   	dma_weight = pnv_ioda_phb_dma_weight(phb);
>   	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
> -		hose->global_number, phb->ioda.tce32_count, dma_weight);
> +		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> @@ -2461,7 +2461,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   	 * weight
>   	 */
>   	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (!pe->dma_weight)
> +		if (!pe->dma32_weight)
>   			continue;
>
>   		/*
> @@ -2472,15 +2472,15 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   		if (phb->type == PNV_PHB_IODA1) {
>   			unsigned int segs, base = 0;
>
> -			if (pe->dma_weight <
> -			    dma_weight / phb->ioda.tce32_count)
> +			if (pe->dma32_weight <
> +			    dma_weight / phb->ioda.dma32_segcount)
>   				segs = 1;
>   			else
> -				segs = (pe->dma_weight *
> -					phb->ioda.tce32_count) / dma_weight;
> +				segs = (pe->dma32_weight *
> +					phb->ioda.dma32_segcount) / dma_weight;
>
>   			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
> -				pe->dma_weight, segs);
> +				pe->dma32_weight, segs);
>   			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>
>   			base += segs;
> @@ -3211,7 +3211,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	mutex_init(&phb->ioda.pe_list_mutex);
>
>   	/* Calculate how many 32-bit TCE segments we have */
> -	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
> +	phb->ioda.dma32_segcount = phb->ioda.m32_pci_base >> 28;
>
>   #if 0 /* We should really do that ... */
>   	rc = opal_pci_set_phb_mem_window(opal->phb_id,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index addd3f7..574fe43 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -61,11 +61,10 @@ struct pnv_ioda_pe {
>   	/* "Weight" assigned to the PE for the sake of DMA resource
>   	 * allocations
>   	 */
> -	unsigned int		dma_weight;
> +	unsigned int		dma32_weight;
>
>   	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
> -	int			tce32_seg;
> -	int			tce32_segcount;
> +	int			dma32_seg;
>   	struct iommu_table_group table_group;
>
>   	/* 64-bit TCE bypass region */
> @@ -181,7 +180,7 @@ struct pnv_phb {
>   			unsigned char		pe_rmap[0x10000];
>
>   			/* 32-bit TCE tables allocation */
> -			unsigned long		tce32_count;
> +			unsigned long		dma32_segcount;
>
>   			/* Sorted list of used PE's, sorted at
>   			 * boot for resource allocation purposes
>


-- 
Alexey


* Re: [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically
  2015-08-06  4:11 ` [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically Gavin Shan
  2015-08-10  7:48   ` Alexey Kardashevskiy
@ 2015-08-10  9:21   ` Alexey Kardashevskiy
  2015-08-12 23:57     ` Gavin Shan
  1 sibling, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  9:21 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> For P7IOC, the whole available DMA32 space, which is below the
> MEM32 space, is divided evenly into 256MB segments. The number
> of continuous segments assigned to one particular PE depends on
> the PE's DMA weight that is calculated based on the type of each
> PCI devices contained in the PE, and PHB's DMA weight which is
> accumulative DMA weight of PEs contained in the PHB. It means
> that the PHB's DMA weight calculation depends on existing PEs,
> which works perfectly now, but not hotplug friendly. As the
> whole available DMA32 space can be assigned to one PE on PHB3,
> so we don't have the issue on PHB3.
>
> The patch calculates PHB's DMA weight based on the PCI devices
> contained in the PHB dynamically so that it's hotplug friendly.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++++++++++++++----------------
>   arch/powerpc/platforms/powernv/pci.h      |  6 ---
>   2 files changed, 43 insertions(+), 51 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 713f4b4..7342cfd 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -927,6 +927,9 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>
>   static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>   {
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +
>   	/* This is quite simplistic. The "base" weight of a device
>   	 * is 10. 0 means no DMA is to be accounted for it.
>   	 */
> @@ -939,14 +942,34 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>   	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>   	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		return 3;
> +		return 3 * phb->ioda.tce32_count;
>
>   	/* Increase the weight of RAID (includes Obsidian) */
>   	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
> -		return 15;
> +		return 15 * phb->ioda.tce32_count;
>
>   	/* Default */
> -	return 10;
> +	return 10 * phb->ioda.tce32_count;
> +}
> +
> +static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
> +{
> +	unsigned int *dma_weight = data;
> +
> +	*dma_weight += pnv_ioda_dma_weight(pdev);
> +	return 0;
> +}
> +
> +static unsigned int pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
> +{
> +	unsigned int dma_weight = 0;
> +
> +	if (!phb->hose->bus)
> +		return 0;
> +
> +	pci_walk_bus(phb->hose->bus,
> +		     __pnv_ioda_phb_dma_weight, &dma_weight);
> +	return dma_weight;
>   }
>
>   #ifdef CONFIG_PCI_IOV
> @@ -1097,14 +1120,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	/* Put PE to the list */
>   	list_add_tail(&pe->list, &phb->ioda.pe_list);
>
> -	/* Account for one DMA PE if at least one DMA capable device exist
> -	 * below the bridge
> -	 */
> -	if (pe->dma_weight != 0) {
> -		phb->ioda.dma_weight += pe->dma_weight;
> -		phb->ioda.dma_pe_count++;
> -	}
> -
>   	/* Link the PE */
>   	pnv_ioda_link_pe_by_weight(phb, pe);
>   }
> @@ -2431,24 +2446,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int residual, remaining, segs, tw, base;
>   	struct pnv_ioda_pe *pe;
> +	unsigned int dma_weight;
>
> -	/* If we have more PE# than segments available, hand out one
> -	 * per PE until we run out and let the rest fail. If not,
> -	 * then we assign at least one segment per PE, plus more based
> -	 * on the amount of devices under that PE
> -	 */
> -	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
> -		residual = 0;
> -	else
> -		residual = phb->ioda.tce32_count -
> -			phb->ioda.dma_pe_count;
> -
> -	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
> -		hose->global_number, phb->ioda.tce32_count);
> -	pr_info("PCI: %d PE# for a total weight of %d\n",
> -		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
> +	/* Calculate the PHB's DMA weight */
> +	dma_weight = pnv_ioda_phb_dma_weight(phb);
> +	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
> +		hose->global_number, phb->ioda.tce32_count, dma_weight);
>
>   	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> @@ -2456,22 +2460,9 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   	 * out one base segment plus any residual segments based on
>   	 * weight
>   	 */
> -	remaining = phb->ioda.tce32_count;
> -	tw = phb->ioda.dma_weight;
> -	base = 0;
>   	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>   		if (!pe->dma_weight)
>   			continue;
> -		if (!remaining) {
> -			pe_warn(pe, "No DMA32 resources available\n");
> -			continue;
> -		}
> -		segs = 1;
> -		if (residual) {
> -			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
> -			if (segs > remaining)
> -				segs = remaining;
> -		}
>
>   		/*
>   		 * For IODA2 compliant PHB3, we needn't care about the weight.
> @@ -2479,17 +2470,24 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>   		 * the specific PE.
>   		 */
>   		if (phb->type == PNV_PHB_IODA1) {
> -			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
> +			unsigned int segs, base = 0;
> +
> +			if (pe->dma_weight <
> +			    dma_weight / phb->ioda.tce32_count)
> +				segs = 1;
> +			else
> +				segs = (pe->dma_weight *
> +					phb->ioda.tce32_count) / dma_weight;
> +
> +			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>   				pe->dma_weight, segs);
>   			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
> +
> +			base += segs;


This is not right. @base here is a local variable declared in this scope, so
pnv_pci_ioda_setup_dma_pe() will always be called with base == 0.
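Here is a reduced sketch of the scoping problem, using hypothetical names: `record_base()` stands in for pnv_pci_ioda_setup_dma_pe() and only records the base it receives. Declaring `base` inside the loop body resets it to 0 for every PE, so the `base += segs` at the end of each iteration is a dead store. Hoisting `base` out of the loop gives consecutive PEs consecutive segments.

```c
#include <assert.h>

#define NR_PES 4

/* Hypothetical stand-in for pnv_pci_ioda_setup_dma_pe(): records each PE's base. */
static unsigned int seen_base[NR_PES];

static void record_base(int pe, unsigned int base)
{
	assert(pe >= 0 && pe < NR_PES);
	seen_base[pe] = base;
}

/* Shape of the quoted hunk: base is declared inside the loop body. */
static void assign_inner_base(const unsigned int *segs, int n)
{
	for (int i = 0; i < n; i++) {
		unsigned int base = 0;	/* re-initialised on every iteration */

		record_base(i, base);
		base += segs[i];	/* dead store: base goes out of scope */
	}
}

/* base hoisted out of the loop, so it accumulates across PEs. */
static void assign_outer_base(const unsigned int *segs, int n)
{
	unsigned int base = 0;

	for (int i = 0; i < n; i++) {
		record_base(i, base);
		base += segs[i];
	}
}
```

The hoisted form matches what 10/42 ends up with: pnv_pci_ioda_setup_DMA() sets `base = 0` once per PHB before walking `pe_dma_list`.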


Sorry for commenting on the same patch twice.


>   		} else {
>   			pe_info(pe, "Assign DMA32 space\n");
> -			segs = 0;
>   			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>   		}
> -
> -		remaining -= segs;
> -		base += segs;
>   	}
>   }
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 08a4e57..addd3f7 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -183,12 +183,6 @@ struct pnv_phb {
>   			/* 32-bit TCE tables allocation */
>   			unsigned long		tce32_count;
>
> -			/* Total "weight" for the sake of DMA resources
> -			 * allocation
> -			 */
> -			unsigned int		dma_weight;
> -			unsigned int		dma_pe_count;
> -
>   			/* Sorted list of used PE's, sorted at
>   			 * boot for resource allocation purposes
>   			 */
>


-- 
Alexey


* Re: [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  2015-08-06  4:11 ` [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only Gavin Shan
@ 2015-08-10  9:31   ` Alexey Kardashevskiy
  2015-08-11  0:29     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  9:31 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The original implementation of pnv_ioda_setup_dma() iterates the
> list of PEs and configures the DMA32 space for them one by one.
> The function was designed to be called during PHB fixup time.
> When configuring PE's DMA32 space in pcibios_setup_bridge(), in
> order to support PCI hotplug, we have to have the function PE
> oriented.
>
> This renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and
> adds one more argument "struct pnv_ioda_pe *pe" to it. The caller,
> pnv_pci_ioda_setup_DMA(), gets PE from the list and passes to it
> or pnv_pci_ioda2_setup_dma_pe(). The patch shouldn't cause behavioral
> changes.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++++++++++++++----------------
>   1 file changed, 36 insertions(+), 39 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8456f37..cd22002 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>   }
>
> -static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> +static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
> +					struct pnv_ioda_pe *pe,
> +					unsigned int base)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	struct pnv_ioda_pe *pe;
> -	unsigned int dma_weight;
> +	unsigned int dma_weight, segs;
>
>   	/* Calculate the PHB's DMA weight */
>   	dma_weight = pnv_ioda_phb_dma_weight(phb);
>   	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>   		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>
> -	pnv_pci_ioda_setup_opal_tce_kill(phb);
> -
> -	/* Walk our PE list and configure their DMA segments, hand them
> -	 * out one base segment plus any residual segments based on
> -	 * weight
> -	 */
> -	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> -		if (!pe->dma32_weight)
> -			continue;
> -
> -		/*
> -		 * For IODA2 compliant PHB3, we needn't care about the weight.
> -		 * The all available 32-bits DMA space will be assigned to
> -		 * the specific PE.
> -		 */
> -		if (phb->type == PNV_PHB_IODA1) {
> -			unsigned int segs, base = 0;
> -
> -			if (pe->dma32_weight <
> -			    dma_weight / phb->ioda.dma32_segcount)
> -				segs = 1;
> -			else
> -				segs = (pe->dma32_weight *
> -					phb->ioda.dma32_segcount) / dma_weight;
> -
> -			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
> -				pe->dma32_weight, segs);
> -			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
> +	if (pe->dma32_weight <
> +	    dma_weight / phb->ioda.dma32_segcount)

Can be one line now.


> +		segs = 1;
> +	else
> +		segs = (pe->dma32_weight *
> +			phb->ioda.dma32_segcount) / dma_weight;
> +	pe_info(pe, "DMA weight %d, assigned %d segments\n",
> +		pe->dma32_weight, segs);
> +	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);


Why not merge pnv_ioda1_setup_dma() into pnv_pci_ioda_setup_dma_pe()?


>
> -			base += segs;
> -		} else {
> -			pe_info(pe, "Assign DMA32 space\n");
> -			pnv_pci_ioda2_setup_dma_pe(phb, pe);
> -		}
> -	}
> +	return segs;
>   }
>
>   #ifdef CONFIG_PCI_MSI
> @@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void)
>   {
>   	struct pci_controller *hose, *tmp;
>   	struct pnv_phb *phb;
> +	struct pnv_ioda_pe *pe;
> +	unsigned int base;
>
>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		pnv_ioda_setup_dma(hose->private_data);
> +		phb = hose->private_data;
> +		pnv_pci_ioda_setup_opal_tce_kill(phb);
> +
> +		base = 0;
> +		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> +			if (!pe->dma32_weight)
> +				continue;
> +
> +			switch (phb->type) {
> +			case PNV_PHB_IODA1:
> +				base += pnv_ioda1_setup_dma(phb, pe, base);


This @base handling seems never to have been tested between 8/42 and 11/42,
as "[PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE"
removes it, and I suspect you only tested the final version. That is ok for
the final result but not for bisectability.

It looks like 8/42, 9/42, 10/42 and 11/42 need to be rearranged or merged to
remove this repeated touching of @base.


> +				break;
> +			case PNV_PHB_IODA2:
> +				pnv_pci_ioda2_setup_dma_pe(phb, pe);
> +				break;
> +			default:
> +				pr_warn("%s: No DMA for PHB type %d\n",
> +					__func__, phb->type);
> +			}
> +		}
>
>   		/* Mark the PHB initialization done */
> -		phb = hose->private_data;
>   		phb->initialized = 1;
>   	}
>   }
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE
  2015-08-06  4:11 ` [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE Gavin Shan
@ 2015-08-10  9:43   ` Alexey Kardashevskiy
  2015-08-11  0:33     ` Gavin Shan
  2015-08-13  0:02     ` Gavin Shan
  0 siblings, 2 replies; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  9:43 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> On P7IOC, the whole DMA32 space is divided evenly to 256MB segments.
> Each PE can consume one or multiple DMA32 segments. Current code
> doesn't trace the available DMA32 segments and those consumed by
> one particular PE. It's conflicting with PCI hotplug.
>
> The patch introduces one bitmap to PHB to trace the available
> DMA32 segments for allocation, more fields to "struct pnv_ioda_pe"
> to trace the consumed DMA32 segments by the PE, which is going to
> be released when the PE is destroyed at PCI unplugging time.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++++++++++--------
>   arch/powerpc/platforms/powernv/pci.h      |  4 +++-
>   2 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cd22002..57ba8fd 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1946,6 +1946,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>
>   	/* Grab a 32-bit TCE table */
>   	pe->dma32_seg = base;
> +	pe->dma32_segcount = segs;
>   	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>   		(base << 28), ((base + segs) << 28) - 1);
>
> @@ -2006,8 +2007,13 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	return;
>    fail:
>   	/* XXX Failure: Try to fallback to 64-bit only ? */
> -	if (pe->dma32_seg >= 0)
> +	if (pe->dma32_seg >= 0) {
> +		bitmap_clear(phb->ioda.dma32_segmap,
> +			     pe->dma32_seg, pe->dma32_segcount);
>   		pe->dma32_seg = -1;
> +		pe->dma32_segcount = 0;
> +	}
> +
>   	if (tce_mem)
>   		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>   	if (tbl) {
> @@ -2443,12 +2449,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>   }
>
> -static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
> -					struct pnv_ioda_pe *pe,
> -					unsigned int base)
> +static void pnv_ioda1_setup_dma(struct pnv_phb *phb,
> +					struct pnv_ioda_pe *pe)
>   {
>   	struct pci_controller *hose = phb->hose;
> -	unsigned int dma_weight, segs;
> +	unsigned int dma_weight, base, segs;
>
>   	/* Calculate the PHB's DMA weight */
>   	dma_weight = pnv_ioda_phb_dma_weight(phb);
> @@ -2461,11 +2466,28 @@ static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>   	else
>   		segs = (pe->dma32_weight *
>   			phb->ioda.dma32_segcount) / dma_weight;
> +
> +	/*
> +	 * Allocate DMA32 segments. We might not have enough
> +	 * resources available. However we expect at least one
> +	 * to be available.
> +	 */
> +	do {
> +		base = bitmap_find_next_zero_area(phb->ioda.dma32_segmap,
> +						  phb->ioda.dma32_segcount,
> +						  0, segs, 0);
> +		if (base < phb->ioda.dma32_segcount) {
> +			bitmap_set(phb->ioda.dma32_segmap, base, segs);
> +			break;
> +		}
> +	} while (--segs);


If segs==0 before entering the loop, the loop will execute 0xfffffffe 
times. Make it for(;segs;--segs){ }.


> +
> +	if (WARN_ON(!segs))
> +		return;
> +
>   	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>   		pe->dma32_weight, segs);
>   	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
> -
> -	return segs;
>   }
>
>   #ifdef CONFIG_PCI_MSI
> @@ -2933,20 +2955,18 @@ static void pnv_pci_ioda_setup_DMA(void)
>   	struct pci_controller *hose, *tmp;
>   	struct pnv_phb *phb;
>   	struct pnv_ioda_pe *pe;
> -	unsigned int base;
>
>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>   		phb = hose->private_data;
>   		pnv_pci_ioda_setup_opal_tce_kill(phb);
>
> -		base = 0;
>   		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>   			if (!pe->dma32_weight)
>   				continue;
>
>   			switch (phb->type) {
>   			case PNV_PHB_IODA1:
> -				base += pnv_ioda1_setup_dma(phb, pe, base);
> +				pnv_ioda1_setup_dma(phb, pe);
>   				break;
>   			case PNV_PHB_IODA2:
>   				pnv_pci_ioda2_setup_dma_pe(phb, pe);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 574fe43..1dc9578 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -65,6 +65,7 @@ struct pnv_ioda_pe {
>
>   	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>   	int			dma32_seg;
> +	int			dma32_segcount;
>   	struct iommu_table_group table_group;
>
>   	/* 64-bit TCE bypass region */
> @@ -153,10 +154,11 @@ struct pnv_phb {
>   			unsigned int		io_segsize;
>   			unsigned int		io_pci_base;
>
> -			/* IO, M32, M64 segment maps */
> +			/* IO, M32, M64, DMA32 segment maps */
>   			unsigned long		io_segmap[8];
>   			unsigned long		m32_segmap[8];
>   			unsigned long		m64_segmap[8];
> +			unsigned long		dma32_segmap[8];
>
>   			/* PE allocation */
>   			struct mutex		pe_alloc_mutex;
>


-- 
Alexey

* Re: [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
       [not found]     ` <1438834307-26960-13-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-10  9:53       ` Alexey Kardashevskiy
  2015-08-11  0:38         ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10  9:53 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> Each PHB maintains an array helping to translate RID (Request
> ID) to PE# with the assumption that PE# takes 8 bits, indicating
> that we can't have more than 256 PEs. However, pci_dn->pe_number
> already had 4-bytes for the PE#.
>
> The patch extends the PE# capacity so that each of them will be
> 4-bytes long. Then we can use IODA_INVALID_PE to check one entry
> in phb->pe_rmap[] is valid or not.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++--
>   arch/powerpc/platforms/powernv/pci.h      | 7 +++----
>   2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 57ba8fd..3094c61 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>
>   	/* Clear the reverse map */
>   	for (rid = pe->rid; rid < rid_end; rid++)
> -		phb->ioda.pe_rmap[rid] = 0;
> +		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>
>   	/* Release from all parents PELT-V */
>   	while (parent) {
> @@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	unsigned long size, pemap_off;
>   	const __be64 *prop64;
>   	const __be32 *prop32;
> -	int len;
> +	int len, i;
>   	u64 phb_id;
>   	void *aux;
>   	long rc;
> @@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	if (prop32)
>   		phb->ioda.reserved_pe = be32_to_cpup(prop32);
>
> +	/* Invalidate RID to PE# mapping */
> +	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
> +		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
> +
>   	/* Parse 64-bit MMIO range */
>   	pnv_ioda_parse_m64_window(phb);
>
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 1dc9578..6f8568e 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -175,11 +175,10 @@ struct pnv_phb {
>   			struct list_head	pe_list;
>   			struct mutex            pe_list_mutex;
>
> -			/* Reverse map of PEs, will have to extend if
> -			 * we are to support more than 256 PEs, indexed
> -			 * bus { bus, devfn }
> +			/* Reverse map of PEs, indexed by
> +			 * { bus, devfn }
>   			 */
> -			unsigned char		pe_rmap[0x10000];
> +			int			pe_rmap[0x10000];


256KB seems a waste when only a tiny fraction of it will ever be used.
Using include/linux/hashtable.h makes sense here, and if you use a
hashtable, you won't have to initialize anything with IODA_INVALID_PE.


>
>   			/* 32-bit TCE tables allocation */
>   			unsigned long		dma32_segcount;
>


-- 
Alexey

* Re: [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration
  2015-08-06  4:11 ` [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration Gavin Shan
@ 2015-08-10 10:02   ` Alexey Kardashevskiy
  2015-08-11  0:39     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10 10:02 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> Several functions used to configure PE take pe_number to identify
> PE instance. As the pe_number is included in PE instance after it
> is reserved or allocated, it's convenient for those functions to
> return PE instance which includes the required pe_number.

This describes only half of the patch: it also adds return values to
functions that did not have them before, and I am not sure you need all
of them to return something. It would be cleaner to add a return value
when and where you really need it, not just because it may be convenient
later.


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 51 ++++++++++++++++---------------
>   arch/powerpc/platforms/powernv/pci.h      |  2 +-
>   2 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 3094c61..9f53682 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -132,12 +132,12 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>   }
>
> -static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
> +static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   {
>   	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
>   		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>   			__func__, pe_no, phb->hose->global_number);
> -		return;
> +		return NULL;
>   	}
>
>   	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc))
> @@ -146,9 +146,11 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>
>   	phb->ioda.pe_array[pe_no].phb = phb;
>   	phb->ioda.pe_array[pe_no].pe_number = pe_no;
> +
> +	return &phb->ioda.pe_array[pe_no];
>   }
>
> -static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
> +static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   {
>   	unsigned long pe;
>
> @@ -156,12 +158,12 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>   					phb->ioda.total_pe, 0);
>   		if (pe >= phb->ioda.total_pe)
> -			return IODA_INVALID_PE;
> +			return NULL;
>   	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>
>   	phb->ioda.pe_array[pe].phb = phb;
>   	phb->ioda.pe_array[pe].pe_number = pe;
> -	return pe;
> +	return &phb->ioda.pe_array[pe];
>   }
>
>   static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
> @@ -334,7 +336,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>   	}
>   }
>
> -static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> @@ -344,7 +346,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>
>   	/* Root bus shouldn't use M64 */
>   	if (pci_is_root_bus(bus))
> -		return IODA_INVALID_PE;
> +		return NULL;
>
>   	/* Allocate bitmap */
>   	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
> @@ -352,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	if (!pe_alloc) {
>   		pr_warn("%s: Out of memory !\n",
>   			__func__);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/* Figure out reserved PE numbers by the PE */
> @@ -365,7 +367,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	 */
>   	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
>   		kfree(pe_alloc);
> -		return IODA_INVALID_PE;
> +		return NULL;
>   	}
>
>   	/*
> @@ -416,7 +418,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>   	}
>
>   	kfree(pe_alloc);
> -	return master_pe->pe_number;
> +	return master_pe;
>   }
>
>   static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
> @@ -1069,28 +1071,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>    * subordinate PCI devices and buses. The second type of PE is normally
>    * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>    */
> -static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
> +static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
> -	struct pnv_ioda_pe *pe;
> -	int pe_num = IODA_INVALID_PE;
> +	struct pnv_ioda_pe *pe = NULL;
>
>   	/* Check if PE is determined by M64 */
>   	if (phb->pick_m64_pe)
> -		pe_num = phb->pick_m64_pe(bus, all);
> +		pe = phb->pick_m64_pe(bus, all);
>
>   	/* The PE number isn't pinned by M64 */
> -	if (pe_num == IODA_INVALID_PE)
> -		pe_num = pnv_ioda_alloc_pe(phb);
> +	if (!pe)
> +		pe = pnv_ioda_alloc_pe(phb);
>
> -	if (pe_num == IODA_INVALID_PE) {
> -		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
> +	if (!pe) {
> +		pr_warning("%s: No enough PE# for PCI bus %04x:%02x\n",
>   			__func__, pci_domain_nr(bus), bus->number);
> -		return;
> +		return NULL;
>   	}
>
> -	pe = &phb->ioda.pe_array[pe_num];
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> @@ -1101,17 +1101,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	if (all)
>   		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
> -			bus->busn_res.start, bus->busn_res.end, pe_num);
> +			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>   	else
>   		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
> -			bus->busn_res.start, pe_num);
> +			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
>   		/* XXX What do we do here ? */
> -		if (pe_num)
> -			pnv_ioda_free_pe(phb, pe_num);
> +		pnv_ioda_free_pe(phb, pe->pe_number);
>   		pe->pbus = NULL;
> -		return;
> +		return NULL;
>   	}
>
>   	/* Associate it with all child devices */
> @@ -1122,6 +1121,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>
>   	/* Link the PE */
>   	pnv_ioda_link_pe_by_weight(phb, pe);
> +
> +	return pe;
>   }
>
>   static void pnv_ioda_setup_PEs(struct pci_bus *bus)
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 6f8568e..c0bc57f 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -121,7 +121,7 @@ struct pnv_phb {
>   	int (*init_m64)(struct pnv_phb *phb);
>   	void (*reserve_m64_pe)(struct pci_bus *bus,
>   			       unsigned long *pe_bitmap, bool all);
> -	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
> +	struct pnv_ioda_pe* (*pick_m64_pe)(struct pci_bus *bus, bool all);
>   	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>   	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>   	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>


-- 
Alexey

* Re: [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB
  2015-08-06  4:11   ` [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB Gavin Shan
@ 2015-08-10 14:21     ` Alexey Kardashevskiy
  2015-08-11  0:40       ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10 14:21 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> This renames the fields related to PE# in "struct pnv_phb" for
> better reflecting of their usages as Alexey suggested. It doesn't
> introduce behavioural changes.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


It makes sense to move this to the beginning of the patchset, as the
patches before it change the same lines this patch changes.


-- 
Alexey

* Re: [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order
  2015-08-06  4:11 ` [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order Gavin Shan
@ 2015-08-10 14:39   ` Alexey Kardashevskiy
  2015-08-11  0:43     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-10 14:39 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The available PE#, represented by a bitmap in the PHB, is allocated
> in ascending order.

Available PE# is available exactly because it is not allocated ;)

> It conflicts with the fact that M64 segments are
> assigned in same order. In order to avoid the conflict, the patch
> allocates PE# in descending order.

What kind of conflict?


>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 56b058c..1c950e8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>   static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   {
>   	unsigned long pe;
> +	unsigned long limit = phb->ioda.total_pe_num - 1;
>
>   	do {
>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
> -					phb->ioda.total_pe_num, 0);
> -		if (pe >= phb->ioda.total_pe_num)
> +					phb->ioda.total_pe_num, limit);
> +		if (pe < phb->ioda.total_pe_num &&
> +		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
> +			break;
> +
> +		if (--limit >= phb->ioda.total_pe_num)
>   			return NULL;
> -	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
> +	} while (1);


Usually, if it is "while(1)", then it is "while(1){}" rather than 
"do{}while(1)" :)


>
>   	return pnv_ioda_init_pe(phb, pe);
>   }
>


-- 
Alexey

* Re: [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree()
  2015-08-06  4:11 ` [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree() Gavin Shan
@ 2015-08-10 22:42   ` Frank Rowand
  2015-08-11  0:52     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Frank Rowand @ 2015-08-10 22:42 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, aik

On 8/5/2015 9:11 PM, Gavin Shan wrote:
> This introduces one more argument to of_fdt_unflatten_tree()
> to specify the root node for the FDT blob, which is going to be
> unflattened. In the result, the function can be used to unflatten
> FDT blob, which represents device sub-tree in PowerNV hotplug
> driver.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c       | 13 ++++++++-----
>  drivers/of/unittest.c  |  2 +-
>  include/linux/of_fdt.h |  1 +
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index a18a2ce..074870a 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -388,10 +388,11 @@ static void *unflatten_dt_node(const void *blob,
>  			       struct device_node **nodepp,
>  			       bool dryrun)
>  {
> +	unsigned long fpsize = dad ? strlen(of_node_full_name(dad)) : 0;
>  	int depth = 1;
>  
>  	return __unflatten_dt_node(blob, mem, poffset,
> -				   dad, nodepp, 0,
> +				   dad, nodepp, fpsize,
>  				   dryrun, &depth);
>  }
>  
> @@ -408,6 +409,7 @@ static void *unflatten_dt_node(const void *blob,
>   * for the resulting tree
>   */
>  static void __unflatten_device_tree(const void *blob,
> +			     struct device_node *dad,
>  			     struct device_node **mynodes,
>  			     void * (*dt_alloc)(u64 size, u64 align))
>  {

Please add @dad to the documentation header for the function.


> @@ -435,7 +437,7 @@ static void __unflatten_device_tree(const void *blob,
>  	/* First pass, scan for size */
>  	start = 0;
>  	size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
> -						NULL, NULL, true);
> +						dad, NULL, true);
>  	size = ALIGN(size, 4);
>  
>  	pr_debug("  size is %lx, allocating...\n", size);
> @@ -450,7 +452,7 @@ static void __unflatten_device_tree(const void *blob,
>  
>  	/* Second pass, do actual unflattening */
>  	start = 0;
> -	unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
> +	unflatten_dt_node(blob, mem, &start, dad, mynodes, false);
>  	if (be32_to_cpup(mem + size) != 0xdeadbeef)
>  		pr_warning("End of tree marker overwritten: %08x\n",
>  			   be32_to_cpup(mem + size));
> @@ -472,9 +474,10 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>   * can be used.
>   */
>  void of_fdt_unflatten_tree(const unsigned long *blob,
> +			struct device_node *dad,
>  			struct device_node **mynodes)
>  {
> -	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
> +	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>  }
>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>  
> @@ -1125,7 +1128,7 @@ bool __init early_init_dt_scan(void *params)
>   */
>  void __init unflatten_device_tree(void)
>  {
> -	__unflatten_device_tree(initial_boot_params, &of_root,
> +	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
>  				early_init_dt_alloc_memory_arch);
>  
>  	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 1801634..2270830 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -907,7 +907,7 @@ static int __init unittest_data_add(void)
>  			"not running tests\n", __func__);
>  		return -ENOMEM;
>  	}
> -	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
> +	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
>  	if (!unittest_data_node) {
>  		pr_warn("%s: No tree to attach; not running tests\n", __func__);
>  		return -ENODATA;
> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
> index df9ef38..3644960 100644
> --- a/include/linux/of_fdt.h
> +++ b/include/linux/of_fdt.h
> @@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>  extern int of_fdt_match(const void *blob, unsigned long node,
>  			const char *const *compat);
>  extern void of_fdt_unflatten_tree(const unsigned long *blob,
> +			       struct device_node *dad,
>  			       struct device_node **mynodes);
>  
>  /* TBD: Temporary export of fdt globals - remove when code fully merged */
> 

* Re: [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
  2015-08-06  4:11 ` [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree() Gavin Shan
       [not found]   ` <1438834307-26960-41-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2015-08-10 22:42   ` Frank Rowand
  2015-08-11  0:52     ` Gavin Shan
  1 sibling, 1 reply; 102+ messages in thread
From: Frank Rowand @ 2015-08-10 22:42 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, aik

On 8/5/2015 9:11 PM, Gavin Shan wrote:
> This changes of_fdt_unflatten_tree() so that it returns the allocated
> memory chunk for unflattened device-tree, which can be released once
> it's obsoleted.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c       | 11 ++++++-----
>  include/linux/of_fdt.h |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 074870a..8e1ba7e 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob,
>   * @dt_alloc: An allocator that provides a virtual address to memory
>   * for the resulting tree
>   */
> -static void __unflatten_device_tree(const void *blob,
> +static void *__unflatten_device_tree(const void *blob,
>  			     struct device_node *dad,
>  			     struct device_node **mynodes,
>  			     void * (*dt_alloc)(u64 size, u64 align))

Please add a description of the return value to the documentation header.


> @@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob,
>  
>  	if (!blob) {
>  		pr_debug("No device tree pointer\n");
> -		return;
> +		return NULL;
>  	}
>  
>  	pr_debug("Unflattening device tree:\n");
> @@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob,
>  
>  	if (fdt_check_header(blob)) {
>  		pr_err("Invalid device tree blob header\n");
> -		return;
> +		return NULL;
>  	}
>  
>  	/* First pass, scan for size */
> @@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob,
>  			   be32_to_cpup(mem + size));
>  
>  	pr_debug(" <- unflatten_device_tree()\n");
> +	return mem;
>  }
>  
>  static void *kernel_tree_alloc(u64 size, u64 align)
> @@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>   * pointers of the nodes so the normal device-tree walking functions
>   * can be used.
>   */
> -void of_fdt_unflatten_tree(const unsigned long *blob,
> +void *of_fdt_unflatten_tree(const unsigned long *blob,

Please add a description of the return value to the documentation header.
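For illustration only, a kernel-doc header covering both review requests (the new @dad argument and the return value) might read roughly as follows. The wording is an assumption based on the patch descriptions; the unchanged summary and description lines are elided with "..." rather than guessed.

```c
/**
 * of_fdt_unflatten_tree - ... (existing summary unchanged)
 * @blob: ... (unchanged)
 * @dad: Parent node under which the unflattened nodes are created,
 *       or NULL to unflatten a standalone tree
 * @mynodes: ... (unchanged)
 *
 * ... (existing description unchanged)
 *
 * Return: pointer to the memory chunk allocated for the unflattened
 * tree, which the caller may release once the tree is obsolete, or
 * NULL if @blob is NULL or fails the FDT header check.
 */
void *of_fdt_unflatten_tree(const unsigned long *blob,
			    struct device_node *dad,
			    struct device_node **mynodes);
```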


>  			struct device_node *dad,
>  			struct device_node **mynodes)
>  {
> -	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
> +	return __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>  }
>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>  
> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
> index 3644960..00db279 100644
> --- a/include/linux/of_fdt.h
> +++ b/include/linux/of_fdt.h
> @@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>  				 unsigned long node);
>  extern int of_fdt_match(const void *blob, unsigned long node,
>  			const char *const *compat);
> -extern void of_fdt_unflatten_tree(const unsigned long *blob,
> +extern void *of_fdt_unflatten_tree(const unsigned long *blob,
>  			       struct device_node *dad,
>  			       struct device_node **mynodes);
>  
> 

* Re: [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
  2015-08-10  6:30   ` Alexey Kardashevskiy
@ 2015-08-10 23:45     ` Gavin Shan
  2015-08-11  2:06       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-10 23:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 04:30:09PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The patch enables M64 window on P7IOC, which has been enabled on
>>PHB3. Different from PHB3 where 16 M64 BARs are supported and each
>>of them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, each P7IOC PHB has 16 M64 BARs and each
>>of them are divided into 8 segments.
>
>Is this a limitation of the POWER7 chip or is it from IODA1?
>

From IODA1.

>>So each P7IOC PHB can support
>>128 M64 segments only. Also, P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have similar logic to support M64 for PHB3
>>and P7IOC, we just provide 128 M64 (16 BARs) segments and fixed
>>mapping between PE# and M64 segment# on P7IOC. In turn, we just
>>need different phb->init_m64() hooks for P7IOC and PHB3 to support
>>M64.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++++++++++++++++++++++++++----
>>  1 file changed, 104 insertions(+), 12 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 38b5405..e4ac703 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>  	clear_bit(pe, phb->ioda.pe_alloc);
>>  }
>>
>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>+{
>>+	struct resource *r;
>>+	int seg;
>>+
>>+	/* There are as many M64 segments as the maximum number
>>+	 * of PEs, which is 128.
>>+	 */
>>+	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
>
>
>This "8" is used a lot across the patch, please make it a macro
>(PNV_PHB_P7IOC_SEGNUM or PNV_PHB_IODA1_SEGNUM or whatever you think it is)
>with a short comment why it is "8". Or a pnv_phb member.
>

I would prefer to keep the literal "8". With a macro, you have to look
up its definition to find the real value. However, it does make sense
to add a comment explaining why it's 8 here.

>
>>+		unsigned long base;
>>+		int64_t rc;
>>+
>>+		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>+						 OPAL_M64_WINDOW_TYPE,
>>+						 seg / 8,
>>+						 base,
>>+						 0, /* unused */
>>+						 8 * phb->ioda.m64_segsize);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, seg / 8);
>>+			goto fail;
>>+		}
>>+
>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>+					      OPAL_M64_WINDOW_TYPE,
>>+					      seg / 8,
>>+					      OPAL_ENABLE_M64_SPLIT);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>+				rc, phb->hose->global_number, seg / 8);
>>+			goto fail;
>>+		}
>>+	}
>>+
>>+	/* Strip off the segment used by the reserved PE, which
>
>What is this reserved PE on P7IOC? "Strip off" means "exclude" here?
>

It is PE#127, which is exported by skiboot. Yes, "strip off" means "exclude".

>
>>+	 * is expected to be 0 or last supported PE#. The PHB's
>>+	 * first memory window traces the 32-bits MMIO range
>
>s/traces/filters/ ? Or I did not understand this comment...
>

It seems you didn't understand it: there are two memory windows
in every PHB. The first one tracks the M32 resource and the
second one tracks the M64 resource.

>
>>+	 * while the second one traces the 64-bits prefetchable
>>+	 * MMIO range that the PHB supports.
>
>32/64 ranges comment seems irrelevant here.
>

Maybe it's not so relevant, but it still helps: we're stripping
the M64 segment from the second resource (as above), not the first one.
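A minimal sketch of that trimming as plain user-space C. struct win and strip_reserved_m64_seg() are hypothetical names; the window is treated as an inclusive [start, end] range like struct resource, and, as in the patch, only a reserved PE at either end of the PE# range can be excluded.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct win {
	uint64_t start;
	uint64_t end;
};

/*
 * Exclude the M64 segment owned by the reserved PE from the PHB's
 * second (M64) memory window. PE#0 owns the first segment and the
 * last PE# owns the last one, so only those two cases can be trimmed.
 * Returns 0 on success, -1 if the reserved PE sits in the middle.
 */
static int strip_reserved_m64_seg(struct win *m64, uint64_t segsize,
				  unsigned int reserved_pe,
				  unsigned int total_pe)
{
	if (reserved_pe == 0)
		m64->start += segsize;
	else if (reserved_pe == total_pe - 1)
		m64->end -= segsize;
	else
		return -1;
	return 0;
}
```

With the reserved PE#127 exported by skiboot on a 128-PE PHB, the window simply loses its last segment.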

>
>>+	 */
>>+	r = &phb->hose->mem_resources[1];
>>+	if (phb->ioda.reserved_pe == 0)
>>+		r->start += phb->ioda.m64_segsize;
>>+	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
>>+		r->end -= phb->ioda.m64_segsize;
>>+	else
>>+		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>>+			phb->ioda.reserved_pe);
>>+
>>+	return 0;
>>+
>>+fail:
>>+	for ( ; seg >= 0; seg -= 8)
>>+		opal_pci_phb_mmio_enable(phb->opal_id,
>>+					 OPAL_M64_WINDOW_TYPE,
>>+					 seg / 8,
>>+					 OPAL_DISABLE_M64);
>>+
>>+	return -EIO;
>>+}
>>+
>>  /* The default M64 BAR is shared by all PEs */
>>  static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>>  {
>>@@ -256,9 +319,9 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
>>  	}
>>  }
>>
>>-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>-				     unsigned long *pe_bitmap,
>>-				     bool all)
>>+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>+				    unsigned long *pe_bitmap,
>>+				    bool all)
>>  {
>>  	struct pci_dev *pdev;
>>
>>@@ -266,12 +329,12 @@ static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>  		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
>>
>>  		if (all && pdev->subordinate)
>>-			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
>>-						 pe_bitmap, all);
>>+			pnv_ioda_reserve_m64_pe(pdev->subordinate,
>>+						pe_bitmap, all);
>>  	}
>>  }
>>
>>-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  {
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>@@ -293,7 +356,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	}
>>
>>  	/* Figure out reserved PE numbers by the PE */
>>-	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
>>+	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
>>
>>  	/*
>>  	 * the current bus might not own M64 window and that's all
>>@@ -324,6 +387,26 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>  			pe->master = master_pe;
>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>  		}
>>+
>>+		/* P7IOC supports M64DT, which helps mapping M64 segment
>>+		 * to one particular PE#. However, PHB3 has fixed mapping
>>+		 * between M64 segment and PE#. In order to have same logic
>>+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>+		 * segment and PE# on P7IOC.
>>+		 */
>>+		if (phb->type == PNV_PHB_IODA1) {
>>+			int64_t rc;
>>+
>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+							 pe->pe_number,
>>+							 OPAL_M64_WINDOW_TYPE,
>>+							 pe->pe_number / 8,
>>+							 pe->pe_number % 8);
>>+			if (rc != OPAL_SUCCESS)
>>+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>+					__func__, rc, phb->hose->global_number,
>>+					pe->pe_number);
>>+		}
>>  	}
>>
>>  	kfree(pe_alloc);
>>@@ -338,8 +421,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>  	const u32 *r;
>>  	u64 pci_addr;
>>
>>-	/* FIXME: Support M64 for P7IOC */
>>-	if (phb->type != PNV_PHB_IODA2) {
>>+	if (phb->type != PNV_PHB_IODA1 &&
>>+	    phb->type != PNV_PHB_IODA2) {
>>  		pr_info("  Not support M64 window\n");
>>  		return;
>
>
>You are adding P7IOC support so at least "fixme" should go. Also,
>pnv_ioda_parse_m64_window() is only called from pnv_pci_init_ioda_phb() which
>is called only with PNV_PHB_IODA1 and PNV_PHB_IODA2 (no other value is passed
>there as a type) so the check above will never succeed, just remove it.
>

The "fixme" is removed, isn't it?

As I explained last time, there will be another new PHB type and this
function will be called for that type as well. That code already exists
but isn't upstream yet, so it's reasonable to keep the check instead of
removing it.

>>  	}
>>@@ -372,9 +455,18 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>
>>  	/* Use last M64 BAR to cover M64 window */
>>  	phb->ioda.m64_bar_idx = 15;
>>-	phb->init_m64 = pnv_ioda2_init_m64;
>>-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
>>-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
>>+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>+	switch (phb->type) {
>>+	case PNV_PHB_IODA1:
>>+		phb->init_m64 = pnv_ioda1_init_m64;
>>+		break;
>>+	case PNV_PHB_IODA2:
>>+		phb->init_m64 = pnv_ioda2_init_m64;
>>+		break;
>>+	default:
>>+		pr_debug("   M64 not supported\n");
>>+	}
>>  }
>>
>>  static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
>>

Thanks,
Gavin


* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-10  7:16   ` Alexey Kardashevskiy
@ 2015-08-11  0:03     ` Gavin Shan
  2015-08-11  2:23       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:03 UTC
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>
>The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>all about? Also, there was no m64_segmap, now there is, needs an explanation
>may be.
>

Originally, io_segmap and m32_segmap were dynamically allocated arrays
mapping each segment to a PE number. Now they are fixed-size bitmaps of
512 bits.

The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
why m64_segmap is added.
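A minimal user-space sketch of the per-PE plus per-PHB bookkeeping: a segment consumed by a PE is marked in both maps (like the paired set_bit() calls in the patch), and when the PE is destroyed its bits are handed back to the PHB pool. struct segmap, the seg_* helpers and the release step are hypothetical illustrations, not code from this series; 8 longs give 512 bits on a 64-bit kernel.

```c
#include <limits.h>
#include <stdbool.h>

/* 8 longs as in the patch: 512 bits when unsigned long is 64 bits. */
#define SEGMAP_LONGS 8
#define SEGMAP_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SEGMAP_BITS (SEGMAP_LONGS * SEGMAP_BITS_PER_LONG)

/* Hypothetical wrapper around an "unsigned long xxx_segmap[8]" field. */
struct segmap {
	unsigned long bits[SEGMAP_LONGS];
};

static void seg_set(struct segmap *m, unsigned int seg)
{
	if (seg >= SEGMAP_BITS)
		return; /* out of range: ignore in this sketch */
	m->bits[seg / SEGMAP_BITS_PER_LONG] |= 1UL << (seg % SEGMAP_BITS_PER_LONG);
}

static bool seg_test(const struct segmap *m, unsigned int seg)
{
	if (seg >= SEGMAP_BITS)
		return false;
	return (m->bits[seg / SEGMAP_BITS_PER_LONG] >>
		(seg % SEGMAP_BITS_PER_LONG)) & 1UL;
}

/* A PE consumes 'seg': record it in the PE's map and the PHB-wide map. */
static void seg_claim(struct segmap *pe_map, struct segmap *phb_map,
		      unsigned int seg)
{
	seg_set(pe_map, seg);
	seg_set(phb_map, seg);
}

/* PE destroyed (e.g. on unplug): return all of its segments to the PHB. */
static void seg_release_all(struct segmap *pe_map, struct segmap *phb_map)
{
	for (unsigned int i = 0; i < SEGMAP_LONGS; i++) {
		phb_map->bits[i] &= ~pe_map->bits[i];
		pe_map->bits[i] = 0;
	}
}
```

Keeping a map in the PE is what makes the release step cheap: the PHB map can be cleaned up without scanning every segment for its owner.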

>
>>the consumed by one particular PE, which can be released once the PE
>>is destroyed during PCI unplugging time. Also, we're using fixed
>>quantity of bits to trace the used IO, M32 and M64 segments by PEs
>>in one particular PHB.
>>
>
>Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>is using? Not sure about this master/slave PEs though.
>

I don't follow your suggestion. Can you rephrase and explain it a bit more?

>It would be easier to read patches if this one was right before
>[PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>

I'll try to reorder the patches, but don't expect too much...

>
>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>  arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index e4ac703..78b49a1 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>  		}
>>
>>+		/* M64 segments consumed by slave PEs are tracked
>>+		 * by master PE
>>+		 */
>>+		set_bit(pe->pe_number, master_pe->m64_segmap);
>>+		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>+
>>  		/* P7IOC supports M64DT, which helps mapping M64 segment
>>  		 * to one particular PE#. However, PHB3 has fixed mapping
>>  		 * between M64 segment and PE#. In order to have same logic
>>@@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>
>>  			while (index < phb->ioda.total_pe &&
>>  			       region.start <= region.end) {
>>-				phb->ioda.io_segmap[index] = pe->pe_number;
>>+				set_bit(index, pe->io_segmap);
>>+				set_bit(index, phb->ioda.io_segmap);
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>+					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>+					0, index);
>
>Unrelated change.
>

True, I will drop it. However, checkpatch.pl will then complain about
lines exceeding 80 characters.

>>  				if (rc != OPAL_SUCCESS) {
>>  					pr_err("%s: OPAL error %d when mapping IO "
>>  					       "segment #%d to PE#%d\n",
>>@@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>
>>  			while (index < phb->ioda.total_pe &&
>>  			       region.start <= region.end) {
>>-				phb->ioda.m32_segmap[index] = pe->pe_number;
>>+				set_bit(index, pe->m32_segmap);
>>+				set_bit(index, phb->ioda.m32_segmap);
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>+					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>+					0, index);
>
>Unrelated change.
>

same as above.

>>  				if (rc != OPAL_SUCCESS) {
>>  					pr_err("%s: OPAL error %d when mapping M32 "
>>  					       "segment#%d to PE#%d",
>>@@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  {
>>  	struct pci_controller *hose;
>>  	struct pnv_phb *phb;
>>-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>+	unsigned long size, pemap_off;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>  	int len;
>>@@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>
>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>
>
>This comment came with if(IODA1) below, since you are removing the condition
>below, makes sense to remove the comment as well or move it where people will
>look for it (arch/powerpc/platforms/powernv/pci.h ?)
>

Yes, will do.

>
>>  	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>-	m32map_off = size;
>>-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>-	if (phb->type == PNV_PHB_IODA1) {
>>-		iomap_off = size;
>>-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>-	}
>>  	pemap_off = size;
>>  	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>  	aux = memblock_virt_alloc(size, 0);
>
>
>After adding static arrays to PE and PHB, do you still need this "aux"?
>

"aux" is still needed to tell the boundary of pe_alloc_bitmap and pe_array.

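For reference, a minimal sketch of the "aux" layout that remains after this patch: one allocation holding the PE allocation bitmap (one bit per PE, rounded up to whole longs) followed by the PE array, with pemap_off marking the boundary. struct fake_pe, align_up() and aux_layout() are hypothetical stand-ins for struct pnv_ioda_pe and _ALIGN_UP.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pnv_ioda_pe; only its size matters. */
struct fake_pe {
	unsigned int pe_number;
	unsigned long io_segmap[8], m32_segmap[8], m64_segmap[8];
};

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Size of the single aux allocation: the pe_alloc bitmap first, then
 * the PE array. *pemap_off receives the offset of the PE array.
 */
static size_t aux_layout(unsigned int total_pe, size_t *pemap_off)
{
	size_t size = align_up(total_pe / 8, sizeof(unsigned long));

	*pemap_off = size;
	size += (size_t)total_pe * sizeof(struct fake_pe);
	return size;
}
```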
>
>>  	phb->ioda.pe_alloc = aux;
>>-	phb->ioda.m32_segmap = aux + m32map_off;
>>-	if (phb->type == PNV_PHB_IODA1)
>>-		phb->ioda.io_segmap = aux + iomap_off;
>>  	phb->ioda.pe_array = aux + pemap_off;
>>  	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 62239b1..08a4e57 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>  	/* PE number */
>>  	unsigned int		pe_number;
>>
>>+	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>+	 * have one M64 segment at most, but M64 segments consumed
>>+	 * by slave PEs will be contributed to the master PE. One
>>+	 * PE can own multiple IO and M32 segments.
>
>
>A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>correct for IODA1 or IODA2 or both? Is this a limitation of this
>implementation or it comes from P7IOC/PHB3 hardware?
>

It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
multiple M64 segments as well.

>>+	 */
>>+	unsigned long		io_segmap[8];
>>+	unsigned long		m32_segmap[8];
>>+	unsigned long		m64_segmap[8];
>
>Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>
>Anyway,
>
>#define PNV_IODA_MAX_PE_NUM	512
>
>unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>

I prefer "8" over a macro for 3 reasons:
- The macro won't be used anywhere else in the code.
- The total number of segments for each resource type varies
  between IODA1 and IODA2. I just chose the maximum value with some margin.
- PNV_IODA_MAX_PE_NUM, which would indicate the maximum PE number, isn't
  512 on either IODA1 or IODA2.

>>+
>>  	/* "Weight" assigned to the PE for the sake of DMA resource
>>  	 * allocations
>>  	 */
>>@@ -145,15 +154,16 @@ struct pnv_phb {
>>  			unsigned int		io_segsize;
>>  			unsigned int		io_pci_base;
>>
>>+			/* IO, M32, M64 segment maps */
>>+			unsigned long		io_segmap[8];
>>+			unsigned long		m32_segmap[8];
>>+			unsigned long		m64_segmap[8];
>>+
>>  			/* PE allocation */
>>  			struct mutex		pe_alloc_mutex;
>>  			unsigned long		*pe_alloc;
>>  			struct pnv_ioda_pe	*pe_array;
>>
>>-			/* M32 & IO segment maps */
>>-			unsigned int		*m32_segmap;
>>-			unsigned int		*io_segmap;
>>-
>>  			/* IRQ chip */
>>  			int			irq_chip_init;
>>  			struct irq_chip		irq_chip;
>>

Thanks,
Gavin


* Re: [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
  2015-08-10  7:40     ` Alexey Kardashevskiy
@ 2015-08-11  0:12       ` Gavin Shan
  2015-08-11  2:32         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:12 UTC
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 05:40:08PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>There're 3 windows (IO, M32 and M64) for PHB, root port and upstream
>
>These are actually IO, non-prefetchable and prefetchable windows which happen
>to be IO, 32bit and 64bit windows but this has nothing to do with the M32/M64
>BAR registers in P7IOC/PHB3, do I understand this correctly?
>

In pci-ioda.c, we use the definitions below, which were chosen while
developing the code and don't come from any specification:

IO  - resources with IO property
M32 - 32-bits or non-prefetchable resources
M64 - 64-bits and prefetchable resources
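Those three definitions boil down to a check on the resource flags. A minimal sketch, with hypothetical RES_* flag values standing in for the kernel's IORESOURCE_* bits: M64 requires both the 64-bit and the prefetchable attribute (the same test pnv_pci_is_mem_pref_64() performs), and any other memory resource falls back to M32.

```c
/* Hypothetical flag bits standing in for IORESOURCE_IO/MEM/PREFETCH/MEM_64. */
enum {
	RES_IO       = 1u << 0,
	RES_MEM      = 1u << 1,
	RES_PREFETCH = 1u << 2,
	RES_MEM_64   = 1u << 3,
};

enum seg_kind { SEG_NONE, SEG_IO, SEG_M32, SEG_M64 };

/* "64-bits and prefetchable": both attributes must be present. */
static int is_mem_pref_64(unsigned int flags)
{
	return (flags & (RES_MEM_64 | RES_PREFETCH)) ==
	       (RES_MEM_64 | RES_PREFETCH);
}

/* Map a resource's flags to the segment type it consumes. */
static enum seg_kind classify(unsigned int flags)
{
	if (flags & RES_IO)
		return SEG_IO;
	if (flags & RES_MEM)
		return is_mem_pref_64(flags) ? SEG_M64 : SEG_M32;
	return SEG_NONE;
}
```

So a 64-bit non-prefetchable BAR, or a 32-bit prefetchable one, still lands in M32.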

>>port of the PCIE switch behind root port. In order to support PCI
>>hotplug, we extend the start/end address of those 3 windows of root
>>port or upstream port to the start/end address of the 3 PHB's windows.
>>The current implementation, assigning IO or M32 segment based on the
>>bridge's windows, isn't reliable.
>>
>>The patch fixes above issue by calculating PE's consumed IO or M32
>>segments from its contained devices, no PCI bridge windows involved
>>if the PE doesn't contain all the subordinate PCI buses.
>
>Please, rephrase it. How can PCI bridges be involved in PE consumption?
>

Ok. I will add something like the following:

if the PE, corresponding to the PCI bus, doesn't contain all the subordinate
PCI buses.

>
>>Otherwise,
>>the PCI bridge windows still contribute to PE's consumed IO or M32
>>segments.
>
>PCI bridge windows themselves consume PEs? Is that correct?
>

PCI bridge windows consume IO, M32, M64 segments, not PEs.

>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 136 +++++++++++++++++-------------
>>  1 file changed, 79 insertions(+), 57 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 488a53e..713f4b4 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  }
>>  #endif /* CONFIG_PCI_IOV */
>>
>>-/*
>>- * This function is supposed to be called on basis of PE from top
>>- * to bottom style. So the the I/O or MMIO segment assigned to
>>- * parent PE could be overrided by its child PEs if necessary.
>>- */
>>-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>-				  struct pnv_ioda_pe *pe)
>>+static int pnv_ioda_setup_one_res(struct pci_controller *hose,
>>+				  struct pnv_ioda_pe *pe,
>>+				  struct resource *res)
>>  {
>>  	struct pnv_phb *phb = hose->private_data;
>>  	struct pci_bus_region region;
>>-	struct resource *res;
>>-	int i, index;
>>-	unsigned int segsize;
>>+	unsigned int index, segsize;
>>  	unsigned long *segmap, *pe_segmap;
>>  	uint16_t win;
>>  	int64_t rc;
>>
>>-	/*
>>-	 * NOTE: We only care PCI bus based PE for now. For PCI
>>-	 * device based PE, for example SRIOV sensitive VF should
>>-	 * be figured out later.
>>-	 */
>>-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>+	/* Check if we need map the resource */
>>+	if (!res->parent || !res->flags || res->start > res->end)
>
>res->start >= res->end ?
>

No, res->start == res->end is valid.
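Because struct resource uses an inclusive end, a resource with start == end is one byte long and still occupies one segment. A minimal sketch of counting the segments an inclusive [start, end] region touches under that convention; segs_touched() is a hypothetical helper for illustration, not the loop used in the patch.

```c
#include <stdint.h>

/*
 * For an inclusive region [start, end] with end >= start, return how
 * many segments of size 'segsize' it touches and store the index of
 * the first one in *first. A one-byte region still touches one segment.
 */
static unsigned int segs_touched(uint64_t start, uint64_t end,
				 uint64_t segsize, unsigned int *first)
{
	uint64_t first_seg = start / segsize;
	uint64_t last_seg = end / segsize; /* 'end' is the last valid byte */

	*first = (unsigned int)first_seg;
	return (unsigned int)(last_seg - first_seg + 1);
}
```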

>
>>+		return 0;
>>
>>-	pci_bus_for_each_resource(pe->pbus, res, i) {
>>-		if (!res || !res->flags ||
>>-		    res->start > res->end)
>>-			continue;
>>+	if (res->flags & IORESOURCE_IO) {
>>+		region.start = res->start - phb->ioda.io_pci_base;
>>+		region.end   = res->end - phb->ioda.io_pci_base;
>>+		segsize      = phb->ioda.io_segsize;
>>+		segmap       = phb->ioda.io_segmap;
>>+		pe_segmap    = pe->io_segmap;
>>+		win          = OPAL_IO_WINDOW_TYPE;
>>+	} else if ((res->flags & IORESOURCE_MEM) &&
>>+		   !pnv_pci_is_mem_pref_64(res->flags)) {
>>+		region.start = res->start -
>>+			       hose->mem_offset[0] -
>>+			       phb->ioda.m32_pci_base;
>>+		region.end   = res->end -
>>+			       hose->mem_offset[0] -
>>+			       phb->ioda.m32_pci_base;
>>+		segsize      = phb->ioda.m32_segsize;
>>+		segmap       = phb->ioda.m32_segmap;
>>+		pe_segmap    = pe->m32_segmap;
>>+		win          = OPAL_M32_WINDOW_TYPE;
>>+	} else {
>>+		return 0;
>>+	}
>>
>>-		if (res->flags & IORESOURCE_IO) {
>>-			region.start = res->start - phb->ioda.io_pci_base;
>>-			region.end   = res->end - phb->ioda.io_pci_base;
>>-			segsize      = phb->ioda.io_segsize;
>>-			segmap       = phb->ioda.io_segmap;
>>-			pe_segmap    = pe->io_segmap;
>>-			win          = OPAL_IO_WINDOW_TYPE;
>>-		} else if ((res->flags & IORESOURCE_MEM) &&
>>-			   !pnv_pci_is_mem_pref_64(res->flags)) {
>>-			region.start = res->start -
>>-				       hose->mem_offset[0] -
>>-				       phb->ioda.m32_pci_base;
>>-			region.end   = res->end -
>>-				       hose->mem_offset[0] -
>>-				       phb->ioda.m32_pci_base;
>>-			segsize      = phb->ioda.m32_segsize;
>>-			segmap       = phb->ioda.m32_segmap;
>>-			pe_segmap    = pe->m32_segmap;
>>-			win          = OPAL_M32_WINDOW_TYPE;
>>-		} else {
>>-			continue;
>>+	region.start = _ALIGN_DOWN(region.start, segsize);
>>+	region.end   = _ALIGN_UP(region.end, segsize);
>>+	index = region.start / segsize;
>>+	while (index < phb->ioda.total_pe &&
>>+	       region.start < region.end) {
>>+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+				pe->pe_number, win, 0, index);
>>+		if (rc != OPAL_SUCCESS) {
>>+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>+				__func__, rc, win, index,
>>+				pe->phb->hose->global_number,
>>+				pe->pe_number);
>>+			return -EIO;
>>  		}
>>
>>-		index = region.start / phb->ioda.io_segsize;
>>-		while (index < phb->ioda.total_pe &&
>>-		       region.start <= region.end) {
>>-			set_bit(index, segmap);
>>-			set_bit(index, pe_segmap);
>>-			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>-					pe->pe_number, win, 0, index);
>>-			if (rc != OPAL_SUCCESS) {
>>-				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>-					__func__, rc, win, index,
>>-					pe->phb->hose->global_number,
>>-					pe->pe_number);
>>-				break;
>>-			}
>>+		set_bit(index, segmap);
>>+		set_bit(index, pe_segmap);
>>+		region.start += segsize;
>>+		index++;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>+				  struct pnv_ioda_pe *pe)
>>+{
>>+	struct pci_dev *pdev;
>>+	struct resource *res;
>>+	int i;
>>+
>>+	/* This function only works for bus dependent PE */
>>+	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>+
>>+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
>>+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>+			res = &pdev->resource[i];
>>+			if (pnv_ioda_setup_one_res(hose, pe, res))
>>+				return;
>>+		}
>>+
>>+		/* If the PE contains all subordinate PCI buses, the
>>+		 * resources of the child bridges should be mapped
>>+		 * to the PE as well.
>>+		 */
>>+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
>>+		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
>>+			continue;
>>
>>-			region.start += segsize;
>>-			index++;
>>+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
>>+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
>>+			if (pnv_ioda_setup_one_res(hose, pe, res))
>>+				return;
>>  		}
>>  	}
>>  }
>>

Thanks,
Gavin


* Re: [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup
  2015-08-10  8:07   ` Alexey Kardashevskiy
@ 2015-08-11  0:19     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:19 UTC
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 06:07:27PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The patch cleans up DMA32 in pci-ioda.c. It shouldn't introduce
>>behavioural changes:
>>
>>    * Rename various fields in "struct pnv_phb" and "struct pnv_ioda_pe"
>>      as 32-bits DMA should be related to "DMA", not "TCE".
>
>s/dma_weight/dma32_weight/ is ok (does not add much though) but the rest is
>not. The "tce32_" fields are still TCEs (translation entries) while DMA is a
>process initiated by a device which does not know about how exactly DMA
>addresses are translated later. Since we are on the host side and we actually
>manage TCE tables here, I suggest keeping the "tce32_" prefix for TCE tables
>and memory they use.
>

Ok. Will change accordingly.

>>    * Removed struct pnv_ioda_pe::tce32_segcount.
>
>That's confusing - I had to walk through patches to find out where you
>stopped using it. It would be simpler if you put this particular change to
>
>[PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE()
>
>where you remove dead code.
>

I'll try to reorder the patch...

>
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 48 +++++++++++++++----------------
>>  arch/powerpc/platforms/powernv/pci.h      |  7 ++---
>>  2 files changed, 27 insertions(+), 28 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7342cfd..8456f37 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -917,7 +917,7 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>>  	struct pnv_ioda_pe *lpe;
>>
>>  	list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (lpe->dma_weight < pe->dma_weight) {
>>+		if (lpe->dma32_weight < pe->dma32_weight) {
>>  			list_add_tail(&pe->dma_link, &lpe->dma_link);
>>  			return;
>>  		}
>>@@ -942,14 +942,14 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>  	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>  	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>  	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>-		return 3 * phb->ioda.tce32_count;
>>+		return 3 * phb->ioda.dma32_segcount;
>>
>>  	/* Increase the weight of RAID (includes Obsidian) */
>>  	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>-		return 15 * phb->ioda.tce32_count;
>>+		return 15 * phb->ioda.dma32_segcount;
>>
>>  	/* Default */
>>-	return 10 * phb->ioda.tce32_count;
>>+	return 10 * phb->ioda.dma32_segcount;
>>  }
>>
>>  static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
>>@@ -1057,7 +1057,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  			continue;
>>  		}
>>  		pdn->pe_number = pe->pe_number;
>>-		pe->dma_weight += pnv_ioda_dma_weight(dev);
>>+		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  			pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>  	}
>>@@ -1094,10 +1094,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>-	pe->tce32_seg = -1;
>>+	pe->dma32_seg = -1;
>>  	pe->mve_number = -1;
>>  	pe->rid = bus->busn_res.start << 8;
>>-	pe->dma_weight = 0;
>>+	pe->dma32_weight = 0;
>>
>>  	if (all)
>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>@@ -1460,7 +1460,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  		pe->flags = PNV_IODA_PE_VF;
>>  		pe->pbus = NULL;
>>  		pe->parent_dev = pdev;
>>-		pe->tce32_seg = -1;
>>+		pe->dma32_seg = -1;
>>  		pe->mve_number = -1;
>>  		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>>  			   pci_iov_virtfn_devfn(pdev, vf_index);
>>@@ -1936,7 +1936,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>
>>  	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>+	if (WARN_ON(pe->dma32_seg >= 0))
>>  		return;
>>
>>  	tbl = pnv_pci_table_alloc(phb->hose->node);
>>@@ -1945,7 +1945,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>>
>>  	/* Grab a 32-bit TCE table */
>>-	pe->tce32_seg = base;
>>+	pe->dma32_seg = base;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		(base << 28), ((base + segs) << 28) - 1);
>>
>>@@ -2006,8 +2006,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->tce32_seg >= 0)
>>-		pe->tce32_seg = -1;
>>+	if (pe->dma32_seg >= 0)
>>+		pe->dma32_seg = -1;
>>  	if (tce_mem)
>>  		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>  	if (tbl) {
>>@@ -2405,7 +2405,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  	int64_t rc;
>>
>>  	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->tce32_seg >= 0))
>>+	if (WARN_ON(pe->dma32_seg >= 0))
>>  		return;
>>
>>  	/* TVE #1 is selected by PCI address bit 59 */
>>@@ -2415,7 +2415,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  			pe->pe_number);
>>
>>  	/* The PE will reserve all possible 32-bits space */
>>-	pe->tce32_seg = 0;
>>+	pe->dma32_seg = 0;
>>  	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>>  		phb->ioda.m32_pci_base);
>>
>>@@ -2432,8 +2432,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>
>>  	rc = pnv_pci_ioda2_setup_default_config(pe);
>>  	if (rc) {
>>-		if (pe->tce32_seg >= 0)
>>-			pe->tce32_seg = -1;
>>+		if (pe->dma32_seg >= 0)
>>+			pe->dma32_seg = -1;
>>  		return;
>>  	}
>>
>>@@ -2452,7 +2452,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  	/* Calculate the PHB's DMA weight */
>>  	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>  	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>>-		hose->global_number, phb->ioda.tce32_count, dma_weight);
>>+		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>>
>>  	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>@@ -2461,7 +2461,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  	 * weight
>>  	 */
>>  	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (!pe->dma_weight)
>>+		if (!pe->dma32_weight)
>>  			continue;
>>
>>  		/*
>>@@ -2472,15 +2472,15 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  		if (phb->type == PNV_PHB_IODA1) {
>>  			unsigned int segs, base = 0;
>>
>>-			if (pe->dma_weight <
>>-			    dma_weight / phb->ioda.tce32_count)
>>+			if (pe->dma32_weight <
>>+			    dma_weight / phb->ioda.dma32_segcount)
>>  				segs = 1;
>>  			else
>>-				segs = (pe->dma_weight *
>>-					phb->ioda.tce32_count) / dma_weight;
>>+				segs = (pe->dma32_weight *
>>+					phb->ioda.dma32_segcount) / dma_weight;
>>
>>  			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>>-				pe->dma_weight, segs);
>>+				pe->dma32_weight, segs);
>>  			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>
>>  			base += segs;
>>@@ -3211,7 +3211,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	mutex_init(&phb->ioda.pe_list_mutex);
>>
>>  	/* Calculate how many 32-bit TCE segments we have */
>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>+	phb->ioda.dma32_segcount = phb->ioda.m32_pci_base >> 28;
>>
>>  #if 0 /* We should really do that ... */
>>  	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index addd3f7..574fe43 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -61,11 +61,10 @@ struct pnv_ioda_pe {
>>  	/* "Weight" assigned to the PE for the sake of DMA resource
>>  	 * allocations
>>  	 */
>>-	unsigned int		dma_weight;
>>+	unsigned int		dma32_weight;
>>
>>  	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>>-	int			tce32_seg;
>>-	int			tce32_segcount;
>>+	int			dma32_seg;
>>  	struct iommu_table_group table_group;
>>
>>  	/* 64-bit TCE bypass region */
>>@@ -181,7 +180,7 @@ struct pnv_phb {
>>  			unsigned char		pe_rmap[0x10000];
>>
>>  			/* 32-bit TCE tables allocation */
>>-			unsigned long		tce32_count;
>>+			unsigned long		dma32_segcount;
>>
>>  			/* Sorted list of used PE's, sorted at
>>  			 * boot for resource allocation purposes
>>

Thanks,
Gavin


* Re: [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  2015-08-10  9:31   ` Alexey Kardashevskiy
@ 2015-08-11  0:29     ` Gavin Shan
  2015-08-11  2:39       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:29 UTC
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 07:31:11PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The original implementation of pnv_ioda_setup_dma() iterates the
>>list of PEs and configures the DMA32 space for them one by one.
>>The function was designed to be called during PHB fixup time.
>>When configuring PE's DMA32 space in pcibios_setup_bridge(), in
>>order to support PCI hotplug, we have to have the function PE
>>oriented.
>>
>>This renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and
>>adds one more argument "struct pnv_ioda_pe *pe" to it. The caller,
>>pnv_pci_ioda_setup_DMA(), gets PE from the list and passes to it
>>or pnv_pci_ioda2_setup_dma_pe(). The patch shouldn't cause behavioral
>>changes.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++++++++++++++----------------
>>  1 file changed, 36 insertions(+), 39 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8456f37..cd22002 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>  }
>>
>>-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>+static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>+					struct pnv_ioda_pe *pe,
>>+					unsigned int base)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	struct pnv_ioda_pe *pe;
>>-	unsigned int dma_weight;
>>+	unsigned int dma_weight, segs;
>>
>>  	/* Calculate the PHB's DMA weight */
>>  	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>  	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>>  		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>>
>>-	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>-
>>-	/* Walk our PE list and configure their DMA segments, hand them
>>-	 * out one base segment plus any residual segments based on
>>-	 * weight
>>-	 */
>>-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>-		if (!pe->dma32_weight)
>>-			continue;
>>-
>>-		/*
>>-		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>-		 * The all available 32-bits DMA space will be assigned to
>>-		 * the specific PE.
>>-		 */
>>-		if (phb->type == PNV_PHB_IODA1) {
>>-			unsigned int segs, base = 0;
>>-
>>-			if (pe->dma32_weight <
>>-			    dma_weight / phb->ioda.dma32_segcount)
>>-				segs = 1;
>>-			else
>>-				segs = (pe->dma32_weight *
>>-					phb->ioda.dma32_segcount) / dma_weight;
>>-
>>-			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>>-				pe->dma32_weight, segs);
>>-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>+	if (pe->dma32_weight <
>>+	    dma_weight / phb->ioda.dma32_segcount)
>
>Can be one line now.
>

Indeed.

>>+		segs = 1;
>>+	else
>>+		segs = (pe->dma32_weight *
>>+			phb->ioda.dma32_segcount) / dma_weight;
>>+	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>>+		pe->dma32_weight, segs);
>>+	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>
>
>Why not merge pnv_ioda1_setup_dma() into pnv_pci_ioda_setup_dma_pe()?
>

There are two reasons:
- They're logically separate. One calculates the number of DMA32 segments
  required; the other allocates the TCE32 tables and configures the devices
  with them.
- In the PCI hotplug path, I need pnv_ioda1_setup_dma(), which takes "pe" as
  a parameter.
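The segment-count half mentioned in the first point can be modelled outside the kernel. The sketch below is a hypothetical user-space C function (dma32_segs_for_pe() is an invented name, not a kernel symbol) that reproduces the arithmetic of the quoted pnv_ioda1_setup_dma(); TCE table setup is deliberately left out.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the segment-count calculation in
 * pnv_ioda1_setup_dma(); dma32_segs_for_pe() is not a kernel symbol.
 *   pe_weight    - the PE's DMA32 weight (pe->dma32_weight)
 *   total_weight - the PHB's total weight (pnv_ioda_phb_dma_weight())
 *   segcount     - DMA32 segments on the PHB (phb->ioda.dma32_segcount)
 */
static unsigned int dma32_segs_for_pe(unsigned int pe_weight,
				      unsigned int total_weight,
				      unsigned int segcount)
{
	assert(total_weight > 0 && segcount > 0);

	/* Below the average weight per segment: one segment. */
	if (pe_weight < total_weight / segcount)
		return 1;

	/* Otherwise a share proportional to the weight, rounded down. */
	return (pe_weight * segcount) / total_weight;
}
```

For example, with a total weight of 100 over 8 segments, a PE of weight 1 gets 1 segment and a PE of weight 50 gets 4. As the formula is written, the proportional branch can also round down to 0: a weight of 2 out of a total of 10 over 4 segments gives (2 * 4) / 10 = 0.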

>>
>>-			base += segs;
>>-		} else {
>>-			pe_info(pe, "Assign DMA32 space\n");
>>-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>-		}
>>-	}
>>+	return segs;
>>  }
>>
>>  #ifdef CONFIG_PCI_MSI
>>@@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void)
>>  {
>>  	struct pci_controller *hose, *tmp;
>>  	struct pnv_phb *phb;
>>+	struct pnv_ioda_pe *pe;
>>+	unsigned int base;
>>
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		pnv_ioda_setup_dma(hose->private_data);
>>+		phb = hose->private_data;
>>+		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>+
>>+		base = 0;
>>+		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>+			if (!pe->dma32_weight)
>>+				continue;
>>+
>>+			switch (phb->type) {
>>+			case PNV_PHB_IODA1:
>>+				base += pnv_ioda1_setup_dma(phb, pe, base);
>
>
>This @base handling seems never to have been tested between 8..11, as
>"[PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE"
>removes it, and I suspect you only tested the final version. That is OK for
>the final result but not for bisectability.
>
>Looks like 8/42, 9/42, 10/42, 11/42 need to be rearranged or merged to remove
>this multiple @base touching.
>

Why?

>
>>+				break;
>>+			case PNV_PHB_IODA2:
>>+				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>+				break;
>>+			default:
>>+				pr_warn("%s: No DMA for PHB type %d\n",
>>+					__func__, phb->type);
>>+			}
>>+		}
>>
>>  		/* Mark the PHB initialization done */
>>-		phb = hose->private_data;
>>  		phb->initialized = 1;
>>  	}
>>  }
>>

Thanks,
Gavin

* Re: [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE
  2015-08-10  9:43   ` Alexey Kardashevskiy
@ 2015-08-11  0:33     ` Gavin Shan
  2015-08-13  0:02     ` Gavin Shan
  1 sibling, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 07:43:48PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>On P7IOC, the whole DMA32 space is divided evenly into 256MB segments.
>>Each PE can consume one or multiple DMA32 segments. The current code
>>doesn't track the available DMA32 segments or those consumed by a
>>particular PE, which conflicts with PCI hotplug.
>>
>>The patch introduces a bitmap in the PHB to track the DMA32 segments
>>available for allocation, and more fields in "struct pnv_ioda_pe"
>>to track the DMA32 segments consumed by the PE, which are released
>>when the PE is destroyed at PCI unplug time.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++++++++++--------
>>  arch/powerpc/platforms/powernv/pci.h      |  4 +++-
>>  2 files changed, 33 insertions(+), 11 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index cd22002..57ba8fd 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1946,6 +1946,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>
>>  	/* Grab a 32-bit TCE table */
>>  	pe->dma32_seg = base;
>>+	pe->dma32_segcount = segs;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		(base << 28), ((base + segs) << 28) - 1);
>>
>>@@ -2006,8 +2007,13 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->dma32_seg >= 0)
>>+	if (pe->dma32_seg >= 0) {
>>+		bitmap_clear(phb->ioda.dma32_segmap,
>>+			     pe->dma32_seg, pe->dma32_segcount);
>>  		pe->dma32_seg = -1;
>>+		pe->dma32_segcount = 0;
>>+	}
>>+
>>  	if (tce_mem)
>>  		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>  	if (tbl) {
>>@@ -2443,12 +2449,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>  }
>>
>>-static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>-					struct pnv_ioda_pe *pe,
>>-					unsigned int base)
>>+static void pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>+					struct pnv_ioda_pe *pe)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int dma_weight, segs;
>>+	unsigned int dma_weight, base, segs;
>>
>>  	/* Calculate the PHB's DMA weight */
>>  	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>@@ -2461,11 +2466,28 @@ static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>  	else
>>  		segs = (pe->dma32_weight *
>>  			phb->ioda.dma32_segcount) / dma_weight;
>>+
>>+	/*
>>+	 * Allocate DMA32 segments. We might not have enough
>>+	 * resources available. However we expect at least one
>>+	 * to be available.
>>+	 */
>>+	do {
>>+		base = bitmap_find_next_zero_area(phb->ioda.dma32_segmap,
>>+						  phb->ioda.dma32_segcount,
>>+						  0, segs, 0);
>>+		if (base < phb->ioda.dma32_segcount) {
>>+			bitmap_set(phb->ioda.dma32_segmap, base, segs);
>>+			break;
>>+		}
>>+	} while (--segs);
>
>
>If segs==0 before entering the loop, the loop will execute 0xfffffffe times.
>Make it for(;segs;--segs){ }.
>

segs is always 1 or more. However, a "for()" statement seems better, and
I'll change it.
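A user-space model may make the allocation loop under discussion easier to follow. The sketch below is hypothetical: struct segmap, seg_find_area(), seg_update() and seg_alloc_shrinking() are invented stand-ins for dma32_segmap, bitmap_find_next_zero_area(), bitmap_set()/bitmap_clear() (only a first-fit search is modelled, with no alignment mask), and it uses the for() form Alexey suggested.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for phb->ioda.dma32_segmap and friends. */
#define SEG_MAX		64
#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

struct segmap {
	unsigned long bits[(SEG_MAX + BITS_PER_ULONG - 1) / BITS_PER_ULONG];
	unsigned int count;	/* usable segments, at most SEG_MAX */
};

static bool seg_test(const struct segmap *m, unsigned int i)
{
	return (m->bits[i / BITS_PER_ULONG] >> (i % BITS_PER_ULONG)) & 1UL;
}

/* Set (allocate) or clear (release) segments [start, start + nr). */
static void seg_update(struct segmap *m, unsigned int start,
		       unsigned int nr, bool set)
{
	for (unsigned int i = start; i < start + nr; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_ULONG);

		if (set)
			m->bits[i / BITS_PER_ULONG] |= mask;
		else
			m->bits[i / BITS_PER_ULONG] &= ~mask;
	}
}

/* First fit for @nr free segments; returns m->count when none fits. */
static unsigned int seg_find_area(const struct segmap *m, unsigned int nr)
{
	for (unsigned int start = 0; start + nr <= m->count; start++) {
		unsigned int i = 0;

		while (i < nr && !seg_test(m, start + i))
			i++;
		if (i == nr)
			return start;
	}
	return m->count;
}

/*
 * Grab @segs contiguous segments, shrinking the request until it fits.
 * The for() form skips the body when segs == 0; a do { } while (--segs)
 * loop would run it once with segs == 0 and, without a break, wrap the
 * unsigned counter. Returns the number obtained (0 on failure).
 */
static unsigned int seg_alloc_shrinking(struct segmap *m, unsigned int segs,
					unsigned int *base)
{
	for (; segs; --segs) {
		unsigned int b = seg_find_area(m, segs);

		if (b < m->count) {
			seg_update(m, b, segs, true);
			*base = b;
			break;
		}
	}
	return segs;
}
```

Releasing a PE's segments at unplug time would then be a seg_update(..., false) call over the PE's base and segment count, mirroring the bitmap_clear() in the quoted fail: path.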

>
>>+
>>+	if (WARN_ON(!segs))
>>+		return;
>>+
>>  	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>>  		pe->dma32_weight, segs);
>>  	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>-
>>-	return segs;
>>  }
>>
>>  #ifdef CONFIG_PCI_MSI
>>@@ -2933,20 +2955,18 @@ static void pnv_pci_ioda_setup_DMA(void)
>>  	struct pci_controller *hose, *tmp;
>>  	struct pnv_phb *phb;
>>  	struct pnv_ioda_pe *pe;
>>-	unsigned int base;
>>
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>  		phb = hose->private_data;
>>  		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>-		base = 0;
>>  		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>  			if (!pe->dma32_weight)
>>  				continue;
>>
>>  			switch (phb->type) {
>>  			case PNV_PHB_IODA1:
>>-				base += pnv_ioda1_setup_dma(phb, pe, base);
>>+				pnv_ioda1_setup_dma(phb, pe);
>>  				break;
>>  			case PNV_PHB_IODA2:
>>  				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 574fe43..1dc9578 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -65,6 +65,7 @@ struct pnv_ioda_pe {
>>
>>  	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>>  	int			dma32_seg;
>>+	int			dma32_segcount;
>>  	struct iommu_table_group table_group;
>>
>>  	/* 64-bit TCE bypass region */
>>@@ -153,10 +154,11 @@ struct pnv_phb {
>>  			unsigned int		io_segsize;
>>  			unsigned int		io_pci_base;
>>
>>-			/* IO, M32, M64 segment maps */
>>+			/* IO, M32, M64, DMA32 segment maps */
>>  			unsigned long		io_segmap[8];
>>  			unsigned long		m32_segmap[8];
>>  			unsigned long		m64_segmap[8];
>>+			unsigned long		dma32_segmap[8];
>>
>>  			/* PE allocation */
>>  			struct mutex		pe_alloc_mutex;
>>

Thanks,
Gavin

* Re: [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
  2015-08-10  9:53       ` Alexey Kardashevskiy
@ 2015-08-11  0:38         ` Gavin Shan
  2015-08-11  2:47           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 07:53:02PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>Each PHB maintains an array that translates RID (Request ID) to
>>PE#, with the assumption that a PE# takes 8 bits, meaning that
>>we can't have more than 256 PEs. However, pci_dn->pe_number
>>already uses 4 bytes for the PE#.
>>
>>The patch extends the PE# capacity so that each entry is 4 bytes
>>long. Then we can use IODA_INVALID_PE to check whether an entry
>>in phb->pe_rmap[] is valid.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++--
>>  arch/powerpc/platforms/powernv/pci.h      | 7 +++----
>>  2 files changed, 9 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 57ba8fd..3094c61 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>
>>  	/* Clear the reverse map */
>>  	for (rid = pe->rid; rid < rid_end; rid++)
>>-		phb->ioda.pe_rmap[rid] = 0;
>>+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>
>>  	/* Release from all parents PELT-V */
>>  	while (parent) {
>>@@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	unsigned long size, pemap_off;
>>  	const __be64 *prop64;
>>  	const __be32 *prop32;
>>-	int len;
>>+	int len, i;
>>  	u64 phb_id;
>>  	void *aux;
>>  	long rc;
>>@@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (prop32)
>>  		phb->ioda.reserved_pe = be32_to_cpup(prop32);
>>
>>+	/* Invalidate RID to PE# mapping */
>>+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>>+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>>+
>>  	/* Parse 64-bit MMIO range */
>>  	pnv_ioda_parse_m64_window(phb);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 1dc9578..6f8568e 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -175,11 +175,10 @@ struct pnv_phb {
>>  			struct list_head	pe_list;
>>  			struct mutex            pe_list_mutex;
>>
>>-			/* Reverse map of PEs, will have to extend if
>>-			 * we are to support more than 256 PEs, indexed
>>-			 * bus { bus, devfn }
>>+			/* Reverse map of PEs, indexed by
>>+			 * { bus, devfn }
>>  			 */
>>-			unsigned char		pe_rmap[0x10000];
>>+			int			pe_rmap[0x10000];
>
>
>256k seems to be a waste when only a tiny fraction of it will ever be used.
>Using include/linux/hashtable.h makes sense here, and if you use a hashtable,
>you won't have to initialize anything with IODA_INVALID_PE.
>

I'm not sure I completely follow your idea. If a hash table tracks the
RID mapping here, won't it need more memory when all PCI bus numbers
(0 to 255) are valid? In that case the hash table has no advantage in
memory consumption. On the other hand, a lookup in a hash table bucket
has to iterate the list of colliding keys, which is slower than what
we have. Actually, I like the existing idea, implemented by Ben, of
using an array to map RID to PE#.
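The array-versus-hash-table trade-off comes down to simple arithmetic. The sketch below is a hypothetical user-space model of a flat RID-indexed reverse map like the one in the patch; pe_rmap, make_rid(), rmap_set(), rmap_lookup() and RMAP_INVALID_PE are invented here, and the value -1 for the invalid marker is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of phb->ioda.pe_rmap[]: one entry per RID, where
 * RID = (bus << 8) | devfn, giving 256 * 256 = 0x10000 entries.
 */
#define RID_COUNT	0x10000
#define RMAP_INVALID_PE	(-1)	/* assumed stand-in for IODA_INVALID_PE */

static int pe_rmap[RID_COUNT];

static unsigned int make_rid(uint8_t bus, uint8_t devfn)
{
	return ((unsigned int)bus << 8) | devfn;
}

/* Every entry must be invalidated explicitly, since 0 is a valid PE#. */
static void rmap_init(void)
{
	for (unsigned int i = 0; i < RID_COUNT; i++)
		pe_rmap[i] = RMAP_INVALID_PE;
}

static void rmap_set(uint8_t bus, uint8_t devfn, int pe)
{
	pe_rmap[make_rid(bus, devfn)] = pe;
}

/* O(1) lookup: a single array index, no bucket chain to walk. */
static int rmap_lookup(uint8_t bus, uint8_t devfn)
{
	return pe_rmap[make_rid(bus, devfn)];
}
```

The cost is fixed: RID_COUNT * sizeof(int) is 256 KiB with a 4-byte int, versus 64 KiB for the previous unsigned char array, however many buses are actually in use.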

>
>>
>>  			/* 32-bit TCE tables allocation */
>>  			unsigned long		dma32_segcount;
>>

Thanks,
Gavin

* Re: [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration
  2015-08-10 10:02   ` Alexey Kardashevskiy
@ 2015-08-11  0:39     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 08:02:20PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>Several functions used to configure a PE take pe_number to identify
>>the PE instance. As the pe_number is included in the PE instance after
>>it is reserved or allocated, it's convenient for those functions to
>>return the PE instance, which includes the required pe_number.
>
>This is a description for half of the patch, but this patch also adds a
>return value to functions which did not have it before, and I am not sure you
>need all of them to return something. It would be cleaner if you added
>"return" when/where you really need it, not just because it seems that it may
>be convenient later.
>

Fair enough. I'll change the commit log accordingly.

>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 51 ++++++++++++++++---------------
>>  arch/powerpc/platforms/powernv/pci.h      |  2 +-
>>  2 files changed, 27 insertions(+), 26 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 3094c61..9f53682 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -132,12 +132,12 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>  }
>>
>>-static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>+static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  {
>>  	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
>>  		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>>  			__func__, pe_no, phb->hose->global_number);
>>-		return;
>>+		return NULL;
>>  	}
>>
>>  	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc))
>>@@ -146,9 +146,11 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>
>>  	phb->ioda.pe_array[pe_no].phb = phb;
>>  	phb->ioda.pe_array[pe_no].pe_number = pe_no;
>>+
>>+	return &phb->ioda.pe_array[pe_no];
>>  }
>>
>>-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  {
>>  	unsigned long pe;
>>
>>@@ -156,12 +158,12 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>  					phb->ioda.total_pe, 0);
>>  		if (pe >= phb->ioda.total_pe)
>>-			return IODA_INVALID_PE;
>>+			return NULL;
>>  	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>
>>  	phb->ioda.pe_array[pe].phb = phb;
>>  	phb->ioda.pe_array[pe].pe_number = pe;
>>-	return pe;
>>+	return &phb->ioda.pe_array[pe];
>>  }
>>
>>  static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>@@ -334,7 +336,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>  	}
>>  }
>>
>>-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  {
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>@@ -344,7 +346,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>
>>  	/* Root bus shouldn't use M64 */
>>  	if (pci_is_root_bus(bus))
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>
>>  	/* Allocate bitmap */
>>  	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>@@ -352,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	if (!pe_alloc) {
>>  		pr_warn("%s: Out of memory !\n",
>>  			__func__);
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>  	}
>>
>>  	/* Figure out reserved PE numbers by the PE */
>>@@ -365,7 +367,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	 */
>>  	if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
>>  		kfree(pe_alloc);
>>-		return IODA_INVALID_PE;
>>+		return NULL;
>>  	}
>>
>>  	/*
>>@@ -416,7 +418,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>  	}
>>
>>  	kfree(pe_alloc);
>>-	return master_pe->pe_number;
>>+	return master_pe;
>>  }
>>
>>  static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>@@ -1069,28 +1071,26 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>   * subordinate PCI devices and buses. The second type of PE is normally
>>   * orgiriated by PCIe-to-PCI bridge or PLX switch downstream ports.
>>   */
>>-static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>+static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  {
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>-	struct pnv_ioda_pe *pe;
>>-	int pe_num = IODA_INVALID_PE;
>>+	struct pnv_ioda_pe *pe = NULL;
>>
>>  	/* Check if PE is determined by M64 */
>>  	if (phb->pick_m64_pe)
>>-		pe_num = phb->pick_m64_pe(bus, all);
>>+		pe = phb->pick_m64_pe(bus, all);
>>
>>  	/* The PE number isn't pinned by M64 */
>>-	if (pe_num == IODA_INVALID_PE)
>>-		pe_num = pnv_ioda_alloc_pe(phb);
>>+	if (!pe)
>>+		pe = pnv_ioda_alloc_pe(phb);
>>
>>-	if (pe_num == IODA_INVALID_PE) {
>>-		pr_warning("%s: Not enough PE# available for PCI bus %04x:%02x\n",
>>+	if (!pe) {
>>+		pr_warning("%s: No enough PE# for PCI bus %04x:%02x\n",
>>  			__func__, pci_domain_nr(bus), bus->number);
>>-		return;
>>+		return NULL;
>>  	}
>>
>>-	pe = &phb->ioda.pe_array[pe_num];
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>@@ -1101,17 +1101,16 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	if (all)
>>  		pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>-			bus->busn_res.start, bus->busn_res.end, pe_num);
>>+			bus->busn_res.start, bus->busn_res.end, pe->pe_number);
>>  	else
>>  		pe_info(pe, "Secondary bus %d associated with PE#%d\n",
>>-			bus->busn_res.start, pe_num);
>>+			bus->busn_res.start, pe->pe_number);
>>
>>  	if (pnv_ioda_configure_pe(phb, pe)) {
>>  		/* XXX What do we do here ? */
>>-		if (pe_num)
>>-			pnv_ioda_free_pe(phb, pe_num);
>>+		pnv_ioda_free_pe(phb, pe->pe_number);
>>  		pe->pbus = NULL;
>>-		return;
>>+		return NULL;
>>  	}
>>
>>  	/* Associate it with all child devices */
>>@@ -1122,6 +1121,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>
>>  	/* Link the PE */
>>  	pnv_ioda_link_pe_by_weight(phb, pe);
>>+
>>+	return pe;
>>  }
>>
>>  static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 6f8568e..c0bc57f 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -121,7 +121,7 @@ struct pnv_phb {
>>  	int (*init_m64)(struct pnv_phb *phb);
>>  	void (*reserve_m64_pe)(struct pci_bus *bus,
>>  			       unsigned long *pe_bitmap, bool all);
>>-	int (*pick_m64_pe)(struct pci_bus *bus, bool all);
>>+	struct pnv_ioda_pe* (*pick_m64_pe)(struct pci_bus *bus, bool all);
>>  	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
>>  	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
>>  	int (*unfreeze_pe)(struct pnv_phb *phb, int pe_no, int opt);
>>

Thanks,
Gavin

* Re: [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB
  2015-08-10 14:21     ` Alexey Kardashevskiy
@ 2015-08-11  0:40       ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:21:16AM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>This renames the fields related to PE# in "struct pnv_phb" to
>>better reflect their usage, as Alexey suggested. It doesn't
>>introduce behavioural changes.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>
>It makes sense to move this to the beginning of the patchset, as patches prior
>to this one change the same lines as this patch does.
>

Ok. I'll try to reorder the patch...

Thanks,
Gavin

* Re: [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order
  2015-08-10 14:39   ` Alexey Kardashevskiy
@ 2015-08-11  0:43     ` Gavin Shan
  2015-08-11  2:50       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:39:02AM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The available PE#, represented by a bitmap in the PHB, is allocated
>>in ascending order.
>
>Available PE# is available exactly because it is not allocated ;)
>

Yeah, will correct it.

>>It conflicts with the fact that M64 segments are
>>assigned in the same order. In order to avoid the conflict, the patch
>>allocates PE# in descending order.
>
>What kind of conflict?
>

On PHB3, each M64 segment is assigned to a PE whose PE number is fixed
by that segment. M64 segments are allocated in ascending order, which is
why I would like to allocate PE# in descending order.
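The reasoning above can be modelled in a few lines. The sketch assumes, as the reply implies, that the PE numbers pinned by ascending M64 segments are consumed from the bottom of the range; struct pe_bitmap, pe_test_and_set(), pe_alloc_descending() and PE_NONE are hypothetical, and a plain top-down scan replaces the find_next_zero_bit() loop used in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of phb->ioda.pe_alloc; not the kernel's code. */
#define PE_MAX		256
#define PE_NONE		(-1)
#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

struct pe_bitmap {
	unsigned long bits[(PE_MAX + BITS_PER_ULONG - 1) / BITS_PER_ULONG];
	int total;		/* valid PE numbers are 0..total-1 */
};

/* Mark @pe as allocated; returns true if it was already taken. */
static bool pe_test_and_set(struct pe_bitmap *m, int pe)
{
	unsigned long *word = &m->bits[pe / BITS_PER_ULONG];
	unsigned long mask = 1UL << (pe % BITS_PER_ULONG);
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/*
 * Hand out the highest free PE#, so that dynamically allocated PEs stay
 * away from the low PE numbers that ascending M64 segments pin first.
 */
static int pe_alloc_descending(struct pe_bitmap *m)
{
	for (int pe = m->total - 1; pe >= 0; pe--) {
		if (!pe_test_and_set(m, pe))
			return pe;
	}
	return PE_NONE;
}
```

With four PE numbers, two dynamic allocations yield PE#3 and PE#2, leaving PE#0 and PE#1 free for M64 pinning.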

>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 56b058c..1c950e8 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>  static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  {
>>  	unsigned long pe;
>>+	unsigned long limit = phb->ioda.total_pe_num - 1;
>>
>>  	do {
>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>-					phb->ioda.total_pe_num, 0);
>>-		if (pe >= phb->ioda.total_pe_num)
>>+					phb->ioda.total_pe_num, limit);
>>+		if (pe < phb->ioda.total_pe_num &&
>>+		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
>>+			break;
>>+
>>+		if (--limit >= phb->ioda.total_pe_num)
>>  			return NULL;
>>-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>+	} while (1);
>
>
>Usually, if it is "while(1)", then it is "while(1){}" rather than
>"do{}while(1)" :)

Agreed, will change it.

>
>
>>
>>  	return pnv_ioda_init_pe(phb, pe);
>>  }
>>

Thanks,
Gavin

* Re: [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
  2015-08-10 22:42   ` Frank Rowand
@ 2015-08-11  0:52     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:52 UTC (permalink / raw)
  To: Frank Rowand
  Cc: devicetree, aik, panto, Gavin Shan, grant.likely, robherring2,
	linux-pci, bhelgaas, linuxppc-dev

On Mon, Aug 10, 2015 at 03:42:32PM -0700, Frank Rowand wrote:
>On 8/5/2015 9:11 PM, Gavin Shan wrote:

Frank, thanks for your comments. All of them will be addressed
in the next revision.

Thanks,
Gavin

>> This changes of_fdt_unflatten_tree() so that it returns the allocated
>> memory chunk for the unflattened device-tree, which can be released once
>> it is no longer needed.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c       | 11 ++++++-----
>>  include/linux/of_fdt.h |  2 +-
>>  2 files changed, 7 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index 074870a..8e1ba7e 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob,
>>   * @dt_alloc: An allocator that provides a virtual address to memory
>>   * for the resulting tree
>>   */
>> -static void __unflatten_device_tree(const void *blob,
>> +static void *__unflatten_device_tree(const void *blob,
>>  			     struct device_node *dad,
>>  			     struct device_node **mynodes,
>>  			     void * (*dt_alloc)(u64 size, u64 align))
>
>Please add a description of the return value to the documentation header.
>
>
>> @@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob,
>>  
>>  	if (!blob) {
>>  		pr_debug("No device tree pointer\n");
>> -		return;
>> +		return NULL;
>>  	}
>>  
>>  	pr_debug("Unflattening device tree:\n");
>> @@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob,
>>  
>>  	if (fdt_check_header(blob)) {
>>  		pr_err("Invalid device tree blob header\n");
>> -		return;
>> +		return NULL;
>>  	}
>>  
>>  	/* First pass, scan for size */
>> @@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob,
>>  			   be32_to_cpup(mem + size));
>>  
>>  	pr_debug(" <- unflatten_device_tree()\n");
>> +	return mem;
>>  }
>>  
>>  static void *kernel_tree_alloc(u64 size, u64 align)
>> @@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>>   * pointers of the nodes so the normal device-tree walking functions
>>   * can be used.
>>   */
>> -void of_fdt_unflatten_tree(const unsigned long *blob,
>> +void *of_fdt_unflatten_tree(const unsigned long *blob,
>
>Please add a description of the return value to the documentation header.
>
>
>>  			struct device_node *dad,
>>  			struct device_node **mynodes)
>>  {
>> -	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>> +	return __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>>  }
>>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>>  
>> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
>> index 3644960..00db279 100644
>> --- a/include/linux/of_fdt.h
>> +++ b/include/linux/of_fdt.h
>> @@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>>  				 unsigned long node);
>>  extern int of_fdt_match(const void *blob, unsigned long node,
>>  			const char *const *compat);
>> -extern void of_fdt_unflatten_tree(const unsigned long *blob,
>> +extern void *of_fdt_unflatten_tree(const unsigned long *blob,
>>  			       struct device_node *dad,
>>  			       struct device_node **mynodes);
>>  
>> 
>

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

* Re: [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree()
  2015-08-10 22:42   ` Frank Rowand
@ 2015-08-11  0:52     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-11  0:52 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto, aik

On Mon, Aug 10, 2015 at 03:42:13PM -0700, Frank Rowand wrote:
>On 8/5/2015 9:11 PM, Gavin Shan wrote:
>> This introduces one more argument to of_fdt_unflatten_tree()
>> to specify the root node for the FDT blob that is going to be
>> unflattened. As a result, the function can be used to unflatten
>> an FDT blob that represents a device sub-tree in the PowerNV hotplug
>> driver.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  drivers/of/fdt.c       | 13 ++++++++-----
>>  drivers/of/unittest.c  |  2 +-
>>  include/linux/of_fdt.h |  1 +
>>  3 files changed, 10 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index a18a2ce..074870a 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -388,10 +388,11 @@ static void *unflatten_dt_node(const void *blob,
>>  			       struct device_node **nodepp,
>>  			       bool dryrun)
>>  {
>> +	unsigned long fpsize = dad ? strlen(of_node_full_name(dad)) : 0;
>>  	int depth = 1;
>>  
>>  	return __unflatten_dt_node(blob, mem, poffset,
>> -				   dad, nodepp, 0,
>> +				   dad, nodepp, fpsize,
>>  				   dryrun, &depth);
>>  }
>>  
>> @@ -408,6 +409,7 @@ static void *unflatten_dt_node(const void *blob,
>>   * for the resulting tree
>>   */
>>  static void __unflatten_device_tree(const void *blob,
>> +			     struct device_node *dad,
>>  			     struct device_node **mynodes,
>>  			     void * (*dt_alloc)(u64 size, u64 align))
>>  {
>
>Please add @dad to the documentation header for the function.
>

Yes, it will be included in the next revision.
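The dry-run sizing pass, the 0xdeadbeef end marker visible in the __unflatten_device_tree() hunks quoted in this message, and the returned chunk added in PATCH 40/42 together form a common pattern. The user-space sketch below illustrates it with strings instead of device nodes; pack_strings(), unflatten_strings() and END_MARKER are hypothetical names, and this is not the code in drivers/of/fdt.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the two-pass pattern: a dry-run pass sizes the
 * output, one chunk is allocated with an end marker after it, a second
 * pass fills it, and the chunk is returned so the caller can free it.
 */
#define END_MARKER 0xdeadbeefu

/* With mem == NULL only the size is computed (the "dryrun" pass). */
static size_t pack_strings(const char *const *strs, size_t n, char *mem)
{
	size_t off = 0;

	assert(strs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(strs[i]) + 1;

		if (mem)
			memcpy(mem + off, strs[i], len);
		off += len;
	}
	return off;
}

static void *unflatten_strings(const char *const *strs, size_t n)
{
	uint32_t marker = END_MARKER;
	size_t size = pack_strings(strs, n, NULL);	/* first pass */
	char *mem;

	size = (size + 3) & ~(size_t)3;			/* like ALIGN(size, 4) */
	mem = malloc(size + sizeof(marker));
	if (!mem)
		return NULL;
	memcpy(mem + size, &marker, sizeof(marker));

	pack_strings(strs, n, mem);			/* second pass */

	memcpy(&marker, mem + size, sizeof(marker));
	if (marker != END_MARKER) {	/* the kernel only warns here */
		free(mem);
		return NULL;
	}
	return mem;			/* caller releases it with free() */
}
```

Returning the chunk is what lets a hotplug caller release an unflattened sub-tree later, which a void return cannot support.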


>> @@ -435,7 +437,7 @@ static void __unflatten_device_tree(const void *blob,
>>  	/* First pass, scan for size */
>>  	start = 0;
>>  	size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
>> -						NULL, NULL, true);
>> +						dad, NULL, true);
>>  	size = ALIGN(size, 4);
>>  
>>  	pr_debug("  size is %lx, allocating...\n", size);
>> @@ -450,7 +452,7 @@ static void __unflatten_device_tree(const void *blob,
>>  
>>  	/* Second pass, do actual unflattening */
>>  	start = 0;
>> -	unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
>> +	unflatten_dt_node(blob, mem, &start, dad, mynodes, false);
>>  	if (be32_to_cpup(mem + size) != 0xdeadbeef)
>>  		pr_warning("End of tree marker overwritten: %08x\n",
>>  			   be32_to_cpup(mem + size));
>> @@ -472,9 +474,10 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>>   * can be used.
>>   */
>>  void of_fdt_unflatten_tree(const unsigned long *blob,
>> +			struct device_node *dad,
>>  			struct device_node **mynodes)
>>  {
>> -	__unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
>> +	__unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
>>  }
>>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>>  
>> @@ -1125,7 +1128,7 @@ bool __init early_init_dt_scan(void *params)
>>   */
>>  void __init unflatten_device_tree(void)
>>  {
>> -	__unflatten_device_tree(initial_boot_params, &of_root,
>> +	__unflatten_device_tree(initial_boot_params, NULL, &of_root,
>>  				early_init_dt_alloc_memory_arch);
>>  
>>  	/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
>> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
>> index 1801634..2270830 100644
>> --- a/drivers/of/unittest.c
>> +++ b/drivers/of/unittest.c
>> @@ -907,7 +907,7 @@ static int __init unittest_data_add(void)
>>  			"not running tests\n", __func__);
>>  		return -ENOMEM;
>>  	}
>> -	of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
>> +	of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
>>  	if (!unittest_data_node) {
>>  		pr_warn("%s: No tree to attach; not running tests\n", __func__);
>>  		return -ENODATA;
>> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
>> index df9ef38..3644960 100644
>> --- a/include/linux/of_fdt.h
>> +++ b/include/linux/of_fdt.h
>> @@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>>  extern int of_fdt_match(const void *blob, unsigned long node,
>>  			const char *const *compat);
>>  extern void of_fdt_unflatten_tree(const unsigned long *blob,
>> +			       struct device_node *dad,
>>  			       struct device_node **mynodes);
>>  
>>  /* TBD: Temporary export of fdt globals - remove when code fully merged */
>> 

Thanks,
Gavin


* Re: [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
  2015-08-10 23:45     ` Gavin Shan
@ 2015-08-11  2:06       ` Alexey Kardashevskiy
  2015-08-12 10:28         ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:06 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 09:45 AM, Gavin Shan wrote:
> On Mon, Aug 10, 2015 at 04:30:09PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> The patch enables M64 window on P7IOC, which has been enabled on
>>> PHB3. Different from PHB3 where 16 M64 BARs are supported and each
>>> of them can be owned by one particular PE# exclusively or divided
>>> evenly to 256 segments, each P7IOC PHB has 16 M64 BARs and each
>>> of them are divided into 8 segments.
>>
>> Is this a limitation of the POWER7 chip or does it come from IODA1?
>>
>
>  From IODA1.
>
>>> So each P7IOC PHB can support
>>> 128 M64 segments only. Also, P7IOC has M64DT, which helps mapping
>>> one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>> M64DT, indicating that one M64 segment can only be pinned to the
>>> fixed PE#. In order to have similar logic to support M64 for PHB3
>>> and P7IOC, we just provide 128 M64 (16 BARs) segments and fixed
>>> mapping between PE# and M64 segment# on P7IOC. In turn, we just
>>> need different phb->init_m64() hooks for P7IOC and PHB3 to support
>>> M64.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++++++++++++++++++++++++++----
>>>   1 file changed, 104 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 38b5405..e4ac703 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>>   	clear_bit(pe, phb->ioda.pe_alloc);
>>>   }
>>>
>>> +static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>> +{
>>> +	struct resource *r;
>>> +	int seg;
>>> +
>>> +	/* There are as many M64 segments as the maximum number
>>> +	 * of PEs, which is 128.
>>> +	 */
>>> +	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
>>
>>
>> This "8" is used a lot across the patch, please make it a macro
>> (PNV_PHB_P7IOC_SEGNUM or PNV_PHB_IODA1_SEGNUM or whatever you think it is)
>> with a short comment why it is "8". Or a pnv_phb member.
>>
>
> I would prefer to keep "8". With a macro, you have to look up
> its definition to get the real value.

Give it a good name then.


> However,
> it makes sense to add more comments explaining why it's 8 here.

You cannot comment it everywhere, and everywhere is exactly where you would
have to comment it: I believe sometimes it is segments-per-M64 and
sometimes it is the number of bits in a byte (or not? Either way, it will
always distract unless you use a macro for segments-per-M64).
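To illustrate the naming point, here is a minimal sketch of the arithmetic the bare "8" stands for in pnv_ioda1_init_m64() and pnv_ioda_pick_m64_pe(), based on the commit message: 16 M64 BARs per P7IOC PHB, 8 segments per BAR, and a fixed PE#-to-segment mapping, so PE# / 8 selects the BAR and PE# % 8 the segment inside it. The macro and helper names are hypothetical, not taken from the patch.

```c
#include <assert.h>

/* Hypothetical names, not from the patch: on IODA1 (P7IOC) each PHB
 * has 16 M64 BARs and each BAR is split into 8 equal segments. */
#define PNV_IODA1_M64_BARS          16
#define PNV_IODA1_M64_SEGS_PER_BAR  8
#define PNV_IODA1_M64_SEGS \
	(PNV_IODA1_M64_BARS * PNV_IODA1_M64_SEGS_PER_BAR)   /* 128 */

/* With the fixed PE# <-> M64 segment# mapping, the BAR holding a PE's
 * segment is PE# / segments-per-BAR ... */
static inline int pnv_ioda1_m64_bar(int pe_number)
{
	assert(pe_number >= 0 && pe_number < PNV_IODA1_M64_SEGS);
	return pe_number / PNV_IODA1_M64_SEGS_PER_BAR;
}

/* ... and the segment index inside that BAR is the remainder. */
static inline int pnv_ioda1_m64_seg_in_bar(int pe_number)
{
	assert(pe_number >= 0 && pe_number < PNV_IODA1_M64_SEGS);
	return pe_number % PNV_IODA1_M64_SEGS_PER_BAR;
}

/* Size programmed into one M64 BAR, given the per-segment size. */
static inline unsigned long pnv_ioda1_m64_bar_size(unsigned long segsize)
{
	return PNV_IODA1_M64_SEGS_PER_BAR * segsize;
}
```

With a name like PNV_IODA1_M64_SEGS_PER_BAR, `seg / 8` in the OPAL calls reads as "BAR index" without a comment at each use.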


>
>>
>>> +		unsigned long base;
>>> +		int64_t rc;
>>> +
>>> +		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
>>> +		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>> +						 OPAL_M64_WINDOW_TYPE,
>>> +						 seg / 8,
>>> +						 base,
>>> +						 0, /* unused */
>>> +						 8 * phb->ioda.m64_segsize);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, seg / 8);
>>> +			goto fail;
>>> +		}
>>> +
>>> +		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>> +					      OPAL_M64_WINDOW_TYPE,
>>> +					      seg / 8,
>>> +					      OPAL_ENABLE_M64_SPLIT);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>> +				rc, phb->hose->global_number, seg / 8);
>>> +			goto fail;
>>> +		}
>>> +	}
>>> +
>>> +	/* Strip off the segment used by the reserved PE, which
>>
>> What is this reserved PE on P7IOC? "Strip off" means "exclude" here?
>>
>
> PE#127, which is exported by skiboot. "Strip off" means "exclude".

I like "exclude" a lot better.


>
>>
>>> +	 * is expected to be 0 or last supported PE#. The PHB's
>>> +	 * first memory window traces the 32-bits MMIO range
>>
>> s/traces/filters/ ? Or I did not understand this comment...
>>
>
> It seems you didn't understand it: there are two memory windows
> in every PHB. The first one is tracing M32 resource and the
> second one is tracing M64 resource.


Tracing means logging, pretty much. Is this what you mean here?

>
>>
>>> +	 * while the second one traces the 64-bits prefetchable
>>> +	 * MMIO range that the PHB supports.
>>
>> 32/64 ranges comment seems irrelevant here.
>>
>
> Maybe it's not so relevant, but still.

Not relevant -> remove it. Put this text to the commit log.


> We're stripping off the
> M64 segment from the 2nd resource (as above), not the first one.


2nd window (not _resource_), you mean?
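For reference, the hunk under discussion reduces to the following. This is a minimal userspace sketch: struct win is a hypothetical stand-in for the start/end fields of struct resource (end inclusive), and the helper name is hypothetical. It only shows that the reserved PE's segment can be excluded when it sits at either end of the M64 window.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields of struct resource used
 * here; end is inclusive, as in the kernel. */
struct win {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical helper modelling the hunk in pnv_ioda1_init_m64():
 * exclude the reserved PE's M64 segment from the window. This is only
 * possible when that segment is the first or the last one.
 * Returns 0 on success, -1 if the window is left untouched. */
static int exclude_reserved_seg(struct win *w, int reserved_pe,
				int total_pe, unsigned long long segsize)
{
	assert(total_pe > 0 && segsize > 0);
	if (reserved_pe == 0)
		w->start += segsize;		/* drop the first segment */
	else if (reserved_pe == total_pe - 1)
		w->end -= segsize;		/* drop the last segment */
	else
		return -1;			/* the patch only warns here */
	return 0;
}
```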


>
>>
>>> +	 */
>>> +	r = &phb->hose->mem_resources[1];
>>> +	if (phb->ioda.reserved_pe == 0)
>>> +		r->start += phb->ioda.m64_segsize;
>>> +	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
>>> +		r->end -= phb->ioda.m64_segsize;
>>> +	else
>>> +		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>>> +			phb->ioda.reserved_pe);
>>> +
>>> +	return 0;
>>> +
>>> +fail:
>>> +	for ( ; seg >= 0; seg -= 8)
>>> +		opal_pci_phb_mmio_enable(phb->opal_id,
>>> +					 OPAL_M64_WINDOW_TYPE,
>>> +					 seg / 8,
>>> +					 OPAL_DISABLE_M64);
>>> +
>>> +	return -EIO;
>>> +}
>>> +
>>>   /* The default M64 BAR is shared by all PEs */
>>>   static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>>>   {
>>> @@ -256,9 +319,9 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>   	}
>>>   }
>>>
>>> -static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>> -				     unsigned long *pe_bitmap,
>>> -				     bool all)
>>> +static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>> +				    unsigned long *pe_bitmap,
>>> +				    bool all)
>>>   {
>>>   	struct pci_dev *pdev;
>>>
>>> @@ -266,12 +329,12 @@ static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>>   		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
>>>
>>>   		if (all && pdev->subordinate)
>>> -			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
>>> -						 pe_bitmap, all);
>>> +			pnv_ioda_reserve_m64_pe(pdev->subordinate,
>>> +						pe_bitmap, all);
>>>   	}
>>>   }
>>>
>>> -static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>> +static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   {
>>>   	struct pci_controller *hose = pci_bus_to_host(bus);
>>>   	struct pnv_phb *phb = hose->private_data;
>>> @@ -293,7 +356,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   	}
>>>
>>>   	/* Figure out reserved PE numbers by the PE */
>>> -	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
>>> +	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
>>>
>>>   	/*
>>>   	 * the current bus might not own M64 window and that's all
>>> @@ -324,6 +387,26 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   			pe->master = master_pe;
>>>   			list_add_tail(&pe->list, &master_pe->slaves);
>>>   		}
>>> +
>>> +		/* P7IOC supports M64DT, which helps mapping M64 segment
>>> +		 * to one particular PE#. However, PHB3 has fixed mapping
>>> +		 * between M64 segment and PE#. In order to have same logic
>>> +		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>> +		 * segment and PE# on P7IOC.
>>> +		 */
>>> +		if (phb->type == PNV_PHB_IODA1) {
>>> +			int64_t rc;
>>> +
>>> +			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +							 pe->pe_number,
>>> +							 OPAL_M64_WINDOW_TYPE,
>>> +							 pe->pe_number / 8,
>>> +							 pe->pe_number % 8);
>>> +			if (rc != OPAL_SUCCESS)
>>> +				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>> +					__func__, rc, phb->hose->global_number,
>>> +					pe->pe_number);
>>> +		}
>>>   	}
>>>
>>>   	kfree(pe_alloc);
>>> @@ -338,8 +421,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>   	const u32 *r;
>>>   	u64 pci_addr;
>>>
>>> -	/* FIXME: Support M64 for P7IOC */
>>> -	if (phb->type != PNV_PHB_IODA2) {
>>> +	if (phb->type != PNV_PHB_IODA1 &&
>>> +	    phb->type != PNV_PHB_IODA2) {
>>>   		pr_info("  Not support M64 window\n");
>>>   		return;
>>
>>
>> You are adding P7IOC support so at least "fixme" should go. Also,
>> pnv_ioda_parse_m64_window() is only called from pnv_pci_init_ioda_phb() which
>> is called only with PNV_PHB_IODA1 and PNV_PHB_IODA2 (no other value is passed
>> there as a type), so the check above will never succeed; just remove it.
>>
>
> The "fixme" is removed, isn't it?

Ah, my bad.


> As I explained last time, there will be another new PHB type and the function
> will be called on that new type of PHB.

Then a new patch adding new PHB should take care of this check too. This is 
not something which can possibly happen on a real machine, we support one 
of 2 (later - 3) PHBs and if a machine got something else, we won't get 
that far anyway and we cannot gracefully fallback to some "generic PHB" 
(like 440fx on x86) as we do not have one.

At least make it BUG_ON() to document it.
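A userspace sketch of what that suggestion could look like, with assert() standing in for BUG_ON() and simplified stand-in types (not the kernel's enum pnv_phb_type or struct pnv_phb). The hook functions return 1 or 2 only so the caller can see which one was installed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; assert() plays the role
 * of BUG_ON(). */
enum pnv_phb_type { PNV_PHB_IODA1, PNV_PHB_IODA2 };

struct pnv_phb {
	enum pnv_phb_type type;
	int (*init_m64)(struct pnv_phb *phb);
};

/* Dummy hooks: the return value only identifies which one ran. */
static int ioda1_init_m64(struct pnv_phb *phb) { (void)phb; return 1; }
static int ioda2_init_m64(struct pnv_phb *phb) { (void)phb; return 2; }

static void set_m64_hooks(struct pnv_phb *phb)
{
	switch (phb->type) {
	case PNV_PHB_IODA1:
		phb->init_m64 = ioda1_init_m64;
		break;
	case PNV_PHB_IODA2:
		phb->init_m64 = ioda2_init_m64;
		break;
	default:
		/* Documents that only IODA1/IODA2 PHBs reach this point. */
		assert(0 && "unexpected PHB type");
	}
}
```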


> The code has been there and it's not
> in upstream yet. So it's reasonable to keep it, instead of removing it.

No, not really.

>
>>>   	}
>>> @@ -372,9 +455,18 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>
>>>   	/* Use last M64 BAR to cover M64 window */
>>>   	phb->ioda.m64_bar_idx = 15;
>>> -	phb->init_m64 = pnv_ioda2_init_m64;
>>> -	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
>>> -	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
>>> +	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>> +	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>> +	switch (phb->type) {
>>> +	case PNV_PHB_IODA1:
>>> +		phb->init_m64 = pnv_ioda1_init_m64;
>>> +		break;
>>> +	case PNV_PHB_IODA2:
>>> +		phb->init_m64 = pnv_ioda2_init_m64;
>>> +		break;
>>> +	default:
>>> +		pr_debug("   M64 not supported\n");
>>> +	}
>>>   }
>>>
>>>   static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-11  0:03     ` Gavin Shan
@ 2015-08-11  2:23       ` Alexey Kardashevskiy
  2015-08-12 10:45         ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:23 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 10:03 AM, Gavin Shan wrote:
> On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>
>> The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>> all about? Also, there was no m64_segmap and now there is; that needs an
>> explanation, maybe.
>>
>
> Originally, the arrays (io_segmap and m32_segmap) were allocated dynamically.
> Now they are bitmaps with a fixed size of 512 bits.
>
> The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
> why m64_segmap is added.


But before this patch, you somehow managed to keep it working without a map
for M64, while at the same time you needed maps for IO and M32. It seems you
are making things consistent in this patch, but it also feels like you do not
have to, as M64 did not need a map before and I cannot see why it
needs one now.


>>
>>> the consumed by one particular PE, which can be released once the PE
>>> is destroyed during PCI unplugging time. Also, we're using fixed
>>> quantity of bits to trace the used IO, M32 and M64 segments by PEs
>>> in one particular PHB.
>>>
>>
>> Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>> PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>> is using? Not sure about this master/slave PEs though.
>>
>
> I don't follow your suggestion. Can you rephrase and explain it a bit more?


Please explain in what situations you need the same map in both PHB and PE,
and how you are going to use them. For example, pe::m64_segmap and phb::m64_segmap.

I believe you need to know which segment is used by which PE, and that's it;
having 2 bitmaps is overcomplicated and hard to follow. Is there anything
else I am missing?
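A minimal sketch of the single-map alternative suggested above (all names are hypothetical): one owner table per window type in the PHB, recording which PE# owns each segment. A PE's segments can then be found and released by scanning the table, at the cost of an O(segments) walk on release instead of keeping a second, per-PE bitmap in sync.

```c
#include <assert.h>

/* Hypothetical limit for illustration; the patch sizes its bitmaps
 * for 512 segments. */
#define MAX_SEGS	512
#define SEG_FREE	(-1)

/* Hypothetical: one owner table per window type (IO, M32, M64) in the
 * PHB. Entry i holds the PE# owning segment i, or SEG_FREE. */
struct seg_owner_map {
	int owner[MAX_SEGS];
};

static void seg_map_init(struct seg_owner_map *m)
{
	for (int i = 0; i < MAX_SEGS; i++)
		m->owner[i] = SEG_FREE;
}

/* Claim a segment for a PE; fails if a different PE already owns it. */
static int seg_map_claim(struct seg_owner_map *m, int seg, int pe)
{
	assert(seg >= 0 && seg < MAX_SEGS);
	if (m->owner[seg] != SEG_FREE && m->owner[seg] != pe)
		return -1;
	m->owner[seg] = pe;
	return 0;
}

/* Release every segment owned by a PE, e.g. when the PE is destroyed
 * on unplug. Returns the number of segments released. */
static int seg_map_release_pe(struct seg_owner_map *m, int pe)
{
	int n = 0;

	for (int i = 0; i < MAX_SEGS; i++) {
		if (m->owner[i] == pe) {
			m->owner[i] = SEG_FREE;
			n++;
		}
	}
	return n;
}
```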



>> It would be easier to read patches if this one was right before
>> [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>
>
> I'll try to reorder the patch, but don't expect too much...
>
>>
>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>>   arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>>   2 files changed, 29 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index e4ac703..78b49a1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>   			list_add_tail(&pe->list, &master_pe->slaves);
>>>   		}
>>>
>>> +		/* M64 segments consumed by slave PEs are tracked
>>> +		 * by master PE
>>> +		 */
>>> +		set_bit(pe->pe_number, master_pe->m64_segmap);
>>> +		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>> +
>>>   		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>   		 * to one particular PE#. However, PHB3 has fixed mapping
>>>   		 * between M64 segment and PE#. In order to have same logic
>>> @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>
>>>   			while (index < phb->ioda.total_pe &&
>>>   			       region.start <= region.end) {
>>> -				phb->ioda.io_segmap[index] = pe->pe_number;
>>> +				set_bit(index, pe->io_segmap);
>>> +				set_bit(index, phb->ioda.io_segmap);
>>>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>> +					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>> +					0, index);
>>
>> Unrelated change.
>>
>
> True, will drop. However, checkpatch.pl will complain about lines
> exceeding 80 characters.

It will not, as you are not changing these lines; it only complains about changes.



>
>>>   				if (rc != OPAL_SUCCESS) {
>>>   					pr_err("%s: OPAL error %d when mapping IO "
>>>   					       "segment #%d to PE#%d\n",
>>> @@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>
>>>   			while (index < phb->ioda.total_pe &&
>>>   			       region.start <= region.end) {
>>> -				phb->ioda.m32_segmap[index] = pe->pe_number;
>>> +				set_bit(index, pe->m32_segmap);
>>> +				set_bit(index, phb->ioda.m32_segmap);
>>>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>> +					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>> +					0, index);
>>
>> Unrelated change.
>>
>
> same as above.
>
>>>   				if (rc != OPAL_SUCCESS) {
>>>   					pr_err("%s: OPAL error %d when mapping M32 "
>>>   					       "segment#%d to PE#%d",
>>> @@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   {
>>>   	struct pci_controller *hose;
>>>   	struct pnv_phb *phb;
>>> -	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>> +	unsigned long size, pemap_off;
>>>   	const __be64 *prop64;
>>>   	const __be32 *prop32;
>>>   	int len;
>>> @@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>
>>>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>
>>
>> This comment came with if(IODA1) below, since you are removing the condition
>> below, makes sense to remove the comment as well or move it where people will
>> look for it (arch/powerpc/platforms/powernv/pci.h ?)
>>
>
> Yes, will do.
>
>>
>>>   	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>> -	m32map_off = size;
>>> -	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>> -	if (phb->type == PNV_PHB_IODA1) {
>>> -		iomap_off = size;
>>> -		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>> -	}
>>>   	pemap_off = size;
>>>   	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>>   	aux = memblock_virt_alloc(size, 0);
>>
>>
>> After adding static arrays to PE and PHB, do you still need this "aux"?
>>
>
> "aux" is still needed to tell the boundary of pe_alloc_bitmap and pe_array.
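A userspace model of the single "aux" allocation that remains in pnv_pci_init_ioda_phb() after this patch: the PE allocation bitmap (total_pe / 8 bytes, where 8 is bits per byte), padded to a multiple of sizeof(unsigned long), followed by the PE array. struct pe and the helper name are hypothetical stand-ins, and the alignment macro uses division rather than the kernel's mask-based _ALIGN_UP().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnv_ioda_pe. */
struct pe { unsigned int pe_number; };

/* Round x up to a multiple of a (division-based, works for any a). */
#define PNV_ALIGN_UP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Hypothetical helper: returns the total size of the aux block and
 * stores the offset at which pe_array starts in *pemap_off. */
static size_t aux_layout(unsigned int total_pe, size_t *pemap_off)
{
	/* PE allocation bitmap: one bit per PE, 8 bits per byte. */
	size_t size = PNV_ALIGN_UP((size_t)total_pe / 8, sizeof(unsigned long));

	*pemap_off = size;	/* pe_array starts right after the bitmap */
	size += total_pe * sizeof(struct pe);
	return size;
}
```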
>>
>>>   	phb->ioda.pe_alloc = aux;
>>> -	phb->ioda.m32_segmap = aux + m32map_off;
>>> -	if (phb->type == PNV_PHB_IODA1)
>>> -		phb->ioda.io_segmap = aux + iomap_off;
>>>   	phb->ioda.pe_array = aux + pemap_off;
>>>   	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 62239b1..08a4e57 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>>   	/* PE number */
>>>   	unsigned int		pe_number;
>>>
>>> +	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>> +	 * have one M64 segment at most, but M64 segments consumed
>>> +	 * by slave PEs will be contributed to the master PE. One
>>> +	 * PE can own multiple IO and M32 segments.
>>
>>
>> A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>> correct for IODA1 or IODA2 or both? Is this a limitation of this
>> implementation or it comes from P7IOC/PHB3 hardware?
>>
>
> It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
> multiple M64 segments as well.


But the comment says "Each PE can have one M64 segment at most". Which 
statement is correct?


>>> +	 */
>>> +	unsigned long		io_segmap[8];
>>> +	unsigned long		m32_segmap[8];
>>> +	unsigned long		m64_segmap[8];
>>
>> Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>>
>> Anyway,
>>
>> #define PNV_IODA_MAX_PE_NUM	512
>>
>> unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>>
>
> I prefer "8" over a macro, for 3 reasons:
> - The macro won't be used in the code.

You will use it 6 times in the header. If you give it a good name, people
won't have to guess whether all these "8"s mean the same thing, and you
won't have to comment every use of it in this header file (now you have to).

Also, using BITS_PER_LONG tells the reader that this is a bitmask for sure.
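A userspace sketch of sizing the segment bitmaps from a named bound instead of a literal "8". PNV_IODA_MAX_SEG_NUM is a hypothetical name for the 512-segment limit the patch chose; PNV_BITS_PER_LONG and PNV_BITS_TO_LONGS are local equivalents of the kernel's BITS_PER_LONG and BITS_TO_LONGS(). On a 64-bit build each array works out to the same 8 longs as the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name for the bound hidden behind "8": the patch sizes
 * each segment bitmap for 512 segments. */
#define PNV_IODA_MAX_SEG_NUM	512

/* Userspace equivalents of the kernel's BITS_PER_LONG/BITS_TO_LONGS. */
#define PNV_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
#define PNV_BITS_TO_LONGS(nr) \
	(((nr) + PNV_BITS_PER_LONG - 1) / PNV_BITS_PER_LONG)

struct pe_segmaps {
	unsigned long io_segmap[PNV_BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m32_segmap[PNV_BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m64_segmap[PNV_BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
};

/* Minimal non-atomic set/test helpers on such a bitmap. */
static void seg_set(unsigned long *map, unsigned int seg)
{
	assert(seg < PNV_IODA_MAX_SEG_NUM);
	map[seg / PNV_BITS_PER_LONG] |= 1UL << (seg % PNV_BITS_PER_LONG);
}

static int seg_test(const unsigned long *map, unsigned int seg)
{
	assert(seg < PNV_IODA_MAX_SEG_NUM);
	return (map[seg / PNV_BITS_PER_LONG] >> (seg % PNV_BITS_PER_LONG)) & 1UL;
}
```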


> - The total number of segments of each resource type varies
>    between IODA1 and IODA2. I just chose the maximum value with some margin.
> - PNV_IODA_MAX_PE_NUM, indicating max PE number, isn't 512 on
>    IODA1 or IODA2.

Give it a better name.


>
>>> +
>>>   	/* "Weight" assigned to the PE for the sake of DMA resource
>>>   	 * allocations
>>>   	 */
>>> @@ -145,15 +154,16 @@ struct pnv_phb {
>>>   			unsigned int		io_segsize;
>>>   			unsigned int		io_pci_base;
>>>
>>> +			/* IO, M32, M64 segment maps */
>>> +			unsigned long		io_segmap[8];
>>> +			unsigned long		m32_segmap[8];
>>> +			unsigned long		m64_segmap[8];
>>> +
>>>   			/* PE allocation */
>>>   			struct mutex		pe_alloc_mutex;
>>>   			unsigned long		*pe_alloc;
>>>   			struct pnv_ioda_pe	*pe_array;
>>>
>>> -			/* M32 & IO segment maps */
>>> -			unsigned int		*m32_segmap;
>>> -			unsigned int		*io_segmap;
>>> -
>>>   			/* IRQ chip */
>>>   			int			irq_chip_init;
>>>   			struct irq_chip		irq_chip;
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
  2015-08-11  0:12       ` Gavin Shan
@ 2015-08-11  2:32         ` Alexey Kardashevskiy
  2015-08-12 23:42           ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:32 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 10:12 AM, Gavin Shan wrote:
> On Mon, Aug 10, 2015 at 05:40:08PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> There're 3 windows (IO, M32 and M64) for PHB, root port and upstream
>>
>> These are actually IO, non-prefetchable and prefetchable windows which happen
>> to be IO, 32bit and 64bit windows but this has nothing to do with the M32/M64
>> BAR registers in P7IOC/PHB3, do I understand this correctly?
>>
>
> In pci-ioda.c, we have the definitions below, which were made up while
> developing the code, not taken from any specification:
>
> IO  - resources with IO property
> M32 - 32-bits or non-prefetchable resources
> M64 - 64-bits and prefetchable resources


This is what I am saying - it is incorrect and confusing. M32/M64 are PHB3
register names and associated windows (with "M" in the beginning) but not
device resources.
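To make the naming issue concrete, here is a sketch of the classification pnv_ioda_setup_one_res() applies to a resource, plus the M64 case it skips. It assumes pnv_pci_is_mem_pref_64() returns true only when both the 64-bit and prefetchable bits are set. The RES_* values are local stand-ins for the kernel's IORESOURCE_* bits; only their distinctness matters here.

```c
#include <assert.h>

/* Local stand-ins for IORESOURCE_IO, _MEM, _PREFETCH and _MEM_64. */
#define RES_IO		0x00000100UL
#define RES_MEM		0x00000200UL
#define RES_PREFETCH	0x00002000UL
#define RES_MEM_64	0x00100000UL

enum pnv_win { WIN_NONE, WIN_IO, WIN_M32, WIN_M64 };

/* "M64" means 64-bit AND prefetchable; any other memory resource,
 * including a 32-bit prefetchable or a 64-bit non-prefetchable one,
 * goes through the M32 window. */
static enum pnv_win classify_res(unsigned long flags)
{
	const unsigned long pref64 = RES_MEM_64 | RES_PREFETCH;

	if (flags & RES_IO)
		return WIN_IO;
	if (flags & RES_MEM)
		return ((flags & pref64) == pref64) ? WIN_M64 : WIN_M32;
	return WIN_NONE;
}
```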


>>> port of the PCIE switch behind root port. In order to support PCI
>>> hotplug, we extend the start/end address of those 3 windows of root
>>> port or upstream port to the start/end address of the 3 PHB's windows.
>>> The current implementation, assigning IO or M32 segment based on the
>>> bridge's windows, isn't reliable.
>>>
>>> The patch fixes above issue by calculating PE's consumed IO or M32
>>> segments from its contained devices, no PCI bridge windows involved
>>> if the PE doesn't contain all the subordinate PCI buses.
>>
>> Please, rephrase it. How can PCI bridges be involved in PE consumption?
>>
>
> Ok. Will add something like below:
>
> if the PE, corresponding to the PCI bus, doesn't contain all the subordinate
> PCI buses.


No, my question was about "PCI bridge windows involved" - what do you do to 
the windows if PE does not own all child buses?



>>
>>> Otherwise,
>>> the PCI bridge windows still contribute to PE's consumed IO or M32
>>> segments.
>>
>> PCI bridge windows themselves consume PEs? Is that correct?
>>
>
> PCI bridge windows consume IO, M32, M64 segments, not PEs.

Ah, right.
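As a worked example of how a device BAR or a bridge window consumes IO/M32 segments, here is a sketch of the index arithmetic. It treats end as inclusive, like struct resource, and takes addresses relative to the window base. The helper name is hypothetical. Dividing the inclusive end directly means a one-byte resource (start == end) still maps exactly one segment.

```c
#include <assert.h>

/* Hypothetical helper: compute the first and last segment indices
 * touched by the inclusive range [start, end], relative to the window
 * base. Returns the number of segments, or 0 for an empty range. */
static unsigned int seg_range(unsigned long long start,
			      unsigned long long end,
			      unsigned long long segsize,
			      unsigned int *first, unsigned int *last)
{
	assert(segsize > 0);
	if (start > end)
		return 0;
	*first = (unsigned int)(start / segsize);
	*last  = (unsigned int)(end / segsize);	/* end is inclusive */
	return *last - *first + 1;
}
```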


>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 136 +++++++++++++++++-------------
>>>   1 file changed, 79 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 488a53e..713f4b4 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>   }
>>>   #endif /* CONFIG_PCI_IOV */
>>>
>>> -/*
>>> - * This function is supposed to be called on basis of PE from top
>>> - * to bottom style. So the the I/O or MMIO segment assigned to
>>> - * parent PE could be overrided by its child PEs if necessary.
>>> - */
>>> -static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>> -				  struct pnv_ioda_pe *pe)
>>> +static int pnv_ioda_setup_one_res(struct pci_controller *hose,
>>> +				  struct pnv_ioda_pe *pe,
>>> +				  struct resource *res)
>>>   {
>>>   	struct pnv_phb *phb = hose->private_data;
>>>   	struct pci_bus_region region;
>>> -	struct resource *res;
>>> -	int i, index;
>>> -	unsigned int segsize;
>>> +	unsigned int index, segsize;
>>>   	unsigned long *segmap, *pe_segmap;
>>>   	uint16_t win;
>>>   	int64_t rc;
>>>
>>> -	/*
>>> -	 * NOTE: We only care PCI bus based PE for now. For PCI
>>> -	 * device based PE, for example SRIOV sensitive VF should
>>> -	 * be figured out later.
>>> -	 */
>>> -	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>> +	/* Check if we need map the resource */
>>> +	if (!res->parent || !res->flags || res->start > res->end)
>>
>> res->start >= res->end ?
>>
>
> No, res->start == res->end is valid.
>
>>
>>> +		return 0;
>>>
>>> -	pci_bus_for_each_resource(pe->pbus, res, i) {
>>> -		if (!res || !res->flags ||
>>> -		    res->start > res->end)
>>> -			continue;
>>> +	if (res->flags & IORESOURCE_IO) {
>>> +		region.start = res->start - phb->ioda.io_pci_base;
>>> +		region.end   = res->end - phb->ioda.io_pci_base;
>>> +		segsize      = phb->ioda.io_segsize;
>>> +		segmap       = phb->ioda.io_segmap;
>>> +		pe_segmap    = pe->io_segmap;
>>> +		win          = OPAL_IO_WINDOW_TYPE;
>>> +	} else if ((res->flags & IORESOURCE_MEM) &&
>>> +		   !pnv_pci_is_mem_pref_64(res->flags)) {
>>> +		region.start = res->start -
>>> +			       hose->mem_offset[0] -
>>> +			       phb->ioda.m32_pci_base;
>>> +		region.end   = res->end -
>>> +			       hose->mem_offset[0] -
>>> +			       phb->ioda.m32_pci_base;
>>> +		segsize      = phb->ioda.m32_segsize;
>>> +		segmap       = phb->ioda.m32_segmap;
>>> +		pe_segmap    = pe->m32_segmap;
>>> +		win          = OPAL_M32_WINDOW_TYPE;
>>> +	} else {
>>> +		return 0;
>>> +	}
>>>
>>> -		if (res->flags & IORESOURCE_IO) {
>>> -			region.start = res->start - phb->ioda.io_pci_base;
>>> -			region.end   = res->end - phb->ioda.io_pci_base;
>>> -			segsize      = phb->ioda.io_segsize;
>>> -			segmap       = phb->ioda.io_segmap;
>>> -			pe_segmap    = pe->io_segmap;
>>> -			win          = OPAL_IO_WINDOW_TYPE;
>>> -		} else if ((res->flags & IORESOURCE_MEM) &&
>>> -			   !pnv_pci_is_mem_pref_64(res->flags)) {
>>> -			region.start = res->start -
>>> -				       hose->mem_offset[0] -
>>> -				       phb->ioda.m32_pci_base;
>>> -			region.end   = res->end -
>>> -				       hose->mem_offset[0] -
>>> -				       phb->ioda.m32_pci_base;
>>> -			segsize      = phb->ioda.m32_segsize;
>>> -			segmap       = phb->ioda.m32_segmap;
>>> -			pe_segmap    = pe->m32_segmap;
>>> -			win          = OPAL_M32_WINDOW_TYPE;
>>> -		} else {
>>> -			continue;
>>> +	region.start = _ALIGN_DOWN(region.start, segsize);
>>> +	region.end   = _ALIGN_UP(region.end, segsize);
>>> +	index = region.start / segsize;
>>> +	while (index < phb->ioda.total_pe &&
>>> +	       region.start < region.end) {
>>> +		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> +				pe->pe_number, win, 0, index);
>>> +		if (rc != OPAL_SUCCESS) {
>>> +			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>> +				__func__, rc, win, index,
>>> +				pe->phb->hose->global_number,
>>> +				pe->pe_number);
>>> +			return -EIO;
>>>   		}
>>>
>>> -		index = region.start / phb->ioda.io_segsize;
>>> -		while (index < phb->ioda.total_pe &&
>>> -		       region.start <= region.end) {
>>> -			set_bit(index, segmap);
>>> -			set_bit(index, pe_segmap);
>>> -			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> -					pe->pe_number, win, 0, index);
>>> -			if (rc != OPAL_SUCCESS) {
>>> -				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>> -					__func__, rc, win, index,
>>> -					pe->phb->hose->global_number,
>>> -					pe->pe_number);
>>> -				break;
>>> -			}
>>> +		set_bit(index, segmap);
>>> +		set_bit(index, pe_segmap);
>>> +		region.start += segsize;
>>> +		index++;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>> +				  struct pnv_ioda_pe *pe)
>>> +{
>>> +	struct pci_dev *pdev;
>>> +	struct resource *res;
>>> +	int i;
>>> +
>>> +	/* This function only works for bus dependent PE */
>>> +	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>> +
>>> +	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
>>> +		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>> +			res = &pdev->resource[i];
>>> +			if (pnv_ioda_setup_one_res(hose, pe, res))
>>> +				return;
>>> +		}
>>> +
>>> +		/* If the PE contains all subordinate PCI buses, the
>>> +		 * resources of the child bridges should be mapped
>>> +		 * to the PE as well.
>>> +		 */
>>> +		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
>>> +		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
>>> +			continue;
>>>
>>> -			region.start += segsize;
>>> -			index++;
>>> +		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
>>> +			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
>>> +			if (pnv_ioda_setup_one_res(hose, pe, res))
>>> +				return;
>>>   		}
>>>   	}
>>>   }
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  2015-08-11  0:29     ` Gavin Shan
@ 2015-08-11  2:39       ` Alexey Kardashevskiy
  2015-08-12 23:59         ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:39 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 10:29 AM, Gavin Shan wrote:
> On Mon, Aug 10, 2015 at 07:31:11PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> The original implementation of pnv_ioda_setup_dma() iterates the
>>> list of PEs and configures the DMA32 space for them one by one.
>>> The function was designed to be called during PHB fixup time.
>>> When configuring PE's DMA32 space in pcibios_setup_bridge(), in
>>> order to support PCI hotplug, we have to have the function PE
>>> oriented.
>>>
>>> This renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and
>>> adds one more argument "struct pnv_ioda_pe *pe" to it. The caller,
>>> pnv_pci_ioda_setup_DMA(), gets PE from the list and passes to it
>>> or pnv_pci_ioda2_setup_dma_pe(). The patch shouldn't cause behavioral
>>> changes.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++++++++++++++----------------
>>>   1 file changed, 36 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 8456f37..cd22002 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>>   		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>>   }
>>>
>>> -static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>> +static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>> +					struct pnv_ioda_pe *pe,
>>> +					unsigned int base)
>>>   {
>>>   	struct pci_controller *hose = phb->hose;
>>> -	struct pnv_ioda_pe *pe;
>>> -	unsigned int dma_weight;
>>> +	unsigned int dma_weight, segs;
>>>
>>>   	/* Calculate the PHB's DMA weight */
>>>   	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>>   	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>>>   		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>>>
>>> -	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>> -
>>> -	/* Walk our PE list and configure their DMA segments, hand them
>>> -	 * out one base segment plus any residual segments based on
>>> -	 * weight
>>> -	 */
>>> -	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>> -		if (!pe->dma32_weight)
>>> -			continue;
>>> -
>>> -		/*
>>> -		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>> -		 * The all available 32-bits DMA space will be assigned to
>>> -		 * the specific PE.
>>> -		 */
>>> -		if (phb->type == PNV_PHB_IODA1) {
>>> -			unsigned int segs, base = 0;
>>> -
>>> -			if (pe->dma32_weight <
>>> -			    dma_weight / phb->ioda.dma32_segcount)
>>> -				segs = 1;
>>> -			else
>>> -				segs = (pe->dma32_weight *
>>> -					phb->ioda.dma32_segcount) / dma_weight;
>>> -
>>> -			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>>> -				pe->dma32_weight, segs);
>>> -			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>> +	if (pe->dma32_weight <
>>> +	    dma_weight / phb->ioda.dma32_segcount)
>>
>> Can be one line now.
>>
>
> Indeed.
>
>>> +		segs = 1;
>>> +	else
>>> +		segs = (pe->dma32_weight *
>>> +			phb->ioda.dma32_segcount) / dma_weight;
>>> +	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>>> +		pe->dma32_weight, segs);
>>> +	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>
>>
>> Why not merge pnv_ioda1_setup_dma() into pnv_pci_ioda_setup_dma_pe()?
>>
>
> There are two reasons:
> - They're logically separate. One calculates the number of DMA32 segments required;
>    the other allocates TCE32 tables and configures devices with them.
> - In the PCI hotplug path, I need pnv_ioda1_setup_dma(), which takes "pe" as a parameter.


And why does the hotplug path not care about the DMA weight?


>
>>>
>>> -			base += segs;
>>> -		} else {
>>> -			pe_info(pe, "Assign DMA32 space\n");
>>> -			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> -		}
>>> -	}
>>> +	return segs;
>>>   }
>>>
>>>   #ifdef CONFIG_PCI_MSI
>>> @@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void)
>>>   {
>>>   	struct pci_controller *hose, *tmp;
>>>   	struct pnv_phb *phb;
>>> +	struct pnv_ioda_pe *pe;
>>> +	unsigned int base;
>>>
>>>   	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> -		pnv_ioda_setup_dma(hose->private_data);
>>> +		phb = hose->private_data;
>>> +		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>> +
>>> +		base = 0;
>>> +		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>> +			if (!pe->dma32_weight)
>>> +				continue;
>>> +
>>> +			switch (phb->type) {
>>> +			case PNV_PHB_IODA1:
>>> +				base += pnv_ioda1_setup_dma(phb, pe, base);
>>
>>
>> This @base handling seems never to have been tested between 8..11, as
>> "[PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE"
>> removes it, and I suspect you only tested the final version. That is OK for
>> the final result but not OK for bisectability.
>>
>> It looks like 8/42, 9/42, 10/42 and 11/42 need to be rearranged or merged so
>> that @base is not touched multiple times.
>>
>
> Why?

You are touching this @base from 8/42 to 11/42, and in between it is badly
broken; it only gets fixed (by being removed) in 11/42. Read my comment on
8/42. After every single patch in a patchset the functionality should not
break, but in this patchset it does.
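
To make the @base concern concrete, here is a minimal user-space sketch of the IODA1 DMA32 split computed in the quoted hunk. It is not the driver code: struct dma32_pe, ioda1_dma32_segs() and assign_dma32() are hypothetical names, the total weight is simply the sum of the PE weights (the driver gets it from pnv_ioda_phb_dma_weight()), and the TCE table and OPAL setup are omitted. The point is that each PE's first segment is the running @base, which has to advance by the count handed to the previous PE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the DMA32 fields of struct pnv_ioda_pe. */
struct dma32_pe {
	unsigned int dma32_weight;	/* 0: PE needs no DMA32 segments */
	unsigned int dma32_seg;		/* first segment handed out */
	unsigned int dma32_segcount;	/* number of segments handed out */
};

/* Same rule as the quoted hunk: one segment for a below-average
 * weight, otherwise a share proportional to the weight. */
static unsigned int ioda1_dma32_segs(unsigned int pe_weight,
				     unsigned int total_weight,
				     unsigned int segcount)
{
	assert(total_weight > 0 && segcount > 0);
	if (pe_weight < total_weight / segcount)
		return 1;
	return (pe_weight * segcount) / total_weight;
}

/* Hand out consecutive segments starting at 0; @base must advance by
 * the count given to each PE. Returns the number of segments used. */
static unsigned int assign_dma32(struct dma32_pe *pes, size_t n,
				 unsigned int segcount)
{
	unsigned int total = 0, base = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += pes[i].dma32_weight;

	for (i = 0; i < n; i++) {
		if (!pes[i].dma32_weight)
			continue;
		pes[i].dma32_seg = base;
		pes[i].dma32_segcount = ioda1_dma32_segs(pes[i].dma32_weight,
							 total, segcount);
		base += pes[i].dma32_segcount;
	}
	return base;
}
```

For example, weights {0, 10, 30} over 8 segments give the second PE segments 0-1 and the third PE segments 2-7, while the first PE gets nothing.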


>
>>
>>> +				break;
>>> +			case PNV_PHB_IODA2:
>>> +				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> +				break;
>>> +			default:
>>> +				pr_warn("%s: No DMA for PHB type %d\n",
>>> +					__func__, phb->type);
>>> +			}
>>> +		}
>>>
>>>   		/* Mark the PHB initialization done */
>>> -		phb = hose->private_data;
>>>   		phb->initialized = 1;
>>>   	}
>>>   }
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
  2015-08-11  0:38         ` Gavin Shan
@ 2015-08-11  2:47           ` Alexey Kardashevskiy
  2015-08-13  0:23             ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:47 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 10:38 AM, Gavin Shan wrote:
> On Mon, Aug 10, 2015 at 07:53:02PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> Each PHB maintains an array helping to translate RID (Request
>>> ID) to PE# with the assumption that PE# takes 8 bits, indicating
>>> that we can't have more than 256 PEs. However, pci_dn->pe_number
>>> already has 4 bytes for the PE#.
>>>
>>> The patch extends the PE# capacity so that each entry will be
>>> 4 bytes long. Then we can use IODA_INVALID_PE to check whether an
>>> entry in phb->pe_rmap[] is valid or not.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++--
>>>   arch/powerpc/platforms/powernv/pci.h      | 7 +++----
>>>   2 files changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 57ba8fd..3094c61 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>
>>>   	/* Clear the reverse map */
>>>   	for (rid = pe->rid; rid < rid_end; rid++)
>>> -		phb->ioda.pe_rmap[rid] = 0;
>>> +		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>>
>>>   	/* Release from all parents PELT-V */
>>>   	while (parent) {
>>> @@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	unsigned long size, pemap_off;
>>>   	const __be64 *prop64;
>>>   	const __be32 *prop32;
>>> -	int len;
>>> +	int len, i;
>>>   	u64 phb_id;
>>>   	void *aux;
>>>   	long rc;
>>> @@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	if (prop32)
>>>   		phb->ioda.reserved_pe = be32_to_cpup(prop32);
>>>
>>> +	/* Invalidate RID to PE# mapping */
>>> +	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>>> +		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>>> +
>>>   	/* Parse 64-bit MMIO range */
>>>   	pnv_ioda_parse_m64_window(phb);
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index 1dc9578..6f8568e 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -175,11 +175,10 @@ struct pnv_phb {
>>>   			struct list_head	pe_list;
>>>   			struct mutex            pe_list_mutex;
>>>
>>> -			/* Reverse map of PEs, will have to extend if
>>> -			 * we are to support more than 256 PEs, indexed
>>> -			 * bus { bus, devfn }
>>> +			/* Reverse map of PEs, indexed by
>>> +			 * { bus, devfn }
>>>   			 */
>>> -			unsigned char		pe_rmap[0x10000];
>>> +			int			pe_rmap[0x10000];
>>
>>
>> 256KB seems to be a waste when only a tiny fraction of it will ever be used.
>> Using include/linux/hashtable.h makes sense here, and if you use a hashtable,
>> you won't have to initialize anything with IODA_INVALID_PE.
>>
>
> I'm not sure I follow your idea completely. With a hash table tracking the
> RID mapping here, won't more memory be needed if all PCI bus numbers (0
> to 255) are valid? That would mean the hash table has no advantage in
> memory consumption.

You need 2 bytes - one for the bus and one for devfn - which makes a
perfect 32-bit hash key, and you only store existing devices in the hash, so
you do not waste memory.
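
A minimal user-space sketch of the idea, assuming a per-PHB map keyed by the 16-bit RID ((bus << 8) | devfn). It uses a small hand-rolled chained table instead of include/linux/hashtable.h, and the names (struct rid_map, rid_map_set(), rid_map_get(), rid_map_del(), SKETCH_INVALID_PE) are illustrative only. Only RIDs that have a device consume memory, and a lookup for an absent RID simply misses, so nothing has to be pre-filled with IODA_INVALID_PE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RID_HASH_BITS		6	/* 64 buckets; size is illustrative */
#define RID_HASH_SIZE		(1u << RID_HASH_BITS)
#define SKETCH_INVALID_PE	(-1)	/* plays the role of IODA_INVALID_PE */

struct rid_entry {
	uint16_t rid;			/* (bus << 8) | devfn */
	int pe_number;
	struct rid_entry *next;		/* next entry in the same bucket */
};

struct rid_map {
	struct rid_entry *buckets[RID_HASH_SIZE];
};

static uint16_t make_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* Multiplicative (golden-ratio) hash keeping the top RID_HASH_BITS bits. */
static unsigned int rid_hash(uint16_t rid)
{
	return (uint32_t)(rid * 0x61C88647u) >> (32 - RID_HASH_BITS);
}

/* Insert or update; returns -1 only if allocation fails. */
static int rid_map_set(struct rid_map *map, uint16_t rid, int pe_number)
{
	unsigned int b = rid_hash(rid);
	struct rid_entry *e;

	for (e = map->buckets[b]; e; e = e->next) {
		if (e->rid == rid) {
			e->pe_number = pe_number;
			return 0;
		}
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->rid = rid;
	e->pe_number = pe_number;
	e->next = map->buckets[b];
	map->buckets[b] = e;
	return 0;
}

static int rid_map_get(const struct rid_map *map, uint16_t rid)
{
	const struct rid_entry *e;

	for (e = map->buckets[rid_hash(rid)]; e; e = e->next)
		if (e->rid == rid)
			return e->pe_number;
	return SKETCH_INVALID_PE;
}

/* Drop one RID, e.g. when its PE is deconfigured on hot-unplug. */
static void rid_map_del(struct rid_map *map, uint16_t rid)
{
	struct rid_entry **pp = &map->buckets[rid_hash(rid)];

	while (*pp) {
		if ((*pp)->rid == rid) {
			struct rid_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
		pp = &(*pp)->next;
	}
}
```

Memory then scales with the number of populated RIDs rather than a fixed 0x10000-entry array; the cost is a short chain walk per lookup.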


> On the other hand, a lookup in a hash table bucket has to iterate over the
> list of colliding items (keys), which is slow compared to what we have.

How often do you expect this code to execute? Isn't it setup-time and
hotplug only? Unless it runs thousands of times per second, it is not an issue here.


> Actually, I like the idea of using an array to map RID to PE#,
> which was implemented by Ben.

Where?



>
>>
>>>
>>>   			/* 32-bit TCE tables allocation */
>>>   			unsigned long		dma32_segcount;
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order
  2015-08-11  0:43     ` Gavin Shan
@ 2015-08-11  2:50       ` Alexey Kardashevskiy
  2015-08-13  0:28         ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11  2:50 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/11/2015 10:43 AM, Gavin Shan wrote:
> On Tue, Aug 11, 2015 at 12:39:02AM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> The available PE#, represented by a bitmap in the PHB, is allocated
>>> in ascending order.
>>
>> Available PE# is available exactly because it is not allocated ;)
>>
>
> Yeah, will correct it.
>
>>> It conflicts with the fact that M64 segments are
>>> assigned in same order. In order to avoid the conflict, the patch
>>> allocates PE# in descending order.
>>
>> What kind of conflict?
>>
>
> On PHB3, each M64 segment is assigned to one PE whose PE number is
> predetermined. M64 segments are allocated in ascending order, which is
> why I would like to allocate PE# in descending order.


From previous lessons, I thought the M64 segment number is the PE# as well :-/
It seems this is not the case, so what stores this seg#<->PE# mapping in the PHB?
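
For reference, the descending search from the quoted hunk, written as a plain loop. This is a user-space sketch with a hypothetical alloc_pe_desc() over a bool array; unlike the patch it is not atomic (the kernel needs test_and_set_bit() or a lock), and it scans downward directly rather than calling find_next_zero_bit() with a shrinking start offset.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hand out the highest free PE number from used[0..total_pe-1].
 * Returns the PE number, or -1 when every PE is in use.
 * Not atomic: the kernel version relies on test_and_set_bit().
 */
static int alloc_pe_desc(bool *used, unsigned int total_pe)
{
	unsigned int pe = total_pe;

	/* Scan downward; per the patch description, M64 segments
	 * (and so their PE#s) are handed out in ascending order. */
	while (pe-- > 0) {
		if (!used[pe]) {
			used[pe] = true;
			return (int)pe;
		}
	}
	return -1;
}
```

With PE#7 of 8 already taken, the first two calls return 6 and 5.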


>
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
>>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 56b058c..1c950e8 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>   static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>   {
>>>   	unsigned long pe;
>>> +	unsigned long limit = phb->ioda.total_pe_num - 1;
>>>
>>>   	do {
>>>   		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>> -					phb->ioda.total_pe_num, 0);
>>> -		if (pe >= phb->ioda.total_pe_num)
>>> +					phb->ioda.total_pe_num, limit);
>>> +		if (pe < phb->ioda.total_pe_num &&
>>> +		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
>>> +			break;
>>> +
>>> +		if (--limit >= phb->ioda.total_pe_num)
>>>   			return NULL;
>>> -	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>> +	} while (1);
>>
>>
>> Usually, if it is "while(1)", then it is "while(1){}" rather than
>> "do{}while(1)" :)
>
> Agree, will change it.
>
>>
>>
>>>
>>>   	return pnv_ioda_init_pe(phb, pe);
>>>   }
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
  2015-08-06  4:11 ` [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically Gavin Shan
@ 2015-08-11 13:03   ` Alexey Kardashevskiy
  2015-08-13  0:54     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-11 13:03 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> This adds a refcount to the PE, which represents the number of PCI
> devices contained in the PE. When the last device leaves the
> PE, the PE together with its consumed resources (IO, DMA, PELTM,
> PELTV) is released, to support PCI hotplug.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 233 +++++++++++++++++++++++++++---
>   arch/powerpc/platforms/powernv/pci.h      |   3 +
>   2 files changed, 217 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index d2697a3..13d8a5b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -132,6 +132,53 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>   }
>
> +static void pnv_pci_ioda_release_pe_dma(struct pnv_ioda_pe *pe)

Is this ioda1 helper or common helper for both ioda1 and ioda2?

> +{
> +	struct pnv_phb *phb = pe->phb;
> +	struct iommu_table *tbl;
> +	int seg;
> +	int64_t rc;
> +
> +	/* No DMA32 segments allocated */
> +	if (pe->dma32_seg == PNV_INVALID_SEGMENT ||
> +	    pe->dma32_segcount <= 0) {


dma32_segcount is unsigned long, cannot be less than 0.


> +		pe->dma32_seg = PNV_INVALID_SEGMENT;
> +		pe->dma32_segcount = 0;
> +		return;
> +	}
> +
> +	/* Unlink IOMMU table from group */
> +	tbl = pe->table_group.tables[0];
> +	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
> +	if (pe->table_group.group) {
> +		iommu_group_put(pe->table_group.group);
> +		BUG_ON(pe->table_group.group);
> +	}
> +
> +	/* Release IOMMU table */
> +	free_pages(tbl->it_base,
> +		get_order(TCE32_TABLE_SIZE * pe->dma32_segcount));
> +	iommu_free_table(tbl,
> +		of_node_full_name(pci_bus_to_OF_node(pe->pbus)));

There is pnv_pci_ioda2_table_free_pages(), use it.


> +
> +	/* Disable TVE */
> +	for (seg = pe->dma32_seg;
> +	     seg < pe->dma32_seg + pe->dma32_segcount;
> +	     seg++) {
> +		rc = opal_pci_map_pe_dma_window(phb->opal_id,
> +				pe->pe_number, seg, 0, 0ul, 0ul, 0ul);
> +		if (rc)
> +			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
> +				rc, seg);
> +	}

Maybe implement iommu_table_group_ops::unset_window for IODA1 too?


> +
> +	/* Free the DMA32 segments */
> +	bitmap_clear(phb->ioda.dma32_segmap,
> +		pe->dma32_seg, pe->dma32_segcount);
> +	pe->dma32_seg = PNV_INVALID_SEGMENT;
> +	pe->dma32_segcount = 0;
> +}
> +
>   static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
>   {
>   	/* 01xb - invalidate TCEs that match the specified PE# */
> @@ -199,13 +246,15 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>   		pe->tce_bypass_enabled = enable;
>   }
>
> -#ifdef CONFIG_PCI_IOV
> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
> -					 struct pnv_ioda_pe *pe)
> +static void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
>   {
>   	struct iommu_table    *tbl;
> +	struct device_node    *dn;
>   	int64_t               rc;
>
> +	if (pe->dma32_seg == PNV_INVALID_SEGMENT)
> +		return;
> +
>   	tbl = pe->table_group.tables[0];
>   	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>   	if (rc)
> @@ -216,10 +265,91 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
>   		iommu_group_put(pe->table_group.group);
>   		BUG_ON(pe->table_group.group);
>   	}
> +
> +	if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> +		dn = pci_bus_to_OF_node(pe->pbus);
> +	else if (pe->flags & PNV_IODA_PE_DEV)
> +		dn = pci_device_to_OF_node(pe->pdev);
> +#ifdef CONFIG_PCI_IOV
> +	else if (pe->flags & PNV_IODA_PE_VF)
> +		dn = pci_device_to_OF_node(pe->parent_dev);
> +#endif
> +	else
> +		dn = NULL;
> +
>   	pnv_pci_ioda2_table_free_pages(tbl);
> -	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> +	iommu_free_table(tbl, of_node_full_name(dn));
> +	pe->dma32_seg = PNV_INVALID_SEGMENT;
> +}



I'd drop the chunk calculating @dn above; nobody really cares what
iommu_free_table() prints. If you really need to print something, print the PE#.


> +
> +static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		pnv_pci_ioda_release_pe_dma(pe);
> +		break;
> +	case PNV_PHB_IODA2:
> +		pnv_pci_ioda2_release_pe_dma(pe);
> +		break;
> +	default:
> +		pr_warn("%s: Cannot release DMA for PHB type %d\n",
> +			__func__, phb->type);

This should really be a BUG_ON(), because we cannot possibly get this far with
an unsupported PHB type; it would have crashed earlier.


> +	}
> +}
> +
> +static void pnv_ioda_release_pe_one_seg(struct pnv_ioda_pe *pe, int win)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	unsigned long *segmap = NULL;
> +	unsigned long *pe_segmap = NULL;
> +	int segno, limit, mod = 0;
> +
> +	switch (win) {
> +	case OPAL_IO_WINDOW_TYPE:
> +		segmap = phb->ioda.io_segmap;
> +		pe_segmap = pe->io_segmap;
> +		break;
> +	case OPAL_M32_WINDOW_TYPE:
> +		segmap = phb->ioda.m32_segmap;
> +		pe_segmap = pe->m32_segmap;
> +		break;
> +	case OPAL_M64_WINDOW_TYPE:
> +		if (phb->type != PNV_PHB_IODA1)
> +			return;
> +		segmap = phb->ioda.m64_segmap;
> +		pe_segmap = pe->m64_segmap;


You seem to keep phb->ioda.m64_segmap updated but you never actually read
it; you only read pe->m64_segmap. Is that correct, or am I missing something
here?


> +		mod = 8;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	segno = -1;
> +	limit = phb->ioda.total_pe_num;
> +	while ((segno = find_next_bit(pe_segmap, limit, segno + 1)) < limit) {
> +		if (mod > 0)
> +			opal_pci_map_pe_mmio_window(phb->opal_id,
> +				phb->ioda.reserved_pe_idx, win,
> +				segno / mod, segno % mod);
> +		else
> +			opal_pci_map_pe_mmio_window(phb->opal_id,
> +					phb->ioda.reserved_pe_idx, win,
> +					0, segno);
> +
> +		clear_bit(segno, pe_segmap);
> +		clear_bit(segno, segmap);
> +	}
> +}
> +
> +static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
> +{
> +	int win;
> +
> +	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++)
> +		pnv_ioda_release_pe_one_seg(pe, win);
>   }
> -#endif /* CONFIG_PCI_IOV */
>
>   static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
>   				  struct pnv_ioda_pe *parent,
> @@ -325,7 +455,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>   	return 0;
>   }
>
> -#ifdef CONFIG_PCI_IOV
>   static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   {
>   	struct pci_dev *parent;
> @@ -373,9 +502,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>   		}
>   		rid_end = pe->rid + (count << 8);
>   	} else {
> +#ifdef CONFIG_PCI_IOV
>   		if (pe->flags & PNV_IODA_PE_VF)
>   			parent = pe->parent_dev;
>   		else
> +#endif
>   			parent = pe->pdev->bus->self;
>   		bcomp = OpalPciBusAll;
>   		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
> @@ -415,11 +546,72 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>
>   	pe->pbus = NULL;
>   	pe->pdev = NULL;
> +#ifdef CONFIG_PCI_IOV
>   	pe->parent_dev = NULL;
> +#endif
>
>   	return 0;
>   }
> -#endif /* CONFIG_PCI_IOV */
> +
> +static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
> +{
> +	struct pnv_phb *phb = pe->phb;
> +	struct pnv_ioda_pe *tmp, *slave;
> +
> +	/* Release slave PEs in compound PE */
> +	if (pe->flags & PNV_IODA_PE_MASTER) {
> +		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
> +			pnv_ioda_release_pe(slave);
> +	}
> +
> +	/* Remove the PE from the list */
> +	list_del(&pe->list);
> +
> +	/* Release resources */
> +	pnv_ioda_release_pe_dma(pe);
> +	pnv_ioda_release_pe_seg(pe);
> +	pnv_ioda_deconfigure_pe(pe->phb, pe);
> +
> +	/* Release PE number */
> +	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
> +}
> +
> +static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
> +{
> +	if (!pe)
> +		return NULL;
> +
> +	pe->device_count++;
> +	return pe;
> +}
> +
> +static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
> +{
> +	if (!pe)
> +		return;
> +
> +	pe->device_count--;
> +	BUG_ON(pe->device_count < 0);
> +	if (pe->device_count == 0)
> +		pnv_ioda_release_pe(pe);
> +}

Are you sure you do not want atomic_t for device_count? Are races impossible here?
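
If races are possible, one option is to make the count itself atomic and tear the PE down only on the 1 -> 0 transition. A minimal C11 user-space sketch of that pattern follows; in the kernel, refcount_t or struct kref would be the more usual tool, and struct ref_pe, pe_get(), pe_put() and release_pe() are hypothetical names. Note that an atomic counter alone would not serialize the list_del() in pnv_ioda_release_pe(); that presumably still needs phb->ioda.pe_list_mutex or a similar lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ref_pe {
	atomic_int device_count;	/* devices currently using this PE */
	struct ref_pe *slaves;		/* first slave of a compound PE */
	struct ref_pe *next_slave;	/* sibling link within the slave list */
	int released;			/* set once the PE has been torn down */
};

/* Tear down a PE, releasing each slave of a compound PE first. */
static void release_pe(struct ref_pe *pe)
{
	struct ref_pe *slave;

	for (slave = pe->slaves; slave; slave = slave->next_slave)
		release_pe(slave);
	/* DMA, M32/M64/IO segment and PELT-V teardown would go here. */
	pe->released = 1;
}

static struct ref_pe *pe_get(struct ref_pe *pe)
{
	if (pe)
		atomic_fetch_add(&pe->device_count, 1);
	return pe;
}

static void pe_put(struct ref_pe *pe)
{
	int old;

	if (!pe)
		return;
	old = atomic_fetch_sub(&pe->device_count, 1);
	assert(old > 0);	/* same intent as BUG_ON(device_count < 0) */
	if (old == 1)		/* this put dropped the last reference */
		release_pe(pe);
}
```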


> +
> +static void pnv_pci_release_device(struct pci_dev *pdev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dn *pdn = pci_get_pdn(pdev);
> +	struct pnv_ioda_pe *pe;
> +
> +	if (pdev->is_virtfn)
> +		return;
> +
> +	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
> +		return;
> +
> +	pe = &phb->ioda.pe_array[pdn->pe_number];
> +	pnv_ioda_pe_put(pe);
> +}
>
>   static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>   {
> @@ -466,6 +658,7 @@ static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>   	return pnv_ioda_init_pe(phb, pe);
>   }
>
> +#ifdef CONFIG_PCI_IOV
>   static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)

The name of pnv_ioda_free_pe() suggests it should work for the non-SR-IOV case
too, but you put it under #ifdef CONFIG_PCI_IOV; is that correct? If so, please rename it.


>   {
>   	WARN_ON(phb->ioda.pe_array[pe].pdev);
> @@ -473,6 +666,7 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>   	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
>   	clear_bit(pe, phb->ioda.pe_alloc);
>   }
> +#endif
>
>   static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>   {
> @@ -1177,6 +1371,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   		if (pdn->pe_number != IODA_INVALID_PE)
>   			continue;
>
> +		pnv_ioda_pe_get(pe);
>   		pdn->pe_number = pe->pe_number;
>   		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> @@ -1231,7 +1426,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>   	pe->pbus = bus;
>   	pe->pdev = NULL;
> -	pe->dma32_seg = -1;
> +	pe->dma32_seg = PNV_INVALID_SEGMENT;
>   	pe->mve_number = -1;
>   	pe->rid = bus->busn_res.start << 8;
>   	pe->dma32_weight = 0;
> @@ -1244,9 +1439,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   			bus->busn_res.start, pe->pe_number);
>
>   	if (pnv_ioda_configure_pe(phb, pe)) {
> -		/* XXX What do we do here ? */
> -		pnv_ioda_free_pe(phb, pe->pe_number);
>   		pe->pbus = NULL;
> +		pnv_ioda_release_pe(pe);
>   		return NULL;
>   	}
>
> @@ -1449,14 +1643,14 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   		if ((pe->flags & PNV_IODA_PE_MASTER) &&
>   		    (pe->flags & PNV_IODA_PE_VF)) {
>   			list_for_each_entry_safe(s, sn, &pe->slaves, list) {
> -				pnv_pci_ioda2_release_dma_pe(pdev, s);
> +				pnv_pci_ioda2_release_pe_dma(s);
>   				list_del(&s->list);
>   				pnv_ioda_deconfigure_pe(phb, s);
>   				pnv_ioda_free_pe(phb, s->pe_number);
>   			}
>   		}
>
> -		pnv_pci_ioda2_release_dma_pe(pdev, pe);
> +		pnv_pci_ioda2_release_pe_dma(pe);
>
>   		/* Remove from list */
>   		mutex_lock(&phb->ioda.pe_list_mutex);
> @@ -1532,7 +1726,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   		pe->flags = PNV_IODA_PE_VF;
>   		pe->pbus = NULL;
>   		pe->parent_dev = pdev;
> -		pe->dma32_seg = -1;
> +		pe->dma32_seg = PNV_INVALID_SEGMENT;


This and similar changes are not really about "Release PEs dynamically".


>   		pe->mve_number = -1;
>   		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>   			   pci_iov_virtfn_devfn(pdev, vf_index);
> @@ -1995,7 +2189,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>
>   	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->dma32_seg >= 0))
> +	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
>   		return;
>
>   	tbl = pnv_pci_table_alloc(phb->hose->node);
> @@ -2066,10 +2260,10 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>   	return;
>    fail:
>   	/* XXX Failure: Try to fallback to 64-bit only ? */
> -	if (pe->dma32_seg >= 0) {
> +	if (pe->dma32_seg != PNV_INVALID_SEGMENT) {
>   		bitmap_clear(phb->ioda.dma32_segmap,
>   			     pe->dma32_seg, pe->dma32_segcount);
> -		pe->dma32_seg = -1;
> +		pe->dma32_seg = PNV_INVALID_SEGMENT;
>   		pe->dma32_segcount = 0;
>   	}
>
> @@ -2416,7 +2610,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   	int64_t rc;
>
>   	/* We shouldn't already have a 32-bit DMA associated */
> -	if (WARN_ON(pe->dma32_seg >= 0))
> +	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
>   		return;
>
>   	/* TVE #1 is selected by PCI address bit 59 */
> @@ -2443,8 +2637,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>
>   	rc = pnv_pci_ioda2_setup_default_config(pe);
>   	if (rc) {
> -		if (pe->dma32_seg >= 0)
> -			pe->dma32_seg = -1;
> +		if (pe->dma32_seg != PNV_INVALID_SEGMENT)
> +			pe->dma32_seg = PNV_INVALID_SEGMENT;
>   		return;
>   	}
>
> @@ -3183,6 +3377,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>          .teardown_msi_irqs = pnv_teardown_msi_irqs,
>   #endif
>          .enable_device_hook = pnv_pci_enable_device_hook,
> +	.release_device = pnv_pci_release_device,
>          .window_alignment = pnv_pci_window_alignment,
>   	.setup_bridge = pnv_pci_setup_bridge,
>          .reset_secondary_bus = pnv_pci_reset_secondary_bus,
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index f8e6022..2058f06 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -25,11 +25,14 @@ enum pnv_phb_model {
>   #define PNV_IODA_PE_SLAVE	(1 << 4)	/* Slave PE in compound case	*/
>   #define PNV_IODA_PE_VF		(1 << 5)	/* PE for one VF 		*/
>
> +#define PNV_INVALID_SEGMENT	(-1)
> +
>   /* Data associated with a PE, including IOMMU tracking etc.. */
>   struct pnv_phb;
>   struct pnv_ioda_pe {
>   	unsigned long		flags;
>   	struct pnv_phb		*phb;
> +	int			device_count;
>
>   	/* A PE can be associated with a single device or an
>   	 * entire bus (& children). In the former case, pdev
>


-- 
Alexey


* Re: [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC
  2015-08-11  2:06       ` Alexey Kardashevskiy
@ 2015-08-12 10:28         ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 10:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:06:26PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 09:45 AM, Gavin Shan wrote:
>>On Mon, Aug 10, 2015 at 04:30:09PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>The patch enables M64 window on P7IOC, which has been enabled on
>>>>PHB3. Different from PHB3 where 16 M64 BARs are supported and each
>>>>of them can be owned by one particular PE# exclusively or divided
>>>>evenly to 256 segments, each P7IOC PHB has 16 M64 BARs and each
>>>>of them are divided into 8 segments.
>>>
>>>Is this a limitation of the POWER7 chip, or is it from IODA1?
>>>
>>
>> From IODA1.
>>
>>>>So each P7IOC PHB can support
>>>>128 M64 segments only. Also, P7IOC has M64DT, which helps mapping
>>>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>>>M64DT, indicating that one M64 segment can only be pinned to the
>>>>fixed PE#. In order to have similar logic to support M64 for PHB3
>>>>and P7IOC, we just provide 128 M64 (16 BARs) segments and fixed
>>>>mapping between PE# and M64 segment# on P7IOC. In turn, we just
>>>>need different phb->init_m64() hooks for P7IOC and PHB3 to support
>>>>M64.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 116 ++++++++++++++++++++++++++----
>>>>  1 file changed, 104 insertions(+), 12 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 38b5405..e4ac703 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -172,6 +172,69 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>>>  	clear_bit(pe, phb->ioda.pe_alloc);
>>>>  }
>>>>
>>>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>>>+{
>>>>+	struct resource *r;
>>>>+	int seg;
>>>>+
>>>>+	/* There are as many M64 segments as the maximum number
>>>>+	 * of PEs, which is 128.
>>>>+	 */
>>>>+	for (seg = 0; seg < phb->ioda.total_pe; seg += 8) {
>>>
>>>
>>>This "8" is used a lot across the patch, please make it a macro
>>>(PNV_PHB_P7IOC_SEGNUM or PNV_PHB_IODA1_SEGNUM or whatever you think it is)
>>>with a short comment why it is "8". Or a pnv_phb member.
>>>
>>
>>I would prefer to use "8". With a macro, you have to look up
>>its definition to get the real value.
>
>Give it a good name then.
>
>
>>However,
>>it makes sense to add more comments explaining why it's 8 here.
>
>You cannot comment it everywhere, and every place is exactly where you would
>have to comment it, as I believe sometimes it is segments-per-M64 and sometimes
>it is the number of bits in a byte (or not? Anyway, this will always distract
>unless you use a macro for segments-per-M64).
>

Ok. I will use PNV_PHB_IODA1_SEGNUM then.
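
Something like the following, as a sketch: it assumes PNV_PHB_IODA1_SEGNUM is defined as the per-BAR segment count (8), while IODA1_M64_BARS and the two helpers are hypothetical names used only to spell out the fixed mapping where PE#n lives in M64 BAR n / 8, segment n % 8.

```c
#include <assert.h>

#define PNV_PHB_IODA1_SEGNUM	8	/* M64 segments per BAR on IODA1 */
#define IODA1_M64_BARS		16	/* M64 BARs per P7IOC PHB */

/* 16 BARs x 8 segments: one M64 segment per PE. */
static_assert(IODA1_M64_BARS * PNV_PHB_IODA1_SEGNUM == 128,
	      "P7IOC PHB provides 128 M64 segments");

/* With the fixed PE# <-> segment mapping, the PE# alone locates
 * the M64 BAR and the segment inside it. */
static inline int ioda1_m64_bar(int pe_number)
{
	return pe_number / PNV_PHB_IODA1_SEGNUM;
}

static inline int ioda1_m64_seg(int pe_number)
{
	return pe_number % PNV_PHB_IODA1_SEGNUM;
}
```

So the reserved PE#127 maps to segment 7 of BAR 15, the last of the 128 segments.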

>>
>>>
>>>>+		unsigned long base;
>>>>+		int64_t rc;
>>>>+
>>>>+		base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize;
>>>>+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>>+						 OPAL_M64_WINDOW_TYPE,
>>>>+						 seg / 8,
>>>>+						 base,
>>>>+						 0, /* unused */
>>>>+						 8 * phb->ioda.m64_segsize);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, seg / 8);
>>>>+			goto fail;
>>>>+		}
>>>>+
>>>>+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>+					      OPAL_M64_WINDOW_TYPE,
>>>>+					      seg / 8,
>>>>+					      OPAL_ENABLE_M64_SPLIT);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>>>+				rc, phb->hose->global_number, seg / 8);
>>>>+			goto fail;
>>>>+		}
>>>>+	}
>>>>+
>>>>+	/* Strip off the segment used by the reserved PE, which
>>>
>>>What is this reserved PE on P7IOC? "Strip off" means "exclude" here?
>>>
>>
>>PE#127, which is exported by skiboot. "Strip off" means "exclude".
>
>I like "exclude" lot better.
>

Ok. Will use it.

>>
>>>
>>>>+	 * is expected to be 0 or last supported PE#. The PHB's
>>>>+	 * first memory window traces the 32-bits MMIO range
>>>
>>>s/traces/filters/ ? Or I did not understand this comment...
>>>
>>
>>It seems you didn't understand it: there are two memory windows
>>in every PHB. The first one is tracing M32 resource and the
>>second one is tracing M64 resource.
>
>
>Tracing means logging, pretty much. Is this what you mean here?
>

No, it means "recording", not "logging". So would it be appropriate
to replace it with "track"?

>>
>>>
>>>>+	 * while the second one traces the 64-bits prefetchable
>>>>+	 * MMIO range that the PHB supports.
>>>
>>>32/64 ranges comment seems irrelevant here.
>>>
>>
>>Maybe it's not so relevant, but still.
>
>Not relevant -> remove it. Put this text to the commit log.
>

Ok.

>>We're stripping off the
>>M64 segment from the 2nd resource (as above), not the first one.
>
>
>2nd window (not _resource_), you mean?
>

I mean struct pci_controller::mem_resources[1].
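
In other words, the reserved PE's M64 segment can only be excluded when it sits at one end of that window: PE#0 moves the start up by one segment and the last PE moves the end down by one. A user-space sketch of that check, with a hypothetical struct m64_range standing in for struct resource (inclusive start/end) and a hypothetical exclude_reserved_seg():

```c
#include <assert.h>
#include <stdbool.h>

struct m64_range {
	unsigned long long start;	/* inclusive, like struct resource */
	unsigned long long end;		/* inclusive */
};

/* Returns false when the reserved PE's segment is in the middle of
 * the window and cannot be excluded by adjusting start/end. */
static bool exclude_reserved_seg(struct m64_range *r, int reserved_pe,
				 int total_pe, unsigned long long segsize)
{
	if (reserved_pe == 0)
		r->start += segsize;
	else if (reserved_pe == total_pe - 1)
		r->end -= segsize;
	else
		return false;
	return true;
}
```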


>
>>
>>>
>>>>+	 */
>>>>+	r = &phb->hose->mem_resources[1];
>>>>+	if (phb->ioda.reserved_pe == 0)
>>>>+		r->start += phb->ioda.m64_segsize;
>>>>+	else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
>>>>+		r->end -= phb->ioda.m64_segsize;
>>>>+	else
>>>>+		pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
>>>>+			phb->ioda.reserved_pe);
>>>>+
>>>>+	return 0;
>>>>+
>>>>+fail:
>>>>+	for ( ; seg >= 0; seg -= 8)
>>>>+		opal_pci_phb_mmio_enable(phb->opal_id,
>>>>+					 OPAL_M64_WINDOW_TYPE,
>>>>+					 seg / 8,
>>>>+					 OPAL_DISABLE_M64);
>>>>+
>>>>+	return -EIO;
>>>>+}
>>>>+
>>>>  /* The default M64 BAR is shared by all PEs */
>>>>  static int pnv_ioda2_init_m64(struct pnv_phb *phb)
>>>>  {
>>>>@@ -256,9 +319,9 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
>>>>  	}
>>>>  }
>>>>
>>>>-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>>>-				     unsigned long *pe_bitmap,
>>>>-				     bool all)
>>>>+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>>>+				    unsigned long *pe_bitmap,
>>>>+				    bool all)
>>>>  {
>>>>  	struct pci_dev *pdev;
>>>>
>>>>@@ -266,12 +329,12 @@ static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
>>>>  		pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
>>>>
>>>>  		if (all && pdev->subordinate)
>>>>-			pnv_ioda2_reserve_m64_pe(pdev->subordinate,
>>>>-						 pe_bitmap, all);
>>>>+			pnv_ioda_reserve_m64_pe(pdev->subordinate,
>>>>+						pe_bitmap, all);
>>>>  	}
>>>>  }
>>>>
>>>>-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  {
>>>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>@@ -293,7 +356,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  	}
>>>>
>>>>  	/* Figure out reserved PE numbers by the PE */
>>>>-	pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
>>>>+	pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
>>>>
>>>>  	/*
>>>>  	 * the current bus might not own M64 window and that's all
>>>>@@ -324,6 +387,26 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  			pe->master = master_pe;
>>>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>>>  		}
>>>>+
>>>>+		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>>+		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>+		 * between M64 segment and PE#. In order to have same logic
>>>>+		 * for P7IOC and PHB3, we enforce fixed mapping between M64
>>>>+		 * segment and PE# on P7IOC.
>>>>+		 */
>>>>+		if (phb->type == PNV_PHB_IODA1) {
>>>>+			int64_t rc;
>>>>+
>>>>+			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>+							 pe->pe_number,
>>>>+							 OPAL_M64_WINDOW_TYPE,
>>>>+							 pe->pe_number / 8,
>>>>+							 pe->pe_number % 8);
>>>>+			if (rc != OPAL_SUCCESS)
>>>>+				pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
>>>>+					__func__, rc, phb->hose->global_number,
>>>>+					pe->pe_number);
>>>>+		}
>>>>  	}
>>>>
>>>>  	kfree(pe_alloc);
>>>>@@ -338,8 +421,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>>  	const u32 *r;
>>>>  	u64 pci_addr;
>>>>
>>>>-	/* FIXME: Support M64 for P7IOC */
>>>>-	if (phb->type != PNV_PHB_IODA2) {
>>>>+	if (phb->type != PNV_PHB_IODA1 &&
>>>>+	    phb->type != PNV_PHB_IODA2) {
>>>>  		pr_info("  Not support M64 window\n");
>>>>  		return;
>>>
>>>
>>>You are adding P7IOC support so at least "fixme" should go. Also,
>>>pnv_ioda_parse_m64_window() is only called from pnv_pci_init_ioda_phb() which
>>>is called only with PNV_PHB_IODA1 and PNV_PHB_IODA2 (no other value is passed
>>>there a type) so the check above will never succeed, just remove it.
>>>
>>
>>The "fixme" is removed, isn't it?
>
>Ah, my bad.
>
>
>>As I explained last time, there will have another new type PHB and the function
>>will be called on the new type of PHB.
>
>Then a new patch adding new PHB should take care of this check too. This is
>not something which can possibly happen on a real machine, we support one of
>2 (later - 3) PHBs and if a machine got something else, we won't get that far
>anyway and we cannot gracefully fallback to some "generic PHB" (like 440fx on
>x86) as we do not have one.
>
>At least make it BUG_ON() to document it.
>

ok. I'll change accordingly.
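Here is a sketch of the BUG_ON() suggestion. All names are hypothetical: enum phb_type, pick_init_m64() and the two stub hooks are illustrative, not the kernel's. The switch keeps only the two real cases and turns the impossible one into an assertion instead of a pr_debug(). In kernel code, the assert() would be the BUG_ON() suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHB types and the init_m64 hooks. */
enum phb_type { PHB_IODA1, PHB_IODA2 };

typedef int (*init_m64_fn)(void);

static int ioda1_init_m64(void) { return 1; }
static int ioda2_init_m64(void) { return 2; }

static init_m64_fn pick_init_m64(enum phb_type type)
{
	switch (type) {
	case PHB_IODA1:
		return ioda1_init_m64;
	case PHB_IODA2:
		return ioda2_init_m64;
	}
	/* Unreachable: callers only ever pass IODA1 or IODA2. */
	assert(!"unexpected PHB type");
	return NULL;
}
```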

>>The code has been there and it's not
>>in upstream yet. So it's reasonable to keep it, instead of removing it.
>
>No, not really.
>
>>
>>>>  	}
>>>>@@ -372,9 +455,18 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
>>>>
>>>>  	/* Use last M64 BAR to cover M64 window */
>>>>  	phb->ioda.m64_bar_idx = 15;
>>>>-	phb->init_m64 = pnv_ioda2_init_m64;
>>>>-	phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
>>>>-	phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
>>>>+	phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
>>>>+	phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
>>>>+	switch (phb->type) {
>>>>+	case PNV_PHB_IODA1:
>>>>+		phb->init_m64 = pnv_ioda1_init_m64;
>>>>+		break;
>>>>+	case PNV_PHB_IODA2:
>>>>+		phb->init_m64 = pnv_ioda2_init_m64;
>>>>+		break;
>>>>+	default:
>>>>+		pr_debug("   M64 not supported\n");
>>>>+	}
>>>>  }
>>>>
>>>>  static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
>>>>

Thanks,
Gavin


* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-11  2:23       ` Alexey Kardashevskiy
@ 2015-08-12 10:45         ` Gavin Shan
  2015-08-12 11:05           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 10:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>
>>>The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>>>all about? Also, there was no m64_segmap, now there is, needs an explanation
>>>may be.
>>>
>>
>>Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically.
>>Now, they have fixed sizes - 512 bits.
>>
>>The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>why m64_segmap is added.
>
>
>But before this patch, you somehow managed to keep it working without a map
>for M64, by the same time you needed map for IO and M32. It seems you are
>making things consistent in this patch but it also feels like you do not have
>to do so as M64 did not need a map before and I cannot see why it needs one
>now.
>

The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically,
where the M64 segments consumed by a particular PE are released.

>>>
>>>>the consumed by one particular PE, which can be released once the PE
>>>>is destroyed during PCI unplugging time. Also, we're using fixed
>>>>quantity of bits to trace the used IO, M32 and M64 segments by PEs
>>>>in one particular PHB.
>>>>
>>>
>>>Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>>>PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>>>is using? Not sure about this master/slave PEs though.
>>>
>>
>>I don't follow your suggestion. Can you rephrase and explain it a bit more?
>
>
>Please explains in what situations you need same map in both PHB and PE and
>how you are going to use them. For example, pe::m64_segmap and
>phb::m64_segmap.
>
>I believe you need to know what segment is used by what PE and that's it and
>having 2 bitmaps is overcomplicated hard to follow. Is there anything else
>what I am missing?
>

The situation is the same for all (IO, M32 and M64) segment maps. Taking
m64_segmap as an example, it is used when creating or destroying a PE that
consumes M64 segments. phb::m64_segmap records M64 segment usage across the
PHB's domain and is used to make sure the same M64 segment isn't used twice.
pe::m64_segmap tracks the M64 segments consumed by that PE.
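Here is a rough user-space model of the two-map scheme described above. It assumes 512 segments and uses hand-rolled bit helpers in place of the kernel's set_bit()/test_bit()/clear_bit(). struct phb_maps, struct pe_maps, claim_seg() and release_segs() are hypothetical names. The PHB-wide map rejects a second claim on the same segment. The per-PE map records exactly which segments to give back when the PE is destroyed.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_SEG		512u	/* assumed upper bound on segments */
#define BITS_PER_UL	(sizeof(unsigned long) * CHAR_BIT)
#define SEG_LONGS	((MAX_SEG + BITS_PER_UL - 1) / BITS_PER_UL)

struct phb_maps { unsigned long m64_segmap[SEG_LONGS]; };
struct pe_maps  { unsigned long m64_segmap[SEG_LONGS]; };

static bool test_bit_ul(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1UL;
}

static void set_bit_ul(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

static void clear_bit_ul(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_UL] &= ~(1UL << (nr % BITS_PER_UL));
}

/* Claim segment 'seg' for a PE; fail if the PHB already handed it out. */
static int claim_seg(struct phb_maps *phb, struct pe_maps *pe,
		     unsigned int seg)
{
	if (seg >= MAX_SEG || test_bit_ul(seg, phb->m64_segmap))
		return -1;
	set_bit_ul(seg, phb->m64_segmap);	/* PHB-wide: no double use */
	set_bit_ul(seg, pe->m64_segmap);	/* per-PE: what to release */
	return 0;
}

/* On PE destruction, give back exactly the segments this PE owns. */
static void release_segs(struct phb_maps *phb, struct pe_maps *pe)
{
	for (unsigned int seg = 0; seg < MAX_SEG; seg++) {
		if (test_bit_ul(seg, pe->m64_segmap)) {
			clear_bit_ul(seg, phb->m64_segmap);
			clear_bit_ul(seg, pe->m64_segmap);
		}
	}
}
```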

>>>It would be easier to read patches if this one was right before
>>>[PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>>
>>
>>I'll try to reoder the patch, but not expect too much...
>>
>>>
>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>>>  arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index e4ac703..78b49a1 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>>>  		}
>>>>
>>>>+		/* M64 segments consumed by slave PEs are tracked
>>>>+		 * by master PE
>>>>+		 */
>>>>+		set_bit(pe->pe_number, master_pe->m64_segmap);
>>>>+		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>>>+
>>>>  		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>>  		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>  		 * between M64 segment and PE#. In order to have same logic
>>>>@@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>
>>>>  			while (index < phb->ioda.total_pe &&
>>>>  			       region.start <= region.end) {
>>>>-				phb->ioda.io_segmap[index] = pe->pe_number;
>>>>+				set_bit(index, pe->io_segmap);
>>>>+				set_bit(index, phb->ioda.io_segmap);
>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>>>+					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>>>+					0, index);
>>>
>>>Unrelated change.
>>>
>>
>>True, will drop. However, checkpatch.pl will complain wtih:
>>exceeding 80 characters.
>
>It will not as you are not changing these lines, it only complains on changes.
>
>
>
>>
>>>>  				if (rc != OPAL_SUCCESS) {
>>>>  					pr_err("%s: OPAL error %d when mapping IO "
>>>>  					       "segment #%d to PE#%d\n",
>>>>@@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>
>>>>  			while (index < phb->ioda.total_pe &&
>>>>  			       region.start <= region.end) {
>>>>-				phb->ioda.m32_segmap[index] = pe->pe_number;
>>>>+				set_bit(index, pe->m32_segmap);
>>>>+				set_bit(index, phb->ioda.m32_segmap);
>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>>>+					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>>>+					0, index);
>>>
>>>Unrelated change.
>>>
>>
>>same as above.
>>
>>>>  				if (rc != OPAL_SUCCESS) {
>>>>  					pr_err("%s: OPAL error %d when mapping M32 "
>>>>  					       "segment#%d to PE#%d",
>>>>@@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>  {
>>>>  	struct pci_controller *hose;
>>>>  	struct pnv_phb *phb;
>>>>-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>>>+	unsigned long size, pemap_off;
>>>>  	const __be64 *prop64;
>>>>  	const __be32 *prop32;
>>>>  	int len;
>>>>@@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>
>>>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>
>>>
>>>This comment came with if(IODA1) below, since you are removing the condition
>>>below, makes sense to remove the comment as well or move it where people will
>>>look for it (arch/powerpc/platforms/powernv/pci.h ?)
>>>
>>
>>Yes, will do.
>>
>>>
>>>>  	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>>>-	m32map_off = size;
>>>>-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>>>-	if (phb->type == PNV_PHB_IODA1) {
>>>>-		iomap_off = size;
>>>>-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>>>-	}
>>>>  	pemap_off = size;
>>>>  	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>>>  	aux = memblock_virt_alloc(size, 0);
>>>
>>>
>>>After adding static arrays to PE and PHB, do you still need this "aux"?
>>>
>>
>>"aux" is still needed to tell the boundary of pe_alloc_bitmap and pe_array.
>>>
>>>>  	phb->ioda.pe_alloc = aux;
>>>>-	phb->ioda.m32_segmap = aux + m32map_off;
>>>>-	if (phb->type == PNV_PHB_IODA1)
>>>>-		phb->ioda.io_segmap = aux + iomap_off;
>>>>  	phb->ioda.pe_array = aux + pemap_off;
>>>>  	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>index 62239b1..08a4e57 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>@@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>>>  	/* PE number */
>>>>  	unsigned int		pe_number;
>>>>
>>>>+	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>>>+	 * have one M64 segment at most, but M64 segments consumed
>>>>+	 * by slave PEs will be contributed to the master PE. One
>>>>+	 * PE can own multiple IO and M32 segments.
>>>
>>>
>>>A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>>>correct for IODA1 or IODA2 or both? Is this a limitation of this
>>>implementation or it comes from P7IOC/PHB3 hardware?
>>>
>>
>>It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
>>multiple M64 segments as well.
>
>
>But the comment says "Each PE can have one M64 segment at most". Which
>statement is correct?
>

The comment is correct with regard to the PHB's last M64 BAR (index 15): each
PE can have at most one M64 segment there, which is a hardware limitation.
However, when a PE needs multiple M64 segments, all of those segments are
tracked by the "master" PE, and that part is determined by the software
implementation.
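To make the master/slave point concrete, here is a sketch that assumes the PHB3 fixed mapping mentioned in the quoted patch (M64 segment n of the shared BAR belongs to PE#n) and equally sized segments. Under that mapping, a BAR that spans several segments needs several PEs, and the master records all of their segments. track_m64_range() and its parameters are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

#define MAX_SEG		512u	/* assumed upper bound on segments */
#define BITS_PER_UL	(sizeof(unsigned long) * CHAR_BIT)
#define SEG_LONGS	((MAX_SEG + BITS_PER_UL - 1) / BITS_PER_UL)

/*
 * Marks every M64 segment covered by [start, end] in the master PE's
 * bitmap. With fixed mapping, segment# == PE#, so the first segment's
 * PE acts as the master and the others are its slaves.
 * Returns the master PE number, or -1 if the range is invalid.
 */
static int track_m64_range(uint64_t win_base, uint64_t segsize,
			   unsigned int total_pe, uint64_t start,
			   uint64_t end, unsigned long *master_map)
{
	if (segsize == 0 || start < win_base || end < start)
		return -1;

	uint64_t first = (start - win_base) / segsize;
	uint64_t last = (end - win_base) / segsize;

	if (last >= total_pe || last >= MAX_SEG)
		return -1;

	for (uint64_t seg = first; seg <= last; seg++)
		master_map[seg / BITS_PER_UL] |= 1UL << (seg % BITS_PER_UL);

	return (int)first;	/* master PE number */
}
```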

>>>>+	 */
>>>>+	unsigned long		io_segmap[8];
>>>>+	unsigned long		m32_segmap[8];
>>>>+	unsigned long		m64_segmap[8];
>>>
>>>Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>>>
>>>Anyway,
>>>
>>>#define PNV_IODA_MAX_PE_NUM	512
>>>
>>>unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>>>
>>
>>I prefer "8", not macro for 3 reasons:
>>- The macro won't be used in the code.
>
>You will use it 6 times in the header, if you give it a good name, people
>won't have to guess if the meaning of all these "8"s is the same and you
>won't have to comment every use of it in this header file (now you have).
>
>Also, using BITS_PER_LONG tells the reader that this is a bitmask for sure.
>
>
>>- The total segment number of specific resource is variable
>>   on IODA1 and IODA2. I just choosed the max value with margin.
>>- PNV_IODA_MAX_PE_NUM, indicating max PE number, isn't 512 on
>>   IODA1 or IODA2.
>
>Give it a better name.
>

OK. If it has to be a macro, then it would be as below:

#define PNV_IODA_MAX_SEG_NUM	512
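For illustration, here is how the named constant could replace the bare "8". The sketch uses user-space stand-ins for the kernel's BITS_PER_LONG and BITS_TO_LONGS(); struct pe_segmaps_sketch is hypothetical. In the kernel, DECLARE_BITMAP(io_segmap, PNV_IODA_MAX_SEG_NUM) expands to the same array type. On a 64-bit build, BITS_TO_LONGS(512) is 8, which is where the original magic number comes from.

```c
#include <limits.h>

/* Name proposed in this thread; 512 is an upper bound with margin. */
#define PNV_IODA_MAX_SEG_NUM	512

/* User-space stand-ins for the kernel's BITS_PER_LONG / BITS_TO_LONGS. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* One bit per segment; the array length is derived, not hard-coded. */
struct pe_segmaps_sketch {
	unsigned long io_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m32_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m64_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
};
```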

>
>>
>>>>+
>>>>  	/* "Weight" assigned to the PE for the sake of DMA resource
>>>>  	 * allocations
>>>>  	 */
>>>>@@ -145,15 +154,16 @@ struct pnv_phb {
>>>>  			unsigned int		io_segsize;
>>>>  			unsigned int		io_pci_base;
>>>>
>>>>+			/* IO, M32, M64 segment maps */
>>>>+			unsigned long		io_segmap[8];
>>>>+			unsigned long		m32_segmap[8];
>>>>+			unsigned long		m64_segmap[8];
>>>>+
>>>>  			/* PE allocation */
>>>>  			struct mutex		pe_alloc_mutex;
>>>>  			unsigned long		*pe_alloc;
>>>>  			struct pnv_ioda_pe	*pe_array;
>>>>
>>>>-			/* M32 & IO segment maps */
>>>>-			unsigned int		*m32_segmap;
>>>>-			unsigned int		*io_segmap;
>>>>-
>>>>  			/* IRQ chip */
>>>>  			int			irq_chip_init;
>>>>  			struct irq_chip		irq_chip;
>>>>

Thanks,
Gavin


* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-12 10:45         ` Gavin Shan
@ 2015-08-12 11:05           ` Alexey Kardashevskiy
  2015-08-12 11:20             ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-12 11:05 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/12/2015 08:45 PM, Gavin Shan wrote:
> On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>> On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>> On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>> The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>>
>>>> The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>>>> all about? Also, there was no m64_segmap, now there is, needs an explanation
>>>> may be.
>>>>
>>>
>>> Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically.
>>> Now, they have fixed sizes - 512 bits.
>>>
>>> The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>> why m64_segmap is added.
>>
>>
>> But before this patch, you somehow managed to keep it working without a map
>> for M64, by the same time you needed map for IO and M32. It seems you are
>> making things consistent in this patch but it also feels like you do not have
>> to do so as M64 did not need a map before and I cannot see why it needs one
>> now.
>>
>
> The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
> where the M64 segments consumed by one particular PE will be released.


Then add it in the patch where it actually starts being used. It is really
hard to review a change that is spread across several patches. Do not count
on reviewers just trusting you.


>>>>
>>>>> the consumed by one particular PE, which can be released once the PE
>>>>> is destroyed during PCI unplugging time. Also, we're using fixed
>>>>> quantity of bits to trace the used IO, M32 and M64 segments by PEs
>>>>> in one particular PHB.
>>>>>
>>>>
>>>> Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>>>> PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>>>> is using? Not sure about this master/slave PEs though.
>>>>
>>>
>>> I don't follow your suggestion. Can you rephrase and explain it a bit more?
>>
>>
>> Please explains in what situations you need same map in both PHB and PE and
>> how you are going to use them. For example, pe::m64_segmap and
>> phb::m64_segmap.
>>
>> I believe you need to know what segment is used by what PE and that's it and
>> having 2 bitmaps is overcomplicated hard to follow. Is there anything else
>> what I am missing?
>>
>
> The situation is same to all (IO, M32 and M64) segment maps. Taking m64_segmap
> as an example, it will be used when creating or destroying the PE who consumes
> M64 segments. phb::m64_segmap is recording the M64 segment usage in PHB's domain.
> It's used to check same M64 segment won't be used for towice. pe::m64_segmap tracks
> the M64 segments consumed by the PE.


You could have a single map in the PHB: the key would be the segment number
and the value would be the PE number. No need to have a map in the PE. At all.
No need to initialize bitmaps, etc.



>>>> It would be easier to read patches if this one was right before
>>>> [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>>>
>>>
>>> I'll try to reoder the patch, but not expect too much...
>>>
>>>>
>>>>
>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>>>>   arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>>>>   2 files changed, 29 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> index e4ac703..78b49a1 100644
>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> @@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>>   			list_add_tail(&pe->list, &master_pe->slaves);
>>>>>   		}
>>>>>
>>>>> +		/* M64 segments consumed by slave PEs are tracked
>>>>> +		 * by master PE
>>>>> +		 */
>>>>> +		set_bit(pe->pe_number, master_pe->m64_segmap);
>>>>> +		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>>>> +
>>>>>   		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>>>   		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>>   		 * between M64 segment and PE#. In order to have same logic
>>>>> @@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>
>>>>>   			while (index < phb->ioda.total_pe &&
>>>>>   			       region.start <= region.end) {
>>>>> -				phb->ioda.io_segmap[index] = pe->pe_number;
>>>>> +				set_bit(index, pe->io_segmap);
>>>>> +				set_bit(index, phb->ioda.io_segmap);
>>>>>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>> -					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>>>> +					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>>>> +					0, index);
>>>>
>>>> Unrelated change.
>>>>
>>>
>>> True, will drop. However, checkpatch.pl will complain wtih:
>>> exceeding 80 characters.
>>
>> It will not as you are not changing these lines, it only complains on changes.
>>
>>
>>
>>>
>>>>>   				if (rc != OPAL_SUCCESS) {
>>>>>   					pr_err("%s: OPAL error %d when mapping IO "
>>>>>   					       "segment #%d to PE#%d\n",
>>>>> @@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>
>>>>>   			while (index < phb->ioda.total_pe &&
>>>>>   			       region.start <= region.end) {
>>>>> -				phb->ioda.m32_segmap[index] = pe->pe_number;
>>>>> +				set_bit(index, pe->m32_segmap);
>>>>> +				set_bit(index, phb->ioda.m32_segmap);
>>>>>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>> -					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>>>> +					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>>>> +					0, index);
>>>>
>>>> Unrelated change.
>>>>
>>>
>>> same as above.
>>>
>>>>>   				if (rc != OPAL_SUCCESS) {
>>>>>   					pr_err("%s: OPAL error %d when mapping M32 "
>>>>>   					       "segment#%d to PE#%d",
>>>>> @@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>   {
>>>>>   	struct pci_controller *hose;
>>>>>   	struct pnv_phb *phb;
>>>>> -	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>>>> +	unsigned long size, pemap_off;
>>>>>   	const __be64 *prop64;
>>>>>   	const __be32 *prop32;
>>>>>   	int len;
>>>>> @@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>
>>>>>   	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>>
>>>>
>>>> This comment came with if(IODA1) below, since you are removing the condition
>>>> below, makes sense to remove the comment as well or move it where people will
>>>> look for it (arch/powerpc/platforms/powernv/pci.h ?)
>>>>
>>>
>>> Yes, will do.
>>>
>>>>
>>>>>   	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>>>> -	m32map_off = size;
>>>>> -	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>>>> -	if (phb->type == PNV_PHB_IODA1) {
>>>>> -		iomap_off = size;
>>>>> -		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>>>> -	}
>>>>>   	pemap_off = size;
>>>>>   	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>>>>   	aux = memblock_virt_alloc(size, 0);
>>>>
>>>>
>>>> After adding static arrays to PE and PHB, do you still need this "aux"?
>>>>
>>>
>>> "aux" is still needed to tell the boundary of pe_alloc_bitmap and pe_array.
>>>>
>>>>>   	phb->ioda.pe_alloc = aux;
>>>>> -	phb->ioda.m32_segmap = aux + m32map_off;
>>>>> -	if (phb->type == PNV_PHB_IODA1)
>>>>> -		phb->ioda.io_segmap = aux + iomap_off;
>>>>>   	phb->ioda.pe_array = aux + pemap_off;
>>>>>   	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>> index 62239b1..08a4e57 100644
>>>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>>>> @@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>>>>   	/* PE number */
>>>>>   	unsigned int		pe_number;
>>>>>
>>>>> +	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>>>> +	 * have one M64 segment at most, but M64 segments consumed
>>>>> +	 * by slave PEs will be contributed to the master PE. One
>>>>> +	 * PE can own multiple IO and M32 segments.
>>>>
>>>>
>>>> A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>>>> correct for IODA1 or IODA2 or both? Is this a limitation of this
>>>> implementation or it comes from P7IOC/PHB3 hardware?
>>>>
>>>
>>> It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
>>> multiple M64 segments as well.
>>
>>
>> But the comment says "Each PE can have one M64 segment at most". Which
>> statement is correct?
>>
>
> The comment is correct regarding PHB's 15th M64 BAR: Each PE can have one
> M64 segment at post. It's from hardware limitation. However, once one PE
> consumes multiple M64 segments. all those M64 segments will be tracked in
> "master" PE and it's determined by software implementation.
>
>>>>> +	 */
>>>>> +	unsigned long		io_segmap[8];
>>>>> +	unsigned long		m32_segmap[8];
>>>>> +	unsigned long		m64_segmap[8];
>>>>
>>>> Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>>>>
>>>> Anyway,
>>>>
>>>> #define PNV_IODA_MAX_PE_NUM	512
>>>>
>>>> unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>>>>
>>>
>>> I prefer "8", not macro for 3 reasons:
>>> - The macro won't be used in the code.
>>
>> You will use it 6 times in the header, if you give it a good name, people
>> won't have to guess if the meaning of all these "8"s is the same and you
>> won't have to comment every use of it in this header file (now you have).
>>
>> Also, using BITS_PER_LONG tells the reader that this is a bitmask for sure.
>>
>>
>>> - The total segment number of specific resource is variable
>>>    on IODA1 and IODA2. I just choosed the max value with margin.
>>> - PNV_IODA_MAX_PE_NUM, indicating max PE number, isn't 512 on
>>>    IODA1 or IODA2.
>>
>> Give it a better name.
>>
>
> Ok. It it has to be a macro, then it's as below:
>
> #define PNV_IODA_MAX_SEG_NUM	512


Thanks mate :)


>>
>>>
>>>>> +
>>>>>   	/* "Weight" assigned to the PE for the sake of DMA resource
>>>>>   	 * allocations
>>>>>   	 */
>>>>> @@ -145,15 +154,16 @@ struct pnv_phb {
>>>>>   			unsigned int		io_segsize;
>>>>>   			unsigned int		io_pci_base;
>>>>>
>>>>> +			/* IO, M32, M64 segment maps */
>>>>> +			unsigned long		io_segmap[8];
>>>>> +			unsigned long		m32_segmap[8];
>>>>> +			unsigned long		m64_segmap[8];
>>>>> +
>>>>>   			/* PE allocation */
>>>>>   			struct mutex		pe_alloc_mutex;
>>>>>   			unsigned long		*pe_alloc;
>>>>>   			struct pnv_ioda_pe	*pe_array;
>>>>>
>>>>> -			/* M32 & IO segment maps */
>>>>> -			unsigned int		*m32_segmap;
>>>>> -			unsigned int		*io_segmap;
>>>>> -
>>>>>   			/* IRQ chip */
>>>>>   			int			irq_chip_init;
>>>>>   			struct irq_chip		irq_chip;
>>>>>
>
> Thanks,
> Gavin
>


-- 
Alexey


* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-12 11:05           ` Alexey Kardashevskiy
@ 2015-08-12 11:20             ` Gavin Shan
  2015-08-12 12:57               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 11:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote:
>On 08/12/2015 08:45 PM, Gavin Shan wrote:
>>On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>>>On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>>>The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>>>
>>>>>The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>>>>>all about? Also, there was no m64_segmap, now there is, needs an explanation
>>>>>may be.
>>>>>
>>>>
>>>>Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically.
>>>>Now, they have fixed sizes - 512 bits.
>>>>
>>>>The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>>>why m64_segmap is added.
>>>
>>>
>>>But before this patch, you somehow managed to keep it working without a map
>>>for M64, by the same time you needed map for IO and M32. It seems you are
>>>making things consistent in this patch but it also feels like you do not have
>>>to do so as M64 did not need a map before and I cannot see why it needs one
>>>now.
>>>
>>
>>The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>where the M64 segments consumed by one particular PE will be released.
>
>
>Then add it where it is really started being used. It is really hard to
>review a patch which is actually spread between patches. Do not count that
>reviewers will just trust you.
>

Ok. I'll try. 

>
>>>>>
>>>>>>the consumed by one particular PE, which can be released once the PE
>>>>>>is destroyed during PCI unplugging time. Also, we're using fixed
>>>>>>quantity of bits to trace the used IO, M32 and M64 segments by PEs
>>>>>>in one particular PHB.
>>>>>>
>>>>>
>>>>>Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>>>>>PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>>>>>is using? Not sure about this master/slave PEs though.
>>>>>
>>>>
>>>>I don't follow your suggestion. Can you rephrase and explain it a bit more?
>>>
>>>
>>>Please explains in what situations you need same map in both PHB and PE and
>>>how you are going to use them. For example, pe::m64_segmap and
>>>phb::m64_segmap.
>>>
>>>I believe you need to know what segment is used by what PE and that's it and
>>>having 2 bitmaps is overcomplicated hard to follow. Is there anything else
>>>what I am missing?
>>>
>>
>>The situation is same to all (IO, M32 and M64) segment maps. Taking m64_segmap
>>as an example, it will be used when creating or destroying the PE who consumes
>>M64 segments. phb::m64_segmap is recording the M64 segment usage in PHB's domain.
>>It's used to check same M64 segment won't be used for towice. pe::m64_segmap tracks
>>the M64 segments consumed by the PE.
>
>
>You could have a single map in PHB, key would be a segment number and value
>would be PE number. No need to have a map in PE. At all. No need to
>initialize bitmaps, etc.
>

So, if I understood your suggestion correctly, the segment maps would become
arrays as below. Please confirm:

#define PNV_IODA_MAX_SEG_NUM	512

	int pnv_phb::io_segmap[PNV_IODA_MAX_SEG_NUM];
	int pnv_phb::m32_segmap[PNV_IODA_MAX_SEG_NUM];
	int pnv_phb::m64_segmap[PNV_IODA_MAX_SEG_NUM];

- Initially, all entries are initialized to IODA_INVALID_PE.
- When a segment is assigned to a PE, the corresponding entry
  of the array is set to that PE number.
- When a segment is released, the corresponding entry of the array
  is set back to IODA_INVALID_PE.
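Here is a minimal sketch of the array-based scheme above. It assumes 512 entries and a local INVALID_PE of -1 standing in for IODA_INVALID_PE, whose actual definition is not shown in this thread. struct phb_segmaps and the segmap_*() helpers are hypothetical.

```c
/* Assumed bound and a local stand-in for IODA_INVALID_PE. */
#define PNV_IODA_MAX_SEG_NUM	512
#define INVALID_PE		(-1)

/* One entry per segment; the value is the owning PE number. */
struct phb_segmaps {
	int io_segmap[PNV_IODA_MAX_SEG_NUM];
	int m32_segmap[PNV_IODA_MAX_SEG_NUM];
	int m64_segmap[PNV_IODA_MAX_SEG_NUM];
};

/* Every segment starts out unowned. */
static void segmap_init(int *map)
{
	for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++)
		map[i] = INVALID_PE;
}

/* Assign a segment to a PE; fails if it is out of range or already owned. */
static int segmap_assign(int *map, int seg, int pe)
{
	if (seg < 0 || seg >= PNV_IODA_MAX_SEG_NUM || map[seg] != INVALID_PE)
		return -1;
	map[seg] = pe;
	return 0;
}

/* Release every segment owned by a PE; returns how many were freed. */
static int segmap_release_pe(int *map, int pe)
{
	int n = 0;

	for (int i = 0; i < PNV_IODA_MAX_SEG_NUM; i++) {
		if (map[i] == pe) {
			map[i] = INVALID_PE;
			n++;
		}
	}
	return n;
}
```

With this layout, releasing a PE becomes a linear scan of the array, which is cheap at 512 entries. Checking who owns a segment is a single lookup, so no per-PE map is needed.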
 

>>>>>It would be easier to read patches if this one was right before
>>>>>[PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>>>>
>>>>
>>>>I'll try to reoder the patch, but not expect too much...
>>>>
>>>>>
>>>>>
>>>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>>---
>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++--------------
>>>>>>  arch/powerpc/platforms/powernv/pci.h      | 18 ++++++++++++++----
>>>>>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>>>>>
>>>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>index e4ac703..78b49a1 100644
>>>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>@@ -388,6 +388,12 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
>>>>>>  			list_add_tail(&pe->list, &master_pe->slaves);
>>>>>>  		}
>>>>>>
>>>>>>+		/* M64 segments consumed by slave PEs are tracked
>>>>>>+		 * by master PE
>>>>>>+		 */
>>>>>>+		set_bit(pe->pe_number, master_pe->m64_segmap);
>>>>>>+		set_bit(pe->pe_number, phb->ioda.m64_segmap);
>>>>>>+
>>>>>>  		/* P7IOC supports M64DT, which helps mapping M64 segment
>>>>>>  		 * to one particular PE#. However, PHB3 has fixed mapping
>>>>>>  		 * between M64 segment and PE#. In order to have same logic
>>>>>>@@ -2871,9 +2877,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>>
>>>>>>  			while (index < phb->ioda.total_pe &&
>>>>>>  			       region.start <= region.end) {
>>>>>>-				phb->ioda.io_segmap[index] = pe->pe_number;
>>>>>>+				set_bit(index, pe->io_segmap);
>>>>>>+				set_bit(index, phb->ioda.io_segmap);
>>>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>>>-					pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
>>>>>>+					pe->pe_number, OPAL_IO_WINDOW_TYPE,
>>>>>>+					0, index);
>>>>>
>>>>>Unrelated change.
>>>>>
>>>>
>>>>True, will drop. However, checkpatch.pl will complain wtih:
>>>>exceeding 80 characters.
>>>
>>>It will not as you are not changing these lines, it only complains on changes.
>>>
>>>
>>>
>>>>
>>>>>>  				if (rc != OPAL_SUCCESS) {
>>>>>>  					pr_err("%s: OPAL error %d when mapping IO "
>>>>>>  					       "segment #%d to PE#%d\n",
>>>>>>@@ -2896,9 +2904,11 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>>>
>>>>>>  			while (index < phb->ioda.total_pe &&
>>>>>>  			       region.start <= region.end) {
>>>>>>-				phb->ioda.m32_segmap[index] = pe->pe_number;
>>>>>>+				set_bit(index, pe->m32_segmap);
>>>>>>+				set_bit(index, phb->ioda.m32_segmap);
>>>>>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>>>-					pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
>>>>>>+					pe->pe_number, OPAL_M32_WINDOW_TYPE,
>>>>>>+					0, index);
>>>>>
>>>>>Unrelated change.
>>>>>
>>>>
>>>>same as above.
>>>>
>>>>>>  				if (rc != OPAL_SUCCESS) {
>>>>>>  					pr_err("%s: OPAL error %d when mapping M32 "
>>>>>>  					       "segment#%d to PE#%d",
>>>>>>@@ -3090,7 +3100,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>>  {
>>>>>>  	struct pci_controller *hose;
>>>>>>  	struct pnv_phb *phb;
>>>>>>-	unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>>>>>+	unsigned long size, pemap_off;
>>>>>>  	const __be64 *prop64;
>>>>>>  	const __be32 *prop32;
>>>>>>  	int len;
>>>>>>@@ -3175,19 +3185,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>>>
>>>>>>  	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>>>
>>>>>
>>>>>This comment came with if(IODA1) below, since you are removing the condition
>>>>>below, makes sense to remove the comment as well or move it where people will
>>>>>look for it (arch/powerpc/platforms/powernv/pci.h ?)
>>>>>
>>>>
>>>>Yes, will do.
>>>>
>>>>>
>>>>>>  	size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
>>>>>>-	m32map_off = size;
>>>>>>-	size += phb->ioda.total_pe * sizeof(phb->ioda.m32_segmap[0]);
>>>>>>-	if (phb->type == PNV_PHB_IODA1) {
>>>>>>-		iomap_off = size;
>>>>>>-		size += phb->ioda.total_pe * sizeof(phb->ioda.io_segmap[0]);
>>>>>>-	}
>>>>>>  	pemap_off = size;
>>>>>>  	size += phb->ioda.total_pe * sizeof(struct pnv_ioda_pe);
>>>>>>  	aux = memblock_virt_alloc(size, 0);
>>>>>
>>>>>
>>>>>After adding static arrays to PE and PHB, do you still need this "aux"?
>>>>>
>>>>
>>>>"aux" is still needed to tell the boundary of pe_alloc_bitmap and pe_array.
>>>>>
>>>>>>  	phb->ioda.pe_alloc = aux;
>>>>>>-	phb->ioda.m32_segmap = aux + m32map_off;
>>>>>>-	if (phb->type == PNV_PHB_IODA1)
>>>>>>-		phb->ioda.io_segmap = aux + iomap_off;
>>>>>>  	phb->ioda.pe_array = aux + pemap_off;
>>>>>>  	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
>>>>>>
>>>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>>>index 62239b1..08a4e57 100644
>>>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>>>@@ -49,6 +49,15 @@ struct pnv_ioda_pe {
>>>>>>  	/* PE number */
>>>>>>  	unsigned int		pe_number;
>>>>>>
>>>>>>+	/* IO/M32/M64 segments consumed by the PE. Each PE can
>>>>>>+	 * have one M64 segment at most, but M64 segments consumed
>>>>>>+	 * by slave PEs will be contributed to the master PE. One
>>>>>>+	 * PE can own multiple IO and M32 segments.
>>>>>
>>>>>
>>>>>A PE can have multiple IO and M32 segments but just one M64 segment? Is this
>>>>>correct for IODA1 or IODA2 or both? Is this a limitation of this
>>>>>implementation or it comes from P7IOC/PHB3 hardware?
>>>>>
>>>>
>>>>It's correct for IO and M32. However, on IODA1 or IODA2, one PE can have
>>>>multiple M64 segments as well.
>>>
>>>
>>>But the comment says "Each PE can have one M64 segment at most". Which
>>>statement is correct?
>>>
>>
>>The comment is correct regarding the PHB's 15th M64 BAR: each PE can have one
>>M64 segment at most. That is a hardware limitation. However, once one PE
>>consumes multiple M64 segments, all those M64 segments will be tracked in
>>the "master" PE, and that is determined by the software implementation.
>>
>>>>>>+	 */
>>>>>>+	unsigned long		io_segmap[8];
>>>>>>+	unsigned long		m32_segmap[8];
>>>>>>+	unsigned long		m64_segmap[8];
>>>>>
>>>>>Magic constant "8", 64bit*8 = 512 PEs - where did this come from?
>>>>>
>>>>>Anyway,
>>>>>
>>>>>#define PNV_IODA_MAX_PE_NUM	512
>>>>>
>>>>>unsigned long io_segmap[PNV_IODA_MAX_PE_NUM/BITS_PER_LONG]
>>>>>
>>>>
>>>>I prefer "8", not macro for 3 reasons:
>>>>- The macro won't be used in the code.
>>>
>>>You will use it 6 times in the header, if you give it a good name, people
>>>won't have to guess if the meaning of all these "8"s is the same and you
>>>won't have to comment every use of it in this header file (now you have).
>>>
>>>Also, using BITS_PER_LONG tells the reader that this is a bitmask for sure.
>>>
>>>
>>>>- The total number of segments for a specific resource is variable
>>>>   on IODA1 and IODA2. I just chose the max value with some margin.
>>>>- PNV_IODA_MAX_PE_NUM, indicating max PE number, isn't 512 on
>>>>   IODA1 or IODA2.
>>>
>>>Give it a better name.
>>>
>>
>>Ok. If it has to be a macro, then it's as below:
>>
>>#define PNV_IODA_MAX_SEG_NUM	512
>
>
>Thanks mate :)
>
>
>>>
>>>>
>>>>>>+
>>>>>>  	/* "Weight" assigned to the PE for the sake of DMA resource
>>>>>>  	 * allocations
>>>>>>  	 */
>>>>>>@@ -145,15 +154,16 @@ struct pnv_phb {
>>>>>>  			unsigned int		io_segsize;
>>>>>>  			unsigned int		io_pci_base;
>>>>>>
>>>>>>+			/* IO, M32, M64 segment maps */
>>>>>>+			unsigned long		io_segmap[8];
>>>>>>+			unsigned long		m32_segmap[8];
>>>>>>+			unsigned long		m64_segmap[8];
>>>>>>+
>>>>>>  			/* PE allocation */
>>>>>>  			struct mutex		pe_alloc_mutex;
>>>>>>  			unsigned long		*pe_alloc;
>>>>>>  			struct pnv_ioda_pe	*pe_array;
>>>>>>
>>>>>>-			/* M32 & IO segment maps */
>>>>>>-			unsigned int		*m32_segmap;
>>>>>>-			unsigned int		*io_segmap;
>>>>>>-
>>>>>>  			/* IRQ chip */
>>>>>>  			int			irq_chip_init;
>>>>>>  			struct irq_chip		irq_chip;
>>>>>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread
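A minimal userspace sketch of the named-bound approach Alexey asks for above, assuming a 64-bit build. PNV_IODA_MAX_SEG_NUM is the name proposed in the thread; struct seg_maps and the seg_set()/seg_clear()/seg_test() helpers are hypothetical stand-ins for the kernel's set_bit()/clear_bit()/test_bit(), written out so the example compiles outside the kernel. With 64-bit longs, BITS_TO_LONGS(512) is 512 / 64 = 8, which is where the literal "8" in the patch comes from.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG      (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(nr)  (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Upper bound on IO/M32/M64 segments (name proposed in the thread). */
#define PNV_IODA_MAX_SEG_NUM 512

/* Hypothetical container; in the patch these arrays live in pnv_ioda_pe/pnv_phb. */
struct seg_maps {
	unsigned long io_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m32_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
	unsigned long m64_segmap[BITS_TO_LONGS(PNV_IODA_MAX_SEG_NUM)];
};

/* Mark segment @nr as used in @map. */
static void seg_set(unsigned long *map, unsigned int nr)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Mark segment @nr as free in @map. */
static void seg_clear(unsigned long *map, unsigned int nr)
{
	map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

/* Return true if segment @nr is marked as used in @map. */
static bool seg_test(const unsigned long *map, unsigned int nr)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}
```

Every array dimension now reads as "enough longs for PNV_IODA_MAX_SEG_NUM bits", so the six declarations need no per-line comment, and the size adapts (to 16 longs) on a 32-bit build. In-kernel, DECLARE_BITMAP(io_segmap, PNV_IODA_MAX_SEG_NUM) expands to the same kind of declaration.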

* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-12 11:20             ` Gavin Shan
@ 2015-08-12 12:57               ` Alexey Kardashevskiy
  2015-08-12 23:34                 ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-12 12:57 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/12/2015 09:20 PM, Gavin Shan wrote:
> On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote:
>> On 08/12/2015 08:45 PM, Gavin Shan wrote:
>>> On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>>>> On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>>>> On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>>>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>>>> The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>>>>
>>>>>> The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>>>>>> all about? Also, there was no m64_segmap, now there is, needs an explanation
>>>>>> maybe.
>>>>>>
>>>>>
>>>>> Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically.
>>>>> Now, they have fixed sizes - 512 bits.
>>>>>
>>>>> The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>>>> why m64_segmap is added.
>>>>
>>>>
>>>> But before this patch, you somehow managed to keep it working without a map
>>>> for M64, while at the same time you needed maps for IO and M32. It seems you are
>>>> making things consistent in this patch but it also feels like you do not have
>>>> to do so as M64 did not need a map before and I cannot see why it needs one
>>>> now.
>>>>
>>>
>>> The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>> where the M64 segments consumed by one particular PE will be released.
>>
>>
>> Then add it where it actually starts being used. It is really hard to
>> review a patch which is actually spread between patches. Do not count on
>> reviewers just trusting you.
>>
>
> Ok. I'll try.
>
>>
>>>>>>
>>>>>>> the segments consumed by one particular PE, which can be released once the PE
>>>>>>> is destroyed at PCI unplug time. Also, we're using a fixed
>>>>>>> number of bits to track the IO, M32 and M64 segments used by PEs
>>>>>>> in one particular PHB.
>>>>>>>
>>>>>>
>>>>>> Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>>>>>> PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>>>>>> is using? Not sure about this master/slave PEs though.
>>>>>>
>>>>>
>>>>> I don't follow your suggestion. Can you rephrase and explain it a bit more?
>>>>
>>>>
>>>> Please explain in what situations you need the same map in both PHB and PE and
>>>> how you are going to use them. For example, pe::m64_segmap and
>>>> phb::m64_segmap.
>>>>
>>>> I believe you need to know what segment is used by what PE and that's it, and
>>>> having 2 bitmaps is overcomplicated and hard to follow. Is there anything else
>>>> that I am missing?
>>>>
>>>
>>> The situation is the same for all (IO, M32 and M64) segment maps. Taking m64_segmap
>>> as an example, it will be used when creating or destroying the PE that consumes
>>> M64 segments. phb::m64_segmap records the M64 segment usage in the PHB's domain.
>>> It's used to check that the same M64 segment won't be used twice. pe::m64_segmap tracks
>>> the M64 segments consumed by the PE.
>>
>>
>> You could have a single map in PHB, key would be a segment number and value
>> would be PE number. No need to have a map in PE. At all. No need to
>> initialize bitmaps, etc.
>>
>
> So it would be arrays for the various segment maps if I understood your suggestion
> as below. Please confirm:
>
> #define PNV_IODA_MAX_SEG_NUM	512
>
> 	int struct pnv_phb::io_segmap[PNV_IODA_MAX_SEG_NUM];
> 			    m32_segmap[PNV_IODA_MAX_SEG_NUM];
> 			    m64_segmap[PNV_IODA_MAX_SEG_NUM];
> - Initially, they are initialized to IODA_INVALID_PE;
> - When a segment is assigned to a PE, the corresponding entry
>    of the array is set to the PE number.
> - When a segment is released, the corresponding entry of the array
>    is set to IODA_INVALID_PE;


No, not arrays, I meant DEFINE_HASHTABLE(), hash_add(), etc from 
include/linux/hashtable.h.

http://kernelnewbies.org/FAQ/Hashtables is a good place to start :)



-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread
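The two-level scheme Gavin describes in this exchange (a PHB-wide map that stops a segment from being handed out twice, plus a per-PE map that remembers what to give back on unplug) can be sketched in userspace as follows. This is an illustration only: struct sk_phb, struct sk_pe, seg_claim() and pe_release_segs() are hypothetical names, not the functions added by the series, and the bit helpers stand in for the kernel's set_bit()/clear_bit()/test_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SK_MAX_SEG  512                                 /* hypothetical bound */
#define SK_BPL      (CHAR_BIT * sizeof(unsigned long))  /* bits per long */
#define SK_LONGS    ((SK_MAX_SEG + SK_BPL - 1) / SK_BPL)

struct sk_phb { unsigned long m64_segmap[SK_LONGS]; };  /* PHB-wide usage */
struct sk_pe  { unsigned long m64_segmap[SK_LONGS]; };  /* owned by this PE */

static bool bit_test(const unsigned long *map, unsigned int nr)
{
	return (map[nr / SK_BPL] >> (nr % SK_BPL)) & 1UL;
}

static void bit_assign(unsigned long *map, unsigned int nr, bool on)
{
	if (on)
		map[nr / SK_BPL] |= 1UL << (nr % SK_BPL);
	else
		map[nr / SK_BPL] &= ~(1UL << (nr % SK_BPL));
}

/* Give segment @seg to @pe; fail if out of range or already owned by any PE. */
static int seg_claim(struct sk_phb *phb, struct sk_pe *pe, unsigned int seg)
{
	if (seg >= SK_MAX_SEG || bit_test(phb->m64_segmap, seg))
		return -1;
	bit_assign(phb->m64_segmap, seg, true);
	bit_assign(pe->m64_segmap, seg, true);
	return 0;
}

/* On unplug: return every segment owned by @pe to the PHB-wide pool. */
static void pe_release_segs(struct sk_phb *phb, struct sk_pe *pe)
{
	for (unsigned int seg = 0; seg < SK_MAX_SEG; seg++) {
		if (!bit_test(pe->m64_segmap, seg))
			continue;
		bit_assign(phb->m64_segmap, seg, false);
		bit_assign(pe->m64_segmap, seg, false);
	}
}
```

For the master/slave case in the first hunk of the patch, a slave PE's segment would be recorded in the master's map rather than its own, so releasing the master returns every segment of the compound PE.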

* Re: [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE
  2015-08-12 12:57               ` Alexey Kardashevskiy
@ 2015-08-12 23:34                 ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 23:34 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Wed, Aug 12, 2015 at 10:57:33PM +1000, Alexey Kardashevskiy wrote:
>On 08/12/2015 09:20 PM, Gavin Shan wrote:
>>On Wed, Aug 12, 2015 at 09:05:09PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/12/2015 08:45 PM, Gavin Shan wrote:
>>>>On Tue, Aug 11, 2015 at 12:23:42PM +1000, Alexey Kardashevskiy wrote:
>>>>>On 08/11/2015 10:03 AM, Gavin Shan wrote:
>>>>>>On Mon, Aug 10, 2015 at 05:16:40PM +1000, Alexey Kardashevskiy wrote:
>>>>>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>>>>>The patch is adding 6 bitmaps, three to PE and three to PHB, to track
>>>>>>>
>>>>>>>The patch is also removing 2 arrays (io_segmap and m32_segmap), what is that
>>>>>>>all about? Also, there was no m64_segmap, now there is, needs an explanation
>>>>>>>maybe.
>>>>>>>
>>>>>>
>>>>>>Originally, the bitmaps (io_segmap and m32_segmap) are allocated dynamically.
>>>>>>Now, they have fixed sizes - 512 bits.
>>>>>>
>>>>>>The subject "powerpc/powernv: Track IO/M32/M64 segments from PE" indicates
>>>>>>why m64_segmap is added.
>>>>>
>>>>>
>>>>>But before this patch, you somehow managed to keep it working without a map
>>>>>for M64, while at the same time you needed maps for IO and M32. It seems you are
>>>>>making things consistent in this patch but it also feels like you do not have
>>>>>to do so as M64 did not need a map before and I cannot see why it needs one
>>>>>now.
>>>>>
>>>>
>>>>The M64 map is used by [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
>>>>where the M64 segments consumed by one particular PE will be released.
>>>
>>>
>>>Then add it where it actually starts being used. It is really hard to
>>>review a patch which is actually spread between patches. Do not count on
>>>reviewers just trusting you.
>>>
>>
>>Ok. I'll try.
>>
>>>
>>>>>>>
>>>>>>>>the segments consumed by one particular PE, which can be released once the PE
>>>>>>>>is destroyed at PCI unplug time. Also, we're using a fixed
>>>>>>>>number of bits to track the IO, M32 and M64 segments used by PEs
>>>>>>>>in one particular PHB.
>>>>>>>>
>>>>>>>
>>>>>>>Out of curiosity - have you considered having just 3 arrays, in PHB, storing
>>>>>>>PE numbers, and ditching PE's arrays? Does PE itself need to know what PEs it
>>>>>>>is using? Not sure about this master/slave PEs though.
>>>>>>>
>>>>>>
>>>>>>I don't follow your suggestion. Can you rephrase and explain it a bit more?
>>>>>
>>>>>
>>>>>Please explain in what situations you need the same map in both PHB and PE and
>>>>>how you are going to use them. For example, pe::m64_segmap and
>>>>>phb::m64_segmap.
>>>>>
>>>>>I believe you need to know what segment is used by what PE and that's it, and
>>>>>having 2 bitmaps is overcomplicated and hard to follow. Is there anything else
>>>>>that I am missing?
>>>>>
>>>>
>>>>The situation is the same for all (IO, M32 and M64) segment maps. Taking m64_segmap
>>>>as an example, it will be used when creating or destroying the PE that consumes
>>>>M64 segments. phb::m64_segmap records the M64 segment usage in the PHB's domain.
>>>>It's used to check that the same M64 segment won't be used twice. pe::m64_segmap tracks
>>>>the M64 segments consumed by the PE.
>>>
>>>
>>>You could have a single map in PHB, key would be a segment number and value
>>>would be PE number. No need to have a map in PE. At all. No need to
>>>initialize bitmaps, etc.
>>>
>>
>>So it would be arrays for the various segment maps if I understood your suggestion
>>as below. Please confirm:
>>
>>#define PNV_IODA_MAX_SEG_NUM	512
>>
>>	int struct pnv_phb::io_segmap[PNV_IODA_MAX_SEG_NUM];
>>			    m32_segmap[PNV_IODA_MAX_SEG_NUM];
>>			    m64_segmap[PNV_IODA_MAX_SEG_NUM];
>>- Initially, they are initialized to IODA_INVALID_PE;
>>- When a segment is assigned to a PE, the corresponding entry
>>   of the array is set to the PE number.
>>- When a segment is released, the corresponding entry of the array
>>   is set to IODA_INVALID_PE;
>
>
>No, not arrays, I meant DEFINE_HASHTABLE(), hash_add(), etc from
>include/linux/hashtable.h.
>
>http://kernelnewbies.org/FAQ/Hashtables is a good place to start :)
>

Are you sure we need a hashtable to represent such a simple data structure?
I really don't see the benefits; could you provide more details about them?

With a hashtable, every bucket holds a list of items whose keys collide,
and each item would be represented by a data struct like the one below.
That struct takes 24 bytes on a 64-bit kernel, which is not memory
efficient. Each time one more segment is consumed, an instance of
"struct pnv_ioda_segment" has to be allocated and added to the collision
list of the target bucket. Later, the instance is removed from the list
and freed when the segment is detached from the PE. That is more complex
than it needs to be.

struct pnv_ioda_segment {
	int               pe_number;
	int               seg_number;
	struct hlist_node node;
};

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread
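For comparison, the flat-array alternative Gavin outlines earlier in this thread (segment number as index, PE number as value, IODA_INVALID_PE marking free entries) might look like the sketch below. SK_INVALID_PE stands in for IODA_INVALID_PE, and every other name is hypothetical.

```c
#include <assert.h>

#define SK_MAX_SEG     512
#define SK_INVALID_PE  (-1)   /* stands in for IODA_INVALID_PE */

struct sk_phb_segs {
	int m64_owner[SK_MAX_SEG];   /* segment number -> owning PE number */
};

/* Mark every segment as free. */
static void segs_init(struct sk_phb_segs *s)
{
	for (int i = 0; i < SK_MAX_SEG; i++)
		s->m64_owner[i] = SK_INVALID_PE;
}

/* Record @pe_number as owner of @seg; fail if out of range or already owned. */
static int segs_assign(struct sk_phb_segs *s, int seg, int pe_number)
{
	if (seg < 0 || seg >= SK_MAX_SEG || s->m64_owner[seg] != SK_INVALID_PE)
		return -1;
	s->m64_owner[seg] = pe_number;
	return 0;
}

/* Free every segment owned by @pe_number; returns how many were freed. */
static int segs_release_pe(struct sk_phb_segs *s, int pe_number)
{
	int freed = 0;

	for (int i = 0; i < SK_MAX_SEG; i++) {
		if (s->m64_owner[i] == pe_number) {
			s->m64_owner[i] = SK_INVALID_PE;
			freed++;
		}
	}
	return freed;
}
```

On the memory question: with 4-byte ints, one such array costs 512 * 4 = 2048 bytes per window type, versus 512 / 8 = 64 bytes for a 512-bit bitmap. The struct pnv_ioda_segment shown above is 4 + 4 + 16 = 24 bytes on a 64-bit kernel (struct hlist_node holds two pointers), before counting the bucket heads.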

* Re: [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping
  2015-08-11  2:32         ` Alexey Kardashevskiy
@ 2015-08-12 23:42           ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 23:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:32:13PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 10:12 AM, Gavin Shan wrote:
>>On Mon, Aug 10, 2015 at 05:40:08PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>There're 3 windows (IO, M32 and M64) for PHB, root port and upstream
>>>
>>>These are actually IO, non-prefetchable and prefetchable windows which happen
>>>to be IO, 32bit and 64bit windows but this has nothing to do with the M32/M64
>>>BAR registers in P7IOC/PHB3, do I understand this correctly?
>>>
>>
>>In pci-ioda.c, we have the definitions below, which were made up when
>>developing the code rather than taken from any specification:
>>
>>IO  - resources with IO property
>>M32 - 32-bits or non-prefetchable resources
>>M64 - 64-bits and prefetchable resources
>
>
>This is what I am saying - it is incorrect and confusing. M32/M64 are PHB3
>register names and associated windows (with "M" in the beginning) but not
>device resources.
>

I don't see how it's incorrect or confusing. M32/M64 are not PHB3
register names. Also, a device resource is one of IO, 32-bit prefetchable
memory, 32-bit non-prefetchable memory, 64-bit non-prefetchable memory or
64-bit prefetchable memory. These map onto IO, M32 and M64.

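The IO/M32/M64 naming Gavin defines above boils down to a check on the resource flags. The sketch below mirrors it, with the M64 test modelled on the pnv_pci_is_mem_pref_64() check used in the patch (both the 64-bit and the prefetchable flag must be set). The SK_RES_* flag values and enum sk_win are made up for the example and are not the kernel's IORESOURCE_* values.

```c
#include <assert.h>

/* Hypothetical flag bits; the kernel's IORESOURCE_* values differ. */
#define SK_RES_IO        0x1u
#define SK_RES_MEM       0x2u
#define SK_RES_MEM_64    0x4u
#define SK_RES_PREFETCH  0x8u

enum sk_win { SK_WIN_NONE, SK_WIN_IO, SK_WIN_M32, SK_WIN_M64 };

/* Classify a resource into the IO/M32/M64 buckets used in pci-ioda.c. */
static enum sk_win sk_classify(unsigned int flags)
{
	const unsigned int pref64 = SK_RES_MEM_64 | SK_RES_PREFETCH;

	if (flags & SK_RES_IO)
		return SK_WIN_IO;
	if (!(flags & SK_RES_MEM))
		return SK_WIN_NONE;
	/* M64 only when the resource is both 64-bit and prefetchable. */
	if ((flags & pref64) == pref64)
		return SK_WIN_M64;
	/* Everything else (32-bit or non-prefetchable memory) is M32. */
	return SK_WIN_M32;
}
```

Under this rule a 64-bit non-prefetchable BAR is treated as M32, matching the "32-bits or non-prefetchable" wording above.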

>>>>port of the PCIE switch behind root port. In order to support PCI
>>>>hotplug, we extend the start/end address of those 3 windows of root
>>>>port or upstream port to the start/end address of the 3 PHB's windows.
>>>>The current implementation, assigning IO or M32 segment based on the
>>>>bridge's windows, isn't reliable.
>>>>
>>>>The patch fixes above issue by calculating PE's consumed IO or M32
>>>>segments from its contained devices, no PCI bridge windows involved
>>>>if the PE doesn't contain all the subordinate PCI buses.
>>>
>>>Please, rephrase it. How can PCI bridges be involved in PE consumption?
>>>
>>
>>Ok. Will add something like below:
>>
>>if the PE, corresponding to the PCI bus, doesn't contain all the subordinate
>>PCI buses.
>
>
>No, my question was about "PCI bridge windows involved" - what do you do to
>the windows if PE does not own all child buses?
>

This is all about the original implementation: when the PE doesn't include
all child buses, the resources consumed by the PE are the resources assigned
to the current PCI bus, excluding those assigned to the child buses.
Note that PCI bridge windows are actually the PCI bus's resources.

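The new approach in the patch below (collect what a PE consumes from the resources of its member devices, and add child bridge windows only when the PE owns all subordinate buses) can be sketched in userspace as follows. struct sk_dev and pe_consumed() are hypothetical, and the sketch sums sizes instead of mapping segments. One detail worth checking against the patch: PCI_BRIDGE_RESOURCE_NUM is a count (4 in include/linux/pci.h), so the bridge-window loop needs a strict '<'; with '<=' it would index one entry past the last bridge window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard BARs 0..5 plus ROM, i.e. indices 0..PCI_ROM_RESOURCE (6). */
#define SK_NUM_BARS        7
/* Same value as PCI_BRIDGE_RESOURCE_NUM: a count of bridge windows. */
#define SK_NUM_BRIDGE_WIN  4

struct sk_dev {
	bool is_bridge;
	unsigned long bar_size[SK_NUM_BARS];       /* 0 means unused */
	unsigned long win_size[SK_NUM_BRIDGE_WIN]; /* only meaningful for bridges */
};

/* Sum what a bus-based PE consumes from the devices directly on its bus. */
static unsigned long pe_consumed(const struct sk_dev *devs, size_t ndevs,
				 bool owns_all_subordinate_buses)
{
	unsigned long total = 0;

	for (size_t d = 0; d < ndevs; d++) {
		for (int i = 0; i < SK_NUM_BARS; i++)
			total += devs[d].bar_size[i];

		/* Child bridge windows count only if the PE owns every bus below. */
		if (!owns_all_subordinate_buses || !devs[d].is_bridge)
			continue;

		/* Strict '<': the bound is a count, so '<=' would overrun. */
		for (int i = 0; i < SK_NUM_BRIDGE_WIN; i++)
			total += devs[d].win_size[i];
	}
	return total;
}
```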
>>>
>>>>Otherwise,
>>>>the PCI bridge windows still contribute to PE's consumed IO or M32
>>>>segments.
>>>
>>>PCI bridge windows themselves consume PEs? Is that correct?
>>>
>>
>>PCI bridge windows consume IO, M32, M64 segments, not PEs.
>
>Ah, right.
>
>
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 136 +++++++++++++++++-------------
>>>>  1 file changed, 79 insertions(+), 57 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 488a53e..713f4b4 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -2844,75 +2844,97 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>  }
>>>>  #endif /* CONFIG_PCI_IOV */
>>>>
>>>>-/*
>>>>- * This function is supposed to be called on basis of PE from top
>>>>- * to bottom style. So the the I/O or MMIO segment assigned to
>>>>- * parent PE could be overrided by its child PEs if necessary.
>>>>- */
>>>>-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>-				  struct pnv_ioda_pe *pe)
>>>>+static int pnv_ioda_setup_one_res(struct pci_controller *hose,
>>>>+				  struct pnv_ioda_pe *pe,
>>>>+				  struct resource *res)
>>>>  {
>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>  	struct pci_bus_region region;
>>>>-	struct resource *res;
>>>>-	int i, index;
>>>>-	unsigned int segsize;
>>>>+	unsigned int index, segsize;
>>>>  	unsigned long *segmap, *pe_segmap;
>>>>  	uint16_t win;
>>>>  	int64_t rc;
>>>>
>>>>-	/*
>>>>-	 * NOTE: We only care PCI bus based PE for now. For PCI
>>>>-	 * device based PE, for example SRIOV sensitive VF should
>>>>-	 * be figured out later.
>>>>-	 */
>>>>-	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>>>+	/* Check if we need map the resource */
>>>>+	if (!res->parent || !res->flags || res->start > res->end)
>>>
>>>res->start >= res->end ?
>>>
>>
>>No, res->start == res->end is valid.
>>
>>>
>>>>+		return 0;
>>>>
>>>>-	pci_bus_for_each_resource(pe->pbus, res, i) {
>>>>-		if (!res || !res->flags ||
>>>>-		    res->start > res->end)
>>>>-			continue;
>>>>+	if (res->flags & IORESOURCE_IO) {
>>>>+		region.start = res->start - phb->ioda.io_pci_base;
>>>>+		region.end   = res->end - phb->ioda.io_pci_base;
>>>>+		segsize      = phb->ioda.io_segsize;
>>>>+		segmap       = phb->ioda.io_segmap;
>>>>+		pe_segmap    = pe->io_segmap;
>>>>+		win          = OPAL_IO_WINDOW_TYPE;
>>>>+	} else if ((res->flags & IORESOURCE_MEM) &&
>>>>+		   !pnv_pci_is_mem_pref_64(res->flags)) {
>>>>+		region.start = res->start -
>>>>+			       hose->mem_offset[0] -
>>>>+			       phb->ioda.m32_pci_base;
>>>>+		region.end   = res->end -
>>>>+			       hose->mem_offset[0] -
>>>>+			       phb->ioda.m32_pci_base;
>>>>+		segsize      = phb->ioda.m32_segsize;
>>>>+		segmap       = phb->ioda.m32_segmap;
>>>>+		pe_segmap    = pe->m32_segmap;
>>>>+		win          = OPAL_M32_WINDOW_TYPE;
>>>>+	} else {
>>>>+		return 0;
>>>>+	}
>>>>
>>>>-		if (res->flags & IORESOURCE_IO) {
>>>>-			region.start = res->start - phb->ioda.io_pci_base;
>>>>-			region.end   = res->end - phb->ioda.io_pci_base;
>>>>-			segsize      = phb->ioda.io_segsize;
>>>>-			segmap       = phb->ioda.io_segmap;
>>>>-			pe_segmap    = pe->io_segmap;
>>>>-			win          = OPAL_IO_WINDOW_TYPE;
>>>>-		} else if ((res->flags & IORESOURCE_MEM) &&
>>>>-			   !pnv_pci_is_mem_pref_64(res->flags)) {
>>>>-			region.start = res->start -
>>>>-				       hose->mem_offset[0] -
>>>>-				       phb->ioda.m32_pci_base;
>>>>-			region.end   = res->end -
>>>>-				       hose->mem_offset[0] -
>>>>-				       phb->ioda.m32_pci_base;
>>>>-			segsize      = phb->ioda.m32_segsize;
>>>>-			segmap       = phb->ioda.m32_segmap;
>>>>-			pe_segmap    = pe->m32_segmap;
>>>>-			win          = OPAL_M32_WINDOW_TYPE;
>>>>-		} else {
>>>>-			continue;
>>>>+	region.start = _ALIGN_DOWN(region.start, segsize);
>>>>+	region.end   = _ALIGN_UP(region.end, segsize);
>>>>+	index = region.start / segsize;
>>>>+	while (index < phb->ioda.total_pe &&
>>>>+	       region.start < region.end) {
>>>>+		rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>+				pe->pe_number, win, 0, index);
>>>>+		if (rc != OPAL_SUCCESS) {
>>>>+			pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>>>+				__func__, rc, win, index,
>>>>+				pe->phb->hose->global_number,
>>>>+				pe->pe_number);
>>>>+			return -EIO;
>>>>  		}
>>>>
>>>>-		index = region.start / phb->ioda.io_segsize;
>>>>-		while (index < phb->ioda.total_pe &&
>>>>-		       region.start <= region.end) {
>>>>-			set_bit(index, segmap);
>>>>-			set_bit(index, pe_segmap);
>>>>-			rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>-					pe->pe_number, win, 0, index);
>>>>-			if (rc != OPAL_SUCCESS) {
>>>>-				pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
>>>>-					__func__, rc, win, index,
>>>>-					pe->phb->hose->global_number,
>>>>-					pe->pe_number);
>>>>-				break;
>>>>-			}
>>>>+		set_bit(index, segmap);
>>>>+		set_bit(index, pe_segmap);
>>>>+		region.start += segsize;
>>>>+		index++;
>>>>+	}
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>>+				  struct pnv_ioda_pe *pe)
>>>>+{
>>>>+	struct pci_dev *pdev;
>>>>+	struct resource *res;
>>>>+	int i;
>>>>+
>>>>+	/* This function only works for bus dependent PE */
>>>>+	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
>>>>+
>>>>+	list_for_each_entry(pdev, &pe->pbus->devices, bus_list) {
>>>>+		for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>>>>+			res = &pdev->resource[i];
>>>>+			if (pnv_ioda_setup_one_res(hose, pe, res))
>>>>+				return;
>>>>+		}
>>>>+
>>>>+		/* If the PE contains all subordinate PCI buses, the
>>>>+		 * resources of the child bridges should be mapped
>>>>+		 * to the PE as well.
>>>>+		 */
>>>>+		if (!(pe->flags & PNV_IODA_PE_BUS_ALL) ||
>>>>+		    (pdev->class >> 8) != PCI_CLASS_BRIDGE_PCI)
>>>>+			continue;
>>>>
>>>>-			region.start += segsize;
>>>>-			index++;
>>>>+		for (i = 0; i <= PCI_BRIDGE_RESOURCE_NUM; i++) {
>>>>+			res = &pdev->resource[PCI_BRIDGE_RESOURCES + i];
>>>>+			if (pnv_ioda_setup_one_res(hose, pe, res))
>>>>+				return;
>>>>  		}
>>>>  	}
>>>>  }
>>>>
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread
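The segment walk in pnv_ioda_setup_one_res() above turns a bus-relative region into a run of segment indices. A minimal sketch of the same computation, working directly from the inclusive end address used by struct resource, is shown below; seg_range() is a hypothetical helper, not part of the patch. Dividing the inclusive end by the segment size also covers a region that starts and ends on the same byte (start == end, which Gavin notes is valid) even when that byte is segment-aligned, a case where rounding end up and looping while start < end would map nothing.

```c
#include <assert.h>

/*
 * Compute the inclusive range of segment indices touched by the
 * bus-relative region [start, end] (end inclusive, as in struct resource).
 * Indices are clamped to the @total_segs segments of the window.
 * Returns the number of segments, or 0 if none fall inside the window.
 */
static unsigned int seg_range(unsigned long start, unsigned long end,
			      unsigned long segsize, unsigned int total_segs,
			      unsigned int *first, unsigned int *last)
{
	unsigned long lo = start / segsize;
	unsigned long hi = end / segsize;

	if (start > end || lo >= total_segs)
		return 0;
	if (hi >= total_segs)
		hi = total_segs - 1;

	*first = lo;
	*last = hi;
	return hi - lo + 1;
}
```

For example, with 1MB segments a region [0x80000, 0x100000] spans segments 0 and 1, because its last byte is the first byte of segment 1.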

* Re: [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically
  2015-08-10  9:21   ` Alexey Kardashevskiy
@ 2015-08-12 23:57     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 23:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 07:21:12PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>For P7IOC, the whole available DMA32 space, which is below the
>>MEM32 space, is divided evenly into 256MB segments. The number
>>of contiguous segments assigned to one particular PE depends on
>>the PE's DMA weight, which is calculated based on the type of each
>>PCI device contained in the PE, and on the PHB's DMA weight, which is
>>the accumulated DMA weight of the PEs contained in the PHB. It means
>>that the PHB's DMA weight calculation depends on existing PEs,
>>which works perfectly now, but is not hotplug friendly. As the
>>whole available DMA32 space can be assigned to one PE on PHB3,
>>we don't have the issue on PHB3.
>>
>>The patch calculates PHB's DMA weight based on the PCI devices
>>contained in the PHB dynamically so that it's hotplug friendly.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++++++++++++++----------------
>>  arch/powerpc/platforms/powernv/pci.h      |  6 ---
>>  2 files changed, 43 insertions(+), 51 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 713f4b4..7342cfd 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -927,6 +927,9 @@ static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>>
>>  static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>  {
>>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+
>>  	/* This is quite simplistic. The "base" weight of a device
>>  	 * is 10. 0 means no DMA is to be accounted for it.
>>  	 */
>>@@ -939,14 +942,34 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>  	if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>  	    dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>  	    dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>-		return 3;
>>+		return 3 * phb->ioda.tce32_count;
>>
>>  	/* Increase the weight of RAID (includes Obsidian) */
>>  	if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>-		return 15;
>>+		return 15 * phb->ioda.tce32_count;
>>
>>  	/* Default */
>>-	return 10;
>>+	return 10 * phb->ioda.tce32_count;
>>+}
>>+
>>+static int __pnv_ioda_phb_dma_weight(struct pci_dev *pdev, void *data)
>>+{
>>+	unsigned int *dma_weight = data;
>>+
>>+	*dma_weight += pnv_ioda_dma_weight(pdev);
>>+	return 0;
>>+}
>>+
>>+static unsigned int pnv_ioda_phb_dma_weight(struct pnv_phb *phb)
>>+{
>>+	unsigned int dma_weight = 0;
>>+
>>+	if (!phb->hose->bus)
>>+		return 0;
>>+
>>+	pci_walk_bus(phb->hose->bus,
>>+		     __pnv_ioda_phb_dma_weight, &dma_weight);
>>+	return dma_weight;
>>  }
>>
>>  #ifdef CONFIG_PCI_IOV
>>@@ -1097,14 +1120,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	/* Put PE to the list */
>>  	list_add_tail(&pe->list, &phb->ioda.pe_list);
>>
>>-	/* Account for one DMA PE if at least one DMA capable device exist
>>-	 * below the bridge
>>-	 */
>>-	if (pe->dma_weight != 0) {
>>-		phb->ioda.dma_weight += pe->dma_weight;
>>-		phb->ioda.dma_pe_count++;
>>-	}
>>-
>>  	/* Link the PE */
>>  	pnv_ioda_link_pe_by_weight(phb, pe);
>>  }
>>@@ -2431,24 +2446,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int residual, remaining, segs, tw, base;
>>  	struct pnv_ioda_pe *pe;
>>+	unsigned int dma_weight;
>>
>>-	/* If we have more PE# than segments available, hand out one
>>-	 * per PE until we run out and let the rest fail. If not,
>>-	 * then we assign at least one segment per PE, plus more based
>>-	 * on the amount of devices under that PE
>>-	 */
>>-	if (phb->ioda.dma_pe_count > phb->ioda.tce32_count)
>>-		residual = 0;
>>-	else
>>-		residual = phb->ioda.tce32_count -
>>-			phb->ioda.dma_pe_count;
>>-
>>-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>>-		hose->global_number, phb->ioda.tce32_count);
>>-	pr_info("PCI: %d PE# for a total weight of %d\n",
>>-		phb->ioda.dma_pe_count, phb->ioda.dma_weight);
>>+	/* Calculate the PHB's DMA weight */
>>+	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>+	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>>+		hose->global_number, phb->ioda.tce32_count, dma_weight);
>>
>>  	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>@@ -2456,22 +2460,9 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  	 * out one base segment plus any residual segments based on
>>  	 * weight
>>  	 */
>>-	remaining = phb->ioda.tce32_count;
>>-	tw = phb->ioda.dma_weight;
>>-	base = 0;
>>  	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>  		if (!pe->dma_weight)
>>  			continue;
>>-		if (!remaining) {
>>-			pe_warn(pe, "No DMA32 resources available\n");
>>-			continue;
>>-		}
>>-		segs = 1;
>>-		if (residual) {
>>-			segs += ((pe->dma_weight * residual)  + (tw / 2)) / tw;
>>-			if (segs > remaining)
>>-				segs = remaining;
>>-		}
>>
>>  		/*
>>  		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>@@ -2479,17 +2470,24 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>  		 * the specific PE.
>>  		 */
>>  		if (phb->type == PNV_PHB_IODA1) {
>>-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>>+			unsigned int segs, base = 0;
>>+
>>+			if (pe->dma_weight <
>>+			    dma_weight / phb->ioda.tce32_count)
>>+				segs = 1;
>>+			else
>>+				segs = (pe->dma_weight *
>>+					phb->ioda.tce32_count) / dma_weight;
>>+
>>+			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>>  				pe->dma_weight, segs);
>>  			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>+
>>+			base += segs;
>
>
>This is not right. @base here is a local variable in the scope,
>pnv_pci_ioda_setup_dma_pe() will always be called with base==0.
>
>
>Sorry for commenting on the same patch twice.
>

It's fine to comment twice on the same patch. But I don't see
how it's wrong. The function pnv_ioda_setup_dma() is called
as below, and it iterates over all PEs in the PHB's DMA32 list.
That means the function operates on the whole PHB, not on each
PE individually yet, so "base=0" is not a problem.

pnv_pci_ioda_fixup
  pnv_pci_ioda_setup_DMA
    pnv_ioda_setup_dma
>
>>  		} else {
>>  			pe_info(pe, "Assign DMA32 space\n");
>>-			segs = 0;
>>  			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>  		}
>>-
>>-		remaining -= segs;
>>-		base += segs;
>>  	}
>>  }
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 08a4e57..addd3f7 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -183,12 +183,6 @@ struct pnv_phb {
>>  			/* 32-bit TCE tables allocation */
>>  			unsigned long		tce32_count;
>>
>>-			/* Total "weight" for the sake of DMA resources
>>-			 * allocation
>>-			 */
>>-			unsigned int		dma_weight;
>>-			unsigned int		dma_pe_count;
>>-
>>  			/* Sorted list of used PE's, sorted at
>>  			 * boot for resource allocation purposes
>>  			 */
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only
  2015-08-11  2:39       ` Alexey Kardashevskiy
@ 2015-08-12 23:59         ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-12 23:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:39:02PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 10:29 AM, Gavin Shan wrote:
>>On Mon, Aug 10, 2015 at 07:31:11PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>The original implementation of pnv_ioda_setup_dma() iterates the
>>>>list of PEs and configures the DMA32 space for them one by one.
>>>>The function was designed to be called during PHB fixup time.
>>>>When configuring PE's DMA32 space in pcibios_setup_bridge(), in
>>>>order to support PCI hotplug, we have to have the function PE
>>>>oriented.
>>>>
>>>>This renames pnv_ioda_setup_dma() to pnv_ioda1_setup_dma() and
>>>>adds one more argument "struct pnv_ioda_pe *pe" to it. The caller,
>>>>pnv_pci_ioda_setup_DMA(), gets PE from the list and passes to it
>>>>or pnv_pci_ioda2_setup_dma_pe(). The patch shouldn't cause behavioral
>>>>changes.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 75 +++++++++++++++----------------
>>>>  1 file changed, 36 insertions(+), 39 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 8456f37..cd22002 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -2443,52 +2443,29 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>>>  }
>>>>
>>>>-static void pnv_ioda_setup_dma(struct pnv_phb *phb)
>>>>+static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>>>+					struct pnv_ioda_pe *pe,
>>>>+					unsigned int base)
>>>>  {
>>>>  	struct pci_controller *hose = phb->hose;
>>>>-	struct pnv_ioda_pe *pe;
>>>>-	unsigned int dma_weight;
>>>>+	unsigned int dma_weight, segs;
>>>>
>>>>  	/* Calculate the PHB's DMA weight */
>>>>  	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>>>  	pr_info("PCI%04x has %ld DMA32 segments, total weight %d\n",
>>>>  		hose->global_number, phb->ioda.dma32_segcount, dma_weight);
>>>>
>>>>-	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>>>-
>>>>-	/* Walk our PE list and configure their DMA segments, hand them
>>>>-	 * out one base segment plus any residual segments based on
>>>>-	 * weight
>>>>-	 */
>>>>-	list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>>>-		if (!pe->dma32_weight)
>>>>-			continue;
>>>>-
>>>>-		/*
>>>>-		 * For IODA2 compliant PHB3, we needn't care about the weight.
>>>>-		 * The all available 32-bits DMA space will be assigned to
>>>>-		 * the specific PE.
>>>>-		 */
>>>>-		if (phb->type == PNV_PHB_IODA1) {
>>>>-			unsigned int segs, base = 0;
>>>>-
>>>>-			if (pe->dma32_weight <
>>>>-			    dma_weight / phb->ioda.dma32_segcount)
>>>>-				segs = 1;
>>>>-			else
>>>>-				segs = (pe->dma32_weight *
>>>>-					phb->ioda.dma32_segcount) / dma_weight;
>>>>-
>>>>-			pe_info(pe, "DMA32 weight %d, assigned %d segments\n",
>>>>-				pe->dma32_weight, segs);
>>>>-			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>>>+	if (pe->dma32_weight <
>>>>+	    dma_weight / phb->ioda.dma32_segcount)
>>>
>>>Can be one line now.
>>>
>>
>>Indeed.
>>
>>>>+		segs = 1;
>>>>+	else
>>>>+		segs = (pe->dma32_weight *
>>>>+			phb->ioda.dma32_segcount) / dma_weight;
>>>>+	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>>>>+		pe->dma32_weight, segs);
>>>>+	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>>
>>>
>>>Why not to merge pnv_ioda1_setup_dma() to pnv_pci_ioda_setup_dma_pe()?
>>>
>>
>>There're two reasons:
>>- They're separate logically. One is calculating number of DMA32 segments required.
>>   Another one is allocate TCE32 tables and configure devices with them.
>>- In PCI hotplug path, I need pnv_ioda1_setup_dma() which has "pe" as parameter.
>
>
>And hotplug path does not care about dma weight why?
>

PHB3 doesn't care about DMA weight, but P7IOC needs it.
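For reference, the P7IOC (IODA1) weight-to-segment calculation from the hunks quoted in this thread can be written as a small pure function. This is a sketch with a hypothetical name; the kernel code computes the value inline. The example values below (total weight 100, 16 segments) are illustrative only.

```c
#include <assert.h>

/* DMA32 segments given to one PE on IODA1, following the formula in
 * the quoted hunk: a PE whose weight is below the per-segment average
 * gets one segment, otherwise a share proportional to its weight. */
static unsigned int ioda1_dma32_segs(unsigned int pe_weight,
				     unsigned int total_weight,
				     unsigned int segcount)
{
	assert(total_weight > 0 && segcount > 0);
	if (pe_weight < total_weight / segcount)
		return 1;
	return (pe_weight * segcount) / total_weight;
}
```

Because both divisions truncate, a weight exactly equal to total_weight / segcount can still yield 0. For example, weight 6 out of a total of 100 with 16 segments gives 96 / 100 == 0. This is relevant to the segs == 0 question raised later in the thread.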

>>
>>>>
>>>>-			base += segs;
>>>>-		} else {
>>>>-			pe_info(pe, "Assign DMA32 space\n");
>>>>-			pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>>-		}
>>>>-	}
>>>>+	return segs;
>>>>  }
>>>>
>>>>  #ifdef CONFIG_PCI_MSI
>>>>@@ -2955,12 +2932,32 @@ static void pnv_pci_ioda_setup_DMA(void)
>>>>  {
>>>>  	struct pci_controller *hose, *tmp;
>>>>  	struct pnv_phb *phb;
>>>>+	struct pnv_ioda_pe *pe;
>>>>+	unsigned int base;
>>>>
>>>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>>>-		pnv_ioda_setup_dma(hose->private_data);
>>>>+		phb = hose->private_data;
>>>>+		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>>>+
>>>>+		base = 0;
>>>>+		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>>>+			if (!pe->dma32_weight)
>>>>+				continue;
>>>>+
>>>>+			switch (phb->type) {
>>>>+			case PNV_PHB_IODA1:
>>>>+				base += pnv_ioda1_setup_dma(phb, pe, base);
>>>
>>>
>>>This @base handling seems never be tested between 8..11 as "[PATCH v6 11/42]
>>>powerpc/powernv: Trace DMA32 segments consumed by PE"
>>>removes it and I suspect you only tested the final version. Which is ok for
>>>the final result but not ok for bisectability.
>>>
>>>Looks like 8/42, 9/42, 10/42, 11/42 need to be rearranged or merged to remove
>>>this multiple @base touching.
>>>
>>
>>Why ?
>
>You are touching this @base from 8/42 to 11/12 and in between it is very
>broken, you only get it fixed (by removing) in 11/42. Read my comment for
>8/42. After every single patch in any patchset the functionality should not
>break but it does in this patchset.
>

Please refer to the reply to PATCH[8/42] then.


>
>>
>>>
>>>>+				break;
>>>>+			case PNV_PHB_IODA2:
>>>>+				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>>+				break;
>>>>+			default:
>>>>+				pr_warn("%s: No DMA for PHB type %d\n",
>>>>+					__func__, phb->type);
>>>>+			}
>>>>+		}
>>>>
>>>>  		/* Mark the PHB initialization done */
>>>>-		phb = hose->private_data;
>>>>  		phb->initialized = 1;
>>>>  	}
>>>>  }
>>>>
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE
  2015-08-10  9:43   ` Alexey Kardashevskiy
  2015-08-11  0:33     ` Gavin Shan
@ 2015-08-13  0:02     ` Gavin Shan
  1 sibling, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-13  0:02 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Mon, Aug 10, 2015 at 07:43:48PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>On P7IOC, the whole DMA32 space is divided evenly to 256MB segments.
>>Each PE can consume one or multiple DMA32 segments. Current code
>>doesn't trace the available DMA32 segments and those consumed by
>>one particular PE. It's conflicting with PCI hotplug.
>>
>>The patch introduces one bitmap to PHB to trace the available
>>DMA32 segments for allocation, more fields to "struct pnv_ioda_pe"
>>to trace the consumed DMA32 segments by the PE, which is going to
>>be released when the PE is destroyed at PCI unplugging time.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++++++++++++--------
>>  arch/powerpc/platforms/powernv/pci.h      |  4 +++-
>>  2 files changed, 33 insertions(+), 11 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index cd22002..57ba8fd 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1946,6 +1946,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>
>>  	/* Grab a 32-bit TCE table */
>>  	pe->dma32_seg = base;
>>+	pe->dma32_segcount = segs;
>>  	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>  		(base << 28), ((base + segs) << 28) - 1);
>>
>>@@ -2006,8 +2007,13 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->dma32_seg >= 0)
>>+	if (pe->dma32_seg >= 0) {
>>+		bitmap_clear(phb->ioda.dma32_segmap,
>>+			     pe->dma32_seg, pe->dma32_segcount);
>>  		pe->dma32_seg = -1;
>>+		pe->dma32_segcount = 0;
>>+	}
>>+
>>  	if (tce_mem)
>>  		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>  	if (tbl) {
>>@@ -2443,12 +2449,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>>  }
>>
>>-static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>-					struct pnv_ioda_pe *pe,
>>-					unsigned int base)
>>+static void pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>+					struct pnv_ioda_pe *pe)
>>  {
>>  	struct pci_controller *hose = phb->hose;
>>-	unsigned int dma_weight, segs;
>>+	unsigned int dma_weight, base, segs;
>>
>>  	/* Calculate the PHB's DMA weight */
>>  	dma_weight = pnv_ioda_phb_dma_weight(phb);
>>@@ -2461,11 +2466,28 @@ static unsigned int pnv_ioda1_setup_dma(struct pnv_phb *phb,
>>  	else
>>  		segs = (pe->dma32_weight *
>>  			phb->ioda.dma32_segcount) / dma_weight;
>>+
>>+	/*
>>+	 * Allocate DMA32 segments. We might not have enough
>>+	 * resources available. However we expect at least one
>>+	 * to be available.
>>+	 */
>>+	do {
>>+		base = bitmap_find_next_zero_area(phb->ioda.dma32_segmap,
>>+						  phb->ioda.dma32_segcount,
>>+						  0, segs, 0);
>>+		if (base < phb->ioda.dma32_segcount) {
>>+			bitmap_set(phb->ioda.dma32_segmap, base, segs);
>>+			break;
>>+		}
>>+	} while (--segs);
>
>
>If segs==0 before entering the loop, the loop will execute 0xfffffffe times.
>Make it for(;segs;--segs){ }.
>

"segs" is always equal to or bigger than 1 when entering the loop.

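Whether segs can be 0 on entry depends on the weight calculation. The for-loop form suggested above is safe either way. Below is a minimal sketch of the shrink-on-failure allocation written with that loop shape. It uses a hypothetical 64-bit bitmap helper in place of bitmap_find_next_zero_area() and bitmap_set().

```c
#include <assert.h>
#include <stdint.h>

/* First index where 'len' consecutive clear bits start within the low
 * 'nbits' bits of 'map', or 'nbits' if there is no such run.
 * Hypothetical, simplified stand-in for bitmap_find_next_zero_area(). */
static unsigned int find_zero_run(uint64_t map, unsigned int nbits,
				  unsigned int len)
{
	for (unsigned int start = 0; start + len <= nbits; start++) {
		unsigned int i;

		for (i = 0; i < len; i++)
			if (map & (UINT64_C(1) << (start + i)))
				break;
		if (i == len)
			return start;
	}
	return nbits;
}

/* Try to grab 'segs' segments, shrinking the request by one after each
 * failure. With a for loop, segs == 0 on entry skips the body instead
 * of wrapping the unsigned counter. Returns the number of segments
 * obtained (0 if none) and stores their first index in *base. */
static unsigned int alloc_segs(uint64_t *map, unsigned int nbits,
			       unsigned int segs, unsigned int *base)
{
	assert(nbits <= 64);
	for (; segs; --segs) {
		unsigned int b = find_zero_run(*map, nbits, segs);

		if (b < nbits) {
			for (unsigned int i = 0; i < segs; i++)
				*map |= UINT64_C(1) << (b + i);
			*base = b;
			break;
		}
	}
	return segs;
}
```

With this shape, a zero-segment request falls straight through and returns 0. A caller can treat that the same way as the WARN_ON(!segs) path in the patch.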
>>+
>>+	if (WARN_ON(!segs))
>>+		return;
>>+
>>  	pe_info(pe, "DMA weight %d, assigned %d segments\n",
>>  		pe->dma32_weight, segs);
>>  	pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
>>-
>>-	return segs;
>>  }
>>
>>  #ifdef CONFIG_PCI_MSI
>>@@ -2933,20 +2955,18 @@ static void pnv_pci_ioda_setup_DMA(void)
>>  	struct pci_controller *hose, *tmp;
>>  	struct pnv_phb *phb;
>>  	struct pnv_ioda_pe *pe;
>>-	unsigned int base;
>>
>>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>  		phb = hose->private_data;
>>  		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>
>>-		base = 0;
>>  		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>  			if (!pe->dma32_weight)
>>  				continue;
>>
>>  			switch (phb->type) {
>>  			case PNV_PHB_IODA1:
>>-				base += pnv_ioda1_setup_dma(phb, pe, base);
>>+				pnv_ioda1_setup_dma(phb, pe);
>>  				break;
>>  			case PNV_PHB_IODA2:
>>  				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 574fe43..1dc9578 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -65,6 +65,7 @@ struct pnv_ioda_pe {
>>
>>  	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>>  	int			dma32_seg;
>>+	int			dma32_segcount;
>>  	struct iommu_table_group table_group;
>>
>>  	/* 64-bit TCE bypass region */
>>@@ -153,10 +154,11 @@ struct pnv_phb {
>>  			unsigned int		io_segsize;
>>  			unsigned int		io_pci_base;
>>
>>-			/* IO, M32, M64 segment maps */
>>+			/* IO, M32, M64, DMA32 segment maps */
>>  			unsigned long		io_segmap[8];
>>  			unsigned long		m32_segmap[8];
>>  			unsigned long		m64_segmap[8];
>>+			unsigned long		dma32_segmap[8];
>>
>>  			/* PE allocation */
>>  			struct mutex		pe_alloc_mutex;
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity
  2015-08-11  2:47           ` Alexey Kardashevskiy
@ 2015-08-13  0:23             ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-13  0:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:47:25PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 10:38 AM, Gavin Shan wrote:
>>On Mon, Aug 10, 2015 at 07:53:02PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>Each PHB maintains an array helping to translate RID (Request
>>>>ID) to PE# with the assumption that PE# takes 8 bits, indicating
>>>>that we can't have more than 256 PEs. However, pci_dn->pe_number
>>>>already had 4-bytes for the PE#.
>>>>
>>>>The patch extends the PE# capacity so that each of them will be
>>>>4-bytes long. Then we can use IODA_INVALID_PE to check one entry
>>>>in phb->pe_rmap[] is valid or not.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++--
>>>>  arch/powerpc/platforms/powernv/pci.h      | 7 +++----
>>>>  2 files changed, 9 insertions(+), 6 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 57ba8fd..3094c61 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -786,7 +786,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>>>
>>>>  	/* Clear the reverse map */
>>>>  	for (rid = pe->rid; rid < rid_end; rid++)
>>>>-		phb->ioda.pe_rmap[rid] = 0;
>>>>+		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>>>
>>>>  	/* Release from all parents PELT-V */
>>>>  	while (parent) {
>>>>@@ -3134,7 +3134,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>  	unsigned long size, pemap_off;
>>>>  	const __be64 *prop64;
>>>>  	const __be32 *prop32;
>>>>-	int len;
>>>>+	int len, i;
>>>>  	u64 phb_id;
>>>>  	void *aux;
>>>>  	long rc;
>>>>@@ -3201,6 +3201,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>>  	if (prop32)
>>>>  		phb->ioda.reserved_pe = be32_to_cpup(prop32);
>>>>
>>>>+	/* Invalidate RID to PE# mapping */
>>>>+	for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>>>>+		phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>>>>+
>>>>  	/* Parse 64-bit MMIO range */
>>>>  	pnv_ioda_parse_m64_window(phb);
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>>>index 1dc9578..6f8568e 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci.h
>>>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>>>@@ -175,11 +175,10 @@ struct pnv_phb {
>>>>  			struct list_head	pe_list;
>>>>  			struct mutex            pe_list_mutex;
>>>>
>>>>-			/* Reverse map of PEs, will have to extend if
>>>>-			 * we are to support more than 256 PEs, indexed
>>>>-			 * bus { bus, devfn }
>>>>+			/* Reverse map of PEs, indexed by
>>>>+			 * { bus, devfn }
>>>>  			 */
>>>>-			unsigned char		pe_rmap[0x10000];
>>>>+			int			pe_rmap[0x10000];
>>>
>>>
>>>256k seems to be waste when only tiny fraction of it will ever be used. Using
>>>include/linux/hashtable.h makes sense here, and if you use a hashtable, you
>>>won't have to initialize anything with IODA_INVALID_PE.
>>>
>>
>>I'm not sure if I follow your idea completely. With hash table to trace
>>RID mapping here, won't more memory needed if all PCI buse numbers (0
>>to 255) are all valid? It means hash table doesn't have advantage in
>>memory consumption.
>
>You need 3 bytes - one for a bus and two for devfn - which makes it a perfect
>32bit has key and you only store existing devices in a hash so you do not
>waste memory.
>

You haven't answered my concern yet: more memory will be needed if all PCI bus
numbers (0 to 255) are valid. Also, 2 bytes are enough: one byte for the bus
number and another for devfn. Why do we need 3 bytes here?

How many of the 16 bits (2 bytes) would be used as the hash key? I believe it
shouldn't be all of them, because a lot of memory would be consumed by the hash
bucket heads. In most cases we have bus-level PEs, so it sounds reasonable to
use "devfn", which is one byte long, as the hash key. In this case, 2KB
(256 * 8) is used for the hash bucket heads before any node is populated in
the table.

Every node would be represented by the data struct below, each consuming
24 bytes. If the PHB has 5 PCI buses, which is common, the total memory
consumed will be:

2KB for the hash bucket heads
30KB for the hash nodes (24 * 256 * 5)

struct pnv_ioda_rid {
       int bdfn;
       int pe_number;
       struct hlist_node node;
};

Don't forget that maintaining the list of colliding entries in one bucket
adds complexity. So I don't see the benefit of using a hashtable here.
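The figures in this exchange can be checked with some simple arithmetic. The sketch below assumes an LP64 target (8-byte pointers, 4-byte int) and uses a local stand-in for the kernel's struct hlist_node. The sizes are computed from those assumptions, not measured on a real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's struct hlist_node (two pointers). */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Node type proposed in the reply above. */
struct pnv_ioda_rid {
	int bdfn;
	int pe_number;
	struct hlist_node node;
};

/* Bytes for a flat RID -> PE# array with 'entry' bytes per entry,
 * one entry per possible {bus, devfn}. */
static size_t array_bytes(size_t entry)
{
	return 0x10000 * entry;
}

/* Bytes for a hashtable with 'buckets' heads (an hlist head is a single
 * pointer) plus one node per populated RID. */
static size_t hash_bytes(size_t buckets, size_t nodes)
{
	return buckets * sizeof(struct hlist_node *) +
	       nodes * sizeof(struct pnv_ioda_rid);
}
```

Under those assumptions:

- the byte-per-entry array is 64KB;
- the int array is 256KB;
- a devfn-keyed hashtable for five fully populated buses is 2KB + 30KB, matching the estimate above.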

>
>>On the other hand, searching in hash table buckets
>>have to iterate list of conflicting items (keys), which is slow comparing
>>to what we have.
>
>How often do you expect this code to execute? Is not it setup-type and
>hotplug only? Unless it is thousands times per second, it is not an issue
>here.
>

What I meant is that a hashtable is more complex than an array, and the data
structure here can be as simple as an array, so I don't see why we should
bother with a hashtable. However, you're correct that the code only runs at
system boot and hotplug time.

>>Actually, I like the idea, using array to map RID to PE#,
>>which was implemented by Ben.
>
>Where?
>

The array (unsigned char pe_rmap[0x10000]) was introduced by Ben. I think
it's simple enough for the simple job it does: translating RID to PE number.

>>
>>>
>>>>
>>>>  			/* 32-bit TCE tables allocation */
>>>>  			unsigned long		dma32_segcount;
>>>>
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order
  2015-08-11  2:50       ` Alexey Kardashevskiy
@ 2015-08-13  0:28         ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-13  0:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Tue, Aug 11, 2015 at 12:50:33PM +1000, Alexey Kardashevskiy wrote:
>On 08/11/2015 10:43 AM, Gavin Shan wrote:
>>On Tue, Aug 11, 2015 at 12:39:02AM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>>>The available PE#, represented by a bitmap in the PHB, is allocated
>>>>in ascending order.
>>>
>>>Available PE# is available exactly because it is not allocated ;)
>>>
>>
>>Yeah, will correct it.
>>
>>>>It conflicts with the fact that M64 segments are
>>>>assigned in same order. In order to avoid the conflict, the patch
>>>>allocates PE# in descending order.
>>>
>>>What kind of conflict?
>>>
>>
>>On PHB3, the M64 segment is assigned to one PE whose PE number is
>>determined. M64 segment are allocated in ascending order. It's why
>>I would like to allocate PE# in deascending order.
>
>
>From previous lessons, I thought M64 segment number is PE# number as well :-/
>Seems this is not the case, so what does store this seg#<->PE# mapping in PHB?
>

Your understanding is partly correct. Let me explain in more detail. Taking
PHB3 as an example: it has 16 M64 BARs. The last one (BAR 15) runs in shared
mode. When a segment from this BAR is assigned to a PE, the PE number is fixed
and equals the segment number. However, it's still possible for one PE to have
multiple segments; we have "master" and "slave" PEs for the latter case.

If any of the remaining BARs (0 to 14) runs in single mode and is assigned to
one particular PE, the PE number can be configured.
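Here is a simplified model of the allocation policy described in this thread. PE numbers fixed by shared-mode M64 segments are claimed from the bottom, so dynamically allocated PE numbers are taken from the top. The array, its size and the function names are hypothetical. Unlike the kernel's test_and_set_bit() loop, this sketch is not safe against concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>

#define TOTAL_PE 16	/* hypothetical PHB with 16 PE numbers */

static bool pe_used[TOTAL_PE];

/* Reserve a specific PE#, e.g. one fixed by a shared-mode M64 segment
 * (segment number == PE number). Fails if it is already taken. */
static bool reserve_pe(int pe)
{
	assert(pe >= 0 && pe < TOTAL_PE);
	if (pe_used[pe])
		return false;
	pe_used[pe] = true;
	return true;
}

/* Allocate a free PE# from the top down, so it tends not to collide
 * with PE numbers that M64 segments claim from the bottom up. */
static int alloc_pe_desc(void)
{
	for (int pe = TOTAL_PE - 1; pe >= 0; pe--) {
		if (!pe_used[pe]) {
			pe_used[pe] = true;
			return pe;
		}
	}
	return -1;	/* no PE# left */
}
```

Counting down with a signed index also avoids relying on unsigned wrap-around to detect the end of the search. The `--limit >= total_pe_num` test in the patch does rely on it.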

>>
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>---
>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 56b058c..1c950e8 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -161,13 +161,18 @@ static struct pnv_ioda_pe *pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>  static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>>>  {
>>>>  	unsigned long pe;
>>>>+	unsigned long limit = phb->ioda.total_pe_num - 1;
>>>>
>>>>  	do {
>>>>  		pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>>>-					phb->ioda.total_pe_num, 0);
>>>>-		if (pe >= phb->ioda.total_pe_num)
>>>>+					phb->ioda.total_pe_num, limit);
>>>>+		if (pe < phb->ioda.total_pe_num &&
>>>>+		    !test_and_set_bit(pe, phb->ioda.pe_alloc))
>>>>+			break;
>>>>+
>>>>+		if (--limit >= phb->ioda.total_pe_num)
>>>>  			return NULL;
>>>>-	} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>>>+	} while (1);
>>>
>>>
>>>Usually, if it is "while(1)", then it is "while(1){}" rather than
>>>"do{}while(1)" :)
>>
>>Agree, will change it.
>>
>>>
>>>
>>>>
>>>>  	return pnv_ioda_init_pe(phb, pe);
>>>>  }
>>>>
>>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically
  2015-08-11 13:03   ` Alexey Kardashevskiy
@ 2015-08-13  0:54     ` Gavin Shan
  0 siblings, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-08-13  0:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, devicetree, linux-pci, panto,
	grant.likely, robherring2, bhelgaas

On Tue, Aug 11, 2015 at 11:03:40PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>This adds the refcount to PE, which represents number of PCI
>>devices contained in the PE. When last device leaves from the
>>PE, the PE together with its consumed resources (IO, DMA, PELTM,
>>PELTV) are released, to support PCI hotplug.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 233 +++++++++++++++++++++++++++---
>>  arch/powerpc/platforms/powernv/pci.h      |   3 +
>>  2 files changed, 217 insertions(+), 19 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index d2697a3..13d8a5b 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -132,6 +132,53 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>  		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>  }
>>
>>+static void pnv_pci_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
>
>Is this ioda1 helper or common helper for both ioda1 and ioda2?
>

It's for IODA1 only.

>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	struct iommu_table *tbl;
>>+	int seg;
>>+	int64_t rc;
>>+
>>+	/* No DMA32 segments allocated */
>>+	if (pe->dma32_seg == PNV_INVALID_SEGMENT ||
>>+	    pe->dma32_segcount <= 0) {
>
>
>dma32_segcount is unsigned long, cannot be less than 0.
>

It's "int dma32_segcount" in pci.h:

>>+		pe->dma32_seg = PNV_INVALID_SEGMENT;
>>+		pe->dma32_segcount = 0;
>>+		return;
>>+	}
>>+
>>+	/* Unlink IOMMU table from group */
>>+	tbl = pe->table_group.tables[0];
>>+	pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
>>+	if (pe->table_group.group) {
>>+		iommu_group_put(pe->table_group.group);
>>+		BUG_ON(pe->table_group.group);
>>+	}
>>+
>>+	/* Release IOMMU table */
>>+	free_pages(tbl->it_base,
>>+		get_order(TCE32_TABLE_SIZE * pe->dma32_segcount));
>>+	iommu_free_table(tbl,
>>+		of_node_full_name(pci_bus_to_OF_node(pe->pbus)));
>
>There is pnv_pci_ioda2_table_free_pages(), use it.
>

The function (pnv_pci_ioda_release_pe_dma()) is for IODA1 only.

>>+
>>+	/* Disable TVE */
>>+	for (seg = pe->dma32_seg;
>>+	     seg < pe->dma32_seg + pe->dma32_segcount;
>>+	     seg++) {
>>+		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>+				pe->pe_number, seg, 0, 0ul, 0ul, 0ul);
>>+		if (rc)
>>+			pe_warn(pe, "Error %ld unmapping DMA32 seg#%d\n",
>>+				rc, seg);
>>+	}
>
>May be implement iommu_table_group_ops::unset_window for IODA1 too?
>

Good point, but it's out of scope here. I'm putting it on my TODO list and
will cook up a patch when I have a chance.

>>+
>>+	/* Free the DMA32 segments */
>>+	bitmap_clear(phb->ioda.dma32_segmap,
>>+		pe->dma32_seg, pe->dma32_segcount);
>>+	pe->dma32_seg = PNV_INVALID_SEGMENT;
>>+	pe->dma32_segcount = 0;
>>+}
>>+
>>  static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
>>  {
>>  	/* 01xb - invalidate TCEs that match the specified PE# */
>>@@ -199,13 +246,15 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>>  		pe->tce_bypass_enabled = enable;
>>  }
>>
>>-#ifdef CONFIG_PCI_IOV
>>-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
>>-					 struct pnv_ioda_pe *pe)
>>+static void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
>>  {
>>  	struct iommu_table    *tbl;
>>+	struct device_node    *dn;
>>  	int64_t               rc;
>>
>>+	if (pe->dma32_seg == PNV_INVALID_SEGMENT)
>>+		return;
>>+
>>  	tbl = pe->table_group.tables[0];
>>  	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>  	if (rc)
>>@@ -216,10 +265,91 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev,
>>  		iommu_group_put(pe->table_group.group);
>>  		BUG_ON(pe->table_group.group);
>>  	}
>>+
>>+	if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
>>+		dn = pci_bus_to_OF_node(pe->pbus);
>>+	else if (pe->flags & PNV_IODA_PE_DEV)
>>+		dn = pci_device_to_OF_node(pe->pdev);
>>+#ifdef CONFIG_PCI_IOV
>>+	else if (pe->flags & PNV_IODA_PE_VF)
>>+		dn = pci_device_to_OF_node(pe->parent_dev);
>>+#endif
>>+	else
>>+		dn = NULL;
>>+
>>  	pnv_pci_ioda2_table_free_pages(tbl);
>>-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>+	iommu_free_table(tbl, of_node_full_name(dn));
>>+	pe->dma32_seg = PNV_INVALID_SEGMENT;
>>+}
>
>
>
>I'd drop the chunk about calculating @dn above, nobody really cares what
>iommu_free_table() prints. If you really need to print something, print PE#.
>

Makes sense. I'll drop that chunk and print the PE number
instead.

>>+
>>+static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+
>>+	switch (phb->type) {
>>+	case PNV_PHB_IODA1:
>>+		pnv_pci_ioda_release_pe_dma(pe);
>>+		break;
>>+	case PNV_PHB_IODA2:
>>+		pnv_pci_ioda2_release_pe_dma(pe);
>>+		break;
>>+	default:
>>+		pr_warn("%s: Cannot release DMA for PHB type %d\n",
>>+			__func__, phb->type);
>
>This is BUG_ON() indeed because we cannot possibly get that far with
>unsupported PHB type, it would have crashed earlier.
>

Right. I'll use BUG_ON() then.

>>+	}
>>+}
>>+
>>+static void pnv_ioda_release_pe_one_seg(struct pnv_ioda_pe *pe, int win)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	unsigned long *segmap = NULL;
>>+	unsigned long *pe_segmap = NULL;
>>+	int segno, limit, mod = 0;
>>+
>>+	switch (win) {
>>+	case OPAL_IO_WINDOW_TYPE:
>>+		segmap = phb->ioda.io_segmap;
>>+		pe_segmap = pe->io_segmap;
>>+		break;
>>+	case OPAL_M32_WINDOW_TYPE:
>>+		segmap = phb->ioda.m32_segmap;
>>+		pe_segmap = pe->m32_segmap;
>>+		break;
>>+	case OPAL_M64_WINDOW_TYPE:
>>+		if (phb->type != PNV_PHB_IODA1)
>>+			return;
>>+		segmap = phb->ioda.m64_segmap;
>>+		pe_segmap = pe->m64_segmap;
>
>
>You seem to keep phb->ioda.m64_segmap update but you never actually read it,
>you only read pe->m64_segmap. Is that correct or I am missing something here?
>
>

You're correct to some extent. There are two reasons to have phb->ioda.m64_segmap,
listed below. However, you suggested using a hashtable to represent the segment
mapping, which isn't finalized yet:

- It tracks the used M64 segments across the whole PHB, which makes debugging easier.
- It prevents the same segment from being reserved twice.
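The two reasons above can be shown with a simplified model. A PHB-wide map records every segment in use, and each PE keeps its own map so it can release its segments. The struct names and the 64-segment limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model: at most 64 segments per window. */
struct phb_model { uint64_t segmap; };	/* segments used by any PE */
struct pe_model  { uint64_t segmap; };	/* segments owned by this PE */

/* Give segment 'seg' to 'pe'; refuse if any PE already holds it.
 * The PHB-wide map makes this double-reservation check cheap. */
static bool reserve_seg(struct phb_model *phb, struct pe_model *pe, int seg)
{
	uint64_t bit;

	assert(seg >= 0 && seg < 64);
	bit = UINT64_C(1) << seg;
	if (phb->segmap & bit)
		return false;
	phb->segmap |= bit;
	pe->segmap |= bit;
	return true;
}

/* On release, use the PE's own map to clear its bits in both maps. */
static void release_segs(struct phb_model *phb, struct pe_model *pe)
{
	phb->segmap &= ~pe->segmap;
	pe->segmap = 0;
}
```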

>>+		mod = 8;
>>+		break;
>>+	default:
>>+		return;
>>+	}
>>+
>>+	segno = -1;
>>+	limit = phb->ioda.total_pe_num;
>>+	while ((segno = find_next_bit(pe_segmap, limit, segno + 1)) < limit) {
>>+		if (mod > 0)
>>+			opal_pci_map_pe_mmio_window(phb->opal_id,
>>+				phb->ioda.reserved_pe_idx, win,
>>+				segno / mod, segno % mod);
>>+		else
>>+			opal_pci_map_pe_mmio_window(phb->opal_id,
>>+					phb->ioda.reserved_pe_idx, win,
>>+					0, segno);
>>+
>>+		clear_bit(segno, pe_segmap);
>>+		clear_bit(segno, segmap);
>>+	}
>>+}
>>+
>>+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
>>+{
>>+	int win;
>>+
>>+	for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++)
>>+		pnv_ioda_release_pe_one_seg(pe, win);
>>  }
>>-#endif /* CONFIG_PCI_IOV */
>>
>>  static int pnv_ioda_set_one_peltv(struct pnv_phb *phb,
>>  				  struct pnv_ioda_pe *parent,
>>@@ -325,7 +455,6 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>>  	return 0;
>>  }
>>
>>-#ifdef CONFIG_PCI_IOV
>>  static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  {
>>  	struct pci_dev *parent;
>>@@ -373,9 +502,11 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>  		}
>>  		rid_end = pe->rid + (count << 8);
>>  	} else {
>>+#ifdef CONFIG_PCI_IOV
>>  		if (pe->flags & PNV_IODA_PE_VF)
>>  			parent = pe->parent_dev;
>>  		else
>>+#endif
>>  			parent = pe->pdev->bus->self;
>>  		bcomp = OpalPciBusAll;
>>  		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
>>@@ -415,11 +546,72 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>
>>  	pe->pbus = NULL;
>>  	pe->pdev = NULL;
>>+#ifdef CONFIG_PCI_IOV
>>  	pe->parent_dev = NULL;
>>+#endif
>>
>>  	return 0;
>>  }
>>-#endif /* CONFIG_PCI_IOV */
>>+
>>+static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
>>+{
>>+	struct pnv_phb *phb = pe->phb;
>>+	struct pnv_ioda_pe *tmp, *slave;
>>+
>>+	/* Release slave PEs in compound PE */
>>+	if (pe->flags & PNV_IODA_PE_MASTER) {
>>+		list_for_each_entry_safe(slave, tmp, &pe->slaves, list)
>>+			pnv_ioda_release_pe(pe);
>>+	}
>>+
>>+	/* Remove the PE from the list */
>>+	list_del(&pe->list);
>>+
>>+	/* Release resources */
>>+	pnv_ioda_release_pe_dma(pe);
>>+	pnv_ioda_release_pe_seg(pe);
>>+	pnv_ioda_deconfigure_pe(pe->phb, pe);
>>+
>>+	/* Release PE number */
>>+	clear_bit(pe->pe_number, phb->ioda.pe_alloc);
>>+}
>>+
>>+static inline struct pnv_ioda_pe *pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
>>+{
>>+	if (!pe)
>>+		return NULL;
>>+
>>+	pe->device_count++;
>>+	return pe;
>>+}
>>+
>>+static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
>>+{
>>+	if (!pe)
>>+		return;
>>+
>>+	pe->device_count--;
>>+	BUG_ON(pe->device_count < 0);
>>+	if (pe->device_count == 0)
>>+		pnv_ioda_release_pe(pe);
>>+}
>
>Sure you do not want atomic_t for device_count? Races are impossibe here?
>

Yes, I don't see any possible race. Also, it's what you suggested. Here's
the comment you gave:

 | You do not need kref here. You call kref_put() in a single location and can do
 | stuff directly, without kref. Just have an "unsigned int" counter and that's
 | it (it does not even have to be atomic if you do not have races but I am not
 | sure you do not).
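Below is a trimmed-down sketch of the get/put scheme under discussion, using a plain int counter. It is only correct if every get/put call is serialised by the caller, which is the assumption stated here. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down PE: only what the refcount needs. */
struct pe {
	int device_count;	/* number of PCI devices in this PE */
	bool released;
};

/* Stands in for freeing DMA, segments and the PE number. */
static void pe_release(struct pe *pe)
{
	pe->released = true;
}

static struct pe *pe_get(struct pe *pe)
{
	if (!pe)
		return NULL;
	pe->device_count++;
	return pe;
}

/* Plain int, not atomic_t: safe only if callers serialise get/put. */
static void pe_put(struct pe *pe)
{
	if (!pe)
		return;
	pe->device_count--;
	assert(pe->device_count >= 0);	/* BUG_ON() in the patch */
	if (pe->device_count == 0)
		pe_release(pe);
}
```

If get and put could ever run concurrently, for example from unrelated hotplug paths, the counter would need to be an atomic_t or be protected by a lock.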

>>+
>>+static void pnv_pci_release_device(struct pci_dev *pdev)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>+	struct pnv_ioda_pe *pe;
>>+
>>+	if (pdev->is_virtfn)
>>+		return;
>>+
>>+	if (!pdn || pdn->pe_number == IODA_INVALID_PE)
>>+		return;
>>+
>>+	pe = &phb->ioda.pe_array[pdn->pe_number];
>>+	pnv_ioda_pe_put(pe);
>>+}
>>
>>  static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
>>  {
>>@@ -466,6 +658,7 @@ static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  	return pnv_ioda_init_pe(phb, pe);
>>  }
>>
>>+#ifdef CONFIG_PCI_IOV
>>  static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>
>The name of pnv_ioda_free_pe() suggests it should work for non-SRIOV case too
>but you put it under #ifdef IOV, is that correct? Is so, rename it please.
>

It's used by SRIOV code only. I'll rename it to pnv_ioda_free_vf_pe() in
a separate patch.

>
>>  {
>>  	WARN_ON(phb->ioda.pe_array[pe].pdev);
>>@@ -473,6 +666,7 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
>>  	memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
>>  	clear_bit(pe, phb->ioda.pe_alloc);
>>  }
>>+#endif
>>
>>  static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>  {
>>@@ -1177,6 +1371,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  		if (pdn->pe_number != IODA_INVALID_PE)
>>  			continue;
>>
>>+		pnv_ioda_pe_get(pe);
>>  		pdn->pe_number = pe->pe_number;
>>  		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>@@ -1231,7 +1426,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  	pe->pbus = bus;
>>  	pe->pdev = NULL;
>>-	pe->dma32_seg = -1;
>>+	pe->dma32_seg = PNV_INVALID_SEGMENT;
>>  	pe->mve_number = -1;
>>  	pe->rid = bus->busn_res.start << 8;
>>  	pe->dma32_weight = 0;
>>@@ -1244,9 +1439,8 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  			bus->busn_res.start, pe->pe_number);
>>
>>  	if (pnv_ioda_configure_pe(phb, pe)) {
>>-		/* XXX What do we do here ? */
>>-		pnv_ioda_free_pe(phb, pe->pe_number);
>>  		pe->pbus = NULL;
>>+		pnv_ioda_release_pe(pe);
>>  		return NULL;
>>  	}
>>
>>@@ -1449,14 +1643,14 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  		if ((pe->flags & PNV_IODA_PE_MASTER) &&
>>  		    (pe->flags & PNV_IODA_PE_VF)) {
>>  			list_for_each_entry_safe(s, sn, &pe->slaves, list) {
>>-				pnv_pci_ioda2_release_dma_pe(pdev, s);
>>+				pnv_pci_ioda2_release_dma_pe(s);
>>  				list_del(&s->list);
>>  				pnv_ioda_deconfigure_pe(phb, s);
>>  				pnv_ioda_free_pe(phb, s->pe_number);
>>  			}
>>  		}
>>
>>-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
>>+		pnv_pci_ioda2_release_pe_dma(pe);
>>
>>  		/* Remove from list */
>>  		mutex_lock(&phb->ioda.pe_list_mutex);
>>@@ -1532,7 +1726,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  		pe->flags = PNV_IODA_PE_VF;
>>  		pe->pbus = NULL;
>>  		pe->parent_dev = pdev;
>>-		pe->dma32_seg = -1;
>>+		pe->dma32_seg = PNV_INVALID_SEGMENT;
>
>
>This and similar changes are not really about "Release PEs dynamically".
>

Agreed. I'll split the patch and move these and similar changes into a
separate patch.

>
>>  		pe->mve_number = -1;
>>  		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
>>  			   pci_iov_virtfn_devfn(pdev, vf_index);
>>@@ -1995,7 +2189,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>
>>  	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->dma32_seg >= 0))
>>+	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
>>  		return;
>>
>>  	tbl = pnv_pci_table_alloc(phb->hose->node);
>>@@ -2066,10 +2260,10 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>  	return;
>>   fail:
>>  	/* XXX Failure: Try to fallback to 64-bit only ? */
>>-	if (pe->dma32_seg >= 0) {
>>+	if (pe->dma32_seg != PNV_INVALID_SEGMENT) {
>>  		bitmap_clear(phb->ioda.dma32_segmap,
>>  			     pe->dma32_seg, pe->dma32_segcount);
>>-		pe->dma32_seg = -1;
>>+		pe->dma32_seg = PNV_INVALID_SEGMENT;
>>  		pe->dma32_segcount = 0;
>>  	}
>>
>>@@ -2416,7 +2610,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>  	int64_t rc;
>>
>>  	/* We shouldn't already have a 32-bit DMA associated */
>>-	if (WARN_ON(pe->dma32_seg >= 0))
>>+	if (WARN_ON(pe->dma32_seg != PNV_INVALID_SEGMENT))
>>  		return;
>>
>>  	/* TVE #1 is selected by PCI address bit 59 */
>>@@ -2443,8 +2637,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>>
>>  	rc = pnv_pci_ioda2_setup_default_config(pe);
>>  	if (rc) {
>>-		if (pe->dma32_seg >= 0)
>>-			pe->dma32_seg = -1;
>>+		if (pe->dma32_seg != PNV_INVALID_SEGMENT)
>>+			pe->dma32_seg = PNV_INVALID_SEGMENT;
>>  		return;
>>  	}
>>
>>@@ -3183,6 +3377,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>         .teardown_msi_irqs = pnv_teardown_msi_irqs,
>>  #endif
>>         .enable_device_hook = pnv_pci_enable_device_hook,
>>+	.release_device = pnv_pci_release_device,
>>         .window_alignment = pnv_pci_window_alignment,
>>  	.setup_bridge = pnv_pci_setup_bridge,
>>         .reset_secondary_bus = pnv_pci_reset_secondary_bus,
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index f8e6022..2058f06 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -25,11 +25,14 @@ enum pnv_phb_model {
>>  #define PNV_IODA_PE_SLAVE	(1 << 4)	/* Slave PE in compound case	*/
>>  #define PNV_IODA_PE_VF		(1 << 5)	/* PE for one VF 		*/
>>
>>+#define PNV_INVALID_SEGMENT	(-1)
>>+
>>  /* Data associated with a PE, including IOMMU tracking etc.. */
>>  struct pnv_phb;
>>  struct pnv_ioda_pe {
>>  	unsigned long		flags;
>>  	struct pnv_phb		*phb;
>>+	int			device_count;
>>
>>  	/* A PE can be associated with a single device or an
>>  	 * entire bus (& children). In the former case, pdev

Thanks,
Gavin


* Re: [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically
  2015-08-06  4:11 ` [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically Gavin Shan
@ 2015-08-14 13:52   ` Alexey Kardashevskiy
  2015-08-15  4:59     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-14 13:52 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> Currently, the PEs and their associated resources are assigned
> in ppc_md.pcibios_fixup() except those consumed by SRIOV VFs.
> The function is called for once after PCI probing and resources
> assignment is finished which isn't hotplug friendly.
>
> The patch creates PEs dynamically by ppc_md.pcibios_setup_bridge(),
> which is called on the event during system bootup and PCI hotplug:
> updating PCI bridge's windows after resource assignment/reassignment
> are finished. For partial hotplug case, where not all PCI devices
> belonging to the PE are unplugged and plugged again, we just need
> unbinding/binding the affected PCI devices with the corresponding
> PE without creating new one.
>
> Besides, it might require additional resources (e.g. M32) to the
> windows of the PCI bridge when unplugging current adapter, and
> insert a different adapter if there is one PCI slot, which is
> assumed behind root port, or the downstream bridge of the PCIE
> switch behind root port. The parent bridge of the newly plugged
> adapter would reject the request to add more resources, leading
> to hotplug failure. For the issue, the patch extends the windows
> of root port, or the upstream port of the PCIe switch behind root
> port to PHB's windows when ppc_md.pcibios_setup_bridge() is called.
>
> There is no upstream bridge for root bus, so we have to fix it up
> before any PE is created because the root bus PE is the ancestor
> to anyone else.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 226 ++++++++++++++++++------------
>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>   2 files changed, 137 insertions(+), 90 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8aa6ab8..37847a3 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1083,6 +1083,13 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>   				pci_name(dev));
>   			continue;
>   		}
> +
> +		/* The PCI device might be not detached from the
> +		 * PE in partial hotplug case.
> +		 */
> +		if (pdn->pe_number != IODA_INVALID_PE)
> +			continue;
> +
>   		pdn->pe_number = pe->pe_number;
>   		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> @@ -1101,9 +1108,27 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	struct pci_controller *hose = pci_bus_to_host(bus);
>   	struct pnv_phb *phb = hose->private_data;
>   	struct pnv_ioda_pe *pe = NULL;
> +	int pe_num;
> +
> +	/* For partial hotplug case, the PE instance hasn't been destroyed
> +	 * yet. We shouldn't allocated a new one and assign resources to
> +	 * it. The existing PE instance should be reused, but we should
> +	 * associate the devices to the PE.
> +	 */
> +	pe_num = phb->ioda.pe_rmap[bus->number << 8];
> +	if (pe_num != IODA_INVALID_PE) {
> +		pe = &phb->ioda.pe_array[pe_num];
> +		pnv_ioda_setup_same_PE(bus, pe);
> +		return NULL;
> +	}
> +
> +	/* PE number for root bus should have been reserved */
> +	if (pci_is_root_bus(bus) &&
> +	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
> +		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
>
>   	/* Check if PE is determined by M64 */
> -	if (phb->pick_m64_pe)
> +	if (!pe && phb->pick_m64_pe)


else if (phb->pick_m64_pe)
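Here is a sketch of the PE selection order with the suggested `else if`. It uses integer PE numbers instead of struct pointers. The function `pick_pe()` and its parameters are hypothetical. `m64_pe` stands for whatever `phb->pick_m64_pe()` would return, or INVALID_PE when there is no callback or it pins nothing.

```c
#include <assert.h>

#define INVALID_PE (-1)	/* stand-in for IODA_INVALID_PE */

/*
 * Hypothetical model of pnv_ioda_setup_bus_PE(): a reserved root PE wins,
 * otherwise M64 may pin the PE number, otherwise a free PE is allocated.
 * "else if" makes the first two choices exclusive without re-testing !pe.
 */
static int pick_pe(int is_root_bus, int root_pe_idx, int m64_pe, int free_pe)
{
	int pe = INVALID_PE;

	if (is_root_bus && root_pe_idx != INVALID_PE)
		pe = root_pe_idx;
	else if (m64_pe != INVALID_PE)
		pe = m64_pe;

	/* "The PE number isn't pinned by M64": fall back to allocation */
	if (pe == INVALID_PE)
		pe = free_pe;
	return pe;
}
```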



>   		pe = phb->pick_m64_pe(bus, all);
>
>   	/* The PE number isn't pinned by M64 */
> @@ -1150,46 +1175,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>   	return pe;
>   }
>
> -static void pnv_ioda_setup_PEs(struct pci_bus *bus)
> -{
> -	struct pci_dev *dev;
> -
> -	pnv_ioda_setup_bus_PE(bus, false);
> -
> -	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		if (dev->subordinate) {
> -			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
> -				pnv_ioda_setup_bus_PE(dev->subordinate, true);
> -			else
> -				pnv_ioda_setup_PEs(dev->subordinate);
> -		}
> -	}
> -}
> -
> -/*
> - * Configure PEs so that the downstream PCI buses and devices
> - * could have their associated PE#. Unfortunately, we didn't
> - * figure out the way to identify the PLX bridge yet. So we
> - * simply put the PCI bus and the subordinate behind the root
> - * port to PE# here. The game rule here is expected to be changed
> - * as soon as we can detected PLX bridge correctly.
> - */
> -static void pnv_pci_ioda_setup_PEs(void)
> -{
> -	struct pci_controller *hose, *tmp;
> -	struct pnv_phb *phb;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -
> -		/* M64 layout might affect PE allocation */
> -		if (phb->reserve_m64_pe)
> -			phb->reserve_m64_pe(hose->bus, NULL, true);
> -
> -		pnv_ioda_setup_PEs(hose->bus);
> -	}
> -}
> -
>   #ifdef CONFIG_PCI_IOV
>   static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>   {
> @@ -2962,52 +2947,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>   	}
>   }
>
> -static void pnv_pci_ioda_setup_seg(void)
> -{
> -	struct pci_controller *tmp, *hose;
> -	struct pnv_phb *phb;
> -	struct pnv_ioda_pe *pe;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -			pnv_ioda_setup_pe_seg(hose, pe);
> -		}
> -	}
> -}
> -
> -static void pnv_pci_ioda_setup_DMA(void)
> -{
> -	struct pci_controller *hose, *tmp;
> -	struct pnv_phb *phb;
> -	struct pnv_ioda_pe *pe;
> -
> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> -		phb = hose->private_data;
> -		pnv_pci_ioda_setup_opal_tce_kill(phb);
> -
> -		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
> -			if (!pe->dma32_weight)
> -				continue;
> -
> -			switch (phb->type) {
> -			case PNV_PHB_IODA1:
> -				pnv_ioda1_setup_dma(phb, pe);
> -				break;
> -			case PNV_PHB_IODA2:
> -				pnv_pci_ioda2_setup_dma_pe(phb, pe);
> -				break;
> -			default:
> -				pr_warn("%s: No DMA for PHB type %d\n",
> -					__func__, phb->type);
> -			}
> -		}
> -
> -		/* Mark the PHB initialization done */
> -		phb->initialized = 1;
> -	}
> -}
> -
>   static void pnv_pci_ioda_create_dbgfs(void)
>   {
>   #ifdef CONFIG_DEBUG_FS
> @@ -3029,9 +2968,8 @@ static void pnv_pci_ioda_create_dbgfs(void)
>
>   static void pnv_pci_ioda_fixup(void)
>   {
> -	pnv_pci_ioda_setup_PEs();
> -	pnv_pci_ioda_setup_seg();
> -	pnv_pci_ioda_setup_DMA();
> +	struct pci_controller *hose, *tmp;
> +	struct pnv_phb *phb;
>
>   	pnv_pci_ioda_create_dbgfs();
>
> @@ -3039,6 +2977,12 @@ static void pnv_pci_ioda_fixup(void)
>   	eeh_init();
>   	eeh_addr_cache_build();
>   #endif
> +
> +	/* Notify initialization of PHB done */
> +	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> +		phb = hose->private_data;
> +		phb->initialized = 1;
> +	}
>   }
>
>   /*
> @@ -3082,6 +3026,105 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>   	return phb->ioda.io_segsize;
>   }
>
> +/*
> + * We are updating root port or the upstream bridge behind the
> + * root port with PHB's windows in order to accommodate the
> + * changes on required resources during PCI (slot) hotplug,
> + * which is connected to either root port or the downstream
> + * ports of PCIe switch behind the root port.
> + */
> +static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
> +					   unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dev *bridge = bus->self;
> +	struct resource *r, *w;
> +	int i;
> +
> +	/* Check if we need apply fixup to the bridge's windows */
> +	if (!pci_is_root_bus(bridge->bus) &&
> +	    !pci_is_root_bus(bridge->bus->self->bus))
> +		return;
> +
> +	/* Fixup the resoureces */
> +	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
> +		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
> +		if (!r->flags || !r->parent)
> +			continue;
> +
> +		w = NULL;
> +		if (r->flags & type & IORESOURCE_IO)
> +			w = &hose->io_resource;
> +		else if (pnv_pci_is_mem_pref_64(r->flags) &&
> +			 (type & IORESOURCE_PREFETCH) &&
> +			 phb->ioda.m64_segsize)
> +			w = &hose->mem_resources[1];
> +		else if (r->flags & type & IORESOURCE_MEM)
> +			w = &hose->mem_resources[0];
> +
> +		r->start = w->start;
> +		r->end = w->end;
> +	}
> +}
> +
> +static void pnv_pci_setup_bridge(struct pci_bus *bus,
> +				 unsigned long type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dev *bridge = bus->self;
> +	struct pnv_ioda_pe *pe;
> +	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
> +
> +	/* The root bus (ancestor PE) should be finalized
> +	 * before anyone else
> +	 */
> +	if (!phb->ioda.root_pe_is_populated) {
> +		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
> +		if (pe && phb->ioda.root_pe_idx == IODA_INVALID_PE)
> +			phb->ioda.root_pe_idx = pe->pe_number;
> +			phb->ioda.root_pe_is_populated = true;
> +		}


This "}" should be one tab to the left. Or you lost a "{" after the inner if()
and its matching "}".
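The brace placement matters because the two readings behave differently when pnv_ioda_setup_bus_PE() returns NULL. Below is a small model in which the PE pointer is replaced by a PE number and INVALID_PE stands for NULL. Both simplifications, and all the names, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_PE (-1)	/* models both IODA_INVALID_PE and a NULL pe */

struct root_state {
	int root_pe_idx;
	bool root_pe_is_populated;
};

/* Reading 1: the code as it compiles; the stray "}" closes the outer if. */
static void populate_as_written(struct root_state *s, int pe_number)
{
	if (!s->root_pe_is_populated) {
		if (pe_number != INVALID_PE && s->root_pe_idx == INVALID_PE)
			s->root_pe_idx = pe_number;
		s->root_pe_is_populated = true;	/* set unconditionally */
	}
}

/* Reading 2: a "{" was lost, so the flag is set only with the index. */
static void populate_braced(struct root_state *s, int pe_number)
{
	if (!s->root_pe_is_populated) {
		if (pe_number != INVALID_PE && s->root_pe_idx == INVALID_PE) {
			s->root_pe_idx = pe_number;
			s->root_pe_is_populated = true;
		}
	}
}
```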



> +
> +	/* Extend bridge's windows if necessary */
> +	pnv_pci_fixup_bridge_resources(bus, type);
> +
> +	/* Don't assign PE to bus which doesn't have any
> +	 * subordinate PCI devices.
> +	 */
> +	if (list_empty(&bus->devices))
> +		return;
> +
> +	/* Reserve PEs for M64 resource */
> +	if (phb->reserve_m64_pe)
> +		phb->reserve_m64_pe(bus, NULL, all);
> +
> +	/* Assign PE. We might run here because of partial hotplug.
> +	 * For the case, we just pick up the existing PE and should
> +	 * not allocate resources again.
> +	 */
> +	pe = pnv_ioda_setup_bus_PE(bus, all);
> +	if (!pe)
> +		return;
> +
> +	/* Setup MMIO mapping */
> +	pnv_ioda_setup_pe_seg(hose, pe);
> +
> +	/* Setup DMA */
> +	switch (phb->type) {
> +	case PNV_PHB_IODA1:
> +		pnv_ioda1_setup_dma(phb, pe);
> +		break;
> +	case PNV_PHB_IODA2:
> +		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> +		break;
> +	default:
> +		pr_warn("%s: No DMA for PHB type %d\n",
> +			__func__, phb->type);
> +	}
> +}
> +
>   #ifdef CONFIG_PCI_IOV
>   static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>   						      int resno)
> @@ -3147,6 +3190,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>   #endif
>          .enable_device_hook = pnv_pci_enable_device_hook,
>          .window_alignment = pnv_pci_window_alignment,
> +	.setup_bridge = pnv_pci_setup_bridge,
>          .reset_secondary_bus = pnv_pci_reset_secondary_bus,
>          .dma_set_mask = pnv_pci_ioda_dma_set_mask,
>          .shutdown = pnv_pci_ioda_shutdown,
> @@ -3218,6 +3262,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   	if (phb->regs == NULL)
>   		pr_err("  Failed to map registers !\n");
>
> +	pnv_pci_ioda_setup_opal_tce_kill(phb);
> +
>   	/* Initialize more IODA stuff */
>   	phb->ioda.total_pe_num = 1;
>   	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index e93a489..a160491 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -136,6 +136,7 @@ struct pnv_phb {
>   			/* Global bridge info */
>   			unsigned int		total_pe_num;
>   			unsigned int		root_pe_idx;
> +			bool			root_pe_is_populated;
>   			unsigned int		reserved_pe_idx;
>
>   			/* 32-bit MMIO window */
>


-- 
Alexey


* Re: [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-08-06  4:11 ` [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
@ 2015-08-15  3:13   ` Alexey Kardashevskiy
  2015-08-15  4:47     ` Gavin Shan
  0 siblings, 1 reply; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-15  3:13 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
  Cc: linux-pci, devicetree, benh, mpe, bhelgaas, grant.likely,
	robherring2, panto

On 08/06/2015 02:11 PM, Gavin Shan wrote:
> The patch intends to add standalone driver to support PCI hotplug
> for PowerPC PowerNV platform, which runs on top of skiboot firmware.
> The firmware identified hotpluggable slots and marked their device
> tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware".
> The driver simply scans device-tree to create/register PCI hotplug slot
> accordingly.
>
> If the skiboot firmware doesn't support slot status retrieval, the PCI
> slot device node shouldn't have property "ibm,reset-by-firmware". In
> that case, none of valid PCI slots will be detected from device tree.
> The skiboot firmware doesn't export the capability to access attention
> LEDs yet and it's something for TBD.
>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   MAINTAINERS                            |   6 +
>   drivers/pci/hotplug/Kconfig            |  12 +
>   drivers/pci/hotplug/Makefile           |   4 +
>   drivers/pci/hotplug/powernv_php.c      | 140 +++++++
>   drivers/pci/hotplug/powernv_php.h      |  92 +++++
>   drivers/pci/hotplug/powernv_php_slot.c | 722 +++++++++++++++++++++++++++++++++
>   6 files changed, 976 insertions(+)
>   create mode 100644 drivers/pci/hotplug/powernv_php.c
>   create mode 100644 drivers/pci/hotplug/powernv_php.h
>   create mode 100644 drivers/pci/hotplug/powernv_php_slot.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fd60784..3b75c92 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7747,6 +7747,12 @@ L:	linux-pci@vger.kernel.org
>   S:	Supported
>   F:	Documentation/PCI/pci-error-recovery.txt
>
> +PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
> +M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
> +L:	linux-pci@vger.kernel.org
> +S:	Supported
> +F:	drivers/pci/hotplug/powernv_php*
> +
>   PCI SUBSYSTEM
>   M:	Bjorn Helgaas <bhelgaas@google.com>
>   L:	linux-pci@vger.kernel.org
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index df8caec..ef55dae 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>
>   	  When in doubt, say N.
>
> +config HOTPLUG_PCI_POWERNV
> +	tristate "PowerPC PowerNV PCI Hotplug driver"
> +	depends on PPC_POWERNV && EEH
> +	help
> +	  Say Y here if you run PowerPC PowerNV platform that supports
> +          PCI Hotplug
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called powernv-php.
> +
> +	  When in doubt, say N.
> +
>   config HOTPLUG_PCI_RPA
>   	tristate "RPA PCI Hotplug driver"
>   	depends on PPC_PSERIES && EEH
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index b616e75..fd51d65 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
> @@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
>   acpiphp-objs		:=	acpiphp_core.o	\
>   				acpiphp_glue.o
>
> +powernv-php-objs	:=	powernv_php.o	\
> +				powernv_php_slot.o
> +
>   rpaphp-objs		:=	rpaphp_core.o	\
>   				rpaphp_pci.o	\
>   				rpaphp_slot.o
> diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
> new file mode 100644
> index 0000000..4cbff7a
> --- /dev/null
> +++ b/drivers/pci/hotplug/powernv_php.c
> @@ -0,0 +1,140 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +
> +#include <asm/opal.h>
> +#include <asm/pnv-pci.h>
> +
> +#include "powernv_php.h"
> +
> +#define DRIVER_VERSION	"0.1"
> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"


Align all or none.


> +
> +static struct notifier_block php_msg_nb = {
> +	.notifier_call	= powernv_php_msg_handler,
> +	.next		= NULL,
> +	.priority	= 0,
> +};
> +
> +static int powernv_php_register_one(struct device_node *dn)
> +{
> +	struct powernv_php_slot *slot;
> +	const __be32 *prop32;
> +	int ret;
> +
> +	/* Check if it's hotpluggable slot */
> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
> +	if (!prop32 || !of_read_number(prop32, 1))
> +		return -ENXIO;
> +
> +	/* Allocate slot */
> +	slot = powernv_php_slot_alloc(dn);
> +	if (!slot)
> +		return -ENODEV;
> +
> +	/* Register it */
> +	ret = powernv_php_slot_register(slot);
> +	if (ret) {
> +		powernv_php_slot_put(slot);
> +		return ret;
> +	}
> +
> +	return powernv_php_slot_enable(slot->php_slot, false);


And if it fails, no unregister and cleanup is needed?
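Here is a sketch of the unwinding the question asks about, using hypothetical stand-ins rather than the driver's functions. It assumes that deregistering the slot already drops the allocation reference through the slot's release callback. That assumption is suggested by the later comment about php_slot_release() calling powernv_php_slot_put(). So the enable-failure path deregisters without an extra put.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical slot model; the driver uses struct powernv_php_slot. */
struct slot_model {
	bool registered;
	int refs;	/* one reference held since allocation */
};

/* Stand-in for powernv_php_slot_register(). */
static int slot_register(struct slot_model *s)
{
	s->registered = true;
	return 0;
}

/* Stand-in for pci_hp_deregister(); assumed to drop the reference too. */
static void slot_deregister(struct slot_model *s)
{
	assert(s->refs > 0);
	s->registered = false;
	s->refs--;
}

/* Stand-in for powernv_php_slot_put(). */
static void slot_put(struct slot_model *s)
{
	assert(s->refs > 0);
	s->refs--;
}

static int enable_ok(struct slot_model *s)   { (void)s; return 0; }
static int enable_fail(struct slot_model *s) { (void)s; return -EIO; }

/* Register, then enable; on failure undo the steps already taken. */
static int register_one(struct slot_model *s,
			int (*enable)(struct slot_model *))
{
	int ret;

	ret = slot_register(s);
	if (ret) {
		slot_put(s);
		return ret;
	}

	ret = enable(s);
	if (ret)
		slot_deregister(s);	/* the step missing after a failed enable */

	return ret;
}
```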


> +}
> +
> +int powernv_php_register(struct device_node *dn)
> +{
> +	struct device_node *child;
> +	int ret = 0;

@ret is not used below.

> +
> +	/*
> +	 * The parent slots should be registered before their
> +	 * child slots.
> +	 */
> +	for_each_child_of_node(dn, child) {
> +		powernv_php_register_one(child);
> +		powernv_php_register(child);
> +	}
> +
> +	return ret;
> +}
> +
> +static void powernv_php_unregister_one(struct device_node *dn)
> +{
> +	struct powernv_php_slot *slot;
> +
> +	slot = powernv_php_slot_find(dn);
> +	if (!slot)
> +		return;
> +
> +	pci_hp_deregister(slot->php_slot);
> +}
> +
> +void powernv_php_unregister(struct device_node *dn)
> +{
> +	struct device_node *child;
> +
> +	/* The child slots should go before their parent slots */
> +	for_each_child_of_node(dn, child) {
> +		powernv_php_unregister(child);
> +		powernv_php_unregister_one(child);
> +	}
> +}
> +
> +static int __init powernv_php_init(void)
> +{
> +	struct device_node *dn;
> +	int ret;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +
> +	/* Register hotplug message handler */
> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
> +	if (ret) {
> +		pr_warn("%s: Error %d registering hotplug notifier\n",
> +			__func__, ret);
> +		return ret;
> +	}
> +
> +	/* Scan PHB nodes and their children */
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		powernv_php_register(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		powernv_php_register(dn);


Maybe move pnv_pci_hotplug_notifier_register() after
powernv_php_register()? If not, then move
pnv_pci_hotplug_notifier_unregister() to the end of powernv_php_exit() below?
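Here is a sketch of the second option, where the current init order is kept and exit becomes its mirror image (last in, first out). The step names are hypothetical labels. It assumes, without confirmation from this chunk, that slot registration may depend on OPAL messages delivered through the notifier. powernv_php_register_one() ends by calling powernv_php_slot_enable(). That possible dependency is why the notifier comes first here. If it does not exist, the first option is equally valid.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event log used to check the order of init/exit steps. */
static char log_buf[128];

static void step(const char *name)
{
	assert(strlen(log_buf) + strlen(name) + 2 <= sizeof(log_buf));
	strcat(log_buf, name);
	strcat(log_buf, " ");
}

/* Init: start listening for OPAL messages, then register slots. */
static void php_init_sketch(void)
{
	step("notifier_register");
	step("register_slots");
}

/* Exit: undo init in reverse (LIFO) order. */
static void php_exit_sketch(void)
{
	step("unregister_slots");
	step("notifier_unregister");
}
```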


> +
> +	return 0;
> +}
> +
> +static void __exit powernv_php_exit(void)
> +{
> +	struct device_node *dn;
> +
> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
> +
> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
> +		powernv_php_unregister(dn);
> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
> +		powernv_php_unregister(dn);
> +}
> +
> +module_init(powernv_php_init);
> +module_exit(powernv_php_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
> new file mode 100644
> index 0000000..8034cc6
> --- /dev/null
> +++ b/drivers/pci/hotplug/powernv_php.h
> @@ -0,0 +1,92 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#ifndef _POWERNV_PHP_H
> +#define _POWERNV_PHP_H
> +
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/of.h>
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +#include <linux/wait.h>
> +#include <linux/workqueue.h>
> +
> +#include <asm/opal-api.h>
> +
> +/* Slot power status */
> +#define POWERNV_PHP_SLOT_POWER_OFF	0
> +#define POWERNV_PHP_SLOT_POWER_ON	1
> +
> +/* Slot presence status */
> +#define POWERNV_PHP_SLOT_EMPTY		0
> +#define POWERNV_PHP_SLOT_PRESENT	1

These two are also only used in drivers/pci/hotplug/powernv_php_slot.c,
so at least move them there. It also seems your PHP driver is the only one
which uses flags for an adapter status; others return plain 0 or 1 (which
are C-style "false" and "true", pretty much, so they are not magic
constants). Since you return these values from the hotplug_slot_ops
callbacks to external code, you should probably do the same.

The same comment applies to POWERNV_PHP_SLOT_POWER_ON/OFF a few lines
above.
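Here is a sketch of getters that report plain 0/1 through a `u8`-style out-parameter, which is the shape used by the get_power_status()/get_adapter_status() callbacks in struct hotplug_slot_ops. The `slot_hw_model` struct and the `*_sketch` names are hypothetical. In the driver, the state would come from OPAL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical firmware-reported state for one slot. */
struct slot_hw_model {
	int powered;	/* nonzero if the slot is powered on */
	int adapter;	/* nonzero if an adapter is present */
};

/* Report 0/1 directly instead of POWERNV_PHP_SLOT_POWER_ON/OFF. */
static int get_power_status_sketch(const struct slot_hw_model *hw,
				   uint8_t *value)
{
	assert(hw && value);
	*value = hw->powered ? 1 : 0;
	return 0;
}

/* Report 0/1 directly instead of POWERNV_PHP_SLOT_EMPTY/PRESENT. */
static int get_adapter_status_sketch(const struct slot_hw_model *hw,
				     uint8_t *value)
{
	assert(hw && value);
	*value = hw->adapter ? 1 : 0;
	return 0;
}
```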



> +
> +/* Slot attention status */
> +#define POWERNV_PHP_SLOT_ATTEN_OFF	0
> +#define POWERNV_PHP_SLOT_ATTEN_ON	1
> +#define POWERNV_PHP_SLOT_ATTEN_IND	2
> +#define POWERNV_PHP_SLOT_ATTEN_ACT	3


These should go to drivers/pci/hotplug/powernv_php_slot.c. Where are these
flags defined? It looks to me like there is a way to pass some status from
userspace via sysfs to OPAL, so only OPAL is supposed to recognize and
handle these. If so, these macros are missing "OPAL" in their names. I have
one more comment below about it.


> +
> +struct powernv_php_slot {
> +	char			*name;
> +	struct device_node	*dn;
> +	struct pci_dev		*pdev;
> +	struct pci_bus		*bus;
> +	uint64_t		id;
> +	int			slot_no;
> +	struct kref		kref;
> +#define POWERNV_PHP_SLOT_STATE_INIT		0
> +#define POWERNV_PHP_SLOT_STATE_REGISTER		1
> +#define POWERNV_PHP_SLOT_STATE_POPULATED	2
> +	int			state;
> +	int			check_power_status;
> +	int			status_confirmed;

s/status_confirmed/power_status_confirmed/

What is this status? It can be 0, 1 or 2, which seem to mean
UNCONFIRMED/INPROGRESS/CONFIRMED (these do not need PNV/IODA prefixes as they
are local to the powernv_php_slot.c file).


> +	struct opal_msg		*msg;
> +	void			*fdt;
> +	void			*dt;
> +	struct of_changeset	ocs;
> +	struct work_struct	work;
> +	wait_queue_head_t	queue;
> +	struct hotplug_slot	*php_slot;
> +	struct powernv_php_slot	*parent;
> +	struct list_head	children;
> +	struct list_head	link;
> +};

This should go to drivers/pci/hotplug/powernv_php_slot.c and this header 
should only have a forward declaration. After you move it there, you get 
better separation of the driver code from the slot code and only 2 changes 
will be needed:

1. powernv_php_slot_enable() should receive powernv_php_slot
2. add powernv_php_slot_unregister() (like powernv_php_slot_register()), 
this way you will have pairing pci_hp_register/pci_hp_deregister in the 
same file.


After you move this struct to the source file, you could remove or shorten
the POWERNV_PHP_SLOT_STATE_ prefixes if you wish.
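The header/source split suggested above is the usual opaque-type pattern. The sketch below puts both files into one block so it compiles on its own, and the comments mark which part would live where. The `*_sketch` functions are hypothetical accessors, not part of the driver, and the struct is trimmed to a single field.

```c
#include <assert.h>
#include <stdlib.h>

/* powernv_php.h would keep only an incomplete type and the API. */
struct powernv_php_slot;
struct powernv_php_slot *php_slot_alloc_sketch(int slot_no);
int php_slot_no_sketch(const struct powernv_php_slot *slot);
void php_slot_free_sketch(struct powernv_php_slot *slot);

/* powernv_php_slot.c would own the full definition (trimmed here). */
struct powernv_php_slot {
	int slot_no;
};

struct powernv_php_slot *php_slot_alloc_sketch(int slot_no)
{
	struct powernv_php_slot *slot = malloc(sizeof(*slot));

	if (slot)
		slot->slot_no = slot_no;
	return slot;
}

int php_slot_no_sketch(const struct powernv_php_slot *slot)
{
	assert(slot);
	return slot->slot_no;
}

void php_slot_free_sketch(struct powernv_php_slot *slot)
{
	free(slot);
}
```

Code that includes only the header can pass `struct powernv_php_slot *` around, but it cannot reach into its fields. That keeps pci_hp_register()/pci_hp_deregister() and the slot state handling confined to one file.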

> +
> +int powernv_php_msg_handler(struct notifier_block *nb,
> +			    unsigned long type, void *message);
> +struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
> +void powernv_php_slot_free(struct kref *kref);
> +struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
> +int powernv_php_slot_register(struct powernv_php_slot *slot);
> +int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan);
> +int powernv_php_register(struct device_node *dn);
> +void powernv_php_unregister(struct device_node *dn);
> +
> +#define to_powernv_php_slot(kref) \
> +	container_of(kref, struct powernv_php_slot, kref)
> +
> +static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
> +{
> +	if (slot)
> +		kref_get(&slot->kref);
> +}
> +
> +static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
> +{
> +	if (slot)
> +		return kref_put(&slot->kref, powernv_php_slot_free);
> +
> +	return 0;
> +}

In these 2 helpers you do not have to check for @slot: the callers check it
pretty much always. Where they do not, in php_slot_release(), the slot is
dereferenced before you call powernv_php_slot_put(slot) anyway.

The only place you really want this check is
powernv_php_slot_put(slot->parent), so just check it there. BTW, is it even
possible for the slot not to have a parent?

So I'd ditch these helpers.
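Without the helpers, callers would use kref_get()/kref_put() directly, and the release function would test only the parent pointer before dropping the parent's reference. The model below replaces struct kref with a plain counter so it runs outside the kernel. The names are hypothetical. It also assumes, as the question above leaves open, that a slot may have no parent.

```c
#include <assert.h>
#include <stddef.h>

struct slot_ref {
	int refcount;			/* stands in for struct kref */
	struct slot_ref *parent;	/* NULL for a top-level slot (assumed possible) */
	int freed;			/* models powernv_php_slot_free() having run */
};

/* Release: the one place a NULL check is needed is the parent pointer. */
static void slot_release(struct slot_ref *slot)
{
	slot->freed = 1;
	if (slot->parent && --slot->parent->refcount == 0)
		slot_release(slot->parent);
}

/* Callers already guarantee slot != NULL, so no check here. */
static void slot_put_ref(struct slot_ref *slot)
{
	assert(slot->refcount > 0);
	if (--slot->refcount == 0)
		slot_release(slot);
}
```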


> +
> +#endif /* !_POWERNV_PHP_H */
> diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
> new file mode 100644
> index 0000000..73a93a2
> --- /dev/null
> +++ b/drivers/pci/hotplug/powernv_php_slot.c
> @@ -0,0 +1,722 @@
> +/*
> + * PCI Hotplug Driver for PowerPC PowerNV platform.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +
> +#include <asm/opal.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/ppc-pci.h>
> +
> +#include "powernv_php.h"
> +
> +static LIST_HEAD(php_slot_list);
> +static DEFINE_SPINLOCK(php_slot_lock);
> +
> +/*
> + * Remove firmware data for all child device nodes of the
> + * indicated one.
> + */
> +static void remove_child_pdn(struct device_node *np)
> +{
> +	struct device_node *child;
> +
> +	for_each_child_of_node(np, child) {
> +		/* In depth first */
> +		remove_child_pdn(child);
> +
> +		remove_pci_device_node_info(child);
> +	}
> +}
> +
> +/*
> + * Remove all subordinate device nodes of the indicated one.
> + * Those device nodes in deepest path should be released firstly.
> + */
> +static int remove_child_device_nodes(struct device_node *parent)
> +{
> +	struct device_node *np, *child;
> +	int ret = 0;
> +
> +	/* If the device node has children, remove them first */
> +	for_each_child_of_node(parent, np) {
> +		ret = remove_child_device_nodes(np);
> +		if (ret)
> +			return ret;
> +
> +		/* The device shouldn't have live children */
> +		child = of_get_next_child(np, NULL);
> +		if (child) {
> +			of_node_put(child);
> +			of_node_put(np);
> +			pr_err("%s: Alive children of node <%s>\n",
> +			       __func__, of_node_full_name(np));
> +			return -EBUSY;
> +		}
> +
> +		/* Detach the device node */
> +		of_detach_node(np);
> +		of_node_put(np);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * The function processes the message sent by firmware
> + * to remove all device tree nodes beneath the slot's
> + * node, and the associated auxiliary data.
> + */
> +static void slot_power_off_handler(struct powernv_php_slot *slot)
> +{
> +	int ret, status = 1;
> +
> +	/* Release the firmware data for the child device nodes */
> +	remove_child_pdn(slot->dn);
> +
> +	/*
> +	 * Release the child device nodes. If the sub-tree was
> +	 * built with the help of a changeset, we just need to
> +	 * destroy the changeset.
> +	 */
> +	if (slot->fdt) {
> +		of_changeset_destroy(&slot->ocs);
> +		kfree(slot->dt);
> +		slot->dt = NULL;
> +		slot->dn->child = NULL;
> +		kfree(slot->fdt);
> +		slot->fdt = NULL;
> +	} else {
> +		ret = remove_child_device_nodes(slot->dn);
> +		if (ret) {
> +			status = 2;
> +			dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
> +				 ret);
> +		}
> +	}
> +
> +	/* Confirm status change */
> +	slot->status_confirmed = status;
> +	wake_up_interruptible(&slot->queue);
> +}
> +
> +static int slot_populate_changeset(struct of_changeset *ocs,
> +				    struct device_node *dn)
> +{
> +	struct device_node *child;
> +	int ret = 0;
> +
> +	for_each_child_of_node(dn, child) {
> +		ret = of_changeset_attach_node(ocs, child);
> +		if (ret)
> +			return ret;
> +
> +		ret = slot_populate_changeset(ocs, child);
> +	}
> +
> +	return ret;
> +}
> +
> +static void slot_power_on_handler(struct powernv_php_slot *slot)
> +{
> +	void *fdt, *dt;
> +	uint64_t len;
> +	int ret, status = 1;
> +
> +	/* We don't know the FDT blob size, so try memory chunks
> +	 * of increasing size.
> +	 */


What is the real expected size? 0x10000 is just 64K, just allocate it and 
that's it.
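Here is a userspace sketch of the single-allocation approach. demo_get_device_tree() is a hypothetical stand-in for pnv_pci_get_device_tree() that fails when the buffer is smaller than the blob. The 64K size is the upper bound taken from the posted loop, not a documented firmware limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FDT_MAX_SIZE	0x10000	/* 64K, the posted loop's upper bound */

/* Hypothetical firmware stand-in: fills @buf with a @blob_len byte blob,
 * failing if @len is too small. */
static int demo_get_device_tree(void *buf, uint64_t len, uint64_t blob_len)
{
	if (blob_len > len)
		return -1;
	memset(buf, 0xd0, blob_len);
	return 0;
}

/* One allocation of the maximum size instead of growing in 8K steps. */
static void *demo_fetch_fdt(uint64_t blob_len)
{
	void *fdt = calloc(1, DEMO_FDT_MAX_SIZE);

	if (!fdt)
		return NULL;
	if (demo_get_device_tree(fdt, DEMO_FDT_MAX_SIZE, blob_len)) {
		free(fdt);
		return NULL;
	}
	return fdt;
}
```

A single allocation also closes a gap in the posted loop. If kzalloc() fails, the loop breaks with len still <= 0x10000, so the len > 0x10000 check passes and of_fdt_unflatten_tree() is handed a NULL blob.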


> +	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
> +		fdt = kzalloc(len, GFP_KERNEL);
> +		if (!fdt)
> +			break;
> +
> +		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
> +		if (!ret)
> +			break;
> +
> +		kfree(fdt);
> +	}
> +
> +	if (len > 0x10000) {
> +		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
> +		goto out;
> +	}
> +
> +	/* Unflatten device tree blob */
> +	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
> +	if (!dt) {
> +		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
> +		goto free_fdt;
> +	}

Right here you could kfree(fdt) and not cache it in the slot struct at all. 
You do not use it later anyway.


> +
> +	/* Initialize and apply the changeset */
> +	of_changeset_init(&slot->ocs);
> +	ret = slot_populate_changeset(&slot->ocs, slot->dn);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
> +			 ret);
> +		goto free_dt;
> +	}
> +
> +	slot->dn->child = NULL;
> +	ret = of_changeset_apply(&slot->ocs);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
> +			 ret);
> +		goto destroy_changeset;
> +	}
> +
> +	/* Add device node firmware data */
> +	traverse_pci_device_nodes(slot->dn,
> +				  add_pci_device_node_info,
> +				  pci_bus_to_host(slot->bus));
> +	slot->fdt = fdt;
> +	slot->dt = dt;
> +	goto out;
> +
> +destroy_changeset:
> +	of_changeset_destroy(&slot->ocs);
> +free_dt:
> +	kfree(dt);
> +	slot->dn->child = NULL;

Can of_fdt_unflatten_tree() or of_changeset_init() or 
slot_populate_changeset() initialize dn->child? No kfree(slot->dn->child)?


> +free_fdt:
> +	kfree(fdt);
> +	status = 2;
> +out:
> +	/* Confirm status change */
> +	slot->status_confirmed = status;
> +	wake_up_interruptible(&slot->queue);
> +}
> +
> +static void powernv_php_slot_work(struct work_struct *data)
> +{
> +	struct powernv_php_slot *slot = container_of(data,
> +						     struct powernv_php_slot,
> +						     work);
> +	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
> +
> +	switch (php_event) {
> +	case 0: /* Slot power off */
> +		slot_power_off_handler(slot);
> +		break;
> +	case 1: /* Slot power on */
> +		slot_power_on_handler(slot);
> +		break;

These 0 and 1 are not the same 0 and 1 used for @val in get_power_status(); 
they come from OPAL, so please define them.
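A sketch of what that could look like follows. It assumes the constants would live next to OPAL_MSG_PCI_HOTPLUG in the OPAL API header. The DEMO_OPAL_PHP_EVENT_* names are hypothetical; the values 0 and 1 come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the event codes carried in params[0] of an
 * OPAL_MSG_PCI_HOTPLUG message; the values are the patch's literals. */
enum demo_opal_php_event {
	DEMO_OPAL_PHP_EVENT_POWER_OFF	= 0,
	DEMO_OPAL_PHP_EVENT_POWER_ON	= 1,
};

/* Names the handler powernv_php_slot_work() would run, or NULL for an
 * unsupported event. */
static const char *demo_php_event_handler(uint64_t php_event)
{
	switch (php_event) {
	case DEMO_OPAL_PHP_EVENT_POWER_OFF:
		return "slot_power_off_handler";
	case DEMO_OPAL_PHP_EVENT_POWER_ON:
		return "slot_power_on_handler";
	default:
		return NULL;
	}
}
```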


> +	default:
> +		dev_warn(&slot->pdev->dev, "Unsupported hotplug event %lld\n",
> +			 php_event);
> +	}
> +
> +	of_node_put(slot->dn);
> +}
> +
> +int powernv_php_msg_handler(struct notifier_block *nb,
> +			    unsigned long type, void *message)
> +{
> +	phandle h;
> +	struct device_node *np;
> +	struct powernv_php_slot *slot;
> +	struct opal_msg *msg = message;
> +
> +	/* Check the message type */
> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
> +		pr_warn("%s: Wrong message type %ld received!\n",
> +			__func__, type);
> +		return NOTIFY_DONE;
> +	}
> +
> +	/* Find the device node */
> +	h = (phandle)be64_to_cpu(msg->params[1]);
> +	np = of_find_node_by_phandle(h);
> +	if (!np) {
> +		pr_warn("%s: No device node for phandle 0x%08x\n",
> +			__func__, h);
> +		return NOTIFY_DONE;
> +	}
> +
> +	/* Find the slot */
> +	slot = powernv_php_slot_find(np);
> +	if (!slot) {
> +		pr_warn("%s: No slot found for node <%s>\n",
> +			__func__, of_node_full_name(np));
> +		of_node_put(np);
> +		return NOTIFY_DONE;
> +	}
> +
> +	/* Schedule the work */
> +	slot->msg = msg;
> +	schedule_work(&slot->work);
> +	return NOTIFY_OK;
> +}

This function belongs in drivers/pci/hotplug/powernv_php.c (searching for a 
slot is powernv_php.c's job), except for these lines:

 > +	/* Schedule the work */
 > +	slot->msg = msg;
 > +	schedule_work(&slot->work);

These 3 lines should be in a helper in drivers/pci/hotplug/powernv_php_slot.c.


> +
> +static int set_power_status(struct hotplug_slot *php_slot, u8 val)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +	int ret;
> +
> +	/* Set power status */
> +	slot->status_confirmed = 0;
> +	ret = pnv_pci_set_power_status(slot->id, val);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
> +			 ret, val ? "on" : "off");
> +		return ret;
> +	}
> +
> +	/* Continue to PCI probing once the device tree is finalized.
> +	 * The device tree might already be completely updated at this
> +	 * point, so we don't always have to wait for it.
> +	 */
> +	if (slot->status_confirmed == 1)
> +		return 0;
> +	else if (slot->status_confirmed > 0)
> +		return -EBUSY;
> +
> +	ret = wait_event_timeout(slot->queue, slot->status_confirmed, 10 * HZ);
> +	if (!ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
> +			 ret, val ? "on" : "off");
> +		return -EBUSY;
> +	}
> +
> +	/* Check the result */
> +	if (slot->status_confirmed == 1)
> +		return 0;
> +
> +	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
> +		 slot->status_confirmed, val ? "on" : "off");
> +	return -EBUSY;
> +}
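This is a reference for reading set_power_status(). The two power handlers store three distinct values in slot->status_confirmed: 0 is set before the OPAL call, 1 means success and 2 means failure. Below is a small sketch with hypothetical names for those values and for the result check at the end of set_power_status().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the values kept in slot->status_confirmed. */
enum demo_php_status {
	DEMO_PHP_STATUS_PENDING	= 0,	/* set before pnv_pci_set_power_status() */
	DEMO_PHP_STATUS_DONE	= 1,	/* handler updated the device tree */
	DEMO_PHP_STATUS_FAILED	= 2,	/* handler hit an error */
};

/* Mirrors the final check in set_power_status(): only DONE succeeds;
 * FAILED, or PENDING left over after the timeout, gives -EBUSY. */
static int demo_power_result(enum demo_php_status st)
{
	return st == DEMO_PHP_STATUS_DONE ? 0 : -EBUSY;
}
```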
> +
> +static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +	uint8_t state;
> +	int ret;
> +
> +	/*
> +	 * Retrieve power status from firmware. If we fail
> +	 * to get it, the power status falls back to
> +	 * on.
> +	 */
> +	ret = pnv_pci_get_power_status(slot->id, &state);
> +	if (ret) {
> +		*val = POWERNV_PHP_SLOT_POWER_ON;
> +		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
> +			 ret);
> +	} else {
> +		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
> +			       POWERNV_PHP_SLOT_POWER_OFF;
> +		php_slot->info->power_status = *val;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +	uint8_t state;
> +	int ret;
> +
> +	/*
> +	 * Retrieve presence status from firmware. If we can't
> +	 * get it, it falls back to empty.
> +	 */
> +	ret = pnv_pci_get_presence_status(slot->id, &state);
> +	if (ret >= 0) {
> +		ret = 0;
> +		*val = state ? POWERNV_PHP_SLOT_PRESENT :
> +			       POWERNV_PHP_SLOT_EMPTY;
> +		php_slot->info->adapter_status = *val;
> +		ret = 0;
> +	} else {
> +		*val = POWERNV_PHP_SLOT_EMPTY;
> +		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
> +			 ret);
> +	}
> +
> +	return ret;
> +}
> +
> +static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +
> +	/* The default operation would be to turn on the attention */
> +	switch (val) {
> +	case POWERNV_PHP_SLOT_ATTEN_OFF:
> +	case POWERNV_PHP_SLOT_ATTEN_ON:
> +	case POWERNV_PHP_SLOT_ATTEN_IND:
> +	case POWERNV_PHP_SLOT_ATTEN_ACT:
> +		break;
> +	default:
> +		dev_warn(&slot->pdev->dev, "Invalid attention %d\n", val);
> +		return -EINVAL;
> +	}
> +
> +	/* FIXME: Make it real once firmware supports it */
> +	php_slot->info->attention_status = val;

Since the firmware does not know about these POWERNV_PHP_SLOT_ATTEN_xxx 
values, just remove them. When the firmware learns about them later, we will 
have to change this code anyway, and by that time the set of states may have 
changed.


> +
> +	return 0;
> +}
> +
> +int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan)


This should receive powernv_php_slot as described above.


> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +	uint8_t presence, power_status;
> +	int ret;
> +
> +	/* Check if the slot has been configured */
> +	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
> +		return 0;
> +
> +	/* Retrieve slot presence status */
> +	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
> +	if (ret)
> +		return ret;
> +
> +	/* Proceed to scan if there is nothing behind the slot */
> +	if (presence == POWERNV_PHP_SLOT_EMPTY)
> +		goto scan;
> +
> +	/*
> +	 * If we detect something behind the slot, we need to
> +	 * make sure the power supply to the slot is on. Otherwise,
> +	 * the slot's downstream PCIe link will be down.
> +	 *
> +	 * The first time through, we don't change the power status,
> +	 * to speed up system boot, on the assumption that the firmware
> +	 * reports a consistent slot power status: an empty slot always
> +	 * has its power off and a non-empty slot has its power on.
> +	 */
> +	if (!slot->check_power_status) {
> +		slot->check_power_status = 1;
> +		goto scan;
> +	}
> +
> +	/* Check the power status. Scan the slot if that's already on */
> +	ret = php_slot->ops->get_power_status(php_slot, &power_status);
> +	if (ret)
> +		return ret;
> +
> +	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
> +		goto scan;
> +
> +	/* Power is off, turn it on and then scan the slot */
> +	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
> +	if (ret)
> +		return ret;
> +
> +scan:
> +	switch (presence) {
> +	case POWERNV_PHP_SLOT_PRESENT:
> +		if (rescan) {
> +			pci_lock_rescan_remove();
> +			pci_add_pci_devices(slot->bus);
> +			pci_unlock_rescan_remove();
> +		}
> +
> +		/* Rescan for child hotpluggable slots */
> +		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
> +		if (rescan)
> +			powernv_php_register(slot->dn);
> +		break;
> +	case POWERNV_PHP_SLOT_EMPTY:
> +		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
> +		break;
> +	default:
> +		dev_warn(&slot->pdev->dev, "Invalid presence status %d\n",
> +			 presence);
> +		return -EINVAL;

No PHP driver will ever have presence other than 0 or 1, so this switch() 
is simply if (presence) {} else {}.
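Here is a userspace sketch of the scan step with a boolean presence. Hypothetical counters stand in for pci_add_pci_devices() and powernv_php_register(). The default: branch and its -EINVAL simply disappear.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_scan_state {
	DEMO_SLOT_STATE_REGISTER,
	DEMO_SLOT_STATE_POPULATED,
};

struct demo_scan_slot {
	enum demo_scan_state state;
	int devices_added;	/* counts pci_add_pci_devices() calls */
	int children_scanned;	/* counts powernv_php_register() calls */
};

/* The "scan:" step of powernv_php_slot_enable() with a bool presence. */
static void demo_slot_scan(struct demo_scan_slot *slot, bool present,
			   bool rescan)
{
	if (present) {
		if (rescan)
			slot->devices_added++;
		slot->state = DEMO_SLOT_STATE_POPULATED;
		if (rescan)
			slot->children_scanned++;
	} else {
		slot->state = DEMO_SLOT_STATE_POPULATED;
	}
}
```

Both branches end in POWERNV_PHP_SLOT_STATE_POPULATED, so the state update could probably be hoisted out of the if/else as well.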



> +	}
> +
> +	return 0;
> +}
> +
> +static int enable_slot(struct hotplug_slot *php_slot)
> +{
> +	return powernv_php_slot_enable(php_slot, true);
> +}
> +
> +static int disable_slot(struct hotplug_slot *php_slot)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +	uint8_t power_status;
> +	int ret;
> +
> +	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
> +		return 0;
> +
> +	/* Remove all devices behind the slot */
> +	pci_lock_rescan_remove();
> +	pci_remove_pci_devices(slot->bus);
> +	pci_unlock_rescan_remove();
> +
> +	/* Detach the child hotpluggable slots */
> +	powernv_php_unregister(slot->dn);
> +
> +	/*
> +	 * Check the power status and turn it off if necessary. If we
> +	 * fail to get the power status, the power will be forced to
> +	 * be off.
> +	 */
> +	ret = php_slot->ops->get_power_status(php_slot, &power_status);
> +	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
> +		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
> +		if (ret)
> +			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
> +				 ret);
> +	}
> +
> +	/* Update slot state */
> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
> +	return 0;
> +}
> +
> +static struct hotplug_slot_ops php_slot_ops = {
> +	.get_power_status	= get_power_status,
> +	.get_adapter_status	= get_adapter_status,
> +	.set_attention_status	= set_attention_status,
> +	.enable_slot		= enable_slot,
> +	.disable_slot		= disable_slot,
> +};
> +
> +static struct powernv_php_slot *php_slot_match(struct device_node *dn,
> +					       struct powernv_php_slot *slot)
> +{
> +	struct powernv_php_slot *target, *tmp;
> +
> +	if (slot->dn == dn)
> +		return slot;
> +
> +	list_for_each_entry(tmp, &slot->children, link) {
> +		target = php_slot_match(dn, tmp);
> +		if (target)
> +			return target;
> +	}
> +
> +	return NULL;
> +}
> +
> +struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
> +{
> +	struct powernv_php_slot *slot, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&php_slot_lock, flags);
> +	list_for_each_entry(tmp, &php_slot_list, link) {
> +		slot = php_slot_match(dn, tmp);
> +		if (slot) {
> +			spin_unlock_irqrestore(&php_slot_lock, flags);
> +			return slot;
> +		}
> +	}
> +	spin_unlock_irqrestore(&php_slot_lock, flags);
> +
> +	return NULL;
> +}
> +
> +void powernv_php_slot_free(struct kref *kref)
> +{
> +	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
> +
> +	WARN_ON(!list_empty(&slot->children));
> +	kfree(slot->name);
> +	kfree(slot);
> +}
> +
> +static void php_slot_release(struct hotplug_slot *hp_slot)
> +{
> +	struct powernv_php_slot *slot = hp_slot->private;
> +	unsigned long flags;
> +
> +	/* Remove from global or child list */
> +	spin_lock_irqsave(&php_slot_lock, flags);
> +	list_del(&slot->link);
> +	spin_unlock_irqrestore(&php_slot_lock, flags);


This is a good example of where RCU rules. powernv_php_slot_find() returns 
a slot pointer, and its use is not protected by the spinlock, which is dangerous.

Remove spin_lock(), s/list_del/list_del_rcu/, move the bits below to 
call_rcu(), and s/list_for_each_entry/list_for_each_entry_rcu/.


> +
> +	/* Detach from parent */
> +	powernv_php_slot_put(slot);
> +	powernv_php_slot_put(slot->parent);
> +}
> +
> +static bool php_slot_get_id(struct device_node *dn,
> +			    uint64_t *id)
> +{
> +	struct device_node *parent = dn;
> +	const __be64 *prop64;
> +	const __be32 *prop32;
> +
> +	/*
> +	 * The hotpluggable slot always has a compound Id, which
> +	 * consists of a 16-bit PHB Id, a 16-bit bus/slot/function
> +	 * number, and a compound indicator
> +	 */
> +	*id = (0x1ul << 63);
> +
> +	/* Bus/Slot/Function number */
> +	prop32 = of_get_property(dn, "reg", NULL);
> +	if (!prop32)
> +		return false;
> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
> +
> +	/* PHB Id */
> +	while ((parent = of_get_parent(parent))) {
> +		if (!PCI_DN(parent)) {
> +			of_node_put(parent);
> +			break;
> +		}
> +
> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
> +			of_node_put(parent);
> +			continue;
> +		}
> +
> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
> +		if (!prop64) {
> +			of_node_put(parent);
> +			return false;
> +		}
> +
> +		*id |= be64_to_cpup(prop64);
> +		of_node_put(parent);
> +		return true;
> +	}
> +
> +	return false;
> +}
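Here is a worked sketch of the compound ID that php_slot_get_id() builds. It assumes the PHB Id fits in the low 16 bits, as the function's comment says; the code itself does not mask it. demo_php_slot_id() is a hypothetical helper.

- Bits 8..23 of the first "reg" cell hold the bus number and devfn.
- Shifting them left by 8 puts them in bits 16..31, above the PHB Id.
- Bit 63 marks the ID as compound.

For example, bus 1, devfn 0 on PHB 2 gives 0x8000000001000002.

```c
#include <assert.h>
#include <stdint.h>

/* Compound indicator in bit 63, bus/devfn from "reg" in bits 16..31,
 * PHB Id (from "ibm,opal-phbid") ORed into the low bits. */
static uint64_t demo_php_slot_id(uint32_t reg0, uint64_t phb_id)
{
	uint64_t id = 1ULL << 63;

	/* Keep only bus (bits 16..23) and devfn (bits 8..15) of "reg". */
	id |= (uint64_t)(reg0 & 0x00ffff00u) << 8;
	id |= phb_id;
	return id;
}
```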
> +
> +struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
> +{
> +	struct eeh_dev *edev = pdn_to_eeh_dev(PCI_DN(dn));
> +	struct pci_bus *bus;
> +	struct powernv_php_slot *slot;
> +	const char *label;
> +	uint64_t id;
> +	int slot_no;
> +	size_t size;
> +	void *pmem;
> +
> +	/* Slot name */
> +	label = of_get_property(dn, "ibm,slot-label", NULL);
> +	if (!label)
> +		return NULL;
> +
> +	/* Slot identifier */
> +	if (!php_slot_get_id(dn, &id))
> +		return NULL;
> +
> +	/* PCI bus */
> +	bus = of_node_to_pci_bus(dn);
> +	if (!bus)
> +		return NULL;
> +
> +	/* Slot number */
> +	if (dn->child && PCI_DN(dn->child))
> +		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
> +	else
> +		slot_no = -1;

Not INVALID_SLOT and
#define INVALID_SLOT -1
? :)
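Something along these lines would work, shown as a userspace sketch:

- INVALID_SLOT is the name suggested above.
- DEMO_PCI_SLOT() copies the kernel's PCI_SLOT() definition.
- demo_slot_no() is a hypothetical helper that takes NULL when the node has no PCI child.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_SLOT		(-1)

/* Same definition as the kernel's PCI_SLOT(). */
#define DEMO_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/* @child_devfn is NULL when the slot's device node has no PCI child. */
static int demo_slot_no(const unsigned int *child_devfn)
{
	return child_devfn ? (int)DEMO_PCI_SLOT(*child_devfn) : INVALID_SLOT;
}
```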


> +
> +	/* Allocate slot */
> +	size = sizeof(struct powernv_php_slot) +
> +	       sizeof(struct hotplug_slot) +
> +	       sizeof(struct hotplug_slot_info);
> +	pmem = kzalloc(size, GFP_KERNEL);
> +	if (!pmem) {
> +		pr_warn("%s: Cannot allocate slot for node %s\n",
> +			__func__, dn->full_name);
> +		return NULL;
> +	}
> +
> +	/* Assign memory blocks */
> +	slot = pmem;
> +	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
> +	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
> +			      sizeof(struct hotplug_slot);
> +	slot->name = kstrdup(label, GFP_KERNEL);
> +	if (!slot->name) {
> +		pr_warn("%s: Cannot populate name for node %s\n",
> +			__func__, dn->full_name);
> +		kfree(pmem);
> +		return NULL;
> +	}

Why not just embed the structs in one another?
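Here is a userspace sketch of the embedded layout. It uses simplified stand-ins for struct hotplug_slot and struct hotplug_slot_info plus a local copy of container_of(); all demo_* names are hypothetical. One allocation still covers everything, the pointer arithmetic goes away, and container_of() could replace the ->private back-pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace copy of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for struct hotplug_slot{,_info}. */
struct demo_hotplug_slot_info {
	unsigned char power_status;
};

struct demo_hotplug_slot {
	struct demo_hotplug_slot_info *info;
	void *private;
};

/* Sub-structures are members, so one allocation covers them all. */
struct demo_php_slot {
	const char *name;
	struct demo_hotplug_slot php_slot;
	struct demo_hotplug_slot_info info;
};

static struct demo_php_slot *demo_php_slot_alloc(void)
{
	struct demo_php_slot *slot = calloc(1, sizeof(*slot));

	if (!slot)
		return NULL;
	slot->php_slot.info = &slot->info;
	return slot;
}

/* Recover the outer slot from the embedded hotplug_slot. */
static struct demo_php_slot *to_demo_php_slot(struct demo_hotplug_slot *hp)
{
	return demo_container_of(hp, struct demo_php_slot, php_slot);
}
```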

> +
> +	/* Initialize slot */
> +	kref_init(&slot->kref);
> +	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
> +	slot->dn = dn;
> +	slot->pdev = eeh_dev_to_pci_dev(edev);
> +	slot->bus = bus;
> +	slot->id = id;
> +	slot->slot_no = slot_no;
> +	INIT_WORK(&slot->work, powernv_php_slot_work);
> +	init_waitqueue_head(&slot->queue);
> +	slot->check_power_status = 0;
> +	slot->status_confirmed = 0;
> +	slot->php_slot->ops = &php_slot_ops;
> +	slot->php_slot->release = php_slot_release;
> +	slot->php_slot->private = slot;
> +	INIT_LIST_HEAD(&slot->children);
> +	INIT_LIST_HEAD(&slot->link);
> +
> +	return slot;
> +}
> +
> +int powernv_php_slot_register(struct powernv_php_slot *slot)
> +{
> +	struct powernv_php_slot *parent;
> +	struct device_node *dn = slot->dn;
> +	unsigned long flags;
> +	int ret;
> +
> +	/* Avoid registering the same slot twice */
> +	if (powernv_php_slot_find(slot->dn))
> +		return -EEXIST;
> +
> +	/* Register slot */
> +	ret = pci_hp_register(slot->php_slot, slot->bus,
> +			      slot->slot_no, slot->name);
> +	if (ret) {
> +		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
> +			 ret);
> +		return ret;
> +	}
> +
> +	/* Put into global or parent list */
> +	while ((dn = of_get_parent(dn))) {
> +		if (!PCI_DN(dn)) {
> +			of_node_put(dn);
> +			break;
> +		}
> +
> +		parent = powernv_php_slot_find(dn);
> +		if (parent) {
> +			of_node_put(dn);
> +			break;
> +		}
> +	}
> +
> +	spin_lock_irqsave(&php_slot_lock, flags);
> +	if (parent) {
> +		powernv_php_slot_get(parent);
> +		slot->parent = parent;
> +		list_add_tail(&slot->link, &parent->children);
> +	} else {
> +		list_add_tail(&slot->link, &php_slot_list);
> +	}
> +	spin_unlock_irqrestore(&php_slot_lock, flags);
> +
> +	/* Update slot state */
> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
> +	return 0;
> +}
>


Now I've finished with this respin of the patchset :)


-- 
Alexey


* Re: [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-08-15  3:13   ` Alexey Kardashevskiy
@ 2015-08-15  4:47     ` Gavin Shan
  2015-08-15  9:15       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-15  4:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Sat, Aug 15, 2015 at 01:13:21PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>The patch intends to add standalone driver to support PCI hotplug
>>for PowerPC PowerNV platform, which runs on top of skiboot firmware.
>>The firmware identifies hotpluggable slots and marks their device
>>tree nodes with the "ibm,slot-pluggable" and "ibm,reset-by-firmware"
>>properties. The driver simply scans the device tree to create and
>>register PCI hotplug slots accordingly.
>>
>>If the skiboot firmware doesn't support slot status retrieval, the PCI
>>slot device node shouldn't have the "ibm,reset-by-firmware" property.
>>In that case, no valid PCI slots will be detected from the device tree.
>>The skiboot firmware doesn't export the capability to access attention
>>LEDs yet; that is TBD.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>---
>>  MAINTAINERS                            |   6 +
>>  drivers/pci/hotplug/Kconfig            |  12 +
>>  drivers/pci/hotplug/Makefile           |   4 +
>>  drivers/pci/hotplug/powernv_php.c      | 140 +++++++
>>  drivers/pci/hotplug/powernv_php.h      |  92 +++++
>>  drivers/pci/hotplug/powernv_php_slot.c | 722 +++++++++++++++++++++++++++++++++
>>  6 files changed, 976 insertions(+)
>>  create mode 100644 drivers/pci/hotplug/powernv_php.c
>>  create mode 100644 drivers/pci/hotplug/powernv_php.h
>>  create mode 100644 drivers/pci/hotplug/powernv_php_slot.c
>>
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index fd60784..3b75c92 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -7747,6 +7747,12 @@ L:	linux-pci@vger.kernel.org
>>  S:	Supported
>>  F:	Documentation/PCI/pci-error-recovery.txt
>>
>>+PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
>>+M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
>>+L:	linux-pci@vger.kernel.org
>>+S:	Supported
>>+F:	drivers/pci/hotplug/powernv_php*
>>+
>>  PCI SUBSYSTEM
>>  M:	Bjorn Helgaas <bhelgaas@google.com>
>>  L:	linux-pci@vger.kernel.org
>>diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>index df8caec..ef55dae 100644
>>--- a/drivers/pci/hotplug/Kconfig
>>+++ b/drivers/pci/hotplug/Kconfig
>>@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>
>>  	  When in doubt, say N.
>>
>>+config HOTPLUG_PCI_POWERNV
>>+	tristate "PowerPC PowerNV PCI Hotplug driver"
>>+	depends on PPC_POWERNV && EEH
>>+	help
>>+	  Say Y here if you run a PowerPC PowerNV platform that supports
>>+	  PCI Hotplug.
>>+
>>+	  To compile this driver as a module, choose M here: the
>>+	  module will be called powernv-php.
>>+
>>+	  When in doubt, say N.
>>+
>>  config HOTPLUG_PCI_RPA
>>  	tristate "RPA PCI Hotplug driver"
>>  	depends on PPC_PSERIES && EEH
>>diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>index b616e75..fd51d65 100644
>>--- a/drivers/pci/hotplug/Makefile
>>+++ b/drivers/pci/hotplug/Makefile
>>@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>  obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>  obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>  obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>@@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>  acpiphp-objs		:=	acpiphp_core.o	\
>>  				acpiphp_glue.o
>>
>>+powernv-php-objs	:=	powernv_php.o	\
>>+				powernv_php_slot.o
>>+
>>  rpaphp-objs		:=	rpaphp_core.o	\
>>  				rpaphp_pci.o	\
>>  				rpaphp_slot.o
>>diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
>>new file mode 100644
>>index 0000000..4cbff7a
>>--- /dev/null
>>+++ b/drivers/pci/hotplug/powernv_php.c
>>@@ -0,0 +1,140 @@
>>+/*
>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>+ *
>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/module.h>
>>+
>>+#include <asm/opal.h>
>>+#include <asm/pnv-pci.h>
>>+
>>+#include "powernv_php.h"
>>+
>>+#define DRIVER_VERSION	"0.1"
>>+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>
>
>Align all or none.
>

All of them are already aligned well. Please check your email settings.

>>+
>>+static struct notifier_block php_msg_nb = {
>>+	.notifier_call	= powernv_php_msg_handler,
>>+	.next		= NULL,
>>+	.priority	= 0,
>>+};
>>+
>>+static int powernv_php_register_one(struct device_node *dn)
>>+{
>>+	struct powernv_php_slot *slot;
>>+	const __be32 *prop32;
>>+	int ret;
>>+
>>+	/* Check if it's hotpluggable slot */
>>+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>+	if (!prop32 || !of_read_number(prop32, 1))
>>+		return -ENXIO;
>>+
>>+	/* Allocate slot */
>>+	slot = powernv_php_slot_alloc(dn);
>>+	if (!slot)
>>+		return -ENODEV;
>>+
>>+	/* Register it */
>>+	ret = powernv_php_slot_register(slot);
>>+	if (ret) {
>>+		powernv_php_slot_put(slot);
>>+		return ret;
>>+	}
>>+
>>+	return powernv_php_slot_enable(slot->php_slot, false);
>
>
>And if it fails, no unregister and cleanup is needed?
>

You're right that the failure cases need care. I'll add something
in the next revision.
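Here is a userspace sketch of that unwind, with flags standing in for the kernel calls and hypothetical demo_* names. It assumes, as in this patch, that the ->release() hook (php_slot_release()) drops the slot reference once the slot is deregistered. On that assumption, the enable-failure path deregisters instead of calling powernv_php_slot_put() a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What the model records about the slot after register_one(). */
struct demo_slot_track {
	bool registered;	/* pci_hp_register() done and not undone */
	bool ref_dropped;	/* the allocation reference was released */
};

/* @register_ret and @enable_ret stand in for the return values of
 * powernv_php_slot_register() and powernv_php_slot_enable(). */
static int demo_register_one(struct demo_slot_track *t,
			     int register_ret, int enable_ret)
{
	if (register_ret) {
		/* Never registered: drop the allocation reference here. */
		t->ref_dropped = true;
		return register_ret;
	}
	t->registered = true;

	if (enable_ret) {
		/* Deregister; ->release() then drops the reference. */
		t->registered = false;
		t->ref_dropped = true;
		return enable_ret;
	}
	return 0;
}
```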

>>+}
>>+
>>+int powernv_php_register(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+	int ret = 0;
>
>@ret is not used below.
>

Ok, will remove it.

>>+
>>+	/*
>>+	 * The parent slots should be registered before their
>>+	 * child slots.
>>+	 */
>>+	for_each_child_of_node(dn, child) {
>>+		powernv_php_register_one(child);
>>+		powernv_php_register(child);
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static void powernv_php_unregister_one(struct device_node *dn)
>>+{
>>+	struct powernv_php_slot *slot;
>>+
>>+	slot = powernv_php_slot_find(dn);
>>+	if (!slot)
>>+		return;
>>+
>>+	pci_hp_deregister(slot->php_slot);
>>+}
>>+
>>+void powernv_php_unregister(struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+
>>+	/* The child slots should go before their parent slots */
>>+	for_each_child_of_node(dn, child) {
>>+		powernv_php_unregister(child);
>>+		powernv_php_unregister_one(child);
>>+	}
>>+}
>>+
>>+static int __init powernv_php_init(void)
>>+{
>>+	struct device_node *dn;
>>+	int ret;
>>+
>>+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>+
>>+	/* Register hotplug message handler */
>>+	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>+	if (ret) {
>>+		pr_warn("%s: Error %d registering hotplug notifier\n",
>>+			__func__, ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Scan PHB nodes and their children */
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		powernv_php_register(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		powernv_php_register(dn);
>
>
>Maybe move pnv_pci_hotplug_notifier_register() after powernv_php_register()?
>If not, then move pnv_pci_hotplug_notifier_unregister() to the end of
>powernv_php_exit() below?
>
>

Ok. I'll move pnv_pci_hotplug_notifier_unregister() to end of powernv_php_exit().
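Here is a userspace sketch of the agreed ordering, which records each step in a small log; all names are hypothetical. Exit tears things down in the reverse order of init, so the notifier is the first thing registered and the last thing unregistered.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_STEPS	8

static const char *demo_log[DEMO_MAX_STEPS];
static int demo_nlog;

static void demo_step(const char *what)
{
	assert(demo_nlog < DEMO_MAX_STEPS);
	demo_log[demo_nlog++] = what;
}

/* powernv_php_init(): notifier first, then scan the PHBs. */
static void demo_init(void)
{
	demo_step("notifier_register");
	demo_step("register_slots");
}

/* powernv_php_exit(): slots first, notifier last. */
static void demo_exit(void)
{
	demo_step("unregister_slots");
	demo_step("notifier_unregister");
}
```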

>>+
>>+	return 0;
>>+}
>>+
>>+static void __exit powernv_php_exit(void)
>>+{
>>+	struct device_node *dn;
>>+
>>+	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>+
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>+		powernv_php_unregister(dn);
>>+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>+		powernv_php_unregister(dn);
>>+}
>>+
>>+module_init(powernv_php_init);
>>+module_exit(powernv_php_exit);
>>+
>>+MODULE_VERSION(DRIVER_VERSION);
>>+MODULE_LICENSE("GPL v2");
>>+MODULE_AUTHOR(DRIVER_AUTHOR);
>>+MODULE_DESCRIPTION(DRIVER_DESC);
>>diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
>>new file mode 100644
>>index 0000000..8034cc6
>>--- /dev/null
>>+++ b/drivers/pci/hotplug/powernv_php.h
>>@@ -0,0 +1,92 @@
>>+/*
>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>+ *
>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#ifndef _POWERNV_PHP_H
>>+#define _POWERNV_PHP_H
>>+
>>+#include <linux/list.h>
>>+#include <linux/kref.h>
>>+#include <linux/of.h>
>>+#include <linux/pci.h>
>>+#include <linux/pci_hotplug.h>
>>+#include <linux/wait.h>
>>+#include <linux/workqueue.h>
>>+
>>+#include <asm/opal-api.h>
>>+
>>+/* Slot power status */
>>+#define POWERNV_PHP_SLOT_POWER_OFF	0
>>+#define POWERNV_PHP_SLOT_POWER_ON	1
>>+
>>+/* Slot presence status */
>>+#define POWERNV_PHP_SLOT_EMPTY		0
>>+#define POWERNV_PHP_SLOT_PRESENT	1
>
>These two are also only used in drivers/pci/hotplug/powernv_php_slot.c, so
>move them there at least. It also seems your PHP driver is the only one that
>uses flags for an adapter status; the others return plain 0 or 1 (which are
>pretty much C-style "false" and "true", so they are not magic constants).
>Since you return these values from the hotplug_slot_ops callbacks to external
>code, you should probably do the same.
>
>And exactly the same comment about POWERNV_PHP_SLOT_POWER_ON/OFF a few lines
>above.
>

All those macros belong in this header file since they are part of the
slot's status. Yes, they're only used in powernv_php_slot.c currently, but
that doesn't have to stay true in future. I don't see why I need to change
them to true/false, which can represent two states at most. Here, all states
are represented by numeric values (like POWERNV_PHP_SLOT_ATTEN_* below).

>
>
>>+
>>+/* Slot attention status */
>>+#define POWERNV_PHP_SLOT_ATTEN_OFF	0
>>+#define POWERNV_PHP_SLOT_ATTEN_ON	1
>>+#define POWERNV_PHP_SLOT_ATTEN_IND	2
>>+#define POWERNV_PHP_SLOT_ATTEN_ACT	3
>
>
>These should go to drivers/pci/hotplug/powernv_php_slot.c. Where are these
>flags defined? It looks to me like there is a way to pass some status from
>userspace via sysfs to OPAL, so only OPAL is supposed to recognize and handle
>these. If so, these macros are missing "OPAL" in their names. I have one more
>comment below about it.
>

I prefer to keep them in this header file, as explained above. Those macros
are not connected/passed to OPAL directly. All OPAL calls have corresponding
wrappers in arch/powerpc/platforms/powernv/pci.c.

>>+
>>+struct powernv_php_slot {
>>+	char			*name;
>>+	struct device_node	*dn;
>>+	struct pci_dev		*pdev;
>>+	struct pci_bus		*bus;
>>+	uint64_t		id;
>>+	int			slot_no;
>>+	struct kref		kref;
>>+#define POWERNV_PHP_SLOT_STATE_INIT		0
>>+#define POWERNV_PHP_SLOT_STATE_REGISTER		1
>>+#define POWERNV_PHP_SLOT_STATE_POPULATED	2
>>+	int			state;
>>+	int			check_power_status;
>>+	int			status_confirmed;
>
>s/status_confirmed/power_status_confirmed/
>

Maybe, but I think "status_confirmed" is enough here.

>What is this status? It can be 0, 1, 2 which seems to be
>UNCONFIRMED/INPROGRESS/CONFIRMED (does not need PNV/IODA prefixes as it is
>local to the powernv_php_slot.c file).
>

No, it only has two states: 0/1. Again, it's not connected with any OPAL
calls directly, so there is no need for an "IODA_" prefix at all.

>
>>+	struct opal_msg		*msg;
>>+	void			*fdt;
>>+	void			*dt;
>>+	struct of_changeset	ocs;
>>+	struct work_struct	work;
>>+	wait_queue_head_t	queue;
>>+	struct hotplug_slot	*php_slot;
>>+	struct powernv_php_slot	*parent;
>>+	struct list_head	children;
>>+	struct list_head	link;
>>+};
>
>This should go to drivers/pci/hotplug/powernv_php_slot.c and this header
>should only have a forward declaration. After you move it there, you get
>better separation of the driver code from the slot code and only 2 changes
>will be needed:
>
>1. powernv_php_slot_enable() should receive powernv_php_slot
>2. add powernv_php_slot_unregister() (like powernv_php_slot_register()), this
>way you will have pairing pci_hp_register/pci_hp_deregister in the same file.
>
>
>After you moved this struct to the source file, you could remove/shorten
>POWERNV_PHP_SLOT_STATE_ prefixes if you wished.
>

I don't see why it should go to drivers/pci/hotplug/powernv_php_slot.c. It's
the core data structure shared by multiple source files.

>>+
>>+int powernv_php_msg_handler(struct notifier_block *nb,
>>+			    unsigned long type, void *message);
>>+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
>>+void powernv_php_slot_free(struct kref *kref);
>>+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
>>+int powernv_php_slot_register(struct powernv_php_slot *slot);
>>+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan);
>>+int powernv_php_register(struct device_node *dn);
>>+void powernv_php_unregister(struct device_node *dn);
>>+
>>+#define to_powernv_php_slot(kref) \
>>+	container_of(kref, struct powernv_php_slot, kref)
>>+
>>+static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
>>+{
>>+	if (slot)
>>+		kref_get(&slot->kref);
>>+}
>>+
>>+static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
>>+{
>>+	if (slot)
>>+		return kref_put(&slot->kref, powernv_php_slot_free);
>>+
>>+	return 0;
>>+}
>
>In these 2 helpers you do not have to check for @slot - it is checked in the
>callers pretty much always. Or it is not checked in php_slot_release() but
>dereferenced before you call powernv_php_slot_put(slot).
>
>The only place you really want this check is
>powernv_php_slot_put(slot->parent) so just check it there. btw is it even
>possible for the slot not to have a parent?
>

In most cases, a slot doesn't have a parent. That's why I have the check here.
If you really want me to drop the check, I'll have to validate the slot in
the callers instead. Also, these helpers are inline functions, so there isn't
much difference in practice.

>So I'd ditch these helpers.
>

I don't see why these helpers need to be dropped. Do you mean using
kref_{get,put} directly?
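For comparison, a minimal user-space sketch of the NULL-tolerant get/put pattern under discussion. `struct obj`, `obj_get()`, `obj_put()`, `obj_detach()` and `obj_release()` are hypothetical stand-ins for struct powernv_php_slot, the inline kref helpers and the release path; the counter is a plain int rather than an atomic struct kref, so this only illustrates the calling convention, not the kernel implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted slot; not kernel code. */
struct obj {
	int refcount;
	struct obj *parent;
	int *freed;		/* set to 1 when released (for testing) */
};

static void obj_release(struct obj *o)
{
	*o->freed = 1;
	free(o);
}

/* NULL-tolerant helpers: callers may pass o->parent unchecked. */
static void obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
}

/* Returns 1 if this put dropped the last reference, like kref_put(). */
static int obj_put(struct obj *o)
{
	if (!o)
		return 0;
	if (--o->refcount == 0) {
		obj_release(o);
		return 1;
	}
	return 0;
}

static struct obj *obj_new(struct obj *parent, int *freed)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;	/* like kref_init() */
	o->parent = parent;
	o->freed = freed;
	obj_get(parent);	/* a child pins its parent */
	return o;
}

/* Release path: drop the object, then the parent it pinned. */
static void obj_detach(struct obj *o)
{
	struct obj *parent = o->parent;	/* read before o may be freed */

	obj_put(o);
	obj_put(parent);	/* NULL for top-level objects */
}
```

Reading `o->parent` before the first put matters: if that put drops the last reference, `o` has been freed and its fields can no longer be dereferenced.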

>>+
>>+#endif /* !_POWERNV_PHP_H */
>>diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
>>new file mode 100644
>>index 0000000..73a93a2
>>--- /dev/null
>>+++ b/drivers/pci/hotplug/powernv_php_slot.c
>>@@ -0,0 +1,722 @@
>>+/*
>>+ * PCI Hotplug Driver for PowerPC PowerNV platform.
>>+ *
>>+ * Copyright Gavin Shan, IBM Corporation 2015.
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/module.h>
>>+
>>+#include <asm/opal.h>
>>+#include <asm/pnv-pci.h>
>>+#include <asm/ppc-pci.h>
>>+
>>+#include "powernv_php.h"
>>+
>>+static LIST_HEAD(php_slot_list);
>>+static DEFINE_SPINLOCK(php_slot_lock);
>>+
>>+/*
>>+ * Remove firmware data for all child device nodes of the
>>+ * indicated one.
>>+ */
>>+static void remove_child_pdn(struct device_node *np)
>>+{
>>+	struct device_node *child;
>>+
>>+	for_each_child_of_node(np, child) {
>>+		/* In depth first */
>>+		remove_child_pdn(child);
>>+
>>+		remove_pci_device_node_info(child);
>>+	}
>>+}
>>+
>>+/*
>>+ * Remove all subordinate device nodes of the indicated one.
>>+ * Device nodes on the deepest path should be released first.
>>+ */
>>+static int remove_child_device_nodes(struct device_node *parent)
>>+{
>>+	struct device_node *np, *child;
>>+	int ret = 0;
>>+
>>+	/* If the device node has children, remove them first */
>>+	for_each_child_of_node(parent, np) {
>>+		ret = remove_child_device_nodes(np);
>>+		if (ret)
>>+			return ret;
>>+
>>+		/* The device shouldn't have alive children */
>>+		child = of_get_next_child(np, NULL);
>>+		if (child) {
>>+			of_node_put(child);
>>+			of_node_put(np);
>>+			pr_err("%s: Alive children of node <%s>\n",
>>+			       __func__, of_node_full_name(np));
>>+			return -EBUSY;
>>+		}
>>+
>>+		/* Detach the device node */
>>+		of_detach_node(np);
>>+		of_node_put(np);
>>+	}
>>+
>>+	return 0;
>>+}
>>+
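The deepest-first removal order described in the comment above can be illustrated with a minimal user-space sketch over a first-child/next-sibling tree. `struct node`, `remove_children()` and the `nodes_freed` counter are hypothetical; the quoted code instead walks struct device_node with for_each_child_of_node() and detaches nodes with of_detach_node() while managing node reference counts.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical first-child/next-sibling tree node */
struct node {
	struct node *child;
	struct node *sibling;
};

static int nodes_freed;	/* counts releases, for illustration only */

/* Remove and free every descendant of @parent, deepest nodes first. */
static int remove_children(struct node *parent)
{
	while (parent->child) {
		struct node *np = parent->child;
		int ret = remove_children(np);	/* depth first */

		if (ret)
			return ret;
		/* Defensive check mirroring the quoted code: a node must
		 * not be detached while it still has children. */
		if (np->child)
			return -EBUSY;
		parent->child = np->sibling;	/* detach from parent */
		free(np);
		nodes_freed++;
	}
	return 0;
}
```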
>>+/*
>>+ * The function processes the message sent by firmware
>>+ * to remove all device tree nodes beneath the slot's
>>+ * nodes, and the associated auxiliary data.
>>+ */
>>+static void slot_power_off_handler(struct powernv_php_slot *slot)
>>+{
>>+	int ret, status = 1;
>>+
>>+	/* Release the firmware data for the child device nodes */
>>+	remove_child_pdn(slot->dn);
>>+
>>+	/*
>>+	 * Release the child device nodes. If the sub-tree was
>>+	 * built with the help of a changeset, we just need to
>>+	 * destroy the changeset.
>>+	 */
>>+	if (slot->fdt) {
>>+		of_changeset_destroy(&slot->ocs);
>>+		kfree(slot->dt);
>>+		slot->dt = NULL;
>>+		slot->dn->child = NULL;
>>+		kfree(slot->fdt);
>>+		slot->fdt = NULL;
>>+	} else {
>>+		ret = remove_child_device_nodes(slot->dn);
>>+		if (ret) {
>>+			status = 2;
>>+			dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
>>+				 ret);
>>+		}
>>+	}
>>+
>>+	/* Confirm status change */
>>+	slot->status_confirmed = status;
>>+	wake_up_interruptible(&slot->queue);
>>+}
>>+
>>+static int slot_populate_changeset(struct of_changeset *ocs,
>>+				    struct device_node *dn)
>>+{
>>+	struct device_node *child;
>>+	int ret = 0;
>>+
>>+	for_each_child_of_node(dn, child) {
>>+		ret = of_changeset_attach_node(ocs, child);
>>+		if (ret)
>>+			return ret;
>>+
>>+		ret = slot_populate_changeset(ocs, child);
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static void slot_power_on_handler(struct powernv_php_slot *slot)
>>+{
>>+	void *fdt, *dt;
>>+	uint64_t len;
>>+	int ret, status = 1;
>>+
>>+	/* We don't know the FDT blob size, so try with
>>+	 * incrementally larger memory chunks.
>>+	 */
>
>
>What is the real expected size? 0x10000 is just 64K, just allocate it and
>that's it.
>

You cannot know the size in advance; it depends on the size of the FDT
blob to be transferred. So the size has to be probed.
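A minimal user-space sketch of the probing approach, assuming the firmware call simply fails when the buffer is too small. `fake_get_device_tree()` is a hypothetical stand-in for pnv_pci_get_device_tree(), and `probe_fdt()` is likewise illustrative; the 8 KiB step and 64 KiB cap are taken from the quoted loop. The sketch returns NULL both when allocation fails and when the cap is exceeded, so a failed allocation is never mistaken for a successfully filled buffer.

```c
#include <stdlib.h>
#include <string.h>

#define FDT_STEP	0x2000
#define FDT_MAX		0x10000

/* Hypothetical firmware call: copies the blob only if it fits in @len. */
static int fake_get_device_tree(const void *blob, size_t blob_len,
				void *buf, size_t len)
{
	if (blob_len > len)
		return -1;	/* buffer too small */
	memcpy(buf, blob, blob_len);
	return 0;
}

/*
 * Try increasingly large buffers until the firmware accepts one.
 * Returns the buffer (caller frees it) and stores its size in *out_len,
 * or NULL if allocation fails or FDT_MAX is exceeded.
 */
static void *probe_fdt(const void *blob, size_t blob_len, size_t *out_len)
{
	size_t len;

	for (len = FDT_STEP; len <= FDT_MAX; len += FDT_STEP) {
		void *buf = calloc(1, len);

		if (!buf)
			return NULL;
		if (fake_get_device_tree(blob, blob_len, buf, len) == 0) {
			*out_len = len;
			return buf;
		}
		free(buf);
	}
	return NULL;
}
```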

>>+	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
>>+		fdt = kzalloc(len, GFP_KERNEL);
>>+		if (!fdt)
>>+			break;
>>+
>>+		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
>>+		if (!ret)
>>+			break;
>>+
>>+		kfree(fdt);
>>+	}
>>+
>>+	if (len > 0x10000) {
>>+		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
>>+		goto out;
>>+	}
>>+
>>+	/* Unflatten device tree blob */
>>+	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
>>+	if (!dt) {
>>+		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
>>+		goto free_fdt;
>>+	}
>
>Right here you could kfree(fdt) and not cache it in the slot struct at all.
>You do not use it later anyway.
>

The "fdt" is cached at the end of this function, and it can't be released
because the unflattened device tree still uses data in the FDT blob (@fdt).

>
>>+
>>+	/* Initialize and apply the changeset */
>>+	of_changeset_init(&slot->ocs);
>>+	ret = slot_populate_changeset(&slot->ocs, slot->dn);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
>>+			 ret);
>>+		goto free_dt;
>>+	}
>>+
>>+	slot->dn->child = NULL;
>>+	ret = of_changeset_apply(&slot->ocs);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
>>+			 ret);
>>+		goto destroy_changeset;
>>+	}
>>+
>>+	/* Add device node firmware data */
>>+	traverse_pci_device_nodes(slot->dn,
>>+				  add_pci_device_node_info,
>>+				  pci_bus_to_host(slot->bus));
>>+	slot->fdt = fdt;
>>+	slot->dt = dt;
>>+	goto out;
>>+
>>+destroy_changeset:
>>+	of_changeset_destroy(&slot->ocs);
>>+free_dt:
>>+	kfree(dt);
>>+	slot->dn->child = NULL;
>
>Can of_fdt_unflatten_tree() or of_changeset_init() or
>slot_populate_changeset() initialize dn->child? No kfree(slot->dn->child)?
>

of_fdt_unflatten_tree() does. @dt is the unflattened device tree here,
so there is no need to free slot->dn->child.

>>+free_fdt:
>>+	kfree(fdt);
>>+	status = 2;
>>+out:
>>+	/* Confirm status change */
>>+	slot->status_confirmed = status;
>>+	wake_up_interruptible(&slot->queue);
>>+}
>>+
>>+static void powernv_php_slot_work(struct work_struct *data)
>>+{
>>+	struct powernv_php_slot *slot = container_of(data,
>>+						     struct powernv_php_slot,
>>+						     work);
>>+	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
>>+
>>+	switch (php_event) {
>>+	case 0: /* Slot power off */
>>+		slot_power_off_handler(slot);
>>+		break;
>>+	case 1: /* Slot power on */
>>+		slot_power_on_handler(slot);
>>+		break;
>
>These 0 and 1 are not the same 0 and 1 used for @val in get_power_status(),
>these are from OPAL so please define them.
>

Good point. I'll add the following macros for them in opal-api.h:

#define OPAL_PCI_PHP_EVENT_POWER_OFF	0
#define OPAL_PCI_PHP_EVENT_POWER_ON	1

Or use an "enum" type.
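For illustration, the enum variant with a dispatch that mirrors the switch in powernv_php_slot_work(). The enum name, `struct event_counts` and `dispatch_php_event()` are hypothetical; only the values 0 (power off) and 1 (power on) come from the quoted code, and the counters stand in for the real power-off/power-on handlers.

```c
#include <stdint.h>

/* Hypothetical enum form of the proposed OPAL_PCI_PHP_EVENT_* values */
enum opal_pci_php_event {
	OPAL_PCI_PHP_EVENT_POWER_OFF = 0,
	OPAL_PCI_PHP_EVENT_POWER_ON  = 1,
};

/* Counters stand in for slot_power_{off,on}_handler() */
struct event_counts {
	int off;
	int on;
	int unknown;
};

/* Dispatch a firmware event; returns 0 if handled, -1 if unsupported. */
static int dispatch_php_event(uint64_t event, struct event_counts *c)
{
	switch (event) {
	case OPAL_PCI_PHP_EVENT_POWER_OFF:
		c->off++;
		return 0;
	case OPAL_PCI_PHP_EVENT_POWER_ON:
		c->on++;
		return 0;
	default:
		c->unknown++;	/* the quoted code warns here */
		return -1;
	}
}
```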

>
>>+	default:
>>+		dev_warn(&slot->pdev->dev, "Unsupported hotplug event %lld\n",
>>+			 php_event);
>>+	}
>>+
>>+	of_node_put(slot->dn);
>>+}
>>+
>>+int powernv_php_msg_handler(struct notifier_block *nb,
>>+			    unsigned long type, void *message)
>>+{
>>+	phandle h;
>>+	struct device_node *np;
>>+	struct powernv_php_slot *slot;
>>+	struct opal_msg *msg = message;
>>+
>>+	/* Check the message type */
>>+	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>+		pr_warn("%s: Wrong message type %ld received!\n",
>>+			__func__, type);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	/* Find the device node */
>>+	h = (phandle)be64_to_cpu(msg->params[1]);
>>+	np = of_find_node_by_phandle(h);
>>+	if (!np) {
>>+		pr_warn("%s: No device node for phandle 0x%08x\n",
>>+			__func__, h);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	/* Find the slot */
>>+	slot = powernv_php_slot_find(np);
>>+	if (!slot) {
>>+		pr_warn("%s: No slot found for node <%s>\n",
>>+			__func__, of_node_full_name(np));
>>+		of_node_put(np);
>>+		return NOTIFY_DONE;
>>+	}
>>+
>>+	/* Schedule the work */
>>+	slot->msg = msg;
>>+	schedule_work(&slot->work);
>>+	return NOTIFY_OK;
>>+}
>
>This function belongs to drivers/pci/hotplug/powernv_php.c (searching a slot
>is powernv_php.c's scope) except these lines:
>
>> +	/* Schedule the work */
>> +	slot->msg = msg;
>> +	schedule_work(&slot->work);
>
>These 3 lines should be in helper in drivers/pci/hotplug/powernv_php_slot.c.
>

There's no point in dropping one helper only to add another. One principle I
kept in mind when writing the code: all hotplug logic (interacting with
OPAL firmware) is implemented in powernv_php_slot.c, and powernv_php.c
just connects the helper functions to events.

>>+
>>+static int set_power_status(struct hotplug_slot *php_slot, u8 val)
>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+	int ret;
>>+
>>+	/* Set power status */
>>+	slot->status_confirmed = 0;
>>+	ret = pnv_pci_set_power_status(slot->id, val);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
>>+			 ret, val ? "on" : "off");
>>+		return ret;
>>+	}
>>+
>>+	/* Continue to PCI probing once the device tree is finalized.
>>+	 * The device tree might already have been fully updated at
>>+	 * this point, so we don't always have to wait for it.
>>+	 */
>>+	if (slot->status_confirmed == 1)
>>+		return 0;
>>+	else if (slot->status_confirmed > 0)
>>+		return -EBUSY;
>>+
>>+	ret = wait_event_timeout(slot->queue, slot->status_confirmed, 10 * HZ);
>>+	if (!ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
>>+			 ret, val ? "on" : "off");
>>+		return -EBUSY;
>>+	}
>>+
>>+	/* Check the result */
>>+	if (slot->status_confirmed == 1)
>>+		return 0;
>>+
>>+	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
>>+		 slot->status_confirmed, val ? "on" : "off");
>>+	return -EBUSY;
>>+}
>>+
>>+static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+	uint8_t state;
>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve power status from firmware. If we fail
>>+	 * to get it, the power status falls back to on.
>>+	 */
>>+	ret = pnv_pci_get_power_status(slot->id, &state);
>>+	if (ret) {
>>+		*val = POWERNV_PHP_SLOT_POWER_ON;
>>+		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
>>+			 ret);
>>+	} else {
>>+		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
>>+			       POWERNV_PHP_SLOT_POWER_OFF;
>>+		php_slot->info->power_status = *val;
>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+	uint8_t state;
>>+	int ret;
>>+
>>+	/*
>>+	 * Retrieve presence status from firmware. If we can't
>>+	 * get it, it falls back to empty.
>>+	 */
>>+	ret = pnv_pci_get_presence_status(slot->id, &state);
>>+	if (ret >= 0) {
>>+		ret = 0;
>>+		*val = state ? POWERNV_PHP_SLOT_PRESENT :
>>+			       POWERNV_PHP_SLOT_EMPTY;
>>+		php_slot->info->adapter_status = *val;
>>+		ret = 0;
>>+	} else {
>>+		*val = POWERNV_PHP_SLOT_EMPTY;
>>+		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
>>+			 ret);
>>+	}
>>+
>>+	return ret;
>>+}
>>+
>>+static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+
>>+	/* The default operation would be to turn on the attention */
>>+	switch (val) {
>>+	case POWERNV_PHP_SLOT_ATTEN_OFF:
>>+	case POWERNV_PHP_SLOT_ATTEN_ON:
>>+	case POWERNV_PHP_SLOT_ATTEN_IND:
>>+	case POWERNV_PHP_SLOT_ATTEN_ACT:
>>+		break;
>>+	default:
>>+		dev_warn(&slot->pdev->dev, "Invalid attention %d\n", val);
>>+		return -EINVAL;
>>+	}
>>+
>>+	/* FIXME: Make it real once firmware supports it */
>>+	php_slot->info->attention_status = val;
>
>Since firmware does not have an idea about these POWERNV_PHP_SLOT_ATTEN_xxx,
>just remove them. Later when the firmware will know about them, we will have
>to change this code anyway and by that time, the set of states may have
>changed.
>

Sure, will do.

>>+
>>+	return 0;
>>+}
>>+
>>+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan)
>
>
>This should receive powernv_php_slot as described above.
>

Good point, I'll change accordingly.

>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+	uint8_t presence, power_status;
>>+	int ret;
>>+
>>+	/* Check if the slot has been configured */
>>+	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
>>+		return 0;
>>+
>>+	/* Retrieve slot presence status */
>>+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
>>+	if (ret)
>>+		return ret;
>>+
>>+	/* Scan directly if there is nothing behind the slot */
>>+	if (presence == POWERNV_PHP_SLOT_EMPTY)
>>+		goto scan;
>>+
>>+	/*
>>+	 * If we detect something behind the slot, we need to
>>+	 * make sure the power supply to the slot is on. Otherwise,
>>+	 * the slot's downstream PCIe link would be down.
>>+	 *
>>+	 * The first time through, we don't change the power status,
>>+	 * to speed up system boot, assuming that the firmware reports
>>+	 * a consistent slot power status: an empty slot always has its
>>+	 * power off and a non-empty slot has its power on.
>>+	 */
>>+	if (!slot->check_power_status) {
>>+		slot->check_power_status = 1;
>>+		goto scan;
>>+	}
>>+
>>+	/* Check the power status. Scan the slot if that's already on */
>>+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
>>+	if (ret)
>>+		return ret;
>>+
>>+	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
>>+		goto scan;
>>+
>>+	/* Power is off, turn it on and then scan the slot */
>>+	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
>>+	if (ret)
>>+		return ret;
>>+
>>+scan:
>>+	switch (presence) {
>>+	case POWERNV_PHP_SLOT_PRESENT:
>>+		if (rescan) {
>>+			pci_lock_rescan_remove();
>>+			pci_add_pci_devices(slot->bus);
>>+			pci_unlock_rescan_remove();
>>+		}
>>+
>>+		/* Rescan for child hotpluggable slots */
>>+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
>>+		if (rescan)
>>+			powernv_php_register(slot->dn);
>>+		break;
>>+	case POWERNV_PHP_SLOT_EMPTY:
>>+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
>>+		break;
>>+	default:
>>+		dev_warn(&slot->pdev->dev, "Invalid presence status %d\n",
>>+			 presence);
>>+		return -EINVAL;
>
>Neither PHP driver will ever have presence other than 0 or 1, so this
>switch() is simply if (presence) {} else {}.
>

Ok. Will use "if () {} else {}" then.
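The decision powernv_php_slot_enable() makes can be sketched with the agreed if/else form, as a minimal user-space model. `enum enable_action`, `struct slot_model` and `decide_enable()` are hypothetical; `present` and `powered` model the get_adapter_status() and get_power_status() results, and `check_power_status` plays the same role as the field in the quoted struct.

```c
#include <stdbool.h>

enum enable_action {
	ACT_SKIP,		/* slot not in the registered state */
	ACT_MARK_EMPTY,		/* nothing present: just mark populated */
	ACT_SCAN,		/* power assumed or known on: scan the bus */
	ACT_POWER_ON_SCAN,	/* power the slot on first, then scan */
};

struct slot_model {
	bool registered;		/* state == ..._STATE_REGISTER */
	bool check_power_status;	/* false until the first pass */
};

static enum enable_action decide_enable(struct slot_model *s,
					bool present, bool powered)
{
	if (!s->registered)
		return ACT_SKIP;

	if (!present)
		return ACT_MARK_EMPTY;

	/* First pass: trust the firmware's power state to speed up boot */
	if (!s->check_power_status) {
		s->check_power_status = true;
		return ACT_SCAN;
	}

	return powered ? ACT_SCAN : ACT_POWER_ON_SCAN;
}
```

As in the quoted code, an empty slot returns before `check_power_status` is set, so the first non-empty enable still skips the power check.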

>>+	}
>>+
>>+	return 0;
>>+}
>>+
>>+static int enable_slot(struct hotplug_slot *php_slot)
>>+{
>>+	return powernv_php_slot_enable(php_slot, true);
>>+}
>>+
>>+static int disable_slot(struct hotplug_slot *php_slot)
>>+{
>>+	struct powernv_php_slot *slot = php_slot->private;
>>+	uint8_t power_status;
>>+	int ret;
>>+
>>+	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
>>+		return 0;
>>+
>>+	/* Remove all devices behind the slot */
>>+	pci_lock_rescan_remove();
>>+	pci_remove_pci_devices(slot->bus);
>>+	pci_unlock_rescan_remove();
>>+
>>+	/* Detach the child hotpluggable slots */
>>+	powernv_php_unregister(slot->dn);
>>+
>>+	/*
>>+	 * Check the power status and turn it off if necessary. If we
>>+	 * fail to get the power status, the power will be forced to
>>+	 * be off.
>>+	 */
>>+	ret = php_slot->ops->get_power_status(php_slot, &power_status);
>>+	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
>>+		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
>>+		if (ret)
>>+			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
>>+				 ret);
>>+	}
>>+
>>+	/* Update slot state */
>>+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
>>+	return 0;
>>+}
>>+
>>+static struct hotplug_slot_ops php_slot_ops = {
>>+	.get_power_status	= get_power_status,
>>+	.get_adapter_status	= get_adapter_status,
>>+	.set_attention_status	= set_attention_status,
>>+	.enable_slot		= enable_slot,
>>+	.disable_slot		= disable_slot,
>>+};
>>+
>>+static struct powernv_php_slot *php_slot_match(struct device_node *dn,
>>+					       struct powernv_php_slot *slot)
>>+{
>>+	struct powernv_php_slot *target, *tmp;
>>+
>>+	if (slot->dn == dn)
>>+		return slot;
>>+
>>+	list_for_each_entry(tmp, &slot->children, link) {
>>+		target = php_slot_match(dn, tmp);
>>+		if (target)
>>+			return target;
>>+	}
>>+
>>+	return NULL;
>>+}
>>+
>>+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
>>+{
>>+	struct powernv_php_slot *slot, *tmp;
>>+	unsigned long flags;
>>+
>>+	spin_lock_irqsave(&php_slot_lock, flags);
>>+	list_for_each_entry(tmp, &php_slot_list, link) {
>>+		slot = php_slot_match(dn, tmp);
>>+		if (slot) {
>>+			spin_unlock_irqrestore(&php_slot_lock, flags);
>>+			return slot;
>>+		}
>>+	}
>>+	spin_unlock_irqrestore(&php_slot_lock, flags);
>>+
>>+	return NULL;
>>+}
>>+
>>+void powernv_php_slot_free(struct kref *kref)
>>+{
>>+	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
>>+
>>+	WARN_ON(!list_empty(&slot->children));
>>+	kfree(slot->name);
>>+	kfree(slot);
>>+}
>>+
>>+static void php_slot_release(struct hotplug_slot *hp_slot)
>>+{
>>+	struct powernv_php_slot *slot = hp_slot->private;
>>+	unsigned long flags;
>>+
>>+	/* Remove from global or child list */
>>+	spin_lock_irqsave(&php_slot_lock, flags);
>>+	list_del(&slot->link);
>>+	spin_unlock_irqrestore(&php_slot_lock, flags);
>
>
>This is a good example where RCU rules. powernv_php_slot_find() returns slot
>pointer and its use is not protected by spin_lock -> dangerous.
>
>Remove spin_lock(), s/list_del/list_del_rcu/, and move bits below to
>call_rcu(), and s/list_for_each_entry/list_for_each_entry_rcu/.
>

OK. It sounds like an RCU list might be better; I'll refactor the code to use one.

I think the spinlock is still needed when adding a node to the RCU list. If so,
the spinlock doesn't need to be removed?

>
>>+
>>+	/* Detach from parent */
>>+	powernv_php_slot_put(slot);
>>+	powernv_php_slot_put(slot->parent);
>>+}
>>+
>>+static bool php_slot_get_id(struct device_node *dn,
>>+			    uint64_t *id)
>>+{
>>+	struct device_node *parent = dn;
>>+	const __be64 *prop64;
>>+	const __be32 *prop32;
>>+
>>+	/*
>>+	 * The hotpluggable slot always has a compound Id, which
>>+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>+	 * number, and compound indicator
>>+	 */
>>+	*id = (0x1ul << 63);
>>+
>>+	/* Bus/Slot/Function number */
>>+	prop32 = of_get_property(dn, "reg", NULL);
>>+	if (!prop32)
>>+		return false;
>>+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>+
>>+	/* PHB Id */
>>+	while ((parent = of_get_parent(parent))) {
>>+		if (!PCI_DN(parent)) {
>>+			of_node_put(parent);
>>+			break;
>>+		}
>>+
>>+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>+			of_node_put(parent);
>>+			continue;
>>+		}
>>+
>>+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>+		if (!prop64) {
>>+			of_node_put(parent);
>>+			return false;
>>+		}
>>+
>>+		*id |= be64_to_cpup(prop64);
>>+		of_node_put(parent);
>>+		return true;
>>+	}
>>+
>>+	return false;
>>+}
>>+
>>+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
>>+{
>>+	struct eeh_dev *edev = pdn_to_eeh_dev(PCI_DN(dn));
>>+	struct pci_bus *bus;
>>+	struct powernv_php_slot *slot;
>>+	const char *label;
>>+	uint64_t id;
>>+	int slot_no;
>>+	size_t size;
>>+	void *pmem;
>>+
>>+	/* Slot name */
>>+	label = of_get_property(dn, "ibm,slot-label", NULL);
>>+	if (!label)
>>+		return NULL;
>>+
>>+	/* Slot identifier */
>>+	if (!php_slot_get_id(dn, &id))
>>+		return NULL;
>>+
>>+	/* PCI bus */
>>+	bus = of_node_to_pci_bus(dn);
>>+	if (!bus)
>>+		return NULL;
>>+
>>+	/* Slot number */
>>+	if (dn->child && PCI_DN(dn->child))
>>+		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>+	else
>>+		slot_no = -1;
>
>Not INVALID_SLOT and
>#define INVALID_SLOT -1
>? :)
>

There's no need to do that. "-1" means it's a "placeholder" slot: no PCI
devices will be attached to it. "-1" is defined by the PCI hotplug core,
which doesn't have a macro for it yet.
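To make the encoding concrete, a user-space sketch of how php_slot_get_id() and powernv_php_slot_alloc() derive the compound slot ID and the slot number. It assumes the first "reg" cell follows the usual Open Firmware PCI layout (bus number in bits 23:16, devfn in bits 15:8) and that the PHB ID fits in the low 16 bits, as the quoted comment states; `compound_slot_id()` and `slot_number()` are hypothetical names, and PCI_SLOT() is re-defined here with the kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)	/* as in <linux/pci.h> */

/* Compound ID: bit 63 flag | bus in 31:24 | devfn in 23:16 | PHB ID */
static uint64_t compound_slot_id(uint32_t reg_hi, uint64_t phb_id)
{
	uint64_t id = 1ULL << 63;

	id |= (uint64_t)(reg_hi & 0x00ffff00) << 8;
	id |= phb_id;
	return id;
}

/* Slot number from the first child's devfn, or -1 for a placeholder */
static int slot_number(bool has_child, uint8_t child_devfn)
{
	return has_child ? (int)PCI_SLOT(child_devfn) : -1;
}
```

For example, bus 0x01 and devfn 0x08 (device 1, function 0) on PHB 2 give the ID 0x8000000001080002 and slot number 1.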

>
>>+
>>+	/* Allocate slot */
>>+	size = sizeof(struct powernv_php_slot) +
>>+	       sizeof(struct hotplug_slot) +
>>+	       sizeof(struct hotplug_slot_info);
>>+	pmem = kzalloc(size, GFP_KERNEL);
>>+	if (!pmem) {
>>+		pr_warn("%s: Cannot allocate slot for node %s\n",
>>+			__func__, dn->full_name);
>>+		return NULL;
>>+	}
>>+
>>+	/* Assign memory blocks */
>>+	slot = pmem;
>>+	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
>>+	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
>>+			      sizeof(struct hotplug_slot);
>>+	slot->name = kstrdup(label, GFP_KERNEL);
>>+	if (!slot->name) {
>>+		pr_warn("%s: Cannot populate name for node %s\n",
>>+			__func__, dn->full_name);
>>+		kfree(pmem);
>>+		return NULL;
>>+	}
>
>Why not just embed structs one to another?
>

Good point. I'll do that.
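A minimal user-space sketch of what embedding could look like: the hotplug slot and its info live inside the slot structure, so one allocation suffices and the outer structure can be recovered with container_of() rather than through a private pointer. `struct hp_slot`, `struct hp_slot_info`, `struct php_slot`, `php_slot_new()` and `to_php_slot()` are simplified, hypothetical stand-ins, and container_of() is re-defined with offsetof() because <linux/kernel.h> is not available in user space.

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct hp_slot_info { unsigned char power_status; };
struct hp_slot { struct hp_slot_info *info; };

/* Simplified stand-in for struct powernv_php_slot */
struct php_slot {
	int slot_no;
	struct hp_slot php_slot;	/* embedded, not a pointer */
	struct hp_slot_info info;	/* embedded, not a pointer */
};

static struct php_slot *php_slot_new(void)
{
	struct php_slot *slot = calloc(1, sizeof(*slot));

	if (slot)
		slot->php_slot.info = &slot->info;	/* no pointer arithmetic */
	return slot;
}

/* Recover the outer slot from the embedded hotplug slot */
static struct php_slot *to_php_slot(struct hp_slot *hp)
{
	return container_of(hp, struct php_slot, php_slot);
}
```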

>>+
>>+	/* Initialize slot */
>>+	kref_init(&slot->kref);
>>+	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
>>+	slot->dn = dn;
>>+	slot->pdev = eeh_dev_to_pci_dev(edev);
>>+	slot->bus = bus;
>>+	slot->id = id;
>>+	slot->slot_no = slot_no;
>>+	INIT_WORK(&slot->work, powernv_php_slot_work);
>>+	init_waitqueue_head(&slot->queue);
>>+	slot->check_power_status = 0;
>>+	slot->status_confirmed = 0;
>>+	slot->php_slot->ops = &php_slot_ops;
>>+	slot->php_slot->release = php_slot_release;
>>+	slot->php_slot->private = slot;
>>+	INIT_LIST_HEAD(&slot->children);
>>+	INIT_LIST_HEAD(&slot->link);
>>+
>>+	return slot;
>>+}
>>+
>>+int powernv_php_slot_register(struct powernv_php_slot *slot)
>>+{
>>+	struct powernv_php_slot *parent;
>>+	struct device_node *dn = slot->dn;
>>+	unsigned long flags;
>>+	int ret;
>>+
>>+	/* Avoid registering the same slot twice */
>>+	if (powernv_php_slot_find(slot->dn))
>>+		return -EEXIST;
>>+
>>+	/* Register slot */
>>+	ret = pci_hp_register(slot->php_slot, slot->bus,
>>+			      slot->slot_no, slot->name);
>>+	if (ret) {
>>+		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
>>+			 ret);
>>+		return ret;
>>+	}
>>+
>>+	/* Put into global or parent list */
>>+	while ((dn = of_get_parent(dn))) {
>>+		if (!PCI_DN(dn)) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>>+
>>+		parent = powernv_php_slot_find(dn);
>>+		if (parent) {
>>+			of_node_put(dn);
>>+			break;
>>+		}
>>+	}
>>+
>>+	spin_lock_irqsave(&php_slot_lock, flags);
>>+	if (parent) {
>>+		powernv_php_slot_get(parent);
>>+		slot->parent = parent;
>>+		list_add_tail(&slot->link, &parent->children);
>>+	} else {
>>+		list_add_tail(&slot->link, &php_slot_list);
>>+	}
>>+	spin_unlock_irqrestore(&php_slot_lock, flags);
>>+
>>+	/* Update slot state */
>>+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
>>+	return 0;
>>+}
>>
>
>
>Now I'm finished with this patchset respin :)
>

OK. I appreciate your time on this :-)

Thanks,
Gavin


* Re: [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically
  2015-08-14 13:52   ` Alexey Kardashevskiy
@ 2015-08-15  4:59     ` Gavin Shan
  2015-08-15  9:23       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 102+ messages in thread
From: Gavin Shan @ 2015-08-15  4:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Gavin Shan, linuxppc-dev, linux-pci, devicetree, benh, mpe,
	bhelgaas, grant.likely, robherring2, panto

On Fri, Aug 14, 2015 at 11:52:44PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>Currently, the PEs and their associated resources are assigned
>>in ppc_md.pcibios_fixup(), except those consumed by SRIOV VFs.
>>The function is called only once, after PCI probing and resource
>>assignment are finished, which isn't hotplug friendly.
>>
>>The patch creates PEs dynamically from ppc_md.pcibios_setup_bridge(),
>>which is called during both system bootup and PCI hotplug, when a
>>PCI bridge's windows are updated after resource assignment or
>>reassignment finishes. For the partial hotplug case, where not all
>>PCI devices belonging to the PE are unplugged and plugged again, we
>>just need to unbind/bind the affected PCI devices from/to the
>>corresponding PE without creating a new one.
>>
>>Besides, unplugging the current adapter and inserting a different
>>one into a PCI slot, which is assumed to sit behind the root port or
>>behind a downstream port of the PCIe switch below the root port, might
>>require additional resources (e.g. M32) in the PCI bridge's windows.
>>The parent bridge of the newly plugged adapter would reject the
>>request to add more resources, leading to hotplug failure. To address
>>this, the patch extends the windows of the root port, or of the
>>upstream port of the PCIe switch behind the root port, to the PHB's
>>windows when ppc_md.pcibios_setup_bridge() is called.
>>
>>There is no upstream bridge for the root bus, so we have to fix it up
>>before any PE is created, because the root bus PE is the ancestor
>>of all other PEs.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 226 ++++++++++++++++++------------
>>  arch/powerpc/platforms/powernv/pci.h      |   1 +
>>  2 files changed, 137 insertions(+), 90 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8aa6ab8..37847a3 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1083,6 +1083,13 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>  				pci_name(dev));
>>  			continue;
>>  		}
>>+
>>+		/* The PCI device might not have been detached from
>>+		 * the PE in the partial hotplug case.
>>+		 */
>>+		if (pdn->pe_number != IODA_INVALID_PE)
>>+			continue;
>>+
>>  		pdn->pe_number = pe->pe_number;
>>  		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>>  		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>@@ -1101,9 +1108,27 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	struct pci_controller *hose = pci_bus_to_host(bus);
>>  	struct pnv_phb *phb = hose->private_data;
>>  	struct pnv_ioda_pe *pe = NULL;
>>+	int pe_num;
>>+
>>+	/* For the partial hotplug case, the PE instance hasn't been
>>+	 * destroyed yet. We shouldn't allocate a new one and assign
>>+	 * resources to it. The existing PE instance should be reused,
>>+	 * but we should associate the devices with the PE.
>>+	 */
>>+	pe_num = phb->ioda.pe_rmap[bus->number << 8];
>>+	if (pe_num != IODA_INVALID_PE) {
>>+		pe = &phb->ioda.pe_array[pe_num];
>>+		pnv_ioda_setup_same_PE(bus, pe);
>>+		return NULL;
>>+	}
>>+
>>+	/* PE number for root bus should have been reserved */
>>+	if (pci_is_root_bus(bus) &&
>>+	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
>>+		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
>>
>>  	/* Check if PE is determined by M64 */
>>-	if (phb->pick_m64_pe)
>>+	if (!pe && phb->pick_m64_pe)
>
>
>else if (phb->pick_m64_pe)
>

No. When this function is called for the root bus, the PE should already
have been reserved, so we still have to check @pe.
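The selection order in the quoted pnv_ioda_setup_bus_PE() can be sketched as a small decision function: a PE found through the reverse map is reused (partial hotplug), the root bus takes its reserved PE, M64 then gets a chance to pin the PE number, and only after that is a fresh PE allocated. Everything here is a hypothetical simplification: `INVALID_PE` stands in for IODA_INVALID_PE, the `pe_ctx` fields model pe_rmap[], root_pe_idx, pick_m64_pe() and the allocator, and the final allocation step is assumed from the "isn't pinned by M64" comment since that code is not quoted.

```c
#include <stdbool.h>

#define INVALID_PE	(-1)

enum pe_source { PE_REUSED, PE_ROOT_RESERVED, PE_M64, PE_ALLOCATED, PE_NONE };

struct pe_ctx {
	int rmap_pe;		/* pe_rmap[] entry for the bus, or INVALID_PE */
	bool is_root_bus;
	int root_pe_idx;	/* reserved root PE, or INVALID_PE */
	int m64_pe;		/* PE pinned by M64, or INVALID_PE */
	int free_pe;		/* next free PE from the allocator, or INVALID_PE */
};

static enum pe_source pick_pe(const struct pe_ctx *c, int *pe)
{
	*pe = INVALID_PE;

	if (c->rmap_pe != INVALID_PE) {		/* partial hotplug: reuse */
		*pe = c->rmap_pe;
		return PE_REUSED;
	}
	if (c->is_root_bus && c->root_pe_idx != INVALID_PE) {
		*pe = c->root_pe_idx;		/* reserved root bus PE */
		return PE_ROOT_RESERVED;
	}
	if (c->m64_pe != INVALID_PE) {		/* only reached if no PE yet */
		*pe = c->m64_pe;
		return PE_M64;
	}
	if (c->free_pe != INVALID_PE) {		/* not pinned by M64 */
		*pe = c->free_pe;
		return PE_ALLOCATED;
	}
	return PE_NONE;
}
```

The early returns ensure that a reserved root PE is never overridden by the M64 pick.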

>
>
>>  		pe = phb->pick_m64_pe(bus, all);
>>
>>  	/* The PE number isn't pinned by M64 */
>>@@ -1150,46 +1175,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>  	return pe;
>>  }
>>
>>-static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>-{
>>-	struct pci_dev *dev;
>>-
>>-	pnv_ioda_setup_bus_PE(bus, false);
>>-
>>-	list_for_each_entry(dev, &bus->devices, bus_list) {
>>-		if (dev->subordinate) {
>>-			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
>>-				pnv_ioda_setup_bus_PE(dev->subordinate, true);
>>-			else
>>-				pnv_ioda_setup_PEs(dev->subordinate);
>>-		}
>>-	}
>>-}
>>-
>>-/*
>>- * Configure PEs so that the downstream PCI buses and devices
>>- * could have their associated PE#. Unfortunately, we didn't
>>- * figure out the way to identify the PLX bridge yet. So we
>>- * simply put the PCI bus and the subordinate behind the root
>>- * port to PE# here. The game rule here is expected to be changed
>>- * as soon as we can detected PLX bridge correctly.
>>- */
>>-static void pnv_pci_ioda_setup_PEs(void)
>>-{
>>-	struct pci_controller *hose, *tmp;
>>-	struct pnv_phb *phb;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-
>>-		/* M64 layout might affect PE allocation */
>>-		if (phb->reserve_m64_pe)
>>-			phb->reserve_m64_pe(hose->bus, NULL, true);
>>-
>>-		pnv_ioda_setup_PEs(hose->bus);
>>-	}
>>-}
>>-
>>  #ifdef CONFIG_PCI_IOV
>>  static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>  {
>>@@ -2962,52 +2947,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>  	}
>>  }
>>
>>-static void pnv_pci_ioda_setup_seg(void)
>>-{
>>-	struct pci_controller *tmp, *hose;
>>-	struct pnv_phb *phb;
>>-	struct pnv_ioda_pe *pe;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>-			pnv_ioda_setup_pe_seg(hose, pe);
>>-		}
>>-	}
>>-}
>>-
>>-static void pnv_pci_ioda_setup_DMA(void)
>>-{
>>-	struct pci_controller *hose, *tmp;
>>-	struct pnv_phb *phb;
>>-	struct pnv_ioda_pe *pe;
>>-
>>-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>-		phb = hose->private_data;
>>-		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>-
>>-		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>-			if (!pe->dma32_weight)
>>-				continue;
>>-
>>-			switch (phb->type) {
>>-			case PNV_PHB_IODA1:
>>-				pnv_ioda1_setup_dma(phb, pe);
>>-				break;
>>-			case PNV_PHB_IODA2:
>>-				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>-				break;
>>-			default:
>>-				pr_warn("%s: No DMA for PHB type %d\n",
>>-					__func__, phb->type);
>>-			}
>>-		}
>>-
>>-		/* Mark the PHB initialization done */
>>-		phb->initialized = 1;
>>-	}
>>-}
>>-
>>  static void pnv_pci_ioda_create_dbgfs(void)
>>  {
>>  #ifdef CONFIG_DEBUG_FS
>>@@ -3029,9 +2968,8 @@ static void pnv_pci_ioda_create_dbgfs(void)
>>
>>  static void pnv_pci_ioda_fixup(void)
>>  {
>>-	pnv_pci_ioda_setup_PEs();
>>-	pnv_pci_ioda_setup_seg();
>>-	pnv_pci_ioda_setup_DMA();
>>+	struct pci_controller *hose, *tmp;
>>+	struct pnv_phb *phb;
>>
>>  	pnv_pci_ioda_create_dbgfs();
>>
>>@@ -3039,6 +2977,12 @@ static void pnv_pci_ioda_fixup(void)
>>  	eeh_init();
>>  	eeh_addr_cache_build();
>>  #endif
>>+
>>+	/* Notify initialization of PHB done */
>>+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>+		phb = hose->private_data;
>>+		phb->initialized = 1;
>>+	}
>>  }
>>
>>  /*
>>@@ -3082,6 +3026,105 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>  	return phb->ioda.io_segsize;
>>  }
>>
>>+/*
>>+ * We are updating root port or the upstream bridge behind the
>>+ * root port with PHB's windows in order to accommodate the
>>+ * changes on required resources during PCI (slot) hotplug,
>>+ * which is connected to either root port or the downstream
>>+ * ports of PCIe switch behind the root port.
>>+ */
>>+static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
>>+					   unsigned long type)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dev *bridge = bus->self;
>>+	struct resource *r, *w;
>>+	int i;
>>+
>>+	/* Check if we need to apply the fixup to the bridge's windows */
>>+	if (!pci_is_root_bus(bridge->bus) &&
>>+	    !pci_is_root_bus(bridge->bus->self->bus))
>>+		return;
>>+
>>+	/* Fix up the resources */
>>+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
>>+		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
>>+		if (!r->flags || !r->parent)
>>+			continue;
>>+
>>+		w = NULL;
>>+		if (r->flags & type & IORESOURCE_IO)
>>+			w = &hose->io_resource;
>>+		else if (pnv_pci_is_mem_pref_64(r->flags) &&
>>+			 (type & IORESOURCE_PREFETCH) &&
>>+			 phb->ioda.m64_segsize)
>>+			w = &hose->mem_resources[1];
>>+		else if (r->flags & type & IORESOURCE_MEM)
>>+			w = &hose->mem_resources[0];
>>+
>>+		r->start = w->start;
>>+		r->end = w->end;
>>+	}
>>+}
>>+
>>+static void pnv_pci_setup_bridge(struct pci_bus *bus,
>>+				 unsigned long type)
>>+{
>>+	struct pci_controller *hose = pci_bus_to_host(bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>+	struct pci_dev *bridge = bus->self;
>>+	struct pnv_ioda_pe *pe;
>>+	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
>>+
>>+	/* The root bus (ancestor PE) should be finalized
>>+	 * before anyone else
>>+	 */
>>+	if (!phb->ioda.root_pe_is_populated) {
>>+		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
>>+		if (pe && phb->ioda.root_pe_idx == IODA_INVALID_PE)
>>+			phb->ioda.root_pe_idx = pe->pe_number;
>>+			phb->ioda.root_pe_is_populated = true;
>>+		}
>
>
>This "}" should be 1 tab left. Or you lost one "{" after if() and its
>counterpart.
>

Good catch! 
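A minimal sketch of the braced form, assuming the intended fix is the second option above (add the missing "{" and its matching "}"), so that both assignments depend on the inner check. The fake_phb/fake_pe types and INVALID_PE are simplified, hypothetical stand-ins for struct pnv_phb, struct pnv_ioda_pe and IODA_INVALID_PE, and the PE is passed in rather than obtained from pnv_ioda_setup_bus_PE():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_PE (-1)	/* stand-in for IODA_INVALID_PE */

/* Simplified stand-ins for struct pnv_ioda_pe and the ioda part of pnv_phb */
struct fake_pe { int pe_number; };
struct fake_phb {
	int  root_pe_idx;
	bool root_pe_is_populated;
};

/*
 * Braced version: the index and the "populated" flag are only updated
 * together, when a PE exists and no root PE has been recorded yet.
 */
static void populate_root_pe(struct fake_phb *phb, const struct fake_pe *pe)
{
	if (!phb->root_pe_is_populated) {
		if (pe && phb->root_pe_idx == INVALID_PE) {
			phb->root_pe_idx = pe->pe_number;
			phb->root_pe_is_populated = true;
		}
	}
}
```

With this reading, root_pe_is_populated is only set once a root PE number has been recorded; the other option (moving the "}" one tab left) would instead set it unconditionally after the first attempt.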

>>+
>>+	/* Extend bridge's windows if necessary */
>>+	pnv_pci_fixup_bridge_resources(bus, type);
>>+
>>+	/* Don't assign PE to bus which doesn't have any
>>+	 * subordinate PCI devices.
>>+	 */
>>+	if (list_empty(&bus->devices))
>>+		return;
>>+
>>+	/* Reserve PEs for M64 resource */
>>+	if (phb->reserve_m64_pe)
>>+		phb->reserve_m64_pe(bus, NULL, all);
>>+
>>+	/* Assign PE. We might run here because of partial hotplug.
>>+	 * For the case, we just pick up the existing PE and should
>>+	 * not allocate resources again.
>>+	 */
>>+	pe = pnv_ioda_setup_bus_PE(bus, all);
>>+	if (!pe)
>>+		return;
>>+
>>+	/* Setup MMIO mapping */
>>+	pnv_ioda_setup_pe_seg(hose, pe);
>>+
>>+	/* Setup DMA */
>>+	switch (phb->type) {
>>+	case PNV_PHB_IODA1:
>>+		pnv_ioda1_setup_dma(phb, pe);
>>+		break;
>>+	case PNV_PHB_IODA2:
>>+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>+		break;
>>+	default:
>>+		pr_warn("%s: No DMA for PHB type %d\n",
>>+			__func__, phb->type);
>>+	}
>>+}
>>+
>>  #ifdef CONFIG_PCI_IOV
>>  static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>  						      int resno)
>>@@ -3147,6 +3190,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>  #endif
>>         .enable_device_hook = pnv_pci_enable_device_hook,
>>         .window_alignment = pnv_pci_window_alignment,
>>+	.setup_bridge = pnv_pci_setup_bridge,
>>         .reset_secondary_bus = pnv_pci_reset_secondary_bus,
>>         .dma_set_mask = pnv_pci_ioda_dma_set_mask,
>>         .shutdown = pnv_pci_ioda_shutdown,
>>@@ -3218,6 +3262,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>  	if (phb->regs == NULL)
>>  		pr_err("  Failed to map registers !\n");
>>
>>+	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>+
>>  	/* Initialize more IODA stuff */
>>  	phb->ioda.total_pe_num = 1;
>>  	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index e93a489..a160491 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -136,6 +136,7 @@ struct pnv_phb {
>>  			/* Global bridge info */
>>  			unsigned int		total_pe_num;
>>  			unsigned int		root_pe_idx;
>>+			bool			root_pe_is_populated;
>>  			unsigned int		reserved_pe_idx;
>>
>>  			/* 32-bit MMIO window */
>>
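As quoted, pnv_pci_fixup_bridge_resources() leaves w == NULL when none of the three window checks matches a bridge window for the requested @type, and then dereferences it. A standalone sketch of the selection with that case skipped: the RES_* bits, struct window and struct fake_hose are hypothetical stand-ins for the IORESOURCE_* flags, struct resource and the pci_controller windows, and is_mem_pref_64() assumes pnv_pci_is_mem_pref_64() tests for both the 64-bit and prefetchable bits.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel's IORESOURCE_* bits */
#define RES_IO       0x0100u
#define RES_MEM      0x0200u
#define RES_PREFETCH 0x2000u
#define RES_MEM_64   0x00100000u

struct window { unsigned long start, end; };

/* Simplified stand-in for the PHB's windows */
struct fake_hose {
	struct window io;	/* hose->io_resource      */
	struct window mem32;	/* hose->mem_resources[0] */
	struct window mem64;	/* hose->mem_resources[1] */
	unsigned long m64_segsize;
};

static int is_mem_pref_64(unsigned long flags)
{
	return (flags & (RES_MEM_64 | RES_PREFETCH)) ==
	       (RES_MEM_64 | RES_PREFETCH);
}

/* Returns the PHB window a bridge window should take, or NULL if none */
static const struct window *pick_window(const struct fake_hose *hose,
					unsigned long rflags,
					unsigned long type)
{
	if (rflags & type & RES_IO)
		return &hose->io;
	if (is_mem_pref_64(rflags) && (type & RES_PREFETCH) &&
	    hose->m64_segsize)
		return &hose->mem64;
	if (rflags & type & RES_MEM)
		return &hose->mem32;
	return NULL;
}

static void fixup_window(const struct fake_hose *hose, struct window *r,
			 unsigned long rflags, unsigned long type)
{
	const struct window *w = pick_window(hose, rflags, type);

	if (!w)		/* @type does not cover this window: leave it alone */
		return;
	r->start = w->start;
	r->end = w->end;
}
```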

Thanks,
Gavin


* Re: [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver
  2015-08-15  4:47     ` Gavin Shan
@ 2015-08-15  9:15       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-15  9:15 UTC
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/15/2015 02:47 PM, Gavin Shan wrote:
> On Sat, Aug 15, 2015 at 01:13:21PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> The patch adds a standalone driver to support PCI hotplug
>>> for the PowerPC PowerNV platform, which runs on top of skiboot firmware.
>>> The firmware identifies hotpluggable slots and marks their device
>>> tree nodes with the "ibm,slot-pluggable" and "ibm,reset-by-firmware" properties.
>>> The driver simply scans the device tree to create and register PCI hotplug
>>> slots accordingly.
>>>
>>> If the skiboot firmware doesn't support slot status retrieval, the PCI
>>> slot device node shouldn't have the "ibm,reset-by-firmware" property. In
>>> that case, no valid PCI slots will be detected from the device tree.
>>> The skiboot firmware doesn't export the capability to access attention
>>> LEDs yet; that is still TBD.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>> ---
>>>   MAINTAINERS                            |   6 +
>>>   drivers/pci/hotplug/Kconfig            |  12 +
>>>   drivers/pci/hotplug/Makefile           |   4 +
>>>   drivers/pci/hotplug/powernv_php.c      | 140 +++++++
>>>   drivers/pci/hotplug/powernv_php.h      |  92 +++++
>>>   drivers/pci/hotplug/powernv_php_slot.c | 722 +++++++++++++++++++++++++++++++++
>>>   6 files changed, 976 insertions(+)
>>>   create mode 100644 drivers/pci/hotplug/powernv_php.c
>>>   create mode 100644 drivers/pci/hotplug/powernv_php.h
>>>   create mode 100644 drivers/pci/hotplug/powernv_php_slot.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index fd60784..3b75c92 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -7747,6 +7747,12 @@ L:	linux-pci@vger.kernel.org
>>>   S:	Supported
>>>   F:	Documentation/PCI/pci-error-recovery.txt
>>>
>>> +PCI HOTPLUG DRIVER FOR POWERNV PLATFORM
>>> +M:	Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> +L:	linux-pci@vger.kernel.org
>>> +S:	Supported
>>> +F:	drivers/pci/hotplug/powernv_php*
>>> +
>>>   PCI SUBSYSTEM
>>>   M:	Bjorn Helgaas <bhelgaas@google.com>
>>>   L:	linux-pci@vger.kernel.org
>>> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>>> index df8caec..ef55dae 100644
>>> --- a/drivers/pci/hotplug/Kconfig
>>> +++ b/drivers/pci/hotplug/Kconfig
>>> @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
>>>
>>>   	  When in doubt, say N.
>>>
>>> +config HOTPLUG_PCI_POWERNV
>>> +	tristate "PowerPC PowerNV PCI Hotplug driver"
>>> +	depends on PPC_POWERNV && EEH
>>> +	help
>>> +	  Say Y here if you run PowerPC PowerNV platform that supports
>>> +	  PCI Hotplug
>>> +
>>> +	  To compile this driver as a module, choose M here: the
>>> +	  module will be called powernv-php.
>>> +
>>> +	  When in doubt, say N.
>>> +
>>>   config HOTPLUG_PCI_RPA
>>>   	tristate "RPA PCI Hotplug driver"
>>>   	depends on PPC_PSERIES && EEH
>>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>> index b616e75..fd51d65 100644
>>> --- a/drivers/pci/hotplug/Makefile
>>> +++ b/drivers/pci/hotplug/Makefile
>>> @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
>>> +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
>>>   obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
>>> @@ -50,6 +51,9 @@ ibmphp-objs		:=	ibmphp_core.o	\
>>>   acpiphp-objs		:=	acpiphp_core.o	\
>>>   				acpiphp_glue.o
>>>
>>> +powernv-php-objs	:=	powernv_php.o	\
>>> +				powernv_php_slot.o
>>> +
>>>   rpaphp-objs		:=	rpaphp_core.o	\
>>>   				rpaphp_pci.o	\
>>>   				rpaphp_slot.o
>>> diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
>>> new file mode 100644
>>> index 0000000..4cbff7a
>>> --- /dev/null
>>> +++ b/drivers/pci/hotplug/powernv_php.c
>>> @@ -0,0 +1,140 @@
>>> +/*
>>> + * PCI Hotplug Driver for PowerPC PowerNV platform.
>>> + *
>>> + * Copyright Gavin Shan, IBM Corporation 2015.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#include <linux/module.h>
>>> +
>>> +#include <asm/opal.h>
>>> +#include <asm/pnv-pci.h>
>>> +
>>> +#include "powernv_php.h"
>>> +
>>> +#define DRIVER_VERSION	"0.1"
>>> +#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
>>> +#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
>>
>>
>> Align all or none.
>>
>
> All of them are already aligned well. Please check your email setting.

Right. tabs. My bad :)



>>> +
>>> +static struct notifier_block php_msg_nb = {
>>> +	.notifier_call	= powernv_php_msg_handler,
>>> +	.next		= NULL,
>>> +	.priority	= 0,
>>> +};
>>> +
>>> +static int powernv_php_register_one(struct device_node *dn)
>>> +{
>>> +	struct powernv_php_slot *slot;
>>> +	const __be32 *prop32;
>>> +	int ret;
>>> +
>>> +	/* Check if it's hotpluggable slot */
>>> +	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
>>> +	if (!prop32 || !of_read_number(prop32, 1))
>>> +		return -ENXIO;
>>> +
>>> +	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
>>> +	if (!prop32 || !of_read_number(prop32, 1))
>>> +		return -ENXIO;
>>> +
>>> +	/* Allocate slot */
>>> +	slot = powernv_php_slot_alloc(dn);
>>> +	if (!slot)
>>> +		return -ENODEV;
>>> +
>>> +	/* Register it */
>>> +	ret = powernv_php_slot_register(slot);
>>> +	if (ret) {
>>> +		powernv_php_slot_put(slot);
>>> +		return ret;
>>> +	}
>>> +
>>> +	return powernv_php_slot_enable(slot->php_slot, false);
>>
>>
>> And if it fails, no unregister and cleanup is needed?
>>
>
> You're right that it needs to handle the failure cases. Will add something
> in the next revision.
>
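One way to cover the failure cases in powernv_php_register_one() is the usual goto unwind, where each error label undoes only the steps that already succeeded. A userspace sketch: slot_alloc(), slot_register(), slot_enable(), slot_unregister() and slot_put() are hypothetical stand-ins for powernv_php_slot_alloc(), powernv_php_slot_register(), powernv_php_slot_enable(), pci_hp_deregister() and the kref put, and the fail_* flags only exist to inject errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical slot; a plain counter stands in for struct kref */
struct slot {
	int  refcount;
	bool registered;
};

/* Error injection for the sketch only */
static bool fail_register, fail_enable;
static int slots_freed;

static struct slot *slot_alloc(void)
{
	struct slot *s = calloc(1, sizeof(*s));

	if (s)
		s->refcount = 1;
	return s;
}

static void slot_put(struct slot *s)
{
	if (--s->refcount == 0) {
		free(s);
		slots_freed++;
	}
}

static int slot_register(struct slot *s)
{
	if (fail_register)
		return -EBUSY;
	s->registered = true;
	return 0;
}

static void slot_unregister(struct slot *s)
{
	s->registered = false;
}

static int slot_enable(struct slot *s)
{
	(void)s;
	return fail_enable ? -EIO : 0;
}

/* Each label undoes exactly the steps that succeeded before the failure */
static int register_one(void)
{
	struct slot *s = slot_alloc();
	int ret;

	if (!s)
		return -ENOMEM;

	ret = slot_register(s);
	if (ret)
		goto put;

	ret = slot_enable(s);
	if (ret)
		goto unregister;

	return 0;	/* the registered slot keeps its reference */

unregister:
	slot_unregister(s);
put:
	slot_put(s);
	return ret;
}
```

In the real driver, whether pci_hp_deregister() already drops the slot reference through a release callback decides whether the put label is still needed after unregistering; the sketch assumes it does not.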
>>> +}
>>> +
>>> +int powernv_php_register(struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +	int ret = 0;
>>
>> @ret is not used below.
>>
>
> Ok. will remove it.
>
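With @ret gone, powernv_php_register() has no status to propagate and could simply return void. The ordering it relies on is a pre-order walk for registration (parent before children) and a post-order walk for unregistration (children before parent). A standalone sketch, with a hypothetical struct node in place of struct device_node and a visit log in place of the register/unregister calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device-tree node: a name and up to 4 children */
struct node {
	const char  *name;
	struct node *child[4];
	size_t       nr_children;
};

/* Records the visit order so it can be checked */
static const char *order[16];
static size_t nr_order;

static void visit(const struct node *n)
{
	order[nr_order++] = n->name;
}

/* Pre-order: a slot is registered before any slot beneath it */
static void register_tree(const struct node *dn)
{
	for (size_t i = 0; i < dn->nr_children; i++) {
		visit(dn->child[i]);		/* register_one(child) */
		register_tree(dn->child[i]);	/* then its sub-slots   */
	}
}

/* Post-order: child slots are removed before their parent */
static void unregister_tree(const struct node *dn)
{
	for (size_t i = 0; i < dn->nr_children; i++) {
		unregister_tree(dn->child[i]);
		visit(dn->child[i]);		/* unregister_one(child) */
	}
}
```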
>>> +
>>> +	/*
>>> +	 * The parent slots should be registered before their
>>> +	 * child slots.
>>> +	 */
>>> +	for_each_child_of_node(dn, child) {
>>> +		powernv_php_register_one(child);
>>> +		powernv_php_register(child);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void powernv_php_unregister_one(struct device_node *dn)
>>> +{
>>> +	struct powernv_php_slot *slot;
>>> +
>>> +	slot = powernv_php_slot_find(dn);
>>> +	if (!slot)
>>> +		return;
>>> +
>>> +	pci_hp_deregister(slot->php_slot);
>>> +}
>>> +
>>> +void powernv_php_unregister(struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +
>>> +	/* The child slots should go before their parent slots */
>>> +	for_each_child_of_node(dn, child) {
>>> +		powernv_php_unregister(child);
>>> +		powernv_php_unregister_one(child);
>>> +	}
>>> +}
>>> +
>>> +static int __init powernv_php_init(void)
>>> +{
>>> +	struct device_node *dn;
>>> +	int ret;
>>> +
>>> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>>> +
>>> +	/* Register hotplug message handler */
>>> +	ret = pnv_pci_hotplug_notifier_register(&php_msg_nb);
>>> +	if (ret) {
>>> +		pr_warn("%s: Error %d registering hotplug notifier\n",
>>> +			__func__, ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Scan PHB nodes and their children */
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>> +		powernv_php_register(dn);
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>> +		powernv_php_register(dn);
>>
>>
>> Maybe move pnv_pci_hotplug_notifier_register() after powernv_php_register()?
>> If not, then below (in powernv_php_exit()) move
>> pnv_pci_hotplug_notifier_unregister() to the end?
>>
>>
>
> Ok. I'll move pnv_pci_hotplug_notifier_unregister() to end of powernv_php_exit().


Sure? Just checking. If you do this, you will have a situation where a 
notifier is enabled but there is no slot. That does not seem dangerous, but 
it is not very accurate either. I'd rather stop events from coming first, and then 
do cleanup.
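The ordering suggested here is the usual LIFO pairing: whatever init sets up last, exit tears down first. A minimal sketch, where the four step functions are hypothetical stand-ins for powernv_php_register()/powernv_php_unregister() over the PHB nodes and pnv_pci_hotplug_notifier_register()/pnv_pci_hotplug_notifier_unregister():

```c
#include <assert.h>
#include <string.h>

/* Each step appends one letter so the order can be checked */
static char log_buf[16];
static size_t log_len;

static void step(char c)
{
	log_buf[log_len++] = c;
	log_buf[log_len] = '\0';
}

/* Hypothetical stand-ins for the driver's four steps */
static void register_slots(void)      { step('S'); }
static void unregister_slots(void)    { step('s'); }
static void register_notifier(void)   { step('N'); }
static void unregister_notifier(void) { step('n'); }

/* Slots exist before any hotplug message can arrive ... */
static void php_init(void)
{
	register_slots();
	register_notifier();
}

/* ... and messages stop before the slots they refer to go away */
static void php_exit(void)
{
	unregister_notifier();
	unregister_slots();
}
```

If registering the notifier fails in init, the slots registered just before it would also need to be unregistered before returning the error.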


>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void __exit powernv_php_exit(void)
>>> +{
>>> +	struct device_node *dn;
>>> +
>>> +	pnv_pci_hotplug_notifier_unregister(&php_msg_nb);
>>> +
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
>>> +		powernv_php_unregister(dn);
>>> +	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
>>> +		powernv_php_unregister(dn);
>>> +}
>>> +
>>> +module_init(powernv_php_init);
>>> +module_exit(powernv_php_exit);
>>> +
>>> +MODULE_VERSION(DRIVER_VERSION);
>>> +MODULE_LICENSE("GPL v2");
>>> +MODULE_AUTHOR(DRIVER_AUTHOR);
>>> +MODULE_DESCRIPTION(DRIVER_DESC);
>>> diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
>>> new file mode 100644
>>> index 0000000..8034cc6
>>> --- /dev/null
>>> +++ b/drivers/pci/hotplug/powernv_php.h
>>> @@ -0,0 +1,92 @@
>>> +/*
>>> + * PCI Hotplug Driver for PowerPC PowerNV platform.
>>> + *
>>> + * Copyright Gavin Shan, IBM Corporation 2015.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#ifndef _POWERNV_PHP_H
>>> +#define _POWERNV_PHP_H
>>> +
>>> +#include <linux/list.h>
>>> +#include <linux/kref.h>
>>> +#include <linux/of.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/pci_hotplug.h>
>>> +#include <linux/wait.h>
>>> +#include <linux/workqueue.h>
>>> +
>>> +#include <asm/opal-api.h>
>>> +
>>> +/* Slot power status */
>>> +#define POWERNV_PHP_SLOT_POWER_OFF	0
>>> +#define POWERNV_PHP_SLOT_POWER_ON	1
>>> +
>>> +/* Slot presence status */
>>> +#define POWERNV_PHP_SLOT_EMPTY		0
>>> +#define POWERNV_PHP_SLOT_PRESENT	1
>>
>> These two are also only used in drivers/pci/hotplug/powernv_php_slot.c,
>> move them there at least. It also seems your PHP driver is the only one which
>> uses flags for an adapter status, others return plain 0 or 1 (which are
>> C-style "false" and "true", pretty much, so this is not a case of magic
>> constants). Since you return these values from the hotplug_slot_ops callbacks
>> to external code, you should probably do the same.
>>
>> And exactly the same comment about POWERNV_PHP_SLOT_POWER_ON/OFF few lines
>> above.
>>
>
> All those macros are good enough

For starters, these macros are useless and should be removed; below is why ;)


> to be put in this header file since they
> are part of the slot's status.


They are part of the slot status. And the slot struct members should only 
be accessed by powernv_php_slot.c just because it is cleaner and easier to 
read/debug/support.


> Yes, they're only used in powernv_php_slot.c
> currently, but doesn't have to in future.

When/if this happens - you will move them to the header. But I seriously 
doubt this will ever happen.


> I don't see why I need change them
> to true/false, which just represents two states at most.

Because you use these macros to return status to pci_hotplug_core.c, which 
is common code. Everybody else returns 0 and 1; this is an established API, 
if you implement hotplug_slot_ops, you should use what the caller expects, 
which is precisely 0 or 1. Readers of this code expect to see 0 and 1 too. 
If there were some macros defined in include/linux/pci_hotplug.h, you would 
use them but there is none (which is not very nice and probably needs 
fixing by introducing them across all PHB drivers but not in this patchset 
for sure).


> Here, all states
> are represented by numeric values (as POWERNV_PHP_SLOT_ATTEN_* as below).


You have different sets of states - some you report to pci_hotplug_core.c 
and you should follow pci_hotplug_core.c's rules/expectations; some are 
for/from OPAL, these you should define. It is not like "define everything" 
or "define nothing" :)
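A sketch of the split described above, in standalone C: firmware-side states get named constants, while the callback reports plain 0 or 1 to its caller. fw_power_state and fw_get_power_state() are hypothetical stand-ins for the OPAL wrapper, and get_power_status() only mirrors the shape of the hotplug_slot_ops get_power_status(slot, u8 *value) callback.

```c
#include <assert.h>
#include <stdint.h>

/* Firmware-side states: these are the ones that deserve names */
enum fw_power_state { FW_POWER_OFF = 0, FW_POWER_ON = 1 };

/* Hypothetical firmware query; odd slot ids report "on" in this sketch */
static int fw_get_power_state(uint64_t slot_id, enum fw_power_state *state)
{
	*state = (slot_id & 1) ? FW_POWER_ON : FW_POWER_OFF;
	return 0;
}

/* Reports plain 0 (off) or 1 (on) through @value, as the caller expects */
static int get_power_status(uint64_t slot_id, uint8_t *value)
{
	enum fw_power_state state;
	int ret = fw_get_power_state(slot_id, &state);

	if (ret)
		return ret;
	*value = (state == FW_POWER_ON);
	return 0;
}
```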


>
>>
>>
>>> +
>>> +/* Slot attention status */
>>> +#define POWERNV_PHP_SLOT_ATTEN_OFF	0
>>> +#define POWERNV_PHP_SLOT_ATTEN_ON	1
>>> +#define POWERNV_PHP_SLOT_ATTEN_IND	2
>>> +#define POWERNV_PHP_SLOT_ATTEN_ACT	3
>>
>>
>> These should go to drivers/pci/hotplug/powernv_php_slot.c. Where are these
>> flags defined? Looks to me like there is a way to pass some status from the
>> userspace via sysfs to OPAL so only OPAL is supposed to recognize and handle
>> these. If so, these macros are missing "OPAL" in their names. I have one more
>> comment below about it.
>>
>
> I prefer keep them in this header file as explained above. Those macros are
> not connected/passed to OPAL directly. All OPAL calls have corresponding
> wrappers in arch/powerpc/platforms/powernv/pci.c.


aaaand below you agreed to remove them for now so I'll skip this comment :)


>>> +
>>> +struct powernv_php_slot {
>>> +	char			*name;
>>> +	struct device_node	*dn;
>>> +	struct pci_dev		*pdev;
>>> +	struct pci_bus		*bus;
>>> +	uint64_t		id;
>>> +	int			slot_no;
>>> +	struct kref		kref;
>>> +#define POWERNV_PHP_SLOT_STATE_INIT		0
>>> +#define POWERNV_PHP_SLOT_STATE_REGISTER		1
>>> +#define POWERNV_PHP_SLOT_STATE_POPULATED	2
>>> +	int			state;
>>> +	int			check_power_status;
>>> +	int			status_confirmed;
>>
>> s/status_confirmed/power_status_confirmed/
>>
>
> Maybe, I think "status_confirmed" is enough here.
>
>> What is this status? It can be 0, 1, 2 which seems to be
>> UNCONFIRMED/INPROGRESS/CONFIRMED (does not need PNV/IODA prefixes as it is
>> local to the powernv_php_slot.c file).
>>
>
> No, it only has two states: 0/1.

No it is not. slot_power_on_handler() puts "2" to local variable @status 
and then stores it into slot->status_confirmed:


static void slot_power_on_handler(struct powernv_php_slot *slot)
[...]

free_fdt:
        kfree(fdt);
        status = 2;
out:
        /* Confirm status change */
        slot->status_confirmed = status;
        wake_up_interruptible(&slot->queue);
}



> Again, it's not connected with any OPAL
> calls directly. No need to have "IODA_" prefix at all.

Sorry, I do not follow you here.
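Since status_confirmed takes three values (0 before the request, 1 on success, 2 from the error paths in both handlers), local names in powernv_php_slot.c would make that explicit. A sketch with hypothetical enumerator names; power_on_result() is an illustrative stand-in for whatever the waiter on slot->queue checks:

```c
#include <assert.h>

/* Hypothetical names for the three values status_confirmed takes */
enum slot_status {
	SLOT_STATUS_UNCONFIRMED = 0,	/* set before the firmware request */
	SLOT_STATUS_DONE        = 1,	/* handler finished successfully   */
	SLOT_STATUS_FAILED      = 2,	/* handler hit an error path       */
};

struct slot { enum slot_status status_confirmed; };

/* Mirrors the tail of slot_power_on_handler(): 1 on success, 2 on error */
static void power_on_handler(struct slot *s, int err)
{
	s->status_confirmed = err ? SLOT_STATUS_FAILED : SLOT_STATUS_DONE;
	/* the driver then calls wake_up_interruptible(&slot->queue) */
}

/* What a waiter would test after waking up */
static int power_on_result(const struct slot *s)
{
	switch (s->status_confirmed) {
	case SLOT_STATUS_DONE:
		return 0;
	case SLOT_STATUS_FAILED:
		return -1;
	default:
		return 1;	/* still waiting */
	}
}
```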


>
>>
>>> +	struct opal_msg		*msg;
>>> +	void			*fdt;
>>> +	void			*dt;
>>> +	struct of_changeset	ocs;
>>> +	struct work_struct	work;
>>> +	wait_queue_head_t	queue;
>>> +	struct hotplug_slot	*php_slot;
>>> +	struct powernv_php_slot	*parent;
>>> +	struct list_head	children;
>>> +	struct list_head	link;
>>> +};
>>
>> This should go to drivers/pci/hotplug/powernv_php_slot.c and this header
>> should only have a forward declaration. After you move it there, you get
>> better separation of the driver code from the slot code and only 2 changes
>> will be needed:
>>
>> 1. powernv_php_slot_enable() should receive powernv_php_slot
>> 2. add powernv_php_slot_unregister() (like powernv_php_slot_register()), this
>> way you will have pairing pci_hp_register/pci_hp_deregister in the same file.
>>
>>
>> After you move this struct to the source file, you could remove/shorten
>> POWERNV_PHP_SLOT_STATE_ prefixes if you wished.
>>
>
> I don't see why it should go to drivers/pci/hotplug/powernv_php_slot.c. It's
> the core data structure shared by multiple source files.


It is not shared - this is my point. The _only_ shared data is a php_slot 
pointer, the rest is never used outside of powernv_php_slot.c, and even this 
does not need to be shared; I explained how above.

powernv_php_slot.c handles a single slot. powernv_php.c handles all slots. 
That is the distinction and it is better to follow it while writing the 
code. It is always a good thing to restrict struct members visibility to a 
single file.

If you do not want to draw a precise line between these 2 files (which is 
easy in this specific case), just merge them into one.

btw #2 needs to be done in any case.
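The opaque-struct layout suggested above can be sketched in one standalone file: the first part is what a shared header would expose (a forward declaration and functions), the second is what only the slot source file would see. All names are hypothetical stand-ins for powernv_php_slot and its helpers; php_slot_register() and php_slot_unregister() mark where pci_hp_register() and pci_hp_deregister() would pair up in the same file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* ---- what the shared header would expose ---- */
struct php_slot;				/* opaque to the driver file */
struct php_slot *php_slot_alloc(int id);
int  php_slot_register(struct php_slot *slot);
void php_slot_unregister(struct php_slot *slot);
bool php_slot_is_registered(const struct php_slot *slot);

/* ---- what only the slot source file would see ---- */
struct php_slot {
	int  id;
	bool registered;
};

struct php_slot *php_slot_alloc(int id)
{
	struct php_slot *slot = calloc(1, sizeof(*slot));

	if (slot)
		slot->id = id;
	return slot;
}

int php_slot_register(struct php_slot *slot)
{
	slot->registered = true;		/* pci_hp_register() here */
	return 0;
}

void php_slot_unregister(struct php_slot *slot)
{
	slot->registered = false;		/* pci_hp_deregister() here */
}

bool php_slot_is_registered(const struct php_slot *slot)
{
	return slot->registered;
}
```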



>>> +
>>> +int powernv_php_msg_handler(struct notifier_block *nb,
>>> +			    unsigned long type, void *message);
>>> +struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
>>> +void powernv_php_slot_free(struct kref *kref);
>>> +struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
>>> +int powernv_php_slot_register(struct powernv_php_slot *slot);
>>> +int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan);
>>> +int powernv_php_register(struct device_node *dn);
>>> +void powernv_php_unregister(struct device_node *dn);
>>> +
>>> +#define to_powernv_php_slot(kref) \
>>> +	container_of(kref, struct powernv_php_slot, kref)
>>> +
>>> +static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
>>> +{
>>> +	if (slot)
>>> +		kref_get(&slot->kref);
>>> +}
>>> +
>>> +static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
>>> +{
>>> +	if (slot)
>>> +		return kref_put(&slot->kref, powernv_php_slot_free);
>>> +
>>> +	return 0;
>>> +}
>>
>> In these 2 helpers you do not have to check for @slot - it is checked in the
>> callers pretty much always. Or it is not checked in php_slot_release() but
>> dereferenced before you call powernv_php_slot_put(slot).
>>
>> The only place you really want this check is
>> powernv_php_slot_put(slot->parent) so just check it there. btw is it even
>> possible for the slot not to have a parent?
>>
>
> In most cases, a slot doesn't have a parent. That's why I had the check here.
> Yes, if you really want me to drop the check, then I have to check the slot's
> validity in the callers.

You already do this in all places except one. If you did not, you 
could argue that the check is useful, but you already check (except in one case).

> Also, those helpers are inline functions, so there
> isn't much difference actually.

Harder to read.

>> So I'd ditch these helpers.
>>
>
> I don't see the reason why these helpers need to be dropped. To use kref_{get,put}
> directly?

Yes.
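Using the kref API directly, with the NULL check only where a slot may really lack a parent, could look like the following. To stay runnable in userspace, the sketch defines a minimal single-threaded model of kref_init()/kref_get()/kref_put() (not the kernel's implementation, which uses atomic operations), and struct slot is a hypothetical reduction of powernv_php_slot that records its release instead of freeing.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal single-threaded model of struct kref, for illustration only */
struct kref { int refcount; };

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k)  { k->refcount++; }

/* Like the kernel's kref_put(): returns 1 if @release was called */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical reduction of powernv_php_slot */
struct slot {
	struct kref  kref;
	struct slot *parent;	/* NULL for a top-level slot */
	int          released;
};

#define to_slot(k) ((struct slot *)((char *)(k) - offsetof(struct slot, kref)))

static void slot_release(struct kref *k)
{
	struct slot *slot = to_slot(k);

	slot->released = 1;	/* the driver would free the slot here */

	/* The one place where a NULL check is really needed */
	if (slot->parent)
		kref_put(&slot->parent->kref, slot_release);
}
```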


>
>>> +
>>> +#endif /* !_POWERNV_PHP_H */
>>> diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
>>> new file mode 100644
>>> index 0000000..73a93a2
>>> --- /dev/null
>>> +++ b/drivers/pci/hotplug/powernv_php_slot.c
>>> @@ -0,0 +1,722 @@
>>> +/*
>>> + * PCI Hotplug Driver for PowerPC PowerNV platform.
>>> + *
>>> + * Copyright Gavin Shan, IBM Corporation 2015.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#include <linux/module.h>
>>> +
>>> +#include <asm/opal.h>
>>> +#include <asm/pnv-pci.h>
>>> +#include <asm/ppc-pci.h>
>>> +
>>> +#include "powernv_php.h"
>>> +
>>> +static LIST_HEAD(php_slot_list);
>>> +static DEFINE_SPINLOCK(php_slot_lock);
>>> +
>>> +/*
>>> + * Remove firmware data for all child device nodes of the
>>> + * indicated one.
>>> + */
>>> +static void remove_child_pdn(struct device_node *np)
>>> +{
>>> +	struct device_node *child;
>>> +
>>> +	for_each_child_of_node(np, child) {
>>> +		/* In depth first */
>>> +		remove_child_pdn(child);
>>> +
>>> +		remove_pci_device_node_info(child);
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * Remove all subordinate device nodes of the indicated one.
>>> + * Those device nodes in deepest path should be released firstly.
>>> + */
>>> +static int remove_child_device_nodes(struct device_node *parent)
>>> +{
>>> +	struct device_node *np, *child;
>>> +	int ret = 0;
>>> +
>>> +	/* If the device node has children, remove them firstly */
>>> +	for_each_child_of_node(parent, np) {
>>> +		ret = remove_child_device_nodes(np);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		/* The device shouldn't have alive children */
>>> +		child = of_get_next_child(np, NULL);
>>> +		if (child) {
>>> +			of_node_put(child);
>>> +			of_node_put(np);
>>> +			pr_err("%s: Alive children of node <%s>\n",
>>> +			       __func__, of_node_full_name(np));
>>> +			return -EBUSY;
>>> +		}
>>> +
>>> +		/* Detach the device node */
>>> +		of_detach_node(np);
>>> +		of_node_put(np);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * The function processes the message sent by firmware
>>> + * to remove all device tree nodes beneath the slot's
>>> + * nodes, and the associated auxiliary data.
>>> + */
>>> +static void slot_power_off_handler(struct powernv_php_slot *slot)
>>> +{
>>> +	int ret, status = 1;
>>> +
>>> +	/* Release the firmware data for the child device nodes */
>>> +	remove_child_pdn(slot->dn);
>>> +
>>> +	/*
>>> +	 * Release the child device nodes. If the sub-tree was
>>> +	 * built with the help of changeset, we just need destroy
>>> +	 * the changes.
>>> +	 */
>>> +	if (slot->fdt) {
>>> +		of_changeset_destroy(&slot->ocs);
>>> +		kfree(slot->dt);
>>> +		slot->dt = NULL;
>>> +		slot->dn->child = NULL;
>>> +		kfree(slot->fdt);
>>> +		slot->fdt = NULL;
>>> +	} else {
>>> +		ret = remove_child_device_nodes(slot->dn);
>>> +		if (ret) {
>>> +			status = 2;
>>> +			dev_warn(&slot->pdev->dev, "Error %d freeing nodes\n",
>>> +				 ret);
>>> +		}
>>> +	}
>>> +
>>> +	/* Confirm status change */
>>> +	slot->status_confirmed = status;
>>> +	wake_up_interruptible(&slot->queue);
>>> +}
>>> +
>>> +static int slot_populate_changeset(struct of_changeset *ocs,
>>> +				    struct device_node *dn)
>>> +{
>>> +	struct device_node *child;
>>> +	int ret = 0;
>>> +
>>> +	for_each_child_of_node(dn, child) {
>>> +		ret = of_changeset_attach_node(ocs, child);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		ret = slot_populate_changeset(ocs, child);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void slot_power_on_handler(struct powernv_php_slot *slot)
>>> +{
>>> +	void *fdt, *dt;
>>> +	uint64_t len;
>>> +	int ret, status = 1;
>>> +
>>> +	/* We don't know the FDT blob size. It tries with incremental
>>> +	 * sized memory chunk.
>>> +	 */
>>
>>
>> What is the real expected size? 0x10000 is just 64K, just allocate it and
>> that's it.
>>
>
> You cannot know the size in advance. It depends on the size of the FDT
> blob to be transferred. So the size has to be probed.

Maybe this is common practice with OPAL or Linux; I do not know.

Usually when people implement an API call which returns a blob and a size, it 
also has a way to signal that the buffer is not big enough and say what 
size is enough.

And then the client would call it with blob=NULL (or a small buffer, like a 
minimum device tree header, which may already contain the size; I do not know the details 
so I cannot tell exactly). A positive return value would be the required size: 
if it is bigger than the buffer size, you increase the buffer size, 
reallocate the blob and call again; if it is equal or less, then you have 
your whole tree already.

I am telling you all this because I cannot see OPAL_GET_DEVICE_TREE in 
https://github.com/open-power/hostboot so I assume it has not been pushed 
there yet and therefore can be fixed.
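A sketch of the query-then-fetch pattern described above, in standalone C. fw_get_device_tree() is a hypothetical firmware call that always returns the size the blob needs and only copies it when the buffer is large enough; it is not the existing pnv_pci_get_device_tree() interface. Besides avoiding the probing loop, this shape also sidesteps a gap in the loop as quoted: if kzalloc() fails, the loop breaks with len still <= 0x10000, so the len > 0x10000 check is skipped and a NULL fdt would reach of_fdt_unflatten_tree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pretend firmware blob; in reality this would be the slot's FDT */
static const char fw_blob[] = "pretend-fdt-blob-contents";

/*
 * Hypothetical firmware call: copies the blob if @len is large enough
 * and always returns the size the blob needs.
 */
static size_t fw_get_device_tree(void *buf, size_t len)
{
	if (buf && len >= sizeof(fw_blob))
		memcpy(buf, fw_blob, sizeof(fw_blob));
	return sizeof(fw_blob);
}

/* Ask for the size first, then fetch; retry if the blob grew meanwhile */
static void *fetch_blob(size_t *out_len)
{
	size_t need = fw_get_device_tree(NULL, 0);

	for (;;) {
		void *buf = malloc(need);
		size_t got;

		if (!buf)
			return NULL;	/* allocation failure is reported, not ignored */
		got = fw_get_device_tree(buf, need);
		if (got <= need) {
			*out_len = got;
			return buf;
		}
		free(buf);
		need = got;
	}
}
```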



>
>>> +	for (len = 0x2000; len <= 0x10000; len += 0x2000) {
>>> +		fdt = kzalloc(len, GFP_KERNEL);
>>> +		if (!fdt)
>>> +			break;
>>> +
>>> +		ret = pnv_pci_get_device_tree(slot->dn->phandle, fdt, len);
>>> +		if (!ret)
>>> +			break;
>>> +
>>> +		kfree(fdt);
>>> +	}
>>> +
>>> +	if (len > 0x10000) {
>>> +		dev_warn(&slot->pdev->dev, "Cannot alloc FDT blob\n");
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Unflatten device tree blob */
>>> +	dt = of_fdt_unflatten_tree(fdt, slot->dn, NULL);
>>> +	if (!dt) {
>>> +		dev_warn(&slot->pdev->dev, "Cannot unflatten FDT\n");
>>> +		goto free_fdt;
>>> +	}
>>
>> Right here you could kfree(fdt) and not cache it in the slot struct at all.
>> You do not use it later anyway.
>>
>
> The "fdt" has been cached at end of this function and it can't be released
> because the unflattened device-tree is still using data in the FDT blob (@fdt).

Oh. Did not know that. Ok.


>
>>
>>> +
>>> +	/* Initialize and apply the changeset */
>>> +	of_changeset_init(&slot->ocs);
>>> +	ret = slot_populate_changeset(&slot->ocs, slot->dn);
>>> +	if (ret) {
>>> +		dev_warn(&slot->pdev->dev, "Error %d populating changeset\n",
>>> +			 ret);
>>> +		goto free_dt;
>>> +	}
>>> +
>>> +	slot->dn->child = NULL;
>>> +	ret = of_changeset_apply(&slot->ocs);
>>> +	if (ret) {
>>> +		dev_warn(&slot->pdev->dev, "Error %d applying changeset\n",
>>> +			 ret);
>>> +		goto destroy_changeset;
>>> +	}
>>> +
>>> +	/* Add device node firmware data */
>>> +	traverse_pci_device_nodes(slot->dn,
>>> +				  add_pci_device_node_info,
>>> +				  pci_bus_to_host(slot->bus));
>>> +	slot->fdt = fdt;
>>> +	slot->dt = dt;
>>> +	goto out;
>>> +
>>> +destroy_changeset:
>>> +	of_changeset_destroy(&slot->ocs);
>>> +free_dt:
>>> +	kfree(dt);
>>> +	slot->dn->child = NULL;
>>
>> Can of_fdt_unflatten_tree() or of_changeset_init() or
>> slot_populate_changeset() initialize dn->child? No kfree(slot->dn->child)?
>>
>
> of_fdt_unflatten_tree() did. @dt is the unflattened device-tree here.
> No need to free slot->dn->child.

Ok, good.


>
>>> +free_fdt:
>>> +	kfree(fdt);
>>> +	status = 2;
>>> +out:
>>> +	/* Confirm status change */
>>> +	slot->status_confirmed = status;
>>> +	wake_up_interruptible(&slot->queue);
>>> +}
>>> +
>>> +static void powernv_php_slot_work(struct work_struct *data)
>>> +{
>>> +	struct powernv_php_slot *slot = container_of(data,
>>> +						     struct powernv_php_slot,
>>> +						     work);
>>> +	uint64_t php_event = be64_to_cpu(slot->msg->params[0]);
>>> +
>>> +	switch (php_event) {
>>> +	case 0: /* Slot power off */
>>> +		slot_power_off_handler(slot);
>>> +		break;
>>> +	case 1: /* Slot power on */
>>> +		slot_power_on_handler(slot);
>>> +		break;
>>
>> These 0 and 1 are not the same 0 and 1 used for @val in get_power_status(),
>> these are from OPAL so please define them.
>>
>
> Good point. I'll have following macros for them in opal-api.h:
>
> #define OPAL_PCI_PHP_EVENT_POWER_OFF	0
> #define OPAL_PCI_PHP_EVENT_POWER_ON	1
>
> Or to use "enum" type.
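With the enum variant, the dispatch in powernv_php_slot_work() reads as below. The enumerator names are the ones proposed above; the enum tag and the counters standing in for the two handlers and the dev_warn() path are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Enumerator names as proposed above; the enum tag is hypothetical */
enum opal_pci_php_event {
	OPAL_PCI_PHP_EVENT_POWER_OFF = 0,
	OPAL_PCI_PHP_EVENT_POWER_ON  = 1,
};

/* Counters standing in for the two handlers and the dev_warn() path */
static int nr_power_off, nr_power_on, nr_unsupported;

static void handle_php_event(uint64_t php_event)
{
	switch (php_event) {
	case OPAL_PCI_PHP_EVENT_POWER_OFF:
		nr_power_off++;		/* slot_power_off_handler(slot) */
		break;
	case OPAL_PCI_PHP_EVENT_POWER_ON:
		nr_power_on++;		/* slot_power_on_handler(slot) */
		break;
	default:
		nr_unsupported++;	/* "Unsupported hotplug event" */
		break;
	}
}
```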
>
>>
>>> +	default:
>>> +		dev_warn(&slot->pdev->dev, "Unsupported hotplug event %lld\n",
>>> +			 php_event);
>>> +	}
>>> +
>>> +	of_node_put(slot->dn);
>>> +}
>>> +
>>> +int powernv_php_msg_handler(struct notifier_block *nb,
>>> +			    unsigned long type, void *message)
>>> +{
>>> +	phandle h;
>>> +	struct device_node *np;
>>> +	struct powernv_php_slot *slot;
>>> +	struct opal_msg *msg = message;
>>> +
>>> +	/* Check the message type */
>>> +	if (type != OPAL_MSG_PCI_HOTPLUG) {
>>> +		pr_warn("%s: Wrong message type %ld received!\n",
>>> +			__func__, type);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	/* Find the device node */
>>> +	h = (phandle)be64_to_cpu(msg->params[1]);
>>> +	np = of_find_node_by_phandle(h);
>>> +	if (!np) {
>>> +		pr_warn("%s: No device node for phandle 0x%08x\n",
>>> +			__func__, h);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	/* Find the slot */
>>> +	slot = powernv_php_slot_find(np);
>>> +	if (!slot) {
>>> +		pr_warn("%s: No slot found for node <%s>\n",
>>> +			__func__, of_node_full_name(np));
>>> +		of_node_put(np);
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	/* Schedule the work */
>>> +	slot->msg = msg;
>>> +	schedule_work(&slot->work);
>>> +	return NOTIFY_OK;
>>> +}
>>
>> This function belongs to drivers/pci/hotplug/powernv_php.c (searching a slot
>> is powernv_php.c's scope) except these lines:
>>
>>> +	/* Schedule the work */
>>> +	slot->msg = msg;
>>> +	schedule_work(&slot->work);
>>
>> These 3 lines should be in a helper in drivers/pci/hotplug/powernv_php_slot.c.
>>
>
> There is no point in dropping one helper only to add another. One thing I
> kept in mind when writing the code: all hotplug logic (interacting with the
> OPAL firmware) is implemented in powernv_php_slot.c, and powernv_php.c
> just connects the helper functions with events.


As I said, powernv_php_slot.c handles a single slot and powernv_php.c
handles all slots, so it makes sense to keep all operations which do not
access slot data in powernv_php.c and move operations which access slot
internal data to powernv_php_slot.c.
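The split Alexey describes can be sketched in isolation as follows. This is a userspace model, not the patch: `struct php_msg`, `struct php_slot`, `php_slot_find()`, `php_slot_queue_msg()` and the `enum opal_pci_php_event` name are hypothetical stand-ins, and only the constant names `OPAL_PCI_PHP_EVENT_POWER_OFF`/`_ON` come from the thread. The "powernv_php.c" side only looks the slot up; the one function that writes slot internals lives on the "powernv_php_slot.c" side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Event codes proposed in the thread for opal-api.h, written as an enum. */
enum opal_pci_php_event {
	OPAL_PCI_PHP_EVENT_POWER_OFF = 0,
	OPAL_PCI_PHP_EVENT_POWER_ON  = 1,
};

/* Userspace stand-ins with the same values as the kernel notifier codes. */
#define NOTIFY_DONE	0x0000
#define NOTIFY_OK	0x0001

/* Hypothetical, simplified message and slot types. */
struct php_msg {
	uint64_t event;		/* params[0] of the real struct opal_msg */
	uint32_t phandle;	/* params[1] of the real struct opal_msg */
};

struct php_slot {
	uint32_t phandle;
	const struct php_msg *msg;
	int work_pending;	/* stands in for the queued work item */
};

/* "powernv_php_slot.c" side: the only code that writes slot internals. */
static void php_slot_queue_msg(struct php_slot *slot,
			       const struct php_msg *msg)
{
	slot->msg = msg;
	slot->work_pending = 1;	/* the kernel would call schedule_work() */
}

/* Also slot-side: decode the OPAL event with named constants. */
static const char *php_slot_event_name(uint64_t event)
{
	switch (event) {
	case OPAL_PCI_PHP_EVENT_POWER_OFF:
		return "power-off";
	case OPAL_PCI_PHP_EVENT_POWER_ON:
		return "power-on";
	default:
		return "unsupported";
	}
}

/* "powernv_php.c" side: search all slots, never touch their fields. */
static struct php_slot *php_slot_find(struct php_slot *slots, size_t n,
				      uint32_t phandle)
{
	for (size_t i = 0; i < n; i++)
		if (slots[i].phandle == phandle)
			return &slots[i];
	return NULL;
}

static int php_msg_handler(struct php_slot *slots, size_t n,
			   const struct php_msg *msg)
{
	struct php_slot *slot = php_slot_find(slots, n, msg->phandle);

	if (!slot)
		return NOTIFY_DONE;
	php_slot_queue_msg(slot, msg);
	return NOTIFY_OK;
}
```

With this layout the notifier in powernv_php.c stays the same size, and any later change to how a slot queues work touches only the per-slot file.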


>>> +
>>> +static int set_power_status(struct hotplug_slot *php_slot, u8 val)
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +	int ret;
>>> +
>>> +	/* Set power status */
>>> +	slot->status_confirmed = 0;
>>> +	ret = pnv_pci_set_power_status(slot->id, val);
>>> +	if (ret) {
>>> +		dev_warn(&slot->pdev->dev, "Error %d powering %s slot\n",
>>> +			 ret, val ? "on" : "off");
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Continue to PCI probing after finalized device-tree. The
>>> +	 * device-tree might have been updated completely at this
>>> +	 * point. Thus we don't have to always waiting for that.
>>> +	 */
>>> +	if (slot->status_confirmed == 1)
>>> +		return 0;
>>> +	else if (slot->status_confirmed > 0)
>>> +		return -EBUSY;
>>> +
>>> +	ret = wait_event_timeout(slot->queue, slot->status_confirmed, 10 * HZ);
>>> +	if (!ret) {
>>> +		dev_warn(&slot->pdev->dev, "Error %d waiting for power-%s\n",
>>> +			 ret, val ? "on" : "off");
>>> +		return -EBUSY;
>>> +	}
>>> +
>>> +	/* Check the result */
>>> +	if (slot->status_confirmed == 1)
>>> +		return 0;
>>> +
>>> +	dev_warn(&slot->pdev->dev, "Error status %d for power-%s\n",
>>> +		 slot->status_confirmed, val ? "on" : "off");
>>> +	return -EBUSY;
>>> +}
>>> +
>>> +static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +	uint8_t state;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Retrieve power status from firmware. If we fail
>>> +	 * getting that, the power status fails back to
>>> +	 * be on.
>>> +	 */
>>> +	ret = pnv_pci_get_power_status(slot->id, &state);
>>> +	if (ret) {
>>> +		*val = POWERNV_PHP_SLOT_POWER_ON;
>>> +		dev_warn(&slot->pdev->dev, "Error %d getting power status\n",
>>> +			 ret);
>>> +	} else {
>>> +		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
>>> +			       POWERNV_PHP_SLOT_POWER_OFF;
>>> +		php_slot->info->power_status = *val;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +	uint8_t state;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Retrieve presence status from firmware. If we can't
>>> +	 * get that, it will fail back to be empty.
>>> +	 */
>>> +	ret = pnv_pci_get_presence_status(slot->id, &state);
>>> +	if (ret >= 0) {
>>> +		ret = 0;
>>> +		*val = state ? POWERNV_PHP_SLOT_PRESENT :
>>> +			       POWERNV_PHP_SLOT_EMPTY;
>>> +		php_slot->info->adapter_status = *val;
>>> +		ret = 0;
>>> +	} else {
>>> +		*val = POWERNV_PHP_SLOT_EMPTY;
>>> +		dev_warn(&slot->pdev->dev, "Error %d getting presence\n",
>>> +			 ret);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +
>>> +	/* The default operation would to turn on the attention */
>>> +	switch (val) {
>>> +	case POWERNV_PHP_SLOT_ATTEN_OFF:
>>> +	case POWERNV_PHP_SLOT_ATTEN_ON:
>>> +	case POWERNV_PHP_SLOT_ATTEN_IND:
>>> +	case POWERNV_PHP_SLOT_ATTEN_ACT:
>>> +		break;
>>> +	default:
>>> +		dev_warn(&slot->pdev->dev, "Invalid attention %d\n", val);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	/* FIXME: Make it real once firmware supports it */
>>> +	php_slot->info->attention_status = val;
>>
>> Since firmware does not have an idea about these POWERNV_PHP_SLOT_ATTEN_xxx,
>> just remove them. Later when the firmware will know about them, we will have
>> to change this code anyway and by that time, the set of states may have
>> changed.
>>
>
> Sure, will do.
>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan)
>>
>>
>> This should receive powernv_php_slot as described above.
>>
>
> Good point, I'll change accordingly.
>
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +	uint8_t presence, power_status;
>>> +	int ret;
>>> +
>>> +	/* Check if the slot has been configured */
>>> +	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
>>> +		return 0;
>>> +
>>> +	/* Retrieve slot presence status */
>>> +	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* Proceed if there have nothing behind the slot */
>>> +	if (presence == POWERNV_PHP_SLOT_EMPTY)
>>> +		goto scan;
>>> +
>>> +	/*
>>> +	 * If we don't detect something behind the slot, we need
>>> +	 * make sure the power suply to the slot is on. Otherwise,
>>> +	 * the slot downstream PCIe linkturn should be down.
>>> +	 *
>>> +	 * On the first time, we don't change the power status to
>>> +	 * boost system boot with assumption that the firmware
>>> +	 * supplies consistent slot power status: empty slot always
>>> +	 * has its power off and non-empty slot has its power on.
>>> +	 */
>>> +	if (!slot->check_power_status) {
>>> +		slot->check_power_status = 1;
>>> +		goto scan;
>>> +	}
>>> +
>>> +	/* Check the power status. Scan the slot if that's already on */
>>> +	ret = php_slot->ops->get_power_status(php_slot, &power_status);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (power_status == POWERNV_PHP_SLOT_POWER_ON)
>>> +		goto scan;
>>> +
>>> +	/* Power is off, turn it on and then scan the slot */
>>> +	ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_ON);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +scan:
>>> +	switch (presence) {
>>> +	case POWERNV_PHP_SLOT_PRESENT:
>>> +		if (rescan) {
>>> +			pci_lock_rescan_remove();
>>> +			pci_add_pci_devices(slot->bus);
>>> +			pci_unlock_rescan_remove();
>>> +		}
>>> +
>>> +		/* Rescan for child hotpluggable slots */
>>> +		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
>>> +		if (rescan)
>>> +			powernv_php_register(slot->dn);
>>> +		break;
>>> +	case POWERNV_PHP_SLOT_EMPTY:
>>> +		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
>>> +		break;
>>> +	default:
>>> +		dev_warn(&slot->pdev->dev, "Invalid presence status %d\n",
>>> +			 presence);
>>> +		return -EINVAL;
>>
>> The PHP driver will never see a presence value other than 0 or 1, so this
>> switch() is just a simple if(presence){}else{}.
>>
>
> Ok. Will use "if () {} else {}" then.
>
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int enable_slot(struct hotplug_slot *php_slot)
>>> +{
>>> +	return powernv_php_slot_enable(php_slot, true);
>>> +}
>>> +
>>> +static int disable_slot(struct hotplug_slot *php_slot)
>>> +{
>>> +	struct powernv_php_slot *slot = php_slot->private;
>>> +	uint8_t power_status;
>>> +	int ret;
>>> +
>>> +	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
>>> +		return 0;
>>> +
>>> +	/* Remove all devices behind the slot */
>>> +	pci_lock_rescan_remove();
>>> +	pci_remove_pci_devices(slot->bus);
>>> +	pci_unlock_rescan_remove();
>>> +
>>> +	/* Detach the child hotpluggable slots */
>>> +	powernv_php_unregister(slot->dn);
>>> +
>>> +	/*
>>> +	 * Check the power status and turn it off if necessary. If we
>>> +	 * fail to get the power status, the power will be forced to
>>> +	 * be off.
>>> +	 */
>>> +	ret = php_slot->ops->get_power_status(php_slot, &power_status);
>>> +	if (ret || power_status == POWERNV_PHP_SLOT_POWER_ON) {
>>> +		ret = set_power_status(php_slot, POWERNV_PHP_SLOT_POWER_OFF);
>>> +		if (ret)
>>> +			dev_warn(&slot->pdev->dev, "Error %d powering off\n",
>>> +				 ret);
>>> +	}
>>> +
>>> +	/* Update slot state */
>>> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
>>> +	return 0;
>>> +}
>>> +
>>> +static struct hotplug_slot_ops php_slot_ops = {
>>> +	.get_power_status	= get_power_status,
>>> +	.get_adapter_status	= get_adapter_status,
>>> +	.set_attention_status	= set_attention_status,
>>> +	.enable_slot		= enable_slot,
>>> +	.disable_slot		= disable_slot,
>>> +};
>>> +
>>> +static struct powernv_php_slot *php_slot_match(struct device_node *dn,
>>> +					       struct powernv_php_slot *slot)
>>> +{
>>> +	struct powernv_php_slot *target, *tmp;
>>> +
>>> +	if (slot->dn == dn)
>>> +		return slot;
>>> +
>>> +	list_for_each_entry(tmp, &slot->children, link) {
>>> +		target = php_slot_match(dn, tmp);
>>> +		if (target)
>>> +			return target;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
>>> +{
>>> +	struct powernv_php_slot *slot, *tmp;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&php_slot_lock, flags);
>>> +	list_for_each_entry(tmp, &php_slot_list, link) {
>>> +		slot = php_slot_match(dn, tmp);
>>> +		if (slot) {
>>> +			spin_unlock_irqrestore(&php_slot_lock, flags);
>>> +			return slot;
>>> +		}
>>> +	}
>>> +	spin_unlock_irqrestore(&php_slot_lock, flags);
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +void powernv_php_slot_free(struct kref *kref)
>>> +{
>>> +	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
>>> +
>>> +	WARN_ON(!list_empty(&slot->children));
>>> +	kfree(slot->name);
>>> +	kfree(slot);
>>> +}
>>> +
>>> +static void php_slot_release(struct hotplug_slot *hp_slot)
>>> +{
>>> +	struct powernv_php_slot *slot = hp_slot->private;
>>> +	unsigned long flags;
>>> +
>>> +	/* Remove from global or child list */
>>> +	spin_lock_irqsave(&php_slot_lock, flags);
>>> +	list_del(&slot->link);
>>> +	spin_unlock_irqrestore(&php_slot_lock, flags);
>>
>>
>> This is a good example where RCU rules. powernv_php_slot_find() returns slot
>> pointer and its use is not protected by spin_lock -> dangerous.
>>
>> Remove spin_lock(), s/list_del/list_del_rcu/, and move bits below to
>> call_rcu(), and s/list_for_each_entry/list_for_each_entry_rcu/.
>>
>
> Ok. It sounds like an RCU list would be better. I'll refactor it to use an RCU list.
>
> I think the spinlock is still needed when adding a node to the RCU linked list. If so,
> the spinlock shouldn't be removed?

Yes, spin_lock() is still needed when adding.
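Why the lock survives on the add side can be shown with a small userspace analog. This is only the "publish" half of the RCU pattern, approximated with C11 atomics rather than the kernel's `rcu_assign_pointer()`/`list_add_rcu()`; the list is a hypothetical singly linked prepend-only list, and removal (which in the kernel needs `list_del_rcu()` plus a grace period via `call_rcu()`) is deliberately not modeled. Readers walk the list without any lock; writers still serialize on a spinlock, because two concurrent adders could otherwise both read the same old head and one insertion would be lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type standing in for struct powernv_php_slot. */
struct slot_node {
	int id;
	struct slot_node *_Atomic next;
};

static struct slot_node *_Atomic slot_head;		/* list head */
static atomic_flag slot_lock = ATOMIC_FLAG_INIT;	/* writers only */

static void slot_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock,
						 memory_order_acquire))
		;	/* spin, like spin_lock() */
}

static void slot_lock_release(void)
{
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/* Writer: the lock serializes adders against each other. */
static void slot_add(struct slot_node *n)
{
	slot_lock_acquire();
	/* Fully initialize the node before it becomes reachable. */
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&slot_head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	/* Publish with release ordering, like rcu_assign_pointer(). */
	atomic_store_explicit(&slot_head, n, memory_order_release);
	slot_lock_release();
}

/* Reader: no lock taken, like list_for_each_entry_rcu(). */
static struct slot_node *slot_find(int id)
{
	struct slot_node *n = atomic_load_explicit(&slot_head,
						   memory_order_acquire);

	for (; n; n = atomic_load_explicit(&n->next, memory_order_acquire))
		if (n->id == id)
			return n;
	return NULL;
}
```

In the kernel the reader side would additionally run under `rcu_read_lock()`, which is what lets a later `call_rcu()` free a removed slot only after all such readers are gone.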


>
>>
>>> +
>>> +	/* Detach from parent */
>>> +	powernv_php_slot_put(slot);
>>> +	powernv_php_slot_put(slot->parent);
>>> +}
>>> +
>>> +static bool php_slot_get_id(struct device_node *dn,
>>> +			    uint64_t *id)
>>> +{
>>> +	struct device_node *parent = dn;
>>> +	const __be64 *prop64;
>>> +	const __be32 *prop32;
>>> +
>>> +	/*
>>> +	 * The hotpluggable slot always has a compound Id, which
>>> +	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
>>> +	 * number, and compound indicator
>>> +	 */
>>> +	*id = (0x1ul << 63);
>>> +
>>> +	/* Bus/Slot/Function number */
>>> +	prop32 = of_get_property(dn, "reg", NULL);
>>> +	if (!prop32)
>>> +		return false;
>>> +	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 8);
>>> +
>>> +	/* PHB Id */
>>> +	while ((parent = of_get_parent(parent))) {
>>> +		if (!PCI_DN(parent)) {
>>> +			of_node_put(parent);
>>> +			break;
>>> +		}
>>> +
>>> +		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
>>> +		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
>>> +			of_node_put(parent);
>>> +			continue;
>>> +		}
>>> +
>>> +		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
>>> +		if (!prop64) {
>>> +			of_node_put(parent);
>>> +			return false;
>>> +		}
>>> +
>>> +		*id |= be64_to_cpup(prop64);
>>> +		of_node_put(parent);
>>> +		return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
>>> +{
>>> +	struct eeh_dev *edev = pdn_to_eeh_dev(PCI_DN(dn));
>>> +	struct pci_bus *bus;
>>> +	struct powernv_php_slot *slot;
>>> +	const char *label;
>>> +	uint64_t id;
>>> +	int slot_no;
>>> +	size_t size;
>>> +	void *pmem;
>>> +
>>> +	/* Slot name */
>>> +	label = of_get_property(dn, "ibm,slot-label", NULL);
>>> +	if (!label)
>>> +		return NULL;
>>> +
>>> +	/* Slot identifier */
>>> +	if (!php_slot_get_id(dn, &id))
>>> +		return NULL;
>>> +
>>> +	/* PCI bus */
>>> +	bus = of_node_to_pci_bus(dn);
>>> +	if (!bus)
>>> +		return NULL;
>>> +
>>> +	/* Slot number */
>>> +	if (dn->child && PCI_DN(dn->child))
>>> +		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>> +	else
>>> +		slot_no = -1;
>>
>> Not INVALID_SLOT and
>> #define INVALID_SLOT -1
>> ? :)
>>
>
> No need to do that. "-1" means it's a "placeholder" slot. None of
> the PCI devices will be attached to the slot. "-1" is defined by the
> PCI hotplug core

No, not really. drivers/pci/hotplug/pci_hotplug_core.c does not have "-1".
Where is it defined, then? And this "slot_no" is a member of powernv_php_slot,
not hotplug_slot.
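A named constant for the placeholder would look roughly like this. `PCI_DEVFN()`, `PCI_SLOT()` and `PCI_FUNC()` are copied from include/uapi/linux/pci.h; `POWERNV_PHP_SLOT_NO_PLACEHOLDER` and `php_slot_no()` are hypothetical names for illustration only. Whether the hotplug core gives -1 any special meaning is exactly the open question above, so the sketch only shows the slot-number derivation used in the patch, with the magic value named.

```c
#include <assert.h>

/* Same definitions as include/uapi/linux/pci.h. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical name for the "placeholder slot" value. */
#define POWERNV_PHP_SLOT_NO_PLACEHOLDER	(-1)

/*
 * Mirrors the patch: use the device number from the first child's devfn
 * when the slot has a PCI child, otherwise fall back to the placeholder.
 */
static int php_slot_no(int has_pci_child, unsigned int child_devfn)
{
	return has_pci_child ? (int)PCI_SLOT(child_devfn)
			     : POWERNV_PHP_SLOT_NO_PLACEHOLDER;
}
```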


> and it doesn't have a macro yet.
>
>>
>>> +
>>> +	/* Allocate slot */
>>> +	size = sizeof(struct powernv_php_slot) +
>>> +	       sizeof(struct hotplug_slot) +
>>> +	       sizeof(struct hotplug_slot_info);
>>> +	pmem = kzalloc(size, GFP_KERNEL);
>>> +	if (!pmem) {
>>> +		pr_warn("%s: Cannot allocate slot for node %s\n",
>>> +			__func__, dn->full_name);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	/* Assign memory blocks */
>>> +	slot = pmem;
>>> +	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
>>> +	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
>>> +			      sizeof(struct hotplug_slot);
>>> +	slot->name = kstrdup(label, GFP_KERNEL);
>>> +	if (!slot->name) {
>>> +		pr_warn("%s: Cannot populate name for node %s\n",
>>> +			__func__, dn->full_name);
>>> +		kfree(pmem);
>>> +		return NULL;
>>> +	}
>>
>> Why not just embed structs one to another?
>>
>
> Good point. I'll do that.
>
>>> +
>>> +	/* Initialize slot */
>>> +	kref_init(&slot->kref);
>>> +	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
>>> +	slot->dn = dn;
>>> +	slot->pdev = eeh_dev_to_pci_dev(edev);
>>> +	slot->bus = bus;
>>> +	slot->id = id;
>>> +	slot->slot_no = slot_no;
>>> +	INIT_WORK(&slot->work, powernv_php_slot_work);
>>> +	init_waitqueue_head(&slot->queue);
>>> +	slot->check_power_status = 0;
>>> +	slot->status_confirmed = 0;
>>> +	slot->php_slot->ops = &php_slot_ops;
>>> +	slot->php_slot->release = php_slot_release;
>>> +	slot->php_slot->private = slot;
>>> +	INIT_LIST_HEAD(&slot->children);
>>> +	INIT_LIST_HEAD(&slot->link);
>>> +
>>> +	return slot;
>>> +}
>>> +
>>> +int powernv_php_slot_register(struct powernv_php_slot *slot)
>>> +{
>>> +	struct powernv_php_slot *parent;
>>> +	struct device_node *dn = slot->dn;
>>> +	unsigned long flags;
>>> +	int ret;
>>> +
>>> +	/* Avoid register same slot for twice */
>>> +	if (powernv_php_slot_find(slot->dn))
>>> +		return -EEXIST;
>>> +
>>> +	/* Register slot */
>>> +	ret = pci_hp_register(slot->php_slot, slot->bus,
>>> +			      slot->slot_no, slot->name);
>>> +	if (ret) {
>>> +		dev_warn(&slot->pdev->dev, "Error %d registering slot\n",
>>> +			 ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* Put into global or parent list */
>>> +	while ((dn = of_get_parent(dn))) {
>>> +		if (!PCI_DN(dn)) {
>>> +			of_node_put(dn);
>>> +			break;
>>> +		}
>>> +
>>> +		parent = powernv_php_slot_find(dn);
>>> +		if (parent) {
>>> +			of_node_put(dn);
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	spin_lock_irqsave(&php_slot_lock, flags);
>>> +	if (parent) {
>>> +		powernv_php_slot_get(parent);
>>> +		slot->parent = parent;
>>> +		list_add_tail(&slot->link, &parent->children);
>>> +	} else {
>>> +		list_add_tail(&slot->link, &php_slot_list);
>>> +	}
>>> +	spin_unlock_irqrestore(&php_slot_lock, flags);
>>> +
>>> +	/* Update slot state */
>>> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
>>> +	return 0;
>>> +}
>>>
>>
>>
>> Now I'm done with this patchset respin :)
>>
>
> Ok. I appreciate your time on this :-)

you're welcome :)

And since Bjorn ack'ed this patch already, you can probably ditch all my 
comments, his ack is cooler anyway :)
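The compound slot Id built by php_slot_get_id() in the quoted patch is easier to follow with the bit packing pulled out on its own. In this sketch, `php_slot_compound_id()` and `PHP_SLOT_ID_COMPOUND` are hypothetical names; `reg_hi` stands for the first cell of the node's "reg" property (Open Firmware PCI binding: bus in bits 23:16, devfn in bits 15:8) and `phb_id` for the "ibm,opal-phbid" value. As in the patch, the PHB Id is ORed in unmasked, so the sketch assumes it fits in the low 16 bits as the patch comment states.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 marks the Id as compound, as in php_slot_get_id(). */
#define PHP_SLOT_ID_COMPOUND	(UINT64_C(1) << 63)

static uint64_t php_slot_compound_id(uint32_t reg_hi, uint64_t phb_id)
{
	uint64_t id = PHP_SLOT_ID_COMPOUND;

	/* bus:devfn from reg bits 23:8 moves to Id bits 31:16 */
	id |= (uint64_t)(reg_hi & 0x00ffff00u) << 8;
	/* PHB Id occupies the low bits (assumed to fit in 16 bits) */
	id |= phb_id;
	return id;
}
```

For example, bus 0x01, device 1, function 0 on PHB 2 gives `reg_hi = 0x00010800` and an Id of `0x8000000001080002`.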


-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically
  2015-08-15  4:59     ` Gavin Shan
@ 2015-08-15  9:23       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 102+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-15  9:23 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto

On 08/15/2015 02:59 PM, Gavin Shan wrote:
> On Fri, Aug 14, 2015 at 11:52:44PM +1000, Alexey Kardashevskiy wrote:
>> On 08/06/2015 02:11 PM, Gavin Shan wrote:
>>> Currently, the PEs and their associated resources are assigned
>>> in ppc_md.pcibios_fixup(), except those consumed by SRIOV VFs.
>>> The function is called only once, after PCI probing and resource
>>> assignment are finished, which isn't hotplug friendly.
>>>
>>> The patch creates PEs dynamically in ppc_md.pcibios_setup_bridge(),
>>> which is called during both system bootup and PCI hotplug, when the
>>> PCI bridge's windows are updated after resource assignment or
>>> reassignment is finished. For the partial hotplug case, where not
>>> all PCI devices belonging to the PE are unplugged and plugged again,
>>> we just need to unbind/bind the affected PCI devices from/to the
>>> corresponding PE without creating a new one.
>>>
>>> Besides, when the current adapter is unplugged and a different
>>> adapter is inserted into a PCI slot (assumed to be behind the root
>>> port, or behind a downstream port of the PCIe switch behind the root
>>> port), the new adapter might require additional resources (e.g. M32)
>>> in the windows of the PCI bridge. The parent bridge of the newly
>>> plugged adapter would reject the request to add more resources,
>>> leading to hotplug failure. To address this, the patch extends the
>>> windows of the root port, or of the upstream port of the PCIe switch
>>> behind the root port, to the PHB's windows when
>>> ppc_md.pcibios_setup_bridge() is called.
>>>
>>> There is no upstream bridge for the root bus, so we have to fix it up
>>> before any PE is created, because the root bus PE is the ancestor of
>>> all other PEs.
>>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 226 ++++++++++++++++++------------
>>>   arch/powerpc/platforms/powernv/pci.h      |   1 +
>>>   2 files changed, 137 insertions(+), 90 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 8aa6ab8..37847a3 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -1083,6 +1083,13 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
>>>   				pci_name(dev));
>>>   			continue;
>>>   		}
>>> +
>>> +		/* The PCI device might be not detached from the
>>> +		 * PE in partial hotplug case.
>>> +		 */
>>> +		if (pdn->pe_number != IODA_INVALID_PE)
>>> +			continue;
>>> +
>>>   		pdn->pe_number = pe->pe_number;
>>>   		pe->dma32_weight += pnv_ioda_dma_weight(dev);
>>>   		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>> @@ -1101,9 +1108,27 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>   	struct pci_controller *hose = pci_bus_to_host(bus);
>>>   	struct pnv_phb *phb = hose->private_data;
>>>   	struct pnv_ioda_pe *pe = NULL;
>>> +	int pe_num;
>>> +
>>> +	/* For partial hotplug case, the PE instance hasn't been destroyed
>>> +	 * yet. We shouldn't allocated a new one and assign resources to
>>> +	 * it. The existing PE instance should be reused, but we should
>>> +	 * associate the devices to the PE.
>>> +	 */
>>> +	pe_num = phb->ioda.pe_rmap[bus->number << 8];
>>> +	if (pe_num != IODA_INVALID_PE) {
>>> +		pe = &phb->ioda.pe_array[pe_num];
>>> +		pnv_ioda_setup_same_PE(bus, pe);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	/* PE number for root bus should have been reserved */
>>> +	if (pci_is_root_bus(bus) &&
>>> +	    phb->ioda.root_pe_idx != IODA_INVALID_PE)
>>> +		pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx];
>>>
>>>   	/* Check if PE is determined by M64 */
>>> -	if (phb->pick_m64_pe)
>>> +	if (!pe && phb->pick_m64_pe)
>>
>>
>> else if (phb->pick_m64_pe)
>>
>
> No. When this function is called for the root bus, the PE should
> already have been reserved. So we still have to check @pe.

When you check for "if (!pe && phb->pick_m64_pe)", pe may be not NULL 
_only_ if it was assigned by
"pe = &phb->ioda.pe_array[phb->ioda.root_pe_idx]"
and this assignment cannot produce NULL (assuming phb->ioda.pe_array!=NULL).


So instead of "if (!pe && phb->pick_m64_pe)" you could do "else if 
(phb->pick_m64_pe)". It is not serious at all, just a matter of readability.
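Alexey's readability point can be checked mechanically: since `&pe_array[idx]` is never NULL, re-testing `!pe` after the root-bus branch is equivalent to an `else if`. The sketch below reduces the selection order in pnv_ioda_setup_bus_PE() to just those two steps (it omits the partial-hotplug early return); `struct pe`, `m64_pick()` and the `select_pe_*()` helpers are hypothetical, and `IODA_INVALID_PE` is a local stand-in value rather than the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE	(-1)	/* stand-in value for this sketch */

struct pe {
	int pe_number;
};

static struct pe m64_pe = { .pe_number = 42 };

/* Hypothetical M64 picker that always pins a PE. */
static struct pe *m64_pick(void)
{
	return &m64_pe;
}

/* Form used in the patch: the second test re-checks pe. */
static struct pe *select_pe_patch(struct pe *pe_array, int root_pe_idx,
				  int is_root_bus,
				  struct pe *(*pick_m64_pe)(void))
{
	struct pe *pe = NULL;

	if (is_root_bus && root_pe_idx != IODA_INVALID_PE)
		pe = &pe_array[root_pe_idx];
	if (!pe && pick_m64_pe)
		pe = pick_m64_pe();
	return pe;
}

/* Suggested form: &pe_array[i] is never NULL, so "else if" suffices. */
static struct pe *select_pe_else_if(struct pe *pe_array, int root_pe_idx,
				    int is_root_bus,
				    struct pe *(*pick_m64_pe)(void))
{
	struct pe *pe = NULL;

	if (is_root_bus && root_pe_idx != IODA_INVALID_PE)
		pe = &pe_array[root_pe_idx];
	else if (pick_m64_pe)
		pe = pick_m64_pe();
	return pe;
}
```

Both functions return the same PE for every combination of root bus, reserved root PE index and presence of an M64 picker; the `else if` form just states that intent directly.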


>
>>
>>
>>>   		pe = phb->pick_m64_pe(bus, all);
>>>
>>>   	/* The PE number isn't pinned by M64 */
>>> @@ -1150,46 +1175,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
>>>   	return pe;
>>>   }
>>>
>>> -static void pnv_ioda_setup_PEs(struct pci_bus *bus)
>>> -{
>>> -	struct pci_dev *dev;
>>> -
>>> -	pnv_ioda_setup_bus_PE(bus, false);
>>> -
>>> -	list_for_each_entry(dev, &bus->devices, bus_list) {
>>> -		if (dev->subordinate) {
>>> -			if (pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE)
>>> -				pnv_ioda_setup_bus_PE(dev->subordinate, true);
>>> -			else
>>> -				pnv_ioda_setup_PEs(dev->subordinate);
>>> -		}
>>> -	}
>>> -}
>>> -
>>> -/*
>>> - * Configure PEs so that the downstream PCI buses and devices
>>> - * could have their associated PE#. Unfortunately, we didn't
>>> - * figure out the way to identify the PLX bridge yet. So we
>>> - * simply put the PCI bus and the subordinate behind the root
>>> - * port to PE# here. The game rule here is expected to be changed
>>> - * as soon as we can detected PLX bridge correctly.
>>> - */
>>> -static void pnv_pci_ioda_setup_PEs(void)
>>> -{
>>> -	struct pci_controller *hose, *tmp;
>>> -	struct pnv_phb *phb;
>>> -
>>> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> -		phb = hose->private_data;
>>> -
>>> -		/* M64 layout might affect PE allocation */
>>> -		if (phb->reserve_m64_pe)
>>> -			phb->reserve_m64_pe(hose->bus, NULL, true);
>>> -
>>> -		pnv_ioda_setup_PEs(hose->bus);
>>> -	}
>>> -}
>>> -
>>>   #ifdef CONFIG_PCI_IOV
>>>   static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>>   {
>>> @@ -2962,52 +2947,6 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>>>   	}
>>>   }
>>>
>>> -static void pnv_pci_ioda_setup_seg(void)
>>> -{
>>> -	struct pci_controller *tmp, *hose;
>>> -	struct pnv_phb *phb;
>>> -	struct pnv_ioda_pe *pe;
>>> -
>>> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> -		phb = hose->private_data;
>>> -		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>>> -			pnv_ioda_setup_pe_seg(hose, pe);
>>> -		}
>>> -	}
>>> -}
>>> -
>>> -static void pnv_pci_ioda_setup_DMA(void)
>>> -{
>>> -	struct pci_controller *hose, *tmp;
>>> -	struct pnv_phb *phb;
>>> -	struct pnv_ioda_pe *pe;
>>> -
>>> -	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> -		phb = hose->private_data;
>>> -		pnv_pci_ioda_setup_opal_tce_kill(phb);
>>> -
>>> -		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
>>> -			if (!pe->dma32_weight)
>>> -				continue;
>>> -
>>> -			switch (phb->type) {
>>> -			case PNV_PHB_IODA1:
>>> -				pnv_ioda1_setup_dma(phb, pe);
>>> -				break;
>>> -			case PNV_PHB_IODA2:
>>> -				pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> -				break;
>>> -			default:
>>> -				pr_warn("%s: No DMA for PHB type %d\n",
>>> -					__func__, phb->type);
>>> -			}
>>> -		}
>>> -
>>> -		/* Mark the PHB initialization done */
>>> -		phb->initialized = 1;
>>> -	}
>>> -}
>>> -
>>>   static void pnv_pci_ioda_create_dbgfs(void)
>>>   {
>>>   #ifdef CONFIG_DEBUG_FS
>>> @@ -3029,9 +2968,8 @@ static void pnv_pci_ioda_create_dbgfs(void)
>>>
>>>   static void pnv_pci_ioda_fixup(void)
>>>   {
>>> -	pnv_pci_ioda_setup_PEs();
>>> -	pnv_pci_ioda_setup_seg();
>>> -	pnv_pci_ioda_setup_DMA();
>>> +	struct pci_controller *hose, *tmp;
>>> +	struct pnv_phb *phb;
>>>
>>>   	pnv_pci_ioda_create_dbgfs();
>>>
>>> @@ -3039,6 +2977,12 @@ static void pnv_pci_ioda_fixup(void)
>>>   	eeh_init();
>>>   	eeh_addr_cache_build();
>>>   #endif
>>> +
>>> +	/* Notify initialization of PHB done */
>>> +	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>>> +		phb = hose->private_data;
>>> +		phb->initialized = 1;
>>> +	}
>>>   }
>>>
>>>   /*
>>> @@ -3082,6 +3026,105 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>>   	return phb->ioda.io_segsize;
>>>   }
>>>
>>> +/*
>>> + * We are updating root port or the upstream bridge behind the
>>> + * root port with PHB's windows in order to accommodate the
>>> + * changes on required resources during PCI (slot) hotplug,
>>> + * which is connected to either root port or the downstream
>>> + * ports of PCIe switch behind the root port.
>>> + */
>>> +static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus,
>>> +					   unsigned long type)
>>> +{
>>> +	struct pci_controller *hose = pci_bus_to_host(bus);
>>> +	struct pnv_phb *phb = hose->private_data;
>>> +	struct pci_dev *bridge = bus->self;
>>> +	struct resource *r, *w;
>>> +	int i;
>>> +
>>> +	/* Check if we need apply fixup to the bridge's windows */
>>> +	if (!pci_is_root_bus(bridge->bus) &&
>>> +	    !pci_is_root_bus(bridge->bus->self->bus))
>>> +		return;
>>> +
>>> +	/* Fixup the resoureces */
>>> +	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
>>> +		r = &bridge->resource[PCI_BRIDGE_RESOURCES + i];
>>> +		if (!r->flags || !r->parent)
>>> +			continue;
>>> +
>>> +		w = NULL;
>>> +		if (r->flags & type & IORESOURCE_IO)
>>> +			w = &hose->io_resource;
>>> +		else if (pnv_pci_is_mem_pref_64(r->flags) &&
>>> +			 (type & IORESOURCE_PREFETCH) &&
>>> +			 phb->ioda.m64_segsize)
>>> +			w = &hose->mem_resources[1];
>>> +		else if (r->flags & type & IORESOURCE_MEM)
>>> +			w = &hose->mem_resources[0];
>>> +
>>> +		r->start = w->start;
>>> +		r->end = w->end;
>>> +	}
>>> +}
>>> +
>>> +static void pnv_pci_setup_bridge(struct pci_bus *bus,
>>> +				 unsigned long type)
>>> +{
>>> +	struct pci_controller *hose = pci_bus_to_host(bus);
>>> +	struct pnv_phb *phb = hose->private_data;
>>> +	struct pci_dev *bridge = bus->self;
>>> +	struct pnv_ioda_pe *pe;
>>> +	bool all = (pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
>>> +
>>> +	/* The root bus (ancestor PE) should be finalized
>>> +	 * before anyone else
>>> +	 */
>>> +	if (!phb->ioda.root_pe_is_populated) {
>>> +		pe = pnv_ioda_setup_bus_PE(phb->hose->bus, false);
>>> +		if (pe && phb->ioda.root_pe_idx == IODA_INVALID_PE)
>>> +			phb->ioda.root_pe_idx = pe->pe_number;
>>> +			phb->ioda.root_pe_is_populated = true;
>>> +		}
>>
>>
>> This "}" should be 1 tab left. Or you lost one "{" after if() and its
>> counterpart.
>>
>
> Good catch!
>
>>> +
>>> +	/* Extend bridge's windows if necessary */
>>> +	pnv_pci_fixup_bridge_resources(bus, type);
>>> +
>>> +	/* Don't assign PE to bus which doesn't have any
>>> +	 * subordinate PCI devices.
>>> +	 */
>>> +	if (list_empty(&bus->devices))
>>> +		return;
>>> +
>>> +	/* Reserve PEs for M64 resource */
>>> +	if (phb->reserve_m64_pe)
>>> +		phb->reserve_m64_pe(bus, NULL, all);
>>> +
>>> +	/* Assign PE. We might run here because of partial hotplug.
>>> +	 * For the case, we just pick up the existing PE and should
>>> +	 * not allocate resources again.
>>> +	 */
>>> +	pe = pnv_ioda_setup_bus_PE(bus, all);
>>> +	if (!pe)
>>> +		return;
>>> +
>>> +	/* Setup MMIO mapping */
>>> +	pnv_ioda_setup_pe_seg(hose, pe);
>>> +
>>> +	/* Setup DMA */
>>> +	switch (phb->type) {
>>> +	case PNV_PHB_IODA1:
>>> +		pnv_ioda1_setup_dma(phb, pe);
>>> +		break;
>>> +	case PNV_PHB_IODA2:
>>> +		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> +		break;
>>> +	default:
>>> +		pr_warn("%s: No DMA for PHB type %d\n",
>>> +			__func__, phb->type);
>>> +	}
>>> +}
>>> +
>>>   #ifdef CONFIG_PCI_IOV
>>>   static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>   						      int resno)
>>> @@ -3147,6 +3190,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
>>>   #endif
>>>          .enable_device_hook = pnv_pci_enable_device_hook,
>>>          .window_alignment = pnv_pci_window_alignment,
>>> +	.setup_bridge = pnv_pci_setup_bridge,
>>>          .reset_secondary_bus = pnv_pci_reset_secondary_bus,
>>>          .dma_set_mask = pnv_pci_ioda_dma_set_mask,
>>>          .shutdown = pnv_pci_ioda_shutdown,
>>> @@ -3218,6 +3262,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>   	if (phb->regs == NULL)
>>>   		pr_err("  Failed to map registers !\n");
>>>
>>> +	pnv_pci_ioda_setup_opal_tce_kill(phb);
>>> +
>>>   	/* Initialize more IODA stuff */
>>>   	phb->ioda.total_pe_num = 1;
>>>   	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
>>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>> index e93a489..a160491 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.h
>>> +++ b/arch/powerpc/platforms/powernv/pci.h
>>> @@ -136,6 +136,7 @@ struct pnv_phb {
>>>   			/* Global bridge info */
>>>   			unsigned int		total_pe_num;
>>>   			unsigned int		root_pe_idx;
>>> +			bool			root_pe_is_populated;
>>>   			unsigned int		reserved_pe_idx;
>>>
>>>   			/* 32-bit MMIO window */
>>>
>
> Thanks,
> Gavin
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level
  2015-08-06  4:11 ` [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level Gavin Shan
  2015-08-06 14:09   ` Rob Herring
@ 2015-11-03 23:16   ` Gavin Shan
  1 sibling, 0 replies; 102+ messages in thread
From: Gavin Shan @ 2015-11-03 23:16 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linuxppc-dev, linux-pci, devicetree, benh, mpe, bhelgaas,
	grant.likely, robherring2, panto, aik

On Thu, Aug 06, 2015 at 02:11:43PM +1000, Gavin Shan wrote:
>unflatten_dt_node() is called recursively to unflatten FDT nodes
>with the assumption that the FDT blob has only one root node, which
>isn't true when the FDT blob represents a device sub-tree. This
>improves the function to support device sub-trees that have
>multiple nodes at the first level:
>
>   * Rename the original unflatten_dt_node() to __unflatten_dt_node().
>   * The wrapper unflatten_dt_node() calls __unflatten_dt_node() with
>     the current node depth adjusted to 1 to avoid underflow.
>
>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>---
> drivers/of/fdt.c | 53 ++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 40 insertions(+), 13 deletions(-)
>
>diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>index 0749656..a18a2ce 100644
>--- a/drivers/of/fdt.c
>+++ b/drivers/of/fdt.c
>@@ -161,7 +161,7 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
> }
>
> /**
>- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
>+ * __unflatten_dt_node - Alloc and populate a device_node from the flat tree
>  * @blob: The parent device tree blob
>  * @mem: Memory chunk to use for allocating device nodes and properties
>  * @poffset: pointer to node in flat tree
>@@ -171,20 +171,20 @@ static void *unflatten_dt_alloc(void **mem, unsigned long size,
>  * @dryrun: If true, do not allocate device nodes but still calculate needed
>  * memory size
>  */
>-static void * unflatten_dt_node(const void *blob,
>+static void *__unflatten_dt_node(const void *blob,
> 				void *mem,
> 				int *poffset,
> 				struct device_node *dad,
> 				struct device_node **nodepp,
> 				unsigned long fpsize,
>-				bool dryrun)
>+				bool dryrun,
>+				int *depth)
> {
> 	const __be32 *p;
> 	struct device_node *np;
> 	struct property *pp, **prev_pp = NULL;
> 	const char *pathp;
> 	unsigned int l, allocl;
>-	static int depth = 0;
> 	int old_depth;
> 	int offset;
> 	int has_name = 0;
>@@ -337,13 +337,25 @@ static void * unflatten_dt_node(const void *blob,
> 			np->type = "<NULL>";
> 	}
>
>-	old_depth = depth;
>-	*poffset = fdt_next_node(blob, *poffset, &depth);
>-	if (depth < 0)
>-		depth = 0;
>-	while (*poffset > 0 && depth > old_depth)
>-		mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
>-					fpsize, dryrun);
>+	/* Multiple nodes might be in the first depth level if
>+	 * the device tree is sub-tree. All nodes in current
>+	 * or deeper depth are unflattened after it returns.
>+	 */
>+	old_depth = *depth;
>+	*poffset = fdt_next_node(blob, *poffset, depth);
>+	while (*poffset > 0) {
>+		if (*depth < old_depth)
>+			break;
>+
>+		if (*depth == old_depth)
>+			mem = __unflatten_dt_node(blob, mem, poffset,
>+						  dad, NULL, fpsize,
>+						  dryrun, depth);
>+		else if (*depth > old_depth)
>+			mem = __unflatten_dt_node(blob, mem, poffset,
>+						  np, NULL, fpsize,
>+						  dryrun, depth);
>+	}
>

Sorry for the delay. I'm afraid this one has to be reworked. With the current
code and these changes, the nodes in the FDT blob are scanned recursively.
That can exhaust the stack when this function is called at an early stage
of system boot to unflatten a device tree that has too many levels and nodes.
In the next revision, I'll rework it to avoid recursive calls to this function.
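Reading the quoted hunk, the patched loop recurses not only into children
(`*depth > old_depth`, passing `np`) but also into siblings (`*depth ==
old_depth`, passing `dad`), so each further node at the same level nests one
more stack frame. The pre-patch loop recursed only into children, which kept
recursion bounded by tree depth but stopped after the first first-level node
of a sub-tree. The following is a hypothetical model, not kernel code: the
FDT walk is reduced to an array of per-node depths, and the names
`struct model`, `model_next_node()`, `walk_old()`, `walk_new()` and
`model_run()` are made up. It counts visited nodes and peak recursion depth
for both control flows.

```c
#include <assert.h>

/*
 * Hypothetical model: each entry is the depth fdt_next_node() would report
 * for that node, in document order, with first-level nodes at depth 1
 * (the value the quoted unflatten_dt_node() wrapper starts from).
 */
struct model {
	const int *depths;
	int count;
	int frames;      /* current recursion depth */
	int max_frames;  /* deepest recursion observed */
	int visited;     /* nodes "unflattened" */
};

/* Stand-in for fdt_next_node(): returns -1 instead of -FDT_ERR_NOTFOUND. */
static int model_next_node(struct model *m, int offset, int *depth)
{
	if (offset + 1 >= m->count)
		return -1;
	*depth = m->depths[offset + 1];
	return offset + 1;
}

/* Count the node and track the deepest stack frame reached. */
static void enter(struct model *m)
{
	m->visited++;
	if (++m->frames > m->max_frames)
		m->max_frames = m->frames;
}

/* Control flow of the pre-patch loop: recurse only into children. */
static void walk_old(struct model *m, int *poffset, int *depth)
{
	int old_depth = *depth;

	enter(m);
	*poffset = model_next_node(m, *poffset, depth);
	while (*poffset > 0 && *depth > old_depth)
		walk_old(m, poffset, depth);
	m->frames--;
}

/* Control flow of the patched loop: siblings recurse as well. */
static void walk_new(struct model *m, int *poffset, int *depth)
{
	int old_depth = *depth;

	enter(m);
	*poffset = model_next_node(m, *poffset, depth);
	while (*poffset > 0) {
		if (*depth < old_depth)
			break;
		/* == old_depth: sibling (dad); > old_depth: child (np) */
		walk_new(m, poffset, depth);
	}
	m->frames--;
}

/* Run one flow over a depth stream; patched != 0 selects walk_new(). */
struct model model_run(const int *depths, int count, int patched)
{
	struct model m = { depths, count, 0, 0, 0 };
	int offset = 0, depth = depths[0];

	assert(count > 0);
	if (patched)
		walk_new(&m, &offset, &depth);
	else
		walk_old(&m, &offset, &depth);
	return m;
}
```

For a flat sub-tree of five first-level nodes ({1, 1, 1, 1, 1}), the patched
flow visits all five with a peak of five frames, while the pre-patch flow
visits only the first. Under a single root with four children
({1, 2, 2, 2, 2}), both visit all five nodes, but the pre-patch flow peaks at
two frames and the patched flow at five.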

So more time is needed before I post the next revision. This issue was observed
in recent testing with 4.3-rc6 and the patchset. On a P7 box, a bad stack is
reported directly. On a P8 box, /bin/init in the initramfs image can't be
started properly. I ran "git bisect", and it pointed to this patch in both cases.
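One non-recursive strategy, sketched here under stated assumptions: feed the
nodes in document order with the depth fdt_next_node() would report
(first-level nodes at depth 1) and remember the most recently seen node at
each depth, so a node's parent is the last node one level up. Stack use is
then bounded by a fixed array rather than by the number of nodes, and several
first-level nodes are still accepted. The helper `unflatten_iterative()` and
the bound `MODEL_MAX_DEPTH` are hypothetical; property parsing, allocation
and the dry-run size pass are omitted, and this is not presented as the
approach the next revision will take.

```c
#include <assert.h>

#define MODEL_MAX_DEPTH 64   /* hypothetical bound on nesting */

/*
 * Hypothetical sketch: depths[i] is the depth of node i in document order.
 * parent[i] receives the index of node i's parent, or -1 for first-level
 * nodes (standing in for the caller-supplied "dad").
 * Returns the number of nodes placed, or -1 on a malformed stream.
 */
int unflatten_iterative(const int *depths, int count, int *parent)
{
	int last_at[MODEL_MAX_DEPTH + 1];  /* latest node seen at each depth */
	int i;

	for (i = 0; i < count; i++) {
		int d = depths[i];

		if (d < 1 || d > MODEL_MAX_DEPTH)
			return -1;         /* depth out of range */
		/* A child may only appear directly below an open ancestor. */
		if (d > 1 && (i == 0 || depths[i - 1] < d - 1))
			return -1;         /* stream skipped a level */
		parent[i] = (d == 1) ? -1 : last_at[d - 1];
		last_at[d] = i;
	}
	return count;
}
```

For the stream {1, 2, 3, 2, 1}, the parents come out as {-1, 0, 1, 0, -1}:
two first-level nodes, the first of which has two children, one of them
with a child of its own.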

> 	if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
> 		pr_err("unflatten: error %d processing FDT\n", *poffset);
>@@ -369,6 +381,20 @@ static void * unflatten_dt_node(const void *blob,
> 	return mem;
> }
>
>+static void *unflatten_dt_node(const void *blob,
>+			       void *mem,
>+			       int *poffset,
>+			       struct device_node *dad,
>+			       struct device_node **nodepp,
>+			       bool dryrun)
>+{
>+	int depth = 1;
>+
>+	return __unflatten_dt_node(blob, mem, poffset,
>+				   dad, nodepp, 0,
>+				   dryrun, &depth);
>+}
>+
> /**
>  * __unflatten_device_tree - create tree of device_nodes from flat blob
>  *
>@@ -408,7 +434,8 @@ static void __unflatten_device_tree(const void *blob,
>
> 	/* First pass, scan for size */
> 	start = 0;
>-	size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, NULL, 0, true);
>+	size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
>+						NULL, NULL, true);
> 	size = ALIGN(size, 4);
>
> 	pr_debug("  size is %lx, allocating...\n", size);
>@@ -423,7 +450,7 @@ static void __unflatten_device_tree(const void *blob,
>
> 	/* Second pass, do actual unflattening */
> 	start = 0;
>-	unflatten_dt_node(blob, mem, &start, NULL, mynodes, 0, false);
>+	unflatten_dt_node(blob, mem, &start, NULL, mynodes, false);
> 	if (be32_to_cpup(mem + size) != 0xdeadbeef)
> 		pr_warning("End of tree marker overwritten: %08x\n",
> 			   be32_to_cpup(mem + size));
>-- 
>2.1.0
>



Thread overview: 102+ messages
2015-08-06  4:11 [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Gavin Shan
2015-08-06  4:11 ` [PATCH v6 01/42] PCI: Add pcibios_setup_bridge() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 02/42] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 03/42] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
2015-08-10  6:30   ` Alexey Kardashevskiy
2015-08-10 23:45     ` Gavin Shan
2015-08-11  2:06       ` Alexey Kardashevskiy
2015-08-12 10:28         ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 04/42] powerpc/powernv: Reorder fields in struct pnv_phb Gavin Shan
2015-08-06  4:11 ` [PATCH v6 05/42] powerpc/powernv: Track IO/M32/M64 segments from PE Gavin Shan
2015-08-10  7:16   ` Alexey Kardashevskiy
2015-08-11  0:03     ` Gavin Shan
2015-08-11  2:23       ` Alexey Kardashevskiy
2015-08-12 10:45         ` Gavin Shan
2015-08-12 11:05           ` Alexey Kardashevskiy
2015-08-12 11:20             ` Gavin Shan
2015-08-12 12:57               ` Alexey Kardashevskiy
2015-08-12 23:34                 ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 06/42] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 07/42] powerpc/powernv: Improve IO and M32 mapping Gavin Shan
     [not found]   ` <1438834307-26960-8-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-08-10  7:40     ` Alexey Kardashevskiy
2015-08-11  0:12       ` Gavin Shan
2015-08-11  2:32         ` Alexey Kardashevskiy
2015-08-12 23:42           ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 08/42] powerpc/powernv: Calculate PHB's DMA weight dynamically Gavin Shan
2015-08-10  7:48   ` Alexey Kardashevskiy
2015-08-10  9:21   ` Alexey Kardashevskiy
2015-08-12 23:57     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 09/42] powerpc/powernv: DMA32 cleanup Gavin Shan
2015-08-10  8:07   ` Alexey Kardashevskiy
2015-08-11  0:19     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 10/42] powerpc/powernv: pnv_ioda_setup_dma() configure one PE only Gavin Shan
2015-08-10  9:31   ` Alexey Kardashevskiy
2015-08-11  0:29     ` Gavin Shan
2015-08-11  2:39       ` Alexey Kardashevskiy
2015-08-12 23:59         ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 11/42] powerpc/powernv: Trace DMA32 segments consumed by PE Gavin Shan
2015-08-10  9:43   ` Alexey Kardashevskiy
2015-08-11  0:33     ` Gavin Shan
2015-08-13  0:02     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 13/42] powerpc/pci: Cleanup on pci_controller_ops Gavin Shan
2015-08-06  4:11 ` [PATCH v6 14/42] powerpc/pci: Override pcibios_setup_bridge() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 15/42] powerpc/powernv: PE oriented during configuration Gavin Shan
2015-08-10 10:02   ` Alexey Kardashevskiy
2015-08-11  0:39     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 16/42] powerpc/powernv: Helper function pnv_ioda_init_pe() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 18/42] powerpc/powernv: Allocate PE# in deasending order Gavin Shan
2015-08-10 14:39   ` Alexey Kardashevskiy
2015-08-11  0:43     ` Gavin Shan
2015-08-11  2:50       ` Alexey Kardashevskiy
2015-08-13  0:28         ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 19/42] powerpc/powernv: Reserve PE# for root bus Gavin Shan
2015-08-06  4:11 ` [PATCH v6 20/42] powerpc/powernv: Create PEs dynamically Gavin Shan
2015-08-14 13:52   ` Alexey Kardashevskiy
2015-08-15  4:59     ` Gavin Shan
2015-08-15  9:23       ` Alexey Kardashevskiy
2015-08-06  4:11 ` [PATCH v6 21/42] powerpc/powernv: Remove DMA32 list of PEs Gavin Shan
2015-08-06  4:11 ` [PATCH v6 22/42] powerpc/powernv: Move functions around Gavin Shan
2015-08-06  4:11 ` [PATCH v6 23/42] powerpc/powernv: Release PEs dynamically Gavin Shan
2015-08-11 13:03   ` Alexey Kardashevskiy
2015-08-13  0:54     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 24/42] powerpc/powernv: Supports slot ID Gavin Shan
2015-08-06  4:11 ` [PATCH v6 25/42] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
2015-08-06  4:11 ` [PATCH v6 26/42] powerpc/powernv: Simplify pnv_eeh_reset() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 27/42] powerpc/powernv: Don't cover root bus in pnv_pci_reset_secondary_bus() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 29/42] powerpc/pci: Don't scan empty slot Gavin Shan
2015-08-06  4:11 ` [PATCH v6 30/42] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
2015-08-06  4:11 ` [PATCH v6 31/42] powerpc/pci: Rename pcibios_{add,remove}_pci_devices Gavin Shan
     [not found] ` <1438834307-26960-1-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-08-06  4:11   ` [PATCH v6 12/42] powerpc/powernv: Increase PE# capacity Gavin Shan
     [not found]     ` <1438834307-26960-13-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-08-10  9:53       ` Alexey Kardashevskiy
2015-08-11  0:38         ` Gavin Shan
2015-08-11  2:47           ` Alexey Kardashevskiy
2015-08-13  0:23             ` Gavin Shan
2015-08-06  4:11   ` [PATCH v6 17/42] powerpc/powernv: Rename PE# fields in PHB Gavin Shan
2015-08-10 14:21     ` Alexey Kardashevskiy
2015-08-11  0:40       ` Gavin Shan
2015-08-06  4:11   ` [PATCH v6 28/42] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus() Gavin Shan
2015-08-06  4:11   ` [PATCH v6 32/42] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
2015-08-06  4:11   ` [PATCH v6 33/42] powerpc/powernv: Functions to get/reset PCI slot status Gavin Shan
2015-08-06  4:11   ` [PATCH v6 34/42] powerpc/pci: Delay creating pci_dn Gavin Shan
2015-08-06  4:11   ` [PATCH v6 37/42] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
2015-08-06  4:11 ` [PATCH v6 35/42] powerpc/pci: Export traverse_pci_device_nodes() Gavin Shan
2015-08-06  4:11 ` [PATCH v6 36/42] powerpc/pci: Update bridge windows on PCI plugging Gavin Shan
2015-08-06  4:11 ` [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level Gavin Shan
2015-08-06 14:09   ` Rob Herring
2015-11-03 23:16   ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 39/42] drivers/of: Allow to specify root node in of_fdt_unflatten_tree() Gavin Shan
2015-08-10 22:42   ` Frank Rowand
2015-08-11  0:52     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree() Gavin Shan
     [not found]   ` <1438834307-26960-41-git-send-email-gwshan-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-08-06 14:19     ` Rob Herring
2015-08-10 22:42   ` Frank Rowand
2015-08-11  0:52     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 41/42] drivers/of: Export OF changeset functions Gavin Shan
2015-08-06 13:48   ` Rob Herring
2015-08-07  1:43     ` Gavin Shan
2015-08-06  4:11 ` [PATCH v6 42/42] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2015-08-15  3:13   ` Alexey Kardashevskiy
2015-08-15  4:47     ` Gavin Shan
2015-08-15  9:15       ` Alexey Kardashevskiy
2015-08-10  6:05 ` [PATCH v6 00/42] powerpc/powernv: PCI hotplug suppport Alexey Kardashevskiy
2015-08-10  7:17   ` Gavin Shan
